CN117611933A - Image processing method, device, equipment and medium based on classification network model - Google Patents


Info

Publication number
CN117611933A
CN117611933A (application number CN202410095417.4A)
Authority
CN
China
Prior art keywords
network model
training
image
classification network
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410095417.4A
Other languages
Chinese (zh)
Inventor
盛国军
王坤
秦承刚
刘和松
胡明臣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Karos Iot Technology Co ltd
Cosmoplat Industrial Intelligent Research Institute Qingdao Co Ltd
Original Assignee
Karos Iot Technology Co ltd
Cosmoplat Industrial Intelligent Research Institute Qingdao Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Karos Iot Technology Co ltd, Cosmoplat Industrial Intelligent Research Institute Qingdao Co Ltd filed Critical Karos Iot Technology Co ltd
Priority claimed from application CN202410095417.4A
Publication of CN117611933A
Legal status: pending

Classifications

    • G06V 10/764 Image or video recognition or understanding using pattern recognition or machine learning: classification, e.g. of video objects
    • G06N 3/045 Neural network architecture: combinations of networks
    • G06N 3/0464 Neural network architecture: convolutional networks [CNN, ConvNet]
    • G06N 3/09 Learning methods: supervised learning
    • G06V 10/462 Feature extraction: salient features, e.g. scale invariant feature transforms [SIFT]
    • G06V 10/7715 Feature extraction, e.g. by transforming the feature space; mappings, e.g. subspace methods
    • G06V 10/774 Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
    • G06V 10/776 Validation; performance evaluation
    • G06V 10/82 Image or video recognition or understanding using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The application belongs to the technical field of the industrial Internet and relates to an image processing method, device, equipment and medium based on a classification network model. A classification network model with a plurality of sequential training stages is constructed from an acquired model construction parameter set and a historical training data set, and the model is then used to process a newly acquired original image to obtain a processing result for that image. The method addresses the problem that the Vision Transformer model cannot accurately distinguish image elements of different species within the same category in image classification tasks, thereby improving the accuracy of image classification. It also improves the generalization ability of the classification network model, enabling it to adapt to a variety of image processing tasks.

Description

Image processing method, device, equipment and medium based on classification network model
Technical Field
The application belongs to the technical field of industrial Internet, and particularly relates to an image processing method, device, equipment and medium based on a classification network model.
Background
With the continuous progress of deep learning, CNN-based image classification methods have achieved remarkable results. However, CNNs have limitations in processing the spatial information of images, which makes fine-grained image classification tasks challenging. To address this problem, fine-grained image classification methods based on the Vision Transformer (ViT) have been developed. The Vision Transformer model adopts a self-attention mechanism and can comprehensively consider the contextual information in an image, so it performs well in fine-grained image classification tasks.
However, while fine-grained image classification methods based on the Vision Transformer show advantages in certain tasks, they still have drawbacks. For example, because the self-attention mechanism in the Vision Transformer model is not sensitive to pixel location information in the image, some key nuances may be overlooked in fine-grained image classification, resulting in misidentification.
Therefore, how to enable the Vision Transformer model to accurately distinguish image elements of different species within the same category remains an urgent problem in image classification tasks.
Disclosure of Invention
The application provides an image processing method, device, equipment and medium based on a classification network model, which are used to solve the problem that the Vision Transformer model cannot accurately distinguish image elements of different species within the same category in image classification tasks.
In a first aspect, the present application provides an image processing method based on a classification network model, including:
obtaining a model build dataset, the model build dataset comprising: constructing a parameter set and a historical training data set by a model;
constructing a classification network model according to the model construction parameter set and the historical training data set, wherein the classification network model comprises: a plurality of training phases with a sequence;
acquiring an original image in real time, and performing pixel processing on the original image according to the plurality of training stages to obtain a plurality of images to be processed;
and determining the processing result of the original image according to the plurality of images to be processed and the classification network model.
Optionally, the constructing a classification network model according to the model constructing parameter set and the historical training data set includes:
determining a plurality of candidate model parameters according to the model construction parameter set, wherein the candidate model parameters comprise: a general neural network model, a target classifier, an attention transfer algorithm and a progressive training algorithm;
constructing a classification network model to be trained according to the general neural network model, the target classifier, the attention transfer algorithm and the progressive training algorithm;
and training the classification network model to be trained according to the historical training data set to obtain a classification network model after training.
Optionally, the historical training data set includes a plurality of image parameters, and the training of the classification network model to be trained according to the historical training data set to obtain a trained classification network model includes:
analyzing the classification network model to be trained to obtain a plurality of training stages corresponding to the classification network model;
and training the classification network model to be trained according to the training stages and the image parameters to obtain the classification network model after training.
Optionally, the training the classification network model to be trained according to the plurality of training phases and the plurality of image parameters, and after obtaining the trained classification network model, the method further includes:
obtaining a test dataset comprising: a plurality of test images;
sequentially inputting the plurality of test images into the trained classification network model to obtain a verification result corresponding to each test image;
judging whether the verification results reach a preset verification result or not;
and when the verification results reach the preset verification results, determining that the use state of the trained classification network model is an available state.
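The availability check above can be sketched in a few lines of Python. This is a minimal illustration, not the patent's implementation: the function name `model_is_usable` and the use of a single numeric threshold as the "preset verification result" are assumptions, since the patent does not specify the form of the verification results.

```python
def model_is_usable(verification_results, preset_threshold):
    """Sketch of the availability check: the trained classification
    network model is marked usable only when every per-image
    verification result reaches the preset threshold.
    (Names and the numeric-threshold form are illustrative.)"""
    return all(r >= preset_threshold for r in verification_results)
```

For example, with per-image accuracy scores and a preset threshold of 0.90, the model is usable only if no test image scores below 0.90.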
Optionally, the performing pixel processing on the original image according to the plurality of training phases to obtain a plurality of images to be processed includes:
determining an image parameter standard of each training stage according to the plurality of training stages;
determining target image parameters of the original image according to the original image;
and carrying out pixel processing on the target image parameters according to a plurality of image parameter standards to obtain a plurality of images to be processed.
Optionally, the determining the processing result of the original image according to the multiple images to be processed and the classification network model includes:
according to the progressive training algorithm, respectively carrying out feature processing on the plurality of images to be processed to obtain feature results corresponding to each image to be processed;
according to the attention transfer algorithm, distinguishing the multiple feature results to obtain a target feature result corresponding to each training stage;
adopting the target classifier to respectively analyze a plurality of target feature results to obtain classification loss corresponding to each target feature result;
and determining the processing result according to the plurality of classification losses and the plurality of target feature results.
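The patent does not specify how the plurality of classification losses and target feature results are combined into the final processing result. One plausible sketch, under the assumption that each stage produces a class-probability vector, is to average the per-stage vectors and take the arg-max class; the helper name `combine_stage_outputs` is hypothetical.

```python
def combine_stage_outputs(stage_probs):
    """Hypothetical fusion of per-stage target-feature results:
    average the per-stage class-probability vectors and return the
    arg-max class index together with the averaged distribution.
    The averaging rule is an assumption for illustration only."""
    n, k = len(stage_probs), len(stage_probs[0])
    avg = [sum(p[c] for p in stage_probs) / n for c in range(k)]
    return avg.index(max(avg)), avg
```

A weighted average (e.g. weighting later, finer-grained stages more heavily) would be an equally plausible design choice.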
Optionally, the performing distinguishing processing on the plurality of feature results according to the attention transfer algorithm to obtain the target feature result corresponding to each training stage includes:
determining a feature vector corresponding to each training stage according to the plurality of feature results;
and according to the attention transfer algorithm, distinguishing the plurality of feature vectors to obtain the target feature result.
In a second aspect, the present application provides an image processing apparatus based on a classification network model, including:
an acquisition module for acquiring a model build dataset comprising: constructing a parameter set and a historical training data set by a model;
a building module, configured to build a classification network model according to the model building parameter set and the historical training data set, where the classification network model includes: a plurality of training phases with a sequence;
the acquisition module is also used for acquiring an original image;
the processing module is used for carrying out pixel processing on the original image according to the training stages to obtain a plurality of images to be processed;
and the determining module is used for determining the processing result of the original image according to the plurality of images to be processed and the classification network model.
Optionally, the determining module is further configured to determine a plurality of candidate model parameters according to the model construction parameter set, where the candidate model parameters include: a general neural network model, a target classifier, an attention transfer algorithm and a progressive training algorithm;
The construction module is specifically configured to construct a classification network model to be trained according to the general neural network model, the target classifier, the attention transfer algorithm and the progressive training algorithm;
the apparatus further comprises: a training module;
and the training module is used for training the classification network model to be trained according to the historical training data set to obtain a classification network model after training.
Optionally, the processing module is further configured to analyze the classification network model to be trained to obtain a plurality of training phases corresponding to the classification network model;
the training module is further configured to train the classification network model to be trained according to the plurality of training phases and the plurality of image parameters, and obtain the trained classification network model.
Optionally, the acquiring module is further configured to acquire a test data set, where the test data set includes: a plurality of test images;
the apparatus further comprises: an input module;
the input module is used for sequentially inputting the plurality of test images into the trained classification network model to obtain a verification result corresponding to each test image;
The apparatus further comprises: a judging module;
the judging module is used for judging whether the plurality of verification results all reach the preset verification result;
and the determining module is used for determining that the use state of the classification network model after training is completed is an available state when the plurality of verification results reach preset verification results.
Optionally, the determining module is further configured to determine an image parameter standard of each training phase according to the plurality of training phases;
the determining module is further used for determining target image parameters of the original image according to the original image;
the processing module is specifically configured to perform pixel processing on the target image parameter according to a plurality of image parameter standards, so as to obtain the plurality of images to be processed.
Optionally, the processing module is further configured to perform feature processing on the multiple images to be processed according to the progressive training algorithm, so as to obtain a feature result corresponding to each image to be processed;
the processing module is further used for respectively distinguishing the plurality of characteristic results according to the attention transfer algorithm to obtain a target characteristic result corresponding to each training stage;
The processing module is further used for respectively analyzing the multiple target feature results by adopting the target classifier to obtain classification loss corresponding to each target feature result;
the determining module is specifically configured to determine the processing result according to the multiple classification losses and the multiple target feature results.
Optionally, the determining module is further configured to determine a feature vector corresponding to each stage according to a plurality of feature results;
the processing module is specifically configured to perform a distinguishing process on the plurality of feature vectors according to the attention transfer algorithm, so as to obtain the target feature result.
In a third aspect, the present application provides an image processing apparatus based on a classification network model, comprising:
a memory;
a processor;
wherein the memory stores computer-executable instructions;
the processor executes the computer-executable instructions stored in the memory to implement the image processing method based on the classification network model as described in the first aspect and the various possible implementation manners of the first aspect.
In a fourth aspect, the present application provides a computer storage medium having stored thereon computer-executable instructions to be executed by a processor to implement the image processing method based on a classification network model as described in the first aspect and various possible implementations of the first aspect.
According to the image processing method based on the classification network model provided by the application, a classification network model with a plurality of sequential training stages is first constructed according to the acquired model construction parameter set and historical training data set, and the classification network model is then used to process a newly acquired original image to obtain a processing result for that image. The method solves the problem that the Vision Transformer model cannot accurately distinguish image elements of different species within the same category in image classification tasks, thereby improving the accuracy of image classification. At the same time, the generalization ability of the classification network model is improved, so that it can adapt to a variety of image processing tasks.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and together with the description, serve to explain the principles of the application.
FIG. 1 is a flowchart of a method for image processing based on a classification network model provided in the present application;
FIG. 2 is a second flowchart of an image processing method based on a classification network model provided in the present application;
FIG. 3 is a third flowchart of an image processing method based on a classification network model provided in the present application;
FIG. 4 is a schematic diagram of an image processing apparatus based on a classification network model provided in the present application;
fig. 5 is a schematic structural diagram of an image processing apparatus based on a classification network model provided in the present application.
Specific embodiments thereof have been shown by way of example in the drawings and will herein be described in more detail. These drawings and the written description are not intended to limit the scope of the inventive concepts in any way, but to illustrate the concepts of the present application to those skilled in the art by reference to specific embodiments.
Detailed Description
For the purposes of making the objects, technical solutions and advantages of the present application more apparent, the technical solutions in the present application will be clearly and completely described below with reference to the drawings in the present application, and it is apparent that the described embodiments are some, but not all, embodiments of the present application. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present disclosure.
The terms "first," "second," "third," "fourth" and the like in the description and in the claims and in the above drawings, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented, for example, in sequences other than those illustrated or otherwise described herein.
In the embodiments of the present application, words such as "exemplary" or "such as" are used to mean examples, illustrations, or descriptions. Any embodiment or design described herein as "exemplary" or "for example" should not be construed as preferred or advantageous over other embodiments or designs. Rather, the use of words such as "exemplary" or "such as" is intended to present related concepts in a concrete fashion.
First, the names related to the present application will be explained.
Vision Transformer model: also referred to as the ViT model, a Transformer model applied to visual tasks. It applies the standard Transformer architecture directly to images, with minimal modification to the overall image classification pipeline. The Vision Transformer model splits the whole image into small patches, feeds the sequence of linear embeddings of these patches as input to the Transformer network, and trains for image classification in a supervised manner.
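The patch-splitting step can be illustrated with a minimal single-channel sketch. This is not the patent's implementation: a real ViT operates on multi-channel tensors and applies a learned linear projection plus position embeddings to each patch, whereas this sketch only shows how an image is cut into a row-major sequence of flattened patches.

```python
def image_to_patches(image, patch_size):
    """Split an H x W single-channel image (nested lists) into a
    row-major sequence of flattened patch vectors -- the input
    sequence a ViT-style model would then linearly embed."""
    h, w = len(image), len(image[0])
    assert h % patch_size == 0 and w % patch_size == 0
    patches = []
    for top in range(0, h, patch_size):
        for left in range(0, w, patch_size):
            patch = [image[top + i][left + j]
                     for i in range(patch_size)
                     for j in range(patch_size)]
            patches.append(patch)
    return patches
```

A 224 x 224 image with 16 x 16 patches would yield a sequence of 196 vectors, matching the standard ViT configuration.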
Softmax classifier: maps input feature vectors to a probability distribution over preset categories. The softmax function converts raw scores into a normalized probability distribution in which the probabilities across all categories sum to 1. In an image classification task, the softmax classifier maps image features to the different categories, computes a probability value for each category according to the similarity between the image features and that category, and finally assigns the image to the category with the largest probability value. In a Vision Transformer, the softmax classifier can classify images and improve classification accuracy and generalization ability under different feature extraction methods and training strategies.
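The softmax mapping described above is standard; a minimal, numerically stable Python version (the `classify` helper name is illustrative) looks like this:

```python
import math

def softmax(scores):
    """Numerically stable softmax: subtract the max score before
    exponentiating, then normalize so the probabilities sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def classify(scores, labels):
    """Return the label whose softmax probability is largest."""
    probs = softmax(scores)
    return labels[probs.index(max(probs))]
```

Subtracting the maximum score does not change the result but prevents overflow for large raw scores.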
Progressive algorithm (Progressive Algorithm): generally refers to a class of algorithms that solve a problem or optimize an objective in a stepwise, progressive manner. When a progressive algorithm trains a neural network, the network structure or training data is adjusted step by step according to certain rules and strategies, so that the model gradually learns more complex and detailed feature representations. This effectively avoids problems such as overfitting or vanishing gradients in the early stages of training, and enables more efficient model training under limited computing resources and time.
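One common realization of progressive training, sketched below, is to increase the effective input resolution from stage to stage so that early stages learn coarse structure and later stages learn fine detail. The doubling schedule and the `train_stage` callback are assumptions for illustration; the patent does not fix a particular schedule.

```python
def progressive_schedule(num_stages, base_size):
    """Hypothetical progressive-training schedule: each stage doubles
    the input resolution (coarse structure first, fine detail last)."""
    return [base_size * (2 ** s) for s in range(num_stages)]

def progressive_train(train_stage, num_stages, base_size):
    """Run one training pass per stage, feeding each stage its own
    resolution; `train_stage` stands in for the real training step."""
    return [train_stage(stage, size)
            for stage, size in enumerate(progressive_schedule(num_stages, base_size))]
```

With three stages and a base size of 56, the stages would train at 56, 112 and 224 pixels respectively.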
Attention algorithm: helps a neural network model focus on relevant information when processing large amounts of data. In a Vision Transformer, the attention transfer module can be regarded as a special attention algorithm. It obtains the feature map of the region the network attends to most strongly in the current stage, suppresses or masks that most salient region before passing the feature map to the next stage, and thereby forces the next stage to attend to other less conspicuous but still discriminative regions. This effectively extracts subtle features that aid fine-grained image classification.
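The suppression step of this attention-transfer idea can be sketched as masking the top-k most salient positions of a (flattened) attention map before the next stage sees it. This is a simplified stand-in, assuming scores are non-negative and zeroing counts as suppression; the patent does not specify the masking mechanism.

```python
def suppress_top_k(attention_scores, k):
    """Sketch of attention transfer: zero out the k most salient
    positions so the next stage is forced to attend to subtler but
    still discriminative features. (Zero-masking is an assumption.)"""
    ranked = sorted(range(len(attention_scores)),
                    key=lambda i: attention_scores[i], reverse=True)
    masked = list(attention_scores)
    for i in ranked[:k]:
        masked[i] = 0.0
    return masked
```

In a real model the mask would typically be applied to feature-map activations rather than to a plain list of scores.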
With the continuous progress of deep learning, convolutional neural network (CNN)-based image classification methods have achieved remarkable results. However, CNNs have limitations in processing the spatial information of images, which makes fine-grained image classification tasks challenging. To address this problem, fine-grained image classification methods based on the Vision Transformer have been developed. Thanks to its self-attention mechanism, the Vision Transformer can comprehensively consider the contextual information in an image and therefore performs well in fine-grained image classification tasks.
However, while fine-grained image classification methods based on the Vision Transformer show advantages in certain tasks, they still have drawbacks. For example, because the self-attention mechanism in the Vision Transformer model is not sensitive to pixel location information in the image, some key nuances may be overlooked in fine-grained image classification, resulting in misidentification.
Therefore, how to enable the Vision Transformer model to accurately distinguish image elements of different species within the same category remains an urgent problem in image classification tasks.
In view of the above, the present application provides an image processing method based on a classification network model. A classification network model with a plurality of sequential training stages is constructed according to the acquired model construction parameter set and historical training data set, and the model is then used to process a newly acquired original image to obtain a processing result for that image. The method solves the problem that the Vision Transformer model cannot accurately distinguish image elements of different species within the same category in image classification tasks, thereby improving the accuracy of image classification. At the same time, the generalization ability of the classification network model is improved, so that it can adapt to a variety of image processing tasks.
The following describes the technical solutions of the present application and how the technical solutions of the present application solve the above technical problems in detail with specific embodiments. The following embodiments may be combined with each other, and the same or similar concepts or processes may not be described in detail in some embodiments. Embodiments of the present application will be described below with reference to the accompanying drawings.
Fig. 1 is a flowchart of an image processing method based on a classification network model according to an embodiment of the present application. As shown in fig. 1, the image processing method based on the classification network model provided in this embodiment includes:
s101: obtaining a model build dataset, the model build dataset comprising: the model builds a parameter set, a historical training data set.
Since the model construction data set includes the model construction parameter set and the historical training data set, acquiring the model construction data set yields the parameter configurations and existing training data required for model training. These parameter configurations and training data can then be used to construct a classification network model with image classification capability.
It can be appreciated that obtaining the model construction dataset is a precondition for constructing a new classification network model, which provides necessary parameters and data support, so that a classification network model with good performance can be quickly constructed, thereby saving a great deal of time and cost, and simultaneously, the existing training data can be utilized to improve the processing performance of the model.
There are many acquisition methods for the different acquisition targets. When the acquisition target is the model construction parameter set, it may be determined from a platform's recommendation results after accessing an online resource platform (for example, GitHub, Kaggle or OpenAI), obtained by consulting an expert, or obtained from an online open-source project.
When the acquisition target is a historical training data set, the acquisition mode can be obtained from a data storage library or an online resource platform, for example.
In this step, for example, the model construction parameter set and the historical training data set may be acquired separately and then combined into the model construction data set.
S102: constructing a classification network model according to the model construction parameter set and the historical training data set, wherein the classification network model comprises: a plurality of training phases with a sequence.
After the model construction parameter set and the historical training data set are obtained, the two data sets can be combined and subjected to a series of processing and analysis, so that a usable classification network model is constructed.
It will be appreciated that since the model construction data set includes various parameter configurations (e.g., the neural network model structure, the classifier, and the model processing algorithms) and existing training data (e.g., a plurality of historical training data items), a classification network model with an image classification processing function can be built by jointly considering the various parameter configurations and the plurality of historical training data items.
Because the classification network model is constructed from a plurality of configuration parameters, when those parameters include a neural network model that itself has a plurality of training phases, it can be determined that the classification network model likewise includes sequentially ordered training phases.
S103: and acquiring an original image in real time, and carrying out pixel processing on the original image according to the training stages to obtain a plurality of images to be processed.
Acquiring the original image means that the classification results of the various specific image elements in the image are to be determined from the currently obtained image.
The original image may be obtained by a camera, a scanner, or an application into which the classification network model is integrated. This is not particularly limited in this application.
Because the trained classification network model has a plurality of training stages and can accurately classify an input image, after the original image is determined, pixel processing is performed on the original image according to the plurality of training stages of the classification network model, obtaining a plurality of images to be processed, one corresponding to each training stage.
It can be understood that, since different training stages of the trained classification network model operate on different image processing pixels, when an original image is obtained, pixel processing should be performed on the original image to be received by the classification network model according to the image processing pixels of each training stage, so as to obtain a plurality of images to be processed and thereby accurately identify and classify the input original image.
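One plausible reading of this pixel processing, sketched under the assumption that each training stage expects its own input resolution (the stage resolutions below are hypothetical), is a per-stage resampling of the original image:

```python
# Nearest-neighbour resampling on a nested-list "image"; one resampled
# copy is produced per training stage.  STAGE_RESOLUTIONS is an assumed
# example, not a value specified by the patent.

def resize_nearest(image, size):
    h, w = len(image), len(image[0])
    return [[image[r * h // size][c * w // size] for c in range(size)]
            for r in range(size)]

STAGE_RESOLUTIONS = [112, 160, 224]  # assumed, one per training stage

def pixel_process(original):
    # returns the plurality of images to be processed, stage by stage
    return [resize_nearest(original, s) for s in STAGE_RESOLUTIONS]
```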
S104: and determining the processing result of the original image according to the plurality of images to be processed and the classification network model.
When a plurality of images to be processed are obtained, the classification result of the original image can be obtained by comprehensively considering the images to be processed and the classification network model.
It will be appreciated that the original image may contain a plurality of image elements, each of which has its own unique characteristics, such as color, shape, texture, or size. In some cases these image elements may share some similar features, yet subtle differences remain between them.
Therefore, in order to obtain accurate classification results of a plurality of image elements in an original image, when a plurality of images to be processed are obtained, the plurality of images to be processed are required to be input into a classification network model, so that the plurality of image elements in the original image are accurately classified.
For example, assume there is currently an animal picture containing a number of different birds, such as a hawk, a pigeon, and a magpie. These birds all belong to the class of birds, but they are different species. Therefore, in order to classify them accurately, a plurality of animal images to be processed are determined from the animal picture and then input into the pre-trained classification network model. The model classifies each bird into its corresponding category according to the features and patterns in the image, and outputs an accurate classification result for display to the user.
According to the image processing method based on the classification network model provided by this embodiment, a model construction parameter set and a historical training data set are first obtained; a classification network model with a plurality of sequentially ordered training stages is then constructed according to the obtained model construction parameter set and historical training data set; pixel processing is next performed on a subsequently acquired original image to obtain a plurality of images to be processed; finally, the classification network model processes the plurality of images to be processed to obtain a processing result of the original image. The method solves the problem that the Vision Transformer model in the prior art cannot accurately perform fine-grained classification of the input image. Meanwhile, the generalization capability of the classification network model is improved, so that the classification network model can adapt to a variety of different image processing tasks.
Fig. 2 is a flowchart second of an image processing method based on a classification network model according to an embodiment of the present application. As shown in fig. 2, this embodiment is a detailed description of a training process of the classification network model based on the embodiment of fig. 1, and the image processing method based on the classification network model provided in this embodiment includes:
s201: obtaining a model build dataset, the model build dataset comprising: the model builds a parameter set, a historical training data set.
The explanation of step S201 is similar to that of step S101, and will not be repeated here.
S202: determining a plurality of candidate model parameters according to the model construction parameter set, wherein the candidate model parameters comprise: a general neural network model, a target classifier, an attention-transfer algorithm, and a progressive training algorithm.
The general neural network model may be, for example, a Vision Transformer neural network model. The target classifier may be, for example, a softmax classifier.
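Since the target classifier is exemplified as a softmax classifier, a minimal, numerically stable softmax over a vector of class logits can be sketched as follows:

```python
import math

# Standard softmax: subtract the maximum logit before exponentiating
# for numerical stability, then normalise to a probability distribution.

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
predicted_class = probs.index(max(probs))  # index of the most likely class
```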
The purpose of determining a plurality of candidate model parameters by means of a model construction parameter set is to be able to obtain a model for classifying an image.
It will be appreciated that a general neural network model (e.g., a Vision Transformer neural network model) provides a powerful representation of image data, while a target classifier (e.g., a softmax classifier) classifies images into predetermined categories. Other parameters, such as the attention transfer algorithm and the progressive training algorithm, help to optimize the performance of the model and improve the accuracy and efficiency of image classification.
Therefore, when the model construction parameter set is obtained, a plurality of candidate model parameters need to be determined from the model construction parameter set in order to be able to obtain a model that classifies images.
S203: and constructing a classification network model to be trained according to the general neural network model, the target classifier, the attention transfer algorithm and the progressive training algorithm.
The purpose of constructing the classification network model to be trained is to obtain, through training, a network model suitable for accurately classifying images.
It will be appreciated that the generic neural network model provides a powerful representation capability for image data. They can extract complex features from the original image, enabling the model to better understand and classify the image content. The target classifier is responsible for classifying the images into predetermined categories, ensuring that the model can accurately classify the images according to the input image data.
The attention transfer algorithm helps the model to focus better on key areas in the image, highlighting important features, and thus improving classification accuracy. The progressive training algorithm is used for optimizing the training process of the model, so that the model can be gradually adapted to different data distribution, and the classification robustness is improved.
Therefore, by combining the general neural network model, the target classifier, the attention transfer algorithm and the progressive training algorithm, a network model suitable for accurately classifying the image can be trained.
S204: and analyzing the classification network model to be trained to obtain a plurality of training stages corresponding to the classification network model.
When it is determined that the component parts of the classification network model include the general neural network model, the training phase of the classification network model may be planned according to the characteristic that the general neural network model itself has a plurality of training phases.
It will be appreciated that a general neural network model (e.g., the Vision Transformer model) typically includes multiple phases during model training. Thus, by mapping the training phases of these general models into the training process of the classification network model, it can be determined that the training process of the classification network model is divided into a plurality of training stages.
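One common way to obtain such stages from a Vision Transformer — offered here only as an assumption about how the mapping might be realized — is to partition its encoder blocks into groups and treat each group as one training stage:

```python
# Partition the encoder blocks of a ViT-style backbone into equal-sized
# groups; each (start, end) pair delimits the blocks of one training
# stage.  The 12-block / 3-stage split is an assumed example.

NUM_BLOCKS = 12
NUM_STAGES = 3

def stage_boundaries(num_blocks=NUM_BLOCKS, num_stages=NUM_STAGES):
    per_stage = num_blocks // num_stages
    return [(s * per_stage, (s + 1) * per_stage)
            for s in range(num_stages)]
```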
S205: the historical training dataset comprises: and training the classification network model to be trained according to the training phases and the image parameters to obtain the trained classification network model.
The training aims to enable the classification network model to automatically learn the rules and the characteristics of classification according to input data, so that the classification can be effectively performed on unknown data.
It will be appreciated that when the classification network model to be trained is obtained, it does not have the ability to accurately classify images since it has not been learned from any data. Therefore, there is a need to train a classification network model to be trained using a historical training data set to enable the classification network model to effectively classify on unknown data.
When the model to be trained is trained by utilizing the historical training data set, the model to be trained can continuously learn the modes and rules in the historical data, gradually optimize the parameter setting of the model to be trained, and therefore the unknown data can be better classified until the trained classified network model is obtained.
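The training described above can be sketched as a progressive schedule in which each stage trains for a few epochs before the next begins; the stub training step, epoch count, and stage resolutions below are placeholders, not the patent's actual procedure:

```python
# Progressive training sketch: stages run in sequence, each on its own
# assumed input resolution.  train_one_epoch is a stub standing in for
# a real optimisation step over the historical training data.

def train_one_epoch(stage, resolution, state):
    state["steps"] += 1  # placeholder for a real parameter update
    return state

def progressive_train(stage_resolutions, epochs_per_stage=2):
    state = {"steps": 0, "log": []}
    for stage, resolution in enumerate(stage_resolutions):
        for _ in range(epochs_per_stage):
            state = train_one_epoch(stage, resolution, state)
        state["log"].append((stage, resolution))  # stage finished
    return state

result = progressive_train([112, 160, 224])
```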
S206: obtaining a test dataset comprising: a plurality of test images.
Wherein the test data set is a set of test data for evaluating the performance of the classification network model.
The purpose of acquiring the test data set is to evaluate the performance of the classification network model on unknown data when the training of the classification network model is determined to be completed, so as to improve the accuracy and the effectiveness of the classification network model.
It will be appreciated that the test dataset contains data that is different from the training dataset and can be used to evaluate the performance of the model on unknown data. Therefore, by evaluating the indexes such as classification accuracy, precision and the like of the classification network model on the test data set, the classification processing capacity of the classification network model on the image can be determined.
The acquisition mode of acquiring the test data set can be obtained from a data storage library, or can be obtained from an online resource platform. This is not particularly limited in this application.
S207: and sequentially inputting the plurality of test images into the trained classification network model to obtain a verification result corresponding to each test image.
Wherein the validation results may be used to evaluate the performance of the classification network model on the test dataset.
After the training of the classification network model is completed, a plurality of test images can be sequentially input into the trained classification network model, and each test image can obtain a corresponding verification result so as to evaluate the performance of the classification network model according to each verification result.
It will be appreciated that since the test dataset is a set of unknown data used to evaluate the performance of the classification network model, a plurality of test images of the test dataset are input into the already trained classification network model, which classifies each image and outputs a corresponding classification result.
S208: judging whether the verification results reach a preset verification result or not; if yes, go to step S209; if not, step S208 is performed.
The use states of the classification network model include: an available state and an unavailable state. The available state means that the classification network model can classify the input image according to the training parameters and the training structure and output an accurate classification result. The unavailable state means that the classification network model cannot output an accurate classification result.
The purpose of judging whether the plurality of verification results reach the preset verification results is to determine whether the performance of the current classified network model on the test data set is good.
It will be appreciated that by comparing each verification result with the preset verification result, the stability and reliability of the classification network model can be evaluated. If all the verification results reach the preset verification results, the performance of the classification network model on different test data is reliable. Conversely, if any one or more verification results do not reach the preset verification result, the classification processing capability of the classification network model is insufficient.
Therefore, comparing the plurality of verification results with the preset verification results makes it possible to evaluate the performance and stability of the classification network model, and hence its classification processing capability.
If the verification results reach the preset verification results, the current classification network model is indicated to have good performance on the test data set, and at the moment, the use state of the classification network model after training can be determined to be an available state.
If any one or more verification results do not reach the preset verification results, the current classification network model is indicated to perform poorly on the test data set, and at this time, the use state of the classification network model after training can be determined to be an unavailable state.
It will be appreciated that when the usage status of the classification network model is determined to be available, it is indicated that the classification network model performs well on the test dataset. This means that the classification network model has a high accuracy and reliability in the classification task. In this case, the trained classification network model can therefore be regarded as available.
Conversely, when the usage status of the classification network model is determined to be an unavailable status, it is indicated that the classification network model performs poorly on the test dataset. This may mean that the classification network model has some problems or inadequacies, such as over-fitting, under-fitting, or other training errors, so it may not provide accurate classification results. In this case, therefore, the trained classification network model is in an unavailable state.
S209: and determining the use state of the trained classification network model as an available state.
According to the image processing method based on the classification network model provided by this embodiment, a plurality of candidate model parameters of the classification network model are first determined according to the obtained model construction parameter set; the classification network model to be trained is then determined according to the plurality of candidate model parameters; next, the classification network model to be trained is analyzed to obtain a plurality of training stages corresponding to the classification network model; a plurality of training images and the image parameter corresponding to each training image are then determined according to the obtained historical training data set; the classification network model to be trained is trained according to the plurality of training stages and the plurality of image parameters to obtain the trained classification network model; finally, the trained classification network model is verified against the test data set until the plurality of verification results reach the preset verification results, at which point the use state of the trained classification network model is determined to be the available state. By determining a plurality of candidate model parameters from the model construction parameter set, the method improves the flexibility of model construction, so that the model can adapt to different data distributions and task requirements. Meanwhile, training with a plurality of training images improves the generalization capability of the model, so that it can better adapt to different image features.
Fig. 3 is a flowchart III of an image processing method based on a classification network model according to an embodiment of the present application. As shown in fig. 3, this embodiment is a detailed description of a process of classifying an original image by using a classification network model based on the embodiment of fig. 1, and the image processing method based on the classification network model provided in this embodiment includes:
s301: and acquiring an original image in real time.
The explanation of step S301 is similar to the above steps, and will not be repeated here.
S302: and determining the image parameter standard of each training stage according to the training stages.
Wherein the purpose of determining the image parameter criteria for each training stage is to ensure that the input image meets the image parameter criteria for each training stage at each training stage.
It will be appreciated that since different training phases have different image processing pixels, when multiple training phases of the classification network model are obtained, in order to ensure that the input image can meet the image parameter criteria of each training phase, the image parameter criteria of each training phase need to be acquired so as to accurately process the input image by the classification network model, thereby improving the accuracy of the classification network model.
S303: and determining target image parameters of the original image according to the original image.
Wherein, by determining the target image parameters of the original image, image parameter information relating to the image can be obtained.
It will be appreciated that when raw images are obtained, these images typically have their own parameters such as size, resolution, color space, etc. While different training phases may require different image parameter criteria, different classification tasks may have different requirements on the size, resolution and color space of the image. Thus, in order to meet the requirements of each training phase, the target image parameters of the original image need to be determined in order to adjust the target image parameters.
S304: and carrying out pixel processing on the target image parameters according to a plurality of image parameter standards to obtain a plurality of images to be processed.
The object of the pixel processing of the target image parameters is to ensure that the input image matches the parameter criteria of the training phase.
It will be appreciated that since the image parameter criteria are different for each training stage, it is also necessary to perform pixel processing on the target image parameters according to the image parameter criteria for each training stage when determining the target image parameters for the original image, to ensure that the image meets the specific parameter criteria for that stage. These parameter criteria may include the size, resolution, color space, contrast, etc. of the image.
S305: and respectively carrying out feature processing on the plurality of images to be processed according to the progressive training algorithm to obtain feature results corresponding to each image to be processed.
Wherein, by extracting the characteristics of each image to be processed, more specific and fine characteristic results can be obtained. The feature results can reflect the content of the original image more directly, so that the classification network model can be helped to better understand the image content, and the possibility of misjudgment and confusion is reduced.
It can be understood that by converting a plurality of images to be processed into feature vectors, the classification processing speed and classification processing accuracy of image data can be improved. Meanwhile, the influence of factors such as illumination, angles and noise on image recognition can be reduced by the feature processing, so that the robustness of the classification network model is improved.
S306: and respectively distinguishing the plurality of characteristic results according to the attention transfer algorithm to obtain target characteristic results corresponding to each training stage.
Wherein, when the plurality of feature results of the images to be processed are obtained, these feature results contain various information about the image, including both salient and non-salient features. Therefore, in order to extract features that are not salient but are nevertheless distinguishing, an attention transfer algorithm needs to be used.
It will be appreciated that insignificant features may be more discriminative, enabling the classification network model to classify on them better. The attention transfer algorithm makes the classification network model focus more on those features that are less salient but important, thereby better classifying the original image.
Thus, by combining the attention transfer algorithm with the plurality of feature results, the attention of the classification network model can be shifted from the obvious features to the less salient but distinguishing ones, so that the image content is understood more fully and the classification and recognition accuracy is improved.
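As an illustration only — the patent does not specify the algorithm's internals — one way such an attention transfer step could work is to suppress the most-attended (most salient) feature so that attention is redistributed to the less salient but still distinguishing ones:

```python
# Assumed attention-transfer sketch: zero out the single highest
# attention weight, then renormalise the remaining weights so they
# again sum to one, shifting focus to the less salient features.

def shift_attention(weights):
    peak = weights.index(max(weights))
    masked = [0.0 if i == peak else w for i, w in enumerate(weights)]
    total = sum(masked)
    return [w / total for w in masked]

shifted = shift_attention([0.6, 0.3, 0.1])
```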
S307: and respectively analyzing the plurality of target feature results by adopting the target classifier to obtain classification loss corresponding to each target feature result.
After the target feature result of each stage is obtained, the classification loss of each target feature result on the classification task can be obtained by combining the target classifier with a plurality of target feature results, so that the specific classification result of the original image can be determined according to the plurality of classification losses.
It will be appreciated that during the processing of the classification network model, each stage produces a set of feature results that are intended to provide useful information for subsequent classification tasks. By using multiple feature results and classifiers to jointly determine the final classification of the image, accuracy of the image classification can be ensured.
S308: and determining the processing result according to the plurality of classification losses and the plurality of target feature results.
Wherein, since the original image may contain a plurality of image elements of the same category but of different species, the plurality of classification losses and target feature results need to be considered together in order to obtain accurate classification results for these image elements.
It will be appreciated that in processing the image classification task, the original image may contain a plurality of image elements, each having its unique characteristics. These features may be color, shape, texture, size, etc. In some cases, these image elements may have some similar features, but there is still a subtle difference between them. For example, two different birds may have wings and feathers, which are similar. But their feathers may differ in detail in color, shape, size, etc.
Thus, using the classification losses or the target feature results alone may be insufficient to accurately distinguish these image elements; the plurality of classification losses and the plurality of target feature results need to be considered together in order to better determine the specific classification result of the original image.
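One plausible combination rule, assumed here for illustration, fuses the per-stage class scores with weights inversely related to each stage's classification loss, so that more reliable stages contribute more to the final result:

```python
# Fuse per-stage class scores: stages with lower classification loss
# receive higher weight, and the fused scores yield the final class.
# The weighting scheme is an assumption, not the patent's formula.

def combine_predictions(stage_scores, stage_losses):
    inv = [1.0 / (1e-8 + loss) for loss in stage_losses]
    z = sum(inv)
    weights = [w / z for w in inv]
    n_classes = len(stage_scores[0])
    fused = [sum(w * s[c] for w, s in zip(weights, stage_scores))
             for c in range(n_classes)]
    return fused.index(max(fused))

label = combine_predictions([[0.2, 0.8], [0.6, 0.4]], [0.5, 2.0])
```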
Optionally, according to the attention transfer algorithm, the multiple feature results are respectively distinguished, and a specific implementation process of obtaining the target feature result corresponding to each training stage may be, for example: determining a feature vector corresponding to each stage according to a plurality of feature results; and according to the attention transfer algorithm, distinguishing the plurality of feature vectors to obtain a target feature result.
The feature result of each training stage includes feature results at a plurality of position layers, and the last-layer position of each training stage can reflect the features of that stage in the training process of the classification network model; therefore, when the plurality of feature results are obtained, the feature vector of each stage can be obtained by means of a linear mapping layer.
It can be appreciated that, in the training process of the classification network model, the feature result at each position contains the context information and the spatial position information of the image, owing to the self-attention mechanism and the position encoding in the Transformer structure. Therefore, after the plurality of feature results are obtained, the original feature results can be reduced in dimension by means of a linear mapping layer and the feature vector of each stage extracted, so as to obtain the feature vector at the last-layer position.
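The linear-mapping step described here can be sketched as projecting the token at the last position of a stage's final layer down to a stage feature vector; the dimensions and projection matrix below are illustrative assumptions:

```python
# Project the last-position token of a stage's final layer through a
# linear mapping (matrix: out_dim rows x in_dim columns) to obtain the
# stage feature vector.  All dimensions here are toy examples.

def linear_map(vector, matrix):
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def stage_feature(last_layer_tokens, projection):
    # take the token at the last position, then apply the linear map
    return linear_map(last_layer_tokens[-1], projection)

feat = stage_feature([[1.0, 2.0], [3.0, 4.0]],   # two token embeddings
                     [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
```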
After the feature vector of each stage is obtained, note that these feature vectors contain various information from the original image, including both salient and less salient features. Although all of them carry important information about the image, some are less salient; these less salient feature vectors nevertheless have good distinguishability, enabling the classification network model to classify on them better. Therefore, in order to extract the feature vectors that are not salient but are distinguishing, the attention transfer algorithm needs to be used.
It will be appreciated that, by means of the attention transfer algorithm, the attention of the classification network model can be shifted from the apparent features to the less pronounced but distinguishing ones, thereby understanding the image content more fully and improving the accuracy of classification and recognition.
According to the image processing method based on the classification network model, firstly, after an original image is acquired, parameter standards of each training stage are determined according to a plurality of training stages of the classification network model, meanwhile, target parameters of the original image are determined according to the original image, then pixel processing is carried out on the target parameters according to the parameter standards to obtain a plurality of images to be processed, secondly, feature processing is carried out on the plurality of images to be processed according to a progressive training algorithm to obtain feature results corresponding to each image to be processed, then distinguishing processing is carried out on the feature results according to an attention transfer algorithm to obtain target feature results corresponding to each training stage, finally, analysis processing is carried out on the target feature results respectively by adopting a target classifier to obtain classification losses corresponding to each target feature result, and then processing results of the original image are determined according to the classification losses and the target feature results. According to the method, a plurality of images to be processed are obtained by carrying out pixel processing on target parameters of an original image. And then, carrying out feature processing on the images, so that important features in the images can be extracted, and the classification accuracy is improved. Meanwhile, the method carries out feature processing on a plurality of images to be processed through a progressive training algorithm, an attention transfer algorithm and a target classifier, and carries out distinguishing processing on a plurality of feature results, so that the classification network model pays more attention to the features related to classification tasks, and the classification accuracy is improved.
Fig. 4 is a schematic structural diagram of an image processing apparatus based on a classification network model provided in the present application. As shown in fig. 4, the present application provides an image processing apparatus based on a classification network model, the image processing apparatus 400 based on the classification network model including:
an acquisition module 401 for acquiring a model build dataset comprising: constructing a parameter set and a historical training data set by a model;
a construction module 402, configured to construct a classification network model according to the model construction parameter set and the historical training data set, where the classification network model includes: a plurality of training phases with a sequence;
the acquiring module 401 is further configured to acquire an original image;
a processing module 403, configured to perform pixel processing on the original image according to the plurality of training phases, to obtain a plurality of images to be processed;
a determining module 404, configured to determine a processing result of the original image according to the multiple images to be processed and the classification network model.
Optionally, the determining module 404 is further configured to determine a plurality of candidate model parameters according to the model building parameter set, where the candidate model parameters include: a general neural network model, a target classifier, an attention transfer algorithm and a progressive training algorithm;
The construction module 402 is specifically configured to construct a classification network model to be trained according to the general neural network model, the target classifier, the attention transfer algorithm and the progressive training algorithm;
the apparatus further comprises: a training module 405;
the training module 405 is configured to train the classification network model to be trained according to the historical training data set, so as to obtain a classification network model after training is completed.
Optionally, the processing module 403 is further configured to analyze the classification network model to be trained to obtain a plurality of training phases corresponding to the classification network model;
the training module 405 is further configured to train the classification network model to be trained according to the plurality of training phases and the plurality of image parameters, so as to obtain the trained classification network model.
Optionally, the obtaining module 401 is further configured to obtain a test data set, where the test data set includes: a plurality of test images;
the apparatus further comprises: an input module 406;
the input module 406 is configured to sequentially input the plurality of test images to the trained classification network model, to obtain a verification result corresponding to each test image;
The apparatus further comprises: a judgment module 407;
the judging module 407 is configured to judge whether the multiple verification results all reach a preset verification result;
the determining module 404 is configured to determine that the usage state of the trained classification network model is an available state when the plurality of verification results reach a preset verification result.
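The test-set verification flow handled by the input module 406, the judging module 407, and the determining module 404 amounts to the check below. A minimal sketch under assumed names: `model` is any callable classifier and `preset_results` holds the preset verification result for each test image.

```python
def verify_model(model, test_images, preset_results):
    """Sequentially input the test images to the trained model; the model's
    usage state is available only if every verification result reaches its preset."""
    verification_results = [model(image) for image in test_images]
    all_reached = all(result == preset
                      for result, preset in zip(verification_results, preset_results))
    return "available" if all_reached else "unavailable"
```

A single failing test image is enough to leave the model out of the available state.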
Optionally, the determining module 404 is further configured to determine an image parameter standard of each training phase according to the plurality of training phases;
the determining module 404 is further configured to determine a target image parameter of the original image according to the original image;
the processing module 403 is specifically configured to perform pixel processing on the target image parameter according to a plurality of image parameter standards, so as to obtain the plurality of images to be processed.
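The pixel processing above — one image to be processed per training phase, each scaled to that phase's image parameter standard — can be sketched with NumPy. Nearest-neighbour scaling and square stage sizes are assumptions chosen for illustration; the patent does not fix a particular interpolation.

```python
import numpy as np

def resize_nearest(image, size):
    """Nearest-neighbour resize of an H x W (x C) array to size x size."""
    h, w = image.shape[:2]
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return image[rows][:, cols]

def pixel_process(original, stage_sizes):
    """Produce one image to be processed per training phase, scaled to
    that phase's image parameter standard."""
    return [resize_nearest(original, size) for size in stage_sizes]
```

For instance, a 16x16 original with stage sizes (4, 8, 16) yields three images to be processed, the last one unchanged.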
Optionally, the processing module 403 is further configured to perform feature processing on the multiple images to be processed according to the progressive training algorithm, to obtain a feature result corresponding to each of the images to be processed;
the processing module 403 is further configured to perform a distinguishing process on the multiple feature results according to the attention transfer algorithm, so as to obtain a target feature result corresponding to each training stage;
The processing module 403 is further configured to analyze the multiple target feature results by using the target classifier, so as to obtain a classification loss corresponding to each target feature result;
the determining module 404 is specifically configured to determine the processing result according to the multiple classification losses and the multiple target feature results.
Optionally, the determining module 404 is further configured to determine a feature vector corresponding to each stage according to a plurality of feature results;
the processing module 403 is specifically configured to perform a distinguishing process on the plurality of feature vectors according to the attention transfer algorithm, so as to obtain the target feature result.
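Taken together, the inference path through the modules above might look like this sketch: per-stage feature extraction (the progressive training algorithm), per-stage re-weighting standing in for the attention transfer algorithm, a target classifier per stage, and fusion of the stage outputs into the processing result. The function names, the simple multiplicative weighting, and the logit averaging are illustrative assumptions, not the claimed algorithms.

```python
import numpy as np

def classify(images, feature_fns, attention_weights, classifiers):
    """Determine a processing result from the per-stage images to be processed."""
    # One feature result per training stage, from that stage's image.
    features = [fn(img) for fn, img in zip(feature_fns, images)]
    # Attention-transfer stand-in: re-weight each stage's feature vector.
    targets = [w * f for w, f in zip(attention_weights, features)]
    # Target classifier per stage: logits for each target feature result.
    logits = [clf(t) for clf, t in zip(classifiers, targets)]
    # Fuse the stage outputs; the predicted class is the processing result.
    fused = np.mean(logits, axis=0)
    return int(np.argmax(fused))
```

Averaging logits is one simple fusion choice; a weighted sum driven by the per-stage classification losses would fit the description equally well.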
Fig. 5 is a schematic structural diagram of an image processing apparatus based on a classification network model provided in the present application. As shown in fig. 5, the present application provides an image processing apparatus based on a classification network model, the image processing apparatus 500 based on the classification network model including: a receiver 501, a transmitter 502, a processor 503 and a memory 504.
A receiver 501 for receiving instructions and data;
a transmitter 502 for transmitting instructions and data;
memory 504 for storing computer-executable instructions;
a processor 503 for executing computer-executable instructions stored in a memory 504 to implement the steps executed by the image processing method based on the classification network model in the above-described embodiment. Reference may be made in particular to the description of the embodiments of the image processing method based on the classification network model described above.
Alternatively, the memory 504 may be separate or integrated with the processor 503.
When the memory 504 is provided separately, the electronic device further comprises a bus for connecting the memory 504 and the processor 503.
The present application also provides a computer-readable storage medium in which computer-executable instructions are stored, which when executed by a processor, implement an image processing method based on a classification network model as performed by the image processing apparatus based on a classification network model described above.
Those of ordinary skill in the art will appreciate that all or some of the steps, systems, and functional modules/units in the apparatus and methods disclosed above may be implemented as software, firmware, hardware, or suitable combinations thereof. In a hardware implementation, the division between the functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be performed cooperatively by several physical components.
Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, digital signal processor, or microprocessor, or as hardware, or as an integrated circuit, such as an application-specific integrated circuit. Such software may be distributed on computer-readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media).
As is known to those skilled in the art, the term computer storage media includes both volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules, or other data. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, Digital Versatile Disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer.
Furthermore, as is well known to those of ordinary skill in the art, communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media.
While the present application has been described in connection with the preferred embodiments illustrated in the accompanying drawings, those skilled in the art will readily understand that the scope of the application is not limited to these specific embodiments; the above examples are intended to illustrate the technical solutions of the application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be replaced by equivalents; such modifications and substitutions do not cause the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present application.

Claims (10)

1. An image processing method based on a classification network model, the method comprising:
obtaining a model build dataset, the model build dataset comprising: constructing a parameter set and a historical training data set by a model;
constructing a classification network model according to the model construction parameter set and the historical training data set, wherein the classification network model comprises: a plurality of sequentially ordered training phases;
acquiring an original image in real time, and performing pixel processing on the original image according to the plurality of training phases to obtain a plurality of images to be processed;
and determining the processing result of the original image according to the plurality of images to be processed and the classification network model.
2. The method of claim 1, wherein said constructing a classification network model from said model-constructing parameter sets and said historical training data sets comprises:
determining a plurality of candidate model parameters according to the model construction parameter set, wherein the candidate model parameters comprise: a general neural network model, a target classifier, an attention transfer algorithm and a progressive training algorithm;
constructing a classification network model to be trained according to the general neural network model, the target classifier, the attention transfer algorithm and the progressive training algorithm;
and training the classification network model to be trained according to the historical training data set to obtain a classification network model after training.
3. The method of claim 2, wherein the historical training data set comprises a plurality of image parameters, and the training of the classification network model to be trained according to the historical training data set to obtain a trained classification network model comprises the following steps:
Analyzing the classification network model to be trained to obtain a plurality of training stages corresponding to the classification network model;
and training the classification network model to be trained according to the training stages and the image parameters to obtain the classification network model after training.
4. A method according to claim 3, wherein after the training of the classification network model to be trained according to the plurality of training phases and the plurality of image parameters to obtain the trained classification network model, the method further comprises:
obtaining a test dataset comprising: a plurality of test images;
sequentially inputting the plurality of test images into the trained classification network model to obtain a verification result corresponding to each test image;
judging whether the verification results reach a preset verification result or not;
and when the verification results reach the preset verification results, determining that the use state of the trained classification network model is an available state.
5. The method according to claim 1, wherein the performing pixel processing on the original image according to the plurality of training phases to obtain a plurality of images to be processed includes:
Determining an image parameter standard of each training stage according to the plurality of training stages;
determining target image parameters of the original image according to the original image;
and carrying out pixel processing on the target image parameters according to a plurality of image parameter standards to obtain a plurality of images to be processed.
6. A method according to claim 3, wherein said determining the processing result of the original image from the plurality of images to be processed and the classification network model comprises:
according to the progressive training algorithm, respectively carrying out feature processing on the plurality of images to be processed to obtain feature results corresponding to each image to be processed;
according to the attention transfer algorithm, distinguishing the multiple feature results to obtain a target feature result corresponding to each training stage;
adopting the target classifier to respectively analyze a plurality of target feature results to obtain classification loss corresponding to each target feature result;
and determining the processing result according to the plurality of classification losses and the plurality of target feature results.
7. The method of claim 6, wherein the distinguishing of the plurality of feature results according to the attention transfer algorithm to obtain the target feature result corresponding to each training stage comprises:
Determining a feature vector corresponding to each stage according to a plurality of feature results;
and according to the attention transfer algorithm, distinguishing the plurality of feature vectors to obtain the target feature result.
8. An image processing apparatus based on a classification network model, comprising:
an acquisition module for acquiring a model build dataset comprising: constructing a parameter set and a historical training data set by a model;
a building module, configured to build a classification network model according to the model building parameter set and the historical training data set, where the classification network model includes: a plurality of sequentially ordered training phases;
the acquisition module is also used for acquiring an original image;
the processing module is used for carrying out pixel processing on the original image according to the training stages to obtain a plurality of images to be processed;
and the determining module is used for determining the processing result of the original image according to the plurality of images to be processed and the classification network model.
9. An image processing apparatus based on a classification network model, comprising:
a memory;
a processor;
wherein the memory stores computer-executable instructions;
The processor executes the computer-executable instructions stored by the memory to implement the classification network model-based image processing method of any of claims 1-7.
10. A computer storage medium having stored therein computer executable instructions which when executed by a processor are adapted to implement the method of image processing based on a classification network model as claimed in any of claims 1-7.
CN202410095417.4A 2024-01-24 2024-01-24 Image processing method, device, equipment and medium based on classified network model Pending CN117611933A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410095417.4A CN117611933A (en) 2024-01-24 2024-01-24 Image processing method, device, equipment and medium based on classified network model


Publications (1)

Publication Number Publication Date
CN117611933A true CN117611933A (en) 2024-02-27

Family

ID=89960211

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410095417.4A Pending CN117611933A (en) 2024-01-24 2024-01-24 Image processing method, device, equipment and medium based on classified network model

Country Status (1)

Country Link
CN (1) CN117611933A (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110598029A (en) * 2019-09-06 2019-12-20 西安电子科技大学 Fine-grained image classification method based on attention transfer mechanism
CN114882497A (en) * 2022-05-06 2022-08-09 东北石油大学 Method for realizing fruit classification and identification based on deep learning algorithm
CN115131671A (en) * 2022-04-21 2022-09-30 河南工业大学 Cross-domain high-resolution remote sensing image typical target fine-grained identification method
CN115240037A (en) * 2022-09-23 2022-10-25 卡奥斯工业智能研究院(青岛)有限公司 Model training method, image processing method, device and storage medium
CN115497005A (en) * 2022-09-05 2022-12-20 重庆邮电大学 YOLOV4 remote sensing target detection method integrating feature transfer and attention mechanism
CN115759226A (en) * 2022-11-02 2023-03-07 厦门美图之家科技有限公司 Training method, device, equipment and storage medium of visual network model
WO2023098912A1 (en) * 2021-12-02 2023-06-08 新东方教育科技集团有限公司 Image processing method and apparatus, storage medium, and electronic device
CN116259060A (en) * 2023-02-17 2023-06-13 马上消费金融股份有限公司 Training method and device for image classification model
CN116503918A (en) * 2022-01-21 2023-07-28 广州麦仑信息科技有限公司 Palm vein image classification method, device, equipment and medium based on ViT network
CN117078930A (en) * 2023-08-11 2023-11-17 河南大学 Medical image segmentation method based on boundary sensing and attention mechanism


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
CHANGLIN LI et al.: "Automated Progressive Learning for Efficient Training of Vision Transformers", 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 27 September 2022 (2022-09-27), pages 12476-12486 *
LUO Huilan; YI Hui: "Image classification method based on iterative training and ensemble learning", Computer Engineering and Design, no. 05, 16 May 2020 (2020-05-16), pages 109-115 *

Similar Documents

Publication Publication Date Title
CN111860573B (en) Model training method, image category detection method and device and electronic equipment
CN109003390B (en) Commodity identification method, unmanned vending machine and computer-readable storage medium
CN108416326B (en) Face recognition method and device
CN110135505B (en) Image classification method and device, computer equipment and computer readable storage medium
CN109389096B (en) Detection method and device
US11410327B2 (en) Location determination apparatus, location determination method and computer program
CN113095370A (en) Image recognition method and device, electronic equipment and storage medium
CN111046971A (en) Image recognition method, device, equipment and computer readable storage medium
CN111539456B (en) Target identification method and device
CN115953643A (en) Knowledge distillation-based model training method and device and electronic equipment
CN109978058B (en) Method, device, terminal and storage medium for determining image classification
CN102713974A (en) Learning device, identification device, learning identification system and learning identification device
CN116151319A (en) Method and device for searching neural network integration model and electronic equipment
KR101334858B1 (en) Automatic butterfly species identification system and method, and portable terminal having automatic butterfly species identification function using the same
CN111652320B (en) Sample classification method and device, electronic equipment and storage medium
JP4983539B2 (en) Information processing apparatus and method, and program
JP2016224821A (en) Learning device, control method of learning device, and program
US9299000B2 (en) Object region extraction system, method and program
CN112287905A (en) Vehicle damage identification method, device, equipment and storage medium
CN117611933A (en) Image processing method, device, equipment and medium based on classified network model
CN113255766B (en) Image classification method, device, equipment and storage medium
CN112115996B (en) Image data processing method, device, equipment and storage medium
CN111898465B (en) Method and device for acquiring face recognition model
CN112613341A (en) Training method and device, fingerprint identification method and device, and electronic device
CN116113952A (en) Distance between distributions for images belonging to intra-distribution metrics

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination