CN115240037A - Model training method, image processing method, device and storage medium - Google Patents

Model training method, image processing method, device and storage medium

Info

Publication number
CN115240037A
Authority
CN
China
Prior art keywords
image
model
label
matrix
probability distribution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211161244.9A
Other languages
Chinese (zh)
Inventor
孟海秀
温书远
陈录城
孙琦
王艳纳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Haier Digital Technology Qingdao Co Ltd
Cosmoplat Industrial Intelligent Research Institute Qingdao Co Ltd
Haier Cosmo IoT Technology Co Ltd
Original Assignee
Haier Digital Technology Qingdao Co Ltd
Cosmoplat Industrial Intelligent Research Institute Qingdao Co Ltd
Haier Cosmo IoT Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Haier Digital Technology Qingdao Co Ltd, Cosmoplat Industrial Intelligent Research Institute Qingdao Co Ltd, Haier Cosmo IoT Technology Co Ltd filed Critical Haier Digital Technology Qingdao Co Ltd
Priority to CN202211161244.9A
Publication of CN115240037A
Priority to PCT/CN2023/098759 (published as WO2024060684A1)
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77: Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/774: Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/20: Image preprocessing
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/40: Extraction of image or video features
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/764: Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82: Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a model training method, an image processing method, a device, and a storage medium, which relate to the field of image processing. The model training method comprises the following steps: preprocessing the images in a sample data set to obtain multi-label word vectors of the images and a multi-label adjacency matrix of the images; clustering the multi-label word vectors and the multi-label adjacency matrix with a graph wavelet neural network model to obtain a classification model; extracting features from the image to be processed and outputting a feature matrix of the image to be processed; training the classification model based on the feature matrix to obtain a multi-label probability distribution model of the image to be processed; and converging the probability distribution model with a loss function to obtain an image annotation model, where the image annotation model is used to annotate a target image and obtain its labels. The purpose of improving the precision of image annotation is thereby achieved.

Description

Model training method, image processing method, device and storage medium
Technical Field
The application belongs to the field of image processing, and particularly relates to a model training method, an image processing method, a device, and a storage medium.
Background
With the development of computer vision technology, image annotation plays a crucial role in computer vision. The goal of image annotation is to determine the labels relevant to a specific task.
In the related art, image annotation is usually performed based on a graph convolutional neural network (GCNN). However, this method is computationally expensive, lacks the locality property, and cannot mine the correlation and co-occurrence between the labels of an image, so the related art suffers from low image annotation accuracy.
Disclosure of Invention
In order to solve the above problem, namely the low precision of image annotation in the current technology, the present application provides a model training method, an image processing method, a device, and a storage medium.
In a first aspect, the present application provides a model training method, including: preprocessing the images in a sample data set to obtain multi-label word vectors of the images and a multi-label adjacency matrix of the images; clustering the multi-label word vectors and the multi-label adjacency matrix with a graph wavelet neural network model to obtain a classification model; extracting features from the image to be processed and outputting a feature matrix of the image to be processed; training the classification model based on the feature matrix to obtain a multi-label probability distribution model of the image to be processed; and converging the probability distribution model with a loss function to obtain an image annotation model, where the image annotation model is used to annotate a target image and obtain its labels.
In a preferred technical solution of the above model training method, the graph wavelet neural network model includes a two-layer graph wavelet neural network, and clustering the multi-label word vectors and the multi-label adjacency matrix with the graph wavelet neural network model to obtain a classification model includes: clustering the multi-label word vectors and the multi-label adjacency matrix with the first-layer graph wavelet neural network of the two-layer graph wavelet neural network to obtain output vectors; and clustering the output vectors with the second-layer graph wavelet neural network of the two-layer graph wavelet neural network to obtain the classification model.
In a preferred technical solution of the above model training method, the first-layer graph wavelet neural network uses the nonlinear activation function silu, and the second-layer graph wavelet neural network uses the nonlinear activation function softmax.
In a preferred technical solution of the above model training method, preprocessing the images in the sample data set to obtain the multi-label adjacency matrix of the images includes: determining a first parameter and a second parameter of the multi-label adjacency matrix according to a first label and a second label of the images, where the first parameter represents the number of times the first label and the second label appear together in the sample data set, and the second parameter represents the number of times the first label appears in the sample data set; determining a conditional probability matrix of the images according to the first parameter and the second parameter; binarizing the conditional probability matrix to obtain a binarized adjacency matrix; and re-weighting the binarized adjacency matrix to obtain the multi-label adjacency matrix.
In a preferred technical solution of the above model training method, training the classification model based on the feature matrix to obtain the multi-label probability distribution model of the image to be processed includes: performing matrix multiplication of the feature matrix and the classification model to obtain the probability distribution model.
In a preferred technical solution of the above model training method, converging the probability distribution model with a loss function to obtain the image annotation model includes: determining network hyper-parameters of the probability distribution model; and converging the probability distribution model according to the loss function and the network hyper-parameters to obtain the image annotation model.
In a preferred technical solution of the above model training method, converging the probability distribution model with a loss function to obtain the image annotation model includes: monitoring the value of the loss function and the precision value of the probability distribution model; and if the value of the loss function is smaller than a first threshold or the precision value of the probability distribution model is larger than a second threshold, outputting the probability distribution model as the image annotation model.
In a second aspect, the present application provides an image processing method, including: acquiring an image to be processed; and inputting the image to be processed into an image annotation model for annotation processing to obtain a label of the image to be processed, wherein the image annotation model is obtained by training through the model training method of the first aspect.
In a third aspect, the present application provides a model training apparatus, comprising: a preprocessing module, used for preprocessing the images in the sample data set to obtain multi-label word vectors of the images and a multi-label adjacency matrix of the images; a clustering module, used for clustering the multi-label word vectors and the multi-label adjacency matrix with a graph wavelet neural network model to obtain a classification model; an extraction module, used for extracting features from the image to be processed and outputting a feature matrix of the image to be processed; a training module, used for training the classification model based on the feature matrix to obtain a multi-label probability distribution model of the image to be processed; and a convergence module, used for converging the probability distribution model with a loss function to obtain an image annotation model, where the image annotation model is used to annotate a target image and obtain its labels.
In a fourth aspect, the present application provides an image processing apparatus comprising: the acquisition module is used for acquiring an image to be processed; and the annotation module is used for inputting the image to be processed into the image annotation model for annotation processing to obtain the label of the image to be processed, wherein the image annotation model is obtained by training through the model training method of the first aspect.
In a fifth aspect, the present application provides an electronic device, comprising: a processor, and a memory communicatively coupled to the processor; the memory stores computer execution instructions; the processor executes computer-executable instructions stored by the memory to implement the model training method of the first aspect or the image processing method of the second aspect.
In a sixth aspect, the present application provides a computer-readable storage medium having stored thereon computer-executable instructions for implementing the model training method of the first aspect or the image processing method of the second aspect when executed by a processor.
In a seventh aspect, the present application provides a computer program product comprising a computer program which, when executed by a processor, implements the model training method of the first aspect or the image processing method of the second aspect.
According to the model training method, the image processing method, the device, and the storage medium provided by the application, a graph wavelet neural network is adopted; by exploiting the locality of graph wavelets and the correlation among the multiple labels of an image, the co-occurrence characteristics among the multiple labels are fully captured, thereby improving the precision of annotating multi-label images.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application.
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below; those skilled in the art can obviously obtain other drawings from them without inventive effort.
Fig. 1 is a schematic structural diagram of an image processing system according to an embodiment of the present application;
FIG. 2 is a flow chart of a model training method provided in an embodiment of the present application;
FIG. 3 is a schematic diagram of a model training process provided in an embodiment of the present application;
FIG. 4 is a flowchart of an image processing method provided in an embodiment of the present application;
FIG. 5 is a schematic structural diagram of a model training apparatus according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present application;
fig. 7 is a schematic view of an electronic device according to an embodiment of the present application.
Detailed Description
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application; obviously, the described embodiments are only some embodiments of the present application, not all of them. All other embodiments that a person skilled in the art can derive from the embodiments given herein without creative effort shall fall within the protection scope of the present application.
It should be noted that the terms "first," "second," and the like in the description and claims of this application and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The terms referred to in the present application will be explained first.
Adjacency matrix: a two-dimensional array that stores the relationships (edges or arcs) between all vertices in a graph.
Graph wavelet neural network: a wavelet neural network (WNN) is an artificial neural network proposed on the basis of breakthroughs in wavelet analysis research; it is a novel layered, multi-resolution artificial neural network model constructed on wavelet analysis theory and the wavelet transform. A graph wavelet neural network (GWNN) is a wavelet neural network used for analyzing graph-structured data, applied here to image analysis.
Image annotation: the process of using a machine learning method to add, to an image, textual feature information that reflects the image's visual content.
The related art mentioned in the background art has at least the following technical problems:
with the development of computer vision technology, image annotation plays a crucial role in computer vision, and the goal of image annotation is to determine the labels relevant to a specific task. For large-scale data sets, automatic image annotation in the related art is usually based on a graph convolutional neural network (GCNN). However, this method requires the eigendecomposition of the graph correlation matrix in the graph Fourier transform, which is computationally expensive; the multi-label correlation matrix is sparse, but GCNN-based automatic annotation of multi-label images cannot exploit this sparsity; in addition, the graph convolutional neural network lacks the locality property and cannot mine the correlation and co-occurrence characteristics among multiple labels.
In order to solve these problems, the application provides a model training method and an image processing method. A graph wavelet neural network is adopted, and the localization property of graph wavelets, together with the correlation among the multiple labels of an image, is used to fully capture the co-occurrence characteristics among the multiple labels, so that the trained image annotation model improves the precision of annotating multi-label images; the computational cost is also reduced, thereby improving the efficiency of annotating multi-label images.
In a possible implementation, the model training method and the image processing method provided by this embodiment may be applied in an application scenario. Fig. 1 is a schematic structural diagram of an image processing system according to an embodiment of the present application. As shown in FIG. 1, in this scenario, the image processing system may include a data acquisition device 101, a database 102, a training device 103, an execution device 104, a data storage system 105, and a user device 106, wherein the execution device 104 includes a target model/rule 107 and an I/O interface 108.
The data acquisition device 101 may be configured to obtain a multi-label adjacency matrix of a preprocessed image and a multi-label word vector of the preprocessed image of the sample data set, and store the multi-label adjacency matrix and the multi-label word vector in the database 102.
The training device 103 may perform the model training method in the embodiment of the present application, so as to train the target model/rule 107 for acquiring the image label. The target models/rules 107 derived by the training device 103 may be applied in different systems or devices.
The execution device 104 is configured with an I/O interface 108, and can perform data interaction with the user device 106, and a user can input a target image to be subjected to tag labeling to the I/O interface 108 through the user device 106; the object model/rules 107 in the execution device 104 may process the input object image to obtain a label for the object image; the I/O interface 108 returns the label of the target image to the user device 106 for presentation to the user by the user device 106.
The execution device 104 may call data, code, etc. in the data storage system 105, or may store data, instructions, etc. in the data storage system 105.
Based on the above scenario, in one case, the user may manually input the target image to the I/O interface 108 through the user device 106, for example, operating in an interface provided by the I/O interface 108; in another case, the user device 106 may automatically enter the target image into the I/O interface 108 and retrieve the tag for the target image returned by the I/O interface 108. It should be noted that, if the user device 106 automatically inputs data into the I/O interface 108 and obtains a result returned by the I/O interface 108, the user device 106 needs to obtain authorization of the user, and the user may set a permission for response in the user device 106.
In the above scenario, the user device 106 may also serve as a data acquisition end to store the received target image and the tag of the target image in the database 102 for use as a sample.
It should be noted that the structure of the image processing system shown in fig. 1 is only a schematic diagram, and the positional relationship between the devices, modules, and the like shown in the diagram does not constitute any limitation, for example, in fig. 1, the data storage system 105 is an external memory with respect to the execution device 104, and in other cases, the data storage system 105 may be disposed in the execution device 104; the database 102 is an external memory with respect to the training device 103, in other cases the database 102 may also be placed in the training device 103.
With reference to the above scenario, the following describes in detail the technical solutions of the model training method and the image processing method provided in the present application through several specific embodiments.
The embodiment of the application provides a model training method. Fig. 2 is a flowchart of a model training method provided in an embodiment of the present application, which may be executed by the training apparatus in fig. 1, as shown in fig. 2, where the model training method includes the following steps:
s201: and preprocessing the image in the sample data set to obtain a multi-label word vector of the image and a multi-label adjacency matrix of the image.
In this step, the images in the sample data set may be preprocessed with a pre-written Python script, and the GloVe construction method may be adopted to construct the multi-label word vectors (word2vec) of the images.
Optionally, the multi-label adjacency matrix may be determined from the labels and the interconnections between them.
Optionally, the sample data set may include sample image data, text data, and label data.
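As an illustrative reference, this preprocessing step can be sketched in Python as follows. This is a minimal sketch assuming a plain-text GloVe embedding file and a small hypothetical label set; the file name and the labels are placeholders rather than values from the application.

```python
# Minimal preprocessing sketch: build one word vector per label from GloVe.
import numpy as np

def load_glove(path):
    """Parse a plain-text GloVe file into a {word: vector} dictionary."""
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            vectors[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return vectors

labels = ["person", "dog", "bicycle"]        # hypothetical label set
glove = load_glove("glove.6B.300d.txt")      # assumed embedding file
# Multi-label word-vector matrix X: one 300-d row per label.
X = np.stack([glove[w] for w in labels])     # shape (C, 300)
```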
S202: and clustering the multi-label word vectors and the multi-label adjacency matrixes by adopting a graph wavelet neural network model to obtain a classification model.
In this step, a multi-label classification module may be constructed with the graph wavelet neural network to obtain a graph wavelet neural network model, and the multi-label word vectors and the multi-label adjacency matrix are clustered by this model to obtain a classification model. The classification model is the classifier.
Alternatively, the multi-label classification module may be composed of a two-layer graph wavelet neural network.
S203: and performing feature extraction on the image to be processed, and outputting a feature matrix of the image to be processed.
In this step, the image to be processed may be a training-set image during training, or an image input by a user during actual operation. A ResNet-101 network may be used to build the feature extraction module, and the feature extraction module performs feature extraction on the image to be processed to obtain its feature matrix.
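The feature extraction module may look roughly as follows; this is a minimal PyTorch sketch assuming an ImageNet-pretrained ResNet-101 with its classification head removed, followed by global max pooling (see Fig. 3). The 448x448 input size is an assumption for illustration.

```python
# Feature-extractor sketch: ResNet-101 backbone + global max pooling.
import torch
import torchvision.models as models

backbone = models.resnet101(pretrained=True)  # ImageNet-pretrained, as in the text
# Drop the final average-pool and fully connected layers, keep the conv stages.
extractor = torch.nn.Sequential(*list(backbone.children())[:-2])
extractor.eval()

img = torch.randn(1, 3, 448, 448)             # dummy preprocessed image
with torch.no_grad():
    fmap = extractor(img)                     # (1, 2048, h, w) feature map
    # Global max pooling collapses the spatial dims into one 2048-d vector.
    feat = torch.nn.functional.adaptive_max_pool2d(fmap, 1).flatten(1)  # (1, 2048)
```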
S204: and training the classification model based on the characteristic matrix to obtain a multi-label probability distribution model of the image to be processed.
In this step, the probability distribution model is the multi-label probability distribution result of the image to be processed. After the feature matrix and the classification model are obtained, the classification model can be trained with the feature matrix to obtain the probability distribution over the multiple labels of the image to be processed.
Optionally, the annotation result for each of the multiple labels of the image to be processed may be 1 or 0.
S205: and adopting a loss function to carry out convergence processing on the probability distribution model to obtain an image annotation model.
In this step, the image annotation model is used to perform annotation processing on the target image to obtain a label of the target image.
Optionally, for the multi-label classification problem, the loss function may adopt the binary cross-entropy loss. After the probability distribution model is obtained, its model parameters may be converged with the binary cross-entropy loss function, so that when the resulting image annotation model annotates a target image, the annotation accuracy is improved.
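As a concrete reference, a minimal sketch of this loss in PyTorch, with illustrative tensor shapes (the 80-label count is an assumption):

```python
# Multi-label binary cross-entropy on raw logits.
import torch

criterion = torch.nn.BCEWithLogitsLoss()        # numerically stable BCE variant
logits = torch.randn(4, 80)                     # (batch, num_labels), illustrative
targets = torch.randint(0, 2, (4, 80)).float()  # multi-hot ground-truth labels
loss = criterion(logits, targets)
loss.backward()
```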
According to the model training method provided by the embodiment of the application, a graph wavelet neural network is adopted; by exploiting the locality of graph wavelets and the correlation among the multiple labels of an image, the co-occurrence characteristics among the multiple labels are fully captured, thereby improving the accuracy of annotating multi-label images.
In a possible implementation manner, the graph wavelet neural network model is a two-layer graph wavelet neural network, and clustering the multi-label word vectors and the multi-label adjacency matrix with the graph wavelet neural network model to obtain a classification model includes: clustering the multi-label word vectors and the multi-label adjacency matrix with the first-layer graph wavelet neural network to obtain output vectors; and clustering the output vectors with the activation function of the second-layer graph wavelet neural network to obtain the classification model.
In this scheme, a directed graph G = (V, E, W) between the multiple labels may first be established, where V represents the nodes of the directed graph G, E represents its edges, and W represents the weights of the edges E. In the directed graph G, each node V may represent a label, and the directed graph G may be represented by the multi-label word vectors. The labels are connected with each other to obtain the multi-label adjacency matrix, and a two-layer graph wavelet neural network is used for clustering to obtain the classification model.
Optionally, in general, the larger the convolution kernel, the larger the receptive field, the more image information can be seen, and the better the obtained global features. Preferably, the graph wavelet neural network model in the present application may specifically include a two-layer graph wavelet neural network, and each layer may have 32 graph convolution kernels; optionally, to enlarge the receptive field, each layer may instead have 80 graph convolution kernels.
In the above scheme, the graph wavelet convolution of two graph signals $x$ and $y$ may be defined as

$$x *_G y = \psi_s \left( \left( \psi_s^{-1} x \right) \odot \left( \psi_s^{-1} y \right) \right)$$

where the wavelet basis can be expressed as

$$\psi_s = U G_s U^{\top}$$

$U$ may be used to represent the matrix of Laplacian eigenvectors, $G_s = \mathrm{diag}\left(g(s\lambda_1), \ldots, g(s\lambda_n)\right)$ is the scaling matrix built from a heat-kernel filter $g$ at scale $s$, and $\odot$ may be used to represent the element-wise (matrix dot) product. The graph wavelet transform can be expressed as

$$\hat{x} = \psi_s^{-1} x$$

and the inverse graph wavelet transform may be expressed as

$$x = \psi_s \hat{x}$$

where $y$ and $x$ are the graph signals participating in the operation.

A graph wavelet neural network layer $H$ can then be obtained, which may be formulated as follows:

$$H = h\left( \psi_s F \psi_s^{-1} X W \right)$$

where $\psi_s$ may be used to represent the wavelet basis, $\psi_s^{-1}$ may be used to represent the inverse graph wavelet transform, $F$ is the diagonal filter matrix learned in the wavelet domain, and $h$ may be used to represent a nonlinear activation function.

After further simplification, a graph wavelet neural network model comprising a two-layer graph wavelet neural network, namely the classification model $Z$, can be obtained; its formula may be as follows:

$$Z = \mathrm{softmax}\left( \hat{A} \,\mathrm{silu}\left( \hat{A} X W^{(1)} \right) W^{(2)} \right)$$

where $\hat{A}$ may be used to represent the multi-label adjacency matrix, $X$ may be used to represent the label representation matrix, and $W^{(1)}$ and $W^{(2)}$ may be used to represent the parameter matrices of the model to be trained, which can be initialized randomly.
In this scheme, the graph wavelet neural network algorithm replaces the graph convolutional neural network, which reduces the amount of computation; by constructing a directed graph, the correlation among multiple labels can be fully utilized, and the localization property of graph wavelets fully captures the co-occurrence characteristics among the multiple labels, giving the multiple labels good interpretability. The image annotation model obtained by subsequent training can therefore annotate target images more efficiently.
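To make the layer structure concrete, the following is a minimal PyTorch sketch of the two-layer model described above. It assumes the wavelet bases psi and psi_inv have been precomputed from the eigendecomposition of the label graph's Laplacian; the layer widths, the diagonal-filter parameterization, and the 0.01 initialization scale are illustrative assumptions rather than values from the application.

```python
# Two-layer graph wavelet classifier sketch (silu in layer 1, softmax in layer 2).
import torch
import torch.nn as nn
import torch.nn.functional as F

class GWNNLayer(nn.Module):
    """One graph wavelet layer: act(psi @ diag(f) @ psi_inv @ X @ W)."""
    def __init__(self, n_nodes, in_dim, out_dim):
        super().__init__()
        self.f = nn.Parameter(torch.ones(n_nodes))   # diagonal filter in the wavelet domain
        self.W = nn.Parameter(torch.randn(in_dim, out_dim) * 0.01)  # random init, as in the text

    def forward(self, psi, psi_inv, X, act):
        return act(psi @ torch.diag(self.f) @ psi_inv @ (X @ self.W))

class TwoLayerGWNN(nn.Module):
    def __init__(self, n_labels, in_dim=300, hid_dim=1024, out_dim=2048):
        super().__init__()
        self.layer1 = GWNNLayer(n_labels, in_dim, hid_dim)
        self.layer2 = GWNNLayer(n_labels, hid_dim, out_dim)

    def forward(self, psi, psi_inv, X):
        H = self.layer1(psi, psi_inv, X, F.silu)                       # first layer: silu
        Z = self.layer2(psi, psi_inv, H,
                        lambda t: torch.softmax(t, dim=1))             # second layer: softmax
        return Z                                                       # (n_labels, out_dim) classifier
```

For a quick smoke test, passing psi = psi_inv = torch.eye(n_labels) degenerates each layer to plain feature propagation.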
In one possible embodiment, the first-layer graph wavelet neural network uses the nonlinear activation function silu, and the second-layer graph wavelet neural network uses the nonlinear activation function softmax.
In this scheme, the first-layer graph wavelet neural network may use the nonlinear activation function silu or the nonlinear activation function relu. Relu is the most common activation function for most deep learning tasks; it requires very little computation and is fast. Silu is more nonlinear than relu and better suited to the multi-label classification task, and both silu and its first derivative are smooth, so it converges more easily than relu. Between silu and relu, the more applicable activation function can be selected according to the actual situation, which improves the efficiency and accuracy of model training.
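For reference, the two candidate activations side by side; silu(x) = x * sigmoid(x) is smooth, as is its first derivative, while relu is piecewise linear:

```python
import torch
import torch.nn.functional as F

x = torch.linspace(-3.0, 3.0, steps=7)
print(F.relu(x))   # zero for negative inputs, identity for positive
print(F.silu(x))   # equivalent to x * torch.sigmoid(x)
```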
In one possible embodiment, preprocessing the images in the sample data set to obtain the multi-label adjacency matrix includes: determining a first parameter and a second parameter of the multi-label adjacency matrix according to a first label and a second label of the images, where the first parameter represents the number of times the first label and the second label appear together in the sample data set, and the second parameter represents the number of times the first label appears in the sample data set; determining a conditional probability matrix of the images according to the first parameter and the second parameter; binarizing the conditional probability matrix to obtain a binarized adjacency matrix; and re-weighting the binarized adjacency matrix to obtain the multi-label adjacency matrix.
In this scheme, when constructing the multi-label adjacency matrix, the labels contained in the images of the sample data set may first be counted. The number of times a first label and a second label appear together in one image is then computed and taken as the first parameter, and the number of times the first label appears in the sample data set is computed and taken as the second parameter, so that the conditional probability matrix is determined from the first and second parameters; finally, the multi-label adjacency matrix is obtained. The formulas may be as follows:
Denote the first label by $i$ and the second label by $j$, record the first parameter as $M_{ij}$, and record the second parameter as $N_i$. The conditional probability matrix $P$ can be expressed as

$$P_{ij} = \frac{M_{ij}}{N_i}$$

In the above scheme, a threshold $\tau$ may be determined first, and then the conditional probability matrix is binarized to obtain the binarized adjacency matrix $A$, which can be expressed as

$$A_{ij} = \begin{cases} 1, & P_{ij} \geq \tau \\ 0, & P_{ij} < \tau \end{cases}$$

In the above scheme, because the binarized adjacency matrix obtained from the conditional probability matrix exhibits an over-smoothing phenomenon, it may be further re-weighted to obtain the final multi-label adjacency matrix $\hat{A}$, which can be expressed as

$$\hat{A}_{ij} = \begin{cases} \dfrac{\lambda \, A_{ij}}{\sum_{k=1,\, k \neq i}^{C} A_{ik}}, & i \neq j \\ 1 - \lambda, & i = j \end{cases}$$

where $\lambda$ may be used to represent a manually set hyper-parameter: when $\lambda$ is close to 1, the features of the label corresponding to the node itself can be ignored; when $\lambda$ is close to 0, the information of the labels corresponding to the neighboring nodes can be ignored; $C$ may be used to indicate the number of multi-label categories.
Optionally, by constructing the multi-label adjacency matrix, the sparsity of the multi-label adjacency matrix can be exploited, so that the image annotation model obtained by subsequent training annotates target images more efficiently.
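The whole pipeline above (co-occurrence counts, conditional probabilities, binarization at $\tau$, re-weighting with $\lambda$) can be sketched as follows; the count matrices and the values of tau and lam are illustrative assumptions, not values from the application.

```python
# Adjacency-matrix construction sketch: counts -> P -> binarize -> re-weight.
import numpy as np

def build_adjacency(M, N, tau=0.4, lam=0.25):
    """M[i, j]: times labels i and j co-occur; N[i]: times label i occurs."""
    P = M / N[:, None]                          # conditional probability P_ij = M_ij / N_i
    A = (P >= tau).astype(np.float32)           # binarized adjacency matrix
    off_diag_sums = A.sum(axis=1) - np.diag(A)  # sum over k != i of A_ik
    A_hat = lam * A / np.maximum(off_diag_sums, 1e-12)[:, None]
    np.fill_diagonal(A_hat, 1.0 - lam)          # diagonal fixed to 1 - lam
    return A_hat

# Example with three labels and hypothetical counts:
M = np.array([[30, 12, 3], [12, 25, 8], [3, 8, 20]], dtype=np.float32)
N = np.array([30, 25, 20], dtype=np.float32)
A_hat = build_adjacency(M, N)
```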
In a possible implementation manner, training the classification model based on the feature matrix to obtain the multi-label probability distribution model of the image to be processed includes: performing matrix multiplication of the feature matrix and the classification model to obtain the probability distribution model.
In this scheme, the feature extraction module may extract features from the image to be processed input by the user to obtain its feature vector, namely the feature matrix; the feature matrix is then matrix-multiplied with the classification model to obtain the probability distribution of the multi-label annotation, represented by the probability distribution model.
In the above scheme, Fig. 3 is a schematic diagram of a model training process provided in an embodiment of the present application. In Fig. 3, after the image to be processed input by the user passes through feature extraction by a convolutional neural network (the ResNet-101 network), image features are obtained; after global max pooling of the image features, the feature matrix is obtained. The multi-label word vectors represented by the directed graph form the multi-label adjacency matrix, which passes through the first-layer graph wavelet neural network to produce an output vector whose label distribution differs from that of the multi-label adjacency matrix (the height of a label's box in the figure represents its probability). The second-layer graph wavelet neural network clusters this output vector to produce another output vector, which is converted into the classification model; the classification model represents the probabilities of the multiple labels under different label categories and feature dimensions. After matrix multiplication of the feature matrix and the classification model, the multi-label probability distribution of the image to be processed is obtained, represented by the probability distribution model. Finally, after convergence of the loss function, the image annotation model is obtained; the loss function may be a multi-label loss function.
In the above scheme, the feature extraction module may adopt a network model pre-trained on the ImageNet data set, which accelerates the subsequent convergence of the probability distribution model under the loss function. Training the probability distribution model determines the probability distribution of label annotation for the image to be processed, so the labels in the target image can be annotated accurately.
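A minimal sketch of this matrix multiplication, with dimensions following the ResNet sketch earlier (2048-dimensional features; the 80-label count is an illustrative assumption):

```python
# Multiply pooled image features by the classifier matrix: one score per label.
import torch

feat = torch.randn(8, 2048)     # (batch, feature_dim) from the extractor
Z = torch.randn(80, 2048)       # (num_labels, feature_dim) classifier from the GWNN
logits = feat @ Z.t()           # (batch, num_labels) multi-label scores
probs = torch.sigmoid(logits)   # per-label probabilities
```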
In a possible implementation manner, converging the probability distribution model with the loss function to obtain the image annotation model includes: determining network hyper-parameters of the probability distribution model; and converging the probability distribution model according to the loss function and the network hyper-parameters to obtain the image annotation model.
In this scheme, the loss function may adopt the binary cross-entropy loss function. When the loss function is used to converge the probability distribution model, the network hyper-parameters of the probability distribution model may be determined first; these may include the learning rate lr = 0.01, momentum = 0.9, and weight decay = 5e-4. The optimizer may adopt SGD (stochastic gradient descent) for gradient back-propagation and model training, with the number of iterations epoch = 100, batch size = 32, and so on, which speeds up the training of the image annotation model.
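Wired into PyTorch, the stated configuration may look as follows; the Linear module is only a placeholder standing in for the full extractor-plus-classifier network.

```python
# Optimizer setup with the hyper-parameters stated above.
import torch

model = torch.nn.Linear(2048, 80)   # placeholder for extractor + GWNN classifier
optimizer = torch.optim.SGD(model.parameters(),
                            lr=0.01, momentum=0.9, weight_decay=5e-4)
criterion = torch.nn.BCEWithLogitsLoss()
EPOCHS, BATCH_SIZE = 100, 32
```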
In a possible implementation manner, converging the probability distribution model with the loss function to obtain the image annotation model includes: monitoring the value of the loss function and the precision value of the probability distribution model; and if the value of the loss function is smaller than a first threshold or the precision value of the probability distribution model is larger than a second threshold, outputting the probability distribution model as the image annotation model.
In this scheme, while the loss function converges the probability distribution model into the image annotation model, the value of the loss function and the precision value of the probability distribution model may be monitored in real time, so that the optimal probability distribution model can be obtained; the optimal probability distribution model is the required image annotation model, which improves the accuracy of annotating the target image. The first threshold and the second threshold may be preset values: for example, the first threshold may be 0.03 and the second threshold 99. When the value of the loss function falls below 0.03, or the precision value of the probability distribution model exceeds 99, the convergence process stops, and the probability distribution model at that point may be output as the final image annotation model.
In the above scheme, the training platform for model training may be configured as follows: a Ubuntu 16.04 system with 4 Nvidia Tesla V100 graphics cards, using an Intel(R) Xeon(R) CPU E5-2637 v4 @ 3.50GHz as the processor; the model framework may be based on an environment of python = 3.6, pytorch = 1.8.0, cuda = 10.2, and cudnn = 7.6.5. Optionally, multiple graphics processing units (GPUs) may be employed to accelerate the convergence of the probability distribution model.
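A hedged sketch of the stopping rule described above, continuing from the optimizer sketch; train_one_epoch and evaluate_precision are hypothetical helpers standing in for the actual training and evaluation loops.

```python
# Stop when the loss falls below the first threshold or precision
# exceeds the second threshold, then save the annotation model.
LOSS_THRESHOLD = 0.03       # example first threshold from the text
PRECISION_THRESHOLD = 99.0  # example second threshold from the text

for epoch in range(EPOCHS):
    loss_value = train_one_epoch(model, optimizer, criterion)  # hypothetical helper
    precision = evaluate_precision(model)                      # hypothetical helper
    if loss_value < LOSS_THRESHOLD or precision > PRECISION_THRESHOLD:
        torch.save(model.state_dict(), "image_annotation_model.pt")
        break
```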
The embodiment of the application also provides an image processing method. Fig. 4 is a flowchart of an image processing method provided in an embodiment of the present application, which may be executed by the execution device in fig. 1, as shown in fig. 4, where the image processing method includes the following steps:
s401: and acquiring an image to be processed.
S402: and inputting the image to be processed into the image annotation model for annotation processing to obtain the label of the image to be processed.
In this step, the image annotation model is obtained by training with the model training method described above.
According to the image processing method provided by the embodiment of the application, after the image annotation model is obtained through the model training method of the above embodiments, inputting the image to be processed into the image annotation model quickly and accurately yields the labels of the image to be processed, improving both the efficiency and the accuracy of automatic label annotation of the image to be processed.
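A minimal inference sketch under these steps: preprocess an image, run the trained annotation model, and keep the labels whose probability clears a cut-off. The transform values and the 0.5 threshold are assumptions for illustration.

```python
# Inference sketch: image in, list of label names out.
import torch
from PIL import Image
import torchvision.transforms as T

transform = T.Compose([
    T.Resize((448, 448)),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def annotate(model, image_path, label_names, threshold=0.5):
    img = transform(Image.open(image_path).convert("RGB")).unsqueeze(0)
    model.eval()
    with torch.no_grad():
        probs = torch.sigmoid(model(img)).squeeze(0)   # per-label probabilities
    return [name for name, p in zip(label_names, probs) if p.item() > threshold]
```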
In general, the technical scheme provided by the application improves both the efficiency of annotating the image to be processed and the accuracy of that annotation.
The application also provides a model training device. Fig. 5 is a schematic structural diagram of a model training apparatus provided in an embodiment of the present application, and as shown in fig. 5, the model training apparatus 500 may include:
an obtaining module 501, configured to pre-process an image in the sample data set to obtain a multi-label word vector of the image and a multi-label adjacency matrix of the image;
a first processing module 502, configured to perform clustering processing on the multi-label word vector and the multi-label adjacency matrix by using a graph wavelet neural network model to obtain a classification model;
the extraction module 503 is configured to perform feature extraction on the image to be processed, and output a feature matrix of the image to be processed;
a training module 504, configured to train the classification model based on the feature matrix to obtain a multi-label probability distribution model of the image to be processed;
and the second processing module 505 is configured to perform convergence processing on the probability distribution model by using a loss function to obtain an image annotation model, where the image annotation model is configured to perform annotation processing on the target image to obtain a tag of the target image.
Optionally, the graph wavelet neural network model includes a two-layer graph wavelet neural network, and when clustering the multi-label word vectors and the multi-label adjacency matrix with the graph wavelet neural network model to obtain the classification model, the first processing module 502 is specifically configured to: cluster the multi-label word vectors and the multi-label adjacency matrix with the first-layer graph wavelet neural network to obtain output vectors; and cluster the output vectors with the second-layer graph wavelet neural network to obtain the classification model.
Optionally, the first-layer graph wavelet neural network uses the nonlinear activation function silu, and the second-layer graph wavelet neural network uses the nonlinear activation function softmax.
Optionally, when preprocessing the images in the sample data set to obtain the multi-label adjacency matrix of the images, the obtaining module 501 is specifically configured to: determine a first parameter and a second parameter of the multi-label adjacency matrix according to a first label and a second label of the images, where the first parameter represents the number of times the first label and the second label appear together in the sample data set, and the second parameter represents the number of times the first label appears in the sample data set; determine a conditional probability matrix of the images according to the first parameter and the second parameter; binarize the conditional probability matrix to obtain a binarized adjacency matrix; and re-weight the binarized adjacency matrix to obtain the multi-label adjacency matrix.
Optionally, when training the classification model based on the feature matrix to obtain the multi-label probability distribution model of the image to be processed, the training module 504 is specifically configured to: perform matrix multiplication of the feature matrix and the classification model to obtain the probability distribution model.
Optionally, when converging the probability distribution model with the loss function to obtain the image annotation model, the second processing module 505 is specifically configured to: determine the network hyper-parameters of the probability distribution model; and converge the probability distribution model according to the loss function and the network hyper-parameters to obtain the image annotation model.
Optionally, when converging the probability distribution model with the loss function to obtain the image annotation model, the second processing module 505 is further specifically configured to: monitor the value of the loss function and the precision value of the probability distribution model; and if the value of the loss function is smaller than the first threshold or the precision value of the probability distribution model is larger than the second threshold, output the probability distribution model as the image annotation model.
The model training apparatus is used to execute the technical solutions provided by the model training method embodiments; its implementation principle and technical effects are similar to those of the method embodiments and are not repeated here.
The application also provides an image processing device. Fig. 6 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present application, and as shown in fig. 6, the image processing apparatus 600 may include:
an obtaining module 601, configured to obtain an image to be processed;
the labeling module 602 is configured to input the image to be processed into an image labeling model for labeling processing, so as to obtain a label of the image to be processed, where the image labeling model is obtained by the model training method.
The image processing apparatus is configured to execute the technical solution provided by the foregoing image processing method embodiment, and the implementation principle and technical effect of the image processing apparatus are similar to those in the foregoing method embodiment, and are not described herein again.
The embodiment of the application further provides the electronic equipment. Fig. 7 is a schematic view of an electronic device according to an embodiment of the present application. As shown in fig. 7, the electronic device 700 may include:
a processor 711, a memory 712, and an interaction interface 713;
wherein the processor 711 is communicatively coupled to the memory 712; the memory 712 is used to store computer-executable instructions that are executable by the processor 711;
wherein the processor 711 is configured to perform the technical solution of the aforementioned model training method or image processing method via executing computer executable instructions.
Alternatively, the memory 712 may be separate or integrated with the processor 711.
Optionally, when the memory 712 is a separate device from the processor 711, the electronic device 700 may further include:
and the bus is used for connecting the devices.
Alternatively, the memory may be, but is not limited to, a Random Access Memory (RAM), a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), and the like. The memory is used for storing programs, and the processor executes a program after receiving an execution instruction. Further, the software programs and modules in the aforementioned memory may also include an operating system, which may include various software components and/or drivers for managing system tasks (e.g., memory management, storage device control, power management, etc.) and may communicate with various hardware or software components to provide an operating environment for other software components.
Alternatively, the processor may be an integrated circuit chip having signal processing capabilities. The Processor may be a general-purpose Processor, and includes a Central Processing Unit (CPU), a Network Processor (NP), and the like. The various methods, steps, and logic blocks disclosed in the embodiments of the present application may be implemented or performed. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The embodiment of the present application further provides a computer-readable storage medium, where computer-executable instructions are stored in the computer-readable storage medium, and when the computer-executable instructions are executed by a processor, the computer-executable instructions are used to implement the technical solution of the model training method or the image processing method provided in the foregoing method embodiment.
The embodiments of the present application further provide a computer program product, which includes a computer program, and when the computer program is executed by a processor, the computer program is configured to implement the technical solution of the model training method or the image processing method provided in the foregoing method embodiments.
Those of ordinary skill in the art will understand that all or a portion of the steps of the above method embodiments may be performed by hardware associated with program instructions. The program may be stored in a computer-readable storage medium; when executed, it performs the steps of the above method embodiments. The aforementioned storage medium includes various media that can store program code, such as ROM, RAM, magnetic disks, or optical disks.
The foregoing is only a preferred embodiment of the present application and it should be noted that those skilled in the art can make several improvements and modifications without departing from the principle of the present application, and these improvements and modifications should also be considered as the protection scope of the present application.

Claims (10)

1. A method of model training, comprising:
preprocessing an image in the sample data set to obtain a multi-label word vector of the image and a multi-label adjacency matrix of the image;
clustering the multi-label word vectors and the multi-label adjacency matrix by adopting a graph wavelet neural network model to obtain a classification model;
extracting features from the image to be processed, and outputting a feature matrix of the image to be processed;
training the classification model based on the feature matrix to obtain a multi-label probability distribution model of the image to be processed;
and adopting a loss function to carry out convergence processing on the probability distribution model to obtain an image annotation model, wherein the image annotation model is used for carrying out annotation processing on a target image to obtain a label of the target image.
2. The model training method of claim 1, wherein the graph wavelet neural network model comprises a 2-layer graph wavelet neural network, and the clustering process is performed on the multi-label word vector and the multi-label adjacency matrix by using the graph wavelet neural network model to obtain a classification model, comprising:
clustering the multi-label word vector and the multi-label adjacency matrix by adopting a first-layer graph wavelet neural network in the 2-layer graph wavelet neural network to obtain output vectors;
and clustering the output vectors by adopting a second-layer graph wavelet neural network in the 2-layer graph wavelet neural network to obtain the classification model.
3. The model training method of claim 2, wherein the first-layer graph wavelet neural network uses the nonlinear activation function silu, and the second-layer graph wavelet neural network uses the nonlinear activation function softmax.
4. The model training method of claim 1, wherein preprocessing the images in the sample data set to obtain a multi-label adjacency matrix for the images comprises:
determining a first parameter and a second parameter of the multi-label adjacency matrix according to a first label and a second label of the image, wherein the first parameter is used for representing the number of times that the first label and the second label appear in the sample data set at the same time, and the second parameter is used for representing the number of times that the first label appears in the sample data set;
determining a conditional probability matrix of the image according to the first parameter and the second parameter;
binarizing the conditional probability matrix to obtain a binarized adjacency matrix;
and re-weighting the binarized adjacency matrix to obtain the multi-label adjacency matrix.
5. The model training method according to any one of claims 1 to 4, wherein training the classification model based on the feature matrix to obtain the multi-label probability distribution model of the image to be processed comprises:
multiplying the feature matrix by the classification model to obtain the probability distribution model.
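Claim 5 reduces to a single matrix product. A toy sketch with assumed dimensions (512-dimensional features, 80 labels); applying a sigmoid to turn scores into per-label probabilities is a common convention, not something the claim states.

    import torch

    feats = torch.randn(8, 512)           # feature matrix: 8 images x 512-d features
    W = torch.randn(80, 512)              # classification model: 80 labels x 512-d rows
    probs = torch.sigmoid(feats @ W.t())  # 8 x 80 multi-label probability distribution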
6. The model training method according to any one of claims 1 to 4, wherein converging the probability distribution model using the loss function to obtain the image annotation model comprises:
determining network hyper-parameters of the probability distribution model;
and converging the probability distribution model according to the loss function and the network hyper-parameters to obtain the image annotation model.
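One plausible reading of claim 6 in code, with hypothetical hyper-parameter values and an Adam optimizer standing in for whatever the application actually uses:

    import torch

    hparams = {"lr": 1e-4, "weight_decay": 5e-4, "epochs": 100}  # assumed values
    model = torch.nn.Linear(512, 80)  # stand-in for the probability distribution model
    optimizer = torch.optim.Adam(model.parameters(),
                                 lr=hparams["lr"],
                                 weight_decay=hparams["weight_decay"])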
7. The model training method according to any one of claims 1 to 4, wherein converging the probability distribution model using the loss function to obtain the image annotation model comprises:
monitoring the value of the loss function and the precision of the probability distribution model;
and if the value of the loss function is smaller than a first threshold or the precision of the probability distribution model is larger than a second threshold, outputting the probability distribution model as the image annotation model.
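The stopping rule of claim 7 as a sketch; both threshold values are hypothetical:

    def converged(loss_value, precision, loss_thresh=0.05, prec_thresh=0.95):
        # Output the model once either criterion is met (thresholds assumed).
        return loss_value < loss_thresh or precision > prec_thresh

    # inside the training loop, for example:
    # if converged(loss.item(), precision):
    #     torch.save(model.state_dict(), "image_annotation_model.pt")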
8. An image processing method, comprising:
acquiring an image to be processed;
inputting the image to be processed into an image annotation model for annotation to obtain a label of the image to be processed, wherein the image annotation model is trained using the model training method of any one of claims 1 to 7.
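An illustrative inference sketch for claim 8, reusing the assumed backbone and classifier matrix W from the sketches above; the 0.5 decision threshold is an assumption:

    import torch

    @torch.no_grad()
    def annotate(image, backbone, W, label_names, threshold=0.5):
        # image: 1 x 3 x H x W tensor; W: C x k classifier matrix
        feats = backbone(image)                   # 1 x k feature vector
        probs = torch.sigmoid(feats @ W.t())[0]   # C per-label probabilities
        keep = (probs > threshold).nonzero(as_tuple=True)[0].tolist()
        return [label_names[i] for i in keep]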
9. An electronic device, comprising: a processor, and a memory communicatively coupled to the processor;
the memory stores computer-executable instructions;
the processor executes computer-executable instructions stored by the memory to implement the model training method of any one of claims 1 to 7 or the image processing method of claim 8.
10. A computer-readable storage medium having computer-executable instructions stored thereon, wherein the computer-executable instructions, when executed by a processor, are configured to implement the model training method of any one of claims 1 to 7 or the image processing method of claim 8.
CN202211161244.9A 2022-09-23 2022-09-23 Model training method, image processing method, device and storage medium Pending CN115240037A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202211161244.9A CN115240037A (en) 2022-09-23 2022-09-23 Model training method, image processing method, device and storage medium
PCT/CN2023/098759 WO2024060684A1 (en) 2022-09-23 2023-06-07 Model training method, image processing method, device, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211161244.9A CN115240037A (en) 2022-09-23 2022-09-23 Model training method, image processing method, device and storage medium

Publications (1)

Publication Number Publication Date
CN115240037A (en) 2022-10-25

Family

ID=83667275

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211161244.9A Pending CN115240037A (en) 2022-09-23 2022-09-23 Model training method, image processing method, device and storage medium

Country Status (2)

Country Link
CN (1) CN115240037A (en)
WO (1) WO2024060684A1 (en)

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104700116B * 2015-03-13 2018-03-06 西安电子科技大学 Terrain classification method for polarimetric SAR images based on multi-layer quantum ridgelet representation
RU2019125602A (en) * 2019-08-13 2021-02-15 Общество С Ограниченной Ответственностью "Тексел" COMPLEX SYSTEM AND METHOD FOR REMOTE SELECTION OF CLOTHES
CN115240037A (en) * 2022-09-23 2022-10-25 卡奥斯工业智能研究院(青岛)有限公司 Model training method, image processing method, device and storage medium

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111552803A (en) * 2020-04-08 2020-08-18 西安工程大学 Text classification method based on graph wavelet network model
CN112199536A (en) * 2020-10-15 2021-01-08 华中科技大学 Cross-modality-based rapid multi-label image classification method and system
CN113255798A (en) * 2021-06-02 2021-08-13 苏州浪潮智能科技有限公司 Classification model training method, device, equipment and medium
CN113378965A (en) * 2021-06-25 2021-09-10 齐鲁工业大学 Multi-label image identification method and system based on DCGAN and GCN
CN113657425A (en) * 2021-06-28 2021-11-16 华南师范大学 Multi-label image classification method based on multi-scale and cross-modal attention mechanism
CN113657171A (en) * 2021-07-20 2021-11-16 国网上海市电力公司 Low-voltage distribution network platform region topology identification method based on graph wavelet neural network
CN114266924A (en) * 2021-12-23 2022-04-01 深圳大学 Multi-mode-based amine area tumor image classification method and terminal equipment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
周杭驰, "Research on Image Classification and Annotation Based on Deep Learning", China Master's Theses Full-text Database, Information Science and Technology *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024060684A1 (en) * 2022-09-23 2024-03-28 卡奥斯工业智能研究院(青岛)有限公司 Model training method, image processing method, device, and storage medium
CN116402113A (en) * 2023-06-08 2023-07-07 之江实验室 Task execution method and device, storage medium and electronic equipment
CN116402113B (en) * 2023-06-08 2023-10-03 之江实验室 Task execution method and device, storage medium and electronic equipment
CN117611933A (en) * 2024-01-24 2024-02-27 卡奥斯工业智能研究院(青岛)有限公司 Image processing method, device, equipment and medium based on classified network model

Also Published As

Publication number Publication date
WO2024060684A1 (en) 2024-03-28

Similar Documents

Publication Publication Date Title
Duong et al. Automated fruit recognition using EfficientNet and MixNet
Shi et al. An attribution-based pruning method for real-time mango detection with YOLO network
Bell et al. Inside-outside net: Detecting objects in context with skip pooling and recurrent neural networks
US10713563B2 (en) Object recognition using a convolutional neural network trained by principal component analysis and repeated spectral clustering
CN115240037A (en) Model training method, image processing method, device and storage medium
US20200074227A1 (en) Neural network-based action detection
CN112183577A (en) Training method of semi-supervised learning model, image processing method and equipment
CN111178251A (en) Pedestrian attribute identification method and system, storage medium and terminal
Nawaz et al. AI-based object detection latest trends in remote sensing, multimedia and agriculture applications
Nandhini et al. Object Detection Algorithm Based on Multi-Scaled Convolutional Neural Networks
Seneviratne Contrastive Representation Learning for Natural World Imagery: Habitat prediction for 30,000 species.
CN114820463A (en) Point cloud detection and segmentation method and device, and electronic equipment
Ouf Leguminous seeds detection based on convolutional neural networks: Comparison of faster R-CNN and YOLOv4 on a small custom dataset
Ouadiay et al. Simultaneous object detection and localization using convolutional neural networks
CN115620122A (en) Training method of neural network model, image re-recognition method and related equipment
CN116503399B (en) Insulator pollution flashover detection method based on YOLO-AFPS
CN111353577B (en) Multi-task-based cascade combination model optimization method and device and terminal equipment
Marasović et al. Person classification from aerial imagery using local convolutional neural network features
CN114462559B (en) Target positioning model training method, target positioning method and device
CN115565115A (en) Outfitting intelligent identification method and computer equipment
CN115797691A (en) Target detection method and device based on small sample learning and storage medium
Grabowski et al. Squeezing adaptive deep learning methods with knowledge distillation for on-board cloud detection
Ahmed et al. Classification of semantic segmentation using fully convolutional networks based unmanned aerial vehicle application
US11347968B2 (en) Image enhancement for realism
Latif et al. Implementation of hybrid algorithm for the UAV images preprocessing based on embedded heterogeneous system: The case of precision agriculture

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 2022-10-25