WO2022141094A1 - Model generation method and apparatus, image processing method and apparatus, and readable storage medium - Google Patents


Info

Publication number
WO2022141094A1
WO2022141094A1 (PCT/CN2020/141005)
Authority
WO
WIPO (PCT)
Prior art keywords
image
model
sample
initial
label
Prior art date
Application number
PCT/CN2020/141005
Other languages
French (fr)
Chinese (zh)
Inventor
张雪
Original Assignee
深圳市大疆创新科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳市大疆创新科技有限公司 filed Critical 深圳市大疆创新科技有限公司
Priority to PCT/CN2020/141005 priority Critical patent/WO2022141094A1/en
Publication of WO2022141094A1 publication Critical patent/WO2022141094A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition

Definitions

  • the present invention belongs to the field of network technology, and in particular, relates to a model generation method, an image processing method, a device and a readable storage medium.
  • images are an important way to obtain information, and images are produced in more and more scenarios.
  • In order to extract the image information contained in an image, it is often necessary to generate an image processing model for extracting that information.
  • the present invention provides a model generation method, an image processing method, a device and a readable storage medium, so as to solve the problems of weak generalization ability of the image processing model and low accuracy of the image information extracted during use.
  • an embodiment of the present invention provides a model generation method, which includes:
  • the training data includes a sample image and a label of the sample image, and the labels include labels generated by at least two image information acquisition methods;
  • an initial model is trained to generate an image processing model; the image processing model is used to extract image information, and the initial model includes a single processing branch.
  • an embodiment of the present invention provides an image processing method, which is applied to a processing device, and the method includes:
  • the image processing model is generated according to the above model generation method.
  • an embodiment of the present invention provides a model generation apparatus, the apparatus includes a memory and a processor;
  • the memory is configured to store program code;
  • the processor calls the program code, and when the program code is executed, is configured to perform the following operations:
  • the training data includes a sample image and a label of the sample image, and the labels include labels generated by at least two image information acquisition methods;
  • an initial model is trained to generate an image processing model; the image processing model is used to extract image information, and the initial model includes a single processing branch.
  • an embodiment of the present invention provides an image processing apparatus, the apparatus includes a memory and a processor;
  • the memory is configured to store program code;
  • the processor calls the program code, and when the program code is executed, is configured to perform the following operations:
  • the image processing model is generated according to the above-mentioned model generating device.
  • an embodiment of the present invention provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, any one of the foregoing methods is implemented.
  • training data may be acquired, wherein the training data includes sample images and labels of the sample images, and the labels include labels generated by at least two image information acquisition methods. Then, based on the sample images and their labels, the initial model is trained to generate an image processing model, wherein the image processing model is used to extract image information, and the initial model includes a single processing branch. Labeling with multiple image information acquisition methods avoids the problem of insufficient samples caused by the limitations of any single labeling method. Therefore, by acquiring training data labeled with multiple image information acquisition methods, the diversity and sufficiency of the training data can be ensured, which in turn improves the generalization ability of the final image processing model to a certain extent, thereby improving the accuracy of the image information subsequently extracted by the image processing model.
  • FIG. 1 is a flow chart of steps of a model generation method provided by an embodiment of the present invention.
  • FIG. 2 is a schematic diagram of a model structure provided by an embodiment of the present invention.
  • FIG. 3 is a schematic diagram of a fusion layer provided by an embodiment of the present invention.
  • FIG. 4 is a schematic diagram of a depthwise separable convolution operation provided by an embodiment of the present invention.
  • FIG. 5 is a schematic diagram of a processing layer provided by an embodiment of the present invention.
  • FIG. 6 is a flowchart of steps of an image processing method provided by an embodiment of the present invention.
  • Fig. 7 is a block diagram of a model generation device provided by an embodiment of the present invention.
  • FIG. 8 is a block diagram of an image processing apparatus provided by an embodiment of the present invention.
  • FIG. 9 is a block diagram of a computing processing device according to an embodiment of the present invention.
  • FIG. 10 is a block diagram of a portable or fixed storage unit according to an embodiment of the present invention.
  • FIG. 1 is a flowchart of steps of a model generation method provided by an embodiment of the present invention. As shown in FIG. 1 , the method may include:
  • Step 101 Acquire training data; the training data includes a sample image and a label of the sample image, and the label includes a label generated in at least two ways of acquiring image information.
  • the sample image may be obtained by receiving user input, or may be obtained autonomously from the network, for example by downloading sample images directly from open source databases.
  • the label of the sample image can be used to characterize the image information of the sample image. For example, when the image information is the angle information of the face in the image, the label of the sample image can be used to represent the angle information of the face in the sample image.
  • the label of the sample image may be the angle information itself, or the data used to calculate the angle information, for example, the key point information used to calculate the angle information.
  • the specific type and specific quantity of the image information acquisition manner used to acquire the label may be set according to actual requirements, which is not limited in this embodiment of the present invention.
  • Step 102 Train an initial model according to the sample image and the label of the sample image to generate an image processing model; the image processing model is used to extract image information, and the initial model includes a single processing branch.
  • the specific architecture of the initial model may be pre-designed according to actual requirements.
  • the complexity of the model structure can be reduced to a certain extent, realizing a lightweight structural design that minimizes the required computing resources.
  • the training of the initial model may take the sample image as the input of the initial model, then use the output of the initial model as the predicted value, and determine the real value according to the label of the sample image, for example, use the label as the actual value.
  • the current loss value of the initial model is calculated according to the predicted value and the actual value. If the loss value does not meet the preset requirements, the initial model has not yet converged; accordingly, the model parameters of the initial model can be adjusted and the adjusted initial model trained further, until the loss value meets the preset requirements. Finally, when the loss value of the initial model in a certain round meets the preset requirements, the current initial model is used as the final image processing model.
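  • The loop described above can be sketched as follows. This is a minimal, illustrative Python example in which a toy linear model stands in for the initial model; the names (fit_initial_model, loss_threshold, max_rounds) are assumptions for illustration and do not come from the patent.

```python
def fit_initial_model(samples, labels, loss_threshold=1e-6, lr=0.05, max_rounds=10000):
    w, b = 0.0, 0.0  # model parameters of the "initial model" (toy stand-in)
    loss = float("inf")
    for _ in range(max_rounds):
        # forward pass: the outputs of the model are the predicted values
        preds = [w * x + b for x in samples]
        # loss between predicted values and actual values (the labels)
        loss = sum((p - y) ** 2 for p, y in zip(preds, labels)) / len(samples)
        if loss < loss_threshold:  # preset requirement met: model has converged
            break
        # otherwise adjust the model parameters (plain gradient descent here)
        gw = sum(2 * (p - y) * x for p, y, x in zip(preds, labels, samples)) / len(samples)
        gb = sum(2 * (p - y) for p, y in zip(preds, labels)) / len(samples)
        w, b = w - lr * gw, b - lr * gb
    return w, b, loss

w, b, loss = fit_initial_model([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
```

The final model is simply the parameters left in place once the loss meets the preset requirement.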
  • the model generation method can acquire training data, wherein the training data includes sample images and labels of the sample images, and the labels include labels generated by at least two image information acquisition methods. Then, based on the sample images and their labels, the initial model is trained to generate an image processing model, wherein the image processing model is used to extract image information, and the initial model includes a single processing branch. Labeling with multiple image information acquisition methods avoids the problem of insufficient samples caused by the limitations of any single labeling method, so acquiring training data labeled in multiple ways ensures the diversity of the training data. This improves the generalization ability of the finally generated image processing model to a certain extent, thereby improving the accuracy of the image information subsequently extracted by the model.
  • the image information in this embodiment of the present invention may include angle information of the human face in the image, and the angle information may represent the posture angle of the human face in the image.
  • the angle information may include pitch, yaw, and roll.
  • the label may include a first label generated by a first image information acquisition method and a second label generated by a second image information acquisition method.
  • the first image information acquisition method may include acquiring angle information according to the key points of the faces in the image;
  • the second image information acquisition method may include performing regression detection according to the color channel values of the pixels in the image to obtain angle information.
  • the color channel value of the pixel point may be the red-green-blue (Red-Green-Blue, RGB) color channel value of the pixel point.
  • since the determination of face key points is often efficient, more training samples can usually be obtained with the first image information acquisition method. However, as the angle of the face in the image becomes larger, the accuracy with which the face key points are determined decreases accordingly, which leads to larger label errors and, in turn, to lower accuracy of the finally generated model.
  • the second image information acquisition method is less affected when the angle information of the face in the image becomes larger; that is, even when the angle of the face in the image is large, the label can still accurately represent the angle information of the face, so the label error is small and the label quality is high.
  • two kinds of training data corresponding to the first label and the second label are generated in combination with the first image information acquisition method and the second image information acquisition method, which can ensure sufficient training data to a certain extent.
  • the accuracy of the training data is ensured and the model training effect is ensured, so that the final generated image processing model can extract image information more accurately in the case of limited training data.
  • the above operation of acquiring training data may include the following steps:
  • Step 1011 Acquire an initial image and an initial label of the initial image.
  • an image can be obtained from open source data as an initial image, and an initial label of the initial image can be obtained by manual labeling.
  • an initial image and the initial label of the initial image through the following steps A to C:
  • Step A Obtain a first preset model and a second preset model; the first preset model is used to obtain angle information according to the key points of the faces in the image, and the second preset model is used to obtain angle information according to the color channel values of the pixels in the image.
  • the first preset model and the second preset model may be pre-trained models.
  • when the first preset model is pre-trained, it can be trained with the images in the training set and the correspondingly marked face key points, so that the first preset model learns the ability to determine face key points. Further, the first preset model may calculate the angle information of the face from the determined face key points according to a preset acquisition algorithm.
  • the second preset model can be trained with the images in the training set and the correspondingly marked face angle information, so that the second preset model learns the ability to determine angle information by performing regression detection on the pixels in the image.
  • the sample labeling is often done manually.
  • manually labeled data is often scarce and is affected by personal subjective judgment, so its quality tends to be poorer, which in turn leads to poorer training results.
  • the problem of poor training effect when training based on a single labeling method can be avoided to a certain extent.
  • the pre-trained first preset model and the second preset model may be directly loaded, thereby improving the acquisition efficiency to a certain extent.
  • Step B Process the first sample image according to the first preset model to obtain the first label, and process the second sample image according to the second preset model to obtain the second label.
  • Step C take the first sample image and the second sample image as the initial image, and use the first label and the second label as the initial label.
  • the first sample image and the second sample image may each be multiple images, and the image set composed of the first sample images and the image set composed of the second sample images may have no images in common or may partially overlap, which is not limited in this embodiment of the present invention.
  • the first sample image can be used as the input of the first preset model, the first sample image can be labeled by the first preset model, and the output of the first preset model can be used as the first label.
  • alternatively, images containing existing face key point data can be obtained directly from an open source database and used as the first sample images, and the first label can be generated according to the face key point data in these images.
  • the first preset model can also be used to determine the key points of the face in the first sample image, and the key points of the face can be used as training labels.
  • in this way, the finally generated image processing model can learn how to accurately determine the key points of the face; correspondingly, when the image processing model extracts angle information, the corresponding angle information can be calculated from the determined face key points according to a preset algorithm.
  • the second sample image can be used as the input of the second preset model, the second sample image can be labeled by the second preset model, and the output of the second preset model can be used as the second label.
  • manual annotation may also be used in the embodiment of the present invention, which is not limited herein. It should be noted that when sample images are labeled based on the preset models, there may be certain errors, resulting in a small amount of noise data in the final training data. A small amount of noise data can provide richer and more diverse information for the subsequent training process, which in turn can improve the generalization ability of the model to a certain extent.
  • the first sample image and the second sample image are labeled based on the acquired first preset model and second preset model, respectively, to obtain the first label and the second label.
  • the sample images are divided into first sample images and second sample images that are labeled in different ways, which facilitates subsequent cross-training, so that during training the initial model can learn from the characteristics of both kinds of training data, ensuring the training effect of the model to a certain extent.
  • the image information in this embodiment of the present invention may also be other information, such as age information corresponding to the face in the image.
  • the specific age information corresponding to the face in the sample image can be used as the label of the sample image.
  • Step 1012 Generate a target processing model according to the initial image and the initial label.
  • the initial image and the initial label can be used as training data to train and obtain the target processing model.
  • the initial image can be used as the input of the preset original model, then the output of the preset original model can be used as the predicted value, and the real value can be determined according to the initial label of the initial image, for example, the initial label can be used as the real value.
  • the current loss value of the preset original model is calculated according to the predicted value and the actual value. If the loss value does not meet the preset requirements, the preset original model has not yet converged; accordingly, the model parameters of the preset original model can be adjusted and the adjusted model trained further until the loss value meets the preset requirements. Finally, when the loss value of the preset original model in a certain round meets the preset requirements, the current preset original model can be used as the final target processing model.
  • Step 1013 Screen the initial image according to the target processing model, the initial image and the initial label to obtain the sample image.
  • the dirty data in the training data can be automatically eliminated to a certain extent, thereby improving the accuracy of the training data.
  • the automatic screening in the embodiment of the present invention can reduce the screening cost and screening time to a certain extent, which is beneficial to the iterative update of the model.
  • when screening, any initial image can be used as the input of the target processing model to obtain the output of the target processing model; the output is used as the predicted label of the initial image, the similarity between the predicted label and the initial label is calculated, and initial images whose similarity is less than a preset similarity threshold are eliminated.
  • the target processing model can process the initial image, and then obtain the output.
  • take model_origin, representing the target processing model, and dataset_origin, representing the set of initial images, as an example.
  • the similarity between the predicted label and the initial label can represent the closeness between the two. The greater the similarity between the predicted label and the initial label, the more accurate and credible the initial label can be considered. The smaller the similarity, the less credible the initial label can be considered.
  • screening can be performed according to the magnitude relationship between the similarity and the preset similarity threshold. If the similarity is less than the preset similarity threshold, the initial label of the initial image can be considered to have low reliability, so the initial image can be eliminated; only initial images whose similarity is not less than the preset similarity threshold, and whose reliability is therefore high, are used as sample images.
  • the preset similarity threshold may be set according to actual requirements, which is not limited in this embodiment of the present invention.
  • the set dataset_clean composed of the retained initial images may be used as training data to train and obtain an image processing model, and the image processing model obtained by the training may be represented as model_clean.
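  • The screening step (model_origin scoring dataset_origin to yield dataset_clean) can be sketched as follows. The numeric stand-ins for images and the helper names are illustrative assumptions; the similarity function is passed in as a parameter so that any similarity measure can be used.

```python
def screen(dataset_origin, model_origin, similarity_fn, similarity_threshold):
    """Keep only initial images whose predicted label is close enough to the initial label."""
    dataset_clean = []
    for image, initial_label in dataset_origin:
        predict_label = model_origin(image)  # output of the target processing model
        similarity = similarity_fn(predict_label, initial_label)
        # eliminate initial images whose similarity is below the preset threshold
        if similarity >= similarity_threshold:
            dataset_clean.append((image, initial_label))
    return dataset_clean

# toy stand-in for model_origin: predicts twice the (numeric) "image"
dataset_origin = [(1.0, 2.0), (2.0, 4.1), (3.0, 9.0)]
dataset_clean = screen(dataset_origin, lambda x: 2 * x,
                       lambda p, l: -abs(p - l), similarity_threshold=-0.5)
```

Here the third pair is eliminated because its initial label is far from the model's prediction, while the first two are retained in dataset_clean.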
  • the initial image is screened according to the similarity, so that the accuracy of the screening operation can be ensured to a certain extent.
  • the absolute value of the difference between the predicted label and the initial label can be calculated, and the similarity between the predicted label and the initial label can be determined according to the absolute value; the similarity is negatively correlated with the absolute value.
  • the similarity can be set to be negatively correlated with the absolute value.
  • predict_label represents the predicted label and label represents the initial label
  • the absolute value of the difference between the predicted label and the initial label can be expressed as abs(predict_label-label).
  • abs(*) means taking the absolute value of the input "*".
  • -abs(predict_label-label) can be used as similarity.
  • the calculation of the similarity can be conveniently realized, and the calculation efficiency can be improved to a certain extent.
  • other similarity algorithms may also be used for calculation, which is not limited in this embodiment of the present invention.
  • initial images that were rejected due to mis-screening may also be recovered from the rejected initial images by manual discrimination and added back to the training data.
  • in this way, the reduction of training data caused by mis-screening can be mitigated to a certain extent.
  • after automatic screening, the rejected initial images undergo a second, manual pass, which reduces the manual processing volume and improves the screening speed while improving the screening accuracy.
  • the above-mentioned operation of training an initial model to generate an image processing model according to the sample image and the label of the sample image may include the following steps:
  • Step 1021 Adjust the sample image to a sample image with multiple preset sizes.
  • the preset size may be set according to actual requirements.
  • the preset size may include 64*64, 48*48, 40*40, and the like.
  • the sample images may be divided into N groups, where N may be the number of preset sizes. Next, different preset sizes may be set for different groups, and finally the size of the sample images in each group is adjusted to the preset size corresponding to the group, thereby obtaining sample images in multiple preset sizes.
  • other manners may also be used for adjustment, which is not limited in this embodiment of the present invention.
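  • The grouping described above can be sketched as follows: the sample images are split into N groups, one per preset size, and each group is resized to its group's size. The round-robin split and the resize stand-in are illustrative assumptions; a real implementation would use an actual image-resizing routine.

```python
def group_and_resize(sample_images, preset_sizes, resize):
    n = len(preset_sizes)                 # N = number of preset sizes
    groups = {size: [] for size in preset_sizes}
    for i, image in enumerate(sample_images):
        size = preset_sizes[i % n]        # assign each image to one of the N groups
        groups[size].append(resize(image, size))
    return groups

preset_sizes = [(64, 64), (48, 48), (40, 40)]
# stand-in resize: just record the target size alongside the image id
groups = group_and_resize(range(9), preset_sizes, lambda img, size: (img, size))
```

With nine images and three preset sizes, each group ends up with three images resized to that group's preset size.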
  • Step 1022 For each sample image in the preset size, train the initial model according to the sample image and the label of the sample image, so as to obtain an image processing model in each preset size.
  • the size of the sample image can represent the input size corresponding to the model. The initial model is trained with sample images of each preset size, and image processing models corresponding to different input sizes are obtained; the image processing models corresponding to different input sizes are the image processing models under different preset sizes.
  • the larger the input size, the higher the computational complexity of the model tends to be.
  • the model calculation amount may be represented by the number of multiply-accumulate (MACC) operations performed when the model runs.
  • for the corresponding input sizes 40*40, 48*48, and 64*64, the corresponding MACC calculation amounts may be 3.8M, 4.8M, and 8.7M (M = million), respectively.
  • generating image processing models corresponding to different input sizes can provide users with image processing models having different calculation amounts, facilitating selection according to actual needs. For example, in a scenario where the image processing model is applied to a beauty system, if the beauty system requires the model calculation amount to be less than 10M, any one of the three model versions can be used. Alternatively, the image processing model with the least calculation amount can be selected directly, minimizing the calculation amount and increasing the running speed to make the beauty system smoother.
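  • The selection logic in the beauty-system example can be sketched as follows, using the MACC amounts quoted above; the helper name and the dictionary layout are illustrative assumptions.

```python
# input size -> MACC amount in millions, as listed above
models = {(40, 40): 3.8, (48, 48): 4.8, (64, 64): 8.7}

def pick_model(models, macc_budget):
    # keep models under the budget, then take the cheapest for maximum speed
    candidates = {size: macc for size, macc in models.items() if macc < macc_budget}
    return min(candidates, key=candidates.get) if candidates else None
```

With a 10M budget all three versions qualify, and the 40*40 model (3.8M MACC) is the cheapest choice.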
  • for each preset size, the initial model is trained according to the sample images of that size and their labels, so as to obtain the image processing model under each preset size; that is, image processing models corresponding to different input sizes are generated, providing users with a variety of choices. Users can then select the image processing model suited to the capability of their device in subsequent applications, thereby improving the flexibility of operation.
  • for the sample images of each preset size, one round of training according to those sample images and their labels may be performed, so as to generate the image processing model under that preset size.
  • the sample image may include a first sample image and a second sample image, and the manner of acquiring the first label of the first sample image and the manner of acquiring the second label of the second sample image may be different.
  • the above steps of training the initial model according to the sample images and the labels of the sample images may include:
  • Step 10221 Divide the first sample image into multiple first sample groups, and divide the second sample image into multiple second sample groups.
  • the first sample image may be equally divided into multiple image groups, thereby obtaining multiple first sample groups.
  • the second sample image is equally divided into a plurality of image groups, thereby obtaining a plurality of second sample groups.
  • grouping may also be performed randomly, which is not limited in this embodiment of the present invention.
  • Step 10222 Cross-train the initial model according to the first sample images in the first sample groups and their first labels, and the second sample images in the second sample groups and their second labels.
  • the first sample image and the second sample image have different kinds of labels; that is, the training data are separated.
  • the first sample images and the second sample images are each divided into multiple groups, and the first sample groups and second sample groups are combined for cross-training.
  • the initial model can thus learn from the two kinds of training samples in a balanced manner during the training process.
  • the final image processing model therefore combines the two image information acquisition methods, and the final training effect can be improved.
  • other training methods can also be used.
  • the first sample image and the second sample image may be the same image, that is, the first label and the second label are simultaneously set for the same sample image.
  • the sample image may be directly used for training, which is not limited in this embodiment of the present invention.
  • the number of first sample images included in the first sample group can be set to be the same as the number of second sample images included in the second sample group.
  • the step of cross-training the initial model according to the first sample images in the first sample groups and their first labels, and the second sample images in the second sample groups and their second labels, may include:
  • Step 10222a Train the initial model according to a first sample image in a first sample group and a first label of the first sample image to update model parameters of the initial model.
  • an unused first sample group may be selected from the first sample groups, and the first sample images in the selected group used as the input of the initial model; a loss value is determined based on the output of the initial model and the first labels, and if the loss value does not meet the preset requirements, the model parameters of the initial model can be updated.
  • a preset stochastic gradient descent method can be used to adjust the model parameters to achieve the update.
  • Step 10222b After updating the model parameters of the initial model, train the initial model according to the second sample images in the second sample group and their second labels to update the model parameters of the initial model again; after this update, re-execute the step of training the initial model according to the first sample images in the first sample group and their first labels.
  • the second sample group may be used for training, so as to realize cross-training between the first sample group and the second sample group.
  • an unused second sample group can be selected from the second sample groups, and the second sample images in the selected group used as the input of the initial model; the loss value is determined based on the output of the initial model and the second labels, and if the loss value does not meet the preset requirements, the model parameters of the initial model can be updated.
  • a preset stochastic gradient descent method can be used to adjust the model parameters to achieve the update.
  • the above-mentioned process of performing the training update based on the first sample group may be repeated to realize cyclic update training.
  • the training can be ended when the loss value of the initial model meets the preset requirements.
  • cross-training is achieved by alternating cycles. Since the number of images contained in the first sample group and the second sample group is the same, the balance of training data during each cross-training can be improved to a certain extent, thereby ensuring the effect of cross-training.
  • continuous training and updating of model parameters based on the first sample group and the second sample group can add rich learnable information to the model and enable the model to be optimized by the two training samples in a balanced manner during the update process. Improve the generalization ability of the model and speed up the convergence efficiency of the model.
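  • The alternating cross-training loop of steps 10222a and 10222b can be sketched as follows; the training internals are stubbed out, and the names (cross_train, train_step, loss_fn) are illustrative assumptions rather than the patent's own.

```python
def cross_train(first_groups, second_groups, train_step, loss_fn,
                loss_threshold, max_epochs=100):
    for _ in range(max_epochs):
        # alternate: one first sample group, then one second sample group
        for g1, g2 in zip(first_groups, second_groups):
            train_step(g1)                  # update on first labels
            train_step(g2)                  # update on second labels
            if loss_fn() <= loss_threshold:
                return True                 # preset requirement met: stop training
    return False                            # did not converge within max_epochs

# toy stub in which every parameter update lowers a shared loss by 1
state = {"loss": 8.0}
order = []
def train_step(group):
    order.append(group[0])
    state["loss"] -= 1.0

converged = cross_train([["a1"], ["a2"]], [["b1"], ["b2"]],
                        train_step, lambda: state["loss"], 0.0)
```

The recorded order shows the strict alternation between first and second sample groups that the cross-training scheme calls for.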
  • the process of generating the image processing model by training the initial model is the process of adjusting the model parameters.
  • the single processing branch included in the initial model may specifically include a convolution layer, an activation function layer, parallel maximum pooling and average pooling layers, a splicing layer, a fusion layer, and a processing layer that processes the output of the fusion layer.
  • the convolution layer can be used to extract the features of the model input through the convolution operation.
  • Setting the activation function layer can be used to add nonlinear factors to the model.
  • the model can be prevented from being only a simple linear combination, and the expressive ability of the model can be improved to a certain extent.
  • the pooling layer can remove redundant information while retaining the main features, thereby reducing the size of the data.
  • the specific type of the activation function used by the activation function layer can be set according to actual requirements.
  • the activation function layers in this embodiment of the present invention may all be linear rectification function (Rectified Linear Unit, Relu) activation function layers.
  • by setting the activation function layer to the Relu activation function layer, it is possible to avoid using too many hyperbolic tangent (Tanh) activation functions in the model structure while ensuring the accuracy of the model.
  • since the Tanh activation function has relatively low computational efficiency, this avoids the problem that the actual running speed of the model cannot reach the theoretical speed, thereby ensuring that the actual running time of the model is proportional to its theoretical calculation amount and avoiding the situation where the theoretical calculation amount is low but the actual running time is long.
  • a part of the activation function layer may also be set as a Tanh activation function layer, a part of the activation function layer may be set as a Relu activation function layer, etc., which is not limited in this embodiment of the present invention.
  • FIG. 2 is a schematic diagram of a model structure provided by an embodiment of the present invention.
  • Input(3*40*40) indicates an input with three color channels and a size of 40×40
  • SeparableConxBnRelu represents a depthwise separable convolution layer followed by a BN (batch normalization) layer and a Relu activation function layer
  • Avgpool represents the average pooling layer
  • maxpool represents the maximum pooling layer
  • concat represents the concatenation layer
  • conv2d represents the two-dimensional convolution operation layer.
  • the splicing layer can be used to splice the output of the average pooling layer and the output of the maximum pooling layer, and conv2d can be used to perform a two-dimensional convolution operation on the splicing result of the splicing layer.
  • the fusion layer fuses the output of the average pooling layer and the output of the maximum pooling layer, which avoids the problem that the subsequent depthwise separable convolution layer cannot comprehensively extract the mixed information, thereby ensuring the subsequent processing effect.
  • the ith fusion layer is represented by Fusioni
  • the fusion layers included in the initial model can be Fusion1, Fusion2, and Fusion3.
  • the i-th processing layer is represented by Stagei_output
  • the processing layers included in the initial model can be Stage1_output, Stage2_output, and Stage3_output.
  • "1" in (16, 3, 1), (24, 3, 1), (48, 3, 1), (96, 3, 1) in Figure 2 can represent the step of the convolution operation long
  • "3" can indicate the size of the convolution kernel used
  • "16", "24", "48” and "96” can indicate the number of convolution kernels respectively.
  • conv2d(24, 1, 1) in Figure 2 indicates that a two-dimensional convolution operation is performed with a stride of 1 based on 24 convolution kernels of size 1×1.
  • conv2d(48, 1, 1) indicates that a two-dimensional convolution operation is performed with a stride of 1 based on 48 convolution kernels of size 1×1.
  • (2, 2) in Figure 2 represents the size of the region processed by each pooling operation. Since the size of the feature map decreases as the pooling layers are applied, the number of convolution kernels in each depthwise separable convolution layer is correspondingly set to increase in sequence in this embodiment of the present invention, which ensures that each convolution operation can extract enough feature information while keeping the amount of calculation from becoming too large.
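Assuming each stage in Figure 2 ends with one (2, 2) pooling and that the stride-1 convolutions preserve spatial size ("same" padding — an assumption, since the figure does not state the padding), the shrinking feature maps against the growing channel counts (16, 24, 48, 96) can be traced as follows:

```python
# Illustrative shape walk-through for a 3*40*40 input: spatial size halves at
# each (2, 2) pooling while the channel count of the stage's depthwise
# separable convolution increases. Stage structure is an assumption here.

def pool2x2(h, w):
    return h // 2, w // 2

h, w = 40, 40
shapes = [(3, h, w)]                 # input: 3 channels, 40x40
for channels in (16, 24, 48, 96):
    h, w = pool2x2(h, w)             # each stage ends with a (2, 2) pooling
    shapes.append((channels, h, w))  # stride-1 conv sets the channel count

for s in shapes:
    print(s)
```

The walk-through shows why the kernel counts are made to grow: as the spatial extent shrinks from 40×40 toward 2×2, widening the channel dimension keeps enough feature information in each stage without making the early, spatially large layers expensive.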
  • FIG. 3 is a schematic diagram of a fusion layer provided by an embodiment of the present invention.
  • Stream1_stagei represents the input data input by the average pooling layer to the i-th fusion layer
  • Stream2_stagei represents the input data input by the maximum pooling layer to the i-th fusion layer.
  • SeparableConxBnRelu represents the depthwise separable convolution layer, BN layer, and Relu activation function layer.
  • Avgpool represents the average pooling layer.
  • Maxpool represents the max pooling layer. "Elements multiply" represents the layer used for element-wise multiplication.
  • by designing the model with a single processing branch, a fusion layer, and multiple parallel pooling layers, when the model is subsequently used, it can fuse the various output information from the multiple parallel pooling layers based on the fusion layer, which can improve the processing accuracy to a certain extent.
  • the convolution layer in the initial model of the embodiment of the present invention may be used to perform a depthwise separable convolution operation; that is, the convolution layers in the initial model can all be depthwise separable convolution layers. A depthwise separable convolution operation may include two parts: spatial/depthwise convolution (Depthwise Convolution) and channel convolution (Pointwise Convolution). In specific implementation, depthwise convolution can be performed on each channel of the feature map separately, the outputs are spliced, and then a unit convolution kernel is used for pointwise convolution.
  • FIG. 4 is a schematic diagram of a depthwise separable convolution operation provided by an embodiment of the present invention.
  • Depthwise_Conv(3,1) may be executed first, and then Pointwise_Conv(1,1) may be executed, to achieve the effect of a 3×3 convolution operation.
  • Depthwise_Conv represents spatial convolution
  • Pointwise_Conv represents channel convolution.
  • the specific execution process may be to first perform a 3×3 spatial (depthwise) convolution, and finally a 1×1 channel (pointwise) convolution.
  • compared with the method of directly using the standard convolution operation, in the embodiment of the present invention, dividing the standard convolution into two parts splits the correlation between the spatial dimension and the channel dimension, so that the number of parameters required for the convolution calculation can be reduced. This reduces the model calculation amount to a certain extent and improves the model's calculation efficiency and speed.
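The two-part operation and the parameter saving it brings can be sketched in NumPy. The kernel sizes and channel counts below are illustrative, and the implementation uses "valid" padding with stride 1 for simplicity:

```python
import numpy as np

# Minimal depthwise-separable convolution: a per-channel k x k depthwise
# convolution, then a 1x1 pointwise convolution that mixes channels.

def depthwise_separable_conv(x, dw_kernels, pw_kernels):
    """x: (C_in, H, W); dw_kernels: (C_in, k, k); pw_kernels: (C_out, C_in)."""
    c_in, h, w = x.shape
    k = dw_kernels.shape[1]
    oh, ow = h - k + 1, w - k + 1
    # Depthwise part: each input channel is convolved with its own kernel.
    dw_out = np.zeros((c_in, oh, ow))
    for c in range(c_in):
        for i in range(oh):
            for j in range(ow):
                dw_out[c, i, j] = np.sum(x[c, i:i + k, j:j + k] * dw_kernels[c])
    # Pointwise part: a 1x1 convolution is a linear mix of channels per pixel.
    return np.einsum('oc,chw->ohw', pw_kernels, dw_out)

rng = np.random.default_rng(0)
x = rng.standard_normal((3, 8, 8))
dw = rng.standard_normal((3, 3, 3))
pw = rng.standard_normal((16, 3))
y = depthwise_separable_conv(x, dw, pw)
print(y.shape)  # (16, 6, 6)

# Parameter count vs. a standard 3x3 convolution, same in/out channels:
standard = 3 * 3 * 3 * 16          # k*k*C_in*C_out = 432
separable = 3 * 3 * 3 + 3 * 16     # k*k*C_in + C_in*C_out = 75
print(standard, separable)
```

The last two lines show the saving the text refers to: for this shape, the separable form needs 75 weights where the standard convolution needs 432, since the spatial mixing and the channel mixing are no longer coupled in a single kernel tensor.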
  • the processing layer in this embodiment of the present invention may include a first processing layer and a second processing layer, wherein the first processing layer may include a fully connected layer (Fully Connected layers, FC) and an activation function layer, and the second processing layer may include a depthwise separable convolution layer.
  • FC can be used to reassemble the features extracted by the previous layer into a complete feature map through the weight matrix, and plays the role of a classifier in the model.
  • compared with the method of directly using the standard convolution operation, in the embodiment of the present invention, by setting a depthwise separable convolution layer in the processing layer, the number of parameters required for the convolution calculation can be reduced, which reduces the calculation amount to a certain extent and improves computational efficiency.
  • FIG. 5 is a schematic diagram of a processing layer provided by an embodiment of the present invention.
  • the input of the ith processing layer may be the output “Fusioni_output” of the corresponding ith fusion layer.
  • the input of the processing layer can be input to the first processing layer 01 and the second processing layer 02 correspondingly.
  • SeparableConV(10, 3, 1) indicates that, with a stride of 1, a depthwise separable convolution is performed based on 10 convolution kernels of size 3×3.
  • since the Tanh activation function is reserved in the first processing layer, it can be ensured to a certain extent that both positive and negative values exist in the range of the finally obtained angle information, thereby expanding the range of the angle information.
  • angle calculation can be performed based on the outputs of each layer included in the first processing layer to obtain angle information in one way, for example, the first angle information obtained according to the key points of the face in the image.
  • the angle calculation may be implemented based on a preset angle calculation method in the angle calculation layer, or the outputs of the three modules in the first processing layer may be directly used as the angle information.
  • the processing result of the second processing layer can represent the angle information extracted in another way, for example, the second angle information obtained according to the color channel values of the pixels in the image.
  • the initial model may also be set to include at least two processing branches, wherein the at least two processing branches may include a first processing branch and a second processing branch, Both the first processing branch and the second processing branch include a convolution layer, an activation function layer and a pooling layer.
  • the initial model also includes a fusion layer for fusing the outputs of the first processing branch and the second processing branch, and a processing layer for processing the output of the fusion layer.
  • the fusion layer includes a convolution layer, and the convolution layer in the fusion layer and the convolution layer in the first processing branch and the second processing branch are used for performing depthwise separable convolution operations.
  • the processing layer includes a first processing layer and a second processing layer; the first processing layer includes a fully connected layer and an activation function layer, and the second processing layer includes a depthwise separable convolution layer.
  • FIG. 6 is a flowchart of steps of an image processing method provided by an embodiment of the present invention.
  • the method can be applied to a processing device. As shown in FIG. 6 , the method can include:
  • Step 201 Use the image to be processed as the input of a preset image processing model to obtain the output of the image processing model.
  • the processing device may be a mobile phone, a pan-tilt camera, and other devices with shooting capability and processing capability.
  • a preset image processing model can be deployed on the processing device.
  • the image to be processed may be an image from which image information is extracted as required.
  • the image to be processed may be an image captured by a processing device, or an image in a captured video.
  • Step 202 Acquire image information of the image to be processed according to the output of the image processing model; wherein the image processing model is generated according to the above model generation method.
  • since the preset image processing model is trained on training data labeled with multiple image information acquisition methods, the problem of insufficient samples caused by the limitation of a single labeling method can be avoided. This ensures the diversity and sufficiency of the training data and can, to a certain extent, improve the generalization ability of the finally generated image processing model, thereby improving the accuracy of the image information extracted when using the image processing model on the image to be processed.
  • the image information may include angle information of the face in the image
  • the output of the image processing model may include the first angle information obtained according to the key points of the face in the image, and the second angle information obtained according to the color channel values of the pixels in the image.
  • the second angle information may be determined as the angle information of the image to be processed; that is, only the second angle information obtained according to the color channel values of the pixels in the image is used during application.
  • since the second angle information obtained according to the color channel values of the pixels in the image is used as the angle information of the image to be processed, the accuracy of the angle information of the image to be processed can be ensured to a certain extent.
  • the first angle information and the second angle information may also be displayed, and the angle information selected by the user is used as the angle information of the image to be processed.
  • the angle information of the image to be processed may also be calculated by combining the first angle information and the second angle information, for example, by using the average value of the first angle information and the second angle information as the angle information of the image to be processed, which is not limited in this embodiment of the present invention.
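The averaging variant named above can be sketched directly. The yaw/pitch/roll triple and the sample values are assumed representations for illustration only:

```python
# One possible combination named in the text: average the first angle
# information (from face key points) and the second (from pixel color
# channel values), component by component.

def combine_angles(first, second):
    return [(a + b) / 2 for a, b in zip(first, second)]

first_angle = [10.0, -4.0, 2.0]    # hypothetical yaw/pitch/roll from key points
second_angle = [12.0, -6.0, 0.0]   # hypothetical yaw/pitch/roll from color channels
print(combine_angles(first_angle, second_angle))  # [11.0, -5.0, 1.0]
```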
  • the preset image processing models may include image processing models corresponding to different preset sizes.
  • a preset size that matches the processing performance of the processing device may be determined first; the higher the processing performance, the larger the matching preset size.
  • the to-be-processed image can be used as an input of a target image processing model to obtain an output of the target image processing model; the target image processing model is an image processing model corresponding to the matching preset size.
  • the processing performance may be determined based on the hardware configuration of the processing device. If the hardware configuration of the processing device is higher, it may be determined that the processing performance of the processing device is higher.
  • the preset size corresponding to the processing performance of the processing device may be determined according to the corresponding relationship between the preset processing performance and the preset size, so as to obtain a matching preset size. Then, the image processing model with the matching preset size is used as the target processing model. For example, assuming that the matching preset size is 40*40, the target processing model may be an image processing model corresponding to 40*40.
  • image processing models with different preset sizes are generated in the model generation stage, and during application, an appropriate image processing model is selected for image processing according to the actual processing capability of the processing device. In this way, the problem that the device does not have enough capacity to run the image processing model, causing the device to freeze, can be avoided to a certain extent.
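The performance-to-size lookup can be sketched as below. The performance tiers and the preset sizes other than 40*40 (which the text mentions as an example) are illustrative assumptions:

```python
# Sketch: pick the target image processing model whose preset size matches the
# device's processing performance; higher performance maps to a larger size.

PERFORMANCE_TO_SIZE = {
    'low': (40, 40),
    'medium': (64, 64),   # hypothetical tier
    'high': (96, 96),     # hypothetical tier
}

# One pre-generated model per preset size (names are stand-ins).
MODELS = {size: f'image_processing_model_{size[0]}x{size[1]}'
          for size in PERFORMANCE_TO_SIZE.values()}

def select_target_model(processing_performance):
    preset_size = PERFORMANCE_TO_SIZE[processing_performance]
    return MODELS[preset_size]

print(select_target_model('low'))  # image_processing_model_40x40
```

In practice the performance tier would be derived from the device's hardware configuration, as the text describes; the dictionary stands in for the "corresponding relationship between the preset processing performance and the preset size".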
  • FIG. 7 is a block diagram of an apparatus for generating a model provided by an embodiment of the present invention.
  • the apparatus may include: a memory 301 and a processor 302 .
  • the memory 301 is used to store program codes.
  • the processor 302 calls the program code, and when the program code is executed, is configured to perform the following operations:
  • the training data includes a sample image and a label of the sample image, and the label includes a label generated in at least two ways of acquiring image information;
  • an initial model is trained to generate an image processing model; the image processing model is used to extract image information, and the initial model includes a single processing branch.
  • the acquiring training data includes:
  • acquiring an initial image and an initial label of the initial image, and screening the initial image to obtain the sample image.
  • the screening of the initial image according to the initial image and the initial label to obtain the sample image includes:
  • the initial image is used as the input of the target processing model to obtain the output of the target processing model; the output is taken as a predicted label, and the similarity between the predicted label and the initial label is calculated;
  • the initial images whose similarity is less than the preset similarity threshold are eliminated, and the remaining initial images are used as the sample images.
  • the calculating of the similarity between the predicted label and the initial label includes: calculating the absolute value of the difference between the predicted label and the initial label;
  • the similarity between the predicted label and the initial label is determined according to the absolute value; the similarity is negatively correlated with the absolute value.
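The screening step can be sketched end to end. The similarity formula (one choice that is negatively correlated with the absolute difference), the stand-in target model, and the threshold are illustrative assumptions:

```python
# Sketch of sample screening: run each initial image through a target
# processing model, compare predicted label to initial label via the absolute
# difference, and keep only images above a preset similarity threshold.

def similarity(predicted, initial):
    # Negatively correlated with |predicted - initial|; any monotonically
    # decreasing mapping would fit the text's requirement.
    return 1.0 / (1.0 + abs(predicted - initial))

def screen(samples, target_model, threshold=0.5):
    kept = []
    for image, initial_label in samples:
        predicted_label = target_model(image)
        if similarity(predicted_label, initial_label) >= threshold:
            kept.append((image, initial_label))
    return kept

# Toy stand-in model: "predicts" a label as the image's mean intensity.
target_model = lambda image: sum(image) / len(image)

samples = [([10.0, 10.0], 10.2),   # |pred - label| = 0.2 -> kept
           ([10.0, 10.0], 30.0)]   # |pred - label| = 20  -> eliminated
print(len(screen(samples, target_model)))  # 1
```

This is the mechanism the text describes for filtering out initial images whose labels the target processing model disagrees with, so that only consistent samples enter training.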
  • the image information includes angle information of the face in the image
  • the label includes a first label generated in a manner of acquiring first image information and a second label generated in a manner of acquiring second image information
  • the first method of acquiring image information includes a method of acquiring angle information according to a face key point in the image;
  • the second method of acquiring image information includes a method of performing regression detection according to the color channel value of the pixel point in the image to obtain the angle information.
  • the single processing branch includes a convolution layer, an activation function layer, a parallel maximum pooling layer and average pooling layer, a splicing layer, a fusion layer, and a processing layer for processing the output of the fusion layer.
  • the fusion layer is used to fuse the outputs of the maximum pooling layer and the average pooling layer.
  • the convolution layer in the initial model is used to perform a depthwise separable convolution operation.
  • the activation function layers in the initial model are all RELU activation function layers.
  • the initial model is trained according to the sample image and the label of the sample image to generate an image processing model, including:
  • the initial model is trained according to the sample image and the label of the sample image, so as to obtain an image processing model corresponding to each preset size.
  • the sample image includes a first sample image and a second sample image, and the method for acquiring the first label of the first sample image is different from the method for acquiring the second label of the second sample image;
  • the number of first sample images included in the first sample group is the same as the number of second sample images included in the second sample group;
  • the training of the initial model according to the first sample image and the first label of the first sample image, and the second sample image and the second label of the second sample image, to cross-train the initial model, includes:
  • training the initial model according to the first sample image in the first sample group and the first label of the first sample image to update the model parameters of the initial model; after updating the model parameters, training the initial model according to the second sample image in the second sample group and the second label of the second sample image to update the model parameters of the initial model; and after updating the model parameters, re-executing the step of training the initial model according to the first sample image in the first sample group and the first label of the first sample image.
  • the model generating apparatus can acquire training data, wherein the training data includes sample images and labels of the sample images, and the labels include labels generated by at least two image information acquisition methods. Then, based on the sample images and their labels, the initial model is trained to generate an image processing model, wherein the image processing model is used to extract image information and the initial model includes a single processing branch. Since labeling with multiple image information acquisition methods avoids the problem of insufficient samples caused by the limitation of a single labeling method, acquiring training data marked in this way for training ensures the diversity of the training data, improves the generalization ability of the finally generated image processing model to a certain extent, and thereby improves the accuracy of the image information subsequently extracted by the image processing model.
  • FIG. 8 is a block diagram of an image processing apparatus provided by an embodiment of the present invention.
  • the apparatus may include: a memory 401 and a processor 402 .
  • the memory 401 is used to store program codes.
  • the processor 402 calls the program code, and when the program code is executed, is configured to perform the following operations:
  • the image processing model is generated according to the above model generation method.
  • the image information includes the angle information of the face in the image
  • the output of the image processing model includes the first angle information obtained according to the key points of the face in the image, and the second angle information obtained according to the color channel values of the pixel points in the image.
  • the obtaining image information of the to-be-processed image according to the output of the image processing model includes:
  • the second angle information is determined as the angle information of the image to be processed.
  • the image processing model includes image processing models corresponding to different preset sizes
  • Taking the image to be processed as the input of the preset image processing model to obtain the output of the image processing model includes:
  • the to-be-processed image is used as the input of the target image processing model to obtain the output of the target image processing model;
  • the target image processing model is the image processing model corresponding to the matching preset size.
  • the preset image processing model used is obtained by acquiring training data marked in various ways of acquiring image information and training with it. Thus, since labeling with various image information acquisition methods avoids the problem of insufficient samples caused by the limitation of a single labeling method, the diversity and sufficiency of the training data are ensured, which can, to a certain extent, improve the generalization ability of the final image processing model and thereby improve the accuracy of the image information extracted when using this image processing model on the image to be processed.
  • an embodiment of the present invention also provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, each step in the above method is implemented and the same technical effect can be achieved; in order to avoid repetition, it will not be repeated here.
  • the device embodiments described above are only illustrative, wherein the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution in this embodiment. Those of ordinary skill in the art can understand and implement it without creative effort.
  • Various component embodiments of the present invention may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof.
  • a microprocessor or a digital signal processor may be used in practice to implement some or all of the functions of some or all of the components in a computing processing device according to embodiments of the present invention.
  • the present invention can also be implemented as apparatus or apparatus programs (eg, computer programs and computer program products) for performing part or all of the methods described herein.
  • Such a program implementing the present invention may be stored on a computer-readable medium, or may be in the form of one or more signals. Such signals may be downloaded from Internet sites, or provided on carrier signals, or in any other form.
  • FIG. 9 is a block diagram of a computing processing device provided by an embodiment of the present invention.
  • FIG. 9 shows a computing processing device that can implement the method according to the present invention.
  • the computing processing device traditionally includes a processor 710 and a computer program product or computer readable medium in the form of a memory 720 .
  • the memory 720 may be electronic memory such as flash memory, EEPROM (electrically erasable programmable read only memory), EPROM, hard disk, or ROM.
  • the memory 720 has storage space 730 for program code for performing any of the method steps in the above-described methods.
  • the storage space 730 for program codes may include various program codes for implementing various steps in the above methods, respectively.
  • These program codes can be read from or written to one or more computer program products.
  • These computer program products include program code carriers such as hard disks, compact disks (CDs), memory cards or floppy disks.
  • Such computer program products are typically portable or fixed storage units as described with reference to FIG. 10 .
  • the storage unit may have storage segments, storage spaces, etc. arranged similarly to the memory 720 in the computing processing device of FIG. 9 .
  • the program code may, for example, be compressed in a suitable form.
  • the storage unit includes computer readable code, i.e., code readable by a processor such as the processor 710, which, when executed by a computing processing device, causes the computing processing device to perform each step of the methods described above.
  • references herein to "one embodiment,” “an embodiment,” or “one or more embodiments” means that a particular feature, structure, or characteristic described in connection with an embodiment is included in at least one embodiment of the present invention. Also, please note that instances of the phrase “in one embodiment” herein are not necessarily all referring to the same embodiment.
  • any reference signs placed between parentheses shall not be construed as limiting the claim.
  • the word “comprising” does not exclude the presence of elements or steps not listed in a claim.
  • the word “a” or “an” preceding an element does not exclude the presence of a plurality of such elements.
  • the invention can be implemented by means of hardware comprising several different elements and by means of a suitably programmed computer. In a unit claim enumerating several means, several of these means may be embodied by one and the same item of hardware.
  • the use of the words first, second, and third, etc. do not denote any order. These words can be interpreted as names.

Abstract

A model generation method and apparatus, an image processing method and apparatus, and a readable storage medium. The model generation method comprises: acquiring training data, wherein the training data comprises sample images and labels of the sample images, and the labels comprise labels generated by using at least two image information acquisition methods; and training an initial model according to the sample images and the labels of the sample images, so as to generate an image processing model, wherein the image processing model is used for extracting image information, and the initial model comprises at least a single processing branch. When labels are labeled by using a plurality of image information acquisition methods, the problem of samples being insufficient caused by the limitation of a single labeling method can be avoided. In this way, training data labeled using a plurality of image information acquisition methods is acquired and is subjected to training, and the diversity and sufficiency of the training data can be ensured, so that the generalization capability of a finally generated image processing model can be improved to a certain extent, thereby improving the accuracy of image information subsequently extracted using the image processing model.

Description

模型生成方法、图像处理方法、装置及可读存储介质Model generation method, image processing method, device and readable storage medium 技术领域technical field
本发明属于网络技术领域,特别是涉及一种模型生成方法、图像处理方法、装置及可读存储介质。The present invention belongs to the field of network technology, and in particular, relates to a model generation method, an image processing method, a device and a readable storage medium.
背景技术Background technique
目前,图像作为获取信息的优良途径,越来越多的场景下会产出图像。为了提取图像中的图像信息,经常需要生成用于提取图像信息的图像处理模型。At present, images are an excellent way to obtain information, and images are produced in more and more scenarios. In order to extract image information in an image, it is often necessary to generate an image processing model for extracting image information.
现有方式中,往往是直接使用单一方式标注的训练数据生成图像处理模型。但是,这种方式最终生成的图像处理模型的泛化能力较弱,使用时提取到的图像信息的准确性较低。In the existing methods, image processing models are often generated directly using training data marked in a single method. However, the generalization ability of the image processing model finally generated by this method is weak, and the accuracy of the image information extracted when used is low.
发明内容SUMMARY OF THE INVENTION
本发明提供一种模型生成方法、图像处理方法、装置及可读存储介质,以便解决图像处理模型的泛化能力较弱且使用时提取到的图像信息的准确性较低的问题。The present invention provides a model generation method, an image processing method, a device and a readable storage medium, so as to solve the problems of weak generalization ability of the image processing model and low accuracy of the image information extracted during use.
为了解决上述技术问题,本发明是这样实现的:In order to solve the above-mentioned technical problems, the present invention is achieved in this way:
第一方面,本发明实施例提供了一种模型生成方法,该方法包括:In a first aspect, an embodiment of the present invention provides a model generation method, which includes:
获取训练数据;所述训练数据中包括样本图像以及所述样本图像的标签,所述标签包括以至少两种图像信息获取方式生成的标签;Acquiring training data; the training data includes a sample image and a label of the sample image, and the label includes a label generated in at least two ways of acquiring image information;
根据所述样本图像以及所述样本图像的标签,对初始模型进行训练,以生成图像处理模型;所述图像处理模型用于提取图像信息,所述初始模型包括单个处理分支。According to the sample images and the labels of the sample images, an initial model is trained to generate an image processing model; the image processing model is used to extract image information, and the initial model includes a single processing branch.
第二方面,本发明实施例提供了一种图像处理方法,应用于处理设备,所述方法包括:In a second aspect, an embodiment of the present invention provides an image processing method, which is applied to a processing device, and the method includes:
将待处理图像作为预设的图像处理模型的输入,以获取所述图像处理模型的输出;Using the image to be processed as the input of the preset image processing model to obtain the output of the image processing model;
根据所述图像处理模型的输出,获取所述待处理图像的图像信息;obtaining image information of the to-be-processed image according to the output of the image processing model;
其中,所述图像处理模型是根据上述模型生成方法生成的。Wherein, the image processing model is generated according to the above model generation method.
第三方面,本发明实施例提供了一种模型生成装置,所述装置包括存储器和处理器;In a third aspect, an embodiment of the present invention provides a model generation apparatus, the apparatus includes a memory and a processor;
所述存储器,用于存储程序代码;the memory for storing program codes;
所述处理器,调用所述程序代码,当所述程序代码被执行时,用于执行以下操作:The processor calls the program code, and when the program code is executed, is configured to perform the following operations:
Acquiring training data; the training data includes sample images and labels of the sample images, and the labels include labels generated by at least two image information acquisition methods;
根据所述样本图像以及所述样本图像的标签,对初始模型进行训练,以生成图像处理模型;所述图像处理模型用于提取图像信息,所述初始模型包括单个处理分支。According to the sample images and the labels of the sample images, an initial model is trained to generate an image processing model; the image processing model is used to extract image information, and the initial model includes a single processing branch.
第四方面,本发明实施例提供了一种图像处理装置,所述装置包括存储器和处理器;In a fourth aspect, an embodiment of the present invention provides an image processing apparatus, the apparatus includes a memory and a processor;
所述存储器,用于存储程序代码;the memory for storing program codes;
所述处理器,调用所述程序代码,当所述程序代码被执行时,用于执行以下操作:The processor calls the program code, and when the program code is executed, is configured to perform the following operations:
将待处理图像作为预设的图像处理模型的输入,以获取所述图像处理模型的输出;Using the image to be processed as the input of the preset image processing model to obtain the output of the image processing model;
根据所述图像处理模型的输出,获取所述待处理图像的图像信息;obtaining image information of the to-be-processed image according to the output of the image processing model;
其中,所述图像处理模型是根据上述模型生成装置生成的。Wherein, the image processing model is generated according to the above-mentioned model generating device.
第五方面,本发明实施例提供了一种计算机可读存储介质,所述计算机可读存储介质上存储计算机程序,所述计算机程序被处理器执行时实现上述任一所述方法。In a fifth aspect, an embodiment of the present invention provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, any one of the foregoing methods is implemented.
In this embodiment of the present invention, training data may be acquired, where the training data includes sample images and labels of the sample images, and the labels include labels generated by at least two image information acquisition methods. Then, according to the sample images and the labels of the sample images, an initial model is trained to generate an image processing model, where the image processing model is used to extract image information and the initial model includes a single processing branch. Labeling with multiple image information acquisition methods avoids the problem of insufficient samples caused by the limitations of any single labeling method. Therefore, training on data labeled with multiple image information acquisition methods ensures the diversity and sufficiency of the training data, which in turn improves, to a certain extent, the generalization ability of the finally generated image processing model and thus the accuracy of the image information it subsequently extracts.
附图说明Description of drawings
图1是本发明实施例提供的一种模型生成方法的步骤流程图;1 is a flow chart of steps of a model generation method provided by an embodiment of the present invention;
图2是本发明实施例提供的一种模型结构的示意图;2 is a schematic diagram of a model structure provided by an embodiment of the present invention;
图3是本发明实施例提供的一种融合层的示意图;3 is a schematic diagram of a fusion layer provided by an embodiment of the present invention;
图4是本发明实施例提供的一种深度可分卷积操作的示意图;4 is a schematic diagram of a depthwise separable convolution operation provided by an embodiment of the present invention;
图5是本发明实施例提供的一种处理层的示意图;5 is a schematic diagram of a processing layer provided by an embodiment of the present invention;
图6是本发明实施例提供的一种图像处理方法的步骤流程图;6 is a flowchart of steps of an image processing method provided by an embodiment of the present invention;
图7是本发明实施例提供的一种模型生成装置的框图;Fig. 7 is a block diagram of a model generation device provided by an embodiment of the present invention;
图8是本发明实施例提供的一种图像处理装置的框图;8 is a block diagram of an image processing apparatus provided by an embodiment of the present invention;
图9为本发明实施例提供的一种计算处理设备的框图;FIG. 9 is a block diagram of a computing processing device according to an embodiment of the present invention;
图10为本发明实施例提供的一种便携式或者固定存储单元的框图。FIG. 10 is a block diagram of a portable or fixed storage unit according to an embodiment of the present invention.
Detailed Description of the Embodiments
下面将结合本发明实施例中的附图,对本发明实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例是本发明一部分实施例,而不是全部的实施例。基于本发明中的实施例,本领域普通技术人员在没有作出创造性劳动前提下所获得的所有其他实施例,都属于本发明保护的范围。The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are part of the embodiments of the present invention, but not all of the embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative efforts shall fall within the protection scope of the present invention.
图1是本发明实施例提供的一种模型生成方法的步骤流程图,如图1所示,所述方法可以包括:FIG. 1 is a flowchart of steps of a model generation method provided by an embodiment of the present invention. As shown in FIG. 1 , the method may include:
Step 101: Acquire training data; the training data includes sample images and labels of the sample images, and the labels include labels generated by at least two image information acquisition methods.
本发明实施例中,样本图像可以是接收用户输入得到的,也可以是从网络中自主获取得到的。例如,直接从开源数据库中下载样本图像。进一步地,样本图像的标签可以用于表征样本图像的图像信息。示例的,在图像信息为图像中人脸的角度信息时,样本图像的标签可以用于表征样本图像中人脸的角度信息。具体的,样本图像的标签可以为角度信息本身,或者是用于计算角度信息的数据,例如,用于计算角度信息的关键点信息。进一步地,用于获取标签的图像信息获取方式的具体种类以及具体数量可以根据实际需求设置,本发明实施例对此不作限定。In this embodiment of the present invention, the sample image may be obtained by receiving user input, or may be obtained independently from the network. For example, download sample images directly from open source databases. Further, the label of the sample image can be used to characterize the image information of the sample image. For example, when the image information is the angle information of the face in the image, the label of the sample image can be used to represent the angle information of the face in the sample image. Specifically, the label of the sample image may be the angle information itself, or the data used to calculate the angle information, for example, the key point information used to calculate the angle information. Further, the specific type and specific quantity of the image information acquisition manner used to acquire the label may be set according to actual requirements, which is not limited in this embodiment of the present invention.
步骤102、根据所述样本图像以及所述样本图像的标签,对初始模型进行训练,以生成图像处理模型;所述图像处理模型用于提取图像信息,所述初始模型包括单个处理分支。Step 102: Train an initial model according to the sample image and the label of the sample image to generate an image processing model; the image processing model is used to extract image information, and the initial model includes a single processing branch.
The specific architecture of the initial model may be pre-designed according to actual requirements. In this embodiment of the present invention, designing an initial model that includes a single processing branch reduces the complexity of the model structure to a certain extent, so that the lightweight structural design minimizes the computing resources required.
Further, in one implementation, training the initial model may proceed as follows: a sample image is used as the input of the initial model, the output of the initial model is taken as the predicted value, and the ground-truth value is determined from the label of the sample image, for example, by taking the label itself as the ground-truth value. Next, the current loss value of the initial model is calculated from the predicted value and the ground-truth value. If the loss value does not meet the preset requirement, the initial model has not yet converged; accordingly, the model parameters of the initial model are adjusted, and the adjusted initial model continues to be trained until the loss value meets the preset requirement. Finally, when the loss value of the initial model in some round meets the preset requirement, the current initial model is taken as the final image processing model.
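The predict/compute-loss/adjust loop described above can be sketched as follows. This is a minimal illustration only: the single-branch "model" is a toy linear function, and the learning rate, loss threshold, and round limit are assumed values, not parameters from this disclosure.

```python
def train_initial_model(samples, labels, lr=0.1, loss_threshold=1e-3, max_rounds=10000):
    """Toy sketch of the training loop: a single-branch linear model
    y = w*x + b is adjusted until the loss meets the preset requirement."""
    w, b = 0.0, 0.0
    n = len(samples)
    loss = float("inf")
    for _ in range(max_rounds):
        preds = [w * x + b for x in samples]          # model output = predicted value
        errors = [p - y for p, y in zip(preds, labels)]
        loss = sum(e * e for e in errors) / n         # compare with ground-truth labels
        if loss < loss_threshold:                     # preset requirement met -> converged
            break
        # model has not converged: adjust parameters and keep training
        grad_w = 2.0 * sum(e * x for e, x in zip(errors, samples)) / n
        grad_b = 2.0 * sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b, loss

samples = [0.0, 1.0, 2.0, 3.0]               # stand-ins for sample images
labels = [2.0 * x + 1.0 for x in samples]    # stand-ins for their labels
w, b, final_loss = train_initial_model(samples, labels)
```

In a real implementation the linear model would be replaced by the single-branch network and the mean-squared loss by whatever loss the preset requirement is defined over; the control flow stays the same.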
To sum up, the model generation method provided by the embodiments of the present invention can acquire training data, where the training data includes sample images and labels of the sample images, and the labels include labels generated by at least two image information acquisition methods. Then, according to the sample images and their labels, an initial model is trained to generate an image processing model, where the image processing model is used to extract image information and the initial model includes a single processing branch. Labeling with multiple image information acquisition methods avoids the problem of insufficient samples caused by the limitations of any single labeling method. Therefore, training on data labeled with multiple image information acquisition methods ensures the diversity and sufficiency of the training data, which in turn improves, to a certain extent, the generalization ability of the finally generated image processing model and thus the accuracy of the image information it subsequently extracts.
Optionally, the image information in this embodiment of the present invention may include angle information of a face in an image, and the angle information may represent the pose angles of the face. For example, the angle information may include a pitch angle, a yaw angle, and a roll angle. Further, the labels may include a first label generated by a first image information acquisition method and a second label generated by a second image information acquisition method. The first image information acquisition method may obtain angle information from face key points in the image, and the second image information acquisition method may obtain angle information by performing regression detection on the color channel values of pixels in the image, where a pixel's color channel values may be its red-green-blue (RGB) color channel values.
Since face key points can usually be determined efficiently, the first image information acquisition method can usually yield a large number of training samples. However, as the face angles in an image grow larger, the accuracy with which face key points can be determined decreases, which enlarges the label error and in turn lowers the accuracy of the finally generated model. In contrast, the method that performs regression detection on the color channel values of pixels to obtain angle information is less affected by large face angles; that is, even when the face angles in an image are large, the labels can still accurately represent the angle information of the face, with small label error and high annotation quality. Therefore, in this embodiment of the present invention, annotation combines the first and second image information acquisition methods to generate two kinds of training data corresponding to the first label and the second label respectively. To a certain extent, this ensures that the training data is sufficient while also ensuring its accuracy and hence the training effect, so that the finally generated image processing model can extract image information relatively accurately even with limited training data.
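As a minimal illustration of the keypoint-based route, one of the three pose angles, the roll angle, can be estimated from just two face key points (the eye centers). This is a deliberately simplified sketch under that assumption, not the algorithm of this disclosure; the keypoint coordinates are assumed inputs.

```python
import math

def roll_from_eye_keypoints(left_eye, right_eye):
    """Roll angle (degrees) of the line joining the two eye key points.
    Level eyes give roll = 0; a tilted head gives a nonzero roll."""
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    return math.degrees(math.atan2(dy, dx))

roll_level = roll_from_eye_keypoints((0.0, 0.0), (1.0, 0.0))   # head level
roll_tilted = roll_from_eye_keypoints((0.0, 0.0), (1.0, 1.0))  # head tilted
```

Estimating pitch and yaw in the same spirit requires more key points (nose tip, mouth corners) and a 3D face model, which is exactly why keypoint accuracy, and hence label accuracy, degrades at large pose angles as described above.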
可选的,在本发明实施例的一种实现方式中,上述获取训练数据的操作,可以包括以下步骤:Optionally, in an implementation manner of the embodiment of the present invention, the above operation of acquiring training data may include the following steps:
步骤1011、获取初始图像以及所述初始图像的初始标签。Step 1011: Acquire an initial image and an initial label of the initial image.
本步骤中,可以从开源数据中获取图像作为初始图像,通过人工标注的方式获取初始图像的初始标签。或者,也可以是通过下述步骤A~步骤C实现获取初始图像以及初始图像的初始标签:In this step, an image can be obtained from open source data as an initial image, and an initial label of the initial image can be obtained by manual labeling. Alternatively, it is also possible to obtain the initial image and the initial label of the initial image through the following steps A to C:
Step A: Acquire a first preset model and a second preset model; the first preset model is used to obtain angle information according to face key points in an image, and the second preset model is used to obtain angle information according to the color channel values of pixels in an image.
In this step, the first preset model and the second preset model may be pre-trained models. When pre-training the first preset model, it may be trained with images in a training set and their annotated face key points, so that the first preset model learns the ability to determine face key points; further, the first preset model may calculate the angle information of a face from the determined face key points according to a preset acquisition algorithm. When pre-training the second preset model, it may be trained with images in a training set and their annotated face angle information, so that the second preset model learns the ability to perform regression detection on the pixels of an image to determine angle information. Further, when training a model that obtains angle information from the color channel values of pixels, samples are usually annotated manually; such annotated data is often scarce and, being affected by subjective individual judgment, of poor annotation quality, which in turn leads to a poor training effect. In this embodiment of the present invention, the two preset models are combined to obtain the training data corresponding to the two methods, and training is performed on both kinds of training data together, which to a certain extent avoids the poor training effect of training based on a single labeling method.
相应地,在获取第一预设模型以及第二预设模型时,可以是直接加载预先训练好的第一预设模型以及第二预设模型,进而一定程度上可以提高获取效率。例如,直接加载开源的第一预设模型以及第二预设模型。Correspondingly, when acquiring the first preset model and the second preset model, the pre-trained first preset model and the second preset model may be directly loaded, thereby improving the acquisition efficiency to a certain extent. For example, directly load the open-source first preset model and the second preset model.
Step B: Process a first sample image according to the first preset model to obtain the first label, and process a second sample image according to the second preset model to obtain the second label.
Step C: Take the first sample image and the second sample image as the initial images, and take the first label and the second label as the initial labels.
In this embodiment of the present invention, the first sample images and the second sample images may each be multiple images; the image set composed of the first sample images and the image set composed of the second sample images may share no images, or may share some images, which is not limited in this embodiment of the present invention. When acquiring the first label, a first sample image may be used as the input of the first preset model, the first preset model tags the first sample image, and the output of the first preset model is taken as the first label. It should be noted that, in this embodiment of the present invention, images with existing face key point data may also be obtained directly from an open-source database as the first sample images, and the first labels may be generated from the face key point data of these images. Further, in a practical application scenario, the first preset model may also be used to determine the face key points in the first sample images, with the face key points serving as training labels; in later training, this allows the finally generated image processing model to learn how to accurately determine face key points, and accordingly, when extracting angle information, the image processing model may calculate the corresponding angle information from the determined face key points and a preset algorithm.
获取第二标签时,可以将第二样本图像作为第二预设模型的输入,通过第二预设模型实现为第二样本图像打标签,然后将该第二预设模型输出作为第二标签。当然,本发明实施例中也可以采用人工标注,本发明实施例对此不作限定。需要说明的是,基于预设模型为样本图像打标签时,可能会存在一定的误差,进而会使得最终得到的训练数据中可能存在一小部分噪声数据。通过少部分的噪声数据可以为后续训练过程提供更丰富多样的信息,进而一定程度上可以提高模型的泛化能力。When acquiring the second label, the second sample image can be used as the input of the second preset model, the second sample image can be tagged by the second preset model, and then the second preset model is output as the second label. Of course, manual annotation may also be used in the embodiment of the present invention, which is not limited in the embodiment of the present invention. It should be noted that when labeling sample images based on the preset model, there may be certain errors, which may result in a small part of noise data in the final training data. A small amount of noise data can provide more abundant and diverse information for the subsequent training process, which in turn can improve the generalization ability of the model to a certain extent.
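The tagging flow of steps B and C can be sketched as follows. The two "preset models" here are trivial placeholder functions (the disclosure does not specify their architectures), and the dictionary label layout is an illustrative assumption.

```python
def build_training_data(first_samples, second_samples, first_model, second_model):
    """Tag each sample image with the preset model assigned to its group:
    first labels come from the keypoint-based model, second labels from the
    RGB-regression model; both go into one pool of (image, label) pairs."""
    data = []
    for img in first_samples:
        data.append((img, first_model(img)))   # first label (step B, part 1)
    for img in second_samples:
        data.append((img, second_model(img)))  # second label (step B, part 2)
    return data                                # initial images + initial labels (step C)

# placeholder "preset models": in reality these are pre-trained networks
first_model = lambda img: {"source": "keypoints", "angle": float(img)}
second_model = lambda img: {"source": "rgb", "angle": float(img)}

data = build_training_data([1, 2], [3], first_model, second_model)
```

Because the two groups are tagged by different models, the resulting pool naturally interleaves the two label sources, which is what later enables the cross-training described below.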
In this embodiment of the present invention, the first preset model and the second preset model are acquired first, and the first sample images and the second sample images are annotated based on them respectively to obtain the first labels and the second labels. Compared with manual annotation, this improves annotation efficiency and reduces annotation cost to a certain extent. Meanwhile, dividing the sample images into first sample images and second sample images, annotated in different ways, facilitates subsequent cross-training, so that during training the initial model can learn from the characteristics of both kinds of training data, which to a certain extent ensures the training effect of the model. It should be noted that the image information in this embodiment of the present invention may also be other information, such as age information corresponding to a face in an image. Correspondingly, when acquiring training data, the specific age information corresponding to the face in a sample image may be used as the label of that sample image.
步骤1012、根据所述初始图像以及所述初始标签,生成目标处理模型。Step 1012: Generate a target processing model according to the initial image and the initial label.
For example, the initial images and the initial labels may be used as training data to train and obtain the target processing model. For instance, an initial image may be used as the input of a preset original model, the output of the preset original model taken as the predicted value, and the ground-truth value determined from the initial label of the initial image, for example, by taking the initial label as the ground-truth value. Next, the current loss value of the preset original model is calculated from the predicted value and the ground-truth value. If the loss value does not meet the preset requirement, the preset original model has not yet converged; accordingly, its model parameters may be adjusted and the adjusted preset original model trained further until the loss value meets the preset requirement. Finally, when the loss value of the preset original model in some round meets the preset requirement, the current preset original model may be taken as the final target processing model.
步骤1013、根据所述目标处理模型、所述初始图像以及所述初始标签,对所述初始图像进行筛选,以获取所述样本图像。Step 1013: Screen the initial image according to the target processing model, the initial image and the initial label to obtain the sample image.
本发明实施例中,通过对初始图像进行筛选,一定程度上可以实现自动剔除训练数据中的脏数据,进而提高训练数据的精度。同时,相较于人工筛选导致耗时较长,成本较大的问题,本发明实施例中通过自动筛选,一定程度上可以降低筛选成本以及筛选耗时,有利于模型的迭代更新。In the embodiment of the present invention, by screening the initial images, the dirty data in the training data can be automatically eliminated to a certain extent, thereby improving the accuracy of the training data. At the same time, compared with the problems of long time and high cost caused by manual screening, the automatic screening in the embodiment of the present invention can reduce the screening cost and screening time to a certain extent, which is beneficial to the iterative update of the model.
Optionally, during screening, for any initial image, the initial image may be used as the input of the target processing model to obtain the output of the target processing model; the output is taken as the predicted label of the initial image, the similarity between the predicted label and the initial label is calculated, and initial images whose similarity is less than a preset similarity threshold are discarded.
After an initial image is input into the target processing model, the target processing model processes the initial image to produce an output. Taking model_origin as the target processing model and dataset_origin as the set of initial images as an example, model_origin may be used to determine the predicted label of each image in dataset_origin. Further, the similarity between the predicted label and the initial label represents how close the two are: the greater the similarity, the more accurate and trustworthy the initial label can be considered; the smaller the similarity, the less trustworthy the initial label can be considered.
Correspondingly, screening may be performed according to the relationship between the similarity and a preset similarity threshold. If the similarity is less than the preset similarity threshold, the initial label of that initial image can be considered untrustworthy, so the initial image may be discarded; only initial images whose similarity is not less than the preset similarity threshold, and whose labels are therefore trustworthy, are kept as sample images. The preset similarity threshold may be set according to actual requirements, which is not limited in this embodiment of the present invention. Further, the set dataset_clean composed of the retained initial images may subsequently be used as training data to train and obtain the image processing model, which may be denoted model_clean. In this embodiment of the present invention, the similarity between the predicted label and the initial label is calculated and the initial images are screened according to that similarity, which to a certain extent ensures the accuracy of the screening operation.
Optionally, when calculating the similarity between the predicted label and the initial label, the absolute value of the difference between the predicted label and the initial label may be calculated, and the similarity between the predicted label and the initial label determined from that absolute value, with the similarity negatively correlated with the absolute value.
其中,绝对值越大,则可以说明预测标签与初始标签之间的差距越大。绝对值越小,则可以说明预测标签与初始标签之间的差距越小。因此,可以设置相似度与绝对值负相关。示例的,假设以predict_label表示预测标签,以label表示初始标签,那么预测标签与初始标签之间的差值的绝对值可以表示为abs(predict_label-label)。其中abs(*)表示对输入“*”取绝对值。进一步地,可以将-abs(predict_label-label)作为相似度。本发明实施例中,通过计算两者差值的绝对值,根据绝对值确定相似度,可以便捷的实现计算相似度,进而一定程度上可以提高计算效率。当然,也可以采用其他相似度算法进行计算,本发明实施例对此不作限定。Among them, the larger the absolute value, the larger the gap between the predicted label and the initial label. The smaller the absolute value, the smaller the gap between the predicted label and the initial label. Therefore, the similarity can be set to be negatively correlated with the absolute value. As an example, assuming that predict_label represents the predicted label and label represents the initial label, the absolute value of the difference between the predicted label and the initial label can be expressed as abs(predict_label-label). Where abs(*) means to take the absolute value of the input "*". Further, -abs(predict_label-label) can be used as similarity. In the embodiment of the present invention, by calculating the absolute value of the difference between the two, and determining the similarity according to the absolute value, the calculation of the similarity can be conveniently realized, and the calculation efficiency can be improved to a certain extent. Of course, other similarity algorithms may also be used for calculation, which is not limited in this embodiment of the present invention.
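The screening rule above, with similarity defined as -abs(predict_label - label), can be sketched as follows. The numeric labels and the identity stand-in for model_origin are illustrative assumptions; in practice the labels would be pose angles and model_origin the trained target processing model.

```python
def screen_initial_images(images, initial_labels, target_model, sim_threshold):
    """Split dataset_origin into kept (dataset_clean candidates) and
    discarded images by comparing predicted labels with initial labels."""
    kept, discarded = [], []
    for img, label in zip(images, initial_labels):
        predict_label = target_model(img)           # model_origin's output
        similarity = -abs(predict_label - label)    # higher = closer labels
        if similarity < sim_threshold:
            discarded.append((img, label))          # low-trust initial label
        else:
            kept.append((img, label))
    return kept, discarded

# toy run: identity "model", so similarity just measures label error
model_origin = lambda img: float(img)
kept, discarded = screen_initial_images(
    [1.0, 2.0, 3.0], [1.05, 2.5, 3.0], model_origin, sim_threshold=-0.2)
```

With the threshold at -0.2, the middle image is discarded because its initial label differs from the prediction by 0.5, while the other two, whose labels agree closely with the predictions, are retained; the discarded list is what a human reviewer would later re-examine for erroneous screening.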
It should be noted that, in this embodiment of the present invention, manual review may also be used to recover, from the discarded initial images, those that were discarded due to erroneous screening, and add them back to the training data. This eliminates, to a certain extent, the reduction of training data caused by erroneous screening. Meanwhile, compared with relying directly on manual screening of the training data, this embodiment of the present invention screens automatically first and then manually re-examines the discarded initial images, which reduces the manual workload and improves the screening speed while also improving the screening accuracy.
可选的,上述根据所述样本图像以及所述样本图像的标签,对初始模型进行训练,以生成图像处理模型的操作,可以包括以下步骤:Optionally, the above-mentioned operation of training an initial model to generate an image processing model according to the sample image and the label of the sample image may include the following steps:
步骤1021、将所述样本图像调整为多个预设尺寸下的样本图像。Step 1021: Adjust the sample image to a sample image with multiple preset sizes.
本发明实施例中,预设尺寸可以是根据实际需求设置的,示例的,预设尺寸可以包括64*64,48*48,40*40,等等。进一步地,在一种实现方式中,可以将样本图像划分为N组,其中,N可以为预设尺寸的数量。接着,可以为不同组设定不同的预设尺寸,最后将每个组中的样本图像的尺寸调整为该组对应的预设尺寸,进而得到多个预设尺寸下的样本图像。当然,也可以采用其他方式进行调整,本发明实施例对此不作限定。In this embodiment of the present invention, the preset size may be set according to actual requirements. By way of example, the preset size may include 64*64, 48*48, 40*40, and the like. Further, in an implementation manner, the sample images may be divided into N groups, where N may be the number of preset sizes. Next, different preset sizes may be set for different groups, and finally the size of the sample images in each group is adjusted to the preset size corresponding to the group, thereby obtaining sample images in multiple preset sizes. Of course, other manners may also be used for adjustment, which is not limited in this embodiment of the present invention.
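The grouping-and-resizing of step 1021 can be sketched as follows. The round-robin group assignment and the stub resize function are assumptions for illustration; the disclosure leaves the grouping strategy open, and in practice an image library's resize would produce real pixel data.

```python
def make_multiscale_groups(samples, preset_sizes, resize):
    """Divide the samples into N groups (N = number of preset sizes) and
    resize every image in a group to that group's preset size."""
    groups = {size: [] for size in preset_sizes}
    for i, img in enumerate(samples):
        size = preset_sizes[i % len(preset_sizes)]  # round-robin group assignment
        groups[size].append(resize(img, size))
    return groups

# stub resize: records (image id, target size) instead of real pixels
stub_resize = lambda img, size: (img, size)
groups = make_multiscale_groups(list(range(6)), [64, 48, 40], stub_resize)
```

Each group then drives one training run of the initial model, yielding one image processing model per preset input size as described in step 1022.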
步骤1022、对于各个所述预设尺寸下的样本图像,根据所述样本图像以及所述样本图像的标签,对所述初始模型进行训练,以获取各个所述预设尺寸下的图像处理模型。Step 1022: For each sample image in the preset size, train the initial model according to the sample image and the label of the sample image, so as to obtain an image processing model in each preset size.
本步骤中，样本图像的尺寸可以表征模型对应的输入尺寸(input size)，以各个预设尺寸下的样本图像，分别对初始模型进行训练，进而可以得到对应不同输入尺寸的图像处理模型，其中，对应不同输入尺寸的图像处理模型即为不同预设尺寸下的图像处理模型。进一步地，对应的输入尺寸越大，模型的计算量往往会越高。示例的，以模型运行时的乘法累加(Multiply Accumulate，MACC)运算量表征模型计算量时，对应的输入尺寸依次为40*40、48*48、64*64的3版图像处理模型，其各自对应的MACC计算量可以分别是3.8兆(Million，M)、4.8M、8.7M。本发明实施例中，生成对应不同输入尺寸的图像处理模型，可以实现为用户提供不同计算量的图像处理模型，进而方便用户按照实际需求选择使用。例如，在图像处理模型应用于美颜系统的场景下，假设美颜系统设置的要求为模型的计算量在10M以内，那么可以在这3版模型中任选一版模型使用。或者，可以直接选择对应的计算量最小的图像处理模型，以最大程度的降低计算量，提高运行速度，以使美颜系统更加流畅。In this step, the size of the sample images can represent the input size of the model. Training the initial model separately with the sample images at each preset size yields image processing models corresponding to different input sizes; the image processing models corresponding to different input sizes are precisely the image processing models at different preset sizes. Further, the larger the input size, the higher the computational cost of the model tends to be. For example, when the computational cost is measured by the number of multiply-accumulate (MACC) operations performed when the model runs, the three versions of the image processing model with input sizes 40*40, 48*48 and 64*64 may have MACC counts of 3.8 million (M), 4.8M and 8.7M, respectively. In the embodiment of the present invention, generating image processing models corresponding to different input sizes makes it possible to provide users with models of different computational costs, so that users can conveniently choose according to actual needs. For example, in a scenario where the image processing model is applied to a beauty (face retouching) system, if the system requires the model's computational cost to stay within 10M, any one of the three model versions may be used. Alternatively, the image processing model with the smallest computational cost can be selected directly, minimizing computation and increasing running speed so that the beauty system runs more smoothly.
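The model selection described above can be sketched as follows. This is an illustrative Python snippet, not part of the embodiment; the MACC figures are the three values given in the text, and the helper names (`models_within_budget`, `cheapest_model`) are hypothetical.

```python
# (input_size, MACC count in millions) for the three model versions in the text
MODELS = [(40, 3.8), (48, 4.8), (64, 8.7)]

def models_within_budget(budget_m):
    """Return every model version whose MACC count fits the given budget (in M)."""
    return [(size, macc) for size, macc in MODELS if macc <= budget_m]

def cheapest_model():
    """Pick the model version with the smallest compute cost (fastest to run)."""
    return min(MODELS, key=lambda m: m[1])

# A beauty system with a 10M MACC limit can use any of the three versions,
# or simply the cheapest one for maximum smoothness.
print(models_within_budget(10.0))  # all three versions fit
print(cheapest_model())            # (40, 3.8)
```

With a tighter budget, e.g. `models_within_budget(4.0)`, only the 40*40 version remains eligible.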
本发明实施例中，通过将样本图像调整为多个预设尺寸下的样本图像，对于各个预设尺寸下的样本图像，根据样本图像以及样本图像的标签，对初始模型进行训练，以获取各个预设尺寸下的图像处理模型，即，生成对应不同输入尺寸的图像处理模型，为用户提供多种选择，使得后续应用时用户可以根据实际需求选择适合设备能力的图像处理模型，进而提高操作的灵活性。In the embodiment of the present invention, the sample images are resized to sample images at a plurality of preset sizes, and for the sample images at each preset size, the initial model is trained according to the sample images and their labels to obtain an image processing model at each preset size. That is, image processing models corresponding to different input sizes are generated, giving users multiple choices, so that in subsequent applications the user can select the image processing model suited to the device's capability according to actual needs, thereby improving operational flexibility.
可选的，本发明实施例中可以对各个预设尺寸下的样本图像，分别执行一次根据该预设尺寸下的样本图像以及所述样本图像的标签进行训练的操作，以实现生成该预设尺寸下的图像处理模型。Optionally, in this embodiment of the present invention, one training operation may be performed for the sample images at each preset size, according to the sample images at that preset size and their labels, so as to generate the image processing model at that preset size.
具体的,样本图像可以包括第一样本图像及第二样本图像,第一样本图像的第一标签的获取方式与第二样本图像的第二标签的获取方式可以不同。上述根据样本图像以及样本图像的标签,对初始模型进行训练的步骤,可以包括:Specifically, the sample image may include a first sample image and a second sample image, and the manner of acquiring the first label of the first sample image and the manner of acquiring the second label of the second sample image may be different. The above steps of training the initial model according to the sample images and the labels of the sample images may include:
步骤10221、将所述第一样本图像划分为多个第一样本组,以及,将所述第二样本图像划分为多个第二样本组。Step 10221: Divide the first sample image into multiple first sample groups, and divide the second sample image into multiple second sample groups.
示例的,可以将第一样本图像等分为多个图像组,进而得到多个第一样本组。将第二样本图像等分为多个图像组,进而得到多个第二样本组。当然,也可以随机进行分组,本发明实施例对此不作限定。For example, the first sample image may be equally divided into multiple image groups, thereby obtaining multiple first sample groups. The second sample image is equally divided into a plurality of image groups, thereby obtaining a plurality of second sample groups. Of course, grouping may also be performed randomly, which is not limited in this embodiment of the present invention.
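The grouping step can be sketched as follows (an illustrative snippet; the function name `split_into_groups` and the contiguous-chunk strategy are assumptions, since the text only requires equal or random division):

```python
import random

def split_into_groups(images, num_groups, shuffle=False):
    """Divide a list of sample images into num_groups roughly equal groups.
    With shuffle=True the grouping is random, as the text allows."""
    items = list(images)
    if shuffle:
        random.shuffle(items)
    # Slice the list into contiguous chunks of (nearly) equal size.
    group_size = (len(items) + num_groups - 1) // num_groups
    return [items[i:i + group_size] for i in range(0, len(items), group_size)]

first_groups = split_into_groups(range(12), 4)   # 4 groups of 3 first-sample images
second_groups = split_into_groups(range(12), 4)  # same for the second samples
print([len(g) for g in first_groups])            # [3, 3, 3, 3]
```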
步骤10222、根据所述第一样本组中的第一样本图像及所述第一样本图像的第一标签，以及所述第二样本组中的第二样本图像及所述第二样本图像的第二标签，对所述初始模型进行交叉训练。Step 10222: Cross-train the initial model according to the first sample images in the first sample groups and the first labels of the first sample images, and the second sample images in the second sample groups and the second labels of the second sample images.
本发明实施例中，由于第一样本图像和第二样本图像具有不同的标签，即，训练数据是分开的，本发明实施例中通过将第一样本图像以及第二样本图像分别划分为多个第一样本组以及多个第二样本组，联合第一样本组以及第二样本组进行交叉训练，这样，可以使初始模型在训练过程中较为均衡的基于两种训练样本进行学习，进而一定程度上可以使得最终能够结合两种图像信息获取方式得到最终的图像处理模型，提高最终的训练效果。当然，也可以采用其他方式训练。例如，第一样本图像以及第二样本图像可以为相同图像，即，针对同一样本图像同时设置第一标签以及第二标签。相应地，这种情况下，可以直接使用样本图像进行训练，本发明实施例对此不作限定。In the embodiment of the present invention, since the first sample images and the second sample images have different labels, i.e., the training data are separate, the first sample images and the second sample images are respectively divided into a plurality of first sample groups and a plurality of second sample groups, and the first sample groups and second sample groups are jointly used for cross-training. In this way, the initial model can learn from the two kinds of training samples in a relatively balanced manner during training, so that, to a certain extent, the final image processing model combines the two image information acquisition methods, improving the final training effect. Of course, other training methods can also be used. For example, the first sample image and the second sample image may be the same image, i.e., both the first label and the second label are set for the same sample image. Correspondingly, in this case, the sample images may be used directly for training, which is not limited in this embodiment of the present invention.
可选的，在划分第一样本组以及第二样本组时，可以设置第一样本组中包含的第一样本图像的数量与第二样本组中包含的第二样本图像的数量相同。进一步地，上述根据所述第一样本组中的第一样本图像及所述第一样本图像的第一标签，以及所述第二样本组中的第二样本图像及所述第二样本图像的第二标签，对所述初始模型进行交叉训练的步骤，可以包括：Optionally, when dividing the first sample groups and the second sample groups, the number of first sample images contained in each first sample group can be set to be the same as the number of second sample images contained in each second sample group. Further, the above step of cross-training the initial model according to the first sample images in the first sample groups and their first labels, and the second sample images in the second sample groups and their second labels, may include:
步骤10222a、根据一个所述第一样本组中的第一样本图像及所述第一样本图像的第一标签,对所述初始模型进行训练,以更新所述初始模型的模型参数。Step 10222a: Train the initial model according to a first sample image in a first sample group and a first label of the first sample image to update model parameters of the initial model.
示例的，可以从第一样本组中选择一个未使用过的第一样本组，然后将选择的第一样本组中的第一样本图像作为初始模型的输入，基于初始模型的输出以及第一标签确定损失值，如果该损失值不满足预设要求，则可以更新该初始模型的模型参数。例如，可以采用预设的随机梯度下降法进行模型参数调整，以实现更新。Illustratively, an unused first sample group may be selected from the first sample groups, and the first sample images in the selected group are used as the input of the initial model. A loss value is determined based on the output of the initial model and the first labels; if the loss value does not meet the preset requirement, the model parameters of the initial model can be updated. For example, a preset stochastic gradient descent method may be used to adjust the model parameters to achieve the update.
步骤10222b、在更新所述初始模型的模型参数之后，根据所述第二样本组中的第二样本图像及所述第二样本图像的第二标签，对所述初始模型进行训练，以更新所述初始模型的模型参数，并在更新所述模型参数之后重新执行所述根据一个所述第一样本组中的第一样本图像及所述第一样本图像的第一标签，对所述初始模型进行训练的步骤。Step 10222b: After updating the model parameters of the initial model, train the initial model according to the second sample images in a second sample group and the second labels of the second sample images to update the model parameters of the initial model, and after updating the model parameters, re-execute the step of training the initial model according to the first sample images in a first sample group and the first labels of the first sample images.
本步骤中，在更新模型参数之后，可以继续使用第二样本组进行训练，以实现第一样本组以及第二样本组之间的交叉训练。示例的，可以从第二样本组中选择一个未使用过的第二样本组，然后将选择的第二样本组中的第二样本图像作为初始模型的输入，基于初始模型的输出以及第二标签确定损失值，如果该损失值不满足预设要求，则可以更新该初始模型的模型参数。例如，可以采用预设的随机梯度下降法进行模型参数调整，以实现更新。相应地，在基于第二样本组进行参数更新之后，可以重复上述基于第一样本组进行训练更新的过程，以实现循环更新训练。最终，可以在初始模型的损失值满足预设要求的情况下，结束训练。In this step, after the model parameters are updated, the second sample groups may be used to continue training, so as to realize cross-training between the first sample groups and the second sample groups. Illustratively, an unused second sample group may be selected from the second sample groups, and the second sample images in the selected group are used as the input of the initial model; a loss value is determined based on the output of the initial model and the second labels. If the loss value does not meet the preset requirement, the model parameters of the initial model can be updated, for example, by a preset stochastic gradient descent method. Correspondingly, after the parameters are updated based on a second sample group, the above process of training and updating based on a first sample group may be repeated to achieve cyclic update training. Finally, the training can be ended when the loss value of the initial model meets the preset requirement.
本发明实施例中，基于包含相同数量的第一样本组以及第二样本组，每使用一个第一样本组对初始模型进行训练更新之后，交替使用一个第二样本组对初始模型进行训练更新，通过循环交替实现交叉训练。由于第一样本组以及第二样本组中包含的图像数量相同，因此一定程度上可以提高每次交叉训练时训练数据的平衡性，进而确保交叉训练效果。同时，基于第一样本组以及第二样本组不断交替训练更新模型参数，可以为模型增加丰富的可学习信息以及使模型在更新过程中均衡的被两种训练样本优化，进而一定程度上可以提高模型的泛化能力以及加快模型的收敛效率。In the embodiment of the present invention, based on equal numbers of first sample groups and second sample groups, each time the initial model is trained and updated with one first sample group, one second sample group is used in turn, achieving cross-training by cyclic alternation. Since the first sample groups and the second sample groups contain the same number of images, the balance of the training data in each round of cross-training can be improved to a certain extent, ensuring the cross-training effect. Meanwhile, continuously alternating the parameter updates between the first sample groups and the second sample groups adds rich learnable information to the model and lets the two kinds of training samples optimize the model in a balanced way during updating, which to a certain extent improves the generalization ability of the model and speeds up its convergence.
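The alternating cross-training loop described above can be sketched as follows. The snippet is illustrative: `sgd_update` is a placeholder for a real loss computation and stochastic-gradient update, and here only records which label source was used, to show the strict alternation.

```python
def sgd_update(params, group):
    """Placeholder for one training update on one sample group.
    A real implementation would compute a loss from the model output and
    the group's labels and apply stochastic gradient descent."""
    params["history"].append(group["label_kind"])
    return params

def cross_train(params, first_groups, second_groups):
    """Alternate one update on a first-sample group with one update on a
    second-sample group, so both label sources are used in a balanced way."""
    assert len(first_groups) == len(second_groups)  # equal group counts
    for g1, g2 in zip(first_groups, second_groups):
        params = sgd_update(params, g1)  # first-label group
        params = sgd_update(params, g2)  # then second-label group
    return params

params = {"history": []}
first = [{"label_kind": "keypoint"}] * 2   # e.g. labels from face key points
second = [{"label_kind": "color"}] * 2     # e.g. labels from color channel values
cross_train(params, first, second)
print(params["history"])  # ['keypoint', 'color', 'keypoint', 'color']
```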
需要说明的是，本发明实施例中提及的初始模型以及图像处理模型的结构相同，但是各个层中的模型参数可以不同，通过训练初始模型以生成图像处理模型的过程即为调整模型参数的过程。It should be noted that the initial model and the image processing model mentioned in the embodiments of the present invention have the same structure, but the model parameters in each layer may differ; the process of training the initial model to generate the image processing model is precisely the process of adjusting the model parameters.
在一种实现方式中，初始模型包括的单个处理分支，可以具体包括卷积层、激活函数层、并行的最大池化层以及平均池化层、拼接层、融合层以及用于对所述融合层的输出进行处理的处理层。In one implementation, the single processing branch included in the initial model may specifically include a convolution layer, an activation function layer, parallel max pooling and average pooling layers, a concatenation layer, a fusion layer, and a processing layer for processing the output of the fusion layer.
其中，卷积层可以用于通过卷积操作提取模型输入的特征。设置激活函数层可以为模型添加非线性因素，通过设置激活函数层可以避免模型仅为单纯的线性组合，进而一定程度上可以提高模型的表达能力。进一步地，通过设置池化层可以在保留主要的特征的同时去除冗余信息，降低数据的大小。其中，激活函数层所使用的激活函数的具体类型可以根据实际需求设置。可选的，本发明实施例中的激活函数层可以均为线性整流函数(Rectified Linear Unit，Relu)激活函数层。这样，通过将激活函数层均设置为Relu激活函数层，可以在保证模型精度的情况下，避免模型结构中使用过多的双曲正切(Tanh)激活函数导致的模型存在梯度饱和、训练效率低的问题，以及避免模型实际运行速度达不到理论速度的问题，进而可以确保模型实际运行时长与模型的理论计算量成正比，进一步避免理论计算量较低，但是实际运行时长却更长的问题。当然，也可以将部分激活函数层设置为Tanh激活函数层，将部分激活函数层设置为Relu激活函数层，等等，本发明实施例对此不作限定。The convolution layer can be used to extract features from the model input through convolution operations. The activation function layers add nonlinear factors to the model; setting them prevents the model from being a mere linear combination, which to a certain extent improves its expressive power. Further, the pooling layers remove redundant information while retaining the main features, reducing the data size. The specific type of activation function used by the activation function layers can be set according to actual requirements. Optionally, the activation function layers in this embodiment of the present invention may all be Rectified Linear Unit (ReLU) activation function layers. By setting all activation function layers to ReLU, the model accuracy can be maintained while avoiding the gradient saturation and low training efficiency caused by using too many hyperbolic tangent (Tanh) activation functions in the model structure, and avoiding the problem that the actual running speed of the model falls short of the theoretical speed. This ensures that the actual running time of the model is proportional to its theoretical computational cost, further avoiding the situation where the theoretical cost is low but the actual running time is long. Certainly, some activation function layers may also be set as Tanh activation function layers and others as ReLU activation function layers, and so on, which is not limited in this embodiment of the present invention.
当然，初始模型还可以包含其他层，示例的，图2是本发明实施例提供的一种模型结构的示意图，如图2所示，Input(3*40*40)表示包含3个颜色通道、尺寸为40*40的输入图像，SeparableConvBnRelu表示深度可分卷积层、批量归一化(Batch Normalization，BN)层、Relu激活函数层。Avgpool表示平均池化层、maxpool表示最大池化层。concat表示拼接层，conv2d表示二维卷积操作层。其中，拼接层可以用于对平均池化层的输出以及最大池化层的输出进行拼接，conv2d可以用于对拼接层的拼接结果进行二维卷积操作，这样，可以避免仅拼接平均池化层的输出以及最大池化层的输出，导致后续的深度可分卷积层无法综合提取到混合后的信息的问题，进而可以确保后续的处理效果。Of course, the initial model may also include other layers. As an example, FIG. 2 is a schematic diagram of a model structure provided by an embodiment of the present invention. As shown in FIG. 2, Input(3*40*40) denotes an input image with 3 color channels and size 40*40; SeparableConvBnRelu denotes a depthwise separable convolution layer, a batch normalization (BN) layer and a ReLU activation function layer. Avgpool denotes an average pooling layer and maxpool a max pooling layer; concat denotes a concatenation layer and conv2d a two-dimensional convolution layer. The concatenation layer can be used to concatenate the output of the average pooling layer and the output of the max pooling layer, and conv2d can be used to perform a two-dimensional convolution on the concatenation result. In this way, the problem that merely concatenating the outputs of the average pooling layer and the max pooling layer would prevent the subsequent depthwise separable convolution layer from comprehensively extracting the mixed information can be avoided, thereby ensuring the subsequent processing effect.
进一步地，以Fusioni表示第i个融合层，初始模型中包括的融合层可以为Fusion1，Fusion2，Fusion3。以Stagei_output表示第i个处理层，初始模型中包括的处理层可以为Stage1_output，Stage2_output，Stage3_output。进一步地，图2中的(16,3,1)，(24,3,1)，(48,3,1)，(96,3,1)中的"1"可以表示卷积操作的步长，"3"可以表示所使用的卷积核的大小，"16"、"24"、"48"以及"96"可以分别表示卷积核的数量。图2中的conv2d(24,1,1)表示以1作为步长，基于24个大小为1×1的卷积核进行二维卷积操作，conv2d(48,1,1)表示以1作为步长，基于48个大小为1×1的卷积核进行二维卷积操作。图2中的(2,2)表示池化操作每次处理的区域大小。由于随着池化层的不断处理，特征图的大小会相应降低，相应地，本发明实施例通过设置各个深度可分卷积层中的卷积核数量依次递增，这样，一定程度上可以在确保计算量不会过大的同时，确保每次卷积操作能够提取到足够的特征信息。Further, Fusioni denotes the i-th fusion layer; the fusion layers included in the initial model may be Fusion1, Fusion2 and Fusion3. Stagei_output denotes the i-th processing layer; the processing layers included in the initial model may be Stage1_output, Stage2_output and Stage3_output. Further, in (16,3,1), (24,3,1), (48,3,1) and (96,3,1) in FIG. 2, the "1" may denote the stride of the convolution operation, the "3" the size of the convolution kernels used, and "16", "24", "48" and "96" the number of convolution kernels, respectively. conv2d(24,1,1) in FIG. 2 denotes a two-dimensional convolution with stride 1 based on 24 kernels of size 1×1, and conv2d(48,1,1) a two-dimensional convolution with stride 1 based on 48 kernels of size 1×1. The (2,2) in FIG. 2 denotes the size of the region processed by each pooling operation. Since the size of the feature map decreases as the pooling layers process it, in the embodiment of the present invention the number of convolution kernels in the successive depthwise separable convolution layers is set to increase, so that, to a certain extent, each convolution operation can extract sufficient feature information while the computational cost does not become excessive.
进一步地，图3是本发明实施例提供的一种融合层的示意图，如图3所示，Stream1_stagei表示平均池化层输入至第i个融合层的输入数据，Stream2_stagei表示最大池化层输入至第i个融合层的输入数据。SeparableConvBnRelu表示深度可分卷积层、BN层、Relu激活函数层。Avgpool表示平均池化层。Maxpool表示最大池化层。"Elements multiply"表示用于进行元素相乘的层。Further, FIG. 3 is a schematic diagram of a fusion layer provided by an embodiment of the present invention. As shown in FIG. 3, Stream1_stagei denotes the input data fed from the average pooling layer to the i-th fusion layer, and Stream2_stagei the input data fed from the max pooling layer to the i-th fusion layer. SeparableConvBnRelu denotes a depthwise separable convolution layer, a BN layer and a ReLU activation function layer. Avgpool denotes an average pooling layer and Maxpool a max pooling layer. "Elements multiply" denotes the layer used for element-wise multiplication.
本发明实施例中，通过设计单个处理分支、融合层以及多个并行的池化层，这样使得后续使用该模型时，模型可以基于融合层对多个并行的池化层输出的多种信息进行融合，进而一定程度上可以提高处理精度。In the embodiment of the present invention, by designing a single processing branch, fusion layers and multiple parallel pooling layers, when the model is subsequently used it can fuse, based on the fusion layers, the multiple kinds of information output by the parallel pooling layers, which to a certain extent improves the processing accuracy.
可选的，本发明实施例的初始模型中的卷积层可以用于进行深度可分卷积操作。即，初始模型中的卷积层可以均为深度可分卷积层。进一步地，深度可分卷积层可以用于执行深度可分卷积操作，深度可分卷积操作可以包括空间/深度卷积(Depthwise Convolution)以及通道卷积(Pointwise Convolution)两部分。具体执行时，可以先对特征图的通道分别进行depthwise convolution，并对输出进行拼接，然后，使用单位卷积核进行pointwise convolution。示例的，图4是本发明实施例提供的一种深度可分卷积操作的示意图，如图4所示，可以先执行Depthwise_Conv(3,1)，然后执行Pointwise_Conv(1,1)，以实现3×3的卷积运算。其中，Depthwise_Conv表示空间卷积，Pointwise_Conv表示通道卷积。具体的执行过程可以为先通过一个(3,1)的空间卷积，最后再通过一个(1,1)的通道卷积。Optionally, the convolution layers in the initial model of the embodiment of the present invention may be used to perform depthwise separable convolution operations; that is, the convolution layers in the initial model may all be depthwise separable convolution layers. A depthwise separable convolution consists of two parts: a spatial (depthwise) convolution and a channel (pointwise) convolution. In execution, a depthwise convolution is first applied to each channel of the feature map separately and the outputs are concatenated; then a pointwise convolution with unit-size kernels is applied. Illustratively, FIG. 4 is a schematic diagram of a depthwise separable convolution operation provided by an embodiment of the present invention. As shown in FIG. 4, Depthwise_Conv(3,1) may be executed first and then Pointwise_Conv(1,1), to realize a 3×3 convolution, where Depthwise_Conv denotes the spatial convolution and Pointwise_Conv the channel convolution. The specific execution process may be to first pass through a (3,1) spatial convolution, and finally through a (1,1) channel convolution.
相较于直接使用标准卷积操作的方式,本发明实施例中,通过将标准卷积拆分为两部分,可以实现拆分空间维度和通道维度的相关性,这样,可以减少卷积计算所需要的参数个数,进而一定程度上可以降低模型计算量,提高模型计算效率以及计算速度。Compared with the method of directly using the standard convolution operation, in the embodiment of the present invention, by dividing the standard convolution into two parts, the correlation between the spatial dimension and the channel dimension can be split, so that the convolution calculation can be reduced. The number of parameters required can reduce the model calculation amount to a certain extent, and improve the model calculation efficiency and calculation speed.
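The parameter saving can be checked with a short calculation (illustrative; bias terms are ignored, and the 24-to-48-channel example is chosen only to match the kernel counts mentioned for FIG. 2):

```python
def standard_conv_params(k, c_in, c_out):
    """Parameter count of a standard k x k convolution (ignoring bias)."""
    return k * k * c_in * c_out

def separable_conv_params(k, c_in, c_out):
    """Depthwise (one k x k kernel per input channel) plus pointwise (1 x 1)
    parameter count of a depthwise separable convolution (ignoring bias)."""
    depthwise = k * k * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise

# Example: a 3x3 convolution from 24 channels to 48 channels.
std = standard_conv_params(3, 24, 48)    # 10368 parameters
sep = separable_conv_params(3, 24, 48)   # 216 + 1152 = 1368 parameters
print(std, sep, round(std / sep, 1))     # roughly 7.6x fewer parameters
```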
可选的，本发明实施例中的处理层可以包括第一处理层以及第二处理层，其中，第一处理层可以包括全连接层(Fully Connected layers，FC)以及激活函数层，第二处理层可以包括深度可分卷积层。其中，FC可以用于对之前层提取到的特征重新通过权值矩阵组装成完整的特征图，以及在模型中起到分类器的作用。相较于直接使用标准卷积操作的方式，本发明实施例中，通过在处理层中设置深度可分卷积层，可以减少卷积计算所需要的参数个数，进而一定程度上可以降低计算量，提高计算效率。Optionally, the processing layer in this embodiment of the present invention may include a first processing layer and a second processing layer, wherein the first processing layer may include a fully connected (FC) layer and an activation function layer, and the second processing layer may include a depthwise separable convolution layer. The FC layer can be used to reassemble the features extracted by previous layers, through a weight matrix, into a complete feature representation, and acts as a classifier in the model. Compared with directly using standard convolution operations, setting a depthwise separable convolution layer in the processing layer reduces the number of parameters required for the convolution computation, which to a certain extent reduces the computational cost and improves computational efficiency.
示例的，图5是本发明实施例提供的一种处理层的示意图，如图5所示，第i个处理层的输入可以为对应的第i个融合层的输出"Fusioni_output"。处理层的输入可以对应输入至第一处理层01以及第二处理层02。其中，SeparableConV(10,3,1)表示以1作为步长，基于10个大小为3×3的卷积核进行深度可分卷积。进一步地，由于Tanh激活函数的输出范围为[-1,1]，而Relu激活函数的输出范围为[0,+∞)，因此，本发明实施例中在第一处理层中保留Tanh激活函数，一定程度上可以确保最终得到角度信息的范围中存在负值以及正值，进而可以扩大角度信息的范围。进一步地，基于第一处理层中包括的各个层的输出即可进行角度计算，得到一种方式下的角度信息，例如，得到根据图像中人脸关键点获取的第一角度信息。具体的，可以基于角度计算层中预设的角度计算方式实现角度计算，或者是直接将第一处理层中3个模块的输出作为角度信息。进一步地，第二处理层的处理结果可以表征另一种方式下提取到的角度信息，例如，得到根据图像中像素点的颜色通道值获取的第二角度信息。For example, FIG. 5 is a schematic diagram of a processing layer provided by an embodiment of the present invention. As shown in FIG. 5, the input of the i-th processing layer may be the output "Fusioni_output" of the corresponding i-th fusion layer, fed to both the first processing layer 01 and the second processing layer 02. SeparableConV(10,3,1) denotes a depthwise separable convolution with stride 1 based on 10 kernels of size 3×3. Further, since the output range of the Tanh activation function is [-1,1] while that of the ReLU activation function is [0,+∞), retaining the Tanh activation function in the first processing layer ensures, to a certain extent, that both negative and positive values exist in the range of the resulting angle information, thereby widening the range of the angle information. Further, the angle calculation can be performed based on the outputs of the layers included in the first processing layer, yielding angle information obtained in one way, for example, the first angle information obtained from the face key points in the image. Specifically, the angle calculation may be implemented based on a preset angle calculation method in an angle calculation layer, or the outputs of the three modules in the first processing layer may be used directly as the angle information. Further, the processing result of the second processing layer can represent the angle information extracted in another way, for example, the second angle information obtained from the color channel values of the pixels in the image.
需要说明的是，在本发明实施例的另一种实现方式中，初始模型还可以设置为包括至少两个处理分支，其中，至少两个处理分支可以包括第一处理分支及第二处理分支，第一处理分支及第二处理分支均包括卷积层、激活函数层以及池化层。初始模型还包括用于对第一处理分支及第二处理分支的输出进行融合的融合层以及用于对融合层的输出进行处理的处理层。进一步地，融合层中包括卷积层，融合层中的卷积层以及第一处理分支及所述第二处理分支中的卷积层用于进行深度可分卷积操作。处理层包括第一处理层以及第二处理层；第一处理层包括全连接层以及激活函数层，第二处理层包括深度可分卷积层。It should be noted that, in another implementation of the embodiment of the present invention, the initial model may also be set to include at least two processing branches, wherein the at least two processing branches may include a first processing branch and a second processing branch, each including a convolution layer, an activation function layer and a pooling layer. The initial model further includes a fusion layer for fusing the outputs of the first and second processing branches, and a processing layer for processing the output of the fusion layer. Further, the fusion layer includes a convolution layer, and the convolution layers in the fusion layer and in the first and second processing branches are used to perform depthwise separable convolution operations. The processing layer includes a first processing layer and a second processing layer; the first processing layer includes a fully connected layer and an activation function layer, and the second processing layer includes a depthwise separable convolution layer.
图6是本发明实施例提供的一种图像处理方法的步骤流程图,该方法可以应用于处理设备,如图6所示,所述方法可以包括:FIG. 6 is a flowchart of steps of an image processing method provided by an embodiment of the present invention. The method can be applied to a processing device. As shown in FIG. 6 , the method can include:
步骤201、将待处理图像作为预设的图像处理模型的输入,以获取所述图像处理模型的输出。Step 201: Use the image to be processed as the input of a preset image processing model to obtain the output of the image processing model.
本发明实施例中,处理设备可以为手机、云台相机等具备拍摄能力、处理能力的设备。处理设备上可以部署有预设的图像处理模型。待处理图像可以是根据需要提取图像信息的图像。示例的,待处理图像可以通过处理设备拍摄得到的图像,或者是拍摄到的视频中的图像。In the embodiment of the present invention, the processing device may be a mobile phone, a pan-tilt camera, and other devices with shooting capability and processing capability. A preset image processing model can be deployed on the processing device. The image to be processed may be an image from which image information is extracted as required. For example, the image to be processed may be an image captured by a processing device, or an image in a captured video.
步骤202、根据所述图像处理模型的输出,获取所述待处理图像的图像信息;其中,所述图像处理模型是根据上述模型生成方法生成的。Step 202: Acquire image information of the image to be processed according to the output of the image processing model; wherein the image processing model is generated according to the above model generation method.
由于预设的图像处理模型是通过获取以多种图像信息获取方式标注的训练数据进行训练得到的，这样，以多种图像信息获取方式标注标签时可以避免单一标注方式的局限性造成的样本不足的问题，进而可以确保训练数据的多样性以及充足性，一定程度上可以提高最终生成的图像处理模型的泛化能力，从而提高使用该图像处理模型对待处理图像进行提取时，提取到的图像信息的准确性。Since the preset image processing model is trained on training data labeled by multiple image information acquisition methods, labeling with multiple methods avoids the sample insufficiency caused by the limitations of any single labeling method, ensuring the diversity and sufficiency of the training data. This improves, to a certain extent, the generalization ability of the finally generated image processing model, and thus the accuracy of the image information extracted when the model is used on the image to be processed.
可选的，图像信息可以包括图像中人脸的角度信息，图像处理模型的输出可以包括根据图像中人脸关键点获取的第一角度信息，以及根据图像中像素点的颜色通道值获取的第二角度信息。相应地，在根据图像处理模型的输出，获取待处理图像的图像信息时，可以是将第二角度信息确定为待处理图像的角度信息。即，应用时仅使用根据图像中像素点的颜色通道值获取的第二角度信息。Optionally, the image information may include angle information of a face in the image, and the output of the image processing model may include first angle information obtained from the face key points in the image and second angle information obtained from the color channel values of the pixels in the image. Correspondingly, when acquiring the image information of the image to be processed according to the output of the image processing model, the second angle information may be determined as the angle information of the image to be processed; that is, only the second angle information obtained from the color channel values of the pixels is used in application.
由于根据图像中像素点的颜色通道值获取的第二角度信息的准确性往往较高，因此，通过采用根据图像中像素点的颜色通道值获取的第二角度信息作为待处理图像的角度信息，一定程度上可以确保待处理图像的角度信息的准确性。当然，实际应用场景中，也可以是显示第一角度信息以及第二角度信息，将用户选择的角度信息作为待处理图像的角度信息。或者是，结合第一角度信息以及第二角度信息计算待处理图像的角度信息，例如，将第一角度信息以及第二角度信息的均值作为待处理图像的角度信息，本发明实施例对此不作限定。Since the second angle information obtained from the color channel values of the pixels tends to be more accurate, using it as the angle information of the image to be processed ensures, to a certain extent, the accuracy of that angle information. Of course, in practical application scenarios, both the first angle information and the second angle information may be displayed, and the angle information selected by the user taken as the angle information of the image to be processed. Alternatively, the angle information of the image to be processed may be computed from both, for example as the mean of the first angle information and the second angle information, which is not limited in this embodiment of the present invention.
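The choice between the two angle outputs can be sketched as follows (illustrative; the function name and the `strategy` parameter are assumptions — the text allows using the second angle directly, a user-selected angle, or the mean of both):

```python
def final_angle(first_angle, second_angle, strategy="second"):
    """Combine the two angle outputs of the model: use the (typically more
    accurate) second angle by default, or the mean of both."""
    if strategy == "second":
        return second_angle
    if strategy == "mean":
        return (first_angle + second_angle) / 2.0
    raise ValueError("unknown strategy")

print(final_angle(10.0, 12.0))           # 12.0 (second angle only)
print(final_angle(10.0, 12.0, "mean"))   # 11.0 (average of both)
```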
可选的,预设的图像处理模型可以包括对应不同预设尺寸的图像处理模型。相应地,将待处理图像作为预设的图像处理模型的输入,以获取图像处理模型的输出时:可以先确定与所述处理设备的处理性能相匹配的预设尺寸;其中,所述处理性能越高,所述相匹配的预设尺寸越大。然后,可以将所述待处理图像作为目标图像处理模型的输入,以获取所述目标图像处理模型的输出;所述目标图像处理模型为对应所述相匹配的预设尺寸的图像处理模型。Optionally, the preset image processing models may include image processing models corresponding to different preset sizes. Correspondingly, when the image to be processed is used as the input of the preset image processing model to obtain the output of the image processing model: a preset size that matches the processing performance of the processing device may be determined first; wherein the processing performance The higher, the larger the matching preset size. Then, the to-be-processed image can be used as an input of a target image processing model to obtain an output of the target image processing model; the target image processing model is an image processing model corresponding to the matching preset size.
其中,处理性能可能基于处理设备的硬件配置确定,如果处理设备的硬件配置越高,那么可以确定处理设备的处理性能越高。相应地,可以根据预设的处理性能与预设尺寸对应关系,确定该处理设备的处理性能对应的预设尺寸,进而得到相匹配的预设尺寸。然后将该相匹配的预设尺寸的图像处理模型作为目标处理模型。示例的,假设相匹配的预设尺寸为40*40,那么目标处理模型可以为对应40*40的图像处理模型。The processing performance may be determined based on the hardware configuration of the processing device. If the hardware configuration of the processing device is higher, it may be determined that the processing performance of the processing device is higher. Correspondingly, the preset size corresponding to the processing performance of the processing device may be determined according to the corresponding relationship between the preset processing performance and the preset size, so as to obtain a matching preset size. Then, the image processing model with the matching preset size is used as the target processing model. For example, assuming that the matching preset size is 40*40, the target processing model may be an image processing model corresponding to 40*40.
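The matching of processing performance to preset size can be sketched as follows (illustrative; the performance tiers and the mapping values are assumptions — the text only fixes that higher performance matches a larger preset size):

```python
# Hypothetical preset mapping from a device performance tier to a model
# input size; sizes follow the three versions discussed in the text.
PERFORMANCE_TO_SIZE = {"low": 40, "medium": 48, "high": 64}

def pick_model_size(performance_tier):
    """Higher processing performance -> larger matching preset size."""
    return PERFORMANCE_TO_SIZE[performance_tier]

print(pick_model_size("low"))   # 40 -> use the 40*40 model as the target model
```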
本发明实施例中，通过在模型生成阶段生成不同的预设尺寸的图像处理模型，在应用时，根据处理设备的实际处理能力选择相适配的图像处理模型进行图像处理，这样，一定程度上可以避免处理设备没有足够能力运行图像处理模型，进而导致设备卡顿的问题。In this embodiment of the present invention, image processing models of different preset sizes are generated in the model generation stage, and at application time a suitable image processing model is selected for image processing according to the actual processing capability of the processing device. In this way, the problem that the processing device lacks the capability to run the image processing model, causing the device to stall, can be avoided to a certain extent.
图7是本发明实施例提供的一种模型生成装置的框图,该装置可以包括:存储器301和处理器302。FIG. 7 is a block diagram of an apparatus for generating a model provided by an embodiment of the present invention. The apparatus may include: a memory 301 and a processor 302 .
所述存储器301,用于存储程序代码。The memory 301 is used to store program codes.
所述处理器302,调用所述程序代码,当所述程序代码被执行时,用于执行以下操作:The processor 302 calls the program code, and when the program code is executed, is configured to perform the following operations:
获取训练数据;所述训练数据中包括样本图像以及所述样本图像的标签,所述标签包括以至少两种图像信息获取方式生成的标签;Acquiring training data; the training data includes a sample image and a label of the sample image, and the label includes a label generated in at least two ways of acquiring image information;
根据所述样本图像以及所述样本图像的标签,对初始模型进行训练,以生成图像处理模型;所述图像处理模型用于提取图像信息,所述初始模型包括单个处理分支。According to the sample images and the labels of the sample images, an initial model is trained to generate an image processing model; the image processing model is used to extract image information, and the initial model includes a single processing branch.
可选的,所述获取训练数据,包括:Optionally, the acquiring training data includes:
获取初始图像以及所述初始图像的初始标签;obtaining an initial image and an initial label of the initial image;
根据所述初始图像以及所述初始标签,生成目标处理模型;generating a target processing model according to the initial image and the initial label;
根据所述目标处理模型、所述初始图像以及所述初始标签,对所述初始图像进行筛选,以获取所述样本图像。According to the target processing model, the initial image and the initial label, the initial image is screened to obtain the sample image.
可选的,所述根据所述目标处理模型、所述初始图像以及所述初始标签,对所述初始图像进行筛选,以获取所述样本图像,包括:Optionally, according to the target processing model, the initial image and the initial label, the initial image is screened to obtain the sample image, including:
对于任一所述初始图像,将所述初始图像作为所述目标处理模型的输入,以获取所述目标处理模型的输出;For any of the initial images, the initial image is used as the input of the target processing model to obtain the output of the target processing model;
将所述输出作为所述初始图像的预测标签,并计算所述预测标签与所述初始标签之间的相似度;using the output as the predicted label of the initial image, and calculating the similarity between the predicted label and the initial label;
剔除相似度小于预设相似度阈值的初始图像,并将剩余的初始图像作为所述样本图像。The initial images whose similarity is less than the preset similarity threshold are eliminated, and the remaining initial images are used as the sample images.
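The screening procedure above can be sketched as follows. Here `predict` stands in for the target processing model and `similarity` for the similarity measure; both are hypothetical placeholders, since the disclosure does not fix their implementations.

```python
# Illustrative sketch of sample screening: run each initial image through the
# target processing model, treat the output as a predicted label, and discard
# images whose predicted label is too dissimilar to the initial label.

def filter_initial_images(images, initial_labels, predict, similarity, threshold):
    """Return the initial images kept as sample images."""
    samples = []
    for image, label in zip(images, initial_labels):
        predicted = predict(image)                  # target model output
        if similarity(predicted, label) >= threshold:
            samples.append(image)                   # keep; below threshold -> discard
    return samples
```

A toy `predict` and `similarity` are enough to exercise the control flow; in practice they would be the trained target model and the chosen similarity metric.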
可选的,所述计算所述预测标签与所述初始标签之间的相似度,包括:Optionally, the calculating the similarity between the predicted label and the initial label includes:
计算所述预测标签与所述初始标签之间的差值的绝对值;Calculate the absolute value of the difference between the predicted label and the initial label;
根据所述绝对值确定所述预测标签与所述初始标签之间的相似度;所述相似度与所述绝对值负相关。The similarity between the predicted label and the initial label is determined according to the absolute value; the similarity is negatively correlated with the absolute value.
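One way to realize a similarity that is negatively correlated with the absolute difference is shown below. The specific mapping 1/(1+|d|) is an illustrative assumption; the text only requires that similarity decrease as the absolute value grows.

```python
def label_similarity(predicted: float, initial: float) -> float:
    # Similarity falls as the absolute difference grows (negative correlation).
    # The 1/(1+|d|) form is one illustrative choice, not fixed by the text.
    return 1.0 / (1.0 + abs(predicted - initial))
```

Identical labels yield the maximum similarity of 1.0, and larger deviations yield strictly smaller values, as the negative correlation requires.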
可选的,所述图像信息包括图像中人脸的角度信息,所述标签包括以第一图像信息获取方式生成的第一标签以及以第二图像信息获取方式生成的第二标签;Optionally, the image information includes angle information of the face in the image, and the label includes a first label generated in a manner of acquiring first image information and a second tag generated in a manner of acquiring second image information;
其中，所述第一图像信息获取方式包括根据图像中人脸关键点获取角度信息的方式；所述第二图像信息获取方式包括根据图像中像素点的颜色通道值进行回归检测以获取角度信息的方式。Wherein, the first image information acquisition manner includes a manner of acquiring angle information according to face key points in the image; the second image information acquisition manner includes a manner of performing regression detection according to color channel values of pixels in the image to acquire the angle information.
可选的，所述单个处理分支中包括卷积层、激活函数层、并行的最大池化层以及平均池化层、拼接层、融合层以及用于对所述融合层的输出进行处理的处理层；Optionally, the single processing branch includes a convolution layer, an activation function layer, a parallel max pooling layer and average pooling layer, a concatenation layer, a fusion layer, and a processing layer for processing the output of the fusion layer;
其中,所述融合层用于对所述最大池化层及所述平均池化层的输出进行融合。Wherein, the fusion layer is used to fuse the outputs of the maximum pooling layer and the average pooling layer.
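The parallel pooling and fusion described above can be sketched numerically. The 2x2 window and the choice of element-wise mean as the fusion operation are illustrative assumptions; the disclosure only states that the fusion layer fuses the outputs of the max pooling layer and the average pooling layer.

```python
# Illustrative sketch of the parallel max/average pooling followed by fusion.
# Window size (2x2) and the fusion operation (element-wise mean) are assumed.
import numpy as np

def max_pool2x2(x):
    h, w = x.shape[0] // 2 * 2, x.shape[1] // 2 * 2
    x = x[:h, :w]
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def avg_pool2x2(x):
    h, w = x.shape[0] // 2 * 2, x.shape[1] // 2 * 2
    x = x[:h, :w]
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def fused_pooling(feature_map):
    # Run both pooling branches in parallel, stack (concatenate) the results,
    # then fuse them here by averaging across the stacked axis.
    pooled = np.stack([max_pool2x2(feature_map), avg_pool2x2(feature_map)])
    return pooled.mean(axis=0)
```

On a 2x2 patch [[1, 2], [3, 4]] the max branch yields 4.0, the average branch 2.5, and the fused output 3.25.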
可选的,所述初始模型中的卷积层用于进行深度可分卷积操作。Optionally, the convolution layer in the initial model is used to perform a depthwise separable convolution operation.
可选的,所述初始模型中的激活函数层均为RELU激活函数层。Optionally, the activation function layers in the initial model are all RELU activation function layers.
可选的,所述根据所述样本图像以及所述样本图像的标签,对初始模型进行训练,以生成图像处理模型,包括:Optionally, the initial model is trained according to the sample image and the label of the sample image to generate an image processing model, including:
将所述样本图像调整为多个预设尺寸下的样本图像;adjusting the sample image to a sample image under multiple preset sizes;
对于各个所述预设尺寸下的样本图像,根据所述样本图像以及所述样本图像的标签,对所述初始模型进行训练,以获取各个所述预设尺寸下的图像处理模型。For each sample image in the preset size, the initial model is trained according to the sample image and the label of the sample image, so as to obtain the image processing model in each preset size.
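The multi-size training described above can be sketched as one training pass per preset size. `resize` and `train` are hypothetical placeholders for the actual resizing routine and training loop, which the disclosure does not specify.

```python
# Illustrative sketch: resize the sample images to each preset size and train
# one image processing model per size. `resize` and `train` are placeholders.

def train_per_preset_size(sample_images, labels, preset_sizes, resize, train):
    """Return a mapping from preset size to the model trained at that size."""
    models = {}
    for size in preset_sizes:
        resized = [resize(image, size) for image in sample_images]
        models[size] = train(resized, labels)   # one model per preset size
    return models
```

The resulting dictionary is what the application stage would later index by the preset size matched to the device's processing performance.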
可选的,所述样本图像包括第一样本图像及第二样本图像,所述第一样本图像的第一标签的获取方式与所述第二样本图像的第二标签的获取方式不同;Optionally, the sample image includes a first sample image and a second sample image, and the method for acquiring the first label of the first sample image is different from the method for acquiring the second label of the second sample image;
所述根据所述样本图像以及所述样本图像的标签,对所述初始模型进行训练,包括:The training of the initial model according to the sample image and the label of the sample image includes:
将所述第一样本图像划分为多个第一样本组,以及,将所述第二样本图像划分为多个第二样本组;dividing the first sample image into a plurality of first sample groups, and dividing the second sample image into a plurality of second sample groups;
根据所述第一样本组中的第一样本图像及所述第一样本图像的第一标签，以及所述第二样本组中的第二样本图像及所述第二样本图像的第二标签，对所述初始模型进行交叉训练。The initial model is cross-trained according to the first sample image in the first sample group and the first label of the first sample image, and the second sample image in the second sample group and the second label of the second sample image.
可选的,所述第一样本组中包含的第一样本图像的数量与所述第二样本组中包含的第二样本图像的数量相同;Optionally, the number of first sample images included in the first sample group is the same as the number of second sample images included in the second sample group;
所述根据所述第一样本组中的第一样本图像及所述第一样本图像的第一标签，以及所述第二样本组中的第二样本图像及所述第二样本图像的第二标签，对所述初始模型进行交叉训练，包括：The cross-training of the initial model according to the first sample image in the first sample group and the first label of the first sample image, and the second sample image in the second sample group and the second label of the second sample image, includes:
根据一个所述第一样本组中的第一样本图像及所述第一样本图像的第一标签,对所述初始模型进行训练,以更新所述初始模型的模型参数;training the initial model according to a first sample image in one of the first sample groups and a first label of the first sample image to update model parameters of the initial model;
在更新所述初始模型的模型参数之后，根据所述第二样本组中的第二样本图像及所述第二样本图像的第二标签，对所述初始模型进行训练，以更新所述初始模型的模型参数，并在更新所述模型参数之后重新执行所述根据一个所述第一样本组中的第一样本图像及所述第一样本图像的第一标签，对所述初始模型进行训练的步骤。After updating the model parameters of the initial model, the initial model is trained according to the second sample image in one of the second sample groups and the second label of the second sample image to update the model parameters of the initial model, and after the model parameters are updated, the step of training the initial model according to the first sample image in one of the first sample groups and the first label of the first sample image is performed again.
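The alternation described above can be sketched as a loop that interleaves one update on a first-label group with one update on a second-label group. `train_step` is a hypothetical placeholder for a single parameter update, which the disclosure does not detail.

```python
# Illustrative sketch of cross-training: alternate one parameter update on a
# first-label sample group with one on a second-label sample group.

def cross_train(model, first_groups, second_groups, train_step):
    """Interleave updates over the two kinds of sample groups."""
    for group_a, group_b in zip(first_groups, second_groups):
        model = train_step(model, group_a)   # update on a first-label group
        model = train_step(model, group_b)   # then on a second-label group
    return model
```

With a toy `train_step` that records which group it saw, the update order comes out strictly interleaved, matching the described procedure.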
综上所述，本发明实施例提供的模型生成装置，可以获取训练数据，其中，训练数据中包括样本图像以及样本图像的标签，标签包括以至少两种图像信息获取方式生成的标签。然后，根据样本图像以及样本图像的标签，对初始模型进行训练，以生成图像处理模型，其中，图像处理模型用于提取图像信息，初始模型包括单个处理分支。由于以多种图像信息获取方式标注标签时可以避免单一标注方式的局限性造成的样本不足的问题，这样，通过获取以多种图像信息获取方式标注的训练数据进行训练，可以确保训练数据的多样性以及充足性，进而一定程度上可以提高最终生成的图像处理模型的泛化能力，从而提高后续使用该图像处理模型提取的图像信息的准确性。To sum up, the model generation apparatus provided by the embodiments of the present invention can acquire training data, where the training data includes sample images and labels of the sample images, and the labels include labels generated by at least two image information acquisition manners. Then, based on the sample images and their labels, an initial model is trained to generate an image processing model, where the image processing model is used to extract image information and the initial model includes a single processing branch. Since labeling with multiple image information acquisition manners avoids the problem of insufficient samples caused by the limitations of a single labeling manner, training on data labeled in multiple manners ensures the diversity and sufficiency of the training data, which in turn can improve, to a certain extent, the generalization ability of the finally generated image processing model, thereby improving the accuracy of the image information subsequently extracted by the image processing model.
图8是本发明实施例提供的一种图像处理装置的框图,该装置可以包括:存储器401和处理器402。FIG. 8 is a block diagram of an image processing apparatus provided by an embodiment of the present invention. The apparatus may include: a memory 401 and a processor 402 .
所述存储器401,用于存储程序代码。The memory 401 is used to store program codes.
所述处理器402,调用所述程序代码,当所述程序代码被执行时,用于执行以下操作:The processor 402 calls the program code, and when the program code is executed, is configured to perform the following operations:
将待处理图像作为预设的图像处理模型的输入,以获取所述图像处理模型的输出;Using the image to be processed as the input of the preset image processing model to obtain the output of the image processing model;
根据所述图像处理模型的输出,获取所述待处理图像的图像信息;obtaining image information of the to-be-processed image according to the output of the image processing model;
其中,所述图像处理模型是根据上述模型生成方法生成的。Wherein, the image processing model is generated according to the above model generation method.
可选的，所述图像信息包括图像中人脸的角度信息；所述图像处理模型的输出包括以根据图像中人脸关键点获取的第一角度信息，以及根据图像中像素点的颜色通道值获取的第二角度信息；Optionally, the image information includes angle information of a face in the image; the output of the image processing model includes first angle information acquired according to face key points in the image, and second angle information acquired according to color channel values of pixels in the image;
所述根据所述图像处理模型的输出,获取所述待处理图像的图像信息,包括:The obtaining image information of the to-be-processed image according to the output of the image processing model includes:
将所述第二角度信息确定为所述待处理图像的角度信息。The second angle information is determined as the angle information of the image to be processed.
可选的,所述图像处理模型包括对应不同预设尺寸的图像处理模型;Optionally, the image processing model includes image processing models corresponding to different preset sizes;
所述将待处理图像作为预设的图像处理模型的输入,以获取所述图像处理模型的输出,包括:Taking the image to be processed as the input of the preset image processing model to obtain the output of the image processing model includes:
确定与所述处理设备的处理性能相匹配的预设尺寸;其中,所述处理性能越高,所述相匹配的预设尺寸越大;determining a preset size matching the processing performance of the processing device; wherein, the higher the processing performance, the larger the matching preset size;
将所述待处理图像作为目标图像处理模型的输入,以获取所述目标图像处理模型的输出;所述目标图像处理模型为对应所述相匹配的预设尺寸的图像处理模型。The to-be-processed image is used as the input of the target image processing model to obtain the output of the target image processing model; the target image processing model is the image processing model corresponding to the matching preset size.
综上所述，本发明实施例提供的图像处理装置，由于使用的预设的图像处理模型是通过获取以多种图像信息获取方式标注的训练数据进行训练得到的，这样，以多种图像信息获取方式标注标签时可以避免单一标注方式的局限性造成的样本不足的问题，进而可以确保训练数据的多样性以及充足性，一定程度上可以提高最终生成的图像处理模型的泛化能力，从而提高使用该图像处理模型对待处理图像提取时，提取到的图像信息的准确性。To sum up, since the preset image processing model used by the image processing apparatus provided by the embodiments of the present invention is trained on training data labeled by multiple image information acquisition manners, the problem of insufficient samples caused by the limitations of a single labeling manner can be avoided. This ensures the diversity and sufficiency of the training data, which to a certain extent improves the generalization ability of the finally generated image processing model, thereby improving the accuracy of the image information extracted when the image processing model is applied to the image to be processed.
进一步地，本发明实施例还提供一种计算机可读存储介质，所述计算机可读存储介质上存储计算机程序，所述计算机程序被处理器执行时实现上述方法中的各个步骤，且能达到相同的技术效果，为避免重复，这里不再赘述。Further, an embodiment of the present invention also provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, each step of the above methods is implemented, and the same technical effect can be achieved. To avoid repetition, details are not repeated here.
以上所描述的装置实施例仅仅是示意性的，其中所述作为分离部件说明的单元可以是或者也可以不是物理上分开的，作为单元显示的部件可以是或者也可以不是物理单元，即可以位于一个地方，或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部模块来实现本实施例方案的目的。本领域普通技术人员在不付出创造性的劳动的情况下，即可以理解并实施。The device embodiments described above are only illustrative, where the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement it without creative effort.
本发明的各个部件实施例可以以硬件实现,或者以在一个或者多个处理器上运行的软件模块实现,或者以它们的组合实现。本领域的技术人员应当理解,可以在实践中使用微处理器或者数字信号处理器来实现根据本发明实施例的计算处理设备中的一些或者全部部件的一些或者全 部功能。本发明还可以实现为用于执行这里所描述的方法的一部分或者全部的设备或者装置程序(例如,计算机程序和计算机程序产品)。这样的实现本发明的程序可以存储在计算机可读介质上,或者可以具有一个或者多个信号的形式。这样的信号可以从因特网网站上下载得到,或者在载体信号上提供,或者以任何其他形式提供。Various component embodiments of the present invention may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. Those skilled in the art should understand that a microprocessor or a digital signal processor may be used in practice to implement some or all of the functions of some or all of the components in a computing processing device according to embodiments of the present invention. The present invention can also be implemented as apparatus or apparatus programs (eg, computer programs and computer program products) for performing part or all of the methods described herein. Such a program implementing the present invention may be stored on a computer-readable medium, or may be in the form of one or more signals. Such signals may be downloaded from Internet sites, or provided on carrier signals, or in any other form.
例如，图9为本发明实施例提供的一种计算处理设备的框图，如图9所示，图9示出了可以实现根据本发明的方法的计算处理设备。该计算处理设备传统上包括处理器710和以存储器720形式的计算机程序产品或者计算机可读介质。存储器720可以是诸如闪存、EEPROM(电可擦除可编程只读存储器)、EPROM、硬盘或者ROM之类的电子存储器。存储器720具有用于执行上述方法中的任何方法步骤的程序代码的存储空间730。例如，用于程序代码的存储空间730可以包括分别用于实现上面的方法中的各种步骤的各个程序代码。这些程序代码可以从一个或者多个计算机程序产品中读出或者写入到这一个或者多个计算机程序产品中。这些计算机程序产品包括诸如硬盘，紧致盘(CD)、存储卡或者软盘之类的程序代码载体。这样的计算机程序产品通常为如参考图10所述的便携式或者固定存储单元。该存储单元可以具有与图9的计算处理设备中的存储器720类似布置的存储段、存储空间等。程序代码可以例如以适当形式进行压缩。通常，存储单元包括计算机可读代码，即可以由例如诸如710之类的处理器读取的代码，这些代码当由计算处理设备运行时，导致该计算处理设备执行上面所描述的方法中的各个步骤。For example, FIG. 9 is a block diagram of a computing processing device provided by an embodiment of the present invention; as shown in FIG. 9, it illustrates a computing processing device that can implement the method according to the present invention. The computing processing device traditionally includes a processor 710 and a computer program product or computer-readable medium in the form of a memory 720. The memory 720 may be an electronic memory such as flash memory, EEPROM (electrically erasable programmable read-only memory), EPROM, a hard disk, or ROM. The memory 720 has a storage space 730 for program code for performing any of the method steps in the above methods. For example, the storage space 730 for program code may include individual program codes for respectively implementing the various steps of the above methods. These program codes can be read from, or written to, one or more computer program products. These computer program products include program code carriers such as hard disks, compact disks (CDs), memory cards, or floppy disks. Such a computer program product is typically a portable or fixed storage unit as described with reference to FIG. 10. The storage unit may have storage segments, storage spaces, etc. arranged similarly to the memory 720 in the computing processing device of FIG. 9. The program code may, for example, be compressed in a suitable form. Typically, the storage unit includes computer-readable code, i.e., code readable by a processor such as the processor 710, which, when executed by a computing processing device, causes the computing processing device to perform each step of the methods described above.
本说明书中的各个实施例均采用递进的方式描述,每个实施例重点说明的都是与其他实施例的不同之处,各个实施例之间相同相似的部分互相参见即可。The various embodiments in this specification are described in a progressive manner, and each embodiment focuses on the differences from other embodiments, and the same and similar parts between the various embodiments may be referred to each other.
本文中所称的“一个实施例”、“实施例”或者“一个或者多个实施例”意味着,结合实施例描述的特定特征、结构或者特性包括在本发明的至少一个实施例中。此外,请注意,这里“在一个实施例中”的词语例子不一定全指同一个实施例。Reference herein to "one embodiment," "an embodiment," or "one or more embodiments" means that a particular feature, structure, or characteristic described in connection with an embodiment is included in at least one embodiment of the present invention. Also, please note that instances of the phrase "in one embodiment" herein are not necessarily all referring to the same embodiment.
在此处所提供的说明书中,说明了大量具体细节。然而,能够理解, 本发明的实施例可以在没有这些具体细节的情况下被实践。在一些实例中,并未详细示出公知的方法、结构和技术,以便不模糊对本说明书的理解。In the description provided herein, numerous specific details are set forth. It will be understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
在权利要求中,不应将位于括号之间的任何参考符号构造成对权利要求的限制。单词“包含”不排除存在未列在权利要求中的元件或步骤。位于元件之前的单词“一”或“一个”不排除存在多个这样的元件。本发明可以借助于包括有若干不同元件的硬件以及借助于适当编程的计算机来实现。在列举了若干装置的单元权利要求中,这些装置中的若干个可以是通过同一个硬件项来具体体现。单词第一、第二、以及第三等的使用不表示任何顺序。可将这些单词解释为名称。In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention can be implemented by means of hardware comprising several different elements and by means of a suitably programmed computer. In a unit claim enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, and third, etc. do not denote any order. These words can be interpreted as names.
最后应说明的是:以上实施例仅用以说明本发明的技术方案,而非对其限制;尽管参照前述实施例对本发明进行了详细的说明,本领域的普通技术人员应当理解:其依然可以对前述各实施例所记载的技术方案进行修改,或者对其中部分技术特征进行等同替换;而这些修改或者替换,并不使相应技术方案的本质脱离本发明各实施例技术方案的精神和范围。Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention, but not to limit them; although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that it can still be The technical solutions described in the foregoing embodiments are modified, or some technical features thereof are equivalently replaced; and these modifications or replacements do not make the essence of the corresponding technical solutions deviate from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (17)

  1. 一种模型生成方法,其特征在于,所述方法包括:A model generation method, characterized in that the method comprises:
    获取训练数据;所述训练数据中包括样本图像以及所述样本图像的标签,所述标签包括以至少两种图像信息获取方式生成的标签;Acquiring training data; the training data includes a sample image and a label of the sample image, and the label includes a label generated in at least two ways of acquiring image information;
    根据所述样本图像以及所述样本图像的标签,对初始模型进行训练,以生成图像处理模型;所述图像处理模型用于提取图像信息,所述初始模型包括单个处理分支。According to the sample images and the labels of the sample images, an initial model is trained to generate an image processing model; the image processing model is used to extract image information, and the initial model includes a single processing branch.
  2. 根据权利要求1所述方法,其特征在于,所述获取训练数据,包括:The method according to claim 1, wherein the acquiring training data comprises:
    获取初始图像以及所述初始图像的初始标签;obtaining an initial image and an initial label of the initial image;
    根据所述初始图像以及所述初始标签,生成目标处理模型;generating a target processing model according to the initial image and the initial label;
    根据所述目标处理模型、所述初始图像以及所述初始标签,对所述初始图像进行筛选,以获取所述样本图像。According to the target processing model, the initial image and the initial label, the initial image is screened to obtain the sample image.
  3. 根据权利要求2所述的方法,其特征在于,所述根据所述目标处理模型、所述初始图像以及所述初始标签,对所述初始图像进行筛选,以获取所述样本图像,包括:The method according to claim 2, wherein, according to the target processing model, the initial image and the initial label, the initial image is screened to obtain the sample image, comprising:
    对于任一所述初始图像,将所述初始图像作为所述目标处理模型的输入,以获取所述目标处理模型的输出;For any of the initial images, the initial image is used as the input of the target processing model to obtain the output of the target processing model;
    将所述输出作为所述初始图像的预测标签,并计算所述预测标签与所述初始标签之间的相似度;using the output as the predicted label of the initial image, and calculating the similarity between the predicted label and the initial label;
    剔除相似度小于预设相似度阈值的初始图像,并将剩余的初始图像作为所述样本图像。The initial images whose similarity is less than the preset similarity threshold are eliminated, and the remaining initial images are used as the sample images.
  4. 根据权利要求3所述的方法,其特征在于,所述计算所述预测标签与所述初始标签之间的相似度,包括:The method according to claim 3, wherein the calculating the similarity between the predicted label and the initial label comprises:
    计算所述预测标签与所述初始标签之间的差值的绝对值;Calculate the absolute value of the difference between the predicted label and the initial label;
    根据所述绝对值确定所述预测标签与所述初始标签之间的相似度;所述相似度与所述绝对值负相关。The similarity between the predicted label and the initial label is determined according to the absolute value; the similarity is negatively correlated with the absolute value.
  5. 根据权利要求1所述的方法,其特征在于,所述图像信息包括图像中人脸的角度信息,所述标签包括以第一图像信息获取方式生成的第一标签以及以第二图像信息获取方式生成的第二标签;The method according to claim 1, wherein the image information includes angle information of the face in the image, and the label includes a first label generated in a first image information acquisition manner and a second image information acquisition manner Generated second label;
    其中，所述第一图像信息获取方式包括根据图像中人脸关键点获取角度信息的方式；所述第二图像信息获取方式包括根据图像中像素点的颜色通道值进行回归检测以获取角度信息的方式。Wherein, the first image information acquisition manner includes a manner of acquiring angle information according to face key points in the image; the second image information acquisition manner includes a manner of performing regression detection according to color channel values of pixels in the image to acquire the angle information.
  6. 根据权利要求1-5任一所述的方法,其特征在于,所述单个处理分支中包括卷积层、激活函数层、并行的最大池化层以及平均池化层、拼接层、融合层以及用于对所述融合层的输出进行处理的处理层;The method according to any one of claims 1-5, wherein the single processing branch includes a convolution layer, an activation function layer, a parallel max pooling layer, an average pooling layer, a splicing layer, a fusion layer, and a processing layer for processing the output of the fusion layer;
    其中,所述融合层用于对所述最大池化层及所述平均池化层的输出进行融合。Wherein, the fusion layer is used to fuse the outputs of the maximum pooling layer and the average pooling layer.
  7. 根据权利要求6所述的方法,其特征在于,所述初始模型中的卷积层用于进行深度可分卷积操作。The method according to claim 6, wherein the convolution layer in the initial model is used to perform a depthwise separable convolution operation.
  8. 根据权利要求6所述的方法,其特征在于,所述初始模型中的激活函数层均为RELU激活函数层。The method according to claim 6, wherein the activation function layers in the initial model are all RELU activation function layers.
  9. 根据权利要求1所述的方法,其特征在于,所述根据所述样本图像以及所述样本图像的标签,对初始模型进行训练,以生成图像处理模型,包括:The method according to claim 1, wherein the training an initial model according to the sample image and the label of the sample image to generate an image processing model, comprising:
    将所述样本图像调整为多个预设尺寸下的样本图像;adjusting the sample image to a sample image under multiple preset sizes;
    对于各个所述预设尺寸下的样本图像,根据所述样本图像以及所述样本图像的标签,对所述初始模型进行训练,以获取各个所述预设尺寸下的图像处理模型。For each sample image in the preset size, the initial model is trained according to the sample image and the label of the sample image, so as to obtain the image processing model in each preset size.
  10. 根据权利要求9所述的方法,其特征在于,所述样本图像包括第一样本图像及第二样本图像,所述第一样本图像的第一标签的获取方式与所述第二样本图像的第二标签的获取方式不同;The method according to claim 9, wherein the sample image comprises a first sample image and a second sample image, and the method of acquiring the first label of the first sample image is the same as that of the second sample image The way to obtain the second label of is different;
    所述根据所述样本图像以及所述样本图像的标签,对所述初始模型进行训练,包括:The training of the initial model according to the sample image and the label of the sample image includes:
    将所述第一样本图像划分为多个第一样本组,以及,将所述第二样本图像划分为多个第二样本组;dividing the first sample image into a plurality of first sample groups, and dividing the second sample image into a plurality of second sample groups;
    根据所述第一样本组中的第一样本图像及所述第一样本图像的第一标签，以及所述第二样本组中的第二样本图像及所述第二样本图像的第二标签，对所述初始模型进行交叉训练。The initial model is cross-trained according to the first sample image in the first sample group and the first label of the first sample image, and the second sample image in the second sample group and the second label of the second sample image.
  11. 根据权利要求10所述的方法,其特征在于,所述第一样本组中包含的第一样本图像的数量与所述第二样本组中包含的第二样本图像的数量相同;The method according to claim 10, wherein the number of first sample images included in the first sample group is the same as the number of second sample images included in the second sample group;
    所述根据所述第一样本组中的第一样本图像及所述第一样本图像的第一标签，以及所述第二样本组中的第二样本图像及所述第二样本图像的第二标签，对所述初始模型进行交叉训练，包括：The cross-training of the initial model according to the first sample image in the first sample group and the first label of the first sample image, and the second sample image in the second sample group and the second label of the second sample image, includes:
    根据一个所述第一样本组中的第一样本图像及所述第一样本图像的第一标签,对所述初始模型进行训练,以更新所述初始模型的模型参数;training the initial model according to a first sample image in one of the first sample groups and a first label of the first sample image to update model parameters of the initial model;
    在更新所述初始模型的模型参数之后，根据所述第二样本组中的第二样本图像及所述第二样本图像的第二标签，对所述初始模型进行训练，以更新所述初始模型的模型参数，并在更新所述模型参数之后重新执行所述根据一个所述第一样本组中的第一样本图像及所述第一样本图像的第一标签，对所述初始模型进行训练的步骤。After updating the model parameters of the initial model, the initial model is trained according to the second sample image in the second sample group and the second label of the second sample image to update the model parameters of the initial model, and after the model parameters are updated, the step of training the initial model according to the first sample image in one of the first sample groups and the first label of the first sample image is performed again.
  12. 一种图像处理方法,其特征在于,应用于处理设备,所述方法包括:An image processing method, characterized in that, applied to a processing device, the method comprising:
    将待处理图像作为预设的图像处理模型的输入,以获取所述图像处理模型的输出;Using the image to be processed as the input of the preset image processing model to obtain the output of the image processing model;
    根据所述图像处理模型的输出,获取所述待处理图像的图像信息;obtaining image information of the to-be-processed image according to the output of the image processing model;
    其中,所述图像处理模型是根据上述权利要求1至11任一所述方法生成的。Wherein, the image processing model is generated according to the method of any one of the above claims 1 to 11.
  13. 根据权利要求12所述的方法，其特征在于，所述图像信息包括图像中人脸的角度信息；所述图像处理模型的输出包括以根据图像中人脸关键点获取的第一角度信息，以及根据图像中像素点的颜色通道值获取的第二角度信息；The method according to claim 12, wherein the image information includes angle information of a face in the image; the output of the image processing model includes first angle information acquired according to face key points in the image, and second angle information acquired according to color channel values of pixels in the image;
    所述根据所述图像处理模型的输出,获取所述待处理图像的图像信息,包括:The obtaining image information of the to-be-processed image according to the output of the image processing model includes:
    将所述第二角度信息确定为所述待处理图像的角度信息。The second angle information is determined as the angle information of the image to be processed.
  14. 根据权利要求12或13所述的方法,其特征在于,所述图像处理模型包括对应不同预设尺寸的图像处理模型;The method according to claim 12 or 13, wherein the image processing model comprises image processing models corresponding to different preset sizes;
    所述将待处理图像作为预设的图像处理模型的输入，以获取所述图像处理模型的输出，包括：The taking the image to be processed as the input of the preset image processing model to obtain the output of the image processing model includes:
    确定与所述处理设备的处理性能相匹配的预设尺寸;其中,所述处理性能越高,所述相匹配的预设尺寸越大;determining a preset size matching the processing performance of the processing device; wherein, the higher the processing performance, the larger the matching preset size;
    将所述待处理图像作为目标图像处理模型的输入,以获取所述目标图像处理模型的输出;所述目标图像处理模型为对应所述相匹配的预设尺寸的图像处理模型。The to-be-processed image is used as the input of the target image processing model to obtain the output of the target image processing model; the target image processing model is the image processing model corresponding to the matching preset size.
  15. 一种模型生成装置,其特征在于,所述装置包括存储器和处理器;A model generation device, characterized in that the device includes a memory and a processor;
    所述存储器,用于存储程序代码;the memory for storing program codes;
    所述处理器,调用所述程序代码,当所述程序代码被执行时,用于执行以下操作:The processor calls the program code, and when the program code is executed, is configured to perform the following operations:
    获取训练数据;所述训练数据中包括样本图像以及所述样本图像的标签,所述标签包括以至少两种图像信息获取方式生成的标签;Acquiring training data; the training data includes a sample image and a label of the sample image, and the label includes a label generated in at least two ways of acquiring image information;
    根据所述样本图像以及所述样本图像的标签,对初始模型进行训练,以生成图像处理模型;所述图像处理模型用于提取图像信息,所述初始模型包括单个处理分支。According to the sample images and the labels of the sample images, an initial model is trained to generate an image processing model; the image processing model is used to extract image information, and the initial model includes a single processing branch.
  16. 一种图像处理装置,其特征在于,所述装置应用于处理设备,所述装置包括存储器和处理器;An image processing apparatus, characterized in that, the apparatus is applied to processing equipment, and the apparatus includes a memory and a processor;
    所述存储器,用于存储程序代码;the memory for storing program codes;
    所述处理器,调用所述程序代码,当所述程序代码被执行时,用于执行以下操作:The processor calls the program code, and when the program code is executed, is configured to perform the following operations:
    将待处理图像作为预设的图像处理模型的输入,以获取所述图像处理模型的输出;Using the image to be processed as the input of the preset image processing model to obtain the output of the image processing model;
    根据所述图像处理模型的输出,获取所述待处理图像的图像信息;obtaining image information of the to-be-processed image according to the output of the image processing model;
    其中,所述图像处理模型是根据上述权利要求13所述装置生成的。Wherein, the image processing model is generated according to the device of claim 13 above.
  17. 一种计算机可读存储介质，其特征在于，所述计算机可读存储介质上存储计算机程序，所述计算机程序被处理器执行时实现如权利要求1至权利要求14中任一所述的方法。A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the method of any one of claims 1 to 14 is implemented.
PCT/CN2020/141005 2020-12-29 2020-12-29 Model generation method and apparatus, image processing method and apparatus, and readable storage medium WO2022141094A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/141005 WO2022141094A1 (en) 2020-12-29 2020-12-29 Model generation method and apparatus, image processing method and apparatus, and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/141005 WO2022141094A1 (en) 2020-12-29 2020-12-29 Model generation method and apparatus, image processing method and apparatus, and readable storage medium

Publications (1)

Publication Number Publication Date
WO2022141094A1 true WO2022141094A1 (en) 2022-07-07

Family

ID=82258715

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/141005 WO2022141094A1 (en) 2020-12-29 2020-12-29 Model generation method and apparatus, image processing method and apparatus, and readable storage medium

Country Status (1)

Country Link
WO (1) WO2022141094A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115236509A (en) * 2022-08-08 2022-10-25 江苏大中电机股份有限公司 Data acquisition device for an electric machine

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109543731A (*) 2018-11-09 2019-03-29 江南大学 Three-preference semi-supervised regression algorithm under a self-training framework
CN110163252A (*) 2019-04-17 2019-08-23 平安科技(深圳)有限公司 Data classification method and apparatus, electronic device, and storage medium
CN110427542A (*) 2018-04-26 2019-11-08 北京市商汤科技开发有限公司 Classification network training and data annotation method and apparatus, device, and medium
CN111639540A (*) 2020-04-30 2020-09-08 中国海洋大学 Semi-supervised person re-identification method based on camera style and human pose adaptation
CN111783646A (*) 2020-06-30 2020-10-16 北京百度网讯科技有限公司 Training method, apparatus, device, and storage medium for a pedestrian re-identification model

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115236509A (en) * 2022-08-08 2022-10-25 江苏大中电机股份有限公司 Data acquisition device for an electric machine
CN115236509B (*) 2022-08-08 2023-11-10 江苏大中电机股份有限公司 Data acquisition device for an electric machine

Similar Documents

Publication Publication Date Title
CN111062871B (en) Image processing method and device, computer equipment and readable storage medium
CN108182394B (en) Convolutional neural network training method, face recognition method and face recognition device
US11055516B2 (en) Behavior prediction method, behavior prediction system, and non-transitory recording medium
CN111209970B (en) Video classification method, device, storage medium and server
US8526728B2 (en) Establishing clusters of user preferences for image enhancement
CN112639828A (en) Data processing method, method and equipment for training neural network model
WO2018005565A1 (en) Automated selection of subjectively best images from burst captured image sequences
US11468296B2 (en) Relative position encoding based networks for action recognition
CN112183166A (en) Method and device for determining training sample and electronic equipment
WO2020092276A1 (en) Video recognition using multiple modalities
CN113505848B (en) Model training method and device
CN112132279B (en) Convolutional neural network model compression method, device, equipment and storage medium
CN112818888A (en) Video auditing model training method, video auditing method and related device
CN113869282B (en) Face recognition method, hyper-resolution model training method and related equipment
WO2022141094A1 (en) Model generation method and apparatus, image processing method and apparatus, and readable storage medium
CN110659631A (en) License plate recognition method and terminal equipment
WO2022141092A1 (en) Model generation method and apparatus, image processing method and apparatus, and readable storage medium
TWI803243B (en) Method for expanding images, computer device and storage medium
CN110795993A (en) Method and device for constructing model, terminal equipment and medium
CN114912540A (en) Transfer learning method, device, equipment and storage medium
CN114566160A (en) Voice processing method and device, computer equipment and storage medium
WO2022001364A1 (en) Method for extracting data features, and related apparatus
CN114155388A (en) Image recognition method and device, computer equipment and storage medium
WO2021147084A1 (en) Systems and methods for emotion recognition in user-generated video(ugv)
CN112634143A (en) Image color correction model training method and device and electronic equipment

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20967442

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20967442

Country of ref document: EP

Kind code of ref document: A1