CN112614140A - Method and related device for training color spot detection model - Google Patents

Method and related device for training color spot detection model

Info

Publication number
CN112614140A
Authority
CN
China
Prior art keywords
training
image
target
color
predicted
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011496189.XA
Other languages
Chinese (zh)
Inventor
陈仿雄 (Chen Fangxiong)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Shuliantianxia Intelligent Technology Co Ltd
Original Assignee
Shenzhen Shuliantianxia Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Shuliantianxia Intelligent Technology Co Ltd filed Critical Shenzhen Shuliantianxia Intelligent Technology Co Ltd
Priority to CN202011496189.XA priority Critical patent/CN112614140A/en
Publication of CN112614140A publication Critical patent/CN112614140A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • G06T7/73Determining position or orientation of objects or cameras using feature-based methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161Detection; Localisation; Normalisation
    • G06V40/165Detection; Localisation; Normalisation using facial parts and geometric relationships
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30196Human being; Person
    • G06T2207/30201Face

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Geometry (AREA)
  • Image Analysis (AREA)

Abstract

The embodiment of the invention relates to the technical field of target detection, and in particular to a method and a related device for training a color spot detection model. Images in an image sample set are cropped into candidate area images, which reduces the interference of regions without color spots (such as background and facial features) on model training and improves the accuracy of model training. Resolution enhancement and resizing are applied to each candidate area image containing color spots to obtain multiple training images of consistent size, which effectively avoids interference of image size with model learning and enhances the color spot features. In addition, the number of color spots is introduced during training to constrain the color spot positions, so that the predicted color spot positions are trained toward the true color spot positions, further improving the training precision of the model. These multiple improvements in training precision ensure that the color spot detection model obtained by this method has higher detection accuracy.

Description

Method and related device for training color spot detection model
Technical Field
The embodiment of the invention relates to the technical field of target detection, in particular to a method and a related device for training a color spot detection model.
Background
With the rapid development of mobile communication technology and the improvement of people's living standards, all kinds of intelligent terminals have been widely applied in people's daily work and life, and people are increasingly accustomed to using apps for face beautification, self-portraits, photographing and skin analysis. The demand for apps with such functions keeps growing, and many users therefore hope that these apps can automatically analyze the color spot condition of the face and propose targeted skin improvement schemes according to the condition of the color spots.
Existing face color spot detection mainly applies image processing methods to possible color spots in order to extract the color spot regions. This approach is overly complicated and is easily affected by factors such as illumination, so its detection accuracy is low.
Disclosure of Invention
The technical problem to be solved by the embodiments of the present invention is to provide a method and a related device for training a color spot detection model, which can accurately classify and locate color spots.
To solve the above technical problem, in a first aspect, an embodiment of the present invention provides a method for training a color spot detection model, including:
acquiring an image sample set comprising a human face;
respectively intercepting at least one face local area image possibly having color spots from each image in the image sample set to obtain a plurality of candidate area images;
performing resolution enhancement processing and size adjustment on each candidate region image to obtain a plurality of training images with consistent sizes, wherein the resolution of the training images is higher than that of the candidate region images;
the method comprises the steps of taking a plurality of training images marked with real labels as a training set, training a preset convolutional neural network to enable the preset convolutional neural network to learn the real labels of the plurality of training images so as to obtain a color spot detection model, wherein the real labels of a target training image comprise the real positions, real types and real numbers of color spots in the target training image, and the target training image is any one of the plurality of training images.
In some embodiments, the training of the preset convolutional neural network with the training images labeled with real labels as a training set to enable the preset convolutional neural network to learn the real labels of the training images to obtain a color spot detection model includes:
inputting the target training image into the feature extraction module to obtain training feature maps of at least two sizes;
dividing the training characteristic diagram into two paths to obtain a first path of training characteristic diagram and a second path of training characteristic diagram;
inputting the first path of training feature map into the color spot target detection module to obtain a predicted position and a predicted category of the color spot in the target training image;
inputting the second path of training feature map into the color spot quantity detection module to obtain the predicted quantity of the color spots in the target training image;
calculating an error between a predicted label of the target training image and a real label of the target training image according to a preset loss function, wherein the predicted label comprises the predicted position, the predicted category and the predicted quantity;
and adjusting model parameters of the preset convolutional neural network according to the error, returning to execute the step of inputting the target training image into the feature extraction module to obtain training feature maps of at least two sizes until the preset convolutional neural network is converged to obtain the color spot detection model.
In some embodiments, the feature extraction module includes a plurality of feature convolution layers, and the number of convolution kernels of the plurality of feature convolution layers tends to increase first and then decrease as the number of layers of the feature convolution layers increases.
In some embodiments, the plurality of feature convolution layers includes a target feature convolution layer, the number of convolution kernels of the target feature convolution layer is the number of the categories of the pigmented spots, and the target feature convolution layer is a feature convolution layer connected to the pigmented spot target detection module and configured to output the training feature map.
In some embodiments, the preset loss function is a weighted sum of a location loss function for calculating an error between the real location and the predicted location, a class loss function for calculating an error between the real class and the predicted class, and a quantity loss function for calculating an error between the real quantity and the predicted quantity.
In some embodiments, the calculating an error between the predicted label of the target training image and the true label of the target training image according to a preset loss function includes:
calculating the error between the predicted label and the real label according to the following formula (the original formula appears as an image; it is the sum of the position, class, confidence and quantity loss terms defined below):

loss = λ_obj · Σ_{i=1}^{M} (2 − T_width·T_height) · Σ_{r∈(x,y,w,h)} (T_r − P_r)² + L_class(T_class, P_class) + α · (T_conf − P_conf)² + β · Σ_{c=1}^{K} (T_num^c − P_num^c)²

where λ_obj is a preset position offset weight, α is a confidence loss weight, β is a quantity weight, M is the preset number of prediction boxes used to predict the color spot positions, i is the index of a prediction box, T_width is the width of the prediction box, T_height is the height of the prediction box, (x, y, w, h) are the top-left coordinates and the width and height of the prediction box, T_r indicates the true position of the color spot, P_r indicates the predicted position of the color spot, K is the number of color spot categories, T_class is the true category of the color spot, P_class is the predicted category of the color spot, T_conf is the confidence of the real label, P_conf is the confidence of the predicted label, T_num^c is the true number of color spots of the c-th category among the K categories, and P_num^c is the predicted number of color spots of the c-th category among the K categories.
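For illustration only, the following is a minimal NumPy sketch of how such a weighted loss could be assembled, assuming simple squared-error terms throughout; the function name, the one-hot class representation and the default weight values are assumptions and are not taken from the patent.

```python
import numpy as np

def spot_detection_loss(pred_boxes, true_boxes, pred_conf, true_conf,
                        pred_class_prob, true_class_onehot,
                        pred_counts, true_counts,
                        lambda_obj=5.0, alpha=1.0, beta=1.0):
    """Hedged sketch of the weighted loss described in the text.

    pred_boxes, true_boxes:   (M, 4) arrays of (x, y, w, h) per prediction box.
    pred_conf, true_conf:     (M,) confidences.
    pred_class_prob:          (M, K) predicted class probabilities.
    true_class_onehot:        (M, K) one-hot true categories.
    pred_counts, true_counts: (K,) predicted / true number of spots per category.
    """
    # Position loss: the scale factor (2 - w*h) gives small boxes more weight.
    scale = 2.0 - true_boxes[:, 2] * true_boxes[:, 3]
    pos_loss = lambda_obj * np.sum(scale[:, None] * (true_boxes - pred_boxes) ** 2)

    # Class loss (squared error over class probabilities -- an assumption,
    # since the exact form appears only as an image in the original).
    class_loss = np.sum((true_class_onehot - pred_class_prob) ** 2)

    # Confidence loss.
    conf_loss = alpha * np.sum((true_conf - pred_conf) ** 2)

    # Quantity loss: constrains the predicted number of spots per category.
    count_loss = beta * np.sum((true_counts - pred_counts) ** 2)

    return pos_loss + class_loss + conf_loss + count_loss
```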
In some embodiments, the performing resolution enhancement processing and resizing on each candidate region image to obtain a plurality of training images with consistent sizes includes:
acquiring a target candidate region image and the size of a target training image, wherein the target candidate region image is any candidate region image in the candidate region images, and the target training image is obtained by mapping the target candidate region image;
acquiring a second pixel coordinate corresponding to a target pixel point in the target candidate region image according to a first pixel coordinate of the target pixel point in the target training image, the size of the target candidate region image and the size of the target training image, wherein the target pixel point is any pixel point in the target training image;
acquiring a plurality of neighborhood pixel points of the second pixel coordinate, wherein the neighborhood pixel points are pixel points adjacent to the second pixel coordinate in the target candidate region image;
obtaining the weight of each neighborhood pixel point according to the position relation between the second pixel coordinate and the plurality of neighborhood pixel points;
and determining the pixel value of the target pixel point according to the weight of each neighborhood pixel point and the pixel value of each neighborhood pixel point, so as to obtain the target training image.
In order to solve the above technical problem, in a second aspect, an embodiment of the present invention provides a method for detecting color spots, including:
acquiring a face image to be detected;
intercepting at least one target face local image possibly having color spots from the face image to be detected;
detecting the at least one target face local area image by using the color spot detection model of the first aspect, and acquiring the position, the category and the number of color spots of the at least one target face local area image;
and determining the position, the category and the number of the color spots in the face image to be detected based on the position, the category and the number of the color spots of the at least one target face local area image and the position of the target face local area image in the face image to be detected.
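As a rough illustration of this detection flow (not the patented implementation), the sketch below assumes a cropping helper and a trained model are available as Python callables; all names and the returned data structures are hypothetical.

```python
def detect_color_spots(face_image, stain_model, crop_regions):
    """Hypothetical end-to-end flow: crop regions, detect per region, map back.

    crop_regions(face_image) is assumed to yield (region_image, (x_off, y_off))
    pairs; stain_model(region_image) is assumed to return detections as dicts
    with "box", "category" and "score" keys.
    """
    results = []
    for region_image, (x_off, y_off) in crop_regions(face_image):
        for det in stain_model(region_image):
            x, y, w, h = det["box"]
            results.append({
                "box": (x + x_off, y + y_off, w, h),  # back to full-image coordinates
                "category": det["category"],
                "score": det["score"],
            })
    # Count spots per category over the whole face image.
    counts = {}
    for det in results:
        counts[det["category"]] = counts.get(det["category"], 0) + 1
    return results, counts
```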
In order to solve the above technical problem, in a third aspect, an embodiment of the present invention provides an electronic device, including:
at least one processor, and
a memory communicatively coupled to the at least one processor, wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of the first aspect as described above and the method of the second aspect as described above.
In order to solve the above technical problem, in a fourth aspect, an embodiment of the present invention provides a non-transitory computer-readable storage medium storing computer-executable instructions for causing an electronic device to perform the method according to the first aspect and the method according to the second aspect.
The embodiment of the invention has the following beneficial effects. Different from the prior art, the method and related device for training a color spot detection model provided by the embodiment of the invention crop the images in the image sample set into candidate area images, which reduces the interference of regions without color spots (such as background and facial features) on model training and thus improves the accuracy of model training; resolution enhancement and resizing are applied to each candidate area image containing color spots to obtain multiple training images of consistent size, where the resizing effectively avoids interference of image size with model learning and the resolution enhancement strengthens the color spot features, so the training precision can be further improved; in addition, the number of color spots is introduced during training to constrain the color spot positions, so that the predicted color spot positions are trained toward the true color spot positions, further improving the training precision; these multiple improvements in training precision ensure that the color spot detection model obtained by this method has higher detection accuracy. In addition, because the images in the image sample set are cropped into candidate area images whose size is smaller than that of the original images, the amount of computation during training is reduced and the training and inference speed of the model is improved, which helps to obtain a color spot detection model with a small amount of computation and high detection precision.
Drawings
One or more embodiments are illustrated by way of example in the accompanying drawings, which correspond to the figures in which like reference numerals refer to similar elements and which are not to scale unless otherwise specified.
FIG. 1 is a schematic diagram of an operating environment of a method for training a color spot detection model according to an embodiment of the present invention;
fig. 2 is a schematic structural diagram of an electronic device according to an embodiment of the present invention;
FIG. 3 is a flowchart illustrating a method for training a mottle detection model according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a candidate region image according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a color spot detection model according to an embodiment of the present invention;
FIG. 6 is a schematic flow chart illustrating a sub-process of step S24 in the method of FIG. 3;
FIG. 7 is a schematic flow chart illustrating a sub-process of step S23 in the method of FIG. 3;
FIG. 8 is a diagram illustrating a neighborhood pixel provided in accordance with an embodiment of the present invention;
fig. 9 is a flowchart illustrating a method for detecting color spots according to an embodiment of the invention.
Detailed Description
The present invention will be described in detail below with reference to specific examples. The following examples will assist those skilled in the art in further understanding the invention, but are not intended to limit the invention in any way. It should be noted that variations and modifications can be made by persons skilled in the art without departing from the spirit of the invention, all of which fall within the scope of the present invention.
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
It should be noted that, where they do not conflict, the various features of the embodiments of the invention may be combined with each other within the scope of protection of the present application. In addition, although functional modules are divided in the device schematics and logical sequences are shown in the flowcharts, in some cases the steps shown or described may be performed with a different module division or in a different order than in the flowcharts. Further, the terms "first", "second", "third" and the like used herein do not limit the data or the execution order, but merely distinguish identical or similar items having substantially the same functions and effects.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
Fig. 1 is a schematic operating environment diagram of a method for training a color spot detection model according to an embodiment of the present invention. Referring to fig. 1, the electronic device 10 and the image capturing apparatus 20 are included, and the electronic device 10 and the image capturing apparatus 20 are connected in a communication manner.
The communication connection may be a wired connection, for example a fiber-optic cable, or a wireless communication connection, such as a WIFI connection, a Bluetooth connection, a 4G wireless communication connection, a 5G wireless communication connection, and so on.
The image acquisition apparatus 20 is configured to acquire an image sample set including human faces, and may also be configured to acquire a face image to be detected. The image acquisition apparatus 20 may be a terminal capable of capturing images, for example a mobile phone, a tablet computer, a video recorder, or a camera with a shooting function.
The electronic device 10 is a device capable of automatically processing mass data at high speed according to a program, and is generally composed of a hardware system and a software system, for example: computers, smart phones, and the like. The electronic device 10 may be a local device, which is directly connected to the image capturing apparatus 20; it may also be a cloud device, for example: a cloud server, a cloud host, a cloud service platform, a cloud computing platform, etc., the cloud device is connected to the image acquisition apparatus 20 through a network, and the two are connected through a predetermined communication protocol, which may be TCP/IP, NETBEUI, IPX/SPX, etc. in some embodiments.
It can be understood that: the image capturing device 20 and the electronic apparatus 10 may also be integrated together as an integrated apparatus, such as a computer with a camera or a smart phone.
The electronic device 10 receives the image sample set including the face sent by the image acquisition device 20, trains the image sample set to obtain a color spot detection model, and detects the color spot position and the type of the face image to be detected sent by the image acquisition device 20 by using the color spot detection model. It is understood that the above-mentioned training of the color spot detection model and the detection of the face image to be detected can also be performed on different electronic devices.
On the basis of fig. 1, another embodiment of the present invention provides an electronic device 10. Please refer to fig. 2, which is a hardware structure diagram of the electronic device 10 according to the embodiment of the present invention. Specifically, as shown in fig. 2, the electronic device 10 includes at least one processor 11 and a memory 12 that are communicatively connected (in fig. 2, a bus connection and a single processor are taken as an example).
The processor 11 is configured to provide computing and control capabilities to control the electronic device 10 to perform corresponding tasks, for example, control the electronic device 10 to perform any one of the methods for training the color spot detection model provided in the embodiments of the invention described below or any one of the methods for detecting color spots provided in the embodiments of the invention described below.
It is understood that the processor 11 may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components.
The memory 12, which is a non-transitory computer readable storage medium, can be used to store non-transitory software programs, non-transitory computer executable programs, and modules, such as program instructions/modules corresponding to the method for training a stain detection model in the embodiments of the present invention, or program instructions/modules corresponding to the method for detecting a stain in the embodiments of the present invention. The processor 11 may implement the method of training the stain detection model in any of the method embodiments described below, and may implement the method of detecting a stain in any of the method embodiments described below, by running non-transitory software programs, instructions, and modules stored in the memory 12. In particular, the memory 12 may include high speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, memory 12 may also include memory located remotely from the processor, which may be connected to the processor via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
In the following, a method for training a color spot detection model according to an embodiment of the present invention is described in detail, referring to fig. 3, the method S20 includes, but is not limited to, the following steps:
s21: a sample set of images including a human face is acquired.
S22: at least one face local area image possibly having color spots is respectively intercepted from each image in the image sample set to obtain a plurality of candidate area images.
S23: and performing resolution enhancement processing and size adjustment on each candidate region image to acquire a plurality of training images with consistent sizes, wherein the resolution of the training images is higher than that of the candidate region images.
S24: the method comprises the steps of taking a plurality of training images marked with real labels as a training set, training a preset convolutional neural network to enable the preset convolutional neural network to learn the real labels of the plurality of training images so as to obtain a color spot detection model, wherein the real labels of a target training image comprise the real positions, real types and real numbers of color spots in the target training image, and the target training image is any one of the plurality of training images.
The images in the image sample set include human faces, and can be acquired by the image acquisition device, for example, the image sample set can be a certificate photo or a self-portrait photo acquired by the image acquisition device. It is to be understood that the image sample set may also be data in an existing open source face database, wherein the open source face database may be a FERET face database, a CMU Multi-PIE face database, or a YALE face database, etc. Here, the source of the image sample is not limited as long as the image sample includes a human face.
It can be understood that the images in the image sample set include a human face and a background, while color spots can only exist in local facial regions. As a result, the color spot features are not obvious relative to the features of regions that contain no color spots (such as background features and features of the facial organs), which leads to a poor color spot detection effect. In order to reduce the interference of regions without color spots on color spot detection and to reduce the training time of the subsequent algorithm model, at least one face local area image that may contain color spots is cropped from each image in the image sample set to obtain a plurality of candidate region images. For example, the face local area image may be a forehead area, a left face area, a right face area or a nose area, which are areas where color spots may be present. That is, the candidate region image may include at least one of the forehead area, the left face area, the right face area or the nose area of the image, so that the regions where color spots may exist are retained, the background in the image is removed, and the interference of background features on model training is eliminated; in addition, splitting the face into parts reduces the interference of facial organ features on model training. It is understood that the face region images may be divided in other ways, such as a T region, a cheek region or a chin region, in which case the candidate region image includes at least one of the T region, the cheek region or the chin region.
Each image in the image sample set can be cropped using an existing face key point algorithm, which may be Active Appearance Models (AAM), Constrained Local Models (CLM), Explicit Shape Regression (ESR), the Supervised Descent Method (SDM), or the like.
As shown in fig. 4, a plurality of key points of the face are located according to the face key point algorithm, including points in the eyebrow, eye, nose, mouth and face contour areas. Based on these key points, the face local area images can be determined, and the candidate area images are then cropped according to the definition of the face local area images. In fig. 4, the face local area images are illustrated as the forehead area, the left face area, the right face area and the nose area; in this embodiment, 4 face local area images can be cropped from one image to obtain 4 candidate area images.
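A hedged sketch of this cropping step is shown below; the landmark layout and the region boundaries expressed as fractions of the face box are illustrative assumptions, not the patent's exact definitions.

```python
import numpy as np

def crop_face_regions(image, landmarks):
    """Sketch: cut forehead, left-face, right-face and nose regions from a face.

    `landmarks` is assumed to be an (N, 2) array of (x, y) facial key points,
    e.g. a 68-point layout; the fractional boundaries below are illustrative.
    """
    lm = np.asarray(landmarks, dtype=float)
    x_min, y_min = lm.min(axis=0)
    x_max, y_max = lm.max(axis=0)
    face_h, face_w = y_max - y_min, x_max - x_min

    def crop(x0, y0, x1, y1):
        h, w = image.shape[:2]
        x0, x1 = max(int(x0), 0), min(int(x1), w)
        y0, y1 = max(int(y0), 0), min(int(y1), h)
        return image[y0:y1, x0:x1]

    return {
        # Forehead: strip above the detected key points.
        "forehead": crop(x_min, y_min - 0.3 * face_h, x_max, y_min + 0.1 * face_h),
        # Left / right face: halves below eye level.
        "left_face": crop(x_min, y_min + 0.3 * face_h, x_min + 0.5 * face_w, y_max),
        "right_face": crop(x_min + 0.5 * face_w, y_min + 0.3 * face_h, x_max, y_max),
        # Nose: central column of the face box.
        "nose": crop(x_min + 0.35 * face_w, y_min + 0.2 * face_h,
                     x_min + 0.65 * face_w, y_min + 0.7 * face_h),
    }
```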
It is understood that the sizes of the candidate area images differ from one another, and some color spots are not obvious, for example color spots with lighter colors. In order to eliminate the interference of the size of the training images on model learning and to enhance the color spot features, resolution enhancement and resizing are performed on each candidate area image to obtain a plurality of training images of consistent size, where the resolution of each training image is higher than that of the corresponding candidate area image. In this way the sizes of the training images are unified, for example to 416 × 416 × 3, so that the interference of size on model learning is eliminated; in addition, the resolution enhancement performed while unifying the sizes strengthens the color spot features.
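As a simple illustration of unifying the size (the interpolation-based enhancement itself is detailed later, in steps S231-S235), one could resize each candidate region with OpenCV; the use of cv2.resize with cubic interpolation here is an assumption for demonstration only.

```python
import cv2

def to_training_image(candidate_region, size=(416, 416)):
    """Resize a candidate region image to a fixed training size.

    Cubic interpolation is used as a simple stand-in for the resolution
    enhancement step; the 416x416 target size follows the example in the text.
    """
    return cv2.resize(candidate_region, size, interpolation=cv2.INTER_CUBIC)
```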
A plurality of training images labeled with real labels are used as the training set. For any one of the training images, i.e. a target training image, its real label includes the real positions, real categories and real number of the color spots in the target training image. The real number of color spots in the target training image is the number of real color spot positions in it, and can be determined by counting those positions. The category of a color spot is its kind; for example, the categories may include at least one of freckles, black spots, chloasma or senile plaques, and the specific categories can be set to one or more classes according to actual needs. It will be appreciated that the real labels may be marked manually using existing marking tools. Since the real number of color spots in the target training image can be obtained by counting, in some embodiments the real number may not be labeled, and only the real positions and categories of the color spots in the target training image are labeled.
The plurality of training images labeled with real labels are used as the training set to train the preset convolutional neural network, so that the preset convolutional neural network learns the real labels of the training images and the color spot detection model is obtained. After the preset convolutional neural network has learned the features and real labels of the training images in the training set, it predicts the positions, categories and number of color spots in each training image to obtain a prediction result; the error between the prediction result and the real labels is then calculated through the preset loss function, and the model parameters of the preset convolutional neural network are adjusted in reverse according to the error. A color spot detection model with good accuracy is obtained through repeated iterative training.
In this way, the number of color spots is introduced during training to constrain the color spot positions, so that the number of predicted color spot positions is trained toward the real number of color spots, that is, the predicted color spot positions are trained toward the real color spot positions, thereby improving the accuracy of the color spot detection model. For example, if the predicted number of color spots is 4 while only 3 color spot positions are predicted, the deviation of the predicted color spot positions is large and the error of the whole model is large; since the model is trained in the direction of smaller error, the number of predicted color spot positions is trained toward the real number of color spots, so that the color spot detection model has high accuracy.
In summary, by cropping the images in the image sample set into candidate area images, the method reduces the interference of regions without color spots (such as background and facial features) on model training, so the accuracy of model training can be improved; resolution enhancement and resizing are applied to each candidate area image containing color spots to obtain multiple training images of consistent size, where the resizing effectively avoids interference of image size with model learning and the resolution enhancement strengthens the color spot features, so the training precision can be further improved; in addition, the number of color spots is introduced during training to constrain the color spot positions, so that the predicted color spot positions are trained toward the real color spot positions, further improving the training precision; these multiple improvements in training precision ensure that the color spot detection model obtained by this method has higher detection accuracy. In addition, because the images in the image sample set are cropped into candidate area images whose size is smaller than that of the original images, the amount of computation during training is reduced and the training and inference speed of the model is improved, which helps to obtain a color spot detection model with a small amount of computation and high detection precision.
In some embodiments, referring to fig. 5, the preset convolutional neural network includes a feature extraction module, a mottle target detection module and a mottle number detection module, wherein the feature extraction module is configured to extract features of an image, the mottle target detection module is configured to detect a position and a category of a mottle in the image, and the mottle number detection module is configured to detect a number of the mottles in the image.
Based on the above feature extraction module, the color spot target detection module and the color spot number detection module, referring to fig. 6, the step S24 specifically includes:
s241: and inputting the target training image into the feature extraction module to obtain training feature maps of at least two sizes.
S242: and dividing the training characteristic diagram into two paths to obtain a first path of training characteristic image and a second path of training characteristic image.
S243: and inputting the first path of training characteristic image into the color spot target detection module to obtain the predicted position and the predicted category of the color spot in the target training image.
S244: and inputting the second path of training feature map into the color spot quantity detection module to obtain the predicted quantity of the color spots in the target training image.
S245: and calculating an error between a predicted label of the target training image and a real label of the target training image according to a preset loss function, wherein the predicted label comprises the predicted position, the predicted category and the predicted quantity.
S246: and adjusting the model parameters of the preset convolutional neural network according to the error, and returning to execute the step S241 until the preset convolutional neural network is converged to obtain the color spot detection model.
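The sketch below outlines one way steps S241-S246 could look as a single PyTorch-style training iteration; the module names, their architectures and the structure of `target` are placeholders, not the patent's actual code.

```python
def train_step(feature_extractor, spot_detector, count_head, loss_fn,
               optimizer, image, target):
    """One hedged training iteration following steps S241-S246.

    feature_extractor, spot_detector, count_head and loss_fn are assumed to be
    PyTorch-style callables; optimizer is a PyTorch optimizer over all their
    parameters.
    """
    optimizer.zero_grad()

    # S241: multi-scale training feature maps.
    feature_maps = feature_extractor(image)          # e.g. 13x13, 26x26, 52x52

    # S242: the same feature maps feed two branches ("two paths").
    # S243: predicted positions and categories of color spots.
    pred_boxes, pred_classes, pred_conf = spot_detector(feature_maps)
    # S244: predicted number of color spots per category.
    pred_counts = count_head(feature_maps)

    # S245: error between the predicted label and the real label.
    loss = loss_fn(pred_boxes, pred_classes, pred_conf, pred_counts, target)

    # S246: adjust the model parameters according to the error.
    loss.backward()
    optimizer.step()
    return loss.item()
```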
Inputting the training set into the feature extraction module for convolution feature processing, learning image features, such as shapes, edges and the like, of each training image (target training image) in the training set, and extracting the image features to obtain a training feature map.
In this embodiment, since the convolution feature processing reduces the spatial dimension and resolution of the image, a training feature map of a single size cannot satisfy multi-size target detection. In order to achieve detection of both large-size targets (large-area color spots) and small-size targets (small-area color spots), in this embodiment, a multi-scale training feature map is used for training, that is, the target training image is subjected to convolution feature processing by the feature extraction module, and a plurality of feature maps are output. And selecting at least two sizes of feature maps from the plurality of feature maps as training feature maps for detecting the color spots.
In some embodiments, the feature extraction module includes a plurality of feature convolution layers, and the number of convolution kernels of the plurality of feature convolution layers tends to increase first and then decrease as the number of layers of the feature convolution layers increases. It can be understood that a feature convolution layer correspondingly outputs a feature map, and the more the number of convolution kernels of the feature convolution layer is, the stronger the feature extraction capability is, the more features in the corresponding feature map are, the further the feature map deviates from the image in the original training set. Therefore, in the shallow feature convolution layer, the number of convolution kernels of the plurality of feature convolution layers increases as the number of layers of the feature convolution layer increases. However, as the number of convolution kernels increases, the amount of model computation increases, making the model computation inefficient. In order to balance the computation efficiency of the model, in the deeper feature convolution layer, the number of convolution kernels is reduced along with the increase of the layer number of the feature convolution layer, so that the computation amount of the model is reduced, and the computation efficiency is improved.
In one embodiment, the feature extraction module includes 16 feature convolution layers and 5 pooling layers, wherein the pooling layers are used for dimensionality reduction and are located behind the feature convolution layers needing dimensionality reduction. The overall structure of the feature extraction module is shown in table 1 below:
TABLE 1 Structure of the feature extraction module (the table is reproduced as an image in the original publication)
The number of convolution kernels of feature convolution layers 1-7 is gradually increased to acquire more features, and the number of convolution kernels of feature convolution layers 8-16 is reduced to lower the amount of computation of the model, thereby balancing feature extraction and training efficiency.
In this embodiment, the feature maps output by the 10th, 13th and 16th layers are selected as the training feature maps. That is, the training feature maps fed into the model are of sizes 13 × 13 × N, 26 × 26 × N and 52 × 52 × N. The 13 × 13 × N training feature map is suitable for detecting large-size color spots, the 26 × 26 × N training feature map is suitable for detecting medium-size color spots, and the 52 × 52 × N training feature map is suitable for detecting small-size color spots.
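For orientation, the following PyTorch sketch shows a backbone whose convolution-kernel counts first increase and then decrease and which returns feature maps at the three scales (52, 26 and 13 for a 416 × 416 input); since Table 1 is only available as an image, every layer and channel count here is an assumption rather than the patented configuration.

```python
import torch.nn as nn

class FeatureExtractor(nn.Module):
    """Illustrative backbone: channel counts grow then shrink, and three
    feature maps are returned for multi-size color spot detection."""

    def __init__(self, num_classes=4):
        super().__init__()

        def block(cin, cout, pool=False):
            layers = [nn.Conv2d(cin, cout, 3, padding=1),
                      nn.BatchNorm2d(cout), nn.LeakyReLU(0.1)]
            if pool:
                layers.append(nn.MaxPool2d(2))
            return nn.Sequential(*layers)

        self.stem = nn.Sequential(                 # kernel counts increasing
            block(3, 16, pool=True),               # 416 -> 208
            block(16, 32, pool=True),              # 208 -> 104
            block(32, 64, pool=True),              # 104 -> 52
        )
        self.mid = block(64, 128, pool=True)       # 52 -> 26
        self.deep = block(128, 64, pool=True)      # 26 -> 13, counts decreasing
        # Output layers whose channel count equals the number of categories (N).
        self.out52 = nn.Conv2d(64, num_classes, 1)
        self.out26 = nn.Conv2d(128, num_classes, 1)
        self.out13 = nn.Conv2d(64, num_classes, 1)

    def forward(self, x):
        f52 = self.stem(x)     # 52 x 52 feature map
        f26 = self.mid(f52)    # 26 x 26 feature map
        f13 = self.deep(f26)   # 13 x 13 feature map
        return self.out13(f13), self.out26(f26), self.out52(f52)
```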
In some embodiments, the plurality of feature convolution layers includes a target feature convolution layer having a number of convolution kernels that is a number of categories of the color patch. Referring to fig. 5 again, the target feature convolution layer is a feature convolution layer connected to the pigment spot target detection module and used for outputting the training feature map. For example, if there are 4 types of the color spots, in the above embodiment, the 10 th layer, the 13 th layer, and the 16 th layer are respectively connected to the color spot target detection module, and are used to output a training feature map, where N is set to 4, so as to specify a target finally output by network training, that is, a type and a position of the color spot, and further, in the training process of the network, the features learned by the network can be continuously adjusted according to a principle of minimizing a preset loss function, so as to learn the features belonging to the color spots in the entire training feature image.
The training feature map is divided into two paths to obtain a first training feature map and a second training feature map, and it can be understood that the first training feature map and the second training feature map are both obtained by copying the training feature maps, and the two training feature maps are completely the same.
And inputting the first path of training feature map into the color spot target detection module to obtain the predicted position and the predicted category of the color spot in the target training image. And inputting the second path of training feature map into the color spot quantity detection module to obtain the predicted quantity of the color spots in the target training image. For example, the training feature map of the target training image a is copied and divided into two paths, resulting in a first path of training feature map a1 and a second path of training feature map a 2. Then, the first training feature map a1 is input into the mottle target detection module, so that the mottle target detection module learns the features of the first training feature map a1, and predicts the location and the category of the mottle in the first training feature map a1 to obtain the predicted location and the predicted category of the mottle in the corresponding target training image a. Inputting the second road training feature map A2 into the color spot quantity detection module, so that the color spot quantity detection module learns the features of the second road training feature map A2, and predicts the number of color spots in the second road training feature map A2 to obtain the predicted number of color spots in the corresponding target training image A.
Based on the fact that the first training feature map a1 and the second training feature map a2 are completely the same and correspond to the same target training image a, the predicted position and the predicted category of the color spots predicted by the first training feature map a1 and the predicted number of the color spots predicted by the second training feature map a2 jointly form a predicted label of the target training image a. Thus, for the same target training image a, the location and class of the mottle in its prediction label are detected separately from the number in its prediction label, without correlation, and thus the prediction number can constrain the number of prediction locations.
Specifically, for any target training image in the training set, there are training feature maps of 3 corresponding sizes: 13 × 13 × N, 26 × 26 × N and 52 × 52 × N. The 13 × 13 × N training feature map is divided into 13 × 13 grids, the 26 × 26 × N training feature map into 26 × 26 grids, and the 52 × 52 × N training feature map into 52 × 52 grids. Each grid is provided with 3 prior boxes, and each prior box generates a corresponding prediction box, giving 13 × 13 × 3 + 26 × 26 × 3 + 52 × 52 × 3 = 10647 prediction boxes in total. The prior boxes are rectangular boxes of different predefined sizes and aspect ratios at each position in the training feature map; they are used to roughly frame the likely locations of the color spots first. The prior boxes are mapped and transformed in the prediction convolution layer of the color spot target detection module, their positions are adjusted, and prediction boxes corresponding one-to-one to the prior boxes are output. A prediction box reflects the color spot condition of the region of the target training image onto which its grid in the training feature map is mapped, including a predicted position (namely the position of the prediction box), a predicted category, the confidence of the predicted position and the confidence of the predicted category. For example, if the position of prediction box 1 of grid B is (x1, y1, w1, h1), the predicted category is chloasma and the confidence of the predicted category is 90%, then in the region of the original training image mapped from grid B, the probability that the pixel region framed by prediction box 1 (x1, y1, w1, h1) is chloasma is 90%; in addition, if the confidence of prediction box 1 is 80%, it indicates that the closeness of prediction box 1 to the real box is 80%, where the real box is the position and type of the color spot in the real label.
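The total number of prediction boxes quoted above can be verified with a short computation (illustrative only):

```python
# Total prediction boxes over the three feature-map scales,
# with 3 prior boxes per grid cell as described above.
grid_sizes = [13, 26, 52]
priors_per_cell = 3
total_boxes = sum(s * s * priors_per_cell for s in grid_sizes)
print(total_boxes)  # 507 + 2028 + 8112 = 10647
```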
In addition, the second path of training feature map is input into the color spot quantity detection module, which learns the features in the second path of training feature map and generates a regression relation between the features and the quantity. This regression relation fits the relation between the features and the real quantity as closely as possible, so that the error between the quantity predicted from the features through the regression relation and the real quantity is minimized.
During the training process, the model sets an appropriate confidence level for the predicted positions according to the predicted number, for example, when the predicted number is 4, if the predicted positions are 3, the predicted positions have a low confidence level, and if the predicted positions are 4, the predicted positions have a high confidence level.
Then, the error between the predicted label and the real label is calculated according to the preset loss function, i.e. the error between each prediction box and the real box is calculated. It is understood that, when calculating the error between the prediction boxes and the real box, all prediction boxes corresponding to the target training image may be involved in the error calculation.
Finally, the preset convolution neural network can reversely adjust the model parameters according to the errors, and after new model parameters are determined, the color spot detection model can be obtained.
In some embodiments, the adam algorithm may be used to optimize the model parameters, the number of iterations may be set to 500, the initial learning rate is set to 0.001, the weight decay is set to 0.0005, and the learning rate is decayed to 1/10 of its current value during training. After training, the model parameters of the color spot detection model are output, i.e. the color spot detection model is obtained.
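A possible PyTorch setup matching these hyper-parameters is sketched below; the schedule used to cut the learning rate to 1/10 (a StepLR with an assumed step size) and the stand-in model are illustrative assumptions.

```python
import torch

# Illustrative optimizer setup: Adam, initial learning rate 0.001,
# weight decay 0.0005, 500 iterations. The point at which the learning rate
# is cut to 1/10 is not specified in the text, so a fixed StepLR schedule is
# used here purely as an example.
model = torch.nn.Linear(10, 4)  # stand-in for the color spot detection network
optimizer = torch.optim.Adam(model.parameters(), lr=0.001, weight_decay=0.0005)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=200, gamma=0.1)

for iteration in range(500):
    optimizer.zero_grad()
    loss = model(torch.randn(8, 10)).pow(2).mean()  # dummy loss for illustration
    loss.backward()
    optimizer.step()
    scheduler.step()
```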
In this embodiment, on one hand, the multi-size training feature map is used for detection, so that the influence of the spatial dimension of the image on the detection result can be eliminated, the detection of color spots of various sizes can be realized, and the detection precision is high. On the other hand, the training characteristic diagram is divided into two paths, wherein one path is used for learning and predicting the positions and the types of the color spots, and the other path is used for learning the predicted quantity of the color spots, so that the predicted quantity of the color spots and the predicted positions are independent and unrelated, the quantity and the confidence coefficient of the predicted positions can be more accurately constrained by the predicted quantity, and the quantity of the color spots is trained in the direction equal to the real quantity of the color spots.
In some embodiments, the preset loss function is a weighted sum of a location loss function for calculating an error between the real location and the predicted location, a class loss function for calculating an error between the real class and the predicted class, and a quantity loss function for calculating an error between the real quantity and the predicted quantity. Namely, the predicted quantity and the real quantity are fused to the loss function, and the parameter error is calculated, so that the error between the real label and the predicted label is more accurate. The error is used for reversely adjusting the model parameters, so that the model parameters are more reasonable, and the accuracy of the color spot detection model can be improved.
In some embodiments, the step S245 specifically includes:
calculating the error between the predicted label and the real label according to the following formula (the original formula appears as an image; it is the sum of the position, class, confidence and quantity loss terms defined below):

loss = λ_obj · Σ_{i=1}^{M} (2 − T_width·T_height) · Σ_{r∈(x,y,w,h)} (T_r − P_r)² + L_class(T_class, P_class) + α · (T_conf − P_conf)² + β · Σ_{c=1}^{K} (T_num^c − P_num^c)²

where λ_obj is a preset position offset weight, α is a confidence loss weight, β is a quantity weight, M is the preset number of prediction boxes used to predict the color spot positions, i is the index of a prediction box, T_width is the width of the prediction box, T_height is the height of the prediction box, (x, y, w, h) are the top-left coordinates and the width and height of the prediction box, T_r indicates the true position of the color spot, P_r indicates the predicted position of the color spot, K is the number of color spot categories, T_class is the true category of the color spot, P_class is the predicted category of the color spot, T_conf is the confidence of the real label, P_conf is the confidence of the predicted label, T_num^c is the true number of color spots of the c-th category among the K categories, and P_num^c is the predicted number of color spots of the c-th category among the K categories.
During the training process, for the target training image, the training feature map of the e-th size is divided into S_e × S_e grids, each grid is provided with f prior boxes, and each prior box generates a corresponding prediction box through the network, so that the total number of prediction boxes finally formed is

M = Σ_{e=1}^{n} S_e × S_e × f,

where n is the total number of sizes of the training feature maps. It can be understood that the loss function involves the prediction results reflected by all of these prediction boxes in the error calculation, so that the error reflects the difference between all predicted labels and the real labels; in this way, the color spot detection model obtained by adjusting the model parameters in reverse according to this error has higher accuracy.
Among the above loss terms, the position loss function is

L_position = λ_obj · Σ_{i=1}^{M} (2 − T_width·T_height) · Σ_{r∈(x,y,w,h)} (T_r − P_r)²,

where (T_r − P_r)² is the error between the predicted position reflected by one of the M prediction boxes and the true position of the color spot, and (2 − T_width·T_height) is a scale factor used to increase the weight of the position loss. With r ∈ (x, y, w, h), the inner sum Σ_{r∈(x,y,w,h)} (T_r − P_r)² is equivalent to

(x_i − x̂_i)² + (y_i − ŷ_i)² + (w_i − ŵ_i)² + (h_i − ĥ_i)²,

where (x_i, y_i) are the top-left coordinates of the prediction box, (x̂_i, ŷ_i) are the top-left coordinates of the real box (the true position), w_i is the width of the prediction box, ŵ_i is the width of the real box, h_i is the height of the prediction box, and ĥ_i is the height of the real box. It can be seen that (T_r − P_r)² reflects the error between the predicted position and the true position.
Thus, the position loss function constrains the relationship between the predicted position (prediction box) output by the preset convolutional neural network and the true position (real box) in the real label, i.e. it minimizes the error between the predicted position P_r and the true position T_r, so that the color spot position (prediction box) output by the preset convolutional neural network continuously approaches the color spot position (real box) in the real label, thereby optimizing the model parameters.
The class loss function (its formula appears as an image in the original publication) is computed from the true category T_class and the predicted category P_class over the K color spot categories: when the predicted class P_class is the true class T_class, the class loss value is 1; when the predicted class P_class is not the true class T_class, the class loss value is the probability of class P_class. Therefore, the class loss function constrains the relationship between the probability of the color spot category output by the preset convolutional neural network and the true probability of the color spot category in the real label, i.e. it minimizes the error between the true probability of the color spot category in the real label and the probability of the output predicted category, so that the probability of the predicted category output by the preset convolutional neural network continuously approaches the true probability of the color spot category in the real label, thereby optimizing the model parameters.
The quantity loss function is

L_quantity = β · Σ_{c=1}^{K} (T_num^c − P_num^c)²,

where β is a quantity weight; the degree to which the quantity constrains the predicted positions can be adjusted by adjusting β. For the target training image, the color spot quantity detection module predicts the number P_num^c of color spots of the c-th category; this predicted number is compared with the real number T_num^c (their difference is calculated), and the differences over all color spot categories are accumulated to obtain the loss function reflecting the quantity loss. Therefore, the quantity loss constrains the relationship between the predicted quantity output by the preset convolutional neural network and the real quantity in the real label, i.e. it minimizes the error between the real quantity and the predicted quantity, so that the number of predicted positions is accurate and the predicted color spot positions are trained toward the real color spot positions, thereby optimizing the model parameters.
It can be understood that the above-mentioned loss function also includes a confidence loss function, which is $\alpha\cdot(T_{conf}-P_{conf})^2$, wherein $T_{conf}$ is the confidence of the real label and $P_{conf}$ is the confidence of the predicted label. When $P_{conf}$ of a prediction frame corresponding to the target training image is low, it indicates that the prediction frame contains no color spots or few color spots, its difference from $T_{conf}$ is large, and this difference is counted into the loss. The model parameters are adjusted in reverse through the loss, so that the prediction frame output by the preset convolutional neural network continuously approaches the real frame, the error between the predicted label and the real label is reduced, the model parameters reach the optimal solution, and the color spot detection model is obtained.
In this embodiment, the error calculated by the preset loss function is propagated backwards through gradient back propagation and the model parameters are adjusted accordingly, so that the predicted label continuously approaches the real label, thereby improving the accuracy of the color spot detection model.
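For readers unfamiliar with how the calculated error is used to adjust the model parameters, the fragment below sketches one training step in PyTorch; the model, the preset_loss function and the optimizer are placeholders assumed here for illustration and are not taken from the patent.

```python
import torch

def train_step(model, optimizer, images, labels, preset_loss):
    """One gradient update: forward pass, preset loss, gradient back
    propagation, model parameter adjustment."""
    optimizer.zero_grad()
    predictions = model(images)           # predicted labels (position, category, number)
    loss = preset_loss(predictions, labels)
    loss.backward()                       # gradient back propagation
    optimizer.step()                      # adjust model parameters
    return loss.item()
```

In practice this step is repeated over the training set until the preset convolutional neural network converges.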
In some embodiments, referring to fig. 7, the step S23 specifically includes:
S231: And acquiring a target candidate region image and the size of a target training image, wherein the target candidate region image is any candidate region image in the candidate region images, and the target training image is obtained by mapping the target candidate region image.
S232: and acquiring a second pixel coordinate corresponding to a target pixel point in the target candidate region image according to a first pixel coordinate of the target pixel point in the target training image, the size of the target candidate region image and the size of the target training image, wherein the target pixel point is any pixel point in the target training image.
S233: and acquiring a plurality of neighborhood pixel points of the second pixel coordinate, wherein the neighborhood pixel points are pixel points which are adjacent to the second pixel coordinate in the target candidate region image.
S234: and obtaining the weight of each neighborhood pixel point according to the position relation between the second pixel coordinate and the plurality of neighborhood pixel points.
S235: And determining the pixel value of the target pixel point according to the weight of each neighborhood pixel point and the pixel value of each neighborhood pixel point so as to obtain the target training image.
The target candidate region image is any one of the candidate region images; the target candidate region image is acquired, and a target training image is generated based on the target candidate region image, wherein the size of the target training image is preset and can be specifically set according to actual conditions. Therefore, once the size of the target training image is determined, the pixel value of each pixel point of the new target training image is obtained through mapping transformation based on the pixel values of the pixel points in the original target candidate region image.
And for any pixel point in the target training image, namely a target pixel point, a corresponding second pixel coordinate of the target pixel point in the target candidate region image is acquired according to a first pixel coordinate of the target pixel point in the target training image. It can be understood that the first pixel coordinate is the position of the target pixel point in the target training image and may be described by (row, column), and the second pixel coordinate is the position to which the target pixel point is mapped in the target candidate region image and may also be described by (row, column). The second pixel coordinate is obtained by conversion from the first pixel coordinate, the size of the target candidate region image and the size of the target training image. For example, assuming that the size of the target candidate region image A is M × N and the size of the target training image B is m × n, then according to the size ratio between the target candidate region image and the target training image, the second pixel coordinate in the target candidate region image A corresponding to the target pixel point B(X, Y) in the target training image B may be obtained as A(X × (M/m), Y × (N/n)). It can be understood that X × (M/m) and Y × (N/n) in the second pixel coordinate may be integers or fractions.
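A short sketch of the coordinate mapping just described, under the assumption that both images index pixels from the top-left corner; the function name and the example sizes are illustrative only.

```python
def map_to_candidate(x, y, cand_size, train_size):
    """Map a first pixel coordinate (x, y) in the target training image
    (size m x n) to the second pixel coordinate in the target candidate
    region image (size M x N) using the size ratio; the result may be
    fractional."""
    M, N = cand_size
    m, n = train_size
    return x * (M / m), y * (N / n)

# hypothetical example: candidate region image 120 x 90, training image 64 x 64
print(map_to_candidate(10, 20, cand_size=(120, 90), train_size=(64, 64)))
```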
As shown in fig. 8, the point P in the target candidate region image A is the position, in the target candidate region image A, to which the point (X, Y) of the target training image B is mapped, that is, the coordinates of the point P are (X × (M/m), Y × (N/n)); thus, the point P is the position of the second pixel coordinate.
Obtaining the weight of each neighborhood pixel point according to the position relation between the second pixel coordinate and the plurality of neighborhood pixel points, namely determining the weight of each neighborhood pixel point $a_{ij}$ according to the positional relationship between the point P and $a_{ij}$, wherein the weight is the influence factor of the neighborhood pixel point on the pixel value of the target pixel point. The closer the neighborhood pixel point $a_{ij}$ is to the second pixel coordinate (the point P), the greater its influence on the pixel value of the target pixel point, and the greater the weight of the neighborhood pixel point $a_{ij}$. Based on this characteristic, the weight function can be constructed through mathematical modeling according to the position relation between the second pixel coordinate and the neighborhood pixel points.
It can be understood that, in some embodiments, the position relationship between the second pixel coordinate and each neighborhood pixel point can be divided into a row position relationship and a column position relationship, that is, the weight includes a row weight and a column weight. Thus, in some embodiments, the row weight and the column weight are calculated separately by the weight function. It can be understood that the contribution value of a neighborhood pixel point to the target pixel point is the pixel value of the neighborhood pixel point multiplied by its corresponding row weight and column weight, so that the pixel value of the target pixel point is the sum of the contribution values of the neighborhood pixel points; for example, the pixel value of the target pixel point B(X, Y) is the sum of the contribution values of the neighborhood pixel points $a_{00}$ to $a_{33}$, as sketched below.
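The following sketch shows one way such a weight-based mapping could look, using a 4 × 4 neighborhood ($a_{00}$ to $a_{33}$) and a cubic kernel as an assumed example of the "weight function constructed through mathematical modeling"; neither the kernel nor the function names come from the patent.

```python
import numpy as np

def cubic_weight(t, a=-0.5):
    """One possible weight function (a cubic convolution kernel); the text
    above only requires that closer neighborhood pixels get larger weights."""
    t = abs(t)
    if t <= 1:
        return (a + 2) * t**3 - (a + 3) * t**2 + 1
    if t < 2:
        return a * t**3 - 5 * a * t**2 + 8 * a * t - 4 * a
    return 0.0

def sample_pixel(img, px, py):
    """Pixel value of the target pixel whose second pixel coordinate is
    (px, py): weighted sum over the 4x4 neighborhood, using separate row
    and column weights."""
    x0, y0 = int(np.floor(px)), int(np.floor(py))
    value = 0.0
    for i in range(-1, 3):           # 4 neighborhood rows
        for j in range(-1, 3):       # 4 neighborhood columns
            r = np.clip(y0 + i, 0, img.shape[0] - 1)
            c = np.clip(x0 + j, 0, img.shape[1] - 1)
            # contribution = pixel value * row weight * column weight
            w = cubic_weight(py - (y0 + i)) * cubic_weight(px - (x0 + j))
            value += w * img[r, c]
    return value

# hypothetical 5x5 candidate region image, sampled at a fractional coordinate
img = np.arange(25, dtype=float).reshape(5, 5)
print(sample_pixel(img, px=1.6, py=2.3))
```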
And processing each pixel point in the target training image according to the mode of the target pixel point, so as to obtain the pixel value of each pixel point in the target training image, thereby obtaining the target training image.
In this embodiment, first, according to the size of the target candidate region image and the size of the target training image, a target pixel point is mapped to a second pixel coordinate point in the candidate region image, and then, according to pixel values and positions of a plurality of neighborhood pixel points of the second pixel coordinate, the pixel value of the target pixel point is determined, so that a boundary between a pixel point corresponding to a color spot and a pixel point corresponding to a surrounding non-color spot is clear, thereby enhancing a color spot characteristic to obtain a high-resolution target training image, and in addition, the sizes of the target training images can be unified.
In summary, in the method for training the color spot detection model, the images in the image sample set are intercepted into candidate region images, so that the interference of region features (such as background features and facial features) which do not contain color spots on the training model can be reduced in the model training process, and the model training accuracy can be improved; resolution enhancement processing and size adjustment are carried out on each candidate region image containing color spots to obtain a plurality of training images with consistent sizes, the size adjustment can effectively avoid the interference of size on model learning, and the resolution enhancement can enhance the characteristics of the color spots, so that the model training precision can be further improved; in addition, the positions of the color spots are constrained by introducing the number of the color spots in the training process, so that the predicted positions of the color spots can be trained in the same direction as the real positions of the color spots, and the model training precision is further improved; these multiple improvements in training precision ensure that the color spot detection model obtained by training with this method has higher detection precision. In addition, the images in the image sample set are intercepted into candidate region images whose size is smaller than that of the original images, which is beneficial to reducing the amount of calculation in the training process, increasing the speed of model training and calculation, and obtaining a color spot detection model with a small amount of calculation and high detection precision.
In the following, the method for detecting color spots provided by the embodiment of the present invention is described in detail. Referring to fig. 9, the method S30 includes but is not limited to the following steps:
S31: And acquiring a face image to be detected.
S32: and intercepting at least one target face local image possibly with color spots from the face image to be detected.
S33: Detecting the at least one target face local image by using the color spot detection model in any one of the above embodiments, and acquiring the position, the category and the number of the color spots of the at least one target face local image.
S34: And determining the position, the category and the number of the color spots in the face image to be detected based on the position, the category and the number of the color spots of the at least one target face local image and the position of the target face local image in the face image to be detected.
The face image to be detected is an image of a person's face and can be acquired by the image acquisition device 20; for example, the face image to be detected can be a certificate photo or a self-portrait photo acquired by the image acquisition device 20. The source of the face image to be detected is not limited here, as long as it is an image of a person's face.
It can be understood that the face image to be detected includes a face and a background, wherein the color spots may exist only in a local face area of the face, which results in that the color spot features are not obvious relative to the area features (such as background features and facial features) containing no color spots, thereby affecting the color spot detection. In order to reduce the interference of regional features not containing color spots on color spot detection, reduce detection time and improve detection efficiency, at least one target face local image possibly containing color spots is respectively intercepted from the face image to be detected.
Then, the at least one target face local image is detected by using the color spot detection model in any one of the above embodiments, and the position, category and number of the color spots of the at least one target face local image are obtained.
And finally, the at least one target face local image is mapped back into the face image to be detected based on the position, the category and the number of the color spots of the at least one target face local image and the position of the target face local image in the face image to be detected, so that the position, the category and the number of the color spots in the face image to be detected can be determined. The number of the color spots in the face image to be detected is equal to the sum of the numbers of the color spots in each target face local image, the position of the color spots in the face image to be detected is the position obtained when the position of the color spots in each target face local image is mapped back to the face image to be detected, and the category of the color spots in the face image to be detected is the set of the categories of the color spots in each target face local image.
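To make the mapping back to the full face image concrete, here is a hedged sketch of steps S31–S34; the model interface, the region-cropping convention and the label format are all assumptions made for illustration and not the patent's implementation.

```python
def detect_color_spots(face_image, crop_regions, model):
    """Run the color spot detection model on each cropped local face image
    and map the detected boxes back into the full face image.

    crop_regions: list of (x0, y0, w, h) offsets of each local image in the
    full face image, assumed to come from a face-landmark-based crop.
    model(local_image) is assumed to return (boxes, classes), with each box
    as (x, y, w, h) relative to the local image.
    """
    all_boxes, all_classes = [], []
    for (x0, y0, w, h) in crop_regions:
        local = face_image[y0:y0 + h, x0:x0 + w]
        boxes, classes = model(local)
        for (x, y, bw, bh), cls in zip(boxes, classes):
            # map the local prediction frame back to the full face image
            all_boxes.append((x + x0, y + y0, bw, bh))
            all_classes.append(cls)
    # the number of color spots is the sum over all local images
    return all_boxes, all_classes, len(all_boxes)
```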
It can be understood that the color spot detection model is obtained by training through the method for training the color spot detection model in the above embodiment, and the structure and function of the color spot detection model are the same as those of the color spot detection model in the above embodiment, and are not described in detail here.
It should be noted that the above-described device embodiments are merely illustrative, where the units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a general hardware platform, and certainly can also be implemented by hardware. It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by hardware related to instructions of a computer program, which can be stored in a computer readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; within the idea of the invention, also technical features in the above embodiments or in different embodiments may be combined, steps may be implemented in any order, and there are many other variations of the different aspects of the invention as described above, which are not provided in detail for the sake of brevity; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A method of training a color spot detection model, comprising:
acquiring an image sample set comprising a human face;
respectively intercepting at least one face local area image possibly having color spots from each image in the image sample set to obtain a plurality of candidate area images;
performing resolution enhancement processing and size adjustment on each candidate region image to obtain a plurality of training images with consistent sizes, wherein the resolution of the training images is higher than that of the candidate region images;
the method comprises the steps of taking a plurality of training images marked with real labels as a training set, training a preset convolutional neural network to enable the preset convolutional neural network to learn the real labels of the plurality of training images so as to obtain a color spot detection model, wherein the real labels of a target training image comprise the real positions, real types and real numbers of color spots in the target training image, and the target training image is any one of the plurality of training images.
2. The method of claim 1, wherein the preset convolutional neural network comprises a feature extraction module, a color spot target detection module and a color spot quantity detection module, and the taking a plurality of training images marked with real labels as a training set and training a preset convolutional neural network to enable the preset convolutional neural network to learn the real labels of the plurality of training images so as to obtain a color spot detection model comprises:
inputting the target training image into the feature extraction module to obtain training feature maps of at least two sizes;
dividing the training feature map into two paths to obtain a first path of training feature map and a second path of training feature map;
inputting the first path of training feature map into the color spot target detection module to obtain a predicted position and a predicted category of the color spot in the target training image;
inputting the second path of training feature map into the color spot quantity detection module to obtain the predicted quantity of the color spots in the target training image;
calculating an error between a predicted label of the target training image and a real label of the target training image according to a preset loss function, wherein the predicted label comprises the predicted position, the predicted category and the predicted quantity;
and adjusting model parameters of the preset convolutional neural network according to the error, returning to execute the step of inputting the target training image into the feature extraction module to obtain training feature maps of at least two sizes until the preset convolutional neural network is converged to obtain the color spot detection model.
3. The method of claim 2, wherein the feature extraction module comprises a plurality of feature convolution layers, and the number of convolution kernels of the plurality of feature convolution layers tends to increase first and then decrease as the number of layers of the feature convolution layers increases.
4. The method of claim 3, wherein the plurality of feature convolution layers includes a target feature convolution layer, the number of convolution kernels of the target feature convolution layer is the number of the categories of the color spots, and the target feature convolution layer is a feature convolution layer connected to the color spot target detection module and configured to output the training feature map.
5. The method according to claim 2, wherein the preset loss function is a weighted sum of a position loss function for calculating an error between the real position and the predicted position, a category loss function for calculating an error between the real category and the predicted category, and a quantity loss function for calculating an error between the real quantity and the predicted quantity.
6. The method of claim 5, wherein the calculating an error between the predicted label of the target training image and the true label of the target training image according to a preset loss function comprises:
calculating an error between the predicted label and the real label according to the following formula:

$$Loss=\lambda_{obj}\sum_{i=0}^{M}\left(2-T_{width}\cdot T_{height}\right)\sum_{r\in(x,y,w,h)}\left(T_r-P_r\right)^2+\sum_{i=0}^{M}\sum_{class\in K}\left(T_{class}-P_{class}\right)^2+\alpha\sum_{i=0}^{M}\left(T_{conf}-P_{conf}\right)^2+\beta\sum_{c=1}^{K}\left(T_{num}^{c}-P_{num}^{c}\right)^2$$

wherein $\lambda_{obj}$ is a preset position offset weight, $\alpha$ is a confidence loss weight, $\beta$ is a number weight, M is the number of preset prediction frames for predicting the color spot position, i is the index of the prediction frame, $T_{width}$ is the width of the prediction frame, $T_{height}$ is the height of the prediction frame, (x, y, w, h) are the coordinates of the upper left corner of the prediction frame and its width and height, $T_r$ is used for indicating the real position of the color spot, $P_r$ is used for indicating the predicted position of the color spot, K is the number of color spot categories, $T_{class}$ is the real category of the color spot, $P_{class}$ is the predicted category of the color spot, $T_{conf}$ is the confidence of the real label, $P_{conf}$ is the confidence of the predicted label, $T_{num}^{c}$ is the real number of the c-th category of color spots among the K categories of color spots, and $P_{num}^{c}$ is the predicted number of the c-th category of color spots among the K categories of color spots.
7. The method according to any one of claims 1 to 6, wherein the performing resolution enhancement processing and resizing on each candidate region image to obtain a plurality of training images with consistent sizes comprises:
acquiring a target candidate region image and the size of a target training image, wherein the target candidate region image is any candidate region image in the candidate region images, and the target training image is obtained by mapping the target candidate region image;
acquiring a second pixel coordinate corresponding to a target pixel point in the target candidate region image according to a first pixel coordinate of the target pixel point in the target training image, the size of the target candidate region image and the size of the target training image, wherein the target pixel point is any pixel point in the target training image;
acquiring a plurality of neighborhood pixel points of the second pixel coordinate, wherein the neighborhood pixel points are pixel points adjacent to the second pixel coordinate in the target candidate region image;
obtaining the weight of each neighborhood pixel point according to the position relation between the second pixel coordinate and the plurality of neighborhood pixel points;
and determining the pixel value of the target pixel point according to the weight of each neighborhood pixel point and the pixel value of each neighborhood pixel point so as to obtain the target training image.
8. A method of detecting color spots, comprising:
acquiring a face image to be detected;
intercepting at least one target face local image possibly having color spots from the face image to be detected;
detecting the at least one target face local image using the color spot detection model of any one of claims 1-7, and acquiring the position, the category and the number of the color spots of the at least one target face local image;
and determining the position, the category and the number of the color spots in the face image to be detected based on the position, the category and the number of the color spots of the at least one target face local image and the position of the target face local image in the face image to be detected.
9. An electronic device, comprising:
at least one processor, and
a memory communicatively coupled to the at least one processor, wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-8.
10. A non-transitory computer-readable storage medium having stored thereon computer-executable instructions for causing an electronic device to perform the method of any one of claims 1-8.
CN202011496189.XA 2020-12-17 2020-12-17 Method and related device for training color spot detection model Pending CN112614140A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011496189.XA CN112614140A (en) 2020-12-17 2020-12-17 Method and related device for training color spot detection model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011496189.XA CN112614140A (en) 2020-12-17 2020-12-17 Method and related device for training color spot detection model

Publications (1)

Publication Number Publication Date
CN112614140A true CN112614140A (en) 2021-04-06

Family

ID=75240973

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011496189.XA Pending CN112614140A (en) 2020-12-17 2020-12-17 Method and related device for training color spot detection model

Country Status (1)

Country Link
CN (1) CN112614140A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113221695A (en) * 2021-04-29 2021-08-06 深圳数联天下智能科技有限公司 Method for training skin color recognition model, method for recognizing skin color and related device
CN113221695B (en) * 2021-04-29 2023-12-12 深圳数联天下智能科技有限公司 Method for training skin color recognition model, method for recognizing skin color and related device
CN113378696A (en) * 2021-06-08 2021-09-10 北京百度网讯科技有限公司 Image processing method, device, equipment and storage medium
CN113379716A (en) * 2021-06-24 2021-09-10 厦门美图之家科技有限公司 Color spot prediction method, device, equipment and storage medium
WO2022267327A1 (en) * 2021-06-24 2022-12-29 厦门美图宜肤科技有限公司 Pigmentation prediction method and apparatus, and device and storage medium
CN113379716B (en) * 2021-06-24 2023-12-29 厦门美图宜肤科技有限公司 Method, device, equipment and storage medium for predicting color spots

Similar Documents

Publication Publication Date Title
CN111780763B (en) Visual positioning method and device based on visual map
CN109376667B (en) Target detection method and device and electronic equipment
CN106778928B (en) Image processing method and device
CN111161349B (en) Object posture estimation method, device and equipment
CN112614140A (en) Method and related device for training color spot detection model
CN111780764B (en) Visual positioning method and device based on visual map
CN108961184B (en) Method, device and equipment for correcting depth image
CN111008935B (en) Face image enhancement method, device, system and storage medium
CN113689578B (en) Human body data set generation method and device
CN109584327B (en) Face aging simulation method, device and equipment
CN111476827A (en) Target tracking method, system, electronic device and storage medium
CN111985458B (en) Method for detecting multiple targets, electronic equipment and storage medium
CN110610143B (en) Crowd counting network method, system, medium and terminal for multi-task combined training
CN110659596A (en) Face key point positioning method under case and management scene, computer storage medium and equipment
CN113516113A (en) Image content identification method, device, equipment and storage medium
CN109544516B (en) Image detection method and device
CN111353325A (en) Key point detection model training method and device
CN110007764B (en) Gesture skeleton recognition method, device and system and storage medium
CN113221695B (en) Method for training skin color recognition model, method for recognizing skin color and related device
CN111742352B (en) Method for modeling three-dimensional object and electronic equipment
CN113706583A (en) Image processing method, image processing device, computer equipment and storage medium
CN112101185B (en) Method for training wrinkle detection model, electronic equipment and storage medium
CN112101303B (en) Image data processing method and device and computer readable storage medium
CN110717969A (en) Shadow generation method and device
CN115249269A (en) Object detection method, computer program product, storage medium, and electronic device

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination