CN112347843A - Method and related device for training wrinkle detection model

Method and related device for training wrinkle detection model

Info

Publication number
CN112347843A
Authority
CN
China
Prior art keywords
wrinkle
image
bounding box
face
detected
Prior art date
Legal status
Pending
Application number
CN202010989797.8A
Other languages
Chinese (zh)
Inventor
曾梦萍
Current Assignee
Shenzhen Shuliantianxia Intelligent Technology Co Ltd
Original Assignee
Shenzhen Shuliantianxia Intelligent Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Shenzhen Shuliantianxia Intelligent Technology Co Ltd filed Critical Shenzhen Shuliantianxia Intelligent Technology Co Ltd
Priority to CN202010989797.8A
Publication of CN112347843A
Legal status: Pending

Classifications

    • G06V40/161 Human faces: Detection; Localisation; Normalisation
    • G06F18/214 Pattern recognition: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/22 Pattern recognition: Matching criteria, e.g. proximity measures
    • G06N20/20 Machine learning: Ensemble learning
    • G06N3/045 Neural networks: Combinations of networks
    • G06N3/084 Neural network learning methods: Backpropagation, e.g. using gradient descent
    • G06V10/25 Image preprocessing: Determination of region of interest [ROI] or a volume of interest [VOI]
    • G06V40/171 Human faces: Local features and components; Facial parts; Occluding parts, e.g. glasses; Geometrical relationships
    • G06V40/172 Human faces: Classification, e.g. identification
    • G06V2201/07 Indexing scheme relating to image or video recognition or understanding: Target detection

Abstract

The embodiment of the invention relates to the technical field of target detection, in particular to a method and a related device for training a wrinkle detection model. In the method, a position deviation between the bounding box position output by a preset convolutional neural network and the labeled wrinkle position is calculated from the minimum closure area and the union area of the two positions. On this basis, the position deviation is used to calculate an error between the first label and the detection result, and the model parameters of the preset convolutional neural network are reversely adjusted according to the error, so that the model parameters are better optimized and the resulting wrinkle detection model is more accurate.

Description

Method and related device for training wrinkle detection model
Technical Field
The embodiment of the invention relates to the technical field of target detection, in particular to a method for training a wrinkle detection model and a related device.
Background
Face wrinkle detection techniques are increasingly used in a variety of fields. For example, in cosmetics development, products must be designed, or recommended to users, according to different facial wrinkle characteristics. In photo beautification, different facial wrinkles call for retouching effects of different strengths. In face recognition, facial wrinkles may serve as a user feature for identity verification.
The current common face wrinkle detection technique segments the regions where wrinkles concentrate using key facial feature points, and then processes those regions with methods such as color rules and binarization to obtain a wrinkle result. However, this approach is easily disturbed by features of the face itself, such as hair and large pores. Moreover, it can only recognize whether a wrinkle is present; it cannot recognize the kind of wrinkle. That is, wrinkle categories are not examined in any fine-grained way.
Disclosure of Invention
The embodiment of the invention mainly solves the technical problem of providing a method and a related device for training a wrinkle detection model, so that the trained wrinkle detection model can classify and locate wrinkles quickly and accurately.
To solve the above technical problem, in a first aspect, an embodiment of the present invention provides a method for training a wrinkle detection model, including:
acquiring an image sample comprising a human face;
cropping a face region image from the image sample, wherein the face region image is labeled with a first label, and the first label comprises the wrinkle positions and wrinkle categories of the wrinkles in the image sample;
inputting the face region image labeled with the first label into a preset convolutional neural network as a training sample to obtain a detection result of the training sample output by the preset convolutional neural network, wherein the detection result comprises a bounding box position corresponding to a wrinkle in the training sample identified by the preset convolutional neural network, a probability that the wrinkle corresponding to the bounding box position belongs to each wrinkle category, and a confidence corresponding to the bounding box position;
calculating, according to the bounding box position and the wrinkle position, the minimum closure area of the bounding box position and the wrinkle position, and the union area of the bounding box position and the wrinkle position;
calculating a position deviation between the bounding box position and the wrinkle position according to the bounding box position, the wrinkle position, the minimum closure area and the union area;
calculating an error between the first label and the detection result according to the position of the bounding box, the position of the wrinkle, the position deviation, the wrinkle category, the probability and the confidence degree corresponding to the position of the bounding box;
and reversely adjusting the model parameters of the preset convolutional neural network according to the error so as to obtain a wrinkle detection model.
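For concreteness, the steps above map onto one iteration of a standard supervised training loop. The following is a minimal sketch, assuming a PyTorch-style model and optimizer; train_step, loss_fn, and the tensor layout are illustrative names, not part of the claimed method.

```python
import torch

def train_step(model, optimizer, face_image, first_label, loss_fn):
    """One training iteration: forward pass, error between the first label
    and the detection result, and reverse adjustment of model parameters.
    All names here are illustrative assumptions, not the patent's API."""
    optimizer.zero_grad()
    # Detection result: bounding box positions, per-category probabilities,
    # and a confidence for each predicted bounding box.
    detection = model(face_image)
    # The error combines position loss (via the GIoU position deviation),
    # classification loss, and confidence loss, as described in this summary.
    error = loss_fn(detection, first_label)
    error.backward()   # backpropagate the error
    optimizer.step()   # reverse adjustment of the model parameters
    return error.item()
```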
In some embodiments, said calculating a position deviation between said bounding box position and said wrinkle position based on said bounding box position, said wrinkle position, said minimum closure area, and said union area comprises:
calculating an absolute difference between the minimum closure area and the union area;
calculating an intersection ratio between the bounding box position and the wrinkle position, wherein the intersection ratio reflects the degree of overlap between the bounding box position and the wrinkle position;
and subtracting a first ratio from the intersection ratio to obtain the position deviation, wherein the first ratio is the ratio of the absolute difference value to the minimum closure area.
In some embodiments, the calculating an error between the first label and the detection result according to the bounding box position, the wrinkle position, the position deviation, the wrinkle category, the probability, and the confidence corresponding to the bounding box position includes:
calculating an error between the first label and the detection result according to the following formula:

$$
\begin{aligned}
\mathrm{Loss} ={}& \lambda_{\mathrm{coord}} \sum_{i=0}^{K\times K}\sum_{j=0}^{M} I_{ij}^{\mathrm{obj}} \left[ (x_i-\hat{x}_i)^2 + (y_i-\hat{y}_i)^2 + (w_i-\hat{w}_i)^2 + (h_i-\hat{h}_i)^2 \right] \\
&+ \sum_{i=0}^{K\times K}\sum_{j=0}^{M} I_{ij}^{\mathrm{obj}} \left( C_i-\hat{C}_i \right)^2 + \lambda_{\mathrm{noobj}} \sum_{i=0}^{K\times K}\sum_{j=0}^{M} I_{ij}^{\mathrm{noobj}} \left( C_i-\hat{C}_i \right)^2 \\
&+ \sum_{i=0}^{K\times K} I_{i}^{\mathrm{obj}} \sum_{c\in\mathrm{classes}} \left( p_i(c)-\hat{p}_i(c) \right)^2
\end{aligned}
$$

wherein λ_coord and λ_noobj are preset weights, M is the number of preset prior boxes used to predict wrinkle positions, j is the index of a prior box, K×K is the number of grid cells in the convolution feature map extracted by the preset convolutional neural network to determine the detection result, i is the index of a grid cell, (x_i, y_i, w_i, h_i) is the bounding box position, (x̂_i, ŷ_i, ŵ_i, ĥ_i) is the wrinkle position, C_i is the confidence corresponding to the bounding box, Ĉ_i is its true value, p_i(c) is the probability, p̂_i(c) is the true probability, c is the wrinkle category, I_ij^obj indicates whether the j-th prior box of the i-th grid cell is responsible for a wrinkle, and I_ij^noobj indicates whether it is not responsible: if the position deviation is not 0, I_ij^obj is 1 and I_ij^noobj is 0; if the position deviation is 0, I_ij^obj is 0 and I_ij^noobj is 1.
In some embodiments, the inputting, as a training sample, the face region image labeled with the first label into a preset convolutional neural network to obtain a detection result of the training sample output by the preset convolutional neural network includes:
taking the face area image marked with the first label as a training sample, inputting the face area image into a feature convolution layer of the preset convolution neural network for feature sampling to obtain a first training feature map and a second training feature map, wherein a first downsampling multiple of the first training feature map relative to the face area image is larger than a second downsampling multiple of the second training feature map relative to the face area image, and the first downsampling multiple is smaller than 32;
and inputting the first training feature map and the second training feature map into a detection convolutional layer of the preset convolutional neural network to obtain a detection result of the training sample.
In some embodiments, before the step of inputting the face region image labeled with the first label as a training sample into a feature convolution layer of a preset convolutional neural network for feature sampling to obtain a first training feature map and a second training feature map, the method further includes:
performing data augmentation processing on the training samples, wherein the data augmentation processing comprises at least one of image translation, rotation, cropping and histogram equalization.
In some embodiments, the performing data augmentation processing on the training samples includes:
traversing the face region images in the training sample and, for each target face region image, randomly selecting at least one of image translation, rotation, cropping and histogram equalization to perform data augmentation processing, wherein the target face region image is any face region image in the training sample.
In order to solve the above technical problem, in a second aspect, an embodiment of the present invention provides a method for detecting wrinkles, including:
acquiring a face image to be detected;
the wrinkle detection model according to the first aspect is used to detect the face image to be detected, and the wrinkle position and the wrinkle category of wrinkles in the face image to be detected are obtained.
In some embodiments, the detecting the face image to be detected by using the wrinkle detection model according to the first aspect, and acquiring wrinkle positions and wrinkle categories of wrinkles in the face image to be detected includes:
inputting the face image to be detected into a feature convolution layer in the wrinkle detection model for feature sampling to obtain a first feature image to be detected and a second feature image to be detected, wherein a first downsampling multiple of the first feature image to be detected relative to the face image to be detected is larger than a second downsampling multiple of the second feature image to be detected relative to the face image to be detected, and the second downsampling multiple is smaller than 32;
and inputting the first characteristic diagram to be detected and the second characteristic diagram to be detected into a detection convolution layer in the wrinkle detection model so as to obtain the wrinkle position and the wrinkle category of the wrinkles in the face image to be detected.
In order to solve the above technical problem, in a third aspect, an embodiment of the present invention provides an image processing apparatus, including one or more functional modules, where the one or more functional modules are configured to implement any one of the implementation manners of the first aspect or the second aspect.
In order to solve the above technical problem, in a fourth aspect, an embodiment of the present invention provides a non-transitory computer-readable storage medium storing computer-executable instructions for causing an electronic device to perform the method according to the first aspect and the method according to the second aspect.
The embodiment of the invention has the following beneficial effects. Unlike the prior art, the method for training a wrinkle detection model provided in the embodiment of the invention obtains an image sample including a human face and crops a face region image from it; the face region image is labeled with a first label comprising the wrinkle positions and wrinkle categories of the wrinkles in the image sample. The face region image labeled with the first label is input, as a training sample, into a preset convolutional neural network to obtain the detection result of the training sample, which comprises the bounding box position corresponding to each wrinkle identified by the network, the probability that the wrinkle corresponding to the bounding box position belongs to each wrinkle category, and the confidence corresponding to the bounding box position. From the bounding box position and the wrinkle position, the minimum closure area and the union area of the two are calculated; from the bounding box position, the wrinkle position, the minimum closure area, and the union area, the position deviation between the bounding box position and the wrinkle position is calculated. An error between the first label and the detection result is then calculated from the bounding box position, the wrinkle position, the position deviation, the wrinkle category, the probability, and the confidence corresponding to the bounding box position, and finally the model parameters of the preset convolutional neural network are reversely adjusted according to the error to obtain the wrinkle detection model.
That is, the method calculates the position deviation between the bounding box position and the wrinkle position according to the bounding box position, the wrinkle position, the minimum closure area and the union area, and can better reflect the coincidence degree of the bounding box position and the wrinkle position. On the basis, the position deviation is used for calculating an error between the first label and the detection result, and the model parameters of the preset convolutional neural network are reversely adjusted according to the error, so that the model parameters can be better optimized, and the obtained wrinkle detection model is more accurate.
Drawings
One or more embodiments are illustrated by way of example in the accompanying drawings, in which like reference numerals denote similar elements; the figures are not to scale unless otherwise specified.
FIG. 1 is a schematic diagram of an operating environment of a method for training a wrinkle detection model according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating a method for training a wrinkle detection model according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of an image of a face region captured by the method shown in FIG. 2;
FIG. 4 is a schematic illustration of a training sample labeled with a first label in the method of FIG. 2;
FIG. 5 is a diagram illustrating wrinkle locations, bounding boxes, and minimum closure areas in the method of FIG. 2;
FIG. 6 is a schematic flow chart illustrating a sub-process of step S25 in the method of FIG. 2;
FIG. 7 is a schematic flow chart illustrating a sub-process of step S23 in the method of FIG. 2;
FIG. 8 is a flow chart illustrating a data enhancement process according to an embodiment of the present invention;
FIG. 9 is a flowchart illustrating a method for detecting wrinkles according to an embodiment of the present invention;
FIG. 10 is a schematic illustration of the wrinkle classification and wrinkle location detected in the method of FIG. 9;
fig. 11 is a schematic view of a sub-flow of step S32 in the method of fig. 9.
Detailed Description
The present invention will be described in detail with reference to specific examples. The following examples will assist those skilled in the art in further understanding the invention, but are not intended to limit it in any way. It should be noted that persons skilled in the art can make variations and modifications without departing from the spirit of the invention, all of which fall within the scope of the present invention.
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
It should be noted that, provided they do not conflict, the various features of the embodiments of the invention may be combined with each other within the scope of protection of the present application. Additionally, although functional modules are divided in the device schematics and logical sequences are shown in the flowcharts, in some cases the steps shown or described may be performed with a different module division or in a different order than shown. Further, the terms "first," "second," "third," and the like used herein do not limit data or execution order, but merely distinguish items that are substantially the same or similar in function and effect.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
Fig. 1 is a schematic operating environment diagram of a method for training a wrinkle detection model according to an embodiment of the present invention. Referring to fig. 1, the electronic device 10 and the image capturing apparatus 20 are included, and the electronic device 10 and the image capturing apparatus 20 are connected in a communication manner.
The communication connection may be wired, for example an optical fiber cable, or wireless, for example a WIFI connection, a Bluetooth connection, a 4G wireless communication connection, or a 5G wireless communication connection.
The image acquisition device 20 is configured to acquire an image sample including a human face, and may also be configured to acquire the face image to be detected. The image acquisition device 20 may be any terminal capable of capturing images, for example a mobile phone, a tablet computer, a video recorder, or a camera.
The electronic device 10 is a device capable of automatically processing mass data at high speed according to a program, and is generally composed of a hardware system and a software system, for example: computers, smart phones, and the like. The electronic device 10 may be a local device, which is directly connected to the image capturing apparatus 20; it may also be a cloud device, for example: a cloud server, a cloud host, a cloud service platform, a cloud computing platform, etc., the cloud device is connected to the image acquisition apparatus 20 through a network, and the two are connected through a predetermined communication protocol, which may be TCP/IP, NETBEUI, IPX/SPX, etc. in some embodiments.
It can be understood that: the image capturing device 20 and the electronic apparatus 10 may also be integrated together as an integrated apparatus, such as a computer with a camera or a smart phone.
The electronic device 10 receives the image samples including faces sent by the image acquisition device 20, trains a wrinkle detection model on them, and uses the model to detect the positions and categories of wrinkles in the face images to be detected sent by the image acquisition device 20. It is to be understood that training the wrinkle detection model and detecting the face image to be detected may also be performed on different electronic devices.
In the following, a method for training a wrinkle detection model according to an embodiment of the present invention is described in detail. Referring to fig. 2, the method S20 includes, but is not limited to, the following steps:
s21: an image sample including a human face is acquired.
S22: A face region image is cropped from the image sample, wherein the face region image is labeled with a first label, and the first label comprises the wrinkle positions and wrinkle categories of the wrinkles in the image sample.
S23: and inputting the face area image marked with the first label as a training sample into a preset convolutional neural network to obtain a detection result of the training sample output by the preset convolutional neural network, wherein the detection result comprises a bounding box position corresponding to wrinkles in the training sample identified by the preset convolutional neural network, the probability that the wrinkles corresponding to the bounding box position belong to each wrinkle category and a confidence corresponding to the bounding box position.
S24: and calculating the minimum closure area of the boundary box position and the wrinkle position according to the boundary box position and the wrinkle position, and the union area of the boundary box position and the wrinkle position.
S25: and calculating the position deviation between the position of the bounding box and the position of the wrinkle according to the position of the bounding box, the position of the wrinkle, the minimum closure area and the union area.
S26: and calculating the error between the first label and the detection result according to the position of the bounding box, the position of the wrinkle, the position deviation, the wrinkle category, the probability and the confidence degree corresponding to the position of the bounding box.
S27: and reversely adjusting the model parameters of the preset convolutional neural network according to the error so as to obtain a wrinkle detection model.
In this embodiment, the position deviation between the bounding box position and the wrinkle position is calculated according to the bounding box position, the wrinkle position, the minimum closure area, and the union area, so that the coincidence degree between the bounding box position and the wrinkle position can be better reflected. On the basis, the position deviation is used for calculating an error between the first label and the detection result, and the model parameters of the preset convolutional neural network are reversely adjusted according to the error, so that the model parameters can be better optimized, and the obtained wrinkle detection model is more accurate.
Specifically, in step S21, the image sample includes a human face and can be acquired by the image acquisition device, for example, the image sample can be a certificate photo or a self-portrait photo acquired by the image acquisition device. It is understood that the image samples may also be data in an existing open source face database, wherein the open source face database may be a FERET face database, a CMU Multi-PIE face database, a YALE face database, or the like. Here, the source of the image sample is not limited as long as the image sample includes a human face.
In step S22, the image sample includes a human face and a background, where the human face is the target area for wrinkle detection. To reduce background interference with wrinkle detection and shorten the training time of the subsequent model, only the face region image is cropped out as a sample. As shown in fig. 3, a face frame may be obtained with the existing dlib toolkit, and its width-to-height ratio may then be adjusted to that of the image sample in order to crop the face region image. The dlib toolkit is a tool for object detection in images and is used here for face detection.
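A sketch of this cropping step, assuming OpenCV and the dlib frontal face detector; the symmetric-padding strategy used to match the aspect ratio is an illustrative choice:

```python
import cv2
import dlib

def crop_face_region(image_path):
    """Crop the face region from an image sample with dlib, then widen or
    heighten the face box to the width-to-height ratio of the original image."""
    img = cv2.imread(image_path)
    h, w = img.shape[:2]
    detector = dlib.get_frontal_face_detector()
    faces = detector(cv2.cvtColor(img, cv2.COLOR_BGR2GRAY), 1)
    if not faces:
        return None
    f = faces[0]
    target_ratio = w / h                       # aspect ratio of the image sample
    fw, fh = f.right() - f.left(), f.bottom() - f.top()
    if fw / fh < target_ratio:                 # face box too narrow: widen it
        pad = int((fh * target_ratio - fw) / 2)
        left, right = max(0, f.left() - pad), min(w, f.right() + pad)
        top, bottom = f.top(), f.bottom()
    else:                                      # face box too wide: heighten it
        pad = int((fw / target_ratio - fh) / 2)
        top, bottom = max(0, f.top() - pad), min(h, f.bottom() + pad)
        left, right = f.left(), f.right()
    return img[top:bottom, left:right]
```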
As shown in fig. 4, the face region image is labeled with a first label; that is, the face region image is annotated. The first label includes the wrinkle positions and wrinkle categories of the wrinkles in the image sample, and the marked boxes in fig. 4 are the wrinkle positions in the first label. In some embodiments, the wrinkle category is a combination of a wrinkle type and a severity. The wrinkle type reflects the kind of wrinkle, for example at least one of forehead lines, glabellar lines, crow's feet, or nasolabial folds. The severity reflects how pronounced the wrinkle is, for example none, mild, moderate, or severe. The wrinkle categories can thus be subdivided into 16 classes. The wrinkle categories are set according to the part of the face where the wrinkle lies, and each wrinkle position corresponds to one wrinkle category. The wrinkle type and wrinkle position can reflect the aging condition of the face and so help the user take targeted measures, such as caring for the skin around the eyes or changing the habit of raising the eyebrows.
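The 16 classes arise as the cross product of the 4 wrinkle types and 4 severity levels; a small illustration (the label strings are assumptions, translated from the original):

```python
# 4 wrinkle types x 4 severity levels = 16 wrinkle categories.
WRINKLE_TYPES = ["forehead_line", "glabellar_line", "crows_feet", "nasolabial_fold"]
SEVERITIES = ["none", "mild", "moderate", "severe"]
WRINKLE_CLASSES = [f"{t}/{s}" for t in WRINKLE_TYPES for s in SEVERITIES]
assert len(WRINKLE_CLASSES) == 16
```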
In the step S23, the face region image labeled with the first label is used as a training sample, and is input to a preset convolutional neural network, so as to obtain a detection result of the training sample output by the preset convolutional neural network.
The training sample is input into the preset convolutional neural network, whose convolutional layers learn image characteristics of the training sample, such as shapes, edges, and classes, according to a set of initial model parameters; the network then predicts the wrinkle categories and wrinkle positions in the image samples and outputs the detection result of the training sample. The detection result comprises the bounding box positions corresponding to the wrinkles identified by the preset convolutional neural network, the probability that the wrinkle corresponding to each bounding box position belongs to each wrinkle category, and the confidence corresponding to each bounding box position.
The bounding box position corresponding to a wrinkle in the training sample is the predicted location of that wrinkle in the original image sample. In general, a bounding box position comprises 4 parameters: the center coordinates (x, y), the width w, and the height h of the box. The probability that the wrinkle corresponding to the bounding box position belongs to each wrinkle category is the probability that the wrinkle framed by the bounding box is of a given category; for example, the probability that the framed wrinkle is a forehead line may be 0.9. The confidence corresponding to the bounding box position reflects both the probability that the region framed by the bounding box is not background (the background being every region of the image sample that is not a wrinkle) and, when the framed region does include a wrinkle, how closely the bounding box approaches the real wrinkle position (the real box).
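To make the shape of the detection result concrete: if each of the K×K grid cells predicts M bounding boxes, and each box carries 4 position parameters, 1 confidence, and 16 class probabilities, the detection tensor has M×(4+1+16) channels per cell. A sketch with illustrative values:

```python
M = 3             # prior boxes per grid cell (3 per group, as described later)
NUM_CLASSES = 16  # 4 wrinkle types x 4 severity levels
K = 13            # grid resolution of one detection feature map (illustrative)

# Per grid cell: (x, y, w, h), one confidence, and one probability per category.
channels_per_cell = M * (4 + 1 + NUM_CLASSES)
print(f"detection tensor shape: ({K}, {K}, {channels_per_cell})")  # (13, 13, 63)
```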
In step S24, the minimum closure area of the bounding box position and the wrinkle position and the union area of the bounding box position and the wrinkle position are calculated based on the bounding box position and the wrinkle position.
Here the wrinkle position serves as the real box, and the minimum closure area is the area of the smallest rectangle that can enclose both the bounding box and the real box. For example, as shown in fig. 5, if the real box corresponding to the wrinkle position is box A (solid line) containing wrinkle c, and the bounding box is box B (dotted line), then the smallest rectangle that can enclose bounding box B and real box A is box D (thick solid line), and the minimum closure area is the area of box D.
The union area of the bounding box position and the wrinkle position refers to the area U of the union region of the bounding box and the real box. For example, as shown in fig. 5, the union area is an area corresponding to the union region of the bounding box B and the real box a.
In step S25, a positional deviation between the bounding box position and the wrinkle position is calculated from the bounding box position, the wrinkle position, the minimum closure area, and the union area.
The minimum closure area and the union area may reflect the position deviation between the bounding box position and the wrinkle position, where the position deviation is the degree of coincidence between the two. In the above example, the area of box D and the union area U reflect the position deviation between bounding box B and real box A: the larger the area of box D (the minimum closure area) is relative to the union area U, the smaller the overlap between bounding box B and real box A and the larger the position deviation; the smaller box D is relative to U, the greater the overlap and the smaller the position deviation.
Therefore, a mathematical model can be created from the bounding box position, the wrinkle position, the minimum closure area, and the union area, and a positional deviation between the bounding box position and the wrinkle position can be calculated.
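A minimal sketch of the two areas of step S24, assuming axis-aligned boxes given as (x1, y1, x2, y2) corners:

```python
def closure_and_union_areas(box_a, box_b):
    """Areas used in step S24. box_a is the labeled wrinkle position (real
    box A), box_b the bounding box B predicted by the network."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    # Intersection of A and B (zero if they do not overlap).
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = area_a + area_b - inter            # union area U
    # Smallest rectangle D enclosing both A and B: the minimum closure.
    dw = max(ax2, bx2) - min(ax1, bx1)
    dh = max(ay2, by2) - min(ay1, by1)
    closure = dw * dh                          # minimum closure area
    return closure, union
```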
In step S26, an error between the first label and the detection result is calculated according to the bounding box position, the wrinkle position, the positional deviation, the wrinkle type, the probability, and the confidence corresponding to the bounding box position.
The bounding box position and the wrinkle position reflect the position error between the detection result and the first label; the wrinkle category and the probability reflect the category error between them; and the confidence corresponding to the bounding box position, together with the position deviation, reflects the confidence error between the predicted credibility of the real wrinkle position and its true value in the first label. Subdividing the error into position error, category error, and confidence error thus reflects the detection performance of the preset convolutional neural network model, i.e., the loss of the current model parameters, so that the model parameters can subsequently be adjusted according to the error.
Finally, in step S27, the model parameters of the preset convolutional neural network are adjusted according to the error so as to learn more accurate image characteristics, thereby improving the accuracy of the wrinkle detection model. The model parameters are the convolution kernel parameters, i.e., the weights and biases of the convolution kernels.
Thus, over many rounds of training, errors are computed, model parameters are adjusted, and new wrinkle detection models are generated iteratively. Training stops once the error converges and fluctuates within a fixed range, and the model parameters of the most accurate intermediate model are selected as the parameters of the final, trained wrinkle detection model.
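Combining this with the train_step sketch above, the outer loop might look as follows; the convergence test and checkpointing policy are illustrative assumptions:

```python
def train(model, optimizer, loader, loss_fn, max_epochs=200, patience=10):
    """Iterate until the error converges, then keep the parameters of the
    most accurate intermediate model (PyTorch-style, illustrative)."""
    best_error, best_state, stall = float("inf"), None, 0
    for epoch in range(max_epochs):
        epoch_error = 0.0
        for face_image, first_label in loader:
            epoch_error += train_step(model, optimizer, face_image,
                                      first_label, loss_fn)
        epoch_error /= len(loader)
        if epoch_error < best_error - 1e-4:    # still improving
            best_error, stall = epoch_error, 0
            best_state = {k: v.clone() for k, v in model.state_dict().items()}
        else:                                  # error fluctuating within a range
            stall += 1
            if stall >= patience:
                break
    model.load_state_dict(best_state)          # most accurate intermediate model
    return model
```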
In this embodiment, the method calculates the position deviation between the bounding box position and the wrinkle position according to the bounding box position, the wrinkle position, the minimum closure area, and the union area, so as to better reflect the coincidence degree of the bounding box position and the wrinkle position. On the basis, the position deviation is used for calculating an error between the first label and the detection result, and the model parameters of the preset convolutional neural network are reversely adjusted according to the error, so that the model parameters can be better optimized, and the obtained wrinkle detection model is more accurate.
In some embodiments, referring to fig. 6, the step S25 specifically includes:
S251: calculating an absolute difference between the minimum closure area and the union area.
S252: calculating an intersection ratio between the bounding box position and the wrinkle position, wherein the intersection ratio reflects the degree of overlap between the bounding box position and the wrinkle position.
S253: and subtracting a first ratio from the intersection ratio to obtain the position deviation, wherein the first ratio is the ratio of the absolute difference value to the minimum closure area.
The intersection ratio and the position deviation are calculated as:

$$
\mathrm{IoU} = \frac{|A \cap B|}{|A \cup B|}, \qquad \mathrm{GIoU} = \mathrm{IoU} - \frac{|D| - |A \cup B|}{|D|}
$$

where IoU is the intersection ratio, A is the region of the wrinkle position (the real box), B is the region of the bounding box, D is the minimum closure region, |·| denotes area, and GIoU is the position deviation.
When the bounding box and the wrinkle position coincide exactly, the position deviation GIoU is 1; when they do not intersect and are infinitely far apart, GIoU approaches its minimum value of -1. The position deviation therefore reflects not only the degree of coincidence between the bounding box and the wrinkle position but also their degree of separation.
In the present embodiment, the position deviation calculated by the above steps accounts not only for the overlapping area of the bounding box and the wrinkle position (the real box) but also for their non-overlapping area. For example, when there is no overlap at all between the bounding box and the wrinkle position, the position deviation is not simply 0 but a negative number reflecting how far apart they are, so it reflects the degree of coincidence between the bounding box and the real box better than the plain intersection ratio.
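Putting steps S251 to S253 together, and reusing closure_and_union_areas from the earlier sketch:

```python
def giou(box_a, box_b):
    """Position deviation: intersection ratio minus the fraction of the
    minimum closure D not covered by the union (steps S251 to S253)."""
    closure, union = closure_and_union_areas(box_a, box_b)
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    iou = iw * ih / union                      # intersection ratio
    # First ratio: absolute difference between closure and union, over closure.
    return iou - (closure - union) / closure
```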
In some embodiments, the step S26 specifically includes:
calculating an error between the first label and the detection result according to the following formula:

$$
\begin{aligned}
\mathrm{Loss} ={}& \lambda_{\mathrm{coord}} \sum_{i=0}^{K\times K}\sum_{j=0}^{M} I_{ij}^{\mathrm{obj}} \left[ (x_i-\hat{x}_i)^2 + (y_i-\hat{y}_i)^2 + (w_i-\hat{w}_i)^2 + (h_i-\hat{h}_i)^2 \right] \\
&+ \sum_{i=0}^{K\times K}\sum_{j=0}^{M} I_{ij}^{\mathrm{obj}} \left( C_i-\hat{C}_i \right)^2 + \lambda_{\mathrm{noobj}} \sum_{i=0}^{K\times K}\sum_{j=0}^{M} I_{ij}^{\mathrm{noobj}} \left( C_i-\hat{C}_i \right)^2 \\
&+ \sum_{i=0}^{K\times K} I_{i}^{\mathrm{obj}} \sum_{c\in\mathrm{classes}} \left( p_i(c)-\hat{p}_i(c) \right)^2
\end{aligned}
$$

wherein λ_coord and λ_noobj are preset weights, M is the number of preset prior boxes used to predict wrinkle positions, j is the index of a prior box, K×K is the number of grid cells in the convolution feature map extracted by the preset convolutional neural network to determine the detection result, i is the index of a grid cell, (x_i, y_i, w_i, h_i) is the bounding box position, (x̂_i, ŷ_i, ŵ_i, ĥ_i) is the wrinkle position, C_i is the confidence corresponding to the bounding box, Ĉ_i is its true value, p_i(c) is the probability, p̂_i(c) is the true probability, c is the wrinkle category, I_ij^obj indicates whether the j-th prior box of the i-th grid cell is responsible for a wrinkle, and I_ij^noobj indicates whether it is not responsible: if the position deviation is not 0, I_ij^obj is 1 and I_ij^noobj is 0; if the position deviation is 0, I_ij^obj is 0 and I_ij^noobj is 1.
In this embodiment, the loss function integrates the position loss, the confidence loss, and the classification loss. The position loss constrains the relationship between the position information output by the preset convolutional neural network (the bounding box) and the position information in the first label (the wrinkle position); that is, it minimizes the error between the labeled wrinkle position (x̂_i, ŷ_i, ŵ_i, ĥ_i) and the bounding box position (x_i, y_i, w_i, h_i) in the output detection result, so that the position information output by the network continuously approaches the labeled wrinkle position and the model parameters are optimized.
The classification loss constrains the relationship between the class probability output by the preset convolutional neural network and the true class probability in the first label; that is, it minimizes the error between the true probability in the first label and the probability in the output detection result, so that the output probability continuously approaches the true probability and the model parameters are optimized.
The confidence loss constrains and optimizes the model parameters according to whether a grid cell contains a target (a wrinkle). The whole training process is supervised: the detection result output by the preset convolutional neural network continuously approaches the first label, the error between them decreases, the model parameters approach an optimal solution, and the wrinkle detection model is obtained.
Specifically, a training sample is input into the feature convolution layers of the preset convolutional neural network for feature sampling to obtain at least one feature map. The network rasterizes the feature map into K×K grid cells and assigns the task of predicting the position of an object (a wrinkle) to the grid cell containing the object's center; each grid cell corresponds to M prior boxes, i.e., for one grid cell, M prior boxes are predicted and output to mark the wrinkle positions in the image region corresponding to that cell. For example, a forehead line T in an image sample consists of pixels whose center falls within a certain range of grid cell S1; the preset convolutional neural network searches a certain size range around grid cell S1 for all pixels belonging to the forehead line T, i.e., the forehead line T is predicted by the prior boxes whose centers fall in grid cell S1. In general, one grid cell corresponds to one group of prior boxes, 3 prior boxes form a group, and each prior box is only responsible for predicting wrinkles near its corresponding grid cell.
The prior boxes are then quantized so that their sizes roughly match the size proportions of the real boxes in the image samples, yielding the bounding boxes. The real box is the true wrinkle position in the image sample, and because real boxes come in many sizes, so do the prior boxes. Specifically, the K-means algorithm can be used to cluster the widths and heights of representative shapes among the real boxes of all image samples in the training set (the sizes of the real boxes that appear most frequently), for example elongated, tall-and-thin, or near-square, and the 3 prior boxes within each group have different shapes. Each prior box slides over the image sample and, after fine adjustment, generates a bounding box, so that one grid cell corresponds to 3 bounding boxes of different sizes.
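A sketch of the prior-box clustering, assuming scikit-learn; plain Euclidean K-means on (width, height) is used here for brevity, whereas IoU-based distance is a common refinement:

```python
import numpy as np
from sklearn.cluster import KMeans

def prior_box_sizes(label_boxes, n_priors=3):
    """Cluster the (width, height) of the labeled real boxes to obtain
    representative prior-box shapes (e.g., elongated, tall-and-thin,
    near-square). label_boxes holds (x1, y1, x2, y2) corner tuples."""
    wh = np.array([(x2 - x1, y2 - y1) for x1, y1, x2, y2 in label_boxes])
    km = KMeans(n_clusters=n_priors, n_init=10, random_state=0).fit(wh)
    return km.cluster_centers_    # n_priors representative (w, h) pairs
```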
Although all 3 differently sized bounding boxes can predict the wrinkle corresponding to their grid cell, for each grid cell the bounding box that best matches the real box must be identified for the error calculation, and the errors between the remaining bounding boxes and the real box are appropriately down-weighted; this reduces the overall error of the preset convolutional neural network and better reflects its prediction deviation. For example, for grid cell S1, whose corresponding wrinkle is a forehead line, there are 3 prior boxes whose centers fall in S1. If prior box L is closest to the real box (ground truth) at S1, the bounding box L' generated by prior box L serves as the predicted value of the ground truth and participates in the error calculation, while the bounding boxes M' and N' generated by prior boxes M and N, which differ more from the ground truth, are down-weighted to reduce their interference with the error.
Specifically, when the position deviation between the bounding box corresponding to the j-th prior box of the i-th grid cell and the real box is not 0, the j-th prior box of the i-th grid cell is responsible for a wrinkle (contains a wrinkle): I_ij^obj is 1 and I_ij^noobj is 0, and the error between that bounding box and the real box participates at full weight in calculating the error between the first label and the detection result. When the position deviation is 0, the j-th prior box of the i-th grid cell is not responsible for a wrinkle (does not contain one): I_ij^obj is 0 and I_ij^noobj is 1, and the error of that bounding box is multiplied by the preset weight λ_noobj before participating in the error calculation. On the one hand this reduces the interference of irrelevant boxes on the error; on the other hand, even when a bounding box and the real box do not intersect, gradients can still be passed back with a certain weight during backpropagation, so training can proceed and the accuracy of the wrinkle detection model improves.
In some embodiments, referring to fig. 7, the step S23 includes:
s231: and taking the face area image marked with the first label as a training sample, inputting the face area image into a feature convolution layer of the preset convolution neural network for feature sampling to obtain a first training feature map and a second training feature map, wherein a first downsampling multiple of the first training feature map relative to the face area image is larger than a second downsampling multiple of the second training feature map relative to the face area image, and the first downsampling multiple is smaller than 32.
S232: and inputting the first training feature map and the second training feature map into a detection convolutional layer of the preset convolutional neural network to obtain a detection result of the training sample.
Training samples are input into the feature convolution layers, which perform feature sampling with a set of initial model parameters to obtain a first training feature map and a second training feature map. Both feature maps are reduced in size relative to the original image sample, i.e., downsampled relative to it. For example, if the input face region image is 416×416 and the training feature map obtained after the feature convolution layers is 13×13, the downsampling multiple of that feature map relative to the face region image is 416/13 = 32. The larger the downsampling multiple, the smaller the resulting training feature map and the larger its receptive field, which suits detecting larger objects in the face region image.
The first downsampling multiple of the first training feature map relative to the face area image is larger than the second downsampling multiple of the second training feature map relative to the face area image, and therefore detection of different fine granularities can be achieved.
Since wrinkle regions of the human face are medium-sized or smaller object regions, a training feature map with a large receptive field is not conducive to detecting wrinkles. Thus, the first downsampling multiple is less than 32. That is, the first training feature map has a medium-sized receptive field suited to detecting medium-sized objects, such as forehead lines, while the second training feature map has a small receptive field suited to detecting small objects, such as crow's feet.
Using first and second training feature maps of these two sizes detects wrinkles more accurately and reduces bounding box redundancy. Compared with using 3 or more training feature maps, it also simplifies the model parameters of the feature convolution layers and improves the learning efficiency of the preset convolutional neural network.
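To illustrate the two scales, assume a 416×416 face region image with downsampling multiples of 16 and 8; both values are illustrative choices satisfying the constraint that the first multiple is below 32 and the second multiple is smaller still:

```python
INPUT_SIZE = 416                          # face region image side (example above)

first_down, second_down = 16, 8           # illustrative downsampling multiples
first_grid = INPUT_SIZE // first_down     # 26x26 cells, medium receptive field
second_grid = INPUT_SIZE // second_down   # 52x52 cells, small receptive field
print(first_grid, second_grid)            # 26 52
```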
In some embodiments, referring to fig. 8, before the step S23, the method further includes:
S28: performing data augmentation processing on the training samples, wherein the data augmentation processing comprises at least one of image translation, rotation, cropping and histogram equalization.
In order to increase the data volume of the training samples and avoid an inaccurate model caused by too little data, data augmentation is performed on the training samples, specifically by geometric transformation; that is, the data augmentation processing includes at least one of image translation, rotation, cropping, and histogram equalization. For example, image translation alone, or image translation plus rotation, may be used to augment the training samples.
In order to improve the accuracy of the model, in some embodiments, the performing data augmentation processing on the training samples specifically includes:
s29: and traversing the face region images in the training sample, and randomly selecting at least one of translation, rotation, cutting and histogram equalization of the target face region image from the images to perform data amplification processing, wherein the target face region image is any face region image in the training sample.
In this embodiment, the training samples are not all processed by one fixed data augmentation method; instead, for each target face region image in the training sample, at least one of image translation, rotation, cropping, and histogram equalization is selected at random for data augmentation. For example, translation and rotation may be selected for face region image 1#, while cropping is selected for face region image 2#. For a given target face region image, the combination of augmentation operations is random. The augmented training samples are therefore more realistic, which improves the accuracy of the model.
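A sketch of the per-image random selection, assuming OpenCV; the transform magnitudes are illustrative, and in practice the wrinkle positions in the first label must be transformed consistently (omitted here):

```python
import random
import cv2
import numpy as np

AUGMENTATIONS = {
    "translate": lambda img: cv2.warpAffine(           # 10 px shift (illustrative)
        img, np.float32([[1, 0, 10], [0, 1, 10]]), (img.shape[1], img.shape[0])),
    "rotate": lambda img: cv2.warpAffine(              # 5 degree rotation
        img, cv2.getRotationMatrix2D((img.shape[1] / 2, img.shape[0] / 2), 5, 1.0),
        (img.shape[1], img.shape[0])),
    "crop": lambda img: img[5:-5, 5:-5],               # 5 px border crop
    "hist_eq": lambda img: cv2.cvtColor(               # histogram equalization
        cv2.equalizeHist(cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)), cv2.COLOR_GRAY2BGR),
}

def augment(face_region_image):
    """Apply at least one randomly chosen augmentation to one target image."""
    chosen = random.sample(list(AUGMENTATIONS), k=random.randint(1, 4))
    out = face_region_image
    for name in chosen:
        out = AUGMENTATIONS[name](out)
    return out
```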
In the following, the method for detecting wrinkles according to an embodiment of the present invention is described in detail. Referring to fig. 9, the method S30 includes, but is not limited to, the following steps:
s31: and acquiring a face image to be detected.
S32: the wrinkle detection model in any one of the embodiments is used to detect the face image to be detected, and the wrinkle position and the wrinkle category of the wrinkles in the face image to be detected are obtained.
The face image to be detected is an image of a human face and can be acquired by the image acquisition device 20; for example, it can be obtained by cropping the face region of a person out of an identification photo or self-portrait captured by the image acquisition device 20 (the initial face image to be detected). The source of the face image to be detected is not limited here, as long as it is an image of the person's face.
It can be understood that when the initial face image to be detected also includes a background, for example an identification photo or self-portrait with background, a face frame can be obtained with the existing dlib toolkit and its width-to-height ratio adjusted to that of the initial image so as to crop out the face image, which serves as the final face image to be detected. Cropping the face image in this way removes the background, reduces its interference with wrinkle detection, and improves detection accuracy.
The face image to be detected is input into the wrinkle detection model to obtain the positions and categories of the wrinkles in it. For example, as shown in fig. 10, inputting a face image A to be detected into the wrinkle detection model and performing feature processing detects the following wrinkles in image A: mild forehead lines, no crow's feet, no glabellar lines, and mild nasolabial folds; the marked boxes in fig. 10 are the positions corresponding to each wrinkle category.
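Illustrative usage only, reusing names from the earlier sketches (crop_face_region, WRINKLE_CLASSES); preprocess and wrinkle_model are assumed stand-ins for the trained model's input pipeline and forward pass:

```python
# Run the trained wrinkle detection model on one image (names are assumptions).
face = crop_face_region("selfie.jpg")         # remove the background first
detections = wrinkle_model(preprocess(face))  # assumed forward pass
for box, class_id, confidence in detections:  # (x, y, w, h), category, score
    print(WRINKLE_CLASSES[class_id], box, confidence)
```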
It can be understood that the wrinkle detection model is obtained by training through the method for training the wrinkle detection model in the above embodiment, and the structure and function of the wrinkle detection model are the same as those of the wrinkle detection model in the above embodiment, and are not described in detail here.
In this embodiment, the wrinkle detection model is used to detect the face image to be detected, and the wrinkle type and the wrinkle position can be directly located, so that wrinkles can be quickly and accurately located and classified, and a user can be helped to perform more refined nursing according to the positions of the wrinkle types.
In some embodiments, referring to fig. 11, the step S32 specifically includes:
s321: and inputting the face image to be detected into a feature convolution layer in the wrinkle detection model for feature sampling so as to obtain a first feature image to be detected and a second feature image to be detected, wherein a first downsampling multiple of the first feature image to be detected relative to the face image to be detected is larger than a second downsampling multiple of the second feature image to be detected relative to the face image to be detected, and the second downsampling multiple is smaller than 32.
S322: and inputting the first characteristic diagram to be detected and the second characteristic diagram to be detected into a detection convolution layer in the wrinkle detection model so as to obtain the wrinkle position and the wrinkle category of the wrinkles in the face image to be detected.
And inputting the face image to be detected into a feature convolution layer in the wrinkle detection model for feature sampling to obtain a first feature image to be detected and a second feature image to be detected. And the first characteristic image to be detected and the second characteristic image to be detected are both reduced in size relative to the face image to be detected, namely, are down-sampled relative to the face image to be detected. The larger the down-sampling multiple is, the smaller the obtained characteristic image to be detected is, and the larger the receptive field of the characteristic image to be detected is, so that the method is suitable for detecting the object with large size contrast in the human face image to be detected.
The first downsampling multiple of the first feature map to be detected relative to the face image to be detected is larger than the second downsampling multiple of the second feature map to be detected relative to the face image to be detected, so detection at different levels of fine granularity can be achieved.
Since wrinkle regions in a human face are medium-sized objects or smaller, a feature map with too large a receptive field is unfavorable for wrinkle detection. Thus, the first downsampling multiple is less than 32. That is, the first feature map to be detected has a medium-scale receptive field suitable for detecting medium-sized objects, such as forehead lines, while the second feature map to be detected has a small-scale receptive field suitable for detecting small-sized objects, such as crow's feet.
By using the first and second feature maps to be detected at these two scales, targets of both medium size (such as forehead lines) and small size (such as crow's feet) can be detected accurately while redundant bounding boxes are reduced. Moreover, compared with using three or more feature maps to be detected, the model parameters of the feature convolution layer are simplified and the detection efficiency of the wrinkle detection model is improved.
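To make the two-scale design concrete, the following is a minimal PyTorch sketch of a feature convolution layer that emits feature maps at strides 16 and 8 (first and second downsampling multiples, consistent with the constraints above), each followed by its own detection convolution. The layer widths, the class count of four, and the prior count of three are illustrative assumptions; the patent does not disclose the actual architecture.

```python
import torch
import torch.nn as nn

class TwoScaleWrinkleNet(nn.Module):
    """Sketch of a backbone producing two feature maps (strides 16 and 8)
    plus one detection convolution per scale, YOLO-style."""
    def __init__(self, num_classes=4, num_priors=3):
        super().__init__()
        out_ch = num_priors * (5 + num_classes)  # (x, y, w, h, conf) + class scores per prior
        self.to_stride8 = nn.Sequential(         # second feature map: stride 8, small receptive field
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        self.to_stride16 = nn.Sequential(        # first feature map: stride 16 (< 32), medium receptive field
            nn.Conv2d(128, 256, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        self.detect8 = nn.Conv2d(128, out_ch, 1)   # detection conv for small objects (e.g. crow's feet)
        self.detect16 = nn.Conv2d(256, out_ch, 1)  # detection conv for medium objects (e.g. forehead lines)

    def forward(self, x):
        f8 = self.to_stride8(x)      # second feature map to be detected
        f16 = self.to_stride16(f8)   # first feature map to be detected
        return self.detect16(f16), self.detect8(f8)
```

For a 416x416 input, TwoScaleWrinkleNet()(torch.randn(1, 3, 416, 416)) returns 26x26 and 52x52 prediction grids, one per scale.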
To implement the above methods, an embodiment of the present invention further provides an image processing apparatus, which may be the electronic device 10, or the electronic device 10 combined with the image acquisition apparatus 20. The image processing apparatus includes one or more functional modules, which cooperate to implement the methods of fig. 2 to fig. 11.
An embodiment of the present invention also provides a non-transitory computer-readable storage medium storing computer-executable instructions for causing an electronic device to perform, for example, the methods of fig. 3-11 described above.
Embodiments of the present invention further provide a computer program product comprising a computer program stored on a non-transitory computer-readable storage medium, the computer program comprising program instructions which, when executed by a computer, cause the computer to perform the method of detecting wrinkles of any of the method embodiments described above, for example the method steps of fig. 3 to fig. 11.
It should be noted that the above-described device embodiments are merely illustrative, where the units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment may be implemented by software plus a general hardware platform, or by hardware alone. All or part of the processes of the above method embodiments may likewise be implemented by a computer program instructing the relevant hardware; the program may be stored in a computer-readable storage medium and, when executed, may include the processes of the above method embodiments. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Within the idea of the invention, technical features of the above embodiments or of different embodiments may be combined, steps may be implemented in any order, and many other variations of the different aspects of the invention described above exist that are not provided in detail for the sake of brevity. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced, without departing from the scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A method of training a wrinkle detection model, comprising:
acquiring an image sample comprising a human face;
cropping a face region image from the image sample, wherein the face region image is marked with a first label, and the first label comprises wrinkle positions and wrinkle categories of wrinkles in the image sample;
inputting the face region image marked with the first label into a preset convolutional neural network as a training sample, to obtain a detection result of the training sample output by the preset convolutional neural network, wherein the detection result comprises a bounding box position corresponding to a wrinkle in the training sample identified by the preset convolutional neural network, a probability that the wrinkle corresponding to the bounding box position belongs to each wrinkle category, and a confidence corresponding to the bounding box position;
calculating, according to the bounding box position and the wrinkle position, the minimum closure area of the bounding box position and the wrinkle position, and the union area of the bounding box position and the wrinkle position;
calculating a position deviation between the bounding box position and the wrinkle position according to the bounding box position, the wrinkle position, the minimum closure area and the union area;
calculating an error between the first label and the detection result according to the bounding box position, the wrinkle position, the position deviation, the wrinkle category, the probability, and the confidence corresponding to the bounding box position;
and back-propagating the error to adjust the model parameters of the preset convolutional neural network, so as to obtain the wrinkle detection model.
2. The method of claim 1, wherein said calculating a position deviation between said bounding box position and said wrinkle position according to said bounding box position, said wrinkle position, said minimum closure area, and said union area comprises:
calculating an absolute difference between the minimum closure area and the union area;
calculating an intersection-over-union ratio between the bounding box position and the wrinkle position, wherein the intersection-over-union ratio reflects the degree of overlap between the bounding box position and the wrinkle position;
and subtracting a first ratio from the intersection-over-union ratio to obtain the position deviation, wherein the first ratio is the ratio of the absolute difference to the minimum closure area.
3. The method of claim 1, wherein the calculating an error between the first label and the detection result according to the bounding box position, the wrinkle position, the position deviation, the wrinkle category, the probability, and the confidence corresponding to the bounding box position comprises:
calculating an error between the first label and the detection result according to the following formula:

$$\mathrm{Loss}=\lambda_{coord}\sum_{i=0}^{K\times K}\sum_{j=0}^{M}I_{ij}^{obj}\left[(x_i-\hat{x}_i)^2+(y_i-\hat{y}_i)^2+(w_i-\hat{w}_i)^2+(h_i-\hat{h}_i)^2\right]+\sum_{i=0}^{K\times K}\sum_{j=0}^{M}I_{ij}^{obj}(C_i-\hat{C}_i)^2+\lambda_{noobj}\sum_{i=0}^{K\times K}\sum_{j=0}^{M}I_{ij}^{noobj}(C_i-\hat{C}_i)^2+\sum_{i=0}^{K\times K}\sum_{j=0}^{M}I_{ij}^{obj}\sum_{c}\left(p_i(c)-\hat{p}_i(c)\right)^2$$

wherein $\lambda_{coord}$ and $\lambda_{noobj}$ are preset weights; $M$ is the number of preset prior boxes for predicting wrinkle positions and $j$ is the index of a prior box; $K\times K$ is the number of grids contained in the convolution feature map extracted by the preset convolutional neural network and used for determining the detection result, and $i$ is the index of a grid; $(x_i,y_i,w_i,h_i)$ is the bounding box position and $(\hat{x}_i,\hat{y}_i,\hat{w}_i,\hat{h}_i)$ is the wrinkle position; $C_i$ is the confidence corresponding to the bounding box and $\hat{C}_i$ is the true confidence determined from the wrinkle position; $p_i(c)$ is the probability and $\hat{p}_i(c)$ is the true probability, $c$ being the wrinkle category; $I_{ij}^{obj}$ indicates whether the $j$th prior box of the $i$th grid is responsible for a wrinkle and $I_{ij}^{noobj}$ indicates whether it is not: if the position deviation is not 0, $I_{ij}^{obj}$ is 1 and $I_{ij}^{noobj}$ is 0; if the position deviation is 0, $I_{ij}^{obj}$ is 0 and $I_{ij}^{noobj}$ is 1.
4. The method according to any one of claims 1 to 3, wherein the step of inputting the face region image labeled with the first label as a training sample into a preset convolutional neural network to obtain a detection result of the training sample output by the preset convolutional neural network comprises:
taking the face region image marked with the first label as a training sample, and inputting it into a feature convolution layer of the preset convolutional neural network for feature sampling to obtain a first training feature map and a second training feature map, wherein a first downsampling multiple of the first training feature map relative to the face region image is larger than a second downsampling multiple of the second training feature map relative to the face region image, and the first downsampling multiple is smaller than 32;
and inputting the first training feature map and the second training feature map into a detection convolutional layer of the preset convolutional neural network to obtain a detection result of the training sample.
5. The method according to claim 1, wherein before the step of inputting the face region image labeled with the first label as a training sample into a feature convolution layer of a preset convolutional neural network for feature sampling to obtain a first training feature map and a second training feature map, the method further comprises:
and carrying out data augmentation processing on the training samples, wherein the data augmentation processing comprises at least one of image translation, rotation, clipping and histogram equalization.
6. The method of claim 5, further comprising:
and traversing the face region images in the training sample, and randomly selecting at least one of translation, rotation, cutting and histogram equalization of the target face region image from the images to perform data amplification processing, wherein the target face region image is any face region image in the training sample.
7. A method of detecting wrinkles, comprising:
acquiring a face image to be detected;
detecting the face image to be detected by using a wrinkle detection model obtained by training according to the method of any one of claims 1 to 6, and acquiring wrinkle positions and wrinkle categories of wrinkles in the face image to be detected.
8. The method according to claim 7, wherein the detecting the face image to be detected by using the wrinkle detection model obtained by training according to the method of any one of claims 1 to 6, and acquiring wrinkle positions and wrinkle categories of wrinkles in the face image to be detected, comprises:
inputting the face image to be detected into a feature convolution layer in the wrinkle detection model for feature sampling to obtain a first feature map to be detected and a second feature map to be detected, wherein a first downsampling multiple of the first feature map to be detected relative to the face image to be detected is larger than a second downsampling multiple of the second feature map to be detected relative to the face image to be detected, and the second downsampling multiple is smaller than 32;
and inputting the first feature map to be detected and the second feature map to be detected into a detection convolution layer in the wrinkle detection model, so as to obtain wrinkle positions and wrinkle categories of wrinkles in the face image to be detected.
9. An image processing apparatus comprising one or more functional modules configured to cooperate to implement a method as claimed in any one of claims 1 to 6 or claim 7 or 8.
10. A non-transitory computer-readable storage medium storing computer-executable instructions for causing an electronic device to perform the method of any one of claims 1-6 or 7 or 8.
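For readability, a small Python sketch of the position deviation recited in claim 2 follows. Box coordinates in corner form (x0, y0, x1, y1) are an assumed convention, as the claim itself fixes no representation; the quantity computed is the generalized IoU (GIoU).

```python
def position_deviation(box_a, box_b):
    """Position deviation per claim 2: IoU minus the ratio of
    (closure area - union area) to the closure area."""
    ax0, ay0, ax1, ay1 = box_a
    bx0, by0, bx1, by1 = box_b
    inter_w = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    inter_h = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = inter_w * inter_h
    union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - inter
    # Minimum closure: the smallest box enclosing both boxes.
    closure = (max(ax1, bx1) - min(ax0, bx0)) * (max(ay1, by1) - min(ay0, by0))
    if closure <= 0:
        return 0.0  # degenerate boxes
    iou = inter / union if union > 0 else 0.0
    # closure >= union, so the absolute difference is simply closure - union.
    return iou - (closure - union) / closure
```

The deviation equals 1 when the boxes coincide and approaches -1 as they move apart; claim 3 uses a non-zero deviation to mark a prior box as responsible for a wrinkle.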
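Similarly, a numerical sketch of the claim-3 error in NumPy, with YOLO-style sum-of-squares terms. The array shapes and the weight values lambda_coord = 5.0 and lambda_noobj = 0.5 are assumptions for illustration; the claim fixes only the symbols, not these numbers.

```python
import numpy as np

def wrinkle_loss(pred_box, true_box, pred_conf, true_conf, pred_cls, true_cls,
                 obj_mask, lambda_coord=5.0, lambda_noobj=0.5):
    """Sum-of-squares error in the spirit of claim 3.
    Assumed shapes: pred_box/true_box (K*K, M, 4); pred_conf/true_conf (K*K, M);
    pred_cls/true_cls (K*K, M, C); obj_mask (K*K, M) boolean, True where the
    j-th prior box of grid cell i is responsible for a wrinkle (I_ij^obj = 1)."""
    obj = obj_mask.astype(float)
    noobj = 1.0 - obj
    coord = lambda_coord * np.sum(obj[..., None] * (pred_box - true_box) ** 2)
    conf_obj = np.sum(obj * (pred_conf - true_conf) ** 2)
    conf_noobj = lambda_noobj * np.sum(noobj * (pred_conf - true_conf) ** 2)
    cls = np.sum(obj[..., None] * (pred_cls - true_cls) ** 2)
    return coord + conf_obj + conf_noobj + cls
```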
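Finally, an OpenCV sketch of the data augmentation of claims 5 and 6: at least one of translation, rotation, cropping, and histogram equalization is chosen at random and applied to a face region image. The magnitude ranges are illustrative assumptions.

```python
import random
import cv2
import numpy as np

def augment(img):
    """Random data augmentation per claims 5 and 6 on a BGR uint8 image."""
    h, w = img.shape[:2]
    ops = random.sample(["translate", "rotate", "crop", "hist_eq"],
                        k=random.randint(1, 4))
    if "translate" in ops:  # shift by up to 10% of each dimension
        tx = random.randint(-w // 10, w // 10)
        ty = random.randint(-h // 10, h // 10)
        m = np.float32([[1, 0, tx], [0, 1, ty]])
        img = cv2.warpAffine(img, m, (w, h))
    if "rotate" in ops:     # rotate about the centre by up to +/- 15 degrees
        m = cv2.getRotationMatrix2D((w / 2, h / 2), random.uniform(-15, 15), 1.0)
        img = cv2.warpAffine(img, m, (w, h))
    if "crop" in ops:       # trim a random margin of up to 10% on each side
        dx = random.randint(0, max(1, w // 10))
        dy = random.randint(0, max(1, h // 10))
        img = img[dy:h - dy, dx:w - dx]
    if "hist_eq" in ops:    # equalise the luma channel only, preserving colour
        ycrcb = cv2.cvtColor(img, cv2.COLOR_BGR2YCrCb)
        ycrcb[:, :, 0] = cv2.equalizeHist(ycrcb[:, :, 0])
        img = cv2.cvtColor(ycrcb, cv2.COLOR_YCrCb2BGR)
    return img
```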
CN202010989797.8A 2020-09-18 2020-09-18 Method and related device for training wrinkle detection model Pending CN112347843A (en)



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination