CN112613415A - Face nose type recognition method and device, electronic equipment and medium - Google Patents
- Publication number: CN112613415A
- Application number: CN202011567913.3A
- Authority: CN (China)
- Prior art keywords: nose type, image, nose, processed, type
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06V40/161: Human faces - Detection; Localisation; Normalisation
- G06V40/168: Human faces - Feature extraction; Face representation
- G06V40/171: Human faces - Local features and components; Facial parts; Occluding parts, e.g. glasses; Geometrical relationships
- G06V40/172: Human faces - Classification, e.g. identification
- G06T5/50: Image enhancement or restoration using two or more images, e.g. averaging or subtraction
- G06T2207/10004: Still image; Photographic image
- G06T2207/20221: Image fusion; Image merging
- G06T2207/30201: Human being; Person - Face
Abstract
The application discloses a face nose type identification method and device, electronic equipment, and a medium. The method comprises the following steps: acquiring a nose type image to be processed; extracting texture features of the nose type image to be processed to obtain a texture image of the nose type image to be processed; fusing the texture image and the nose type image to be processed to obtain a nose type enhanced image of the nose type image to be processed, wherein the nose type enhanced image is a multi-channel image; inputting the nose type enhanced image into a face nose type identification model for processing to obtain a nose type label vector of the nose type image to be processed, wherein the nose type label vector comprises the prediction probabilities that the nose in the nose type image to be processed belongs to different nose types; and determining the nose type corresponding to the nose type image to be processed according to the nose type label vector, and outputting the nose type corresponding to the nose in the nose type image to be processed.
Description
Technical Field
The invention relates to the technical field of computer vision, in particular to a face nose type identification method, a face nose type identification device, electronic equipment and a medium.
Background
With the rapid development of mobile communication technology and the improvement of people's living standards, various intelligent terminals have been widely applied in people's daily work and life. Application software (apps) for functions such as beautified self-photographing and photograph-based skin analysis keeps increasing, and many users want to learn about their facial condition, such as their nose type, by means of apps that detect it automatically.
Because there are many types of human noses and the distinctions between them are fine-grained, identifying the nose type is difficult. At present, traditional image processing methods are mostly used to identify the nose type from its texture, but the accuracy of such detection algorithms is not high enough.
Disclosure of Invention
The application provides a face nose type identification method, a face nose type identification device, electronic equipment and a medium.
In a first aspect, a method for identifying nose type of a face is provided, which includes:
acquiring a nose type image to be processed;
extracting texture features of the nose type image to be processed to obtain a texture image of the nose type image to be processed;
fusing the texture image and the to-be-processed nose type image to obtain a nose type enhanced image of the to-be-processed nose type image, wherein the nose type enhanced image is a multi-channel image;
inputting the nose type enhanced image into a face nose type identification model for processing to obtain a nose type label vector of the nose type image to be processed, wherein the nose type label vector comprises the prediction probability that the nose in the nose type image to be processed belongs to different nose types;
and determining the nose type corresponding to the nose type image to be processed according to the nose type label vector, and outputting the nose type corresponding to the nose in the nose type image to be processed.
In a second aspect, there is provided a facial nose type recognition apparatus comprising:
the acquisition module is used for acquiring a nose type image to be processed;
the characteristic extraction module is used for extracting the texture characteristics of the nose type image to be processed so as to obtain the texture image of the nose type image to be processed;
the fusion module is used for fusing the texture image and the to-be-processed nose type image to obtain a nose type enhanced image of the to-be-processed nose type image, wherein the nose type enhanced image is a multi-channel image;
the processing module is used for inputting the nose type enhanced image into a face nose type identification model for processing to obtain a nose type label vector of the nose type image to be processed, wherein the nose type label vector comprises the prediction probability that the nose in the nose type image to be processed belongs to different nose types;
the processing module is further used for determining the nose type corresponding to the nose type image to be processed according to the nose type label vector and outputting the nose type corresponding to the nose in the nose type image to be processed.
In a third aspect, an electronic device is provided, comprising a memory and a processor, the memory storing a computer program that, when executed by the processor, causes the processor to perform the steps as in the first aspect and any one of its possible implementations.
In a fourth aspect, there is provided a computer storage medium storing one or more instructions adapted to be loaded by a processor and to perform the steps of the first aspect and any possible implementation thereof.
The embodiment of the application acquires a nose type image to be processed; extracts texture features of the nose type image to be processed to obtain a texture image of the nose type image to be processed; fuses the texture image and the nose type image to be processed to obtain a nose type enhanced image of the nose type image to be processed, wherein the nose type enhanced image is a multi-channel image; inputs the nose type enhanced image into a face nose type identification model for processing to obtain a nose type label vector of the nose type image to be processed, wherein the nose type label vector comprises the prediction probabilities that the nose in the nose type image to be processed belongs to different nose types; and determines the nose type corresponding to the nose type image to be processed according to the nose type label vector and outputs the nose type corresponding to the nose in the nose type image to be processed. By fusing the texture image of the nose type image to be processed with the original nose type image into a multi-channel nose type enhanced image and inputting it into the face nose type identification model for processing, the nose type texture features of the nose type image to be processed are enhanced; the face nose type identification model processes the nose type enhanced image to obtain the nose type corresponding to the nose in the nose type image to be processed, so the accuracy of nose type recognition can be higher.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments or the background art of the present application, the drawings required to be used in the embodiments or the background art of the present application will be described below.
Fig. 1 is a schematic flow chart of a method for identifying nose type of face according to an embodiment of the present application;
FIG. 2 is a schematic view of a nose region according to an embodiment of the present disclosure;
fig. 3A is a schematic flowchart of a method for training a facial nose type recognition model according to an embodiment of the present application;
fig. 3B is a schematic structural diagram of a facial nose type recognition model according to an embodiment of the present application;
fig. 4 is a schematic structural diagram of a facial nose type recognition device according to an embodiment of the present application;
fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In order to make the technical solutions of the present application better understood, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The terms "first," "second," and the like in the description and claims of the present application and in the above-described drawings are used for distinguishing between different objects and not for describing a particular order. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
Neural Networks (NN) referred to in the embodiments of the present application are complex network systems formed by widely interconnecting a large number of simple processing units (called neurons), reflect many basic features of human brain functions, and are highly complex nonlinear dynamical learning systems. The neural network has the capabilities of large-scale parallel, distributed storage and processing, self-organization, self-adaptation and self-learning, and is particularly suitable for processing inaccurate and fuzzy information processing problems which need to consider many factors and conditions simultaneously.
Convolutional Neural Networks (CNNs) are a class of feedforward neural networks that contain convolution computations and have a deep structure, and are one of the representative algorithms of deep learning.
The embodiments of the present application will be described below with reference to the drawings.
Referring to fig. 1, fig. 1 is a schematic flow chart of a method for identifying a nose type of a face according to an embodiment of the present application. The method can comprise the following steps:
101. and acquiring a to-be-processed nose type image.
The execution subject of the embodiments of the present application may be a facial nose type recognition apparatus or an electronic device. In a specific implementation, the electronic device may be a terminal, also referred to as a terminal device, including but not limited to portable devices such as a mobile phone, a laptop computer, or a tablet computer having a touch-sensitive surface (e.g., a touch screen display and/or a touch pad). It should also be understood that in some embodiments, the device is not a portable communication device but a desktop computer having a touch-sensitive surface (e.g., a touch screen display and/or a touch pad).
The above-mentioned nose type image to be processed may be an image of a nose region of the face.
In one embodiment, the step 101 may specifically include:
11. and acquiring a face image to be processed.
12. And performing face key point detection on the face image to be processed, and determining face key point coordinates in the face image to be processed.
13. And intercepting a nose type area of the face image to be processed as the nose type image to be processed based on the face key point coordinates in the face image to be processed.
The face image to be processed is an image containing a face. Specifically, the acquired face image to be processed may be preprocessed by cutting or the like to obtain a standardized face image to be processed, and then the face key point detection may be performed to determine the coordinates of the face key points. In the embodiment of the present application, any face keypoint detection model or algorithm may be used to perform keypoint detection, for example, 68 keypoints of a face contour may be located. Optionally, each detected keypoint has a corresponding keypoint identification (number).
After the face key point coordinates are determined, the key point coordinates of each part of the face, such as the key point coordinates representing the center of the nose, can be referred to, and the region where the nose of the face is located is divided, namely, the nose type region of the face image to be processed is intercepted.
Further optionally, step 13 specifically includes:
acquiring a nose center key point coordinate corresponding to a preset key point identifier from the face key point coordinate according to the preset key point identifier;
and determining and intercepting a nose type area of the face image to be processed as the nose type image to be processed according to a preset width threshold, a preset height threshold and the coordinates of the key point of the nose center.
Specifically, the preset key point identifier may be a preset identifier of a central point of a nose in the face, and the position of a key point where the preset key point identifier is located is determined at first, and then the nose type region is divided according to a preset width threshold and a preset height threshold: and intercepting a rectangular nose type area by taking the preset width threshold as the width, the preset height threshold as the length and the key point of the nose center as the center, thereby obtaining a nose type image to be processed.
Referring to fig. 2, a schematic diagram of a nose type region is shown. Fig. 2 shows a plurality of face key points detected in a face image to be processed, where each key point corresponds to a number serving as its key point identifier. The rectangular region in fig. 2 is the nose type region, obtained in the above manner by cropping a rectangular region with a preset length of 2r and width of 2r centered on the nose-center key point (the 31st key point of the face).
In this way, the nose type image can be accurately obtained from the image to be processed at a standard size suited to the face nose type recognition model, so that the face nose type recognition model can accurately perform nose type recognition on the nose type image to be processed and a more accurate nose type recognition result can be obtained.
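As a minimal illustrative sketch of this cropping step (the landmark array layout, the nose-center index, and the half-size r below are assumptions rather than parameters fixed by the application), the nose type region can be cut out as follows:

```python
import numpy as np

def crop_nose_region(face_img: np.ndarray, landmarks: np.ndarray,
                     nose_center_idx: int = 30, r: int = 40) -> np.ndarray:
    """Crop a 2r x 2r nose type region centered on the nose-center key point.

    face_img:  H x W x 3 face image (already standardized).
    landmarks: K x 2 array of (x, y) face key point coordinates.
    nose_center_idx: assumed index of the nose-center key point (the patent's
                     figure refers to the 31st key point of a 68-point layout).
    r: half of the preset width/height threshold.
    """
    h, w = face_img.shape[:2]
    cx, cy = landmarks[nose_center_idx]
    x0, x1 = int(max(cx - r, 0)), int(min(cx + r, w))
    y0, y1 = int(max(cy - r, 0)), int(min(cy + r, h))
    return face_img[y0:y1, x0:x1]
```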
102. And extracting texture features of the nose type image to be processed to obtain a texture image of the nose type image to be processed.
The texture feature of the nose type image to be processed may be understood as edge contour information of the nose, that is, edge information of the nasal edge, the nostril, and the like in the nose region. Specifically, a network model may be used to perform feature extraction on the to-be-processed nose type image to obtain a texture image of the to-be-processed nose type image, and specifically, the texture image may be a binarized nose edge contour map having the same size as the original to-be-processed nose type image.
In an alternative embodiment, the outline information of the nose can be extracted by using a Sobel operator to obtain the texture image.
The Sobel operator is a discrete differentiation operator used to compute an approximation of the gradient of the image intensity; the larger the gradient, the more likely a pixel is to be an edge. The Sobel operator combines Gaussian smoothing and differentiation, and is also called a first-order differential operator: derivative operators are applied in the horizontal and vertical directions, and the resulting images are the gradient images in the X and Y directions. The embodiment of the present application does not limit the algorithm used to extract the texture image.
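The following OpenCV sketch shows one way this texture extraction step could look; the kernel size and Otsu binarization are illustrative choices, not parameters specified by the application:

```python
import cv2
import numpy as np

def extract_texture_image(nose_img: np.ndarray) -> np.ndarray:
    """Sobel-based sketch of the texture (edge contour) extraction step.

    Returns a binarized edge map with the same height/width as nose_img.
    """
    gray = cv2.cvtColor(nose_img, cv2.COLOR_BGR2GRAY)
    gx = cv2.Sobel(gray, cv2.CV_64F, 1, 0, ksize=3)   # horizontal derivative
    gy = cv2.Sobel(gray, cv2.CV_64F, 0, 1, ksize=3)   # vertical derivative
    mag = np.sqrt(gx ** 2 + gy ** 2)                  # gradient magnitude
    mag = cv2.normalize(mag, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
    _, texture = cv2.threshold(mag, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return texture
```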
103. And fusing the texture image and the to-be-processed nose type image to obtain a nose type enhanced image of the to-be-processed nose type image, wherein the nose type enhanced image is a multi-channel image.
In the case of obtaining the texture image of the nose type image to be processed, the texture image may be fused with the nose type image to be processed, so as to use texture feature data represented by the texture image as data of one channel of the nose type image to be processed, which may be referred to as a nose type texture channel.
A typical RGB format image has R, G, B three channels of data. The RGB color scheme is a color standard in the industry, and various colors are obtained by changing three color channels of red (R), green (G) and blue (B) and superimposing them on each other.
In the embodiment of the application, the nose type image to be processed is fused with the texture image thereof, that is, the extracted texture feature data is used as fourth channel data, and original image information is added to obtain the nose type enhanced image. It can be understood that the fourth channel is a binarized nose edge contour map of the same size as the nose type image to be processed, and the nose type enhanced image in this case is a multi-channel image including at least R, G, B channels and one channel of texture feature data. The nose type texture features of the nose type image to be processed are enhanced through the steps, and then step 104 is executed.
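Before moving on, a minimal sketch of this channel fusion, assuming the texture map from the previous step and a three-channel nose type image of the same size:

```python
import numpy as np

def build_nose_enhanced_image(nose_img: np.ndarray, texture: np.ndarray) -> np.ndarray:
    """Append the binarized texture map to the R, G, B channels as a 4th channel.

    nose_img: H x W x 3 nose type image; texture: H x W binary edge map.
    Returns an H x W x 4 nose type enhanced image.
    """
    assert nose_img.shape[:2] == texture.shape[:2]
    return np.dstack([nose_img, texture])
```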
104. And inputting the nose type enhanced image into a face nose type identification model for processing to obtain a nose type label vector of the nose type image to be processed, wherein the nose type label vector comprises the prediction probability that the nose in the nose type image to be processed belongs to different nose types.
The face nose type recognition model is a pre-trained network model that can process the multi-channel nose type enhanced image. The nose type label vector of the nose type image to be processed can be obtained through the face nose type recognition model. The nose type label vector is a probability distribution comprising the predicted probability that the nose in the nose type image to be processed belongs to each of the different nose types, for example P = [P1, P2, P3, P4, P5, P6], where the elements P1 to P6 respectively represent the prediction probabilities that the nose in the nose type image to be processed belongs to 6 different nose types.
In an alternative embodiment, the step 104 includes:
41. extracting features of the nose type enhanced image based on the face nose type recognition model to obtain a feature map set of the nose type enhanced image, wherein the feature map set comprises a plurality of category feature maps corresponding to different nose types and a texture feature map, and the texture feature map is generated based on the texture image;
42. respectively fusing the plurality of category feature maps with the texture feature map to obtain a plurality of target feature maps of the nose type enhanced image;
43. and generating the nose type label vector of the nose type image to be processed according to the plurality of target feature maps.
Specifically, the trained face nose type recognition model can perform image feature extraction to obtain a feature map set of a nose type enhanced image, wherein the feature map set comprises a plurality of category feature maps corresponding to different nose types, and further comprises a texture feature map generated based on the texture image.
For example, for a nose type enhanced image I ∈ R^(4×40×40) used as the model input image, a depth feature F ∈ R^(T×W×H) corresponding to the input image is extracted, where T denotes the number of channels of the depth feature, T is equal to the number n of nose type classes plus 1, and W and H denote the width and height of each feature map. Among the extracted n+1 feature maps, the texture feature map F′ corresponds to the nose type texture channel of the original input image. The class feature map of each nose type can be represented as Fi ∈ R^(W×H) (i = 1, 2, ..., n), so the depth feature is obtained as F = {F1, F2, ..., Fn, F′}.
In the face nose type identification model, the feature map set F is obtained by a convolution-kernel feature extraction process, that is, convolution operations: each convolution kernel yields one feature map, so the number of convolution kernels determines the number of feature maps. In the feature extraction structure of the face nose type recognition model, the number of convolution kernels of the final convolution operation is set to n+1, where n denotes the number of nose types; that is, the trained face nose type recognition model obtains n class feature maps (F1, F2, ..., Fn) through n convolution kernels, and the remaining convolution kernel is used to obtain a feature map F′ representing the texture features corresponding to the different nose type features, so the whole depth feature F = {F1, F2, ..., Fn, F′} can be obtained.
After the feature maps are obtained, the plurality of class feature maps Fi are each fused with the texture feature map F′, so that a plurality of target feature maps of the nose type enhanced image can be obtained. The target feature maps obtained at this time still correspond to the nose types of the original image, and the prediction probabilities of the nose types, that is, the nose type label vector of the nose type image to be processed, can be generated from these target feature maps. Specifically, the feature maps can be converted into a nose type probability distribution through a fully connected layer and a Softmax layer in the face nose type recognition model. The input of the fully connected layer can be regarded as the feature maps extracted by the network from the input nose type enhanced image, with one feature map corresponding to one point in a multi-dimensional space. In the fully connected layer, a preset weight matrix is multiplied by the vector representation of the input feature maps and a preset bias parameter is added, mapping the input to K real numbers in (-∞, +∞) (which can be understood as scores); Softmax then maps these K real numbers into K values in (0, 1) that sum to 1 (which can be understood as probabilities), where K is the number of nose type classes, thereby obtaining the predicted probability for each nose type.
Specifically, in order to further enhance the nose type texture features, the texture feature map F′ needs to be fused with each of the remaining class feature maps Fi.
Specifically, the step 42 may include:
and matrix-multiplying and weighting the plurality of category feature maps and the texture feature map to obtain a plurality of target feature maps of the nose-type enhanced image.
The feature map data may be expressed in matrix form; each class feature map can then be multiplied by the texture feature map, which may be expressed as Fi * F′, and weighted with a preset weight parameter to obtain the corresponding target feature map.
By way of further example, the finally obtained target feature maps may be represented as:
F = {F1*α*F′, F2*β*F′, ..., Fn*θ*F′};
where the parameters α, β, ..., θ are the preset weight parameters, which may be learned during model training.
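A minimal sketch of this fusion step follows; treating the per-map multiplication as element-wise is an assumption, as is the array layout:

```python
import numpy as np

def fuse_class_maps(class_maps: np.ndarray, texture_map: np.ndarray,
                    weights: np.ndarray) -> np.ndarray:
    """Weighted fusion of each class feature map F_i with the texture map F'.

    class_maps: (n, W, H) class feature maps F_1..F_n.
    texture_map: (W, H) texture feature map F'.
    weights: (n,) preset/learned weight parameters (alpha, beta, ..., theta).
    The multiplication is assumed to be element-wise (Hadamard) per map.
    """
    return class_maps * weights[:, None, None] * texture_map[None, :, :]
```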
105. And determining the nose type corresponding to the nose type image to be processed according to the nose type label vector, and outputting the nose type corresponding to the nose in the nose type image to be processed.
According to the prediction probability in the nose type label vector, the nose type corresponding to the nose type image to be processed can be determined, specifically, the nose type corresponding to the maximum prediction probability can be determined to be the nose type corresponding to the nose type image to be processed, and the nose type corresponding to the nose in the nose type image to be processed can be output. Alternatively, when the maximum prediction probability is plural (the prediction probabilities have the same value), a nose type corresponding to each maximum prediction probability may be output. The embodiment of the application does not limit other contents and output forms of the output result.
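A small sketch of this selection step, with illustrative nose type names that are assumptions rather than categories defined by the application; ties for the maximum probability are all returned, as described above:

```python
import numpy as np

def predict_nose_type(label_vector: np.ndarray, nose_type_names: list) -> list:
    """Pick the nose type(s) with the maximum prediction probability.

    label_vector: length-n nose type label vector (softmax probabilities).
    nose_type_names: n human-readable names for the nose types (assumed).
    """
    best = label_vector.max()
    return [name for name, p in zip(nose_type_names, label_vector) if p == best]
```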
Referring to fig. 3A, fig. 3A is a schematic flow chart of a method for training a facial nose type recognition model according to an embodiment of the present application. The method can comprise the following steps:
301. obtaining a plurality of sample nose type images, wherein each sample nose type image is marked with a sample nose type label vector, and the sample nose type label vector is used for indicating the probability that the nose in the sample nose type image belongs to different nose types;
302. extracting texture images of each sample nose type image; fusing each sample nose type image with a texture image corresponding to each sample nose type image to generate a sample nose type enhanced image corresponding to each sample nose type image so as to obtain a plurality of sample nose type enhanced images, wherein the sample nose type enhanced images are multi-channel images;
303. and training the network model based on the plurality of sample nose type enhanced images to obtain a face nose type recognition model.
The execution subject of the embodiment of the present application may be a facial nose type recognition model training device, or may be the facial nose type recognition device, or may be an electronic device, and in a specific implementation, the electronic device may be a terminal, or may be referred to as a terminal device, including but not limited to a desktop computer having a touch-sensitive surface (e.g., a touch screen display and/or a touch pad).
The plurality of sample nose type images may be images of different face nose regions, or may be cropped from face images. Key point detection can first be performed on the sample face image, and the sample nose type image is then obtained by cropping. A normalization operation is needed because of the model input; in order not to damage the spatial structure of the nose, so that the model can better learn the spatial feature information of the nose contour features, when the nose type image is cropped, the nose region can be cut out with the nose-center key point as the center point H(x, y) and with preset width and height values such as 2r, and used as the sample nose type image. For this step, reference may be made to the detailed description in step 101 of the embodiment shown in fig. 1, which is not repeated here.
In order to better learn the feature information between different nose types, labeling of the sample nose type images can be performed in a one-hot style, that is, using the sample nose type label vector, where each value represents the probability of belonging to a certain nose type. For example, with 6 nose types, the sample nose type label vector P = [0, 0, 0.8, 0.2, 0, 0] assigned when labeling a sample indicates that the nose in the sample nose type image has a probability of 0.8 of belonging to the third nose type and a probability of 0.2 of belonging to the fourth nose type.
One-hot encoding is a process that converts categorical variables into a form that is readily used by machine learning algorithms. In common labeling methods such as binarized labeling, each sample corresponds to only one class in a multi-classification problem (i.e., the value is 1 only at the corresponding position and the values of the remaining positions are 0). The method provided by the embodiment of the application instead labels the probability of belonging to each category, so the classification prediction result obtained by the model is also expressed as the probability of belonging to each category. This simplifies the computation when a loss function (such as cross-entropy loss) or the accuracy is calculated; and the labeled value can also be understood as the similarity to a certain category, so the distance between features is computed more reasonably and the accuracy of the model is improved.
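A minimal sketch of building such a sample nose type label vector for the 6-type example above; the helper name and the dict-based interface are assumptions:

```python
import numpy as np

def make_sample_label_vector(probs_by_type: dict, num_types: int = 6) -> np.ndarray:
    """Build a sample nose type label vector (soft labels).

    probs_by_type: mapping from nose-type index to annotated probability,
                   e.g. {2: 0.8, 3: 0.2} for the example P = [0, 0, 0.8, 0.2, 0, 0].
    """
    label = np.zeros(num_types, dtype=np.float32)
    for idx, p in probs_by_type.items():
        label[idx] = p
    assert abs(label.sum() - 1.0) < 1e-6, "label probabilities should sum to 1"
    return label
```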
Step 302 is similar to the specific method in steps 102 to 103 in the embodiment shown in fig. 1; reference may be made to the foregoing detailed description, which is not repeated here.
Optionally, before training the network model based on the plurality of sample nose type enhanced images, the method further includes:
subtracting the average value of the data of the plurality of sample nose type enhanced images on the target channel from the data of the target sample nose type enhanced image on the target channel to obtain the de-average data of the target sample nose type enhanced image on the target channel, wherein the target sample nose type enhanced image is any one of the plurality of sample nose type enhanced images, and the target channel is any one of the plurality of channels;
calculating the variance of the mean-removed data of the plurality of sample nose type enhanced images on the target channel;
updating data of the target sample nose type enhanced image on a target channel into a ratio of the mean-removed data of the target sample nose type enhanced image on the target channel to the variance so as to update data of the plurality of sample nose type enhanced images on the plurality of channels.
The normalization processing can be carried out on the sample nose type enhanced image before training, so that the convergence of weight parameters can be effectively accelerated, and an optimal solution can be better found.
For convenience of description, any one of the sample nose type enhanced images is denoted as the target sample nose type enhanced image I, and any one of the multiple channels is denoted as the target channel. The target sample nose type enhanced image is a four-channel input image, which can be expressed as I ∈ R^(W×H×4). The target sample nose type enhanced image may first be de-averaged: the data of the target sample nose type enhanced image on the target channel minus the average of the data of the plurality of sample nose type enhanced images on that channel, that is, for each channel c ∈ {R, G, B, S}:

I′_c = I_c - (1/N) Σ_{i=1}^{N} I_{i,c}

where i = 1, 2, 3, ..., N, N is the total number of sample nose type enhanced images, I_R, I_G, I_B, I_S respectively denote the data of the four channels (R, G, B, S) of the target sample nose type enhanced image I, and I_{i,R}, I_{i,G}, I_{i,B}, I_{i,S} respectively denote the values of the four channels of the i-th sample nose type enhanced image.
Through this de-averaging process, the de-averaged data I′ = (I′_R, I′_G, I′_B, I′_S) of the target sample nose type enhanced image can be obtained.
Further, the variance of the de-averaged data on each channel is calculated, for which the following formula may be adopted:

σ_c² = (1/N) Σ_{i=1}^{N} (I′_{i,c})², c ∈ {R, G, B, S}

Further, the ratio of the de-averaged data to the variance on each channel may be obtained:

I_c ← I′_c / σ_c²

That is, the data of the target sample nose type enhanced image on the target channel is updated to the ratio of its de-averaged data on that channel to the variance, giving the final data (I′_R/σ_R², I′_G/σ_G², I′_B/σ_B², I′_S/σ_S²) of each sample nose type enhanced image on the four channels. The above steps are performed for each channel of each sample nose type enhanced image, so the data of the plurality of sample nose type enhanced images on the plurality of channels can be updated.
Through the mean value removal and the variance adjustment, sample data, namely the sample nose type enhanced image, accords with normal distribution, convergence of weight parameters during model training can be effectively accelerated, and an optimal solution can be better found.
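A minimal sketch of this per-channel normalization over the whole sample set; dividing by the variance follows the wording above, though dividing by the standard deviation is the more common variant, so treat the scaling choice as an assumption:

```python
import numpy as np

def normalize_samples(samples: np.ndarray) -> np.ndarray:
    """De-average and variance-scale the sample nose type enhanced images per channel.

    samples: (N, W, H, 4) stack of four-channel sample nose type enhanced images.
    """
    mean = samples.mean(axis=(0, 1, 2), keepdims=True)        # per-channel mean
    centered = samples - mean                                  # de-averaged data
    var = (centered ** 2).mean(axis=(0, 1, 2), keepdims=True)  # per-channel variance
    return centered / (var + 1e-8)                             # avoid division by zero
```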
In the facial nose type recognition model in the embodiment of the present application, the convolutional neural network used for feature extraction may adopt network structures such as VGG, ResNet, and Inception. Considering the characteristics of nose type images and the applicability of the recognition system, a small convolutional neural network can be selected as the basic network architecture, with convolution kernels of different sizes used to obtain different receptive fields and multi-scale feature fusion. An average pooling layer is used instead of a fully connected layer in the convolutional neural network, which effectively reduces the number of model parameters. Any network architecture can be used as a replacement; the feature extraction structure is not limited.
Fig. 3B is a schematic structural diagram of a facial nose type recognition model provided in an embodiment of the present application, and as shown in fig. 3B, the facial nose type recognition model may include a CNN network (feature extraction layer) for feature extraction, a feature fusion layer, and an average pooling layer, where:
A feature map set F = {F1, F2, ..., Fn, F′} is extracted from the sample nose type enhanced image through the feature extraction layer; the number of convolution kernels of the feature extraction layer is n+1, where n is the number of class feature maps, and the output feature maps comprise the n class feature maps and a texture feature map F′ representing the nose type texture features.
In the feature extraction structure of the face nose type recognition model, the number of convolution kernels of the final convolution operation is set to n+1, where n denotes the number of nose types; that is, n convolution kernels are used to obtain the n sample class feature maps (F1, F2, ..., Fn), corresponding to the n nose type features, and the remaining convolution kernel is used to obtain a feature map F′ representing the sample texture features. The four-channel data [R, G, B, V] of the sample nose type enhanced image (an example ordering) comprises the R, G, B three-channel data and the texture feature data V as the fourth channel. Because the ordering of the four channels is not changed during the convolution operations, in practice, when the feature maps are visualized, it can be found that the last feature map reflects the texture feature data V best; therefore, when the network is set up, the last feature map is taken as the feature map of the sample texture features. In other words, the channel feature maps are visualized during feature extraction so that the model settings, including the convolution kernel settings above, can be made, and the required feature maps are then obtained through the convolution operations, that is, through learning. Further, the extracted feature maps are fused in the feature fusion layer, where each class feature map is fused with the texture feature map F′; for example, if the preceding convolution layer comprises 32 convolution kernels, 32 feature maps are extracted, and fusing the texture feature map with the remaining 31 feature maps yields 31 fused feature maps.
The obtained fused feature maps are then down-sampled through the average pooling layer and restored to the size of the original sample nose type enhanced image. For the subsequent classification process, a softmax layer can be used to map the outputs of multiple neurons into the (0, 1) interval, which can be understood as probabilities, in order to perform the nose type classification. The facial nose type recognition model can be trained with a preset loss function; the KL-loss shown in fig. 3B is a KL divergence loss function. Because the labels are probability distributions, a KL divergence loss function can be used, so that the model is optimized during training in the direction that minimizes the KL loss value and the similar features among nose types can be better learned.
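To make the pipeline above concrete, the following is a minimal PyTorch sketch under stated assumptions (the layer sizes, kernel counts other than the final n+1, the learnable fusion weights, and the pooling placement are all assumptions, not the patented architecture); the loss function discussed next would be applied to its output:

```python
import torch
import torch.nn as nn

class NoseTypeNet(nn.Module):
    """Illustrative sketch: a small CNN whose final convolution emits n+1 maps,
    fusion of the n class maps with the texture map, average pooling, softmax."""

    def __init__(self, num_types: int = 6):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(4, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv2d(32, num_types + 1, kernel_size=3, padding=1),  # n + 1 maps
        )
        self.weights = nn.Parameter(torch.ones(num_types))  # alpha, beta, ..., theta
        self.pool = nn.AdaptiveAvgPool2d(1)                 # average pooling layer

    def forward(self, x):
        maps = self.backbone(x)                              # (B, n+1, W, H)
        class_maps, texture = maps[:, :-1], maps[:, -1:]     # split F_i and F'
        fused = class_maps * self.weights.view(1, -1, 1, 1) * texture
        logits = self.pool(fused).flatten(1)                 # (B, n)
        return torch.softmax(logits, dim=1)                  # nose type label vector
```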
Specifically, suppose the predicted probability distribution, i.e., the nose type label vector output by the face nose type recognition model, is P = [P1, P2, P3, P4, P5, P6], and the probability distribution of the sample nose type label vector, i.e., the real label, is T = [T1, T2, T3, T4, T5, T6]. Then the loss function of the facial nose type recognition model can be the KL divergence between the two, averaged over the training samples:

L = (1/Z) Σ_{z=1}^{Z} Σ_{i=1}^{n} T_i log(T_i / P_i)

where Z is the total number of sample nose type enhanced images and n represents the number of categories of nose types.
In the embodiment of the application, the Adam algorithm can be adopted to optimize the model parameters, and training is performed with a preset number of iterations, initial learning rate, learning rate decay parameter, and weight decay parameter until the network model converges and training ends. For example, the number of iterations is set to 500, the initial learning rate to 0.001, and the weight decay to 0.0005, and the learning rate is decayed to 1/10 of its value every 50 iterations. Different algorithms can be selected to optimize the model parameters as needed, and the embodiment of the application is not limited in this respect.
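A minimal sketch of the KL divergence loss above, with the quoted hyper-parameters noted in comments; the clipping constant and the exact averaging convention are assumptions:

```python
import numpy as np

def kl_loss(T: np.ndarray, P: np.ndarray, eps: float = 1e-8) -> float:
    """KL divergence loss between real label distributions T and predictions P.

    T, P: (Z, n) arrays of sample nose type label vectors and predicted
    nose type label vectors. eps guards against log(0).
    """
    T = np.clip(T, eps, 1.0)
    P = np.clip(P, eps, 1.0)
    return float(np.mean(np.sum(T * np.log(T / P), axis=1)))

# Illustrative training settings quoted in the text above:
# iterations = 500, initial learning rate = 0.001, weight decay = 0.0005,
# learning rate multiplied by 0.1 every 50 iterations, optimizer = Adam.
```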
After the training is completed, a trained face nose type recognition model can be obtained, and the trained face nose type recognition model is applied to the face nose type recognition in the embodiment shown in fig. 1.
Aiming at the characteristics of small inter-class differences and large intra-class differences among nose type features, the method and the device combine the advantages of image processing techniques and convolutional neural networks: the nose type texture features are strengthened through image processing; a four-channel nose type image containing the texture image data is constructed to strengthen the model's learning of nose type features; and a nose type label vector is constructed so that the model can better learn the similar features among nose types, thereby reducing the influence of small inter-class differences and improving the recognition accuracy of the model.
Based on the description of the embodiment of the face nose type identification method, the embodiment of the application also discloses a face nose type identification device. Referring to fig. 4, the facial nose type recognition apparatus 400 includes:
an obtaining module 410, configured to obtain a nasal type image to be processed;
a feature extraction module 420, configured to perform texture feature extraction on the to-be-processed nose-type image to obtain a texture image of the to-be-processed nose-type image;
a fusion module 430, configured to fuse the texture image and the to-be-processed nose type image to obtain a nose type enhanced image of the to-be-processed nose type image, where the nose type enhanced image is a multi-channel image;
a processing module 440, configured to input the nose type enhanced image into a nose type recognition model of a face for processing, so as to obtain a nose type tag vector of the to-be-processed nose type image, where the nose type tag vector includes a prediction probability that a nose in the to-be-processed nose type image belongs to different nose types;
the processing module 440 is further configured to determine a nose type corresponding to the to-be-processed nose type image according to the nose type tag vector, and output the nose type corresponding to the nose in the to-be-processed nose type image.
According to an embodiment of the present application, the steps involved in the methods shown in fig. 1 and fig. 3A may be performed by the modules in the facial nose type recognition apparatus 400 shown in fig. 4, and are not described herein again.
The facial nose type recognition device 400 in the embodiment of the application can acquire a to-be-processed nose type image; extracting texture features of the nose type image to be processed to obtain a texture image of the nose type image to be processed; fusing the texture image and the to-be-processed nose type image to obtain a nose type enhanced image of the to-be-processed nose type image, wherein the nose type enhanced image is a multi-channel image; inputting the nose type enhanced image into a face nose type identification model for processing to obtain a nose type label vector of the nose type image to be processed, wherein the nose type label vector comprises the prediction probability that the nose in the nose type image to be processed belongs to different nose types; determining a nose type corresponding to the nose type image to be processed according to the nose type label vector, outputting the nose type corresponding to the nose in the nose type image to be processed, fusing a texture image of the nose type image to be processed and an original nose type image into a multi-channel nose type enhanced image, and inputting the multi-channel nose type enhanced image into the face nose type recognition model for processing, so that the nose type texture feature of the nose type image to be processed is enhanced; the nose type recognition model of the face can process the nose type enhanced image to obtain the nose type corresponding to the nose in the nose type image to be processed, and the accuracy of the nose type recognition can be higher.
Based on the description of the method embodiment and the device embodiment, the embodiment of the application further provides an electronic device. Referring to fig. 5, the electronic device 500 includes at least a processor 501, an input device 502, an output device 503, and a computer storage medium 504. The processor 501, the input device 502, the output device 503, and the computer storage medium 504 within the electronic device may be connected by a bus or other means.
A computer storage medium 504 may be stored in the memory of the electronic device, the computer storage medium 504 being used for storing a computer program comprising program instructions, and the processor 501 being used for executing the program instructions stored by the computer storage medium 504. The processor 501 (or CPU) is a computing core and a control core of the electronic device, and is adapted to implement one or more instructions, and in particular, is adapted to load and execute the one or more instructions so as to implement a corresponding method flow or a corresponding function; in one embodiment, the processor 501 described above in the embodiments of the present application may be used to perform a series of processes, including the methods in the embodiments shown in fig. 1 and fig. 3A, and so on.
An embodiment of the present application further provides a computer storage medium (Memory), which is a Memory device in an electronic device and is used to store programs and data. It is understood that the computer storage medium herein may include both a built-in storage medium in the electronic device and, of course, an extended storage medium supported by the electronic device. Computer storage media provide storage space that stores an operating system for an electronic device. Also stored in this memory space are one or more instructions, which may be one or more computer programs (including program code), suitable for loading and execution by processor 501. The computer storage medium may be a high-speed RAM memory, or may be a non-volatile memory (non-volatile memory), such as at least one disk memory; and optionally at least one computer storage medium located remotely from the processor.
In one embodiment, one or more instructions stored in a computer storage medium may be loaded and executed by processor 501 to perform the corresponding steps in the above embodiments; in particular implementations, one or more instructions in the computer storage medium may be loaded by processor 501 and executed to perform any step of the method in fig. 1 and/or fig. 3A, which is not described herein again.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described apparatuses and modules may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the division of the module is only one logical division, and other divisions may be possible in actual implementation, for example, a plurality of modules or components may be combined or integrated into another system, or some features may be omitted, or not performed. The shown or discussed mutual coupling, direct coupling or communication connection may be an indirect coupling or communication connection of devices or modules through some interfaces, and may be in an electrical, mechanical or other form.
Modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical modules, may be located in one place, or may be distributed on a plurality of network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. The procedures or functions according to the embodiments of the present application are wholly or partially generated when the computer program instructions are loaded and executed on a computer. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable device. The computer instructions may be stored on or transmitted over a computer-readable storage medium. The computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wire (e.g., coaxial cable, fiber optic, Digital Subscriber Line (DSL)), or wirelessly (e.g., infrared, wireless, microwave, etc.). The computer-readable storage medium can be any available medium that can be accessed by a computer or a data storage device, such as a server, a data center, etc., that includes one or more of the available media. The usable medium may be a read-only memory (ROM), or a Random Access Memory (RAM), or a magnetic medium, such as a floppy disk, a hard disk, a magnetic tape, a magnetic disk, or an optical medium, such as a Digital Versatile Disk (DVD), or a semiconductor medium, such as a Solid State Disk (SSD).
Claims (10)
1. A facial nose type recognition method, comprising:
acquiring a nose type image to be processed;
extracting texture features of the nose type image to be processed to obtain a texture image of the nose type image to be processed;
fusing the texture image and the to-be-processed nose type image to obtain a nose type enhanced image of the to-be-processed nose type image, wherein the nose type enhanced image is a multi-channel image;
inputting the nose type enhanced image into a face nose type identification model for processing to obtain a nose type label vector of the nose type image to be processed, wherein the nose type label vector comprises the prediction probability that the nose in the nose type image to be processed belongs to different nose types;
and determining the nose type corresponding to the nose type image to be processed according to the nose type label vector, and outputting the nose type corresponding to the nose in the nose type image to be processed.
2. The method for identifying the nose type of the face according to claim 1, wherein the inputting the nose type enhanced image into a nose type identification model of the face for processing to obtain a nose type label vector of the nose type image to be processed comprises:
extracting features of the nose type enhanced image based on the face nose type recognition model to obtain a feature map set of the nose type enhanced image, wherein the feature map set comprises a plurality of category feature maps corresponding to different nose types and a texture feature map, and the texture feature map is generated based on the texture image;
respectively fusing the plurality of category feature maps with the texture feature map to obtain a plurality of target feature maps of the nose type enhanced image;
and generating a nose type label vector of the nose type image to be processed according to the plurality of target feature maps.
3. The facial nose type recognition method according to claim 2, wherein the respectively fusing the plurality of category feature maps with the texture feature map to obtain a plurality of target feature maps of the nose type enhanced image comprises:
performing matrix multiplication and weighting on the plurality of category feature maps and the texture feature map respectively to obtain the plurality of target feature maps of the nose type enhanced image.
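A sketch of one reading of the fusion in claim 3: each category feature map is matrix-multiplied with the texture feature map and the product is blended back with a fixed weight. The blend form and the weight value are assumptions, and square feature maps are assumed so the matrix product keeps the map shape.

```python
import numpy as np

def fuse_with_texture(category_maps: np.ndarray, texture_map: np.ndarray,
                      weight: float = 0.5) -> np.ndarray:
    """category_maps: (C, H, W); texture_map: (H, W) -> (C, H, W) target feature maps."""
    fused = []
    for cmap in category_maps:
        product = cmap @ texture_map  # matrix multiplication; assumes H == W so shape is preserved
        fused.append(weight * product + (1.0 - weight) * cmap)  # weighted combination
    return np.stack(fused)
```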
4. The facial nose type recognition method according to any one of claims 1 to 3, wherein the acquiring of the nose type image to be processed includes:
acquiring a face image to be processed;
performing face key point detection on the face image to be processed, and determining face key point coordinates in the face image to be processed;
and intercepting a nose type area of the face image to be processed as the nose type image to be processed based on the face key point coordinates in the face image to be processed.
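A sketch of the key point detection step in claim 4, using dlib's 68-point landmark model as one possible detector; the patent does not name a detector or a key point scheme, and the model file path is assumed.

```python
import dlib  # one possible landmark library, assumed installed
import numpy as np

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")  # assumed local model file

def face_keypoints(gray: np.ndarray) -> np.ndarray:
    """Return (N, 2) face key point coordinates for the first detected face."""
    faces = detector(gray, 1)         # upsample once so smaller faces are found
    shape = predictor(gray, faces[0])
    return np.array([(p.x, p.y) for p in shape.parts()])
```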
5. The facial nose type recognition method according to claim 4, wherein the intercepting a nose type area of the face image to be processed as the nose type image to be processed based on the face key point coordinates in the face image to be processed comprises:
acquiring a nose center key point coordinate corresponding to a preset key point identifier from the face key point coordinate according to the preset key point identifier;
and determining and intercepting a nose type area of the face image to be processed as the nose type image to be processed according to a preset width threshold, a preset height threshold and the coordinates of the key point of the nose center.
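A sketch of the crop described in claim 5. The nose-center key point identifier and the preset width/height thresholds below are assumptions (index 30 is the nose tip in the common 68-point scheme).

```python
import numpy as np

NOSE_CENTER_ID = 30       # assumed preset key point identifier for the nose center
CROP_W, CROP_H = 96, 96   # assumed preset width and height thresholds, in pixels

def crop_nose_region(face_img: np.ndarray, keypoints: np.ndarray) -> np.ndarray:
    """Cut a fixed-size nose type region centered on the nose key point."""
    cx, cy = keypoints[NOSE_CENTER_ID]
    x0 = int(max(cx - CROP_W // 2, 0))
    y0 = int(max(cy - CROP_H // 2, 0))
    x1 = int(min(cx + CROP_W // 2, face_img.shape[1]))
    y1 = int(min(cy + CROP_H // 2, face_img.shape[0]))
    return face_img[y0:y1, x0:x1]
```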
6. The facial nose type recognition method according to claim 1, wherein the facial nose type recognition model is trained, and the training method of the facial nose type recognition model comprises:
obtaining a plurality of sample nose type images, wherein each sample nose type image is marked with a sample nose type label vector which is used for indicating the probabilities that the nose in the sample nose type image belongs to different nose types;
extracting a texture image of each sample nose type image; fusing each sample nose type image with the texture image corresponding to each sample nose type image to generate a sample nose type enhanced image corresponding to each sample nose type image so as to obtain a plurality of sample nose type enhanced images, wherein the sample nose type enhanced images are multi-channel images;
training a network model based on the plurality of sample nose type enhanced images to obtain the face nose type recognition model.
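A compact PyTorch-style training sketch for claim 6. The optimizer, learning rate, and the soft-label cross-entropy loss are assumptions, and `loader` is a hypothetical iterator yielding batches of (sample nose type enhanced image, sample nose type label vector).

```python
import torch
import torch.nn as nn

def train_nose_type_model(model: nn.Module, loader, epochs: int = 10, lr: float = 1e-3) -> nn.Module:
    """Train a classifier on multi-channel sample nose type enhanced images."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for enhanced, label_vec in loader:   # (B, C, H, W) images, (B, K) soft label vectors
            logits = model(enhanced)         # (B, K) raw scores over K nose types
            log_probs = torch.log_softmax(logits, dim=1)
            loss = -(label_vec * log_probs).sum(dim=1).mean()  # cross-entropy with soft labels
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model
```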
7. The facial nose type recognition method according to claim 6, wherein before the training a network model based on the plurality of sample nose type enhanced images, the method further comprises:
subtracting the average value of the data of the plurality of sample nose type enhanced images on a target channel from the data of a target sample nose type enhanced image on the target channel to obtain de-averaged data of the target sample nose type enhanced image on the target channel, wherein the target sample nose type enhanced image is any one of the plurality of sample nose type enhanced images, and the target channel is any one of the plurality of channels of the sample nose type enhanced images;
calculating a variance of the de-averaged data of the plurality of sample nose type enhanced images on the target channel;
updating the data of the target sample nose type enhanced image on the target channel to the ratio of the de-averaged data of the target sample nose type enhanced image on the target channel to the variance, so as to update the data of the plurality of sample nose type enhanced images on the plurality of channels.
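A numpy sketch of the per-channel pre-processing in claim 7. The claim divides the de-averaged data by the variance rather than the more usual standard deviation, and that wording is mirrored here; the channels-last layout is an assumption.

```python
import numpy as np

def standardize_per_channel(samples: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """samples: (N, H, W, C) sample nose type enhanced images, channels last."""
    mean = samples.mean(axis=(0, 1, 2), keepdims=True)  # per-channel mean over all samples
    centered = samples - mean                           # de-averaged data
    var = centered.var(axis=(0, 1, 2), keepdims=True)   # per-channel variance of the de-averaged data
    return centered / (var + eps)                       # ratio of de-averaged data to variance
```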
8. A facial nose type recognition device, comprising:
the acquisition module is used for acquiring a nose type image to be processed;
the characteristic extraction module is used for extracting the texture characteristics of the nose type image to be processed so as to obtain the texture image of the nose type image to be processed;
the fusion module is used for fusing the texture image and the to-be-processed nose type image to obtain a nose type enhanced image of the to-be-processed nose type image, wherein the nose type enhanced image is a multi-channel image;
the processing module is used for inputting the nose type enhanced image into a face nose type recognition model for processing to obtain a nose type label vector of the nose type image to be processed, wherein the nose type label vector comprises the prediction probabilities that the nose in the nose type image to be processed belongs to different nose types;
the processing module is further used for determining the nose type corresponding to the nose type image to be processed according to the nose type label vector and outputting the nose type corresponding to the nose in the nose type image to be processed.
9. An electronic device, comprising a memory and a processor, the memory storing a computer program that, when executed by the processor, causes the processor to perform the steps of the facial nose type recognition method of any one of claims 1 to 7.
10. A computer-readable storage medium, characterized in that a computer program is stored thereon which, when executed by a processor, causes the processor to carry out the steps of the facial nose type recognition method according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011567913.3A CN112613415A (en) | 2020-12-25 | 2020-12-25 | Face nose type recognition method and device, electronic equipment and medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112613415A (en) | 2021-04-06 |
Family
ID=75248303
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011567913.3A (Pending) | Face nose type recognition method and device, electronic equipment and medium | 2020-12-25 | 2020-12-25 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112613415A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113139542A (en) * | 2021-04-28 | 2021-07-20 | 北京百度网讯科技有限公司 | Target detection method, device, equipment and computer readable storage medium |
CN113139542B (en) * | 2021-04-28 | 2023-08-11 | 北京百度网讯科技有限公司 | Object detection method, device, equipment and computer readable storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||