CN117333928B - Face feature point detection method and device, electronic equipment and storage medium

Face feature point detection method and device, electronic equipment and storage medium

Info

Publication number
CN117333928B
Authority
CN
China
Prior art keywords
convolution, current, image, result, feature point
Legal status
Active
Application number
CN202311630749.XA
Other languages
Chinese (zh)
Other versions
CN117333928A
Inventor
王念欧
郦轲
刘文华
万进
Current Assignee
Shenzhen Accompany Technology Co Ltd
Original Assignee
Shenzhen Accompany Technology Co Ltd
Application filed by Shenzhen Accompany Technology Co Ltd
Priority to CN202311630749.XA
Publication of CN117333928A
Application granted
Publication of CN117333928B


Classifications

    • G06V40/169: Holistic features and representations, i.e. based on the facial image taken as a whole
    • G06N3/0464: Convolutional networks [CNN, ConvNet]
    • G06N3/08: Learning methods
    • G06V10/80: Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/82: Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Abstract

The invention discloses a face feature point detection method and device, an electronic device, and a storage medium. The method comprises the following steps: acquiring an original face image; performing multi-scale transformation on the original face image to obtain at least one candidate image, where the multi-scale transformation includes at least one of size transformation, brightness transformation, angle transformation and noise transformation; taking the original face image and each candidate image in turn as an image to be identified, and inputting the image to be identified into a target feature point detection network for global feature extraction and multi-scale feature extraction to obtain a feature point coordinate set, where each feature point coordinate set comprises at least one feature point coordinate and the feature point identifier corresponding to that coordinate; and, for each feature point identifier, calculating the face feature point coordinates based on the feature point coordinates corresponding to that identifier. This solves the problem of inaccurate face feature point detection and improves the accuracy of face feature point recognition.

Description

Face feature point detection method and device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of image processing technologies, and in particular, to a method and apparatus for detecting facial feature points, an electronic device, and a storage medium.
Background
With the development of technology and the improvement of living standards, more and more people care for their skin with beauty instruments. When a beauty instrument assists a user in caring for facial skin, it needs to detect the user's skin state and the user's facial feature points, divide the user's face into regions according to the facial feature points, and then examine and care for each region in a targeted way.
In the prior art, face feature points are detected directly on the captured image. However, during capture, internal factors of the device (for example, its resolution) and external factors of the environment (for example, lighting and shooting angle) all affect the captured image and therefore the recognition accuracy. As a result, when the user performs facial feature recognition before each skin care session, the recognition accuracy is low, the regions obtained by dividing the face according to each recognition result differ from one another, and the skin cannot be cared for in a targeted way. In addition, existing feature point coordinate detection models have low detection precision and cannot accurately detect face feature points. How to improve the accuracy of face feature point recognition and care for the skin effectively has therefore become a problem to be solved.
Disclosure of Invention
The invention provides a face feature point detection method, a face feature point detection device, electronic equipment and a storage medium, and aims to solve the problem of low face feature point detection accuracy.
According to an aspect of the present invention, there is provided a face feature point detection method, including:
acquiring an original face image;
performing multi-scale transformation on the original face image to obtain at least one candidate image, wherein the multi-scale transformation comprises at least one of the following: size transformation, brightness transformation, angle transformation, and noise transformation;
sequentially taking the original face image and each candidate image as images to be identified;
the image to be identified is used as input and is input into a convolution layer, a batch standardization layer, an activation function and a maximum pooling layer which are sequentially connected in a pre-training target feature point detection network to carry out global feature extraction, so that a first extraction result is obtained, wherein the convolution layer is an 11×11 large convolution layer;
inputting the first extraction result as input into three multi-scale modules connected in sequence in the target feature point detection network for multi-scale feature extraction to obtain a second extraction result;
inputting the second extraction result as input into an average pooling layer, a full-connection layer and an output layer which are sequentially connected in the target feature point detection network to obtain a feature point coordinate set; each feature point coordinate set comprises at least one feature point coordinate and a feature point identifier corresponding to the feature point coordinate;
and for each feature point identifier, calculating based on each feature point coordinate corresponding to that feature point identifier to obtain the face feature point coordinates.
According to another aspect of the present invention, there is provided a face feature point detection apparatus including:
the image acquisition module is used for acquiring an original face image;
the image transformation module is used for carrying out multi-scale transformation on the original face image to obtain at least one candidate image, wherein the multi-scale transformation comprises at least one of the following: size transformation, brightness transformation, angle transformation, and noise transformation;
the image to be identified determining module is used for sequentially taking the original face image and each candidate image as images to be identified;
the global feature extraction module is used for taking the image to be identified as input, inputting the image to be identified into a convolution layer, a batch standardization layer, an activation function and a maximum pooling layer which are sequentially connected in a pre-training target feature point detection network, and carrying out global feature extraction to obtain a first extraction result, wherein the convolution layer is an 11×11 large convolution layer;
the multi-scale feature extraction module is used for inputting the first extraction result as input into three multi-scale modules connected in sequence in the target feature point detection network to perform multi-scale feature extraction to obtain a second extraction result;
The coordinate set determining module is used for inputting the second extraction result as input into an average pooling layer, a full-connection layer and an output layer which are sequentially connected in the target feature point detection network to obtain a feature point coordinate set; each feature point coordinate set comprises at least one feature point coordinate and a feature point identifier corresponding to the feature point coordinate;
and the feature point coordinate determining module is used for calculating, for each feature point identifier, the face feature point coordinates based on each feature point coordinate corresponding to that feature point identifier.
According to another aspect of the present invention, there is provided an electronic apparatus including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores a computer program executable by the at least one processor, the computer program being executable by the at least one processor to enable the at least one processor to perform the face feature point detection method according to any one of the embodiments of the present invention.
According to another aspect of the present invention, there is provided a computer readable storage medium storing computer instructions for causing a processor to implement the face feature point detection method according to any one of the embodiments of the present invention when executed.
According to the technical scheme of the embodiment of the invention, the problem of inaccurate face feature point detection is addressed. After the original face image is obtained, it is transformed by at least one of size transformation, brightness transformation, angle transformation and noise transformation, realizing a multi-scale transformation of the original face image and yielding candidate images. The original face image and the candidate images are used as images to be identified for feature point detection, and the face feature point coordinates are determined from them, which avoids the low accuracy that results when only the original face image is used for detection. The candidate images describe the face feature points from different perspectives and simulate the images corresponding to a user's different situations, so that the relative error between different measurements of the same user is taken into account in each feature point recognition. This reduces the error between separate feature point recognitions, improves the relative accuracy of face feature point recognition, and keeps repeated recognitions of the same user highly consistent, thereby guaranteeing the accuracy and consistency of region division and improving the skin care experience. In addition, in the target feature point detection network provided by the embodiments of the application, feature extraction is performed with an 11×11 large convolution layer, which makes the result more accurate; at the same time, the face feature point coordinates are detected from multiple scales, which improves the multi-scale expression capability of the network at a finer granularity level and further improves the accuracy of the detection result.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the invention or to delineate the scope of the invention. Other features of the present invention will become apparent from the description that follows.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flowchart of a face feature point detection method according to a first embodiment of the present invention;
fig. 2 is a schematic structural diagram of a target feature point detection network according to a first embodiment of the present invention;
fig. 3 is a flowchart of a face feature point detection method according to a second embodiment of the present invention;
fig. 4 is a schematic structural diagram of a multi-scale module according to a second embodiment of the present invention;
fig. 5 is a schematic structural diagram of a face feature point detection device according to a third embodiment of the present invention;
Fig. 6 is a schematic structural diagram of an electronic device implementing a face feature point detection method according to an embodiment of the present invention.
Detailed Description
In order that those skilled in the art will better understand the present invention, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. It is apparent that the described embodiments are only some, not all, embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without inventive effort shall fall within the scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present invention and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Example 1
Fig. 1 is a flowchart of a face feature point detection method according to a first embodiment of the present invention, where the method may be performed by a face feature point detection device, the face feature point detection device may be implemented in hardware and/or software, and the face feature point detection device may be configured in an electronic device.
It should be noted that an application environment for the face feature point detection method of this embodiment may be described as follows: a user wears a beauty instrument to detect the skin state; the beauty instrument collects an original face image through its camera, recognizes the face feature points, and connects the feature points to divide the face into regions, for example the forehead as one region, the left cheek as one region and the right cheek as one region; a skin care mode is then selected for each region in a targeted way, for example an acne-removal product is chosen for the forehead if it has more acne. In the prior art, however, the original face image is recognized directly and then divided into regions, and internal or external factors of the device make the recognition result inaccurate, so the divided regions differ each time. For the same piece of skin, being assigned to a different region each time easily leads to skin care with a different, unsuitable skin care mode.
As shown in fig. 1, the method includes:
s101, acquiring an original face image.
In this embodiment, the original face image may be specifically understood as an original image for face recognition, and the original face image may be acquired by an image acquisition device, which may be a camera, a video recorder, or the like. The image acquisition device can be arranged at a fixed position for image acquisition, and also can be arranged on movable equipment for image acquisition at any time and any place, and the actual setting position can be set according to actual requirements.
The original face image may be collected by the image acquisition device periodically or after a trigger condition is met, and the specific acquisition time may be set according to the specific application scene. A captured image may be sent to the executing device immediately after each acquisition, or only after the amount of data reaches a certain level or another condition is met. The image acquisition device may also store the acquired images, and when face feature point recognition is required, the executing device retrieves an image from a storage space such as a memory or the cloud as the original face image.
S102, performing multi-scale transformation on the original face image to obtain at least one candidate image, wherein the multi-scale transformation comprises at least one of the following: size transformation, brightness transformation, angle transformation, and noise transformation.
In this embodiment, the candidate image may be specifically understood as an image obtained after the scale transformation processing, which is used for detecting the face feature point. The size conversion is a processing way of changing the size of the image, and the brightness conversion is a processing way of changing the brightness of the image; the angle conversion is a processing mode for changing the angle of the image; noise transformation is a process of adding noise to an image or removing noise.
The original face image is transformed on multiple scales to obtain at least one candidate image. A candidate image may be obtained by applying a single multi-scale transformation to the original face image, or by applying two or more transformations together, for example first a size transformation and then an angle transformation, or a size transformation, a brightness transformation and a noise transformation in sequence. The number and types of multi-scale transformations may be preset, or may be selected according to information related to the image, such as the application scene type and the precision requirement of the original face image; after the original face image is obtained, it is transformed according to the preset transformation types. For example, suppose the number of multi-scale transformations for high-precision images is preset to 4, the types being size transformation, brightness transformation, angle transformation and noise transformation. After the original face image is acquired, its precision requirement is determined to be high precision from the related information, and it is decided, according to the association between precision and the type and number of multi-scale transformations, to apply size, brightness, angle and noise transformations to the acquired original face image; whether the transformations are applied singly or in combination (including the order of processing in a combination) is likewise set when the multi-scale transformation types are configured, and the embodiments of the application do not limit this. Performing multi-scale transformation on the original face image compensates for the error present in the original face image.
S103, taking the original face image and each candidate image as images to be identified in sequence.
In this embodiment, the image to be recognized can be understood as an image having a feature point recognition requirement. And respectively taking the original face image and each alternative image as images to be recognized, namely, taking the original face image and each alternative image as the images to be recognized to detect characteristic points.
S104, taking an image to be identified as input, inputting the image to be identified into a convolution layer, a batch standardization layer, an activation function and a maximum pooling layer which are sequentially connected in a pre-training target feature point detection network, and performing global feature extraction to obtain a first extraction result, wherein the convolution layer is an 11×11 large convolution layer.
In this embodiment, the target feature point detection network may be specifically understood as a neural network model that can detect face feature points in an image. The first extraction result may be specifically understood as the feature data output after processing by the convolution layer, batch normalization layer, activation function and maximum pooling layer. The convolution layer is used to extract the main feature information in the image; this embodiment selects an 11×11 large convolution layer for feature extraction, which makes the result more accurate. The batch normalization layer (BatchNorm) normalizes the data; the activation function and the maximum pooling layer perform nonlinear processing on the extracted feature information, and the activation function may be a ReLU function.
Training a neural network model in advance according to a training sample to obtain a target feature point detection network, wherein the target feature point detection network can identify feature points of an input image according to model parameters after training is completed and output identified feature point coordinates. The target feature point detection network in the embodiment of the application can perform global feature extraction and multi-scale feature extraction on the image input into the network. The loss function of the target feature point detection network may be WingLoss. The target feature point detection network has the structure of a convolution layer, a batch standardization layer, an activation function and maximum pooling layer, three multi-scale modules, an average pooling layer, a full connection layer and an output layer which are connected in sequence.
The convolution layer, batch normalization layer, activation function and maximum pooling layer are connected in sequence: the image to be identified is input into the convolution layer for global feature extraction, the result is input into the batch normalization layer for normalization, and the normalized data are then processed by the activation function and the maximum pooling layer to obtain the first extraction result. That is, the image to be identified is processed in turn by the convolution layer, the batch normalization layer, the activation function and the maximum pooling layer to obtain the first extraction result.
S105, taking the first extraction result as input, inputting the input into three multi-scale modules connected in sequence in the target feature point detection network for multi-scale feature extraction, and obtaining a second extraction result.
In this embodiment, the second extraction result may be specifically understood as the data obtained after feature extraction from multiple scales through the three multi-scale modules. The multi-scale module (Multi Scale Module) can extract features of the data from multiple scales.
The target feature point detection network comprises three multi-scale modules GL (Global) which are connected in sequence. The first extraction result is input into the first multi-scale module and processed by the three multi-scale modules in turn, each module extracting features from different scales, to obtain the second extraction result.
S106, taking the second extraction result as input, and inputting the second extraction result into an average pooling layer, a full-connection layer and an output layer which are sequentially connected in a target feature point detection network to obtain a feature point coordinate set; each feature point coordinate set comprises at least one feature point coordinate and a feature point identifier corresponding to the feature point coordinate.
In this embodiment, the feature point coordinate set may be specifically understood as a set formed by feature point coordinates, and each image corresponds to one feature point coordinate set; since the number of the feature points of the face may be multiple, each feature point coordinate set includes at least one feature point coordinate, in order to facilitate distinguishing the types of feature points represented by different feature point coordinates when the number of the feature point coordinates is multiple, each feature point coordinate may be identified by a feature point identifier, where the feature point identifier may be a character such as 1,2,3 or …, or may be a feature point type, and the feature point type may be a left eyebrow peak, a right eyebrow peak, a left pupil, a right pupil, or the like, so long as different feature points may be distinguished.
The target feature point detection network further comprises an average pooling layer, a fully connected layer and an output layer. The average pooling layer is connected to the last multi-scale module, and the second extraction result output by the last multi-scale module is input into the average pooling layer for pooling; average pooling ensures translation invariance, reduces the number of parameters, and lowers the risk of overfitting. The data output after processing by the average pooling layer are input into the fully connected layer for processing, the processing result is finally input into the output layer, and the feature point coordinate set is output through the output layer.
Exemplary, fig. 2 provides a schematic structural diagram of a target feature point detection network, where the target feature point detection network includes: the system comprises a convolution layer, a batch standardization layer, an activation function and maximum pooling layer, three multi-scale modules, an average pooling layer, a full connection layer and an output layer; the multi-scale modules are respectively multi-scale block-1, multi-scale block-2 and multi-scale block-3. And inputting the image to be identified into a target characteristic point detection network, and sequentially processing each layer in the image to obtain a characteristic point coordinate set. The convolution layer, the batch standardization layer, the activation function and maximum pooling layer, the three multi-scale modules and the average pooling layer are a brand new GLNET (Global) module, and Global information and local information can be effectively identified. The number of feature point coordinates that the output layer can output may be preset, and each circle in the drawing represents one feature point coordinate.
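To make this layout concrete, the following is a minimal PyTorch sketch of a network with the structure of fig. 2. It is only an illustration, not the patented implementation: the stride, padding, channel width (80), hidden size (256) and default number of feature points (68) are assumptions, and the multi-scale module is passed in as a constructor argument (a sketch of it is given with Example 2 below).

    import torch
    import torch.nn as nn

    class FeaturePointNet(nn.Module):
        """Sketch of fig. 2: stem -> three multi-scale modules -> average pooling
        -> fully connected layer -> output layer. Numeric choices are assumptions."""
        def __init__(self, make_block, num_points=68, width=80):
            super().__init__()
            self.num_points = num_points
            # Global feature extraction: 11x11 "large" convolution, BatchNorm, ReLU, max pooling
            self.stem = nn.Sequential(
                nn.Conv2d(3, width, kernel_size=11, stride=2, padding=5),
                nn.BatchNorm2d(width),
                nn.ReLU(inplace=True),
                nn.MaxPool2d(kernel_size=3, stride=2, padding=1),
            )
            # Three multi-scale modules connected in sequence
            self.blocks = nn.Sequential(make_block(width), make_block(width), make_block(width))
            self.pool = nn.AdaptiveAvgPool2d(1)         # average pooling layer
            self.fc = nn.Linear(width, 256)             # fully connected layer
            self.out = nn.Linear(256, num_points * 2)   # output layer: (x, y) per feature point

        def forward(self, x):
            first = self.stem(x)                        # first extraction result
            second = self.blocks(first)                 # second extraction result
            flat = torch.flatten(self.pool(second), 1)
            coords = self.out(torch.relu(self.fc(flat)))
            return coords.view(-1, self.num_points, 2)  # one (x, y) per feature point identifier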
It should be noted that the original face image and each candidate image are used as the images to be identified, and feature point detection is performed through the steps of S104-S106, so as to determine the corresponding feature point coordinate set.
And S107, for each feature point identifier, calculating based on the feature point coordinates corresponding to the feature point identifier to obtain the face feature point coordinates.
In this embodiment, the coordinates of the face feature points may be two-dimensional coordinates or three-dimensional coordinates, etc., and the number of the coordinates of the face feature points may be one or more, typically a plurality of, so that the face feature points may be better represented; the face feature point coordinates may be coordinates of the five sense organs, eyebrows, etc., for example, pupil coordinates of both eyes, nose tip coordinates, nose wing coordinates, and eyebrow peak coordinates.
For each feature point identifier, all feature point coordinates corresponding to that identifier are determined from the feature point coordinates in all the feature point coordinate sets; a comprehensive operation is then performed on all the feature point coordinates corresponding to the feature point identifier, for example averaging, taking the maximum, taking the minimum or weighting, to determine the corresponding face feature point coordinates.
Preferably, the weighting operation may be performed according to all the feature point coordinates corresponding to each feature point identifier to obtain corresponding face feature point coordinates, and the weighting coefficient may be preset, for example, the weighting coefficient may be set according to the errors of the candidate images obtained by different scale transformation, where the smaller the error, the higher the weighting coefficient. And carrying out weighted summation on the feature point coordinates of the same mark to obtain corresponding face feature point coordinates, for example, carrying out weighted operation on all feature point coordinates of which the feature point marks are left pupils.
Illustratively, face feature point coordinates=0.6×feature point coordinates of an original face image+0.1×enlarged feature point coordinates+0.1×reduced feature point coordinates+0.05×brightness change feature point coordinates+0.05×random noise feature point coordinates+0.1×angle conversion feature point coordinates.
The amplified feature point coordinates are feature point coordinates obtained by identifying the candidate images obtained after the size is amplified; the reduced feature point coordinates are feature point coordinates obtained by identifying the candidate image obtained after the size is reduced; the brightness change characteristic point coordinates are characteristic point coordinates obtained by identifying the alternative image obtained after brightness conversion; the random noise characteristic point coordinates are characteristic point coordinates obtained by identifying the alternative image obtained after noise transformation; the angle transformation feature point coordinates are feature point coordinates obtained by identifying the alternative image obtained after the angle transformation.
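As a concrete illustration of this weighted fusion, a small NumPy sketch follows. The weights are taken from the example formula above; the image keys, the data layout (one (x, y) row per feature point identifier) and the assumption that all coordinates are already expressed in the original image's frame are illustrative, not requirements from the text.

    import numpy as np

    # Weights follow the example formula above; the keys naming each candidate image are assumptions.
    WEIGHTS = {"original": 0.60, "enlarged": 0.10, "reduced": 0.10,
               "brightness": 0.05, "noise": 0.05, "rotated": 0.10}

    def fuse_feature_points(coord_sets):
        """coord_sets maps an image key to an array of shape (num_points, 2), where row i
        holds the coordinates detected for feature point identifier i."""
        return sum(WEIGHTS[key] * np.asarray(coords, dtype=float)
                   for key, coords in coord_sets.items())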
According to the face feature point detection method provided by this embodiment, the influence of the internal factors of the device and the external factors of the environment is taken into account: the original face image data are transformed on multiple scales to simulate the images corresponding to a user's different situations, so that the relative error between different measurements of the same user is considered each time the feature points are recognized. The feature points are then identified, the multiple recognition results remain relatively stable, and the accuracy of the recognition results is further improved.
The face feature point detection method provided by the embodiment of the invention addresses the problem of inaccurate face feature point detection. After the original face image is obtained, it is transformed by at least one of size transformation, brightness transformation, angle transformation and noise transformation, realizing a multi-scale transformation of the original face image and yielding candidate images. The original face image and the candidate images are used as images to be identified for feature point detection, which avoids the low accuracy that results when only the original face image is used for detection. Each candidate image describes the face feature points from a different perspective, and the different candidate images simulate the images corresponding to a user's different situations, so that the relative error between different measurements of the same user is taken into account in each feature point recognition. This reduces the error between separate feature point recognitions, improves the relative accuracy of face feature point recognition, and keeps repeated recognitions of the same user highly consistent, thereby guaranteeing the accuracy and consistency of region division and improving the skin care experience. In addition, in the target feature point detection network provided by the embodiments of the application, feature extraction is performed with an 11×11 large convolution layer, which makes the result more accurate; at the same time, the face feature point coordinates are detected from multiple scales, which improves the multi-scale expression capability of the network at a finer granularity level and further improves the accuracy of the detection result.
Example 2
Fig. 3 is a flowchart of a face feature point detection method according to a second embodiment of the present invention, where the face feature point detection method is refined based on the foregoing embodiment. As shown in fig. 3, the method includes:
s201, acquiring an original face image.
S202, performing multi-scale transformation on the original face image to obtain at least one candidate image, wherein the multi-scale transformation comprises at least one of the following: size transformation, brightness transformation, angle transformation, and noise transformation.
As an optional embodiment of the present embodiment, the present optional embodiment further performs multi-scale transformation on the original face image to obtain at least one candidate image, which is optimized as A1 and/or A2:
a1, performing size amplification treatment on an original face image according to size amplification factors, and taking the amplified image as an alternative image;
in this embodiment, the size magnification is a ratio of the enlarged image to the original face image, for example, 1.2 times, that is, the size of the enlarged image is 1.2 times the size of the original face image. The size magnification can be preset as a fixed value, and can also be determined according to the resolution, the size, the application scene type and other related information of the original face image. After the original face image is determined, the corresponding size amplification factor is correspondingly determined, the size amplification factor is multiplied with the size of the original face image to obtain an amplified size, the pixel value of each pixel point in the amplified image can be determined in a difference mode and the like, and the amplified image obtained by amplifying the original face image is the alternative image.
A2, performing size reduction processing on the original face image according to the size reduction multiple, and taking the reduced image as an alternative image.
In this embodiment, the size reduction factor is a ratio of the reduced image to the original face image, for example, 0.8 times, that is, the size of the reduced image is 0.8 times the size of the original face image. The size reduction multiple can be set as a fixed value in advance, and can also be determined according to the resolution, the size, the application scene type and other related information of the original face image. After the original face image is determined, the corresponding size reduction multiple is correspondingly determined, the size reduction multiple is multiplied with the size of the original face image to obtain a reduced size, the pixel value of each pixel point in the reduced image can be determined by calculating the mean value, the maximum value, the minimum value and the like, and the reduced image obtained by carrying out reduction processing on the original face image is the candidate image.
Optionally, the size magnification and/or size reduction is determined according to the resolution of the original face image.
In the embodiment of the present application, when the size of the original face image is converted, only the enlargement conversion or the reduction conversion may be performed, or the enlargement conversion and the reduction conversion may be performed simultaneously.
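For illustration, a minimal OpenCV-based sketch of the size transformation follows; the 1.2x and 0.8x factors are the example values used above, and the choice of OpenCV and of the interpolation modes is an assumption.

    import cv2  # OpenCV is an assumed dependency

    def size_candidates(image, enlarge=1.2, shrink=0.8):
        """Return the enlarged and reduced candidate images."""
        enlarged = cv2.resize(image, None, fx=enlarge, fy=enlarge, interpolation=cv2.INTER_LINEAR)
        reduced = cv2.resize(image, None, fx=shrink, fy=shrink, interpolation=cv2.INTER_AREA)
        return enlarged, reduced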
As an optional embodiment of the present embodiment, the present optional embodiment further performs multi-scale transformation on an original face image to obtain at least one candidate image, and optimizes to:
b1, determining brightness adjusting parameters according to the acquisition time and the geographic position of the original face image.
In this embodiment, the brightness adjustment parameter may be specifically understood as a parameter value for adjusting brightness, for example increasing the brightness by N, decreasing the brightness by N, or setting the brightness to N. The geographic location may be a broad region such as the south, the north or east China, or a specific province, city, district or street; it may be set by the user, or obtained automatically by positioning after the user's authorization.
When the original face image is acquired, the image acquisition device can automatically record its acquisition time, and at the same time the address set by the user is taken as the geographic location corresponding to the original face image; alternatively, a positioning module is provided on the image acquisition device, positioning is performed through this module after the user has authorized it, and the positioning result is taken as the geographic location of the original face image. After the original face image is acquired, the acquisition time and geographic location are associated with it as image information; when the original face image is later obtained, the image information is obtained with it and parsed to determine the corresponding acquisition time and geographic location. Because the illumination conditions of different areas at different times differ, their influence on the image also differs, so the adjustment parameters corresponding to different times and geographic locations are preset to form a parameter table. After the acquisition time and geographic location are determined, the parameter table is queried with them to find the adjustment parameter for the matching time and location, and this adjustment parameter is taken as the brightness adjustment parameter corresponding to the original face image.
And B2, carrying out brightness conversion on the original face image according to the brightness adjustment parameters, and taking the image after brightness conversion as an alternative image.
And adjusting the brightness in the original face image according to the brightness adjustment parameters, for example, increasing the brightness of each pixel by n, decreasing the brightness by n, and the like, so as to obtain a brightness-converted image, and taking the brightness-converted image as an alternative image. The recognition error due to brightness is compensated for by brightness conversion.
The embodiment of the application can also perform color conversion on the original face image, convert the original face image from the original color to another color, and take the image after the color conversion as an alternative image. The color after transformation can be preset, can be selected according to the geographic position, and the like.
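A minimal sketch of the brightness transformation follows; the 8-bit clipping and the example lookup of the adjustment parameter from a (time, region) table are assumptions used for illustration.

    import numpy as np

    # Illustrative parameter table: (time of day, region) -> brightness adjustment; values are assumptions.
    BRIGHTNESS_TABLE = {("morning", "south"): 15, ("noon", "south"): -10, ("evening", "north"): 25}

    def brightness_candidate(image, delta):
        """Shift every pixel's brightness by the adjustment parameter `delta` and clip to the 8-bit range."""
        adjusted = image.astype(np.int16) + int(delta)
        return np.clip(adjusted, 0, 255).astype(np.uint8)

    # e.g. delta = BRIGHTNESS_TABLE.get((acquisition_time, region), 0)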
As an optional embodiment of the present embodiment, the present optional embodiment further performs multi-scale transformation on an original face image to obtain at least one candidate image, and optimizes to:
and C1, randomly selecting a rotation angle in a preset angle interval.
In this embodiment, the angle interval is an interval formed by angles, and its maximum and minimum values may be determined according to the dimension of the image; for example, the angle interval of a two-dimensional image has a maximum of 180° and a minimum of -180°, while that of a three-dimensional image has a maximum of 360° and a minimum of -360°. The angle interval may be expressed in radians or in degrees, which can be converted into each other during calculation; an exemplary angle interval is [-20°, 20°].
An angle interval is preset, and when the angle-related multi-scale transformation is performed on the original face image, an angle is randomly selected from this interval by a random function and used as the rotation angle.
And C2, performing angle rotation on the original face image according to the rotation angle, and taking the rotated image as an alternative image.
And respectively carrying out angle rotation on each pixel point in the original face image according to the rotation angle to obtain rotated coordinates, forming a rotated image based on each rotated coordinate and the corresponding pixel value, and taking the rotated image as an alternative image.
As an optional embodiment of the present embodiment, the present optional embodiment further performs angular rotation on the original face image according to the rotation angle, and is optimized as follows:
and D1, determining a rotation matrix according to the rotation angle, and determining the center point coordinates of the original face image.
In the present embodiment, the rotation matrix can be understood as a matrix for rotating an image in particular; the center point coordinates can be understood as coordinates of the center point in the original face image in particular.
The expression of the rotation matrix may be predetermined, and the rotation matrix may be determined by substituting the rotation angle into the expression. And simultaneously, determining the coordinates of the center point according to the coordinates of each pixel point in the original face image.
And D2, for each pixel point in the original face image, carrying out coordinate conversion on the coordinates of the pixel point based on the rotation matrix and the coordinates of the central point to obtain converted pixel coordinates.
Each pixel in the original face image is rotated, and each of its coordinates, at least an abscissa and an ordinate, is converted. Taking the abscissa as an example: the abscissa of the center point is subtracted from the abscissa of the pixel to obtain a first difference, which is multiplied by the corresponding element of the rotation matrix to obtain a first product; the ordinate of the center point is subtracted from the ordinate of the pixel to obtain a second difference, which is multiplied by the corresponding element of the rotation matrix to obtain a second product; the second product is subtracted from the first product to obtain a third difference, and the sum of the third difference and the abscissa of the center point is taken as the rotated abscissa.
D3, forming a rotated image based on the converted pixel coordinates corresponding to each pixel point.
After the pixel coordinates of each pixel point after conversion are determined, the pixel values of the pixel coordinates after conversion are pixel values before conversion, and a rotated image is formed based on each pixel coordinates after conversion and the corresponding pixel values. By rotating the original face image, face recognition errors caused by a single angle are avoided.
Exemplary, the embodiment of the present application provides an expression of a rotation matrix, and a coordinate transformation formula:
Rotation matrix: R(θ) = [[cos θ, -sin θ], [sin θ, cos θ]], whose first row gives the coefficients of x' and whose second row gives the coefficients of y' in the formulas below.
in the embodiment of the present application, θ is an arc system, and if the randomly generated rotation angle is an angle system, the rotation angle needs to be converted into the arc system.
And performing a rotation operation on the image. For each pixel (x, y) of the original face image, the rotation can be performed using the following formula:
x’=(x-cx)×cos(θ)-(y-cy)×sin(θ)+cx;
y’=(x-cx)×sin(θ)+(y-cy)×cos(θ)+cy;
wherein x' is the abscissa in the rotated pixel coordinates; y' is the ordinate in the rotated pixel coordinates; x is the abscissa of the pixel point in the original face image; y is the ordinate of the pixel point in the original face image; cx is the abscissa of the center point coordinate and cy is the ordinate of the center point coordinate.
Through the formula, the rotated pixel coordinates (x ', y') of each pixel point (x, y) in the original face image can be calculated based on the central point coordinates (cx, cy).
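A minimal NumPy sketch of this rotation, written as a direct forward mapping of the formulas above, is given below. Practical implementations usually apply the inverse mapping (or a library routine such as OpenCV's warpAffine) to avoid holes in the output image; the rounding and bounds handling here are assumptions.

    import numpy as np

    def rotate_image(image, theta_deg):
        """Rotate an image about its center point using x' and y' as defined above."""
        theta = np.deg2rad(theta_deg)               # convert degrees to radians
        h, w = image.shape[:2]
        cy, cx = (h - 1) / 2.0, (w - 1) / 2.0       # center point coordinates (cy, cx)
        ys, xs = np.mgrid[0:h, 0:w]
        x_new = (xs - cx) * np.cos(theta) - (ys - cy) * np.sin(theta) + cx
        y_new = (xs - cx) * np.sin(theta) + (ys - cy) * np.cos(theta) + cy
        x_new = np.round(x_new).astype(int)
        y_new = np.round(y_new).astype(int)
        valid = (x_new >= 0) & (x_new < w) & (y_new >= 0) & (y_new < h)
        rotated = np.zeros_like(image)
        rotated[y_new[valid], x_new[valid]] = image[ys[valid], xs[valid]]
        return rotated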
As an optional embodiment of the present embodiment, the present optional embodiment further performs multi-scale transformation on an original face image to obtain at least one candidate image, and optimizes to: noise is added to the original face image, and the image after noise addition is used as an alternative image.
The noise added to the original face image may be random noise, such as pretzel noise, gaussian noise, or the like. Presetting an added noise type, randomly generating noise according to the noise type, adding the noise to an original face image to obtain an image with the added noise, and taking the image as an alternative image.
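A minimal sketch of the noise transformation, assuming zero-mean Gaussian noise with an illustrative standard deviation:

    import numpy as np

    def noise_candidate(image, sigma=10.0):
        """Add zero-mean Gaussian noise; the standard deviation is an illustrative assumption."""
        noisy = image.astype(np.float32) + np.random.normal(0.0, sigma, image.shape)
        return np.clip(noisy, 0, 255).astype(np.uint8)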
According to the embodiment of the application, one or more transformation modes can be selected according to requirements to transform the original face image, so that a corresponding alternative image is obtained.
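Putting these together, a small driver that reuses the sketches above to build a candidate image set might look as follows; the particular combination of transformations, the parameter values and the random angle selection are assumptions chosen for the example.

    import random

    def make_candidates(image, delta=20, angle_interval=(-20.0, 20.0)):
        """Build candidate images from one original face image (reuses the sketches above)."""
        enlarged, reduced = size_candidates(image)
        angle = random.uniform(*angle_interval)      # rotation angle drawn from the preset interval
        return {
            "enlarged": enlarged,
            "reduced": reduced,
            "brightness": brightness_candidate(image, delta),
            "noise": noise_candidate(image),
            "rotated": rotate_image(image, angle),
        }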
The errors of the face feature points obtained from different images differ; Table 1 provides an example error table that shows the errors of the detection results for different images.
TABLE 1
Wherein Img1 in table 1 is an original face image, img2 is an alternative image obtained by performing brightness conversion on the image, img3 is an alternative image obtained by performing noise conversion on the image, img4 is an alternative image obtained by performing size enlargement on the image, img5 is an alternative image obtained by performing size reduction on the image, and Img6 is an alternative image obtained by performing angle conversion on the image.
S203, taking the original face image and each candidate image as images to be identified in sequence.
S204, taking an image to be identified as input, inputting the image to be identified into a convolution layer, a batch standardization layer, an activation function and a maximum pooling layer which are sequentially connected in a pre-training target feature point detection network, and performing global feature extraction to obtain a first extraction result, wherein the convolution layer is an 11×11 large convolution layer.
S205, taking the first extraction result as current input data, and taking the first multi-scale module as a current multi-scale module.
In this embodiment, the current input data may be specifically understood as data that is currently input to the multi-scale module for processing; the current multi-scale module can be specifically understood as a multi-scale module for processing data currently; the current multi-scale module is one of three multi-scale modules. The first extraction result is firstly used as current input data, and the first multi-scale module is used as the current multi-scale module.
S206, inputting the current input data into the current multi-scale module for feature extraction to obtain output data.
And inputting the current input data into a current multi-scale module, and obtaining corresponding output data by the current multi-scale module through filtering the feature map from different dimensions.
S207, judging whether the current multi-scale module is the last multi-scale module, if not, executing S208; if yes, S209 is executed.
S208, taking the output data as new current input data, taking the next multi-scale module of the current multi-scale module as a new current multi-scale module, and returning to the step S206.
S209, taking the output data of the last multi-scale module as a second extraction result.
Judging whether the current multi-scale module is the last multi-scale module, if so, taking the output data of the last multi-scale module as a second extraction result; if not, taking the output data as new current input data, taking the next multi-scale module of the current multi-scale module as the new current multi-scale module, repeating the step S206, and extracting the characteristics of the current input data through the current multi-scale module.
In the embodiment of the application, each multi-scale module is sequentially used as a current multi-scale module to perform feature extraction, and the three multi-scale modules are sequentially used for performing feature extraction to obtain a second extraction result.
As an optional embodiment of the present embodiment, the optional embodiment further performs feature extraction on the current input data input into the current multi-scale module, and optimizes output data, including:
And E1, inputting the current input data into a first convolution layer of the current multi-scale module for convolution processing to obtain a first convolution result, and dividing the first convolution result into at least three feature images.
In this embodiment, the first convolution layer may be specifically understood as one convolution layer in the multi-scale module, where the convolution kernel of the first convolution layer is 1×1. The first convolution result may be specifically understood as data obtained when the data is processed by the first convolution layer in the current multi-scale module.
And inputting the current input data into a first convolution layer of the current multi-scale module, carrying out convolution processing on the first convolution layer based on a convolution kernel to obtain a first convolution result, and dividing the first convolution result into at least three feature graphs.
And E2, directly taking the first characteristic diagram as an output result of the first characteristic diagram.
The feature maps can be ordered according to a certain rule or randomly, and the first feature map is directly output as an output result without being processed.
And E3, inputting the second characteristic diagram into a corresponding second convolution layer for convolution processing, and obtaining a convolution result and taking the convolution result as an output result of the second characteristic diagram.
The multi-scale module comprises a plurality of second convolution layers, the number of which is equal to the number of feature maps minus 1; taking N feature maps as an example, the 2nd to Nth feature maps each correspond to one second convolution layer, and the convolution kernel of the second convolution layer is preferably 3×3. The second feature map is input into its corresponding second convolution layer for convolution processing to obtain a convolution result, which is output as the output result of the second feature map.
And E4, taking the third characteristic diagram as a current characteristic diagram, taking the output result of the second characteristic diagram as current convolution input data, inputting the current characteristic diagram and the current convolution input data into a corresponding second convolution layer for convolution processing, and taking the obtained convolution result as the output result of the current characteristic diagram.
In this embodiment, the current feature map may be specifically understood as a feature map that needs to be processed currently; the current convolution input data can be specifically understood as data which is required to be input to a convolution layer for convolution processing. The 2 nd to the N th feature graphs are provided with a second convolution layer corresponding to the feature graphs, the current feature graphs and the current convolution input data are input into the corresponding second convolution layers, convolution processing is carried out through convolution kernels of the second convolution layers, and the obtained convolution result is used as an output result of the current feature graphs.
E5, taking the next feature map of the current feature map as a new current feature map, taking an output result corresponding to the current feature map as new current convolution input data, and re-executing the step of inputting the current feature map and the current convolution input data into a corresponding second convolution layer to carry out convolution processing, wherein the obtained convolution result is taken as the output result of the current feature map until all the feature maps are traversed.
Judging whether all feature images are traversed, namely whether all feature images are selected to be subjected to convolution processing, if so, ending the convolution processing, and executing E6; otherwise, taking the next feature map of the current feature map as a new current feature map, taking an output result corresponding to the current feature map as new current convolution input data, and updating the current feature map and the current convolution input data. Taking the current feature map as a third feature map as an example, taking a fourth feature map as a new current feature map, taking the output result of the third feature map as new current convolution input data, and continuing to perform convolution processing.
E6, splicing the output results corresponding to all the feature maps, and inputting the spliced data into a third convolution layer for fusion to obtain a fusion result.
In this embodiment, the third convolution layer may be understood as another convolution layer in the multi-scale module, whose convolution kernel is 1×1. The fusion result is the data obtained by fusing the multiple output results. The output results corresponding to all the feature maps are spliced in sequence, the spliced data is input into the third convolution layer, and information fusion is performed by the 1×1 convolution kernel to obtain the fusion result.
E7, inputting the current input data into a fourth convolution layer of the current multi-scale module for convolution processing to obtain a second convolution result, and performing maximum pooling on the second convolution result to obtain a pooling result.
In this embodiment, the fourth convolution layer may be understood as a further convolution layer in the multi-scale module, whose convolution kernel is 5×5. The second convolution result may be understood as the data obtained by processing the current input data with the fourth convolution layer.
The current input data is input into the fourth convolution layer of the current multi-scale module, and global key features are extracted by the 5×5 convolution kernel to obtain the second convolution result; maximum pooling is then performed on the second convolution result to obtain the pooling result. The maximum pooling in this step may be performed by a 3×3 pooling layer.
E8, fusing the fusion result and the pooling result to obtain the output data of the current multi-scale module.
The fusion result and the pooling result are fused once more to obtain the output data of the current multi-scale module, and this output data contains both local features and global features.
For example, fig. 4 provides a schematic structural diagram of a multi-scale module, taking the number of feature maps as 5. The multi-scale module includes: a first convolution layer 31, 4 second convolution layers 32, a third convolution layer 33, a fourth convolution layer 34, and a maximum pooling layer 35. The current input data 36 is input into the first convolution layer 31 for convolution processing to obtain a first convolution result, and the first convolution result is divided into 5 feature maps in1-in5. The first feature map in1 is output directly as output result out1. The second feature map in2 is input into its corresponding second convolution layer 32 for convolution processing, and the convolution result is output as output result out2; out2 is also taken as the current convolution input data, and the third feature map in3 is input into its corresponding second convolution layer 32 for convolution processing. That is, from the third feature map in3 onward, each feature map is convolved by its second convolution layer 32 together with the output result of the previous feature map, until all feature maps are processed. Finally, the output results corresponding to all the feature maps are spliced together and sent to the third convolution layer 33 with a 1×1 convolution kernel for information fusion, obtaining a fusion result; the current input data 36 also passes in turn through the fourth convolution layer 34 with a 5×5 convolution kernel and the 3×3 maximum pooling layer 35, obtaining a pooling result. On any possible path from an input feature map to an output feature map, the equivalent receptive field increases each time the path passes through a 3×3 convolution kernel, so the combination of paths produces many equivalent feature scales. These are further fused with the global 5×5 convolution branch of the multi-scale module to extract key features, giving the output data 37 of the multi-scale module.
Fig. 4 also illustrates the difference between the GLNET module in the target feature point detection network of the embodiment of the present application and a common bottleneck layer. After the 1×1 first convolution layer, the first convolution result is divided into s feature maps, denoted in_i, where i ∈ {1, 2, ..., s}. Each feature map has the same spatial size as the original input, but its channel count is 1/s of that of the first convolution result. Except for in_1, each feature map in_i has a corresponding 3×3 second convolution layer, denoted K_i(), whose output is denoted out_i. The feature map in_i is added to the output of K_{i-1}() and the sum is sent to K_i() for processing. To reduce parameters while allowing s to be increased, in_1 does not pass through a 3×3 convolution; in this structure, s = 5. out_i can be expressed as:
out_i = in_i, i = 1
out_i = K_i(in_i), i = 2
out_i = K_i(in_i + out_{i-1}), 2 < i ≤ s
When the three multi-scale modules in the embodiment of the present application process data, the first multi-scale module has 64 input channels and 64 output channels; the second has 64 input channels and 128 output channels; and the third has 128 input channels and 256 output channels.
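As an illustration only, the following is a minimal PyTorch sketch of one such multi-scale module, following the first-convolution-and-split step and steps E2 to E8 above together with the out_i recurrence. The class and argument names (MultiScaleModule, mid_channels, scales) are illustrative rather than taken from the embodiment; the sketch assumes the intermediate channel count divides evenly into the number of splits, and it assumes the final fusion of the two branches in E8 is an element-wise addition, which the text does not fix.

```python
import torch
import torch.nn as nn


class MultiScaleModule(nn.Module):
    def __init__(self, in_channels: int, out_channels: int,
                 mid_channels: int = 80, scales: int = 5):
        super().__init__()
        assert mid_channels % scales == 0, "sketch assumes an even channel split"
        self.scales = scales
        width = mid_channels // scales

        # First convolution layer (1x1): produces the first convolution result.
        self.first_conv = nn.Conv2d(in_channels, mid_channels, kernel_size=1)
        # Second convolution layers (3x3): one per feature map except the first.
        self.second_convs = nn.ModuleList(
            [nn.Conv2d(width, width, kernel_size=3, padding=1) for _ in range(scales - 1)]
        )
        # Third convolution layer (1x1): fuses the concatenated output results.
        self.third_conv = nn.Conv2d(mid_channels, out_channels, kernel_size=1)
        # Fourth convolution layer (5x5) followed by 3x3 max pooling (global branch).
        self.fourth_conv = nn.Conv2d(in_channels, out_channels, kernel_size=5, padding=2)
        self.max_pool = nn.MaxPool2d(kernel_size=3, stride=1, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # First convolution result split into `scales` feature maps in_1..in_s.
        splits = torch.chunk(self.first_conv(x), self.scales, dim=1)

        outputs = [splits[0]]                     # E2: out_1 = in_1
        prev = self.second_convs[0](splits[1])    # E3: out_2 = K_2(in_2)
        outputs.append(prev)
        for i in range(2, self.scales):           # E4/E5: out_i = K_i(in_i + out_{i-1})
            prev = self.second_convs[i - 1](splits[i] + prev)
            outputs.append(prev)

        # E6: splice all output results and fuse with the 1x1 third convolution layer.
        fused = self.third_conv(torch.cat(outputs, dim=1))

        # E7: global branch, 5x5 convolution followed by 3x3 max pooling.
        pooled = self.max_pool(self.fourth_conv(x))

        # E8: fuse the two branches (assumed here to be element-wise addition).
        return fused + pooled
```

Under these assumptions, the three modules of the embodiment would be instantiated roughly as MultiScaleModule(64, 64), MultiScaleModule(64, 128) and MultiScaleModule(128, 256).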
The multi-scale module provided in the embodiment of the present application can improve the multi-scale expression capability of the network at a finer granularity level when detecting feature points: multiple receptive fields at finer granularity scales strengthen the multi-scale representation of the network.
S210, taking the second extraction result as input, inputting it into the average pooling layer, full-connection layer and output layer which are sequentially connected in the target feature point detection network to obtain a feature point coordinate set; each feature point coordinate set comprises at least one feature point coordinate and a feature point identifier corresponding to each feature point coordinate.
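Putting the stages together, a minimal sketch of the target feature point detection network might look as follows, reusing the MultiScaleModule sketch above. Only the 11×11 large convolution, the layer ordering and the module channel counts come from the text; the stem strides, the fully connected width, the landmark count and the output shape are assumptions made so the sketch runs.

```python
import torch
import torch.nn as nn

# MultiScaleModule is the sketch class defined earlier in this document.


class FeaturePointNet(nn.Module):
    def __init__(self, num_points: int = 68):  # 68 landmarks is an assumed count
        super().__init__()
        self.num_points = num_points
        # Global feature extraction: 11x11 large convolution, batch standardization,
        # activation function, maximum pooling.
        self.stem = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=11, stride=2, padding=5),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2, padding=1),
        )
        # Three multi-scale modules with the channel configuration of the embodiment.
        self.multi_scale = nn.Sequential(
            MultiScaleModule(64, 64),
            MultiScaleModule(64, 128),
            MultiScaleModule(128, 256),
        )
        # Average pooling, full-connection layer, output layer.
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(256, 512)
        self.out = nn.Linear(512, num_points * 2)  # one (x, y) pair per identifier

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        features = self.multi_scale(self.stem(x))
        flat = torch.flatten(self.pool(features), 1)
        coords = self.out(torch.relu(self.fc(flat)))
        return coords.view(-1, self.num_points, 2)
```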
S211, for each feature point identifier, calculating based on the feature point coordinates corresponding to the feature point identifier to obtain the face feature point coordinates.
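S211 is sketched below under the assumption that the per-identifier calculation is a simple average of the coordinates predicted for the same identifier across all images to be identified (the original face image plus the candidate images); the function name and data layout are illustrative and not prescribed by the embodiment.

```python
from collections import defaultdict
from statistics import fmean


def aggregate_feature_points(coordinate_sets):
    """coordinate_sets: one list per image to be identified,
    each entry a tuple (feature_point_identifier, (x, y))."""
    grouped = defaultdict(list)
    for coord_set in coordinate_sets:
        for identifier, (x, y) in coord_set:
            grouped[identifier].append((x, y))
    # Average every identifier's coordinates over all images to reduce per-run error.
    return {
        identifier: (fmean(x for x, _ in pts), fmean(y for _, y in pts))
        for identifier, pts in grouped.items()
    }
```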
The face feature point detection method solves the problem of inaccurate face feature point detection and avoids the low accuracy that results when detection is performed on the original face image alone. Each alternative image describes the face feature points from a different angle, and the different alternative images simulate images of the user captured under different conditions, so the relative error of the same user across different measurements is taken into account when the feature points are recognized. This reduces the error between different feature point recognition runs, improves the relative accuracy of face feature point recognition, and keeps the results highly consistent when the feature points are recognized multiple times, which in turn ensures the accuracy and consistency of region division, enables precise skin care, and improves user experience. In addition, the embodiment of the present application provides a novel target feature point detection network, which improves the multi-scale expression capability of the network at a finer granularity level and further improves the accuracy of the detection results.
Example III
Fig. 5 is a schematic structural diagram of a face feature point detection device according to a third embodiment of the present invention. As shown in fig. 5, the apparatus includes: an image acquisition module 41, an image transformation module 42, an image to be identified determination module 43, a global feature extraction module 44, a multi-scale feature extraction module 45, a coordinate set determination module 46, and a feature point coordinate determination module 47.
Wherein, the image acquisition module 41 is configured to acquire an original face image;
the image transformation module 42 is configured to perform a multi-scale transformation on the original face image to obtain at least one candidate image, where the multi-scale transformation includes at least one of the following: size conversion, brightness conversion, angle conversion, and noise conversion;
the image to be identified determining module 43 is configured to sequentially use the original face image and each of the candidate images as an image to be identified;
the global feature extraction module 44 is configured to input the image to be identified into the convolution layer, batch standardization layer, activation function and maximum pooling layer which are sequentially connected in the pre-trained target feature point detection network for global feature extraction, to obtain a first extraction result, where the convolution layer is an 11×11 large convolution layer;
The multi-scale feature extraction module 45 is configured to input the first extraction result as input into three multi-scale modules sequentially connected in the target feature point detection network to perform multi-scale feature extraction, so as to obtain a second extraction result;
the coordinate set determining module 46 is configured to input the second extraction result as an input to an average pooling layer, a full-connection layer, and an output layer that are sequentially connected in the target feature point detection network, to obtain a feature point coordinate set; each feature point coordinate set comprises at least one feature point coordinate and a feature point identifier corresponding to the feature point coordinate;
the feature point coordinate determining module 47 is configured to calculate, for each feature point identifier, based on each feature point coordinate corresponding to the feature point identifier, to obtain a face feature point coordinate.
The face feature point detection device provided by the embodiment of the invention solves the problem of inaccurate face feature point detection. After the original face image is obtained, it is transformed by at least one of size transformation, brightness transformation, angle transformation and noise transformation, realizing a multi-scale transformation of the original face image and yielding alternative images; the original face image and the alternative images are then used as images to be identified for feature point detection. This avoids the low accuracy that results when detection is performed on the original face image alone: each alternative image describes the face feature points from a different angle, and the different alternative images simulate images of the user captured under different conditions, so the relative error of the same user across different measurements is taken into account when the feature points are recognized. Errors between different feature point recognition runs are reduced, the relative accuracy of face feature point recognition is improved, and the results remain highly consistent when the feature points are recognized multiple times, which in turn ensures the accuracy and consistency of region division, enables precise skin care, and improves user experience.
Optionally, the image transformation module 42 includes:
the amplifying unit is used for performing size amplification processing on the original face image according to a size amplification factor, and taking the amplified image as an alternative image; and/or,
the reducing unit is used for performing size reduction processing on the original face image according to a size reduction factor, and taking the reduced image as an alternative image;
wherein the size amplification factor and/or the size reduction factor are determined according to the resolution of the original face image.
Optionally, the image transformation module 42 includes:
the parameter determining unit is used for determining brightness adjusting parameters according to the acquisition time and the geographic position of the original face image;
and the brightness conversion unit is used for carrying out brightness conversion on the original face image according to the brightness adjustment parameters, and taking the image after brightness conversion as an alternative image.
Optionally, the image transformation module 42 includes:
the angle selection unit is used for randomly selecting a rotation angle in a preset angle interval;
and the rotating unit is used for carrying out angle rotation on the original face image according to the rotation angle, and taking the rotated image as an alternative image.
Optionally, the rotation unit is specifically configured to: determining a rotation matrix according to the rotation angle, and determining the center point coordinates of the original face image; for each pixel point in the original face image, carrying out coordinate conversion on the coordinates of the pixel point based on the rotation matrix and the center point coordinates to obtain converted pixel coordinates; and forming a rotated image based on the converted pixel coordinates corresponding to each pixel point.
Optionally, the image transformation module 42 includes:
and the noise adding unit is used for adding noise to the original face image and taking the image with the added noise as an alternative image.
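For illustration, the four image transformation units above can be sketched together with OpenCV and NumPy as follows. The concrete magnification factors, brightness parameters, angle interval and noise distribution are placeholder assumptions chosen only to make the sketch runnable; in the embodiment they depend on the image resolution, the acquisition time and geographic position, a preset angle interval, and the selected noise model respectively.

```python
import cv2
import numpy as np


def generate_candidate_images(image: np.ndarray) -> list:
    h, w = image.shape[:2]
    candidates = []

    # Size transformation: enlarge and reduce by assumed factors.
    candidates.append(cv2.resize(image, None, fx=1.5, fy=1.5))
    candidates.append(cv2.resize(image, None, fx=0.5, fy=0.5))

    # Brightness transformation: alpha/beta stand in for the brightness adjustment
    # parameters derived from acquisition time and geographic position.
    candidates.append(cv2.convertScaleAbs(image, alpha=1.2, beta=10))

    # Angle transformation: rotate about the image centre by a randomly selected
    # angle from an assumed interval; warpAffine applies the per-pixel coordinate
    # conversion described for the rotation unit.
    angle = float(np.random.uniform(-15, 15))
    matrix = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    candidates.append(cv2.warpAffine(image, matrix, (w, h)))

    # Noise transformation: additive Gaussian noise as one possible choice.
    noise = np.random.normal(0, 10, image.shape)
    candidates.append(np.clip(image.astype(np.float64) + noise, 0, 255).astype(np.uint8))

    return candidates
```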
Optionally, the multi-scale feature extraction module 45 includes:
the characteristic extraction unit is used for taking the first extraction result as current input data, taking a first multi-scale module as a current multi-scale module, inputting the current input data into the current multi-scale module for characteristic extraction, and obtaining output data;
and the data updating unit is used for taking the output data as new current input data, taking the next multi-scale module of the current multi-scale module as a new current multi-scale module, repeatedly executing the step of inputting the current input data into the current multi-scale module for feature extraction to obtain the output data until the current multi-scale module is the last multi-scale module, and taking the output data of the last multi-scale module as a second extraction result.
Optionally, the feature extraction unit is specifically configured to: inputting the current input data into a first convolution layer of the current multi-scale module for convolution processing to obtain a first convolution result, and dividing the first convolution result into at least three feature graphs; directly taking the first characteristic diagram as an output result of the first characteristic diagram; inputting the second characteristic diagram into a corresponding second convolution layer for convolution processing to obtain a convolution result and taking the convolution result as an output result of the second characteristic diagram; taking the third characteristic diagram as a current characteristic diagram, taking the output result of the second characteristic diagram as current convolution input data, inputting the current characteristic diagram and the current convolution input data into a corresponding second convolution layer for convolution processing, and taking the obtained convolution result as the output result of the current characteristic diagram; the next feature map of the current feature map is used as a new current feature map, an output result corresponding to the current feature map is used as new current convolution input data, the step of inputting the current feature map and the current convolution input data into a corresponding second convolution layer to carry out convolution processing is re-executed, and the obtained convolution result is used as the output result of the current feature map until all feature maps are traversed; splicing output results corresponding to all the feature graphs, and inputting spliced data into a third convolution layer for fusion to obtain a fusion result; inputting the current input data to a fourth convolution layer of the current multi-scale module for convolution processing to obtain a second convolution result, and carrying out maximum pooling processing on the second convolution result to obtain a pooling result; and fusing the fusion result with the pooling result to obtain the output data of the current multi-scale module.
The face feature point detection device provided by the embodiment of the invention can execute the face feature point detection method provided by any embodiment of the invention, and has the corresponding functional modules and beneficial effects of the execution method.
Example IV
Fig. 6 shows a schematic diagram of an electronic device 50 that may be used to implement an embodiment of the invention. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. Electronic equipment may also represent various forms of mobile devices, such as personal digital processing, cellular telephones, smartphones, wearable devices (e.g., helmets, glasses, watches, etc.), and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the inventions described and/or claimed herein.
As shown in fig. 6, the electronic device 50 includes at least one processor 51, and a memory, such as a Read Only Memory (ROM) 52, a Random Access Memory (RAM) 53, etc., communicatively connected to the at least one processor 51, in which the memory stores a computer program executable by the at least one processor, and the processor 51 may perform various appropriate actions and processes according to the computer program stored in the Read Only Memory (ROM) 52 or the computer program loaded from the storage unit 58 into the Random Access Memory (RAM) 53. In the RAM 53, various programs and data required for the operation of the electronic device 50 can also be stored. The processor 51, the ROM 52 and the RAM 53 are connected to each other via a bus 54. An input/output (I/O) interface 55 is also connected to bus 54.
Various components in the electronic device 50 are connected to the I/O interface 55, including: an input unit 56 such as a keyboard, a mouse, etc.; an output unit 57 such as various types of displays, speakers, and the like; a storage unit 58 such as a magnetic disk, an optical disk, or the like; and a communication unit 59 such as a network card, modem, wireless communication transceiver, etc. The communication unit 59 allows the electronic device 50 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunications networks.
The processor 51 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of processor 51 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various processors running machine learning model algorithms, digital Signal Processors (DSPs), and any suitable processor, controller, microcontroller, etc. The processor 51 performs the respective methods and processes described above, such as a face feature point detection method.
In some embodiments, the face feature point detection method may be implemented as a computer program tangibly embodied on a computer-readable storage medium, such as the storage unit 58. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 50 via the ROM 52 and/or the communication unit 59. When the computer program is loaded into the RAM 53 and executed by the processor 51, one or more steps of the face feature point detection method described above may be performed. Alternatively, in other embodiments, the processor 51 may be configured to perform the face feature point detection method in any other suitable manner (e.g., by means of firmware).
Various implementations of the systems and techniques described here above may be implemented in digital electronic circuitry, integrated circuit systems, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include implementation in one or more computer programs that may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special purpose or general-purpose programmable processor, and which may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
A computer program for carrying out methods of the present invention may be written in any combination of one or more programming languages. These computer programs may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the computer programs, when executed by the processor, cause the functions/acts specified in the flowchart and/or block diagram block or blocks to be implemented. The computer program may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of the present invention, a computer-readable storage medium may be a tangible medium that can contain, or store a computer program for use by or in connection with an instruction execution system, apparatus, or device. The computer readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. Alternatively, the computer readable storage medium may be a machine readable signal medium. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on an electronic device having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) through which a user can provide input to the electronic device. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a background component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such background, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), wide Area Networks (WANs), blockchain networks, and the internet.
The computing system may include clients and servers. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or a cloud host, and is a host product in a cloud computing service system, so that the defects of high management difficulty and weak service expansibility in the traditional physical hosts and VPS service are overcome.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps described in the present invention may be performed in parallel, sequentially, or in a different order, so long as the desired results of the technical solution of the present invention are achieved, and the present invention is not limited herein.
The above embodiments do not limit the scope of the present invention. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention should be included in the scope of the present invention.

Claims (9)

1. The face feature point detection method is characterized by comprising the following steps of:
acquiring an original face image;
performing multi-scale transformation on the original face image to obtain at least one alternative image, wherein the multi-scale transformation at least comprises one of the following steps: size conversion, brightness conversion, angle conversion, and noise conversion;
sequentially taking the original face image and each candidate image as images to be identified;
The image to be identified is used as input and is input into a convolution layer, a batch standardization layer, an activation function and a maximum pooling layer which are sequentially connected in a pre-training target feature point detection network to carry out global feature extraction, so that a first extraction result is obtained, wherein the convolution layer is an 11 multiplied by 11 large convolution layer;
inputting the first extraction result as input into three multi-scale modules connected in sequence in the target feature point detection network for multi-scale feature extraction to obtain a second extraction result;
inputting the second extraction result as input into an average pooling layer, a full-connection layer and an output layer which are sequentially connected in the target feature point detection network to obtain a feature point coordinate set; each feature point coordinate set comprises at least one feature point coordinate and a feature point identifier corresponding to the feature point coordinate;
for each feature point identifier, calculating based on each feature point coordinate corresponding to the feature point identifier to obtain a face feature point coordinate;
the step of inputting the first extraction result as input into three multi-scale modules connected in sequence in the target feature point detection network for multi-scale feature extraction to obtain a second extraction result, wherein the step of obtaining the second extraction result comprises the following steps:
Taking the first extraction result as current input data, taking a first multi-scale module as a current multi-scale module, and inputting the current input data into the current multi-scale module for feature extraction to obtain output data;
the output data is used as new current input data, the next multi-scale module of the current multi-scale module is used as a new current multi-scale module, the step of inputting the current input data into the current multi-scale module for feature extraction is repeatedly executed, and the output data is obtained until the current multi-scale module is the last multi-scale module, and the output data of the last multi-scale module is used as a second extraction result;
correspondingly, the step of inputting the current input data into the current multi-scale module to perform feature extraction to obtain output data includes:
inputting the current input data into a first convolution layer of the current multi-scale module for convolution processing to obtain a first convolution result, and dividing the first convolution result into at least three feature graphs;
directly taking the first characteristic diagram as an output result of the first characteristic diagram;
Inputting the second characteristic diagram into a corresponding second convolution layer for convolution processing to obtain a convolution result and taking the convolution result as an output result of the second characteristic diagram;
taking the third characteristic diagram as a current characteristic diagram, taking the output result of the second characteristic diagram as current convolution input data, inputting the current characteristic diagram and the current convolution input data into a corresponding second convolution layer for convolution processing, and taking the obtained convolution result as the output result of the current characteristic diagram;
the next feature map of the current feature map is used as a new current feature map, an output result corresponding to the current feature map is used as new current convolution input data, the step of inputting the current feature map and the current convolution input data into a corresponding second convolution layer to carry out convolution processing is re-executed, and the obtained convolution result is used as the output result of the current feature map until all feature maps are traversed;
splicing output results corresponding to all the feature graphs, and inputting spliced data into a third convolution layer for fusion to obtain a fusion result;
inputting the current input data to a fourth convolution layer of the current multi-scale module for convolution processing to obtain a second convolution result, and carrying out maximum pooling processing on the second convolution result to obtain a pooling result;
And fusing the fusion result with the pooling result to obtain the output data of the current multi-scale module.
2. The method according to claim 1, wherein said performing a multi-scale transformation on said original face image results in at least one candidate image, comprising:
performing size amplification processing on the original face image according to the size amplification factor, and taking the amplified image as an alternative image; and/or,
performing size reduction processing on the original face image according to the size reduction multiple, and taking the reduced image as an alternative image;
and the size magnification and/or size reduction is/are determined according to the resolution of the original face image.
3. The method according to claim 1, wherein said performing a multi-scale transformation on said original face image results in at least one candidate image, comprising:
determining brightness adjustment parameters according to the acquisition time and the geographic position of the original face image;
and carrying out brightness transformation on the original face image according to the brightness adjustment parameters, and taking the image after brightness transformation as an alternative image.
4. The method according to claim 1, wherein said performing a multi-scale transformation on said original face image results in at least one candidate image, comprising:
Randomly selecting a rotation angle in a preset angle interval;
and carrying out angle rotation on the original face image according to the rotation angle, and taking the rotated image as an alternative image.
5. The method of claim 4, wherein the angularly rotating the original face image according to the rotation angle comprises:
determining a rotation matrix according to the rotation angle, and determining the center point coordinates of the original face image;
for each pixel point in the original face image, carrying out coordinate conversion on the coordinates of the pixel point based on the rotation matrix and the center point coordinates to obtain converted pixel coordinates;
and forming a rotated image based on the converted pixel coordinates corresponding to each pixel point.
6. The method according to claim 1, wherein said performing a multi-scale transformation on said original face image results in at least one candidate image, comprising:
and adding noise to the original face image, and taking the image with the added noise as an alternative image.
7. A face feature point detection apparatus, comprising:
the image acquisition module is used for acquiring an original face image;
The image transformation module is used for carrying out multi-scale transformation on the original face image to obtain at least one alternative image, and the multi-scale transformation at least comprises one of the following: size conversion, brightness conversion, angle conversion, and noise conversion;
the image to be identified determining module is used for sequentially taking the original face image and each candidate image as images to be identified;
the global feature extraction module is used for taking the image to be identified as input, inputting the image to be identified into a convolution layer, a batch standardization layer, an activation function and a maximum pooling layer which are sequentially connected in a pre-training target feature point detection network, and carrying out global feature extraction to obtain a first extraction result, wherein the convolution layer is an 11 multiplied by 11 large convolution layer;
the multi-scale feature extraction module is used for inputting the first extraction result as input into three multi-scale modules connected in sequence in the target feature point detection network to perform multi-scale feature extraction to obtain a second extraction result;
the coordinate set determining module is used for inputting the second extraction result as input into an average pooling layer, a full-connection layer and an output layer which are sequentially connected in the target feature point detection network to obtain a feature point coordinate set; each feature point coordinate set comprises at least one feature point coordinate and a feature point identifier corresponding to the feature point coordinate;
The characteristic point coordinate determining module is used for calculating each characteristic point coordinate corresponding to each characteristic point identifier based on the characteristic point identifier to obtain a face characteristic point coordinate;
the multi-scale feature extraction module comprises:
the characteristic extraction unit is used for taking the first extraction result as current input data, taking a first multi-scale module as a current multi-scale module, inputting the current input data into the current multi-scale module for characteristic extraction, and obtaining output data;
the data updating unit is used for taking the output data as new current input data, taking the next multi-scale module of the current multi-scale module as a new current multi-scale module, repeatedly executing the step of inputting the current input data into the current multi-scale module for feature extraction to obtain the output data until the current multi-scale module is the last multi-scale module, and taking the output data of the last multi-scale module as a second extraction result;
the feature extraction unit is specifically configured to: inputting the current input data into a first convolution layer of the current multi-scale module for convolution processing to obtain a first convolution result, and dividing the first convolution result into at least three feature graphs; directly taking the first characteristic diagram as an output result of the first characteristic diagram; inputting the second characteristic diagram into a corresponding second convolution layer for convolution processing to obtain a convolution result and taking the convolution result as an output result of the second characteristic diagram; taking the third characteristic diagram as a current characteristic diagram, taking the output result of the second characteristic diagram as current convolution input data, inputting the current characteristic diagram and the current convolution input data into a corresponding second convolution layer for convolution processing, and taking the obtained convolution result as the output result of the current characteristic diagram; the next feature map of the current feature map is used as a new current feature map, an output result corresponding to the current feature map is used as new current convolution input data, the step of inputting the current feature map and the current convolution input data into a corresponding second convolution layer to carry out convolution processing is re-executed, and the obtained convolution result is used as the output result of the current feature map until all feature maps are traversed; splicing output results corresponding to all the feature graphs, and inputting spliced data into a third convolution layer for fusion to obtain a fusion result; inputting the current input data to a fourth convolution layer of the current multi-scale module for convolution processing to obtain a second convolution result, and carrying out maximum pooling processing on the second convolution result to obtain a pooling result; and fusing the fusion result with the pooling result to obtain the output data of the current multi-scale module.
8. An electronic device, the electronic device comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the face feature point detection method of any one of claims 1-6.
9. A computer readable storage medium storing computer instructions for causing a processor to perform the face feature point detection method of any one of claims 1-6.
CN202311630749.XA 2023-12-01 2023-12-01 Face feature point detection method and device, electronic equipment and storage medium Active CN117333928B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311630749.XA CN117333928B (en) 2023-12-01 2023-12-01 Face feature point detection method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN117333928A CN117333928A (en) 2024-01-02
CN117333928B true CN117333928B (en) 2024-03-22

Family

ID=89279756

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311630749.XA Active CN117333928B (en) 2023-12-01 2023-12-01 Face feature point detection method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117333928B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117523645B (en) * 2024-01-08 2024-03-22 深圳市宗匠科技有限公司 Face key point detection method and device, electronic equipment and storage medium

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103150561A (en) * 2013-03-19 2013-06-12 华为技术有限公司 Face recognition method and equipment
CN109389030A (en) * 2018-08-23 2019-02-26 平安科技(深圳)有限公司 Facial feature points detection method, apparatus, computer equipment and storage medium
CN109858433A (en) * 2019-01-28 2019-06-07 四川大学 A kind of method and device based on three-dimensional face model identification two-dimension human face picture
CN110991412A (en) * 2019-12-20 2020-04-10 北京百分点信息科技有限公司 Face recognition method and device, storage medium and electronic equipment
CN112215179A (en) * 2020-10-19 2021-01-12 平安国际智慧城市科技股份有限公司 In-vehicle face recognition method, device, apparatus and storage medium
CN112949507A (en) * 2021-03-08 2021-06-11 平安科技(深圳)有限公司 Face detection method and device, computer equipment and storage medium
WO2021212736A1 (en) * 2020-04-23 2021-10-28 苏州浪潮智能科技有限公司 Feature fusion block, convolutional neural network, person re-identification method, and related device
CN115115836A (en) * 2022-06-29 2022-09-27 抖音视界(北京)有限公司 Image recognition method, image recognition device, storage medium and electronic equipment
WO2022206202A1 (en) * 2021-03-29 2022-10-06 Oppo广东移动通信有限公司 Image beautification processing method and apparatus, storage medium, and electronic device
CN116597020A (en) * 2023-05-23 2023-08-15 京东方科技集团股份有限公司 External parameter calibration method, computing equipment, image acquisition system and storage medium
CN116883466A (en) * 2023-07-11 2023-10-13 中国人民解放军国防科技大学 Optical and SAR image registration method, device and equipment based on position sensing

Also Published As

Publication number Publication date
CN117333928A (en) 2024-01-02

Similar Documents

Publication Publication Date Title
WO2018028546A1 (en) Key point positioning method, terminal, and computer storage medium
CN117333928B (en) Face feature point detection method and device, electronic equipment and storage medium
CN109522945B (en) Group emotion recognition method and device, intelligent device and storage medium
JP2023520846A (en) Image processing method, image processing apparatus, computer program and computer equipment based on artificial intelligence
CN113344862B (en) Defect detection method, device, electronic equipment and storage medium
CN112966725B (en) Method and device for matching template images and terminal equipment
CN110619334B (en) Portrait segmentation method based on deep learning, architecture and related device
CN113744268B (en) Crack detection method, electronic device and readable storage medium
CN112836625A (en) Face living body detection method and device and electronic equipment
CN113177472A (en) Dynamic gesture recognition method, device, equipment and storage medium
CN115239888B (en) Method, device, electronic equipment and medium for reconstructing three-dimensional face image
CN114092963A (en) Key point detection and model training method, device, equipment and storage medium
CN112446322A (en) Eyeball feature detection method, device, equipment and computer-readable storage medium
CN111199169A (en) Image processing method and device
CN108447092A (en) The method and device of vision positioning marker
CN114998980B (en) Iris detection method and device, electronic equipment and storage medium
CN114463856B (en) Method, device, equipment and medium for training attitude estimation model and attitude estimation
CN113610856B (en) Method and device for training image segmentation model and image segmentation
CN112990213B (en) Digital multimeter character recognition system and method based on deep learning
CN111968030B (en) Information generation method, apparatus, electronic device and computer readable medium
CN108388859B (en) Object detection method, network training method, device and computer storage medium
CN113781653A (en) Object model generation method and device, electronic equipment and storage medium
CN113763315A (en) Slide image information acquisition method, device, equipment and medium
CN113128277A (en) Generation method of face key point detection model and related equipment
CN112906629A (en) Training of facial expression classifier and facial expression recognition method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant