CN118015669A - Face alignment method and device, electronic equipment, chip and medium - Google Patents

Face alignment method and device, electronic equipment, chip and medium Download PDF

Info

Publication number
CN118015669A
Authority
CN
China
Prior art keywords
feature
face
eccb
network
feature extraction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311550037.7A
Other languages
Chinese (zh)
Inventor
李磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Communications Group Co Ltd
China Mobile Suzhou Software Technology Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
China Mobile Suzhou Software Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Mobile Communications Group Co Ltd, China Mobile Suzhou Software Technology Co Ltd filed Critical China Mobile Communications Group Co Ltd
Priority to CN202311550037.7A priority Critical patent/CN118015669A/en
Publication of CN118015669A publication Critical patent/CN118015669A/en
Pending legal-status Critical Current

Landscapes

  • Image Analysis (AREA)

Abstract

The disclosure provides a face alignment method, a face alignment device, electronic equipment, a chip and a medium, relating to the technical field of computer vision. The face alignment method comprises the following steps: acquiring a face image; extracting a first feature from the face image through a first feature extraction network, wherein the first feature extraction network is composed of a first number of first feature extraction blocks, and each first feature extraction block is composed of the following networks in sequence: ECCB, DCANet, ECCB, SENet and DCANet; and determining face key points through a fully connected neural network based on the first feature, the face key points being used for face alignment. The technical scheme provided by the disclosure addresses the inconvenience of deploying one network on different platforms, the weak feature association among per-channel feature maps and the low recognition accuracy for key facial parts, improving both the model's adaptability across platforms and the recognition accuracy of face key points.

Description

Face alignment method and device, electronic equipment, chip and medium
Technical Field
The disclosure relates to the technical field of computer vision, and in particular relates to a face alignment method, a face alignment device, electronic equipment, a chip and a medium.
Background
Face alignment is the identification of the locations of face key points in an image or video; these key points are typically parts carrying specific semantic information, such as the eyes, nose, mouth, cheeks and eyebrows. In the related art, face alignment requires adjusting the network structure for each deployment platform, the network models do not effectively screen the per-channel feature maps, the feature association among the channel feature maps is weak, and the recognition accuracy for key facial parts is low.
Disclosure of Invention
The disclosure provides a face alignment method, a device, electronic equipment, a chip and a medium, so as to solve the problems of weak feature association among per-channel feature maps and low recognition accuracy for key facial parts. An elastic circular convolution module (Elastic Circular Convolution Block, ECCB) is built on top of position-aware circular convolution (ParC) to extract a large number of multi-channel global features. A deep connected attention network (Deep Connected Attention Network, DCANet) is then introduced to fully exploit the information flowing between attention mechanisms and enhance the face key part information in the features. The result is a new elastic, high-precision face alignment model applicable to computing platforms of different capabilities.
An embodiment of a first aspect of the present disclosure provides a face alignment method, including:
Acquiring a face image;
Based on the face image, extracting a first feature through a first feature extraction network, the first feature extraction network being composed of a first number of first feature extraction blocks, wherein each first feature extraction block is composed of the following networks in sequence: an elastic circular convolution module ECCB, a deep connected attention network DCANet, an elastic circular convolution module ECCB, a squeeze-and-excitation network SENet, and a deep connected attention network DCANet;
Based on the first feature, determining face key points through a fully connected neural network, the face key points being used for face alignment.
In one embodiment of the present disclosure, extracting a first feature through a first feature extraction network based on a face image includes:
Extracting a first global feature through the elastic circular convolution module ECCB according to a preset computing power;
Extracting a second feature through the deep connected attention network DCANet based on the first global feature, the second feature comprising at least one attention feature;
Extracting a second global feature through the elastic circular convolution module ECCB based on the second feature and the preset computing power, wherein the dimension of the second global feature is higher than that of the first global feature;
Extracting vector distance features between feature vectors of different channels through the squeeze-and-excitation network SENet based on the second global feature;
Extracting a third feature through the deep connected attention network DCANet based on the vector distance features, wherein the third feature comprises an attention feature and a face key region feature;
Determining the first feature based on the third feature and the first number.
In one embodiment of the present disclosure, determining the first feature from the third feature and the first number includes:
If the first number is equal to 1, taking the third feature as the first feature;
If the first number is greater than 1, inputting the third feature into the first feature extraction block to extract a new third feature, until the first feature extraction block has completed the first number of consecutive feature extractions, and taking the resulting third feature as the first feature.
In one embodiment of the present disclosure, the elastic circular convolution module ECCB includes convolution and position-aware circular convolution ParC, which are used to extract global features.
In one embodiment of the present disclosure, extracting the first global feature or the second global feature through the elastic circular convolution module ECCB according to the preset computing power comprises:
If the preset computing power is a low computing power, extracting the first global feature or the second global feature through the elastic circular convolution module ECCB, wherein the elastic circular convolution module ECCB is composed of position-aware circular convolution ParC followed by convolution;
If the preset computing power is a high computing power, extracting the first global feature or the second global feature through the elastic circular convolution module ECCB, wherein the elastic circular convolution module ECCB is composed of convolution, position-aware circular convolution ParC and convolution in sequence.
In one embodiment of the present disclosure, the loss function of the first feature extraction network is wing-loss.
An embodiment of a second aspect of the present disclosure provides a face alignment device, including:
The acquisition module is used for acquiring a face image;
The feature extraction module is used for extracting a first feature through a first feature extraction network based on the face image, wherein the first feature extraction network is composed of a first number of first feature extraction blocks, and each first feature extraction block comprises: an elastic circular convolution module ECCB, a deep connected attention network DCANet, an elastic circular convolution module ECCB, a squeeze-and-excitation network SENet, and a deep connected attention network DCANet;
The determining module is used for determining face key points through a fully connected neural network based on the first feature.
An embodiment of a third aspect of the present disclosure proposes an electronic device, including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of the embodiments of the first aspect of the present disclosure.
An embodiment of a fourth aspect of the present disclosure proposes a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method of the embodiment of the first aspect of the present disclosure.
An embodiment of a fifth aspect of the present disclosure proposes a chip comprising at least one processor and a communication interface; the communication interface is configured to receive signals input to the chip or output from the chip, and the processor communicates with the communication interface and implements the method of any one of the embodiments of the first aspect of the disclosure through logic circuitry or by executing code instructions.
In summary, according to the face alignment method provided by the disclosure, a face image is acquired, providing a data source for face alignment; a first feature is extracted from the face image through a first feature extraction network composed of a first number of first feature extraction blocks, each composed of the following networks in sequence: an elastic circular convolution module ECCB, a deep connected attention network DCANet, an elastic circular convolution module ECCB, a squeeze-and-excitation network SENet, and a deep connected attention network DCANet. This structure can capture a large number of multi-channel global features while keeping them position-sensitive, allows the convolution variant to be selected according to the computing power of the platform so that a corresponding model is obtained without changing the network structure, strengthens the connection between the effective feature maps of the channels, and reinforces face key part information, yielding a high-precision face alignment model applicable to platforms of various computing powers. Based on the first feature, face key points are determined through a fully connected neural network and used for face alignment, improving both the model's adaptability to platforms and the recognition accuracy of face key points.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the principles of the disclosure and do not constitute an undue limitation on the disclosure.
Fig. 1 is a flowchart of a face alignment method according to an embodiment of the present disclosure;
FIG. 1a is a schematic diagram of a first feature extraction network according to an embodiment of the disclosure;
FIG. 2 is a flowchart of extracting a first feature through a first feature extraction network based on a face image according to an embodiment of the present disclosure;
FIG. 3 is a flow chart of determining the first feature according to the third feature and the first number, in accordance with an embodiment of the present disclosure;
FIG. 4 is a schematic diagram of an elastic circular convolution module ECCB according to an embodiment of the disclosure;
FIG. 5 is a flow chart of extracting the first global feature or the second global feature through the elastic circular convolution module ECCB according to the preset computing power, in accordance with an embodiment of the present disclosure;
FIG. 5a is a schematic diagram of the composition of the ECCB under different computing powers, in accordance with an embodiment of the present disclosure;
fig. 6 is a schematic diagram of a result of face alignment according to an embodiment of the present disclosure;
fig. 7 is a schematic structural diagram of a face alignment device according to an embodiment of the disclosure;
FIG. 8 is a block diagram of an electronic device for implementing the face alignment method of the present disclosure, shown in accordance with an exemplary embodiment;
Fig. 9 is a schematic structural diagram of a chip of an embodiment of the present disclosure.
Detailed Description
Embodiments of the present disclosure are described in detail below, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals identify the same or similar elements or elements having the same or similar functions throughout. The embodiments described below by referring to the drawings are exemplary and intended for the purpose of explaining the present disclosure and are not to be construed as limiting the present disclosure.
In the field of computer vision, face alignment is an important preliminary step in face recognition systems and has many applications in face-related tasks such as face tracking and expression recognition. Given a two-dimensional face image, a face alignment algorithm predicts the key point positions L = {x_1, y_1, x_2, y_2, ..., x_D, y_D}, where x and y represent the two-dimensional coordinates of a face key point and the number of key points D differs across datasets, with 68 and 98 points being common. In the related art, most face alignment methods use an hourglass network (HourglassNet) or VGG16 of the Visual Geometry Group (VGG) as the backbone and optimize the model with a mean square error (Mean Square Error, MSE) loss function; the resulting networks achieve high detection precision but impose strict computing power requirements on the deployment platform. Another broad class of methods builds networks suitable for mobile devices, mainly the MobileNet series of lightweight networks, but in some scenarios their recognition accuracy is not high. Moreover, these network models do not effectively screen the per-channel feature maps; even when attention mechanisms are introduced, each attention block only considers the current features, and no information flows between attention blocks.
The present disclosure aims to solve the problems that deployment on different platforms is inconvenient, that feature association among per-channel feature maps is weak, and that the recognition accuracy for key facial parts is low. The face alignment method provided by the disclosure improves the adaptability of the model to the platform and the accuracy of face key point recognition.
The face alignment method is applied to face alignment tasks with rich application scenarios: it can significantly improve the recognition accuracy of face key points in face images while allowing the model to be conveniently deployed on different platforms. The method has good commercial prospects and can be widely applied to fields such as general face recognition, video special effects, expression recognition, artistic creation and face similarity comparison. The embodiments of the present disclosure do not limit the application scenario.
The face alignment method provided by the application is described in detail below with reference to the accompanying drawings.
Fig. 1 is a flowchart of a face alignment method according to an embodiment of the present disclosure. As shown in the embodiment of fig. 1, the face alignment method includes:
step 101, acquiring a face image.
In this embodiment, the face image is an image containing a human face; preferably, it is a face image detected by a face recognition network and cropped with a bounding box.
Step 102, extracting a first feature through a first feature extraction network based on the face image, wherein the first feature extraction network is composed of a first number of first feature extraction blocks, and each first feature extraction block is composed of the following networks in sequence: an elastic circular convolution module ECCB, a deep connected attention network DCANet, an elastic circular convolution module ECCB, a squeeze-and-excitation network (Squeeze-and-Excitation Network, SENet), and a deep connected attention network DCANet.
In this embodiment, the first feature extraction network is a feature extraction network for extracting the key points of key facial parts in the face image. The first feature is a feature containing face position information. The first number is an integer greater than or equal to 1. A first feature extraction block is a block structure formed by several networks, and the first feature extraction network contains a first number of such blocks. Each first feature extraction block is composed of ECCB, DCANet, ECCB, SENet and DCANet in order. The ECCB can capture a large number of multi-channel global features while maintaining feature position sensitivity, and its convolution structure adapts to different computing platforms. The DCANet-based attention structure makes full use of the feature information of the ParC structure and the SE structure and enhances the connection between the effective feature maps of the channels: it collects information from the previous attention map and transmits it to the next attention module, so that the attention modules cooperate with each other. This improves the learning ability of the attention mechanism, reinforces face key part information, allows different convolution variants to be selected for training and deployment according to the available computing power, and further improves the alignment precision of the model. The core idea of SENet is to learn feature weights through the network according to the loss, so that effective feature maps receive large weights and feature maps with little or no effect receive small weights, training the model toward a better result. The position-aware circular convolution (ParC) operator captures global features by using global kernels and circular convolution, while preserving position sensitivity through position embedding.
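To make the ParC operator concrete, the following is a minimal PyTorch sketch of a position-aware circular convolution along one spatial axis, written from the description above; the module name, the depthwise global kernel and the way the position embedding is injected are illustrative assumptions, not the patent's reference implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ParC(nn.Module):
    """Position-aware circular convolution (illustrative sketch)."""

    def __init__(self, channels: int, size: int, vertical: bool = True):
        super().__init__()
        self.size = size          # spatial extent = length of the global kernel
        self.vertical = vertical  # convolve along H if True, else along W
        kernel = (size, 1) if vertical else (1, size)
        # Depthwise convolution whose kernel spans the whole axis.
        self.conv = nn.Conv2d(channels, channels, kernel, groups=channels, bias=False)
        # Learnable position embedding keeps the global kernel location-aware.
        shape = (1, channels, size, 1) if vertical else (1, channels, 1, size)
        self.pos = nn.Parameter(torch.zeros(shape))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x + self.pos  # inject position information before convolving
        # Circular padding wraps the axis so every output position sees all of it.
        pad = (0, 0, 0, self.size - 1) if self.vertical else (0, self.size - 1, 0, 0)
        return self.conv(F.pad(x, pad, mode="circular"))

# Example: a 32-channel 14x14 feature map processed along the vertical axis.
feat = torch.randn(1, 32, 14, 14)
out = ParC(32, 14, vertical=True)(feat)  # shape preserved: (1, 32, 14, 14)
```

In practice a vertical and a horizontal ParC instance would typically be combined so that global context is gathered along both axes of the feature map.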
Fig. 1a is a schematic structural diagram of the first feature extraction network according to an embodiment of the disclosure. In this embodiment, a face image is input, features carrying face position information are extracted by a plurality of first feature extraction blocks, and these features are converted into face key point coordinates by step 103 below and displayed on the face image.
Step 103, determining face key points through a fully connected neural network based on the first feature, wherein the face key points are used for face alignment.
In this embodiment, the fully connected neural network (Fully Connected Neural Network, FCNN) is a basic artificial neural network structure in which each neuron is connected to all neurons of the previous and subsequent layers. It is one of the most basic neural networks and can handle complex multiple-input multiple-output problems. By feeding the first feature into the FCNN, the coordinates of a specified number of face key points of specified categories can be output through this mapping. The face key point coordinates belong to the classes of the different facial parts and represent the main key points of each part of the face; once the face key points are identified, face alignment is complete.
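As an illustration of this mapping, the sketch below flattens the first feature and regresses D pairs of (x, y) coordinates through fully connected layers; the hidden layer size and the choice of D = 98 key points are assumptions made for the example, not values fixed by the disclosure.

```python
import torch
import torch.nn as nn

class KeypointHead(nn.Module):
    """Fully connected head mapping the first feature to D (x, y) pairs."""

    def __init__(self, in_features: int, num_points: int = 98):
        super().__init__()
        self.num_points = num_points
        self.fc = nn.Sequential(
            nn.Flatten(),
            nn.Linear(in_features, 256),    # hidden width is illustrative
            nn.ReLU(inplace=True),
            nn.Linear(256, num_points * 2), # (x1, y1, ..., xD, yD)
        )

    def forward(self, feature: torch.Tensor) -> torch.Tensor:
        # Reshape to (batch, D, 2): one two-dimensional coordinate per key point.
        return self.fc(feature).view(-1, self.num_points, 2)

# Example: a (1, 128, 7, 7) first feature yields 98 key point coordinates.
coords = KeypointHead(128 * 7 * 7)(torch.randn(1, 128, 7, 7))
```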
In summary, according to the face alignment method provided by the disclosure, a face image is acquired, providing a data source for face alignment; a first feature is extracted from the face image through a first feature extraction network composed of a first number of first feature extraction blocks, each composed of the following networks in sequence: an elastic circular convolution module ECCB, a deep connected attention network DCANet, an elastic circular convolution module ECCB, a squeeze-and-excitation network SENet, and a deep connected attention network DCANet. This structure can capture a large number of multi-channel global features while keeping them position-sensitive, allows the convolution variant to be selected according to the computing power of the platform so that a corresponding model is obtained without changing the network structure, strengthens the connection between the effective feature maps of the channels, and reinforces face key part information, yielding a high-precision face alignment model applicable to platforms of various computing powers. Based on the first feature, face key points are determined through a fully connected neural network and used for face alignment, improving both the model's adaptability to platforms and the recognition accuracy of face key points.
Fig. 2 is a flowchart of extracting the first feature through the first feature extraction network based on the face image according to an embodiment of the present disclosure. Fig. 2 further illustrates step 102 of fig. 1 and, based on the embodiment shown in fig. 2, includes the following steps:
In step 201, a first global feature is extracted through the elastic circular convolution module ECCB according to the preset computing power.
In this embodiment, since different running devices correspond to different computing powers, a first global feature, consisting of a large number of multi-channel global features that retain position sensitivity, can be extracted through the ECCB according to the preset computing power. In practical applications, the different convolution variants correspond to specific parameters of the network's interface, so the appropriate convolution can be selected for training according to the actual computing power of the platform, and a corresponding model is obtained without changing the network structure.
Step 202, extracting a second feature through the deep connected attention network DCANet based on the first global feature, the second feature comprising at least one attention feature.
In this embodiment, the second feature is the face key part information obtained by strengthening the first global feature and interconnecting adjacent blocks through the attention mechanism; it is extracted through DCANet from the first global feature.
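One way to realize this "connected attention" idea is sketched below: a wrapper collects the attention map produced by one attention block, fuses it into the next block's map, and passes the state onward. This is an interpretation of the DCANet connection scheme under the assumption of channel-attention maps with matching shapes, not the patent's exact structure.

```python
from typing import Optional

import torch
import torch.nn as nn

class DCAConnection(nn.Module):
    """DCANet-style connection (sketch): the attention map of one block is
    handed to the next block instead of being used in isolation."""

    def __init__(self, attention: nn.Module):
        super().__init__()
        # Any module mapping x -> per-channel weights of shape (B, C, 1, 1).
        self.attention = attention

    def forward(self, x: torch.Tensor, prev: Optional[torch.Tensor] = None):
        attn = self.attention(x)
        if prev is not None:
            attn = attn + prev  # let the previous block's attention guide this one
        return x * attn, attn   # reweighted features plus the attention state
```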
In step 203, a second global feature is extracted through the elastic circular convolution module ECCB based on the second feature and the preset computing power, wherein the dimension of the second global feature is higher than the dimension of the first global feature.
In this embodiment, the second global feature is a global feature containing the positions of the face key points and regions; its dimension is higher than that of the first global feature, and it is extracted through the ECCB again from the second feature according to the preset computing power.
Step 204, vector distance features between the feature vectors of different channels are extracted through the squeeze-and-excitation network SENet based on the second global feature.
In this embodiment, the vector distance features between the feature vectors of different channels are extracted through SENet from the second global feature. Important regions that may have been weakened in the features extracted by the ECCB are strengthened, and more weight is allocated to the features related to the semantically meaningful facial parts. SENet first uses average pooling to obtain a feature vector along the channel direction, and then uses successive fully connected layers to capture the dependencies between different channels.
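The squeeze-and-excitation step described here corresponds to the canonical SE block; a minimal PyTorch sketch, with an assumed reduction ratio of 4, is shown below.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-excitation: pool each channel to a descriptor, then let
    fully connected layers learn per-channel weights."""

    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Squeeze: global average pooling gives one descriptor per channel.
        s = x.mean(dim=(2, 3))
        # Excite: successive fully connected layers capture the dependencies
        # between different channels.
        w = self.fc(s).unsqueeze(-1).unsqueeze(-1)
        # Reweight: effective channels get large weights, weak ones small.
        return x * w
```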
Step 205, a third feature is extracted through the deep connected attention network DCANet based on the vector distance features, the third feature including an attention feature and a face key region feature.
In this embodiment, the third feature includes an attention feature and a face key region feature, that is, the features of the key regions of the face. The third feature is extracted through DCANet again from the vector distance features.
Step 206, determining a first feature based on the third feature and the first number.
In this embodiment, the number of times the first feature extraction block is executed is determined by the first number; a third feature is extracted each time the first feature extraction block is executed, and the third feature obtained after the first number of first feature extraction blocks have been executed is used as the first feature.
In this embodiment, an elastic circular convolution module ECCB is built on top of the position-aware circular convolution ParC to extract a large number of multi-channel global features. DCANet is then introduced to fully utilize the information flowing between attention mechanisms and to enhance the face key part information in the features, and the channel attention mechanism of SENet is used to strengthen the features extracted by the ECCB. In this way a large number of multi-channel global features are captured while remaining position-sensitive, the convolution variant can be selected according to the platform's computing power so that a corresponding model is obtained without changing the network structure, the connections between the effective feature maps of the channels are enhanced, face key part information is reinforced, and a high-precision face alignment model applicable to platforms of various computing powers is constructed.
FIG. 3 is a flow chart of determining the first feature according to the third feature and the first number, in accordance with an embodiment of the present disclosure. Fig. 3 further illustrates step 206 of fig. 2 and, based on the embodiment shown in fig. 3, comprises the following steps:
In step 301, if the first number is equal to 1, the third feature is taken as the first feature.
In this embodiment, since the first feature extraction network contains a first number of first feature extraction blocks, a first number of 1 means that the first feature extraction network is a single first feature extraction block, and the third feature output by that block can be used as the first feature.
In step 302, if the first number is greater than 1, the third feature is fed back into the first feature extraction block to extract a new third feature, until the first feature extraction block has completed the first number of consecutive feature extractions, and the resulting third feature is used as the first feature.
In this embodiment, if the first feature extraction network is composed of two or more first feature extraction blocks, then after feature extraction has been performed by the first feature extraction blocks the corresponding number of times, the last third feature is used as the first feature representing the feature information of the facial parts in the face image.
In this embodiment, the first feature is obtained according to the number of executions of the first feature extraction block, and the third feature finally output by the first feature extraction block is used as the first feature to represent the feature information of the key facial parts, providing important feature data for face alignment.
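Expressed as code, this repetition is a simple chain of identical blocks, as in the sketch below; FeatureExtractionBlock is a hypothetical stand-in for the ECCB-DCANet-ECCB-SENet-DCANet block, is not defined by the patent text, and is assumed to map a feature tensor to a feature tensor.

```python
import torch.nn as nn

def build_first_feature_network(block_factory, first_number: int) -> nn.Sequential:
    """Chain `first_number` copies of the first feature extraction block.

    The output (third feature) of block i is the input of block i + 1; the
    last block's output is the first feature."""
    assert first_number >= 1
    return nn.Sequential(*[block_factory() for _ in range(first_number)])

# Hypothetical usage with four consecutive blocks:
# network = build_first_feature_network(lambda: FeatureExtractionBlock(64), 4)
```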
Fig. 4 is a schematic diagram of the elastic circular convolution module ECCB according to an embodiment of the present disclosure. Fig. 4 is a specific illustration of fig. 1 or 2; based on the embodiment shown in fig. 4, the ECCB includes convolution and position-aware circular convolution ParC for extracting global features. In this embodiment, the global features in the face image are extracted through the ECCB, and the first feature extraction block provides abundant features for face alignment.
Fig. 5 is a flow chart of extracting the first global feature or the second global feature through the elastic circular convolution module ECCB according to the preset computing power, in accordance with an embodiment of the present disclosure. Fig. 5 is a specific illustration of steps 201 and 203 of fig. 2 and, based on the embodiment shown in fig. 5, comprises the following steps:
In step 501, if the preset computing power is a low computing power, the first global feature or the second global feature is extracted through the elastic circular convolution module ECCB, where the ECCB is composed of position-aware circular convolution ParC followed by convolution.
In this embodiment, if the input data is processed on a low-computing-power device, the first global feature or the second global feature is extracted through the ECCB; in this case the ECCB is composed of ParC followed by a 1x1 convolution Conv with stride s, where the activation functions of both ParC and Conv are ReLU6.
Step 502, if the preset computing power is a high computing power, the first global feature or the second global feature is extracted through the elastic circular convolution module ECCB, where the ECCB is composed of convolution, position-aware circular convolution ParC and convolution in sequence.
In this embodiment, if the input data is processed on a high-computing-power device, the first global feature or the second global feature is extracted through the ECCB; in this case the ECCB is composed of a 1x1 convolution Conv with stride s, followed by ParC, followed by another 1x1 convolution Conv, where the activation function of ParC is ReLU and the activation function of the second Conv is linear.
In this embodiment, a large number of multi-channel global features that retain position sensitivity are extracted through the ECCB, providing key facial part features for face alignment.
Fig. 5a is a schematic diagram of the composition of the ECCB under different computing powers in an embodiment of the present disclosure. In this embodiment, the ECCB has two convolution structures, each adapted to a different class of computing device. For low-computing-power devices, ParC is first used to extract basic features, and then point-wise (1x1) convolution changes the dimensionality of the feature map to obtain a multi-dimensional feature map; this largely preserves the effective spatial information among the features, and ReLU6 is used as the activation function. For high-computing-power devices, point-wise convolution is used to expand the channels, ParC is used to extract features, and a linear activation function replaces ReLU6 for the final output features. In actual development and application, the encapsulated network exposes an interface, and the specific convolution variant can be selected through the parameters of that interface. This provides global features for face alignment and enhances sensitivity to the main parts of the face.
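The two configurations can be sketched in PyTorch as follows, with ParC standing for the position-aware circular convolution sketched earlier; the channel expansion factor and the omission of stride handling are simplifying assumptions.

```python
import torch.nn as nn

def make_eccb(channels: int, size: int, high_power: bool) -> nn.Sequential:
    """Elastic circular convolution module (sketch of the two variants)."""
    if not high_power:
        # Low computing power: ParC extracts basic features, then a
        # point-wise (1x1) convolution changes the feature-map
        # dimensionality; ReLU6 activations follow both, per the text.
        return nn.Sequential(
            ParC(channels, size),
            nn.ReLU6(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.ReLU6(inplace=True),
        )
    # High computing power: a point-wise convolution expands the channels
    # (factor 2 is an assumption), ParC extracts features with a ReLU
    # activation, and the final point-wise convolution keeps a linear
    # activation, i.e. no nonlinearity is applied to the output.
    expanded = channels * 2
    return nn.Sequential(
        nn.Conv2d(channels, expanded, kernel_size=1),
        ParC(expanded, size),
        nn.ReLU(inplace=True),
        nn.Conv2d(expanded, channels, kernel_size=1),
    )
```

Exposing high_power as a parameter mirrors the text's point that the convolution variant is chosen through the network's interface rather than by restructuring the network.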
In the embodiment of the disclosure, the loss function of the first feature extraction network is wing-loss. The mathematical expression is as follows:

wing(x) = w ln(1 + |x| / ε), if |x| < w
wing(x) = |x| - C, otherwise

where the non-negative number w sets the range of the non-linear part to (-w, w), ε limits the curvature of the non-linear region, and C = w - w ln(1 + w / ε) is a constant connecting the linear and non-linear parts so that the curve is smooth. When the error between the predicted value and the true value is small, the loss function amplifies the gradient, which improves the training effect and performance of the model and the accuracy of face key point detection.
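A direct implementation of this definition might look like the sketch below; the default w = 10 and ε = 2 follow common choices for wing-loss in the literature and are assumptions, not values the disclosure specifies.

```python
import math

import torch

def wing_loss(pred: torch.Tensor, target: torch.Tensor,
              w: float = 10.0, eps: float = 2.0) -> torch.Tensor:
    """wing(x) = w * ln(1 + |x| / eps)  if |x| < w, else |x| - C,
    with C = w - w * ln(1 + w / eps) joining the two pieces smoothly."""
    x = (pred - target).abs()
    c = w - w * math.log(1.0 + w / eps)
    # The logarithmic branch amplifies gradients for small errors; the
    # linear branch keeps large errors from dominating training.
    loss = torch.where(x < w, w * torch.log1p(x / eps), x - c)
    return loss.mean()

# Example: loss over a batch of 8 predictions of 98 two-dimensional key points.
loss = wing_loss(torch.randn(8, 98, 2), torch.randn(8, 98, 2))
```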
Fig. 6 is a schematic diagram of a face alignment result according to an embodiment of the present disclosure. In this embodiment, the facial feature information in the face image is extracted through the first feature extraction network and mapped to a preset number of positions through the fully connected neural network, forming the face key points, including the key points of the facial contour, lips, nose, eyes and eyebrows. This completes face alignment and ensures that the facial parts can be identified across different subjects.
In the face alignment method provided by the embodiment of the disclosure, a face image is acquired, providing a data source for face alignment; a first feature is extracted from the face image through a first feature extraction network composed of a first number of first feature extraction blocks, each composed of the following networks in sequence: an elastic circular convolution module ECCB, a deep connected attention network DCANet, an elastic circular convolution module ECCB, a squeeze-and-excitation network SENet, and a deep connected attention network DCANet. This structure can capture a large number of multi-channel global features while keeping them position-sensitive, allows the convolution variant to be selected according to the computing power of the platform so that a corresponding model is obtained without changing the network structure, strengthens the connection between the effective feature maps of the channels, and reinforces face key part information, yielding a high-precision face alignment model applicable to platforms of various computing powers. Based on the first feature, face key points are determined through a fully connected neural network and used for face alignment, improving both the model's adaptability to platforms and the recognition accuracy of face key points. Corresponding to the methods provided in the foregoing embodiments, the disclosure further provides a face alignment device; since the device provided in the embodiments of the disclosure corresponds to the methods provided in the foregoing embodiments, the implementation of the method is also applicable to the device and will not be described again in detail here.
Fig. 7 is a schematic structural diagram of a face alignment device 700 according to an embodiment of the disclosure. As shown in fig. 7, the face alignment device includes:
An acquisition module 710, configured to acquire a face image;
A feature extraction module 720, configured to extract a first feature through a first feature extraction network based on the face image, where the first feature extraction network is composed of a first number of first feature extraction blocks, and each first feature extraction block includes: an elastic circular convolution module ECCB, a deep connected attention network DCANet, an elastic circular convolution module ECCB, a squeeze-and-excitation network SENet, and a deep connected attention network DCANet;
A determining module 730, configured to determine face key points through a fully connected neural network based on the first feature.
The feature extraction module 720 is further configured to:
Extracting a first global feature through the elastic circular convolution module ECCB according to a preset computing power;
Extracting a second feature through the deep connected attention network DCANet based on the first global feature, the second feature comprising at least one attention feature;
Extracting a second global feature through the elastic circular convolution module ECCB based on the second feature and the preset computing power, wherein the dimension of the second global feature is higher than that of the first global feature;
Extracting vector distance features between feature vectors of different channels through the squeeze-and-excitation network SENet based on the second global feature;
Extracting a third feature through the deep connected attention network DCANet based on the vector distance features, wherein the third feature comprises an attention feature and a face key region feature;
Determining the first feature based on the third feature and the first number.
The feature extraction module 720 determines the first feature from the third feature and the first number in the following manner:
If the first number is equal to 1, taking the third feature as the first feature;
If the first number is greater than 1, inputting the third feature into the first feature extraction block to extract a new third feature, until the first feature extraction block has completed the first number of consecutive feature extractions, and taking the resulting third feature as the first feature.
The elastic circular convolution module ECCB of the feature extraction module 720 includes convolution and position-aware circular convolution ParC, which are used to extract global features.
The feature extraction module 720 extracts the first global feature or the second global feature through the elastic circular convolution module ECCB according to the preset computing power in the following manner:
If the preset computing power is a low computing power, extracting the first global feature or the second global feature through the elastic circular convolution module ECCB, wherein the ECCB is composed of position-aware circular convolution ParC followed by convolution;
If the preset computing power is a high computing power, extracting the first global feature or the second global feature through the elastic circular convolution module ECCB, wherein the ECCB is composed of convolution, position-aware circular convolution ParC and convolution in sequence.
The loss function of the first feature extraction network of the feature extraction module 720 is wing-loss.
In summary, the face alignment device acquires a face image; extracts a first feature through a first feature extraction network based on the face image, the first feature extraction network being composed of a first number of first feature extraction blocks, each composed of the following networks in sequence: an elastic circular convolution module ECCB, a deep connected attention network DCANet, an elastic circular convolution module ECCB, a squeeze-and-excitation network SENet, and a deep connected attention network DCANet; and determines, based on the first feature, face key points through a fully connected neural network, the face key points being used for face alignment. The device solves the problems of inconvenient deployment on different platforms, weak feature association among per-channel feature maps and low recognition accuracy for key facial parts, and improves the adaptability of the model to platforms and the recognition accuracy of face key points.
The embodiments above describe the method and the device provided by the embodiments of the application. To implement the functions of the method provided by the embodiments of the application, the electronic device may include a hardware structure, a software module, or both, with each of the functions above implemented as a hardware structure, a software module, or a combination of a hardware structure and a software module.
Fig. 8 is a block diagram of an electronic device 800 for implementing the face alignment method described above, according to an example embodiment.
For example, electronic device 800 may be a mobile phone, computer, messaging device, game console, tablet device, medical device, exercise device, personal digital assistant, or the like.
Referring to fig. 8, an electronic device 800 may include one or more of the following components: a processing component 802, a memory 804, a power component 806, a multimedia component 808, an audio component 810, an input/output (I/O) interface 812, a sensor component 814, and a communication component 816.
The processing component 802 generally controls overall operation of the electronic device 800, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 802 may include one or more processors 820 to execute instructions to perform all or part of the steps of the methods described above. Further, the processing component 802 can include one or more modules that facilitate interactions between the processing component 802 and other components. For example, the processing component 802 can include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.
The memory 804 is configured to store various types of data to support operations at the electronic device 800. Examples of such data include instructions for any application or method operating on the electronic device 800, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 804 may be implemented by any type of volatile or non-volatile memory device or combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic disk or optical disk.
The power supply component 806 provides power to the various components of the electronic device 800. The power components 806 may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for the electronic device 800.
The multimedia component 808 includes a screen between the electronic device 800 and the user that provides an output interface. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from a user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensor may sense not only the boundary of a touch or sliding action, but also the duration and pressure associated with the touch or sliding operation. In some embodiments, the multimedia component 808 includes a front camera and/or a rear camera. When the electronic device 800 is in an operational mode, such as a shooting mode or a video mode, the front camera and/or the rear camera may receive external multimedia data. Each front camera and rear camera may be a fixed optical lens system or have focal length and optical zoom capabilities.
The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a Microphone (MIC) configured to receive external audio signals when the electronic device 800 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may be further stored in the memory 804 or transmitted via the communication component 816. In some embodiments, audio component 810 further includes a speaker for outputting audio signals.
The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be a keyboard, click wheel, buttons, etc. These buttons may include, but are not limited to: homepage button, volume button, start button, and lock button.
The sensor assembly 814 includes one or more sensors for providing status assessments of various aspects of the electronic device 800. For example, the sensor assembly 814 may detect an on/off state of the electronic device 800 and the relative positioning of components, such as the display and keypad of the electronic device 800; the sensor assembly 814 may also detect a change in position of the electronic device 800 or of a component of the electronic device 800, the presence or absence of user contact with the electronic device 800, the orientation or acceleration/deceleration of the electronic device 800, and a change in its temperature. The sensor assembly 814 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor assembly 814 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 816 is configured to facilitate wired or wireless communication between the electronic device 800 and other devices. The electronic device 800 may access a wireless network based on a communication standard, such as WiFi, 2G, 3G, 4G LTE, 5G NR (New Radio), or a combination thereof. In one exemplary embodiment, the communication component 816 receives broadcast signals or broadcast related information from an external broadcast management system via a broadcast channel. In one exemplary embodiment, the communication component 816 further includes a Near Field Communication (NFC) module to facilitate short range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, Infrared Data Association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the electronic device 800 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic elements for executing the above method.
In an exemplary embodiment, a non-transitory computer readable storage medium is also provided, such as memory 804 including instructions executable by processor 820 of electronic device 800 to perform the above-described method. For example, the non-transitory computer readable storage medium may be ROM, random Access Memory (RAM), CD-ROM, magnetic tape, floppy disk, optical data storage device, etc.
Embodiments of the present disclosure also provide a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the face alignment method described in the above-described embodiments of the present disclosure.
Embodiments of the present disclosure also provide a computer program product comprising a computer program which, when executed by a processor, performs the face alignment method described in the above embodiments of the present disclosure.
Fig. 9 is a schematic structural diagram of a chip 900 for implementing the above-mentioned face alignment method according to an exemplary embodiment.
Referring to fig. 9, a chip 900 includes at least one communication interface 901 and a processor 902; the communication interface 901 is configured to receive a signal input to the chip 900 or a signal output from the chip 900, and the processor 902 communicates with the communication interface 901 and implements the face alignment method described in the above embodiments through logic circuits or executing code instructions.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the foregoing figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate, such that the embodiments of the disclosure described herein are capable of operation in sequences other than those illustrated or described herein. The implementations described in the exemplary embodiments above do not represent all implementations consistent with the present disclosure; rather, they are merely examples of apparatus and methods consistent with some aspects of the present disclosure as detailed in the accompanying claims.
In the description of the present specification, reference is made to the description of the terms "one embodiment," "some embodiments," "illustrative embodiments," "examples," "specific examples," or "some examples," etc., meaning that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present disclosure. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and alternative implementations are included within the scope of the preferred embodiments of the present disclosure, in which functions may be executed out of the order shown or discussed, including substantially concurrently or in reverse order depending on the functionality involved, as would be understood by those skilled in the art of the embodiments of the present disclosure.
Logic and/or steps represented in the flowcharts or otherwise described herein, for example an ordered listing of executable instructions for implementing logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, a system including a processing module, or another system that can fetch the instructions from the instruction execution system, apparatus, or device and execute them. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium include the following: an electrical connection having one or more wires (an electronic device), a portable computer diskette (a magnetic device), a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CD-ROM). Additionally, the computer-readable medium may even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via, for instance, optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that portions of embodiments of the present disclosure may be implemented in hardware, software, firmware, or a combination thereof. In the above-described embodiments, the various steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, may be implemented using any one or combination of the following techniques, as is well known in the art: discrete logic circuits having logic gates for implementing logic functions on data signals, application specific integrated circuits having suitable combinational logic gates, programmable Gate Arrays (PGAs), field Programmable Gate Arrays (FPGAs), and the like.
Those of ordinary skill in the art will appreciate that all or a portion of the steps carried out in the method of the above-described embodiments may be implemented by a program to instruct related hardware, and the program may be stored in a computer readable storage medium, where the program when executed includes one or a combination of the steps of the method embodiments.
Furthermore, functional units in various embodiments of the present disclosure may be integrated into one processing module, or each unit may exist alone physically, or two or more units may be integrated into one module. The integrated modules may be implemented in hardware or in software functional modules. The integrated modules may also be stored in a computer readable storage medium if implemented as software functional modules and sold or used as a stand-alone product. The above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, or the like.
While embodiments of the present disclosure have been shown and described above, it will be understood that the above embodiments are illustrative and not to be construed as limiting the present disclosure, and that variations, modifications, alternatives, and variations of the above embodiments may be made by those of ordinary skill in the art within the scope of the present disclosure.

Claims (10)

1. A face alignment method, the method comprising:
Acquiring a face image;
Based on the face image, extracting a first feature through a first feature extraction network, wherein the first feature extraction network is composed of a first number of first feature extraction blocks, and each first feature extraction block is composed of the following networks in sequence: an elastic circular convolution module ECCB, a deep connected attention network DCANet, the elastic circular convolution module ECCB, a squeeze-and-excitation network SENet, and the deep connected attention network DCANet;
And determining face key points through a fully connected neural network based on the first characteristics, wherein the face key points are used for face alignment.
2. The method of claim 1, wherein the extracting a first feature through a first feature extraction network based on the face image comprises:
Extracting a first global feature through the elastic circular convolution module ECCB according to a preset computing power;
Extracting a second feature through the deep connected attention network DCANet based on the first global feature, the second feature including at least one attention feature;
Extracting a second global feature through the elastic circular convolution module ECCB based on the second feature and the preset computing power, wherein the dimension of the second global feature is higher than that of the first global feature;
Extracting vector distance features between feature vectors of different channels through the squeeze-and-excitation network SENet based on the second global feature;
Extracting a third feature through the deep connected attention network DCANet based on the vector distance features, the third feature including an attention feature and a face key region feature;
determining the first feature based on the third feature and the first number.
3. The method of claim 2, wherein said determining said first feature from said third feature and said first number comprises:
if the first number is equal to 1, taking the third feature as the first feature;
if the first number is greater than 1, inputting the third feature into the first feature extraction block to extract a new third feature, until the first feature extraction block has completed the first number of consecutive feature extractions, and taking the resulting third feature as the first feature.
4. The method according to claim 1 or 2, wherein the elastic circular convolution module ECCB comprises convolution and position-aware circular convolution ParC for extracting global features.
5. The method of claim 2, wherein extracting, through the elastic circular convolution module ECCB, the first global feature or the second global feature according to the preset computing power comprises:
if the preset computing power is a low computing power, extracting the first global feature or the second global feature through the elastic circular convolution module ECCB, wherein the elastic circular convolution module ECCB is composed of the position-aware circular convolution ParC followed by the convolution;
if the preset computing power is a high computing power, extracting the first global feature or the second global feature through the elastic circular convolution module ECCB, wherein the elastic circular convolution module ECCB is composed of the convolution, the position-aware circular convolution ParC and the convolution in sequence.
6. The method of claim 1, wherein the loss function of the first feature extraction network is wing-loss.
7. A face alignment device, the device comprising:
an acquisition module, configured to acquire a face image;
a feature extraction module, configured to extract a first feature through a first feature extraction network based on the face image, where the first feature extraction network is composed of a first number of first feature extraction blocks, and each first feature extraction block includes: an elastic circular convolution module ECCB, a deep connected attention network DCANet, the elastic circular convolution module ECCB, a squeeze-and-excitation network SENet, and the deep connected attention network DCANet; and
a determining module, configured to determine face key points through a fully connected neural network based on the first feature.
8. An electronic device, comprising:
at least one processor; and
A memory communicatively coupled to the at least one processor; wherein,
The memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-6.
9. A non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method of any one of claims 1-6.
10. A chip comprising at least one processor and a communication interface; the communication interface is configured to receive signals input to or output from the chip, and the processor is in communication with the communication interface and implements the method according to any one of claims 1-6 by logic circuitry or execution of code instructions.
CN202311550037.7A 2023-11-20 2023-11-20 Face alignment method and device, electronic equipment, chip and medium Pending CN118015669A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311550037.7A CN118015669A (en) 2023-11-20 2023-11-20 Face alignment method and device, electronic equipment, chip and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311550037.7A CN118015669A (en) 2023-11-20 2023-11-20 Face alignment method and device, electronic equipment, chip and medium

Publications (1)

Publication Number Publication Date
CN118015669A true CN118015669A (en) 2024-05-10

Family

ID=90943503

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311550037.7A Pending CN118015669A (en) 2023-11-20 2023-11-20 Face alignment method and device, electronic equipment, chip and medium

Country Status (1)

Country Link
CN (1) CN118015669A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination