CN109934058B - Face image processing method, face image processing device, electronic apparatus, storage medium, and program - Google Patents

Info

Publication number
CN109934058B
Authority
CN
China
Prior art keywords
face
face image
key points
key point
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201711352295.9A
Other languages
Chinese (zh)
Other versions
CN109934058A
Inventor
吴文岩
王权
钱晨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sensetime Technology Development Co Ltd
Original Assignee
Beijing Sensetime Technology Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Sensetime Technology Development Co Ltd filed Critical Beijing Sensetime Technology Development Co Ltd
Priority to CN202110982660.4A (published as CN113688737A)
Priority to CN201711352295.9A (published as CN109934058B)
Publication of CN109934058A
Application granted
Publication of CN109934058B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F 18/2155: Generating training patterns; Bootstrap methods, e.g. bagging or boosting characterised by the incorporation of unlabelled data, e.g. multiple instance learning [MIL], semi-supervised techniques using expectation-maximisation [EM] or naïve labelling
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)
  • Processing Or Creating Images (AREA)
  • Image Processing (AREA)

Abstract

Embodiments of the present application disclose a face image processing method, a method and an apparatus for training a neural network, an electronic device, a computer-readable storage medium, and a computer program. The face image processing method mainly includes: inputting a face image to be processed into a neural network; and extracting, through the neural network, face key point information of the face image to be processed, the information including at least one three-dimensional stationary key point.

Description

Face image processing method, face image processing device, electronic apparatus, storage medium, and program
Technical Field
The present application relates to computer vision technology, and in particular, to a face image processing method, a face image processing apparatus, a method for training a neural network, an apparatus for training a neural network, an electronic device, a computer-readable storage medium, and a computer program.
Background
Face key points are an indispensable part of many applications such as face recognition. Accurately determining face key points, especially for faces with large expressions or large-angle deflection, facilitates correcting the relevant positions of the face and enhancing the semantic information of the face.
Disclosure of Invention
The embodiment of the application provides a technical scheme for processing a face image.
According to one aspect of the embodiments of the present application, there is provided a face image processing method, including: inputting a face image to be processed into a neural network; and extracting, through the neural network, face key point information of the face image to be processed, the information including at least one three-dimensional stationary key point.
In an embodiment of the present application, the face key point information extracted by the neural network includes: face key point information located at unoccluded positions and/or occluded positions in the face image to be processed.
In another embodiment of the present application, the three-dimensional stationary key points include: at least one of an eyelid center key point, a nose lower edge center key point, a lip peak key point, a lip center key point, and a chin center key point.
In yet another embodiment of the present application, the eyelid center key points include: at least one of a left eye upper eyelid center keypoint, a left eye lower eyelid center keypoint, a right eye upper eyelid center keypoint, and a right eye lower eyelid center keypoint.
In yet another embodiment of the present application, the lip center key points include: at least one of a central key point of an upper lip line of the upper lip, a central key point of a lower lip line of the upper lip, a central key point of an upper lip line of the lower lip, and a central key point of a lower lip line of the lower lip.
In yet another embodiment of the present application, the lip peak key points include: at least one of lip left lip peak keypoints and lip right lip peak keypoints.
In another embodiment of the present application, the face key point information extracted by the neural network includes: information on 68, 106, 240, 252, or 270 face key points.
In yet another embodiment of the present application, the neural network is trained using face image samples labeled with face key point annotation information that includes at least one three-dimensional stationary key point.
In yet another embodiment of the present application, the method further comprises: obtaining a plurality of face image samples, each labeled with face key point annotation information that includes at least one three-dimensional stationary key point; and training the neural network using the plurality of face image samples.
In yet another embodiment of the present application, the face image sample is labeled with face key point annotation information located at unoccluded positions and/or occluded positions in the face image sample.
In another embodiment of the present application, the face key point annotation information of the face image sample includes: annotation information for 68, 106, 240, 252, or 270 face key points.
In another embodiment of the present application, before the training of the neural network using the face image samples, the method further includes: determining curve control points of at least one part of a face in a face image sample; forming a curve according to the curve control points; determining, on the curve, a point to serve as a three-dimensional stationary key point of the face image sample; selecting points on the curve on at least one side of the three-dimensional stationary key point of the face image sample; and forming face key point annotation information according to the selected points and the three-dimensional stationary key point of the face image sample.
In another embodiment of the present application, the determining, on the curve, a point to serve as the three-dimensional stationary key point of the face image sample includes: determining a point on the curve whose distance from the three-dimensional stationary key point annotated on the face image sample meets a predetermined distance requirement, and taking that point as the three-dimensional stationary key point of the face image sample.
In another embodiment of the present application, the training the neural network using the plurality of face image samples includes: inputting a face image sample into the neural network, and extracting face key point information of the face image sample through the neural network; calculating a first distance between different key points on at least one organ according to the extracted face key point information; calculating a second distance between different key points on the at least one organ according to the face key point annotation information; and performing supervised learning on the neural network via a first loss function based on the first distance and the second distance.
In another embodiment of the present application, the calculating a first distance between different key points on at least one organ according to the extracted face key point information includes: for each group of key points formed by an extracted key point on the upper eyelid line of the left eye and the corresponding extracted key point on the lower eyelid line of the left eye, calculating the distance between the two key points in the group to obtain at least one first distance based on the left-eye eyelid line; and/or for each group of key points formed by an extracted key point on the upper eyelid line of the right eye and the corresponding extracted key point on the lower eyelid line of the right eye, calculating the distance between the two key points in the group to obtain at least one first distance based on the right-eye eyelid line; and/or for each group of key points formed by an extracted key point on the upper lip line of the upper lip and the corresponding extracted key point on the lower lip line of the lower lip, calculating the distance between the two key points in the group to obtain at least one first distance based on the outer lip line; and/or for each group of key points formed by an extracted key point on the lower lip line of the upper lip and the corresponding extracted key point on the upper lip line of the lower lip, calculating the distance between the two key points in the group to obtain at least one first distance based on the inner lip line.
In yet another embodiment of the present application, the first loss function includes: a first loss function for a first distance based on a left eye eyelid line, a first loss function for a first distance based on a right eye eyelid line, a first loss function for a first distance based on an outer lip line, and/or a first loss function for a first distance based on an inner lip line.
In another embodiment of the present application, the training the neural network using the plurality of face image samples includes: for at least one group of key points formed by an extracted face key point and the corresponding annotated face key point, determining the distance between the two key points in the group to obtain at least one drift distance; and performing supervised learning on the neural network via a second loss function based on the drift distance.
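As a purely illustrative aid to the drift-distance supervision described above, the following Python sketch computes a mean drift distance between extracted and annotated key points; the function name, the array shapes, and the use of the mean as the aggregate are assumptions for illustration only, not requirements of this embodiment.

    import numpy as np

    def drift_distance_loss(pred_points, gt_points):
        # Sketch of a second loss function: each extracted key point is paired with
        # its annotated counterpart, and the per-pair drift distance is averaged.
        pred = np.asarray(pred_points, dtype=np.float32)  # (N, 2) extracted coordinates
        gt = np.asarray(gt_points, dtype=np.float32)      # (N, 2) annotated coordinates
        drift = np.linalg.norm(pred - gt, axis=1)         # drift distance per key point group
        return float(drift.mean())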
According to another aspect of the embodiments of the present application, there is provided a method for training a neural network, the method including: obtaining a plurality of face image samples, each labeled with face key point annotation information that includes at least one three-dimensional stationary key point; and training a neural network using the plurality of face image samples.
In an embodiment of the present application, the face image sample is labeled with face key point annotation information located at unoccluded positions and/or occluded positions in the face image sample.
In another embodiment of the present application, the face key point annotation information of the face image sample includes: annotation information for 68, 106, 240, 252, or 270 face key points.
In another embodiment of the present application, before the training of the neural network using the plurality of face image samples, the method further includes: determining curve control points of at least one part of a face in a face image sample; forming a curve according to the curve control points; determining, on the curve, a point to serve as a three-dimensional stationary key point of the face image sample; selecting points on the curve on at least one side of the three-dimensional stationary key point of the face image sample; and forming face key point annotation information according to the selected points and the three-dimensional stationary key point of the face image sample.
In another embodiment of the present application, the determining, on the curve, a point to serve as the three-dimensional stationary key point of the face image sample includes: determining a point on the curve whose distance from the three-dimensional stationary key point annotated on the face image sample meets a predetermined distance requirement, and taking that point as the three-dimensional stationary key point of the face image sample.
In another embodiment of the present application, the training a neural network using the plurality of face image samples includes: inputting a face image sample into the neural network, and extracting face key point information of the face image sample through the neural network; calculating a first distance between different key points on at least one organ according to the extracted face key point information; calculating a second distance between different key points on the at least one organ according to the face key point annotation information; and performing supervised learning on the neural network via a first loss function based on the first distance and the second distance.
In another embodiment of the present application, the calculating a first distance between different key points on at least one organ according to the extracted face key point information includes: for each group of key points formed by an extracted key point on the upper eyelid line of the left eye and the corresponding extracted key point on the lower eyelid line of the left eye, calculating the distance between the two key points in the group to obtain at least one first distance based on the left-eye eyelid line; and/or for each group of key points formed by an extracted key point on the upper eyelid line of the right eye and the corresponding extracted key point on the lower eyelid line of the right eye, calculating the distance between the two key points in the group to obtain at least one first distance based on the right-eye eyelid line; and/or for each group of key points formed by an extracted key point on the upper lip line of the upper lip and the corresponding extracted key point on the lower lip line of the lower lip, calculating the distance between the two key points in the group to obtain at least one first distance based on the outer lip line; and/or for each group of key points formed by an extracted key point on the lower lip line of the upper lip and the corresponding extracted key point on the upper lip line of the lower lip, calculating the distance between the two key points in the group to obtain at least one first distance based on the inner lip line.
In yet another embodiment of the present application, the first loss function includes: a first loss function for a first distance based on a left eye eyelid line, a first loss function for a first distance based on a right eye eyelid line, a first loss function for a first distance based on an outer lip line, and/or a first loss function for a first distance based on an inner lip line.
In another embodiment of the present application, the training the neural network using the plurality of face image samples includes: for at least one group of key points formed by an extracted face key point and the corresponding annotated face key point, determining the distance between the two key points in the group to obtain at least one drift distance; and performing supervised learning on the neural network via a second loss function based on the drift distance.
According to still another aspect of the embodiments of the present application, there is provided a face image processing apparatus, including: an image input module, configured to input a face image to be processed into a neural network; and a key point acquisition module, configured to extract, through the neural network, face key point information of the face image to be processed, the information including at least one three-dimensional stationary key point.
In an embodiment of the present application, the face image processing apparatus further includes an apparatus for training a neural network, the apparatus for training a neural network comprising: a sample obtaining module, configured to obtain a plurality of face image samples, each labeled with face key point annotation information that includes at least one three-dimensional stationary key point; and a training module, configured to train the neural network using the plurality of face image samples.
In another embodiment of the present application, the apparatus for training a neural network further includes a labeling module, configured to: determine curve control points of at least one part of a face in a face image sample; form a curve according to the curve control points; determine, on the curve, a point to serve as a three-dimensional stationary key point of the face image sample; select points on the curve on at least one side of the three-dimensional stationary key point of the face image sample; and form face key point annotation information according to the selected points and the three-dimensional stationary key point of the face image sample.
In another embodiment of the present application, the training module is specifically configured to: input a face image sample into the neural network, and extract face key point information of the face image sample through the neural network; calculate a first distance between different key points on at least one organ according to the extracted face key point information; calculate a second distance between different key points on the at least one organ according to the face key point annotation information; and perform supervised learning on the neural network via a first loss function based on the first distance and the second distance.
In another embodiment of the present application, the training module is specifically configured to: for at least one group of key points formed by an extracted face key point and the corresponding annotated face key point, determine the distance between the two key points in the group to obtain at least one drift distance; and perform supervised learning on the neural network via a second loss function based on the drift distance.
According to still another aspect of the embodiments of the present application, there is provided an apparatus for training a neural network, the apparatus including: a sample obtaining module, configured to obtain a plurality of face image samples, each labeled with face key point annotation information that includes at least one three-dimensional stationary key point; and a training module, configured to train the neural network using the plurality of face image samples.
In an embodiment of the present application, the apparatus further includes a labeling module, configured to: determine curve control points of at least one part of a face in a face image sample; form a curve according to the curve control points; determine, on the curve, a point to serve as a three-dimensional stationary key point of the face image sample; select points on the curve on at least one side of the three-dimensional stationary key point of the face image sample; and form face key point annotation information according to the selected points and the three-dimensional stationary key point of the face image sample.
In another embodiment of the present application, the training module is specifically configured to: input a face image sample into the neural network, and extract face key point information of the face image sample through the neural network; calculate a first distance between different key points on at least one organ according to the extracted face key point information; calculate a second distance between different key points on the at least one organ according to the face key point annotation information; and perform supervised learning on the neural network via a first loss function based on the first distance and the second distance.
In another embodiment of the present application, the training module is specifically configured to: for at least one group of key points formed by an extracted face key point and the corresponding annotated face key point, determine the distance between the two key points in the group to obtain at least one drift distance; and perform supervised learning on the neural network via a second loss function based on the drift distance.
According to still another aspect of embodiments of the present application, there is provided an electronic apparatus including: a memory for storing a computer program; a processor for executing the computer program stored in the memory, and when the computer program is executed, implementing the steps in any human face image processing method in the present application, or implementing the steps in any method for training a neural network in the present application.
According to a further aspect of the embodiments of the present application, there is provided a computer-readable storage medium on which a computer program is stored, the computer program, when executed by a processor, implementing the steps in any of the face image processing methods in the present application or implementing the steps in any of the methods for training a neural network in the present application.
According to a further aspect of an embodiment of the present application, there is provided a computer program comprising computer instructions for implementing the steps of any of the face image processing methods of the present application, or for implementing the steps of any of the methods for training a neural network of the present application, when the computer instructions are run in a processor of a device.
With the face image processing method, the method for training a neural network, the apparatuses, the electronic device, the computer-readable storage medium, and the computer program provided by the present application, a neural network is used to extract three-dimensional stationary key points from the face image to be processed. The semantics of the specific facial part represented by each three-dimensional stationary key point is fixed and does not change, regardless of whether the face is deflected and regardless of the expression on the face. The present application can therefore accurately locate specific parts of faces of various forms, which is important for applications such as image rendering, face swapping, face beautification, face recognition, face state detection, and expression detection. The present application thus helps improve the accuracy of face key point extraction.
The technical solution of the present application is further described in detail by the accompanying drawings and embodiments.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description, serve to explain the principles of the application.
The present application may be more clearly understood from the following detailed description with reference to the accompanying drawings, in which:
fig. 1 is a flowchart of an embodiment of a face image processing method according to the present application;
FIG. 2 is a schematic illustration of three-dimensional stationary key points of the present application;
FIG. 3 is a flowchart of labeling a face image sample according to the present application;
FIG. 4 is a flowchart of one embodiment of a method for training a neural network of the present application;
FIG. 5 is a flowchart of a specific example of a method for training a neural network of the present application;
FIG. 6 is a schematic structural diagram of an embodiment of a face image processing apparatus according to the present application;
FIG. 7 is a schematic diagram of an embodiment of an apparatus for training a neural network according to the present application;
FIG. 8 is a block diagram of an exemplary device implementing embodiments of the present application.
Detailed Description
Various exemplary embodiments of the present invention will now be described in detail with reference to the accompanying drawings. It should be noted that: the relative arrangement of the components and steps, the numerical expressions and numerical values set forth in these embodiments do not limit the scope of the present invention unless specifically stated otherwise.
Meanwhile, it should be understood that the sizes of the respective portions shown in the drawings are not drawn in an actual proportional relationship for the convenience of description.
The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the invention, its application, or uses.
Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail but are intended to be part of the specification where appropriate.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, further discussion thereof is not required in subsequent figures.
Embodiments of the invention are operational with numerous other general purpose or special purpose computing system environments or configurations, and with numerous other electronic devices, such as terminal devices, computer systems, servers, etc. Examples of well known terminal devices, computing systems, environments, and/or configurations that may be suitable for use with electronic devices, such as terminal devices, computer systems, servers, and the like, include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, hand-held or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, networked personal computers, minicomputer systems, mainframe computer systems, distributed cloud computing environments that include any of the above, and the like.
Electronic devices such as terminal devices, computer systems, servers, etc. may be described in the general context of computer system-executable instructions, such as program modules, being executed by a computer system. Generally, program modules may include routines, programs, objects, components, logic, data structures, etc. that perform particular tasks or implement particular abstract data types. The computer system/server may be practiced in distributed cloud computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed cloud computing environment, program modules may be located in both local and remote computer system storage media including memory storage devices.
Exemplary embodiments
FIG. 1 is a flow chart of one embodiment of the method of the present application. As shown in fig. 1, the method of this embodiment includes: step S100 and step S110.
And S100, inputting the face image to be processed into a neural network.
In an optional example, the face image to be processed in the present application may be an image presenting a static picture or a photo, or may also be a video frame presenting a dynamic video, or the like. The face in the face image to be processed may be a front face, or may also be a side face with small angle deflection (for example, the face deflection amplitude is smaller than a predetermined angle), or may also be a side face with large angle deflection (for example, the face deflection amplitude is not smaller than a predetermined angle), and the like. In addition, the face in the facial image to be processed in the present application may be a face with a large expression, for example, a front face with a large expression, for example, a side face with a large expression and a small angular deflection, for example, a side face with a large expression and a large angular deflection, and the like. A large expression in this application generally refers to an expression that causes a large change in the form of at least one organ of a human face compared to its normal form. Large expressions in this application are sometimes also referred to as complex expressions. The method and the device do not limit the specific forms of the deflection angle, the expression and the like of the face in the face image to be processed.
In an optional example, the neural network in the present application is mainly used to extract face key point information, including at least one three-dimensional stationary key point, from the face image to be processed. The network structure of the neural network can be designed flexibly according to the actual requirements of face key point extraction, and the embodiments of the present application do not limit the specific network structure. For example, the neural network of the present application may include, but is not limited to, convolutional layers, nonlinear ReLU layers, pooling layers, fully-connected layers, and the like; the more layers the neural network includes, the deeper the network. For another example, the network structure of the neural network of the present application may adopt, but is not limited to, the structure of a network such as AlexNet, a Deep Residual Network (ResNet), or VGGNet (Visual Geometry Group Network).
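As a non-limiting illustration of the kind of structure mentioned above (convolutional, ReLU, pooling, and fully-connected layers regressing key point coordinates), the following PyTorch sketch is provided; the layer sizes, the 112x112 input, the choice of 106 key points, and the class name are assumptions made for illustration only, not the network structure claimed by this application.

    import torch
    import torch.nn as nn

    class KeypointNet(nn.Module):
        # Illustrative sketch only: convolution + ReLU + pooling stages followed by
        # fully-connected layers that regress (x, y) coordinates for each face key point.
        def __init__(self, num_keypoints=106):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(inplace=True),
                nn.MaxPool2d(2),   # 112 -> 56
                nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(inplace=True),
                nn.MaxPool2d(2),   # 56 -> 28
                nn.Conv2d(64, 128, kernel_size=3, padding=1), nn.ReLU(inplace=True),
                nn.MaxPool2d(2),   # 28 -> 14
            )
            self.regressor = nn.Sequential(
                nn.Flatten(),
                nn.Linear(128 * 14 * 14, 256), nn.ReLU(inplace=True),
                nn.Linear(256, num_keypoints * 2),  # (x, y) per key point
            )

        def forward(self, x):  # x: (B, 3, 112, 112) face crop
            out = self.regressor(self.features(x))
            return out.view(x.size(0), -1, 2)  # (B, num_keypoints, 2)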
The same neural network can extract face key point information from face images to be processed that show a frontal face as well as from those that show a side face. This avoids the key point jumping that would occur if one neural network processed frontal-face frames, another neural network processed side-face frames, and the networks were switched while processing continuously played video frames; it therefore helps improve the continuity of the face key points extracted by the neural network from continuously played video frames.
S110, extracting, through the neural network, face key point information of the face image to be processed, the information including at least one three-dimensional stationary key point.
In an alternative example, a three-dimensional stationary key point in the present application refers to a point of the face that has a predetermined semantic meaning in three-dimensional space. A three-dimensional stationary key point can be represented by its projection onto a two-dimensional plane; that is, it is a point on the two-dimensional plane obtained by projecting a point of the face that carries a predetermined semantic meaning in three-dimensional space. A three-dimensional stationary key point is thus a face key point with predetermined semantics: it clearly indicates a specific part of the face, and the semantics of that part do not change regardless of whether the face is deflected and regardless of the expression on the face. In other words, a three-dimensional stationary key point always represents the same specific part of the face; although the coordinates of that part change as the face deflects or the expression changes, the point always represents a location on the face with a fixed semantic meaning.
In an optional example, the three-dimensional stationary key points extracted by the neural network from the face image to be processed may include: one or more of an eyelid center key point, a nose lower edge center key point, a lip peak key point, a lip center key point, and a chin center key point. The eyelid center key points generally include: one or more of a left-eye upper eyelid center key point, a left-eye lower eyelid center key point, a right-eye upper eyelid center key point, and a right-eye lower eyelid center key point. The lip center key points generally include: one or more of the center key point of the upper lip line of the upper lip (i.e., the center of the upper contour line of the upper lip), the center key point of the lower lip line of the upper lip (i.e., the center of the lower contour line of the upper lip), the center key point of the upper lip line of the lower lip (i.e., the center of the upper contour line of the lower lip), and the center key point of the lower lip line of the lower lip (i.e., the center of the lower contour line of the lower lip). The lip peak key points generally include: one or more of the left lip peak key point and the right lip peak key point.
In an optional example, the neural network of the present application may extract 12 three-dimensional stationary key points from the face image to be processed. As shown in fig. 2, the 12 three-dimensional stationary key points are: the right-eye upper eyelid center key point 1, the right-eye lower eyelid center key point 2, the left-eye upper eyelid center key point 3, the left-eye lower eyelid center key point 4, the nose lower edge center key point 5, the right lip peak key point 6, the center key point 7 of the upper lip line of the upper lip, the left lip peak key point 8, the center key point 9 of the lower lip line of the upper lip, the center key point 10 of the upper lip line of the lower lip, the center key point 11 of the lower lip line of the lower lip, and the chin center key point 12. Although the face in fig. 2 is deflected by a certain angle, the 12 three-dimensional stationary key points retain their predetermined semantics. For example, if the line connecting the left and right mouth corners in fig. 2 is taken as the x axis and its midpoint as the origin where the x axis intersects the y axis, key points 7, 9, 10, and 11 are not on the y axis. That is, key point 7 in fig. 2 is not the center point of the upper lip line of the upper lip (i.e., the upper contour line of the upper lip) in the physical coordinate sense; it is the projection onto the two-dimensional plane of the point of the face that, in three-dimensional space, has the semantics of the center of the upper lip line of the upper lip. Similarly, key point 9 is the projection of the point with the semantics of the center of the lower lip line of the upper lip, key point 10 is the projection of the point with the semantics of the center of the upper lip line of the lower lip, and key point 11 is the projection of the point with the semantics of the center of the lower lip line of the lower lip; none of them is the corresponding physical center point in fig. 2.
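For readability, the numbering of the 12 three-dimensional stationary key points of fig. 2 can be kept in a simple index-to-semantics mapping; the following Python dictionary is only a notational convenience, and the constant name is an assumption.

    # Numbering follows the 12 three-dimensional stationary key points of fig. 2.
    THREE_D_STATIONARY_KEYPOINTS = {
        1: "right-eye upper eyelid center",
        2: "right-eye lower eyelid center",
        3: "left-eye upper eyelid center",
        4: "left-eye lower eyelid center",
        5: "nose lower edge center",
        6: "right lip peak",
        7: "upper lip, upper lip line center",
        8: "left lip peak",
        9: "upper lip, lower lip line center",
        10: "lower lip, upper lip line center",
        11: "lower lip, lower lip line center",
        12: "chin center",
    }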
In an optional example, the face key point information extracted by the neural network from the face image to be processed may include not only face key point information at unoccluded positions but also face key point information at occluded positions in the face image to be processed. The key point information at occluded positions can be regarded as obtained in a perspective (see-through) manner, i.e., estimated as if the occlusion were not present. Extracting key point information at occluded positions allows the extracted face key points to delineate the face accurately, which better serves applications such as image rendering, face swapping, face beautification, makeup, face recognition, face state detection, and expression detection.
In an alternative example, the face key point information extracted from the face image to be processed by the neural network in the present application generally includes: the number of the face key point and the coordinates of the face key point (i.e. the coordinates of the face key point in the coordinate system of the face image to be processed), for example, the neural network extracts the respective numbers and respective coordinates of 21-300 face key points from the face image to be processed. The 21-300 face key points may be specifically 68, 106, 240, 252 or 270, etc. Of course, the neural network in the present application may also extract a greater number or a smaller number of face key point information from the face image to be processed, for example, the neural network in the present application extracts only the above 12 three-dimensional stationary key point information from the face image to be processed. The specific number of the face key point information extracted from the face image to be processed is not limited by the application.
The face key point information including the three-dimensional motionless key points obtained by the method can be used for image rendering, face changing processing, facial beautifying processing, face recognition, face state detection, expression detection and other applications of the face, and the specific application range of the face key point information extracted through the neural network is not limited by the method.
In an alternative example, the application may train a neural network using a plurality of face image samples in a sample data set. The face image samples in the sample data set are provided with face key point marking information, and the face key point marking information comprises the following steps: at least one piece of three-dimensional immobile key point labeling information, for example, the face image sample has 21 to 300 pieces of face key point labeling information, the 21 to 300 pieces of face key point labeling information may specifically be 68, 106, 240, 252 or 270 pieces of face key point labeling information, and the 21 to 300 pieces of face key point labeling information include 12 pieces of three-dimensional immobile key point labeling information. The method and the device do not limit the specific quantity of the face key point annotation information carried by the face image sample and the specific quantity of the three-dimensional immobile key point annotation information contained in the face key point annotation information.
In an optional example, the face key point annotation information carried by the face image sample includes annotation information for occluded face key points; that is, the present application annotates not only the face key points at unoccluded positions in the face image sample but also the face key points at occluded positions. As an optional example, the total number of face key points to be annotated may be preset to 106, in which case 106 pieces of face key point annotation information should be labeled for every face image sample, regardless of the deflection direction and expression of the face in the sample. In addition, for any face key point located at an occluded position, whether it is a three-dimensional stationary key point or another face key point, the annotated coordinates are the perspective (see-through) coordinates of that key point.
In an alternative example, the process of setting the face key point annotation information in the face image sample is shown in fig. 3.
In fig. 3, S300, curve control points on the face in the face image sample are determined, and a curve is formed according to the curve control points.
In an alternative example, all face key points in a face image sample are generally derived, by interpolation, from a number of curves, and each curve is generally formed from curve control points. For example, for the left eye in the face image sample, a plurality of upper eyelid line control points and a plurality of lower eyelid line control points of the left eye are acquired; the upper eyelid line of the left eye is formed using the upper eyelid line control points, and the lower eyelid line of the left eye is formed using the lower eyelid line control points. For the right eye, a plurality of upper eyelid line control points and a plurality of lower eyelid line control points of the right eye are acquired, and the upper and lower eyelid lines of the right eye are formed from them in the same way. For the nose, a plurality of nose lower edge control points are acquired, and the lower edge line of the nose is formed using them. For the mouth, control points of the upper lip line of the upper lip, of the lower lip line of the upper lip, of the upper lip line of the lower lip, and of the lower lip line of the lower lip are acquired, and each of the four lip lines is formed from its respective control points. For the face contour, a plurality of face contour control points are acquired, and the face contour is formed using them.
In an alternative example, the present application may form a corresponding curve from a corresponding plurality of curve control points by interpolating pixel-by-pixel between the curve control points. The present application is not limited to a particular implementation of forming a curve by curve control points.
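A minimal Python sketch of such pixel-by-pixel interpolation between curve control points is given below; piecewise-linear interpolation, the step size, and the function name are assumptions, and a spline or any other interpolation could equally be used.

    import numpy as np

    def curve_from_control_points(control_points, step=1.0):
        # Sketch: densify a contour by interpolating roughly pixel-by-pixel
        # between consecutive curve control points (piecewise-linear here).
        pts = np.asarray(control_points, dtype=np.float32)  # (K, 2) ordered control points
        segments = []
        for p0, p1 in zip(pts[:-1], pts[1:]):
            n = max(int(np.linalg.norm(p1 - p0) / step), 1)  # about one sample per pixel
            t = np.linspace(0.0, 1.0, n, endpoint=False)[:, None]
            segments.append(p0 + t * (p1 - p0))
        segments.append(pts[-1:])
        return np.concatenate(segments, axis=0)              # (M, 2) dense curve points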
And S310, determining, on the curve, a point to serve as a three-dimensional stationary key point of the face image sample.
In an optional example, the application may determine a point on the curve whose distance from the three-dimensional stationary key point marked on the face image sample satisfies a predetermined distance requirement, and use the point satisfying the predetermined distance requirement as the three-dimensional stationary key point of the face image sample.
In an alternative example, the face image sample in the present application is annotated with at least one three-dimensional stationary key point. For example, the coordinates of 12 three-dimensional stationary key points, namely the right-eye upper eyelid center key point, the right-eye lower eyelid center key point, the left-eye upper eyelid center key point, the left-eye lower eyelid center key point, the nose lower edge center key point, the right lip peak key point, the center key point of the upper lip line of the upper lip, the left lip peak key point, the center key point of the lower lip line of the upper lip, the center key point of the upper lip line of the lower lip, the center key point of the lower lip line of the lower lip, and the chin center key point, are marked on the face image sample by manual annotation.
In an alternative example, a three-dimensional stationary key point manually annotated on a face image sample generally corresponds to one curve. For any such annotated key point, at least two points can be selected on the corresponding curve near the annotated key point, and the distance between each selected point and the annotated key point can be calculated from their coordinates; the point with the minimum distance is then selected, and the manually annotated three-dimensional stationary key point can be replaced by this point.
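The nearest-point selection described above can be sketched in a few lines of Python; the function name and the use of the Euclidean distance with a simple arg-min are assumptions for illustration.

    import numpy as np

    def snap_to_curve(annotated_point, curve_points):
        # Sketch: replace a manually annotated three-dimensional stationary key point
        # by the point on the corresponding curve that is closest to it.
        curve = np.asarray(curve_points, dtype=np.float32)  # (M, 2) points on the curve
        p = np.asarray(annotated_point, dtype=np.float32)   # (2,) annotated coordinates
        d = np.linalg.norm(curve - p, axis=1)                # distance to every curve point
        return curve[int(d.argmin())]                        # point with the minimum distance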
The three-dimensional stationary key points of the face image sample can also be set in other ways; for example, the manually annotated three-dimensional stationary key points can be used directly as the three-dimensional stationary key points of the face image sample. The present application does not limit the specific manner of setting the three-dimensional stationary key points of the face image sample.
And S320, selecting points on the curve on at least one side of the three-dimensional stationary key points of the face image sample.
In an alternative example, an equal number of points is generally selected on the left and right sides of the three-dimensional stationary key point, and the points can be selected on the curve by interpolation. For example, if the curve is the upper or lower eyelid line of one eye, 9-14 (e.g., 10) points other than the three-dimensional stationary key point (e.g., the upper or lower eyelid center key point) may be inserted, for example uniformly, on that eyelid line. For another example, if the curve is the upper lip line of the upper lip, 13-18 (e.g., 14) points other than the three-dimensional stationary key points (e.g., the right lip peak key point, the center key point of the upper lip line of the upper lip, and the left lip peak key point) may be inserted, for example uniformly, on that lip line. For another example, if the curve is the upper lip line of the lower lip or a lower lip line, 12-17 (e.g., 14) points other than the three-dimensional stationary key point (e.g., the center key point of the upper lip line of the lower lip or of the lower lip line) may be inserted, for example uniformly, on that lip line. For another example, if the curve is the face contour curve, at least 32 points other than the three-dimensional stationary key point (e.g., the chin center key point) may be inserted, for example uniformly, on the face contour curve.
It should be noted that the number of points inserted on a curve may be determined according to actual requirements, but it should be large enough that the error of the curve fitted through the inserted points, relative to the actual contour of the facial part, is about 1/5 to 1/10 of the error of the curve formed from the curve control points relative to that actual contour. In this way, the shape expressed by the face key point annotation information of the face image sample is closer to the actual shape of the part, which facilitates training the neural network.
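One possible way to select an equal number of points on each side of the stationary key point of a densified curve is sketched below; spacing the points evenly by index along the curve, the helper name, and its arguments are assumptions rather than the required procedure.

    import numpy as np

    def insert_points_around(curve_points, center_idx, num_per_side):
        # Sketch: pick num_per_side points, roughly evenly spaced along the curve,
        # on each side of the stationary key point located at curve index center_idx.
        curve = np.asarray(curve_points, dtype=np.float32)
        left_idx = np.linspace(0, center_idx, num_per_side + 1, dtype=int)[:-1]
        right_idx = np.linspace(center_idx, len(curve) - 1, num_per_side + 1, dtype=int)[1:]
        return curve[left_idx], curve[center_idx], curve[right_idx]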
And S330, forming face key point annotation information according to the selected points and the three-dimensional stationary key points of the face image sample.
In an optional example, the coordinates of the points inserted on a curve and of the three-dimensional stationary key points on that curve serve as the coordinate information of the corresponding face key point annotation information of the face image sample, and their sequence numbers on the curve are converted into the key point numbers in the corresponding annotation information.
The flow of the method for training a neural network of the present application is shown in fig. 4.
In fig. 4, S400, a plurality of face image samples are acquired.
Each face image sample of the present application is annotated with face key point annotation information that includes at least one three-dimensional stationary key point, covering both face key points at unoccluded positions and face key points at occluded positions in the sample. A face image sample may be annotated with 21-300 pieces of face key point annotation information, for example 68, 106, 240, 252, or 270 pieces. The process of annotating a face image sample with face key point annotation information is described above with reference to fig. 3.
And S410, training a neural network by using the obtained multiple face image samples.
In an optional example, the method includes inputting a face image sample into a neural network, extracting face key point information of the face image sample through the neural network, and adjusting the parameters of the neural network using a corresponding loss function, according to the face key point information extracted by the neural network and the face key point annotation information of the face image sample. One specific process of training a neural network using face image samples is described below with reference to fig. 5.
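Before turning to fig. 5, the overall parameter-update step can be sketched as follows in PyTorch; the optimizer handling and the generic loss_fn argument (standing for the distance-based losses described below) are assumptions for illustration, not the claimed training procedure.

    import torch

    def train_step(model, optimizer, images, gt_keypoints, loss_fn):
        # Sketch: extract key points with the neural network, compare them with the
        # annotation through a loss function, and adjust the network parameters.
        model.train()
        pred = model(images)                 # (B, N, 2) extracted key point coordinates
        loss = loss_fn(pred, gt_keypoints)   # e.g. distance-based first/second losses
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()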
In an alternative example, a flow of a specific example of a method for training a neural network of the present application is shown in fig. 5.
In fig. 5, S500, a face image sample is input into a neural network, and face key point information of the face image sample is extracted through the neural network.
In an alternative example, the face image sample in the present application carries a plurality of pieces of face key point annotation information, including annotation information for at least one three-dimensional stationary key point. In an optional example, the face image sample carries 106 pieces of face key point annotation information, among which are 12 pieces of three-dimensional stationary key point annotation information, for example for the right-eye upper eyelid center key point, the right-eye lower eyelid center key point, the left-eye upper eyelid center key point, the left-eye lower eyelid center key point, the nose lower edge center key point, the right lip peak key point, the center key point of the upper lip line of the upper lip, the left lip peak key point, the center key point of the lower lip line of the upper lip, the center key point of the upper lip line of the lower lip, the center key point of the lower lip line of the lower lip, and the chin center key point. Each piece of key point annotation information includes the key point number and the coordinates of the key point in the face image sample. In addition, all face key points (e.g., all 106 key points) carry annotation information, regardless of whether some of them are located at occluded positions in the face image sample.
And S510, calculating a first distance between different key points on at least one organ according to the extracted face key point information.
In an alternative example, for an organ (such as a left eye or a right eye or a mouth) in a human face, the neural network may extract a plurality of face key points, the plurality of face key points may form at least one group of face key points, and the application may calculate a distance between two face key points in each group, so as to obtain at least one first distance.
An optional example, when the organ is a single eye (left eye or right eye), the neural network may extract a plurality of upper eyelid line key points and a plurality of lower eyelid line key points, the upper eyelid line key points and the lower eyelid line key points are generally opposite in upper and lower positions, and the upper eyelid line key points and the lower eyelid line key points opposite in upper and lower positions may form a group of eyelid line key points, so that the present application may form a plurality of groups of eyelid line key points, and a distance between two eyelid line key points in the group may be calculated according to coordinates of the two eyelid line key points in the group, thereby obtaining the first eyelid line distance.
As an optional example, in a case that the organ is a mouth, the neural network may extract a plurality of upper lip line key points and a plurality of lower lip line key points of the upper lip, the upper lip line key points and the lower lip line key points are generally opposite in an upper-lower position, and the upper lip line key points and the lower lip line key points which are opposite in the upper-lower position may form a group of outer lip line key points, so that the present application may form a plurality of groups of outer lip line key points, and a distance between two lip line key points in the group may be calculated according to coordinates of the two lip line key points in the group, so as to obtain the first outer lip line distance.
As an optional example, in a case that the organ is a mouth, the neural network may extract a plurality of key points of a lower lip line of the upper lip and a plurality of key points of an upper lip line of the lower lip, where the key points of the lower lip line of the upper lip and the key points of the upper lip line of the lower lip are generally opposite to each other in an up-down position, and the key points of the lower lip line of the upper lip and the key points of the upper lip line of the lower lip opposite to each other in the up-down position may form a group of key points of the inner lip line, so that a plurality of groups of key points of the inner lip line may be formed in the present application, and a distance between the key points of the two lip lines in the group may be calculated according to coordinates of the key points of the two lip lines in the group, so as to obtain the first distance of the inner lip line.
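A small Python sketch of the pairwise distance computation used in S510 is shown below; it assumes that the upper and lower lines carry the same number of key points in corresponding order, and the helper name is illustrative. The same helper can be applied to the annotated key points in S520 below to obtain the second distances.

    import numpy as np

    def paired_line_distances(upper_line_points, lower_line_points):
        # Sketch: for each group formed by an upper-line key point and the vertically
        # corresponding lower-line key point, compute the distance between the two.
        upper = np.asarray(upper_line_points, dtype=np.float32)  # (K, 2)
        lower = np.asarray(lower_line_points, dtype=np.float32)  # (K, 2), same ordering
        return np.linalg.norm(upper - lower, axis=1)              # (K,) one distance per group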
S520, calculating a second distance between different key points on at least one organ according to the face key point labeling information.
In an optional example, a plurality of face key points are generally annotated for one organ of the face. The annotated key points may form at least one group, and the distance between the two key points in each group may be calculated, yielding at least one second distance.
As an optional example, when the organ is a single eye (the left eye or the right eye), the face image sample is generally annotated with a plurality of upper eyelid line key points and a plurality of lower eyelid line key points. An upper eyelid line key point and a lower eyelid line key point are generally vertically opposite, and each such vertically opposite pair forms a group of eyelid line key points. A plurality of groups of eyelid line key points may thus be formed, and the distance between the two key points in a group may be calculated from their annotated coordinates, yielding a second eyelid line distance.
As an optional example, when the organ is the mouth, the face image sample is generally annotated with a plurality of upper lip line key points of the upper lip and a plurality of lower lip line key points of the lower lip. Such key points are generally vertically opposite, and each vertically opposite pair forms a group of outer lip line key points. A plurality of groups of outer lip line key points may thus be formed, and the distance between the two key points in a group may be calculated from their annotated coordinates, yielding a second outer lip line distance.
As an optional example, when the organ is the mouth, the face image sample is generally annotated with a plurality of lower lip line key points of the upper lip and a plurality of upper lip line key points of the lower lip. Such key points are generally vertically opposite, and each vertically opposite pair forms a group of inner lip line key points. A plurality of groups of inner lip line key points may thus be formed, and the distance between the two key points in a group may be calculated from their annotated coordinates, yielding a second inner lip line distance.
S530, performing supervised learning on the neural network through a first loss function based on the first distance and the second distance.
In an optional example, the present application is generally provided with four first loss functions, corresponding to the left eye, the right eye, the outer lip line, and the inner lip line, respectively. The first eyelid line distance and the second eyelid line distance calculated for the left-eye eyelid line are substituted into one first loss function; the first and second eyelid line distances calculated for the right-eye eyelid line are substituted into another first loss function; the first and second outer lip line distances calculated for the outer lip line are substituted into a third first loss function; and the first and second inner lip line distances calculated for the inner lip line are substituted into the remaining first loss function. Supervising the neural network with these four first loss functions helps improve the accuracy with which the neural network extracts face key points from face images with large expressions or large-angle deflection.
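The first loss can, for instance, penalize the gap between the predicted and annotated distances of each group. The sketch below uses a smooth-L1 penalty per organ as one plausible choice; the specific penalty and the summation of the four losses are assumptions, since the present application only requires one first loss function per organ based on the first and second distances.

```python
import numpy as np

def first_loss(first_distances: np.ndarray, second_distances: np.ndarray, beta: float = 1.0) -> float:
    """Smooth-L1 penalty between predicted (first) and annotated (second) distances
    of one organ, e.g. the left-eye eyelid line. One such loss is kept per organ:
    left eye, right eye, outer lip line, inner lip line."""
    diff = np.abs(first_distances - second_distances)
    per_pair = np.where(diff < beta, 0.5 * diff ** 2 / beta, diff - 0.5 * beta)
    return float(per_pair.mean())

# The four first losses could simply be summed before backpropagation, e.g.:
# total = (first_loss(d1_left_eye, d2_left_eye)
#          + first_loss(d1_right_eye, d2_right_eye)
#          + first_loss(d1_outer_lip, d2_outer_lip)
#          + first_loss(d1_inner_lip, d2_inner_lip))
```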
In an alternative example, the training process of the present application may further include step S540.
Step S540, forming groups of key points from the face key points extracted by the neural network and the corresponding annotated face key points, for example, forming a group from the two face key points with the same number; calculating the distance between the two key points in each group to obtain at least one drift distance; and performing supervised learning on the neural network through a second loss function based on the drift distance. Supervising the neural network with the second loss function helps improve the accuracy with which the neural network extracts face key points.
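The drift distance of S540 compares each extracted key point directly with the annotated key point of the same number. The following is a minimal sketch, assuming the mean of these drifts is used as the second loss; the averaging and the choice of Euclidean distance are assumptions.

```python
import numpy as np

def second_loss(predicted: np.ndarray, annotated: np.ndarray) -> float:
    """Mean drift distance between extracted and annotated face key points.

    predicted, annotated: arrays of shape (K, 2); row i in both arrays refers to
    the face key point with the same number, so each row pair forms one group.
    """
    drift = np.linalg.norm(predicted - annotated, axis=1)  # one drift distance per group
    return float(drift.mean())
```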
Fig. 6 is a schematic structural diagram of an embodiment of a face image processing apparatus according to the present application. As shown in fig. 6, the apparatus of this embodiment includes: an input image module 600 and a key point acquisition module 610. Optionally, the face image processing apparatus of this embodiment may further include: a means 620 for training a neural network.
The input image module 600 is used to input the face image to be processed into the neural network. The operation specifically performed by the input image module 600 is as described above with respect to S100 in fig. 1, and will not be described in detail here.
The key point acquisition module 610 is configured to extract, via the neural network, face key point information of the face image to be processed, where the face key point information includes at least one three-dimensional stationary key point. The operation specifically performed by the key point acquisition module 610 is as described above with respect to S110 in fig. 1, and will not be described in detail here.
The means 620 for training a neural network is used for training the neural network in the present application by using a plurality of face image samples. The specific structure of the apparatus 620 for training a neural network is described below with reference to fig. 7, and will not be described in detail here.
Fig. 7 is a schematic structural diagram of an embodiment of an apparatus 620 for training a neural network according to the present application. As shown in fig. 7, the apparatus of this embodiment includes: a get samples module 621 and a training module 622. Optionally, the means 620 for training the neural network may further include: and a labeling module 623.
The sample acquiring module 621 is configured to acquire a plurality of face image samples. The face image samples obtained by the sample obtaining module 621 are marked with: and marking information of the face key points including at least one three-dimensional motionless key point.
The training module 622 is configured to train a neural network using the plurality of face image samples obtained by the sample obtaining module 621.
In an optional example, the training module 622 may be specifically configured to input the face image sample into a neural network, extract face key point information of the face image sample via the neural network, calculate a first distance between different key points on at least one organ according to the extracted face key point information, and calculate a second distance between different key points on at least one organ according to the face key point labeling information; and supervised learning of the neural network via a first loss function based on the first distance and the second distance.
In an optional example, the training module 622 may be further specifically configured to, for at least one group of key points formed by the extracted face key points and the labeled corresponding face key points, determine a distance between two key points in the group to obtain at least one drift distance; supervised learning of the neural network via a second loss function is performed based on the drift distance.
The operations specifically performed by training module 622 are described above with respect to the steps in fig. 5 and will not be described in detail here.
The labeling module 623 is configured to: determine curve control points of at least one part of the face in a face image sample; form a curve according to the curve control points; determine, on the curve, the three-dimensional stationary key points of the face image sample; select points on the curve on at least one side (e.g., the left and right sides) of the three-dimensional stationary key points of the face image sample; and form face key point annotation information from the selected points and the three-dimensional stationary key points of the face image sample. The operations specifically performed by the labeling module 623 are as described above with respect to the steps in fig. 3 and will not be described in detail here.
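One way to realize the labeling module's curve step is to interpolate a smooth curve through the manually chosen control points and then sample additional points on either side of the three-dimensional stationary key point. The sketch below uses a cubic B-spline from SciPy as one possible curve model; the spline choice, sampling density, and function names are assumptions, not details prescribed by the present application.

```python
import numpy as np
from scipy.interpolate import splprep, splev

def sample_curve(control_points: np.ndarray, num_samples: int = 17) -> np.ndarray:
    """Fit a smooth parametric curve (here a cubic B-spline) through the
    curve control points of one face part and sample points along it."""
    tck, _ = splprep([control_points[:, 0], control_points[:, 1]], s=0)
    t = np.linspace(0.0, 1.0, num_samples)
    xs, ys = splev(t, tck)
    return np.stack([xs, ys], axis=1)

# Illustrative control points of, say, an upper eyelid line (left to right).
controls = np.array([[0.0, 5.0], [3.0, 2.0], [6.0, 1.0], [9.0, 2.0], [12.0, 5.0]])
curve_points = sample_curve(controls)

# The middle sample can serve as the three-dimensional stationary key point of the
# sample (e.g. the eyelid center); points on the curve to its left and right are
# then selected to complete the face key point annotation information.
center_keypoint = curve_points[len(curve_points) // 2]
left_points = curve_points[: len(curve_points) // 2]
right_points = curve_points[len(curve_points) // 2 + 1 :]
```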
Exemplary device
Fig. 8 illustrates an exemplary device 800 suitable for implementing the present application. The device 800 may be a control system/electronic system configured in an automobile, a mobile terminal (e.g., a smart mobile phone), a personal computer (PC, e.g., a desktop or laptop computer), a tablet computer, a server, and so forth. In fig. 8, the device 800 includes one or more processors, a communication portion, and the like. The one or more processors may be one or more Central Processing Units (CPU) 801 and/or one or more Graphics Processing Units (GPU) 813, and the processors may perform various appropriate actions and processes according to executable instructions stored in a Read Only Memory (ROM) 802 or loaded from a storage section 808 into a Random Access Memory (RAM) 803. The communication portion 812 may include, but is not limited to, a network card, which may include, but is not limited to, an IB (InfiniBand) network card. The processor may communicate with the read only memory 802 and/or the random access memory 803 to execute executable instructions, is connected to the communication portion 812 through the bus 804, and communicates with other target devices through the communication portion 812 to accomplish the steps of the present application.
In an alternative example, the processor implements the steps of any of the face image processing methods of the present application, or implements the steps of any of the methods for training a neural network of the present application, upon execution of the instructions. The steps implemented by the processor can be referred to the relevant description of the above method embodiments, and will not be described in detail here.
In addition, the RAM 803 may store various programs and data necessary for the operation of the apparatus. The CPU 801, the ROM 802, and the RAM 803 are connected to each other via the bus 804. Where the RAM 803 is present, the ROM 802 is an optional module. The RAM 803 stores executable instructions, or writes executable instructions into the ROM 802 at runtime, and the executable instructions cause the central processing unit 801 to execute the steps of the methods described above. An input/output (I/O) interface 805 is also connected to the bus 804. The communication portion 812 may be provided integrally with the bus, or may be provided as a plurality of sub-modules (e.g., a plurality of IB network cards) connected to the bus.
The following components are connected to the I/O interface 805: an input section 806 including a keyboard, a mouse, and the like; an output section 807 including a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, as well as a speaker; a storage section 808 including a hard disk and the like; and a communication section 809 including a network interface card such as a LAN card or a modem. The communication section 809 performs communication processing via a network such as the Internet. A drive 810 is also connected to the I/O interface 805 as necessary. A removable medium 811, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 810 as necessary, so that a computer program read out therefrom is installed in the storage section 808 as necessary.
It should be particularly noted that the architecture shown in fig. 8 is only an optional implementation. In specific practice, the number and types of the components in fig. 8 may be selected, deleted, added, or replaced according to actual needs. For different functional components, separate or integrated arrangements may be used; for example, the GPU and the CPU may be provided separately, or the GPU may be integrated on the CPU, and the communication portion may be provided separately, or may be integrated on the CPU or the GPU. These alternative embodiments all fall within the scope of the present application.
In particular, according to an embodiment of the present application, the processes described above with reference to the flowcharts may be implemented as a computer software program. For example, an embodiment of the present application includes a computer program product having stored thereon a computer program that, when executed by a processor, implements the steps of any of the face image processing methods of the present application or the steps of any of the methods for training a neural network of the present application.
In such an embodiment, the computer program may be downloaded and installed from a network via the communication section 809 and/or installed from the removable medium 811. When the computer program is executed by the Central Processing Unit (CPU)801, the above-described steps described in the present application are executed.
The methods and apparatus, electronic devices, and computer-readable storage media of the present application may be implemented in a number of ways. For example, the methods and apparatus, electronic devices, and computer-readable storage media of the present application may be implemented by software, hardware, firmware, or any combination of software, hardware, and firmware. The above-described order for the steps of the method is for illustration only, and the steps of the method of the present application are not limited to the order specifically described above unless specifically stated otherwise. Further, in some embodiments, the present application may also be embodied as a program recorded in a recording medium, the program including machine-readable instructions for implementing a method according to the present application. Thus, the present application also covers a recording medium (such as a computer-readable storage medium) storing a computer program for executing the face image processing method according to the present application and/or the method for training a neural network.
The description of the present application has been presented for purposes of illustration and description, and is not intended to be exhaustive or limited to the application in the form disclosed. Many modifications and variations will be apparent to practitioners skilled in this art. The embodiment was chosen and described in order to best explain the principles of the application and the practical application, and to enable others of ordinary skill in the art to understand the application for various embodiments with various modifications as are suited to the particular use contemplated.

Claims (35)

1. A face image processing method is characterized by comprising the following steps:
inputting a face image to be processed into a neural network; the face in the face image to be processed comprises a face with expression change and face angle deflection;
extracting, via the neural network, face key point information of the face image to be processed, the face key point information including at least one three-dimensional motionless key point; the face key point information comprises: face key point information located at unoccluded positions and at occluded positions in the face image to be processed, wherein the face key point information located at an occluded position is face key point information obtained, based on a see-through (perspective) manner, for the occluded position in the face image to be processed;
positioning a specific part of the face in the face image to be processed based on the face key point information; wherein a three-dimensional motionless key point is a point obtained by projecting, onto a two-dimensional plane, a point of the human face that carries the semantics of a specific part in three-dimensional space, and the semantics of the three-dimensional motionless key point in the three-dimensional space is fixed.
2. The method of claim 1, wherein the three-dimensional stationary key points comprise: at least one of an eyelid center keypoint, a nasal lower edge center keypoint, a lip peak keypoint, a lip center keypoint, and a chin center keypoint.
3. The method of claim 2, wherein the eyelid center keypoint comprises: at least one of a left eye upper eyelid center keypoint, a left eye lower eyelid center keypoint, a right eye upper eyelid center keypoint, and a right eye lower eyelid center keypoint.
4. The method of claim 2, wherein the lip center key points comprise: at least one of a central key point of an upper lip line of the upper lip, a central key point of a lower lip line of the upper lip, a central key point of an upper lip line of the lower lip, and a central key point of a lower lip line of the lower lip.
5. The method of claim 2, wherein the lip peak key points comprise: at least one of lip left lip peak keypoints and lip right lip peak keypoints.
6. The method for processing the face image according to claim 1, wherein the face key point information extracted by the neural network comprises: information of 68, 106, 240, 252, or 270 face key points.
7. The method according to any one of claims 1 to 6, wherein the neural network is trained by using face image samples annotated with face key point annotation information including at least one three-dimensional immobile key point.
8. The method of processing a face image according to claim 7, the method further comprising:
obtaining a plurality of face image samples, wherein the face image samples are marked with: the face key point marking information comprises at least one three-dimensional immobile key point;
training the neural network using the plurality of face image samples.
9. The method according to claim 8, wherein the face image sample is marked with face keypoint marking information at an unobstructed position and/or an obstructed position in the face image sample.
10. The method according to claim 7, wherein the face key point annotation information of the face image sample comprises: annotation information of 68, 106, 240, 252, or 270 face key points.
11. The method of claim 7, further comprising, before training the neural network with the face image samples, obtaining the face keypoint labeling information by:
determining a curve control point of at least one part of a human face in a human face image sample;
forming a curve according to the curve control points;
determining, on the curve, three-dimensional immobile key points of the face image sample;
selecting points on a curve on at least one side of a three-dimensional fixed key point of a face image sample;
and forming face key point marking information according to the selected points and the three-dimensional immobile key points of the face image samples.
12. The method of claim 11, wherein the determining, on the curve, three-dimensional immobile key points of the face image sample comprises:
determining a point on the curve whose distance from a three-dimensional immobile key point annotated on the face image sample meets a preset distance requirement, and taking the point as a three-dimensional immobile key point of the face image sample.
13. The method of claim 8, wherein the training the neural network using the plurality of face image samples comprises:
inputting a face image sample into the neural network, and extracting face key point information of the face image sample through the neural network;
calculating a first distance between different key points on at least one organ according to the extracted face key point information;
calculating a second distance between different key points on at least one organ according to the face key point labeling information;
supervised learning of the neural network via a first loss function based on the first and second distances.
14. The method of claim 13, wherein the calculating a first distance between different keypoints on at least one organ according to the extracted face keypoint information comprises:
respectively calculating the distance between two key points in a group aiming at any group of key points formed by the extracted key points of the upper eyelid line of the left eye and the corresponding key points of the lower eyelid line of the left eye so as to obtain at least one first distance based on the eyelid line of the left eye; and/or
Respectively calculating the distance between two key points in a group aiming at any group of key points formed by the extracted key points of the upper eyelid line of the right eye and the corresponding key points of the lower eyelid line of the right eye so as to obtain at least one first distance based on the eyelid line of the right eye; and/or
Respectively calculating the distance between two key points in a group aiming at any group of key points formed by the extracted key points of the upper lip line of the upper lip and the corresponding key points of the lower lip line of the lower lip to obtain at least one first distance based on the outer lip line; and/or
And respectively calculating the distance between two key points in the group aiming at any group of key points formed by the extracted key points of the lower lip line of the upper lip and the corresponding key points of the upper lip line of the lower lip so as to obtain at least one first distance based on the inner lip line.
15. The method of processing a face image according to claim 14, wherein the first loss function comprises: a first loss function for a first distance based on a left eye eyelid line, a first loss function for a first distance based on a right eye eyelid line, a first loss function for a first distance based on an outer lip line, and/or a first loss function for a first distance based on an inner lip line.
16. The method of claim 8, wherein the training the neural network using the plurality of face image samples comprises:
determining the distance between two key points in a group aiming at least one group of key points formed by the extracted face key points and the labeled corresponding face key points so as to obtain at least one drift distance;
supervised learning of the neural network via a second loss function based on the drift distance.
17. A method for training a neural network, the method comprising:
acquiring a plurality of face image samples, wherein the faces in the face image samples include faces with expression changes and face angle deflection; the face image samples are annotated with face key point annotation information including at least one three-dimensional motionless key point, wherein a three-dimensional motionless key point is a point of the human face that carries the semantics of a specific part in three-dimensional space, and the semantics of the three-dimensional motionless key point in the three-dimensional space is fixed; the face image samples are annotated with face key point annotation information located at unoccluded positions and at occluded positions in the face image sample, and the face key point information located at an occluded position is face key point information obtained, based on a see-through (perspective) manner, for the occluded position in the face image to be processed;
and training a neural network by using the plurality of face image samples, wherein the trained neural network is used for extracting face key point information including at least one three-dimensional motionless key point of the face image to be processed, and the face key point information is used for positioning a specific part of the face in the face image to be processed.
18. The method for training a neural network of claim 17, wherein the face key point annotation information of the face image samples comprises: annotation information of 68, 106, 240, 252, or 270 face key points.
19. The method for training a neural network of claim 17, wherein prior to said training a neural network using said plurality of face image samples, further comprising:
determining a curve control point of at least one part of a human face in a human face image sample;
forming a curve according to the curve control points;
determining, on the curve, three-dimensional immobile key points of the face image sample;
selecting points on a curve on at least one side of a three-dimensional fixed key point of a face image sample;
and forming face key point marking information according to the selected points and the three-dimensional immobile key points of the face image samples.
20. The method for training a neural network of claim 19, wherein said determining, on the curve, three-dimensional immobile key points of the face image sample comprises:
determining a point on the curve whose distance from a three-dimensional immobile key point annotated on the face image sample meets a preset distance requirement, and taking the point as a three-dimensional immobile key point of the face image sample.
21. The method of any one of claims 17 to 20, wherein the training the neural network using the plurality of face image samples comprises:
inputting a face image sample into a neural network, and extracting face key point information of the face image sample through the neural network;
calculating a first distance between different key points on at least one organ according to the extracted face key point information;
calculating a second distance between different key points on at least one organ according to the face key point labeling information;
supervised learning of the neural network via a first loss function based on the first and second distances.
22. The method for training a neural network of claim 21, wherein said calculating a first distance between different keypoints on at least one organ from said extracted face keypoint information comprises:
respectively calculating the distance between two key points in a group aiming at any group of key points formed by the extracted key points of the upper eyelid line of the left eye and the corresponding key points of the lower eyelid line of the left eye so as to obtain at least one first distance based on the eyelid line of the left eye; and/or
Respectively calculating the distance between two key points in a group aiming at any group of key points formed by the extracted key points of the upper eyelid line of the right eye and the corresponding key points of the lower eyelid line of the right eye so as to obtain at least one first distance based on the eyelid line of the right eye; and/or
Respectively calculating the distance between two key points in a group aiming at any group of key points formed by the extracted key points of the upper lip line of the upper lip and the corresponding key points of the lower lip line of the lower lip to obtain at least one first distance based on the outer lip line; and/or
And respectively calculating the distance between two key points in the group aiming at any group of key points formed by the extracted key points of the lower lip line of the upper lip and the corresponding key points of the upper lip line of the lower lip so as to obtain at least one first distance based on the inner lip line.
23. The method for training a neural network of claim 22, wherein the first loss function comprises: a first loss function for a first distance based on a left eye eyelid line, a first loss function for a first distance based on a right eye eyelid line, a first loss function for a first distance based on an outer lip line, and/or a first loss function for a first distance based on an inner lip line.
24. The method of any one of claims 17 to 20, wherein the training the neural network using the plurality of face image samples comprises:
determining the distance between two key points in a group aiming at least one group of key points formed by the extracted face key points and the labeled corresponding face key points so as to obtain at least one drift distance;
supervised learning of the neural network via a second loss function based on the drift distance.
25. A face image processing apparatus, comprising:
the input image module is used for inputting the face image to be processed into the neural network; the face in the face image to be processed comprises a face with expression change and face angle deflection;
the key point acquisition module is used for extracting, via the neural network, face key point information of the face image to be processed including at least one three-dimensional immobile key point, and positioning a specific part of the face in the face image to be processed based on the face key point information; wherein the face key point information comprises: face key point information located at unoccluded positions and at occluded positions in the face image to be processed, the face key point information located at an occluded position being face key point information obtained, based on a see-through (perspective) manner, for the occluded position in the face image to be processed; a three-dimensional immobile key point is a point obtained by projecting, onto a two-dimensional plane, a point of the human face that carries the semantics of a specific part in three-dimensional space, and the semantics of the three-dimensional immobile key point in the three-dimensional space is fixed.
26. The face image processing apparatus according to claim 25, further comprising: an apparatus for training a neural network, the apparatus for training a neural network comprising:
an obtaining sample module, configured to obtain a plurality of face image samples, where the face image samples are marked with: the face key point marking information comprises at least one three-dimensional immobile key point;
and the training module is used for training the neural network by utilizing the plurality of face image samples.
27. The apparatus of claim 26, wherein the means for training a neural network further comprises: a labeling module to:
determining a curve control point of at least one part of a human face in a human face image sample;
forming a curve according to the curve control points;
determining, on the curve, three-dimensional immobile key points of the face image sample;
selecting points on a curve on at least one side of a three-dimensional fixed key point of a face image sample;
and forming face key point marking information according to the selected points and the three-dimensional immobile key points of the face image samples.
28. The facial image processing apparatus according to any one of claims 26 to 27, wherein the training module is specifically configured to:
inputting a face image sample into the neural network, and extracting face key point information of the face image sample through the neural network;
calculating a first distance between different key points on at least one organ according to the extracted face key point information;
calculating a second distance between different key points on at least one organ according to the face key point labeling information;
supervised learning of the neural network via a first loss function based on the first and second distances.
29. The facial image processing apparatus according to any one of claims 26 to 27, wherein the training module is specifically configured to:
determining the distance between two key points in a group aiming at least one group of key points formed by the extracted face key points and the labeled corresponding face key points so as to obtain at least one drift distance;
supervised learning of the neural network via a second loss function based on the drift distance.
30. An apparatus for training a neural network, the apparatus comprising:
the acquisition sample module is used for acquiring a plurality of face image samples, wherein the faces in the face image samples include faces with expression changes and face angle deflection; the face image samples are annotated with face key point annotation information including at least one three-dimensional motionless key point, wherein a three-dimensional motionless key point is a point of the human face that carries the semantics of a specific part in three-dimensional space, and the semantics of the three-dimensional motionless key point in the three-dimensional space is fixed; the face image samples are annotated with face key point annotation information located at unoccluded positions and at occluded positions in the face image sample, and the face key point information located at an occluded position is face key point information obtained, based on a see-through (perspective) manner, for the occluded position in the face image to be processed;
and the training module is used for training the neural network by using the plurality of face image samples, the trained neural network is used for extracting face key point information including at least one three-dimensional motionless key point of the face image to be processed, and the face key point information is used for positioning a specific part of the face in the face image to be processed.
31. The apparatus for training a neural network of claim 30, further comprising: a labeling module to:
determining a curve control point of at least one part of a human face in a human face image sample;
forming a curve according to the curve control points;
determining, on the curve, three-dimensional immobile key points of the face image sample;
selecting points on a curve on at least one side of a three-dimensional fixed key point of a face image sample;
and forming face key point marking information according to the selected points and the three-dimensional immobile key points of the face image samples.
32. An apparatus for training a neural network as claimed in any one of claims 30 to 31, wherein the training module is specifically configured to:
inputting a face image sample into the neural network, and extracting face key point information of the face image sample through the neural network;
calculating a first distance between different key points on at least one organ according to the extracted face key point information;
calculating a second distance between different key points on at least one organ according to the face key point labeling information;
supervised learning of the neural network via a first loss function based on the first and second distances.
33. An apparatus for training a neural network as claimed in any one of claims 30 to 31, wherein the training module is specifically configured to:
determining the distance between two key points in a group aiming at least one group of key points formed by the extracted face key points and the labeled corresponding face key points so as to obtain at least one drift distance;
supervised learning of the neural network via a second loss function based on the drift distance.
34. An electronic device, comprising:
a memory for storing a computer program;
a processor for executing a computer program stored in the memory, and when executed, implementing the steps in the method for processing a face image according to any one of the preceding claims 1 to 16, or implementing the steps in the method for training a neural network according to any one of the preceding claims 17 to 24.
35. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method for processing a face image according to any one of the preceding claims 1 to 16 or the steps of the method for training a neural network according to any one of the preceding claims 17 to 24.
CN201711352295.9A 2017-12-15 2017-12-15 Face image processing method, face image processing device, electronic apparatus, storage medium, and program Active CN109934058B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202110982660.4A CN113688737A (en) 2017-12-15 2017-12-15 Face image processing method, face image processing device, electronic apparatus, storage medium, and program
CN201711352295.9A CN109934058B (en) 2017-12-15 2017-12-15 Face image processing method, face image processing device, electronic apparatus, storage medium, and program

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711352295.9A CN109934058B (en) 2017-12-15 2017-12-15 Face image processing method, face image processing device, electronic apparatus, storage medium, and program

Related Child Applications (1)

Application Number Title Priority Date Filing Date
CN202110982660.4A Division CN113688737A (en) 2017-12-15 2017-12-15 Face image processing method, face image processing device, electronic apparatus, storage medium, and program

Publications (2)

Publication Number Publication Date
CN109934058A CN109934058A (en) 2019-06-25
CN109934058B true CN109934058B (en) 2021-08-06

Family

ID=66980352

Family Applications (2)

Application Number Title Priority Date Filing Date
CN202110982660.4A Pending CN113688737A (en) 2017-12-15 2017-12-15 Face image processing method, face image processing device, electronic apparatus, storage medium, and program
CN201711352295.9A Active CN109934058B (en) 2017-12-15 2017-12-15 Face image processing method, face image processing device, electronic apparatus, storage medium, and program

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CN202110982660.4A Pending CN113688737A (en) 2017-12-15 2017-12-15 Face image processing method, face image processing device, electronic apparatus, storage medium, and program

Country Status (1)

Country Link
CN (2) CN113688737A (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113689527B (en) * 2020-05-15 2024-02-20 武汉Tcl集团工业研究院有限公司 Training method of face conversion model and face image conversion method
CN112069993B (en) * 2020-09-04 2024-02-13 西安西图之光智能科技有限公司 Dense face detection method and system based on five-sense organ mask constraint and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105760836A (en) * 2016-02-17 2016-07-13 厦门美图之家科技有限公司 Multi-angle face alignment method based on deep learning and system thereof and photographing terminal
CN106203376A (en) * 2016-07-19 2016-12-07 北京旷视科技有限公司 Face key point localization method and device
CN107122705A (en) * 2017-03-17 2017-09-01 中国科学院自动化研究所 Face critical point detection method based on three-dimensional face model
CN107451965A (en) * 2017-07-24 2017-12-08 深圳市智美达科技股份有限公司 Distort face image correcting method, device, computer equipment and storage medium

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105868769A (en) * 2015-01-23 2016-08-17 阿里巴巴集团控股有限公司 Method and device for positioning face key points in image
CN104778712B (en) * 2015-04-27 2018-05-01 厦门美图之家科技有限公司 A kind of face chart pasting method and system based on affine transformation
CN106203242B (en) * 2015-05-07 2019-12-24 阿里巴巴集团控股有限公司 Similar image identification method and equipment
CN104899563B (en) * 2015-05-29 2020-01-07 深圳大学 Two-dimensional face key feature point positioning method and system
CN105205827A (en) * 2015-10-16 2015-12-30 中科院成都信息技术股份有限公司 Auxiliary feature point labeling method for statistical shape model
CN106295533B (en) * 2016-08-01 2019-07-02 厦门美图之家科技有限公司 A kind of optimization method, device and the camera terminal of self-timer image
CN106548179B (en) * 2016-09-29 2019-09-17 北京市商汤科技开发有限公司 The detection method of object and dress ornament key point, device and electronic equipment
CN106951840A (en) * 2017-03-09 2017-07-14 北京工业大学 A kind of facial feature points detection method
CN107016370B (en) * 2017-04-10 2019-10-11 电子科技大学 A kind of partial occlusion face identification method based on data enhancing
CN107239758B (en) * 2017-05-24 2022-03-08 北京小米移动软件有限公司 Method and device for positioning key points of human face

Also Published As

Publication number Publication date
CN113688737A (en) 2021-11-23
CN109934058A (en) 2019-06-25

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant