CN109934058A - Face image processing method, apparatus, electronic device, storage medium and program - Google Patents


Info

Publication number
CN109934058A
Authority
CN
China
Prior art keywords
key point
face
facial image
neural network
point
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201711352295.9A
Other languages
Chinese (zh)
Other versions
CN109934058B (en)
Inventor
吴文岩 (Wu Wenyan)
王权 (Wang Quan)
钱晨 (Qian Chen)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sensetime Technology Development Co Ltd
Original Assignee
Beijing Sensetime Technology Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by Beijing Sensetime Technology Development Co Ltd
Priority to CN202110982660.4A, publication CN113688737A
Priority to CN201711352295.9A, publication CN109934058B
Publication of CN109934058A
Application granted
Publication of CN109934058B
Legal status: Active (granted)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; extraction of features in feature space; blind source separation
    • G06F 18/214 Generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F 18/2155 Generating training patterns characterised by the incorporation of unlabelled data, e.g. multiple instance learning [MIL], semi-supervised techniques using expectation-maximisation [EM] or naïve labelling
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods


Abstract

Embodiments of the present application disclose a face image processing method, a method and apparatus for training a neural network, an electronic device, a computer-readable storage medium and a computer program. The face image processing method mainly includes: inputting a face image to be processed into a neural network; and extracting, via the neural network, face key point information of the face image to be processed, the information including at least one three-dimensional fixed key point.

Description

Face image processing method, apparatus, electronic device, storage medium and program
Technical field
This application relates to computer vision techniques, and more particularly to a face image processing method, a face image processing apparatus, a method for training a neural network, an apparatus for training a neural network, an electronic device, a computer-readable storage medium and a computer program.
Background art
Face key points are an indispensable ingredient of many applications such as face recognition. Accurately determining face key points, and in particular the key points of faces with large expressions or large-angle deflection, not only helps correct the relevant positions of the face but also helps enhance the semantic information of the face.
Summary of the invention
Embodiments of the present application provide a technical solution for face image processing.
According to one aspect of the embodiments of the present application, a face image processing method is provided. The method comprises: inputting a face image to be processed into a neural network; and extracting, via the neural network, face key point information of the face image to be processed, the information including at least one three-dimensional fixed key point.
In one embodiment of the application, the face key point information extracted by the neural network includes: face key point information located at unoccluded positions and/or at occluded positions in the face image to be processed.
In another embodiment of the application, the three-dimensional fixed key points include at least one of: an eyelid center key point, a nose lower-edge center key point, a lip peak key point, a lip center key point and a chin center key point.
In a further embodiment of the application, the eyelid center key points include at least one of: a left-eye upper-eyelid center key point, a left-eye lower-eyelid center key point, a right-eye upper-eyelid center key point and a right-eye lower-eyelid center key point.
In a further embodiment of the application, the lip center key points include at least one of: an upper-lip upper-lip-line center key point, an upper-lip lower-lip-line center key point, a lower-lip upper-lip-line center key point and a lower-lip lower-lip-line center key point.
In a further embodiment of the application, the lip peak key points include at least one of: a left lip peak key point and a right lip peak key point.
In a further embodiment of the application, the face key point information extracted by the neural network includes: 68, 106, 240, 252 or 270 items of face key point information.
In a further embodiment of the application, the neural network is trained using face image samples annotated with face key point annotation information that includes at least one three-dimensional fixed key point.
In a further embodiment of the application, the method further includes: obtaining multiple face image samples, wherein each face image sample is annotated with face key point annotation information including at least one three-dimensional fixed key point; and training the neural network using the multiple face image samples.
In a further embodiment of the application, the face image samples are annotated with face key point annotation information at unoccluded positions and/or at occluded positions in the face image sample.
In a further embodiment of the application, the face key point annotation information of a face image sample includes: 68, 106, 240, 252 or 270 items of face key point annotation information.
In a further embodiment of the application, before the neural network is trained using the face image samples, the face key point annotation information is obtained by the following steps: determining curve control points for at least one part of the face in a face image sample; forming a curve from the curve control points; determining a point on the curve as a three-dimensional fixed key point of the face image sample; selecting points on the curve on at least one side of the three-dimensional fixed key point; and forming the face key point annotation information from the selected points and the three-dimensional fixed key point of the face image sample.
In a further embodiment of the application, determining a point on the curve as a three-dimensional fixed key point of the face image sample comprises: determining a point on the curve whose distance from the three-dimensional fixed key point annotated on the face image sample meets a predetermined distance requirement, and taking that point as the three-dimensional fixed key point of the face image sample.
In a further embodiment of the application, training the neural network using the multiple face image samples includes: inputting a face image sample into the neural network, and extracting the face key point information of the face image sample via the neural network; calculating, from the extracted face key point information, first distances between different key points on at least one organ; calculating, from the face key point annotation information, second distances between different key points on the at least one organ; and performing supervised learning on the neural network via a first loss function based on the first distances and the second distances.
In a further embodiment of the application, calculating the first distances between different key points on at least one organ from the extracted face key point information includes: for each group of key points formed by an extracted left-eye upper-eyelid-line key point and the corresponding left-eye lower-eyelid-line key point, calculating the distance between the two key points in the group, to obtain at least one first distance based on the left-eye eyelid lines; and/or for each group of key points formed by an extracted right-eye upper-eyelid-line key point and the corresponding right-eye lower-eyelid-line key point, calculating the distance between the two key points in the group, to obtain at least one first distance based on the right-eye eyelid lines; and/or for each group of key points formed by an extracted upper-lip upper-lip-line key point and the corresponding lower-lip lower-lip-line key point, calculating the distance between the two key points in the group, to obtain at least one first distance based on the outer lip lines; and/or for each group of key points formed by an extracted upper-lip lower-lip-line key point and the corresponding lower-lip upper-lip-line key point, calculating the distance between the two key points in the group, to obtain at least one first distance based on the inner lip lines.
In a further embodiment of the application, the first loss function includes: a first loss function for the first distances based on the left-eye eyelid lines, a first loss function for the first distances based on the right-eye eyelid lines, a first loss function for the first distances based on the outer lip lines, and/or a first loss function for the first distances based on the inner lip lines.
In a further embodiment of the application, training the neural network using the multiple face image samples includes: for at least one group of key points formed by an extracted face key point and the corresponding annotated face key point, determining the distance between the two key points in the group, to obtain at least one drift distance; and performing supervised learning on the neural network via a second loss function based on the drift distance.
According to another aspect of the embodiments of the present application, a method for training a neural network is provided. The method includes: obtaining multiple face image samples, wherein each face image sample is annotated with face key point annotation information including at least one three-dimensional fixed key point; and training the neural network using the multiple face image samples.
In one embodiment of the application, the face image samples are annotated with face key point annotation information at unoccluded positions and/or at occluded positions in the face image sample.
In another embodiment of the application, the face key point annotation information of a face image sample includes: 68, 106, 240, 252 or 270 items of face key point annotation information.
In a further embodiment of the application, before training the neural network using the multiple face image samples, the method further includes: determining curve control points for at least one part of the face in a face image sample; forming a curve from the curve control points; determining a point on the curve as a three-dimensional fixed key point of the face image sample; selecting points on the curve on at least one side of the three-dimensional fixed key point; and forming the face key point annotation information from the selected points and the three-dimensional fixed key point of the face image sample.
In a further embodiment of the application, determining a point on the curve as a three-dimensional fixed key point of the face image sample comprises: determining a point on the curve whose distance from the three-dimensional fixed key point annotated on the face image sample meets a predetermined distance requirement, and taking that point as the three-dimensional fixed key point of the face image sample.
In a further embodiment of the application, training the neural network using the multiple face image samples includes: inputting a face image sample into the neural network, and extracting the face key point information of the face image sample via the neural network; calculating, from the extracted face key point information, first distances between different key points on at least one organ; calculating, from the face key point annotation information, second distances between different key points on the at least one organ; and performing supervised learning on the neural network via a first loss function based on the first distances and the second distances.
In a further embodiment of the application, calculating the first distances between different key points on at least one organ from the extracted face key point information includes: for each group of key points formed by an extracted left-eye upper-eyelid-line key point and the corresponding left-eye lower-eyelid-line key point, calculating the distance between the two key points in the group, to obtain at least one first distance based on the left-eye eyelid lines; and/or for each group of key points formed by an extracted right-eye upper-eyelid-line key point and the corresponding right-eye lower-eyelid-line key point, calculating the distance between the two key points in the group, to obtain at least one first distance based on the right-eye eyelid lines; and/or for each group of key points formed by an extracted upper-lip upper-lip-line key point and the corresponding lower-lip lower-lip-line key point, calculating the distance between the two key points in the group, to obtain at least one first distance based on the outer lip lines; and/or for each group of key points formed by an extracted upper-lip lower-lip-line key point and the corresponding lower-lip upper-lip-line key point, calculating the distance between the two key points in the group, to obtain at least one first distance based on the inner lip lines.
In a further embodiment of the application, the first loss function includes: a first loss function for the first distances based on the left-eye eyelid lines, a first loss function for the first distances based on the right-eye eyelid lines, a first loss function for the first distances based on the outer lip lines, and/or a first loss function for the first distances based on the inner lip lines.
In a further embodiment of the application, training the neural network using the multiple face image samples includes: for at least one group of key points formed by an extracted face key point and the corresponding annotated face key point, determining the distance between the two key points in the group, to obtain at least one drift distance; and performing supervised learning on the neural network via a second loss function based on the drift distance.
According to yet another aspect of the embodiments of the present application, a face image processing apparatus is provided. The apparatus includes: an image input module, configured to input a face image to be processed into a neural network; and a key point acquisition module, configured to extract, via the neural network, face key point information of the face image to be processed, the information including at least one three-dimensional fixed key point.
In one embodiment of the application, the face image processing apparatus further includes an apparatus for training the neural network, which in turn includes: a sample acquisition module, configured to obtain multiple face image samples, wherein each face image sample is annotated with face key point annotation information including at least one three-dimensional fixed key point; and a training module, configured to train the neural network using the multiple face image samples.
In another embodiment of the application, the apparatus for training the neural network further includes an annotation module, configured to: determine curve control points for at least one part of the face in a face image sample; form a curve from the curve control points; determine a point on the curve as a three-dimensional fixed key point of the face image sample; select points on the curve on at least one side of the three-dimensional fixed key point; and form the face key point annotation information from the selected points and the three-dimensional fixed key point of the face image sample.
In a further embodiment of the application, the training module is specifically configured to: input a face image sample into the neural network and extract the face key point information of the face image sample via the neural network; calculate, from the extracted face key point information, first distances between different key points on at least one organ; calculate, from the face key point annotation information, second distances between different key points on the at least one organ; and perform supervised learning on the neural network via a first loss function based on the first distances and the second distances.
In a further embodiment of the application, the training module is specifically configured to: for at least one group of key points formed by an extracted face key point and the corresponding annotated face key point, determine the distance between the two key points in the group, to obtain at least one drift distance; and perform supervised learning on the neural network via a second loss function based on the drift distance.
According to yet another aspect of the embodiments of the present application, an apparatus for training a neural network is provided. The apparatus includes: a sample acquisition module, configured to obtain multiple face image samples, wherein each face image sample is annotated with face key point annotation information including at least one three-dimensional fixed key point; and a training module, configured to train the neural network using the multiple face image samples.
In one embodiment of the application, the apparatus further includes an annotation module, configured to: determine curve control points for at least one part of the face in a face image sample; form a curve from the curve control points; determine a point on the curve as a three-dimensional fixed key point of the face image sample; select points on the curve on at least one side of the three-dimensional fixed key point; and form the face key point annotation information from the selected points and the three-dimensional fixed key point of the face image sample.
In another embodiment of the application, the training module is specifically configured to: input a face image sample into the neural network and extract the face key point information of the face image sample via the neural network; calculate, from the extracted face key point information, first distances between different key points on at least one organ; calculate, from the face key point annotation information, second distances between different key points on the at least one organ; and perform supervised learning on the neural network via a first loss function based on the first distances and the second distances.
In a further embodiment of the application, the training module is specifically configured to: for at least one group of key points formed by an extracted face key point and the corresponding annotated face key point, determine the distance between the two key points in the group, to obtain at least one drift distance; and perform supervised learning on the neural network via a second loss function based on the drift distance.
According to another aspect of the embodiments of the present application, an electronic device is provided, comprising: a memory for storing a computer program; and a processor for executing the computer program stored in the memory, wherein execution of the computer program implements the steps of any face image processing method of the present application, or the steps of any method for training a neural network of the present application.
According to another aspect of the embodiments of the present application, a computer-readable storage medium is provided, on which a computer program is stored; when the computer program is executed by a processor, the steps of any face image processing method of the present application, or the steps of any method for training a neural network of the present application, are implemented.
According to another aspect of the embodiments of the present application, a computer program is provided, comprising computer instructions; when the computer instructions run in a processor of a device, the steps of any face image processing method of the present application, or the steps of any method for training a neural network of the present application, are implemented.
With the face image processing method, the method and apparatus for training a neural network, the electronic device, the computer-readable storage medium and the computer program provided by the present application, three-dimensional fixed key points are extracted from the face image to be processed using a neural network. Regardless of whether the face is deflected, and regardless of the facial expression, the semantics of the specific facial part represented by a three-dimensional fixed key point is fixed and does not change. The application can therefore accurately locate specific facial parts for faces of varied forms, which matters greatly for applications such as image rendering, face swapping, beautification, makeup, face recognition, face state detection and expression detection. The application thus helps improve the accuracy of face key point extraction.
The technical solution of the present application is described in further detail below with reference to the drawings and embodiments.
Brief description of the drawings
The accompanying drawings, which form part of the specification, illustrate the embodiments of the present application and, together with the description, serve to explain the principles of the application.
The application can be understood more clearly from the following detailed description taken with reference to the accompanying drawings, in which:
Fig. 1 is a flowchart of one embodiment of the face image processing method of the present application;
Fig. 2 is a schematic diagram of the three-dimensional fixed key points of the present application;
Fig. 3 is a flowchart of annotating a face image sample according to the present application;
Fig. 4 is a flowchart of one embodiment of the method for training a neural network of the present application;
Fig. 5 is a flowchart of a specific example of the method for training a neural network of the present application;
Fig. 6 is a schematic structural diagram of one embodiment of the face image processing apparatus of the present application;
Fig. 7 is a schematic structural diagram of an embodiment of the apparatus for training a neural network of the present application;
Fig. 8 is a block diagram of an example device implementing embodiments of the present application.
Detailed description
Various exemplary embodiments of the present invention will now be described in detail with reference to the accompanying drawings. It should be noted that, unless otherwise specified, the relative arrangement of components and steps, the numerical expressions and the numerical values set forth in these embodiments do not limit the scope of the invention.
It should also be understood that, for ease of description, the sizes of the various parts shown in the drawings are not drawn according to actual proportional relationships.
The following description of at least one exemplary embodiment is merely illustrative and is in no way intended to limit the present invention or its application or uses.
Techniques, methods and devices known to a person of ordinary skill in the relevant art may not be discussed in detail, but where appropriate, such techniques, methods and devices should be considered part of the specification.
It should be noted that similar reference numerals and letters denote similar items in the following drawings; therefore, once an item is defined in one drawing, it need not be further discussed in subsequent drawings.
Embodiments of the present invention can be applied to electronic devices such as terminal devices, computer systems and servers, which can operate together with numerous other general-purpose or special-purpose computing system environments or configurations. Examples of well-known terminal devices, computing systems, environments and/or configurations suitable for use with electronic devices such as terminal devices, computer systems and servers include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, handheld or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, network PCs, minicomputer systems, mainframe computer systems, and distributed cloud computing environments including any of the above systems, and the like.
Electronic devices such as terminal devices, computer systems and servers can be described in the general context of computer-system-executable instructions (such as program modules) executed by a computer system. Generally, program modules may include routines, programs, target programs, components, logic, data structures and so on, which perform specific tasks or implement specific abstract data types. The computer system/server can be implemented in a distributed cloud computing environment, where tasks are performed by remote processing devices linked through a communication network. In a distributed cloud computing environment, program modules may be located on local or remote computing system storage media including storage devices.
Exemplary embodiment
Fig. 1 is a flowchart of one embodiment of the method of the present application. As shown in Fig. 1, the method of this embodiment includes steps S100 and S110.
S100: input a face image to be processed into a neural network.
In an optional example, the face image to be processed in the present application may be an image presenting a static picture or photo, or a video frame of a dynamic video, and so on. The face in the face image to be processed may be a frontal face, a side face with small-angle deflection (for example, a face deflection amplitude smaller than a predetermined angle), or a side face with large-angle deflection (for example, a face deflection amplitude not smaller than the predetermined angle). In addition, the face in the face image to be processed may be a face with a large expression, for example a frontal face with a large expression, a face with a large expression and small-angle deflection, or a face with a large expression and large-angle deflection. A large expression in this application generally refers to an expression that makes the form of at least one facial organ change substantially compared with its normal form; it is sometimes also called a complex expression. The application does not limit the deflection angle, the expression or other specific forms of the face in the face image to be processed.
In an optional example, the neural network of the present application is mainly used to extract, from the face image to be processed, face key point information including at least one three-dimensional fixed key point. The network structure of the neural network can be flexibly designed according to the actual demands of face key point extraction, and the embodiments of the present application do not limit the specific network structure. For example, the neural network may include, but is not limited to, convolutional layers, rectified linear unit (ReLU) layers, pooling layers and fully connected layers; the more layers the network contains, the deeper it is. As another example, the network structure may adopt, but is not limited to, structures used by neural networks such as AlexNet, deep residual networks (ResNet) or VGGNet (Visual Geometry Group Network).
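By way of illustration only, and not as the architecture claimed by this application, a key point regressor of the kind just described (convolutional layers, ReLU layers, pooling layers and a fully connected layer) might be sketched in PyTorch as follows; every layer size and the input resolution are assumptions:

```python
import torch
import torch.nn as nn

class KeypointNet(nn.Module):
    """Minimal face key point regressor; all layer sizes are assumptions."""

    def __init__(self, num_keypoints: int = 106):
        super().__init__()
        self.num_keypoints = num_keypoints
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(7),  # fixed 7x7 feature map regardless of input size
        )
        self.head = nn.Linear(128 * 7 * 7, num_keypoints * 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (N, 3, H, W) face crops -> (N, num_keypoints, 2) coordinates
        z = self.features(x).flatten(1)
        return self.head(z).view(-1, self.num_keypoints, 2)

# e.g. 106 key points, including the 12 three-dimensional fixed key points:
# net = KeypointNet(106); pts = net(torch.randn(1, 3, 112, 112))
```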
Since the neural network of the present application can extract face key point information both from face images based on frontal faces and from face images based on side faces, the application avoids processing frontal-face images with one neural network and side-face images with another. It thereby avoids the key point jumps caused by switching networks while processing continuously played video frames, and helps improve the continuity of the face key points that the neural network extracts from continuously played video frames.
S110: extract, via the neural network, face key point information of the face image to be processed, the information including at least one three-dimensional fixed key point.
In an optional example, a three-dimensional fixed key point in this application refers to a point of the face that has a predetermined semantic meaning in three-dimensional space. A three-dimensional fixed key point can be represented by its projection onto a two-dimensional plane. That is, the three-dimensional fixed key point in this application is a point on a two-dimensional plane, namely the projection, onto that plane, of a point of the face that has a predetermined semantic meaning in three-dimensional space. A three-dimensional fixed key point is a face key point with preset semantics: it explicitly represents a specific part of the face, and no matter whether the face is deflected, and no matter what the facial expression is, the semantics of the specific part represented by the three-dimensional fixed key point is fixed and does not change. In other words, a three-dimensional fixed key point always indicates one specific part of the face; although the coordinates of that part in the three-dimensional coordinate system may change with the deflection of the face and the variation of the expression, the point always marks the one position on the face carrying that fixed semantics.
In an optional example, the three-dimensional fixed key points extracted by the neural network from the face image to be processed may include one or more of: eyelid center key points, a nose lower-edge center key point, lip peak key points, lip center key points and a chin center key point. The eyelid center key points generally include one or more of: a left-eye upper-eyelid center key point, a left-eye lower-eyelid center key point, a right-eye upper-eyelid center key point and a right-eye lower-eyelid center key point. The lip center key points generally include one or more of: an upper-lip upper-lip-line center key point (i.e., the center key point of the upper contour line of the upper lip), an upper-lip lower-lip-line center key point (i.e., the center key point of the lower contour line of the upper lip), a lower-lip upper-lip-line center key point (i.e., the center key point of the upper contour line of the lower lip) and a lower-lip lower-lip-line center key point (i.e., the center key point of the lower contour line of the lower lip). The lip peak key points generally include one or more of: a left lip peak key point and a right lip peak key point.
In an optional example, the neural network of the present application can extract 12 three-dimensional fixed key points from the face image to be processed. As shown in Fig. 2, the 12 three-dimensional fixed key points of the present application are: right-eye upper-eyelid center key point 1, right-eye lower-eyelid center key point 2, left-eye upper-eyelid center key point 3, left-eye lower-eyelid center key point 4, nose lower-edge center key point 5, right lip peak key point 6, upper-lip upper-lip-line center key point 7, left lip peak key point 8, upper-lip lower-lip-line center key point 9, lower-lip upper-lip-line center key point 10, lower-lip lower-lip-line center key point 11 and chin center key point 12. Although the face in Fig. 2 is deflected by a certain angle, the above 12 three-dimensional fixed key points still carry their predetermined semantics. For example, taking the line connecting the left and right mouth corners in Fig. 2 as the x axis, with the origin at the midpoint of that line where the y axis crosses it, key points 7, 9, 10 and 11 do not lie on the y axis. That is, upper-lip upper-lip-line center key point 7 in Fig. 2 is not the center, in the physical-coordinate sense, of the upper contour line of the upper lip, but the projection onto the two-dimensional plane of the point of the face that carries, in three-dimensional space, the semantics of the center of the upper contour line of the upper lip; the same holds for key points 9, 10 and 11 with respect to their respective lip contour lines.
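For reference, the 12 three-dimensional fixed key points of Fig. 2 can be collected in a simple lookup table; the index numbers follow the figure, while the table itself and the identifier names are merely an illustrative convenience:

```python
# Indices follow Fig. 2 of the application; the mapping itself is illustrative.
FIXED_KEYPOINTS = {
    1: "right_eye_upper_eyelid_center",
    2: "right_eye_lower_eyelid_center",
    3: "left_eye_upper_eyelid_center",
    4: "left_eye_lower_eyelid_center",
    5: "nose_lower_edge_center",
    6: "lip_right_peak",
    7: "upper_lip_upper_line_center",
    8: "lip_left_peak",
    9: "upper_lip_lower_line_center",
    10: "lower_lip_upper_line_center",
    11: "lower_lip_lower_line_center",
    12: "chin_center",
}
```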
In an optional example, the face key point information extracted by the neural network from the face image to be processed may include not only the face key point information at unoccluded positions in the image, but also the face key point information at occluded positions. The face key point information at occluded positions can be regarded as obtained in a perspective manner. By extracting the key point information at occluded positions, the application helps delineate the form of the face accurately, so that the extracted face key point information is better suited to applications such as image rendering, face swapping, beautification, makeup, face recognition, face state detection and expression detection.
In an optional example, the face key point information extracted by the neural network generally includes face key point numbers and face key point coordinates (i.e., coordinates in the coordinate system of the face image to be processed); for example, the neural network extracts the numbers and coordinates of 21 to 300 face key points, such as 68, 106, 240, 252 or 270 points. Of course, the neural network may also extract a larger or smaller number of face key points; for example, it may extract only the above 12 three-dimensional fixed key points. The application does not limit the specific number of face key points extracted from the face image to be processed.
The face key point information including three-dimensional fixed key points obtained by the application can be used in applications such as image rendering, face swapping, beautification, makeup, face recognition, face state detection and expression detection; the application does not limit the specific application range of the face key point information extracted via the neural network.
In an optional example, the application can train the neural network using multiple face image samples of a sample data set. Each face image sample in the set carries face key point annotation information, which should include at least one three-dimensional fixed key point annotation; for example, a face image sample carries 21 to 300 items of face key point annotation information, such as 68, 106, 240, 252 or 270 items, among which are the annotations of the 12 three-dimensional fixed key points. The application does not limit the specific number of face key point annotations per sample or the number of three-dimensional fixed key point annotations they include.
In an optional example, the face key point annotation information of a face image sample includes the annotations of occluded face key points; that is, the application annotates not only the face key points at unoccluded positions in the sample but also those at occluded positions. As an optional example, the total number of face key points to be annotated in a face image sample can be preset to 106; regardless of the deflection direction and expression of the face in a sample, all 106 face key point annotations should be marked. In addition, for any face key point at an occluded position in a face image sample, whether a three-dimensional fixed key point or another face key point, the coordinates annotated for it are the perspective coordinates of that face key point.
In an optional example, the process of generating the face key point annotation information of a face image sample is shown in Fig. 3.
In Fig. 3, S300: determine the curve control points of the face in the face image sample, and form curves from the curve control points.
In an optional example, all face key points of a face image sample are usually formed by interpolation along multiple curves, each curve being formed from its curve control points. For example, for the left eye in a face image sample, the application obtains multiple upper-eyelid-line control points and multiple lower-eyelid-line control points of the left eye, forms the upper eyelid line of the left eye from the upper-eyelid-line control points, and forms the lower eyelid line of the left eye from the lower-eyelid-line control points; the right eye is handled likewise. For the nose, the application obtains multiple nose lower-edge control points and forms the lower edge of the nose from them. For the mouth, the application obtains multiple control points for each of the upper-lip upper lip line, the upper-lip lower lip line, the lower-lip upper lip line and the lower-lip lower lip line, and forms each of these four lip lines from its own control points. For the face contour, the application obtains multiple face contour control points and forms the face contour from them.
In an optional example, the application can form each curve from its control points by pixel-wise interpolation between the corresponding control points. The application does not limit the specific implementation of forming a curve from its curve control points.
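A minimal sketch of forming a curve from annotated control points, using a cubic B-spline from SciPy; the spline type and the sampling density are assumptions, since the application does not prescribe a particular interpolation scheme:

```python
import numpy as np
from scipy.interpolate import splprep, splev

def fit_curve(control_points: np.ndarray, num_samples: int = 200) -> np.ndarray:
    """Fit a smooth curve through 2D control points and sample it densely.

    control_points: (K, 2) array of annotated control points of one facial part.
    Returns a (num_samples, 2) array of points on the fitted curve.
    """
    k = min(3, len(control_points) - 1)       # spline degree, capped by point count
    tck, _ = splprep(control_points.T, s=0, k=k)  # s=0 interpolates the points exactly
    u = np.linspace(0.0, 1.0, num_samples)
    x, y = splev(u, tck)
    return np.stack([x, y], axis=1)
```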
S310: determine a point on the curve as a three-dimensional fixed key point of the face image sample.
In an optional example, the application can determine a point on the curve whose distance from the three-dimensional fixed key point annotated on the face image sample meets a predetermined distance requirement, and use that point as the three-dimensional fixed key point of the face image sample. Optionally, the application can determine the point on the curve closest to the annotated three-dimensional fixed key point and use that closest point as the three-dimensional fixed key point of the sample.
In an optional example, three-dimensional fixed key points are annotated on the face image sample in advance; for example, by manual annotation, the coordinates of the 12 three-dimensional fixed key points are marked on the face image sample: the right-eye upper-eyelid center key point, right-eye lower-eyelid center key point, left-eye upper-eyelid center key point, left-eye lower-eyelid center key point, nose lower-edge center key point, right lip peak key point, upper-lip upper-lip-line center key point, left lip peak key point, upper-lip lower-lip-line center key point, lower-lip upper-lip-line center key point, lower-lip lower-lip-line center key point and chin center key point.
In an optional example, each three-dimensional fixed key point manually marked on the face image sample usually corresponds to one curve. For any annotated three-dimensional fixed key point, the application can select one or more points from the corresponding curve near that key point, calculate, from the coordinates of the annotated key point and of the points on the curve, the distance between each selected point and the key point, and thereby select the point with the smallest distance; the application can substitute this closest point for the manually annotated three-dimensional fixed key point.
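The substitution just described, snapping a manually annotated three-dimensional fixed key point to the nearest point of its densely sampled curve, can be sketched as follows (assuming the curve has already been sampled as in the previous sketch):

```python
import numpy as np

def snap_to_curve(labeled_point: np.ndarray, curve_points: np.ndarray) -> np.ndarray:
    """Replace a manually annotated fixed key point by the nearest curve sample.

    labeled_point: (2,) annotated coordinates; curve_points: (M, 2) curve samples.
    """
    d = np.linalg.norm(curve_points - labeled_point, axis=1)
    return curve_points[np.argmin(d)]
```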
The application can also set the three-dimensional fixed key points of a face image sample in other ways; for example, the manually annotated three-dimensional fixed key points can be used directly as the three-dimensional fixed key points of the sample. The application does not limit the specific implementation of setting the three-dimensional fixed key points of a face image sample.
S320: select points on the curve on at least one side of a three-dimensional fixed key point of the face image sample.
In an optional example, the application usually selects equal numbers of points from the left and right sides of a three-dimensional fixed key point, and the points on the curve can be chosen by interpolation, as in the sketch after the next paragraph. For example, if the curve is the upper or lower eyelid line of one eye, 9 to 14 points (e.g., 10) can be inserted (e.g., uniformly) on the eyelid line in addition to its three-dimensional fixed key point (the upper-eyelid or lower-eyelid center key point). If the curve is the upper lip line of the upper lip, 13 to 18 points (e.g., 14) can be inserted (e.g., uniformly) in addition to its three-dimensional fixed key points (the right lip peak key point, the upper-lip upper-lip-line center key point and the left lip peak key point). If the curve is the upper lip line or the lower lip line of the lower lip, 12 to 17 points (e.g., 14) can be inserted (e.g., uniformly) in addition to its three-dimensional fixed key point (the lower-lip upper-lip-line center key point or the lower-lip lower-lip-line center key point). If the curve is the face contour curve, at least 32 points can be inserted (e.g., uniformly) in addition to its three-dimensional fixed key point (the chin center key point).
It should be particularly noted that the number of points inserted for a curve can be determined according to actual needs; however, it should be ensured that the error of the curve fitted through the inserted points, relative to the actual part curve of the face, is 1/5 to 1/10 of the error of the curve formed from the curve control points relative to that actual part curve. It follows that the shape expressed by the face key point annotation information of a face image sample can be closer to the actual shape of the part, which is more conducive to training the neural network.
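Choosing the inserted points uniformly along a sampled curve while skipping the samples already used as three-dimensional fixed key points might look like the following sketch; the even-spacing strategy follows the "uniform insertion" option above, and the function shape is an assumption:

```python
import numpy as np

def insert_points(curve_points: np.ndarray, fixed_idx: set[int],
                  num_insert: int) -> np.ndarray:
    """Pick `num_insert` points evenly along the sampled curve, skipping
    samples already used as three-dimensional fixed key points."""
    candidates = [i for i in range(len(curve_points)) if i not in fixed_idx]
    picks = np.linspace(0, len(candidates) - 1, num_insert).round().astype(int)
    return curve_points[[candidates[i] for i in picks]]

# e.g. num_insert=10 for an eyelid line, or 14 for the upper lip line of the
# upper lip, per the counts given in the preceding paragraph.
```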
S330: form the face key point annotation information from the selected points and the three-dimensional fixed key points of the face image sample.
In an optional example, the coordinates, in the face image sample, of the points inserted on the curves and of the three-dimensional fixed key points on the curves become the coordinate information of the corresponding face key point annotation information, and the serial numbers of those points along the curves can be converted into the numbers in the corresponding face key point annotation information of the face image sample.
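Assembling the annotation record of S330 from the per-curve points can be sketched as follows; the (number, x, y) record layout is an assumption, the application only requiring that each annotation carry a number and coordinates:

```python
import numpy as np

def build_annotation(curves: list[np.ndarray]) -> list[tuple[int, float, float]]:
    """Concatenate the per-curve points (inserted points plus fixed key points,
    in on-curve order) into numbered (number, x, y) annotation records."""
    annotation, number = [], 0
    for points in curves:           # one (M, 2) array per facial curve
        for x, y in points:
            annotation.append((number, float(x), float(y)))
            number += 1
    return annotation
```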
The process of the method for training the neural network of the present application is shown in Fig. 4.
In Fig. 4, S400: obtain multiple face image samples.
Each face image sample of the present application is annotated with face key point annotation information including at least one three-dimensional fixed key point, covering both the face key points at unoccluded positions and those at occluded positions in the sample. A face image sample can be annotated with 21 to 300 items of face key point annotation information, for example 68, 106, 240, 252 or 270 items. The process of annotating a face image sample with face key point annotation information is described above with reference to Fig. 3.
S410: train the neural network using the obtained multiple face image samples.
In an optional example, the application inputs a face image sample into the neural network, extracts the face key point information of the sample via the neural network, and adjusts the parameters of the neural network using corresponding loss functions, according to the face key point information extracted by the neural network and the face key point annotation information marked on the sample. One specific process of training the neural network with face image samples is described below with reference to Fig. 5.
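The outer loop of S410 might be sketched in PyTorch as below, assuming a regressor like the earlier KeypointNet sketch and a combined loss function of the kind detailed with Fig. 5; the optimizer choice and hyperparameters are assumptions:

```python
import torch

def train(net, loader, loss_fn, epochs: int = 10, lr: float = 1e-4):
    """Minimal training loop: extract key points, compare them with the
    annotations, and update the network parameters via the loss functions."""
    opt = torch.optim.Adam(net.parameters(), lr=lr)
    for _ in range(epochs):
        for images, annotations in loader:   # annotations: (N, K, 2) coordinates
            pred = net(images)               # extracted key points, (N, K, 2)
            loss = loss_fn(pred, annotations)
            opt.zero_grad()
            loss.backward()
            opt.step()
```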
In an optional example, the process of a specific example of the method for training the neural network of the present application is shown in Fig. 5.
In Fig. 5, S500: input a face image sample into the neural network, and extract the face key point information of the face image sample via the neural network.
In an optional example, the face image sample carries multiple items of face key point annotation information, including at least one three-dimensional fixed key point annotation. As an optional example, a face image sample carries 106 face key point annotations, among which are the annotations of the 12 three-dimensional fixed key points: the right-eye upper-eyelid center key point, right-eye lower-eyelid center key point, left-eye upper-eyelid center key point, left-eye lower-eyelid center key point, nose lower-edge center key point, right lip peak key point, upper-lip upper-lip-line center key point, left lip peak key point, upper-lip lower-lip-line center key point, lower-lip upper-lip-line center key point, lower-lip lower-lip-line center key point and chin center key point. Each key point annotation includes the key point number and the coordinates of the key point in the face image sample. In addition, all face key points (e.g., all 106) carry annotation information, regardless of whether the sample contains face key points at occluded positions.
S510: calculate, from the extracted face key point information, first distances between different key points on at least one organ.
In an optional example, for an organ of the face (such as the left eye, the right eye or the mouth), the neural network can extract multiple face key points, which can form at least one group of face key points; the application can calculate the distance between the two face key points in each group, obtaining at least one first distance.
As an optional example, when the organ is a single eye (left or right), the neural network can extract multiple upper-eyelid-line key points and multiple lower-eyelid-line key points, which are usually vertically opposed; each vertically opposed pair of an upper-eyelid-line key point and a lower-eyelid-line key point forms one group of eyelid-line key points, so the application can form multiple groups. From the coordinates of the two eyelid-line key points in a group, the distance between them can be calculated, yielding a first eyelid-line distance.
As an optional example, when the organ is the mouth, the neural network can extract multiple upper-lip upper-lip-line key points and multiple lower-lip lower-lip-line key points, which are usually vertically opposed; each opposed pair forms one group of outer-lip-line key points, so the application can form multiple groups. From the coordinates of the two lip-line key points in a group, the distance between them can be calculated, yielding a first outer-lip-line distance.
As an optional example, when the organ is the mouth, the neural network can extract multiple upper-lip lower-lip-line key points and multiple lower-lip upper-lip-line key points, which are usually vertically opposed; each opposed pair forms one group of inner-lip-line key points, so the application can form multiple groups. From the coordinates of the two lip-line key points in a group, the distance between them can be calculated, yielding a first inner-lip-line distance.
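Steps S510 and S520 perform the same paired-distance computation, once on extracted key points and once on annotated ones; a sketch under the assumption that corresponding upper and lower key points are stored in two equally indexed arrays:

```python
import numpy as np

def paired_distances(upper: np.ndarray, lower: np.ndarray) -> np.ndarray:
    """Distance between each vertically opposed pair of key points.

    upper, lower: (G, 2) arrays, row i of `upper` opposed to row i of `lower`
    (e.g. upper vs lower eyelid line, or the outer/inner lip lines). Returns
    the G per-group distances: first distances if the inputs are extracted
    key points, second distances if they are annotated ones.
    """
    return np.linalg.norm(upper - lower, axis=1)
```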
S520, according to face key point markup information calculate between the different key points at least one organ second away from From.
In an optional example, for an organ in face, multiple face key points are usually labeled with, are marked The multiple face key points outpoured can form at least one set of face key point, and the application can calculate two people in each group The distance between face key point, to obtain at least one second distance.
In one optional example, in the case where the organ is a single eye (the left eye or the right eye), multiple upper eyelid line key points and multiple lower eyelid line key points are usually annotated in the facial image sample. An upper eyelid line key point and a lower eyelid line key point are usually vertically opposed, and each vertically opposed pair can form one group of eyelid line key points, so that the present application can form multiple groups of eyelid line key points. The distance between the two eyelid line key points in a group can be calculated according to their annotated coordinates, thereby obtaining a second eyelid line distance.
In one optional example, in the case where the organ is the mouth, multiple upper-lip upper-lip-line key points and multiple lower-lip lower-lip-line key points are usually annotated in the facial image sample. An upper-lip upper-lip-line key point and a lower-lip lower-lip-line key point are usually vertically opposed, and each such vertically opposed pair can form one group of outer lip line key points, so that the present application can form multiple groups of outer lip line key points. The distance between the two lip line key points in a group can be calculated according to their annotated coordinates, thereby obtaining a second outer lip line distance.
In one optional example, in the case where the organ is the mouth, multiple upper-lip lower-lip-line key points and multiple lower-lip upper-lip-line key points are usually annotated in the facial image sample. An upper-lip lower-lip-line key point and a lower-lip upper-lip-line key point are usually vertically opposed, and each such vertically opposed pair can form one group of inner lip line key points, so that the present application can form multiple groups of inner lip line key points. The distance between the two lip line key points in a group can be calculated according to their annotated coordinates, thereby obtaining a second inner lip line distance.
S530: perform supervised learning on the neural network via a first loss function based on the above first distances and second distances.
In an optional example, the present application is usually provided with four first loss functions, corresponding respectively to the left eye, the right eye, the outer lip line, and the inner lip line. The first eyelid line distance and the second eyelid line distance calculated for the left eye eyelid lines are substituted into one first loss function; the first eyelid line distance and the second eyelid line distance calculated for the right eye eyelid lines are substituted into another first loss function; the first outer lip line distance and the second outer lip line distance are substituted into yet another first loss function; and the first inner lip line distance and the second inner lip line distance are substituted into the remaining first loss function. By performing supervised learning on the neural network with these four first loss functions, the present application helps improve the accuracy with which the neural network extracts face key points from facial images with large expressions or large-angle deflections.
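The disclosure does not fix the concrete form of the first loss functions. A minimal sketch, assuming each first loss function is an L1 penalty between the first and second distances of its region (the L1 form, the dictionary layout, and the placeholder values are assumptions of this illustration):

```python
import numpy as np

def first_loss(first_dists: np.ndarray, second_dists: np.ndarray) -> float:
    """One first loss function: L1 penalty between the first distances
    (from extracted key points) and the second distances (from the markup)."""
    return float(np.abs(first_dists - second_dists).mean())

# Four first loss functions, one per region as described above.
regions = ["left_eye", "right_eye", "outer_lip_line", "inner_lip_line"]
rng = np.random.default_rng(0)
first = {r: rng.uniform(2.0, 6.0, size=5) for r in regions}   # placeholder distances
second = {r: rng.uniform(2.0, 6.0, size=5) for r in regions}  # placeholder distances
total_first_loss = sum(first_loss(first[r], second[r]) for r in regions)
```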
In an optional example, the training process of the present application may further include step S540.
Step S540: for at least one group of key points formed by pairing each face key point extracted by the neural network with its corresponding annotated face key point (for example, two face key points sharing the same key point number form one group), calculate the distance between the two key points in each group, thereby obtaining at least one drift distance; and perform supervised learning on the neural network via a second loss function based on the drift distances. By performing supervised learning on the neural network with the second loss function, the present application helps improve the accuracy of face key point extraction by the neural network.
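A minimal sketch of step S540, assuming the extracted and annotated key points are ordered by their shared key point numbers and that the second loss averages the per-group Euclidean drift distances (the averaging and the placeholder coordinates are assumptions of this illustration):

```python
import numpy as np

def second_loss(extracted_pts: np.ndarray, annotated_pts: np.ndarray) -> float:
    """Mean Euclidean drift between extracted and annotated key points.

    extracted_pts, annotated_pts: (K, 2) arrays ordered by key point number,
    so row i of each array holds the two key points of one group.
    """
    drift = np.linalg.norm(extracted_pts - annotated_pts, axis=1)  # one drift distance per group
    return float(drift.mean())

# Placeholder call with random coordinates for 106 key points.
rng = np.random.default_rng(1)
loss = second_loss(rng.uniform(0, 128, (106, 2)), rng.uniform(0, 128, (106, 2)))
```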
Fig. 6 is a schematic structural diagram of an embodiment of the face image processing apparatus of the present application. As shown in Fig. 6, the apparatus of this embodiment includes: an input image module 600 and a key point acquisition module 610. Optionally, the face image processing apparatus of this embodiment may further include: an apparatus 620 for training a neural network.
The input image module 600 is configured to input a facial image to be processed into the neural network. The operations specifically performed by the input image module 600 are as described above for S100 in Fig. 1 and are not detailed again here.
The key point acquisition module 610 is configured to extract, via the neural network, face key point information of the facial image to be processed, the face key point information including at least one three-dimensional motionless key point. The operations specifically performed by the key point acquisition module 610 are as described above for S110 in Fig. 1 and are not detailed again here.
The apparatus 620 for training a neural network is configured to train the neural network of the present application with multiple facial image samples. For the specific structure of the apparatus 620 for training a neural network, refer to the description of Fig. 7 below; it is not detailed again here.
Fig. 7 is a schematic structural diagram of an embodiment of the apparatus 620 for training a neural network of the present application. As shown in Fig. 7, the apparatus of this embodiment includes: a sample acquisition module 621 and a training module 622. Optionally, the apparatus 620 for training a neural network may further include: a labeling module 623.
The sample acquisition module 621 is configured to obtain multiple facial image samples. Each facial image sample obtained by the sample acquisition module 621 is annotated with face key point markup information that includes at least one three-dimensional motionless key point.
The training module 622 is configured to train the neural network with the multiple facial image samples obtained by the sample acquisition module 621.
In an optional example, the training module 622 may be specifically configured to: input a facial image sample into the neural network; extract the face key point information of the facial image sample via the neural network; calculate, according to the extracted face key point information, first distances between different key points on at least one organ; calculate, according to the face key point markup information, second distances between different key points on the at least one organ; and perform supervised learning on the neural network via the first loss function based on the first distances and the second distances.
In an optional example, the training module 622 may further be specifically configured to: for at least one group of key points formed by pairing each extracted face key point with its corresponding annotated face key point, determine the distance between the two key points in the group, thereby obtaining at least one drift distance; and perform supervised learning on the neural network via the second loss function based on the drift distances.
The operations specifically performed by the training module 622 are as described above for the steps in Fig. 5 and are not detailed again here.
The labeling module 623 is configured to: determine curve control points of at least one face position in a facial image sample; form a curve from the curve control points; determine a point on the curve as a three-dimensional motionless key point of the facial image sample; select points on the curve on at least one side (for example, both sides) of the three-dimensional motionless key point of the facial image sample; and form face key point markup information from the selected points and the three-dimensional motionless key point of the facial image sample. The operations specifically performed by the labeling module 623 are as described above for the steps in Fig. 3 and are not detailed again here.
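A minimal sketch of the curve-based annotation performed by the labeling module 623, assuming a cubic spline through the control points and a fixed sampling window around the three-dimensional motionless key point (the spline, the window width, and the scipy dependency are assumptions of this illustration, not part of the disclosure):

```python
import numpy as np
from scipy.interpolate import CubicSpline

def annotate_along_curve(control_pts: np.ndarray, center_idx: int,
                         n_side: int = 2) -> np.ndarray:
    """Form a curve from control points and sample key points on both sides
    of the three-dimensional motionless key point.

    control_pts: (M, 2) curve control points of one face position (M >= 4).
    center_idx: index of the control point taken as the motionless key point.
    n_side: number of key points to select on each side of it.
    """
    t = np.linspace(0.0, 1.0, len(control_pts))
    curve = CubicSpline(t, control_pts)            # component-wise cubic spline
    half_window = 0.3                              # arbitrary sampling window
    ts = np.linspace(t[center_idx] - half_window,
                     t[center_idx] + half_window, 2 * n_side + 1)
    return curve(np.clip(ts, 0.0, 1.0))            # (2*n_side+1, 2) key points

# Example: five control points along an eyelid, center point as motionless key point.
pts = np.array([[0.0, 0.0], [1.0, 0.8], [2.0, 1.0], [3.0, 0.8], [4.0, 0.0]])
key_points = annotate_along_curve(pts, center_idx=2, n_side=2)
```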
Example devices
Fig. 8 shows an example device 800 suitable for implementing the present application. The device 800 may be a control system/electronic system configured in an automobile, a mobile terminal (for example, a smart phone), a personal computer (PC, for example, a desktop or notebook computer), a tablet computer, a server, or the like. In Fig. 8, the device 800 includes one or more processors, a communication unit, and the like. The one or more processors may be, for example: one or more central processing units (CPUs) 801 and/or one or more graphics processors (GPUs) 813. The processors may perform various appropriate actions and processing according to executable instructions stored in a read-only memory (ROM) 802 or loaded from a storage section 808 into a random access memory (RAM) 803. The communication unit 812 may include, but is not limited to, a network card, which may include, but is not limited to, an IB (InfiniBand) network card. The processors may communicate with the read-only memory 802 and/or the random access memory 803 to execute executable instructions, connect to the communication unit 812 through a bus 804, and communicate with other target devices via the communication unit 812, thereby completing the corresponding steps of the present application.
In an optional example, when executing the instructions, the processor implements the steps of any face image processing method of the present application, or the steps of any method for training a neural network of the present application. For the steps implemented by the processor, refer to the related descriptions in the above method embodiments; they are not detailed again here.
In addition, the RAM 803 may also store various programs and data required for the operation of the device. The CPU 801, the ROM 802, and the RAM 803 are connected to one another through the bus 804. When the RAM 803 is present, the ROM 802 is an optional module. The RAM 803 stores executable instructions, or executable instructions are written into the ROM 802 at runtime, and the executable instructions cause the central processing unit 801 to execute the steps included in the above face image processing method. An input/output (I/O) interface 805 is also connected to the bus 804. The communication unit 812 may be provided integrally, or may be provided with multiple sub-modules (for example, multiple IB network cards) each connected to the bus.
The following components are connected to the I/O interface 805: an input section 806 including a keyboard, a mouse, and the like; an output section 807 including a cathode ray tube (CRT), a liquid crystal display (LCD), a loudspeaker, and the like; a storage section 808 including a hard disk and the like; and a communication section 809 including a network interface card such as a LAN card or a modem. The communication section 809 performs communication processing via a network such as the Internet. A drive 810 is also connected to the I/O interface 805 as needed. A removable medium 811, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 810 as needed, so that a computer program read therefrom can be installed into the storage section 808 as needed.
It should be specially noted that the architecture shown in Fig. 8 is only one optional implementation. In practice, the number and types of the components in Fig. 8 may be selected, deleted, added, or replaced according to actual needs. For different functional components, separate or integrated arrangements may also be adopted; for example, the GPU and the CPU may be arranged separately, or the GPU may be integrated on the CPU; the communication unit may be arranged separately, or may be integrated on the CPU or the GPU. These alternative embodiments all fall within the protection scope of the present application.
In particular, according to the embodiments of the present application, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, an embodiment of the present application includes a computer program product comprising a computer program stored thereon; when the computer program is executed by a processor, the steps of any face image processing method of the present application, or the steps of any method for training a neural network of the present application, are implemented.
In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 809, and/or installed from the removable medium 811. When the computer program is executed by the central processing unit (CPU) 801, the above-described steps of the present application are executed.
The methods and apparatuses, electronic devices, and computer-readable storage media of the present application may be implemented in many ways, for example, by software, hardware, firmware, or any combination of software, hardware, and firmware. The above order of the steps of the methods is merely illustrative, and the steps of the methods of the present application are not limited to the order specifically described above, unless otherwise specifically stated. In addition, in some embodiments, the present application may also be implemented as programs recorded in a recording medium, the programs including machine-readable instructions for implementing the methods of the present application. Thus, the present application also covers a recording medium (such as a computer-readable storage medium) storing a computer program for executing the face image processing method and/or the method for training a neural network of the present application.
The description of the present application is given for the purposes of illustration and description, and is not intended to be exhaustive or to limit the present application to the disclosed form. Many modifications and variations are obvious to those of ordinary skill in the art. The embodiments were selected and described in order to better illustrate the principles and practical applications of the present application, and to enable those skilled in the art to understand the present application so as to design various embodiments, with various modifications, suited to particular uses.

Claims (10)

1. A face image processing method, characterized by comprising:
inputting a facial image to be processed into a neural network; and
extracting, via the neural network, face key point information of the facial image to be processed, the face key point information including at least one three-dimensional motionless key point.
2. The face image processing method according to claim 1, characterized in that the face key point information extracted by the neural network includes: face key point information located at a non-occluded position and/or at an occluded position in the facial image to be processed.
3. The face image processing method according to claim 1 or 2, characterized in that the three-dimensional motionless key point includes at least one of: an eyelid center key point, a nose lower edge center key point, a lip peak key point, a lip center key point, and a chin center key point.
4. The face image processing method according to claim 3, characterized in that the eyelid center key point includes at least one of: a left eye upper eyelid center key point, a left eye lower eyelid center key point, a right eye upper eyelid center key point, and a right eye lower eyelid center key point.
5. A method for training a neural network, characterized in that the method comprises:
obtaining multiple facial image samples, wherein each facial image sample is annotated with face key point markup information that includes at least one three-dimensional motionless key point; and
training the neural network with the multiple facial image samples.
6. A face image processing apparatus, characterized by comprising:
an input image module, configured to input a facial image to be processed into a neural network; and
a key point acquisition module, configured to extract, via the neural network, face key point information of the facial image to be processed, the face key point information including at least one three-dimensional motionless key point.
7. An apparatus for training a neural network, characterized in that the apparatus comprises:
a sample acquisition module, configured to obtain multiple facial image samples, wherein each facial image sample is annotated with face key point markup information that includes at least one three-dimensional motionless key point; and
a training module, configured to train the neural network with the multiple facial image samples.
8. An electronic device, comprising:
a memory, configured to store a computer program; and
a processor, configured to execute the computer program stored in the memory, wherein when the computer program is executed, the steps of the face image processing method according to any one of claims 1 to 4, or the steps of the method for training a neural network according to claim 5, are implemented.
9. A computer-readable storage medium on which a computer program is stored, characterized in that when the computer program is executed by a processor, the steps of the face image processing method according to any one of claims 1 to 4, or the steps of the method for training a neural network according to claim 5, are implemented.
10. A computer program, comprising computer instructions, characterized in that when the computer instructions are run in a processor of a device, the steps of the face image processing method according to any one of claims 1 to 4, or the steps of the method for training a neural network according to claim 5, are implemented.
CN201711352295.9A 2017-12-15 2017-12-15 Face image processing method, face image processing device, electronic apparatus, storage medium, and program Active CN109934058B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202110982660.4A CN113688737A (en) 2017-12-15 2017-12-15 Face image processing method, face image processing device, electronic apparatus, storage medium, and program
CN201711352295.9A CN109934058B (en) 2017-12-15 2017-12-15 Face image processing method, face image processing device, electronic apparatus, storage medium, and program

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711352295.9A CN109934058B (en) 2017-12-15 2017-12-15 Face image processing method, face image processing device, electronic apparatus, storage medium, and program

Related Child Applications (1)

Application Number Title Priority Date Filing Date
CN202110982660.4A Division CN113688737A (en) 2017-12-15 2017-12-15 Face image processing method, face image processing device, electronic apparatus, storage medium, and program

Publications (2)

Publication Number Publication Date
CN109934058A true CN109934058A (en) 2019-06-25
CN109934058B CN109934058B (en) 2021-08-06

Family

ID=66980352

Family Applications (2)

Application Number Title Priority Date Filing Date
CN202110982660.4A Pending CN113688737A (en) 2017-12-15 2017-12-15 Face image processing method, face image processing device, electronic apparatus, storage medium, and program
CN201711352295.9A Active CN109934058B (en) 2017-12-15 2017-12-15 Face image processing method, face image processing device, electronic apparatus, storage medium, and program

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CN202110982660.4A Pending CN113688737A (en) 2017-12-15 2017-12-15 Face image processing method, face image processing device, electronic apparatus, storage medium, and program

Country Status (1)

Country Link
CN (2) CN113688737A (en)

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106203242B (en) * 2015-05-07 2019-12-24 阿里巴巴集团控股有限公司 Similar image identification method and equipment
CN104899563B (en) * 2015-05-29 2020-01-07 深圳大学 Two-dimensional face key feature point positioning method and system

Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105868769A (en) * 2015-01-23 2016-08-17 阿里巴巴集团控股有限公司 Method and device for positioning face key points in image
CN104778712A (en) * 2015-04-27 2015-07-15 厦门美图之家科技有限公司 Method and system for pasting image to human face based on affine transformation
CN105205827A (en) * 2015-10-16 2015-12-30 中科院成都信息技术股份有限公司 Auxiliary feature point labeling method for statistical shape model
CN105760836A (en) * 2016-02-17 2016-07-13 厦门美图之家科技有限公司 Multi-angle face alignment method based on deep learning and system thereof and photographing terminal
CN106203376A (en) * 2016-07-19 2016-12-07 北京旷视科技有限公司 Face key point localization method and device
CN106295533A (en) * 2016-08-01 2017-01-04 厦门美图之家科技有限公司 Optimization method, device and the camera terminal of a kind of image of autodyning
CN106548179A (en) * 2016-09-29 2017-03-29 北京市商汤科技开发有限公司 The detection method of object and dress ornament key point, device and electronic equipment
CN106951840A (en) * 2017-03-09 2017-07-14 北京工业大学 A kind of facial feature points detection method
CN107122705A (en) * 2017-03-17 2017-09-01 中国科学院自动化研究所 Face critical point detection method based on three-dimensional face model
CN107016370A (en) * 2017-04-10 2017-08-04 电子科技大学 One kind is based on the enhanced partial occlusion face identification method of data
CN107239758A (en) * 2017-05-24 2017-10-10 北京小米移动软件有限公司 The method and device of face key point location
CN107451965A (en) * 2017-07-24 2017-12-08 深圳市智美达科技股份有限公司 Distort face image correcting method, device, computer equipment and storage medium

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113689527A (en) * 2020-05-15 2021-11-23 武汉Tcl集团工业研究院有限公司 Training method of face conversion model and face image conversion method
CN112069993A (en) * 2020-09-04 2020-12-11 西安西图之光智能科技有限公司 Dense face detection method and system based on facial features mask constraint and storage medium
CN112069993B (en) * 2020-09-04 2024-02-13 西安西图之光智能科技有限公司 Dense face detection method and system based on five-sense organ mask constraint and storage medium

Also Published As

Publication number Publication date
CN113688737A (en) 2021-11-23
CN109934058B (en) 2021-08-06

Similar Documents

Publication Publication Date Title
US11341769B2 (en) Face pose analysis method, electronic device, and storage medium
KR102347336B1 (en) Gaze point determination method and apparatus, electronic device and computer storage medium
CN110610453B (en) Image processing method and device and computer readable storage medium
CN108229293A (en) Face image processing process, device and electronic equipment
CN101339606B (en) Human face critical organ contour characteristic points positioning and tracking method and device
CN108229470A (en) character image processing method, device, equipment and storage medium
CN109325972A (en) Processing method, device, equipment and the medium of laser radar sparse depth figure
CN109960974A (en) Face critical point detection method, apparatus, electronic equipment and storage medium
CN108734078B (en) Image processing method, image processing apparatus, electronic device, storage medium, and program
CN106097833B (en) A kind of Educational toy external member and its digit recognition method
CN109598198A (en) The method, apparatus of gesture moving direction, medium, program and equipment for identification
JP2009545813A (en) Establish and use dominant lines of images
US10853651B2 (en) Virtual reality interaction method, apparatus and system
CN108229301A (en) Eyelid line detecting method, device and electronic equipment
US10401947B2 (en) Method for simulating and controlling virtual sphere in a mobile device
US20210407125A1 (en) Object recognition neural network for amodal center prediction
CN109934058A (en) Face image processing process, device, electronic equipment, storage medium and program
US20170358273A1 (en) Systems and methods for resolution adjustment of streamed video imaging
CN108288299A (en) A kind of method and system of expression data driving three-dimensional image expression
CN113989831A (en) Myopia prevention and control method, device, terminal equipment and storage medium
WO2022026603A1 (en) Object recognition neural network training using multiple data sources
Thiengtham et al. Improve template matching method in mobile augmented reality for thai alphabet learning
CN110308785A (en) E-book implementation method based on AR technology
EP4318314A1 (en) Image acquisition model training method and apparatus, image detection method and apparatus, and device
CN106406507A (en) Image processing method and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant