CN108875520B - Method, device and system for positioning face shape point and computer storage medium - Google Patents

Method, device and system for positioning face shape point and computer storage medium

Info

Publication number
CN108875520B
CN108875520B (application CN201711386408.7A)
Authority
CN
China
Prior art keywords
face
shape points
sparse
dense
data set
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201711386408.7A
Other languages
Chinese (zh)
Other versions
CN108875520A (en)
Inventor
熊鹏飞 (Xiong Pengfei)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Kuangshi Technology Co Ltd
Original Assignee
Beijing Kuangshi Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Kuangshi Technology Co Ltd filed Critical Beijing Kuangshi Technology Co Ltd
Priority to CN201711386408.7A
Publication of CN108875520A
Application granted
Publication of CN108875520B
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16: Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/161: Detection; Localisation; Normalisation
    • G06V 40/168: Feature extraction; Face representation

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The embodiments of the invention provide a method, an apparatus, and a system for positioning face shape points, and a computer storage medium. The method comprises: performing face detection on an input image and determining that the input image contains a face; acquiring sparse face shape points of the face in the input image; aligning the face in the input image using the sparse face shape points to obtain an aligned input image; and inputting the aligned input image into a trained dense point positioning model to obtain dense face shape points of the face in the input image. Using the trained dense point positioning model to obtain dense face shape points allows each organ on the face to be described more accurately, providing an accurate basis for processing such as three-dimensional face reconstruction. The process is fast and efficient, and is easy to implement on devices such as mobile terminals.

Description

Method, device and system for positioning face shape point and computer storage medium
Technical Field
The present invention relates to the field of image processing, and more particularly to a method, an apparatus, a system, and a computer storage medium for positioning face shape points.
Background
Face shape point positioning is a core technology among face-related technologies and a classic problem in the field of computer vision. With the development of smartphones, face shape point positioning has become widely used in face-related entertainment applications (APPs), for example three-dimensional face reconstruction and face beautification.
Face shape point positioning can be performed in a short time using a small number of sparse points. However, because face shapes and expressions are highly complex, a small number of sparse points cannot accurately describe the complete contour of each organ, so the face produced by subsequent processing (such as three-dimensional reconstruction) may be distorted. A technique that positions face shape points both quickly and accurately is therefore urgently needed.
Disclosure of Invention
The present invention has been made in view of the above problems. It provides a method, an apparatus, and a system for positioning face shape points, and a computer storage medium, which obtain dense face shape points of a face in an input image based on a trained dense point positioning model, so as to accurately describe each organ on the face.
According to an aspect of the present invention, there is provided a method for locating face shape points, the method comprising:
carrying out face detection on an input image, and determining that the input image contains a face;
acquiring sparse face shape points of the face in the input image;
aligning the face in the input image with the sparse face shape points to obtain an aligned input image;
and inputting the aligned input image into a trained dense point positioning model to obtain dense face shape points of the face in the input image.
In one embodiment of the present invention, the dense point localization model is obtained by training:
constructing a data set according to a plurality of first face images, wherein the face images in the data set have labeled face shape points;
training the dense point localization model based on the dataset until convergence.
In one embodiment of the present invention, the constructing the data set from the plurality of first face images includes:
performing the following operation on each of the plurality of first face images, and constructing the plurality of operated first face images into the data set:
acquiring a plurality of sparse shape points on a face in a first face image;
obtaining the contour line of the organ on the human face according to the sparse shape points;
sampling on the contour line to obtain dense shape points on the human face;
and taking the dense shape points as the labeled human face shape points.
In one embodiment of the invention, the data set comprises a sparse shape point data set and a dense shape point data set,
the constructing a data set from a plurality of first face images comprises:
constructing the dense shape point data set according to the first face images;
and constructing the sparse shape point data set according to the plurality of second face images.
In one embodiment of the invention, said constructing the dense shape point data set from the plurality of first face images comprises:
performing the following operation for each of the plurality of first face images, and constructing the plurality of first face images after the operation as the dense shape point data set:
acquiring a plurality of sparse shape points on a face in a first face image;
obtaining the contour line of the organ on the human face according to the sparse shape points;
sampling on the contour line to obtain dense shape points on the human face;
and taking the dense shape points as the labeled human face shape points.
In an embodiment of the present invention, the constructing the sparse shape point data set from the plurality of second face images includes:
acquiring a plurality of sparse shape points on the face in each second face image;
constructing a plurality of second face images having the plurality of sparse shape points as the sparse shape point data set.
In one embodiment of the invention, the number of images in the dense shape point data set is smaller than the number of images in the sparse shape point data set.
In one embodiment of the invention, during the training process:
calculating a weighted sum of values of a loss function based on the dense shape point data set and values of a loss function based on the sparse shape point data set;
and performing parameter optimization on the dense point positioning model according to the weighted sum until convergence.
In an embodiment of the present invention, the acquiring a plurality of sparse shape points on a face in a first face image includes:
and inputting the first face image into a sparse point positioning model to obtain the plurality of sparse shape points on the face.
In an embodiment of the present invention, the obtaining, according to the plurality of sparse shape points, a contour line of an organ on the human face includes:
performing curve fitting on the plurality of sparse shape points to obtain a fitting curve;
and adjusting the fitting curve to obtain the contour line.
In an embodiment of the present invention, the sampling on the contour line to obtain dense shape points on the face includes:
and determining the sampled points on the contour line as the dense shape points by using a uniform sampling mode.
In one embodiment of the invention, contour lines of faces of different sizes have different sampling steps, and/or contour lines of different organs on the face have different sampling steps.
In one embodiment of the invention, the loss function during training is determined as follows:
acquiring a first group of shape points marked by a face image in a data set;
acquiring a second group of shape points output by a dense point positioning model in training;
calculating the nearest matching points between the first group of shape points and the second group of shape points by adopting a dynamic programming method;
determining the loss function according to the distance between the nearest matching points.
In an embodiment of the present invention, the acquiring sparse face shape points of the face in the input image includes:
and inputting the input image into a sparse point positioning model to obtain the sparse face shape points of the face in the input image.
According to another aspect of the present invention, there is provided an apparatus for locating face shape points, the apparatus being adapted to implement the steps of the method of the preceding aspect or each embodiment, the apparatus comprising:
the face detection module is used for carrying out face detection on an input image and determining that the input image contains a face;
an obtaining module, configured to obtain sparse face shape points of the face in the input image;
the alignment module is used for aligning the face in the input image with the sparse face shape points to obtain an aligned input image;
and the output module is used for inputting the aligned input images to the trained dense point positioning model to obtain dense face shape points of the face in the input images.
According to a further aspect of the present invention, there is provided a system for positioning face shape points, comprising a memory, a processor, and a computer program stored on the memory and runnable on the processor, wherein the processor, when executing the computer program, implements the steps of the method for positioning face shape points according to the foregoing aspect and embodiments.
According to a further aspect of the present invention, there is provided a computer storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the method of face shape point localization described in the preceding aspects and examples.
Therefore, the embodiments of the invention can train a dense point positioning model that outputs a larger number of shape points, thereby describing each organ on the face more accurately and providing an accurate basis for processing such as three-dimensional face reconstruction. The process is fast and efficient, and is easy to implement on devices such as mobile terminals.
Drawings
The above and other objects, features and advantages of the present invention will become more apparent by describing in more detail embodiments of the present invention with reference to the attached drawings. The accompanying drawings are included to provide a further understanding of the embodiments of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention. In the drawings, like reference numbers generally represent like parts or steps.
FIG. 1 is a schematic block diagram of an electronic device of an embodiment of the present invention;
FIG. 2 is a schematic flow chart of a method of face shape point location in accordance with an embodiment of the present invention;
FIG. 3 is another schematic flow chart of a method of face shape point localization in accordance with an embodiment of the present invention;
FIG. 4 is a schematic flow chart diagram of a method of training a dense point localization model of an embodiment of the present invention;
fig. 5 is a schematic block diagram of an apparatus for locating face shape points according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions, and advantages of the present invention more apparent, exemplary embodiments according to the present invention will be described in detail below with reference to the accompanying drawings. It is to be understood that the described embodiments are merely some, and not all, of the embodiments of the invention, and that the invention is not limited to the example embodiments described herein. All other embodiments obtained by a person skilled in the art from the embodiments described herein without inventive effort shall fall within the scope of protection of the invention.
The embodiment of the present invention can be applied to an electronic device, and fig. 1 is a schematic block diagram of the electronic device according to the embodiment of the present invention. The electronic device 10 shown in FIG. 1 includes one or more processors 102, one or more memory devices 104, an input device 106, an output device 108, an image sensor 110, and one or more non-image sensors 114, which are interconnected by a bus system 112 and/or otherwise. It should be noted that the components and configuration of the electronic device 10 shown in FIG. 1 are exemplary only, and not limiting, and that the electronic device may have other components and configurations as desired.
The processor 102 may include a Central Processing Unit (CPU) 1021 and a Graphics Processing Unit (GPU) 1022, or other forms of processing units having data processing capability and/or instruction execution capability, such as a Field-Programmable Gate Array (FPGA) or an Advanced RISC Machine (ARM), and the processor 102 may control other components in the electronic device 10 to perform desired functions.
The storage 104 may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory 1041 and/or non-volatile memory 1042. The volatile memory 1041 may include, for example, Random Access Memory (RAM) and/or cache memory. The non-volatile memory 1042 may include, for example, Read-Only Memory (ROM), a hard disk, flash memory, and the like. One or more computer program instructions may be stored on the computer-readable storage medium and executed by the processor 102 to implement various desired functions. Various applications and various data, such as data used and/or generated by the applications, may also be stored in the computer-readable storage medium.
The input device 106 may be a device used by a user to input instructions and may include one or more of a keyboard, a mouse, a microphone, a touch screen, and the like.
The output device 108 may output various information (e.g., images or sounds) to an outside (e.g., a user), and may include one or more of a display, a speaker, and the like.
The image sensor 110 may take images (e.g., photographs, videos, etc.) desired by the user and store the taken images in the storage device 104 for use by other components.
It should be noted that the components and structure of the electronic device 10 shown in fig. 1 are merely exemplary; although the electronic device 10 shown in fig. 1 includes a plurality of different devices, some of them may be omitted or present in greater numbers as desired, and the invention is not limited in this respect.
Fig. 2 is a schematic flow chart of a method for locating face shape points according to an embodiment of the present invention. The method shown in fig. 2 may include:
and S101, obtaining the trained dense point positioning model.
And S102, inputting the input image into the trained dense point positioning model to obtain dense face shape points of the face in the input image.
Alternatively, the dense point location model may be trained on the basis of the sparse point location model, and the training process may refer to the detailed description of fig. 4.
Illustratively, the number of dense face shape points output by the dense point localization model may be represented as N, where N is a positive integer. For example, the value of N may be 266 or 1064, etc.
Before S102, the method may further include: and carrying out face detection on the input image, and determining that the input image contains a face. For example, an input image may be input to a face detection model, resulting in a face region in the input image. Alternatively, the face detection model may be in the form of a convolutional neural network.
Exemplarily, the method for positioning a face shape point provided by the embodiment of the present invention may be as shown in fig. 3, and includes:
s201, carrying out face detection on an input image, and determining that the input image contains a face.
S202, acquiring sparse face shape points of the face in the input image.
And S203, aligning the human faces in the input image with the sparse human face shape points to obtain an aligned input image.
And S204, inputting the aligned input image to the trained dense point positioning model to obtain dense face shape points of the face in the input image.
As an implementation manner, S202 may include: and inputting the input image into a sparse point positioning model to obtain the sparse face shape points of the face in the input image.
The number of sparse face shape points output by the sparse point positioning model can be expressed as M, wherein M is a positive integer. For example, the value of M may be 68, 74, 87, 106, or the like. It should be noted that in the embodiment of the present invention, N > M. The sparse face shape points obtained in S202 may be recorded as M shape points, and the dense face shape points obtained in S204 may be recorded as N shape points.
Thus, in the embodiment of fig. 3, M shape points on the face may first be obtained based on the sparse point positioning model, and N shape points may then be obtained based on the trained dense point positioning model. Since N > M, the N shape points describe the contours of the facial organs better, so the shape of each organ on the face can be obtained from the N shape points. Moreover, the serial numbers of the shape points output by the dense point positioning model may be fixed at certain organ-specific positions; for example, the serial number of the shape point at the outer canthus of the left eye may be fixed as a1, and that at the inner canthus of the left eye as a2. The shape of an organ can then be determined from the corresponding serial numbers.
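As an illustration only, a minimal Python sketch of this M-point-then-N-point pipeline is given below. The detect_face, sparse_model, and dense_model callables and the canonical sparse_template are hypothetical placeholders, not names from this disclosure; the similarity-transform alignment is one common way to realize step S203.

```python
import cv2
import numpy as np

def locate_dense_points(image, detect_face, sparse_model, dense_model,
                        sparse_template, out_size=(256, 256)):
    """Sketch of steps S201-S204: detect, sparse-locate, align, densify."""
    face_box = detect_face(image)                          # S201: face detection
    if face_box is None:
        return None                                        # no face in the image
    sparse_pts = sparse_model(image, face_box)             # S202: (M, 2) points
    # S203: estimate a similarity transform mapping the M sparse points onto
    # a canonical template, then warp the image so the face is aligned.
    T, _ = cv2.estimateAffinePartial2D(
        sparse_pts.astype(np.float32), sparse_template.astype(np.float32))
    aligned = cv2.warpAffine(image, T, out_size)
    dense_pts = dense_model(aligned)                       # S204: (N, 2) points
    # Map the N dense points back into original-image coordinates.
    T_inv = cv2.invertAffineTransform(T)
    return cv2.transform(
        dense_pts.reshape(-1, 1, 2).astype(np.float32), T_inv).reshape(-1, 2)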
Referring to the method shown in fig. 3, N shape points of the face can be obtained for every frame of an image sequence (video). That is, each frame may serve as the input image in fig. 3, yielding the N face shape points of that frame. For example, face detection may be performed on the 0th (initial) frame of the sequence, alignment may be performed based on the M shape points obtained by the sparse point positioning model, and the aligned 0th frame face image may then be input to the dense point positioning model to obtain the N shape points on the 0th frame. Subsequently, the 1st frame face image can be aligned based on the 0th frame face image and input to the dense point positioning model to obtain the N shape points on the 1st frame, and so on. Aligning each frame based on the adjacent previous frame improves the efficiency of the process. In this way, N shape points can be obtained for the face in every frame of the image sequence.
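Continuing the same sketch under the same assumptions, the frame-by-frame scheme could reuse the previous frame's dense points for alignment, skipping detection and sparse localization on frames 1, 2, and so on. The dense_template (a canonical layout of the N points) is likewise a hypothetical placeholder:

```python
def locate_dense_points_video(frames, detect_face, sparse_model, dense_model,
                              sparse_template, dense_template,
                              out_size=(256, 256)):
    """Frame 0 runs the full pipeline; later frames align via the previous
    frame's N points, which is what keeps the per-frame cost low."""
    results, prev_pts = [], None
    for frame in frames:
        if prev_pts is None:                               # frame 0
            pts = locate_dense_points(frame, detect_face, sparse_model,
                                      dense_model, sparse_template, out_size)
        else:                                              # frames 1, 2, ...
            T, _ = cv2.estimateAffinePartial2D(
                prev_pts.astype(np.float32), dense_template.astype(np.float32))
            aligned = cv2.warpAffine(frame, T, out_size)
            T_inv = cv2.invertAffineTransform(T)
            pts = cv2.transform(
                dense_model(aligned).reshape(-1, 1, 2).astype(np.float32),
                T_inv).reshape(-1, 2)
        results.append(pts)
        prev_pts = pts
    return results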
Based on the above analysis, the embodiments of the invention can use the trained dense point positioning model to obtain dense face shape points, thereby describing each organ on the face more accurately and providing an accurate basis for processing such as three-dimensional face reconstruction. The process is fast and efficient, and is easy to implement on devices such as mobile terminals.
Before the method shown in fig. 2 or fig. 3, obtaining a dense point location model by means of training is further included. The training process may be as shown in fig. 4, including:
s301, constructing a data set according to a plurality of first face images, wherein the face images in the data set have labeled face shape points;
s302, training the dense point positioning model based on the data set until convergence.
As one implementation, the data set in S301 may be a dense shape point data set, that is, each face image in the data set is labeled with dense shape points.
Exemplarily, S301 may include: performing the following operation on each of the plurality of first face images, and constructing the plurality of operated first face images into the data set: acquiring a plurality of sparse shape points on a face in a first face image; obtaining the contour line of the organ on the human face according to the sparse shape points; sampling on the contour line to obtain dense shape points on the human face; and taking the dense shape points as the labeled human face shape points.
Here, the number of the plurality of first face images may be thousands or tens of thousands, or may be more or less, and the present invention is not limited thereto.
For clarity of description, the first face image P1 is taken as an example for illustration. A plurality of sparse shape points on the face in the first face image P1 may be acquired; obtaining the contour line of the organ on the human face according to the sparse shape points; and sampling on the contour line to obtain dense shape points on the human face. Subsequently, the first face image P1 labeled with dense shape points may be taken as one image in the data set.
As an example, a plurality of sparse shape points on the face in the artificially labeled first face image P1 may be acquired. As another example, the first face image P1 may be input to a sparse point localization model to obtain the plurality of sparse shape points on the face, for example, M.
Obtaining a contour line of an organ on the face according to the plurality of sparse shape points may include: performing curve fitting on the plurality of sparse shape points to obtain a fitting curve; and adjusting the fitting curve to obtain the contour line.
Specifically, the plurality of sparse shape points may serve as anchor points, and these anchor points may be fitted to obtain a fitted curve; for example, a Bézier curve may be fitted between every two adjacent anchor points. Subsequently, the fitted curve may be adjusted; for example, partial regions of the fitted curve that clearly do not conform to an organ of the face may be corrected.
Illustratively, the fitted curve can be adjusted according to adjustment instructions from an annotator. Specifically, the annotator manually places adjustment points (one or several) anywhere on each Bézier segment and drags them to fit the contour of the corresponding organ on the face; these adjustment points can then also serve as anchor points. The curve is refitted based on the plurality of sparse shape points together with the newly placed adjustment points (i.e., all anchor points). Owing to the annotator's adjustments, the refitted curve is smoother and conforms better to the organ's contour, so the adjusted fitted curve can be used as the contour line of the organ on the face. It is understood that an adjustment instruction is an instruction by which the annotator places adjustment points on the fitted curve and adjusts their positions.
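For illustration, one way to realize the automatic fitting step (before any annotator adjustment) is piecewise cubic Bézier segments between adjacent anchors. The patent does not specify how the Bézier control points are chosen, so the standard Catmull-Rom-to-Bézier conversion used below is an assumption:

```python
import numpy as np

def bezier_segment(p0, p1, p2, p3, n_samples=20):
    """Cubic Bezier from p1 to p2; control points come from the neighboring
    anchors p0 and p3 via the Catmull-Rom-to-Bezier conversion (assumed)."""
    c1 = p1 + (p2 - p0) / 6.0
    c2 = p2 - (p3 - p1) / 6.0
    t = np.linspace(0.0, 1.0, n_samples)[:, None]
    return ((1 - t) ** 3 * p1 + 3 * (1 - t) ** 2 * t * c1
            + 3 * (1 - t) * t ** 2 * c2 + t ** 3 * p2)

def fit_contour(anchors, n_samples=20):
    """Fitted curve through consecutive anchor points (sparse shape points,
    plus any annotator-added adjustment points)."""
    a = np.asarray(anchors, dtype=float)
    padded = np.vstack([a[:1], a, a[-1:]])        # clamp the open ends
    segments = [bezier_segment(*padded[i:i + 4], n_samples=n_samples)
                for i in range(len(a) - 1)]
    return np.vstack(segments)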
The sampling on the contour line to obtain dense shape points on the face may include: and determining the sampled points on the contour line as the dense shape points by using a uniform sampling mode. Wherein the step sizes of the samples of different organs on the face may be equal or unequal.
Specifically, for the contour line of an organ on the face in the first face image P1, a plurality of dense shape points can be obtained by uniform sampling along the contour line. As an example, the sampling step along the contour line may be fixed, for example at 1 mm. As another example, the step may vary from organ to organ; for example, since the nose generally moves less than the mouth, the step at the nose may be larger than the step at the mouth, that is, fewer dense shape points are placed at the nose. In this way, differences between organs are fully taken into account, saving processing time while ensuring accuracy.
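A sketch of the uniform sampling step with per-organ step sizes follows; the organ names and numeric steps in ORGAN_STEP are illustrative assumptions, not values from the patent:

```python
import numpy as np

def sample_contour(contour, step):
    """Place points at fixed arc-length intervals along a contour polyline."""
    contour = np.asarray(contour, dtype=float)
    seg_len = np.linalg.norm(np.diff(contour, axis=0), axis=1)
    arc = np.concatenate([[0.0], np.cumsum(seg_len)])      # cumulative length
    targets = np.arange(0.0, arc[-1], step)                # uniform grid
    x = np.interp(targets, arc, contour[:, 0])
    y = np.interp(targets, arc, contour[:, 1])
    return np.stack([x, y], axis=1)

# Coarser step where motion is small (nose), finer where it is large (mouth);
# the concrete values are assumptions for illustration only.
ORGAN_STEP = {"eyebrow": 3.0, "eye": 2.0, "nose": 4.0, "mouth": 2.0, "jaw": 3.0}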
It is understood that the number of dense shape points obtained is not fixed, since it depends on the size of the face in the first face image P1, the sampling step, and so on; for example, it may be equal to N, or greater or less than N.
For example, performing similar operations on the contour lines of the organs in each of the first face images yields the corresponding images labeled with dense shape points. Optionally, the sampling steps used for different images may be equal or unequal; likewise, the numbers of dense shape points obtained by sampling different images may be equal or unequal.
By the method described above for the first face image P1, similar processing may be performed for each of the plurality of first face images, resulting in the data set required for training the dense point positioning model; the details are not repeated here.
In this implementation, S302 may include: constructing a loss function; training the dense point localization model by iterating until the loss function converges.
In the training process, the input of the dense point positioning model is a face image, and its output is the N shape points of the face in that image. Optionally, serial numbers may be assigned to the N shape points; in particular, the serial numbers of shape points at specific positions of certain organs may be fixed. For example, the serial number of the shape point at the outer corner of the left eye may be set to 0, and that at the leftmost edge of the lips to 60, and so on. The dense point positioning model may include several convolutional layers and pooling layers, and may be, for example, a Convolutional Neural Network (CNN).
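The patent leaves the architecture open beyond "several convolutional layers and pooling layers"; a toy regressor of that shape is sketched below, where the layer widths, input size, and the use of PyTorch are all assumptions. Because the output size is fixed at N points, index i of the output always refers to the same semantic landmark, which is what makes the fixed serial numbers above possible.

```python
import torch.nn as nn

class DensePointNet(nn.Module):
    """Toy CNN that regresses N (x, y) shape points from an aligned face crop."""
    def __init__(self, n_points=266, in_size=128):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        side = in_size // 8                        # three 2x poolings
        self.head = nn.Linear(128 * side * side, n_points * 2)

    def forward(self, x):                          # x: (batch, 3, 128, 128)
        h = self.features(x).flatten(1)
        return self.head(h).view(x.shape[0], -1, 2)   # (batch, N, 2)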
Illustratively, the loss function at training can be determined as follows: acquiring a first group of shape points marked by a face image in a data set; acquiring a second group of shape points output by a dense point positioning model in training; calculating the nearest matching points between the first group of shape points and the second group of shape points by adopting a dynamic programming method; determining the loss function according to the distance between the nearest matching points.
Here, the error between the shape points output by the dense point positioning model and the shape points labeled in the data set may be calculated. That is, the loss function may be defined as the difference between the shape points output by the dense point positioning model and the shape points labeled in the data set. Since the model always outputs N shape points while the number of labeled shape points is variable (it depends on the original image, the sampling step, and so on), the two point sets must first be matched before the loss can be determined.
Specifically, the data set provides a first group of shape points, and the dense point positioning model outputs a second group of shape points. For these two groups, the closest matching points between them can be computed by dynamic programming, and the loss function is determined from the distances between the closest matching points. For example, the sum of the Euclidean distances of these closest matching points can be taken as the loss function; alternatively, their average can be taken as the loss function. When determining the closest matching points, a line-based matching method can be adopted, which avoids the influence of different annotators labeling shape points at slightly different positions. It is understood that the distance between matching points may be the Euclidean distance or another distance such as the cosine distance; these are not enumerated here.
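The patent does not give the exact dynamic program. A DTW-style monotone matching over the two ordered point sequences is one plausible reading, sketched below; the monotonicity assumption and the use of the total (rather than mean) distance are mine:

```python
import numpy as np

def dp_matching_loss(pred, label):
    """Total Euclidean distance over a monotone (DTW-style) matching between
    two ordered point sequences of possibly different lengths."""
    pred, label = np.asarray(pred, float), np.asarray(label, float)
    d = np.linalg.norm(pred[:, None, :] - label[None, :, :], axis=2)
    n, m = d.shape
    cost = np.full((n, m), np.inf)
    cost[0, 0] = d[0, 0]
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            prev = min(cost[i - 1, j] if i > 0 else np.inf,
                       cost[i, j - 1] if j > 0 else np.inf,
                       cost[i - 1, j - 1] if i > 0 and j > 0 else np.inf)
            cost[i, j] = d[i, j] + prev
    # Sum of distances along the optimal matching; dividing by the number of
    # matched pairs would give the mean variant the patent also mentions.
    return cost[n - 1, m - 1]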
A gradient descent method can be adopted, and the model parameters are trained by continuous iteration until the loss function converges; this speeds up convergence and thus shortens training. Specifically, if the difference between the results of two successive iterations of the loss function is less than a set threshold (e.g., 10⁻⁶), the loss function may be determined to have converged.
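The convergence test could be as simple as the following loop, where train_step stands for one gradient-descent update and is a hypothetical helper:

```python
prev_loss, eps = float("inf"), 1e-6       # the patent's example threshold
while True:
    loss = train_step()                   # one iteration of gradient descent
    if abs(prev_loss - loss) < eps:
        break                             # loss change below threshold: converged
    prev_loss = loss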
As another implementation, the data set in S301 may include a sparse shape point data set and a dense shape point data set. Wherein, each face image in the sparse shape point data set is marked with sparse shape points, and the sparse shape point data set can be referred to as a sparse data set; dense shape points are marked on each face image in the dense shape point data set, and the dense shape point data set can be referred to as a dense data set for short.
Exemplarily, S301 may include: constructing the dense data set according to the first face images; and constructing the sparse data set according to the plurality of second face images.
The number of the first human face images and the number of the second human face images can be equal or unequal. That is, the number of images in the dense data set may or may not be equal to the number of images in the sparse data set. As an example, the number of images in the dense data set is smaller than the number of images in the sparse data set, which can simplify the labeling process of the data set and improve the training efficiency.
Wherein the following operation is performed for each of the plurality of first face images, and the plurality of first face images after the operation are constructed as the dense data set: acquiring a plurality of sparse shape points on a face in a first face image; obtaining the contour line of the organ on the human face according to the sparse shape points; sampling on the contour line to obtain dense shape points on the human face; and taking the dense shape points as the labeled human face shape points. The process may specifically refer to the description of constructing the data set, and specifically combine the process of completing dense point labeling for the first face image P1, which is not described herein again to avoid repetition.
The constructing the sparse data set according to the plurality of second face images may include: acquiring a plurality of sparse shape points on the face in each second face image; constructing a plurality of second face images having the plurality of sparse shape points as the sparse data set. Here, the plurality of sparse shape points are labeled face shape points. As an example, the faces in the plurality of second face images may be manually labeled, so as to obtain a plurality of sparse shape points on the faces. As another example, each second face image may be input to a sparse point positioning model, and the plurality of sparse shape points, for example, M, on the face may be obtained.
Because the sparse data set can be constructed with an existing sparse point positioning model, while the dense data set requires curve fitting, adjustment, sampling, and similar steps, the number of images in the dense data set can be set smaller than the number in the sparse data set. This reduces the amount of curve fitting, adjustment, and sampling work, shortening training time and improving training efficiency.
In this implementation, S302 may include: constructing a loss function; training the dense point localization model by iterating until the loss function converges. In particular, a weighted sum of values of the loss function based on the dense shape point data set and values of the loss function based on the sparse shape point data set may be calculated; and performing parameter optimization on the dense point positioning model according to the weighted sum until convergence.
Illustratively, a first loss function may be constructed based on the dense data set, a second loss function may be constructed based on the sparse data set, and a weighted sum of the first loss function and the second loss function may be determined as the loss function used to train the dense point localization model. The process of determining the first loss function or determining the second loss function is similar to the process of determining the loss function in the above implementation manner, and may be, for example, the sum of euclidean distances between nearest matching points of two sets of shape points. For example, a first group of shape points labeled by the face image in the dense data set may be obtained; acquiring a second group of shape points output by a dense point positioning model in training; calculating the nearest matching points between the first group of shape points and the second group of shape points by adopting a dynamic programming method; determining a first loss function according to the distance between the nearest matching points. For example, a third group of shape points labeled by the face image in the sparse data set may be obtained; acquiring a fourth group of shape points output by a dense point positioning model in training; calculating the nearest matching points between the third group of shape points and the fourth group of shape points by adopting a dynamic programming method; and determining a second loss function according to the distance between the nearest matching points.
Specifically, in the training process of S302, since the dense point positioning model and the loss function are independent of the number of shape points, training on the dense data set and training on the sparse data set may proceed simultaneously, reducing training time and improving training efficiency. During training, the weights of the two data sets' loss terms may be adjusted according to the number of iterations; for example, at the beginning of training, a first weight for the dense data set may be set to 0.5 and a second weight for the sparse data set to 0.5. As the number of iterations increases, the first weight may be gradually increased and the second weight gradually decreased, until the first weight equals 1 and the second weight equals 0.
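A sketch of the weighted loss with the endpoint behavior described above; the linear schedule in between is an assumption, since the patent specifies only the starting weights (0.5/0.5) and the final ones (1/0):

```python
def scheduled_weights(step, total_steps):
    """Dense-set weight grows from 0.5 to 1.0; sparse-set weight shrinks to 0."""
    w_dense = 0.5 + 0.5 * min(step / float(total_steps), 1.0)
    return w_dense, 1.0 - w_dense

def total_loss(loss_dense, loss_sparse, step, total_steps):
    w_d, w_s = scheduled_weights(step, total_steps)
    return w_d * loss_dense + w_s * loss_sparse   # weighted sum of the two losses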
Thus, in this implementation, because the data set includes a sparse data set, image features associated with sparse points can also be learned during training, so that the trained dense point positioning model can handle a greater variety of face shapes.
Therefore, the embodiments of the invention can train a dense point positioning model that outputs a larger number of shape points, thereby describing each organ on the face more accurately and providing an accurate basis for processing such as three-dimensional face reconstruction. The process is fast and efficient, and is easy to implement on devices such as mobile terminals.
Fig. 5 is a schematic block diagram of an apparatus for locating face shape points according to an embodiment of the present invention. The apparatus 50 shown in fig. 5 comprises: a face detection module 510, an acquisition module 520, an alignment module 530, and an output module 540.
A face detection module 510, configured to perform face detection on an input image, and determine that the input image includes a face;
an obtaining module 520, configured to obtain sparse face shape points of the face in the input image;
an aligning module 530, configured to align a face in the input image with the sparse face shape points to obtain an aligned input image;
an output module 540, configured to input the aligned input image to the trained dense point positioning model, so as to obtain dense face shape points of the face in the input image.
As an implementation, the apparatus may further include a training module. And the training module comprises a construction submodule and a training submodule.
The construction submodule is used for constructing a data set according to a plurality of first face images, and the face images in the data set are provided with labeled face shape points. The training submodule is used for training the dense point positioning model based on the data set until convergence.
As one implementation, the construction submodule includes an acquisition unit, a first determination unit, and a second determination unit.
The construction sub-module is configured to perform, by the acquisition unit, the first determination unit, and the second determination unit, the following operation on each of the plurality of first face images, and construct the plurality of first face images after the operation as the data set: the acquiring unit is used for acquiring a plurality of sparse shape points on the face in the first face image; the first determining unit is used for obtaining the contour line of the organ on the human face according to the sparse shape points; and the second determining unit is used for sampling on the contour line to obtain dense shape points on the human face, and taking the dense shape points as the labeled human face shape points.
As one implementation, the data set includes a sparse shape point data set and a dense shape point data set. The building submodule may include a first building submodule and a second building submodule. Wherein the first construction sub-module is configured to construct the dense shape point data set from the plurality of first face images. And the second construction submodule is used for constructing the sparse shape point data set according to the plurality of second face images.
The first building sub-module may include an acquisition unit, a first determination unit, and a second determination unit. The construction sub-module is configured to perform, by the acquisition unit, the first determination unit, and the second determination unit, the following operation for each of the plurality of first face images, and construct the plurality of first face images after the operation as the dense shape point data set: the acquiring unit is used for acquiring a plurality of sparse shape points on the face in the first face image; the first determining unit is used for obtaining the contour line of the organ on the human face according to the sparse shape points; and the second determining unit is used for sampling on the contour line to obtain dense shape points on the human face, and taking the dense shape points as the labeled human face shape points.
The second building submodule may be specifically configured to: acquiring a plurality of sparse shape points on the face in each second face image; constructing a plurality of second face images having the plurality of sparse shape points as the sparse shape point data set.
Illustratively, the number of images in the dense shape point data set is less than the number of images in the sparse shape point data set.
Illustratively, in the training process: a weighted sum of values of a loss function based on the dense shape point data set and values of a loss function based on the sparse shape point data set may be calculated; and performing parameter optimization on the dense point positioning model according to the weighted sum until convergence.
For example, the obtaining unit may be specifically configured to input the first face image to a sparse point positioning model, and obtain the plurality of sparse shape points on the face.
Exemplarily, the first determining unit may be specifically configured to perform curve fitting on the plurality of sparse shape points to obtain a fitted curve; and adjusting the fitting curve to obtain the contour line.
For example, the second determination unit may be specifically configured to determine the sampled points on the contour line as the dense shape points using a uniform sampling manner.
Illustratively, the contour lines of different sizes of faces differ in sample step size, and/or the contour lines of different organs on the faces differ in sample step size.
As one implementation, the loss function used by the training module is determined as follows: acquiring a first group of shape points marked by a face image in a data set; acquiring a second group of shape points output by a dense point positioning model in training; calculating the nearest matching points between the first group of shape points and the second group of shape points by adopting a dynamic programming method; determining the loss function according to the distance between the nearest matching points.
As an implementation manner, the obtaining module 520 may specifically be configured to: and inputting the input image into a sparse point positioning model to obtain the sparse face shape points of the face in the input image.
The apparatus 50 shown in fig. 5 can implement the method shown in fig. 2 to 4, and is not described herein again to avoid repetition.
In addition, another system for positioning face shape points is further provided in an embodiment of the present invention, including a memory, a processor, and a computer program stored in the memory and running on the processor, where the processor implements the steps of the method for positioning face shape points shown in fig. 2 to 4 when executing the program.
In addition, an embodiment of the present invention further provides an electronic device, which may include the apparatus 50 shown in fig. 5. The electronic device can implement the method for positioning the face shape points shown in fig. 2 to 4.
In addition, an embodiment of the present invention further provides a computer storage medium on which a computer program is stored. When executed by a processor, the computer program can implement the steps of the method for positioning face shape points described above with reference to fig. 2 to 4. For example, the computer storage medium is a computer-readable storage medium.
The embodiments of the invention provide a method, an apparatus, a system, an electronic device, and a computer storage medium for positioning face shape points, which can train a dense point positioning model that outputs a larger number of shape points, thereby describing each organ on the face more accurately and providing an accurate basis for processing such as three-dimensional face reconstruction. The process is fast and efficient, and is easy to implement on devices such as mobile terminals.
Although the illustrative embodiments have been described herein with reference to the accompanying drawings, it is to be understood that the foregoing illustrative embodiments are merely exemplary and are not intended to limit the scope of the invention thereto. Various changes and modifications may be effected therein by one of ordinary skill in the pertinent art without departing from the scope or spirit of the present invention. All such changes and modifications are intended to be included within the scope of the present invention as set forth in the appended claims.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described device embodiments are merely illustrative, and for example, the division of the units is only one logical functional division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another device, or some features may be omitted, or not executed.
In the description provided herein, numerous specific details are set forth. It is understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the invention and aiding in the understanding of one or more of the various inventive aspects. However, the method of the present invention should not be construed to reflect the intent: that the invention as claimed requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.
It will be understood by those skilled in the art that all of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where such features are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Furthermore, those skilled in the art will appreciate that while some embodiments described herein include some features included in other embodiments, rather than other features, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments. For example, in the claims, any of the claimed embodiments may be used in any combination.
The various component embodiments of the invention may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will appreciate that a microprocessor or Digital Signal Processor (DSP) may be used in practice to implement some or all of the functions of some of the modules in an item analysis apparatus according to embodiments of the present invention. The present invention may also be embodied as apparatus programs (e.g., computer programs and computer program products) for performing a portion or all of the methods described herein. Such programs implementing the present invention may be stored on computer-readable media or may be in the form of one or more signals. Such a signal may be downloaded from an internet website or provided on a carrier signal or in any other form.
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The usage of the words first, second and third, etcetera do not indicate any ordering. These words may be interpreted as names.
The above description is only for the specific embodiment of the present invention or the description thereof, and the protection scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and the changes or substitutions should be covered within the protection scope of the present invention. The protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (15)

1. A method for locating face shape points, the method comprising:
carrying out face detection on an input image, and determining that the input image contains a face;
acquiring sparse face shape points of the face in the input image;
aligning the face in the input image with the sparse face shape points to obtain an aligned input image;
inputting the aligned input image into a trained dense point positioning model to obtain dense face shape points of the face in the input image;
the dense point positioning model is obtained by training through the following method:
constructing a data set according to a plurality of first face images, wherein the face images in the data set have labeled face shape points; training the dense point localization model based on the dataset until convergence;
the constructing a data set from a plurality of first face images comprises:
performing the following operation on each of the plurality of first face images, and constructing the plurality of operated first face images into the data set:
acquiring a plurality of sparse shape points on a face in a first face image;
obtaining the contour line of the organ on the human face according to the sparse shape points;
sampling on the contour line to obtain dense shape points on the human face;
and taking the dense shape points as the labeled human face shape points.
2. The method of claim 1, wherein the data sets comprise a sparse shape point data set and a dense shape point data set,
the constructing a data set from a plurality of first face images comprises: constructing a data set according to the plurality of first face images and the plurality of second face images, wherein:
constructing the dense shape point data set according to the first face images;
and constructing the sparse shape point data set according to the plurality of second face images.
3. A method according to claim 2 wherein said constructing the dense shape point data set from a plurality of first face images comprises:
performing the following operation for each of the plurality of first face images, and constructing the plurality of first face images after the operation as the dense shape point data set:
acquiring a plurality of sparse shape points on a face in a first face image;
obtaining the contour line of the organ on the human face according to the sparse shape points;
sampling on the contour line to obtain dense shape points on the human face;
and taking the dense shape points as the labeled human face shape points.
4. The method of claim 2, wherein said constructing the sparse shape point data set from a plurality of second face images comprises:
acquiring a plurality of sparse shape points on the face in each second face image;
constructing a plurality of second face images having the plurality of sparse shape points as the sparse shape point data set.
5. The method of claim 2, wherein the number of images in the dense shape point data set is less than the number of images in the sparse shape point data set.
6. The method according to any of claims 2 to 5, characterized in that during training:
calculating a weighted sum of values of a loss function based on the dense shape point data set and values of a loss function based on the sparse shape point data set;
and performing parameter optimization on the dense point positioning model according to the weighted sum until convergence.
7. The method according to claim 1 or 3, wherein the obtaining a plurality of sparse shape points on a face in a first face image comprises:
and inputting the first face image into a sparse point positioning model to obtain the plurality of sparse shape points on the face.
8. The method according to claim 1 or 3, wherein the obtaining an outline of an organ on the face from the plurality of sparse shape points comprises:
performing curve fitting on the plurality of sparse shape points to obtain a fitting curve;
and adjusting the fitting curve to obtain the contour line.
9. The method according to claim 1 or 3, wherein the sampling on the contour line to obtain dense shape points on the face comprises:
and determining the sampled points on the contour line as the dense shape points by using a uniform sampling mode.
10. The method of claim 9, wherein the sampling steps for contours of different sizes of the human face are different, and/or the sampling steps for contours of different organs on the human face are different.
11. The method of claim 1, wherein the loss function used during training is determined as follows:
acquiring a first group of shape points, namely the labeled shape points of a face image in the data set;
acquiring a second group of shape points output by the dense point positioning model being trained;
computing the nearest matching points between the first group and the second group of shape points by dynamic programming;
and determining the loss function from the distances between the nearest matching points.
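One plausible reading of claim 11, sketched as a DTW-style dynamic program that monotonically aligns two ordered groups of shape points and averages the matched distances. The exact recurrence and normalization used by the patent are not disclosed.

```python
import numpy as np

def dp_matching_loss(labeled, predicted):
    """Align two ordered point groups by dynamic programming (claim 11)."""
    labeled = np.asarray(labeled, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    n, m = len(labeled), len(predicted)
    # Pairwise Euclidean distances between the two groups of shape points.
    d = np.linalg.norm(labeled[:, None, :] - predicted[None, :, :], axis=-1)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = d[i - 1, j - 1] + min(acc[i - 1, j],
                                              acc[i, j - 1],
                                              acc[i - 1, j - 1])
    # Average matched distance along the optimal monotone matching path.
    return acc[n, m] / (n + m)
```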
12. The method according to any one of claims 1 to 5, wherein obtaining the sparse face shape points of the face in the input image comprises:
inputting the input image into a sparse point positioning model to obtain the sparse face shape points of the face in the input image.
13. An apparatus for locating face shape points, the apparatus comprising:
a face detection module, configured to perform face detection on an input image and determine that the input image contains a face;
an acquisition module, configured to acquire sparse face shape points of the face in the input image;
an alignment module, configured to align the face in the input image according to the sparse face shape points to obtain an aligned input image;
and an output module, configured to input the aligned input image into a trained dense point positioning model to obtain dense face shape points of the face in the input image;
wherein the dense point positioning model is trained as follows:
constructing a data set from a plurality of first face images, the face images in the data set having labeled face shape points; and training the dense point positioning model on the data set until convergence;
wherein constructing the data set from the plurality of first face images comprises:
performing the following operations on each of the plurality of first face images, and assembling the operated first face images into the data set:
acquiring a plurality of sparse shape points on the face in the first face image;
obtaining contour lines of facial organs from the plurality of sparse shape points;
sampling along the contour lines to obtain dense shape points on the face;
and taking the dense shape points as the labeled face shape points.
14. A system for face shape point localization, comprising a memory, a processor, and a computer program stored in the memory and runnable on the processor, wherein the processor, when executing the computer program, implements the steps of the method of any one of claims 1 to 12.
15. A computer storage medium on which a computer program is stored, the computer program, when executed by a processor, carrying out the steps of the method of any one of claims 1 to 12.
CN201711386408.7A 2017-12-20 2017-12-20 Method, device and system for positioning face shape point and computer storage medium Active CN108875520B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711386408.7A CN108875520B (en) 2017-12-20 2017-12-20 Method, device and system for positioning face shape point and computer storage medium

Publications (2)

Publication Number Publication Date
CN108875520A CN108875520A (en) 2018-11-23
CN108875520B (en) 2022-02-08

Family

ID=64325680

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711386408.7A Active CN108875520B (en) 2017-12-20 2017-12-20 Method, device and system for positioning face shape point and computer storage medium

Country Status (1)

Country Link
CN (1) CN108875520B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112396693A (en) * 2020-11-25 2021-02-23 上海商汤智能科技有限公司 Face information processing method and device, electronic equipment and storage medium

Citations (5)

Publication number Priority date Publication date Assignee Title
CN101159064A (en) * 2007-11-29 2008-04-09 腾讯科技(深圳)有限公司 Image generation system and method for generating image
CN102254151A (en) * 2011-06-16 2011-11-23 清华大学 Driver fatigue detection method based on face video analysis
CN102262724A (en) * 2010-05-31 2011-11-30 汉王科技股份有限公司 Object image characteristic points positioning method and object image characteristic points positioning system
CN103824087A (en) * 2012-11-16 2014-05-28 广州三星通信技术研究有限公司 Detection positioning method and system of face characteristic points
CN106652025A (en) * 2016-12-20 2017-05-10 五邑大学 Three-dimensional face modeling method and three-dimensional face modeling printing device based on video streaming and face multi-attribute matching

Family Cites Families (5)

Publication number Priority date Publication date Assignee Title
US7751599B2 (en) * 2006-08-09 2010-07-06 Arcsoft, Inc. Method for driving virtual facial expressions by automatically detecting facial expressions of a face image
CN104036546B (en) * 2014-06-30 2017-01-11 清华大学 Method for carrying out face three-dimensional reconstruction at any viewing angle on basis of self-adaptive deformable model
CN104598936B (en) * 2015-02-28 2018-07-27 北京畅景立达软件技术有限公司 The localization method of facial image face key point
CN105404877A (en) * 2015-12-08 2016-03-16 商汤集团有限公司 Human face attribute prediction method and apparatus based on deep study and multi-task study
CN106407958B (en) * 2016-10-28 2019-12-27 南京理工大学 Face feature detection method based on double-layer cascade

Similar Documents

Publication Publication Date Title
CN108764048B (en) Face key point detection method and device
CN110689109B (en) Neural network method and device
CN108875766B (en) Image processing method, device, system and computer storage medium
CN108875511B (en) Image generation method, device, system and computer storage medium
CN108875510B (en) Image processing method, device, system and computer storage medium
US20180260621A1 Picture recognition method and apparatus, computer device and computer-readable medium
CN107077738B (en) System and method for tracking object
CN106855952B (en) Neural network-based computing method and device
CN113361705A (en) Unsupervised learning of scene structures for synthetic data generation
WO2017186016A1 (en) Method and device for image warping processing and computer storage medium
US10832032B2 (en) Facial recognition method, facial recognition system, and non-transitory recording medium
CN108875534B (en) Face recognition method, device, system and computer storage medium
CN110059605A (en) A kind of neural network training method calculates equipment and storage medium
CN109272543B (en) Method and apparatus for generating a model
CN113095129B (en) Gesture estimation model training method, gesture estimation device and electronic equipment
CN109754464B (en) Method and apparatus for generating information
CN110838122B (en) Point cloud segmentation method and device and computer storage medium
US11062502B2 (en) Three-dimensional modeling volume for rendering images
CN113496271A (en) Neural network control variables
CN110619334A (en) Portrait segmentation method based on deep learning, architecture and related device
CN115457492A (en) Target detection method and device, computer equipment and storage medium
CN108875520B (en) Method, device and system for positioning face shape point and computer storage medium
CN104573737B (en) The method and device of positioning feature point
CN114049674A (en) Three-dimensional face reconstruction method, device and storage medium
JP7484492B2 (en) Radar-based attitude recognition device, method and electronic device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant