WO2017054652A1 - Method and apparatus for positioning key point of image - Google Patents

Method and apparatus for positioning key point of image

Info

Publication number
WO2017054652A1
Authority
WO
WIPO (PCT)
Prior art keywords
positioning
exp
current
current frame
model
Application number
PCT/CN2016/099291
Other languages
French (fr)
Chinese (zh)
Inventor
陈岩 (Chen Yan)
黄英 (Huang Ying)
邹建法 (Zou Jianfa)
Original Assignee
阿里巴巴集团控股有限公司 (Alibaba Group Holding Limited)
陈岩 (Chen Yan)
黄英 (Huang Ying)
邹建法 (Zou Jianfa)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by 阿里巴巴集团控股有限公司 (Alibaba Group Holding Limited), 陈岩 (Chen Yan), 黄英 (Huang Ying), 邹建法 (Zou Jianfa)
Publication of WO2017054652A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10016 Video; Image sequence

Definitions

  • the present invention relates to the field of computer application technologies, and in particular, to a method and apparatus for performing key point positioning on an image.
  • the present invention provides a method and apparatus for keypoint positioning of an image to facilitate keypoint positioning of a video image.
  • the invention provides a method for performing key point positioning on an image, the method comprising:
  • the W id is an object description parameter in an image
  • the W exp is an expression description parameter in the image.
  • before the W id and W exp of the previous frame are used as initial parameters, the method further includes: determining whether the current frame is the first frame of the video; if not, using the W id and W exp of the previous frame as initial parameters; if yes, using the preset initial W id and initial W exp as initial parameters.
  • the first positioning model is a supervised descent method SDM model
  • the second positioning model is a 3-mode singular value decomposition 3-mode SVD model.
  • the S1 includes:
  • the degree ⁇ X of the shape of the positioning point of the current frame deviating from the average positioning point shape is obtained:
  • the extracted 2m-dimensional gradient feature, R is a parameter vector of the first positioning model.
  • the method further includes: pre-training the first positioning model, specifically:
  • the current value of R is the value of the parameter vector R of the first positioning model obtained by the training;
  • determining, by using the second positioning model, the W id of the current frame includes:
  • the average value of the positions of the m positioning points is used as a current iteration position.
  • the S22 includes: obtaining a new iteration position S by using S = S̄ + Ψ·W_id, where Ψ = C_sdm_exp_id ×_1 W_exp^T; ×_1 indicates that the cubic matrix before ×_1 is unfolded into a two-dimensional matrix along the expression description direction and multiplied by the two-dimensional matrix after ×_1, the result being refolded into a cubic matrix along the expression description direction;
  • the S24 includes: determining ΔW_id by ΔW_id = (Ψ^T·Ψ)^-1·Ψ^T·ΔS, and determining the sum of ΔW_id and the current W_id as the new W_id.
  • determining, by using the second positioning model, the W exp of the current frame comprises:
  • the S32 includes: obtaining a new iteration position S by using S = S̄ + Ω·W_exp, where Ω = C_sdm_exp_id ×_2 W_id^T; ×_2 indicates that the cubic matrix before ×_2 is unfolded into a two-dimensional matrix along the object description direction and multiplied by the two-dimensional matrix after ×_2, the result being refolded into a cubic matrix along the object description direction;
  • the S34 includes: determining ΔW_exp by ΔW_exp = (Ω^T·Ω)^-1·Ω^T·ΔS, and determining the sum of ΔW_exp and the current W_exp as the new W_exp.
  • the positioning of the current frame based on the W_id and W_exp of the current frame includes: obtaining a vector f containing the positions of the n positioning points of the current frame by using f = f̄ + C_exp_id ×_1 W_exp^T ×_2 W_id^T, where f̄ is a vector formed by the position averages of the n positioning points in the images of the training set used by the second positioning model and C_exp_id is the parameter vector of the second positioning model; ×_1 indicates that the cubic matrix before ×_1 is unfolded into a two-dimensional matrix along the expression description direction and multiplied by the two-dimensional matrix after ×_1, the obtained two-dimensional matrix then being refolded into a cubic matrix along the expression description direction; ×_2 indicates that the cubic matrix before ×_2 is unfolded into a two-dimensional matrix along the object description direction and multiplied by the two-dimensional matrix after ×_2, the obtained two-dimensional matrix then being refolded into a cubic matrix along the object description direction.
  • the method further includes: pre-training the second positioning model, specifically:
  • C_exp_id = D ×_1 U_exp^T ×_2 U_id^T is used to obtain the parameter vector C_exp_id of the second positioning model, where U_exp is the unitary matrix of D unfolded into a two-dimensional matrix along the expression description direction, and U_id is the unitary matrix of D unfolded into a two-dimensional matrix along the object description direction.
  • the method further includes:
  • when the determined W id of the frames tends to a stable value, the stable value is directly adopted for the W id of subsequent frames.
  • if more than one object exists in the image, the two object description parameters with the smallest Euclidean distance between the object description parameter of the previous frame and that of the current frame are determined to correspond to the same object.
  • the present invention also provides an apparatus for performing key point positioning on an image, the apparatus comprising:
  • a first positioning unit configured to locate the current frame by using the first positioning model, to obtain positions of the m positioning points
  • a parameter determining unit configured to use W id and W exp of the previous frame as initial parameters
  • a second positioning unit configured to determine the W id and W exp of the current frame by using the second positioning model based on the m positioning points and the initial parameters, and to locate the current frame based on the W id and W exp of the current frame to obtain the positions of n positioning points;
  • the W id is an object description parameter in an image
  • the W exp is an expression description parameter in the image.
  • the parameter determining unit is further configured to determine whether the current frame is the first frame of the video; if not, the W id and W exp of the previous frame are used as initial parameters; if yes, the preset initial W id and initial W exp are used as initial parameters.
  • the first positioning model is an SDM model
  • the second positioning model is a 3-mode SVD model
  • the first positioning unit comprises:
  • the first deviation determining subunit is configured to extract an image gradient feature of the current frame, and use the image gradient feature and the first positioning model to obtain a degree ⁇ X of the shape of the positioning point of the current frame deviating from the average positioning point shape;
  • the first positioning sub-unit is configured to obtain the positions of the m positioning points of the current frame by using the ⁇ X of the current frame and the positions of the predetermined m average positioning points.
  • where Φ of the current frame is Φ = p × Dim, p is the number of images in the training set used by the first positioning model, Dim is the 2m-dimensional gradient feature extracted from regions of a set range centered on the positions of the m average positioning points, and R is a parameter vector of the first positioning model.
  • the device further comprises:
  • a first model training unit configured to perform the following operations to train the first positioning model:
  • the current value of R is the value of the parameter vector R of the first positioning model obtained by the training;
  • the second positioning unit comprises:
  • the first parameter determining subunit is configured to determine the W id of the current frame by using the second positioning model, and specifically perform the following operations:
  • the average value of the positions of the m positioning points is used as a current iteration position.
  • S25: if the modulus of ΔS is less than or equal to a preset second modulus value, the new W_id is determined as the W_id of the current frame;
  • when executing the S22, the first parameter determining subunit specifically obtains a new iteration position S by using S = S̄ + Ψ·W_id, where Ψ = C_sdm_exp_id ×_1 W_exp^T and ×_1 denotes unfolding into a two-dimensional matrix along the expression description direction and refolding into a cubic matrix along that direction;
  • when executing the S24, it specifically determines ΔW_id by ΔW_id = (Ψ^T·Ψ)^-1·Ψ^T·ΔS, and determines the sum of ΔW_id and the current W_id as the new W_id.
  • the second positioning unit comprises:
  • the second parameter determining subunit is configured to determine W exp of the current frame by using the second positioning model, and specifically:
  • when executing the S32, the second parameter determining subunit specifically obtains a new iteration position S by using S = S̄ + Ω·W_exp, where Ω = C_sdm_exp_id ×_2 W_id^T and ×_2 denotes unfolding into a two-dimensional matrix along the object description direction and refolding into a cubic matrix along that direction;
  • the second positioning unit comprises:
  • the second positioning sub-unit is configured to locate the current frame based on the W id and W exp of the current frame, and specifically:
  • specifically, f = f̄ + C_exp_id ×_1 W_exp^T ×_2 W_id^T is used to obtain a vector f containing the positions of the n positioning points of the current frame; ×_1 indicates that the cubic matrix before ×_1 is unfolded into a two-dimensional matrix along the expression description direction and multiplied by the two-dimensional matrix after ×_1, the obtained two-dimensional matrix then being refolded into a cubic matrix along the expression description direction; ×_2 indicates that the cubic matrix before ×_2 is unfolded into a two-dimensional matrix along the object description direction and multiplied by the two-dimensional matrix after ×_2, the obtained two-dimensional matrix then being refolded into a cubic matrix along the object description direction.
  • the device further comprises:
  • a second model training unit configured to perform the following operations to train the second positioning model:
  • U exp is a unitary matrix in which D is expanded into a two-dimensional matrix in the direction of expression description
  • U id is a unitary matrix in which D is expanded into a two-dimensional matrix in the direction of object description.
  • the first parameter determining subunit is further configured to directly adopt the stable value for the W id of the subsequent frame when the determined W id of each frame tends to be a stable value.
  • the device further comprises:
  • the identity identifying unit is configured to determine, if there is more than one object in the image, two object description parameters that minimize the Euclidean distance between the object description parameter of the previous frame and the object description parameter of the current frame to correspond to the same object.
  • on the basis of rough positioning of the current frame, the present invention accurately locates the current frame by combining the object description parameter and the expression description parameter of the previous frame, taking into account the continuity and relevance of preceding and succeeding frames in a video, thus achieving key point positioning of video images.
  • FIG. 1 is a flowchart of a main method according to an embodiment of the present invention
  • FIG. 2 is a flowchart of a method for training an SDM model according to an embodiment of the present invention
  • FIG. 3a is a schematic diagram of an example of a training set according to an embodiment of the present invention.
  • Figure 3b is a partial enlarged view of an image of Figure 3a;
  • FIG. 4 is a flowchart of a method for performing positioning by using an SDM model according to an embodiment of the present invention
  • FIG. 5 is a flowchart of a method for training a 3-mode SVD model according to an embodiment of the present invention
  • FIG. 6 is a diagram showing spatial representation of a stereo training data tensor according to an embodiment of the present invention.
  • FIG. 7 is a flowchart of a method for performing positioning by using a 3-mode SVD model according to an embodiment of the present invention.
  • FIG. 8 is a schematic diagram of two successive frames containing two objects according to an embodiment of the present invention;
  • FIG. 9 is a structural diagram of a device according to an embodiment of the present invention.
  • FIG. 1 is a flowchart of a main method according to an embodiment of the present invention. As shown in FIG. 1 , when performing key point positioning on each frame image of a video, the following processing is performed on each frame in sequence:
  • in 101, the first positioning model is used to locate the current frame, and the positions of the m positioning points are obtained.
  • before each frame is positioned there may be a process of object detection, which mainly detects the approximate area and number of objects.
  • for example, face detection can be performed on the current frame to determine the area and number of faces. The positioning performed by the present application mainly locates within the area of a face; if there is more than one face, the area of each face is positioned separately.
  • the manner of face detection can be any in the prior art, and will not be described in detail herein.
  • the method provided by the embodiments of the present invention can be applied to key point positioning in video. Because different people and different expressions yield different key point positions, both the construction and the use of the positioning models are carried out in three dimensions: object (i.e., person) description, expression description, and position description. Each frame has an object description parameter W id and an expression description parameter W exp; as the names suggest, W id describes the object (i.e., person) identity in the image, and W exp describes the expression in the image.
  • the present invention is not limited to key point positioning of human faces; key point positioning may also be performed on animals such as cats or dogs.
  • in the subsequent embodiments, key point positioning of a human face is taken as an example for description.
  • the invention actually uses the first positioning model to roughly locate the current frame, and then uses the second positioning model to accurately locate the current frame based on the positioning result of the first positioning model and the W id and W exp of the previous frame.
  • since the first frame of the video has no previous frame to refer to, initial values may be adopted for it, that is, the preset initial W id and initial W exp are used as initial parameters, as will be specifically described in the following embodiments.
  • the first positioning model for coarse positioning may adopt the SDM (Supervised Descent Method) model. The principle of positioning with the SDM model is: starting from the average shape of the face (i.e., the average positions of the key points), the positioning of the key points of the face is completed by means of the mapping between the average positions of the key points and the image texture features centered on each average key point position, obtaining m positioning points. In the embodiments of the present invention, "positioning point" is used to denote a key point obtained by positioning.
  • the advantage of the SDM model is that its requirement on the starting position is not high: starting from the average shape, a fairly good positioning effect can already be achieved. Its disadvantage is that the algorithm has high complexity. Therefore, in the embodiments of the present invention it is used only for coarse positioning, and a smaller number of positioning points is obtained.
  • the second positioning model for precise positioning in the embodiment of the present invention can adopt the 3-mode SVD (3-mode Singular Value Decomposition) model.
  • the principle is as follows: starting from the average shape of the facial organs, precise positioning of the organ contour points is completed according to the mapping relationship between the average positions of the key points and the existing positioning points.
  • the advantage is that the algorithm has low complexity, high precision and good real-time performance.
  • the disadvantage is that it requires some key points to have been located in advance.
  • the present invention can complete real-time positioning and tracking of each frame image in the video.
  • the method for performing key point positioning in video by combining the SDM model and the 3-mode SVD model is described in detail below with reference to specific embodiments.
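For orientation only (this sketch is not part of the patent text), the per-frame coarse-to-fine flow can be written in Python; sdm_locate and svd_refine are hypothetical callables standing in for the SDM and 3-mode SVD procedures detailed below.

```python
def track_video(frames, sdm_locate, svd_refine, init_w_id, init_w_exp):
    """Sketch of the per-frame pipeline: coarse SDM positioning, then
    3-mode SVD refinement seeded with the previous frame's parameters."""
    w_id, w_exp = init_w_id, init_w_exp  # preset initial parameters (first frame)
    results = []
    for frame in frames:
        coarse_points = sdm_locate(frame)  # m coarse positioning points
        # Seeding with the previous frame's W_id / W_exp couples consecutive
        # frames and suppresses jitter of the positioning points.
        points, w_id, w_exp = svd_refine(frame, coarse_points, w_id, w_exp)
        results.append(points)  # n refined positioning points
    return results
```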
  • the SDM model is taken as an example to describe the step 101 shown in FIG. 1.
  • the training process of the SDM model is first described.
  • the training process is as shown in FIG. 2 and may include the following steps:
  • in 201, the key point positions X_real of the p images in the training set are determined.
  • images of different expressions of different people may be collected to construct the training set, as shown in FIG. 3a, with key points marked on the images in advance. Assume the number of images in the training set is p and there are m key point positions on each image (taking one of the images in FIG. 3a as an example, a partial enlargement is shown in FIG. 3b; in FIG. 3b there are m key points on organs such as the eyes, nose, eyebrows, mouth, ears and chin).
  • X_real can be expressed as [x_1, x_2, ..., x_p]^T, that is, the coordinates of the key points of each image in the training set; X_real is recorded.
  • in 202, the average position of each key point over the p images is taken as the current iteration position X.
  • X can thus be expressed as the vector of per-key-point coordinate averages over the p images.
  • ⁇ X X real -X.
  • ⁇ X reflects the deviation between the true position of the key point and the iteration position.
  • in 204, R = (Φ^T·Φ)^-1·Φ^T·ΔX is used to obtain the current value of the parameter vector R, where Φ = p × Dim and Dim is the 2m-dimensional gradient feature extracted from regions of a set range centered on the average positions of the m key points in each image.
  • that is, there are m key points and hence m average positions; with each average position as the center, 64-dimensional gradient features of the image are extracted from an 8×8 region, so Dim is 2112 (m × 64) dimensions.
  • This step actually sets a convergence condition, that is, if the modulus of ⁇ X is less than or equal to the first modulus, it indicates that the current deviation condition ⁇ X is within an acceptable range, and the iteration can be stopped.
  • the first modulus value can take an empirical value, for example 1; in general the convergence condition can be reached within 4 iterations.
  • otherwise, the current iteration position X is updated with the value obtained by Φ·R + X, and the process returns to 203.
  • the current value of R is the parameter value obtained in the final training, and the training process is ended.
  • the training process of the SDM model is actually the process of determining the R, and R represents the mapping relationship between the degree of the positioning point shape deviating from the average positioning point shape and the image gradient feature.
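A minimal NumPy sketch of this training loop (steps 201 to 207) follows; it is illustrative only, and extract_features is a hypothetical callable that returns the p × Dim gradient-feature matrix Φ for the current iterated positions.

```python
import numpy as np

def train_sdm(x_real, extract_features, first_modulus=1.0, max_iters=10):
    """Train the SDM parameter vector R (steps 201-207, as described above).

    x_real:           (p, 2m) ground-truth key point coordinates (step 201)
    extract_features: callable mapping current positions (p, 2m) to the
                      (p, Dim) gradient-feature matrix Phi (step 204)
    """
    p = x_real.shape[0]
    x = np.tile(x_real.mean(axis=0), (p, 1))     # step 202: average positions
    for _ in range(max_iters):
        dx = x_real - x                          # step 203: deviation from truth
        phi = extract_features(x)                # gradient features around x
        # step 204: R = (Phi^T Phi)^-1 Phi^T dX, solved as a least-squares fit
        r = np.linalg.lstsq(phi, dx, rcond=None)[0]
        if np.linalg.norm(dx) <= first_modulus:  # step 205: convergence check
            break
        x = phi @ r + x                          # step 206: update iteration position
    return r                                     # trained parameter vector (step 207)
```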
  • the positioning process of step 101 shown in FIG. 1 by using the SDM model may be as shown in FIG. 4, and includes the following steps:
  • the image gradient feature of the current frame is extracted, and the image gradient feature and the R of the trained SDM model are used to obtain a degree ⁇ X of the shape of the anchor point of the current frame deviating from the average anchor point shape.
  • the image gradient feature of the current frame can be reflected by the gradient feature of the average anchor point on the current frame.
  • the so-called average positioning point refers to the average position of m key points in the training set on each image. Since there are p images in the training set, there are m key points on each image, and the m key points are respectively in p images. The average of the position coordinates on the top is the coordinate value of the average anchor point.
  • the positions of the m positioning points of the current frame are the sum of the current frame's ΔX and the positions of the predetermined m average positioning points.
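Applying the trained model to one frame then reduces to ΔX = Φ·R plus the mean shape; a sketch under the same assumptions (the argument names are placeholders, not identifiers from the patent):

```python
def sdm_locate(frame_features, r, mean_positions):
    """Coarse positioning of one frame with a trained SDM model (FIG. 4).

    frame_features: (Dim,) gradient features extracted around the m average
                    positioning points placed on the current frame
    r:              (Dim, 2m) trained SDM parameter vector
    mean_positions: (2m,) predetermined positions of the m average points
    """
    dx = frame_features @ r     # degree of deviation from the mean shape
    return mean_positions + dx  # positions of the m positioning points
```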
  • the training process of the 3-modeSVD model is first described. As shown in FIG. 5, the training process may include the following steps:
  • in 501, images of different expressions of different objects are collected, and a stereo training data tensor is constructed according to object description, expression description and position description.
  • the object description is represented by id
  • the expression description is represented by exp
  • the position description is represented by the positioning point vertices
  • the spatial representation of the constructed stereo training data tensor can be as shown in FIG. 6.
  • the position average of n key points in each image is subtracted from the positions of n key points in the stereo training data tensor to obtain a stereo data tensor D.
  • the key point coordinate expansion on each image can be expressed as [x 1 , y 1 , x 2 , y 2 .. .x n , y n ], where the subscript is the identifier of each key point.
  • the position averages of these n key points can be expressed as [x̄_1, ȳ_1, ..., x̄_n, ȳ_n]; the positions of the n key points in the stereo training data tensor are then reduced by these averages respectively, and the obtained stereo data tensor D is a 2n × 39 × 400 cubic matrix.
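Steps 501 and 502 amount to stacking the landmark vector of every (expression, object) pair into a third-order tensor and centering it; a small NumPy sketch, with the 39 × 400 sizes taken from the example above:

```python
import numpy as np

def build_stereo_tensor(landmarks):
    """Steps 501-502: construct and center the stereo training data tensor.

    landmarks: (2n, 39, 400) array holding [x1, y1, ..., xn, yn] for each of
               39 expressions x 400 objects (sizes follow the text's example).
    """
    mean = landmarks.mean(axis=(1, 2), keepdims=True)  # per-coordinate average
    d = landmarks - mean                               # subtract position averages
    return d, mean                                     # centered tensor D
```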
  • in 503, C_exp_id = D ×_1 U_exp^T ×_2 U_id^T is used to obtain the parameter vector C_exp_id; the formula in this step is based on the principle of the 3-mode SVD model.
  • ×_1 indicates that the cubic matrix before ×_1 is unfolded into a two-dimensional matrix along the exp direction and multiplied by the two-dimensional matrix after ×_1, the result being refolded into a cubic matrix along the exp direction; ×_2 indicates that the cubic matrix before ×_2 is unfolded into a two-dimensional matrix along the id direction and multiplied by the two-dimensional matrix after ×_2, the result being refolded into a cubic matrix along the id direction.
  • that is, the cubic matrix D is first unfolded into a two-dimensional matrix along the exp direction and multiplied by U_exp^T, the result being refolded into a cubic matrix along the exp direction; that cubic matrix is then unfolded into a two-dimensional matrix along the id direction and multiplied by U_id^T, and the resulting two-dimensional matrix is refolded into a cubic matrix along the id direction, giving C_exp_id.
  • U exp is a unitary matrix in which D is expanded into a two-dimensional matrix in the exp direction.
  • D is unfolded into a 39 × 800n two-dimensional matrix along the exp direction and then decomposed by SVD; the obtained unitary matrix is 39 × 39, and this 39 × 39 matrix can be taken directly as U_exp.
  • alternatively, the first 10 columns of the unitary matrix can be taken as U_exp according to the singular values, so that the size of U_exp is 39 × 10.
  • U id is a unitary matrix in which D is expanded into a two-dimensional matrix in the id direction.
  • D is unfolded into a 400 × 78n two-dimensional matrix along the id direction and then decomposed by SVD; the obtained unitary matrix is 400 × 400, and this 400 × 400 matrix can be taken directly as U_id.
  • alternatively, the first 20 columns of the matrix can be taken as U_id according to the singular values, so that the size of U_id is 400 × 20.
  • the obtained C exp_id size is 2n ⁇ 10 ⁇ 20.
  • the subsequent process of positioning using the 3-modeSVD model is based on this formula.
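The step-503 computation is a truncated two-mode, Tucker-style decomposition. The sketch below implements the unfold/multiply/refold operations as the ×_1 and ×_2 operators are described, with mode 1 as the exp direction and mode 2 as the id direction; it is an illustration, not the patent's own code.

```python
import numpy as np

def unfold(t, mode):
    """Unfold a 3-way tensor into a two-dimensional matrix along `mode`."""
    return np.moveaxis(t, mode, 0).reshape(t.shape[mode], -1)

def fold(mat, mode, shape):
    """Refold a two-dimensional matrix into a tensor of the target shape."""
    order = [shape[mode]] + [s for i, s in enumerate(shape) if i != mode]
    return np.moveaxis(mat.reshape(order), 0, mode)

def mode_multiply(t, m, mode):
    """t x_mode m: unfold along `mode`, left-multiply by m, refold."""
    shape = list(t.shape)
    shape[mode] = m.shape[0]
    return fold(m @ unfold(t, mode), mode, shape)

def train_3mode_svd(d, k_exp=10, k_id=20):
    """Step 503: C_exp_id = D x_1 U_exp^T x_2 U_id^T with truncated factors.

    d: centered tensor of shape (2n, 39, 400); mode 1 is the expression
       direction and mode 2 the object (identity) direction.
    """
    u_exp = np.linalg.svd(unfold(d, 1), full_matrices=False)[0][:, :k_exp]  # 39 x 10
    u_id = np.linalg.svd(unfold(d, 2), full_matrices=False)[0][:, :k_id]    # 400 x 20
    core = mode_multiply(mode_multiply(d, u_exp.T, 1), u_id.T, 2)  # 2n x 10 x 20
    return core, u_exp, u_id
```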
  • the process of positioning using the 3-mode SVD model will be described in detail below with reference to FIG. 7.
  • for each frame the positioning method is the same; therefore, only the positioning of one frame (described as the current frame) is described in FIG. 7, that is, the implementation process of step 102 in FIG. 1. As shown in FIG. 7, the process can include the following steps:
  • the average value of the positions of the m positioning points determined by the SDM model is taken as the current iteration position.
  • the position average of m positioning points obtained by rough positioning using the SDM model is initially used as the current iteration position.
  • in 702, a new iteration position S is determined by using the current iteration position S̄, the current W_id and W_exp, and the parameter vector C_sdm_exp_id of the second positioning model corresponding to the m positioning points.
  • for the first frame, the current W_id and W_exp both use preset initial values, i.e., an initial vector as W_id and an initial vector as W_exp, the moduli of which are both 1; otherwise, the W_id and W_exp of the previous frame are used as the current W_id and W_exp.
  • as for the parameter vector C_sdm_exp_id of the second positioning model corresponding to the m positioning points: suppose the SDM model locates m positioning points and the number of positioning points of the 3-mode SVD model is n; the m positioning points are included among the n points described by the parameter vector C_exp_id, and the part of C_exp_id corresponding to the m positioning points is determined as C_sdm_exp_id.
  • the deviation ⁇ S of the position of the m positioning points determined by the SDM model from the new iteration position S is determined.
  • the new W id is determined using the current W id and ⁇ S .
  • ⁇ W id ( ⁇ T ⁇ ⁇ ) -1 ⁇ ⁇ T ⁇ ⁇ S, it is determined that ⁇ W id; then ⁇ W id of the current W id and determined as a new W id.
  • in 705, a convergence condition is actually set by using ΔS: if the modulus of ΔS satisfies the convergence condition, the positions of the m positioning points determined by the SDM model deviate little from the new iteration position S, and the currently iterated W_id (i.e., the new W_id) can be taken as the W_id of the current frame; if the convergence condition is not met, the iteration continues.
  • the second modulus value may be an empirical value, for example, one.
  • otherwise, the current W_id is updated with the new W_id, the current iteration position S̄ is updated with the new iteration position S, and the process returns to 702.
  • the new W id is determined as W id of the current frame.
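The W_id loop (701 to 707) is a least-squares fixed-point iteration. Since the equation images are not reproduced in this text, the linear form S = S̄ + Ψ·W_id with Ψ = C_sdm_exp_id ×_1 W_exp^T used below is this sketch's reading of the description, not a verbatim formula; iterating W_exp (the following steps) is symmetrical, with Ω = C_sdm_exp_id ×_2 W_id^T.

```python
import numpy as np

def fit_w_id(s_sdm, c_sdm, w_id, w_exp, second_modulus=1.0, max_iters=20):
    """Iterate W_id with W_exp held fixed (steps 701-707, as read here).

    s_sdm: (2m,) positions of the m points from the coarse SDM stage
    c_sdm: (2m, 10, 20) rows of C_exp_id belonging to the m coarse points
    """
    # 701: one literal reading of "the average value of the positions"
    s_bar = np.full_like(s_sdm, s_sdm.mean())
    for _ in range(max_iters):
        psi = np.einsum('pei,e->pi', c_sdm, w_exp)  # Psi = C_sdm_exp_id x_1 W_exp^T
        s = s_bar + psi @ w_id                      # 702: new iteration position
        ds = s_sdm - s                              # 703: deviation
        # 704: dW_id = (Psi^T Psi)^-1 Psi^T dS, solved as a least-squares fit
        w_id = w_id + np.linalg.lstsq(psi, ds, rcond=None)[0]
        if np.linalg.norm(ds) <= second_modulus:    # 705: convergence check
            break
        s_bar = s                                   # 706: update iteration position
    return w_id                                     # 707: W_id of the current frame
```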
  • the above actually fixes W_exp and iterates out W_id.
  • the following steps determine the W_exp of the current frame based on the W_id of the current frame, that is, W_id is fixed and W_exp is iterated out; the principle is the same.
  • the average value of the positions of the m positioning points that are located by using the SDM model is taken as the current iteration position.
  • a deviation ⁇ S from the position of the m anchor points located using the SDM model and the new iteration position S is determined.
  • the new W exp is determined using the current W exp and ⁇ S .
  • the third modulus value here can be an empirical value, for example, one.
  • the above determines the W_exp of the current frame, and the W_id of the current frame has already been determined; the current frame can then be located in 715 by using the W_id and W_exp of the current frame.
  • using f = f̄ + C_exp_id ×_1 W_exp^T ×_2 W_id^T, a vector f containing the positions of the n anchor points of the current frame is obtained.
  • the positioning of the current frame ends.
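Once W_id and W_exp are fixed, step 715 is a single tensor contraction. A sketch, again with the formula f = f̄ + C_exp_id ×_1 W_exp^T ×_2 W_id^T reconstructed from the surrounding description:

```python
import numpy as np

def locate_frame(f_bar, c_exp_id, w_exp, w_id):
    """Step 715: f = f_bar + C_exp_id x_1 W_exp^T x_2 W_id^T.

    f_bar:    (2n,) position averages of the n points over the training images
    c_exp_id: (2n, 10, 20) parameter tensor of the 3-mode SVD model
    Returns the (2n,) vector f of the n positioning points of the current frame.
    """
    return f_bar + np.einsum('pei,e,i->p', c_exp_id, w_exp, w_id)
```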
  • the above process can be performed for each frame of the video to obtain the anchor points of each frame.
  • when video frames are positioned in the above manner, after a number of frames the W_id gradually approaches a stable value, so the W_id of subsequent frames need not be obtained in the manner shown in FIG. 7;
  • the stable value can be used directly, and only the W_exp of each frame is then calculated.
  • when judging whether the W_id has gradually reached a stable value, it can be determined whether the modulus of the difference between the W_id of the current frame and the W_id of the previous frame is less than a preset threshold, for example less than 1; if so, the W_id is judged to have stabilized.
  • the stable value may be the average of the W_id of the current frame and the W_id of previous frames; of course, other ways of judging stability or determining the stable value may be used, and they are not enumerated here.
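A small helper can capture this shortcut; the threshold of 1 and the averaging rule are the examples given in the text, and the function name is illustrative.

```python
import numpy as np

def stable_w_id(w_id_history, threshold=1.0):
    """Return a stable W_id once consecutive estimates stop moving, else None.

    Follows the text: if the modulus of W_id(current) - W_id(previous) falls
    below a preset threshold, W_id is judged stable, and the average of the
    recent estimates may serve as the stable value for subsequent frames.
    """
    if len(w_id_history) < 2:
        return None
    if np.linalg.norm(w_id_history[-1] - w_id_history[-2]) < threshold:
        return np.mean(w_id_history, axis=0)  # one possible stable value
    return None
```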
  • if the image contains more than one object, for example two people, each frame yields two positioning results (each containing n positioning points) and two W_id are output. Since the characters in a video are not necessarily static, their positions may change due to movement, so it is necessary to distinguish which object in the two frames is the same person. As shown in FIG. 8, the two images are the previous and current frames, and the relative positions of object A and object B in the previous frame have changed in the current frame; therefore they need to be distinguished and recognized.
  • the positioning results and object description parameters of the previous frame are denoted f_1_pre, W_1_pre, f_2_pre and W_2_pre, and the positioning results and object description parameters of the current frame are denoted f_1_cur, W_1_cur, f_2_cur and W_2_cur.
  • two object description parameters that have the smallest Euclidean distance between the object description parameter of the previous frame and the object description parameter of the current frame may be determined to correspond to the same object.
  • that is, the Euclidean distance between W_1_pre and W_1_cur, the Euclidean distance between W_1_pre and W_2_cur, the Euclidean distance between W_2_pre and W_1_cur and the Euclidean distance between W_2_pre and W_2_cur are calculated.
  • if the Euclidean distance between W_1_pre and W_2_cur is the smallest, W_1_pre and W_2_cur correspond to the same object;
  • then f_1_pre and f_2_cur also correspond to the same object, and the positioning results belonging to the same object can be identified as that object.
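For the two-object case of FIG. 8 the matching rule is nearest-neighbour assignment in the space of object description parameters; a sketch that generalizes to any number of objects, greedily pairing by smallest Euclidean distance:

```python
import numpy as np

def match_objects(w_prev, w_cur):
    """Pair previous-frame and current-frame W_id vectors by the smallest
    Euclidean distance, so positioning results can be assigned to the same
    object across frames.

    w_prev, w_cur: lists of W_id vectors, one per detected object.
    Returns pairs (i, j): object i of the previous frame matches object j
    of the current frame.
    """
    pairs, used = [], set()
    for i, wp in enumerate(w_prev):
        _, j = min((np.linalg.norm(wp - wc), j)
                   for j, wc in enumerate(w_cur) if j not in used)
        used.add(j)
        pairs.append((i, j))
    return pairs  # e.g. [(0, 1), (1, 0)] when the two objects swap positions
```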
  • n may take an integer on the order of one hundred.
  • the apparatus may include a first positioning unit 10, a parameter determining unit 20 and a second positioning unit 30, and may further include a first model training unit 40, a second model training unit 50 and an identity identifying unit 60.
  • the first positioning unit 10 may specifically include a first deviation determining subunit 11 and a first positioning subunit 12.
  • the second positioning unit 30 may specifically include a first parameter determining subunit 31, a second parameter determining subunit 32, and a second positioning subunit 33.
  • the main functions of each component are as follows:
  • the first positioning unit 10 is responsible for positioning the current frame by using the first positioning model to obtain the positions of m positioning points.
  • the parameter determination unit 20 is responsible for taking W id and W exp of the previous frame as initial parameters.
  • the second positioning unit 30 is responsible for determining the W id and W exp of the current frame by using the second positioning model based on the m positioning points and the initial parameters, and for positioning the current frame based on the W id and W exp of the current frame to obtain the positions of n positioning points.
  • W id is an object description parameter in the image
  • W exp is an expression description parameter in the image
  • the device actually uses the first positioning model to roughly locate the current frame, and then uses the second positioning model to accurately locate the current frame based on the positioning of the first positioning model and the W id and W exp of the previous frame.
  • the parameter determining unit 20 may first determine whether the current frame is the first frame of the video, and if not, use W id and W exp of the previous frame as initial parameters; if yes, since the first frame of the video does not exist in the previous frame Reference, so the preset initial W id and initial W exp can be used as initial parameters.
  • the first positioning model may be an SDM model
  • the second positioning model may be a 3-mode SVD model
  • the first model training unit 40 is responsible for training the first positioning model, that is, the SDM model, and specifically performs the following operations:
  • if the modulus of ΔX is less than or equal to the preset first modulus value, the current value of R is the value of the parameter vector R of the trained SDM model; otherwise, the current iteration position X is updated with the value obtained by Φ·R + X, and the process returns to A13.
  • the first modulus value may take an empirical value, for example, 1 is taken.
  • Dim is a 2m-dimensional gradient feature extracted by a set range region centered on the positions of m average anchor points.
  • the composition of the first positioning unit 10 will be described below.
  • the first deviation determining sub-unit 11 is responsible for extracting the image gradient feature of the current frame, and using the image gradient feature and the first positioning model, the degree ⁇ X of the shape of the positioning point of the current frame deviating from the average positioning point shape is obtained.
  • the first positioning sub-unit 12 is responsible for obtaining the positions of the m positioning points of the current frame by using the ⁇ X of the current frame and the positions of the predetermined m average positioning points.
  • the positions of the m positioning points of the current frame are the sum of the current frame's ΔX and the positions of the predetermined m average positioning points.
  • the 3-modeSVD model is used for further precise positioning.
  • the second model training unit 50 is first described.
  • the second model training unit 50 is responsible for training the second positioning model, the 3-mode SVD model. Specifically do the following:
  • ⁇ 1 indicates that the cubic matrix in front of ⁇ 1 is expanded into a two-dimensional matrix according to the expression description direction and multiplied by the two-dimensional matrix behind ⁇ 1 , and the obtained two-dimensional matrix is transformed into a square matrix according to the expression description direction;
  • ⁇ 2 indicates ⁇ 2 The cubic matrix in front is expanded according to the object description direction and multiplied by the two-dimensional matrix behind ⁇ 2 , and then the obtained two-dimensional matrix is transformed into a square matrix according to the object description direction.
  • the first parameter determining sub-unit 31 is responsible for determining the W id of the current frame by using the 3-mode SVD model, and specifically performs the following operations:
  • at the first execution, the current W_id and W_exp both use preset initial values, i.e., an initial vector as W_id and an initial vector as W_exp, the moduli of which are both 1; otherwise, the W_id and W_exp of the previous frame are used as the current W_id and W_exp.
  • specifically, the first parameter determining subunit 31 can use S = S̄ + Ψ·W_id, with Ψ = C_sdm_exp_id ×_1 W_exp^T, to obtain the new iteration position S, where ×_1 denotes unfolding into a two-dimensional matrix along the expression description direction and refolding into a cubic matrix along the expression description direction.
  • if the modulus of ΔS is less than or equal to the preset second modulus value, the new W_id is determined as the W_id of the current frame; otherwise, the current W_id is updated with the new W_id, the current iteration position S̄ is updated with the new iteration position S, and the process returns to S22.
  • the second modulus value here can be an empirical value, for example, one.
  • the second parameter determining sub-unit 32 is responsible for determining the W exp of the current frame by using the 3-mode SVD model, and specifically:
  • the new iteration position S is determined by the W id of the current frame and the current W exp and the parameter vector C sdm_exp_id of the second positioning model corresponding to the m positioning points.
  • specifically, the second parameter determining subunit 32 can use S = S̄ + Ω·W_exp, with Ω = C_sdm_exp_id ×_2 W_id^T, to obtain the new iteration position S, where ×_2 denotes unfolding into a two-dimensional matrix along the object description direction and refolding into a cubic matrix along the object description direction.
  • if the modulus of ΔS is less than or equal to the preset third modulus value, the new W_exp is determined as the W_exp of the current frame; otherwise, the current W_exp is updated with the new W_exp, the current iteration position S̄ is updated with the new iteration position S, and the process returns to S32.
  • the third modulus value here can be an empirical value, for example, one.
  • the second locating sub-unit 33 is responsible for locating the current frame based on the W id and W exp of the current frame, and specifically:
  • specifically, f = f̄ + C_exp_id ×_1 W_exp^T ×_2 W_id^T can be used, where f̄ is a vector composed of the position averages of the n anchor points in the images of the training set used by the 3-mode SVD model, and C_exp_id is the parameter vector of the 3-mode SVD model.
  • when video frames are positioned using the above-described method, after a number of frames the W_id value becomes stable; the W_id of subsequent frames then need not be obtained iteratively as described above, and the stable value can be used directly.
  • the stable value is then used when calculating the W_exp of each frame.
  • specifically, it can be determined whether the modulus of the difference between the W_id of the current frame and the W_id of the previous frame is less than a preset threshold, for example less than 1; if so, the W_id is judged to have stabilized.
  • the stable value may be the average of the W_id of the current frame and the W_id of previous frames; of course, other ways of judging stability or determining the stable value may be used, and they are not enumerated here.
  • the identity identifying unit 60 is responsible for determining the two object description parameters with the smallest Euclidean distance between an object description parameter of the previous frame and an object description parameter of the current frame as corresponding to the same object, and for identifying the positioning results of the same object, thereby distinguishing which object each positioning result in the image belongs to.
  • the method and apparatus provided by the present invention take into account the constraint that the preceding and succeeding frames in a video concern the same object, reduce the jitter of positioning points between frames, and make the visual effect more natural and smooth.
  • the disclosed apparatus and method may be implemented in other manners.
  • the device embodiments described above are merely illustrative.
  • the division of the unit is only a logical function division, and the actual implementation may have another division manner.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
  • each functional unit in each embodiment of the present invention may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.
  • the above integrated unit can be implemented in the form of hardware or in the form of hardware plus software functional units.
  • the above-described integrated unit implemented in the form of a software functional unit can be stored in a computer readable storage medium.
  • the above software functional unit is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, a network device or the like) or a processor to perform part of the steps of the methods of the various embodiments of the present invention.
  • the foregoing storage medium includes various media that can store program code, such as a USB flash disk, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disk.

Abstract

A method and apparatus for positioning a key point of an image. The method comprises: S1, using a first positioning model to position a current frame and obtain the positions of m positioning points (101); and S2, using the W_id and W_exp of a previous frame as initial parameters, using a second positioning model to determine the W_id and W_exp of the current frame based on the m positioning points, positioning the current frame based on the W_id and W_exp of the current frame, and obtaining the positions of n positioning points, wherein m is less than n, W_id is an object description parameter in an image and W_exp is an expression description parameter in the image (102). The method and apparatus take into account the constraint that a previous frame and a subsequent frame in a video concern the same object, reduce the jitter of positioning points between frames, and make the visual effect more natural and smooth.

Description

Method and device for key point positioning of an image

[Technical Field]

The present invention relates to the field of computer application technologies, and in particular to a method and apparatus for performing key point positioning on an image.

[Background]

With the increasing popularity of smart terminals, the demand for image processing on smart terminals keeps growing, and various beauty-enhancement apps are widely favored by beauty lovers. However, existing apps of this kind perform beautification on static images, mainly by locating the key points of organs in a static image. At present there is no way to perform key point positioning on video images.
[Summary of the Invention]

In view of this, the present invention provides a method and apparatus for key point positioning of an image, so as to facilitate key point positioning of video images.
The specific technical solutions are as follows.

The present invention provides a method for performing key point positioning on an image, the method comprising:

S1: locating the current frame by using a first positioning model to obtain the positions of m positioning points;

S2: taking the W_id and W_exp of the previous frame as initial parameters, determining the W_id and W_exp of the current frame by using a second positioning model based on the m positioning points, and locating the current frame based on the W_id and W_exp of the current frame to obtain the positions of n positioning points;

where m is smaller than n, W_id is an object description parameter in an image, and W_exp is an expression description parameter in the image.
According to a preferred embodiment of the present invention, before the W_id and W_exp of the previous frame are taken as initial parameters, the method further includes: determining whether the current frame is the first frame of the video; if not, continuing with the step of taking the W_id and W_exp of the previous frame as initial parameters; if yes, taking the preset initial W_id and initial W_exp as initial parameters.
According to a preferred embodiment of the present invention, the first positioning model is a supervised descent method (SDM) model, and the second positioning model is a 3-mode singular value decomposition (3-mode SVD) model.
According to a preferred embodiment of the present invention, S1 includes:

S11: extracting image gradient features of the current frame, and obtaining, by using the image gradient features and the first positioning model, the degree ΔX by which the positioning point shape of the current frame deviates from the average positioning point shape;

S12: obtaining the positions of the m positioning points of the current frame by using the ΔX of the current frame and the predetermined positions of the m average positioning points.
According to a preferred embodiment of the present invention, obtaining the degree ΔX by which the positioning point shape of the current frame deviates from the average positioning point shape includes: determining ΔX by ΔX = Φ·R, where Φ of the current frame is Φ = p × Dim, p is the number of images in the training set used by the first positioning model, Dim is the 2m-dimensional gradient feature extracted from regions of a set range centered on the positions of the m average positioning points, and R is the parameter vector of the first positioning model.
According to a preferred embodiment of the present invention, the method further includes pre-training the first positioning model, specifically including:

A11: determining the key point positions X_real of the p images in the training set;

A12: taking the average position of each key point over the p images as the current iteration position X;

A13: determining the value of ΔX by ΔX = X_real - X;

A14: obtaining the current value of the parameter vector R of the first positioning model by R = (Φ^T·Φ)^-1·Φ^T·ΔX;

A15: if the modulus of ΔX is less than or equal to a preset first modulus value, taking the current value of R as the value of the parameter vector R of the trained first positioning model; otherwise, updating the current iteration position X with the value obtained by Φ·R + X and returning to A13.
According to a preferred embodiment of the present invention, determining the W_id of the current frame by using the second positioning model includes:

S21: taking the position average of the m positioning points as the current iteration position S̄;

S22: determining a new iteration position S by using the current iteration position S̄, the current W_id and W_exp, and the parameter vector C_sdm_exp_id of the second positioning model corresponding to the m positioning points;

S23: determining the deviation ΔS between the positions of the m positioning points and the new iteration position S;

S24: determining a new W_id by using the current W_id and ΔS;

S25: if the modulus of ΔS is less than or equal to a preset second modulus value, determining the new W_id as the W_id of the current frame; otherwise, updating the current W_id with the new W_id, updating the current iteration position S̄ with the new iteration position S, and returning to S22.
According to a preferred embodiment of the present invention, S22 includes: obtaining the new iteration position S by using S = S̄ + Ψ·W_id, where Ψ = C_sdm_exp_id ×_1 W_exp^T; ×_1 indicates that the cubic matrix before ×_1 is unfolded into a two-dimensional matrix along the expression description direction and multiplied by the two-dimensional matrix after ×_1, the result being refolded into a cubic matrix along the expression description direction.

S24 includes: determining ΔW_id by ΔW_id = (Ψ^T·Ψ)^-1·Ψ^T·ΔS, and determining the sum of ΔW_id and the current W_id as the new W_id.
According to a preferred embodiment of the present invention, determining the W_exp of the current frame by using the second positioning model includes:

S31: taking the position average of the m positioning points as the current iteration position S̄;

S32: determining a new iteration position S by using the current iteration position S̄, the W_id of the current frame, the current W_exp, and the parameter vector C_sdm_exp_id of the second positioning model corresponding to the m positioning points;

S33: determining the deviation ΔS between the positions of the m positioning points and the new iteration position S;

S34: determining a new W_exp by using the current W_exp and ΔS;

S35: if the modulus of ΔS is less than or equal to a preset third modulus value, determining the new W_exp as the W_exp of the current frame; otherwise, updating the current W_exp with the new W_exp, updating the current iteration position S̄ with the new iteration position S, and returning to S32.
According to a preferred embodiment of the present invention, S32 includes: obtaining the new iteration position S by using S = S̄ + Ω·W_exp, where Ω = C_sdm_exp_id ×_2 W_id^T; ×_2 indicates that the cubic matrix before ×_2 is unfolded into a two-dimensional matrix along the object description direction and multiplied by the two-dimensional matrix after ×_2, the result being refolded into a cubic matrix along the object description direction.

S34 includes: determining ΔW_exp by ΔW_exp = (Ω^T·Ω)^-1·Ω^T·ΔS, and determining the sum of ΔW_exp and the current W_exp as the new W_exp.
According to a preferred embodiment of the present invention, locating the current frame based on the W_id and W_exp of the current frame includes:

obtaining a vector f containing the positions of the n positioning points of the current frame by using f = f̄ + C_exp_id ×_1 W_exp^T ×_2 W_id^T;

where f̄ is a vector formed by the position averages of the n positioning points in the images of the training set used by the second positioning model, and C_exp_id is the parameter vector of the second positioning model; ×_1 indicates that the cubic matrix before ×_1 is unfolded into a two-dimensional matrix along the expression description direction and multiplied by the two-dimensional matrix after ×_1, the obtained two-dimensional matrix then being refolded into a cubic matrix along the expression description direction; ×_2 indicates that the cubic matrix before ×_2 is unfolded into a two-dimensional matrix along the object description direction and multiplied by the two-dimensional matrix after ×_2, the obtained two-dimensional matrix then being refolded into a cubic matrix along the object description direction.
According to a preferred embodiment of the present invention, the method further includes pre-training the second positioning model, specifically including:

B11: collecting images of different expressions of different objects, and constructing a stereo training data tensor according to object description, expression description and position description;

B12: subtracting, from the positions of the n key points in the stereo training data tensor, the position averages of the n key points over the images, to obtain a stereo data tensor D;

B13: obtaining the parameter vector C_exp_id of the second positioning model by using C_exp_id = D ×_1 U_exp^T ×_2 U_id^T;

where U_exp is the unitary matrix of D unfolded into a two-dimensional matrix along the expression description direction, and U_id is the unitary matrix of D unfolded into a two-dimensional matrix along the object description direction.
According to a preferred embodiment of the present invention, the method further includes: when the determined W_id of the frames tends to a stable value, directly adopting the stable value as the W_id of subsequent frames.

According to a preferred embodiment of the present invention, if more than one object exists in an image, the two object description parameters with the smallest Euclidean distance between an object description parameter of the previous frame and an object description parameter of the current frame are determined to correspond to the same object.
The present invention also provides an apparatus for performing key point positioning on an image, the apparatus comprising:

a first positioning unit, configured to locate the current frame by using a first positioning model to obtain the positions of m positioning points;

a parameter determining unit, configured to take the W_id and W_exp of the previous frame as initial parameters;

a second positioning unit, configured to determine the W_id and W_exp of the current frame by using a second positioning model based on the m positioning points and the initial parameters, and to locate the current frame based on the W_id and W_exp of the current frame to obtain the positions of n positioning points;

where m is smaller than n, W_id is an object description parameter in an image, and W_exp is an expression description parameter in the image.
According to a preferred embodiment of the present invention, the parameter determining unit is further configured to determine whether the current frame is the first frame of the video; if not, to take the W_id and W_exp of the previous frame as initial parameters; if yes, to take the preset initial W_id and initial W_exp as initial parameters.

According to a preferred embodiment of the present invention, the first positioning model is an SDM model, and the second positioning model is a 3-mode SVD model.
According to a preferred embodiment of the present invention, the first positioning unit includes:

a first deviation determining subunit, configured to extract image gradient features of the current frame, and to obtain, by using the image gradient features and the first positioning model, the degree ΔX by which the positioning point shape of the current frame deviates from the average positioning point shape;

a first positioning subunit, configured to obtain the positions of the m positioning points of the current frame by using the ΔX of the current frame and the predetermined positions of the m average positioning points.

According to a preferred embodiment of the present invention, the first deviation determining subunit specifically determines ΔX by ΔX = Φ·R, where Φ of the current frame is Φ = p × Dim, p is the number of images in the training set used by the first positioning model, Dim is the 2m-dimensional gradient feature extracted from regions of a set range centered on the positions of the m average positioning points, and R is the parameter vector of the first positioning model.
According to a preferred embodiment of the present invention, the apparatus further comprises:
a first model training unit, configured to train the first positioning model by performing the following operations:
A11. determining the key point positions X_real of the p images in the training set;
A12. taking the average position of each key point over the p images as the current iteration position X;
A13. determining the value of ΔX by ΔX = X_real - X;
A14. obtaining the current value of the parameter vector R of the first positioning model by R = (Φ^T·Φ)^(-1)·Φ^T·ΔX;
A15. if the modulus of ΔX is smaller than or equal to a preset first modulus value, taking the current value of R as the trained value of the parameter vector R of the first positioning model;
otherwise, updating the current iteration position X with the value of Φ·R + X and returning to A13.
According to a preferred embodiment of the present invention, the second positioning unit comprises:
a first parameter determining subunit, configured to determine the W_id of the current frame by using the second positioning model, specifically performing the following operations:
S21. taking the average position of the m positioning points as the current iteration position S̄;
S22. determining a new iteration position S by using the current iteration position S̄, the current W_id and W_exp, and the parameter vector C_sdm_exp_id of the second positioning model corresponding to the m positioning points;
S23. determining the deviation ΔS between the positions of the m positioning points and the new iteration position S;
S24. determining a new W_id by using the current W_id and ΔS;
S25. if the modulus of ΔS is smaller than or equal to a preset second modulus value, determining the new W_id as the W_id of the current frame;
otherwise, updating the current W_id with the new W_id, updating the current iteration position S̄ with the new iteration position S, and returning to S22.
According to a preferred embodiment of the present invention, when performing S22 the first parameter determining subunit obtains the new iteration position S as S = S̄ + C_sdm_exp_id ×1 W_exp^T ×2 W_id^T, where Ψ denotes the two-dimensional matrix obtained by unfolding C_sdm_exp_id ×1 W_exp^T along the expression description direction; here unfolding along the expression description direction expands a cubic matrix into a two-dimensional matrix, and the inverse operation synthesizes the two-dimensional matrix back into a cubic matrix along the expression description direction.
When performing S24, the subunit determines ΔW_id by ΔW_id = (Ψ^T·Ψ)^(-1)·Ψ^T·ΔS, and determines the sum of ΔW_id and the current W_id as the new W_id.
According to a preferred embodiment of the present invention, the second positioning unit comprises:
a second parameter determining subunit, configured to determine the W_exp of the current frame by using the second positioning model, specifically performing:
S31. taking the average position of the m positioning points as the current iteration position S̄;
S32. determining a new iteration position S by using the current iteration position S̄, the W_id of the current frame, the current W_exp, and the parameter vector C_sdm_exp_id of the second positioning model corresponding to the m positioning points;
S33. determining the deviation ΔS between the positions of the m positioning points and the new iteration position S;
S34. determining a new W_exp by using the current W_exp and ΔS;
S35. if the modulus of ΔS is smaller than or equal to a preset third modulus value, determining the new W_exp as the W_exp of the current frame;
otherwise, updating the current W_exp with the new W_exp, updating the current iteration position S̄ with the new iteration position S, and returning to S32.
According to a preferred embodiment of the present invention, when performing S32 the second parameter determining subunit obtains the new iteration position S as S = S̄ + C_sdm_exp_id ×1 W_exp^T ×2 W_id^T, where Ω denotes the two-dimensional matrix obtained by unfolding C_sdm_exp_id ×2 W_id^T along the object description direction; here unfolding along the object description direction expands a cubic matrix into a two-dimensional matrix, and the inverse operation synthesizes the two-dimensional matrix back into a cubic matrix along the object description direction.
When performing S34, the subunit determines ΔW_exp by ΔW_exp = (Ω^T·Ω)^(-1)·Ω^T·ΔS, and determines the sum of ΔW_exp and the current W_exp as the new W_exp.
According to a preferred embodiment of the present invention, the second positioning unit comprises:
a second positioning subunit, configured to locate the current frame based on the W_id and W_exp of the current frame, specifically:
obtaining a vector f containing the positions of the n positioning points of the current frame by f = f̄ + C_exp_id ×1 W_exp^T ×2 W_id^T;
where f̄ is the vector formed by the average positions of the n positioning points in the images of the training set used by the second positioning model, and C_exp_id is the parameter vector of the second positioning model; ×1 means that the cubic matrix preceding ×1 is unfolded into a two-dimensional matrix along the expression description direction, multiplied by the two-dimensional matrix following ×1, and the resulting two-dimensional matrix is transformed back into a cubic matrix along the expression description direction; ×2 means that the cubic matrix preceding ×2 is unfolded into a two-dimensional matrix along the object description direction, multiplied by the two-dimensional matrix following ×2, and the resulting two-dimensional matrix is transformed back into a cubic matrix along the object description direction.
According to a preferred embodiment of the present invention, the apparatus further comprises:
a second model training unit, configured to train the second positioning model by performing the following operations:
B11. collecting images of different expressions of different objects, and constructing a stereo training data tensor according to the object description, the expression description, and the position description;
B12. subtracting from the positions of the n key points in the stereo training data tensor the average positions of the n key points over the images, to obtain a stereo data tensor D;
B13. obtaining the parameter vector C_exp_id of the second positioning model by C_exp_id = D ×1 U_exp^T ×2 U_id^T;
where U_exp is the unitary matrix of D unfolded into a two-dimensional matrix along the expression description direction, and U_id is the unitary matrix of D unfolded into a two-dimensional matrix along the object description direction.
According to a preferred embodiment of the present invention, the first parameter determining subunit is further configured to, when the determined W_id of successive frames tends to a stable value, directly adopt the stable value as the W_id of subsequent frames.
According to a preferred embodiment of the present invention, the apparatus further comprises:
an identity identifying unit, configured to, if more than one object is present in the image, determine the two object description parameters with the smallest Euclidean distance between an object description parameter of the previous frame and an object description parameter of the current frame as corresponding to the same object.
As can be seen from the above technical solutions, the present invention accurately locates the current frame by combining the object description parameter and the expression description parameter of the previous frame with a coarse localization of the current frame, thereby exploiting the continuity and correlation between successive frames of a video and achieving key point positioning in video images.
[Description of the Drawings]
Fig. 1 is a flowchart of the main method according to an embodiment of the present invention;
Fig. 2 is a flowchart of a method for training the SDM model according to an embodiment of the present invention;
Fig. 3a is an example of the training set according to an embodiment of the present invention;
Fig. 3b is a partial enlarged view of one image in Fig. 3a;
Fig. 4 is a flowchart of a method for positioning with the SDM model according to an embodiment of the present invention;
Fig. 5 is a flowchart of a method for training the 3-mode SVD model according to an embodiment of the present invention;
Fig. 6 is a spatial illustration of the stereo training data tensor according to an embodiment of the present invention;
Fig. 7 is a flowchart of a method for positioning with the 3-mode SVD model according to an embodiment of the present invention;
Fig. 8 is a schematic diagram of two successive frames containing two objects according to an embodiment of the present invention;
Fig. 9 is a structural diagram of an apparatus according to an embodiment of the present invention.
[Detailed Description]
To make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is described in detail below with reference to the drawings and specific embodiments.
Fig. 1 is a flowchart of the main method according to an embodiment of the present invention. As shown in Fig. 1, when performing key point positioning on the frames of a video, the following processing is performed on each frame in turn:
In 101, the current frame is located by using the first positioning model to obtain the positions of m positioning points.
In the present application, an object detection step precedes the positioning of each frame, mainly to detect the approximate regions and number of objects. Taking face positioning as an example, face detection can first be performed on the current frame to determine the regions and number of faces. The positioning performed by the present application mainly takes place within a face region; if there is more than one face, each face region is located separately. Face detection can be carried out in any manner known in the prior art and is not detailed here.
In 102, the W_id and W_exp of the previous frame are taken as initial parameters; based on the above m positioning points, the W_id and W_exp of the current frame are determined by using the second positioning model, and the current frame is located based on the W_id and W_exp of the current frame to obtain the positions of n positioning points, where m is smaller than n.
The method provided by the embodiments of the present invention can be applied to face key point positioning in video. Because the positions of the key points differ across persons and across expressions, both the construction of the positioning models and the positioning itself are carried out in three dimensions: object (i.e., person) description, expression description, and position description. Each frame has an object description parameter W_id and an expression description parameter W_exp: as the names suggest, W_id describes the identity of the object (i.e., the person) in the image, and W_exp describes the expression in the image. Of course, the present invention is not limited to key point positioning of human faces; key points of animals such as cats and dogs can also be located. The embodiments of the present invention take face key point positioning as an example.
In effect, the present invention uses the first positioning model to coarsely locate the current frame, and then, on the basis of that coarse localization, uses the second positioning model together with the W_id and W_exp of the previous frame to precisely locate the current frame.
In addition, since the first frame of a video has no previous frame, initial values are used for it, i.e., a preset initial W_id and a preset initial W_exp are taken as the initial parameters, as described in detail in the following embodiments.
In the embodiments of the present invention, the first positioning model used for coarse positioning can be an SDM (Supervised Descent Method) model. The SDM model performs positioning as follows: starting from the average shape of the face (i.e., the average positions of the key points), the face key points are located according to the mapping between the average positions of the key points and the image texture features centered on those average positions, yielding m positioning points; in the embodiments of the present invention, "positioning point" denotes a key point obtained by positioning. The advantage of the SDM model is that it is insensitive to the starting position and achieves a good positioning effect from the average shape alone; its drawback is high algorithmic complexity. Therefore, in the embodiments of the present invention it is used only for coarse positioning, producing a small number of positioning points.
The second positioning model used for precise positioning in the embodiments of the present invention can be a 3-mode SVD (3-mode Singular Value Decomposition) model, whose principle is as follows: starting from the average shape of the facial organs, the organ contour points are precisely located according to the mapping between the average positions of the key points and the already obtained positioning points. Its advantages are low algorithmic complexity, high precision, and good real-time performance; its drawback is that it requires part of the key points to have been located already.
By combining the advantages and disadvantages of the above two models, the present invention can accomplish real-time positioning and tracking of each frame image in a video. The methods of key point positioning in video images using the SDM model and the 3-mode SVD model are described in detail below with reference to specific embodiments.
First, taking the SDM model as an example, step 101 in Fig. 1 is described in detail. To facilitate understanding of the SDM model, its training process is described first. As shown in Fig. 2, the training process can include the following steps:
In 201, the key point positions X_real of the p images in the training set are determined.
In the embodiments of the present invention, images of different expressions of different persons can be collected to build a training set, as shown in Fig. 3a, with key points predetermined on these images. Assume the number of images in the training set is p and each image has m key point positions (taking one image of Fig. 3a as an example, its partial enlargement is shown in Fig. 3b, where a total of m key points lie on the eyes, nose, eyebrows, mouth, ears, chin, and other organs). Then X_real can be expressed as [x_1, x_2, ..., x_p]^T, i.e., the coordinates of every key point of every image of the training set, and this X_real is recorded.
In 202, the average position of each key point over the p images is taken as the current iteration position X.
In this step, X can be expressed as [x̄, x̄, ..., x̄]^T, where x̄ is the average of each key point's coordinates over the images.
In 203, the value of ΔX is determined by ΔX = X_real - X.
ΔX reflects the deviation between the true positions of the key points and the iteration position.
In 204, the current value of the parameter vector R of the first positioning model is obtained by R = (Φ^T·Φ)^(-1)·Φ^T·ΔX.
Here Φ = p×Dim, where Dim is the 2m-dimensional gradient feature extracted from set range regions centered on the average positions of the m key points in the images. Suppose there are m key points, i.e., m average positions; taking each average position as the center, a 64-dimensional gradient feature of the image is extracted from an 8×8 region, so Dim is 2112 (m×64) dimensions.
In 205, it is judged whether the modulus of ΔX is smaller than or equal to a preset first modulus value; if so, 207 is performed; otherwise, 206 is performed.
This step in effect sets a convergence condition: if the modulus of ΔX is smaller than or equal to the first modulus value, the current deviation ΔX is within an acceptable range and the iteration can stop. The first modulus value can be an empirical value, for example 1; the convergence condition is usually reached within about 4 iterations.
In 206, the current iteration position X is updated with the value of Φ·R + X, and the process returns to 203.
If the convergence condition is not met, the current iteration position X is updated and the iteration continues from 203 until the condition is met.
In 207, the current value of R is the parameter value finally obtained by training, and the training process ends.
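For illustration only, this training loop can be sketched in a few lines of NumPy. The sketch below is not part of the claimed method: the names train_sdm and extract_features are assumptions (the latter is assumed to return the p×Dim gradient-feature matrix Φ sampled around the current iterate), and np.linalg.lstsq is used as a numerically stable equivalent of (Φ^T·Φ)^(-1)·Φ^T·ΔX.

import numpy as np

def train_sdm(X_real, X_mean, extract_features, eps=1.0, max_iters=10):
    # X_real: (p, 2m) true key point coordinates of the p training images (A11)
    # X_mean: (p, 2m) mean shape replicated for each image, the start shape (A12)
    # extract_features: assumed callable returning Phi, the (p, Dim)
    #   gradient features sampled around the current iterate X
    X = X_mean.copy()
    R = None
    for _ in range(max_iters):
        dX = X_real - X                          # A13: deviation from the truth
        Phi = extract_features(X)
        # A14: R = (Phi^T Phi)^-1 Phi^T dX, solved as a least-squares problem
        R, *_ = np.linalg.lstsq(Phi, dX, rcond=None)
        if np.linalg.norm(dX) <= eps:            # A15: converged, R is final
            break
        X = X + Phi @ R                          # otherwise update X and repeat
    return R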
As can be seen, the training process of the SDM model is in fact the process of determining R, which embodies the mapping between the image gradient features and the degree by which the positioning point shape deviates from the average positioning point shape. After the SDM model is trained, the positioning process of step 101 in Fig. 1 using the SDM model can be as shown in Fig. 4, including the following steps:
In 401, the image gradient features of the current frame are extracted, and the degree ΔX by which the positioning point shape of the current frame deviates from the average positioning point shape is obtained by using the image gradient features and the R of the trained SDM model.
In this step, the image gradient features of the current frame can be reflected by the gradient features of the average positioning points on the current frame. The average positioning points are the average positions of the m key points of the training set over the images: since the training set has p images, each with m key points, the averages of the position coordinates of these m key points over the p images are the coordinate values of the average positioning points.
The image gradient features can therefore be denoted by Φ, with Φ = p×Dim, where the Dim of the current frame is the 2m-dimensional gradient feature extracted from set range regions on the current frame centered on the positions of the m average positioning points. Since the parameter vector R of the SDM model embodies the mapping between the image gradient features and the degree by which the positioning point shape deviates from the average positioning point shape, ΔX is determined by the formula ΔX = Φ·R, i.e., the Φ of the current frame multiplied by the R of the SDM model.
In 402, the positions of the m positioning points of the current frame are obtained by using the ΔX of the current frame and the predetermined positions of the m average positioning points.
From the meaning of ΔX, the positions of the m positioning points of the current frame are the sum of the ΔX of the current frame and the predetermined positions X̄ of the m average positioning points.
After the coarse positioning of the m positioning points is completed with the SDM model, further precise positioning is performed with the 3-mode SVD model. To facilitate understanding of how the 3-mode SVD model performs positioning, its training process is described first. As shown in Fig. 5, the training process can include the following steps:
In 501, images of different expressions of different objects are collected, and a stereo training data tensor is constructed according to the object description, the expression description, and the position description.
In the embodiments of the present invention, the object description is denoted by id, the expression description by exp, and the position description by the positioning points (vertices); the constructed stereo training data tensor can be spatially illustrated as shown in Fig. 6.
In 502, the average positions of the n key points over the images are subtracted from the positions of the n key points in the stereo training data tensor, to obtain a stereo data tensor D.
If images of 39 expressions of each of 400 persons are collected, with n key points on each image, the key point coordinates of each image can be expanded as [x_1, y_1, x_2, y_2, ..., x_n, y_n], where the subscripts identify the key points. The average positions of these n key points can be expressed as a vector f̄. The positions of the n key points in the stereo training data tensor are then reduced by the average positions of the n key points over the images, yielding the stereo data tensor D, a cubic matrix of size 2n×39×400.
In 503, the parameter vector C_exp_id of the 3-mode SVD model is obtained by C_exp_id = D ×1 U_exp^T ×2 U_id^T.
The formula in this step follows from the principle of the 3-mode SVD model. Here ×1 means that the cubic matrix preceding ×1 is unfolded into a two-dimensional matrix along the exp direction, multiplied by the two-dimensional matrix following ×1, and the resulting two-dimensional matrix is transformed back into a cubic matrix along the exp direction; ×2 means that the cubic matrix preceding ×2 is unfolded into a two-dimensional matrix along the id direction, multiplied by the two-dimensional matrix following ×2, and the resulting two-dimensional matrix is transformed back into a cubic matrix along the id direction. In the formula of this step, the cubic matrix D is in fact unfolded into a two-dimensional matrix along the exp direction and multiplied by U_exp^T, and the result is transformed back into a cubic matrix along the exp direction; that cubic matrix is then unfolded into a two-dimensional matrix along the id direction, multiplied by U_id^T, and the result is transformed back into a cubic matrix along the id direction, yielding C_exp_id.
U_exp is the unitary matrix of D unfolded into a two-dimensional matrix along the exp direction: continuing the example above, D unfolds along the exp direction into a 39×800n two-dimensional matrix, whose SVD yields a 39×39 unitary matrix that can be used directly as U_exp. To reduce the model size and speed up positioning, the first 10 columns of the unitary matrix can instead be taken as U_exp according to the singular value magnitudes, in which case U_exp is of size 39×10.
U_id is the unitary matrix of D unfolded into a two-dimensional matrix along the id direction: continuing the example, D unfolds along the id direction into a 400×78n two-dimensional matrix, whose SVD yields a 400×400 unitary matrix that can be used directly as U_id. To reduce the model size and speed up positioning, the first 20 columns of the unitary matrix can instead be taken as U_id according to the singular value magnitudes, in which case U_id is of size 400×20.
If U_exp is 39×10 and U_id is 400×20, the resulting C_exp_id is of size 2n×10×20.
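The unfold/fold operations and mode products used here can be sketched with NumPy as follows. The helper names are assumptions, D is assumed to be stored as a (2n, expressions, identities) array so that mode 1 is the exp direction and mode 2 the id direction, and the truncation sizes 10 and 20 follow the example above; the later sketches reuse these helpers.

import numpy as np

def unfold(T, mode):
    # expand a cubic matrix into a two-dimensional matrix along the given mode
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def fold(M, mode, shape):
    # inverse of unfold: synthesize the cubic matrix back from the 2-D matrix
    rest = [s for i, s in enumerate(shape) if i != mode]
    return np.moveaxis(M.reshape([shape[mode]] + rest), 0, mode)

def mode_product(T, M, mode):
    # T x_mode M: unfold T, left-multiply by M, fold back (as defined above)
    shape = list(T.shape)
    shape[mode] = M.shape[0]
    return fold(M @ unfold(T, mode), mode, shape)

def train_3mode_svd(D, k_exp=10, k_id=20):
    # D: (2n, n_exp, n_id) mean-centred stereo data tensor (steps 501-502)
    U_exp, _, _ = np.linalg.svd(unfold(D, 1), full_matrices=False)
    U_exp = U_exp[:, :k_exp]                 # e.g. 39 x 10
    U_id, _, _ = np.linalg.svd(unfold(D, 2), full_matrices=False)
    U_id = U_id[:, :k_id]                    # e.g. 400 x 20
    # step 503: C_exp_id = D x_1 U_exp^T x_2 U_id^T
    C_exp_id = mode_product(mode_product(D, U_exp.T, 1), U_id.T, 2)
    return C_exp_id, U_exp, U_id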
As can be seen from the formula in step 503, D ≈ C_exp_id ×1 U_exp ×2 U_id. Thus the key point positions f of an image of the training set can be expressed by the trained C_exp_id, a column vector of U_exp, and a column vector of U_id, where the column vector of U_exp is denoted W_exp and the column vector of U_id is denoted W_id. That is, f - f̄ = C_exp_id ×1 W_exp^T ×2 W_id^T, from which it is further derived that f = f̄ + C_exp_id ×1 W_exp^T ×2 W_id^T.
The subsequent positioning with the 3-mode SVD model is based on this formula. The positioning process using the 3-mode SVD model is described in detail below with reference to Fig. 7. The same method is used for every frame, so Fig. 7 describes the positioning process of only one frame (referred to as the current frame), i.e., the implementation of step 102 in Fig. 1. As shown in Fig. 7, the process can include the following steps:
In 701, the average position of the m positioning points determined by the SDM model is taken as the current iteration position S̄.
In this step, the average position of the m positioning points obtained by coarse positioning with the SDM model is initially taken as the current iteration position S̄.
In 702, a new iteration position S is determined by using the current iteration position S̄, the current W_id and W_exp, and the parameter vector C_sdm_exp_id of the second positioning model corresponding to the m positioning points.
For the first frame of the video, preset initial values are used for the current W_id and W_exp: according to the singular value magnitudes, initial vectors can be taken for W_id and W_exp whose moduli are both 1.
For a non-first frame of the video, the W_id and W_exp of the previous frame are used as the current W_id and W_exp.
In this step, the new iteration position S can be obtained by the formula S = S̄ + C_sdm_exp_id ×1 W_exp^T ×2 W_id^T, where unfolding along the expression description direction expands a cubic matrix into a two-dimensional matrix, and the inverse operation synthesizes a cubic matrix along the expression description direction.
Regarding the parameter vector C_sdm_exp_id of the second positioning model corresponding to the m positioning points, consider an example: suppose the SDM model locates m positioning points while the 3-mode SVD model has n positioning points; the n points of the parameter vector C_exp_id of the 3-mode SVD model then contain the above m positioning points, and the parameter vector corresponding to these m positioning points within C_exp_id is determined as C_sdm_exp_id.
In 703, the deviation ΔS between the positions of the m positioning points determined by the SDM model and the new iteration position S is determined.
If the positions of the m positioning points determined by the SDM model are denoted S_SDM, then ΔS = S_SDM - S.
In 704, a new W_id is determined by using the current W_id and ΔS.
In this step, ΔW_id can first be determined by ΔW_id = (Ψ^T·Ψ)^(-1)·Ψ^T·ΔS, and the sum of ΔW_id and the current W_id is then determined as the new W_id.
In 705, it is judged whether the modulus of ΔS is smaller than or equal to a preset second modulus value; if so, 707 is performed; otherwise, 706 is performed.
This step in effect sets a convergence condition on ΔS: if the value of ΔS satisfies the convergence condition, the deviation between the positions of the m positioning points determined by the SDM model and the new iteration position S is already small, and the currently iterated W_id (i.e., the current W_id) can be taken as the W_id of the current frame. If the convergence condition is not met, the iteration continues.
The second modulus value can be an empirical value, for example 1.
In 706, the current W_id is updated with the new W_id, the current iteration position S̄ is updated with the new iteration position S, and the process returns to 702.
In 707, the new W_id is determined as the W_id of the current frame.
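The fixed-point iteration of steps 701-707 can be sketched as below, reusing the tensor helpers from the earlier sketch. The names are assumptions, the exact form of the update S = S̄ + C_sdm_exp_id ×1 W_exp^T ×2 W_id^T and the definition of Ψ are reconstructions from the surrounding description, and np.linalg.lstsq again stands in for the normal-equation solution (Ψ^T·Ψ)^(-1)·Ψ^T·ΔS.

def estimate_w_id(S_sdm, S_bar, C_sdm, W_id, W_exp, eps=1.0, max_iters=20):
    # S_sdm: (2m,) positions of the m positioning points from the SDM stage
    # S_bar: (2m,) current iteration position, initialised per step 701
    # C_sdm: (2m, k_exp, k_id) parameter tensor C_sdm_exp_id
    for _ in range(max_iters):
        # 702 (assumed form): S = S_bar + C_sdm x_1 W_exp^T x_2 W_id^T
        S = S_bar + mode_product(mode_product(C_sdm, W_exp[None, :], 1),
                                 W_id[None, :], 2).ravel()
        dS = S_sdm - S                            # 703: deviation
        # 704: dW_id from least squares; Psi (assumed definition) is the
        # 2-D matrix obtained from C_sdm contracted with the current W_exp
        Psi = unfold(mode_product(C_sdm, W_exp[None, :], 1), 0)
        dW_id, *_ = np.linalg.lstsq(Psi, dS, rcond=None)
        W_id = W_id + dW_id
        if np.linalg.norm(dS) <= eps:             # 705: converged
            break
        S_bar = S                                 # 706: update and repeat
    return W_id                                   # 707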
The determination of the W_id of the current frame is thus completed; in effect, W_id was iterated with W_exp held fixed. The following steps determine the W_exp of the current frame based on its W_id, i.e., W_exp is iterated with W_id held fixed, on the same principle.
In 708, the average position of the m positioning points located with the SDM model is taken as the current iteration position S̄.
In 709, a new iteration position S is determined by using the current iteration position S̄, the W_id of the current frame, the current W_exp, and the parameter vector C_sdm_exp_id of the 3-mode SVD model corresponding to the m positioning points.
In this step, the new iteration position S can be obtained by the formula S = S̄ + C_sdm_exp_id ×1 W_exp^T ×2 W_id^T, where unfolding along the object description direction expands a cubic matrix into a two-dimensional matrix, and the inverse operation synthesizes a cubic matrix along the object description direction.
In 710, the deviation ΔS between the positions of the m positioning points located with the SDM model and the new iteration position S is determined.
If the positions of the m positioning points determined by the SDM model are denoted S_SDM, then ΔS = S_SDM - S.
In 711, a new W_exp is determined by using the current W_exp and ΔS.
ΔW_exp can first be determined by ΔW_exp = (Ω^T·Ω)^(-1)·Ω^T·ΔS, and the sum of ΔW_exp and the current W_exp is then determined as the new W_exp.
In 712, it is judged whether the modulus of ΔS is smaller than or equal to a preset third modulus value; if so, 714 is performed; otherwise, 713 is performed.
The third modulus value can be an empirical value, for example 1.
In 713, the current W_exp is updated with the new W_exp, the current iteration position S̄ is updated with the new iteration position S, and the process returns to 709.
In 714, the new W_exp is determined as the W_exp of the current frame.
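Steps 708-714 mirror the previous loop with the roles of W_id and W_exp exchanged. A sketch under the same assumptions, including the assumed definition of Ω:

def estimate_w_exp(S_sdm, S_bar, C_sdm, W_id, W_exp, eps=1.0, max_iters=20):
    # mirror of estimate_w_id: the W_id of the current frame is now held fixed
    for _ in range(max_iters):
        S = S_bar + mode_product(mode_product(C_sdm, W_exp[None, :], 1),
                                 W_id[None, :], 2).ravel()
        dS = S_sdm - S                                      # 710
        # 711: dW_exp = (Omega^T Omega)^-1 Omega^T dS; Omega (assumed) is
        # the 2-D matrix from C_sdm contracted with the fixed W_id
        Omega = unfold(mode_product(C_sdm, W_id[None, :], 2), 0)
        dW_exp, *_ = np.linalg.lstsq(Omega, dS, rcond=None)
        W_exp = W_exp + dW_exp
        if np.linalg.norm(dS) <= eps:                       # 712
            break
        S_bar = S                                           # 713
    return W_exp                                            # 714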
The determination of the W_exp of the current frame is thus completed, and the W_id of the current frame has already been determined; the current frame can then be located in 715 by using its W_id and W_exp.
Specifically, the vector f containing the positions of the n positioning points of the current frame can be obtained by f = f̄ + C_exp_id ×1 W_exp^T ×2 W_id^T, where f̄ is the vector formed by the average positions of the n positioning points in the images of the training set used by the second positioning model.
The positioning of the current frame thus ends. The above flow can be executed for every frame of the video to obtain the positioning points of each frame. In addition, when the frames of a video are located with the above method, W_id gradually tends to a stable value after a number of frames have been located; the W_id of subsequent frames then no longer needs to be iterated in the manner shown in Fig. 7, and the stable value can be adopted directly and used to compute the W_exp of each frame. To judge whether W_id is tending to a stable value, it can be judged whether the modulus of the difference between the W_id of the current frame and the W_id of the previous frame is smaller than a preset threshold, for example smaller than 1; if so, W_id is determined to be tending to a stable value. The stable value can be the average of the W_id of the current frame and the W_id of the preceding frames. Of course, other ways of judging stability and determining the stable value can also be used and are not enumerated here.
As W_id gradually stabilizes, the positions of the positioning points also gradually stabilize, which is crucial for the effect of beauty applications such as virtual makeup try-on.
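One possible implementation of this stability shortcut, with the threshold and the averaging rule taken from the example above (both are merely one choice among others):

def stable_w_id(w_id_cur, w_id_history, thresh=1.0):
    # w_id_history: W_id vectors of the preceding frames
    if w_id_history and np.linalg.norm(w_id_cur - w_id_history[-1]) < thresh:
        # one possible stable value: the mean over current and past frames
        return np.mean(w_id_history + [w_id_cur], axis=0)
    return None    # not yet stable, keep iterating per frame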
For the case where an image contains two or more objects, for example two persons, the above positioning outputs two positioning results (each containing n positioning points) and two W_id. Since the persons in a video are not necessarily static and their positions may change through movement, it is necessary to distinguish which two positioning results of two frames belong to the same person. As shown in Fig. 8, the two images are two successive frames; the relative positions of object A and object B in the previous frame have changed in the current frame, so distinction and identification are needed.
For convenience of description, the positioning results and object description parameters of the previous frame are denoted f_1_pre, W_1_pre, f_2_pre, W_2_pre, and those of the current frame f_1_cur, W_1_cur, f_2_cur, W_2_cur.
In the embodiments of the present invention, the two object description parameters with the smallest Euclidean distance between an object description parameter of the previous frame and an object description parameter of the current frame can be determined as corresponding to the same object. Specifically, the Euclidean distances between W_1_pre and W_1_cur, between W_1_pre and W_2_cur, between W_2_pre and W_1_cur, and between W_2_pre and W_2_cur are computed, and the smallest is selected. Supposing the Euclidean distance between W_1_pre and W_2_cur is the smallest, W_1_pre and W_2_cur correspond to the same object; accordingly, f_1_pre and f_2_cur correspond to the same object, and positioning results belonging to the same object can be identified as such.
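A greedy sketch of this matching rule for a small number of objects; the function name and the list-based representation are assumptions:

def match_identities(W_prev, W_cur):
    # W_prev, W_cur: lists of per-object W_id vectors of two successive frames
    # returns pairs (i, j): object i of the previous frame is object j now
    pairs, used = [], set()
    for i, wp in enumerate(W_prev):
        d, j = min((np.linalg.norm(wp - wc), j)
                   for j, wc in enumerate(W_cur) if j not in used)
        pairs.append((i, j))
        used.add(j)
    return pairs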
In the above embodiments, m can be an integer on the order of tens, and n an integer on the order of hundreds.
The above is a detailed description of the method provided by the present invention. The apparatus provided by the embodiments of the present invention is described in detail below with reference to Fig. 9. As shown in Fig. 9, the apparatus can include a first positioning unit 10, a parameter determining unit 20, and a second positioning unit 30, and can further include a first model training unit 40, a second model training unit 50, and an identity identifying unit 60. The first positioning unit 10 can specifically include a first deviation determining subunit 11 and a first positioning subunit 12. The second positioning unit 30 can specifically include a first parameter determining subunit 31, a second parameter determining subunit 32, and a second positioning subunit 33. The main functions of the component units are as follows:
The first positioning unit 10 is responsible for locating the current frame with the first positioning model to obtain the positions of m positioning points.
The parameter determining unit 20 is responsible for taking the W_id and W_exp of the previous frame as initial parameters.
The second positioning unit 30 is responsible for determining the W_id and W_exp of the current frame with the second positioning model based on the m positioning points and the initial parameters, and for locating the current frame based on the W_id and W_exp of the current frame to obtain the positions of n positioning points.
Here m is smaller than n, W_id is an object description parameter of the image, and W_exp is an expression description parameter of the image.
The apparatus in effect coarsely locates the current frame with the first positioning model and then, on the basis of that coarse localization, precisely locates the current frame with the second positioning model together with the W_id and W_exp of the previous frame.
The parameter determining unit 20 can first judge whether the current frame is the first frame of the video; if not, the W_id and W_exp of the previous frame are taken as initial parameters; if so, since the first frame of the video has no previous frame to refer to, a preset initial W_id and a preset initial W_exp are taken as initial parameters.
In the embodiments of the present invention, the first positioning model can be an SDM model, and the second positioning model can be a 3-mode SVD model.
The first model training unit 40 is responsible for training the first positioning model, i.e., the SDM model, specifically performing the following operations:
A11. determining the key point positions X_real of the p images in the training set;
A12. taking the average position of each key point over the p images as the current iteration position X;
A13. determining the value of ΔX by ΔX = X_real - X;
A14. obtaining the current value of the parameter vector R of the SDM model by R = (Φ^T·Φ)^(-1)·Φ^T·ΔX;
A15. if the modulus of ΔX is smaller than or equal to a preset first modulus value, taking the current value of R as the trained value of the parameter vector R of the SDM model; otherwise, updating the current iteration position X with the value of Φ·R + X and returning to A13. The first modulus value can be an empirical value, for example 1.
Here Φ = p×Dim, where Dim is the 2m-dimensional gradient feature extracted from set range regions centered on the positions of the m average positioning points.
The composition of the first positioning unit 10 is described below.
The first deviation determining subunit 11 is responsible for extracting the image gradient features of the current frame and obtaining, by using the image gradient features and the first positioning model, the degree ΔX by which the positioning point shape of the current frame deviates from the average positioning point shape. Specifically, the first deviation determining subunit 11 can determine ΔX by ΔX = Φ·R.
The first positioning subunit 12 is responsible for obtaining the positions of the m positioning points of the current frame by using the ΔX of the current frame and the predetermined positions of the m average positioning points: the positions of the m positioning points of the current frame are the sum of the ΔX of the current frame and the predetermined positions X̄ of the m average positioning points.
After the coarse positioning of the m positioning points is completed with the SDM model, further precise positioning is performed with the 3-mode SVD model. To facilitate understanding of how the 3-mode SVD model performs positioning, the second model training unit 50 is described first.
The second model training unit 50 is responsible for training the second positioning model, i.e., the 3-mode SVD model, specifically performing the following operations:
B11. collecting images of different expressions of different objects, and constructing a stereo training data tensor according to the object description, the expression description, and the position description;
B12. subtracting from the positions of the n key points in the stereo training data tensor the average positions of the n key points over the images, to obtain a stereo data tensor D;
B13. obtaining the parameter vector C_exp_id of the 3-mode SVD model by C_exp_id = D ×1 U_exp^T ×2 U_id^T, where U_exp is the unitary matrix of D unfolded into a two-dimensional matrix along the expression description direction, and U_id is the unitary matrix of D unfolded into a two-dimensional matrix along the object description direction; ×1 means that the cubic matrix preceding ×1 is unfolded into a two-dimensional matrix along the expression description direction, multiplied by the two-dimensional matrix following ×1, and the resulting two-dimensional matrix is transformed back into a cubic matrix along the expression description direction; ×2 means that the cubic matrix preceding ×2 is unfolded into a two-dimensional matrix along the object description direction, multiplied by the two-dimensional matrix following ×2, and the resulting two-dimensional matrix is transformed back into a cubic matrix along the object description direction.
The specific structure of the second positioning unit 30 is described below.
The first parameter determining subunit 31 is responsible for determining the W_id of the current frame with the 3-mode SVD model, specifically performing the following operations:
S21. taking the average position of the m positioning points as the current iteration position S̄.
S22. determining a new iteration position S by using the current iteration position S̄, the current W_id and W_exp, and the parameter vector C_sdm_exp_id of the 3-mode SVD model corresponding to the m positioning points.
For the first frame of the video, preset initial values are used for the current W_id and W_exp: according to the singular value magnitudes, initial vectors can be taken for W_id and W_exp whose moduli are both 1. For a non-first frame of the video, the W_id and W_exp of the previous frame are used as the current W_id and W_exp.
The first parameter determining subunit 31 can obtain the new iteration position S by S = S̄ + C_sdm_exp_id ×1 W_exp^T ×2 W_id^T, where Ψ denotes the two-dimensional matrix obtained by unfolding C_sdm_exp_id ×1 W_exp^T along the expression description direction; unfolding along the expression description direction expands a cubic matrix into a two-dimensional matrix, and the inverse operation synthesizes a cubic matrix along the expression description direction.
S23. determining the deviation ΔS between the positions of the m positioning points and the new iteration position S. If the positions of the m positioning points determined by the SDM model are denoted S_SDM, then ΔS = S_SDM - S.
S24. determining a new W_id by using the current W_id and ΔS. The first parameter determining subunit 31 can determine ΔW_id by ΔW_id = (Ψ^T·Ψ)^(-1)·Ψ^T·ΔS and determine the sum of ΔW_id and the current W_id as the new W_id.
S25. if the modulus of ΔS is smaller than or equal to a preset second modulus value, determining the new W_id as the W_id of the current frame; otherwise, updating the current W_id with the new W_id, updating the current iteration position S̄ with the new iteration position S, and returning to S22. The second modulus value can be an empirical value, for example 1.
在第一参数确定子单元31确定出当前帧的Wid后,由第二参数确定子单元32负责利用3-modeSVD模型确定当前帧的Wexp,具体执行:After the first parameter determining sub-unit 31 determines the W id of the current frame, the second parameter determining sub-unit 32 is responsible for determining the W exp of the current frame by using the 3-mode SVD model, and specifically:
S31、将m个定位点的位置平均值作为当前迭代位置
Figure PCTCN2016099291-appb-000073
S31. Taking the average value of the positions of the m positioning points as the current iteration position
Figure PCTCN2016099291-appb-000073
S32、利用当前迭代位置
Figure PCTCN2016099291-appb-000074
当前帧的Wid和当前的Wexp以及m个定位点对应的第二定位模型的参数向量Csdm_exp_id,确定新的迭代位置S。
S32, using the current iteration position
Figure PCTCN2016099291-appb-000074
The new iteration position S is determined by the W id of the current frame and the current W exp and the parameter vector C sdm_exp_id of the second positioning model corresponding to the m positioning points.
The second parameter determining subunit 32 may obtain the new iteration position S as

S = S̄ + Ω·Wexp,

where Ω is the matrix obtained by contracting the parameter vector Csdm_exp_id with the Wid of the current frame and expanding the result into a two-dimensional matrix along the object description direction; here expanding denotes unfolding a cubic matrix into a two-dimensional matrix along the object description direction, and the inverse operation denotes synthesizing a cubic matrix along the object description direction.
S33. Determine the deviation ΔS between the positions of the m positioning points and the new iteration position S.

If SSDM denotes the positions of the m positioning points determined by the SDM model, then ΔS = SSDM - S.
S34. Determine the new Wexp by using the current Wexp and ΔS.

The second parameter determining subunit 32 may determine ΔWexp by ΔWexp = (ΩT·Ω)-1·ΩT·ΔS, and determine the sum of ΔWexp and the current Wexp as the new Wexp.
S35. If the modulus of ΔS is less than or equal to a preset third modulus value, determine the new Wexp as the Wexp of the current frame; otherwise, update the current Wexp with the new Wexp, update the current iteration position S̄ with the new iteration position S, and go to S32. The third modulus value here may take an empirical value, for example 1.
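The refinement of Wexp in steps S31 to S35 is symmetric, contracting the object description mode instead; under the same assumed tensor layout a sketch could read:

    import numpy as np

    def refine_w_exp(s_sdm, s_bar, c_tensor, w_id, w_exp, tol=1.0, max_iter=50):
        """Iteratively refine Wexp (steps S31-S35) with Wid held fixed."""
        for _ in range(max_iter):
            # Omega: contract the object (identity) mode with Wid -> (2m, k_exp)
            omega = np.tensordot(w_id, c_tensor, axes=(0, 1)).T
            s_new = s_bar + omega @ w_exp        # S32
            delta_s = s_sdm - s_new              # S33
            # S34: dWexp = (Omega^T Omega)^-1 Omega^T dS via least squares
            w_exp = w_exp + np.linalg.lstsq(omega, delta_s, rcond=None)[0]
            if np.linalg.norm(delta_s) <= tol:   # S35: third modulus value
                break
            s_bar = s_new
        return w_exp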
After the Wid and Wexp of the current frame have been determined, the second positioning subunit 33 locates the current frame based on the Wid and Wexp of the current frame, specifically:

using f = f̄ + Cexp_id ×1 Wexp ×2 Wid, a vector f containing the positions of the n positioning points of the current frame is obtained;
where f̄ is a vector composed of the position averages of the n positioning points in the images of the training set used by the 3-mode SVD model, and Cexp_id is the parameter vector of the 3-mode SVD model.
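Under the same assumed layout, now of shape (k_exp, k_id, 2n) for Cexp_id, the final mode-product evaluation might be sketched as:

    import numpy as np

    def locate_n_points(f_bar, c_exp_id, w_exp, w_id):
        """f = f_bar + Cexp_id x1 Wexp x2 Wid, assuming layout (k_exp, k_id, 2n)."""
        # x1: contract the expression description direction with Wexp
        tmp = np.tensordot(w_exp, c_exp_id, axes=(0, 0))      # -> (k_id, 2n)
        # x2: contract the object description direction with Wid
        return f_bar + np.tensordot(w_id, tmp, axes=(0, 0))   # -> (2n,)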
In addition, when the frames of the video are located by the above method, after a number of frames have been located, Wid gradually tends to a stable value. For subsequent frames, Wid then no longer needs to be obtained iteratively in the above manner; the stable value can be used directly, and the Wexp of each frame is computed with that stable value. To judge whether Wid has tended to a stable value, it can be determined whether the modulus of the difference between the Wid of the current frame and the Wid of the previous frame is less than a preset threshold, for example less than 1; if so, Wid is determined to have tended to a stable value. The stable value may be the average of the Wid of the current frame and the Wid of the preceding frames. Of course, other ways of judging stability and of determining the stable value may also be adopted, and are not enumerated here.
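A minimal sketch of this stability test, using the example threshold of 1 and the frame-average stable value mentioned above (the helper name is an assumption):

    import numpy as np

    def stable_w_id(w_id_history, threshold=1.0):
        """Return the stable Wid once consecutive frames differ by less than
        the threshold, else None (Wid must keep being re-estimated)."""
        if len(w_id_history) < 2:
            return None
        if np.linalg.norm(w_id_history[-1] - w_id_history[-2]) < threshold:
            return np.mean(w_id_history, axis=0)  # average over frames so far
        return None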
If more than one object is present in the image, the identity identifying unit 60 determines the two object description parameters with the smallest Euclidean distance between the object description parameters of the previous frame and those of the current frame as corresponding to the same object, and identifies the positioning results of the same object accordingly, thereby distinguishing which object each positioning result in the image belongs to.
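For illustration, pairing objects across frames by the Euclidean distance of their object description parameters might be sketched as a simple greedy assignment; the disclosure only prescribes the minimum-distance rule, so the assignment strategy here is an assumption:

    import numpy as np

    def match_objects(prev_w_ids, cur_w_ids):
        """Greedily pair previous-frame and current-frame Wid vectors
        (arrays of shape (num_objects, k_id)) by smallest Euclidean
        distance; returns a list of (prev_idx, cur_idx) pairs."""
        dists = np.linalg.norm(prev_w_ids[:, None, :] - cur_w_ids[None, :, :], axis=2)
        pairs = []
        for _ in range(min(len(prev_w_ids), len(cur_w_ids))):
            i, j = np.unravel_index(np.argmin(dists), dists.shape)
            pairs.append((int(i), int(j)))
            dists[i, :] = np.inf   # remove matched previous-frame object
            dists[:, j] = np.inf   # remove matched current-frame object
        return pairs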
As can be seen from the above description, the method and apparatus provided by the present invention can have the following advantages:

1) Key point positioning for video images is achieved.

2) The constraint that consecutive frames of a video image contain the same object is taken into account, which reduces the jitter of the positioning points between consecutive frames, so the visual effect is more natural and smooth.
In the several embodiments provided by the present invention, it should be understood that the disclosed apparatus and method may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative; for instance, the division of the units is only a division by logical function, and other divisions are possible in actual implementation.

The units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional units.

The integrated unit implemented in the form of a software functional unit may be stored in a computer-readable storage medium. The software functional unit is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to perform some of the steps of the methods described in the embodiments of the present invention. The storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.

The above are only preferred embodiments of the present invention and are not intended to limit the present invention. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (28)

  1. A method for performing key point positioning on an image, the method comprising:
    S1. locating a current frame by using a first positioning model to obtain positions of m positioning points;
    S2. using the Wid and Wexp of a previous frame as initial parameters, determining the Wid and Wexp of the current frame by using a second positioning model based on the m positioning points, and locating the current frame based on the Wid and Wexp of the current frame to obtain positions of n positioning points;
    wherein m is smaller than n, Wid is an object description parameter in the image, and Wexp is an expression description parameter in the image.
  2. The method according to claim 1, wherein before using the Wid and Wexp of the previous frame as initial parameters, the method further comprises:
    determining whether the current frame is the first frame of the video; if not, continuing to perform the step of using the Wid and Wexp of the previous frame as initial parameters; if yes, using a preset initial Wid and a preset initial Wexp as the initial parameters.
  3. The method according to claim 1, wherein the first positioning model is a supervised descent method (SDM) model, and the second positioning model is a 3-mode singular value decomposition (3-mode SVD) model.
  4. The method according to claim 3, wherein S1 comprises:
    S11. extracting an image gradient feature of the current frame, and obtaining, by using the image gradient feature and the first positioning model, the degree ΔX to which the positioning point shape of the current frame deviates from the average positioning point shape;
    S12. obtaining the positions of the m positioning points of the current frame by using the ΔX of the current frame and the positions of m predetermined average positioning points.
  5. The method according to claim 4, wherein obtaining, by using the image gradient feature and the first positioning model, the degree ΔX to which the positioning point shape of the current frame deviates from the average positioning point shape comprises:
    determining ΔX by ΔX = Φ·R;
    wherein the Φ of the current frame is Φ = p×Dim, p is the number of images in the training set used by the first positioning model, Dim is the 2m-dimensional gradient feature extracted from set range regions centered on the positions of the m average positioning points, and R is the parameter vector of the first positioning model.
  6. The method according to claim 5, further comprising pre-training the first positioning model, specifically comprising:
    A11. determining the key point positions Xreal of the p images in the training set;
    A12. taking the average position of each key point in the p images as the current iteration position X;
    A13. determining the value of ΔX by ΔX = Xreal - X;
    A14. obtaining the current value of the parameter vector R of the first positioning model by R = (ΦT·Φ)-1·ΦT·ΔX;
    A15. if the modulus of ΔX is less than or equal to a preset first modulus value, taking the current value of R as the value of the parameter vector R of the first positioning model obtained by training;
    otherwise, updating the current iteration position X with the value obtained by Φ·R+X, and going to A13.
  7. The method according to claim 3, wherein determining the Wid of the current frame by using the second positioning model comprises:
    S21. taking the average of the positions of the m positioning points as the current iteration position S̄;
    S22. determining a new iteration position S by using the current iteration position S̄, the current Wid and Wexp, and the parameter vector Csdm_exp_id of the second positioning model corresponding to the m positioning points;
    S23. determining the deviation ΔS between the positions of the m positioning points and the new iteration position S;
    S24. determining a new Wid by using the current Wid and ΔS;
    S25. if the modulus of ΔS is less than or equal to a preset second modulus value, determining the new Wid as the Wid of the current frame;
    otherwise, updating the current Wid with the new Wid, updating the current iteration position S̄ with the new iteration position S, and going to S22.
  8. The method according to claim 7, wherein S22 comprises: obtaining the new iteration position S by S = S̄ + Ψ·Wid, wherein Ψ is the matrix obtained by contracting the parameter vector Csdm_exp_id with the current Wexp and expanding the result into a two-dimensional matrix along the expression description direction, expanding denoting unfolding a cubic matrix into a two-dimensional matrix along the expression description direction, and the inverse operation denoting synthesizing a cubic matrix along the expression description direction;
    and S24 comprises:
    determining ΔWid by ΔWid = (ΨT·Ψ)-1·ΨT·ΔS;
    determining the sum of ΔWid and the current Wid as the new Wid.
  9. The method according to claim 3 or 7, wherein determining the Wexp of the current frame by using the second positioning model comprises:
    S31. taking the average of the positions of the m positioning points as the current iteration position S̄;
    S32. determining a new iteration position S by using the current iteration position S̄, the Wid of the current frame, the current Wexp, and the parameter vector Csdm_exp_id of the second positioning model corresponding to the m positioning points;
    S33. determining the deviation ΔS between the positions of the m positioning points and the new iteration position S;
    S34. determining a new Wexp by using the current Wexp and ΔS;
    S35. if the modulus of ΔS is less than or equal to a preset third modulus value, determining the new Wexp as the Wexp of the current frame;
    otherwise, updating the current Wexp with the new Wexp, updating the current iteration position S̄ with the new iteration position S, and going to S32.
  10. The method according to claim 9, wherein S32 comprises: obtaining the new iteration position S by S = S̄ + Ω·Wexp, wherein Ω is the matrix obtained by contracting the parameter vector Csdm_exp_id with the Wid of the current frame and expanding the result into a two-dimensional matrix along the object description direction, expanding denoting unfolding a cubic matrix into a two-dimensional matrix along the object description direction, and the inverse operation denoting synthesizing a cubic matrix along the object description direction;
    and S34 comprises:
    determining ΔWexp by ΔWexp = (ΩT·Ω)-1·ΩT·ΔS;
    determining the sum of ΔWexp and the current Wexp as the new Wexp.
  11. The method according to claim 3, wherein locating the current frame based on the Wid and Wexp of the current frame comprises:
    obtaining, by f = f̄ + Cexp_id ×1 Wexp ×2 Wid, a vector f containing the positions of the n positioning points of the current frame;
    wherein f̄ is a vector composed of the position averages of the n positioning points in the images of the training set used by the second positioning model, Cexp_id is the parameter vector of the second positioning model, ×1 indicates that the cubic matrix before ×1 is expanded into a two-dimensional matrix along the expression description direction and multiplied by the two-dimensional matrix after ×1, after which the resulting two-dimensional matrix is transformed back into a cubic matrix along the expression description direction, and ×2 indicates that the cubic matrix before ×2 is expanded into a two-dimensional matrix along the object description direction and multiplied by the two-dimensional matrix after ×2, after which the resulting two-dimensional matrix is transformed back into a cubic matrix along the object description direction.
  12. The method according to claim 11, further comprising pre-training the second positioning model, specifically comprising:
    B11. collecting images of different expressions of different objects, and constructing a stereo training data tensor according to object description, expression description, and position description;
    B12. subtracting, from the positions of the n key points in the stereo training data tensor, the position averages of the n key points in the respective images, to obtain a stereo data tensor D;
    B13. obtaining the parameter vector Cexp_id of the second positioning model by Cexp_id = D ×1 UexpT ×2 UidT;
    wherein Uexp is the unitary matrix of D expanded into a two-dimensional matrix along the expression description direction, and Uid is the unitary matrix of D expanded into a two-dimensional matrix along the object description direction.
  13. The method according to claim 1, further comprising:
    when the determined Wid of the frames tends to a stable value, directly using the stable value as the Wid of subsequent frames.
  14. The method according to claim 1, wherein, if more than one object is present in the image, the two object description parameters with the smallest Euclidean distance between the object description parameters of the previous frame and those of the current frame are determined as corresponding to the same object.
  15. An apparatus for performing key point positioning on an image, the apparatus comprising:
    a first positioning unit, configured to locate a current frame by using a first positioning model to obtain positions of m positioning points;
    a parameter determining unit, configured to use the Wid and Wexp of a previous frame as initial parameters;
    a second positioning unit, configured to determine the Wid and Wexp of the current frame by using a second positioning model based on the m positioning points and the initial parameters, and to locate the current frame based on the Wid and Wexp of the current frame to obtain positions of n positioning points;
    wherein m is smaller than n, Wid is an object description parameter in the image, and Wexp is an expression description parameter in the image.
  16. The apparatus according to claim 15, wherein the parameter determining unit is further configured to determine whether the current frame is the first frame of the video; if not, to use the Wid and Wexp of the previous frame as the initial parameters; if yes, to use a preset initial Wid and a preset initial Wexp as the initial parameters.
  17. The apparatus according to claim 15, wherein the first positioning model is an SDM model, and the second positioning model is a 3-mode SVD model.
  18. The apparatus according to claim 17, wherein the first positioning unit comprises:
    a first deviation determining subunit, configured to extract an image gradient feature of the current frame and obtain, by using the image gradient feature and the first positioning model, the degree ΔX to which the positioning point shape of the current frame deviates from the average positioning point shape;
    a first positioning subunit, configured to obtain the positions of the m positioning points of the current frame by using the ΔX of the current frame and the positions of m predetermined average positioning points.
  19. The apparatus according to claim 18, wherein the first deviation determining subunit specifically determines ΔX by ΔX = Φ·R;
    wherein the Φ of the current frame is Φ = p×Dim, p is the number of images in the training set used by the first positioning model, Dim is the 2m-dimensional gradient feature extracted from set range regions centered on the positions of the m average positioning points, and R is the parameter vector of the first positioning model.
  20. The apparatus according to claim 19, further comprising:
    a first model training unit, configured to train the first positioning model by performing the following operations:
    A11. determining the key point positions Xreal of the p images in the training set;
    A12. taking the average position of each key point in the p images as the current iteration position X;
    A13. determining the value of ΔX by ΔX = Xreal - X;
    A14. obtaining the current value of the parameter vector R of the first positioning model by R = (ΦT·Φ)-1·ΦT·ΔX;
    A15. if the modulus of ΔX is less than or equal to a preset first modulus value, taking the current value of R as the value of the parameter vector R of the first positioning model obtained by training;
    otherwise, updating the current iteration position X with the value obtained by Φ·R+X, and going to A13.
  21. The apparatus according to claim 17, wherein the second positioning unit comprises:
    a first parameter determining subunit, configured to determine the Wid of the current frame by using the second positioning model, specifically performing the following operations:
    S21. taking the average of the positions of the m positioning points as the current iteration position S̄;
    S22. determining a new iteration position S by using the current iteration position S̄, the current Wid and Wexp, and the parameter vector Csdm_exp_id of the second positioning model corresponding to the m positioning points;
    S23. determining the deviation ΔS between the positions of the m positioning points and the new iteration position S;
    S24. determining a new Wid by using the current Wid and ΔS;
    S25. if the modulus of ΔS is less than or equal to a preset second modulus value, determining the new Wid as the Wid of the current frame;
    otherwise, updating the current Wid with the new Wid, updating the current iteration position S̄ with the new iteration position S, and going to S22.
  22. The apparatus according to claim 21, wherein, when performing S22, the first parameter determining subunit specifically obtains the new iteration position S by S = S̄ + Ψ·Wid, wherein Ψ is the matrix obtained by contracting the parameter vector Csdm_exp_id with the current Wexp and expanding the result into a two-dimensional matrix along the expression description direction, expanding denoting unfolding a cubic matrix into a two-dimensional matrix along the expression description direction, and the inverse operation denoting synthesizing a cubic matrix along the expression description direction;
    and, when performing S24, specifically determines ΔWid by ΔWid = (ΨT·Ψ)-1·ΨT·ΔS, and determines the sum of ΔWid and the current Wid as the new Wid.
  23. The apparatus according to claim 17 or 21, wherein the second positioning unit comprises:
    a second parameter determining subunit, configured to determine the Wexp of the current frame by using the second positioning model, specifically performing:
    S31. taking the average of the positions of the m positioning points as the current iteration position S̄;
    S32. determining a new iteration position S by using the current iteration position S̄, the Wid of the current frame, the current Wexp, and the parameter vector Csdm_exp_id of the second positioning model corresponding to the m positioning points;
    S33. determining the deviation ΔS between the positions of the m positioning points and the new iteration position S;
    S34. determining a new Wexp by using the current Wexp and ΔS;
    S35. if the modulus of ΔS is less than or equal to a preset third modulus value, determining the new Wexp as the Wexp of the current frame;
    otherwise, updating the current Wexp with the new Wexp, updating the current iteration position S̄ with the new iteration position S, and going to S32.
  24. The apparatus according to claim 23, wherein, when performing S32, the second parameter determining subunit specifically obtains the new iteration position S by S = S̄ + Ω·Wexp, wherein Ω is the matrix obtained by contracting the parameter vector Csdm_exp_id with the Wid of the current frame and expanding the result into a two-dimensional matrix along the object description direction, expanding denoting unfolding a cubic matrix into a two-dimensional matrix along the object description direction, and the inverse operation denoting synthesizing a cubic matrix along the object description direction;
    and, when performing S34, specifically determines ΔWexp by ΔWexp = (ΩT·Ω)-1·ΩT·ΔS, and determines the sum of ΔWexp and the current Wexp as the new Wexp.
  25. The apparatus according to claim 17, wherein the second positioning unit comprises:
    a second positioning subunit, configured to locate the current frame based on the Wid and Wexp of the current frame, specifically:
    obtaining, by f = f̄ + Cexp_id ×1 Wexp ×2 Wid, a vector f containing the positions of the n positioning points of the current frame;
    wherein f̄ is a vector composed of the position averages of the n positioning points in the images of the training set used by the second positioning model, Cexp_id is the parameter vector of the second positioning model, ×1 indicates that the cubic matrix before ×1 is expanded into a two-dimensional matrix along the expression description direction and multiplied by the two-dimensional matrix after ×1, after which the resulting two-dimensional matrix is transformed back into a cubic matrix along the expression description direction, and ×2 indicates that the cubic matrix before ×2 is expanded into a two-dimensional matrix along the object description direction and multiplied by the two-dimensional matrix after ×2, after which the resulting two-dimensional matrix is transformed back into a cubic matrix along the object description direction.
  26. The apparatus according to claim 25, further comprising:
    a second model training unit, configured to train the second positioning model by performing the following operations:
    B11. collecting images of different expressions of different objects, and constructing a stereo training data tensor according to object description, expression description, and position description;
    B12. subtracting, from the positions of the n key points in the stereo training data tensor, the position averages of the n key points in the respective images, to obtain a stereo data tensor D;
    B13. obtaining the parameter vector Cexp_id of the second positioning model by Cexp_id = D ×1 UexpT ×2 UidT;
    wherein Uexp is the unitary matrix of D expanded into a two-dimensional matrix along the expression description direction, and Uid is the unitary matrix of D expanded into a two-dimensional matrix along the object description direction.
  27. The apparatus according to claim 21, wherein the first parameter determining subunit is further configured to, when the determined Wid of the frames tends to a stable value, directly use the stable value as the Wid of subsequent frames.
  28. The apparatus according to claim 15, further comprising:
    an identity identifying unit, configured to, if more than one object is present in the image, determine the two object description parameters with the smallest Euclidean distance between the object description parameters of the previous frame and those of the current frame as corresponding to the same object.
PCT/CN2016/099291 2015-09-29 2016-09-19 Method and apparatus for positioning key point of image WO2017054652A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510631053.8 2015-09-29
CN201510631053.8A CN106558042B (en) 2015-09-29 2015-09-29 Method and device for positioning key points of image

Publications (1)

Publication Number Publication Date
WO2017054652A1 true WO2017054652A1 (en) 2017-04-06

Family

ID=58415925

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/099291 WO2017054652A1 (en) 2015-09-29 2016-09-19 Method and apparatus for positioning key point of image

Country Status (2)

Country Link
CN (1) CN106558042B (en)
WO (1) WO2017054652A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108830900B (en) * 2018-06-15 2021-03-12 北京字节跳动网络技术有限公司 Method and device for processing jitter of key point
CN110148158A (en) * 2019-05-13 2019-08-20 北京百度网讯科技有限公司 For handling the method, apparatus, equipment and storage medium of video
CN112950672B (en) * 2021-03-03 2023-09-19 百度在线网络技术(北京)有限公司 Method and device for determining positions of key points and electronic equipment

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080037836A1 (en) * 2006-08-09 2008-02-14 Arcsoft, Inc. Method for driving virtual facial expressions by automatically detecting facial expressions of a face image
CN101499128A (en) * 2008-01-30 2009-08-05 中国科学院自动化研究所 Three-dimensional human face action detecting and tracing method based on video stream
CN102831382A (en) * 2011-06-15 2012-12-19 北京三星通信技术研究有限公司 Face tracking apparatus and method
CN103605965A (en) * 2013-11-25 2014-02-26 苏州大学 Multi-pose face recognition method and device

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101271520A (en) * 2008-04-01 2008-09-24 北京中星微电子有限公司 Method and device for confirming characteristic point position in image
CN104217417B (en) * 2013-05-31 2017-07-07 张伟伟 A kind of method and device of video multi-target tracking

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109558837A (en) * 2018-11-28 2019-04-02 北京达佳互联信息技术有限公司 Face critical point detection method, apparatus and storage medium
CN109558837B (en) * 2018-11-28 2024-03-22 北京达佳互联信息技术有限公司 Face key point detection method, device and storage medium
CN112101109A (en) * 2020-08-11 2020-12-18 深圳数联天下智能科技有限公司 Face key point detection model training method and device, electronic equipment and medium

Also Published As

Publication number Publication date
CN106558042A (en) 2017-04-05
CN106558042B (en) 2020-03-31


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 16850276; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 16850276; Country of ref document: EP; Kind code of ref document: A1)