WO2017054652A1 - Method and apparatus for positioning key point of image - Google Patents

Method and apparatus for positioning key point of image

Info

Publication number
WO2017054652A1
Authority
WO
WIPO (PCT)
Prior art keywords
positioning
exp
current
current frame
model
Application number
PCT/CN2016/099291
Other languages
French (fr)
Chinese (zh)
Inventor
陈岩 (Chen Yan)
黄英 (Huang Ying)
邹建法 (Zou Jianfa)
Original Assignee
阿里巴巴集团控股有限公司 (Alibaba Group Holding Limited)
陈岩 (Chen Yan)
黄英 (Huang Ying)
邹建法 (Zou Jianfa)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by 阿里巴巴集团控股有限公司 (Alibaba Group Holding Limited), 陈岩 (Chen Yan), 黄英 (Huang Ying), 邹建法 (Zou Jianfa)
Publication of WO2017054652A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10016 Video; Image sequence

Definitions

  • the present invention relates to the field of computer application technologies, and in particular, to a method and apparatus for performing key point positioning on an image.
  • the present invention provides a method and apparatus for keypoint positioning of an image to facilitate keypoint positioning of a video image.
  • the invention provides a method for performing key point positioning on an image, the method comprising:
  • the W id is an object description parameter in an image
  • the W exp is an expression description parameter in the image.
  • before the W id and W exp of the previous frame are used as initial parameters, the method further includes: determining whether the current frame is the first frame of the video; if not, using the W id and W exp of the previous frame as initial parameters; if yes, using the preset initial W id and initial W exp as initial parameters.
  • the first positioning model is a supervised descent method SDM model
  • the second positioning model is a 3-mode singular value decomposition 3-mode SVD model.
  • the S1 includes:
  • the degree ⁇ X of the shape of the positioning point of the current frame deviating from the average positioning point shape is obtained:
  • the extracted 2m-dimensional gradient feature, R is a parameter vector of the first positioning model.
  • the method further includes: pre-training the first positioning model, specifically:
  • the current value of R is the value of the parameter vector R of the first positioning model obtained by the training;
  • determining, by using the second positioning model, the W id of the current frame includes:
  • the average value of the positions of the m positioning points is used as a current iteration position.
  • the S22 includes: obtaining a new iteration position S by using S = S̄ + Ψ·W_id, where Ψ = C_sdm_exp_id ×_1 W_exp^T; ×_1 indicates that the cubic matrix before ×_1 is unfolded into a two-dimensional matrix along the expression description direction and multiplied by the two-dimensional matrix after ×_1, the result being refolded into a cubic matrix along the expression description direction;
  • the S24 includes: determining ΔW_id by ΔW_id = (Ψ^T·Ψ)^-1·Ψ^T·ΔS, and determining the sum of ΔW_id and the current W_id as the new W_id.
  • determining, by using the second positioning model, the W exp of the current frame comprises:
  • the S32 includes: obtaining a new iteration position S by using S = S̄ + Ω·W_exp, where Ω = C_sdm_exp_id ×_2 W_id^T; ×_2 indicates that the cubic matrix before ×_2 is unfolded into a two-dimensional matrix along the object description direction and multiplied by the two-dimensional matrix after ×_2, the result being refolded into a cubic matrix along the object description direction;
  • the S34 includes: determining ΔW_exp by ΔW_exp = (Ω^T·Ω)^-1·Ω^T·ΔS, and determining the sum of ΔW_exp and the current W_exp as the new W_exp.
  • the positioning of the current frame based on the W_id and W_exp of the current frame includes: obtaining a vector f containing the positions of the n positioning points of the current frame by using f = f̄ + C_exp_id ×_1 W_exp^T ×_2 W_id^T, where f̄ is a vector formed by the position averages of the n positioning points in the images of the training set used by the second positioning model and C_exp_id is the parameter vector of the second positioning model; ×_1 indicates that the cubic matrix before ×_1 is unfolded into a two-dimensional matrix along the expression description direction and multiplied by the two-dimensional matrix after ×_1, the obtained two-dimensional matrix then being refolded into a cubic matrix along the expression description direction; ×_2 indicates that the cubic matrix before ×_2 is unfolded into a two-dimensional matrix along the object description direction and multiplied by the two-dimensional matrix after ×_2, the obtained two-dimensional matrix then being refolded into a cubic matrix along the object description direction.
  • the method further includes: pre-training the second positioning model, specifically:
  • C_exp_id = D ×_1 U_exp^T ×_2 U_id^T is used to obtain the parameter vector C_exp_id of the second positioning model, where U_exp is the unitary matrix of D unfolded into a two-dimensional matrix along the expression description direction, and U_id is the unitary matrix of D unfolded into a two-dimensional matrix along the object description direction.
  • the method further includes:
  • when the determined W id of the frames tends to a stable value, the stable value is directly adopted for the W id of subsequent frames.
  • if more than one object exists in the image, the two object description parameters with the smallest Euclidean distance between the object description parameter of the previous frame and that of the current frame are determined to correspond to the same object.
  • the present invention also provides an apparatus for performing key point positioning on an image, the apparatus comprising:
  • a first positioning unit configured to locate the current frame by using the first positioning model, to obtain positions of the m positioning points
  • a parameter determining unit configured to use W id and W exp of the previous frame as initial parameters
  • a second positioning unit configured to determine the W id and W exp of the current frame by using the second positioning model based on the m positioning points and the initial parameters, and to locate the current frame based on the W id and W exp of the current frame to obtain the positions of n positioning points;
  • the W id is an object description parameter in an image
  • the W exp is an expression description parameter in the image.
  • the parameter determining unit is further configured to determine whether the current frame is the first frame of the video; if not, the W id and W exp of the previous frame are used as initial parameters; if yes, the preset initial W id and initial W exp are used as initial parameters.
  • the first positioning model is an SDM model
  • the second positioning model is a 3-mode SVD model
  • the first positioning unit comprises:
  • the first deviation determining subunit is configured to extract an image gradient feature of the current frame, and use the image gradient feature and the first positioning model to obtain a degree ⁇ X of the shape of the positioning point of the current frame deviating from the average positioning point shape;
  • the first positioning sub-unit is configured to obtain the positions of the m positioning points of the current frame by using the ⁇ X of the current frame and the positions of the predetermined m average positioning points.
  • where Φ of the current frame is Φ = p × Dim, p is the number of images in the training set used by the first positioning model, Dim is the 2m-dimensional gradient feature extracted from regions of a set range centered on the positions of the m average positioning points, and R is a parameter vector of the first positioning model.
  • the device further comprises:
  • a first model training unit configured to perform the following operations to train the first positioning model:
  • the current value of R is the value of the parameter vector R of the first positioning model obtained by the training;
  • the second positioning unit comprises:
  • the first parameter determining subunit is configured to determine the W id of the current frame by using the second positioning model, and specifically perform the following operations:
  • the average value of the positions of the m positioning points is used as a current iteration position.
  • S25: if the modulus of ΔS is less than or equal to a preset second modulus value, the new W_id is determined as the W_id of the current frame;
  • when executing the S22, the first parameter determining subunit specifically obtains a new iteration position S by using S = S̄ + Ψ·W_id, where Ψ = C_sdm_exp_id ×_1 W_exp^T and ×_1 denotes unfolding into a two-dimensional matrix along the expression description direction and refolding into a cubic matrix along that direction;
  • when executing the S24, it specifically determines ΔW_id by ΔW_id = (Ψ^T·Ψ)^-1·Ψ^T·ΔS, and determines the sum of ΔW_id and the current W_id as the new W_id.
  • the second positioning unit comprises:
  • the second parameter determining subunit is configured to determine W exp of the current frame by using the second positioning model, and specifically:
  • when executing the S32, the second parameter determining subunit specifically obtains a new iteration position S by using S = S̄ + Ω·W_exp, where Ω = C_sdm_exp_id ×_2 W_id^T and ×_2 denotes unfolding into a two-dimensional matrix along the object description direction and refolding into a cubic matrix along that direction;
  • the second positioning unit comprises:
  • the second positioning sub-unit is configured to locate the current frame based on the W id and W exp of the current frame, and specifically:
  • specifically, f = f̄ + C_exp_id ×_1 W_exp^T ×_2 W_id^T is used to obtain a vector f containing the positions of the n positioning points of the current frame; ×_1 indicates that the cubic matrix before ×_1 is unfolded into a two-dimensional matrix along the expression description direction and multiplied by the two-dimensional matrix after ×_1, the obtained two-dimensional matrix then being refolded into a cubic matrix along the expression description direction; ×_2 indicates that the cubic matrix before ×_2 is unfolded into a two-dimensional matrix along the object description direction and multiplied by the two-dimensional matrix after ×_2, the obtained two-dimensional matrix then being refolded into a cubic matrix along the object description direction.
  • the device further comprises:
  • a second model training unit configured to perform the following operations to train the second positioning model:
  • U exp is a unitary matrix in which D is expanded into a two-dimensional matrix in the direction of expression description
  • U id is a unitary matrix in which D is expanded into a two-dimensional matrix in the direction of object description.
  • the first parameter determining subunit is further configured to directly adopt the stable value for the W id of the subsequent frame when the determined W id of each frame tends to be a stable value.
  • the device further comprises:
  • the identity identifying unit is configured to determine, if there is more than one object in the image, two object description parameters that minimize the Euclidean distance between the object description parameter of the previous frame and the object description parameter of the current frame to correspond to the same object.
  • on the basis of rough positioning of the current frame, the present invention accurately locates the current frame by combining the object description parameter and the expression description parameter of the previous frame, taking into account the continuity and relevance of preceding and succeeding frames in a video, thus achieving key point positioning of video images.
  • FIG. 1 is a flowchart of a main method according to an embodiment of the present invention
  • FIG. 2 is a flowchart of a method for training an SDM model according to an embodiment of the present invention
  • FIG. 3a is a schematic diagram of an example of a training set according to an embodiment of the present invention.
  • Figure 3b is a partial enlarged view of an image of Figure 3a;
  • FIG. 4 is a flowchart of a method for performing positioning by using an SDM model according to an embodiment of the present invention
  • FIG. 5 is a flowchart of a method for training a 3-mode SVD model according to an embodiment of the present invention
  • FIG. 6 is a diagram showing spatial representation of a stereo training data tensor according to an embodiment of the present invention.
  • FIG. 7 is a flowchart of a method for performing positioning by using a 3-mode SVD model according to an embodiment of the present invention.
  • FIG. 8 is a schematic diagram of two successive frames containing two objects according to an embodiment of the present invention;
  • FIG. 9 is a structural diagram of a device according to an embodiment of the present invention.
  • FIG. 1 is a flowchart of a main method according to an embodiment of the present invention. As shown in FIG. 1 , when performing key point positioning on each frame image of a video, the following processing is performed on each frame in sequence:
  • in 101, the first positioning model is used to locate the current frame, and the positions of the m positioning points are obtained.
  • before each frame is positioned there may be a process of object detection, which mainly detects the approximate area and number of objects.
  • for example, face detection can be performed on the current frame to determine the area and number of faces. The positioning performed by the present application mainly locates within the area of a face; if there is more than one face, the area of each face is positioned separately.
  • the manner of face detection can be any in the prior art, and will not be described in detail herein.
  • the method provided by the embodiments of the present invention can be applied to key point positioning in video. Because different people and different expressions yield different key point positions, both the construction and the use of the positioning models are carried out in three dimensions: object (i.e., person) description, expression description, and position description. Each frame has an object description parameter W id and an expression description parameter W exp; as the names suggest, W id describes the object (i.e., person) identity in the image, and W exp describes the expression in the image.
  • the present invention is not limited to key point positioning of human faces; key point positioning may also be performed on animals such as cats or dogs.
  • in the subsequent embodiments, key point positioning of a human face is taken as an example for description.
  • the invention actually uses the first positioning model to roughly locate the current frame, and then uses the second positioning model to accurately locate the current frame based on the positioning result of the first positioning model and the W id and W exp of the previous frame.
  • since the first frame of the video has no previous frame to refer to, initial values may be adopted for it, that is, the preset initial W id and initial W exp are used as initial parameters, as will be specifically described in the following embodiments.
  • the first positioning model for coarse positioning may adopt the SDM (Supervised Descent Method) model. The principle of positioning with the SDM model is: starting from the average shape of the face (i.e., the average positions of the key points), the positioning of the key points of the face is completed by means of the mapping between the average positions of the key points and the image texture features centered on each average key point position, obtaining m positioning points. In the embodiments of the present invention, "positioning point" is used to denote a key point obtained by positioning.
  • the advantage of the SDM model is that its requirement on the starting position is not high: starting from the average shape, a fairly good positioning effect can already be achieved. Its disadvantage is that the algorithm has high complexity. Therefore, in the embodiments of the present invention it is used only for coarse positioning, and a smaller number of positioning points is obtained.
  • the second positioning model for precise positioning in the embodiment of the present invention can adopt the 3-mode SVD (3-mode Singular Value Decomposition) model.
  • the principle is as follows: starting from the average shape of the facial organs, precise positioning of the organ contour points is completed according to the mapping relationship between the average positions of the key points and the existing positioning points.
  • the advantage is that the algorithm has low complexity, high precision and good real-time performance.
  • the disadvantage is that it requires some key points to have been located in advance.
  • the present invention can complete real-time positioning and tracking of each frame image in the video.
  • the method for performing key point positioning in video by combining the SDM model and the 3-mode SVD model is described in detail below with reference to specific embodiments.
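For orientation only (this sketch is not part of the patent text), the per-frame coarse-to-fine flow can be written in Python; sdm_locate and svd_refine are hypothetical callables standing in for the SDM and 3-mode SVD procedures detailed below.

```python
def track_video(frames, sdm_locate, svd_refine, init_w_id, init_w_exp):
    """Sketch of the per-frame pipeline: coarse SDM positioning, then
    3-mode SVD refinement seeded with the previous frame's parameters."""
    w_id, w_exp = init_w_id, init_w_exp  # preset initial parameters (first frame)
    results = []
    for frame in frames:
        coarse_points = sdm_locate(frame)  # m coarse positioning points
        # Seeding with the previous frame's W_id / W_exp couples consecutive
        # frames and suppresses jitter of the positioning points.
        points, w_id, w_exp = svd_refine(frame, coarse_points, w_id, w_exp)
        results.append(points)  # n refined positioning points
    return results
```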
  • the SDM model is taken as an example to describe the step 101 shown in FIG. 1.
  • the training process of the SDM model is first described.
  • the training process is as shown in FIG. 2 and may include the following steps:
  • in 201, the key point positions X_real of the p images in the training set are determined.
  • images of different expressions of different people may be collected to construct the training set, as shown in FIG. 3a, with key points marked on the images in advance. Assume the number of images in the training set is p and there are m key point positions on each image (taking one of the images in FIG. 3a as an example, a partial enlargement is shown in FIG. 3b; in FIG. 3b there are m key points on organs such as the eyes, nose, eyebrows, mouth, ears and chin).
  • X_real can be expressed as [x_1, x_2, ..., x_p]^T, that is, the coordinates of the key points of each image in the training set; X_real is recorded.
  • in 202, the average position of each key point over the p images is taken as the current iteration position X.
  • X can thus be expressed as the vector of per-key-point coordinate averages over the p images.
  • ⁇ X X real -X.
  • ⁇ X reflects the deviation between the true position of the key point and the iteration position.
  • in 204, R = (Φ^T·Φ)^-1·Φ^T·ΔX is used to obtain the current value of the parameter vector R, where Φ = p × Dim and Dim is the 2m-dimensional gradient feature extracted from regions of a set range centered on the average positions of the m key points in each image.
  • that is, there are m key points and hence m average positions; with each average position as the center, 64-dimensional gradient features of the image are extracted from an 8×8 region, so Dim is 2112 (m × 64) dimensions.
  • This step actually sets a convergence condition, that is, if the modulus of ⁇ X is less than or equal to the first modulus, it indicates that the current deviation condition ⁇ X is within an acceptable range, and the iteration can be stopped.
  • the first modulus value can take an empirical value, for example 1; in general the convergence condition can be reached within 4 iterations.
  • otherwise, the current iteration position X is updated with the value obtained by Φ·R + X, and the process returns to 203.
  • the current value of R is the parameter value obtained in the final training, and the training process is ended.
  • the training process of the SDM model is actually the process of determining the R, and R represents the mapping relationship between the degree of the positioning point shape deviating from the average positioning point shape and the image gradient feature.
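A minimal NumPy sketch of this training loop (steps 201 to 207) follows; it is illustrative only, and extract_features is a hypothetical callable that returns the p × Dim gradient-feature matrix Φ for the current iterated positions.

```python
import numpy as np

def train_sdm(x_real, extract_features, first_modulus=1.0, max_iters=10):
    """Train the SDM parameter vector R (steps 201-207, as described above).

    x_real:           (p, 2m) ground-truth key point coordinates (step 201)
    extract_features: callable mapping current positions (p, 2m) to the
                      (p, Dim) gradient-feature matrix Phi (step 204)
    """
    p = x_real.shape[0]
    x = np.tile(x_real.mean(axis=0), (p, 1))     # step 202: average positions
    for _ in range(max_iters):
        dx = x_real - x                          # step 203: deviation from truth
        phi = extract_features(x)                # gradient features around x
        # step 204: R = (Phi^T Phi)^-1 Phi^T dX, solved as a least-squares fit
        r = np.linalg.lstsq(phi, dx, rcond=None)[0]
        if np.linalg.norm(dx) <= first_modulus:  # step 205: convergence check
            break
        x = phi @ r + x                          # step 206: update iteration position
    return r                                     # trained parameter vector (step 207)
```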
  • the positioning process of step 101 shown in FIG. 1 by using the SDM model may be as shown in FIG. 4, and includes the following steps:
  • the image gradient feature of the current frame is extracted, and the image gradient feature and the R of the trained SDM model are used to obtain a degree ⁇ X of the shape of the anchor point of the current frame deviating from the average anchor point shape.
  • the image gradient feature of the current frame can be reflected by the gradient feature of the average anchor point on the current frame.
  • the so-called average positioning point refers to the average position of m key points in the training set on each image. Since there are p images in the training set, there are m key points on each image, and the m key points are respectively in p images. The average of the position coordinates on the top is the coordinate value of the average anchor point.
  • the positions of the m positioning points of the current frame are the sum of the current frame's ΔX and the positions of the predetermined m average positioning points.
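Applying the trained model to one frame then reduces to ΔX = Φ·R plus the mean shape; a sketch under the same assumptions (the argument names are placeholders, not identifiers from the patent):

```python
def sdm_locate(frame_features, r, mean_positions):
    """Coarse positioning of one frame with a trained SDM model (FIG. 4).

    frame_features: (Dim,) gradient features extracted around the m average
                    positioning points placed on the current frame
    r:              (Dim, 2m) trained SDM parameter vector
    mean_positions: (2m,) predetermined positions of the m average points
    """
    dx = frame_features @ r     # degree of deviation from the mean shape
    return mean_positions + dx  # positions of the m positioning points
```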
  • the training process of the 3-modeSVD model is first described. As shown in FIG. 5, the training process may include the following steps:
  • in 501, images of different expressions of different objects are collected, and a stereo training data tensor is constructed according to object description, expression description and position description.
  • the object description is represented by id
  • the expression description is represented by exp
  • the position description is represented by the positioning point vertices
  • the spatial representation of the constructed stereo training data tensor can be as shown in FIG. 6.
  • the position average of n key points in each image is subtracted from the positions of n key points in the stereo training data tensor to obtain a stereo data tensor D.
  • the key point coordinate expansion on each image can be expressed as [x 1 , y 1 , x 2 , y 2 .. .x n , y n ], where the subscript is the identifier of each key point.
  • the position averages of these n key points can be expressed as [x̄_1, ȳ_1, ..., x̄_n, ȳ_n]; the positions of the n key points in the stereo training data tensor are then reduced by these averages respectively, and the obtained stereo data tensor D is a 2n × 39 × 400 cubic matrix.
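Steps 501 and 502 amount to stacking the landmark vector of every (expression, object) pair into a third-order tensor and centering it; a small NumPy sketch, with the 39 × 400 sizes taken from the example above:

```python
import numpy as np

def build_stereo_tensor(landmarks):
    """Steps 501-502: construct and center the stereo training data tensor.

    landmarks: (2n, 39, 400) array holding [x1, y1, ..., xn, yn] for each of
               39 expressions x 400 objects (sizes follow the text's example).
    """
    mean = landmarks.mean(axis=(1, 2), keepdims=True)  # per-coordinate average
    d = landmarks - mean                               # subtract position averages
    return d, mean                                     # centered tensor D
```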
  • in 503, C_exp_id = D ×_1 U_exp^T ×_2 U_id^T is used to obtain the parameter vector C_exp_id; the formula in this step is based on the principle of the 3-mode SVD model.
  • ×_1 indicates that the cubic matrix before ×_1 is unfolded into a two-dimensional matrix along the exp direction and multiplied by the two-dimensional matrix after ×_1, the result being refolded into a cubic matrix along the exp direction; ×_2 indicates that the cubic matrix before ×_2 is unfolded into a two-dimensional matrix along the id direction and multiplied by the two-dimensional matrix after ×_2, the result being refolded into a cubic matrix along the id direction.
  • that is, the cubic matrix D is first unfolded into a two-dimensional matrix along the exp direction and multiplied by U_exp^T, the result being refolded into a cubic matrix along the exp direction; that cubic matrix is then unfolded into a two-dimensional matrix along the id direction and multiplied by U_id^T, and the resulting two-dimensional matrix is refolded into a cubic matrix along the id direction, giving C_exp_id.
  • U exp is a unitary matrix in which D is expanded into a two-dimensional matrix in the exp direction.
  • D is unfolded into a 39 × 800n two-dimensional matrix along the exp direction and then decomposed by SVD; the obtained unitary matrix is 39 × 39, and this 39 × 39 matrix can be taken directly as U_exp.
  • alternatively, the first 10 columns of the unitary matrix can be taken as U_exp according to the singular values, so that the size of U_exp is 39 × 10.
  • U id is a unitary matrix in which D is expanded into a two-dimensional matrix in the id direction.
  • D is unfolded into a 400 × 78n two-dimensional matrix along the id direction and then decomposed by SVD; the obtained unitary matrix is 400 × 400, and this 400 × 400 matrix can be taken directly as U_id.
  • alternatively, the first 20 columns of the matrix can be taken as U_id according to the singular values, so that the size of U_id is 400 × 20.
  • the obtained C exp_id size is 2n ⁇ 10 ⁇ 20.
  • the subsequent process of positioning using the 3-modeSVD model is based on this formula.
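The step-503 computation is a truncated two-mode, Tucker-style decomposition. The sketch below implements the unfold/multiply/refold operations as the ×_1 and ×_2 operators are described, with mode 1 as the exp direction and mode 2 as the id direction; it is an illustration, not the patent's own code.

```python
import numpy as np

def unfold(t, mode):
    """Unfold a 3-way tensor into a two-dimensional matrix along `mode`."""
    return np.moveaxis(t, mode, 0).reshape(t.shape[mode], -1)

def fold(mat, mode, shape):
    """Refold a two-dimensional matrix into a tensor of the target shape."""
    order = [shape[mode]] + [s for i, s in enumerate(shape) if i != mode]
    return np.moveaxis(mat.reshape(order), 0, mode)

def mode_multiply(t, m, mode):
    """t x_mode m: unfold along `mode`, left-multiply by m, refold."""
    shape = list(t.shape)
    shape[mode] = m.shape[0]
    return fold(m @ unfold(t, mode), mode, shape)

def train_3mode_svd(d, k_exp=10, k_id=20):
    """Step 503: C_exp_id = D x_1 U_exp^T x_2 U_id^T with truncated factors.

    d: centered tensor of shape (2n, 39, 400); mode 1 is the expression
       direction and mode 2 the object (identity) direction.
    """
    u_exp = np.linalg.svd(unfold(d, 1), full_matrices=False)[0][:, :k_exp]  # 39 x 10
    u_id = np.linalg.svd(unfold(d, 2), full_matrices=False)[0][:, :k_id]    # 400 x 20
    core = mode_multiply(mode_multiply(d, u_exp.T, 1), u_id.T, 2)  # 2n x 10 x 20
    return core, u_exp, u_id
```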
  • the process of positioning using the 3-mode SVD model will be described in detail below with reference to FIG. 7.
  • for each frame the positioning method is the same; therefore, only the positioning of one frame (described as the current frame) is described in FIG. 7, that is, the implementation process of step 102 in FIG. 1. As shown in FIG. 7, the process can include the following steps:
  • the average value of the positions of the m positioning points determined by the SDM model is taken as the current iteration position.
  • the position average of m positioning points obtained by rough positioning using the SDM model is initially used as the current iteration position.
  • in 702, a new iteration position S is determined by using the current iteration position S̄, the current W_id and W_exp, and the parameter vector C_sdm_exp_id of the second positioning model corresponding to the m positioning points.
  • for the first frame, the current W_id and W_exp both use preset initial values, i.e., an initial vector as W_id and an initial vector as W_exp, the moduli of which are both 1; otherwise, the W_id and W_exp of the previous frame are used as the current W_id and W_exp.
  • as for the parameter vector C_sdm_exp_id of the second positioning model corresponding to the m positioning points: suppose the SDM model locates m positioning points and the number of positioning points of the 3-mode SVD model is n; the m positioning points are included among the n points described by the parameter vector C_exp_id, and the part of C_exp_id corresponding to the m positioning points is determined as C_sdm_exp_id.
  • the deviation ⁇ S of the position of the m positioning points determined by the SDM model from the new iteration position S is determined.
  • the new W id is determined using the current W id and ⁇ S .
  • ⁇ W id ( ⁇ T ⁇ ⁇ ) -1 ⁇ ⁇ T ⁇ ⁇ S, it is determined that ⁇ W id; then ⁇ W id of the current W id and determined as a new W id.
  • in 705, a convergence condition is actually set by using ΔS: if the modulus of ΔS satisfies the convergence condition, the positions of the m positioning points determined by the SDM model deviate little from the new iteration position S, and the currently iterated W_id (i.e., the new W_id) can be taken as the W_id of the current frame; if the convergence condition is not met, the iteration continues.
  • the second modulus value may be an empirical value, for example, one.
  • otherwise, the current W_id is updated with the new W_id, the current iteration position S̄ is updated with the new iteration position S, and the process returns to 702.
  • the new W id is determined as W id of the current frame.
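The W_id loop (701 to 707) is a least-squares fixed-point iteration. Since the equation images are not reproduced in this text, the linear form S = S̄ + Ψ·W_id with Ψ = C_sdm_exp_id ×_1 W_exp^T used below is this sketch's reading of the description, not a verbatim formula; iterating W_exp (the following steps) is symmetrical, with Ω = C_sdm_exp_id ×_2 W_id^T.

```python
import numpy as np

def fit_w_id(s_sdm, c_sdm, w_id, w_exp, second_modulus=1.0, max_iters=20):
    """Iterate W_id with W_exp held fixed (steps 701-707, as read here).

    s_sdm: (2m,) positions of the m points from the coarse SDM stage
    c_sdm: (2m, 10, 20) rows of C_exp_id belonging to the m coarse points
    """
    # 701: one literal reading of "the average value of the positions"
    s_bar = np.full_like(s_sdm, s_sdm.mean())
    for _ in range(max_iters):
        psi = np.einsum('pei,e->pi', c_sdm, w_exp)  # Psi = C_sdm_exp_id x_1 W_exp^T
        s = s_bar + psi @ w_id                      # 702: new iteration position
        ds = s_sdm - s                              # 703: deviation
        # 704: dW_id = (Psi^T Psi)^-1 Psi^T dS, solved as a least-squares fit
        w_id = w_id + np.linalg.lstsq(psi, ds, rcond=None)[0]
        if np.linalg.norm(ds) <= second_modulus:    # 705: convergence check
            break
        s_bar = s                                   # 706: update iteration position
    return w_id                                     # 707: W_id of the current frame
```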
  • the above actually fixes W_exp and iterates out W_id.
  • the following steps determine the W_exp of the current frame based on the W_id of the current frame, that is, W_id is fixed and W_exp is iterated out; the principle is the same.
  • the average value of the positions of the m positioning points that are located by using the SDM model is taken as the current iteration position.
  • a deviation ⁇ S from the position of the m anchor points located using the SDM model and the new iteration position S is determined.
  • the new W exp is determined using the current W exp and ⁇ S .
  • the third modulus value here can be an empirical value, for example, one.
  • the above determines the W_exp of the current frame, and the W_id of the current frame has already been determined; the current frame can then be located in 715 by using the W_id and W_exp of the current frame.
  • using f = f̄ + C_exp_id ×_1 W_exp^T ×_2 W_id^T, a vector f containing the positions of the n anchor points of the current frame is obtained.
  • the positioning of the current frame ends.
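Once W_id and W_exp are fixed, step 715 is a single tensor contraction. A sketch, again with the formula f = f̄ + C_exp_id ×_1 W_exp^T ×_2 W_id^T reconstructed from the surrounding description:

```python
import numpy as np

def locate_frame(f_bar, c_exp_id, w_exp, w_id):
    """Step 715: f = f_bar + C_exp_id x_1 W_exp^T x_2 W_id^T.

    f_bar:    (2n,) position averages of the n points over the training images
    c_exp_id: (2n, 10, 20) parameter tensor of the 3-mode SVD model
    Returns the (2n,) vector f of the n positioning points of the current frame.
    """
    return f_bar + np.einsum('pei,e,i->p', c_exp_id, w_exp, w_id)
```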
  • the above process can be performed for each frame of the video to obtain the anchor points of each frame.
  • when video frames are positioned in the above manner, after a number of frames the W_id gradually approaches a stable value, so the W_id of subsequent frames need not be obtained in the manner shown in FIG. 7;
  • the stable value can be used directly, and only the W_exp of each frame is then calculated.
  • when judging whether the W_id has gradually reached a stable value, it can be determined whether the modulus of the difference between the W_id of the current frame and the W_id of the previous frame is less than a preset threshold, for example less than 1; if so, the W_id is judged to have stabilized.
  • the stable value may be the average of the W_id of the current frame and the W_id of previous frames; of course, other ways of judging stability or determining the stable value may be used, and they are not enumerated here.
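A small helper can capture this shortcut; the threshold of 1 and the averaging rule are the examples given in the text, and the function name is illustrative.

```python
import numpy as np

def stable_w_id(w_id_history, threshold=1.0):
    """Return a stable W_id once consecutive estimates stop moving, else None.

    Follows the text: if the modulus of W_id(current) - W_id(previous) falls
    below a preset threshold, W_id is judged stable, and the average of the
    recent estimates may serve as the stable value for subsequent frames.
    """
    if len(w_id_history) < 2:
        return None
    if np.linalg.norm(w_id_history[-1] - w_id_history[-2]) < threshold:
        return np.mean(w_id_history, axis=0)  # one possible stable value
    return None
```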
  • if the image contains more than one object, for example two people, each frame yields two positioning results (each containing n positioning points) and two W_id are output. Since the characters in a video are not necessarily static, their positions may change due to movement, so it is necessary to distinguish which object in the two frames is the same person. As shown in FIG. 8, the two images are the previous and current frames, and the relative positions of object A and object B in the previous frame have changed in the current frame; therefore they need to be distinguished and recognized.
  • the positioning results and object description parameters of the previous frame are denoted f_1_pre, W_1_pre, f_2_pre and W_2_pre, and the positioning results and object description parameters of the current frame are denoted f_1_cur, W_1_cur, f_2_cur and W_2_cur.
  • two object description parameters that have the smallest Euclidean distance between the object description parameter of the previous frame and the object description parameter of the current frame may be determined to correspond to the same object.
  • that is, the Euclidean distance between W_1_pre and W_1_cur, the Euclidean distance between W_1_pre and W_2_cur, the Euclidean distance between W_2_pre and W_1_cur and the Euclidean distance between W_2_pre and W_2_cur are calculated.
  • if the Euclidean distance between W_1_pre and W_2_cur is the smallest, W_1_pre and W_2_cur correspond to the same object;
  • then f_1_pre and f_2_cur also correspond to the same object, and the positioning results belonging to the same object can be identified as that object.
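For the two-object case of FIG. 8 the matching rule is nearest-neighbour assignment in the space of object description parameters; a sketch that generalizes to any number of objects, greedily pairing by smallest Euclidean distance:

```python
import numpy as np

def match_objects(w_prev, w_cur):
    """Pair previous-frame and current-frame W_id vectors by the smallest
    Euclidean distance, so positioning results can be assigned to the same
    object across frames.

    w_prev, w_cur: lists of W_id vectors, one per detected object.
    Returns pairs (i, j): object i of the previous frame matches object j
    of the current frame.
    """
    pairs, used = [], set()
    for i, wp in enumerate(w_prev):
        _, j = min((np.linalg.norm(wp - wc), j)
                   for j, wc in enumerate(w_cur) if j not in used)
        used.add(j)
        pairs.append((i, j))
    return pairs  # e.g. [(0, 1), (1, 0)] when the two objects swap positions
```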
  • n may take an integer on the order of one hundred.
  • the apparatus may include a first positioning unit 10, a parameter determining unit 20 and a second positioning unit 30, and may further include a first model training unit 40, a second model training unit 50 and an identity identifying unit 60.
  • the first positioning unit 10 may specifically include a first deviation determining subunit 11 and a first positioning subunit 12.
  • the second positioning unit 30 may specifically include a first parameter determining subunit 31, a second parameter determining subunit 32, and a second positioning subunit 33.
  • the main functions of each component are as follows:
  • the first positioning unit 10 is responsible for positioning the current frame by using the first positioning model to obtain the positions of m positioning points.
  • the parameter determination unit 20 is responsible for taking W id and W exp of the previous frame as initial parameters.
  • the second positioning unit 30 is responsible for determining the W id and W exp of the current frame by using the second positioning model based on the m positioning points and the initial parameters, and for positioning the current frame based on the W id and W exp of the current frame to obtain the positions of n positioning points.
  • W id is an object description parameter in the image
  • W exp is an expression description parameter in the image
  • the device actually uses the first positioning model to roughly locate the current frame, and then uses the second positioning model to accurately locate the current frame based on the positioning of the first positioning model and the W id and W exp of the previous frame.
  • the parameter determining unit 20 may first determine whether the current frame is the first frame of the video, and if not, use W id and W exp of the previous frame as initial parameters; if yes, since the first frame of the video does not exist in the previous frame Reference, so the preset initial W id and initial W exp can be used as initial parameters.
  • the first positioning model may be an SDM model
  • the second positioning model may be a 3-mode SVD model
  • the first model training unit 40 is responsible for training the first positioning model, that is, the SDM model, and specifically performs the following operations:
  • if the modulus of ΔX is less than or equal to the preset first modulus value, the current value of R is the value of the parameter vector R of the trained SDM model; otherwise, the current iteration position X is updated with the value obtained by Φ·R + X, and the process returns to A13.
  • the first modulus value may take an empirical value, for example, 1 is taken.
  • Dim is a 2m-dimensional gradient feature extracted by a set range region centered on the positions of m average anchor points.
  • the composition of the first positioning unit 10 will be described below.
  • the first deviation determining sub-unit 11 is responsible for extracting the image gradient feature of the current frame, and using the image gradient feature and the first positioning model, the degree ⁇ X of the shape of the positioning point of the current frame deviating from the average positioning point shape is obtained.
  • the first positioning sub-unit 12 is responsible for obtaining the positions of the m positioning points of the current frame by using the ⁇ X of the current frame and the positions of the predetermined m average positioning points.
  • the positions of the m positioning points of the current frame are the sum of the current frame's ΔX and the positions of the predetermined m average positioning points.
  • the 3-modeSVD model is used for further precise positioning.
  • the second model training unit 50 is first described.
  • the second model training unit 50 is responsible for training the second positioning model, the 3-mode SVD model. Specifically do the following:
  • ⁇ 1 indicates that the cubic matrix in front of ⁇ 1 is expanded into a two-dimensional matrix according to the expression description direction and multiplied by the two-dimensional matrix behind ⁇ 1 , and the obtained two-dimensional matrix is transformed into a square matrix according to the expression description direction;
  • ⁇ 2 indicates ⁇ 2 The cubic matrix in front is expanded according to the object description direction and multiplied by the two-dimensional matrix behind ⁇ 2 , and then the obtained two-dimensional matrix is transformed into a square matrix according to the object description direction.
  • the first parameter determining sub-unit 31 is responsible for determining the W id of the current frame by using the 3-mode SVD model, and specifically performs the following operations:
  • at the first execution, the current W_id and W_exp both use preset initial values, i.e., an initial vector as W_id and an initial vector as W_exp, the moduli of which are both 1; otherwise, the W_id and W_exp of the previous frame are used as the current W_id and W_exp.
  • specifically, the first parameter determining subunit 31 can use S = S̄ + Ψ·W_id, with Ψ = C_sdm_exp_id ×_1 W_exp^T, to obtain the new iteration position S, where ×_1 denotes unfolding into a two-dimensional matrix along the expression description direction and refolding into a cubic matrix along the expression description direction.
  • if the modulus of ΔS is less than or equal to the preset second modulus value, the new W_id is determined as the W_id of the current frame; otherwise, the current W_id is updated with the new W_id, the current iteration position S̄ is updated with the new iteration position S, and the process returns to S22.
  • the second modulus value here can be an empirical value, for example, one.
  • the second parameter determining sub-unit 32 is responsible for determining the W exp of the current frame by using the 3-mode SVD model, and specifically:
  • the new iteration position S is determined by the W id of the current frame and the current W exp and the parameter vector C sdm_exp_id of the second positioning model corresponding to the m positioning points.
  • specifically, the second parameter determining subunit 32 can use S = S̄ + Ω·W_exp, with Ω = C_sdm_exp_id ×_2 W_id^T, to obtain the new iteration position S, where ×_2 denotes unfolding into a two-dimensional matrix along the object description direction and refolding into a cubic matrix along the object description direction.
  • if the modulus of ΔS is less than or equal to the preset third modulus value, the new W_exp is determined as the W_exp of the current frame; otherwise, the current W_exp is updated with the new W_exp, the current iteration position S̄ is updated with the new iteration position S, and the process returns to S32.
  • the third modulus value here can be an empirical value, for example, one.
  • the second locating sub-unit 33 is responsible for locating the current frame based on the W id and W exp of the current frame, and specifically:
  • specifically, f = f̄ + C_exp_id ×_1 W_exp^T ×_2 W_id^T can be used, where f̄ is a vector composed of the position averages of the n anchor points in the images of the training set used by the 3-mode SVD model, and C_exp_id is the parameter vector of the 3-mode SVD model.
  • when video frames are positioned using the above-described method, after a number of frames the W_id value becomes stable; the W_id of subsequent frames then need not be obtained iteratively as described above, and the stable value can be used directly.
  • the stable value is then used when calculating the W_exp of each frame.
  • specifically, it can be determined whether the modulus of the difference between the W_id of the current frame and the W_id of the previous frame is less than a preset threshold, for example less than 1; if so, the W_id is judged to have stabilized.
  • the stable value may be the average of the W_id of the current frame and the W_id of previous frames; of course, other ways of judging stability or determining the stable value may be used, and they are not enumerated here.
  • the identity identifying unit 60 is responsible for determining the two object description parameters with the smallest Euclidean distance between an object description parameter of the previous frame and an object description parameter of the current frame as corresponding to the same object, and for identifying the positioning results of the same object, thereby distinguishing which object each positioning result in the image belongs to.
  • the method and apparatus provided by the present invention take into account the constraint that the preceding and succeeding frames in a video concern the same object, reduce the jitter of positioning points between frames, and make the visual effect more natural and smooth.
  • the disclosed apparatus and method may be implemented in other manners.
  • the device embodiments described above are merely illustrative.
  • the division of the unit is only a logical function division, and the actual implementation may have another division manner.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
  • each functional unit in each embodiment of the present invention may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.
  • the above integrated unit can be implemented in the form of hardware or in the form of hardware plus software functional units.
  • the above-described integrated unit implemented in the form of a software functional unit can be stored in a computer readable storage medium.
  • the above software functional unit is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, a network device or the like) or a processor to perform part of the steps of the methods of the various embodiments of the present invention.
  • the foregoing storage medium includes various media that can store program code, such as a USB flash disk, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disk.

Abstract

A method and apparatus for positioning a key point of an image. The method comprises: S1, using a first positioning model to position a current frame and obtain the positions of m positioning points (101); and S2, using the W_id and W_exp of a previous frame as initial parameters, using a second positioning model to determine the W_id and W_exp of the current frame based on the m positioning points, positioning the current frame based on the W_id and W_exp of the current frame, and obtaining the positions of n positioning points, wherein m is less than n, W_id is an object description parameter in an image and W_exp is an expression description parameter in the image (102). The method and apparatus take into account the constraint that a previous frame and a subsequent frame in a video concern the same object, reduce the jitter of positioning points between frames, and make the visual effect more natural and smooth.

Description

Method and device for key point positioning of an image

[Technical Field]

The present invention relates to the field of computer application technologies, and in particular to a method and apparatus for performing key point positioning on an image.

[Background]

With the increasing popularity of smart terminals, the demand for image processing on smart terminals keeps growing, and various beauty-enhancement apps are widely favored by beauty lovers. However, existing apps of this kind perform beautification on static images, mainly by locating the key points of organs in a static image. At present there is no way to perform key point positioning on video images.
[Summary of the Invention]

In view of this, the present invention provides a method and apparatus for key point positioning of an image, so as to facilitate key point positioning of video images.
The specific technical solutions are as follows.

The present invention provides a method for performing key point positioning on an image, the method comprising:

S1: locating the current frame by using a first positioning model to obtain the positions of m positioning points;

S2: taking the W_id and W_exp of the previous frame as initial parameters, determining the W_id and W_exp of the current frame by using a second positioning model based on the m positioning points, and locating the current frame based on the W_id and W_exp of the current frame to obtain the positions of n positioning points;

where m is smaller than n, W_id is an object description parameter in an image, and W_exp is an expression description parameter in the image.
According to a preferred embodiment of the present invention, before the W_id and W_exp of the previous frame are taken as initial parameters, the method further includes: determining whether the current frame is the first frame of the video; if not, continuing with the step of taking the W_id and W_exp of the previous frame as initial parameters; if yes, taking the preset initial W_id and initial W_exp as initial parameters.
According to a preferred embodiment of the present invention, the first positioning model is a supervised descent method (SDM) model, and the second positioning model is a 3-mode singular value decomposition (3-mode SVD) model.
According to a preferred embodiment of the present invention, S1 includes:

S11: extracting image gradient features of the current frame, and obtaining, by using the image gradient features and the first positioning model, the degree ΔX by which the positioning point shape of the current frame deviates from the average positioning point shape;

S12: obtaining the positions of the m positioning points of the current frame by using the ΔX of the current frame and the predetermined positions of the m average positioning points.
According to a preferred embodiment of the present invention, obtaining the degree ΔX by which the positioning point shape of the current frame deviates from the average positioning point shape includes: determining ΔX by ΔX = Φ·R, where Φ of the current frame is Φ = p × Dim, p is the number of images in the training set used by the first positioning model, Dim is the 2m-dimensional gradient feature extracted from regions of a set range centered on the positions of the m average positioning points, and R is the parameter vector of the first positioning model.
According to a preferred embodiment of the present invention, the method further includes pre-training the first positioning model, specifically including:

A11: determining the key point positions X_real of the p images in the training set;

A12: taking the average position of each key point over the p images as the current iteration position X;

A13: determining the value of ΔX by ΔX = X_real - X;

A14: obtaining the current value of the parameter vector R of the first positioning model by R = (Φ^T·Φ)^-1·Φ^T·ΔX;

A15: if the modulus of ΔX is less than or equal to a preset first modulus value, taking the current value of R as the value of the parameter vector R of the trained first positioning model; otherwise, updating the current iteration position X with the value obtained by Φ·R + X and returning to A13.
According to a preferred embodiment of the present invention, determining the W_id of the current frame by using the second positioning model includes:

S21: taking the position average of the m positioning points as the current iteration position S̄;

S22: determining a new iteration position S by using the current iteration position S̄, the current W_id and W_exp, and the parameter vector C_sdm_exp_id of the second positioning model corresponding to the m positioning points;

S23: determining the deviation ΔS between the positions of the m positioning points and the new iteration position S;

S24: determining a new W_id by using the current W_id and ΔS;

S25: if the modulus of ΔS is less than or equal to a preset second modulus value, determining the new W_id as the W_id of the current frame; otherwise, updating the current W_id with the new W_id, updating the current iteration position S̄ with the new iteration position S, and returning to S22.
According to a preferred embodiment of the present invention, S22 includes: obtaining the new iteration position S by using S = S̄ + Ψ·W_id, where Ψ = C_sdm_exp_id ×_1 W_exp^T; ×_1 indicates that the cubic matrix before ×_1 is unfolded into a two-dimensional matrix along the expression description direction and multiplied by the two-dimensional matrix after ×_1, the result being refolded into a cubic matrix along the expression description direction.

S24 includes: determining ΔW_id by ΔW_id = (Ψ^T·Ψ)^-1·Ψ^T·ΔS, and determining the sum of ΔW_id and the current W_id as the new W_id.
According to a preferred embodiment of the present invention, determining the W_exp of the current frame by using the second positioning model includes:

S31: taking the position average of the m positioning points as the current iteration position S̄;

S32: determining a new iteration position S by using the current iteration position S̄, the W_id of the current frame, the current W_exp, and the parameter vector C_sdm_exp_id of the second positioning model corresponding to the m positioning points;

S33: determining the deviation ΔS between the positions of the m positioning points and the new iteration position S;

S34: determining a new W_exp by using the current W_exp and ΔS;

S35: if the modulus of ΔS is less than or equal to a preset third modulus value, determining the new W_exp as the W_exp of the current frame; otherwise, updating the current W_exp with the new W_exp, updating the current iteration position S̄ with the new iteration position S, and returning to S32.
According to a preferred embodiment of the present invention, S32 includes: obtaining the new iteration position S by using S = S̄ + Ω·W_exp, where Ω = C_sdm_exp_id ×_2 W_id^T; ×_2 indicates that the cubic matrix before ×_2 is unfolded into a two-dimensional matrix along the object description direction and multiplied by the two-dimensional matrix after ×_2, the result being refolded into a cubic matrix along the object description direction.

S34 includes: determining ΔW_exp by ΔW_exp = (Ω^T·Ω)^-1·Ω^T·ΔS, and determining the sum of ΔW_exp and the current W_exp as the new W_exp.
According to a preferred embodiment of the present invention, locating the current frame based on the W_id and W_exp of the current frame includes:

obtaining a vector f containing the positions of the n positioning points of the current frame by using f = f̄ + C_exp_id ×_1 W_exp^T ×_2 W_id^T;

where f̄ is a vector formed by the position averages of the n positioning points in the images of the training set used by the second positioning model, and C_exp_id is the parameter vector of the second positioning model; ×_1 indicates that the cubic matrix before ×_1 is unfolded into a two-dimensional matrix along the expression description direction and multiplied by the two-dimensional matrix after ×_1, the obtained two-dimensional matrix then being refolded into a cubic matrix along the expression description direction; ×_2 indicates that the cubic matrix before ×_2 is unfolded into a two-dimensional matrix along the object description direction and multiplied by the two-dimensional matrix after ×_2, the obtained two-dimensional matrix then being refolded into a cubic matrix along the object description direction.
According to a preferred embodiment of the present invention, the method further includes pre-training the second positioning model, specifically including:

B11: collecting images of different expressions of different objects, and constructing a stereo training data tensor according to object description, expression description and position description;

B12: subtracting, from the positions of the n key points in the stereo training data tensor, the position averages of the n key points over the images, to obtain a stereo data tensor D;

B13: obtaining the parameter vector C_exp_id of the second positioning model by using C_exp_id = D ×_1 U_exp^T ×_2 U_id^T;

where U_exp is the unitary matrix of D unfolded into a two-dimensional matrix along the expression description direction, and U_id is the unitary matrix of D unfolded into a two-dimensional matrix along the object description direction.
According to a preferred embodiment of the present invention, the method further includes: when the determined W_id of the frames tends to a stable value, directly adopting the stable value as the W_id of subsequent frames.

According to a preferred embodiment of the present invention, if more than one object exists in an image, the two object description parameters with the smallest Euclidean distance between an object description parameter of the previous frame and an object description parameter of the current frame are determined to correspond to the same object.
The present invention also provides an apparatus for performing key point positioning on an image, the apparatus comprising:

a first positioning unit, configured to locate the current frame by using a first positioning model to obtain the positions of m positioning points;

a parameter determining unit, configured to take the W_id and W_exp of the previous frame as initial parameters;

a second positioning unit, configured to determine the W_id and W_exp of the current frame by using a second positioning model based on the m positioning points and the initial parameters, and to locate the current frame based on the W_id and W_exp of the current frame to obtain the positions of n positioning points;

where m is smaller than n, W_id is an object description parameter in an image, and W_exp is an expression description parameter in the image.
According to a preferred embodiment of the present invention, the parameter determining unit is further configured to determine whether the current frame is the first frame of the video; if not, to take the W_id and W_exp of the previous frame as initial parameters; if yes, to take the preset initial W_id and initial W_exp as initial parameters.

According to a preferred embodiment of the present invention, the first positioning model is an SDM model, and the second positioning model is a 3-mode SVD model.
According to a preferred embodiment of the present invention, the first positioning unit includes:

a first deviation determining subunit, configured to extract image gradient features of the current frame, and to obtain, by using the image gradient features and the first positioning model, the degree ΔX by which the positioning point shape of the current frame deviates from the average positioning point shape;

a first positioning subunit, configured to obtain the positions of the m positioning points of the current frame by using the ΔX of the current frame and the predetermined positions of the m average positioning points.

According to a preferred embodiment of the present invention, the first deviation determining subunit specifically determines ΔX by ΔX = Φ·R, where Φ of the current frame is Φ = p × Dim, p is the number of images in the training set used by the first positioning model, Dim is the 2m-dimensional gradient feature extracted from regions of a set range centered on the positions of the m average positioning points, and R is the parameter vector of the first positioning model.
According to a preferred embodiment of the present invention, the apparatus further comprises:
a first model training unit, configured to train the first positioning model by performing the following operations:
A11. determining the key point positions X_real of the p images in the training set;
A12. taking the average position of each key point over the p images as the current iteration position X;
A13. determining the value of ΔX by ΔX = X_real - X;
A14. obtaining the current value of the parameter vector R of the first positioning model by R = (Φ^T·Φ)^(-1)·Φ^T·ΔX;
A15. if the modulus of ΔX is smaller than or equal to a preset first modulus value, taking the current value of R as the trained value of the parameter vector R of the first positioning model;
otherwise, updating the current iteration position X with the value of Φ·R + X and returning to A13.
According to a preferred embodiment of the present invention, the second positioning unit comprises:
a first parameter determining subunit, configured to determine the W_id of the current frame by using the second positioning model, specifically performing the following operations:
S21. taking the average position of the m positioning points as the current iteration position S̄;
S22. determining a new iteration position S by using the current iteration position S̄, the current W_id and W_exp, and the parameter vector C_sdm_exp_id of the second positioning model corresponding to the m positioning points;
S23. determining the deviation ΔS between the positions of the m positioning points and the new iteration position S;
S24. determining a new W_id by using the current W_id and ΔS;
S25. if the modulus of ΔS is smaller than or equal to a preset second modulus value, determining the new W_id as the W_id of the current frame;
otherwise, updating the current W_id with the new W_id, updating the current iteration position S̄ with the new iteration position S, and returning to S22.
According to a preferred embodiment of the present invention, when performing S22 the first parameter determining subunit obtains the new iteration position S as S = S̄ + C_sdm_exp_id ×1 W_exp^T ×2 W_id^T, where Ψ denotes the two-dimensional matrix obtained by unfolding C_sdm_exp_id ×1 W_exp^T along the expression description direction; here unfolding along the expression description direction expands a cubic matrix into a two-dimensional matrix, and the inverse operation synthesizes the two-dimensional matrix back into a cubic matrix along the expression description direction.
When performing S24, the subunit determines ΔW_id by ΔW_id = (Ψ^T·Ψ)^(-1)·Ψ^T·ΔS, and determines the sum of ΔW_id and the current W_id as the new W_id.
According to a preferred embodiment of the present invention, the second positioning unit comprises:
a second parameter determining subunit, configured to determine the W_exp of the current frame by using the second positioning model, specifically performing:
S31. taking the average position of the m positioning points as the current iteration position S̄;
S32. determining a new iteration position S by using the current iteration position S̄, the W_id of the current frame, the current W_exp, and the parameter vector C_sdm_exp_id of the second positioning model corresponding to the m positioning points;
S33. determining the deviation ΔS between the positions of the m positioning points and the new iteration position S;
S34. determining a new W_exp by using the current W_exp and ΔS;
S35. if the modulus of ΔS is smaller than or equal to a preset third modulus value, determining the new W_exp as the W_exp of the current frame;
otherwise, updating the current W_exp with the new W_exp, updating the current iteration position S̄ with the new iteration position S, and returning to S32.
According to a preferred embodiment of the present invention, when performing S32 the second parameter determining subunit obtains the new iteration position S as S = S̄ + C_sdm_exp_id ×1 W_exp^T ×2 W_id^T, where Ω denotes the two-dimensional matrix obtained by unfolding C_sdm_exp_id ×2 W_id^T along the object description direction; here unfolding along the object description direction expands a cubic matrix into a two-dimensional matrix, and the inverse operation synthesizes the two-dimensional matrix back into a cubic matrix along the object description direction.
When performing S34, the subunit determines ΔW_exp by ΔW_exp = (Ω^T·Ω)^(-1)·Ω^T·ΔS, and determines the sum of ΔW_exp and the current W_exp as the new W_exp.
According to a preferred embodiment of the present invention, the second positioning unit comprises:
a second positioning subunit, configured to locate the current frame based on the W_id and W_exp of the current frame, specifically:
obtaining a vector f containing the positions of the n positioning points of the current frame by f = f̄ + C_exp_id ×1 W_exp^T ×2 W_id^T;
where f̄ is the vector formed by the average positions of the n positioning points in the images of the training set used by the second positioning model, and C_exp_id is the parameter vector of the second positioning model; ×1 means that the cubic matrix preceding ×1 is unfolded into a two-dimensional matrix along the expression description direction, multiplied by the two-dimensional matrix following ×1, and the resulting two-dimensional matrix is transformed back into a cubic matrix along the expression description direction; ×2 means that the cubic matrix preceding ×2 is unfolded into a two-dimensional matrix along the object description direction, multiplied by the two-dimensional matrix following ×2, and the resulting two-dimensional matrix is transformed back into a cubic matrix along the object description direction.
According to a preferred embodiment of the present invention, the apparatus further comprises:
a second model training unit, configured to train the second positioning model by performing the following operations:
B11. collecting images of different expressions of different objects, and constructing a stereo training data tensor according to the object description, the expression description, and the position description;
B12. subtracting from the positions of the n key points in the stereo training data tensor the average positions of the n key points over the images, to obtain a stereo data tensor D;
B13. obtaining the parameter vector C_exp_id of the second positioning model by C_exp_id = D ×1 U_exp^T ×2 U_id^T;
where U_exp is the unitary matrix of D unfolded into a two-dimensional matrix along the expression description direction, and U_id is the unitary matrix of D unfolded into a two-dimensional matrix along the object description direction.
According to a preferred embodiment of the present invention, the first parameter determining subunit is further configured to, when the determined W_id of successive frames tends to a stable value, directly adopt the stable value as the W_id of subsequent frames.
According to a preferred embodiment of the present invention, the apparatus further comprises:
an identity identifying unit, configured to, if more than one object is present in the image, determine the two object description parameters with the smallest Euclidean distance between an object description parameter of the previous frame and an object description parameter of the current frame as corresponding to the same object.
As can be seen from the above technical solutions, the present invention accurately locates the current frame by combining the object description parameter and the expression description parameter of the previous frame with a coarse localization of the current frame, thereby exploiting the continuity and correlation between successive frames of a video and achieving key point positioning in video images.
[Description of the Drawings]
Fig. 1 is a flowchart of the main method according to an embodiment of the present invention;
Fig. 2 is a flowchart of a method for training the SDM model according to an embodiment of the present invention;
Fig. 3a is an example of the training set according to an embodiment of the present invention;
Fig. 3b is a partial enlarged view of one image in Fig. 3a;
Fig. 4 is a flowchart of a method for positioning with the SDM model according to an embodiment of the present invention;
Fig. 5 is a flowchart of a method for training the 3-mode SVD model according to an embodiment of the present invention;
Fig. 6 is a spatial illustration of the stereo training data tensor according to an embodiment of the present invention;
Fig. 7 is a flowchart of a method for positioning with the 3-mode SVD model according to an embodiment of the present invention;
Fig. 8 is a schematic diagram of two successive frames containing two objects according to an embodiment of the present invention;
Fig. 9 is a structural diagram of an apparatus according to an embodiment of the present invention.
[Detailed Description]
To make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is described in detail below with reference to the drawings and specific embodiments.
Fig. 1 is a flowchart of the main method according to an embodiment of the present invention. As shown in Fig. 1, when performing key point positioning on the frames of a video, the following processing is performed on each frame in turn:
In 101, the current frame is located by using the first positioning model to obtain the positions of m positioning points.
In the present application, an object detection step precedes the positioning of each frame, mainly to detect the approximate regions and number of objects. Taking face positioning as an example, face detection can first be performed on the current frame to determine the regions and number of faces. The positioning performed by the present application mainly takes place within a face region; if there is more than one face, each face region is located separately. Face detection can be carried out in any manner known in the prior art and is not detailed here.
In 102, the W_id and W_exp of the previous frame are taken as initial parameters; based on the above m positioning points, the W_id and W_exp of the current frame are determined by using the second positioning model, and the current frame is located based on the W_id and W_exp of the current frame to obtain the positions of n positioning points, where m is smaller than n.
The method provided by the embodiments of the present invention can be applied to face key point positioning in video. Because the positions of the key points differ across persons and across expressions, both the construction of the positioning models and the positioning itself are carried out in three dimensions: object (i.e., person) description, expression description, and position description. Each frame has an object description parameter W_id and an expression description parameter W_exp: as the names suggest, W_id describes the identity of the object (i.e., the person) in the image, and W_exp describes the expression in the image. Of course, the present invention is not limited to key point positioning of human faces; key points of animals such as cats and dogs can also be located. The embodiments of the present invention take face key point positioning as an example.
In effect, the present invention uses the first positioning model to coarsely locate the current frame, and then, on the basis of that coarse localization, uses the second positioning model together with the W_id and W_exp of the previous frame to precisely locate the current frame.
In addition, since the first frame of a video has no previous frame, initial values are used for it, i.e., a preset initial W_id and a preset initial W_exp are taken as the initial parameters, as described in detail in the following embodiments.
In the embodiments of the present invention, the first positioning model used for coarse positioning can be an SDM (Supervised Descent Method) model. The SDM model performs positioning as follows: starting from the average shape of the face (i.e., the average positions of the key points), the face key points are located according to the mapping between the average positions of the key points and the image texture features centered on those average positions, yielding m positioning points; in the embodiments of the present invention, "positioning point" denotes a key point obtained by positioning. The advantage of the SDM model is that it is insensitive to the starting position and achieves a good positioning effect from the average shape alone; its drawback is high algorithmic complexity. Therefore, in the embodiments of the present invention it is used only for coarse positioning, producing a small number of positioning points.
The second positioning model used for precise positioning in the embodiments of the present invention can be a 3-mode SVD (3-mode Singular Value Decomposition) model, whose principle is as follows: starting from the average shape of the facial organs, the organ contour points are precisely located according to the mapping between the average positions of the key points and the already obtained positioning points. Its advantages are low algorithmic complexity, high precision, and good real-time performance; its drawback is that it requires part of the key points to have been located already.
By combining the advantages and disadvantages of the above two models, the present invention can accomplish real-time positioning and tracking of each frame image in a video. The methods of key point positioning in video images using the SDM model and the 3-mode SVD model are described in detail below with reference to specific embodiments.
First, taking the SDM model as an example, step 101 in Fig. 1 is described in detail. To facilitate understanding of the SDM model, its training process is described first. As shown in Fig. 2, the training process can include the following steps:
In 201, the key point positions X_real of the p images in the training set are determined.
In the embodiments of the present invention, images of different expressions of different persons can be collected to build a training set, as shown in Fig. 3a, with key points predetermined on these images. Assume the number of images in the training set is p and each image has m key point positions (taking one image of Fig. 3a as an example, its partial enlargement is shown in Fig. 3b, where a total of m key points lie on the eyes, nose, eyebrows, mouth, ears, chin, and other organs). Then X_real can be expressed as [x_1, x_2, ..., x_p]^T, i.e., the coordinates of every key point of every image of the training set, and this X_real is recorded.
In 202, the average position of each key point over the p images is taken as the current iteration position X.
In this step, X can be expressed as [x̄, x̄, ..., x̄]^T, where x̄ is the average of each key point's coordinates over the images.
In 203, the value of ΔX is determined by ΔX = X_real - X.
ΔX reflects the deviation between the true positions of the key points and the iteration position.
In 204, the current value of the parameter vector R of the first positioning model is obtained by R = (Φ^T·Φ)^(-1)·Φ^T·ΔX.
Here Φ = p×Dim, where Dim is the 2m-dimensional gradient feature extracted from set range regions centered on the average positions of the m key points in the images. Suppose there are m key points, i.e., m average positions; taking each average position as the center, a 64-dimensional gradient feature of the image is extracted from an 8×8 region, so Dim is 2112 (m×64) dimensions.
In 205, it is judged whether the modulus of ΔX is smaller than or equal to a preset first modulus value; if so, 207 is performed; otherwise, 206 is performed.
This step in effect sets a convergence condition: if the modulus of ΔX is smaller than or equal to the first modulus value, the current deviation ΔX is within an acceptable range and the iteration can stop. The first modulus value can be an empirical value, for example 1; the convergence condition is usually reached within about 4 iterations.
In 206, the current iteration position X is updated with the value of Φ·R + X, and the process returns to 203.
If the convergence condition is not met, the current iteration position X is updated and the iteration continues from 203 until the condition is met.
In 207, the current value of R is the parameter value finally obtained by training, and the training process ends.
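For illustration only, this training loop can be sketched in a few lines of NumPy. The sketch below is not part of the claimed method: the names train_sdm and extract_features are assumptions (the latter is assumed to return the p×Dim gradient-feature matrix Φ sampled around the current iterate), and np.linalg.lstsq is used as a numerically stable equivalent of (Φ^T·Φ)^(-1)·Φ^T·ΔX.

import numpy as np

def train_sdm(X_real, X_mean, extract_features, eps=1.0, max_iters=10):
    # X_real: (p, 2m) true key point coordinates of the p training images (A11)
    # X_mean: (p, 2m) mean shape replicated for each image, the start shape (A12)
    # extract_features: assumed callable returning Phi, the (p, Dim)
    #   gradient features sampled around the current iterate X
    X = X_mean.copy()
    R = None
    for _ in range(max_iters):
        dX = X_real - X                          # A13: deviation from the truth
        Phi = extract_features(X)
        # A14: R = (Phi^T Phi)^-1 Phi^T dX, solved as a least-squares problem
        R, *_ = np.linalg.lstsq(Phi, dX, rcond=None)
        if np.linalg.norm(dX) <= eps:            # A15: converged, R is final
            break
        X = X + Phi @ R                          # otherwise update X and repeat
    return R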
As can be seen, the training process of the SDM model is in fact the process of determining R, which embodies the mapping between the image gradient features and the degree by which the positioning point shape deviates from the average positioning point shape. After the SDM model is trained, the positioning process of step 101 in Fig. 1 using the SDM model can be as shown in Fig. 4, including the following steps:
In 401, the image gradient features of the current frame are extracted, and the degree ΔX by which the positioning point shape of the current frame deviates from the average positioning point shape is obtained by using the image gradient features and the R of the trained SDM model.
In this step, the image gradient features of the current frame can be reflected by the gradient features of the average positioning points on the current frame. The average positioning points are the average positions of the m key points of the training set over the images: since the training set has p images, each with m key points, the averages of the position coordinates of these m key points over the p images are the coordinate values of the average positioning points.
The image gradient features can therefore be denoted by Φ, with Φ = p×Dim, where the Dim of the current frame is the 2m-dimensional gradient feature extracted from set range regions on the current frame centered on the positions of the m average positioning points. Since the parameter vector R of the SDM model embodies the mapping between the image gradient features and the degree by which the positioning point shape deviates from the average positioning point shape, ΔX is determined by the formula ΔX = Φ·R, i.e., the Φ of the current frame multiplied by the R of the SDM model.
In 402, the positions of the m positioning points of the current frame are obtained by using the ΔX of the current frame and the predetermined positions of the m average positioning points.
From the meaning of ΔX, the positions of the m positioning points of the current frame are the sum of the ΔX of the current frame and the predetermined positions X̄ of the m average positioning points.
After the coarse positioning of the m positioning points is completed with the SDM model, further precise positioning is performed with the 3-mode SVD model. To facilitate understanding of how the 3-mode SVD model performs positioning, its training process is described first. As shown in Fig. 5, the training process can include the following steps:
In 501, images of different expressions of different objects are collected, and a stereo training data tensor is constructed according to the object description, the expression description, and the position description.
In the embodiments of the present invention, the object description is denoted by id, the expression description by exp, and the position description by the positioning points (vertices); the constructed stereo training data tensor can be spatially illustrated as shown in Fig. 6.
In 502, the average positions of the n key points over the images are subtracted from the positions of the n key points in the stereo training data tensor, to obtain a stereo data tensor D.
If images of 39 expressions of each of 400 persons are collected, with n key points on each image, the key point coordinates of each image can be expanded as [x_1, y_1, x_2, y_2, ..., x_n, y_n], where the subscripts identify the key points. The average positions of these n key points can be expressed as a vector f̄. The positions of the n key points in the stereo training data tensor are then reduced by the average positions of the n key points over the images, yielding the stereo data tensor D, a cubic matrix of size 2n×39×400.
In 503, the parameter vector C_exp_id of the 3-mode SVD model is obtained by C_exp_id = D ×1 U_exp^T ×2 U_id^T.
The formula in this step follows from the principle of the 3-mode SVD model. Here ×1 means that the cubic matrix preceding ×1 is unfolded into a two-dimensional matrix along the exp direction, multiplied by the two-dimensional matrix following ×1, and the resulting two-dimensional matrix is transformed back into a cubic matrix along the exp direction; ×2 means that the cubic matrix preceding ×2 is unfolded into a two-dimensional matrix along the id direction, multiplied by the two-dimensional matrix following ×2, and the resulting two-dimensional matrix is transformed back into a cubic matrix along the id direction. In the formula of this step, the cubic matrix D is in fact unfolded into a two-dimensional matrix along the exp direction and multiplied by U_exp^T, and the result is transformed back into a cubic matrix along the exp direction; that cubic matrix is then unfolded into a two-dimensional matrix along the id direction, multiplied by U_id^T, and the result is transformed back into a cubic matrix along the id direction, yielding C_exp_id.
U_exp is the unitary matrix of D unfolded into a two-dimensional matrix along the exp direction: continuing the example above, D unfolds along the exp direction into a 39×800n two-dimensional matrix, whose SVD yields a 39×39 unitary matrix that can be used directly as U_exp. To reduce the model size and speed up positioning, the first 10 columns of the unitary matrix can instead be taken as U_exp according to the singular value magnitudes, in which case U_exp is of size 39×10.
U_id is the unitary matrix of D unfolded into a two-dimensional matrix along the id direction: continuing the example, D unfolds along the id direction into a 400×78n two-dimensional matrix, whose SVD yields a 400×400 unitary matrix that can be used directly as U_id. To reduce the model size and speed up positioning, the first 20 columns of the unitary matrix can instead be taken as U_id according to the singular value magnitudes, in which case U_id is of size 400×20.
If U_exp is 39×10 and U_id is 400×20, the resulting C_exp_id is of size 2n×10×20.
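The unfold/fold operations and mode products used here can be sketched with NumPy as follows. The helper names are assumptions, D is assumed to be stored as a (2n, expressions, identities) array so that mode 1 is the exp direction and mode 2 the id direction, and the truncation sizes 10 and 20 follow the example above; the later sketches reuse these helpers.

import numpy as np

def unfold(T, mode):
    # expand a cubic matrix into a two-dimensional matrix along the given mode
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def fold(M, mode, shape):
    # inverse of unfold: synthesize the cubic matrix back from the 2-D matrix
    rest = [s for i, s in enumerate(shape) if i != mode]
    return np.moveaxis(M.reshape([shape[mode]] + rest), 0, mode)

def mode_product(T, M, mode):
    # T x_mode M: unfold T, left-multiply by M, fold back (as defined above)
    shape = list(T.shape)
    shape[mode] = M.shape[0]
    return fold(M @ unfold(T, mode), mode, shape)

def train_3mode_svd(D, k_exp=10, k_id=20):
    # D: (2n, n_exp, n_id) mean-centred stereo data tensor (steps 501-502)
    U_exp, _, _ = np.linalg.svd(unfold(D, 1), full_matrices=False)
    U_exp = U_exp[:, :k_exp]                 # e.g. 39 x 10
    U_id, _, _ = np.linalg.svd(unfold(D, 2), full_matrices=False)
    U_id = U_id[:, :k_id]                    # e.g. 400 x 20
    # step 503: C_exp_id = D x_1 U_exp^T x_2 U_id^T
    C_exp_id = mode_product(mode_product(D, U_exp.T, 1), U_id.T, 2)
    return C_exp_id, U_exp, U_id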
As can be seen from the formula in step 503, D ≈ C_exp_id ×1 U_exp ×2 U_id. Thus the key point positions f of an image of the training set can be expressed by the trained C_exp_id, a column vector of U_exp, and a column vector of U_id, where the column vector of U_exp is denoted W_exp and the column vector of U_id is denoted W_id. That is, f - f̄ = C_exp_id ×1 W_exp^T ×2 W_id^T, from which it is further derived that f = f̄ + C_exp_id ×1 W_exp^T ×2 W_id^T.
The subsequent positioning with the 3-mode SVD model is based on this formula. The positioning process using the 3-mode SVD model is described in detail below with reference to Fig. 7. The same method is used for every frame, so Fig. 7 describes the positioning process of only one frame (referred to as the current frame), i.e., the implementation of step 102 in Fig. 1. As shown in Fig. 7, the process can include the following steps:
In 701, the average position of the m positioning points determined by the SDM model is taken as the current iteration position S̄.
In this step, the average position of the m positioning points obtained by coarse positioning with the SDM model is initially taken as the current iteration position S̄.
In 702, a new iteration position S is determined by using the current iteration position S̄, the current W_id and W_exp, and the parameter vector C_sdm_exp_id of the second positioning model corresponding to the m positioning points.
For the first frame of the video, preset initial values are used for the current W_id and W_exp: according to the singular value magnitudes, initial vectors can be taken for W_id and W_exp whose moduli are both 1.
For a non-first frame of the video, the W_id and W_exp of the previous frame are used as the current W_id and W_exp.
In this step, the new iteration position S can be obtained by the formula S = S̄ + C_sdm_exp_id ×1 W_exp^T ×2 W_id^T, where unfolding along the expression description direction expands a cubic matrix into a two-dimensional matrix, and the inverse operation synthesizes a cubic matrix along the expression description direction.
Regarding the parameter vector C_sdm_exp_id of the second positioning model corresponding to the m positioning points, consider an example: suppose the SDM model locates m positioning points while the 3-mode SVD model has n positioning points; the n points of the parameter vector C_exp_id of the 3-mode SVD model then contain the above m positioning points, and the parameter vector corresponding to these m positioning points within C_exp_id is determined as C_sdm_exp_id.
In 703, the deviation ΔS between the positions of the m positioning points determined by the SDM model and the new iteration position S is determined.
If the positions of the m positioning points determined by the SDM model are denoted S_SDM, then ΔS = S_SDM - S.
In 704, a new W_id is determined by using the current W_id and ΔS.
In this step, ΔW_id can first be determined by ΔW_id = (Ψ^T·Ψ)^(-1)·Ψ^T·ΔS, and the sum of ΔW_id and the current W_id is then determined as the new W_id.
In 705, it is judged whether the modulus of ΔS is smaller than or equal to a preset second modulus value; if so, 707 is performed; otherwise, 706 is performed.
This step in effect sets a convergence condition on ΔS: if the value of ΔS satisfies the convergence condition, the deviation between the positions of the m positioning points determined by the SDM model and the new iteration position S is already small, and the currently iterated W_id (i.e., the current W_id) can be taken as the W_id of the current frame. If the convergence condition is not met, the iteration continues.
The second modulus value can be an empirical value, for example 1.
In 706, the current W_id is updated with the new W_id, the current iteration position S̄ is updated with the new iteration position S, and the process returns to 702.
In 707, the new W_id is determined as the W_id of the current frame.
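The fixed-point iteration of steps 701-707 can be sketched as below, reusing the tensor helpers from the earlier sketch. The names are assumptions, the exact form of the update S = S̄ + C_sdm_exp_id ×1 W_exp^T ×2 W_id^T and the definition of Ψ are reconstructions from the surrounding description, and np.linalg.lstsq again stands in for the normal-equation solution (Ψ^T·Ψ)^(-1)·Ψ^T·ΔS.

def estimate_w_id(S_sdm, S_bar, C_sdm, W_id, W_exp, eps=1.0, max_iters=20):
    # S_sdm: (2m,) positions of the m positioning points from the SDM stage
    # S_bar: (2m,) current iteration position, initialised per step 701
    # C_sdm: (2m, k_exp, k_id) parameter tensor C_sdm_exp_id
    for _ in range(max_iters):
        # 702 (assumed form): S = S_bar + C_sdm x_1 W_exp^T x_2 W_id^T
        S = S_bar + mode_product(mode_product(C_sdm, W_exp[None, :], 1),
                                 W_id[None, :], 2).ravel()
        dS = S_sdm - S                            # 703: deviation
        # 704: dW_id from least squares; Psi (assumed definition) is the
        # 2-D matrix obtained from C_sdm contracted with the current W_exp
        Psi = unfold(mode_product(C_sdm, W_exp[None, :], 1), 0)
        dW_id, *_ = np.linalg.lstsq(Psi, dS, rcond=None)
        W_id = W_id + dW_id
        if np.linalg.norm(dS) <= eps:             # 705: converged
            break
        S_bar = S                                 # 706: update and repeat
    return W_id                                   # 707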
The determination of the W_id of the current frame is thus completed; in effect, W_id was iterated with W_exp held fixed. The following steps determine the W_exp of the current frame based on its W_id, i.e., W_exp is iterated with W_id held fixed, on the same principle.
In 708, the average position of the m positioning points located with the SDM model is taken as the current iteration position S̄.
In 709, a new iteration position S is determined by using the current iteration position S̄, the W_id of the current frame, the current W_exp, and the parameter vector C_sdm_exp_id of the 3-mode SVD model corresponding to the m positioning points.
In this step, the new iteration position S can be obtained by the formula S = S̄ + C_sdm_exp_id ×1 W_exp^T ×2 W_id^T, where unfolding along the object description direction expands a cubic matrix into a two-dimensional matrix, and the inverse operation synthesizes a cubic matrix along the object description direction.
In 710, the deviation ΔS between the positions of the m positioning points located with the SDM model and the new iteration position S is determined.
If the positions of the m positioning points determined by the SDM model are denoted S_SDM, then ΔS = S_SDM - S.
In 711, a new W_exp is determined by using the current W_exp and ΔS.
ΔW_exp can first be determined by ΔW_exp = (Ω^T·Ω)^(-1)·Ω^T·ΔS, and the sum of ΔW_exp and the current W_exp is then determined as the new W_exp.
In 712, it is judged whether the modulus of ΔS is smaller than or equal to a preset third modulus value; if so, 714 is performed; otherwise, 713 is performed.
The third modulus value can be an empirical value, for example 1.
In 713, the current W_exp is updated with the new W_exp, the current iteration position S̄ is updated with the new iteration position S, and the process returns to 709.
In 714, the new W_exp is determined as the W_exp of the current frame.
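Steps 708-714 mirror the previous loop with the roles of W_id and W_exp exchanged. A sketch under the same assumptions, including the assumed definition of Ω:

def estimate_w_exp(S_sdm, S_bar, C_sdm, W_id, W_exp, eps=1.0, max_iters=20):
    # mirror of estimate_w_id: the W_id of the current frame is now held fixed
    for _ in range(max_iters):
        S = S_bar + mode_product(mode_product(C_sdm, W_exp[None, :], 1),
                                 W_id[None, :], 2).ravel()
        dS = S_sdm - S                                      # 710
        # 711: dW_exp = (Omega^T Omega)^-1 Omega^T dS; Omega (assumed) is
        # the 2-D matrix from C_sdm contracted with the fixed W_id
        Omega = unfold(mode_product(C_sdm, W_id[None, :], 2), 0)
        dW_exp, *_ = np.linalg.lstsq(Omega, dS, rcond=None)
        W_exp = W_exp + dW_exp
        if np.linalg.norm(dS) <= eps:                       # 712
            break
        S_bar = S                                           # 713
    return W_exp                                            # 714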
The determination of the W_exp of the current frame is thus completed, and the W_id of the current frame has already been determined; the current frame can then be located in 715 by using its W_id and W_exp.
Specifically, the vector f containing the positions of the n positioning points of the current frame can be obtained by f = f̄ + C_exp_id ×1 W_exp^T ×2 W_id^T, where f̄ is the vector formed by the average positions of the n positioning points in the images of the training set used by the second positioning model.
The positioning of the current frame thus ends. The above flow can be executed for every frame of the video to obtain the positioning points of each frame. In addition, when the frames of a video are located with the above method, W_id gradually tends to a stable value after a number of frames have been located; the W_id of subsequent frames then no longer needs to be iterated in the manner shown in Fig. 7, and the stable value can be adopted directly and used to compute the W_exp of each frame. To judge whether W_id is tending to a stable value, it can be judged whether the modulus of the difference between the W_id of the current frame and the W_id of the previous frame is smaller than a preset threshold, for example smaller than 1; if so, W_id is determined to be tending to a stable value. The stable value can be the average of the W_id of the current frame and the W_id of the preceding frames. Of course, other ways of judging stability and determining the stable value can also be used and are not enumerated here.
As W_id gradually stabilizes, the positions of the positioning points also gradually stabilize, which is crucial for the effect of beauty applications such as virtual makeup try-on.
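One possible implementation of this stability shortcut, with the threshold and the averaging rule taken from the example above (both are merely one choice among others):

def stable_w_id(w_id_cur, w_id_history, thresh=1.0):
    # w_id_history: W_id vectors of the preceding frames
    if w_id_history and np.linalg.norm(w_id_cur - w_id_history[-1]) < thresh:
        # one possible stable value: the mean over current and past frames
        return np.mean(w_id_history + [w_id_cur], axis=0)
    return None    # not yet stable, keep iterating per frame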
For the case where an image contains two or more objects, for example two persons, the above positioning outputs two positioning results (each containing n positioning points) and two W_id. Since the persons in a video are not necessarily static and their positions may change through movement, it is necessary to distinguish which two positioning results of two frames belong to the same person. As shown in Fig. 8, the two images are two successive frames; the relative positions of object A and object B in the previous frame have changed in the current frame, so distinction and identification are needed.
For convenience of description, the positioning results and object description parameters of the previous frame are denoted f_1_pre, W_1_pre, f_2_pre, W_2_pre, and those of the current frame f_1_cur, W_1_cur, f_2_cur, W_2_cur.
In the embodiments of the present invention, the two object description parameters with the smallest Euclidean distance between an object description parameter of the previous frame and an object description parameter of the current frame can be determined as corresponding to the same object. Specifically, the Euclidean distances between W_1_pre and W_1_cur, between W_1_pre and W_2_cur, between W_2_pre and W_1_cur, and between W_2_pre and W_2_cur are computed, and the smallest is selected. Supposing the Euclidean distance between W_1_pre and W_2_cur is the smallest, W_1_pre and W_2_cur correspond to the same object; accordingly, f_1_pre and f_2_cur correspond to the same object, and positioning results belonging to the same object can be identified as such.
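A greedy sketch of this matching rule for a small number of objects; the function name and the list-based representation are assumptions:

def match_identities(W_prev, W_cur):
    # W_prev, W_cur: lists of per-object W_id vectors of two successive frames
    # returns pairs (i, j): object i of the previous frame is object j now
    pairs, used = [], set()
    for i, wp in enumerate(W_prev):
        d, j = min((np.linalg.norm(wp - wc), j)
                   for j, wc in enumerate(W_cur) if j not in used)
        pairs.append((i, j))
        used.add(j)
    return pairs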
In the above embodiments, m can be an integer on the order of tens, and n an integer on the order of hundreds.
The above is a detailed description of the method provided by the present invention. The apparatus provided by the embodiments of the present invention is described in detail below with reference to Fig. 9. As shown in Fig. 9, the apparatus can include a first positioning unit 10, a parameter determining unit 20, and a second positioning unit 30, and can further include a first model training unit 40, a second model training unit 50, and an identity identifying unit 60. The first positioning unit 10 can specifically include a first deviation determining subunit 11 and a first positioning subunit 12. The second positioning unit 30 can specifically include a first parameter determining subunit 31, a second parameter determining subunit 32, and a second positioning subunit 33. The main functions of the component units are as follows:
The first positioning unit 10 is responsible for locating the current frame with the first positioning model to obtain the positions of m positioning points.
The parameter determining unit 20 is responsible for taking the W_id and W_exp of the previous frame as initial parameters.
The second positioning unit 30 is responsible for determining the W_id and W_exp of the current frame with the second positioning model based on the m positioning points and the initial parameters, and for locating the current frame based on the W_id and W_exp of the current frame to obtain the positions of n positioning points.
Here m is smaller than n, W_id is an object description parameter of the image, and W_exp is an expression description parameter of the image.
The apparatus in effect coarsely locates the current frame with the first positioning model and then, on the basis of that coarse localization, precisely locates the current frame with the second positioning model together with the W_id and W_exp of the previous frame.
The parameter determining unit 20 can first judge whether the current frame is the first frame of the video; if not, the W_id and W_exp of the previous frame are taken as initial parameters; if so, since the first frame of the video has no previous frame to refer to, a preset initial W_id and a preset initial W_exp are taken as initial parameters.
In the embodiments of the present invention, the first positioning model can be an SDM model, and the second positioning model can be a 3-mode SVD model.
The first model training unit 40 is responsible for training the first positioning model, i.e., the SDM model, specifically performing the following operations:
A11. determining the key point positions X_real of the p images in the training set;
A12. taking the average position of each key point over the p images as the current iteration position X;
A13. determining the value of ΔX by ΔX = X_real - X;
A14. obtaining the current value of the parameter vector R of the SDM model by R = (Φ^T·Φ)^(-1)·Φ^T·ΔX;
A15. if the modulus of ΔX is smaller than or equal to a preset first modulus value, taking the current value of R as the trained value of the parameter vector R of the SDM model; otherwise, updating the current iteration position X with the value of Φ·R + X and returning to A13. The first modulus value can be an empirical value, for example 1.
Here Φ = p×Dim, where Dim is the 2m-dimensional gradient feature extracted from set range regions centered on the positions of the m average positioning points.
The composition of the first positioning unit 10 is described below.
The first deviation determining subunit 11 is responsible for extracting the image gradient features of the current frame and obtaining, by using the image gradient features and the first positioning model, the degree ΔX by which the positioning point shape of the current frame deviates from the average positioning point shape. Specifically, the first deviation determining subunit 11 can determine ΔX by ΔX = Φ·R.
The first positioning subunit 12 is responsible for obtaining the positions of the m positioning points of the current frame by using the ΔX of the current frame and the predetermined positions of the m average positioning points: the positions of the m positioning points of the current frame are the sum of the ΔX of the current frame and the predetermined positions X̄ of the m average positioning points.
After the coarse positioning of the m positioning points is completed with the SDM model, further precise positioning is performed with the 3-mode SVD model. To facilitate understanding of how the 3-mode SVD model performs positioning, the second model training unit 50 is described first.
The second model training unit 50 is responsible for training the second positioning model, i.e., the 3-mode SVD model, specifically performing the following operations:
B11. collecting images of different expressions of different objects, and constructing a stereo training data tensor according to the object description, the expression description, and the position description;
B12. subtracting from the positions of the n key points in the stereo training data tensor the average positions of the n key points over the images, to obtain a stereo data tensor D;
B13. obtaining the parameter vector C_exp_id of the 3-mode SVD model by C_exp_id = D ×1 U_exp^T ×2 U_id^T, where U_exp is the unitary matrix of D unfolded into a two-dimensional matrix along the expression description direction, and U_id is the unitary matrix of D unfolded into a two-dimensional matrix along the object description direction; ×1 means that the cubic matrix preceding ×1 is unfolded into a two-dimensional matrix along the expression description direction, multiplied by the two-dimensional matrix following ×1, and the resulting two-dimensional matrix is transformed back into a cubic matrix along the expression description direction; ×2 means that the cubic matrix preceding ×2 is unfolded into a two-dimensional matrix along the object description direction, multiplied by the two-dimensional matrix following ×2, and the resulting two-dimensional matrix is transformed back into a cubic matrix along the object description direction.
The specific structure of the second positioning unit 30 is described below.
The first parameter determining subunit 31 is responsible for determining the W_id of the current frame with the 3-mode SVD model, specifically performing the following operations:
S21. taking the average position of the m positioning points as the current iteration position S̄.
S22. determining a new iteration position S by using the current iteration position S̄, the current W_id and W_exp, and the parameter vector C_sdm_exp_id of the 3-mode SVD model corresponding to the m positioning points.
For the first frame of the video, preset initial values are used for the current W_id and W_exp: according to the singular value magnitudes, initial vectors can be taken for W_id and W_exp whose moduli are both 1. For a non-first frame of the video, the W_id and W_exp of the previous frame are used as the current W_id and W_exp.
The first parameter determining subunit 31 can obtain the new iteration position S by S = S̄ + C_sdm_exp_id ×1 W_exp^T ×2 W_id^T, where Ψ denotes the two-dimensional matrix obtained by unfolding C_sdm_exp_id ×1 W_exp^T along the expression description direction; unfolding along the expression description direction expands a cubic matrix into a two-dimensional matrix, and the inverse operation synthesizes a cubic matrix along the expression description direction.
S23. determining the deviation ΔS between the positions of the m positioning points and the new iteration position S. If the positions of the m positioning points determined by the SDM model are denoted S_SDM, then ΔS = S_SDM - S.
S24. determining a new W_id by using the current W_id and ΔS. The first parameter determining subunit 31 can determine ΔW_id by ΔW_id = (Ψ^T·Ψ)^(-1)·Ψ^T·ΔS and determine the sum of ΔW_id and the current W_id as the new W_id.
S25. if the modulus of ΔS is smaller than or equal to a preset second modulus value, determining the new W_id as the W_id of the current frame; otherwise, updating the current W_id with the new W_id, updating the current iteration position S̄ with the new iteration position S, and returning to S22. The second modulus value can be an empirical value, for example 1.
在第一参数确定子单元31确定出当前帧的Wid后,由第二参数确定子单元32负责利用3-modeSVD模型确定当前帧的Wexp,具体执行:After the first parameter determining sub-unit 31 determines the W id of the current frame, the second parameter determining sub-unit 32 is responsible for determining the W exp of the current frame by using the 3-mode SVD model, and specifically:
S31、将m个定位点的位置平均值作为当前迭代位置
Figure PCTCN2016099291-appb-000073
S31. Taking the average value of the positions of the m positioning points as the current iteration position
Figure PCTCN2016099291-appb-000073
S32、利用当前迭代位置
Figure PCTCN2016099291-appb-000074
当前帧的Wid和当前的Wexp以及m个定位点对应的第二定位模型的参数向量Csdm_exp_id,确定新的迭代位置S。
S32, using the current iteration position
Figure PCTCN2016099291-appb-000074
The new iteration position S is determined by the W id of the current frame and the current W exp and the parameter vector C sdm_exp_id of the second positioning model corresponding to the m positioning points.
The second parameter determining subunit 32 may obtain the new iteration position S as

S = S̄ + Ω·Wexp,

where Ω is the matrix obtained by contracting the parameter vector Csdm_exp_id with the Wid of the current frame and expanding the result into a two-dimensional matrix along the object description direction; here expanding denotes unfolding a cubic matrix into a two-dimensional matrix along the object description direction, and the inverse operation denotes synthesizing a cubic matrix along the object description direction.
S33. Determine the deviation ΔS between the positions of the m positioning points and the new iteration position S.

If SSDM denotes the positions of the m positioning points determined by the SDM model, then ΔS = SSDM - S.
S34. Determine the new Wexp by using the current Wexp and ΔS.

The second parameter determining subunit 32 may determine ΔWexp by ΔWexp = (ΩT·Ω)-1·ΩT·ΔS, and determine the sum of ΔWexp and the current Wexp as the new Wexp.
S35. If the modulus of ΔS is less than or equal to a preset third modulus value, determine the new Wexp as the Wexp of the current frame; otherwise, update the current Wexp with the new Wexp, update the current iteration position S̄ with the new iteration position S, and go to S32. The third modulus value here may take an empirical value, for example 1.
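The refinement of Wexp in steps S31 to S35 is symmetric, contracting the object description mode instead; under the same assumed tensor layout a sketch could read:

    import numpy as np

    def refine_w_exp(s_sdm, s_bar, c_tensor, w_id, w_exp, tol=1.0, max_iter=50):
        """Iteratively refine Wexp (steps S31-S35) with Wid held fixed."""
        for _ in range(max_iter):
            # Omega: contract the object (identity) mode with Wid -> (2m, k_exp)
            omega = np.tensordot(w_id, c_tensor, axes=(0, 1)).T
            s_new = s_bar + omega @ w_exp        # S32
            delta_s = s_sdm - s_new              # S33
            # S34: dWexp = (Omega^T Omega)^-1 Omega^T dS via least squares
            w_exp = w_exp + np.linalg.lstsq(omega, delta_s, rcond=None)[0]
            if np.linalg.norm(delta_s) <= tol:   # S35: third modulus value
                break
            s_bar = s_new
        return w_exp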
After the Wid and Wexp of the current frame have been determined, the second positioning subunit 33 locates the current frame based on the Wid and Wexp of the current frame, specifically:

using f = f̄ + Cexp_id ×1 Wexp ×2 Wid, a vector f containing the positions of the n positioning points of the current frame is obtained;
where f̄ is a vector composed of the position averages of the n positioning points in the images of the training set used by the 3-mode SVD model, and Cexp_id is the parameter vector of the 3-mode SVD model.
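Under the same assumed layout, now of shape (k_exp, k_id, 2n) for Cexp_id, the final mode-product evaluation might be sketched as:

    import numpy as np

    def locate_n_points(f_bar, c_exp_id, w_exp, w_id):
        """f = f_bar + Cexp_id x1 Wexp x2 Wid, assuming layout (k_exp, k_id, 2n)."""
        # x1: contract the expression description direction with Wexp
        tmp = np.tensordot(w_exp, c_exp_id, axes=(0, 0))      # -> (k_id, 2n)
        # x2: contract the object description direction with Wid
        return f_bar + np.tensordot(w_id, tmp, axes=(0, 0))   # -> (2n,)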
In addition, when the frames of the video are located by the above method, after a number of frames have been located, Wid gradually tends to a stable value. For subsequent frames, Wid then no longer needs to be obtained iteratively in the above manner; the stable value can be used directly, and the Wexp of each frame is computed with that stable value. To judge whether Wid has tended to a stable value, it can be determined whether the modulus of the difference between the Wid of the current frame and the Wid of the previous frame is less than a preset threshold, for example less than 1; if so, Wid is determined to have tended to a stable value. The stable value may be the average of the Wid of the current frame and the Wid of the preceding frames. Of course, other ways of judging stability and of determining the stable value may also be adopted, and are not enumerated here.
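A minimal sketch of this stability test, using the example threshold of 1 and the frame-average stable value mentioned above (the helper name is an assumption):

    import numpy as np

    def stable_w_id(w_id_history, threshold=1.0):
        """Return the stable Wid once consecutive frames differ by less than
        the threshold, else None (Wid must keep being re-estimated)."""
        if len(w_id_history) < 2:
            return None
        if np.linalg.norm(w_id_history[-1] - w_id_history[-2]) < threshold:
            return np.mean(w_id_history, axis=0)  # average over frames so far
        return None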
If more than one object is present in the image, the identity identifying unit 60 determines the two object description parameters with the smallest Euclidean distance between the object description parameters of the previous frame and those of the current frame as corresponding to the same object, and identifies the positioning results of the same object accordingly, thereby distinguishing which object each positioning result in the image belongs to.
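For illustration, pairing objects across frames by the Euclidean distance of their object description parameters might be sketched as a simple greedy assignment; the disclosure only prescribes the minimum-distance rule, so the assignment strategy here is an assumption:

    import numpy as np

    def match_objects(prev_w_ids, cur_w_ids):
        """Greedily pair previous-frame and current-frame Wid vectors
        (arrays of shape (num_objects, k_id)) by smallest Euclidean
        distance; returns a list of (prev_idx, cur_idx) pairs."""
        dists = np.linalg.norm(prev_w_ids[:, None, :] - cur_w_ids[None, :, :], axis=2)
        pairs = []
        for _ in range(min(len(prev_w_ids), len(cur_w_ids))):
            i, j = np.unravel_index(np.argmin(dists), dists.shape)
            pairs.append((int(i), int(j)))
            dists[i, :] = np.inf   # remove matched previous-frame object
            dists[:, j] = np.inf   # remove matched current-frame object
        return pairs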
As can be seen from the above description, the method and apparatus provided by the present invention can have the following advantages:

1) Key point positioning for video images is achieved.

2) The constraint that consecutive frames of a video image contain the same object is taken into account, which reduces the jitter of the positioning points between consecutive frames, so the visual effect is more natural and smooth.
In the several embodiments provided by the present invention, it should be understood that the disclosed apparatus and method may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative; for instance, the division of the units is only a division by logical function, and other divisions are possible in actual implementation.

The units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional units.

The integrated unit implemented in the form of a software functional unit may be stored in a computer-readable storage medium. The software functional unit is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to perform some of the steps of the methods described in the embodiments of the present invention. The storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.

The above are only preferred embodiments of the present invention and are not intended to limit the present invention. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (28)

  1. A method for performing key point positioning on an image, the method comprising:
    S1. locating a current frame by using a first positioning model to obtain positions of m positioning points;
    S2. using the Wid and Wexp of a previous frame as initial parameters, determining the Wid and Wexp of the current frame by using a second positioning model based on the m positioning points, and locating the current frame based on the Wid and Wexp of the current frame to obtain positions of n positioning points;
    wherein m is smaller than n, Wid is an object description parameter in the image, and Wexp is an expression description parameter in the image.
  2. The method according to claim 1, wherein before using the Wid and Wexp of the previous frame as initial parameters, the method further comprises:
    determining whether the current frame is the first frame of the video; if not, continuing to perform the step of using the Wid and Wexp of the previous frame as initial parameters; if yes, using a preset initial Wid and a preset initial Wexp as the initial parameters.
  3. The method according to claim 1, wherein the first positioning model is a supervised descent method (SDM) model, and the second positioning model is a 3-mode singular value decomposition (3-mode SVD) model.
  4. The method according to claim 3, wherein S1 comprises:
    S11. extracting an image gradient feature of the current frame, and obtaining, by using the image gradient feature and the first positioning model, the degree ΔX to which the positioning point shape of the current frame deviates from the average positioning point shape;
    S12. obtaining the positions of the m positioning points of the current frame by using the ΔX of the current frame and the positions of m predetermined average positioning points.
  5. The method according to claim 4, wherein obtaining, by using the image gradient feature and the first positioning model, the degree ΔX to which the positioning point shape of the current frame deviates from the average positioning point shape comprises:
    determining ΔX by ΔX = Φ·R;
    wherein the Φ of the current frame is Φ = p×Dim, p is the number of images in the training set used by the first positioning model, Dim is the 2m-dimensional gradient feature extracted from set range regions centered on the positions of the m average positioning points, and R is the parameter vector of the first positioning model.
  6. The method according to claim 5, further comprising pre-training the first positioning model, specifically comprising:
    A11. determining the key point positions Xreal of the p images in the training set;
    A12. taking the average position of each key point in the p images as the current iteration position X;
    A13. determining the value of ΔX by ΔX = Xreal - X;
    A14. obtaining the current value of the parameter vector R of the first positioning model by R = (ΦT·Φ)-1·ΦT·ΔX;
    A15. if the modulus of ΔX is less than or equal to a preset first modulus value, taking the current value of R as the value of the parameter vector R of the first positioning model obtained by training;
    otherwise, updating the current iteration position X with the value obtained by Φ·R+X, and going to A13.
  7. The method according to claim 3, wherein determining the Wid of the current frame by using the second positioning model comprises:
    S21. taking the average of the positions of the m positioning points as the current iteration position S̄;
    S22. determining a new iteration position S by using the current iteration position S̄, the current Wid and Wexp, and the parameter vector Csdm_exp_id of the second positioning model corresponding to the m positioning points;
    S23. determining the deviation ΔS between the positions of the m positioning points and the new iteration position S;
    S24. determining a new Wid by using the current Wid and ΔS;
    S25. if the modulus of ΔS is less than or equal to a preset second modulus value, determining the new Wid as the Wid of the current frame;
    otherwise, updating the current Wid with the new Wid, updating the current iteration position S̄ with the new iteration position S, and going to S22.
  8. The method according to claim 7, wherein S22 comprises: obtaining the new iteration position S by S = S̄ + Ψ·Wid, wherein Ψ is the matrix obtained by contracting the parameter vector Csdm_exp_id with the current Wexp and expanding the result into a two-dimensional matrix along the expression description direction, expanding denoting unfolding a cubic matrix into a two-dimensional matrix along the expression description direction, and the inverse operation denoting synthesizing a cubic matrix along the expression description direction;
    and S24 comprises:
    determining ΔWid by ΔWid = (ΨT·Ψ)-1·ΨT·ΔS;
    determining the sum of ΔWid and the current Wid as the new Wid.
  9. The method according to claim 3 or 7, wherein determining the Wexp of the current frame by using the second positioning model comprises:
    S31. taking the average of the positions of the m positioning points as the current iteration position S̄;
    S32. determining a new iteration position S by using the current iteration position S̄, the Wid of the current frame, the current Wexp, and the parameter vector Csdm_exp_id of the second positioning model corresponding to the m positioning points;
    S33. determining the deviation ΔS between the positions of the m positioning points and the new iteration position S;
    S34. determining a new Wexp by using the current Wexp and ΔS;
    S35. if the modulus of ΔS is less than or equal to a preset third modulus value, determining the new Wexp as the Wexp of the current frame;
    otherwise, updating the current Wexp with the new Wexp, updating the current iteration position S̄ with the new iteration position S, and going to S32.
  10. The method according to claim 9, wherein S32 comprises: obtaining the new iteration position S by S = S̄ + Ω·Wexp, wherein Ω is the matrix obtained by contracting the parameter vector Csdm_exp_id with the Wid of the current frame and expanding the result into a two-dimensional matrix along the object description direction, expanding denoting unfolding a cubic matrix into a two-dimensional matrix along the object description direction, and the inverse operation denoting synthesizing a cubic matrix along the object description direction;
    and S34 comprises:
    determining ΔWexp by ΔWexp = (ΩT·Ω)-1·ΩT·ΔS;
    determining the sum of ΔWexp and the current Wexp as the new Wexp.
  11. The method according to claim 3, wherein locating the current frame based on the Wid and Wexp of the current frame comprises:
    obtaining, by f = f̄ + Cexp_id ×1 Wexp ×2 Wid, a vector f containing the positions of the n positioning points of the current frame;
    wherein f̄ is a vector composed of the position averages of the n positioning points in the images of the training set used by the second positioning model, Cexp_id is the parameter vector of the second positioning model, ×1 indicates that the cubic matrix before ×1 is expanded into a two-dimensional matrix along the expression description direction and multiplied by the two-dimensional matrix after ×1, after which the resulting two-dimensional matrix is transformed back into a cubic matrix along the expression description direction, and ×2 indicates that the cubic matrix before ×2 is expanded into a two-dimensional matrix along the object description direction and multiplied by the two-dimensional matrix after ×2, after which the resulting two-dimensional matrix is transformed back into a cubic matrix along the object description direction.
  12. The method according to claim 11, further comprising pre-training the second positioning model, specifically comprising:
    B11. collecting images of different expressions of different objects, and constructing a stereo training data tensor according to object description, expression description, and position description;
    B12. subtracting, from the positions of the n key points in the stereo training data tensor, the position averages of the n key points in the respective images, to obtain a stereo data tensor D;
    B13. obtaining the parameter vector Cexp_id of the second positioning model by Cexp_id = D ×1 UexpT ×2 UidT;
    wherein Uexp is the unitary matrix of D expanded into a two-dimensional matrix along the expression description direction, and Uid is the unitary matrix of D expanded into a two-dimensional matrix along the object description direction.
  13. The method according to claim 1, further comprising:
    when the determined Wid of the frames tends to a stable value, directly using the stable value as the Wid of subsequent frames.
  14. The method according to claim 1, wherein, if more than one object is present in the image, the two object description parameters with the smallest Euclidean distance between the object description parameters of the previous frame and those of the current frame are determined as corresponding to the same object.
  15. An apparatus for performing key point positioning on an image, the apparatus comprising:
    a first positioning unit, configured to locate a current frame by using a first positioning model to obtain positions of m positioning points;
    a parameter determining unit, configured to use the Wid and Wexp of a previous frame as initial parameters;
    a second positioning unit, configured to determine the Wid and Wexp of the current frame by using a second positioning model based on the m positioning points and the initial parameters, and to locate the current frame based on the Wid and Wexp of the current frame to obtain positions of n positioning points;
    wherein m is smaller than n, Wid is an object description parameter in the image, and Wexp is an expression description parameter in the image.
  16. The apparatus according to claim 15, wherein the parameter determining unit is further configured to determine whether the current frame is the first frame of the video; if not, to use the Wid and Wexp of the previous frame as the initial parameters; if yes, to use a preset initial Wid and a preset initial Wexp as the initial parameters.
  17. The apparatus according to claim 15, wherein the first positioning model is an SDM model, and the second positioning model is a 3-mode SVD model.
  18. The apparatus according to claim 17, wherein the first positioning unit comprises:
    a first deviation determining subunit, configured to extract an image gradient feature of the current frame and obtain, by using the image gradient feature and the first positioning model, the degree ΔX to which the positioning point shape of the current frame deviates from the average positioning point shape;
    a first positioning subunit, configured to obtain the positions of the m positioning points of the current frame by using the ΔX of the current frame and the positions of m predetermined average positioning points.
  19. The apparatus according to claim 18, wherein the first deviation determining subunit specifically determines ΔX by ΔX = Φ·R;
    wherein the Φ of the current frame is Φ = p×Dim, p is the number of images in the training set used by the first positioning model, Dim is the 2m-dimensional gradient feature extracted from set range regions centered on the positions of the m average positioning points, and R is the parameter vector of the first positioning model.
  20. The apparatus according to claim 19, further comprising:
    a first model training unit, configured to train the first positioning model by performing the following operations:
    A11. determining the key point positions Xreal of the p images in the training set;
    A12. taking the average position of each key point in the p images as the current iteration position X;
    A13. determining the value of ΔX by ΔX = Xreal - X;
    A14. obtaining the current value of the parameter vector R of the first positioning model by R = (ΦT·Φ)-1·ΦT·ΔX;
    A15. if the modulus of ΔX is less than or equal to a preset first modulus value, taking the current value of R as the value of the parameter vector R of the first positioning model obtained by training;
    otherwise, updating the current iteration position X with the value obtained by Φ·R+X, and going to A13.
  21. The apparatus according to claim 17, wherein the second positioning unit comprises:
    a first parameter determining subunit, configured to determine the Wid of the current frame by using the second positioning model, specifically performing the following operations:
    S21. taking the average of the positions of the m positioning points as the current iteration position S̄;
    S22. determining a new iteration position S by using the current iteration position S̄, the current Wid and Wexp, and the parameter vector Csdm_exp_id of the second positioning model corresponding to the m positioning points;
    S23. determining the deviation ΔS between the positions of the m positioning points and the new iteration position S;
    S24. determining a new Wid by using the current Wid and ΔS;
    S25. if the modulus of ΔS is less than or equal to a preset second modulus value, determining the new Wid as the Wid of the current frame;
    otherwise, updating the current Wid with the new Wid, updating the current iteration position S̄ with the new iteration position S, and going to S22.
  22. The apparatus according to claim 21, wherein, when performing S22, the first parameter determining subunit specifically obtains the new iteration position S by S = S̄ + Ψ·Wid, wherein Ψ is the matrix obtained by contracting the parameter vector Csdm_exp_id with the current Wexp and expanding the result into a two-dimensional matrix along the expression description direction, expanding denoting unfolding a cubic matrix into a two-dimensional matrix along the expression description direction, and the inverse operation denoting synthesizing a cubic matrix along the expression description direction;
    and, when performing S24, specifically determines ΔWid by ΔWid = (ΨT·Ψ)-1·ΨT·ΔS, and determines the sum of ΔWid and the current Wid as the new Wid.
  23. The apparatus according to claim 17 or 21, wherein the second positioning unit comprises:
    a second parameter determining subunit, configured to determine the Wexp of the current frame by using the second positioning model, specifically performing:
    S31. taking the average of the positions of the m positioning points as the current iteration position S̄;
    S32. determining a new iteration position S by using the current iteration position S̄, the Wid of the current frame, the current Wexp, and the parameter vector Csdm_exp_id of the second positioning model corresponding to the m positioning points;
    S33. determining the deviation ΔS between the positions of the m positioning points and the new iteration position S;
    S34. determining a new Wexp by using the current Wexp and ΔS;
    S35. if the modulus of ΔS is less than or equal to a preset third modulus value, determining the new Wexp as the Wexp of the current frame;
    otherwise, updating the current Wexp with the new Wexp, updating the current iteration position S̄ with the new iteration position S, and going to S32.
  24. The apparatus according to claim 23, wherein, when performing S32, the second parameter determining subunit specifically obtains the new iteration position S by S = S̄ + Ω·Wexp, wherein Ω is the matrix obtained by contracting the parameter vector Csdm_exp_id with the Wid of the current frame and expanding the result into a two-dimensional matrix along the object description direction, expanding denoting unfolding a cubic matrix into a two-dimensional matrix along the object description direction, and the inverse operation denoting synthesizing a cubic matrix along the object description direction;
    and, when performing S34, specifically determines ΔWexp by ΔWexp = (ΩT·Ω)-1·ΩT·ΔS, and determines the sum of ΔWexp and the current Wexp as the new Wexp.
  25. The apparatus according to claim 17, wherein the second positioning unit comprises:
    a second positioning subunit, configured to locate the current frame based on the Wid and Wexp of the current frame, specifically:
    obtaining, by f = f̄ + Cexp_id ×1 Wexp ×2 Wid, a vector f containing the positions of the n positioning points of the current frame;
    wherein f̄ is a vector composed of the position averages of the n positioning points in the images of the training set used by the second positioning model, Cexp_id is the parameter vector of the second positioning model, ×1 indicates that the cubic matrix before ×1 is expanded into a two-dimensional matrix along the expression description direction and multiplied by the two-dimensional matrix after ×1, after which the resulting two-dimensional matrix is transformed back into a cubic matrix along the expression description direction, and ×2 indicates that the cubic matrix before ×2 is expanded into a two-dimensional matrix along the object description direction and multiplied by the two-dimensional matrix after ×2, after which the resulting two-dimensional matrix is transformed back into a cubic matrix along the object description direction.
  26. The apparatus according to claim 25, further comprising:
    a second model training unit, configured to train the second positioning model by performing the following operations:
    B11. collecting images of different expressions of different objects, and constructing a stereo training data tensor according to object description, expression description, and position description;
    B12. subtracting, from the positions of the n key points in the stereo training data tensor, the position averages of the n key points in the respective images, to obtain a stereo data tensor D;
    B13. obtaining the parameter vector Cexp_id of the second positioning model by Cexp_id = D ×1 UexpT ×2 UidT;
    wherein Uexp is the unitary matrix of D expanded into a two-dimensional matrix along the expression description direction, and Uid is the unitary matrix of D expanded into a two-dimensional matrix along the object description direction.
  27. The apparatus according to claim 21, wherein the first parameter determining subunit is further configured to, when the determined Wid of the frames tends to a stable value, directly use the stable value as the Wid of subsequent frames.
  28. The apparatus according to claim 15, further comprising:
    an identity identifying unit, configured to, if more than one object is present in the image, determine the two object description parameters with the smallest Euclidean distance between the object description parameters of the previous frame and those of the current frame as corresponding to the same object.
PCT/CN2016/099291 2015-09-29 2016-09-19 Method and apparatus for positioning key point of image WO2017054652A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510631053.8 2015-09-29
CN201510631053.8A CN106558042B (en) 2015-09-29 2015-09-29 Method and device for positioning key points of image

Publications (1)

Publication Number Publication Date
WO2017054652A1 true WO2017054652A1 (en) 2017-04-06

Family

ID=58415925

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/099291 WO2017054652A1 (en) 2015-09-29 2016-09-19 Method and apparatus for positioning key point of image

Country Status (2)

Country Link
CN (1) CN106558042B (en)
WO (1) WO2017054652A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108830900B (en) * 2018-06-15 2021-03-12 北京字节跳动网络技术有限公司 Method and device for processing jitter of key point
CN110148158A (en) * 2019-05-13 2019-08-20 北京百度网讯科技有限公司 For handling the method, apparatus, equipment and storage medium of video
CN112950672B (en) * 2021-03-03 2023-09-19 百度在线网络技术(北京)有限公司 Method and device for determining positions of key points and electronic equipment

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080037836A1 (en) * 2006-08-09 2008-02-14 Arcsoft, Inc. Method for driving virtual facial expressions by automatically detecting facial expressions of a face image
CN101499128A (en) * 2008-01-30 2009-08-05 中国科学院自动化研究所 Three-dimensional human face action detecting and tracing method based on video stream
CN102831382A (en) * 2011-06-15 2012-12-19 北京三星通信技术研究有限公司 Face tracking apparatus and method
CN103605965A (en) * 2013-11-25 2014-02-26 苏州大学 Multi-pose face recognition method and device

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101271520A (en) * 2008-04-01 2008-09-24 北京中星微电子有限公司 Method and device for confirming characteristic point position in image
CN104217417B (en) * 2013-05-31 2017-07-07 张伟伟 A kind of method and device of video multi-target tracking

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109558837A (en) * 2018-11-28 2019-04-02 北京达佳互联信息技术有限公司 Face critical point detection method, apparatus and storage medium
CN109558837B (en) * 2018-11-28 2024-03-22 北京达佳互联信息技术有限公司 Face key point detection method, device and storage medium
CN112101109A (en) * 2020-08-11 2020-12-18 深圳数联天下智能科技有限公司 Face key point detection model training method and device, electronic equipment and medium

Also Published As

Publication number Publication date
CN106558042A (en) 2017-04-05
CN106558042B (en) 2020-03-31


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 16850276; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 16850276; Country of ref document: EP; Kind code of ref document: A1)