KR101717377B1 - Device and method for head pose estimation - Google Patents

Device and method for head pose estimation

Info

Publication number
KR101717377B1
KR101717377B1 (application KR1020150169265A)
Authority
KR
South Korea
Prior art keywords
face posture
codewords
code
image
feature point
Prior art date
Application number
KR1020150169265A
Other languages
Korean (ko)
Inventor
김현덕
손명규
이상헌
Original Assignee
재단법인대구경북과학기술원
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 재단법인대구경북과학기술원 filed Critical 재단법인대구경북과학기술원
Priority to KR1020150169265A priority Critical patent/KR101717377B1/en
Application granted granted Critical
Publication of KR101717377B1 publication Critical patent/KR101717377B1/en

Classifications

    • G06K9/00268
    • G06K9/00275
    • G06K9/00335
    • G06K9/38

Landscapes

  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The present invention relates to a face posture estimation apparatus and a face posture estimation method. The face posture estimation apparatus according to the present invention includes an extraction unit for extracting a feature point from an image, and a processing unit for identifying, from a codebook, m codewords (m being a natural number of 3 or more) close to the coordinates of the feature point and encoding the feature point, using the m codewords, into an encoding code related to face posture recognition in the image.

Description

TECHNICAL FIELD

The present invention relates to a face posture estimation apparatus and a face posture estimation method.

In the prior art, techniques for detecting the shape of a person through feature points have been used. Vector quantization (VQ) and sparse coding (SC) have been used as methods of encoding the feature points. However, these prior-art encoding methods have the following disadvantages.

Fig. 1 is a diagram for explaining a method of coding feature points in the prior art.

The vector quantization method shown in FIG. 1(a) is the simplest encoding method in the related art. Vector quantization encodes a feature descriptor to the single locally closest codeword using a least-squares method. A code produced by vector quantization has the disadvantage that each feature value corresponds one-to-one to a single codeword of the codebook, which creates a high possibility of quantization error.

The sparse coding method shown in FIG. 1(b) is a coding method widely used in the field of computer vision. Sparse coding encodes a feature against a small number of codewords in order to reduce the error of vector quantization. However, sparse coding has the disadvantage that its performance may be limited, because it emphasizes only sparsity without considering the structure of the data in the feature space.

Therefore, there is a need for an encoding method that can reduce the possibility of quantization error and consider the structure of the data by constraining locality.

SUMMARY OF THE INVENTION The present invention has been made to solve the above-mentioned problems, and an object of the present invention is to quickly and accurately recognize the face posture by performing limited local sparse encoding with a small amount of computation and improved accuracy with respect to the feature points.

A face posture estimation apparatus for achieving the above object comprises an extraction unit for extracting a feature point from an image, and a processing unit for identifying, from a codebook, m codewords (m being a natural number of 3 or more) close to the coordinates of the feature point and encoding the feature point, using the m codewords, into an encoding code for recognizing a face posture in the image.

A face posture estimation method for achieving the above object includes extracting a feature point from an image; identifying, from a codebook, m codewords (m being a natural number of 3 or more) close to the coordinates of the feature point; and encoding the feature point, using the m codewords, into an encoding code for recognizing the face posture in the image.

According to an embodiment of the present invention, the face posture can be quickly and accurately recognized by performing limited local sparse encoding with a small amount of computation and improved accuracy with respect to the feature points.

In addition, according to an embodiment of the present invention, fast and accurate face posture recognition can improve interaction with advertisements and contents that require fast recognition, and can improve interaction with computer apparatuses that require accuracy, without the need for a separate input device (e.g., keyboard, mouse).

Fig. 1 is a diagram for explaining a method of coding feature points in the prior art.
2 is a block diagram showing a face posture estimating apparatus according to an embodiment of the present invention.
Figure 3 is a diagrammatic representation of limited local sparse encoding in accordance with one embodiment of the present invention.
4 is a flowchart illustrating a method of estimating a face posture according to an exemplary embodiment of the present invention.

Hereinafter, embodiments according to the present invention will be described in detail with reference to the accompanying drawings. However, the present invention is not limited to or limited by the embodiments. Like reference symbols in the drawings denote like elements.

The face posture estimating apparatus and the face posture estimating method described in this specification can encode codes considering the structure of data by limiting the locality.

2 is a block diagram showing a face posture estimating apparatus according to an embodiment of the present invention.

The face posture estimation apparatus 200 of the present invention may include an extraction unit 210 and a processing unit 220.

The extraction unit 210 extracts feature points from the image. That is, the extraction unit 210 may extract feature points using a Histogram of Oriented Gradients (HOG) feature descriptor. At this time, the extraction unit 210 can extract a set X = {x1, x2, …, xn} of n feature points from the image. The extraction unit 210 can extract feature points efficiently by using the characteristic of the HOG feature descriptor that is invariant to changes in rotation.
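As a rough illustration of the kind of descriptor involved, the sketch below builds a single gradient-orientation histogram in NumPy. It is a toy stand-in, not the patent's implementation: real HOG divides the window into cells and applies block normalization, and the function name and parameters here are hypothetical.

```python
import numpy as np

def hog_like_descriptor(patch, n_bins=9):
    """Toy gradient-orientation histogram for one grayscale patch.

    A minimal sketch of the idea behind a single HOG cell: accumulate
    gradient magnitudes into orientation bins, then L2-normalize.
    """
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)
    # Orientation folded into [0, pi): unsigned gradients, as in standard HOG
    ang = np.mod(np.arctan2(gy, gx), np.pi)
    bins = np.minimum((ang / np.pi * n_bins).astype(int), n_bins - 1)
    hist = np.zeros(n_bins)
    np.add.at(hist, bins.ravel(), mag.ravel())  # weighted vote per pixel
    norm = np.linalg.norm(hist)
    return hist / norm if norm > 0 else hist

patch = np.outer(np.arange(8), np.ones(8))  # vertical intensity ramp
h = hog_like_descriptor(patch)
```

For the vertical ramp, all gradient energy falls into the bin containing orientation pi/2, so the histogram concentrates in a single bin.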

The processing unit 220 identifies, from a codebook, m codewords (m being a natural number of 3 or more) close to the coordinates of the feature point, and encodes the feature point, using the m codewords, into an encoding code related to face posture recognition in the image. That is, the processing unit 220 can identify at least three codewords adjacent to the feature point. At this time, the processing unit 220 can use a codebook B = {b1, b2, …, bm} having m codewords. In addition, the processing unit 220 may encode the feature points through limited local sparse encoding to recognize the face posture distinct from the background in the image.

In addition, the processing unit 220 may generate the encoding code by inversely calculating, with the feature point, a function composed of the identified m codewords. That is, the processing unit 220 can encode a feature point x into an m-dimensional code c (the encoding code).

In addition, after identifying the m codewords, if the number of codewords having a value of 0 is relatively small, the processing unit 220 may repeat the identification of codewords to limit the sparsity of the identified m codewords. That is, the processing unit 220 may limit the sparsity of the m codewords so that the codes c having a value of 0 outnumber the codes c having a value other than 0.

In addition, when a plurality of feature points including a first feature point and a second feature point are extracted, the processing unit 220 may identify the m codewords close to the coordinates of the second feature point, excluding the codewords identified based on the first feature point. That is, when a plurality of feature points are extracted, the processing unit 220 may identify, for the second feature point, only codewords close to the second feature point that were not identified for the first feature point. For a more detailed description of limited local sparse encoding, reference is made to FIG. 3 below.
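The exclusion rule described above can be sketched as follows. This is a minimal NumPy illustration under the assumption that "close" means smallest Euclidean distance; the function name and the one-dimensional codebook are hypothetical.

```python
import numpy as np

def nearest_codewords(x, codebook, m=3, exclude=()):
    """Return indices of the m codewords closest to feature point x,
    skipping any indices already identified for a previous feature point.

    `codebook` is (num_codewords, dim); `exclude` mirrors the described
    exclusion of the first feature point's codewords when encoding the
    second feature point.
    """
    dists = np.linalg.norm(codebook - x, axis=1)
    dists[list(exclude)] = np.inf          # never re-select excluded codewords
    return np.argsort(dists)[:m]

codebook = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]])
first = nearest_codewords(np.array([0.2]), codebook)
second = nearest_codewords(np.array([1.0]), codebook, exclude=first)
```

Here the second feature point would ordinarily claim codeword index 1, but because indices 0, 1, and 2 were identified for the first feature point, it is encoded against the next-nearest remaining codewords instead.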

Figure 3 is a diagrammatic representation of limited local sparse encoding in accordance with one embodiment of the present invention.

In FIG. 3, it is assumed by way of example that the face posture estimation apparatus 200 extracts two feature points, but the present invention is not limited thereto. That is, the face posture estimation apparatus 200 can extract n feature points.

First, the face posture estimation apparatus 200 can limit the sparsity of the code c so that most elements of the code c have a value of 0 and only a few have a value other than 0. For example, the face posture estimation apparatus 200 identifies three codewords 321, 322, and 323, and if the number of codewords having a value of 0 is relatively small, repeats the identification so as to limit the sparsity of the three identified codewords 321, 322, and 323.

Next, the face posture estimation apparatus 200 can extract the first and second feature points 310 and 320. At this time, the face posture estimation apparatus 200 can identify at least three codewords 321, 322, and 323 for the second feature point 320. When identifying the codewords 321, 322, and 323 for the second feature point 320, the face posture estimation apparatus 200 excludes the codewords 311, 312, and 313 identified for the first feature point 310 and identifies codewords 321, 322, and 323 that are close to the second feature point 320. As described above, the face posture estimation apparatus 200 can constrain locality by selecting only nearby codewords for the code c.

Referring again to FIG. 2, the processing unit 220 can perform limited local sparse encoding through the objective function of Equation (1).

[Equation (1) — image not reproduced]

Here, ⊙ denotes the element-wise product, and d may be the distance between the feature point X and the codebook B. Further, the processing unit 220 may transform Equation (1) into Equation (2) using a slack variable.

[Equation (2) — image not reproduced]

The processing unit 220 can derive Equation (3) from Equation (2) using the inexact ALM (Augmented Lagrange Multiplier) method.

[Equation (3) — image not reproduced]

The processing unit 220 may generate the encoding code by repeatedly updating the variables of Equation (3).

At this time, the processing unit 220 can select the codewords closest to the coordinates through the K-nearest neighbor algorithm. That is, the processing unit 220 may select the codewords close to the feature point X to form a smaller codebook, and obtain the code c using Equation (4), which is a linear system of smaller size.

[Equation (4) — image not reproduced]
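Equation (4) itself is rendered only as an image, so the sketch below assumes the widely used locality-constrained form of this step: a small least-squares system over only the m selected codewords, solved analytically with a sum-to-one constraint. Whether the patent's Equation (4) matches this exactly is an assumption, and the function name is hypothetical.

```python
import numpy as np

def local_code(x, codebook, idx, eps=1e-6):
    """Solve a small constrained least-squares system over the m selected
    codewords only, instead of optimizing over the full codebook.

    Assumed form: minimize ||x - B_sel^T c||^2 subject to sum(c) = 1,
    via the m x m Gram system (C + eps*tr(C)*I) c = 1, then normalization.
    """
    B_sel = codebook[idx]                      # (m, dim) selected codewords
    diff = B_sel - x                           # center on the feature point
    C = diff @ diff.T                          # m x m local covariance
    C += eps * np.trace(C) * np.eye(len(idx))  # regularize the singular Gram
    c = np.linalg.solve(C, np.ones(len(idx)))  # small m x m linear system
    return c / c.sum()                         # enforce sum-to-one constraint

codebook = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
idx = np.array([0, 1, 2])                      # m = 3 nearest codewords
c = local_code(np.array([0.25, 0.25]), codebook, idx)
```

Because only an m x m system is solved per feature point, the cost is independent of the codebook size, which matches the document's emphasis on a small amount of computation.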

The overall algorithm performed by the processing unit 220 may be as shown in Equation (5).

[Equation (5) — image not reproduced]

In addition, the processing unit 220 may apply the encoding code to a linear SVM (Support Vector Machine) to recognize the face posture from the image. That is, the processing unit 220 can quickly recognize the face posture by training the linear SVM with the encoding code. At this time, the processing unit 220 may use various detection models other than the linear SVM; for example, a kernel-based SVM, a Bayes classifier, and the like.
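A minimal stand-in for the linear SVM stage might look like the following: a hinge-loss linear classifier trained by plain subgradient descent on toy "encoding codes". The data are synthetic and the training loop is a simplification; a real system would use a dedicated solver such as LIBLINEAR.

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, epochs=200, lr=0.1):
    """Train a linear SVM (hinge loss + L2 penalty) by subgradient descent.

    Labels y are in {-1, +1}; returns weight vector w and bias b.
    """
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        margin = y * (X @ w + b)
        active = margin < 1                     # samples violating the margin
        gw = lam * w - (y[active, None] * X[active]).sum(axis=0) / n
        gb = -y[active].sum() / n
        w -= lr * gw
        b -= lr * gb
    return w, b

# Toy stand-in for encoding codes: two linearly separable clusters
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 0.3, (20, 2)), rng.normal(2, 0.3, (20, 2))])
y = np.array([-1] * 20 + [1] * 20)
w, b = train_linear_svm(X, y)
pred = np.sign(X @ w + b)
```

On this well-separated toy data the learned hyperplane classifies the training set perfectly; real encoding codes would of course be higher-dimensional and noisier.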

Further, the processing unit 220 can separate the face posture, distinguished by the encoding code, from the background in the image. That is, by learning encoding codes separated into face posture and background, the processing unit 220 can recognize the face posture from encoding codes generated from subsequent images.

According to the face posture estimation apparatus 200 of the present invention, the face posture can be quickly and accurately recognized by performing limited local sparse encoding with a small amount of computation and improved accuracy with respect to the feature points.

In addition, according to the face posture estimation apparatus 200 of the present invention, fast and accurate face posture recognition can improve interaction with advertisements and contents that require fast recognition, and can improve interaction with computer apparatuses that require accuracy, without the need for a separate input device (e.g., keyboard, mouse).

4 is a flowchart illustrating a method of estimating a face posture according to an exemplary embodiment of the present invention.

First, the face posture estimation method according to the present embodiment can be performed by the above-described face posture estimation apparatus 200.

First, the face posture estimation apparatus 200 extracts feature points from an image (410). That is, step 410 may be a process of extracting feature points using a Histogram of Oriented Gradients (HOG) feature descriptor. At this time, the face posture estimation apparatus 200 can extract a set X = {x1, x2, …, xn} of n feature points from the image. The face posture estimation apparatus 200 can effectively extract the feature points by using the characteristic of the HOG feature descriptor that is invariant to changes in rotation.

Next, the face posture estimation apparatus 200 identifies, from a codebook, m codewords (m being a natural number of 3 or more) close to the coordinates of the feature point (420). That is, step 420 may be a process of identifying at least three codewords adjacent to the feature point. At this time, the face posture estimation apparatus 200 can use a codebook B = {b1, b2, …, bm} having m codewords. In addition, the face posture estimation apparatus 200 can recognize the face posture distinguished from the background in the image by encoding the feature points through limited local sparse encoding.

Next, the face posture estimation apparatus 200 encodes the feature point, using the m codewords, into an encoding code related to face posture recognition in the image (430). That is, step 430 may encode the feature points through limited local sparse encoding to recognize the face posture distinct from the background in the image.

In addition, step 430 may generate the encoding code by inversely calculating, with the feature point, a function composed of the identified m codewords. That is, the face posture estimation apparatus 200 can encode a feature point x into an m-dimensional code c (the encoding code).

In this case, when a plurality of feature points including a first feature point and a second feature point are extracted, step 420 may identify the m codewords close to the coordinates of the second feature point, excluding the codewords identified based on the first feature point. That is, when a plurality of feature points are extracted, the face posture estimation apparatus 200 can identify, for the second feature point, only codewords close to the second feature point that were not identified for the first feature point.

In addition, in step 420, after the m codewords are identified, if the number of codewords having a value of 0 is relatively small, the identification of codewords may be repeated to limit the sparsity of the identified m codewords. That is, the face posture estimation apparatus 200 can limit the sparsity of the m codewords so that the codes c having a value of 0 outnumber the codes c having a value other than 0.
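The sparsity limiting described above can be approximated by hard thresholding: keep the k largest-magnitude entries of the code and zero the rest, so that zero-valued codes outnumber nonzero ones. This is a deliberate simplification of the optimization-based procedure in the text; the function name and the choice of k are hypothetical.

```python
import numpy as np

def limit_sparsity(code, k):
    """Keep only the k largest-magnitude coefficients of a code, zeroing the
    rest, so that entries with value 0 outnumber the nonzero entries.

    A simple hard-thresholding stand-in for the sparsity limiting step.
    """
    code = np.asarray(code, dtype=float)
    out = np.zeros_like(code)
    keep = np.argsort(np.abs(code))[-k:]   # indices of the k largest entries
    out[keep] = code[keep]
    return out

c = limit_sparsity([0.1, -0.7, 0.05, 0.6, 0.02], k=2)
```

After thresholding, only the two dominant coefficients survive, giving the mostly-zero code the document calls for.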

The face posture estimation apparatus 200 can perform limited local sparse encoding through the objective function of Equation (6).

[Equation (6) — image not reproduced]

Here, ⊙ denotes the element-wise product, and d may be the distance between the feature point X and the codebook B. In addition, the face posture estimation apparatus 200 can transform Equation (6) into Equation (7) using a slack variable.

[Equation (7) — image not reproduced]

The face posture estimation apparatus 200 can derive Equation (8) from Equation (7) using the inexact ALM (Augmented Lagrange Multiplier) method.

[Equation (8) — image not reproduced]

The face posture estimation apparatus 200 can generate the encoded code by updating the variables repeatedly with respect to Equation (8).

In addition, step 420 may be a process of selecting the codewords closest to the coordinates through the K-nearest neighbor algorithm.

That is, the face posture estimation apparatus 200 may select the codewords close to the feature point X to form a smaller codebook, and obtain the code c using Equation (9), which is a linear system of smaller size.

[Equation (9) — image not reproduced]

The overall algorithm performed by the face posture estimation apparatus 200 may be as shown in Equation (10).

[Equation (10) — image not reproduced]

According to an embodiment, the face posture estimation apparatus 200 can apply the encoding code to a linear SVM (Support Vector Machine) to recognize the face posture from the image. That is, the face posture estimation apparatus 200 can quickly recognize the face posture by training the linear SVM with the encoding code. At this time, the face posture estimation apparatus 200 can use various detection models other than the linear SVM; for example, a kernel-based SVM, a Bayes classifier, and the like.

According to an embodiment, the face posture estimation apparatus 200 can separate the face posture, distinguished by the encoding code, from the background in the image. That is, by learning encoding codes separated into face posture and background, the face posture estimation apparatus 200 can recognize the face posture from encoding codes generated from subsequent images.

According to the face posture estimation method of the present invention, the face posture can be recognized quickly and accurately by performing limited local sparse encoding with a small amount of computation and improved accuracy with respect to the feature points.

In addition, according to the face posture estimation method of the present invention, fast and accurate face posture recognition can improve interaction with advertisements and contents that require fast recognition, and can improve interaction with computer apparatuses that require accuracy, without the need for a separate input device (e.g., keyboard, mouse).

The method according to an embodiment of the present invention may be implemented in the form of program instructions that can be executed through various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be those specially designed and configured for the embodiments, or may be those known and available to those skilled in computer software. Examples of computer-readable media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include machine code such as that produced by a compiler, as well as high-level language code that can be executed by a computer using an interpreter or the like. The hardware devices described above may be configured to operate as one or more software modules to perform the operations of the embodiments, and vice versa.

While the present invention has been particularly shown and described with reference to exemplary embodiments thereof, it is to be understood that the invention is not limited to the disclosed embodiments. For example, appropriate results may be achieved even if the described techniques are performed in an order different from the described method, and/or components of the described systems, structures, devices, circuits, and the like are combined or coupled in a form different from the described method, or are replaced or substituted by other components or equivalents.

Therefore, other implementations, other embodiments, and equivalents to the claims are also within the scope of the following claims.

200: face posture estimation device
210: extraction unit
220: processing unit

Claims (12)

1. A face posture estimation apparatus comprising:
an extraction unit for extracting a feature point from an image; and
a processing unit for identifying, from a codebook, m codewords (m being a natural number of 3 or more) close to the coordinates of the feature point, and encoding the feature point, using the m codewords, into an encoding code related to face posture recognition in the image,
wherein, when a plurality of feature points including a first feature point and a second feature point are extracted from the image, the processing unit
identifies the m codewords close to the coordinates of the second feature point, excluding codewords identified based on the first feature point, and,
when the number of codewords having a value of 0 is less than a predetermined number, repeats the identification of codewords to limit the sparsity of the identified m codewords.
2. (deleted)

3. The face posture estimation apparatus according to claim 1, wherein the processing unit generates the encoding code by inversely calculating, with the feature point, a function composed of the identified m codewords.
4. The face posture estimation apparatus according to claim 1, wherein the processing unit applies the encoding code to a linear SVM (Support Vector Machine) to recognize the face posture from the image.
5. (deleted)

6. The face posture estimation apparatus according to claim 1, wherein the processing unit selects the codewords in order of closest distance from the coordinates through the K-nearest neighbor algorithm.
7. The face posture estimation apparatus according to claim 1, wherein the processing unit separates the face posture, distinguished by the encoding code, from the background in the image.
8. A face posture estimation method comprising:
extracting a feature point from an image;
identifying, from a codebook, m codewords (m being a natural number of 3 or more) close to the coordinates of the feature point; and
encoding the feature point, using the m codewords, into an encoding code for recognizing a face posture in the image,
wherein, when a plurality of feature points including a first feature point and a second feature point are extracted from the image, the identifying of the codewords from the codebook comprises:
identifying the m codewords close to the coordinates of the second feature point, excluding codewords identified based on the first feature point; and
when the number of codewords having a value of 0 is less than a predetermined number, repeating the identification of codewords to limit the sparsity of the identified m codewords.
9. (deleted)

10. The method of claim 8, wherein the encoding comprises generating the encoding code by inversely calculating, with the feature point, a function composed of the identified m codewords.
11. The method of claim 8, further comprising applying the encoding code to a linear SVM (Support Vector Machine) to recognize the face posture from the image.
12. (deleted)
KR1020150169265A 2015-11-30 2015-11-30 Device and method for head pose estimation KR101717377B1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
KR1020150169265A KR101717377B1 (en) 2015-11-30 2015-11-30 Device and method for head pose estimation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
KR1020150169265A KR101717377B1 (en) 2015-11-30 2015-11-30 Device and method for head pose estimation

Publications (1)

Publication Number Publication Date
KR101717377B1 true KR101717377B1 (en) 2017-03-17

Family

ID=58501940

Family Applications (1)

Application Number Title Priority Date Filing Date
KR1020150169265A KR101717377B1 (en) 2015-11-30 2015-11-30 Device and method for head pose estimation

Country Status (1)

Country Link
KR (1) KR101717377B1 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107463903A (en) * 2017-08-08 2017-12-12 北京小米移动软件有限公司 Face key point positioning method and device
CN111310512A (en) * 2018-12-11 2020-06-19 杭州海康威视数字技术股份有限公司 User identity authentication method and device
KR20210037925A (en) * 2019-09-30 2021-04-07 주식회사 씨엘 Passenger counting apparatus using computer vision and passenger monitoring system thereof
KR20230058863A (en) * 2021-10-25 2023-05-03 에스케이텔레콤 주식회사 Method for Code-Level Super Resolution And Method for Training Super Resolution Model Therefor

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20110052427A (en) * 2009-11-12 2011-05-18 한국전자통신연구원 Method and apparatus for video encoding/decoding using adaptive codeword assignment to syntax element
KR20120079083A (en) * 2009-09-02 2012-07-11 Rockstar Bidco, LP Systems and methods of encoding using a reduced codebook with adaptive resetting

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20120079083A (en) * 2009-09-02 2012-07-11 Rockstar Bidco, LP Systems and methods of encoding using a reduced codebook with adaptive resetting
KR20110052427A (en) * 2009-11-12 2011-05-18 한국전자통신연구원 Method and apparatus for video encoding/decoding using adaptive codeword assignment to syntax element

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107463903A (en) * 2017-08-08 2017-12-12 北京小米移动软件有限公司 Face key point positioning method and device
CN107463903B (en) * 2017-08-08 2020-09-04 北京小米移动软件有限公司 Face key point positioning method and device
CN111310512A (en) * 2018-12-11 2020-06-19 杭州海康威视数字技术股份有限公司 User identity authentication method and device
CN111310512B (en) * 2018-12-11 2023-08-22 杭州海康威视数字技术股份有限公司 User identity authentication method and device
KR20210037925A (en) * 2019-09-30 2021-04-07 주식회사 씨엘 Passenger counting apparatus using computer vision and passenger monitoring system thereof
KR102287683B1 (en) * 2019-09-30 2021-08-10 주식회사 씨엘 Passenger counting apparatus using computer vision and passenger monitoring system thereof
KR20230058863A (en) * 2021-10-25 2023-05-03 에스케이텔레콤 주식회사 Method for Code-Level Super Resolution And Method for Training Super Resolution Model Therefor
KR102566798B1 (en) * 2021-10-25 2023-08-11 에스케이텔레콤 주식회사 Method for Code-Level Super Resolution And Method for Training Super Resolution Model Therefor

Similar Documents

Publication Publication Date Title
KR102221118B1 (en) Method for extracting feature of image to recognize object
KR102010378B1 (en) Device and method to extract feature of image including object
KR102077260B1 (en) Method and apparatus of face recognition using confidence based on probabilistic model
KR101717377B1 (en) Device and method for head pose estimation
CN111079683A (en) Remote sensing image cloud and snow detection method based on convolutional neural network
KR102043960B1 (en) Method and systems of face expression features classification robust to variety of face image appearance
KR102399025B1 (en) Improved data comparison method
KR101451854B1 (en) Apparatus for recongnizing face expression and method thereof
KR20220076398A (en) Object recognition processing apparatus and method for ar device
JP2017515222A (en) Line segmentation method
US20140233848A1 (en) Apparatus and method for recognizing object using depth image
KR20170024303A (en) System and method for detecting feature points of face
KR102434574B1 (en) Method and apparatus for recognizing a subject existed in an image based on temporal movement or spatial movement of a feature point of the image
US20160078314A1 (en) Image Retrieval Apparatus, Image Retrieval Method, and Recording Medium
CN110826554B (en) Infrared target detection method
Vafadar et al. A vision based system for communicating in virtual reality environments by recognizing human hand gestures
CN114140831A (en) Human body posture estimation method and device, electronic equipment and storage medium
CN107533671B (en) Pattern recognition device, pattern recognition method, and recording medium
Mantecón et al. Enhanced gesture-based human-computer interaction through a Compressive Sensing reduction scheme of very large and efficient depth feature descriptors
EP3192010A1 (en) Image recognition using descriptor pruning
JP7031686B2 (en) Image recognition systems, methods and programs, as well as parameter learning systems, methods and programs
JP6393495B2 (en) Image processing apparatus and object recognition method
KR102399673B1 (en) Method and apparatus for recognizing object based on vocabulary tree
KR101514551B1 (en) Multimodal user recognition robust to environment variation
KR102014093B1 (en) System and method for detecting feature points of face

Legal Events

Date Code Title Description
E701 Decision to grant or registration of patent right
GRNT Written decision to grant
FPAY Annual fee payment

Payment date: 20191203

Year of fee payment: 4