WO2013009020A2

WO2013009020A2 - Method and apparatus for generating viewer face-tracing information, recording medium for same, and three-dimensional display apparatus

Info

Publication number: WO2013009020A2
Application number: PCT/KR2012/005202
Authority: WO
Inventors: 이인권; 이정헌
Original assignee: Lee In Kwon; Lee Jeong Heon
Priority date: 2011-07-08
Filing date: 2012-06-29
Publication date: 2013-01-17
Also published as: WO2013009020A4; WO2013009020A3; KR101216123B1; US20140307063A1

Abstract

The present invention relates to a method and an apparatus for generating viewer face-tracing information, a recording medium for same, and a three-dimensional display apparatus, and the method for generating the viewer face-tracing information for controlling dimensionality of the three-dimensional display apparatus according to information on a gazing direction and/or a gazing distance of a viewer, comprising the steps of: (a) detecting a face region of the viewer from an image that is extracted from an image that is inputted through an image input means, which is provided on one position on a side of the three-dimensional display apparatus; (b) detecting facial characteristics from the face region which is detected; (c) estimating an optimal transformation matrix for generating a three-dimensional viewer face model, which corresponds to the facial characteristics, by transforming model characteristics of a three-dimensional standard face model; and (d) generating the viewer face-tracing information by estimating the gaze direction and/or the gaze distance of the viewer based on the optimal transformation matrix.

Description

Method and apparatus for generating viewer face tracking information, recording medium and 3D display device

The present invention relates to a method and apparatus for generating viewer face tracking information, a recording medium and a three-dimensional display apparatus.

More specifically, the facial feature point in the viewer's face is detected from the image extracted from the image input through the image input means, and the viewer's gaze direction for controlling the stereoscopic effect of the 3D display device using the facial feature point and the optimal transformation matrix. And a method and apparatus for generating a viewer face tracking information for generating information on gaze distance, a recording medium, and a three-dimensional display device.

Human eyes are about 6.5 cm apart in the transverse direction.

The resulting binocular disparity acts as the most important factor for the three-dimensional feeling.

That is, the left eye and the right eye see different 2D images.

When these two images are delivered to the brain through the retina, the brain exactly fuses them together to create the depth and realism of the original 3D stereoscopic image.

In this way, a single image is created from two images obtained by the visual difference between two eyes and shows the difference between the two eyes so that a person can feel the liveness and reality as if they are in the place where the image is being made. The technology is called 3D stereoscopic imaging technology.

3D stereoscopic image technology has become a core technology that is widely applied to the development of all existing industrial products such as 3D TV, information and communication, broadcasting, medical, film, games, animation and so on.

For example, 3D TV is a device that inputs images for left and right eyes to each eye on a display using special glasses and recognizes 3D in human cognitive / information system using binocular parallax principle.

The 3D TV separates a left / right image that causes an artificial visual difference from a display and delivers it to both eyes, thereby making the brain feel a 3D stereoscopic feeling.

For example, a passive 3D TV is composed of an optical film, a liquid crystal, and a polaroid film (PR film), as shown in FIG. 1.

In the passive 3D TV, as shown in FIG. 2, when viewing at the same height as the TV screen from the front of the TV screen, the image to be seen by the left eye denoted by L is the left eye and the right denoted by R. The image to go to the eyes is displayed on the right eye, and the 3D stereoscopic feeling is felt.

However, as shown in FIG. 3, when the viewer does not watch from the front of the TV screen but views from the front left and right sides of the 3D TV, a crosstalk phenomenon in which the images overlap is generated. This makes it difficult for the viewer to feel normal 3D stereoscopic feeling.

This is caused by viewing an image that should not be visible on each eye because of the viewing angle, and the closer the distance between the viewer and the 3D TV screen is, the worse it becomes.

Therefore, a control technology is required such as tracking the direction and the position at which the viewer stares, controlling the stereoscopic effect of the 3D TV, or rotating the 3D TV screen.

On the other hand, recently, the development of auto glasses-free 3D TV has been accelerated due to the inconvenience of the 3D TV using the special glasses.

The glasses-free 3D TV is a TV that can provide 3D images without using special glasses, and in order to apply the glasses-free method, a technology for tracking a viewer's gaze is further required.

One example of a technique for tracking the direction in which the viewer stares is to track the viewer's eyes.

The method of tracking the viewer's eyes uses a method of outputting the coordinates of the pupil using an eye tracking algorithm after grasping the feature points of the eye position.

In detail, a method of detecting a boundary line between an iris and an egg white in a face image and tracking the same is used.

However, this method has a problem that it is difficult to accurately determine the angle at which the eye gazes, and the eye tracking angle is small.

As another example of a technique of tracking a viewer's gaze, there is a template matching method of finding and tracking a feature point of a face.

However, since the template matching method should be given a template corresponding to the feature point of the face at first, there is a problem that it is not common and is followed by constraints.

An object of the present invention for solving the problems according to the prior art is to detect the facial feature in the viewer's face from the image extracted from the image input through the image input means, and using the facial feature and the optimal conversion matrix three-dimensional display Disclosed is a method and apparatus for generating a viewer's face tracking information for generating information about a viewer's gaze direction and gaze distance for controlling a stereoscopic effect of a device, a recording medium, and a three-dimensional display device.

An embodiment of the present invention for achieving the above object, as a viewer face tracking information generation method for controlling the stereoscopic sense of the three-dimensional display device corresponding to at least one of the gaze direction and gaze distance of the viewer, ( a) detecting a face region of the viewer from an image extracted from an image input through an image input means provided at one position of the 3D display apparatus; (b) detecting a facial feature point in the detected face region; (c) estimating an optimal transformation matrix for generating a 3D viewer face model corresponding to the face feature by converting the model feature points of the 3D standard face model; And (d) estimating at least one of the gaze direction and gaze distance of the viewer based on the optimal transformation matrix to generate viewer face tracking information.

In accordance with another aspect of the present invention, there is provided a viewer face tracking information generation method for controlling a stereoscopic feeling of a 3D display apparatus in response to at least one of a gaze direction and a gaze distance of a viewer, wherein the 3D display is performed. A face region detecting step of detecting a face region of the viewer from an image extracted from an image input through an image input means provided at one position of a device side; A gaze information generation step of generating gaze information by estimating at least one information of gaze direction and gaze distance of the viewer based on the detected face region; And generating viewer information by estimating at least one piece of information of the gender and the age of the viewer based on the detected face region.

According to another aspect of the present invention, there is provided a computer-readable recording medium recording a program for executing each step of the viewer face tracking information generation method.

According to another aspect of the present invention, there is provided a three-dimensional display device for controlling the three-dimensional effect by using the viewer face tracking information generation method.

In accordance with another aspect of the present invention, there is provided a viewer face tracking information generation device for controlling a stereoscopic feeling of a 3D display device in response to at least one of a gaze direction and a gaze distance of a viewer, wherein the 3D display is provided. A face region detection module for detecting a face region of the viewer from an image extracted from an image input through an image input means provided at a position of a device side; A facial feature point detection module for detecting a facial feature point in the detected face area; A matrix estimation module for transforming a model feature point of a 3D standard face model to estimate an optimal transformation matrix for generating a 3D viewer face model corresponding to the face feature point; And a tracking information generation module for estimating at least one of a gaze direction and a gaze distance of the viewer based on the estimated optimal transformation matrix to generate viewer face tracking information.

In accordance with another aspect of the present invention, there is provided a viewer face tracking information generation device for controlling a stereoscopic feeling of a 3D display device in response to at least one of a gaze direction and a gaze distance of a viewer, wherein the 3D display is provided. Means for detecting a face region of the viewer from an image extracted from an image input through an image input means provided at a position on the apparatus side; Means for generating gaze information by estimating at least one of gaze direction and gaze distance of the viewer based on the detected face region; And means for estimating at least one of gender and age of the viewer based on the detected face region to generate viewer information.

As described above, the present invention estimates the gaze direction and gaze distance of a viewer by using an optimal transformation matrix for converting the model feature points of the 3D standard face model to generate a 3D viewer face model corresponding to the face feature points of the face region. do.

As described above, since the gaze direction and gaze distance are estimated, the tracking speed is high, so it is suitable for real-time tracking, and there is an advantage that the face area can be robustly tracked even in the local distortion of the face area.

In addition, since it is determined whether the detected face area is valid and face feature points are detected for the face area determined to be valid, there is an advantage that the detection reliability of the face feature point is high and the tracking performance of the face area is increased.

In addition, since an asymmetric similar feature (harr-like feature) is used to detect the non-frontal face region, the detection reliability of the face region with respect to the non-frontal face is high, thereby increasing the tracking performance of the face region.

In addition, basically, the gaze direction and gaze distance of the viewer are estimated to generate gaze direction information and gaze distance information, and additionally, at least one of gender or age of the viewer is estimated to generate viewer information.

As described above, it is possible to control the stereoscopic sense of the 3D display device by additionally utilizing the viewer information as well as the gaze direction information and gaze distance information, and thus, there is an advantage that more accurate stereoscopic adjustment is possible.

In addition, by estimating whether or not the viewer's eyes are closed, when the viewer watching the 3D display device is estimated to be closed, the screen output of the 3D display device may be used as information for turning off or stopping playback. There is this.

In addition, there is an advantage that it is possible to accurately track the gaze direction, gaze distance of the viewer with only one image input means (for example, a camera).

1 is a configuration diagram showing a schematic configuration of a passive 3D TV.

2 is a state diagram showing a state of watching a passive 3D TV from the front;

3 is a state diagram illustrating a state in which a passive 3D TV is viewed from the side;

4 is a block diagram showing a schematic configuration of a viewer face tracking information generating device according to an embodiment of the present invention.

5 is a picture showing a three-dimensional standard face model in connection with the viewer face tracking information generation according to an embodiment of the present invention.

FIG. 6A is a first picture showing an example screen of a UI module in connection with generating viewer face tracking information according to an embodiment of the present invention. FIG.

FIG. 6B is a second picture showing an example screen of a UI module in connection with generating viewer face tracking information according to an embodiment of the present invention. FIG.

7 is a flowchart illustrating a process of a viewer face tracking information generation method according to an embodiment of the present invention.

8 is a view showing the basic shape of a conventional Harr-like feaure.

9 is an exemplary photograph of a harr-like feaure for detecting a front face region in relation to the generation of viewer face tracking information according to an embodiment of the present invention.

FIG. 10 is an exemplary photograph of a harr-like feaure for detecting a non-frontal face region in connection with generating viewer face tracking information according to an embodiment of the present invention. FIG.

FIG. 11 is a diagram illustrating a newly added rectangular feaure in connection with generating viewer face tracking information according to an embodiment of the present invention. FIG.

FIG. 12 is an exemplary photograph of a harr-like feaure selected from FIG. 11 for detecting a non-frontal face region in connection with generating viewer face tracking information according to an embodiment of the present invention. FIG.

Figure 13 is a feature probability curve in a training set for a conventional Harr-like feaure and Harr-like feaure applied to the present invention.

14 is a table showing the variance of the probability curve of the newly added features and the existing Harr-like feaure and the mean value of Kurtosis in the training set of the non-facial face.

15 is a profile picture applied to the conventional ASM method for a low-resolution or poor image quality.

16 is a photograph of the pattern around each marker point used in Adaboost for marker point search of the present invention.

FIG. 17 is a photograph showing 28 feature points of a face in connection with generating viewer face tracking information according to an embodiment of the present invention. FIG.

18 is a flowchart illustrating a matrix estimation process of a method for generating viewer face tracking information according to an embodiment of the present invention.

19 is a flowchart illustrating a gender estimation process of a method for generating viewer face tracking information according to an embodiment of the present invention.

20 is an exemplary photograph for defining a gender estimation face area in the gender estimation process of the viewer face tracking information generation method according to an embodiment of the present invention.

21 is a flowchart illustrating an age estimation process of a method for generating viewer face tracking information according to an embodiment of the present invention.

22 is an exemplary photograph for defining an age estimation face region in an age estimation process of a method for generating viewer face tracking information according to an embodiment of the present invention.

23 is a flowchart illustrating a process of estimating eye closure of a method of generating viewer face tracking information according to an embodiment of the present invention.

24 is an exemplary photograph for defining a face region for eye closure estimation in a process of eyelid estimation of a method for generating viewer face tracking information according to an embodiment of the present invention.

25 is a plan view for explaining a coordinate system (camera coordinate system) of the image input means in connection with generating the viewer face tracking information according to an embodiment of the present invention.

The present invention can be embodied in many other forms without departing from the spirit or main features thereof.

Therefore, the embodiments of the present invention are merely examples in all respects and should not be interpreted limitedly.

Terms such as first and second may be used to describe various components, but the components should not be limited by the terms.

The terms are used only for the purpose of distinguishing one component from another.

For example, without departing from the scope of the present invention, the first component may be referred to as the second component, and similarly, the second component may also be referred to as the first component.

The term and / or includes a combination of a plurality of related items or any item of a plurality of related items.

When a component is referred to as being "connected" or "connected" to another component, it may be directly connected to or connected to that other component, but it may be understood that other components may be present in between. Should be.

On the other hand, when a component is said to be "directly connected" or "directly connected" to another component, it should be understood that there is no other component in between.

The terminology used herein is for the purpose of describing particular example embodiments only and is not intended to be limiting of the present invention.

Singular expressions include plural expressions unless the context clearly indicates otherwise.

In this application, the terms "comprise", "comprise", "have" and the like are intended to indicate that there are features, numbers, steps, operations, components, parts, or combinations thereof described in the specification.

Therefore, it should be understood that it does not exclude in advance the possibility of the presence or addition of one or more other features or numbers, steps, operations, components, parts or combinations thereof.

Unless defined otherwise, all terms used herein, including technical or scientific terms, have the same meaning as commonly understood by one of ordinary skill in the art.

Terms such as those defined in the commonly used dictionaries should be construed as having meanings consistent with the meanings in the context of the related art, and are not construed in ideal or excessively formal meanings unless expressly defined in this application. Do not.

Hereinafter, exemplary embodiments of the present invention will be described in detail with reference to the accompanying drawings, and the same or corresponding components will be denoted by the same reference numerals regardless of the reference numerals and redundant description thereof will be omitted.

In the following description of the present invention, if it is determined that the detailed description of the related known technology may obscure the gist of the present invention, the detailed description thereof will be omitted.

Disclosed is a viewer face tracking information generating apparatus for controlling a stereoscopic feeling of a 3D display device in response to at least one of a gaze direction and a gaze distance of a viewer.

The viewer face tracking information generating device includes a computing element such as a central processing unit, a system DB, a system memory, and an interface.

The viewer face tracking information generating device may be a conventional computer system connected to a 3D display device such as a 3D TV to transmit and receive a control signal.

The viewer face tracking information generating apparatus can be regarded as functioning as the viewer face tracking information generating apparatus by installing and driving the viewer face tracking information generating program in the above-described conventional computer system.

In another aspect, the viewer face tracking information generation device of the present embodiment may be configured in the form of an embedded device in a three-dimensional display device such as a 3D TV.

A description of the general configuration of such a computer system is omitted, and the following description will focus on the configuration of functional aspects required for the description of the embodiments of the present invention.

The viewer face tracking information generating device includes a face region detection module 100.

The face region detection module 100 is captured by the image capture unit 20 captured by an image input unit 10, for example, an image input through a camera, provided at a position of the 3D display apparatus. The facial region of the viewer is detected from the image.

In this case, the detection viewing angle may be all faces in the range of -90 to +90.

For example, as illustrated in FIG. 25, the image input means 10 may be installed at the top or bottom side of the center portion of the 3D TV 1.

The image input means 10 may be a camera capable of capturing a face of a viewer located in front of a TV screen in real time as a video, and more preferably, a digital camera having an image sensor.

Even if only one image input means 10 of the present embodiment is provided, the viewer face tracking information described later may be generated.

The face area detection module 100 generates a YCbCr color model from the RGB color information of the extracted image, separates color information and brightness information from the created color model, and detects a face candidate area based on the brightness information. Perform the function.

The face region detection module 100 defines a quadrilateral feature point model for the detected face candidate region, and detects the face region based on the training material learned by the AdaBoost learning algorithm. Do this.

The face area detection module 100 performs a function of determining the detected face area as a valid face area when the size of the result value of the AdaBoost exceeds a predetermined threshold value.

The viewer face tracking information generation device also includes a face feature point detection module 200.

The facial feature point detection module 200 performs facial feature point detection on face areas determined to be valid in the face area detection module 100.

The facial feature detection module 200 may detect 28 facial feature points, including, for example, a face viewing rotation angle, for which each position of an eyebrow, an eye, a nose, and a mouth can be defined.

In this embodiment, a total of eight feature points, preferably four eyes, two noses, and two mouths, which are basic facial feature points, can be detected as facial feature points.

The viewer face tracking information generation device also includes a matrix estimation module 300.

The matrix estimation module 300 estimates an optimal transformation matrix for generating a 3D viewer face model corresponding to the face feature by converting a model feature point of the 3D standard face model.

Here, the 3D standard face model may be a 3D mesh model composed of 331 points and 630 triangles, as shown in FIG. 5.

The viewer face tracking information generation device also includes a tracking information generation module 400.

The tracking information generation module 400 estimates at least one of the gaze direction and gaze distance of the viewer based on the optimal transformation matrix to generate viewer face tracking information.

The viewer face tracking information generation device also includes a gender estimation module 500.

The gender estimating module 500 estimates the gender of the viewer using the detected face region.

The gender estimating module 500 cuts out a gender estimation face area from the detected face area, normalizes the cut out face area image, and estimates a sex by a SVM (Support Vector Machine) using the normalized image. Do this.

The viewer face tracking information generation device also includes an age estimation module 600.

The age estimation module 600 estimates the age of the viewer using the detected face region.

The age estimation module 600 cuts out an age estimation face area from the detected face area.

The age estimation module 600 performs a function of normalizing the cropped face region image.

The age estimating module 600 constructs an input vector from a normalized image and performs projection on a nine-body space.

The age estimation module 600 performs a function of estimating age using a second order polynomial regression.

The viewer face tracking information generation device also includes an eyelid estimation module 700.

The eyelid estimation module 700 estimates the eyelids of the viewer using the detected face region.

The eyelid estimation module 700 performs a function of cutting a face region for eyelid estimation, a function of normalizing the cut-out face region image, and an eyelid estimation function by a support vector machine (SVM) using the normalized image. .

The viewer face tracking information generating apparatus may also display the setting of the image input means 10 provided on one side of the 3D display apparatus (FIG. 6A), the detected face region, the age / gender result, and the like (FIG. 6B). It is provided with a UI (User Interface) module.

7 is a flowchart illustrating a process of generating a viewer face tracking information according to an embodiment of the present invention.

As shown, the viewer face tracking information generation method according to the present embodiment starts from the start of the generation process, and includes the face area detection step S100, the facial feature point detection step S200, the matrix estimation step S300, and the tracking. After the information generation step (S400), gender estimation step (S500), age estimation step (S600), eye closure estimation step (S700), the result output step (S800) is made to the end step.

In the face region detection step (S100), the face region of the viewer is detected from an image extracted from an image input through an image input means provided at one position of the 3D display apparatus.

As a method for face detection, for example, a knowledge-based method, a feature-based method, a template-matching method, an appearance-based method, and the like.

Preferably, in this embodiment, an appearance-based method is used.

The appearance-based method is a method of acquiring a face region and a non-face region from different images, learning the acquired regions to make a learning model, and comparing the input image and the learning model data to detect a face.

The appearance-based method is known as a relatively high performance method for front and side face detection.

Regarding this face detection, Jianxin Wu, S. Charles Brubaker, Matthew D. Mullin, and James M. Rehg, "Fast Asymmetric Learning for Cascade Face Detection," by IEEE Transcription on Pattern Analysis and Machine Intelligence, 30, No. 3, MARCH 2008.), and Paul Viola, Michael Jones, "Rapid Object Detection using a Boosted Cascade of Simple Features" (Accepted Conference on Computer Vision and Pattern Recognition 2001.). have.

Image extraction from an image input through the image input means may be performed by capturing an image from an image input through the image input means, for example, using a sample grabber of DirectX.

As a preferred example of extracting an image from an image input through the image input means, the media type of the sample grabber may be set to RGB24.

On the other hand, when the image format of the image input means is different from RGB24, a video converter filter is automatically attached to the front of the sample grabber filter so that the image captured by the sample grabber finally becomes RGB24.

E.g,

AM_MEDIA_TYPE mt;

// Set the media type to Sample Grabber

ZeroMemory (& mt, sizeof (AM_MEDIA_TYPE));

mt.formattype = FORMAT_VideoInfo;

mt.majortype = MEDIATYPE_Video;

mt.subtype = MEDIASUBTYPE_RGB24; // only accept 24-bit bitmaps

hr = pSampleGrabber-> SetMediaType (&mt);

It can be configured as.

Meanwhile, in the face area detection of the present embodiment, (a1) a YCbCr color model is generated from the RGB color information of the extracted image, color information and brightness information are separated from the generated color model, and the face candidate area is determined by the brightness information. Detecting; (a2) defining a quadrilateral feature point model for the detected face candidate region, and detecting a face region based on learning data trained by the AdaBoost learning algorithm on the quadrilateral feature point model; And (a3) determining the detected face area as a valid face area when the size of the result value of AdaBoost (CF _H (x) of Equation 1) exceeds a predetermined threshold value. do.

[Equation 1]

(However, M: the number of total classifiers constituting the strong classifiers

h _m (x): Output value from the mth weak classifier

θ: A value used to finely adjust the error judgment rate of the strong classifier.

The AdaBoost learning algorithm is known as an algorithm that generates a strong classifier with high detection performance through linear combination of weak classifiers.

In this embodiment, in order to further improve the detection performance in the non-face face, as well as the existing symmetrical Haar-Like feature, it further includes new features considering the asymmetry characteristic of the face.

In frontal face images, the structural features unique to the face, such as eyes, nose and mouth, are evenly distributed throughout the image and are symmetrical.

However, in the non-facial face image, it is not symmetrical and is concentrated in a narrow range. Since the face outline is not a straight line, the background area is mixed.

Therefore, there is a problem that high detection performance for the non-frontal face cannot be obtained only with the existing symmetric Haar-Like features.

In order to overcome this problem, the present embodiment further includes new Haar-Like features similar to the existing Haar-like features but adding asymmetry.

In this regard, FIG. 8 is a basic form of a conventional Harr-like feaure, FIG. 9 is an exemplary photograph of Haar-like features selected for front face area detection according to an embodiment of the present invention, and FIG. An example photograph of Haar-like features selected for area detection.

FIG. 11 shows a rectangular Haar-Like feature newly added by the present embodiment, and FIG. 12 shows an example of Haar-Like features selected for non-face detection among the Haar-Like features of FIG. 11. have.

Unlike the conventional symmetric Haar-Like feature, the Haar-Like feature of the present embodiment is configured to asymmetrically form, structure, and shape as shown in FIG. Excellent detection effect on the front face.

FIG. 13 is a probability curve of a Haar-Like feature in a training set for a conventional Harr-like feaure and a Harr-like feaure applied to this embodiment.

A) is the present case, b) is the existing case, and as shown, the probability curve corresponding to the case of the present embodiment is concentrated in a narrower range.

This means that the Haar-Like features added in this embodiment are effective in the face detection in view of the base classification rule.

FIG. 14 is a table showing newly added features in a training set of a non-facial face and an average value of variance and Kurtosis of the probability curve of the existing Harr-like feaure.

The table shows the variances and probability values of Kurtosis of the probability curves of the newly added Haar-Like features and existing Haar-Like features in the training set of the non-facial face.

The Haar-Like features added in this example show that the dispersion is small and the Kurtosis is large, which is effective in detection.

As described above, in the step (a2), the har-like feature for detecting the face area further includes an asymmetric har-like feature for detecting the non-frontal face area. do.

On the other hand, as a method for determining the validity of the face, for example, a method using a PCA (Principle Component Analysis) or a neural network, there is a disadvantage that these methods are slow and requires a separate analysis.

Therefore, in one embodiment of the present invention, the validity of the detected face is determined by comparing the magnitude of the result value of AdaBoost (CF _H (x) of Equation 1) with a predetermined threshold value.

In the conventional AdaBoost method, only a code value is used as in the following Equation 1, but in this embodiment, the validity of the face area is determined using its actual size.

… … … [Reference Formula 1]

That is, in Equation 1, the size of CF _H (x) can be used as an important factor for determining the validity of the face.

This value CF _H (x) is a measure of how close the detected area is to the face and can be used to determine the validity of the face by setting a predetermined threshold value.

At this time, the predetermined threshold is empirically set using the learning face group.

In the facial feature detection step S200, a facial feature point is detected in the detected face region.

The facial feature detection step S200 is performed by searching for a landmark of the ASM method, and detects the facial feature by proceeding using the AdaBoost algorithm.

For example, the detection of the facial feature point (b1) defines the position of the current feature point as (x _l , y _l ), and all possible partial windows of n * n pixel size in the vicinity of the current feature point position. Classifying them into a classifier; (b2) calculating candidate positions of the feature points according to Equation 2 below; And (b3) setting (x ' _l , y' _l ) as a new feature point if the condition of Equation 3 is satisfied, and maintaining the position (x _l , y _l ) of the current feature point if not satisfied. It is configured to include.

[Equation 2]

[Equation 3]

(However, the maximum near distance searched in the a: x axis direction

b: Maximum near distance searched in the y-axis direction

x _dx _, _dy : partial window centered around (dx, dy) from (x _l , y _l )

N _all : Total stage number of classifier

N _pass : the number of steps through which the partial window has passed

c: constant value less than 1 obtained from experiments to limit the reliability of partial windows not passed to the end)

As a method for detecting a feature point of a face, there are, for example, a method of individually detecting feature points and a method of simultaneously detecting a feature point in correlation.

Since the method of detecting feature points individually has many problems of detecting errors in partially obscured face images, in this embodiment, the Active Shape Model (ASM) method, which is a preferable method for face feature detection in terms of speed and accuracy, is used. I use it.

These ASM methods are discussed in TFCootes, CJTaylor, DHCooper, and J. Graham's paper, “Active shape models: Their training and application” (CVGIP: Image Understanding, Vol. 61, pp.38-59, 1995). SCYan, C.Liu, SZLi, L.Zhu, HJZhang, H.Shum, and Q.Cheng's paper “Texture-constrained active shape models” (In Proceedings of the First International Workshop on Generative-Model-Based Vision (with ECCV), May 2002), TFCootes, GJEdwards, and CJ Taylor's paper “Active appearance models” (In ECCV 98, Vol. 2, pp. 484-498, 1998) TFCootes, G.Edwards, and CJTaylor's paper “Comparing Active Shape Models with Active Appearance Models” can be understood.

On the other hand, since the feature point search of the existing ASM is a method using a profile at the feature point, detection is stable only in high quality images.

In general, an image extracted from an image input through an image input means such as a camera may be obtained as a low resolution, low quality image.

Therefore, in the present embodiment, the feature point is searched by the AdaBoost method to improve the feature, so that the feature points can be easily detected even in low resolution and low quality images.

FIG. 15 is a profile picture applied to an existing ASM method for an image having a low resolution or poor image quality. FIG. 16 is a pattern picture around each mark point used in Adaboost for mark point search of the present invention.

In the facial feature point detection step S200 and the estimation information generation step S400, as illustrated in FIG. 17, a plurality of feature points (for example, 28) may be detected.

In the present embodiment, eight basic facial features (4 eyes (4, 5, 6, 7), 2 noses (10, 11) and 2 mouths (8, 9)) in consideration of arithmetic processing and tracking performance together Bay is used to estimate gaze distance and gaze direction.

In the matrix estimating step S300, as illustrated in FIG. 18, eight facial feature points input S310, for example, the coordinate values of the detected eight feature points are stored in a memory device in which the program of the present embodiment is driven. Loading into the input value), 3D standard face model loading (S320, for example, the overall coordinate information of the 3D face model stored in the DB, the computing means that the program is driven as the input value), optimal Conversion matrix estimation (S330) is performed.

The estimation information generation step (S400) of calculating the gaze direction and gaze distance from the estimated optimal transformation matrix is performed.

As shown in FIG. 5, the 3D standard face model is a 3D mesh model composed of 331 points and 630 triangles.

The estimating information generating step (S400) generates viewer face tracking information by estimating at least one of the gaze direction and gaze distance of the viewer based on the optimal transformation matrix.

The optimal transformation matrix estimation is performed by calculating (c1) a transformation equation of Equation 4 using a 3 * 3 matrix M for face rotation information of the 3D standard face model and a 3D vector T for face parallel movement information. Step M and T are variables having respective components as variables and defining the optimal transformation matrix; (c2) calculating the three-dimensional vector P 'of Equation 5 using the camera feature point position vector P _C obtained by Equation 4 and the camera transformation matrix M _C obtained by Equation 6 below; ; (c3) defining a two-dimensional vector P _I as (P ' _x / P' _z , P ' _y / P' _z ) based on the three-dimensional vector P '; And (c4) estimating each variable of the optimal transformation matrix using the two-dimensional vector P _I and the coordinate values of the facial feature points detected in the step (b).

[Equation 4]

P _C = M * P _M + T

[Equation 5]

P '= M _c * P _c

(Where P 'is a three-dimensional vector defined by (P' _x , P ' _y , P' _z ))

The optimal transform matrix is mathematically composed of a 3 * 3 matrix M and a 3D vector T. Here, the 3 * 3 matrix M reflects the rotation information of the face, and the 3D vector T reflects the parallel movement information of the face.

First, according to Equation 4, the feature point position (three-dimensional vector) P _M in the coordinate system of the three-dimensional standard face model is the position (three-dimensional vector) P in the camera coordinate system by the optimal transformation matrix (M, T). converted to _c .

In this case, the 3D standard face model coordinate system is a 3D coordinate system whose coordinate center is located at the center of the 3D standard face model, and the camera coordinate system is a 3D coordinate system whose center is located at the center of the image input means (10 in FIG. 25). .

Next, P ', which is a three-dimensional vector defined by (P'x, P'y, P'z), is obtained using the camera feature point position vector P _c and the camera transformation matrix M _{c according} to Equation 5. .

Here, the camera transformation matrix _Mc is a 3 * 3 matrix determined by the focal length of the camera and the like, and is defined as in Equation 6 below.

[Equation 6]

(W: width of image input by video input means (camera)

H: Height of image input by video input means (camera)

focal_len: -0.5 * W / tan (Degree2Radian (fov * 0.5))

fov: angle of view of the camera)

Therefore, "P '= (P'x, P'y, P'z)" is defined including 12 variables of the optimal conversion matrix M, T as described below, and thus the 12 Including the variable, “P _I = (P'x / P'z, P'y / P'z)” can be defined.

The estimation process of the optimal conversion matrix (M, T) by the above process is as follows.

12 variables (3 * 3 = 9 of M and 3 of T) of the optimal transformation matrix using the position of the detected 8 basic facial feature points and the position pairs of corresponding points in the 3D standard face model for this position Are estimated using the least square method.

In other words, a target function of outputting a sum of squares of deviations between the positions of the detected feature points and the positions of the face model feature points to which the optimal transformation matrix is applied is set as 12 variables of the optimal transformation matrix.

The 12 optimal variables are calculated by solving the optimization problem that minimizes the target function.

The gaze direction information is defined by Equation 7 using each component of the rotation information related matrix M of the optimal transformation matrix, and the gaze distance information is a parallel movement related vector T of the optimal transformation matrix. Is defined.

[Equation 7]

Where m ₁₁ , m ₁₂ , ..., m ₃₃ : estimated values of each component of the 3 * 3 matrix M

That is, the gaze direction information becomes (a _x , a _y , a _z ), and the gaze distance information is defined by the parallel movement related vector T itself.

In the gender estimating step (S500), as shown in FIG. 19, the image and the facial feature point input (S510), the gender estimation face region clipping (S520), the cut face region image normalization (S530), and the gender by SVM It is made in the process of estimation (S540).

As a method for sex estimation, there are, for example, a view-based method using all of a human face and a geometric feature-based method using only geometric features of a face.

As a preferred example, the gender estimation is performed by a view-based gender classification method using SVM (Support Vector Machine) learning to normalize the detected face region to form a facial feature vector and predict the gender therewith.

The SVM method may be classified into a support vector classifier (SVC) and a support vector regression (SVR).

Regarding such gender estimation, Shumeet Baluja et al. “Boosting Sex Identification Performance”, Carnegie Mellon University, Computer Science Department (2005), Gutta, et al. “Gender and ethnic classification” .IEEE Int.Workshop on Automatic Face and Gesture Recognition, pages 194-199 (1998) and Moghaddam et al. “Learning Gender with Support Faces”. IEEE T. PAMI Vol. 24, No. 5 (2002), and the like.

In the present embodiment, the gender estimating step (S500) specifically includes: (e1) cutting out a face region for sex estimation from the detected face region based on the detected face feature points; (e2) normalizing the size of the cut face sex estimation region; (e3) normalizing a histogram of the face region for gender estimation in which the size is normalized; And (e4) constructing an input vector from the face region for gender estimation where the size and histogram are normalized, and estimating gender using a pre-learned SVM algorithm.

In the step (e1), the face area is cut out using the input image and the facial feature point. For example, as shown in FIG. 20, the half of the distance between the left and right eyes is cut to 1 and is to be cut. Calculate the area of the face.

In the step (e2), for example, the cut out facial region is normalized to 12 * 21 size.

In the step (e3), the histogram is normalized, which is a process of equalizing the number of pixels having each density value to the histogram in order to minimize the effect of the lighting effect.

In the step (e4), for example, a 252-dimensional input vector is constructed from a normalized 12 * 21 face image, and sex is estimated using a pre-trained SVM.

At this time, the gender is estimated as a male or a female if the calculated result of the classifier of Equation 8 is greater than zero.

[Equation 8]

(However, M: the number of samples,

y _i : Gender value of the i th test data, set to 1 for male and -1 for female.

α _i : coefficient of the i-th vector,

x: Exam,

x _i : Sample sample,

k: kernel function,

b: deviation)

In this case, the kernel function may use a Gaussian Radial Basis Function (GRBF) defined in Equation 9 below.

[Equation 9]

(However, x: test data, x ': learning sample data, σ: variable indicating the degree of dispersion)

Meanwhile, the kernel function may be a polynomial kernel, etc., in addition to the Gaussian copper soil function, and preferably, the Gaussian copper soil function is used in consideration of the identification performance.

On the other hand, the SVM (Support Vector Machine) method is a classification method that derives the boundary of two groups in a group having two groups and is known as a learning algorithm for pattern classification and regression.

The basic learning principle of SVMs is to find an optimal linear hyperplane with minimal predictive classification errors for invisible test samples, that is, with good generalization performance.

Based on this principle, the linear SVM uses a taxonomic method to find the linear function with the least order.

Learning problems of SVM result in linearly constrained two-dimensional planning problems.

Samples x1,… , xi, individual class labels y1,… , yi, and y = 1 if the sample is male and y = -1 if the female.

In order to determine the learning result uniquely, the following Equation 2 is restricted.

… … … [Reference Formula 2]

Given this constraint, the minimum distance between the learning sample and the hyperplane is represented by the following Equation 3, so it is necessarily as shown in the following Equation 4.

… … … [Reference Formula 3]

… … … [Reference Formula 4]

Since w and b must be determined to maximize the minimum distance while fully identifying the learning sample, w and b are formulated as shown in Equation 5 below.

… … … [Reference Formula 5]

Minimizing the objective function maximizes the value of Equation 4, which is the minimum distance.

Therefore, w and deviation b are calculated for the support vector maximizing the above objective function.

Optimal Constants for SVM with Kernel

It is determined as shown in Equation 6 below.

… [Reference Formula 6]

At this time, the constraint is shown in Equation 7 below.

… [Reference Formula 7]

Where K (x, x ') is a nonlinear kernel function.

The next deviation is calculated as shown in Equation 8 below.

… [Reference Formula 8]

If the result of calculation for the classifier of Equation 8 obtained by the above-described method is 1, it is determined as male, and if it is 0, female.

Meanwhile, although the Adaboost method may be used in the above process, considering the performance and generalization performance of the classifier, it is more preferable to use the SVM method.

For example, when Asians are trained by the Adaboost method and tested for sex estimates on Europeans, the performance is 10-15% lower than when tested by the SVM method.

From this, there is an advantage that high discrimination ability can be obtained when gender estimation is performed by SVM method under the condition that there is not enough learning data.

In the age estimating step (S600), as shown in FIG. 21, an image and a facial feature point input (S610), an age estimation face area cropping (S620), a cut out face area image normalization (S630), and a nine-body space Projection (S640), the second polynomial regression is made by the process of age estimation (S650).

Regarding age estimation methods, Y.Fu, Y.Xu, and TSHuang, “Estimating human ages by manifold analysis of face pictures and regression on aging features,” in Proc.IEEE Conf. Multimedia Expo., 2007, pp. 1383-1386 and in the papers of G.Guo, Y.Fu, TSHuang, and C.Dyer, “Locally adjusted robust regression for human age estimation,” presented at the IEEE Workshop on Applications of Computer Vision, 2008, A. Lanitis, C. Draganova, and C. Christodoulou, “Comparing different classifers for automatic age estimation,” IEEE Trans. Syst., Man, Cybern. B, Cybern., Vol. 34, no. 1, pp. 621- 628, Feb. 2004.

In the present embodiment, the estimation of the age specifically includes: (f1) cutting out an age estimation face area from the detected face area based on the detected facial feature point; (f2) normalizing the size of the cut age estimation face region; (f3) performing local illumination correction on the age estimation face region where the size is normalized; (f4) generating a feature vector by constructing an input vector from the size normalized and locally-illuminated age estimation face region and projecting it into a nine-body space; And (f5) estimating age by applying quadratic regression to the generated feature vectors.

In the step (f1), the face region is cut out using the input image and the facial feature point.

For example, as shown in FIG. 22, the face region is cut out from the binocular and the entrance point to the upper (0.8), the lower (0.2), the left (0.1), and the right (0.1), respectively.

In the step (f2), for example, the cut out face region is normalized to 64 * 64 size.

In the step (f3), in order to reduce the influence of the lighting effect, local illumination correction is performed by the following equation (10).

[Equation 10]

I (x, y) = (I (x, y) -M) / V * 10 + 127

(However, the shade value at position I (x, y) :( x, y), M: 4 value at the local window area, V: standard variance value)

The standard dispersion value (V) is a characteristic value representing the degree to which a certain amount of coincidence is scattered around the average value, and mathematically, the standard dispersion V is calculated as in Equation (9).

… … … [Reference Formula 9]

In the step (f4), for example, a 4096-dimensional input vector is constructed from a 64 * 64 face image, and a 50-dimensional feature vector is generated by projecting into a pre-learned manifold space.

The age estimation theory assumes that the characteristics of the human aging process reflected in the face image can be expressed in patterns according to any low dimensional distribution.

From this, it is basic to estimate projection projection from face image to naida body space.

We will briefly explain the learning matrix learning algorithm for Nida yang by Conformal Embedding Analysis (CEA).

Y = P ^T X... … … [Reference Formula 10]

In Ref. 10, X is an input vector, Y is a feature vector, and P is a projection matrix to Nida body trained using CEA.

In this regard, it can be understood through a paper by Yun Fu Huang, T.S., "Human Age Estimation With Regression on Discriminative Aging Manifold" in Multimedia, IEEE Transactions on, 2008, pp.578-584.

n face images x ₁ , x ₂ ,... , x _n is _replaced by X = {x ₁ ,... , x _n } ∈R ^m .

X is an m × n matrix and x _i represents every face image.

The manifold learning step is to obtain a projection matrix for representing the m-dimensional face vector as a d-dimensional face vector (aging feature vector), where d < m (d is much smaller than m).

In other words, we obtain the projection matrix P _mat whose y _i = P _mat × x _i . Where {y ₁ ,… , y _n } ∈R ^d . Here, d is set to 50.

In general, when performing face analysis, the image order m is much larger than the number n of images.

Therefore m × m matrix XX ^T is a degenerate matrix. To overcome this problem, we first project the face image into subspace without information loss using PCA, and the result matrix XX ^T becomes an immortality matrix.

(1) PCA Projection

Given n face vectors, we find the covariance matrix C _pca for this face vector group. C _pca is an m × m matrix.

The eigenvalues and eigenvectors of C _pca × Eigen _vector = Eigen _value × Eigen _vector for the covariance matrix C _pca are solved to obtain m eigenvalues and m m-dimensional eigenvectors.

Next, d matrix of eigenvectors are selected in order of eigenvalues to form matrix W _PCA .

W _PCA is an m × d matrix.

(2) Weight matrix Ws, Wd composition

Ws denotes a relationship between face images belonging to the same age group and Wd denotes a relationship between face images belonging to different groups.

… … … [Reference Formula 11]

In Ref. 11, Dist (X _i , X _j ) is the same as Ref. 12 below.

… [Reference Formula 12]

(3) CEA foundation vector calculation

The eigenvectors corresponding to the d largest eigenvalues of become CEA basis vectors.

… [Reference Formula 13]

(4) CEA silver coins

Orthogonal Vectors a ₁ ,. When, a _d is calculated, the matrix WCEA is defined as follows.

W _CEA = [a ₁ , a ₂ ,... , a _d ]… … … [Reference Formula 14]

Where W _CEA is the m × d matrix.

The projective matrix P _mat is defined as in Equation 15 below.

P _mat = W _PCA W _CEA . … … [Reference Formula 15]

The projection matrix P _mat is used to obtain aging characteristics for each face vector X.

x → y = P _mat ^T × x... … … [Reference Formula 16]

(Where y is a dimensional vector corresponding to the face vector X, ie, an aging characteristic amount)

In the step (f5), to estimate the age by applying the second regression is made by the following equation (11).

[Equation 11]

(However, b _o , b ₁ , b ₂ : regression coefficients precomputed from the learning data,

Y: aging characteristic vector calculated by reference formula 16 from test data x,

L: estimated age)

b _o , b ₁ , and b ₂ are precomputed from the learning material as follows:

The second regression model is shown in Equation 17 below.

… … … [Eq. 17]

here

Is the age of the i-th learning image

Is the feature vector of the i-th learning image.

This is expressed in the vector-matrix format as shown in Equation 18 below.

… … … [Equation 18]

here,

… … … [Reference Expression 19]

N is the number of learning materials.

Where regression constant

Is calculated as follows.

… … … [Reference Formula 20]

In the eyelid estimation step (S700), as shown in FIG. 23, the image and facial feature point input (S710), the eye region estimation for trimming the face region (S720), the cut out facial region image normalization (S730), SVM By eyelid estimation (S740) by the process is made.

In the present embodiment, the estimation of the eye closing may specifically include: (g1) cutting the eye mask estimation face area from the detected face area based on the detected facial feature point; (g2) normalizing the size of the cut-out eye mask estimation face region; (g3) normalizing a histogram of the face region for estimating the eyelid normalized in size; And (g4) constructing an input vector from the face region for eye-eye estimation for which the size and histogram are normalized, and estimating eye-eye closure using a pre-learned SVM algorithm.

In the step (g1), the eye region is cut out using the input image and the facial feature point.

For example, as illustrated in FIG. 24, the eye area may be cut out by determining the width of the feature points detected by the facial feature point detection based on both end points of the eye and determining the eye area at the same height up and down.

In the step (g2), for example, the cropped eye region image is normalized to 20 * 20 size.

In the step (g3), histogram normalization is performed to reduce the effect of the lighting effect.

In the step (g4), for example, a 400-dimensional input vector is constructed from a normalized 20 * 20 face image, and estimated whether to close the eye using a pre-learned SVM.

In the step (g4), the estimation of the eye closing is determined as the state of opening the eyes when the result value of Equation 12 is greater than 0, and the state of closing the eyes when the result value is less than 0. Is determined to be awakened.

[Equation 12]

(However, the number of M: SV vectors,

y _i : Whether to close the eye for the i-th learning material is set to 1 when the eyes are opened and -1 when the eyes are closed.

α _i : coefficient of the i-th vector,

x: test vector,

x _i : i-th learning vector,

k: kernel function,

b: deviation)

In this case, the kernel function may use a Gaussian landscape soil function defined in Equation 13.

[Equation 13]

In the result output step (S800), the sex information of the viewer and the age information of the viewer estimated by the process described above are output to the stereoscopic control means as information for controlling the stereoscopic sense of the 3D display apparatus.

In general, when developing a 3D display device, the development is based on the premise that an adult man is sitting on the front 2.5M of the 3D display device.

However, for example, in the case of 3DTV using binocular parallax, the stereoscopic effect is reduced or dizziness occurs when it is moved out of the corresponding position.

On the other hand, the average adult has a binocular distance of about 6.5cm, the brain is to calculate the depth information accordingly.

However, depending on race, gender, and age, this difference can be as small as 1cm or 1.5cm.

Therefore, the gender information and the age information of the viewer are needed to determine this and control the stereoscopic feeling of the 3D display device.

The gender information of the viewer and the age information of the viewer output by the stereoscopic control means may be used as a horizontal parallax change reference value, which means a change amount determined based on the point where the left and right images are focused. have.

That is, by controlling the stereoscopic sense of the 3D display apparatus by using the horizontal parallax change reference value based on the estimated gender information of the viewer and the age information of the viewer, a 3D screen optimized for the current viewer's viewing condition may be output and provided. It is.

On the other hand, as a result of the viewer's estimation of the gaze direction, when the viewer deviates by a predetermined angle or more from the front of the 3D display device, not when viewing from the front of the 3D display device (FIG. 25A) (for example, as illustrated in FIG. As described above, when the viewer gazes at a position 10 ° or more left or right (b in FIG. 25), the following processing can be performed.

The output direction of the 3D display apparatus may be changed by using rotation driving means (not shown) so that the front side of the 3D display apparatus faces the corresponding viewer.

Alternatively, the viewer may be guided so that the viewer can move to the front of the 3D display by outputting captions such as “deviating from the viewing angle” and “winding to the front of the screen” on the screen of the 3D display.

In addition, in the result output step (S800), the eye contact information estimated by the above-described process is output to the screen power control means as information for controlling the ON / OFF screen output of the 3D display device.

That is, when it is estimated that the viewer's eye-closing state continues, the screen power control means may turn off the image output to the display device screen so that no further image output is performed.

Reference numeral 1000 in FIG. 25 denotes control means for performing such various control processes.

Embodiments of the present invention include a computer readable recording medium including program instructions for performing various computer-implemented operations.

The computer-readable recording medium may include program instructions, data files, data structures, etc. alone or in combination.

The recording medium may be one specially designed and configured for the present invention, or may be known and available to those skilled in computer software.

Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape, optical recording media such as CD-ROMs, DVDs, magnetic-optical media such as floppy disks, and ROM, RAM, flash memory, and the like. Hardware devices specifically configured to store and execute the same program instructions are included.

The recording medium may be a transmission medium such as an optical or metal wire, a waveguide, or the like including a carrier wave for transmitting a signal specifying a program command, a data structure, or the like.

Examples of program instructions include not only machine code generated by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like.

Although the present invention has been described with reference to the accompanying drawings, it will be apparent to those skilled in the art that many different and obvious modifications are possible without departing from the scope of the invention from this description. Therefore, the scope of the invention should be construed by the claims described to include many such variations.

Claims

A viewer face tracking information generation method for controlling stereoscopic feeling of a 3D display device in response to at least one piece of information of a viewer's gaze direction and gaze distance,

(a) detecting a face region of the viewer from an image extracted from an image input through an image input means provided at one position of the 3D display apparatus;

(b) detecting a facial feature point in the detected face region;

(c) estimating an optimal transformation matrix for generating a 3D viewer face model corresponding to the face feature by converting the model feature points of the 3D standard face model; And

and (d) estimating at least one of the gaze direction and gaze distance of the viewer based on the optimal transformation matrix to generate viewer face tracking information.
The method of claim 1,

In step (a),

(a1) creating a YCbCr color model from the RGB color information of the extracted image, separating color information and brightness information from the created color model, and detecting a face candidate area based on the brightness information; And

(a2) defining a quadrilateral feature point model for the detected face candidate region, and detecting a face region based on the training material trained by the AdaBoost learning algorithm on the quadrilateral feature point model; Viewer face tracking information generation method.
The method of claim 2,

After step (a2)

(a3) determining the detected face area as a valid face area when the size of the result value of AdaBoost (CF H (x) of Equation 1) exceeds a predetermined threshold value; Viewer face tracking information generation method, characterized in that.

[Equation 1]

(However, M: the number of total classifiers constituting the strong classifiers

h m (x): Output value from the mth weak classifier

θ: value used to adjust the error judgment rate of the strong classifier)
The method of claim 2,

In the step (a2),

The like-look feature for detecting the face region further comprises asymmetric like-like features for detecting the non-frontal face region.
The method of claim 1,

In step (b),

A method for generating viewer face tracking information, which is performed by searching for a landmark of an ASM method, and proceeds using an AdaBoost algorithm.
The method of claim 5,

Detection of the facial feature point,

(b1) defining a position of the current feature point as (x l , y l ), and classifying partial windows of n * n pixel size into a classifier around the current feature point;

(b2) calculating candidate positions of the feature points according to Equation 2 below; And

(b3) setting (x ' l , y' l ) as a new feature point if the condition of Equation 3 is satisfied, and maintaining the position (x l , y l ) of the current feature point if not satisfied; Viewer face tracking information generation method, characterized in that configured to include.

[Equation 2]

[Equation 3]

(However, the maximum near distance searched in the a: x axis direction

b: Maximum near distance searched in the y-axis direction

x dx , dy : partial window centered around (dx, dy) from (x l , y l )

N all : Total stage number of classifier

N pass : the number of steps through which the partial window has passed

c: constant value to limit the reliability value of partial windows not passed to the end)
The method of claim 1,

In step (c),

(c1) calculating a conversion equation of Equation 4 using a 3 * 3 matrix M of face rotation information of the 3D standard face model and a 3D vector T of face parallel movement information, wherein M and T are A matrix having each component as a variable and defining the optimal transformation matrix;

(c2) calculating the three-dimensional vector P 'of Equation 5 using the camera feature point position vector P C obtained by Equation 4 and the camera transformation matrix M C obtained by Equation 6 below; ;

(c3) defining a two-dimensional vector P I as (P ' x / P' z , P ' y / P' z ) based on the three-dimensional vector P '; And

(c4) estimating each variable of the optimal transformation matrix using coordinates of the two-dimensional vector P I and the facial feature points detected in the step (b); How to generate information.

[Equation 4]

P C = M * P M + T

[Equation 5]

P '= M c * P c

(Where P 'is a three-dimensional vector defined by (P' x , P ' y , P' z ))

[Equation 6]

(W: the width of the image input by the video input means,

H: height of the image inputted by the video input means,

focal_len: -0.5 * W / tan (Degree2Radian (fov * 0.5)),

fov: angle of view of the camera)
The method of claim 7, wherein

The gaze direction information is obtained by using Equation 7 below using the estimated respective components of the matrix M, and the gaze distance information is defined by the estimated respective components of the vector T. Way.

[Equation 7]

(M 11 , m 12 , ..., m 33 : estimated values of each component of the 3 * 3 matrix M)
The method of claim 1,

After step (d),

(e) a gender estimation step of estimating the gender of the viewer using the detected face region.
The method of claim 9,

In step (e),

(e1) cutting out a face estimation region for gender estimation from the detected face region based on the detected face feature point;

(e2) normalizing the size of the cut face sex estimation region;

(e3) normalizing a histogram of the face region for gender estimation in which the size is normalized; And

and (e4) constructing an input vector from the face region for gender estimation where the size and histogram are normalized, and estimating a gender using a pre-learned SVM algorithm.
The method of claim 1,

After step (d),

and (f) an age estimation step of estimating the age of the viewer using the detected face region.
The method of claim 11,

Estimation of the age,

(f1) cutting out an age estimation face area from the detected face area based on the detected face feature point;

(f2) normalizing the size of the cut age estimation face region;

(f3) performing local illumination correction on the age estimation face region where the size is normalized;

(f4) generating a feature vector by constructing an input vector from the size normalized and locally-illuminated age estimation face region and projecting it into a nine-body space; And

and (f5) estimating an age by applying quadratic regression to the generated feature vector.
The method of claim 1,

After step (d),

and (g) estimating eyelids of the viewer using the detected face region.
The method of claim 13,

Estimation of the eye closing,

(g1) cutting a face region for eye closure estimation from the detected face region based on the detected facial feature point;

(g2) normalizing the size of the cut-out eye mask estimation face region;

(g3) normalizing a histogram of the face region for estimating the eyelid normalized in size; And

(g4) constructing an input vector from the face region for eye-eye estimation for which the size and histogram are normalized, and estimating eye-eye by using a pre-learned SVM algorithm; generating viewer face tracking information Way.
A viewer face tracking information generation method for controlling stereoscopic feeling of a 3D display device in response to at least one piece of information of a viewer's gaze direction and gaze distance,

A face region detecting step of detecting a face region of the viewer from an image extracted from an image input through an image input means provided at one position of the 3D display apparatus;

A gaze information generation step of generating gaze information by estimating at least one information of gaze direction and gaze distance of the viewer based on the detected face region; And

And generating viewer information by estimating at least one piece of information of a gender and an age of the viewer based on the detected face region.
A computer-readable recording medium having recorded thereon a program for executing each step of the method according to any one of claims 1 to 15.
A three-dimensional display apparatus for controlling a three-dimensional effect by using the method for generating viewer face tracking information according to any one of claims 1 to 15.
A viewer face tracking information generation device for controlling a stereoscopic feeling of a three-dimensional display device in response to at least one piece of information of a viewer's gaze direction and gaze distance,

A face region detection module for detecting a face region of the viewer from an image extracted from an image input through an image input means provided at one position of the 3D display apparatus;

A facial feature point detection module for detecting a facial feature point in the detected face area;

A matrix estimation module for transforming a model feature point of a 3D standard face model to estimate an optimal transformation matrix for generating a 3D viewer face model corresponding to the face feature point; And

And a tracking information generation module configured to generate at least one of a gaze direction and a gaze distance of the viewer based on the estimated optimal transformation matrix to generate viewer face tracking information.
The method of claim 18,

The facial feature point detection module,

The apparatus for detecting facial features of a viewer characterized by detecting facial feature points by searching for landmarks of an active shape model (ASM) method, and using the AdaBoost algorithm.
The method of claim 18,

The matrix estimation module,

Using the 3 * 3 matrix M of the face rotation information of the 3D standard face model and the 3D vector T of the face parallel movement information, a conversion equation of Equation 4 is calculated, wherein M and T are variables of each component. Is a matrix defining the optimal transformation matrix; The three-dimensional vector P 'of Equation 5 is calculated by using the camera feature point position vector P C obtained by Equation 4 and the camera transformation matrix M C obtained by Equation 6 below, and the three-dimensional Based on the vector P ', the 2D vector P I is defined as (P' x / P ' z , P' y / P ' z ), and the 2D vector P I and the facial feature detected in step (b) An apparatus for tracking face tracking information of a viewer, comprising estimating each variable of the optimal transformation matrix using a coordinate value of.

[Equation 4]

P C = M * P M + T

[Equation 5]

P '= M c * P c

(Where P 'is a three-dimensional vector defined by (P' x , P ' y , P' z ))

[Equation 6]

(W: the width of the image input by the video input means,

H: height of the image inputted by the video input means,

focal_len: -0.5 * W / tan (Degree2Radian (fov * 0.5)),

fov: angle of view of the camera)
The method of claim 18,

And a gender estimating module for estimating the gender of the viewer by using the detected face region.
The method of claim 18,

And an age estimation module for estimating the age of the viewer using the detected face region.
The method of claim 18,

And an eye closure estimation module for estimating eye closure of the viewer by using the detected face region.
A viewer face tracking information generation device for controlling a stereoscopic feeling of a three-dimensional display device in response to at least one piece of information of a viewer's gaze direction and gaze distance,

Means for detecting a face region of the viewer from an image extracted from an image input through an image input means provided at one position of the 3D display apparatus;

Means for generating gaze information by estimating at least one of gaze direction and gaze distance of the viewer based on the detected face region; And

And means for generating viewer information by estimating at least one information of the gender and age of the viewer based on the detected face region.