WO2013009020A2 - Method and apparatus for generating viewer face-tracing information, recording medium for same, and three-dimensional display apparatus - Google Patents

Method and apparatus for generating viewer face-tracing information, recording medium for same, and three-dimensional display apparatus

Info

Publication number
WO2013009020A2
Authority
WO
WIPO (PCT)
Prior art keywords
face
viewer
equation
information
face region
Prior art date
Application number
PCT/KR2012/005202
Other languages
French (fr)
Korean (ko)
Other versions
WO2013009020A4 (en)
WO2013009020A3 (en)
Inventor
이인권
이정헌
Original Assignee
Lee In Kwon
Lee Jeong Heon
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Lee In Kwon, Lee Jeong Heon filed Critical Lee In Kwon
Priority to US 14/003,685 (published as US20140307063A1)
Publication of WO2013009020A2
Publication of WO2013009020A3
Publication of WO2013009020A4

Links

Images

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/30Image reproducers
    • H04N13/366Image reproducers using viewer tracking
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/246Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T7/251Analysis of motion using feature-based methods, e.g. the tracking of corners or segments involving models
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/60Analysis of geometric attributes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • G06V10/446Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering using Haar-like filters, e.g. using integral image techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161Detection; Localisation; Normalisation
    • G06V40/165Detection; Localisation; Normalisation using facial parts and geometric relationships
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/30Image reproducers
    • H04N13/366Image reproducers using viewer tracking
    • H04N13/383Image reproducers using viewer tracking for tracking with gaze detection, i.e. detecting the lines of sight of the viewer's eyes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30196Human being; Person
    • G06T2207/30201Face
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/178Human faces, e.g. facial parts, sketches or expressions estimating age from face image; using age information for improving recognition

Definitions

  • the present invention relates to a method and apparatus for generating viewer face tracking information, a recording medium and a three-dimensional display apparatus.
  • More specifically, the present invention relates to a method and apparatus for generating viewer face tracking information, a recording medium, and a three-dimensional display apparatus that detect facial feature points in the viewer's face from an image extracted from video input through an image input means and, using the facial feature points and an optimal transformation matrix, generate information on the viewer's gaze direction and gaze distance for controlling the stereoscopic effect of the 3D display device.
  • Human eyes are about 6.5 cm apart in the transverse direction.
  • the resulting binocular disparity acts as the most important factor for the three-dimensional feeling.
  • the left eye and the right eye see different 2D images.
  • The visual technology in which a single image is generated from two images obtained through the visual difference between the two eyes and is presented with that difference to both eyes, so that a person feels the liveliness and realism of being at the place where the image was produced, is called 3D stereoscopic imaging technology.
  • 3D stereoscopic image technology has become a core technology that is widely applied to the development of all existing industrial products such as 3D TV, information and communication, broadcasting, medical, film, games, animation and so on.
  • 3D TV is a device that inputs images for left and right eyes to each eye on a display using special glasses and recognizes 3D in human cognitive / information system using binocular parallax principle.
  • the 3D TV separates a left / right image that causes an artificial visual difference from a display and delivers it to both eyes, thereby making the brain feel a 3D stereoscopic feeling.
  • a passive 3D TV is composed of an optical film, a liquid crystal, and a polaroid film (PR film), as shown in FIG. 1.
  • When viewing from the front of the TV screen, the image to be seen by the left eye (denoted L) is shown to the left eye and the image to be seen by the right eye (denoted R) is shown to the right eye, so that the 3D stereoscopic effect is perceived.
  • Therefore, a control technology is required that tracks the direction and position at which the viewer gazes and, for example, controls the stereoscopic effect of the 3D TV or rotates the 3D TV screen.
  • the glasses-free 3D TV is a TV that can provide 3D images without using special glasses, and in order to apply the glasses-free method, a technology for tracking a viewer's gaze is further required.
  • One example of a technique for tracking the direction in which the viewer stares is to track the viewer's eyes.
  • The method of tracking the viewer's eyes identifies feature points for the eye positions and then outputs the pupil coordinates using an eye-tracking algorithm.
  • Specifically, a method is used that detects the boundary between the iris and the sclera (the white of the eye) in the face image and then tracks it.
  • However, with this method it is difficult to accurately determine the angle at which the eyes gaze, and the trackable eye angle is small.
  • An object of the present invention, for solving the problems of the prior art, is to provide a method and apparatus for generating viewer face tracking information, a recording medium, and a three-dimensional display apparatus that detect facial feature points in the viewer's face from an image extracted from video input through an image input means and, using the facial feature points and an optimal transformation matrix, generate information on the viewer's gaze direction and gaze distance for controlling the stereoscopic effect of the 3D display apparatus.
  • An embodiment of the present invention for achieving the above object is a viewer face tracking information generation method for controlling the stereoscopic effect of a three-dimensional display apparatus in accordance with at least one of the gaze direction and gaze distance of a viewer, comprising: (a) detecting the face region of the viewer from an image extracted from video input through an image input means provided at a position on the 3D display apparatus; (b) detecting facial feature points in the detected face region; (c) estimating an optimal transformation matrix that transforms the model feature points of a 3D standard face model to generate a 3D viewer face model corresponding to the facial feature points; and (d) generating viewer face tracking information by estimating at least one of the gaze direction and gaze distance of the viewer based on the optimal transformation matrix.
  • An embodiment according to another aspect of the present invention is a viewer face tracking information generation method for controlling the stereoscopic effect of a 3D display apparatus in accordance with at least one of the gaze direction and gaze distance of a viewer, comprising:
  • a face region detection step of detecting the face region of the viewer from an image extracted from video input through an image input means provided at a position on the 3D display apparatus;
  • a gaze information generation step of generating gaze information by estimating at least one of the gaze direction and gaze distance of the viewer based on the detected face region; and a viewer information generation step of generating viewer information by estimating at least one of the gender and age of the viewer based on the detected face region.
  • a computer-readable recording medium recording a program for executing each step of the viewer face tracking information generation method.
  • a three-dimensional display device for controlling the three-dimensional effect by using the viewer face tracking information generation method.
  • Another embodiment is a viewer face tracking information generation device for controlling the stereoscopic effect of a 3D display device in accordance with at least one of the gaze direction and gaze distance of a viewer, the device comprising:
  • a face region detection module for detecting a face region of the viewer from an image extracted from an image input through an image input means provided at a position of a device side;
  • a facial feature point detection module for detecting a facial feature point in the detected face area;
  • a matrix estimation module for transforming a model feature point of a 3D standard face model to estimate an optimal transformation matrix for generating a 3D viewer face model corresponding to the face feature point;
  • a tracking information generation module for estimating at least one of a gaze direction and a gaze distance of the viewer based on the estimated optimal transformation matrix to generate viewer face tracking information.
  • The present invention estimates the gaze direction and gaze distance of a viewer by using an optimal transformation matrix that transforms the model feature points of the 3D standard face model to generate a 3D viewer face model corresponding to the facial feature points of the face region.
  • Accordingly, the tracking speed is high, making it suitable for real-time tracking, and the face region can be tracked robustly even under local distortions of the face region.
  • In addition, since asymmetric Haar-like features are used to detect non-frontal face regions, the detection reliability for non-frontal faces is high, which increases the tracking performance for the face region.
  • Furthermore, the gaze direction and gaze distance of the viewer are estimated to generate gaze direction information and gaze distance information, and additionally at least one of the gender and age of the viewer is estimated to generate viewer information.
  • There is also the advantage that such information may be used to turn off the screen output of the 3D display device or to stop playback.
  • FIG. 1 is a configuration diagram showing a schematic configuration of a passive 3D TV.
  • FIG. 2 is a state diagram showing a state of watching a passive 3D TV from the front.
  • FIG. 3 is a state diagram illustrating a state in which a passive 3D TV is viewed from the side.
  • FIG. 4 is a block diagram showing a schematic configuration of a viewer face tracking information generating device according to an embodiment of the present invention.
  • FIG. 5 is a picture showing a three-dimensional standard face model in connection with the viewer face tracking information generation according to an embodiment of the present invention.
  • FIG. 6A is a first picture showing an example screen of a UI module in connection with generating viewer face tracking information according to an embodiment of the present invention.
  • FIG. 6B is a second picture showing an example screen of a UI module in connection with generating viewer face tracking information according to an embodiment of the present invention.
  • FIG. 7 is a flowchart illustrating a process of a viewer face tracking information generation method according to an embodiment of the present invention.
  • FIG. 8 is a view showing the basic shapes of conventional Haar-like features.
  • FIG. 9 is an exemplary photograph of Haar-like features for detecting a frontal face region in connection with generating viewer face tracking information according to an embodiment of the present invention.
  • FIG. 10 is an exemplary photograph of Haar-like features for detecting a non-frontal face region in connection with generating viewer face tracking information according to an embodiment of the present invention.
  • FIG. 11 is a diagram illustrating the newly added rectangular features in connection with generating viewer face tracking information according to an embodiment of the present invention.
  • FIG. 12 is an exemplary photograph of the Haar-like features selected from FIG. 11 for detecting a non-frontal face region in connection with generating viewer face tracking information according to an embodiment of the present invention.
  • FIG. 13 shows the probability curves, over a training set, of a conventional Haar-like feature and of a Haar-like feature applied to the present invention.
  • FIG. 14 is a table showing the variance and kurtosis of the probability curves of the newly added features and the existing Haar-like features in a training set of non-frontal faces.
  • FIG. 15 shows the profiles applied by the conventional ASM method to an image of low resolution or poor quality.
  • FIG. 16 shows the patterns around each landmark point used by AdaBoost in the landmark point search of the present invention.
  • FIG. 17 is a photograph showing 28 feature points of a face in connection with generating viewer face tracking information according to an embodiment of the present invention.
  • FIG. 18 is a flowchart illustrating a matrix estimation process of a method for generating viewer face tracking information according to an embodiment of the present invention.
  • FIG. 19 is a flowchart illustrating a gender estimation process of a method for generating viewer face tracking information according to an embodiment of the present invention.
  • FIG. 20 is an exemplary photograph for defining a gender estimation face area in the gender estimation process of the viewer face tracking information generation method according to an embodiment of the present invention.
  • FIG. 21 is a flowchart illustrating an age estimation process of a method for generating viewer face tracking information according to an embodiment of the present invention.
  • FIG. 22 is an exemplary photograph for defining an age estimation face region in the age estimation process of a method for generating viewer face tracking information according to an embodiment of the present invention.
  • FIG. 23 is a flowchart illustrating an eye-closure estimation process of a method of generating viewer face tracking information according to an embodiment of the present invention.
  • FIG. 24 is an exemplary photograph for defining a face region for eye-closure estimation in the eye-closure estimation process of a method for generating viewer face tracking information according to an embodiment of the present invention.
  • FIG. 25 is a plan view for explaining the coordinate system (camera coordinate system) of the image input means in connection with generating viewer face tracking information according to an embodiment of the present invention.
  • the first component may be referred to as the second component, and similarly, the second component may also be referred to as the first component.
  • FIG. 4 is a block diagram showing a schematic configuration of a viewer face tracking information generating device according to an embodiment of the present invention.
  • a viewer face tracking information generating apparatus for controlling a stereoscopic feeling of a 3D display device in response to at least one of a gaze direction and a gaze distance of a viewer.
  • the viewer face tracking information generating device includes a computing element such as a central processing unit, a system DB, a system memory, and an interface.
  • the viewer face tracking information generating device may be a conventional computer system connected to a 3D display device such as a 3D TV to transmit and receive a control signal.
  • the viewer face tracking information generating apparatus can be regarded as functioning as the viewer face tracking information generating apparatus by installing and driving the viewer face tracking information generating program in the above-described conventional computer system.
  • the viewer face tracking information generation device of the present embodiment may be configured in the form of an embedded device in a three-dimensional display device such as a 3D TV.
  • the viewer face tracking information generating device includes a face region detection module 100.
  • The face region detection module 100 detects the face region of the viewer from an image captured by the image capture unit 20 from the video input through the image input means 10 (for example, a camera) provided at a position on the 3D display apparatus.
  • Faces may be detected over the full viewing-angle range of -90° to +90°.
  • the image input means 10 may be installed at the top or bottom side of the center portion of the 3D TV 1.
  • the image input means 10 may be a camera capable of capturing a face of a viewer located in front of a TV screen in real time as a video, and more preferably, a digital camera having an image sensor.
  • The face region detection module 100 generates a YCbCr color model from the RGB color information of the extracted image, separates the color information and brightness information in the generated color model, and detects a face candidate region based on the brightness information (a minimal sketch of this conversion is given below).
  • The face region detection module 100 also defines a quadrilateral feature point model for the detected face candidate region and detects the face region based on training data learned by the AdaBoost learning algorithm.
  • The face region detection module 100 then determines the detected face region to be a valid face region when the magnitude of the AdaBoost result value exceeds a predetermined threshold value.
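As a rough illustration of the color-model step above, the sketch below converts RGB pixels to the YCbCr color model with the standard ITU-R BT.601 formulas, separating the brightness component Y from the color components Cb and Cr. This is only a minimal sketch: the patent does not give conversion constants or candidate-selection thresholds here, so the illustrative brightness window used to flag face-candidate pixels is an assumption.

```cpp
#include <array>
#include <cstdint>

// Convert one RGB pixel to the YCbCr color model (ITU-R BT.601 constants).
// Y is the brightness component; Cb and Cr carry the color information.
std::array<double, 3> RgbToYCbCr(std::uint8_t r, std::uint8_t g, std::uint8_t b) {
    const double Y  =  0.299    * r + 0.587    * g + 0.114    * b;
    const double Cb = -0.168736 * r - 0.331264 * g + 0.5      * b + 128.0;
    const double Cr =  0.5      * r - 0.418688 * g - 0.081312 * b + 128.0;
    return {Y, Cb, Cr};
}

// Illustrative face-candidate test: the patent states only that candidates are
// selected after separating brightness from color; the bounds below are
// placeholder assumptions, not values taken from the patent.
bool IsFaceCandidatePixel(std::uint8_t r, std::uint8_t g, std::uint8_t b) {
    const std::array<double, 3> ycbcr = RgbToYCbCr(r, g, b);
    return ycbcr[0] > 40.0 && ycbcr[0] < 250.0;  // brightness window (assumed)
}
```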
  • the viewer face tracking information generation device also includes a face feature point detection module 200.
  • the facial feature point detection module 200 performs facial feature point detection on face areas determined to be valid in the face area detection module 100.
  • The facial feature point detection module 200 may detect, for example, 28 facial feature points, from which the positions of the eyebrows, eyes, nose, and mouth, as well as the viewing rotation angle of the face, can be defined.
  • Preferably, a total of eight basic facial feature points (four eye points, two nose points, and two mouth points) may be detected as the facial feature points.
  • the viewer face tracking information generation device also includes a matrix estimation module 300.
  • the matrix estimation module 300 estimates an optimal transformation matrix for generating a 3D viewer face model corresponding to the face feature by converting a model feature point of the 3D standard face model.
  • the 3D standard face model may be a 3D mesh model composed of 331 points and 630 triangles, as shown in FIG. 5.
  • the viewer face tracking information generation device also includes a tracking information generation module 400.
  • the tracking information generation module 400 estimates at least one of the gaze direction and gaze distance of the viewer based on the optimal transformation matrix to generate viewer face tracking information.
  • the viewer face tracking information generation device also includes a gender estimation module 500.
  • the gender estimating module 500 estimates the gender of the viewer using the detected face region.
  • The gender estimation module 500 crops a gender-estimation face region from the detected face region, normalizes the cropped face region image, and estimates the gender with an SVM (Support Vector Machine) using the normalized image.
  • the viewer face tracking information generation device also includes an age estimation module 600.
  • the age estimation module 600 estimates the age of the viewer using the detected face region.
  • the age estimation module 600 cuts out an age estimation face area from the detected face area.
  • the age estimation module 600 performs a function of normalizing the cropped face region image.
  • The age estimation module 600 constructs an input vector from the normalized image and projects it onto a manifold space.
  • The age estimation module 600 then estimates the age using second-order polynomial regression.
  • The viewer face tracking information generation device also includes an eye-closure estimation module 700.
  • The eye-closure estimation module 700 estimates whether the viewer's eyes are closed using the detected face region.
  • The eye-closure estimation module 700 performs a function of cropping a face region for eye-closure estimation, a function of normalizing the cropped face region image, and an eye-closure estimation function using an SVM (Support Vector Machine) on the normalized image.
  • The viewer face tracking information generating apparatus is also provided with a UI (User Interface) module that displays the settings of the image input means 10 provided on one side of the 3D display apparatus (FIG. 6A) and the detected face region, the age/gender results, and the like (FIG. 6B).
  • FIG. 7 is a flowchart illustrating a process of generating a viewer face tracking information according to an embodiment of the present invention.
  • The viewer face tracking information generation method proceeds from a start step through a face region detection step (S100), a facial feature point detection step (S200), a matrix estimation step (S300), a tracking information generation step (S400), a gender estimation step (S500), an age estimation step (S600), an eye-closure estimation step (S700), and a result output step (S800) to an end step.
  • the face region of the viewer is detected from an image extracted from an image input through an image input means provided at one position of the 3D display apparatus.
  • Methods for face detection include, for example, knowledge-based methods, feature-based methods, template-matching methods, and appearance-based methods.
  • In the present embodiment, an appearance-based method is used.
  • the appearance-based method is a method of acquiring a face region and a non-face region from different images, learning the acquired regions to make a learning model, and comparing the input image and the learning model data to detect a face.
  • the appearance-based method is known as a relatively high performance method for front and side face detection.
  • An image may be extracted from the video input through the image input means by capturing a frame from that video, for example using the sample grabber of DirectX.
  • the media type of the sample grabber may be set to RGB24.
  • a video converter filter is automatically attached to the front of the sample grabber filter so that the image captured by the sample grabber finally becomes RGB24.
```cpp
AM_MEDIA_TYPE mt = {};
mt.formattype = FORMAT_VideoInfo;
mt.majortype  = MEDIATYPE_Video;
mt.subtype    = MEDIASUBTYPE_RGB24;  // only accept 24-bit bitmaps
```
  • In the face region detection step, specifically: (a1) a YCbCr color model is generated from the RGB color information of the extracted image, the color information and brightness information are separated in the generated color model, and a face candidate region is detected based on the brightness information; (a2) a quadrilateral feature point model is defined for the detected face candidate region, and the face region is detected based on training data learned for this feature point model by the AdaBoost learning algorithm; and (a3) the detected face region is determined to be a valid face region when the magnitude of the AdaBoost result value (CF_H(x) of Equation 1) exceeds a predetermined threshold value.
  • The predetermined threshold value is used to finely adjust the error-judgment rate of the strong classifier.
  • the AdaBoost learning algorithm is known as an algorithm that generates a strong classifier with high detection performance through linear combination of weak classifiers.
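Equation 1 is not reproduced in the text above; the sketch below only illustrates the general mechanism being described, namely a strong classifier built as a linear combination of weak classifiers whose confidence value (the quantity referred to as CF_H(x)) is compared against a tunable threshold. The assumed form of the combination, the struct and function names, and the threshold handling are illustrative, not taken from the patent.

```cpp
#include <functional>
#include <vector>

struct WeakClassifier {
    std::function<int(const std::vector<double>&)> h;  // +1 (face) or -1 (non-face)
    double alpha;                                       // weight learned by AdaBoost
};

// Confidence of the strong classifier: a linear combination of weak classifiers.
// This plays the role of the CF_H(x) value referred to in the text (assumed form).
double StrongClassifierConfidence(const std::vector<WeakClassifier>& weak,
                                  const std::vector<double>& x) {
    double sum = 0.0;
    for (const WeakClassifier& w : weak) sum += w.alpha * w.h(x);
    return sum;
}

// A detected region is accepted as a valid face only when the confidence
// exceeds an empirically chosen threshold (tuned on a training face set).
bool IsValidFaceRegion(const std::vector<WeakClassifier>& weak,
                       const std::vector<double>& x, double threshold) {
    return StrongClassifierConfidence(weak, x) > threshold;
}
```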
  • In frontal face images, the structural features unique to the face, such as the eyes, nose, and mouth, are evenly distributed throughout the image and are symmetrical.
  • In non-frontal face images, the features are not symmetrical and are concentrated in a narrow range, and since the face outline is not a straight line, background regions are mixed in.
  • the present embodiment further includes new Haar-Like features similar to the existing Haar-like features but adding asymmetry.
  • FIG. 8 shows the basic shapes of conventional Haar-like features.
  • FIG. 9 is an exemplary picture of the Haar-like features selected for frontal face region detection according to an embodiment of the present invention.
  • FIG. 11 shows the rectangular Haar-like features newly added by the present embodiment.
  • FIG. 12 shows examples of the Haar-like features selected, from among the Haar-like features of FIG. 11, for non-frontal face detection.
  • Because the Haar-like features of the present embodiment are asymmetric in form, structure, and shape, as shown in FIG. 11, they have an excellent detection effect on non-frontal faces.
  • FIG. 13 shows the probability curves, over a training set, of a conventional Haar-like feature and of a Haar-like feature applied to this embodiment.
  • In FIG. 13, (a) is the case of the present embodiment and (b) is the existing case.
  • The probability curve corresponding to the case of the present embodiment is concentrated in a narrower range.
  • Haar-Like features added in this embodiment are effective in the face detection in view of the base classification rule.
  • FIG. 14 is a table showing, for a training set of non-frontal faces, the average variance and kurtosis of the probability curves of the newly added features and of the existing Haar-like features.
  • That is, the table shows the variance and kurtosis values of the probability curves of the newly added Haar-like features and the existing Haar-like features in the training set of non-frontal faces.
  • The Haar-like features added in this example show smaller variance and larger kurtosis, which is advantageous for detection.
  • In other words, in the present embodiment, the Haar-like features for detecting the face region further include asymmetric Haar-like features for detecting non-frontal face regions (a sketch of how a rectangle-based Haar-like feature is evaluated with an integral image is given below).
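For orientation, a Haar-like feature value is conventionally computed as a difference between pixel sums of adjacent rectangles, which an integral image makes possible in constant time. The sketch below shows a plain two-rectangle feature; the asymmetric features of FIG. 11 differ only in how their rectangles are laid out, and their exact layouts are not reproduced here.

```cpp
#include <cstddef>
#include <vector>

// Integral image: ii[y][x] holds the sum of all pixels above and to the left
// of (x, y), so any rectangle sum can be read off in constant time.
std::vector<std::vector<long long>> BuildIntegralImage(
        const std::vector<std::vector<int>>& img) {
    const std::size_t h = img.size();
    const std::size_t w = h ? img[0].size() : 0;
    std::vector<std::vector<long long>> ii(h + 1, std::vector<long long>(w + 1, 0));
    for (std::size_t y = 0; y < h; ++y)
        for (std::size_t x = 0; x < w; ++x)
            ii[y + 1][x + 1] = img[y][x] + ii[y][x + 1] + ii[y + 1][x] - ii[y][x];
    return ii;
}

// Sum of pixels in the rectangle with top-left corner (x, y), width w, height h.
long long RectSum(const std::vector<std::vector<long long>>& ii,
                  int x, int y, int w, int h) {
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x];
}

// A plain two-rectangle Haar-like feature: left half minus right half.
// The asymmetric features of the present embodiment differ only in the layout
// and relative sizes of the rectangles being compared.
long long TwoRectHaarFeature(const std::vector<std::vector<long long>>& ii,
                             int x, int y, int w, int h) {
    const int halfW = w / 2;
    return RectSum(ii, x, y, halfW, h) - RectSum(ii, x + halfW, y, w - halfW, h);
}
```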
  • The validity of the detected face is determined by comparing the magnitude of the AdaBoost result value (CF_H(x) of Equation 1) with a predetermined threshold value.
  • In Equation 1, the magnitude of CF_H(x) can be used as an important factor for determining the validity of the face.
  • This value CF_H(x) is a measure of how close the detected region is to a face, and the validity of the face can be determined by setting a predetermined threshold value for it.
  • the predetermined threshold is empirically set using the learning face group.
  • a facial feature point is detected in the detected face region.
  • The facial feature point detection step S200 performs the landmark search of the ASM method using the AdaBoost algorithm to detect the facial feature points.
  • Specifically, the detection of a facial feature point comprises: (b1) defining the position of the current feature point as (x_l, y_l) and passing all possible partial windows of n*n pixel size in the vicinity of the current feature point position through the classifier; (b2) calculating a candidate position for the feature point according to Equation 2 below; and (b3) setting (x'_l, y'_l) as the new feature point position if the condition of Equation 3 is satisfied, and otherwise keeping the current feature point position (x_l, y_l).
  • Here N_pass is the number of stages through which a partial window has passed.
  • Methods for detecting the feature points of a face include, for example, methods that detect feature points individually and methods that detect feature points simultaneously using their correlations.
  • In the present embodiment, the Active Shape Model (ASM) method, which is a preferable method for facial feature detection in terms of speed and accuracy, is used.
  • However, since the feature point search of the existing ASM uses a profile at each feature point, detection is stable only in high-quality images.
  • In practice, the image extracted from the video input through an image input means such as a camera may be of low resolution and low quality.
  • In the present embodiment, the feature point search is therefore improved by using the AdaBoost method, so that feature points can be detected even in low-resolution, low-quality images (one plausible form of this refinement is sketched below).
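Equations 2 and 3 are not reproduced above, so the sketch below shows only one plausible reading of the described refinement, stated as an assumption: each partial window around the current landmark is scored by the number of cascade stages it passes (N_pass), a candidate position is formed as the N_pass-weighted average of the window centres, and the landmark moves only when the candidate has enough support. The weighting scheme and acceptance test are assumptions, not the patent's exact equations.

```cpp
#include <functional>
#include <utility>

// Returns the number of cascade stages (N_pass) passed by the n*n partial
// window centred at the given (x, y) position; supplied by the caller.
using PatchScorer = std::function<int(int, int)>;

// One refinement step for a single landmark currently at (x, y). 'radius'
// bounds the search neighbourhood and 'minSupport' is an assumed acceptance
// threshold standing in for the condition of Equation 3.
std::pair<int, int> RefineLandmark(int x, int y, int radius,
                                   const PatchScorer& nPass, int minSupport) {
    double sumW = 0.0, sumX = 0.0, sumY = 0.0;
    for (int dy = -radius; dy <= radius; ++dy) {
        for (int dx = -radius; dx <= radius; ++dx) {
            const int w = nPass(x + dx, y + dy);  // stages passed by this window
            sumW += w;
            sumX += w * (x + dx);
            sumY += w * (y + dy);
        }
    }
    if (sumW < minSupport) return {x, y};         // keep the current position
    return {static_cast<int>(sumX / sumW + 0.5),  // N_pass-weighted candidate
            static_cast<int>(sumY / sumW + 0.5)};
}
```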
  • FIG. 15 is a profile picture applied to an existing ASM method for an image having a low resolution or poor image quality.
  • FIG. 16 shows the patterns around each landmark point used by AdaBoost in the landmark point search of the present invention.
  • a plurality of feature points (for example, 28) may be detected.
  • In the present embodiment, considering computational load and tracking performance, only the eight basic facial feature points (four eye points (4, 5, 6, 7), two nose points (10, 11), and two mouth points (8, 9)) are used to estimate the gaze distance and gaze direction.
  • As shown in FIG. 18, the matrix estimation step consists of inputting the eight facial feature points (S310; for example, the coordinate values of the eight detected feature points are loaded as input values into the memory of the device on which the program of the present embodiment runs), loading the 3D standard face model (S320; for example, the overall coordinate information of the 3D face model stored in the DB is loaded as an input value into the computing means on which the program runs), and estimating the optimal transformation matrix (S330).
  • Subsequently, the tracking information generation step (S400) of calculating the gaze direction and gaze distance from the estimated optimal transformation matrix is performed.
  • the 3D standard face model is a 3D mesh model composed of 331 points and 630 triangles.
  • the estimating information generating step (S400) generates viewer face tracking information by estimating at least one of the gaze direction and gaze distance of the viewer based on the optimal transformation matrix.
  • The optimal transformation matrix estimation comprises: (c1) setting up the transformation of Equation 4 using a 3×3 matrix M for the face rotation information of the 3D standard face model and a 3D vector T for the face translation information, where the components of M and T are the variables defining the optimal transformation matrix;
  • (c2) calculating the three-dimensional vector P' of Equation 5 using the camera feature point position vector P_C obtained by Equation 4 and the camera transformation matrix M_C defined by Equation 6 below; and
  • (c4) estimating each variable of the optimal transformation matrix using the two-dimensional vector P_I and the coordinate values of the facial feature points detected in step (b).
  • The optimal transformation matrix is mathematically composed of a 3×3 matrix M and a 3D vector T.
  • The 3×3 matrix M reflects the rotation information of the face.
  • The 3D vector T reflects the translation (parallel movement) information of the face.
  • That is, the feature point position (three-dimensional vector) P_M in the coordinate system of the three-dimensional standard face model is converted by the optimal transformation matrix (M, T) to the position (three-dimensional vector) P_C in the camera coordinate system.
  • the 3D standard face model coordinate system is a 3D coordinate system whose coordinate center is located at the center of the 3D standard face model
  • the camera coordinate system is a 3D coordinate system whose center is located at the center of the image input means (10 in FIG. 25).
  • P', which is a three-dimensional vector defined by (P'_x, P'_y, P'_z), is obtained from the camera feature point position vector P_C and the camera transformation matrix M_C according to Equation 5.
  • The camera transformation matrix M_C is a 3×3 matrix determined by the focal length of the camera and the like, and is defined as in Equation 6 below.
  • focal_len = -0.5 * W / tan(Degree2Radian(fov * 0.5))
  • Next, a target function is set up that outputs the sum of squared deviations between the positions of the detected feature points and the positions of the face-model feature points to which the optimal transformation matrix has been applied, taking the 12 variables of the optimal transformation matrix as its arguments.
  • the 12 optimal variables are calculated by solving the optimization problem that minimizes the target function.
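The sketch below lays out the structure of this optimisation: a model feature point P_M is mapped to camera coordinates as P_C = M·P_M + T (Equation 4), multiplied by the camera matrix M_C built from the focal length of Equation 6, reduced to a 2D point P_I, and the target function is the sum of squared deviations from the detected 2D feature points over the 12 entries of M and T. The perspective division used to obtain P_I, and the choice of external optimiser, are assumptions; the patent text only states that the target function is minimised over the 12 variables.

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec3 = std::array<double, 3>;
using Mat3 = std::array<std::array<double, 3>, 3>;

Vec3 MatVec(const Mat3& A, const Vec3& v) {
    return {A[0][0] * v[0] + A[0][1] * v[1] + A[0][2] * v[2],
            A[1][0] * v[0] + A[1][1] * v[1] + A[1][2] * v[2],
            A[2][0] * v[0] + A[2][1] * v[1] + A[2][2] * v[2]};
}

// Camera transformation matrix M_C built from the focal length of Equation 6:
// focal_len = -0.5 * W / tan(fov / 2), fov in radians. The diagonal layout is
// an assumed form; the full matrix of Equation 6 is not reproduced in the text.
Mat3 CameraMatrix(double imageWidth, double fovRadians) {
    const double f = -0.5 * imageWidth / std::tan(0.5 * fovRadians);
    Mat3 Mc = {{{f, 0.0, 0.0}, {0.0, f, 0.0}, {0.0, 0.0, 1.0}}};
    return Mc;
}

// Project one model feature point P_M into the image: P_C = M * P_M + T
// (Equation 4), P' = M_C * P_C (Equation 5), then an assumed perspective
// division to obtain the 2D image point P_I.
std::array<double, 2> ProjectFeature(const Mat3& M, const Vec3& T, const Mat3& Mc,
                                     const Vec3& Pm) {
    Vec3 Pc = MatVec(M, Pm);
    for (int i = 0; i < 3; ++i) Pc[i] += T[i];
    const Vec3 Pp = MatVec(Mc, Pc);
    return {Pp[0] / Pp[2], Pp[1] / Pp[2]};
}

// Target function: sum of squared deviations between the detected 2D feature
// points and the projected model feature points, viewed as a function of the
// 12 entries of M and T (to be minimised by an external nonlinear optimiser).
double TargetFunction(const Mat3& M, const Vec3& T, const Mat3& Mc,
                      const std::vector<Vec3>& modelPts,
                      const std::vector<std::array<double, 2>>& detectedPts) {
    double err = 0.0;
    for (std::size_t i = 0; i < modelPts.size(); ++i) {
        const std::array<double, 2> p = ProjectFeature(M, T, Mc, modelPts[i]);
        const double dx = p[0] - detectedPts[i][0];
        const double dy = p[1] - detectedPts[i][1];
        err += dx * dx + dy * dy;
    }
    return err;
}
```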
  • Finally, the gaze direction information is defined by Equation 7 using the components of the rotation-related matrix M of the optimal transformation matrix, and the gaze distance information is defined by the translation-related vector T of the optimal transformation matrix.
  • That is, the gaze direction information becomes (a_x, a_y, a_z), and the gaze distance information is defined by the translation-related vector T itself.
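Equation 7 is not reproduced here, so the following is only an assumed reading: the gaze direction is taken as the model's forward-looking unit axis rotated by M (which under this assumption is simply the third column of M), and the gaze distance as the length of the translation vector T. The exact component combination used in the patent may differ.

```cpp
#include <array>
#include <cmath>

// Illustrative extraction of gaze information from the optimal transformation
// matrix (M, T). Taking the model's forward axis as (0, 0, 1) is an assumption.
struct GazeInfo {
    std::array<double, 3> direction;  // (a_x, a_y, a_z)
    double distance;                  // gaze distance derived from T
};

GazeInfo ExtractGaze(const std::array<std::array<double, 3>, 3>& M,
                     const std::array<double, 3>& T) {
    // Rotating the assumed forward axis (0, 0, 1) by M yields the third
    // column of M, used here as the gaze direction.
    const std::array<double, 3> dir = {M[0][2], M[1][2], M[2][2]};
    const double dist = std::sqrt(T[0] * T[0] + T[1] * T[1] + T[2] * T[2]);
    return {dir, dist};
}
```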
  • As shown in FIG. 19, the gender estimation step (S500) consists of inputting the image and the facial feature points (S510), cropping the gender-estimation face region (S520), normalizing the cropped face region image (S530), and estimating the gender by SVM (S540).
  • Methods for gender estimation include, for example, view-based methods that use the whole human face and geometric-feature-based methods that use only the geometric features of the face.
  • In the present embodiment, gender estimation is performed by a view-based gender classification method using SVM (Support Vector Machine) learning, in which the detected face region is normalized to form a facial feature vector and the gender is predicted from it.
  • the SVM method may be classified into a support vector classifier (SVC) and a support vector regression (SVR).
  • The gender estimation step (S500) specifically comprises: (e1) cropping a face region for gender estimation from the detected face region based on the detected facial feature points; (e2) normalizing the size of the cropped gender-estimation face region; (e3) normalizing the histogram of the size-normalized gender-estimation face region; and (e4) constructing an input vector from the size- and histogram-normalized gender-estimation face region and estimating the gender using a pre-trained SVM algorithm.
  • In step (e1), the face region is cropped using the input image and the facial feature points; for example, as shown in FIG. 20, with half the distance between the left and right eyes taken as 1, the face region to be cropped is calculated.
  • In step (e2), the cropped face region is normalized to a size of 12×21.
  • In step (e3), the histogram is normalized (histogram equalization), a process that equalizes the number of pixels at each intensity value in order to minimize the influence of lighting.
  • In step (e4), a 252-dimensional input vector is constructed from the normalized 12×21 face image, and the gender is estimated using the pre-trained SVM.
  • Specifically, the gender is estimated as male if the calculated result of the classifier of Equation 8 is greater than zero, and as female otherwise.
  • Here y_i is the gender value of the i-th learning sample, set to 1 for male and -1 for female.
  • The kernel function may be the Gaussian Radial Basis Function (GRBF) defined in Equation 9 below.
  • The kernel function may also be a polynomial kernel or the like instead of the Gaussian radial basis function, but the Gaussian radial basis function is preferably used in consideration of identification performance (the usual form of the resulting decision function is sketched below).
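Equations 8 and 9 are referenced but not reproduced above. The sketch below shows the usual kernel-SVM decision value they correspond to in standard SVM notation: a weighted sum of Gaussian RBF kernel evaluations between the input vector and the support vectors plus a bias, with a positive value read as male and a negative one as female per the label convention above. The parameter names, and the convention of storing only the support vectors, are standard SVM practice rather than details stated in the patent.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Gaussian radial basis function kernel (the form Equation 9 refers to):
// K(x, x') = exp(-||x - x'||^2 / (2 * sigma^2)).
double GrbfKernel(const std::vector<double>& a, const std::vector<double>& b,
                  double sigma) {
    double sq = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double d = a[i] - b[i];
        sq += d * d;
    }
    return std::exp(-sq / (2.0 * sigma * sigma));
}

// Kernel-SVM decision value in the usual dual form (cf. Equation 8):
// f(x) = sum_i alpha_i * y_i * K(x_i, x) + b, with y_i = +1 (male), -1 (female).
double SvmDecision(const std::vector<std::vector<double>>& supportVectors,
                   const std::vector<double>& alpha, const std::vector<int>& y,
                   double bias, double sigma, const std::vector<double>& x) {
    double f = bias;
    for (std::size_t i = 0; i < supportVectors.size(); ++i)
        f += alpha[i] * y[i] * GrbfKernel(supportVectors[i], x, sigma);
    return f;
}

// Gender is read from the sign of the decision value (threshold 0).
bool IsMale(double decisionValue) { return decisionValue > 0.0; }
```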
  • The SVM is a classification method that derives the boundary between two groups in data containing two groups, and is known as a learning algorithm for pattern classification and regression.
  • The basic learning principle of the SVM is to find an optimal linear hyperplane with minimal expected classification error on unseen test samples, that is, with good generalization performance.
  • The linear SVM uses a classification approach that finds the lowest-order linear function.
  • In order to determine the learning result uniquely, the constraint of Equation 2 below is imposed.
  • The minimum distance between a learning sample and the hyperplane is represented by Equation 3 below, and therefore necessarily takes the form of Equation 4 below.
  • Since w and b must be determined so as to maximize this minimum distance while correctly separating the learning samples, w and b are formulated as shown in Equation 5 below.
  • Minimizing this objective function maximizes the value of Equation 4, which is the minimum distance.
  • The constraint is shown in Equation 7 below.
  • K(x, x') is a nonlinear kernel function.
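The equations referenced in this SVM discussion (Equations 2 through 7 here) are not reproduced in the extracted text. For orientation only, the textbook maximum-margin formulation that this passage follows is sketched below; the patent's exact notation may differ.

```latex
% Textbook maximum-margin SVM formulation (for reference only):
\begin{align*}
  & y_i\,(\mathbf{w}^{\top}\mathbf{x}_i + b) \ge 1, \quad i = 1,\dots,N
      && \text{canonical separation constraint}\\
  & \min_{\mathbf{w},\,b}\ \tfrac{1}{2}\lVert \mathbf{w} \rVert^{2}
      \quad \text{s.t.}\quad y_i\,(\mathbf{w}^{\top}\mathbf{x}_i + b) \ge 1
      && \text{primal problem (margin } 1/\lVert\mathbf{w}\rVert\text{)}\\
  & f(\mathbf{x}) = \operatorname{sign}\!\Big(\sum_{i} \alpha_i\, y_i\, K(\mathbf{x}_i, \mathbf{x}) + b\Big)
      && \text{kernelized decision function}
\end{align*}
```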
  • The AdaBoost method may also be used in the above process, but considering the classification performance and generalization performance of the classifier, it is more preferable to use the SVM method.
  • In tests, the performance of the AdaBoost method was 10-15% lower than that of the SVM method.
  • As shown in FIG. 21, the age estimation step is completed by estimating the age with second-order polynomial regression (S650).
  • The age estimation specifically comprises: (f1) cropping an age-estimation face region from the detected face region based on the detected facial feature points; (f2) normalizing the size of the cropped age-estimation face region; (f3) performing local illumination correction on the size-normalized age-estimation face region; (f4) constructing an input vector from the size-normalized and illumination-corrected age-estimation face region and generating a feature vector by projecting it onto a manifold space; and (f5) estimating the age by applying quadratic regression to the generated feature vector.
  • the face region is cut out using the input image and the facial feature point.
  • For example, the face region is cropped, relative to the points of both eyes and the mouth, by proportions of 0.8 upward, 0.2 downward, 0.1 to the left, and 0.1 to the right, respectively.
  • the cut out face region is normalized to 64 * 64 size.
  • In step (f3), in order to reduce the influence of lighting, local illumination correction is performed by Equation 10 below.
  • I(x, y) = (I(x, y) - M) / V * 10 + 127
  • Here, the standard deviation V is a characteristic value representing the degree to which a quantity is scattered around its average value, and mathematically the standard deviation V is calculated as in Equation 9.
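The sketch below applies the correction of Equation 10 to a cropped face patch, computing the mean M and standard deviation V over the patch itself. Treating the whole cropped patch as the local window is an assumption; the patent does not state the window size used for the "local" statistics.

```cpp
#include <cmath>
#include <vector>

// Local illumination correction per Equation 10:
// I(x, y) <- (I(x, y) - M) / V * 10 + 127,
// with M the local mean and V the local standard deviation of the patch.
void CorrectIllumination(std::vector<std::vector<double>>& patch) {
    double sum = 0.0, sumSq = 0.0, n = 0.0;
    for (const auto& row : patch)
        for (double v : row) { sum += v; sumSq += v * v; n += 1.0; }
    if (n == 0.0) return;
    const double mean = sum / n;
    const double variance = sumSq / n - mean * mean;
    const double stddev = std::sqrt(variance > 0.0 ? variance : 1e-12);
    for (auto& row : patch)
        for (double& v : row)
            v = (v - mean) / stddev * 10.0 + 127.0;
}
```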
  • a 4096-dimensional input vector is constructed from a 64 * 64 face image, and a 50-dimensional feature vector is generated by projecting into a pre-learned manifold space.
  • the age estimation theory assumes that the characteristics of the human aging process reflected in the face image can be expressed in patterns according to any low dimensional distribution.
  • X is an input vector
  • Y is a feature vector
  • P is a projection matrix onto the manifold, trained using CEA.
  • X is an m×n matrix and x_i represents each face image.
  • The manifold learning step obtains a projection matrix for representing the m-dimensional face vector as a d-dimensional face vector (aging feature vector), where d < m (d is much smaller than m).
  • In general, the image dimension m is much larger than the number of images n.
  • Therefore, the m×m matrix XX^T is a degenerate (singular) matrix.
  • C_pca is an m×m matrix.
  • The d eigenvectors are selected in order of decreasing eigenvalue to form the matrix W_PCA.
  • W_PCA is an m×d matrix.
  • W_s denotes the relationship between face images belonging to the same age group, and W_d denotes the relationship between face images belonging to different groups.
  • Dist(X_i, X_j) is defined as in Equation 12 below.
  • The eigenvectors corresponding to the d largest eigenvalues become the CEA basis vectors.
  • When the orthogonal vectors a_1, ..., a_d have been calculated, the matrix W_CEA is defined as follows.
  • W_CEA is an m×d matrix.
  • Next, the projection matrix P_mat is defined as in Equation 15 below.
  • The projection matrix P_mat is used to obtain the aging feature for each face vector X.
  • In step (f5), the age is estimated by applying second-order regression according to Equation 11 below.
  • Here b_0, b_1, and b_2 are precomputed from the training data as follows.
  • The second-order regression model is shown in Equation 17 below.
  • In Equation 17, the age of the i-th training image and the feature vector of the i-th training image are used.
  • N is the number of training samples.
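As a sketch of steps (f4)-(f5): the high-dimensional face vector is projected by the pre-learned projection matrix P_mat to a low-dimensional aging feature vector, and the age is then read off a second-order polynomial regression. Since Equations 11 and 17 are not reproduced here, the per-component quadratic form below (b_0 plus linear and squared terms of each feature component) is an assumed form of that regression, not the patent's exact model.

```cpp
#include <cstddef>
#include <vector>

// Project an m-dimensional face vector x to a d-dimensional aging feature
// vector y = P^T x, where P (m x d) is the pre-learned projection matrix P_mat.
std::vector<double> ProjectToManifold(const std::vector<std::vector<double>>& P,
                                      const std::vector<double>& x) {
    const std::size_t m = P.size();
    const std::size_t d = m ? P[0].size() : 0;
    std::vector<double> y(d, 0.0);
    for (std::size_t j = 0; j < d; ++j)
        for (std::size_t i = 0; i < m; ++i)
            y[j] += P[i][j] * x[i];
    return y;
}

// Assumed form of the second-order polynomial regression on the aging feature
// vector: age = b0 + b1 . y + b2 . (y * y), with b0, b1, b2 precomputed from
// the training data.
double EstimateAge(double b0, const std::vector<double>& b1,
                   const std::vector<double>& b2, const std::vector<double>& y) {
    double age = b0;
    for (std::size_t j = 0; j < y.size(); ++j)
        age += b1[j] * y[j] + b2[j] * y[j] * y[j];
    return age;
}
```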
  • As shown in FIG. 23, the eye-closure estimation step consists of inputting the image and the facial feature points (S710), cropping the face region for eye-closure estimation (S720), normalizing the cropped face region image (S730), and estimating eye closure by SVM (S740).
  • The eye-closure estimation specifically comprises: (g1) cropping an eye-closure-estimation face region from the detected face region based on the detected facial feature points; (g2) normalizing the size of the cropped eye-closure-estimation face region; (g3) normalizing the histogram of the size-normalized eye-closure-estimation face region; and (g4) constructing an input vector from the size- and histogram-normalized eye-closure-estimation face region and estimating eye closure using a pre-trained SVM algorithm.
  • the eye region is cut out using the input image and the facial feature point.
  • For example, the eye region may be cropped by taking, among the detected facial feature points, the two end points of an eye to determine the width, and extending the region above and below by the same height.
  • In step (g2), the cropped eye region image is normalized to a size of 20×20.
  • In step (g3), histogram normalization is performed to reduce the influence of lighting.
  • In step (g4), a 400-dimensional input vector is constructed from the normalized 20×20 eye-region image, and whether the eyes are closed is estimated using the pre-trained SVM.
  • In the eye-closure estimation, the eyes are determined to be open when the result value of Equation 12 is greater than 0, and closed when the result value is less than 0.
  • Here y_i indicates the eye state of the i-th training sample, set to 1 when the eyes are open and -1 when they are closed.
  • The kernel function may be the Gaussian radial basis function defined in Equation 13.
  • the sex information of the viewer and the age information of the viewer estimated by the process described above are output to the stereoscopic control means as information for controlling the stereoscopic sense of the 3D display apparatus.
  • In general, 3D content is developed on the premise that an adult male is sitting 2.5 m in front of the 3D display device.
  • The brain then calculates the depth information accordingly.
  • this difference can be as small as 1cm or 1.5cm.
  • the gender information and the age information of the viewer are needed to determine this and control the stereoscopic feeling of the 3D display device.
  • For example, the gender information and age information of the viewer output to the stereoscopic control means may be used as a horizontal-parallax change reference value, which means an amount of change determined based on the point where the left and right images are focused.
  • In this way, a 3D screen optimized for the current viewer's viewing conditions can be output and provided.
  • the output direction of the 3D display apparatus may be changed by using rotation driving means (not shown) so that the front side of the 3D display apparatus faces the corresponding viewer.
  • In addition, the viewer may be guided to move to the front of the 3D display by outputting captions on the screen of the 3D display such as "You are outside the viewing angle" or "Please move to the front of the screen".
  • the eye contact information estimated by the above-described process is output to the screen power control means as information for controlling the ON / OFF screen output of the 3D display device.
  • the screen power control means may turn off the image output to the display device screen so that no further image output is performed.
  • Reference numeral 1000 in FIG. 25 denotes control means for performing such various control processes.
  • Embodiments of the present invention include a computer readable recording medium including program instructions for performing various computer-implemented operations.
  • the computer-readable recording medium may include program instructions, data files, data structures, etc. alone or in combination.
  • the recording medium may be one specially designed and configured for the present invention, or may be known and available to those skilled in computer software.
  • Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical recording media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory.
  • the recording medium may be a transmission medium such as an optical or metal wire, a waveguide, or the like including a carrier wave for transmitting a signal specifying a program command, a data structure, or the like.
  • Examples of program instructions include not only machine code generated by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like.

Abstract

The present invention relates to a method and an apparatus for generating viewer face-tracing information, a recording medium for same, and a three-dimensional display apparatus, and the method for generating the viewer face-tracing information for controlling dimensionality of the three-dimensional display apparatus according to information on a gazing direction and/or a gazing distance of a viewer, comprising the steps of: (a) detecting a face region of the viewer from an image that is extracted from an image that is inputted through an image input means, which is provided on one position on a side of the three-dimensional display apparatus; (b) detecting facial characteristics from the face region which is detected; (c) estimating an optimal transformation matrix for generating a three-dimensional viewer face model, which corresponds to the facial characteristics, by transforming model characteristics of a three-dimensional standard face model; and (d) generating the viewer face-tracing information by estimating the gaze direction and/or the gaze distance of the viewer based on the optimal transformation matrix.

Description

Method and apparatus for generating viewer face tracking information, recording medium, and 3D display device
The present invention relates to a method and apparatus for generating viewer face tracking information, a recording medium, and a three-dimensional display apparatus.
More specifically, the present invention relates to a method and apparatus for generating viewer face tracking information, a recording medium, and a three-dimensional display apparatus that detect facial feature points in the viewer's face from an image extracted from video input through an image input means and, using the facial feature points and an optimal transformation matrix, generate information on the viewer's gaze direction and gaze distance for controlling the stereoscopic effect of the 3D display apparatus.
Based on an adult male, human eyes are about 6.5 cm apart in the horizontal direction.
The resulting binocular disparity acts as the most important factor in perceiving a three-dimensional effect.
That is, the left eye and the right eye each see a different 2D image.
When these two images are delivered to the brain through the retina, the brain fuses them precisely to produce the depth and realism of the original 3D stereoscopic image.
The visual technology in which a single image is generated from two images obtained through the visual difference between the two eyes and is presented with that difference to both eyes, so that a person feels the liveliness and realism of being at the place where the image was produced, is called 3D stereoscopic imaging technology.
3D stereoscopic imaging technology has become a core technology that is widely applied to the development of existing industrial products such as 3D TV, information and communication, broadcasting, medicine, film, games, and animation.
For example, a 3D TV is a device that uses special glasses to deliver left-eye and right-eye images on a display to the corresponding eyes, so that the human cognitive/information system perceives 3D using the principle of binocular parallax.
The 3D TV separates left/right images that create an artificial visual difference on the display and delivers them to the two eyes, causing the brain to perceive a 3D stereoscopic effect.
For example, a passive-type 3D TV is composed of an optical film, liquid crystal, and a polarizing film (PR film, polaroid film), as shown in FIG. 1.
With the passive-type 3D TV, as shown in FIG. 2, when viewing from the front of the TV screen at the same height as the screen, the image that should reach the left eye (denoted L) is shown to the left eye and the image that should reach the right eye (denoted R) is shown to the right eye, so the 3D stereoscopic effect is perceived.
However, as shown in FIG. 3, when the viewer watches not from the front of the TV screen but from a position off to the left or right of the front of the 3D TV, a crosstalk phenomenon occurs in which the images appear overlapped. This makes it difficult for the viewer to perceive a normal 3D stereoscopic effect.
This occurs because, due to the viewing angle, each eye sees an image that should not be visible to it, and it becomes more severe the closer the viewer is to the 3D TV screen.
Therefore, control technology is required that tracks the direction and position at which the viewer gazes and, for example, controls the stereoscopic effect of the 3D TV or rotates the 3D TV screen.
Meanwhile, the development of glasses-free 3D TVs has recently accelerated due to the inconvenience of 3D TVs that require special glasses.
A glasses-free 3D TV is a TV that can provide 3D images without special glasses, and applying the glasses-free method further requires technology for tracking the direction in which the viewer gazes.
One example of such technology is tracking the viewer's eyes.
Eye tracking identifies feature points for the eye positions and then outputs the pupil coordinates using an eye-tracking algorithm.
Specifically, the boundary between the iris and the sclera (the white of the eye) is detected in the face image and then tracked.
However, with this method it is difficult to accurately determine the gaze angle of the eyes, and the trackable eye angle is small.
Another example of technology for tracking the viewer's gaze direction is a template matching method that finds and tracks facial feature points.
However, the template matching method must initially be given a template corresponding to the facial feature points, so it is not general and is subject to constraints.
상기 종래 기술에 따른 문제점을 해결하기 위한 본 발명의 목적은, 영상입력수단을 통해 입력되는 영상에서 추출한 이미지로부터 시청자 얼굴 내의 얼굴특징점을 검출하고, 이러한 얼굴특징점 및 최적변환행렬을 이용하여 3차원 디스플레이 장치의 입체감을 제어하기 위한 시청자의 응시방향 및 응시거리에 대한 정보를 생성하는 시청자 얼굴 추적정보 생성방법 및 생성장치, 그 기록매체 및 3차원 디스플레이 장치를 제공함에 있다. An object of the present invention, devised to solve the problems of the prior art, is to provide a method and apparatus for generating viewer face tracking information that detects facial feature points within a viewer's face from an image extracted from video input through an image input means and, using these facial feature points and an optimal transformation matrix, generates information on the viewer's gaze direction and gaze distance for controlling the stereoscopic effect of a three-dimensional display apparatus, as well as a recording medium therefor and a three-dimensional display apparatus.
상기와 같은 목적을 달성하기 위한 본 발명의 일실시예는, 시청자의 응시방향 및 응시거리 중 적어도 하나의 정보에 대응하여 3차원 디스플레이 장치의 입체감을 제어하기 위한 시청자 얼굴 추적정보 생성방법으로서, (a) 상기 3차원 디스플레이 장치 측의 일 위치에 구비된 영상입력수단을 통해 입력되는 영상에서 추출한 이미지로부터 상기 시청자의 얼굴영역을 검출하는 단계; (b) 상기 검출된 얼굴영역에서 얼굴특징점을 검출하는 단계; (c) 3차원 표준 얼굴모델의 모델특징점을 변환하여 상기 얼굴특징점에 대응하는 3차원 시청자 얼굴모델을 생성하는 최적변환행렬을 추정하는 단계; 및 (d) 상기 최적변환행렬에 근거하여 상기 시청자의 응시방향 및 응시거리 중 적어도 하나를 추정하여 시청자 얼굴 추적정보를 생성하는 단계;를 포함하여 구성된다. An embodiment of the present invention for achieving the above object, as a viewer face tracking information generation method for controlling the stereoscopic sense of the three-dimensional display device corresponding to at least one of the gaze direction and gaze distance of the viewer, ( a) detecting a face region of the viewer from an image extracted from an image input through an image input means provided at one position of the 3D display apparatus; (b) detecting a facial feature point in the detected face region; (c) estimating an optimal transformation matrix for generating a 3D viewer face model corresponding to the face feature by converting the model feature points of the 3D standard face model; And (d) estimating at least one of the gaze direction and gaze distance of the viewer based on the optimal transformation matrix to generate viewer face tracking information.
본 발명의 또 다른 측면에 따른 일실시예는, 시청자의 응시방향 및 응시거리 중 적어도 하나의 정보에 대응하여 3차원 디스플레이 장치의 입체감을 제어하기 위한 시청자 얼굴 추적정보 생성방법으로서, 상기 3차원 디스플레이 장치 측의 일 위치에 구비된 영상입력수단을 통해 입력되는 영상에서 추출한 이미지로부터 상기 시청자의 얼굴영역을 검출하는 얼굴영역 검출단계; 상기 검출된 얼굴영역에 근거하여 상기 시청자의 응시방향 및 응시거리 중 적어도 하나의 정보를 추정하여 응시정보를 생성하는 응시정보 생성단계; 및 상기 검출된 얼굴영역에 근거하여 상기 시청자의 성별 및 나이 중 적어도 하나의 정보를 추정하여 시청자정보를 생성하는 시청자정보 생성단계;를 포함하여 구성된다. In accordance with another aspect of the present invention, there is provided a viewer face tracking information generation method for controlling a stereoscopic feeling of a 3D display apparatus in response to at least one of a gaze direction and a gaze distance of a viewer, wherein the 3D display is performed. A face region detecting step of detecting a face region of the viewer from an image extracted from an image input through an image input means provided at one position of a device side; A gaze information generation step of generating gaze information by estimating at least one information of gaze direction and gaze distance of the viewer based on the detected face region; And generating viewer information by estimating at least one piece of information of the gender and the age of the viewer based on the detected face region.
본 발명의 다른 측면에 따르면, 상기 시청자 얼굴 추적정보 생성방법의 각 단계를 실행시키기 위한 프로그램을 기록한 컴퓨터로 읽을 수 있는 기록매체가 제공된다. According to another aspect of the present invention, there is provided a computer-readable recording medium recording a program for executing each step of the viewer face tracking information generation method.
본 발명의 또 다른 측면에 따르면, 상기 시청자 얼굴 추적정보 생성방법을 이용하여 입체감을 제어하는 3차원 디스플레이 장치가 제공된다. According to another aspect of the present invention, there is provided a three-dimensional display device for controlling the three-dimensional effect by using the viewer face tracking information generation method.
본 발명의 또 다른 측면에 따른 일실시예는, 시청자의 응시방향 및 응시거리 중 적어도 하나의 정보에 대응하여 3차원 디스플레이 장치의 입체감을 제어하기 위한 시청자 얼굴 추적정보 생성장치로서, 상기 3차원 디스플레이 장치 측의 일 위치에 구비된 영상입력수단을 통해 입력되는 영상에서 추출한 이미지로부터 상기 시청자의 얼굴영역을 검출하는 얼굴영역 검출모듈; 상기 검출된 얼굴영역에서 얼굴특징점을 검출하는 얼굴특징점 검출모듈; 3차원 표준 얼굴모델의 모델특징점을 변환하여 상기 얼굴특징점에 대응하는 3차원 시청자 얼굴모델을 생성하는 최적변환행렬을 추정하는 행렬 추정모듈; 및 상기 추정된 최적변환행렬에 근거하여 상기 시청자의 응시방향 및 응시거리 중 적어도 하나를 추정하여 시청자 얼굴 추적정보를 생성하는 추적정보 생성모듈;을 포함하여 구성된다. In accordance with another aspect of the present invention, there is provided a viewer face tracking information generation device for controlling a stereoscopic feeling of a 3D display device in response to at least one of a gaze direction and a gaze distance of a viewer, wherein the 3D display is provided. A face region detection module for detecting a face region of the viewer from an image extracted from an image input through an image input means provided at a position of a device side; A facial feature point detection module for detecting a facial feature point in the detected face area; A matrix estimation module for transforming a model feature point of a 3D standard face model to estimate an optimal transformation matrix for generating a 3D viewer face model corresponding to the face feature point; And a tracking information generation module for estimating at least one of a gaze direction and a gaze distance of the viewer based on the estimated optimal transformation matrix to generate viewer face tracking information.
본 발명의 또 다른 측면에 따른 일실시예는, 시청자의 응시방향 및 응시거리 중 적어도 하나의 정보에 대응하여 3차원 디스플레이 장치의 입체감을 제어하기 위한 시청자 얼굴 추적정보 생성장치로서, 상기 3차원 디스플레이 장치 측의 일 위치에 구비된 영상입력수단을 통해 입력되는 영상에서 추출한 이미지로부터 상기 시청자의 얼굴영역을 검출하는 수단; 상기 검출된 얼굴영역에 근거하여 상기 시청자의 응시방향 및 응시거리 중 적어도 하나의 정보를 추정하여 응시정보를 생성하는 수단; 및 상기 검출된 얼굴영역에 근거하여 상기 시청자의 성별 및 나이 중 적어도 하나의 정보를 추정하여 시청자정보를 생성하는 수단;을 포함하여 구성된다. In accordance with another aspect of the present invention, there is provided a viewer face tracking information generation device for controlling a stereoscopic feeling of a 3D display device in response to at least one of a gaze direction and a gaze distance of a viewer, wherein the 3D display is provided. Means for detecting a face region of the viewer from an image extracted from an image input through an image input means provided at a position on the apparatus side; Means for generating gaze information by estimating at least one of gaze direction and gaze distance of the viewer based on the detected face region; And means for estimating at least one of gender and age of the viewer based on the detected face region to generate viewer information.
상술한 바와 같은 본 발명은, 3차원 표준 얼굴모델의 모델특징점을 변환하여 얼굴영역의 얼굴특징점에 대응하는 3차원 시청자 얼굴모델을 생성하는 최적변환행렬을 이용하여 시청자의 응시방향 및 응시거리를 추정한다. As described above, the present invention estimates the gaze direction and gaze distance of a viewer by using an optimal transformation matrix for converting the model feature points of the 3D standard face model to generate a 3D viewer face model corresponding to the face feature points of the face region. do.
상술한 바와 같이 응시방향 및 응시거리를 추정하므로, 추적속도가 빨라 실시간 추적에 적합하고, 얼굴영역의 국부적 일그러짐에도 강인하게 얼굴영역을 추적할 수 있다는 이점이 있다. As described above, since the gaze direction and gaze distance are estimated, the tracking speed is high, so it is suitable for real-time tracking, and there is an advantage that the face area can be robustly tracked even in the local distortion of the face area.
또한, 검출된 얼굴영역이 유효한지 여부를 판정하고, 유효하다고 판정된 얼굴영역에 대해서 얼굴특징점을 검출하므로, 얼굴특징점의 검출 신뢰도가 높아 얼굴영역의 추적성능이 높아진다는 이점이 있다. In addition, since it is determined whether the detected face area is valid and face feature points are detected for the face area determined to be valid, there is an advantage that the detection reliability of the face feature point is high and the tracking performance of the face area is increased.
또한, 비정면 얼굴영역을 검출하기 위해 비대칭성의 하 라이크 피쳐(Haar-like feature)를 이용하므로, 비정면 얼굴에 대한 얼굴영역의 검출 신뢰도가 높아 얼굴영역의 추적성능이 높아진다는 이점이 있다. In addition, since asymmetric Haar-like features are used to detect non-frontal face regions, the detection reliability of the face region for non-frontal faces is high, which improves the tracking performance of the face region.
또한, 기본적으로 시청자의 응시방향 및 응시거리를 추정하여 응시방향정보 및 응시거리정보를 생성하고, 부가적으로 시청자의 성별 또는 나이 중 적어도 어느 하나를 추정하여 시청자정보를 생성한다. In addition, basically, the gaze direction and gaze distance of the viewer are estimated to generate gaze direction information and gaze distance information, and additionally, at least one of gender or age of the viewer is estimated to generate viewer information.
상술한 바와 같이, 상기 응시방향정보 및 응시거리정보뿐만 아니라 상기 시청자정보를 부가적으로 활용하여 3차원 디스플레이 장치의 입체감을 제어할 수 있도록 하므로, 더욱 정확한 입체감 조절이 가능하다는 이점이 있다. As described above, it is possible to control the stereoscopic sense of the 3D display device by additionally utilizing the viewer information as well as the gaze direction information and gaze distance information, and thus, there is an advantage that more accurate stereoscopic adjustment is possible.
또한, 시청자의 눈감김 여부를 추정하여, 3차원 디스플레이 장치를 시청하는 시청자의 눈이 감겨 있다고 추정된 경우에 3차원 디스플레이 장치의 화면출력을 OFF시키거나 재생을 중지시키기 위한 정보로 활용할 수 있다는 이점이 있다. In addition, by estimating whether or not the viewer's eyes are closed, when the viewer watching the 3D display device is estimated to be closed, the screen output of the 3D display device may be used as information for turning off or stopping playback. There is this.
또한, 하나의 영상입력수단(예를 들어, 카메라)만으로 시청자의 응시방향, 응시거리의 정확한 추적이 가능하다는 이점이 있다. In addition, there is an advantage that it is possible to accurately track the gaze direction, gaze distance of the viewer with only one image input means (for example, a camera).
도 1은 패시브 방식의 3D TV의 개략적인 구성을 도시한 구성도. 1 is a configuration diagram showing a schematic configuration of a passive 3D TV.
도 2는 패시브 방식의 3D TV를 정면에서 시청하는 상태를 도시한 상태도. 2 is a state diagram showing a state of watching a passive 3D TV from the front;
도 3은 패시브 방식의 3D TV를 측면에서 시청하는 상태를 도시한 상태도. 3 is a state diagram illustrating a state in which a passive 3D TV is viewed from the side;
도 4는 본 발명의 일실시예에 따른 시청자 얼굴 추적정보 생성장치의 개략적인 구성을 도시한 구성도. 4 is a block diagram showing a schematic configuration of a viewer face tracking information generating device according to an embodiment of the present invention.
도 5는 본 발명의 일실시예에 따른 시청자 얼굴 추적정보 생성과 관련하여, 3차원 표준 얼굴모델을 보여주는 사진. 5 is a picture showing a three-dimensional standard face model in connection with the viewer face tracking information generation according to an embodiment of the present invention.
도 6a는 본 발명의 일실시예에 따른 시청자 얼굴 추적정보 생성과 관련하여, UI모듈의 예시화면을 보여주는 제1사진. FIG. 6A is a first picture showing an example screen of a UI module in connection with generating viewer face tracking information according to an embodiment of the present invention. FIG.
도 6b는 본 발명의 일실시예에 따른 시청자 얼굴 추적정보 생성과 관련하여, UI모듈의 예시화면을 보여주는 제2사진. FIG. 6B is a second picture showing an example screen of a UI module in connection with generating viewer face tracking information according to an embodiment of the present invention. FIG.
도 7은 본 발명의 일실시예에 따른 시청자 얼굴 추적정보 생성방법의 과정을 도시한 순서도. 7 is a flowchart illustrating a process of a viewer face tracking information generation method according to an embodiment of the present invention.
도 8은 기존의 Haar-like feature의 기본 형태를 도시한 도면. FIG. 8 is a view showing the basic forms of conventional Haar-like features.
도 9는 본 발명의 일실시예에 따른 시청자 얼굴 추적정보 생성과 관련하여, 정면 얼굴 영역 검출을 위한 Haar-like feature의 예시 사진. FIG. 9 is an exemplary photograph of Haar-like features for detecting a frontal face region in relation to the generation of viewer face tracking information according to an embodiment of the present invention.
도 10은 본 발명의 일실시예에 따른 시청자 얼굴 추적정보 생성과 관련하여, 비정면 얼굴 영역 검출을 위한 Haar-like feature의 예시 사진. FIG. 10 is an exemplary photograph of Haar-like features for detecting a non-frontal face region in connection with generating viewer face tracking information according to an embodiment of the present invention.
도 11은 본 발명의 일실시예에 따른 시청자 얼굴 추적정보 생성과 관련하여, 새롭게 추가된 직4각 feature를 도시한 도면. FIG. 11 is a diagram illustrating newly added rectangular features in connection with generating viewer face tracking information according to an embodiment of the present invention.
도 12는 본 발명의 일실시예에 따른 시청자 얼굴 추적정보 생성과 관련하여, 비정면 얼굴 영역 검출을 위해 도 11에서 선택된 Haar-like feature의 예시 사진. FIG. 12 is an exemplary photograph of the Haar-like features selected from FIG. 11 for detecting a non-frontal face region in connection with generating viewer face tracking information according to an embodiment of the present invention.
도 13은 기존의 Haar-like feature와 본 발명에 적용된 Haar-like feature에 대한 Training Set에서의 feature 확률곡선. FIG. 13 shows feature probability curves in a training set for conventional Haar-like features and the Haar-like features applied to the present invention.
도 14는 비정면얼굴의 Training Set에서 새로 추가한 특징들과 기존 Haar-like feature의 확률곡선의 분산과 Kurtosis의 평균값을 도시한 표. FIG. 14 is a table showing the variance of the probability curves and the mean Kurtosis for the newly added features and the conventional Haar-like features in the training set of non-frontal faces.
도 15는 해상도가 낮거나 화질이 나쁜 화상에 대해 기존 ASM방법에 적용된 프로필사진. 15 is a profile picture applied to the conventional ASM method for a low-resolution or poor image quality.
도 16은 본 발명의 표식점탐색을 위한 Adaboost에 이용되는 각 표식점주변의 패턴사진. 16 is a photograph of the pattern around each marker point used in Adaboost for marker point search of the present invention.
도 17은 본 발명의 일실시예에 따른 시청자 얼굴 추적정보 생성과 관련하여, 얼굴의 28개 특징점을 표시한 사진. FIG. 17 is a photograph showing 28 feature points of a face in connection with generating viewer face tracking information according to an embodiment of the present invention. FIG.
도 18은 본 발명의 일실시예에 따른 시청자 얼굴 추적정보 생성방법의 행렬 추정과정을 도시한 순서도. 18 is a flowchart illustrating a matrix estimation process of a method for generating viewer face tracking information according to an embodiment of the present invention.
도 19는 본 발명의 일실시예에 따른 시청자 얼굴 추적정보 생성방법의 성별 추정과정을 도시한 순서도. 19 is a flowchart illustrating a gender estimation process of a method for generating viewer face tracking information according to an embodiment of the present invention.
도 20은 본 발명의 일실시예에 따른 시청자 얼굴 추적정보 생성방법의 성별 추정과정에서 성별추정용 얼굴영역을 정의하기 위한 예시사진. 20 is an exemplary photograph for defining a gender estimation face area in the gender estimation process of the viewer face tracking information generation method according to an embodiment of the present invention.
도 21은 본 발명의 일실시예에 따른 시청자 얼굴 추적정보 생성방법의 나이 추정과정을 도시한 순서도. 21 is a flowchart illustrating an age estimation process of a method for generating viewer face tracking information according to an embodiment of the present invention.
도 22는 본 발명의 일실시예에 따른 시청자 얼굴 추적정보 생성방법의 나이 추정과정에서 나이추정용 얼굴영역을 정의하기 위한 예시사진. 22 is an exemplary photograph for defining an age estimation face region in an age estimation process of a method for generating viewer face tracking information according to an embodiment of the present invention.
도 23은 본 발명의 일실시예에 따른 시청자 얼굴 추적정보 생성방법의 눈감김 추정과정을 도시한 순서도. 23 is a flowchart illustrating a process of estimating eye closure of a method of generating viewer face tracking information according to an embodiment of the present invention.
도 24는 본 발명의 일실시예에 따른 시청자 얼굴 추적정보 생성방법의 눈감김 추정과정에서 눈감김추정용 얼굴영역을 정의하기 위한 예시사진. 24 is an exemplary photograph for defining a face region for eye closure estimation in a process of eyelid estimation of a method for generating viewer face tracking information according to an embodiment of the present invention.
도 25는 본 발명의 일실시예에 따른 시청자 얼굴 추적정보 생성과 관련하여, 영상입력수단의 좌표계(카메라 좌표계)를 설명하기 위한 평면도. 25 is a plan view for explaining a coordinate system (camera coordinate system) of the image input means in connection with generating the viewer face tracking information according to an embodiment of the present invention.
본 발명은 그 기술적 사상 또는 주요한 특징으로부터 벗어남이 없이 다른 여러가지 형태로 실시될 수 있다. The present invention can be embodied in many other forms without departing from the spirit or main features thereof.
따라서, 본 발명의 실시예들은 모든 점에서 단순한 예시에 지나지 않으며 한정적으로 해석되어서는 안된다.Therefore, the embodiments of the present invention are merely examples in all respects and should not be interpreted limitedly.
제1, 제2 등의 용어는 다양한 구성요소들을 설명하는데 사용될 수 있지만, 상기 구성요소들은 상기 용어들에 의해 한정되어서는 안 된다. Terms such as first and second may be used to describe various components, but the components should not be limited by the terms.
상기 용어들은 하나의 구성요소를 다른 구성요소로부터 구별하는 목적으로만 사용된다. The terms are used only for the purpose of distinguishing one component from another.
예를 들어, 본 발명의 권리 범위를 벗어나지 않으면서 제1 구성요소는 제2 구성요소로 명명될 수 있고, 유사하게 제2 구성요소도 제1 구성요소로 명명될 수 있다. For example, without departing from the scope of the present invention, the first component may be referred to as the second component, and similarly, the second component may also be referred to as the first component.
및/또는 이라는 용어는 복수의 관련된 기재된 항목들의 조합 또는 복수의 관련된 기재된 항목들 중의 어느 항목을 포함한다.The term and / or includes a combination of a plurality of related items or any item of a plurality of related items.
어떤 구성요소가 다른 구성요소에 "연결되어" 있다거나 "접속되어" 있다고 언급된 때에는, 그 다른 구성요소에 직접적으로 연결되어 있거나 또는 접속되어 있을 수도 있지만, 중간에 다른 구성요소가 존재할 수도 있다고 이해되어야 할 것이다. When a component is referred to as being "connected" or "connected" to another component, it may be directly connected to or connected to that other component, but it may be understood that other components may be present in between. Should be.
반면에, 어떤 구성요소가 다른 구성요소에 "직접 연결되어" 있다거나 "직접 접속되어" 있다고 언급된 때에는, 중간에 다른 구성요소가 존재하지 않는 것으로 이해되어야 할 것이다.On the other hand, when a component is said to be "directly connected" or "directly connected" to another component, it should be understood that there is no other component in between.
본 출원에서 사용한 용어는 단지 특정한 실시예를 설명하기 위해 사용된 것으로, 본 발명을 한정하려는 의도가 아니다. The terminology used herein is for the purpose of describing particular example embodiments only and is not intended to be limiting of the present invention.
단수의 표현은 문맥상 명백하게 다르게 뜻하지 않는 한, 복수의 표현을 포함한다. Singular expressions include plural expressions unless the context clearly indicates otherwise.
본 출원에서, "포함하다" 또는 "구비하다", "가지다" 등의 용어는 명세서상에 기재된 특징, 숫자, 단계, 동작, 구성요소, 부품 또는 이들을 조합한 것이 존재함을 지정하려는 것이다. In this application, the terms "comprise", "comprise", "have" and the like are intended to indicate that there are features, numbers, steps, operations, components, parts, or combinations thereof described in the specification.
그러므로, 하나 또는 그 이상의 다른 특징들이나 숫자, 단계, 동작, 구성요소, 부품 또는 이들을 조합한 것들의 존재 또는 부가 가능성을 미리 배제하지 않는 것으로 이해되어야 한다.Therefore, it should be understood that it does not exclude in advance the possibility of the presence or addition of one or more other features or numbers, steps, operations, components, parts or combinations thereof.
다르게 정의되지 않는 한, 기술적이거나 과학적인 용어를 포함해서 여기서 사용되는 모든 용어들은 본 발명이 속하는 기술 분야에서 통상의 지식을 가진 자에 의해 일반적으로 이해되는 것과 동일한 의미를 가지고 있다. Unless defined otherwise, all terms used herein, including technical or scientific terms, have the same meaning as commonly understood by one of ordinary skill in the art.
일반적으로 사용되는 사전에 정의되어 있는 것과 같은 용어들은 관련 기술의 문맥상 가지는 의미와 일치하는 의미를 가지는 것으로 해석되어야 하며, 본 출원에서 명백하게 정의하지 않는 한, 이상적이거나 과도하게 형식적인 의미로 해석되지 않는다.Terms such as those defined in the commonly used dictionaries should be construed as having meanings consistent with the meanings in the context of the related art, and are not construed in ideal or excessively formal meanings unless expressly defined in this application. Do not.
이하, 첨부된 도면을 참조하여 본 발명에 따른 바람직한 실시예를 상세히 설명하되, 도면 부호에 관계없이 동일하거나 대응하는 구성 요소는 동일한 참조 번호를 부여하고 이에 대한 중복되는 설명은 생략하기로 한다. Hereinafter, exemplary embodiments of the present invention will be described in detail with reference to the accompanying drawings, and the same or corresponding components will be denoted by the same reference numerals regardless of the reference numerals and redundant description thereof will be omitted.
본 발명을 설명함에 있어서 관련된 공지 기술에 대한 구체적인 설명이 본 발명의 요지를 흐릴 수 있다고 판단되는 경우 그 상세한 설명을 생략한다.In the following description of the present invention, if it is determined that the detailed description of the related known technology may obscure the gist of the present invention, the detailed description thereof will be omitted.
도 4는 본 발명의 일실시예에 따른 시청자 얼굴 추적정보 생성장치의 개략적인 구성을 도시한 구성도이다. 4 is a block diagram showing a schematic configuration of a viewer face tracking information generating device according to an embodiment of the present invention.
시청자의 응시방향 및 응시거리 중 적어도 하나의 정보에 대응하여 3차원 디스플레이 장치의 입체감을 제어하기 위한 시청자 얼굴 추적정보 생성장치가 개시된다. Disclosed is a viewer face tracking information generating apparatus for controlling a stereoscopic feeling of a 3D display device in response to at least one of a gaze direction and a gaze distance of a viewer.
시청자 얼굴 추적정보 생성장치는 중앙처리유닛, 시스템 DB, 시스템 메모리, 인터페이스 등의 컴퓨팅 요소를 구비한다. The viewer face tracking information generating device includes a computing element such as a central processing unit, a system DB, a system memory, and an interface.
시청자 얼굴 추적정보 생성장치는 3D TV와 같은 3차원 디스플레이 장치에 제어 신호 송수신이 가능하도록 연결된 통상의 컴퓨터 시스템이 될 수 있다. The viewer face tracking information generating device may be a conventional computer system connected to a 3D display device such as a 3D TV to transmit and receive a control signal.
시청자 얼굴 추적정보 생성장치는 상술한 통상의 컴퓨터 시스템에 시청자 얼굴 추적정보 생성 프로그램의 설치 및 구동에 의해 시청자 얼굴 추적정보 생성장치로서 기능되는 것으로 볼 수 있다. The viewer face tracking information generating apparatus can be regarded as functioning as the viewer face tracking information generating apparatus by installing and driving the viewer face tracking information generating program in the above-described conventional computer system.
다른 관점에서, 본 실시예의 시청자 얼굴 추적정보 생성장치는, 3D TV와 같은 3차원 디스플레이 장치에 임베디드 장치 형태로 구성될 수도 있다. In another aspect, the viewer face tracking information generation device of the present embodiment may be configured in the form of an embedded device in a three-dimensional display device such as a 3D TV.
이러한 컴퓨터 시스템의 통상적 구성에 대한 설명은 생략하며, 이하에서는 본 발명의 실시예의 설명에 필요한 기능 관점의 구성을 중심으로 설명한다. A description of the general configuration of such a computer system is omitted, and the following description will focus on the configuration of functional aspects required for the description of the embodiments of the present invention.
시청자 얼굴 추적정보 생성장치는 얼굴영역 검출모듈(100)을 구비한다. The viewer face tracking information generating device includes a face region detection module 100.
상기 얼굴영역 검출모듈(100)은, 상기 3차원 디스플레이 장치 측의 일 위치에 구비된 영상입력수단(10), 예를 들어, 카메라를 통해 입력되는 영상에서 이미지 캡쳐부(20)가 캡쳐하여 추출한 이미지로부터 상기 시청자의 얼굴영역을 검출한다. The face region detection module 100 is captured by the image capture unit 20 captured by an image input unit 10, for example, an image input through a camera, provided at a position of the 3D display apparatus. The facial region of the viewer is detected from the image.
이때, 검출 보기각도는 -90 ~ +90 범위의 모든 얼굴들이 될 수 있다. In this case, the detection viewing angle may be all faces in the range of -90 to +90.
상기 영상입력수단(10)은, 예를 들어, 도 25에 도시된 바와 같이, 3D TV(1)의 정중앙부 상단 또는 하단 측에 설치될 수 있다. For example, as illustrated in FIG. 25, the image input means 10 may be installed at the top or bottom side of the center portion of the 3D TV 1.
상기 영상입력수단(10)은, 실시간으로 TV화면 전방에 위치한 시청자의 얼굴을 동영상으로 촬영할 수 있는 카메라, 더욱 바람직하게는, 이미지센서가 부착된 디지털 카메라가 될 수 있다. The image input means 10 may be a camera capable of capturing a face of a viewer located in front of a TV screen in real time as a video, and more preferably, a digital camera having an image sensor.
본 실시예의 영상입력수단(10)은 하나만 구비되어도 후술하는 시청자 얼굴 추적정보를 생성할 수 있다. Even if only one image input means 10 of the present embodiment is provided, the viewer face tracking information described later may be generated.
상기 얼굴영역 검출모듈(100)은, 상기 추출된 이미지의 RGB 색 정보로부터 YCbCr 색 모델을 작성하고, 작성된 색 모델에서 색 정보와 밝기 정보를 분리하며, 상기 밝기 정보에 의하여 얼굴후보영역을 검출하는 기능을 수행한다. The face area detection module 100 generates a YCbCr color model from the RGB color information of the extracted image, separates color information and brightness information from the created color model, and detects a face candidate area based on the brightness information. Perform the function.
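다음은 위에서 설명한 RGB→YCbCr 변환의 간단한 예시이다. The following is a minimal, hypothetical C++ sketch of the RGB-to-YCbCr conversion described above; it is not part of the original disclosure, and it assumes the common ITU-R BT.601 style coefficients, whereas the actual coefficients and the brightness/skin-color criteria used to obtain the face candidate region are implementation choices.
struct YCbCr { unsigned char y, cb, cr; };
// Convert one RGB pixel to YCbCr, separating brightness (Y) from chrominance (Cb, Cr).
static YCbCr RgbToYCbCr(unsigned char r, unsigned char g, unsigned char b)
{
    double y  =  0.299 * r + 0.587 * g + 0.114 * b;
    double cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b;
    double cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b;
    YCbCr out;
    out.y  = static_cast<unsigned char>(y);
    out.cb = static_cast<unsigned char>(cb);
    out.cr = static_cast<unsigned char>(cr);
    return out;
}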
상기 얼굴영역 검출모듈(100)은, 상기 검출된 얼굴후보영역에 대한 4각 특징점 모델을 정의하고, 상기 4각 특징점 모델을 AdaBoost 학습 알고리즘에 의하여 학습시킨 학습자료에 기초하여 얼굴영역을 검출하는 기능을 수행한다. The face region detection module 100 defines a quadrilateral feature point model for the detected face candidate region, and detects the face region based on the training material learned by the AdaBoost learning algorithm. Do this.
상기 얼굴영역 검출모듈(100)은, 상기 AdaBoost의 결과값의 크기가 소정임계값을 초과하는 경우에 상기 검출된 얼굴영역을 유효한 얼굴영역으로 판정하는 기능을 수행한다. The face area detection module 100 performs a function of determining the detected face area as a valid face area when the size of the result value of the AdaBoost exceeds a predetermined threshold value.
시청자 얼굴 추적정보 생성장치는 또한, 얼굴특징점 검출모듈(200)을 구비한다. The viewer face tracking information generation device also includes a face feature point detection module 200.
상기 얼굴특징점 검출모듈(200)은, 상기 얼굴영역 검출모듈(100)에서 유효하다고 판단된 얼굴영역들에 대하여 얼굴특징점 검출을 진행한다. The facial feature point detection module 200 performs facial feature point detection on face areas determined to be valid in the face area detection module 100.
상기 얼굴특징점 검출모듈(200)은, 얼굴 보기회전각도를 포함한, 예를 들어, 눈썹, 눈, 코, 입의 각 위치에 대한 정의가 가능한 28개의 얼굴특징점을 검출할 수 있다. The facial feature detection module 200 may detect 28 facial feature points, including, for example, a face viewing rotation angle, for which each position of an eyebrow, an eye, a nose, and a mouth can be defined.
본 실시예에서, 바람직하게는 기본 얼굴특징점인 눈4개, 코2개, 입2개의 총 8개의 특징점을 얼굴특징점으로서 검출할 수 있다. In this embodiment, a total of eight feature points, preferably four eyes, two noses, and two mouths, which are basic facial feature points, can be detected as facial feature points.
시청자 얼굴 추적정보 생성장치는 또한, 행렬 추정모듈(300)을 구비한다. The viewer face tracking information generation device also includes a matrix estimation module 300.
상기 행렬 추정모듈(300)은, 3차원 표준 얼굴모델의 모델특징점을 변환하여 상기 얼굴특징점에 대응하는 3차원 시청자 얼굴모델을 생성하는 최적변환행렬을 추정한다. The matrix estimation module 300 estimates an optimal transformation matrix for generating a 3D viewer face model corresponding to the face feature by converting a model feature point of the 3D standard face model.
여기서, 상기 3차원 표준 얼굴모델은, 도 5에 도시된 바와 같이, 331개의 점과 630개의 삼각형으로 구성된 3D 메쉬 형태의 모형이 될 수 있다. Here, the 3D standard face model may be a 3D mesh model composed of 331 points and 630 triangles, as shown in FIG. 5.
시청자 얼굴 추적정보 생성장치는 또한, 추적정보 생성모듈(400)을 구비한다. The viewer face tracking information generation device also includes a tracking information generation module 400.
상기 추적정보 생성모듈(400)은, 상기 최적변환행렬에 근거하여 상기 시청자의 응시방향 및 응시거리 중 적어도 하나를 추정하여 시청자 얼굴 추적정보를 생성한다. The tracking information generation module 400 estimates at least one of the gaze direction and gaze distance of the viewer based on the optimal transformation matrix to generate viewer face tracking information.
시청자 얼굴 추적정보 생성장치는 또한, 성별 추정모듈(500)을 구비한다. The viewer face tracking information generation device also includes a gender estimation module 500.
상기 성별 추정모듈(500)은 상기 검출된 얼굴영역을 이용하여 상기 시청자의 성별을 추정한다. The gender estimating module 500 estimates the gender of the viewer using the detected face region.
상기 성별 추정모듈(500)은 상기 검출된 얼굴영역에서 성별 추정용 얼굴영역을 잘라내는 기능, 잘라낸 얼굴영역 이미지를 정규화하는 기능, 정규화된 이미지를 이용하여 SVM(Support Vector Machine)에 의한 성별추정 기능을 수행한다. The gender estimating module 500 cuts out a gender estimation face area from the detected face area, normalizes the cut out face area image, and estimates a sex by a SVM (Support Vector Machine) using the normalized image. Do this.
시청자 얼굴 추적정보 생성장치는 또한, 나이 추정모듈(600)을 구비한다. The viewer face tracking information generation device also includes an age estimation module 600.
상기 나이 추정모듈(600)은 상기 검출된 얼굴영역을 이용하여 상기 시청자의 나이를 추정한다. The age estimation module 600 estimates the age of the viewer using the detected face region.
상기 나이 추정모듈(600)은 상기 검출된 얼굴영역에서 나이 추정용 얼굴영역을 잘라내는 기능을 수행한다. The age estimation module 600 cuts out an age estimation face area from the detected face area.
상기 나이 추정모듈(600)은 잘라낸 얼굴영역 이미지를 정규화하는 기능을 수행한다. The age estimation module 600 performs a function of normalizing the cropped face region image.
상기 나이 추정모듈(600)은 정규화된 이미지로부터 입력벡터를 구성하고 나이다양체 공간으로 사영하는 기능을 수행한다. The age estimation module 600 constructs an input vector from the normalized image and projects it onto an age-manifold space.
상기 나이 추정모듈(600)은 2차 다항식 회귀를 이용하여 나이를 추정하는 기능을 수행한다. The age estimation module 600 performs a function of estimating age using a second order polynomial regression.
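이 나이 추정 과정의 간단한 예시는 다음과 같다. A minimal illustrative C++ sketch of this age estimation step is given below; it is not part of the original disclosure. It assumes, for simplicity, a one-dimensional age-manifold coordinate obtained by a learned linear projection W and quadratic regression coefficients c0, c1, c2 learned offline; all names are hypothetical.
#include <vector>
// Project a normalized face-image vector onto a learned age-manifold coordinate t
// and estimate the age with a second-order polynomial regression in t.
double EstimateAge(const std::vector<double>& inputVec,
                   const std::vector<double>& W,   // learned projection (assumed 1-D manifold)
                   double c0, double c1, double c2)
{
    double t = 0.0;
    for (size_t i = 0; i < inputVec.size(); ++i)
        t += W[i] * inputVec[i];
    return c0 + c1 * t + c2 * t * t;               // 2nd-order polynomial regression
}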
시청자 얼굴 추적정보 생성장치는 또한, 눈감김 추정모듈(700)을 구비한다. The viewer face tracking information generation device also includes an eyelid estimation module 700.
상기 눈감김 추정모듈(700)은 상기 검출된 얼굴영역을 이용하여 상기 시청자의 눈감김을 추정한다. The eyelid estimation module 700 estimates the eyelids of the viewer using the detected face region.
상기 눈감김 추정모듈(700)은 눈감김 추정용 얼굴영역을 잘라내는 기능, 잘라낸 얼굴영역 이미지를 정규화하는 기능, 정규화된 이미지를 이용하여 SVM(Support Vector Machine)에 의한 눈감김추정 기능을 수행한다. The eyelid estimation module 700 performs a function of cutting a face region for eyelid estimation, a function of normalizing the cut-out face region image, and an eyelid estimation function by a support vector machine (SVM) using the normalized image. .
시청자 얼굴 추적정보 생성장치는 또한, 상기 3차원 디스플레이 장치의 일측에 구비된 영상입력수단(10)의 설정(도 6a), 검출한 얼굴영역 및 나이/성별 결과 등을 디스플레이(도 6b)할 수 있도록 하는 UI(30, User Interface) 모듈을 구비한다. The viewer face tracking information generating apparatus may also display the setting of the image input means 10 provided on one side of the 3D display apparatus (FIG. 6A), the detected face region, the age / gender result, and the like (FIG. 6B). It is provided with a UI (User Interface) module.
도 7은 본 발명의 일실시예에 따른 시청자 얼굴 추적정보 생성방법의 과정을 도시한 순서도이다. 7 is a flowchart illustrating a process of generating a viewer face tracking information according to an embodiment of the present invention.
도시된 바와 같이 본 실시예에 의한 시청자 얼굴 추적정보 생성방법은, 생성 과정의 시작 단계로부터 출발하여, 얼굴영역 검출단계(S100), 얼굴특징점 검출단계(S200), 행렬 추정단계(S300), 추적정보 생성단계(S400), 성별 추정단계(S500), 나이 추정단계(S600), 눈감김 추정단계(S700), 결과 출력단계(S800)를 거쳐 종료 단계로 이뤄진다. As shown, the viewer face tracking information generation method according to the present embodiment starts from the start of the generation process and proceeds through a face region detection step (S100), a facial feature point detection step (S200), a matrix estimation step (S300), a tracking information generation step (S400), a gender estimation step (S500), an age estimation step (S600), an eye-closure estimation step (S700), and a result output step (S800), and then ends.
상기 얼굴영역 검출단계(S100)에서는, 상기 3차원 디스플레이 장치 측의 일 위치에 구비된 영상입력수단을 통해 입력되는 영상에서 추출한 이미지로부터 상기 시청자의 얼굴영역을 검출한다. In the face region detection step (S100), the face region of the viewer is detected from an image extracted from an image input through an image input means provided at one position of the 3D display apparatus.
얼굴 검출을 위한 방법으로서, 예를 들어, 지식기반 방법(Knowledge-based), 특징기반방법(feature-based), 형판 정합(template-matching) 방법, 외형기반(Appearance-based)방법 등이 있다.As a method for face detection, for example, a knowledge-based method, a feature-based method, a template-matching method, an appearance-based method, and the like.
바람직하게, 본 실시예에서는 외형기반(Appearance-based)방법을 사용한다. Preferably, in this embodiment, an appearance-based method is used.
외형기반방법은 상이한 영상들에서 얼굴영역과 비얼굴영역을 획득하며, 획득된 영역들을 학습하여 학습모델을 만들고, 입력 영상과 학습모델자료를 비교하여 얼굴을 검출하는 방법이다. The appearance-based method is a method of acquiring a face region and a non-face region from different images, learning the acquired regions to make a learning model, and comparing the input image and the learning model data to detect a face.
상기 외형기반방법은 정면 및 측면 얼굴 검출에 대해서 비교적 성능이 높은 방법으로 알려져 있다.The appearance-based method is known as a relatively high performance method for front and side face detection.
이러한 얼굴검출과 관련하여, Jianxin Wu, S.Charles Brubaker, Matthew D.Mullin, and James M.Rehg의 논문, "Fast Asymmetric Learning for Cascade Face Detection,"(IEEE Tran- saction on Pattern Analysis and Machine Intelligence, Vol.30, No.3, MARCH 2008.)와, Paul Viola, Michael Jones, "Rapid Object Detection using a Boosted Cascade of Simple Features"(Accepted Conference on Computer Vision and Pattern Recognition 2001.)등을 통해 이해될 수 있다.Regarding this face detection, Jianxin Wu, S. Charles Brubaker, Matthew D. Mullin, and James M. Rehg, "Fast Asymmetric Learning for Cascade Face Detection," by IEEE Transcription on Pattern Analysis and Machine Intelligence, 30, No. 3, MARCH 2008.), and Paul Viola, Michael Jones, "Rapid Object Detection using a Boosted Cascade of Simple Features" (Accepted Conference on Computer Vision and Pattern Recognition 2001.). have.
상기 영상입력수단을 통해 입력되는 영상에서의 이미지 추출은, 예를 들어, DirectX의 샘플 그래버(SampleGrabber)를 이용하여 영상입력수단을 통해 입력되는 영상에서 이미지를 캡쳐하는 방식으로 이뤄질 수 있다. Image extraction from an image input through the image input means may be performed by capturing an image from an image input through the image input means, for example, using a sample grabber of DirectX.
상기 영상입력수단을 통해 입력되는 영상에서의 이미지 추출에 관한 바람직한 일예로서, 샘플 그래버의 미디어형식(MediaType)을 RGB24로 설정할 수 있다. As a preferred example of extracting an image from an image input through the image input means, the media type of the sample grabber may be set to RGB24.
한편, 영상입력수단의 영상포멧(format)이 RGB24와 다른 경우 샘플 그래버 필터의 앞단에 비디오 컨버터 필터(videoconverter filter)가 자동으로 붙어 최종적으로 샘플 그래버에서 캡쳐되는 이미지가 RGB24가 되도록 할 수 있다. On the other hand, when the image format of the image input means is different from RGB24, a video converter filter is automatically attached to the front of the sample grabber filter so that the image captured by the sample grabber finally becomes RGB24.
예를 들어, E.g,
AM_MEDIA_TYPE mt;
// Set the media type to Sample Grabber
ZeroMemory(&mt, sizeof(AM_MEDIA_TYPE));
mt.formattype = FORMAT_VideoInfo;
mt.majortype = MEDIATYPE_Video;
mt.subtype = MEDIASUBTYPE_RGB24;  // only accept 24-bit bitmaps
hr = pSampleGrabber->SetMediaType(&mt);
와 같이 구성될 수 있다. It can be configured as.
한편, 본 실시예의 얼굴 영역 검출은, (a1) 상기 추출된 이미지의 RGB 색 정보로부터 YCbCr 색 모델을 작성하고, 작성된 색 모델에서 색 정보와 밝기 정보를 분리하며, 상기 밝기 정보에 의하여 얼굴후보영역을 검출하는 단계; (a2) 상기 검출된 얼굴후보영역에 대한 4각 특징점 모델을 정의하고, 상기 4각 특징점 모델을 AdaBoost 학습 알고리즘에 의하여 학습시킨 학습자료에 기초하여 얼굴영역을 검출하는 단계; 및 (a3) 상기 AdaBoost의 결과값(하기 수학식1의 CFH(x))의 크기가 소정임계값을 초과하는 경우에 상기 검출된 얼굴영역을 유효한 얼굴영역으로 판정하는 단계;를 포함하여 구성된다. Meanwhile, in the face area detection of the present embodiment, (a1) a YCbCr color model is generated from the RGB color information of the extracted image, color information and brightness information are separated from the generated color model, and the face candidate area is determined by the brightness information. Detecting; (a2) defining a quadrilateral feature point model for the detected face candidate region, and detecting a face region based on learning data trained by the AdaBoost learning algorithm on the quadrilateral feature point model; And (a3) determining the detected face area as a valid face area when the size of the result value of AdaBoost (CF H (x) of Equation 1) exceeds a predetermined threshold value. do.
[수학식1][Equation 1]
CF_H(x) = Σ_{m=1}^{M} h_m(x) - θ
(단, M: 강분류기를 구성하고 있는 전체 약분류기의 개수 (where M: the number of weak classifiers constituting the strong classifier,
h_m(x): m번째 약분류기에서의 출력값 h_m(x): the output value of the m-th weak classifier,
θ: 강분류기의 오류판정률을 보다 세밀하게 조절하는데 이용되는 값으로써 경험적으로 설정한다.) θ: a value used to adjust the error rate of the strong classifier more finely; it is set empirically.)
AdaBoost 학습알고리즘은 약분류기의 선형적인 결합을 통하여 최종적으로 높은 검출 성능을 가지는 강분류기를 생성하는 알고리즘으로 알려져 있다. The AdaBoost learning algorithm is known as an algorithm that generates a strong classifier with high detection performance through linear combination of weak classifiers.
본 실시예에서는 비정면얼굴에서의 검출성능을 보다 높이기 위해 기존의 대칭적인 Haar-Like feature 뿐만아니라 비정면얼굴의 비대칭특성을 고려한 새로운 feature들을 더 포함한다. In this embodiment, in order to further improve the detection performance in the non-face face, as well as the existing symmetrical Haar-Like feature, it further includes new features considering the asymmetry characteristic of the face.
정면얼굴화상에서는 눈, 코, 입과 같이 얼굴의 고유한 구조적 특성들이 화상에 전반적으로 골고루 분포되어 있으며 대칭적이다. In frontal face images, the structural features unique to the face, such as eyes, nose and mouth, are evenly distributed throughout the image and are symmetrical.
그러나, 비정면얼굴화상에서는 대칭적이지 못하고 좁은 범위에 밀집되어 있으며 얼굴윤곽이 직선이 아니므로 배경영역이 많이 섞어져 있다. However, in the non-facial face image, it is not symmetrical and is concentrated in a narrow range. Since the face outline is not a straight line, the background area is mixed.
따라서 기존의 대칭적인 Haar-Like feature 들만으로는 비정면얼굴에 대한 높은 검출성능을 얻을 수 없는 문제점이 있다. Therefore, there is a problem that high detection performance for the non-frontal face cannot be obtained only with the existing symmetric Haar-Like features.
이러한 문제점을 극복하기 위해, 본 실시예에서는 기존의 Haar-like feature와 비슷하면서도 비대칭성을 부가한 새로운 Haar-Like feature 들을 더 포함한다. In order to overcome this problem, the present embodiment further includes new Haar-Like features similar to the existing Haar-like features but adding asymmetry.
이와 관련하여, 도 8은 기존의 Haar-like feature의 기본형태들이고, 도 9는 본 발명의 실시예에 의한 정면 얼굴 영역 검출을 위하여 선택된 Haar-like feature 들의 예시 사진이며, 도 10은 비정면 얼굴 영역 검출을 위하여 선택된 Haar-like feature 들의 예시 사진이다. In this regard, FIG. 8 shows the basic forms of conventional Haar-like features, FIG. 9 is an exemplary photograph of Haar-like features selected for frontal face region detection according to an embodiment of the present invention, and FIG. 10 is an exemplary photograph of Haar-like features selected for non-frontal face region detection.
도 11은 본 실시예에 의하여 새롭게 추가된 직4각 Haar-Like feature 를 보여주고 있으며, 도 12는 도 11의 Haar-Like feature 중 비정면얼굴검출을 위해 선택된 Haar-Like feature 들의 예시를 보여주고 있다. FIG. 11 shows a rectangular Haar-Like feature newly added by the present embodiment, and FIG. 12 shows an example of Haar-Like features selected for non-face detection among the Haar-Like features of FIG. 11. have.
본 실시예의 Haar-Like feature는 기존의 대칭적인 Haar-Like feature와 다르게 도 12에 도시된 바와 같이, 비대칭적인 형태, 구조, 모양으로 구성되어 비정면얼굴의 구조적 특성을 잘 반영하도록 구성되며, 비정면 얼굴에 대한 검출효과가 뛰어나다. Unlike the conventional symmetric Haar-Like feature, the Haar-Like feature of the present embodiment is configured to asymmetrically form, structure, and shape as shown in FIG. Excellent detection effect on the front face.
도 13은 기존의 Haar-like feature와 본 실시예에 적용된 Haar-like feature에 대한 Training Set에서의 Haar-like feature 확률곡선이다. FIG. 13 shows the Haar-like feature probability curves in the training set for the conventional Haar-like features and the Haar-like features applied to this embodiment.
ㄱ)은 본 실시예의 경우, ㄴ)은 기존의 경우이며, 도시된 바와 같이, 본 실시예의 경우에 해당하는 확률곡선이 보다 좁은 범위에 밀집되어 있다. A) is the present case, b) is the existing case, and as shown, the probability curve corresponding to the case of the present embodiment is concentrated in a narrower range.
이것은 베이스분류규칙에 비추어 볼 때 본 실시예에서 추가된 Haar-Like feature 들이 비정면얼굴검출에서 효과적이라는 것을 의미한다. This means that the Haar-Like features added in this embodiment are effective in the face detection in view of the base classification rule.
도 14는 비정면얼굴의 Training Set에서 새로 추가한 특징들과 기존 Haar-like feature의 확률곡선의 분산과 Kurtosis의 평균값을 도시한 표이다. FIG. 14 is a table showing the variance of the probability curves and the mean Kurtosis for the newly added features and the conventional Haar-like features in the training set of non-frontal faces.
상기 표는 비정면얼굴의 Training Set에서 새로 추가한 Haar-Like feature 들과 기존 Haar-Like feature 들의 확률곡선의 분산과 Kurtosis의 평균값을 보여주고 있다. The table shows the variances and probability values of Kurtosis of the probability curves of the newly added Haar-Like features and existing Haar-Like features in the training set of the non-facial face.
본 실시예에서 추가된 Haar-Like feature 들이 분산이 작고 Kurtosis가 크며 이것은 검출에서 효과적이라는 것을 알 수 있다. The Haar-Like features added in this example show that the dispersion is small and the Kurtosis is large, which is effective in detection.
상술한 바와 같이, 상기 (a2) 단계에서, 상기 얼굴영역 검출을 위한 하 라이크 피쳐(Haar-like feature)는 비정면 얼굴영역을 검출하기 위한 비대칭성의 하 라이크 피쳐(Haar-like feature)를 더욱 포함한다. As described above, in step (a2), the Haar-like features used for face region detection further include asymmetric Haar-like features for detecting non-frontal face regions.
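아래는 Haar-like feature 계산 방식의 예시이다. The sketch below (C++, not part of the original disclosure) illustrates how rectangle-based Haar-like features, whether symmetric or asymmetric, are commonly evaluated in constant time with an integral image; the specific pair of rectangles is only an example and not one of the features of FIG. 11.
#include <vector>
// Integral image: ii(x, y) holds the sum of all pixels above and to the left of (x, y),
// so the sum over any axis-aligned rectangle needs only four lookups.
struct IntegralImage {
    int w, h;
    std::vector<long long> ii;                       // (w + 1) x (h + 1) table
    IntegralImage(const unsigned char* img, int width, int height)
        : w(width), h(height), ii((width + 1) * (height + 1), 0)
    {
        for (int y = 0; y < h; ++y)
            for (int x = 0; x < w; ++x)
                ii[(y + 1) * (w + 1) + (x + 1)] =
                    img[y * w + x]
                    + ii[y * (w + 1) + (x + 1)]
                    + ii[(y + 1) * (w + 1) + x]
                    - ii[y * (w + 1) + x];
    }
    long long RectSum(int x, int y, int rw, int rh) const {
        return ii[(y + rh) * (w + 1) + (x + rw)] - ii[y * (w + 1) + (x + rw)]
             - ii[(y + rh) * (w + 1) + x] + ii[y * (w + 1) + x];
    }
};
// Value of a two-rectangle Haar-like feature: sum over the "white" rectangle minus
// sum over the "black" rectangle. Asymmetric features simply use rectangles of
// unequal size or offset placement.
long long HaarFeatureValue(const IntegralImage& I,
                           int wx, int wy, int ww, int wh,
                           int bx, int by, int bw, int bh)
{
    return I.RectSum(wx, wy, ww, wh) - I.RectSum(bx, by, bw, bh);
}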
한편, 얼굴의 유효성을 판정하기 위한 방법으로서, 예를 들어, PCA(Principle Component Analysis)나 신경망을 이용한 방법 등이 있는데, 이러한 방법들은 속도가 느리고 별도의 해석을 필요로 한다는 단점이 있다. On the other hand, as a method for determining the validity of the face, for example, a method using a PCA (Principle Component Analysis) or a neural network, there is a disadvantage that these methods are slow and requires a separate analysis.
따라서, 본 발명의 일실시예에서는, 상기 AdaBoost의 결과값(상기 수학식1의 CFH(x))의 크기와 소정임계값을 비교하여 검출된 얼굴의 유효성을 판정한다. Therefore, in one embodiment of the present invention, the validity of the detected face is determined by comparing the magnitude of the result value of AdaBoost (CF H (x) of Equation 1) with a predetermined threshold value.
기존 AdaBoost방법에서는, 하기 참고식1과 같이 부호값만을 이용하였으나, 본 실시예에서는 그의 실제적인 크기를 이용하여 얼굴영역의 유효성을 판정한다. In the conventional AdaBoost method, only a code value is used as in the following Equation 1, but in this embodiment, the validity of the face area is determined using its actual size.
H(x) = sign(CF_H(x)) ……… [참고식 1]
즉, 상기 수학식1에서, CF_H(x)의 크기가 얼굴의 유효성을 판정하기 위한 중요한 요소로 활용될 수 있다. That is, in Equation 1, the magnitude of CF_H(x) can be used as an important factor for determining the validity of the face.
이 값(CF_H(x))은 검출된 영역이 얼굴에 얼마나 근사한가를 나타내는 척도로써 소정임계값을 설정하여 얼굴의 유효성판정에 이용할 수 있다. This value CF_H(x) is a measure of how close the detected region is to a face, and it can be used for the face validity decision by setting a predetermined threshold value.
이때, 소정임계값은 학습얼굴모임을 이용하여 경험적으로 설정한다.At this time, the predetermined threshold is empirically set using the learning face group.
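간단한 예시로서, 이 유효성 판정은 다음과 같이 구성될 수 있다. As a minimal illustrative C++ sketch (not part of the original disclosure), the validity decision can be organized as follows, assuming the weak-classifier outputs, theta, and the empirically set validity threshold are available from AdaBoost training; all names are hypothetical.
#include <vector>
// CF_H(x): sum of the weak-classifier outputs minus theta (see Equation 1).
double StrongClassifierConfidence(const std::vector<double>& weakOutputs, double theta)
{
    double sum = 0.0;
    for (size_t i = 0; i < weakOutputs.size(); ++i) sum += weakOutputs[i];
    return sum - theta;
}
// A detected window is treated as a valid face only if CF_H(x) exceeds the
// empirically chosen threshold, instead of using only the sign of CF_H(x).
bool IsValidFace(const std::vector<double>& weakOutputs, double theta, double validityThreshold)
{
    return StrongClassifierConfidence(weakOutputs, theta) > validityThreshold;
}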
상기 얼굴특징점 검출단계(S200)에서는, 상기 검출된 얼굴영역에서 얼굴특징점을 검출한다. In the facial feature detection step S200, a facial feature point is detected in the detected face region.
상기 얼굴특징점 검출단계(S200)는, ASM(active shape model) 방법의 특징점(landmark) 탐색에 의해 이뤄지되, AdaBoost 알고리즘을 이용하여 진행하여 얼굴특징점을 검출한다. The facial feature detection step S200 is performed by searching for a landmark of the ASM method, and detects the facial feature by proceeding using the AdaBoost algorithm.
예를 들어, 상기 얼굴특징점의 검출은, (b1) 현재 특징점의 위치를 (xl, yl)라고 정의하고, 현재 특징점의 위치를 중심으로 그 근방에서 n*n 화소크기의 가능한 모든 부분창문들을 분류기로 분류하는 단계; (b2) 하기 수학식2에 의하여 특징점의 후보위치를 계산하는 단계; 및 (b3) 하기 수학식3의 조건을 만족하는 경우에는 (x'l, y'l)을 새로운 특징점으로 정하고, 만족하지 못하는 경우에는 현재 특징점의 위치(xl, yl)를 유지하는 단계;를 포함하여 구성된다. For example, the detection of the facial feature point (b1) defines the position of the current feature point as (x l , y l ), and all possible partial windows of n * n pixel size in the vicinity of the current feature point position. Classifying them into a classifier; (b2) calculating candidate positions of the feature points according to Equation 2 below; And (b3) setting (x ' l , y' l ) as a new feature point if the condition of Equation 3 is satisfied, and maintaining the position (x l , y l ) of the current feature point if not satisfied. It is configured to include.
[수학식2][Equation 2]
[수학식 2의 수식 이미지: 특징점 후보위치 (x'_l, y'_l)를 계산하는 식] [Equation 2 appears as an image in the original: the expression computing the candidate feature point position (x'_l, y'_l)]
[수학식3][Equation 3]
[수학식 3의 수식 이미지: 후보위치 (x'_l, y'_l)의 채택 조건] [Equation 3 appears as an image in the original: the acceptance condition for the candidate position (x'_l, y'_l)]
(단, a: x축방향으로 탐색해나가는 최대근방거리 (where a: the maximum search distance in the x-axis direction,
b: y축방향으로 탐색해나가는 최대근방거리 b: the maximum search distance in the y-axis direction,
x_{dx,dy}: (x_l, y_l)에서 (dx, dy)만큼 떨어진 점을 중심으로 하는 부분창문 x_{dx,dy}: the sub-window centered at the point offset by (dx, dy) from (x_l, y_l),
N_all: 분류기의 총계단수 N_all: the total number of stages of the classifier,
N_pass: 부분창문이 통과된 계단수 N_pass: the number of stages the sub-window has passed,
c: 끝까지 통과되지 못한 부분창문의 신뢰도값을 제한하기 위해 실험을 통해 얻은 1보다 작은 상수값) c: a constant smaller than 1, obtained experimentally, used to limit the confidence of sub-windows that did not pass all stages)
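하나의 구현 예시는 다음과 같다. One plausible C++ sketch of this neighborhood search is given below; it is not part of the original disclosure, and the exact candidate-position formula and acceptance rule are those of Equations 2 and 3 above. The confidence defined from N_pass, N_all, and c, and the choice of the best-scoring sub-window, are illustrative assumptions here.
struct LandmarkCandidate { int x, y; double conf; };
// Scan all sub-windows within +/-a (x) and +/-b (y) of the current landmark (xl, yl),
// score each by how many cascade stages it passes, scale partial passes by c (< 1),
// and return the best-scoring position as the candidate (x'_l, y'_l).
LandmarkCandidate SearchLandmark(int xl, int yl, int a, int b, int nAll, double c,
                                 int (*EvaluateCascade)(int cx, int cy))  // returns N_pass
{
    LandmarkCandidate best = { xl, yl, -1.0 };
    for (int dy = -b; dy <= b; ++dy) {
        for (int dx = -a; dx <= a; ++dx) {
            int nPass = EvaluateCascade(xl + dx, yl + dy);
            double conf = static_cast<double>(nPass) / nAll;
            if (nPass < nAll) conf *= c;             // limit confidence of partial passes
            if (conf > best.conf) { best.x = xl + dx; best.y = yl + dy; best.conf = conf; }
        }
    }
    return best;                                     // accepted or rejected per Equation 3
}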
얼굴의 특징점을 검출하기 위한 방법으로서, 예를 들어, 특징점들을 개별적으로 검출하는 방법과 특징점들의 상호연관속에서 동시에 검출해내는 방법 등이 있다. As a method for detecting a feature point of a face, there are, for example, a method of individually detecting feature points and a method of simultaneously detecting a feature point in correlation.
개별적으로 특징점들을 검출하는 방법은 부분적인 가림이 있는 얼굴화상들에서 검출오류가 많은 문제점이 있기 때문에, 본 실시예에서는 속도와 정확성에 있어서 얼굴 특징 검출에 바람직한 방법인 ASM(Active Shape Model) 방법을 이용한다. Since the method of detecting feature points individually has many problems of detecting errors in partially obscured face images, in this embodiment, the Active Shape Model (ASM) method, which is a preferable method for face feature detection in terms of speed and accuracy, is used. I use it.
이러한 ASM 방법에 대하여서는 T.F.Cootes, C.J.Taylor, D.H.Cooper, and J.Graham의 논문 “Active shape models: Their training and application” (CVGIP: Image Understanding, Vol.61, pp.38-59, 1995) 과 S.C.Yan, C.Liu, S.Z.Li, L.Zhu, H.J.Zhang, H.Shum, and Q.Cheng의 논문 “Texture-constrained active shape models”(In Proceedings of the First International Workshop on Generative-Model-Based Vision (with ECCV), May 2002), T.F.Cootes, G.J.Edwards, and C.J.Taylor의 논문 “Active appearance models”(In ECCV 98, Vol.2, pp.484-498, 1998) T.F.Cootes, G.Edwards, and C.J.Taylor의 논문 “Comparing Active Shape Models with Active Appearance Models” 등을 통해 이해될 수 있다. These ASM methods are discussed in TFCootes, CJTaylor, DHCooper, and J. Graham's paper, “Active shape models: Their training and application” (CVGIP: Image Understanding, Vol. 61, pp.38-59, 1995). SCYan, C.Liu, SZLi, L.Zhu, HJZhang, H.Shum, and Q.Cheng's paper “Texture-constrained active shape models” (In Proceedings of the First International Workshop on Generative-Model-Based Vision (with ECCV), May 2002), TFCootes, GJEdwards, and CJ Taylor's paper “Active appearance models” (In ECCV 98, Vol. 2, pp. 484-498, 1998) TFCootes, G.Edwards, and CJTaylor's paper “Comparing Active Shape Models with Active Appearance Models” can be understood.
한편, 기존 ASM의 특징점탐색은 특징점에서의 프로필(Profile)을 이용하는 방법이기 때문에 고품질의 화상에서만 검출이 안정적으로 이뤄진다. On the other hand, since the feature point search of the existing ASM is a method using a profile at the feature point, detection is stable only in high quality images.
일반적으로 카메라 등의 영상입력수단을 통해 입력되는 영상에서 추출된 이미지는 저해상도, 저품질의 이미지로서 얻어질 수 있다. In general, an image extracted from an image input through an image input means such as a camera may be obtained as a low resolution, low quality image.
따라서, 본실시예에서는 AdaBoost방법에 의한 특징점탐색에 의해 이를 개선하여, 저해상도와 저품질의 화상에서도 특징점들을 용이하게 검출할 수 있도록 한다. Therefore, in the present embodiment, the feature point is searched by the AdaBoost method to improve the feature, so that the feature points can be easily detected even in low resolution and low quality images.
도 15는 해상도가 낮거나 화질이 나쁜 화상에 대해 기존 ASM방법에 적용된 프로필사진이고, 도 16은 본 발명의 표식점탐색을 위한 Adaboost에 이용되는 각 표식점주변의 패턴사진이다. FIG. 15 is a profile picture applied to an existing ASM method for an image having a low resolution or poor image quality. FIG. 16 is a pattern picture around each mark point used in Adaboost for mark point search of the present invention.
상기 얼굴특징점 검출단계(S200) 및 추정정보 생성단계(S400)에서는, 도 17에 도시된 바와 같이, 다수의 특징점(예를 들어, 28개)을 검출할 수 있다. In the facial feature point detection step S200 and the estimation information generation step S400, as illustrated in FIG. 17, a plurality of feature points (for example, 28) may be detected.
본 실시예에서는 연산처리 및 추적성능을 함께 고려하여 기본얼굴특징점(눈4개(4, 5, 6, 7), 코2개(10, 11), 입2개(8, 9)) 8개만을 응시거리 및 응시방향의 추정에 사용한다. In the present embodiment, considering both computational load and tracking performance, only the eight basic facial feature points (four eye points (4, 5, 6, 7), two nose points (10, 11), and two mouth points (8, 9)) are used for estimating the gaze distance and gaze direction.
상기 행렬 추정단계(S300)는, 도 18에 도시된 바와 같이, 8개의 얼굴특징점 입력(S310, 예를 들어, 검출된 8개의 특징점의 좌표값을 본 실시예의 프로그램이 구동되는 컴퓨팅 수단이 메모리 상에 입력값으로 불러들임), 3차원 표준 얼굴모델 적재(S320, 예를 들어, DB에 저장되어 있던 3D얼굴모델의 전체 좌표 정보를 본 프로그램이 구동되는 컴퓨팅 수단이 입력값으로 불러들임), 최적변환행렬 추정(S330)으로 이뤄진다. In the matrix estimating step S300, as illustrated in FIG. 18, eight facial feature points input S310, for example, the coordinate values of the detected eight feature points are stored in a memory device in which the program of the present embodiment is driven. Loading into the input value), 3D standard face model loading (S320, for example, the overall coordinate information of the 3D face model stored in the DB, the computing means that the program is driven as the input value), optimal Conversion matrix estimation (S330) is performed.
이렇게 추정된 최적변환행렬로부터 응시방향 및 응시거리를 계산하는 추정정보 생성단계(S400)가 이뤄진다. The estimation information generation step (S400) of calculating the gaze direction and gaze distance from the estimated optimal transformation matrix is performed.
상기 3차원 표준 얼굴모델은, 도 5에 도시된 바와 같이, 331개의 점과 630개의 삼각형으로 구성된 3D 메쉬 형태의 모형이다. As shown in FIG. 5, the 3D standard face model is a 3D mesh model composed of 331 points and 630 triangles.
상기 추정정보 생성단계(S400)는, 상기 최적변환행렬에 근거하여 상기 시청자의 응시방향 및 응시거리 중 적어도 하나를 추정하여 시청자 얼굴 추적정보를 생성한다. The estimating information generating step (S400) generates viewer face tracking information by estimating at least one of the gaze direction and gaze distance of the viewer based on the optimal transformation matrix.
상기 최적변환행렬 추정은, (c1) 상기 3차원 표준 얼굴모델의 얼굴 회전정보에 관한 3*3 행렬 M과 얼굴 평행이동정보에 관한 3차원 벡터 T를 이용하여 하기 수학식4의 변환식을 계산하는 단계-상기 M과 T는 각 성분을 변수로 가지며, 상기 최적변환행렬을 정의하는 행렬임-; (c2) 상기 수학식4에 의해 구해진 카메라특징점위치벡터(PC)와 하기 수학식6에 의해 구해진 카메라변환행렬(MC)를 이용하여 하기 수학식5의 3차원 벡터 P'을 계산하는 단계; (c3) 상기 3차원 벡터 P'에 근거하여 2차원 벡터 PI를 (P'x/P'z, P'y/P'z)로 정의하는 단계; 및 (c4) 상기 2차원 벡터 PI와 상기 (b) 단계에서 검출된 얼굴특징점의 좌표값을 이용하여 상기 최적변환행렬의 각 변수를 추정하는 단계;를 포함하여 구성된다. The optimal transformation matrix estimation is performed by calculating (c1) a transformation equation of Equation 4 using a 3 * 3 matrix M for face rotation information of the 3D standard face model and a 3D vector T for face parallel movement information. Step M and T are variables having respective components as variables and defining the optimal transformation matrix; (c2) calculating the three-dimensional vector P 'of Equation 5 using the camera feature point position vector P C obtained by Equation 4 and the camera transformation matrix M C obtained by Equation 6 below; ; (c3) defining a two-dimensional vector P I as (P ' x / P' z , P ' y / P' z ) based on the three-dimensional vector P '; And (c4) estimating each variable of the optimal transformation matrix using the two-dimensional vector P I and the coordinate values of the facial feature points detected in the step (b).
[수학식4][Equation 4]
P_C = M * P_M + T
[수학식5][Equation 5]
P' = M_c * P_C
(단, P'은 (P'_x, P'_y, P'_z)로 정의되는 3차원 벡터) (where P' is the three-dimensional vector defined as (P'_x, P'_y, P'_z))
최적변환행렬은 수학적으로 보면 3*3 행렬 M과 3차원 벡터 T로 구성되어 있다. 여기서 3*3 행렬 M은 얼굴의 회전정보를 반영하며, 3차원 벡터 T는 얼굴의 평행이동정보를 반영한다. The optimal transform matrix is mathematically composed of a 3 * 3 matrix M and a 3D vector T. Here, the 3 * 3 matrix M reflects the rotation information of the face, and the 3D vector T reflects the parallel movement information of the face.
먼저, 상기 수학식4에 의하여, 3차원 표준 얼굴모델의 좌표계에서의 특징점위치(3차원벡터) PM은 상기 최적변환행렬(M, T)에 의해 카메라좌표계에서의 위치(3차원벡터) Pc로 변환된다. First, according to Equation 4, the feature point position (three-dimensional vector) P M in the coordinate system of the three-dimensional standard face model is the position (three-dimensional vector) P in the camera coordinate system by the optimal transformation matrix (M, T). converted to c .
이때, 상기 3차원 표준 얼굴모델 좌표계는 좌표중심이 3차원 표준 얼굴모델의 중심에 위치한 3차원 좌표계이고, 상기 카메라좌표계는 중심이 영상입력수단(도 25의 10)의 중심에 위치한 3차원 좌표계이다. In this case, the 3D standard face model coordinate system is a 3D coordinate system whose coordinate center is located at the center of the 3D standard face model, and the camera coordinate system is a 3D coordinate system whose center is located at the center of the image input means (10 in FIG. 25). .
다음으로, 상기 수학식5에 의하여, 상기 카메라특징점위치벡터 Pc와 카메라변환행렬 Mc 를 이용하여 (P'x, P'y, P'z)로 정의된 3차원 벡터인 P'을 구한다. Next, P ', which is a three-dimensional vector defined by (P'x, P'y, P'z), is obtained using the camera feature point position vector P c and the camera transformation matrix M c according to Equation 5. .
여기서 카메라변환행렬Mc는 카메라의 초점거리 등에 의하여 결정되는 3*3행렬로서, 하기 수학식6과 같이 정의된다. Here, the camera transformation matrix Mc is a 3 * 3 matrix determined by the focal length of the camera and the like, and is defined as in Equation 6 below.
[수학식6][Equation 6]
M_c = [ focal_len      0        W/2 ]
      [     0      focal_len    H/2 ]
      [     0          0         1  ]
(단, W:영상입력수단(카메라)으로 입력된 이미지의 폭(W: width of image input by video input means (camera)
H:영상입력수단(카메라)으로 입력된 이미지의 높이H: Height of image input by video input means (camera)
focal_len:-0.5*W/tan(Degree2Radian(fov*0.5))focal_len: -0.5 * W / tan (Degree2Radian (fov * 0.5))
fov:카메라의 보임각도)fov: angle of view of the camera)
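수학식 4와 수학식 5의 투영 과정을 코드로 나타내면 다음과 같다. The following short C++ sketch (not part of the original disclosure) simply restates the projection chain of Equations 4 and 5: a model feature point P_M is mapped into camera coordinates by (M, T) and projected to image coordinates P_I with the camera matrix M_c; the small vector and matrix types are illustrative.
struct Vec3 { double x, y, z; };
struct Mat3 { double m[3][3]; };
static Vec3 Mul(const Mat3& A, const Vec3& v)
{
    Vec3 r;
    r.x = A.m[0][0]*v.x + A.m[0][1]*v.y + A.m[0][2]*v.z;
    r.y = A.m[1][0]*v.x + A.m[1][1]*v.y + A.m[1][2]*v.z;
    r.z = A.m[2][0]*v.x + A.m[2][1]*v.y + A.m[2][2]*v.z;
    return r;
}
// Returns P_I = (P'_x / P'_z, P'_y / P'_z) for one model feature point P_M.
static void ProjectModelPoint(const Mat3& M, const Vec3& T, const Mat3& Mc,
                              const Vec3& Pm, double& u, double& v)
{
    Vec3 Pc = Mul(M, Pm);                  // Equation 4: P_C = M * P_M + T
    Pc.x += T.x; Pc.y += T.y; Pc.z += T.z;
    Vec3 Pp = Mul(Mc, Pc);                 // Equation 5: P' = M_c * P_C
    u = Pp.x / Pp.z;                       // P_I
    v = Pp.y / Pp.z;
}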
따라서, 최적변환행렬(M, T)의 하기에서 설명하는 바와 같은 12개의 변수를 포함하여 “P'=(P'x, P'y, P'z)”이 정의되고, 이에 따라 상기 12개의 변수를 포함하여 “PI=(P'x/P'z, P'y/P'z)”가 정의될 수 있다. Therefore, "P '= (P'x, P'y, P'z)" is defined including 12 variables of the optimal conversion matrix M, T as described below, and thus the 12 Including the variable, “P I = (P'x / P'z, P'y / P'z)” can be defined.
상술한 바와 같은 과정에 의한 최적변환행렬(M, T)의 추정과정을 간단히 보면 다음과 같다. The estimation process of the optimal conversion matrix (M, T) by the above process is as follows.
검출된 8개의 기본얼굴특징점들의 위치와 이 위치에 대해 3차원 표준 얼굴모델에서 대응하는 점의 위치쌍을 이용하여 최적변환행렬의 12개 변수(M의 3*3=9개와 T의 3개)들을 최소제곱법을 이용하여 추정한다. 12 variables (3 * 3 = 9 of M and 3 of T) of the optimal transformation matrix using the position of the detected 8 basic facial feature points and the position pairs of corresponding points in the 3D standard face model for this position Are estimated using the least square method.
즉, 최적변환행렬의 12개 성분들을 변수로 하고, 검출된 특징점의 위치와 최적변환행렬을 적용한 얼굴모델특징점들의 위치 사이 편차의 제곱합을 출력으로 하는 목표함수를 설정한다. In other words, a target function of outputting a sum of squares of deviations between the positions of the detected feature points and the positions of the face model feature points to which the optimal transformation matrix is applied is set as 12 variables of the optimal transformation matrix.
상기 목표함수를 최소화하는 최적화문제를 풀어 12개의 최적 변수를 계산한다. The 12 optimal variables are calculated by solving the optimization problem that minimizes the target function.
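위 설명을 수식으로 다시 쓰면 다음과 같다. Restating the above in equation form (under the reading that the deviations are measured in image coordinates P_I, with p_i the detected i-th feature point and P_M^(i) the corresponding model feature point):
E(M, T) = Σ_{i=1..8} || Π( M_c ( M * P_M^(i) + T ) ) - p_i ||²,  Π(P') = (P'_x / P'_z, P'_y / P'_z)
여기서 M의 9개 성분과 T의 3개 성분, 총 12개가 최소제곱법으로 추정되는 변수이다. Here the nine components of M and the three components of T, twelve in total, are the variables estimated by the least squares method.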
상기 응시방향정보는 상기 최적변환행렬의 회전정보 관련 행렬(M)의 각 성분을 이용하여 하기 수학식7에 의해 정의되고, 상기 응시거리정보는 상기 최적변환행렬의 평행이동 관련 벡터(T)로 정의된다. The gaze direction information is defined by Equation 7 using each component of the rotation information related matrix M of the optimal transformation matrix, and the gaze distance information is a parallel movement related vector T of the optimal transformation matrix. Is defined.
[수학식7][Equation 7]
[수학식 7의 수식 이미지: 회전행렬 M의 성분 m_11 ~ m_33으로부터 응시방향 (a_x, a_y, a_z)를 계산하는 식] [Equation 7 appears as an image in the original: the expression computing the gaze direction (a_x, a_y, a_z) from the components m_11 to m_33 of the rotation matrix M]
(단, m_11, m_12, ..., m_33: 3*3 행렬 M의 추정된 각 성분값) (where m_11, m_12, ..., m_33: the estimated values of the components of the 3*3 matrix M)
즉, 상기 응시방향정보는 (ax, ay, az)가 되고, 상기 응시거리정보는 평행이동 관련 벡터(T) 자체로 정의되는 것이다. That is, the gaze direction information becomes (a x , a y , a z ), and the gaze distance information is defined by the parallel movement related vector T itself.
상기 성별 추정단계(S500)에서는, 도 19에 도시된 바와 같이, 이미지 및 얼굴특징점 입력(S510), 성별 추정용 얼굴영역 잘라냄(S520), 잘라낸 얼굴영역 이미지 정규화(S530), SVM에 의한 성별추정(S540)의 과정으로 이뤄진다. In the gender estimating step (S500), as shown in FIG. 19, the image and the facial feature point input (S510), the gender estimation face region clipping (S520), the cut face region image normalization (S530), and the gender by SVM It is made in the process of estimation (S540).
성별추정을 위한 방법으로서, 예를 들어, 사람의 얼굴 전부를 이용하는 보기 기반 방법과 얼굴의 기하학적인 특징들만을 이용하는 기하학적인 특징기반방법 등이 있다. As a method for sex estimation, there are, for example, a view-based method using all of a human face and a geometric feature-based method using only geometric features of a face.
바람직한 일예로서, 상기 성별 추정은, SVM(Support Vector Machine)학습을 이용한 보기기반 성별 분류 방법으로써 검출된 얼굴 영역을 정규화하여 얼굴 특징벡터를 구성하고 그것으로 성별을 예측하는 과정으로 이뤄진다. As a preferred example, the gender estimation is performed by a view-based gender classification method using SVM (Support Vector Machine) learning to normalize the detected face region to form a facial feature vector and predict the gender therewith.
SVM방법은 SVC(Support Vector Classifier)와 SVR(Support Vector Regression)로 구분하여 볼 수 있다. The SVM method may be classified into a support vector classifier (SVC) and a support vector regression (SVR).
상기 성별 추정과 관련하여, Shumeet Baluja et al.”Boosting Sex Identification Performance”, Carnegie Mellon University, Computer Science Department(2005), Gutta, et al.“Gender and ethnic classification”.IEEE Int.Workshop on Automatic Face and Gesture Recognition, pages 194-199(1998)과, Moghaddam et al.“Learning Gender with Support Faces”.IEEE T.PAMI Vol.24, No.5(2002), 등을 통해 이해될 수 있다. Regarding such gender estimation, Shumeet Baluja et al. “Boosting Sex Identification Performance”, Carnegie Mellon University, Computer Science Department (2005), Gutta, et al. “Gender and ethnic classification” .IEEE Int.Workshop on Automatic Face and Gesture Recognition, pages 194-199 (1998) and Moghaddam et al. “Learning Gender with Support Faces”. IEEE T. PAMI Vol. 24, No. 5 (2002), and the like.
본 실시예에서, 성별 추정단계(S500)는 구체적으로, (e1) 상기 검출된 얼굴특징점을 기준으로 상기 검출된 얼굴영역에서 성별추정용 얼굴영역을 잘라내는 단계; (e2) 상기 잘라낸 성별추정용 얼굴영역의 크기를 정규화하는 단계; (e3) 상기 크기가 정규화된 성별추정용 얼굴영역의 히스토그램을 정규화하는 단계; 및 (e4) 상기 크기 및 히스토그램이 정규화된 성별추정용 얼굴영역으로부터 입력벡터를 구성하고 미리 학습된 SVM 알고리즘을 이용하여 성별을 추정하는 단계;를 포함하여 구성된다. In the present embodiment, the gender estimating step (S500) specifically includes: (e1) cutting out a face region for sex estimation from the detected face region based on the detected face feature points; (e2) normalizing the size of the cut face sex estimation region; (e3) normalizing a histogram of the face region for gender estimation in which the size is normalized; And (e4) constructing an input vector from the face region for gender estimation where the size and histogram are normalized, and estimating gender using a pre-learned SVM algorithm.
상기 (e1) 단계에서는, 입력된 이미지와 얼굴특징점을 이용하여 얼굴영역을 잘라내며, 예를 들어, 도 20에 도시된 바와 같이, 왼쪽눈귀와 오른쪽눈귀 사이의 거리의 절반을 1로 보고 자르려는 얼굴의 영역을 계산한다. In step (e1), the face region is cropped using the input image and the facial feature points; for example, as shown in FIG. 20, half the distance between the left and right eye corners is taken as one unit and the face region to be cropped is computed from it.
상기 (e2) 단계에서는, 예를 들어, 잘라낸 얼굴영역을 12 * 21 크기로 정규화한다. In the step (e2), for example, the cut out facial region is normalized to 12 * 21 size.
상기 (e3) 단계에서는, 조명효과의 영향을 최소화하기 위하여 히스토그램을 매 농도값을 가지는 화소수를 동일하게 하는 과정인 히스토그램정규화를 한다. In step (e3), to minimize the influence of illumination, histogram normalization (histogram equalization) is performed, i.e., the histogram is flattened so that each intensity value is held by an equal number of pixels.
상기 (e4) 단계에서는, 예를 들어, 정규화된 12 * 21 크기의 얼굴이미지로부터 252차원의 입력벡터를 구성하고, 미리 학습된 SVM을 이용하여 성별을 추정한다. In the step (e4), for example, a 252-dimensional input vector is constructed from a normalized 12 * 21 face image, and sex is estimated using a pre-trained SVM.
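A minimal sketch of the preprocessing of steps (e1)-(e3) and the input-vector construction of (e4), assuming OpenCV and illustrative crop proportions expressed in units of half the eye-corner distance (the exact crop box of FIG. 20 is not reproduced here):

import cv2
import numpy as np

def gender_input_vector(gray, left_eye, right_eye):
    lx, ly = left_eye
    rx, ry = right_eye
    unit = 0.5 * np.hypot(rx - lx, ry - ly)          # half inter-eye distance = 1 unit
    cx, cy = (lx + rx) / 2.0, (ly + ry) / 2.0
    x0, x1 = int(cx - 2 * unit), int(cx + 2 * unit)  # assumed crop box proportions
    y0, y1 = int(cy - 2 * unit), int(cy + 3 * unit)
    patch = gray[max(y0, 0):y1, max(x0, 0):x1]
    patch = cv2.resize(patch, (12, 21))              # width 12, height 21
    patch = cv2.equalizeHist(patch)                  # histogram normalization
    return patch.astype(np.float32).flatten()        # 12*21 = 252-dimensional vector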
이때, 상기 성별의 추정은, 하기 수학식8의 분류기의 계산 결과값이 0보다 크면 남자, 아니면 여자로 판정한다. Here, the gender is judged to be male if the computed value of the classifier of Equation 8 below is greater than zero, and female otherwise.
[수학식8][Equation 8]
$f(x) = \sum_{i=1}^{M} \alpha_i\, y_i\, k(x, x_i) + b$
(단, M:표본자료의 개수, (where M: the number of training samples,
yi:i번째 시험자료의 성별 값으로써 남자면 1, 여자면 -1로 설정, yi: gender label of the i-th sample, set to 1 for male and -1 for female,
αi:i번째 벡터의 계수, αi: coefficient of the i-th vector,
x:시험자료, x: test data,
xi:학습표본자료, xi: training sample,
k:커널함수, k: kernel function,
b:편차) b: bias)
이때, 상기 커널함수는 하기 수학식9에 정의된 가우시안동경토대함수(GRBF, Gaussian Radial Basis Function)를 이용할 수 있다. In this case, the kernel function may use a Gaussian Radial Basis Function (GRBF) defined in Equation 9 below.
[수학식9][Equation 9]
$k(x, x') = \exp\!\left(-\dfrac{\lVert x - x' \rVert^{2}}{2\sigma^{2}}\right)$
(단, x:시험자료, x':학습표본자료, σ:분산정도를 나타내는 변수) (where x: test data, x': training sample, σ: parameter controlling the spread)
한편, 커넬함수로서는 가우시안동경토대함수 이외에 다항식커널 등을 사용할 수 있으며, 바람직하게, 식별성능을 고려하여 가우시안동경토대함수를 사용한다. As the kernel function, a polynomial kernel or the like may be used instead of the Gaussian radial basis function, but the Gaussian radial basis function is preferably used in view of its discrimination performance.
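A minimal sketch of the decision rule of Equation 8 with the Gaussian radial basis kernel of Equation 9; the 1/(2σ²) scaling inside the kernel is an assumption, since the text only states that σ controls the spread:

import numpy as np

def grbf(x, xp, sigma):
    return np.exp(-np.sum((x - xp) ** 2) / (2.0 * sigma ** 2))

def svm_decision(x, support_x, support_y, alpha, b, sigma):
    s = sum(a * y * grbf(x, xi, sigma) for a, y, xi in zip(alpha, support_y, support_x))
    return s + b

def predict_gender(x, support_x, support_y, alpha, b, sigma):
    # judged male if the decision value is positive, female otherwise
    return "male" if svm_decision(x, support_x, support_y, alpha, b, sigma) > 0 else "female"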
한편, SVM(Support Vector Machine) 방법은 두 개의 그룹을 가지는 모임에서 두 그룹의 경계선을 도출해내는 분류방법으로서 패턴분류와 회귀를 위한 학습 알고리즘으로 알려져 있다. The SVM (Support Vector Machine) method is a classification method that derives the boundary between two groups in a two-class data set, and is known as a learning algorithm for pattern classification and regression.
SVM들의 기초적인 학습원리는 눈에 보이지 않는 시험표본을 위한 예측분류오유가 최소로 되는, 즉, 좋은 일반화 성능을 가지는 최적의 선형초평면을 찾는 것이다. The basic learning principle of SVMs is to find the optimal linear hyperplane that minimizes the expected classification error on unseen test samples, i.e., that has good generalization performance.
이러한 원리에 기초하여 선형 SVM에서는 최소의 차수를 가지는 선형함수를 찾는 분류학적인 방법을 사용한다. Based on this principle, the linear SVM uses a classification approach that finds the linear function of minimal complexity.
SVM의 학습문제는 선형제한붙은 2차원계획문제에 귀착된다. The SVM learning problem reduces to a quadratic programming problem with linear constraints.
학습표본을 x1,…,xi , 개개의 클래스라벨을 y1,…,yi이라고 하고 학습표본이 남자이면 y = 1 , 여자라면 y = -1 로 한다. Let the training samples be x1, …, xn and their class labels y1, …, yn, where y = 1 if the sample is male and y = -1 if female.
학습결과를 일의로 결정하기 위하여 하기 참고식2의 제약을 준다. To determine the learning result uniquely, the constraint of Reference Formula 2 below is imposed.
$\min_{i}\,\bigl|\,\mathbf{w}^{\top} x_i + b\,\bigr| = 1$ ……… [Reference Formula 2]
이러한 제약을 주면 학습표본과 초평면의 최소거리는, 하기 참고식3으로 표시되므로 반드시 하기 참고식4와 같이 된다. Given this constraint, the minimum distance between the learning sample and the hyperplane is represented by the following Equation 3, so it is necessarily as shown in the following Equation 4.
$\dfrac{\bigl|\,\mathbf{w}^{\top} x_i + b\,\bigr|}{\lVert \mathbf{w} \rVert}$ ……… [Reference Formula 3]
$\dfrac{1}{\lVert \mathbf{w} \rVert}$ ……… [Reference Formula 4]
w, b 는 학습표본을 완전히 식별하는 가운데서 최소거리를 최대로 하도록 결정해야 하므로 하기 참고식5와 같이 정식화된다. Since w and b must be chosen so as to maximize this minimum distance while correctly separating all training samples, the problem is formulated as in Reference Formula 5 below.
$\min_{\mathbf{w},\,b}\ \tfrac{1}{2}\lVert \mathbf{w} \rVert^{2} \quad \text{subject to} \quad y_i\bigl(\mathbf{w}^{\top} x_i + b\bigr) \ge 1,\ \ i = 1,\dots,n$ ……… [Reference Formula 5]
목적함수를 최소화하는 것은 최소거리인 상기 식4의 값을 최대화하는 것으로 된다. Minimizing the objective function maximizes the value of Equation 4, which is the minimum distance.
따라서 위의 목적함수를 최대화하는 지지벡터를 w와 편차 b를 계산한다. Accordingly, the weight vector w and the bias b solving the above problem are computed from the support vectors.
커널을 이용한 SVM에서는 최적상수 αi 을 하기 참고식6과 같이 결정한다. In the kernel SVM, the optimal coefficients αi are determined as in Reference Formula 6 below.
$\max_{\boldsymbol{\alpha}}\ \sum_{i=1}^{n} \alpha_i \;-\; \tfrac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} \alpha_i \alpha_j\, y_i y_j\, K(x_i, x_j)$ …[Reference Formula 6]
이때 제한조건은 하기 참고식7과 같다. At this time, the constraint is shown in Equation 7 below.
$\sum_{i=1}^{n} \alpha_i y_i = 0, \qquad \alpha_i \ge 0 \ \ (i = 1,\dots,n)$ …[Reference Formula 7]
여기서 K(x, x')는 비선형커널함수이다. Where K (x, x ') is a nonlinear kernel function.
다음 편차를 하기 참고식8과 같이 계산한다. Next, the bias b is calculated as in Reference Formula 8 below.
$b = y_j \;-\; \sum_{i=1}^{n} \alpha_i y_i\, K(x_i, x_j)\quad (x_j\ \text{is any support vector})$ …[Reference Formula 8]
상술한 바와 같은 방법에 의해 얻어진 상기 수학식8의 분류기에 대한 계산 결과값이 1이면 남자, 0이면 여자로 판정되는 것이다. With the classifier of Equation 8 obtained by the method described above, a positive decision value (label 1) is judged male and a non-positive value is judged female, consistent with the rule stated for Equation 8.
한편, 상기 과정에서 Adaboost 방법을 사용할 수도 있으나, 분류기의 성능과 일반화 성능을 고려할 때, SVM 방법을 사용하는 것이 더욱 바람직하다. Meanwhile, although the Adaboost method may be used in the above process, considering the performance and generalization performance of the classifier, it is more preferable to use the SVM method.
예를 들어, 아시아인들의 얼굴들을 Adaboost 방법으로 학습시키고 유럽인들에 대하여 성별추정성능을 시험해보았을 때 SVM 방법으로 시험할 때보다 10 ~ 15%정도 성능이 내려가게 된다. For example, when Asians are trained by the Adaboost method and tested for sex estimates on Europeans, the performance is 10-15% lower than when tested by the SVM method.
이로부터 충분한 학습자료가 주어지지 않은 조건에서 SVM 방법으로 성별추정을 진행하는 경우 높은 식별능력을 얻을 수 있다는 이점이 있다. From this, there is an advantage that high discrimination ability can be obtained when gender estimation is performed by SVM method under the condition that there is not enough learning data.
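For illustration only, training such a view-based gender classifier with an off-the-shelf SVM could look as follows; scikit-learn is our choice here and is not named in the disclosure, and the random arrays merely stand in for the 252-dimensional normalized face vectors and their labels:

import numpy as np
from sklearn.svm import SVC

X = np.random.rand(200, 252).astype(np.float32)   # placeholder training vectors
y = np.sign(np.random.randn(200)).astype(int)     # placeholder labels (+1 male, -1 female)
y[y == 0] = 1

clf = SVC(kernel="rbf", gamma="scale", C=1.0)     # Gaussian RBF kernel
clf.fit(X, y)

score = clf.decision_function(X[:1])              # Equation-8-style decision value
print("male" if score[0] > 0 else "female")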
상기 나이 추정단계(S600)에서는, 도 21에 도시된 바와 같이, 이미지 및 얼굴특징점 입력(S610), 나이 추정용 얼굴영역 잘라냄(S620), 잘라낸 얼굴영역 이미지 정규화(S630), 나이다양체 공간으로 사영(S640), 2차 다항식 회귀를 이용하여 나이추정(S650)의 과정으로 이뤄진다. As shown in FIG. 21, the age estimating step (S600) consists of inputting the image and facial feature points (S610), cropping the face region for age estimation (S620), normalizing the cropped face region image (S630), projecting onto the age-manifold space (S640), and estimating the age using quadratic polynomial regression (S650).
나이 추정방법과 관련하여, Y.Fu, Y.Xu, and T.S.Huang의 논문, “Estimating human ages by manifold analysis of face pictures and regression on aging features,” in Proc.IEEE Conf.Multimedia Expo., 2007, pp.1383-1386과, G.Guo, Y.Fu, T.S.Huang, and C.Dyer의 논문, “Locally adjusted robust regression for human age estimation,” presented at the IEEEWorkshop on Applications of Computer Vision, 2008, A.Lanitis, C.Draganova, and C.Christodoulou의 논문, “Comparing different classifers for automatic age estimation,” IEEE Trans.Syst., Man, Cybern.B, Cybern., vol.34, no.1, pp.621-628, Feb.2004.등을 통해 이해할 수 있다. Regarding age estimation methods, Y.Fu, Y.Xu, and TSHuang, “Estimating human ages by manifold analysis of face pictures and regression on aging features,” in Proc.IEEE Conf. Multimedia Expo., 2007, pp. 1383-1386 and in the papers of G.Guo, Y.Fu, TSHuang, and C.Dyer, “Locally adjusted robust regression for human age estimation,” presented at the IEEE Workshop on Applications of Computer Vision, 2008, A. Lanitis, C. Draganova, and C. Christodoulou, “Comparing different classifers for automatic age estimation,” IEEE Trans. Syst., Man, Cybern. B, Cybern., Vol. 34, no. 1, pp. 621- 628, Feb. 2004.
본 실시예에서, 나이의 추정은 구체적으로, (f1) 상기 검출된 얼굴특징점을 기준으로 상기 검출된 얼굴영역에서 나이추정용 얼굴영역을 잘라내는 단계; (f2) 상기 잘라낸 나이추정용 얼굴영역의 크기를 정규화하는 단계; (f3) 상기 크기가 정규화된 나이추정용 얼굴영역의 국부적 조명보정을 하는 단계; (f4) 상기 크기 정규화 및 국부적 조명보정된 나이추정용 얼굴영역으로부터 입력벡터를 구성하고 나이다양체 공간으로 사영하여 특징벡터를 생성하는 단계; 및 (f5) 상기 생성된 특징벡터에 2차회귀를 적용하여 나이를 추정하는 단계;를 포함하여 구성된다. In the present embodiment, the age estimation specifically comprises: (f1) cropping an age-estimation face region from the detected face region based on the detected facial feature points; (f2) normalizing the size of the cropped age-estimation face region; (f3) applying local illumination correction to the size-normalized age-estimation face region; (f4) constructing an input vector from the size-normalized and illumination-corrected age-estimation face region and projecting it onto the age-manifold space to generate a feature vector; and (f5) estimating the age by applying quadratic regression to the generated feature vector.
상기 (f1) 단계에서는, 입력된 이미지와 얼굴특징점을 이용하여 얼굴영역을 잘라낸다. In the step (f1), the face region is cut out using the input image and the facial feature point.
예를 들어, 도 22에 도시된 바와 같이, 두눈귀 및 입귀점으로부터 위(0.8), 아래(0.2), 왼쪽(0.1), 오른쪽(0.1)로 각각 확장하여 얼굴영역을 잘라낸다. For example, as shown in FIG. 22, the face region is cropped by extending from the eye-corner and mouth-corner points upward by 0.8, downward by 0.2, to the left by 0.1, and to the right by 0.1.
상기 (f2) 단계에서는, 예를 들어, 잘라낸 얼굴영역을 64 * 64 크기로 정규화한다. In the step (f2), for example, the cut out face region is normalized to 64 * 64 size.
상기 (f3) 단계에서는, 조명효과의 영향을 줄이기 위하여, 하기 수학식10에 의해 국부적 조명보정이 이뤄진다. In the step (f3), in order to reduce the influence of the lighting effect, local illumination correction is performed by the following equation (10).
[수학식10][Equation 10]
I(x,y) = (I(x,y) - M) / V * 10 + 127
(단, I(x,y):(x,y)위치에서의 농담값, M:4*4 국부적 창문영역에서의 농담평균값, V:표준분산값) (where I(x,y): gray value at position (x,y), M: mean gray value over the 4*4 local window, V: standard deviation over the window)
상기 표준분산값(V)은 어떤 우연량의 값이 평균값주위에서 흩어지는 정도를 나타내는 특성값이며, 수학적으로 표준분산 V는 다음 식9와 같이 계산된다. The standard deviation V is a statistic expressing how far the values of a random quantity scatter around their mean, and is computed mathematically as in Reference Formula 9 below.
$V = \sqrt{\dfrac{1}{N}\sum_{i=1}^{N}\bigl(x_i - M\bigr)^{2}}$ ……… [Reference Formula 9]
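A minimal sketch of the local illumination correction of Equation 10, assuming non-overlapping 4x4 windows (a sliding window would also be consistent with the description):

import numpy as np

def local_illumination_correction(img, win=4):
    out = np.empty_like(img, dtype=np.float32)
    h, w = img.shape
    for y in range(0, h, win):
        for x in range(0, w, win):
            block = img[y:y + win, x:x + win].astype(np.float32)
            m, v = block.mean(), block.std()
            v = v if v > 1e-6 else 1.0            # avoid division by zero in flat regions
            out[y:y + win, x:x + win] = (block - m) / v * 10.0 + 127.0
    return np.clip(out, 0, 255).astype(np.uint8)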
상기 (f4) 단계에서는, 예를 들어, 64 * 64 얼굴이미지로부터 4096차원의 입력벡터를 구성하고, 미리 학습된 나이다양체공간으로 사영하여 50차원의 특징벡터를 생성한다. In the step (f4), for example, a 4096-dimensional input vector is constructed from a 64 * 64 face image, and a 50-dimensional feature vector is generated by projecting into a pre-learned manifold space.
나이추정이론에서는 얼굴화상에 반영된 인간의 노화과정을 나타내는 특징들이 어떠한 저차원분포에 따르는 패턴들로 표시될 수 있다고 가정하며, 이때의 저차원특징공간을 나이다양체공간이라고 한다. Age estimation theory assumes that the features expressing the human aging process reflected in a face image can be represented as patterns following some low-dimensional distribution; this low-dimensional feature space is called the age-manifold space.
이로부터 나이추정에서 기본은 얼굴화상으로부터 나이다양체공간에로의 사영행렬을 추정하는 것이 기본이다. Accordingly, the core of age estimation is to estimate the projection matrix from the face image onto the age-manifold space.
CEA(Conformal Embedding Analysis)에 의한 나이다양체에로의 사영행렬 학습 알고리즘에 대하여 간략하게 설명한다. The algorithm for learning the projection matrix onto the age manifold by CEA (Conformal Embedding Analysis) is briefly described below.
Y = P^T X ……… [Reference Formula 10]
상기 참고식10에서, X는 입력벡터, Y는 특징벡터이며 P는 CEA를 이용하여 학습된 나이다양체에로의 사영행렬이다. In Reference Formula 10, X is the input vector, Y is the feature vector, and P is the projection matrix onto the age manifold learned using CEA.
이와 관련하여, Yun Fu Huang, T.S.의 논문, "Human Age Estimation With Regression on Discriminative Aging Manifold" in Multimedia, IEEE Transactions on, 2008, pp.578-584 등을 통해 이해할 수 있다. In this regard, it can be understood through a paper by Yun Fu Huang, T.S., "Human Age Estimation With Regression on Discriminative Aging Manifold" in Multimedia, IEEE Transactions on, 2008, pp.578-584.
n개의 얼굴이미지 x1, x2,…,xn을 X={x1,…, xn}∈Rm로 표시한다. The n face images x1, x2, …, xn are denoted X = {x1, …, xn} ∈ R^m.
이때, X는 m×n 행렬이며 xi는 매 얼굴이미지를 나타낸다. X is an m × n matrix and x i represents every face image.
다양체학습단계는 m차원의 얼굴벡터를 d≪m(d는 m보다 훨씬 작다)인 d차원의 얼굴벡터(노화특징벡터)로 표현하기 위한 사영행렬을 구하는 것이다. The manifold learning step is to obtain a projection matrix for representing the m-dimensional face vector as a d-dimensional face vector (aging feature vector), where d < m (d is much smaller than m).
즉, yi= Pmat×xi 인 사영행렬 Pmat를 구하는 것이다. 여기서 {y1,…, yn}∈Rd이다. 여기서, d를 50으로 설정한다. In other words, we obtain the projection matrix P mat whose y i = P mat × x i . Where {y 1 ,… , y n } ∈R d . Here, d is set to 50.
일반적으로 얼굴해석을 진행할 때, 이미지차수 m은 이미지개수 n보다 훨씬 더 크다.In general, when performing face analysis, the image order m is much larger than the number n of images.
그러므로 m×m행렬 XXT는 퇴화행렬이다. 이 문제를 극복하기 위해 처음에 PCA를 이용하여 얼굴이미지를 정보손실이 없는 부분공간으로 사영하며 결과 행렬 XXT는 불퇴화행렬로 된다. Therefore the m×m matrix XX^T is singular (degenerate). To overcome this problem, the face images are first projected by PCA onto a subspace without information loss, after which the resulting matrix XX^T becomes non-singular.
(1) PCA 사영(1) PCA Projection
n개의 얼굴벡터가 주어지면 이 얼굴벡터모임에 대한 공분산행렬 Cpca를 구한다. Cpca는 m×m 행렬이다. Given n face vectors, we find the covariance matrix C pca for this face vector group. C pca is an m × m matrix.
공분산행렬 Cpca에 대한 Cpca×Eigenvector=Eigenvalue×Eigenvector인 고유값, 고유벡터 문제를 풀어서 m개의 고유값들과 m개의 m차원 고유벡터들을 얻는다. Solving the eigenvalue problem Cpca·v = λ·v for the covariance matrix Cpca yields m eigenvalues and m m-dimensional eigenvectors.
다음 고유값이 큰 순서로 d개의 고유벡터를 선택하여 행렬 WPCA를 구성한다. Next, the d eigenvectors with the largest eigenvalues are selected to form the matrix WPCA.
WPCA는 m×d 행렬이다.W PCA is an m × d matrix.
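A minimal sketch of this PCA step, assuming mean-centred face vectors (centring is not stated explicitly in the text):

import numpy as np

def pca_basis(X, d):
    # X: m x n matrix, one face vector per column
    Xc = X - X.mean(axis=1, keepdims=True)
    C = Xc @ Xc.T / X.shape[1]             # m x m covariance matrix C_pca
    eigval, eigvec = np.linalg.eigh(C)     # eigenvalues in ascending order
    order = np.argsort(eigval)[::-1][:d]   # indices of the d largest eigenvalues
    return eigvec[:, order]                # W_PCA: m x d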
(2) 무게행렬 Ws, Wd구성(2) Weight matrix Ws, Wd composition
Ws는 같은 나이그룹에 속하는 얼굴이미지들사이의 관계를 나타내며 Wd는 서로 다른 그룹에 속하는 얼굴이미지들사이의 관계를 나타낸다.Ws denotes a relationship between face images belonging to the same age group and Wd denotes a relationship between face images belonging to different groups.
Figure PCTKR2012005202-appb-I000018 ……… [Reference Formula 11]
상기 참고식11에서, Dist(Xi,Xj)는 하기 참고식12와 같다. In Ref. 11, Dist (X i , X j ) is the same as Ref. 12 below.
Figure PCTKR2012005202-appb-I000019 …[Reference Formula 12]
(3) CEA토대벡터 계산(3) CEA foundation vector calculation
Figure PCTKR2012005202-appb-I000020 의 d개의 가장 큰 고유값에 대응하는 고유벡터가 CEA토대벡터로 된다. The eigenvectors corresponding to the d largest eigenvalues of the matrix shown above become the CEA basis vectors.
Figure PCTKR2012005202-appb-I000021 …[Reference Formula 13]
(4) CEA 은페화 (4) CEA embedding
직교토대벡터들인 a1,…,ad가 계산되면 행렬 WCEA는 하기 참고식14와 같이 정의된다. Once the orthogonal basis vectors a1, …, ad are computed, the matrix WCEA is defined as in Reference Formula 14 below.
WCEA = [a1, a2, …, ad] ……… [Reference Formula 14]
식에서 WCEA은 m×d행렬이다.Where W CEA is the m × d matrix.
이때 사영행렬 Pmat는 하기 참고식15와 같이 정의된다.The projective matrix P mat is defined as in Equation 15 below.
Pmat = WPCA WCEA ……… [Reference Formula 15]
사영행렬 Pmat를 이용하여 매 얼굴벡터 X에 대한 노화특징량을 얻어낸다.The projection matrix P mat is used to obtain aging characteristics for each face vector X.
x → y = Pmat^T × x ……… [Reference Formula 16]
(단, y는 얼굴벡터 X에 대응하는 d차원벡터, 즉, 노화특징량임) (where y is the d-dimensional vector corresponding to the face vector x, i.e., the aging feature)
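A minimal sketch of Reference Formulas 15 and 16, with W_PCA and W_CEA assumed to come from the PCA and CEA steps above:

import numpy as np

def aging_feature(x, W_PCA, W_CEA):
    P_mat = W_PCA @ W_CEA    # m x d projection matrix (Reference Formula 15)
    return P_mat.T @ x       # d-dimensional aging feature y (Reference Formula 16)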
상기 (f5) 단계에서, 상기 2차회귀를 적용하여 나이를 추정하는 것은 하기 수학식11에 의해 이뤄진다. In step (f5), the age is estimated by applying the quadratic regression according to Equation 11 below.
[수학식11][Equation 11]
$L = b_0 + \mathbf{b}_1^{\top} Y + \mathbf{b}_2^{\top} Y^{2}$ (the square of $Y$ taken element-wise)
(단, bo, b1, b2:학습자료로부터 미리 계산된 회귀계수, Y:시험자료x로부터 참고식16에 의하여 계산된 노화특징벡터, L:추정 나이) (where b0, b1, b2: regression coefficients pre-computed from the training data; Y: aging feature vector computed from the test data x by Reference Formula 16; L: estimated age)
bo, b1, b2는 학습자료로부터 다음과 같이 미리 계산한다. b o , b 1 , and b 2 are precomputed from the learning material as follows:
2차회귀모형은 하기 참고식17과 같다.The second regression model is shown in Equation 17 below.
Figure PCTKR2012005202-appb-I000023 ……… [Reference Formula 17]
여기서 Figure PCTKR2012005202-appb-I000024 는 i번째 학습화상의 나이값이며 Figure PCTKR2012005202-appb-I000025 는 i번째 학습화상의 특징벡터이다. Here, the former quantity is the age value of the i-th training image and the latter is the feature vector of the i-th training image.
이것은 벡터-행렬형식으로 하기 참고식18과 같이 표시된다. This is expressed in the vector-matrix format as shown in Equation 18 below.
Figure PCTKR2012005202-appb-I000026 ……… [Reference Formula 18]
여기서, here,
Figure PCTKR2012005202-appb-I000027 ……… [Reference Formula 19]
이며, n은 학습자료의 개수이다. where n is the number of training samples.
이때, 회귀상수 Figure PCTKR2012005202-appb-I000028 는 하기 참고식20과 같이 계산된다. The regression coefficients shown above are then computed as in Reference Formula 20 below.
Figure PCTKR2012005202-appb-I000029 ……… [Reference Formula 20]
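A minimal sketch of the quadratic regression of Equation 11 and Reference Formulas 17-20, under the assumptions that the square of the feature vector is taken element-wise and that the coefficients are obtained by ordinary least squares:

import numpy as np

def fit_quadratic_regression(Y, ages):
    # Y: n x d matrix of aging feature vectors, ages: n training ages
    Q = np.hstack([np.ones((Y.shape[0], 1)), Y, Y ** 2])   # design matrix [1, y, y^2]
    coeffs, *_ = np.linalg.lstsq(Q, ages, rcond=None)      # least-squares fit of (b0, b1, b2)
    return coeffs

def estimate_age(y, coeffs):
    q = np.concatenate(([1.0], y, y ** 2))
    return float(q @ coeffs)                               # estimated age L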
상기 눈감김 추정단계(S700)에서는, 도 23에 도시된 바와 같이, 이미지 및 얼굴특징점 입력(S710), 눈감김 추정용 얼굴영역 잘라냄(S720), 잘라낸 얼굴영역 이미지 정규화(S730), SVM에 의한 눈감김 추정(S740)의 과정으로 이뤄진다. As shown in FIG. 23, the eye-closure estimating step (S700) consists of inputting the image and facial feature points (S710), cropping the face region for eye-closure estimation (S720), normalizing the cropped face region image (S730), and estimating eye closure by SVM (S740).
본 실시예에서, 상기 눈감김의 추정은 구체적으로, (g1) 상기 검출된 얼굴특징점을 기준으로 상기 검출된 얼굴영역에서 눈감김추정용 얼굴영역을 잘라내는 단계; (g2) 상기 잘라낸 눈감김추정용 얼굴영역의 크기를 정규화하는 단계; (g3) 상기 크기가 정규화된 눈감김추정용 얼굴영역의 히스토그램을 정규화하는 단계; 및 (g4) 상기 크기 및 히스토그램이 정규화된 눈감김추정용 얼굴영역으로부터 입력벡터를 구성하고 미리 학습된 SVM 알고리즘을 이용하여 눈감김을 추정하는 단계;를 포함하여 구성된다. In the present embodiment, the eye-closure estimation specifically comprises: (g1) cropping an eye-closure-estimation face region from the detected face region based on the detected facial feature points; (g2) normalizing the size of the cropped eye-closure-estimation face region; (g3) normalizing the histogram of the size-normalized eye-closure-estimation face region; and (g4) constructing an input vector from the size- and histogram-normalized eye-closure-estimation face region and estimating eye closure using a pre-trained SVM algorithm.
상기 (g1) 단계에서는, 입력된 이미지와 얼굴특징점을 이용하여 눈영역을 잘라낸다. In the step (g1), the eye region is cut out using the input image and the facial feature point.
예를 들어, 도 24에 도시된 바와 같이, 얼굴특징점 검출에서 검출된 특징점 중에서 눈의 양쪽 끝점을 기준으로 너비를 확정하고, 위아래로 동일한 높이로 눈영역을 확정하여 눈영역을 잘라낼 수 있다. For example, as illustrated in FIG. 24, the eye area may be cut out by determining the width of the feature points detected by the facial feature point detection based on both end points of the eye and determining the eye area at the same height up and down.
상기 (g2) 단계에서는, 예를 들어, 잘라낸 눈영역이미지를 20*20크기로 정규화한다. In the step (g2), for example, the cropped eye region image is normalized to 20 * 20 size.
상기 (g3) 단계에서는, 조명효과의 영향을 줄이기 위하여 히스토그램정규화를 한다. In the step (g3), histogram normalization is performed to reduce the effect of the lighting effect.
상기 (g4) 단계에서는, 예를 들어, 정규화된 20*20 크기의 얼굴이미지로부터 400차원의 입력벡터를 구성하고, 미리 학습된 SVM을 이용하여 눈감김여부를 추정한다.In the step (g4), for example, a 400-dimensional input vector is constructed from a normalized 20 * 20 face image, and estimated whether to close the eye using a pre-learned SVM.
상기 (g4) 단계에서, 상기 눈감김의 추정은, 하기 수학식12의 결과값이 0보다 크면 눈을 뜬 상태, 0보다 작으면 눈을 감은 상태로 판정하며, 결과값이 0인 경우에는 바람직하게는 눈을 뜬 것으로 판정한다. In step (g4), the eyes are judged to be open if the value of Equation 12 below is greater than 0 and closed if it is less than 0; when the value is exactly 0, the eyes are preferably judged to be open.
[수학식12][Equation 12]
$f(x) = \sum_{i=1}^{M} \alpha_i\, y_i\, k(x, x_i) + b$
(단, M:SV벡터의 개수, (where M: the number of support vectors,
yi:i번째 학습자료에 대한 눈감김 여부로써 눈을 뜬 상태인 경우 1, 눈을 감은 상태인 경우 -1로 설정, yi: eye-state label of the i-th training sample, set to 1 if the eyes are open and -1 if closed,
αi:i번째 벡터의 계수, αi: coefficient of the i-th vector,
x:시험벡터, x: test vector,
xi:i번째 학습벡터, xi: i-th training vector,
k:커널함수, k: kernel function,
b:편차) b: bias)
이때, 상기 커널함수는 하기 수학식13에 정의된 가우시안동경토대함수를 이용할 수 있다. Here, the kernel function may be the Gaussian radial basis function defined in Equation 13 below.
[수학식13][Equation 13]
$k(x, x') = \exp\!\left(-\dfrac{\lVert x - x' \rVert^{2}}{2\sigma^{2}}\right)$
(단, x:시험자료, x':학습표본자료, σ:분산정도를 나타내는 변수) (where x: test data, x': training sample, σ: parameter controlling the spread)
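A minimal sketch of steps (g1)-(g4), assuming OpenCV; decision_fn stands for any pre-trained decision function returning the value of Equation 12 (for example the GRBF decision sketched for Equation 8 above), and the square crop around the eye end points is illustrative:

import cv2
import numpy as np

def eyes_open(gray, eye_left_pt, eye_right_pt, decision_fn):
    (lx, ly), (rx, ry) = eye_left_pt, eye_right_pt
    w = max(int(np.hypot(rx - lx, ry - ly)), 2)             # width from the eye end points
    cy = int((ly + ry) / 2)
    patch = gray[max(cy - w // 2, 0):cy + w // 2, max(int(lx), 0):int(rx)]
    patch = cv2.equalizeHist(cv2.resize(patch, (20, 20)))   # 20*20, histogram normalized
    x = patch.astype(np.float32).flatten()                  # 400-dimensional input vector
    return decision_fn(x) >= 0                              # Equation 12: >0 open, <0 closed, 0 -> open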
상기 결과 출력단계(S800)에서는, 상술한 바와 같은 과정에 의해 추정된 시청자의 성별정보, 시청자의 나이정보를 3차원 디스플레이 장치의 입체감을 제어하기 정보로서 입체감 제어수단으로 출력한다. In the result output step (S800), the sex information of the viewer and the age information of the viewer estimated by the process described above are output to the stereoscopic control means as information for controlling the stereoscopic sense of the 3D display apparatus.
일반적으로 3차원 디스플레이 장치 개발시, 3차원 디스플레이 장치의 정면 2.5M에 성인 남자가 앉아있다는 전제조건으로 개발을 한다. In general, a 3D display apparatus is developed on the premise that an adult male is seated 2.5 m in front of the apparatus.
하지만, 예를 들어, 양안 시차를 이용하는 3DTV의 경우 해당위치에서 벗어나게 되면 입체효과가 줄어들거나 어지러움증이 일어나는 문제가 있다. However, for example, in the case of 3DTV using binocular parallax, the stereoscopic effect is reduced or dizziness occurs when it is moved out of the corresponding position.
한편, 일반적인 성인남자의 경우 대략 6.5cm의 양안 거리를 가지고 있으며, 이에 맞도록 뇌는 깊이정보를 계산하도록 되어있다. A typical adult male has an interocular distance of about 6.5 cm, and the brain computes depth information on that basis.
하지만 인종, 성별, 나이에 따라 이 차이가 작게는 1cm 많게는 1.5cm 정도 차이가 벌어진다. However, depending on race, gender, and age, this distance can differ by as little as 1 cm and by as much as about 1.5 cm.
그러므로, 이를 판별하여 3차원 디스플레이 장치의 입체감을 제어하기 위하여 시청자의 성별정보와 나이정보가 필요하다. Therefore, the gender information and the age information of the viewer are needed to determine this and control the stereoscopic feeling of the 3D display device.
상기 입체감 제어수단으로 출력된 시청자의 성별정보, 시청자의 나이정보는, 좌영상과 우영상 촬영시의 초점이 맞추어지는 점을 기준으로 하여 정해지는 변경 량을 의미하는 수평 시차 변경 기준값으로 활용될 수 있다. The viewer's gender information and age information output to the stereoscopic control means can be used as a horizontal-parallax change reference value, i.e., a change amount determined relative to the point of focus at which the left and right images were captured.
즉, 상기 추정된 시청자의 성별정보, 시청자의 나이정보에 근거한 수평 시차 변경 기준값을 이용하여 3차원 디스플레이 장치의 입체감을 제어함에 따라 현재 시청자의 시청 조건에 최적화된 3차원 화면을 출력하여 제공할 수 있는 것이다. In other words, by controlling the stereoscopic effect of the 3D display apparatus using a horizontal-parallax change reference value based on the estimated gender and age of the viewer, a 3D image optimized for the current viewer's viewing conditions can be output.
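A purely hypothetical sketch of how the stereoscopic control means might derive such a reference value; the 6.5 cm baseline and the roughly 1-1.5 cm spread are taken from the description above, but the concrete gender and age offsets are illustrative assumptions:

def horizontal_parallax_scale(gender, age, baseline_cm=6.5):
    ipd_cm = baseline_cm
    if gender == "female":
        ipd_cm -= 0.5            # hypothetical offset within the stated 1-1.5 cm spread
    if age < 13:
        ipd_cm -= 1.0            # hypothetical offset for young viewers
    return ipd_cm / baseline_cm  # scale factor applied to the horizontal parallax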
한편, 시청자의 응시방향에 대한 추정 결과, 3차원 디스플레이 장치의 정면에서 시청하는 경우(도 25의 a)가 아닌 3차원 디스플레이 장치의 정면에서 소정 각도 이상 벗어난 경우(예를 들어, 도 25에 도시된 바와 같이, 좌우 10˚ 이상 벗어난 위치에서 시청자가 응시하고 있는 경우(도 25의 b))에는 다음과 같은 처리를 할 수 있다. When the estimated gaze direction indicates that the viewer is not viewing from the front of the 3D display apparatus (FIG. 25a) but deviates from the front by more than a predetermined angle (for example, as shown in FIG. 25, when the viewer gazes from a position more than 10° to the left or right, FIG. 25b), the following processing can be performed.
3차원 디스플레이 장치의 정면이 해당 시청자를 향하도록 회전구동수단(도면 미도시)을 이용하여 3차원 디스플레이 장치의 출력방향을 변경할 수 있다. The output direction of the 3D display apparatus may be changed by using rotation driving means (not shown) so that the front side of the 3D display apparatus faces the corresponding viewer.
또는, 3차원 디스플레이 장치의 화면으로 "시청 각도에서 벗어남", "화면 정면으로 이동 바람" 등의 자막을 출력하여 시청자가 3차원 디스플레이 장치의 정면으로 이동할 수 있도록 안내할 수도 있다. Alternatively, captions such as "out of viewing angle" or "please move to the front of the screen" may be displayed on the screen of the 3D display apparatus to guide the viewer to move to the front of the apparatus.
또한, 상기 결과 출력단계(S800)에서는, 상술한 바와 같은 과정에 의해 추정된 시청자의 눈감김정보를 3차원 디스플레이 장치 화면 출력 ON/OFF를 제어하기 위한 정보로서 화면전원 제어수단으로 출력한다. In addition, in the result output step (S800), the viewer's eye-closure information estimated by the above-described process is output to the screen power control means as information for controlling ON/OFF of the screen output of the 3D display apparatus.
즉, 시청자의 눈감김 상태가 지속된다고 추정된 경우에, 상기 화면전원 제어수단은 상기 디스플레이 장치 화면으로 출력되는 영상을 OFF시켜서 더 이상의 영상 출력이 이뤄지지 않도록 할 수 있다. That is, when it is estimated that the viewer's eye-closing state continues, the screen power control means may turn off the image output to the display device screen so that no further image output is performed.
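An illustrative control sketch combining the two outputs; the 10° threshold follows the example above, while the 5-second persistence window and the display interface (show_caption, screen_off) are hypothetical:

import time

def control_display(gaze_angle_deg, eyes_closed, closed_since, display):
    if abs(gaze_angle_deg) > 10.0:
        display.show_caption("Please move to the front of the screen")
    if eyes_closed and closed_since is not None and time.time() - closed_since > 5.0:
        display.screen_off()     # stop video output while the eyes stay closed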
도 25의 도면부호 1000은, 이러한 각종 제어 처리를 하기 위한 제어수단이다. Reference numeral 1000 in FIG. 25 denotes control means for performing such various control processes.
본 발명의 실시예 들은 다양한 컴퓨터로 구현되는 동작을 수행하기 위한 프로그램 명령을 포함하는 컴퓨터 판독가능 기록매체를 포함한다. Embodiments of the present invention include a computer readable recording medium including program instructions for performing various computer-implemented operations.
상기 컴퓨터 판독 가능 기록매체는 프로그램 명령, 데이터 파일, 데이터 구조 등을 단독으로 또는 조합하여 포함할 수 있다. The computer-readable recording medium may include program instructions, data files, data structures, etc. alone or in combination.
상기 기록매체는 본 발명을 위하여 특별히 설계되고 구성된 것들이거나 컴퓨터 소프트웨어 당업자에게 공지되어 사용 가능한 것일 수도 있다. The recording medium may be one specially designed and configured for the present invention, or may be known and available to those skilled in computer software.
컴퓨터 판독 가능 기록매체의 예에는 하드 디스크, 플로피 디스크 및 자기 테이프와 같은 자기 매체, CD-ROM, DVD와 같은 광기록 매체, 플롭티컬 디스크와 같은 자기-광 매체, 및 롬, 램, 플래시 메모리 등과 같은 프로그램 명령을 저장하고 수행하도록 특별히 구성된 하드웨어 장치가 포함된다. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape, optical recording media such as CD-ROMs, DVDs, magnetic-optical media such as floppy disks, and ROM, RAM, flash memory, and the like. Hardware devices specifically configured to store and execute the same program instructions are included.
상기 기록매체는 프로그램 명령, 데이터 구조 등을 지정하는 신호를 전송하는 반송파를 포함하는 광 또는 금속선, 도파관 등의 전송 매체일 수도 있다. The recording medium may be a transmission medium such as an optical or metal wire, a waveguide, or the like including a carrier wave for transmitting a signal specifying a program command, a data structure, or the like.
프로그램 명령의 예에는 컴파일러에 의해 만들어지는 것과 같은 기계어 코드뿐만 아니라 인터프리터 등을 사용해서 컴퓨터에 의해서 실행될 수 있는 고급 언어 코드를 포함한다.Examples of program instructions include not only machine code generated by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like.
본 발명은 첨부된 도면을 참조하여 바람직한 실시예를 중심으로 기술되었지만 당업자라면 이러한 기재로부터 본 발명의 범주를 벗어남이 없이 많은 다양하고 자명한 변형이 가능하다는 것은 명백하다. 따라서 본 발명의 범주는 이러한 많은 변형예들을 포함하도록 기술된 특허청구범위에 의해서 해석돼야 한다.Although the present invention has been described with reference to the accompanying drawings, it will be apparent to those skilled in the art that many different and obvious modifications are possible without departing from the scope of the invention from this description. Therefore, the scope of the invention should be construed by the claims described to include many such variations.

Claims (24)

  1. 시청자의 응시방향 및 응시거리 중 적어도 하나의 정보에 대응하여 3차원 디스플레이 장치의 입체감을 제어하기 위한 시청자 얼굴 추적정보 생성방법으로서, A viewer face tracking information generation method for controlling stereoscopic feeling of a 3D display device in response to at least one piece of information of a viewer's gaze direction and gaze distance,
    (a) 상기 3차원 디스플레이 장치 측의 일 위치에 구비된 영상입력수단을 통해 입력되는 영상에서 추출한 이미지로부터 상기 시청자의 얼굴영역을 검출하는 단계; (a) detecting a face region of the viewer from an image extracted from an image input through an image input means provided at one position of the 3D display apparatus;
    (b) 상기 검출된 얼굴영역에서 얼굴특징점을 검출하는 단계; (b) detecting a facial feature point in the detected face region;
    (c) 3차원 표준 얼굴모델의 모델특징점을 변환하여 상기 얼굴특징점에 대응하는 3차원 시청자 얼굴모델을 생성하는 최적변환행렬을 추정하는 단계; 및 (c) estimating an optimal transformation matrix for generating a 3D viewer face model corresponding to the face feature by converting the model feature points of the 3D standard face model; And
    (d) 상기 최적변환행렬에 근거하여 상기 시청자의 응시방향 및 응시거리 중 적어도 하나를 추정하여 시청자 얼굴 추적정보를 생성하는 단계;를 포함하여 구성된 것을 특징으로 하는 시청자 얼굴 추적정보 생성방법. and (d) estimating at least one of the gaze direction and gaze distance of the viewer based on the optimal transformation matrix to generate viewer face tracking information.
  2. 제1항에 있어서, The method of claim 1,
    상기 (a) 단계는, In step (a),
    (a1) 상기 추출된 이미지의 RGB 색 정보로부터 YCbCr 색 모델을 작성하고, 작성된 색 모델에서 색 정보와 밝기 정보를 분리하며, 상기 밝기 정보에 의하여 얼굴후보영역을 검출하는 단계; 및 (a1) creating a YCbCr color model from the RGB color information of the extracted image, separating color information and brightness information from the created color model, and detecting a face candidate area based on the brightness information; And
    (a2) 상기 검출된 얼굴후보영역에 대한 4각 특징점 모델을 정의하고, 상기 4각 특징점 모델을 AdaBoost 학습 알고리즘에 의하여 학습시킨 학습자료에 기초하여 얼굴영역을 검출하는 단계;를 포함하여 구성된 것을 특징으로 하는 시청자 얼굴 추적정보 생성방법. (a2) defining a quadrilateral feature point model for the detected face candidate region, and detecting a face region based on the training material trained by the AdaBoost learning algorithm on the quadrilateral feature point model; Viewer face tracking information generation method.
  3. 제2항에 있어서, The method of claim 2,
    상기 (a2) 단계 이후에After step (a2)
    (a3) 상기 AdaBoost의 결과값(하기 수학식1의 CFH(x))의 크기가 소정임계값을 초과하는 경우에 상기 검출된 얼굴영역을 유효한 얼굴영역으로 판정하는 단계;를 더 포함하여 구성된 것을 특징으로 하는 시청자 얼굴 추적정보 생성방법. (a3) determining the detected face area as a valid face area when the size of the result value of AdaBoost (CF H (x) of Equation 1) exceeds a predetermined threshold value; Viewer face tracking information generation method, characterized in that.
    [수학식1][Equation 1]
    Figure PCTKR2012005202-appb-I000032
    (단, M:강분류기를 구성하고 있는 전체 약분류기의 개수(where M: the total number of weak classifiers constituting the strong classifier
    hm(x):m번째 약분류기에서의 출력값h m (x): Output value from the mth weak classifier
    θ:강분류기의 오류판정률을 조절하는데 이용되는 값)θ: value used to adjust the error judgment rate of the strong classifier)
  4. 제2항에 있어서, The method of claim 2,
    상기 (a2) 단계에서, In the step (a2),
    상기 얼굴영역 검출을 위한 하 라이크 피쳐(harr-like feature)는 비정면 얼굴영역을 검출하기 위한 비대칭성의 하 라이크 피쳐(harr-like feature)를 더욱 포함하는 것을 특징으로 하는 시청자 얼굴 추적정보 생성방법. The Haar-like features for detecting the face region further include asymmetric Haar-like features for detecting a non-frontal face region.
  5. 제1항에 있어서, The method of claim 1,
    상기 (b) 단계는, In step (b),
    ASM(active shape model) 방법의 특징점(landmark) 탐색에 의해 이뤄지되, AdaBoost 알고리즘을 이용하여 진행하는 것을 특징으로 하는 시청자 얼굴 추적정보 생성방법. A method for generating viewer face tracking information, which is performed by searching for a landmark of an ASM method, and proceeds using an AdaBoost algorithm.
  6. 제5항에 있어서, The method of claim 5,
    상기 얼굴특징점의 검출은, Detection of the facial feature point,
    (b1) 현재 특징점의 위치를 (xl, yl)라고 정의하고, 현재 특징점의 위치를 중심으로 그 근방에서 n*n 화소크기의 부분창문들을 분류기로 분류하는 단계; (b1) defining a position of the current feature point as (x l , y l ), and classifying partial windows of n * n pixel size into a classifier around the current feature point;
    (b2) 하기 수학식2에 의하여 특징점의 후보위치를 계산하는 단계; 및 (b2) calculating candidate positions of the feature points according to Equation 2 below; And
    (b3) 하기 수학식3의 조건을 만족하는 경우에는 (x'l, y'l)을 새로운 특징점으로 정하고, 만족하지 못하는 경우에는 현재 특징점의 위치(xl, yl)를 유지하는 단계;를 포함하여 구성된 것을 특징으로 하는 시청자 얼굴 추적정보 생성방법. (b3) setting (x ' l , y' l ) as a new feature point if the condition of Equation 3 is satisfied, and maintaining the position (x l , y l ) of the current feature point if not satisfied; Viewer face tracking information generation method, characterized in that configured to include.
    [수학식2][Equation 2]
    Figure PCTKR2012005202-appb-I000033
    [수학식3][Equation 3]
    Figure PCTKR2012005202-appb-I000034
    (단, a:x축방향으로 탐색해나가는 최대근방거리(where a: the maximum neighborhood distance searched in the x-axis direction
    b:y축방향으로 탐색해나가는 최대근방거리b: Maximum near distance searched in the y-axis direction
    xdx , dy:(xl, yl)에서 (dx, dy)만큼 떨어진 점을 중심으로 하는 부분창문x dx , dy : partial window centered around (dx, dy) from (x l , y l )
    Nall:분류기의 총계단수N all : Total stage number of classifier
    Npass:부분창문이 통과된 계단수N pass : the number of steps through which the partial window has passed
    c:끝까지 통과되지 못한 부분창문의 신뢰도값을 제한하기 위한 상수값)c: constant value to limit the reliability value of partial windows not passed to the end)
  7. 제1항에 있어서, The method of claim 1,
    상기 (c) 단계는, In step (c),
    (c1) 상기 3차원 표준 얼굴모델의 얼굴 회전정보에 관한 3*3 행렬 M과 얼굴 평행이동정보에 관한 3차원 벡터 T를 이용하여 하기 수학식4의 변환식을 계산하는 단계-상기 M과 T는 각 성분을 변수로 가지며, 상기 최적변환행렬을 정의하는 행렬임-;(c1) calculating a conversion equation of Equation 4 using a 3 * 3 matrix M of face rotation information of the 3D standard face model and a 3D vector T of face parallel movement information, wherein M and T are A matrix having each component as a variable and defining the optimal transformation matrix;
    (c2) 상기 수학식4에 의해 구해진 카메라특징점위치벡터(PC)와 하기 수학식6에 의해 구해진 카메라변환행렬(MC)를 이용하여 하기 수학식5의 3차원 벡터 P'을 계산하는 단계;(c2) calculating the three-dimensional vector P 'of Equation 5 using the camera feature point position vector P C obtained by Equation 4 and the camera transformation matrix M C obtained by Equation 6 below; ;
    (c3) 상기 3차원 벡터 P'에 근거하여 2차원 벡터 PI를 (P'x/P'z, P'y/P'z)로 정의하는 단계; 및 (c3) defining a two-dimensional vector P I as (P ' x / P' z , P ' y / P' z ) based on the three-dimensional vector P '; And
    (c4) 상기 2차원 벡터 PI와 상기 (b) 단계에서 검출된 얼굴특징점의 좌표값을 이용하여 상기 최적변환행렬의 각 변수를 추정하는 단계;를 더욱 포함하여 구성된 것을 특징으로 하는 시청자 얼굴 추적정보 생성방법. (c4) estimating each variable of the optimal transformation matrix using coordinates of the two-dimensional vector P I and the facial feature points detected in the step (b); How to generate information.
    [수학식4][Equation 4]
    PC=M*PM+TP C = M * P M + T
    [수학식5][Equation 5]
    P'=Mc*Pc P '= M c * P c
    (단, P'은 (P'x, P'y, P'z)로 정의되는 3차원 벡터)(Where P 'is a three-dimensional vector defined by (P' x , P ' y , P' z ))
    [수학식6][Equation 6]
    Figure PCTKR2012005202-appb-I000035
    (단, W:영상입력수단으로 입력된 이미지의 폭,(W: the width of the image input by the video input means,
    H:영상입력수단으로 입력된 이미지의 높이,H: height of the image inputted by the video input means,
    focal_len:-0.5*W/tan(Degree2Radian(fov*0.5)),focal_len: -0.5 * W / tan (Degree2Radian (fov * 0.5)),
    fov:카메라의 보임각도)fov: angle of view of the camera)
  8. 제7항에 있어서, The method of claim 7, wherein
    상기 응시방향 정보는 상기 행렬 M의 추정된 각 성분을 이용하여 하기 수학식7에 의해 구해지고, 상기 응시거리 정보는 상기 벡터 T의 추정된 각 성분으로 정의되는 것을 특징으로 하는 시청자 얼굴 추적정보 생성방법. The gaze direction information is obtained by using Equation 7 below using the estimated respective components of the matrix M, and the gaze distance information is defined by the estimated respective components of the vector T. Way.
    [수학식7][Equation 7]
    Figure PCTKR2012005202-appb-I000036
    (단, m11, m12, ...,m33: 3*3 행렬 M의 추정된 각 성분값)(M 11 , m 12 , ..., m 33 : estimated values of each component of the 3 * 3 matrix M)
  9. 제1항에 있어서, The method of claim 1,
    상기 (d) 단계 이후에, After step (d),
    (e) 상기 검출된 얼굴영역을 이용하여 상기 시청자의 성별을 추정하는 성별추정단계;를 더 포함하여 구성된 것을 특징으로 하는 시청자 얼굴 추적정보 생성방법. (e) a gender estimation step of estimating the gender of the viewer using the detected face region.
  10. 제9항에 있어서, The method of claim 9,
    상기 (e) 단계는, In step (e),
    (e1) 상기 검출된 얼굴특징점을 기준으로 상기 검출된 얼굴영역에서 성별추정용 얼굴영역을 잘라내는 단계; (e1) cutting out a face estimation region for gender estimation from the detected face region based on the detected face feature point;
    (e2) 상기 잘라낸 성별추정용 얼굴영역의 크기를 정규화하는 단계; (e2) normalizing the size of the cut face sex estimation region;
    (e3) 상기 크기가 정규화된 성별추정용 얼굴영역의 히스토그램을 정규화하는 단계; 및 (e3) normalizing a histogram of the face region for gender estimation in which the size is normalized; And
    (e4) 상기 크기 및 히스토그램이 정규화된 성별추정용 얼굴영역으로부터 입력벡터를 구성하고 미리 학습된 SVM 알고리즘을 이용하여 성별을 추정하는 단계;를 포함하여 구성된 것을 특징으로 하는 시청자 얼굴 추적정보 생성방법. and (e4) constructing an input vector from the face region for gender estimation where the size and histogram are normalized, and estimating a gender using a pre-learned SVM algorithm.
  11. 제1항에 있어서, The method of claim 1,
    상기 (d) 단계 이후에, After step (d),
    (f) 상기 검출된 얼굴영역을 이용하여 상기 시청자의 나이를 추정하는 나이추정단계;를 더 포함하여 구성된 것을 특징으로 하는 시청자 얼굴 추적정보 생성방법. and (f) an age estimation step of estimating the age of the viewer using the detected face region.
  12. 제11항에 있어서, The method of claim 11,
    상기 나이의 추정은, Estimation of the age,
    (f1) 상기 검출된 얼굴특징점을 기준으로 상기 검출된 얼굴영역에서 나이추정용 얼굴영역을 잘라내는 단계; (f1) cutting out an age estimation face area from the detected face area based on the detected face feature point;
    (f2) 상기 잘라낸 나이추정용 얼굴영역의 크기를 정규화하는 단계; (f2) normalizing the size of the cut age estimation face region;
    (f3) 상기 크기가 정규화된 나이추정용 얼굴영역의 국부적 조명보정을 하는 단계; (f3) performing local illumination correction on the age estimation face region where the size is normalized;
    (f4) 상기 크기 정규화 및 국부적 조명보정된 나이추정용 얼굴영역으로부터 입력벡터를 구성하고 나이다양체 공간으로 사영하여 특징벡터를 생성하는 단계; 및 (f4) generating a feature vector by constructing an input vector from the size normalized and locally-illuminated age estimation face region and projecting it into a nine-body space; And
    (f5) 상기 생성된 특징벡터에 2차회귀를 적용하여 나이를 추정하는 단계;를 포함하여 구성된 것을 특징으로 하는 시청자 얼굴 추적정보 생성방법.and (f5) estimating an age by applying quadratic regression to the generated feature vector.
  13. 제1항에 있어서, The method of claim 1,
    상기 (d) 단계 이후에, After step (d),
    (g) 상기 검출된 얼굴영역을 이용하여 상기 시청자의 눈감김을 추정하는 눈감김추정단계;를 더 포함하여 구성된 것을 특징으로 하는 시청자 얼굴 추적정보 생성방법. and (g) estimating eyelids of the viewer using the detected face region.
  14. 제13항에 있어서, The method of claim 13,
    상기 눈감김의 추정은, Estimation of the eye closing,
    (g1) 상기 검출된 얼굴특징점을 기준으로 상기 검출된 얼굴영역에서 눈감김추정용 얼굴영역을 잘라내는 단계; (g1) cutting a face region for eye closure estimation from the detected face region based on the detected facial feature point;
    (g2) 상기 잘라낸 눈감김추정용 얼굴영역의 크기를 정규화하는 단계; (g2) normalizing the size of the cut-out eye mask estimation face region;
    (g3) 상기 크기가 정규화된 눈감김추정용 얼굴영역의 히스토그램을 정규화하는 단계; 및 (g3) normalizing a histogram of the face region for estimating the eyelid normalized in size; And
    (g4) 상기 크기 및 히스토그램이 정규화된 눈감김추정용 얼굴영역으로부터 입력벡터를 구성하고 미리 학습된 SVM 알고리즘을 이용하여 눈감김을 추정하는 단계;를 포함하여 구성된 것을 특징으로 하는 시청자 얼굴 추적정보 생성방법. (g4) constructing an input vector from the face region for eye-eye estimation for which the size and histogram are normalized, and estimating eye-eye by using a pre-learned SVM algorithm; generating viewer face tracking information Way.
  15. 시청자의 응시방향 및 응시거리 중 적어도 하나의 정보에 대응하여 3차원 디스플레이 장치의 입체감을 제어하기 위한 시청자 얼굴 추적정보 생성방법으로서, A viewer face tracking information generation method for controlling stereoscopic feeling of a 3D display device in response to at least one piece of information of a viewer's gaze direction and gaze distance,
    상기 3차원 디스플레이 장치 측의 일 위치에 구비된 영상입력수단을 통해 입력되는 영상에서 추출한 이미지로부터 상기 시청자의 얼굴영역을 검출하는 얼굴영역 검출단계; A face region detecting step of detecting a face region of the viewer from an image extracted from an image input through an image input means provided at one position of the 3D display apparatus;
    상기 검출된 얼굴영역에 근거하여 상기 시청자의 응시방향 및 응시거리 중 적어도 하나의 정보를 추정하여 응시정보를 생성하는 응시정보 생성단계; 및 A gaze information generation step of generating gaze information by estimating at least one information of gaze direction and gaze distance of the viewer based on the detected face region; And
    상기 검출된 얼굴영역에 근거하여 상기 시청자의 성별 및 나이 중 적어도 하나의 정보를 추정하여 시청자정보를 생성하는 시청자정보 생성단계;를 포함하여 구성된 것을 특징으로 하는 시청자 얼굴 추적정보 생성방법. And generating viewer information by estimating at least one piece of information of a gender and an age of the viewer based on the detected face region.
  16. 제1항 내지 제15항 중의 어느 한 항에 기재된 방법의 각 단계를 실행시키기 위한 프로그램을 기록한 컴퓨터로 읽을 수 있는 기록매체. A computer-readable recording medium having recorded thereon a program for executing each step of the method according to any one of claims 1 to 15.
  17. 제1항 내지 제15항 중의 어느 한 항에 기재된 시청자 얼굴 추적정보 생성방법을 이용하여 입체감을 제어하는 3차원 디스플레이 장치. A three-dimensional display apparatus for controlling a three-dimensional effect by using the method for generating viewer face tracking information according to any one of claims 1 to 15.
  18. 시청자의 응시방향 및 응시거리 중 적어도 하나의 정보에 대응하여 3차원 디스플레이 장치의 입체감을 제어하기 위한 시청자 얼굴 추적정보 생성장치로서, A viewer face tracking information generation device for controlling a stereoscopic feeling of a three-dimensional display device in response to at least one piece of information of a viewer's gaze direction and gaze distance,
    상기 3차원 디스플레이 장치 측의 일 위치에 구비된 영상입력수단을 통해 입력되는 영상에서 추출한 이미지로부터 상기 시청자의 얼굴영역을 검출하는 얼굴영역 검출모듈; A face region detection module for detecting a face region of the viewer from an image extracted from an image input through an image input means provided at one position of the 3D display apparatus;
    상기 검출된 얼굴영역에서 얼굴특징점을 검출하는 얼굴특징점 검출모듈; A facial feature point detection module for detecting a facial feature point in the detected face area;
    3차원 표준 얼굴모델의 모델특징점을 변환하여 상기 얼굴특징점에 대응하는 3차원 시청자 얼굴모델을 생성하는 최적변환행렬을 추정하는 행렬 추정모듈; 및 A matrix estimation module for transforming a model feature point of a 3D standard face model to estimate an optimal transformation matrix for generating a 3D viewer face model corresponding to the face feature point; And
    상기 추정된 최적변환행렬에 근거하여 상기 시청자의 응시방향 및 응시거리 중 적어도 하나를 추정하여 시청자 얼굴 추적정보를 생성하는 추적정보 생성모듈;을 포함하여 구성된 것을 특징으로 하는 시청자 얼굴 추적정보 생성장치. And a tracking information generation module configured to generate at least one of a gaze direction and a gaze distance of the viewer based on the estimated optimal transformation matrix to generate viewer face tracking information.
  19. 제18항에 있어서, The method of claim 18,
    상기 얼굴특징점 검출모듈은, The facial feature point detection module,
    ASM(active shape model) 방법의 특징점(landmark) 탐색에 의해 얼굴특징점을 검출하되, AdaBoost 알고리즘을 이용하여 진행하는 것을 특징으로 하는 시청자 얼굴 추적정보 생성장치. The apparatus for detecting facial features of a viewer characterized by detecting facial feature points by searching for landmarks of an active shape model (ASM) method, and using the AdaBoost algorithm.
  20. 제18항에 있어서, The method of claim 18,
    상기 행렬 추정모듈은, The matrix estimation module,
    상기 3차원 표준 얼굴모델의 얼굴 회전정보에 관한 3*3 행렬 M과 얼굴 평행이동정보에 관한 3차원 벡터 T를 이용하여 하기 수학식4의 변환식을 계산하고-상기 M과 T는 각 성분을 변수로 가지며, 상기 최적변환행렬을 정의하는 행렬임-; 상기 수학식4에 의해 구해진 카메라특징점위치벡터(PC)와 하기 수학식6에 의해 구해진 카메라변환행렬(MC)를 이용하여 하기 수학식5의 3차원 벡터 P'을 계산하며, 상기 3차원 벡터 P'에 근거하여 2차원 벡터 PI를 (P'x/P'z, P'y/P'z)로 정의하고, 상기 2차원 벡터 PI와 상기 (b) 단계에서 검출된 얼굴특징점의 좌표값을 이용하여 상기 최적변환행렬의 각 변수를 추정하는 것을 특징으로 하는 시청자 얼굴 추적정보 생성장치. Using the 3 * 3 matrix M of the face rotation information of the 3D standard face model and the 3D vector T of the face parallel movement information, a conversion equation of Equation 4 is calculated, wherein M and T are variables of each component. Is a matrix defining the optimal transformation matrix; The three-dimensional vector P 'of Equation 5 is calculated by using the camera feature point position vector P C obtained by Equation 4 and the camera transformation matrix M C obtained by Equation 6 below, and the three-dimensional Based on the vector P ', the 2D vector P I is defined as (P' x / P ' z , P' y / P ' z ), and the 2D vector P I and the facial feature detected in step (b) An apparatus for tracking face tracking information of a viewer, comprising estimating each variable of the optimal transformation matrix using a coordinate value of.
    [수학식4][Equation 4]
    PC=M*PM+TP C = M * P M + T
    [수학식5][Equation 5]
    P'=Mc*Pc P '= M c * P c
    (단, P'은 (P'x, P'y, P'z)로 정의되는 3차원 벡터)(Where P 'is a three-dimensional vector defined by (P' x , P ' y , P' z ))
    [수학식6][Equation 6]
    Figure PCTKR2012005202-appb-I000037
    (단, W:영상입력수단으로 입력된 이미지의 폭,(W: the width of the image input by the video input means,
    H:영상입력수단으로 입력된 이미지의 높이,H: height of the image inputted by the video input means,
    focal_len:-0.5*W/tan(Degree2Radian(fov*0.5)),focal_len: -0.5 * W / tan (Degree2Radian (fov * 0.5)),
    fov:카메라의 보임각도)fov: angle of view of the camera)
  21. 제18항에 있어서, The method of claim 18,
    상기 검출된 얼굴영역을 이용하여 상기 시청자의 성별을 추정하는 성별추정모듈;을 더 포함하여 구성된 것을 특징으로 하는 시청자 얼굴 추적정보 생성장치. And a gender estimating module for estimating the gender of the viewer by using the detected face region.
  22. 제18항에 있어서, The method of claim 18,
    상기 검출된 얼굴영역을 이용하여 상기 시청자의 나이를 추정하는 나이추정모듈;을 더 포함하여 구성된 것을 특징으로 하는 시청자 얼굴 추적정보 생성장치. And an age estimation module for estimating the age of the viewer using the detected face region.
  23. 제18항에 있어서, The method of claim 18,
    상기 검출된 얼굴영역을 이용하여 상기 시청자의 눈감김을 추정하는 눈감김추정모듈;을 더 포함하여 구성된 것을 특징으로 하는 시청자 얼굴 추적정보 생성장치. And an eye closure estimation module for estimating eye closure of the viewer by using the detected face region.
  24. 시청자의 응시방향 및 응시거리 중 적어도 하나의 정보에 대응하여 3차원 디스플레이 장치의 입체감을 제어하기 위한 시청자 얼굴 추적정보 생성장치로서, A viewer face tracking information generation device for controlling a stereoscopic feeling of a three-dimensional display device in response to at least one piece of information of a viewer's gaze direction and gaze distance,
    상기 3차원 디스플레이 장치 측의 일 위치에 구비된 영상입력수단을 통해 입력되는 영상에서 추출한 이미지로부터 상기 시청자의 얼굴영역을 검출하는 수단; Means for detecting a face region of the viewer from an image extracted from an image input through an image input means provided at one position of the 3D display apparatus;
    상기 검출된 얼굴영역에 근거하여 상기 시청자의 응시방향 및 응시거리 중 적어도 하나의 정보를 추정하여 응시정보를 생성하는 수단; 및 Means for generating gaze information by estimating at least one of gaze direction and gaze distance of the viewer based on the detected face region; And
    상기 검출된 얼굴영역에 근거하여 상기 시청자의 성별 및 나이 중 적어도 하나의 정보를 추정하여 시청자정보를 생성하는 수단;을 포함하여 구성된 것을 특징으로 하는 시청자 얼굴 추적정보 생성장치. And means for generating viewer information by estimating at least one information of the gender and age of the viewer based on the detected face region.
PCT/KR2012/005202 2011-07-08 2012-06-29 Method and apparatus for generating viewer face-tracing information, recording medium for same, and three-dimensional display apparatus WO2013009020A2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US14/003,685 US20140307063A1 (en) 2011-07-08 2012-06-29 Method and apparatus for generating viewer face-tracing information, recording medium for same, and three-dimensional display apparatus

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
KR10-2011-0067713 2011-07-08
KR20110067713A KR101216123B1 (en) 2011-07-08 2011-07-08 Method and device for generating tracking information of viewer's face, computer-readable recording medium for the same, three dimensional display apparatus

Publications (3)

Publication Number Publication Date
WO2013009020A2 true WO2013009020A2 (en) 2013-01-17
WO2013009020A3 WO2013009020A3 (en) 2013-03-07
WO2013009020A4 WO2013009020A4 (en) 2013-08-15

Family

ID=47506652

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2012/005202 WO2013009020A2 (en) 2011-07-08 2012-06-29 Method and apparatus for generating viewer face-tracing information, recording medium for same, and three-dimensional display apparatus

Country Status (3)

Country Link
US (1) US20140307063A1 (en)
KR (1) KR101216123B1 (en)
WO (1) WO2013009020A2 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107278369A (en) * 2016-12-26 2017-10-20 深圳前海达闼云端智能科技有限公司 Method, device and the communication system of people finder

Families Citing this family (28)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5649601B2 (en) * 2012-03-14 2015-01-07 株式会社東芝 Verification device, method and program
US9104908B1 (en) * 2012-05-22 2015-08-11 Image Metrics Limited Building systems for adaptive tracking of facial features across individuals and groups
US9111134B1 (en) 2012-05-22 2015-08-18 Image Metrics Limited Building systems for tracking facial features across individuals and groups
KR20150057064A (en) * 2013-11-18 2015-05-28 엘지전자 주식회사 Electronic device and control method thereof
JP6507747B2 (en) * 2015-03-18 2019-05-08 カシオ計算機株式会社 INFORMATION PROCESSING APPARATUS, CONTENT DETERMINING METHOD, AND PROGRAM
US9514397B2 (en) * 2015-03-23 2016-12-06 Intel Corporation Printer monitoring
KR101779096B1 (en) * 2016-01-06 2017-09-18 (주)지와이네트웍스 The object pursuit way in the integration store management system of the intelligent type image analysis technology-based
CN105739707B (en) * 2016-03-04 2018-10-02 京东方科技集团股份有限公司 Electronic equipment, face recognition tracking and 3 D displaying method
KR101686620B1 (en) * 2016-03-17 2016-12-15 델리아이 주식회사 System for judging senior citizen with face picture
KR102308871B1 (en) 2016-11-02 2021-10-05 삼성전자주식회사 Device and method to train and recognize object based on attribute of object
CN106960203B (en) * 2017-04-28 2021-04-20 北京搜狐新媒体信息技术有限公司 Facial feature point tracking method and system
CN107203743B (en) * 2017-05-08 2020-06-05 杭州电子科技大学 Face depth tracking device and implementation method
US10643383B2 (en) 2017-11-27 2020-05-05 Fotonation Limited Systems and methods for 3D facial modeling
TW202014992A (en) * 2018-10-08 2020-04-16 財團法人資訊工業策進會 System and method for simulating expression of virtual facial model
US10949649B2 (en) 2019-02-22 2021-03-16 Image Metrics, Ltd. Real-time tracking of facial features in unconstrained video
US11610414B1 (en) * 2019-03-04 2023-03-21 Apple Inc. Temporal and geometric consistency in physical setting understanding
MX2022003020A (en) 2019-09-17 2022-06-14 Boston Polarimetrics Inc Systems and methods for surface modeling using polarization cues.
CN110602556A (en) * 2019-09-20 2019-12-20 深圳创维-Rgb电子有限公司 Playing method, cloud server and storage medium
EP4033758A4 (en) * 2019-09-30 2024-01-17 Beijing Ivisual 3D Tech Co Ltd Method and apparatus for realizing 3d display, and 3d display terminal
DE112020004813B4 (en) 2019-10-07 2023-02-09 Boston Polarimetrics, Inc. System for expanding sensor systems and imaging systems with polarization
CN114787648B (en) 2019-11-30 2023-11-10 波士顿偏振测定公司 Systems and methods for transparent object segmentation using polarization cues
US11195303B2 (en) 2020-01-29 2021-12-07 Boston Polarimetrics, Inc. Systems and methods for characterizing object pose detection and measurement systems
JP2023511747A (en) 2020-01-30 2023-03-22 イントリンジック イノベーション エルエルシー Systems and methods for synthesizing data for training statistical models with different imaging modalities, including polarization imaging
KR102265624B1 (en) * 2020-05-08 2021-06-17 주식회사 온페이스에스디씨 Start-up security system for vehicles using facial recognition
US11953700B2 (en) 2020-05-27 2024-04-09 Intrinsic Innovation Llc Multi-aperture polarization optical systems using beam splitters
US11954886B2 (en) 2021-04-15 2024-04-09 Intrinsic Innovation Llc Systems and methods for six-degree of freedom pose estimation of deformable objects
US11290658B1 (en) 2021-04-15 2022-03-29 Boston Polarimetrics, Inc. Systems and methods for camera exposure control
US11689813B2 (en) 2021-07-01 2023-06-27 Intrinsic Innovation Llc Systems and methods for high dynamic range imaging using crossed polarizers

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2000278716A (en) * 1999-03-25 2000-10-06 Mr System Kenkyusho:Kk Device and method for detecting view point position and stereoscopic picture display system
JP2005275935A (en) * 2004-03-25 2005-10-06 Omron Corp Terminal device
KR100711223B1 (en) * 2005-02-18 2007-04-25 한국방송공사 Face recognition method using Zernike/LDA and recording medium storing the method

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6466250B1 (en) * 1999-08-09 2002-10-15 Hughes Electronics Corporation System for electronically-mediated collaboration including eye-contact collaboratory
KR101890622B1 (en) * 2011-11-22 2018-08-22 엘지전자 주식회사 An apparatus for processing a three-dimensional image and calibration method of the same

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2000278716A (en) * 1999-03-25 2000-10-06 Mr System Kenkyusho:Kk Device and method for detecting view point position and stereoscopic picture display system
JP2005275935A (en) * 2004-03-25 2005-10-06 Omron Corp Terminal device
KR100711223B1 (en) * 2005-02-18 2007-04-25 한국방송공사 Face recognition method using Zernike/LDA and recording medium storing the method

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
FU, Y. ET AL.: 'Estimating Human Age by Manifold Analysis of Face Pictures and Regression on Aging Features' MULTIMEDIA AND EXPO, 2007 IEEE INTERNATIONAL CONFERENCE July 2007, pages 1383 - 1386 *
JAE-YOON, JUNG.: 'Robust Face Feature Extraction for Various Pose and Expression' THESIS FOR MASTER COURSE IN HONGIK UNIVERSITY GRADUATE SCHOOL February 2006, pages 18 - 55 *
KANG RYOUNG, PARK ET AL.: 'Facial Gaze Detection by Estimating Three Dimensional Positional Movements' JOURNAL OF THE INSTITUTE OF ELECTRONIC ENGINEERS OF KOREA vol. 39, no. 3, May 2002, pages 23 - 36 *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107278369A (en) * 2016-12-26 2017-10-20 深圳前海达闼云端智能科技有限公司 Method, device and the communication system of people finder

Also Published As

Publication number Publication date
WO2013009020A4 (en) 2013-08-15
WO2013009020A3 (en) 2013-03-07
KR101216123B1 (en) 2012-12-27
US20140307063A1 (en) 2014-10-16

Similar Documents

Publication Publication Date Title
WO2013009020A2 (en) Method and apparatus for generating viewer face-tracing information, recording medium for same, and three-dimensional display apparatus
WO2013022226A4 (en) Method and apparatus for generating personal information of client, recording medium thereof, and pos system
WO2019216593A1 (en) Method and apparatus for pose processing
WO2018143707A1 (en) Makeup evaluation system and operation method thereof
WO2021167394A1 (en) Video processing method, apparatus, electronic device, and readable storage medium
WO2015102361A1 (en) Apparatus and method for acquiring image for iris recognition using distance of facial feature
WO2020050499A1 (en) Method for acquiring object information and apparatus for performing same
WO2020213750A1 (en) Artificial intelligence device for recognizing object, and method therefor
WO2019103484A1 (en) Multi-modal emotion recognition device, method and storage medium using artificial intelligence
WO2017188706A1 (en) Mobile robot and mobile robot control method
EP3740936A1 (en) Method and apparatus for pose processing
WO2017164716A1 (en) Method and device for processing multimedia information
WO2018016837A1 (en) Method and apparatus for iris recognition
WO2017039348A1 (en) Image capturing apparatus and operating method thereof
WO2018048054A1 (en) Method for producing virtual reality interface on the basis of single-camera 3d image analysis, and device for producing virtual reality interface on the basis of single-camera 3d image analysis
WO2017090837A1 (en) Digital photographing apparatus and method of operating the same
WO2018062647A1 (en) Normalized-metadata generation apparatus, object occlusion detection apparatus, and methods thereof
WO2020141729A1 (en) Body measurement device, and control method therefor
WO2015133699A1 (en) Object recognition apparatus, and recording medium in which method and computer program therefor are recorded
WO2019085495A1 (en) Micro-expression recognition method, apparatus and system, and computer-readable storage medium
WO2021006366A1 (en) Artificial intelligence device for adjusting color of display panel, and method therefor
WO2017188800A1 (en) Mobile robot and control method therefor
WO2020117006A1 (en) Ai-based face recognition system
EP3440593A1 (en) Method and apparatus for iris recognition
WO2019135621A1 (en) Video playback device and control method thereof

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 12811349

Country of ref document: EP

Kind code of ref document: A2

WWE Wipo information: entry into national phase

Ref document number: 14003685

Country of ref document: US

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 21/05/2014)

122 Ep: pct application non-entry in european phase

Ref document number: 12811349

Country of ref document: EP

Kind code of ref document: A2