WO2013009020A2 - Method and apparatus for generating viewer face tracking information, recording medium therefor, and three-dimensional display apparatus - Google Patents
Method and apparatus for generating viewer face tracking information, recording medium therefor, and three-dimensional display apparatus
- Publication number
- WO2013009020A2 WO2013009020A2 PCT/KR2012/005202 KR2012005202W WO2013009020A2 WO 2013009020 A2 WO2013009020 A2 WO 2013009020A2 KR 2012005202 W KR2012005202 W KR 2012005202W WO 2013009020 A2 WO2013009020 A2 WO 2013009020A2
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- face
- viewer
- equation
- information
- face region
- Prior art date
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N13/00—Stereoscopic video systems; Multi-view video systems; Details thereof
- H04N13/30—Image reproducers
- H04N13/366—Image reproducers using viewer tracking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
- G06T7/246—Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
- G06T7/251—Analysis of motion using feature-based methods, e.g. the tracking of corners or segments involving models
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/60—Analysis of geometric attributes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
- G06V10/443—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
- G06V10/446—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering using Haar-like filters, e.g. using integral image techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/161—Detection; Localisation; Normalisation
- G06V40/165—Detection; Localisation; Normalisation using facial parts and geometric relationships
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N13/00—Stereoscopic video systems; Multi-view video systems; Details thereof
- H04N13/30—Image reproducers
- H04N13/366—Image reproducers using viewer tracking
- H04N13/383—Image reproducers using viewer tracking for tracking with gaze detection, i.e. detecting the lines of sight of the viewer's eyes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10016—Video; Image sequence
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30196—Human being; Person
- G06T2207/30201—Face
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/178—Human faces, e.g. facial parts, sketches or expressions estimating age from face image; using age information for improving recognition
Definitions
- The present invention relates to a method and apparatus for generating viewer face tracking information, a recording medium, and a three-dimensional display apparatus.
- More particularly, facial feature points are detected in the viewer's face from an image extracted from video input through an image input means, and the facial feature points and an optimal transformation matrix are used to generate information on the viewer's gaze direction and gaze distance for controlling the stereoscopic effect of the 3D display device.
- The invention covers such a method and apparatus for generating viewer face tracking information, a recording medium therefor, and a three-dimensional display device.
- Human eyes are about 6.5 cm apart in the transverse direction.
- the resulting binocular disparity acts as the most important factor for the three-dimensional feeling.
- the left eye and the right eye see different 2D images.
- 3D stereoscopic imaging technology creates a single scene from the two images obtained through this binocular disparity, presenting the difference between the two eyes so that a person feels the liveliness and reality of actually being at the place where the image was made.
- 3D stereoscopic image technology has become a core technology that is widely applied to the development of all existing industrial products such as 3D TV, information and communication, broadcasting, medical, film, games, animation and so on.
- A 3D TV is a device that delivers separate left-eye and right-eye images from the display to each eye using special glasses, so that depth is perceived by the human cognitive/information system according to the binocular parallax principle.
- That is, the 3D TV separates the left/right images, which create an artificial visual difference, on the display and delivers them to the two eyes, making the brain perceive a 3D stereoscopic effect.
- Specifically, a passive 3D TV is composed of an optical film, a liquid crystal layer, and a polarizing film (PR film), as shown in FIG. 1.
- The image intended for the left eye, denoted L, is delivered to the left eye, and the image intended for the right eye, denoted R, is delivered to the right eye, so that a 3D stereoscopic effect is perceived.
- To support this, control technologies are required, such as tracking the direction and position at which the viewer gazes, controlling the stereoscopic effect of the 3D TV accordingly, or rotating the 3D TV screen.
- the glasses-free 3D TV is a TV that can provide 3D images without using special glasses, and in order to apply the glasses-free method, a technology for tracking a viewer's gaze is further required.
- One example of a technique for tracking the direction in which the viewer stares is to track the viewer's eyes.
- The method of tracking the viewer's eyes first locates the feature points of the eye positions and then outputs the coordinates of the pupils using an eye tracking algorithm.
- For example, a method of detecting and tracking the boundary between the iris and the sclera (the white of the eye) in a face image is used.
- However, this method has the problem that it is difficult to accurately determine the angle at which the eye gazes, and the trackable range of eye angles is narrow.
- An object of the present invention, which addresses the problems of the prior art, is to detect facial feature points in the viewer's face from an image extracted from video input through an image input means, and to use the facial feature points and an optimal transformation matrix to generate information on the viewer's gaze direction and gaze distance for controlling the stereoscopic effect of a 3D display device. Disclosed are a method and apparatus for generating such viewer face tracking information, a recording medium, and a three-dimensional display device.
- An embodiment of the present invention for achieving the above object is a viewer face tracking information generation method for controlling the stereoscopic effect of a 3D display device in response to at least one of the viewer's gaze direction and gaze distance, comprising: (a) detecting the viewer's face region from an image extracted from video input through an image input means provided at a position on the 3D display apparatus; (b) detecting facial feature points in the detected face region; (c) estimating an optimal transformation matrix that transforms the model feature points of a 3D standard face model to generate a 3D viewer face model corresponding to the facial feature points; and (d) estimating at least one of the viewer's gaze direction and gaze distance based on the optimal transformation matrix to generate viewer face tracking information.
- Another embodiment is a viewer face tracking information generation method for controlling the stereoscopic effect of a 3D display device in response to at least one of the viewer's gaze direction and gaze distance, comprising:
- a face region detection step of detecting the viewer's face region from an image extracted from video input through an image input means provided at a position on the 3D display device;
- a gaze information generation step of generating gaze information by estimating at least one of the viewer's gaze direction and gaze distance based on the detected face region; and a viewer information generation step of generating viewer information by estimating at least one of the viewer's gender and age based on the detected face region.
- Another embodiment is a computer-readable recording medium recording a program for executing each step of the viewer face tracking information generation method.
- Another embodiment is a three-dimensional display device that controls the stereoscopic effect using the viewer face tracking information generation method.
- Another embodiment is a viewer face tracking information generation apparatus for controlling the stereoscopic effect of a 3D display device in response to at least one of the viewer's gaze direction and gaze distance, comprising:
- a face region detection module for detecting the viewer's face region from an image extracted from video input through an image input means provided at a position on the 3D display device;
- a facial feature point detection module for detecting facial feature points in the detected face region;
- a matrix estimation module for estimating an optimal transformation matrix that transforms the model feature points of a 3D standard face model to generate a 3D viewer face model corresponding to the facial feature points; and
- a tracking information generation module for estimating at least one of the viewer's gaze direction and gaze distance based on the estimated optimal transformation matrix to generate viewer face tracking information.
- Yet another embodiment is a viewer face tracking information generation apparatus comprising means for detecting the viewer's face region, means for generating gaze information, and means for generating viewer information based on the detected face region, for controlling the stereoscopic effect of a 3D display device.
- According to the present invention, the viewer's gaze direction and gaze distance are estimated using an optimal transformation matrix that transforms the model feature points of a 3D standard face model to generate a 3D viewer face model corresponding to the facial feature points of the face region.
- The tracking speed is therefore high, making the method suitable for real-time tracking, and the face region can be tracked robustly even under local distortions of the face region.
- Since asymmetric Haar-like features are used to detect non-frontal face regions, the detection reliability for non-frontal faces is high, which increases the tracking performance for the face region.
- Furthermore, in addition to estimating the viewer's gaze direction and gaze distance to generate gaze direction information and gaze distance information, at least one of the viewer's gender and age is estimated to generate viewer information, which offers the advantage that, for example, the screen output of the 3D display device may be turned off or playback stopped based on this information.
- 1 is a configuration diagram showing a schematic configuration of a passive 3D TV.
- FIG. 2 is a state diagram showing a state of watching a passive 3D TV from the front
- FIG. 3 is a state diagram illustrating a state in which a passive 3D TV is viewed from the side;
- FIG. 4 is a block diagram showing a schematic configuration of a viewer face tracking information generating device according to an embodiment of the present invention.
- FIG. 5 is a picture showing a three-dimensional standard face model in connection with the viewer face tracking information generation according to an embodiment of the present invention.
- FIG. 6A is a first picture showing an example screen of a UI module in connection with generating viewer face tracking information according to an embodiment of the present invention.
- FIG. 6B is a second picture showing an example screen of a UI module in connection with generating viewer face tracking information according to an embodiment of the present invention.
- FIG. 7 is a flowchart illustrating a process of a viewer face tracking information generation method according to an embodiment of the present invention.
- FIG. 8 is a view showing the basic shapes of conventional Haar-like features.
- FIG. 9 is an exemplary picture of Haar-like features for detecting a frontal face region in relation to the generation of viewer face tracking information according to an embodiment of the present invention.
- FIG. 10 is an exemplary picture of Haar-like features for detecting a non-frontal face region in connection with generating viewer face tracking information according to an embodiment of the present invention.
- FIG. 11 is a diagram illustrating the newly added rectangular features in connection with generating viewer face tracking information according to an embodiment of the present invention.
- FIG. 12 is an exemplary picture of Haar-like features selected from FIG. 11 for detecting a non-frontal face region in connection with generating viewer face tracking information according to an embodiment of the present invention.
- FIG. 13 shows feature probability curves over a training set for a conventional Haar-like feature and a Haar-like feature applied to the present invention.
- FIG. 14 is a table comparing the variance and kurtosis of the probability curves of the newly added and conventional Haar-like features over a training set of non-frontal faces.
- FIG. 15 is a picture of the profiles used by the conventional ASM method on a low-resolution or poor-quality image.
- FIG. 16 is a picture of the patterns around each landmark used by AdaBoost for the landmark search of the present invention.
- FIG. 17 is a photograph showing 28 feature points of a face in connection with generating viewer face tracking information according to an embodiment of the present invention.
- FIG. 18 is a flowchart illustrating a matrix estimation process of a method for generating viewer face tracking information according to an embodiment of the present invention.
- 19 is a flowchart illustrating a gender estimation process of a method for generating viewer face tracking information according to an embodiment of the present invention.
- 20 is an exemplary photograph for defining a gender estimation face area in the gender estimation process of the viewer face tracking information generation method according to an embodiment of the present invention.
- 21 is a flowchart illustrating an age estimation process of a method for generating viewer face tracking information according to an embodiment of the present invention.
- FIG. 22 is an exemplary photograph for defining an age estimation face region in an age estimation process of a method for generating viewer face tracking information according to an embodiment of the present invention.
- FIG. 23 is a flowchart illustrating the eye-closure estimation process of a method for generating viewer face tracking information according to an embodiment of the present invention.
- FIG. 24 is an exemplary picture for defining the face region for eye-closure estimation in the eye-closure estimation process of a method for generating viewer face tracking information according to an embodiment of the present invention.
- 25 is a plan view for explaining a coordinate system (camera coordinate system) of the image input means in connection with generating the viewer face tracking information according to an embodiment of the present invention.
- the first component may be referred to as the second component, and similarly, the second component may also be referred to as the first component.
- FIG. 4 is a block diagram showing a schematic configuration of a viewer face tracking information generating device according to an embodiment of the present invention.
- The present embodiment provides a viewer face tracking information generating apparatus for controlling the stereoscopic effect of a 3D display device in response to at least one of the viewer's gaze direction and gaze distance.
- the viewer face tracking information generating device includes a computing element such as a central processing unit, a system DB, a system memory, and an interface.
- the viewer face tracking information generating device may be a conventional computer system connected to a 3D display device such as a 3D TV to transmit and receive a control signal.
- In other words, a conventional computer system in which the viewer face tracking information generating program is installed and run can be regarded as functioning as the viewer face tracking information generating apparatus.
- the viewer face tracking information generation device of the present embodiment may be configured in the form of an embedded device in a three-dimensional display device such as a 3D TV.
- the viewer face tracking information generating device includes a face region detection module 100.
- The face region detection module 100 detects the viewer's face region from an image captured by the image capture unit 20 from video input through the image input means 10 (for example, a camera) provided at a position on the 3D display apparatus.
- The detection viewing angle may cover all faces in the range of -90° to +90°.
- the image input means 10 may be installed at the top or bottom side of the center portion of the 3D TV 1.
- the image input means 10 may be a camera capable of capturing a face of a viewer located in front of a TV screen in real time as a video, and more preferably, a digital camera having an image sensor.
- In addition, the face region detection module 100 generates a YCbCr color model from the RGB color information of the extracted image, separates the color information and brightness information in the created color model, and detects a face candidate region based on the brightness information.
- The face region detection module 100 also defines a quadrilateral feature point model for the detected face candidate region and detects the face region based on training data learned by the AdaBoost learning algorithm.
- Furthermore, the face region detection module 100 determines the detected face region to be a valid face region when the magnitude of the AdaBoost result value exceeds a predetermined threshold.
- the viewer face tracking information generation device also includes a face feature point detection module 200.
- the facial feature point detection module 200 performs facial feature point detection on face areas determined to be valid in the face area detection module 100.
- The facial feature point detection module 200 can detect, for example, 28 facial feature points, from which the positions of the eyebrows, eyes, nose, and mouth and the rotation angle of the face can be defined.
- In the present embodiment, preferably a total of eight basic facial feature points (four eye points, two nose points, and two mouth points) are detected as the facial feature points.
- the viewer face tracking information generation device also includes a matrix estimation module 300.
- the matrix estimation module 300 estimates an optimal transformation matrix for generating a 3D viewer face model corresponding to the face feature by converting a model feature point of the 3D standard face model.
- the 3D standard face model may be a 3D mesh model composed of 331 points and 630 triangles, as shown in FIG. 5.
- the viewer face tracking information generation device also includes a tracking information generation module 400.
- the tracking information generation module 400 estimates at least one of the gaze direction and gaze distance of the viewer based on the optimal transformation matrix to generate viewer face tracking information.
- the viewer face tracking information generation device also includes a gender estimation module 500.
- the gender estimating module 500 estimates the gender of the viewer using the detected face region.
- Specifically, the gender estimation module 500 cuts out a gender estimation face region from the detected face region, normalizes the cut-out face region image, and estimates the gender with an SVM (Support Vector Machine) using the normalized image.
- the viewer face tracking information generation device also includes an age estimation module 600.
- the age estimation module 600 estimates the age of the viewer using the detected face region.
- the age estimation module 600 cuts out an age estimation face area from the detected face area.
- the age estimation module 600 performs a function of normalizing the cropped face region image.
- In addition, the age estimation module 600 constructs an input vector from the normalized image and projects it onto an age-manifold space.
- Finally, the age estimation module 600 estimates the age using second-order (quadratic) polynomial regression.
- The viewer face tracking information generation device also includes an eye-closure estimation module 700.
- The eye-closure estimation module 700 estimates whether the viewer's eyes are closed using the detected face region.
- Specifically, the eye-closure estimation module 700 cuts out a face region for eye-closure estimation, normalizes the cut-out face region image, and estimates the eye closure with a support vector machine (SVM) using the normalized image.
- The viewer face tracking information generating apparatus is also provided with a UI (User Interface) module that can display the settings of the image input means 10 provided at one side of the 3D display apparatus (FIG. 6A) and the detected face region, age/gender results, and the like (FIG. 6B).
- FIG. 7 is a flowchart illustrating a process of generating a viewer face tracking information according to an embodiment of the present invention.
- The viewer face tracking information generation method proceeds, from the start of the generation process, through the face region detection step (S100), the facial feature point detection step (S200), the matrix estimation step (S300), the tracking information generation step (S400), the gender estimation step (S500), the age estimation step (S600), the eye-closure estimation step (S700), and the result output step (S800) to the end.
- In the face region detection step (S100), the viewer's face region is detected from an image extracted from video input through an image input means provided at a position on the 3D display apparatus.
- As methods for face detection, there are, for example, knowledge-based methods, feature-based methods, template-matching methods, and appearance-based methods.
- In the present embodiment, an appearance-based method is used.
- The appearance-based method acquires face regions and non-face regions from different images, learns a model from the acquired regions, and detects a face by comparing the input image with the learned model data.
- The appearance-based method is known to perform relatively well for both frontal and side face detection.
- Image extraction from the video input through the image input means may be performed by capturing an image from the video, for example using the sample grabber of DirectX (DirectShow).
- In this case, the media type of the sample grabber may be set to RGB24.
- A video converter filter is then automatically attached in front of the sample grabber filter so that the image captured by the sample grabber is finally RGB24.
- mt.formattype = FORMAT_VideoInfo;
- mt.majortype = MEDIATYPE_Video;
- mt.subtype = MEDIASUBTYPE_RGB24; // only accept 24-bit bitmaps
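- For reference, a minimal sketch of how such a media type might be configured on the sample grabber is shown below; the filter-graph setup around it is omitted, and `pSampleGrabber` is an assumed pointer obtained elsewhere.

```cpp
// Sketch only (assumes the DirectShow SDK headers and that pSampleGrabber is a
// valid ISampleGrabber* queried from the sample grabber filter in an
// already-built filter graph).
AM_MEDIA_TYPE mt;
ZeroMemory(&mt, sizeof(AM_MEDIA_TYPE));
mt.majortype  = MEDIATYPE_Video;
mt.subtype    = MEDIASUBTYPE_RGB24;   // only accept 24-bit bitmaps
mt.formattype = FORMAT_VideoInfo;
HRESULT hr = pSampleGrabber->SetMediaType(&mt);
```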
- In the present embodiment, the face region detection step (S100) comprises: (a1) generating a YCbCr color model from the RGB color information of the extracted image, separating the color information and brightness information in the generated color model, and detecting a face candidate region based on the brightness information; (a2) defining a quadrilateral feature point model for the detected face candidate region, and detecting the face region based on training data obtained by training the quadrilateral feature point model with the AdaBoost learning algorithm; and (a3) determining the detected face region to be a valid face region when the magnitude of the AdaBoost result value (CF_H(x) of Equation 1) exceeds a predetermined threshold value.
- (In Equation 1, not reproduced in this text, the omitted symbol denotes a value used to finely adjust the error-judgment rate of the strong classifier.)
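- For illustration, a minimal sketch of step (a1) follows. The text does not name a library or concrete thresholds, so OpenCV and the chroma (skin-tone) ranges below are assumptions; only the separation of brightness and color in the YCbCr model comes from the description above.

```cpp
#include <opencv2/opencv.hpp>

// Illustrative sketch of step (a1): convert the RGB frame to the YCbCr model
// (separating luma Y from chroma Cr/Cb) and mask out face-candidate pixels.
// The inRange bounds are typical skin-tone values, assumed for illustration.
cv::Mat detectFaceCandidates(const cv::Mat& bgrFrame)
{
    cv::Mat ycrcb;
    cv::cvtColor(bgrFrame, ycrcb, cv::COLOR_BGR2YCrCb);

    cv::Mat mask;
    cv::inRange(ycrcb, cv::Scalar(0, 133, 77), cv::Scalar(255, 173, 127), mask);

    // Clean up the mask so connected regions form face candidate areas.
    cv::morphologyEx(mask, mask, cv::MORPH_OPEN,
                     cv::getStructuringElement(cv::MORPH_ELLIPSE, cv::Size(5, 5)));
    return mask; // non-zero pixels mark face candidate regions
}
```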
- the AdaBoost learning algorithm is known as an algorithm that generates a strong classifier with high detection performance through linear combination of weak classifiers.
- In frontal face images, the structural features unique to the face, such as the eyes, nose, and mouth, are distributed evenly and symmetrically over the image.
- In non-frontal face images, they are not symmetrical and are concentrated in a narrow range, and since the face outline is not a straight line, background area is mixed in.
- Accordingly, the present embodiment adds new Haar-like features that are similar to the existing Haar-like features but incorporate asymmetry.
- FIG. 8 shows the basic forms of conventional Haar-like features, and FIG. 9 shows examples of Haar-like features selected for frontal face region detection according to an embodiment of the present invention.
- FIG. 11 shows the rectangular Haar-like features newly added by the present embodiment, and FIG. 12 shows examples of the Haar-like features of FIG. 11 selected for non-frontal face detection.
- As shown in FIG. 11, the Haar-like features of the present embodiment have asymmetric form, structure, and shape, and therefore give an excellent detection effect for non-frontal faces.
- FIG. 13 shows the probability curves of a Haar-like feature over a training set, for the feature applied to this embodiment (a) and a conventional Haar-like feature (b).
- The probability curve for the present embodiment is concentrated in a narrower range, which shows that the Haar-like features added in this embodiment are effective for face detection from the viewpoint of the base classification rule.
- FIG. 14 is a table showing the average variance and kurtosis of the probability curves of the newly added features and the existing Haar-like features over a training set of non-frontal faces.
- The Haar-like features added in this embodiment show smaller variance and larger kurtosis, which means they are effective for detection.
- In the present embodiment, the Haar-like features for detecting the face region therefore further include asymmetric Haar-like features for detecting non-frontal face regions.
- The validity of a detected face is determined by comparing the magnitude of the AdaBoost result value (CF_H(x) of Equation 1) with a predetermined threshold value.
- In Equation 1, the magnitude of CF_H(x) can be used as an important factor for determining the validity of the face: this value measures how close the detected region is to a face, so validity can be decided by setting a predetermined threshold.
- The predetermined threshold is set empirically using a training face set.
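- Since Equation 1 is not reproduced in this text, the sketch below only illustrates the standard form of an AdaBoost strong classifier, a weighted linear combination of weak classifiers whose score CF_H(x) is compared against the empirically chosen threshold; the data structures are illustrative.

```cpp
#include <functional>
#include <vector>

// A weak classifier votes +1 (face) or -1 (non-face) on the Haar-like
// feature vector; alpha is its weight from AdaBoost training.
struct WeakClassifier {
    std::function<int(const std::vector<float>&)> h;
    double alpha;
};

// CF_H(x): linear combination of the weak classifiers' votes.
double strongClassifierScore(const std::vector<WeakClassifier>& weak,
                             const std::vector<float>& haarFeatures)
{
    double score = 0.0;
    for (const auto& wc : weak)
        score += wc.alpha * wc.h(haarFeatures);
    return score;
}

// The detected region is accepted as a valid face only when the score
// exceeds a threshold set empirically on a training face set.
bool isValidFace(double cfh, double threshold)
{
    return cfh > threshold;
}
```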
- In the facial feature point detection step (S200), facial feature points are detected in the detected face region.
- The step is performed by the landmark search of the ASM method, but proceeds using the AdaBoost algorithm.
- Specifically, the detection of the facial feature points comprises: (b1) defining the position of the current feature point as (x_l, y_l) and classifying, with a classifier, all possible partial windows of n*n pixel size in the vicinity of the current feature point position; (b2) calculating candidate positions of the feature point according to Equation 2 below; and (b3) setting (x'_l, y'_l) as the new feature point if the condition of Equation 3 is satisfied, and otherwise keeping the current feature point position (x_l, y_l).
- (In Equations 2 and 3, a and b are the maximum neighborhood distances searched in the x and y directions, x_dx,dy is the partial window centered on the point (dx, dy) away from (x_l, y_l), N_all is the total number of classifier stages, N_pass is the number of stages through which the partial window has passed, and c is a constant limiting the confidence value of partial windows that do not pass all stages.)
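- Equation 2 is likewise not reproduced here. The sketch below is one plausible reading of the search of steps (b1)-(b2), built only from the quantities named in the legend (a, b, N_all, N_pass, c): each n*n partial window near the current landmark is scored by how far it gets through the cascade, and the candidate position is taken as the confidence-weighted centroid. The helper stagesPassed is assumed.

```cpp
#include <algorithm>
#include <utility>

// Assumed helper: runs the trained cascade on the n*n window centred at
// (x, y) and returns how many of the N_all stages it passed.
extern int stagesPassed(int x, int y, int n);

std::pair<double, double> searchLandmark(int xl, int yl, int n,
                                         int a, int b, int nAll, double c)
{
    double sumW = 0.0, sumX = 0.0, sumY = 0.0;
    for (int dy = -b; dy <= b; ++dy) {
        for (int dx = -a; dx <= a; ++dx) {
            int nPass = stagesPassed(xl + dx, yl + dy, n);
            // Windows that fail before the last stage get a confidence
            // capped by the constant c, as in the legend above.
            double w = (nPass == nAll) ? 1.0
                                       : std::min(c, double(nPass) / nAll);
            sumW += w;
            sumX += w * (xl + dx);
            sumY += w * (yl + dy);
        }
    }
    // Candidate (x'_l, y'_l): confidence-weighted centroid of the windows.
    return { sumX / sumW, sumY / sumW };
}
```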
- As methods for detecting facial feature points there are, for example, methods that detect each feature point individually and methods that detect the feature points simultaneously using their correlations.
- In the present embodiment, the Active Shape Model (ASM) method, which is preferable for facial feature detection in terms of speed and accuracy, is used.
- However, since the feature point search of the conventional ASM uses the image profile at each feature point, detection is stable only for high-quality images.
- An image extracted from video input through an image input means such as a camera may be of low resolution and low quality.
- The present embodiment therefore improves the search by finding the feature points with the AdaBoost method, so that the feature points can be detected easily even in low-resolution, low-quality images.
- FIG. 15 shows the profiles used by the conventional ASM method on an image of low resolution or poor quality, and FIG. 16 shows the patterns around each landmark used by AdaBoost for the landmark search of the present invention.
- Through the facial feature point detection, a plurality of feature points (for example, 28) may be detected.
- In the present embodiment, considering computational cost and tracking performance, eight basic facial feature points (four eye points (4, 5, 6, 7), two nose points (10, 11), and two mouth points (8, 9)) are used to estimate the gaze distance and gaze direction.
- As shown in FIG. 18, in the matrix estimation step (S300), facial feature point input (S310; for example, the coordinate values of the eight detected feature points are loaded as input values from the memory of the device running the program of the present embodiment), 3D standard face model loading (S320; for example, the overall coordinate information of the 3D face model stored in the DB is loaded as an input value by the computing means running the program), and optimal transformation matrix estimation (S330) are performed.
- Then the tracking information generation step (S400), which calculates the gaze direction and gaze distance from the estimated optimal transformation matrix, is performed.
- As described above, the 3D standard face model may be a 3D mesh model composed of 331 points and 630 triangles.
- The tracking information generation step (S400) generates the viewer face tracking information by estimating at least one of the viewer's gaze direction and gaze distance based on the optimal transformation matrix.
- The optimal transformation matrix estimation comprises: (c1) calculating the transformation of Equation 4 below using a 3*3 matrix M for the face rotation information of the 3D standard face model and a 3D vector T for the face translation information, where the components of M and T are the variables defining the optimal transformation matrix;
- (c2) calculating the three-dimensional vector P' of Equation 5 using the camera feature point position vector P_C obtained by Equation 4 and the camera transformation matrix M_C obtained by Equation 6 below;
- (c3) defining the two-dimensional vector P_I as (P'_x/P'_z, P'_y/P'_z) based on the three-dimensional vector P'; and
- (c4) estimating each variable of the optimal transformation matrix using the two-dimensional vector P_I and the coordinate values of the facial feature points detected in step (b).
- In other words, the optimal transformation matrix mathematically consists of the 3*3 matrix M, which reflects the rotation information of the face, and the 3D vector T, which reflects the translation information of the face.
- The feature point position (three-dimensional vector) P_M in the coordinate system of the 3D standard face model is converted by the optimal transformation matrix (M, T) into the position (three-dimensional vector) P_C in the camera coordinate system, i.e., P_C = M*P_M + T (Equation 4).
- the 3D standard face model coordinate system is a 3D coordinate system whose coordinate center is located at the center of the 3D standard face model
- the camera coordinate system is a 3D coordinate system whose center is located at the center of the image input means (10 in FIG. 25).
- Next, P', a three-dimensional vector defined as (P'_x, P'_y, P'_z), is obtained from the camera feature point position vector P_C and the camera transformation matrix M_C according to Equation 5, P' = M_C*P_C.
- The camera transformation matrix M_C is a 3*3 matrix determined by the focal length of the camera and the like, and is defined as in Equation 6 below,
- with focal_len = -0.5*W/tan(Degree2Radian(fov*0.5)), where W is the width of the input image, H its height, and fov the viewing angle of the camera.
- To estimate the 12 variables, a target function is set that outputs the sum of squared deviations between the positions of the detected feature points and the positions of the face model feature points projected under the optimal transformation matrix, as a function of the 12 variables of the optimal transformation matrix (the nine components of M and the three components of T).
- The 12 optimal values are then calculated by solving the optimization problem that minimizes this target function.
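- The reprojection pipeline of Equations 4-6 can be written directly as the target function of this 12-variable optimization. The sketch below follows the equations given above; only the exact layout of M_C is an assumption (a pinhole-style matrix built from focal_len), since Equation 6 itself is not reproduced in this text.

```cpp
#include <array>
#include <cstddef>
#include <vector>

struct Vec3 { double x, y, z; };
struct Vec2 { double x, y; };

static Vec3 apply(const std::array<std::array<double, 3>, 3>& A, const Vec3& p)
{
    return { A[0][0]*p.x + A[0][1]*p.y + A[0][2]*p.z,
             A[1][0]*p.x + A[1][1]*p.y + A[1][2]*p.z,
             A[2][0]*p.x + A[2][1]*p.y + A[2][2]*p.z };
}

// Sum of squared deviations between the detected feature points and the
// projected model feature points under the candidate transform (M, T),
// minimized over the 12 variables (nine of M, three of T).
double targetFunction(const std::array<std::array<double, 3>, 3>& M, const Vec3& T,
                      const std::array<std::array<double, 3>, 3>& Mc,
                      const std::vector<Vec3>& modelPts,    // P_M, model feature points
                      const std::vector<Vec2>& detectedPts) // detected image points
{
    double err = 0.0;
    for (std::size_t i = 0; i < modelPts.size(); ++i) {
        Vec3 pc = apply(M, modelPts[i]);              // P_C = M * P_M + T
        pc.x += T.x; pc.y += T.y; pc.z += T.z;
        Vec3 pp = apply(Mc, pc);                      // P' = M_C * P_C
        Vec2 pi { pp.x / pp.z, pp.y / pp.z };         // P_I = (P'x/P'z, P'y/P'z)
        double dx = pi.x - detectedPts[i].x;
        double dy = pi.y - detectedPts[i].y;
        err += dx * dx + dy * dy;
    }
    return err;
}
```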
- The gaze direction information is defined by Equation 7 using the components of the rotation-related matrix M of the optimal transformation matrix, giving (a_x, a_y, a_z), and the gaze distance information is defined by the translation-related vector T of the optimal transformation matrix itself.
- As shown in FIG. 19, the gender estimation step (S500) proceeds through image and facial feature point input (S510), gender estimation face region cropping (S520), cropped face region image normalization (S530), and gender estimation by SVM (S540).
- As methods for gender estimation there are, for example, view-based methods that use the whole human face and geometric feature-based methods that use only the geometric features of the face.
- In the present embodiment, gender estimation is performed by a view-based gender classification method that uses SVM (Support Vector Machine) learning: the detected face region is normalized to form a facial feature vector, and the gender is predicted from it.
- the SVM method may be classified into a support vector classifier (SVC) and a support vector regression (SVR).
- Specifically, the gender estimation step (S500) comprises: (e1) cutting out a face region for gender estimation from the detected face region based on the detected facial feature points; (e2) normalizing the size of the cut-out gender estimation face region; (e3) normalizing the histogram of the size-normalized gender estimation face region; and (e4) constructing an input vector from the size- and histogram-normalized gender estimation face region and estimating the gender using a pre-trained SVM algorithm.
- In step (e1), the face region is cut out using the input image and the facial feature points; for example, as shown in FIG. 20, half the distance between the left and right eyes is taken as the unit length 1 and the face region to be cropped is calculated proportionally from it.
- In step (e2), the cut-out face region is normalized to a 12*21 size.
- In step (e3), the histogram is normalized; this equalizes the distribution of pixels over the density values in order to minimize the influence of lighting.
- In step (e4), a 252-dimensional input vector is constructed from the normalized 12*21 face image, and the gender is estimated using a pre-trained SVM.
- The gender is estimated as male if the computed result of the classifier of Equation 8 is greater than zero, and as female otherwise.
- In Equation 8, y_i is the gender value of the i-th training sample, set to 1 for male and -1 for female.
- the kernel function may use a Gaussian Radial Basis Function (GRBF) defined in Equation 9 below.
- The kernel function may also be a polynomial kernel or the like instead of the Gaussian radial basis function, but the Gaussian radial basis function is preferably used in consideration of identification performance.
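- A sketch of the Equation 8/9 decision rule follows: a kernel SVM with the GRBF kernel, evaluated on the 252-dimensional normalized face vector. The support vectors, coefficients, bias, and gamma are obtained by the offline training described above; the concrete structures here are illustrative.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// One trained support vector: the sample x_i and its combined coefficient
// alpha_i * y_i (with y_i = +1 for male, -1 for female, as above).
struct SupportVector {
    std::vector<double> x;
    double alphaY;
};

// Gaussian radial basis function kernel (Equation 9 form).
double grbf(const std::vector<double>& a, const std::vector<double>& b, double gamma)
{
    double d2 = 0.0;
    for (std::size_t k = 0; k < a.size(); ++k) {
        double d = a[k] - b[k];
        d2 += d * d;
    }
    return std::exp(-gamma * d2);
}

// Returns +1 (male) when the classifier output exceeds zero, else -1 (female).
int classifyGender(const std::vector<SupportVector>& svs, double bias,
                   const std::vector<double>& input, double gamma)
{
    double f = bias;
    for (const auto& sv : svs)
        f += sv.alphaY * grbf(sv.x, input, gamma); // sum_i alpha_i y_i K(x_i, x) + b
    return f > 0.0 ? +1 : -1;
}
```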
- For reference, the SVM is a classification method that derives the boundary between two classes in a data set containing two groups, and is known as a learning algorithm for pattern classification and regression.
- The basic learning principle of the SVM is to find an optimal linear hyperplane with minimal expected classification error on unseen test samples, that is, with good generalization performance.
- The linear SVM looks for the simplest (lowest-order) linear discriminant function.
- To determine the learning result uniquely, the canonical constraint of Equation 2 of this derivation, y_i*(w·x_i + b) >= 1 for all training samples, is imposed.
- The minimum distance between a training sample and the hyperplane is then 1/||w|| (Equation 3), so the margin necessarily takes the form of Equation 4.
- Since w and b must be determined so as to maximize this minimum distance while correctly classifying all training samples, the problem is formulated as in Equation 5: minimize (1/2)*||w||^2 subject to the constraint above.
- Minimizing this objective function maximizes the value of Equation 4, that is, the minimum distance.
- For data that are not linearly separable, the constraint is replaced by the kernelized form of Equation 7, where K(x, x') is a nonlinear kernel function.
- The AdaBoost method may also be used in the above process, but considering the classification and generalization performance of the classifier, it is more preferable to use the SVM method:
- in tests, the AdaBoost performance was 10-15% lower than that obtained with the SVM method.
- As shown in FIG. 21, the age estimation step (S600) proceeds through image and facial feature point input, age estimation face region cropping, normalization, projection onto the age-manifold space, and age estimation by quadratic polynomial regression (S650).
- Specifically, the age estimation comprises: (f1) cutting out an age estimation face region from the detected face region based on the detected facial feature points; (f2) normalizing the size of the cut-out age estimation face region; (f3) performing local illumination correction on the size-normalized age estimation face region; (f4) constructing an input vector from the size-normalized and illumination-corrected age estimation face region and projecting it onto the age-manifold space to generate a feature vector; and (f5) estimating the age by applying quadratic regression to the generated feature vector.
- In step (f1), the face region is cut out using the input image and the facial feature points.
- As shown in FIG. 22, the face region is cropped by extending from the eye and mouth points upward (0.8), downward (0.2), left (0.1), and right (0.1), respectively.
- In step (f2), the cut-out face region is normalized to a 64*64 size.
- In step (f3), local illumination correction is performed according to Equation 10 below in order to reduce the influence of lighting:
- I'(x, y) = (I(x, y) - M) / V * 10 + 127
- where M is the local mean and V is the standard deviation, a characteristic value representing the degree to which a quantity is scattered around its average value, calculated mathematically as in Equation 9.
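- A sketch of the Equation 10 correction follows. The text states only that the correction is local; OpenCV and the 16*16 block size below are assumptions for illustration.

```cpp
#include <algorithm>
#include <opencv2/opencv.hpp>

// Apply I'(x,y) = (I(x,y) - M) / V * 10 + 127 per local block, where M and V
// are the block's mean and standard deviation. Block size is an assumption.
void localIlluminationCorrect(cv::Mat& gray, int block = 16)
{
    for (int by = 0; by < gray.rows; by += block) {
        for (int bx = 0; bx < gray.cols; bx += block) {
            cv::Rect r(bx, by,
                       std::min(block, gray.cols - bx),
                       std::min(block, gray.rows - by));
            cv::Mat roi = gray(r);
            cv::Scalar mean, stddev;
            cv::meanStdDev(roi, mean, stddev);       // M and V over the block
            double V = std::max(stddev[0], 1e-6);    // guard against flat blocks
            // (I - M)/V * 10 + 127  ==  I*(10/V) + (127 - M*10/V)
            roi.convertTo(roi, CV_8U, 10.0 / V, 127.0 - mean[0] * 10.0 / V);
        }
    }
}
```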
- In step (f4), a 4096-dimensional input vector is constructed from the 64*64 face image, and a 50-dimensional feature vector is generated by projecting it onto the pre-learned manifold space.
- The underlying assumption of this age estimation approach is that the characteristics of the human aging process reflected in face images can be expressed as patterns over some low-dimensional distribution (manifold).
- Here, X is the input vector, Y is the feature vector, and P is a projection matrix onto the age manifold, trained using CEA (conformal embedding analysis).
- In the training stage, X is an m x n matrix whose columns x_i are the individual face image vectors.
- The manifold learning step obtains a projection matrix for representing the m-dimensional face vector as a d-dimensional vector (the aging feature vector), where d is much smaller than m.
- Since the image dimension m is much larger than the number of images n, the m x m matrix XX^T is degenerate (rank-deficient); the PCA covariance matrix C_pca is such an m x m matrix.
- The d eigenvectors with the largest eigenvalues are selected, in decreasing order of eigenvalue, to form the matrix W_PCA, which is an m x d matrix.
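- Because XX^T is degenerate when m is much larger than n, the PCA basis is in practice computed from the small n x n Gram matrix X^T X (the standard "snapshot" trick), whose eigenvectors v yield those of XX^T via u = Xv/||Xv||. A sketch using the Eigen library (an illustrative choice, not named in the text) follows.

```cpp
#include <Eigen/Dense>

// Snapshot PCA: eigendecompose the n x n Gram matrix instead of the
// degenerate m x m covariance, then map eigenvectors back to image space.
Eigen::MatrixXd pcaBasis(const Eigen::MatrixXd& X /* m x n, mean-centred */, int d)
{
    Eigen::MatrixXd gram = X.transpose() * X;            // n x n instead of m x m
    Eigen::SelfAdjointEigenSolver<Eigen::MatrixXd> es(gram);

    const int n = static_cast<int>(X.cols());
    Eigen::MatrixXd W(X.rows(), d);                      // W_PCA is m x d
    for (int k = 0; k < d; ++k) {
        // Eigen returns eigenvalues in increasing order; take the d largest.
        Eigen::VectorXd v = es.eigenvectors().col(n - 1 - k);
        W.col(k) = (X * v).normalized();                 // u = Xv / ||Xv||
    }
    return W;
}
```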
- In the CEA step, Ws denotes the relationship between face images belonging to the same age group and Wd the relationship between face images belonging to different groups, where the distance Dist(X_i, X_j) is defined as in Equation 12 below.
- The eigenvectors corresponding to the d largest eigenvalues of the resulting matrix become the CEA basis vectors.
- When the orthogonal vectors a_1, ..., a_d have been calculated, the matrix W_CEA, an m x d matrix, is defined from them as follows.
- The projection matrix P_mat is then defined as in Equation 15 below, and is used to obtain the aging features for each face vector X.
- In step (f5), the age is estimated by applying quadratic regression according to Equation 11 below, where b_0, b_1, and b_2 are precomputed from the training data.
- The quadratic regression model itself is shown in Equation 17 below, whose terms are the age of the i-th training image and the feature vector of the i-th training image, respectively, and N is the number of training samples.
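- Since Equations 11 and 17 are not reproduced here, the sketch below uses one common form of quadratic regression over the aging feature vector (cf. the cited Fu et al. manifold work): age = b0 + b1^T y + b2^T (y.*y), with the coefficients fitted offline by least squares on the N training samples.

```cpp
#include <cstddef>
#include <vector>

// Hedged sketch of the Equation 11 age estimate: quadratic polynomial in the
// 50-dimensional aging feature vector y, with precomputed coefficients.
double estimateAge(const std::vector<double>& y,   // aging feature vector
                   double b0,                      // intercept
                   const std::vector<double>& b1,  // linear coefficients
                   const std::vector<double>& b2)  // quadratic coefficients
{
    double age = b0;
    for (std::size_t k = 0; k < y.size(); ++k)
        age += b1[k] * y[k] + b2[k] * y[k] * y[k];
    return age;
}
```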
- As shown in FIG. 23, the eye-closure estimation step (S700) proceeds through image and facial feature point input (S710), cropping of the eye-closure estimation face region (S720), cropped face region image normalization (S730), and eye-closure estimation by SVM (S740).
- Specifically, the eye-closure estimation comprises: (g1) cutting out an eye-closure estimation face region from the detected face region based on the detected facial feature points; (g2) normalizing the size of the cut-out eye-closure estimation face region; (g3) normalizing the histogram of the size-normalized eye-closure estimation face region; and (g4) constructing an input vector from the size- and histogram-normalized eye-closure estimation face region and estimating the eye closure using a pre-trained SVM algorithm.
- In step (g1), the eye region is cut out using the input image and the facial feature points.
- As shown in FIG. 24, the eye region may be determined by taking the width between the two detected end points of each eye and extending the same height above and below them.
- In step (g2), the cropped eye region image is normalized to a 20*20 size.
- In step (g3), histogram normalization is performed to reduce the influence of lighting.
- In step (g4), a 400-dimensional input vector is constructed from the normalized 20*20 image, and the eye closure is estimated using a pre-trained SVM.
- In the eye-closure estimation, the eyes are determined to be open when the result value of Equation 12 is greater than 0, and closed when it is less than 0.
- Here, y_i denotes the eye-closure label of the i-th training sample, set to 1 when the eyes are open and -1 when they are closed.
- The kernel function may be the Gaussian radial basis function defined in Equation 13.
- The viewer's gender information and age information estimated by the processes described above are output to the stereoscopic control means as information for controlling the stereoscopic effect of the 3D display device.
- In general, 3D content is produced on the premise that an adult male is sitting 2.5 m in front of the 3D display device, and the brain computes the depth information accordingly.
- However, the actual distance between the eyes differs by gender and age, and this difference may amount to about 1 cm to 1.5 cm.
- The viewer's gender information and age information are therefore needed to determine this difference and control the stereoscopic effect of the 3D display device.
- The viewer's gender information and age information output to the stereoscopic control means may be used as a horizontal parallax change reference value, that is, an amount of change determined with respect to the point at which the left and right images are brought into focus.
- In this way, a 3D screen optimized for the current viewer's viewing conditions may be output and provided.
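- Purely as an illustration of how such a reference value might be derived, the sketch below scales the parallax from an interpupillary distance chosen by gender and age; every number in it is an assumption, anchored only to the roughly 6.5 cm adult spacing and the 1-1.5 cm variation noted above.

```cpp
// Hedged sketch: derive a parallax scale factor from estimated gender/age.
// All constants are illustrative assumptions, not values from the patent.
double parallaxScale(bool isMale, int age)
{
    double ipdCm = isMale ? 6.5 : 6.2;   // assumed adult averages
    if (age < 13) ipdCm -= 1.0;          // children: noticeably smaller spacing
    return ipdCm / 6.5;                  // relative to the adult-male authoring premise
}
```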
- At this time, the output direction of the 3D display apparatus may be changed using a rotation driving means (not shown) so that the front of the 3D display apparatus faces the viewer.
- Alternatively, the viewer may be guided to move to the front of the 3D display by outputting captions on the screen such as "You are outside the viewing angle" or "Please move to the front of the screen."
- The eye-closure information estimated by the process described above is output to the screen power control means as information for controlling the ON/OFF state of the screen output of the 3D display device.
- For example, when the viewer keeps the eyes closed, the screen power control means may turn off the image output to the display screen so that no further image output is performed.
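- A sketch of this control logic follows; the 30-second grace period and the interface names are illustrative assumptions.

```cpp
#include <chrono>
#include <optional>

// Turn the image output off once the eye-closure estimate has reported
// "closed" continuously for longer than a grace period.
class ScreenPowerController {
public:
    void onEyeClosureEstimate(bool eyesClosed) {
        using clock = std::chrono::steady_clock;
        if (!eyesClosed) { closedSince_.reset(); return; }
        if (!closedSince_) closedSince_ = clock::now();
        else if (clock::now() - *closedSince_ > std::chrono::seconds(30))
            displayOutputEnabled_ = false;  // stop further image output
    }
    bool displayOutputEnabled() const { return displayOutputEnabled_; }
private:
    std::optional<std::chrono::steady_clock::time_point> closedSince_;
    bool displayOutputEnabled_ = true;
};
```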
- Reference numeral 1000 in FIG. 25 denotes control means for performing such various control processes.
- Embodiments of the present invention include a computer readable recording medium including program instructions for performing various computer-implemented operations.
- the computer-readable recording medium may include program instructions, data files, data structures, etc. alone or in combination.
- the recording medium may be one specially designed and configured for the present invention, or may be known and available to those skilled in computer software.
- Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape, optical recording media such as CD-ROMs and DVDs, magneto-optical media such as floptical disks, and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory.
- the recording medium may be a transmission medium such as an optical or metal wire, a waveguide, or the like including a carrier wave for transmitting a signal specifying a program command, a data structure, or the like.
- Examples of program instructions include not only machine code generated by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like.
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Multimedia (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Oral & Maxillofacial Surgery (AREA)
- Health & Medical Sciences (AREA)
- Signal Processing (AREA)
- Geometry (AREA)
- General Health & Medical Sciences (AREA)
- Human Computer Interaction (AREA)
- Image Analysis (AREA)
- Processing Or Creating Images (AREA)
- Testing, Inspecting, Measuring Of Stereoscopic Televisions And Televisions (AREA)
- Image Processing (AREA)
Abstract
Description
Claims (24)
- A viewer face tracking information generation method for controlling the stereoscopic effect of a three-dimensional display apparatus in response to at least one of a viewer's gaze direction and gaze distance, comprising: (a) detecting the viewer's face region from an image extracted from video input through an image input means provided at a position on the 3D display apparatus; (b) detecting facial feature points in the detected face region; (c) estimating an optimal transformation matrix that transforms the model feature points of a 3D standard face model to generate a 3D viewer face model corresponding to the facial feature points; and (d) estimating at least one of the viewer's gaze direction and gaze distance based on the optimal transformation matrix to generate viewer face tracking information.
- The method of claim 1, wherein step (a) comprises: (a1) creating a YCbCr color model from the RGB color information of the extracted image, separating color information and brightness information in the created color model, and detecting a face candidate region based on the brightness information; and (a2) defining a quadrilateral feature point model for the detected face candidate region, and detecting the face region based on training data obtained by training the quadrilateral feature point model with the AdaBoost learning algorithm.
- The method of claim 2, wherein in step (a2), the Haar-like features for the face region detection further include asymmetric Haar-like features for detecting non-frontal face regions.
- The method of claim 1, wherein step (b) is performed by the landmark search of the ASM (active shape model) method, proceeding using the AdaBoost algorithm.
- The method of claim 5, wherein the detection of the facial feature points comprises: (b1) defining the position of the current feature point as (x_l, y_l) and classifying, with a classifier, the partial windows of n*n pixel size in the vicinity of the current feature point position; (b2) calculating candidate positions of the feature point according to Equation 2 below; and (b3) setting (x'_l, y'_l) as the new feature point if the condition of Equation 3 below is satisfied, and otherwise maintaining the current feature point position (x_l, y_l). [Equation 2] [Equation 3] (where a: maximum neighborhood distance searched in the x-axis direction; b: maximum neighborhood distance searched in the y-axis direction; x_dx,dy: partial window centered on the point (dx, dy) away from (x_l, y_l); N_all: total number of classifier stages; N_pass: number of stages through which the partial window has passed; c: constant value limiting the confidence value of partial windows that did not pass to the end)
- The method of claim 1, wherein step (c) further comprises: (c1) calculating the transformation of Equation 4 below using the 3*3 matrix M for the face rotation information of the 3D standard face model and the 3D vector T for the face translation information, where the components of M and T are the variables defining the optimal transformation matrix; (c2) calculating the three-dimensional vector P' of Equation 5 below using the camera feature point position vector P_C obtained by Equation 4 and the camera transformation matrix M_C obtained by Equation 6 below; (c3) defining the two-dimensional vector P_I as (P'_x/P'_z, P'_y/P'_z) based on the three-dimensional vector P'; and (c4) estimating each variable of the optimal transformation matrix using the two-dimensional vector P_I and the coordinate values of the facial feature points detected in step (b). [Equation 4] P_C = M*P_M + T [Equation 5] P' = M_C*P_C (where P' is the three-dimensional vector defined as (P'_x, P'_y, P'_z)) [Equation 6] (where W: width of the image input through the image input means; H: height of the image input through the image input means; focal_len: -0.5*W/tan(Degree2Radian(fov*0.5)); fov: viewing angle of the camera)
- The method of claim 1, further comprising, after step (d): (e) a gender estimation step of estimating the viewer's gender using the detected face region.
- The method of claim 9, wherein step (e) comprises: (e1) cutting out a gender estimation face region from the detected face region based on the detected facial feature points; (e2) normalizing the size of the cut-out gender estimation face region; (e3) normalizing the histogram of the size-normalized gender estimation face region; and (e4) constructing an input vector from the size- and histogram-normalized gender estimation face region and estimating the gender using a pre-trained SVM algorithm.
- The method of claim 1, further comprising, after step (d): (f) an age estimation step of estimating the viewer's age using the detected face region.
- The method of claim 11, wherein the age estimation comprises: (f1) cutting out an age estimation face region from the detected face region based on the detected facial feature points; (f2) normalizing the size of the cut-out age estimation face region; (f3) performing local illumination correction on the size-normalized age estimation face region; (f4) constructing an input vector from the size-normalized and locally illumination-corrected age estimation face region and projecting it onto the age-manifold space to generate a feature vector; and (f5) estimating the age by applying quadratic regression to the generated feature vector.
- The method of claim 1, further comprising, after step (d): (g) an eye-closure estimation step of estimating the viewer's eye closure using the detected face region.
- The method of claim 13, wherein the eye-closure estimation comprises: (g1) cutting out an eye-closure estimation face region from the detected face region based on the detected facial feature points; (g2) normalizing the size of the cut-out eye-closure estimation face region; (g3) normalizing the histogram of the size-normalized eye-closure estimation face region; and (g4) constructing an input vector from the size- and histogram-normalized eye-closure estimation face region and estimating the eye closure using a pre-trained SVM algorithm.
- A viewer face tracking information generation method for controlling the stereoscopic effect of a three-dimensional display apparatus in response to at least one of a viewer's gaze direction and gaze distance, comprising: a face region detection step of detecting the viewer's face region from an image extracted from video input through an image input means provided at a position on the 3D display apparatus; a gaze information generation step of generating gaze information by estimating at least one of the viewer's gaze direction and gaze distance based on the detected face region; and a viewer information generation step of generating viewer information by estimating at least one of the viewer's gender and age based on the detected face region.
- A computer-readable recording medium recording a program for executing each step of the method according to any one of claims 1 to 15.
- A three-dimensional display apparatus that controls the stereoscopic effect using the viewer face tracking information generation method according to any one of claims 1 to 15.
- A viewer face tracking information generation apparatus for controlling the stereoscopic effect of a three-dimensional display apparatus in response to at least one of a viewer's gaze direction and gaze distance, comprising: a face region detection module for detecting the viewer's face region from an image extracted from video input through an image input means provided at a position on the 3D display apparatus; a facial feature point detection module for detecting facial feature points in the detected face region; a matrix estimation module for estimating an optimal transformation matrix that transforms the model feature points of a 3D standard face model to generate a 3D viewer face model corresponding to the facial feature points; and a tracking information generation module for estimating at least one of the viewer's gaze direction and gaze distance based on the estimated optimal transformation matrix to generate viewer face tracking information.
- The apparatus of claim 18, wherein the facial feature point detection module detects the facial feature points by the landmark search of the ASM (active shape model) method, proceeding using the AdaBoost algorithm.
- The apparatus of claim 18, wherein the matrix estimation module calculates the transformation of Equation 4 below using the 3*3 matrix M for the face rotation information of the 3D standard face model and the 3D vector T for the face translation information, where the components of M and T are the variables defining the optimal transformation matrix; calculates the three-dimensional vector P' of Equation 5 below using the camera feature point position vector P_C obtained by Equation 4 and the camera transformation matrix M_C obtained by Equation 6 below; defines the two-dimensional vector P_I as (P'_x/P'_z, P'_y/P'_z) based on the three-dimensional vector P'; and estimates each variable of the optimal transformation matrix using the two-dimensional vector P_I and the coordinate values of the detected facial feature points. [Equation 4] P_C = M*P_M + T [Equation 5] P' = M_C*P_C (where P' is the three-dimensional vector defined as (P'_x, P'_y, P'_z)) [Equation 6] (where W: width of the image input through the image input means; H: height of the image input through the image input means; focal_len: -0.5*W/tan(Degree2Radian(fov*0.5)); fov: viewing angle of the camera)
- The apparatus of claim 18, further comprising a gender estimation module for estimating the viewer's gender using the detected face region.
- The apparatus of claim 18, further comprising an age estimation module for estimating the viewer's age using the detected face region.
- The apparatus of claim 18, further comprising an eye-closure estimation module for estimating the viewer's eye closure using the detected face region.
- A viewer face tracking information generation apparatus for controlling the stereoscopic effect of a three-dimensional display apparatus in response to at least one of a viewer's gaze direction and gaze distance, comprising: means for detecting the viewer's face region from an image extracted from video input through an image input means provided at a position on the 3D display apparatus; means for generating gaze information by estimating at least one of the viewer's gaze direction and gaze distance based on the detected face region; and means for generating viewer information by estimating at least one of the viewer's gender and age based on the detected face region.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US14/003,685 US20140307063A1 (en) | 2011-07-08 | 2012-06-29 | Method and apparatus for generating viewer face-tracing information, recording medium for same, and three-dimensional display apparatus |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR10-2011-0067713 | 2011-07-08 | ||
KR20110067713A KR101216123B1 (ko) | 2011-07-08 | 2011-07-08 | Method and apparatus for generating viewer face tracking information, recording medium therefor, and three-dimensional display apparatus |
Publications (3)
Publication Number | Publication Date |
---|---|
WO2013009020A2 true WO2013009020A2 (ko) | 2013-01-17 |
WO2013009020A3 WO2013009020A3 (ko) | 2013-03-07 |
WO2013009020A4 WO2013009020A4 (ko) | 2013-08-15 |
Family
ID=47506652
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/KR2012/005202 WO2013009020A2 (ko) | 2012-06-29 | Method and apparatus for generating viewer face tracking information, recording medium therefor, and three-dimensional display apparatus |
Country Status (3)
Country | Link |
---|---|
US (1) | US20140307063A1 (ko) |
KR (1) | KR101216123B1 (ko) |
WO (1) | WO2013009020A2 (ko) |
Families Citing this family (28)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP5649601B2 (ja) * | 2012-03-14 | 2015-01-07 | 株式会社東芝 | Collation device, method, and program |
US9104908B1 (en) * | 2012-05-22 | 2015-08-11 | Image Metrics Limited | Building systems for adaptive tracking of facial features across individuals and groups |
US9111134B1 (en) | 2012-05-22 | 2015-08-18 | Image Metrics Limited | Building systems for tracking facial features across individuals and groups |
KR20150057064A (ko) * | 2013-11-18 | 2015-05-28 | 엘지전자 주식회사 | Electronic device and control method thereof |
JP6507747B2 (ja) * | 2015-03-18 | 2019-05-08 | カシオ計算機株式会社 | Information processing apparatus, content determination method, and program |
US9514397B2 (en) * | 2015-03-23 | 2016-12-06 | Intel Corporation | Printer monitoring |
KR101779096B1 (ko) * | 2016-01-06 | 2017-09-18 | (주)지와이네트웍스 | Object tracking method in an integrated store management system based on intelligent video analysis technology |
CN105739707B (zh) * | 2016-03-04 | 2018-10-02 | 京东方科技集团股份有限公司 | Electronic device, face recognition and tracking method, and three-dimensional display method |
KR101686620B1 (ko) * | 2016-03-17 | 2016-12-15 | 델리아이 주식회사 | System for determining elderly persons from face images |
KR102308871B1 (ko) | 2016-11-02 | 2021-10-05 | 삼성전자주식회사 | Method and apparatus for recognizing and training an object based on attributes of the object |
CN106960203B (zh) * | 2017-04-28 | 2021-04-20 | 北京搜狐新媒体信息技术有限公司 | Facial feature point tracking method and system |
CN107203743B (zh) * | 2017-05-08 | 2020-06-05 | 杭州电子科技大学 | Face depth tracking device and implementation method |
US10643383B2 (en) * | 2017-11-27 | 2020-05-05 | Fotonation Limited | Systems and methods for 3D facial modeling |
TW202014992A (zh) * | 2018-10-08 | 2020-04-16 | 財團法人資訊工業策進會 | Expression simulation system and method for a virtual face model |
US10949649B2 (en) | 2019-02-22 | 2021-03-16 | Image Metrics, Ltd. | Real-time tracking of facial features in unconstrained video |
US11610414B1 (en) * | 2019-03-04 | 2023-03-21 | Apple Inc. | Temporal and geometric consistency in physical setting understanding |
BR112022004811A2 (pt) | 2019-09-17 | 2022-06-21 | Boston Polarimetrics Inc | Sistemas e métodos para modelagem de superfície usando indicações de polarização |
CN110602556A (zh) * | 2019-09-20 | 2019-12-20 | 深圳创维-Rgb电子有限公司 | Playback method, cloud server, and storage medium |
WO2021063321A1 (zh) * | 2019-09-30 | 2021-04-08 | 北京芯海视界三维科技有限公司 | Method and apparatus for realizing 3D display, and 3D display terminal |
CN114746717A (zh) | 2019-10-07 | 2022-07-12 | 波士顿偏振测定公司 | 利用偏振进行表面法线感测的系统和方法 |
WO2021108002A1 (en) | 2019-11-30 | 2021-06-03 | Boston Polarimetrics, Inc. | Systems and methods for transparent object segmentation using polarization cues |
JP7462769B2 (ja) | 2020-01-29 | 2024-04-05 | イントリンジック イノベーション エルエルシー | Systems and methods for characterizing object pose detection and measurement systems |
WO2021154459A1 (en) | 2020-01-30 | 2021-08-05 | Boston Polarimetrics, Inc. | Systems and methods for synthesizing data for training statistical models on different imaging modalities including polarized images |
KR102265624B1 (ko) * | 2020-05-08 | 2021-06-17 | 주식회사 온페이스에스디씨 | Vehicle ignition security system using facial recognition |
WO2021243088A1 (en) | 2020-05-27 | 2021-12-02 | Boston Polarimetrics, Inc. | Multi-aperture polarization optical systems using beam splitters |
US11290658B1 (en) | 2021-04-15 | 2022-03-29 | Boston Polarimetrics, Inc. | Systems and methods for camera exposure control |
US11954886B2 (en) | 2021-04-15 | 2024-04-09 | Intrinsic Innovation Llc | Systems and methods for six-degree of freedom pose estimation of deformable objects |
US11689813B2 (en) | 2021-07-01 | 2023-06-27 | Intrinsic Innovation Llc | Systems and methods for high dynamic range imaging using crossed polarizers |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6466250B1 (en) * | 1999-08-09 | 2002-10-15 | Hughes Electronics Corporation | System for electronically-mediated collaboration including eye-contact collaboratory |
KR101890622B1 (ko) * | 2011-11-22 | 2018-08-22 | 엘지전자 주식회사 | Stereoscopic image processing apparatus and calibration method of the stereoscopic image processing apparatus |
-
2011
- 2011-07-08 KR KR20110067713A patent/KR101216123B1/ko active IP Right Grant
-
2012
- 2012-06-29 WO PCT/KR2012/005202 patent/WO2013009020A2/ko active Application Filing
- 2012-06-29 US US14/003,685 patent/US20140307063A1/en not_active Abandoned
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2000278716A (ja) * | 1999-03-25 | 2000-10-06 | Mr System Kenkyusho:Kk | Viewpoint position detecting device and method, and stereoscopic image display system |
JP2005275935A (ja) * | 2004-03-25 | 2005-10-06 | Omron Corp | Terminal device |
KR100711223B1 (ko) * | 2005-02-18 | 2007-04-25 | 한국방송공사 | Face recognition method using Zernike / linear discriminant analysis (LDA), and recording medium recording the method |
Non-Patent Citations (3)
Title |
---|
FU, Y. et al.: "Estimating Human Age by Manifold Analysis of Face Pictures and Regression on Aging Features", Multimedia and Expo, 2007 IEEE International Conference, July 2007, pages 1383-1386 *
JUNG, JAE-YOON: "Robust Face Feature Extraction for Various Pose and Expression", thesis for master's course, Hongik University Graduate School, February 2006, pages 18-55 *
PARK, KANG RYOUNG et al.: "Facial Gaze Detection by Estimating Three Dimensional Positional Movements", Journal of the Institute of Electronics Engineers of Korea, vol. 39, no. 3, May 2002, pages 23-36 *
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107278369A (zh) * | 2016-12-26 | 2017-10-20 | 深圳前海达闼云端智能科技有限公司 | Person searching method, apparatus, and communication system |
Also Published As
Publication number | Publication date |
---|---|
WO2013009020A4 (ko) | 2013-08-15 |
US20140307063A1 (en) | 2014-10-16 |
WO2013009020A3 (ko) | 2013-03-07 |
KR101216123B1 (ko) | 2012-12-27 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2013009020A2 (ko) | Method and apparatus for generating viewer face tracking information, recording medium therefor, and three-dimensional display apparatus | |
WO2013022226A4 (ko) | Method and apparatus for generating customer demographic information, recording medium therefor, and POS system | |
WO2019216593A1 (en) | Method and apparatus for pose processing | |
WO2018143707A1 (ko) | Makeup evaluation system and operating method thereof | |
WO2021167394A1 (en) | Video processing method, apparatus, electronic device, and readable storage medium | |
WO2015102361A1 (ko) | Apparatus and method for acquiring images for iris recognition using distance between facial components | |
WO2020050499A1 (ko) | Method for acquiring object information and apparatus for performing the same | |
WO2020213750A1 (ko) | Artificial intelligence device for recognizing an object and method therefor | |
WO2019103484A1 (ko) | Multimodal emotion recognition device, method, and storage medium using artificial intelligence | |
WO2017188706A1 (ko) | Mobile robot and control method therefor | |
EP3740936A1 (en) | Method and apparatus for pose processing | |
WO2018016837A1 (en) | Method and apparatus for iris recognition | |
WO2017164716A1 (en) | Method and device for processing multimedia information | |
WO2018048054A1 (ko) | Method and apparatus for implementing a virtual reality interface based on three-dimensional image analysis from a single camera | |
WO2017039348A1 (en) | Image capturing apparatus and operating method thereof | |
WO2017090837A1 (en) | Digital photographing apparatus and method of operating the same | |
WO2018062647A1 (ko) | Apparatus for generating normalized metadata, apparatus for detecting object occlusion, and methods therefor | |
WO2020141729A1 (ko) | Body measurement device and control method therefor | |
WO2021006366A1 (ko) | Artificial intelligence device for adjusting the color of a display panel, and method therefor | |
WO2015133699A1 (ko) | Object identification apparatus and method, and recording medium on which a computer program is recorded | |
WO2019085495A1 (zh) | Micro-expression recognition method, apparatus and system, and computer-readable storage medium | |
WO2017188800A1 (ko) | Mobile robot and control method therefor | |
WO2020117006A1 (ko) | AI-based facial recognition system | |
EP3440593A1 (en) | Method and apparatus for iris recognition | |
WO2019135621A1 (ko) | Video playback device and control method thereof | |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 12811349 Country of ref document: EP Kind code of ref document: A2 |
|
WWE | Wipo information: entry into national phase |
Ref document number: 14003685 Country of ref document: US |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
32PN | Ep: public notification in the ep bulletin as address of the adressee cannot be established |
Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 21/05/2014) |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 12811349 Country of ref document: EP Kind code of ref document: A2 |