US20220036581A1 - Estimation device, estimation method, and storage medium - Google Patents

Estimation device, estimation method, and storage medium

Info

Publication number
US20220036581A1
Authority
US
United States
Prior art keywords
face
image
perturbation
region
sight
Legal status
Pending
Application number
US17/278,500
Other languages
English (en)
Inventor
Yusuke Morishita
Current Assignee
NEC Corp
Original Assignee
NEC Corp
Application filed by NEC Corp
Assigned to NEC CORPORATION. Assignor: MORISHITA, YUSUKE.
Publication of US20220036581A1

Classifications

    • G06V 40/171: Human faces; local features and components; facial parts; occluding parts, e.g. glasses; geometrical relationships
    • A61B 5/163: Devices for evaluating the psychological state by tracking eye movement, gaze, or pupil change
    • G06K 9/00268
    • G06T 7/73: Determining position or orientation of objects or cameras using feature-based methods
    • G06V 10/25: Determination of region of interest [ROI] or a volume of interest [VOI]
    • G06V 40/168: Human faces; feature extraction; face representation
    • G06V 40/197: Eye characteristics; matching; classification
    • A61B 5/18: Devices for evaluating the psychological state for vehicle drivers or machine operators
    • G06T 2207/30201: Indexing scheme for image analysis; subject of image: human being; face

Definitions

  • the present disclosure relates to a technique for estimating a direction, and particularly, to a technique for estimating a direction of a line of sight or a face of a person included in an image.
  • The direction of a line of sight (that is, the direction in which the eyes look) and the orientation of a person's face may be important clues for analyzing the behavior and intention of the person. For example, an object or an event at which the person looks can be specified from the line of sight of the person.
  • An intended line of sight can also be specified from the difference between the line-of-sight direction and the face orientation obtained by measuring the line of sight and the face orientation of the person.
  • In many cases, the face orientation and the line of sight of a person are directed in the same direction.
  • In a case where the line of sight and the face orientation differ, for example where the face is oriented to the right while the line of sight is directed to the left, it is considered that the person is trying to look at a target while hiding the direction of the line of sight from others, because others can easily recognize the face orientation of the person.
  • Techniques for estimating the line of sight and the face orientation of a person, in particular techniques that use an image including the face of the person (hereinafter referred to as a “face image”), are disclosed in the documents described below.
  • PTL 1 discloses a method for estimating a line of sight using feature points included in a face image (image feature point) (feature-based methods).
  • NPL 1 discloses a method for estimating a line of sight from a face image including only one eye.
  • PTL 2 and NPL 2 disclose examples of “estimation of a line of sight based on an appearance” (appearance-based gaze estimation). For example, in PTL 2, a relationship between a face and a line of sight is learned by performing deep learning based on a Convolutional neural network (CNN) model using a given data set of a face image.
  • NPL 3 discloses a method for simultaneously estimating a position of the face and a position of a part of the face, an orientation of the face, or the like by performing deep learning based on the CNN model.
  • PTL 4 discloses a device that estimates a line of sight direction on the basis of a difference between the center position of the face calculated on the basis of three-dimensional positions of parts of the face and the center position of the pupil.
  • PTL 5 discloses a device that detects a direction of a line of sight on the basis of an outline of the face and positions of the eyes.
  • PTL 6 discloses a device that estimates a direction recognized as the front by a vehicle driver on the basis of a time-series change in the estimated line of sight and corrects the direction of the line of sight on the basis of the estimated direction.
  • PTL 7 discloses a device that estimates an eye region on the basis of a result of detecting the nostril and determines an eye opened/closed state.
  • PTL 8 discloses a device that determines a face orientation by projecting a vector indicating coordinates of the detected feature points on each of partial spaces generated for a plurality of face orientations and integrating the directions determined for the respective partial spaces.
  • PTL 9 discloses a device that estimates a direction of a line of sight on the basis of a feature amount of an eye region and a reliability of each of both eyes according to a detected face orientation.
  • In these techniques, the line of sight and the face orientation are estimated from a single image. Therefore, in a case where the image that is the estimation target is not suitable for estimation because of imaging conditions or shielding, accurate estimation is not possible. Even if an error occurs in the estimation result, the error cannot be corrected.
  • The technique disclosed in NPL 2 estimates a line of sight from a single input face image. Therefore, in a case where the state of the image is poor, it is not possible to accurately obtain the positions of the face and the eyes.
  • A case where the state of the image is poor includes, for example, a case where the entire image is dark due to a poor lighting condition or a case where the face is shadowed.
  • It also includes, for example, a case where the face or the eyes are not clearly captured in the image or a case where a part of the eye or the face is shielded by another object. If the positions of the face and the eyes cannot be obtained accurately, the eye regions used to estimate the line of sight cannot be extracted accurately. As a result, the estimation of the line of sight may fail.
  • Likewise, in techniques in which the positions of the face and of the parts of the face are detected from a single input image and the face orientation is estimated, the estimation of the face orientation may fail for similar reasons in similar cases.
  • An object of the present disclosure is to provide an estimation device or the like that can suppress deterioration, caused by the state of the image, in the accuracy of estimating the line of sight or the face orientation of a person in an image.
  • An estimation device includes: perturbation means for generating a plurality of extraction regions by adding a perturbation to an extraction region of a partial image determined on the basis of positions of feature points extracted from a face image; estimation means for estimating a plurality of directions of at least one of a face and a line of sight and a reliability of each of the plurality of directions on the basis of a plurality of partial images in the plurality of extraction regions of the face image; and integration means for calculating an integrated direction obtained by integrating the plurality of directions on the basis of the estimated reliability.
  • An estimation method includes: generating a plurality of extraction regions by adding a perturbation to an extraction region of a partial image determined on the basis of positions of feature points extracted from a face image; estimating a plurality of directions of at least one of a face and a line of sight and a reliability of each of the plurality of directions on the basis of a plurality of partial images in the plurality of extraction regions of the face image; and calculating an integrated direction obtained by integrating the plurality of directions on the basis of the estimated reliability.
  • a storage medium that stores a program causing a computer to execute: perturbation processing of generating a plurality of extraction regions by adding a perturbation to an extraction region of a partial image determined on the basis of positions of feature points extracted from a face image; estimation processing of estimating a plurality of directions of at least one of a face and a line of sight and a reliability of each of the plurality of directions on the basis of a plurality of partial images in the plurality of extraction regions of the face image; and integration processing of calculating an integrated direction obtained by integrating the plurality of directions on the basis of the estimated reliability.
  • One aspect of the present disclosure is also implemented by the program stored in the storage medium described above.
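  • A minimal sketch of the flow described above (perturbation of the extraction region, per-region estimation with a reliability, and reliability-weighted integration) is given below. The region representation, the helper names, and the estimator interface are assumptions made for illustration, not the publication's reference implementation; the estimator stands in for the CNN-based estimation unit described later.

      import numpy as np

      def perturbed_regions(base_region, offsets):
          # base_region: (x, y, w, h); offsets: iterable of (dx, dy) translations.
          # Returns the original region plus translated copies
          # (one simple kind of perturbation).
          x, y, w, h = base_region
          return [(x, y, w, h)] + [(x + dx, y + dy, w, h) for dx, dy in offsets]

      def crop(image, region):
          x, y, w, h = region
          return image[y:y + h, x:x + w]

      def integrate(directions, reliabilities):
          # Reliability-weighted average of direction estimates
          # (each direction is a vector, e.g. a yaw/pitch pair).
          w = np.asarray(reliabilities, dtype=float)
          w = w / w.sum()
          return (np.asarray(directions, dtype=float) * w[:, None]).sum(axis=0)

      def estimate_direction(face_image, base_region, offsets, estimator):
          # estimator(partial_image) -> (direction, reliability),
          # e.g. a trained CNN head.
          results = [estimator(crop(face_image, r))
                     for r in perturbed_regions(base_region, offsets)]
          directions, reliabilities = zip(*results)
          return integrate(directions, reliabilities)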
  • FIG. 1 is a block diagram illustrating an example of a configuration of an estimation device according to a first example embodiment of the present disclosure.
  • FIG. 2 is a diagram illustrating an example of a face image.
  • FIG. 3 is a diagram illustrating an example of a partial image (eye region image).
  • FIG. 4 is a diagram illustrating an example of a partial image (face region image).
  • FIG. 5 is a diagram for explaining a flow for extracting a partial image based on a perturbation amount.
  • FIG. 6 is a flowchart illustrating an example of an operation of the estimation device according to the first example embodiment of the present disclosure.
  • FIG. 7 is a block diagram illustrating an example of a configuration of an estimation device according to a second example embodiment of the present disclosure.
  • FIG. 8 is a flowchart illustrating an example of an operation of the estimation device according to the second example embodiment of the present disclosure.
  • FIG. 9 is a block diagram illustrating an example of a hardware configuration of a computer for implementing an estimation device.
  • FIG. 1 is a block diagram illustrating a configuration of an estimation device 100 according to a first example embodiment.
  • the estimation device 100 is a device that estimates at least one of a line of sight and a face orientation of a person included in an image.
  • a direction of the line of sight of the person and a direction of a face of the person are collectively described as a direction of the person.
  • the direction of the line of sight of the person is simply described as a line of sight.
  • The direction of the face of the person is simply described as a face orientation.
  • As illustrated in FIG. 1, the estimation device 100 includes an acquisition unit 110, a detection unit 120, a perturbation unit 130, an extraction unit 140, an estimation unit 150, an integration unit 160, and an output unit 170.
  • the estimation device 100 may include other components.
  • the acquisition unit 110 acquires image data of an image that includes a face of a person.
  • The acquisition unit 110 may receive the image data from another device that outputs image data and that is connected to the estimation device 100 via a communication network.
  • The acquisition unit 110 may also read the image data from another device that stores image data and that is connected to the estimation device 100 via the communication network.
  • the other device may be an imaging device such as a monitoring camera or a camera built in an electronic device that outputs image data of an imaged image.
  • the other device may be a storage device that stores image data, for example, as a database or the like.
  • the acquisition unit 110 sends the acquired image data to the detection unit 120 .
  • the image data acquired by the acquisition unit 110 is expressed by luminance values of a plurality of pixels.
  • Each of the number of pixels, the number of colors (that is, the number of color components), the number of gradations, or the like included in the image data (in other words, image represented by image data) is not limited to a specific numerical value.
  • the acquisition unit 110 may acquire only image data having a predetermined number of pixels and a predetermined number of colors.
  • the number of pixels and the number of colors of the image data acquired by the acquisition unit 110 are not respectively limited to the specific number of pixels and the specific number of colors.
  • the image data may be still image data or moving image data. For convenience of description, in the following, the image data acquired by the acquisition unit 110 is referred to as an “input image”.
  • In the following description, it is assumed that the input image includes the face of one person.
  • In a case where the input image includes the faces of a plurality of persons, the acquisition unit 110 may divide the input image into a plurality of input images each including only one face. It is sufficient that the acquisition unit 110 and the other components of the estimation device 100 perform the operations described below on each of the plurality of input images generated by the division.
  • the acquisition unit 110 generates a face image from the acquired input image.
  • the acquisition unit 110 supplies the generated face image to the detection unit 120 and the extraction unit 140 .
  • the face image represents an image that includes a part or all of the face of the person.
  • the face image may be an image obtained by removing elements other than the face of the person (for example, background, object, body of person, or the like) from the input image.
  • the face image may be an image obtained by removing elements other than a part of the face of the person from the input image.
  • the acquisition unit 110 may detect a face region in the input image, for example, using a general method for detecting a face region.
  • the acquisition unit 110 may detect a partial region of the face (for example, region of specific part of face) in the input image using a general method for detecting a region of a specific part (for example, eyes or the like) of the face.
  • Removing the elements other than the face of the person from the input image may be changing pixel values of all the pixels in the region other than the face of the person of the input image to a predetermined pixel value.
  • Removing the elements other than a part of the face of the person from the input image may be changing pixel values of all the pixels in the region other than a part of the face of the person of the input image to a predetermined pixel value.
  • the acquisition unit 110 may change, for example, a pixel value of a pixel in a region other than the detected face region (or partial region of face) to a predetermined pixel value.
  • the acquisition unit 110 may supply an image, of which the pixel value of the pixel in the region other than the detected face region (or partial region of face) is changed to a predetermined pixel value, to the detection unit 120 and the extraction unit 140 as a face image.
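  • As a concrete illustration of this masking step, the following sketch sets every pixel outside a detected face region to a predetermined value. The axis-aligned (x, y, w, h) box format and the fill value are assumptions made for the example.

      import numpy as np

      def mask_outside_region(image, region, fill_value=0):
          # Keep the pixels inside the detected face region and replace all
          # other pixels with a predetermined value.
          x, y, w, h = region
          out = np.full_like(image, fill_value)
          out[y:y + h, x:x + w] = image[y:y + h, x:x + w]
          return out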
  • FIG. 2 illustrates an example of the face image (face image 400 ) generated from the input image by the acquisition unit 110 .
  • the face image 400 illustrated in FIG. 2 includes face parts (eyes, eyebrows, nose, and mouth). It is sufficient that the face image include at least information necessary for estimating a line of sight or a face orientation by the estimation unit 150 . For example, in a case where the estimation unit 150 estimates a line of sight, only an eye region of the face image is used. Therefore, in a case where the estimation unit 150 estimates a line of sight, it is sufficient that the face image include at least the eyes. In the following description, an image of the eye region is also referred to as an eye region image.
  • An image generated from the input image by the acquisition unit 110, that is, an image including at least parts of a face extracted from the input image by the acquisition unit 110, is referred to as a face image.
  • An image extracted by the extraction unit 140, either from a region of the face image determined on the basis of the positions of the feature points detected by the detection unit 120 or from a region obtained by adding a perturbation to that region, is referred to as a partial image.
  • In a case where the input image is a moving image, the input image includes a plurality of images (that is, frames). In this case, not all the frames included in the input image necessarily include the face; one frame may include the face while another frame does not. Therefore, in a case where the input image is a moving image, the acquisition unit 110 may extract only the images including the face of the person from the moving image and supply the extracted images to the detection unit 120 and the extraction unit 140 as face images. With this configuration, the processing (described later) for estimating the line of sight or the face orientation by the estimation device 100 can be performed efficiently.
  • the acquisition unit 110 may supply the input image to the detection unit 120 and the extraction unit 140 as a face image.
  • the acquisition unit 110 may process the input image and supply the processed input image to the detection unit 120 and the extraction unit 140 as a face image.
  • the acquisition unit 110 may detect a face of a person in the input image, extract a part of the input image including the detected face as a face image, and supply the extracted face image to the detection unit 120 and the extraction unit 140 .
  • the face image may be a monochrome image.
  • The face image may also be a color image. In that case, the pixel value of a pixel of the face image indicates values representing the magnitudes of a plurality of color components such as red (R), green (G), and blue (B).
  • the acquisition unit 110 may convert the face image in such a way that the number of colors in the face image is set to be the predetermined number of colors.
  • the acquisition unit 110 may convert the face image in such a way that the number of gradations in the face image is set to be the predetermined number of gradations.
  • the acquisition unit 110 may supply the converted face image to the detection unit 120 and the extraction unit 140 .
  • the acquisition unit 110 may convert the face image into a face image expressed by a single-component gray scale.
  • the face image converted in this way is also simply referred to as a “face image” below.
  • the detection unit 120 receives the face image supplied from the acquisition unit 110 (for example, face image 400 illustrated in FIG. 2 ) and detects feature points of a face from the received face image.
  • the feature points of the face are feature points determined for the face or parts of the face.
  • the detection unit 120 may detect feature points determined for the eyes.
  • the detection unit 120 may, for example, detect the center of the pupil of the eye from the face image as the feature point.
  • the detection unit 120 may further detect a plurality of points on the outline of the eye as the feature points.
  • the center of the pupil and the plurality of points on the outline of the eye detected by the detection unit 120 as the feature points are referred to as feature points of the eye below.
  • the plurality of points on the outline of the eye includes, for example, four points including an inner canthus, an outer canthus, a center of an upper eyelid, and a center of a lower eyelid.
  • The inner canthus (the so-called inner corner of the eye) indicates, of the two points at both ends of the outline of the eye where the upper and lower eyelids meet, the point on the inner side of the face.
  • The outer canthus indicates the point on the outer side of the face of those two points.
  • the center of the upper eyelid is a point at the center of the border between the upper eyelid and the eyeball in the lateral direction.
  • the center of the lower eyelid is a point at the center of the border between the lower eyelid and the eyeball in the lateral direction.
  • The extraction unit 140 extracts a partial image centered on the point at the center of the pupil.
  • The extraction unit 140 may instead extract a partial image centered on the midpoint of the line segment that connects the inner canthus and the outer canthus.
  • The extraction unit 140 may also extract a partial image centered on a point determined on the basis of the four points, namely the inner canthus, the outer canthus, the center of the upper eyelid, and the center of the lower eyelid. In this way, the position of the partial image extracted by the extraction unit 140 is further stabilized (a small sketch of these candidate center points follows after this list).
  • the point based on the four points described above may be the center of gravity of a rectangle having the four points as vertexes.
  • the point based on the four points described above may be an intersection of a line segment connecting the inner canthus and the outer canthus and a line segment connecting the center of the upper eyelid and the center of the lower eyelid.
  • the point based on the four points described above may be the center of gravity of a parallelogram of which two parallel sides each pass through the inner canthus and the outer canthus and other two parallel sides each pass through the center of the upper eyelid and the center of the lower eyelid.
  • the side passing through the inner canthus and the side passing through the outer canthus may be parallel to an axis, of two axes in the image, having a larger angle with respect to the straight line passing through the inner canthus and the outer canthus.
  • the side passing through the center of the upper eyelid and the side passing through the center of the lower eyelid may be parallel to an axis, of the two axes in the image, having a larger angle with respect to the straight line passing through the center of the upper eyelid and the center of the lower eyelid.
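  • The sketch below illustrates these candidate center points. The four feature points are assumed to be given as (x, y) pairs; the centroid of the quadrilateral spanned by the four points is used here in place of the rectangle and parallelogram constructions for brevity.

      import numpy as np

      def eye_center_candidates(inner, outer, upper, lower):
          # inner/outer: inner and outer canthus; upper/lower: centers of the
          # upper and lower eyelids, each given as an (x, y) pair.
          inner, outer, upper, lower = (np.asarray(p, dtype=float)
                                        for p in (inner, outer, upper, lower))
          midpoint = (inner + outer) / 2.0                  # midpoint of the canthus segment
          centroid = (inner + outer + upper + lower) / 4.0  # centroid of the four points
          # Intersection of the canthus segment with the eyelid-center segment
          # (solved as the intersection of the two supporting lines).
          d1, d2 = outer - inner, lower - upper
          a = np.array([[d1[0], -d2[0]], [d1[1], -d2[1]]])
          t, _ = np.linalg.solve(a, upper - inner)
          intersection = inner + t * d1
          return midpoint, centroid, intersection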
  • The detection unit 120 may detect feature points of the face that are not limited to the feature points determined for the eyes.
  • the detection unit 120 may detect a plurality of points determined for the eyebrows, the nose, the mouth, the submandibular region, or the like, in addition to the feature points of the eye described above, from the face image.
  • the plurality of points for the eyes, the eyebrows, the nose, the mouth, and the submandibular region detected by the detection unit 120 in this case is referred to as a feature point of the face below.
  • the feature point of the face in the present example embodiment may be a feature point of the face that is often used in general.
  • the feature point of the face in the present example embodiment may be a point that is appropriately determined on the face by an operator of the estimation device 100 , for example.
  • the detection unit 120 may detect the feature points of the face.
  • The detection unit 120 may use any one of various known methods, for example, the method described in PTL 3, to detect the feature points of the eye. Similarly, the detection unit 120 may use any one of various known methods, for example, the method described in PTL 3, to detect the feature points of the face. For example, the detection unit 120 may use general machine learning such as supervised learning. In this case, for example, the detection unit 120 learns the features and positions of the eyes, eyebrows, noses, mouths, and submandibular regions in the faces of a plurality of persons, using face images of the plurality of persons to which the positions of the feature points of the eyes, eyebrows, noses, mouths, submandibular regions, and the like are given.
  • That is, the detection unit 120 trains, in advance, a detector that outputs the positions of the feature points for an input face image, using face images to which the positions of the feature points are given. Then, the detection unit 120 detects the feature points from the supplied face image using the trained detector.
  • the detection unit 120 sends information regarding the feature points detected from the face image (for example, feature points of eye or feature points of face) to the perturbation unit 130 and the extraction unit 140 .
  • the perturbation unit 130 receives the information regarding the feature points detected by the detection unit 120 (for example, feature points of eye or feature points of face) from the detection unit 120 .
  • the perturbation unit 130 calculates an amount of a perturbation (hereinafter, referred to as “perturbation amount”) to be added to a region of a partial image extracted by the extraction unit 140 on the basis of the received information regarding the feature points. The calculation of the perturbation amount will be described in detail later.
  • the region of the partial image extracted from the face image is defined on the basis of the feature points as described above.
  • the perturbation indicates a variation to be applied to a position of a region where a partial image is extracted.
  • the perturbation amount is a value indicating a variation applied to a position of a region where a partial image is extracted.
  • the perturbation unit 130 calculates the variation amount on the basis of the information regarding the feature points.
  • To perturb the region of the partial image indicates to determine another region (in other words, to generate another region) by adding the variation determined on the basis of the perturbation amount to a region defined on the basis of the feature points (hereinafter, also described as original region).
  • the perturbation may be a plurality of variations (for example, a set of plurality of variations).
  • the perturbation unit 130 calculates the plurality of variation amounts on the basis of the information regarding the feature points.
  • To perturb the region is to determine a plurality of regions (in other words, to generate plurality of regions) by applying the plurality of variations indicating the perturbation to each region.
  • In the following description, such a perturbation is described as “the perturbation includes a plurality of variations”.
  • the perturbation may be, for example, a change in a position of a region, such as a parallel translation.
  • the parallel translation of the region represents a movement of the region with no change in the size and the direction of the region.
  • the perturbation amount may be represented by a single two-dimensional vector determined according to the information regarding the feature points.
  • the perturbation may be a set of the plurality of variations determined according to the information regarding the feature points.
  • the perturbation unit 130 may determine, for example, a plurality of perturbation amounts using a value p calculated on the basis of the information regarding the feature points.
  • the value p may be, for example, a constant multiple of a distance between predetermined feature points.
  • the value p may be, for example, a constant multiple of a value calculated on the basis of a positional relationship between predetermined feature points.
  • The perturbation may be, for example, a set of translations of the position of the region, defined with respect to two coordinate axes set for the face image, in which at least one of the two coordinate elements increases by p.
  • In this case, the perturbation is represented by the three vectors (p, 0), (0, p), and (p, p).
  • the perturbation unit 130 may determine these three vectors as the perturbation amounts.
  • The extraction unit 140 described later may extract partial images from the three regions obtained by moving the original region, defined by the feature points of the face image, according to the three vectors (p, 0), (0, p), and (p, p), as well as from the original region.
  • The perturbation may also be a set of translations of the position of the region, defined with respect to the two coordinate axes set for the face image, in which at least one of the two coordinate elements increases or decreases by p.
  • In this case, the perturbation is represented by the eight vectors (p, 0), (0, p), (p, p), (-p, 0), (0, -p), (-p, p), (p, -p), and (-p, -p).
  • the perturbation unit 130 may determine these eight vectors as the perturbation amounts.
  • The extraction unit 140 described later may extract partial images from the eight regions obtained by moving the original region of the face image according to the eight vectors, as well as from the original region.
  • The method for calculating and determining the perturbation amount in the case where the perturbation is a parallel translation is not limited to the above method; a small sketch of this translation-type perturbation follows.
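  • A minimal sketch of the translation-type perturbation, assuming the extraction region is represented as an axis-aligned (x, y, w, h) box (a representation not specified in the text):

      def translation_offsets(p, include_negative=False):
          # The three vectors (p, 0), (0, p), (p, p), optionally extended with the
          # five additional vectors used in the eight-vector variant above.
          offsets = [(p, 0), (0, p), (p, p)]
          if include_negative:
              offsets += [(-p, 0), (0, -p), (-p, p), (p, -p), (-p, -p)]
          return offsets

      def translate_region(region, offset):
          # Parallel translation: the position moves, the size stays unchanged.
          x, y, w, h = region
          dx, dy = offset
          return (x + dx, y + dy, w, h)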
  • the perturbation may be, for example, a variation in a size of a region where a partial image is extracted.
  • the variation in the size may be, for example, enlargement.
  • the variation in the size may be reduction.
  • the variation in the size does not need to be isotropic. For example, a variation in a size within the face image in a certain direction may be different from a variation in a size in another direction.
  • The perturbation amount may represent, for example, a change rate of the size of the region.
  • In a case where the size is changed at a single rate r in every direction, the perturbation amount may be r.
  • In a case where the size is changed at a rate r1 in one direction and at a rate r2 in another direction, the perturbation amount may be a vector (r1, r2).
  • The perturbation amount may also be a set of change rates.
  • the perturbation amount may be, for example, a change amount of the size of the region.
  • The perturbation may be, for example, a change in the size of a region that increases the size in the vertical direction by s1 and the size in the lateral direction by s2.
  • the perturbation amount may be a vector (s1, s2).
  • the perturbation amount may be a set of vectors indicating the change amounts.
  • the extraction unit 140 applies a variation having a size indicated by the perturbation amount to a region determined on the basis of the information regarding the feature points and extracts partial images from the region determined on the basis of the information regarding the feature points and a region to which the variation in the size is applied.
  • the extraction unit 140 may determine a region so as not to change the center position of the region.
  • the perturbation may be enlargement or reduction of the extracted partial image.
  • the perturbation amount may be a value indicating a change amount of the size of the partial image.
  • the perturbation amount may be a value indicating a change rate of the size of the partial image.
  • the perturbation unit 130 may determine the size of the region generated by adding the perturbation to the region of the partial image according to a method similar to the method for determining the size of the region in a case where the perturbation is the variation in the size of the region where the partial image is extracted.
  • the extraction unit 140 may generate a partial image obtained by the perturbation by converting the partial image extracted from the region determined on the basis of the feature points into an image having the determined size, for example, by interpolation.
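  • The two size-related variants described above can be sketched as follows; the (x, y, w, h) box format and the use of bilinear interpolation are illustrative assumptions.

      import cv2

      def scale_region_keep_center(region, sx, sy):
          # Enlarge (or shrink, for negative values) a region by sx pixels in width
          # and sy pixels in height without moving its center.
          x, y, w, h = region
          return (x - sx // 2, y - sy // 2, w + sx, h + sy)

      def resize_partial_image(partial, new_width, new_height):
          # Alternative form of the perturbation: resize the already extracted
          # partial image itself, interpolating pixel values.
          return cv2.resize(partial, (new_width, new_height),
                            interpolation=cv2.INTER_LINEAR)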
  • the perturbation may be, for example, rotation of a region where a partial image is extracted.
  • the perturbation amount may be a magnitude of an angle of the rotation.
  • The perturbation may be a set of rotations in which the region determined on the basis of the feature points is rotated around the center point of the region by an angle indicated by the perturbation amount.
  • For example, in a case where the angle amount is t, the perturbation may be a rotation that rotates the region determined on the basis of the feature points by an angle t and a rotation that rotates the region by an angle -t.
  • the extraction unit 140 may calculate a pixel value of each pixel of an image extracted from the rotated region by interpolation using the pixel value of the pixel of the face image.
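  • One way to realize the rotation perturbation is sketched below: rotating the extraction region by an angle t around its center is implemented by rotating the face image by -t around the same center and then cropping the axis-aligned box, with warpAffine interpolating the pixel values. The (x, y, w, h) box format is an assumption of the example.

      import cv2

      def extract_rotated_region(face_image, region, angle_deg):
          x, y, w, h = region
          center = (x + w / 2.0, y + h / 2.0)
          # Rotating the image by -angle around the region center corresponds to
          # rotating the extraction region by +angle.
          m = cv2.getRotationMatrix2D(center, -angle_deg, 1.0)
          rotated = cv2.warpAffine(face_image, m,
                                   (face_image.shape[1], face_image.shape[0]))
          return rotated[y:y + h, x:x + w]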
  • the perturbation may be other conversion that can adjust a magnitude of deformation according to a parameter.
  • the perturbation may be addition of noise, for example, white noise or the like to the face image.
  • the perturbation amount may be a parameter indicating either one of an intensity or an amount of the added noise.
  • a method for generating the noise to be added may be any one of existing methods for generating noise that can adjust either one of the intensity and the amount of the noise according to the parameter.
  • the perturbation may be smoothing of a face image.
  • the perturbation amount may be a parameter indicating an intensity of the smoothing.
  • a smoothing method may be any one of smoothing methods that can adjust the intensity according to the parameter.
  • the perturbation may be other processing, on the image, that can adjust the intensity or the like according to the parameter.
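  • The noise and smoothing perturbations can be sketched as follows. The text mentions white noise; Gaussian noise and Gaussian smoothing are used here as concrete, parameter-controlled choices, which is an assumption of the example.

      import cv2
      import numpy as np

      def add_noise(image, sigma, rng=None):
          # Additive zero-mean Gaussian noise; sigma plays the role of the
          # intensity parameter of the perturbation.
          rng = np.random.default_rng() if rng is None else rng
          noisy = image.astype(np.float32) + rng.normal(0.0, sigma, image.shape)
          return np.clip(noisy, 0, 255).astype(image.dtype)

      def smooth(image, sigma):
          # Gaussian smoothing; sigma controls the smoothing intensity.
          return cv2.GaussianBlur(image, (0, 0), sigma)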
  • The perturbation unit 130 may determine a plurality of perturbation amounts. Specifically, for example, one perturbation amount may be determined on the basis of the information regarding the feature points, and in addition another perturbation amount, whose value lies between zero and the determined perturbation amount, may be determined by a predetermined method using the determined perturbation amount.
  • the perturbation unit 130 may determine, for example, a value obtained by equally dividing a value between zero and the determined value of the perturbation amount by a predetermined number as the other perturbation amount described above. For example, in a case where the predetermined number is two, the perturbation unit 130 may determine a value obtained by dividing the perturbation amount determined on the basis of the information regarding the feature points by two as the other perturbation amount described above.
  • In a case where the variations represented by the perturbation do not include a variation that leaves the region unchanged, the extraction unit 140 also extracts a partial image from the region determined on the basis of the information regarding the feature points.
  • the extraction unit 140 may extract a partial image from the region obtained by adding the variation represented by the perturbation amount to the region determined on the basis of the information regarding the feature points and does not necessarily need to extract a partial image from the region determined on the basis of the information regarding the feature points.
  • the perturbation unit 130 may set a perturbation amount in such a way that the perturbation amount includes a value indicating a variation that does not change a region. Then, in a case where the perturbation amount includes the value indicating the variation that does not change the region, the extraction unit 140 may extract a partial image from the region determined on the basis of the information regarding the feature points.
  • the perturbation may be a combination of the perturbations described above.
  • the perturbation represented by the combination of the perturbations is, for example, a perturbation that rotates the position of the region, translates the region, and changes the size of the region.
  • the perturbation represented by the combination of the perturbations is not limited to this example.
  • FIGS. 3 and 4 are diagrams illustrating a part of the face image 400 illustrated in FIG. 2 and the feature points detected in the part.
  • a partial image 410 illustrated in FIG. 3 corresponds to the partial image extracted from a region 410 of the face image 400 illustrated in FIG. 2 .
  • the region 410 is a region that includes the left eye of the face image 400 .
  • a partial image 420 illustrated in FIG. 3 corresponds to a partial image extracted from a region 420 of the face image 400 illustrated in FIG. 2 .
  • the region 420 is a region that includes the right eye of the face image 400 .
  • the partial image 430 corresponds to a partial image extracted from the region 430 including parts of the face such as the eyes and the nose of the face image 400 .
  • the partial image 430 includes a part of the face serving as a clue for estimating a face orientation.
  • For example, regarding the positional relationship between the right eye, the left eye, and the top of the nose in the face image, the distance between the right eye and the nose and the distance between the left eye and the nose generally coincide with each other in the face image in a case where the face faces the front.
  • The top of the nose indicates the most protruding portion of the nose.
  • In a case where the face faces sideways, for example, the distance in the face image between the right eye and the top of the nose may be shorter than the distance in the face image between the left eye and the top of the nose. This difference in distance can be used as a clue to estimate that the face faces sideways.
  • the parts of the face included in the partial image 430 are not limited to the right eye, the left eye, and the nose described above.
  • In FIGS. 3 and 4, the points P1 and P2 are the centers of the pupils.
  • a point P 3 is the top of the nose.
  • a point P 4 is the submandibular region.
  • a point P 12 is a midpoint of a line segment connecting the points P 1 and P 2 .
  • the perturbation unit 130 obtains a perturbation amount indicating a magnitude of a perturbation to be applied to the position of the region of the partial image extracted by the extraction unit 140 on the basis of a value indicating the size of the face (hereinafter, also referred to as size of face).
  • the perturbation unit 130 determines, for example, an interval between the both eyes in the image as the size of the face.
  • the perturbation unit 130 may use a distance between the position of the pupil of the right eye (for example, point P 1 in partial image 410 in FIG. 3 ) and the position of the pupil of the left eye (for example, point P 2 in partial image 420 in FIG. 3 ) of the feature points of the eyes detected by the detection unit 120 as the size of the face.
  • the distance is, for example, an Euclidean distance. The distance may be other distances.
  • the perturbation unit 130 may determine an interval between the midpoint of the both eyes and the lowest point of the jaw as the size of the face. Specifically, the perturbation unit 130 may use a distance between a midpoint of a line segment connecting the position of the pupil of the right eye and the position of the pupil of the left eye (for example, point P 12 in partial image 430 in FIG. 4 ) and the lowest point of the jaw (for example, point P 4 in partial image 430 in FIG. 4 ) as the size of the face.
  • The eye has a characteristic pattern in an image.
  • For example, the white of the eye and the dark part of the eye (the iris and pupil) have a clear difference in luminance. Therefore, the feature points of the eyes are often obtained with high accuracy. Consequently, in a case where the interval between both eyes is used as the size of the face, the size of the face is obtained with high accuracy.
  • On the other hand, in a case where the face does not face the front, the interval between both eyes in the image (for example, the Euclidean distance) is shorter than the interval between both eyes in a case where the face faces the front. In this case, by using the interval between the midpoint of both eyes and the lowest point of the jaw instead of the interval between both eyes, the size of the face can be obtained stably regardless of the orientation of the face (a small sketch of both measures follows below).
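  • A small sketch of the two face-size measures described above, assuming the feature points are given as (x, y) pairs:

      import numpy as np

      def face_size_from_eyes(right_pupil, left_pupil):
          # Euclidean distance between the pupil centers (interval between both eyes).
          return float(np.linalg.norm(np.asarray(right_pupil, dtype=float)
                                      - np.asarray(left_pupil, dtype=float)))

      def face_size_from_eyes_and_jaw(right_pupil, left_pupil, jaw_lowest):
          # Distance from the midpoint of both pupils to the lowest point of the jaw,
          # which is more stable when the face does not face the front.
          mid = (np.asarray(right_pupil, dtype=float)
                 + np.asarray(left_pupil, dtype=float)) / 2.0
          return float(np.linalg.norm(mid - np.asarray(jaw_lowest, dtype=float)))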
  • The perturbation unit 130 may obtain a moving amount d_x of the position of the region in the x-axis direction and a moving amount d_y in the y-axis direction indicated by the perturbation as the perturbation amount indicating the magnitude of the perturbation to be added to the position of the partial image, for example, according to the following formula.
  • The reference i is a number assigned to a variation included in the perturbation.
  • The references u_xi and u_yi are parameters that are predetermined to determine the magnitude of the i-th variation of the perturbation to be added to the position of the region.
  • "×" is an operator representing multiplication.
  • (d_xi, d_yi) is the perturbation amount of the position indicating the i-th variation.
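  • The formula referred to above is not reproduced in this text. Based on the definitions of u_xi, u_yi, and the multiplication operator, and on the statement that the perturbation amount is obtained from the size S of the face, a plausible reconstruction (stated as an assumption, not necessarily the verbatim published formula) is:

      d_{xi} = u_{xi} \times S,    d_{yi} = u_{yi} \times S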
  • the parameters u xi and u yi may be the same value.
  • the parameters u xi and u yi may be values different from each other.
  • the perturbation may include the plurality of variations. Examples of the plurality of parameters in that case are described below.
  • the perturbation unit 130 may obtain, for example, a perturbation amount indicating a perturbation to be added to the size of the partial image extracted by the extraction unit 140 on the basis of the size of the face.
  • the method for calculating the size of the face may be the same as the above-described calculation method.
  • The perturbation unit 130 may obtain a change amount s_x of the size of the region in the x-axis direction and a change amount s_y in the y-axis direction indicated by the perturbation as the perturbation amount indicating the perturbation to be added to the size of the partial image, for example, according to the following formula.
  • The perturbation amount of the size (s_xi, s_yi) indicates the magnitude of the i-th variation of the perturbation to be added to the size of the region.
  • The references v_xi and v_yi are predetermined parameters that determine the magnitude of the i-th variation of the perturbation to be added to the size of the region.
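  • The corresponding formula is likewise not reproduced in this text; under the same assumption that the perturbation amount is proportional to the size S of the face, a plausible reconstruction is:

      s_{xi} = v_{xi} \times S,    s_{yi} = v_{yi} \times S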
  • the parameters v xi and v yi may be the same value.
  • the parameters v xi and v yi may be values different from each other.
  • the perturbation may include the plurality of variations. Examples of the plurality of parameters in that case are described below.
  • the parameter (u xi , u yi ) to determine the size of the perturbation to be added to the position of the region and the parameter (v xi , v yi ) to determine the magnitude of the perturbation to be added to the size of the region may be predetermined.
  • the perturbation unit 130 may determine these parameters on the basis of properties of the face image 400 or some index.
  • the perturbation unit 130 may evaluate an image quality of the face image and determine these parameters according to the image quality of the face image.
  • the evaluation of the image quality may be evaluation based on an amount of noise contained in the image.
  • the evaluation of the image quality may be evaluation based on a magnitude of a contrast.
  • the perturbation unit 130 may evaluate the image quality of the face image using any one of existing methods for evaluating image quality. In a case where the image quality of the face image is low, it is considered that an accuracy of the feature points detected by the detection unit 120 is low (in other words, accurate detection fails, and detected position deviates from true position).
  • the perturbation unit 130 may determine the perturbation amount in such a way that the magnitude of the perturbation increases as the image quality of the face image decreases.
  • the perturbation unit 130 may determine the parameter (u xi , u yi ) to determine the magnitude of the perturbation to be added to the position of the region in such a way that the magnitude of the perturbation increases as the image quality of the face image decreases.
  • the perturbation unit 130 may determine the parameter (v xi , v yi ) to determine the magnitude of the perturbation to be added to the size of the region in such a way that the magnitude of the perturbation increases as the image quality of the face image decreases.
  • In a case where a reliability of the detected face is available, the perturbation unit 130 may determine the above-described parameters for determining the magnitude of the perturbation on the basis of the reliability of the face. Even in a case where the estimation device 100 is configured to receive the position of the detected face and the reliability of the detected face from an external face detection device or the like, the perturbation unit 130 may determine the above-described parameters for determining the magnitude of the perturbation on the basis of that reliability. In a case where the reliability of the detected face is low, there is a high possibility that the accurate position of the face has not been detected.
  • Therefore, the perturbation unit 130 may determine the above-described parameters (for example, (u_xi, u_yi) and (v_xi, v_yi)) in such a way that the magnitude of the perturbation increases as the reliability of the detected face decreases.
  • the perturbation unit 130 sends the calculated perturbation amount (specifically, information regarding perturbation amount) to the extraction unit 140 .
  • the extraction unit 140 receives the face image (illustrated in FIG. 2 as face image 400 ) from the acquisition unit 110 .
  • the extraction unit 140 receives the perturbation amount (specifically, information regarding perturbation amount) from the perturbation unit 130 .
  • the extraction unit 140 receives the information regarding the feature points from the detection unit 120 .
  • the extraction unit 140 determines a position of a region on the basis of the received information regarding the feature points and specifies a position of a region where a partial image is extracted on the basis of the position of the region and the received perturbation amount. Specifically, for example, in a case where the perturbation is a change in a range of a region (position change, size change, or the like), the extraction unit 140 extracts a partial image from a region indicated by the position of the region. In a case where the extraction unit 140 is configured to extract the partial image only from the region obtained by adding the perturbation indicated by the perturbation amount to the position of the region based on the information regarding the feature points, it is not necessary to extract the partial image from the position of the region based on the information regarding the feature points.
  • the extraction unit 140 further specifies a region where a partial image is extracted by adding the perturbation indicated by the perturbation amount to the position of the region based on the received information regarding the feature points in the received face image (that is, by applying variation indicated by perturbation amount). Then, the extraction unit 140 extracts the partial image from the specified region of the received face image. For example, in a case where the perturbation is processing such as noise removal or the like on a partial image, the extraction unit 140 may extract a partial image from the position of the region based on the received information regarding the feature points in the received face image and execute processing based on the perturbation amount on the extracted partial image.
  • the extraction unit 140 may execute the processing based on the perturbation amount on the received face image and extract a partial image from the position of the region based on the received information regarding the feature points in the processed face image.
  • the extraction unit 140 extracts the plurality of partial images as described above.
  • the processing of extracting the partial image from the region obtained by adding the perturbation to the region determined on the basis of the feature points in the face image is also referred to as normalization processing.
  • the extracted partial image is also referred to as a normalized face image.
  • the partial images extracted by the extraction unit 140 include an image of a region near the right eye and an image of a region near the left eye (hereinafter, also referred to as eye region image).
  • the extraction unit 140 first determines four reference coordinates, on the face image, that define the positions and the sizes of the partial images (eye region images of both eyes) using the information regarding the perturbation amount acquired from the perturbation unit 130 .
  • the extraction unit 140 generates four reference coordinates for each variation indicated by the perturbation amount and extracts partial images (eye region images of right eye and left eye) for each variation indicated by the perturbation amount.
  • The reference coordinates A to D respectively indicate the coordinates of the upper left point, the upper right point, the lower right point, and the lower left point of a partial region (illustrated by the references A to D).
  • The reference coordinates A to D are coordinates in a coordinate system defined on the two-dimensional image, and each has a two-dimensional coordinate value.
  • Let the coordinate axes of the coordinate system of the image be the x axis and the y axis.
  • The x coordinate and the y coordinate of the reference coordinate A are referred to as Ax and Ay, respectively.
  • the extraction unit 140 obtains a reference size of the partial image (that is, size of quadrangle defined by reference coordinates A to D) on the basis of the size of the face.
  • the size of the face may be, for example, the interval between the both eyes (distance between right eye and left eye).
  • the extraction unit 140 may use a distance (for example, Euclidean distance) between the position of the pupil of the right eye and the position of the pupil of the left eye of the feature points of the eyes detected by the detection unit 120 as the size of the face.
  • the size of the face may be the interval between the midpoint of the line segment connecting the both eyes and the lowest point of the jaw.
  • the extraction unit 140 may use a distance (for example, Euclidean distance) between the midpoint of the straight line connecting the position of the pupil of the right eye and the position of the pupil of the left eye and the lowest point of the jaw (that is, point in submandibular region) of the feature points of the face detected by the detection unit 120 as the size of the face.
  • the detection unit 120 detects feature points (for example, feature points of eye, or feature points of face including feature points of eye).
  • the extraction unit 140 can calculate the size of the face using the information regarding the feature points received from the detection unit 120 .
  • the extraction unit 140 calculates a width X0 and a height Y0 of the partial image, for example, according to the following formula (1) to set the reference coordinates A to D.
  • In the formula (1), the reference S represents the size of the face, and the reference k represents a predetermined constant.
  • the width X0 and the height Y0 of the partial image are proportional to the size S of the face.
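  • Formula (1) itself is not reproduced in this text. Given that X0 and Y0 are proportional to the size S of the face with the constant k, and that the resulting region is square, formula (1) is presumably of the form:

      X_0 = k \times S,    Y_0 = k \times S    (1)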
  • the constant k may be appropriately determined.
  • the constant k may be, for example, 0.75.
  • the constant k may be any other value.
  • the formula to calculate X0 and Y0 is not limited to the formula (1).
  • The extraction unit 140 sets, for example, a rectangular region (a square, according to the calculation in formula (1)) whose center of gravity is the feature point P1 at the center of the pupil of the right eye and whose two side lengths are X0 and Y0, as the region where the partial image of the right eye (that is, the eye region image) is extracted.
  • the extraction unit 140 sets coordinates of four vertexes of the region as the reference coordinates A to D of the region where the partial image of the right eye is extracted.
  • the extraction unit 140 may set, for example, a rectangular region in such a way that a side having the length of X0 is parallel to the x axis and a side having the length of Y0 is parallel to the y axis.
  • the extraction unit 140 similarly sets the region where the partial image of the left eye (that is, eye region image) is extracted with respect to the feature point P 2 of the center of the pupil of the left eye. Then, coordinates of four vertexes of the region are set to the reference coordinates A to D of the region where the partial image of the left eye is extracted.
  • The relative positions of the reference coordinates A to D of the region where the partial image of the right eye is extracted, with respect to the feature point P1, are respectively expressed by the four vectors (-X0/2, Y0/2), (X0/2, Y0/2), (X0/2, -Y0/2), and (-X0/2, -Y0/2).
  • Similarly, the relative positions of the reference coordinates A to D of the region where the partial image of the left eye is extracted, with respect to the feature point P2, are respectively expressed by the four vectors (-X0/2, Y0/2), (X0/2, Y0/2), (X0/2, -Y0/2), and (-X0/2, -Y0/2).
  • The extraction unit 140 further adds the perturbation to the region determined on the basis of the information regarding the feature points, using the information regarding the perturbation amount received from the perturbation unit 130. Specifically, the extraction unit 140 adds the perturbation to the positions, the sizes, or the like of the reference coordinates A to D using the received information regarding the perturbation amount. In a case where the perturbation is added to the position of the region, the extraction unit 140 adds the perturbation amount of the position (d_xi, d_yi) to each of the reference coordinates A to D.
  • In a case where the perturbation includes a plurality of variations, the extraction unit 140 adds the variations indicated by the respective position perturbation amounts (d_xi, d_yi) to the reference coordinates A to D.
  • The coordinates obtained by adding the variations to the reference coordinates A to D are referred to as perturbed reference coordinates A′ to D′.
  • the perturbated reference coordinates A′ to D′ are also referred to as perturbation reference coordinates A′ to D′.
  • an i-th perturbation reference coordinate A′ is also referred to as (A′x_i, A′y_i).
  • the perturbation reference coordinates B′ to D′ are similarly described. Relationships between the perturbation reference coordinates A′ to D′, the reference coordinates A to D, and the perturbation amount of the position (d_xi, d_yi) are expressed as follows.
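From the definitions above, the relationships are presumably as follows, writing the unperturbed reference coordinates as A = (A_x, A_y) and so on; this is a reconstruction consistent with the description, not the patent's exact notation:

```latex
\begin{aligned}
(A'x_i,\, A'y_i) &= (A_x + d_{xi},\; A_y + d_{yi}) \\
(B'x_i,\, B'y_i) &= (B_x + d_{xi},\; B_y + d_{yi}) \\
(C'x_i,\, C'y_i) &= (C_x + d_{xi},\; C_y + d_{yi}) \\
(D'x_i,\, D'y_i) &= (D_x + d_{xi},\; D_y + d_{yi})
\end{aligned}
```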
  • the extraction unit 140 changes the size of the region by adding the variations to the reference coordinates A to D so as not to move the center of the region. Specifically, the extraction unit 140 adds an amount calculated from the perturbation amount of the size (s_xi, s_yi) to the reference coordinates A to D, for example, as follows. In a case where the perturbation includes the perturbation amounts of a plurality of sizes (s_xi, s_yi) as values indicating a plurality of variations, the extraction unit 140 respectively adds amounts calculated from the perturbation amounts of the plurality of sizes (s_xi, s_yi) to the reference coordinates A to D.
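The amounts added for a size perturbation are presumably as follows, assuming A to D are the vertexes in the order implied by the relative vectors given above and that the expansion by (s_xi, s_yi) keeps the center of the region fixed; again a reconstruction, not the patent's exact formula:

```latex
\begin{aligned}
A' &= A + (-s_{xi}/2,\ +s_{yi}/2), \qquad & B' &= B + (+s_{xi}/2,\ +s_{yi}/2), \\
C' &= C + (+s_{xi}/2,\ -s_{yi}/2), \qquad & D' &= D + (-s_{xi}/2,\ -s_{yi}/2)
\end{aligned}
```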
  • the extraction unit 140 may rotate the reference coordinates A to D in such a way that a line segment connecting the center P1 of the pupil of the right eye and the center P2 of the pupil of the left eye is parallel to two sides of the rectangular (or square) region where the partial image is extracted. Specifically, the extraction unit 140 calculates an angle θ of the line segment connecting the center P1 of the pupil of the right eye and the center P2 of the pupil of the left eye with respect to the horizontal axis of the face image. The extraction unit 140 rotates the reference coordinates A to D of the region including the center P1 of the pupil of the right eye by −θ around the center P1 of the pupil of the right eye.
  • the extraction unit 140 further rotates the reference coordinates A to D of the region including the center P2 of the pupil of the left eye by −θ around the center P2 of the pupil of the left eye. With this rotation, the inclinations of the eyes included in the eye region images are constant regardless of the inclination of the face included in the face image in the horizontal direction.
  • the extraction unit 140 may perform the above-described rotation before processing of adding the perturbation to the region. In a case where the perturbation is a perturbation to be added to the size of the region, the extraction unit 140 may perform the above-described rotation after the processing of adding the perturbation to the region. In this case, the extraction unit 140 also rotates the perturbation reference coordinates A′ to D′.
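As a sketch of the rotation described above, the following rotates the four reference coordinates by −θ around a pupil center so that the inter-pupil line becomes horizontal; the function name and array layout are assumptions for illustration.

```python
import numpy as np

def rotate_reference_coords(coords, pupil_center, theta_rad):
    """Rotate a 4x2 array of reference coordinates by -theta around the pupil center.

    theta_rad is the angle of the line P1-P2 with respect to the horizontal axis, in radians.
    """
    c, s = np.cos(-theta_rad), np.sin(-theta_rad)
    rotation = np.array([[c, -s],
                         [s,  c]])
    center = np.asarray(pupil_center, dtype=float)
    return (coords - center) @ rotation.T + center
```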
  • FIG. 5 is a diagram schematically illustrating an example of a region obtained by adding a perturbation to a region and a partial image.
  • Partial images 411 and 421 in FIG. 5 indicate partial images extracted from regions generated by adding the perturbations to the regions where the partial images 410 and 420 are extracted.
  • in FIG. 5 , for simplification, only the partial image extracted from the region to which the above-described variation with variation number i = 3 is added is illustrated.
  • A′ to D′ in the partial images 411 and 421 respectively indicate points indicated by the perturbation reference coordinates A′ to D′.
  • the reference S indicates the size of the face. In the example illustrated in FIG. 5 , the reference S indicates the interval between both eyes (that is, the distance between the points P1 and P2).
  • the extraction unit 140 adds the perturbation amount of the position (d_x3, d_y3), for example, (0.08 × S, 0.08 × S), to the reference coordinates A to D. Because the size S of the face is not negative, 0.08 × S is also not negative.
  • the x and y coordinates of the perturbation reference coordinates A′ to D′ are values obtained by adding non-negative values to the x and y coordinates of the reference coordinates A to D. Therefore, the region indicated by the perturbation reference coordinates A′ to D′ corresponds to a region that is obtained by moving the region indicated by the reference coordinates A to D to the lower right direction in the image.
  • the references A′ to D′ in FIG. 5 indicate this state.
  • in a case where the face orientation is estimated, the extraction unit 140 extracts an image of a region of the entire face as a partial image.
  • the extraction of the partial image in a case where the face orientation is estimated and the extraction of the partial image in a case where the line of sight is estimated are different from each other in two points, that is, the magnitude of k in the formula (1) and the center position of the reference coordinates A to D.
  • the constant k in the formula (1) that defines the magnitudes of the reference coordinates A to D may be 2.5, not 0.75, in a case where the face orientation is estimated.
  • the center position of the reference coordinates A to D may be the center position of the face, for example, the top of the nose, not the center of the pupil, in a case where the face orientation is estimated.
  • the reference coordinates A to D indicating the region where the extraction unit 140 extracts the partial image are calculated on the basis of the feature points detected by the detection unit 120 .
  • in a case where imaging conditions are poor, a shield (an occluding object) exists, or the image quality of the face image from which the feature points are extracted is low, the detection unit 120 may not be able to accurately detect the feature points of the face, and the positions of the detected feature points may deviate from the positions of the actual feature points.
  • the position and the size of the region where the partial image is extracted may be different from the position and the size of the region in a case where the positions of the feature points can be accurately detected.
  • because the perturbation in the present example embodiment is added to the region determined on the basis of the detected feature points, a plurality of regions where the partial image is extracted are set around the region determined on the basis of the detected feature points. Even in a case where the feature points are not accurately detected, there is a possibility that any one of the partial images extracted from the regions generated by adding the perturbation is an image suitable for estimating the direction of the person (that is, estimation of line of sight or face orientation). If an image suitable for the estimation of the direction of the person is included in the plurality of partial images, the estimation unit 150 , which will be described in detail later, can accurately estimate the direction of the person on the basis of that image. In other words, the estimation unit 150 can estimate the direction of the person with a high reliability.
  • the integration unit 160 to be described in detail later integrates the plurality of estimated directions of the person on the basis of the reliability. If the direction of the person with a high reliability is estimated, a possibility increases that the direction of the person obtained by integrating the plurality of estimated directions of the person is a correct direction of the person. In other words, the estimation device 100 according to the present example embodiment can suppress deterioration in the accuracy for estimating the direction of the person caused because the state of the face in the input image is not suitable for the detection of the feature points with high accuracy.
  • the estimation unit 150 estimates a direction of a person included in a face image (for example, at least one of line of sight of person and face orientation of person).
  • the line of sight indicates a direction to which the person looks.
  • the face orientation indicates a direction in which the face of the person faces.
  • the estimation unit 150 estimates a direction of a person on the basis of the plurality of partial images normalized by the extraction unit 140 (that is, plurality of images extracted by extraction unit 140 ).
  • the estimation unit 150 estimates the direction of the person using an estimator that is learned in advance in such a way as to estimate the direction of the person on the basis of the input image of the face.
  • a method for learning the estimator may be any one of existing learning methods.
  • the estimation unit 150 causes the estimator to learn in advance a relationship between an appearance of the face in the input image of the face and the line of sight or the face orientation using a plurality of images of the face in which the direction of the person is specified in advance (in other words, image of face including correct answer).
  • the image of the face is, for example, a partial image extracted from a region determined on the basis of the feature points of the face that are given as the correct feature points of the face.
  • the estimation unit 150 estimates the line of sight or the face orientation using the estimator that has performed learning.
  • the estimation unit 150 outputs data of the estimation result to the integration unit 160 .
  • the estimation unit 150 includes an estimator that estimates the line of sight.
  • the estimation unit 150 includes an estimator that estimates the face orientation.
  • the estimation unit 150 includes the estimator that estimates the line of sight and the estimator that estimates the face orientation.
  • the estimation unit 150 may learn in advance the estimator that estimates the direction of the line of sight on the basis of the image of the face and the estimator that estimates the face orientation on the basis of the image of the face. Then, the estimation unit 150 may send the direction of the line of sight estimated by the estimator that estimates the direction of the line of sight on the basis of the image of the face and the face orientation estimated by the estimator that estimates the face orientation on the basis of the image of the face to the integration unit 160 .
  • the direction of the person estimated by the estimator is represented by a vector (g_x, g_y).
  • in a case where the estimator estimates the line of sight, the vector (g_x, g_y) represents the estimated line of sight.
  • in a case where the estimator estimates the face orientation, the vector (g_x, g_y) represents the estimated face orientation.
  • the vector (g_x, g_y) is a vector in a coordinate system defined in the image.
  • the estimated line of sight is represented by a vector (g_x, g_y).
  • the reference g_x indicates an angle of the line of sight in the horizontal direction.
  • the reference g_y indicates an angle of the line of sight in the vertical direction.
  • the vector (g_x, g_y) may represent a relative direction with respect to the front of the face.
  • the line of sight may represent a difference between the direction to which the person looks and the direction of the front of the person's face.
  • in that case, the direction to which the eyes of the imaged person look is not specified by the vector (g_x, g_y) of the line of sight alone, but is specified by the vector (g_x, g_y) together with the face orientation of the person.
  • the line of sight estimated by the estimator may use a direction to a camera (that is, direction from eye to camera) as a reference, instead of using the front of the face as a reference.
  • the estimated face orientation is represented by a vector (g_x, g_y).
  • the reference g_x indicates an angle of the face orientation in the horizontal direction.
  • the reference g_y indicates an angle of the face orientation in the vertical direction.
  • the estimation unit 150 learns the estimator in advance in such a way as to estimate the direction of the person (for example, line of sight or face orientation) by any one of supervised learning methods.
  • for example, an angle of a line of sight or a face orientation and its reliability are estimated using Generalized Learning Vector Quantization (GLVQ) as the supervised learning method.
  • the reliability is a value indicating how reliable an angle of a line of sight or a face orientation estimated by the estimator is.
  • the learning method to be used may be a method other than the GLVQ as long as a learning method can estimate the angle of the line of sight or the face orientation and its reliability.
  • a Support Vector Machine (SVM) can be used.
  • a plurality of combinations of an image of a face in which a direction of a person is specified (that is, partial image) and the specified direction of the person is input to the acquisition unit 110 .
  • the acquisition unit 110 sends the plurality of combinations of the image of the face in which the direction of the person is specified and the specified direction of the person to the estimation unit 150 .
  • the estimation unit 150 receives the plurality of combinations of the image of the face in which the direction of the person is specified and the specified direction of the person via the acquisition unit 110 .
  • the direction of the person in this case is a correct answer of the direction to be estimated by the estimator (that is, line of sight or face orientation).
  • the direction of the person is represented by a vector (g_x, g_y).
  • the estimation unit 150 classifies continuous “angles” into discrete “classes” by discretizing the angles of the direction of the person in the horizontal direction and the vertical direction. Specifically, for example, in a case where the direction of the person is a line of sight, the estimation unit 150 discretizes the components of the line of sight vector (g_x, g_y) in the horizontal direction and the vertical direction in a range from −30 degrees to +30 degrees in steps of 10 degrees.
  • the line of sight angle in the horizontal direction is divided into six ranges including a range of −30 degrees to −20 degrees, a range of −20 degrees to −10 degrees, a range of −10 degrees to zero degrees, a range of zero degrees to +10 degrees, a range of +10 degrees to +20 degrees, and a range of +20 degrees to +30 degrees.
  • the line of sight angle in the vertical direction is divided into six ranges including a range of −30 degrees to −20 degrees, a range of −20 degrees to −10 degrees, a range of −10 degrees to zero degrees, a range of zero degrees to +10 degrees, a range of +10 degrees to +20 degrees, and a range of +20 degrees to +30 degrees.
  • the line of sight is classified into any one of the above-described 36 (6 × 6) ranges by discretizing the line of sight represented by the vector (g_x, g_y) as described above.
  • the estimation unit 150 classifies the direction of the person into any one of 37 classes including the 36 classes and a negative example class related to an image of a region other than the eyes and the face. For example, the numbers may be assigned to the 36 classes in such a way that the class number is smaller as the lower limit value of the range in the vertical direction is smaller and, among classes having the same lower limit value in the vertical direction, smaller as the lower limit value of the range in the horizontal direction is smaller.
  • a number of one may be assigned to a class of which the range in the horizontal direction is from −30 degrees to −20 degrees and the range in the vertical direction is from −30 degrees to −20 degrees.
  • a number of two may be assigned to a class of which the range in the horizontal direction is from −20 degrees to −10 degrees and the range in the vertical direction is from −30 degrees to −20 degrees.
  • in a case where the vector (g_x, g_y) is (−15, −15), the class into which the vector is classified is the class of which the range in the horizontal direction is from −20 degrees to −10 degrees and the range in the vertical direction is from −20 degrees to −10 degrees.
  • the number eight is assigned to that class.
  • the number assigned to the negative example class is, for example, zero.
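A minimal sketch of the class numbering described above, assuming 10-degree bins from −30 to +30 degrees, classes 1 to 36 ordered first by the vertical lower limit and then by the horizontal lower limit, and 0 reserved for the negative example class; the function name and the handling of out-of-range angles are assumptions for illustration (in the patent, class 0 is for images of regions other than the eyes and the face, not for out-of-range angles).

```python
def angle_to_class(gx, gy):
    """Map a direction (gx, gy) in degrees to one of the classes 1..36, or 0 if out of range."""
    if not (-30 <= gx < 30 and -30 <= gy < 30):
        return 0  # assumption: treat out-of-range angles like the negative example class
    col = int((gx + 30) // 10)    # horizontal bin index, 0..5
    row = int((gy + 30) // 10)    # vertical bin index, 0..5
    return row * 6 + col + 1      # class number 1..36

# Example from the text: (-15, -15) falls in the class whose ranges are
# -20..-10 degrees horizontally and vertically, i.e. class 8.
assert angle_to_class(-15, -15) == 8
```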
  • the reason why the negative example class is added is, for example, to train the estimator in such a way that the estimator outputs information indicating that the partial image is not an estimation target, instead of outputting a direction, in a case where a partial image extracted from a region other than the face is input to the estimator.
  • such a partial image can be input, for example, in a case where the detection unit 120 fails to detect the feature points of the face.
  • without the negative example class, the estimator classifies the input partial image into any one of the 36 classes.
  • with the negative example class, the estimator can output information indicating that the partial image is not an estimation target in the case described above.
  • the estimation unit 150 makes the estimator perform learning by learning a relationship between the partial image normalized by the extraction unit 140 and the class into which the direction of the person in the partial image is classified, for example, using the Generalized Learning Vector Quantization (GLVQ). Specifically, the estimation unit 150 learns a multi-class classification problem of 37 classes by the GLVQ. More specifically, the estimation unit 150 calculates an image feature amount f from a partial image (that is, an image of a face for which the correct direction of the person is given). The image feature amount f is represented by a vector. The estimation unit 150 adjusts a reference vector m in such a way as to optimize an evaluation value J_k that is calculated from the calculated image feature amount f and the reference vector m according to the formula (2). Specifically, as described later, the estimation unit 150 adjusts the reference vector m, for example, in such a way that the value of the evaluation value J_k approaches −1.
  • a function d(x, y) is a function used to calculate a distance (for example, Euclidean distance or the like) between a vector x and a vector y.
  • M reference vectors m exist in each class. That is, the number of reference vectors is M for each of the 37 classes, and the total number of reference vectors is 37 × M. However, the number of reference vectors for each class does not need to be the same. In the present example embodiment, a case will be described where the number of reference vectors is common to each class and is M.
  • a reference vector m_ki in the formula (2) indicates a reference vector having the shortest distance to the image feature amount f, that is, the reference vector closest to the image feature amount f among all the reference vectors determined according to the GLVQ.
  • a class to which the reference vector closest to the image feature amount f belongs is indicated by k.
  • the reference vector m_ki indicates an i-th reference vector of the M reference vectors belonging to the class k.
  • a reference vector m_lj in the formula (2) indicates a reference vector that is the next closest to f, excluding the M reference vectors belonging to the class k.
  • the reference vector m_lj indicates a j-th reference vector of the M reference vectors belonging to a class l.
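Formula (2) is presumably the standard GLVQ classification error scale built from the two reference vectors defined above, m_ki (the closest reference vector, belonging to the class k) and m_lj (the closest reference vector outside the class k); this is a reconstruction consistent with the bounds and the learning objective described here, not necessarily the patent's exact expression:

```latex
J_k = \frac{d(f, m_{ki}) - d(f, m_{lj})}{d(f, m_{ki}) + d(f, m_{lj})} \qquad \text{(2)}
```

With this form, J_k approaches −1 when f is much closer to the class k than to any other class, which matches the adjustment of the reference vectors described above, and −1 ≤ J_k ≤ +1 holds because distances are non-negative.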
  • the image feature amount f indicates a direction and a magnitude of a change in luminance in the partial image with a predetermined number of dimensions (for example, several hundreds to thousands).
  • the image feature amount f is, for example, an image feature amount regarding a luminance gradient of the image.
  • as an image feature amount regarding the luminance gradient, for example, Histograms of Oriented Gradients (HOG) is known.
  • the image feature amount f is represented by a column vector having a predetermined number of elements.
  • the reference vectors m_ki and m_lj are column vectors.
  • the number of elements of each of the reference vectors m_ki and m_lj is the same as the number of elements of the image feature amount f. Therefore, the estimation unit 150 can calculate a distance between the image feature amount f and each of the reference vectors m_ki and m_lj.
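As an illustration of a luminance-gradient feature and of the distance computation described above, the following sketch computes a HOG descriptor with scikit-image and a Euclidean distance to a reference vector; the parameter values are assumptions, not the patent's.

```python
import numpy as np
from skimage.feature import hog

def image_feature(partial_image_gray):
    """Compute a HOG feature vector (1-D array) from a grayscale, size-normalized partial image."""
    return hog(partial_image_gray,
               orientations=9,
               pixels_per_cell=(8, 8),
               cells_per_block=(2, 2))

def distance(f, m):
    """Euclidean distance d(f, m) between the image feature amount and a reference vector."""
    return float(np.linalg.norm(f - m))
```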
  • the evaluation value J_k in the formula (2) is referred to as a classification error scale in the GLVQ.
  • the evaluation value J_k satisfies −1 ≤ J_k ≤ +1.
  • the smaller the evaluation value J_k is (that is, the closer it is to −1), the higher the accuracy that the image feature amount f belongs to the class k.
  • the estimation unit 150 determines an optimum reference vector m by the supervised learning using the GLVQ.
  • the determined reference vector m is used when the estimator estimates an angle.
  • the learning of the estimator described above may be, for example, the determination of the reference vector m.
  • a method for estimating an angle by the estimator may be, for example, a method according to “coarse angle estimation” to be described below.
  • the method for estimating an angle by the estimator may be, for example, a method according to “detailed angle estimation” to be described below.
  • the estimator estimates an angle of a direction of a person (that is, line of sight or face orientation) using the reference vector m determined according to the GLVQ and further estimates a reliability according to the formula (2).
  • the estimator first obtains a reference vector closest to the image feature amount f calculated from the extracted partial image, from among all the reference vectors determined according to the GLVQ.
  • in a case where the reference vector closest to the image feature amount f is a reference vector m_Ki belonging to the class K, the estimator estimates that the angle of the line of sight or the face orientation of the face image input to the acquisition unit 110 is included in the range of the angle of the class K.
  • for example, in a case where the class K is the eighth class described above, the direction of the person is included in the range of the angle of the eighth class, of which the range in the horizontal direction is from −20 degrees to −10 degrees and the range in the vertical direction is from −20 degrees to −10 degrees.
  • the estimator may output an angle of the center in the range of the angle of the class to which the reference vector closest to the image feature amount f belongs as the estimation result.
  • the angle of the center in the range from −20 degrees to −10 degrees in the horizontal direction is −15 degrees.
  • the angle of the center in the range from −20 degrees to −10 degrees in the vertical direction is also −15 degrees.
  • the estimator may set the direction of −15 degrees in the horizontal direction and −15 degrees in the vertical direction as the estimated direction of the person (that is, line of sight or face orientation).
  • the estimator calculates the evaluation value J_k according to the formula (2).
  • the evaluation value J_k satisfies −1 ≤ J_k ≤ +1.
  • the estimator may set a value obtained by inverting the sign of the evaluation value J_k as the reliability.
  • in that case, the reliability is −J_k.
  • the reliability is included in a range from −1 to +1. The larger the value of the reliability is, the higher the reliability of the angle of the direction of the person (that is, line of sight or face orientation) estimated by the estimator is.
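A minimal sketch of the coarse angle estimation described above, assuming the reference vectors are stored per class, class centers follow the 10-degree bins introduced earlier, and the reliability is −J_k; all names and data layouts are hypothetical.

```python
import numpy as np

def coarse_estimate(f, reference_vectors):
    """Estimate a coarse direction and its reliability from an image feature amount f.

    reference_vectors: dict mapping class number (0..36) to an array of shape (M, len(f)).
    Returns ((gx, gy) in degrees, or None for the negative example class, and reliability -J_k).
    """
    # Distance from f to the closest reference vector of each class.
    nearest = {c: float(np.min(np.linalg.norm(vecs - f, axis=1)))
               for c, vecs in reference_vectors.items()}
    k = min(nearest, key=nearest.get)                        # class of the overall closest vector
    d_k = nearest[k]                                         # d(f, m_ki)
    d_other = min(d for c, d in nearest.items() if c != k)   # d(f, m_lj)
    j_k = (d_k - d_other) / (d_k + d_other)                  # classification error scale
    reliability = -j_k
    if k == 0:                                               # negative example class
        return None, reliability
    row, col = divmod(k - 1, 6)
    center = (-30 + 10 * col + 5, -30 + 10 * row + 5)        # center of the class's angle range
    return center, reliability
```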
  • the estimator may calculate an evaluation value J_k for each class according to the formula (3) described later using the reference vector m determined according to the GLVQ and estimate a more detailed angle on the basis of the calculated evaluation value J_k.
  • the estimator first obtains a reference vector closest to the image feature amount f calculated from the extracted partial image, from among all the reference vectors determined according to the GLVQ.
  • the classes around the class k may be, for example, nine classes in total, including the class k, that are classes in a 3 × 3 region around the region of the class k.
  • the classes around the eighth class are the eighth class and eight classes of which angle regions are adjacent to the region of the angle of the eighth class.
  • the estimator obtains a reference vector closest to the image feature amount f for each of the classes.
  • the formula (3) is different from the formula (2) in that each of the second terms of the denominator and the numerator is a distance between the image feature amount f and a reference vector m_0j.
  • the reference vector m_0j is a reference vector of the 0-th class, that is, the reference vector closest to the image feature amount f among the reference vectors belonging to the negative example class related to the image of the region other than the eyes and the face.
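From the description that the second terms of the numerator and the denominator use the distance to m_0j, formula (3) is presumably of the following form, evaluated for the class k and each surrounding class; again a reconstruction rather than the patent's exact expression:

```latex
J_k = \frac{d(f, m_{ki}) - d(f, m_{0j})}{d(f, m_{ki}) + d(f, m_{0j})} \qquad \text{(3)}
```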
  • the estimator estimates, as the detailed direction of the person (that is, line of sight or face orientation), the angle indicated by the apex obtained from the evaluation values J_k calculated for the class k and its surrounding classes (for example, the apex of a curved surface fitted to those evaluation values).
  • the estimator further calculates a reliability of the estimated direction of the person (that is, reliability of angle indicated by obtained apex).
  • the estimation unit 150 estimates the direction of the person and the reliability by the estimator for each extracted partial image and sends the estimated direction of the person and the estimated reliability (specifically, data indicating direction of person and reliability) to the integration unit 160 .
  • the estimation unit 150 may estimate both of the line of sight and the face orientation as the direction of the person. In this case, the estimation unit 150 may separately estimate a reliability of the angle indicating the line of sight and a reliability of the angle indicating the face orientation. Then, the estimation unit 150 sends the reliability of the angle indicating the line of sight and the reliability of the angle indicating the face orientation to the integration unit 160 .
  • the integration unit 160 receives the data (hereinafter, referred to as “estimation data”) indicating the direction of the person (that is, line of sight or face orientation) and the reliability estimated by the estimation unit 150 from the estimation unit 150 .
  • the integration unit 160 integrates the directions of the person included in the received estimation data on the basis of the reliability included in the estimation data. As described above, the direction of the person is represented by an angle.
  • the integration unit 160 may receive both of the direction of the line of sight and the face orientation from the estimation unit 150 . In this case, the integration unit 160 separately integrates the direction of the line of sight and the face orientation.
  • the integration unit 160 integrates the direction of the person on the basis of the reliability as follows.
  • the integration unit 160 may specify an angle indicating a direction of a person of which a reliability is higher than a predetermined threshold among the directions of the person (that is, line of sight or face orientation represented by angle) estimated by the estimation unit 150 .
  • the integration unit 160 may calculate an average of the specified angles indicating the direction of the person as the integrated angle indicating the direction of the person (that is, line of sight or face orientation).
  • the integration unit 160 may first, for example, normalize the reliability. Specifically, first, the integration unit 160 may add a value obtained by inverting a sign of a value of the lowest reliability to all the reliabilities in such a way that the value of the lowest reliability is set to be zero. The integration unit 160 may further normalize the reliability by dividing all the reliabilities by a total sum of the reliabilities in such a way that a total sum of the normalized reliabilities is set to be one.
  • the integration unit 160 may assume the normalized reliability as a weight and calculate a weighted average of all the angles indicating the directions of the person (that is, line of sight or face orientation) as the angle indicating the integrated direction of the person. Specifically, the integration unit 160 may calculate, for each angle indicating the direction of the person, a product of the angle and its weight, and calculate a total sum of the products.
  • the integration unit 160 may set an angle indicating the direction of the person with the highest reliability as the integrated angle indicating the direction of the person.
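A minimal sketch of two of the integration strategies described above, the weighted average with normalized reliabilities and the selection of the most reliable direction; the function names and array layouts are assumptions.

```python
import numpy as np

def integrate_weighted(angles, reliabilities):
    """Weighted average of estimated directions.

    angles: array-like of shape (N, 2), one (gx, gy) per partial image.
    reliabilities: array-like of shape (N,), values in [-1, +1].
    """
    angles = np.asarray(angles, dtype=float)
    r = np.asarray(reliabilities, dtype=float)
    r = r - r.min()                                   # lowest reliability becomes zero
    weights = r / r.sum() if r.sum() > 0 else np.full(len(r), 1.0 / len(r))
    return tuple(weights @ angles)                    # sum of the angle-weight products

def integrate_best(angles, reliabilities):
    """Select the direction whose reliability is the highest."""
    return tuple(np.asarray(angles, dtype=float)[int(np.argmax(reliabilities))])
```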
  • the integration unit 160 sends integrated data indicating the direction of the person (that is, line of sight or face orientation) to the output unit 170 .
  • the output unit 170 receives the data indicating the line of sight or the face orientation integrated by the integration unit 160 (hereinafter, referred to as “integrated data”) from the integration unit 160 .
  • the output unit 170 outputs the integrated data.
  • the estimation data is, for example, data that indicates, in accordance with a predetermined format, the direction of the person (that is, line of sight or face orientation) integrated by the integration unit 160 .
  • the output unit 170 may output the estimation data, for example, to another device such as a display device. That is, the output unit 170 may supply the estimation data to the other device.
  • the output unit 170 may superimpose a mark indicating the direction of the person on the input image and output the input image on which the mark indicating the direction of the person is superimposed (also referred to as output image) to the display device.
  • the output unit 170 may superimpose, for example, a mark, such as an arrow, indicating the direction of the line of sight on a position based on the center of the extracted pupil in the input image and output the input image on which the mark is superimposed to the display device.
  • the position based on the center of the extracted pupil may be, for example, a midpoint of a line segment connecting the position of the pupil of the right eye and the position of the pupil of the left eye.
  • the position based on the center of the extracted pupil may be, for example, a point that is away from the above-described midpoint by a predetermined distance in the direction of the line of sight.
  • the output unit 170 may superimpose an arrow starting from the midpoint described above or the point that is away from the midpoint by the predetermined distance in the direction of the line of sight on the input image.
  • the output unit 170 may superimpose a mark, for example, an arrow or the like indicating the face orientation on a position based on the feature points of the face in the input image and output the input image on which the mark is superimposed to the display device.
  • the position based on the feature points of the face may be, for example, the point indicating the top of the nose.
  • the position based on the feature points of the face may be, for example, a point that is away from the point indicating the top of the nose by a predetermined distance in the direction of the face orientation.
  • the output unit 170 may superimpose, for example, an arrow starting from the position based on the feature points of the face on the input image.
  • the output unit 170 may superimpose a mark indicating the direction of the line of sight and a mark indicating the face orientation in the integrated data on the input image.
  • the output unit 170 may write the estimation data to a storage medium included in the estimation device 100 or a storage device communicably connected to the estimation device 100 .
  • the estimation device 100 having the configuration described above operates, for example, as described below. However, a specific operation of the estimation device 100 is not limited to an example of an operation to be described below.
  • FIG. 6 is a flowchart illustrating an example of the operation of the estimation device 100 according to the present example embodiment.
  • FIG. 6 is a flowchart illustrating an estimation method for estimating the direction of the person (at least one of line of sight and face orientation) executed by the estimation device 100 according to the present example embodiment.
  • the estimation device 100 may, for example, estimate a direction of a person from a face image by sequentially executing processing in each step illustrated in FIG. 6 according to the flow illustrated in FIG. 6 .
  • the estimation device 100 can start the processing illustrated in FIG. 6 at an appropriate timing, for example, a timing designated by a user, a timing when the input image is transmitted from the other device, or the like.
  • image data input to the estimation device 100 includes a face of a person.
  • the coordinates on the image are represented by a Cartesian coordinate system having a predetermined position (for example, center of image) as an origin.
  • the acquisition unit 110 acquires an input image (step S 101 ).
  • the acquisition unit 110 extracts a face region from the acquired input image (step S 102 ).
  • the acquisition unit 110 may detect the face region in such a way that a single face region includes a single face.
  • the acquisition unit 110 may extract one or more face regions from the input image.
  • the acquisition unit 110 generates a face image from an image of the extracted face region of the input image.
  • the acquisition unit 110 may generate one or a plurality of face images. Each of the face images includes a face of one person.
  • the detection unit 120 detects feature points of a part of the face included in the face image generated in step S 102 (step S 103 ).
  • the perturbation unit 130 calculates a magnitude of a perturbation (that is, perturbation amount) to be added to a region (specifically, position or size of region) determined on the basis of the detected feature points using information regarding the feature points of the face calculated in step S 103 (step S 104 ).
  • the perturbation amount may include a value indicating a plurality of variations.
  • using the face image generated in step S 102 and the perturbation amount calculated in step S 104 , the extraction unit 140 extracts a partial image of the face image in each region obtained by adding the perturbation to the region determined on the basis of the detected feature points (step S 105 ).
  • the extraction unit 140 may extract a plurality of partial images including the partial image extracted from the region determined on the basis of the detected feature points.
  • the extraction unit 140 may extract a plurality of partial images obtained by adding the plurality of variations, indicated by the perturbation amount, to the region determined on the basis of the detected feature points, in the face image.
  • the estimation unit 150 estimates a direction of the person (that is, line of sight or face orientation) and a reliability from each of the plurality of partial images generated in step S 105 using the estimator that has performed machine learning in advance (step S 106 ).
  • the integration unit 160 integrates the directions of the person (that is, line of sight or face orientation) estimated by the estimation unit 150 on the basis of the reliability (step S 107 ).
  • the output unit 170 outputs estimation data indicating the direction of the person integrated by the integration unit 160 (step S 108 ).
  • the estimation data is visualized, for example, by being output to the display device.
  • the estimation data may be displayed as a numerical value or may be displayed by an arrow, indicating the direction of the line of sight, that is superimposed on the face image.
  • the first example embodiment can be modified, for example, as the following modification. Two or more modifications to be described later can be appropriately combined.
  • a user may input a position of a feature point such as the center of the right eye, the center of the left eye, or the like and a position of a region where a partial image is extracted.
  • the estimation device 100 does not need to detect the feature points and does not need to generate the partial image.
  • the shape of the partial image is not necessarily limited to a rectangle.
  • the partial image may exclude a part of the face (for example, a part such as the eyebrows) that does not directly affect the estimation of the direction of the person.
  • a partial image used to estimate the line of sight may be a partial image including both eyes, not a partial image including only one eye (left eye or right eye).
  • the estimation device 100 may be applied to a system that estimates a line of sight of a person imaged by a monitoring camera installed in a shop and determines a suspicious person from the estimated line of sight.
  • the estimation device 100 may be applied to a system that estimates a line of sight of a user who faces a screen where information is displayed and estimates interests and concerns of the user on the basis of the estimated line of sight.
  • the estimation device 100 may be applied to an electronic device that can be operated according to a movement of the line of sight.
  • the estimation device 100 may be applied to driving assistance of an automobile or the like.
  • a specific hardware configuration of the estimation device 100 may variously vary and is not limited to a particular configuration.
  • the estimation device 100 may be implemented using software.
  • the estimation device 100 may be configured in such a way that a plurality of pieces of hardware share a plurality of processes.
  • the configuration of the present modification will be described in detail in the following description regarding the other example embodiment.
  • the estimation device 100 extracts a plurality of partial images from a plurality of regions obtained by adding a perturbation to a position, a size, or the like of a region where the partial image is extracted.
  • the estimation device 100 estimates a direction of a person (that is, line of sight or face orientation) from the plurality of extracted partial images.
  • the estimation device 100 obtains a result of the estimation of the direction of the person (for example, line of sight or face orientation) by integrating the estimated directions of the person on the basis of a reliability. In this way, the estimation device 100 can stably obtain a robust estimation result by integrating, according to the reliability, the estimation results based on the plurality of partial images extracted from the regions obtained by adding the perturbation.
  • FIG. 7 is a block diagram illustrating an example of a configuration of an estimation device 101 according to the present example embodiment.
  • the estimation device 101 includes a perturbation unit 130 , an estimation unit 150 , and an integration unit 160 .
  • the perturbation unit 130 generates a plurality of extraction regions by adding a perturbation to an extraction region of a partial image determined on the basis of positions of feature points extracted from a face image.
  • the estimation unit 150 estimates a plurality of directions of at least one of a face and a line of sight, and a reliability of each of the plurality of directions, on the basis of the plurality of partial images in the plurality of extraction regions of the face image.
  • the integration unit 160 calculates an integrated direction obtained by integrating the plurality of directions on the basis of the estimated reliability.
  • FIG. 8 is a flowchart illustrating an example of an operation of the estimation device 101 according to the present example embodiment.
  • the perturbation unit 130 generates the plurality of extraction regions by adding the perturbation to the extraction region of the partial image determined on the basis of the positions of the feature points extracted from the face image (step S 201 ).
  • the perturbation unit 130 according to the present example embodiment may operate similarly to the perturbation unit 130 according to the first example embodiment.
  • the estimation unit 150 estimates the plurality of directions of at least one of the face and the line of sight, and the reliability of each of the plurality of directions, on the basis of the plurality of partial images in the plurality of extraction regions of the face image (step S 202 ).
  • the estimation unit 150 according to the present example embodiment may estimate the direction and the reliability by an estimator that is made to perform learning in advance in such a way as to estimate the direction and the reliability on the basis of the partial images, similarly to the estimation unit 150 according to the first example embodiment.
  • the integration unit 160 calculates an integrated direction obtained by integrating the plurality of directions on the basis of the estimated reliability (step S 203 ).
  • the integration unit 160 may integrate the plurality of directions on the basis of the reliability by the method similar to that of the integration unit 160 according to the first example embodiment.
  • the estimation device 101 can suppress deterioration in accuracy for estimating a line of sight or a face orientation in an image of a person due to a state of the image.
  • the perturbation unit 130 generates the plurality of extraction regions by adding the perturbation to the extraction region of the partial image determined on the basis of positions of feature points extracted from the face image.
  • the estimation unit 150 estimates the directions and the reliabilities of the directions from the plurality of generated extraction regions.
  • the integration unit 160 calculates the integrated direction obtained by integrating the plurality of directions estimated by the estimation unit 150 on the basis of the reliability estimated by the estimation unit 150 . In a case where the positions of the feature points extracted from the face image are inaccurate, there is a case where a partial image extracted from an extraction region determined on the basis of the positions is not suitable for the estimation of the direction.
  • the estimation device 100 can suppress the deterioration in the accuracy for estimating a line of sight or a face orientation in an image of a person due to a state of the image.
  • the estimation device 100 can be implemented by a computer that includes a memory to which a program is loaded and a processor that executes the program.
  • the estimation device 100 can be implemented by a plurality of computers communicably connected to each other.
  • the estimation device 100 can be implemented by dedicated hardware.
  • the estimation device 100 can be also implemented by a combination of the above-described computer and dedicated hardware.
  • the estimation device 101 can be implemented by a computer that includes a memory to which a program is loaded and a processor that executes the program.
  • the estimation device 101 can be implemented by a plurality of computers communicably connected to each other.
  • the estimation device 101 can be implemented by dedicated hardware.
  • the estimation device 101 can be also implemented by a combination of the above-described computer and dedicated hardware. More detailed description is made below.
  • FIG. 9 is a block diagram illustrating an example of a hardware configuration of a computer 300 that can implement the estimation device 100 and the estimation device 101 .
  • the computer 300 includes a Central Processing Unit (CPU) 301 , a Read Only Memory (ROM) 302 , a Random Access Memory (RAM) 303 , a storage device 304 , a drive device 305 , a communication interface 306 , and an input/output interface 307 .
  • the CPU 301 executes a program 308 loaded to the RAM 303 .
  • the program 308 may be stored in the ROM 302 .
  • the program 308 may be recorded in a storage medium 309 such as a memory card and be read by the drive device 305 .
  • the program 308 may be transmitted from an external device to the computer 300 via a communication network 310 .
  • the communication interface 306 exchanges data with an external device via the communication network 310 .
  • the input/output interface 307 exchanges data with peripheral devices (for example, input device, display device, or the like).
  • the communication interface 306 and the input/output interface 307 can function as components that acquire and output data.
  • the components of the estimation device 100 can be implemented by a processor such as the CPU 301 that executes a program such as the program 308 , for implementing the functions of the components of the estimation device 100 , loaded to the memory such as the RAM 303 .
  • the components of the estimation device 100 are, for example, an acquisition unit 110 , a detection unit 120 , a perturbation unit 130 , an extraction unit 140 , an estimation unit 150 , an integration unit 160 , and an output unit 170 .
  • the components of the estimation device 101 can be implemented by a processor such as the CPU 301 that executes a program such as the program 308 , for implementing the functions of the components of the estimation device 101 , loaded to the memory such as the RAM 303 .
  • the components of the estimation device 101 are, for example, a perturbation unit 130 , an estimation unit 150 , and an integration unit 160 .
  • the components of the estimation device 100 may be implemented by a single circuit (circuitry) (for example, processor or the like).
  • the components of the estimation device 100 may be implemented by a combination of a plurality of circuits.
  • the circuit and the plurality of circuits may be dedicated circuits or general-purpose circuits.
  • a part of the estimation device 100 may be implemented by a dedicated circuit, and other part may be implemented by a general-purpose circuit.
  • the components of the estimation device 101 may be implemented by a single circuit (circuitry) (for example, processor or the like).
  • the components of the estimation device 101 may be implemented by a combination of a plurality of circuits.
  • the circuit and the plurality of circuits may be dedicated circuits or general-purpose circuits.
  • a part of the estimation device 101 may be implemented by a dedicated circuit, and other part may be implemented by a general-purpose circuit.
  • the computer that implements the estimation device 100 and the estimation device 101 does not need to be a single computer.
  • the components of the estimation device 100 and the components of the estimation device 101 may be separately provided in a plurality of computers.
  • the estimation device 100 and the estimation device 101 may be implemented by a plurality of computer devices in cooperation with each other using the cloud computing technology.
  • An estimation device including:
  • perturbation means for generating a plurality of extraction regions by adding a perturbation to an extraction region of a partial image determined based on positions of feature points extracted from a face image
  • estimation means for estimating a plurality of directions of at least one of a face and a line of sight and a reliability of each of the plurality of directions based on a plurality of partial images in the plurality of extraction regions of the face image;
  • integration means for calculating an integrated direction obtained by integrating the plurality of directions based on the estimated reliability.
  • the perturbation means determines, based on the positions of the feature points, the perturbation to be added to the extraction region determined based on the positions of the feature points.
  • the perturbation means extracts a face region that is a region of the face from the face image, extracts the feature points from the face region, estimates a size of the face based on the positions of the extracted feature points, and determines the perturbation based on the estimated size.
  • the perturbation is at least one of a change in a size of the extraction region, a change in a position of the extraction region, a change in an angle of the extraction region, and image processing on a partial image extracted from the extraction region.
  • the estimation device according to any one of supplementary notes 1 to 4, further including:
  • acquisition means for acquiring an input image and extracting the face image from the input image
  • extraction means for extracting the feature points from the face image
  • output means for outputting the integrated direction.
  • the estimation means estimates a plurality of directions of the face and a plurality of directions of the line of sight
  • the integration means calculates an integrated face direction obtained by integrating the plurality of directions of the face and an integrated line of sight direction obtained by integrating the plurality of directions of the line of sight, and
  • the output means superimposes a first mark indicating the integrated face direction and a second mark indicating the integrated line of sight direction on the input image and outputs the input image on which the first mark and the second mark are superimposed.
  • An estimation method including:
  • the perturbation to be added to the extraction region determined based on the positions of the feature points is determined.
  • the perturbation is at least one of a change in a size of the extraction region, a change in a position of the extraction region, a change in an angle of the extraction region, and image processing on a partial image extracted from the extraction region.
  • estimation processing of estimating a plurality of directions of at least one of a face and a line of sight and a reliability of each of the plurality of directions based on a plurality of partial images in the plurality of extraction regions of the face image;
  • the perturbation processing determines the perturbation to be added to the extraction region determined based on the positions of the feature points, based on the positions of the feature points.
  • the perturbation processing extracts a face region that is a region of the face from the face image, extracts the feature points from the face region, estimates a size of the face based on the positions of the extracted feature points, and determines the perturbation based on the estimated size.
  • the perturbation is at least one of a change in a size of the extraction region, a change in a position of the extraction region, a change in an angle of the extraction region, and image processing on a partial image extracted from the extraction region.
  • the estimation processing estimates a plurality of directions of the face and a plurality of directions of the line of sight
  • the integration processing calculates an integrated face direction obtained by integrating the plurality of directions of the face and an integrated line of sight direction obtained by integrating the plurality of directions of the line of sight
  • the output processing superimposes a first mark indicating the integrated face direction and a second mark indicating the integrated line of sight direction on the input image and outputs the input image on which the first mark and the second mark are superimposed.

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11341770B2 (en) * 2018-03-14 2022-05-24 Omron Corporation Facial image identification system, identifier generation device, identification device, image identification system, and identification system

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI768852B (zh) 2021-04-28 2022-06-21 緯創資通股份有限公司 人體方向之偵測裝置及人體方向之偵測方法

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090304232A1 (en) * 2006-07-14 2009-12-10 Panasonic Corporation Visual axis direction detection device and visual line direction detection method
US20120189160A1 (en) * 2010-08-03 2012-07-26 Canon Kabushiki Kaisha Line-of-sight detection apparatus and method thereof
US20150172553A1 (en) * 2013-10-16 2015-06-18 Olympus Corporation Display device, display method, and computer-readable recording medium
US20150339757A1 (en) * 2014-05-20 2015-11-26 Parham Aarabi Method, system and computer program product for generating recommendations for products and treatments
US20190163966A1 (en) * 2016-07-05 2019-05-30 Nec Corporation Suspicious person detection device, suspicious person detection method, and program
US20190279347A1 (en) * 2016-11-25 2019-09-12 Nec Corporation Image generation device, image generation method, and storage medium storing program
US20200242336A1 (en) * 2019-01-30 2020-07-30 Realnetworks, Inc. Method for selecting images in video of faces in the wild
US20210192184A1 (en) * 2019-12-23 2021-06-24 Ubtech Robotics Corp Ltd Face image quality evaluating method and apparatus and computer readable storage medium using the same

Family Cites Families (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS4829141B1 (ja) 1969-01-25 1973-09-07
JP3695990B2 (ja) 1999-05-25 2005-09-14 三菱電機株式会社 顔画像処理装置
JP4826506B2 (ja) 2007-02-27 2011-11-30 日産自動車株式会社 視線推定装置
JP2009059257A (ja) 2007-09-03 2009-03-19 Sony Corp 情報処理装置、および情報処理方法、並びにコンピュータ・プログラム
EP2338416B1 (en) * 2008-09-26 2019-02-27 Panasonic Intellectual Property Corporation of America Line-of-sight direction determination device and line-of-sight direction determination method
JP5282612B2 (ja) * 2009-03-11 2013-09-04 Omron Corporation Information processing device and method, program, and information processing system
EP2490171B1 (en) * 2009-10-16 2020-11-25 NEC Corporation Person image search starting from clothing query text
JP5406705B2 (ja) 2009-12-28 2014-02-05 Canon Inc Data correction device and method
JP5772821B2 (ja) 2010-05-26 2015-09-02 NEC Corporation Facial feature point position correction device, facial feature point position correction method, and facial feature point position correction program
JP2012038106A (ja) * 2010-08-06 2012-02-23 Canon Inc Information processing device, information processing method, and program
JP5856100B2 (ja) 2013-04-19 2016-02-09 Universal Entertainment Corp Gaming machine and gaming machine management method
KR102365393B1 (ko) * 2014-12-11 2022-02-21 LG Electronics Inc Mobile terminal and control method thereof
CN105913487B (zh) * 2016-04-09 2018-07-06 Beihang University Line-of-sight direction calculation method based on iris contour analysis and matching in human eye images
JP6822482B2 (ja) * 2016-10-31 2021-01-27 NEC Corporation Line-of-sight estimation device, line-of-sight estimation method, and program recording medium

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090304232A1 (en) * 2006-07-14 2009-12-10 Panasonic Corporation Visual axis direction detection device and visual line direction detection method
US20120189160A1 (en) * 2010-08-03 2012-07-26 Canon Kabushiki Kaisha Line-of-sight detection apparatus and method thereof
US20150172553A1 (en) * 2013-10-16 2015-06-18 Olympus Corporation Display device, display method, and computer-readable recording medium
US20150339757A1 (en) * 2014-05-20 2015-11-26 Parham Aarabi Method, system and computer program product for generating recommendations for products and treatments
US20190163966A1 (en) * 2016-07-05 2019-05-30 Nec Corporation Suspicious person detection device, suspicious person detection method, and program
US20190279347A1 (en) * 2016-11-25 2019-09-12 Nec Corporation Image generation device, image generation method, and storage medium storing program
US20200242336A1 (en) * 2019-01-30 2020-07-30 Realnetworks, Inc. Method for selecting images in video of faces in the wild
US20210192184A1 (en) * 2019-12-23 2021-06-24 Ubtech Robotics Corp Ltd Face image quality evaluating method and apparatus and computer readable storage medium using the same

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11341770B2 (en) * 2018-03-14 2022-05-24 Omron Corporation Facial image identification system, identifier generation device, identification device, image identification system, and identification system

Also Published As

Publication number Publication date
US20220180556A1 (en) 2022-06-09
EP3858235A4 (en) 2021-09-08
EP3858235A1 (en) 2021-08-04
WO2020065790A1 (ja) 2020-04-02
US20230360433A1 (en) 2023-11-09
US20220180554A1 (en) 2022-06-09
JPWO2020065790A1 (ja) 2021-09-24
JP7107380B2 (ja) 2022-07-27
US20220180555A1 (en) 2022-06-09

Similar Documents

Publication Publication Date Title
US11775056B2 (en) System and method using machine learning for iris tracking, measurement, and simulation
TWI383325B (zh) Facial expression recognition
US20230360433A1 (en) Estimation device, estimation method, and storage medium
CN102375974B (zh) Information processing apparatus and information processing method
US9117111B2 (en) Pattern processing apparatus and method, and program
Mian et al. Automatic 3D face detection, normalization and recognition
US11232586B2 (en) Line-of-sight estimation device, line-of-sight estimation method, and program recording medium
US8401253B2 (en) Distinguishing true 3-d faces from 2-d face pictures in face recognition
US11210498B2 (en) Facial authentication device, facial authentication method, and program recording medium
JP6410450B2 (ja) Object identification device, object identification method, and program
US9858501B2 (en) Reliability acquiring apparatus, reliability acquiring method, and reliability acquiring program
US11462052B2 (en) Image processing device, image processing method, and recording medium
JP2012048326A (ja) Image processing device and program
US11887331B2 (en) Information processing apparatus, control method, and non-transitory storage medium
KR101484003B1 (ko) 얼굴 분석 평가 시스템
Xia et al. SDM-based means of gradient for eye center localization
JP7040539B2 (ja) Line-of-sight estimation device, line-of-sight estimation method, and program
Dahmane et al. Learning symmetrical model for head pose estimation
JP7255721B2 (ja) Line-of-sight estimation device, line-of-sight estimation method, and program
Rabba et al. Discriminative robust gaze estimation using kernel-DMCCA fusion
Thomas Detection I
Olsson et al. Recognizing postures and head movements from video sequences

Legal Events

Date Code Title Description
AS Assignment

Owner name: NEC CORPORATION, JAPAN

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:MORISHITA, YUSUKE;REEL/FRAME:055671/0465

Effective date: 20210104

STPP Information on status: patent application and granting procedure in general

Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION

STPP Information on status: patent application and granting procedure in general

Free format text: NON FINAL ACTION MAILED

STPP Information on status: patent application and granting procedure in general

Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER

STPP Information on status: patent application and granting procedure in general

Free format text: NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS

STPP Information on status: patent application and granting procedure in general

Free format text: NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS