WO2021210340A1 - Image processing device, image processing method, and imaging device - Google Patents

Image processing device, image processing method, and imaging device Download PDF

Info

Publication number
WO2021210340A1
WO2021210340A1
Authority
WO
WIPO (PCT)
Prior art keywords
face
subject
inclination
unit
image processing
Prior art date
Application number
PCT/JP2021/011225
Other languages
French (fr)
Japanese (ja)
Inventor
真範 三上
孝之 細川
徳宏 西川
Original Assignee
Sony Group Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sony Group Corporation
Publication of WO2021210340A1 publication Critical patent/WO2021210340A1/en

Links

Images

Classifications

    • GPHYSICS
    • G02OPTICS
    • G02BOPTICAL ELEMENTS, SYSTEMS OR APPARATUS
    • G02B7/00Mountings, adjusting means, or light-tight connections, for optical elements
    • G02B7/28Systems for automatic generation of focusing signals
    • GPHYSICS
    • G02OPTICS
    • G02BOPTICAL ELEMENTS, SYSTEMS OR APPARATUS
    • G02B7/00Mountings, adjusting means, or light-tight connections, for optical elements
    • G02B7/28Systems for automatic generation of focusing signals
    • G02B7/34Systems for automatic generation of focusing signals using different areas in a pupil plane
    • GPHYSICS
    • G03PHOTOGRAPHY; CINEMATOGRAPHY; ANALOGOUS TECHNIQUES USING WAVES OTHER THAN OPTICAL WAVES; ELECTROGRAPHY; HOLOGRAPHY
    • G03BAPPARATUS OR ARRANGEMENTS FOR TAKING PHOTOGRAPHS OR FOR PROJECTING OR VIEWING THEM; APPARATUS OR ARRANGEMENTS EMPLOYING ANALOGOUS TECHNIQUES USING WAVES OTHER THAN OPTICAL WAVES; ACCESSORIES THEREFOR
    • G03B13/00Viewfinders; Focusing aids for cameras; Means for focusing for cameras; Autofocus systems for cameras
    • G03B13/32Means for focusing
    • G03B13/34Power focusing
    • G03B13/36Autofocus systems
    • GPHYSICS
    • G03PHOTOGRAPHY; CINEMATOGRAPHY; ANALOGOUS TECHNIQUES USING WAVES OTHER THAN OPTICAL WAVES; ELECTROGRAPHY; HOLOGRAPHY
    • G03BAPPARATUS OR ARRANGEMENTS FOR TAKING PHOTOGRAPHS OR FOR PROJECTING OR VIEWING THEM; APPARATUS OR ARRANGEMENTS EMPLOYING ANALOGOUS TECHNIQUES USING WAVES OTHER THAN OPTICAL WAVES; ACCESSORIES THEREFOR
    • G03B15/00Special procedures for taking photographs; Apparatus therefor
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60Control of cameras or camera modules

Definitions

  • This technology relates to an image processing device, an image processing method, and an imaging device for face detection in an image.
  • In focusing control on a person's face, the focus may be placed on the center of the face frame or on the portion closest to the image pickup device.
  • The purpose of this technology is to perform appropriate corresponding processing according to how far the subject's face is tilted upward or downward.
  • the image processing apparatus includes a tilt calculation unit that calculates the tilt of the face of the subject in the pitch direction, and a corresponding processing unit that performs corresponding processing using the calculated tilt. By calculating the tilt of the subject's face, it is possible to perform optimal focusing control and white balance adjustment.
  • The image processing device described above includes a site specifying unit that detects the position of the nose and the position of the pupil of the subject, and the inclination calculation unit may calculate the angle of the inclination of the face in the pitch direction based on the detected position of the nose and the position of the pupil. As a result, the angle of the face in the pitch direction can be calculated with high accuracy.
  • the inclination calculation unit in the image processing apparatus described above may use face average data which is average distance information about a face in calculating the angle. As a result, the angle of the face in the pitch direction can be calculated with higher accuracy.
  • The image processing apparatus described above includes an average data selection unit that selects one from the face average data provided at least for each race or age, and the inclination calculation unit may calculate the angle based on the one face average data selected by the average data selection unit. As a result, the angle of the face is calculated with high accuracy based on the face average data according to the age.
  • The average data selection unit in the image processing apparatus described above may cluster the distance information for each of a plurality of ranging areas provided on the image, calculate the size of the subject's face in the depth direction from the distribution of the distance information of the subject's face obtained as a result of the clustering, and select the face average data based on the calculated size of the face in the depth direction. As a result, the face average data is selected based on the fact that a child's face is small and an adult's face is large.
  • the inclination calculation unit in the image processing device may calculate the inclination in the yaw direction and the inclination in the roll direction of the face of the subject by using the face average data. This makes it possible to calculate the angle of inclination of the face in all directions.
  • In the image processing apparatus described above, when the face of the subject is determined to be an upward-facing face based on the inclination calculated by the inclination calculation unit, the corresponding processing unit may perform focusing control at a position other than the chin of the subject as the corresponding processing. This prevents the chin from being in focus.
  • In the image processing apparatus described above, when the face of the subject is determined to be a downward-facing face based on the inclination calculated by the inclination calculation unit, the corresponding processing unit may perform focusing control at a position other than the head of the subject as the corresponding processing. This prevents the head from being in focus.
  • the corresponding processing unit in the image processing apparatus described above may perform focusing control at the position of the pupil of the subject as the corresponding process. For example, when the depth of field is extremely shallow, such as in a high-resolution imaging device, the pupil may become unclear.
  • the corresponding processing unit in the image processing apparatus described above may perform a process of outputting the tilt information as the corresponding process.
  • the external device can perform processing using the tilt information of the face.
  • In the image processing apparatus described above, when the position of the pupil could be detected by the site specifying unit but the defocus amount for the position of the pupil could not be calculated, the corresponding processing unit may perform focusing control using the defocus amount estimated for the position of the pupil as the corresponding process. For example, when an object such as a branch is located between the pupil and the image pickup device, the defocus amount may not be calculated appropriately.
  • the image processing method calculates the inclination of the face of the subject in the pitch direction, and performs the corresponding processing using the calculated inclination.
  • the imaging device includes an imaging unit that images a subject, a tilt calculation unit that calculates the tilt of the subject's face in the pitch direction, and a control unit that performs focusing control according to the calculated tilt. It is equipped with.
  • FIG. 17 is a diagram showing, together with FIG. 16, a second example of the corresponding process, and is a diagram showing an example of a process of highlighting the tip of the line of sight of the subject.
  • <1. Imaging device configuration>
  • <2. Control function configuration>
  • <1. Imaging device configuration> As one of the embodiments of the image processing apparatus of the present technology, an image pickup apparatus provided with an image processing unit will be described as an example. However, the image processing device is not limited to one provided in an image pickup apparatus, and may be an information processing device that performs image processing on a received captured image.
  • the image pickup device 1 can be considered in various forms as a video camera or a still camera.
  • The image pickup device 1 is provided with an image pickup element, various lenses, operators, and the like inside and outside. Further, the image pickup apparatus 1 may be provided with a display monitor or an EVF (Electronic Viewfinder).
  • FIG. 1 shows a block diagram of the image pickup apparatus 1.
  • The imaging device 1 includes a lens system 2, an imaging unit 3, a signal processing unit 4, a recording unit 5, a display unit 6, an output unit 7, an operation unit 8, a control unit 9, a memory unit 10, and a driver unit 11.
  • the lens system 2 is equipped with various optical lenses, an aperture mechanism, and the like.
  • the image pickup unit 3 is configured to include, for example, an image pickup element 12 of a CCD (Charge Coupled Device) type or a CMOS (Complementary Metal-Oxide Semiconductor) type.
  • the sensor surface of the image pickup device 12 is configured to include a sensing element in which a plurality of pixels are two-dimensionally arranged.
  • The image pickup unit 3 executes, for example, CDS (Correlated Double Sampling) processing, AGC (Automatic Gain Control) processing, and the like on the electric signal obtained by photoelectric conversion of the light received by the image sensor 12, and further performs A/D (Analog/Digital) conversion processing.
  • the imaging unit 3 outputs the captured image signal as digital data to the signal processing unit 4 and the control unit 9.
  • the image sensor 12 includes, for example, an image plane phase difference pixel 12a that outputs a signal for calculating phase difference information for calculating the defocus amount. All the pixels included in the image pickup device 12 may be the image plane phase difference pixel 12a, or some pixels may be the image plane phase difference pixel 12a.
  • FIG. 2 shows an arrangement example of the image plane phase difference pixels 12a.
  • Among the pixels provided with Bayer-arranged color filters (R, G, B), some of the pixels having a color filter with the spectral sensitivity of green (G) are defined as the image plane phase difference pixels 12a.
  • the image plane phase difference pixel 12a may be, for example, a PD divided pixel by a PD (Photodiode) division method or a light-shielding pixel by a light-shielding method.
  • FIG. 2 shows an example in which the PD divided pixel is adopted as the image plane phase difference pixel 12a.
  • The defocus amount is calculated based on the signals output from the image plane phase difference pixels 12a included in a pixel region of a predetermined size (hereinafter referred to as a "target area"), and from the defocus amount, distance information between a reference position (for example, the image sensor 12) and the subject is calculated for each target area. Thereby, for example, distance information for each part constituting the face of the subject is calculated and used in the processing described later.
  • The signal processing unit 4 is composed of, for example, a microprocessor specialized in digital signal processing such as a DSP (Digital Signal Processor), a microcomputer, or the like.
  • the signal processing unit 4 includes each unit for performing various signal processing on the digital signal (image captured image signal) sent from the imaging unit 3.
  • For example, correction processing between the R, G, and B color channels, white balance correction, aberration correction, shading correction, and the like are performed.
  • Further, the signal processing unit 4 performs YC generation processing for generating (separating) a luminance (Y) signal and a color (C) signal from the R, G, and B image data, processing for adjusting luminance and color, and processing such as knee correction and gamma correction.
  • the signal processing unit 4 performs conversion to the final output format by performing resolution conversion processing, codec processing for encoding for recording and communication, and the like.
  • the image data converted into the final output format is stored in the memory unit 10.
  • The image is displayed on the monitor provided on the back side of the main body or in the EVF. Further, by outputting from the output unit 7 provided with an external output terminal, the image is displayed on a device such as a monitor provided outside the image pickup apparatus 1.
  • the recording unit 5 is composed of, for example, a non-volatile memory, and functions as a storage means for storing an image file (content file) such as still image data or moving image data, attribute information of the image file, a thumbnail image, or the like.
  • the image file is stored in a format such as JPEG (Joint Photographic Experts Group), TIFF (Tagged Image File Format), GIF (Graphics Interchange Format), or the like.
  • the actual form of the recording unit 5 can be considered in various ways.
  • For example, the recording unit 5 may be configured as a flash memory built into the image pickup device 1, or may be composed of a memory card (for example, a portable flash memory) that can be attached to and detached from the image pickup device 1 and an access unit that performs access to the memory card for storage and reading. Further, it may be realized as an HDD (Hard Disk Drive) or the like built into the image pickup apparatus 1.
  • The display unit 6 executes processing for performing various displays for the photographer.
  • the display unit 6 is, for example, a rear monitor or a finder monitor.
  • the display unit 6 performs a process of displaying image data converted to an appropriate resolution input from the signal processing unit 4.
  • For example, a so-called through image, which is the captured image while waiting for the release operation, is displayed.
  • the display unit 6 realizes on the screen the display of various operation menus, icons, messages, etc. as a GUI (Graphical User Interface) based on the instruction from the control unit 9.
  • the display unit 6 can display a reproduced image of the image data read from the recording medium by the recording unit 5.
  • the output unit 7 performs data communication and network communication with an external device by wire or wireless. For example, captured image data (still image file or moving image file) is transmitted to an external display device, recording device, playback device, or the like. Further, the output unit 7 may function as a network communication unit. For example, communication may be performed by various networks such as the Internet, a home network, and a LAN (Local Area Network), and various data may be transmitted and received to and from a server, a terminal, or the like on the network.
  • The operation unit 8 provided in the housing of the camera includes not only mechanical controls such as buttons and switches but also a monitor that employs a touch panel method, and detects various operations by the photographer such as tap operations and swipe operations. The operation information corresponding to the detected operation is output to the control unit 9.
  • The control unit 9 is composed of a microcomputer equipped with a CPU (Central Processing Unit) and performs overall control of the image pickup apparatus 1. For example, it controls the shutter speed according to the photographer's operation, instructs the signal processing unit 4 on various signal processing, performs imaging and recording operations according to the user's operation, and reproduces recorded image files.
  • The control unit 9 also gives instructions to the driver unit 11 in order to control the various lenses included in the lens system 2. For example, a process of designating an aperture value in order to secure the amount of light necessary for AF control, and an operation instruction for the aperture mechanism according to the aperture value, are performed.
  • The control unit 9 has a configuration capable of realizing various functions, which will be described in detail later.
  • the memory unit 10 stores information and the like used for processing executed by the control unit 9.
  • the illustrated memory unit 10 comprehensively indicates, for example, a ROM (Read Only Memory), a RAM (Random Access Memory), a flash memory, and the like.
  • the memory unit 10 may be a memory area built in the microcomputer chip as the control unit 9, or may be configured by a separate memory chip.
  • Programs and the like used by the control unit 9 are stored in the ROM, flash memory, and the like of the memory unit 10.
  • In addition to an OS (Operating System) for the CPU to control each part and content files such as image files, application programs and firmware for various operations are stored there.
  • the control unit 9 controls the entire image pickup apparatus 1 by executing the program.
  • the RAM of the memory unit 10 is used as a work area of the control unit 9 by temporarily storing data, programs, and the like used in various data processing executed by the CPU of the control unit 9.
  • the driver unit 11 is provided with, for example, a motor driver for the zoom lens drive motor, a motor driver for the focus lens drive motor, an aperture mechanism driver for the motor that drives the aperture mechanism, and the like. Each driver supplies a drive current to the corresponding drive motor in response to an instruction from the control unit 9.
  • <2. Control function configuration> In the control unit 9 of the image pickup apparatus 1, the functional configuration shown in FIG. 3 is constructed by executing the program stored in the ROM or RAM as the memory unit 10.
  • The functional configuration shown in FIG. 3 need not be constructed only in the control unit 9; it may be constructed so that the control unit 9 and the signal processing unit 4 cooperate to perform the functions. Further, the functional configuration shown in FIG. 3 may be constructed in the signal processing unit 4.
  • The control unit 9 includes a subject identification unit 20, a site identification unit 21, a defocus amount calculation unit 22, an average data selection unit 23, an inclination calculation unit 24, an area selection unit 25, a focusing control unit 26, and an output control unit 27.
  • the subject identification unit 20 detects and identifies a subject such as a person or an object in the image to be processed by a known technique for detecting the subject.
  • As a method for detecting a subject, for example, a face/object recognition technique using template matching, a matching method based on the brightness distribution information of the subject, or a method based on a skin-colored portion or a feature amount of a human face included in the image can be used. Further, these methods may be combined to improve the detection accuracy.
  • The subject may be specified by executing image recognition processing using machine learning technology such as a CNN (Convolutional Neural Network), or may be specified by using information about the subject input by the user.
  • the site specifying unit 21 executes a process of identifying the position of the face portion of the subject on the captured image.
  • the parts of the face to be identified are, for example, eyes, nose, chin, head, cheeks, ears, mouth and the like.
  • The identification of facial parts is performed by machine learning such as a CNN, in which the feature amount of the image is compressed by performing convolution processing and pooling processing. This makes it possible to identify the facial parts of the subject.
  • the defocus amount calculation unit 22 calculates the distance between the face of the subject and the image sensor 12 by calculating the defocus amount for each target area. By combining the defocus amount for each target area and the position information of the facial part of the subject detected by the part identification unit 21, the distance from the image sensor 12 can be calculated for each part.
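  • A simple way of combining the two pieces of information is sketched below in Python (the grid layout and all names are assumptions made for illustration, not part of the patent).

    def distance_of_part(part_xy, area_grid_distances_mm, area_size_px):
        """Look up the distance of a facial part by mapping its pixel position
        onto the grid of target areas for which a distance has been computed.

        part_xy                 (x, y) pixel position of the part (e.g. the nose)
        area_grid_distances_mm  2-D list of distances, one per target area, indexed [row][col]
        area_size_px            edge length of one square target area in pixels
        """
        col = int(part_xy[0] // area_size_px)
        row = int(part_xy[1] // area_size_px)
        return area_grid_distances_mm[row][col]

    # Example: nose detected at pixel (1030, 760) with 64-pixel target areas.
    # nose_distance_mm = distance_of_part((1030, 760), grid, 64)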
  • The average data selection unit 23 performs a process of selecting average data about faces (hereinafter referred to as "face average data") based on the attribute information of the subject.
  • Attribute information is, for example, gender, age, race, etc.
  • The age may be not only a specific age but also a rough range such as an age bracket (teens, 20s, etc.), or an even rougher classification such as "adult" or "child".
  • Face average data includes, for example, distance information between parts such as the occipital-nasal tip distance, total head height, interpupillary width, head circumference, head length, head width, cheek arch width, mandibular angle width, inter-ear width, mastoid width, inner canthus width, outer canthus width, mouth width, morphological face height, nose height, and subnasal-chin distance.
  • the face average data consisting of these various distance information will be described based on an example in which two types, "for adults” and "for children", are provided.
  • Position information on facial parts of the subject, such as the position of the nose, the position of the eyes, and the position of the chin, is obtained.
  • Distance information between facial parts (for example, the interpupillary width and the nose height) is also obtained.
  • The face average data is, for example, children's face average data and adult face average data.
  • the tilt calculation unit 24 calculates tilt information for the subject specified by the subject identification unit 20. Specifically, for example, the inclination of the pitch direction, the yaw direction, and the roll direction of the face of the person as the subject specified by the subject identification unit 20 is calculated. It should be noted that only a part of the inclinations in the pitch direction, the yaw direction and the roll direction may be calculated.
  • the optical axis direction of the image pickup apparatus 1 is referred to as "front-back direction", the vertical direction is referred to as “vertical direction”, and the direction orthogonal to both the front-back direction and the up-down direction is referred to as "left-right direction”.
  • the inclination of the face in the pitch direction refers to the inclination of the face with the left-right direction as the rotation axis.
  • the inclination of the face in the yaw direction refers to the inclination of the face with the vertical direction as the rotation axis.
  • the inclination in the roll direction refers to the inclination of the face with the rotation axis in the front-back direction.
  • Hereinafter, the inclinations in the pitch direction, the yaw direction, and the roll direction are collectively referred to as "face inclination".
  • the area selection unit 25 performs a process of selecting an in-focus area, which is an area to be in-focus control, from the target area.
  • The focusing area is selected based on which part of the subject is to be brought into focus. For example, when it is desired to focus on the pupil, the target area in which the pupil is imaged is selected as the focusing area. Note that even when it is desired to select the target area in which the pupil of the subject is imaged as the focusing area, that target area may not be identifiable.
  • the focusing control unit 26 performs focusing control based on the focusing area selected by the area selection unit 25. There are several possible patterns for focusing control. An example will be described in which the focusing area is the area where the pupil of the subject is imaged.
  • the focusing control is performed based on the defocus amount of the focusing area.
  • When the focusing area can be selected without any problem but an obstacle (hereinafter referred to as a "front object") is located between the pupil and the image pickup apparatus 1 so that the defocus amount of the focusing area cannot be calculated appropriately, the distance between the pupil and the image sensor 12 is estimated from the distance between another part of the subject (for example, the nose) and the image sensor 12 and from the difference in the front-back direction between the position of that part and the position of the pupil, and focusing control is then performed. Even if the focusing area cannot be specified, it is thus possible to estimate the distance between the pupil and the image sensor 12 from the relative positional relationship between the other part and the pupil and to achieve focus.
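  • As a rough illustration of this estimation, the Python sketch below derives the pupil distance from the measured distance of another part plus a front-back offset taken from face average data; the cosine correction for the pitch inclination and all parameter names are assumptions for illustration only.

    import math

    def estimate_pupil_distance(part_distance_mm, part_to_pupil_offset_mm, pitch_deg):
        """Estimate the pupil-to-sensor distance when the pupil's own defocus
        amount cannot be measured (for example, a branch occludes the pupil).

        part_distance_mm        measured distance from the image sensor to a visible
                                part such as the nose
        part_to_pupil_offset_mm front-back offset between that part and the pupil for a
                                face looking straight at the camera (face average data)
        pitch_deg               calculated inclination of the face in the pitch direction
        """
        # Tilting the face reduces the effective front-back offset between the
        # part and the pupil (illustrative geometry, not a formula from the patent).
        effective_offset = part_to_pupil_offset_mm * math.cos(math.radians(pitch_deg))
        return part_distance_mm + effective_offset

    # Example: nose measured at 2000 mm, nose-to-pupil offset 25 mm, face tilted up 20 degrees.
    # estimate_pupil_distance(2000.0, 25.0, 20.0) -> about 2023.5 mm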
  • The output control unit 27 outputs, via the output unit 7, information such as the position of the subject and the position of the face in the captured image, and the various inclinations (rotation angles around each axis) calculated by the inclination calculation unit 24.
  • the control unit 9 performs face detection processing in step S101. This process is performed using machine learning such as CNN, for example, as described above.
  • step S102 the control unit 9 determines whether or not the face of the subject can be detected and performs branch processing.
  • When it is determined that the face of the subject cannot be detected, the control unit 9 performs, for example, a process of causing the user to select one target area in step S103, and selects the selected target area as the focusing area. After that, the control unit 9 proceeds to the process of step S105.
  • When the face of the subject can be detected, the control unit 9 proceeds to the focusing area selection process in step S104.
  • the control unit 9 determines in step S201 whether or not the pupil can be detected. When the pupil of the subject can be detected, the control unit 9 attempts to calculate the defocus amount for the target area corresponding to the pupil position detected in step S202, and determines whether or not the defocus amount can be calculated.
  • When it is determined that the defocus amount can be calculated, for example, when there is no obstacle (front object) between the pupil of the subject and the image pickup apparatus 1, the control unit 9 selects the target area corresponding to the pupil position as the focusing area in step S203, and the focusing area selection process shown in FIG. 5 is completed.
  • If it is determined in step S201 that the pupil could not be detected, or if it is determined in step S202 that the pupil could be detected but the defocus amount could not be calculated, the control unit 9 performs each process from step S204 to step S211.
  • Thereby, a focusing area for focusing on the pupil of the subject is selected. For example, when a front object such as a branch is located between the pupil and the image pickup apparatus 1, the defocus amount may not be calculated normally even if the pupil can be detected. In such a case, the process proceeds to step S204.
  • step S204 the control unit 9 calculates the defocus amount for each target area 31A included in the face frame 30. For example, as shown in FIG. 6, the defocus amount is calculated for each target area 31A located inside the face frame 30 among the plurality of target areas 31 included in the captured image.
  • step S205 the control unit 9 creates a histogram of the defocus amount calculated in step S204 and performs a clustering process.
  • FIG. 7 is an example of a histogram of the target area 31A.
  • the horizontal axis in the histogram represents the distance to the image sensor 12, and the vertical axis represents the number of the corresponding target regions 31A.
  • Clustering is performed from the histogram shown in FIG. 7 using a method such as the k-means method.
  • By the clustering, each target area 31A is classified into a "face cluster" into which the areas in which the face is imaged are classified, a "front object cluster" into which the areas in which a front object is imaged are classified, and a "background cluster" into which the areas in which background objects are imaged are classified.
  • FIG. 8 shows an example of the classification results. As shown in the figure, each target area 31A is classified into one of a face cluster, a front object cluster, and a background cluster according to the distance from the image sensor 12.
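  • A minimal Python sketch of this step is shown below; it uses a small hand-rolled 1-D k-means with k=3, since the text names the k-means method but does not fix its parameters, and the distance values are made-up examples.

    def kmeans_1d(values, k=3, iters=50):
        """Tiny 1-D k-means used to split per-area distances into
        front-object / face / background clusters (illustrative only)."""
        centers = sorted(values)[::max(1, len(values) // k)][:k]
        for _ in range(iters):
            clusters = [[] for _ in centers]
            for v in values:
                idx = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
                clusters[idx].append(v)
            centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
        return centers, clusters

    # Distances (mm) of the target areas 31A inside the face frame 30 (made-up numbers).
    distances = [900, 920, 910,            # a branch in front of the face
                 2000, 2010, 1995, 2020,   # the face itself
                 5200, 5300, 5150]         # background
    centers, clusters = kmeans_1d(distances, k=3)
    front_object, face, background = (c for _, c in sorted(zip(centers, clusters)))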
  • In classifying the target areas 31A into the face cluster, the length of the face in the front-back direction (head length D1) and the distance D2 in the front-back direction between the positions of the ears and the nose may be taken into consideration (see FIG. 9).
  • For example, when focusing on a subject located at a distance of 2 m from the image sensor 12, the front depth of field is 6.41 mm and the rear depth of field is 6.45 mm. Let this be one depth of field.
  • For example, the distance from the center position of the face to the position of the ears is 96 mm.
  • In that case, the depth of field is 134.4 μm when converted into the defocus amount. That is, the range that the defocus amount can take is 134.4 μm. The target areas 31A classified into the face cluster may be determined based on this value.
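  • The figures quoted above depend on lens parameters that this passage does not state. The Python sketch below therefore uses the standard thin-lens depth-of-field approximations with assumed values for focal length, F-number and permissible circle of confusion; the resulting numbers are illustrative and are not a reproduction of the 6.41 mm / 6.45 mm example.

    def depth_of_field_mm(focal_mm, f_number, coc_mm, subject_mm):
        """Front and rear depth of field from the usual hyperfocal-distance formulas."""
        h = focal_mm ** 2 / (f_number * coc_mm) + focal_mm          # hyperfocal distance
        near = subject_mm * (h - focal_mm) / (h + subject_mm - 2 * focal_mm)
        far = subject_mm * (h - focal_mm) / (h - subject_mm)        # diverges near the hyperfocal distance
        return subject_mm - near, far - subject_mm

    # Illustrative values only: 85 mm lens at f/1.4, 0.03 mm circle of confusion, subject at 2 m.
    front_dof_mm, rear_dof_mm = depth_of_field_mm(85.0, 1.4, 0.03, 2000.0)

    # On the sensor side, the range the defocus amount can take is roughly the depth of
    # focus, about 2 * F-number * circle of confusion (again only an approximation).
    defocus_range_mm = 2 * 1.4 * 0.03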
  • step S206 of FIG. 5 the control unit 9 performs a process of excluding the target area 31A classified into the front object cluster and the target area 31A classified into the background object cluster.
  • step S207 the control unit 9 estimates the distance between the image sensor 12 and the face from the data of the lens position and the defocus amount.
  • Specifically, the median of the defocus amounts for the target areas 31A classified into the face cluster is specified, and the distance between the image sensor 12 and the face is estimated from the median defocus amount.
  • the amount of defocus for the other target areas 31A classified into the face cluster may be calculated.
  • Here, the variable f represents the focal length, the variable L1 represents the distance between the subject and the lens, and the variable L2 represents the distance between the lens and the image sensor 12. From the lens formula 1/L1 + 1/L2 = 1/f, since f and L2 are known, the distance L1 between the subject and the lens can be calculated (see FIG. 10).
  • Further, the actual distance between facial parts (for example, the interpupillary width, inner canthus width, outer canthus width, etc.) can be calculated. The variable Y2 represents the corresponding distance on the image sensor 12, and the variable Y1 represents the actual distance. Since the distance Y2, the distance L1, and the distance L2 are known, the distance Y1 can be calculated from the relationship Y1 = Y2 × L1 / L2 (see FIG. 10).
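  • These two relationships fit in a few lines of Python; the thin-lens form 1/L1 + 1/L2 = 1/f and the similar-triangles magnification Y1 = Y2 × L1 / L2 are the usual reading of FIG. 10, and the numeric values below are assumptions for illustration.

    def subject_distance_mm(focal_mm, lens_to_sensor_mm):
        """Solve 1/L1 + 1/L2 = 1/f for L1, the subject-to-lens distance."""
        return 1.0 / (1.0 / focal_mm - 1.0 / lens_to_sensor_mm)

    def real_size_mm(size_on_sensor_mm, l1_mm, l2_mm):
        """Similar triangles: a width Y2 measured on the sensor scales to the
        actual width Y1 by the ratio L1 / L2."""
        return size_on_sensor_mm * l1_mm / l2_mm

    # Example with assumed values: 85 mm lens, image formed 88.7 mm behind the lens,
    # interpupillary width of 2.7 mm measured on the sensor.
    l1 = subject_distance_mm(85.0, 88.7)        # about 2037 mm
    y1 = real_size_mm(2.7, l1, 88.7)            # about 62 mm, a plausible interpupillary width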
  • In step S208, the control unit 9 selects face average data based on the distance information. Specifically, one is selected from a plurality of prepared sets of face average data (for example, adult face average data and child face average data) based on the ratios and sizes of the pieces of distance information. As an example, the difference between the minimum defocus amount and the maximum defocus amount within the target areas 31A classified into the face cluster is calculated, and whether to select the adult face average data or the child face average data is decided according to the magnitude of the difference.
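  • A hedged sketch of this selection rule is given below; the threshold separating a "child-sized" from an "adult-sized" face in the depth direction is an assumed value for illustration, not one taken from the patent.

    def select_face_average_data(face_cluster_defocus_um, adult_data, child_data,
                                 threshold_um=100.0):
        """Pick adult or child face average data from the depth spread of the face.

        face_cluster_defocus_um  defocus amounts (um) of the target areas 31A
                                 classified into the face cluster
        threshold_um             illustrative boundary between a small (child) and a
                                 large (adult) face in the depth direction
        """
        depth_spread = max(face_cluster_defocus_um) - min(face_cluster_defocus_um)
        return adult_data if depth_spread >= threshold_um else child_data

    # Example: a spread of 130 um across the face cluster selects the adult data.
    # select_face_average_data([10.0, 60.0, 140.0], "adult", "child") -> "adult"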
  • In step S209, the control unit 9 calculates the position of the front pupil from the inclination of the face and the selected face average data.
  • the front pupil is the pupil of the two pupils of the subject that is closer to the image pickup apparatus 1. That is, the distance between the front pupil and the image sensor 12 is calculated.
  • The information on the inclination of the face is, for example, the information acquired in the face detection process in step S101 of FIG. 4.
  • the control unit 9 performs a three-axis correction process of the face angle in step S210.
  • That is, the angle of the face, i.e., the inclination of the face in the pitch direction, the yaw direction, and the roll direction, is calculated, and the distance between the front pupil and the image sensor 12 is corrected accordingly.
  • In step S211, the control unit 9 selects, as the focusing area, a target area 31A whose defocus amount is close to the defocus amount estimated for the front pupil.
  • At this time, among the target areas 31A having a close defocus amount, the target area 31A close to the pupil position may be selected.
  • control unit 9 that has completed the focusing area selection process acquires the defocus amount of the selected focusing area in step S105. Subsequently, in step S106, the control unit 9 drives the lens according to the acquired defocus amount to perform focusing control. As a result, the focus is on the front pupil.
  • the control unit 9 determines in step S107 whether or not the face can be detected. If the face cannot be detected, the process returns to step S101 again.
  • the inclination of the face in the pitch direction, yaw direction, and roll direction does not have to be output by machine learning such as CNN.
  • the inclination in each direction may be calculated according to the pupil position and the nose position.
  • Let the distance in the vertical direction between the pupil position and the nose position of a face squarely facing the image pickup apparatus 1 be the distance X (see FIG. 11). Further, for an upward-facing face in which the subject's face is tilted upward, the distance in the vertical direction between the pupil position and the nose position is defined as the distance Y (see FIG. 12). Further, for a face in which the subject's face faces downward, the distance in the vertical direction between the pupil position and the nose position is defined as the distance Z (see FIG. 13).
  • In these cases, the distance Y and the distance Z are shorter than the distance X (see FIG. 14). Therefore, it is possible to calculate by how many degrees the face of the subject is tilted in the pitch direction according to the value of the distance Y or the distance Z.
  • the distance X may differ depending on the attributes of the subject (gender, age, race, etc.).
  • the values of the distance Y and the distance Z also differ depending on the attributes of the subject. Specifically, if it is a child, the shape is such that the graph of FIG. 14 is crushed in the vertical direction. Therefore, an appropriate graph may be selected using the face average data according to the attributes of the subject, and the inclination of the face may be calculated from the values of the distance Y and the distance Z.
  • the inclination angle in the pitch direction may be calculated according to the ratio of the distance Y (distance Z) to the distance X.
  • Whether the face is an upward-facing face or a downward-facing face can be determined from the positions of the eyes and the nose with respect to the face frame 30. Specifically, if the pupil position and the nose position are located in the upper part of the face frame 30, the face can be determined to be an upward-facing face, and if they are located in the lower part, a downward-facing face.
  • the relationship between the change in the distance X and the inclination angle in the pitch direction can be stored in advance in the memory unit 10 of the image pickup apparatus 1 in the form of a mathematical formula or a look-up table.
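  • One way to express that look-up in code is sketched below (Python). The arccos mapping from the ratio of the observed vertical pupil-nose distance to the frontal distance X is an illustrative assumption consistent with the shape of FIG. 14, not a formula given in the patent, and the sign convention is likewise assumed.

    import math

    def pitch_angle_deg(pupil_y, nose_y, frontal_distance_x,
                        face_frame_top, face_frame_bottom):
        """Estimate the pitch inclination of the face from the vertical pupil-nose
        distance (all vertical values in the same units, e.g. image pixels).

        frontal_distance_x  the distance X expected for a face squarely facing the
                            camera, derived from the selected face average data
        """
        observed = abs(nose_y - pupil_y)                  # distance Y or Z in the text
        ratio = min(observed / frontal_distance_x, 1.0)   # shrinks as the face tilts
        angle = math.degrees(math.acos(ratio))            # illustrative mapping only
        # Image y grows downward: pupil and nose in the upper part of the face
        # frame 30 indicate an upward-facing face, in the lower part a downward-facing one.
        face_center = (face_frame_top + face_frame_bottom) / 2.0
        feature_center = (pupil_y + nose_y) / 2.0
        return angle if feature_center < face_center else -angle

    # Example: observed distance 42 px against an expected 60 px, features in the
    # upper half of the face frame -> roughly +45.6 degrees (upward-facing).
    # pitch_angle_deg(200.0, 242.0, 60.0, 150.0, 400.0)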
  • It is also possible to calculate the tilt angle of the face not only in the pitch direction but also in the yaw direction and the roll direction by performing similar processing. For example, in the case of inclination in the yaw direction, the distance in the left-right direction between the nose position and the pupil position (or the ear position) is calculated. Then, the face average data is selected according to the attribute information of the subject. Further, based on the selected face average data, one graph showing the relationship between that left-right distance and the tilt angle is selected, and the tilt angle in the yaw direction is calculated.
  • the inclination in the yaw direction can be calculated by using the distance information that changes depending on the degree of inclination in the yaw direction.
  • the inclination in the roll direction can also be calculated by using the distance information that changes depending on the inclination in the roll direction.
  • the control unit 9 performs face detection processing in step S301.
  • The process of step S301 may be omitted when the face detection process has already been performed in step S101 of FIG. 4.
  • step S302 the control unit 9 calculates a distance Y (or distance Z) in the vertical direction from the pupil position and the nose position of the subject.
  • In step S303, the control unit 9 determines whether the face is an upward-facing face or a downward-facing face according to the pupil position and the nose position with respect to the face frame 30.
  • step S304 the control unit 9 calculates the tilt angle of the face using a mathematical formula, a look-up table, or the like.
  • After calculating the tilt angle in the pitch direction in step S304, the processing from step S209 onward in FIG. 5 is performed, and finally step S106 in FIG. 4 is executed to focus on the pupil position. That is, it is possible to prevent the chin from being brought into focus when the face is tilted upward and the head from being brought into focus when the face faces downward.
  • In a second example of the corresponding processing, processing other than focusing control is performed by using the tilt information of the subject's face in the pitch direction, the yaw direction, and the roll direction.
  • This process may be executed by the signal processing unit 4 or the control unit 9 of the image pickup apparatus 1.
  • the tilt information of the face may be output from the output unit 7 and used in an information processing device provided outside the image pickup device 1 as an image processing device.
  • For example, it is conceivable to use the tilt information of the subject's face in the captured image for white balance adjustment.
  • If normal white balance adjustment is performed in such a case, the face may be adjusted to a bluish color.
  • In that case, by strengthening orange and red, which have a complementary color relationship with blue, it is possible to perform white balance adjustment that reproduces a natural complexion.
  • Another example is the use of the face tilt information in an information processing device external to the image pickup device. For example, based on the captured image and the face tilt information output as metadata, it is conceivable to automatically adjust the position of a speech balloon for the person (subject), or to highlight what the person is looking at by placing an image in the direction of the person's line of sight (see FIGS. 16 and 17).
  • the metadata output together with the captured image may include coordinate information of the pupil position and coordinate information of the nose position in addition to the tilt information. Further, for example, flag information for identifying a tilted face or a depressed face may be output as rougher information than the angle information. Of course, not all of this information needs to be output as metadata, and some information may be output as metadata.
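  • One possible shape for such metadata is sketched below in Python; the field names and encoding are purely illustrative, since the text only says that tilt angles, pupil/nose coordinates, and a coarse up/down flag may be output.

    def build_face_metadata(pitch_deg, yaw_deg, roll_deg, pupil_xy=None, nose_xy=None):
        """Bundle face-tilt information to be output together with a captured image.
        All field names here are hypothetical examples."""
        return {
            "face_tilt_deg": {"pitch": pitch_deg, "yaw": yaw_deg, "roll": roll_deg},
            "pupil_position": pupil_xy,      # optional coordinate information
            "nose_position": nose_xy,        # optional coordinate information
            "coarse_orientation": ("up" if pitch_deg > 0 else
                                   "down" if pitch_deg < 0 else "level"),
        }

    # Example: a face tilted 15 degrees upward, slightly turned to the left.
    # build_face_metadata(15.0, -5.0, 0.0, pupil_xy=(812, 403), nose_xy=(820, 455))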
  • As described above, the image processing device (for example, the image pickup device 1) in the present technology includes the inclination calculation unit 24 that calculates the inclination of the subject's face in the pitch direction, and a corresponding processing unit (the focusing control unit 26 and the output control unit 27) that performs corresponding processing using the calculated inclination. By calculating the inclination of the subject's face and using it for various processes, it is possible to perform optimum focusing control, white balance adjustment, and the like.
  • A site specifying unit 21 that detects the position of the nose and the position of the pupil of the subject is provided, and the inclination calculation unit 24 may calculate the angle of inclination of the face in the pitch direction based on the detected position of the nose and the position of the pupil.
  • As a result, the angle of the face in the pitch direction can be calculated with high accuracy, so that focusing control and the like can be performed with higher accuracy. For example, when the hair color of the subject is black, if the head is brought into focus and the white balance is adjusted based on the area where the head appears, the entire image becomes bright.
  • the inclination calculation unit 24 may use the face average data which is the average distance information about the face in the calculation of the angle. As a result, the angle of the face in the pitch direction can be calculated with higher accuracy. Therefore, it is possible to perform focusing control and the like with higher accuracy and to create an effective effect in post-production.
  • An average data selection unit 23 that selects one from the face average data provided at least for each race or age is provided, and the inclination calculation unit 24 may calculate the angle based on the one face average data selected by the average data selection unit 23. As a result, the angle of the face is calculated with high accuracy based on the face average data according to the gender, age, race, and the like. Therefore, it is possible to perform focusing control and the like with higher accuracy and to create a more effective effect in post-production.
  • The average data selection unit 23 may cluster the distance information for each of the plurality of ranging areas (target areas 31) provided on the image, calculate the size of the subject's face in the depth direction from the distribution of the distance information of the subject's face obtained as a result of the clustering, and select the face average data based on the calculated size of the face in the depth direction.
  • the face average data is selected based on the fact that the child's face is small and the adult's face is large. Therefore, regardless of whether the subject is a child or an adult, focusing control, post-production processing, and the like can be performed with high accuracy.
  • the inclination calculation unit 24 may calculate the inclination in the yaw direction and the inclination in the roll direction with respect to the face of the subject by using the face average data. This makes it possible to calculate the angle of inclination of the face in all directions. Therefore, for example, when performing focusing control, the distance to the pupil can be appropriately calculated, and highly accurate focusing control can be performed.
  • When the face of the subject is determined to be an upward-facing face based on the calculated inclination, the corresponding processing unit may perform focusing control at a position other than the chin of the subject as the corresponding processing. This prevents the chin from being in focus, so appropriate focusing control can be performed.
  • When the face of the subject is determined to be a downward-facing face based on the calculated inclination, the corresponding processing unit may perform focusing control at a position other than the head of the subject as the corresponding processing. This prevents the head from being in focus, so appropriate focusing control can be performed.
  • The corresponding processing unit may perform focusing control at the position of the pupil of the subject as the corresponding processing.
  • For example, when the depth of field is extremely shallow, such as in a high-resolution imaging device, the pupil may become unclear if another position is brought into focus. Even in such a case, the distance to the pupil can be appropriately calculated and appropriate focusing control can be executed.
  • the corresponding processing unit may perform a process of outputting tilt information as the corresponding process.
  • the external device can perform processing using the tilt information of the face. For example, it is possible to perform a process of arranging the image in front of the line of sight according to the direction of the face. Such processing is performed, for example, in post production.
  • When the position of the pupil could be detected by the site specifying unit 21 but the defocus amount for the position of the pupil could not be calculated, the corresponding processing unit (focusing control unit 26) may perform focusing control using the defocus amount estimated for the position of the pupil as the corresponding process. For example, when a front object such as a branch is located between the pupil and the image pickup apparatus 1, the defocus amount may not be calculated appropriately. Even in such a case, appropriate focusing control can be performed by appropriately estimating the defocus amount corresponding to the distance to the pupil.
  • the image processing method of the present technology calculates the inclination of the face of the subject in the pitch direction, and performs the corresponding processing using the calculated inclination.
  • The imaging device 1 of the present technology includes an imaging unit 3 that images a subject, an inclination calculation unit 24 that calculates the inclination of the subject's face in the pitch direction, and a control unit 9 that performs focusing control according to the calculated inclination.
  • a program for realizing such an image processing device or an image pickup device 1 may be recorded in advance in an HDD as a recording medium built in a device such as the image pickup device 1 or a ROM in a microcomputer having a CPU.
  • Alternatively, the program can be temporarily or permanently stored (recorded) on a removable recording medium such as a flexible disc, a CD-ROM (Compact Disc Read Only Memory), an MO (Magneto Optical) disc, a DVD (Digital Versatile Disc), a Blu-ray Disc (registered trademark), a magnetic disc, a semiconductor memory, or a memory card.
  • Such a removable recording medium can be provided as so-called package software.
  • it can also be downloaded from a download site via a network such as a LAN (Local Area Network) or the Internet.
  • Further, such a program is suitable for providing the image processing device (the imaging device 1 and the like) of the embodiment in a wide range.
  • For example, by downloading the program to a mobile terminal device such as a smartphone or tablet equipped with a camera function, a mobile phone, a personal computer, a game device, a video device, a PDA (Personal Digital Assistant), or the like, these devices can function as the image processing device of the present disclosure.
  • the present technology can also adopt the following configurations.
  • (1) An image processing device including: a tilt calculation unit that calculates the tilt of the subject's face in the pitch direction; and a corresponding processing unit that performs corresponding processing using the calculated tilt.
  • (2) The image processing device according to (1) above, further including a part specifying unit that detects the position of the nose and the position of the pupil of the subject, wherein the tilt calculation unit calculates the angle of the tilt of the face in the pitch direction based on the detected position of the nose and the position of the pupil.
  • (3) The image processing apparatus according to (2) above, wherein the inclination calculation unit uses face average data, which is average distance information about a face, in calculating the angle.
  • (4) The image processing apparatus according to (3) above, further including an average data selection unit that selects one from the face average data provided at least for each race or age, wherein the inclination calculation unit calculates the angle based on the one face average data selected by the average data selection unit.
  • (5) The image processing apparatus according to (4) above, wherein the average data selection unit clusters the distance information for each of a plurality of ranging areas provided on the image, calculates the size of the subject's face in the depth direction from the distribution of the distance information of the subject's face obtained as a result of the clustering, and selects the face average data based on the calculated size of the face in the depth direction.
  • (8) The image processing apparatus according to any one of (1) to (7) above, wherein, when the face of the subject is determined to be a downward-facing face based on the inclination calculated by the inclination calculation unit, the corresponding processing unit performs focusing control at a position other than the head of the subject as the corresponding processing.
  • (9) The image processing apparatus according to any one of (1) to (8) above, wherein the corresponding processing unit performs focusing control at the position of the pupil of the subject as the corresponding processing.
  • (10) The image processing apparatus according to any one of (1) to (9) above, wherein the corresponding processing unit performs a process of outputting information on the inclination as the corresponding process.
  • (11) The image processing apparatus according to any one of (2) to (6) above, wherein the corresponding processing unit performs focusing control using the defocus amount estimated for the pupil position as the corresponding process.
  • (12) An image processing method including: calculating the inclination of the subject's face in the pitch direction; and performing corresponding processing using the calculated inclination.
  • (13) An imaging device including: an imaging unit that images the subject; a tilt calculation unit that calculates the tilt of the subject's face in the pitch direction; and a control unit that performs focusing control according to the calculated tilt.
  • 1 Imaging device (image processing device)
  • 3 Imaging unit
  • 21 Site identification unit
  • 23 Average data selection unit
  • 24 Tilt calculation unit
  • 26 Focusing control unit (corresponding processing unit)
  • 27 Output control unit (corresponding processing unit)

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Optics & Photonics (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Theoretical Computer Science (AREA)
  • Studio Devices (AREA)

Abstract

The present technology relates to an imaging device comprising an inclination calculation unit for calculating inclination of the face of a subject in a pitch direction, and a corresponding process unit for performing a corresponding process using the calculated inclination.

Description

Image processing device, image processing method, and imaging device
This technology relates to an image processing device, an image processing method, and an imaging device for face detection in an image.
There are many opportunities to photograph a person with a camera, and in such cases the face of the person is detected and the face is brought into focus.
In focusing control on a person's face, for example, as shown in Patent Document 1, the focus may be placed on the center of the face frame or on the portion closest to the image pickup device.
Japanese Unexamined Patent Publication No. 2007-328215
By the way, in recent years the pixel counts of imaging devices and the sizes of image sensors have been increasing, and imaging devices with a shallow depth of field are becoming more common.
In the case of an imaging device with a shallow depth of field, if the focus is placed on the center of the face frame or on the portion closest to the imaging device, there is a problem that an unintended position is brought into focus. In particular, depending on how far the subject's face is tilted upward or downward, the part closest to the imaging device becomes the chin or the head.
If the chin or the head is brought into focus, the complexion may become inappropriate as a result of the white balance adjustment processing.
In view of such problems, the purpose of this technology is to perform appropriate corresponding processing according to how far the subject's face is tilted upward or downward.
The image processing apparatus according to the present technology includes a tilt calculation unit that calculates the tilt of the face of the subject in the pitch direction, and a corresponding processing unit that performs corresponding processing using the calculated tilt.
By calculating the tilt of the subject's face, it is possible to perform optimal focusing control, white balance adjustment, and the like.
The image processing device described above includes a site specifying unit that detects the position of the nose and the position of the pupil of the subject, and the inclination calculation unit may calculate the angle of the inclination of the face in the pitch direction based on the detected position of the nose and the position of the pupil.
As a result, the angle of the face in the pitch direction can be calculated with high accuracy.
The inclination calculation unit in the image processing apparatus described above may use face average data, which is average distance information about a face, in calculating the angle.
As a result, the angle of the face in the pitch direction can be calculated with higher accuracy.
The image processing apparatus described above includes an average data selection unit that selects one from the face average data provided at least for each race or age, and the inclination calculation unit may calculate the angle based on the one face average data selected by the average data selection unit.
As a result, the angle of the face is calculated with high accuracy based on the face average data according to the age.
The average data selection unit in the image processing apparatus described above may cluster the distance information for each of a plurality of ranging areas provided on the image, calculate the size of the subject's face in the depth direction from the distribution of the distance information of the subject's face obtained as a result of the clustering, and select the face average data based on the calculated size of the face in the depth direction.
As a result, the face average data is selected based on the fact that a child's face is small and an adult's face is large.
The inclination calculation unit in the image processing device described above may calculate the inclination in the yaw direction and the inclination in the roll direction of the face of the subject by using the face average data.
This makes it possible to calculate the angle of inclination of the face in all directions.
In the image processing device described above, when the subject's face is determined to be an upward-facing face on the basis of the tilt calculated by the tilt calculation unit, the corresponding processing unit may perform, as the corresponding processing, focusing control on a position of the subject other than the chin.
This prevents the chin from being brought into focus.
In the image processing device described above, when the subject's face is determined to be a downward-facing face on the basis of the tilt calculated by the tilt calculation unit, the corresponding processing unit may perform, as the corresponding processing, focusing control on a position of the subject other than the head.
This prevents the head from being brought into focus.
In the image processing device described above, the corresponding processing unit may perform, as the corresponding processing, focusing control on the position of the subject's pupil.
For example, when the depth of field is extremely shallow, as with a high-resolution imaging device, the pupil may otherwise come out blurred.
In the image processing device described above, the corresponding processing unit may perform, as the corresponding processing, processing that outputs the tilt information.
This allows, for example, an external device to perform processing that uses the tilt information of the face.
In the image processing device described above, when the site specifying unit has detected the position of the pupil but the defocus amount for the pupil position cannot be calculated, the corresponding processing unit may perform, as the corresponding processing, focusing control using a defocus amount estimated for the pupil position.
For example, when an object such as a branch is located between the pupil and the imaging device, the defocus amount may not be calculated properly.
An image processing method according to the present technology calculates the tilt of a subject's face in the pitch direction and performs corresponding processing using the calculated tilt.
An imaging device according to the present technology includes an imaging unit that images a subject, a tilt calculation unit that calculates the tilt of the subject's face in the pitch direction, and a control unit that performs focusing control according to the calculated tilt.
Fig. 1 is a block diagram of an imaging device according to an embodiment of the present technology.
Fig. 2 is a schematic explanatory diagram showing an arrangement example of PD-divided pixels as image plane phase difference pixels.
Fig. 3 is a block diagram showing a functional configuration example of the imaging device.
Fig. 4 is a flowchart of a first example of the corresponding processing.
Fig. 5 is a flowchart showing an example of focusing area selection processing.
Fig. 6 is a schematic explanatory diagram showing the relationship between target areas and a face frame.
Fig. 7 is a histogram of the distances of the target areas within the face frame.
Fig. 8 is a diagram showing target areas classified into a face cluster by clustering processing.
Fig. 9 is an explanatory diagram for showing the head length and the distance between the ear and the nose.
Fig. 10 is an explanatory diagram showing the relationship among a subject, a lens, an image sensor, and the focal length.
Fig. 11 is a schematic diagram showing the pupil position and nose position of a face squarely facing the imaging device.
Fig. 12 is a schematic diagram showing the pupil position and nose position of an upward-facing face.
Fig. 13 is a schematic diagram showing the pupil position and nose position of a downward-facing face.
Fig. 14 is a graph showing the relationship between the pupil-nose distance and the tilt.
Fig. 15 is a flowchart showing an example of the processing flow for calculating the tilt of the face in the pitch direction using pupil position and nose position information.
Fig. 16 is a diagram showing a second example of the corresponding processing, in which the point ahead of the subject's line of sight is highlighted.
Fig. 17 is a diagram showing, together with Fig. 16, the second example of the corresponding processing, in which the point ahead of the subject's line of sight is highlighted.
Hereinafter, embodiments will be described in the following order with reference to the accompanying drawings.
<1. Imaging device configuration>
<2. Functional configuration of the control unit>
<3. Corresponding processing using tilt information>
<3-1. First example of corresponding processing>
<3-2. Second example of corresponding processing>
<4. Summary>
<5. Present technology>
<1. Imaging device configuration>
As one embodiment of the image processing device of the present technology, an imaging device provided with an image processing unit will be described as an example.
However, the image processing device is not limited to an imaging device provided with an image processing unit, and may be an information processing device that performs image processing on a received captured image.
The imaging device 1 can take various forms as a video camera or a still camera.
The imaging device 1 is provided, inside and outside, with an image sensor, various lenses, operators, and the like. The imaging device 1 may also be provided with a display monitor or an EVF (Electric Viewfinder).
Fig. 1 shows a block diagram of the imaging device 1.
The imaging device 1 includes a lens system 2, an imaging unit 3, a signal processing unit 4, a recording unit 5, a display unit 6, an output unit 7, an operation unit 8, a control unit 9, a memory unit 10, and a driver unit 11.
The lens system 2 includes various optical lenses, an aperture mechanism, and the like.
The imaging unit 3 includes an image sensor 12 of, for example, a CCD (Charge Coupled Device) type or a CMOS (Complementary Metal-Oxide Semiconductor) type. The sensor surface of the image sensor 12 has sensing elements in which a plurality of pixels are two-dimensionally arranged.
The imaging unit 3 performs, for example, CDS (Correlated Double Sampling) processing, AGC (Automatic Gain Control) processing, and the like on the electric signal obtained by photoelectrically converting the light received by the image sensor 12, and further performs A/D (Analog/Digital) conversion processing. The imaging unit 3 outputs the captured image signal as digital data to the signal processing unit 4 and the control unit 9.
The image sensor 12 includes, for example, image plane phase difference pixels 12a that output signals used to calculate phase difference information for calculating the defocus amount.
All of the pixels of the image sensor 12 may be image plane phase difference pixels 12a, or only some of the pixels may be image plane phase difference pixels 12a.
Fig. 2 shows an arrangement example of the image plane phase difference pixels 12a.
Among the pixels provided with Bayer-arranged color filters (R, G, B), some of the pixels having a color filter with green (G) spectral sensitivity serve as the image plane phase difference pixels 12a.
The image plane phase difference pixels 12a may be, for example, PD-divided pixels based on a PD (Photodiode) division method, or light-shielded pixels based on a light-shielding method. The example shown in Fig. 2 is one in which PD-divided pixels are employed as the image plane phase difference pixels 12a.
In the present embodiment, the defocus amount is calculated on the basis of the signals output from the image plane phase difference pixels 12a included in a pixel region of a predetermined size (hereinafter referred to as a "target area"), and the distance between a reference position (for example, the image sensor 12) and the subject is calculated for each target area from the defocus amount. In this way, distance information is calculated, for example, for each part constituting the subject's face and is used in the processing described later.
The signal processing unit 4 is configured by, for example, a microprocessor specialized in digital signal processing such as a DSP (Digital Signal Processor), a microcomputer, or the like.
The signal processing unit 4 includes units for performing various kinds of signal processing on the digital signal (captured image signal) sent from the imaging unit 3.
Specifically, it performs processing such as correction between the R, G, and B color channels, white balance correction, aberration correction, and shading correction.
The signal processing unit 4 also performs YC generation processing for generating (separating) a luminance (Y) signal and a color (C) signal from the R, G, and B image data, processing for adjusting luminance and color, and processing such as knee correction and gamma correction.
Furthermore, the signal processing unit 4 converts the data into the final output format by performing resolution conversion processing, codec processing for encoding for recording or communication, and the like. The image data converted into the final output format is stored in the memory unit 10. When the image data is output to the display unit 6, the image is displayed on the monitor provided on the rear side of the body or in the EVF. Furthermore, by being output from the output unit 7 provided with an external output terminal, it is displayed on a device such as a monitor provided outside the imaging device 1.
The recording unit 5 is composed of, for example, a nonvolatile memory, and functions as storage means for storing image files (content files) such as still image data and moving image data, attribute information of the image files, thumbnail images, and the like.
Image files are stored in formats such as JPEG (Joint Photographic Experts Group), TIFF (Tagged Image File Format), and GIF (Graphics Interchange Format).
Various actual forms of the recording unit 5 are conceivable. For example, the recording unit 5 may be configured as a flash memory built into the imaging device 1, or may be composed of a memory card (for example, a portable flash memory) attachable to and detachable from the imaging device 1 and an access unit that accesses the memory card for storage and reading. It may also be realized as an HDD (Hard Disk Drive) or the like built into the imaging device 1.
The display unit 6 executes processing for presenting various displays to the photographer. The display unit 6 is, for example, a rear monitor or a viewfinder monitor. The display unit 6 performs processing for displaying image data converted to an appropriate resolution and input from the signal processing unit 4. As a result, a so-called through image, which is the captured image during release standby, is displayed.
Furthermore, the display unit 6 realizes on-screen display of various operation menus, icons, messages, and the like as a GUI (Graphical User Interface) on the basis of instructions from the control unit 9.
The display unit 6 can also display reproduced images of image data read from the recording medium by the recording unit 5.
The output unit 7 performs wired or wireless data communication and network communication with external devices. For example, it transmits captured image data (still image files and moving image files) to an external display device, recording device, playback device, or the like.
The output unit 7 may also function as a network communication unit. For example, it may communicate via various networks such as the Internet, a home network, or a LAN (Local Area Network), and transmit and receive various data to and from servers, terminals, and the like on the network.
The operation unit 8 provided on the camera housing includes not only mechanical operators such as buttons and switches but also a monitor employing a touch panel system, and outputs operation information corresponding to the photographer's various operations, such as tap operations and swipe operations, to the control unit 9.
The control unit 9 is configured by a microcomputer including a CPU (Central Processing Unit) and performs overall control of the imaging device 1. For example, it controls the shutter speed according to the photographer's operation, instructs the signal processing unit 4 regarding various kinds of signal processing, performs imaging and recording operations according to user operations, and reproduces recorded image files.
The control unit 9 also gives instructions to the driver unit 11 in order to control the various lenses of the optical system 51.
For example, it performs processing for designating an aperture value in order to secure the amount of light required for AF control, and issues operation instructions to the aperture mechanism according to the aperture value.
The control unit 9 is configured to be able to realize various functions described later. These are described in detail below.
The memory unit 10 stores information and the like used in the processing executed by the control unit 9. The illustrated memory unit 10 comprehensively represents, for example, a ROM (Read Only Memory), a RAM (Random Access Memory), a flash memory, and the like.
The memory unit 10 may be a memory area built into the microcomputer chip serving as the control unit 9, or may be configured by a separate memory chip.
Programs and the like used by the control unit 9 are stored in the ROM, flash memory, and the like of the memory unit 10. The ROM, flash memory, and the like store an OS (Operating System) for the CPU to control each unit and content files such as image files, as well as application programs and firmware for various operations.
The control unit 9 controls the entire imaging device 1 by executing these programs.
The RAM of the memory unit 10 is used as a work area of the control unit 9 by temporarily storing data, programs, and the like used in the various data processing executed by the CPU of the control unit 9.
The driver unit 11 is provided with, for example, a motor driver for the zoom lens drive motor, a motor driver for the focus lens drive motor, an aperture mechanism driver for the motor that drives the aperture mechanism, and the like.
Each driver supplies a drive current to the corresponding drive motor in response to instructions from the control unit 9.
<2. Functional configuration of the control unit>
In the control unit 9 of the imaging device 1, the functional configuration shown in Fig. 3 is constructed by executing the programs stored in the ROM and RAM serving as the memory unit 10.
The functional configuration shown in Fig. 3 need not be constructed in the control unit 9 alone; it may be constructed so that the control unit 9 and the signal processing unit 4 cooperate to fulfill the functions. The functional configuration shown in Fig. 3 may also be constructed in the signal processing unit 4.
The control unit 9 includes a subject identification unit 20, a site specifying unit 21, a defocus amount calculation unit 22, an average data selection unit 23, a tilt calculation unit 24, an area selection unit 25, a focusing control unit 26, and an output control unit 27.
The subject identification unit 20 detects and identifies a subject such as a person or an object in the image to be processed by a known subject detection technique. As techniques for detecting a subject, for example, face/object recognition based on template matching, matching based on luminance distribution information of the subject, and methods based on skin-colored portions in the image, feature amounts of a human face, and the like can be used. These techniques may also be combined to improve detection accuracy.
Furthermore, the subject may be identified by performing image recognition processing using machine learning techniques such as a CNN (Convolutional Neural Network), or may be identified using information about the subject input by the user.
The site specifying unit 21 executes processing for identifying the positions of parts of the subject's face in the captured image. The facial parts to be identified are, for example, the pupils, nose, chin, head, cheeks, ears, and mouth.
The facial parts are identified, for example, by machine learning such as a CNN. In a CNN, the feature amounts of the image are compressed by performing convolution processing and pooling processing. This makes it possible to identify the parts of the subject's face.
The defocus amount calculation unit 22 calculates the distance between the subject's face and the image sensor 12 by calculating the defocus amount for each target area. By combining the defocus amount of each target area with the position information of the facial parts of the subject detected by the site specifying unit 21, the distance from the image sensor 12 can be calculated for each part.
The average data selection unit 23 performs processing for selecting average face data (hereinafter referred to as "face average data") on the basis of attribute information of the subject.
The attribute information is, for example, gender, age, or race. The age need not be a specific age; it may be a rough bracket such as an age group (teens, 20s, and so on), or an even rougher distinction such as "adult" or "child".
The face average data is distance information between parts, such as the occiput-to-nose-tip distance, total head height, interpupillary width, head circumference, head length, head width, zygomatic arch width, mandibular angle width, intertragal width, intermastoid width, inner canthal width, outer canthal width, mouth width, morphological face height, nose height, and subnasale-to-chin distance.
The following description is based on an example in which two sets of face average data consisting of these various kinds of distance information are provided: one "for adults" and one "for children".
To select the face average data, first, position information about facial parts such as the subject's nose, pupils, and chin is obtained as a result of image recognition using machine learning. Next, distance information between the facial parts (for example, the interpupillary width and the nose height) is calculated, and this information is compared with the plural sets of face average data (the face average data for children and the face average data for adults) to select the best-matching set.
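As an illustrative sketch only (in Python), the selection of the best-matching face average data could proceed as follows; the field names, numeric values, and the sum-of-relative-differences metric are assumptions for illustration and are not taken from the present disclosure.

# Illustrative sketch: selecting the best-matching face average data.
# The field names and the sum-of-relative-differences metric are assumptions.

FACE_AVERAGE_DATA = {
    "adult": {"interpupillary_width_mm": 62.0, "nose_height_mm": 52.0},
    "child": {"interpupillary_width_mm": 50.0, "nose_height_mm": 40.0},
}

def select_face_average_data(measured: dict) -> str:
    """Return the key of the face average data set closest to the measured
    part-to-part distances (e.g. interpupillary width, nose height)."""
    def mismatch(avg):
        # Sum of relative differences over the distances common to both dicts.
        keys = measured.keys() & avg.keys()
        return sum(abs(measured[k] - avg[k]) / avg[k] for k in keys)
    return min(FACE_AVERAGE_DATA, key=lambda name: mismatch(FACE_AVERAGE_DATA[name]))

# Example: distances estimated from the image (values are hypothetical).
print(select_face_average_data({"interpupillary_width_mm": 48.5, "nose_height_mm": 41.0}))
# -> "child"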
The tilt calculation unit 24 calculates tilt information for the subject identified by the subject identification unit 20. Specifically, for example, it calculates the tilt in the pitch, yaw, and roll directions of the face of the person identified as the subject by the subject identification unit 20. Note that only some of the tilts in the pitch, yaw, and roll directions may be calculated.
Here, the optical axis direction of the imaging device 1 is referred to as the "front-rear direction", the vertical direction as the "up-down direction", and the direction orthogonal to both the front-rear direction and the up-down direction as the "left-right direction".
The tilt of the face in the pitch direction refers to the tilt of the face about the left-right direction as the rotation axis. The tilt of the face in the yaw direction refers to the tilt of the face about the up-down direction as the rotation axis. The tilt in the roll direction refers to the tilt of the face about the front-rear direction as the rotation axis.
In the following description, the tilts in the pitch, yaw, and roll directions are collectively referred to as the "tilt of the face".
Several methods are conceivable for calculating the tilt of the face in each direction. They are described in detail later.
The area selection unit 25 performs processing for selecting, from the target areas, the focusing area that is the region targeted by focusing control.
The focusing area is selected on the basis of which part of the subject is to be brought into focus. For example, when it is desired to focus on the pupil, the target area in which the pupil is captured is selected as the focusing area.
Note that when it is desired to select the target area in which the subject's pupil is captured as the focusing area, that target area may be unknown.
The focusing control unit 26 performs focusing control on the basis of the focusing area selected by the area selection unit 25. Several patterns of focusing control are conceivable. An example in which the focusing area is the region where the subject's pupil is captured will be described.
First, when the focusing area has been selected without any problem and the defocus amount of the focusing area has been calculated appropriately, focusing control is performed on the basis of the defocus amount of the focusing area.
Next, when the focusing area has been selected without any problem but an obstacle located between the pupil and the imaging device 1 (hereinafter referred to as a "forward object") prevents the defocus amount of the focusing area from being calculated appropriately, the distance between the pupil and the image sensor 12 is estimated from the distance between another part of the subject (for example, the nose) and the image sensor 12 and from difference information on the positions of that other part and the pupil in the front-rear direction, and focusing control is performed.
Even when the focusing area cannot be identified, it is possible to estimate the distance between the pupil and the image sensor 12 from the relative positional relationship between another part and the pupil, and to focus accordingly.
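The estimation described above could be sketched roughly as follows; the nose-to-pupil depth offset, its cosine scaling with the pitch tilt, and all numeric values are illustrative assumptions rather than the method prescribed here.

# Illustrative sketch: estimating the camera-to-pupil distance when the
# defocus amount at the pupil cannot be measured directly (e.g. a forward
# object covers the pupil). All names and the 20 mm offset are assumptions.
import math

def estimate_pupil_distance_mm(nose_distance_mm: float,
                               nose_to_pupil_depth_mm: float = 20.0,
                               pitch_deg: float = 0.0) -> float:
    """Estimate the pupil distance from the measured nose distance plus the
    front-rear offset between nose and pupil taken from face average data.
    The effective offset is assumed to shrink as the face tilts in pitch."""
    effective_offset = nose_to_pupil_depth_mm * math.cos(math.radians(pitch_deg))
    # The pupil lies behind the nose tip, i.e. farther from the camera.
    return nose_distance_mm + effective_offset

print(estimate_pupil_distance_mm(2000.0))                   # 2020.0
print(estimate_pupil_distance_mm(2000.0, pitch_deg=30.0))   # ~2017.3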
The output control unit 27 outputs, via the output unit 7, information such as the position of the subject and the position of the face in the captured image, or the various tilts (rotation angles about each axis) calculated by the tilt calculation unit 24.
<3. Corresponding processing using tilt information>
Various kinds of corresponding processing can be executed using the tilt information of the subject's face. Some examples are given below.
<3-1. First example of corresponding processing>
In the first example of the corresponding processing, control is performed so as to focus on the subject's pupil. This is described in detail with reference to the flowchart of Fig. 4.
In step S101, the control unit 9 performs face detection processing. This processing is performed using machine learning such as a CNN, as described above.
Next, in step S102, the control unit 9 determines whether or not the subject's face has been detected, and branches accordingly.
When it is determined that the subject's face could not be detected, the control unit 9, for example in step S103, performs processing that lets the user select one target area, and selects the selected target area as the focusing area. The control unit 9 then proceeds to the processing of step S105.
On the other hand, when it is determined that the subject's face has been detected, the control unit 9 proceeds to the focusing area selection processing of step S104.
Fig. 5 shows an example of the focusing area selection processing.
In the focusing area selection processing, the control unit 9 determines in step S201 whether or not the pupil has been detected. When the subject's pupil has been detected, the control unit 9 attempts, in step S202, to calculate the defocus amount for the target area corresponding to the detected pupil position, and determines whether or not the defocus amount can be calculated.
When it is determined that the defocus amount has been calculated, for example when no obstacle (forward object) exists between the subject's pupil and the imaging device 1, the control unit 9 selects the target area corresponding to the pupil position as the focusing area in step S203 and ends the focusing area selection processing shown in Fig. 5.
On the other hand, when it is determined in step S201 that the pupil could not be detected, or when it is determined in step S202 that the pupil was detected but the defocus amount could not be calculated, the control unit 9 executes the processing of steps S204 to S211 to select a focusing area for focusing on the subject's pupil.
For example, when a forward object such as a branch is located between the pupil and the imaging device 1, the defocus amount may not be calculated normally even if the pupil is detected. In such a case, the processing proceeds to step S204.
In step S204, the control unit 9 calculates the defocus amount for each target area 31A included in the face frame 30.
For example, as shown in Fig. 6, the defocus amount is calculated for each target area 31A located inside the face frame 30 among the plurality of target areas 31 included in the captured image.
Next, in step S205, the control unit 9 creates a histogram of the defocus amounts calculated in step S204 and performs clustering processing.
Fig. 7 is an example of the histogram of the target areas 31A. The horizontal axis of the histogram represents the distance from the image sensor 12, and the vertical axis represents the number of corresponding target areas 31A.
Clustering is performed on the histogram shown in Fig. 7 using a technique such as the k-means method. As a result of the clustering, each target area 31A is classified into a "face cluster" into which areas capturing the face are classified, a "forward object cluster" into which areas capturing forward objects are classified, or a "background object cluster" into which areas capturing background objects are classified.
Fig. 8 shows an example of the classification result. As illustrated, each target area 31A is classified into one of the face cluster, the forward object cluster, and the background cluster according to its distance from the image sensor 12.
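A rough sketch of this clustering step, assuming a simple one-dimensional k-means with k = 3 over the per-area distances (the distances and the initialization scheme are illustrative; the present disclosure only names the k-means method as one possible technique):

# Illustrative sketch: 1-D k-means (k = 3) over per-target-area distances,
# splitting the areas inside the face frame into forward-object, face and
# background clusters. Distances are in millimetres and purely illustrative.

def kmeans_1d(values, k=3, iterations=50):
    """Cluster scalar values into k groups; returns (centers, labels)."""
    centers = sorted(values)[:: max(1, len(values) // k)][:k]  # spread initial centers
    labels = [0] * len(values)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        new_centers = []
        for c in range(k):
            members = [v for v, l in zip(values, labels) if l == c]
            new_centers.append(sum(members) / len(members) if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels

# Distances of target areas 31A inside the face frame (hypothetical values):
# a branch in front (~1.2 m), the face (~2.0 m), background (~5 m).
distances = [1180, 1210, 1985, 1990, 2000, 2010, 2025, 2040, 4900, 5100]
centers, labels = kmeans_1d(distances)
order = sorted(range(3), key=lambda c: centers[c])
names = {order[0]: "forward object", order[1]: "face", order[2]: "background"}
for d, l in zip(distances, labels):
    print(d, names[l])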
When classifying into each cluster, the length of the face in the front-rear direction (head length D1), the distance D2 between the ear and nose positions in the front-rear direction, and the like may be taken into consideration (Fig. 9).
A specific example is given. For example, when the number of imaging pixels of the imaging device 1 is 50 million, the focal length f is 85 mm, the aperture F-number is 1.4, and the permissible circle of confusion diameter is 8.3 μm, then when focusing on a subject located 2 m from the image sensor 12, the front depth of field is 6.41 mm and the rear depth of field is 6.45 mm. This is taken as one depth of field.
Suppose also that, in the face average data, the distance from the center position of the face to the position of the ear is 96 mm. In this case, when the subject's profile is captured, a depth of 96 mm corresponds to about 15 depths of field (= 96 / 6.45). Converted into a defocus amount, 15 depths of field is 134.4 μm. That is, the range that the defocus amount can take is 134.4 μm. The target areas 31A classified into the face cluster may be determined on the basis of this value.
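For reference, the front and rear depth-of-field figures quoted above can be reproduced with the common approximate formulas; which exact formula is intended is not stated here, so the following is only a consistency check under that assumption.

# Sketch reproducing the numerical example: f = 85 mm, F1.4, permissible
# circle of confusion 8.3 um, subject at 2 m. The approximate thin-lens
# depth-of-field formulas are an assumption; they happen to match the
# 6.41 mm / 6.45 mm figures quoted above.

f_mm = 85.0          # focal length
N = 1.4              # aperture F-number
c_mm = 0.0083        # permissible circle of confusion (8.3 um)
s_mm = 2000.0        # subject distance

front_dof = N * c_mm * s_mm**2 / (f_mm**2 + N * c_mm * s_mm)   # ~6.41 mm
rear_dof  = N * c_mm * s_mm**2 / (f_mm**2 - N * c_mm * s_mm)   # ~6.45 mm

# A 96 mm face depth (face center to ear) spans about 15 depths of field.
print(round(front_dof, 2), round(rear_dof, 2), round(96.0 / rear_dof, 1))
# -> 6.41 6.45 14.9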
In step S206 of Fig. 5, the control unit 9 performs processing for excluding the target areas 31A classified into the forward object cluster and the target areas 31A classified into the background object cluster.
Next, in step S207, the control unit 9 estimates the distance between the image sensor 12 and the face from the lens position and defocus amount data. In this processing, for example, the median defocus amount among the defocus amounts of the target areas 31A classified into the face cluster is identified, and the distance between the image sensor 12 and the face is estimated from that median defocus amount. Note that the defocus amounts of other target areas 31A classified into the face cluster may also be used for this calculation.
To calculate the distance between the image sensor 12 and the face on the basis of the lens position and the defocus amount, for example, the following equation (1) is used.
1/f = 1/L1 + 1/L2   ... Equation (1)
Here, the variable f represents the focal length f. The variable L1 represents the distance L1 between the subject and the lens. The variable L2 represents the distance L2 between the lens and the image sensor 12.
Since the variable f and the variable L2 are known, the distance L1 between the subject and the lens can be calculated (see Fig. 10).
Furthermore, by using the following equation (2), the distances between facial parts (for example, the interpupillary width, the inner canthal width, and the outer canthal width) can be calculated.
Y2/Y1 = L2/L1   ... Equation (2)
Here, the variable Y2 represents the distance Y2 on the image sensor 12. The variable Y1 represents the actual distance Y1.
Since the distance Y2, the distance L1, and the distance L2 are known, the distance Y1 can be calculated (see Fig. 10).
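A minimal sketch of equations (1) and (2), solving for the subject-to-lens distance L1 and for the real part-to-part distance Y1; the example values of L2 and Y2 are illustrative.

# Sketch of equations (1) and (2): 1/f = 1/L1 + 1/L2 and Y2/Y1 = L2/L1.
# Example values are illustrative only.

def subject_distance_L1(f_mm: float, L2_mm: float) -> float:
    """Equation (1) solved for L1: the subject-to-lens distance."""
    return 1.0 / (1.0 / f_mm - 1.0 / L2_mm)

def real_distance_Y1(Y2_mm: float, L1_mm: float, L2_mm: float) -> float:
    """Equation (2) solved for Y1: the real distance between two facial parts
    whose separation on the image sensor is Y2."""
    return Y2_mm * L1_mm / L2_mm

f = 85.0       # focal length (mm)
L2 = 88.77     # lens-to-image-sensor distance (mm), known from the lens position
L1 = subject_distance_L1(f, L2)
print(round(L1))                                  # -> 2001 (subject roughly 2 m away)
print(round(real_distance_Y1(2.75, L1, L2), 1))   # -> 62.0 mm, e.g. an interpupillary width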
In step S208 of Fig. 5, the control unit 9 selects face average data on the basis of the distance information. Specifically, one set is selected from the plural sets of prepared face average data (for example, face average data for adults and face average data for children) on the basis of the ratios and magnitudes of the pieces of distance information.
As one example, the difference between the minimum defocus amount and the maximum defocus amount among the target areas 31A classified into the face cluster is calculated, and whether to select the face average data for adults or the face average data for children is decided according to the magnitude of that difference.
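One possible reading of this decision, sketched with the depth extent expressed as distances and an assumed threshold (the threshold value is not specified here):

# Sketch: choosing adult or child face average data from the depth extent of
# the face cluster. The 70 mm threshold is an assumption for illustration.

def choose_face_average_data(face_cluster_distances_mm, threshold_mm=70.0) -> str:
    """Return "adult" when the face spans a large depth range, "child" otherwise."""
    depth_extent = max(face_cluster_distances_mm) - min(face_cluster_distances_mm)
    return "adult" if depth_extent >= threshold_mm else "child"

print(choose_face_average_data([1985, 1990, 2000, 2010, 2025, 2080]))  # "adult" (95 mm extent)
print(choose_face_average_data([1995, 2000, 2010, 2030]))              # "child" (35 mm extent)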
In step S209, the control unit 9 calculates the position of the front pupil from the tilt of the face and the selected face average data. The front pupil is whichever of the subject's two pupils is closer to the imaging device 1. That is, the distance between the front pupil and the image sensor 12 is calculated.
The information on the tilt of the face is the information acquired in the face detection processing of step S101 in Fig. 4.
In step S210, the control unit 9 performs three-axis correction processing of the face angle. In the three-axis correction processing, the angle of the face, that is, the tilt of the face in the pitch, yaw, and roll directions, is calculated, and the distance between the front pupil and the image sensor 12 is corrected.
In step S211, the control unit 9 selects, as the focusing area, a target area 31A whose defocus amount is close to the corrected value. In selecting the focusing area, a target area 31A close to the pupil position may be selected from among the target areas 31A with close defocus amounts.
Returning to the description of Fig. 4.
Having finished the focusing area selection processing, the control unit 9 acquires the defocus amount of the selected focusing area in step S105.
Next, in step S106, the control unit 9 drives the lens according to the acquired defocus amount and performs focusing control. As a result, the front pupil is brought into focus.
In step S107, the control unit 9 determines whether or not the face has been detected. If the face has not been detected, the processing returns to step S101.
On the other hand, if face detection succeeds as a result of the focusing control, the series of processing shown in Fig. 4 ends.
Note that the tilt of the face in the pitch, yaw, and roll directions need not be output by machine learning such as a CNN. For example, the tilt in each direction may be calculated from the pupil position and the nose position.
Specifically, an example of calculating the tilt of the face in the pitch direction will be described with reference to Fig. 11.
Let the distance in the up-down direction between the pupil position and the nose position of a face squarely facing the imaging device 1 be the distance X (see Fig. 11).
For an upward-facing face, in which the subject's face is turned upward, let the distance in the up-down direction between the pupil position and the nose position be the distance Y (see Fig. 12).
Furthermore, for a downward-facing face, in which the subject's face is turned downward, let the distance in the up-down direction between the pupil position and the nose position be the distance Z (see Fig. 13).
It can be seen that the distance Y and the distance Z are shorter than the distance X (see Fig. 14). Accordingly, how many degrees the subject's face is tilted in the pitch direction can be calculated from the value of the distance Y or the distance Z.
Note that the distance X may differ depending on the attributes of the subject (gender, age, race, and so on). In that case, the values of the distance Y and the distance Z also differ depending on the subject's attributes. Specifically, for a child, the graph of Fig. 14 takes a shape compressed in the vertical direction.
Accordingly, an appropriate graph may be selected using the face average data corresponding to the subject's attributes, and the tilt of the face may be calculated from the value of the distance Y or the distance Z.
The tilt angle in the pitch direction may also be calculated according to the ratio of the distance Y (or distance Z) to the distance X.
Whether the face is upward-facing or downward-facing can be determined from the pupil position and nose position relative to the face frame 30. Specifically, if the pupil position and nose position are in the upper part of the face frame 30, the face can be determined to be upward-facing, and if they are in the lower part, the face can be determined to be downward-facing.
The relationship between the change from the distance X and the tilt angle in the pitch direction can be stored in advance in the memory unit 10 of the imaging device 1 or the like, in the form of a mathematical formula, a lookup table, or the like.
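A sketch of the lookup-table approach for the pitch angle, interpolating over the ratio of the measured vertical pupil-nose distance to the frontal reference distance X; the table entries and the sign convention for downward-facing faces are illustrative assumptions.

# Sketch: pitch angle from the vertical pupil-nose distance via a lookup table.
# The table maps (distance Y or Z) / (frontal distance X) to a tilt angle in
# degrees; its entries are illustrative and would in practice be prepared per
# face average data set (adult / child) and stored in the memory unit 10.

RATIO_TO_ANGLE_DEG = [  # (ratio, angle); 1.0 means squarely facing the camera
    (1.00, 0.0),
    (0.94, 15.0),
    (0.87, 30.0),
    (0.71, 45.0),
    (0.50, 60.0),
]

def pitch_angle_deg(pupil_nose_dist_mm: float, frontal_dist_x_mm: float,
                    facing_down: bool) -> float:
    """Interpolate the pitch tilt angle; a negative sign marks downward-facing faces."""
    ratio = max(0.0, min(1.0, pupil_nose_dist_mm / frontal_dist_x_mm))
    angle = RATIO_TO_ANGLE_DEG[-1][1]  # fallback for very small ratios
    for (r1, a1), (r0, a0) in zip(RATIO_TO_ANGLE_DEG, RATIO_TO_ANGLE_DEG[1:]):
        if r0 <= ratio <= r1:
            t = (ratio - r0) / (r1 - r0)
            angle = a0 + t * (a1 - a0)
            break
    return -angle if facing_down else angle

# Distance Y = 34 mm measured, frontal distance X = 38 mm, face judged upward-facing:
print(round(pitch_angle_deg(34.0, 38.0, facing_down=False), 1))  # ~24.7 deg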
The tilt angle of the face not only in the pitch direction but also in the yaw and roll directions can be calculated by similar processing.
For example, for the tilt in the yaw direction, the distance in the left-right direction between the nose position and the pupil position (or ear position) is calculated. Face average data is then selected according to the attribute information of the subject. Furthermore, on the basis of the selected face average data, one graph showing the relationship between the left-right distance between the nose position and the pupil position (or ear position) and the tilt angle is selected, and the tilt angle in the yaw direction is calculated.
In this way, the tilt in the yaw direction can be calculated by using distance information that changes according to the degree of tilt in the yaw direction. The tilt in the roll direction can likewise be calculated by using distance information that changes according to the degree of tilt in the roll direction.
The flow of processing for calculating the tilt of the face in the pitch direction from the subject's pupil position and nose position will be described with reference to Fig. 15.
In step S301, the control unit 9 performs face detection processing. When the tilt of the face is calculated in order to execute the processing of step S209 in Fig. 5, the processing of step S301 may be omitted because the face detection processing has already been performed in step S101 of Fig. 4.
In step S302, the control unit 9 calculates the distance Y (or distance Z) in the up-down direction from the subject's pupil position and nose position.
Next, in step S303, the control unit 9 determines whether the face is upward-facing or downward-facing according to the pupil position and nose position relative to the face frame 30.
Finally, in step S304, the control unit 9 calculates the tilt angle of the face using a mathematical formula, a lookup table, or the like.
After the tilt angle in the pitch direction is calculated in step S304, the processing from step S209 of Fig. 5 onward is performed, and finally step S106 of Fig. 4 is executed, whereby the pupil position can be brought into focus. That is, focusing on the chin in the case of an upward-facing face and focusing on the head in the case of a downward-facing face are prevented.
<3-2. Second example of corresponding processing>
In the second example of the corresponding processing, processing other than focusing control is performed using the tilt information of the subject's face in the pitch, yaw, and roll directions.
This processing may be executed by the signal processing unit 4 or the control unit 9 of the imaging device 1. Alternatively, the tilt information of the face may be output from the output unit 7 and used by an information processing device or the like provided outside the imaging device 1 serving as the image processing device.
For example, the tilt information of the subject's face in the captured image can be used for white balance adjustment. Specifically, when the subject in the captured image has a downward-facing face, normal white balance adjustment may leave the face with a bluish cast. In such a case, strengthening orange and red, which are complementary to blue, makes it possible to perform white balance adjustment that reproduces a natural complexion.
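As a hedged illustration of this idea, the tilt could feed into the white balance gains roughly as follows; the 30-degree threshold and the 5 % gain offsets are assumptions, and an actual white balance pipeline would be considerably more involved.

# Sketch: nudging white balance gains toward orange/red when the subject's
# face is strongly downward-facing, to avoid a bluish complexion. The 30 deg
# threshold and 5 % gain offsets are illustrative assumptions.

def adjust_wb_gains(r_gain: float, g_gain: float, b_gain: float,
                    pitch_deg: float) -> tuple:
    """pitch_deg is assumed negative for a downward-facing face."""
    if pitch_deg <= -30.0:
        r_gain *= 1.05   # strengthen red/orange
        b_gain *= 0.95   # weaken blue
    return (r_gain, g_gain, b_gain)

print(adjust_wb_gains(2.0, 1.0, 1.6, pitch_deg=-40.0))  # roughly (2.1, 1.0, 1.52)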
Post-production is another example of using the face tilt information in an information processing device outside the imaging device 1.
On the basis of the captured image and the face tilt information output as metadata, processing such as automatically adjusting the position of a speech balloon for the person (subject), or emphasizing what the person is looking at by moving an image in the direction of the person's line of sight (see Figs. 16 and 17), is conceivable.
In addition to the tilt information, the metadata output together with the captured image may include coordinate information of the pupil position and coordinate information of the nose position. Furthermore, as information rougher than the angle information, flag information identifying whether the face is upward-facing or downward-facing may be output, for example.
Of course, not all of this information needs to be output as metadata; only part of it may be output as metadata.
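For illustration only, a metadata record accompanying one frame might look like the following; every field name and value is an assumption, since only the kinds of information that may be included are specified here.

# Illustrative metadata record for one captured frame. Field names and values
# are assumptions; the text only says that tilt angles, pupil/nose coordinates
# and an upward/downward flag may be output, in whole or in part.

frame_metadata = {
    "face_tilt_deg": {"pitch": -18.5, "yaw": 7.0, "roll": 1.2},
    "pupil_positions_px": {"left": (812, 455), "right": (948, 452)},
    "nose_position_px": (881, 540),
    "face_orientation_flag": "downward",   # coarse alternative to the angles
}

# A post-production tool could, for example, place a speech balloon or shift
# an image along the direction indicated by these angles.
print(frame_metadata["face_orientation_flag"])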
<4. Summary>
As described in each of the above examples, the image processing device (for example, the imaging device 1) according to the present technology includes the tilt calculation unit 24 that calculates the tilt of the subject's face in the pitch direction, and corresponding processing units (the focusing control unit 26 and the output control unit 27) that perform corresponding processing using the calculated tilt.
By calculating the tilt of the subject's face and using it in various kinds of processing, optimal focusing control, white balance adjustment, and the like can be performed.
As described with reference to Figs. 3 and 11 to 15, the device may include the site specifying unit 21 that detects the position of the nose and the positions of the pupils of the subject, and the tilt calculation unit 24 may calculate the angle of the tilt of the face in the pitch direction on the basis of the detected nose position and pupil positions.
This makes it possible to calculate the angle of the face in the pitch direction with high accuracy.
Accordingly, focusing control and the like can be performed with higher accuracy. For example, when the subject's hair is black, if the head is brought into focus and white balance adjustment is then performed with reference to the region where the head appears, the entire image becomes bright. Also, under outdoor shooting conditions with an upward-facing face, adjustment made to prevent the face from becoming too bright can result in the entire image becoming too dark. In such cases, calculating the face angle with high accuracy makes it possible to perform appropriate focusing control and white balance adjustment.
Furthermore, since the tilt information is calculated with high accuracy, effective presentations can easily be created in the post-production described in the second example of the corresponding processing.
As described in the first example of the corresponding processing (Fig. 15) and elsewhere, the tilt calculation unit 24 may use face average data, which is average distance information about faces, when calculating the angle.
This makes it possible to calculate the angle of the face in the pitch direction with even higher accuracy.
Accordingly, focusing control and the like can be performed with higher accuracy, and effective presentations can be created in post-production.
As described in the first example of the corresponding processing (Figs. 3 and 5) and elsewhere, the device may include the average data selection unit 23 that selects one set of face average data from sets provided at least for each race or age, and the tilt calculation unit 24 may calculate the angle on the basis of the one set of face average data selected by the average data selection unit 23.
As a result, the angle of the face is calculated with high accuracy on the basis of face average data corresponding to gender, age, race, and the like.
Accordingly, focusing control and the like can be performed with even higher accuracy, and more effective presentations can be created in post-production.
 対応処理の第1例(図5,図7,図8)などで説明したように、平均データ選択部23は、画像上に設けられた複数の測距エリア(対象領域31)ごとの距離情報のクラスタリングを行い、クラスタリングの結果として得られる被写体の顔の距離情報の分布から被写体の顔の奥行き方向のサイズを算出し、算出された顔の奥行き方向のサイズに基づいて顔平均データの選択を行ってもよい。
 これにより、子供の顔は小さく大人の顔は大きいことに基づいて顔平均データが選択される。
 従って、被写体が子供であっても大人であっても合焦制御やポストプロダクションに係る処理等を高精度に行うことができる。
As described in the first example of the corresponding processing (FIGS. 5, 7, 8), the average data selection unit 23 is the distance information for each of the plurality of ranging areas (target areas 31) provided on the image. The size of the subject's face in the depth direction is calculated from the distribution of the distance information of the subject's face obtained as a result of the clustering, and the face average data is selected based on the calculated size of the face in the depth direction. You may go.
As a result, the face average data is selected based on the fact that the child's face is small and the adult's face is large.
Therefore, regardless of whether the subject is a child or an adult, focusing control, post-production processing, and the like can be performed with high accuracy.
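A minimal sketch of this selection step, assuming a very simple two-way split of the ranging-area distances into a face cluster and a background cluster, and an assumed threshold on the face's depth-direction size; the threshold value and the labels are illustrative and not taken from the publication.

def select_face_average(distances_m, adult_threshold_m=0.10):
    # Crude "clustering": split foreground (face) from background at the
    # midpoint between the nearest and farthest ranging-area distances.
    split = (min(distances_m) + max(distances_m)) / 2.0
    face_cluster = [d for d in distances_m if d < split]
    # The spread of the face cluster approximates the face's depth-direction size.
    depth_size = max(face_cluster) - min(face_cluster)
    label = "adult" if depth_size >= adult_threshold_m else "child"
    return label, depth_size

# Ranging-area distances in meters: face points near 1.5 m, a wall at 3 m.
print(select_face_average([1.50, 1.52, 1.55, 1.61, 3.0, 3.0, 3.0]))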
As described in the first example of the corresponding processing, the inclination calculation unit 24 may also calculate the inclination of the subject's face in the yaw direction and the inclination in the roll direction by using the face average data. This makes it possible to calculate the angle of inclination of the face in every direction.
Therefore, for example, when performing focusing control, the distance to the pupil can be appropriately calculated, and highly accurate focusing control can be performed.
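The publication derives yaw and roll with the help of the face average data; the simplified sketch below instead uses only 2D landmark geometry (roll from the line joining the two pupils, yaw from the horizontal deviation of the nose from the pupils' midpoint), so it should be read as an illustrative substitute rather than the described method.

import math

def estimate_roll_yaw_deg(left_pupil, right_pupil, nose):
    (lx, ly), (rx, ry), (nx, _ny) = left_pupil, right_pupil, nose
    # Roll: angle of the inter-pupil line relative to the horizontal axis.
    roll = math.degrees(math.atan2(ry - ly, rx - lx))
    # Yaw: horizontal nose deviation normalized by half the inter-pupil distance.
    mid_x = (lx + rx) / 2.0
    half_ipd = math.hypot(rx - lx, ry - ly) / 2.0
    ratio = max(-1.0, min(1.0, (nx - mid_x) / half_ipd))
    yaw = math.degrees(math.asin(ratio))
    return roll, yaw

print(estimate_roll_yaw_deg((100, 200), (160, 202), (128, 240)))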
As described in the first example of the corresponding processing (FIGS. 15, 4, 5, and the like), when the face of the subject is determined to be an upwardly tilted face based on the inclination calculated by the inclination calculation unit 24, the corresponding processing unit (focusing control unit 26) may perform, as the corresponding processing, focusing control at a position on the subject other than the chin.
This prevents the chin from being in focus.
Therefore, appropriate focusing control can be performed.
As described in the first example of the corresponding processing (FIGS. 15, 4, 5, and the like), when the face of the subject is determined to be a downward-facing face based on the inclination calculated by the inclination calculation unit 24, the corresponding processing unit (focusing control unit 26) may perform, as the corresponding processing, focusing control at a position on the subject other than the head.
This prevents the head from being in focus.
Therefore, appropriate focusing control can be performed.
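The following sketch condenses the two judgments above into a single focus-target decision; the threshold value, the target labels, and the rule of preferring the pupil whenever it is usable are assumptions made for illustration.

def choose_focus_target(pitch_deg, pupil_available, tilt_threshold_deg=10.0):
    if pupil_available:
        return "pupil"
    if pitch_deg >= tilt_threshold_deg:       # upwardly tilted face
        return "face_area_excluding_chin"
    if pitch_deg <= -tilt_threshold_deg:      # downward-facing face
        return "face_area_excluding_head"
    return "face_center"

print(choose_focus_target(15.0, pupil_available=False))   # face_area_excluding_chin
print(choose_focus_target(-15.0, pupil_available=False))  # face_area_excluding_head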
As described in the first example of the corresponding processing (FIGS. 4 to 15), the corresponding processing unit (focusing control unit 26) may perform focusing control at the position of the pupil of the subject as the corresponding processing.
For example, when the depth of field is extremely shallow, such as in the high-resolution image pickup apparatus 1, the pupil may become unclear.
Even in such a case, the distance to the pupil can be appropriately calculated and appropriate focusing control can be executed.
As described in the second example of the corresponding processing (FIGS. 16 and 17), the corresponding processing unit (output control unit 27) may perform a process of outputting tilt information as the corresponding process.
As a result, an external device, for example, can perform processing using the tilt information of the face.
For example, it is possible to perform a process of arranging an image ahead of the subject's line of sight according to the direction of the face. Such processing is performed, for example, in post-production.
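One possible form of such an output, assuming the tilt is attached to per-frame metadata that a post-production tool can read; the JSON layout and field names are illustrative assumptions only.

import json

def tilt_metadata(frame_index, pitch_deg, yaw_deg, roll_deg):
    # Per-frame record carrying the calculated face tilt for downstream tools.
    return json.dumps({
        "frame": frame_index,
        "face_tilt_deg": {"pitch": pitch_deg, "yaw": yaw_deg, "roll": roll_deg},
    })

print(tilt_metadata(120, pitch_deg=12.5, yaw_deg=-3.0, roll_deg=1.2))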
As described with reference to FIG. 3 and the like, when the position of the pupil can be detected by the site specifying unit 21 but the defocus amount for the position of the pupil cannot be calculated, the corresponding processing unit (focusing control unit 26) may perform focusing control using a defocus amount estimated for the position of the pupil as the corresponding processing.
For example, when a foreground object such as a branch is located between the pupil and the imaging device 1, the defocus amount may not be calculated appropriately.
In such a case, appropriate focusing control can be performed by appropriately estimating the amount of defocus according to the distance to the pupil.
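A hedged sketch of one way such an estimate could be formed: take the defocus amount of a nearby measurable landmark (here the nose) and shift it by the pupil/nose depth offset taken from the face average data, scaled by an assumed defocus-per-meter sensitivity of the current lens state. The names and values are illustrative.

def estimate_pupil_defocus(nose_defocus, nose_to_pupil_depth_m, defocus_per_meter):
    # Extrapolate the measured nose defocus to the (occluded) pupil position.
    return nose_defocus + nose_to_pupil_depth_m * defocus_per_meter

# Pupil assumed 18 mm farther from the camera than the nose tip.
print(estimate_pupil_defocus(nose_defocus=0.02,
                             nose_to_pupil_depth_m=0.018,
                             defocus_per_meter=1.5))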
The image processing method of the present technology calculates the inclination of the face of the subject in the pitch direction, and performs the corresponding processing using the calculated inclination.
The imaging device 1 of the present technology includes an imaging unit 3 that images a subject, an inclination calculation unit 24 that calculates the inclination of the subject's face in the pitch direction, and a control unit 9 that performs focusing control according to the calculated inclination.
A program for realizing such an image processing device or the imaging device 1 can be recorded in advance in an HDD serving as a recording medium built into a device such as the imaging device 1, or in a ROM or the like in a microcomputer having a CPU.
Alternatively, the program can be temporarily or permanently stored (recorded) on a removable recording medium such as a flexible disc, a CD-ROM (Compact Disc Read Only Memory), an MO (Magneto Optical) disc, a DVD (Digital Versatile Disc), a Blu-ray Disc (registered trademark), a magnetic disc, a semiconductor memory, or a memory card. Such a removable recording medium can be provided as so-called package software.
In addition to being installed on a personal computer or the like from a removable recording medium, such a program can also be downloaded from a download site via a network such as a LAN (Local Area Network) or the Internet.
Such a program is suitable for widely providing the image processing device (such as the imaging device 1) of the embodiment. For example, by downloading the program to a portable terminal device such as a smartphone or tablet equipped with a camera function, a mobile phone, a personal computer, a game device, a video device, a PDA (Personal Digital Assistant), or the like, these devices can be made to function as the imaging device 1 of the present disclosure.
It should be noted that the effects described in the present specification are merely examples and are not limiting, and other effects may be obtained.
<5. This technology>
The present technology can also adopt the following configurations.
(1)
A tilt calculation unit that calculates the tilt of the subject's face in the pitch direction,
An image processing device including a corresponding processing unit that performs corresponding processing using the calculated inclination.
(2)
A part specifying part for detecting the position of the nose and the position of the pupil in the subject is provided.
The image processing device according to (1) above, wherein the tilt calculation unit calculates the angle of the tilt in the pitch direction with respect to the face based on the detected position of the nose and the position of the pupil.
(3)
The image processing apparatus according to (2) above, wherein the inclination calculation unit uses face average data obtained as average distance information about a face in calculating the angle.
(4)
It is provided with an average data selection unit that selects one from the face average data provided at least for each race or age.
The image processing apparatus according to (3) above, wherein the inclination calculation unit calculates the angle based on one face average data selected by the average data selection unit.
(5)
The image processing apparatus according to (4) above, wherein the average data selection unit clusters distance information for each of a plurality of ranging areas provided on the image, calculates the size of the subject's face in the depth direction from the distribution of the distance information of the subject's face obtained as a result of the clustering, and selects the face average data based on the calculated size of the face in the depth direction.
(6)
The image processing apparatus according to (3) above, wherein the inclination calculation unit calculates the inclination in the yaw direction and the inclination in the roll direction with respect to the face of the subject by using the face average data.
(7)
When the face of the subject is determined to be an upwardly tilted face based on the inclination calculated by the inclination calculation unit,
The image processing apparatus according to any one of (1) to (6) above, wherein the corresponding processing unit performs focusing control at a position other than the chin on the subject as the corresponding processing.
(8)
When the face of the subject is determined to be a downward-facing face based on the inclination calculated by the inclination calculation unit,
The image processing apparatus according to any one of (1) to (7) above, wherein the corresponding processing unit performs focusing control at a position other than the head of the subject as the corresponding processing.
(9)
The image processing apparatus according to any one of (1) to (8) above, wherein the corresponding processing unit controls focusing at the position of the pupil of the subject as the corresponding processing.
(10)
The image processing apparatus according to any one of (1) to (9) above, wherein the corresponding processing unit performs a process of outputting information on the inclination as the corresponding process.
(11)
The image processing apparatus according to any one of (2) to (6) above, wherein, when the position of the pupil can be detected by the site identification unit but the defocus amount for the position of the pupil cannot be calculated, the corresponding processing unit performs focusing control using a defocus amount estimated for the position of the pupil as the corresponding processing.
(12)
Calculate the inclination of the subject's face in the pitch direction,
An image processing method that performs corresponding processing using the calculated inclination.
(13)
An imaging unit that captures the subject and
A tilt calculation unit that calculates the tilt of the subject's face in the pitch direction,
An imaging device including a control unit that performs focusing control according to the calculated inclination.
1 Imaging device (image processing device)
3 Imaging unit
21 Site identification unit
23 Average data selection unit
24 Tilt calculation unit
26 Focus control unit (corresponding processing unit)
27 Output control unit (corresponding processing unit)

Claims (13)

  1.  An image processing device comprising:
      a tilt calculation unit that calculates a tilt of a subject's face in a pitch direction; and
      a corresponding processing unit that performs corresponding processing using the calculated tilt.
  2.  The image processing device according to claim 1, further comprising a part specifying unit that detects a position of a nose and a position of a pupil in the subject,
      wherein the tilt calculation unit calculates an angle of the tilt of the face in the pitch direction based on the detected position of the nose and the detected position of the pupil.
  3.  The image processing device according to claim 2, wherein the tilt calculation unit uses, in calculating the angle, face average data serving as average distance information about a face.
  4.  The image processing device according to claim 3, further comprising an average data selection unit that selects one set of face average data from the face average data provided for at least each race or age,
      wherein the tilt calculation unit calculates the angle based on the one set of face average data selected by the average data selection unit.
  5.  The image processing device according to claim 4, wherein the average data selection unit clusters distance information of each of a plurality of ranging areas provided on an image, calculates a size of the subject's face in a depth direction from a distribution of the distance information of the subject's face obtained as a result of the clustering, and selects the face average data based on the calculated size of the face in the depth direction.
  6.  The image processing device according to claim 3, wherein the tilt calculation unit calculates a tilt of the subject's face in a yaw direction and a tilt in a roll direction by using the face average data.
  7.  The image processing device according to claim 1, wherein, when the subject's face is determined to be an upwardly tilted face based on the tilt calculated by the tilt calculation unit, the corresponding processing unit performs, as the corresponding processing, focusing control at a position on the subject other than the chin.
  8.  The image processing device according to claim 1, wherein, when the subject's face is determined to be a downward-facing face based on the tilt calculated by the tilt calculation unit, the corresponding processing unit performs, as the corresponding processing, focusing control at a position on the subject other than the head.
  9.  The image processing device according to claim 1, wherein the corresponding processing unit performs, as the corresponding processing, focusing control at a position of a pupil of the subject.
  10.  The image processing device according to claim 1, wherein the corresponding processing unit performs, as the corresponding processing, a process of outputting information on the tilt.
  11.  The image processing device according to claim 2, wherein, when the position of the pupil can be detected by the part specifying unit but a defocus amount for the position of the pupil cannot be calculated, the corresponding processing unit performs focusing control using a defocus amount estimated for the position of the pupil as the corresponding processing.
  12.  An image processing method comprising:
      calculating a tilt of a subject's face in a pitch direction; and
      performing corresponding processing using the calculated tilt.
  13.  An imaging device comprising:
      an imaging unit that images a subject;
      a tilt calculation unit that calculates a tilt of the subject's face in a pitch direction; and
      a control unit that performs focusing control according to the calculated tilt.
PCT/JP2021/011225 2020-04-14 2021-03-18 Image processing device, image processing method, and imaging device WO2021210340A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2020-072423 2020-04-14
JP2020072423 2020-04-14

Publications (1)

Publication Number Publication Date
WO2021210340A1 true WO2021210340A1 (en) 2021-10-21

Family

ID=78083692

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2021/011225 WO2021210340A1 (en) 2020-04-14 2021-03-18 Image processing device, image processing method, and imaging device

Country Status (1)

Country Link
WO (1) WO2021210340A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2009105851A (en) * 2007-10-25 2009-05-14 Sony Corp Imaging apparatus, control method and program thereof
JP2010108167A (en) * 2008-10-29 2010-05-13 Toyota Motor Corp Face recognition device
JP2012123301A (en) * 2010-12-10 2012-06-28 Olympus Imaging Corp Imaging apparatus
JP2017198996A (en) * 2017-05-18 2017-11-02 カシオ計算機株式会社 Imaging device, imaging method, and program


Similar Documents

Publication Publication Date Title
US10009540B2 (en) Image processing device, image capturing device, and image processing method for setting a combination parameter for combining a plurality of image data
JP5136669B2 (en) Image processing apparatus, image processing method, and program
JP5115139B2 (en) Composition determination apparatus, composition determination method, and program
KR101795601B1 (en) Apparatus and method for processing image, and computer-readable storage medium
US8494301B2 (en) Refocusing images using scene captured images
US8786749B2 (en) Digital photographing apparatus for displaying an icon corresponding to a subject feature and method of controlling the same
JP4974812B2 (en) Electronic camera
US9065998B2 (en) Photographing apparatus provided with an object detection function
JP2007150601A (en) Electronic camera
JP2017069776A (en) Imaging apparatus, determination method and program
CN104702824A (en) Image capturing apparatus and control method of image capturing apparatus
JP2007124282A (en) Imaging apparatus
JP2007281647A (en) Electronic camera and image processing apparatus
JP5370555B2 (en) Imaging apparatus, imaging method, and program
WO2020195073A1 (en) Image processing device, image processing method, program, and imaging device
US8514305B2 (en) Imaging apparatus
WO2021210340A1 (en) Image processing device, image processing method, and imaging device
WO2020195198A1 (en) Image processing device, image processing method, program, and imaging device
US11108944B2 (en) Image processing apparatus for providing information for confirming depth range of image, control method of the same, and storage medium
US20240119565A1 (en) Image processing device, image processing method, and program
JP4773924B2 (en) IMAGING DEVICE, ITS CONTROL METHOD, PROGRAM, AND STORAGE MEDIUM
JP5640466B2 (en) Digital camera
JP2011197995A (en) Image processor and image processing method
JP5446619B2 (en) Digital camera and image data editing program
JP2024013019A (en) Control device, imaging apparatus, control method, and program

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21787665

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21787665

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: JP