WO2022224498A1 - Recognition device, recognition method, and program - Google Patents

Recognition device, recognition method, and program

Info

Publication number
WO2022224498A1
Authority
WO
WIPO (PCT)
Prior art keywords
recognition
image
sensor
recognition target
lidar sensor
Prior art date
Application number
PCT/JP2022/000218
Other languages
French (fr)
Japanese (ja)
Inventor
達雄 藤原
Original Assignee
ソニーグループ株式会社
Priority date
Filing date
Publication date
Application filed by ソニーグループ株式会社
Priority to CN202280028267.4A (CN117178293A)
Publication of WO2022224498A1

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01B MEASURING LENGTH, THICKNESS OR SIMILAR LINEAR DIMENSIONS; MEASURING ANGLES; MEASURING AREAS; MEASURING IRREGULARITIES OF SURFACES OR CONTOURS
    • G01B 11/00 Measuring arrangements characterised by the use of optical techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/50 Depth or shape recovery
    • G06T 7/521 Depth or shape recovery from laser ranging, e.g. using interferometry; from the projection of structured light
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/50 Depth or shape recovery
    • G06T 7/55 Depth or shape recovery from multiple images
    • G06T 7/593 Depth or shape recovery from multiple images from stereo images

Definitions

  • the present technology relates to a recognition device, a recognition method, and a program related to recognition of a recognition target.
  • Patent Document 1 describes providing the user with an image of the user reaching out for the virtual object in an augmented reality image in which the virtual object is superimposed on the camera image.
  • an object of the present technology is to provide a recognition device, a recognition method, and a program capable of improving recognition accuracy of a recognition target object.
  • a recognition device includes a processing unit.
  • the processing unit captures an image of a LiDAR (Light Detection and Ranging) sensor having a light emitting unit that irradiates a recognition target with light and a light receiving unit that receives light reflected from the recognition target, and the recognition target.
  • a LiDAR Light Detection and Ranging
  • the depth value of the recognition target obtained by the LiDAR sensor of the device equipped with an image sensor. to correct.
  • the depth correction information may include difference information between the depth value of the recognition target object based on the sensing result of the LiDAR sensor and the actual depth value of the recognition target object.
  • the device comprises a plurality of image sensors and one LiDAR sensor
  • the depth correction information includes the depth value of the recognition target calculated by triangulation using the position information of the recognition target detected from the sensing results of each of the plurality of image sensors, and the sensing result of the LiDAR sensor. may include difference information from the depth value of the recognition target object based on the depth image as .
  • the device comprises at least one image sensor and one LiDAR sensor;
  • the depth correction information is the position information of the recognition target detected from the sensing result of one of the image sensors and the position information of the recognition target detected from the reliability image as the sensing result of the LiDAR sensor. and the depth value of the recognition target object calculated by triangulation using the LiDAR sensor, and the difference information between the depth value of the recognition target object based on the depth image as the sensing result of the LiDAR sensor.
  • the object to be recognized may be a translucent body.
  • the object to be recognized may be human skin.
  • the object to be recognized may be a human hand.
  • the processing unit may recognize a gesture motion of a person who is the object to be recognized.
  • the processing unit may generate the depth correction information using the sensing result of the LiDAR sensor and the sensing result of the image sensor.
  • the device has a display
  • the processing unit may generate an image to be displayed on the display unit using the corrected depth value of the recognition target object.
  • a recognition method includes a LiDAR (Light Detection and Ranging) sensor having a light emitting unit that irradiates light onto an object to be recognized and a light receiving unit that receives light reflected from the object to be recognized, and the object to be recognized.
  • Depth correction information generated using the sensing result of the LiDAR sensor and the sensing result of the image sensor, the depth value of the recognition target acquired by the LiDAR sensor of the device comprising an image sensor that captures the Refer to and correct.
  • the program related to this technology is A LiDAR (Light Detection and Ranging) sensor having a light emitting section that irradiates light onto a recognition target and a light receiving section that receives light reflected from the recognition target, and an image sensor that captures the recognition target.
  • a step of correcting the depth value of the recognition object acquired by the LiDAR sensor of the device by referring to depth correction information generated using the sensing result of the LiDAR sensor and the sensing result of the image sensor. Let the device do it.
  • FIG. 1 is an external view of a mobile terminal as a recognition device according to an embodiment of the present technology.
  • FIG. 2 is a schematic configuration diagram of the mobile terminal.
  • FIG. 3 is a configuration diagram including functional configuration blocks of the mobile terminal.
  • FIG. 4 is a flowchart of a recognition method for a recognition target object.
  • FIG. 5 is a diagram for explaining a correction map.
  • FIG. 6 is a schematic diagram illustrating a method of generating a correction map according to the first embodiment.
  • FIG. 7 is a flowchart of a correction map generation method according to the first embodiment.
  • FIG. 8 is a diagram for explaining a basic image displayed on the display unit when generating a correction map.
  • FIG. 9 is a diagram for explaining a more detailed image displayed on the display unit when generating a correction map.
  • FIG. 10 is a flowchart relating to a method of displaying an image on the display unit when generating a correction map.
  • FIG. 11 is a schematic diagram illustrating a method of generating a correction map according to the second embodiment.
  • FIG. 12 is a flowchart of a correction map generation method according to the second embodiment.
  • FIG. 1 is an external view of a mobile terminal 1 as a recognition device.
  • FIG. 1A is a plan view of the mobile terminal 1 as seen from the front 1a side where the display unit 34 is located
  • FIG. 1B is a plan view of the mobile terminal 1 as seen from the back 1b side.
  • the XYZ coordinate directions orthogonal to each other shown in the drawings correspond to the width, length, and height of the mobile terminal 1, which has a substantially rectangular parallelepiped shape.
  • a plane parallel to the front surface 1a and the rear surface 1b is defined as an XY plane
  • the thickness direction of the mobile terminal 1 corresponding to the height direction is defined as the Z axis.
  • the mobile terminal 1 functions as a recognition device that recognizes a recognition target object.
  • the mobile terminal 1 is a device having a first camera 2A and a second camera 2B, which are image sensors, a LiDAR sensor 3, and a display unit 34.
  • a mobile terminal 1 is a device having a multi-view camera.
  • the mobile terminal 1 has a housing 4, a display section 34, a first camera 2A, a second camera 2B, and a LiDAR sensor 3.
  • the mobile terminal 1 is configured such that a housing 4 holds a display panel constituting a display unit 34, a first camera 2A, a second camera 2B, a LiDAR sensor 3, other various sensors, a drive circuit, and the like.
  • the mobile terminal 1 has a front surface 1a and a rear surface 1b located on the opposite side of the front surface 1a.
  • a display section 34 is arranged on the front face 1a side.
  • the display unit 34 is configured by a display panel (image display means) such as a liquid crystal display or an organic EL display (Organic Electro-Luminescence Display).
  • the display unit 34 can display images transmitted to and received from an external device through a communication unit 41 described later, images generated by a display image generation unit 54 described later, input operation buttons, through images captured by the first camera 2A and the second camera 2B, and the like. Images include still images and moving images.
  • the imaging lens of the first camera 2A, the imaging lens of the second camera 2B, and the imaging lens of the LiDAR sensor 3 are positioned on the rear surface 1b side.
  • the first camera 2A, the second camera 2B, and the LiDAR sensor 3 are preliminarily calibrated so that the coordinate values of the same recognition object (subject) sensed in the shooting space are the same.
  • RGB information (RGB image data)
  • depth information (depth image data)
  • FIG. 2 is a schematic configuration diagram of the mobile terminal 1.
  • FIG. 3 is a configuration diagram including functional configuration blocks of the mobile terminal 1.
  • As shown in FIG. 2, the mobile terminal 1 includes a sensor unit 10, a communication unit 41, a CPU (Central Processing Unit) 42, a display unit 34, a GNSS reception unit 44, a main memory 45, a flash memory 46, an audio device unit 47, and a battery 48.
  • the sensor unit 10 includes imaging devices such as the first camera 2A, the second camera 2B, and the LiDAR sensor 3 and various sensors such as the touch sensor 43 .
  • the touch sensor 43 is typically arranged on a display panel that constitutes the display section 34 .
  • the touch sensor 43 receives input operations such as settings performed by the user on the display unit 34 .
  • the communication unit 41 is configured to communicate with an external device.
  • the CPU 42 controls the entire mobile terminal 1 by executing an operating system.
  • the CPU 42 also executes various programs read from a removable recording medium and loaded into the main memory 45 or downloaded via the communication section 41 .
  • the GNSS receiver 44 is a Global Navigation Satellite System (GNSS) signal receiver.
  • the GNSS receiver 44 acquires position information of the mobile terminal 1 .
  • the main memory 45 is composed of a RAM (Random Access Memory) and stores programs and data necessary for processing. Flash memory 46 is an auxiliary storage device. Audio device section 47 includes a microphone and a speaker. A battery 48 is a power source for driving the mobile terminal 1 .
  • As shown in FIG. 3, the mobile terminal 1 has a sensor unit 10, a processing unit 50, a storage unit 56, and a display unit 34.
  • In the sensor unit 10 of FIG. 3, only the main sensors related to the present technology are illustrated.
  • the sensing results of the first camera 2A, the second camera 2B, and the LiDAR sensor 3 included in the sensor unit 10 are output to the processing unit 50.
  • the first camera 2A and the second camera 2B have the same configuration.
  • the camera 2 is an RGB camera capable of capturing a color two-dimensional image (also called an RGB image) of a subject as image data.
  • the RGB image is the sensing result of camera 2 .
  • the camera 2 is an image sensor that captures an image of a recognition target (object).
  • the image sensor is, for example, a CCD (Charge-Coupled Device) sensor or a CMOS (Complementary Metal Oxide Semiconductor) sensor.
  • the image sensor has a photodiode, which is a light receiving portion, and a signal processing circuit. In the image sensor, the light received by the light receiving portion is subjected to signal processing by a signal processing circuit, and image data corresponding to the amount of light incident on the light receiving portion is obtained.
  • The LiDAR sensor 3 captures a depth image (also referred to as a distance image) of a recognition target (subject).
  • a depth image is a sensing result of the LiDAR sensor 3 .
  • a depth image is depth information including a depth value of a recognition object.
  • the LiDAR sensor 3 is a ranging sensor that uses remote sensing technology (LiDAR: Light Detection and Ranging) using laser light.
  • LiDAR sensors include a ToF (Time of flight) method and an FMCW (Frequency Modulated Continuous Wave) method, and although either method may be used, the ToF method can be preferably used.
  • Hereinafter, a ToF-type LiDAR sensor is referred to as a ToF sensor.
  • There are two types of ToF sensors, the "direct method" and the "indirect method", and either type of ToF sensor may be used.
  • the "direct method” irradiates a subject with a light pulse that emits light for a short time, and measures the time it takes for the reflected light to reach the ToF sensor.
  • the “indirect method” uses light that blinks periodically and detects the time delay as the phase difference when the light makes a round trip to and from the subject. From the viewpoint of increasing the number of pixels, it is more preferable to use an indirect ToF sensor.
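  • As a point of reference (this relation is not stated in the publication), an indirect ToF sensor typically converts the measured phase difference Δφ of light modulated at frequency f_mod into a one-way distance d = c·Δφ / (4π·f_mod), where c is the speed of light; the factor 4π accounts for both the full modulation period and the round trip of the light.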
  • the LiDAR sensor 3 has a light emitting part, a photodiode as a light receiving part, and a signal processing circuit.
  • the light emitting unit emits laser light, typically near-infrared light (NIR light).
  • the light receiving unit receives return light (reflected light) when the NIR light emitted from the light emitting unit is reflected by a recognition object (object).
  • the received return light is signal-processed by the signal processing circuit, and a depth image corresponding to the subject is acquired.
  • the light emitting unit includes, for example, a light emitting member such as a light emitting diode (LED) and a driver circuit for causing the light emitting member to emit light.
  • When the recognition target is a translucent object, a deviation may occur between the depth value measured by the LiDAR sensor 3 (measured value) and the actual depth value (hereinafter referred to as the actual value).
  • The three-dimensional measurement accuracy of the recognition target deteriorates due to the reflection characteristics of the material of the recognition target and individual differences between sensor devices.
  • In particular, subsurface scattering (also called subcutaneous scattering) occurs in a translucent object; when the object to be recognized is human skin, an error of about 20 mm may occur between the measured value and the actual depth value.
  • Human skin, marble, milk, etc. are known as examples of translucent bodies.
  • a translucent body is an object within which light transmission and scattering occurs.
  • the depth value acquired by the LiDAR sensor 3 is corrected with reference to a correction map, which is depth correction information.
  • the correction map can be generated using sensing results of the first camera 2A, the second camera 2B, and the LiDAR sensor 3, respectively. Details of the correction map will be described later.
  • In the following, an example will be described in which the recognition target is a human hand, whose translucent skin is exposed.
  • the processing unit 50 corrects the depth value acquired by the LiDAR sensor 3 using the correction map.
  • the processing unit 50 may generate a correction map.
  • the processing unit 50 has an acquisition unit 51 , a recognition unit 52 , a correction unit 53 , a display image generation unit 54 and a correction map generation unit 55 .
  • the acquisition unit 51 acquires the sensing results of the first camera 2A, the second camera 2B, and the LiDAR sensor 3, that is, the RGB image and the depth image.
  • the recognition unit 52 detects a hand region from the depth image and the RGB image acquired by the acquisition unit 51 .
  • the recognition unit 52 detects the position of the characteristic point of the hand from the image area obtained by cutting out the detected hand area.
  • Characteristic points of the hand for recognizing the position of the hand include parts of the hand such as fingertips, finger joints, and wrists.
  • the recognition unit 52 detects the two-dimensional feature point positions of the hands from the hand regions of the RGB images respectively acquired by the first camera 2A and the second camera 2B.
  • the detected two-dimensional feature point positions are output to the correction map generator 55 .
  • Hereinafter, the two-dimensional feature point position may be referred to as the "two-dimensional position".
  • the recognition unit 52 also estimates and detects the three-dimensional feature point positions of the hand from the hand region of the depth image acquired by the LiDAR sensor 3 .
  • the three-dimensional feature point positions of the recognition target detected based on the depth image of the LiDAR sensor 3 are output to the correction unit 53 .
  • Hereinafter, the three-dimensional feature point position may be referred to as the "three-dimensional position".
  • the three-dimensional position includes depth value information.
  • the detection of the hand region and the detection of the feature point position can be performed by a known method.
  • For example, the position of the hand in the image can be recognized by hand recognition techniques for the human body based on a DNN (deep neural network), such as Hand Pose Detection, Hand Pose Estimation, and Hand Segmentation; by feature-point extraction methods such as HOG (Histogram of Oriented Gradients) and SIFT (Scale Invariant Feature Transform); by object recognition methods based on pattern recognition such as Boosting and SVM (Support Vector Machine); and by region extraction methods such as Graph Cut.
  • The correction unit 53 corrects the depth value of the recognition target (the hand in this embodiment) detected based on the depth image of the LiDAR sensor 3, with reference to the correction map.
  • By this correction, the depth value is adjusted so that the deviation (error) between the value measured by the LiDAR sensor 3 and the actual value caused by subsurface scattering is eliminated.
  • the correction using the correction map makes it possible to obtain the three-dimensional position information of the actual recognition target from the sensing result of the LiDAR sensor 3, thereby recognizing the recognition target with high accuracy.
  • the depth value of the recognition target object corrected by the correction unit 53 is output to the display image generation unit 54 .
  • the display image generation section 54 generates an image signal to be output to the display section 34 .
  • the image signal is output to the display section 34, and an image is displayed on the display section 34 based on the image signal.
  • the display image generation unit 54 may generate an image in which the virtual object is superimposed on the through image (camera image) acquired by the camera 2 .
  • the virtual object may be a virtual object used when generating a correction map, which will be described later.
  • the virtual object may be, for example, a virtual object forming an augmented reality image by a game application.
  • For example, the display image generation unit 54 can use the corrected depth value of the hand, which is the recognition target, to generate an augmented reality image in which the positional relationship between the hand and a virtual object such as a wall is represented appropriately.
  • This prevents unnatural images in which, for example, the virtual wall object is superimposed on part of the hand so that the hand becomes partially invisible, or the finger appears to be stuck into the wall.
  • the correction map generation unit 55 generates a correction map, which is depth correction information, using the sensing results of the first camera 2A and the second camera 2B and the sensing results of the LiDAR sensor 3 .
  • The correction map generation unit 55 calculates the three-dimensional feature point positions of the recognition target by triangulation, using the two-dimensional feature point positions of the recognition target (hand) detected by the recognition unit 52 from the RGB images of the respective cameras 2. The three-dimensional feature point positions calculated by this triangulation are assumed to correspond to the three-dimensional feature point positions of the actual recognition target and include the actual depth values of the recognition target.
  • The correction map generation unit 55 generates the correction map using difference information between the depth value of the recognition target calculated by triangulation and the depth value of the recognition target based on the depth image of the LiDAR sensor 3 detected by the recognition unit 52. A method of generating the correction map will be described later.
  • The storage unit 56 includes a memory device such as a RAM and a non-volatile recording medium such as a hard disk drive, and stores programs for causing the mobile terminal 1 to execute the recognition processing for the recognition target, the correction map (depth correction information) generation processing, and the like.
  • the recognition processing program for the recognition target object stored in the storage unit 56 is for causing the recognition device (mobile terminal 1 in this embodiment) to execute the following steps.
  • That is, a step of correcting the depth value of the recognition target acquired by the LiDAR sensor of a device (the mobile terminal 1 in this embodiment) provided with a LiDAR sensor and an image sensor, by referring to depth correction information (a correction map) generated using the sensing result of the LiDAR sensor and the sensing result of the image sensor.
  • the correction map (depth correction information) generation processing program stored in the storage unit 56 is for causing the recognition device (mobile terminal 1 in this embodiment) to execute the following steps.
  • The above steps include a step of calculating the three-dimensional position of the recognition target by triangulation from the two-dimensional positions of the recognition target detected from the RGB images of each of the plurality of cameras, a step of detecting the three-dimensional position of the recognition target from the depth image of the LiDAR sensor, and a step of generating the correction map from the difference between the two.
  • the storage unit 56 may store a pre-generated correction map.
  • the correction unit 53 may refer to the correction map prepared in advance to correct the depth value acquired by the LiDAR sensor 3 .
  • FIG. 4 is a flow diagram of a method for recognizing a recognition object. As shown in FIG. 4, when the recognition process starts, the acquisition unit 51 acquires the sensing result (depth image) of the LiDAR sensor 3 (ST1).
  • the hand region is detected by the recognition unit 52 using the depth image acquired by the acquisition unit 51 (ST2).
  • the recognition unit 52 estimates and detects the three-dimensional feature point positions of the hand, which is the object to be recognized, from the depth image (ST3).
  • the detected three-dimensional feature point position information of the recognition target object is output to the correction unit 53 .
  • the correction unit 53 corrects the Z position of the detected three-dimensional feature point position of the recognition object using the correction map (ST4).
  • the corrected 3D feature point positions of the recognition target object correspond to the actual 3D feature point positions of the recognition target object.
  • the corrected three-dimensional feature point position information of the object to be recognized is output to the display image generator 54 (ST5).
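  • As a rough sketch of the flow of FIG. 4 (ST1 to ST5), not taken from the publication, the processing could be organized as follows; the helper objects and function names (recognizer, CorrectionMap.offset_at, and so on) are hypothetical placeholders, and the sign convention for applying the offset is an assumption.

```python
import numpy as np

def recognize_hand(lidar_depth_image, correction_map, recognizer):
    """Sketch of ST1-ST5: depth image in, corrected 3D feature points out."""
    # ST2: detect the hand region in the depth image (detector assumed given).
    hand_region = recognizer.detect_hand_region(lidar_depth_image)
    # ST3: estimate 3D feature point positions (x, y, z) of the hand.
    keypoints_3d = recognizer.estimate_keypoints_3d(hand_region)
    # ST4: correct only the Z (depth) component of each feature point.
    corrected = []
    for x, y, z in keypoints_3d:
        offset = correction_map.offset_at(np.array([x, y, z]))  # interpolated offset
        corrected.append((x, y, z - offset))  # assumes offset = measured - actual
    # ST5: the corrected positions are passed on to the display image generator.
    return corrected
```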
  • According to the recognition method of the present embodiment, even if the recognition target is translucent human skin, the sensing result of the LiDAR sensor 3 is corrected using the correction map, so the recognition accuracy of the recognition target is improved.
  • the correction map is depth correction information for correcting the depth value (Z value) of the recognition target detected by the LiDAR sensor 3 .
  • the measured value of the LiDAR sensor 3 has an error from the actual value due to subsurface scattering on the skin, which is the object to be recognized, and individual differences of the LiDAR sensor 3 .
  • a correction map corrects for this error.
  • As shown in FIG. 5A, a three-dimensional grid 9 is arranged in the real space of the imaging area 8 that can be acquired by the LiDAR sensor 3.
  • The three-dimensional grid 9 is partitioned by a plurality of evenly spaced grid lines parallel to the X axis, a plurality of evenly spaced grid lines parallel to the Y axis, and a plurality of evenly spaced grid lines parallel to the Z axis.
  • FIG. 5B is a schematic diagram of FIG. 5A viewed from the Y-axis direction. In FIGS. 5A and 5B, reference numeral 30 indicates the center of the LiDAR sensor 3.
  • the correction map is a map that holds an offset value related to depth on each grid point of the three-dimensional grid 9 .
  • The "offset value related to depth" is a value indicating how much, in the positive or negative Z-axis direction, the depth value (measured value) obtained by the LiDAR sensor 3 deviates from the actual depth value (actual value).
  • the black circle located on the grid point A indicates the three-dimensional position 13 of the recognition object based on the depth image acquired by the LiDAR sensor 3 .
  • The circle drawn with a white interior (open circle) indicates the three-dimensional position 12 of the actual object to be recognized.
  • the three-dimensional position of the recognition object includes depth value information.
  • reference numeral 13 indicates the position measured by the LiDAR sensor 3
  • reference numeral 12 indicates the actual position.
  • The difference a between the depth value of the three-dimensional position 13 of the recognition target based on the depth image of the LiDAR sensor 3 and the depth value of the actual three-dimensional position 12 of the recognition target is the "offset value related to depth" at the grid point A.
  • In the example of FIG. 5B, the "offset value related to depth" at grid point A is positive.
  • an “offset value related to depth” is set for each grid point of the three-dimensional grid 9 arranged in the imaging region 8 .
  • Hereinafter, the "offset value related to depth" is simply referred to as the "offset value".
  • the three-dimensional position of the object to be recognized acquired by the LiDAR sensor 3 is called “measured position”.
  • the “measured position” is a pre-correction three-dimensional position and includes pre-correction depth value information.
  • an offset value is set for each lattice point of the three-dimensional grid 9.
  • the offset value set at the grid point is used to correct the depth value of the measurement position.
  • an offset value at the measurement position is calculated, and the offset value can be used to correct the depth value of the measurement position.
  • the offset value at the measurement position is calculated as follows.
  • For example, suppose the measurement position lies in an XY plane passing through four grid points formed by two adjacent grid lines extending in the X-axis direction and two adjacent grid lines extending in the Y-axis direction.
  • In this case, the offset value at the measurement position is calculated from the offset values at each of the four grid points, weighted according to the distances in the X-axis and Y-axis directions between the measurement position and those grid points (a weighted average).
  • The case where the measurement position lies within a plane passing through four grid points has been described as an example, but the offset value can be calculated in the same way in other cases. That is, when the measurement position lies inside a minimum-unit cell of the three-dimensional grid 9 partitioned by the grid lines, the offset value at the measurement position can be calculated as a weighted average of the offset values at the eight grid points forming that cell, weighted by the distances between the measurement position and those grid points along each of the X, Y, and Z axes (see the sketch below).
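  • A minimal sketch of this offset interpolation is shown below, written here for illustration only; the regular grid spacing and the NumPy array layout of the offsets are assumptions, not details from the publication.

```python
import numpy as np

def interpolate_offset(offsets, origin, spacing, p):
    """Trilinearly interpolate a depth offset at point p from a regular 3D grid.

    offsets : np.ndarray of shape (Nx, Ny, Nz), offset value stored at each grid point
    origin  : np.ndarray (3,), world coordinates of grid point (0, 0, 0)
    spacing : float, distance between adjacent grid points
    p       : np.ndarray (3,), measurement position (x, y, z)
    """
    # Continuous grid coordinates of p, then the lower corner of the enclosing cell.
    g = (p - origin) / spacing
    i0 = np.clip(np.floor(g).astype(int), 0, np.array(offsets.shape) - 2)
    t = g - i0  # fractional position inside the cell

    # Weighted average over the 8 corners of the cell (trilinear interpolation).
    result = 0.0
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                w = ((t[0] if dx else 1 - t[0]) *
                     (t[1] if dy else 1 - t[1]) *
                     (t[2] if dz else 1 - t[2]))
                result += w * offsets[i0[0] + dx, i0[1] + dy, i0[2] + dz]
    return result

# Usage (the sign convention is an assumption):
# corrected_z = measured_z - interpolate_offset(offsets, origin, spacing, position)
```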
  • a correction map can be generated using the sensing results of the first camera 2A and the second camera 2B and the sensing results of the LiDAR sensor 3 .
  • the outline of the correction map generation method will be described below with reference to FIGS. 6 and 7.
  • FIG. 6 is a schematic diagram illustrating an example of generating a correction map using the mobile terminal 1 having two cameras and one LiDAR sensor.
  • the correction map is generated in a state in which the hand of the user U, which is the object to be recognized, is positioned within the shooting area of the mobile terminal 1 .
  • a plurality of small white circles superimposed on the hand of the user U indicate the characteristic point positions 6 of the hand of the user U, and indicate joint positions, fingertip positions, wrist positions, and the like.
  • a case of recognizing the fingertip position of the index finger will be described.
  • The white circle with reference numeral 120 indicates the three-dimensional feature point position of the tip of the index finger calculated by triangulation using the two-dimensional feature point positions detected from the RGB images acquired by the first camera 2A and the second camera 2B, respectively.
  • the fingertip position 120 calculated using this triangulation corresponds to the actual fingertip position and includes information on the depth value of the actual recognition object.
  • reference numeral 130 indicates the three-dimensional feature point positions of the tip of the index finger based on the depth image acquired by the LiDAR sensor 3.
  • the fingertip position 130 of the index finger acquired by the LiDAR sensor 3 is deviated from the actual fingertip position 120 of the object to be recognized due to subsurface scattering during measurement by the LiDAR sensor 3 .
  • the difference between the fingertip position 120 calculated using triangulation and the fingertip position 130 of the index finger based on the depth image of the LiDAR sensor 3 is the error component.
  • This error component becomes the "offset value related to depth" in the correction map.
  • A correction map for correcting the measurement error originating from the LiDAR sensor 3 when the recognition target of the mobile terminal 1 is human skin can be generated by acquiring such error component data over the entire imaging area.
  • the three-dimensional feature point positions of the recognition object are detected from the depth image of the LiDAR sensor 3 (ST11).
  • the three-dimensional feature point positions based on this depth image correspond to reference numeral 130 in FIG.
  • two-dimensional feature point positions are detected from the RGB images of the first camera 2A and the second camera 2B (ST12).
  • the three-dimensional feature point positions of the recognition object are calculated by triangulation (ST13).
  • the three-dimensional feature point positions calculated by this triangulation are the actual three-dimensional feature point positions of the recognition object.
  • Three-dimensional feature point positions calculated by triangulation correspond to reference numeral 120 in FIG.
  • The difference between the three-dimensional feature point positions calculated in ST13 based on the RGB images of the plurality of cameras (the first camera 2A and the second camera 2B) and the three-dimensional feature point positions estimated in ST11 from the depth image of the LiDAR sensor 3 is calculated as an error component (ST14).
  • a correction map is generated by acquiring such error component data for the entire imaging area.
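  • For context, the triangulation in ST13 could be computed as in the following sketch (not from the publication); it assumes rectified pinhole cameras with a purely horizontal baseline, which is a simplification of the general calibrated two-camera case.

```python
import numpy as np

def triangulate_rectified(u_left, v_left, u_right, fx, fy, cx, cy, baseline):
    """Triangulate a 3D point from matched pixels of two rectified cameras.

    u_left, v_left : pixel coordinates of the feature in the first camera
    u_right        : horizontal pixel coordinate of the same feature in the second camera
    fx, fy, cx, cy : shared intrinsics of the rectified camera pair
    baseline       : distance between the two camera centers
    """
    disparity = u_left - u_right
    if disparity <= 0:
        raise ValueError("non-positive disparity: invalid feature match")
    z = fx * baseline / disparity   # depth from disparity
    x = (u_left - cx) * z / fx      # back-project to 3D
    y = (v_left - cy) * z / fy
    return np.array([x, y, z])

# The error component of ST14 would then be, for example,
# error = z_from_lidar_depth_image - triangulate_rectified(...)[2]
# (the sign convention is an assumption).
```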
  • the correction map includes difference information between the depth value of the recognition target object based on the sensing result of the LiDAR sensor 3 and the actual depth value of the recognition target object.
  • FIG. 8 is a diagram for explaining a basic image displayed on the display section 34 when the correction map is generated.
  • When the correction map is generated, the display unit 34 of the mobile terminal 1 displays an image in which a target sphere 7, which is a virtual object for correction map generation, is superimposed on the through image acquired by the first camera 2A or the second camera 2B.
  • the virtual object for generating the correction map is not limited to a spherical shape, and can have various shapes.
  • the user U holds the mobile terminal 1 with one hand and positions the other hand within the imaging area so that the other hand is displayed on the display unit 34 .
  • the correction map is generated by the user U viewing the image displayed on the display unit 34 and moving the other hand.
  • the target sphere 7 is displayed so that its position can be changed within the imaging area.
  • the user U moves the other hand so as to chase the target ball 7 according to the movement of the target ball 7 displayed on the display unit 34 . In this way, by moving the hand according to the movement of the target sphere 7, it is possible to obtain error component data in the entire imaging area, and use the data to generate a correction map.
  • FIG. 9 is a diagram for explaining an image displayed on the display unit 34 when the correction map is generated.
  • FIG. 10 is a flowchart relating to display of an image displayed on the display unit 34 when the correction map is generated.
  • the user U holds the mobile terminal 1 with one hand and positions the other hand so as to be within the field of view of the camera 2 .
  • the user U moves the other hand according to the moving direction and size of the target ball displayed on the display unit 34 while looking at the display unit 34 .
  • a correction map is generated based on this hand movement information.
  • An image displayed when the correction map is generated will be described with reference to FIG. 9, following the flow of FIG. 10.
  • a through image captured by the first camera 2A or the second camera 2B is displayed on the display section 34 of the mobile terminal 1 (ST21).
  • the target sphere 7 is superimposed on the through-the-lens image and displayed at the target location (ST22).
  • A user recognition result sphere 11, which indicates the recognized position of the hand of the user U, is superimposed and displayed (ST23).
  • Hereinafter, the "user recognition result sphere" will be referred to as the "user sphere".
  • Both the target sphere 7 and the user sphere 11 are virtual objects.
  • The target sphere 7 and the user sphere 11 are displayed in different colors, for example yellow and blue respectively, so that they can be distinguished from each other.
  • the size of the target sphere 7 does not change and is always displayed at a constant size.
  • the user sphere 11 is displayed at a predetermined position of the recognized user U's hand. For example, in the example shown in FIG. 8, the user sphere 11 is displayed such that the center of the user sphere 11 is positioned near the base of the middle finger.
  • a user sphere 11 indicates a recognition result based on the sensing result of the LiDAR sensor 3 .
  • the user sphere 11 is displayed so as to follow the movement of the hand of the user U within the XY plane. Furthermore, the size of the user sphere 11 changes according to the movement of the hand of the user U in the Z-axis direction. In other words, the size of the user sphere 11 changes according to the position (depth value) of the hand of the user U in the Z-axis direction.
  • the mobile terminal 1 guides the user, for example, by voice, to move the hand so that the user sphere 11 matches the target sphere 7 as shown in FIG. 9(B) (ST24).
  • the match between the target sphere 7 and the user sphere 11 means that the positions and sizes of the two spheres are substantially the same.
  • Guidance for the match between the target sphere 7 and the user sphere 11 may be displayed on the display unit 34 in text as well as voice.
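  • One way to realize the "match" check described above, given purely as an illustration: the publication only says that the positions and sizes are substantially the same, so the tolerances below are assumptions.

```python
import numpy as np

def spheres_match(target_center, target_radius, user_center, user_radius,
                  pos_tol_px=15.0, radius_tol_px=10.0):
    """Return True when the user sphere roughly coincides with the target sphere."""
    close_in_position = (np.linalg.norm(np.asarray(target_center) -
                                        np.asarray(user_center)) <= pos_tol_px)
    close_in_size = abs(target_radius - user_radius) <= radius_tol_px
    return close_in_position and close_in_size
```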
  • the target sphere 7 moves as shown in FIG. 9(D).
  • the portable terminal 1 guides the user U to follow the movement of the target ball 7 by voice or the like.
  • the target sphere 7 moves throughout the imaging area.
  • the correction map generation unit 55 acquires information on the movement of the hand of the user U, which moves so as to follow the target sphere 7 that moves over the entire imaging area. That is, the three-dimensional position information of the object (hand) to be recognized by the LiDAR sensor 3 in the entire imaging area is acquired by the correction map generation unit 55 (ST25).
  • the correction map generation unit 55 obtains the three-dimensional position information of the recognition target object by the LiDAR sensor 3, and in parallel, the three-dimensional position information calculated by triangulation. is also obtained. That is, the correction map generation unit 55 acquires the RGB images of the two cameras 2A and 2B, and uses the two-dimensional position information of the recognition target detected from the RGB images of each camera to perform triangulation to determine the three-dimensional image of the recognition target. The original position is calculated. Three-dimensional position information calculated by this triangulation is also acquired over the entire imaging area.
  • Then, the difference between the three-dimensional position information of the recognition target based on the depth image (sensing result) of the LiDAR sensor 3 and the three-dimensional position information of the recognition target calculated by triangulation from the RGB images (sensing results) of the two cameras 2A and 2B is calculated as the error component.
  • A correction map is generated by the correction map generation unit 55 using the error component data for the entire imaging area. In this way, the user can generate, for each mobile terminal 1, a correction map for correcting the measurement error (ranging error) of the LiDAR sensor 3, and an adjustment suited to the mounted LiDAR sensor 3 becomes possible.
  • The correction map may be generated by the user for each mobile terminal 1 as described above, or may be prepared in advance.
  • Since the type of sensor mounted on each type of device (the mobile terminal in this embodiment) is known in advance, a correction map may be generated and prepared in advance for each device type. The same applies to the second embodiment, which will be described later.
  • <Second embodiment> In the first embodiment, an example of generating a correction map using the sensing results of two cameras and one LiDAR sensor was given, but the present technology is not limited to this. In this embodiment, an example of generating a correction map using the sensing results of one camera and one LiDAR sensor mounted on a device (a mobile terminal in this embodiment) is given.
  • The mobile terminal as the device in this embodiment differs from the mobile terminal in the first embodiment in the number of cameras. While the mobile terminal in the first embodiment is equipped with a compound-eye (multi-view) camera, the mobile terminal in the second embodiment is equipped with a monocular camera. The differences will mainly be described below.
  • In this embodiment, the program for generating the correction map (depth correction information) stored in the storage unit 56 of the mobile terminal 1, which also functions as the recognition device, causes the mobile terminal 1 to execute the following steps.
  • The steps include a step of detecting the two-dimensional position of the recognition target from the RGB image (sensing result) of the one camera, a step of detecting the two-dimensional position of the recognition target from the reliability image (sensing result) of the LiDAR sensor, a step of calculating the three-dimensional position of the recognition target by triangulation from these two-dimensional positions, a step of detecting the three-dimensional position of the recognition target from the depth image of the LiDAR sensor, and a step of generating the correction map from the difference between the two three-dimensional positions.
  • FIG. 11 is a schematic diagram illustrating an example of generating a correction map using the mobile terminal 1.
  • In FIG. 11, a plurality of small white circles superimposed on the hand of the user U indicate the feature point positions 6 of the hand of the user U.
  • FIG. 12 is a flowchart of the correction map generation method according to this embodiment. The image displayed on the display unit when generating the correction map is the same as in the first embodiment.
  • In FIG. 11, reference numeral 121 denotes the three-dimensional fingertip position of the index finger calculated by triangulation using the two-dimensional feature point positions detected from the RGB image of the camera 2 and the two-dimensional feature point positions detected from the reliability image of the LiDAR sensor 3. The fingertip position 121 calculated using this triangulation is assumed to correspond to the actual fingertip position and includes information on the actual depth value of the recognition target.
  • a fingertip position 121 is a three-dimensional feature point position of the recognition object.
  • a reliability image is reliability information that represents the reliability of depth information acquired by the LiDAR sensor 3 for each pixel.
  • the reliability is calculated at the same time when depth information is acquired by the LiDAR sensor 3 .
  • the reliability is calculated using luminance information and contrast information of the image used for depth information calculation.
  • the reliability is determined for each pixel using a real value, and finally a reliability image is generated as a grayscale image in which the reliability is a luminance value.
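  • The publication does not specify how the luminance and contrast are combined inside the sensor; the following is only an illustrative sketch of producing such a grayscale reliability image from a ToF amplitude image, with the choice of statistics and the equal weighting being assumptions.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def reliability_image(ir_amplitude):
    """Illustrative per-pixel confidence from the IR amplitude image of a ToF sensor."""
    amp = ir_amplitude.astype(np.float64)
    brightness = amp / (amp.max() + 1e-9)          # normalized luminance

    # Local contrast: standard deviation in a 5x5 neighborhood.
    mean = uniform_filter(amp, size=5)
    mean_sq = uniform_filter(amp * amp, size=5)
    std = np.sqrt(np.maximum(mean_sq - mean * mean, 0.0))
    contrast = std / (std.max() + 1e-9)

    conf = 0.5 * brightness + 0.5 * contrast       # equal weighting is an assumption
    return (conf * 255).astype(np.uint8)           # grayscale: reliability as luminance
```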
  • reference numeral 131 indicates the three-dimensional feature point positions of the tip of the index finger based on the depth image acquired by the LiDAR sensor 3.
  • the fingertip position 131 of the index finger acquired by the LiDAR sensor 3 is deviated from the actual fingertip position 121 of the object to be recognized due to subsurface scattering during measurement by the LiDAR sensor 3 .
  • the difference between the fingertip position 121 calculated using triangulation and the fingertip position 131 of the index finger based on the depth image of the LiDAR sensor 3 is the error component.
  • This error component becomes the "offset value related to depth" in the correction map.
  • A correction map for correcting the measurement error originating from the LiDAR sensor 3 when the recognition target of the mobile terminal 1 is human skin can be generated by acquiring such error component data over the entire imaging area.
  • the correction map includes difference information between the depth value of the recognition target object based on the sensing result of the LiDAR sensor 3 and the actual depth value of the recognition target object.
  • That is, the correction map generation unit 55 generates the correction map using the difference between the three-dimensional position information of the recognition target based on the depth image (sensing result) of the LiDAR sensor 3 and the three-dimensional position information of the recognition target calculated by triangulation from the RGB image (sensing result) of the one camera 2 and the reliability image (sensing result) of the LiDAR sensor 3.
  • the flow of correction map generation processing in the processing unit 50 will be described below with reference to FIG. 12 .
  • the three-dimensional feature point positions of the recognition object are detected from the depth image of the LiDAR sensor 3 (ST31).
  • the three-dimensional feature point positions based on this depth image correspond to reference numeral 131 in FIG.
  • two-dimensional feature points are detected from the reliability image of the LiDAR sensor 3 (ST32).
  • two-dimensional feature point positions are detected from the RGB image of camera 2 (ST33).
  • Using the two-dimensional feature point positions detected in ST32 and ST33, the three-dimensional feature point positions of the recognition target are calculated by triangulation.
  • the three-dimensional feature point positions calculated using this triangulation correspond to the actual three-dimensional feature point positions of the recognition object.
  • Three-dimensional feature point positions calculated by triangulation correspond to reference numeral 121 in FIG.
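  • A sketch (not from the publication) of one standard way to triangulate the 3D feature point from the 2D detection in the RGB image and the 2D detection in the reliability image, assuming that both views have known 3x4 projection matrices obtained from the calibration mentioned earlier:

```python
import numpy as np

def triangulate_dlt(P_cam, P_lidar, uv_cam, uv_lidar):
    """Linear (DLT) triangulation of one 3D point from two calibrated views.

    P_cam, P_lidar   : 3x4 projection matrices of the RGB camera and of the
                       LiDAR sensor's reliability-image view
    uv_cam, uv_lidar : matched 2D feature point positions (u, v) in each view
    """
    def equations(P, uv):
        u, v = uv
        return np.array([u * P[2] - P[0],
                         v * P[2] - P[1]])

    A = np.vstack([equations(P_cam, uv_cam), equations(P_lidar, uv_lidar)])
    _, _, vt = np.linalg.svd(A)      # least-squares solution of A X = 0
    X = vt[-1]
    return X[:3] / X[3]              # dehomogenize to (x, y, z)
```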
  • As described above, in the present technology, the depth value acquired by the LiDAR sensor of a device equipped with a LiDAR sensor and a camera (image sensor) is corrected by referring to a correction map (depth correction information) generated using the sensing result of the LiDAR sensor and the sensing result of the camera. As a result, the error in the depth value of the LiDAR sensor's sensing result can be corrected in accordance with the individual differences of the LiDAR sensor, and the recognition accuracy of the recognition target can be improved.
  • This technology is particularly preferably applied when the object to be recognized is translucent like human skin.
  • Even when the recognition target is a translucent object, correcting the depth value acquired by the LiDAR sensor using the correction map corrects the deviation (error) between the measured value of the LiDAR sensor and the actual value caused by subsurface scattering in the recognition target and by individual differences between sensor devices.
  • This enables stable and highly accurate measurement of the recognition target object, thereby improving the recognition accuracy of the recognition target object. Therefore, as described above, the present technology can be particularly preferably applied to the recognition of human hands whose skin is frequently exposed.
  • the technology may also be applied to gesture recognition to recognize gesture actions performed by a user.
  • gesture recognition results of hand gestures performed by users can be used to input operations for games and home appliances. Since the present technology enables highly accurate recognition of a recognition target, stable and accurate operation input is possible.
  • Embodiments of the present technology are not limited to the above-described embodiments, and various modifications are possible without departing from the gist of the present technology.
  • For example, an RGB-D camera may be used.
  • In the first embodiment, instead of two cameras and one LiDAR sensor, one camera and one RGB-D camera may be used.
  • In the second embodiment, instead of one camera and one LiDAR sensor, one RGB-D camera may be used.
  • a mobile terminal which is a device equipped with an image sensor and a LiDAR sensor, functions as a recognition device that recognizes a recognition target.
  • the recognition device that recognizes the recognition target object may be an external device different from the device including the image sensor and the LiDAR sensor.
  • part or all of the processing unit 50 shown in FIG. 3 may be configured by an external device such as a server different from the device including the image sensor and the LiDAR sensor.
  • (1) A recognition device including a processing unit that corrects a depth value of a recognition target acquired by a LiDAR (Light Detection and Ranging) sensor of a device including the LiDAR sensor, which has a light emitting unit that irradiates the recognition target with light and a light receiving unit that receives light reflected from the recognition target, and an image sensor that captures the recognition target, by referring to depth correction information generated using a sensing result of the LiDAR sensor and a sensing result of the image sensor.
  • (2) The recognition device according to (1) above, wherein the depth correction information includes difference information between the depth value of the recognition target based on the sensing result of the LiDAR sensor and the actual depth value of the recognition target.
  • (3) The recognition device according to (1) or (2) above, wherein the device includes a plurality of the image sensors and one LiDAR sensor, and the depth correction information includes difference information between the depth value of the recognition target calculated by triangulation using position information of the recognition target detected from the sensing results of each of the plurality of image sensors and the depth value of the recognition target based on a depth image as the sensing result of the LiDAR sensor.
  • (4) The recognition device according to (1) or (2) above, wherein the device includes at least one image sensor and one LiDAR sensor, and the depth correction information includes difference information between the depth value of the recognition target calculated by triangulation using position information of the recognition target detected from the sensing result of one of the image sensors and position information of the recognition target detected from a reliability image as the sensing result of the LiDAR sensor, and the depth value of the recognition target based on a depth image as the sensing result of the LiDAR sensor.
  • (5) The recognition device according to any one of (1) to (4) above, wherein the object to be recognized is a translucent object.
  • (6) The recognition device according to (5) above, wherein the recognition target is human skin.
  • (7) The recognition device according to (6) above, wherein the recognition target is a human hand.
  • (8) The recognition device according to any one of (1) to (7) above, wherein the processing unit recognizes a gesture motion of a person who is the recognition target.
  • (9) The recognition device according to any one of (1) to (8) above, wherein the processing unit generates the depth correction information using the sensing result of the LiDAR sensor and the sensing result of the image sensor.
  • (10) The recognition device according to any one of (1) to (9) above, wherein the device includes a display unit, and the processing unit generates an image to be displayed on the display unit using the corrected depth value of the recognition target.
  • (11) A recognition method including correcting a depth value of a recognition target acquired by a LiDAR (Light Detection and Ranging) sensor of a device including the LiDAR sensor, which has a light emitting unit that irradiates the recognition target with light and a light receiving unit that receives light reflected from the recognition target, and an image sensor that captures the recognition target, by referring to depth correction information generated using a sensing result of the LiDAR sensor and a sensing result of the image sensor.
  • (12) A program that causes a recognition device to execute a step of correcting a depth value of a recognition target acquired by a LiDAR (Light Detection and Ranging) sensor of a device including the LiDAR sensor, which has a light emitting unit that irradiates the recognition target with light and a light receiving unit that receives light reflected from the recognition target, and an image sensor that captures the recognition target, by referring to depth correction information generated using a sensing result of the LiDAR sensor and a sensing result of the image sensor.
  • Reference numerals: 1: mobile terminal (recognition device, device); 2: camera (image sensor); 2A: first camera (image sensor); 2B: second camera (image sensor); 3: LiDAR sensor; 12, 120, 121: actual fingertip position / fingertip position calculated by triangulation (three-dimensional position of the recognition target including the actual depth value); 13, 130, 131: fingertip positions based on the LiDAR sensor's sensing results (three-dimensional positions of the recognition target including depth values based on the LiDAR sensor's sensing results); 34: display unit; 50: processing unit

Abstract

[Problem] To provide a recognition device, a recognition method, and a program that enable improvement in the accuracy of recognizing a recognition subject. [Solution] A recognition device of the present technology is equipped with a processing unit. The processing unit corrects a depth value of a recognition subject acquired with a Light Detection and Ranging (LiDAR) sensor of an apparatus provided with the LiDAR sensor and an image sensor that images the recognition subject, the LiDAR sensor including a light-emitting unit for irradiating the recognition subject with light and a light-receiving unit for receiving light that is reflected back from the recognition subject. The depth value is corrected by consulting depth correction information generated by using a sensing result of the LiDAR sensor and a sensing result of the image sensor.

Description

Recognition device, recognition method, and program
 The present technology relates to a recognition device, a recognition method, and a program related to recognition of a recognition target.
 Patent Document 1 describes providing the user with an image of the user reaching out for a virtual object in an augmented reality image in which the virtual object is superimposed on a camera image.
JP 2020-064592 A
 For example, when generating an image in which the user reaches out toward a virtual object in an augmented reality image on which the virtual object is superimposed, low hand recognition accuracy can result in an unnatural augmented reality image, for example one in which the virtual object is superimposed over the hand so that the hand becomes invisible.
 In view of the circumstances described above, an object of the present technology is to provide a recognition device, a recognition method, and a program capable of improving the recognition accuracy of a recognition target.
 A recognition device according to the present technology includes a processing unit.
 The processing unit corrects a depth value of a recognition target acquired by a LiDAR (Light Detection and Ranging) sensor of a device that includes the LiDAR sensor, which has a light emitting unit that irradiates the recognition target with light and a light receiving unit that receives light reflected from the recognition target, and an image sensor that captures the recognition target, by referring to depth correction information generated using a sensing result of the LiDAR sensor and a sensing result of the image sensor.
 According to such a configuration, it is possible to correct the measurement error derived from the LiDAR sensor and improve the recognition accuracy of the recognition target.
 The depth correction information may include difference information between the depth value of the recognition target based on the sensing result of the LiDAR sensor and the actual depth value of the recognition target.
 The device may include a plurality of the image sensors and one LiDAR sensor, and the depth correction information may include difference information between the depth value of the recognition target calculated by triangulation using position information of the recognition target detected from the sensing results of each of the plurality of image sensors and the depth value of the recognition target based on a depth image as the sensing result of the LiDAR sensor.
The device may include at least one image sensor and one LiDAR sensor, and the depth correction information may include difference information between the depth value of the recognition target calculated by triangulation using position information of the recognition target detected from the sensing result of the one image sensor and position information of the recognition target detected from a reliability image serving as the sensing result of the LiDAR sensor, and the depth value of the recognition target based on a depth image serving as the sensing result of the LiDAR sensor.
The recognition target may be a translucent body.
The recognition target may be human skin.
The recognition target may be a human hand.
The processing unit may recognize a gesture motion of a person who is the recognition target.
The processing unit may generate the depth correction information using the sensing result of the LiDAR sensor and the sensing result of the image sensor.
The device may include a display unit, and the processing unit may generate an image to be displayed on the display unit using the corrected depth value of the recognition target.
A recognition method according to the present technology corrects a depth value of a recognition target acquired by a LiDAR (Light Detection and Ranging) sensor of a device that includes the LiDAR sensor, which has a light emitting unit that irradiates the recognition target with light and a light receiving unit that receives light reflected from the recognition target, and an image sensor that images the recognition target, with reference to depth correction information generated using a sensing result of the LiDAR sensor and a sensing result of the image sensor.
A program according to the present technology causes a recognition device to execute a step of correcting a depth value of a recognition target acquired by a LiDAR (Light Detection and Ranging) sensor of a device that includes the LiDAR sensor, which has a light emitting unit that irradiates the recognition target with light and a light receiving unit that receives light reflected from the recognition target, and an image sensor that images the recognition target, with reference to depth correction information generated using a sensing result of the LiDAR sensor and a sensing result of the image sensor.
[Brief Description of Drawings]
FIG. 1 is an external view of a mobile terminal as a recognition device according to an embodiment of the present technology.
FIG. 2 is a schematic configuration diagram of the mobile terminal.
FIG. 3 is a configuration diagram including functional configuration blocks of the mobile terminal.
FIG. 4 is a flowchart of a method of recognizing a recognition target.
FIG. 5 is a diagram for explaining a correction map.
FIG. 6 is a schematic diagram for explaining a method of generating a correction map according to a first embodiment.
FIG. 7 is a flowchart of the correction map generation method according to the first embodiment.
FIG. 8 is a diagram for explaining basic images displayed on a display unit when the correction map is generated.
FIG. 9 is a diagram for explaining more detailed images displayed on the display unit when the correction map is generated.
FIG. 10 is a flowchart relating to a method of displaying images on the display unit when the correction map is generated.
FIG. 11 is a schematic diagram for explaining a method of generating a correction map according to a second embodiment.
FIG. 12 is a flowchart of the correction map generation method according to the second embodiment.
Hereinafter, embodiments according to the present technology will be described with reference to the drawings. In the following description, the same reference numerals are assigned to similar configurations, and description of configurations that have already been described may be omitted.
<First Embodiment>
[Appearance Configuration of Recognition Device]
FIG. 1 is an external view of a mobile terminal 1 as a recognition device. FIG. 1(A) is a plan view of the mobile terminal 1 as viewed from the front 1a side where a display unit 34 is located, and FIG. 1(B) is a plan view of the mobile terminal 1 as viewed from the back 1b side.
In this specification, the mutually orthogonal XYZ coordinate directions shown in the drawings correspond to the width, length, and height of the substantially rectangular parallelepiped mobile terminal 1. A plane parallel to the front 1a and the back 1b is defined as the XY plane, and the thickness direction of the mobile terminal 1, corresponding to the height direction, is defined as the Z axis. In this specification, the Z-axis direction corresponds to the depth direction.
In this embodiment, the mobile terminal 1 functions as a recognition device that recognizes a recognition target. The mobile terminal 1 is a device that includes a first camera 2A and a second camera 2B, which are image sensors, a LiDAR sensor 3, and the display unit 34. The mobile terminal 1 is a device having a multi-view camera.
As shown in FIGS. 1(A) and 1(B), the mobile terminal 1 includes a housing 4, the display unit 34, the first camera 2A, the second camera 2B, and the LiDAR sensor 3. In the mobile terminal 1, the housing 4 holds a display panel constituting the display unit 34, the first camera 2A, the second camera 2B, the LiDAR sensor 3, other various sensors, a drive circuit, and the like.
The mobile terminal 1 has a front 1a and a back 1b located on the opposite side of the front 1a.
As shown in FIG. 1(A), the display unit 34 is arranged on the front 1a side. The display unit 34 is configured by a display panel (image display means) such as a liquid crystal display or an organic EL (Organic Electro-Luminescence) display. The display unit 34 is configured to be capable of displaying images transmitted to and received from an external device through a communication unit 41 described later, images generated by a display image generation unit 54 described later, buttons for input operations, through images captured by the first camera 2A and the second camera 2B, and the like. The images include still images and moving images.
As shown in FIG. 1(B), the imaging lens of the first camera 2A, the imaging lens of the second camera 2B, and the imaging lens of the LiDAR sensor 3 are located on the back 1b side.
The first camera 2A, the second camera 2B, and the LiDAR sensor 3 are each calibrated in advance so that the same recognition target (subject) sensed in the imaging space has the same coordinate values. Accordingly, by integrating the RGB information (RGB image data) and the depth information (depth image data) sensed by the first camera 2A, the second camera 2B, and the LiDAR sensor 3, it is possible to construct a point cloud (a set of information in which each point has three-dimensional coordinates).
The configurations of the first camera 2A, the second camera 2B, and the LiDAR sensor 3 will be described later.
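As a rough illustration of how calibrated RGB and depth data can be fused into such a point cloud, the following sketch back-projects a depth image into 3D with a pinhole camera model and attaches RGB colors. The function name, the intrinsic parameters (fx, fy, cx, cy), and the assumption that the RGB image is already registered to the depth image are illustrative assumptions, not details taken from this description.

```python
import numpy as np

def depth_to_point_cloud(depth, rgb, fx, fy, cx, cy):
    """Back-project a depth image (in meters) into an N x 6 array of XYZRGB points.

    Assumes the RGB image has already been registered (calibrated) to the
    depth image, as described for the cameras 2A and 2B and the LiDAR sensor 3.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx            # pinhole back-projection along X
    y = (v - cy) * z / fy            # pinhole back-projection along Y
    valid = z > 0                    # drop pixels with no depth return
    points = np.stack([x[valid], y[valid], z[valid]], axis=1)
    colors = rgb[valid].astype(np.float32) / 255.0
    return np.concatenate([points, colors], axis=1)
```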
[Overall Configuration of Recognition Device and Configuration of Each Part]
FIG. 2 is a schematic configuration diagram of the mobile terminal 1. FIG. 3 is a configuration diagram including functional configuration blocks of the mobile terminal 1.
As shown in FIG. 2, the mobile terminal 1 includes a sensor unit 10, a communication unit 41, a CPU (Central Processing Unit) 42, the display unit 34, a GNSS reception unit 44, a main memory 45, a flash memory 46, an audio device unit 47, and a battery 48.
The sensor unit 10 includes imaging devices such as the first camera 2A, the second camera 2B, and the LiDAR sensor 3, and various sensors such as a touch sensor 43. The touch sensor 43 is typically arranged on the display panel constituting the display unit 34. The touch sensor 43 receives input operations, such as settings, performed by the user on the display unit 34.
The communication unit 41 is configured to be capable of communicating with external devices.
The CPU 42 controls the entire mobile terminal 1 by executing an operating system. The CPU 42 also executes various programs read from a removable recording medium and loaded into the main memory 45, or downloaded via the communication unit 41.
The GNSS reception unit 44 is a Global Navigation Satellite System (GNSS) signal receiver. The GNSS reception unit 44 acquires position information of the mobile terminal 1.
The main memory 45 is configured by a RAM (Random Access Memory) and stores programs and data necessary for processing.
The flash memory 46 is an auxiliary storage device.
The audio device unit 47 includes a microphone and a speaker.
The battery 48 is a power source for driving the mobile terminal 1.
As shown in FIG. 3, the mobile terminal 1 includes the sensor unit 10, a processing unit 50, a storage unit 56, and the display unit 34. In the sensor unit 10 of FIG. 3, only the main sensors related to the present technology are illustrated.
The sensing results of the first camera 2A, the second camera 2B, and the LiDAR sensor 3 included in the sensor unit 10 are output to the processing unit 50.
(Camera)
The first camera 2A and the second camera 2B have the same configuration. Hereinafter, when there is no particular need to distinguish between the first camera 2A and the second camera 2B, they are referred to as the camera 2.
The camera 2 is an RGB camera capable of capturing a color two-dimensional image (also referred to as an RGB image) of a subject as image data. The RGB image is a sensing result of the camera 2.
The camera 2 is an image sensor that images the recognition target (subject). The image sensor is, for example, a CCD (Charge-Coupled Device) sensor or a CMOS (Complementary Metal Oxide Semiconductor) sensor. The image sensor has a photodiode serving as a light receiving unit and a signal processing circuit. In the image sensor, the light received by the light receiving unit is processed by the signal processing circuit, and image data corresponding to the amount of light incident on the light receiving unit is acquired.
(LiDAR sensor)
The LiDAR sensor 3 captures a depth image (also referred to as a distance image) of the recognition target (subject). The depth image is a sensing result of the LiDAR sensor 3. The depth image is depth information including the depth value of the recognition target.
The LiDAR sensor 3 is a distance-measuring sensor that uses a remote sensing technology (LiDAR: Light Detection and Ranging) using laser light.
LiDAR sensors include a ToF (Time of Flight) type and an FMCW (Frequency Modulated Continuous Wave) type, and either type may be used, but the ToF type can be suitably used. In this embodiment, an example using a ToF-type LiDAR sensor (hereinafter referred to as a ToF sensor) is given.
ToF sensors include a "direct" type and an "indirect" type, and either type of ToF sensor may be used. The direct type irradiates the subject with a short light pulse and directly measures the time it takes for the reflected light to reach the ToF sensor. The indirect type uses periodically modulated light and detects the time delay of the light making a round trip to and from the subject as a phase difference. From the viewpoint of increasing the number of pixels, it is more preferable to use an indirect ToF sensor.
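For reference, the indirect ToF principle mentioned above recovers depth from the phase difference of the modulated light. The following is the generic textbook relation, not a formula given in this description, where c is the speed of light, f_m is the modulation frequency, and Δφ is the measured phase difference:

```latex
d \;=\; \frac{c\,\Delta\varphi}{4\pi f_{\mathrm{m}}},
\qquad 0 \le \Delta\varphi < 2\pi,
\qquad d_{\max} \;=\; \frac{c}{2 f_{\mathrm{m}}}
```

Here d_max is the unambiguous measurement range; any systematic bias in the measured phase (for example, the extra delay caused by subsurface scattering described below) appears directly as a depth error.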
The LiDAR sensor 3 has a light emitting unit, a photodiode serving as a light receiving unit, and a signal processing circuit. The light emitting unit emits laser light, typically near-infrared (NIR) light. The light receiving unit receives return light (reflected light) produced when the NIR light emitted from the light emitting unit is reflected by the recognition target (subject). In the LiDAR sensor 3, the received return light is processed by the signal processing circuit, and a depth image corresponding to the subject is acquired. The light emitting unit includes, for example, a light emitting member such as a light emitting diode (LED) and a driver circuit for causing the light emitting member to emit light.
Here, when depth information of a recognition target (subject) is obtained using a LiDAR sensor and the recognition target is a translucent body, an error (distance measurement error) arises between the measured value and the actual value (hereinafter referred to as the actual value) due to subsurface scattering at the recognition target and individual differences among sensor devices. In other words, there has been a problem that the three-dimensional measurement accuracy of the recognition target deteriorates depending on the reflection characteristics of the material of the recognition target and individual differences among sensor devices.
In a LiDAR sensor, when a translucent body such as human skin is the recognition target, the light emitted from the light emitting unit takes extra time to be reflected by the recognition target and return, owing to subsurface scattering (also referred to as subcutaneous scattering). For this reason, the LiDAR sensor measures a depth value slightly deeper than the actual value. For example, when the recognition target is human skin, an error of about 20 mm may occur between the measured depth value and the actual depth value.
Known examples of translucent bodies include human skin, marble, and milk. A translucent body is an object within which light transmission and scattering occur.
In contrast, in the present technology, the depth value acquired by the LiDAR sensor 3 is corrected with reference to a correction map, which is depth correction information. This makes the three-dimensional measurement of the recognition target highly accurate and improves the recognition accuracy of the recognition target.
In this embodiment, the correction map can be generated using the respective sensing results of the first camera 2A, the second camera 2B, and the LiDAR sensor 3. Details of the correction map will be described later.
Hereinafter, description will be given using an example in which the recognition target is a human hand with exposed skin, which is a translucent body, and the hand is recognized.
(Processing unit)
The processing unit 50 corrects the depth value acquired by the LiDAR sensor 3 using the correction map.
The processing unit 50 may generate the correction map.
The processing unit 50 includes an acquisition unit 51, a recognition unit 52, a correction unit 53, the display image generation unit 54, and a correction map generation unit 55.
((Acquisition unit))
The acquisition unit 51 acquires the sensing results of the first camera 2A, the second camera 2B, and the LiDAR sensor 3, that is, the RGB images and the depth image.
((Recognition unit))
The recognition unit 52 detects a hand region from the depth image and the RGB images acquired by the acquisition unit 51. The recognition unit 52 detects feature point positions of the hand from an image region obtained by cutting out the detected hand region. Feature points of the hand for recognizing the position of the hand include the fingertips, finger joints, the wrist, and the like. The fingertips, finger joints, and wrist are parts constituting the hand.
More specifically, the recognition unit 52 detects two-dimensional feature point positions of the hand from the hand regions of the RGB images acquired by the first camera 2A and the second camera 2B. The detected two-dimensional feature point positions are output to the correction map generation unit 55. Hereinafter, a "two-dimensional feature point position" may be referred to as a "two-dimensional position".
The recognition unit 52 also estimates and detects three-dimensional feature point positions of the hand from the hand region of the depth image acquired by the LiDAR sensor 3. The three-dimensional feature point positions of the recognition target detected based on the depth image of the LiDAR sensor 3 are output to the correction unit 53. Hereinafter, a "three-dimensional feature point position" may be referred to as a "three-dimensional position". The three-dimensional position includes depth value information.
The detection of the hand region and the detection of the feature point positions can be performed by known methods. For example, the position of the hand in an image can be recognized by human hand recognition techniques such as deep neural networks (DNN), hand pose detection, hand pose estimation, and hand segmentation; feature point extraction methods such as HOG (Histogram of Oriented Gradients) and SIFT (Scale Invariant Feature Transform); subject recognition methods based on pattern recognition such as boosting and SVM (Support Vector Machine); and region extraction methods such as graph cut.
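As a minimal sketch of this two-stage recognition (hand-region detection followed by keypoint detection on the cropped region), assuming hypothetical `region_detector` and `keypoint_model` callables as stand-ins for any of the known methods listed above:

```python
import numpy as np

def recognize_hand_keypoints(rgb_image, region_detector, keypoint_model):
    """Detect a hand region, crop it, and return 2D feature point positions.

    `region_detector` and `keypoint_model` are placeholders for any of the
    known methods mentioned above (DNN-based detectors, HOG/SIFT features
    with an SVM classifier, etc.); they are not specified by this description.
    """
    box = region_detector(rgb_image)            # (x, y, w, h) of the detected hand region
    if box is None:
        return None                             # no hand in the image
    x, y, w, h = box
    crop = rgb_image[y:y + h, x:x + w]
    keypoints = keypoint_model(crop)            # (N, 2) positions in crop coordinates
    return keypoints + np.array([x, y])         # convert back to full-image coordinates
```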
((Correction unit))
When the recognition unit 52 recognizes that the region of the recognition target is human skin such as a hand, the correction unit 53 corrects the depth values (positions in the Z-axis direction) of the three-dimensional feature point positions of the recognition target (the hand in this embodiment) detected based on the depth image of the LiDAR sensor 3, with reference to the correction map.
As a result, even when the recognition target is a translucent body such as human skin, the depth values are corrected so that the deviation (error) between the values measured by the LiDAR sensor 3 and the actual values caused by subsurface scattering is eliminated.
That is, through the correction using the correction map, the actual three-dimensional position information of the recognition target can be obtained from the sensing result of the LiDAR sensor 3, and the recognition target can be recognized with high accuracy.
The depth values of the recognition target corrected by the correction unit 53 are output to the display image generation unit 54.
((Display image generation unit))
The display image generation unit 54 generates an image signal to be output to the display unit 34. The image signal is output to the display unit 34, and the display unit 34 displays an image based on the image signal.
The display image generation unit 54 may generate an image in which a virtual object is superimposed on a through image (camera image) acquired by the camera 2. The virtual object may be a virtual object used when generating the correction map, which will be described later. The virtual object may also be, for example, a virtual object constituting an augmented reality image of a game application.
Here, consider an example in which an image of the user touching a wall, which is a virtual object, with a hand is displayed on the display unit 34 for an augmented reality image in which the virtual wall object is superimposed on a camera image.
In generating the display image, the display image generation unit 54 can use the corrected depth values of the hand, which is the recognition target, to generate an augmented reality image in which the positional relationship between the hand and the virtual wall object is appropriate.
Accordingly, for example, where an image of the hand touching the surface of the virtual wall object should be displayed, it does not happen that the virtual wall object is superimposed on part of the hand so that part of the hand becomes invisible and the image looks as if the finger were stuck into the wall.
((Correction map generation unit))
The correction map generation unit 55 generates the correction map, which is depth correction information, using the sensing results of the first camera 2A and the second camera 2B and the sensing result of the LiDAR sensor 3.
More specifically, the correction map generation unit 55 calculates three-dimensional feature point positions of the recognition target (hand) by triangulation, using the two-dimensional feature point positions of the recognition target detected by the recognition unit 52 from the RGB images of the respective cameras 2. The three-dimensional feature point positions of the recognition target calculated using this triangulation are regarded as corresponding to the actual three-dimensional feature point positions of the recognition target and as including the actual depth values of the recognition target.
The correction map generation unit 55 generates the correction map using difference information between the depth values of the recognition target calculated by triangulation and the depth values of the recognition target based on the depth image of the LiDAR sensor 3 detected by the recognition unit 52.
A method of generating the correction map will be described later.
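As an illustrative sketch of the two-view triangulation referred to above, the following uses the standard linear (DLT) method under assumed 3x4 projection matrices for the two calibrated RGB cameras; this is a generic technique, not code from this description:

```python
import numpy as np

def triangulate_point(p1, p2, P1, P2):
    """Linear (DLT) triangulation of one feature point from two camera views.

    p1, p2 : (u, v) pixel positions of the same hand feature point detected in
             the first and second camera images.
    P1, P2 : assumed 3x4 projection matrices of the calibrated cameras 2A, 2B.
    Returns the 3D position (X, Y, Z); its Z component is the depth value used
    as the reference ("actual") depth when building the correction map.
    """
    A = np.stack([
        p1[0] * P1[2] - P1[0],
        p1[1] * P1[2] - P1[1],
        p2[0] * P2[2] - P2[0],
        p2[1] * P2[2] - P2[1],
    ])
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]          # dehomogenize the least-squares solution
```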
(Storage unit)
The storage unit 56 includes a memory device such as a RAM and a non-volatile recording medium such as a hard disk drive, and stores programs for causing the mobile terminal 1 to execute the recognition processing of the recognition target, the generation processing of the correction map (depth correction information), and the like.
The program for the recognition processing of the recognition target stored in the storage unit 56 is for causing the recognition device (the mobile terminal 1 in this embodiment) to execute the following step.
The step is a step of correcting the depth value of the recognition target acquired by the LiDAR sensor of a device including the LiDAR sensor and an image sensor (the mobile terminal 1 in this embodiment), with reference to the depth correction information (correction map) generated using the sensing result of the LiDAR sensor and the sensing result of the image sensor.
The program for the generation processing of the correction map (depth correction information) stored in the storage unit 56 is for causing the recognition device (the mobile terminal 1 in this embodiment) to execute the following steps.
The steps are: a step of calculating the three-dimensional position of the recognition target by triangulation from the two-dimensional positions of the recognition target detected from the RGB images of the respective cameras; a step of detecting the three-dimensional position of the recognition target from the depth image of the LiDAR sensor; and a step of generating the correction map (depth correction information) using difference information between the three-dimensional position of the recognition target calculated by triangulation and the three-dimensional position of the recognition target based on the depth image of the LiDAR sensor.
The storage unit 56 may also store a correction map generated in advance. The correction unit 53 may correct the depth value acquired by the LiDAR sensor 3 with reference to this correction map prepared in advance.
[Recognition Method]
FIG. 4 is a flowchart of the method of recognizing the recognition target.
As shown in FIG. 4, when the recognition processing starts, the acquisition unit 51 acquires the sensing result (depth image) of the LiDAR sensor 3 (ST1).
Next, the recognition unit 52 detects the hand region using the depth image acquired by the acquisition unit 51 (ST2).
The recognition unit 52 estimates and detects the three-dimensional feature point positions of the hand, which is the recognition target, from the depth image (ST3). The detected three-dimensional feature point position information of the recognition target is output to the correction unit 53.
Next, the correction unit 53 corrects the Z positions of the detected three-dimensional feature point positions of the recognition target using the correction map (ST4). The corrected three-dimensional feature point positions of the recognition target correspond to the actual three-dimensional feature point positions of the recognition target.
The corrected three-dimensional feature point position information of the recognition target is output to the display image generation unit 54 (ST5).
As described above, in the recognition method of this embodiment, even when the recognition target is human skin, which is a translucent body, correcting the sensing result of the LiDAR sensor 3 using the correction map improves the recognition accuracy of the recognition target.
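The flow of ST1 to ST5 can be summarized by the following sketch. The callables passed as arguments are hypothetical stand-ins for the acquisition unit 51, recognition unit 52, and correction unit 53; how the offset is looked up in the correction map is detailed in the correction map sections below.

```python
def recognize_hand(depth_image, detect_hand_region, estimate_3d_keypoints,
                   lookup_offset, correction_map):
    """Sketch of the recognition flow ST1-ST5 with hypothetical helper callables.

    depth_image is the LiDAR sensing result acquired in ST1; lookup_offset
    returns the depth offset (measured depth minus actual depth) stored in
    the correction map at a given 3D position.
    """
    hand_region = detect_hand_region(depth_image)                    # ST2: detect hand region
    keypoints_3d = estimate_3d_keypoints(depth_image, hand_region)   # ST3: 3D feature point positions
    corrected = []
    for (x, y, z) in keypoints_3d:
        offset = lookup_offset(correction_map, x, y, z)
        corrected.append((x, y, z - offset))                         # ST4: correct the Z position
    return corrected                                                  # ST5: to display image generation unit 54
```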
[Correction Map]
The correction map is depth correction information for correcting the depth value (Z value) of the recognition target detected by the LiDAR sensor 3. The value measured by the LiDAR sensor 3 has an error from the actual value due to subsurface scattering at the skin, which is the recognition target, and individual differences of the LiDAR sensor 3. The correction map corrects this error.
The correction map will be described with reference to FIG. 5.
As shown in FIG. 5(A), a three-dimensional grid 9 is arranged in the real space of an imaging region 8 that can be acquired by the LiDAR sensor 3. The three-dimensional grid 9 is divided by a plurality of uniformly spaced grid lines parallel to the X axis, a plurality of uniformly spaced grid lines parallel to the Y axis, and a plurality of uniformly spaced grid lines parallel to the Z axis.
FIG. 5(B) is a schematic diagram of FIG. 5(A) as viewed from the Y-axis direction.
In FIGS. 5(A) and 5(B), reference numeral 30 indicates the center of the LiDAR sensor 3.
The correction map is a map that holds an offset value relating to depth at each grid point of the three-dimensional grid 9. The "offset value relating to depth" is a value indicating how much the depth value (measured value) acquired by the LiDAR sensor 3 deviates, in the plus or minus Z-axis direction, from the actual depth value (actual value).
The "offset value relating to depth" will now be described.
In the example shown in FIG. 5(B), the filled black circle located on a grid point A indicates a three-dimensional position 13 of the recognition target based on the depth image acquired by the LiDAR sensor 3. The white circle indicates an actual three-dimensional position 12 of the recognition target. The three-dimensional position of the recognition target includes depth value information. In other words, reference numeral 13 indicates the position measured by the LiDAR sensor 3, and reference numeral 12 indicates the actual position.
The difference a between the depth value of the three-dimensional position 13 of the recognition target based on the depth image of the LiDAR sensor 3 and the depth value of the actual three-dimensional position 12 of the recognition target is the "offset value relating to depth" at the grid point A. In the example shown in FIG. 5(B), the "offset value relating to depth" at the grid point A is positive.
In the correction map, an "offset value relating to depth" is set for every grid point of the three-dimensional grid 9 arranged in the imaging region 8.
By correcting the depth value of the recognition target acquired by the LiDAR sensor 3 with reference to such a correction map, the three-dimensional measurement of the recognition target is made highly accurate, and the recognition accuracy of the recognition target can be improved.
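A minimal sketch of such a correction map as a data structure, assuming a uniformly spaced grid over the imaging region; the class name, field names, and grid parameters are illustrative assumptions:

```python
import numpy as np

class CorrectionMap:
    """Holds a depth offset (measured depth minus actual depth) at each grid
    point of a uniform 3D grid spanning the imaging region 8."""

    def __init__(self, origin, spacing, shape):
        self.origin = np.asarray(origin, dtype=float)     # XYZ of grid point (0, 0, 0)
        self.spacing = np.asarray(spacing, dtype=float)   # grid pitch along X, Y, Z
        self.offsets = np.zeros(shape, dtype=float)       # one offset value per grid point

    def set_offset(self, index, value):
        self.offsets[tuple(index)] = value                # index = (ix, iy, iz)
```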
[Correction Method Using the Correction Map]
A method of correcting the depth value using the above-described correction map will be described. Hereinafter, the "offset value relating to depth" is simply referred to as the "offset value". The three-dimensional position of the recognition target acquired by the LiDAR sensor 3 is referred to as the "measurement position". The "measurement position" is the three-dimensional position before correction and includes the depth value information before correction.
As described above, in the correction map, an offset value is set for each grid point of the three-dimensional grid 9. When the measurement position is on a grid point, the depth value of the measurement position is corrected using the offset value set at that grid point.
On the other hand, when the measurement position is not on a grid point, an offset value at the measurement position can be calculated using, for example, bilinear interpolation, and the depth value of the measurement position can be corrected using that offset value.
In the bilinear interpolation processing, the offset value at the measurement position is calculated, for example, as follows.
Consider, as an example, a case where the measurement position lies in an XY plane passing through four grid points formed by the intersections of two adjacent grid lines extending in the X-axis direction and two adjacent grid lines extending in the Y-axis direction.
The offset value at the measurement position is calculated using the offset values at the four grid points, a weighting factor based on the ratio of the distances in the X-axis direction between the measurement position and the two grid points adjacent in the X-axis direction, and a weighting factor based on the ratio of the distances in the Y-axis direction between the measurement position and the two grid points adjacent in the Y-axis direction. That is, the offset value at the measurement position is calculated from a weighted average of the offset values at the four grid points according to the distances between the four grid points and the measurement position in the X-axis and Y-axis directions.
Here, for convenience, the case where the measurement position lies in a plane passing through four grid points has been described as an example; when the measurement position does not lie in such a plane, the offset value at the measurement position can be calculated as follows.
That is, in the three-dimensional grid 9, when the measurement position is inside a minimum-unit three-dimensional cell partitioned by the grid lines, the offset value at the measurement position can be calculated from a weighted average of the offset values at the eight grid points constituting that minimum cell according to the distances between the eight grid points and the measurement position in the X-, Y-, and Z-axis directions.
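A sketch of this eight-grid-point weighted average (trilinear interpolation), written against the CorrectionMap sketch above; normalizing the distances by the grid spacing and clamping at the grid boundary are implementation assumptions:

```python
import numpy as np

def offset_at(correction_map, position):
    """Interpolate the depth offset at an arbitrary measurement position using
    the eight grid points of the enclosing cell, weighted by the distances
    along the X, Y, and Z axes as described above."""
    g = (np.asarray(position, dtype=float) - correction_map.origin) / correction_map.spacing
    i0 = np.clip(np.floor(g).astype(int), 0, np.array(correction_map.offsets.shape) - 2)
    t = g - i0                                   # normalized distances within the cell, in [0, 1]
    value = 0.0
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                w = ((t[0] if dx else 1.0 - t[0]) *
                     (t[1] if dy else 1.0 - t[1]) *
                     (t[2] if dz else 1.0 - t[2]))
                value += w * correction_map.offsets[i0[0] + dx, i0[1] + dy, i0[2] + dz]
    return value

# The corrected depth is then: corrected_z = measured_z - offset_at(correction_map, (x, y, measured_z))
```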
[Correction Map Generation Method]
(Outline of the Correction Map Generation Method)
The correction map can be generated using the sensing results of the first camera 2A and the second camera 2B and the sensing result of the LiDAR sensor 3. An outline of the correction map generation method will be described below with reference to FIGS. 6 and 7.
FIG. 6 is a schematic diagram for explaining an example of generating the correction map using the mobile terminal 1, which includes two cameras and one LiDAR sensor. The correction map is generated in a state where the hand of a user U, which is the recognition target, is positioned within the imaging region of the mobile terminal 1.
In FIG. 6, the plurality of small white circles shown overlapping the hand of the user U indicate feature point positions 6 of the hand of the user U, such as joint positions, fingertip positions, and the wrist position.
Here, a case of recognizing the fingertip position of the index finger will be described.
In FIG. 6, the white circle denoted by reference numeral 120 indicates the three-dimensional feature point position of the tip of the index finger calculated by triangulation using the two-dimensional feature point positions detected from the RGB images acquired by the first camera 2A and the second camera 2B. The fingertip position 120 calculated using this triangulation is regarded as corresponding to the actual fingertip position and as including the actual depth value of the recognition target.
In FIG. 6, reference numeral 130 indicates the three-dimensional feature point position of the tip of the index finger based on the depth image acquired by the LiDAR sensor 3. The fingertip position 130 acquired by the LiDAR sensor 3 deviates in depth value from the actual fingertip position 120 of the recognition target due to subsurface scattering at the time of measurement by the LiDAR sensor 3.
The difference between the fingertip position 120 calculated using triangulation and the fingertip position 130 of the index finger based on the depth image of the LiDAR sensor 3 is the error component. This error component is the "offset value relating to depth" in the correction map.
By acquiring such error component data over the entire imaging region, a correction map for correcting the measurement error originating from the LiDAR sensor 3 when the recognition target of the mobile terminal 1 is human skin can be generated.
The flow of the correction map generation processing in the processing unit 50 will be described with reference to FIG. 7.
As shown in FIG. 7, the three-dimensional feature point positions of the recognition target are detected from the depth image of the LiDAR sensor 3 (ST11). The three-dimensional feature point positions based on this depth image correspond to reference numeral 130 in FIG. 6.
In addition, two-dimensional feature point positions are detected from the RGB images of the first camera 2A and the second camera 2B (ST12). Using the detected two-dimensional feature point positions, the three-dimensional feature point positions of the recognition target are calculated by triangulation (ST13). The three-dimensional feature point positions calculated by this triangulation are regarded as the actual three-dimensional feature point positions of the recognition target and correspond to reference numeral 120 in FIG. 6.
Next, the difference of the three-dimensional feature point positions based on the depth image of the LiDAR sensor 3 estimated in ST11 with respect to the three-dimensional feature point positions calculated in ST13 from the RGB images of the plurality of cameras (the first camera 2A and the second camera 2B) is calculated as an error component (ST14).
A correction map is generated by acquiring such error component data over the entire imaging region.
In this way, the correction map includes difference information between the depth value of the recognition target based on the sensing result of the LiDAR sensor 3 and the actual depth value of the recognition target.
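As an illustrative sketch of ST11 to ST14, combining the triangulation and CorrectionMap sketches above: each captured frame yields a pair of 3D positions for the same feature point, and the depth error component is stored in the map. Assigning each sample to its nearest grid point and averaging repeated samples are assumptions made for illustration; the description only requires that error data be collected over the entire imaging region.

```python
import numpy as np
from collections import defaultdict

def build_correction_map(samples, correction_map):
    """Fill a CorrectionMap from (p_lidar, p_tri) sample pairs.

    samples : iterable of (p_lidar, p_tri), both 3D positions of the same hand
              feature point; p_lidar comes from the LiDAR depth image (ST11),
              p_tri from two-camera triangulation (ST12-ST13).
    """
    buckets = defaultdict(list)
    for p_lidar, p_tri in samples:
        error = p_lidar[2] - p_tri[2]                       # ST14: depth error component
        g = (np.asarray(p_lidar, dtype=float) - correction_map.origin) / correction_map.spacing
        index = np.round(g).astype(int)                     # nearest grid point (assumption)
        if np.any(index < 0) or np.any(index >= correction_map.offsets.shape):
            continue                                        # ignore samples outside the grid
        buckets[tuple(index)].append(error)
    for index, errors in buckets.items():
        correction_map.set_offset(index, float(np.mean(errors)))   # average repeated samples
    return correction_map
```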
FIG. 8 is a diagram for explaining basic images displayed on the display unit 34 when the correction map is generated.
When the correction map is generated, as shown in FIGS. 8(A) and 8(B), the display unit 34 of the mobile terminal 1 displays an image in which a target sphere 7, which is a virtual object for correction map generation, is superimposed on the through image acquired by the first camera 2A or the second camera 2B. Note that the virtual object for correction map generation is not limited to a spherical shape and can have various shapes.
For example, the user U holds the mobile terminal 1 with one hand and positions the other hand within the imaging region so that the other hand appears on the display unit 34. The correction map is generated by the user U viewing the image displayed on the display unit 34 and moving the other hand.
The target sphere 7 is displayed so that its position can change within the imaging region. The user U moves the other hand so as to chase the target sphere 7 in accordance with the movement of the target sphere 7 displayed on the display unit 34. By moving the hand in accordance with the movement of the target sphere 7 in this way, error component data over the entire imaging region can be acquired, and the correction map can be generated using the data.
A more specific method of generating the correction map will be described below.
(Specific Example of the Correction Map Generation Method)
A more specific method of generating the correction map will be described with reference to FIGS. 9 and 10.
FIG. 9 is a diagram for explaining images displayed on the display unit 34 when the correction map is generated.
FIG. 10 is a flowchart relating to the display of images on the display unit 34 when the correction map is generated.
As described above, during the correction map generation processing, the user U holds the mobile terminal 1 with one hand and positions the other hand so as to be within the field of view of the cameras 2.
While looking at the display unit 34, the user U moves the other hand in accordance with the movement direction and size of the target sphere displayed on the display unit 34. The correction map is generated based on this hand movement information.
Images displayed when the correction map is generated will be described with reference to FIG. 9, following the flow of FIG. 10.
When the correction map generation processing starts, as shown in FIG. 9(A), a through image captured by the first camera 2A or the second camera 2B is displayed on the display unit 34 of the mobile terminal 1 (ST21). Furthermore, as shown in FIG. 9(A), the target sphere 7 is displayed at a target location superimposed on the through image (ST22), and a user recognition result sphere 11 is displayed as the recognition result of the hand of the user U chasing the target sphere 7 (ST23). Hereinafter, the "user recognition result sphere" is referred to as the "user sphere".
The target sphere 7 and the user sphere 11 are both virtual objects. They are displayed in mutually different colors, for example, yellow for the target sphere 7 and blue for the user sphere 11, so that they can be distinguished from each other.
The size of the target sphere 7 does not change and is always displayed at a constant size.
The user sphere 11 is displayed at a predetermined position on the recognized hand of the user U. For example, in the example shown in FIG. 8, the user sphere 11 is displayed so that its center is positioned near the base of the middle finger. The user sphere 11 indicates the recognition result based on the sensing result of the LiDAR sensor 3. In the image displayed on the display unit 34, the user sphere 11 is displayed so as to follow the movement of the hand of the user U in the XY plane. Furthermore, the size of the user sphere 11 changes according to the movement of the hand of the user U in the Z-axis direction. In other words, the size of the user sphere 11 changes according to the position (depth value) of the hand of the user U in the Z-axis direction.
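As a small illustration of how the displayed size of the user sphere 11 can encode the hand's depth, the following sketch assumes a simple inverse-proportional (pinhole-style) scaling; the reference radius, reference depth, and clamping limits are assumptions, not values from this description.

```python
def user_sphere_radius_px(hand_depth_m, reference_depth_m=0.5,
                          radius_at_reference_px=60, min_px=10, max_px=200):
    """Return the on-screen radius of the user sphere 11 for a given hand depth.

    The sphere is drawn larger as the hand moves closer to the terminal and
    smaller as it moves away (inverse proportionality to the depth value).
    """
    radius = radius_at_reference_px * reference_depth_m / max(hand_depth_m, 1e-3)
    return int(min(max(radius, min_px), max_px))
```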
The mobile terminal 1 guides the user, for example, by voice, to move the hand so that the user sphere 11 coincides with the target sphere 7, as shown in FIG. 9(B) (ST24). Here, the target sphere 7 and the user sphere 11 coinciding means that their positions and sphere sizes become substantially the same. The guidance for making the target sphere 7 and the user sphere 11 coincide may also be displayed as text on the display unit 34 in addition to the voice.
Next, as shown in FIG. 9(C), when the coincidence of the target sphere 7 and the user sphere 11 is recognized, the target sphere 7 moves as shown in FIG. 9(D). The mobile terminal 1 guides the user U, by voice or the like, to make the hand of the user U follow the movement of the target sphere 7. The target sphere 7 moves throughout the entire imaging region.
The correction map generation unit 55 acquires movement information of the hand of the user U moving so as to chase the target sphere 7, which moves throughout the entire imaging region. That is, the correction map generation unit 55 acquires the three-dimensional position information of the recognition target (hand) obtained by the LiDAR sensor 3 over the entire imaging region (ST25).
Furthermore, in the correction map generation processing of ST11 to ST15 described above, the correction map generation unit 55 also acquires the three-dimensional position information calculated by triangulation in parallel with the acquisition of the three-dimensional position information of the recognition target by the LiDAR sensor 3.
That is, the correction map generation unit 55 acquires the RGB images of the two cameras 2A and 2B and calculates the three-dimensional position of the recognition target by triangulation using the two-dimensional position information of the recognition target detected from the RGB image of each camera. The three-dimensional position information calculated by this triangulation is also acquired over the entire imaging region.
Then, as described using the flowchart of FIG. 7, the error between the three-dimensional position information of the recognition target based on the depth image (sensing result) of the LiDAR sensor 3 and the three-dimensional position information based on the RGB images (sensing results) of the two cameras 2A and 2B is calculated. The correction map generation unit 55 generates the correction map using the error component data over the entire imaging region.
In this way, the user can generate a correction map for correcting the measurement error (distance measurement error) of the LiDAR sensor 3 for each mobile terminal 1, which enables an adjustment suited to the mounted LiDAR sensor 3.
Note that the correction map may be generated by the user for each mobile terminal 1 as described above, or may be prepared in advance. In a device including a LiDAR sensor and a camera (the mobile terminal in this embodiment), the types of sensors mounted on each type of device are known in advance, so a correction map for the case where the recognition target is human skin may be generated and prepared in advance for each model or sensor. The same applies to a second embodiment described later.
<Second Embodiment>
In the first embodiment, an example in which the correction map is generated using the respective sensing results of two cameras and one LiDAR sensor has been described, but the present technology is not limited to this.
In this embodiment, an example in which the correction map is generated using the respective sensing results of one camera and one LiDAR sensor mounted on a device (a mobile terminal in this embodiment) will be described.
The mobile terminal as the device in this embodiment differs from the mobile terminal of the first embodiment in the number of cameras; the other basic configurations are similar, and the configuration of the processing unit 50 is substantially the same. The mobile terminal in the first embodiment is equipped with a multi-view camera, whereas the mobile terminal in the second embodiment is equipped with a monocular camera. The differences will be mainly described below.
In the second embodiment, the program for the generation processing of the correction map (depth correction information) stored in the storage unit 56 of the mobile terminal 1, which also functions as the recognition device, is for causing the recognition device (the mobile terminal 1 in this embodiment) to execute the following steps.
The steps are: a step of detecting the two-dimensional position of the recognition target from the RGB image (sensing result) of the one camera; a step of detecting the two-dimensional position of the recognition target from the reliability image (sensing result) of the LiDAR sensor; a step of calculating the three-dimensional position of the recognition target by triangulation using the two-dimensional position of the recognition target based on the RGB image of the camera and the two-dimensional position of the recognition target based on the reliability image of the LiDAR sensor; a step of detecting the three-dimensional position of the recognition target from the depth image of the LiDAR sensor; and a step of generating the depth correction information (correction map) using the difference between the three-dimensional position of the recognition target calculated by triangulation and the three-dimensional position of the recognition target based on the depth image of the LiDAR sensor.
A method of generating the correction map according to this embodiment will be described with reference to FIGS. 11 and 12.
FIG. 11 is a schematic diagram illustrating an example of generating the correction map using the mobile terminal 1.
In FIG. 11, the small white circles shown superimposed on the hand of the user U indicate feature point positions 6 of the hand of the user U. Here, the case of recognizing the fingertip position of the index finger is described.
FIG. 12 is a flowchart of the correction map generation method according to this embodiment.
The image displayed on the display unit when the correction map is generated is the same as in the first embodiment.
In FIG. 11, reference numeral 121 denotes the fingertip position of the index finger calculated by triangulation using the two-dimensional feature point position detected from the RGB image of the camera 2 and the two-dimensional feature point position detected from the reliability image of the LiDAR sensor 3. The fingertip position 121 calculated by triangulation is regarded as corresponding to the actual fingertip position and as containing the information of the actual depth value of the recognition target. The fingertip position 121 is a three-dimensional feature point position of the recognition target.
The reliability image is reliability information that represents, for each pixel, the reliability of the depth information acquired by the LiDAR sensor 3. The reliability is calculated at the same time the depth information is acquired by the LiDAR sensor 3, using the luminance information and contrast information of the image used for the depth calculation. The reliability is determined as a real value for each pixel, and the reliability image is finally generated as a grayscale image in which the reliability is expressed as a luminance value.
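The specification does not give a concrete formula for this reliability, so the following is only a loose illustration under assumed weighting: a per-pixel score derived from the return luminance and its local contrast, scaled into an 8-bit grayscale image.

    import numpy as np

    def reliability_image(amplitude, window=5, eps=1e-6):
        """Illustrative per-pixel reliability from the received-light amplitude.

        amplitude : 2D array of received-light intensity (luminance) per pixel.
        Returns an 8-bit grayscale image whose brightness encodes reliability.
        The combination of luminance and contrast used here is an assumption.
        """
        h, w = amplitude.shape
        pad = window // 2
        padded = np.pad(amplitude.astype(float), pad, mode="edge")
        # Local contrast: RMS difference between each pixel and its neighborhood.
        contrast = np.zeros((h, w))
        for dy in range(window):
            for dx in range(window):
                contrast += (padded[dy:dy + h, dx:dx + w] - amplitude) ** 2
        contrast = np.sqrt(contrast / (window * window))

        score = amplitude * contrast            # brighter and sharper -> more reliable
        score = score / (score.max() + eps)
        return (score * 255).astype(np.uint8)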
In FIG. 11, reference numeral 131 denotes the three-dimensional feature point position of the tip of the index finger based on the depth image acquired by the LiDAR sensor 3. The depth value of the fingertip position 131 acquired by the LiDAR sensor 3 deviates from that of the actual fingertip position 121 of the recognition target because of subsurface scattering during measurement by the LiDAR sensor 3.
The difference between the fingertip position 121 calculated by triangulation and the fingertip position 131 of the index finger based on the depth image of the LiDAR sensor 3 is the error component. This error component becomes the "offset value related to depth" in the correction map.
By acquiring such error component data over the entire imaging area, a correction map can be generated for correcting the measurement error originating from the LiDAR sensor 3 of the mobile terminal 1 when the recognition target is human skin.
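As one way to picture how these error samples could be aggregated into a correction map (the grid resolution and the averaging scheme below are assumptions, not taken from the specification): each sample stores the depth offset at its image position, and the offsets are averaged per cell of a coarse grid covering the imaging area.

    import numpy as np

    def build_correction_map(samples, image_size, grid=(8, 8)):
        """Illustrative correction map: mean depth offset per image-grid cell.

        samples : iterable of (u, v, depth_lidar, depth_triangulated) tuples
                  collected while the hand is moved over the imaging area.
        image_size : (width, height) of the depth image.
        Returns a grid of offsets (LiDAR depth minus triangulated depth).
        """
        w, h = image_size
        offsets = np.zeros(grid)
        counts = np.zeros(grid)
        for u, v, d_lidar, d_tri in samples:
            gx = min(int(u / w * grid[0]), grid[0] - 1)
            gy = min(int(v / h * grid[1]), grid[1] - 1)
            offsets[gx, gy] += d_lidar - d_tri   # the per-sample error component
            counts[gx, gy] += 1
        return np.divide(offsets, counts,
                         out=np.zeros(grid), where=counts > 0)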
As described above, the correction map includes difference information between the depth value of the recognition target based on the sensing result of the LiDAR sensor 3 and the actual depth value of the recognition target.
In the correction map generation process of this embodiment, the correction map generation unit 55 generates the correction map using the three-dimensional position information of the recognition target based on the depth image (sensing result) of the LiDAR sensor 3 and the three-dimensional position information of the recognition target based on the RGB image (sensing result) of the single camera 2 and the reliability image (sensing result) of the LiDAR sensor 3.
The flow of the correction map generation process in the processing unit 50 is described below with reference to FIG. 12.
As shown in FIG. 12, the three-dimensional feature point position of the recognition target is detected from the depth image of the LiDAR sensor 3 (ST31). The three-dimensional feature point position based on this depth image corresponds to reference numeral 131 in FIG. 11.
In addition, a two-dimensional feature point position is detected from the reliability image of the LiDAR sensor 3 (ST32).
A two-dimensional feature point position is also detected from the RGB image of the camera 2 (ST33).
Next, the three-dimensional feature point position of the recognition target is calculated by triangulation using the two-dimensional feature point position detected from the reliability image and the two-dimensional feature point position detected from the RGB image of the camera 2 (ST34). The three-dimensional feature point position calculated by this triangulation corresponds to the actual three-dimensional feature point position of the recognition target, and corresponds to reference numeral 121 in FIG. 11.
Next, the difference between the three-dimensional feature point position of the recognition target calculated by triangulation in ST34 and the three-dimensional feature point position based on the depth image of the LiDAR sensor 3 estimated in ST31 is calculated as the error component (ST35).
By acquiring such error component data over the entire imaging area, the correction map is generated.
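For orientation only, steps ST31 to ST35 could be wired together per captured frame as in the sketch below; the feature detectors, the triangulation routine, and the sample accumulator are passed in as callables because the specification does not prescribe particular algorithms for them (the earlier DLT sketch could serve as the triangulation callable).

    def process_frame(depth_image, reliability_image, rgb_image,
                      P_lidar, P_camera,
                      detect_2d, detect_3d_from_depth, triangulate, add_sample):
        """One hypothetical iteration of ST31-ST35 (all helpers are injected)."""
        p3d_lidar = detect_3d_from_depth(depth_image)              # ST31
        uv_rel = detect_2d(reliability_image)                      # ST32
        uv_rgb = detect_2d(rgb_image)                              # ST33
        p3d_true = triangulate(P_lidar, P_camera, uv_rel, uv_rgb)  # ST34
        error = p3d_lidar[2] - p3d_true[2]                         # ST35
        add_sample(uv_rel, error)                                  # accumulate for the map
        return error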
As in each of the embodiments described above, the present technology corrects the depth value acquired by the LiDAR sensor of a device including a LiDAR sensor and a camera (image sensor) by referring to a correction map (depth correction information) generated using the sensing result of the LiDAR sensor and the sensing result of the camera. This makes it possible to correct the error in the depth value of the sensing result of the LiDAR sensor according to the individual differences of the LiDAR sensor, and the recognition accuracy of the recognition target can be improved.
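A minimal sketch of the correction step itself, assuming the correction map is the per-cell offset grid illustrated above (the grid shape and the sign convention, LiDAR depth minus actual depth, are assumptions):

    import numpy as np

    def correct_depth(depth_image, correction_map):
        """Subtract the per-region depth offset from the LiDAR depth image."""
        h, w = depth_image.shape
        gx, gy = correction_map.shape
        # Correction-map cell index for every pixel column and row.
        us = np.minimum((np.arange(w) / w * gx).astype(int), gx - 1)
        vs = np.minimum((np.arange(h) / h * gy).astype(int), gy - 1)
        offsets = correction_map[us[None, :], vs[:, None]]
        return depth_image - offsets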
The present technology is particularly preferably applied when the recognition target is a translucent body such as human skin. In the present technology, even if the recognition target is a translucent body, the deviation (error) between the value measured by the LiDAR sensor and the actual value, caused by subsurface scattering in the recognition target and by individual differences among sensor devices, is corrected by correcting the depth value acquired by the LiDAR sensor using the correction map. This enables stable and highly accurate measurement of the recognition target and improves the recognition accuracy of the recognition target.
For this reason, as described above, the present technology can be particularly preferably applied to recognition of human hands, whose skin is often exposed.
The present technology can also be applied to gesture recognition for recognizing gesture motions performed by a user. As an alternative to controllers and remote controllers for games, home appliances, and the like, the gesture recognition result of a hand gesture performed by the user can be used as an operation input for a game or a home appliance. Since the present technology enables highly accurate recognition of the recognition target, stable and accurate operation input is possible.
<Other configuration examples>
Embodiments of the present technology are not limited to the embodiments described above, and various modifications are possible without departing from the gist of the present technology.
For example, in the first and second embodiments described above, an RGB camera and a LiDAR sensor are used as separate devices, but an RGB-D camera, which is a single device capable of simultaneously capturing an RGB image and a depth image (NIR image), may be used instead.
In the first embodiment, one camera and one RGB-D camera may be used instead of the two cameras and one LiDAR sensor.
In the second embodiment, one RGB-D camera may be used instead of the one camera and one LiDAR sensor.
Further, for example, in the embodiments described above, the mobile terminal, which is a device including an image sensor and a LiDAR sensor, functions as the recognition device that recognizes the recognition target. Alternatively, the recognition device that recognizes the recognition target may be an external device separate from the device including the image sensor and the LiDAR sensor. For example, part or all of the processing unit 50 shown in FIG. 3 may be configured by an external device, such as a server, separate from the device including the image sensor and the LiDAR sensor.
The present technology can also have the following configurations.
(1) A recognition device including a processing unit that corrects a depth value of a recognition target acquired by a LiDAR (Light Detection and Ranging) sensor of a device including the LiDAR sensor and an image sensor, the LiDAR sensor having a light emitting unit that irradiates the recognition target with light and a light receiving unit that receives light reflected from the recognition target, the image sensor capturing an image of the recognition target, the correction being made by referring to depth correction information generated using a sensing result of the LiDAR sensor and a sensing result of the image sensor.
(2) The recognition device according to (1) above, in which
the depth correction information includes difference information between the depth value of the recognition target based on the sensing result of the LiDAR sensor and an actual depth value of the recognition target.
(3) The recognition device according to (1) or (2) above, in which
the device includes a plurality of the image sensors and one LiDAR sensor, and
the depth correction information includes difference information between a depth value of the recognition target calculated by triangulation using position information of the recognition target detected from sensing results of the plurality of image sensors and the depth value of the recognition target based on a depth image as the sensing result of the LiDAR sensor.
(4) The recognition device according to (1) or (2) above, in which
the device includes at least one image sensor and one LiDAR sensor, and
the depth correction information includes difference information between a depth value of the recognition target calculated by triangulation using position information of the recognition target detected from the sensing result of one image sensor and position information of the recognition target detected from a reliability image as the sensing result of the LiDAR sensor, and the depth value of the recognition target based on a depth image as the sensing result of the LiDAR sensor.
(5) The recognition device according to any one of (1) to (4) above, in which the recognition target is a translucent body.
(6) The recognition device according to (5) above, in which the recognition target is human skin.
(7) The recognition device according to (6) above, in which the recognition target is a human hand.
(8) The recognition device according to any one of (1) to (7) above, in which the processing unit recognizes a gesture motion of a human who is the recognition target.
(9) The recognition device according to any one of (1) to (8) above, in which the processing unit generates the depth correction information using the sensing result of the LiDAR sensor and the sensing result of the image sensor.
(10) The recognition device according to any one of (1) to (9) above, in which
the device includes a display unit, and
the processing unit generates an image to be displayed on the display unit using the corrected depth value of the recognition target.
(11) A recognition method including correcting a depth value of a recognition target acquired by a LiDAR (Light Detection and Ranging) sensor of a device including the LiDAR sensor and an image sensor, the LiDAR sensor having a light emitting unit that irradiates the recognition target with light and a light receiving unit that receives light reflected from the recognition target, the image sensor capturing an image of the recognition target, the correction being made by referring to depth correction information generated using a sensing result of the LiDAR sensor and a sensing result of the image sensor.
(12) A program that causes a recognition device to execute a step of correcting a depth value of a recognition target acquired by a LiDAR (Light Detection and Ranging) sensor of a device including the LiDAR sensor and an image sensor, the LiDAR sensor having a light emitting unit that irradiates the recognition target with light and a light receiving unit that receives light reflected from the recognition target, the image sensor capturing an image of the recognition target, the correction being made by referring to depth correction information generated using a sensing result of the LiDAR sensor and a sensing result of the image sensor.
1 ... Mobile terminal (recognition device, device)
2 ... Camera (image sensor)
2A ... First camera (image sensor)
2B ... Second camera (image sensor)
3 ... LiDAR sensor
12, 120, 121 ... Actual fingertip position / fingertip position calculated by triangulation (three-dimensional position of the recognition target including the actual depth value)
13, 130, 131 ... Fingertip position based on the sensing result of the LiDAR sensor (three-dimensional position of the recognition target including the depth value based on the sensing result of the LiDAR sensor)
34 ... Display unit
50 ... Processing unit

Claims (12)

1. A recognition device comprising a processing unit that corrects a depth value of a recognition target acquired by a LiDAR (Light Detection and Ranging) sensor of a device comprising the LiDAR sensor and an image sensor, the LiDAR sensor having a light emitting unit that irradiates the recognition target with light and a light receiving unit that receives light reflected from the recognition target, the image sensor capturing an image of the recognition target, the correction being made by referring to depth correction information generated using a sensing result of the LiDAR sensor and a sensing result of the image sensor.
2. The recognition device according to claim 1, wherein
the depth correction information includes difference information between the depth value of the recognition target based on the sensing result of the LiDAR sensor and an actual depth value of the recognition target.
3. The recognition device according to claim 2, wherein
the device comprises a plurality of the image sensors and one LiDAR sensor, and
the depth correction information includes difference information between a depth value of the recognition target calculated by triangulation using position information of the recognition target detected from sensing results of the plurality of image sensors and the depth value of the recognition target based on a depth image as the sensing result of the LiDAR sensor.
4. The recognition device according to claim 2, wherein
the device comprises at least one image sensor and one LiDAR sensor, and
the depth correction information includes difference information between a depth value of the recognition target calculated by triangulation using position information of the recognition target detected from the sensing result of one image sensor and position information of the recognition target detected from a reliability image as the sensing result of the LiDAR sensor, and the depth value of the recognition target based on a depth image as the sensing result of the LiDAR sensor.
5. The recognition device according to claim 1, wherein the recognition target is a translucent body.
6. The recognition device according to claim 5, wherein the recognition target is human skin.
7. The recognition device according to claim 6, wherein the recognition target is a human hand.
8. The recognition device according to claim 1, wherein the processing unit recognizes a gesture motion of a human who is the recognition target.
9. The recognition device according to claim 1, wherein the processing unit generates the depth correction information using the sensing result of the LiDAR sensor and the sensing result of the image sensor.
10. The recognition device according to claim 1, wherein
the device comprises a display unit, and
the processing unit generates an image to be displayed on the display unit using the corrected depth value of the recognition target.
11. A recognition method comprising correcting a depth value of a recognition target acquired by a LiDAR (Light Detection and Ranging) sensor of a device comprising the LiDAR sensor and an image sensor, the LiDAR sensor having a light emitting unit that irradiates the recognition target with light and a light receiving unit that receives light reflected from the recognition target, the image sensor capturing an image of the recognition target, the correction being made by referring to depth correction information generated using a sensing result of the LiDAR sensor and a sensing result of the image sensor.
12. A program that causes a recognition device to execute a step of correcting a depth value of a recognition target acquired by a LiDAR (Light Detection and Ranging) sensor of a device comprising the LiDAR sensor and an image sensor, the LiDAR sensor having a light emitting unit that irradiates the recognition target with light and a light receiving unit that receives light reflected from the recognition target, the image sensor capturing an image of the recognition target, the correction being made by referring to depth correction information generated using a sensing result of the LiDAR sensor and a sensing result of the image sensor.
PCT/JP2022/000218 2021-04-22 2022-01-06 Recognition device, recognition method, and program WO2022224498A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202280028267.4A CN117178293A (en) 2021-04-22 2022-01-06 Identification device, identification method, and program

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2021-072234 2021-04-22
JP2021072234A JP2022166872A (en) 2021-04-22 2021-04-22 Recognition apparatus, recognition method, and program

Publications (1)

Publication Number Publication Date
WO2022224498A1 true WO2022224498A1 (en) 2022-10-27

Family

ID=83722279

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2022/000218 WO2022224498A1 (en) 2021-04-22 2022-01-06 Recognition device, recognition method, and program

Country Status (3)

Country Link
JP (1) JP2022166872A (en)
CN (1) CN117178293A (en)
WO (1) WO2022224498A1 (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2000261617A (en) * 1999-03-09 2000-09-22 Minolta Co Ltd Image reader
JP2016085602A (en) * 2014-10-27 2016-05-19 株式会社日立製作所 Sensor information integrating method, and apparatus for implementing the same
JP2021051347A (en) * 2019-09-20 2021-04-01 いすゞ自動車株式会社 Distance image generation apparatus and distance image generation method

Also Published As

Publication number Publication date
CN117178293A (en) 2023-12-05
JP2022166872A (en) 2022-11-04

Similar Documents

Publication Publication Date Title
US11928838B2 (en) Calibration system and method to align a 3D virtual scene and a 3D real world for a stereoscopic head-mounted display
EP3788403B1 (en) Field calibration of a structured light range-sensor
US11215711B2 (en) Using photometric stereo for 3D environment modeling
US9208566B2 (en) Speckle sensing for motion tracking
US9646384B2 (en) 3D feature descriptors with camera pose information
US10254546B2 (en) Optically augmenting electromagnetic tracking in mixed reality
US20190179146A1 (en) Selective tracking of a head-mounted display
US20170374342A1 (en) Laser-enhanced visual simultaneous localization and mapping (slam) for mobile devices
EP2531980B1 (en) Depth camera compatibility
US10613228B2 (en) Time-of-flight augmented structured light range-sensor
US10091489B2 (en) Image capturing device, image processing method, and recording medium
EP2531979B1 (en) Depth camera compatibility
US10936900B2 (en) Color identification using infrared imaging
JP2011123071A (en) Image capturing device, method for searching occlusion area, and program
US10019839B2 (en) Three-dimensional object scanning feedback
EP3646147B1 (en) Display apparatus for computer-mediated reality
WO2022224498A1 (en) Recognition device, recognition method, and program
Pal et al. 3D point cloud generation from 2D depth camera images using successive triangulation
CN112424641A (en) Using time-of-flight techniques for stereo image processing
WO2021253308A1 (en) Image acquisition apparatus
US20240103133A1 (en) Information processing apparatus, information processing method, and sensing system
Kraus Wireless Optical Communication: Infrared, Time-Of-Flight, Light Fields, and Beyond

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22791286

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 22791286

Country of ref document: EP

Kind code of ref document: A1