WO2022019128A1

WO2022019128A1 - Information processing device, information processing method, and computer-readable recording medium

Info

Publication number: WO2022019128A1
Application number: PCT/JP2021/025736
Authority: WO
Inventors: 宏基水野
Original assignee: ソニーグループ株式会社
Priority date: 2020-07-21
Filing date: 2021-07-08
Publication date: 2022-01-27
Also published as: JP2022021027A

Abstract

The information processing device according to one aspect of the present technology comprises an acquisition unit and a setting unit. The acquisition unit acquires, by a ToF method, a depth value for a site of interest of a real object being irradiated with radiation light. For a parameter representing the distance to the site of interest based on the depth value, the setting unit sets the width of a distribution of weighting values referenced to the site of interest to be narrower as the brightness value of the radiation light reflected by the site of interest increases.

Description

Information processing device, information processing method, and computer-readable recording medium

This technology relates to an information processing device applicable to shape measurement, an information processing method, and a computer-readable recording medium.

Conventionally, techniques for detecting the three-dimensional shape of an object have been developed. For example, the ToF (Time-of-Flight) method detects the depth (depth value) of an object by measuring the time it takes for light emitted toward the target to be reflected by the target and detected by a sensor. Based on the depth information detected in this way, it is possible to generate a three-dimensional model of the object or the like.

For example, Non-Patent Document 1 describes a method of generating a three-dimensional model of a target based on depth information. In this method, the distance value to the surface of the target is calculated for each position (voxel) in space based on a distance image in which the depth to the target is mapped. A three-dimensional model is generated by connecting the positions where the distance value becomes 0. This makes it possible to accurately restore the shape of the target even if the depth detection accuracy is uneven (pages 3-4 of Non-Patent Document 1, FIGS. 2, 5, etc.).

The technology for detecting 3D shapes in this way has become more familiar due to the miniaturization of sensors, etc., and is expected to be applied in various scenes. Therefore, there is a demand for a technique capable of detecting a three-dimensional shape with high accuracy.

In view of the above circumstances, an object of the present technology is to provide an information processing apparatus capable of detecting a three-dimensional shape with high accuracy, an information processing method, and a recording medium readable by a computer.

In order to achieve the above object, the information processing device according to one form of the present technology includes an acquisition unit and a setting unit. The acquisition unit acquires, by the ToF method, the depth value of the target portion of the real object irradiated with the irradiation light. For the distance parameter to the target portion based on the depth value, the setting unit sets the width of the distribution of the weight value referenced to the target portion to be narrower as the brightness value of the irradiation light reflected by the target portion becomes larger.

In this information processing device, the real object is irradiated with irradiation light, and the depth value of the target portion is acquired by the ToF method. Further, for the distance parameter to the target portion based on the depth value, a weight value referenced to the target portion is set. The width of the distribution of the weight values is set narrower as the brightness value of the irradiation light reflected at the target portion is larger. As a result, for example, the position of the target portion can be represented with appropriate accuracy, and the three-dimensional shape can be detected with high accuracy.

The information processing method according to one form of the present technology is an information processing method executed by a computer system, and includes acquiring, by the ToF method, a depth value of a target portion of a real object irradiated with irradiation light. The information processing method further includes setting, for the distance parameter to the target portion based on the depth value, the width of the distribution of the weight value referenced to the target portion to be narrower as the brightness value of the irradiation light reflected by the target portion becomes larger.

A computer-readable recording medium according to an embodiment of the present technology records a program that causes a computer system to perform the following steps.
A step of acquiring the depth value of the target part of the real object irradiated with the irradiation light by the ToF method.
A step of setting, for the distance parameter to the target portion based on the depth value, the width of the distribution of the weight value referenced to the target portion to be narrower as the brightness value of the irradiation light reflected by the target portion becomes larger.

A diagram schematically showing the appearance of a mobile terminal according to a first embodiment of the present technology.
A schematic diagram showing a usage example of the mobile terminal.
A block diagram showing a configuration example of the mobile terminal.
A schematic diagram for explaining a depth map and an infrared image.
A schematic diagram for explaining the distance parameter stored in a voxel.
A schematic diagram for explaining a case where distance parameters interfere.
A flowchart showing the flow of the basic processing of the mobile terminal.
A flowchart showing an example of the calibration process.
A schematic diagram showing a scene in which a test object is photographed.
A plot showing the relationship between the luminance average of infrared light and the standard deviation of the depth value.
An enlarged view of the plot diagram shown in FIG.
A plot for explaining the estimation process of the regression coefficient.
A schematic diagram showing a calculation example of the distance parameter F.
A flowchart showing an example of the volume data generation processing.
A schematic diagram showing a setting example of the weight value with respect to the distance parameter.
A diagram showing a generation example of a 3D model.
A schematic diagram showing the relationship between a normal vector and a line-of-sight vector.
A schematic diagram showing a photographing system according to a second embodiment.
A schematic diagram showing a photographing system according to a third embodiment.

Hereinafter, embodiments relating to this technology will be described with reference to the drawings.

<First Embodiment>
[Information processing device configuration]
FIG. 1 is a diagram schematically showing the appearance of a mobile terminal according to the first embodiment of the present technology. The mobile terminal 100 is an information terminal that can be carried by a user, and has a function of generating a three-dimensional model of an object. As the mobile terminal 100, for example, a smartphone, a tablet, or the like is used.
The mobile terminal 100 is a plate-shaped device and has a display 10, a ToF camera 11, and an outward-facing camera 12. The ToF camera 11 and the outward-facing camera 12 are calibrated as camera modules and are provided at one end of a surface opposite to the surface on which the display 10 is provided.

In the following, the surface on which the display 10 is provided will be referred to as the front surface 13 of the mobile terminal 100. Further, the surface on which the ToF camera 11 and the outward-facing camera 12 are provided (the surface opposite to the front surface 13) is referred to as the back surface 14. Further, the side where the camera module (the ToF camera 11 and the outward-facing camera 12) is provided may be described as the upper side of the mobile terminal 100, and the opposite side may be described as the lower side. FIGS. 1A and 1B are perspective views of the mobile terminal 100 as viewed from the front surface 13 and the back surface 14, respectively.

The display 10 is a display element that displays information processed by the mobile terminal 100. The display 10 also has a touch panel function. For example, when the user touches the display 10, various operations such as a selection operation and a movement operation are accepted.
As the display 10, for example, an organic EL display provided with a contact sensor, a liquid crystal display (LCD), or the like is used.

The ToF camera 11 is a depth camera of the ToF system (Time of Flight system).
Here, the ToF method is a detection method for acquiring a distance (depth value) to an object by measuring, for example, the time difference until the light (irradiation light) irradiated to the object is returned to the sensor.
In the present embodiment, infrared light (IR: Infrared) is used as the irradiation light with which the object is irradiated. Specifically, the ToF camera 11 is configured using a light source (infrared LED or the like) that emits infrared light and an image sensor (IR image sensor or the like) that detects the infrared light reflected by the object.
In the ToF camera 11, data indicating the flight time (time difference) of infrared light is detected for each of a plurality of pixels. A depth value is calculated for each pixel from the data indicating this flight time, and a depth map is generated.
Further, in the ToF camera 11, data indicating the brightness of the infrared light reflected by the object is detected for each of the plurality of pixels. An infrared image of the object is generated from the data showing this brightness.
Depth maps and infrared images will be described in detail later.

The outward-facing camera 12 is a monocular RGB camera capable of capturing color moving images and still images. The image taken by the outward-facing camera 12 is output to the display 10 in real time, for example. This enables the user to shoot an object while checking the state of the image to be shot.
As the outward-facing camera 12, a digital camera including an image sensor such as a CMOS (Complementary Metal-Oxide Semiconductor) sensor or a CCD (Charge Coupled Device) sensor can be used. In addition, any configuration may be adopted. Further, as the outward-facing camera 12, a plurality of cameras such as a stereo camera may be used.

FIG. 2 is a schematic diagram showing a usage example of the mobile terminal 100. In the following, the real object to be sensed is referred to as an object 1. Here, a method of sensing (shooting) the object 1 using the mobile terminal 100 when generating the 3D model 2 of the object 1 will be described.
The object 1 is, for example, a human face. In this case, the mobile terminal 100 executes face modeling to generate a 3D model 2 of the target person's face. FIG. 2 schematically shows a view of a human being as an object 1 as viewed from above.
Of course, the object 1 to be photographed is not limited to the human face, and the present technology can be applied to any object 1.

In the present embodiment, a user (photographer) who uses the mobile terminal 100 moves around the object 1 with the ToF camera 11 facing the object 1 (the subject's face). By photographing the object 1 in this way, data for generating the 3D model 2 of the object 1 is acquired. Specifically, the object 1 is photographed at a predetermined frame rate by the ToF camera 11 while the photographing position moves. As a result, depth maps obtained by photographing (sensing) the object 1 from various angles are acquired.

In the mobile terminal 100, the 3D model 2 of the object 1 is generated based on the depth map acquired while the user moves around the object 1. More specifically, volume data representing the position of the surface of the object 1 is generated for each depth map. Then, the 3D model 2 is generated based on the data (integrated volume data) in which each volume data is integrated. This point will be described in detail later.
As described above, in the present embodiment, a three-dimensional model of the object 1 can be generated by holding the mobile terminal 100 in the hand and photographing with the ToF camera 11 while moving around the object 1. That is, the user can restore the shape of the object 1 by holding the mobile terminal 100 in hand and photographing the object 1 from various angles.

FIG. 3 is a block diagram showing a configuration example of the mobile terminal 100. The mobile terminal 100 further includes a position / attitude sensor 15, a microphone 16, a speaker 17, a communication unit 18, a storage unit 19, and a controller 30.

The position / attitude sensor 15 is a sensor that detects data for calculating the position and attitude of the mobile terminal 100.
As the position / attitude sensor 15, for example, an acceleration sensor that detects the acceleration of the mobile terminal 100 and a gyro sensor that detects the angular velocity of the mobile terminal 100 are used. Alternatively, an inertial measurement unit (IMU) in which an acceleration sensor and a gyro sensor are modularized may be used.
In addition, an orientation sensor that detects geomagnetism to determine the orientation of the mobile terminal 100, and a GPS sensor that receives GPS (Global Positioning System) signals transmitted from satellites and outputs the position information of the mobile terminal 100, may be provided.

The microphone 16 is a sound collecting element that detects sound around the mobile terminal 100. The speaker 17 is a reproduction element that reproduces sound or the like output from the mobile terminal 100. As shown in FIG. 1A, the microphone 16 is arranged on the lower side of the front surface 13 of the mobile terminal 100. The speaker 17 is arranged on the upper side of the front surface 13 of the mobile terminal 100.
The communication unit 18 is a module for executing network communication, short-range wireless communication, and the like with other devices. For example, a wireless LAN module such as WiFi and a communication module such as Bluetooth (registered trademark) are provided.
The specific configuration of the microphone 16, the speaker 17, and the communication unit 18 is not limited.

The storage unit 19 is a non-volatile storage device. As the storage unit 19, for example, a recording medium using a solid-state element such as an SSD (Solid State Drive) or a magnetic recording medium such as an HDD (Hard Disk Drive) is used. In addition, the type of recording medium used as the storage unit 19 is not limited, and for example, any recording medium for recording data non-temporarily may be used.

The storage unit 19 stores a control program 20 for controlling the entire operation of the mobile terminal 100. The control program 20 is a program according to the present embodiment, and the storage unit 19 corresponds to a computer-readable recording medium on which the program is recorded.
Further, as shown in FIG. 3, the storage unit 19 stores the calibration data 21 and the model data 22. The calibration data 21 is data calculated by the calibration process described later, and is referred to when the 3D model 2 is generated. The model data 22 is the data of the 3D model 2 (for example, mesh data).
The calibration data 21 and the model data 22 will be described in detail later.

The controller 30 controls the operation of each block of the mobile terminal 100. The controller 30 has a hardware configuration necessary for a computer such as a CPU and a memory (RAM, ROM). Various processes are executed by the CPU loading the control program 20 stored in the storage unit 19 into the RAM and executing the control program 20. In this embodiment, the controller 30 corresponds to an information processing device.

As the controller 30, for example, a device such as a PLD (Programmable Logic Device) such as an FPGA (Field Programmable Gate Array) or another device such as an ASIC (Application Specific Integrated Circuit) may be used. Further, for example, a processor such as a GPU (Graphics Processing Unit) may be used as the controller 30.

In the present embodiment, the CPU of the controller 30 executes the program according to the present embodiment, whereby the data acquisition unit 31, the calibration processing unit 32, the volume data generation unit 33, and the model data generation unit 34 are realized as functional blocks. The information processing method according to the present embodiment is then executed by these functional blocks. In addition, in order to realize each functional block, dedicated hardware such as an IC (integrated circuit) may be used as appropriate.

The data acquisition unit 31 generates various data from the output of each sensor (ToF camera 11 or the like) provided in the mobile terminal 100. As shown in FIG. 3, the data acquisition unit 31 includes a shooting parameter acquisition unit 38, a depth map acquisition unit 39, and an infrared image acquisition unit 40.

The shooting parameter acquisition unit 38 acquires the shooting parameters of the ToF camera 11. Here, the photographing parameters are internal parameters and external parameters when the object 1 is photographed by the ToF camera 11.
The internal parameters are information indicating the distortion of the lens, the focal length of the lens, the optical center, and the like. For example, the internal parameters calculated by calibrating the ToF camera 11 in advance are stored in the storage unit 19, and are appropriately referred to by the photographing parameter acquisition unit 38.
The external parameters are information indicating the current position and attitude of the ToF camera 11. For example, Visual SLAM (Simultaneous Localization and Mapping), which simultaneously estimates the self-position and creates a map of the surrounding environment, is executed using images of the surroundings of the mobile terminal 100 taken by the outward-facing camera 12, and the position and attitude of the ToF camera 11 are estimated. Alternatively, the position and attitude estimated by inertial navigation based on the output of the position / attitude sensor 15 may be used. Further, the external parameters may be calculated by combining these methods.

The depth map acquisition unit 39 acquires the depth value of the target portion of the object 1 irradiated with infrared light by the ToF method. Specifically, the depth value of the target portion is calculated based on the output of the ToF camera 11.
Here, the target portion is a portion to be measured for depth (depth value), and is, for example, a portion on the surface of the object 1 irradiated with infrared light. For example, the ToF camera 11 measures the round-trip flight time of the infrared light with respect to the target portion by detecting the infrared light reflected at the target portion. The depth map acquisition unit 39 calculates the depth value of the target portion by using the data indicating the flight time and the speed of light.
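The conversion from flight time to depth described above can be sketched as follows. This is a minimal illustration only: the function name and the example flight time are assumptions made for this sketch and are not taken from the present embodiment. Because the measured time covers the round trip to the target portion and back, the one-way depth is half of the distance travelled at the speed of light.

```python
# Speed of light in vacuum, in metres per second.
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def depth_from_round_trip_time(round_trip_seconds: float) -> float:
    """Convert a round-trip flight time into a one-way depth value in metres.

    The light travels to the target portion and back, so the one-way
    distance is half of the total path length.
    """
    if round_trip_seconds < 0:
        raise ValueError("flight time must be non-negative")
    return SPEED_OF_LIGHT_M_PER_S * round_trip_seconds / 2.0


# Hypothetical measurement: the light returns after about 3.34 ns,
# which corresponds to a target portion roughly 0.5 m away.
depth_m = depth_from_round_trip_time(3.34e-9)
```

Applying such a conversion to every pixel of the sensor would give one depth value per pixel.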

In the present embodiment, the depth map acquisition unit 39 acquires a depth map to which the depth value is mapped. Specifically, a depth value is calculated for each pixel of the ToF camera 11, and a distance image having the depth value as the pixel value is generated as a depth map. Therefore, it can be said that the depth map is data in which the depth values of the target portions, which are different for each pixel, are recorded.
The process of generating a depth map is always executed, for example, at a predetermined frame rate.

The infrared image acquisition unit 40 generates an infrared image (IR image) of the object 1 based on the output of the ToF camera 11.
For example, when detecting the infrared light reflected by the target portion, the ToF camera 11 measures the brightness (intensity) of the infrared light over a fixed exposure time. The brightness of the infrared light represents the brightness of the infrared light reflected at the target portion, and is, for example, a value corresponding to the reflection characteristic of the target portion. The infrared image acquisition unit 40 generates an infrared image of the object 1 based on the data indicating the brightness. That is, the infrared image is data in which the reflection characteristics of the target portions, which are different for each pixel, are recorded.
The process of generating an infrared image is always executed at a predetermined frame rate at the same time as the depth map, for example.

In the present embodiment, the depth map 3 and the infrared image 4 are generated by using the output of the detection camera mounted on the ToF camera 11. Therefore, each pixel of the depth map 3 corresponds to each pixel of the infrared image 4 as it is. That is, the target portion indicated by a certain pixel of the depth map 3 is the same portion as the target portion indicated by the pixel at the same position in the infrared image 4.

FIG. 4 is a schematic diagram for explaining a depth map and an infrared image. FIG. 4A is a schematic view of the face of the person who is the object 1 as viewed from the front. 4B and 4C are schematic views showing a depth map 3 and an infrared image 4 in which the object 1 shown in FIG. 4A is photographed from the front.

In FIG. 4B, the depth value is schematically shown by the light and shade of the gray scale. Here, the depth map 3 is expressed so that the larger the depth value, that is, the farther the distance from the ToF camera 11, the brighter the color.
For example, the portion corresponding to the nose of the subject has a low depth value and is close to the ToF camera 11. Further, as the contour of the face is approached, the position of each portion moves away from the ToF camera 11, so that the depth value increases. Ideally, the depth map 3 is data representing the shape of the object 1 regardless of the material of the surface of the object 1.
In the present embodiment, such a depth map 3 is generated by the depth map acquisition unit 39 described above.

In FIG. 4C, the luminance value of infrared light is schematically shown by the light and shade of gray scale. Here, the infrared image 4 is expressed so that a portion with a larger luminance value, that is, a portion where infrared light is more strongly reflected, is shown in a brighter color.
For example, the reflection intensity of infrared light is low in a portion where the subject's hair is present. On the contrary, in the part where the skin is exposed, the reflection intensity of infrared light becomes high. As described above, the infrared image 4 is data representing the reflection characteristics of the surface material such as hair and skin.
In addition, the reflection intensity is low in a portion where the incident angle of infrared light is shallow (that is, a portion where infrared light is incident along the surface) such as the side surface of the nose or the contour of the face. On the contrary, the reflection intensity is high in the portion where the incident angle of the infrared light is deep (that is, the portion where the infrared light is incident at an angle close to perpendicular to the surface). In this way, the infrared image 4 records the reflection intensity according to the shape of the object 1.
In the present embodiment, such an infrared image 4 is generated by the infrared image acquisition unit 40 described above.

Returning to FIG. 3, the calibration processing unit 32 executes the calibration process and calculates the calibration data 21. The calibration process is, for example, a process performed in advance before generating the 3D model 2.
In the calibration process, a predetermined test target is repeatedly photographed from the same position, and test data including a plurality of depth maps 3 and a plurality of infrared images 4 is generated (see FIG. 9). Using this test data, a coefficient (regression coefficient) for expressing the relationship between the noise level of the depth value and the brightness value of the infrared light is calculated as the calibration data 21.
The calibration processing unit 32 includes a depth standard deviation calculation unit 41, an infrared luminance average calculation unit 42, and a regression coefficient calculation unit 43.

The depth standard deviation calculation unit 41 calculates the standard deviation of the depth value in each pixel for a plurality of depth maps 3 generated as test data. For example, the depth values of the i-th pixel included in each depth map 3 are read, and their standard deviations are calculated. The value of this standard deviation is used as the noise level of the depth value of the i-th pixel.

The infrared brightness average calculation unit 42 calculates the average value of the brightness values of the infrared light in each pixel of the plurality of infrared images 4 generated as test data. For example, the luminance value of the i-th pixel included in each infrared image 4 is read, and the average value thereof is calculated.
The average of these luminance values is a value representing the reflection characteristic at the portion corresponding to the i-th pixel.
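The two per-pixel statistics described above can be sketched as follows. The sketch assumes that each frame is given as a flat list in which index i always refers to the i-th pixel, and it uses the population standard deviation; the text does not state whether the population or the sample form is used, so that choice, like the function names and the sample values, is an assumption.

```python
from statistics import mean, pstdev


def per_pixel_depth_std(depth_maps):
    """Standard deviation of the depth value at each pixel across frames.

    depth_maps: list of frames, each a flat list of depth values in which
    index i refers to the i-th pixel of every frame.
    """
    num_pixels = len(depth_maps[0])
    return [pstdev([frame[i] for frame in depth_maps]) for i in range(num_pixels)]


def per_pixel_luminance_mean(ir_images):
    """Average infrared luminance at each pixel across frames."""
    num_pixels = len(ir_images[0])
    return [mean([frame[i] for frame in ir_images]) for i in range(num_pixels)]


# Hypothetical test data: three frames of two pixels each.
# Pixel 0 is bright and stable, pixel 1 is dark and noisy.
depth_maps = [[0.50, 0.80], [0.50, 0.83], [0.50, 0.77]]
ir_images = [[100, 20], [110, 22], [90, 18]]

noise_levels = per_pixel_depth_std(depth_maps)
luminance_means = per_pixel_luminance_mean(ir_images)
```

Each pixel then contributes one pair (luminance average, depth standard deviation) to the data from which the calibration data is derived.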

The regression coefficient calculation unit 43 calculates a regression coefficient for expressing a regression function based on the standard deviation of the depth value calculated for each pixel and the brightness average of infrared light. Here, the regression function is a function that expresses the relationship between the brightness of infrared light and the noise level of the depth value (see FIG. 12).
For example, a predetermined regression function is fitted to a plot of the standard deviation of the depth value and the brightness average of infrared light, and a regression coefficient for expressing the regression function is calculated (robust regression estimation). The regression function will be described in detail later with reference to FIG. 12 and the like.
The regression coefficient calculated here is stored in the storage unit 19 as the calibration data 21.
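As a rough illustration of estimating such coefficients, the sketch below fits a curve to (luminance average, depth standard deviation) pairs. Two points are assumptions and not taken from this section: the functional form, here a power law sigma = a * luminance^b, and the fitting method, here ordinary least squares on log-transformed values used as a simple stand-in for the robust regression estimation mentioned above. The function name and the synthetic samples are likewise hypothetical.

```python
import math


def fit_power_law(luminance_means, depth_stds):
    """Fit sigma = a * luminance**b by least squares on log-transformed data.

    Returns (a, b). Samples with non-positive values are skipped because
    their logarithm is undefined.
    """
    points = [
        (math.log(lum), math.log(std))
        for lum, std in zip(luminance_means, depth_stds)
        if lum > 0 and std > 0
    ]
    if len(points) < 2:
        raise ValueError("need at least two valid samples")
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    if sxx == 0:
        raise ValueError("luminance values must not all be equal")
    b = sxy / sxx                         # slope in log-log space
    a = math.exp(mean_y - b * mean_x)     # intercept mapped back from log space
    return a, b


# Synthetic samples generated from sigma = 0.2 * luminance**-0.5.
luminance = [25.0, 100.0, 400.0, 1600.0]
stds = [0.2 * lum ** -0.5 for lum in luminance]
a, b = fit_power_law(luminance, stds)
```

A robust estimator would additionally down-weight outlying pixels, which plain least squares does not do.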

The volume data generation unit 33 generates volume data representing the shape of the object 1 based on the depth map 3. In the volume data generation unit 33, for example, every time the depth map acquisition unit 39 acquires the depth map 3, the volume data corresponding to the depth map 3 is generated.

In the present embodiment, the space is divided by volume cells called voxels, and data representing the shape of the object 1 is stored in each voxel. Voxels are typically cube-shaped cells that divide space into grids. Volume data is a set of voxels in which data representing the shape of the object 1 is stored.
As the data representing the shape of the object 1, a distance parameter representing the distance between the target portion from which the depth value is calculated and each voxel is used. Hereinafter, the volume data will be specifically described.

FIG. 5 is a schematic diagram for explaining the distance parameter stored in the voxel. FIG. 5 schematically shows a view of a human head as an object 1 as viewed from above. Further, the portion of the object 1 protruding downward in the figure represents the nose of the subject.
In the following, the target portion irradiated with the infrared light 5 will be referred to as a target portion P. In FIG. 5, the irradiation direction of the infrared light 5 irradiated to the target portion P is schematically illustrated by using a black arrow.

In this embodiment, a data structure called Volumetric TSDF (Volumetric Truncated Signed Distance Function) is used as the volume data 6. Volumetric TSDF represents the object 1 whose shape is to be restored as a space (volume space) filled with voxels 7. With the surface of the object 1 taken as the value 0, each voxel 7 stores the distance to the nearby surface with a sign (plus / minus), and the space is expressed by these stored values.
The value (TSDF value) stored with this sign becomes the distance parameter.

The distance parameter (TSDF value) is negative when the position of the voxel 7 is outside the surface of the object 1, and positive when the position of the voxel 7 is inside. The sign convention only needs to be consistent within the system. For example, the outside may be defined as minus and the inside as plus, or the inside may be defined as minus and the outside as plus.
The distance parameter is calculated based on the depth value of the depth map 3. In the following, it may be described as a distance parameter F.

For example, FIG. 5 schematically shows, as square regions, the voxels 7 for which the distance parameter F is calculated based on the depth value of the target portion P. For example, in the voxel 7 including the target portion P, the distance parameter F = 0. Further, the voxel 7 to the left of the voxel 7 with F = 0 is outside the object 1, so F = −d, and the voxel 7 to the right of the voxel 7 with F = 0 is inside the object 1, so F = d.
Here, "d" is, for example, a length representing the interval between voxels 7 in the detection direction of the depth value. The depth value detection direction is the depth direction seen from the ToF camera 11, for example, the direction along the optical axis of the ToF camera 11 (see FIG. 13). Further, the value of d does not need to correspond to the actual length, and is a value normalized by a predetermined threshold value μ.
As described above, the distance parameter F is a distance obtained by normalizing the distance between the voxel 7 and the target portion P in the detection direction of the depth value with the threshold value μ.

Similarly, for the voxels 7 to the left of the voxel 7 with F = −d, the distance parameter F is calculated in order from the one closest to the target portion P, such as F = −2d, −3d, −4d. Further, for the voxels 7 to the right of the voxel 7 with F = d, the distance parameter F is calculated in order from the one closest to the target portion P, such as F = 2d, 3d, and 4d.
When the shooting position of the ToF camera 11 changes, the irradiation direction of the infrared light changes, and the position of the target portion P, which is the reference for calculating the distance parameter F of each voxel 7, also changes. In this case, the distance parameter F is calculated based on the changed target portion P. Therefore, each time the shooting position changes, the value of the distance parameter F stored in each voxel 7 changes.

Further, the Volumetric TSDF does not store the distance value up to infinity, and does not store the TSDF value (distance parameter) for the voxel 7 which is separated from the surface of the object 1 by a certain distance μ or more.
For example, in FIG. 5, a null value is set as the distance parameter F for the voxel 7 whose distance from the target portion P is larger than | 4d |. That is, μ = | 4d |.
As described above, in the present embodiment, the distance parameter F is calculated for the voxel 7 in which the distance between the voxel 7 and the target portion P in the detection direction of the depth value is the threshold value μ or less.
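The calculation of the distance parameter F along the detection direction can be sketched as follows. The function name, the use of `None` as the null value, and the millimetre values are assumptions made for this illustration. The sign follows the convention described above: voxels in front of the surface (outside the object) are negative and voxels behind it (inside) are positive.

```python
def distance_parameter(voxel_depth, surface_depth, mu):
    """Return the normalized signed distance F, or None when out of range.

    voxel_depth: depth of the voxel along the detection direction.
    surface_depth: depth value measured for the target portion P.
    mu: truncation threshold, in the same unit as the depths.
    """
    if mu <= 0:
        raise ValueError("mu must be positive")
    signed = voxel_depth - surface_depth  # negative in front of the surface
    if abs(signed) > mu:
        return None  # null value: farther than mu from the surface
    return signed / mu  # normalized by the threshold


# Hypothetical example in millimetres: voxels spaced 10 mm apart,
# surface at 500 mm, threshold mu equal to four voxel intervals.
spacing = 10
surface = 500
mu = 4 * spacing
values = [distance_parameter(surface + k * spacing, surface, mu)
          for k in range(-5, 6)]
```

In this example the voxel interval d corresponds to 0.25 after normalization, so the stored values run from −1.0 to 1.0 and the voxels five intervals away receive the null value.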

As shown in FIG. 3, the volume data generation unit 33 has a TSDF calculation unit 44 and a noise level calculation unit 45.
The TSDF calculation unit 44 calculates the above-mentioned TSDF value (distance parameter F) for each voxel 7 based on the depth map 3 generated by the depth map acquisition unit 39 and the shooting parameters acquired by the shooting parameter acquisition unit 38.

As described above, in the present embodiment, the internal parameters (focal length, optical center, distortion coefficient, etc.) and external parameters (spatial position / posture of the camera) of the ToF camera 11 are acquired as shooting parameters. Based on these shooting parameters, each voxel 7 is projected onto the shooting range of the ToF camera 11. That is, the coordinates of each voxel 7 seen from the ToF camera 11 are calculated.
Then, the depth value of the pixel (target portion P) corresponding to each voxel 7 is referred to, and the distance parameter (signed distance value) of each voxel 7 is calculated (see FIG. 5).
As described above, in the present embodiment, for each of the plurality of voxels 7 that divide the space including the object 1, the distance parameter F (TSDF value) representing the distance between the voxel 7 and the target portion P is calculated based on the depth value. In the following, the volume data in which the TSDF value is stored may be referred to as a TSDF volume.

Further, the TSDF calculation unit 44 sets a weight value W for each of the plurality of voxels 7. The weight value W is used when integrating the volume data 6 (voxels 7) generated for each depth map 3, and represents the degree to which the distance parameter F is reflected when integrating the data. The weight value W is stored for each voxel 7 together with the distance parameter F. Specifically, the width of the distribution of the weight value W is set based on the noise level of the depth value calculated by the noise level calculation unit 45 described later. This point will be described in detail later.

The noise level calculation unit 45 calculates the noise level related to the depth value for each pixel based on the infrared image 4 generated by the infrared image acquisition unit 40.
Specifically, the noise level of the depth value is estimated from the brightness of the infrared light by using the regression function defined by the regression coefficients calculated by the calibration processing unit 32 (regression coefficient calculation unit 43) described above.
The estimated noise level is output to the TSDF calculation unit 44 and is referred to when setting the weight value W of the distance parameter F.
In the present embodiment, the TSDF calculation unit 44 and the noise level calculation unit 45 work together to realize the setting unit.

The model data generation unit 34 generates the 3D model 2 of the object 1 based on the volume data 6 generated for each depth map 3 by the volume data generation unit 33.
In the present embodiment, the volume data generation unit 33 and the model data generation unit 34 function as a model generation unit that generates a 3D model of a real object based on the depth map.
As shown in FIG. 3, the model data generation unit 34 has a voxel integration unit 46 and a mesh extraction unit 47.

The voxel integration unit 46 integrates a plurality of volume data 6 generated for each depth map 3 to generate integrated volume data.
For example, the plurality of volume data 6 output from the TSDF calculation unit 44 are TSDF volumes generated by photographing the same subject (object 1) from a plurality of viewpoints with the ToF camera 11. In the voxel integration unit 46, the distance parameter F of the voxel 7 of each volume data 6 is multiplied by its weight value W and the results are added, whereby a new TSDF volume (integrated volume data) that integrates information from the plurality of viewpoints is calculated.
In this way, the voxel integration unit 46 integrates the voxels 7 generated for each depth map 3 based on the weight value W with respect to the distance parameter F.
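The integration described above can be sketched as a weighted combination of per-view TSDF volumes. This is a minimal illustration, not the implementation of the embodiment: the function name integrate_tsdf, the use of NaN to represent a null voxel, and the normalization by the total weight (a weighted average, as commonly used in TSDF fusion) are assumptions that the text does not state.

```python
import numpy as np

def integrate_tsdf(f_volumes, w_volumes):
    """Fuse per-view TSDF volumes into one volume by weighted averaging.

    f_volumes, w_volumes: arrays of shape (views, X, Y, Z).
    A voxel with no stored distance parameter in a view is NaN in f_volumes
    (assumed representation of the null value).
    """
    f = np.asarray(f_volumes, dtype=float)
    w = np.asarray(w_volumes, dtype=float)
    valid = ~np.isnan(f)
    w = np.where(valid, w, 0.0)                  # null voxels contribute nothing
    weighted_sum = np.sum(np.where(valid, f, 0.0) * w, axis=0)
    total_w = np.sum(w, axis=0)
    fused = np.full(total_w.shape, np.nan)       # stays null where no view saw the voxel
    # Assumption: divide by the accumulated weight to obtain a weighted average.
    np.divide(weighted_sum, total_w, out=fused, where=total_w > 0)
    return fused, total_w
```

With this form, a view that stores a high weight value W for a voxel dominates the fused distance parameter F of that voxel, while a view with a low weight value barely changes it.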

The mesh extraction unit 47 extracts the model data 22 in the mesh format from the finally integrated integrated volume data. This model data 22 is stored in the storage unit 19 as data of the 3D model 2.
In this embodiment, the mesh extraction unit 47 generates the 3D model 2 based on the distance parameter F. For example, in the integrated volume data (integrated TSDF volume), the portion where the value of the distance parameter F becomes 0 can be regarded as the position of the surface of the object 1. Therefore, for example, the isosurface on which the distance parameter F = 0 is extracted by using the Marching Cubes method or the like. The isosurface data is output as model data 22 in mesh format.
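The idea of locating the surface where F = 0 can be illustrated in one dimension. The sketch below is not the Marching Cubes method itself; it only shows the linear interpolation of a zero crossing between two neighbouring voxels, which is the operation such methods apply along each cell edge. The function name surface_positions and the NaN representation of null voxels are assumptions.

```python
import math

def surface_positions(f_line, voxel_size=1.0):
    """Locate F = 0 crossings along one line of voxels by linear interpolation.

    f_line: distance parameters of consecutive voxels; NaN marks a null voxel.
    Returns positions measured from voxel 0, in the unit of voxel_size.
    """
    positions = []
    for i in range(len(f_line) - 1):
        a, b = f_line[i], f_line[i + 1]
        if a == 0.0:                      # voxel centre lies exactly on the surface
            positions.append(i * voxel_size)
            continue
        if math.isnan(a) or math.isnan(b):
            continue                      # no surface information on this edge
        if a * b < 0.0:                   # sign change: surface lies between i and i+1
            t = a / (a - b)               # fraction of the way from voxel i to i+1
            positions.append((i + t) * voxel_size)
    if len(f_line) > 0 and f_line[-1] == 0.0:
        positions.append((len(f_line) - 1) * voxel_size)
    return positions
```

For example, a line of voxels storing F = 0.75, 0.25, -0.25, -0.75 places the surface midway between the second and third of those voxels.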

In this way, by using Volumetric TSDF, even if a depth value that is inaccurate to some extent is input, it is possible to restore an accurate 3D shape by integrating it with the results of other viewpoints. This improves noise immunity and makes it possible to generate a highly accurate 3D model 2.

The method of generating the 3D model 2 using Volumetric TSDF can be regarded as a kind of Visual SLAM. For example, in Visual SLAM, data mapping the three-dimensional space is sequentially constructed (Mapping) while the position and orientation of the camera in each frame are estimated (Localization) from the images of a camera moving freely in the three-dimensional space.
In this embodiment, the data structure of Volumetric TSDF is used as a representation of the three-dimensional space. In this approach, since the depth maps 3 of the respective frames correspond to depth maps 3 from different viewpoints, a large amount of viewpoint data can be acquired by moving freely around the object 1, and the shape can ultimately be restored with high accuracy.

In this way, by adopting the spatial representation by Volumetric TSDF and integrating the results of multiple viewpoints, the 3D model 2 can be generated without degrading the accuracy even when, for example, a sensor whose depth value fluctuates due to noise is used.
Further, as described above, in the Volumetric TSDF, the range in which the distance parameter F (signed distance value) is stored is cut off at a constant threshold value μ. The longer the cutoff length μ, the stronger the resistance to noise (depth value fluctuation) of the depth value.
On the other hand, when the threshold value μ is long, there may be a case where the distance parameter F interferes. In the following, a case where the distance parameter F interferes will be described.

FIG. 6 is a schematic diagram for explaining a case where distance parameters interfere with each other. In the figure on the left side of FIG. 6A, the head of the subject is photographed as the object 1. Here, it is assumed that the subject is facing the lower side in the figure and the image is taken by the ToF camera 11 from the right side (left side in the figure) of the subject.

At this time, it is assumed that the target portion P existing on the right side surface of the subject's nose is irradiated with infrared light 5 from the right side of the subject. Further, it is assumed that the threshold value μ is set sufficiently longer than the width of the nose. In this case, the distance parameter F with respect to the target portion P on the right side surface is stored for the voxel 7 outside the left side surface of the nose of the subject. This value is not relative to the left side of the subject's nose.
As a result, as shown in the figure on the right side of FIG. 6A, the generated 3D model 2 may have a shape in which the right side surface of the subject's nose protrudes outward from the actual shape.

In this way, if the surface is also present on the back side of the observed surface, the distance parameter F corresponding to the surface on the front side (observation side) may interfere with the back surface due to the long threshold value μ. In Volumetric TSDF, for example, it is implicitly assumed that the sensor noise of the ToF camera 11 is generated in a distribution such as a normal distribution or a uniform distribution. Therefore, the interference from the back surface deviates from such an assumption, and accurate shape restoration cannot be performed. As a result, as shown in FIG. 6B, it is conceivable that the shape of the 3D model 2 becomes inaccurate.

In other words, there is a trade-off between the resistance to noise of the depth value and the fineness of the shape that can be restored, and when the threshold value μ is increased to prioritize the resistance to noise, it can be difficult to restore fine shapes.
For example, in the depth map 3 obtained by the ToF camera 11, the distance value from the camera to the object 1 can be acquired for each pixel. On the other hand, the uncertainty of the depth value of each pixel (that is, the noise level of the depth value) is not uniform, and may differ depending on, for example, the reflection characteristics of the material to be photographed. Therefore, it is difficult to uniquely determine the noise level in the ToF camera 11 over all the pixels.
Even if the noise level is uniquely determined and the reconstruction of the three-dimensional shape by Volumetric TSDF is attempted, the object 1 has different reflection characteristics for each part, and the noise level of the depth value is significantly different for each pixel. In such cases, there is a possibility that accurate shape restoration cannot be performed.

Therefore, in the present embodiment, the TSDF calculation unit 44 sets the width of the distribution of the weight values, which is referenced to the target portion P and applies to the distance parameter F to the target portion P based on the depth value, to be narrower as the luminance value of the infrared light 5 reflected by the target portion P becomes larger.
The width of the distribution of the weight values is, for example, the width at which the weight value W falls to half of its peak value. In this way, the TSDF calculation unit 44 sets the width of the distribution of the weight value W with respect to the distance parameter F. It can therefore be said that the range in which the distance parameter F is valid (the range in which the weight value is high) is set narrow for a part where the infrared light 5 is reflected well and the depth value can be detected accurately, that is, a part where the noise level of the depth value is low.

As described above, in the present embodiment, an adaptive weight value is given to each voxel 7 of the Volumetric TSDF in consideration of the noise level of the depth value caused by the reflection characteristic of the object 1. Further, an infrared image 4 that can be acquired at the same time as the depth map 3 is used for calculating the noise level of each pixel of the depth map 3. This makes it possible to easily realize highly accurate three-dimensional shape restoration for the object 1.

[Basic operation of mobile terminal 100]
FIG. 7 is a flowchart showing the flow of basic processing of the mobile terminal 100.
First, the calibration process is executed (step 101). In the calibration process, a coefficient (calibration data) for calculating the noise level of the depth value is calculated for each pixel of the depth map 3.
The calibration process is executed, for example, at the time of factory shipment of the mobile terminal 100. This makes it possible to execute the calibration process in an appropriate environment. Alternatively, the calibration process may be executed when the user of the mobile terminal 100 generates the 3D model 2.

Next, a three-dimensional shape restoration process using the volume data 6 of the Volumetric TSDF is executed (step 201). The three-dimensional shape restoration process is a process of reconstructing the three-dimensional shape of the object 1 as a 3D model 2.
Here, the object 1 is photographed by the ToF camera 11, and the volume data 6 of the Volumetric TSDF is generated. At this time, the noise level is calculated for each pixel of the depth map 3 by using the coefficient acquired in the previous calibration process. Using this noise level, the weight value W is set for each voxel 7 together with the TSDF value (distance parameter F), and the volume data 6 is generated. The volume data 6 of a plurality of viewpoints is integrated by using the weight value W set according to the noise level in this way. This makes it possible to realize highly accurate three-dimensional shape restoration.

[Calibration process]
FIG. 8 is a flowchart showing an example of the calibration process.
As described above, the purpose of the calibration process is to obtain a regression coefficient for estimating the noise level of the depth value when the object 1 is observed.
This calibration process does not have to be performed every time, and the regression coefficient (calibration data 21) once calculated can be continuously used. Hereinafter, the details of the calibration process will be described with reference to FIG.

First, the ToF camera 11 is set toward the test target (step 201). The test target is, for example, an object for performing calibration. Here, for example, an instruction to point the ToF camera 11 at the test target and arrange the mobile terminal 100 is output to the display 10 or the like. Therefore, the ToF camera 11 (mobile terminal 100) is set by the calibrating worker.
In this embodiment, the test target corresponds to a test object.

FIG. 9 is a schematic diagram showing a scene in which the test target is photographed. In FIG. 9, three types of cylindrical objects are illustrated as test targets 50. Of course, the test target 50 is not limited to such an example and can be set arbitrarily.

As described above, the ToF camera 11 is a sensor that acquires the distance to the object 1 by measuring the time difference until the irradiated infrared light returns to the sensor. Because of this principle, the smaller the amount of light returned to the ToF camera 11, the lower the S / N ratio, and as a result, the accuracy of the depth value is lowered. How much of the irradiated light is reflected and returned to the sensor depends on the material of the object 1 that reflects it.

In the ToF camera 11 according to the present embodiment, infrared light is used as the light to be irradiated, and an IR image sensor or the like having sensitivity to infrared light is used for the light receiving portion. Therefore, the infrared image 4 can be acquired at the same time as the depth map 3. In this case, the luminance value of the infrared image 4 acquired at the same time as the depth map can be used to calculate the noise level of the depth value of each pixel. That is, it is possible to calculate the noise level of each pixel of the depth map 3 based on the luminance value of the infrared image 4.

In the calibration process, the infrared image 4 and the depth map 3 obtained by photographing the test target 50 are used in order to investigate the relationship between the brightness value of the infrared light and the noise level of the depth value when the ToF camera 11 is used.

It is desirable that the test target is, for example, an object whose position, posture, shape, etc. do not change over time. Further, a specific object may not be used as the test target 50, for example, a scene without a moving object may be photographed. As a result, the influence of moving objects and the like is excluded, and the regression coefficient can be calculated appropriately.
Further, it is desirable that the infrared image 4 obtained by photographing the test target 50 includes various luminance values. Therefore, for example, objects having different surface materials are used as the cylindrical objects in FIG. 9. As a result, the bias of the luminance values is reduced, and the regression coefficient can be calculated appropriately.

Returning to FIG. 8, when the ToF camera 11 is set, the test target is photographed and a plurality of depth maps 3 and infrared images 4 are acquired (step 202). Here, the ToF camera 11 executes shooting for N frames. Then, the depth map acquisition unit 39 and the infrared image acquisition unit 40 generate a depth map 3 and an infrared image 4 for each frame. Here, N = several tens to several hundreds of frames, and is appropriately set so that the required accuracy can be obtained.
As described above, in the calibration process, a plurality of depth maps 3 in which the test target 50 is measured from the same position and a plurality of infrared images 4 corresponding to the plurality of depth maps 3 are acquired.

When a predetermined number of depth maps 3 and infrared images 4 are acquired, the depth standard deviation calculation unit calculates the standard deviation σ of the depth value of each pixel (step 203). For example, N depth values are read for each pixel from the depth map 3 for N frames, and the standard deviation σ is calculated. The standard deviation σ of the depth value represents the noise level for the depth value of the target portion P corresponding to the pixel. In this way, the depth standard deviation calculation unit calculates the standard deviation of the depth value as the noise level of the depth value for each pixel of the plurality of depth maps 3.

Next, the infrared luminance average calculation unit 42 calculates the luminance average I of the infrared light of each pixel (step 204). For example, N luminance values are read for each pixel from the infrared image 4 for N frames, and the luminance average I is calculated. The brightness average I of the infrared light represents the average brightness of the infrared light reflected by the target portion P corresponding to the pixel, and is a value corresponding to the reflection characteristic of the target portion P. In this way, the infrared luminance average calculation unit 42 calculates the average luminance value for each pixel of the plurality of infrared images 4.
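Steps 203 and 204 amount to per-pixel statistics over the N frames. A minimal sketch, assuming the frames are stacked into NumPy arrays of shape (N, H, W); the function name per_pixel_statistics and the choice of the population standard deviation (ddof = 0) are assumptions, since the text does not specify which estimator is used.

```python
import numpy as np

def per_pixel_statistics(depth_frames, ir_frames):
    """Compute per-pixel depth noise and mean infrared luminance over N frames.

    depth_frames, ir_frames: arrays of shape (N, H, W) captured from the
    same position.
    Returns (sigma, mean_i), each of shape (H, W).
    """
    depth = np.asarray(depth_frames, dtype=float)
    ir = np.asarray(ir_frames, dtype=float)
    sigma = depth.std(axis=0)    # standard deviation σ of the depth value (ddof=0 assumed)
    mean_i = ir.mean(axis=0)     # luminance average I of the infrared light
    return sigma, mean_i
```

Each pixel then contributes one (I, σ) pair to the distribution used for the regression.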

FIG. 10 is a plot diagram showing the relationship between the brightness average of infrared light and the standard deviation of the depth value.
The horizontal axis of FIG. 10 is the brightness average I of infrared light, and the vertical axis is the standard deviation σ of the depth value. In FIG. 10, a predetermined test target 50 is photographed for 600 frames, and the standard deviation σ of the depth value in each pixel and the brightness average I of the infrared light (the average value of the IR brightness values) are calculated.

For example, when the brightness average I of infrared light is large, the standard deviation σ of the depth value becomes small. That is, the brighter the reflected infrared light, the lower the fluctuation (noise level) of the depth value tends to be. On the contrary, when the brightness average I of the infrared light is sufficiently small, the standard deviation σ of the depth value tends to increase sharply, and the fluctuation of the depth value (noise level) tends to increase.

FIG. 11 is an enlarged view of the plot diagram shown in FIG. FIG. 11 shows an enlarged plot of the area surrounded by the square in FIG.
As can be seen from the plot of FIG. 11, the standard deviation σ (that is, the noise level) of the depth value is distributed in inverse proportion to the luminance average I of the infrared light. Assuming from this relationship that the standard deviation σ of the depth value is proportional to the reciprocal (1 / I) of the brightness average I of the infrared light, the relationship between σ and I is approximately modeled by the following equation (1).

$$\sigma = \frac{A}{I} + B \tag{1}$$

Here, the coefficient A is a proportional coefficient with respect to (1 / I), and the coefficient B is a coefficient representing an intercept. In the present embodiment, the coefficient A corresponds to the first coefficient (A), and the coefficient B corresponds to the second coefficient (B).
Equation (1) is a regression function representing the relationship between the brightness (I) of infrared light and the noise level (σ) of the depth value, and the coefficients A and B are regression coefficients for expressing the regression function. Further, in the equation (1), for example, when X = 1 / I and Y = σ, Y = AX + B, which is a function showing a linear relationship.

Returning to FIG. 8, when the brightness average I of the infrared light is calculated, the regression coefficient calculation unit 43 calculates the coefficient A and the coefficient B satisfying the equation (1) (step 205). Here, the coefficients A and B are estimated by performing robust regression estimation on the distribution data of I and σ.
In this way, the calibration processing unit 32 calculates the coefficient A and the coefficient B based on the plurality of depth maps 3 and the plurality of infrared images 4.

FIG. 12 is a plot diagram for explaining the regression coefficient estimation process. FIG. 12 shows the plots of FIGS. 10 and 11 re-plotted with the reciprocal of the infrared brightness average I (X = 1 / I) on the horizontal axis and the standard deviation of the depth value (Y = σ) on the vertical axis.
Here, if robust regression estimation is performed based on the equation (1), a straight line 51 can be drawn as a regression function as shown in FIG. The slope of the straight line 51 is the coefficient A, and the intercept is the coefficient B. As described above, in the present embodiment, the coefficient A and the coefficient B are calculated by fitting the equation (1) to the distribution of the average value I of the brightness value of the infrared light and the standard deviation σ of the depth value.
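The fitting of the straight line Y = AX + B with X = 1 / I and Y = σ can be sketched as follows. The text only says that robust regression estimation is performed and does not name a method, so the iteratively reweighted least squares with Huber weights used here is an assumed stand-in; the function name fit_noise_model and the parameters iterations and huber_k are likewise hypothetical.

```python
import numpy as np

def fit_noise_model(mean_i, sigma, iterations=20, huber_k=1.345):
    """Fit sigma = A / I + B robustly and return (A, B).

    Assumption: Huber-weighted IRLS is used as the robust estimator; the
    source does not specify which robust regression is applied.
    """
    x = 1.0 / np.asarray(mean_i, dtype=float).ravel()   # X = 1 / I
    y = np.asarray(sigma, dtype=float).ravel()          # Y = sigma
    design = np.column_stack([x, np.ones_like(x)])      # columns: slope A, intercept B
    weights = np.ones_like(y)
    for _ in range(iterations):
        sw = np.sqrt(weights)
        coef, *_ = np.linalg.lstsq(design * sw[:, None], y * sw, rcond=None)
        residual = y - design @ coef
        # Robust scale from the median absolute deviation of the residuals.
        mad = np.median(np.abs(residual - np.median(residual)))
        scale = max(1.4826 * mad, 1e-12)
        r = np.abs(residual) / scale
        # Huber weights: 1 for small residuals, k / r for large ones.
        weights = huber_k / np.maximum(r, huber_k)
    return float(coef[0]), float(coef[1])
```

Pixels whose (I, σ) pair lies far from the line, for example because of flying pixels at object edges, then have little influence on the estimated slope A and intercept B.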

By executing the above contents as a calibration process, the coefficient A and the coefficient B can be calculated. The coefficient A and the coefficient B are stored in the storage unit 19 as calibration data 21.
For example, in an arbitrary scene in which the ToF camera 11 is used, when it is desired to know the noise level of a specific pixel of the depth map 3, the noise level (σ) can be estimated by applying the equation (1) with the coefficient A, the coefficient B, and the luminance value I of the infrared light at that pixel.
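Applying the stored calibration data to a new infrared image is then a single evaluation of the regression function σ = A / I + B per pixel. In this sketch the function name estimate_noise_level and the lower clamp min_luminance, which avoids division by zero for pixels that return no light, are assumptions not described in the text.

```python
import numpy as np

def estimate_noise_level(luminance, coef_a, coef_b, min_luminance=1e-6):
    """Estimate the depth noise level sigma from infrared luminance I.

    luminance: scalar or array of infrared luminance values.
    min_luminance: hypothetical clamp so that I = 0 does not divide by zero.
    """
    i = np.maximum(np.asarray(luminance, dtype=float), min_luminance)
    return coef_a / i + coef_b
```

Because the function is vectorized, passing the whole infrared image 4 yields a noise-level map with the same shape as the depth map 3.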

[3D shape restoration processing]
In the three-dimensional shape restoration process, volume data 6 is generated for each of a plurality of depth maps 3 in which the object 1 is photographed from different viewpoints. Then, a 3D model 2 in which the shape of the object 1 is restored is generated from the integrated volume data in which a plurality of volume data 6 are integrated.
First, a method of calculating the distance parameter F that becomes the volume data 6 will be described.

[Volume data generation process]
FIG. 13 is a schematic diagram showing a calculation example of the distance parameter F. FIG. 13 schematically shows the arrangement relationship between the object 1 and the ToF camera 11 (mobile terminal 100) that captures the object 1. Further, in FIG. 13, the world coordinates 52 and the camera coordinates 53 are schematically illustrated. Here, the direction of the Z axis of the camera coordinates 53 is the direction along the optical axis of the ToF camera 11, and is the detection direction (depth direction) in which the depth value is detected.

In the Volumetric TSDF, the distance to the target site P where the depth value is detected by the ToF camera 11 is stored as plus / minus signed data (distance parameter F) for each voxel 7. The distance parameter F (v) stored in the voxel 7 at the coordinate v is defined by the following equation (2).

Here, D (v) is a distance value between the voxel 7 and the target portion P in the detection direction (depth direction) of the depth value.
v' is a four-vector representing the coordinates of the voxel 7 expanded so that the shooting parameters can be applied.
M is an external parameter of the ToF camera 11 and is represented as a 4 × 3 real matrix.
K is an internal parameter of the ToF camera 11 and is represented as a 3 × 3 real matrix.
π (v) is a function that transforms the three-dimensional coordinate v into a two-dimensional pixel by perspective projection. For example, the three-dimensional coordinates v (x, y, z) are converted into two-dimensional coordinates (x', y') = (x / z, y / z).
d (x) is the depth value of the point x in the image (depth map 3) of the ToF camera 11. That is, d (x) is a value on the Z axis in the camera coordinates of the target portion P corresponding to the point x.
[v]_z is the value of the Z axis at the camera coordinates 53 of the coordinate v of the voxel 7.

As shown in FIG. 13, the value of the Z axis at the camera coordinates 53 of the voxel 7 at the coordinate v is expressed as [Mv']_z. Further, the value on the Z axis at the camera coordinates 53 of the target portion P, that is, the depth value of the target portion P is expressed as d (π (KMv')). Here, KMv' is the coordinates of the target portion P converted into the camera coordinates 53, and π (KMv') is the two-dimensional coordinates of the pixel in which the target portion P is detected.
As described above, D (v) is represented as the difference between the Z-axis value at the camera coordinate 53 of the coordinate v and the depth value of the target portion P. That is, it can be said that D (v) represents the actual distance between the voxel 7 and the target site P along the detection direction of the depth value.

As shown in equation (2), when the absolute value of D (v) is smaller than the threshold value μ, the value obtained by dividing D (v) by the threshold value μ is the distance parameter F (v). The threshold value μ is the distance beyond which the TSDF value is no longer calculated. Therefore, the distance parameter F (v) is a value obtained by normalizing D (v) with the threshold value μ so as to satisfy 0 ≦ |F (v)| < 1. FIG. 13 shows an arrow representing the distance parameter F (v) (or D (v)).
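The calculation described for equation (2) can be sketched for a single voxel as follows. Several points are assumptions rather than statements of the text: the sign convention D (v) = [Mv']_z - d (π (KMv')), the storage of M as an array with 3 rows and 4 columns so that Mv' is a 3-vector, nearest-pixel lookup in the depth map, and the use of None for the null value. The function name distance_parameter is hypothetical.

```python
import numpy as np

def distance_parameter(v, extrinsic, intrinsic, depth_map, mu):
    """Return F(v) = D(v) / mu for one voxel, or None when no value is stored.

    v: voxel coordinate (x, y, z) in world coordinates.
    extrinsic: M, stored here with shape (3, 4) (assumed layout).
    intrinsic: K, shape (3, 3).
    depth_map: array (H, W) of depth values d(x) along the camera Z axis.
    mu: truncation threshold.
    """
    v_h = np.append(np.asarray(v, dtype=float), 1.0)   # v': homogeneous 4-vector
    cam = extrinsic @ v_h                               # M v' in camera coordinates
    z = cam[2]                                          # [M v']_z
    if z <= 0.0:
        return None                                     # behind the camera
    proj = intrinsic @ cam                              # K M v'
    col = int(round(proj[0] / proj[2]))                 # pi(K M v'): perspective division
    row = int(round(proj[1] / proj[2]))
    height, width = depth_map.shape
    if not (0 <= row < height and 0 <= col < width):
        return None                                     # outside the shooting range
    d = z - depth_map[row, col]                         # D(v); sign convention assumed
    if abs(d) >= mu:
        return None                                     # null value beyond the threshold
    return d / mu
```

With an identity camera pose, a flat depth map of 1.0, and μ = 0.4, a voxel at depth 1.1 receives F = 0.25, while a voxel at depth 2.0 receives the null value.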

FIG. 14 is a flowchart showing an example of the volume data 6 generation process. The process shown in FIG. 14 is a process executed by the volume data generation unit 33 shown in FIG. 3, for example, every time the depth map 3 is acquired. Alternatively, after a predetermined number of depth maps 3 have been acquired, the process shown in FIG. 14 may be executed for each depth map 3.
Hereinafter, the number (index) representing a voxel 7 included in the volume data 6 (TSDF volume) is denoted by i. Further, the coordinates of the voxel 7 are denoted by v. Here, the coordinate v of the voxel 7 is represented by, for example, a local coordinate system for modeling fixed in the space around the object 1.

First, the index i of the voxel 7 is initialized to i = 0 by the TSDF calculation unit 44 (step 301). Next, the coordinate v of the i-th voxel 7 is read (step 302). Next, the coordinate v of the voxel 7 is converted into the world coordinate 52 (step 303). Then, the coordinate v of the voxel 7 converted into the world coordinates is projected on the camera coordinates 53 (step 304). As a result, the position of the voxel 7 at the camera coordinates 53 is calculated, so that the position of the voxel 7 can be directly compared with the depth value.

Next, the distance value between the voxel 7 and the target portion P in the detection direction (depth direction) of the depth value, that is, D (v) is calculated (step 305). Then, it is determined whether or not the absolute value of D (v) is smaller than the threshold value μ (step 306).
For example, when it is determined that the absolute value of D (v) is not smaller than the threshold value μ (No in step 306), the distance parameter F is not calculated and a null value is stored in the voxel 7 (step 307).

Further, when it is determined that the absolute value of D (v) is smaller than the threshold value μ (Yes in step 306), the noise level of the depth value is calculated (step 308).
Specifically, the noise level calculation unit 45 reads the infrared image 4 taken from the same viewpoint as the depth map 3 to be processed. Further, the pixel position (π (KMv')) in the depth map 3 of the target portion P referred to when calculating D (v) is read. Then, the brightness value I of the infrared light at the pixel position of the target portion P is extracted from the infrared image 4. Finally, the noise level σ is calculated according to the equation (1) based on the extracted luminance value I.

As described above, in the present embodiment, the noise level calculation unit 45 calculates the noise level according to the equation (1), where σ is the noise level of the depth value of the target portion P, I is the brightness value of the infrared light reflected by the target portion P, A is the preset first coefficient, and B is the preset second coefficient.

When the noise level is calculated, the weight value W (v) with respect to the distance parameter F (v) is calculated (step 309).
In the present embodiment, the TSDF calculation unit 44 sets the weight value W (v) for each of the plurality of voxels 7 so that the distribution of the weight value W (v) becomes a normal distribution having a peak at the target site P. That is, the distribution of the weight value W (v) is a normal distribution centered on the target site P.

At this time, the width of the distribution of the weight value W (v) that peaks at the target portion P is adjusted according to the noise level σ. As described above, in the present embodiment, the noise level of the depth value is calculated based on the luminance value of the irradiation light reflected by the target portion P, and the width of the distribution of the weight value W (v) is set based on the noise level.
Specifically, the weight value W (v) with respect to the distance parameter F (v) is defined by the following equation (3).

As shown in equation (3), the weight value W (v) forms a mountain-shaped distribution, such as a normal distribution, whose maximum lies at the site where D (v) = 0 (the target site P). The width of this distribution is determined by the values of σ and δ.
Here, σ is the noise level of the depth value estimated in step 308 according to the equation (1), based on the coefficient A and the coefficient B acquired in the preceding calibration process.
δ is a bias value for adjusting the width of the distribution and functions as an adjustment coefficient.
In the present embodiment, the equation (3) corresponds to the equation for setting the weight value.

As described above, in the present embodiment, the TSDF calculation unit 44 sets the weight value W (v) according to the equation (3), where v is the position coordinates of the voxel 7, D (v) is the distance between the voxel 7 and the target portion P in the detection direction of the depth value, W (v) is the weight value of the voxel 7, σ is the noise level of the depth value of the target portion P, and δ is the preset adjustment coefficient.
The distribution of the weight value W (v) is not limited to the normal distribution, and any distribution having the maximum value at the target site P can be used.
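One possible weight function consistent with this description is sketched below. The exact form of equation (3) is not reproduced in the text, so the Gaussian shape with width σ + δ and peak value 1 is an assumption chosen only to illustrate how the width follows the noise level; the function names weight_value and half_peak_distance are hypothetical.

```python
import math

def weight_value(d, sigma, delta):
    """Weight W for a voxel at signed distance d = D(v) from the target site.

    Assumed form (not confirmed by the source): a Gaussian centred on D(v) = 0
    whose width is sigma + delta, with peak value 1.
    """
    width = sigma + delta
    return math.exp(-(d * d) / (2.0 * width * width))

def half_peak_distance(sigma, delta):
    """Distance from the target site at which the assumed weight falls to half its peak."""
    return (sigma + delta) * math.sqrt(2.0 * math.log(2.0))
```

Under this assumed form, a target site with a small σ gives a narrow distribution, so voxels a short distance away already receive a weight close to zero, while a target site with a large σ keeps a high weight over a wider range.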

FIG. 15 is a schematic diagram showing an example of setting a weight value for a distance parameter. FIG. 15 schematically shows the face (head) of the subject as seen from above as an example of the object 1. Further, for the target parts (P1 and P2) on the surface of the subject's face, the distribution of the weight value W (v) set based on the noise level σ of the depth value of each target part is schematically illustrated by the gray areas.

As described with reference to FIG. 4C, it is generally difficult for a hair portion such as black hair to reflect infrared light. Therefore, in the portion with hair and the like, the brightness of infrared light tends to be lower and the noise level σ of the depth value tends to be larger than that in the portion where the skin is exposed.
As shown in FIG. 15, since the target portion P2 existing in the portion with hair has a large value of σ (high noise level), the width of the distribution of the weight value W (v) calculated according to the equation (3) becomes wider. That is, the weight value W (v) is stored up to the voxel 7 far from the target portion P2.
As a result, it is possible to properly represent the position of the target portion where the fluctuation of the depth value is large.

On the other hand, it is conceivable that the noise level is low and the depth value is detected with high accuracy in the part where the skin is exposed. For example, as shown in FIG. 15, in the target portion P1 existing on the side surface of the nose, since the value of the noise level σ is small, the width of the distribution of the weight value W (v) becomes narrow. That is, the weight value W (v) is stored only in the voxel 7 which is a short distance from the target portion P1.
As a result, the range in which the distance parameter F (v) is valid, that is, the range in which the weight value W (v) is set is narrowed for the target portion where the fluctuation of the depth value is small and the depth value is reliable.

For example, when the volume data 6 (TSDF volume) is generated, the resistance to noise of the depth value can be improved by increasing the value of the threshold value μ. On the other hand, as described with reference to FIG. 6, if the value of the threshold value μ is large, there is a possibility that the back surface is affected.
In the present embodiment, the width of the distribution of the weight value W (v) is controlled by the above method, and the weight value W (v) can be adaptively stored for each target portion P. As a result, the weight value W (v) is set in a wide range in the portion where the noise of the depth value is large, and the width of the distribution of the weight value W (v) is narrowed in the portion where the noise of the depth value is small. This makes it possible to avoid unnecessary interference of the distance parameter F (v) while maintaining noise immunity.
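The effect of the noise level on the width of the distribution can be illustrated with a minimal sketch. The exact form of the equation (3) is not reproduced in this part of the text, so the Gaussian below, in which the width is taken as σ + δ, is an assumption made only for illustration; the function name `weight` and the numeric noise levels are likewise hypothetical.

```python
import math


def weight(d, sigma, delta=0.0):
    """Gaussian-shaped weight that peaks at the target portion (d = 0).

    ASSUMPTION: the width is sigma + delta. How the adjustment
    coefficient delta actually enters equation (3) is not shown here.
    """
    width = sigma + delta
    return math.exp(-(d * d) / (2.0 * width * width))


# Hypothetical noise levels, in the same length unit as d.
sigma_skin = 1.0  # bright reflection, e.g. P1 on the side of the nose
sigma_hair = 4.0  # dark reflection, e.g. P2 in the hair

for d in (0.0, 2.0, 4.0):
    # For the same distance d, the noisy portion keeps a larger weight,
    # so its weight values reach voxels farther from the surface.
    print(d, round(weight(d, sigma_skin), 4), round(weight(d, sigma_hair), 4))
```

Under this assumed form, a voxel at a given distance from P1 receives a much smaller weight than a voxel at the same distance from P2, which matches the narrow and wide gray areas described for FIG. 15.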

Returning to FIG. 14, when the weight value W (v) is calculated, the distance parameter F (v) is calculated (step 310). Here, the TSDF calculation unit 44 normalizes D (v) with the threshold value μ according to the equation (2) to calculate F (v).
Then, the weight value W (v) calculated in step 309 and the distance parameter F (v) calculated in step 310 are stored in the voxel 7 (step 311). For example, the calculation results of W (v) and F (v) are stored as variable values prepared for each voxel 7.
When each data is stored in voxels 7, it is determined whether or not there are unprocessed voxels 7 (step 312). If there is an unprocessed voxel 7 (Yes in step 312), the index i of the voxel 7 is incremented (step 313), and the processes after step 302 are executed again. If there is no unprocessed voxel 7 (No in step 312), it is assumed that the processing is completed for all voxels 7, and the volume data 6 generation processing is completed.
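Steps 309 to 313 can be sketched for the voxels lying along a single detection direction. This is a simplified illustration, not the implementation of the TSDF calculation unit 44: the function name `fill_voxels`, the sign convention of D (v), the use of NaN for voxels outside the threshold, and the Gaussian form of the weight (width σ + δ) are all assumptions.

```python
import numpy as np


def fill_voxels(voxel_depths, surface_depth, sigma, mu, delta=0.0):
    """Compute F(v) and W(v) for voxels along one detection direction.

    voxel_depths : depth of each voxel centre along the direction
    surface_depth: measured depth value of the target portion P
    Returns (F, W). Voxels farther than mu from P keep F = NaN, W = 0.
    """
    voxel_depths = np.asarray(voxel_depths, dtype=float)
    D = surface_depth - voxel_depths      # signed distance D(v) (assumed sign)
    inside = np.abs(D) <= mu              # only voxels within the threshold
    F = np.full(D.shape, np.nan)
    W = np.zeros(D.shape)
    F[inside] = D[inside] / mu            # normalisation by mu, as in equation (2)
    width = sigma + delta                 # ASSUMED form of the distribution width
    W[inside] = np.exp(-D[inside] ** 2 / (2.0 * width ** 2))
    return F, W


F, W = fill_voxels([0.9, 1.0, 1.1, 2.0], surface_depth=1.0, sigma=0.05, mu=0.2)
print(F)
print(W)
```

In an actual volume the same computation would be repeated for every voxel index i, which corresponds to the loop closed by steps 312 and 313.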

[Volume data integration processing]
Here, a method of reconstructing the shape of the object 1 by integrating a plurality of volume data 6 having a Volumetric TSDF data structure will be described.
For example, the process shown in FIG. 14 is executed for each depth map 3 taken from a plurality of viewpoints, and a plurality of volume data 6 are generated. Then, these volume data 6 are integrated using the above-mentioned weight values. It can be said that the process of integrating the volume data 6 in this way is the process of integrating the depth map 3.

The distance parameter F (v) and the weight value W (v) set for each voxel 7 are stored in each voxel 7 of the volume data 6. For example, when the data generated from the i-th depth map 3 and the data generated from the (i + 1)-th depth map 3 are integrated, the integrated distance parameter F (v) and weight value W (v) can be calculated according to the following equation (4).

Here, the subscripts (i and i + 1) are indexes representing the numbers of the corresponding depth maps 3 (volume data 6).

For example, when the i-th volume data 6 is generated, the voxel integration unit 46 integrates the i-th volume data 6, according to the equation (4), into the volume data that has been integrated up to that point (the data obtained by integrating the first to (i-1)-th volume data 6). In this way, the volume data 6 is sequentially integrated to generate the final integrated volume data. Then, the mesh extraction unit 47 generates model data 22 of the 3D model 2 in which the shape of the object 1 is restored from the integrated volume data.
As described above, in the present embodiment, the voxel integration unit 46 integrates the depth maps 3 corresponding to the plurality of voxels 7 based on the weighted sum of the distance parameters F (v) of each of the plurality of voxels 7, using the weight values W (v) as the weights. That is, the depth maps 3 (volume data 6) taken from different viewpoints are integrated using the equation (4). This makes it possible to realize high-quality three-dimensional shape restoration that is resistant to noise.
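The sequential integration can be sketched as a running weighted average. The equation (4) itself is not reproduced in this part of the text, so the form below (weighted mean of the distance parameters, sum of the weight values), which is the form commonly used in volumetric TSDF fusion, is an assumption; the function name `integrate` is hypothetical.

```python
import numpy as np


def integrate(F_acc, W_acc, F_new, W_new):
    """Fuse one new volume into the accumulated volume, voxel by voxel.

    ASSUMED form of equation (4):
        F = (W_acc * F_acc + W_new * F_new) / (W_acc + W_new)
        W = W_acc + W_new
    """
    F_acc, W_acc, F_new, W_new = (
        np.asarray(a, dtype=float) for a in (F_acc, W_acc, F_new, W_new)
    )
    # Ignore distance parameters that carry no weight (e.g. NaN placeholders).
    F_acc = np.where(W_acc > 0, F_acc, 0.0)
    F_new = np.where(W_new > 0, F_new, 0.0)
    W_out = W_acc + W_new
    num = W_acc * F_acc + W_new * F_new
    # Voxels with zero total weight keep F = 0 (not yet observed).
    F_out = np.divide(num, W_out, out=np.zeros_like(num), where=W_out > 0)
    return F_out, W_out


F, W = integrate([0.2, 0.0], [1.0, 0.0], [0.4, np.nan], [3.0, 0.0])
print(F, W)
```

Because a narrow distribution gives near-zero weight to voxels away from a reliable surface, such voxels contribute little to this average, which is how interference between viewpoints is suppressed.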

FIG. 16 is a diagram showing an example of generating a 3D model. FIG. 16A shows a 3D model 2 of a person's face generated by adjusting the width of the distribution of the weight value W (v) by applying the present technique. Further, FIG. 16B shows a 3D model 2 of a person's face generated without adjusting the width of the distribution of the weight value W (v).

For example, in the example shown in FIG. 16B, since the width of the distribution of the weight value W (v) is not adjusted, the distance parameters F (v) calculated from different viewpoints interfere with each other. As a result, a 3D model 2 deformed so that the subject's nose is swollen is generated, and an incorrect shape is restored.
On the other hand, as shown in FIG. 16A, when the width of the distribution of the weight value W (v) is adjusted, the weight value W (v) of the distance parameter F (v) is set low on, for example, the side surface of the subject's nose. This avoids a situation in which the distance parameters F (v) interfere with each other on the side surface of the subject's nose. As a result, as shown in FIG. 16A, the shape of the subject's nose can be restored with high accuracy, and a highly accurate 3D model 2 that accurately reproduces the shape of the object 1 can be easily generated.

[Setting weight value using line-of-sight vector]
In the above, the method of setting the weight value W (v) has been described mainly with reference to the reflection characteristic (luminance value of infrared light) of the target portion P.
As the weight value W (v) of the TSDF, the reliability of the depth value according to the shape of the surface of the object 1 may also be used. Specifically, the accuracy can be further improved by using, in addition to the noise level of the depth map 3, the angle formed by the orientation of the surface of the object 1 and the shooting direction of the ToF camera 11.

FIG. 17 is a schematic diagram showing the relationship between the normal vector and the line-of-sight vector. In FIG. 17, the normal vector n on the surface of the object 1 in the target portion P and the line-of-sight vector r of the ToF camera 11 directed to the target portion P are schematically shown.
The normal vector n is a unit vector orthogonal to the surface of the object 1 at the target portion P. For example, the surface shape of the object 1 is easily estimated using the depth map 3. Based on the estimation result of the surface shape, the normal vector n at the target portion P is calculated.
The line-of-sight vector r is a unit vector indicating the observation direction when the target portion P is viewed from the ToF camera 11. For example, the direction along the line connecting the current position of the ToF camera 11 and the target portion P is the line-of-sight vector r.
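As a minimal sketch of how these two unit vectors might be obtained, the code below estimates the normal vector n from the target portion and two neighboring 3D points (for example, back-projected neighboring pixels of the depth map 3) and computes the line-of-sight vector r from the camera position. The function names and the three-point normal estimate are assumptions for illustration; the text does not specify how the surface shape is estimated.

```python
import numpy as np


def unit(v):
    """Return v scaled to unit length."""
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v)


def line_of_sight(camera_pos, p):
    """Unit vector r pointing from the camera position to the target portion."""
    return unit(np.asarray(p, dtype=float) - np.asarray(camera_pos, dtype=float))


def surface_normal(p, p_right, p_down, camera_pos):
    """Unit normal n at p, estimated from two neighbouring surface points.

    The normal is flipped if necessary so that it points toward the camera.
    """
    p = np.asarray(p, dtype=float)
    n = unit(np.cross(np.asarray(p_right, dtype=float) - p,
                      np.asarray(p_down, dtype=float) - p))
    if np.dot(n, line_of_sight(camera_pos, p)) > 0:
        n = -n
    return n


cam = [0.0, 0.0, 0.0]
p = [0.0, 0.0, 1.0]
r = line_of_sight(cam, p)
n = surface_normal(p, [0.01, 0.0, 1.0], [0.0, 0.01, 1.0], cam)
print(r, n, -np.dot(r, n))
```

With this orientation convention, n and r point in roughly opposite directions when the camera faces the surface, so the quantity −r · n is positive.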

Here, a method of setting the weight value W (v) based on the normal vector n and the line-of-sight vector r will be described.
In the mobile terminal 100, as described above, the depth value is calculated based on the output of the ToF camera 11 that detects the infrared light reflected by the target portion P. At this time, the infrared light is incident on the target portion P along the line-of-sight vector r, and the reflected light is detected. Therefore, for example, when the surface of the object 1 is nearly parallel to the direction in which the infrared light is detected (the line-of-sight vector r of the ToF camera 11), the accuracy of the depth value may decrease.
Therefore, in the present embodiment, the TSDF calculation unit 44 adjusts the weight value W (v) according to the angle between the line-of-sight vector r from the ToF camera 11 toward the target portion P and the surface of the target portion P.

For example, the accuracy of the depth value is considered to be highest when the ToF camera 11 directly faces the surface of the object 1 along the normal vector n, and to decrease as the line-of-sight vector r of the ToF camera 11 becomes more oblique to the surface. By reflecting this characteristic in the TSDF value (distance parameter F (v)), it can be expected that the accuracy of the 3D model 2 will be improved.
Specifically, the weight value W (v) with respect to the distance parameter F (v) is calculated according to the following equation (5).

Here, n and r are the above-mentioned normal vector n and line-of-sight vector r, respectively.

The equation (5) is an equation obtained by multiplying the above equation (3) by a term (−r · n) representing the inner product of the normal vector n and the line-of-sight vector r.
For example, when the angle between the normal vector n and the line-of-sight vector r is θ, the term representing the inner product of the equation (5) is represented by cos θ between the normal vector n and the line-of-sight vector r. Therefore, as the line-of-sight vector r becomes perpendicular to the surface, θ approaches 0, and the value of the inner product term approaches 1. On the contrary, as the line-of-sight vector r becomes parallel to the plane, θ approaches 90 °, and the value of the inner product term approaches 0.

As described above, in the present embodiment, the weight value W (v) with respect to the distance parameter F (v) is set so as to be proportional to the inner product of the normal vector n and the line-of-sight vector r on the surface of the target portion P. This makes it possible to set W (v) so that, for example, when the same location is observed from multiple viewpoints, the information from the viewpoint that is expected to give a highly accurate depth value is actively used. Therefore, it is possible to restore the shape with higher accuracy.
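The angle-dependent weighting can be sketched as follows. The term (−r · n) is taken from the description of the equation (5); the Gaussian base weight with width σ + δ is an assumed form of the equation (3), and the clamping of negative values to 0 is a safeguard added here for surfaces facing away from the camera, not something stated in the text. The function name `angle_weight` is hypothetical.

```python
import math

import numpy as np


def angle_weight(d, sigma, r, n, delta=0.0):
    """Weight of equation (3) scaled by the inner-product term (-r . n).

    r, n: unit line-of-sight and normal vectors.
    ASSUMPTIONS: Gaussian base weight with width sigma + delta, and
    clamping of the inner-product term at 0.
    """
    width = sigma + delta
    base = math.exp(-(d * d) / (2.0 * width * width))
    facing = max(0.0, -float(np.dot(r, n)))  # cos(theta) for theta in [0, 90] deg
    return facing * base


r = np.array([0.0, 0.0, 1.0])
n_front = np.array([0.0, 0.0, -1.0])  # camera faces the surface, theta = 0
n_tilt = np.array([math.sin(math.radians(60)), 0.0, -math.cos(math.radians(60))])

print(angle_weight(0.0, 1.0, r, n_front))  # full weight at the surface
print(angle_weight(0.0, 1.0, r, n_tilt))   # reduced weight for an oblique view
```

When the same voxel is observed from several viewpoints, the frontal observation therefore dominates the weighted sum used for integration.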

As described above, in the controller 30 according to the present embodiment, the object 1 is irradiated with infrared light, and the depth value of the target portion P is acquired by the ToF method. Further, with respect to the distance parameter F (v) to the target portion P based on the depth value, the weight value W (v) with respect to the target portion is set. The width of the distribution of the weight value W (v) is set narrower as the luminance value of the infrared light reflected by the target portion P increases. As a result, for example, the position of the target portion P can be represented with appropriate accuracy, and the three-dimensional shape can be detected with high accuracy.

As a method of restoring the three-dimensional shape of a real object, for example, a method called Photogrammetry using a large number of cameras is known. This method requires a large number of cameras, which may increase the cost of modeling. In addition, since it is a large-scale shooting system, it is possible that the shooting itself cannot be performed when the subject cannot be moved.

In the present embodiment, an adaptive weight value W (v) is given to each voxel 7 of the Volumetric TSDF in consideration of the noise level of the depth value caused by the reflection characteristic of the object 1. At this time, the infrared image 4 that can be acquired at the same time as the depth map 3 is used to calculate the noise level. This makes it possible to restore the three-dimensional shape of a real object with high accuracy. In addition, since the device configuration is simple, it is possible to realize highly accurate 3D modeling and the like at low cost.

Further, by using this technology, it is possible to perform high-precision shape restoration using only the monocular ToF camera 11 like the mobile terminal 100. Therefore, by using this technology, it is possible to easily realize high-quality three-dimensional shape restoration in a smartphone or tablet terminal equipped with a ToF camera 11, for example. This makes it possible to provide users with an unprecedented new experience, such as recording the shapes of familiar objects and sharing them as 3D models.

In recent years, cameras mounted on devices such as smartphones have increasingly been provided with multiple lenses and depth sensors. For example, devices have been developed that realize a face recognition function by mounting a Structured Light depth sensor on the front, as well as devices that mount on the front a sensor combining a ToF sensor and an IR camera (ToF camera) and realize vein recognition and non-contact gesture recognition functions in addition to face recognition.
It is expected that such a trend will continue to accelerate in the future, and it is highly likely that ToF sensors will be installed in many devices. This technology can be applied to devices equipped with such a ToF sensor, and it is possible to provide a technology for three-dimensionally restoring the shape of an object with high quality and high accuracy, including the generation of a 3D model.

<Second embodiment>
The photographing system of the second embodiment according to the present technology will be described. In the following description, the description of the parts similar to the configuration and operation in the mobile terminal 100 described in the above embodiment will be omitted or simplified.

FIG. 18 is a schematic diagram showing a photographing system according to the second embodiment. The photographing system 200 has a plurality of ToF cameras 211.
Each ToF camera 211 is arranged so as to surround, for example, a predetermined photographing area 201. As the ToF camera 211, for example, the same device as the ToF camera 11 shown in FIG. 1 is used. The ToF camera 211 may be configured as a standalone photographing device, or ToF cameras 211 provided in information terminals such as smartphones may be connected to each other and used.

The object 1 is arranged in the photographing area 201 surrounded by the ToF cameras 211. That is, the photographing system 200 is a system that surrounds the object 1 with a plurality of ToF cameras 211 and photographs the object 1 from various directions.
In the photographing system 200, the depth value is calculated based on the outputs of a plurality of ToF cameras 211 provided so as to surround the object 1. For example, the object 1 is simultaneously photographed by a plurality of ToF cameras 211, and the depth value is calculated for each pixel based on the output of each camera. This makes it possible to generate a plurality of depth maps 3 at once.
Therefore, in the photographing system 200, even when the object 1 is moving, it is possible to accurately measure the shape at each moment.

Volume data 6 (distance parameter F (v) and weight value W (v)) is generated for each of these depth maps 3. At this time, shooting parameters such as the position and posture of each ToF camera 211 are appropriately referred to. The shooting parameters are acquired, for example, by pre-calibration.
When setting the weight value W (v), for example, the width of the distribution of the weight value W (v) is set according to the noise level calculated from the infrared image 4. By using the volume data 6 in which the weight value W (v) is adaptively set in this way, it is possible to realize highly accurate shape restoration.
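The way the shooting parameters of each ToF camera 211 might be referred to can be sketched as follows: a voxel position in the common coordinate system is transformed into the coordinate system of one camera, projected onto its depth map, and compared with the measured depth. This is an illustrative pinhole-camera sketch, not the procedure of the present embodiment; the function name `signed_distance`, the rotation R, translation t, and intrinsic matrix K, and the assumption that the depth map stores depth along the optical axis are all introduced here for illustration.

```python
import numpy as np


def signed_distance(voxel_world, R, t, K, depth_map):
    """D(v) for one voxel as seen by one calibrated camera.

    R, t      : world-to-camera rotation (3x3) and translation (3,)
    K         : 3x3 pinhole intrinsic matrix
    depth_map : 2D array of depth values along the optical axis (assumed)
    Returns None when the voxel is behind the camera or outside the image.
    """
    x, y, z = R @ np.asarray(voxel_world, dtype=float) + t  # world -> camera
    if z <= 0:
        return None
    u = int(round(K[0, 0] * x / z + K[0, 2]))  # pixel column
    v = int(round(K[1, 1] * y / z + K[1, 2]))  # pixel row
    h, w = depth_map.shape
    if not (0 <= u < w and 0 <= v < h):
        return None
    return float(depth_map[v, u] - z)


K = np.array([[100.0, 0.0, 2.0], [0.0, 100.0, 2.0], [0.0, 0.0, 1.0]])
depth = np.full((5, 5), 1.5)
print(signed_distance([0.0, 0.0, 1.0], np.eye(3), np.zeros(3), K, depth))
```

Repeating this for each camera with its own R, t, and depth map yields one volume per viewpoint in a shared voxel grid, which can then be integrated using the weight values.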

<Third embodiment>
FIG. 19 is a schematic diagram showing a photographing system according to the third embodiment. The photographing system 300 has a ToF camera 311 and a rotary stage 312.
The ToF camera 311 is arranged so that an object on the rotary stage 312 can be photographed. As the ToF camera 311, for example, the same device as the ToF camera 11 shown in FIG. 1 is used.
The rotary stage 312 is a pedestal that rotates about a predetermined axis. As the rotary stage 312, for example, a turntable or the like is used.

The object 1 is placed on the rotary stage 312. That is, in the photographing system 300, it can be said that the object 1 is rotatably arranged.
Then, with the object 1 rotated, the object 1 is photographed at a predetermined frame rate by the ToF camera 311, and the depth value is calculated for each pixel. As a result, a plurality of depth maps 3 in which the object 1 is viewed from different directions are generated.
As described above, in the photographing system 300, the depth value is calculated based on the output of the ToF camera 311 that photographs the rotating object 1.

For example, as shown in FIG. 19, when the object 1 is on the rotary stage 312 and the motion of the object 1 itself is limited to rigid body motion, the positional relationship between the object 1 and the ToF camera 311 can be estimated. For example, the area of the object 1 is segmented by image recognition or the like. Then, using a method such as Visual SLAM, the positional relationship between the object 1 and the ToF camera 311 in each frame is estimated.
Based on this positional relationship, volume data 6 (distance parameter F (v) and weight value W (v)) is generated for each depth map 3 acquired in each frame, and the 3D model 2 of the object 1 is generated.

In addition to the case where the object 1 rotates as shown in FIG. 19, a stage or the like that translates the object 1 may be used. Even in this case, depth maps 3 obtained by photographing the object 1 from different positions can be obtained.
Further, when the motion of the object 1 is non-rigid body motion, shape restoration can be realized by combining the present technology with, for example, Warp-Field estimation that parametrically expresses the non-rigid body motion in each video frame.

<Other embodiments>
The present technology is not limited to the embodiments described above, and various other embodiments can be realized.

In the above, a ToF camera capable of acquiring an infrared image together with a depth value has been described as an example. Not limited to this, when an infrared image cannot be acquired, it is possible to estimate the noise level of the depth value by using an image of another camera or the like.
For example, when a color camera (outward-facing camera, etc.) is provided at a position adjacent to the ToF sensor, the image taken by the color camera can be used as a substitute for the brightness value of infrared light.

For example, when the irradiation light used by the ToF sensor is infrared light, the noise level is estimated by using, in place of the brightness value of the infrared light, the red pixel value (red channel) in the output of the color camera, which has the wavelength closest to that of infrared light. In this case, in the calibration process, a regression function (regression coefficient) showing the relationship between the red pixel value and the standard deviation of the depth value is calculated.
Since the optical axes of the ToF sensor and the color camera are different, the two are geometrically calibrated in advance, and their internal parameters and the information on the positional relationship between them are calculated beforehand and used.
This makes it possible to realize highly accurate three-dimensional shape restoration processing even when a ToF camera is not provided.
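The calibration step for the red pixel value can be sketched as a per-pixel regression. The actual regression function is not given in this part of the text, so the model σ = A / I + B used below is purely an assumption chosen so that a brighter pixel yields a lower noise level; the function names `fit_noise_model` and `noise_level` are hypothetical, and the frames are assumed to be already aligned between the color camera and the ToF sensor.

```python
import numpy as np


def fit_noise_model(red_frames, depth_frames):
    """Fit sigma = A / I + B from repeated captures of a static test object.

    red_frames, depth_frames: arrays of shape (num_frames, H, W), assumed
    to be pixel-aligned. The model form is an ASSUMPTION for illustration.
    """
    mean_red = np.mean(red_frames, axis=0).ravel()    # per-pixel mean red value
    std_depth = np.std(depth_frames, axis=0).ravel()  # per-pixel depth noise
    valid = mean_red > 0
    # Linear least squares in x = 1 / I gives slope A and intercept B.
    A, B = np.polyfit(1.0 / mean_red[valid], std_depth[valid], 1)
    return A, B


def noise_level(red_value, A, B):
    """Estimated noise level of the depth value for a given red pixel value."""
    return A / red_value + B


# Synthetic two-frame example whose per-pixel depth deviation follows the model.
red = np.array([[10.0, 20.0], [40.0, 80.0]])
s = 2.0 / red + 0.01
A, B = fit_noise_model(np.stack([red, red]), np.stack([1.0 + s, 1.0 - s]))
print(A, B)
```

The fitted coefficients can then be used in place of those obtained from the infrared image when setting the width of the distribution of the weight value.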

In the above, the shape restoration that mainly generates a 3D model of the object has been described. The present technology may be applied to any application using the depth value without being limited to this. For example, in face recognition and non-contact gesture recognition, processing for recognizing the shape of a face or hand in real space is performed. At this time, the width of the distribution of the weight values for combining the plurality of depth maps is appropriately set according to the brightness value of the irradiation light. This enables highly accurate recognition processing.
In addition, the applications to which this technology can be applied are not limited.

In the above, the case where the information processing method according to the present technology is executed by a computer such as a mobile terminal operated by the user has been described. However, the information processing method and the program according to the present technology may be executed by a computer operated by the user and another computer capable of communicating via a network or the like.

That is, the information processing method and program according to the present technology can be executed not only in a computer system composed of a single computer but also in a computer system in which a plurality of computers operate in conjunction with each other. In the present disclosure, the system means a set of a plurality of components (devices, modules (parts), etc.), and it does not matter whether or not all the components are in the same housing. Therefore, a plurality of devices housed in separate housings and connected via a network, and one device in which a plurality of modules are housed in one housing are both systems.

Execution of the information processing method and program according to the present technology by a computer system includes both the case where, for example, the acquisition of the depth value and the setting of the weight value referenced to the target portion for the distance parameter to the target portion based on the depth value are performed by a single computer, and the case where each process is performed by a different computer. Further, the execution of each process by a predetermined computer includes having another computer execute a part or all of the process and acquiring the result.

That is, the information processing method and program related to this technology can be applied to the configuration of cloud computing in which one function is shared by a plurality of devices via a network and processed jointly.

It is also possible to combine at least two feature parts among the feature parts related to the present technology described above. That is, the various characteristic portions described in each embodiment may be arbitrarily combined without distinction between the respective embodiments. Further, the various effects described above are merely exemplary and not limited, and other effects may be exhibited.

In the present disclosure, "same", "equal", "orthogonal", etc. are concepts including "substantially the same", "substantially equal", "substantially orthogonal", and the like. For example, a state included in a predetermined range (for example, a range of ± 10%) based on "exactly the same", "exactly equal", "exactly orthogonal", etc. is also included.

In addition, this technology can also adopt the following configurations.
(1) An information processing device including:
an acquisition unit that acquires, by the ToF method, the depth value of a target portion of a real object irradiated with irradiation light; and
a setting unit that sets, for the distance parameter to the target portion based on the depth value, the width of the distribution of the weight value referenced to the target portion to be narrower as the brightness value of the irradiation light reflected by the target portion becomes larger.
(2) The information processing device according to (1), in which
the acquisition unit acquires a depth map to which the depth value is mapped, and
the information processing device further includes a model generation unit that generates a 3D model of the real object based on the depth map.
(3) The information processing device according to (2), in which
the model generation unit calculates, based on the depth value, the distance parameter representing the distance between the voxel and the target portion for each of a plurality of voxels that divide the space including the real object, and generates the 3D model based on the distance parameter, and
the setting unit sets the weight value for each of the plurality of voxels.
(4) The information processing device according to (3), in which
the model generation unit integrates each depth map corresponding to the plurality of voxels based on the weighted sum of the distance parameters of each of the plurality of voxels using the weight values.
(5) The information processing device according to (3) or (4), in which
the setting unit sets the weight values for each of the plurality of voxels so that the distribution of the weight values becomes a normal distribution that peaks at the target portion.
(6) The information processing device according to any one of (3) to (5), in which
the irradiation light is infrared light, and
the setting unit calculates the noise level according to the following equation (1), where σ is the noise level of the depth value of the target portion, I is the brightness value of the infrared light reflected by the target portion, A is a preset first coefficient, and B is a preset second coefficient.

(7) The information processing device according to (6), in which
the acquisition unit acquires a plurality of depth maps obtained by measuring a test object from the same position and a plurality of infrared images corresponding to the plurality of depth maps, and
the information processing device further includes a calibration processing unit that calculates the first coefficient (A) and the second coefficient (B) based on the plurality of depth maps and the plurality of infrared images.
(8) The information processing device according to (7), in which
the calibration processing unit calculates the average value of the luminance value for each pixel of the plurality of infrared images, calculates the standard deviation of the depth value for each pixel of the plurality of depth maps as the noise level of the depth value, and calculates the first coefficient (A) and the second coefficient (B) by fitting the equation (1) to the distribution of the average value of the luminance value and the standard deviation of the depth value.
(9) The information processing device according to any one of (3) to (8), in which
the distance parameter is a distance obtained by normalizing, with a threshold value, the distance between the voxel and the target portion in the detection direction of the depth value, and
the model generation unit calculates the distance parameter for each voxel in which the distance between the voxel and the target portion in the detection direction of the depth value is equal to or less than the threshold value.
(10) The information processing device according to (9), in which
the setting unit sets the weight value according to the following equation (2), where v is the position coordinates of the voxel, D (v) is the distance between the voxel and the target portion in the detection direction of the depth value, W (v) is the weight value set in the voxel, σ is the noise level of the depth value of the target portion, and δ is a preset adjustment coefficient.

(11) The information processing device according to any one of (1) to (10), in which
the acquisition unit calculates the depth value based on the output of a ToF sensor that detects the irradiation light reflected by the target portion, and
the setting unit adjusts the weight value according to the angle between the detection vector from the ToF sensor toward the target portion and the surface of the target portion.
(12) The information processing device according to (11), in which
the setting unit sets the weight value in proportion to the inner product of the normal vector on the surface of the target portion and the detection vector.
(13) The information processing device according to any one of (1) to (12), in which
the setting unit calculates the noise level of the depth value based on the brightness value of the irradiation light reflected by the target portion, and sets the width of the distribution of the weight value based on the noise level.
(14) The information processing device according to any one of (1) to (13), in which
the acquisition unit calculates the depth value based on the output of a ToF sensor provided in a device carried by a user.
(15) The information processing device according to any one of (1) to (13), in which
the acquisition unit calculates the depth value based on the outputs of a plurality of ToF sensors provided so as to surround the real object.
(16) The information processing device according to any one of (1) to (13), in which
the real object is rotatably arranged, and
the acquisition unit calculates the depth value based on the output of a ToF sensor that captures the rotating real object.
(17) The information processing device according to any one of (1) to (16), in which
the real object is a human face.
(18) An information processing method executed by a computer system, the method including:
acquiring, by the ToF method, the depth value of a target portion of a real object irradiated with irradiation light; and
setting, for the distance parameter to the target portion based on the depth value, the width of the distribution of the weight value referenced to the target portion to be narrower as the luminance value of the irradiation light reflected by the target portion becomes larger.
(19) A computer-readable recording medium on which is recorded a program that causes a computer system to execute:
a step of acquiring, by the ToF method, the depth value of a target portion of a real object irradiated with irradiation light; and
a step of setting, for the distance parameter to the target portion based on the depth value, the width of the distribution of the weight value referenced to the target portion to be narrower as the brightness value of the irradiation light reflected by the target portion becomes larger.

1 ... Object
2 ... 3D model
3 ... Depth map
4 ... Infrared image
5 ... Infrared light
7 ... Voxel
11, 211, 311 ... ToF camera
19 ... Storage unit
20 ... Control program
30 ... Controller
31 ... Data acquisition unit
32 ... Calibration processing unit
33 ... Volume data generation unit
34 ... Model data generation unit
100 ... Mobile terminal
200, 300 ... Shooting system

Claims

An acquisition unit that acquires the depth value of the target part of the real object irradiated with the irradiation light by the ToF method, and the acquisition unit.
The larger the brightness value of the irradiation light reflected by the target portion, the narrower the width of the distribution of the weight value with respect to the target portion with respect to the distance parameter to the target portion based on the depth value. An information processing device equipped with.
The information processing apparatus according to claim 1.
The acquisition unit acquires a depth map to which the depth value is mapped, and obtains the depth map.
Further, an information processing apparatus including a model generation unit that generates a 3D model of the real object based on the depth map.
The information processing device according to claim 2, wherein the model generation unit calculates, based on the depth value, the distance parameter representing the distance between the voxel and the target site for each of a plurality of voxels that divide a space including the real object, and generates the 3D model based on the distance parameter, and the setting unit sets the weight value for each of the plurality of voxels.
The information processing device according to claim 3, wherein the model generation unit integrates the depth maps for the plurality of voxels based on a weighted sum, using the weight values, of the distance parameters of each of the plurality of voxels.
The information processing device according to claim 3, wherein the setting unit sets the weight value for each of the plurality of voxels so that the distribution of the weight values is a normal distribution that peaks at the target site.
The information processing device according to claim 3, wherein the irradiation light is infrared light, and the setting unit calculates the noise level according to equation (1), where σ is the noise level of the depth value of the target site, I is the luminance value of the infrared light reflected by the target site, A is a preset first coefficient, and B is a preset second coefficient.
The information processing device according to claim 6, wherein the acquisition unit acquires a plurality of depth maps obtained by measuring a test object from the same position and a plurality of infrared images corresponding to the plurality of depth maps, the device further comprising a calibration processing unit that calculates the first coefficient (A) and the second coefficient (B) based on the plurality of depth maps and the plurality of infrared images.
The information processing device according to claim 7, wherein the calibration processing unit calculates, for each pixel of the plurality of infrared images, the average of the luminance values, calculates, for each pixel of the plurality of depth maps, the standard deviation of the depth values as the noise level of the depth value, and calculates the first coefficient (A) and the second coefficient (B) by fitting equation (1) to the distribution of the average luminance values and the standard deviations of the depth values.
The information processing device according to claim 3, wherein the distance parameter is the distance between the voxel and the target site in the detection direction of the depth value, normalized by a threshold value, and the model generation unit calculates the distance parameter for each voxel whose distance to the target site in the detection direction of the depth value is equal to or less than the threshold value.
The information processing device according to claim 9, wherein the setting unit sets the weight value according to equation (2), where v is the position coordinates of the voxel, D(v) is the distance between the voxel and the target site in the detection direction of the depth value, W(v) is the weight value set for the voxel, σ is the noise level of the depth value of the target site, and δ is a preset adjustment coefficient.
The information processing device according to claim 1, wherein the acquisition unit calculates the depth value based on the output of a ToF sensor that detects the irradiation light reflected by the target site, and the setting unit adjusts the weight value according to the angle between a detection vector directed from the ToF sensor toward the target site and the surface of the target site.
The information processing device according to claim 11, wherein the setting unit sets the weight value in proportion to the inner product of the detection vector and the normal vector of the surface of the target site.
The information processing device according to claim 1, wherein the setting unit calculates the noise level of the depth value based on the luminance value of the irradiation light reflected by the target site, and sets the width of the distribution of the weight values based on the noise level.
The information processing device according to claim 1, wherein the acquisition unit calculates the depth value based on the output of a ToF sensor provided in a device carried by a user.
The information processing device according to claim 1, wherein the acquisition unit calculates the depth value based on the outputs of a plurality of ToF sensors arranged so as to surround the real object.
The information processing device according to claim 1, wherein the real object is rotatably arranged, and the acquisition unit calculates the depth value based on the output of a ToF sensor that captures the rotating real object.
The information processing device according to claim 1, wherein the real object is a human face.
An information processing method executed by a computer system, comprising: acquiring, by the ToF method, the depth value of a target site of a real object irradiated with irradiation light; and setting the width of the distribution of weight values referenced to the target site, defined over the distance parameter to the target site based on the depth value, to be narrower as the luminance value of the irradiation light reflected by the target site increases.
A computer-readable recording medium on which is recorded a program that causes a computer system to execute: a step of acquiring, by the ToF method, the depth value of a target site of a real object irradiated with irradiation light; and a step of setting the width of the distribution of weight values referenced to the target site, defined over the distance parameter to the target site based on the depth value, to be narrower as the luminance value of the irradiation light reflected by the target site increases.
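For illustration only, the calibration recited above (per-pixel average luminance, per-pixel standard deviation of depth, then a fit to obtain the coefficients A and B) might be sketched in Python as follows. The linear-in-`1/sqrt(I)` model used for the fit is an assumption and is not the equation (1) of this document, whose form is not reproduced in this text. The function names and the flat-list image layout are likewise hypothetical, and the population standard deviation is chosen arbitrarily.

```python
import math
from statistics import mean, pstdev


def per_pixel_stats(depth_maps, ir_images):
    """Return (mean luminance, depth standard deviation) for each pixel.

    depth_maps, ir_images: lists of equally sized flat lists, one list per
    frame, all captured from the same position (hypothetical layout).
    """
    n_pixels = len(depth_maps[0])
    mean_lum = [mean([img[p] for img in ir_images]) for p in range(n_pixels)]
    depth_std = [pstdev([dm[p] for dm in depth_maps]) for p in range(n_pixels)]
    return mean_lum, depth_std


def fit_coefficients(mean_lum, depth_std):
    """Least-squares fit of the assumed model sigma = a * x + b, x = 1 / sqrt(I)."""
    xs = [1.0 / math.sqrt(i) for i in mean_lum]
    x_bar, y_bar = mean(xs), mean(depth_std)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, depth_std))
    a = sxy / sxx
    b = y_bar - a * x_bar
    return a, b


if __name__ == "__main__":
    # Synthetic statistics generated from a = 2.0, b = 0.5 under the assumed model.
    lum = [100.0, 400.0, 1600.0]
    std = [2.0 / math.sqrt(i) + 0.5 for i in lum]
    print(fit_coefficients(lum, std))
```

The fit requires pixels spanning at least two distinct luminance levels; otherwise the denominator `sxx` is zero and the coefficients are undetermined.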