WO2022123919A1

WO2022123919A1 - Information processing device, information processing method, and information processing program

Info

Publication number: WO2022123919A1
Application number: PCT/JP2021/038741
Authority: WO
Inventors: 健志後藤
Original assignee: ソニーグループ株式会社
Priority date: 2020-12-11
Filing date: 2021-10-20
Publication date: 2022-06-16
Also published as: CN116615748A; JPWO2022123919A1

Abstract

Provided are an information processing device, an information processing method, and an information processing program that can extract a region in an image or a depth map of a subject, even when colors are the same or similar. The information processing device comprises: a difference map generation unit that generates a difference map from a first depth map of a subject and a second depth map of the subject, said depth maps having been acquired using a ToF sensor; and a region extraction unit that extracts a region in the first depth map on the basis of the difference map.

Description

Information processing device, information processing method, and information processing program

This technology relates to information processing devices, information processing methods, and information processing programs.

Various techniques have been proposed to extract a specific area from an image of an object or a depth map. For example, by using a so-called RGB camera capable of capturing RGB (Red, Green, Blue) images, it is possible to extract regions having different colors.

Further, a technique called ToF (Time of Flight) can be used for extracting a region. ToF acquires distance information (depth information) for each pixel by measuring the time taken for pulsed light applied to an object to be reflected and return.

Regarding ToF, in order to acquire distance information (depth information) more accurately, a technique has been proposed that generates accurate distance information of the object by using ToF together with a stereo distance calculated by the stereo method from two images (Patent Document 1).

Patent Document 1: International Publication No. WO2017/159312

However, even when an RGB camera is used, regions that are different from each other but have the same or similar colors cannot be accurately extracted from an image, a depth map, or the like of the object. The same problem remains even if the accuracy of distance information generation by ToF is improved.

The present technology has been made in view of these points, and its purpose is to provide an information processing device, an information processing method, and an information processing program capable of extracting regions in an image or a depth map of an object even when their colors are the same or similar.

In order to solve the above-mentioned problems, the first technique is an information processing apparatus including a difference map generation unit that generates a difference map from a first depth map of an object acquired by a ToF sensor and a second depth map of the object, and a region extraction unit that extracts a region on the first depth map based on the difference map.

Further, the second technique is an information processing method that generates a difference map from a first depth map of an object acquired by a ToF sensor and a second depth map of the object, and extracts a region on the first depth map based on the difference map.

Further, the third technique is an information processing program that causes a computer to execute an information processing method that generates a difference map from a first depth map of an object acquired by a ToF sensor and a second depth map of the object, and extracts a region on the first depth map based on the difference map.

FIG. 1 is a block diagram showing the configuration of the information processing system 10 in the first embodiment.
FIG. 2 is a block diagram showing the configuration of the information processing apparatus 100 in the first embodiment.
FIG. 3 is a block diagram showing the configuration of the functional blocks of the information processing apparatus 100 in the first embodiment.
FIG. 4 is a flowchart showing the processing of the information processing apparatus 100 in the first embodiment.
FIG. 5 is an explanatory diagram of the detection of peaks in a histogram.
FIG. 6 is a diagram showing an example of the display image in the first embodiment.
FIG. 7 is a block diagram showing the configuration of the functional blocks of the information processing apparatus 200 in the second embodiment.
FIG. 8 is a diagram showing an example of the table for type identification.
FIG. 9 is a flowchart showing the processing of the information processing apparatus 200 in the second embodiment.
FIG. 10 is a diagram showing an example of a histogram.
FIG. 11 is a diagram showing an example of the display image in the second embodiment.
FIG. 12 is a block diagram showing the configuration of the information processing system 30 in the third embodiment.
FIG. 13 is a block diagram showing the configuration of the functional blocks of the information processing apparatus 300 in the third embodiment.
FIG. 14 is an explanatory diagram of three-dimensional shape data.
FIG. 15 is a flowchart showing the processing of the information processing apparatus 300 in the third embodiment.
FIG. 16 is an explanatory diagram of the effect of the third embodiment.

Hereinafter, embodiments of the present technology will be described with reference to the drawings. The explanation will be given in the following order.
<1. First Embodiment>
[1-1. Configuration of information processing system 10]
[1-2. Configuration of information processing device 100]
[1-3. Processing by information processing device 100]
<2. Second Embodiment>
[2-1. Configuration of information processing device 200]
[2-2. Processing by information processing device 200]
<3. Third Embodiment>
[3-1. Configuration of information processing system 30]
[3-2. Configuration of information processing device 300]
[3-3. Processing by information processing device 300]
<4. Modification example>

<1. First Embodiment>
[1-1. Configuration of information processing system 10]
The configuration of the information processing system 10 in the first embodiment of the present technology will be described with reference to FIG. 1. The information processing system 10 includes an information processing device 100, a ToF sensor 500, and a distance measuring sensor 600.

The information processing apparatus 100 performs region extraction processing based on a depth map of an object generated by the ToF sensor 500 and an image or depth map of the object generated by the distance measuring sensor 600.

The object is the target of the area extraction process by the information processing apparatus 100. It is either two or more things that can be distinguished from each other, or a single thing whose surface has a plurality of regions composed of different materials, substances, and the like (hereinafter referred to as "materials, etc.").

Examples of two or more distinguishable things are a "hand (skin) holding a spoon", consisting of a spoon and a hand (skin), and "food on a plate", consisting of a plate and food. Examples of a single thing whose surface has multiple regions made of different materials, etc. are a spoon consisting of a metal tip (bowl) and a wooden handle, and a cardboard box bearing characters, figures, decorations, or the like on its surface that are made of a material different from the cardboard.

Note that things with the same name but different materials, etc. are treated as "two or more things that can be distinguished". For example, a wooden spoon and a metal spoon share the same name but are two distinguishable things.

When two or more regions are extracted by the information processing apparatus 100, there may be two or more objects, each extracted as a region, or there may be a single object with two or more distinguishable regions on its surface. Further, there may be two or more objects, with two or more distinguishable regions on the surface of those objects.

The ToF sensor 500 is a sensor using ToF that acquires distance information (the first depth map) to the object to be processed by the information processing apparatus 100. There are two types of ToF, iToF (indirect Time of Flight) and dToF (direct Time of Flight), and the ToF sensor 500 may use either method. iToF obtains the depth from the phase difference of a periodic signal. dToF measures the time at which pulsed laser light is emitted from a light source and the time at which it returns, and obtains the depth from the difference.

In ToF, a phenomenon called multipath occurs. ToF measures the time it takes for light to be reflected by an object and return to the sensor. Depending on the object, however, the light is not totally reflected at the surface: part of it enters the object and is reflected repeatedly inside before returning to the ToF sensor 500. For such objects, it therefore takes longer for the ToF sensor 500 to receive the light, and the distance is detected as farther than it actually is. How the light is reflected varies depending on the material, etc. of the object.
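The two measurement principles, and the way a longer optical path inflates the measured distance, can be sketched as follows. This is a minimal illustration using the standard round-trip relations, not the sensor's actual processing; the function names and the example modulation frequency are assumptions introduced here.

```python
import math

C = 299_792_458.0  # speed of light in m/s


def dtof_depth(t_emit_s, t_return_s):
    """dToF: depth is half the distance travelled during the round trip."""
    return C * (t_return_s - t_emit_s) / 2.0


def itof_depth(phase_rad, mod_freq_hz):
    """iToF: depth from the phase shift of a periodic (modulated) signal."""
    return C * phase_rad / (4.0 * math.pi * mod_freq_hz)


# A direct surface reflection at 1 m, and the same surface when the light
# spends extra time inside the material before returning (multipath).
direct = dtof_depth(0.0, 2.0 / C)
delayed = dtof_depth(0.0, 2.2 / C)  # hypothetical 10 % longer optical path
print(direct, delayed)
print(itof_depth(math.pi / 2, 20e6))  # 20 MHz is an assumed modulation frequency
```

Because the delay depends on the material, the overestimate differs from material to material, which is the property the present technology exploits.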

The user needs to generate a first depth map for the object by using the ToF sensor 500 before performing the processing by the information processing apparatus 100.

The distance measuring sensor 600 is an RGB stereo camera composed of a first RGB camera 610 and a second RGB camera 620. The first RGB camera 610 is a camera capable of capturing RGB (Red, Green, Blue) images, and corresponds to the first imaging device in the claims. The first RGB image captured by the first RGB camera 610 corresponds to the first image in the claims.

The second RGB camera 620 is a camera capable of capturing RGB (Red, Green, Blue) images, and corresponds to the second imaging device in the claims. The second RGB image captured by the second RGB camera 620 corresponds to the second image in the claims. In the following description, the two cameras are referred to collectively as the distance measuring sensor 600 unless it is necessary to distinguish between the first RGB camera 610 and the second RGB camera 620.

The user needs to acquire the first RGB image and the second RGB image of the object by using the distance measuring sensor 600 before performing the processing by the information processing apparatus 100.

[1-2. Configuration of information processing device 100]
Next, the configuration of the information processing apparatus 100 will be described. The information processing apparatus 100 includes a control unit 150, a storage unit 160, an interface 170, and a display unit 180.

The control unit 150 is composed of a CPU (Central Processing Unit), a RAM (Random Access Memory), a ROM (Read Only Memory), and the like. The CPU executes various processes according to the program stored in the ROM and issues commands to control the entire information processing apparatus 100 and each part thereof.

The storage unit 160 is a large-capacity storage medium such as a hard disk or a flash memory. The storage unit 160 stores programs, calibration data, tables, and the like used in processing in the information processing apparatus 100.

The interface 170 is an interface for communicating with the ToF sensor 500 and the distance measuring sensor 600. The interface 170 may include a wired or wireless communication interface. More specifically, the wired or wireless communication interface may include cellular communication such as 3G/LTE, Wi-Fi, Bluetooth (registered trademark), NFC (Near Field Communication), Ethernet (registered trademark), HDMI (registered trademark) (High-Definition Multimedia Interface), USB (Universal Serial Bus), and the like. Further, when the information processing device 100, the ToF sensor 500, and the distance measuring sensor 600 are connected in terms of hardware, the interface 170 may include a connection terminal between the devices, a bus in the device, and the like. Also, if the information processing device 100 is distributed across a plurality of devices, the interface 170 may include different types of interfaces for each device. For example, the interface 170 may include both a communication interface and an in-device interface.

The display unit 180 is a display device composed of, for example, an LCD (Liquid Crystal Display), a PDP (Plasma Display Panel), an organic EL (Electro Luminescence) panel, or the like.

As shown in FIG. 3, the information processing apparatus 100 is configured to include functional blocks such as a depth map generation unit 101, a difference map generation unit 102, an area extraction unit 103, and an image processing unit 104. Each of these units is a function realized by the control unit 150. Further, data and information are transmitted and received between each unit and the ToF sensor 500 and the distance measuring sensor 600 by using the interface 170.

The depth map generation unit 101 generates a second depth map by performing triangulation through pattern matching or the like, using the first RGB image captured by the first RGB camera 610 and the second RGB image captured by the second RGB camera 620, which together constitute the distance measuring sensor 600.

The difference map generation unit 102 generates a difference map using the first depth map generated by the ToF sensor 500 and the second depth map generated by the depth map generation unit 101.

The area extraction unit 103 extracts an area on the first depth map using the first depth map and the difference map.

The image processing unit 104 generates a display image for showing the extraction result by the area extraction unit 103.

The information processing device 100 is configured as described above. The functional blocks in the information processing apparatus 100 may be realized by executing a program, and executing the program may give a personal computer, a tablet terminal, a smartphone, a server apparatus, or the like the functions of the information processing apparatus 100. The program may be installed in a device such as a personal computer in advance, or may be distributed by download, on a storage medium, or the like so that users can install it themselves.

[1-3. Processing by information processing device 100]
Next, processing by the information processing apparatus 100 will be described with reference to the flowchart of FIG. 4.

First, in step S101, the information processing apparatus 100 acquires the first depth map of the object from the ToF sensor 500.

Further, in step S102, the information processing apparatus 100 acquires the first RGB image and the second RGB image from the distance measuring sensor 600. Note that steps S101 and S102 do not necessarily have to be performed in this order; they may be performed in the reverse order or substantially at the same time.

Next, in step S103, the depth map generation unit 101 generates a second depth map from the first RGB image and the second RGB image by performing triangulation by pattern matching or the like using the first calibration data.

The first calibration data is data showing the relative positional relationship between the first RGB camera 610 and the second RGB camera 620. Since the first RGB camera 610 and the second RGB camera 620 have different viewpoints, it is necessary to match the viewpoints using the first calibration data in order to generate the second depth map.

Further, in order to perform pattern matching, it is necessary to consider the distortion of the first RGB camera 610 and the second RGB camera 620. The distortion correction data may be included in the first calibration data.
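As a rough illustration of the triangulation in step S103, the sketch below converts a disparity map into depth using the standard relation for a rectified stereo pair, Z = f * B / d. It assumes that distortion correction, rectification, and pattern matching have already produced the disparity map; the function name and parameters are hypothetical and are not taken from the publication.

```python
import numpy as np


def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Triangulate depth (m) from disparity (px) for a rectified stereo pair.

    focal_px and baseline_m stand in for values that would come from the
    first calibration data. Pixels with no valid match (disparity <= 0)
    are returned as NaN.
    """
    disparity = np.asarray(disparity_px, dtype=float)
    depth = np.full(disparity.shape, np.nan)
    valid = disparity > 0
    depth[valid] = focal_px * baseline_m / disparity[valid]
    return depth


# Example: 1000 px focal length, 10 cm baseline (assumed values).
second_depth = depth_from_disparity([[50.0, 25.0], [0.0, 100.0]], 1000.0, 0.1)
print(second_depth)
```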

The first calibration data may be stored in advance in the storage unit 160 or elsewhere in the information processing apparatus 100. Alternatively, the first calibration data may be stored on an external server or the like, and the information processing apparatus 100 may access the server via the interface 170 and read the data.

Next, in step S104, the difference map generation unit 102 generates a difference map based on the first depth map and the second depth map.

To generate the difference map, the second depth map is first projected onto the first depth map using the second calibration data, producing a map for difference map generation.

The second calibration data is data showing the relative positional relationship between the ToF sensor 500 and the distance measuring sensor 600. Since the ToF sensor 500 and the distance measuring sensor 600 are separate bodies with different viewpoints, the viewpoints are matched using the second calibration data and the second depth map is projected onto the first depth map. The difference map is then generated by calculating, for each pixel constituting the first depth map, the difference from the map for difference map generation.
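The projection and per-pixel difference described above can be sketched as follows. This is one possible reading under a pinhole camera model, not the publication's implementation: the intrinsic matrices, rotation, and translation are hypothetical stand-ins for the second calibration data, and the sign convention (ToF depth minus stereo depth) is an assumption.

```python
import numpy as np


def project_depth(src_depth, k_src, k_dst, rotation, translation, dst_shape):
    """Reproject a depth map from the source viewpoint into the destination one."""
    src_depth = np.asarray(src_depth, dtype=float)
    k_src, k_dst = np.asarray(k_src, dtype=float), np.asarray(k_dst, dtype=float)
    rotation = np.asarray(rotation, dtype=float)
    shift = np.asarray(translation, dtype=float).reshape(3, 1)
    h, w = src_depth.shape
    vs, us = np.mgrid[0:h, 0:w]
    z = src_depth.ravel()
    valid = np.isfinite(z) & (z > 0)
    pixels = np.stack([us.ravel(), vs.ravel(), np.ones(h * w)])[:, valid]
    # Back-project to 3D in the source frame, then move to the destination frame.
    points = rotation @ ((np.linalg.inv(k_src) @ pixels) * z[valid]) + shift
    points = points[:, points[2] > 0]
    projected = k_dst @ points
    u = np.rint(projected[0] / projected[2]).astype(int)
    v = np.rint(projected[1] / projected[2]).astype(int)
    inside = (u >= 0) & (u < dst_shape[1]) & (v >= 0) & (v < dst_shape[0])
    # Keep the nearest surface when several points land on the same pixel.
    out = np.full(dst_shape, np.inf)
    np.minimum.at(out, (v[inside], u[inside]), points[2, inside])
    out[np.isinf(out)] = np.nan
    return out


def make_difference_map(first_depth, projected_second_depth):
    """Per-pixel difference (first minus projected second); NaN where invalid."""
    first = np.asarray(first_depth, dtype=float)
    second = np.asarray(projected_second_depth, dtype=float)
    diff = first - second
    diff[~np.isfinite(diff) | (first <= 0) | (second <= 0)] = np.nan
    return diff
```

With this convention, a material that delays the returning light more strongly appears as a larger positive value in the difference map.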

The second calibration data may be stored in advance in the storage unit 160 or elsewhere in the information processing apparatus 100. Alternatively, the second calibration data may be stored on an external server or the like, and the information processing apparatus 100 may access the server via the interface 170 and read the data.

Next, in step S105, the area extraction unit 103 generates a histogram of the difference map and detects peaks from the histogram. Peaks in the histogram are detected, for example, as shown in FIG. 5A. One peak is detected for each region. For example, if there are two objects, two peaks are detected. Even if there is only one object, two peaks are detected if there are two distinguishable regions on its surface.
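A minimal sketch of the peak detection in step S105 is shown below. The publication does not specify how peaks are found, so the simple local-maximum rule, the bin width, and the minimum count used here are assumptions; real data would likely need smoothing before this step.

```python
import numpy as np


def detect_peaks(diff_map, bin_width=0.01, min_count=10):
    """Return the bin centers of local maxima in the depth-difference histogram."""
    values = np.asarray(diff_map, dtype=float)
    values = values[np.isfinite(values)]  # ignore pixels without a valid difference
    if values.size == 0:
        return []
    start = np.floor(values.min() / bin_width) * bin_width
    edges = np.arange(start, values.max() + 2 * bin_width, bin_width)
    counts, _ = np.histogram(values, bins=edges)
    padded = np.concatenate(([0], counts, [0]))
    # A bin is a peak if it rises above its left neighbor, does not fall
    # below its right neighbor, and holds enough pixels.
    is_peak = (
        (padded[1:-1] > padded[:-2])
        & (padded[1:-1] >= padded[2:])
        & (counts >= min_count)
    )
    centers = (edges[:-1] + edges[1:]) / 2.0
    return [float(c) for c in centers[is_peak]]
```

For a difference map whose values cluster around two depth differences, for example one for wood and one for skin, this returns two peak positions.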

Next, in step S106, the area extraction unit 103 extracts a region by extracting, from all the pixels constituting the first depth map, all the pixels whose depth difference equals that of a peak detected from the histogram.

Alternatively, as shown in FIG. 5B, the area extraction unit 103 extracts a region from the first depth map by extracting all pixels whose depth difference falls within a width of a predetermined amount centered on the peak detected from the histogram.

The width of a predetermined amount centered on the peak is set based on the variation in depth due to the performance of the ToF sensor 500 and the distance measuring sensor 600, the variation in depth for each object, and the like, and is set in the information processing apparatus 100 in advance. By extracting all the pixels whose depth difference falls within this width, the region can be extracted appropriately in accordance with the variation in depth due to the performance of the ToF sensor 500 and the distance measuring sensor 600, the variation in depth for each object, and the like.
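The extraction in step S106 might look like the following sketch, which labels each pixel whose depth difference lies within a band around a detected peak. The label encoding (0 for no region, 1, 2, ... for regions) and the `half_width` parameter are assumptions made for illustration.

```python
import numpy as np


def extract_regions(diff_map, peaks, half_width):
    """Label pixels whose depth difference lies within half_width of a peak.

    The difference map is aligned pixel for pixel with the first depth map,
    so the returned labels index pixels of the first depth map.
    """
    diff = np.asarray(diff_map, dtype=float)
    labels = np.zeros(diff.shape, dtype=int)
    for index, peak in enumerate(peaks, start=1):
        in_band = np.abs(diff - peak) <= half_width  # NaN pixels compare False
        labels[in_band & (labels == 0)] = index      # first matching peak wins
    return labels


diff = np.array([[0.004, 0.006, 0.055], [0.052, np.nan, 0.2]])
print(extract_regions(diff, peaks=[0.005, 0.055], half_width=0.005))
```

A narrow `half_width`, on the order of one histogram bin, approximates extraction at the peak itself, while a wider value corresponds to the variant shown in FIG. 5B.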

The user may be allowed to select whether to extract the region based on the peak or based on the width of a predetermined amount centered on the peak, or the information processing apparatus 100 may set the extraction method based on the performance, etc. of the ToF sensor 500 and the distance measuring sensor 600.

The depth acquired by the ToF sensor 500 deviates from the true value depending on the material, etc. of the object because of the multipath described above. Therefore, by comparing the depth acquired by the ToF sensor 500 with the depth acquired for the same object by another method (the distance measuring sensor 600 in this embodiment) and determining the deviation, regions can be extracted for each material, etc.

Next, in step S107, the image processing unit 104 generates a display image in which the area extracted from the first depth map is drawn. The display image is generated by drawing the extracted area on the first depth map. The display image may be generated by drawing on any of the second depth map, the first RGB image, and the second RGB image. By displaying the generated display image on the display unit 180, the user can confirm the extracted area.
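One way to produce such a display image is sketched below: the first depth map is normalized to a gray image and each extracted region is painted in its own color. The label map format and the color assignment are assumptions, not details given in the publication.

```python
import numpy as np


def draw_regions(first_depth, labels, colors):
    """Render the first depth map in gray and paint each labeled region.

    colors maps a region label (1, 2, ...) to an (R, G, B) tuple.
    """
    depth = np.asarray(first_depth, dtype=float)
    labels = np.asarray(labels)
    gray = np.zeros(depth.shape, dtype=np.uint8)
    finite = np.isfinite(depth)
    if finite.any():
        lo, hi = depth[finite].min(), depth[finite].max()
        scale = 255.0 / (hi - lo) if hi > lo else 0.0
        gray[finite] = ((depth[finite] - lo) * scale).astype(np.uint8)
    image = np.stack([gray, gray, gray], axis=-1)
    for label, color in colors.items():
        image[labels == label] = color
    return image


image = draw_regions(
    [[1.0, 2.0], [3.0, np.nan]],
    [[1, 0], [2, 0]],
    {1: (0, 0, 255), 2: (255, 0, 0)},
)
print(image.shape)
```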

For example, assume that the object is a "hand of a person holding a spoon", that the spoon is made of wood, and that its color is similar to that of the person's hand. Here, "similar in color" means that the values of hue, saturation, and lightness that make up the color of the spoon and the values of hue, saturation, and lightness that make up the color of the hand (skin) are within a predetermined approximation range of each other, or that the RGB value representing the color of the spoon and the RGB value representing the color of the hand (skin) are within a predetermined approximation range of each other.
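The definition of "similar in color" can be expressed as a small check like the one below. The tolerance values are purely illustrative assumptions; the publication only says that the range is predetermined.

```python
import colorsys


def colors_similar(rgb_a, rgb_b, rgb_tol=30, hls_tol=(0.05, 0.15, 0.15)):
    """True if two 8-bit RGB colors are close in RGB or in hue/lightness/saturation."""
    rgb_close = all(abs(a - b) <= rgb_tol for a, b in zip(rgb_a, rgb_b))
    h_a, l_a, s_a = colorsys.rgb_to_hls(*(c / 255.0 for c in rgb_a))
    h_b, l_b, s_b = colorsys.rgb_to_hls(*(c / 255.0 for c in rgb_b))
    hue_gap = abs(h_a - h_b)
    hue_gap = min(hue_gap, 1.0 - hue_gap)  # hue is circular
    hls_close = (
        hue_gap <= hls_tol[0]
        and abs(l_a - l_b) <= hls_tol[1]
        and abs(s_a - s_b) <= hls_tol[2]
    )
    return rgb_close or hls_close


print(colors_similar((200, 160, 120), (205, 165, 125)))  # wood-like vs skin-like tone
print(colors_similar((200, 160, 120), (0, 0, 255)))      # wood-like tone vs blue
```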

Suppose that a gray image of the object "spoon and hand" is as shown in FIG. 6A, the first depth map of the object is as shown in FIG. 6B, and the area extraction unit 103 has extracted the spoon and the hand as different regions. In this case, as shown in FIG. 6C, the display image shows that the spoon and the hand have been extracted as different regions by drawing them in different colors.

By extracting regions in this way, two or more objects may be extracted, or, for a single object, a plurality of regions composed of different materials, etc. on its surface may be extracted. Further, in some cases, two or more objects are extracted together with two or more distinguishable regions on the surface of those objects.

As described above, the processing according to the first embodiment is performed. According to the first embodiment, since the region is extracted based on the depth difference, it is possible to extract the region that cannot be extracted only by the RGB image because the colors are the same or similar.

<2. Second Embodiment>
[2-1. Configuration of information processing device 200]
Next, a second embodiment of the present technology will be described. The configuration of the information processing system 10 is the same as that of the first embodiment. As shown in FIG. 7, the information processing apparatus 200 according to the second embodiment is configured to include the functional blocks of a depth map generation unit 101, a difference map generation unit 102, an area extraction unit 103, a type identification unit 201, and an image processing unit 104. Since the depth map generation unit 101, the difference map generation unit 102, the area extraction unit 103, and the image processing unit 104 are the same as those in the first embodiment, the description thereof will be omitted.

The type identification unit 201 identifies the type of the material, etc. constituting the object by referring to a table, such as the one shown in FIG. 8, in which peaks of the depth difference between the depth map acquired by the ToF sensor 500 and the depth map generated from the images acquired by the distance measuring sensor 600 are associated in advance with types of object. The table may be held by the type identification unit 201, or may be stored in the storage unit 160 in advance so that the type identification unit 201 can read it from the storage unit 160. Further, the table may be stored on an external server or the like, and the information processing apparatus 200 may access the server via the interface 170 and read the table. The table shown in FIG. 8 is for convenience of explanation, and does not mean that the depth differences of wood, food, skin, and cloth are actually the values shown in FIG. 8.

The type is a group of items that have the same properties and morphology according to a certain standard. There are various types such as metals, plants, foods, organisms, cloths, synthetic resins, minerals, and papers. The types may be further classified, for example, food may be classified into vegetables and meat in more detail, and metals may be further classified into iron, copper, gold and the like.

[2-2. Processing by information processing device 200]
Next, the processing of the information processing apparatus 200 in the second embodiment will be described with reference to the flowchart of FIG. 9.

Since steps S101 to S106 are the same as the processes in the first embodiment, the description thereof will be omitted.

When the area extraction unit 103 extracts the area on the first depth map in step S106, the type identification unit 201 then specifies the type of the area extracted by the area extraction unit 103 in step S201.

For example, suppose that the table associates depth differences with types of material, etc. constituting an object as shown in FIG. 8, and that the histogram of the object generated by the area extraction unit 103 is as shown in FIG. 10. The table is then compared with the histogram of the object, and when the position of a peak falls within the range of a certain type in the table, that type is identified as the type of the material, etc. constituting the region.

With the table of FIG. 8 and the histogram of FIG. 10, the two peaks of the histogram fall within the ranges of "wood" and "skin" in the table, so the extracted regions can be identified as a wood region and a skin region.
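The table lookup in step S201 can be sketched as a range search over depth-difference intervals. The interval values below are invented placeholders (the publication itself notes that the values in FIG. 8 are only for explanation), and the function name is hypothetical.

```python
# Hypothetical depth-difference ranges in meters: type -> (low, high).
# Real ranges would have to be obtained by experiment.
TYPE_TABLE = {
    "wood": (0.000, 0.010),
    "food": (0.010, 0.020),
    "skin": (0.020, 0.030),
    "cloth": (0.030, 0.040),
}


def identify_type(peak, table=TYPE_TABLE):
    """Return the type whose range contains the peak, or None if there is none."""
    for name, (low, high) in table.items():
        if low <= peak < high:
            return name
    return None


for peak in (0.005, 0.025, 0.5):
    print(peak, identify_type(peak))
```

A finer classification, such as distinguishing kinds of food, would only require adding more rows with narrower ranges.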

Next, in step S202, the image processing unit 104 generates a display image in which the region extracted from the first depth map and the type of the region are drawn. The display image is generated by drawing the extraction area on the first depth map. The display image may be generated by drawing on any of the second depth map, the first RGB image, and the second RGB image.

Suppose that a gray image of the object "spoon and hand (skin)" is as shown in FIG. 11A, the first depth map of the object is as shown in FIG. 11B, and the area extraction unit 103 has extracted the spoon and the hand (skin) as different regions. In this case, as shown in FIG. 11C, the display image shows that the spoon and the hand (skin) have been extracted as different regions by drawing them in different colors.

Further, in the display image shown in FIG. 11C, the type of each specified area is indicated by characters corresponding to each extracted area. By displaying this display image on the display unit 180, the user can confirm the extracted area and the type of the area. The method of indicating the type of the area in the display image may be the method shown in FIG. 11D, and any method may be used as long as the user can know the type of the area. In FIG. 11D, it is assumed that the spoon is shown in blue and the hand (skin) is shown in red.

As described above, the processing in the second embodiment is performed. According to the second embodiment, in addition to the detection of the region in the object, the type of the material or the like constituting the region can be specified.

In addition, to identify types in more detail (for example, to distinguish meat, vegetables, etc. among foods, or radishes, green onions, carrots, etc. among vegetables), it suffices to conduct experiments to obtain the depth differences and prepare a table in advance.

<3. Third Embodiment>
[3-1. Configuration of information processing system 30]
Next, with reference to FIG. 12, the configuration of the information processing system 30 according to the third embodiment of the present technology will be described. The information processing system 30 includes an information processing device 300 and a ToF sensor 500. Since the ToF sensor 500 is the same as that of the first embodiment, the description thereof will be omitted.

In the third embodiment, it is premised that the object has a region on the surface of the object whose material is different from that of the object and whose color is the same or similar. The area may be one that originally exists on the surface of the object, or may be something like a marker consisting of characters or figures attached to the surface of the object by a user or the like. "Similar in color" is similar to that described in the first embodiment.

[3-2. Configuration of information processing device 300]
Next, the configuration of the information processing apparatus 300 according to the third embodiment will be described. Since the configuration other than the functional block is the same as the configuration in the first embodiment, the description thereof will be omitted.

As shown in FIG. 13, the information processing apparatus 300 according to the third embodiment is configured to include the functional blocks of a state estimation unit 301, a depth map generation unit 101, a difference map generation unit 102, an area extraction unit 103, and an image processing unit 104. Since the difference map generation unit 102, the area extraction unit 103, and the image processing unit 104 are the same as those in the first embodiment, the description thereof will be omitted. The depth map generation unit 101 is also the same, except that in the third embodiment it generates the second depth map from the estimated three-dimensional shape data, as described below.

The state estimation unit 301 estimates the state of the object using the three-dimensional shape data. The state of the object is the shape, posture and size of the object. When the shape of the object is one type, the state of the object may be only the posture. Further, the size is not essential, and the state of the object may be a combination of posture and shape.

The three-dimensional shape data is data showing a plurality of three-dimensional shapes such as a sphere, a cylinder, a cone, a cube, a rectangular parallelepiped, a hexagonal column, a triangular pyramid, a triangular prism, and a flat plate, as shown in FIGS. 14A to 14I. However, the three-dimensional shape data is not limited to these, and data for any three-dimensional shape may be used. To enable region detection for objects of a wider variety of shapes, it is advisable to prepare more three-dimensional shape data in advance.

The three-dimensional shape data may be held by the state estimation unit 301, or may be stored in the storage unit 160 in advance so that the state estimation unit 301 can read it from the storage unit 160. Further, the three-dimensional shape data may be stored on an external server or the like, and the information processing apparatus 300 may access the server via the interface 170 and read the three-dimensional shape data.

The information processing device 300 is configured as described above.

[3-3. Processing by information processing device 300]
Next, the processing of the information processing apparatus 300 in the third embodiment will be described with reference to the flowchart of FIG. 15. A detailed description of processing that is the same as in the first embodiment will be omitted.

In step S101, the information processing apparatus 300 acquires the first depth map of the object from the ToF sensor 500, and then in step S301, the state estimation unit 301 estimates the state of the object.

To estimate the shape, posture, and size of the object, first, a posture and a size are provisionally set for one piece of three-dimensional shape data as seen from a certain viewpoint in three-dimensional space, and a depth map for state estimation is obtained for the three-dimensional shape viewed from that viewpoint. The depth map for state estimation is then compared with the first depth map of the object. This comparison is performed for all possible posture patterns and a plurality of sizes for each piece of three-dimensional shape data, and the combination with the smallest depth difference is estimated as the shape, posture, and size of the object. A range is predetermined for the size, and the provisional size is set within that range.

In this process, some shapes, such as a sphere, yield the same depth map regardless of posture; in that case, one posture may be selected as a representative.
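The exhaustive comparison described above can be sketched as follows. To stay short, this illustration uses two hypothetical orthographic renderers (a sphere and a face-on flat plate), searches over size only, and scores candidates by mean absolute depth difference; the publication does not specify the renderer or the error measure, so all of these are assumptions.

```python
import numpy as np


def render_sphere(radius, size=32, center_depth=50.0, background=100.0):
    """Orthographic depth map of a sphere (stand-in for real 3D shape data)."""
    ys, xs = np.mgrid[0:size, 0:size]
    r2 = (xs - size / 2.0) ** 2 + (ys - size / 2.0) ** 2
    depth = np.full((size, size), background)
    inside = r2 <= radius ** 2
    depth[inside] = center_depth - np.sqrt(radius ** 2 - r2[inside])
    return depth


def render_plate(half_side, size=32, center_depth=50.0, background=100.0):
    """Orthographic depth map of a flat plate facing the sensor."""
    ys, xs = np.mgrid[0:size, 0:size]
    depth = np.full((size, size), background)
    inside = (np.abs(xs - size / 2.0) <= half_side) & (np.abs(ys - size / 2.0) <= half_side)
    depth[inside] = center_depth
    return depth


def estimate_state(first_depth, candidates):
    """Return the candidate state with the smallest mean absolute depth difference."""
    best_state, best_error = None, np.inf
    for state, candidate_depth in candidates:
        error = np.nanmean(np.abs(first_depth - candidate_depth))
        if error < best_error:
            best_state, best_error = state, error
    return best_state, best_error


sizes = (4.0, 8.0, 12.0)
candidates = [(("sphere", s), render_sphere(s)) for s in sizes]
candidates += [(("plate", s), render_plate(s)) for s in sizes]

observed = render_sphere(8.0)
observed[14:18, 14:18] += 0.5  # a small patch of different material (e.g. a marker)
print(estimate_state(observed, candidates)[0])
```

The patch that deviates from the fitted shape is what later shows up as a separate peak in the difference map.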

Next, in step S302, the depth map generation unit 101 scales the three-dimensional shape data estimated in step S301 to represent the object to the estimated size of the object and orients it in the estimated posture of the object. Based on this, it generates a second depth map as seen from the viewpoint of the ToF sensor 500.

Next, in step S303, the difference map generation unit 102 generates a difference map based on the first depth map and the second depth map. The difference map is generated by calculating the depth difference from the second depth map at each pixel constituting the first depth map, as in the first embodiment.
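As a minimal sketch of this step, the difference map can be computed pixel by pixel as shown below. The sketch assumes that both depth maps have the same resolution and are already aligned to the ToF sensor viewpoint, that depths are integer millimetres, and that the sign convention is first minus second. None of these details are stated in the source, and the function name is hypothetical.

```python
def generate_difference_map(first_depth_map, second_depth_map):
    """Per-pixel depth difference (first minus second; sign is an assumption)."""
    return [
        [d1 - d2 for d1, d2 in zip(row1, row2)]
        for row1, row2 in zip(first_depth_map, second_depth_map)
    ]


# Toy example in millimetres: the ToF map reads 12 mm deeper on two pixels.
first = [[1000, 1000, 1000],
         [1000, 1012, 1012]]
second = [[1000, 1000, 1000],
          [1000, 1000, 1000]]
diff = generate_difference_map(first, second)
print(diff)
```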

Next, in step S304, the area extraction unit 103 generates a histogram of the difference map and detects a peak from the histogram.

Next, in step S305, the area extraction unit 103 extracts an area by selecting, from all the pixels constituting the first depth map, every pixel whose depth difference equals the peak detected in the histogram.

Alternatively, the region extraction unit 103 extracts a region by selecting, from the first depth map, all pixels whose depth difference falls within a predetermined width centered on the peak detected in the histogram. The method of region extraction is the same as that of the first embodiment.
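Steps S304 and S305, including the variant with a predetermined width, can be sketched as follows. This is an illustration under stated assumptions: the peak is taken to be the most populated histogram bin, the option to skip the bin around zero (pixels where the two depth maps agree) is a hypothetical addition, and the function and parameter names are not taken from the source.

```python
from collections import Counter


def histogram_peak(diff_map, bin_width=1, skip_zero_bin=False):
    """Return the centre value of the most populated histogram bin, or None."""
    counts = Counter(round(d / bin_width) for row in diff_map for d in row)
    if skip_zero_bin:
        counts.pop(0, None)  # assumption: zero difference is not of interest
    if not counts:
        return None
    peak_bin, _ = counts.most_common(1)[0]
    return peak_bin * bin_width


def extract_region(diff_map, peak, half_width=0):
    """Boolean mask of pixels whose depth difference lies within the window.

    half_width=0 keeps only pixels exactly at the peak (step S305);
    half_width>0 keeps pixels within a width centered on the peak.
    """
    return [[abs(d - peak) <= half_width for d in row] for row in diff_map]


diff = [[0, 0, 0, 0],
        [0, 12, 12, 0],
        [0, 12, 11, 0]]
peak = histogram_peak(diff, skip_zero_bin=True)
exact = extract_region(diff, peak)
windowed = extract_region(diff, peak, half_width=1)
print(peak, exact, windowed)
```

Using a window rather than an exact match tolerates sensor noise, at the cost of possibly merging regions whose depth differences are close to each other.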

Next, in step S306, the image processing unit 104 generates a display image in which the extraction area extracted from the first depth map is drawn. The display image is generated by drawing the extraction area on the first depth map. The display image may be generated by drawing on the second depth map or the three-dimensional shape data. By displaying the generated display image on the display unit 180, the user can confirm the extracted area.
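One way to picture the display image is as a grayscale rendering of the first depth map with the extracted pixels painted in a highlight color. The sketch below assumes this rendering style, a boolean mask as the representation of the extraction area, and an RGB tuple per pixel. The source does not specify how the image processing unit 104 draws the area, and the function name is hypothetical.

```python
def render_display_image(depth_map, mask, highlight=(255, 0, 0)):
    """Grayscale depth image with masked pixels drawn in the highlight color."""
    flat = [d for row in depth_map for d in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1  # avoid division by zero for a flat depth map
    image = []
    for depth_row, mask_row in zip(depth_map, mask):
        out_row = []
        for depth, selected in zip(depth_row, mask_row):
            if selected:
                out_row.append(highlight)
            else:
                gray = round(255 * (depth - lo) / span)
                out_row.append((gray, gray, gray))
        image.append(out_row)
    return image


depth = [[1000, 1000],
         [1000, 2000]]
mask = [[False, True],
        [False, False]]
image = render_display_image(depth, mask)
print(image)
```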

As described above, the processing according to the third embodiment is performed.

According to the third embodiment, for example, a marker can be attached to one surface of an object whose orientation cannot be determined from its shape alone because it looks the same when viewed from any direction (up, down, left, or right). By extracting the marker with the present technology, the orientation and the surfaces of the object can be easily grasped.

Specifically, as shown in FIG. 16A, a marker (character "front" in FIG. 16A) of a material different from the object is provided as a mark on the front surface of the box which is the object. Then, by extracting and recognizing the marker as an area by the present technique, it is possible to grasp that the surface on which the marker is located is the front surface of the box.

Because the marker can be extracted even if its color is the same as or similar to that of the object, it does not spoil the appearance of the object, and it is also possible to keep a viewer of the object unaware that the marker is present.

Further, as shown in FIG. 16B, if secret information is written on the surface of the object as an area whose material differs from that of the object but whose color is the same or similar, the secret information can be read only by using the present technology. As a result, secret information can be exchanged without other persons becoming aware of it. In FIG. 16B, the characters "secret information" are written as an area on a sheet of paper that is the object.

<4. Modification example>
Although the embodiment of the present technology has been specifically described above, the present technology is not limited to the above-described embodiment, and various modifications based on the technical idea of the present technology are possible.

In the embodiment, an RGB stereo camera is used as the distance measuring sensor 600, but the distance measuring sensor may be any camera or sensor as long as it can acquire depth information from which a depth map can be generated. For example, the distance measuring sensor 600 may be a stereo IR camera composed of two IR (Infrared) cameras, a triangulation system using one IR camera and structured light, or the like.

In the embodiment, the information processing apparatus 100 generates a depth map from images acquired from the distance measuring sensor 600, but the depth map may instead be generated in the distance measuring sensor 600 or in an external device and then acquired by the information processing apparatus 100.

Further, in any of the embodiments, the information processing device may operate on the server or the cloud. In that case, the information processing apparatus receives the first depth map generated by the ToF sensor 500, the image generated by the distance measuring sensor 600, and the like via the network and processes them.

The present technology can also take the following configurations.
(1)
An information processing apparatus including: a difference map generation unit that generates a difference map from a first depth map of an object acquired by a ToF sensor and a second depth map of the object; and a region extraction unit that extracts a region on the first depth map based on the difference map.
(2)
The information processing apparatus according to (1), comprising a depth map generation unit that generates the second depth map.
(3)
The information processing apparatus according to (2), wherein the depth map generation unit generates the second depth map based on a first image acquired by a first image pickup apparatus constituting a stereo camera serving as a distance measuring sensor and a second image acquired by a second image pickup apparatus constituting the stereo camera.
(4)
The information processing apparatus according to any one of (1) to (3), wherein the difference map generation unit generates the difference map by calculating, for each pixel constituting the first depth map, the depth difference from the second depth map.
(5)
The information processing apparatus according to any one of (1) to (4), wherein the region extraction unit extracts a region on the first depth map having a depth difference corresponding to a peak in the histogram of the difference map.
(6)
The information processing apparatus according to any one of (1) to (5), wherein the region extraction unit extracts a region on the first depth map having a depth difference included in a predetermined width centered on a peak in the histogram of the difference map.
(7)
The information processing apparatus according to any one of (1) to (6), comprising an image processing unit that generates an image indicating a region extracted by the region extraction unit.
(8)
The information processing apparatus according to any one of (1) to (7), comprising a type specifying unit for specifying the type of the object by referring to a table in which the depth difference and the type of the object are previously associated with each other.
(9)
The information processing apparatus according to (8), further comprising an image processing unit that generates an image indicating the region extracted by the region extraction unit and the type of the object specified by the type specifying unit.
(10)
The information processing apparatus according to any one of (1) to (9), comprising a state estimation unit that estimates the state of the object based on the first depth map and a depth map for state estimation generated from three-dimensional shape data.
(11)
The information processing apparatus according to (10), wherein the state of the object is the posture of the object.
(12)
The information processing apparatus according to (11), wherein the state of the object is the shape of the object.
(13)
The information processing apparatus according to (11) or (12), wherein the state of the object is the size of the object.
(14)
The information processing apparatus according to (10), wherein the depth map generation unit generates, based on the state of the object, the second depth map of the three-dimensional shape data as seen from the viewpoint of the ToF sensor, and the difference map generation unit generates the difference map from the first depth map and the second depth map.
(15)
An information processing method including: generating a difference map from a first depth map of an object acquired by a ToF sensor and a second depth map of the object; and extracting a region on the first depth map based on the difference map.
(16)
An information processing program that causes a computer to execute an information processing method including: generating a difference map from a first depth map of an object acquired by a ToF sensor and a second depth map of the object; and extracting a region on the first depth map based on the difference map.

100, 200, 300 ... Information processing device
101 ... Depth map generation unit
102 ... Difference map generation unit
103 ... Area extraction unit
104 ... Image processing unit
201 ... Type identification unit
301 ... State estimation unit
500 ... ToF sensor
600 ... Stereo camera

Claims

An information processing apparatus comprising: a difference map generation unit that generates a difference map from a first depth map of an object acquired by a ToF sensor and a second depth map of the object; and a region extraction unit that extracts a region on the first depth map based on the difference map.
The information processing apparatus according to claim 1, further comprising a depth map generation unit that generates the second depth map.
The information processing apparatus according to claim 2, wherein the depth map generation unit generates the second depth map based on a first image acquired by a first image pickup apparatus constituting a stereo camera serving as a distance measuring sensor and a second image acquired by a second image pickup apparatus constituting the stereo camera.
The information processing apparatus according to claim 1, wherein the difference map generation unit generates the difference map by calculating the depth difference between each pixel constituting the first depth map and the second depth map.
The information processing apparatus according to claim 1, wherein the region extraction unit extracts a region on the first depth map having a depth difference corresponding to a peak in the histogram of the difference map.
The information processing apparatus according to claim 1, wherein the region extraction unit extracts a region on the first depth map having a depth difference included in a predetermined width centered on a peak in the histogram of the difference map.
The information processing apparatus according to claim 1, further comprising an image processing unit that generates an image indicating a region extracted by the region extraction unit.
The information processing apparatus according to claim 1, further comprising a type specifying unit that specifies the type of the object by referring to a table in which the depth difference and the type of the object are previously associated with each other.
The information processing apparatus according to claim 8, further comprising an image processing unit that generates an image indicating the region extracted by the region extraction unit and the type of the object specified by the type specifying unit.
The information processing apparatus according to claim 1, further comprising a state estimation unit that estimates the state of the object based on the first depth map and the state estimation depth map generated from the three-dimensional shape data.
The information processing device according to claim 10, wherein the state of the object is the posture of the object.
The information processing apparatus according to claim 11, wherein the state of the object is the shape of the object.
The information processing apparatus according to claim 11, wherein the state of the object is the size of the object.
The information processing apparatus according to claim 10, wherein the depth map generation unit generates, based on the state of the object, the second depth map of the three-dimensional shape data as seen from the viewpoint of the ToF sensor, and the difference map generation unit generates the difference map from the first depth map and the second depth map.
An information processing method comprising: generating a difference map from a first depth map of an object acquired by a ToF sensor and a second depth map of the object; and extracting a region on the first depth map based on the difference map.
An information processing program that causes a computer to execute an information processing method comprising: generating a difference map from a first depth map of an object acquired by a ToF sensor and a second depth map of the object; and extracting a region on the first depth map based on the difference map.