WO2022019149A1 - Information processing device, 3d model generation method, information processing method, and program - Google Patents

Information processing device, 3D model generation method, information processing method, and program

Info

Publication number
WO2022019149A1
Authority
WO
WIPO (PCT)
Prior art keywords
mask area
information processing
unit
model
rendering
Application number
PCT/JP2021/025929
Other languages
French (fr)
Japanese (ja)
Inventor
宜之 高尾
剛也 小林
Original Assignee
Sony Group Corporation (ソニーグループ株式会社)
Application filed by Sony Group Corporation
Publication of WO2022019149A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/003D [Three Dimensional] image rendering
    • G06T15/10Geometric effects
    • G06T15/20Perspective computation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00Three dimensional [3D] modelling, e.g. data description of 3D objects

Definitions

  • This disclosure relates to an information processing device, a 3D model generation method, an information processing method and a program.
  • Patent Document 1 discloses a technique for drawing an object as a three-dimensional model.
  • One of the purposes of the present disclosure is to provide an information processing device that can automatically set a mask area, a 3D model generation method, an information processing method, and a program.
  • The present disclosure is, for example, an information processing apparatus having a mask area setting unit that sets a mask area for an obstruction existing between an actual camera and a target, and a 3D model generation unit that generates a 3D model based on a plurality of pieces of image data including image data in which the mask area is set.
  • The present disclosure is, for example, a 3D model generation method in which a mask area setting unit sets a mask area for an obstruction existing between an actual camera and a target, and a 3D model generation unit generates a 3D model based on a plurality of pieces of image data including image data in which the mask area is set.
  • The present disclosure is, for example, a program that causes a computer to execute a 3D model generation method in which a mask area setting unit sets a mask area for an obstruction existing between an actual camera and a target, and a 3D model generation unit generates a 3D model based on a plurality of pieces of image data including image data in which the mask area is set.
  • The present disclosure is, for example, an information processing method in which an acquisition unit acquires mask area information indicating a mask area set for an obstruction existing between an actual camera and a target, and a rendering unit performs rendering excluding the mask area.
  • The present disclosure is, for example, a program that causes a computer to execute an information processing method in which an acquisition unit acquires mask area information indicating a mask area set for an obstruction existing between an actual camera and a target, and a rendering unit performs rendering excluding the mask area.
  • FIG. 1A to 1C are views which are referred to when the outline of the present disclosure is explained.
  • 2A to 2C are diagrams referred to when the outline of the present disclosure is explained.
  • 3A to 3C are views which are referred to when the outline of the present disclosure is explained.
  • FIG. 4 is a diagram referred to when the outline of the present disclosure is explained.
  • 5A-5C are views which will be referred to when the outline of the present disclosure is explained.
  • FIG. 6 is a diagram referred to when the outline of the present disclosure is explained.
  • FIG. 7 is a block diagram showing a configuration example of the information processing system according to the embodiment.
  • FIG. 8 is a diagram for explaining an example of one process performed in an information processing system.
  • FIG. 9A to 9C are diagrams referred to when the processing performed by the mask area setting unit according to the embodiment is described.
  • 10A to 10C are diagrams referred to when the processing performed by the mask area setting unit according to the embodiment is described.
  • 11A to 11C are diagrams that are referred to when the processing performed by the mask area setting unit according to the embodiment is described.
  • FIG. 12 is a diagram referred to when an example of using the mask area at the time of rendering is explained.
  • 13A to 13C are diagrams referred to when an example of using the mask area at the time of rendering is described.
  • FIG. 14 is a diagram referred to when an example of using the mask area at the time of rendering is described.
  • FIG. 15 is a flowchart for explaining an operation example of the information processing apparatus according to the embodiment.
  • FIG. 16 is a flowchart for explaining an operation example of the mask area setting unit according to the embodiment.
  • FIG. 17 is a diagram showing a configuration example in which the processing performed by the information processing system according to the embodiment is configured in hardware.
  • As one method of generating a three-dimensional model (hereinafter referred to as a 3D model, as appropriate), a method is known in which a mesh is created by modeling with Visual Hull and a 3D model is generated by performing texture mapping on the mesh.
  • In generating a 3D model, the subject and the background are separated for each of a plurality of pieces of two-dimensional image data. For example, a binary image called a silhouette image, in which the silhouette of the subject is represented in white and the other areas in black, is used.
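  • As a minimal illustrative sketch (the file names and the threshold value are assumptions, not taken from this disclosure), such a silhouette image can be obtained by simple background subtraction, for example with OpenCV:

```python
import cv2

# Load a camera frame and the corresponding background image (illustrative file names).
frame = cv2.imread("camera1_frame.png")
background = cv2.imread("camera1_background.png")

# Absolute difference between frame and background, converted to grayscale.
diff = cv2.absdiff(frame, background)
gray = cv2.cvtColor(diff, cv2.COLOR_BGR2GRAY)

# Threshold: pixels that differ enough from the background become white foreground
# (the subject's silhouette); everything else stays black background.
_, silhouette = cv2.threshold(gray, 30, 255, cv2.THRESH_BINARY)

cv2.imwrite("silhouette_SI1.png", silhouette)
```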
  • Here, if a shield exists between the camera and the subject, an inappropriate 3D model may be generated. This point is explained with reference to FIGS. 1 to 3. FIG. 1A shows two-dimensional image data IM1A including a target (a person TA in this example) for which a 3D model is to be generated.
  • FIGS. 2A and 3A show two-dimensional image data IM2A and IM3A captured by cameras different from the camera that captured the two-dimensional image data shown in FIG. 1A.
  • As shown in FIG. 3A, a bar BA, which is an example of a shield, exists between the camera that captured the two-dimensional image data IM3A and the person TA.
  • By performing processing that separates the background image data IM1B shown in FIG. 1B from the two-dimensional image data IM1A shown in FIG. 1A, a silhouette image SI1 (see FIG. 1C) in which the person TA and the background are separated is obtained. Similarly, by separating the background image data IM2B shown in FIG. 2B from the two-dimensional image data IM2A shown in FIG. 2A, a silhouette image SI2 (see FIG. 2C) in which the person TA and the background are separated is obtained.
  • Likewise, by separating the background image data IM3B shown in FIG. 3B from the two-dimensional image data IM3A shown in FIG. 3A, a silhouette image SI3 (see FIG. 3C) in which the person TA and the background are separated is obtained.
  • Here, when the background image data IM3B is subtracted from the two-dimensional image data IM3A, the bar BA disappears, so the area of the bar BA is regarded as background. That is, in the silhouette image SI3 shown in FIG. 3C, the bar BA is represented as background, that is, in black.
  • FIG. 4 shows an example of a 3D model generated using the silhouette images SI1 to SI3.
  • Although more silhouette images are actually used to generate a 3D model, the 3D model here is generated using only the silhouette images SI1 to SI3 in order to simplify the explanation. If a shield such as the bar BA exists between the camera and the person TA, the silhouette cannot be acquired correctly, as in the silhouette image SI3, so the resulting 3D model becomes unnatural. For example, as shown in FIG. 4, the body becomes a 3D model divided into upper and lower parts.
  • Therefore, in the present embodiment, the shield portion is set as a mask area.
  • Specifically, as shown in FIG. 5B, the portion of the bar BA is set as the mask area MA. Since the portion set as the mask area MA is excluded from the background subtraction processing, the silhouette there can be extracted as foreground.
  • FIG. 5C shows the silhouette image SI3' obtained when the mask area MA is set.
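  • The effect of the mask area on the silhouette can be sketched as follows, assuming the mask is given as a binary image of the same size as the silhouette (the file names are illustrative): pixels inside the mask area are excluded from background subtraction and simply kept as foreground, yielding an image like SI3'.

```python
import cv2
import numpy as np

silhouette = cv2.imread("silhouette_SI3.png", cv2.IMREAD_GRAYSCALE)  # silhouette with the bar missing
mask = cv2.imread("mask_MA.png", cv2.IMREAD_GRAYSCALE)               # white where the shield (bar BA) is

# Pixels inside the mask area are not subject to background subtraction,
# so they are kept as foreground in the resulting silhouette SI3'.
silhouette_masked = np.where(mask > 0, 255, silhouette).astype(np.uint8)

cv2.imwrite("silhouette_SI3_prime.png", silhouette_masked)
```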
  • a 3D model is generated using the silhouette images SI1, SI2, SI3'.
  • FIG. 6 shows an example of a 3D model generated using the silhouette images SI1, SI2, and SI3'.
  • In the silhouette image SI3', the silhouettes (white portions) of the person TA and the bar BA overlap. However, in the Visual Hull carving process that uses the silhouette images SI1 and SI2, which correspond to two-dimensional image data from other cameras, that is, other viewpoints in which the shield does not appear, the portion of the bar BA is carved away. As a result, an appropriate 3D model corresponding to the person TA is obtained.
  • It is preferable that the above-mentioned mask area can be set automatically, because automatically setting the mask area allows the 3D model to be generated efficiently.
  • When a mask area is set, it also needs to be taken into account in the rendering processing performed in the course of generating the 3D model. That is, since no texture exists for the mask area, it is desirable that the texture for the mask area can be rendered appropriately.
  • the details of the rendering process when there is a mask area will be described later. Based on the above, an embodiment of the present disclosure will be described in detail.
  • FIG. 7 shows an outline of an information processing system to which the present technology is applied.
  • the data acquisition unit 1 acquires image data for generating a 3D model of the subject.
  • a plurality of viewpoint images captured by a plurality of image pickup devices 8B arranged so as to surround the subject 8A are acquired as image data.
  • the plurality of viewpoint images are preferably images captured by a plurality of cameras in synchronization.
  • the data acquisition unit 1 may acquire, for example, a plurality of viewpoint images obtained by capturing the subject 8A from a plurality of viewpoints with one camera as image data.
  • the data acquisition unit 1 may perform calibration based on the image data and acquire the internal parameters and the external parameters of each image pickup apparatus 8B. Further, the data acquisition unit 1 may acquire a plurality of depth information indicating a distance from a plurality of viewpoints to the subject 8A, for example.
  • the 3D model generation unit 2 generates a model having 3D information of the subject 8A based on the image data for generating the 3D model of the subject 8A.
  • The 3D model generation unit 2 generates a 3D model of the subject 8A by, for example, carving the three-dimensional shape of the subject 8A with so-called Visual Hull, using images from a plurality of viewpoints (for example, silhouette images from a plurality of viewpoints).
  • In this case, the 3D model generation unit 2 can further deform the 3D model generated with Visual Hull with high accuracy by using a plurality of pieces of depth information indicating the distances from a plurality of viewpoints to the subject 8A.
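  • The following is a simplified voxel-carving sketch of Visual Hull under assumed inputs (each camera provides a binary silhouette and a 3x4 projection matrix); a voxel is kept only if it projects into the foreground of every silhouette, which also illustrates why a silhouette with a missing region, such as one cut by a shield, carves away part of the subject unless the mask area is handled as described above.

```python
import numpy as np

def visual_hull(silhouettes, projections, grid):
    """silhouettes: list of HxW binary arrays (255 = foreground).
    projections: list of 3x4 camera projection matrices.
    grid: (N, 3) array of voxel center coordinates in world space."""
    keep = np.ones(len(grid), dtype=bool)
    homog = np.hstack([grid, np.ones((len(grid), 1))])  # (N, 4) homogeneous voxel centers
    for sil, P in zip(silhouettes, projections):
        uvw = homog @ P.T                               # project voxel centers into this camera image
        u = (uvw[:, 0] / uvw[:, 2]).round().astype(int)
        v = (uvw[:, 1] / uvw[:, 2]).round().astype(int)
        h, w = sil.shape
        inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
        fg = np.zeros(len(grid), dtype=bool)
        fg[inside] = sil[v[inside], u[inside]] > 0
        keep &= fg                                      # carve away voxels outside any silhouette
    return grid[keep]                                   # surviving voxels approximate the subject's shape
```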
  • Since the 3D model generated by the 3D model generation unit 2 is generated in units of time-series frames, it can also be regarded as a moving image of the 3D model. Further, since the 3D model is generated using images captured by the image pickup apparatuses 8B, it can also be called a live-action 3D model.
  • The 3D model can express shape information representing the surface shape of the subject 8A in the form of mesh data called a polygon mesh, which is expressed by vertices (Vertex) and the connections between them.
  • The method of expressing the 3D model is not limited to these, and the 3D model may be described by a so-called point cloud representation expressed by the position information of points.
  • Color information data is also generated as a texture in a form linked to these 3D shape data. For example, there are View Independent textures, whose color is constant when viewed from any direction, and View Dependent textures, whose color changes depending on the viewing direction.
  • the 3D model generation unit 2 has a mask area setting unit 2A as a functional block.
  • the mask area setting unit 2A sets the mask area for the shield existing between the actual camera and the target.
  • the 3D model generation unit 2 generates a 3D model based on a plurality of image data including image data in which a mask area is set.
  • The formatting unit 3 (encoding unit) converts the 3D model data generated by the 3D model generation unit 2 into a format suitable for transmission and storage.
  • the 3D model generated by the 3D model generation unit 2 may be converted into a plurality of two-dimensional images by perspectively projecting them from a plurality of directions.
  • the 3D model may be used to generate depth information which is a two-dimensional depth image from a plurality of viewpoints.
  • The depth information and the color information in this two-dimensional image form are compressed and output to the transmission unit 4.
  • The depth information and the color information may be arranged side by side and transmitted as one image, or may be transmitted as two separate images. In this case, since the data takes the form of two-dimensional image data, it can also be compressed using a two-dimensional compression technique such as AVC (Advanced Video Coding).
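  • A sketch of the side-by-side packing mentioned above, assuming the depth image has already been quantized to 8 bits (the file names are illustrative); the packed frame can then be compressed with an ordinary 2D codec such as AVC.

```python
import numpy as np
import cv2

color = cv2.imread("view0_color.png")                        # H x W x 3 color image
depth = cv2.imread("view0_depth.png", cv2.IMREAD_GRAYSCALE)  # H x W depth image (8-bit, assumption)

# Give the depth image three channels so it can sit next to the color image,
# then pack both into one frame for a 2D video codec.
depth_3ch = cv2.cvtColor(depth, cv2.COLOR_GRAY2BGR)
packed = np.hstack([color, depth_3ch])
cv2.imwrite("view0_packed.png", packed)
```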
  • the formatting unit 3 converts the mask area information indicating the mask area set by the mask area setting unit 2A into a predetermined format.
  • the mask area information is, for example, information indicating a mask area in the background two-dimensional image data, but is not limited to this.
  • Alternatively, for example, the 3D data may be converted into a point cloud format and output to the transmission unit 4 as three-dimensional data.
  • the Geometry-based-Approach 3D compression technique discussed in MPEG can be used.
  • the transmission unit 4 transmits the transmission data (including the mask area information) formed by the formatting unit 3 to the reception unit 5.
  • the transmission unit 4 transmits the transmission data to the reception unit 5 after performing a series of processes of the data acquisition unit 1, the 3D model generation unit 2 and the formatting unit 3 offline. Further, the transmission unit 4 may transmit the transmission data generated from the series of processes described above to the reception unit 5 in real time.
  • the receiving unit 5 receives the transmission data transmitted from the transmitting unit 4. As described above, the transmission data includes mask setting information. In this way, the receiving unit 5 functions as an acquisition unit for acquiring the mask setting information transmitted from the transmitting unit 4.
  • The rendering unit 6 performs rendering using the transmission data received by the receiving unit 5. For example, texture mapping is performed by projecting the mesh of the 3D model from the viewpoint of the camera that draws it and pasting a texture representing a color or pattern. The drawing viewpoint at this time can be set arbitrarily, regardless of the camera positions at the time of shooting, so that the model can be viewed from a free viewpoint. Further, although the details will be described later, the rendering unit 6 performs rendering excluding the mask area.
  • the rendering unit 6 performs texture mapping to paste a texture representing the color, pattern or texture of the mesh according to the position of the mesh of the 3D model, for example.
  • Texture mapping includes a so-called View Dependent method that considers the user's viewing viewpoint and a View Independent method that does not consider the user's viewing viewpoint.
  • the View Dependent method has the advantage of being able to achieve higher quality rendering than the View Independent method because the texture to be pasted on the 3D model changes according to the position of the viewing viewpoint.
  • the View Independent method has an advantage that the amount of processing is smaller than that of the View Dependent method because the position of the viewing viewpoint is not considered.
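  • One common way to realize View Dependent texturing, shown here only as a sketch and not as the method prescribed by this disclosure, is to weight each real camera's texture contribution by how closely its viewing direction of the surface point matches that of the virtual viewpoint (the cosine-based weighting is an assumption):

```python
import numpy as np

def view_dependent_weights(point, virtual_cam_pos, real_cam_positions):
    """Blend weights for the textures of the real cameras at one surface point."""
    view_dir = point - virtual_cam_pos
    view_dir /= np.linalg.norm(view_dir)
    weights = []
    for cam_pos in real_cam_positions:
        cam_dir = point - cam_pos
        cam_dir /= np.linalg.norm(cam_dir)
        # Cameras that see the point from nearly the same direction as the
        # virtual viewpoint get larger weights; opposite-facing cameras get zero.
        weights.append(max(np.dot(view_dir, cam_dir), 0.0))
    weights = np.array(weights)
    total = weights.sum()
    return weights / total if total > 0 else weights
```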
  • The viewing viewpoint data is input from the display device to the rendering unit 6 after the display device detects the user's viewing point (region of interest).
  • the rendering unit 6 may adopt, for example, billboard rendering that renders the object so that the object maintains a vertical posture with respect to the viewing viewpoint. For example, when rendering multiple objects, objects that are of less interest to the viewer may be rendered on the billboard, and other objects may be rendered using other rendering methods.
  • the display unit 7 displays the result rendered by the rendering unit 6 on the display unit 7 of the display device.
  • the display device may be a 2D monitor or a 3D monitor such as a head-mounted display, a spatial display, a mobile phone, a television, or a PC.
  • The information processing system of FIG. 7 shows a series of flows from the data acquisition unit 1, which acquires a captured image that serves as the material for generating content, to the display control unit, which controls the display device viewed by the user.
  • this does not mean that all functional blocks are required for the implementation of the present technology, and the present technology can be implemented for each functional block or a combination of a plurality of functional blocks.
  • For example, in FIG. 7, the transmission unit 4 and the reception unit 5 are provided to show a series of flows from the side that creates the content to the side that views the content through the distribution of the content data. However, when everything from content production to viewing is carried out on the same information processing device (for example, a personal computer), it is not necessary to include the encoding unit, the transmission unit 4, the decoding unit, or the reception unit 5.
  • the same implementer may implement everything, or different implementers may implement each functional block.
  • For example, it is conceivable that business operator A generates 3D content through the data acquisition unit 1, the 3D model generation unit 2, and the formatting unit 3, the 3D content is then distributed through the transmission unit 4 (platform) of business operator B, and the display device of business operator C performs reception, rendering, and display control of the 3D content.
  • each functional block can be implemented on the cloud.
  • The rendering by the rendering unit 6 may be performed in the display device or in a server. In that case, information is exchanged between the display device and the server.
  • FIG. 7 describes the data acquisition unit 1, the 3D model generation unit 2, the formatting unit 3, the transmission unit 4, the reception unit 5, the rendering unit 6, and the display unit 7 as an information processing system.
  • In the present specification, a system in which two or more functional blocks are involved is referred to as an information processing system.
  • For example, the data acquisition unit 1, the 3D model generation unit 2, the encoding unit, the transmission unit 4, the reception unit 5, the decoding unit, and the rendering unit 6, not including the display unit 7, can be collectively referred to as an information processing system.
  • the present disclosure can be configured as an information processing apparatus including any configuration among the configurations of the information processing system shown in FIG. 7.
  • For example, the present disclosure can be configured as an information processing device having all of the configurations shown in FIG. 7, as an information processing device having only the 3D model generation unit 2, or as an information processing device having the reception unit 5 and the rendering unit 6.
  • In FIG. 9A, for example, it is assumed that the actual camera RC1 captures a person TA as a target. A shield 41 exists between the actual camera RC1 and the person TA.
  • As shown in FIG. 9B, the image captured by the actual camera RC1 is an image in which the shield appears.
  • As shown in FIG. 9C, an image viewed from a virtual camera VC1 (virtual viewpoint), in which no obstruction appears, is created.
  • In FIG. 10A, there are actual cameras RC1 and RC2, and a shield 41 exists between the actual cameras RC1 and RC2 and the person TA.
  • FIG. 10B shows an image taken by the actual camera RC1
  • FIG. 10C shows an image taken by the actual camera RC2.
  • The position and range of the person TA to be modeled or rendered are set manually by the user, for example.
  • Alternatively, the position and range of the person TA may be set automatically.
  • The position and orientation of each actual camera can be determined by camera calibration performed in advance.
  • As a calibration method, Zhang's method using a chessboard is known.
  • A method other than Zhang's method can also be applied.
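  • A sketch of chessboard-based calibration in the spirit of Zhang's method using OpenCV (the board dimensions, square size, and directory name are assumptions); the estimated intrinsics, together with the per-view rotations and translations, give the position and orientation of the actual camera.

```python
import glob
import cv2
import numpy as np

pattern = (9, 6)      # inner corners of the chessboard (assumption)
square = 0.025        # square size in meters (assumption)
objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2) * square

obj_points, img_points = [], []
for path in glob.glob("calib_cam1/*.png"):            # chessboard shots from one camera
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, pattern)
    if found:
        obj_points.append(objp)
        img_points.append(corners)

# Intrinsic parameters plus rotation/translation (extrinsics) for each board view.
ret, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, gray.shape[::-1], None, None)
```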
  • The three-dimensional position of each point in the input images is estimated using the position information of each camera and the input image from each camera.
  • The three-dimensional position is estimated in units of pixels, for example.
  • When the estimated three-dimensional position of a pixel lies between the camera and the target, the pixel is regarded as belonging to a shield and is set as part of the mask area.
  • The above processing is performed in units of pixels.
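  • The per-pixel decision rule can be sketched as follows, assuming a per-pixel distance estimate is available for a camera image along with the distance from that camera to the target (the margin parameter is an assumption added to absorb estimation noise):

```python
import numpy as np

def build_mask(depth_map, target_distance, margin=0.1):
    """depth_map: per-pixel distance from the camera (e.g. from multi-view estimation).
    target_distance: distance from this camera to the target area.
    Pixels whose estimated 3D position lies between the camera and the target
    are regarded as a shield and become part of the mask area."""
    mask = depth_map < (target_distance - margin)   # margin guards against depth noise (assumption)
    return mask.astype(np.uint8) * 255              # binary mask image, 255 = mask area
```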
  • FIG. 11A is the same diagram as FIG. 10A.
  • FIG. 11B is a diagram schematically showing the mask region MA4 set in the image obtained by the actual camera RC1.
  • FIG. 11C is a diagram schematically showing the mask region MA5 set in the image obtained by the actual camera RC2.
  • the mask area can be automatically set.
  • the rendering unit 6 excludes the set mask area and renders an area other than the mask area.
  • Here, rendering with the mask area excluded means that the texture obtained from the image captured by that actual camera is not used for the mask area. In other words, the mask area may be rendered with a texture obtained from an image other than the image captured by that actual camera.
  • the person TA is photographed by the actual cameras RC1 to RC3. Further, the virtual camera VC1 is arranged in the virtual space at a position corresponding to the virtual viewpoint. There is a shield 41 between the actual camera RC3 and the person TA.
  • FIG. 13A shows the photographed image IM4A obtained by the actual camera RC1
  • FIG. 13B shows the photographed image IM4B obtained by the actual camera RC2
  • FIG. 13C shows the photographed image IM4C obtained by the actual camera RC3.
  • the mask area MA6 is set at the place of the shield 41 in the captured image IM4C.
  • For the mask region, the rendering unit 6 renders a texture estimated on the basis of pixels in other regions. For example, the texture is estimated on the basis of how the mask area appears in the image of a camera located close to the virtual viewpoint.
  • Specifically, the texture is estimated using the pixels of the regions corresponding to the mask area in the images captured by the actual cameras RC1 and RC2, which are close to the virtual camera VC1, together with the pixels in the region around the mask area MA6 in the image captured by the actual camera RC3.
  • The rendering unit 6 then performs rendering using the estimated texture.
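  • As one simple, illustrative stand-in for this texture estimation, the mask region of a single image can be filled from its surrounding pixels by inpainting (the file names are assumptions); estimating from the images of cameras close to the virtual viewpoint, as described above, would instead reproject the corresponding regions from those images.

```python
import cv2

image_rc3 = cv2.imread("camera_RC3.png")                       # image containing the shield
mask_ma6 = cv2.imread("mask_MA6.png", cv2.IMREAD_GRAYSCALE)    # 255 where the mask area MA6 is

# Estimate the texture of the mask area from the surrounding pixels of the same image.
filled = cv2.inpaint(image_rc3, mask_ma6, inpaintRadius=3, flags=cv2.INPAINT_TELEA)
cv2.imwrite("camera_RC3_filled.png", filled)
```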
  • FIG. 14 shows a video of two athletes engaged in martial arts inside a polygonal wire-mesh enclosure and spectators watching the match from outside the enclosure.
  • This example illustrates a case in which the processing applied to a mask area differs depending on whether the mask area is in front of or behind the two target athletes.
  • The mask area MA7 is set on the front side of the two athletes, and the mask area MA8 is set on the rear side. In the area corresponding to the mask area MA7, there is no corresponding 3D model, and no texture is pasted there. In the mask area MA8, on the other hand, a 3D model exists, and a 3D model created from other images and the texture associated with that 3D model are pasted. This makes it possible to create a more accurate free-viewpoint video.
  • In step S101, the data acquisition unit 1 acquires image data for generating a 3D model of the target subject.
  • In step S102, the mask area setting unit 2A sets the mask area, and the 3D model generation unit 2 generates a model having the three-dimensional information of the subject based on the image data for generating the 3D model of the subject. That is, modeling using the mask area is performed.
  • In step S103, the formatting unit 3 encodes the shape and texture data of the 3D model generated by the 3D model generation unit 2, together with the mask area information, into a format suitable for transmission and storage.
  • In step S104, the transmission unit 4 transmits the encoded data.
  • In step S105, the reception unit 5 receives the transmitted data.
  • In step S106, a decoding unit (not shown) performs decoding processing and converts the data into the shape and texture data necessary for display, and the rendering unit 6 performs rendering using the shape and texture data and the mask area information.
  • In step S107, the display unit 7 displays the rendered result.
  • In step S201, camera calibration regarding the position and orientation of each actual camera is performed.
  • The camera calibration process is usually performed when creating a free-viewpoint video. For example, the positions and orientations of the cameras relative to one another are estimated using captured images of a calibration board. Then, the process proceeds to step S202.
  • In step S202, the target area is set.
  • The target area is, for example, the area for which modeling and rendering are desired.
  • The area may be set manually by the user or may be set automatically. When it is set automatically, for example, if a camera path of the free-viewpoint video exists in advance, the area near the focal point of the virtual camera is automatically set as the target area. Then, the process proceeds to step S203.
  • In step S203, the three-dimensional position of each pixel of each camera image is estimated using the position information of the plurality of cameras obtained in step S201. Then, the process proceeds to step S204.
  • In step S204, when the position information of a pixel obtained in step S203 lies between the position of the camera and the position of the target, the corresponding area is regarded as a shield and set as a mask area. After that, the normal free-viewpoint video modeling and rendering processing operates. As described above, the mask area is not used in modeling and rendering.
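  • A sketch tying steps S201 to S204 together; the calibration, target-area, and depth-estimation helpers are assumed stand-ins and not functions defined by this disclosure.

```python
import numpy as np

def set_mask_areas(images, calibrate, set_target_area, estimate_depth):
    """Illustrative pipeline for steps S201-S204. The three callables are assumed helpers:
    calibrate(images) -> list of camera poses (each with a .position) (S201),
    set_target_area(images, poses) -> target region (with a .center) (S202),
    estimate_depth(image, poses) -> per-pixel distance map for that camera (S203)."""
    poses = calibrate(images)                            # S201: camera calibration
    target = set_target_area(images, poses)              # S202: set the target area (manual or automatic)
    masks = []
    for img, pose in zip(images, poses):
        depth = estimate_depth(img, poses)               # S203: 3D position (distance) of each pixel
        dist_to_target = np.linalg.norm(target.center - pose.position)
        mask = depth < dist_to_target                    # S204: pixel lies between camera and target
        masks.append(mask.astype(np.uint8) * 255)        # such pixels form the mask area (shield)
    return masks
```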
  • FIG. 17 is a block diagram showing an example of hardware configuration of a computer that executes the above-mentioned series of processes programmatically.
  • In the computer, a CPU (Central Processing Unit) 11, a ROM (Read Only Memory), and a RAM (Random Access Memory) 13 are interconnected via a bus 14.
  • the input / output interface 15 is also connected to the bus 14.
  • An input unit 16, an output unit 17, a storage unit 18, a communication unit 19, and a drive 20 are connected to the input / output interface 15.
  • the input unit 16 includes, for example, a keyboard, a mouse, a microphone, a touch panel, an input terminal, and the like.
  • the output unit 17 includes, for example, a display, a speaker, an output terminal, and the like.
  • the storage unit 18 is composed of, for example, a hard disk, a RAM disk, a non-volatile memory, or the like.
  • the communication unit 19 is composed of, for example, a network interface.
  • the drive 20 drives a removable medium such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory.
  • In the computer configured as described above, the CPU 11 loads the program stored in the storage unit 18 into the RAM 13 via the input/output interface 15 and the bus 14 and executes it, whereby the above-described series of processes is performed.
  • the RAM 13 also appropriately stores data and the like necessary for the CPU 11 to execute various processes.
  • The program executed by the computer can be provided by, for example, being recorded on removable media such as package media.
  • the program can be installed in the storage unit 18 via the input / output interface 15 by mounting the removable media in the drive 20.
  • the program can also be provided via wired or wireless transmission media such as local area networks, the Internet, and digital satellite broadcasts. In that case, the program can be received by the communication unit 19 and installed in the storage unit 18.
  • a part of the configuration and functions of the information processing device according to the above-described embodiment may exist in a device different from the information processing device (for example, a server device on a network).
  • the program that realizes the above-mentioned function may be executed in any device.
  • the device may have the necessary functional blocks so that the necessary information can be obtained.
  • each step of one flowchart may be executed by one device, or may be shared and executed by a plurality of devices.
  • one device may execute the plurality of processes, or the plurality of devices may share and execute the plurality of processes.
  • a plurality of processes included in one step can be executed as processes of a plurality of steps.
  • the processes described as a plurality of steps can be collectively executed as one step.
  • The processing of the steps describing the program may be executed in chronological order in the order described in the present specification, may be executed in parallel, or may be executed individually at a required timing, such as when a call is made. That is, as long as no contradiction arises, the processes of the steps may be executed in an order different from the order described above. Further, the processing of the steps describing this program may be executed in parallel with the processing of another program, or may be executed in combination with the processing of another program. Further, for example, a plurality of techniques related to the present technology can each be implemented independently as long as no contradiction arises. Of course, any plurality of the present technologies can also be used in combination.
  • the present disclosure may also adopt the following configuration.
  • An information processing device including: a mask area setting unit that sets a mask area for an obstruction existing between an actual camera and a target; and a 3D model generation unit that generates a 3D model based on a plurality of pieces of image data including image data in which the mask area is set.
  • The information processing device, wherein the 3D model generation unit generates the 3D model based on a silhouette image in which the mask area is extracted as a foreground and other silhouette images in which the foreground and the background are separated.
  • (3) The information processing apparatus according to (2), wherein the silhouette of the mask area extracted as the foreground is removed based on the silhouettes of the other silhouette images.
  • The information processing apparatus according to any one of (1) to (3), wherein the mask area setting unit obtains three-dimensional position information of the image captured by the actual camera pixel by pixel, based on the position and orientation information of the actual camera estimated by camera calibration, and determines a pixel to be a shield when the three-dimensional position information of the pixel lies between the actual camera and the target.
  • The information processing apparatus according to any one of (1) to (5), which has a rendering unit that performs rendering excluding the mask area.
  • The information processing apparatus, wherein the rendering unit renders, for the mask area, a 3D model generated in advance and a texture associated with that 3D model.
  • A 3D model generation method in which a mask area setting unit sets a mask area for an obstruction existing between an actual camera and a target, and a 3D model generation unit generates a 3D model based on a plurality of pieces of image data including image data in which the mask area is set.
  • A program that causes a computer to execute a 3D model generation method in which a mask area setting unit sets a mask area for an obstruction existing between an actual camera and a target, and a 3D model generation unit generates a 3D model based on a plurality of pieces of image data including image data in which the mask area is set.
  • An information processing device having: an acquisition unit that acquires mask area information indicating a mask area set for a shield existing between an actual camera and a target; and a rendering unit that performs rendering excluding the mask area.
  • The information processing apparatus according to (11), wherein the rendering unit renders, for the mask region, a texture estimated based on pixels in other regions.
  • the information processing apparatus according to (12), wherein the other region is a region around the mask region.
  • The information processing apparatus according to (12), wherein the other region is a region corresponding to the mask region in an image obtained by a real camera close to a virtual camera.
  • the rendering unit renders a 3D model generated in advance in the mask area and a texture associated with the 3D model.
  • The information processing apparatus according to any one of (11) to (15), wherein a first mask area is set on the front side of the target as seen from a predetermined virtual viewpoint, a second mask area is set on the rear side of the target, and different rendering processes are performed on the first mask area and the second mask area.
  • An information processing method in which an acquisition unit acquires mask area information indicating a mask area set for an obstruction existing between an actual camera and a target, and a rendering unit performs rendering excluding the mask area.
  • A program that causes a computer to execute an information processing method in which an acquisition unit acquires mask area information indicating a mask area set for an obstruction existing between an actual camera and a target, and a rendering unit performs rendering excluding the mask area.
  • For example, new video content may be created by combining the 3D model of the subject generated in the above-described embodiment with 3D data managed by another server. Further, for example, when background data acquired by an image pickup device such as LiDAR exists, content in which the subject appears as if it were in the place indicated by the background data can be created by combining the 3D model of the subject generated in the above-described embodiment with the background data.
  • the video content may be a three-dimensional video content or a two-dimensional video content converted into two dimensions.
  • the 3D model of the subject generated in the above-described embodiment includes, for example, a 3D model generated by the 3D model generation unit 2 and a 3D model reconstructed by the rendering unit 6.
  • For example, a subject (for example, a performer) generated in the present embodiment can be placed in a virtual space in which the user acts as an avatar and communicates.
  • the user can act as an avatar and view the live-action subject in the virtual space.
  • A user in a remote location can view the 3D model of the subject through a playback device at that remote location.
  • the subject and a user in a remote place can communicate in real time.
  • For example, it can be assumed that the subject is a teacher and the user is a student, or that the subject is a doctor and the user is a patient.
  • A free-viewpoint video of sports or the like can be distributed based on the 3D models of a plurality of subjects generated in the above-described embodiment, and an individual can also distribute himself or herself, as a 3D model generated in the above-described embodiment, to a distribution platform.
  • the contents of the embodiments described in the present specification can be applied to various techniques and services.

Abstract

The present invention properly generates 3D models, for example. An information processing device according to the present invention has a mask region setting unit that sets a mask region with respect to an obstruction that is present between a real camera and a target, and a 3D model generation unit that generates a 3D model on the basis of multiple pieces of image data, including image data in which a mask region is set.

Description

Information processing device, 3D model generation method, information processing method, and program
 This disclosure relates to an information processing device, a 3D model generation method, an information processing method, and a program.
 Patent Document 1 discloses a technique for drawing an object as a three-dimensional model.
Japanese Patent No. 5483761
 When generating a 3D model, there is a risk that an unnatural 3D model will be generated if an appropriate mask area is not set. Conventionally, since it was necessary to manually set this mask area, a method capable of efficiently generating a three-dimensional model has been desired.
 One of the purposes of the present disclosure is to provide an information processing device that can automatically set a mask area, a 3D model generation method, an information processing method, and a program.
 The present disclosure is, for example, an information processing apparatus having a mask area setting unit that sets a mask area for an obstruction existing between an actual camera and a target, and a 3D model generation unit that generates a 3D model based on a plurality of pieces of image data including image data in which the mask area is set.
 The present disclosure is, for example, a 3D model generation method in which a mask area setting unit sets a mask area for an obstruction existing between an actual camera and a target, and a 3D model generation unit generates a 3D model based on a plurality of pieces of image data including image data in which the mask area is set.
 The present disclosure is, for example, a program that causes a computer to execute a 3D model generation method in which a mask area setting unit sets a mask area for an obstruction existing between an actual camera and a target, and a 3D model generation unit generates a 3D model based on a plurality of pieces of image data including image data in which the mask area is set.
 The present disclosure is, for example, an information processing method in which an acquisition unit acquires mask area information indicating a mask area set for an obstruction existing between an actual camera and a target, and a rendering unit performs rendering excluding the mask area.
 The present disclosure is, for example, a program that causes a computer to execute an information processing method in which an acquisition unit acquires mask area information indicating a mask area set for an obstruction existing between an actual camera and a target, and a rendering unit performs rendering excluding the mask area.
FIGS. 1A to 1C are views which are referred to when the outline of the present disclosure is explained. FIGS. 2A to 2C are diagrams referred to when the outline of the present disclosure is explained. FIGS. 3A to 3C are views which are referred to when the outline of the present disclosure is explained. FIG. 4 is a diagram referred to when the outline of the present disclosure is explained. FIGS. 5A to 5C are views which will be referred to when the outline of the present disclosure is explained. FIG. 6 is a diagram referred to when the outline of the present disclosure is explained. FIG. 7 is a block diagram showing a configuration example of the information processing system according to the embodiment. FIG. 8 is a diagram for explaining an example of one process performed in an information processing system. FIGS. 9A to 9C are diagrams referred to when the processing performed by the mask area setting unit according to the embodiment is described. FIGS. 10A to 10C are diagrams referred to when the processing performed by the mask area setting unit according to the embodiment is described. FIGS. 11A to 11C are diagrams that are referred to when the processing performed by the mask area setting unit according to the embodiment is described. FIG. 12 is a diagram referred to when an example of using the mask area at the time of rendering is explained. FIGS. 13A to 13C are diagrams referred to when an example of using the mask area at the time of rendering is described. FIG. 14 is a diagram referred to when an example of using the mask area at the time of rendering is described. FIG. 15 is a flowchart for explaining an operation example of the information processing apparatus according to the embodiment. FIG. 16 is a flowchart for explaining an operation example of the mask area setting unit according to the embodiment. FIG. 17 is a diagram showing a configuration example when the processing performed by the information processing system according to the embodiment is configured in terms of hardware.
Hereinafter, embodiments and the like of the present disclosure will be described with reference to the drawings. The explanation will be given in the following order.
<Summary of this disclosure>
<One Embodiment>
<Modification example>
<Application example>
The embodiments and the like described below are suitable specific examples of the present disclosure, and the contents of the present disclosure are not limited to these embodiments and the like.
<Summary of this disclosure>
 First, the outline of the present disclosure will be described while touching on the issues to be considered in the present disclosure. As one method of generating a three-dimensional model (hereinafter referred to as a 3D model, as appropriate), a method is known in which a mesh is created by modeling with Visual Hull and a 3D model is generated by performing texture mapping on the mesh. In generating a 3D model, the subject and the background are separated for each of a plurality of pieces of 2D image data; for example, a binary image called a silhouette image, in which the silhouette of the subject is represented in white and the other areas in black, is used.
 Here, if there is a shield between the camera and the subject, there is a risk that an inappropriate 3D model will be generated. This point will be specifically described with reference to FIGS. 1 to 3. FIG. 1A shows two-dimensional image data IM1A including a target (a person TA in this example) for which a 3D model is to be generated. FIGS. 2A and 3A show two-dimensional image data IM2A and IM3A captured by cameras different from the camera that captured the two-dimensional image data shown in FIG. 1A. As shown in FIG. 3A, a bar BA, which is an example of a shield, exists between the camera that captured the two-dimensional image data IM3A and the person TA.
 By performing processing that separates the background image data IM1B shown in FIG. 1B from the two-dimensional image data IM1A shown in FIG. 1A, a silhouette image SI1 (see FIG. 1C) in which the person TA and the background are separated is obtained. Similarly, by separating the background image data IM2B shown in FIG. 2B from the two-dimensional image data IM2A shown in FIG. 2A, a silhouette image SI2 (see FIG. 2C) in which the person TA and the background are separated is obtained.
 Likewise, by separating the background image data IM3B shown in FIG. 3B from the two-dimensional image data IM3A shown in FIG. 3A, a silhouette image SI3 (see FIG. 3C) in which the person TA and the background are separated is obtained. Here, when the background image data IM3B is subtracted from the two-dimensional image data IM3A, the bar BA disappears, so the area of the bar BA is regarded as background. That is, in the silhouette image SI3 shown in FIG. 3C, the bar BA is represented as background, that is, in black.
 FIG. 4 shows an example of a 3D model generated using the silhouette images SI1 to SI3. Although more silhouette images are actually used to generate a 3D model, the 3D model here is generated using only the silhouette images SI1 to SI3 in order to simplify the explanation. If a shield such as the bar BA exists between the camera and the person TA, the silhouette cannot be acquired correctly, as in the silhouette image SI3, so the resulting 3D model becomes unnatural. For example, as shown in FIG. 4, the body becomes a 3D model divided into upper and lower parts.
 Therefore, in the present embodiment, the shield portion is set as a mask area. Specifically, as shown in FIG. 5B, the portion of the bar BA is set as the mask area MA. Since the portion set as the mask area MA is excluded from the background subtraction processing, the silhouette there can be extracted as foreground. FIG. 5C shows the silhouette image SI3' obtained when the mask area MA is set.
 A 3D model is generated using the silhouette images SI1, SI2, and SI3'. FIG. 6 shows an example of a 3D model generated using the silhouette images SI1, SI2, and SI3'. In the silhouette image SI3', the silhouettes (white portions) of the person TA and the bar BA overlap. However, in the Visual Hull carving process that uses the silhouette images SI1 and SI2, which correspond to two-dimensional image data from other cameras, that is, other viewpoints in which the shield does not appear, the portion of the bar BA is carved away. As a result, an appropriate 3D model corresponding to the person TA is obtained.
 It is preferable that the above-mentioned mask area can be set automatically, because automatically setting the mask area allows the 3D model to be generated efficiently.
 When a mask area is set, it also needs to be taken into account in the rendering processing performed in the course of generating the 3D model. That is, since no texture exists for the mask area, it is desirable that the texture for the mask area can be rendered appropriately. The details of the rendering processing when a mask area exists will be described later. Based on the above, an embodiment of the present disclosure will be described in detail.
<One Embodiment>
[Overview of information processing system]
 FIG. 7 shows an outline of an information processing system to which the present technology is applied. The data acquisition unit 1 acquires image data for generating a 3D model of the subject. For example, as shown in FIG. 8, a plurality of viewpoint images captured by a plurality of image pickup devices 8B arranged so as to surround the subject 8A are acquired as image data. In this case, the plurality of viewpoint images are preferably images captured by a plurality of cameras in synchronization. Further, the data acquisition unit 1 may acquire, for example, a plurality of viewpoint images obtained by capturing the subject 8A from a plurality of viewpoints with one camera as image data.
 The data acquisition unit 1 may perform calibration based on the image data and acquire the internal parameters and the external parameters of each image pickup apparatus 8B. Further, the data acquisition unit 1 may acquire, for example, a plurality of pieces of depth information indicating the distances from a plurality of viewpoints to the subject 8A.
 The 3D model generation unit 2 generates a model having the three-dimensional information of the subject 8A based on the image data for generating the 3D model of the subject 8A. The 3D model generation unit 2 generates the 3D model of the subject 8A by, for example, carving the three-dimensional shape of the subject 8A with so-called Visual Hull, using images from a plurality of viewpoints (for example, silhouette images from a plurality of viewpoints). In this case, the 3D model generation unit 2 can further deform the 3D model generated with Visual Hull with high accuracy by using a plurality of pieces of depth information indicating the distances from a plurality of viewpoints to the subject 8A. Since the 3D model generated by the 3D model generation unit 2 is generated in units of time-series frames, it can also be regarded as a moving image of the 3D model. Further, since the 3D model is generated using images captured by the image pickup apparatuses 8B, it can also be called a live-action 3D model. The 3D model can express shape information representing the surface shape of the subject 8A in the form of mesh data called a polygon mesh, which is expressed by vertices (Vertex) and the connections between them. The method of expressing the 3D model is not limited to these, and the 3D model may be described by a so-called point cloud representation expressed by the position information of points.
 Color information data is also generated as a texture in a form linked to these 3D shape data. For example, there are View Independent textures, whose color is constant when viewed from any direction, and View Dependent textures, whose color changes depending on the viewing direction.
 The 3D model generation unit 2 has a mask area setting unit 2A as a functional block. The mask area setting unit 2A sets the mask area for the shield existing between the actual camera and the target. The 3D model generation unit 2 generates a 3D model based on a plurality of image data including image data in which a mask area is set.
 The formatting unit 3 (encoding unit) converts the 3D model data generated by the 3D model generation unit 2 into a format suitable for transmission and storage. For example, the 3D model generated by the 3D model generation unit 2 may be converted into a plurality of two-dimensional images by perspective projection from a plurality of directions. In this case, the 3D model may be used to generate depth information, which is a two-dimensional depth image from a plurality of viewpoints. The depth information and the color information in this two-dimensional image form are compressed and output to the transmission unit 4. The depth information and the color information may be arranged side by side and transmitted as one image, or may be transmitted as two separate images. In this case, since the data takes the form of two-dimensional image data, it can also be compressed using a two-dimensional compression technique such as AVC (Advanced Video Coding). Further, in the present embodiment, the formatting unit 3 converts the mask area information indicating the mask area set by the mask area setting unit 2A into a predetermined format. The mask area information is, for example, information indicating the mask area in the background two-dimensional image data, but is not limited to this.
 Alternatively, for example, the 3D data may be converted into a point cloud format and output to the transmission unit 4 as three-dimensional data. In this case, for example, the Geometry-based Approach 3D compression technique discussed in MPEG can be used.
 送信部4は、フォーマット化部3で形成された伝送データ(マスク領域情報を含む)を受信部5に送信する。送信部4は、データ取得部1、3Dモデル生成部2とフォーマット化部3の一連の処理をオフラインで行った後に、伝送データを受信部5に伝送する。また、送信部4は、上述した一連の処理から生成された伝送データをリアルタイムに受信部5に伝送してもよい。 The transmission unit 4 transmits the transmission data (including the mask area information) formed by the formatting unit 3 to the reception unit 5. The transmission unit 4 transmits the transmission data to the reception unit 5 after performing a series of processes of the data acquisition unit 1, the 3D model generation unit 2 and the formatting unit 3 offline. Further, the transmission unit 4 may transmit the transmission data generated from the series of processes described above to the reception unit 5 in real time.
 受信部5は、送信部4から伝送された伝送データを受信する。上述したように、伝送データにはマスク設定情報が含まれる。このように、受信部5は、送信部4から送信されるマスク設定情報を取得する取得部として機能する。 The receiving unit 5 receives the transmission data transmitted from the transmitting unit 4. As described above, the transmission data includes mask setting information. In this way, the receiving unit 5 functions as an acquisition unit for acquiring the mask setting information transmitted from the transmitting unit 4.
 The rendering unit 6 performs rendering using the transmission data received by the reception unit 5. For example, the mesh of the 3D model is projected from the viewpoint of the camera that draws it, and texture mapping is performed to paste a texture representing color and pattern onto the mesh. The drawing viewpoint at this time can be set arbitrarily, regardless of the camera positions at the time of shooting, so the scene can be viewed from a free viewpoint. As will be described in detail later, the rendering unit 6 performs rendering excluding the mask area.
 The rendering unit 6 performs, for example, texture mapping in which a texture representing the color, pattern, or material appearance of the mesh is pasted according to the position of the mesh of the 3D model. Texture mapping methods include the so-called View Dependent method, which takes the user's viewing viewpoint into account, and the View Independent method, which does not. The View Dependent method changes the texture pasted on the 3D model according to the position of the viewing viewpoint, and therefore has the advantage of achieving higher-quality rendering than the View Independent method. The View Independent method, on the other hand, does not consider the position of the viewing viewpoint and therefore has the advantage of a smaller processing load than the View Dependent method. The viewing-viewpoint data is input to the rendering unit 6 from the display device, which detects the user's viewing location (region of interest). The rendering unit 6 may also adopt, for example, billboard rendering, in which an object is rendered so that it keeps an upright posture with respect to the viewing viewpoint. For example, when rendering a plurality of objects, objects of low interest to the viewer may be rendered as billboards while the other objects are rendered with other rendering methods.
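 The difference between the two texture-mapping methods can be made concrete with a small weighting function of the kind often used for View Dependent texturing: a View Independent renderer would use fixed weights, while a View Dependent one recomputes them for each viewing viewpoint. The sketch below is illustrative only; the cosine weighting and the sharpness exponent are assumptions, not a method prescribed by this disclosure.

```python
import numpy as np

def view_dependent_weights(view_dir: np.ndarray,
                           cam_dirs: np.ndarray,
                           sharpness: float = 8.0) -> np.ndarray:
    """Blending weights for View Dependent texturing: cameras whose viewing
    direction is close to the current virtual viewing direction contribute
    more to the texture of a surface point."""
    v = view_dir / np.linalg.norm(view_dir)
    c = cam_dirs / np.linalg.norm(cam_dirs, axis=1, keepdims=True)
    cos_sim = np.clip(c @ v, 0.0, None)          # cameras facing away get zero
    w = cos_sim ** sharpness
    total = w.sum()
    return w / total if total > 0 else np.full(len(w), 1.0 / len(w))
```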
 The display unit 7 displays the result rendered by the rendering unit 6 on the display device. The display device may be a 2D monitor or a 3D monitor, for example a head-mounted display, a spatial display, a mobile phone, a television, or a PC.
 The information processing system of FIG. 7 shows the series of steps from the data acquisition unit 1, which acquires the captured images that are the material for generating content, to the display control unit, which controls the display device viewed by the user. However, this does not mean that every functional block is required to implement the present technology; the present technology can be implemented with individual functional blocks or combinations of functional blocks. For example, FIG. 7 includes the transmission unit 4 and the reception unit 5 in order to show the flow from the side that creates the content to the side that views it through distribution of the content data, but when content production and viewing are carried out on the same information processing device (for example, a personal computer), the encoding unit, the transmission unit 4, the decoding unit, and the reception unit 5 need not be provided.
 When this information processing system is implemented, all of it may be carried out by the same implementer, or each functional block may be carried out by a different implementer. As one example, a business operator A generates 3D content through the data acquisition unit 1, the 3D model generation unit 2, and the formatting unit 3; the 3D content is then distributed through the transmission unit 4 (platform) of a business operator B; and the display device of a business operator C performs reception, rendering, and display control of the 3D content.
 Each functional block can also be implemented on the cloud. For example, the rendering unit 6 may be implemented in the display device or on a server. In that case, information is exchanged between the display device and the server.
 In FIG. 7, the data acquisition unit 1, the 3D model generation unit 2, the formatting unit 3, the transmission unit 4, the reception unit 5, the rendering unit 6, and the display unit 7 have been described collectively as an information processing system. In this specification, however, any arrangement involving two or more functional blocks is referred to as an information processing system; for example, the data acquisition unit 1, the 3D model generation unit 2, the encoding unit, the transmission unit 4, the reception unit 5, the decoding unit, and the rendering unit 6, without the display unit 7, can also collectively be referred to as an information processing system. The present disclosure can also be configured as an information processing device including any subset of the configuration of the information processing system shown in FIG. 7. For example, the present disclosure can be configured as an information processing device having the entire configuration shown in FIG. 7, an information processing device having only the 3D model generation unit 2, or an information processing device having the reception unit 5 and the rendering unit 6.
[Operation example of the mask area setting unit]
 Next, an operation example of the mask area setting unit 2A will be described. As shown in FIG. 9A, assume, for example, that the actual camera RC1 captures a person TA as the target. An obstruction 41 exists between the actual camera RC1 and the person TA. As shown in FIG. 9B, the image captured by the actual camera RC1 therefore contains the obstruction. In this example, as shown in FIG. 9C, an image viewed from a virtual camera VC1 (virtual viewpoint) that is free of the obstruction is created.
 A more specific description follows. As shown in FIG. 10A, there are actual cameras RC1 and RC2. An obstruction 41 exists between the actual cameras RC1 and RC2 and the person TA. FIG. 10B shows the image captured by the actual camera RC1, and FIG. 10C shows the image captured by the actual camera RC2.
 The position and range of the person TA to be modeled and rendered are set manually by the user, for example. The person TA may also be set automatically.
 The position and orientation of each actual camera (hereinafter collectively referred to as position and orientation) can be determined by camera calibration performed in advance. As a camera calibration technique, Zhang's method using a chessboard is well known. Of course, techniques other than Zhang's method are also applicable, for example: obtaining the parameters by imaging a three-dimensional object; obtaining the parameters by imaging two light rays directed straight at the camera; projecting feature points with a projector and obtaining the parameters from the projected image; or waving an LED (Light Emitting Diode) light and imaging the point light source to obtain the parameters.
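 For reference, a common way to carry out the Zhang-style chessboard calibration mentioned here is sketched below using OpenCV. This is a generic sketch, not the calibration procedure of the disclosed system; the board size, the square size, and the choice of OpenCV are assumptions introduced for illustration.

```python
import cv2
import numpy as np

def calibrate_with_chessboard(images, pattern_size=(9, 6), square_size=0.025):
    """Zhang-style calibration: detect chessboard corners in several views and
    estimate the intrinsics, distortion, and per-view pose of the camera."""
    objp = np.zeros((pattern_size[0] * pattern_size[1], 3), np.float32)
    objp[:, :2] = np.mgrid[0:pattern_size[0], 0:pattern_size[1]].T.reshape(-1, 2)
    objp *= square_size                                   # board geometry in metres
    obj_points, img_points = [], []
    image_size = None
    for img in images:
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        image_size = gray.shape[::-1]
        found, corners = cv2.findChessboardCorners(gray, pattern_size)
        if found:
            obj_points.append(objp)
            img_points.append(corners)
    rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
        obj_points, img_points, image_size, None, None)
    return K, dist, rvecs, tvecs
```

 Running this per camera yields the intrinsics and, together with a shared calibration target, the relative positions and orientations needed later for the mask area determination.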
 Using the position information of each camera and the input image from each camera, the three-dimensional position within the input image is estimated. The three-dimensional position is estimated, for example, on a per-pixel basis. When the three-dimensional position lies between the actual camera and the person TA, the pixel is regarded as an obstruction and is set as part of the mask area. The above processing is performed pixel by pixel.
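 The per-pixel decision described here can be reduced to a simple depth comparison, sketched below under the simplifying assumptions that the target is approximated by a single reference point and that a per-pixel depth estimate along each camera ray is already available; the function name and the margin parameter are illustrative.

```python
import numpy as np

def obstruction_mask(pixel_depth: np.ndarray,
                     cam_pos: np.ndarray,
                     target_pos: np.ndarray,
                     margin: float = 0.1) -> np.ndarray:
    """Flag pixels whose estimated 3D point lies between the camera and the
    target: these are treated as an obstruction and become the mask area."""
    target_dist = np.linalg.norm(np.asarray(target_pos, float) -
                                 np.asarray(cam_pos, float))
    return pixel_depth < (target_dist - margin)   # boolean mask image
```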
 FIG. 11A is the same drawing as FIG. 10A. FIG. 11B schematically shows the mask area MA4 set in the image obtained by the actual camera RC1, and FIG. 11C schematically shows the mask area MA5 set in the image obtained by the actual camera RC2. In this way, in the present embodiment, the mask areas can be set automatically.
(Example of using the mask area during modeling)
 An example of using the mask area set as described above will now be given. The area set as the mask area is excluded from the modeling process. Specifically, as described above, the mask area is extracted as foreground. The obstruction is then carved away by silhouette images based on image data captured from viewpoints that do not include the obstruction, which makes it possible to generate a 3D model that is not occluded by the obstruction.
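 A minimal voxel-based visual hull with this behaviour might look like the sketch below, assuming each camera is described by a 3x4 projection matrix and each silhouette is a binary image in which the mask area has already been filled in as foreground; the function name and the voxel-grid representation are assumptions for illustration.

```python
import numpy as np

def carve_visual_hull(voxels_xyz: np.ndarray, proj_mats, silhouettes) -> np.ndarray:
    """Keep only voxels whose projection lands on foreground in every silhouette.
    A camera whose view is partly occluded has its mask area filled in as
    foreground, so it cannot wrongly carve away the target behind the obstruction."""
    homog = np.hstack([voxels_xyz, np.ones((len(voxels_xyz), 1))])    # N x 4
    keep = np.ones(len(voxels_xyz), dtype=bool)
    for P, sil in zip(proj_mats, silhouettes):                        # P: 3x4, sil: HxW (0/1)
        uvw = homog @ P.T
        with np.errstate(divide="ignore", invalid="ignore"):
            u = uvw[:, 0] / uvw[:, 2]
            v = uvw[:, 1] / uvw[:, 2]
        h, w = sil.shape
        inside = (uvw[:, 2] > 0) & (u >= 0) & (u < w) & (v >= 0) & (v < h)
        fg = np.zeros(len(voxels_xyz), dtype=bool)
        fg[inside] = sil[v[inside].astype(int), u[inside].astype(int)] > 0
        keep &= fg | ~inside          # voxels projecting outside an image are not carved
    return voxels_xyz[keep]
```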
(Example of using the mask area during rendering)
 An example of using the mask area at rendering time will be described with reference to FIGS. 12 to 14. The rendering unit 6 excludes the set mask area and renders the areas other than the mask area. Here, rendering with the mask area excluded means that the texture obtained from the image captured by that actual camera is not used for the mask area; in other words, the mask area may instead be rendered with a texture obtained from something other than that captured image.
 As shown in FIG. 12, the person TA is photographed by the actual cameras RC1 to RC3, and a virtual camera VC1 is placed in the virtual space at a position corresponding to the virtual viewpoint. An obstruction 41 exists between the actual camera RC3 and the person TA.
 FIG. 13A shows the captured image IM4A obtained by the actual camera RC1, FIG. 13B shows the captured image IM4B obtained by the actual camera RC2, and FIG. 13C shows the captured image IM4C obtained by the actual camera RC3. Through the processing by the mask area setting unit 2A described above, the mask area MA6 is set at the location of the obstruction 41 in the captured image IM4C. No usable texture exists in this mask area. The rendering unit 6 therefore renders, for the mask area, a texture estimated from pixels of other areas. For example, the texture is estimated from the way the mask area appears in the images of cameras positioned close to the virtual viewpoint. Specifically, the texture can be estimated from the pixels of the areas corresponding to the mask area captured by the actual cameras RC1 and RC2, which are positionally close to the virtual camera VC1, or from the pixels in the area surrounding the mask area MA6 in the image of the actual camera RC3. The rendering unit 6 performs rendering using the estimated texture.
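 One way to realize the texture estimation described here is sketched below: masked pixels are first taken from neighbouring camera images that have been warped into the occluded camera's image plane, and any remaining hole is filled by inpainting from the surrounding pixels. The warping step is assumed to have been done elsewhere, and the function name, the use of OpenCV inpainting, and the fallback order are illustrative assumptions rather than the disclosed method.

```python
import cv2
import numpy as np

def fill_mask_texture(image_bgr: np.ndarray,
                      mask_u8: np.ndarray,
                      warped_neighbors=()) -> np.ndarray:
    """Estimate texture for the mask area: prefer pixels re-projected from
    neighbouring cameras (already warped into this camera's image plane) and
    fall back to inpainting from the pixels surrounding the mask."""
    filled = image_bgr.copy()
    hole = mask_u8 > 0
    for warped in warped_neighbors:
        usable = hole & np.any(warped > 0, axis=2)   # a warped pixel is available here
        filled[usable] = warped[usable]
        hole &= ~usable
    if not hole.any():
        return filled
    return cv2.inpaint(filled, hole.astype(np.uint8) * 255, 3, cv2.INPAINT_TELEA)
```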
 Another example will be described. FIG. 14 is a video showing two fighters engaged in martial arts inside a polygonal cage of wire mesh, together with spectators watching from outside the cage. This example shows how the processing applied to a mask area differs depending on whether the mask area lies in front of or behind the two target fighters.
 As shown in FIG. 14, a mask area MA7 is set on the front side of the two fighters, and a mask area MA8 is set on the rear side. There is no corresponding 3D model for the area corresponding to the mask area MA7, and no texture is pasted onto that area. For the mask area MA8, although a 3D model does exist there, a 3D model created from other footage and the texture associated with that 3D model are deliberately pasted instead. This makes it possible to create a more accurate free-viewpoint video.
[Processing flow]
(Overall processing flow)
 Next, the flow of the processing performed by the information processing system according to the present embodiment will be described. First, the overall processing flow is described with reference to the flowchart of FIG. 15. When the processing starts, in step S101 the data acquisition unit 1 acquires image data for generating a 3D model of the target subject. In step S102, the mask area setting unit 2A sets the mask areas, and the 3D model generation unit 2 generates a model having three-dimensional information of the subject based on the image data for generating the 3D model of the subject; that is, modeling using the mask areas is performed. In step S103, the formatting unit 3 encodes the shape and texture data of the 3D model generated by the 3D model generation unit 2, together with the mask area information, into a format suitable for transmission and storage. In step S104, the transmission unit 4 transmits the encoded data, and in step S105 the reception unit 5 receives the transmitted data. In step S106, a decoding unit (not shown) performs decoding and converts the data into the shape and texture data necessary for display, and the rendering unit 6 performs rendering using the shape data, the texture data, and the mask area information. In step S107, the display unit 7 displays the rendered result. When the processing of step S107 ends, the processing of the information processing system ends.
(Flow of the automatic mask area setting process)
 Next, the flow of the automatic mask area setting process performed by the mask area setting unit 2A will be described with reference to the flowchart of FIG. 16. The process described below is performed by the mask area setting unit 2A as part of the 3D model generation process in step S102 of the flowchart of FIG. 15.
 In step S201, camera calibration for the position and orientation of each actual camera is performed. Camera calibration is a process that is normally carried out when creating free-viewpoint video. For example, the relative positions and orientations of the cameras are estimated from images captured of a calibration board. The processing then proceeds to step S202.
 In step S202, the target area is set. The target area is, for example, the area to be modeled and rendered. This area may be set manually by the user or set automatically. In the automatic case, for example, if a camera path for the free-viewpoint video is available in advance, the area near the focal point of the virtual camera is automatically set as the target area. The processing then proceeds to step S203.
 In step S203, the three-dimensional position of each pixel of each camera image is estimated using the camera position information for the plurality of cameras obtained in step S201. The processing then proceeds to step S204.
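 A per-pixel 3D estimate of this kind is often obtained by matching the pixel across cameras and triangulating; a two-view version using OpenCV is sketched below. The stereo matching itself is assumed to have been done elsewhere, and the two-view simplification and the function name are assumptions for illustration.

```python
import cv2
import numpy as np

def triangulate_pixel(P1: np.ndarray, P2: np.ndarray, uv1, uv2) -> np.ndarray:
    """Estimate the 3D position of a pixel matched between two calibrated
    cameras (3x4 projection matrices P1 and P2) by linear triangulation."""
    pts1 = np.asarray(uv1, dtype=float).reshape(2, 1)
    pts2 = np.asarray(uv2, dtype=float).reshape(2, 1)
    X_h = cv2.triangulatePoints(P1, P2, pts1, pts2)   # 4x1 homogeneous point
    return (X_h[:3] / X_h[3]).ravel()
```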
 In step S204, when the position of a pixel obtained in step S203 lies between the camera position and the target position, that area is regarded as an obstruction and set as a mask area. After that, the normal modeling and rendering processes for free-viewpoint video operate. As described above, the mask areas are not used in modeling or rendering.
[Hardware configuration example]
 FIG. 17 is a block diagram showing a hardware configuration example of a computer that executes the above-described series of processes by means of a program. In the computer shown in FIG. 17, a CPU (Central Processing Unit) 11, a ROM (Read Only Memory) 12, and a RAM (Random Access Memory) 13 are connected to one another via a bus 14. An input/output interface 15 is also connected to the bus 14. An input unit 16, an output unit 17, a storage unit 18, a communication unit 19, and a drive 20 are connected to the input/output interface 15. The input unit 16 includes, for example, a keyboard, a mouse, a microphone, a touch panel, and an input terminal. The output unit 17 includes, for example, a display, a speaker, and an output terminal. The storage unit 18 includes, for example, a hard disk, a RAM disk, and a non-volatile memory. The communication unit 19 includes, for example, a network interface. The drive 20 drives a removable medium such as a magnetic disk, an optical disc, a magneto-optical disc, or a semiconductor memory.
 In the computer configured as described above, the CPU 11 performs the above-described series of processes by, for example, loading a program stored in the storage unit 18 into the RAM 13 via the input/output interface 15 and the bus 14 and executing it. The RAM 13 also stores, as appropriate, data and the like necessary for the CPU 11 to execute the various processes.
 The program executed by the computer can be provided, for example, by being recorded on a removable medium such as packaged media. In that case, the program can be installed in the storage unit 18 via the input/output interface 15 by mounting the removable medium in the drive 20. The program can also be provided via a wired or wireless transmission medium such as a local area network, the Internet, or digital satellite broadcasting; in that case, the program can be received by the communication unit 19 and installed in the storage unit 18.
<Modification examples>
 Although one embodiment of the present disclosure has been specifically described above, the content of the present disclosure is not limited to the above-described embodiment, and various modifications based on the technical idea of the present disclosure are possible.
 Part of the configuration and functions of the information processing device according to the above-described embodiment may reside in a device different from the information processing device (for example, a server device on a network).
 Further, for example, the program that realizes the above-described functions may be executed in any device, as long as that device has the necessary functional blocks and can obtain the necessary information. Each step of a single flowchart may be executed by one device or shared and executed by a plurality of devices. Furthermore, when one step includes a plurality of processes, the plurality of processes may be executed by one device or shared and executed by a plurality of devices. In other words, a plurality of processes included in one step can also be executed as processes of a plurality of steps, and conversely, processes described as a plurality of steps can be executed collectively as one step.
 Further, for example, in the program executed by the computer, the processes of the steps describing the program may be executed chronologically in the order described in this specification, or in parallel, or individually at the necessary timing, such as when a call is made. That is, as long as no contradiction arises, the processes of the steps may be executed in an order different from the order described above. Furthermore, the processes of the steps describing this program may be executed in parallel with the processes of another program, or in combination with the processes of another program. In addition, for example, the plurality of techniques relating to the present technology can each be implemented independently as a single technique, as long as no contradiction arises. Of course, any plurality of the present techniques can also be implemented in combination. For example, part or all of the present technology described in any of the embodiments can be implemented in combination with part or all of the present technology described in other embodiments, and part or all of any of the techniques described above can be implemented in combination with other techniques not described above.
 Note that the content of the present disclosure is not to be construed as being limited by the effects exemplified in this specification.
 The present disclosure can also adopt the following configurations.
(1) An information processing apparatus including: a mask area setting unit that sets a mask area for an obstruction existing between an actual camera and a target; and a 3D model generation unit that generates a 3D model based on a plurality of pieces of image data including image data in which the mask area is set.
(2) The information processing apparatus according to (1), wherein the 3D model generation unit generates the 3D model based on a silhouette image in which the mask area is extracted as a foreground and other silhouette images in which the foreground and the background are separated.
(3) The information processing apparatus according to (2), wherein the silhouette of the mask area extracted as the foreground is carved away based on the silhouettes of the other silhouette images.
(4) The information processing apparatus according to any one of (1) to (3), wherein the mask area setting unit obtains, pixel by pixel, three-dimensional position information of a captured image taken by the actual camera based on position and orientation information of the actual camera estimated by camera calibration, and determines a pixel to be an obstruction when the three-dimensional position information of that pixel is between the actual camera and the target.
(5) The information processing apparatus according to any one of (1) to (4), wherein the target is set automatically or manually.
(6) The information processing apparatus according to any one of (1) to (5), including a rendering unit that performs rendering excluding the mask area.
(7) The information processing apparatus according to (6), wherein the rendering unit renders, for the mask area, a texture estimated based on pixels of other areas.
(8) The information processing apparatus according to (6), wherein the rendering unit renders, for the mask area, a 3D model generated in advance and a texture associated with that 3D model.
(9) A 3D model generation method in which a mask area setting unit sets a mask area for an obstruction existing between an actual camera and a target, and a 3D model generation unit generates a 3D model based on a plurality of pieces of image data including image data in which the mask area is set.
(10) A program that causes a computer to execute a 3D model generation method in which a mask area setting unit sets a mask area for an obstruction existing between an actual camera and a target, and a 3D model generation unit generates a 3D model based on a plurality of pieces of image data including image data in which the mask area is set.
(11) An information processing apparatus including: an acquisition unit that acquires mask area information indicating a mask area set for an obstruction existing between an actual camera and a target; and a rendering unit that performs rendering excluding the mask area.
(12) The information processing apparatus according to (11), wherein the rendering unit renders, for the mask area, a texture estimated based on pixels of another area.
(13) The information processing apparatus according to (12), wherein the other area is an area around the mask area.
(14) The information processing apparatus according to (12), wherein the other area is an area, corresponding to the mask area, obtained by an actual camera close to a virtual camera.
(15) The information processing apparatus according to (11), wherein the rendering unit renders, in the mask area, a 3D model generated in advance and a texture associated with that 3D model.
(16) The information processing apparatus according to any one of (11) to (15), wherein a first mask area is set on a front side of the target as seen from a predetermined virtual viewpoint, a second mask area is set on a rear side of the target, and different rendering processes are performed on the first mask area and the second mask area.
(17) An information processing method in which an acquisition unit acquires mask area information indicating a mask area set for an obstruction existing between an actual camera and a target, and a rendering unit performs rendering excluding the mask area.
(18) A program that causes a computer to execute an information processing method in which an acquisition unit acquires mask area information indicating a mask area set for an obstruction existing between an actual camera and a target, and a rendering unit performs rendering excluding the mask area.
<Application examples>
 The technology according to the present disclosure can be applied to a variety of products and services.
(Content production)
 For example, new video content may be produced by combining the 3D model of the subject generated in the above-described embodiment with 3D data managed by another server. Further, for example, when background data acquired with an image pickup device such as Lidar exists, content can be created that makes the subject appear as if it were at the place indicated by the background data, by combining that background data with the 3D model of the subject generated in the above-described embodiment. The video content may be three-dimensional video content or two-dimensional video content converted into two dimensions. The 3D model of the subject generated in the above-described embodiment includes, for example, the 3D model generated by the 3D model generation unit 2 and the 3D model reconstructed by the rendering unit 6.
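 As a toy illustration of combining the subject's 3D model with background data, the sketch below simply merges two point clouds after translating the subject. Real content production would also handle scale, orientation, lighting, and format conversion; the point-cloud representation and the function name are assumptions made only for this example.

```python
import numpy as np

def place_subject_in_background(subject_points: np.ndarray,
                                background_points: np.ndarray,
                                location) -> np.ndarray:
    """Combine a reconstructed subject with separately captured background
    data by translating the subject's point cloud to the desired location."""
    moved = subject_points + np.asarray(location, dtype=float)
    return np.vstack([background_points, moved])
```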
(Experience in a virtual space)
 For example, a subject (for example, a performer) generated in the present embodiment can be placed in a virtual space in which users communicate as avatars. In this case, a user can, as an avatar, view the live-action subject in the virtual space.
(Application to communication with remote locations)
 For example, by transmitting the 3D model of the subject generated by the 3D model generation unit 2 from the transmission unit 4 to a remote location, a user at the remote location can view the 3D model of the subject through a playback device at that location. Further, by transmitting the 3D model of the subject in real time, the subject and the user at the remote location can communicate in real time. For example, the subject may be a teacher and the user a student, or the subject may be a doctor and the user a patient.
(Others)
 For example, free-viewpoint video of sports or the like can be generated based on the 3D models of a plurality of subjects generated in the above-described embodiment, and an individual can distribute himself or herself, as a 3D model generated in the above-described embodiment, to a distribution platform. In this way, the contents of the embodiments described in this specification can be applied to a variety of technologies and services.
2 ... 3D model generation unit
2A ... Mask area setting unit
4 ... Transmission unit
5 ... Reception unit
6 ... Rendering unit

Claims (18)

1. An information processing apparatus comprising: a mask area setting unit that sets a mask area for an obstruction existing between an actual camera and a target; and a 3D model generation unit that generates a 3D model based on a plurality of pieces of image data including image data in which the mask area is set.
2. The information processing apparatus according to claim 1, wherein the 3D model generation unit generates the 3D model based on a silhouette image in which the mask area is extracted as a foreground and other silhouette images in which the foreground and the background are separated.
3. The information processing apparatus according to claim 2, wherein the silhouette of the mask area extracted as the foreground is carved away based on the silhouettes of the other silhouette images.
4. The information processing apparatus according to claim 1, wherein the mask area setting unit obtains, pixel by pixel, three-dimensional position information of a captured image taken by the actual camera based on position and orientation information of the actual camera estimated by camera calibration, and determines a pixel to be an obstruction when the three-dimensional position information of that pixel is between the actual camera and the target.
5. The information processing apparatus according to claim 1, wherein the target is set automatically or manually.
6. The information processing apparatus according to claim 1, further comprising a rendering unit that performs rendering excluding the mask area.
7. The information processing apparatus according to claim 6, wherein the rendering unit renders, for the mask area, a texture estimated based on pixels of other areas.
8. The information processing apparatus according to claim 6, wherein the rendering unit renders, for the mask area, a 3D model generated in advance and a texture associated with that 3D model.
9. A 3D model generation method in which a mask area setting unit sets a mask area for an obstruction existing between an actual camera and a target, and a 3D model generation unit generates a 3D model based on a plurality of pieces of image data including image data in which the mask area is set.
10. A program that causes a computer to execute a 3D model generation method in which a mask area setting unit sets a mask area for an obstruction existing between an actual camera and a target, and a 3D model generation unit generates a 3D model based on a plurality of pieces of image data including image data in which the mask area is set.
11. An information processing apparatus comprising: an acquisition unit that acquires mask area information indicating a mask area set for an obstruction existing between an actual camera and a target; and a rendering unit that performs rendering excluding the mask area.
12. The information processing apparatus according to claim 11, wherein the rendering unit renders, for the mask area, a texture estimated based on pixels of another area.
13. The information processing apparatus according to claim 12, wherein the other area is an area around the mask area.
14. The information processing apparatus according to claim 12, wherein the other area is an area, corresponding to the mask area, obtained by an actual camera close to a virtual camera.
15. The information processing apparatus according to claim 11, wherein the rendering unit renders, in the mask area, a 3D model generated in advance and a texture associated with that 3D model.
16. The information processing apparatus according to claim 11, wherein a first mask area is set on a front side of the target as seen from a predetermined virtual viewpoint, a second mask area is set on a rear side of the target, and different rendering processes are performed on the first mask area and the second mask area.
17. An information processing method in which an acquisition unit acquires mask area information indicating a mask area set for an obstruction existing between an actual camera and a target, and a rendering unit performs rendering excluding the mask area.
18. A program that causes a computer to execute an information processing method in which an acquisition unit acquires mask area information indicating a mask area set for an obstruction existing between an actual camera and a target, and a rendering unit performs rendering excluding the mask area.
PCT/JP2021/025929 2020-07-21 2021-07-09 Information processing device, 3d model generation method, information processing method, and program WO2022019149A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2020-124656 2020-07-21
JP2020124656 2020-07-21

Publications (1)

Publication Number Publication Date
WO2022019149A1 true WO2022019149A1 (en) 2022-01-27

Family

ID=79728721

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2021/025929 WO2022019149A1 (en) 2020-07-21 2021-07-09 Information processing device, 3d model generation method, information processing method, and program

Country Status (1)

Country Link
WO (1) WO2022019149A1 (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016009864A1 (en) * 2014-07-18 2016-01-21 ソニー株式会社 Information processing device, display device, information processing method, program, and information processing system
JP2019083402A (en) * 2017-10-30 2019-05-30 キヤノン株式会社 Image processing apparatus, image processing system, image processing method, and program
WO2019116942A1 (en) * 2017-12-14 2019-06-20 キヤノン株式会社 Generation device, generation method and program for three-dimensional model
JP2019197523A (en) * 2018-05-07 2019-11-14 キヤノン株式会社 Information processing apparatus, control method of the same and program
JP2020013216A (en) * 2018-07-13 2020-01-23 キヤノン株式会社 Device, control method, and program
JP2020135525A (en) * 2019-02-21 2020-08-31 Kddi株式会社 Image processing device and program


Similar Documents

Publication Publication Date Title
JP7277372B2 (en) 3D model encoding device, 3D model decoding device, 3D model encoding method, and 3D model decoding method
WO2018123801A1 (en) Three-dimensional model distribution method, three-dimensional model receiving method, three-dimensional model distribution device, and three-dimensional model receiving device
JP7003994B2 (en) Image processing equipment and methods
US10650590B1 (en) Method and system for fully immersive virtual reality
JP5654138B2 (en) Hybrid reality for 3D human machine interface
JP2022002418A (en) Reception method and terminal
US11652970B2 (en) Apparatus and method for representing a spatial image of an object in a virtual environment
US9380263B2 (en) Systems and methods for real-time view-synthesis in a multi-camera setup
US11967014B2 (en) 3D conversations in an artificial reality environment
CN113382275B (en) Live broadcast data generation method and device, storage medium and electronic equipment
US20210166485A1 (en) Method and apparatus for generating augmented reality images
EP3631767A1 (en) Methods and systems for generating a virtualized projection of a customized view of a real-world scene for inclusion within virtual reality media content
US11557087B2 (en) Image processing apparatus and image processing method for generating a strobe image using a three-dimensional model of an object
US20220114784A1 (en) Device and method for generating a model of an object with superposition image data in a virtual environment
WO2022019149A1 (en) Information processing device, 3d model generation method, information processing method, and program
WO2022004234A1 (en) Information processing device, information processing method, and program
EP2525573A1 (en) Method and system for conducting a video conference
JP6091850B2 (en) Telecommunications apparatus and telecommunications method
WO2022024780A1 (en) Information processing device, information processing method, video distribution method, and information processing system
Price et al. Real-time production and delivery of 3D media
WO2023218979A1 (en) Image processing device, image processing method, and program
WO2022004233A1 (en) Information processing device, information processing method, and program
Scheer et al. A client-server architecture for real-time view-dependent streaming of free-viewpoint video
WO2024014197A1 (en) Image processing device, image processing method, and program
US20230288622A1 (en) Imaging processing system and 3d model generation method

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 21845346; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 21845346; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: JP)