WO2022191010A1 - Information processing device and information processing method - Google Patents

Information processing device and information processing method Download PDF

Info

Publication number
WO2022191010A1
WO2022191010A1 (PCT/JP2022/008967)
Authority
WO
WIPO (PCT)
Prior art keywords
unit
information
dimensional
imaging
information processing
Prior art date
Application number
PCT/JP2022/008967
Other languages
French (fr)
Japanese (ja)
Inventor
剛也 小林
Original Assignee
ソニーグループ株式会社
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ソニーグループ株式会社
Publication of WO2022191010A1 publication Critical patent/WO2022191010A1/en

Classifications

    • GPHYSICS
    • G01MEASURING; TESTING
    • G01BMEASURING LENGTH, THICKNESS OR SIMILAR LINEAR DIMENSIONS; MEASURING ANGLES; MEASURING AREAS; MEASURING IRREGULARITIES OF SURFACES OR CONTOURS
    • G01B11/00Measuring arrangements characterised by the use of optical techniques
    • G01B11/24Measuring arrangements characterised by the use of optical techniques for measuring contours or curvatures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/003D [Three Dimensional] image rendering
    • G06T15/04Texture mapping
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00Manipulating 3D models or images for computer graphics
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/10Processing, recording or transmission of stereoscopic or multi-view image signals
    • H04N13/106Processing image signals
    • H04N13/111Transformation of image signals corresponding to virtual viewpoints, e.g. spatial image interpolation
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/20Image signal generators
    • H04N13/271Image signal generators wherein the generated image signals comprise depth maps or disparity maps
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/20Image signal generators
    • H04N13/275Image signal generators from 3D object models, e.g. computer-generated stereoscopic image signals

Definitions

  • the present disclosure relates to an information processing device and an information processing method.
  • Volumetric capture is a technique that generates a 3D model of a subject using captured images of a real subject, and generates a high-quality 3D image of the subject based on the generated 3D model and the captured images of the subject.
  • An object of the present disclosure is to provide an information processing apparatus and an information processing method capable of generating a three-dimensional image of higher quality in volumetric capture.
  • An information processing apparatus according to the present disclosure includes a generation unit that generates an image by applying a texture image to a three-dimensional model included in three-dimensional data, and a selection unit that selects, from one or more imaging cameras that capture images of a subject in real space, an imaging camera whose captured image of the subject is to be used as the texture image, based on a first position of a virtual camera that acquires an image of a virtual space, a second position of the three-dimensional model, and a third position of the one or more imaging cameras.
  • Another information processing apparatus according to the present disclosure includes a generation unit that generates three-dimensional data based on captured images captured by one or more imaging cameras, and a separation unit that separates, from the three-dimensional data, a three-dimensional model corresponding to a subject included in the captured images and generates position information indicating the position of the separated three-dimensional model.
  • FIG. 1 is a diagram showing basic processing of volumetric capture based on captured images of a real subject, which is applicable to the embodiment.
  • FIG. 2 is a diagram for explaining a problem of an example of existing technology.
  • FIG. 3 is a diagram for explaining a problem of another example of existing technology.
  • FIG. 4 is a diagram for explaining an example of a method of selecting an imaging camera that acquires a texture image to be applied to a 3D model, according to existing technology.
  • FIGS. 5A and 5B are schematic diagrams for explaining a first example of imaging camera selection according to existing technology.
  • FIGS. 6A and 6B are schematic diagrams for explaining a second example of imaging camera selection according to existing technology.
  • FIG. 7 is a functional block diagram showing an example of functions of an information processing system according to the embodiment.
  • FIG. 8 is a schematic diagram showing an example configuration for acquiring image data of a subject, which is applicable to the embodiment.
  • FIG. 9 is a block diagram showing a hardware configuration of an example of an information processing device applicable to an information processing system according to the embodiment.
  • FIG. 10 is an exemplary flowchart schematically showing processing in the information processing system according to the embodiment.
  • FIG. 11 is a schematic diagram schematically showing a three-dimensional model generation process applicable to the embodiment.
  • FIG. 12 is a block diagram showing an example of the configuration of a 3D model generation unit according to the embodiment.
  • FIG. 13 is a schematic diagram for explaining subject separation processing according to the embodiment.
  • FIG. 14 is an exemplary flowchart illustrating subject separation processing according to the embodiment.
  • FIG. 15 is a schematic diagram for explaining selection of an imaging camera according to the embodiment.
  • FIG. 16 is a block diagram showing an example configuration of a rendering unit according to the embodiment.
  • FIG. 17 is an exemplary flowchart illustrating a first example of imaging camera selection processing in rendering processing according to the embodiment.
  • FIG. 18 is a schematic diagram for explaining the relationship between an object and a virtual camera according to the embodiment.
  • Further figures include a schematic diagram for explaining processing for calculating an average value of reference positions of objects according to the embodiment, an exemplary flowchart illustrating a second example of imaging camera selection processing in rendering processing according to the embodiment, an exemplary flowchart illustrating rendering processing according to the embodiment, and schematic diagrams for explaining post-effect processing according to the embodiment.
  • FIG. 1 is a diagram showing basic processing of volumetric capture based on captured images of a real subject, which is applicable to the embodiment.
  • In step S1, the system surrounds an object (subject) in real space with a large number of cameras and captures images of the subject.
  • A camera that captures an image of a subject in real space is hereinafter referred to as an imaging camera.
  • In step S2, the system converts the subject into three-dimensional data and generates a three-dimensional model of the subject based on the plurality of captured images captured by the imaging cameras (3D modeling processing).
  • In step S3, the system renders the three-dimensional model generated in step S2 to generate an image.
  • More specifically, in step S3 the system places the three-dimensional model in a virtual space and renders it from a camera that can move freely in the virtual space (hereinafter referred to as a virtual camera) to generate an image. That is, the system renders according to the position and orientation of the virtual camera with respect to the three-dimensional model. For example, a user who operates the virtual camera can observe an image of the three-dimensional model viewed from a position according to his or her own operation.
  • As formats for expressing a three-dimensional model, a format combining mesh information and a UV texture and a format combining mesh information and multi-textures are generally used.
  • Mesh information is a set of vertices and edges of a three-dimensional model made up of polygons.
  • A UV texture is a texture image to which UV coordinates, which are coordinates on the texture, are assigned so that it can be mapped onto the mesh.
  • A multi-texture is used to overlap and paste a plurality of texture images onto the polygons of the three-dimensional model.
  • the format that combines mesh information and UV texture covers all directions of the 3D model with one UV texture, so the amount of data is relatively small and lightweight, and the rendering load is low.
  • This format is suitable for use in the View Independent method (hereinafter abbreviated as the VI method), which is a rendering method in which the geometry is fixed with respect to the viewpoint movement of the virtual camera.
  • the format that combines mesh information and multi-textures increases the amount of data and the rendering load, but it can provide high image quality.
  • This format is suitable for use in the View Dependent method (hereinafter abbreviated as the VD method) in which the geometric shape changes as the viewpoint of the virtual camera moves.
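  • As a rough, non-normative illustration of the two formats described above, the following sketch contrasts a mesh-plus-UV-texture representation with a mesh-plus-multi-texture representation; all type and field names are our own and do not appear in the publication.

```python
from dataclasses import dataclass, field
from typing import List
import numpy as np

@dataclass
class Mesh:
    vertices: np.ndarray   # (V, 3) float vertex positions
    faces: np.ndarray      # (F, 3) int indices into vertices

@dataclass
class UVTexturedModel:
    """Mesh plus a single UV texture: compact, suited to View Independent (VI) rendering."""
    mesh: Mesh
    uv_coords: np.ndarray  # (V, 2) per-vertex UV coordinates into the texture
    texture: np.ndarray    # (H, W, 3) uint8, one texture covering all directions

@dataclass
class MultiTexturedModel:
    """Mesh plus one texture per imaging camera: heavier, suited to View Dependent (VD) rendering."""
    mesh: Mesh
    camera_textures: List[np.ndarray] = field(default_factory=list)  # per-camera captured images
    camera_params: List[dict] = field(default_factory=list)          # per-camera pose and intrinsics
```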
  • FIG. 2 is a diagram for explaining a problem of an example of existing technology.
  • As shown in section (a) of FIG. 2, a single set of three-dimensional data 50 may include multiple three-dimensional models 51 1 to 51 3.
  • the three-dimensional models 51 1 to 51 3 are objects in virtual space obtained by giving three-dimensional information to the image of the subject in real space included in the captured image.
  • In existing technology, the three-dimensional models 51 1 to 51 3 could not be separated and recognized individually, so it was difficult to obtain sufficient quality when rendering them. That is, in order to render each three-dimensional model 51 1 to 51 3 with sufficient quality, each must be treated as independent data 52 1 to 52 3, as shown in section (b) of FIG. 2.
  • FIG. 3 is a diagram for explaining a problem of another example of existing technology.
  • As shown in section (a) of FIG. 3, a single set of three-dimensional data 50 includes a plurality of three-dimensional models 51 1 to 51 3.
  • Section (b) of FIG. 3 shows three-dimensional data 500 after post-effect processing. With existing technology, it was difficult to selectively apply effect processing (in this example, non-display processing) to an individual three-dimensional model.
  • To apply such selective processing, separation processing for separating each of the three-dimensional models 51 1 to 51 3 is required.
  • However, the existing technology does not consider such separation of the plurality of three-dimensional models 51 1 to 51 3.
  • FIG. 4 is a diagram for explaining an example of a selection method of an imaging camera that acquires a texture image to be applied to a subject according to existing technology.
  • a subject 80 in real space and a plurality of imaging cameras 60 1 to 60 8 surrounding the subject 80 in real space are shown.
  • a reference position 81 is shown as a reference position of the subject 80 .
  • FIG. 4 also shows a virtual camera 70 arranged in the virtual space.
  • the coordinates in the real space and the coordinates in the virtual space match, and unless otherwise specified, the description will be made without distinguishing between the real space and the virtual space.
  • the real space and the virtual space have the same scale, and the position of an object (object, imaging camera, etc.) placed in the real space can be directly replaced with the position in the virtual space.
  • the positions of, for example, the three-dimensional model and the virtual camera 70 in the virtual space can be directly replaced with the positions in the real space.
  • As the reference position 81 of the subject 80, the position corresponding to the point in the subject 80 that is closest to the optical axes of all the imaging cameras 60 1 to 60 8 can be applied.
  • Alternatively, the reference position 81 of the subject 80 may be an intermediate position between the maximum and minimum values of the vertex coordinates of the subject 80, or the most important position in the subject 80 (for example, the position of the face if the subject corresponding to the subject 80 is a person).
  • It is known to select the optimum imaging camera for acquiring the texture to be applied to the three-dimensional model based on the degree of importance of each imaging camera 60 1 to 60 8. The degree of importance can be determined, for example, based on the angle formed between the position of the virtual camera 70 and the position of each imaging camera 60 1 to 60 8, with the reference position 81 as the vertex.
  • In the example of FIG. 4, the angle θ1 formed by the position of the virtual camera 70 and the imaging camera 60 1 with respect to the reference position 81 is the smallest, and the angle θ2 formed with the imaging camera 60 2 is the next smallest. Therefore, with respect to the position of the virtual camera 70, the imaging camera 60 1 has the highest importance, and the imaging camera 60 2 has the next highest importance after the imaging camera 60 1.
  • The importance P(i) of each imaging camera 60 1 to 60 8 can be calculated by the following equation (1), and a smaller value (smaller angle) corresponds to a higher importance:
  • P(i) = arccos(C i · C v)   (1)
  • In equation (1), the value i identifies each of the imaging cameras 60 1 to 60 8, the value C i represents the vector from each imaging camera 60 1 to 60 8 to the reference position 81, and the value C v represents the vector from the virtual camera 70 to the reference position 81. That is, equation (1) obtains the importance P(i) of each imaging camera 60 1 to 60 8 based on the inner product of the vector from that imaging camera to the reference position 81 and the vector from the virtual camera 70 to the reference position 81.
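  • As a minimal sketch (ours, not the publication's) of evaluating equation (1), the vectors can be normalized before taking the inner product, which the arccosine implicitly assumes:

```python
import numpy as np

def camera_importance(imaging_cam_pos, virtual_cam_pos, reference_pos):
    """P(i) = arccos(C_i . C_v): the angle between the imaging camera's and the
    virtual camera's directions toward the reference position (smaller = more important)."""
    c_i = np.asarray(reference_pos, dtype=float) - np.asarray(imaging_cam_pos, dtype=float)
    c_v = np.asarray(reference_pos, dtype=float) - np.asarray(virtual_cam_pos, dtype=float)
    c_i /= np.linalg.norm(c_i)
    c_v /= np.linalg.norm(c_v)
    # Clip to guard against values just outside [-1, 1] due to rounding.
    return np.arccos(np.clip(np.dot(c_i, c_v), -1.0, 1.0))
```

  • The optimum imaging camera under this criterion is then simply the one with the smallest P(i) for the current virtual camera position.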
  • an unintended imaging camera may be selected as the optimal imaging camera.
  • FIGS. 5A and 5B are schematic diagrams for explaining a first example of image pickup camera selection according to the existing technology.
  • This first example selects an imaging camera based on vectors toward a reference position. That is, FIGS. 5A and 5B illustrate the case described with reference to FIG. 4 in which a plurality of subjects 82 1 and 82 2 are included, using the vectors C i from the imaging cameras to the reference position and the vector C v from the virtual camera 70 to the reference position.
  • an imaging range 84 includes two subjects 82 1 and 82 2 .
  • the subject 82 1 is positioned at the upper left corner of the imaging range 84 in the figure, and the subject 82 2 is positioned at the lower right corner of the imaging range 84 in the figure.
  • 16 imaging cameras 60 1 to 60 16 are arranged surrounding the imaging range 84 with their imaging directions facing the center of the imaging range 84 .
  • The virtual camera 70 has an angle of view α, and the three-dimensional model corresponding to the subject 82 1 is assumed to fit within the angle of view α. Since the positions of the subjects 82 1 and 82 2 are unknown, the center of the imaging range 84 or the center of gravity of the subjects 82 1 and 82 2 is adopted as the reference position 83.
  • In the following, the three-dimensional models corresponding to the subjects 82 1 and 82 2 are simply referred to as the subjects 82 1 and 82 2 unless otherwise specified.
  • FIG. 5A shows an example in which the virtual camera 70 is on the front side of the reference position 83 with respect to the subject 82 1 .
  • In this case, ideally, the imaging camera 60 1 located on the straight line 93a extending from the subject 82 1 through the virtual camera 70 is the optimum imaging camera.
  • In this first example, however, the direction of the vector 91a from the virtual camera 70 to the reference position 83 and the direction of the vector 90a from the imaging camera 60 16 to the reference position 83 substantially match, so the imaging camera 60 16 is selected as the optimum imaging camera.
  • The imaging camera 60 16 differs from the ideal optimum imaging camera 60 1 in position and orientation with respect to the subject 82 1. Therefore, the quality of a texture based on the image captured by the imaging camera 60 16 is lower than that of a texture based on the image captured by the imaging camera 60 1.
  • FIG. 5B shows an example in which the virtual camera 70 is positioned between the subject 82 1 and the reference position 83.
  • In this case, ideally, the imaging camera 60 2 located on the straight line 93b passing from the subject 82 2 through the virtual camera 70 is the optimum imaging camera.
  • In this first example, however, the reference position 83 is on the side opposite to the subject 82 1 with respect to the virtual camera 70, so the vector 91b from the virtual camera 70 to the reference position 83 points away from the subject 82 1. Therefore, the direction of the vector 90b from the imaging camera 60 11, located on the opposite side of the subject 82 1 as viewed from the virtual camera 70, to the reference position 83 is close to the direction of the vector 91b, and the imaging camera 60 11 is selected as the optimum imaging camera.
  • However, the imaging camera 60 11 images a surface of the subject 82 1 that cannot be seen from the virtual camera 70. Therefore, the quality of a texture based on the image captured by the imaging camera 60 11 is greatly reduced compared with a texture based on the image captured by the ideal imaging camera 60 2.
  • the selection method of the optimum imaging camera is not limited to the selection method based on the vector for the reference position described above.
  • A second example of imaging camera selection by existing technology selects the optimum imaging camera from the imaging cameras 60 1 to 60 16 based on the angle between the optical axis of the virtual camera 70 and the vector from each imaging camera 60 1 to 60 16 to the reference position.
  • FIGS. 6A and 6B are schematic diagrams for explaining a second example of image pickup camera selection according to existing technology.
  • In FIGS. 6A and 6B, the subjects 82 1 and 82 2, the reference position 83, and the imaging range 84 are the same as in FIGS. 5A and 5B described above, so descriptions thereof are omitted here.
  • FIG. 6A corresponds to FIG. 5A described above, and shows an example in which the virtual camera 70 is positioned closer to the subject 82 1 than the reference position 83 .
  • the imaging camera 60 1 located on a straight line 93a passing through the virtual camera 70 from the object 82 1 is the optimum imaging camera.
  • the virtual camera 70 faces upward in the figure, and the optical axis 94a is upward.
  • the angle between the direction of the vector 90c from the imaging camera 60 1 to the reference position 83 and the optical axis 94a of the virtual camera 70 is the smallest. Therefore, the same imaging camera 60 1 as the ideal optimum imaging camera is selected as the optimum imaging camera, and high-quality texture can be obtained.
  • FIG. 6B corresponds to FIG. 5B described above, and shows an example in which the virtual camera 70 is positioned between the object 82 1 and the reference position 83 .
  • the imaging camera 60 2 located on a straight line 93c passing through the virtual camera 70 from the object 82 2 is the optimum imaging camera.
  • the virtual camera 70 faces upward in the drawing, and the optical axis 94b is directed upward.
  • the angle between the direction of the vector 90c from the imaging camera 60 1 to the reference position 83 and the optical axis 94b of the virtual camera 70 is the smallest. Therefore, as the optimum imaging camera, the imaging camera 60 1 different from the optimum imaging camera 60 2 in the ideal case is selected. Therefore, the quality of the texture based on the captured image of the imaging camera 60 1 is lower than that of the texture based on the captured image of the imaging camera 60 2 .
  • In the embodiment, the information processing system obtains the position of each subject when generating each three-dimensional model. Then, when rendering the three-dimensional model of each subject, the information processing system uses the position of that subject, obtained when the three-dimensional model was generated, to select the imaging camera from which the texture to be applied to the three-dimensional model is acquired.
  • As a result, the imaging camera used for acquiring the texture to be applied to the three-dimensional model can be appropriately selected, and a high-quality texture can be obtained. Also, by using the position information added to each three-dimensional model, it is possible to apply post-effect processing to each three-dimensional model individually.
  • FIG. 7 is an exemplary functional block diagram illustrating functions of the information processing system according to the embodiment.
  • the information processing system 100 includes a data acquisition unit 110, a 3D (3-Dimensional) model generation unit 111, a formatting unit 112, a transmission unit 113, a reception unit 120, and a rendering unit 121. , and a display unit 122 .
  • The information processing system 100 can be configured with, for example, an information processing device for outputting a 3D model, which includes the data acquisition unit 110, the 3D model generation unit 111, the formatting unit 112, and the transmission unit 113, and an information processing device for outputting display information, which includes the reception unit 120, the rendering unit 121, and the display unit 122.
  • the information processing system 100 can also be configured by a single computer device (information processing device).
  • The data acquisition unit 110, the 3D model generation unit 111, the formatting unit 112, the transmission unit 113, the reception unit 120, the rendering unit 121, and the display unit 122 are realized by executing the information processing program according to the embodiment on, for example, a CPU (Central Processing Unit). Not limited to this, some or all of these units may be realized by hardware circuits that operate in cooperation with one another.
  • the data acquisition unit 110 acquires image data for generating a 3D model of a subject.
  • FIG. 8 is a schematic diagram showing an example configuration for acquiring image data of a subject, which is applicable to the embodiment.
  • In the example of FIG. 8, a plurality of captured images captured from a plurality of viewpoints by a plurality of imaging cameras 60 1, 60 2, 60 3, ..., 60 n surrounding the subject 80 are acquired as image data.
  • the captured images from multiple viewpoints are preferably images captured in synchronism by the plurality of imaging cameras 60 1 to 60 n .
  • the data acquisition unit 110 may acquire, as image data, a plurality of captured images obtained by capturing the subject 80 from a plurality of viewpoints with a single imaging camera.
  • this image data acquisition method is applicable when the position of the subject 80 is fixed.
  • the data acquisition unit 110 may perform calibration based on the image data and acquire the internal parameters and external parameters of each imaging camera 60 1 to 60 n . Also, the data acquisition unit 110 may acquire a plurality of pieces of depth information indicating distances from a plurality of viewpoints to the subject 80, for example.
  • the 3D model generation unit 111 generates a 3D model having 3D information of the subject 80 based on image data obtained by the data acquisition unit 110 and obtained by imaging the subject 80 from multiple viewpoints.
  • The 3D model generation unit 111 generates a three-dimensional model of the subject 80 by carving out the three-dimensional shape of the subject 80 using images from multiple viewpoints (for example, silhouette images from multiple viewpoints), for example with the so-called Visual Hull technique. In this case, the 3D model generation unit 111 can further deform the three-dimensional model generated using Visual Hull with a high degree of accuracy, using a plurality of pieces of depth information indicating distances from the multiple viewpoints to the subject 80.
  • the 3D model generated by the 3D model generation unit 111 is generated using captured images captured by the imaging cameras 60 1 to 60 n in the real space, and therefore can be said to be a real 3D model.
  • the 3D model generation unit 111 can express the generated 3D model, for example, in the form of mesh data.
  • the mesh data is data representing shape information representing the surface shape of the subject 80 by connections between vertices called polygon meshes.
  • the method of expressing the three-dimensional model generated by the 3D model generation unit 111 is not limited to mesh data.
  • the 3D model generation unit 111 may describe the generated 3D model in a so-called point cloud representation method represented by point position information.
  • the 3D model generation unit 111 also generates color information data of the subject 80 as a texture in association with the three-dimensional model of the subject 80 .
  • For example, the 3D model generation unit 111 can generate a View Independent (VI) texture that has a constant color when viewed from any direction. Not limited to this, the 3D model generation unit 111 may generate a View Dependent (VD) texture whose color changes depending on the viewing direction.
  • the formatting unit 112 converts the 3D model data generated by the 3D model generation unit 111 into data in a format suitable for transmission and storage.
  • the formatting unit 112 can convert the 3D model generated by the 3D model generating unit 111 into a plurality of two-dimensional images by perspectively projecting the model from a plurality of directions.
  • the formatting unit 112 may generate depth information, which is two-dimensional depth images from multiple viewpoints, using the three-dimensional model.
  • The formatting unit 112 compresses and encodes the depth information and the color information in the form of two-dimensional images, and outputs them to the transmission unit 113.
  • the formatting unit 112 may transmit the depth information and the color information side by side as one image, or may transmit them as two separate images.
  • the formatting unit 112 can compress and encode the data using a compression technique for two-dimensional images such as AVC (Advanced Video Coding).
  • the formatting unit 112 may also convert the three-dimensional model into a point cloud format. Furthermore, the formatting unit 112 may output the 3D model to the transmission unit 113 as 3D data. In this case, the formatting unit 112 can use, for example, the Geometry-based-Approach three-dimensional compression technology discussed in MPEG (Moving Picture Experts Group).
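  • As a rough illustration only (the 8-bit depth quantization and the helper name are our assumptions, not part of the publication), packing the color and depth images side by side into one image could look like this:

```python
import numpy as np

def pack_color_and_depth(color: np.ndarray, depth: np.ndarray,
                         max_depth: float = 10.0) -> np.ndarray:
    """Pack an HxWx3 color image and an HxW depth map side by side into a
    single 8-bit image, one possible layout for transmitting both as one image."""
    # Quantize depth to 8 bits for illustration; real systems often keep more precision.
    depth_norm = np.clip(depth / max_depth, 0.0, 1.0)
    depth_8 = (depth_norm * 255).astype(np.uint8)
    depth_rgb = np.repeat(depth_8[..., None], 3, axis=-1)   # replicate depth to 3 channels
    return np.hstack([color.astype(np.uint8), depth_rgb])   # left: color, right: depth

# Example with dummy data: a 480x640 color frame and depth map become one 480x1280 image.
packed = pack_color_and_depth(np.zeros((480, 640, 3), dtype=np.uint8),
                              np.ones((480, 640), dtype=np.float32))
```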
  • the transmission unit 113 transmits transmission data generated by the formatting unit 112 .
  • the transmission unit 113 transmits transmission data after performing a series of processes of the data acquisition unit 110, the 3D model generation unit 111, and the formatting unit 112 offline. Further, the transmission unit 113 may transmit the transmission data generated from the series of processes described above in real time.
  • the receiving section 120 receives transmission data transmitted from the transmitting section 113 .
  • the rendering unit 121 performs rendering according to the position of the virtual camera 70 using the transmission data received by the receiving unit 120 .
  • mesh data of a three-dimensional model is projected from the viewpoint of the virtual camera 70 that performs drawing, and texture mapping is performed to paste textures representing colors and patterns.
  • the drawn image can be viewed from a freely set viewpoint by means of the virtual camera 70 regardless of the positions of the imaging cameras 60 1 to 60 n at the time of photographing.
  • the rendering unit 121 performs texture mapping to paste textures representing the color, pattern, and texture of the mesh according to the position of the mesh of the three-dimensional model.
  • the rendering unit 121 may perform texture mapping using a VD method that considers the viewpoint from the user (virtual camera 70). Not limited to this, the rendering unit 121 may perform texture mapping by a VI method that does not consider the viewpoint of the user.
  • the VD method changes the texture to be pasted on the 3D model according to the position of the viewpoint from the user (the viewpoint from the virtual camera 70). Therefore, the VD method has the advantage of realizing higher quality rendering than the VI method. On the other hand, since the VI method does not consider the position of the viewpoint from the user, it has the advantage of reducing the amount of processing compared to the VD method.
  • the user's viewpoint data may be input to the rendering unit 121 from the display device, for example, by detecting the region of interest of the user.
  • The rendering unit 121 may employ billboard rendering, which renders an object so that it is always kept perpendicular to the viewing direction of the user, that is, always facing the viewpoint.
  • the rendering unit 121 may render an object that the user is less interested in using billboard rendering, and render other objects using another rendering method.
  • The display unit 122 displays the image rendered by the rendering unit 121 on the display device.
  • the display device may be, for example, a head-mounted display or a spatial display, or may be a display device of an information device such as a smartphone, a television receiver, or a personal computer. Also, the display device may be a 2D monitor for two-dimensional display, or a 3D monitor for three-dimensional display.
  • The information processing system 100 shown in FIG. 7 shows a series of flows from the data acquisition unit 110, which acquires the captured images used as the material for generating content, to the display control unit that controls the display device observed by the user.
  • The transmission unit 113 and the reception unit 120 are provided to show a series of flows from the content (three-dimensional model) creation side to the content observation side through distribution of the content data.
  • When content creation and observation are performed by the same information processing apparatus (for example, a personal computer), the information processing system 100 can omit the formatting unit 112, the transmission unit 113, and the reception unit 120.
  • When the information processing system 100 is implemented, the same implementer may implement all of the functional blocks, or each functional block may be implemented by a different implementer.
  • For example, operator A generates 3D content (a three-dimensional model) using the data acquisition unit 110, the 3D model generation unit 111, and the formatting unit 112.
  • The 3D content is then distributed through the transmission unit 113 (platform) of operator B, and the display device of operator C receives, renders, and displays the 3D content.
  • each functional block shown in FIG. 7 can be implemented on a cloud network.
  • the rendering unit 121 may be implemented within a display device, or may be implemented in a server on a cloud network. In that case, information is exchanged between the display device and the server.
  • In this description, the data acquisition unit 110, the 3D model generation unit 111, the formatting unit 112, the transmission unit 113, the reception unit 120, the rendering unit 121, and the display unit 122 are collectively described as the information processing system 100. However, a configuration excluding the display unit 122, that is, the data acquisition unit 110, the 3D model generation unit 111, the formatting unit 112, the transmission unit 113, the reception unit 120, and the rendering unit 121, may also be collectively referred to as the information processing system 100.
  • FIG. 9 is a block diagram showing a hardware configuration of an example of an information processing device applicable to the information processing system 100 according to the embodiment.
  • the information processing apparatus 2000 shown in FIG. 9 can be applied to both the information processing apparatus for outputting the 3D model and the information processing apparatus for outputting the display information described above.
  • the information processing apparatus 2000 shown in FIG. 9 can also be applied to a configuration including the entire information processing system 100 shown in FIG.
  • The information processing device 2000 includes a CPU (Central Processing Unit) 2100, a ROM (Read Only Memory) 2101, a RAM (Random Access Memory) 2102, an interface (I/F) 2103, an input unit 2104, an output unit 2105, a storage device 2106, a communication I/F 2107, and a drive device 2108.
  • the CPU 2100, ROM 2101, RAM 2102 and I/F 2103 are communicably connected to each other via a bus 2110.
  • An input unit 2104 , an output unit 2105 , a storage device 2106 , a communication I/F 2107 and a drive device 2108 are connected to the I/F 2103 .
  • These input unit 2104 , output unit 2105 , storage device 2106 , communication I/F 2107 and drive device 2108 can communicate with CPU 2100 and the like via I/F 2103 and bus 2110 .
  • the storage device 2106 is a non-volatile storage medium such as a hard disk drive or flash memory.
  • the CPU 2100 controls the overall operation of the information processing apparatus 2000 according to programs stored in the ROM 2101 and storage device 2106 and using the RAM 2102 as a work memory.
  • the input unit 2104 accepts data input to the information processing device 2000 .
  • an input device for inputting data according to user operation such as a pointing device such as a mouse, a keyboard, a touch panel, a joystick, or a controller, can be applied.
  • the input unit 2104 can include various input terminals for inputting data from an external device.
  • the input section 2104 can include a sound pickup device such as a microphone.
  • the output unit 2105 is responsible for outputting information from the information processing device 2000 .
  • a display device such as a display can be applied as the output unit 2105 .
  • the output unit 2105 can include a sound output device such as a speaker.
  • the output unit 2105 can include various output terminals for outputting data to external devices.
  • the output unit 2105 preferably includes a GPU (Graphics Processing Unit).
  • the GPU has a memory (GPU memory) for graphics processing.
  • a communication I/F 2107 controls communication via a network such as a LAN (Local Area Network) or the Internet.
  • a drive device 2108 drives removable media such as optical discs, magneto-optical discs, flexible discs, and semiconductor memories to read and write data.
  • In the information processing device for outputting the 3D model, the CPU 2100 executes the information processing program according to the embodiment, so that the data acquisition unit 110, the 3D model generation unit 111, the formatting unit 112, and the transmission unit 113 described above are configured as modules on, for example, the main storage area of the RAM 2102.
  • Likewise, in the information processing device for outputting display information, the CPU 2100 executes the information processing program according to the embodiment, so that the reception unit 120, the rendering unit 121, and the display unit 122 are configured as modules on, for example, the main storage area of the RAM 2102.
  • These information processing programs can be acquired from the outside (for example, a server device) via a network such as a LAN or the Internet through communication via the communication I/F 2107 and installed on the information processing device 2000. Not limited to this, the information processing programs may be provided stored in a removable storage medium such as a CD (Compact Disc), a DVD (Digital Versatile Disc), or a USB (Universal Serial Bus) memory.
  • FIG. 10 is an exemplary flowchart schematically showing processing in the information processing system 100 according to the embodiment. Prior to the processing according to the flowchart of FIG. 10, the imaging cameras 60 1 to 60 n are arranged around the subject 80 as described with reference to FIG. 8.
  • In step S10, the information processing system 100 uses the data acquisition unit 110 to acquire captured image data for generating a three-dimensional model of the subject 80.
  • In the next step S11, the information processing system 100 uses the 3D model generation unit 111 to generate a three-dimensional model having three-dimensional information of the subject 80, based on the captured image data acquired in step S10.
  • In the next step S12, the information processing system 100 causes the formatting unit 112 to encode the shape and texture data of the three-dimensional model generated in step S11 into a format suitable for transmission and storage.
  • In the next step S13, the information processing system 100 causes the transmission unit 113 to transmit the data encoded in step S12.
  • In the next step S14, the information processing system 100 receives, by the reception unit 120, the data transmitted in step S13.
  • The reception unit 120 decodes the received data and restores the shape and texture data of the three-dimensional model.
  • In the next step S15, the information processing system 100 causes the rendering unit 121 to perform rendering using the shape and texture data passed from the reception unit 120, and to generate image data for displaying the three-dimensional model.
  • In the next step S16, the information processing system 100 causes the display unit 122 to display the image data generated by the rendering on the display device.
  • When step S16 ends, the series of processes in the flowchart of FIG. 10 ends.
  • FIG. 11 is a schematic diagram that schematically shows a three-dimensional model generation process that can be applied to the embodiment.
  • As shown in section (a) of FIG. 11, the 3D model generation unit 111 generates, based on captured images captured from different viewpoints, three-dimensional data 50 including a plurality of three-dimensional models 51 1 to 51 3, each of which corresponds to an object in real space. Various methods are conceivable for adding position information to each of the three-dimensional models 51 1 to 51 3.
  • position information is added to each of the three-dimensional models 51 1 to 51 3 using bounding boxes.
  • Section (b) of FIG. 11 shows an example of a bounding box. Rectangular parallelepipeds circumscribing the three-dimensional models 51 1 , 51 2 and 51 3 are determined as three-dimensional bounding boxes 200 1 , 200 2 and 200 3 . Each vertex of these three-dimensional bounding boxes 200 1 to 200 3 is used as position information indicating the position of the corresponding three-dimensional models 51 1 to 51 3 .
  • For example, the bounding box BoundingBox[0] of the three-dimensional bounding box 200 1 for the three-dimensional model 51 1 can be represented by the minimum and maximum coordinate values on each axis, as in the following equation (2). BoundingBox[1] and BoundingBox[2] of the three-dimensional bounding boxes 200 2 and 200 3 for the three-dimensional models 51 2 and 51 3 are similarly represented by the following equations (3) and (4).
  • BoundingBox[0] = (x min0, x max0, y min0, y max0, z min0, z max0)   (2)
  • BoundingBox[1] = (x min1, x max1, y min1, y max1, z min1, z max1)   (3)
  • BoundingBox[2] = (x min2, x max2, y min2, y max2, z min2, z max2)   (4)
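  • A minimal sketch (ours, not the publication's) of deriving such a bounding box and its vertices from a model's vertex array, matching the (x min, x max, y min, y max, z min, z max) layout above:

```python
import numpy as np

def bounding_box(vertices: np.ndarray) -> tuple:
    """Axis-aligned 3D bounding box of an (N, 3) vertex array,
    returned as (x_min, x_max, y_min, y_max, z_min, z_max)."""
    mins = vertices.min(axis=0)
    maxs = vertices.max(axis=0)
    return (mins[0], maxs[0], mins[1], maxs[1], mins[2], maxs[2])

def box_vertices(box: tuple) -> np.ndarray:
    """The eight corner vertices of the bounding box, usable as position information."""
    x_min, x_max, y_min, y_max, z_min, z_max = box
    return np.array([[x, y, z] for x in (x_min, x_max)
                               for y in (y_min, y_max)
                               for z in (z_min, z_max)])
```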
  • FIG. 12 is a block diagram showing an example configuration of the 3D model generation unit 111 according to the embodiment.
  • the 3D model generation unit 111 includes a 3D model processing unit 1110 and a 3D model separation unit 1111 .
  • the image data captured by each of the imaging cameras 60 1 to 60 n and the imaging camera information output from the data acquisition unit 110 are input to the 3D model generation unit 111 .
  • the imaging camera information may include color information, depth information, camera parameter information, and the like.
  • The camera parameter information includes, for example, information on the position, direction, and angle of view α of each imaging camera 60 1 to 60 n.
  • Camera parameter information may further include zoom information, shutter speed information, aperture information, and the like.
  • the imaging camera information of each of the imaging cameras 60 1 to 60 n is passed to the 3D model processing section 1110 and output from the 3D model generation section 111 .
  • Based on the image data captured by each of the imaging cameras 60 1 to 60 n and the imaging camera information, the 3D model processing unit 1110 carves out the three-dimensional shape of the subject included in the imaging range using the above-described Visual Hull, and generates vertex and surface data of the subject. More specifically, the 3D model processing unit 1110 acquires in advance, for each of the imaging cameras 60 1 to 60 n, an image of the background of the space in which the subject is placed in the real space. A silhouette image of the subject is then generated based on the difference between each image of the subject captured by each of the imaging cameras 60 1 to 60 n and the corresponding background image. By carving the three-dimensional space according to these silhouette images, the three-dimensional shape of the subject can be obtained as vertex and surface data.
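  • As an illustrative sketch only (the threshold and the per-channel difference are our assumptions), a per-camera silhouette image could be derived from the background difference as follows; these silhouettes are the inputs to the Visual Hull carving:

```python
import numpy as np

def silhouette_from_background(frame: np.ndarray, background: np.ndarray,
                               threshold: float = 30.0) -> np.ndarray:
    """Binary silhouette of the subject: pixels whose color differs from the
    pre-captured background image by more than a threshold."""
    diff = np.abs(frame.astype(np.float32) - background.astype(np.float32))
    per_pixel = diff.max(axis=-1)   # largest channel difference per pixel
    return per_pixel > threshold    # True where the subject is assumed to be

# One silhouette per imaging camera:
# silhouettes = [silhouette_from_background(f, b) for f, b in zip(frames, backgrounds)]
```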
  • the 3D model processing unit 1110 functions as a generation unit that generates three-dimensional data based on captured images captured by one or more imaging cameras.
  • the 3D model processing unit 1110 outputs the generated vertex and surface data of the subject as mesh information.
  • the mesh information output from the 3D model processing unit 1110 is output from the 3D model generation unit 111 and passed to the 3D model separation unit 1111 .
  • the 3D model separation unit 1111 separates each subject based on the mesh information passed from the 3D model processing unit 1110 and generates position information of each subject.
  • FIG. 13 is a schematic diagram for explaining subject separation processing according to the embodiment.
  • FIG. 14 is a flowchart of an example showing subject separation processing according to the embodiment.
  • As shown in section (a) of FIG. 13, the 3D model processing unit 1110 generates, based on captured images captured from different viewpoints, three-dimensional data 50 containing a plurality of three-dimensional models 51 1 to 51 3.
  • In step S100 of FIG. 14, the 3D model separation unit 1111 projects the three-dimensional data 50 in the height direction (y-axis direction) to generate two-dimensional silhouette information for each of the three-dimensional models 51 1 to 51 3.
  • Section (b) of FIG. 13 shows examples of two-dimensional silhouettes 52 1 -52 3 based on respective three-dimensional models 51 1 -51 3 .
  • In the next step S101, the 3D model separation unit 1111 performs clustering on the two-dimensional plane based on the silhouettes 52 1 to 52 3 to detect blobs.
  • Subsequent steps S103 to S105 are processed for each blob detected in step S101.
  • In step S103, as shown in section (c) of FIG. 13, the 3D model separation unit 1111 obtains the rectangle circumscribing the detected blob as a two-dimensional bounding box; these correspond to the two-dimensional bounding boxes 53 1 to 53 3 for the three-dimensional models 51 1 to 51 3.
  • In the next step S104, the 3D model separation unit 1111 adds height information to the two-dimensional bounding box 53 1 obtained in step S103 to generate a three-dimensional bounding box 200 1, as shown in section (d) of FIG. 13.
  • Height information to be added to the two-dimensional bounding box 53 1 can be obtained based on the three-dimensional data 50 shown in section (a) of FIG. 13, for example.
  • Similarly, the 3D model separation unit 1111 adds height information to the two-dimensional bounding boxes 53 2 and 53 3 to generate the three-dimensional bounding boxes 200 2 and 200 3.
  • That is, for the blob corresponding to the three-dimensional model 51 2, the two-dimensional bounding box 53 2 is obtained in step S103 and height information is added to it in step S104, generating the three-dimensional bounding box 200 2. Likewise, for the blob corresponding to the three-dimensional model 51 3, the two-dimensional bounding box 53 3 is obtained in step S103 and height information is added to it in step S104, generating the three-dimensional bounding box 200 3.
  • In step S105, when the 3D model separation unit 1111 determines that the processing for all blobs has ended (step S105, "Yes"), it ends the series of processing according to the flowchart of FIG. 14.
  • In this way, three-dimensional bounding boxes 200 1 to 200 3 corresponding to the three-dimensional models 51 1 to 51 3 are generated. Then, based on the vertex coordinates of each of these three-dimensional bounding boxes 200 1 to 200 3, position information indicating the position of each of the three-dimensional models 51 1 to 51 3 is obtained.
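  • A compact sketch of this separation idea (our own simplification: the three-dimensional data is represented as a boolean voxel occupancy grid, and scipy's connected-component labelling stands in for the clustering step):

```python
import numpy as np
from scipy import ndimage

def separate_subjects(occupancy: np.ndarray):
    """occupancy: boolean voxel grid indexed as (x, y, z), with y as the height axis.
    Returns one 3D bounding box (x_min, x_max, y_min, y_max, z_min, z_max) per subject,
    in voxel units."""
    # Project along the height (y) axis to obtain a 2D silhouette map on the x-z plane.
    silhouette_2d = occupancy.any(axis=1)
    # Cluster the 2D silhouettes into blobs (steps S101 and S103).
    labels, num_blobs = ndimage.label(silhouette_2d)
    boxes = []
    for blob_id in range(1, num_blobs + 1):
        xs, zs = np.where(labels == blob_id)
        x_min, x_max, z_min, z_max = xs.min(), xs.max(), zs.min(), zs.max()
        # Add height information from the original 3D data (step S104).
        column = occupancy[x_min:x_max + 1, :, z_min:z_max + 1]
        ys = np.where(column.any(axis=(0, 2)))[0]
        boxes.append((x_min, x_max, ys.min(), ys.max(), z_min, z_max))
    return boxes
```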
  • In this way, the 3D model separation unit 1111 functions as a separation unit that separates the three-dimensional model corresponding to the subject included in the captured image from the three-dimensional data and generates position information indicating the position of the separated three-dimensional model.
  • the 3D model generation unit 111 adds position information indicating the position of the 3D model to the 3D model separated by the 3D model separation unit 1111 and outputs the 3D model.
  • rendering processing by the rendering unit 121 uses the position information indicating the position of the subject acquired by the 3D model generation unit 111 as described above to select the optimum imaging camera for acquiring the texture to be applied to the subject.
  • FIG. 15 is a schematic diagram for explaining selection of an imaging camera according to the embodiment.
  • section (a) shows an example of imaging camera selection according to the embodiment.
  • section (b) is the same diagram as FIG. 6B according to the existing technology described above, and is reprinted for comparison with the embodiment.
  • two subjects 82 1 and 82 2 are included in an imaging range 84 to be imaged, as in FIG. 5A and the like described above.
  • Subject 82 1 is positioned at the upper left corner of imaging range 84 in section (a) of FIG. 15, and subject 82 2 is positioned at the lower right corner of imaging range 84 in section (a) of FIG.
  • 16 imaging cameras 60 1 to 60 16 each having an angle of view ⁇ are arranged to surround an imaging range 84 with their imaging directions facing the center of the imaging range 84.
  • the virtual camera 70 has an angle of view ⁇ , and the three-dimensional model corresponding to the subject 82 1 is assumed to fit within the angle of view ⁇ .
  • the virtual camera 70 is arranged closer to the subject 82 1 than the center of the imaging range 84, and the subject 82 1 is included within the angle of view ⁇ of the virtual camera 70.
  • the reference position 83 is set based on position information indicating the position of the subject 82 1 obtained by the 3D model generation unit 111 .
  • the reference position 83 is set at the center of the subject 82 1 .
  • In the embodiment, the rendering unit 121 obtains, from the positions of the imaging cameras 60 1 to 60 16 and the position of the subject 82 1 included within the angle of view α of the virtual camera 70, the vector from each imaging camera 60 1 to 60 16 to the reference position 83. The rendering unit 121 also obtains the vector 91e from the virtual camera 70 to the reference position 83, based on the position of the virtual camera 70 and the position of the subject 82 1 included within the angle of view α of the virtual camera 70.
  • The rendering unit 121 then obtains the degree of importance P(i) of each imaging camera 60 1 to 60 16, for example in accordance with the above-described equation (1), based on the angle formed by each vector (vector C i) from each imaging camera 60 1 to 60 16 to the reference position 83 and the vector 91e (vector C v).
  • the rendering unit 121 selects the optimum imaging camera for acquiring the texture to be applied to the subject 82 1 based on the importance P(i) obtained for each of the imaging cameras 60 1 to 60 16 .
  • Ideally, the imaging camera 60 2, which is on the straight line 93c passing from the subject 82 1 (reference position 83) through the virtual camera 70, is the optimum imaging camera.
  • In the embodiment, the position of the subject 82 1 is obtained and set as the reference position 83. Therefore, by selecting, among the vectors from the imaging cameras 60 1 to 60 16 to the reference position 83, the vector that forms the smallest angle with the vector 91e from the virtual camera 70 to the reference position 83, an imaging camera close to the ideal one can be selected as the optimum imaging camera.
  • As a result, the imaging camera 60 2, which is the above-described ideal imaging camera, is selected as the optimum imaging camera.
  • the imaging camera 60 2 can be said to be a camera viewing the subject 82 1 from substantially the same direction as the virtual camera 70 . Therefore, according to the imaging camera selection method according to the embodiment, it is possible to obtain textures of higher quality.
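  • As a simplified sketch of the embodiment's selection (ours, not the publication's; the reference position is taken as the mean of the subject's bounding-box vertices, and the angle criterion is the same as in equation (1)):

```python
import numpy as np

def select_optimal_camera(imaging_cam_positions, virtual_cam_pos, subject_box_vertices):
    """Pick the imaging camera whose direction toward the subject's reference position
    forms the smallest angle with the virtual camera's direction toward it."""
    reference = np.asarray(subject_box_vertices, dtype=float).mean(axis=0)  # per-subject reference
    c_v = reference - np.asarray(virtual_cam_pos, dtype=float)
    c_v /= np.linalg.norm(c_v)
    best_index, best_angle = None, np.inf
    for i, cam_pos in enumerate(imaging_cam_positions):
        c_i = reference - np.asarray(cam_pos, dtype=float)
        c_i /= np.linalg.norm(c_i)
        angle = np.arccos(np.clip(np.dot(c_i, c_v), -1.0, 1.0))
        if angle < best_angle:
            best_index, best_angle = i, angle
    return best_index
```

  • Because the reference position here is tied to the subject itself rather than to the center of the imaging range, the selected camera views the subject from roughly the same direction as the virtual camera, which is what yields the higher-quality texture.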
  • Section (b) of FIG. 15 shows an example, according to the existing technology, of selecting the optimum imaging camera from the imaging cameras 60 1 to 60 16 based on the angle between the optical axis of the virtual camera 70 and the vector from each imaging camera 60 1 to 60 16 to the reference position 83.
  • In this case, the reference position 83 does not match the position of the subject 82 1, so it is not guaranteed that the virtual camera 70 and the selected optimum imaging camera view the subject 82 1 from substantially the same direction.
  • As a result, the imaging camera 60 1, which differs from the ideal imaging camera 60 2, is selected as the optimum imaging camera. Therefore, compared with the selection method using the position information indicating the position of the subject 82 1 according to the embodiment, the quality of the acquired texture is degraded.
  • FIG. 16 is a block diagram showing an example configuration of the rendering unit 121 according to the embodiment.
  • the rendering unit 121 includes a mesh transfer unit 1210, an imaging camera selection unit 1211, an imaging viewpoint depth generation unit 1212, an imaging camera information transfer unit 1213, and a virtual viewpoint texture generation unit 1214.
  • the mesh information, imaging camera information, and subject position information generated by the 3D model generation unit 111 are input to the rendering unit 121 .
  • virtual viewpoint position information indicating the position and direction of the virtual camera 70 is input to the rendering unit 121 .
  • This virtual viewpoint position information is input by the user, for example, using a controller (corresponding to the input unit 2104).
  • the rendering unit 121 generates a texture at the virtual viewpoint of the virtual camera 70 based on the mesh information, the imaging camera information, the virtual viewpoint position information, and the subject position information.
  • the mesh information is transferred to the mesh transfer unit 1210.
  • the mesh transfer unit 1210 transfers the passed mesh information to the imaging viewpoint depth generation unit 1212 and the virtual viewpoint texture generation unit 1214 .
  • the mesh transfer processing by the mesh transfer unit 1210 is processing for transferring mesh information to the GPU memory.
  • the virtual viewpoint texture generation unit 1214 may access the GPU memory to acquire mesh information. Note that if the mesh information is on the GPU memory when the reception unit 120 receives the mesh information, the mesh transfer unit 1210 can be omitted.
  • the imaging camera information is transferred to the imaging camera information transfer unit 1213.
  • camera parameter information in the imaging camera information is transferred to the imaging camera selection unit 1211 and the imaging viewpoint depth generation unit 1212 .
  • the imaging viewpoint depth generation unit 1212 selects an imaging camera from the imaging cameras 60 1 to 60 n according to camera selection information passed from the imaging camera selection unit 1211, which will be described later. Based on the mesh information transferred from the mesh transfer unit 1210, the imaging viewpoint depth generation unit 1212 generates selected imaging viewpoint depth information, which is depth information corresponding to the image captured by the selected imaging camera.
  • Alternatively, the depth information included in the imaging camera information input to the rendering unit 121 may be transferred to the imaging viewpoint depth generation unit 1212.
  • depth generation processing by the imaging viewpoint depth generation unit 1212 is unnecessary, and the imaging viewpoint depth generation unit 1212 transfers the depth information to the virtual viewpoint texture generation unit 1214 as selected imaging viewpoint depth information.
  • the virtual viewpoint texture generation unit 1214 may access the GPU memory and acquire the selected imaging viewpoint depth information.
  • the virtual viewpoint position information and the subject position information are transferred to the imaging camera selection section 1211 and the imaging camera information transfer section 1213 .
  • the imaging camera selection unit 1211 selects one or more imaging cameras to be used in subsequent processing from the imaging cameras 60 1 to 60 n based on the camera parameter information, the virtual viewpoint position information, and the subject position information.
  • Camera selection information is generated that indicates one or more imaging cameras.
  • the imaging camera selection unit 1211 transfers the generated camera selection information to the imaging viewpoint depth generation unit 1212 and the imaging camera information transfer unit 1213 .
  • That is, the imaging camera selection unit 1211 functions as a selection unit that selects, from one or more imaging cameras that capture images of the subject in real space, an imaging camera whose captured image of the subject is to be used as the texture image, based on the first position of the virtual camera that acquires the image of the virtual space, the second position of the three-dimensional model, and the third position of the one or more imaging cameras.
  • the imaging camera information transfer section 1213 transfers imaging camera information indicating the selected imaging camera to the virtual viewpoint texture generation section 1214 as selected camera information. Even in this case, if the imaging camera information is already on the GPU memory, the process of transferring the selected camera information can be omitted. In this case, the virtual viewpoint texture generation unit 1214 may access the GPU memory and acquire the imaging camera information.
  • The virtual viewpoint texture generation unit 1214 receives the mesh information from the mesh transfer unit 1210, the selected imaging viewpoint depth information from the imaging viewpoint depth generation unit 1212, and the selected camera information from the imaging camera information transfer unit 1213. Also, the virtual viewpoint position information and the subject position information input to the rendering unit 121 are transferred to the virtual viewpoint texture generation unit 1214. The virtual viewpoint texture generation unit 1214 generates the texture of the virtual viewpoint, which is the viewpoint from the virtual camera 70, based on the information transferred from each of these units.
  • the virtual viewpoint texture generation unit 1214 functions as a generation unit that generates an image by applying a texture image to the 3D model included in the 3D data.
  • FIG. 17 is an exemplary flowchart illustrating a first example of imaging camera selection processing in rendering processing according to the embodiment.
  • In this first example, one reference position is set collectively for one or more subjects.
  • Each process in the flowchart of FIG. 17 is a process executed by the imaging camera selection unit 1211 included in the rendering unit 121 .
  • the subsequent processing from step S201 to step S205 is processing for each object (subject). Note that the number of objects input to the rendering unit 121 can be obtained from subject position information.
  • the subsequent processing in steps S202 and S203 is processing for each vertex of the bounding box of the i-th object.
  • the imaging camera selection unit 1211 projects the j-th vertex of the bounding box of the i-th object onto the virtual camera 70 based on the virtual viewpoint position information and the subject position information related to the object.
  • If the imaging camera selection unit 1211 determines that processing has been completed for all vertices of the target bounding box, or that the j-th vertex of the target bounding box has been projected within the angle of view α of the virtual camera 70 (step S203, "Yes"), the process proceeds to step S204. In step S204, if even one of the vertices of the target bounding box exists within the angle of view α of the virtual camera 70, the imaging camera selection unit 1211 adds a reference position based on that bounding box.
  • That is, in steps S203 and S204, if at least one of the vertices of the bounding box projected onto the virtual camera 70 is included within the angle of view α of the virtual camera 70, the imaging camera selection unit 1211 assumes that the object (subject) related to the bounding box exists within the angle of view α of the virtual camera 70. The imaging camera selection unit 1211 then obtains the reference position based on the bounding box assumed to exist within the angle of view α of the virtual camera 70.
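  • As a concrete illustration of this vertex test, the following is a minimal sketch in Python. It is not the disclosed implementation: the camera is modeled as a simple pinhole with a circular angle of view, and all names (object_in_view, cam_pos, cam_dir, fov_deg) are assumptions introduced for the example.

```python
import numpy as np

def object_in_view(bbox_vertices, cam_pos, cam_dir, fov_deg):
    """Return True if at least one bounding-box vertex falls inside the
    (simplified, circular) angle of view of the virtual camera.

    bbox_vertices: (8, 3) array of the circumscribing box's vertex coordinates.
    cam_pos, cam_dir: virtual camera position and viewing direction.
    fov_deg: full angle of view in degrees.
    """
    cam_dir = np.asarray(cam_dir, dtype=float)
    cam_dir /= np.linalg.norm(cam_dir)
    half_fov = np.deg2rad(fov_deg) / 2.0

    for v in np.asarray(bbox_vertices, dtype=float):
        to_vertex = v - np.asarray(cam_pos, dtype=float)
        dist = np.linalg.norm(to_vertex)
        if dist == 0.0:
            return True
        # Angle between the viewing direction and the direction to the vertex.
        cos_angle = np.clip(np.dot(cam_dir, to_vertex / dist), -1.0, 1.0)
        if np.arccos(cos_angle) <= half_fov:
            return True  # at least one vertex is within the angle of view
    return False

# Example: a unit box in front of a camera looking along +Y.
box = np.array([[x, y, z] for x in (0, 1) for y in (4, 5) for z in (0, 1)])
print(object_in_view(box, cam_pos=(0.5, 0.0, 0.5), cam_dir=(0, 1, 0), fov_deg=60))
```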
  • FIG. 18 is a schematic diagram for explaining the relationship between an object (subject) and the virtual camera 70 according to the embodiment.
  • In the example of FIG. 18, the vertex 201a of the bounding box 200a related to the three-dimensional model 51a is outside the angle of view α of the virtual camera 70.
  • However, since at least one other vertex of the bounding box 200a is projected within the angle of view α, the imaging camera selection unit 1211 assumes that the three-dimensional model 51a exists within the angle of view α of the virtual camera 70.
  • the imaging camera selection unit 1211 obtains the reference position 84a for the three-dimensional model 51a related to the bounding box 200a based on the coordinates of each vertex of the three-dimensional bounding box 200a. For example, the imaging camera selection unit 1211 obtains the average value of the coordinates of each vertex of the three-dimensional bounding box 200a as the reference position 84a for the three-dimensional model 51a related to the bounding box 200a.
  • When it is determined in step S203 that any vertex of the bounding box 200a other than the vertex 201a exists within the angle of view α of the virtual camera 70, the process proceeds to step S204.
  • the imaging camera selection unit 1211 determines whether the processes of steps S202 to S204 have been completed for all objects input to the rendering unit 121.
  • If the imaging camera selection unit 1211 determines that the processing has not been completed for all objects input to the rendering unit 121 (step S205, "No"), the process returns to step S201 and the next object is processed.
  • If the imaging camera selection unit 1211 determines that the processing has been completed for all objects input to the rendering unit 121 (step S205, "Yes"), the process proceeds to step S206.
  • In step S206, the imaging camera selection unit 1211 calculates a representative reference position for all the objects for which the processing has been completed up to step S205. More specifically, in step S206, the imaging camera selection unit 1211 calculates the average value of the reference positions of all the objects for which the processing has been completed up to step S205, and uses the calculated average value as the representative reference position for all the objects.
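  • The reference position computation described in steps S204 and S206 can be pictured with the following minimal sketch; the function names are assumptions introduced here, and only the averaging of bounding-box vertices and of per-object reference positions follows the description above.

```python
import numpy as np

def object_reference_position(bbox_vertices):
    """Reference position of one object: average of the eight vertex
    coordinates of its circumscribing rectangular parallelepiped."""
    return np.asarray(bbox_vertices, dtype=float).mean(axis=0)

def representative_reference_position(bbox_list):
    """Common reference position for all objects within the angle of view:
    average of the per-object reference positions (step S206)."""
    refs = [object_reference_position(b) for b in bbox_list]
    return np.mean(refs, axis=0)

# Example with two arbitrary illustrative boxes.
box_a = np.array([[x, y, z] for x in (0, 1) for y in (0, 1) for z in (0, 2)])
box_b = box_a + np.array([3.0, 0.0, 0.0])
print(object_reference_position(box_a))                    # per-object reference
print(representative_reference_position([box_a, box_b]))   # common reference
```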
  • FIG. 19 is a schematic diagram for explaining the process of calculating the average value of the reference positions of the objects according to the embodiment.
  • the angle of view ⁇ of the virtual camera 70 includes a bounding box 200a for the three-dimensional model 51a and a bounding box 200b for the three-dimensional model 51b.
  • Reference positions 84a and 84b are set for the three-dimensional models 51a and 51b, respectively.
  • In FIG. 19, the imaging camera selection unit 1211 sets the reference position 85 at the coordinates obtained by averaging the coordinates of the reference positions 84a and 84b.
  • This reference position 85 serves as a common reference position for the three-dimensional models 51a and 51b.
  • That is, the reference position 85 is set so as to be used for selecting the imaging camera optimum for the three-dimensional models 51a and 51b in common.
  • the three-dimensional models 51a and 51b form one group.
  • the subsequent processing of steps S208 to S210 is processing for each of the imaging cameras 60 1 to 60 n . Also, the processing target in the loop among the imaging cameras 60 1 to 60 n is assumed to be the imaging camera 60 k .
  • In step S208, for the k-th imaging camera 60 k , the imaging camera selection unit 1211 obtains the angle between the vector directed from the imaging camera 60 k to the reference position 85 and the vector directed from the virtual camera 70 to the reference position 85.
  • In step S209, the imaging camera selection unit 1211 sorts the imaging cameras 60 k in ascending order of the angles obtained in step S208 over the loop based on the loop variable k. In other words, in step S209, the imaging cameras 60 k are sorted in descending order of importance.
  • In step S210, the imaging camera selection unit 1211 determines whether or not the processing has been completed for all of the imaging cameras 60 1 to 60 n in the array.
  • If the imaging camera selection unit 1211 determines that the processing has not been completed for all the imaging cameras 60 1 to 60 n (step S210, "No"), the process returns to step S208 and the next imaging camera is processed.
  • If the imaging camera selection unit 1211 determines that the processing has been completed for all the imaging cameras 60 1 to 60 n (step S210, "Yes"), the process proceeds to step S211.
  • In step S211, the imaging camera selection unit 1211 selects the camera information indicating the top m imaging cameras from the array of imaging cameras 60 1 to 60 n sorted in ascending order of angle.
  • the imaging camera selection unit 1211 transfers information indicating each selected imaging camera to the imaging viewpoint depth generation unit 1212 and the imaging camera information transfer unit 1213 as camera selection information.
  • step S211 ends, the imaging camera selection unit 1211 ends the series of processes according to the flowchart of FIG.
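  • The angle computation, sorting, and top-m selection of steps S208 to S211 can be sketched as follows. This is a hypothetical Python illustration, not the disclosed implementation; select_cameras and the example camera layout are names and values introduced here.

```python
import numpy as np

def select_cameras(ref_pos, virtual_cam_pos, imaging_cam_positions, m):
    """For each imaging camera, compute the angle between the vector
    (imaging camera -> reference position) and the vector
    (virtual camera -> reference position), sort the cameras by that angle
    in ascending order (descending importance), and keep the top m."""
    ref = np.asarray(ref_pos, dtype=float)
    v = ref - np.asarray(virtual_cam_pos, dtype=float)
    v /= np.linalg.norm(v)

    angles = []
    for k, cam_pos in enumerate(imaging_cam_positions):
        c = ref - np.asarray(cam_pos, dtype=float)
        c /= np.linalg.norm(c)
        angle = np.arccos(np.clip(np.dot(c, v), -1.0, 1.0))  # step S208
        angles.append((angle, k))

    angles.sort(key=lambda t: t[0])                           # step S209
    return [k for _, k in angles[:m]]                         # step S211

# Example: 8 cameras on a circle around the origin, virtual camera at (0, -2, 0).
cams = [(3 * np.cos(a), 3 * np.sin(a), 0.0)
        for a in np.linspace(0, 2 * np.pi, 8, endpoint=False)]
print(select_cameras((0, 0, 0), (0, -2, 0), cams, m=3))
```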
  • As described above, in the first example, one reference position 85 is set collectively for the plurality of three-dimensional models 51a and 51b. Therefore, in the post-effect processing and the like described later, common effect processing is applied to the three-dimensional models 51a and 51b at the same time.
  • FIG. 20 is an exemplary flowchart illustrating a second example of imaging camera selection processing in rendering processing according to the embodiment.
  • In this second example, one reference position is set individually for each of the one or more subjects.
  • Each process in the flowchart of FIG. 20 is a process executed by the imaging camera selection unit 1211 included in the rendering unit 121.
  • The processing of steps S200 to S205 is the same as the processing of steps S200 to S205 in the flowchart of FIG. 17 described above, so the description is omitted here.
  • the imaging camera selection unit 1211 advances the process to step S2060 when the reference position addition processing for all objects is completed in step S205.
  • the subsequent processing from step S208 to step S2101 is processing for each object.
  • The subsequent processing of steps S208 to S210 is processing for each of the imaging cameras 60 1 to 60 n . Note that the processing of steps S208 to S210 is the same as the processing of steps S208 to S210 in the flowchart of FIG. 17.
  • When the imaging camera selection unit 1211 determines in step S210 that the processing for all the imaging cameras 60 1 to 60 n has been completed (step S210, "Yes"), the process proceeds to step S2101.
  • In step S2101, the imaging camera selection unit 1211 determines whether or not the processing for all objects included within the angle of view α of the virtual camera 70 has been completed.
  • If the imaging camera selection unit 1211 determines that the processing for all objects included within the angle of view α of the virtual camera 70 has not ended (step S2101, "No"), the process returns to the processing for the next object.
  • If the processing for all objects has ended (step S2101, "Yes"), the process proceeds to step S211.
  • In step S211, the imaging camera selection unit 1211 selects the camera information indicating the top m imaging cameras from the array of imaging cameras 60 1 to 60 n sorted in ascending order of angle, as in step S211 of the flowchart of FIG. 17.
  • the imaging camera selection unit 1211 transfers information indicating each selected imaging camera to the imaging viewpoint depth generation unit 1212 and the imaging camera information transfer unit 1213 as camera selection information.
  • step S211 ends, the imaging camera selection unit 1211 ends the series of processes according to the flowchart of FIG.
  • In the second example, the reference position 85 in FIG. 19 described above is not set; instead, the reference positions 84a and 84b are set for the three-dimensional models 51a and 51b, respectively.
  • reference positions 84a and 84b are individually set for each of the plurality of three-dimensional models 51a and 51b. Therefore, in post-effect processing, etc., which will be described later, it is possible to apply effect processing to each of the three-dimensional models 51a and 51b individually.
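  • For the second example, the same angle-based selection can be repeated per object. The sketch below reuses the select_cameras helper and the cams list from the sketch given after the description of FIG. 17 above; how the per-object results are merged is not spelled out in the flowchart, so the union used here is only an assumption for illustration.

```python
def select_cameras_per_object(object_ref_positions, virtual_cam_pos,
                              imaging_cam_positions, m):
    """Per-object variant (FIG. 20): repeat the angle-based selection for the
    reference position of each object.  Merging the per-object results into a
    single set is an assumption; a simple union is used here."""
    per_object = {obj_id: select_cameras(ref, virtual_cam_pos,
                                         imaging_cam_positions, m)
                  for obj_id, ref in object_ref_positions.items()}
    union = sorted({cam for ids in per_object.values() for cam in ids})
    return per_object, union

# Example with two objects and the camera ring from the previous sketch.
refs = {"51a": (1.0, 0.5, 0.0), "51b": (-1.0, -0.5, 0.0)}
per_obj, selected = select_cameras_per_object(refs, (0, -2, 0), cams, m=2)
print(per_obj, selected)
```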
  • FIG. 21 is an exemplary flowchart illustrating rendering processing according to the embodiment. Each process according to the flowchart of FIG. 21 is a process executed by the virtual viewpoint texture generation unit 1214 included in the rendering unit 121 .
  • the mesh information may include mesh information of a plurality of subjects.
  • In step S301, the virtual viewpoint texture generation unit 1214 selects, from the mesh information, the vertices to be projected onto the virtual viewpoint of the virtual camera 70, based on the subject position information.
  • The virtual viewpoint texture generation unit 1214 then rasterizes based on the vertices selected in step S301. That is, the vertices not selected in step S301 are not rasterized and are not projected onto the virtual viewpoint, i.e., the virtual camera 70. Therefore, the virtual viewpoint texture generation unit 1214 can selectively set display/non-display for each of a plurality of subjects.
  • In step S305, the virtual viewpoint texture generation unit 1214 obtains the vertex of the mesh corresponding to the pixel q of the virtual viewpoint.
  • the subsequent processing of steps S307 to S313 is processing for each of the imaging cameras 60 1 to 60 n . Also, the imaging camera 60 r is assumed to be the object of processing in the loop among the imaging cameras 60 1 to 60 n .
  • In step S307, the virtual viewpoint texture generation unit 1214 projects the vertex coordinates of the vertex obtained in step S305 onto the imaging camera 60 r , and obtains the corresponding UV coordinates in the image of the imaging camera 60 r .
  • In step S308, the virtual viewpoint texture generation unit 1214 compares the depth of the mesh as seen from the imaging camera 60 r with the depth of the vertex coordinates of the vertex obtained in step S305, and obtains the difference between the two.
  • In step S309, the virtual viewpoint texture generation unit 1214 determines whether the difference obtained in step S308 is equal to or greater than a threshold. If the difference is determined to be equal to or greater than the threshold (step S309, "Yes"), the process proceeds to step S310, and the imaging camera information (selected camera information) of the imaging camera 60 r is not used.
  • On the other hand, if the virtual viewpoint texture generation unit 1214 determines that the difference obtained in step S308 is less than the threshold (step S309, "No"), the process proceeds to step S311, and the imaging camera information of the imaging camera 60 r is used.
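  • The depth test of steps S308 to S311 amounts to a simple threshold comparison. A minimal sketch, with names that are assumptions introduced for the example:

```python
def camera_visible(vertex_depth_from_camera, depth_map_value, threshold):
    """A camera's color is used for a vertex only when the depth rendered for
    that camera at the projected pixel and the depth of the vertex itself
    differ by less than a threshold; a large difference means the vertex is
    occluded in that camera's view."""
    return abs(vertex_depth_from_camera - depth_map_value) < threshold

# The vertex lies 2.50 units from the camera, but the camera's depth map says
# the nearest surface along that pixel is at 1.20 units: something occludes it.
print(camera_visible(2.50, 1.20, threshold=0.05))   # False -> do not use
print(camera_visible(2.50, 2.48, threshold=0.05))   # True  -> use this camera
```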
  • the virtual viewpoint texture generation unit 1214 acquires color information at the UV coordinates obtained in step S307 from the imaging camera information.
  • the virtual viewpoint texture generation unit 1214 then obtains a blend coefficient for the color information.
  • In step S312, the virtual viewpoint texture generation unit 1214 obtains the blend coefficient for the captured image (texture image) of the imaging camera 60 r , based on the imaging camera information selected by the processing of steps S208 to S211 in the flowchart of FIG. 17 or FIG. 20.
  • In step S313, the virtual viewpoint texture generation unit 1214 determines whether or not the processing for all of the imaging cameras 60 1 to 60 n in the array has been completed.
  • If the processing has been completed for all the imaging cameras (step S313, "Yes"), the process proceeds to step S314.
  • In step S314, the virtual viewpoint texture generation unit 1214 blends the color information of the imaging camera information used in step S311, among the imaging cameras 60 1 to 60 n , according to the blend coefficients obtained in step S312, thereby determining the color information for the pixel q.
  • When the virtual viewpoint texture generation unit 1214 determines in step S315 that the processing has been completed for all pixels, it terminates the series of processes according to the flowchart of FIG. 21.
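  • Putting steps S305 to S314 together, the per-pixel processing can be sketched as follows. This is a simplified, hypothetical illustration: each camera is represented by a few callables, the blend weights are assumed to be given (for example, derived from the camera importance), and none of the names come from the disclosure.

```python
import numpy as np

def shade_pixel(vertex_pos, cameras, threshold=0.05):
    """Sketch of steps S307 to S314 for one output pixel q.

    vertex_pos: 3D position of the mesh vertex hit by the pixel (step S305).
    cameras: list of dicts per selected imaging camera with
             'project'   -> maps a 3D point to (u, v, depth),
             'depth_map' -> rendered depth at (u, v),
             'color'     -> RGB color at (u, v),
             'weight'    -> blend coefficient.
    All of these field names are assumptions introduced for the example.
    """
    colors, weights = [], []
    for cam in cameras:
        u, v, depth = cam["project"](vertex_pos)               # step S307
        diff = abs(cam["depth_map"](u, v) - depth)             # step S308
        if diff >= threshold:                                   # steps S309/S310
            continue                                            # occluded: skip
        colors.append(np.asarray(cam["color"](u, v), float))    # step S311
        weights.append(cam["weight"])                            # step S312
    if not colors:
        return np.zeros(3)                                       # no usable camera
    w = np.asarray(weights, float)
    w /= w.sum()
    return (w[:, None] * np.asarray(colors)).sum(axis=0)        # step S314

# Toy usage with two fake cameras that both see the same green surface.
def fake_cam(weight):
    return {
        "project":   lambda p: (0, 0, float(np.linalg.norm(p))),
        "depth_map": lambda u, v: 1.0,
        "color":     lambda u, v: (0.1, 0.8, 0.1),
        "weight":    weight,
    }
print(shade_pixel(np.array([1.0, 0.0, 0.0]), [fake_cam(0.7), fake_cam(0.3)]))
```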
  • FIG. 22 is a schematic diagram for explaining post-effect processing according to the embodiment.
  • Section (a) of FIG. 22 shows an example of rendering processing according to the embodiment, and section (b) shows an example of rendering processing by the existing technology.
  • The virtual viewpoint texture generation unit 1214 traces, from the position of the pixel 72 of the image acquired by the virtual camera 70 (the output pixel of the virtual camera 70), the virtual optical path 95 and the optical paths 96 1 to 96 4 . It then obtains the position of the pixel (input pixel) corresponding to the pixel 72 in each of the plurality of imaging cameras 60 1 to 60 4 (FIG. 21, steps S305 to S307).
  • the virtual viewpoint texture generation unit 1214 obtains the color information of the pixels 72 of the virtual camera 70 by blending the obtained color information of the respective pixels of the plurality of imaging cameras 60 1 to 60 4 according to the blend coefficients (Fig. 21, step S312).
  • The subject 87 is on the near side of the subject 86 as viewed from the virtual camera 70 , lies on the optical path 96 4 from the imaging camera 60 4 to the subject 86 , and therefore appears in the image captured by the imaging camera 60 4 .
  • the virtual viewpoint texture generation unit 1214 selects vertices to be projected onto the virtual viewpoint by the virtual camera 70 from mesh information based on subject position information. Therefore, the virtual viewpoint texture generation unit 1214 can selectively set display/non-display for each of a plurality of subjects. Specifically, like the subject 87 indicated by the dotted line in section (a) of FIG. 22, the subject 87 can be hidden based on the subject position information indicating the position of the subject 87 .
  • Meanwhile, the imaging camera 60 4 in real space images the subject 87 . Therefore, the virtual viewpoint texture generation unit 1214 does not use the captured image of the imaging camera 60 4 as the texture image of the subject 86 . Also, the surface of the subject 87 facing the imaging camera (not shown) located beyond the subject 87 as viewed from the virtual camera 70 (indicated by the arrow 97 ) cannot be seen from the virtual camera 70 . Therefore, the virtual viewpoint texture generation unit 1214 does not acquire the imaging camera information of that imaging camera. As a result, the processing load of the virtual viewpoint texture generation unit 1214 can be reduced.
  • In the above, the processing for switching display/non-display of a specific subject among the subjects included in the angle of view α of the virtual camera 70 has been described as an example, but the present technology is not limited to this example and can also be applied to other post-effect processing.
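  • The display/non-display switching itself reduces to filtering the mesh vertices by subject before rasterization (FIG. 21, step S301). A minimal sketch, with illustrative names that are assumptions:

```python
import numpy as np

def select_vertices_for_display(mesh_vertices_by_subject, hidden_subject_ids):
    """Only the vertices of subjects that are not designated as hidden are
    passed on to rasterization; the per-subject grouping is taken from the
    subject position information."""
    kept = [v for sid, v in mesh_vertices_by_subject.items()
            if sid not in hidden_subject_ids]
    return np.concatenate(kept) if kept else np.empty((0, 3))

# Subject 87 is hidden; only the vertices of subject 86 are rasterized.
meshes = {86: np.random.rand(100, 3), 87: np.random.rand(80, 3) + 2.0}
print(select_vertices_for_display(meshes, hidden_subject_ids={87}).shape)  # (100, 3)
```

In addition, as described above, captured images from imaging cameras whose line of sight to a displayed subject passes through the hidden subject would not be used as texture images, which further reduces the processing load.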
  • FIG. 23 is a schematic diagram showing more specifically the post-effect processing according to the embodiment.
  • Section (a) of FIG. 23 shows an example of an output image 300a output from the virtual camera 70, in which the three-dimensional models 51c and 51d are included in the angle of view α of the virtual camera 70.
  • The three-dimensional models 51c and 51d are associated with the bounding boxes 200c and 200d, respectively. Note that in sections (a) and (b) of FIG. 23, the frame lines of the bounding boxes 200c and 200d are shown for explanation and are not displayed in the actual image.
  • Section (b) of FIG. 23 shows an example of an output image 300b when the position of the virtual camera 70 is moved from section (a) and the three-dimensional model 51c is moved.
  • the three-dimensional model 51d associated with the bounding box 200d is specified based on subject position information indicating the position of the three-dimensional model 51d, and is hidden by post-effect processing.
  • Note that the three-dimensional model 51d itself is within the angle of view α of the virtual camera 70, and the related bounding box 200d still exists.
  • For example, a 3D model of a subject generated by the information processing system 100 according to the embodiment and 3D data managed by another device may be combined to produce new video content.
  • For example, by combining the three-dimensional model of the subject generated by the information processing system 100 according to the embodiment with background data, it is possible to create video content in which the subject appears as if it exists in the location indicated by the background data.
  • the video content to be produced may be video content having three-dimensional information, or video content obtained by converting three-dimensional information into two-dimensional information.
  • The 3D model of the subject generated by the information processing system 100 according to the embodiment includes, for example, a 3D model generated by the 3D model generation unit 111 and a 3D model reconstructed by the rendering unit 121.
  • For example, a subject (for example, a performer) generated by the information processing system 100 according to the embodiment can be placed in a virtual space where users communicate as avatars.
  • In this case, a user, as an avatar, can observe the captured live-action subject in the virtual space.
  • the user at the remote location can observe the 3D model of the subject through a playback device at the remote location.
  • real-time communication between the subject and a remote user can be realized by transmitting the three-dimensional model of the subject in real time.
  • a case where the subject is a teacher and the user is a student, or a case where the subject is a doctor and the user is a patient can be assumed.
  • the information processing program according to the embodiment described above may be executed in another device having a CPU, a ROM, a RAM, etc. and having functions as an information processing device.
  • the device should have the necessary functional blocks and be able to obtain the necessary information.
  • each step of one flowchart may be executed by one device, or may be shared by a plurality of devices.
  • Similarly, when a plurality of processes are included in one step, the plurality of processes may be executed by one device, or may be shared and executed by a plurality of devices.
  • a plurality of processes included in one step of the flowchart can also be executed as a process of a plurality of steps.
  • the processing described as a plurality of steps in the flowchart can also be collectively executed as one step.
  • In addition, the processing of the steps describing the information processing program may be executed in chronological order according to the order shown in each flowchart described above, may be executed in parallel, or may be executed individually as needed, for example when a call is made. That is, as long as there is no contradiction, the processing of each step may be executed in an order different from the order described above. Furthermore, the processing of the steps describing the information processing program according to the embodiment may be executed in parallel with the processing of other programs, or may be executed in combination with the processing of other programs.
  • a plurality of technologies related to the present disclosure can be implemented independently, or a plurality of technologies related to the present disclosure can be implemented in combination, as long as there is no contradiction. Also, part or all of the techniques according to the above-described embodiments can be implemented in combination with other techniques not described above.
  • the present technology can also take the following configuration.
  • (1) An information processing apparatus comprising: a generation unit that generates an image by applying a texture image to a three-dimensional model included in three-dimensional data; and a selection unit that selects, from one or more imaging cameras that capture a subject in a real space, an imaging camera that acquires a captured image of the subject to be used as the texture image, based on a first position of a virtual camera that acquires an image of a virtual space, a second position of the three-dimensional model, and a third position of the one or more imaging cameras.
  • (2) The information processing apparatus according to (1), wherein the generation unit generates the texture image according to the viewpoint from the virtual camera, based on the captured image acquired by the imaging camera selected by the selection unit from the one or more imaging cameras.
  • (3) The information processing apparatus according to (1) or (2), wherein the selection unit selects the imaging camera that acquires the captured image of the subject according to the importance of each of the one or more imaging cameras, the importance being obtained based on the first position, the second position, and the third position.
  • (4) The information processing apparatus according to (3), wherein the selection unit obtains the importance based on an angle formed by the first position and the third position with the second position as the vertex.
  • (5) The information processing apparatus according to (3) or (4), wherein the generation unit generates the texture image by blending the captured images captured by the one or more imaging cameras according to the importance.
  • (6) The information processing apparatus according to any one of (1) to (5), wherein the generation unit applies the texture image to the three-dimensional model when at least one of the vertex coordinates of a rectangular parallelepiped circumscribing the three-dimensional model is within the angle of view of the virtual camera.
  • (7) The information processing apparatus according to any one of (1) to (6), wherein the generation unit designates, based on the second position, the three-dimensional model to which a predetermined effect is to be given.
  • (8) The information processing apparatus according to (7), wherein the predetermined effect is an effect of hiding the designated three-dimensional model from the virtual camera.
  • (9) The information processing apparatus according to any one of (1) to (8), wherein the selection unit deselects, from among the one or more imaging cameras, an imaging camera that images the subject from a direction outside the angle of view of the virtual camera with respect to the three-dimensional model.
  • (10) The information processing apparatus according to any one of (1) to (9), wherein the selection unit uses the average coordinates of the vertex coordinates of a rectangular parallelepiped circumscribing the three-dimensional model as the second position.
  • (11) The information processing apparatus according to (10), wherein, when a plurality of three-dimensional models included in the three-dimensional data are included within the angle of view of the virtual camera, the selection unit uses the average of the second positions of the plurality of three-dimensional models as the second position for the plurality of three-dimensional models.
  • (12) An information processing method comprising: a generation step of generating an image by applying a texture image to a three-dimensional model included in three-dimensional data; and a selection step of selecting, from one or more imaging cameras that capture a subject in a real space, an imaging camera that acquires a captured image of the subject to be used as the texture image, based on a first position of a virtual camera that acquires an image of a virtual space, a second position of the three-dimensional model, and a third position of the one or more imaging cameras.
  • (13) An information processing apparatus comprising: a generation unit that generates three-dimensional data based on captured images captured by one or more imaging cameras; and a separation unit that separates, from the three-dimensional data, a three-dimensional model corresponding to a subject included in the captured images, and generates position information indicating the position of the separated three-dimensional model.
  • (14) The information processing apparatus according to (13), wherein the separation unit separates the three-dimensional model by specifying a region of the subject on a two-dimensional plane based on information of the two-dimensional plane obtained by projecting the three-dimensional data in the height direction, and by giving the region information in the height direction.
  • (15) The information processing apparatus according to (14), wherein the separation unit generates the position information including the coordinates of each vertex of a rectangular parallelepiped circumscribing the three-dimensional model that is generated by giving the information in the height direction to the region.
  • (16) The information processing apparatus according to any one of (13) to (15), further comprising an output unit that adds the position information to the three-dimensional model separated from the three-dimensional data by the separation unit and outputs the three-dimensional model.
  • (17) The information processing apparatus according to (16), wherein the output unit outputs the information of the three-dimensional model as multi-viewpoint captured images obtained by capturing the subject corresponding to the three-dimensional model with the one or more imaging cameras, and depth information for each of the multi-viewpoint captured images.
  • (18) The information processing apparatus according to (16), wherein the output unit outputs the information of the three-dimensional model as mesh information.
  • (19) An information processing method executed by a processor, the method comprising: a generation step of generating three-dimensional data based on captured images captured by one or more imaging cameras; and a separation step of separating, from the three-dimensional data, a three-dimensional model corresponding to a subject included in the captured images, and generating position information indicating the position of the separated three-dimensional model.


Abstract

An information processing device according to the present disclosure is provided with: a generation unit (1214) that generates an image obtained by applying a texture image to a three-dimensional model included in three-dimensional data; and a selection unit (1211) that selects, on the basis of a first position of a virtual camera for acquiring an image of a virtual space, a second position of the three-dimensional model, and a third position of one or more imaging cameras each for capturing an image of a subject in a real space, an imaging camera for acquiring the image which is captured of the subject and which is to be used as the texture image, from among the one or more imaging cameras.

Description

Information processing device and information processing method

 The present disclosure relates to an information processing device and an information processing method.

 A technique called volumetric capture is known, in which a three-dimensional model of a subject is generated from captured images of a real subject, and a high-quality three-dimensional image of the subject is generated based on the generated three-dimensional model and the captured images of the subject (for example, Non-Patent Document 1).

WO 2017/082076

 With conventional volumetric capture techniques, when a plurality of subjects are included in a captured image, the individual subjects cannot be separated or recognized, and therefore sufficient quality may not be obtained in the generated three-dimensional image of each subject.

 An object of the present disclosure is to provide an information processing apparatus and an information processing method capable of generating a three-dimensional image of higher quality in volumetric capture.

 An information processing apparatus according to the present disclosure includes: a generation unit that generates an image by applying a texture image to a three-dimensional model included in three-dimensional data; and a selection unit that selects, from one or more imaging cameras that capture a subject in a real space, an imaging camera that acquires a captured image of the subject to be used as the texture image, based on a first position of a virtual camera that acquires an image of a virtual space, a second position of the three-dimensional model, and a third position of the one or more imaging cameras.

 An information processing apparatus according to the present disclosure also includes: a generation unit that generates three-dimensional data based on captured images captured by one or more imaging cameras; and a separation unit that separates, from the three-dimensional data, a three-dimensional model corresponding to a subject included in the captured images and generates position information indicating the position of the separated three-dimensional model.
[Brief description of drawings]
 FIG. 1 is a diagram showing basic processing of volumetric capture based on actually captured images, applicable to the embodiment.
 FIG. 2 is a diagram for explaining a problem of an example of existing technology.
 FIG. 3 is a diagram for explaining a problem of another example of existing technology.
 FIG. 4 is a diagram for explaining an example of a method of selecting an imaging camera that acquires a texture image to be applied to a three-dimensional model, according to existing technology.
 FIGS. 5A and 5B are schematic diagrams for explaining a first example of imaging camera selection according to existing technology.
 FIGS. 6A and 6B are schematic diagrams for explaining a second example of imaging camera selection according to existing technology.
 FIG. 7 is a functional block diagram showing an example of the functions of the information processing system according to the embodiment.
 FIG. 8 is a schematic diagram showing an example configuration for acquiring image data of a subject, applicable to the embodiment.
 FIG. 9 is a block diagram showing a hardware configuration of an example of an information processing device applicable to the information processing system according to the embodiment.
 FIG. 10 is an exemplary flowchart schematically showing processing in the information processing system according to the embodiment.
 FIG. 11 is a schematic diagram schematically showing three-dimensional model generation processing applicable to the embodiment.
 FIG. 12 is a block diagram showing an example configuration of the 3D model generation unit according to the embodiment.
 FIG. 13 is a schematic diagram for explaining subject separation processing according to the embodiment.
 FIG. 14 is an exemplary flowchart illustrating subject separation processing according to the embodiment.
 FIG. 15 is a schematic diagram for explaining imaging camera selection according to the embodiment.
 FIG. 16 is a block diagram showing an example configuration of the rendering unit according to the embodiment.
 FIG. 17 is an exemplary flowchart illustrating a first example of imaging camera selection processing in rendering processing according to the embodiment.
 FIG. 18 is a schematic diagram for explaining the relationship between an object and the virtual camera according to the embodiment.
 FIG. 19 is a schematic diagram for explaining processing for calculating the average value of the reference positions of objects according to the embodiment.
 FIG. 20 is an exemplary flowchart illustrating a second example of imaging camera selection processing in rendering processing according to the embodiment.
 FIG. 21 is an exemplary flowchart illustrating rendering processing according to the embodiment.
 FIG. 22 is a schematic diagram for explaining post-effect processing according to the embodiment.
 FIG. 23 is a schematic diagram showing post-effect processing according to the embodiment more specifically.
 Hereinafter, embodiments of the present disclosure will be described in detail with reference to the drawings. In the following embodiments, the same parts are denoted by the same reference numerals, and redundant description is omitted.

 Hereinafter, the embodiments of the present disclosure will be described in the following order.
1. Overview of volumetric capture
2. Existing technology
3. Embodiment of the present disclosure
 3-1. Configuration example of the information processing system according to the embodiment
 3-2. 3D model generation processing according to the embodiment
  3-2-1. Overview of the 3D model generation processing according to the embodiment
  3-2-2. Configuration example of the 3D model generation unit according to the embodiment
  3-2-3. Specific example of the 3D model generation processing according to the embodiment
 3-3. Rendering processing according to the embodiment
  3-3-1. Overview of the rendering processing according to the embodiment
  3-3-2. Configuration example of the rendering unit according to the embodiment
  3-3-3. Specific example of the rendering processing according to the embodiment
   3-3-3-1. First example of the imaging camera selection processing
   3-3-3-2. Second example of the imaging camera selection processing
   3-3-3-3. Details of the rendering processing
   3-3-3-4. Post-effect processing
4. Application examples of the embodiment of the present disclosure
5. Other embodiments
[1. Overview of volumetric capture]
 First, prior to describing the embodiments of the present disclosure, an overview of volumetric capture is given to facilitate understanding. FIG. 1 is a diagram showing the basic processing of volumetric capture based on actually captured images, which is applicable to the embodiment.
 In FIG. 1, first, in step S1, the system surrounds an object (subject) with a large number of cameras in real space and captures images of the subject. Hereinafter, a camera that captures images of a subject in real space is referred to as an imaging camera. Next, in step S2, the system converts the subject into three-dimensional data and generates a three-dimensional model of the subject based on the plurality of captured images captured by the multiple imaging cameras (3D modeling processing). Next, in step S3, the system renders the three-dimensional model generated in step S2 to generate an image.

 In step S3, the system places the three-dimensional model in a virtual space and performs rendering from the viewpoint of a virtual camera (hereinafter referred to as the virtual camera) that can move freely in the virtual space, thereby generating an image. That is, the system performs rendering according to the position and orientation of the virtual camera with respect to the three-dimensional model. For example, a user who operates the virtual camera can observe an image of the three-dimensional model viewed from a position corresponding to his or her operation.

 As formats for three-dimensional model data, a format combining mesh information and a UV texture and a format combining mesh information and multi-textures are generally used. Mesh information is a set of the vertices and edges of a three-dimensional model made up of polygons. A UV texture is a texture obtained by assigning UV coordinates, which are coordinates on the texture, to a texture image. Multi-texture, on the other hand, pastes a plurality of texture images, overlapping one another, onto the polygons of the three-dimensional model.

 Of these, the format combining mesh information and a UV texture covers all directions of the three-dimensional model with a single UV texture, so the amount of data is relatively small and the rendering load is low. This format is suitable for the View Independent method (hereinafter abbreviated as the VI method), a rendering method in which the geometry is fixed with respect to movement of the virtual camera viewpoint.

 On the other hand, the format combining mesh information and multi-textures involves a larger amount of data and a higher rendering load, but can provide high image quality. This format is suitable for the View Dependent method (hereinafter abbreviated as the VD method), in which the geometry changes as the viewpoint of the virtual camera moves.
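 The two data layouts described above can be pictured with the following minimal sketch in Python; the field names are assumptions introduced for the example and do not describe the actual format used by the embodiment.

```python
from dataclasses import dataclass
from typing import List, Tuple
import numpy as np

@dataclass
class MeshUVModel:
    """Mesh + single UV texture: compact, suited to View Independent (VI) rendering."""
    vertices: np.ndarray   # (V, 3) vertex positions
    faces: np.ndarray      # (F, 3) vertex indices per triangle
    uv: np.ndarray         # (V, 2) UV coordinates into one texture atlas
    texture: np.ndarray    # (H, W, 3) single texture image

@dataclass
class MeshMultiTextureModel:
    """Mesh + multi-texture: one captured image per imaging camera, blended at
    render time depending on the virtual viewpoint (View Dependent (VD) rendering)."""
    vertices: np.ndarray                                  # (V, 3)
    faces: np.ndarray                                     # (F, 3)
    camera_images: List[np.ndarray]                       # one (H, W, 3) image per camera
    camera_params: List[Tuple[np.ndarray, np.ndarray]]    # (intrinsics, extrinsics) per camera
```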
[2. Existing technology]
 Next, existing techniques related to volumetric capture as described above, and their problems, will be described.
(Example in which a plurality of subjects are included)
 FIG. 2 is a diagram for explaining a problem of an example of the existing technology. In the existing technology, a plurality of three-dimensional models 51 1 to 51 3 are included in a single piece of three-dimensional data 50, as shown in section (a) of FIG. 2. The three-dimensional models 51 1 to 51 3 are objects in the virtual space obtained by giving three-dimensional information to the images of the subjects in real space included in the captured images.
 In this case, since the three-dimensional models 51 1 to 51 3 cannot be separated and recognized, it is difficult to obtain sufficient quality when rendering each of the three-dimensional models 51 1 to 51 3. That is, in order to render each of the three-dimensional models 51 1 to 51 3 with sufficient quality, each of the three-dimensional models 51 1 to 51 3 must be treated as independent data 52 1 to 52 3, as shown in section (b) of FIG. 2.

 In addition, when a plurality of three-dimensional models 51 1 to 51 3 are included in a single piece of three-dimensional data 50, post-effect processing that applies an effect to each of the rendered three-dimensional models 51 1 to 51 3 may become difficult.

 FIG. 3 is a diagram for explaining a problem of another example of the existing technology. Consider a case where a single piece of three-dimensional data 50 includes a plurality of three-dimensional models 51 1 to 51 3, as shown in section (a) of FIG. 3. In this case, for example, as shown by the three-dimensional data 500 after post-effect processing in section (b) of FIG. 3, it was difficult to selectively apply effect processing (non-display processing in this example) to the three-dimensional models 51 2 and 51 3 among the three-dimensional models 51 1 to 51 3.

 In order to apply effect processing to a specific subject among the plurality of three-dimensional models 51 1 to 51 3 in this way, separation processing for separating the individual three-dimensional models 51 1 to 51 3 is required. The existing technology does not consider such separation of the plurality of three-dimensional models 51 1 to 51 3.
(Selection of the imaging camera according to the position of the virtual camera)
 Next, a problem of still another example of the existing technology will be described. In the existing technology, when a plurality of three-dimensional models based on a plurality of subjects are included in one piece of data, an optimum texture may not be selected, depending on the position of the virtual camera, when selecting the imaging camera from which the texture to be applied to each three-dimensional model is obtained.
 FIG. 4 is a diagram for explaining an example of a method, according to the existing technology, of selecting an imaging camera that acquires a texture image to be applied to a subject. The example of FIG. 4 shows a subject 80 in real space and a plurality of imaging cameras 60 1 to 60 8 surrounding the subject 80 in real space. In the example of FIG. 4, a reference position 81 is shown as the reference position of the subject 80. FIG. 4 also shows a virtual camera 70 arranged in the virtual space.

 In the following description, it is assumed that the coordinates in the real space and the coordinates in the virtual space coincide, and unless otherwise specified, the description is given without distinguishing between the real space and the virtual space. For example, it is assumed that the real space and the virtual space have the same scale, and the position of an object (a subject, an imaging camera, and so on) placed in the real space can be directly replaced with a position in the virtual space. Similarly, the positions of, for example, the three-dimensional model and the virtual camera 70 in the virtual space can be directly replaced with positions in the real space.

 As the reference position 81 of the subject 80, the position corresponding to the point of the subject 80 closest to the optical axes of all the imaging cameras 60 1 to 60 8 can be used. The reference position 81 of the subject 80 is not limited to this, and may be a position intermediate between the maximum value and the minimum value of the vertex coordinates of the subject 80, or the position regarded as most important in the subject 80 (for example, the position of the face if the subject corresponding to the subject 80 is a person).

 A method is known in which, when the three-dimensional model corresponding to the subject 80 is viewed from the position of the virtual camera 70, the imaging camera optimum for acquiring the texture to be applied to the three-dimensional model is selected based on the importance of each of the imaging cameras 60 1 to 60 8. The importance can be determined, for example, based on the angle formed by the position of the virtual camera 70 and the position of each of the imaging cameras 60 1 to 60 8, with the reference position 81 as the vertex.

 In the example of FIG. 4, the angle θ 1 formed with respect to the reference position 81 by the position of the virtual camera 70 and the imaging camera 60 1 is the smallest, and the angle θ 2 formed with the imaging camera 60 2 is the next smallest. Therefore, with respect to the position of the virtual camera 70, the imaging camera 60 1 has the highest importance, and the imaging camera 60 2 has the next highest importance after the imaging camera 60 1.
 Specifically, the importance P(i) of each of the imaging cameras 60 1 to 60 8 can be calculated by the following equation (1).

P(i) = arccos(C i · C v)   ... (1)

 In equation (1), the value i represents each of the imaging cameras 60 1 to 60 8. The value C i represents the vector from the imaging camera i to the reference position 81, and the value C v represents the vector from the virtual camera 70 to the reference position 81. That is, equation (1) obtains the importance P(i) of the imaging cameras 60 1 to 60 8 based on the inner product of the vectors from each of the imaging cameras 60 1 to 60 8 and from the virtual camera 70 to the reference position 81.
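 As a concrete illustration, equation (1) can be computed as follows, under the assumption (implied by the use of arccos) that the vectors C i and C v are normalized before taking the inner product. The camera positions in the example are arbitrary values introduced only for illustration.

```python
import numpy as np

def importance(cam_pos, virtual_cam_pos, ref_pos):
    """P(i) = arccos(C_i . C_v): the angle between the (normalized) vector from
    imaging camera i to the reference position and the vector from the virtual
    camera to the reference position.  A smaller value means a smaller angle
    and therefore a higher importance."""
    ci = np.asarray(ref_pos, float) - np.asarray(cam_pos, float)
    cv = np.asarray(ref_pos, float) - np.asarray(virtual_cam_pos, float)
    ci /= np.linalg.norm(ci)
    cv /= np.linalg.norm(cv)
    return float(np.arccos(np.clip(np.dot(ci, cv), -1.0, 1.0)))

# The imaging camera whose viewing direction toward the reference position is
# closest to that of the virtual camera gets the smallest P(i).
ref = (0.0, 0.0, 0.0)
virtual = (0.0, -2.0, 0.0)
cams = {"60_1": (0.0, -3.0, 0.0), "60_2": (3.0, 0.0, 0.0), "60_5": (0.0, 3.0, 0.0)}
print({name: round(importance(p, virtual, ref), 3) for name, p in cams.items()})
```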
 According to the existing technology, when selecting the optimum imaging camera from a plurality of imaging cameras that capture a plurality of subjects, an unintended imaging camera may be selected as the optimum imaging camera.
(First example of imaging camera selection by the existing technology)
 First, a first example of imaging camera selection by the existing technology will be described. FIGS. 5A and 5B are schematic diagrams for explaining the first example of imaging camera selection according to the existing technology. The first example is an example in which the imaging camera is selected based on vectors toward a reference position. That is, FIGS. 5A and 5B show an example in which, as described with reference to FIG. 4, when a plurality of subjects 82 1 and 82 2 are included, the optimum imaging camera is selected based on the vector C i from each of the imaging cameras 60 1 to 60 16 to the reference position and the vector C v from the virtual camera 70 to the reference position.
 In FIGS. 5A and 5B, two subjects 82 1 and 82 2 are included in an imaging range 84. The subject 82 1 is positioned at the upper left corner of the imaging range 84 in the figure, and the subject 82 2 is positioned at the lower right corner of the imaging range 84 in the figure. Sixteen imaging cameras 60 1 to 60 16, each having an angle of view β, are arranged surrounding the imaging range 84 with their imaging directions facing the center of the imaging range 84. The virtual camera 70 has an angle of view α, and the three-dimensional model corresponding to the subject 82 1 is assumed to fit within the angle of view α. Since the positions of the subjects 82 1 and 82 2 are unknown, the center of the imaging range 84, or the center of gravity of the subjects 82 1 and 82 2, is adopted as the reference position 83.

 In the following description, unless otherwise specified, the three-dimensional models corresponding to the subjects 82 1 and 82 2 are simply referred to as the subjects 82 1 and 82 2.

 FIG. 5A shows an example in which the virtual camera 70 is on the near side of the reference position 83 with respect to the subject 82 1. When the subject 82 1 is imaged by the virtual camera 70, ideally, the imaging camera 60 1 located on the straight line 93a passing from the subject 82 1 through the virtual camera 70 is the optimum imaging camera.

 In this example, however, the direction of the vector 91a from the virtual camera 70 to the reference position 83 and the direction of the vector 90a from the imaging camera 60 16 to the reference position 83 substantially coincide, so the imaging camera 60 16 is selected as the optimum camera. The imaging camera 60 16 differs from the ideal optimum imaging camera 60 1 in position and in direction with respect to the subject 82 1. Therefore, the texture based on the image captured by the imaging camera 60 16 is lower in quality than the texture based on the image captured by the imaging camera 60 1.

 FIG. 5B shows an example in which the virtual camera 70 is positioned between the subject 82 1 and the reference position 83. Also in this case, ideally, the imaging camera 60 1 located on the straight line 93b passing from the subject 82 2 through the virtual camera 70 is the optimum imaging camera.

 In this example, however, the reference position 83 is on the side opposite to the subject 82 1 with respect to the virtual camera 70, so the vector 91b from the virtual camera 70 to the reference position 83 points away from the subject 82 1. As a result, the direction of the vector 90b from the imaging camera 60 11, which is located beyond the subject 82 1 as viewed from the virtual camera 70, to the reference position 83 becomes close to the direction of the vector 91b, and the imaging camera 60 11 is selected as the optimum imaging camera. The imaging camera 60 11 images a surface of the subject 82 1 that cannot be seen from the virtual camera 70. Therefore, the texture based on the image captured by the imaging camera 60 11 is greatly degraded in quality compared to the texture based on the image captured by the ideal imaging camera 60 2.
(Second example of imaging camera selection by the existing technology)
 Next, a second example of imaging camera selection by the existing technology will be described. The method of selecting the optimum imaging camera is not limited to the selection method based on vectors toward the reference position described above. In the second example of imaging camera selection by the existing technology, the optimum imaging camera is selected from the imaging cameras 60 1 to 60 16 based on the angle between the optical axis of the virtual camera 70 and the vector of each of the imaging cameras 60 1 to 60 16 with respect to the subject 82 1.
FIGS. 6A and 6B are schematic diagrams for explaining the second example of imaging camera selection by the existing technology. In FIGS. 6A and 6B, the subjects 82 1 and 82 2, the reference position 83, and the imaging range 84 are the same as in FIGS. 5A and 5B described above, so their description is omitted here.
FIG. 6A corresponds to FIG. 5A described above and shows an example in which the virtual camera 70 is positioned closer to the subject 82 1 than the reference position 83 is. When the subject 82 1 is imaged by the virtual camera 70, ideally, the imaging camera 60 1 located on the straight line 93a passing from the subject 82 1 through the virtual camera 70 is the optimum imaging camera.
In the example of FIG. 6A, the virtual camera 70 faces upward in the figure, and its optical axis 94a points upward. In FIG. 6A, among the imaging cameras 60 1 to 60 16, the angle between the direction of the vector 90c from the imaging camera 60 1 to the reference position 83 and the optical axis 94a of the virtual camera 70 is the smallest. Therefore, the imaging camera 60 1, which is the same as the ideal optimum imaging camera, is selected as the optimum imaging camera, and a high-quality texture can be obtained.
FIG. 6B corresponds to FIG. 5B described above and shows an example in which the virtual camera 70 is positioned between the subject 82 1 and the reference position 83. In this case, ideally, the imaging camera 60 2 located on the straight line 93c passing from the subject 82 2 through the virtual camera 70 is the optimum imaging camera.
In the example of FIG. 6B, as in FIG. 6A, the virtual camera 70 faces upward in the figure, and its optical axis 94b points upward. In FIG. 6B, among the imaging cameras 60 1 to 60 16, the angle between the direction of the vector 90c from the imaging camera 60 1 to the reference position 83 and the optical axis 94b of the virtual camera 70 is the smallest. Therefore, the imaging camera 60 1, which differs from the optimum imaging camera 60 2 in the ideal case, is selected as the optimum imaging camera. As a result, the texture based on the image captured by the imaging camera 60 1 is lower in quality than the texture based on the image captured by the imaging camera 60 2.
The information processing system according to the embodiment of the present disclosure obtains the position of each subject at the time of generating the corresponding three-dimensional model. Then, when rendering the three-dimensional model based on each subject, the information processing system uses the subject positions obtained at model generation time to select the imaging camera used to acquire the texture to be applied to that three-dimensional model.
Therefore, by applying the information processing system according to the embodiment, even when a plurality of subjects are included in the input, the imaging camera used to acquire the texture to be applied to each three-dimensional model can be selected appropriately, and a high-quality texture can be obtained. Furthermore, by using the position information added to each three-dimensional model, post-effect processing can be applied to each three-dimensional model individually.
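For concreteness, the per-model position information described here can be pictured as a small record attached to each separated model and handed from the model generation side to the rendering side. The following Python sketch is only an illustration under assumed names and fields (SubjectModel, center, and the (xmin, xmax, ymin, ymax, zmin, zmax) tuple are not the data format of the present disclosure); the bounding-box representation itself is described in section 3-2-1 below.

from dataclasses import dataclass
import numpy as np

@dataclass
class SubjectModel:
    """A separated three-dimensional model with the position information attached to it."""
    vertices: np.ndarray   # (N, 3) mesh vertices of this subject
    faces: np.ndarray      # (M, 3) vertex indices of the mesh faces
    bounding_box: tuple    # (xmin, xmax, ymin, ymax, zmin, zmax) position information

    def center(self) -> np.ndarray:
        """Candidate reference position for camera selection, e.g. the box center."""
        xmin, xmax, ymin, ymax, zmin, zmax = self.bounding_box
        return np.array([(xmin + xmax) / 2.0, (ymin + ymax) / 2.0, (zmin + zmax) / 2.0])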
[3. Embodiment of the Present Disclosure]
Next, an embodiment of the present disclosure will be described.
(3-1. Configuration example of the information processing system according to the embodiment)
First, a configuration example of the information processing system according to the embodiment will be described. FIG. 7 is a functional block diagram showing an example of the functions of the information processing system according to the embodiment. In FIG. 7, the information processing system 100 includes a data acquisition unit 110, a 3D (three-dimensional) model generation unit 111, a formatting unit 112, a transmission unit 113, a reception unit 120, a rendering unit 121, and a display unit 122.
The information processing system 100 may be configured by, for example, an information processing device for 3D model output including the data acquisition unit 110, the 3D model generation unit 111, the formatting unit 112, and the transmission unit 113, and an information processing device for display information output including the reception unit 120, the rendering unit 121, and the display unit 122. Alternatively, the information processing system 100 can be configured as a single computer device (information processing device).
The data acquisition unit 110, the 3D model generation unit 111, the formatting unit 112, the transmission unit 113, the reception unit 120, the rendering unit 121, and the display unit 122 are realized, for example, by executing the information processing program according to the embodiment on a CPU (Central Processing Unit). Alternatively, some or all of these units may be realized by hardware circuits that operate in cooperation with one another.
The data acquisition unit 110 acquires image data for generating a 3D model of the subject. FIG. 8 is a schematic diagram showing an example configuration, applicable to the embodiment, for acquiring image data of the subject. As shown in FIG. 8, a plurality of captured images taken from a plurality of viewpoints by a plurality of imaging cameras 60 1, 60 2, 60 3, ..., 60 n-1, 60 n arranged so as to surround the subject 80 are acquired as image data. In this case, the captured images from the plurality of viewpoints are preferably images captured in synchronization by the plurality of imaging cameras 60 1 to 60 n.
Alternatively, the data acquisition unit 110 may acquire, as image data, a plurality of captured images obtained by imaging the subject 80 from a plurality of viewpoints with a single imaging camera. As an example, this image data acquisition method is applicable when the position of the subject 80 is fixed.
Note that the data acquisition unit 110 may perform calibration based on the image data and acquire the internal parameters and external parameters of each of the imaging cameras 60 1 to 60 n. The data acquisition unit 110 may also acquire, for example, a plurality of pieces of depth information indicating the distances from a plurality of viewpoints to the subject 80.
The 3D model generation unit 111 generates a three-dimensional model having three-dimensional information of the subject 80 based on the image data acquired by the data acquisition unit 110, that is, the images of the subject 80 captured from a plurality of viewpoints.
The 3D model generation unit 111 generates a three-dimensional model of the subject 80 by carving the three-dimensional shape of the subject 80 using images from a plurality of viewpoints (for example, silhouette images from a plurality of viewpoints), for example by the so-called Visual Hull method. In this case, the 3D model generation unit 111 can further deform the three-dimensional model generated by Visual Hull with high accuracy, using a plurality of pieces of depth information indicating the distances from a plurality of viewpoints to the subject 80.
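As a rough illustration of the Visual Hull idea referred to here (a sketch under assumed camera conventions, not the actual implementation of the 3D model generation unit 111), the following Python function carves a voxel grid with per-camera silhouette masks: a voxel is kept only if it projects inside the silhouette in every camera. The 'K' and 'RT' camera fields are hypothetical placeholders for intrinsic and extrinsic parameters.

import numpy as np

def visual_hull(voxel_centers, cameras, silhouettes):
    """Keep only voxels whose projection falls inside every silhouette.

    voxel_centers: (N, 3) candidate voxel centers in world coordinates.
    cameras: list of dicts with 'K' (3x3 intrinsics) and 'RT' (3x4 extrinsics);
             these field names are placeholders, not the notation of the disclosure.
    silhouettes: list of (H, W) boolean masks, one per camera (True = subject).
    """
    keep = np.ones(len(voxel_centers), dtype=bool)
    homog = np.hstack([voxel_centers, np.ones((len(voxel_centers), 1))])
    for cam, sil in zip(cameras, silhouettes):
        # Project all voxel centers into this camera.
        proj = (cam['K'] @ (cam['RT'] @ homog.T)).T          # (N, 3) homogeneous pixels
        in_front = proj[:, 2] > 0
        uv = np.zeros((len(proj), 2))
        uv[in_front] = proj[in_front, :2] / proj[in_front, 2:3]
        u = np.round(uv[:, 0]).astype(int)
        v = np.round(uv[:, 1]).astype(int)
        h, w = sil.shape
        inside = in_front & (u >= 0) & (u < w) & (v >= 0) & (v < h)
        hit = np.zeros(len(voxel_centers), dtype=bool)
        hit[inside] = sil[v[inside], u[inside]]
        # A voxel outside any silhouette cannot belong to the subject, so carve it away.
        keep &= hit
    return voxel_centers[keep]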
Since the three-dimensional model generated by the 3D model generation unit 111 is generated from images captured by the imaging cameras 60 1 to 60 n in the real space, it can be said to be a live-action three-dimensional model.
The 3D model generation unit 111 can express the generated three-dimensional model in the form of, for example, mesh data. Mesh data expresses the shape information representing the surface shape of the subject 80 as connections between vertices, called a polygon mesh. The method of expressing the three-dimensional model generated by the 3D model generation unit 111 is not limited to mesh data. For example, the 3D model generation unit 111 may describe the generated three-dimensional model by a so-called point cloud representation, which expresses the model by the position information of points.
In association with the three-dimensional model of the subject 80, the 3D model generation unit 111 also generates color information data of the subject 80 as a texture. For example, the 3D model generation unit 111 can generate a View Independent (VI) texture, which has a constant color regardless of the direction from which it is observed. Alternatively, the 3D model generation unit 111 may generate a View Dependent (VD) texture, whose color changes depending on the viewing direction.
The formatting unit 112 converts the data of the three-dimensional model generated by the 3D model generation unit 111 into data in a format suitable for transmission and storage. For example, the formatting unit 112 can convert the three-dimensional model generated by the 3D model generation unit 111 into a plurality of two-dimensional images by perspective projection from a plurality of directions. In this case, the formatting unit 112 may use the three-dimensional model to generate depth information in the form of two-dimensional depth images from a plurality of viewpoints.
The formatting unit 112 compresses and encodes the depth information in this two-dimensional image form together with the color information, and outputs the result to the transmission unit 113. The formatting unit 112 may transmit the depth information and the color information side by side as a single image, or as two separate images. In this case, since the transmitted data takes the form of two-dimensional image data, the formatting unit 112 can compress and encode it using a compression technique for two-dimensional images such as AVC (Advanced Video Coding).
The formatting unit 112 may also convert the three-dimensional model into a point cloud format. Furthermore, the formatting unit 112 may output the three-dimensional model to the transmission unit 113 as three-dimensional data. In this case, the formatting unit 112 can use, for example, the Geometry-based Approach three-dimensional compression technology discussed in MPEG (Moving Picture Experts Group).
The transmission unit 113 transmits the transmission data generated by the formatting unit 112. The transmission unit 113 may transmit the transmission data after the series of processes by the data acquisition unit 110, the 3D model generation unit 111, and the formatting unit 112 has been performed offline, or may transmit the transmission data generated by that series of processes in real time.
The reception unit 120 receives the transmission data transmitted from the transmission unit 113.
The rendering unit 121 performs rendering according to the position of the virtual camera 70 using the transmission data received by the reception unit 120. For example, the mesh data of the three-dimensional model is projected from the viewpoint of the virtual camera 70 that performs the drawing, and texture mapping is performed to paste textures representing colors and patterns onto the mesh. The rendered image can be viewed from a freely and arbitrarily set viewpoint of the virtual camera 70, regardless of the positions of the imaging cameras 60 1 to 60 n at the time of capture.
For example, the rendering unit 121 performs texture mapping in which a texture representing the color, pattern, or material appearance of the mesh is pasted according to the position of the mesh of the three-dimensional model. The rendering unit 121 may perform texture mapping by the VD method, which takes the viewpoint of the user (virtual camera 70) into account, or by the VI method, which does not take the user's viewpoint into account.
The VD method changes the texture pasted onto the three-dimensional model according to the position of the user's viewpoint (the viewpoint of the virtual camera 70). The VD method therefore has the advantage of realizing higher-quality rendering than the VI method. On the other hand, since the VI method does not take the position of the user's viewpoint into account, it has the advantage of requiring less processing than the VD method.
Note that the user's viewpoint data may be input from the display device to the rendering unit 121, with the display device detecting, for example, the user's gaze point (region of interest). The rendering unit 121 may also employ billboard rendering, in which an object is rendered so as to maintain a posture perpendicular to the user's viewing direction. For example, when rendering a plurality of objects, the rendering unit 121 may render objects of low interest to the user by billboard rendering and render the other objects by another rendering method.
The display unit 122 causes the display device to display the image rendered by the rendering unit 121. The display device may be, for example, a head-mounted display or a spatial display, or a display device of an information apparatus such as a smartphone, a television receiver, or a personal computer. The display device may be a 2D monitor for two-dimensional display or a 3D monitor for three-dimensional display.
(Another configuration example of the information processing system according to the embodiment)
The information processing system 100 shown in FIG. 7 illustrates the series of flows from the data acquisition unit 110, which acquires the captured images used as the material for generating content, to the display control unit that controls the display device observed by the user. However, this does not mean that all the functional blocks are required to implement the embodiment; the embodiment can be implemented for each functional block or for a combination of functional blocks. For example, in FIG. 7, the transmission unit 113 and the reception unit 120 are provided to show the series of flows from the side that creates the content (three-dimensional model) to the side that observes the content through distribution of the content data. However, when everything from content production to observation is performed by the same information processing device (for example, a personal computer), the information processing system 100 can omit the formatting unit 112, the transmission unit 113, and the reception unit 120.
The information processing system 100 according to the embodiment may be implemented entirely by a single implementer, or each functional block may be implemented by a different implementer. As one example, a business operator A generates 3D content (a three-dimensional model) using the data acquisition unit 110, the 3D model generation unit 111, and the formatting unit 112. The 3D content is then distributed through the transmission unit 113 (platform) of a business operator B, and the display device of a business operator C receives, renders, and controls the display of the 3D content.
Each functional block shown in FIG. 7 can also be implemented on a cloud network. For example, the rendering unit 121 may be implemented in the display device or in a server on a cloud network. In the latter case, information is exchanged between the display device and the server.
In FIG. 7, the data acquisition unit 110, the 3D model generation unit 111, the formatting unit 112, the transmission unit 113, the reception unit 120, the rendering unit 121, and the display unit 122 have been collectively described as the information processing system 100. However, this example is not limiting: in the embodiment, any configuration in which two or more of these functional blocks are involved is referred to as the information processing system 100. For example, the data acquisition unit 110, the 3D model generation unit 111, the formatting unit 112, the transmission unit 113, the reception unit 120, and the rendering unit 121, excluding the display unit 122, may collectively be referred to as the information processing system 100.
(Hardware configuration example of an information processing device applicable to the embodiment)
FIG. 9 is a block diagram showing an example hardware configuration of an information processing device applicable to the information processing system 100 according to the embodiment. The information processing device 2000 shown in FIG. 9 is applicable to both the information processing device for 3D model output and the information processing device for display information output described above. The information processing device 2000 shown in FIG. 9 is also applicable to a configuration that includes the whole of the information processing system 100 shown in FIG. 7.
In FIG. 9, the information processing device 2000 includes a CPU (Central Processing Unit) 2100, a ROM (Read Only Memory) 2101, a RAM (Random Access Memory) 2102, an interface (I/F) 2103, an input unit 2104, an output unit 2105, a storage device 2106, a communication I/F 2107, and a drive device 2108.
The CPU 2100, the ROM 2101, the RAM 2102, and the I/F 2103 are communicably connected to one another via a bus 2110. The input unit 2104, the output unit 2105, the storage device 2106, the communication I/F 2107, and the drive device 2108 are connected to the I/F 2103 and can communicate with the CPU 2100 and the other components via the I/F 2103 and the bus 2110.
The storage device 2106 is a nonvolatile storage medium such as a hard disk drive or a flash memory. The CPU 2100 controls the overall operation of the information processing device 2000 in accordance with programs stored in the ROM 2101 and the storage device 2106, using the RAM 2102 as a work memory.
The input unit 2104 accepts input of data to the information processing device 2000. As the input unit 2104, an input device for inputting data in response to user operation, such as a pointing device (for example, a mouse), a keyboard, a touch panel, a joystick, or a controller, can be used. The input unit 2104 can also include various input terminals for inputting data from external devices, and can further include a sound pickup device such as a microphone.
The output unit 2105 is responsible for outputting information from the information processing device 2000. As the output unit 2105, a display device such as a display can be used. The output unit 2105 can also include a sound output device such as a speaker, and can further include various output terminals for outputting data to external devices.
When the information processing device 2000 executes the processing of the rendering unit 121, the output unit 2105 preferably includes a GPU (Graphics Processing Unit). The GPU has a memory for graphics processing (GPU memory).
The communication I/F 2107 controls communication via a network such as a LAN (Local Area Network) or the Internet. The drive device 2108 drives removable media such as optical disks, magneto-optical disks, flexible disks, and semiconductor memories to read and write data.
For example, when the information processing device 2000 is used as the information processing device for 3D model output, the CPU 2100 executes the information processing program according to the embodiment to configure the data acquisition unit 110, the 3D model generation unit 111, the formatting unit 112, and the transmission unit 113 described above, for example as modules, in the main storage area of the RAM 2102.
Similarly, when the information processing device 2000 is used as the information processing device for display information output, the CPU 2100 executes the information processing program according to the embodiment to configure the reception unit 120, the rendering unit 121, and the display unit 122, for example as modules, in the main storage area of the RAM 2102.
These information processing programs can be acquired from the outside (for example, from a server device) via a network such as a LAN or the Internet through communication via the communication I/F 2107 and installed on the information processing device 2000. Alternatively, the information processing programs may be provided stored in a removable storage medium such as a CD (Compact Disc), a DVD (Digital Versatile Disc), or a USB (Universal Serial Bus) memory.
(Outline of processing according to the embodiment)
FIG. 10 is an example flowchart schematically showing the processing in the information processing system 100 according to the embodiment. Prior to the processing of the flowchart of FIG. 10, the subject 80 is imaged by the multiple imaging cameras 60 1 to 60 n, as described with reference to FIG. 8.
When the processing of the flowchart of FIG. 10 is started, in step S10 the information processing system 100 causes the data acquisition unit 110 to acquire captured image data for generating a three-dimensional model of the subject 80. In the next step S11, the information processing system 100 causes the 3D model generation unit 111 to generate a three-dimensional model having three-dimensional information of the subject 80 based on the captured image data acquired in step S10.
In the next step S12, the information processing system 100 causes the formatting unit 112 to encode the shape and texture data of the three-dimensional model generated in step S11 into a format suitable for transmission and storage. In the next step S13, the information processing system 100 causes the transmission unit 113 to transmit the data encoded in step S12.
In the next step S14, the information processing system 100 causes the reception unit 120 to receive the data transmitted in step S13. The reception unit 120 decodes the received data and restores the shape and texture data of the three-dimensional model.
In the next step S15, the information processing system 100 causes the rendering unit 121 to perform rendering using the shape and texture data passed from the reception unit 120 and to generate image data for displaying the three-dimensional model. In the next step S16, the information processing system 100 causes the display unit 122 to display the image data generated by the rendering on the display device.
When the processing of step S16 ends, the series of processes of the flowchart of FIG. 10 ends.
(3-2. 3D model generation processing according to the embodiment)
Next, the three-dimensional model generation processing performed by the 3D model generation unit 111 according to the embodiment in step S11 of FIG. 10 will be described in more detail.
(3-2-1. Overview of the 3D model generation processing according to the embodiment)
FIG. 11 is a schematic diagram schematically showing three-dimensional model generation processing applicable to the embodiment. As shown in section (a) of FIG. 11, the 3D model generation unit 111 generates, based on captured images taken from different viewpoints, three-dimensional data 50 including a plurality of three-dimensional models 51 1 to 51 3, each based on, for example, a subject in the real space. Various methods are conceivable for adding position information to each of the three-dimensional models 51 1 to 51 3. Here, position information is added to each of the three-dimensional models 51 1 to 51 3 using bounding boxes.
Section (b) of FIG. 11 shows an example of the bounding boxes. For each of the three-dimensional models 51 1, 51 2, and 51 3, the circumscribing rectangular parallelepiped is obtained as a three-dimensional bounding box 200 1, 200 2, or 200 3, respectively. The vertices of these three-dimensional bounding boxes 200 1 to 200 3 are used as the position information indicating the positions of the corresponding three-dimensional models 51 1 to 51 3.
In the example of section (b), the position BoundingBox[0] of the three-dimensional bounding box 200 1 for the three-dimensional model 51 1 is expressed by the following formula (2), where the x axis is the horizontal direction of the figure, the y axis is the height direction of the figure, and the z axis is the depth direction of the figure. Here, "min" and "max" denote the minimum and maximum values of the bounding box in the corresponding direction, respectively.
BoundingBox[0] = (xmin0, xmax0, ymin0, ymax0, zmin0, zmax0)  …(2)
The positions BoundingBox[1] and BoundingBox[2] of the three-dimensional bounding boxes 200 2 and 200 3 for the three-dimensional models 51 2 and 51 3 are similarly expressed by the following formulas (3) and (4).
BoundingBox[1] = (xmin1, xmax1, ymin1, ymax1, zmin1, zmax1)  …(3)
BoundingBox[2] = (xmin2, xmax2, ymin2, ymax2, zmin2, zmax2)  …(4)
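As a minimal sketch of this representation (not the unit's implementation), the following Python function computes an axis-aligned bounding box in the (xmin, xmax, ymin, ymax, zmin, zmax) form of formulas (2) to (4) from the vertex array of one separated model; the example vertex values are arbitrary.

import numpy as np

def bounding_box(vertices):
    """Return (xmin, xmax, ymin, ymax, zmin, zmax) for an (N, 3) vertex array."""
    mins = vertices.min(axis=0)
    maxs = vertices.max(axis=0)
    return (mins[0], maxs[0], mins[1], maxs[1], mins[2], maxs[2])

# Example: position information for a hypothetical model corresponding to BoundingBox[0].
vertices_51_1 = np.array([[0.2, 0.0, 1.5], [0.8, 1.7, 1.9], [0.5, 0.9, 1.2]])
bounding_box_0 = bounding_box(vertices_51_1)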
(3-2-2. Configuration example of the 3D model generation unit according to the embodiment)
FIG. 12 is a block diagram showing an example configuration of the 3D model generation unit 111 according to the embodiment. In FIG. 12, the 3D model generation unit 111 includes a 3D model processing unit 1110 and a 3D model separation unit 1111.
The captured image data of the imaging cameras 60 1 to 60 n and the imaging camera information output from the data acquisition unit 110 are input to the 3D model generation unit 111. The imaging camera information may include color information, depth information, camera parameter information, and the like. The camera parameter information includes, for example, information on the position, direction, and angle of view β of each of the imaging cameras 60 1 to 60 n, and may further include zoom information, shutter speed information, aperture information, and the like. The imaging camera information of each of the imaging cameras 60 1 to 60 n is passed to the 3D model processing unit 1110 and is also output from the 3D model generation unit 111.
Based on the captured image data and the imaging camera information of the imaging cameras 60 1 to 60 n, the 3D model processing unit 1110 generates the vertex and face data of each subject by carving the three-dimensional shape of the subjects included in the imaging range by the Visual Hull method described above. More specifically, the 3D model processing unit 1110 acquires in advance, for each of the imaging cameras 60 1 to 60 n, an image of the background of the real space in which the subjects are placed. A silhouette image of the subjects is generated based on the difference between each image captured by the imaging cameras 60 1 to 60 n and the corresponding background image. By carving the three-dimensional space with these silhouette images, the three-dimensional shape of the subjects can be obtained as vertex and face data.
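One simple way to obtain such a silhouette image is background differencing. The following Python sketch assumes a static background and a fixed camera and simply thresholds the per-pixel difference between a captured frame and the pre-acquired background image; the threshold value is an arbitrary example, not a value taken from the present disclosure.

import numpy as np

def silhouette(frame, background, threshold=30.0):
    """Return a boolean mask that is True where the frame differs from the background.

    frame, background: (H, W, 3) uint8 color images from the same imaging camera.
    threshold: per-pixel difference (in gray levels) above which a pixel is treated
               as foreground; 30.0 is an illustrative value only.
    """
    diff = np.abs(frame.astype(np.float32) - background.astype(np.float32))
    # Use the maximum difference over the color channels as the foreground measure.
    return diff.max(axis=2) > threshold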
In this way, the 3D model processing unit 1110 functions as a generation unit that generates three-dimensional data based on captured images captured by one or more imaging cameras.
The 3D model processing unit 1110 outputs the generated vertex and face data of the subjects as mesh information. The mesh information output from the 3D model processing unit 1110 is output from the 3D model generation unit 111 and is also passed to the 3D model separation unit 1111. The 3D model separation unit 1111 separates the individual subjects based on the mesh information passed from the 3D model processing unit 1110 and generates position information for each subject.
(3-2-3. Specific example of the 3D model generation processing according to the embodiment)
The subject separation processing by the 3D model separation unit 1111 will now be described in more detail. FIG. 13 is a schematic diagram for explaining the subject separation processing according to the embodiment, and FIG. 14 is an example flowchart showing the subject separation processing according to the embodiment.
First, as shown in section (a) of FIG. 13, in the 3D model generation unit 111, the 3D model processing unit 1110 generates, by Visual Hull, three-dimensional data 50 including a plurality of three-dimensional models 51 1 to 51 3 based on captured images taken from different viewpoints.
In step S100 of FIG. 14, the 3D model separation unit 1111 projects the three-dimensional data 50 in the height direction (y-axis direction) to generate two-dimensional silhouette information for each of the three-dimensional models 51 1 to 51 3. Section (b) of FIG. 13 shows an example of the two-dimensional silhouettes 52 1 to 52 3 based on the three-dimensional models 51 1 to 51 3.
In the next step S101, the 3D model separation unit 1111 performs clustering on the two-dimensional plane based on the silhouettes 52 1 to 52 3 and detects blobs (clusters). In the next step S102, the 3D model separation unit 1111 sets the loop variable i for the number of detected blobs to i = 0. The subsequent processing of steps S103 to S105 is performed for each blob detected in step S101.
In step S103, as shown in section (c) of FIG. 13, the 3D model separation unit 1111 obtains the rectangle circumscribing the i-th detected blob as a two-dimensional bounding box; the two-dimensional bounding boxes 53 1 to 53 3 correspond to the three-dimensional models 51 1 to 51 3, respectively.
In the next step S104, the 3D model separation unit 1111 adds height information to the two-dimensional bounding box 53 1 obtained in step S103 to generate a three-dimensional bounding box 200 1, as shown in section (d) of FIG. 13. The height information added to the two-dimensional bounding box 53 1 can be obtained, for example, based on the three-dimensional data 50 shown in section (a) of FIG. 13.
The 3D model separation unit 1111 similarly gives height information to the two-dimensional bounding boxes 53 2 and 53 3 to generate the three-dimensional bounding boxes 200 2 and 200 3, respectively.
In the next step S105, the 3D model separation unit 1111 determines whether or not all the blobs detected in step S101 have been processed. For example, when m blobs are detected in step S101, the 3D model separation unit 1111 determines that the processing for all blobs has not been completed if the loop variable i satisfies i < m - 1. When the 3D model separation unit 1111 determines that the processing for all blobs has not been completed (step S105, "No"), it sets the loop variable i to i = i + 1 and returns the processing to step S103.
In the example of FIG. 13, when the loop variable i is i = 1, the two-dimensional bounding box 53 2 is obtained in step S103, and height information is added to it in the next step S104 to generate the three-dimensional bounding box 200 2. Similarly, when the loop variable i is i = 2, the two-dimensional bounding box 53 3 is obtained in step S103, and height information is added to it in the next step S104 to generate the three-dimensional bounding box 200 3.
When the 3D model separation unit 1111 determines in step S105 that the processing for all blobs has been completed (step S105, "Yes"), it ends the series of processes of the flowchart of FIG. 14.
In this way, the three-dimensional bounding boxes 200 1 to 200 3 corresponding to the three-dimensional models 51 1 to 51 3 are generated. Then, based on the vertex coordinates of each of these three-dimensional bounding boxes 200 1 to 200 3, position information indicating the position of each of the three-dimensional models 51 1 to 51 3 is obtained.
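The following Python sketch traces steps S100 to S105 under simplifying assumptions: the three-dimensional data is given as a single vertex array, the projection along the y axis is rasterized onto a coarse x-z occupancy grid, and the blobs are found by connected-component labeling (scipy.ndimage.label). The grid cell size and the helper names are illustrative and not taken from the present disclosure.

import numpy as np
from scipy import ndimage

def separate_subjects(vertices, cell=0.05):
    """Split the vertices of the 3D data into per-subject 3D bounding boxes.

    vertices: (N, 3) array of (x, y, z) vertex positions of the whole scene.
    cell: edge length of one cell of the ground-plane raster (illustrative value).
    Returns a list of (xmin, xmax, ymin, ymax, zmin, zmax) tuples.
    """
    x, y, z = vertices[:, 0], vertices[:, 1], vertices[:, 2]
    # Step S100: project along the height (y) axis onto an x-z occupancy grid.
    xi = ((x - x.min()) / cell).astype(int)
    zi = ((z - z.min()) / cell).astype(int)
    grid = np.zeros((xi.max() + 1, zi.max() + 1), dtype=bool)
    grid[xi, zi] = True
    # Step S101: cluster the occupied cells into blobs (8-connected components).
    labels, num_blobs = ndimage.label(grid, structure=np.ones((3, 3)))
    boxes = []
    for i in range(1, num_blobs + 1):          # steps S102 to S105: one pass per blob
        member = labels[xi, zi] == i           # vertices whose cell belongs to blob i
        bx, by, bz = x[member], y[member], z[member]
        # Step S103: 2D bounding box on the x-z plane; step S104: add the height from y.
        boxes.append((bx.min(), bx.max(), by.min(), by.max(), bz.min(), bz.max()))
    return boxes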
In this way, the 3D model separation unit 1111 functions as a separation unit that separates, from the three-dimensional data, the three-dimensional models corresponding to the subjects included in the captured images and generates position information indicating the positions of the separated three-dimensional models. The 3D model generation unit 111 adds, to each three-dimensional model separated by the 3D model separation unit 1111, the position information indicating the position of that three-dimensional model, and outputs the result.
(3-3. Rendering processing according to the embodiment)
Next, the rendering processing by the rendering unit 121 according to the embodiment will be described in more detail. The rendering unit 121 uses the position information indicating the position of each subject, acquired by the 3D model generation unit 111 as described above, to select the imaging camera that is optimum for acquiring the texture to be applied to that subject.
(3-3-1. Overview of the rendering processing according to the embodiment)
FIG. 15 is a schematic diagram for explaining the imaging camera selection according to the embodiment. In FIG. 15, section (a) shows an example of imaging camera selection according to the embodiment, and section (b) is the same diagram as FIG. 6B of the existing technology described above, shown again for comparison with the embodiment.
In section (a) of FIG. 15, two subjects 82 1 and 82 2 are included in the imaging range 84 to be imaged, as in FIG. 5A and the like described above. The subject 82 1 is located at the upper left corner of the imaging range 84 in section (a) of FIG. 15, and the subject 82 2 is located at the lower right corner. In section (a) of FIG. 15, 16 imaging cameras 60 1 to 60 16, each having an angle of view β, are arranged so as to surround the imaging range 84, with their imaging directions facing the center of the imaging range 84. The virtual camera 70 has an angle of view α, and it is assumed that the three-dimensional model corresponding to the subject 82 1 is contained within the angle of view α.
In the example of section (a) of FIG. 15, the virtual camera 70 is arranged closer to the subject 82 1 than the center of the imaging range 84, and the subject 82 1 is included within the angle of view α of the virtual camera 70. The reference position 83 is set based on the position information indicating the position of the subject 82 1 obtained by the 3D model generation unit 111. Here, the reference position 83 is assumed to be set at the center of the subject 82 1.
As shown in section (a) of FIG. 15, in the embodiment, the rendering unit 121 obtains the vector from each of the imaging cameras 60 1 to 60 16 to the reference position 83 from the position of that imaging camera and the position of the subject 82 1 included within the angle of view α of the virtual camera 70. The rendering unit 121 also obtains the vector 91e from the virtual camera 70 to the reference position 83 from the position of the virtual camera 70 and the position of the subject 82 1 included within the angle of view α of the virtual camera 70.
The rendering unit 121 obtains the importance P(i) of each of the imaging cameras 60 1 to 60 16, for example in accordance with formula (1) described above, based on the angle between each vector (vector C i) from each imaging camera 60 1 to 60 16 to the reference position 83 and the vector 91e (vector C v). The rendering unit 121 then selects the imaging camera that is optimum for acquiring the texture to be applied to the subject 82 1, based on the importance P(i) obtained for each of the imaging cameras 60 1 to 60 16.
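Formula (1) itself is defined earlier in this document and is not reproduced here; the following Python sketch therefore only assumes that the importance P(i) increases as the angle between the vector C i (imaging camera to reference position) and the vector C v (virtual camera to reference position) decreases, and uses the cosine of that angle as a simple stand-in score.

import numpy as np

def select_optimal_camera(camera_positions, virtual_camera_position, reference_position):
    """Pick the imaging camera whose view direction toward the reference position best
    matches the virtual camera's view direction toward the same reference position.

    camera_positions: (M, 3) array of imaging camera positions (e.g. cameras 60_1 to 60_16).
    virtual_camera_position: (3,) position of the virtual camera 70.
    reference_position: (3,) reference position 83, here the subject position.
    Returns the index of the selected camera.
    """
    c_v = reference_position - virtual_camera_position          # vector 91e
    c_v = c_v / np.linalg.norm(c_v)
    best_index, best_score = -1, -np.inf
    for i, cam_pos in enumerate(camera_positions):
        c_i = reference_position - cam_pos                      # vector from camera i
        c_i = c_i / np.linalg.norm(c_i)
        score = float(np.dot(c_i, c_v))   # cosine of the angle; assumed stand-in for P(i)
        if score > best_score:
            best_index, best_score = i, score
    return best_index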
In the example of section (a) of FIG. 15, when the subject 82 1 is imaged by the virtual camera 70, ideally, the imaging camera 60 2 located on the straight line 93c passing from the subject 82 1 (reference position 83) through the virtual camera 70 is the optimum imaging camera.
In the embodiment, the position of the subject 82 1 is acquired and used as the reference position 83. Therefore, by selecting, from among the vectors from the imaging cameras 60 1 to 60 16 to the reference position 83, the vector whose angle with the vector 91e from the virtual camera 70 to the reference position 83 is the smallest, an imaging camera close to the ideal one can be selected as the optimum imaging camera. In the example of section (a) of FIG. 15, the imaging camera 60 2, which is the ideal imaging camera described above, is selected as the optimum imaging camera. In other words, the imaging camera 60 2 views the subject 82 1 from substantially the same direction as the virtual camera 70. Therefore, according to the imaging camera selection method of the embodiment, a higher-quality texture can be obtained.
Section (b) of FIG. 15 shows the example in which the optimum imaging camera is selected from the imaging cameras 60 1 to 60 16 by the existing technology, based on the angle between the optical axis of the virtual camera 70 and the vector of each of the imaging cameras 60 1 to 60 16 with respect to the subject 82 1. In this selection method of the existing technology, the reference position 83 does not coincide with the position of the subject 82 1, and the virtual camera 70 and the selected optimum imaging camera do not necessarily view the subject 82 1 from substantially the same direction. In the example of section (b) of FIG. 15, the imaging camera 60 1, which differs from the ideal imaging camera 60 2, is selected as the optimum imaging camera. Therefore, compared with the selection method according to the embodiment, which uses the position information indicating the position of the subject 82 1, the quality of the acquired texture is degraded.
(3-3-2. Configuration example of the rendering unit according to the embodiment)
FIG. 16 is a block diagram showing an example configuration of the rendering unit 121 according to the embodiment. In FIG. 16, the rendering unit 121 includes a mesh transfer unit 1210, an imaging camera selection unit 1211, an imaging viewpoint depth generation unit 1212, an imaging camera information transfer unit 1213, and a virtual viewpoint texture generation unit 1214.
The mesh information, the imaging camera information, and the subject position information generated by the 3D model generation unit 111 are input to the rendering unit 121, together with virtual viewpoint position information indicating the position and direction of the virtual camera 70. The virtual viewpoint position information is input by the user, for example, using a controller (corresponding to the input unit 2104). Based on the mesh information, the imaging camera information, the virtual viewpoint position information, and the subject position information, the rendering unit 121 generates a texture at the virtual viewpoint of the virtual camera 70.
The mesh information is transferred to the mesh transfer unit 1210. The mesh transfer unit 1210 transfers the received mesh information to the imaging viewpoint depth generation unit 1212 and the virtual viewpoint texture generation unit 1214. For example, when the information processing device 2000 in which the rendering unit 121 is configured has a GPU, the mesh transfer processing by the mesh transfer unit 1210 is processing for transferring the mesh information to the GPU memory. In this case, the virtual viewpoint texture generation unit 1214 may access the GPU memory to acquire the mesh information. Note that if the mesh information is already on the GPU memory when it is received by the reception unit 120, the mesh transfer unit 1210 can be omitted.
The imaging camera information is transferred to the imaging camera information transfer unit 1213. Of the imaging camera information, the camera parameter information is transferred to the imaging camera selection unit 1211 and the imaging viewpoint depth generation unit 1212.
The imaging viewpoint depth generation unit 1212 selects imaging cameras from the imaging cameras 60 1 to 60 n in accordance with camera selection information passed from the imaging camera selection unit 1211 described later. Based on the mesh information transferred from the mesh transfer unit 1210, the imaging viewpoint depth generation unit 1212 generates selected imaging viewpoint depth information, which is depth information corresponding to the images captured by the selected imaging cameras.
Note that the depth information included in the imaging camera information input to the rendering unit 121 can also be transferred to the imaging viewpoint depth generation unit 1212. In this case, the depth generation processing by the imaging viewpoint depth generation unit 1212 is unnecessary, and the imaging viewpoint depth generation unit 1212 transfers that depth information to the virtual viewpoint texture generation unit 1214 as the selected imaging viewpoint depth information. When the depth information is already on the GPU memory, the transfer of the selected imaging viewpoint depth information can be omitted; in this case, the virtual viewpoint texture generation unit 1214 may access the GPU memory to acquire the selected imaging viewpoint depth information.
 仮想視点位置情報および被写体位置情報は、撮像カメラ選択部1211と、撮像カメラ情報転送部1213に転送される。撮像カメラ選択部1211は、カメラパラメータ情報と、仮想視点位置情報および被写体位置情報とに基づき、各撮像カメラ601~60nから後段の処理で用いる1以上の撮像カメラを選択し、選択された1以上の撮像カメラを示すカメラ選択情報を生成する。撮像カメラ選択部1211は、生成したカメラ選択情報を撮像視点デプス生成部1212と、撮像カメラ情報転送部1213と、に転送する。 The virtual viewpoint position information and the subject position information are transferred to the imaging camera selection section 1211 and the imaging camera information transfer section 1213 . The imaging camera selection unit 1211 selects one or more imaging cameras to be used in subsequent processing from the imaging cameras 60 1 to 60 n based on the camera parameter information, the virtual viewpoint position information, and the subject position information. Camera selection information is generated that indicates one or more imaging cameras. The imaging camera selection unit 1211 transfers the generated camera selection information to the imaging viewpoint depth generation unit 1212 and the imaging camera information transfer unit 1213 .
 このように、撮像カメラ選択部1211は、仮想空間の画像を取得する仮想カメラの第1の位置と、3次元モデルの第2の位置と、実空間の被写体を撮像する1以上の撮像カメラの第3の位置と、に基づき、1以上の撮像カメラから、テクスチャ画像として用いる被写体の撮像画像を取得する撮像カメラを選択する選択部として機能する。 In this way, the imaging camera selection unit 1211 selects the first position of the virtual camera that acquires the image of the virtual space, the second position of the three-dimensional model, and one or more imaging cameras that capture the subject in the real space. It functions as a selection unit that selects, from one or more imaging cameras, an imaging camera that acquires an imaging image of a subject to be used as a texture image based on the third position.
 撮像カメラ情報転送部1213は、撮像カメラ選択部1211から渡されたカメラ選択情報に基づき、選択された撮像カメラを示す撮像カメラ情報を、選択カメラ情報として仮想視点テクスチャ生成部1214に転送する。この場合においても、撮像カメラ情報が既にGPUメモリ上にある場合は、選択カメラ情報の転送処理を省略可能である。この場合、仮想視点テクスチャ生成部1214は、GPUメモリにアクセスして、撮像カメラ情報を取得してよい。 Based on the camera selection information passed from the imaging camera selection section 1211, the imaging camera information transfer section 1213 transfers imaging camera information indicating the selected imaging camera to the virtual viewpoint texture generation section 1214 as selected camera information. Even in this case, if the imaging camera information is already on the GPU memory, the process of transferring the selected camera information can be omitted. In this case, the virtual viewpoint texture generation unit 1214 may access the GPU memory and acquire the imaging camera information.
 仮想視点テクスチャ生成部1214は、上述したように、メッシュ転送部1210からメッシュ情報が転送され、撮像視点デプス生成部1212から選択撮像視点デプス情報が転送され、撮像カメラ情報転送部1213から選択カメラ情報が転送される。また、仮想視点テクスチャ生成部1214は、レンダリング部121に入力された仮想視点位置情報および被写体位置情報が転送される。仮想視点テクスチャ生成部1214は、これら各部から転送された各情報に基づき、仮想カメラ70からの視点である仮想視点のテクスチャを生成する。 As described above, the virtual viewpoint texture generation unit 1214 receives the mesh information transferred from the mesh transfer unit 1210, the selected imaging viewpoint depth information transferred from the imaging viewpoint depth generation unit 1212, and the selected camera information transferred from the imaging camera information transfer unit 1213. Also, the virtual viewpoint position information and the subject position information input to the rendering unit 121 are transferred to the virtual viewpoint texture generation unit 1214. The virtual viewpoint texture generation unit 1214 generates the texture of the virtual viewpoint, which is the viewpoint from the virtual camera 70, based on the information transferred from each of these units.
 このように、仮想視点テクスチャ生成部1214は、3次元データに含まれる3次元モデルに対してテクスチャ画像を適用した画像を生成する生成部として機能する。 Thus, the virtual viewpoint texture generation unit 1214 functions as a generation unit that generates an image by applying a texture image to the 3D model included in the 3D data.
(3-3-3.実施形態に係るレンダリング処理の具体例)
 次に、実施形態に係るレンダリング処理について、より具体的に説明する。
(3-3-3. Specific example of rendering processing according to the embodiment)
Next, rendering processing according to the embodiment will be described more specifically.
(3-3-3-1.撮像カメラ選択処理の第1の例)
 図17は、実施形態に係る、レンダリング処理における撮像カメラ選択処理の第1の例を示す一例のフローチャートである。この第1の例では、1以上の被写体に対して1つの基準位置を設定する。この図17のフローチャートの各処理は、レンダリング部121に含まれる撮像カメラ選択部1211により実行される処理となる。
(3-3-3-1. First example of imaging camera selection processing)
FIG. 17 is an exemplary flowchart illustrating a first example of imaging camera selection processing in rendering processing according to the embodiment. In this first example, one reference position is set for one or more subjects. Each process in the flowchart of FIG. 17 is a process executed by the imaging camera selection unit 1211 included in the rendering unit 121 .
 ステップS200で、撮像カメラ選択部1211は、レンダリング部121に入力されたオブジェクト(被写体)の数に係るループ変数iをi=0とする。以降のステップS201~ステップS205の処理は、当該オブジェクト(被写体)毎の処理となる。なお、レンダリング部121に入力されたオブジェクトの数は、被写体位置情報から求めることができる。 In step S200, the imaging camera selection unit 1211 sets the loop variable i related to the number of objects (subjects) input to the rendering unit 121 to i=0. The subsequent processing from step S201 to step S205 is processing for each object (subject). Note that the number of objects input to the rendering unit 121 can be obtained from subject position information.
 次のステップS201で、撮像カメラ選択部1211は、バウンディングボックスの頂点の数に係るループ変数jをj=0とする。以降のステップS202およびステップS203の処理は、i番目のオブジェクトのバウンディングボックスの頂点毎の処理となる。 In the next step S201, the imaging camera selection unit 1211 sets the loop variable j related to the number of vertices of the bounding box to j=0. The subsequent processing in steps S202 and S203 is processing for each vertex of the bounding box of the i-th object.
 次のステップS202で、撮像カメラ選択部1211は、i番目のオブジェクトのバウンディングボックスのj番目の頂点を、仮想視点位置情報および当該オブジェクトに係る被写体位置情報に基づき、仮想カメラ70に投影する。 In the next step S202, the imaging camera selection unit 1211 projects the j-th vertex of the bounding box of the i-th object onto the virtual camera 70 based on the virtual viewpoint position information and the subject position information related to the object.
 次のステップS203で、撮像カメラ選択部1211は、対象のバウンディングボックスの全頂点について処理が終了した、あるいは、対象のバウンディングボックスのj番目の頂点が仮想カメラ70の画角α内に投影されたか否かを判定する。撮像カメラ選択部1211は、対象のバウンディングボックスの全頂点について処理が終了しておらず、かつ、対象のバウンディングボックスのj番目の頂点が仮想カメラ70の画角α内に投影されていない、と判定した場合(ステップS203、「No」)、ループ変数jをj=j+1として、処理をステップS202に戻す。 In the next step S203, the imaging camera selection unit 1211 determines whether the processing has been completed for all vertices of the target bounding box, or whether the j-th vertex of the target bounding box has been projected within the angle of view α of the virtual camera 70. If the imaging camera selection unit 1211 determines that the processing has not been completed for all vertices of the target bounding box and that the j-th vertex of the target bounding box has not been projected within the angle of view α of the virtual camera 70 (step S203, "No"), the loop variable j is set to j=j+1, and the process returns to step S202.
 一方、撮像カメラ選択部1211は、対象のバウンディングボックスの全頂点について処理が終了した、あるいは、対象のバウンディングボックスのj番目の頂点が仮想カメラ70の画角α内に投影された、と判定した場合(ステップS203、「Yes」)、処理をステップS204に移行させる。ステップS204で、撮像カメラ選択部1211は、対象のバウンディングボックスの全頂点のうち1つでも仮想カメラ70の画角α内に存在すれば、当該バウンディングボックスに基づき基準位置を追加する。 On the other hand, the imaging camera selection unit 1211 has determined that processing has been completed for all vertices of the target bounding box, or that the j-th vertex of the target bounding box has been projected within the angle of view α of the virtual camera 70. If so (step S203, "Yes"), the process proceeds to step S204. In step S204, if even one of all the vertices of the target bounding box exists within the angle of view α of the virtual camera 70, the imaging camera selection unit 1211 adds a reference position based on the bounding box.
 ステップS203およびステップS204において、撮像カメラ選択部1211は、仮想カメラ70に投影されたバウンディングボックスの各頂点のうち、少なくとも1つの頂点が仮想カメラ70の画角α内に含まれていれば、当該バウンディングボックスに係るオブジェクト(被写体)が仮想カメラ70の画角α内に存在していると見做す。そして、撮像カメラ選択部1211は、仮想カメラ70の画角α内に存在していると見做されたバウンディングボックスに基づき、基準位置を求める。 In steps S203 and S204, if at least one of the vertices of the bounding box projected onto the virtual camera 70 is included within the angle of view α of the virtual camera 70, the imaging camera selection unit 1211 regards the object (subject) related to that bounding box as existing within the angle of view α of the virtual camera 70. Then, the imaging camera selection unit 1211 obtains the reference position based on the bounding box regarded as existing within the angle of view α of the virtual camera 70.
 図18は、実施形態に係る、オブジェクト(被写体)と仮想カメラ70との関係を説明するための模式図である。図18の例では、3次元モデル51aに係る3次元のバウンディングボックス200aの各頂点のうち、頂点201aが仮想カメラ70の画角αの外にある。撮像カメラ選択部1211は、このような場合であっても、当該3次元モデル51aが仮想カメラ70の画角α内に存在するものと見做す。すなわち、撮像カメラ選択部1211は、3次元モデル51aに係る3次元のバウンディングボックス200aの各頂点のうち少なくとも1つの頂点が仮想カメラ70の画角α内にある場合に、当該3次元モデル51aが仮想カメラ70の画角α内に存在するものと見做す。撮像カメラ選択部1211は、3次元のバウンディングボックス200aの各頂点の座標に基づき、当該バウンディングボックス200aに係る3次元モデル51aに対する基準位置84aを求める。例えば、撮像カメラ選択部1211は、3次元のバウンディングボックス200aの各頂点の座標の平均値を、当該バウンディングボックス200aに係る3次元モデル51aに対する基準位置84aとして求める。 FIG. 18 is a schematic diagram for explaining the relationship between an object (subject) and the virtual camera 70 according to the embodiment. In the example of FIG. 18, among the vertices of the three-dimensional bounding box 200a related to the three-dimensional model 51a, the vertex 201a is outside the angle of view α of the virtual camera 70. Even in such a case, the imaging camera selection unit 1211 assumes that the three-dimensional model 51a exists within the angle of view α of the virtual camera 70. That is, when at least one of the vertices of the three-dimensional bounding box 200a related to the three-dimensional model 51a is within the angle of view α of the virtual camera 70, the imaging camera selection unit 1211 regards the three-dimensional model 51a as existing within the angle of view α of the virtual camera 70. The imaging camera selection unit 1211 obtains the reference position 84a for the three-dimensional model 51a related to the bounding box 200a based on the coordinates of each vertex of the three-dimensional bounding box 200a. For example, the imaging camera selection unit 1211 obtains the average value of the coordinates of the vertices of the three-dimensional bounding box 200a as the reference position 84a for the three-dimensional model 51a related to the bounding box 200a.
 なお、図18の例では、バウンディングボックス200aの頂点201a以外の何れかの頂点に対するステップS203での判定において、その頂点が仮想カメラ70の画角α内に存在していると判定された場合に、処理がステップS204に移行されることになる。 In the example of FIG. 18, when any vertex other than the vertex 201a of the bounding box 200a is determined in step S203 to exist within the angle of view α of the virtual camera 70, the process proceeds to step S204.
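The visibility test and the reference-position calculation described above can be summarized in a short sketch. The following Python fragment is purely illustrative and not taken from the publication: the pinhole projection model, the image-bound check used to stand in for the angle-of-view test, and all function names are assumptions; only the "one vertex inside the view is enough" rule and the vertex average come from the description.

```python
import numpy as np

def project_point(point_world, extrinsic, intrinsic):
    """Project a 3D world point with a 3x4 extrinsic and 3x3 intrinsic matrix.
    Returns pixel coordinates (u, v) and the depth along the optical axis."""
    p_cam = extrinsic @ np.append(point_world, 1.0)   # world -> camera coordinates
    u, v = (intrinsic @ p_cam)[:2] / p_cam[2]         # perspective division
    return u, v, p_cam[2]

def bbox_visible_and_reference(bbox_vertices, extrinsic, intrinsic, width, height):
    """bbox_vertices: (8, 3) vertices of the box circumscribing one 3D model.
    The object is treated as visible if at least one vertex projects inside the
    virtual camera image (steps S202-S204); the reference position is the
    average of the eight vertex coordinates."""
    visible = False
    for vertex in bbox_vertices:
        u, v, depth = project_point(vertex, extrinsic, intrinsic)
        if depth > 0 and 0 <= u < width and 0 <= v < height:
            visible = True
            break                                     # one vertex inside is enough
    reference_position = bbox_vertices.mean(axis=0)
    return visible, reference_position
```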
 次のステップS205で、撮像カメラ選択部1211は、レンダリング部121に入力された全てのオブジェクトについて、ステップS202~ステップS204の処理が終了したか否かを判定する。撮像カメラ選択部1211は、レンダリング部121に入力された全てのオブジェクトについて処理が終了していないと判定した場合(ステップS205、「No」)、ループ変数iをi=i+1として、処理をステップS201に戻す。一方、撮像カメラ選択部1211は、レンダリング部121に入力された全てのオブジェクトについて処理が終了したと判定した場合(ステップS205、「Yes」)、処理をステップS206に移行させる。 In the next step S205, the imaging camera selection unit 1211 determines whether the processes of steps S202 to S204 have been completed for all objects input to the rendering unit 121. When the imaging camera selection unit 1211 determines that processing has not been completed for all objects input to the rendering unit 121 (step S205, "No"), the loop variable i is set to i=i+1, and the process returns to step S201. On the other hand, when the imaging camera selection unit 1211 determines that the processing has been completed for all objects input to the rendering unit 121 (step S205, "Yes"), the process proceeds to step S206.
 ステップS206で、撮像カメラ選択部1211は、ステップS205までで処理が終了した全てのオブジェクトに対する代表の基準位置を算出する。より具体的には、撮像カメラ選択部1211は、ステップS206で、ステップS205までで処理が終了した全てのオブジェクトそれぞれの基準位置の平均値を算出する。撮像カメラ選択部1211は、算出した平均値を、当該全てのオブジェクトに対する代表の基準位置とする。 In step S206, the imaging camera selection unit 1211 calculates a representative reference position for all objects for which processing has been completed up to step S205. More specifically, in step S206, the imaging camera selection unit 1211 calculates the average value of the reference positions of all the objects for which the processing has been completed up to step S205. The imaging camera selection unit 1211 uses the calculated average value as a representative reference position for all the objects.
 図19は、実施形態に係る、オブジェクトの基準位置の平均値を算出する処理を説明するための模式図である。図19の例では、仮想カメラ70の画角α内に、3次元モデル51aに係るバウンディングボックス200aと、3次元モデル51bに係るバウンディングボックス200bとが含まれている。3次元モデル51aおよび51bは、それぞれ基準位置84aおよび84bが設定されている。 FIG. 19 is a schematic diagram for explaining the process of calculating the average value of the reference positions of the objects according to the embodiment. In the example of FIG. 19, the angle of view α of the virtual camera 70 includes a bounding box 200a for the three-dimensional model 51a and a bounding box 200b for the three-dimensional model 51b. Reference positions 84a and 84b are set for the three- dimensional models 51a and 51b, respectively.
 撮像カメラ選択部1211は、これら基準位置84aおよび84bそれぞれの座標の平均値の座標に対して、基準位置85を設定する。この基準位置85が、3次元モデル51aおよび51bに対する共通の基準位置となる。例えば、3次元モデル51aおよび51bが分離する必要のない1つの被写体として見做される場合、この基準位置85を用いて、当該3次元モデル51aおよび51bに対して共通して最適な撮像カメラを設定する。このような、3次元モデル51aおよび51bが分離する必要のない1つの被写体として見做される場合の例として、例えば3次元モデル51aおよび51bが1つのグループを形成する場合が考えられる。 The imaging camera selection unit 1211 sets the reference position 85 at the coordinates given by averaging the coordinates of the reference positions 84a and 84b. This reference position 85 serves as a common reference position for the three-dimensional models 51a and 51b. For example, when the three-dimensional models 51a and 51b are regarded as one subject that does not need to be separated, this reference position 85 is used to set an imaging camera that is optimal for the three-dimensional models 51a and 51b in common. As an example of such a case where the three-dimensional models 51a and 51b are regarded as one subject that does not need to be separated, a case where the three-dimensional models 51a and 51b form one group is conceivable.
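A minimal sketch of step S206 under the same assumptions (a plain arithmetic mean of the per-object reference positions; the function name and data layout are illustrative only):

```python
import numpy as np

def representative_reference_position(reference_positions):
    """reference_positions: list of (3,) arrays, one per object whose bounding
    box was judged to be inside the angle of view (e.g. positions 84a and 84b).
    Returns the common reference position (e.g. position 85)."""
    return np.mean(np.stack(reference_positions), axis=0)

# Usage with two hypothetical reference positions:
# representative_reference_position([np.array([0.0, 0.0, 1.0]),
#                                    np.array([1.0, 0.0, 1.0])])  # -> [0.5, 0., 1.]
```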
 次のステップS207で、撮像カメラ選択部1211は、撮像カメラ601~60nに係るループ変数kをk=0とする。以降のステップS208~ステップS210の処理は、撮像カメラ601~60n毎の処理となる。また、撮像カメラ601~60nのうち当該ループにおける処理対象を撮像カメラ60kとして説明を行う。 In the next step S207, the imaging camera selection unit 1211 sets the loop variable k related to the imaging cameras 60 1 to 60 n to k=0. The subsequent processing of steps S208 to S210 is processing for each of the imaging cameras 60 1 to 60 n . Also, the processing target in the loop among the imaging cameras 60 1 to 60 n is assumed to be the imaging camera 60 k .
 ステップS208で、撮像カメラ選択部1211は、k番目の撮像カメラ60kについて、当該撮像カメラ60kから基準位置85に向かうベクトルと、仮想カメラ70から基準位置85に向かうベクトルと、の角度を求める。次のステップS209で、撮像カメラ選択部1211は、ループ変数kに基づくループ処理におけるステップS208で求めた各角度の小さい順に、撮像カメラ60kをソートする。すなわち、撮像カメラ選択部1211は、ステップS209で、撮像カメラ60kを重要度の高い順にソートする。 In step S208, for the k-th imaging camera 60 k , the imaging camera selection unit 1211 obtains an angle between a vector directed from the imaging camera 60 k to the reference position 85 and a vector directed from the virtual camera 70 to the reference position 85. . In the next step S209, the imaging camera selection unit 1211 sorts the imaging cameras 60 k in ascending order of angles obtained in step S208 in the loop processing based on the loop variable k. That is, in step S209, the imaging camera selection unit 1211 sorts the imaging cameras 60 k in descending order of importance.
 次のステップS210で、撮像カメラ選択部1211は、配置された全ての撮像カメラ601~60nについて処理が終了したか否かを判定する。撮像カメラ選択部1211は、全ての撮像カメラ601~60nについて処理が終了していないと判定した場合(ステップS210、「No」)、ループ変数kをk=k+1として、処理をステップS208に戻す。一方、撮像カメラ選択部1211は、全ての撮像カメラ601~60nについて処理が終了したと判定した場合(ステップS210、「Yes」)、処理をステップS211に移行させる。 In the next step S210, the imaging camera selection unit 1211 determines whether or not processing has been completed for all of the arranged imaging cameras 60 1 to 60 n. When the imaging camera selection unit 1211 determines that the processing has not been completed for all the imaging cameras 60 1 to 60 n (step S210, "No"), the loop variable k is set to k=k+1, and the process returns to step S208. On the other hand, when the imaging camera selection unit 1211 determines that the processing has been completed for all the imaging cameras 60 1 to 60 n (step S210, "Yes"), the process proceeds to step S211.
 ステップS211で、撮像カメラ選択部1211は、角度の小さい順にソートされた各撮像カメラ601~60nの配列から、上位m個の撮像カメラを示すカメラ情報を選択する。撮像カメラ選択部1211は、選択した各撮像カメラを示す情報を、カメラ選択情報として撮像視点デプス生成部1212および撮像カメラ情報転送部1213に転送する。 In step S211, the imaging camera selection unit 1211 selects camera information indicating top m imaging cameras from the array of imaging cameras 60 1 to 60 n sorted in ascending order of angle. The imaging camera selection unit 1211 transfers information indicating each selected imaging camera to the imaging viewpoint depth generation unit 1212 and the imaging camera information transfer unit 1213 as camera selection information.
 撮像カメラ選択部1211は、ステップS211の処理が終了すると、この図17のフローチャートによる一連の処理を終了させる。 When the process of step S211 ends, the imaging camera selection unit 1211 ends the series of processes according to the flowchart of FIG.
 この第1の例では、複数の3次元モデル51aおよび51bに対して1の基準位置85を纏めて設定している。そのため、後述するポストエフェクト処理などにおいては、3次元モデル51aおよび51bに対して共通するエフェクト処理が同時に施されることになる。 In this first example, one reference position 85 is collectively set for a plurality of three- dimensional models 51a and 51b. Therefore, in post-effect processing and the like, which will be described later, the three- dimensional models 51a and 51b are subjected to common effect processing at the same time.
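As a concrete illustration of steps S207 to S211, camera selection by the angle criterion could be sketched as follows. This is a sketch under stated assumptions, not the publication's implementation: the cosine-based angle computation, the data layout, and the function name are assumptions; only the "sort by angle to the reference position and keep the top m cameras" behaviour follows the description above.

```python
import numpy as np

def select_cameras(camera_positions, virtual_camera_position, reference_position, m):
    """camera_positions: (n, 3) positions of the imaging cameras 60_1 .. 60_n.
    Returns the indices of the m cameras whose viewing direction toward the
    reference position is closest to that of the virtual camera, i.e. the
    cameras with the smallest angle and hence the highest importance."""
    v_virtual = reference_position - virtual_camera_position
    v_virtual = v_virtual / np.linalg.norm(v_virtual)

    angles = []
    for k, cam_pos in enumerate(camera_positions):
        v_cam = reference_position - cam_pos
        v_cam = v_cam / np.linalg.norm(v_cam)
        # Angle between the two vectors that point toward the reference position.
        cos_angle = np.clip(np.dot(v_cam, v_virtual), -1.0, 1.0)
        angles.append((np.arccos(cos_angle), k))

    angles.sort()                    # ascending angle = descending importance
    return [k for _, k in angles[:m]]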
(3-3-3-2.撮像カメラ選択処理の第2の例)
 次に、実施形態に係るレンダリング処理における撮像カメラ選択処理の第2の例について説明する。図20は、実施形態に係る、レンダリング処理における撮像カメラ選択処理の第2の例を示す一例のフローチャートである。この第2の例では、1以上の被写体それぞれに対して1ずつ基準位置を設定する。この図20のフローチャートの各処理は、レンダリング部121に含まれる撮像カメラ選択部1211により実行される処理となる。
(3-3-3-2. Second example of imaging camera selection processing)
Next, a second example of imaging camera selection processing in rendering processing according to the embodiment will be described. FIG. 20 is an exemplary flowchart illustrating a second example of imaging camera selection processing in rendering processing according to the embodiment. In this second example, one reference position is set for each of one or more subjects. Each process in the flowchart of FIG. 20 is a process executed by the imaging camera selection unit 1211 included in the rendering unit 121. In FIG.
 図20のフローチャートにおいて、ステップS200~ステップS205の処理は、上述した図17のフローチャートにおけるステップS200~ステップS205の処理と同一であるので、ここでの説明を省略する。 In the flowchart of FIG. 20, the processing of steps S200 to S205 is the same as the processing of steps S200 to S205 in the flowchart of FIG. 17 described above, so the description is omitted here.
 撮像カメラ選択部1211は、ステップS205で、全オブジェクトについて基準位置の追加処理が終了すると、処理をステップS2060に移行させる。ステップS2060で、撮像カメラ選択部1211は、仮想カメラ70の画角α内に含まれるオブジェクトに係るループ変数lをl=0とする。以降のステップS208~ステップS2101の処理は、当該オブジェクト毎の処理となる。 When the reference position addition processing for all objects is completed in step S205, the imaging camera selection unit 1211 shifts the process to step S2060. In step S2060, the imaging camera selection unit 1211 sets the loop variable l related to the objects included within the angle of view α of the virtual camera 70 to l=0. The subsequent processing from step S208 to step S2101 is processing for each of these objects.
 次のステップS207で、撮像カメラ選択部1211は、撮像カメラ601~60nに係るループ変数kをk=0とする。以降のステップS208~ステップS210の処理は、撮像カメラ601~60n毎の処理となる。なお、ステップS208~ステップS210の処理は、上述した図17のフローチャートにおけるステップS208~ステップS210の処理と同一であるので、ここでの説明を省略する。 In the next step S207, the imaging camera selection unit 1211 sets the loop variable k related to the imaging cameras 60 1 to 60 n to k=0. The subsequent processing of steps S208 to S210 is processing for each of the imaging cameras 60 1 to 60 n. Note that the processing of steps S208 to S210 is the same as the processing of steps S208 to S210 in the flowchart of FIG. 17 described above, so the description is omitted here.
 撮像カメラ選択部1211は、ステップS210で配置された全ての撮像カメラ601~60nについて処理が終了したと判定した場合(ステップS210、「Yes」)、処理をステップS2101に移行させる。 When the imaging camera selection unit 1211 determines in step S210 that the processing has been completed for all of the arranged imaging cameras 60 1 to 60 n (step S210, "Yes"), the process proceeds to step S2101.
 ステップS2101で、撮像カメラ選択部1211は、仮想カメラ70の画角α内に含まれる全てのオブジェクトに対する処理が終了したか否かを判定する。撮像カメラ選択部1211は、仮想カメラ70の画角α内に含まれる全てのオブジェクトに対する処理が終了していないと判定した場合(ステップS2101、「No」)、ループ変数lをl=l+1として、処理をステップS207に戻す。一方、撮像カメラ選択部1211は、当該処理が終了したと判定した場合(ステップS2101、「Yes」)、処理をステップS211に移行させる。 In step S2101, the imaging camera selection unit 1211 determines whether or not the processing for all objects included within the angle of view α of the virtual camera 70 has been completed. When the imaging camera selection unit 1211 determines that the processing for all objects included within the angle of view α of the virtual camera 70 has not been completed (step S2101, "No"), the loop variable l is set to l=l+1, and the process returns to step S207. On the other hand, if the imaging camera selection unit 1211 determines that the processing has been completed (step S2101, "Yes"), the process proceeds to step S211.
 ステップS211で、撮像カメラ選択部1211は、上述した図17のフローチャートのステップS211と同様に、角度の小さい順にソートされた各撮像カメラ601~60nの配列から、上位m個の撮像カメラを示すカメラ情報を選択する。撮像カメラ選択部1211は、選択した各撮像カメラを示す情報を、カメラ選択情報として撮像視点デプス生成部1212および撮像カメラ情報転送部1213に転送する。 In step S211, similarly to step S211 in the flowchart of FIG. 17 described above, the imaging camera selection unit 1211 selects camera information indicating the top m imaging cameras from the array of imaging cameras 60 1 to 60 n sorted in ascending order of angle. The imaging camera selection unit 1211 transfers information indicating each selected imaging camera to the imaging viewpoint depth generation unit 1212 and the imaging camera information transfer unit 1213 as camera selection information.
 撮像カメラ選択部1211は、ステップS211の処理が終了すると、この図20のフローチャートによる一連の処理を終了させる。 When the process of step S211 ends, the imaging camera selection unit 1211 ends the series of processes according to the flowchart of FIG.
 このレンダリング処理における撮像カメラ選択処理の第2の例においては、例えば上述した図19における基準位置85が設定されず、各3次元モデル51aおよび51bに対してそれぞれ基準位置84aおよび84bが設定される。 In the second example of the imaging camera selection processing in this rendering processing, for example, the reference position 85 in FIG. 19 described above is not set, and the reference positions 84a and 84b are set for the three-dimensional models 51a and 51b, respectively.
 この第2の例では、複数の3次元モデル51aおよび51bそれぞれに対して個別に基準位置84aおよび84bが設定されている。そのため、後述するポストエフェクト処理などにおいては、各3次元モデル51aおよび51bに対して個別にエフェクト処理を施すことが可能である。 In this second example, reference positions 84a and 84b are individually set for each of the plurality of three- dimensional models 51a and 51b. Therefore, in post-effect processing, etc., which will be described later, it is possible to apply effect processing to each of the three- dimensional models 51a and 51b individually.
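Under the same assumptions as the previous sketch, the second example differs only in that the selection runs once per object, using each object's own reference position (e.g. 84a and 84b) instead of the common position 85. The fragment below reuses the hypothetical select_cameras function from the earlier sketch and is illustrative only:

```python
def select_cameras_per_object(camera_positions, virtual_camera_position,
                              object_reference_positions, m):
    """object_reference_positions: dict mapping an object id to that object's
    own reference position. Returns, per object, the indices of the m imaging
    cameras with the smallest angle for that object (highest importance)."""
    return {obj_id: select_cameras(camera_positions, virtual_camera_position,
                                   ref_pos, m)
            for obj_id, ref_pos in object_reference_positions.items()}
```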
(3-3-3-3.レンダリング処理の詳細)
 次に、実施形態に係るレンダリング処理について、より詳細に説明する。図21は、実施形態に係るレンダリング処理を示す一例のフローチャートである。図21のフローチャートによる各処理は、レンダリング部121に含まれる仮想視点テクスチャ生成部1214において実行される処理となる。
(3-3-3-3. Details of rendering processing)
Next, rendering processing according to the embodiment will be described in more detail. FIG. 21 is an exemplary flowchart illustrating rendering processing according to the embodiment. Each process according to the flowchart of FIG. 21 is a process executed by the virtual viewpoint texture generation unit 1214 included in the rendering unit 121 .
 ステップS300で、仮想視点テクスチャ生成部1214は、被写体に基づくメッシュ情報の頂点に係るループ変数pをp=0とする。以降、ステップS301~ステップS303の処理は、当該メッシュ情報に示される頂点毎の処理となる。なお、メッシュ情報は、複数の被写体のメッシュ情報を含んでよい。 At step S300, the virtual viewpoint texture generation unit 1214 sets the loop variable p related to the vertices of the mesh information based on the subject to p=0. Thereafter, the processing of steps S301 to S303 is performed for each vertex indicated in the mesh information. Note that the mesh information may include mesh information of a plurality of subjects.
 次のステップS301で、仮想視点テクスチャ生成部1214は、被写体位置情報に基づき、メッシュ情報から仮想カメラ70による仮想視点に投影する頂点を選択する。次のステップS302で、仮想視点テクスチャ生成部1214は、ステップS301で選択された頂点に基づきラスタライズを行う。すなわち、ステップS301で選択されなかった頂点は、ラスタライズが行われず、仮想視点すなわち仮想カメラ70に投影されない。したがって、仮想視点テクスチャ生成部1214は、複数の被写体のそれぞれに対して選択的に、表示/非表示を設定することができる。 In the next step S301, the virtual viewpoint texture generation unit 1214 selects vertices to be projected onto the virtual viewpoint by the virtual camera 70 from the mesh information based on the subject position information. In the next step S302, the virtual viewpoint texture generation unit 1214 rasterizes based on the vertices selected in step S301. That is, the vertices not selected in step S301 are not rasterized and are not projected onto the virtual viewpoint, ie, the virtual camera 70 . Therefore, the virtual viewpoint texture generation unit 1214 can selectively set display/non-display for each of a plurality of subjects.
 次のステップS303で、仮想視点テクスチャ生成部1214は、メッシュ情報に示されるメッシュの全頂点について処理が終了したか否かを判定する。仮想視点テクスチャ生成部1214は、ステップS303でメッシュ情報に示されるメッシュの全頂点について処理が終了していないと判定した場合(ステップS303、「No」)、ループ変数pをp=p+1として、処理をステップS301に戻す。一方、仮想視点テクスチャ生成部1214は、メッシュ情報に示されるメッシュの全頂点について処理が終了したと判定した場合(ステップS303、「Yes」)、処理をステップS304に移行させる。 In the next step S303, the virtual viewpoint texture generation unit 1214 determines whether or not processing has been completed for all vertices of the mesh indicated by the mesh information. If the virtual viewpoint texture generation unit 1214 determines in step S303 that processing has not been completed for all vertices of the mesh indicated by the mesh information (step S303, "No"), the loop variable p is set to p=p+1, and the process returns to step S301. On the other hand, if the virtual viewpoint texture generation unit 1214 determines that processing has been completed for all vertices of the mesh indicated by the mesh information (step S303, "Yes"), the process proceeds to step S304.
 ステップS304で、仮想視点テクスチャ生成部1214は、仮想カメラ70の仮想視点に係るループ変数qをq=0とする。以降、ステップS305~ステップS315の処理は、当該仮想視点の画素毎の処理となる。 In step S304, the virtual viewpoint texture generation unit 1214 sets the loop variable q related to the virtual viewpoint of the virtual camera 70 to q=0. After that, the processing from step S305 to step S315 is processing for each pixel of the virtual viewpoint.
 ステップS305で、仮想視点テクスチャ生成部1214は、仮想視点の画素qに対応するメッシュの頂点を求める。 In step S305, the virtual viewpoint texture generation unit 1214 obtains the vertex of the mesh corresponding to the pixel q of the virtual viewpoint.
 次のステップS306で、仮想視点テクスチャ生成部1214は、撮像カメラ601~60nに係るループ変数rをr=0とする。以降のステップS307~ステップS313の処理は、撮像カメラ601~60n毎の処理となる。また、撮像カメラ601~60nのうち当該ループにおける処理対象を撮像カメラ60rとして説明を行う。 In the next step S306, the virtual viewpoint texture generation unit 1214 sets the loop variable r related to the imaging cameras 60 1 to 60 n to r=0. The subsequent processing of steps S307 to S313 is processing for each of the imaging cameras 60 1 to 60 n . Also, the imaging camera 60 r is assumed to be the object of processing in the loop among the imaging cameras 60 1 to 60 n .
 ステップS307で、仮想視点テクスチャ生成部1214は、ステップS305で求めた頂点の頂点座標から撮像カメラ60rに投影し、当該頂点座標の撮像カメラ60rにおけるUV座標を求める。次のステップS308で、仮想視点テクスチャ生成部1214は、撮像カメラ60rにおけるメッシュの各頂点のデプスと、ステップS305で求めた頂点の頂点座標のデプスとを比較し、両者の差分を求める。 In step S307, the virtual viewpoint texture generation unit 1214 projects the vertex coordinates of the vertices obtained in step S305 onto the imaging camera 60r , and obtains the UV coordinates of the vertex coordinates of the imaging camera 60r . In the next step S308, the virtual viewpoint texture generation unit 1214 compares the depth of each vertex of the mesh in the imaging camera 60 r with the depth of the vertex coordinates of the vertex obtained in step S305, and obtains the difference between the two.
 ステップS309で、仮想視点テクスチャ生成部1214は、ステップS308で求めた差分が閾値以上であるか否かを判定する。仮想視点テクスチャ生成部1214は、差分が閾値以上であると判定した場合(ステップS309、「Yes」)、処理をステップS310に移行させ、当該撮像カメラ60rによる撮像カメラ情報(選択カメラ情報)を用いないとする。 In step S309, the virtual viewpoint texture generation unit 1214 determines whether the difference obtained in step S308 is equal to or greater than a threshold. If the virtual viewpoint texture generation unit 1214 determines that the difference is equal to or greater than the threshold (step S309, "Yes"), the process proceeds to step S310, and the imaging camera information (selected camera information) from the imaging camera 60 r is determined not to be used.
 一方、仮想視点テクスチャ生成部1214は、ステップS308で求めた差分が閾値未満であると判定した場合(ステップS309、「No」)、処理をステップS311に移行させ、当該撮像カメラ60rによる撮像カメラ情報を用いるとする。 On the other hand, if the virtual viewpoint texture generation unit 1214 determines that the difference obtained in step S308 is less than the threshold (step S309, "No"), the process proceeds to step S311, and the imaging camera information from the imaging camera 60 r is determined to be used.
 次のステップS312で、仮想視点テクスチャ生成部1214は、当該撮像カメラ情報から、ステップS307で求めたUV座標における色情報を取得する。そして、仮想視点テクスチャ生成部1214は、色情報に対するブレンド係数を求める。例えば、仮想視点テクスチャ生成部1214は、図17または図20のフローチャートのステップS208~ステップS211の処理により選択された撮像カメラ情報に基づく当該撮像カメラ60rの重要度に応じて、当該撮像カメラ60rの撮像画像(テクスチャ画像)に対するブレンド係数を求める。 In the next step S312, the virtual viewpoint texture generation unit 1214 acquires, from the imaging camera information, the color information at the UV coordinates obtained in step S307. The virtual viewpoint texture generation unit 1214 then obtains a blend coefficient for the color information. For example, the virtual viewpoint texture generation unit 1214 obtains the blend coefficient for the captured image (texture image) of the imaging camera 60 r according to the importance of the imaging camera 60 r based on the imaging camera information selected by the processing of steps S208 to S211 in the flowchart of FIG. 17 or FIG. 20.
 仮想視点テクスチャ生成部1214は、ステップS310の処理、あるいは、ステップS312の処理の後、処理をステップS313に移行させる。ステップS313で、仮想視点テクスチャ生成部1214は、配置された全ての撮像カメラ601~60nに対する処理が終了したか否かを判定する。仮想視点テクスチャ生成部1214は、全ての撮像カメラ601~60nに対する処理が終了していないと判定した場合(ステップS313、「No」)、ループ変数rをr=r+1として、処理をステップS307に戻す。一方、仮想視点テクスチャ生成部1214は、全ての撮像カメラ601~60nに対する処理が終了したと判定した場合(ステップS313、「Yes」)、処理をステップS314に移行させる。 After the process of step S310 or the process of step S312, the virtual viewpoint texture generation unit 1214 shifts the process to step S313. In step S313, the virtual viewpoint texture generation unit 1214 determines whether or not the processing for all of the arranged imaging cameras 60 1 to 60 n has been completed. When the virtual viewpoint texture generation unit 1214 determines that the processing for all the imaging cameras 60 1 to 60 n has not been completed (step S313, "No"), the loop variable r is set to r=r+1, and the process returns to step S307. On the other hand, when the virtual viewpoint texture generation unit 1214 determines that the processing for all the imaging cameras 60 1 to 60 n has been completed (step S313, "Yes"), the process proceeds to step S314.
 ステップS314で、仮想視点テクスチャ生成部1214は、各撮像カメラ601~60nのうち、ステップS311で用いるとされた撮像カメラ情報における色情報を、ステップS312で求めたブレンド係数に従いブレンドする。これにより、画素qに対する色情報が決定される。 In step S314, the virtual viewpoint texture generation unit 1214 blends the color information in the imaging camera information used in step S311 among the imaging cameras 60 1 to 60 n according to the blending coefficient obtained in step S312. Thus, color information for pixel q is determined.
 次のステップS315で、仮想視点テクスチャ生成部1214は、仮想カメラ70の仮想視点における全画素について処理が終了したか否かを判定する。仮想視点テクスチャ生成部1214は、全画素について処理が終了していないと判定した場合(ステップS315、「No」)、ループ変数qをq=q+1として処理をステップS305に戻す。 In the next step S315, the virtual viewpoint texture generation unit 1214 determines whether or not the processing for all pixels at the virtual viewpoint of the virtual camera 70 has been completed. If the virtual viewpoint texture generation unit 1214 determines that processing has not been completed for all pixels (“No” at step S315), the loop variable q is set to q=q+1, and the process returns to step S305.
 一方、仮想視点テクスチャ生成部1214は、ステップS315で全画素について処理が終了したと判定した場合、この図21のフローチャートによる一連の処理を終了させる。 On the other hand, if the virtual viewpoint texture generation unit 1214 determines in step S315 that processing has been completed for all pixels, it terminates the series of processing according to the flowchart of FIG.
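The occlusion test and the blending of steps S307 to S314 can be illustrated for a single virtual-viewpoint pixel as follows. This is only a sketch: the nearest-pixel sampling, the dictionary-based camera data layout, and the normalized weighted average are assumptions; the depth-difference threshold test and the importance-weighted blend themselves follow the description above.

```python
import numpy as np

def shade_virtual_pixel(surface_point, selected_cameras, depth_threshold):
    """surface_point: (3,) world coordinates of the mesh point seen through one
    virtual-viewpoint pixel (step S305).
    selected_cameras: list of dicts with 'extrinsic' (3x4), 'intrinsic' (3x3),
    'image' (HxWx3), 'depth' (HxW), and a blend 'weight' reflecting importance.
    Returns the blended color for the pixel, or None if no camera sees it."""
    colors, weights = [], []
    for cam in selected_cameras:
        p_cam = cam['extrinsic'] @ np.append(surface_point, 1.0)
        if p_cam[2] <= 0:
            continue                                  # point is behind this camera
        u, v = (cam['intrinsic'] @ p_cam)[:2] / p_cam[2]
        ui, vi = int(round(u)), int(round(v))
        h, w = cam['depth'].shape
        if not (0 <= ui < w and 0 <= vi < h):
            continue                                  # outside this camera's image
        # Occlusion test (steps S308-S311): if the depth stored for this camera
        # pixel differs from the point's depth by the threshold or more,
        # something else is in front, so this camera's color is not used.
        if abs(cam['depth'][vi, ui] - p_cam[2]) >= depth_threshold:
            continue
        colors.append(cam['image'][vi, ui].astype(float))
        weights.append(cam['weight'])
    if not colors:
        return None
    # Weighted blend of the usable cameras' colors (step S314).
    return np.average(np.stack(colors), axis=0, weights=np.asarray(weights, float))
```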
(3-3-3-4.ポストエフェクト処理について)
 次に、実施形態に係るポストエフェクト処理について説明する。図22は、実施形態に係るポストエフェクト処理について説明するための模式図である。図22のセクション(a)は、実施形態に係るレンダリング処理の例、セクション(b)は、既存技術によるレンダリング処理の例をそれぞれ示している。図22のセクション(a)および(b)それぞれにおいて、仮想カメラ70の画角α内に2つの被写体86および87が存在し、撮像カメラ601~604がこれら被写体86および87を取り囲むように配置されているものとする。
(3-3-3-4. Post effect processing)
Next, post-effect processing according to the embodiment will be described. FIG. 22 is a schematic diagram for explaining post-effect processing according to the embodiment. Section (a) of FIG. 22 shows an example of rendering processing according to the embodiment, and section (b) shows an example of rendering processing by an existing technology. In each of sections (a) and (b) of FIG. 22, it is assumed that two subjects 86 and 87 exist within the angle of view α of the virtual camera 70 and that the imaging cameras 60 1 to 60 4 are arranged so as to surround these subjects 86 and 87.
 ここで、ポストエフェクト処理により、これら被写体86および87のうち、仮想カメラ70により撮像された画像に対して被写体87を非表示とする場合について考える。仮想視点テクスチャ生成部1214は、仮想カメラ70により取得される画像の画素72(仮想カメラ70の出力画素)の位置から仮想的な光路95、および、光路961~964に示すように光線追跡し、複数の撮像カメラ601~604それぞれの、画素72に対応する画素(入力画素)の位置を求める(図21、ステップS305~ステップS307)。仮想視点テクスチャ生成部1214は、求めた複数の撮像カメラ601~604それぞれの画素における色情報を、ブレンド係数に応じてブレンドすることで、仮想カメラ70における画素72の色情報を求める(図21、ステップS312)。 Here, a case will be considered in which, of the subjects 86 and 87, the subject 87 is hidden by post-effect processing in the image captured by the virtual camera 70. The virtual viewpoint texture generation unit 1214 performs ray tracing, as indicated by the virtual optical path 95 and the optical paths 96 1 to 96 4, from the position of the pixel 72 of the image acquired by the virtual camera 70 (the output pixel of the virtual camera 70), and obtains, for each of the plurality of imaging cameras 60 1 to 60 4, the position of the pixel (input pixel) corresponding to the pixel 72 (FIG. 21, steps S305 to S307). The virtual viewpoint texture generation unit 1214 obtains the color information of the pixel 72 of the virtual camera 70 by blending the obtained color information of the respective pixels of the plurality of imaging cameras 60 1 to 60 4 according to the blend coefficients (FIG. 21, step S312).
 被写体87は、仮想カメラ70からは、被写体86よりも手前側にあり、また、撮像カメラ604については、被写体86に対する仮想的な光路964上にあり、撮像カメラ604に写り込んでいる。この場合、撮像カメラ604の撮像カメラ情報に含まれるデプス情報に基づき、被写体87の撮像画像を被写体86のテクスチャ画像として用いないようにできる。これについては、図22のセクション(a)に示す実施形態に係るレンダリング処理と、セクション(b)に示す既存技術によるレンダリング処理とで同様である。 As seen from the virtual camera 70, the subject 87 is on the near side of the subject 86, and, with respect to the imaging camera 60 4, the subject 87 is on the virtual optical path 96 4 toward the subject 86 and therefore appears in the image captured by the imaging camera 60 4. In this case, based on the depth information included in the imaging camera information of the imaging camera 60 4, the captured image of the subject 87 can be kept from being used as the texture image of the subject 86. This is the same for the rendering processing according to the embodiment shown in section (a) of FIG. 22 and the rendering processing by the existing technology shown in section (b).
 実施形態では、図21のステップS301で説明したように、仮想視点テクスチャ生成部1214は、被写体位置情報に基づき、メッシュ情報から仮想カメラ70による仮想視点に投影する頂点を選択する。そのため、仮想視点テクスチャ生成部1214は、複数の被写体のそれぞれに対して選択的に、表示/非表示を設定することができる。具体的には、図22のセクション(a)に点線で示される被写体87のように、被写体87の位置を示す被写体位置情報に基づき、当該被写体87を非表示とすることができる。 In the embodiment, as described in step S301 of FIG. 21, the virtual viewpoint texture generation unit 1214 selects vertices to be projected onto the virtual viewpoint by the virtual camera 70 from mesh information based on subject position information. Therefore, the virtual viewpoint texture generation unit 1214 can selectively set display/non-display for each of a plurality of subjects. Specifically, like the subject 87 indicated by the dotted line in section (a) of FIG. 22, the subject 87 can be hidden based on the subject position information indicating the position of the subject 87 .
 なお、セクション(a)の場合においても、実空間にある撮像カメラ604は、被写体87を撮像している。そのため、仮想視点テクスチャ生成部1214は、撮像カメラ604により撮像された撮像画像は、被写体86のテクスチャ画像としては用いない。また、仮想カメラ70から被写体87を通過(矢印97で示す)した位置にある撮像カメラ(図示しない)の側の被写体87の面は、仮想カメラ70からは見えない。そのため、仮想視点テクスチャ生成部1214は、当該撮像カメラの撮像カメラ情報を取得しない。これにより、仮想視点テクスチャ生成部1214による処理の負荷を軽減させることができる。 Also in the case of section (a), the imaging camera 60 4 in real space images the subject 87 . Therefore, the virtual viewpoint texture generation unit 1214 does not use the captured image captured by the imaging camera 604 as the texture image of the subject 86 . Also, the plane of the subject 87 on the side of the imaging camera (not shown) located at a position where the subject 87 has passed from the virtual camera 70 (indicated by an arrow 97 ) cannot be seen from the virtual camera 70 . Therefore, the virtual viewpoint texture generation unit 1214 does not acquire the imaging camera information of the imaging camera. As a result, the processing load of the virtual viewpoint texture generation unit 1214 can be reduced.
 なお、上述では、ポストエフェクト処理として、仮想カメラ70の画角αに含まれる被写体のうち、特定の被写体の表示/非表示を切り替える処理を例にとって説明したが、これはこの例に限定されず、他のポストエフェクト処理にも適用可能である。 In the above description, processing for switching display/non-display of a specific subject among the subjects included in the angle of view α of the virtual camera 70 has been described as an example of the post-effect processing, but the post-effect processing is not limited to this example, and other post-effect processing is also applicable.
 図23は、実施形態に係るポストエフェクト処理について、より具体的に示す模式図である。図23のセクション(a)は、仮想カメラ70の画角αに3次元モデル51cおよび51dが含まれて仮想カメラ70から出力された出力画像300aの例を示している。3次元モデル51cおよび51dは、それぞれバウンディングボックス200cおよび200dが関連付けられている。なお、図23のセクション(a)および(b)において、各バウンディングボックス200cおよび200dなどの枠線は、説明のために示されるもので、実際の画像には表示されない。 FIG. 23 is a schematic diagram showing more specifically the post-effect processing according to the embodiment. Section (a) of FIG. 23 shows an example of an output image 300a output from the virtual camera 70 in which the three- dimensional models 51c and 51d are included in the angle of view α of the virtual camera 70. FIG. Three- dimensional models 51c and 51d are associated with bounding boxes 200c and 200d, respectively. Note that in sections (a) and (b) of FIG. 23, the frame lines of the respective bounding boxes 200c and 200d are shown for explanation and are not displayed in the actual image.
 図23のセクション(b)は、セクション(a)から仮想カメラ70の位置を移動させると共に、3次元モデル51cが移動している場合の出力画像300bの例を示している。このセクション(b)において、バウンディングボックス200dに関連付けられた3次元モデル51dが、当該3次元モデル51dの位置を示す被写体位置情報に基づき指定され、ポストエフェクト処理により非表示とされている。但し、3次元モデル51dそのものは、仮想カメラ70の画角α内にあり、関連するバウンディングボックス200dは存在している。 Section (b) of FIG. 23 shows an example of an output image 300b when the position of the virtual camera 70 is moved from section (a) and the three-dimensional model 51c is moved. In this section (b), the three-dimensional model 51d associated with the bounding box 200d is specified based on subject position information indicating the position of the three-dimensional model 51d, and is hidden by post-effect processing. However, the three-dimensional model 51d itself is within the angle of view α of the virtual camera 70, and the related bounding box 200d exists.
 このように、実施形態によれば、3次元モデル51cおよび51dに対して、それぞれの被写体位置情報に基づき個別にポストエフェクト処理のオン/オフを切り替えることが可能である。 Thus, according to the embodiment, it is possible to switch on/off of post-effect processing individually for the three- dimensional models 51c and 51d based on their subject position information.
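As a final illustration, the per-subject show/hide switch can be thought of as filtering the mesh vertices by subject before rasterization (step S301). The sketch below is an assumption about one possible data layout, not the publication's implementation; it only shows the idea that visibility is decided per subject using the subject position information, so that hidden subjects never reach the virtual camera image even though the real imaging cameras still capture them.

```python
def vertices_to_project(mesh_vertices, vertex_subject_ids, hidden_subject_ids):
    """mesh_vertices: list of 3D vertices; vertex_subject_ids: for each vertex,
    the id of the subject (3D model) it belongs to, derived from the subject
    position information; hidden_subject_ids: subjects given the 'hide' effect.
    Only the returned vertices are rasterized and projected onto the virtual
    viewpoint, so hidden subjects are simply absent from the output image."""
    return [v for v, sid in zip(mesh_vertices, vertex_subject_ids)
            if sid not in hidden_subject_ids]
```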
[4.本開示の実施形態の応用例]
 本開示に係る技術は、様々な製品やサービスへ応用することができる。次に、本開示の実施形態の応用例について説明する。
[4. Application example of the embodiment of the present disclosure]
The technology according to the present disclosure can be applied to various products and services. Next, application examples of the embodiment of the present disclosure will be described.
(コンテンツの制作)
 例えば、実施形態に係る情報処理システム100で生成された被写体の3次元モデルと他の装置(サーバなど)で管理されている3次元データとを合成して、新たな映像コンテンツを制作してもよい。また、例えば、LiDAR(Light Detection and Ranging / Laser Imaging Detection and Ranging)などの撮像装置で取得した背景データが存在している場合、実施形態に係る情報処理システム100で生成された被写体の3次元モデルと背景データとを組み合わせることで、被写体が背景データで示す場所に恰も存在しているかのような映像コンテンツを制作することもできる。これらの場合、制作される映像コンテンツは、3次元情報を持つ映像コンテンツであってもよいし、3次元情報が2次元情報に変換された映像コンテンツでもよい。なお、実施形態に係る情報処理システム100で生成された被写体の3次元モデルは、例えば、3Dモデル生成部111で生成された3次元モデルや、レンダリング部121で再構築した3次元モデルなどがある。
(Content production)
For example, a 3D model of a subject generated by the information processing system 100 according to the embodiment may be combined with 3D data managed by another device (such as a server) to produce new video content. Further, for example, when there is background data acquired by an imaging device such as LiDAR (Light Detection and Ranging / Laser Imaging Detection and Ranging), combining the 3D model of the subject generated by the information processing system 100 according to the embodiment with the background data makes it possible to produce video content in which the subject appears as if it existed at the location indicated by the background data. In these cases, the video content to be produced may be video content having three-dimensional information, or video content obtained by converting three-dimensional information into two-dimensional information. Note that the 3D model of the subject generated by the information processing system 100 according to the embodiment includes, for example, the 3D model generated by the 3D model generation unit 111 and the 3D model reconstructed by the rendering unit 121.
(仮想空間での体験)
 例えば、ユーザがアバタとなってコミュニケーションする場である仮想空間に対して、実施形態に係る情報処理システム100で生成された被写体(例えば、演者)を配置することができる。この場合、ユーザは、アバタとなって仮想空間で実写の被写体を観察することが可能となる。
(Experience in virtual space)
For example, a subject (for example, a performer) generated by the information processing system 100 according to the embodiment can be placed in a virtual space where the user communicates as an avatar. In this case, the user becomes an avatar and can observe the photographed subject in the virtual space.
(遠隔地とのコミュニケーションへの応用)
 例えば、3Dモデル生成部111で生成された被写体の3次元モデルを送信部113から遠隔地に送信することにより、遠隔地にある再生装置を通じて遠隔地のユーザが被写体の3次元モデルを観察することができる。例えば、この被写体の3次元モデルをリアルタイムに伝送することにより、被写体と遠隔地のユーザとのリアルタイムなコミュニケーションを実現可能である。この場合の適用例として、被写体が先生であり、ユーザが生徒である場合や、被写体が医者であり、ユーザが患者である場合が想定できる。
(Application to communication with remote locations)
For example, by transmitting the 3D model of the subject generated by the 3D model generation unit 111 from the transmission unit 113 to a remote location, a user at the remote location can observe the 3D model of the subject through a playback device at the remote location. For example, real-time communication between the subject and a remote user can be realized by transmitting the three-dimensional model of the subject in real time. As an application example of this case, a case where the subject is a teacher and the user is a student, or a case where the subject is a doctor and the user is a patient can be assumed.
(その他の応用例)
 例えば、実施形態に係る情報処理システム100により生成された複数の被写体の3次元モデルに基づいてスポーツなどの自由視点映像を生成することができる。また、個人が実施形態に係る情報処理システム100により生成された自分自身の3次元モデルを配信プラットフォームに配信することもできる。このように、本開示の実施形態に係る技術は、種々の技術やサービスに応用することができる。
(Other application examples)
For example, it is possible to generate a free-viewpoint video of a sport or the like based on three-dimensional models of a plurality of subjects generated by the information processing system 100 according to the embodiment. Also, an individual can distribute his/her own three-dimensional model generated by the information processing system 100 according to the embodiment to the distribution platform. In this way, the technology according to the embodiments of the present disclosure can be applied to various technologies and services.
[5.他の実施形態]
 例えば、上述した、実施形態に係る情報処理プログラムは、CPU、ROM、RAMなどを有し、情報処理装置としての機能を有する他の装置において実行されるようにしてもよい。その場合、その装置が、必要な機能ブロックを有し、必要な情報を得ることができるようにすればよい。
[5. Other embodiments]
For example, the information processing program according to the embodiment described above may be executed in another device having a CPU, a ROM, a RAM, etc. and having functions as an information processing device. In that case, the device should have the necessary functional blocks and be able to obtain the necessary information.
 また、例えば、上述した各フローチャートにおいて、1つのフローチャートの各ステップを、1つの装置が実行するようにしてもよいし、複数の装置が分担して実行するようにしてもよい。さらに、フローチャートの1つのステップに複数の処理が含まれる場合、その複数の処理を、1つの装置が実行するようにしてもよいし、複数の装置が分担して実行するようにしてもよい。換言するに、フローチャートの1つのステップに含まれる複数の処理を、複数のステップの処理として実行することもできる。逆に、フローチャートにおいて複数のステップとして説明した処理を1つのステップとして纏めて実行することもできる。 Also, for example, in each of the flowcharts described above, each step of one flowchart may be executed by one device, or may be shared by a plurality of devices. Furthermore, when one step of the flowchart includes a plurality of processes, the plurality of processes may be executed by one device, or may be shared by a plurality of devices. In other words, a plurality of processes included in one step of the flowchart can also be executed as a process of a plurality of steps. Conversely, the processing described as a plurality of steps in the flowchart can also be collectively executed as one step.
 さらに、例えば、情報処理システム100が実行する情報処理プログラムは、当該情報処理プログラムを記述するステップの処理が、上述した各フローチャートに示す順序に従い時系列に沿って実行されるようにしてもよいし、並列に、あるいは呼び出しが行われたとき等の必要なタイミングで個別に実行されるようにしてもよい。つまり、矛盾が生じない限り、各ステップの処理が上述した順序と異なる順序で実行されるようにしてもよい。さらに、実施形態に係る情報処理プログラムを記述するステップの処理が、他のプログラムの処理と並列に実行されるようにしてもよいし、他のプログラムの処理と組み合わせて実行されるようにしてもよい。 Furthermore, for example, in the information processing program executed by the information processing system 100, the processing of the steps describing the information processing program may be executed in chronological order according to the order shown in each flowchart described above, may be executed in parallel, or may be executed individually at necessary timing such as when a call is made. That is, as long as there is no contradiction, the processing of each step may be executed in an order different from the order described above. Furthermore, the processing of the steps describing the information processing program according to the embodiment may be executed in parallel with the processing of other programs, or may be executed in combination with the processing of other programs.
 さらにまた、例えば、本開示に関する複数の技術は、矛盾が生じない限り、それぞれ独立に単体で実施することができるし、本開示に係る技術を複数併用して実施することもできる。また、上述した実施形態に係る技術の一部または全部を、上述していない他の技術と併用して実施することもできる。 Furthermore, for example, a plurality of technologies related to the present disclosure can be implemented independently, or a plurality of technologies related to the present disclosure can be implemented in combination, as long as there is no contradiction. Also, part or all of the techniques according to the above-described embodiments can be implemented in combination with other techniques not described above.
 なお、本明細書に記載された効果はあくまで例示であって限定されるものでは無く、また他の効果があってもよい。 It should be noted that the effects described in this specification are only examples and are not limited, and other effects may also occur.
 なお、本技術は以下のような構成も取ることができる。
(1)
 3次元データに含まれる3次元モデルに対してテクスチャ画像を適用した画像を生成する生成部と、
 仮想空間の画像を取得する仮想カメラの第1の位置と、前記3次元モデルの第2の位置と、実空間の被写体を撮像する1以上の撮像カメラの第3の位置と、に基づき、前記1以上の撮像カメラから、前記テクスチャ画像として用いる前記被写体の撮像画像を取得する撮像カメラを選択する選択部と、
を備える、
情報処理装置。
(2)
 前記生成部は、
 前記テクスチャ画像を、前記1以上の撮像カメラから前記選択部に選択された撮像カメラにより取得された撮像画像に基づき、前記仮想カメラからの視点に応じて生成する、
前記(1)に記載の情報処理装置。
(3)
 前記選択部は、
 前記第1の位置、前記第2の位置および前記第3の位置に基づき求めた前記1以上の撮像カメラそれぞれの重要度に応じて、前記被写体の撮像画像を取得する撮像カメラを選択する、
前記(1)または(2)に記載の情報処理装置。
(4)
 前記選択部は、
 前記第2の位置を頂点として前記第1の位置および前記第3の位置により成す角度に基づき前記重要度を求める、
前記(3)に記載の情報処理装置。
(5)
 前記生成部は、
 前記1以上の撮像カメラで撮像された撮像画像を前記重要度に応じてブレンドして前記テクスチャ画像を生成する、
前記(3)または(4)に記載の情報処理装置。
(6)
 前記生成部は、
 前記3次元モデルに外接する直方体の頂点座標それぞれのうち少なくとも1つの頂点座標が前記仮想カメラの画角内にある場合に、前記テクスチャ画像を前記3次元モデルに適用する、
前記(1)乃至(5)の何れかに記載の情報処理装置。
(7)
 前記生成部は、
 所定の効果を与える前記3次元モデルを、前記第2の位置に基づき指定する、
前記(1)乃至(6)の何れかに記載の情報処理装置。
(8)
 前記所定の効果は、指定された前記3次元モデルを前記仮想カメラに対して非表示とする効果である、
前記(7)に記載の情報処理装置。
(9)
 前記選択部は、
 前記1以上の撮像カメラのうち、前記3次元モデルの前記仮想カメラの画角外の方向から前記被写体を撮像する撮像カメラを非選択とする、
前記(1)乃至(8)の何れかに記載の情報処理装置。
(10)
 前記選択部は、
 前記第2の位置として、前記3次元モデルに外接する直方体の各頂点座標の平均の座標を用いる、
前記(1)乃至(9)の何れかに記載の情報処理装置。
(11)
 前記選択部は、
 前記仮想カメラの画角内にそれぞれ前記3次元データに含まれる複数の3次元モデルが含まれる場合に、前記複数の3次元モデルそれぞれの前記第2の位置の平均を、前記複数の3次元モデルに対する前記第2の位置として用いる、
前記(10)に記載の情報処理装置。
(12)
 プロセッサにより実行される、
 3次元データに含まれる3次元モデルに対してテクスチャ画像を適用した画像を生成する生成ステップと、
 仮想空間の画像を取得する仮想カメラの第1の位置と、前記3次元モデルの第2の位置と、実空間の被写体を撮像する1以上の撮像カメラの第3の位置と、に基づき、前記1以上の撮像カメラから、前記テクスチャ画像として用いる前記被写体の撮像画像を取得する撮像カメラを選択する選択ステップと、
を有する、
情報処理方法。
(13)
 1以上の撮像カメラで撮像された撮像画像に基づき3次元データを生成する生成部と、
 前記3次元データから、前記撮像画像に含まれる被写体に対応する3次元モデルを分離し、分離された前記3次元モデルの位置を示す位置情報を生成する分離部と、
を備える、
情報処理装置。
(14)
 前記分離部は、
 前記3次元データを高さ方向に投影した2次元平面上の情報に基づき前記被写体の前記2次元平面における領域を特定し、前記領域に高さ方向の情報を与えることで、前記3次元データから前記3次元モデルを分離する、
前記(13)に記載の情報処理装置。
(15)
 前記分離部は、
 前記領域に高さ方向の情報を与えて生成される、前記3次元モデルに外接する直方体の各頂点の座標を含む前記位置情報を生成する、
前記(14)に記載の情報処理装置。
(16)
 前記分離部により前記3次元データから分離された前記3次元モデルに対して前記位置情報を付加して出力する出力部、
をさらに備える、
前記(13)乃至(15)の何れかに記載の情報処理装置。
(17)
 前記出力部は、
 前記3次元モデルの情報を、前記3次元モデルに対応する前記被写体を前記1以上の撮像カメラで撮像した多視点の撮像画像と、前記多視点の撮像画像それぞれに対するデプス情報と、により出力する、
前記(16)に記載の情報処理装置。
(18)
 前記出力部は、
 前記3次元モデルの情報を、メッシュ情報として出力する、
前記(16)に記載の情報処理装置。
(19)
 プロセッサにより実行される、
 1以上の撮像カメラで撮像された撮像画像に基づき3次元データを生成する生成ステップと、
 前記3次元データから、前記撮像画像に含まれる被写体に対応する3次元モデルを分離し、分離された前記3次元モデルの位置を示す位置情報を生成する分離部ステップと、
を有する、
情報処理方法。
Note that the present technology can also take the following configuration.
(1)
a generation unit that generates an image by applying a texture image to a three-dimensional model included in three-dimensional data;
Based on a first position of a virtual camera that acquires an image of the virtual space, a second position of the three-dimensional model, and a third position of one or more imaging cameras that capture an object in the real space, a selection unit that selects, from one or more imaging cameras, an imaging camera that acquires a captured image of the subject to be used as the texture image;
comprising
Information processing equipment.
(2)
The generating unit
The texture image is generated according to the viewpoint from the virtual camera based on the captured image acquired by the imaging camera selected by the selection unit from the one or more imaging cameras.
The information processing device according to (1) above.
(3)
The selection unit
Selecting an imaging camera that acquires a captured image of the subject according to the importance of each of the one or more imaging cameras obtained based on the first position, the second position, and the third position;
The information processing apparatus according to (1) or (2).
(4)
The selection unit
Obtaining the degree of importance based on an angle formed by the first position and the third position with the second position as the vertex;
The information processing device according to (3) above.
(5)
The generating unit
generating the texture image by blending the captured images captured by the one or more imaging cameras according to the importance;
The information processing apparatus according to (3) or (4).
(6)
The generating unit
applying the texture image to the three-dimensional model when at least one vertex coordinate of each of the vertex coordinates of a rectangular parallelepiped circumscribing the three-dimensional model is within the angle of view of the virtual camera;
The information processing apparatus according to any one of (1) to (5) above.
(7)
The generating unit
Designating the three-dimensional model to give a predetermined effect based on the second position;
The information processing apparatus according to any one of (1) to (6).
(8)
the predetermined effect is an effect of hiding the specified three-dimensional model from the virtual camera;
The information processing device according to (7) above.
(9)
The selection unit
Deselecting, from among the one or more imaging cameras, an imaging camera that images the subject from a direction outside the angle of view of the virtual camera of the three-dimensional model;
The information processing apparatus according to any one of (1) to (8).
(10)
The selection unit
Using the average coordinates of the vertex coordinates of a rectangular parallelepiped circumscribing the three-dimensional model as the second position;
The information processing apparatus according to any one of (1) to (9).
(11)
The selection unit
When a plurality of three-dimensional models included in the three-dimensional data are each included within the angle of view of the virtual camera, using the average of the second positions of the plurality of three-dimensional models as the second position for the plurality of three-dimensional models;
The information processing device according to (10) above.
(12)
executed by a processor;
a generation step of generating an image by applying a texture image to a three-dimensional model included in three-dimensional data;
Based on a first position of a virtual camera that acquires an image of the virtual space, a second position of the three-dimensional model, and a third position of one or more imaging cameras that capture an object in the real space, a selection step of selecting, from one or more imaging cameras, an imaging camera that acquires a captured image of the subject to be used as the texture image;
having
Information processing methods.
(13)
a generation unit that generates three-dimensional data based on captured images captured by one or more imaging cameras;
a separation unit that separates a three-dimensional model corresponding to a subject included in the captured image from the three-dimensional data and generates position information indicating the position of the separated three-dimensional model;
comprising
Information processing equipment.
(14)
The separation unit is
By specifying a region of the subject on the two-dimensional plane based on information on the two-dimensional plane obtained by projecting the three-dimensional data in the height direction, and providing the region with information in the height direction, separating the three-dimensional model;
The information processing device according to (13) above.
(15)
The separation unit is
generating the position information including the coordinates of each vertex of a rectangular parallelepiped circumscribing the three-dimensional model, which is generated by giving information in the height direction to the region;
The information processing device according to (14) above.
(16)
an output unit that adds the position information to the three-dimensional model separated from the three-dimensional data by the separation unit and outputs the model;
further comprising
The information processing apparatus according to any one of (13) to (15).
(17)
The output unit
outputting the information of the three-dimensional model as multi-viewpoint captured images obtained by capturing the subject corresponding to the three-dimensional model with the one or more imaging cameras, and depth information for each of the multi-viewpoint captured images;
The information processing device according to (16) above.
(18)
The output unit
outputting the information of the three-dimensional model as mesh information;
The information processing device according to (16) above.
(19)
executed by a processor;
a generation step of generating three-dimensional data based on captured images captured by one or more imaging cameras;
a separation unit step of separating a three-dimensional model corresponding to a subject included in the captured image from the three-dimensional data and generating position information indicating the position of the separated three-dimensional model;
has a
Information processing methods.
50 3次元データ
511,512,513,51a,51b,51c,51d 3次元モデル
521,522,523 シルエット
531,532,533,2001,2002,2003,200a,200b,200c,200d バウンディングボックス
601,602,603,604,608,6016,60n-1,60n 撮像カメラ
70 仮想カメラ
80,821,822,86,87 被写体
81,83,84a,84b,85 基準位置
90a,90b,91a,91b ベクトル
100 情報処理システム
110 データ取得部
111 3Dモデル生成部
112 フォーマット化部
113 送信部
120 受信部
121 レンダリング部
122 表示部
1110 3Dモデル処理部
1111 3Dモデル分離部
1210 メッシュ転送部
1211 撮像カメラ選択部
1212 撮像視点デプス生成部
1213 撮像カメラ情報転送部
1214 仮想視点テクスチャ生成部
2000 情報処理装置
2100 CPU
50 Three-dimensional data
51 1, 51 2, 51 3, 51a, 51b, 51c, 51d Three-dimensional model
52 1, 52 2, 52 3 Silhouette
53 1, 53 2, 53 3, 200 1, 200 2, 200 3, 200a, 200b, 200c, 200d Bounding box
60 1, 60 2, 60 3, 60 4, 60 8, 60 16, 60 n-1, 60 n Imaging camera
70 Virtual camera
80, 82 1, 82 2, 86, 87 Subject
81, 83, 84a, 84b, 85 Reference position
90a, 90b, 91a, 91b Vector
100 Information processing system
110 Data acquisition unit
111 3D model generation unit
112 Formatting unit
113 Transmission unit
120 Reception unit
121 Rendering unit
122 Display unit
1110 3D model processing unit
1111 3D model separation unit
1210 Mesh transfer unit
1211 Imaging camera selection unit
1212 Imaging viewpoint depth generation unit
1213 Imaging camera information transfer unit
1214 Virtual viewpoint texture generation unit
2000 Information processing device
2100 CPU

Claims (19)

  1.  3次元データに含まれる3次元モデルに対してテクスチャ画像を適用した画像を生成する生成部と、
     仮想空間の画像を取得する仮想カメラの第1の位置と、前記3次元モデルの第2の位置と、実空間の被写体を撮像する1以上の撮像カメラの第3の位置と、に基づき、前記1以上の撮像カメラから、前記テクスチャ画像として用いる前記被写体の撮像画像を取得する撮像カメラを選択する選択部と、
    を備える、
    情報処理装置。
    a generation unit that generates an image by applying a texture image to a three-dimensional model included in three-dimensional data;
    Based on a first position of a virtual camera that acquires an image of the virtual space, a second position of the three-dimensional model, and a third position of one or more imaging cameras that capture an object in the real space, a selection unit that selects, from one or more imaging cameras, an imaging camera that acquires a captured image of the subject to be used as the texture image;
    comprising
    Information processing equipment.
  2.  前記生成部は、
     前記テクスチャ画像を、前記1以上の撮像カメラから前記選択部に選択された撮像カメラにより取得された撮像画像に基づき、前記仮想カメラからの視点に応じて生成する、
    請求項1に記載の情報処理装置。
    The generating unit
    The texture image is generated according to the viewpoint from the virtual camera based on the captured image acquired by the imaging camera selected by the selection unit from the one or more imaging cameras.
    The information processing device according to claim 1 .
  3.  前記選択部は、
     前記第1の位置、前記第2の位置および前記第3の位置に基づき求めた前記1以上の撮像カメラそれぞれの重要度に応じて、前記被写体の撮像画像を取得する撮像カメラを選択する、
    請求項1に記載の情報処理装置。
    The selection unit
    Selecting an imaging camera that acquires a captured image of the subject according to the importance of each of the one or more imaging cameras obtained based on the first position, the second position, and the third position;
    The information processing device according to claim 1 .
  4.  前記選択部は、
     前記第2の位置を頂点として前記第1の位置および前記第3の位置により成す角度に基づき前記重要度を求める、
    請求項3に記載の情報処理装置。
    The selection unit
    Obtaining the degree of importance based on an angle formed by the first position and the third position with the second position as the vertex;
    The information processing apparatus according to claim 3.
  5.  前記生成部は、
     前記1以上の撮像カメラで撮像された撮像画像を前記重要度に応じてブレンドして前記テクスチャ画像を生成する、
    請求項3に記載の情報処理装置。
    The generating unit
    generating the texture image by blending the captured images captured by the one or more imaging cameras according to the importance;
    The information processing apparatus according to claim 3.
  6.  前記生成部は、
     前記3次元モデルに外接する直方体の頂点座標それぞれのうち少なくとも1つの頂点座標が前記仮想カメラの画角内にある場合に、前記テクスチャ画像を前記3次元モデルに適用する、
    請求項1に記載の情報処理装置。
    The generating unit
    applying the texture image to the three-dimensional model when at least one vertex coordinate of each of the vertex coordinates of a rectangular parallelepiped circumscribing the three-dimensional model is within the angle of view of the virtual camera;
    The information processing device according to claim 1 .
  7.  前記生成部は、
     所定の効果を与える前記3次元モデルを、前記第2の位置に基づき指定する、
    請求項1に記載の情報処理装置。
    The generating unit
    Designating the three-dimensional model to give a predetermined effect based on the second position;
    The information processing device according to claim 1 .
  8.  The information processing device according to claim 7, wherein
      the predetermined effect is an effect of hiding the designated three-dimensional model from the virtual camera.
  9.  The information processing device according to claim 1, wherein
      the selection unit deselects, from among the one or more imaging cameras, an imaging camera that captures the subject from a direction of the three-dimensional model that is outside the angle of view of the virtual camera.
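
    One possible reading of claim 9, shown as a sketch only: imaging cameras that observe the model from roughly the opposite side of the virtual camera, and therefore mostly see surfaces the virtual camera cannot, are dropped. The 90-degree threshold is an arbitrary example value, not a number taken from the disclosure.

    import numpy as np

    def deselect_out_of_view_cameras(virtual_cam_pos, model_pos, imaging_cam_positions,
                                     max_angle_deg=90.0):
        """Indices of imaging cameras kept after dropping those that look at the model
        from roughly the opposite side of the virtual camera."""
        to_virtual = np.asarray(virtual_cam_pos, float) - np.asarray(model_pos, float)
        to_virtual /= np.linalg.norm(to_virtual)
        kept = []
        for i, cam_pos in enumerate(imaging_cam_positions):
            to_cam = np.asarray(cam_pos, float) - np.asarray(model_pos, float)
            to_cam /= np.linalg.norm(to_cam)
            angle = np.degrees(np.arccos(np.clip(np.dot(to_virtual, to_cam), -1.0, 1.0)))
            if angle <= max_angle_deg:      # within the tunable threshold: keep
                kept.append(i)
        return kept

    cams = [(0, 1.5, 3), (3, 1.5, 0), (0, 1.5, -3), (-3, 1.5, 0)]
    print(deselect_out_of_view_cameras((0, 1.6, 2.0), (0, 1.0, 0), cams))  # [0, 1, 3]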
  10.  The information processing device according to claim 1, wherein
       the selection unit uses, as the second position, the average of the vertex coordinates of a rectangular parallelepiped circumscribing the three-dimensional model.
  11.  The information processing device according to claim 10, wherein,
       when a plurality of three-dimensional models included in the three-dimensional data are within the angle of view of the virtual camera, the selection unit uses the average of the second positions of the plurality of three-dimensional models as the second position for the plurality of three-dimensional models.
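
    Claims 10 and 11 reduce each model, and then a group of models, to a single representative point by averaging. A direct sketch follows; the example boxes are invented for the demonstration.

    import numpy as np

    def model_position(bounding_box_vertices):
        """Second position of one model: the mean of the eight vertex coordinates
        of the cuboid circumscribing it (claim 10)."""
        return np.asarray(bounding_box_vertices, dtype=float).mean(axis=0)

    def combined_position(per_model_positions):
        """Single second position used when several models are inside the virtual
        camera's angle of view: the mean of their individual positions (claim 11)."""
        return np.asarray(per_model_positions, dtype=float).mean(axis=0)

    box_a = [[x, y, z] for x in (0, 1) for y in (0, 2) for z in (0, 1)]
    box_b = [[x, y, z] for x in (3, 4) for y in (0, 2) for z in (0, 1)]
    pos_a, pos_b = model_position(box_a), model_position(box_b)
    print(pos_a, pos_b, combined_position([pos_a, pos_b]))
    # [0.5 1.  0.5] [3.5 1.  0.5] [2.  1.  0.5]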
  12.  An information processing method executed by a processor, the method comprising:
       a generation step of generating an image by applying a texture image to a three-dimensional model included in three-dimensional data; and
       a selection step of selecting, from one or more imaging cameras that capture a subject in a real space, an imaging camera that acquires a captured image of the subject to be used as the texture image, based on a first position of a virtual camera that acquires an image of a virtual space, a second position of the three-dimensional model, and a third position of the one or more imaging cameras.
  13.  An information processing device comprising:
       a generation unit that generates three-dimensional data based on captured images captured by one or more imaging cameras; and
       a separation unit that separates, from the three-dimensional data, a three-dimensional model corresponding to a subject included in the captured images, and generates position information indicating the position of the separated three-dimensional model.
  14.  The information processing device according to claim 13, wherein
       the separation unit separates the three-dimensional model from the three-dimensional data by identifying a region of the subject on a two-dimensional plane based on information on the two-dimensional plane obtained by projecting the three-dimensional data in a height direction, and giving the region information in the height direction.
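
    A sketch of one way the top-down separation of claim 14 could work if the three-dimensional data is treated as a point cloud: project the points onto the ground plane, group occupied grid cells into connected regions, and restore the height extent per region. The point-cloud interpretation, the grid resolution, and the use of scipy.ndimage.label are all assumptions made for this illustration.

    import numpy as np
    from scipy import ndimage

    def separate_models(points, cell_size=0.05):
        """Split a point cloud (N, 3) with y up into per-subject clusters.

        The cloud is projected onto the ground (x-z) plane, occupied grid cells are
        grouped into connected regions, and each region is then given back its height
        extent to recover one three-dimensional model per subject."""
        pts = np.asarray(points, dtype=float)
        xz = np.floor(pts[:, [0, 2]] / cell_size).astype(int)
        xz -= xz.min(axis=0)                       # shift grid indices to start at 0
        grid = np.zeros(xz.max(axis=0) + 1, dtype=bool)
        grid[xz[:, 0], xz[:, 1]] = True            # occupancy on the 2-D plane
        labels, n = ndimage.label(grid)            # connected regions = subjects
        models = []
        for region in range(1, n + 1):
            mask = labels[xz[:, 0], xz[:, 1]] == region
            cluster = pts[mask]
            bbox = (cluster.min(axis=0), cluster.max(axis=0))  # includes height range
            models.append({"points": cluster, "bounding_box": bbox})
        return models

    # Two well-separated dummy subjects.
    a = np.random.rand(500, 3) * [0.5, 1.8, 0.5]
    b = np.random.rand(500, 3) * [0.5, 1.6, 0.5] + [3.0, 0.0, 0.0]
    print(len(separate_models(np.vstack([a, b]), cell_size=0.25)))  # 2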
  15.  The information processing device according to claim 14, wherein
       the separation unit generates the position information including the coordinates of each vertex of a rectangular parallelepiped circumscribing the three-dimensional model, the rectangular parallelepiped being generated by giving the region the information in the height direction.
  16.  The information processing device according to claim 13, further comprising
       an output unit that adds the position information to the three-dimensional model separated from the three-dimensional data by the separation unit and outputs the result.
  17.  The information processing device according to claim 16, wherein
       the output unit outputs the information of the three-dimensional model as multi-viewpoint captured images obtained by capturing the subject corresponding to the three-dimensional model with the one or more imaging cameras, together with depth information for each of the multi-viewpoint captured images.
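
    A possible container for the output format described in claims 16 and 17 (multi-view color plus per-view depth, tagged with the bounding-box position information added by the output unit); the class and field names are invented for the sketch and are not taken from the disclosure.

    from dataclasses import dataclass, field
    from typing import List, Tuple
    import numpy as np

    @dataclass
    class ViewSample:
        """One imaging viewpoint of a separated model: its captured image plus the
        per-pixel depth for that viewpoint."""
        camera_id: int
        color: np.ndarray            # (H, W, 3) captured image
        depth: np.ndarray            # (H, W) depth in meters from that camera

    @dataclass
    class SeparatedModelOutput:
        """Output for a single subject: multi-view color and depth (claim 17) tagged
        with the bounding-box position information added by the output unit (claim 16)."""
        model_id: int
        bounding_box: Tuple[np.ndarray, np.ndarray]   # (min corner, max corner)
        views: List[ViewSample] = field(default_factory=list)

    out = SeparatedModelOutput(
        model_id=0,
        bounding_box=(np.zeros(3), np.array([0.5, 1.8, 0.5])),
        views=[ViewSample(0, np.zeros((720, 1280, 3), np.uint8),
                          np.full((720, 1280), 2.0, np.float32))],
    )
    print(out.model_id, len(out.views), out.views[0].depth.dtype)  # 0 1 float32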
  18.  The information processing device according to claim 16, wherein
       the output unit outputs the information of the three-dimensional model as mesh information.
  19.  An information processing method executed by a processor, the method comprising:
       a generation step of generating three-dimensional data based on captured images captured by one or more imaging cameras; and
       a separation step of separating, from the three-dimensional data, a three-dimensional model corresponding to a subject included in the captured images, and generating position information indicating the position of the separated three-dimensional model.
PCT/JP2022/008967 2021-03-12 2022-03-02 Information processing device and information processing method WO2022191010A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2021-040346 2021-03-12
JP2021040346 2021-03-12

Publications (1)

Publication Number Publication Date
WO2022191010A1

Family

ID=83227186

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2022/008967 WO2022191010A1 (en) 2021-03-12 2022-03-02 Information processing device and information processing method

Country Status (1)

Country Link
WO (1) WO2022191010A1 (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2009539155A (en) * 2006-06-02 2009-11-12 イジュノシッヒ テクニッヒ ホッフシューラ チューリッヒ Method and system for generating a 3D representation of a dynamically changing 3D scene
WO2019039282A1 (en) * 2017-08-22 2019-02-28 ソニー株式会社 Image processing device and image processing method
WO2019082958A1 (en) * 2017-10-27 2019-05-02 パナソニック インテレクチュアル プロパティ コーポレーション オブ アメリカ Three-dimensional model encoding device, three-dimensional model decoding device, three-dimensional model encoding method, and three-dimensional model decoding method
JP2020014159A (en) * 2018-07-19 2020-01-23 キヤノン株式会社 File generation device and video generation device on the basis of file
US20200143557A1 (en) * 2018-11-01 2020-05-07 Samsung Electronics Co., Ltd. Method and apparatus for detecting 3d object from 2d image
JP2020126393A (en) * 2019-02-04 2020-08-20 キヤノン株式会社 Image processing apparatus, image processing method, and program
JP2021022032A (en) * 2019-07-25 2021-02-18 Kddi株式会社 Synthesizer, method and program

Similar Documents

Publication Publication Date Title
US10535181B2 (en) Virtual viewpoint for a participant in an online communication
EP3712856B1 (en) Method and system for generating an image
JP7386888B2 (en) Two-shot composition of the speaker on the screen
JP2014505917A (en) Hybrid reality for 3D human machine interface
TWI813098B (en) Neural blending for novel view synthesis
EP3396635A2 (en) A method and technical equipment for encoding media content
WO2020184174A1 (en) Image processing device and image processing method
Kurillo et al. A framework for collaborative real-time 3D teleimmersion in a geographically distributed environment
Farbiz et al. Live three-dimensional content for augmented reality
GB2565301A (en) Three-dimensional video processing
WO2022191010A1 (en) Information processing device and information processing method
JP6091850B2 (en) Telecommunications apparatus and telecommunications method
US20230252722A1 (en) Information processing apparatus, information processing method, and program
Andersen et al. An AR-guided system for fast image-based modeling of indoor scenes
EP3564905A1 (en) Conversion of a volumetric object in a 3d scene into a simpler representation model
US11769299B1 (en) Systems and methods for capturing, transporting, and reproducing three-dimensional simulations as interactive volumetric displays
CN116528065B (en) Efficient virtual scene content light field acquisition and generation method
Scheer et al. A client-server architecture for real-time view-dependent streaming of free-viewpoint video
WO2023276261A1 (en) Information processing device, information processing method, and program
US20240185526A1 (en) Systems and methods for capturing, transporting, and reproducing three-dimensional simulations as interactive volumetric displays
Thatte et al. Real-World Virtual Reality With Head-Motion Parallax
WO2022224964A1 (en) Information processing device and information processing method
Pitkänen Open Access Dynamic Human Point Cloud Datasets
CN117424997A (en) Video processing method, device, equipment and readable storage medium
JP2023026148A (en) Viewpoint calculation apparatus and program of the same

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22766969

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 22766969

Country of ref document: EP

Kind code of ref document: A1