WO2023095353A1 - Multi-viewpoint image generation device, method, and program - Google Patents

Multi-viewpoint image generation device, method, and program

Publication number: WO2023095353A1
Authority: WIPO (PCT)
Prior art keywords: spherical harmonic, processing unit, image, viewpoint, series
Application number: PCT/JP2022/006981
Other languages: French (fr), Japanese (ja)
Inventor: Kimitaka Tsutsumi (公孝 堤)
Original assignee: Nippon Telegraph and Telephone Corporation (日本電信電話株式会社)
Application filed by Nippon Telegraph and Telephone Corporation; priority to PCT/JP2022/043195 (WO2023095792A1)
Publication of WO2023095353A1


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 1/00 General purpose image data processing
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 1/00 Scanning, transmission or reproduction of documents or the like, e.g. facsimile transmission; Details thereof
    • H04N 1/387 Composing, repositioning or otherwise geometrically modifying originals
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 13/00 Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N 13/10 Processing, recording or transmission of stereoscopic or multi-view image signals
    • H04N 13/106 Processing image signals
    • H04N 13/111 Transformation of image signals corresponding to virtual viewpoints, e.g. spatial image interpolation

Definitions

  • One aspect of the present invention relates to a multi-viewpoint image generation apparatus, method, and program that take as input images of an object arranged in space, captured from a plurality of different viewpoints, and generate an image as seen from a viewpoint not included among those images.
  • A known technique estimates the three-dimensional shape of a subject from images taken by a plurality of cameras, and uses this three-dimensional shape to synthesize images as seen from arbitrary directions not covered by those cameras.
  • For example, a technique is described in which a silhouette of a subject is extracted from a plurality of images, a three-dimensional shape represented by voxels is estimated from the silhouette, and this three-dimensional shape is photographed from an arbitrary viewpoint by a virtual camera to synthesize the image seen from that arbitrary direction.
  • This type of technology is attracting attention as an important elemental technology in the fields of content production and sports science, because it enables the presentation of images viewed from various viewpoints, for example, in live sports broadcasts.
  • However, since the technique of Patent Document 1 expresses the three-dimensional shape of the subject using voxels, the data amount of the image representing the three-dimensional shape becomes large. For this reason, the amount of calculation increases when processing a three-dimensional image, and a large-capacity memory is required for the calculation. This tendency becomes more conspicuous as the resolution of the three-dimensional shape image increases, resulting in an increase in the processing load and processing time of the image generation device.
  • The present invention has been made in view of the above circumstances, and is intended to provide a technique capable of generating an image corresponding to an arbitrary viewpoint without using a three-dimensional image with a large amount of data, thereby reducing the processing load and shortening the processing time associated with image processing.
  • A first aspect of the multi-viewpoint image generation apparatus or method converts teacher images captured from a plurality of viewpoints into teacher images in the wavenumber domain, expands the wavenumber-domain images into a spherical harmonic series for each wavenumber component, generates a wavenumber-domain image corresponding to a virtual viewpoint based on the spherical harmonic series and information specifying a shooting direction from an arbitrary virtual viewpoint different from the plurality of viewpoints, and transforms the generated wavenumber-domain image into a generated image in the spatial domain.
  • According to the first aspect, teacher images photographed from a plurality of viewpoints are expanded into a spherical harmonic series for each wavenumber component, and an image corresponding to an arbitrary virtual viewpoint is generated based on this series. Therefore, compared to expressing the three-dimensional shape of the subject using voxels, for example, an image viewed from a virtual viewpoint can be generated with a smaller amount of data, which reduces the processing load and shortens the processing time associated with image processing.
  • A second aspect of the present invention further includes a spherical harmonic expansion series optimization processing unit that receives as input the spherical harmonic expansion series obtained by the spherical harmonic series expansion processing unit and outputs an optimized spherical harmonic expansion series.
  • According to the second aspect of the present invention, it is possible to suppress deterioration in the accuracy of the generated multi-view image data and to generate highly accurate multi-view image data.
  • A third aspect of the present invention is a multi-viewpoint image generation device that generates an image corresponding to a shooting direction from an arbitrary virtual viewpoint using spherical harmonics, and includes a spherical harmonic transform processing section having a basis vector calculation processing unit that, when information specifying the shooting direction from the arbitrary virtual viewpoint is input, calculates a basis vector of the spherical harmonics corresponding to that shooting direction, and an image generation processing unit that receives the calculated basis vector as input and generates and outputs a spatial-domain image corresponding to the shooting direction.
  • According to the third aspect of the present invention, even when, for example, the scale of the teacher image data and teacher direction data is enormous and it is difficult to directly calculate the spherical harmonic expansion series data, a model corresponding to the spherical harmonic expansion series and the subsequent accuracy-improvement process can be learned, so that deterioration in the accuracy of the generated multi-viewpoint image data is suppressed and highly accurate multi-viewpoint image data can be generated.
  • Further, the image generation processing unit includes a first neural network that generates and outputs a first image having a first resolution using the basis vector as input, and a second neural network that generates, using the first image output from the first neural network as input, a second image having a second resolution higher than the first resolution, and outputs the generated second image as a spatial-domain image corresponding to the shooting direction.
  • model learning can be made more efficient and the size of the learning model can be reduced.
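The two-stage, coarse-to-fine structure described above can be sketched as follows. This is a minimal illustration only: the layer shapes, the basis-vector length, the 8x8 and 16x16 resolutions, and the nearest-neighbour upsampling step are all hypothetical, and the two single-layer networks are untrained stand-ins for the first and second neural networks.

```python
import numpy as np

rng = np.random.default_rng(5)
B = 9                                   # hypothetical basis-vector length

def mlp(x, W, b):
    """One untrained fully connected layer with tanh activation (stand-in)."""
    return np.tanh(x @ W + b)

# First network: basis vector -> coarse 8x8 image (first resolution).
W1, b1 = rng.normal(size=(B, 64)) * 0.1, np.zeros(64)
# Second network: upsampled coarse image -> refined 16x16 image (second resolution).
W2, b2 = rng.normal(size=(256, 256)) * 0.1, np.zeros(256)

y = rng.normal(size=B)                  # stand-in basis vector for one direction
coarse = mlp(y, W1, b1).reshape(8, 8)   # first image (low resolution)
up = coarse.repeat(2, axis=0).repeat(2, axis=1)      # nearest-neighbour upsampling
fine = mlp(up.reshape(-1), W2, b2).reshape(16, 16)   # second image (high resolution)
```

Splitting generation this way keeps the first network small, which is one reason the text cites more efficient model learning and a smaller learning model.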
  • That is, an image corresponding to an arbitrary viewpoint can be generated without using a three-dimensional image with a large amount of data, and it is thus possible to provide a technique that reduces the processing load and shortens the processing time associated with image processing.
  • FIG. 1 is a block diagram showing an example of the hardware configuration of a multi-viewpoint image generation device according to the first embodiment of the invention.
  • FIG. 2 is a block diagram showing an example of the software configuration of the multi-viewpoint image generation device according to the first embodiment of the invention.
  • FIG. 3 is a flowchart showing an example of a processing procedure and processing contents of a multi-viewpoint image generation process executed by a control unit of the multi-viewpoint image generation apparatus shown in FIG.
  • FIG. 4 is a diagram used for explaining spherical harmonic series.
  • FIG. 5 is a block diagram showing an example of the software configuration of the multi-viewpoint image generation device according to the second embodiment of the invention.
  • FIG. 6 is a flow chart showing an example of a processing procedure and processing contents of a multi-viewpoint image generation process executed by a control unit of the multi-viewpoint image generation apparatus shown in FIG. 5.
  • FIG. 7 is a diagram for explaining an operation example of a spherical harmonic expansion series optimization processing unit provided in the multi-viewpoint image generation device shown in FIG. 5.
  • FIG. 8 is a block diagram showing an example of the software configuration of the multi-viewpoint image generation device according to the third embodiment of the invention.
  • FIG. 9 is a flowchart showing an example of a processing procedure and processing contents of a multi-viewpoint image generation process executed by a control unit of the multi-viewpoint image generation apparatus shown in FIG. 8.
  • FIG. 10 is a block diagram showing an example of the software configuration of the spherical harmonic transform processing section of the multi-viewpoint image generation device according to the fourth embodiment of the present invention.
  • FIG. 1 is a block diagram showing an example of the hardware configuration of a multi-viewpoint image generating apparatus and its peripheral parts according to the first embodiment of the present invention, and FIG. 2 is a block diagram showing an example of the software configuration for generating the multi-viewpoint image.
  • The multi-viewpoint image generation device FGA is composed of an information processing device such as a server computer or a personal computer, to which cameras 61 to 6N and an input/output device 7 are connected.
  • the cameras 61 to 6N are distributed in, for example, an event venue where sports are held, and photograph the inside of the event venue from a plurality of viewpoints and output the image data. Note that the photographing directions of the cameras 61 to 6N are adjusted so as to point toward points (origins) preset in the event venue.
  • the input/output device 7 is, for example, a personal computer, a smartphone, or a tablet terminal.
  • The input/output device 7 is used to transmit information representing an arbitrary virtual viewpoint specified by the user to the multi-viewpoint image generation device FGA, and to receive and display the image data sent from the multi-viewpoint image generation device FGA.
  • The multi-viewpoint image generation apparatus FGA includes a control unit 1A using a hardware processor such as a central processing unit (CPU), to which a storage unit having a program storage section 2 and a data storage section 3, and an input/output interface (hereinafter abbreviated as I/F) section 4, are connected.
  • the input/output I/F section 4 has a communication interface function, and transmits and receives image data and input data to and from the cameras 61 to 6N and the input/output device 7 via signal cables or networks.
  • The program storage unit 2 is composed of, as storage media, a non-volatile memory such as an SSD (Solid State Drive) that can be written and read at any time and a non-volatile memory such as a ROM (Read Only Memory), and stores an OS (Operating System), other middleware, and the application programs necessary for executing the various control processes according to the first embodiment. Hereinafter, the OS and each application program will be collectively referred to as programs.
  • The data storage unit 3 is composed of, as storage media, for example, a combination of a non-volatile memory such as an SSD that can be written and read at any time and a volatile memory such as a RAM (Random Access Memory), and is provided with, as the main storage units necessary for implementing the first embodiment, a teacher data storage unit 31, a wavenumber domain data storage unit 32, a spherical harmonic expansion series data storage unit 33, and a generated image data storage unit 34.
  • the teacher data storage unit 31 is used to store image data sent from the cameras 61 to 6N and information representing the installation positions or shooting directions of the cameras 61 to 6N as teacher data.
  • the wavenumber domain data storage unit 32 is used to store image data converted into wavenumber domain data by the control unit 1A, which will be described later.
  • the spherical harmonic expansion series data storage unit 33 is used to store spherical harmonic series data expanded based on the image data in the wavenumber domain by the control unit 1A, which will be described later.
  • the generated image data storage unit 34 is used to store image data in the wavenumber domain corresponding to the virtual viewpoint generated by the control unit 1A, which will be described later.
  • The control unit 1A includes, as processing functions necessary for carrying out the first embodiment, a teacher data acquisition processing unit 11, a Fourier transform processing unit 12, a spherical harmonic series expansion processing unit 13, an inverse spherical harmonic transform processing unit 14, and an inverse Fourier transform processing unit 15.
  • These processing units 11 to 15 are realized by causing the hardware processor of the control unit 1A to execute the application programs stored in the program storage unit 2.
  • Part or all of the processing units 11 to 15 may be implemented using hardware such as an LSI (Large Scale Integration) or an ASIC (Application Specific Integrated Circuit).
  • The teacher data acquisition processing unit 11 receives the image data output from the cameras 61 to 6N through the input/output I/F unit 4 and stores the received image data in the teacher data storage unit 31 as teacher image data. Further, the teacher data acquisition processing unit 11 acquires camera attribute information representing the installation positions or shooting directions of the cameras 61 to 6N from the cameras 61 to 6N or the input/output device 7, and stores the acquired camera attribute information in the teacher data storage unit 31 as teacher direction data, in association with the image data of the cameras 61 to 6N.
  • The Fourier transform processing unit 12 transforms each image data of the cameras 61 to 6N stored in the teacher data storage unit 31 into image data in the wavenumber domain by Fourier transform processing, and stores each wavenumber-domain image data in the wavenumber domain data storage unit 32.
  • The spherical harmonic series expansion processing unit 13 expands each image data in the wavenumber domain stored in the wavenumber domain data storage unit 32 into a spherical harmonic series for each wavenumber, based on the teacher direction data stored in the teacher data storage unit 31. Then, the expanded spherical harmonic expansion series data is stored in the spherical harmonic expansion series data storage unit 33.
  • The inverse spherical harmonic transform processing unit 14 acquires, via the input/output I/F unit 4, data representing the shooting direction from the virtual viewpoint input from the input/output device 7. Then, the spherical harmonic expansion series data stored in the spherical harmonic expansion series data storage unit 33 is subjected to inverse spherical harmonic transformation to generate image data in the wavenumber domain corresponding to the shooting direction viewed from the virtual viewpoint, and the generated wavenumber-domain image data is stored in the generated image data storage unit 34.
  • The inverse Fourier transform processing unit 15 transforms the generated image data in the wavenumber domain stored in the generated image data storage unit 34 into generated image data in the spatial domain by inverse Fourier transform processing. Then, the converted spatial-domain generated image data is output from the input/output I/F section 4 to the input/output device 7.
  • FIG. 3 is a flow chart showing an example of the processing procedure and processing contents of the multi-viewpoint image generation process executed by the control unit 1A of the multi-viewpoint image generation apparatus FGA.
  • (1) Acquisition of teacher data: For example, assume that the cameras 61 to 6N photograph subjects in the event venue and transmit the image data.
  • Under the control of the teacher data acquisition processing unit 11, the control unit 1A of the multi-viewpoint image generation apparatus FGA receives, in step S12, the image data transmitted from the cameras 61 to 6N via the input/output I/F unit 4, and stores each of the received image data in the teacher data storage unit 31 as teacher image data.
  • Note that, instead, image data may be captured by the virtually arranged cameras 61 to 6N for an object that is virtually reproduced by computer graphics or the like, and that image data may be transmitted.
  • the teacher data acquisition processing unit 11 receives camera attribute information indicating the installation positions or shooting directions of the cameras 61 to 6N from the cameras 61 to 6N or the input/output device 7 via the input/output I/F 4. Then, the received camera attribute information is stored in the teacher data storage unit 31 as teacher direction data in association with the image data of each of the cameras 61 to 6N.
  • The teacher image data obtained by the cameras 61 to 6N are taken from a plurality of viewpoints that are equidistant from the origin of the object as the subject and lie in different directions (elevation angle, azimuth angle).
  • Hereinafter, the i-th teacher image data is represented as f_i(x, y) using image coordinates x, y, and the direction of the i-th camera viewed from the object is expressed as (θ_i, φ_i), where 1 ≤ i ≤ N.
  • In step S13, the control unit 1A of the multi-viewpoint image generation apparatus FGA, under the control of the Fourier transform processing unit 12, reads each teacher image data from the teacher data storage unit 31 and converts the teacher image data into image data in the wavenumber domain by Fourier transform processing.
  • The converted wavenumber-domain image data is stored in the wavenumber domain data storage unit 32.
  • Hereinafter, the image data in the wavenumber domain is expressed as F_i(k_x, k_y), where 1 ≤ i ≤ N.
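The transform of step S13 can be sketched with NumPy's 2-D FFT. The image stack and its sizes below are hypothetical stand-ins for the teacher images f_i(x, y); the point is only that each image is mapped losslessly to complex wavenumber-domain data F_i(k_x, k_y).

```python
import numpy as np

# Hypothetical sizes: N = 4 teacher images of 32x32 pixels.
N, H, W = 4, 32, 32
rng = np.random.default_rng(0)
teacher_images = rng.random((N, H, W))          # stand-ins for f_i(x, y)

# Step S13: Fourier-transform each teacher image into the wavenumber domain.
F = np.fft.fft2(teacher_images, axes=(-2, -1))  # F_i(k_x, k_y), complex-valued

# The transform is invertible, so no information is lost at this stage.
recovered = np.fft.ifft2(F, axes=(-2, -1)).real
```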
  • In step S14, the control unit 1A of the multi-viewpoint image generation apparatus FGA, under the control of the spherical harmonic series expansion processing unit 13, reads the wavenumber-domain image data F_i(k_x, k_y) from the wavenumber domain data storage unit 32 and expands F_i(k_x, k_y) into a spherical harmonic expansion series for each wavenumber component (k_x, k_y). Then, the obtained spherical harmonic expansion series data is stored in the spherical harmonic expansion series data storage unit 33.
  • The spherical harmonic expansion series data A_mn(k_x, k_y) can be calculated by numerical integration, for example as the projection A_mn(k_x, k_y) = ∫ F(k_x, k_y; θ, φ) Y*_mn(θ, φ) dΩ, where Y_mn denotes the spherical harmonic of degree n and order m and the integral is taken over all directions.
  • Alternatively, the spherical harmonic series expansion may be defined using the fact that its inverse transformation is represented by F(k_x, k_y; θ, φ) = Σ_{n=0}^{M} Σ_{m=-n}^{n} A_mn(k_x, k_y) Y_mn(θ, φ), where the series is truncated at order M.
  • That is, the matrix Y and the vector A are defined such that the i-th row of Y collects the basis values Y_mn(θ_i, φ_i) for the N teacher directions, and A stacks the coefficients A_mn(k_x, k_y).
  • Further, a vector F(k_x, k_y) is defined in which only the wavenumber (k_x, k_y) components of the N wavenumber-domain teacher images are arranged. The coefficients then satisfy F(k_x, k_y) = Y A, which can be solved, for example, by least squares.
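The relation F = Y A and its least-squares solution can be sketched as follows. To keep the example self-contained, the expansion is truncated at degree M = 1 so the complex spherical harmonics can be written in closed form; the number of cameras, their directions, and the coefficients are hypothetical.

```python
import numpy as np

def sh_basis(theta, phi):
    """Degree-1 complex spherical harmonics in closed form.
    Rows = directions; columns = [Y_00, Y_1-1, Y_10, Y_11]."""
    return np.array([
        0.5 * np.sqrt(1 / np.pi) * np.ones_like(theta) + 0j,
        0.5 * np.sqrt(3 / (2 * np.pi)) * np.sin(theta) * np.exp(-1j * phi),
        0.5 * np.sqrt(3 / np.pi) * np.cos(theta) + 0j,
        -0.5 * np.sqrt(3 / (2 * np.pi)) * np.sin(theta) * np.exp(1j * phi),
    ]).T

rng = np.random.default_rng(1)
N = 6                                        # hypothetical number of teacher directions
theta = rng.uniform(0.2, np.pi - 0.2, N)     # polar angles θ_i
phi = rng.uniform(0.0, 2 * np.pi, N)         # azimuth angles φ_i
Y = sh_basis(theta, phi)                     # the N x 4 matrix Y

# For one wavenumber component (k_x, k_y): synthesize the N teacher values F_i
# from known coefficients, then recover A by least squares (pseudo-inverse of Y).
A_true = rng.normal(size=4) + 1j * rng.normal(size=4)
F_vec = Y @ A_true                           # the vector F of (k_x, k_y) components
A_hat, *_ = np.linalg.lstsq(Y, F_vec, rcond=None)
```

In the actual method this fit is repeated independently for every wavenumber component (k_x, k_y).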
  • Next, the control unit 1A of the multi-viewpoint image generation apparatus FGA monitors virtual-viewpoint designation inputs in step S15 under the control of the inverse spherical harmonic transform processing unit 14.
  • Assume that, in this state, the input/output device 7 transmits data designating the shooting direction from a virtual viewpoint.
  • In step S16, the spherical harmonic expansion series data is read from the spherical harmonic expansion series data storage unit 33, and wavenumber-domain image data viewed from the virtual viewpoint is generated based on the data specifying the shooting direction from the virtual viewpoint and the read spherical harmonic expansion series data. Then, the generated wavenumber-domain image data viewed from the virtual viewpoint is stored in the generated image data storage unit 34.
  • Specifically, the inverse spherical harmonic transform processing unit 14 first calculates the basis vector y(θ̃, φ̃) = [Y_00(θ̃, φ̃), Y_1,-1(θ̃, φ̃), …, Y_MM(θ̃, φ̃)]^T, where (θ̃, φ̃) is the shooting direction from the virtual viewpoint.
  • Next, the inverse spherical harmonic transform processing unit 14 uses the basis vector to calculate the generated wavenumber-domain image data as F̃(k_x, k_y) = y(θ̃, φ̃)^T A(k_x, k_y).
  • The inverse spherical harmonic transform processing unit 14 performs the above calculation for all wavenumber components (k_x, k_y).
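The evaluation in step S16 for a single wavenumber component can be sketched as follows, again with a closed-form degree-1 basis so the example stays self-contained. The coefficient vector A and the virtual direction are hypothetical.

```python
import numpy as np

def sh_vec(theta, phi):
    """Basis vector y(θ, φ) = [Y_00, Y_1-1, Y_10, Y_11] at one direction
    (degree-1 complex spherical harmonics in closed form)."""
    return np.array([
        0.5 * np.sqrt(1 / np.pi) + 0j,
        0.5 * np.sqrt(3 / (2 * np.pi)) * np.sin(theta) * np.exp(-1j * phi),
        0.5 * np.sqrt(3 / np.pi) * np.cos(theta) + 0j,
        -0.5 * np.sqrt(3 / (2 * np.pi)) * np.sin(theta) * np.exp(1j * phi),
    ])

# Hypothetical expansion coefficients A(k_x, k_y) for one wavenumber component.
A = np.array([1.0 + 0.0j, 0.2 - 0.1j, -0.5 + 0.0j, -0.2 - 0.1j])

# Shooting direction from the virtual viewpoint, and the generated value.
theta_v, phi_v = np.pi / 3, np.pi / 4
F_gen = sh_vec(theta_v, phi_v) @ A      # one component of the wavenumber-domain image
```

Repeating this inner product over every (k_x, k_y) assembles the full wavenumber-domain image for the virtual viewpoint, which the inverse Fourier transform then turns into a spatial image.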
  • In step S17, the control unit 1A of the multi-viewpoint image generation apparatus FGA, under the control of the inverse Fourier transform processing unit 15, reads the wavenumber-domain image data F̃(k_x, k_y) generated for the virtual viewpoint from the generated image data storage unit 34 and transforms it into image data in the spatial domain by inverse Fourier transform processing.
  • the control unit 1A of the multi-viewpoint image generation apparatus FGA transmits the image data of the spatial region from the input/output I/F unit 4 to the input/output device 7.
  • the input/output device 7 displays the image data generated for the virtual viewpoint designated by the user.
  • As described above, in the first embodiment, teacher image data photographed from a plurality of viewpoints and attribute data indicating their shooting directions are acquired, and the teacher image data are converted into image data in the wavenumber domain by Fourier transform.
  • The wavenumber-domain image data are then expanded, for each wavenumber component, into spherical harmonic expansion series data.
  • When a shooting direction from a virtual viewpoint is designated, an inverse spherical harmonic calculation is performed to generate image data of the wavenumber domain viewed from the virtual viewpoint.
  • The generated image data is transformed into image data in the spatial domain by inverse Fourier transform and output to the input/output device 7.
  • FIG. 5 is a block diagram showing the software configuration of the multi-viewpoint image generation device FGB according to the second embodiment of the invention.
  • the same parts as in FIG. 2 are denoted by the same reference numerals, and detailed description thereof will be omitted.
  • the hardware configuration of the multi-viewpoint image generation device FGB is the same as that in FIG. 1, so the description is omitted.
  • The control unit 1B of the multi-viewpoint image generation device FGB includes, in addition to a teacher data acquisition processing unit 11, a Fourier transform processing unit 12, a spherical harmonic series expansion processing unit 13, an inverse spherical harmonic transform processing unit 14, and an inverse Fourier transform processing unit 15, a spherical harmonic expansion series optimization processing unit 16. These processing units 11 to 16 are realized by causing the hardware processor of the control unit 1B to execute the application programs stored in the program storage unit 2.
  • part or all of the processing units 11 to 16 may be implemented using hardware such as LSI and ASIC.
  • The spherical harmonic expansion series optimization processing unit 16 includes, for example, a multi-layer neural network constituting a postfilter, receives as input the spherical harmonic expansion series data output from the spherical harmonic series expansion processing unit 13, and outputs optimized spherical harmonic expansion series data.
  • the spherical harmonic expansion series optimization processing unit 16 stores the output optimized spherical harmonic expansion series data in the spherical harmonic expansion series data storage unit 33 .
  • In the second embodiment, the inverse spherical harmonic transform processing unit 14 acquires, via the input/output I/F unit 4, data representing the shooting direction from the virtual viewpoint input from the input/output device 7, and reads the optimized spherical harmonic expansion series data from the spherical harmonic expansion series data storage unit 33.
  • The optimized spherical harmonic expansion series data is then subjected to inverse spherical harmonic transformation to generate image data in the wavenumber domain corresponding to the shooting direction seen from the virtual viewpoint, and the generated wavenumber-domain image data is stored in the generated image data storage unit 34.
  • FIG. 6 is a flowchart showing an example of the processing procedure and processing contents of the multi-viewpoint image generation process executed by the control unit 1B of the multi-viewpoint image generation device FGB.
  • steps in which the same processes as those in FIG. 3 are performed are denoted by the same reference numerals, and detailed description thereof will be omitted.
  • The control unit 1B of the multi-viewpoint image generation device FGB, under the control of the spherical harmonic expansion series optimization processing unit 16, optimizes the spherical harmonic expansion series data in step S19.
  • That is, the spherical harmonic expansion series optimization processing unit 16 inputs the spherical harmonic expansion series data A_mn(k_x, k_y) to a multi-layer neural network and obtains the optimized spherical harmonic expansion series data Ã_mn(k_x, k_y) as its output.
  • The multi-layer neural network constitutes a postfilter that generates and outputs spherical harmonic expansion series data Ã_mn(k_x, k_y) minimizing the error between the generated image data and the target image data.
  • The spherical harmonic expansion series optimization processing unit 16 stores the output optimized spherical harmonic expansion series data Ã_mn(k_x, k_y) in the spherical harmonic expansion series data storage unit 33.
  • The spherical harmonic expansion series optimization processing unit 16 may, for example, repeat the optimization process a preset number of times.
  • FIG. 7 shows an example of its operation.
  • That is, the spherical harmonic expansion series optimization processing unit 16 inputs the spherical harmonic expansion series data A_mn(k_x, k_y) to the multi-layer neural network module 161 as shown in FIG. 7(a), then inputs the output again to the multi-layer neural network module 161 and optimizes it as shown in FIG. 7(b). After repeating this optimization process a preset number of times, the finally obtained optimized spherical harmonic expansion series data Ã_mn(k_x, k_y) is output and stored in the spherical harmonic expansion series data storage unit 33, as shown in FIG. 7(c).
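The patent realizes this refinement as a learned multi-layer neural network postfilter. As a neutral stand-in that exhibits the same repeated-refinement structure as FIG. 7, the sketch below iterates a plain gradient step that reduces the reconstruction error of hypothetical coefficients against hypothetical teacher values; the basis matrix Y, the targets F, and the step size are all assumptions, not the patent's trained network.

```python
import numpy as np

rng = np.random.default_rng(3)
N, B = 6, 4                        # teacher directions, number of basis functions
Y = rng.normal(size=(N, B))        # stand-in for the spherical harmonic basis matrix
F = rng.normal(size=N)             # stand-in wavenumber-domain teacher values

A = np.zeros(B)                    # initial coefficients A^(0)_mn
lr = 0.01
errors = []
for _ in range(300):               # repeat the refinement a preset number of times
    residual = Y @ A - F
    errors.append(float(residual @ residual))
    A -= lr * (Y.T @ residual)     # one refinement step on ||Y A - F||^2
```

Each pass plays the role of one application of the postfilter module 161: the coefficients are fed back in and come out closer to values whose reconstruction matches the target data.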
  • A method such as error backpropagation can be applied to learning the parameters of the multi-layer neural network that constitutes the spherical harmonic expansion series optimization processing unit 16.
  • Note that the spherical harmonic expansion series optimization processing unit 16 may output optimized spherical harmonic expansion series data Ã_mn(k_x, k_y) of an order higher than that of the input series.
  • In this case, the spherical harmonic expansion series optimization processing unit 16, for example, inputs initial spherical harmonic expansion series data A^(0)_mn(k_x, k_y) having the same order as the optimized spherical harmonic expansion series data Ã_mn(k_x, k_y) to the multi-layer neural network, instead of the spherical harmonic expansion series data A_mn(k_x, k_y) output from the spherical harmonic series expansion processing unit 13.
  • The initial spherical harmonic expansion series data A^(0)_mn(k_x, k_y) can be generated, for example, by initializing the higher-order terms not included in the spherical harmonic expansion series data A_mn(k_x, k_y) with zeros or random numbers.
  • After that, the control unit 1B of the multi-viewpoint image generation device FGB inputs the data designating the shooting direction from the virtual viewpoint, entered by the user at the input/output device 7, and the optimized spherical harmonic expansion series data Ã_mn(k_x, k_y) stored in the spherical harmonic expansion series data storage unit 33 to the inverse spherical harmonic transform processing unit 14, which generates image data of the wavenumber domain seen from the virtual viewpoint. This processing is the same as that described in the first embodiment.
  • As described above, in the second embodiment, the spherical harmonic expansion series data obtained by the spherical harmonic series expansion processing unit 13 is input to the spherical harmonic expansion series optimization processing unit 16, which performs optimization processing on it.
  • A spherical harmonic transform processing unit includes a basis vector calculation unit and a multi-layer neural network, and generates basis vectors of spherical harmonics corresponding to a designated shooting direction.
  • FIG. 8 is a block diagram showing the software configuration of the multi-viewpoint image generation device FGC according to the third embodiment of the invention.
  • the same parts as in FIG. 2 are denoted by the same reference numerals, and detailed description thereof will be omitted.
  • the hardware configuration of the multi-viewpoint image generation device FGC is the same as that of FIG. 1, so the description is omitted.
  • The control unit 1C of the multi-viewpoint image generation device FGC includes a teacher data acquisition processing unit 11 and a spherical harmonic transform processing unit 17. These processing units 11 and 17 are realized by causing the hardware processor of the control unit 1C to execute the application programs stored in the program storage unit 2.
  • part or all of the processing units 11 and 17 may be realized using hardware such as LSI and ASIC.
  • The teacher data acquisition processing unit 11 acquires the image data captured by the cameras 61 to 6N and camera attribute information representing the installation positions or shooting directions of the cameras 61 to 6N, and stores the acquired image data and camera attribute information in the teacher data storage unit 31 as teacher data.
  • the teacher data is used as data for learning the parameters of a multi-layer neural network of the spherical harmonic transform processing unit 17, which will be described later.
  • the spherical harmonic transform processing unit 17 includes, for example, a basis vector calculation processing unit and an image data generation processing unit using a multi-layer neural network.
  • The parameters of the multi-layer neural network are learned in advance, using the teacher data stored in the teacher data storage unit 31 as learning data, so that the network receives basis vectors of spherical harmonic functions as input and outputs the corresponding image data in the spatial domain.
  • the basis vector calculation processing unit acquires, via the input/output I/F unit 4, data representing the shooting direction from the virtual viewpoint input at the input/output device 7, and calculates the basis vectors of the spherical harmonic functions corresponding to the acquired shooting direction.
  • the multilayer neural network of the image data generation processing unit takes as input the basis vectors of the spherical harmonics generated by the basis vector calculation processing unit, transforms these basis vectors into image data of the spatial region corresponding to the shooting direction, and outputs the generated spatial-domain image data.
  • FIG. 9 is a flowchart showing an example of the processing procedure and processing contents of the multi-viewpoint image generation processing executed by the control unit 1C of the multi-viewpoint image generation device FGC.
  • steps in which the same processing as in FIG. 3 is performed are denoted by the same reference numerals, and detailed description thereof will be omitted.
  • the control unit 1C of the multi-viewpoint image generation device FGC reads the teacher image data and teacher direction data from the teacher data storage unit 31 and, under the control of the spherical harmonic transform processing unit 17, learns and stores the parameters of the multilayer neural network in step S20.
  • the spherical harmonic transform processing unit 17 generates spherical harmonic basis vectors corresponding to the teacher direction data read from the teacher data storage unit 31. That is, assuming the teacher direction data is (θ, φ), the basis vectors are first calculated according to the following equation.
  • the spherical harmonic transform processing unit 17 inputs the calculated basis vectors to the multi-layer neural network, and outputs image data of the corresponding spatial region from the multi-layer neural network.
  • a method such as error backpropagation, based on the error between the output spatial-domain image data and the teacher image data, can be applied to learning the parameters of the multi-layer neural network.
  • the spherical harmonics transformation processing unit 17 executes the spherical harmonics transformation processing as follows.
  • the spherical harmonic transform processing unit 17 first acquires, via the input/output I/F unit 4, data representing the shooting direction from the virtual viewpoint, which is input from the input/output device 7 in step S15. Then, in step S21, the spherical harmonic transform processing unit 17 calculates, using the basis vector calculation processing unit, the basis vectors of the spherical harmonic functions corresponding to the acquired shooting direction.
  • the spherical harmonic transform processing unit 17 causes the basis vector calculation processing unit to compute the basis vectors of the spherical harmonics in accordance with equation (1) above.
  • in step S22, the spherical harmonic transform processing unit 17 inputs the calculated basis vectors to the multi-layer neural network and obtains image data of the corresponding spatial region from the multi-layer neural network. Then, in step S18, the spherical harmonic transform processing unit 17 transmits the spatial-domain image data output from the multilayer neural network from the input/output I/F unit 4 to the input/output device 7.
  • in this way, the spherical harmonic transform processing unit 17 learns the parameters of a multi-layer neural network that maps spherical harmonic basis vectors to spatial-domain image data.
  • the fourth embodiment of the present invention is a further improvement of the third embodiment, wherein the multi-layer neural network included in the spherical harmonic transform processing unit is replaced with a first multilayer neural network that generates low-resolution image data and a second multilayer neural network that up-samples the low-resolution image data output from the first multilayer neural network and outputs high-resolution image data.
  • FIG. 10 is a block diagram showing the software configuration of the spherical harmonic transform processing section 170 of the multi-viewpoint image generating apparatus according to the fourth embodiment of the present invention.
  • the spherical harmonic transform processing unit 170 includes a basis vector calculation unit 171, a first multilayer neural network 172, and a second multilayer neural network 173.
  • the first multilayer neural network 172 receives the basis vectors output from the basis vector calculation unit 171 and outputs low-resolution image data.
  • the second multilayer neural network outputs high-resolution image data by upsampling the low-resolution image data output from the first multilayer neural network.
  • for convenience of explanation, the dimensions of the output image data are assumed to be (B, C, UW, UH).
  • the spherical harmonic transform processing unit 170 outputs B images of C channels (a monochrome image when C is 1, an RGB color image when C is 3), each of height UH and width UW.
  • U is a positive integer.
  • the basis vector calculation unit 171 first calculates the basis vectors from the teacher direction data or the shooting directions ( ⁇ , ⁇ ) from the virtual viewpoint according to formula (1).
  • the spherical harmonic transform processing unit 170 inputs the calculated basis vectors to the first multilayer neural network 172 to generate low-resolution image data.
  • the first multilayer neural network 172 may be configured with a fully connected layer as its first layer, such that the dimensions of the output low-resolution image data are (B, C, W, H).
  • the spherical harmonic transform processing unit 170 inputs the low-resolution image data output from the first multilayer neural network 172 to the second multilayer neural network 173, and outputs high-resolution image data.
  • the second multilayer neural network 173 takes low-resolution image data of dimensions (B, C, W, H) as input and outputs high-resolution image data of dimensions (B, C, UW, UH).
  • the spherical harmonic transform processing unit 170 outputs the high-resolution image data output from the second multilayer neural network 173 to the input/output device 7 via the input/output I/F unit 4.
  • the multi-layer neural network included in the spherical harmonic transform processing unit 170 thus consists of a first multilayer neural network 172 that generates low-resolution image data and, connected in tandem with it, a second multilayer neural network 173 that up-samples the low-resolution image data output from the first multilayer neural network 172 to increase its resolution. Therefore, model learning can be made efficient and the size of the learning model can be reduced.
  • the teacher image data is obtained from each of the cameras 61 to 6N.
  • the teacher image data may be temporarily stored in a storage server, database, or the like, and the multi-viewpoint image generation devices FGA, FGB, and FGC may collectively acquire the teacher image data from this storage server or database.
  • the multi-viewpoint image generation devices FGA, FGB, and FGC may also generate and output image data viewed from a plurality of designated virtual viewpoints.
  • a conversion model using, for example, a convolutional neural network may be prepared in advance, and the wavenumber-domain teacher image data and teacher direction data may be input to this conversion model, which may be configured to output data expanded into a spherical harmonic series.
  • similarly, a transformation model using a convolutional neural network may be prepared for the inverse spherical harmonic transform processing unit 14; spherical harmonic series data and data designating the shooting direction from the virtual viewpoint may be input to this transformation model, which may be configured to output image data in the wavenumber domain.
  • the configuration and installation location of the multi-viewpoint image generation device, the type of neural network, the processing procedure and processing contents of the multi-viewpoint image generation process, the type of object to be photographed, and the like can be variously modified within a range not departing from the gist of the present invention.
  • the present invention is not limited to the above-described embodiments as they are, and can be embodied by modifying the constituent elements without departing from the gist of the invention at the implementation stage.
  • various inventions can be formed by appropriate combinations of the plurality of constituent elements disclosed in the above embodiments. For example, some components may be deleted from all the components shown in each embodiment. Furthermore, constituent elements of different embodiments may be combined as appropriate.
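The basis vector calculation that recurs throughout the third and fourth embodiments can be sketched concretely. Equation (1) is not reproduced in this text, so the sketch below assumes the standard real spherical harmonics, truncated at degree 2, evaluated for a shooting direction (θ, φ); the truncation degree is an assumption for illustration.

```python
import numpy as np

def sh_basis_deg2(theta, phi):
    """Real spherical harmonic basis vector up to degree 2 (9 components)
    for a direction given by polar angle theta (from the z-axis) and
    azimuth phi. Constants are the standard real-SH normalization factors."""
    x = np.sin(theta) * np.cos(phi)
    y = np.sin(theta) * np.sin(phi)
    z = np.cos(theta)
    return np.array([
        0.282095,                   # Y_0^0
        0.488603 * y,               # Y_1^-1
        0.488603 * z,               # Y_1^0
        0.488603 * x,               # Y_1^1
        1.092548 * x * y,           # Y_2^-2
        1.092548 * y * z,           # Y_2^-1
        0.315392 * (3 * z**2 - 1),  # Y_2^0
        1.092548 * x * z,           # Y_2^1
        0.546274 * (x**2 - y**2),   # Y_2^2
    ])

b = sh_basis_deg2(theta=0.0, phi=0.0)  # direction along the z-axis
```

In the third embodiment this 9-dimensional vector (or a higher-degree version of it) would be the input to the multilayer neural network; in the fourth embodiment it would feed the first multilayer neural network 172.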

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Processing (AREA)

Abstract

One aspect of the present invention involves transforming teacher images photographed from a plurality of viewpoints into wavenumber-domain teacher images, expanding the wavenumber-domain teacher images into a spherical harmonic series for each wavenumber component, generating a wavenumber-domain image corresponding to a designated virtual viewpoint different from the plurality of viewpoints on the basis of the spherical harmonic series and information specifying the direction of photography from that virtual viewpoint, and transforming the generated wavenumber-domain image into a generated image in the spatial domain.

Description

Multi-viewpoint image generation device, method, and program
One aspect of the present invention relates to a multi-viewpoint image generation device, method, and program that take as input images of an object placed in space, captured from a plurality of different viewpoints, and generate an image captured from a viewpoint not included in those images.
A technique is known in which the three-dimensional shape of a subject is estimated from images captured by a plurality of cameras, and this three-dimensional shape is used to synthesize an image captured from an arbitrary direction not covered by those cameras. For example, Patent Document 1 describes a technique in which the silhouette of a subject is extracted from a plurality of images, a three-dimensional shape represented by voxels is estimated from the silhouette, and this three-dimensional shape is photographed from an arbitrary viewpoint using a virtual camera, thereby synthesizing an image captured from that arbitrary direction. This type of technology makes it possible to present images viewed from various viewpoints in, for example, live sports broadcasts, and is therefore attracting attention as an important elemental technology in the fields of content production and sports science.
Japanese Patent No. 5686412
However, since the technique described in Patent Document 1 represents the three-dimensional shape of the subject using voxels, the amount of image data representing the three-dimensional shape becomes large. This increases the amount of computation required to process the three-dimensional image and requires a large-capacity memory for the computation. This tendency becomes more pronounced as the resolution of the three-dimensional image increases, resulting in an increased processing load and processing time for the image generation device.
The present invention has been made in view of the above circumstances, and aims to provide a technique that can generate an image corresponding to an arbitrary viewpoint without using a three-dimensional image with a large amount of data, thereby reducing the processing load and processing time associated with image processing.
In order to solve the above problems, a first aspect of the multi-viewpoint image generation device or method according to the present invention transforms teacher images captured from a plurality of viewpoints into teacher images in the wavenumber domain, expands the wavenumber-domain teacher images into a spherical harmonic series for each wavenumber component, generates a wavenumber-domain image corresponding to an arbitrary virtual viewpoint different from the plurality of viewpoints on the basis of the spherical harmonic series and information specifying the shooting direction from the virtual viewpoint, and transforms the generated wavenumber-domain image into a generated image in the spatial domain.
According to the first aspect of the present invention, teacher images captured from a plurality of viewpoints are expanded into a spherical harmonic series for each wavenumber component, and an image corresponding to an arbitrary virtual viewpoint is generated based on this spherical harmonic series. Therefore, compared with, for example, representing the three-dimensional shape of the subject using voxels, an image viewed from a virtual viewpoint can be generated with a smaller amount of data, which makes it possible to reduce the processing load and processing time associated with image processing.
A second aspect of the present invention further includes a spherical harmonic expansion series optimization processing unit that receives as input the spherical harmonic expansion series obtained by the spherical harmonic series expansion processing unit and outputs an optimized spherical harmonic expansion series.
According to the second aspect of the present invention, it is possible to suppress deterioration in the accuracy of the generated multi-viewpoint image data and to generate highly accurate multi-viewpoint image data.
A third aspect of the present invention is a multi-viewpoint image generation device that uses spherical harmonic functions to generate an image corresponding to a shooting direction from an arbitrary virtual viewpoint, the device comprising a spherical harmonic transform processing unit that includes a basis vector calculation processing unit that, when information specifying the shooting direction from the arbitrary virtual viewpoint is input, calculates the basis vectors of the spherical harmonic functions corresponding to the acquired shooting direction, and an image generation processing unit that takes the calculated basis vectors as input and generates and outputs an image in the spatial domain corresponding to the shooting direction.
According to the third aspect of the present invention, even when, for example, the teacher image data and teacher direction data are so large in scale that it is difficult to calculate the spherical harmonic expansion series data directly, learning a model corresponding to the spherical harmonic expansion series and the subsequent refinement processing makes it possible to suppress deterioration in the accuracy of the generated multi-viewpoint image data and to generate highly accurate multi-viewpoint image data.
In a fourth aspect of the present invention, the image generation processing unit comprises a first neural network that takes the basis vectors as input and generates and outputs a first image having a first resolution, and a second neural network that takes the first image output from the first neural network as input, generates a second image having a second resolution higher than the first resolution, and outputs the generated second image as an image in the spatial domain corresponding to the shooting direction.
According to the fourth aspect of the present invention, model learning can be made more efficient and the size of the learning model can be reduced.
That is, according to each aspect of the present invention, it is possible to provide a technique that can generate an image corresponding to an arbitrary viewpoint without using a three-dimensional image with a large amount of data, thereby reducing the processing load and processing time associated with image processing.
FIG. 1 is a block diagram showing an example of the hardware configuration of a multi-viewpoint image generation device according to the first embodiment of the invention.
FIG. 2 is a block diagram showing an example of the software configuration of the multi-viewpoint image generation device according to the first embodiment of the invention.
FIG. 3 is a flowchart showing an example of the processing procedure and processing contents of the multi-viewpoint image generation process executed by the control unit of the multi-viewpoint image generation device shown in FIG. 2.
FIG. 4 is a diagram used for explaining the spherical harmonic series.
FIG. 5 is a block diagram showing an example of the software configuration of the multi-viewpoint image generation device according to the second embodiment of the invention.
FIG. 6 is a flowchart showing an example of the processing procedure and processing contents of the multi-viewpoint image generation process executed by the control unit of the multi-viewpoint image generation device shown in FIG. 5.
FIG. 7 is a diagram for explaining an operation example of the spherical harmonic expansion series optimization processing unit provided in the multi-viewpoint image generation device shown in FIG. 5.
FIG. 8 is a block diagram showing an example of the software configuration of the multi-viewpoint image generation device according to the third embodiment of the invention.
FIG. 9 is a flowchart showing an example of the processing procedure and processing contents of the multi-viewpoint image generation process executed by the control unit of the multi-viewpoint image generation device shown in FIG. 8.
FIG. 10 is a block diagram showing an example of the software configuration of the spherical harmonic transform processing unit of the multi-viewpoint image generation device according to the fourth embodiment of the present invention.
Hereinafter, embodiments according to the present invention will be described with reference to the drawings.
[First embodiment]
(Configuration example)
FIG. 1 is a block diagram showing an example of the hardware configuration of a multi-viewpoint image generation device and its peripherals according to the first embodiment of the present invention, and FIG. 2 is a block diagram showing an example of the software configuration of the multi-viewpoint image generation device.
The multi-viewpoint image generation device FGA comprises an information processing device such as a server computer or a personal computer, and a plurality of cameras 61 to 6N and an input/output device 7 are connected to the multi-viewpoint image generation device FGA via signal cables or a network.
The cameras 61 to 6N are distributed, for example, in an event venue where sports or the like take place, photograph the inside of the venue from a plurality of viewpoints, and output the image data. The shooting directions of the cameras 61 to 6N are adjusted so that each points toward a point (origin) preset in the event venue.
The input/output device 7 is, for example, a personal computer, a smartphone, or a tablet terminal. In the first embodiment, the input/output device 7 is used to transmit information representing an arbitrary virtual viewpoint designated by the user to the multi-viewpoint image generation device FGA, and to receive and display image data sent from the multi-viewpoint image generation device FGA.
The multi-viewpoint image generation device FGA includes a control unit 1A using a hardware processor such as a central processing unit (CPU); a storage unit having a program storage section 2 and a data storage section 3, and an input/output interface (hereinafter abbreviated as I/F) section 4 are connected to the control unit 1A via a bus 5.
The input/output I/F section 4 has a communication interface function and transmits and receives image data and input data to and from the cameras 61 to 6N and the input/output device 7 via signal cables or a network.
The program storage unit 2 is configured, for example, by combining a non-volatile memory that can be written and read at any time, such as an SSD (Solid State Drive), with a non-volatile memory such as a ROM (Read Only Memory) as storage media, and stores, in addition to middleware such as an OS (Operating System), the application programs necessary for executing the various control processes according to the first embodiment. Hereinafter, the OS and the application programs are collectively referred to as programs.
The data storage unit 3 is configured, for example, by combining a non-volatile memory that can be written and read at any time, such as an SSD, with a volatile memory such as a RAM (Random Access Memory) as storage media, and includes, as the main storage sections necessary for implementing the first embodiment, a teacher data storage unit 31, a wavenumber domain data storage unit 32, a spherical harmonic expansion series data storage unit 33, and a generated image data storage unit 34.
The teacher data storage unit 31 is used to store image data sent from the cameras 61 to 6N and information representing the installation positions or shooting directions of the cameras 61 to 6N as teacher data.
The wavenumber domain data storage unit 32 is used to store image data converted into wavenumber-domain data by the control unit 1A, which will be described later.
The spherical harmonic expansion series data storage unit 33 is used to store the spherical harmonic series data expanded from the wavenumber-domain image data by the control unit 1A, which will be described later.
The generated image data storage unit 34 is used to store image data in the wavenumber domain corresponding to a virtual viewpoint, generated by the control unit 1A, which will be described later.
The control unit 1A includes, as processing functions necessary for implementing the first embodiment, a teacher data acquisition processing unit 11, a Fourier transform processing unit 12, a spherical harmonic series expansion processing unit 13, an inverse spherical harmonic transform processing unit 14, and an inverse Fourier transform processing unit 15. These processing units 11 to 15 are all realized by causing the hardware processor of the control unit 1A to execute application programs stored in the program storage unit 2.
Some or all of the processing units 11 to 15 may be implemented using hardware such as an LSI (Large Scale Integration) or an ASIC (Application Specific Integrated Circuit).
The teacher data acquisition processing unit 11 receives the image data output from the cameras 61 to 6N via the input/output I/F unit 4 and stores each piece of received image data in the teacher data storage unit 31 as teacher image data. The teacher data acquisition processing unit 11 also acquires camera attribute information representing the installation positions or shooting directions of the cameras 61 to 6N from the cameras 61 to 6N or the input/output device 7, and stores the acquired camera attribute information in the teacher data storage unit 31 as teacher direction data, in association with the image data of the cameras 61 to 6N.
The Fourier transform processing unit 12 transforms each piece of image data of the cameras 61 to 6N stored in the teacher data storage unit 31 into wavenumber-domain image data by Fourier transform processing, and stores each piece of transformed wavenumber-domain image data in the wavenumber domain data storage unit 32.
The spherical harmonic series expansion processing unit 13 expands each piece of wavenumber-domain image data stored in the wavenumber domain data storage unit 32 into a spherical harmonic series for each wavenumber, based on the teacher direction data stored in the teacher data storage unit 31, and stores the expanded spherical harmonic expansion series data in the spherical harmonic expansion series data storage unit 33.
The inverse spherical harmonic transform processing unit 14 acquires, via the input/output I/F unit 4, data representing the shooting direction from the virtual viewpoint input at the input/output device 7. It then applies an inverse spherical harmonic transform to the spherical harmonic expansion series data stored in the spherical harmonic expansion series data storage unit 33 to generate wavenumber-domain image data corresponding to the shooting direction viewed from the virtual viewpoint, and stores the generated wavenumber-domain image data in the generated image data storage unit 34.
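The expansion performed by the spherical harmonic series expansion processing unit 13 and the inverse transform performed by the inverse spherical harmonic transform processing unit 14 can be sketched numerically for a single wavenumber component. The degree-1 truncation and the least-squares fitting procedure are assumptions for illustration; the patent text does not fix either:

```python
import numpy as np

def basis_deg1(theta, phi):
    # Real spherical harmonics up to degree 1 (4 components).
    x = np.sin(theta) * np.cos(phi)
    y = np.sin(theta) * np.sin(phi)
    z = np.cos(theta)
    return np.array([0.282095, 0.488603 * y, 0.488603 * z, 0.488603 * x])

rng = np.random.default_rng(0)
# N camera directions (theta_i, phi_i) and the value of one wavenumber
# component observed at each direction; here synthesized from known
# coefficients so the fit can be checked.
N = 20
thetas = rng.uniform(0.1, np.pi - 0.1, N)
phis = rng.uniform(0.0, 2 * np.pi, N)
true_coeffs = np.array([1.0, -0.5, 0.3, 0.8])
A = np.stack([basis_deg1(t, p) for t, p in zip(thetas, phis)])  # (N, 4)
values = A @ true_coeffs

# Unit 13: expand into a spherical harmonic series (least-squares fit
# of the coefficients over the N observed directions).
coeffs, *_ = np.linalg.lstsq(A, values, rcond=None)

# Unit 14: inverse transform, i.e. evaluate the fitted series at the
# shooting direction of a virtual viewpoint.
virtual = basis_deg1(0.7, 1.3) @ coeffs
```

In the actual device this fit would be repeated per wavenumber component (kx, ky), giving one coefficient vector per component.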
The inverse Fourier transform processing unit 15 transforms the generated wavenumber-domain image data stored in the generated image data storage unit 34 into spatial-domain generated image data by inverse Fourier transform processing, and outputs the transformed spatial-domain generated image data from the input/output I/F unit 4 to the input/output device 7.
(Operation example)
Next, an operation example of the multi-viewpoint image generation device FGA configured as described above will be described.
FIG. 3 is a flowchart showing an example of the processing procedure and processing contents of the multi-viewpoint image generation process executed by the control unit 1A of the multi-viewpoint image generation device FGA.
(1) Acquisition of teacher data
For example, suppose that at an event venue the cameras 61 to 6N photograph subjects in the venue and transmit the image data. Upon receiving the transmission request for the image data in step S11, the control unit 1A of the multi-viewpoint image generation device FGA, under the control of the teacher data acquisition processing unit 11, receives in step S12 the image data transmitted from the cameras 61 to 6N via the input/output I/F unit 4, and stores each piece of received image data in the teacher data storage unit 31 as teacher image data.
Of course, image data may also be captured and transmitted using virtually arranged cameras 61 to 6N for an object virtually reproduced by computer graphics or the like.
 At the same time, the teacher data acquisition processing unit 11 receives camera attribute information indicating the installation positions or shooting directions of the cameras 61 to 6N from the cameras 61 to 6N or the input/output device 7 via the input/output I/F unit 4, and stores the received camera attribute information in the teacher data storage unit 31 as teacher direction data in association with the image data of each of the cameras 61 to 6N.
 Here, the teacher image data obtained by the cameras 61 to 6N are captured from a plurality of viewpoints that are equidistant from the object serving as the subject, taken as the origin, and that lie in different directions (elevation angle, azimuth angle). For example, defining the spherical model shown in FIG. 4, the teacher image data is expressed as f_i(x, y) using coordinates x, y, and the direction of each camera viewed from the object is expressed as (θ_i, φ_i), where 1 ≤ i ≤ N.
 (2) Fourier transform
 When the teacher image data has been acquired, the control unit 1A of the multi-viewpoint image generation apparatus FGA, under the control of the Fourier transform processing unit 12, reads each piece of teacher image data from the teacher data storage unit 31 in step S13, transforms the teacher image data into image data in the wavenumber domain by Fourier transform processing, and stores the transformed wavenumber-domain image data in the wavenumber domain data storage unit 32. The wavenumber-domain image data is expressed as F_i(k_x, k_y), where 1 ≤ i ≤ N.
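The transform of step S13 can be sketched as follows. This is an illustrative NumPy sketch, not part of the embodiment: the array names and sizes (N = 4 teacher views of 8 × 8 pixels) are assumptions.

```python
import numpy as np

# Hypothetical stand-ins for the teacher images f_i(x, y) read from the
# teacher data storage unit 31: N views of size H x W.
N, H, W = 4, 8, 8
rng = np.random.default_rng(0)
teacher_images = rng.random((N, H, W))          # f_i(x, y), 1 <= i <= N

# Step S13: transform each teacher image into the wavenumber domain.
# F_i(k_x, k_y) is complex-valued.
F = np.fft.fft2(teacher_images, axes=(-2, -1))  # shape (N, H, W)

# The transform is lossless: the inverse FFT recovers f_i(x, y).
recovered = np.fft.ifft2(F, axes=(-2, -1)).real
assert np.allclose(recovered, teacher_images)
```

Each of the N images yields one complex spectrum; in the following steps the N values at a fixed wavenumber (k_x, k_y) are treated together as samples over the camera directions.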
 (3) Conversion to a spherical harmonic expansion series
 Next, in step S14, under the control of the spherical harmonic series expansion processing unit 13, the control unit 1A of the multi-viewpoint image generation apparatus FGA reads the wavenumber-domain image data F_i(k_x, k_y) from the wavenumber domain data storage unit 32, converts the read image data F_i(k_x, k_y) into a spherical harmonic expansion series for each wavenumber component (k_x, k_y), and stores the obtained spherical harmonic expansion series data in the spherical harmonic expansion series data storage unit 33.
 For example, the spherical harmonic expansion series data can be calculated by numerical integration as
 [Math. 1]
 Alternatively, the spherical harmonic series expansion may be defined by using the fact that its inverse transform is expressed by the following equation, where the truncation order of the spherical harmonic series expansion is M.
 [Math. 2]
 Here, the matrix Y and the vector A are defined by the following equations, respectively.
 [Math. 3]
 Further, a vector is defined by arranging only the components of wavenumber (k_x, k_y) from the wavenumber-domain teacher image data. This vector is expressed as
 [Math. 4]
 Using these, the spherical harmonic series expansion can also be computed as
 [Math. 5]
 where Y+ is the pseudo-inverse of the matrix Y.
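The pseudo-inverse computation A = Y+F described above can be sketched as follows for a single wavenumber component. As an assumption for the sketch, the basis is truncated at degree 1 and written out explicitly as a stand-in for the matrix Y, and the coefficients A_true are synthetic so that the recovery can be checked.

```python
import numpy as np

def sh_basis(theta, phi):
    """Complex spherical harmonics up to degree 1, ordered
    Y_0^0, Y_1^-1, Y_1^0, Y_1^1 -- a small explicit stand-in for the
    truncated basis (truncation order M = 1)."""
    theta = np.asarray(theta, dtype=float)
    phi = np.asarray(phi, dtype=float)
    c1 = 0.5 * np.sqrt(1.5 / np.pi)
    return np.stack([
        np.full(theta.shape, 0.5 / np.sqrt(np.pi)) + 0j,
        c1 * np.sin(theta) * np.exp(-1j * phi),
        0.5 * np.sqrt(3.0 / np.pi) * np.cos(theta) + 0j,
        -c1 * np.sin(theta) * np.exp(1j * phi),
    ], axis=-1)

rng = np.random.default_rng(1)
N = 12                                    # number of camera directions
theta = rng.uniform(0.2, np.pi - 0.2, N)  # elevation angles theta_i
phi = rng.uniform(0.0, 2 * np.pi, N)      # azimuth angles phi_i

Y = sh_basis(theta, phi)                  # matrix Y, shape (N, 4)

# Synthetic ground-truth coefficients A_mn for one wavenumber (k_x, k_y),
# and the corresponding vector of observed values F_i(k_x, k_y).
A_true = rng.normal(size=4) + 1j * rng.normal(size=4)
F_vec = Y @ A_true

# A = Y^+ F, with Y^+ the Moore-Penrose pseudo-inverse of Y.
A = np.linalg.pinv(Y) @ F_vec
assert np.allclose(A, A_true)
```

With N camera directions and a full-column-rank Y, the pseudo-inverse coincides with the least-squares solution, which is why the synthetic coefficients are recovered exactly here.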
 (4) Generation of image data corresponding to a virtual viewpoint
 Under the control of the spherical harmonic inverse transform processing unit 14, the control unit 1A of the multi-viewpoint image generation apparatus FGA monitors designation input of a virtual viewpoint in step S15. In this state, when the user designates an arbitrary virtual viewpoint on the input/output device 7, for example, the input/output device 7 transmits data designating the shooting direction from that virtual viewpoint.
 Upon receiving the data designating the shooting direction from the virtual viewpoint, the spherical harmonic inverse transform processing unit 14 reads the spherical harmonic expansion series data from the spherical harmonic expansion series data storage unit 33 in step S16, generates wavenumber-domain image data viewed from the virtual viewpoint on the basis of the data designating the shooting direction from the virtual viewpoint and the read spherical harmonic expansion series data, and stores the generated wavenumber-domain image data viewed from the virtual viewpoint in the generated image data storage unit 34.
 For example, letting (θ^, φ^) be the shooting direction from the virtual viewpoint, the spherical harmonic inverse transform processing unit 14 first calculates a basis vector according to the following equation.
 [Math. 6]
 Next, using the basis vector, the spherical harmonic inverse transform processing unit 14 calculates the wavenumber-domain generated image data F^(k_x, k_y) by
 [Math. 7]
 The spherical harmonic inverse transform processing unit 14 performs the above calculation for every wavenumber component (k_x, k_y).
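The synthesis of step S16 can be sketched as follows: the basis vector for the virtual direction (θ^, φ^) is evaluated once and contracted against the coefficient vectors of all wavenumber components at once. The degree-1 basis and the synthetic coefficient grid are assumptions for the sketch; as a check, the virtual direction is set to a real camera direction, where the synthesized spectrum must match that camera's data.

```python
import numpy as np

def sh_basis(theta, phi):
    """Complex spherical harmonics up to degree 1, ordered
    Y_0^0, Y_1^-1, Y_1^0, Y_1^1 -- a small explicit stand-in for the
    truncated basis (truncation order M = 1)."""
    theta = np.asarray(theta, dtype=float)
    phi = np.asarray(phi, dtype=float)
    c1 = 0.5 * np.sqrt(1.5 / np.pi)
    return np.stack([
        np.full(theta.shape, 0.5 / np.sqrt(np.pi)) + 0j,
        c1 * np.sin(theta) * np.exp(-1j * phi),
        0.5 * np.sqrt(3.0 / np.pi) * np.cos(theta) + 0j,
        -c1 * np.sin(theta) * np.exp(1j * phi),
    ], axis=-1)

rng = np.random.default_rng(2)
N, H, W, K = 8, 4, 4, 4                   # illustrative sizes
theta = rng.uniform(0.3, np.pi - 0.3, N)  # camera directions theta_i
phi = rng.uniform(0.0, 2 * np.pi, N)      # camera directions phi_i
Y = sh_basis(theta, phi)                  # (N, K)

# One coefficient vector A_mn per wavenumber component on an (H, W) grid.
A = rng.normal(size=(H, W, K)) + 1j * rng.normal(size=(H, W, K))

# Wavenumber-domain views implied by A: F_i = sum_mn A_mn * Y_mn(dir_i).
F_train = np.einsum('hwk,nk->nhw', A, Y)

# Step S16: basis vector for the virtual viewpoint, then synthesis of
# F^(k_x, k_y) for every wavenumber component at once.
theta_hat, phi_hat = theta[0], phi[0]     # reuse camera 1's direction as a check
y_hat = sh_basis(theta_hat, phi_hat)      # basis vector, shape (K,)
F_hat = np.einsum('hwk,k->hw', A, y_hat)

# At a real camera direction the synthesized view matches that camera's data.
assert np.allclose(F_hat, F_train[0])
```

For a genuinely new direction, the same contraction interpolates between the camera views through the smooth spherical harmonic basis.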
 (5) Conversion into spatial-domain image data
 Finally, under the control of the inverse Fourier transform processing unit 15, the control unit 1A of the multi-viewpoint image generation apparatus FGA reads, in step S17, the wavenumber-domain image data F^(k_x, k_y) viewed from the virtual viewpoint from the generated image data storage unit 34, and transforms the read wavenumber-domain image data F^(k_x, k_y) into spatial-domain image data by inverse Fourier transform processing.
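The inverse transform of step S17 amounts to a 2-D inverse FFT. A minimal sketch, with the input spectrum built from a known image so the round trip can be checked:

```python
import numpy as np

# A wavenumber-domain generated image F^(k_x, k_y); here it is obtained by
# transforming a known spatial image so that the round trip is verifiable.
rng = np.random.default_rng(3)
spatial = rng.random((8, 8))
F_hat = np.fft.fft2(spatial)

# Step S17: inverse Fourier transform back to the spatial domain.  A spectrum
# synthesized from spherical harmonic coefficients is not exactly
# conjugate-symmetric in general, so the small imaginary residue is discarded.
generated = np.fft.ifft2(F_hat).real
assert np.allclose(generated, spatial)
```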
 The control unit 1A of the multi-viewpoint image generation apparatus FGA transmits the spatial-domain image data from the input/output I/F unit 4 to the input/output device 7.
 The input/output device 7 thus displays the image data generated for the virtual viewpoint designated by the user.
 (Operation and effects)
 As described above, in the first embodiment, teacher image data photographed from a plurality of viewpoints and attribute data indicating their shooting directions are acquired, the teacher image data are transformed into wavenumber-domain image data by Fourier transform, and the result is then converted into spherical harmonic expansion series data for each wavenumber component. On the basis of the spherical harmonic expansion series data and information indicating the shooting direction from a virtual viewpoint designated by the user, an inverse spherical harmonic operation is performed to generate wavenumber-domain image data viewed from the virtual viewpoint, and the generated image data is transformed into spatial-domain image data by inverse Fourier transform and output to the input/output device 7.
 Accordingly, compared with, for example, representing the three-dimensional shape of a subject using voxels, image data viewed from a virtual viewpoint can be generated from a smaller amount of data, which makes it possible to reduce the processing load and shorten the processing time of the image processing in the multi-viewpoint image generation apparatus FGA.
 [Second embodiment]
 In the second embodiment of the present invention, a post-filter learned in advance is applied to the data converted into the spherical harmonic expansion series, thereby generating spherical harmonic expansion series data that minimizes the error between the generated image data and the target image data, and the generated spherical harmonic expansion series data is subjected to the inverse transform processing.
 (Configuration example)
 FIG. 5 is a block diagram showing the software configuration of a multi-viewpoint image generation apparatus FGB according to the second embodiment of the present invention.
 In FIG. 5, the same parts as in FIG. 2 are denoted by the same reference numerals, and detailed description thereof is omitted. The hardware configuration of the multi-viewpoint image generation apparatus FGB is the same as in FIG. 1, and its description is also omitted.
 The control unit 1B of the multi-viewpoint image generation apparatus FGB includes, in addition to the teacher data acquisition processing unit 11, the Fourier transform processing unit 12, the spherical harmonic series expansion processing unit 13, the spherical harmonic inverse transform processing unit 14, and the inverse Fourier transform processing unit 15, a spherical harmonic expansion series optimization processing unit 16. Each of these processing units 11 to 16 is realized by causing the hardware processor of the control unit 1B to execute an application program stored in the program storage unit 2.
 Also in this second embodiment, some or all of the processing units 11 to 16 may be realized using hardware such as an LSI or an ASIC.
 The spherical harmonic expansion series optimization processing unit 16 includes, for example, a multilayer neural network constituting a post-filter, takes as input the spherical harmonic expansion series data output from the spherical harmonic series expansion processing unit 13, and outputs optimized spherical harmonic expansion series data. The spherical harmonic expansion series optimization processing unit 16 stores the output optimized spherical harmonic expansion series data in the spherical harmonic expansion series data storage unit 33.
 The spherical harmonic inverse transform processing unit 14 acquires, via the input/output I/F unit 4, the data representing the shooting direction from the virtual viewpoint input on the input/output device 7, and also reads the optimized spherical harmonic expansion series data from the spherical harmonic expansion series data storage unit 33. It then applies the inverse spherical harmonic transform to the optimized spherical harmonic expansion series data, thereby generating wavenumber-domain image data corresponding to the shooting direction viewed from the virtual viewpoint, and stores the generated wavenumber-domain image data in the generated image data storage unit 34.
 (Operation example)
 Next, an operation example of the multi-viewpoint image generation apparatus FGB configured as described above will be described.
 FIG. 6 is a flowchart showing an example of the processing procedure and processing contents of the multi-viewpoint image generation processing executed by the control unit 1B of the multi-viewpoint image generation apparatus FGB. In FIG. 6, steps performing the same processing as in FIG. 3 are denoted by the same reference numerals, and detailed description thereof is omitted.
 When the spherical harmonic expansion series data is output from the spherical harmonic series expansion processing unit 13, the control unit 1B of the multi-viewpoint image generation apparatus FGB optimizes the spherical harmonic expansion series data in step S19 under the control of the spherical harmonic expansion series optimization processing unit 16.
 For example, the spherical harmonic expansion series optimization processing unit 16 uses a multilayer neural network: it inputs the spherical harmonic expansion series data A_mn(k_x, k_y) output from the spherical harmonic series expansion processing unit 13 into the multilayer neural network, which outputs optimized spherical harmonic expansion series data A~_mn(k_x, k_y). Specifically, the multilayer neural network constitutes a post-filter that generates and outputs spherical harmonic expansion series data A~_mn(k_x, k_y) minimizing the error between the generated image data and the target image data.
 The spherical harmonic expansion series optimization processing unit 16 then stores the output optimized spherical harmonic expansion series data A~_mn(k_x, k_y) in the spherical harmonic expansion series data storage unit 33.
 When optimizing the spherical harmonic expansion series data A_mn(k_x, k_y), the spherical harmonic expansion series optimization processing unit 16 may, for example, repeat the optimization processing a preset number of times. FIG. 7 shows an example of this operation.
 That is, the spherical harmonic expansion series optimization processing unit 16 inputs the spherical harmonic expansion series data A_mn(k_x, k_y) into the multilayer neural network module 161 for optimization as shown in FIG. 7(a), and then inputs the output into the multilayer neural network module 161 again for optimization as shown in FIG. 7(b). After repeating this optimization processing the preset number of times, it outputs the finally obtained optimized spherical harmonic expansion series data A~_mn(k_x, k_y) as shown in FIG. 7(c) and stores it in the spherical harmonic expansion series data storage unit 33.
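The repeated application of the module 161 in FIG. 7 can be sketched as a fixed number of passes through the same function. The linear "post-filter" below is a toy stand-in purely for illustration; an actual module 161 would be a learned multilayer network.

```python
import numpy as np

def postfilter(coeffs, weights):
    """Illustrative stand-in for the multilayer neural net module 161:
    a single linear pass over the coefficient vector (a real module would
    be a learned network, e.g. trained by error backpropagation)."""
    return weights @ coeffs

def optimize_coefficients(coeffs, weights, n_iter=3):
    """Apply the same post-filter module a preset number of times, as in
    FIG. 7(a)-(c)."""
    for _ in range(n_iter):
        coeffs = postfilter(coeffs, weights)
    return coeffs

rng = np.random.default_rng(4)
K = 4
A = rng.normal(size=K)              # coefficients A_mn for one wavenumber
Wt = np.eye(K) * 0.5                # toy weights: shrink by half per pass
A_opt = optimize_coefficients(A, Wt, n_iter=3)

# Three passes of x0.5 give an overall factor of 0.125.
assert np.allclose(A_opt, A * 0.125)
```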
 A method such as error backpropagation can be applied to learning the parameters of the multilayer neural network constituting the spherical harmonic expansion series optimization processing unit 16.
 The spherical harmonic expansion series optimization processing unit 16 may also output optimized spherical harmonic expansion series data A~_mn(k_x, k_y) having a higher order than the input spherical harmonic expansion series data A_mn(k_x, k_y).
 In this case, the spherical harmonic expansion series optimization processing unit 16 inputs into the multilayer neural network, for example, initial spherical harmonic expansion series data A(0)_mn(k_x, k_y) having the same order as the optimized spherical harmonic expansion series data A~_mn(k_x, k_y), instead of the spherical harmonic expansion series data A_mn(k_x, k_y) output from the spherical harmonic series expansion processing unit 13.
 The initial spherical harmonic expansion series data A(0)_mn(k_x, k_y) can be generated, for example, by initializing with zeros or random numbers the higher-order terms of the spherical harmonic expansion series that are not included in the spherical harmonic expansion series data A_mn(k_x, k_y).
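The zero-initialization of the higher-order terms can be sketched as follows, using the fact that there are (M + 1)² spherical harmonics up to degree M. The orders M_in = 1 and M_out = 3 are assumptions for the sketch.

```python
import numpy as np

# Low-order coefficients A_mn produced by the expansion processing unit 13:
# degrees 0..M_in give (M_in + 1)^2 coefficients in total.
M_in, M_out = 1, 3
A_low = np.arange((M_in + 1) ** 2, dtype=complex)    # 4 coefficients

# Initial data A(0)_mn at the higher target order: keep the known low-order
# terms and zero-initialize the terms of degrees M_in+1 .. M_out.
A_init = np.zeros((M_out + 1) ** 2, dtype=complex)   # 16 coefficients
A_init[: A_low.size] = A_low

assert A_init.size == 16
assert np.allclose(A_init[:4], A_low) and np.all(A_init[4:] == 0)
```

Random initialization of the slice `A_init[4:]` would work the same way; either choice only seeds the network input, which the post-filter then refines.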
 Next, the control unit 1B of the multi-viewpoint image generation apparatus FGB inputs the data designating the shooting direction from the virtual viewpoint entered by the user on the input/output device 7, together with the optimized spherical harmonic expansion series data A~_mn(k_x, k_y) stored in the spherical harmonic expansion series data storage unit 33, into the spherical harmonic inverse transform processing unit 14, which generates wavenumber-domain image data viewed from the virtual viewpoint. This processing is the same as that described in the first embodiment.
 (Operation and effects)
 As described above, in the second embodiment, the spherical harmonic expansion series data obtained by the spherical harmonic series expansion processing unit 13 is input into the spherical harmonic expansion series optimization processing unit 16 and optimized there.
 Accordingly, even when, for example, the number of training images is insufficient for the desired order of the spherical harmonic expansion series and the accuracy of the generated multi-viewpoint image data would otherwise be expected to deteriorate, optimizing the spherical harmonic expansion series data in the spherical harmonic expansion series optimization processing unit 16 suppresses this deterioration and makes it possible to generate highly accurate multi-viewpoint image data.
 [Third embodiment]
 In the third embodiment of the present invention, a spherical harmonic transform processing unit includes a basis vector calculation unit and a multilayer neural network; a basis vector of spherical harmonics corresponding to a designated shooting direction is generated, and the generated basis vector is input into the multilayer neural network, which has been trained in advance, thereby generating and outputting image data viewed from that shooting direction.
 (Configuration example)
 FIG. 8 is a block diagram showing the software configuration of a multi-viewpoint image generation apparatus FGC according to the third embodiment of the present invention.
 In FIG. 8, the same parts as in FIG. 2 are denoted by the same reference numerals, and detailed description thereof is omitted. The hardware configuration of the multi-viewpoint image generation apparatus FGC is the same as in FIG. 1, and its description is also omitted.
 The control unit 1C of the multi-viewpoint image generation apparatus FGC includes a teacher data acquisition processing unit 11 and a spherical harmonic transform processing unit 17. Both of these processing units 11 and 17 are realized by causing the hardware processor of the control unit 1C to execute an application program stored in the program storage unit 2.
 Also in this third embodiment, some or all of the processing units 11 and 17 may be realized using hardware such as an LSI or an ASIC.
 The teacher data acquisition processing unit 11 acquires the image data captured by the cameras 61 to 6N and camera attribute information representing the installation positions or shooting directions of the cameras 61 to 6N, and stores the acquired image data and camera attribute information in the teacher data storage unit 31 as teacher data. The teacher data is used for learning the parameters of the multilayer neural network of the spherical harmonic transform processing unit 17, which will be described later.
 The spherical harmonic transform processing unit 17 includes, for example, a basis vector calculation processing unit and an image data generation processing unit using a multilayer neural network. The parameters of the multilayer neural network are learned in advance, using the teacher data stored in the teacher data storage unit 31 as learning data, so that the network takes a basis vector of spherical harmonics as input and outputs the corresponding spatial-domain image data.
 The basis vector calculation processing unit acquires, via the input/output I/F unit 4, the data representing the shooting direction from the virtual viewpoint input on the input/output device 7, and calculates the basis vector of spherical harmonics corresponding to the acquired shooting direction.
 The multilayer neural network of the image data generation processing unit takes as input the basis vector of spherical harmonics generated by the basis vector calculation processing unit, converts this basis vector into spatial-domain image data corresponding to the shooting direction, and outputs the converted spatial-domain generated image data.
 (Operation example)
 Next, an operation example of the multi-viewpoint image generation apparatus FGC configured as described above will be described.
 FIG. 9 is a flowchart showing an example of the processing procedure and processing contents of the multi-viewpoint image generation processing executed by the control unit 1C of the multi-viewpoint image generation apparatus FGC. In FIG. 9, steps performing the same processing as in FIG. 3 are denoted by the same reference numerals, and detailed description thereof is omitted.
 (1) Learning of the spherical harmonic transform processing unit 17
 In the learning phase, the control unit 1C of the multi-viewpoint image generation apparatus FGC first reads each piece of teacher image data and teacher direction data from the teacher data storage unit 31, and, under the control of the spherical harmonic transform processing unit 17, learns and stores the parameters of the multilayer neural network in step S20.
 For example, the spherical harmonic transform processing unit 17 generates the spherical harmonic basis vector corresponding to the teacher direction data read from the teacher data storage unit 31. That is, letting the teacher direction data be (θ^, φ^), the basis vector is first calculated according to the following equation (referred to below as equation (1)).
 [Math. 8]
 Next, the spherical harmonic transform processing unit 17 inputs the calculated basis vector into the multilayer neural network, which outputs the corresponding spatial-domain image data.
 A method such as error backpropagation based on the error between the output spatial-domain image data and the teacher image data can be applied to learning the parameters of the multilayer neural network.
 (2) Spherical harmonic transform
 When the learning of the multilayer neural network is completed, the spherical harmonic transform processing unit 17 executes the spherical harmonic transform processing as follows.
 That is, in step S15, the spherical harmonic transform processing unit 17 first acquires, via the input/output I/F unit 4, the data representing the shooting direction from the virtual viewpoint input from the input/output device 7. Then, in step S21, the basis vector calculation processing unit of the spherical harmonic transform processing unit 17 calculates the basis vector of spherical harmonics corresponding to the acquired shooting direction.
 For example, when (θ^, φ^) is designated as the shooting direction from the virtual viewpoint, the basis vector calculation processing unit of the spherical harmonic transform processing unit 17 calculates the basis vector of spherical harmonics according to equation (1) above.
 Next, in step S22, the spherical harmonic transform processing unit 17 inputs the calculated basis vector into the multilayer neural network, which outputs the corresponding spatial-domain image data. The spherical harmonic transform processing unit 17 then transmits, in step S18, the spatial-domain image data output from the multilayer neural network from the input/output I/F unit 4 to the input/output device 7.
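As a minimal, checkable stand-in for the learning of step S20 and the inference of steps S21 and S22, the sketch below replaces the multilayer network with a single linear layer fitted in closed form by least squares. The real-valued degree-1 basis, all array sizes, and the hidden linear "teacher" model are assumptions made only so that the fit can be verified; an actual implementation would train a deeper network by backpropagation.

```python
import numpy as np

def sh_basis(theta, phi):
    """Real spherical-harmonic basis up to degree 1 -- a small stand-in for
    the basis vector of equation (1)."""
    theta = np.asarray(theta, dtype=float)
    phi = np.asarray(phi, dtype=float)
    c = np.sqrt(3.0 / (4.0 * np.pi))
    return np.stack([
        np.full(theta.shape, 0.5 / np.sqrt(np.pi)),
        c * np.sin(theta) * np.cos(phi),
        c * np.cos(theta),
        c * np.sin(theta) * np.sin(phi),
    ], axis=-1)

rng = np.random.default_rng(5)
N, H, W, K = 10, 4, 4, 4
theta = rng.uniform(0.3, np.pi - 0.3, N)   # teacher directions theta_i
phi = rng.uniform(0.0, 2 * np.pi, N)       # teacher directions phi_i
Y = sh_basis(theta, phi)                   # (N, K) basis vectors

# Teacher images produced by a hidden linear model, so the fit is checkable.
W_true = rng.normal(size=(K, H * W))
teacher_images = Y @ W_true                # (N, H*W)

# Step S20 stand-in: closed-form least-squares fit, basis vector -> image.
W_fit, *_ = np.linalg.lstsq(Y, teacher_images, rcond=None)

# Steps S21/S22: basis vector for a virtual direction, then image generation.
y_hat = sh_basis(theta[0], phi[0])
generated = (y_hat @ W_fit).reshape(H, W)
assert np.allclose(generated, teacher_images[0].reshape(H, W))
```

The closed-form fit plays the role of training; swapping `W_fit` for a learned multilayer mapping recovers the structure of the embodiment.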
 (Operation and effects)
 As described above, in the third embodiment, instead of the calculation of the spherical harmonic expansion series data in the spherical harmonic series expansion processing unit 13 described in the first and second embodiments, the parameters of a multilayer neural network are learned in the spherical harmonic transform processing unit 17.
 Accordingly, even when, for example, the teacher image data and teacher direction data are so large in scale that direct calculation of the spherical harmonic expansion series data is difficult, learning a model corresponding to the spherical harmonic expansion series and the subsequent accuracy-improving processing suppresses deterioration in the accuracy of the generated multi-viewpoint image data and makes it possible to generate highly accurate multi-viewpoint image data.
 [Fourth embodiment]
 The fourth embodiment of the present invention is a further improvement of the third embodiment, in which the multilayer neural network of the spherical harmonic transform processing unit is composed of a first multilayer neural network that generates low-resolution image data, and a second multilayer neural network that upsamples the low-resolution image data output from the first multilayer neural network and outputs high-resolution image data.
 (Configuration example)
 FIG. 10 is a block diagram showing the software configuration of a spherical harmonic transform processing unit 170 of a multi-viewpoint image generation apparatus according to the fourth embodiment of the present invention.
 球面調和関数変換処理部170は、基底ベクトル算出部171と、第1の多層ニューラルネットワーク172と、第2の多層ニューラルネットワーク173とを備えている。 The spherical harmonic transform processing unit 170 includes a basis vector calculation unit 171, a first multilayer neural network 172, and a second multilayer neural network 173.
 第1の多層ニューラルネットワーク172は、基底ベクトル算出部171から出力される基底ベクトルを入力とし、低解像度の画像データを出力する。 The first multilayer neural network 172 receives the basis vectors output from the basis vector calculation unit 171 and outputs low-resolution image data.
 The second multilayer neural network 173 upsamples the low-resolution image data output from the first multilayer neural network 172 and outputs high-resolution image data.
 (Operation Example)
 Next, the operation of the spherical harmonic transform processing unit 170 will be described.
 Here, for convenience of explanation, the output image data is assumed to have dimensions (B, C, UW, UH). In this case, the spherical harmonic transform processing unit 170 outputs B images of C channels (a monochrome image when C is "1", an RGB color image when C is "3") with height UH and width UW. Furthermore, U is a positive integer.
 First, in the spherical harmonic transform processing unit 170, the basis vector calculation unit 171 calculates basis vectors from the teacher direction data, or from the shooting direction (θ^, φ^) of the virtual viewpoint, according to equation (1).
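Equation (1) is not reproduced in this excerpt, so the following is only a hedged sketch of what a basis vector calculation for a direction (θ, φ) might look like, using the real spherical harmonics up to degree 1; the maximum degree and the normalization actually used by the basis vector calculation unit 171 are assumptions.

```python
import numpy as np

def sh_basis(theta, phi):
    """Real spherical harmonic basis for a direction (theta, phi).

    theta: polar angle, phi: azimuth. Only degrees 0 and 1 are shown;
    the degree actually used by equation (1) is not disclosed here.
    """
    return np.array([
        0.5 * np.sqrt(1.0 / np.pi),                                # Y_0^0
        np.sqrt(3.0 / (4 * np.pi)) * np.sin(theta) * np.sin(phi),  # Y_1^-1
        np.sqrt(3.0 / (4 * np.pi)) * np.cos(theta),                # Y_1^0
        np.sqrt(3.0 / (4 * np.pi)) * np.sin(theta) * np.cos(phi),  # Y_1^1
    ])

b = sh_basis(np.pi / 2, 0.0)  # a direction on the equator, phi = 0
```

The resulting vector b is what would be fed to the first multilayer neural network 172 as its input.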
 Next, the spherical harmonic transform processing unit 170 inputs the calculated basis vectors to the first multilayer neural network 172 to generate low-resolution image data. The first multilayer neural network 172 may be configured in any way, except that its first layer is a fully connected layer and the output low-resolution image data has dimensions (B, C, W, H).
 Subsequently, the spherical harmonic transform processing unit 170 inputs the low-resolution image data output from the first multilayer neural network 172 to the second multilayer neural network 173, which outputs high-resolution image data. The second multilayer neural network 173 may be configured in any way, except that the input low-resolution image data has dimensions (B, C, W, H) and the output high-resolution image data has dimensions (B, C, UW, UH).
 Finally, the spherical harmonic transform processing unit 170 outputs the high-resolution image data output from the second multilayer neural network 173 to the input/output device 7 via the input/output I/F unit 4.
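The two-stage shape contract described above can be sketched as follows. Since the patent fixes only the tensor shapes, not the layers, the fully connected stand-in for network 172 and the nearest-neighbour upsampling stand-in for network 173 are assumptions, as are the toy sizes B, C, W, H, U, and D.

```python
import numpy as np

# Assumed toy sizes; only the shapes (B, C, W, H) -> (B, C, UW, UH) matter.
B, C, W, H, U = 2, 3, 4, 4, 2
D = 4  # length of the spherical harmonic basis vector (assumption)

rng = np.random.default_rng(0)

def first_network(basis):
    """Stand-in for network 172: a single fully connected layer whose
    output is reshaped into low-resolution images of shape (B, C, W, H)."""
    weights = rng.standard_normal((D, B * C * W * H))
    return np.tanh(basis @ weights).reshape(B, C, W, H)

def second_network(low_res):
    """Stand-in for network 173: nearest-neighbour upsampling by a
    factor U along both spatial axes, giving shape (B, C, UW, UH)."""
    return low_res.repeat(U, axis=2).repeat(U, axis=3)

basis = rng.standard_normal(D)      # output of basis vector unit 171
low = first_network(basis)          # (B, C, W, H)
high = second_network(low)          # (B, C, U*W, U*H)
```

In practice the second stage would be a learned super-resolution network rather than plain nearest-neighbour repetition; only the shape contract is taken from the text.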
 (Operation and Effects)
 In the fourth embodiment of the present invention, the multilayer neural network included in the spherical harmonic transform processing unit 170 is configured by connecting in tandem a first multilayer neural network 172 that generates low-resolution image data and a second multilayer neural network 173 that upsamples the low-resolution image data output from the first multilayer neural network 172 to increase its resolution. This makes model training more efficient and reduces the size of the trained model.
 [Other Embodiments]
 (1) In each of the above embodiments, the case where the teacher image data is acquired from the plurality of cameras 61 to 6N has been described. However, the present invention is not limited to this; for example, teacher image data captured from a plurality of different viewpoints may be acquired sequentially using a single camera, or the teacher image data captured from the plurality of viewpoints may be temporarily accumulated in a storage server, database, or the like, from which the multi-viewpoint image generation devices FGA, FGB, and FGC acquire the teacher image data collectively.
 (2) In each of the above embodiments, the case where only one shooting direction from a virtual viewpoint is designated has been described as an example. However, shooting directions corresponding to a plurality of virtual viewpoints may be designated collectively, and the multi-viewpoint image generation devices FGA, FGB, and FGC may each generate and output image data viewed from the plurality of designated virtual viewpoints.
 (3) As for the spherical harmonic series expansion processing unit 13, a conversion model using, for example, a convolutional neural network may be created and prepared in advance, and the unit may be configured so that teacher image data in the wavenumber domain and the teacher directions are input to this conversion model, which outputs the data expanded into a spherical harmonic series. Similarly, the inverse spherical harmonic transform processing unit 14 may also be configured with a conversion model using a convolutional neural network, to which the spherical harmonic series data and data designating the shooting direction from the virtual viewpoint are input, and which outputs image data in the wavenumber domain.
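For contrast with the neural conversion models just mentioned, the direct spherical harmonic series expansion of one wavenumber component over the teacher directions can be sketched as an ordinary least-squares fit of the expansion coefficients; the degree-1 basis, the solver, and the synthetic data below are assumptions for illustration, not the patent's actual formulas.

```python
import numpy as np

def basis_row(theta, phi):
    # Real spherical harmonics up to degree 1 (assumed basis).
    return np.array([
        0.5 * np.sqrt(1.0 / np.pi),
        np.sqrt(3.0 / (4 * np.pi)) * np.sin(theta) * np.sin(phi),
        np.sqrt(3.0 / (4 * np.pi)) * np.cos(theta),
        np.sqrt(3.0 / (4 * np.pi)) * np.sin(theta) * np.cos(phi),
    ])

rng = np.random.default_rng(1)
thetas = rng.uniform(0.1, np.pi - 0.1, 50)  # teacher directions
phis = rng.uniform(0.0, 2 * np.pi, 50)
A = np.stack([basis_row(t, p) for t, p in zip(thetas, phis)])

# Synthetic "observed" values of one wavenumber component per direction.
true_coeffs = np.array([1.0, -0.5, 0.25, 2.0])
values = A @ true_coeffs

# Expansion step: solve for the spherical harmonic coefficients.
coeffs, *_ = np.linalg.lstsq(A, values, rcond=None)

# Inverse step (role of unit 14): evaluate the series at a virtual
# viewpoint's direction to synthesize that component.
predicted = basis_row(0.3, 1.2) @ coeffs
```

With noiseless data the fit recovers the coefficients exactly; a learned conversion model replaces this fit when the data volume makes the direct computation impractical.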
 (4) In addition, the configuration and installation location of the multi-viewpoint image generation device, the type of neural network, the processing procedure and content of the multi-viewpoint image generation processing, the type of object serving as the subject, and the like may also be variously modified without departing from the gist of the present invention.
 Although the embodiments of the present invention have been described in detail above, the foregoing description is in all respects merely illustrative of the present invention. Needless to say, various improvements and modifications can be made without departing from the scope of the present invention. That is, in carrying out the present invention, a specific configuration according to each embodiment may be adopted as appropriate.
 In short, the present invention is not limited to the above embodiments as they are, and at the implementation stage the constituent elements can be modified and embodied without departing from the gist of the invention. Furthermore, various inventions can be formed by appropriately combining the plurality of constituent elements disclosed in the above embodiments. For example, some constituent elements may be deleted from all the constituent elements shown in each embodiment. Moreover, constituent elements of different embodiments may be combined as appropriate.
 FGA, FGB, FGC … Multi-viewpoint image generation device
 1A, 1B, 1C … Control unit
 2 … Program storage unit
 3 … Data storage unit
 4 … Input/output I/F unit
 5 … Bus
 6 … Camera
 7 … Input/output device
 11 … Teacher data acquisition processing unit
 12 … Fourier transform processing unit
 13 … Spherical harmonic series expansion processing unit
 14 … Inverse spherical harmonic transform processing unit
 15 … Inverse Fourier transform processing unit
 16 … Spherical harmonic expansion series optimization processing unit
 17, 170 … Spherical harmonic transform processing unit
 31 … Teacher data storage unit
 32 … Wavenumber domain data storage unit
 33 … Spherical harmonic expansion series data storage unit
 34 … Generated image data storage unit
 161 … Multilayer neural network module
 171 … Basis vector calculation unit
 172, 173 … Multilayer neural network

Claims (10)

  1.  A multi-viewpoint image generation device comprising:
     a Fourier transform processing unit that transforms teacher images captured from a plurality of viewpoints into teacher images in the wavenumber domain;
     a spherical harmonic series expansion processing unit that expands the teacher images in the wavenumber domain into a spherical harmonic series for each wavenumber component;
     an inverse spherical harmonic transform processing unit that generates a generated image in the wavenumber domain corresponding to an arbitrary virtual viewpoint different from the plurality of viewpoints, based on the spherical harmonic expansion series obtained by the spherical harmonic series expansion processing unit and information designating a shooting direction from the virtual viewpoint; and
     an inverse Fourier transform processing unit that transforms the generated image in the wavenumber domain into a generated image in the spatial domain.
  2.  The multi-viewpoint image generation device according to claim 1, further comprising a spherical harmonic expansion series optimization processing unit that receives as input the spherical harmonic expansion series obtained by the spherical harmonic series expansion processing unit and outputs an optimized spherical harmonic expansion series.
  3.  The multi-viewpoint image generation device according to claim 1 or 2, wherein the spherical harmonic series expansion processing unit expands the teacher images in the wavenumber domain into the spherical harmonic series for each wavenumber component, with the shooting directions from the plurality of viewpoints as teacher directions.
  4.  The multi-viewpoint image generation device according to claim 1 or 2, further comprising: a teacher data acquisition processing unit that acquires the teacher images captured from the plurality of viewpoints from an external camera or database; and a transmission processing unit that transmits the generated image in the spatial domain, transformed by the inverse Fourier transform processing unit, to the device from which the shooting direction from the virtual viewpoint was designated.
  5.  A multi-viewpoint image generation device that generates, using spherical harmonics, an image corresponding to a shooting direction from an arbitrary virtual viewpoint, the device comprising:
     a spherical harmonic transform processing unit, wherein
     the spherical harmonic transform processing unit comprises:
      a basis vector calculation processing unit that, when information designating the shooting direction from the arbitrary virtual viewpoint is input, calculates basis vectors of spherical harmonics corresponding to the acquired shooting direction; and
      an image generation processing unit that receives the calculated basis vectors as input and generates and outputs an image in the spatial domain corresponding to the shooting direction.
  6.  The multi-viewpoint image generation device according to claim 5, wherein the image generation processing unit comprises:
      a first neural network that receives the basis vectors as input and generates and outputs a first image having a first resolution; and
      a second neural network that receives the first image output from the first neural network as input, generates a second image having a second resolution higher than the first resolution, and outputs the generated second image as the image in the spatial domain corresponding to the shooting direction.
  7.  A multi-viewpoint image generation method executed by an information processing device, the method comprising:
     a first processing step of transforming teacher images captured from a plurality of viewpoints into teacher images in the wavenumber domain;
     a second processing step of expanding the teacher images in the wavenumber domain into a spherical harmonic series for each wavenumber component;
     a third processing step of generating a generated image in the wavenumber domain corresponding to an arbitrary virtual viewpoint different from the plurality of viewpoints, based on the spherical harmonic expansion series obtained in the second processing step and information designating a shooting direction from the virtual viewpoint; and
     a fourth processing step of transforming the generated image in the wavenumber domain into a generated image in the spatial domain.
  8.  The multi-viewpoint image generation method according to claim 7, further comprising a fifth processing step of receiving as input the spherical harmonic expansion series obtained in the second processing step and outputting an optimized spherical harmonic expansion series.
  9.  A multi-viewpoint image generation method in which an information processing device executes processing for generating, using spherical harmonics, an image corresponding to a shooting direction from an arbitrary virtual viewpoint, the method comprising:
     a step of calculating, when information designating the shooting direction from the arbitrary virtual viewpoint is input, basis vectors of spherical harmonics corresponding to the acquired shooting direction; and
     a step of generating and outputting, with the calculated basis vectors as input, an image in the spatial domain corresponding to the shooting direction.
  10.  A program that causes a processor included in the multi-viewpoint image generation device according to any one of claims 1 to 6 to execute the processing performed by each of the processing units included in the multi-viewpoint image generation device.
PCT/JP2022/006981 2021-11-24 2022-02-21 Multi-viewpoint image generation device, method, and program WO2023095353A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/JP2022/043195 WO2023095792A1 (en) 2021-11-24 2022-11-22 Multi-viewpoint image generation device, method, and program

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JPPCT/JP2021/043015 2021-11-24
PCT/JP2021/043015 WO2023095212A1 (en) 2021-11-24 2021-11-24 Multi-viewpoint image generation device, method, and program

Publications (1)

Publication Number Publication Date
WO2023095353A1 true WO2023095353A1 (en) 2023-06-01

Family

ID=86539084

Family Applications (3)

Application Number Title Priority Date Filing Date
PCT/JP2021/043015 WO2023095212A1 (en) 2021-11-24 2021-11-24 Multi-viewpoint image generation device, method, and program
PCT/JP2022/006981 WO2023095353A1 (en) 2021-11-24 2022-02-21 Multi-viewpoint image generation device, method, and program
PCT/JP2022/043195 WO2023095792A1 (en) 2021-11-24 2022-11-22 Multi-viewpoint image generation device, method, and program

Family Applications Before (1)

Application Number Title Priority Date Filing Date
PCT/JP2021/043015 WO2023095212A1 (en) 2021-11-24 2021-11-24 Multi-viewpoint image generation device, method, and program

Family Applications After (1)

Application Number Title Priority Date Filing Date
PCT/JP2022/043195 WO2023095792A1 (en) 2021-11-24 2022-11-22 Multi-viewpoint image generation device, method, and program

Country Status (1)

Country Link
WO (3) WO2023095212A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH06162173A (en) * 1992-11-20 1994-06-10 Mitsubishi Electric Corp Three-dimensional body recognizing device
JP2007128009A (en) * 2005-11-07 2007-05-24 Research Organization Of Information & Systems Imaging device and imaging method using out-of-focus structure
JP2010176325A (en) * 2009-01-28 2010-08-12 Ntt Docomo Inc Device and method for generating optional viewpoint image
JP2017199235A (en) * 2016-04-28 2017-11-02 株式会社朋栄 Focus correction processing method by learning type algorithm

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI303791B (en) * 2002-03-21 2008-12-01 Microsoft Corp Graphics image rendering with radiance self-transfer for low-frequency lighting environments

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH06162173A (en) * 1992-11-20 1994-06-10 Mitsubishi Electric Corp Three-dimensional body recognizing device
JP2007128009A (en) * 2005-11-07 2007-05-24 Research Organization Of Information & Systems Imaging device and imaging method using out-of-focus structure
JP2010176325A (en) * 2009-01-28 2010-08-12 Ntt Docomo Inc Device and method for generating optional viewpoint image
JP2017199235A (en) * 2016-04-28 2017-11-02 株式会社朋栄 Focus correction processing method by learning type algorithm

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
AKIRA KUBOTA, KAZUYA KODAMA, YOSHINORI HATORI: "A view interpolation method without depth estimation and its stability analysis for generating a center view image using multiple images of a circular camera array", IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, DENSHI JOUHOU TSUUSHIN GAKKAI, JOUHOU SHISUTEMU SOSAIETI, JP, vol. J90-D, no. 4, 1 April 2007 (2007-04-01), JP , pages 1063 - 1072, XP009546089, ISSN: 1880-4535 *
IZAWA IPPEITA, SHUN NONOSHITA, KAZUYA KODAMA, TAKAYUK HAMAMOTO: "FPGA-based Real-Time Free Viewpoint Image Reconstruction from 3-D Multi-focus Imaging Sequences", IEICE TECHNICAL REPORT IE2010-50(2010-07), vol. 34, no. 32, 26 July 2010 (2010-07-26), pages 7 - 12, XP093068639 *

Also Published As

Publication number Publication date
WO2023095212A1 (en) 2023-06-01
WO2023095792A1 (en) 2023-06-01

Similar Documents

Publication Publication Date Title
US10311547B2 (en) Image upscaling system, training method thereof, and image upscaling method
CN111386550A (en) Unsupervised learning of image depth and ego-motion predictive neural networks
US20180137611A1 (en) Novel View Synthesis Using Deep Convolutional Neural Networks
US20200098144A1 (en) Transforming grayscale images into color images using deep neural networks
CN114549731A (en) Method and device for generating visual angle image, electronic equipment and storage medium
JP7355851B2 (en) Method and apparatus for identifying videos
WO2020146911A2 (en) Multi-stage multi-reference bootstrapping for video super-resolution
JP2022522564A (en) Image processing methods and their devices, computer equipment and computer programs
CN115690382B (en) Training method of deep learning model, and method and device for generating panorama
CN112887728A (en) Electronic device, control method and system of electronic device
WO2020092051A1 (en) Rolling shutter rectification in images/videos using convolutional neural networks with applications to sfm/slam with rolling shutter images/videos
CN111429501A (en) Depth map prediction model generation method and device and depth map prediction method and device
TWI834814B (en) Method and system for providing rotational invariant neural networks
CN114640885B (en) Video frame inserting method, training device and electronic equipment
CN115375838A (en) Binocular gray image three-dimensional reconstruction method based on unmanned aerial vehicle
JP6521352B2 (en) Information presentation system and terminal
KR20210109244A (en) Device and Method for Image Style Transfer
WO2023095353A1 (en) Multi-viewpoint image generation device, method, and program
JP7378500B2 (en) Autoregressive video generation neural network
CN115375780A (en) Color difference calculation method and device, electronic equipment, storage medium and product
CN110830848B (en) Image interpolation method, image interpolation device, computer equipment and storage medium
CN114820745A (en) Monocular visual depth estimation system, method, computer device, and computer-readable storage medium
JP6892557B2 (en) Learning device, image generator, learning method, image generation method and program
JP6315542B2 (en) Image generating apparatus and image generating method
CN112465716A (en) Image conversion method and device, computer readable storage medium and electronic equipment

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22898139

Country of ref document: EP

Kind code of ref document: A1