WO2023095353A1 - Multi-viewpoint image generation device, method, and program - Google Patents

Multi-viewpoint image generation device, method, and program

Publication number: WO2023095353A1
Authority: WIPO (PCT)
Prior art keywords: spherical harmonic, processing unit, image, viewpoint, series
Application number: PCT/JP2022/006981
Other languages: French (fr), Japanese (ja)
Inventor: Kimitaka Tsutsumi (公孝 堤)
Original assignee: Nippon Telegraph and Telephone Corporation (日本電信電話株式会社)
Application filed by Nippon Telegraph and Telephone Corporation; priority to PCT/JP2022/043195 (WO2023095792A1)
Publication of WO2023095353A1


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 1/00 General purpose image data processing
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 1/00 Scanning, transmission or reproduction of documents or the like, e.g. facsimile transmission; Details thereof
    • H04N 1/387 Composing, repositioning or otherwise geometrically modifying originals
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 13/00 Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N 13/10 Processing, recording or transmission of stereoscopic or multi-view image signals
    • H04N 13/106 Processing image signals
    • H04N 13/111 Transformation of image signals corresponding to virtual viewpoints, e.g. spatial image interpolation

Definitions

  • One aspect of the present invention relates to a multi-viewpoint image generation apparatus, method, and program that take as input images of an object arranged in space, captured from a plurality of different viewpoints, and generate an image as seen from a viewpoint not included among those images.
  • A known technique estimates the three-dimensional shape of a subject from images taken by a plurality of cameras, and uses this three-dimensional shape to synthesize images as seen from arbitrary directions not covered by those cameras.
  • For example, a technique is described in which a silhouette of a subject is extracted from a plurality of images, a three-dimensional shape represented by voxels is estimated from the silhouette, and this three-dimensional shape is photographed from an arbitrary viewpoint by a virtual camera to synthesize the image seen from that arbitrary direction.
  • This type of technology is attracting attention as an important elemental technology in the fields of content production and sports science, because it enables the presentation of images viewed from various viewpoints, for example, in live sports broadcasts.
  • However, since the technique of Patent Document 1 expresses the three-dimensional shape of the subject using voxels, the data amount of the image representing the three-dimensional shape becomes large. For this reason, the amount of calculation increases when processing a three-dimensional image, and a large-capacity memory is required for the calculation. This tendency becomes more conspicuous as the resolution of the three-dimensional shape image increases, resulting in an increase in the processing load and processing time of the image generation device.
  • The present invention has been made in view of the above circumstances, and is intended to provide a technique capable of generating an image corresponding to an arbitrary viewpoint without using a three-dimensional image with a large amount of data, thereby reducing the processing load and shortening the processing time associated with image processing.
  • A first aspect of the multi-viewpoint image generation apparatus or method converts teacher images captured from a plurality of viewpoints into teacher images in the wavenumber domain, expands the wavenumber-domain images into a spherical harmonic series for each wavenumber component, generates a wavenumber-domain image corresponding to a virtual viewpoint based on the spherical harmonic series and information specifying a shooting direction from an arbitrary virtual viewpoint different from the plurality of viewpoints, and transforms the generated wavenumber-domain image into a generated image in the spatial domain.
  • According to the first aspect, teacher images photographed from a plurality of viewpoints are expanded into a spherical harmonic series for each wavenumber component, and an image corresponding to an arbitrary virtual viewpoint is generated based on this series. Therefore, compared to expressing the three-dimensional shape of the subject using voxels, for example, an image viewed from a virtual viewpoint can be generated with a smaller amount of data, which reduces the processing load and shortens the processing time associated with image processing.
  • A second aspect of the present invention further includes a spherical harmonic expansion series optimization processing unit that receives as input the spherical harmonic expansion series obtained by the spherical harmonic series expansion processing unit and outputs an optimized spherical harmonic expansion series.
  • According to the second aspect of the present invention, it is possible to suppress deterioration in the accuracy of the generated multi-view image data and to generate highly accurate multi-view image data.
  • A third aspect of the present invention is a multi-viewpoint image generation device that generates an image corresponding to a shooting direction from an arbitrary virtual viewpoint using spherical harmonics, and includes a spherical harmonic transform processing section having a basis vector calculation processing unit that, when information specifying the shooting direction from the arbitrary virtual viewpoint is input, calculates a basis vector of the spherical harmonics corresponding to that shooting direction, and an image generation processing unit that receives the calculated basis vector as input and generates and outputs a spatial-domain image corresponding to the shooting direction.
  • According to the third aspect of the present invention, even when, for example, the scale of the teacher image data and teacher direction data is enormous and it is difficult to directly calculate the spherical harmonic expansion series data, a model corresponding to the spherical harmonic expansion series and the subsequent accuracy-improvement process can be learned, so that deterioration in the accuracy of the generated multi-viewpoint image data is suppressed and highly accurate multi-viewpoint image data can be generated.
  • Further, the image generation processing unit includes a first neural network that generates and outputs a first image having a first resolution using the basis vector as input, and a second neural network that generates, using the first image output from the first neural network as input, a second image having a second resolution higher than the first resolution, and outputs the generated second image as a spatial-domain image corresponding to the shooting direction.
  • model learning can be made more efficient and the size of the learning model can be reduced.
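The two-stage, coarse-to-fine structure described above can be sketched as follows. This is a minimal illustration only: the layer shapes, the basis-vector length, the 8x8 and 16x16 resolutions, and the nearest-neighbour upsampling step are all hypothetical, and the two single-layer networks are untrained stand-ins for the first and second neural networks.

```python
import numpy as np

rng = np.random.default_rng(5)
B = 9                                   # hypothetical basis-vector length

def mlp(x, W, b):
    """One untrained fully connected layer with tanh activation (stand-in)."""
    return np.tanh(x @ W + b)

# First network: basis vector -> coarse 8x8 image (first resolution).
W1, b1 = rng.normal(size=(B, 64)) * 0.1, np.zeros(64)
# Second network: upsampled coarse image -> refined 16x16 image (second resolution).
W2, b2 = rng.normal(size=(256, 256)) * 0.1, np.zeros(256)

y = rng.normal(size=B)                  # stand-in basis vector for one direction
coarse = mlp(y, W1, b1).reshape(8, 8)   # first image (low resolution)
up = coarse.repeat(2, axis=0).repeat(2, axis=1)      # nearest-neighbour upsampling
fine = mlp(up.reshape(-1), W2, b2).reshape(16, 16)   # second image (high resolution)
```

Splitting generation this way keeps the first network small, which is one reason the text cites more efficient model learning and a smaller learning model.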
  • That is, an image corresponding to an arbitrary viewpoint can be generated without using a three-dimensional image with a large amount of data, and it is thus possible to provide a technique that reduces the processing load and shortens the processing time associated with image processing.
  • FIG. 1 is a block diagram showing an example of the hardware configuration of a multi-viewpoint image generation device according to the first embodiment of the invention.
  • FIG. 2 is a block diagram showing an example of the software configuration of the multi-viewpoint image generation device according to the first embodiment of the invention.
  • FIG. 3 is a flowchart showing an example of a processing procedure and processing contents of a multi-viewpoint image generation process executed by a control unit of the multi-viewpoint image generation apparatus shown in FIG.
  • FIG. 4 is a diagram used for explaining spherical harmonic series.
  • FIG. 5 is a block diagram showing an example of the software configuration of the multi-viewpoint image generation device according to the second embodiment of the invention.
  • FIG. 6 is a flow chart showing an example of a processing procedure and processing contents of a multi-viewpoint image generation process executed by a control unit of the multi-viewpoint image generation apparatus shown in FIG. 5.
  • FIG. 7 is a diagram for explaining an operation example of a spherical harmonic expansion series optimization processing unit provided in the multi-viewpoint image generation device shown in FIG. 5.
  • FIG. 8 is a block diagram showing an example of the software configuration of the multi-viewpoint image generation device according to the third embodiment of the invention.
  • FIG. 9 is a flowchart showing an example of a processing procedure and processing contents of a multi-viewpoint image generation process executed by a control unit of the multi-viewpoint image generation apparatus shown in FIG. 8.
  • FIG. 10 is a block diagram showing an example of the software configuration of the spherical harmonic transform processing section of the multi-viewpoint image generation device according to the fourth embodiment of the present invention.
  • FIG. 1 is a block diagram showing an example of the hardware configuration of a multi-viewpoint image generating apparatus and its peripheral parts according to the first embodiment of the present invention, and FIG. 2 is a block diagram showing an example of the software configuration for generating the multi-viewpoint image.
  • The multi-viewpoint image generation device FGA is composed of an information processing device such as a server computer or a personal computer, to which cameras 61 to 6N and an input/output device 7 are connected.
  • the cameras 61 to 6N are distributed in, for example, an event venue where sports are held, and photograph the inside of the event venue from a plurality of viewpoints and output the image data. Note that the photographing directions of the cameras 61 to 6N are adjusted so as to point toward points (origins) preset in the event venue.
  • the input/output device 7 is, for example, a personal computer, a smartphone, or a tablet terminal.
  • The input/output device 7 is used to transmit information representing an arbitrary virtual viewpoint specified by the user to the multi-viewpoint image generation device FGA, and to receive and display the image data sent from the multi-viewpoint image generation device FGA.
  • The multi-viewpoint image generation apparatus FGA includes a control unit 1A using a hardware processor such as a central processing unit (CPU), to which a storage unit having a program storage section 2 and a data storage section 3, and an input/output interface (hereinafter abbreviated as I/F) section 4, are connected.
  • the input/output I/F section 4 has a communication interface function, and transmits and receives image data and input data to and from the cameras 61 to 6N and the input/output device 7 via signal cables or networks.
  • The program storage unit 2 is composed of, as storage media, a non-volatile memory such as an SSD (Solid State Drive) that can be written and read at any time and a non-volatile memory such as a ROM (Read Only Memory), and stores an OS (Operating System), other middleware, and the application programs necessary for executing the various control processes according to the first embodiment. Hereinafter, the OS and each application program will be collectively referred to as programs.
  • The data storage unit 3 is composed of, as storage media, for example, a combination of a non-volatile memory such as an SSD that can be written and read at any time and a volatile memory such as a RAM (Random Access Memory), and is provided with, as the main storage units necessary for implementing the first embodiment, a teacher data storage unit 31, a wavenumber domain data storage unit 32, a spherical harmonic expansion series data storage unit 33, and a generated image data storage unit 34.
  • the teacher data storage unit 31 is used to store image data sent from the cameras 61 to 6N and information representing the installation positions or shooting directions of the cameras 61 to 6N as teacher data.
  • the wavenumber domain data storage unit 32 is used to store image data converted into wavenumber domain data by the control unit 1A, which will be described later.
  • the spherical harmonic expansion series data storage unit 33 is used to store spherical harmonic series data expanded based on the image data in the wavenumber domain by the control unit 1A, which will be described later.
  • the generated image data storage unit 34 is used to store image data in the wavenumber domain corresponding to the virtual viewpoint generated by the control unit 1A, which will be described later.
  • The control unit 1A includes, as processing functions necessary for carrying out the first embodiment, a teacher data acquisition processing unit 11, a Fourier transform processing unit 12, a spherical harmonic series expansion processing unit 13, an inverse spherical harmonic transform processing unit 14, and an inverse Fourier transform processing unit 15.
  • These processing units 11 to 15 are realized by causing the hardware processor of the control unit 1A to execute the application programs stored in the program storage unit 2.
  • Part or all of the processing units 11 to 15 may be implemented using hardware such as an LSI (Large Scale Integration) or an ASIC (Application Specific Integrated Circuit).
  • The teacher data acquisition processing unit 11 receives the image data output from the cameras 61 to 6N through the input/output I/F unit 4 and stores the received image data in the teacher data storage unit 31 as teacher image data. Further, the teacher data acquisition processing unit 11 acquires camera attribute information representing the installation positions or shooting directions of the cameras 61 to 6N from the cameras 61 to 6N or the input/output device 7, and stores the acquired camera attribute information in the teacher data storage unit 31 as teacher direction data, in association with the image data of the cameras 61 to 6N.
  • The Fourier transform processing unit 12 transforms each image data of the cameras 61 to 6N stored in the teacher data storage unit 31 into image data in the wavenumber domain by Fourier transform processing, and stores each wavenumber-domain image data in the wavenumber domain data storage unit 32.
  • The spherical harmonic series expansion processing unit 13 expands each image data in the wavenumber domain stored in the wavenumber domain data storage unit 32 into a spherical harmonic series for each wavenumber, based on the teacher direction data stored in the teacher data storage unit 31. Then, the expanded spherical harmonic expansion series data is stored in the spherical harmonic expansion series data storage unit 33.
  • The inverse spherical harmonic transform processing unit 14 acquires, via the input/output I/F unit 4, data representing the shooting direction from the virtual viewpoint input from the input/output device 7. Then, the spherical harmonic expansion series data stored in the spherical harmonic expansion series data storage unit 33 is subjected to inverse spherical harmonic transformation to generate image data in the wavenumber domain corresponding to the shooting direction viewed from the virtual viewpoint, and the generated wavenumber-domain image data is stored in the generated image data storage unit 34.
  • The inverse Fourier transform processing unit 15 transforms the generated image data in the wavenumber domain stored in the generated image data storage unit 34 into generated image data in the spatial domain by inverse Fourier transform processing. Then, the converted spatial-domain generated image data is output from the input/output I/F section 4 to the input/output device 7.
  • FIG. 3 is a flow chart showing an example of the processing procedure and processing contents of the multi-viewpoint image generation process executed by the control unit 1A of the multi-viewpoint image generation apparatus FGA.
  • (1) Acquisition of teacher data: For example, assume that the cameras 61 to 6N photograph subjects in the event venue and transmit the image data.
  • Under the control of the teacher data acquisition processing unit 11, the control unit 1A of the multi-viewpoint image generation apparatus FGA receives, in step S12, the image data transmitted from the cameras 61 to 6N via the input/output I/F unit 4, and stores each of the received image data in the teacher data storage unit 31 as teacher image data.
  • Note that, instead, image data may be captured by the virtually arranged cameras 61 to 6N for an object that is virtually reproduced by computer graphics or the like, and that image data may be transmitted.
  • the teacher data acquisition processing unit 11 receives camera attribute information indicating the installation positions or shooting directions of the cameras 61 to 6N from the cameras 61 to 6N or the input/output device 7 via the input/output I/F 4. Then, the received camera attribute information is stored in the teacher data storage unit 31 as teacher direction data in association with the image data of each of the cameras 61 to 6N.
  • The teacher image data obtained by the cameras 61 to 6N are taken from a plurality of viewpoints that are equidistant from the origin of the object as the subject and lie in different directions (elevation angle, azimuth angle).
  • Hereinafter, the i-th teacher image data is represented as f_i(x, y) using image coordinates x, y, and the direction of the i-th camera viewed from the object is expressed as (θ_i, φ_i), where 1 ≤ i ≤ N.
  • In step S13, the control unit 1A of the multi-viewpoint image generation apparatus FGA, under the control of the Fourier transform processing unit 12, reads each teacher image data from the teacher data storage unit 31 and converts the teacher image data into image data in the wavenumber domain by Fourier transform processing.
  • The converted wavenumber-domain image data is stored in the wavenumber domain data storage unit 32.
  • Hereinafter, the image data in the wavenumber domain is expressed as F_i(k_x, k_y), where 1 ≤ i ≤ N.
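The transform of step S13 can be sketched with NumPy's 2-D FFT. The image stack and its sizes below are hypothetical stand-ins for the teacher images f_i(x, y); the point is only that each image is mapped losslessly to complex wavenumber-domain data F_i(k_x, k_y).

```python
import numpy as np

# Hypothetical sizes: N = 4 teacher images of 32x32 pixels.
N, H, W = 4, 32, 32
rng = np.random.default_rng(0)
teacher_images = rng.random((N, H, W))          # stand-ins for f_i(x, y)

# Step S13: Fourier-transform each teacher image into the wavenumber domain.
F = np.fft.fft2(teacher_images, axes=(-2, -1))  # F_i(k_x, k_y), complex-valued

# The transform is invertible, so no information is lost at this stage.
recovered = np.fft.ifft2(F, axes=(-2, -1)).real
```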
  • In step S14, the control unit 1A of the multi-viewpoint image generation apparatus FGA, under the control of the spherical harmonic series expansion processing unit 13, reads the wavenumber-domain image data F_i(k_x, k_y) from the wavenumber domain data storage unit 32 and expands F_i(k_x, k_y) into a spherical harmonic expansion series for each wavenumber component (k_x, k_y). Then, the obtained spherical harmonic expansion series data is stored in the spherical harmonic expansion series data storage unit 33.
  • The spherical harmonic expansion series data A_mn(k_x, k_y) can be calculated by numerical integration, for example as the projection A_mn(k_x, k_y) = ∫ F(k_x, k_y; θ, φ) Y*_mn(θ, φ) dΩ, where Y_mn denotes the spherical harmonic of degree n and order m and the integral is taken over all directions.
  • Alternatively, the spherical harmonic series expansion may be defined using the fact that its inverse transformation is represented by F(k_x, k_y; θ, φ) = Σ_{n=0}^{M} Σ_{m=-n}^{n} A_mn(k_x, k_y) Y_mn(θ, φ), where the series is truncated at order M.
  • That is, the matrix Y and the vector A are defined such that the i-th row of Y collects the basis values Y_mn(θ_i, φ_i) for the N teacher directions, and A stacks the coefficients A_mn(k_x, k_y).
  • Further, a vector F(k_x, k_y) is defined in which only the wavenumber (k_x, k_y) components of the N wavenumber-domain teacher images are arranged. The coefficients then satisfy F(k_x, k_y) = Y A, which can be solved, for example, by least squares.
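The relation F = Y A and its least-squares solution can be sketched as follows. To keep the example self-contained, the expansion is truncated at degree M = 1 so the complex spherical harmonics can be written in closed form; the number of cameras, their directions, and the coefficients are hypothetical.

```python
import numpy as np

def sh_basis(theta, phi):
    """Degree-1 complex spherical harmonics in closed form.
    Rows = directions; columns = [Y_00, Y_1-1, Y_10, Y_11]."""
    return np.array([
        0.5 * np.sqrt(1 / np.pi) * np.ones_like(theta) + 0j,
        0.5 * np.sqrt(3 / (2 * np.pi)) * np.sin(theta) * np.exp(-1j * phi),
        0.5 * np.sqrt(3 / np.pi) * np.cos(theta) + 0j,
        -0.5 * np.sqrt(3 / (2 * np.pi)) * np.sin(theta) * np.exp(1j * phi),
    ]).T

rng = np.random.default_rng(1)
N = 6                                        # hypothetical number of teacher directions
theta = rng.uniform(0.2, np.pi - 0.2, N)     # polar angles θ_i
phi = rng.uniform(0.0, 2 * np.pi, N)         # azimuth angles φ_i
Y = sh_basis(theta, phi)                     # the N x 4 matrix Y

# For one wavenumber component (k_x, k_y): synthesize the N teacher values F_i
# from known coefficients, then recover A by least squares (pseudo-inverse of Y).
A_true = rng.normal(size=4) + 1j * rng.normal(size=4)
F_vec = Y @ A_true                           # the vector F of (k_x, k_y) components
A_hat, *_ = np.linalg.lstsq(Y, F_vec, rcond=None)
```

In the actual method this fit is repeated independently for every wavenumber component (k_x, k_y).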
  • Next, the control unit 1A of the multi-viewpoint image generation apparatus FGA monitors virtual-viewpoint designation inputs in step S15 under the control of the inverse spherical harmonic transform processing unit 14.
  • Assume that, in this state, the input/output device 7 transmits data designating the shooting direction from a virtual viewpoint.
  • In step S16, the spherical harmonic expansion series data is read from the spherical harmonic expansion series data storage unit 33, and wavenumber-domain image data viewed from the virtual viewpoint is generated based on the data specifying the shooting direction from the virtual viewpoint and the read spherical harmonic expansion series data. Then, the generated wavenumber-domain image data viewed from the virtual viewpoint is stored in the generated image data storage unit 34.
  • Specifically, the inverse spherical harmonic transform processing unit 14 first calculates the basis vector y(θ̃, φ̃) = [Y_00(θ̃, φ̃), Y_1,-1(θ̃, φ̃), …, Y_MM(θ̃, φ̃)]^T, where (θ̃, φ̃) is the shooting direction from the virtual viewpoint.
  • Next, the inverse spherical harmonic transform processing unit 14 uses the basis vector to calculate the generated wavenumber-domain image data as F̃(k_x, k_y) = y(θ̃, φ̃)^T A(k_x, k_y).
  • The inverse spherical harmonic transform processing unit 14 performs the above calculation for all wavenumber components (k_x, k_y).
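The evaluation in step S16 for a single wavenumber component can be sketched as follows, again with a closed-form degree-1 basis so the example stays self-contained. The coefficient vector A and the virtual direction are hypothetical.

```python
import numpy as np

def sh_vec(theta, phi):
    """Basis vector y(θ, φ) = [Y_00, Y_1-1, Y_10, Y_11] at one direction
    (degree-1 complex spherical harmonics in closed form)."""
    return np.array([
        0.5 * np.sqrt(1 / np.pi) + 0j,
        0.5 * np.sqrt(3 / (2 * np.pi)) * np.sin(theta) * np.exp(-1j * phi),
        0.5 * np.sqrt(3 / np.pi) * np.cos(theta) + 0j,
        -0.5 * np.sqrt(3 / (2 * np.pi)) * np.sin(theta) * np.exp(1j * phi),
    ])

# Hypothetical expansion coefficients A(k_x, k_y) for one wavenumber component.
A = np.array([1.0 + 0.0j, 0.2 - 0.1j, -0.5 + 0.0j, -0.2 - 0.1j])

# Shooting direction from the virtual viewpoint, and the generated value.
theta_v, phi_v = np.pi / 3, np.pi / 4
F_gen = sh_vec(theta_v, phi_v) @ A      # one component of the wavenumber-domain image
```

Repeating this inner product over every (k_x, k_y) assembles the full wavenumber-domain image for the virtual viewpoint, which the inverse Fourier transform then turns into a spatial image.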
  • In step S17, the control unit 1A of the multi-viewpoint image generation apparatus FGA, under the control of the inverse Fourier transform processing unit 15, reads the wavenumber-domain image data F̃(k_x, k_y) generated for the virtual viewpoint from the generated image data storage unit 34 and transforms it into image data in the spatial domain by inverse Fourier transform processing.
  • the control unit 1A of the multi-viewpoint image generation apparatus FGA transmits the image data of the spatial region from the input/output I/F unit 4 to the input/output device 7.
  • the input/output device 7 displays the image data generated for the virtual viewpoint designated by the user.
  • As described above, in the first embodiment, teacher image data photographed from a plurality of viewpoints and attribute data indicating their shooting directions are acquired, and the teacher image data are converted into image data in the wavenumber domain by Fourier transform.
  • The wavenumber-domain image data are then expanded, for each wavenumber component, into spherical harmonic expansion series data.
  • When a shooting direction from a virtual viewpoint is designated, an inverse spherical harmonic calculation is performed to generate image data of the wavenumber domain viewed from the virtual viewpoint.
  • The generated image data is transformed into image data in the spatial domain by inverse Fourier transform and output to the input/output device 7.
  • FIG. 5 is a block diagram showing the software configuration of the multi-viewpoint image generation device FGB according to the second embodiment of the invention.
  • the same parts as in FIG. 2 are denoted by the same reference numerals, and detailed description thereof will be omitted.
  • the hardware configuration of the multi-viewpoint image generation device FGB is the same as that in FIG. 1, so the description is omitted.
  • The control unit 1B of the multi-viewpoint image generation device FGB includes, in addition to a teacher data acquisition processing unit 11, a Fourier transform processing unit 12, a spherical harmonic series expansion processing unit 13, an inverse spherical harmonic transform processing unit 14, and an inverse Fourier transform processing unit 15, a spherical harmonic expansion series optimization processing unit 16. These processing units 11 to 16 are realized by causing the hardware processor of the control unit 1B to execute the application programs stored in the program storage unit 2.
  • part or all of the processing units 11 to 16 may be implemented using hardware such as LSI and ASIC.
  • The spherical harmonic expansion series optimization processing unit 16 includes, for example, a multi-layer neural network constituting a postfilter, receives as input the spherical harmonic expansion series data output from the spherical harmonic series expansion processing unit 13, and outputs optimized spherical harmonic expansion series data.
  • the spherical harmonic expansion series optimization processing unit 16 stores the output optimized spherical harmonic expansion series data in the spherical harmonic expansion series data storage unit 33 .
  • In the second embodiment, the inverse spherical harmonic transform processing unit 14 acquires, via the input/output I/F unit 4, data representing the shooting direction from the virtual viewpoint input from the input/output device 7, and reads the optimized spherical harmonic expansion series data from the spherical harmonic expansion series data storage unit 33.
  • The optimized spherical harmonic expansion series data is then subjected to inverse spherical harmonic transformation to generate image data in the wavenumber domain corresponding to the shooting direction seen from the virtual viewpoint, and the generated wavenumber-domain image data is stored in the generated image data storage unit 34.
  • FIG. 6 is a flowchart showing an example of the processing procedure and processing contents of the multi-viewpoint image generation process executed by the control unit 1B of the multi-viewpoint image generation device FGB.
  • steps in which the same processes as those in FIG. 3 are performed are denoted by the same reference numerals, and detailed description thereof will be omitted.
  • The control unit 1B of the multi-viewpoint image generation device FGB, under the control of the spherical harmonic expansion series optimization processing unit 16, optimizes the spherical harmonic expansion series data in step S19.
  • That is, the spherical harmonic expansion series optimization processing unit 16 inputs the spherical harmonic expansion series data A_mn(k_x, k_y) to a multi-layer neural network and obtains the optimized spherical harmonic expansion series data Ã_mn(k_x, k_y) as its output.
  • The multi-layer neural network constitutes a postfilter that generates and outputs spherical harmonic expansion series data Ã_mn(k_x, k_y) minimizing the error between the generated image data and the target image data.
  • The spherical harmonic expansion series optimization processing unit 16 stores the output optimized spherical harmonic expansion series data Ã_mn(k_x, k_y) in the spherical harmonic expansion series data storage unit 33.
  • The spherical harmonic expansion series optimization processing unit 16 may, for example, repeat the optimization process a preset number of times.
  • FIG. 7 shows an example of its operation.
  • That is, the spherical harmonic expansion series optimization processing unit 16 inputs the spherical harmonic expansion series data A_mn(k_x, k_y) to the multi-layer neural network module 161 as shown in FIG. 7(a), then inputs the output again to the multi-layer neural network module 161 and optimizes it as shown in FIG. 7(b). After repeating this optimization process a preset number of times, the finally obtained optimized spherical harmonic expansion series data Ã_mn(k_x, k_y) is output and stored in the spherical harmonic expansion series data storage unit 33, as shown in FIG. 7(c).
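The patent realizes this refinement as a learned multi-layer neural network postfilter. As a neutral stand-in that exhibits the same repeated-refinement structure as FIG. 7, the sketch below iterates a plain gradient step that reduces the reconstruction error of hypothetical coefficients against hypothetical teacher values; the basis matrix Y, the targets F, and the step size are all assumptions, not the patent's trained network.

```python
import numpy as np

rng = np.random.default_rng(3)
N, B = 6, 4                        # teacher directions, number of basis functions
Y = rng.normal(size=(N, B))        # stand-in for the spherical harmonic basis matrix
F = rng.normal(size=N)             # stand-in wavenumber-domain teacher values

A = np.zeros(B)                    # initial coefficients A^(0)_mn
lr = 0.01
errors = []
for _ in range(300):               # repeat the refinement a preset number of times
    residual = Y @ A - F
    errors.append(float(residual @ residual))
    A -= lr * (Y.T @ residual)     # one refinement step on ||Y A - F||^2
```

Each pass plays the role of one application of the postfilter module 161: the coefficients are fed back in and come out closer to values whose reconstruction matches the target data.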
  • A method such as error backpropagation can be applied to learning the parameters of the multi-layer neural network that constitutes the spherical harmonic expansion series optimization processing unit 16.
  • Note that the spherical harmonic expansion series optimization processing unit 16 may output optimized spherical harmonic expansion series data Ã_mn(k_x, k_y) of an order higher than that of the input series.
  • In this case, the spherical harmonic expansion series optimization processing unit 16, for example, inputs initial spherical harmonic expansion series data A^(0)_mn(k_x, k_y) having the same order as the optimized spherical harmonic expansion series data Ã_mn(k_x, k_y) to the multi-layer neural network, instead of the spherical harmonic expansion series data A_mn(k_x, k_y) output from the spherical harmonic series expansion processing unit 13.
  • The initial spherical harmonic expansion series data A^(0)_mn(k_x, k_y) can be generated, for example, by initializing the higher-order terms not included in the spherical harmonic expansion series data A_mn(k_x, k_y) with zeros or random numbers.
  • After that, the control unit 1B of the multi-viewpoint image generation device FGB inputs the data designating the shooting direction from the virtual viewpoint, entered by the user at the input/output device 7, and the optimized spherical harmonic expansion series data Ã_mn(k_x, k_y) stored in the spherical harmonic expansion series data storage unit 33 to the inverse spherical harmonic transform processing unit 14, which generates image data of the wavenumber domain seen from the virtual viewpoint. This processing is the same as that described in the first embodiment.
  • As described above, in the second embodiment, the spherical harmonic expansion series data obtained by the spherical harmonic series expansion processing unit 13 is input to the spherical harmonic expansion series optimization processing unit 16, which performs optimization processing on it.
  • A spherical harmonic transform processing unit includes a basis vector calculation unit and a multi-layer neural network, and generates basis vectors of spherical harmonics corresponding to a designated shooting direction.
  • FIG. 8 is a block diagram showing the software configuration of the multi-viewpoint image generation device FGC according to the third embodiment of the invention.
  • the same parts as in FIG. 2 are denoted by the same reference numerals, and detailed description thereof will be omitted.
  • the hardware configuration of the multi-viewpoint image generation device FGC is the same as that of FIG. 1, so the description is omitted.
  • The control unit 1C of the multi-viewpoint image generation device FGC includes a teacher data acquisition processing unit 11 and a spherical harmonic transform processing unit 17. These processing units 11 and 17 are realized by causing the hardware processor of the control unit 1C to execute the application programs stored in the program storage unit 2.
  • part or all of the processing units 11 and 17 may be realized using hardware such as LSI and ASIC.
  • The teacher data acquisition processing unit 11 acquires the image data captured by the cameras 61 to 6N and camera attribute information representing the installation positions or shooting directions of the cameras 61 to 6N, and stores the acquired image data and camera attribute information in the teacher data storage unit 31 as teacher data.
  • the teacher data is used as data for learning the parameters of a multi-layer neural network of the spherical harmonic transform processing unit 17, which will be described later.
  • the spherical harmonic transform processing unit 17 includes, for example, a basis vector calculation processing unit and an image data generation processing unit using a multi-layer neural network.
  • The parameters of the multi-layer neural network are learned in advance, using the teacher data stored in the teacher data storage unit 31 as learning data, so that the network receives basis vectors of spherical harmonic functions as input and outputs the corresponding image data in the spatial domain.
  • the basis vector calculation processing unit acquires, via the input/output I/F unit 4, data representing the shooting direction from the virtual viewpoint input at the input/output device 7, and calculates the basis vectors of the spherical harmonic functions corresponding to the acquired shooting direction.
  • the multilayer neural network of the image data generation processing unit takes as input the basis vectors of the spherical harmonics generated by the basis vector calculation processing unit, transforms these basis vectors into image data of the spatial region corresponding to the shooting direction, and outputs the generated spatial-domain image data.
  • FIG. 9 is a flowchart showing an example of the processing procedure and processing contents of the multi-viewpoint image generation processing executed by the control unit 1C of the multi-viewpoint image generation device FGC.
  • steps in which the same processing as in FIG. 3 is performed are denoted by the same reference numerals, and detailed description thereof will be omitted.
  • the control unit 1C of the multi-viewpoint image generation device FGC reads the teacher image data and teacher direction data from the teacher data storage unit 31 and, under the control of the spherical harmonic transform processing unit 17, learns and stores the parameters of the multilayer neural network in step S20.
  • the spherical harmonic transform processing unit 17 generates spherical harmonic basis vectors corresponding to the teacher direction data read from the teacher data storage unit 31. That is, assuming the teacher direction data is (θ, φ), the basis vectors are first calculated according to the following equation.
  • the spherical harmonic transform processing unit 17 inputs the calculated basis vectors to the multi-layer neural network, and outputs image data of the corresponding spatial region from the multi-layer neural network.
  • a method such as error backpropagation, based on the error between the output spatial-domain image data and the teacher image data, can be applied to learning the parameters of the multi-layer neural network.
  • the spherical harmonics transformation processing unit 17 executes the spherical harmonics transformation processing as follows.
  • the spherical harmonic transform processing unit 17 first acquires, via the input/output I/F unit 4, data representing the shooting direction from the virtual viewpoint, which is input from the input/output device 7 in step S15. Then, in step S21, the spherical harmonic transform processing unit 17 calculates, using the basis vector calculation processing unit, the basis vectors of the spherical harmonic functions corresponding to the acquired shooting direction.
  • the spherical harmonic transform processing unit 17 causes the basis vector calculation processing unit to compute the basis vectors of the spherical harmonics in accordance with equation (1) above.
  • in step S22, the spherical harmonic transform processing unit 17 inputs the calculated basis vectors to the multi-layer neural network and obtains image data of the corresponding spatial region from the multi-layer neural network. Then, in step S18, the spherical harmonic transform processing unit 17 transmits the spatial-domain image data output from the multilayer neural network from the input/output I/F unit 4 to the input/output device 7.
  • in this way, the spherical harmonic transform processing unit 17 learns the parameters of a multi-layer neural network that maps spherical harmonic basis vectors to spatial-domain image data.
  • the fourth embodiment of the present invention is a further improvement of the third embodiment, wherein the multi-layer neural network included in the spherical harmonic transform processing unit is replaced with a first multilayer neural network that generates low-resolution image data and a second multilayer neural network that up-samples the low-resolution image data output from the first multilayer neural network and outputs high-resolution image data.
  • FIG. 10 is a block diagram showing the software configuration of the spherical harmonic transform processing section 170 of the multi-viewpoint image generating apparatus according to the fourth embodiment of the present invention.
  • the spherical harmonic transform processing unit 170 includes a basis vector calculation unit 171, a first multilayer neural network 172, and a second multilayer neural network 173.
  • the first multilayer neural network 172 receives the basis vectors output from the basis vector calculation unit 171 and outputs low-resolution image data.
  • the second multilayer neural network outputs high-resolution image data by upsampling the low-resolution image data output from the first multilayer neural network.
  • for convenience of explanation, the dimensions of the output image data are assumed to be (B, C, UW, UH).
  • the spherical harmonic transform processing unit 170 outputs B images of C channels (a monochrome image when C is 1, an RGB color image when C is 3), each of height UH and width UW.
  • U is a positive integer.
  • the basis vector calculation unit 171 first calculates the basis vectors from the teacher direction data or the shooting directions ( ⁇ , ⁇ ) from the virtual viewpoint according to formula (1).
  • the spherical harmonic transform processing unit 170 inputs the calculated basis vectors to the first multilayer neural network 172 to generate low-resolution image data.
  • the first multilayer neural network 172 may be configured with a fully connected layer as its first layer, such that the dimensions of the output low-resolution image data are (B, C, W, H).
  • the spherical harmonic transform processing unit 170 inputs the low-resolution image data output from the first multilayer neural network 172 to the second multilayer neural network 173, and outputs high-resolution image data.
  • the second multilayer neural network 173 takes low-resolution image data of dimensions (B, C, W, H) as input and outputs high-resolution image data of dimensions (B, C, UW, UH).
  • the spherical harmonic transform processing unit 170 outputs the high-resolution image data output from the second multilayer neural network 173 to the input/output device 7 via the input/output I/F unit 4.
  • the multi-layer neural network included in the spherical harmonic transform processing unit 170 thus consists of a first multilayer neural network 172 that generates low-resolution image data and, connected in tandem with it, a second multilayer neural network 173 that up-samples the low-resolution image data output from the first multilayer neural network 172 to increase its resolution. Therefore, model learning can be made efficient and the size of the learning model can be reduced.
  • the teacher image data is obtained from each of the cameras 61 to 6N.
  • the teacher image data may be temporarily stored in a storage server, database, or the like, and the multi-viewpoint image generation devices FGA, FGB, and FGC may collectively acquire the teacher image data from this storage server or database.
  • the multi-viewpoint image generation devices FGA, FGB, and FGC may also generate and output image data viewed from a plurality of designated virtual viewpoints.
  • a conversion model using, for example, a convolutional neural network may be prepared in advance, and the wavenumber-domain teacher image data and teacher direction data may be input to this conversion model, which may be configured to output data expanded into a spherical harmonic series.
  • similarly, a transformation model using a convolutional neural network may be prepared for the inverse spherical harmonic transform processing unit 14; spherical harmonic series data and data designating the shooting direction from the virtual viewpoint may be input to this transformation model, which may be configured to output image data in the wavenumber domain.
  • the configuration and installation location of the multi-viewpoint image generation device, the type of neural network, the processing procedure and processing contents of the multi-viewpoint image generation process, the type of object to be photographed, and the like can be variously modified within a range not departing from the gist of the present invention.
  • the present invention is not limited to the above-described embodiments as they are, and can be embodied by modifying the constituent elements without departing from the gist of the invention at the implementation stage.
  • various inventions can be formed by appropriate combinations of the plurality of constituent elements disclosed in the above embodiments. For example, some components may be deleted from all the components shown in each embodiment. Furthermore, constituent elements of different embodiments may be combined as appropriate.
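The basis vector calculation that recurs throughout the third and fourth embodiments can be sketched concretely. Equation (1) is not reproduced in this text, so the sketch below assumes the standard real spherical harmonics, truncated at degree 2, evaluated for a shooting direction (θ, φ); the truncation degree is an assumption for illustration.

```python
import numpy as np

def sh_basis_deg2(theta, phi):
    """Real spherical harmonic basis vector up to degree 2 (9 components)
    for a direction given by polar angle theta (from the z-axis) and
    azimuth phi. Constants are the standard real-SH normalization factors."""
    x = np.sin(theta) * np.cos(phi)
    y = np.sin(theta) * np.sin(phi)
    z = np.cos(theta)
    return np.array([
        0.282095,                   # Y_0^0
        0.488603 * y,               # Y_1^-1
        0.488603 * z,               # Y_1^0
        0.488603 * x,               # Y_1^1
        1.092548 * x * y,           # Y_2^-2
        1.092548 * y * z,           # Y_2^-1
        0.315392 * (3 * z**2 - 1),  # Y_2^0
        1.092548 * x * z,           # Y_2^1
        0.546274 * (x**2 - y**2),   # Y_2^2
    ])

b = sh_basis_deg2(theta=0.0, phi=0.0)  # direction along the z-axis
```

In the third embodiment this 9-dimensional vector (or a higher-degree version of it) would be the input to the multilayer neural network; in the fourth embodiment it would feed the first multilayer neural network 172.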

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Processing (AREA)

Abstract

One aspect of the present invention involves transforming teacher images photographed from a plurality of viewpoints into wavenumber-domain teacher images, expanding the wavenumber-domain teacher images into a spherical harmonic series for each wavenumber component, generating a wavenumber-domain image corresponding to a designated virtual viewpoint different from the plurality of viewpoints on the basis of the spherical harmonic series and information specifying the direction of photography from that virtual viewpoint, and transforming the generated wavenumber-domain image into a generated image in the spatial domain.

Description

Multi-viewpoint image generation device, method, and program
One aspect of the present invention relates to a multi-viewpoint image generation device, method, and program that take as input images of an object placed in space, captured from a plurality of different viewpoints, and generate an image captured from a viewpoint not included in those images.
A technique is known in which the three-dimensional shape of a subject is estimated from images captured by a plurality of cameras, and this three-dimensional shape is used to synthesize an image captured from an arbitrary direction not covered by those cameras. For example, Patent Document 1 describes a technique in which the silhouette of a subject is extracted from a plurality of images, a three-dimensional shape represented by voxels is estimated from the silhouette, and this three-dimensional shape is photographed from an arbitrary viewpoint using a virtual camera, thereby synthesizing an image captured from that arbitrary direction. This type of technology makes it possible to present images viewed from various viewpoints in, for example, live sports broadcasts, and is therefore attracting attention as an important elemental technology in the fields of content production and sports science.
Japanese Patent No. 5686412
However, since the technique described in Patent Document 1 represents the three-dimensional shape of the subject using voxels, the amount of image data representing the three-dimensional shape becomes large. This increases the amount of computation required to process the three-dimensional image and requires a large-capacity memory for the computation. This tendency becomes more pronounced as the resolution of the three-dimensional image increases, resulting in an increased processing load and processing time for the image generation device.
The present invention has been made in view of the above circumstances, and aims to provide a technique that can generate an image corresponding to an arbitrary viewpoint without using a three-dimensional image with a large amount of data, thereby reducing the processing load and processing time associated with image processing.
In order to solve the above problems, a first aspect of the multi-viewpoint image generation device or method according to the present invention transforms teacher images captured from a plurality of viewpoints into teacher images in the wavenumber domain, expands the wavenumber-domain teacher images into a spherical harmonic series for each wavenumber component, generates a wavenumber-domain image corresponding to an arbitrary virtual viewpoint different from the plurality of viewpoints on the basis of the spherical harmonic series and information specifying the shooting direction from the virtual viewpoint, and transforms the generated wavenumber-domain image into a generated image in the spatial domain.
According to the first aspect of the present invention, teacher images captured from a plurality of viewpoints are expanded into a spherical harmonic series for each wavenumber component, and an image corresponding to an arbitrary virtual viewpoint is generated based on this spherical harmonic series. Therefore, compared with, for example, representing the three-dimensional shape of the subject using voxels, an image viewed from a virtual viewpoint can be generated with a smaller amount of data, which makes it possible to reduce the processing load and processing time associated with image processing.
A second aspect of the present invention further includes a spherical harmonic expansion series optimization processing unit that receives as input the spherical harmonic expansion series obtained by the spherical harmonic series expansion processing unit and outputs an optimized spherical harmonic expansion series.
According to the second aspect of the present invention, it is possible to suppress deterioration in the accuracy of the generated multi-viewpoint image data and to generate highly accurate multi-viewpoint image data.
A third aspect of the present invention is a multi-viewpoint image generation device that uses spherical harmonic functions to generate an image corresponding to a shooting direction from an arbitrary virtual viewpoint, the device comprising a spherical harmonic transform processing unit that includes a basis vector calculation processing unit that, when information specifying the shooting direction from the arbitrary virtual viewpoint is input, calculates the basis vectors of the spherical harmonic functions corresponding to the acquired shooting direction, and an image generation processing unit that takes the calculated basis vectors as input and generates and outputs an image in the spatial domain corresponding to the shooting direction.
According to the third aspect of the present invention, even when, for example, the teacher image data and teacher direction data are so large in scale that it is difficult to calculate the spherical harmonic expansion series data directly, learning a model corresponding to the spherical harmonic expansion series and the subsequent refinement processing makes it possible to suppress deterioration in the accuracy of the generated multi-viewpoint image data and to generate highly accurate multi-viewpoint image data.
In a fourth aspect of the present invention, the image generation processing unit comprises a first neural network that takes the basis vectors as input and generates and outputs a first image having a first resolution, and a second neural network that takes the first image output from the first neural network as input, generates a second image having a second resolution higher than the first resolution, and outputs the generated second image as an image in the spatial domain corresponding to the shooting direction.
According to the fourth aspect of the present invention, model learning can be made more efficient and the size of the learning model can be reduced.
That is, according to each aspect of the present invention, it is possible to provide a technique that can generate an image corresponding to an arbitrary viewpoint without using a three-dimensional image with a large amount of data, thereby reducing the processing load and processing time associated with image processing.
FIG. 1 is a block diagram showing an example of the hardware configuration of a multi-viewpoint image generation device according to the first embodiment of the invention.
FIG. 2 is a block diagram showing an example of the software configuration of the multi-viewpoint image generation device according to the first embodiment of the invention.
FIG. 3 is a flowchart showing an example of the processing procedure and processing contents of the multi-viewpoint image generation process executed by the control unit of the multi-viewpoint image generation device shown in FIG. 2.
FIG. 4 is a diagram used for explaining the spherical harmonic series.
FIG. 5 is a block diagram showing an example of the software configuration of the multi-viewpoint image generation device according to the second embodiment of the invention.
FIG. 6 is a flowchart showing an example of the processing procedure and processing contents of the multi-viewpoint image generation process executed by the control unit of the multi-viewpoint image generation device shown in FIG. 5.
FIG. 7 is a diagram for explaining an operation example of the spherical harmonic expansion series optimization processing unit provided in the multi-viewpoint image generation device shown in FIG. 5.
FIG. 8 is a block diagram showing an example of the software configuration of the multi-viewpoint image generation device according to the third embodiment of the invention.
FIG. 9 is a flowchart showing an example of the processing procedure and processing contents of the multi-viewpoint image generation process executed by the control unit of the multi-viewpoint image generation device shown in FIG. 8.
FIG. 10 is a block diagram showing an example of the software configuration of the spherical harmonic transform processing unit of the multi-viewpoint image generation device according to the fourth embodiment of the present invention.
Hereinafter, embodiments according to the present invention will be described with reference to the drawings.
[First embodiment]
(Configuration example)
FIG. 1 is a block diagram showing an example of the hardware configuration of a multi-viewpoint image generation device and its peripherals according to the first embodiment of the present invention, and FIG. 2 is a block diagram showing an example of the software configuration of the multi-viewpoint image generation device.
The multi-viewpoint image generation device FGA comprises an information processing device such as a server computer or a personal computer, and a plurality of cameras 61 to 6N and an input/output device 7 are connected to the multi-viewpoint image generation device FGA via signal cables or a network.
The cameras 61 to 6N are distributed, for example, in an event venue where sports or the like take place, photograph the inside of the venue from a plurality of viewpoints, and output the image data. The shooting directions of the cameras 61 to 6N are adjusted so that each points toward a point (origin) preset in the event venue.
The input/output device 7 is, for example, a personal computer, a smartphone, or a tablet terminal. In the first embodiment, the input/output device 7 is used to transmit information representing an arbitrary virtual viewpoint designated by the user to the multi-viewpoint image generation device FGA, and to receive and display image data sent from the multi-viewpoint image generation device FGA.
The multi-viewpoint image generation device FGA includes a control unit 1A using a hardware processor such as a central processing unit (CPU); a storage unit having a program storage section 2 and a data storage section 3, and an input/output interface (hereinafter abbreviated as I/F) section 4 are connected to the control unit 1A via a bus 5.
The input/output I/F section 4 has a communication interface function and transmits and receives image data and input data to and from the cameras 61 to 6N and the input/output device 7 via signal cables or a network.
The program storage unit 2 is configured, for example, by combining a non-volatile memory that can be written and read at any time, such as an SSD (Solid State Drive), with a non-volatile memory such as a ROM (Read Only Memory) as storage media, and stores, in addition to middleware such as an OS (Operating System), the application programs necessary for executing the various control processes according to the first embodiment. Hereinafter, the OS and the application programs are collectively referred to as programs.
The data storage unit 3 is configured, for example, by combining a non-volatile memory that can be written and read at any time, such as an SSD, with a volatile memory such as a RAM (Random Access Memory) as storage media, and includes, as the main storage sections necessary for implementing the first embodiment, a teacher data storage unit 31, a wavenumber domain data storage unit 32, a spherical harmonic expansion series data storage unit 33, and a generated image data storage unit 34.
The teacher data storage unit 31 is used to store image data sent from the cameras 61 to 6N and information representing the installation positions or shooting directions of the cameras 61 to 6N as teacher data.
The wavenumber domain data storage unit 32 is used to store image data converted into wavenumber-domain data by the control unit 1A, which will be described later.
The spherical harmonic expansion series data storage unit 33 is used to store the spherical harmonic series data expanded from the wavenumber-domain image data by the control unit 1A, which will be described later.
The generated image data storage unit 34 is used to store image data in the wavenumber domain corresponding to a virtual viewpoint, generated by the control unit 1A, which will be described later.
The control unit 1A includes, as processing functions necessary for implementing the first embodiment, a teacher data acquisition processing unit 11, a Fourier transform processing unit 12, a spherical harmonic series expansion processing unit 13, an inverse spherical harmonic transform processing unit 14, and an inverse Fourier transform processing unit 15. These processing units 11 to 15 are all realized by causing the hardware processor of the control unit 1A to execute application programs stored in the program storage unit 2.
Some or all of the processing units 11 to 15 may be implemented using hardware such as an LSI (Large Scale Integration) or an ASIC (Application Specific Integrated Circuit).
The teacher data acquisition processing unit 11 receives the image data output from the cameras 61 to 6N via the input/output I/F unit 4 and stores each piece of received image data in the teacher data storage unit 31 as teacher image data. The teacher data acquisition processing unit 11 also acquires camera attribute information representing the installation positions or shooting directions of the cameras 61 to 6N from the cameras 61 to 6N or the input/output device 7, and stores the acquired camera attribute information in the teacher data storage unit 31 as teacher direction data, in association with the image data of the cameras 61 to 6N.
The Fourier transform processing unit 12 transforms each piece of image data of the cameras 61 to 6N stored in the teacher data storage unit 31 into wavenumber-domain image data by Fourier transform processing, and stores each piece of transformed wavenumber-domain image data in the wavenumber domain data storage unit 32.
The spherical harmonic series expansion processing unit 13 expands each piece of wavenumber-domain image data stored in the wavenumber domain data storage unit 32 into a spherical harmonic series for each wavenumber, based on the teacher direction data stored in the teacher data storage unit 31, and stores the expanded spherical harmonic expansion series data in the spherical harmonic expansion series data storage unit 33.
The inverse spherical harmonic transform processing unit 14 acquires, via the input/output I/F unit 4, data representing the shooting direction from the virtual viewpoint input at the input/output device 7. It then applies an inverse spherical harmonic transform to the spherical harmonic expansion series data stored in the spherical harmonic expansion series data storage unit 33 to generate wavenumber-domain image data corresponding to the shooting direction viewed from the virtual viewpoint, and stores the generated wavenumber-domain image data in the generated image data storage unit 34.
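The expansion performed by the spherical harmonic series expansion processing unit 13 and the inverse transform performed by the inverse spherical harmonic transform processing unit 14 can be sketched numerically for a single wavenumber component. The degree-1 truncation and the least-squares fitting procedure are assumptions for illustration; the patent text does not fix either:

```python
import numpy as np

def basis_deg1(theta, phi):
    # Real spherical harmonics up to degree 1 (4 components).
    x = np.sin(theta) * np.cos(phi)
    y = np.sin(theta) * np.sin(phi)
    z = np.cos(theta)
    return np.array([0.282095, 0.488603 * y, 0.488603 * z, 0.488603 * x])

rng = np.random.default_rng(0)
# N camera directions (theta_i, phi_i) and the value of one wavenumber
# component observed at each direction; here synthesized from known
# coefficients so the fit can be checked.
N = 20
thetas = rng.uniform(0.1, np.pi - 0.1, N)
phis = rng.uniform(0.0, 2 * np.pi, N)
true_coeffs = np.array([1.0, -0.5, 0.3, 0.8])
A = np.stack([basis_deg1(t, p) for t, p in zip(thetas, phis)])  # (N, 4)
values = A @ true_coeffs

# Unit 13: expand into a spherical harmonic series (least-squares fit
# of the coefficients over the N observed directions).
coeffs, *_ = np.linalg.lstsq(A, values, rcond=None)

# Unit 14: inverse transform, i.e. evaluate the fitted series at the
# shooting direction of a virtual viewpoint.
virtual = basis_deg1(0.7, 1.3) @ coeffs
```

In the actual device this fit would be repeated per wavenumber component (kx, ky), giving one coefficient vector per component.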
The inverse Fourier transform processing unit 15 transforms the generated wavenumber-domain image data stored in the generated image data storage unit 34 into spatial-domain generated image data by inverse Fourier transform processing, and outputs the transformed spatial-domain generated image data from the input/output I/F unit 4 to the input/output device 7.
(Operation example)
Next, an operation example of the multi-viewpoint image generation device FGA configured as described above will be described.
FIG. 3 is a flowchart showing an example of the processing procedure and processing contents of the multi-viewpoint image generation process executed by the control unit 1A of the multi-viewpoint image generation device FGA.
(1) Acquisition of teacher data
For example, suppose that at an event venue the cameras 61 to 6N photograph subjects in the venue and transmit the image data. Upon receiving the transmission request for the image data in step S11, the control unit 1A of the multi-viewpoint image generation device FGA, under the control of the teacher data acquisition processing unit 11, receives in step S12 the image data transmitted from the cameras 61 to 6N via the input/output I/F unit 4, and stores each piece of received image data in the teacher data storage unit 31 as teacher image data.
Of course, image data may also be captured and transmitted using virtually arranged cameras 61 to 6N for an object virtually reproduced by computer graphics or the like.
 At the same time, the teacher data acquisition processing unit 11 receives camera attribute information indicating the installation positions or shooting directions of the cameras 61 to 6N from the cameras 61 to 6N or the input/output device 7 via the input/output I/F unit 4, and stores the received camera attribute information in the teacher data storage unit 31 as teacher direction data in association with the image data of each of the cameras 61 to 6N.
 Here, the teacher image data obtained by the cameras 61 to 6N are captured from a plurality of viewpoints that are equidistant from the object serving as the subject, taken as the origin, and that lie in different directions (elevation angle, azimuth angle). For example, defining the spherical model shown in FIG. 4, the teacher image data is expressed as f_i(x, y) using coordinates x, y, and the direction of each camera viewed from the object is expressed as (θ_i, φ_i), where 1 ≤ i ≤ N.
 (2) Fourier transform
 When the teacher image data has been acquired, the control unit 1A of the multi-viewpoint image generation apparatus FGA, under the control of the Fourier transform processing unit 12, reads each piece of teacher image data from the teacher data storage unit 31 in step S13, transforms the teacher image data into image data in the wavenumber domain by Fourier transform processing, and stores the transformed wavenumber-domain image data in the wavenumber domain data storage unit 32. The wavenumber-domain image data is expressed as F_i(k_x, k_y), where 1 ≤ i ≤ N.
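The transform of step S13 can be sketched as follows. This is an illustrative NumPy sketch, not part of the embodiment: the array names and sizes (N = 4 teacher views of 8 × 8 pixels) are assumptions.

```python
import numpy as np

# Hypothetical stand-ins for the teacher images f_i(x, y) read from the
# teacher data storage unit 31: N views of size H x W.
N, H, W = 4, 8, 8
rng = np.random.default_rng(0)
teacher_images = rng.random((N, H, W))          # f_i(x, y), 1 <= i <= N

# Step S13: transform each teacher image into the wavenumber domain.
# F_i(k_x, k_y) is complex-valued.
F = np.fft.fft2(teacher_images, axes=(-2, -1))  # shape (N, H, W)

# The transform is lossless: the inverse FFT recovers f_i(x, y).
recovered = np.fft.ifft2(F, axes=(-2, -1)).real
assert np.allclose(recovered, teacher_images)
```

Each of the N images yields one complex spectrum; in the following steps the N values at a fixed wavenumber (k_x, k_y) are treated together as samples over the camera directions.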
 (3) Conversion to a spherical harmonic expansion series
 Next, in step S14, under the control of the spherical harmonic series expansion processing unit 13, the control unit 1A of the multi-viewpoint image generation apparatus FGA reads the wavenumber-domain image data F_i(k_x, k_y) from the wavenumber domain data storage unit 32, converts the read image data F_i(k_x, k_y) into a spherical harmonic expansion series for each wavenumber component (k_x, k_y), and stores the obtained spherical harmonic expansion series data in the spherical harmonic expansion series data storage unit 33.
 For example, the spherical harmonic expansion series data can be calculated by numerical integration as
 [Math. 1]
 Alternatively, the spherical harmonic series expansion may be defined by using the fact that its inverse transform is expressed by the following equation, where the truncation order of the spherical harmonic series expansion is M.
 [Math. 2]
 Here, the matrix Y and the vector A are defined by the following equations, respectively.
 [Math. 3]
 Further, a vector is defined by arranging only the components of wavenumber (k_x, k_y) from the wavenumber-domain teacher image data. This vector is expressed as
 [Math. 4]
 Using these, the spherical harmonic series expansion can also be computed as
 [Math. 5]
 where Y+ is the pseudo-inverse of the matrix Y.
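The pseudo-inverse computation A = Y+F described above can be sketched as follows for a single wavenumber component. As an assumption for the sketch, the basis is truncated at degree 1 and written out explicitly as a stand-in for the matrix Y, and the coefficients A_true are synthetic so that the recovery can be checked.

```python
import numpy as np

def sh_basis(theta, phi):
    """Complex spherical harmonics up to degree 1, ordered
    Y_0^0, Y_1^-1, Y_1^0, Y_1^1 -- a small explicit stand-in for the
    truncated basis (truncation order M = 1)."""
    theta = np.asarray(theta, dtype=float)
    phi = np.asarray(phi, dtype=float)
    c1 = 0.5 * np.sqrt(1.5 / np.pi)
    return np.stack([
        np.full(theta.shape, 0.5 / np.sqrt(np.pi)) + 0j,
        c1 * np.sin(theta) * np.exp(-1j * phi),
        0.5 * np.sqrt(3.0 / np.pi) * np.cos(theta) + 0j,
        -c1 * np.sin(theta) * np.exp(1j * phi),
    ], axis=-1)

rng = np.random.default_rng(1)
N = 12                                    # number of camera directions
theta = rng.uniform(0.2, np.pi - 0.2, N)  # elevation angles theta_i
phi = rng.uniform(0.0, 2 * np.pi, N)      # azimuth angles phi_i

Y = sh_basis(theta, phi)                  # matrix Y, shape (N, 4)

# Synthetic ground-truth coefficients A_mn for one wavenumber (k_x, k_y),
# and the corresponding vector of observed values F_i(k_x, k_y).
A_true = rng.normal(size=4) + 1j * rng.normal(size=4)
F_vec = Y @ A_true

# A = Y^+ F, with Y^+ the Moore-Penrose pseudo-inverse of Y.
A = np.linalg.pinv(Y) @ F_vec
assert np.allclose(A, A_true)
```

With N camera directions and a full-column-rank Y, the pseudo-inverse coincides with the least-squares solution, which is why the synthetic coefficients are recovered exactly here.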
 (4) Generation of image data corresponding to a virtual viewpoint
 Under the control of the spherical harmonic inverse transform processing unit 14, the control unit 1A of the multi-viewpoint image generation apparatus FGA monitors designation input of a virtual viewpoint in step S15. In this state, when the user designates an arbitrary virtual viewpoint on the input/output device 7, for example, the input/output device 7 transmits data designating the shooting direction from that virtual viewpoint.
 Upon receiving the data designating the shooting direction from the virtual viewpoint, the spherical harmonic inverse transform processing unit 14 reads the spherical harmonic expansion series data from the spherical harmonic expansion series data storage unit 33 in step S16, generates wavenumber-domain image data viewed from the virtual viewpoint on the basis of the data designating the shooting direction from the virtual viewpoint and the read spherical harmonic expansion series data, and stores the generated wavenumber-domain image data viewed from the virtual viewpoint in the generated image data storage unit 34.
 For example, letting (θ^, φ^) be the shooting direction from the virtual viewpoint, the spherical harmonic inverse transform processing unit 14 first calculates a basis vector according to the following equation.
 [Math. 6]
 Next, using the basis vector, the spherical harmonic inverse transform processing unit 14 calculates the wavenumber-domain generated image data F^(k_x, k_y) by
 [Math. 7]
 The spherical harmonic inverse transform processing unit 14 performs the above calculation for every wavenumber component (k_x, k_y).
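The synthesis of step S16 can be sketched as follows: the basis vector for the virtual direction (θ^, φ^) is evaluated once and contracted against the coefficient vectors of all wavenumber components at once. The degree-1 basis and the synthetic coefficient grid are assumptions for the sketch; as a check, the virtual direction is set to a real camera direction, where the synthesized spectrum must match that camera's data.

```python
import numpy as np

def sh_basis(theta, phi):
    """Complex spherical harmonics up to degree 1, ordered
    Y_0^0, Y_1^-1, Y_1^0, Y_1^1 -- a small explicit stand-in for the
    truncated basis (truncation order M = 1)."""
    theta = np.asarray(theta, dtype=float)
    phi = np.asarray(phi, dtype=float)
    c1 = 0.5 * np.sqrt(1.5 / np.pi)
    return np.stack([
        np.full(theta.shape, 0.5 / np.sqrt(np.pi)) + 0j,
        c1 * np.sin(theta) * np.exp(-1j * phi),
        0.5 * np.sqrt(3.0 / np.pi) * np.cos(theta) + 0j,
        -c1 * np.sin(theta) * np.exp(1j * phi),
    ], axis=-1)

rng = np.random.default_rng(2)
N, H, W, K = 8, 4, 4, 4                   # illustrative sizes
theta = rng.uniform(0.3, np.pi - 0.3, N)  # camera directions theta_i
phi = rng.uniform(0.0, 2 * np.pi, N)      # camera directions phi_i
Y = sh_basis(theta, phi)                  # (N, K)

# One coefficient vector A_mn per wavenumber component on an (H, W) grid.
A = rng.normal(size=(H, W, K)) + 1j * rng.normal(size=(H, W, K))

# Wavenumber-domain views implied by A: F_i = sum_mn A_mn * Y_mn(dir_i).
F_train = np.einsum('hwk,nk->nhw', A, Y)

# Step S16: basis vector for the virtual viewpoint, then synthesis of
# F^(k_x, k_y) for every wavenumber component at once.
theta_hat, phi_hat = theta[0], phi[0]     # reuse camera 1's direction as a check
y_hat = sh_basis(theta_hat, phi_hat)      # basis vector, shape (K,)
F_hat = np.einsum('hwk,k->hw', A, y_hat)

# At a real camera direction the synthesized view matches that camera's data.
assert np.allclose(F_hat, F_train[0])
```

For a genuinely new direction, the same contraction interpolates between the camera views through the smooth spherical harmonic basis.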
 (5) Conversion into spatial-domain image data
 Finally, under the control of the inverse Fourier transform processing unit 15, the control unit 1A of the multi-viewpoint image generation apparatus FGA reads, in step S17, the wavenumber-domain image data F^(k_x, k_y) viewed from the virtual viewpoint from the generated image data storage unit 34, and transforms the read wavenumber-domain image data F^(k_x, k_y) into spatial-domain image data by inverse Fourier transform processing.
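The inverse transform of step S17 amounts to a 2-D inverse FFT. A minimal sketch, with the input spectrum built from a known image so the round trip can be checked:

```python
import numpy as np

# A wavenumber-domain generated image F^(k_x, k_y); here it is obtained by
# transforming a known spatial image so that the round trip is verifiable.
rng = np.random.default_rng(3)
spatial = rng.random((8, 8))
F_hat = np.fft.fft2(spatial)

# Step S17: inverse Fourier transform back to the spatial domain.  A spectrum
# synthesized from spherical harmonic coefficients is not exactly
# conjugate-symmetric in general, so the small imaginary residue is discarded.
generated = np.fft.ifft2(F_hat).real
assert np.allclose(generated, spatial)
```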
 The control unit 1A of the multi-viewpoint image generation apparatus FGA transmits the spatial-domain image data from the input/output I/F unit 4 to the input/output device 7.
 The input/output device 7 thus displays the image data generated for the virtual viewpoint designated by the user.
 (Operation and effects)
 As described above, in the first embodiment, teacher image data photographed from a plurality of viewpoints and attribute data indicating their shooting directions are acquired, the teacher image data are transformed into wavenumber-domain image data by Fourier transform, and the result is then converted into spherical harmonic expansion series data for each wavenumber component. On the basis of the spherical harmonic expansion series data and information indicating the shooting direction from a virtual viewpoint designated by the user, an inverse spherical harmonic operation is performed to generate wavenumber-domain image data viewed from the virtual viewpoint, and the generated image data is transformed into spatial-domain image data by inverse Fourier transform and output to the input/output device 7.
 Accordingly, compared with, for example, representing the three-dimensional shape of a subject using voxels, image data viewed from a virtual viewpoint can be generated from a smaller amount of data, which makes it possible to reduce the processing load and shorten the processing time of the image processing in the multi-viewpoint image generation apparatus FGA.
 [Second embodiment]
 In the second embodiment of the present invention, a post-filter learned in advance is applied to the data converted into the spherical harmonic expansion series, thereby generating spherical harmonic expansion series data that minimizes the error between the generated image data and the target image data, and the generated spherical harmonic expansion series data is subjected to the inverse transform processing.
 (Configuration example)
 FIG. 5 is a block diagram showing the software configuration of a multi-viewpoint image generation apparatus FGB according to the second embodiment of the present invention.
 In FIG. 5, the same parts as in FIG. 2 are denoted by the same reference numerals, and detailed description thereof is omitted. The hardware configuration of the multi-viewpoint image generation apparatus FGB is the same as in FIG. 1, and its description is also omitted.
 The control unit 1B of the multi-viewpoint image generation apparatus FGB includes, in addition to the teacher data acquisition processing unit 11, the Fourier transform processing unit 12, the spherical harmonic series expansion processing unit 13, the spherical harmonic inverse transform processing unit 14, and the inverse Fourier transform processing unit 15, a spherical harmonic expansion series optimization processing unit 16. Each of these processing units 11 to 16 is realized by causing the hardware processor of the control unit 1B to execute an application program stored in the program storage unit 2.
 Also in this second embodiment, some or all of the processing units 11 to 16 may be realized using hardware such as an LSI or an ASIC.
 The spherical harmonic expansion series optimization processing unit 16 includes, for example, a multilayer neural network constituting a post-filter, takes as input the spherical harmonic expansion series data output from the spherical harmonic series expansion processing unit 13, and outputs optimized spherical harmonic expansion series data. The spherical harmonic expansion series optimization processing unit 16 stores the output optimized spherical harmonic expansion series data in the spherical harmonic expansion series data storage unit 33.
 The spherical harmonic inverse transform processing unit 14 acquires, via the input/output I/F unit 4, the data representing the shooting direction from the virtual viewpoint input on the input/output device 7, and also reads the optimized spherical harmonic expansion series data from the spherical harmonic expansion series data storage unit 33. It then applies the inverse spherical harmonic transform to the optimized spherical harmonic expansion series data, thereby generating wavenumber-domain image data corresponding to the shooting direction viewed from the virtual viewpoint, and stores the generated wavenumber-domain image data in the generated image data storage unit 34.
 (Operation example)
 Next, an operation example of the multi-viewpoint image generation apparatus FGB configured as described above will be described.
 FIG. 6 is a flowchart showing an example of the processing procedure and processing contents of the multi-viewpoint image generation processing executed by the control unit 1B of the multi-viewpoint image generation apparatus FGB. In FIG. 6, steps performing the same processing as in FIG. 3 are denoted by the same reference numerals, and detailed description thereof is omitted.
 When the spherical harmonic expansion series data is output from the spherical harmonic series expansion processing unit 13, the control unit 1B of the multi-viewpoint image generation apparatus FGB optimizes the spherical harmonic expansion series data in step S19 under the control of the spherical harmonic expansion series optimization processing unit 16.
 For example, the spherical harmonic expansion series optimization processing unit 16 uses a multilayer neural network: it inputs the spherical harmonic expansion series data A_mn(k_x, k_y) output from the spherical harmonic series expansion processing unit 13 into the multilayer neural network, which outputs optimized spherical harmonic expansion series data A~_mn(k_x, k_y). Specifically, the multilayer neural network constitutes a post-filter that generates and outputs spherical harmonic expansion series data A~_mn(k_x, k_y) minimizing the error between the generated image data and the target image data.
 The spherical harmonic expansion series optimization processing unit 16 then stores the output optimized spherical harmonic expansion series data A~_mn(k_x, k_y) in the spherical harmonic expansion series data storage unit 33.
 When optimizing the spherical harmonic expansion series data A_mn(k_x, k_y), the spherical harmonic expansion series optimization processing unit 16 may, for example, repeat the optimization processing a preset number of times. FIG. 7 shows an example of this operation.
 That is, the spherical harmonic expansion series optimization processing unit 16 inputs the spherical harmonic expansion series data A_mn(k_x, k_y) into the multilayer neural network module 161 for optimization as shown in FIG. 7(a), and then inputs the output into the multilayer neural network module 161 again for optimization as shown in FIG. 7(b). After repeating this optimization processing the preset number of times, it outputs the finally obtained optimized spherical harmonic expansion series data A~_mn(k_x, k_y) as shown in FIG. 7(c) and stores it in the spherical harmonic expansion series data storage unit 33.
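The repeated application of the module 161 in FIG. 7 can be sketched as a fixed number of passes through the same function. The linear "post-filter" below is a toy stand-in purely for illustration; an actual module 161 would be a learned multilayer network.

```python
import numpy as np

def postfilter(coeffs, weights):
    """Illustrative stand-in for the multilayer neural net module 161:
    a single linear pass over the coefficient vector (a real module would
    be a learned network, e.g. trained by error backpropagation)."""
    return weights @ coeffs

def optimize_coefficients(coeffs, weights, n_iter=3):
    """Apply the same post-filter module a preset number of times, as in
    FIG. 7(a)-(c)."""
    for _ in range(n_iter):
        coeffs = postfilter(coeffs, weights)
    return coeffs

rng = np.random.default_rng(4)
K = 4
A = rng.normal(size=K)              # coefficients A_mn for one wavenumber
Wt = np.eye(K) * 0.5                # toy weights: shrink by half per pass
A_opt = optimize_coefficients(A, Wt, n_iter=3)

# Three passes of x0.5 give an overall factor of 0.125.
assert np.allclose(A_opt, A * 0.125)
```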
 A method such as error backpropagation can be applied to learning the parameters of the multilayer neural network constituting the spherical harmonic expansion series optimization processing unit 16.
 The spherical harmonic expansion series optimization processing unit 16 may also output optimized spherical harmonic expansion series data A~_mn(k_x, k_y) having a higher order than the input spherical harmonic expansion series data A_mn(k_x, k_y).
 In this case, the spherical harmonic expansion series optimization processing unit 16 inputs into the multilayer neural network, for example, initial spherical harmonic expansion series data A(0)_mn(k_x, k_y) having the same order as the optimized spherical harmonic expansion series data A~_mn(k_x, k_y), instead of the spherical harmonic expansion series data A_mn(k_x, k_y) output from the spherical harmonic series expansion processing unit 13.
 The initial spherical harmonic expansion series data A(0)_mn(k_x, k_y) can be generated, for example, by initializing with zeros or random numbers the higher-order terms of the spherical harmonic expansion series that are not included in the spherical harmonic expansion series data A_mn(k_x, k_y).
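The zero-initialization of the higher-order terms can be sketched as follows, using the fact that there are (M + 1)² spherical harmonics up to degree M. The orders M_in = 1 and M_out = 3 are assumptions for the sketch.

```python
import numpy as np

# Low-order coefficients A_mn produced by the expansion processing unit 13:
# degrees 0..M_in give (M_in + 1)^2 coefficients in total.
M_in, M_out = 1, 3
A_low = np.arange((M_in + 1) ** 2, dtype=complex)    # 4 coefficients

# Initial data A(0)_mn at the higher target order: keep the known low-order
# terms and zero-initialize the terms of degrees M_in+1 .. M_out.
A_init = np.zeros((M_out + 1) ** 2, dtype=complex)   # 16 coefficients
A_init[: A_low.size] = A_low

assert A_init.size == 16
assert np.allclose(A_init[:4], A_low) and np.all(A_init[4:] == 0)
```

Random initialization of the slice `A_init[4:]` would work the same way; either choice only seeds the network input, which the post-filter then refines.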
 Next, the control unit 1B of the multi-viewpoint image generation apparatus FGB inputs the data designating the shooting direction from the virtual viewpoint entered by the user on the input/output device 7, together with the optimized spherical harmonic expansion series data A~_mn(k_x, k_y) stored in the spherical harmonic expansion series data storage unit 33, into the spherical harmonic inverse transform processing unit 14, which generates wavenumber-domain image data viewed from the virtual viewpoint. This processing is the same as that described in the first embodiment.
 (Operation and effects)
 As described above, in the second embodiment, the spherical harmonic expansion series data obtained by the spherical harmonic series expansion processing unit 13 is input into the spherical harmonic expansion series optimization processing unit 16 and optimized there.
 Accordingly, even when, for example, the number of training images is insufficient for the desired order of the spherical harmonic expansion series and the accuracy of the generated multi-viewpoint image data would otherwise be expected to deteriorate, optimizing the spherical harmonic expansion series data in the spherical harmonic expansion series optimization processing unit 16 suppresses this deterioration and makes it possible to generate highly accurate multi-viewpoint image data.
 [Third embodiment]
 In the third embodiment of the present invention, a spherical harmonic transform processing unit includes a basis vector calculation unit and a multilayer neural network; a basis vector of spherical harmonics corresponding to a designated shooting direction is generated, and the generated basis vector is input into the multilayer neural network, which has been trained in advance, thereby generating and outputting image data viewed from that shooting direction.
 (Configuration example)
 FIG. 8 is a block diagram showing the software configuration of a multi-viewpoint image generation apparatus FGC according to the third embodiment of the present invention.
 In FIG. 8, the same parts as in FIG. 2 are denoted by the same reference numerals, and detailed description thereof is omitted. The hardware configuration of the multi-viewpoint image generation apparatus FGC is the same as in FIG. 1, and its description is also omitted.
 The control unit 1C of the multi-viewpoint image generation apparatus FGC includes a teacher data acquisition processing unit 11 and a spherical harmonic transform processing unit 17. Both of these processing units 11 and 17 are realized by causing the hardware processor of the control unit 1C to execute an application program stored in the program storage unit 2.
 Also in this third embodiment, some or all of the processing units 11 and 17 may be realized using hardware such as an LSI or an ASIC.
 The teacher data acquisition processing unit 11 acquires the image data captured by the cameras 61 to 6N and camera attribute information representing the installation positions or shooting directions of the cameras 61 to 6N, and stores the acquired image data and camera attribute information in the teacher data storage unit 31 as teacher data. The teacher data is used for learning the parameters of the multilayer neural network of the spherical harmonic transform processing unit 17, which will be described later.
 The spherical harmonic transform processing unit 17 includes, for example, a basis vector calculation processing unit and an image data generation processing unit using a multilayer neural network. The parameters of the multilayer neural network are learned in advance, using the teacher data stored in the teacher data storage unit 31 as learning data, so that the network takes a basis vector of spherical harmonics as input and outputs the corresponding spatial-domain image data.
 The basis vector calculation processing unit acquires, via the input/output I/F unit 4, the data representing the shooting direction from the virtual viewpoint input on the input/output device 7, and calculates the basis vector of spherical harmonics corresponding to the acquired shooting direction.
 The multilayer neural network of the image data generation processing unit takes as input the basis vector of spherical harmonics generated by the basis vector calculation processing unit, converts this basis vector into spatial-domain image data corresponding to the shooting direction, and outputs the converted spatial-domain generated image data.
 (Operation example)
 Next, an operation example of the multi-viewpoint image generation apparatus FGC configured as described above will be described.
 FIG. 9 is a flowchart showing an example of the processing procedure and processing contents of the multi-viewpoint image generation processing executed by the control unit 1C of the multi-viewpoint image generation apparatus FGC. In FIG. 9, steps performing the same processing as in FIG. 3 are denoted by the same reference numerals, and detailed description thereof is omitted.
 (1) Learning of the spherical harmonic transform processing unit 17
 In the learning phase, the control unit 1C of the multi-viewpoint image generation apparatus FGC first reads each piece of teacher image data and teacher direction data from the teacher data storage unit 31, and, under the control of the spherical harmonic transform processing unit 17, learns and stores the parameters of the multilayer neural network in step S20.
 For example, the spherical harmonic transform processing unit 17 generates the spherical harmonic basis vector corresponding to the teacher direction data read from the teacher data storage unit 31. That is, letting the teacher direction data be (θ^, φ^), the basis vector is first calculated according to the following equation (referred to below as equation (1)).
 [Math. 8]
 Next, the spherical harmonic transform processing unit 17 inputs the calculated basis vector into the multilayer neural network, which outputs the corresponding spatial-domain image data.
 A method such as error backpropagation based on the error between the output spatial-domain image data and the teacher image data can be applied to learning the parameters of the multilayer neural network.
 (2) Spherical harmonic transform
 When the learning of the multilayer neural network is completed, the spherical harmonic transform processing unit 17 executes the spherical harmonic transform processing as follows.
 That is, in step S15, the spherical harmonic transform processing unit 17 first acquires, via the input/output I/F unit 4, the data representing the shooting direction from the virtual viewpoint input from the input/output device 7. Then, in step S21, the basis vector calculation processing unit of the spherical harmonic transform processing unit 17 calculates the basis vector of spherical harmonics corresponding to the acquired shooting direction.
 For example, when (θ^, φ^) is designated as the shooting direction from the virtual viewpoint, the basis vector calculation processing unit of the spherical harmonic transform processing unit 17 calculates the basis vector of spherical harmonics according to equation (1) above.
 Next, in step S22, the spherical harmonic transform processing unit 17 inputs the calculated basis vector into the multilayer neural network, which outputs the corresponding spatial-domain image data. The spherical harmonic transform processing unit 17 then transmits, in step S18, the spatial-domain image data output from the multilayer neural network from the input/output I/F unit 4 to the input/output device 7.
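As a minimal, checkable stand-in for the learning of step S20 and the inference of steps S21 and S22, the sketch below replaces the multilayer network with a single linear layer fitted in closed form by least squares. The real-valued degree-1 basis, all array sizes, and the hidden linear "teacher" model are assumptions made only so that the fit can be verified; an actual implementation would train a deeper network by backpropagation.

```python
import numpy as np

def sh_basis(theta, phi):
    """Real spherical-harmonic basis up to degree 1 -- a small stand-in for
    the basis vector of equation (1)."""
    theta = np.asarray(theta, dtype=float)
    phi = np.asarray(phi, dtype=float)
    c = np.sqrt(3.0 / (4.0 * np.pi))
    return np.stack([
        np.full(theta.shape, 0.5 / np.sqrt(np.pi)),
        c * np.sin(theta) * np.cos(phi),
        c * np.cos(theta),
        c * np.sin(theta) * np.sin(phi),
    ], axis=-1)

rng = np.random.default_rng(5)
N, H, W, K = 10, 4, 4, 4
theta = rng.uniform(0.3, np.pi - 0.3, N)   # teacher directions theta_i
phi = rng.uniform(0.0, 2 * np.pi, N)       # teacher directions phi_i
Y = sh_basis(theta, phi)                   # (N, K) basis vectors

# Teacher images produced by a hidden linear model, so the fit is checkable.
W_true = rng.normal(size=(K, H * W))
teacher_images = Y @ W_true                # (N, H*W)

# Step S20 stand-in: closed-form least-squares fit, basis vector -> image.
W_fit, *_ = np.linalg.lstsq(Y, teacher_images, rcond=None)

# Steps S21/S22: basis vector for a virtual direction, then image generation.
y_hat = sh_basis(theta[0], phi[0])
generated = (y_hat @ W_fit).reshape(H, W)
assert np.allclose(generated, teacher_images[0].reshape(H, W))
```

The closed-form fit plays the role of training; swapping `W_fit` for a learned multilayer mapping recovers the structure of the embodiment.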
 (Operation and effects)
 As described above, in the third embodiment, instead of the calculation of the spherical harmonic expansion series data in the spherical harmonic series expansion processing unit 13 described in the first and second embodiments, the parameters of a multilayer neural network are learned in the spherical harmonic transform processing unit 17.
 Accordingly, even when, for example, the teacher image data and teacher direction data are so large in scale that direct calculation of the spherical harmonic expansion series data is difficult, learning a model corresponding to the spherical harmonic expansion series and the subsequent accuracy-improving processing suppresses deterioration in the accuracy of the generated multi-viewpoint image data and makes it possible to generate highly accurate multi-viewpoint image data.
 [Fourth embodiment]
 The fourth embodiment of the present invention is a further improvement of the third embodiment, in which the multilayer neural network of the spherical harmonic transform processing unit is composed of a first multilayer neural network that generates low-resolution image data, and a second multilayer neural network that upsamples the low-resolution image data output from the first multilayer neural network and outputs high-resolution image data.
 (Configuration example)
 FIG. 10 is a block diagram showing the software configuration of a spherical harmonic transform processing unit 170 of a multi-viewpoint image generation apparatus according to the fourth embodiment of the present invention.
 球面調和関数変換処理部170は、基底ベクトル算出部171と、第1の多層ニューラルネットワーク172と、第2の多層ニューラルネットワーク173とを備えている。 The spherical harmonic transform processing unit 170 includes a basis vector calculation unit 171, a first multilayer neural network 172, and a second multilayer neural network 173.
 第1の多層ニューラルネットワーク172は、基底ベクトル算出部171から出力される基底ベクトルを入力とし、低解像度の画像データを出力する。 The first multilayer neural network 172 receives the basis vectors output from the basis vector calculation unit 171 and outputs low-resolution image data.
 The second multilayer neural network 173 upsamples the low-resolution image data output from the first multilayer neural network 172 and outputs high-resolution image data.
 (Operation Example)
 Next, the operation of the spherical harmonic transform processing unit 170 will be described.
 Here, for convenience of explanation, the output image data is assumed to have dimensions (B, C, UW, UH). In this case, the spherical harmonic transform processing unit 170 outputs B images of C channels (a monochrome image when C is "1", an RGB color image when C is "3") with height UH and width UW. Furthermore, U is a positive integer.
 First, in the spherical harmonic transform processing unit 170, the basis vector calculation unit 171 calculates basis vectors from the teacher direction data, or from the shooting direction (θ^, φ^) of the virtual viewpoint, according to equation (1).
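Equation (1) is not reproduced in this excerpt, so the following is only a hedged sketch of what a basis vector calculation for a direction (θ, φ) might look like, using the real spherical harmonics up to degree 1; the maximum degree and the normalization actually used by the basis vector calculation unit 171 are assumptions.

```python
import numpy as np

def sh_basis(theta, phi):
    """Real spherical harmonic basis for a direction (theta, phi).

    theta: polar angle, phi: azimuth. Only degrees 0 and 1 are shown;
    the degree actually used by equation (1) is not disclosed here.
    """
    return np.array([
        0.5 * np.sqrt(1.0 / np.pi),                                # Y_0^0
        np.sqrt(3.0 / (4 * np.pi)) * np.sin(theta) * np.sin(phi),  # Y_1^-1
        np.sqrt(3.0 / (4 * np.pi)) * np.cos(theta),                # Y_1^0
        np.sqrt(3.0 / (4 * np.pi)) * np.sin(theta) * np.cos(phi),  # Y_1^1
    ])

b = sh_basis(np.pi / 2, 0.0)  # a direction on the equator, phi = 0
```

The resulting vector b is what would be fed to the first multilayer neural network 172 as its input.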
 Next, the spherical harmonic transform processing unit 170 inputs the calculated basis vectors to the first multilayer neural network 172 to generate low-resolution image data. The first multilayer neural network 172 may be configured in any way, except that its first layer is a fully connected layer and the output low-resolution image data has dimensions (B, C, W, H).
 Subsequently, the spherical harmonic transform processing unit 170 inputs the low-resolution image data output from the first multilayer neural network 172 to the second multilayer neural network 173, which outputs high-resolution image data. The second multilayer neural network 173 may be configured in any way, except that the input low-resolution image data has dimensions (B, C, W, H) and the output high-resolution image data has dimensions (B, C, UW, UH).
 Finally, the spherical harmonic transform processing unit 170 outputs the high-resolution image data output from the second multilayer neural network 173 to the input/output device 7 via the input/output I/F unit 4.
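The two-stage shape contract described above can be sketched as follows. Since the patent fixes only the tensor shapes, not the layers, the fully connected stand-in for network 172 and the nearest-neighbour upsampling stand-in for network 173 are assumptions, as are the toy sizes B, C, W, H, U, and D.

```python
import numpy as np

# Assumed toy sizes; only the shapes (B, C, W, H) -> (B, C, UW, UH) matter.
B, C, W, H, U = 2, 3, 4, 4, 2
D = 4  # length of the spherical harmonic basis vector (assumption)

rng = np.random.default_rng(0)

def first_network(basis):
    """Stand-in for network 172: a single fully connected layer whose
    output is reshaped into low-resolution images of shape (B, C, W, H)."""
    weights = rng.standard_normal((D, B * C * W * H))
    return np.tanh(basis @ weights).reshape(B, C, W, H)

def second_network(low_res):
    """Stand-in for network 173: nearest-neighbour upsampling by a
    factor U along both spatial axes, giving shape (B, C, UW, UH)."""
    return low_res.repeat(U, axis=2).repeat(U, axis=3)

basis = rng.standard_normal(D)      # output of basis vector unit 171
low = first_network(basis)          # (B, C, W, H)
high = second_network(low)          # (B, C, U*W, U*H)
```

In practice the second stage would be a learned super-resolution network rather than plain nearest-neighbour repetition; only the shape contract is taken from the text.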
 (Operation and Effects)
 In the fourth embodiment of the present invention, the multilayer neural network included in the spherical harmonic transform processing unit 170 is configured by connecting in tandem a first multilayer neural network 172 that generates low-resolution image data and a second multilayer neural network 173 that upsamples the low-resolution image data output from the first multilayer neural network 172 to increase its resolution. This makes model training more efficient and reduces the size of the trained model.
 [Other Embodiments]
 (1) In each of the above embodiments, the case where the teacher image data is acquired from the plurality of cameras 61 to 6N has been described. However, the present invention is not limited to this; for example, teacher image data captured from a plurality of different viewpoints may be acquired sequentially using a single camera, or the teacher image data captured from the plurality of viewpoints may be temporarily accumulated in a storage server, database, or the like, from which the multi-viewpoint image generation devices FGA, FGB, and FGC acquire the teacher image data collectively.
 (2) In each of the above embodiments, the case where only one shooting direction from a virtual viewpoint is designated has been described as an example. However, shooting directions corresponding to a plurality of virtual viewpoints may be designated collectively, and the multi-viewpoint image generation devices FGA, FGB, and FGC may each generate and output image data viewed from the plurality of designated virtual viewpoints.
 (3) As for the spherical harmonic series expansion processing unit 13, a conversion model using, for example, a convolutional neural network may be created and prepared in advance, and the unit may be configured so that teacher image data in the wavenumber domain and the teacher directions are input to this conversion model, which outputs the data expanded into a spherical harmonic series. Similarly, the inverse spherical harmonic transform processing unit 14 may also be configured with a conversion model using a convolutional neural network, to which the spherical harmonic series data and data designating the shooting direction from the virtual viewpoint are input, and which outputs image data in the wavenumber domain.
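For contrast with the neural conversion models just mentioned, the direct spherical harmonic series expansion of one wavenumber component over the teacher directions can be sketched as an ordinary least-squares fit of the expansion coefficients; the degree-1 basis, the solver, and the synthetic data below are assumptions for illustration, not the patent's actual formulas.

```python
import numpy as np

def basis_row(theta, phi):
    # Real spherical harmonics up to degree 1 (assumed basis).
    return np.array([
        0.5 * np.sqrt(1.0 / np.pi),
        np.sqrt(3.0 / (4 * np.pi)) * np.sin(theta) * np.sin(phi),
        np.sqrt(3.0 / (4 * np.pi)) * np.cos(theta),
        np.sqrt(3.0 / (4 * np.pi)) * np.sin(theta) * np.cos(phi),
    ])

rng = np.random.default_rng(1)
thetas = rng.uniform(0.1, np.pi - 0.1, 50)  # teacher directions
phis = rng.uniform(0.0, 2 * np.pi, 50)
A = np.stack([basis_row(t, p) for t, p in zip(thetas, phis)])

# Synthetic "observed" values of one wavenumber component per direction.
true_coeffs = np.array([1.0, -0.5, 0.25, 2.0])
values = A @ true_coeffs

# Expansion step: solve for the spherical harmonic coefficients.
coeffs, *_ = np.linalg.lstsq(A, values, rcond=None)

# Inverse step (role of unit 14): evaluate the series at a virtual
# viewpoint's direction to synthesize that component.
predicted = basis_row(0.3, 1.2) @ coeffs
```

With noiseless data the fit recovers the coefficients exactly; a learned conversion model replaces this fit when the data volume makes the direct computation impractical.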
 (4) In addition, the configuration and installation location of the multi-viewpoint image generation device, the type of neural network, the processing procedure and content of the multi-viewpoint image generation processing, the type of object serving as the subject, and the like may also be variously modified without departing from the gist of the present invention.
 Although the embodiments of the present invention have been described in detail above, the foregoing description is in all respects merely illustrative of the present invention. Needless to say, various improvements and modifications can be made without departing from the scope of the present invention. That is, in carrying out the present invention, a specific configuration according to each embodiment may be adopted as appropriate.
 In short, the present invention is not limited to the above embodiments as they are, and at the implementation stage the constituent elements can be modified and embodied without departing from the gist of the invention. Furthermore, various inventions can be formed by appropriately combining the plurality of constituent elements disclosed in the above embodiments. For example, some constituent elements may be deleted from all the constituent elements shown in each embodiment. Moreover, constituent elements of different embodiments may be combined as appropriate.
 FGA, FGB, FGC … Multi-viewpoint image generation device
 1A, 1B, 1C … Control unit
 2 … Program storage unit
 3 … Data storage unit
 4 … Input/output I/F unit
 5 … Bus
 6 … Camera
 7 … Input/output device
 11 … Teacher data acquisition processing unit
 12 … Fourier transform processing unit
 13 … Spherical harmonic series expansion processing unit
 14 … Inverse spherical harmonic transform processing unit
 15 … Inverse Fourier transform processing unit
 16 … Spherical harmonic expansion series optimization processing unit
 17, 170 … Spherical harmonic transform processing unit
 31 … Teacher data storage unit
 32 … Wavenumber domain data storage unit
 33 … Spherical harmonic expansion series data storage unit
 34 … Generated image data storage unit
 161 … Multilayer neural network module
 171 … Basis vector calculation unit
 172, 173 … Multilayer neural network

Claims (10)

  1.  A multi-viewpoint image generation device comprising:
     a Fourier transform processing unit that transforms teacher images captured from a plurality of viewpoints into teacher images in the wavenumber domain;
     a spherical harmonic series expansion processing unit that expands the teacher images in the wavenumber domain into a spherical harmonic series for each wavenumber component;
     an inverse spherical harmonic transform processing unit that generates a generated image in the wavenumber domain corresponding to an arbitrary virtual viewpoint different from the plurality of viewpoints, based on the spherical harmonic expansion series obtained by the spherical harmonic series expansion processing unit and information designating a shooting direction from the virtual viewpoint; and
     an inverse Fourier transform processing unit that transforms the generated image in the wavenumber domain into a generated image in the spatial domain.
  2.  The multi-viewpoint image generation device according to claim 1, further comprising a spherical harmonic expansion series optimization processing unit that receives as input the spherical harmonic expansion series obtained by the spherical harmonic series expansion processing unit and outputs an optimized spherical harmonic expansion series.
  3.  The multi-viewpoint image generation device according to claim 1 or 2, wherein the spherical harmonic series expansion processing unit expands the teacher images in the wavenumber domain into the spherical harmonic series for each wavenumber component, with the shooting directions from the plurality of viewpoints as teacher directions.
  4.  The multi-viewpoint image generation device according to claim 1 or 2, further comprising: a teacher data acquisition processing unit that acquires the teacher images captured from the plurality of viewpoints from an external camera or database; and a transmission processing unit that transmits the generated image in the spatial domain, transformed by the inverse Fourier transform processing unit, to the device from which the shooting direction from the virtual viewpoint was designated.
  5.  A multi-viewpoint image generation device that generates, using spherical harmonics, an image corresponding to a shooting direction from an arbitrary virtual viewpoint, the device comprising:
     a spherical harmonic transform processing unit, wherein
     the spherical harmonic transform processing unit comprises:
      a basis vector calculation processing unit that, when information designating the shooting direction from the arbitrary virtual viewpoint is input, calculates basis vectors of spherical harmonics corresponding to the acquired shooting direction; and
      an image generation processing unit that receives the calculated basis vectors as input and generates and outputs an image in the spatial domain corresponding to the shooting direction.
  6.  The multi-viewpoint image generation device according to claim 5, wherein the image generation processing unit comprises:
      a first neural network that receives the basis vectors as input and generates and outputs a first image having a first resolution; and
      a second neural network that receives the first image output from the first neural network as input, generates a second image having a second resolution higher than the first resolution, and outputs the generated second image as the image in the spatial domain corresponding to the shooting direction.
  7.  A multi-viewpoint image generation method executed by an information processing device, the method comprising:
     a first processing step of transforming teacher images captured from a plurality of viewpoints into teacher images in the wavenumber domain;
     a second processing step of expanding the teacher images in the wavenumber domain into a spherical harmonic series for each wavenumber component;
     a third processing step of generating a generated image in the wavenumber domain corresponding to an arbitrary virtual viewpoint different from the plurality of viewpoints, based on the spherical harmonic expansion series obtained in the second processing step and information designating a shooting direction from the virtual viewpoint; and
     a fourth processing step of transforming the generated image in the wavenumber domain into a generated image in the spatial domain.
  8.  The multi-viewpoint image generation method according to claim 7, further comprising a fifth processing step of receiving as input the spherical harmonic expansion series obtained in the second processing step and outputting an optimized spherical harmonic expansion series.
  9.  A multi-viewpoint image generation method in which an information processing device executes processing for generating, using spherical harmonics, an image corresponding to a shooting direction from an arbitrary virtual viewpoint, the method comprising:
     a step of calculating, when information designating the shooting direction from the arbitrary virtual viewpoint is input, basis vectors of spherical harmonics corresponding to the acquired shooting direction; and
     a step of generating and outputting, with the calculated basis vectors as input, an image in the spatial domain corresponding to the shooting direction.
  10.  A program that causes a processor included in the multi-viewpoint image generation device according to any one of claims 1 to 6 to execute the processing performed by each of the processing units included in the multi-viewpoint image generation device.
PCT/JP2022/006981 2021-11-24 2022-02-21 Multi-viewpoint image generation device, method, and program WO2023095353A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/JP2022/043195 WO2023095792A1 (en) 2021-11-24 2022-11-22 Multi-viewpoint image generation device, method, and program

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JPPCT/JP2021/043015 2021-11-24
PCT/JP2021/043015 WO2023095212A1 (en) 2021-11-24 2021-11-24 Multi-viewpoint image generation device, method, and program

Publications (1)

Publication Number Publication Date
WO2023095353A1 true WO2023095353A1 (en) 2023-06-01

Family

ID=86539084

Family Applications (3)

Application Number Title Priority Date Filing Date
PCT/JP2021/043015 WO2023095212A1 (en) 2021-11-24 2021-11-24 Multi-viewpoint image generation device, method, and program
PCT/JP2022/006981 WO2023095353A1 (en) 2021-11-24 2022-02-21 Multi-viewpoint image generation device, method, and program
PCT/JP2022/043195 WO2023095792A1 (en) 2021-11-24 2022-11-22 Multi-viewpoint image generation device, method, and program

Family Applications Before (1)

Application Number Title Priority Date Filing Date
PCT/JP2021/043015 WO2023095212A1 (en) 2021-11-24 2021-11-24 Multi-viewpoint image generation device, method, and program

Family Applications After (1)

Application Number Title Priority Date Filing Date
PCT/JP2022/043195 WO2023095792A1 (en) 2021-11-24 2022-11-22 Multi-viewpoint image generation device, method, and program

Country Status (1)

Country Link
WO (3) WO2023095212A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH06162173A (en) * 1992-11-20 1994-06-10 Mitsubishi Electric Corp Three-dimensional body recognizing device
JP2007128009A (en) * 2005-11-07 2007-05-24 Research Organization Of Information & Systems Imaging device and imaging method using out-of-focus structure
JP2010176325A (en) * 2009-01-28 2010-08-12 Ntt Docomo Inc Device and method for generating optional viewpoint image
JP2017199235A (en) * 2016-04-28 2017-11-02 株式会社朋栄 Focus correction processing method by learning type algorithm

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI303791B (en) * 2002-03-21 2008-12-01 Microsoft Corp Graphics image rendering with radiance self-transfer for low-frequency lighting environments

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH06162173A (en) * 1992-11-20 1994-06-10 Mitsubishi Electric Corp Three-dimensional body recognizing device
JP2007128009A (en) * 2005-11-07 2007-05-24 Research Organization Of Information & Systems Imaging device and imaging method using out-of-focus structure
JP2010176325A (en) * 2009-01-28 2010-08-12 Ntt Docomo Inc Device and method for generating optional viewpoint image
JP2017199235A (en) * 2016-04-28 2017-11-02 株式会社朋栄 Focus correction processing method by learning type algorithm

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
AKIRA KUBOTA, KAZUYA KODAMA, YOSHINORI HATORI: "A view interpolation method without depth estimation and its stability analysis for generating a center view image using multiple images of a circular camera array", IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, DENSHI JOUHOU TSUUSHIN GAKKAI, JOUHOU SHISUTEMU SOSAIETI, JP, vol. J90-D, no. 4, 1 April 2007 (2007-04-01), JP , pages 1063 - 1072, XP009546089, ISSN: 1880-4535 *
IZAWA IPPEITA, SHUN NONOSHITA, KAZUYA KODAMA, TAKAYUK HAMAMOTO: "FPGA-based Real-Time Free Viewpoint Image Reconstruction from 3-D Multi-focus Imaging Sequences", IEICE TECHNICAL REPORT IE2010-50(2010-07), vol. 34, no. 32, 26 July 2010 (2010-07-26), pages 7 - 12, XP093068639 *

Also Published As

Publication number Publication date
WO2023095212A1 (en) 2023-06-01
WO2023095792A1 (en) 2023-06-01

Similar Documents

Publication Publication Date Title
US10311547B2 (en) Image upscaling system, training method thereof, and image upscaling method
CN111386550A (en) Unsupervised learning of image depth and ego-motion predictive neural networks
US20180137611A1 (en) Novel View Synthesis Using Deep Convolutional Neural Networks
US20200098144A1 (en) Transforming grayscale images into color images using deep neural networks
CN114549731A (en) Method and device for generating visual angle image, electronic equipment and storage medium
JP7355851B2 (en) Method and apparatus for identifying videos
WO2020146911A2 (en) Multi-stage multi-reference bootstrapping for video super-resolution
JP2022522564A (en) Image processing methods and their devices, computer equipment and computer programs
CN115690382B (en) Training method of deep learning model, and method and device for generating panorama
CN112887728A (en) Electronic device, control method and system of electronic device
WO2020092051A1 (en) Rolling shutter rectification in images/videos using convolutional neural networks with applications to sfm/slam with rolling shutter images/videos
CN111429501A (en) Depth map prediction model generation method and device and depth map prediction method and device
TWI834814B (en) Method and system for providing rotational invariant neural networks
CN114640885B (en) Video frame inserting method, training device and electronic equipment
CN115375838A (en) Binocular gray image three-dimensional reconstruction method based on unmanned aerial vehicle
JP6521352B2 (en) Information presentation system and terminal
KR20210109244A (en) Device and Method for Image Style Transfer
WO2023095353A1 (en) Multi-viewpoint image generation device, method, and program
JP7378500B2 (en) Autoregressive video generation neural network
CN115375780A (en) Color difference calculation method and device, electronic equipment, storage medium and product
CN110830848B (en) Image interpolation method, image interpolation device, computer equipment and storage medium
CN114820745A (en) Monocular visual depth estimation system, method, computer device, and computer-readable storage medium
JP6892557B2 (en) Learning device, image generator, learning method, image generation method and program
JP6315542B2 (en) Image generating apparatus and image generating method
CN112465716A (en) Image conversion method and device, computer readable storage medium and electronic equipment

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22898139

Country of ref document: EP

Kind code of ref document: A1