WO2018087856A1 - Image synthesis device and image synthesis method - Google Patents

Image synthesis device and image synthesis method

Info

Publication number
WO2018087856A1
Authority
WO
WIPO (PCT)
Prior art keywords
video
imaging
pixel
imaging device
imaging devices
Prior art date
Application number
PCT/JP2016/083316
Other languages
French (fr)
Japanese (ja)
Inventor
Kohei OKAHARA
Ichiro FURUKI
Tsukasa FUKASAWA
Original Assignee
Mitsubishi Electric Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Mitsubishi Electric Corporation
Priority to JP2018549688A priority Critical patent/JP6513305B2/en
Priority to PCT/JP2016/083316 priority patent/WO2018087856A1/en
Publication of WO2018087856A1 publication Critical patent/WO2018087856A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 3/00 Geometric image transformation in the plane of the image
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 23/00 Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N 23/60 Control of cameras or camera modules

Definitions

  • the present invention relates to a video composition device and a video composition method for generating one composite video from a plurality of videos (that is, a plurality of video data) acquired by a plurality of imaging devices.
  • a video synthesis process in which a plurality of videos acquired by shooting with a plurality of imaging devices (that is, a plurality of cameras) are combined to generate one combined video.
  • video processing such as lens distortion correction processing, viewpoint conversion processing, and projection conversion processing is performed on each of the plurality of videos output from the plurality of imaging devices. Since the processing load of this video processing is very large, it is difficult to perform it in real time with an ordinary arithmetic device (CPU: Central Processing Unit). Therefore, in conventional devices, the video composition processing is performed by a GPU (Graphics Processing Unit), which is a parallel arithmetic device that can operate in parallel with an ordinary arithmetic device.
  • GPU: Graphics Processing Unit
  • the present invention has been made to solve the above-described conventional problems, and its object is to provide a video composition device and a video composition method capable of performing, in a short time, video composition processing that generates one composite video from a plurality of videos acquired by a plurality of imaging devices, even when the number of imaging devices increases.
  • a video composition device according to one aspect is a video composition device that generates one composite video from a plurality of videos acquired by a plurality of imaging devices, and includes: a video receiving unit that receives the plurality of videos; a parameter input unit to which camera parameters of the plurality of imaging devices are input; and a video processing unit that generates the composite video from the plurality of videos. Using the camera parameters input in advance, the video processing unit creates a reference table including, for each pixel of the composite video, first imaging device identification information identifying the corresponding imaging device among the plurality of imaging devices, a corresponding first pixel position in the imaging device identified by the first imaging device identification information, and a first weighting coefficient at the corresponding first pixel position. The video processing unit then generates the composite video by referring to the reference table and substituting, for each pixel of the composite video, a first value obtained by multiplying the pixel value at the corresponding first pixel position in the identified imaging device by the first weighting coefficient.
  • a video composition method according to another aspect is a video composition method for generating one composite video from a plurality of videos acquired by a plurality of imaging devices, and includes: a step of creating, using camera parameters input in advance for the plurality of imaging devices, a first reference table including, for each pixel of the composite video, first imaging device identification information identifying the corresponding imaging device among the plurality of imaging devices, a corresponding first pixel position in the imaging device identified by the first imaging device identification information, and a first weighting coefficient at the corresponding first pixel position; and a step of generating the composite video by referring to the first reference table and substituting, for each pixel of the composite video, a first value obtained by multiplying the pixel value at the corresponding first pixel position in the identified imaging device by the first weighting coefficient.
  • video composition processing for generating one composite video from a plurality of videos acquired by a plurality of imaging devices can be performed in a short time.
  • FIG. 1 is a functional block diagram schematically showing the configuration of a video composition device according to Embodiment 1.
  • FIG. 2 is a hardware configuration diagram schematically showing the video composition device according to Embodiment 1.
  • FIG. 3 is a diagram illustrating an example of the correspondence between pixels of a composite video and pixels of a plurality of imaging devices in the video composition device according to Embodiment 1.
  • FIG. 4 is a diagram illustrating an example of an overlap region of the imaging ranges of a plurality of imaging devices in the video composition device according to Embodiment 1.
  • FIG. 5 is a diagram illustrating the pixel range of each imaging device included in the first reference table in the video composition device according to Embodiment 1.
  • FIG. 6 is a diagram illustrating the pixel range of each imaging device included in the second reference table in the video composition device according to Embodiment 1.
  • FIG. 7 is a flowchart showing the operation of the video composition device according to Embodiment 1 (that is, the video composition method according to Embodiment 1).
  • FIG. 8 is a diagram illustrating an example of an overlap region of the trapezoidal imaging ranges of a plurality of imaging devices in a video composition device according to Embodiment 2.
  • FIG. 9 is a diagram illustrating an example in which the imaging ranges of a plurality of imaging devices in the video composition device according to Embodiment 2 are simplified.
  • FIG. 10 is a diagram illustrating the pixel range of each imaging device included in the first reference table in the video composition device according to Embodiment 2.
  • FIG. 11 is a diagram illustrating the pixel range (overlap region) of each imaging device included in the second reference table in the video composition device according to Embodiment 2.
  • FIG. 12 is a diagram illustrating the pixel range (overlap region) of each imaging device included in the third reference table in the video composition device according to Embodiment 2.
  • FIG. 13 is a diagram illustrating the pixel range (overlap region) of each imaging device included in the fourth reference table in the video composition device according to Embodiment 2.
  • FIG. 1 is a functional block diagram schematically showing a configuration of a video composition device 1 according to Embodiment 1 of the present invention.
  • the video composition apparatus 1 is an apparatus that can perform the video composition method according to the first embodiment.
  • the video composition device 1 generates one composite video (that is, one set of composite video data) from a plurality of videos (that is, a plurality of video data) output from a plurality of imaging devices (that is, a plurality of cameras) Cam1, ..., Cami, ..., CamN.
  • N is an integer of 2 or more
  • i is an arbitrary integer of 1 or more and N or less.
  • when the video is a moving image, the video composition device 1 generates moving image data as composite video data by repeating, each time a video frame is input from the imaging devices Cam1, ..., CamN, a process of creating one composite video frame from the N video frames output from the N imaging devices. The generated composite video data is output to the display device 2.
  • the display device 2 displays a video based on the received composite video data.
  • Examples of composite video include panoramic video that is a horizontally long video with a wide field of view and an overhead video that is a video looking down from a high position.
  • a synthesized video generated by synthesizing a plurality of videos arranged in the left-right direction (one-dimensional direction) acquired by a plurality of imaging devices is a panoramic video.
  • a synthesized video generated by synthesizing a plurality of videos arranged in the vertical and horizontal directions (two-dimensional directions) acquired by a plurality of imaging devices is an overhead video.
  • the video composition device 1 creates in advance a reference table holding information on the pixels of the imaging devices Cam1, ..., CamN that correspond to each pixel of the composite video, and sets (that is, substitutes) the pixel values of the composite video pixels using this reference table.
  • the video composition device 1 includes a video receiving unit 4, a parameter input unit 5, a video processing unit 6 having a storage unit 6a, and a display processing unit 7.
  • the storage unit 6a may be provided outside the video processing unit 6.
  • the video composition device 1 shown in FIG. 1 can be realized (for example, by a computer) using a memory as the storage unit 6a that stores programs as software and a processor as an information processing unit that executes the programs stored in the memory. A part of the video composition device 1 shown in FIG. 1 may also be realized by a memory that stores a program and a processor that executes the program.
  • the video reception unit 4 receives a plurality of video data output from the plurality of imaging devices Cam1,..., CamN, and outputs the received video data to the video processing unit 6.
  • the video data decoding process may be performed by the video receiving unit 4 and the decoded video data may be output to the video processing unit 6.
  • the parameter input unit 5 receives information indicating the camera parameters of the plurality of imaging devices Cam1, ..., CamN obtained by calibration performed in advance (that is, parameter estimation for the lens and the image sensor), and outputs the information to the video processing unit 6.
  • the camera parameters include, for example, internal parameters, which are camera parameters unique to the imaging devices Cam1, ..., CamN; external parameters, which are camera parameters indicating the positions and orientations of the imaging devices Cam1, ..., CamN in the world coordinate system; and lens distortion correction coefficients (for example, a lens distortion correction map) used to correct distortion specific to the lenses of the imaging devices Cam1, ..., CamN (for example, distortion in the radial direction of the lens and distortion in the circumferential direction of the lens).
  • the video processing unit 6 creates a reference table for video composition at the time of initialization using the camera parameters calculated by the calibration performed in advance, and stores this reference table in the storage unit 6a.
  • the video processing unit 6 refers to the reference table and generates composite video data from a plurality of video data (video frames) output from the video receiving unit 4.
  • the display processing unit 7 outputs the composite video data generated by the video processing unit 6 to the display device 2.
  • FIG. 2 is a hardware configuration diagram schematically showing the video composition device 1 according to the first embodiment.
  • the video composition device 1 includes a main processor 10, a main memory 11, an auxiliary memory 12, a video processing processor 13, which is a parallel arithmetic device such as a GPU, a video processing memory 14, an input interface 15, a file interface 16, a display interface 17, and a video input interface 18.
  • the video processing unit 6 in FIG. 1 includes the main processor 10, the main memory 11, the auxiliary memory 12, the video processing processor 13, and the video processing memory 14 shown in FIG. 2.
  • the storage unit 6a in FIG. 1 includes the main memory 11, the auxiliary memory 12, and the video processing memory 14 shown in FIG.
  • the parameter input unit 5 in FIG. 1 includes the file interface 16 shown in FIG. 2, and the video receiving unit 4 in FIG. 1 includes the video input interface 18 shown in FIG. 2.
  • the display processing unit 7 in FIG. 1 includes the display interface 17 shown in FIG. 2.
  • FIG. 2 only shows an example of the hardware configuration of the video composition apparatus 1 shown in FIG. 1, and the hardware configuration can be variously changed. Further, the correspondence relationship between the functional blocks 4 to 7 shown in FIG. 1 and the hardware configurations 10 to 18 shown in FIG. 2 is not limited to the above example.
  • the parameter input unit 5 in FIG. 1 acquires the camera parameter information calculated by the calibration executed in advance from the auxiliary memory 12 and writes it to the main memory 11.
  • the auxiliary memory 12 may store camera parameters calculated by a previously executed calibration.
  • the main processor 10 may store the camera parameters in the main memory 11 through the file interface 16.
  • the main processor 10 may store a still image file in the auxiliary memory 12 when creating a composite video from a still image.
  • the input interface 15 receives device input such as mouse input, keyboard input, touch panel input, and the like, and sends input information to the main processor 10.
  • the video processing memory 14 stores the input video data transferred from the main memory 11 and the composite video data created by the video processing processor 13.
  • the display interface 17 and the display device 2 are connected by an HDMI (registered trademark) (High-Definition Multimedia Interface) cable or the like.
  • the synthesized video is output to the display device 2 via the display interface 17 as the display processing unit 7.
  • the video input interface 18 as the video receiver 4 receives video inputs from the imaging devices Cam1,..., CamN connected to the video synthesizer 1 and stores the input video in the main memory 11.
  • the imaging devices Cam1,..., CamN are, for example, network cameras, analog cameras, USB (Universal Serial Bus) cameras, HD-SDI (High Definition Serial Digital Interface) cameras, and the like. Note that the video input interface 18 uses a standard conforming to the connected device.
  • the video processing unit 6 in FIG. 1 determines the resolution W_synth × H_synth of the composite video to be created, and reserves a memory area for storing the composite video in the storage unit 6a in FIG. 1.
  • W_synth indicates the number of pixels in the horizontal direction of the rectangular composite video
  • H_synth indicates the number of pixels in the vertical direction of the composite video.
  • the video processing processor 13 determines the resolution W_synth × H_synth of the composite video to be created, and reserves a memory area for storing the composite video in the video processing memory 14.
  • the video processing unit 6 in FIG. 1 creates reference tables for the imaging devices Cam1, ..., CamN from the camera parameters (internal parameters, external parameters, lens distortion correction data, projection plane, etc.) of the imaging devices Cam1, ..., CamN input from the parameter input unit 5 in FIG. 1, and stores them in the storage unit 6a.
  • the video processor 13 creates a reference table for the imaging devices Cam1,..., CamN from the camera parameters of the imaging devices Cam1,..., CamN input from the file interface 16. And stored in the video processing memory 14.
  • FIG. 3 is a diagram illustrating an example of a correspondence relationship between a synthesized video pixel and pixels of a plurality of imaging devices Cam1,..., CamN in the video synthesis device 1 according to the first embodiment.
  • as shown in FIG. 3, the reference table for the imaging devices Cam1, ..., CamN stores, for each pixel of the composite video, the corresponding pixels (x_cam1, y_cam1), ..., (x_camN, y_camN) of the imaging devices Cam1, ..., CamN, as well as the α values at those pixels.
  • the α value is a weighting coefficient used for the blend processing in the overlap regions of the imaging ranges of the imaging devices Cam1, ..., CamN.
  • FIG. 4 is a diagram illustrating an example of a superposed region of the imaging ranges of the plurality of imaging devices Cam1,..., Cam4 in the video composition device 1 according to the first embodiment.
  • when a panoramic composite video as shown in FIG. 4 is created, an overlap region exists in the imaging ranges of adjacent imaging devices Cam1, ..., Cam4.
  • blend processing is applied to the overlap region: the pixel values of different imaging devices Cam1, ..., Cam4 are referenced, weighted by multiplying them by the weighting coefficient α, and the weighted pixel values are substituted (blended) as the pixel value of the corresponding composite video pixel, so a processing wait occurs for the video data output from the imaging devices Cam1, ..., Cam4. Note that since the blend processing is performed in the overlap region, the overlap region is also referred to as a blend region.
  • FIG. 5 is a diagram illustrating pixel ranges of the imaging devices Cam1,..., Cam4 included in the reference table (first reference table) in the video composition device 1 according to the first embodiment.
  • FIG. 6 is a diagram showing the pixel ranges of the imaging devices Cam1,..., Cam4 included in another reference table (second reference table) in the video composition device 1 according to the first embodiment.
  • the video processing unit 6 creates a reference table for video composition from the reference tables of the imaging devices Cam1, ..., CamN.
  • this reference table for video composition consists of two reference tables holding the information on the upper side of each blend region (the left imaging device) and on the lower side of each blend region (the right imaging device), that is, the first reference table shown in FIG. 5 and the second reference table shown in FIG. 6.
  • the first reference table for video composition holds, for each pixel of the composite video, the camera number i as the first imaging device identification information, the pixel (x_cami, y_cami) of the corresponding imaging device Cami, and the α value at that pixel of the corresponding imaging device Cami. Note that the α value of pixels outside the overlap regions is 1.
  • FIG. 5 shows an example, and the pixel ranges of the imaging devices Cam1,..., Cam4 included in the first reference table are not limited to the example of FIG.
  • the second reference table for video composition holds, as second imaging device identification information identifying the imaging devices whose imaging ranges overlap among the plurality of imaging ranges captured by the imaging devices Cam1, ..., CamN, the camera number i, the pixel (x_cami, y_cami) of the corresponding imaging device Cami, and the α value at that pixel in the overlap region of the corresponding imaging device Cami.
  • FIG. 6 shows an example, and the pixel ranges of the imaging devices Cam1,..., Cam4 included in the second reference table are not limited to the example of FIG.
  • the pixels outside the overlap regions of the imaging ranges of the imaging devices Cam1, ..., CamN and the pixels corresponding to the imaging device on the left side (or right side) of each overlap region can be substituted into the composite video pixels simultaneously. The information on these pixels is stored in the upper reference table (first reference table) shown in FIG. 5. The information on the pixels corresponding to the imaging device on the right side (or left side) of each overlap region is stored in the lower reference table (second reference table) shown in FIG. 6.
  • the pixel values of the pixels in the imaging range of the imaging devices Cam1, Cam2, Cam3, and Cam4 shown in FIG. 5 can be simultaneously substituted into the synthesized video pixels using the first reference table.
  • the pixel values of the pixels in the overlapping region of the imaging ranges of the imaging devices Cam2, Cam3, and Cam4 illustrated in FIG. 6 can be simultaneously assigned to the synthesized video pixels using the second reference table.
  • by using a video processing processor that is a parallel arithmetic device such as a GPU together with the first and second reference tables, no processing wait occurs in the video composition processing, and, regardless of the number of imaging devices Cam1, ..., CamN, a composite video can be generated in two steps: the substitution process using the first reference table and the substitution process using the second reference table.
  • the video input interface 18 in the video receiver 4 acquires video data for one frame of the imaging devices Cam1,..., CamN and stores the video data in the main memory 11. The acquired video data is transferred from the main memory 11 to the video processing memory 14.
  • the video processing processor 13 in the video processing unit 6 substitutes the pixel values of the input videos transferred to the video processing memory 14 into the composite video pixels using the first reference table and the second reference table. This processing procedure is described below.
  • the following video composition processing is executed by the video processing processor 13 in parallel with the processing of the main processor 10. <1> First, the video processing processor 13 extracts, from the first reference table, the camera number i corresponding to each pixel (x_synth, y_synth) of the composite video, the corresponding pixel position (x_cami, y_cami) in the imaging device Cami of camera number i, and the weighting coefficient α.
  • <2> Next, the video processing processor 13 refers to the pixel value at (x_cami, y_cami) of the input video of camera number i in the video processing memory 14, multiplies this pixel value by the weighting coefficient α, and substitutes the result into the pixel (x_synth, y_synth) of the composite video in the video processing memory 14.
  • the video processing processor 13 then executes the following video composition processing, also in parallel with the processing of the main processor 10.
  • <3> The video processing processor 13 extracts, from the second reference table, the camera number i corresponding to each pixel (x_synth, y_synth) of the composite video, the corresponding pixel position (x_cami, y_cami) in the imaging device Cami of camera number i, and the weighting coefficient α.
  • <4> The video processing processor 13 refers to the pixel value at (x_cami, y_cami) of the input video of camera number i in the video processing memory 14, multiplies this pixel value by the weighting coefficient α, and blends (adds) the result into the pixel (x_synth, y_synth) of the composite video in the video processing memory 14. As a result, blend processing is performed on the pixels in the overlap regions of the composite video. (A sketch of these two passes is given below.)
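  • For illustration only, the two passes <1> to <4> can be sketched with vectorized NumPy operations standing in for the GPU's parallel threads. This is a minimal sketch, not the patent's implementation; the table layout (fields "cam", "x", "y", "alpha", with cam = -1 marking pixels that have no source) and all names are assumptions.

```python
import numpy as np

def apply_table(table, frames, canvas, accumulate=False):
    """One parallel pass: for every composite pixel, the reference table names a
    source camera, a source pixel (x_cami, y_cami), and a weight alpha."""
    valid = table["cam"] >= 0                 # cam == -1: no corresponding pixel
    ys, xs = np.nonzero(valid)                # composite-pixel coordinates
    cam = table["cam"][ys, xs]
    src = frames[cam, table["y"][ys, xs], table["x"][ys, xs]].astype(np.float32)
    weighted = table["alpha"][ys, xs, None] * src
    if accumulate:
        canvas[ys, xs] += weighted            # second pass: blend into overlaps
    else:
        canvas[ys, xs] = weighted             # first pass: plain substitution

def compose(frames, table1, table2, h_synth, w_synth):
    """Two-step composition: step 1 substitutes via the first reference table,
    step 2 adds the weighted overlap contributions via the second table."""
    canvas = np.zeros((h_synth, w_synth, 3), dtype=np.float32)
    apply_table(table1, frames, canvas, accumulate=False)
    apply_table(table2, frames, canvas, accumulate=True)
    return np.clip(canvas, 0.0, 255.0).astype(np.uint8)
```

  • Because neither pass reads a pixel written by another camera within the same pass, each pass is free of the inter-camera wait described above; `frames` is assumed to be an array of shape (N, H, W, 3) holding the N input frames.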
  • FIG. 7 is a flowchart showing the operation of the video composition apparatus according to the first embodiment (that is, the video composition method according to the first embodiment).
  • after creating the reference tables in the initialization process (step S1), the video processing unit 6 repeats the video input process (step S2) and the video composition process (step S3) until the video input is completed (step S4).
  • when a positional shift appears in the composite video, the video processing unit 6 corrects the shift using feature points on the video and creates a new reference table in the background. By replacing the reference table currently in use with the new reference table, an aligned composite video can be created. (The overall loop is sketched below.)
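  • A hedged sketch of this overall loop (FIG. 7); the step mapping follows the reading above, and the camera/display helpers (`read`, `show`) are hypothetical placeholders, not APIs from the patent:

```python
def run(cameras, display, build_tables, compose_frame):
    tables = build_tables()                          # step S1: initialization, table creation
    while True:
        frames = [cam.read() for cam in cameras]     # step S2: video input
        if any(f is None for f in frames):           # step S4: video input completed?
            break
        display.show(compose_frame(frames, tables))  # step S3: video composition + output
```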
  • the display processing unit 7 transmits the panoramic composite video data as the composite video data created by the video processing unit 6 to the display device 2.
  • the display device 2 displays a video based on the received panoramic composite video data. Note that the display device 2 may display the panoramic composite video on a single display screen, or may display it over a plurality of display screens. The display device 2 may cut out and display only a partial area of the panoramic composite video.
  • << 1-3 >> Effect: as described above, according to the video composition device 1 and the video composition method of Embodiment 1, the decoding load of the input video data increases according to the number of imaging devices Cam1, ..., CamN, but the load of the video composition processing of the videos acquired by the imaging devices Cam1, ..., CamN hardly increases.
  • in contrast, when the reference tables of the imaging devices Cam1, ..., CamN are used one after another, the processing time increases in proportion to the number of imaging devices. Likewise, when a reference table is prepared for each of the imaging devices Cam1, ..., CamN and lens distortion correction processing, viewpoint conversion processing, and projection conversion processing are combined, a processing wait occurs in the overlap regions of the imaging devices Cam1, ..., CamN, so the processing time increases as the number of imaging devices increases.
  • in Embodiment 1, each reference table is composed only of data whose pixels can be substituted simultaneously.
  • as a result, the video composition processing can be realized in a number of steps equal at most to the maximum number of imaging devices Cam1, ..., CamN involved in one overlap region. That is, in Embodiment 1, since the composite video is a panoramic video, at most two imaging devices are involved in each overlap region, and the composition processing can be executed in two steps: a step using the first reference table and a step using the second reference table.
  • << 2 >> Embodiment 2, << 2-1 >> Configuration
  • in Embodiment 1, a video composition device and a video composition method for generating one composite video (a panoramic video) from a plurality of videos arranged in the left-right direction were described.
  • in Embodiment 2, a video composition device and a video composition method for generating one composite video (an overhead video) from a plurality of videos arranged in the vertical and horizontal directions are described.
  • the difference from Embodiment 1 is that the video composition processing is performed using four reference tables (first to fourth reference tables). Except for this point, Embodiment 2 is the same as Embodiment 1; therefore, the description of Embodiment 2 also refers to FIGS. 1, 2, and 7 used in the description of Embodiment 1.
  • FIG. 8 is a diagram illustrating an example of the overlapping region of the trapezoidal imaging ranges of the plurality of imaging devices Cam1,..., Cam9 in the video composition device according to the second embodiment.
  • FIG. 8 shows an example of the arrangement of the plurality of imaging devices Cam1,..., Cam9, and does not limit the arrangement method of the plurality of imaging devices.
  • when a plurality of imaging devices Cam1, ..., Cam9 are arranged in this way, the maximum number of imaging devices corresponding to the same pixel of an overlap region of the composite video (the region 41 in FIG. 8) is four, in the vertical and horizontal directions.
  • therefore, four types of reference tables, that is, first to fourth reference tables, are created as reference tables composed only of pixels that can be substituted into the composite video pixels simultaneously, and the video composition processing can then be executed in four steps regardless of the number of imaging devices.
  • FIG. 9 is a diagram illustrating an example of an overlapping area of the imaging ranges of the plurality of imaging devices Cam1,..., Cam9 in the video composition device according to the second embodiment.
  • in FIG. 9, each imaging range is drawn as a rectangle for simplicity.
  • FIG. 10 is a diagram illustrating a first reference table for the imaging range in the video composition device according to Embodiment 2.
  • FIGS. 11 to 13 show the second reference table, the third reference table, and the fourth reference table, which are the other reference tables for the imaging ranges (overlap regions) in the video composition device according to Embodiment 2.
  • FIGS. 9 to 13 show the images after projection conversion of the imaging devices Cam1, ..., CamN as rectangles for the sake of simplicity, but the processing is the same for trapezoids and other shapes.
  • the first to fourth reference tables shown in FIGS. 10 to 13 are examples, and the shape and number of the reference tables are not limited to the examples of FIGS. 10 to 13.
  • << 2-2 >> Operation [Video composition processing]: the pixel values of the input videos transferred to the video processing memory 14 are substituted into the composite video using the first to fourth reference tables. The processing procedure is shown below.
  • the video processing processor 13 of the video processing unit 6 executes the following operations in parallel. <11> In the first process, the video processing processor 13 extracts, from the first reference table shown in FIG. 10, the camera number i corresponding to each pixel (x_synth, y_synth) of the composite video, the corresponding pixel position (x_cami, y_cami) in the imaging device Cami of camera number i, and the weighting coefficient α.
  • <12> Next, the video processing processor 13 refers to the pixel value of the pixel (x_cami, y_cami) of the input video of the imaging device Cami with camera number i in the video processing memory 14, multiplies this pixel value by the weighting coefficient α, and substitutes the result into the pixel (x_synth, y_synth) of the composite video in the video processing memory 14.
  • in the second step, the video processing processor 13 performs the same processes as <11> and <12> on the second reference table.
  • in the third step, the video processing processor 13 executes the same processes as <11> and <12> on the third reference table.
  • in the fourth step, the video processing processor 13 also executes the same processes as <11> and <12> on the fourth reference table. (A generalized sketch of this K-pass composition is given below.)
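  • The four-step procedure generalizes to any number K of reference tables in which every entry can be written without contention: composition is K passes regardless of the number of cameras. A sketch under the same assumed table layout, reusing `apply_table` from the earlier sketch:

```python
import numpy as np

def compose_multi(frames, tables, h_synth, w_synth):
    """K-step composition (K = 4 for the two-dimensional arrangement of
    Embodiment 2): the first table substitutes, the remaining tables add the
    weighted overlap contributions."""
    canvas = np.zeros((h_synth, w_synth, 3), dtype=np.float32)
    for k, table in enumerate(tables):
        apply_table(table, frames, canvas, accumulate=(k > 0))
    return np.clip(canvas, 0.0, 255.0).astype(np.uint8)
```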
  • the processing procedure of the entire video processing unit is the same as that in FIG.
  • the operation of the display processing unit is the same as that of the first embodiment.
  • << 2-3 >> Effect: according to the video composition device and the video composition method of Embodiment 2, the decoding load of the input video data increases according to the number of imaging devices Cam1, ..., CamN, but the load of the video composition processing of the videos acquired by the imaging devices Cam1, ..., CamN hardly increases.
  • each reference table is composed only of data that can be substituted into pixels simultaneously, taking into account the processing wait between the videos of imaging devices having adjacent overlap regions, so the video composition processing can be executed in at most four steps regardless of the number of imaging devices.
  • 1 video composition device, 2 display device, 4 video receiving unit, 5 parameter input unit, 6 video processing unit, 6a storage unit, 7 display processing unit, 10 main processor, 11 main memory, 12 auxiliary memory, 13 video processing processor, 14 video processing memory, 15 input interface, 16 file interface, 17 display interface, 18 video input interface, Cam1, ..., Cami, ..., CamN imaging devices (cameras).

Abstract

An image synthesis device (1) comprises an image reception unit (4), a parameter input unit (5) and an image processing unit (6). The image processing unit (6) uses previously input camera parameters to create a reference table including, for each pixel of a synthesized image: first image pickup device identifying information (Cami) identifying a corresponding image pickup device among a plurality of image pickup devices (Cam1,..., CamN); a first corresponding pixel position (x_cami, y_cami) in the image pickup device identified by the first image pickup device identifying information; and a first weighting coefficient (α) at the first corresponding pixel position. The image processing unit (6) then refers to the reference table and generates a synthesized image by substituting, for each pixel (x_synth, y_synth) of the synthesized image, a first value obtained by multiplying the pixel value of the first corresponding pixel position in the image pickup device, identified by the first image pickup device identifying information, by the first weighting coefficient (α).

Description

Video composition device and video composition method
 The present invention relates to a video composition device and a video composition method for generating one composite video from a plurality of videos (that is, a plurality of video data) acquired by a plurality of imaging devices.
 In order to widen the shooting angle of view, video composition processing is known in which a plurality of videos acquired by shooting with a plurality of imaging devices (that is, a plurality of cameras) are combined to generate one composite video (for example, see Patent Documents 1, 2, and 3). Usually, in video composition processing for generating one composite video, video processing such as lens distortion correction processing, viewpoint conversion processing, and projection conversion processing is performed on each of the plurality of videos output from the plurality of imaging devices. Since the processing load of this video processing is very large, it is difficult to perform it in real time with an ordinary arithmetic device (CPU: Central Processing Unit). Therefore, in conventional devices, the video composition processing is performed by a GPU (Graphics Processing Unit), which is a parallel arithmetic device that can operate in parallel with an ordinary arithmetic device.
 Patent Document 1: Japanese Patent No. 4744823. Patent Document 2: Japanese Patent Laid-Open No. 2015-207802. Patent Document 3: Japanese Patent Laid-Open No. 2016-066842.
 However, even when a parallel arithmetic device such as a GPU is used, the load of the video composition processing increases as the number of imaging devices increases (that is, as the number of videos to be combined increases). In particular, in the projection conversion processing, when blend processing is performed on the video in an overlap region, which is the boundary between the imaging ranges of the imaging devices, a processing wait occurs until all the videos covering the overlap region have been input, so the processing time required for the video composition processing becomes long.
 The present invention has been made to solve the above conventional problems, and its object is to provide a video composition device and a video composition method capable of performing, in a short time, video composition processing that generates one composite video from a plurality of videos acquired by a plurality of imaging devices, even when the number of imaging devices increases.
 A video composition device according to one aspect of the present invention is a video composition device that generates one composite video from a plurality of videos acquired by a plurality of imaging devices, and includes: a video receiving unit that receives the plurality of videos; a parameter input unit to which camera parameters of the plurality of imaging devices are input; and a video processing unit that generates the composite video from the plurality of videos. Using the camera parameters input in advance, the video processing unit creates a reference table including, for each pixel of the composite video, first imaging device identification information identifying the corresponding imaging device among the plurality of imaging devices, a corresponding first pixel position in the imaging device identified by the first imaging device identification information, and a first weighting coefficient at the corresponding first pixel position. The video processing unit then generates the composite video by referring to the reference table and substituting, for each pixel of the composite video, a first value obtained by multiplying the pixel value at the corresponding first pixel position in the identified imaging device by the first weighting coefficient.
 A video composition method according to another aspect of the present invention is a video composition method for generating one composite video from a plurality of videos acquired by a plurality of imaging devices, and includes: a step of creating, using camera parameters input in advance for the plurality of imaging devices, a first reference table including, for each pixel of the composite video, first imaging device identification information identifying the corresponding imaging device among the plurality of imaging devices, a corresponding first pixel position in the imaging device identified by the first imaging device identification information, and a first weighting coefficient at the corresponding first pixel position; and a step of generating the composite video by referring to the first reference table and substituting, for each pixel of the composite video, a first value obtained by multiplying the pixel value at the corresponding first pixel position in the identified imaging device by the first weighting coefficient.
 According to the present invention, even when the number of imaging devices increases, video composition processing for generating one composite video from a plurality of videos acquired by a plurality of imaging devices can be performed in a short time.
FIG. 1 is a functional block diagram schematically showing the configuration of a video composition device according to Embodiment 1 of the present invention. FIG. 2 is a hardware configuration diagram schematically showing the video composition device according to Embodiment 1. FIG. 3 is a diagram illustrating an example of the correspondence between pixels of a composite video and pixels of a plurality of imaging devices in the video composition device according to Embodiment 1. FIG. 4 is a diagram illustrating an example of an overlap region of the imaging ranges of a plurality of imaging devices in the video composition device according to Embodiment 1. FIG. 5 is a diagram illustrating the pixel range of each imaging device included in the first reference table in the video composition device according to Embodiment 1. FIG. 6 is a diagram illustrating the pixel range of each imaging device included in the second reference table in the video composition device according to Embodiment 1. FIG. 7 is a flowchart showing the operation of the video composition device according to Embodiment 1 (that is, the video composition method according to Embodiment 1). FIG. 8 is a diagram illustrating an example of an overlap region of the trapezoidal imaging ranges of a plurality of imaging devices in a video composition device according to Embodiment 2. FIG. 9 is a diagram illustrating an example in which the imaging ranges of a plurality of imaging devices in the video composition device according to Embodiment 2 are simplified. FIG. 10 is a diagram illustrating the pixel range of each imaging device included in the first reference table in the video composition device according to Embodiment 2. FIG. 11 is a diagram illustrating the pixel range (overlap region) of each imaging device included in the second reference table in the video composition device according to Embodiment 2. FIG. 12 is a diagram illustrating the pixel range (overlap region) of each imaging device included in the third reference table in the video composition device according to Embodiment 2. FIG. 13 is a diagram illustrating the pixel range (overlap region) of each imaging device included in the fourth reference table in the video composition device according to Embodiment 2.
<< 1 >> Embodiment 1
<< 1-1 >> Configuration
 FIG. 1 is a functional block diagram schematically showing the configuration of a video composition device 1 according to Embodiment 1 of the present invention. The video composition device 1 is a device that can carry out the video composition method according to Embodiment 1. The video composition device 1 generates one composite video (that is, one set of composite video data) from a plurality of videos (that is, a plurality of video data) output from a plurality of imaging devices (that is, a plurality of cameras) Cam1, ..., Cami, ..., CamN. N is an integer of 2 or more, and i is an arbitrary integer of 1 or more and N or less. When the video is a moving image, the video composition device 1 generates moving image data as composite video data by repeating, each time a video frame is input from the imaging devices Cam1, ..., CamN, a process of creating one composite video frame from the N video frames output from the N imaging devices. The generated composite video data is output to the display device 2. The display device 2 displays a video based on the received composite video data.
 Examples of the composite video include a panoramic video, which is a horizontally long video with a wide field of view, and an overhead video, which is a video looking down from a high position. Embodiment 1 describes a case where the composite video generated by combining a plurality of videos arranged in the left-right direction (one-dimensional direction) acquired by a plurality of imaging devices is a panoramic video. Embodiment 2, described later, covers a case where the composite video generated by combining a plurality of videos arranged in the vertical and horizontal directions (two-dimensional directions) acquired by a plurality of imaging devices is an overhead video. The video composition device 1 creates in advance a reference table holding information on the pixels of the imaging devices Cam1, ..., CamN that correspond to each pixel of the composite video, and sets (substitutes) the pixel values of the composite video pixels using this reference table.
 As shown in FIG. 1, the video composition device 1 according to Embodiment 1 includes a video receiving unit 4, a parameter input unit 5, a video processing unit 6 having a storage unit 6a, and a display processing unit 7. The storage unit 6a may be provided outside the video processing unit 6. The video composition device 1 shown in FIG. 1 can be realized (for example, by a computer) using a memory as the storage unit 6a that stores programs as software and a processor as an information processing unit that executes the programs stored in the memory. A part of the video composition device 1 shown in FIG. 1 may also be realized by a memory that stores a program and a processor that executes the program.
 The video receiving unit 4 receives the plurality of video data output from the plurality of imaging devices Cam1, ..., CamN and outputs the received video data to the video processing unit 6. The video receiving unit 4 may also decode the video data and output the decoded video data to the video processing unit 6.
 The parameter input unit 5 receives information indicating the camera parameters of the plurality of imaging devices Cam1, ..., CamN obtained by calibration performed in advance (that is, parameter estimation for the lens and the image sensor) and outputs the information to the video processing unit 6. The camera parameters include, for example, internal parameters, which are camera parameters unique to the imaging devices Cam1, ..., CamN; external parameters, which are camera parameters indicating the positions and orientations of the imaging devices Cam1, ..., CamN in the world coordinate system; and lens distortion correction coefficients (for example, a lens distortion correction map) used to correct distortion specific to the lenses of the imaging devices Cam1, ..., CamN (for example, distortion in the radial direction of the lens and distortion in the circumferential direction of the lens).
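 One possible container for this calibration output, for concreteness only: the field names and the 3 x 3 matrix / 3-vector conventions below are illustrative assumptions, not a format specified by the patent.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class CameraParams:
    intrinsics: np.ndarray   # internal parameters: 3x3 matrix (focal lengths, principal point)
    rotation: np.ndarray     # external parameters: 3x3 rotation in the world coordinate system
    translation: np.ndarray  # external parameters: 3-vector position
    distortion: np.ndarray   # lens distortion correction coefficients (e.g. radial, circumferential)
```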
 In Embodiment 1, the video processing unit 6 creates reference tables for video composition at initialization using the camera parameters calculated by the calibration performed in advance, and stores these reference tables in the storage unit 6a. The video processing unit 6 then refers to the reference tables and generates composite video data from the plurality of video data (video frames) output from the video receiving unit 4.
 The display processing unit 7 outputs the composite video data generated by the video processing unit 6 to the display device 2.
 FIG. 2 is a hardware configuration diagram schematically showing the video composition device 1 according to Embodiment 1. The video composition device 1 includes a main processor 10, a main memory 11, an auxiliary memory 12, a video processing processor 13, which is a parallel arithmetic device such as a GPU, a video processing memory 14, an input interface 15, a file interface 16, a display interface 17, and a video input interface 18.
 The video processing unit 6 in FIG. 1 includes the main processor 10, the main memory 11, the auxiliary memory 12, the video processing processor 13, and the video processing memory 14 shown in FIG. 2. The storage unit 6a in FIG. 1 includes the main memory 11, the auxiliary memory 12, and the video processing memory 14 shown in FIG. 2. The parameter input unit 5 in FIG. 1 includes the file interface 16 shown in FIG. 2, and the video receiving unit 4 in FIG. 1 includes the video input interface 18 shown in FIG. 2. The display processing unit 7 in FIG. 1 includes the display interface 17 shown in FIG. 2. Note that FIG. 2 merely shows an example of the hardware configuration of the video composition device 1 shown in FIG. 1, and the hardware configuration can be changed in various ways. The correspondence between the functional blocks 4 to 7 shown in FIG. 1 and the hardware components 10 to 18 shown in FIG. 2 is also not limited to the above example.
 The parameter input unit 5 in FIG. 1 acquires the camera parameter information calculated by the calibration executed in advance from the auxiliary memory 12 through the file interface 16 and writes it to the main memory 11.
 The auxiliary memory 12 may store the camera parameters calculated by the calibration executed in advance. The main processor 10 may store the camera parameters in the main memory 11 through the file interface 16. When creating a composite video from still images, the main processor 10 may store still image files in the auxiliary memory 12.
 The input interface 15 accepts device input such as mouse input, keyboard input, and touch panel input, and sends the input information to the main processor 10.
 The video processing memory 14 stores the input video data transferred from the main memory 11 and the composite video data created by the video processing processor 13.
 The display interface 17 and the display device 2 are connected by an HDMI (registered trademark) (High-Definition Multimedia Interface) cable or the like. The composite video is output to the display device 2 via the display interface 17, which serves as the display processing unit 7.
 The video input interface 18, which serves as the video receiving unit 4, accepts the video inputs of the imaging devices Cam1, ..., CamN connected to the video composition device 1 and stores the input videos in the main memory 11. The imaging devices Cam1, ..., CamN are, for example, network cameras, analog cameras, USB (Universal Serial Bus) cameras, or HD-SDI (High Definition Serial Digital Interface) cameras. The video input interface 18 uses a standard conforming to the connected devices.
<< 1-2 >> Operation
[Initialization processing]
 First, the video processing unit 6 in FIG. 1 determines the resolution W_synth × H_synth of the composite video to be created and reserves a memory area for storing the composite video in the storage unit 6a in FIG. 1. Here, W_synth indicates the number of pixels in the horizontal direction of the rectangular composite video, and H_synth indicates the number of pixels in the vertical direction of the composite video. In terms of FIG. 2, the video processing processor 13 determines the resolution W_synth × H_synth of the composite video to be created and reserves a memory area for storing the composite video in the video processing memory 14.
 Next, the video processing unit 6 in FIG. 1 creates reference tables for the imaging devices Cam1, ..., CamN from the camera parameters (internal parameters, external parameters, lens distortion correction data, projection plane, and the like) of the imaging devices Cam1, ..., CamN input from the parameter input unit 5 in FIG. 1, and stores them in the storage unit 6a. In terms of FIG. 2, the video processing processor 13 creates the reference tables for the imaging devices Cam1, ..., CamN from the camera parameters input from the file interface 16 and stores them in the video processing memory 14.
 FIG. 3 is a diagram illustrating an example of the correspondence between pixels of the composite video and pixels of the plurality of imaging devices Cam1, ..., CamN in the video composition device 1 according to Embodiment 1.
 As shown in FIG. 3, the reference table for the imaging devices Cam1, ..., CamN stores, for each pixel of the composite video, the corresponding pixels (x_cam1, y_cam1), ..., (x_camN, y_camN) of the imaging devices Cam1, ..., CamN, as well as the α values (alpha values) at those pixels. x_cam1 indicates the x coordinate of a pixel of the image sensor of the imaging device Cam1 with camera number i = 1, and y_cam1 indicates the y coordinate of that pixel. The α value is a weighting coefficient used for the blend processing in the overlap regions of the imaging ranges of the imaging devices Cam1, ..., CamN. The α value is a camera parameter indicating the opacity of pixel data and takes a value in the range of 0 to 1, where α = 0 represents complete transparency and α = 1 represents complete opacity.
 If no pixel of the imaging devices Cam1, ..., CamN corresponds to a pixel of the composite video, a value indicating that no corresponding pixel exists is set in the reference table. The correspondence between the pixels (x_cam1, y_cam1), ..., (x_camN, y_camN) of the imaging devices Cam1, ..., CamN and a pixel (x_synth, y_synth) of the composite video can be computed by working backwards from the coordinates on the defined projection plane to the coordinates before projection conversion, before viewpoint conversion, and before lens distortion correction. x_synth denotes the x coordinate of a pixel of the composite video, and y_synth denotes its y coordinate. With the reference tables, the video composition process reduces to assigning, as the pixel value of each composite-video pixel (x_synth, y_synth), the pixel values of the corresponding pixels (x_cam1, y_cam1), ..., (x_camN, y_camN) of the imaging devices Cam1, ..., CamN.
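 As a concrete illustration of this back-calculation, the following host-side sketch fills one camera's table by walking each composite pixel backwards through the conversion chain. It is a sketch under stated assumptions, not the specification's code: invProjection, invViewpoint, invLensDistortion, and insideSensor are hypothetical stand-ins for the inverse mappings derived from the camera parameters.

```cuda
// Hypothetical inverse mappings derived from the camera parameters
// (declarations only; their implementations are outside this sketch).
void invProjection(int cam, float* u, float* v);
void invViewpoint(int cam, float* u, float* v);
void invLensDistortion(int cam, float* u, float* v);
bool insideSensor(int cam, float u, float v);

// Hypothetical LUT construction for one imaging device Cami: for each
// composite pixel, work backwards from the defined projection plane to the
// coordinates before projection conversion, before viewpoint conversion,
// and before lens distortion correction.
void buildLut(LutEntry* lut, int wSynth, int hSynth, int cam) {
    for (int ys = 0; ys < hSynth; ++ys) {
        for (int xs = 0; xs < wSynth; ++xs) {
            float u = (float)xs, v = (float)ys;
            invProjection(cam, &u, &v);      // undo projection conversion
            invViewpoint(cam, &u, &v);       // undo viewpoint conversion
            invLensDistortion(cam, &u, &v);  // undo lens distortion correction
            LutEntry e = { (int16_t)cam, (int16_t)u, (int16_t)v, 1.0f };
            if (!insideSensor(cam, u, v)) {
                e.cam = -1;  // no corresponding pixel for this camera
            }
            lut[(size_t)ys * wSynth + xs] = e;
        }
    }
}
```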
 However, when the video composition process uses the reference tables of the imaging devices Cam1, ..., CamN one after another, the processing time increases in proportion to the number of imaging devices Cam1, ..., CamN. Even when a parallel arithmetic device such as a GPU is used, processing waits occur during blend processing in the overlapping regions, so the processing time still grows as the number of imaging devices Cam1, ..., CamN increases.
 FIG. 4 is a diagram illustrating an example of overlapping regions of the imaging ranges of a plurality of imaging devices Cam1, ..., Cam4 in the video composition device 1 according to the first embodiment. Normally, when a panoramic composite video as shown in FIG. 4 is created, overlapping regions exist between the imaging ranges of the adjacent imaging devices Cam1, ..., Cam4. Blend processing is applied to the overlapping regions: the pixel values of the different imaging devices Cam1, ..., Cam4 are referenced, each pixel value is weighted by multiplying it by the weighting coefficient α, and the weighted pixel values are assigned (blended) as the pixel value of the corresponding pixel of the composite video. Because this requires the video data output from more than one of the imaging devices Cam1, ..., Cam4, a processing wait occurs. Since blend processing is performed in the overlapping region, the overlapping region is also called a blend region.
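 Put as a worked equation in the document's notation (the complementary-weight convention is an assumption of this illustration, not stated in the specification): for a composite-video pixel (x_synth, y_synth) lying in the overlapping region of two adjacent imaging devices Cami and Camj, the blended pixel value is

  α_i × (pixel value of Cami at (x_cami, y_cami)) + α_j × (pixel value of Camj at (x_camj, y_camj)),

where, inside the blend region, the weights are typically chosen so that α_i + α_j = 1, fading one camera out as the other fades in.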
 FIG. 5 is a diagram showing the pixel ranges of the imaging devices Cam1, ..., Cam4 contained in the reference table (first reference table) in the video composition device 1 according to the first embodiment. FIG. 6 is a diagram showing the pixel ranges of the imaging devices Cam1, ..., Cam4 contained in another reference table (second reference table) in the video composition device 1 according to the first embodiment. In the video composition device 1 according to the first embodiment, the video processing unit 6 creates reference tables for video composition from the reference tables of the imaging devices Cam1, ..., CamN. The reference tables for video composition are two tables holding the information for the upper side of the blend region (the left-hand imaging device) and the lower side of the blend region (the right-hand imaging device), namely the first reference table shown in FIG. 5 and the second reference table shown in FIG. 6.
 The first reference table for video composition holds the camera number i as first imaging device identification information, the corresponding pixel (x_cami, y_cami) of the imaging device Cami, and the α value of that pixel. The α value of pixels outside the overlapping regions is 1. The example of FIG. 5 shows the case of four imaging devices Cam1, ..., Cam4. FIG. 5 is only an example; the pixel ranges of the imaging devices Cam1, ..., Cam4 contained in the first reference table are not limited to those of FIG. 5.
 The second reference table for video composition holds the camera number i as second imaging device identification information, which identifies an imaging device having an overlapping region among the plurality of imaging ranges captured by the imaging devices Cam1, ..., CamN, the corresponding pixel (x_cami, y_cami) of the imaging device Cami, and the α value of the pixel in the overlapping region of the imaging device Cami. FIG. 6 is only an example; the pixel ranges of the imaging devices Cam1, ..., Cam4 contained in the second reference table are not limited to those of FIG. 6.
 The pixels outside the overlapping regions of the imaging ranges of the imaging devices Cam1, ..., CamN and the pixels belonging to the left-hand imaging device (or the right-hand imaging device) of each overlapping region can be assigned to the composite-video pixels simultaneously. The information for these pixels is held in the upper reference table (first reference table) shown in FIG. 5. The information for the pixels belonging to the right-hand imaging device (or the left-hand imaging device) of each overlapping region is held in the lower reference table (second reference table) shown in FIG. 6.
 For example, the pixel values of the pixels in the imaging ranges of the imaging devices Cam1, Cam2, Cam3, and Cam4 shown in FIG. 5 can be assigned to the composite-video pixels simultaneously using the first reference table. Likewise, the pixel values of the pixels in the overlapping regions of the imaging devices Cam2, Cam3, and Cam4 shown in FIG. 6 can be assigned simultaneously using the second reference table. In the first embodiment, by using a video processing processor that is a parallel arithmetic device such as a GPU together with the first and second reference tables, no processing wait occurs in the video composition process, and the composite video can be generated in two steps regardless of the number of imaging devices Cam1, ..., CamN: an assignment step using the first reference table and an assignment step using the second reference table.
[Video input processing]
 The video input interface 18 of the video reception unit 4 acquires one frame of video data from each of the imaging devices Cam1, ..., CamN and stores it in the main memory 11. The acquired video data is then transferred from the main memory 11 to the video processing memory 14.
[Video composition processing]
 The video processing processor 13 of the video processing unit 6 assigns the pixel values of the input video transferred to the video processing memory 14, using the first reference table and the second reference table, as the pixel values of the composite-video pixels corresponding to the input-video pixels. The procedure is described below.
 The following video composition processing is executed by the video processing processor 13 in parallel with the processing of the main processor 10.
 <1> First, as the first process, the video processing processor 13 retrieves from the first reference table, for each pixel (x_synth, y_synth) of the composite video, the corresponding camera number i, the corresponding pixel position (x_cami, y_cami) in the imaging device Cami with camera number i, and the weighting coefficient α.
 <2> Next, as the second process, the video processing processor 13 reads the pixel value of the input video at (x_cami, y_cami) of camera number i in the video processing memory 14, multiplies it by the weighting coefficient α, and assigns the result to the pixel (x_synth, y_synth) of the composite video in the video processing memory 14.
 Next, the video processing processor 13 executes the following video composition processing in parallel with the processing of the main processor 10.
 <3> First, as the third process, the video processing processor 13 retrieves from the second reference table, for each pixel (x_synth, y_synth) of the composite video, the corresponding camera number i, the corresponding pixel position (x_cami, y_cami) in the imaging device Cami with camera number i, and the weighting coefficient α.
 <4> Next, as the fourth process, the video processing processor 13 reads the pixel value of the input video at (x_cami, y_cami) of camera number i in the video processing memory 14, multiplies it by the weighting coefficient α, and assigns the result to the pixel (x_synth, y_synth) of the composite video in the video processing memory 14. As a result, blend processing is applied to the pixels of the overlapping regions of the composite video.
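 These four processes map naturally onto a GPU with one thread per composite-video pixel. The following CUDA kernel is a hypothetical sketch, not code from the specification: the LutEntry layout repeats the earlier sketch so the file compiles on its own, single-channel (grayscale) frames are assumed for brevity, and the second pass is assumed to accumulate into the destination so that the two α-weighted contributions sum to the blended value.

```cuda
#include <cstdint>

struct LutEntry {           // repeated from the earlier sketch
    int16_t cam, x, y;      // camera number i and pixel (x_cami, y_cami)
    float   alpha;          // weighting coefficient α
};

// One composition pass: each thread handles one composite pixel.
// Pass 1 uses the first reference table with accumulate = false (plain
// assignment); pass 2 uses the second reference table with accumulate = true
// (adds the remaining α-weighted term, i.e. the blend in overlapping regions).
__global__ void composePass(const LutEntry* lut,
                            const uint8_t* const* camFrames,  // camFrames[i] = frame of Cami
                            int camPitch,                     // bytes per camera row
                            uint8_t* synth, int wSynth, int hSynth,
                            bool accumulate)
{
    int xs = blockIdx.x * blockDim.x + threadIdx.x;
    int ys = blockIdx.y * blockDim.y + threadIdx.y;
    if (xs >= wSynth || ys >= hSynth) return;

    LutEntry e = lut[ys * wSynth + xs];
    if (e.cam < 0) return;  // no corresponding camera pixel in this table

    float v = e.alpha * camFrames[e.cam][e.y * camPitch + e.x];
    float base = accumulate ? (float)synth[ys * wSynth + xs] : 0.0f;
    synth[ys * wSynth + xs] = (uint8_t)(base + v + 0.5f);
}
```

 Because each table holds at most one camera per composite pixel, no two threads within a pass write the same output pixel, which is exactly why the per-pass assignments can run fully in parallel with no processing wait.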
 FIG. 7 is a flowchart showing the operation of the video composition device according to the first embodiment (that is, the video composition method according to the first embodiment). After creating the reference tables in the initialization process (step S1), the video processing unit 6 repeats the video input process (step S2) and the video composition process (step S3) until the video input ends (step S4). If the positions of the imaging devices Cam1, ..., CamN shift, the video processing unit 6 can correct the positional shift using feature points in the video and build new reference tables in the background; by then swapping the reference tables currently in use for the new ones, it can produce a realigned composite video.
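 A minimal sketch of the background table swap described above, assuming the two composition tables are double-buffered and published together through an atomic pointer (the names and the double-buffering scheme are assumptions, not taken from the specification):

```cuda
#include <atomic>

struct LutSet { const LutEntry* pass1; const LutEntry* pass2; };

std::atomic<const LutSet*> g_activeLut;  // read by the render loop every frame

// Called by the background thread once the realigned tables are ready.
void publishRealignedTables(const LutSet* fresh) {
    g_activeLut.store(fresh, std::memory_order_release);
}

// Called by the render loop at the start of each frame.
const LutSet* currentTables() {
    return g_activeLut.load(std::memory_order_acquire);
}
```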
 The display processing unit 7 transmits the panoramic composite video data created by the video processing unit 6 to the display device 2 as composite video data. The display device 2 displays video based on the received panoramic composite video data. The display device 2 may display the panoramic composite video on a single display screen or across a plurality of display screens, and may also cut out and display only a partial area of the panoramic composite video.
<<1-3>> Effects
 As described above, with the video composition device 1 and the video composition method according to the first embodiment, the decoding load of the input video data grows with the number of imaging devices Cam1, ..., CamN, but the load of the video composition processing of the video acquired by the imaging devices Cam1, ..., CamN hardly increases.
 When lens distortion correction, viewpoint conversion, and projection conversion are applied separately to the input video of each of the imaging devices Cam1, ..., CamN, the processing time increases with the number of imaging devices Cam1, ..., CamN. Even when a reference table is prepared per imaging device so that lens distortion correction, viewpoint conversion, and projection conversion are bundled into a single lookup, processing waits still occur in the overlapping regions at the boundaries between the imaging devices Cam1, ..., CamN, so the processing time again increases with the number of imaging devices.
 The video composition device 1 and the video composition method according to the first embodiment focus on these processing waits between the videos of adjacent imaging devices with overlapping regions: by building each reference table only from data whose pixels can be assigned simultaneously, the video composition process can be carried out in a number of steps equal to the maximum number of imaging devices sharing an overlapping region. In the first embodiment, since at most two of the imaging devices Cam1, ..., CamN share an overlapping region in the panoramic composite video, the composition can be executed in two steps: one using the first reference table and one using the second reference table.
<<2>> Embodiment 2
<<2-1>> Configuration
 The first embodiment described a video composition device and a video composition method for generating one composite video (a panoramic video) from a plurality of videos arranged in the left-right direction. In contrast, Embodiment 2 of the present invention describes a video composition device and a video composition method for generating one composite video (an overhead video) from a plurality of videos arranged in the up-down and left-right directions.
 Embodiment 2 differs from Embodiment 1 in the arrangement of the plurality of imaging devices Cam1, ..., CamN and in that the video processing unit 6 of FIG. 1 (or the video processing processor 13 of FIG. 2) performs the video composition process using a reference table (first reference table) and other reference tables (second to fourth reference tables). Except for these points, Embodiment 2 is the same as Embodiment 1. Accordingly, the description of Embodiment 2 also refers to FIGS. 1, 2, and 7 used in the description of Embodiment 1.
<<2-2>> Operation
 FIG. 8 is a diagram illustrating an example of the overlapping regions of the trapezoidal imaging ranges of a plurality of imaging devices Cam1, ..., Cam9 in the video composition device according to Embodiment 2. As shown in FIG. 8, when an overhead composite video is created, one large overhead video can be produced by applying viewpoint conversion and projection conversion to the videos acquired by the plurality of imaging devices Cam1, ..., Cam9. FIG. 8 shows one example of the arrangement of the imaging devices Cam1, ..., Cam9 and does not limit how the imaging devices may be arranged.
 When the plurality of imaging devices Cam1, ..., Cam9 are arranged as shown in FIG. 8, at most four imaging devices (above, below, left, and right) correspond to the same pixel of an overlapping region of the composite video (41 in FIG. 8). As with the generation of the panoramic composite video, by creating four reference tables (the first to fourth reference tables), each composed only of pixels that can be assigned simultaneously to the composite-video pixels, the video composition process can be executed in four steps regardless of the number of imaging devices.
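 Generalizing, the number of passes equals the maximum number of imaging devices overlapping at any composite pixel: two for the panorama, four here. A hypothetical host-side driver for the composition kernel sketched earlier (names continued from those sketches, not from the specification):

```cuda
// K = 2 for the panoramic composition, K = 4 for the overhead composition
// of FIG. 8. tables[0] is the first reference table; later tables hold the
// remaining overlap contributions and are accumulated.
void composeFrame(const LutEntry* const* tables, int K,
                  const uint8_t* const* camFrames, int camPitch,
                  uint8_t* synth, int wSynth, int hSynth)
{
    dim3 block(16, 16);
    dim3 grid((wSynth + block.x - 1) / block.x,
              (hSynth + block.y - 1) / block.y);
    for (int k = 0; k < K; ++k) {
        composePass<<<grid, block>>>(tables[k], camFrames, camPitch,
                                     synth, wSynth, hSynth,
                                     /*accumulate=*/(k > 0));
    }
}
```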
 FIG. 9 is a diagram illustrating an example of the overlapping regions of the imaging ranges of the plurality of imaging devices Cam1, ..., Cam9 in the video composition device according to Embodiment 2. In FIG. 9, the imaging ranges are drawn as rectangles to simplify the example of FIG. 8. FIG. 10 is a diagram showing the first reference table for the imaging ranges in the video composition device according to Embodiment 2. FIGS. 11 to 13 show the second, third, and fourth reference tables, which are the other reference tables for the imaging ranges (overlapping regions) in the video composition device according to Embodiment 2. For ease of explanation, FIGS. 9 to 13 show the videos of the imaging devices Cam1, ..., CamN after projection conversion as rectangles, but the processing is the same for trapezoids or other shapes. The first to fourth reference tables shown in FIGS. 9 to 13 are examples; the shapes and number of the reference tables are not limited to those of FIGS. 9 to 13.
[Video input processing]
 When the video reception unit 4 has finished acquiring one frame of input video from each of the imaging devices Cam1, ..., CamN, the input video data is transferred from the main memory 11 to the video processing memory 14.
[Video composition processing]
 The pixel values of the input video transferred to the video processing memory 14 are assigned to the composite video using the first to fourth reference tables described above. The procedure is as follows.
 The video processing processor 13 of the video processing unit 6 executes the following operations in parallel.
 <11> In the first process, the video processing processor 13 retrieves from the first reference table shown in FIG. 10, for each pixel (x_synth, y_synth) of the composite video, the corresponding camera number i, the corresponding pixel position (x_cami, y_cami) in the imaging device Cami with camera number i, and the weighting coefficient α.
 <12> In the second process, the video processing processor 13 reads the pixel value of the pixel (x_cami, y_cami) of the input video of the imaging device Cami with camera number i in the video processing memory 14, multiplies it by the weighting coefficient α, and assigns the result to the pixel (x_synth, y_synth) of the composite video in the video processing memory 14.
 <13> In the third process, the video processing processor 13 performs the same processing as the processes <11> and <12> on the second reference table.
 <14> In the fourth process, the video processing processor 13 performs the same processing as the processes <11> and <12> on the third reference table.
 <15> In the fifth process, the video processing processor 13 performs the same processing as the processes <11> and <12> on the fourth reference table.
 The overall processing procedure of the video processing unit is the same as that of FIG. 7, and the operation of the display processing unit is the same as in Embodiment 1.
<<2-3>> Effects
 With the video composition device and the video composition method according to Embodiment 2, the decoding load of the input video data grows with the number of imaging devices Cam1, ..., CamN, but the load of the video composition processing of the video acquired by the imaging devices Cam1, ..., CamN hardly increases.
 Furthermore, the video composition device and the video composition method according to Embodiment 2 focus on the processing waits between the videos of adjacent imaging devices with overlapping regions: by building each reference table only from data whose pixels can be assigned simultaneously, the video composition process can be carried out in only as many steps as the maximum number of imaging devices sharing an overlapping region. Thus, even if the number of imaging devices Cam1, ..., CamN increases, the composite video can be created in the same processing time. For example, in the case of the overhead composite video, the processing can be executed in four steps.
 In the case of the overhead composite video, when the individual cameras are arranged as shown in FIG. 8, the processing can be executed in at most four steps.
 1 video composition device; 2 display device; 4 video reception unit; 5 parameter input unit; 6 video processing unit; 6a storage unit; 7 display processing unit; 10 main processor; 11 main memory; 12 auxiliary memory; 13 video processing processor; 14 video processing memory; 15 input interface; 16 file interface; 17 display interface; 18 video input interface; Cam1, ..., Cami, ..., CamN imaging devices (cameras).

Claims (6)

  1.  A video composition device that generates one composite video from a plurality of videos acquired by a plurality of imaging devices, the device comprising:
     a video reception unit that receives the plurality of videos;
     a parameter input unit to which camera parameters of the plurality of imaging devices are input; and
     a video processing unit that generates the composite video from the plurality of videos,
     wherein the video processing unit
     creates, using the camera parameters input in advance, a reference table containing, for each pixel of the composite video, first imaging device identification information identifying the corresponding imaging device among the plurality of imaging devices, a corresponding first pixel position in the imaging device identified by the first imaging device identification information, and a first weighting coefficient at the corresponding first pixel position, and
     generates the composite video by referring to the reference table and assigning, to each pixel of the composite video, a first value obtained by multiplying the pixel value at the corresponding first pixel position in the imaging device identified by the first imaging device identification information by the first weighting coefficient.
  2.  The video composition device according to claim 1, wherein the video processing unit
     creates, using the camera parameters, another reference table containing, for each pixel of the composite video, second imaging device identification information identifying one of the imaging devices having an overlapping region, which is an imaging range that overlaps another among the plurality of imaging ranges captured by the plurality of imaging devices, a second pixel position corresponding to the overlapping region in the imaging device identified by the second imaging device identification information, and a second weighting coefficient at the second pixel position corresponding to the overlapping region, and
     generates the portion of the composite video corresponding to the overlapping region by referring to the other reference table and performing blend processing by assigning, to each pixel of the composite video that corresponds to a second pixel position in the overlapping region, a second value obtained by multiplying the pixel value at the corresponding pixel position in the imaging device identified by the second imaging device identification information by the second weighting coefficient.
  3.  The video composition device according to claim 2, wherein, among the plurality of imaging ranges captured by the plurality of imaging devices, the mutually overlapping imaging ranges overlap in the left-right direction, the composite video is a panoramic video, and
     the other reference table is a single table.
  4.  The video composition device according to claim 2, wherein, among the plurality of imaging ranges captured by the plurality of imaging devices, the mutually overlapping imaging ranges overlap in the up-down and left-right directions, the composite video is an overhead video, and
     the other reference tables are three tables.
  5.  A video composition method for generating one composite video from a plurality of videos acquired by a plurality of imaging devices, the method comprising:
     a step of creating, using camera parameters input in advance for the plurality of imaging devices, a first reference table containing, for each pixel of the composite video, first imaging device identification information identifying the corresponding imaging device among the plurality of imaging devices, a corresponding first pixel position in the imaging device identified by the first imaging device identification information, and a first weighting coefficient at the corresponding first pixel position; and
     a step of generating the composite video by referring to the first reference table and assigning, to each pixel of the composite video, a first value obtained by multiplying the pixel value at the corresponding first pixel position in the imaging device identified by the first imaging device identification information by the first weighting coefficient.
  6.  The video composition method according to claim 5, further comprising:
     a step of creating, using the camera parameters, a second reference table containing, for each pixel of the composite video, second imaging device identification information identifying one of the imaging devices having an overlapping region, which is an imaging range that overlaps another among the plurality of imaging ranges captured by the plurality of imaging devices, a second pixel position corresponding to the overlapping region in the imaging device identified by the second imaging device identification information, and a second weighting coefficient at the second pixel position corresponding to the overlapping region; and
     a step of generating the portion of the composite video corresponding to the overlapping region by referring to the second reference table and performing blend processing by assigning, to each pixel of the composite video that corresponds to a second pixel position in the overlapping region, a second value obtained by multiplying the pixel value at the corresponding pixel position in the imaging device identified by the second imaging device identification information by the second weighting coefficient.
PCT/JP2016/083316 2016-11-10 2016-11-10 Image synthesis device and image synthesis method WO2018087856A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
JP2018549688A JP6513305B2 (en) 2016-11-10 2016-11-10 Video combining apparatus and video combining method
PCT/JP2016/083316 WO2018087856A1 (en) 2016-11-10 2016-11-10 Image synthesis device and image synthesis method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2016/083316 WO2018087856A1 (en) 2016-11-10 2016-11-10 Image synthesis device and image synthesis method

Publications (1)

Publication Number Publication Date
WO2018087856A1 true WO2018087856A1 (en) 2018-05-17

Family

ID=62109506

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2016/083316 WO2018087856A1 (en) 2016-11-10 2016-11-10 Image synthesis device and image synthesis method

Country Status (2)

Country Link
JP (1) JP6513305B2 (en)
WO (1) WO2018087856A1 (en)


Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2008048266A (en) * 2006-08-18 2008-02-28 Matsushita Electric Ind Co Ltd On-vehicle image processor and viewpoint change information generating method
WO2015029934A1 (en) * 2013-08-30 2015-03-05 クラリオン株式会社 Camera calibration device, camera calibration system, and camera calibration method

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPWO2019225255A1 (en) * 2018-05-21 2021-02-18 富士フイルム株式会社 Image correction device, image correction method, and image correction program
US11198393B2 (en) * 2019-07-01 2021-12-14 Vadas Co., Ltd. Method and apparatus for calibrating a plurality of cameras
WO2021192096A1 (en) * 2020-03-25 2021-09-30 三菱電機株式会社 Image processing device, image processing method, and image processing program
JPWO2021192096A1 (en) * 2020-03-25 2021-09-30
JP7038935B2 (en) 2020-03-25 2022-03-18 三菱電機株式会社 Image processing device, image processing method, and image processing program

Also Published As

Publication number Publication date
JPWO2018087856A1 (en) 2019-04-11
JP6513305B2 (en) 2019-05-15


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 16921180; Country of ref document: EP; Kind code of ref document: A1)
ENP Entry into the national phase (Ref document number: 2018549688; Country of ref document: JP; Kind code of ref document: A)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 16921180; Country of ref document: EP; Kind code of ref document: A1)