WO2018087856A1 - Image synthesis device and image synthesis method - Google Patents

Image synthesis device and image synthesis method

Info

Publication number
WO2018087856A1
Authority
WO
WIPO (PCT)
Prior art keywords
video
imaging
pixel
imaging device
imaging devices
Prior art date
Application number
PCT/JP2016/083316
Other languages
French (fr)
Japanese (ja)
Inventor
Kohei OKAHARA
Ichiro FURUKI
Tsukasa FUKASAWA
Original Assignee
Mitsubishi Electric Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Mitsubishi Electric Corporation
Priority to JP2018549688A priority Critical patent/JP6513305B2/en
Priority to PCT/JP2016/083316 priority patent/WO2018087856A1/en
Publication of WO2018087856A1 publication Critical patent/WO2018087856A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 3/00 Geometric image transformation in the plane of the image
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 23/00 Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N 23/60 Control of cameras or camera modules

Definitions

  • the present invention relates to a video composition device and a video composition method for generating one composite video from a plurality of videos (that is, a plurality of video data) acquired by a plurality of imaging devices.
  • a video synthesis process in which a plurality of videos acquired by shooting with a plurality of imaging devices (that is, a plurality of cameras) are combined to generate one combined video.
  • video processing such as lens distortion correction processing, viewpoint conversion processing, and projection conversion processing is performed on each of the plurality of videos output from the plurality of imaging devices. Since the processing load of this video processing is very large, it is difficult to perform it in real time with an ordinary arithmetic device (CPU: Central Processing Unit). Therefore, in conventional devices, the video composition processing is performed by a GPU (Graphics Processing Unit), which is a parallel arithmetic device that can operate in parallel with an ordinary arithmetic device.
  • GPU: Graphics Processing Unit
  • the present invention has been made to solve the above-described conventional problems, and its object is to provide a video composition device and a video composition method capable of performing, in a short time, video composition processing that generates one composite video from a plurality of videos acquired by a plurality of imaging devices, even when the number of imaging devices increases.
  • a video composition device according to one aspect is a video composition device that generates one composite video from a plurality of videos acquired by a plurality of imaging devices, and includes: a video receiving unit that receives the plurality of videos; a parameter input unit to which camera parameters of the plurality of imaging devices are input; and a video processing unit that generates the composite video from the plurality of videos. Using the camera parameters input in advance, the video processing unit creates a reference table including, for each pixel of the composite video, first imaging device identification information identifying the corresponding imaging device among the plurality of imaging devices, a corresponding first pixel position in the imaging device identified by the first imaging device identification information, and a first weighting coefficient at the corresponding first pixel position. The video processing unit then generates the composite video by referring to the reference table and substituting, for each pixel of the composite video, a first value obtained by multiplying the pixel value at the corresponding first pixel position in the identified imaging device by the first weighting coefficient.
  • a video composition method according to another aspect is a video composition method for generating one composite video from a plurality of videos acquired by a plurality of imaging devices, and includes: a step of creating, using camera parameters input in advance for the plurality of imaging devices, a first reference table including, for each pixel of the composite video, first imaging device identification information identifying the corresponding imaging device among the plurality of imaging devices, a corresponding first pixel position in the imaging device identified by the first imaging device identification information, and a first weighting coefficient at the corresponding first pixel position; and a step of generating the composite video by referring to the first reference table and substituting, for each pixel of the composite video, a first value obtained by multiplying the pixel value at the corresponding first pixel position in the identified imaging device by the first weighting coefficient.
  • video composition processing for generating one composite video from a plurality of videos acquired by a plurality of imaging devices can be performed in a short time.
  • FIG. 1 is a functional block diagram schematically showing the configuration of a video composition device according to Embodiment 1.
  • FIG. 2 is a hardware configuration diagram schematically showing the video composition device according to Embodiment 1.
  • FIG. 3 is a diagram illustrating an example of the correspondence between pixels of a composite video and pixels of a plurality of imaging devices in the video composition device according to Embodiment 1.
  • FIG. 4 is a diagram illustrating an example of an overlap region of the imaging ranges of a plurality of imaging devices in the video composition device according to Embodiment 1.
  • FIG. 5 is a diagram illustrating the pixel range of each imaging device included in the first reference table in the video composition device according to Embodiment 1.
  • FIG. 6 is a diagram illustrating the pixel range of each imaging device included in the second reference table in the video composition device according to Embodiment 1.
  • FIG. 7 is a flowchart showing the operation of the video composition device according to Embodiment 1 (that is, the video composition method according to Embodiment 1).
  • FIG. 8 is a diagram illustrating an example of an overlap region of the trapezoidal imaging ranges of a plurality of imaging devices in a video composition device according to Embodiment 2.
  • FIG. 9 is a diagram illustrating an example in which the imaging ranges of a plurality of imaging devices in the video composition device according to Embodiment 2 are simplified.
  • FIG. 10 is a diagram illustrating the pixel range of each imaging device included in the first reference table in the video composition device according to Embodiment 2.
  • FIG. 11 is a diagram illustrating the pixel range (overlap region) of each imaging device included in the second reference table in the video composition device according to Embodiment 2.
  • FIG. 12 is a diagram illustrating the pixel range (overlap region) of each imaging device included in the third reference table in the video composition device according to Embodiment 2.
  • FIG. 13 is a diagram illustrating the pixel range (overlap region) of each imaging device included in the fourth reference table in the video composition device according to Embodiment 2.
  • FIG. 1 is a functional block diagram schematically showing a configuration of a video composition device 1 according to Embodiment 1 of the present invention.
  • the video composition apparatus 1 is an apparatus that can perform the video composition method according to the first embodiment.
  • the video composition device 1 generates one composite video (that is, one set of composite video data) from a plurality of videos (that is, a plurality of video data) output from a plurality of imaging devices (that is, a plurality of cameras) Cam1, ..., Cami, ..., CamN.
  • N is an integer of 2 or more
  • i is an arbitrary integer of 1 or more and N or less.
  • when the video is a moving image, the video composition device 1 generates moving image data as composite video data by repeating, each time a video frame is input from the imaging devices Cam1, ..., CamN, a process of creating one composite video frame from the N video frames output from the N imaging devices. The generated composite video data is output to the display device 2.
  • the display device 2 displays a video based on the received composite video data.
  • Examples of composite video include panoramic video that is a horizontally long video with a wide field of view and an overhead video that is a video looking down from a high position.
  • a synthesized video generated by synthesizing a plurality of videos arranged in the left-right direction (one-dimensional direction) acquired by a plurality of imaging devices is a panoramic video.
  • a synthesized video generated by synthesizing a plurality of videos arranged in the vertical and horizontal directions (two-dimensional directions) acquired by a plurality of imaging devices is an overhead video.
  • the video composition device 1 creates in advance a reference table holding information on the pixels of the imaging devices Cam1, ..., CamN that correspond to each pixel of the composite video, and sets (that is, substitutes) the pixel values of the composite video pixels using this reference table.
  • the video composition device 1 includes a video receiving unit 4, a parameter input unit 5, a video processing unit 6 having a storage unit 6a, and a display processing unit 7.
  • the storage unit 6a may be provided outside the video processing unit 6.
  • the video composition device 1 shown in FIG. 1 can be realized (for example, by a computer) using a memory as the storage unit 6a that stores programs as software and a processor as an information processing unit that executes the programs stored in the memory. A part of the video composition device 1 shown in FIG. 1 may also be realized by a memory that stores a program and a processor that executes the program.
  • the video reception unit 4 receives a plurality of video data output from the plurality of imaging devices Cam1,..., CamN, and outputs the received video data to the video processing unit 6.
  • the video data decoding process may be performed by the video receiving unit 4 and the decoded video data may be output to the video processing unit 6.
  • the parameter input unit 5 receives information indicating the camera parameters of the plurality of imaging devices Cam1, ..., CamN obtained by calibration performed in advance (that is, parameter estimation for the lens and the image sensor), and outputs the information to the video processing unit 6.
  • the camera parameters include, for example, internal parameters, which are camera parameters unique to the imaging devices Cam1, ..., CamN; external parameters, which are camera parameters indicating the positions and orientations of the imaging devices Cam1, ..., CamN in the world coordinate system; and lens distortion correction coefficients (for example, a lens distortion correction map) used to correct distortion specific to the lenses of the imaging devices Cam1, ..., CamN (for example, distortion in the radial direction of the lens and distortion in the circumferential direction of the lens).
  • the video processing unit 6 creates a reference table for video composition at the time of initialization using the camera parameters calculated by the calibration performed in advance, and stores this reference table in the storage unit 6a.
  • the video processing unit 6 refers to the reference table and generates composite video data from a plurality of video data (video frames) output from the video receiving unit 4.
  • the display processing unit 7 outputs the composite video data generated by the video processing unit 6 to the display device 2.
  • FIG. 2 is a hardware configuration diagram schematically showing the video composition device 1 according to the first embodiment.
  • the video composition device 1 includes a main processor 10, a main memory 11, an auxiliary memory 12, a video processing processor 13, which is a parallel arithmetic device such as a GPU, a video processing memory 14, an input interface 15, a file interface 16, a display interface 17, and a video input interface 18.
  • the video processing unit 6 in FIG. 1 includes the main processor 10, the main memory 11, the auxiliary memory 12, the video processing processor 13, and the video processing memory 14 shown in FIG. 2.
  • the storage unit 6a in FIG. 1 includes the main memory 11, the auxiliary memory 12, and the video processing memory 14 shown in FIG.
  • the parameter input unit 5 in FIG. 1 includes the file interface 16 shown in FIG. 2, and the video receiving unit 4 in FIG. 1 includes the video input interface 18 shown in FIG. 2.
  • the display processing unit 7 in FIG. 1 includes the display interface 17 shown in FIG. 2.
  • FIG. 2 only shows an example of the hardware configuration of the video composition apparatus 1 shown in FIG. 1, and the hardware configuration can be variously changed. Further, the correspondence relationship between the functional blocks 4 to 7 shown in FIG. 1 and the hardware configurations 10 to 18 shown in FIG. 2 is not limited to the above example.
  • the parameter input unit 5 in FIG. 1 acquires the camera parameter information calculated by the calibration executed in advance from the auxiliary memory 12 and writes it to the main memory 11.
  • the auxiliary memory 12 may store camera parameters calculated by a previously executed calibration.
  • the main processor 10 may store the camera parameters in the main memory 11 through the file interface 16.
  • the main processor 10 may store a still image file in the auxiliary memory 12 when creating a composite video from a still image.
  • the input interface 15 receives device input such as mouse input, keyboard input, touch panel input, and the like, and sends input information to the main processor 10.
  • the video processing memory 14 stores the input video data transferred from the main memory 11 and the composite video data created by the video processing processor 13.
  • the display interface 17 and the display device 2 are connected by an HDMI (registered trademark) (High-Definition Multimedia Interface) cable or the like.
  • the synthesized video is output to the display device 2 via the display interface 17 as the display processing unit 7.
  • the video input interface 18 as the video receiver 4 receives video inputs from the imaging devices Cam1,..., CamN connected to the video synthesizer 1 and stores the input video in the main memory 11.
  • the imaging devices Cam1,..., CamN are, for example, network cameras, analog cameras, USB (Universal Serial Bus) cameras, HD-SDI (High Definition Serial Digital Interface) cameras, and the like. Note that the video input interface 18 uses a standard conforming to the connected device.
  • the video processing unit 6 in FIG. 1 determines the resolution W_synth × H_synth of the composite video to be created, and reserves a memory area for storing the composite video in the storage unit 6a in FIG. 1.
  • W_synth indicates the number of pixels in the horizontal direction of the rectangular composite video
  • H_synth indicates the number of pixels in the vertical direction of the composite video.
  • the video processing processor 13 determines the resolution W_synth × H_synth of the composite video to be created, and reserves a memory area for storing the composite video in the video processing memory 14.
  • the video processing unit 6 in FIG. 1 creates reference tables for the imaging devices Cam1, ..., CamN from the camera parameters (internal parameters, external parameters, lens distortion correction data, projection plane, etc.) of the imaging devices Cam1, ..., CamN input from the parameter input unit 5 in FIG. 1, and stores them in the storage unit 6a.
  • the video processor 13 creates a reference table for the imaging devices Cam1,..., CamN from the camera parameters of the imaging devices Cam1,..., CamN input from the file interface 16. And stored in the video processing memory 14.
  • FIG. 3 is a diagram illustrating an example of a correspondence relationship between a synthesized video pixel and pixels of a plurality of imaging devices Cam1,..., CamN in the video synthesis device 1 according to the first embodiment.
  • as shown in FIG. 3, the reference table for the imaging devices Cam1, ..., CamN stores, for each pixel of the composite video, the corresponding pixels (x_cam1, y_cam1), ..., (x_camN, y_camN) of the imaging devices Cam1, ..., CamN, as well as the α values at those pixels.
  • the α value is a weighting coefficient used for the blend processing in the overlap regions of the imaging ranges of the imaging devices Cam1, ..., CamN.
  • FIG. 4 is a diagram illustrating an example of a superposed region of the imaging ranges of the plurality of imaging devices Cam1,..., Cam4 in the video composition device 1 according to the first embodiment.
  • when a panoramic composite video as shown in FIG. 4 is created, an overlap region exists in the imaging ranges of adjacent imaging devices Cam1, ..., Cam4.
  • blend processing is applied to the overlap region: the pixel values of different imaging devices Cam1, ..., Cam4 are referenced, weighted by multiplying them by the weighting coefficient α, and the weighted pixel values are substituted (blended) as the pixel value of the corresponding composite video pixel, so a processing wait occurs for the video data output from the imaging devices Cam1, ..., Cam4. Note that since the blend processing is performed in the overlap region, the overlap region is also referred to as a blend region.
  • FIG. 5 is a diagram illustrating pixel ranges of the imaging devices Cam1,..., Cam4 included in the reference table (first reference table) in the video composition device 1 according to the first embodiment.
  • FIG. 6 is a diagram showing the pixel ranges of the imaging devices Cam1,..., Cam4 included in another reference table (second reference table) in the video composition device 1 according to the first embodiment.
  • the video processing unit 6 creates a reference table for video composition from the reference tables of the imaging devices Cam1, ..., CamN.
  • this reference table for video composition consists of two reference tables holding the information on the upper side of each blend region (the left imaging device) and on the lower side of each blend region (the right imaging device), that is, the first reference table shown in FIG. 5 and the second reference table shown in FIG. 6.
  • the first reference table for video composition holds, for each pixel of the composite video, the camera number i as the first imaging device identification information, the pixel (x_cami, y_cami) of the corresponding imaging device Cami, and the α value at that pixel of the corresponding imaging device Cami. Note that the α value of pixels outside the overlap regions is 1.
  • FIG. 5 shows an example, and the pixel ranges of the imaging devices Cam1,..., Cam4 included in the first reference table are not limited to the example of FIG.
  • the second reference table for video composition holds, as second imaging device identification information identifying the imaging devices whose imaging ranges overlap among the plurality of imaging ranges captured by the imaging devices Cam1, ..., CamN, the camera number i, the pixel (x_cami, y_cami) of the corresponding imaging device Cami, and the α value at that pixel in the overlap region of the corresponding imaging device Cami.
  • FIG. 6 shows an example, and the pixel ranges of the imaging devices Cam1,..., Cam4 included in the second reference table are not limited to the example of FIG.
  • the pixels outside the overlap regions of the imaging ranges of the imaging devices Cam1, ..., CamN and the pixels corresponding to the imaging device on the left side (or right side) of each overlap region can be substituted into the composite video pixels simultaneously. The information on these pixels is stored in the upper reference table (first reference table) shown in FIG. 5. The information on the pixels corresponding to the imaging device on the right side (or left side) of each overlap region is stored in the lower reference table (second reference table) shown in FIG. 6.
  • the pixel values of the pixels in the imaging range of the imaging devices Cam1, Cam2, Cam3, and Cam4 shown in FIG. 5 can be simultaneously substituted into the synthesized video pixels using the first reference table.
  • the pixel values of the pixels in the overlapping region of the imaging ranges of the imaging devices Cam2, Cam3, and Cam4 illustrated in FIG. 6 can be simultaneously assigned to the synthesized video pixels using the second reference table.
  • by using a video processing processor that is a parallel arithmetic device such as a GPU together with the first and second reference tables, no processing wait occurs in the video composition processing, and, regardless of the number of imaging devices Cam1, ..., CamN, a composite video can be generated in two steps: the substitution process using the first reference table and the substitution process using the second reference table.
  • the video input interface 18 in the video receiver 4 acquires video data for one frame of the imaging devices Cam1,..., CamN and stores the video data in the main memory 11. The acquired video data is transferred from the main memory 11 to the video processing memory 14.
  • the video processing processor 13 in the video processing unit 6 substitutes the pixel values of the input videos transferred to the video processing memory 14 into the composite video pixels using the first reference table and the second reference table. This processing procedure is described below.
  • the following video composition processing is executed by the video processing processor 13 in parallel with the processing of the main processor 10. <1> First, the video processing processor 13 extracts, from the first reference table, the camera number i corresponding to each pixel (x_synth, y_synth) of the composite video, the corresponding pixel position (x_cami, y_cami) in the imaging device Cami of camera number i, and the weighting coefficient α.
  • <2> Next, the video processing processor 13 refers to the pixel value at (x_cami, y_cami) of the input video of camera number i in the video processing memory 14, multiplies this pixel value by the weighting coefficient α, and substitutes the result into the pixel (x_synth, y_synth) of the composite video in the video processing memory 14.
  • the video processing processor 13 then executes the following video composition processing, also in parallel with the processing of the main processor 10.
  • <3> The video processing processor 13 extracts, from the second reference table, the camera number i corresponding to each pixel (x_synth, y_synth) of the composite video, the corresponding pixel position (x_cami, y_cami) in the imaging device Cami of camera number i, and the weighting coefficient α.
  • <4> The video processing processor 13 refers to the pixel value at (x_cami, y_cami) of the input video of camera number i in the video processing memory 14, multiplies this pixel value by the weighting coefficient α, and blends (adds) the result into the pixel (x_synth, y_synth) of the composite video in the video processing memory 14. As a result, blend processing is performed on the pixels in the overlap regions of the composite video. (A sketch of these two passes is given below.)
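  • For illustration only, the two passes <1> to <4> can be sketched with vectorized NumPy operations standing in for the GPU's parallel threads. This is a minimal sketch, not the patent's implementation; the table layout (fields "cam", "x", "y", "alpha", with cam = -1 marking pixels that have no source) and all names are assumptions.

```python
import numpy as np

def apply_table(table, frames, canvas, accumulate=False):
    """One parallel pass: for every composite pixel, the reference table names a
    source camera, a source pixel (x_cami, y_cami), and a weight alpha."""
    valid = table["cam"] >= 0                 # cam == -1: no corresponding pixel
    ys, xs = np.nonzero(valid)                # composite-pixel coordinates
    cam = table["cam"][ys, xs]
    src = frames[cam, table["y"][ys, xs], table["x"][ys, xs]].astype(np.float32)
    weighted = table["alpha"][ys, xs, None] * src
    if accumulate:
        canvas[ys, xs] += weighted            # second pass: blend into overlaps
    else:
        canvas[ys, xs] = weighted             # first pass: plain substitution

def compose(frames, table1, table2, h_synth, w_synth):
    """Two-step composition: step 1 substitutes via the first reference table,
    step 2 adds the weighted overlap contributions via the second table."""
    canvas = np.zeros((h_synth, w_synth, 3), dtype=np.float32)
    apply_table(table1, frames, canvas, accumulate=False)
    apply_table(table2, frames, canvas, accumulate=True)
    return np.clip(canvas, 0.0, 255.0).astype(np.uint8)
```

  • Because neither pass reads a pixel written by another camera within the same pass, each pass is free of the inter-camera wait described above; `frames` is assumed to be an array of shape (N, H, W, 3) holding the N input frames.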
  • FIG. 7 is a flowchart showing the operation of the video composition apparatus according to the first embodiment (that is, the video composition method according to the first embodiment).
  • after creating the reference tables in the initialization process (step S1), the video processing unit 6 repeats the video input process (step S2) and the video composition process (step S3) until the video input is completed (step S4).
  • when a positional shift appears in the composite video, the video processing unit 6 corrects the shift using feature points on the video and creates a new reference table in the background. By replacing the reference table currently in use with the new reference table, an aligned composite video can be created. (The overall loop is sketched below.)
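  • A hedged sketch of this overall loop (FIG. 7); the step mapping follows the reading above, and the camera/display helpers (`read`, `show`) are hypothetical placeholders, not APIs from the patent:

```python
def run(cameras, display, build_tables, compose_frame):
    tables = build_tables()                          # step S1: initialization, table creation
    while True:
        frames = [cam.read() for cam in cameras]     # step S2: video input
        if any(f is None for f in frames):           # step S4: video input completed?
            break
        display.show(compose_frame(frames, tables))  # step S3: video composition + output
```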
  • the display processing unit 7 transmits the panoramic composite video data as the composite video data created by the video processing unit 6 to the display device 2.
  • the display device 2 displays a video based on the received panoramic composite video data. Note that the display device 2 may display the panoramic composite video on a single display screen, or may display it over a plurality of display screens. The display device 2 may cut out and display only a partial area of the panoramic composite video.
  • << 1-3 >> Effect: as described above, according to the video composition device 1 and the video composition method of Embodiment 1, the decoding load of the input video data increases according to the number of imaging devices Cam1, ..., CamN, but the load of the video composition processing of the videos acquired by the imaging devices Cam1, ..., CamN hardly increases.
  • in contrast, when the reference tables of the imaging devices Cam1, ..., CamN are used one after another, the processing time increases in proportion to the number of imaging devices. Likewise, when a reference table is prepared for each of the imaging devices Cam1, ..., CamN and lens distortion correction processing, viewpoint conversion processing, and projection conversion processing are combined, a processing wait occurs in the overlap regions of the imaging devices Cam1, ..., CamN, so the processing time increases as the number of imaging devices increases.
  • in Embodiment 1, each reference table is composed only of data whose pixels can be substituted simultaneously.
  • as a result, the video composition processing can be realized in a number of steps equal at most to the maximum number of imaging devices Cam1, ..., CamN involved in one overlap region. That is, in Embodiment 1, since the composite video is a panoramic video, at most two imaging devices are involved in each overlap region, and the composition processing can be executed in two steps: a step using the first reference table and a step using the second reference table.
  • << 2 >> Embodiment 2, << 2-1 >> Configuration
  • in Embodiment 1, a video composition device and a video composition method for generating one composite video (a panoramic video) from a plurality of videos arranged in the left-right direction were described.
  • in Embodiment 2, a video composition device and a video composition method for generating one composite video (an overhead video) from a plurality of videos arranged in the vertical and horizontal directions are described.
  • the difference from Embodiment 1 is that the video composition processing is performed using four reference tables (first to fourth reference tables). Except for this point, Embodiment 2 is the same as Embodiment 1; therefore, the description of Embodiment 2 also refers to FIGS. 1, 2, and 7 used in the description of Embodiment 1.
  • FIG. 8 is a diagram illustrating an example of the overlapping region of the trapezoidal imaging ranges of the plurality of imaging devices Cam1,..., Cam9 in the video composition device according to the second embodiment.
  • FIG. 8 shows an example of the arrangement of the plurality of imaging devices Cam1,..., Cam9, and does not limit the arrangement method of the plurality of imaging devices.
  • when a plurality of imaging devices Cam1, ..., Cam9 are arranged in this way, the maximum number of imaging devices corresponding to the same pixel of an overlap region of the composite video (the region 41 in FIG. 8) is four, in the vertical and horizontal directions.
  • therefore, four types of reference tables, that is, first to fourth reference tables, are created as reference tables composed only of pixels that can be substituted into the composite video pixels simultaneously, and the video composition processing can then be executed in four steps regardless of the number of imaging devices.
  • FIG. 9 is a diagram illustrating an example of an overlapping area of the imaging ranges of the plurality of imaging devices Cam1,..., Cam9 in the video composition device according to the second embodiment.
  • in FIG. 9, each imaging range is drawn as a rectangle for simplicity.
  • FIG. 10 is a diagram illustrating a first reference table for the imaging range in the video composition device according to Embodiment 2.
  • FIGS. 11 to 13 show the second reference table, the third reference table, and the fourth reference table, which are the other reference tables for the imaging ranges (overlap regions) in the video composition device according to Embodiment 2.
  • FIGS. 9 to 13 show the images after projection conversion of the imaging devices Cam1, ..., CamN as rectangles for the sake of simplicity, but the processing is the same for trapezoids and other shapes.
  • the first to fourth reference tables shown in FIGS. 10 to 13 are examples, and the shape and number of the reference tables are not limited to the examples of FIGS. 10 to 13.
  • << 2-2 >> Operation [Video composition processing]: the pixel values of the input videos transferred to the video processing memory 14 are substituted into the composite video using the first to fourth reference tables. The processing procedure is shown below.
  • the video processing processor 13 of the video processing unit 6 executes the following operations in parallel. <11> In the first process, the video processing processor 13 extracts, from the first reference table shown in FIG. 10, the camera number i corresponding to each pixel (x_synth, y_synth) of the composite video, the corresponding pixel position (x_cami, y_cami) in the imaging device Cami of camera number i, and the weighting coefficient α.
  • <12> Next, the video processing processor 13 refers to the pixel value of the pixel (x_cami, y_cami) of the input video of the imaging device Cami with camera number i in the video processing memory 14, multiplies this pixel value by the weighting coefficient α, and substitutes the result into the pixel (x_synth, y_synth) of the composite video in the video processing memory 14.
  • in the second step, the video processing processor 13 performs the same processes as <11> and <12> on the second reference table.
  • in the third step, the video processing processor 13 executes the same processes as <11> and <12> on the third reference table.
  • in the fourth step, the video processing processor 13 also executes the same processes as <11> and <12> on the fourth reference table. (A generalized sketch of this K-pass composition is given below.)
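  • The four-step procedure generalizes to any number K of reference tables in which every entry can be written without contention: composition is K passes regardless of the number of cameras. A sketch under the same assumed table layout, reusing `apply_table` from the earlier sketch:

```python
import numpy as np

def compose_multi(frames, tables, h_synth, w_synth):
    """K-step composition (K = 4 for the two-dimensional arrangement of
    Embodiment 2): the first table substitutes, the remaining tables add the
    weighted overlap contributions."""
    canvas = np.zeros((h_synth, w_synth, 3), dtype=np.float32)
    for k, table in enumerate(tables):
        apply_table(table, frames, canvas, accumulate=(k > 0))
    return np.clip(canvas, 0.0, 255.0).astype(np.uint8)
```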
  • the processing procedure of the entire video processing unit is the same as that in FIG.
  • the operation of the display processing unit is the same as that of the first embodiment.
  • << 2-3 >> Effect: according to the video composition device and the video composition method of Embodiment 2, the decoding load of the input video data increases according to the number of imaging devices Cam1, ..., CamN, but the load of the video composition processing of the videos acquired by the imaging devices Cam1, ..., CamN hardly increases.
  • each reference table is composed only of data that can be substituted into pixels simultaneously, taking into account the processing wait between the videos of imaging devices having adjacent overlap regions, so the video composition processing can be executed in at most four steps regardless of the number of imaging devices.
  • 1 video composition device, 2 display device, 4 video receiving unit, 5 parameter input unit, 6 video processing unit, 6a storage unit, 7 display processing unit, 10 main processor, 11 main memory, 12 auxiliary memory, 13 video processing processor, 14 video processing memory, 15 input interface, 16 file interface, 17 display interface, 18 video input interface, Cam1, ..., Cami, ..., CamN imaging devices (cameras).

Abstract

An image synthesis device (1) comprises an image reception unit (4), a parameter input unit (5) and an image processing unit (6). The image processing unit (6) uses previously input camera parameters to create a reference table including, for each pixel of a synthesized image: first image pickup device identifying information (Cami) identifying a corresponding image pickup device among a plurality of image pickup devices (Cam1,..., CamN); a first corresponding pixel position (x_cami, y_cami) in the image pickup device identified by the first image pickup device identifying information; and a first weighting coefficient (α) at the first corresponding pixel position. The image processing unit (6) then refers to the reference table and generates a synthesized image by substituting, for each pixel (x_synth, y_synth) of the synthesized image, a first value obtained by multiplying the pixel value of the first corresponding pixel position in the image pickup device, identified by the first image pickup device identifying information, by the first weighting coefficient (α).

Description

Video composition device and video composition method
 The present invention relates to a video composition device and a video composition method for generating one composite video from a plurality of videos (that is, a plurality of video data) acquired by a plurality of imaging devices.
 In order to widen the shooting angle of view, video composition processing is known in which a plurality of videos acquired by shooting with a plurality of imaging devices (that is, a plurality of cameras) are combined to generate one composite video (for example, see Patent Documents 1, 2, and 3). Usually, in video composition processing for generating one composite video, video processing such as lens distortion correction processing, viewpoint conversion processing, and projection conversion processing is performed on each of the plurality of videos output from the plurality of imaging devices. Since the processing load of this video processing is very large, it is difficult to perform it in real time with an ordinary arithmetic device (CPU: Central Processing Unit). Therefore, in conventional devices, the video composition processing is performed by a GPU (Graphics Processing Unit), which is a parallel arithmetic device that can operate in parallel with an ordinary arithmetic device.
 Patent Document 1: Japanese Patent No. 4744823. Patent Document 2: Japanese Patent Laid-Open No. 2015-207802. Patent Document 3: Japanese Patent Laid-Open No. 2016-066842.
 However, even when a parallel arithmetic device such as a GPU is used, the load of the video composition processing increases as the number of imaging devices increases (that is, as the number of videos to be combined increases). In particular, in the projection conversion processing, when blend processing is performed on the video in an overlap region, which is the boundary between the imaging ranges of the imaging devices, a processing wait occurs until all the videos covering the overlap region have been input, so the processing time required for the video composition processing becomes long.
 The present invention has been made to solve the above conventional problems, and its object is to provide a video composition device and a video composition method capable of performing, in a short time, video composition processing that generates one composite video from a plurality of videos acquired by a plurality of imaging devices, even when the number of imaging devices increases.
 A video composition device according to one aspect of the present invention is a video composition device that generates one composite video from a plurality of videos acquired by a plurality of imaging devices, and includes: a video receiving unit that receives the plurality of videos; a parameter input unit to which camera parameters of the plurality of imaging devices are input; and a video processing unit that generates the composite video from the plurality of videos. Using the camera parameters input in advance, the video processing unit creates a reference table including, for each pixel of the composite video, first imaging device identification information identifying the corresponding imaging device among the plurality of imaging devices, a corresponding first pixel position in the imaging device identified by the first imaging device identification information, and a first weighting coefficient at the corresponding first pixel position. The video processing unit then generates the composite video by referring to the reference table and substituting, for each pixel of the composite video, a first value obtained by multiplying the pixel value at the corresponding first pixel position in the identified imaging device by the first weighting coefficient.
 A video composition method according to another aspect of the present invention is a video composition method for generating one composite video from a plurality of videos acquired by a plurality of imaging devices, and includes: a step of creating, using camera parameters input in advance for the plurality of imaging devices, a first reference table including, for each pixel of the composite video, first imaging device identification information identifying the corresponding imaging device among the plurality of imaging devices, a corresponding first pixel position in the imaging device identified by the first imaging device identification information, and a first weighting coefficient at the corresponding first pixel position; and a step of generating the composite video by referring to the first reference table and substituting, for each pixel of the composite video, a first value obtained by multiplying the pixel value at the corresponding first pixel position in the identified imaging device by the first weighting coefficient.
 According to the present invention, even when the number of imaging devices increases, video composition processing for generating one composite video from a plurality of videos acquired by a plurality of imaging devices can be performed in a short time.
FIG. 1 is a functional block diagram schematically showing the configuration of a video composition device according to Embodiment 1 of the present invention. FIG. 2 is a hardware configuration diagram schematically showing the video composition device according to Embodiment 1. FIG. 3 is a diagram illustrating an example of the correspondence between pixels of a composite video and pixels of a plurality of imaging devices in the video composition device according to Embodiment 1. FIG. 4 is a diagram illustrating an example of an overlap region of the imaging ranges of a plurality of imaging devices in the video composition device according to Embodiment 1. FIG. 5 is a diagram illustrating the pixel range of each imaging device included in the first reference table in the video composition device according to Embodiment 1. FIG. 6 is a diagram illustrating the pixel range of each imaging device included in the second reference table in the video composition device according to Embodiment 1. FIG. 7 is a flowchart showing the operation of the video composition device according to Embodiment 1 (that is, the video composition method according to Embodiment 1). FIG. 8 is a diagram illustrating an example of an overlap region of the trapezoidal imaging ranges of a plurality of imaging devices in a video composition device according to Embodiment 2. FIG. 9 is a diagram illustrating an example in which the imaging ranges of a plurality of imaging devices in the video composition device according to Embodiment 2 are simplified. FIG. 10 is a diagram illustrating the pixel range of each imaging device included in the first reference table in the video composition device according to Embodiment 2. FIG. 11 is a diagram illustrating the pixel range (overlap region) of each imaging device included in the second reference table in the video composition device according to Embodiment 2. FIG. 12 is a diagram illustrating the pixel range (overlap region) of each imaging device included in the third reference table in the video composition device according to Embodiment 2. FIG. 13 is a diagram illustrating the pixel range (overlap region) of each imaging device included in the fourth reference table in the video composition device according to Embodiment 2.
<< 1 >> Embodiment 1
<< 1-1 >> Configuration
 FIG. 1 is a functional block diagram schematically showing the configuration of a video composition device 1 according to Embodiment 1 of the present invention. The video composition device 1 is a device that can carry out the video composition method according to Embodiment 1. The video composition device 1 generates one composite video (that is, one set of composite video data) from a plurality of videos (that is, a plurality of video data) output from a plurality of imaging devices (that is, a plurality of cameras) Cam1, ..., Cami, ..., CamN. N is an integer of 2 or more, and i is an arbitrary integer of 1 or more and N or less. When the video is a moving image, the video composition device 1 generates moving image data as composite video data by repeating, each time a video frame is input from the imaging devices Cam1, ..., CamN, a process of creating one composite video frame from the N video frames output from the N imaging devices. The generated composite video data is output to the display device 2. The display device 2 displays a video based on the received composite video data.
 Examples of the composite video include a panoramic video, which is a horizontally long video with a wide field of view, and an overhead video, which is a video looking down from a high position. Embodiment 1 describes a case where the composite video generated by combining a plurality of videos arranged in the left-right direction (one-dimensional direction) acquired by a plurality of imaging devices is a panoramic video. Embodiment 2, described later, covers a case where the composite video generated by combining a plurality of videos arranged in the vertical and horizontal directions (two-dimensional directions) acquired by a plurality of imaging devices is an overhead video. The video composition device 1 creates in advance a reference table holding information on the pixels of the imaging devices Cam1, ..., CamN that correspond to each pixel of the composite video, and sets (substitutes) the pixel values of the composite video pixels using this reference table.
 As shown in FIG. 1, the video composition device 1 according to Embodiment 1 includes a video receiving unit 4, a parameter input unit 5, a video processing unit 6 having a storage unit 6a, and a display processing unit 7. The storage unit 6a may be provided outside the video processing unit 6. The video composition device 1 shown in FIG. 1 can be realized (for example, by a computer) using a memory as the storage unit 6a that stores programs as software and a processor as an information processing unit that executes the programs stored in the memory. A part of the video composition device 1 shown in FIG. 1 may also be realized by a memory that stores a program and a processor that executes the program.
 The video receiving unit 4 receives the plurality of video data output from the plurality of imaging devices Cam1, ..., CamN and outputs the received video data to the video processing unit 6. The video receiving unit 4 may also decode the video data and output the decoded video data to the video processing unit 6.
 The parameter input unit 5 receives information indicating the camera parameters of the plurality of imaging devices Cam1, ..., CamN obtained by calibration performed in advance (that is, parameter estimation for the lens and the image sensor) and outputs the information to the video processing unit 6. The camera parameters include, for example, internal parameters, which are camera parameters unique to the imaging devices Cam1, ..., CamN; external parameters, which are camera parameters indicating the positions and orientations of the imaging devices Cam1, ..., CamN in the world coordinate system; and lens distortion correction coefficients (for example, a lens distortion correction map) used to correct distortion specific to the lenses of the imaging devices Cam1, ..., CamN (for example, distortion in the radial direction of the lens and distortion in the circumferential direction of the lens).
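 One possible container for this calibration output, for concreteness only: the field names and the 3 x 3 matrix / 3-vector conventions below are illustrative assumptions, not a format specified by the patent.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class CameraParams:
    intrinsics: np.ndarray   # internal parameters: 3x3 matrix (focal lengths, principal point)
    rotation: np.ndarray     # external parameters: 3x3 rotation in the world coordinate system
    translation: np.ndarray  # external parameters: 3-vector position
    distortion: np.ndarray   # lens distortion correction coefficients (e.g. radial, circumferential)
```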
 In Embodiment 1, the video processing unit 6 creates reference tables for video composition at initialization using the camera parameters calculated by the calibration performed in advance, and stores these reference tables in the storage unit 6a. The video processing unit 6 then refers to the reference tables and generates composite video data from the plurality of video data (video frames) output from the video receiving unit 4.
 The display processing unit 7 outputs the composite video data generated by the video processing unit 6 to the display device 2.
 FIG. 2 is a hardware configuration diagram schematically showing the video composition device 1 according to Embodiment 1. The video composition device 1 includes a main processor 10, a main memory 11, an auxiliary memory 12, a video processing processor 13, which is a parallel arithmetic device such as a GPU, a video processing memory 14, an input interface 15, a file interface 16, a display interface 17, and a video input interface 18.
 The video processing unit 6 in FIG. 1 includes the main processor 10, the main memory 11, the auxiliary memory 12, the video processing processor 13, and the video processing memory 14 shown in FIG. 2. The storage unit 6a in FIG. 1 includes the main memory 11, the auxiliary memory 12, and the video processing memory 14 shown in FIG. 2. The parameter input unit 5 in FIG. 1 includes the file interface 16 shown in FIG. 2, and the video receiving unit 4 in FIG. 1 includes the video input interface 18 shown in FIG. 2. The display processing unit 7 in FIG. 1 includes the display interface 17 shown in FIG. 2. Note that FIG. 2 merely shows an example of the hardware configuration of the video composition device 1 shown in FIG. 1, and the hardware configuration can be changed in various ways. The correspondence between the functional blocks 4 to 7 shown in FIG. 1 and the hardware components 10 to 18 shown in FIG. 2 is also not limited to the above example.
 The parameter input unit 5 in FIG. 1 acquires the camera parameter information calculated by the calibration executed in advance from the auxiliary memory 12 through the file interface 16 and writes it to the main memory 11.
 The auxiliary memory 12 may store the camera parameters calculated by the calibration executed in advance. The main processor 10 may store the camera parameters in the main memory 11 through the file interface 16. When creating a composite video from still images, the main processor 10 may store still image files in the auxiliary memory 12.
 The input interface 15 accepts device input such as mouse input, keyboard input, and touch panel input, and sends the input information to the main processor 10.
 The video processing memory 14 stores the input video data transferred from the main memory 11 and the composite video data created by the video processing processor 13.
 The display interface 17 and the display device 2 are connected by an HDMI (registered trademark) (High-Definition Multimedia Interface) cable or the like. The composite video is output to the display device 2 via the display interface 17, which serves as the display processing unit 7.
 The video input interface 18, which serves as the video receiving unit 4, accepts the video inputs of the imaging devices Cam1, ..., CamN connected to the video composition device 1 and stores the input videos in the main memory 11. The imaging devices Cam1, ..., CamN are, for example, network cameras, analog cameras, USB (Universal Serial Bus) cameras, or HD-SDI (High Definition Serial Digital Interface) cameras. The video input interface 18 uses a standard conforming to the connected devices.
<< 1-2 >> Operation
[Initialization processing]
 First, the video processing unit 6 in FIG. 1 determines the resolution W_synth × H_synth of the composite video to be created and reserves a memory area for storing the composite video in the storage unit 6a in FIG. 1. Here, W_synth indicates the number of pixels in the horizontal direction of the rectangular composite video, and H_synth indicates the number of pixels in the vertical direction of the composite video. In terms of FIG. 2, the video processing processor 13 determines the resolution W_synth × H_synth of the composite video to be created and reserves a memory area for storing the composite video in the video processing memory 14.
 Next, the video processing unit 6 in FIG. 1 creates reference tables for the imaging devices Cam1, ..., CamN from the camera parameters (internal parameters, external parameters, lens distortion correction data, projection plane, and the like) of the imaging devices Cam1, ..., CamN input from the parameter input unit 5 in FIG. 1, and stores them in the storage unit 6a. In terms of FIG. 2, the video processing processor 13 creates the reference tables for the imaging devices Cam1, ..., CamN from the camera parameters input from the file interface 16 and stores them in the video processing memory 14.
 FIG. 3 is a diagram illustrating an example of the correspondence between pixels of the composite video and pixels of the plurality of imaging devices Cam1, ..., CamN in the video composition device 1 according to Embodiment 1.
 As shown in FIG. 3, the reference table for the imaging devices Cam1, ..., CamN stores, for each pixel of the composite video, the corresponding pixels (x_cam1, y_cam1), ..., (x_camN, y_camN) of the imaging devices Cam1, ..., CamN, as well as the α values (alpha values) at those pixels. x_cam1 indicates the x coordinate of a pixel of the image sensor of the imaging device Cam1 with camera number i = 1, and y_cam1 indicates the y coordinate of that pixel. The α value is a weighting coefficient used for the blend processing in the overlap regions of the imaging ranges of the imaging devices Cam1, ..., CamN. The α value is a camera parameter indicating the opacity of pixel data and takes a value in the range of 0 to 1, where α = 0 represents complete transparency and α = 1 represents complete opacity.
 If no pixel of the imaging devices Cam1, ..., CamN corresponds to a pixel of the composite video, a value indicating that no corresponding pixel exists is set in the reference table. The correspondence between the pixels (x_cam1, y_cam1), ..., (x_camN, y_camN) of the imaging devices Cam1, ..., CamN and a pixel (x_synth, y_synth) of the composite video can be computed by working backwards from the coordinates on the defined projection plane to the coordinates before projection conversion, before viewpoint conversion, and before lens distortion correction. x_synth denotes the x coordinate of a pixel of the composite video, and y_synth denotes its y coordinate. With the reference tables, the video composition process reduces to assigning, as the pixel value of each composite-video pixel (x_synth, y_synth), the pixel values of the corresponding pixels (x_cam1, y_cam1), ..., (x_camN, y_camN) of the imaging devices Cam1, ..., CamN.
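 As a concrete illustration of this back-calculation, the following host-side sketch fills one camera's table by walking each composite pixel backwards through the conversion chain. It is a sketch under stated assumptions, not the specification's code: invProjection, invViewpoint, invLensDistortion, and insideSensor are hypothetical stand-ins for the inverse mappings derived from the camera parameters.

```cuda
// Hypothetical inverse mappings derived from the camera parameters
// (declarations only; their implementations are outside this sketch).
void invProjection(int cam, float* u, float* v);
void invViewpoint(int cam, float* u, float* v);
void invLensDistortion(int cam, float* u, float* v);
bool insideSensor(int cam, float u, float v);

// Hypothetical LUT construction for one imaging device Cami: for each
// composite pixel, work backwards from the defined projection plane to the
// coordinates before projection conversion, before viewpoint conversion,
// and before lens distortion correction.
void buildLut(LutEntry* lut, int wSynth, int hSynth, int cam) {
    for (int ys = 0; ys < hSynth; ++ys) {
        for (int xs = 0; xs < wSynth; ++xs) {
            float u = (float)xs, v = (float)ys;
            invProjection(cam, &u, &v);      // undo projection conversion
            invViewpoint(cam, &u, &v);       // undo viewpoint conversion
            invLensDistortion(cam, &u, &v);  // undo lens distortion correction
            LutEntry e = { (int16_t)cam, (int16_t)u, (int16_t)v, 1.0f };
            if (!insideSensor(cam, u, v)) {
                e.cam = -1;  // no corresponding pixel for this camera
            }
            lut[(size_t)ys * wSynth + xs] = e;
        }
    }
}
```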
 However, when the video composition process uses the reference tables of the imaging devices Cam1, ..., CamN one after another, the processing time increases in proportion to the number of imaging devices Cam1, ..., CamN. Even when a parallel arithmetic device such as a GPU is used, processing waits occur during blend processing in the overlapping regions, so the processing time still grows as the number of imaging devices Cam1, ..., CamN increases.
 FIG. 4 is a diagram illustrating an example of overlapping regions of the imaging ranges of a plurality of imaging devices Cam1, ..., Cam4 in the video composition device 1 according to the first embodiment. Normally, when a panoramic composite video as shown in FIG. 4 is created, overlapping regions exist between the imaging ranges of the adjacent imaging devices Cam1, ..., Cam4. Blend processing is applied to the overlapping regions: the pixel values of the different imaging devices Cam1, ..., Cam4 are referenced, each pixel value is weighted by multiplying it by the weighting coefficient α, and the weighted pixel values are assigned (blended) as the pixel value of the corresponding pixel of the composite video. Because this requires the video data output from more than one of the imaging devices Cam1, ..., Cam4, a processing wait occurs. Since blend processing is performed in the overlapping region, the overlapping region is also called a blend region.
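 Put as a worked equation in the document's notation (the complementary-weight convention is an assumption of this illustration, not stated in the specification): for a composite-video pixel (x_synth, y_synth) lying in the overlapping region of two adjacent imaging devices Cami and Camj, the blended pixel value is

  α_i × (pixel value of Cami at (x_cami, y_cami)) + α_j × (pixel value of Camj at (x_camj, y_camj)),

where, inside the blend region, the weights are typically chosen so that α_i + α_j = 1, fading one camera out as the other fades in.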
 FIG. 5 is a diagram showing the pixel ranges of the imaging devices Cam1, ..., Cam4 contained in the reference table (first reference table) in the video composition device 1 according to the first embodiment. FIG. 6 is a diagram showing the pixel ranges of the imaging devices Cam1, ..., Cam4 contained in another reference table (second reference table) in the video composition device 1 according to the first embodiment. In the video composition device 1 according to the first embodiment, the video processing unit 6 creates reference tables for video composition from the reference tables of the imaging devices Cam1, ..., CamN. The reference tables for video composition are two tables holding the information for the upper side of the blend region (the left-hand imaging device) and the lower side of the blend region (the right-hand imaging device), namely the first reference table shown in FIG. 5 and the second reference table shown in FIG. 6.
 The first reference table for video composition holds the camera number i as first imaging device identification information, the corresponding pixel (x_cami, y_cami) of the imaging device Cami, and the α value of that pixel. The α value of pixels outside the overlapping regions is 1. The example of FIG. 5 shows the case of four imaging devices Cam1, ..., Cam4. FIG. 5 is only an example; the pixel ranges of the imaging devices Cam1, ..., Cam4 contained in the first reference table are not limited to those of FIG. 5.
 The second reference table for video composition holds the camera number i as second imaging device identification information, which identifies an imaging device having an overlapping region among the plurality of imaging ranges captured by the imaging devices Cam1, ..., CamN, the corresponding pixel (x_cami, y_cami) of the imaging device Cami, and the α value of the pixel in the overlapping region of the imaging device Cami. FIG. 6 is only an example; the pixel ranges of the imaging devices Cam1, ..., Cam4 contained in the second reference table are not limited to those of FIG. 6.
 The pixels outside the overlapping regions of the imaging ranges of the imaging devices Cam1, ..., CamN and the pixels belonging to the left-hand imaging device (or the right-hand imaging device) of each overlapping region can be assigned to the composite-video pixels simultaneously. The information for these pixels is held in the upper reference table (first reference table) shown in FIG. 5. The information for the pixels belonging to the right-hand imaging device (or the left-hand imaging device) of each overlapping region is held in the lower reference table (second reference table) shown in FIG. 6.
 For example, the pixel values of the pixels in the imaging ranges of the imaging devices Cam1, Cam2, Cam3, and Cam4 shown in FIG. 5 can be assigned to the composite-video pixels simultaneously using the first reference table. Likewise, the pixel values of the pixels in the overlapping regions of the imaging devices Cam2, Cam3, and Cam4 shown in FIG. 6 can be assigned simultaneously using the second reference table. In the first embodiment, by using a video processing processor that is a parallel arithmetic device such as a GPU together with the first and second reference tables, no processing wait occurs in the video composition process, and the composite video can be generated in two steps regardless of the number of imaging devices Cam1, ..., CamN: an assignment step using the first reference table and an assignment step using the second reference table.
[Video input processing]
 The video input interface 18 of the video reception unit 4 acquires one frame of video data from each of the imaging devices Cam1, ..., CamN and stores it in the main memory 11. The acquired video data is then transferred from the main memory 11 to the video processing memory 14.
[Video composition processing]
 The video processing processor 13 of the video processing unit 6 assigns the pixel values of the input video transferred to the video processing memory 14, using the first reference table and the second reference table, as the pixel values of the composite-video pixels corresponding to the input-video pixels. The procedure is described below.
 The following video composition processing is executed by the video processing processor 13 in parallel with the processing of the main processor 10.
 <1> First, as the first process, the video processing processor 13 retrieves from the first reference table, for each pixel (x_synth, y_synth) of the composite video, the corresponding camera number i, the corresponding pixel position (x_cami, y_cami) in the imaging device Cami with camera number i, and the weighting coefficient α.
 <2> Next, as the second process, the video processing processor 13 reads the pixel value of the input video at (x_cami, y_cami) of camera number i in the video processing memory 14, multiplies it by the weighting coefficient α, and assigns the result to the pixel (x_synth, y_synth) of the composite video in the video processing memory 14.
 Next, the video processing processor 13 executes the following video composition processing in parallel with the processing of the main processor 10.
 <3> First, as the third process, the video processing processor 13 retrieves from the second reference table, for each pixel (x_synth, y_synth) of the composite video, the corresponding camera number i, the corresponding pixel position (x_cami, y_cami) in the imaging device Cami with camera number i, and the weighting coefficient α.
 <4> Next, as the fourth process, the video processing processor 13 reads the pixel value of the input video at (x_cami, y_cami) of camera number i in the video processing memory 14, multiplies it by the weighting coefficient α, and assigns the result to the pixel (x_synth, y_synth) of the composite video in the video processing memory 14. As a result, blend processing is applied to the pixels of the overlapping regions of the composite video.
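 These four processes map naturally onto a GPU with one thread per composite-video pixel. The following CUDA kernel is a hypothetical sketch, not code from the specification: the LutEntry layout repeats the earlier sketch so the file compiles on its own, single-channel (grayscale) frames are assumed for brevity, and the second pass is assumed to accumulate into the destination so that the two α-weighted contributions sum to the blended value.

```cuda
#include <cstdint>

struct LutEntry {           // repeated from the earlier sketch
    int16_t cam, x, y;      // camera number i and pixel (x_cami, y_cami)
    float   alpha;          // weighting coefficient α
};

// One composition pass: each thread handles one composite pixel.
// Pass 1 uses the first reference table with accumulate = false (plain
// assignment); pass 2 uses the second reference table with accumulate = true
// (adds the remaining α-weighted term, i.e. the blend in overlapping regions).
__global__ void composePass(const LutEntry* lut,
                            const uint8_t* const* camFrames,  // camFrames[i] = frame of Cami
                            int camPitch,                     // bytes per camera row
                            uint8_t* synth, int wSynth, int hSynth,
                            bool accumulate)
{
    int xs = blockIdx.x * blockDim.x + threadIdx.x;
    int ys = blockIdx.y * blockDim.y + threadIdx.y;
    if (xs >= wSynth || ys >= hSynth) return;

    LutEntry e = lut[ys * wSynth + xs];
    if (e.cam < 0) return;  // no corresponding camera pixel in this table

    float v = e.alpha * camFrames[e.cam][e.y * camPitch + e.x];
    float base = accumulate ? (float)synth[ys * wSynth + xs] : 0.0f;
    synth[ys * wSynth + xs] = (uint8_t)(base + v + 0.5f);
}
```

 Because each table holds at most one camera per composite pixel, no two threads within a pass write the same output pixel, which is exactly why the per-pass assignments can run fully in parallel with no processing wait.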
 FIG. 7 is a flowchart showing the operation of the video composition device according to the first embodiment (that is, the video composition method according to the first embodiment). After creating the reference tables in the initialization process (step S1), the video processing unit 6 repeats the video input process (step S2) and the video composition process (step S3) until the video input ends (step S4). If the positions of the imaging devices Cam1, ..., CamN shift, the video processing unit 6 can correct the positional shift using feature points in the video and build new reference tables in the background; by then swapping the reference tables currently in use for the new ones, it can produce a realigned composite video.
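 A minimal sketch of the background table swap described above, assuming the two composition tables are double-buffered and published together through an atomic pointer (the names and the double-buffering scheme are assumptions, not taken from the specification):

```cuda
#include <atomic>

struct LutSet { const LutEntry* pass1; const LutEntry* pass2; };

std::atomic<const LutSet*> g_activeLut;  // read by the render loop every frame

// Called by the background thread once the realigned tables are ready.
void publishRealignedTables(const LutSet* fresh) {
    g_activeLut.store(fresh, std::memory_order_release);
}

// Called by the render loop at the start of each frame.
const LutSet* currentTables() {
    return g_activeLut.load(std::memory_order_acquire);
}
```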
 The display processing unit 7 transmits the panoramic composite video data created by the video processing unit 6 to the display device 2 as composite video data. The display device 2 displays video based on the received panoramic composite video data. The display device 2 may display the panoramic composite video on a single display screen or across a plurality of display screens, and may also cut out and display only a partial area of the panoramic composite video.
<<1-3>> Effects
 As described above, with the video composition device 1 and the video composition method according to the first embodiment, the decoding load of the input video data grows with the number of imaging devices Cam1, ..., CamN, but the load of the video composition processing of the video acquired by the imaging devices Cam1, ..., CamN hardly increases.
 When lens distortion correction, viewpoint conversion, and projection conversion are applied separately to the input video of each of the imaging devices Cam1, ..., CamN, the processing time increases with the number of imaging devices Cam1, ..., CamN. Even when a reference table is prepared per imaging device so that lens distortion correction, viewpoint conversion, and projection conversion are bundled into a single lookup, processing waits still occur in the overlapping regions at the boundaries between the imaging devices Cam1, ..., CamN, so the processing time again increases with the number of imaging devices.
 The video composition device 1 and the video composition method according to the first embodiment focus on these processing waits between the videos of adjacent imaging devices with overlapping regions: by building each reference table only from data whose pixels can be assigned simultaneously, the video composition process can be carried out in a number of steps equal to the maximum number of imaging devices sharing an overlapping region. In the first embodiment, since at most two of the imaging devices Cam1, ..., CamN share an overlapping region in the panoramic composite video, the composition can be executed in two steps: one using the first reference table and one using the second reference table.
<<2>> Embodiment 2
<<2-1>> Configuration
 The first embodiment described a video composition device and a video composition method for generating one composite video (a panoramic video) from a plurality of videos arranged in the left-right direction. In contrast, Embodiment 2 of the present invention describes a video composition device and a video composition method for generating one composite video (an overhead video) from a plurality of videos arranged in the up-down and left-right directions.
 Embodiment 2 differs from Embodiment 1 in the arrangement of the plurality of imaging devices Cam1, ..., CamN and in that the video processing unit 6 of FIG. 1 (or the video processing processor 13 of FIG. 2) performs the video composition process using a reference table (first reference table) and other reference tables (second to fourth reference tables). Except for these points, Embodiment 2 is the same as Embodiment 1. Accordingly, the description of Embodiment 2 also refers to FIGS. 1, 2, and 7 used in the description of Embodiment 1.
<<2-2>> Operation
 FIG. 8 is a diagram illustrating an example of the overlapping regions of the trapezoidal imaging ranges of a plurality of imaging devices Cam1, ..., Cam9 in the video composition device according to Embodiment 2. As shown in FIG. 8, when an overhead composite video is created, one large overhead video can be produced by applying viewpoint conversion and projection conversion to the videos acquired by the plurality of imaging devices Cam1, ..., Cam9. FIG. 8 shows one example of the arrangement of the imaging devices Cam1, ..., Cam9 and does not limit how the imaging devices may be arranged.
 When the plurality of imaging devices Cam1, ..., Cam9 are arranged as shown in FIG. 8, at most four imaging devices (above, below, left, and right) correspond to the same pixel of an overlapping region of the composite video (41 in FIG. 8). As with the generation of the panoramic composite video, by creating four reference tables (the first to fourth reference tables), each composed only of pixels that can be assigned simultaneously to the composite-video pixels, the video composition process can be executed in four steps regardless of the number of imaging devices.
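 Generalizing, the number of passes equals the maximum number of imaging devices overlapping at any composite pixel: two for the panorama, four here. A hypothetical host-side driver for the composition kernel sketched earlier (names continued from those sketches, not from the specification):

```cuda
// K = 2 for the panoramic composition, K = 4 for the overhead composition
// of FIG. 8. tables[0] is the first reference table; later tables hold the
// remaining overlap contributions and are accumulated.
void composeFrame(const LutEntry* const* tables, int K,
                  const uint8_t* const* camFrames, int camPitch,
                  uint8_t* synth, int wSynth, int hSynth)
{
    dim3 block(16, 16);
    dim3 grid((wSynth + block.x - 1) / block.x,
              (hSynth + block.y - 1) / block.y);
    for (int k = 0; k < K; ++k) {
        composePass<<<grid, block>>>(tables[k], camFrames, camPitch,
                                     synth, wSynth, hSynth,
                                     /*accumulate=*/(k > 0));
    }
}
```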
 FIG. 9 is a diagram illustrating an example of the overlapping regions of the imaging ranges of the plurality of imaging devices Cam1, ..., Cam9 in the video composition device according to Embodiment 2. In FIG. 9, the imaging ranges are drawn as rectangles to simplify the example of FIG. 8. FIG. 10 is a diagram showing the first reference table for the imaging ranges in the video composition device according to Embodiment 2. FIGS. 11 to 13 show the second, third, and fourth reference tables, which are the other reference tables for the imaging ranges (overlapping regions) in the video composition device according to Embodiment 2. For ease of explanation, FIGS. 9 to 13 show the videos of the imaging devices Cam1, ..., CamN after projection conversion as rectangles, but the processing is the same for trapezoids or other shapes. The first to fourth reference tables shown in FIGS. 9 to 13 are examples; the shapes and number of the reference tables are not limited to those of FIGS. 9 to 13.
[Video input processing]
 When the video reception unit 4 has finished acquiring one frame of input video from each of the imaging devices Cam1, ..., CamN, the input video data is transferred from the main memory 11 to the video processing memory 14.
[Video composition processing]
 The pixel values of the input video transferred to the video processing memory 14 are assigned to the composite video using the first to fourth reference tables described above. The procedure is as follows.
 The video processing processor 13 of the video processing unit 6 executes the following operations in parallel.
 <11> In the first process, the video processing processor 13 retrieves from the first reference table shown in FIG. 10, for each pixel (x_synth, y_synth) of the composite video, the corresponding camera number i, the corresponding pixel position (x_cami, y_cami) in the imaging device Cami with camera number i, and the weighting coefficient α.
 <12> In the second process, the video processing processor 13 reads the pixel value of the pixel (x_cami, y_cami) of the input video of the imaging device Cami with camera number i in the video processing memory 14, multiplies it by the weighting coefficient α, and assigns the result to the pixel (x_synth, y_synth) of the composite video in the video processing memory 14.
 <13> In the third process, the video processing processor 13 performs the same processing as the processes <11> and <12> on the second reference table.
 <14> In the fourth process, the video processing processor 13 performs the same processing as the processes <11> and <12> on the third reference table.
 <15> In the fifth process, the video processing processor 13 performs the same processing as the processes <11> and <12> on the fourth reference table.
 The overall processing procedure of the video processing unit is the same as that of FIG. 7, and the operation of the display processing unit is the same as in Embodiment 1.
<<2-3>> Effects
 With the video composition device and the video composition method according to Embodiment 2, the decoding load of the input video data grows with the number of imaging devices Cam1, ..., CamN, but the load of the video composition processing of the video acquired by the imaging devices Cam1, ..., CamN hardly increases.
 Furthermore, the video composition device and the video composition method according to Embodiment 2 focus on the processing waits between the videos of adjacent imaging devices with overlapping regions: by building each reference table only from data whose pixels can be assigned simultaneously, the video composition process can be carried out in only as many steps as the maximum number of imaging devices sharing an overlapping region. Thus, even if the number of imaging devices Cam1, ..., CamN increases, the composite video can be created in the same processing time. For example, in the case of the overhead composite video, the processing can be executed in four steps.
 In the case of the overhead composite video, when the individual cameras are arranged as shown in FIG. 8, the processing can be executed in at most four steps.
 1 video composition device; 2 display device; 4 video reception unit; 5 parameter input unit; 6 video processing unit; 6a storage unit; 7 display processing unit; 10 main processor; 11 main memory; 12 auxiliary memory; 13 video processing processor; 14 video processing memory; 15 input interface; 16 file interface; 17 display interface; 18 video input interface; Cam1, ..., Cami, ..., CamN imaging devices (cameras).

Claims (6)

  1.  A video composition device that generates one composite video from a plurality of videos acquired by a plurality of imaging devices, the device comprising:
     a video reception unit that receives the plurality of videos;
     a parameter input unit to which camera parameters of the plurality of imaging devices are input; and
     a video processing unit that generates the composite video from the plurality of videos,
     wherein the video processing unit
     creates, using the camera parameters input in advance, a reference table containing, for each pixel of the composite video, first imaging device identification information identifying the corresponding imaging device among the plurality of imaging devices, a corresponding first pixel position in the imaging device identified by the first imaging device identification information, and a first weighting coefficient at the corresponding first pixel position, and
     generates the composite video by referring to the reference table and assigning, to each pixel of the composite video, a first value obtained by multiplying the pixel value at the corresponding first pixel position in the imaging device identified by the first imaging device identification information by the first weighting coefficient.
  2.  The video composition device according to claim 1, wherein the video processing unit
     creates, using the camera parameters, another reference table containing, for each pixel of the composite video, second imaging device identification information identifying one of the imaging devices having an overlapping region, which is an imaging range that overlaps another among the plurality of imaging ranges captured by the plurality of imaging devices, a second pixel position corresponding to the overlapping region in the imaging device identified by the second imaging device identification information, and a second weighting coefficient at the second pixel position corresponding to the overlapping region, and
     generates the portion of the composite video corresponding to the overlapping region by referring to the other reference table and performing blend processing by assigning, to each pixel of the composite video that corresponds to a second pixel position in the overlapping region, a second value obtained by multiplying the pixel value at the corresponding pixel position in the imaging device identified by the second imaging device identification information by the second weighting coefficient.
  3.  The video composition device according to claim 2, wherein, among the plurality of imaging ranges captured by the plurality of imaging devices, the mutually overlapping imaging ranges overlap in the left-right direction, the composite video is a panoramic video, and
     the other reference table is a single table.
  4.  The video composition device according to claim 2, wherein, among the plurality of imaging ranges captured by the plurality of imaging devices, the mutually overlapping imaging ranges overlap in the up-down and left-right directions, the composite video is an overhead video, and
     the other reference tables are three tables.
  5.  A video composition method for generating one composite video from a plurality of videos acquired by a plurality of imaging devices, the method comprising:
     a step of creating, using camera parameters input in advance for the plurality of imaging devices, a first reference table containing, for each pixel of the composite video, first imaging device identification information identifying the corresponding imaging device among the plurality of imaging devices, a corresponding first pixel position in the imaging device identified by the first imaging device identification information, and a first weighting coefficient at the corresponding first pixel position; and
     a step of generating the composite video by referring to the first reference table and assigning, to each pixel of the composite video, a first value obtained by multiplying the pixel value at the corresponding first pixel position in the imaging device identified by the first imaging device identification information by the first weighting coefficient.
  6.  The video composition method according to claim 5, further comprising:
     a step of creating, using the camera parameters, a second reference table containing, for each pixel of the composite video, second imaging device identification information identifying one of the imaging devices having an overlapping region, which is an imaging range that overlaps another among the plurality of imaging ranges captured by the plurality of imaging devices, a second pixel position corresponding to the overlapping region in the imaging device identified by the second imaging device identification information, and a second weighting coefficient at the second pixel position corresponding to the overlapping region; and
     a step of generating the portion of the composite video corresponding to the overlapping region by referring to the second reference table and performing blend processing by assigning, to each pixel of the composite video that corresponds to a second pixel position in the overlapping region, a second value obtained by multiplying the pixel value at the corresponding pixel position in the imaging device identified by the second imaging device identification information by the second weighting coefficient.
PCT/JP2016/083316 2016-11-10 2016-11-10 Image synthesis device and image synthesis method WO2018087856A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
JP2018549688A JP6513305B2 (en) 2016-11-10 2016-11-10 Video combining apparatus and video combining method
PCT/JP2016/083316 WO2018087856A1 (en) 2016-11-10 2016-11-10 Image synthesis device and image synthesis method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2016/083316 WO2018087856A1 (en) 2016-11-10 2016-11-10 Image synthesis device and image synthesis method

Publications (1)

Publication Number Publication Date
WO2018087856A1 true WO2018087856A1 (en) 2018-05-17

Family

ID=62109506

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2016/083316 WO2018087856A1 (en) 2016-11-10 2016-11-10 Image synthesis device and image synthesis method

Country Status (2)

Country Link
JP (1) JP6513305B2 (en)
WO (1) WO2018087856A1 (en)


Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2008048266A (en) * 2006-08-18 2008-02-28 Matsushita Electric Ind Co Ltd On-vehicle image processor and viewpoint change information generating method
WO2015029934A1 (en) * 2013-08-30 2015-03-05 クラリオン株式会社 Camera calibration device, camera calibration system, and camera calibration method

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPWO2019225255A1 (en) * 2018-05-21 2021-02-18 富士フイルム株式会社 Image correction device, image correction method, and image correction program
US11198393B2 (en) * 2019-07-01 2021-12-14 Vadas Co., Ltd. Method and apparatus for calibrating a plurality of cameras
WO2021192096A1 (en) * 2020-03-25 2021-09-30 三菱電機株式会社 Image processing device, image processing method, and image processing program
JPWO2021192096A1 (en) * 2020-03-25 2021-09-30
JP7038935B2 (en) 2020-03-25 2022-03-18 三菱電機株式会社 Image processing device, image processing method, and image processing program

Also Published As

Publication number Publication date
JPWO2018087856A1 (en) 2019-04-11
JP6513305B2 (en) 2019-05-15


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 16921180; Country of ref document: EP; Kind code of ref document: A1)
ENP Entry into the national phase (Ref document number: 2018549688; Country of ref document: JP; Kind code of ref document: A)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 16921180; Country of ref document: EP; Kind code of ref document: A1)