WO2023119488A1

WO2023119488A1 - Video compositing system, video compositing method, and video compositing program

Info

Publication number: WO2023119488A1
Application number: PCT/JP2021/047582
Authority: WO
Inventors: 広夢宮下; 真二深津; 英一郎松本; 麻衣子井元
Original assignee: 日本電信電話株式会社
Priority date: 2021-12-22
Filing date: 2021-12-22
Publication date: 2023-06-29

Abstract

A video compositing system according to one embodiment comprises: a compositing instruction unit that generates compositing information indicative of the image blocks an output image is to be composed of; a compositing information transmission unit that transmits the compositing information; an image reception unit that receives a plurality of transmission image blocks which have been generated on the basis of the compositing information; an composite image unit that composites the plurality of transmission image blocks into the output image on the basis of the compositing information; and a display unit that displays the output image.

Description

Video Synthesis System, Video Synthesis Method, and Video Synthesis Program

The present invention relates to a video synthesizing system, a video synthesizing method, and a video synthesizing program.

As a video synthesis technology, a method is known in which a plurality of video signals are input and a desired output image is generated by synthesizing the plurality of video signals. These methods are generally realized by functions such as mixer/keyer provided in devices such as live production switchers.

Japanese Patent Laid-Open No. 2004-100000 discloses a multi-point communication conference system in which a bit rate of image data can be freely changed for each client terminal, and means for arbitrarily switching between a synthesized image of a client and a specific personal image is provided, thereby allowing an individual It discloses a technique for viewing images with high definition.

Japanese Patent No. 3936707

In Patent Document 1, in one server device and a plurality of client terminals connected via a network, the client terminal transmits video to the server device in a scalable bitstream by hierarchical coding, and is returned from the server device. Transmission bandwidth is saved by enabling the client terminal to specify the video.

However, since Patent Document 1 assumes control by a scalable bitstream, image quality tends to deteriorate in video with a low bit rate. In addition, since Patent Document 1 assumes only simultaneous display of a plurality of images with respect to image synthesis by a server device, deterioration in image quality tends to appear on the finally displayed image. Furthermore, there is a problem that it is not possible to cope with image effects such as mixing in which the display area of each image cannot be clearly divided.

SUMMARY OF THE INVENTION The object of the present invention is to address the above-mentioned circumstances, and the object of the present invention is to notify in advance of synthesis information regarding video synthesis, and to allow the server on the transmission side to process and compress the input image based on the synthesis information. to provide a technique for realizing high-definition transmission and synthesis of a plurality of images even on a transmission path with a narrow transmission band.

In order to solve the above problems, a video composition system according to one aspect of the present invention includes a composition instruction unit that generates composition information indicating what kind of image blocks an output image is composed of, and a composition instruction unit that transmits the composition information. A synthesis information transmission unit, an image reception unit that receives a plurality of transmission image blocks generated based on the synthesis information, and a synthesis image unit that synthesizes the output image from the plurality of transmission image blocks based on the synthesis information. and a display section for displaying the output image.

According to one aspect of the present invention, synthesizing information related to video synthesizing is notified in advance, and the server on the transmitting side processes and compresses the input image based on the synthesizing information. It is possible to provide high-definition transmission and composition of video.

FIG. 1 is a diagram illustrating a configuration example of a video synthesizing system according to an embodiment. FIG. 2 is a diagram showing a configuration example of the video synthesizing system shown in FIG. FIG. 3 is a schematic diagram illustrating an example of the configuration of a computer functioning as a transmission server according to the embodiment; FIG. 4 is a schematic diagram illustrating an example of the configuration of a computer functioning as a receiving server according to the embodiment; FIG. 5 is a flowchart illustrating an example of processing operations of the transmission server. FIG. 6 is a flow chart showing an example of the processing operation of the receiving server. FIG. 7 is a diagram showing an example of an input image block. FIG. 8 is a diagram showing an example of a transfer image block in picture-in-picture and a synthesis result. FIG. 9 is a diagram illustrating an example of synthesis information in picture-in-picture. FIG. 10 is a diagram showing an example of synthesis information in alpha blending. FIG. 11 is a diagram showing an example of transferred image blocks and synthesis results in alpha blending. FIG. 12 is a diagram showing an example of composition information when image processing is performed by composition information alpha blending. FIG. 13 is a diagram showing the transmission timing of transmission image blocks and image synthesis in dissolve. FIG. 14 is a diagram illustrating an example of synthesis information in dissolve.

Hereinafter, embodiments according to the present invention will be described with reference to the drawings. Elements that are the same as or similar to elements that have already been explained are denoted by the same or similar reference numerals, and overlapping explanations are basically omitted.

(composition)
FIG. 1 is a diagram illustrating a configuration example of a video synthesizing system according to an embodiment.
The video synthesizing system 1 according to the present embodiment includes a first transmission server 2A located at a first site A, a second transmission server 2B located at a second site B, and a transmission server 2B located at a site C. a third transmission server 2C, a fourth transmission server 2D located at a site D, a reception server 3, and a network 4 for relaying video signals and the like transmitted between them. For simplicity of explanation, the first transmission server 2A, the second transmission server 2B, the third transmission server 2C, and the fourth transmission server 2D are simply the transmission server 2 when there is no need to distinguish them. and described. Also, although the example shown in FIG. 1 shows a plurality of transmission servers 2 and a single reception server 3, the number of transmission servers 2 may be one, and the number of reception servers 3 may be plural. is of course.

The transmission server 2 is located at each base. The transmission server 2 transmits the video shot at each base to the reception server 3 via the network 4 .

The receiving server 3 is installed at any location. The reception server 3 can synthesize the video received from the transmission server 2 and output the synthesized image.

FIG. 2 is a diagram showing a configuration example of the video synthesizing system 1 shown in FIG.
In the example of FIG. 2, only the first transmission server 2A and the second transmission server 2B are shown for simplification of explanation.

The first transmission server 2A includes a first imaging section 21A, a deformation section 22, an image transmission section 23, and a synthesis information reception section 24. Although the second transmission server 2B has the same configuration as the first transmission server 2A, only the second imaging unit 21B is shown in the example of FIG. 2 for simplification. Further, the example of FIG. 2 shows an example in which the first imaging unit 21A and the second imaging unit 21B are arranged outside the first transmission server 2A and the second transmission server 2B. Of course, the imaging unit 21A may be arranged inside the first transmission server 2A, and the second imaging unit 21B may be arranged inside the second transmission server 2B. For simplification of explanation, the first imaging section 21A and the second imaging section 21B are simply referred to as the imaging section 21 when there is no need to distinguish between them.

The receiving server 3 includes a first image receiving section 31A, a second image receiving section 31B, an image synthesizing section 32, a synthesizing information transmitting section 33, a display section 34, and a synthesizing instruction section 35. In the example of FIG. 2, the display unit 34 and the composition instruction unit 35 are arranged outside the receiving server 3, but the display unit 34 and the composition instruction unit 35 may be arranged inside the reception server 3. Of course. For simplicity of explanation, the first image receiving section 31A and the second image receiving section 31B are simply referred to as the image receiving section 31 when there is no need to distinguish them.

The imaging unit 21 shoots images at the base. The imaging unit 21 acquires an input image block from a part of the video frame of the captured video and outputs it to the transformation unit 22 .

The transformation unit 22 transforms the input image block received from the imaging unit 21 based on the composition information received from the composition information reception unit 24, which will be described later, to generate the transmission image block. Here, the composition information indicates what kind of image blocks the output image is composed of. The transformation unit 22 then outputs the transmission image block to the image transmission unit 23 . A method for generating transmission image blocks from input image blocks will be described later.

The image transmission unit 23 transmits the transmission image block to the first image reception unit 31A of the reception server 3, which will be described later, via the network 4.

The combining information receiving unit 24 receives combining information from the receiving server 3 via the network 4 and outputs the received combining information to the transforming unit 22 .

The second transmission server 2B is the same as the first transmission server 2A, so the description is omitted here. Note that the image transmission unit 23 of the second transmission server 2B transmits the transmission image block to the second image reception unit 31B of the reception server 3, which will be described later, via the network 4. FIG.

Next, the receiving server 3 will be described in detail.
The first image reception unit 31A and the second image reception unit 31B output the transmission image blocks received from the first transmission server 2A and the second transmission server 2B to the image composition unit 32, respectively.

The image synthesizing unit 32 synthesizes the output image from the transmission image block based on the synthesizing information received from the synthesizing instruction unit 35 to be described later, and outputs the synthesized output image to the display unit 34 .

The combining information transmission unit 33 transmits the combining information received from the combining instruction unit 35 (to be described later) to the transmission server 2 .

The display unit 34 displays the output image received from the image synthesizing unit 32 .

The synthesizing instruction unit 35 generates synthesizing information. The synthesis information may be, for example, manually input or generated by parameters set by the administrator of the receiving server 3, or may be automatically generated by the synthesis instruction unit 35 reading settings stored in the data memory 303 or the like. It may be generated in

It is assumed that the network 4 stably secures a band sufficient for transmitting one video stream. For example, it is assumed that the network 4 has a band capable of stably transmitting and receiving 4K UHD (Ultra High Definition) resolution 60 fps video at 12 Gbps.

FIG. 3 is a schematic diagram showing an example of the configuration of a computer functioning as the transmission server 2 according to the embodiment.
The transmission server 2, as shown in FIG. 3, is composed of a computer device and has a processor 201 such as a CPU. A program memory 202 , a data memory 203 , a communication interface 204 and an input/output interface 205 are connected to the processor 201 via a bus 206 .

The program memory 202 includes storage media such as EPROM (Erasable Programmable Read Only Memory), HDD (Hard Disk Drive), SSD (Solid State Drive) and other non-volatile memories that can be written and read at any time, and ROM (Read Only Memory) can be used in combination with non-volatile memory. The program memory 202 stores programs necessary for executing various processes. That is, all of the various processes described below can be realized by reading and executing a program stored in the program memory 202 by the processor 201 .

The data memory 203 is storage that uses, as a storage medium, a combination of a non-volatile memory that can be written and read at any time, such as an HDD or memory card, and a volatile memory such as a RAM (Random Access Memory). The data memory 203 is used to store data acquired and generated while the processor 201 executes the program and performs various processes.

The communication interface 204 includes one or more wired or wireless communication modules. For example, the communication interface 204 includes a communication module for wired or wireless connection with other devices including the receiving server 3 . Additionally, the communication interface 204 may include a wireless communication module for wirelessly connecting with other devices using short-range wireless technology. That is, the communication interface 204 may be a general communication interface as long as it can communicate with other devices and transmit and receive various information under the control of the processor 201 .

An input unit 207 and a display unit 208 are connected to the input/output interface 205 .

The input unit 207 includes at least one camera or video camera. For example, the camera or video camera included in the input unit 207 captures a target captured image and outputs the captured image to the input/output interface 205 . Further, the input unit 207 is, for example, an input detection sheet that employs an electrostatic method or a pressure method and is arranged on the display screen of a display device that is the display unit 208. Through the input/output interface 205, the transmission server 2 administrator's touch position is output to the processor 201 .

The display unit 208 is a display device using, for example, liquid crystal, organic EL (Electro Luminescence), etc., and displays images and messages according to signals input from the input/output interface 205 .

FIG. 4 is a schematic diagram showing an example of the configuration of a computer functioning as the reception server 3 according to the embodiment.
The receiving server 3, as shown in FIG. 4, is composed of a computer device and has a processor 301 such as a CPU. A program memory 302 , a data memory 303 , a communication interface 304 and an input/output interface 305 are connected to the processor 301 via a bus 306 .

For the program memory 302, as a storage medium, for example, non-volatile memory such as EPROM, HDD, SSD, etc., which can be written and read at any time, and non-volatile memory such as ROM can be used in combination. The program memory 302 stores programs necessary for executing various processes. That is, all of the various processes described below can be realized by reading and executing a program stored in the program memory 302 by the processor 301 .

The data memory 303 is storage that uses, as a storage medium, a combination of a non-volatile memory that can be written and read at any time, such as an HDD or memory card, and a volatile memory such as a RAM. The data memory 303 is used to store data acquired and generated while the processor 301 executes the program and performs various processes.

The communication interface 304 includes one or more wired or wireless communication modules. For example, the communication interface 304 includes a communication module that makes wired or wireless connections with other devices, including the transmission server 2 . Additionally, the communication interface 304 may include a wireless communication module for wirelessly connecting with other devices using short-range wireless technology. That is, the communication interface 304 may be a general communication interface as long as it can communicate with other devices under the control of the processor 301 and transmit and receive various kinds of information.

An input unit 307 and a display unit 308 are connected to the input/output interface 305 .

The input unit 307 is, for example, an input detection sheet that employs an electrostatic method or a pressure method and is arranged on the display screen of a display device, which is the display unit 308 . The administrator's touch position is output to the processor 301 .

The display unit 308 is a display device using, for example, liquid crystal, organic EL (Electro Luminescence), etc., and displays images and messages according to signals input from the input/output interface 305 .

(motion)
FIG. 5 is a flow chart showing an example of the processing operation of the transmission server 2, and FIG. 6 is a flow chart showing an example of the processing operation of the reception server 3. As shown in FIG.
When the processor 201 of the transmission server 2 reads and executes the program stored in the program memory 202, the operation of the flowchart shown in FIG. 5 is realized. Similarly, the processor 301 of the receiving server 3 reads and executes the program stored in the program memory 302, thereby realizing the operation of the flowchart shown in FIG.

　Here, it is assumed that the processing operations shown in FIGS. 5 and 6 are performed for each frame, and are performed continuously and in parallel each time a frame is input.

The operation may be started when it is necessary to create synthesis information, for example, triggered by a video effect (Digital Video Effect, etc.) realized by a live production switcher, etc. Alternatively, the operation may be started at arbitrary timing. For example, it may be initiated when there is a need to create synthetic information.

First, the processing operation of the transmission server 2 will be described with reference to FIG.
Synthesis information receiving section 24 receives synthesis information (step ST101). Here, the combining information is generated by the combining instruction section 35 of the receiving server 3 . The synthesis information receiving section 24 outputs the received synthesis information to the transformation section 22 . Details of the information included in the synthesis information will be described later. Here, the composite information may be received at any timing before step ST102 is processed. That is, it may be received in advance by the transmission server 2 and stored in the data memory 203 or the like. Then, each unit may be read out from the data memory 203 and used when necessary. Further, step ST101 may of course be processed in parallel with step ST102, which will be described later.

The imaging unit 21 acquires an input image block (step ST102). The imaging unit 21 acquires a video frame and acquires a part of the acquired video frame as an input image block. Here, if the video frame cannot be obtained part by part, one frame may be buffered to obtain an input image, and then divided into input image blocks. The imaging unit 21 then outputs the input image block to the deformation unit 22 .

FIG. 7 is a diagram showing an example of an input image block.

(a) of FIG. 7 is the input image A acquired by the imaging unit 21 of the first transmission server 2A at the first site A, and (b) is a diagram showing an example of an input image block. As shown in (b), it is assumed that the input image block is divided into 16 blocks of 4 vertical×4 horizontal, such as A-01 to A-16. FIG. 7C is an input image B acquired by the imaging unit 21 of the second transmission server 2B at the second site B, and FIG. 7C is a diagram showing an example of an input image block. . In (d) as well as (b), it is assumed that the input image block is divided into 16 blocks of 4 vertical×4 horizontal, for example, B-01 to B-16.

The transformation unit 22 transfers the input image block to the transmission image block (step ST103). The transformation unit 22 transfers the pixels included in the input image block to the transmission image block based on the synthesis information. Further, when it is necessary to transform the input image block according to the content of the synthesis information, the transformation unit 22 may set different pixel coordinates before and after the pixel reference and transfer.

The transformation unit 22 compresses the transmission image block (step ST104). A transformation unit 22 compresses the transmission image block based on the synthesis information. Here, since the compression method may be a general method, detailed description thereof is omitted here. Note that step ST104 may be skipped if there is no compression flag in the combined information. Then, the transformation unit 22 outputs the compressed or skipped transmission image blocks to the image transmission unit 23 .

The image transmission unit 23 transmits the transmission image block to the receiving server 3 (step ST105). The image transmission unit 23 receives the transmission image block from the transformation unit 22 and transmits the received transmission image to the reception server 3 through the network 4 .

Next, the processing operation of the receiving server 3 will be described with reference to FIG.
Synthesis information transmitting section 33 transmits the synthesis information to transmission server 2 (step ST201). The composition instruction unit 35 outputs composition information to the composition information transmission unit 33 and the image composition unit 32 . After receiving the combined information, the combined information transmission unit 33 transmits the combined information to each transmission server 2 through the network 4 . Note that the synthesis information may be transmitted in advance before frame processing is started. In other words, the processing of step ST201 may be performed at any timing as long as the frame is transmitted before or during processing by the transmission server 2 .

The image receiving unit 31 receives the transmission image block (step ST202). For example, the first image receiving unit 31A receives the transmission image block generated based on the input image A from the first transmission server 2A at the base A, and the second image receiving unit 31B receives the transmission image block at the base B. A transmission image block generated based on the input image B is received from the second transmission server 2B. The first image receiving section 31A and the second image receiving section 31B output the received transmission image blocks to the image synthesizing section 32, respectively.

The image synthesizing unit 32 determines whether or not the transmission image block is a single block (step ST203). For example, when the transmission image block is received only from the first transmission server 2A, the image synthesizing unit 32 determines that the transmission image block is single. In this case, the process proceeds to step ST204. On the other hand, when transmission image blocks are received from the first transmission server 2A and the second transmission server 2B, it is determined that there are a plurality of transmission image blocks. In this case, the process proceeds to step ST205.

The image synthesizing unit 32 inserts the transmission image block into the output image (step ST204). The image synthesizing unit 32 generates an output image by arranging the transmission image blocks in sequence.

The image synthesizing unit 32 generates an output image from a plurality of transmission image blocks (step ST205). The image composition unit 32 combines and/or blends the transmission image blocks based on the composition information to generate an output image.

The image composition unit 32 outputs the output image to the display unit 34 (step ST206). The image synthesizing section 32 outputs the generated output image to the display section 34 . The display unit 34 displays video frames according to the output image.

(Example of composite pattern in picture-in-picture)
In the following, a description will be given of a case in which the synthesized image is "superimposed on the input image B with the input image A reduced to 1/2" based on a synthesis pattern generally called picture-in-picture.

FIG. 8 is a diagram showing an example of a transfer image block and a synthesis result (output image) in picture-in-picture.
(a) of FIG. 8 is the input image A reduced to 1/2, and (b) shows an input image block corresponding to the input image A of (a). (c) is obtained by deleting a portion of the input image B that overlaps with the input image A, and (d) shows an input image block corresponding to the input image B in (c). (e) shows the image of the final synthesis result, that is, the output image, and (f) shows the image block corresponding to (e). Here, R-xx (xx is the serial number of the image block) indicates the transmission image block.

That is, the first transmission server 2A transmits the transfer image block shown in (b) to the reception server 3, and the second transmission server 2B sends the transfer image block shown in (d) to the reception server 3. Then, the image synthesizing unit 32 of the receiving server 3 combines these transferred image blocks to generate an output image shown in (e).

As shown in (e), of the output image, the input image B is assigned to the image block corresponding to the peripheral portion, and the input image A is assigned to the image block corresponding to the central portion of the image. Then, the image synthesizing unit 32 generates an output image by combining the input image A and the input image B based on the synthesizing information.

FIG. 9 is a diagram illustrating an example of synthesis information in picture-in-picture.
FIG. 9(a) shows how input image A and input image B are transformed to generate transmission image blocks. For example, the first column in (a) indicates the transmission image block, the second column indicates which input image block to use for the transmission image block, and the third column indicates the input image block to use. It shows reference flags that indicate how to transform or not. (b) indicates how to compress the transmitted image block. As shown in (a), R-06, R-07, R-10, and R-11 are each assigned an input image block of input image A, and a reference flag for scaling 0.5 is set. there is Input image blocks of input image B are assigned to R-01 to R-05, R-08 to R-09, and R-12 to R-16, respectively, and reference flags for copying input image B are set. ing. That is, these transmission image blocks are generated by directly transferring the input image B without transforming it.

It should be noted that when using the composite information described with reference to FIG. 9, the output image block corresponds to either the transfer image block received from the first transmission server 2A or the second transmission server 2B, so there is no need to perform blending. . Therefore, the image synthesizing unit 32 can generate an output image by arranging and combining the transmission image blocks received from the first transmission server 2A and the second transmission server 2B according to the serial numbers.

Here, if the first transmission server 2A and the second transmission server 2B transmit the input image A and the input image B without processing, respectively, a transmission band for two streams is required. If the network 4 only has a transmission band for one stream, the network 4 will be congested and the receiving server 3 will either be unable to receive the input image, or will receive a degraded input image. However, in this embodiment, the first transmission server 2A and the second transmission server 2B transmit transmission image blocks processed in advance into image blocks constituting output image blocks. As a result, the total amount of data output from all transmission servers 2 does not exceed the transmission band for one stream. Therefore, since congestion does not occur in the network 4, the reception server 3 can stably receive a plurality of transmission image blocks.

For example, in general video synthesis, each input video is received without any processing, and processing such as blending is executed with all the data in place. However, if there are a plurality (n) of 4K UHD imaging units 21, video data of n×12 Gbps will flow into the video synthesizing server and its preceding transmission line. Therefore, especially when uncompressed video transmission such as ST2110-20 is considered, it is necessary to design a video transmission line on the premise of multi-streaming according to the number of connected devices, which increases the cost.

In this embodiment, the transmission server 2 is controlled to transmit only the required number of image blocks based on the composition information. Therefore, it is sufficient to secure a transmission band for one stream on all video transmission lines.

(Example of composite pattern in alpha blending)
In the example below, based on a synthesis pattern commonly referred to as alpha blending, the synthesis information is "moving input image A to the left and input image B to the right, and blending so that the overlapping portions are smoothly connected." An example in the case of is explained.

FIG. 10 is a diagram showing an example of synthesis information in alpha blending.
Here, each input image and output image may be, for example, 4K UHD resolution (3840×2160). The x-axis of FIG. 10 indicates pixel position and the y-axis indicates alpha value. For example, when using the transfer image blocks shown in FIG. 6, the resolution of one transfer block is 960×540.

Input image A shown in FIG. 10 has been moved to the left by 960 pixels. In other words, it shows a state of moving from the original x-coordinate 0 to 960, and A-01, A-05, A-09, and A-13 of the transfer blocks are not used for the output image. Similarly, input image B has been shifted to the right by 960 pixels. In other words, it shows a state of moving from the original x-coordinate of 2880 by 3840, and B-04, B-08, B-12 and B-16 of the transfer blocks are not used for the output image. Therefore, these transfer blocks are controlled so as not to be transmitted from the transmission server 2. FIG. Also, the overlapped portion is synthesized with an alpha value when generating the output image. Here, the pixel value of the input image A at the coordinates (m, n) is A _{(m, n),} the pixel value of the input image B is B _{(m, n)} , and the alpha value (ranging from 0 to 100%) is When α _{(m, n)} , the pixel value R(m, n) of the synthesized image (output image) is calculated as follows.

FIG. 11 is a diagram showing an example of transferred image blocks and synthesis results (output images) in alpha blending.
FIG. 11(a) is the input image A according to alpha blending, (b) is the transferred image block, and (c) is according to the size ratio according to the numerical values shown at the bottom of (b). Shows a compressed image. Similarly, (d) is the input image B according to alpha blending, (e) is the transferred image block, and (f) is according to the size ratio according to the numbers shown at the bottom of (e). Shows a compressed image. Also, (h) indicates the image of the final synthesis result, that is, the output image, and (i) indicates the image block corresponding to (h).

Since the input image A has moved to the left as described above, no pixels are transferred to the transmission image blocks (R-04, R-08, R-12, R-16) corresponding to the right end of the image. Similarly, in the input image B, pixels are not transferred to the transmission image blocks (R-01, R-05, R-09, R-13) corresponding to the left end of the image. Transmission image blocks (R-02 to R-03, R-06 to R-07, R-10 to R-11, R-14 to R-15), which are overlapping portions, are input image A and input image B are included in the transmission image block. However, these transmitted image blocks will be compressed based on the synthesis information.

FIG. 12 is a diagram showing an example of synthesis information when image processing is performed using alpha blending.
(a) of FIG. 12 is combined information referred to by the first transmission server 2A, and (b) is combined information referred to by the second transmission server 2B. The first column of the synthesis information indicates the transmission image block, the second column indicates which input image block to use for the transmission image block, and the third column indicates how the input image block to be used. A reference flag indicating whether to transform is shown, and the fourth column shows a compression flag indicating how the transmission image block is compressed (ie, data volume after compression). Here, xx in R-xx-y in the first column indicates the serial number of the image block, and y indicates the location. Also, the transformation unit 22 of the transmission server 2 reduces the size of the transferred image block based on the value of the compression flag, and the image synthesis unit 32 of the reception server 3 performs processing to restore the size after receiving the block. For example, the compression flag may be calculated from the alpha value for alpha blending. For example, in the regions corresponding to R-02, R-06, R-10, and R-14, referring to FIG. 11, the average alpha value of input image A is 75%, and the average alpha value of input image B is 25%. , the value corresponding to it is set as the compression flag.

In this embodiment, the synthesis instruction unit 35 may generate synthesis information so that the compression rate of light portions (portions with low alpha values) is increased and the compression rate of dark portions (portions with high alpha values) is decreased. That is, the compression rate of a transmission image block that is output dark as an output image is lower than the compression rate of a transmission image block that is output light as an output image. For example, pixels output lightly as an output image are considered to be less important than pixels output darkly as an output image. Therefore, even if the image quality is degraded by increasing the compression rate of pixels that are output lightly, there is an effect that the influence of image quality deterioration due to compression and decompression of pixels that are output dark is suppressed.

Each transmission server 2 compresses the transmission image block based on the synthesis information and transmits the data to the reception server 3. After receiving the transmission image block, the receiving server 3 decompresses the transmission image block based on the synthesis information, and generates an output image from the transmission image block.

Since the transmission image blocks are compressed by the alpha value, the total amount of data output from all transmission servers 2 does not exceed the transmission band for one 4K UHD stream. Therefore, since congestion does not occur in the network 4, the reception server 3 can stably receive a plurality of transmission image blocks.

(Example of composite pattern in dissolve)
Below, based on a synthesis pattern generally called a dissolve, an example in which the synthesis information is "smooth transition from input image A to input image B as frames transition" will be described.

FIG. 13 is a diagram showing transmission timing of transmission image blocks and image synthesis (output image) in dissolve.
In the example shown in FIG. 13, it is assumed that the imaging unit 21 at each site outputs a video stream at a frame rate of 60 fps. The synthesizing instruction unit 35 of the receiving server 3 generates synthesizing information so as to control the frame rate transmitted from the transmitting server 2 according to the dissolve transition rate. 13(a) shows a frame transmitted from the first transmission server 2A, (b) shows a frame transmitted from the second transmission server 2B, and (c) is displayed on the display unit 34. An example of an output image is shown. Here, ▪ in (a) and (b) indicates that the transmission image block is transmitted, and □ indicates that the transmission image block is not transmitted. In the example of FIG. 13, as the transition rate increases, the transmission timings of the transfer image blocks transmitted by the first transmission server 2A become sparse, and the transmission timings of the transfer image blocks transmitted by the second transmission server 2B become denser. shows an example. By doing so, the video of the output image is switched from the input image A to the input image B gradually.

FIG. 14 is a diagram illustrating an example of synthesis information in dissolve.
FIG. 14 is a diagram showing an example of synthesis information when a dissolve is instructed from 00:01:00 to 00:07:00, for example. The first column in the table of FIG. 14 indicates the time, the second column indicates the frame rate transmitted by the first transmission server 2A, and the third column indicates the frame rate transmitted by the second transmission server 2B. show. As shown in FIG. 13, the blending ratio changes as time progresses, so the video frame rate is increased or decreased accordingly. Here, it shall be set so as not to exceed the original frame rate.

In this embodiment, for example, instead of compressing the input image to 1/2 when the transition rate is 50%, the bandwidth of the transfer image block is saved by controlling the frame rate. For example, when compressing an input image, the quality of the output image may be degraded. However, when changing the frame rate to save the bandwidth of the transferred image blocks, there is no degradation in the quality of the output image. Here, the sending server 2 and the receiving server 3 may be synchronized by peer-to-peer (PTP) or the like. Also, past frames may be used for portions where the frame rate is low. Furthermore, the higher the frame rate, the higher the transfer image block update frequency.

When transitioning images using dissolve, the receiving server 3 analyzes the video content and generates synthesis information so as to control the compression rate of the image when the motion is large and the frame rate when the image content is simple. You can

Thus, the transmission capacity of transmission image blocks output from the first transmission server 2A and the second transmission server 2B is proportional to its frame rate. By controlling the frame rate as described above, the total amount of data output from all transmission servers 2 does not exceed the transmission band for one stream of 4K UHD. Therefore, since congestion does not occur in the network 4, the reception server 3 can stably receive a plurality of transmission image blocks.

(Effect)
According to the embodiment, in a communication system configuration including a plurality of transmission servers 2 and reception servers 3, the reception server 3 notifies the transmission server 2 in advance of synthesis information, which is information regarding video synthesis. By having the transmission server 2 process and compress the input image based on the synthesis information, it is possible to provide a technique for realizing high-definition transmission and synthesis of multiple images even on a transmission path with a narrow transmission band.

[Other embodiments]
In addition, this invention is not limited to the said embodiment. For example, in the example shown in this embodiment, an example has been described in which control is performed so as not to exceed the transmission band for one stream of the original video. However, when the transmission band is sufficiently secured, the composition instruction unit 35 sets the composition information to relax the compression flag and frame rate restrictions described above, and sets the data volume to a value that increases. can be That is, the synthesizing instruction unit 35 evaluates the transmission path from the transmitting server 2 to the receiving server 3, and adjusts the compression flag to be included in the synthesizing information and the lower/upper limit value of the frame rate according to the lowest transmission capacity that becomes a bottleneck. You can set the value.

In addition, the method described in the above embodiment can be executed by a computer (computer) as a program (software means), such as a magnetic disk (floppy (registered trademark) disk, hard disk, etc.), an optical disk (CD-ROM, DVD , MO, etc.), semiconductor memory (ROM, RAM, flash memory, etc.), etc., or can be transmitted and distributed via a communication medium. The programs stored on the medium also include a setting program for configuring software means (including not only execution programs but also tables and data structures) to be executed by the computer. A computer that realizes this apparatus reads a program stored in a storage medium, and in some cases, constructs software means by a setting program, and executes the above-described processes by controlling the operation of the software means. Note that the storage medium referred to in this specification includes storage media such as magnetic disks, semiconductor memories, etc. provided in computers or devices connected via a network, without being limited to those for distribution.

In short, the present invention is not limited to the above embodiments, and can be modified in various ways without departing from the gist of the invention at the implementation stage. Moreover, each embodiment may be implemented in combination as much as possible, and in that case, the combined effect can be obtained. Furthermore, the above-described embodiments include inventions at various stages, and various inventions can be extracted by appropriately combining a plurality of disclosed constituent elements.

REFERENCE SIGNS LIST 1 video synthesis system 2 transmission server 21 imaging unit 22 transformation unit 23 image transmission unit 24 synthesis information reception unit 201 processor 202 program memory 203 data memory 204 communication interface 205 input/output interface 206 Bus 207 Input unit 208 Display unit 3 Receiving server 31 Image receiving unit 32 Image synthesis unit 33 Synthesis information transmission unit 34 Display unit 35 Synthesis instruction unit 301 Processor 302 Program memory 303 Data memory 304 Communication interface 305 Input/output interface 306 Bus 307 Input unit 308 Display unit 4 Network

Claims

a synthesizing instruction unit that generates synthesizing information indicating what kind of image blocks an output image is composed of;
a combination information transmission unit that transmits the combination information;
an image receiving unit that receives a plurality of transmission image blocks generated based on the synthesis information;
a composite image unit that composites the output image from the plurality of transmission image blocks based on the composite information;
a display unit that displays the output image;
A video compositing system with
Equipped with multiple information processing devices,
Each of the plurality of information processing devices
an imaging unit that acquires an input image;
a combination information receiving unit that receives the combination information;
a transformation unit that transforms an input image based on the synthesis information to generate the transmission image block that includes image blocks that form the output image;
an image transmission unit that transmits the transmission image block;
2. The video composition system of claim 1, comprising:
3. The video composition according to claim 2, wherein said transformation unit acquires image blocks necessary for an output image from said input image based on said composition information, and compresses said image blocks to generate said transmission image blocks. system.
The video synthesizing system according to any one of claims 1 to 3, wherein the total amount of data of the plurality of transmission image blocks is the amount of data that can be transmitted in a transmission band for one stream.
The output image is generated by blending the plurality of transmission image blocks, and among the transmission image blocks to be blended, a compression ratio of a transmission image block output dark as the output image is output light as the output image. 5. The video synthesizing system according to claim 1, wherein the compression rate of the transmitted image block is smaller than the compression rate of the transmitted image block.
The video synthesizing system according to any one of claims 1 to 5, wherein the output image is generated by blending all of the plurality of transmission image blocks, and each of the plurality of transmission images has a varying frame rate.
A video synthesizing method executed by a video post-upper system comprising a processor, comprising:
the processor generating synthesis information indicating what image blocks an output image is composed of;
the processor transmitting the combined information;
the processor receiving a plurality of transmission image blocks generated based on the synthesis information;
the processor synthesizing the output image from the plurality of transmission image blocks based on the synthesizing information;
the processor displaying the output image;
A video synthesis method comprising:
A video synthesizing program that causes a computer to function as each unit included in the video synthesizing system according to any one of claims 1 to 6.