JP7521604B2

JP7521604B2 - Apparatus, method and program for synthesizing video signals

Info

Publication number: JP7521604B2
Application number: JP2022570804A
Authority: JP
Inventors: 稔久藤原; 央也小野; 達也福井; 智彦池田; 亮太椎名
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2020-12-22
Filing date: 2020-12-22
Publication date: 2024-07-24
Anticipated expiration: 2040-12-22
Also published as: WO2022137324A1; JPWO2022137324A1

Description

The present disclosure relates to a video synthesis system that combines multiple video signals into a single screen and outputs it.

In recent years, many video devices have come into use. The video from these devices uses a wide variety of pixel counts (resolutions), frame rates, and so on. Although the video signals of these devices differ by standard in their physical signals, control signals, and the like, each transmits one screen over a time equal to the reciprocal of its frame rate.

One way to use such video is to display the feeds of multiple cameras on fewer monitors than there are cameras, as in video conferencing. In such cases the screens are composited, for example by dividing a single screen among multiple videos, or by embedding reduced versions of other videos within one video screen.

Video signals are normally not synchronized with one another, and the signals to be combined arrive with different timing, so each signal is temporarily buffered in memory or the like before being combined. As a result, the output of the composite screen is delayed.

Consider holding an ensemble performance between remote locations over a video conference that performs this kind of screen composition. The delay introduced by composition greatly impairs its feasibility. For example, for a piece at 120 beats per minute (hereinafter, 120 BPM), the duration of one beat is 60/120 seconds = 500 milliseconds. If the timing must be matched to within 5% of a beat, the delay from capture by the camera to display must be kept to 500 × 0.05 = 25 milliseconds or less.
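The arithmetic above can be reproduced with a short sketch. The function name `latency_budget_ms` and its parameters are illustrative and do not appear in the original text.

```python
def latency_budget_ms(bpm: float, tolerance: float) -> float:
    """Allowed capture-to-display delay for a given tempo and timing tolerance."""
    beat_ms = 60_000.0 / bpm  # duration of one beat in milliseconds
    return beat_ms * tolerance


# 120 BPM with a 5% tolerance, as in the example above
print(f"{latency_budget_ms(bpm=120, tolerance=0.05):.1f} ms")  # prints 25.0 ms
```

A faster tempo or a tighter tolerance shrinks the budget proportionally.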

In practice, the delay from camera capture to display includes not only the composition processing but also other delays, such as image processing time in the camera, display time on the monitor, and transmission time. As a result, with conventional technology, collaborative work in timing-critical applications, such as playing in an ensemble while watching each other's video from remote locations, has been difficult.

Therefore, for collaborative work with strict low-latency requirements, a system that combines multiple video signals, for example from multiple locations, needs to reduce the time from the input of the asynchronous video signals to the output of the combined video signal.

Non-Patent Document 1: VESA and Industry Standards and Guidelines for Computer Display Monitor Timing (DMT), Version 1.0, Rev. 13, February 8, 2013

An object of the present disclosure is to reduce the delay time until a composite video is output.

The device of the present disclosure combines and displays multiple asynchronous video signals. From the input video signals, it selects the combination of input frames, one from each signal, that minimizes the delay time of the output video, and combines them.

The video synthesis device and video synthesis method of the present disclosure:
detect the delay times between input frames constituting multiple video signals input asynchronously,
select an input frame of each of the video signals so that the delay time of an output frame combining the video signals is minimized, and
generate the output frame combining the video signals by using the selected input frames.

In the video synthesis method of the present disclosure, a video synthesis device:
detects the delay times between input frames constituting multiple video signals input asynchronously,
selects an input frame of each of the video signals so that the delay time of an output frame combining the video signals is minimized, and
generates the output frame combining the video signals by using the selected input frames.

The program of the present disclosure is a program for causing a computer to function as each functional unit of the device according to the present disclosure, and for causing a computer to execute each step of the method performed by that device.

The present disclosure can reduce the delay time until a composite video is output.

FIG. 1 shows an example of screen information included in a video signal.
FIG. 2 shows an example of screen composition.
FIG. 3 shows an example of a video synthesis method related to the present disclosure.
FIG. 4 shows an example of the video synthesis method of the present disclosure.
FIG. 5 shows an example of the video synthesis method of the present disclosure.
FIG. 6 shows a configuration example of the video synthesis device according to the present embodiment.
FIG. 7 shows an example of the video synthesis method of the present disclosure.
FIG. 8 shows an example of the video synthesis method of the present disclosure.

Embodiments of the present disclosure are described in detail below with reference to the drawings. The present disclosure is not limited to the embodiments shown below. These embodiments are merely examples, and the present disclosure can be implemented in forms with various modifications and improvements based on the knowledge of those skilled in the art. Components with the same reference numerals in this specification and the drawings denote the same components.

FIG. 1 shows an example of the screen information contained in a video signal. The screen information is transmitted by scanning the screen horizontally, one scanning line 21 at a time, moving down to the next scanning line 21 in sequence. This scanning covers not only the display screen 24 but also overhead information and signals such as the blanking portion 22 and the border portion 23. The blanking portion 22 may carry information other than video, such as control information and audio information. (See, for example, Non-Patent Document 1, Chapter 3.)

FIG. 2 shows an example of combining video signals. As an example, the present disclosure describes a case in which four video signals, inputs 1 to 4, are input to a video synthesis device, which combines them into one video signal and outputs it. A video signal transmits one screen over a time equal to the reciprocal of its frame rate. For example, a video signal of 60 frames per second (hereinafter, 60 fps) takes 1/60 second, or approximately 16.7 milliseconds, to transmit one screen. The information for one screen at each point in time in a video signal is called a "frame"; the information for one screen of each video signal input to the video synthesis device is called an "input frame", and the information for one combined screen output from the video synthesis device is called an "output frame".

For example, consider the case shown in FIG. 3, in which the video synthesis device reads all of the input frames, then combines them into one output frame and outputs it. If the frame time of each input frame is T_f and the synthesis processing time is T_p, the output frame is output up to 2T_f + T_p after the point at which the input frame of input 1, the earliest input, begins to arrive.
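As a rough numerical illustration of this bound, the sketch below evaluates 2T_f + T_p for a given frame rate. It assumes the worst case arises when the last input frame starts almost one full frame time after the first and then needs a further T_f to arrive completely; the function names are illustrative only.

```python
def frame_time_ms(fps: float) -> float:
    """Time to transmit one frame, i.e. the reciprocal of the frame rate."""
    return 1000.0 / fps


def worst_case_delay_ms(fps: float, t_p_ms: float) -> float:
    """Upper bound 2*T_f + T_p for the read-everything-then-combine approach."""
    return 2.0 * frame_time_ms(fps) + t_p_ms


print(f"{worst_case_delay_ms(60.0, 0.0):.1f} ms")  # prints 33.3 ms
```

At 60 fps this bound already exceeds a 25 ms budget even with zero processing time.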

The present disclosure is a system that receives multiple asynchronous videos and combines their images, and is characterized by selecting the input frames to be combined so that the delay after combination is lowest.

Let the k-th output frame be denoted {O, k}, and let the function that converts the inputs into it be denoted f(input1, input2, ...). Likewise, {i, k} below denotes the k-th input frame of input i. The inputs are numbered 1, 2, 3, 4 in order of earliest input timing.

(First synthesis example)
FIG. 4 shows a first synthesis example of the present disclosure. When {O, k} = f({1, k}, {2, k}, {3, k}, {4, k}), inputs {1, k}, {2, k}, and {3, k} are input with no delay, as shown in the figure, while {4, k} is input with a delay of D_in4 relative to the others. The delay time of frame {O, k} is then T_f + T_p with respect to input 4 and T_f + T_p + D_in4 with respect to inputs 1, 2, and 3. The average delay over the four inputs is therefore
(Equation 1)
T_f + T_p + 3D_in4/4   (1)
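For reference, the average in formula (1) follows directly from the four per-input delays stated above. The symbol \bar{D}_1 is introduced here only for illustration:

```latex
\[
\bar{D}_1
  = \frac{3\,(T_f + T_p + D_{in4}) + (T_f + T_p)}{4}
  = T_f + T_p + \frac{3}{4} D_{in4}
\]
```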

(Second synthesis example)
FIG. 5 shows a second synthesis example of the present disclosure. When {O, k} = f({1, k+1}, {2, k+1}, {3, k+1}, {4, k}), inputs {1, k+1}, {2, k+1}, and {3, k+1} begin to arrive (T_f - D_in4) after input {4, k}, and the synthesis processing, taking T_p, is performed immediately after inputs {1, k+1}, {2, k+1}, and {3, k+1} have arrived. The delay time of frame {O, k} is then T_f + T_p with respect to inputs 1, 2, and 3 and 2T_f + T_p - D_in4 with respect to input 4. The average delay over the four inputs is therefore
(Equation 2)
5T_f/4 + T_p - D_in4/4   (2)
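For reference, the average in formula (2) follows from the per-input delays stated above. The symbol \bar{D}_2 is introduced here only for illustration:

```latex
\[
\bar{D}_2
  = \frac{3\,(T_f + T_p) + (2T_f + T_p - D_{in4})}{4}
  = \frac{5}{4} T_f + T_p - \frac{1}{4} D_{in4}
\]
```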

Here, if T_f < 4D_in4, the synthesis example of formula (2) has a shorter average delay time than the synthesis example of formula (1). Thus, by changing the combination of positions (times) of the input frames from which the output is built according to the amount of delay of the input frames, a combination that minimizes the average value can be found, and building the output frame from that combination minimizes the synthesis delay.
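The condition T_f < 4D_in4 can be checked by comparing formulas (1) and (2) directly:

```latex
\begin{align*}
\tfrac{5}{4} T_f + T_p - \tfrac{1}{4} D_{in4}
  &< T_f + T_p + \tfrac{3}{4} D_{in4} \\
\tfrac{1}{4} T_f &< D_{in4} \\
T_f &< 4\,D_{in4}
\end{align*}
```

The processing time T_p cancels, so the comparison depends only on the frame time and the input delay.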

In other words, for {O, k}, the delay times are calculated for the combinations
f({1, k}, {2, k}, {3, k}, {4, k})
f({1, k+1}, {2, k}, {3, k}, {4, k})
f({1, k+1}, {2, k+1}, {3, k}, {4, k})
f({1, k+1}, {2, k+1}, {3, k+1}, {4, k})
and the combination of input frames with the smallest average delay time is selected as the combination used for the output.
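The selection among these candidate combinations can be sketched as follows. This is a minimal illustration, not the patented implementation. It assumes every input has the same frame time, that the inputs are sorted by the start time of their k-th frame, and that the delay with respect to an input is measured from the start of its chosen frame to the output time, which matches the delays used in formulas (1) and (2). All names (`Candidate`, `evaluate`, `select_min_average`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    frame_offsets: tuple[int, ...]  # per input: 0 = frame k, 1 = frame k+1
    delays: tuple[float, ...]       # output delay with respect to each input
    average: float


def evaluate(starts, offsets, t_f, t_p):
    """Per-input delays when input i contributes frame k + offsets[i]."""
    frame_starts = [s + o * t_f for s, o in zip(starts, offsets)]
    # The output is produced T_p after the last chosen frame has fully arrived.
    output_time = max(fs + t_f for fs in frame_starts) + t_p
    return tuple(output_time - fs for fs in frame_starts)


def select_min_average(starts, t_f, t_p):
    """Try the N candidates in which the j earliest inputs use frame k+1."""
    n = len(starts)
    best = None
    for j in range(n):
        offsets = tuple(1 if i < j else 0 for i in range(n))
        delays = evaluate(starts, offsets, t_f, t_p)
        cand = Candidate(offsets, delays, sum(delays) / n)
        if best is None or cand.average < best.average:
            best = cand
    return best


# Inputs 1 to 3 start together; input 4 starts 0.8 frame times later.
best = select_min_average(starts=(0.0, 0.0, 0.0, 0.8), t_f=1.0, t_p=0.0)
print(best.frame_offsets, round(best.average, 3))  # prints (1, 1, 1, 0) 1.05
```

With the input 4 delay reduced to 0.2 frame times, the same function returns the all-k combination, consistent with the condition T_f < 4D_in4.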

The combination of input frames is not limited to the one that minimizes the average delay time; it may instead be the one that minimizes the maximum delay time. When low latency is required only for some of the inputs, the combination may be the one that minimizes the average delay time of those input frames only, or the one that minimizes their maximum delay time. This embodiment shows an example of combining the input frames of four video signals, but the approach applies to the input frames of any number N of video signals. In FIGS. 4 and 5, the frame numbers k and k+1 are used for ease of understanding, but the video signals assumed in the present disclosure are asynchronous, so the frame numbers and the input timing of each frame differ between inputs.
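The alternative criteria described above can be illustrated with a small sketch that scores the same candidate combinations by either the average or the maximum delay, optionally over a subset of inputs. The delay model (equal frame times, inputs sorted by start time, delay measured from the start of the chosen frame) and all names are assumptions made for illustration.

```python
def delays_for(starts, offsets, t_f, t_p):
    """Output delay with respect to each input for one candidate combination."""
    frame_starts = [s + o * t_f for s, o in zip(starts, offsets)]
    output_time = max(fs + t_f for fs in frame_starts) + t_p
    return [output_time - fs for fs in frame_starts]


def pick(starts, t_f, t_p, objective="mean", subset=None):
    """Return the frame offsets (0 = k, 1 = k+1) that minimize the chosen score.

    objective: "mean" or "max"; subset: indices of the inputs that are scored
    (all inputs when None).
    """
    n = len(starts)
    scored = range(n) if subset is None else subset

    def score(offsets):
        d = delays_for(starts, offsets, t_f, t_p)
        picked = [d[i] for i in scored]
        return max(picked) if objective == "max" else sum(picked) / len(picked)

    candidates = [tuple(1 if i < j else 0 for i in range(n)) for j in range(n)]
    return min(candidates, key=score)


starts = (0.0, 0.0, 0.0, 0.4)  # input 4 lags by 0.4 frame times
print(pick(starts, 1.0, 0.0, objective="mean"))  # prints (1, 1, 1, 0)
print(pick(starts, 1.0, 0.0, objective="max"))   # prints (0, 0, 0, 0)
```

In this example the two criteria disagree: the average favors using the later frames of inputs 1 to 3, while the maximum favors the combination that does not make input 4 wait.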

FIG. 6 shows a configuration example of the video synthesis device according to this embodiment. The video synthesis device 10 according to this embodiment comprises a detection unit 101, a crossbar switch 102, an up/down converter 103, a buffer 104, and a pixel synthesis unit 105. The figure shows four inputs and one output, but any number N of inputs and outputs may be used.

The detection unit 101 is a functional unit that detects, for N input frames, the input order within a frame time and the input delay times. For example, it detects the input delay time D_in4 of input 4 shown in FIGS. 4 and 5.
The crossbar switch 102 reorders the input frames according to the input order detected by the detection unit 101 and outputs them. For example, it outputs them arranged in the order of inputs 1, 2, 3, 4 shown in FIGS. 4 and 5, and in the order of the k-th and then the (k+1)-th frame.
The up/down converter 103 scales the number of pixels up or down to an arbitrary size. For example, it enlarges or reduces the pixel count of input 1 to match the screen size shown in FIG. 2.
The crossbar switch 102 and the up/down converter 103 may be connected in the reverse order with respect to the inputs (a, b, c, d, ...). That is, inputs a, b, c, d may first be scaled by the up/down converter 103 and then reordered into inputs 1, 2, 3, 4 and output by the crossbar switch 102.
The buffer 104 stores each input frame. It buffers the frames received from the up/down converter 103 or the crossbar switch 102 and can output them in any order.
The pixel synthesis unit 105 selects, for the overall output screen, the frame number of each input in the combination that minimizes delay, based on the delay times from the detection unit 101. It then reads the data from the buffer 104, combines it to generate an output frame, and outputs it. As a result, a video combining the four video signals, as shown in FIG. 2, is displayed on the screen. The pixel synthesis unit 105 may add an arbitrary control signal to the blanking portion 22 of the screen.
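The role of the detection unit 101 can be illustrated with a minimal sketch. It assumes the unit has a frame start timestamp for each physical input within one frame period; the function name, the labels a to d, and the timestamp representation are hypothetical and not taken from the original.

```python
def detect_order_and_delays(frame_starts):
    """Return (label, delay) pairs sorted by arrival, delay relative to the earliest.

    frame_starts: mapping from a physical input label to the time at which
    its current frame started arriving (any consistent time unit).
    """
    earliest = min(frame_starts.values())
    ordered = sorted(frame_starts, key=frame_starts.get)  # stable for ties
    return [(label, frame_starts[label] - earliest) for label in ordered]


# Physical inputs a..d; the result lists them as logical inputs 1..4.
result = detect_order_and_delays({"a": 5.0, "b": 0.0, "c": 2.0, "d": 0.0})
print(result)  # prints [('b', 0.0), ('d', 0.0), ('c', 2.0), ('a', 5.0)]
```

The resulting order is what a crossbar stage would use to route physical inputs to logical inputs 1 to N, and the delays are what the selection step would consume.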

The device of the present disclosure can also be realized by a computer and a program, and the program can be recorded on a recording medium or provided over a network.

The above embodiment shows an example with four inputs and one screen divided into four, but the present disclosure is not limited to this and can be applied to any number of inputs. The above embodiment also shows an example in which inputs 1 to 4 have the same frame rate, that is, the same frame time T_f, but the present disclosure is also applicable to inputs 1 to 4 with different frame times T_f.

For example, for input 1 with a higher frame rate than the output frames, unnecessary input frames can be thinned out, as shown in FIG. 7. An unnecessary input frame is, for example, an input frame that would lengthen the delay time of the output frame when the input completion times T_11 and T_12 are used as references, such as the (k-1)-th frame. The input completion time may be a timing predicted from the frame length written, for example, at the head of the input frame.

Conversely, for input 1 with a lower frame rate than the output frames, as shown in FIG. 8, the input frames ({1, k}, {2, k}, {3, k}, {4, k}) that shorten the delay time of the output frame with respect to the input completion times T_11 and T_12 can be selected for output frame {O, k}. Missing frames can be filled in by reusing frames from the past. For example, the input frames ({1, k}, {2, k+1}, {3, k+1}, {4, k+1}) can be selected for output frame {O, k+1}. In this way, the present disclosure may use the k-th input frame for multiple consecutive output frames, such as the k-th and (k+1)-th, or may use multiple consecutive input frames, such as the k-th and (k+1)-th, for a single k-th output frame.
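One simple policy that yields both behaviors, dropping frames from a faster input and reusing frames from a slower one, is to take the most recent frame that has fully arrived by a reference time for each output frame. The sketch below assumes evenly spaced input frames and a known reference time per output frame; it illustrates the idea rather than the selection rule of the disclosure, and all names are hypothetical.

```python
import math


def latest_completed_frame(first_completion, frame_time, reference_time):
    """Index of the newest input frame fully received by reference_time.

    Frame n is assumed to complete at first_completion + n * frame_time.
    Returns None if no frame has completed yet.
    """
    if reference_time < first_completion:
        return None
    return math.floor((reference_time - first_completion) / frame_time)


# Faster input (frame time 0.5) against output references 1.0 and 2.0:
# frames 0 and 2 are never used, i.e. they are thinned out.
print([latest_completed_frame(0.5, 0.5, t) for t in (1.0, 2.0)])  # prints [1, 3]

# Slower input (frame time 2.0) against references 2.0, 3.0, 4.0:
# frame 0 is reused for two consecutive output frames.
print([latest_completed_frame(2.0, 2.0, t) for t in (2.0, 3.0, 4.0)])  # prints [0, 0, 1]
```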

In minimizing the delay time, the combination of inputs may also be optimized over multiple output frames. In the above example, the combination of inputs is optimized only for output frame {O, k} and is not necessarily optimal for output frame {O, k+1}. Therefore, for multiple output frames such as {O, k} and {O, k+1}, an optimization can be performed that minimizes a delay value, such as the average or the maximum, across those frames.

(Effects of the present disclosure)
By selecting and combining the combination of input frames of asynchronous video input signals that minimizes the output delay time, the present disclosure can reduce the delay time until the combined output. This makes collaborative work with strict low-latency requirements possible in a system that combines multiple screens, for example from multiple locations.

As an example, the effect of the present disclosure is shown for the input frame timing of FIGS. 4 and 5. With 60 fps (T_f = approximately 16.7 ms) and T_p = 0, for D_in4 = 0.7T_f, 0.8T_f, and 0.9T_f the value of formula (1), before applying the present disclosure, is 25.4 ms, 26.7 ms, and 27.9 ms respectively, exceeding 25 ms in every case, whereas the value of formula (2), after applying the present disclosure, is 17.9 ms, 17.5 ms, and 17.1 ms respectively, below 25 ms. Thus, by generating the output frame from an appropriate combination of temporally consecutive input frames, the present disclosure can provide a system that combines and displays video from multiple locations even for collaborative work with strict low-latency requirements, such as ensemble playing.
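The figures in this example can be recomputed by evaluating formulas (1) and (2) directly. The sketch below uses T_f = 1000/60 ms and T_p = 0 as stated; the function names are illustrative.

```python
T_F_MS = 1000.0 / 60.0  # frame time at 60 fps
T_P_MS = 0.0            # synthesis processing time, taken as zero in the example


def formula_1(d_in4_ms: float) -> float:
    """Average delay when every input contributes its k-th frame."""
    return T_F_MS + T_P_MS + 3.0 * d_in4_ms / 4.0


def formula_2(d_in4_ms: float) -> float:
    """Average delay when inputs 1 to 3 contribute frame k+1 and input 4 frame k."""
    return 5.0 * T_F_MS / 4.0 + T_P_MS - d_in4_ms / 4.0


for ratio in (0.7, 0.8, 0.9):
    d = ratio * T_F_MS
    print(f"D_in4 = {ratio} T_f: (1) = {formula_1(d):.1f} ms, (2) = {formula_2(d):.1f} ms")
```

Changing `T_P_MS` shifts both formulas by the same amount, so the gap between them is unaffected by the processing time.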

The present disclosure can be applied to the information and communications industry, which distributes video and game content, as well as to the film, advertising, and game industries involved in video production.

10: Video synthesis device
21: Scanning line
22: Blanking portion
23: Border portion
24: Display screen
101: Detection unit
102: Crossbar switch
103: Up/down converter
104: Buffer
105: Pixel synthesis unit

Claims

A video synthesis device that:
detects delay times between input frames constituting a plurality of video signals input asynchronously;
compares the delay times of an output frame obtained when the plurality of video signals are combined using temporally adjacent input frames, taken from among the temporally consecutive input frames of the different video signals, and selects an input frame of each of the plurality of video signals so that the delay time of the output frame combining the plurality of video signals is minimized; and
generates the output frame combining the plurality of video signals by using the selected input frames.

A video synthesis device that:
detects delay times between input frames constituting a plurality of video signals input asynchronously;
calculates, for a plurality of temporally different output frames, the delay time of an output frame obtained when the plurality of video signals are combined using temporally adjacent input frames, taken from among the temporally consecutive input frames of the plurality of video signals, and selects an input frame of each of the plurality of video signals so that the average value or the maximum value of the delay times of the plurality of output frames is minimized; and
generates the output frame combining the plurality of video signals by using the selected input frames.

The video synthesis device according to claim 1, which:
calculates the average value of the delay times of the output frame combining the plurality of video signals; and
selects an input frame of each of the plurality of video signals so that the average value is minimized.

The video synthesis device according to claim 1, which:
calculates the maximum value of the delay times of the output frame combining the plurality of video signals; and
selects an input frame of each of the plurality of video signals so that the maximum value is minimized.

The video synthesis device according to claim 1 or 2, which detects the delay times between input frames based on the input completion times of the input frames.

A video synthesis method in which a video synthesis device:
detects delay times between input frames constituting a plurality of video signals input asynchronously;
compares the delay times of an output frame obtained when the plurality of video signals are combined using temporally adjacent input frames, taken from among the temporally consecutive input frames of the different video signals, and selects an input frame of each of the plurality of video signals so that the delay time of the output frame combining the plurality of video signals is minimized; and
generates the output frame combining the plurality of video signals by using the selected input frames.

A video synthesis method in which a video synthesis device:
detects delay times between input frames constituting a plurality of video signals input asynchronously;
calculates, for a plurality of temporally different output frames, the delay time of an output frame obtained when the plurality of video signals are combined using temporally adjacent input frames, taken from among the temporally consecutive input frames of the plurality of video signals, and selects an input frame of each of the plurality of video signals so that the average value or the maximum value of the delay times of the plurality of output frames is minimized; and
generates the output frame combining the plurality of video signals by using the selected input frames.

A program for causing a computer to function as each functional unit of the video synthesis device according to claim 1 or 2.