WO2022137325A1 - Device, method, and program for synthesizing video signals - Google Patents

Device, method, and program for synthesizing video signals

Info

Publication number
WO2022137325A1
Authority
WO
WIPO (PCT)
Prior art keywords
video signals
input
video
time
output
Prior art date
Application number
PCT/JP2020/047864
Other languages
French (fr)
Japanese (ja)
Inventor
稔久 藤原
央也 小野
達也 福井
智彦 池田
亮太 椎名
Original Assignee
Nippon Telegraph and Telephone Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nippon Telegraph and Telephone Corporation
Priority to JP2022570805A priority Critical patent/JPWO2022137325A1/ja
Priority to PCT/JP2020/047864 priority patent/WO2022137325A1/en
Publication of WO2022137325A1 publication Critical patent/WO2022137325A1/en

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/238 Interfacing the downstream path of the transmission network, e.g. adapting the transmission rate of a video stream to network bandwidth; Processing of multiplex streams
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 Television systems
    • H04N7/14 Systems for two-way working
    • H04N7/15 Conference systems

Definitions

  • Normally, the timings of the video signals are not synchronized, and the video signals to be combined arrive with different timing; the signals are therefore temporarily buffered in memory or the like before being combined. As a result, the output of the combined screen is delayed.
  • The delay involved in this composition greatly impairs the feasibility of timing-critical uses such as a remote ensemble.
  • For example, for a piece at 120 BPM (beats per minute), one beat lasts 500 ms, and matching it to 5% accuracy requires keeping the camera-to-display delay below 25 ms.
  • The purpose of the present disclosure is to shorten the delay time until the combined video is output.
  • A compositing process is performed in which the screens are arranged from the top of the output in order of earliest input timing among the plurality of input video signals.
  • The video compositing device of the present disclosure detects the input timing of each input frame constituting a plurality of video signals; when a set number of the plurality of video signals have been input, it sequentially starts compositing processing of the set number of video signals; and it generates an output frame in which the plurality of video signals are combined into one video signal.
  • In the video compositing method of the present disclosure, the video compositing device detects the input timing of each input frame constituting a plurality of video signals; when a set number of the plurality of video signals have been input, it sequentially starts compositing processing of the set number of video signals; and it generates an output frame in which the plurality of video signals are combined into one video signal.
  • The program of the present disclosure is a program for causing a computer to function as each functional unit of the device according to the present disclosure, and for causing a computer to execute each step of the method executed by that device.
  • This disclosure can shorten the delay time until the output of the composite video.
  • An example of screen information included in a video signal is shown.
  • An example of screen composition is shown.
  • An example of a video composition method related to the present disclosure is shown.
  • An example of the video synthesis method of the present disclosure is shown.
  • An example of the video synthesis method of the present disclosure is shown.
  • An example of the video synthesis method of the present disclosure is shown.
  • An example of the video synthesis method of the present disclosure is shown.
  • A configuration example of the video compositing device according to this embodiment is shown.
  • FIG. 1 shows an example of screen information included in a video signal.
  • the information on the screen is transmitted by scanning the screen in the horizontal direction for each scanning line 21 and sequentially scanning the lower scanning line 21.
  • This scan includes scanning of overhead information / signals such as the blanking portion 22 and the border portion 23 in addition to the display screen 24.
  • the blanking portion 22 may include information other than video information, such as control information and audio information. (See, for example, Non-Patent Document 1, Chapter 3.)
  • FIG. 2 shows an example of synthesizing video signals.
  • four video signals are input to the video synthesizer, and the video synthesizer synthesizes and outputs one video signal.
  • One screen is transmitted using a time equal to the reciprocal of the frame rate.
  • For example, a 60-frame-per-second signal (hereinafter, 60 fps) transmits one screen in 1/60 s, that is, about 16.7 ms.
  • The one-screen information at each point in time in a video signal is called a "frame"; the one-screen information of each video signal input to the video compositing device is called an "input frame", and the combined one-screen information output from the device is called an "output frame".
  • For example, consider the case shown in FIG. 3, in which the video compositing device reads all the input frames and only then combines them into one output frame and outputs it.
  • Let the frame time of each input frame be T_f and the compositing time be T_p.
  • The output of the output frame then lags the start of the first input, input 1, by up to 2T_f + T_p. At a frame rate of 60 fps, this is a delay of at least 33.3 ms.
  • The device and method of the present disclosure form a system that receives a plurality of asynchronous videos and combines their images, characterized in that the compositing processes are started in order of earliest input timing, with earlier inputs placed toward the top of the screen.
  • FIG. 4 shows a first compositing example of the present disclosure.
  • When the input of input 2 is complete, compositing process (1) is started and output to the upper part of the display screen 24 begins.
  • When the input of input 4 is complete, compositing process (2) is started and output to the lower part of the display screen 24 begins.
  • In this case, the maximum delay from the start of input 1 to the start of output of the upper display screen 24 is (5/4)T_f + T_p.
  • This shortens the output delay by (3/4)T_f compared with the example shown in FIG. 3. For example, at a frame rate of 60 fps, the delay is about 21 ms + T_p.
  • Next, consider the case where the inputs are not each offset by exactly (1/4)T_f.
  • As shown in FIG. 5, when the time difference T_in2toin4 between the ends of the input frames of input 2 and input 4 is longer than T_f/2, the start of compositing process (1) is delayed by at least T_in2toin4 - T_f/2 after input 2 completes, so that the output of the lower display screen 24 can follow the output of the upper display screen 24 (which combines inputs 1 and 2) in time.
  • Alternatively, compositing process (1) may be performed first and the output of the upper output frame delayed by T_in2toin4 - T_f/2.
  • In the examples of FIGS. 5 and 6, the end-of-frame time difference used is that between inputs 2 and 4, and it is compared against T_f/2.
  • The end-of-frame time difference and its comparison value can, however, be any values determined by the number of video signals to be combined and the screen layout. For example, when six video signals are combined into one video signal with two screens at the top, two in the middle, and three at the bottom, the time difference between the ends of inputs 4 and 6 would be compared against T_f/3.
  • Since an actual video signal contains overhead such as the blanking and border portions, the comparison values T_f/2 and T_f/3 are figures for the display screen 24 portion of the signal and must be corrected according to the overhead portion.
  • Let the pipelined compositing time be T_pp (Time of Pipelined Processing).
  • The pipelined compositing time denotes only the initial pipeline overhead (the time required for all processing, including data reading, before data is handed to the next stage); the compositing itself runs continuously in step with the input or output.
  • The effective time of the pipelined compositing is the time to process one unit of data in the pipeline before the subsequent output stage. In this case, processing can be started so that the output of the output frame completes at the end time of the video signal input plus T_pp.
  • Compositing process (1) is started so that the time T_pp after the input completion time T_2E of input 2 coincides with the output completion time T_UE of the upper display screen 24, and output to the upper part of the display screen 24 begins.
  • Compositing process (2) is started so that the time T_pp after the input completion time T_4E of input 4 coincides with the output completion time T_DE of the lower display screen 24, and output to the lower part of the display screen 24 begins.
  • In this case, the maximum delay from the start of input 1 to the start of output of the upper display screen 24 is (3/4)T_f + T_pp.
  • This shortens the output delay compared with the example shown in FIG. 3. For example, at a frame rate of 60 fps, the delay is 12.5 ms + T_pp.
  • When the time difference T_in2toin4 is shorter than T_f/2, the start of compositing process (2) is delayed by T_f/2 - T_in2toin4 after input 4 completes, so that the output of the lower display screen 24 follows the output of the upper display screen 24 in time.
  • Alternatively, compositing process (2) may be performed first and the output of the output frame delayed by T_f/2 - T_in2toin4.
  • FIG. 8 shows an example of the system configuration according to this embodiment.
  • the video compositing device 10 according to the present embodiment includes a detection unit 101, a crossbar switch 102, an up / down converter 103, a buffer 104, and a pixel compositing unit 105.
  • The figure shows four inputs and one output, but any number of inputs and outputs may be used.
  • Reference numeral 101 is a functional unit that detects the input order within the frame time for N input frames. For example, the input timings of the inputs 1, 2, 3, and 4 shown in FIGS. 4 and 5 are detected, and the order of the inputs 1, 2, 3, and 4 is determined using the input timings.
  • Reference numeral 102 is a crossbar switch, a functional unit that reorders the inputs according to the input-order detection result from 101 and outputs them. For example, it arranges the input frames in the order of inputs 1, 2, 3, and 4 shown in FIGS. 4 and 5.
  • Reference numeral 103 is an up / down converter that enlarges / reduces the number of pixels to an arbitrary size.
  • For example, the pixel count of input 1 is enlarged or reduced to match the screen size shown in FIG. 2. Note that 102 and 103 may be connected in the reverse order with respect to the inputs (a, b, c, d): the inputs a, b, c, and d may first be enlarged or reduced at 103 and then reordered into inputs 1, 2, 3, and 4 and output at 102.
  • Reference numeral 104 is a buffer for storing each input frame. The inputs of 103 or 102 can be buffered and output in any order.
  • Reference numeral 105 is a pixel synthesizing unit.
  • the pixel synthesizing unit 105 reads pixel data from 104 in the order of output from the entire output screen, synthesizes them, generates an output frame, and outputs the data. As a result, a video in which the four video signals are combined is displayed on the screen as shown in FIG. This timing is as described above.
  • Unit 105 may add an arbitrary control signal to the blanking portion 22 of the screen.
  • the device of the present disclosure can also be realized by a computer and a program, and the program can be recorded on a recording medium or provided through a network.
  • This disclosure can be applied to the movie, advertising, and game industries related to video production, as well as the information and communication industry that distributes video content and game content.
  • 10: Video compositing device, 21: Scanning line, 22: Blanking portion, 23: Border portion, 24: Display screen, 101: Detection unit, 102: Crossbar switch, 103: Up/down converter, 104: Buffer, 105: Pixel compositing unit
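As an illustrative sketch only (the function and input names below are ours, not from the disclosure), the ordering role of the detection unit 101 and crossbar switch 102 described above can be expressed in a few lines: the inputs are ranked by arrival time, and the pixel compositing unit 105 then places the earliest-arriving inputs toward the top of the output frame.

```python
# Illustrative sketch only: names are ours, not from the patent. It mimics
# the role of the detection unit (101) and crossbar switch (102): rank the
# inputs by arrival time so that the pixel compositing unit (105) can place
# the earliest inputs at the top of the output frame.
def composite_order(arrival_times: dict) -> list:
    # 101 detects the input order within the frame time; 102 reorders.
    return sorted(arrival_times, key=arrival_times.get)

# Earliest arrivals come first, i.e. end up at the top of the screen:
print(composite_order({"a": 3.2, "b": 0.5, "c": 1.1, "d": 2.0}))
# ['b', 'c', 'd', 'a']
```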

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Transforming Electric Information Into Light Information (AREA)

Abstract

One purpose of this disclosure is to reduce the delay time before the combined video is output. According to this disclosure, a video compositing device detects the input timing of each of the input frames constituting multiple video signals; when a set number of the multiple video signals have been input, it sequentially starts a process of combining the set number of video signals; and it generates an output frame in which the multiple video signals are combined into one video signal.

Description

Device, method, and program for synthesizing video signals
 The present disclosure relates to a video composition system that combines the screens of a plurality of input video signals into one and outputs the result.
 In recent years, many video devices have come into use, with a wide variety of pixel counts (resolutions), frame rates, and so on. Although the video signals of these devices differ in physical signaling, control signals, and the like depending on the standard, each transmits one screen using a time equal to the reciprocal of its frame rate.
 One way these videos are used is in forms such as video conferencing, where multiple cameras are displayed on fewer monitors than there are cameras. In such cases, screen composition is performed, for example by tiling a plurality of videos on one screen, or by shrinking other video screens and inserting them into a given video screen.
 Normally, the timings of the video signals are not synchronized, and the video signals to be combined arrive with different timing; the signals are therefore temporarily buffered in memory or the like before being combined. As a result, the output of the combined screen is delayed.
 If an ensemble performance between remote locations is to be held over a video conference that performs such screen composition, the delay involved in the composition greatly impairs its feasibility. For example, for a piece at 120 beats per minute (hereinafter, 120 BPM), one beat lasts 60/120 s = 500 ms. If this must be matched to an accuracy of 5%, the delay from camera capture to display must be kept below 500 × 0.05 = 25 ms.
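The delay budget above is simple arithmetic; a small sketch (the function name is ours, not from the disclosure) makes it checkable:

```python
# Worked example of the delay budget above. At 120 BPM one beat lasts
# 60/120 s = 500 ms, and a 5% timing accuracy then allows at most
# 500 * 0.05 = 25 ms from camera capture to display.
def delay_budget_ms(bpm: float, accuracy: float) -> float:
    beat_ms = 60_000.0 / bpm   # duration of one beat in milliseconds
    return beat_ms * accuracy  # tolerable capture-to-display delay

print(round(delay_budget_ms(120, 0.05), 3))  # 25.0
```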
 In practice, the time from camera capture to display must also include delays other than the compositing process, such as image processing in the camera, display time on the monitor, and transmission time. As a result, with the prior art it has been difficult to collaborate in applications where timing is critical, such as playing an ensemble while watching each other's video from remote locations.
 What is needed, therefore, is a system that combines video signals from multiple sites and reduces the time from the input of the asynchronous video signals to the output of the combined video signal, for collaborative work with strict low-delay requirements.
 The purpose of the present disclosure is to shorten the delay time until the combined video is output.
 In the present disclosure, a device that combines and displays a plurality of asynchronous videos performs a compositing process in which the screens are arranged from the top of the output in order of earliest input timing among the plurality of input video signals.
 The video compositing device of the present disclosure:
 detects the input timing of each input frame constituting a plurality of video signals;
 when a set number of the plurality of video signals have been input, sequentially starts compositing processing of the set number of video signals; and
 generates an output frame in which the plurality of video signals are combined into one video signal.
 In the video compositing method of the present disclosure, the video compositing device:
 detects the input timing of each input frame constituting a plurality of video signals;
 when a set number of the plurality of video signals have been input, sequentially starts compositing processing of the set number of video signals; and
 generates an output frame in which the plurality of video signals are combined into one video signal.
 The program of the present disclosure is a program for causing a computer to function as each functional unit of the device according to the present disclosure, and for causing a computer to execute each step of the method executed by the device according to the present disclosure.
 The present disclosure can shorten the delay time until the combined video is output.
 FIG. 1 shows an example of the screen information contained in a video signal. FIG. 2 shows an example of screen composition. FIG. 3 shows an example of a video composition method related to the present disclosure. FIGS. 4 to 7 show examples of the video composition method of the present disclosure. FIG. 8 shows a configuration example of the video compositing device according to the present embodiment.
 Hereinafter, embodiments of the present disclosure are described in detail with reference to the drawings. The present disclosure is not limited to the embodiments shown below; these examples are merely illustrative, and the present disclosure can be implemented in variously modified and improved forms based on the knowledge of those skilled in the art. In the present specification and drawings, components with the same reference numerals denote the same components.
 FIG. 1 shows an example of the screen information contained in a video signal. The screen information is transmitted by scanning the screen horizontally along one scanning line 21 at a time, proceeding sequentially to the scanning line 21 below. In addition to the display screen 24, this scan covers overhead information/signals such as the blanking portion 22 and the border portion 23. The blanking portion 22 may also carry information other than video, such as control information and audio information (see, for example, Non-Patent Document 1, Chapter 3).
 FIG. 2 shows an example of combining video signals. In the present disclosure, as an example, four video signals are input to the video compositing device, which combines them into one video signal and outputs it. A video signal transmits one screen using a time equal to the reciprocal of its frame rate. For example, a signal with 60 frames per second (hereinafter, 60 fps) takes 1/60 s, about 16.7 ms, to transmit one screen. The one-screen information at each point in time in a video signal is called a "frame"; the one-screen information of each video signal input to the video compositing device is called an "input frame", and the combined one-screen information output from the device is called an "output frame".
 Consider, for example, the case shown in FIG. 3, in which the video compositing device reads all input frames and only then combines them into one output frame and outputs it. Let the frame time of each input frame be T_f and the compositing time be T_p; the output of the output frame then lags the start of the first input, input 1, by up to 2T_f + T_p. At a frame rate of 60 fps, this is a delay of at least 33.3 ms.
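The conventional worst-case figure follows directly from the formula 2T_f + T_p; as a hedged sketch (the function names are ours, not from the disclosure):

```python
# Sketch of the conventional worst-case delay: all input frames are
# buffered before compositing, so the output frame lags the start of the
# first input by up to 2*T_f + T_p.
def conventional_delay_ms(fps: float, t_p_ms: float) -> float:
    t_f = 1000.0 / fps         # frame time T_f in milliseconds
    return 2.0 * t_f + t_p_ms  # maximum delay 2*T_f + T_p

# At 60 fps the buffering alone contributes about 33.3 ms:
print(round(conventional_delay_ms(60, 0.0), 1))  # 33.3
```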
 The device and method of the present disclosure form a system that receives a plurality of asynchronous videos and combines their images, characterized in that the compositing processes are started in order of earliest input timing, with earlier inputs placed toward the top of the screen.
 This embodiment describes the case of four inputs combined into one four-way-split output screen, with the inputs offset from one another by (1/4)T_f. The inputs are numbered 1, 2, 3, and 4 in order of input timing. In this case, the video signals of inputs 1 and 2 are displayed in the upper part of the display screen 24, and those of inputs 3 and 4 in the lower part. In this embodiment, therefore, when two of the four video signals have been input, compositing of those two signals is started. For simplicity, the blanking portion 22 and the border portion 23 of the video signal are ignored and only the display screen 24 portion of the signal is described.
 FIG. 4 shows a first compositing example of the present disclosure. When the input of input 2 is complete, compositing process (1) is started and output to the upper part of the display screen 24 begins.
 Next, when the input of input 4 is complete, compositing process (2) is started and output to the lower part of the display screen 24 begins.
 In this case, the maximum delay from the start of input 1 to the start of output of the upper display screen 24 is (5/4)T_f + T_p. This shortens the output delay by (3/4)T_f compared with the example shown in FIG. 3. For example, at a frame rate of 60 fps, the delay is about 21 ms + T_p.
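The improvement over the conventional method can be sketched in the same way (function names are ours, not from the disclosure): the worst case drops from 2T_f + T_p to (5/4)T_f + T_p, a saving of (3/4)T_f.

```python
# Sketch of the first example's worst-case delay, (5/4)*T_f + T_p, and of
# its saving of (3/4)*T_f over the conventional 2*T_f + T_p method.
def proposed_delay_ms(fps: float, t_p_ms: float) -> float:
    t_f = 1000.0 / fps
    return 1.25 * t_f + t_p_ms  # (5/4)*T_f + T_p

def saving_ms(fps: float) -> float:
    t_f = 1000.0 / fps
    return 0.75 * t_f           # (3/4)*T_f improvement over FIG. 3

print(round(proposed_delay_ms(60, 0.0), 1))  # 20.8 (the text's "about 21 ms")
print(round(saving_ms(60), 1))               # 12.5
```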
 Next, consider the case where the inputs are not each offset by exactly (1/4)T_f.
 As shown in FIG. 5, when the time difference T_in2toin4 between the ends of the input frames of input 2 and input 4 is longer than T_f/2, the start of compositing process (1) is delayed by at least T_in2toin4 - T_f/2 after input 2 completes, so that the output of the lower display screen 24 can follow the output of the upper display screen 24 (which combines inputs 1 and 2) in time. Alternatively, compositing process (1) may be performed first and the output of the upper output frame delayed by T_in2toin4 - T_f/2.
 As shown in FIG. 6, when the time difference T_in2toin4 is shorter than T_f/2, the start of compositing process (2) is delayed by T_f/2 - T_in2toin4 after input 4 completes, so that the output of the lower display screen 24 follows the output of the upper display screen 24 in time. Alternatively, compositing process (2) may be performed first and the output of the lower output frame delayed by T_f/2 - T_in2toin4.
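The two waiting rules of FIGS. 5 and 6 can be summarized in one hypothetical helper (the names and the use of millisecond units are our assumptions, not from the disclosure): the gap T_in2toin4 between the ends of inputs 2 and 4 is compared with T_f/2, and whichever half would otherwise run ahead is delayed.

```python
# Hypothetical helper combining the two waiting rules for the 2x2 layout:
# compare the gap T_in2toin4 between the ends of inputs 2 and 4 with T_f/2.
def start_waits(t_f: float, t_in2_to_in4: float) -> tuple:
    """Return (wait before process (1), wait before process (2))."""
    half = t_f / 2.0
    if t_in2_to_in4 > half:
        # FIG. 5 case: delay the upper half so the lower half can follow.
        return (t_in2_to_in4 - half, 0.0)
    # FIG. 6 case: delay the lower half until the upper half has finished.
    return (0.0, half - t_in2_to_in4)

print(start_waits(16.0, 10.0))  # (2.0, 0.0)
print(start_waits(16.0, 5.0))   # (0.0, 3.0)
```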
 Note that this embodiment shows an example in which four video signals, as in FIG. 2, are combined into one video signal with two screens above and two below. In the examples of FIGS. 5 and 6, the end-of-frame time difference used is therefore that between inputs 2 and 4, and it is compared against T_f/2. The end-of-frame time difference and its comparison value can, however, be any values determined by the number of video signals to be combined and the screen layout. For example, when six video signals are combined into one video signal with two screens at the top, two in the middle, and three at the bottom, the time difference between the ends of inputs 4 and 6 would be compared against T_f/3.
 Since an actual video signal contains overhead portions such as the blanking and border portions described above, the comparison values T_f/2 and T_f/3 are figures for the display screen 24 portion of the signal and must be corrected according to the overhead portion.
 With reference to FIG. 7, the case where the above method is pipelined is described. Let the pipelined compositing time be T_pp (Time of Pipelined Processing). Here, the pipelined compositing time denotes only the initial pipeline overhead (the time required for all processing, including data reading, before data is handed to the next stage); the compositing itself runs continuously in step with the input or output. The effective time of the pipelined compositing is the time to process one unit of data in the pipeline before the subsequent output stage. In this case, processing can be started so that the output of the output frame completes at the end time of the video signal input plus T_pp.
 Consider four inputs and a four-way-split single-screen output, with the input frames offset from one another by (1/4)T_f. The inputs are numbered 1, 2, 3, and 4 in order of input timing. For simplicity, the blanking portion 22 and the border portion 23 of the video signal are ignored and only the display screen 24 signal is described.
 Synthesis process (1) is started, and output to the upper part of the display screen 24 begins, so that the time T_pp after the input completion time T_2E of input 2 coincides with the output completion time T_UE of the upper part of the display screen 24.
 Synthesis process (2) is started, and output to the lower part of the display screen 24 begins, so that the time T_pp after the input completion time T_4E of input 4 coincides with the output completion time T_DE of the lower part of the display screen 24.
 In this case, the maximum delay from the input start time of input 1 to the output start time of the upper part of the display screen 24 is (3/4 T_f + T_pp). This shortens the output delay compared with the example shown in FIG. 3. For example, at a frame rate of 60 fps, the delay is 12.5 milliseconds + T_pp.
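As an informal check of the numbers above (not part of the patent text; the function and variable names are ours), the maximum delay 3/4 T_f + T_pp can be computed directly from the frame rate:

```python
def max_delay_ms(fps: float, t_pp_ms: float, stagger_fraction: float = 3 / 4) -> float:
    """Maximum delay from the start of the earliest input to the start of
    the upper-screen output: stagger_fraction * T_f + T_pp, where the
    frame time T_f is 1/fps (here in milliseconds)."""
    t_f_ms = 1000.0 / fps
    return stagger_fraction * t_f_ms + t_pp_ms

# 60 fps, ignoring pipeline overhead: 3/4 of 16.67 ms = 12.5 ms
print(round(max_delay_ms(60, 0.0), 1))  # -> 12.5
```

The 12.5 ms figure in the text is exactly 3/4 of the 60 fps frame time, before the pipeline overhead T_pp is added.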
 Next, the case where the inputs are not offset by exactly 1/4 T_f each is described.
 When the time difference T_in2toin4 between the ends of the input frames of input 2 and input 4 is longer than T_f/2, the start of synthesis process (1) is delayed for at least T_in2toin4 - T_f/2 after the input of input 2 is completed, so that the output of the lower part of the display screen 24 can follow the output of the upper part of the display screen 24 (which combines inputs 1 and 2) in time. Alternatively, synthesis process (1) may be performed first, and the output of the output frame may then be delayed for T_in2toin4 - T_f/2.
 When the time difference T_in2toin4 between the ends of the input frames of input 2 and input 4 is shorter than T_f/2, the start of synthesis process (2) is delayed for T_f/2 - T_in2toin4 after the input of input 4 is completed, so that the output of the lower part of the display screen 24 follows the output of the upper part in time. Alternatively, synthesis process (2) may be performed first, and the output of the output frame may then be delayed for T_f/2 - T_in2toin4.
 Note that this embodiment shows an example in which four video signals, as in FIG. 2, are combined into one video signal with two screens on top and two on the bottom. In the example above, therefore, the end-of-frame time difference is taken between input 2 and input 4, and it is compared against T_f/2. However, both the pair of inputs whose end-time difference is taken and the comparison threshold can be any values determined by the number of video signals to be combined and the screen layout. For example, when six video signals are combined into one video signal with two screens on top, two in the middle, and three on the bottom, the time difference between the ends of the input frames of input 4 and input 6 is used, and the threshold is T_f/3.
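The two-sided wait rule above can be sketched as follows (illustrative only; names such as `t_gap` and `rows` are not from the patent). Given the gap between the end of the last input of the upper row and the end of the last input overall, the extra wait falls on whichever synthesis process would otherwise run ahead:

```python
def synthesis_waits(t_gap: float, t_f: float, rows: int = 2):
    """Return (wait before synthesis (1), wait before synthesis (2)) for
    an end-of-frame gap t_gap and frame time t_f.  The threshold
    t_f / rows generalizes T_f/2 (two rows) to T_f/3 (three rows), etc."""
    threshold = t_f / rows
    if t_gap > threshold:
        # Lower row finishes late: hold back synthesis (1) so the upper
        # output does not finish too far ahead of the lower output.
        return (t_gap - threshold, 0.0)
    else:
        # Lower row finishes early: hold back synthesis (2) instead.
        return (0.0, threshold - t_gap)

# T_f = 16 ms, gap of 10 ms between input 2 and input 4 (> T_f/2 = 8 ms)
print(synthesis_waits(10.0, 16.0))  # -> (2.0, 0.0)
```

Equivalently, as the text notes, the wait may be applied after the synthesis process instead of before it; only the total delay matters.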
 Note that an actual video signal contains overhead such as the blanking portion and the border portion described above, so the comparison thresholds T_f/2 and T_f/3 are values relative to the display screen 24 portion of the signal and must be corrected according to that overhead.
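The patent does not specify the correction formula; one plausible form (our assumption) scales the ideal threshold by the fraction of the frame occupied by the active display area, e.g. active lines over total raster lines:

```python
def corrected_threshold(t_f: float, divisor: int,
                        active_lines: int, total_lines: int) -> float:
    """Scale the ideal threshold T_f/divisor by the active-area fraction
    of the frame, discounting blanking/border overhead.  This particular
    scaling rule is an illustrative assumption, not taken from the patent."""
    return (t_f / divisor) * (active_lines / total_lines)

# e.g. a 1080-line active picture in an 1125-line total raster, T_f = 16.67 ms
print(round(corrected_threshold(16.67, 2, 1080, 1125), 3))  # -> 8.002
```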
 FIG. 8 shows an example of the system configuration according to this embodiment. The video compositing device 10 according to this embodiment includes a detection unit 101, a crossbar switch 102, an up/down converter 103, a buffer 104, and a pixel compositing unit 105. The figure shows four inputs and one output, but any number of inputs and outputs may be used.
 The detection unit 101 detects, for N input frames, the order of arrival within the frame time. For example, it detects the input timings of the input frames of inputs 1, 2, 3, and 4 shown in FIGS. 4 and 5, and determines the order of inputs 1, 2, 3, and 4 from those timings.
 The crossbar switch 102 reorders its inputs according to the detection result from 101 and outputs them. For example, it arranges the input frames in the order of inputs 1, 2, 3, and 4 shown in FIGS. 4 and 5.
 The up/down converter 103 scales the number of pixels to an arbitrary size. For example, it enlarges or reduces the number of pixels of input 1 to match the screen size shown in FIG. 2.
 102 and 103 may be connected in the reverse order with respect to the inputs (a, b, c, d, ...). That is, inputs a, b, c, and d may first be scaled at 103 and then reordered into the order of inputs 1, 2, 3, and 4 at 102.
 The buffer 104 stores each input frame. It buffers the output of 103 or 102 and can release frames in any order.
 The pixel compositing unit 105 reads pixel data from 104 in output order over the entire output screen, combines it to generate an output frame, and outputs the frame. As a result, a video in which the four video signals are combined is displayed on the screen as shown in FIG. 2. The timing is as described above. 105 may also add an arbitrary control signal to the blanking portion 22 of the screen.
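The data path 101 → 102 → 103 → 104 → 105 can be caricatured as follows (a minimal sketch assuming simple per-frame processing and ignoring real-time constraints; all names are ours, not the patent's):

```python
from dataclasses import dataclass

@dataclass
class InputFrame:
    source: str          # e.g. "a", "b", "c", "d"
    arrival_time: float  # arrival time within the frame time
    pixels: list         # placeholder for pixel data

def composite(frames, scale):
    """Combine asynchronous input frames into one output frame."""
    # 101: detect arrival order; 102: reorder by that detection result
    ordered = sorted(frames, key=lambda f: f.arrival_time)
    # 103: up/down convert each frame to its slot size
    scaled = [scale(f) for f in ordered]
    # 104: buffer the scaled frames
    buffer = list(scaled)
    # 105: read out in output order and concatenate into one output frame
    return [p for f in buffer for p in f.pixels]

frames = [InputFrame("b", 0.25, [2]), InputFrame("a", 0.0, [1]),
          InputFrame("d", 0.75, [4]), InputFrame("c", 0.5, [3])]
print(composite(frames, scale=lambda f: f))  # -> [1, 2, 3, 4]
```

The earliest-arriving input ends up in the first (upper) slot of the output, which is what allows the output of the upper part of the screen to begin before the later inputs have finished arriving.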
 The device of the present disclosure can also be realized by a computer and a program, and the program can be recorded on a recording medium or provided through a network.
 The above embodiment shows an example of four inputs and a four-way split single-screen output, but the present disclosure is not limited to this and can be applied to any number of inputs. Likewise, the above embodiment mainly assumes that inputs 1 to 4 share the same frame time T_f, but the present disclosure is also applicable when inputs 1 to 4 have different frame times T_f.
(Effects of the present disclosure)
 By compositing and arranging the screens from the top in order of the input timing of the asynchronous video input signals and outputting the result, the delay until the composited output can be shortened. This enables collaborative work with strict low-latency requirements in systems that composite multiple screens from multiple sites.
 The present disclosure is applicable to the information and communication industry, which distributes video and game content, as well as to the movie, advertising, and game industries involved in video production.
10: Video compositing device
21: Scanning line
22: Blanking portion
23: Border portion
24: Display screen
101: Detection unit
102: Crossbar switch
103: Up/down converter
104: Buffer
105: Pixel compositing unit

Claims (5)

  1.  A video compositing device that:
      detects the input timing of each input frame constituting a plurality of video signals;
      when a set number of the plurality of video signals have been input, sequentially starts compositing processing of the set number of video signals; and
      generates an output frame in which the plurality of video signals are combined into one video signal.
  2.  The video compositing device according to claim 1, which:
      compares the time difference between the input completion time of the set number of the plurality of video signals and the input completion time of the last of the plurality of video signals against a time determined by the number of the plurality of video signals, the layout of the composite screen, or both;
      when the time difference is longer than that time, adjusts the timing of the compositing processing of the set number of video signals; and
      when the time difference is shorter than that time, adjusts the timing of the compositing processing of the remaining video signals among the plurality of video signals.
  3.  The video compositing device according to claim 1, which starts the compositing processing of the plurality of video signals so that the output of the output frame is completed at the point when the input of all of the plurality of video signals is complete and the compositing processing of the plurality of video signals is complete.
  4.  A video compositing method in which a video compositing device:
      detects the input timing of each input frame constituting a plurality of video signals;
      when a set number of the plurality of video signals have been input, sequentially starts compositing processing of the set number of video signals; and
      generates an output frame in which the plurality of video signals are combined into one video signal.
  5.  A program for causing a computer to realize each functional unit provided in the video compositing device according to any one of claims 1 to 3.
PCT/JP2020/047864 2020-12-22 2020-12-22 Device, method, and program for synthesizing video signals WO2022137325A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
JP2022570805A JPWO2022137325A1 (en) 2020-12-22 2020-12-22
PCT/JP2020/047864 WO2022137325A1 (en) 2020-12-22 2020-12-22 Device, method, and program for synthesizing video signals

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2020/047864 WO2022137325A1 (en) 2020-12-22 2020-12-22 Device, method, and program for synthesizing video signals

Publications (1)

Publication Number Publication Date
WO2022137325A1 true WO2022137325A1 (en) 2022-06-30

Family

ID=82159167

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2020/047864 WO2022137325A1 (en) 2020-12-22 2020-12-22 Device, method, and program for synthesizing video signals

Country Status (2)

Country Link
JP (1) JPWO2022137325A1 (en)
WO (1) WO2022137325A1 (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2009265319A (en) * 2008-04-24 2009-11-12 Mitsubishi Electric Corp Video composition device
JP2015165628A (en) * 2014-03-03 2015-09-17 Smk株式会社 image processing system

Also Published As

Publication number Publication date
JPWO2022137325A1 (en) 2022-06-30

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20966818

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2022570805

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20966818

Country of ref document: EP

Kind code of ref document: A1