JP2018156344A

JP2018156344A - Video stream consistency determination program

Info

Publication number: JP2018156344A
Application number: JP2017052092A
Authority: JP
Inventors: 勇太萩尾; Yuta Hagio; 寿小林; Hisashi Kobayashi; 吉彦河合; Yoshihiko Kawai; 祐太星; Yuta Hoshi
Original assignee: Nippon Hoso Kyokai NHK; Japan Broadcasting Corp
Current assignee: Japan Broadcasting Corp
Priority date: 2017-03-17
Filing date: 2017-03-17
Publication date: 2018-10-04
Anticipated expiration: 2037-03-17
Also published as: JP6850166B2

Abstract

PROBLEM TO BE SOLVED: To provide a video stream match determination program that compares a plurality of resource videos with the video currently on air and can determine whether a resource video matches the broadcast-use video even when there is a spatial or temporal difference between them.

SOLUTION: A computer extracts feature quantities of the latest predetermined number of frames of a resource video and of the latest one frame of a broadcast-use video; calculates, based on the extracted feature quantities, the degree of matching between the latest predetermined number of frames of the resource video and the latest one frame of the broadcast-use video; calculates, for each of the latest predetermined number of frames of the resource video, the sum of the degrees of matching calculated from the present back to a predetermined time earlier; calculates an average by dividing the sum by the predetermined number; determines whether the average is equal to or less than a predetermined threshold; and, when it is, determines that the resource video having that average, among the plurality of resource videos, matches the broadcast-use video.

SELECTED DRAWING: Figure 1

Description

The present invention relates to a device that rapidly matches two video streams that differ spatially and temporally. More specifically, it relates to a video stream match determination program that can be used, for example, to manage how various videos are used at a television broadcasting station.

Conventionally, a television broadcasting station has identified which resource (material) video is being used on air by adding a signal to the ancillary area of the video signal. However, this method requires every device in the station to be able to read that signal, so introducing it means modifying or replacing all existing equipment. The cost of introducing an on-air resource identification function in this way is therefore enormous.

To achieve the same result without using the ancillary area, one option is to capture both the resource video and the on-air video and compare them by image processing. However, the on-air video has superimposed text (captions) added, and in some cases the resource video is shrunk and used in only part of the screen (a spatial difference). In addition, digital video compression and decompression introduce a delay of indeterminate length between the resource video and the on-air video (a temporal difference). A comparison based on image processing therefore has to cope with both spatial and temporal differences.

Techniques for comparing videos and determining whether they match include those of Patent Document 1 and Patent Document 2. Patent Document 1 is a device that discriminates similar images; its purpose is to detect the timing at which the video switches and divide the video into chapters. Patent Document 2 is a system that builds a database by accumulating videos and searches it for a corresponding video. The technique of Patent Document 2 detects whether a copy of an input video file has been illegally uploaded to the network. Because it handles video as files rather than streams and its search target is enormous, its preconditions differ from those of identifying resource (material) video used on air at a television broadcasting station.

As a way to determine whether videos match despite spatial differences, detection based on feature quantities has long been used (see, for example, Patent Document 3). On its own, however, this approach cannot cope with temporal differences.

As a way to determine whether videos match despite temporal differences, methods for measuring video signal delay have been proposed (see, for example, Patent Documents 4 and 5).

Patent Document 1: International Publication No. 2009/031398
Patent Document 2: JP 2011-237879 A
Patent Document 3: JP 2016-115226 A
Patent Document 4: JP H11-346196 A
Patent Document 5: JP 2005-33314 A

However, the method of Patent Document 4 measures the delay by adding TS packets for delay measurement to the video TS signal, so extra TS packets end up in the signal. It also has the limitation that the input must be a TS signal.

The method of Patent Document 5 is a delay measurement device that needs no signal generator or similar equipment, but it does not take spatial differences into account and therefore cannot handle temporal and spatial differences together.

An object of the invention is therefore to provide a video stream match determination program that compares a plurality of resource videos with the video currently on air and can determine whether a resource video matches the broadcast-use video even when spatial or temporal differences exist.

A video stream match determination program according to an embodiment of the present invention determines matching between resource videos and a broadcast-use video. The program causes a computer to: extract, for each of a plurality of resource videos, feature quantities of the latest predetermined number of frames of that resource video and of the latest one frame of the broadcast-use video; calculate, based on the feature quantities, a degree of matching between each of the latest predetermined number of frames of the resource video and the latest one frame of the broadcast-use video; calculate, for each of the latest predetermined number of frames of the resource video, the sum of the degrees of matching calculated from the present back to a predetermined time earlier; calculate an average degree of matching by dividing the sum by the predetermined number; determine whether the average is equal to or less than a predetermined threshold; and, when it is, determine that the resource video having that average, among the plurality of resource videos, matches the broadcast-use video.

It is thus possible to provide a video stream match determination program that compares a plurality of resource videos with the video currently on air and can determine whether a resource video matches the broadcast-use video even when spatial or temporal differences exist.

FIG. 1 is a diagram showing a video processing system 10 including a computer on which the video stream match determination program of the embodiment is installed.
FIG. 2 is a diagram showing the functional blocks of the CPU 120 and the flow of data.
FIG. 3 is a flowchart showing the processing executed by the CPU 120.
FIG. 4 is a diagram showing the high-speed match determination processing for video streams executed by the CPU 120.

An embodiment to which the video stream match determination program of the present invention is applied is described below.

<Embodiment>
FIG. 1 is a diagram showing a video processing system 10 including a computer on which the video stream match determination program of the embodiment is installed.

The video processing system 10 includes a PC (Personal Computer) 100, an inserter 150, and a speaker 160. The video processing system 10 is deployed, for example, at a television broadcasting station whose broadcast area is the whole country (a key station) or at a local station whose broadcast area is one region of the country. The form deployed at a local station is described here. In the following, unless otherwise noted, "local station" means the broadcasting station at which the video processing system 10 is deployed.

The PC 100 has video input terminals 101A and 101B, a synchronization signal input terminal 102, a video output terminal 103, an audio output terminal 104, an HD (High Definition)-SDI (Serial Digital Interface) input/output board 110, a CPU (Central Processing Unit) 120, and a memory 130.

The resource video is input to the video input terminal 101A, and the broadcast-use video is input to the video input terminal 101B.

A resource video is video content (a program, with audio) that can be broadcast (put on air) at that moment by the local station where the video processing system 10 is deployed. Here, as an example, one screen is divided into 16 sections to display 16 resource videos. The 16 resource videos include, for example, video of programs such as news produced by the local station, recorded video of programs held by the local station, video from weather cameras owned by the local station or the key station, and relay video from the key station. The resource video is output from a multi-viewer device (not shown) included in the video processing system 10 and input to the video input terminal 101A.

The broadcast-use video is the video of the one program that the local station has selected from the 16 resource videos and is putting on air. It is the video that the HD-SDI input/output board 110 outputs after only one of the 16 resource videos has been selected as the broadcast-use video. The broadcast-use video to be compared is input to the video input terminal 101B. Examples of this input include the broadcast program sent from the key station to the local station, the broadcast program the local station sends to its transmission tower, and video obtained by decoding the broadcast wave received by the local station's antenna.

For this reason, the broadcast-use video input to the video input terminal 101B lags behind the resource video that is output from the multi-viewer device and input to the video input terminal 101A, so it differs temporally from the resource video. Furthermore, the broadcast-use video occupies a full screen, whereas each resource video occupies one of the 16 sections of a screen, so the pixel counts differ. This produces a spatial difference between the broadcast-use video and the resource video. A spatial difference also arises when, for example, the broadcast-use video is 4K or 8K, because the resource video is not.

A spatial difference can also arise when the resource video is shrunk and used as only part of the picture. An example is a small rectangular window shown beside the video of an announcer in the studio, with the weather camera video composited into that window. Superimposed text may also be added on top of the resource video, and this too produces a spatial difference between the broadcast-use video and the resource video.

The synchronization signal input terminal 102 receives the synchronization signal that the HD-SDI input/output board 110 uses for its various synchronization processes.

The video output terminal 103 outputs a tally signal representing the tally image. The tally is a red frame drawn around the on-air resource video among the 16 resource videos in one screen, to show which resource video the local station has selected and is putting on air. The tally signal represents an image consisting of a black background with a red frame at the position of the on-air video, of which only one is selected among the 16 resource videos in the screen. In FIG. 1 the tally is shown as a white outline. In actual operation, the same video may be assigned to more than one of the 16 resources, so the number of tallies displayed is not limited to one. Specifically, the 16 resource videos may include both a low-delay video carried over a dedicated line and a high-delay video carried over an IP link. When two such videos are shot by the same camera but have different delays because their transmission routes differ, multiple tallies are displayed at the same time.

The audio output terminal 104 outputs an audio signal indicating, for example, that a resource video has gone on air. This audio signal is generated by the CPU 120.

The video stream match determination program is installed for the CPU 120, which executes it. The PC 100 thereby functions as a high-speed match determination device for video streams.

The memory 130 represents, as a single block, the RAM (Random Access Memory), ROM (Read Only Memory), hard disk, and other storage included in the PC 100.

The inserter 150 receives the resource video and the tally signal and outputs video in which the tally image represented by the tally signal is superimposed on the resource video. As a simpler alternative, the PC 100 may itself output the resource video with the tally already composited. This removes the need for the inserter, which reduces the hardware footprint and the introduction cost. However, the added compositing and output work slightly slows the PC's main image processing. It is therefore desirable to let the user choose between the two methods (called inserter mode and tally compositing mode).

The speaker 160 converts the audio signal output from the CPU 120 into sound and outputs it.

FIG. 2 is a diagram showing the functional blocks of the CPU 120 and the flow of data. The CPU 120 realizes the functions shown in FIG. 2 by executing the video stream match determination program.

The CPU 120 has a video reading unit 121, a feature extraction unit 122, a feature comparison unit 123, a delay estimation unit 124, a match determination unit 125, and a tally control unit 126. FIG. 2 also shows, as a component outside the CPU 120, the video compositing unit 151 included in the inserter 150, and, as data, the resource video, the broadcast-use video, the video with tally, and the audio alarm.

The function of each unit is described next with reference to FIG. 3. FIG. 3 is a flowchart showing the processing executed by the CPU 120.

The CPU 120 first performs initialization (step S1). This is done by the main control unit of the CPU 120, which handles all processing other than that performed by the video reading unit 121, the feature extraction unit 122, the feature comparison unit 123, the delay estimation unit 124, the match determination unit 125, and the tally control unit 126.

Initialization resets the arrays and variables used in later processing and loads the configured parameters. For example, every element of the matching cost matrix is initially set to the maximum value. Other preparation is also done here, such as allocating memory based on user-specified parameters like the number of frames to compare.

The main control unit feeds the resource video and the broadcast-use video to the video reading unit 121. The resource video is also input to the video compositing unit 151 via a video distributor (not shown).

The video reading unit 121 captures the two video streams, the resource video and the broadcast-use video, and reads in the image of the latest frame of each so that it can be handled as an image (step S2). The resource video is captured according to coordinate positions specified by the user, so that it is split back into the individual resource videos as they were before the multi-viewer device combined them. Capture runs at a user-specified frame rate, and the user-specified number of captured frames is kept in the memory 130.
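The splitting of the multi-viewer screen into individual resource videos can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes a uniform 4 x 4 layout, whereas the text only says that the coordinates are specified by the user. The names `grid_tiles` and `crop` are hypothetical, and a frame is modelled as a plain list of pixel rows.

```python
from typing import List, Tuple

Rect = Tuple[int, int, int, int]  # (x, y, width, height)


def grid_tiles(frame_w: int, frame_h: int, rows: int = 4, cols: int = 4) -> List[Rect]:
    """Return crop rectangles for a rows x cols layout.

    Assumption: the multi-viewer tiles are equal in size and fill the frame.
    In the described system these coordinates would come from the user.
    """
    tile_w, tile_h = frame_w // cols, frame_h // rows
    return [
        (c * tile_w, r * tile_h, tile_w, tile_h)
        for r in range(rows)
        for c in range(cols)
    ]


def crop(frame: List[List[int]], rect: Rect) -> List[List[int]]:
    """Cut one resource video tile out of a frame stored as rows of pixels."""
    x, y, w, h = rect
    return [row[x:x + w] for row in frame[y:y + h]]
```

For a 1920 x 1080 multi-viewer frame this yields 16 rectangles of 480 x 270 pixels, one per resource video.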

The feature extraction unit 122 performs image processing on the latest frame image read by the video reading unit 121 and extracts feature quantities from it (step S3). It stores the data representing the extracted feature quantities in the memory 130.

The feature comparison unit 123 compares the feature quantities of the latest frame of the broadcast-use video, extracted by the feature extraction unit 122, with the feature quantities of the resource video frames from the latest one back to about 2 to 3 seconds in the past, and matches the feature quantities against each other (step S4).

For the matching process, an existing feature extraction method can be used, for example ORB ("An efficient alternative to SIFT or SURF", Ethan Rublee et al., IEEE International Conference on Computer Vision, Nov. 2011). The matching result obtained here is treated as the matching cost. The matching cost expresses how well the feature quantities match: the lower the cost, the better the match.

Specifically, the matching cost is calculated as 1 − (number of feature points judged to match) / (total number of feature points in the resource video). The matching cost is therefore a decimal between 0 and 1: it is 0 when all feature points match and 1 when none match. When feature points are matched naively, outliers (noise in the data) can cause false detections. It is therefore desirable to suppress their influence by applying outlier handling such as the k-nearest neighbor method or RANSAC.
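The cost formula above can be written as a small function. This is a sketch under stated assumptions: the function name `matching_cost` is hypothetical, and the handling of a frame with no feature points (treated as cost 1) is an assumption that the text does not specify. The feature matching itself (for example ORB with outlier rejection) is outside the scope of this sketch; only the two counts are taken as inputs.

```python
def matching_cost(num_matched: int, num_resource_features: int) -> float:
    """Matching cost = 1 - matched / total resource feature points.

    0.0 means every resource feature point matched, 1.0 means none did.
    """
    if num_resource_features <= 0:
        # Assumption: with no feature points there is nothing to match,
        # so report the worst possible cost.
        return 1.0
    # Clamp so that a miscounted input cannot push the cost outside [0, 1].
    matched = min(max(num_matched, 0), num_resource_features)
    return 1.0 - matched / num_resource_features
```

For example, 125 matched points out of 500 resource feature points give a cost of 0.75.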

The delay estimation unit 124 calculates averages of the matching costs computed by the feature comparison unit 123. One average is taken over a predetermined number of past frames for the latest frame of the resource video, another for the frame one before the latest, and so on, down to the frame six before the latest. The predetermined number of past frames is, as an example, 7.

The details of this calculation are described later with reference to FIG. 4. Averages are computed from the latest frame of the resource video back to the frame six before it in order to make a rough estimate, within a range of up to 7 frames, of how far the broadcast-use video lags behind the resource video.

The delay estimation unit 124 then finds the minimum of the seven averages, from the average for the latest frame of the resource video to the average for the frame six before it (step S5). This is because the time difference between the frame that gives the minimum of the seven averages and the latest frame is presumed to be the delay of the broadcast-use video relative to the resource video. The reason is explained later with reference to FIG. 4.
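One possible reading of steps S4 and S5 is sketched below, with hypothetical names (`DelayEstimator`, `update`). It assumes that on every new frame the caller supplies one matching cost per candidate lag (lag 0 is the latest resource frame, lag 6 the frame six before it), and that each lag's average is taken over the last 7 such updates. The history starts at the maximum cost, following the initialization described for step S1.

```python
from collections import deque
from typing import Deque, List, Tuple


class DelayEstimator:
    """Keeps a sliding window of per-lag matching costs and finds the best lag."""

    def __init__(self, num_lags: int = 7, window: int = 7, max_cost: float = 1.0):
        self.num_lags = num_lags
        self.window = window
        # Start with worst-case costs so that no lag looks good before data arrives.
        self.history: Deque[List[float]] = deque(
            [[max_cost] * num_lags for _ in range(window)], maxlen=window
        )

    def update(self, costs_by_lag: List[float]) -> Tuple[int, float]:
        """Add the newest costs; return (lag with lowest average, that average)."""
        if len(costs_by_lag) != self.num_lags:
            raise ValueError("expected one cost per candidate lag")
        self.history.append(list(costs_by_lag))  # oldest row drops out automatically
        averages = [
            sum(row[lag] for row in self.history) / self.window
            for lag in range(self.num_lags)
        ]
        best_lag = min(range(self.num_lags), key=averages.__getitem__)
        return best_lag, averages[best_lag]
```

The returned lag is the estimated delay in frames, and the returned average is the value that would be compared with the threshold in step S6.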

The match determination unit 125 determines whether the broadcast-use video and the resource video match by checking whether the minimum average obtained by the delay estimation unit 124 in step S5 is below a threshold set by the user (step S6).

When the minimum average obtained in step S5 is below the threshold (the broadcast-use video and the resource video match), the match determination unit 125 determines that the resource video in question is being used on air. In this case the flow proceeds to step S7A.

When the minimum average obtained in step S5 is not below the threshold (the broadcast-use video and the resource video do not match), the match determination unit 125 determines that the resource video in question is not being used on air. In this case the flow proceeds to step S7B.

In step S7A, the tally control unit 126 outputs a tally signal that displays the tally, together with an audio signal. When the match determination unit 125 has determined that the broadcast-use video and a resource video match, the tally control unit 126 outputs to the video compositing unit 151 a tally signal in which the tally is drawn at the position, within the one-screen display of 16 resource videos, of the resource video judged to match. It also outputs to the speaker 160 an audio signal announcing "The resource video has gone on air."

In step S7B, the tally control unit 126 outputs a tally signal that hides the tally. When the match determination unit 125 has determined that the broadcast-use video and the resource video do not match, the tally control unit 126 outputs to the video compositing unit 151 a tally signal with the tally hidden. In this case an audio signal announcing "The resource video is not on air" may also be output to the speaker 160.
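The decision in steps S6, S7A, and S7B, applied across all resource videos, can be sketched as below. The function name `tallies_to_show` is hypothetical. The comparison is a strict "below the threshold", following the wording of step S6; the abstract words the same test as "equal to or less than", so the boundary case is an assumption here.

```python
from typing import List


def tallies_to_show(min_avg_costs: List[float], threshold: float) -> List[int]:
    """Return the indices of resource videos whose tally should be displayed.

    min_avg_costs[i] is the minimum average matching cost of resource video i
    (the result of step S5). More than one index can be returned, for example
    when the same camera feed arrives over two routes with different delays.
    """
    return [i for i, cost in enumerate(min_avg_costs) if cost < threshold]
```

An empty result corresponds to step S7B for every resource video, that is, all tallies hidden.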

When step S7A or S7B finishes, the main control unit writes to the memory 130 a log containing the start and end times at which the resource video was used on air, the resource name, and a frame of the broadcast-use video from the time it was on air (step S8). Based on this data, a resource operation log is output whenever the user requests it. After step S8, the main control unit returns the flow to step S1.
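The start and end times recorded in step S8 can be derived from the per-frame match results by watching for state changes. The sketch below is illustrative only: `OnAirRecord` and `UsageLog` are hypothetical names, timestamps are plain floats, and the stored broadcast frame mentioned in the text is omitted for brevity.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class OnAirRecord:
    resource_name: str
    start: float
    end: Optional[float] = None  # None while the resource is still on air


class UsageLog:
    """Turns per-frame match results into on-air intervals per resource."""

    def __init__(self) -> None:
        self.records: List[OnAirRecord] = []
        self._open: Dict[str, OnAirRecord] = {}

    def update(self, resource_name: str, matched: bool, timestamp: float) -> None:
        current = self._open.get(resource_name)
        if matched and current is None:
            # Transition off -> on: open a new interval.
            record = OnAirRecord(resource_name, start=timestamp)
            self._open[resource_name] = record
            self.records.append(record)
        elif not matched and current is not None:
            # Transition on -> off: close the open interval.
            current.end = timestamp
            del self._open[resource_name]
```

Calling `update` once per processed frame for each resource video leaves `records` holding one entry per continuous on-air period.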

FIG. 4 is a diagram showing the high-speed match determination processing for video streams executed by the CPU 120. FIG. 4(A) shows 20 frames A1 to A20 of the resource video and 20 frames B1 to B7, A5 to A11, and C1 to C6 of the broadcast-use video.

The resource video shown in FIGS. 4(A) to 4(E) is one of the 16 resource videos in one screen. The processing described here with reference to FIG. 4 is performed in the same way, in parallel, for each of the 16 resource videos in the screen against the broadcast-use video.

In FIG. 4(A) the horizontal axis is time: frames further to the left are older and frames further to the right are newer. In the state shown as an example in FIG. 4(A), the rightmost frame A20 of the resource video is its latest (current) frame, and the rightmost frame C6 of the broadcast-use video is its latest (current) frame.

The letter in each frame label (frames A1 to A20, and frames B1 to B7, A5 to A11, and C1 to C6) indicates the content (program). The number after the letter indicates the order in time: a smaller number is older and a larger number is newer.

Frames A1 to A20 of the resource video are therefore 20 consecutive frames of a single content (program) A.

Frames B1 to B7, A5 to A11, and C1 to C6 of the broadcast-use video show that seven consecutive frames B1 to B7 of content (program) B are broadcast first, then seven consecutive frames A5 to A11 of content (program) A, and then six consecutive frames C1 to C6 of content (program) C.

In addition, in FIG. 4A the numbers 1 to 20 are assigned from the oldest frame to the latest frame to show the order of the 20 frames of the resource video and of the broadcast-use video.

Focusing on frame A5 of the resource video and the broadcast-use video shown in FIG. 4A, it can be seen that the broadcast-use video lags the resource video by three frames. As described above, the local station broadcasts the broadcast-use video from the radio tower, receives it with its own antenna, and inputs the received video to the video input terminal 101B; the delay is the time required from broadcast to reception.
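The three-frame lag can be checked by laying out the two sequences of FIG. 4A as lists and comparing the positions at which frame A5 appears. This is only a minimal sketch: the list representation and the helper name `position_of` are illustrative assumptions and are not part of the described program.

```python
# Frame labels as laid out in FIG. 4A, oldest (position 1) to latest (position 20).
resource = [f"A{i}" for i in range(1, 21)]
broadcast = (
    [f"B{i}" for i in range(1, 8)]      # B1..B7
    + [f"A{i}" for i in range(5, 12)]   # A5..A11
    + [f"C{i}" for i in range(1, 7)]    # C1..C6
)


def position_of(frames, label):
    """Return the 1-based position of a frame label (hypothetical helper)."""
    return frames.index(label) + 1


delay = position_of(broadcast, "A5") - position_of(resource, "A5")
print(delay)  # 3
```

The same difference of three positions holds for every shared frame from A5 to A11, which is why a single delay value describes the whole matching segment.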

When the system is operated at a local station, the broadcast program sent from the key station can also be used as the broadcast-use video. Checking against the key station's broadcast program in this way makes it possible to detect local-station video that the key station has used. With this approach, detection is possible even when the local station is actually putting one of its resource videos on air while the key station is using a different resource video of the same local station.

Alternatively, to detect video used by the local station itself, the video signal being sent to the radio tower can be used as the broadcast-use video. This reduces the delay between the resource video and the broadcast-use video. However, because encoding and decoding are often performed on the key-station side or the local-station side, the delay never becomes zero.

When the broadcast-use video is delayed relative to the resource video in this way, whether a frame of the resource video and a frame of the broadcast-use video are identical is determined as follows.

In FIGS. 4B to 4E, the horizontal axis shows seven consecutive frames of the resource video, from the latest frame back to the frame six before it (that is, the latest seven frames); the left side is the latest resource video frame, and the right side holds the older ones. The vertical axis likewise shows the latest seven consecutive frames of the broadcast-use video; the top is the latest broadcast-use video frame, and the bottom holds the older ones.

Each time a new frame arrives, the latest seven frames of the resource video and of the broadcast-use video each advance by one frame.

The 49 matching costs between the frames of the resource video and the frames of the broadcast-use video are shown as a 7 × 7 matrix. Each time a new frame arrives, seven matching costs are calculated between the latest broadcast-use video frame and the latest seven resource video frames, and these values are stored in the top row of the matrix. Each time the latest seven frames of the resource video and the broadcast-use video advance by one, the previously stored rows of seven values shift down by one row.

Thus, the top row of the 49-element matrix holds the seven matching costs calculated between the latest broadcast-use video frame and the latest seven resource video frames. The row immediately below holds the seven matching costs calculated one frame earlier, between what was then the latest broadcast-use video frame and what were then the latest seven resource video frames. Likewise, the row six below the top (the bottom row) holds the seven matching costs calculated six frames earlier in the same way.

Below the matrix, the total and the average of the matching costs between each resource video frame position and the seven broadcast-use video frames are shown. That is, each average is taken over the seven matching costs in one column (arranged vertically): the costs calculated in each of the last seven frame periods between the latest broadcast-use video frame at that time and the resource video frame at that position among the latest seven.
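The rolling matrix and its column averages can be sketched with a fixed-length queue of rows. The class name `CostMatrix`, its methods, and the choice to pre-fill the matrix with the maximum cost 1.0 are assumptions made for illustration; the description does not specify how the matrix is stored or initialized.

```python
from collections import deque

WINDOW = 7      # frames kept per stream, as in FIG. 4
MAX_COST = 1.0  # assumed initial fill: "no match" until real costs arrive


class CostMatrix:
    """Rolling WINDOW x WINDOW matrix of matching costs (illustrative sketch)."""

    def __init__(self):
        # rows[0] is the top row (latest broadcast frame); rows[-1] is the oldest.
        self.rows = deque(
            ([MAX_COST] * WINDOW for _ in range(WINDOW)), maxlen=WINDOW
        )

    def push(self, costs):
        """Insert the newest row at the top; the oldest row drops off the bottom."""
        if len(costs) != WINDOW:
            raise ValueError(f"expected {WINDOW} costs, got {len(costs)}")
        self.rows.appendleft(list(costs))

    def column_totals(self):
        """Sum the seven costs stacked vertically in each column."""
        return [sum(column) for column in zip(*self.rows)]

    def column_averages(self):
        """Divide each column total by the window length."""
        return [total / WINDOW for total in self.column_totals()]


matrix = CostMatrix()
matrix.push([0.5] * WINDOW)  # arbitrary demo values, not taken from FIG. 4
matrix.push([0.0] * WINDOW)
print(matrix.rows[0][0], matrix.rows[1][0], matrix.rows[2][0])  # 0.0 0.5 1.0
print(len(matrix.rows))  # 7
```

Because the queue has a fixed maximum length, pushing a new top row automatically discards the bottom row, which mirrors the one-row downward shift described above.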

These average matching costs are used to determine whether any of the 16 resource videos, which differ spatially and temporally from the broadcast-use video, matches it. One of the 16 processes performed in parallel is described here.

In the following, the minimum matching cost is 0 and the maximum is 1. The threshold applied to the average matching cost in step S6 of FIG. 3, when determining whether the broadcast-use video and the resource video match, is taken to be 0.7 as an example. The threshold may be set to an optimum value determined by experiment or the like.

First, FIG. 4B shows, for the seven broadcast-use video frames from the latest frame back to six frames before, the seven matching costs calculated between each such frame, at the time it was the latest frame, and the latest seven resource video frames at that time. As an example, FIG. 4B shows frames A1 to A7 of the resource video and frames B1 to B7 of the broadcast-use video.

The 49 matching cost values shown in FIG. 4B are obtained through steps S3 to S5 as the CPU 120 repeats the processing from step S2 to step S8 shown in FIG. 3 seven times.

In the example of FIG. 4B, all 49 matching costs are 1.0, so the seven average matching costs calculated by the CPU 120 (the processing of step S5 shown in FIG. 3) are all 1.0, and their minimum is 1.0. Because this value is larger than the threshold (0.7), the CPU 120 determines NO in step S6, hides the tally in step S7B, writes the log in step S8, and returns the flow to step S2.

Next, the CPU 120 executes step S2 to acquire the latest frames of the broadcast-use video and the resource video. As shown in FIG. 4C, the latest seven frames of the resource video and of the broadcast-use video each advance by one, and processing is performed on frames A2 to A8 and on frames B2 to B7 and A5. The CPU 120 then calculates seven matching costs for the latest frames of the broadcast-use video and the resource video in accordance with steps S3 to S5.

In the example of FIG. 4C, the matching costs between frames A2 to A8 of the resource video and frame A5 of the broadcast-use video (see the top row) are lower than 1.0. The matching cost between frame A5 of the resource video and frame A5 of the broadcast-use video is 0, the minimum value. Although there are differences in pixels, scale, and the like (spatial differences) and the broadcast-use video is delayed relative to the resource video (a temporal difference), essentially the same image is being compared with itself, so the matching cost is at its minimum.

In FIG. 4C, the seven average matching costs calculated in step S5 by the CPU 120 are, from the latest frame to the oldest, 0.90, 0.89, 0.87, 0.86, 0.87, 0.89, and 0.90. The minimum is therefore 0.86.

In FIG. 4C, the latest frame A5 of the broadcast-use video is three frames behind frame A5 of the resource video, so the delay is estimated to be three frames.
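The averages quoted for FIG. 4C can be reproduced with a short calculation. Only the cost of 0 for A5 against A5 and the six older rows of 1.0 (carried over from FIG. 4B) come from the description; the other six top-row costs below are assumed values chosen because they yield the quoted averages. Reading the delay off the column with the smallest average is likewise an interpretation of the description, not a confirmed detail of the program.

```python
# Costs for the newest broadcast frame (A5) against resource frames A8, A7, ..., A2.
# Only the 0.0 for A5-vs-A5 is stated in the text; the other six values are ASSUMED.
top_row = [0.3, 0.2, 0.1, 0.0, 0.1, 0.2, 0.3]

# The six older rows still hold the B-vs-A costs of 1.0 from FIG. 4B.
older_rows = [[1.0] * 7 for _ in range(6)]

matrix = [top_row] + older_rows
averages = [round(sum(column) / 7, 2) for column in zip(*matrix)]
print(averages)  # [0.9, 0.89, 0.87, 0.86, 0.87, 0.89, 0.9]

# Column 0 is the latest resource frame, so the index of the smallest average
# is read here as the estimated delay in frames.
delay_frames = min(range(7), key=averages.__getitem__)
print(delay_frames)  # 3
```

A single matching row lowers the minimum average only from 1.0 to about 0.86, which shows why one coincidental low-cost frame is not enough to cross the 0.7 threshold.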

However, because the minimum of the average matching costs is larger than the threshold (0.7), the CPU 120 determines NO in step S6, hides the tally in step S7B, writes the log in step S8, and returns the flow to step S2.

Next, the CPU 120 executes step S2 to acquire the latest frames of the broadcast-use video and the resource video. As shown in FIG. 4D, the latest seven frames of the resource video and of the broadcast-use video each advance by one, and processing is performed on frames A3 to A9 and on frames B3 to B7, A5, and A6. The CPU 120 then calculates seven matching costs for the latest frames of the broadcast-use video and the resource video in accordance with steps S3 to S5.

In the example of FIG. 4D, the matching costs between the latest seven frames A3 to A9 of the resource video and the latest frame A6 of the broadcast-use video are lower than 1.0. The matching cost between frame A6 of the resource video and frame A6 of the broadcast-use video is 0, the minimum value. Even with the spatial and temporal differences, essentially the same image is being compared with itself, so the matching cost is at its minimum.

In FIG. 4D, the seven average matching costs calculated in step S5 by the CPU 120 are, from the latest frame to the oldest, 0.76, 0.74, 0.73, 0.71, 0.73, 0.74, and 0.76. The minimum is therefore 0.71.

In FIG. 4D, the latest frame A6 of the broadcast-use video is three frames behind frame A6 of the resource video, so the delay is estimated to be three frames.

However, because the minimum of the average matching costs is still larger than the threshold (0.7), the CPU 120 determines NO in step S6, hides the tally in step S7B, writes the log in step S8, and returns the flow to step S2.

Thereafter, the CPU 120 repeats steps S2 to S8, advancing the latest seven frames of the resource video and the broadcast-use video by one frame each time. FIG. 4E illustrates the case where processing is performed on frames A9 to A15 and on frames A6 to A11 and C1.

The CPU 120 executes step S2 to acquire the latest frames of the broadcast-use video and the resource video and, as shown in FIG. 4E, performs processing on frames A9 to A15 and on frames A6 to A11 and C1. The CPU 120 then calculates the seven average matching costs for the latest frame of the broadcast-use video and the latest seven frames of the resource video in accordance with steps S3 to S5.

In the example of FIG. 4E, the matching costs between the latest frame C1 of the broadcast-use video and the latest seven frames A15 to A9 of the resource video are 1.0.

In FIG. 4E, the seven average matching costs calculated in step S5 by the CPU 120 are, from the latest frame to the oldest, 0.40, 0.31, 0.23, 0.14, 0.23, 0.31, and 0.40. The minimum is therefore 0.14.

Because the minimum of the average matching costs is equal to or less than the threshold (0.7), the CPU 120 determines YES in step S6 (the broadcast-use video and the resource video match), displays the tally in step S7A, writes the log in step S8, and returns the flow to step S2.
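The step S6 decision across the four states of FIG. 4 can be summarized by comparing each quoted minimum average with the example threshold. The function name `is_on_air` and the dictionary layout are illustrative assumptions; the numeric values are those given in the description.

```python
THRESHOLD = 0.7  # example threshold from the description

# Minimum of the seven average matching costs quoted for each state of FIG. 4.
min_average = {"4B": 1.0, "4C": 0.86, "4D": 0.71, "4E": 0.14}


def is_on_air(minimum, threshold=THRESHOLD):
    """Step S6 as described: a match when the minimum average is <= threshold."""
    return minimum <= threshold


for figure, value in min_average.items():
    action = "show tally (S7A)" if is_on_air(value) else "hide tally (S7B)"
    print(f"FIG. {figure}: min average {value:.2f} -> {action}")
```

Only the state of FIG. 4E falls at or below the threshold, so the tally appears a few frames after the matching content starts rather than on the first matching frame.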

The description so far covers one of the 16 processes performed in parallel for each of the 16 resource videos contained in one screen and the broadcast-use video. In step S7A, the red tally frame is displayed at the coordinates where the resource video determined to match the broadcast-use video appears within the one-screen display containing the 16 resource videos.
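As a rough illustration of how the tally coordinates might be derived, the sketch below maps a resource video index to a rectangle on the shared screen. The 4 × 4 layout, the 1920 × 1080 screen size, the row-major ordering, and the function name `tally_rect` are all assumptions; the description states only that 16 resource videos share one screen.

```python
def tally_rect(index, screen_w=1920, screen_h=1080, cols=4, rows=4):
    """Return (x, y, width, height) of tile `index` (0-based, row-major).

    The grid shape, screen size, and ordering are assumed for illustration.
    """
    if not 0 <= index < cols * rows:
        raise ValueError("tile index out of range")
    tile_w, tile_h = screen_w // cols, screen_h // rows
    row, col = divmod(index, cols)
    return (col * tile_w, row * tile_h, tile_w, tile_h)


print(tally_rect(0))   # (0, 0, 480, 270)
print(tally_rect(5))   # (480, 270, 480, 270)
print(tally_rect(15))  # (1440, 810, 480, 270)
```

A red frame drawn along the returned rectangle would mark the matching resource video without hiding the other tiles.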

This allows the user to recognize which of the 16 resource videos is currently on air.

As described above, according to the embodiment, matching costs are calculated between the latest one frame of the broadcast-use video and the latest seven frames of the resource video, and the average of the matching costs over the past seven frames is calculated for each resource video frame position, from the latest frame back to the frame six before it.

When the minimum of the seven average matching costs becomes equal to or less than the threshold, the resource video is determined to be on air and the tally is displayed. This is because the minimum of the seven averages is the average obtained at the frame position that corresponds to the estimated delay of the broadcast-use video relative to the resource video.

It is therefore possible to provide a video stream match determination program that compares a plurality of resource videos with the video currently on air and can determine whether a resource video and the broadcast-use video match even when spatial or temporal differences exist.

The above describes a form in which one screen contains 16 resource videos, but the number of resource videos contained in one screen is not limited to 16. For example, it may be four, or it may be one.

Instead of being output to the speaker 160, the audio signal may be supplied to a relay site or the like via a digital hybrid and a telephone line. In that case, it can be recognized at the relay site or the like that the video from that site is on air. A digital hybrid is also sometimes called a telephone hybrid.

The above describes a form in which the video processing system 10 is deployed at a local station. A local station basically selects between the broadcast program sent from the key station and broadcast programs produced within the local station, and sends out and broadcasts the selected one. The local station's weather cameras and other resource videos are sent to the key station over an optical line (dedicated line) or an IP line, so the key station can also use the local station's resource videos in its broadcast program. However, the local station does not know when the key station will use its resource videos. Comparing the broadcast-use video with the resource videos therefore allows the local station to confirm that the key station has used one of its resources.

Considering operation on the key-station side, on the other hand, the key station itself holds the information on which resource video it is using. Unlike at a local station, visualizing the video currently in use (for example, attaching a tally) is therefore fairly easy. The present invention, however, makes the match determination simply by comparing two video streams, without using such resource video usage information. The system is therefore small in scale and the barrier to introduction is considered low, which makes it suitable for key stations as well as local stations.

The video stream match determination program according to an exemplary embodiment of the present invention has been described above. The present invention is not limited to the specifically disclosed embodiment, and various modifications and changes are possible without departing from the scope of the claims.

10 Video processing system
100 PC
101A, 101B Video input terminal
102 Synchronization signal input terminal
103 Video output terminal
104 Audio output terminal
110 HD-SDI input/output board
120 CPU
121 Video reading unit
122 Feature amount extraction unit
123 Feature amount comparison unit
124 Delay amount estimation unit
125 Match determination unit
126 Tally control unit
130 Memory
150 Inserter
151 Video synthesis unit
160 Speaker

Claims

1. A video stream match determination program for determining a match between resource videos and a broadcast-use video, the program causing a computer to:
extract, for each resource video of a plurality of resource videos, feature amounts of the latest predetermined number of frames of the resource video and of the latest one frame of the broadcast-use video;
calculate, based on the feature amounts, degrees of coincidence between the latest predetermined number of frames of the resource video and the latest one frame of the broadcast-use video;
calculate, for each of the latest predetermined number of frames of the resource video, a total of the degrees of coincidence calculated from the present back to a predetermined time before;
calculate an average value of the degrees of coincidence by dividing the total by the predetermined number;
determine whether the average value of the degrees of coincidence is equal to or less than a predetermined threshold; and
when the average value of the degrees of coincidence is equal to or less than the predetermined threshold, determine that, among the plurality of resource videos, the resource video whose average value of the degrees of coincidence is equal to or less than the threshold matches the broadcast-use video.

2. The video stream match determination program according to claim 1, wherein the plurality of resource videos are displayed in a format in which one screen is divided, and, when it is determined that a resource video and the broadcast-use video match, a tally is displayed at the position in the one screen where the resource video determined to match is displayed.

3. The video stream match determination program according to claim 1, wherein, when it is determined that a resource video matches the broadcast-use video, audio guidance indicating that the resource video is on air is output.