JP7343378B2 - editing system

Info

Publication number: JP7343378B2
Application number: JP2019223031A
Authority: JP
Inventors: 宏幸田中
Original assignee: Hitachi Kokusai Electric Inc
Current assignee: Hitachi Kokusai Electric Inc
Priority date: 2019-12-10
Filing date: 2019-12-10
Publication date: 2023-09-12
Anticipated expiration: 2039-12-10
Also published as: JP2021093627A

Description

TECHNICAL FIELD

The present invention relates to an editing system that can edit broadcast video, used mainly at broadcast stations and the like, and retransmit it.

Description of the Related Art

In recent years, broadcast stations and the like have put into practical use editing systems in which video data is stored as editing material on a material video server or the like, edited non-linearly, and sent out for broadcast.

As a conventional editing system, Patent Document 1, for example, describes a technique that includes a processing target recognition unit which recognizes the video portions and audio portions to be processed in order to identify objects in a video.

Conventional editing systems also store simultaneous recordings of the video sent out at broadcast time (hereinafter simply "broadcast video"). When such broadcast video is retransmitted for time-shifted distribution, rebroadcast, or the like, the unnecessary areas added at broadcast time must be masked to generate video material data (processed video data) that is close to the original video. Such unnecessary areas include additional displays such as the "L-shape", time displays, and images and captions showing weather forecasts, emergency reports, tsunami warnings, disaster information, and so on (hereinafter these additional displays are referred to as "L-shape, etc."). Here, "L-shape", also called an "L-shaped screen", refers to video processing in which, for example, the original program picture is shrunk slightly toward the lower-right corner and the freed space on the left and top of the screen is treated as an L-shaped area for displaying disaster or other information. Besides the L-shaped screen, the additional displays also include U-shaped displays and displays that surround all sides of the reduced picture.
In this case, the unnecessary areas such as the L-shape had to be masked, trimmed, and so on by manual editing.

Patent Document 1: JP2019-62381A

However, when unnecessary areas such as the L-shape are edited manually in a conventional editing system, image quality degradation is unavoidable because of trimming mistakes, the encoding applied during simultaneous recording, the masking itself, and the partial enlargement that accompanies trimming.

The present invention has been made in view of this situation, and its object is to solve the problems described above.

The editing system of the present invention is an editing system that retransmits broadcast video, and comprises: unnecessary area identifying means for identifying an unnecessary area that is continuously displayed at a specific location in the broadcast video; unnecessary area processing means for creating processed video in which the unnecessary area identified by the unnecessary area identifying means has been deleted from the broadcast video or made less conspicuous; original video identifying means for identifying stored original video from the processed video created by the unnecessary area processing means and/or from the broadcast video; and image quality improving means for improving the image quality of the processed video based on the original video identified by the original video identifying means.
In the editing system of the present invention, the unnecessary area identifying means identifies the unnecessary area using a model trained on the features of the area to be deleted.
In the editing system of the present invention, the image quality improving means improves the image quality of the processed video by edge enhancement or compositing that uses edge information and/or color information based on the original video, and/or by compositing with portions cut out of the original video.
In the editing system of the present invention, the original video identifying means extracts common points between the images of the processed video and the original video, and identifies the original video based on the extracted common points.
The editing system of the present invention further comprises: deletion location identifying means for analyzing the audio corresponding to the broadcast video and identifying a deletion location; original audio identifying means for identifying the original audio corresponding to the original video; and sound quality enhancement means for enhancing the sound quality of the deletion location identified by the deletion location identifying means, based on the original audio identified by the original audio identifying means.
In the editing system of the present invention, the deletion location identifying means performs audio analysis using a specific model and identifies the location of an alarm sound in the audio.

According to the present invention, an unnecessary area that is continuously displayed at a specific location in broadcast video is identified, processed video is created in which that area is deleted or made less conspicuous, original video data is identified from the processed video and/or the broadcast video, and the image quality of the processed video is improved based on that original video data. This makes it possible to provide an editing system that suppresses image quality degradation at retransmission.

FIG. 1 is a system configuration diagram showing a schematic configuration of an editing system X according to an embodiment of the present invention.
FIG. 2 is a flowchart showing the flow of retransmission processing according to the embodiment of the present invention.
FIG. 3 is a conceptual diagram showing the image quality improvement processing in the retransmission processing shown in FIG. 2.
FIG. 4 is a conceptual diagram showing the sound quality enhancement processing in the retransmission processing shown in FIG. 2.
FIG. 5 is a conceptual diagram of retransmission by a conventional video server system.

<Embodiment>
[Control configuration of editing system X]
An embodiment of the present invention is described below with reference to the drawings.
The editing system X is an editing system (video server system) used at broadcast stations and the like. The editing system X can retransmit the broadcast video contained in broadcast video data 200 for time-shifted distribution, rebroadcast, or the like. In doing so, the editing system X can delete unnecessary parts of the previously broadcast video.
As shown in FIG. 1, the editing system X consists of an analysis device 1, a storage server 2, a recording device 3, and an editing device 4 connected through a network 5.

The analysis device 1 is a device for analyzing the contents of the broadcast video data 200 and other data stored in the storage server 2. For the video (images) contained in video data, the analysis device 1 performs computations such as image component analysis, including various kinds of filter processing and OCR (Optical Character Recognition), and so-called AI (Artificial Intelligence), including convolutional neural networks, GAN (Generative Adversarial Network), RNN (Recurrent Neural Network), LSTM (Long Short-Term Memory network), other multilayer neural networks, kernel machines, decision trees, Bayesian networks, HMM (Hidden Markov Model), and other statistical methods. The analysis device 1 can also analyze audio data 300 by audio component analysis or AI.
The detailed configuration of the analysis device 1 is described later.

The recording device 3 is a device that records image data, audio data 300, and the like, and encodes (converts) them into various codecs using image and audio encoders.
In this embodiment, the recording device 3 records and encodes, for example, uncompressed image data captured by an imaging unit 30, described later. The recording device 3 may also record image data from servers, VTRs, or other equipment at other stations over a dedicated line or the network 5, or may record by importing files such as MXF (Material eXchange Format). The video coding scheme (codec) used by the encoder can be, for example, MPEG2, H.264, or H.265, but is not limited to these. The recording device 3 can transmit the encoded data to the storage server 2 or to playout equipment for playback.

The storage server 2 is a device, such as a server, that stores the broadcast video data 200 and transmits it to other devices. In this embodiment, the storage server 2 functions as a material video server that stores the broadcast video data 200, original video data 220, and other recorded material (material video, material files) recorded by the recording device 3. In addition, the storage server 2 may include a multiplexing function using a multiplexer (MUX).
The data stored in the storage server 2 is described in detail later.

The editing device 4 is a general-purpose non-linear editor. The editing device 4 performs editing processes such as rendering editing and cut editing. Rendering editing edits the broadcast video data 200 stored in the storage server 2 while actually rendering it. Cut editing creates clips without rendering.

In this embodiment, the editing device 4 includes a display unit, a keyboard, a pointing device, a controller, and the like (not shown). The editing device 4 further includes editing control means (editing means), which is the computer that actually performs the editing work; a display unit (display) that shows the broadcast video data 200, the editing timeline, and so on; and an operation panel (operation means) for entering editing instructions.

The editing device 4 can reference and edit the broadcast video data 200, the original video data 220, and other data on the storage server 2. The editing device 4 lets the user operate the operation panel to specify the portion to be edited, and then executes cut editing, rendering editing, or the like. The editing device 4 then transmits the editing information for the edited broadcast video data 200, original video data 220, and so on to the storage server 2 for storage.

The editing information used in these editing processes includes, for example, the video frame positions of the portion to be processed, coordinates on the video, the range of audio sample positions, and the content of the processing. When the processing target is video, the types of editing processing include various image effects, connections between clips and their transition effects, brightness and color adjustment, fade-in, fade-out, volume adjustment, and the like.
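The editing information described above can be pictured as a small record. The following is only an illustrative sketch: the class name `EditInfo` and all of its field names are assumptions made for this example and do not come from the patent.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class EditInfo:
    """Hypothetical container for one editing instruction."""
    first_frame: int                    # first video frame of the target portion
    last_frame: int                     # last video frame (inclusive)
    operation: str                      # content of the processing, e.g. "fade_out"
    # Optional rectangle on the picture: (x, y, width, height) in pixels.
    region: Optional[Tuple[int, int, int, int]] = None
    # Optional range of audio sample positions: (start, end).
    audio_samples: Optional[Tuple[int, int]] = None

    def frame_count(self) -> int:
        """Number of frames covered by this instruction."""
        return self.last_frame - self.first_frame + 1


edit = EditInfo(100, 129, "fade_out", audio_samples=(160000, 208000))
print(edit.frame_count())
```

Keeping the frame range, picture coordinates, and audio sample range in one record mirrors the way the text groups them as a single piece of editing information.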

The network 5 is a communication means that interconnects the devices, such as a LAN (Local Area Network), an optical fiber network, c.link, a wireless LAN (WiFi), or a mobile phone network. The network 5 may use dedicated lines, an intranet, the Internet, or the like; these may be mixed, and they may form a VPN (Virtual Private Network). The network 5 may also be connected with various protocols over an IP network such as TCP/IP or UDP.

In addition, the editing system X includes playout equipment, such as a general-purpose playout server for broadcast stations. This equipment outputs the material video and the broadcast video recorded on the storage server 2 for broadcast (on air). It can also play back broadcast video for preview.

More specifically, the analysis device 1 includes a control unit 10 as part of its hardware resources.

The control unit 10 is an information processing means that implements the functional units described later and executes each step of the retransmission processing of this embodiment. The control unit 10 is composed of, for example, a CPU (Central Processing Unit), an MPU (Micro Processing Unit), a GPU (Graphics Processing Unit), a TPU (Tensor Processing Unit), a DSP (Digital Signal Processor), an ASIC (Application Specific Processor, a processor for specific applications), or the like. This allows the control unit 10 to execute image component analysis, audio component analysis, AI processing for video and audio, and the like at high speed, for example by batch processing.

The storage server 2 includes a storage unit 11 as part of its hardware resources.

The storage unit 11 is a non-transitory recording medium. The storage unit 11 is configured as video storage such as an SSD (Solid State Disk), an HDD (Hard Disk Drive), a magnetic cartridge, a tape drive, or an optical disk array.
This video storage holds, for example, material video data (material data), video data of broadcast video such as completed programs, and the broadcast video data 200. The files stored on the storage server 2 are transferred to a playback device according to the program broadcast schedule, or are used in program editing by the editing device 4. These data are described in detail later.
The storage unit 11 also includes ordinary ROM (Read Only Memory), RAM (Random Access Memory), and the like. These hold the programs for the processing executed by the storage server 2 and by the control unit 10 of the analysis device 1, as well as databases, temporary data, and various other files.

The recording device 3 includes an imaging unit 30 (imaging means).

The imaging unit 30 is an imaging device such as a camera that uses a CCD (Charge Coupled Device) or CMOS (Complementary Metal Oxide Semiconductor) sensor. The imaging unit 30 may be built into the recording device 3 or may be an external camera connected to it.
The imaging unit 30 digitizes the captured image and transmits it to the recording device 3 as, for example, HD-SDI image data. At the same time, audio data from a microphone mounted on or external to the imaging unit 30 may also be transmitted to the recording device 3. Alternatively, the image data and audio data can be transmitted to the recording device 3 through a mixer or other equipment.

Next, the functional configuration of the analysis device 1 and the data stored in the storage server 2 are described in detail.
The control unit 10 includes unnecessary area identifying means 100, unnecessary area processing means 110, original video identifying means 120, image quality improving means 130, deletion location identifying means 140, original audio identifying means 150, and sound quality enhancement means 160.
The storage unit 11 stores broadcast video data 200, processed video data 210, original video data 220, audio data 300, processed audio data 310, and original audio data 320.

The unnecessary area identifying means 100 acquires the broadcast video data 200 from the recording device 3 and identifies an unnecessary area that is continuously displayed at a specific location in the broadcast video contained in the broadcast video data 200. To do so, the unnecessary area identifying means 100, for example, analyzes the content of the video, treats the L-shape, etc. as the unnecessary area, and determines its position on the video.
Specifically, the unnecessary area identifying means 100 can identify the unnecessary area using a model trained on the features of the area to be deleted. This model may use, for example, image component analysis or AI.
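The text specifies a trained model for this step. As a much simpler stand-in, the sketch below flags pixels whose value barely changes over a run of frames, which is one heuristic for an area that is "continuously displayed at a specific location". It is not the patent's method. The function names, the `max_range` threshold, and the representation of a frame as a 2D list of luminance values are all assumptions made for this example.

```python
from typing import List, Optional, Tuple

Frame = List[List[int]]  # grayscale frame: rows of 0-255 luminance values


def find_static_mask(frames: List[Frame], max_range: int = 4) -> List[List[bool]]:
    """Return a mask that is True where a pixel stays nearly constant in all frames.

    Hypothetical heuristic, not the trained model described in the text:
    an overlay such as an L-shaped banner tends to stay fixed while the
    program picture underneath keeps changing.
    """
    height, width = len(frames[0]), len(frames[0][0])
    mask = []
    for y in range(height):
        row = []
        for x in range(width):
            values = [frame[y][x] for frame in frames]
            row.append(max(values) - min(values) <= max_range)
        mask.append(row)
    return mask


def bounding_box(mask: List[List[bool]]) -> Optional[Tuple[int, int, int, int]]:
    """Return (top, left, bottom, right) of the True pixels, or None if there are none."""
    coords = [(y, x) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if not coords:
        return None
    ys = [c[0] for c in coords]
    xs = [c[1] for c in coords]
    return min(ys), min(xs), max(ys), max(xs)
```

A static-pixel test like this would also fire on a still program picture, which is one reason a model trained on the appearance of the overlays is the more robust choice.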

The unnecessary area processing means 110 creates processed video data 210 in which the unnecessary area identified by the unnecessary area identifying means 100 has been deleted from the broadcast video data 200 or made less conspicuous.
For example, the unnecessary area processing means 110 automatically makes the identified unnecessary area less conspicuous by trimming, enlargement, mask editing, or any combination of these (hereinafter simply "mask processing").
The unnecessary area processing means 110 stores the created processed video data 210 in the storage server 2.
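The trimming and enlargement part of the mask processing can be sketched as follows. This is a minimal illustration, assuming a frame is a 2D list of luminance values and the rectangle of the shrunken program picture is already known. `crop` and `enlarge_nearest` are hypothetical helper names, and nearest-neighbour scaling is used only to keep the example short.

```python
from typing import List

Frame = List[List[int]]


def crop(frame: Frame, top: int, left: int, height: int, width: int) -> Frame:
    """Cut the program picture out of the frame, discarding the overlay area."""
    return [row[left:left + width] for row in frame[top:top + height]]


def enlarge_nearest(frame: Frame, out_height: int, out_width: int) -> Frame:
    """Scale a frame to the requested size by nearest-neighbour sampling."""
    in_height, in_width = len(frame), len(frame[0])
    return [
        [frame[y * in_height // out_height][x * in_width // out_width]
         for x in range(out_width)]
        for y in range(out_height)
    ]


# 4x4 frame: row 0 and column 0 hold the overlay (value 0),
# the lower-right 3x3 block is the program picture.
frame = [[0 if (y == 0 or x == 0) else 10 * y + x for x in range(4)] for y in range(4)]
restored = enlarge_nearest(crop(frame, 1, 1, 3, 3), 4, 4)
```

Enlargement only repeats or interpolates existing pixels, so the detail lost when the picture was shrunk is not recovered. This is the degradation that the later image quality improvement step addresses by referring to the original video.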

The original video identifying means 120 identifies the original video data 220 stored in the storage server 2 from the processed video data 210 created by the unnecessary area processing means 110 and/or from the broadcast video data 200. For example, the original video identifying means 120 analyzes the video content of the processed video data 210 and/or the broadcast video data 200, compares it with the video data (material data) stored in the storage unit 11 of the storage server 2 that served as material for the broadcast video, and identifies the original video data 220 used for the broadcast.
More specifically, the original video identifying means 120 can extract common points between the images of the processed video and the original video contained in the original video data 220, and identify the original video based on the extracted common points.
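The text does not say how the common points are extracted. One simple way to compare frames is a perceptual "average hash", sketched below as an assumed substitute. It assumes every frame has already been reduced to the same small grayscale size, and the names `average_hash`, `hamming`, and `find_original` are hypothetical.

```python
from typing import Dict, List

Frame = List[List[int]]


def average_hash(frame: Frame) -> List[bool]:
    """One bit per pixel: True where the pixel is brighter than the frame mean."""
    pixels = [p for row in frame for p in row]
    mean = sum(pixels) / len(pixels)
    return [p > mean for p in pixels]


def hamming(a: List[bool], b: List[bool]) -> int:
    """Number of positions where two equal-length hashes differ."""
    return sum(x != y for x, y in zip(a, b))


def find_original(processed: Frame, candidates: Dict[str, Frame]) -> str:
    """Return the name of the candidate frame whose hash is closest to the processed frame."""
    target = average_hash(processed)
    return min(candidates, key=lambda name: hamming(target, average_hash(candidates[name])))


candidates = {
    "clip_a": [[10, 200], [200, 10]],
    "clip_b": [[200, 10], [10, 200]],
}
# A re-encoded version of clip_a: similar structure, slightly different values.
print(find_original([[20, 180], [190, 15]], candidates))
```

Because the hash depends only on which pixels are brighter than the mean, it tolerates the small value shifts introduced by re-encoding, which is the property needed when comparing a broadcast recording with its source material.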

The image quality improving means 130 improves the image quality of the processed video data 210 based on the original video data 220 identified by the original video identifying means 120.
Specifically, for each processed video in the processed video data 210, the image quality improving means 130 can improve image quality by edge enhancement or compositing that uses edge information and/or color information based on the original video contained in the original video data 220, and/or by compositing with portions cut out of that original video.
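Of the options listed, compositing with portions cut out of the original video is the simplest to illustrate. The sketch below assumes the processed frame and the original frame are already aligned and have the same resolution, and that a boolean mask marks the pixels to take from the original. The function name and the mask representation are assumptions for this example.

```python
from typing import List

Frame = List[List[int]]


def composite_from_original(processed: Frame, original: Frame,
                            mask: List[List[bool]]) -> Frame:
    """Replace masked pixels of the processed frame with the original's pixels."""
    result = []
    for proc_row, orig_row, mask_row in zip(processed, original, mask):
        # Take the original pixel where the mask is True, otherwise keep the processed one.
        result.append([o if m else p for p, o, m in zip(proc_row, orig_row, mask_row)])
    return result


processed = [[0, 0], [0, 50]]
original = [[11, 12], [21, 22]]
mask = [[True, True], [True, False]]
print(composite_from_original(processed, original, mask))
```

In practice the two frames would first need to be registered to each other in position and scale, a step this sketch leaves out.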

The deletion location identifying means 140 analyzes the audio data 300 corresponding to the broadcast video and identifies a deletion location.
Specifically, the deletion location identifying means 140 performs audio analysis using a specific model and identifies the location of an alarm sound in the audio.
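The text leaves the "specific model" unspecified. As a classical stand-in, the sketch below uses the Goertzel algorithm to find windows dominated by a single known frequency. It assumes the alarm is a steady tone whose frequency is known in advance, which the patent does not state. The function names and the `window` and `ratio` parameters are hypothetical.

```python
import math
from typing import List, Tuple


def goertzel_power(samples: List[float], sample_rate: int, target_hz: float) -> float:
    """Squared magnitude of the DFT bin nearest to target_hz (Goertzel algorithm)."""
    n = len(samples)
    k = round(n * target_hz / sample_rate)
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev2 ** 2 + s_prev ** 2 - coeff * s_prev * s_prev2


def find_tone_windows(samples: List[float], sample_rate: int, target_hz: float,
                      window: int = 400, ratio: float = 0.5) -> List[Tuple[int, int]]:
    """Return (start, end) sample ranges whose energy sits mostly at target_hz."""
    hits = []
    for start in range(0, len(samples) - window + 1, window):
        block = samples[start:start + window]
        energy = sum(x * x for x in block)
        if energy == 0:
            continue
        # A pure tone centred on the bin gives power close to energy * window / 2.
        if goertzel_power(block, sample_rate, target_hz) >= ratio * energy * window / 2:
            hits.append((start, start + window))
    return hits
```

A real alarm chime with several partials or a decaying envelope would need a richer detector, which is presumably why the text refers to a model rather than a fixed filter.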

The original audio identifying means 150 identifies the original audio data 320 corresponding to the original video data 220.
For example, the original audio identifying means 150 compares against the audio material data stored in the storage server 2 and identifies the original audio data 320 that corresponds to the original video data 220 used for the broadcast.

The sound quality enhancement means 160 enhances the sound quality of the deletion location identified by the deletion location identifying means 140, based on the original audio data 320 identified by the original audio identifying means 150.
The sound quality enhancement means 160 can enhance sound quality by, for example, mixing in the alarm sound with inverted phase and/or compositing with portions cut out of the original audio data 320.
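Both options named above can be sketched on plain lists of samples. The example assumes the alarm waveform and its start position are known exactly and that the original audio is sample-aligned with the broadcast audio. These are idealisations, and the function names are hypothetical.

```python
from typing import List


def cancel_with_inverse_phase(audio: List[float], alarm: List[float],
                              start: int) -> List[float]:
    """Mix the phase-inverted alarm into the audio, beginning at sample `start`."""
    out = list(audio)
    for i, a in enumerate(alarm):
        if 0 <= start + i < len(out):
            out[start + i] += -a  # adding the inverted sample cancels the alarm
    return out


def splice_original(audio: List[float], original: List[float],
                    start: int, end: int) -> List[float]:
    """Replace audio[start:end] with the same range cut out of the original audio."""
    return audio[:start] + original[start:end] + audio[end:]


program = [1.0, 2.0, 3.0, 4.0]
alarm = [0.5, -0.5]
broadcast = [1.0, 2.5, 2.5, 4.0]  # program with the alarm mixed in at sample 1

print(cancel_with_inverse_phase(broadcast, alarm, 1))
print(splice_original(broadcast, program, 1, 3))
```

Phase inversion only cancels cleanly when the stored alarm matches the broadcast one in level and timing, so splicing from the original audio is the safer choice whenever that audio is available.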

The broadcast video data 200 is data of broadcast video. In this embodiment, the broadcast video data 200 contains broadcast video such as simultaneous recordings of the video sent out at broadcast time. In this embodiment, the broadcast video data 200 uses, for example, MXF files. MXF is a container file format for professional video files. MXF is used in broadcast equipment such as camcorders, recorders and players, non-linear editors, and playout equipment, and can wrap data in various video and audio formats together with metadata. In this embodiment, the metadata can include, for example, data on the identified unnecessary area, feature data of the video, and data on the common points shared with the images of the original video. The metadata can also include, for example, the frame rate, frame size, creation date, the operator of the imaging unit 30, and various information about the material video. This information can include, for example, the title and content, the playback time, scene information, and information about objects, including people in the video, the shooting location, and the shooting date and time.

The processed video data 210 is data of video in which the unnecessary area has been deleted from the broadcast video data 200 or made less conspicuous. The processed video data 210 may also be MXF data, or data in an intermediate editing format that has not yet been converted into playout data by the playout equipment. Alternatively, the processed video data 210 may be in the same format as material data such as the original video data 220. The processed video data 210 may also be sent out after its image quality has been improved with the original video data 220 as described above.

The original video data 220 is material data stored in the storage server 2. The original video data 220 includes the data of the program actually used in the broadcast video data 200, the data of its material, and so on. The original video data 220 and the broadcast video data 200 may differ in video format, and the original video data 220 may have higher image quality than the broadcast video data 200, for example because it is less compressed or uncompressed. That is, the format of the original video data 220 may be something other than MXF, including a proprietary format. The original video data 220 may also be a multiplexed video stream recorded as material data from the recording device 3.
In addition, in this embodiment, the original video data 220 may include feature data of the video, data on the common points shared with the images of the broadcast video, and the like.

The audio data 300 is audio data corresponding to the broadcast video data 200. The audio data 300 contains broadcast audio such as simultaneous recordings of the audio sent out at broadcast time. This broadcast audio may include alarm sounds, such as a chime, a buzzer, or a short voice clip, that draw attention to the L-shape, etc. The audio data 300 may be bundled, for example, as a stream inside an MXF container. Alternatively, the broadcast audio may be a WAV file with any quantization bit depth and sampling frequency, or a file in any of various compressed or audio stream formats. As described later, the alarm sound portions of the audio data 300 may be processed with a phase-inverted alarm sound or replaced with the original audio data 320.

The processed audio data 310 is the audio data included in the processed video data 210.
In this embodiment, the processed audio data 310 may be identical to the audio data 300. The processed audio data 310 may be enhanced in sound quality with the audio data 300 as described above and sent out together with the processed video data 210.
The processed audio data 310 may likewise be a WAV file, a file in any of various compressed or audio stream formats, data in an intermediate editing format, or the like.

元音声データ３２０は、元映像データ２２０に対応する音声のデータである。元音声データ３２０も、ＷＡＶ形式のファイル、各種圧縮形式や音声ストリーム形式のファイル、編集用の中間的な形式のデータ等であってもよい。 The original audio data 320 is audio data corresponding to the original video data 220. The original audio data 320 may also be a WAV format file, a file in various compression formats or an audio stream format, data in an intermediate format for editing, or the like.

ここで、上述の各機能部は、記憶部１１に記憶された制御プログラム等が制御部１０で実行されることにより実現される。
なお、これらの各機能部は、ＦＰＧＡ（Field Programmable Gate Array）やＡＳＩＣ（Application Specific Integrated Circuit）等により、回路的に構成されてもよい。 Here, each of the above-mentioned functional units is realized by the control program and the like stored in the storage unit 11 being executed by the control unit 10.
Note that each of these functional units may be configured as a circuit using an FPGA (Field Programmable Gate Array), an ASIC (Application Specific Integrated Circuit), or the like.

〔編集システムＸの再送出処理〕
次に、図２～図４を参照して、本発明の実施の形態に係る編集システムＸを用いた再送出処理についてより詳しく説明する。
本実施形態の再送出処理においては、放送同録映像等の放送映像データ２００を高画質化する高画質化処理と、放送同録録音等の音声データ３００を高音質化する高音質化処理とを実行する。これらの処理は、スレッドやプロセス等で同時並行的に実行されてもよい。
以下で、この編集システムＸによる再送出処理について、図２の各フローチャートを用いて説明する。 [Retransmission processing of editing system X]
Next, the retransmission process using the editing system X according to the embodiment of the present invention will be described in more detail with reference to FIGS. 2 to 4.
In the retransmission process of this embodiment, an image quality improvement process that improves the image quality of broadcast video data 200, such as a broadcast simultaneous recorded video, and a high-quality sound process that improves the sound quality of audio data 300, such as a broadcast simultaneous recording, are executed. These processes may be executed concurrently using threads, processes, or the like.
The retransmission process by the editing system X will be explained below using the flowcharts in FIG. 2.

まず、再送出処理における高画質化処理について、図２（ａ）のフローチャートと、図３とを用いて、ステップ毎に詳しく説明する。 First, the image quality improvement process in the retransmission process will be explained in detail step by step using the flowchart of FIG. 2(a) and FIG. 3.

ステップＳ１００において、不要領域特定手段１００が、初期処理を行う。
図３（ａ）によると、不要領域特定手段１００は、放送局等毎に、Ｌ字等の削除する対象の領域の特徴をモデルに学習させる。このため、例えば、不要領域特定手段１００は、Ｌ字等に含まれる文字の形状、時刻表示の形状、字幕の表示位置やフォント等の加工が特定のパターンに限られることを利用し、これをモデルとして用いる。すなわち、ビデオサーバーシステムは単一の放送局が保有、運用することが一般的であることから、Ｌ字の形状、時刻表示の形状、字幕の表示位置やフォント等の加工方法は、ある程度特定のパターンに限られることを利用することができる。これは、ビデオサーバーシステムは放送局ごとに稼働しており、そこで扱われるＬ字等の挿入フォーマットは、ある程度の規則性があるからである。 In step S100, the unnecessary area specifying means 100 performs initial processing.
According to FIG. 3(a), the unnecessary area specifying means 100 causes a model to learn, for each broadcasting station or the like, the characteristics of the area to be deleted, such as an L-shape. To this end, for example, the unnecessary area specifying means 100 takes advantage of the fact that processing such as the shape of characters included in the L-shape and the like, the shape of the time display, and the display position and font of subtitles is limited to specific patterns, and uses these patterns as the model. In other words, since a video server system is generally owned and operated by a single broadcasting station, the processing methods for the L-shape, the shape of the time display, the display position and font of subtitles, and so on are limited to certain patterns to some extent, and this can be exploited. This is because a video server system operates for each broadcasting station, and the insertion formats for L-shapes and the like handled there have a certain degree of regularity.

具体的には、不要領域特定手段１００は、画像成分分析を行う場合、例えば、Ｌ字等に含まれる特定の画像成分を検出する。これは、例えば、画像の成分分析において、Ｌ字等に含まれる、特定の画像成分を検出することを示す。
または、不要領域特定手段１００は、ＡＩを用いる場合、放送局等毎に、Ｌ字等の特定の図柄、時刻表示等を不要領域として、予め学習させることが可能である。これは、例えば、特定の図柄を示したＬ字等を、ＡＩに削除する対象の領域と認識させることを示す。 Specifically, when performing image component analysis, the unnecessary area specifying means 100 detects, for example, a specific image component included in an L-shape or the like. This indicates that, for example, in image component analysis, a specific image component included in an L-shape or the like is detected.
Alternatively, when using AI, the unnecessary area specifying means 100 can learn in advance that a specific pattern such as an L character, a time display, etc. are unnecessary areas for each broadcasting station or the like. This indicates, for example, that the AI recognizes an L-shape or the like indicating a specific pattern as an area to be deleted.

一方、不要領域特定手段１００は、放送映像データ２００を取得する。具体的には、送出設備から送出された、放送時の送出映像を同時録画し、この放送同録映像を放送映像データ２００として、記憶部１１へ格納する。 On the other hand, unnecessary area specifying means 100 acquires broadcast video data 200. Specifically, the transmission video transmitted from the transmission equipment at the time of broadcasting is simultaneously recorded, and the broadcast simultaneously recorded video is stored in the storage unit 11 as broadcast video data 200.

次に、ステップＳ１０１において、不要領域特定手段１００が不要領域特定処理を行う。
図３（ａ）によると、不要領域特定手段１００は、放送映像データ２００に含まれる放送映像の特定箇所に連続して表示される不要領域を特定する。
Ｌ字等を削除する方法としては、まず映像解析が必要である。本実施形態においては、不要領域特定手段１００は、削除する対象の領域の特徴を学習させたモデルにより、不要領域を特定する。具体的には、不要領域特定手段１００は、例えば、放送映像データ２００の全編に対して解析を行って、番組の一部で生じた偶発的な類似画像の発生と、意図的に挿入されたＬ字等とを判別する。すなわち、不要領域特定手段１００は、放送映像データ２００の内容により、不要なＬ字等の位置を特定することが可能である。
不要領域特定手段１００は、このモデルとして、上述の画像成分分析又はＡＩを用いてもよい。これらにより、映像に含まれるＬ字等の有無、及びその不要領域の範囲の特定が可能である。 Next, in step S101, the unnecessary area specifying means 100 performs unnecessary area specifying processing.
According to FIG. 3A, the unnecessary area specifying means 100 specifies an unnecessary area that is continuously displayed at a specific part of the broadcast video included in the broadcast video data 200.
To delete the L-shape and the like, video analysis is required first. In the present embodiment, the unnecessary area specifying means 100 identifies unnecessary areas using a model that has learned the characteristics of the area to be deleted. Specifically, the unnecessary area specifying means 100, for example, analyzes the entire broadcast video data 200 and distinguishes similar images that occur accidentally in part of the program from L-shapes and the like that were intentionally inserted. That is, the unnecessary area specifying means 100 can specify the position of an unnecessary L-shape or the like based on the contents of the broadcast video data 200.
The unnecessary area identifying means 100 may use the above-mentioned image component analysis or AI as this model. With these, it is possible to identify the presence or absence of an L-shape, etc. included in the video, and the range of its unnecessary area.

ここで、不要領域特定手段１００は、放送映像データ２００について、映像の各フレームを全て解析する必要はなく、フレームを間引いて解析してもよい。この間引きの間隔は、解析のモデル等の特性等により設定可能である。
具体的に説明すると、Ｌ字等の特徴として、基本的に放送全編において、長期間連続して表示されていることが挙げられる。すなわち、不要領域特定手段１００は、放送映像データ２００の全編にＬ字等がある場合、任意の１フレームの映像を解析すれば、不要領域を特定可能である。 Here, the unnecessary area specifying means 100 does not need to analyze every frame of the broadcast video data 200, and may thin out frames for analysis. This thinning interval can be set depending on the characteristics of the analysis model, etc.
To be more specific, a characteristic of the L-shape etc. is that it is basically displayed continuously for a long period of time throughout the broadcast. That is, when the entire broadcast video data 200 includes an L-shape or the like, the unnecessary area identifying means 100 can identify the unnecessary area by analyzing one arbitrary frame of video.

しかしながら、Ｌ字等が全編ではない場合もある。さらに、ＣＭを放映している最中はＬ字等の表示を解除している可能性もある。このため、不要領域特定手段１００は、５秒程度あたり１フレーム毎の解析によって、不要領域を特定することも可能である。この場合、不要領域特定手段１００は、Ｌ字等の有無が放送映像データ２００中で変化する場合、不要領域があった箇所の前後の各フレームを解析していき、変化点を算出することが可能である。さらに、不要領域特定手段１００は、変化点においてＬ字等の大きさが変動する場合、Ｌ字等の領域範囲の特定を、各フレームに対して実行することが可能である。 However, the L-shape or the like may not be displayed throughout the entire program. Furthermore, the display of the L-shape or the like may be canceled while a commercial is being aired. For this reason, the unnecessary area specifying means 100 can also specify unnecessary areas by analyzing one frame about every 5 seconds. In this case, if the presence or absence of an L-shape or the like changes within the broadcast video data 200, the unnecessary area specifying means 100 can calculate the change point by analyzing each frame before and after the location where the unnecessary area was found. Further, if the size of the L-shape or the like varies at a change point, the unnecessary area specifying means 100 can specify the area range of the L-shape or the like for each frame.
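The sparse sampling and change-point search described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `find_change_points`, the `has_overlay` callback (standing in for the learned model or image component analysis), and the use of bisection between two sampled frames are all assumptions made for the example. It also assumes at most one on/off change between neighbouring samples.

```python
from typing import Callable, List, Tuple


def find_change_points(
    num_frames: int,
    has_overlay: Callable[[int], bool],
    fps: int = 30,
    interval_sec: float = 5.0,
) -> List[Tuple[int, bool]]:
    """Return (frame_index, new_state) for each detected overlay on/off change.

    Frames are sampled once per `interval_sec`. When two neighbouring samples
    disagree, the frames between them are bisected to locate the first frame
    that shows the new state.
    """
    if num_frames <= 0:
        return []
    step = max(1, int(fps * interval_sec))
    samples = list(range(0, num_frames, step))
    if samples[-1] != num_frames - 1:
        samples.append(num_frames - 1)  # always look at the last frame too

    changes: List[Tuple[int, bool]] = []
    prev_idx = samples[0]
    prev_state = has_overlay(prev_idx)
    for idx in samples[1:]:
        state = has_overlay(idx)
        if state != prev_state:
            lo, hi = prev_idx, idx  # lo still has the old state, hi the new one
            while hi - lo > 1:
                mid = (lo + hi) // 2
                if has_overlay(mid) == prev_state:
                    lo = mid
                else:
                    hi = mid
            changes.append((hi, state))
        prev_idx, prev_state = idx, state
    return changes
```

With a 60-second clip at 30 fps, only a few dozen frames are inspected instead of all 1800, which is the point of thinning the frames.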

さらに、不要領域特定手段１００は、放送映像データ２００のメタデータやＯＣＲによる解析を行って、Ｌ字等に含まれる文字列の文脈（コンテキスト）を解析し、含まれる情報の内容により、削除するべき内容なのか、映像コンテンツに元から存在した情報なのかを判別することも可能である。
不要領域特定手段１００は、これらの不要領域と特定された箇所について、放送映像データ２００のメタデータに格納することが可能である。 Further, the unnecessary area specifying means 100 can perform analysis using the metadata of the broadcast video data 200 and OCR to analyze the context of character strings included in the L-shape or the like, and can determine from the content of that information whether it should be deleted or whether it is information that originally existed in the video content.
The unnecessary area specifying means 100 can store the locations identified as unnecessary areas in the metadata of the broadcast video data 200.

図３（ａ）では、Ｌ字の領域である不要領域Ａ１と、時刻表示の領域である不要領域Ａ２と、地図の領域である不要領域Ａ３とが特定された例を示している。 FIG. 3A shows an example in which an unnecessary area A1 that is an L-shaped area, an unnecessary area A2 that is a time display area, and an unnecessary area A3 that is a map area are identified.

次に、ステップＳ１０２において、不要領域加工手段１１０が、不要領域があったか否かを判断する。不要領域加工手段１１０は、例えば、不要領域特定手段１００により特定された不要領域が放送映像データ２００のメタデータに設定されていた場合、Ｙｅｓと判断する。
Ｙｅｓの場合、不要領域加工手段１１０は、処理をステップＳ１０３へ進める。
Ｎｏの場合、不要領域加工手段１１０は、再送出処理における高画質化処理を終了する。 Next, in step S102, the unnecessary area processing means 110 determines whether there is an unnecessary area. For example, if the unnecessary area specified by the unnecessary area specifying means 100 is set in the metadata of the broadcast video data 200, the unnecessary area processing means 110 determines Yes.
If Yes, the unnecessary area processing means 110 advances the process to step S103.
In the case of No, the unnecessary area processing means 110 ends the image quality improvement process in the retransmission process.

不要領域があった場合、ステップＳ１０３において、不要領域加工手段１１０が、不要領域加工処理を行う。
不要領域加工手段１１０は、不要領域を放送映像データ２００から削除又は目立たなくする加工を行った加工映像データ２１０を作成する。この不要領域を削除又は目立たなくする加工として、不要領域加工手段１１０は、例えば、特定されたＬ字の不要領域については、直接、画面表示されないような編集を行い、加工映像データ２１０を作成する。具体的には、不要領域加工手段１１０は、例えば、Ｌ字を自動でトリミングし、Ｌ字以外の領域を拡大し、全画面表示となるような編集を行う。これによって、加工映像データ２１０から、Ｌ字の表示を削除することが可能である。 If there is an unnecessary area, the unnecessary area processing means 110 performs unnecessary area processing processing in step S103.
The unnecessary area processing means 110 creates processed video data 210 in which unnecessary areas are deleted from the broadcast video data 200 or processed to make them less noticeable. As such processing, the unnecessary area processing means 110, for example, edits the identified L-shaped unnecessary area so that it is not directly displayed on the screen, and creates the processed video data 210. Specifically, the unnecessary area processing means 110 performs editing such as automatically trimming the L-shape and enlarging the area other than the L-shape so that it is displayed full screen. This makes it possible to delete the L-shaped display from the processed video data 210.
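As a rough sketch of the trimming and enlargement step, the following assumes the L-shape occupies a band on the left and a band on the top of the frame, as in the L-shaped screen described earlier. The function name `trim_and_enlarge`, the frame representation (a list of pixel rows), and the nearest-neighbour scaling are illustrative assumptions; the source does not specify a scaling method.

```python
from typing import List

Frame = List[List[int]]  # one value per pixel; rows from top to bottom


def trim_and_enlarge(frame: Frame, left_w: int, top_h: int) -> Frame:
    """Crop away an L-shaped band (left and top) and scale back to full size.

    `left_w` and `top_h` are the widths of the overlay bands in pixels.
    Nearest-neighbour sampling is used only to keep the example short.
    """
    height, width = len(frame), len(frame[0])
    crop = [row[left_w:] for row in frame[top_h:]]
    crop_h, crop_w = len(crop), len(crop[0])
    return [
        [crop[y * crop_h // height][x * crop_w // width] for x in range(width)]
        for y in range(height)
    ]
```

The output has the same dimensions as the input, so the program area fills the whole screen again, at the cost of the resolution loss that the later image quality improvement step is meant to address.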

一方、不要領域加工手段１１０は、例えば、特定された時刻や字幕等の不要領域に対しては、自動的にマスク処理を実行する。この場合、不要領域加工手段１１０は、例えば、不要領域にガウスブラー等のボカシ処理をするような編集を行う。これにより、加工映像データ２１０において、時刻や字幕や地図等の表示が、目立たなくなるか、視認できないようになる。
すなわち、不要領域加工手段１１０は。自動的にマスク処理を行った加工映像データ２１０を作成可能である。 On the other hand, the unnecessary area processing means 110 automatically performs mask processing on unnecessary areas such as the specified time and subtitles, for example. In this case, the unnecessary region processing means 110 performs editing such as blurring processing such as Gaussian blur on the unnecessary region. As a result, in the processed video data 210, displays such as the time, subtitles, and maps become inconspicuous or invisible.
That is, the unnecessary area processing means 110 can create processed video data 210 on which mask processing has been performed automatically.
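The mask processing can be illustrated with a small Gaussian blur restricted to one rectangle. This is a sketch under assumptions: the names `gaussian_kernel` and `blur_region`, the box convention `(top, left, bottom, right)` with exclusive bottom and right, and the default radius and sigma are all chosen for the example and do not come from the source.

```python
import math
from typing import List, Tuple

Frame = List[List[float]]


def gaussian_kernel(radius: int, sigma: float) -> List[float]:
    """Normalised 1-D Gaussian weights for offsets -radius..radius."""
    weights = [math.exp(-(d * d) / (2.0 * sigma * sigma))
               for d in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_region(frame: Frame, box: Tuple[int, int, int, int],
                radius: int = 2, sigma: float = 1.0) -> Frame:
    """Return a copy of `frame` with only the pixels inside `box` blurred."""
    top, left, bottom, right = box
    height, width = len(frame), len(frame[0])
    kernel = gaussian_kernel(radius, sigma)
    offsets = range(-radius, radius + 1)

    def clamp(v: int, size: int) -> int:
        return min(max(v, 0), size - 1)  # repeat edge pixels at the border

    # Horizontal pass over the box columns (all rows, so the vertical pass
    # can read rows just outside the box).
    horiz = [row[:] for row in frame]
    for y in range(height):
        for x in range(left, right):
            horiz[y][x] = sum(k * frame[y][clamp(x + d, width)]
                              for d, k in zip(offsets, kernel))
    # Vertical pass, written only inside the box.
    out = [row[:] for row in frame]
    for y in range(top, bottom):
        for x in range(left, right):
            out[y][x] = sum(k * horiz[clamp(y + d, height)][x]
                            for d, k in zip(offsets, kernel))
    return out
```

Pixels outside the box are left untouched, which is also why a visible boundary with the surrounding video can appear, as the description of the conventional approach points out.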

図３（ｂ）は、不要領域を削除又は目立たなくした加工映像データ２１０の例を示す。 FIG. 3(b) shows an example of processed video data 210 in which unnecessary areas have been deleted or made less noticeable.

次に、ステップＳ１０４において、元映像特定手段１２０が、元映像特定処理を行う。
ビデオサーバーシステムを用いて放送された番組は、ビデオサーバーシステム内に格納されている映像を用いている可能性が十分に考えられる。このため、ビデオサーバーシステム内に保管されている映像を検索し、特定する。
具体的には、元映像特定手段１２０は、不要領域加工手段により加工された加工映像データ２１０及び／又は放送映像データ２００から、蓄積サーバー２に格納された元映像データ２２０を特定する。より具体的には、元映像特定手段１２０は、加工映像データ２１０に含まれる映像と、元映像データ２２０に含まれる元映像との画像中の共通点を抽出することが可能である。 Next, in step S104, the original video identifying means 120 performs original video identifying processing.
It is highly possible that programs broadcast using a video server system use video stored within the video server system. For this purpose, the video stored in the video server system is searched and identified.
Specifically, the original video identifying means 120 identifies the original video data 220 stored in the storage server 2 from the processed video data 210 and/or the broadcast video data 200 processed by the unnecessary area processing means. More specifically, the original video identifying means 120 is capable of extracting common points between the video included in the processed video data 210 and the original video included in the original video data 220.

ここで、元映像特定手段１２０は、例えば、加工映像データ２１０と元素材映像データとの映像中の特徴データを、成分分析して、メタデータ等として、それぞれに格納する。元映像特定手段１２０は、この加工映像データ２１０及び元素材映像データの特徴データを、時系列に沿って比較することで、共通点として抽出可能である。この映像中の特徴データは、例えば、文字情報、画面の色情報、描画されたオブジェクトの情報、サムネイル画像の情報等を設定可能である。または、元映像特定手段１２０は、加工映像データ２１０と元素材映像データとを、直接、ＡＩに学習させ、抽出した共通点に基づいて照合するといった処理を行うことも可能である。 Here, the original video specifying means 120, for example, performs component analysis on characteristic data in the processed video data 210 and the original material video data, and stores the analyzed data as metadata or the like in each of them. The original video identifying means 120 can extract common points by comparing the feature data of the processed video data 210 and the original material video data in chronological order. The feature data in the video can include, for example, text information, screen color information, drawn object information, thumbnail image information, and the like. Alternatively, the original video specifying means 120 can also directly cause the AI to learn the processed video data 210 and the original material video data, and perform a process of comparing them based on the extracted common points.

すなわち、元映像特定手段１２０は、加工映像データ２１０及び／又は放送映像データ２００において、抽出した共通点に基づいて元映像データ２２０に含まれる元映像を特定する。
この検索により、放送に用いられた元映像の特定が可能となる。 That is, the original video identifying means 120 identifies the original video included in the original video data 220 based on the extracted common points in the processed video data 210 and/or the broadcast video data 200.
This search makes it possible to identify the original video used for broadcasting.
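One way to picture the time-series comparison of feature data is the sketch below. It assumes each video has already been reduced to one small numeric feature vector per sampled frame (for example, mean colour values); the function names, the mean-absolute-difference measure, and the `max_distance` threshold are hypothetical choices for illustration, not the method specified in the source.

```python
from typing import Dict, Optional, Sequence

Feature = Sequence[float]  # e.g. mean (R, G, B) of one sampled frame


def sequence_distance(a: Sequence[Feature], b: Sequence[Feature]) -> float:
    """Mean absolute difference between two feature sequences, frame by frame."""
    n = min(len(a), len(b))
    if n == 0:
        return float("inf")
    total = 0.0
    for fa, fb in zip(a[:n], b[:n]):
        total += sum(abs(x - y) for x, y in zip(fa, fb)) / len(fa)
    return total / n


def identify_original(
    processed: Sequence[Feature],
    candidates: Dict[str, Sequence[Feature]],
    max_distance: float = 10.0,
) -> Optional[str]:
    """Return the id of the closest stored material, or None if nothing is close."""
    best_id: Optional[str] = None
    best = max_distance
    for material_id, features in candidates.items():
        d = sequence_distance(processed, features)
        if d <= best:
            best_id, best = material_id, d
    return best_id
```

Returning `None` when no candidate is close enough corresponds to the "No" branch of step S105, where the image quality improvement process ends without an identified original.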

図３（ｃ）は、加工映像データ２１０及び放送映像データ２００に対応して特定された元映像データ２２０の例を示す。 FIG. 3C shows an example of original video data 220 specified in correspondence to processed video data 210 and broadcast video data 200.

次に、ステップＳ１０５において、高画質化手段１３０が、元映像データ２２０が特定できたか否かを判断する。高画質化手段１３０は、元映像特定手段１２０により放送映像データ２００から元映像データ２２０が特定できた場合、Ｙｅｓと判定する。
Ｙｅｓの場合、高画質化手段１３０は、処理をステップＳ１０６へ進める。
Ｎｏの場合、高画質化手段１３０は、再送出処理における高画質化処理を終了する。 Next, in step S105, the image quality improvement unit 130 determines whether or not the original video data 220 has been identified. The image quality improving means 130 determines Yes when the original video identifying means 120 is able to identify the original video data 220 from the broadcast video data 200.
If Yes, the image quality improvement unit 130 advances the process to step S106.
In the case of No, the image quality improvement unit 130 ends the image quality improvement process in the retransmission process.

元映像データ２２０が特定できた場合、ステップＳ１０６において、高画質化手段１３０が、高画質処理を行う。
高音質化処理手段１６０は、元映像特定手段１２０により特定された元映像データ２２０を基に、加工映像を高画質化する。
高画質化手段１３０は、加工映像データ２１０について、超解像処理や高画質化処理を行う。具体的には、高画質化手段１３０は、元映像データ２２０に含まれる元映像の切り出しによる合成を行うことで高画質化することが可能である。 If the original video data 220 can be identified, the image quality improving means 130 performs high image quality processing in step S106.
The image quality improving means 130 improves the image quality of the processed video based on the original video data 220 specified by the original video specifying means 120.
The image quality improvement unit 130 performs super resolution processing and image quality improvement processing on the processed video data 210. Specifically, the image quality improvement unit 130 can improve the image quality by cutting out and combining original videos included in the original video data 220.

図３（ｃ）及び図３（ｅ）は、この元映像データ２２０から元映像の一部又は全画面を切り出して、加工映像データ２１０に上書き等で合成した例を示す。 FIGS. 3(c) and 3(e) show examples in which a part or the entire screen of the original video is cut out from the original video data 220 and combined with the processed video data 210 by overwriting or the like.
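The cut-out-and-overwrite composition can be sketched as below. The sketch assumes the processed frame and the original frame have the same resolution and are already aligned in time and position, which the source does not guarantee; `overwrite_region` and the `(top, left, bottom, right)` box convention are hypothetical names used only for this example.

```python
from typing import List, Tuple

Frame = List[List[int]]


def overwrite_region(processed: Frame, original: Frame,
                     box: Tuple[int, int, int, int]) -> Frame:
    """Copy the pixels inside `box` from `original` over `processed`.

    `box` is (top, left, bottom, right) with bottom and right exclusive.
    Passing the full frame as the box replaces the whole picture.
    """
    if len(processed) != len(original) or len(processed[0]) != len(original[0]):
        raise ValueError("frames must have the same size in this sketch")
    top, left, bottom, right = box
    out = [row[:] for row in processed]
    for y in range(top, bottom):
        out[y][left:right] = original[y][left:right]
    return out
```

For a blurred time display, the box would be the masked rectangle, so the pixel information lost to the overlay is taken from the stored material instead of being predicted.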

さらに、高画質化手段１３０は、加工映像データ２１０について、元映像データ２２０に含まれる元映像に基づくエッジ情報及び／又は色情報を利用したエッジ強調又は合成を行うことも可能である。 Further, the image quality improvement unit 130 can perform edge enhancement or synthesis on the processed video data 210 using edge information and/or color information based on the original video included in the original video data 220.

図３（ｄ）及び図３（ｅ）は、この元映像データ２２０からエッジ情報や色情報を抽出し、加工映像データ２１０に合成した例を示す。 FIGS. 3(d) and 3(e) show examples in which edge information and color information are extracted from this original video data 220 and combined with processed video data 210.

さらに加えて、高画質化手段１３０は、ＧＡＮ等のＡＩにより加工映像データ２１０を高画質化することも可能である。 In addition, the image quality improving means 130 can also improve the image quality of the processed video data 210 using AI such as GAN.

その後、この高画質化された加工映像データ２１０は、送出設備により再送出される。この際、下記の高音質化処理が行われた音声データ３００を、ＭＸＦ形式等のコンテナフォーマットのファイルとして再送出してもよい。なお、不要領域が特定されず、加工映像データ２１０が生成されなかった場合、放送映像データ２００をそのまま再送出することも可能である。
以上により、再送出処理における高画質化処理を終了する。 Thereafter, this high-quality processed video data 210 is retransmitted by the transmission equipment. At this time, the audio data 300 that has been subjected to the high-quality sound processing described below may be retransmitted as a file in a container format such as MXF format. Note that if an unnecessary area is not identified and the processed video data 210 is not generated, it is also possible to retransmit the broadcast video data 200 as is.
With the above steps, the image quality improvement process in the retransmission process is completed.

次に、再送出処理における高音質化処理について、図２（ｂ）のフローチャートと、図４とを用いて、ステップ毎に詳しく説明する。 Next, the high-quality sound processing in the retransmission processing will be explained step by step in detail using the flowchart of FIG. 2(b) and FIG. 4.

まず、ステップＳ１１０において、削除箇所特定手段１４０が、初期処理を行う。
削除箇所特定手段１４０は、上述の映像の高画質化処理と同様に、特定のモデルとして、例えば、音声データ３００から検索するモデルを設定する。ここで、上述のように、ビデオサーバーシステムは、単一の放送局が保有、運用することが一般的であることから、重畳される音声は、ある程度、特定のパターンに限られることを利用することが可能である。これは、例えば、特定のメロディ、音声パターン、音声の周波数変化等の特徴を、削除する対象と認識させることを示す。
本実施形態では、警報音についてのモデルを設定する例について説明する。このモデルは、例えば、ＨＭＭ等の統計モデル、ＲＮＮやＬＳＴＭ等の時系列モデルを用いたＡＩにより学習、設定されてもよい。 First, in step S110, the deletion location specifying means 140 performs initial processing.
Similarly to the above-described image quality improvement process for video, the deletion location specifying means 140 sets a specific model, for example, a model to be searched for in the audio data 300. Here, as mentioned above, since a video server system is generally owned and operated by a single broadcasting station, it is possible to take advantage of the fact that the superimposed audio is limited, to some extent, to specific patterns. This means, for example, that characteristics such as a specific melody, audio pattern, or audio frequency change are recognized as targets for deletion.
In this embodiment, an example of setting a model for an alarm sound will be described. This model may be learned and set by AI using, for example, a statistical model such as HMM, or a time series model such as RNN or LSTM.

次に、ステップＳ１１１において、削除箇所特定手段１４０が、削除箇所特定処理を行う。
削除箇所特定手段１４０は、放送映像データ２００に対応した音声データ３００を解析して、削除箇所を特定する。削除箇所特定手段１４０は、例えば、放送映像データ２００のコンテナフォーマットの映像ストリームに対応づけられた音声データ３００を蓄積サーバー２から取得して、解析する。 Next, in step S111, the deletion location identifying means 140 performs deletion location identification processing.
Deletion location specifying means 140 analyzes audio data 300 corresponding to broadcast video data 200 and identifies a deletion location. For example, the deletion location specifying means 140 acquires the audio data 300 associated with the container format video stream of the broadcast video data 200 from the storage server 2 and analyzes it.

図４（ａ）によれば、削除箇所特定手段１４０は、特定のモデルを用いて音声データ３００の解析を行い、音声中の警報音の箇所を特定する。音声データ３００の解析方法としては、機械的に音声の成分分析を行っても、ＡＩを用いてもよい。加えて、警報音の箇所は、単に警報音のみが音声データ３００に録音されているのではなく、他の音声に警報音が重畳された箇所であってもよい。この際、削除箇所特定手段１４０は、例えば、音声データ３００を数ミリ秒～数百ミリ秒程度のウィンドウに分けてＦＦＴ（Fast Fourier Transform）を行い、警報音のパターンの位置を検索する。具体的には、削除箇所特定手段１４０は、例えば、ＨＭＭ等の統計モデル、ＲＮＮやＬＳＴＭ等のＡＩ等により、音声中の警報音の箇所を特定することが可能である。この警報音の特定も、音声データ全編に対して行っても、特定間隔で行っても、元映像データ２２０のＬ字等と対応する箇所のみに絞って行ってもよい。 According to FIG. 4A, the deletion location specifying means 140 analyzes the audio data 300 using a specific model and identifies the location of the alarm sound in the audio. As a method for analyzing the audio data 300, a mechanical audio component analysis may be performed or AI may be used. In addition, the location of the alarm sound is not simply a location where only the alarm sound is recorded in the audio data 300, but may be a location where the alarm sound is superimposed on other sounds. At this time, the deletion point specifying means 140, for example, divides the audio data 300 into windows of several milliseconds to several hundred milliseconds and performs FFT (Fast Fourier Transform) to search for the position of the alarm sound pattern. Specifically, the deletion location specifying means 140 can specify the location of the alarm sound in the audio using, for example, a statistical model such as HMM, AI such as RNN or LSTM, or the like. This alarm sound may be specified for the entire audio data, at specific intervals, or limited to a portion of the original video data 220 that corresponds to an L-shape or the like.
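The windowed frequency analysis can be illustrated with a simplified stand-in. Instead of a full FFT and a learned model, the sketch below evaluates a single DFT bin at one assumed alarm frequency for each window and thresholds its power. The function names, the 50 ms window, the threshold, and the idea that the alarm is a single steady tone are all assumptions made for illustration.

```python
import cmath
import math
from typing import List, Sequence, Tuple


def tone_power(window: Sequence[float], sample_rate: int, freq: float) -> float:
    """Power of one DFT bin at `freq`; a full-scale sine of amplitude A gives about A*A/4."""
    n = len(window)
    acc = sum(s * cmath.exp(-2j * math.pi * freq * i / sample_rate)
              for i, s in enumerate(window))
    return abs(acc) ** 2 / (n * n)


def find_alarm_ranges(samples: Sequence[float], sample_rate: int, alarm_freq: float,
                      window_ms: int = 50, threshold: float = 0.01) -> List[Tuple[int, int]]:
    """Return merged (start, end) sample ranges whose windows contain the alarm tone."""
    size = int(sample_rate * window_ms / 1000)
    ranges: List[Tuple[int, int]] = []
    for start in range(0, len(samples) - size + 1, size):
        window = samples[start:start + size]
        if tone_power(window, sample_rate, alarm_freq) >= threshold:
            if ranges and ranges[-1][1] == start:
                ranges[-1] = (ranges[-1][0], start + size)  # extend previous hit
            else:
                ranges.append((start, start + size))
    return ranges
```

Because only the target bin is examined, the tone is found even when it is superimposed on other program audio, which matches the case described above where the alarm is mixed with other sounds.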

次に、ステップＳ１１２において、元音声特定手段１５０が、削除箇所があったか否かを判断する。
Ｙｅｓの場合、は、処理をステップＳ１１３へ進める。
Ｎｏの場合、は、再送出処理の高音質化処理を終了する。 Next, in step S112, the original audio specifying means 150 determines whether there is a deleted portion.
If Yes, the process advances to step S113.
In the case of No, the high-quality sound processing of the retransmission processing ends.

削除箇所があった場合、ステップＳ１１３において、元音声特定手段１５０が、元音声特定処理を行う。
映像と同様、ビデオサーバーシステムを用いて放送された番組は、ビデオサーバーシステム内に保管されている音声を用いている可能性が十分に考えられる。このため、音声解析時に、ビデオサーバーシステム内に格納されている元音声データ３２０の検索を行うことが可能である。 If there is a deleted portion, the original audio specifying means 150 performs original audio specifying processing in step S113.
As with video, programs broadcast using a video server system are highly likely to use audio stored within the video server system. Therefore, during audio analysis, it is possible to search the original audio data 320 stored within the video server system.

図４（ｂ）によれば、元音声特定手段１５０は、例えば、元映像データ２２０に対応する元音声データ３２０を特定する。この検索により、放送に用いられた元音声の特定が可能である。 According to FIG. 4(b), the original audio identifying means 150 identifies, for example, original audio data 320 corresponding to the original video data 220. Through this search, it is possible to specify the original audio used in the broadcast.

次に、ステップＳ１１４において、高音質化処理手段１６０が、元音声データ３２０を特定できたか否かを判断する。
Ｙｅｓの場合、高音質化処理手段１６０は、処理をステップＳ１１５へ進める。
Ｎｏの場合、高音質化処理手段１６０は、処理をステップＳ１１６へ進める。 Next, in step S114, the high-quality sound processing means 160 determines whether or not the original audio data 320 has been identified.
If Yes, the high-quality sound processing means 160 advances the process to step S115.
In the case of No, the high-quality sound processing means 160 advances the process to step S116.

警報音の重畳が検出され、元音声データ３２０が特定できた場合、ステップＳ１１５において、高音質化処理手段１６０が、コピー高音質処理を行う。
高音質化処理手段１６０は、元音声特定手段１５０により特定された元音声データ３２０を基に、削除箇所特定手段１４０により特定された音声の削除箇所を高音質化する。高音質化処理手段１６０は、例えば、音声データ３００の警報音が含まれる範囲を元音声データ３２０の当該範囲で置き換える。 If the superimposition of the alarm sound is detected and the original audio data 320 can be identified, the high-quality sound processing unit 160 performs copy high-quality processing in step S115.
Based on the original audio data 320 specified by the original audio specifying means 150, the high-quality sound processing means 160 enhances the sound quality of the deletion portion of the audio specified by the deletion portion specifying means 140. For example, the high-quality sound processing unit 160 replaces the range in which the alarm sound is included in the audio data 300 with the corresponding range in the original audio data 320.

図４（ｂ）及び図４（ｃ）によれば、高音質化処理手段１６０は、音声データ３００の音声の削除を指定し、削除箇所を対応する元音声データ３２０の箇所で置換して、警報音を消去するような編集内容を設定し、実行する。この処理は、制御部１０に含まれるＤＳＰ等の専用プロセッサーで実行することも可能である。さらに、この際、高音質化処理手段１６０は、コンプレッサー等のエフェクトにより、音声の出力レベルを調整してもよい。
その後、高音質化処理手段１６０は、再送出処理の高音質化処理を終了する。 According to FIGS. 4(b) and 4(c), the high-quality sound processing means 160 sets and executes editing that designates the alarm sound portion of the audio data 300 for deletion and replaces the deleted portion with the corresponding portion of the original audio data 320, thereby erasing the alarm sound. This processing can also be executed by a dedicated processor such as a DSP included in the control unit 10. Furthermore, at this time, the high-quality sound processing means 160 may adjust the output level of the audio using an effect such as a compressor.
Thereafter, the high-quality sound processing unit 160 ends the high-quality sound processing of the retransmission process.
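The replacement in step S115 amounts to splicing a range of the original audio into the broadcast audio. The following is a minimal sketch, assuming both tracks are lists of samples at the same sample rate; `replace_with_original`, the `offset` parameter (for a known time shift between the two tracks), and the plain `gain` factor standing in for a compressor-style level adjustment are hypothetical.

```python
from typing import List, Sequence


def replace_with_original(broadcast: Sequence[float], original: Sequence[float],
                          start: int, end: int,
                          offset: int = 0, gain: float = 1.0) -> List[float]:
    """Replace broadcast[start:end] with the matching range of the original audio.

    `offset` is added to the indices when reading from `original`;
    `gain` scales the inserted samples to match the surrounding level.
    """
    if not 0 <= start <= end <= len(broadcast):
        raise ValueError("range lies outside the broadcast audio")
    src_start, src_end = start + offset, end + offset
    if src_start < 0 or src_end > len(original):
        raise ValueError("range lies outside the original audio")
    patch = [s * gain for s in original[src_start:src_end]]
    return list(broadcast[:start]) + patch + list(broadcast[end:])
```

The output keeps the same length as the broadcast audio, so it stays aligned with the processed video data 210.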

警報音の重畳が検出されたものの、元音声データ３２０が特定できなかった場合、ステップＳ１１６において、高音質化処理手段１６０が、反転高音質処理を行う。 If the superimposition of the alarm sound is detected but the original audio data 320 cannot be identified, the high-quality sound processing unit 160 performs inverted high-quality sound processing in step S116.

図４（ｄ）によると、高音質化処理手段１６０は、警報音を位相反転した逆位相の波形データを、適切な出力レベルで音声データ３００と合成して、警報音を削除する。または、高音質化処理手段１６０は、警報音の周波数成分を削除する等の特殊なフィルター処理により、警報音を削除することも可能である。または、高音質化処理手段１６０は、警報音を消すように学習させたＡＩを利用して、警報音を削除することも可能である。さらに、削除後、高音質化処理手段１６０は、音声の出力レベルを調整してもよい。 According to FIG. 4(d), the high-quality sound processing unit 160 deletes the alarm sound by synthesizing the waveform data of the opposite phase obtained by inverting the phase of the alarm sound with the audio data 300 at an appropriate output level. Alternatively, the high-quality sound processing means 160 can also delete the alarm sound by performing special filter processing such as removing frequency components of the alarm sound. Alternatively, the high-quality sound processing unit 160 can also delete the alarm sound by using AI that has been trained to mute the alarm sound. Furthermore, after deletion, the high-quality sound processing means 160 may adjust the output level of the audio.
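The phase-inversion approach of step S116 can be sketched as adding a sign-flipped copy of the alarm waveform at the detected position. The sketch assumes a clean template of the alarm sound is available and that its position and level in the mix are known; `cancel_alarm`, `alarm_template`, and `gain` are hypothetical names for this illustration.

```python
from typing import List, Sequence


def cancel_alarm(audio: Sequence[float], alarm_template: Sequence[float],
                 start: int, gain: float = 1.0) -> List[float]:
    """Mix an inverted copy of the alarm template into `audio` at `start`.

    `gain` should match the level at which the alarm was superimposed;
    samples of the template that fall outside `audio` are ignored.
    """
    out = list(audio)
    for i, a in enumerate(alarm_template):
        pos = start + i
        if 0 <= pos < len(out):
            out[pos] += -gain * a  # opposite phase cancels the alarm component
    return out
```

Cancellation is only as good as the match between the template and the alarm actually mixed into the broadcast, so any mismatch in timing or level leaves a residue; that is one reason the source also mentions filter-based and AI-based removal as alternatives.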

これらの処理が終了した後、加工された音声データ３００は、加工映像データ２１０に対応づけられて、送出設備により再送出される。ここで、削除箇所がなかった場合、加工されない状態の音声データ３００が再送出される。なお、放送映像データ２００に、加工された又は加工されていない音声データ３００が対応づけられて再送出されてもよい。
以上により、再送出処理の高音質化処理を終了する。 After these processes are completed, the processed audio data 300 is associated with the processed video data 210 and retransmitted by the transmission equipment. Here, if there is no deleted portion, the unprocessed audio data 300 is retransmitted. Note that the broadcast video data 200 may be retransmitted in association with the processed or unprocessed audio data 300.
With the above steps, the high quality sound processing of the retransmission processing is completed.

以上のように構成することで、以下のような効果を得ることができる。
図５によると、従来、放送同録の放送映像を元に、再放送や再配信等で再送出を行う場合、Ｌ字等の不要な要素を削除するような映像加工を行っていた。このような映像の削除加工は、編集作業が都度手動で行われており、運用者の業務負荷が発生するうえ、再配信の迅速性にも欠ける。また、編集は手動であるため、Ｌ字部分を削除する範囲の設定不備により、必要以上の領域を削除した場合は不自然な画角となったり、その逆に削除範囲が狭かった場合はＬ字部分の背景色がハミ出し残存したりして、放送に適さない映像となる可能性があった。加えて、Ｌ字により縮小した領域には、再エンコードによる圧縮ノイズ等が発生することがあった。さらに、このＬ字により縮小した領域を再度拡大すると、映像の解像感が元の放送映像と比較すると、損なわれる（ボケが生じる）ことがあった。一方、時刻等をマスク（ボカシ）加工した領域は、周囲の映像との境界が生じ、極めて不自然な映像となっていた。そもそも、放送同録映像は放送映像を保存するために再圧縮したものが多いと想定されることから、本来と比較すると画質が劣っていた。
これらにより、映像上の違和感が生じて、放送に相応しくない映像となる可能性があった。 By configuring as described above, the following effects can be obtained.
According to FIG. 5, conventionally, when retransmitting by rebroadcasting, redistribution, or the like based on a broadcast simultaneous recorded video, video processing was performed to delete unnecessary elements such as the L-shape. This deletion processing is edited manually each time, which places a workload on the operator and also makes redistribution slow. In addition, since the editing is manual, a mistake in setting the range of the L-shaped part to be deleted could result in a video unsuitable for broadcast: if more area than necessary is deleted, the angle of view becomes unnatural, and conversely, if the deletion range is too narrow, the background color of the L-shaped part protrudes and remains. In addition, compression noise and the like due to re-encoding could occur in the area reduced by the L-shape. Furthermore, when the area reduced by the L-shape is enlarged again, the sense of resolution of the video could be impaired (blurring could occur) compared with the original broadcast video. On the other hand, areas where the time and the like have been masked (blurred) produce a boundary with the surrounding video, resulting in an extremely unnatural video. In the first place, since many broadcast simultaneous recorded videos are assumed to be recompressed in order to store the broadcast video, their image quality was inferior to that of the original.
These may create a sense of discomfort in the video, resulting in a video that is not suitable for broadcasting.

これに対して、本発明の実施の形態に係る編集システムＸは、放送映像データ２００に含まれる放送映像を再送出する編集システムであって、放送映像データ２００に含まれる放送映像の特定箇所に連続して表示される不要領域を特定する不要領域特定手段１００と、不要領域特定手段１００により特定された不要領域を放送映像データ２００から削除又は目立たなくする加工を行った加工映像データ２１０を作成する不要領域加工手段１１０と、不要領域加工手段により加工された加工映像データ２１０及び／又は放送映像データ２００から、格納された元映像データ２２０を特定する元映像特定手段１２０と、元映像特定手段１２０により特定された元映像データ２２０を基に、加工映像を高画質化する高画質化手段１３０とを備えることを特徴とする。 In contrast, the editing system X according to the embodiment of the present invention is an editing system that retransmits the broadcast video included in the broadcast video data 200, and is characterized by comprising: an unnecessary area specifying means 100 that specifies an unnecessary area displayed continuously at a specific location of the broadcast video included in the broadcast video data 200; an unnecessary area processing means 110 that creates processed video data 210 in which the unnecessary area specified by the unnecessary area specifying means 100 is deleted from the broadcast video data 200 or processed to make it less noticeable; an original video specifying means 120 that specifies stored original video data 220 from the processed video data 210 processed by the unnecessary area processing means and/or the broadcast video data 200; and an image quality improving means 130 that improves the image quality of the processed video based on the original video data 220 specified by the original video specifying means 120.

このように構成し、放送映像を再放送する際に、前回放送した映像の不要部分を削除する。すなわち、映像の内容を解析し、Ｌ時等の不要領域の位置を特定し、特定した不要領域を自動でマスク処理して目立たなく加工する。そして、映像内容を解析して、元映像データ２２０と照合し、放送に使用された元映像を特定する。この上で、蓄積サーバー２に格納された元映像データ２２０に基づいて、元の放送映像に近い映像を復元する。
このように、放送に用いられた元映像データ２２０を特定できた場合、元映像データ２２０を参照することで、従来よりも低負荷で、なおかつ高い精度の高画質化を行うことができる。これにより、放送映像の再送出時の画質劣化を抑えて、画質を改善できる。さらに、放送時の送出映像の同時録画から再送出までのワークフローを、自動編集により省力化することもできる。加えて、自動編集可能な編集システムとして、運用者の業務負荷を減らし、コストも改善できる。 With this configuration, when broadcast video is rebroadcast, unnecessary parts of the previously broadcast video are deleted. That is, the content of the video is analyzed, the position of unnecessary areas such as the L-shape is specified, and the specified unnecessary areas are automatically masked to make them less noticeable. Then, the video content is analyzed and compared with the original video data 220 to identify the original video used for broadcasting. Then, based on the original video data 220 stored in the storage server 2, a video close to the original broadcast video is restored.
In this way, when the original video data 220 used for broadcasting can be identified, by referring to the original video data 220, image quality can be improved with lower load and higher accuracy than in the past. This makes it possible to suppress image quality deterioration during retransmission of broadcast video and improve image quality. Furthermore, automatic editing can save labor in the workflow from simultaneous recording of broadcast video to retransmission. In addition, as an editing system that can automatically edit, it can reduce the workload of the operator and improve costs.
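
The order of the four stages described above can be sketched as a small driver function. This is only an illustrative sketch: the function name, the callable parameters, and the two fallback branches (returning the input unchanged when no unnecessary area is found, and returning the masked video when no original is identified) are assumptions made here, not behavior stated in the text.

```python
from typing import Callable, Optional, TypeVar

Video = TypeVar("Video")
Region = TypeVar("Region")


def reprocess_for_retransmission(
    broadcast: Video,
    identify_region: Callable[[Video], Optional[Region]],
    mask_region: Callable[[Video, Region], Video],
    identify_original: Callable[[Video], Optional[Video]],
    enhance: Callable[[Video, Video, Region], Video],
) -> Video:
    """Run the stages in the order the description gives (hypothetical driver)."""
    region = identify_region(broadcast)          # role of means 100
    if region is None:
        return broadcast                         # assumption: nothing to remove
    processed = mask_region(broadcast, region)   # role of means 110
    original = identify_original(processed)      # role of means 120
    if original is None:
        return processed                         # assumption: keep the masked video
    return enhance(processed, original, region)  # role of means 130
```

Passing the stages as callables keeps the sketch independent of any particular detection or enhancement technique.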

In the editing system X according to the embodiment of the present invention, the unnecessary area identifying means 100 identifies the unnecessary area using a model trained on the features of the areas to be deleted.
This configuration makes it possible to identify unnecessary areas reliably. A video server system operates at each broadcast station, and the processing and insertion formats of the simultaneously recorded broadcast video it handles (the shapes of characters in the L-shape etc., the shape of the time display, the display position of subtitles, fonts, and so on) have a certain regularity. By training a model of the areas to be deleted on L-shapes etc. that exhibit such specific patterns, detecting the specific components contained in the L-shape etc., and deleting the unnecessary areas, the unnecessary areas can be identified with high accuracy. This allows image quality enhancement through automatic editing to be performed reliably.
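
The description relies on a trained model, which is not reproduced here. As a much simpler stand-in, the sketch below uses only the property that the unnecessary area is displayed continuously at a fixed location: it marks pixels whose value barely changes across a run of frames. The function name, the grayscale list-of-rows frame representation, and the `tolerance` parameter are all assumptions for illustration.

```python
from typing import List, Sequence

Frame = Sequence[Sequence[int]]  # grayscale frame as rows of pixel values


def static_region_mask(frames: Sequence[Frame], tolerance: int = 2) -> List[List[bool]]:
    """Mark pixels whose value stays within `tolerance` across all frames.

    Heuristic stand-in for the trained model in the text, not the patent's method.
    """
    if len(frames) < 2:
        raise ValueError("need at least two frames to measure change")
    height, width = len(frames[0]), len(frames[0][0])
    mask = []
    for y in range(height):
        row = []
        for x in range(width):
            values = [frame[y][x] for frame in frames]
            # A pixel that never changes is a candidate overlay pixel.
            row.append(max(values) - min(values) <= tolerance)
        mask.append(row)
    return mask
```

A heuristic like this would also flag genuinely static program content, which is one reason a model trained on station-specific overlay patterns, as the text describes, is likely to be more selective.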

Image quality could also be enhanced by applying super-resolution techniques or the like in order to raise the broadcast quality of simultaneously recorded broadcast video. However, areas that have been blurred in particular are deliberately processed to have extremely low perceived resolution. Moreover, even if the blurred range is matched precisely to the shape of the time display, subtitles, and so on, the pixel information of the original video lost when the time display or subtitles were overwritten is difficult to restore. For such areas, even advanced image prediction using AI or the like has made it extremely difficult to recover the pixel information and perceived resolution that were originally present.

In contrast, in the editing system X according to the embodiment of the present invention, the image quality enhancing means 130 enhances the image quality of the processed video data 210 by performing edge enhancement or compositing that uses edge information and/or color information based on the original video contained in the original video data 220, and/or compositing with portions cut out of the original video contained in the original video data 220.
With this configuration, edge enhancement and compositing that use edge and color information based on the original video data 220, together with cut-outs of the original video contained in the original video data 220, make it possible to reproduce the pixel information and perceived resolution originally present in the broadcast video. In other words, a video close to the original video can be restored. Furthermore, enhancing and compositing edges and colors using the original video data 220 may even yield higher image quality than at the time of broadcast.
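
Of the options listed above, compositing with a cut-out of the original video is the simplest to illustrate. The sketch below assumes the processed frame and the original frame are already the same size and aligned in space and time, which the text does not guarantee (an L-shaped layout shrinks the program picture, so a real system would first have to undo that scaling). The function name and the frame and mask representations are hypothetical.

```python
from typing import List, Sequence

Frame = Sequence[Sequence[int]]   # grayscale frame as rows of pixel values
Mask = Sequence[Sequence[bool]]   # True where the unnecessary area was masked


def composite_from_original(processed: Frame, original: Frame, mask: Mask) -> List[List[int]]:
    """Take masked pixels from the original frame and all others from the processed frame."""
    if not (len(processed) == len(original) == len(mask)):
        raise ValueError("frames and mask must have the same height")
    return [
        [o if m else p for p, o, m in zip(p_row, o_row, m_row)]
        for p_row, o_row, m_row in zip(processed, original, mask)
    ]
```

Because only the masked pixels are taken from the original, the rest of the frame keeps exactly what was broadcast.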

In the editing system X according to the embodiment of the present invention, the original video identifying means 120 extracts common points in the images between the processed video contained in the processed video data 210 and the original video contained in the original video data 220, and identifies the original video contained in the original video data 220 on the basis of the extracted common points.
With this configuration, the common points in the images of the processed video data 210 and the original video data 220 are extracted in advance, a model is trained and matching is performed on the basis of those common points, and the original video data 220 can be identified from the processed video data 210. By analyzing the video content in advance, comparing it with the stored video, and identifying the original video used for the broadcast in this way, the search for the original video data 220 is sped up and the accuracy of image quality restoration is improved.
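
The text does not say how the common points are extracted or matched. As a deliberately simple stand-in, the sketch below compares a processed frame with each stored candidate using the mean absolute pixel difference over the unmasked pixels only, and accepts the closest candidate if it falls under a threshold. The function names, the dictionary of candidates, and the `max_difference` value are assumptions; a feature-based or learned matcher would replace `masked_difference` in practice.

```python
from typing import Mapping, Optional, Sequence

Frame = Sequence[Sequence[int]]
Mask = Sequence[Sequence[bool]]


def masked_difference(a: Frame, b: Frame, mask: Mask) -> float:
    """Mean absolute difference over pixels where the mask is False."""
    total, count = 0, 0
    for a_row, b_row, m_row in zip(a, b, mask):
        for pa, pb, masked in zip(a_row, b_row, m_row):
            if not masked:
                total += abs(pa - pb)
                count += 1
    if count == 0:
        raise ValueError("mask covers the whole frame")
    return total / count


def find_original(
    processed: Frame,
    candidates: Mapping[str, Frame],
    mask: Mask,
    max_difference: float = 8.0,
) -> Optional[str]:
    """Return the name of the closest candidate, or None if none is close enough."""
    best_name, best_score = None, max_difference
    for name, frame in candidates.items():
        score = masked_difference(processed, frame, mask)
        if score <= best_score:
            best_name, best_score = name, score
    return best_name
```

Ignoring the masked pixels matters here, because those are exactly the pixels where the processed video no longer resembles the original.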

Conventionally, anti-phase synthesis, filtering, and similar manual editing operations alone could not remove an alarm sound completely, and components of the alarm sound sometimes remained as noise.
In contrast, the editing system X according to the embodiment of the present invention further comprises: deletion location identifying means 140 for analyzing audio data 300 corresponding to the broadcast video data 200 and identifying a deletion location; original audio identifying means 150 for identifying original audio data 320 corresponding to the original video data 220; and sound quality enhancement processing means 160 for enhancing the sound quality of the deletion location in the audio identified by the deletion location identifying means 140, on the basis of the original audio data 320 identified by the original audio identifying means 150.

With this configuration, the audio content is analyzed and compared with the stored audio data 300 to identify the original audio data 320 corresponding to the original video data 220 used for the broadcast. When the original audio data 320 used for the broadcast can be identified, referring to it allows the alarm sound added at broadcast time to be removed with higher accuracy than usual. This mitigates noise originating from the alarm sound and reliably improves sound quality.
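
The difference between anti-phase synthesis and replacement from the original audio can be illustrated with a few samples. The sketch below is a minimal illustration under assumed conditions: audio is a list of float samples, the alarm estimate is perfectly time-aligned, and the original audio is sample-aligned with the broadcast audio. The function names and the `gain` parameter are hypothetical.

```python
from typing import List, Sequence


def antiphase_cancel(
    mixed: Sequence[float], alarm_estimate: Sequence[float], gain: float = 1.0
) -> List[float]:
    """Add the inverted alarm estimate to the mix (anti-phase synthesis)."""
    return [m - gain * a for m, a in zip(mixed, alarm_estimate)]


def replace_segment(
    mixed: Sequence[float], original: Sequence[float], start: int, end: int
) -> List[float]:
    """Replace samples [start, end) with the corresponding original samples."""
    if len(original) != len(mixed) or not (0 <= start <= end <= len(mixed)):
        raise ValueError("original must be aligned and the range must be valid")
    return list(mixed[:start]) + list(original[start:end]) + list(mixed[end:])
```

If the estimated level of the alarm is off (for example `gain=0.8` when the true level is 1.0), the cancelled signal still contains 20% of the alarm, which corresponds to the residual noise mentioned above. Replacing the segment with the original audio does not depend on such an estimate.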

In the editing system X according to the embodiment of the present invention, the deletion location identifying means 140 performs audio analysis using a specific model and identifies the location of the alarm sound in the audio.
With this configuration, a specific model, which may include AI or the like, is applied to the audio content to compare it with the stored audio and identify the audio of the original video used for the broadcast, which improves the accuracy of alarm sound removal.
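
The "specific model" is not described further. Purely as an illustration, the sketch below assumes the alarm is a steady tone at a known frequency and locates it with the Goertzel algorithm, which measures the strength of a single frequency in each fixed-size window. The function names, window size, and threshold are assumptions; an alarm with a more complex signature would need a different detector, such as the trained model the text mentions.

```python
import math
from typing import List, Sequence, Tuple


def tone_amplitude(window: Sequence[float], freq_hz: float, sample_rate: int) -> float:
    """Estimate the amplitude of one frequency in a window (Goertzel algorithm)."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq_hz / sample_rate)
    s_prev = s_prev2 = 0.0
    for x in window:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    power = s_prev * s_prev + s_prev2 * s_prev2 - coeff * s_prev * s_prev2
    # For a sine that completes whole cycles in the window, this equals its amplitude.
    return 2.0 * math.sqrt(max(power, 0.0)) / len(window)


def find_tone_segments(
    samples: Sequence[float],
    freq_hz: float,
    sample_rate: int,
    window: int = 200,
    threshold: float = 0.1,
) -> List[Tuple[int, int]]:
    """Return merged [start, end) sample ranges where the tone exceeds the threshold."""
    segments: List[Tuple[int, int]] = []
    for start in range(0, len(samples) - window + 1, window):
        if tone_amplitude(samples[start:start + window], freq_hz, sample_rate) >= threshold:
            if segments and segments[-1][1] == start:
                segments[-1] = (segments[-1][0], start + window)  # extend previous run
            else:
                segments.append((start, start + window))
    return segments
```

The returned sample ranges play the role of the deletion locations that the later sound quality enhancement step would operate on.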

The above embodiment described an example in which image quality enhancement processing is performed on broadcast video data 200 already stored in the storage server 2, and sound quality enhancement processing is performed on the audio data 300.
However, the processing can also be performed in real time, either during recording or without recording. The same applies to the sound quality enhancement processing that removes the alarm sound: it can also be performed in real time, during recording or without recording.

The above embodiment described analyzing the broadcast video data 200 and searching for the presence of an unnecessary area before identifying it.
However, the operator may instead specify, by manual operation, that the broadcast video data 200 contains an L-shape etc., along with its position in the video and its display start and end times. This configuration makes it possible to omit the analysis of the broadcast video data 200.
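
An operator-supplied specification of this kind could be represented as a small record holding the position and the display period. The sketch below assumes the L-shaped layout described earlier in the document, where the program picture is reduced toward the lower right and the left and top bands carry the added information. The class name, field names, and the choice of seconds and pixels as units are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class LShapeSpec:
    """Operator-specified L-shaped overlay (hypothetical structure)."""
    left_width: int    # width in pixels of the band on the left side
    top_height: int    # height in pixels of the band along the top
    start_s: float     # display start time in seconds
    end_s: float       # display end time in seconds


def mask_at(spec: LShapeSpec, width: int, height: int, time_s: float) -> List[List[bool]]:
    """Return the unnecessary-area mask for a frame at `time_s` (True = unnecessary)."""
    active = spec.start_s <= time_s < spec.end_s
    return [
        [active and (x < spec.left_width or y < spec.top_height) for x in range(width)]
        for y in range(height)
    ]
```

With such a record, the mask for any frame follows directly from its timestamp, so no video analysis is needed to locate the area.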

The above embodiment described creating the processed video data 210 from the broadcast video data 200 and then applying replacement from the original video data 220, edge and color enhancement, compositing, and so on to the processed video data 210.
However, the broadcast video data 200 can also be processed directly, without creating processed video data 210. Alternatively, mask processing can be skipped and, for example, a copy of the broadcast video data 200 can be created as the processed video data 210. In that case, frames containing an unnecessary area can be replaced directly with frames from the original video data 220, or the display positions of the time display, subtitles, and so on can be replaced with the original video data 220, with no mask processing performed.
This configuration reduces the effort of mask processing and allows image quality to be enhanced quickly.

The above embodiment described an example in which the program data used in the broadcast video data 200 is used as the original video data 220.
However, even when the exact video used for the broadcast is difficult to identify, material data of a similar video, for example one showing a similar location, can be used as the original video data 220. This configuration allows image quality enhancement with higher accuracy than before.
This similar video can also be searched for using the AI described above. Furthermore, AI such as a GAN can be used to process the similar video into one close to the video actually used in the broadcast.

The above embodiment also described performing anti-phase synthesis or filtering of the alarm sound only when the original audio data 320 could not be identified.
However, the processed audio data 310 may first undergo anti-phase synthesis or filtering and then be replaced with the original audio data 320.
With this configuration, processing no longer needs to branch on whether the original audio data 320 could be identified, which improves the efficiency of sound quality enhancement.

In addition, the above embodiment described applying anti-phase synthesis or filtering directly to the audio data 300, or replacing it directly with the original audio data 320.
However, the system can also be configured to leave the audio data 300 unmodified, process the processed audio data 310 instead, and output it together with the processed video data 210.

The above embodiment described an example in which a dedicated analysis device 1 executes the processing of each functional unit on the broadcast video data 200 stored in the storage server 2.
However, the processing of these functional units need not be performed by the analysis device 1 and may be executed by the editing device 4, the storage server 2, or the like.

The above embodiment described an example in which MXF is used as the container format file.
However, container formats other than MXF, such as MKV, can also be used. Depending on system requirements, the recording format of the broadcast video data 200 may also be MP4, AVI, another program stream (PS) format, another transport stream (TS) format, or the like. The broadcast video data 200 may also be compressed with any of various codecs.

When replacing with the original video data 220, the image quality enhancing means 130 and the sound quality enhancement processing means 160 may switch gradually to the original video data 220 and the original audio data 320, using a dissolve effect for video and a crossfade effect for audio, for example.
This configuration mitigates the sense of unnaturalness that accompanies a discontinuity.
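
A gradual replacement of this kind can be sketched as a linear crossfade at both edges of the replaced range. The example below works on a list of audio samples; the function name, the linear ramp, and the `fade` parameter (number of ramping samples at each edge) are assumptions, since the text names the effects but not their shape or length.

```python
from typing import List, Sequence


def crossfade_replace(
    base: Sequence[float],
    replacement: Sequence[float],
    start: int,
    end: int,
    fade: int,
) -> List[float]:
    """Replace base[start:end] with replacement, ramping linearly at each edge."""
    if len(base) != len(replacement):
        raise ValueError("base and replacement must be sample-aligned")
    if not (0 <= start <= end <= len(base)):
        raise ValueError("invalid range")
    if fade < 0 or 2 * fade > end - start:
        raise ValueError("fade does not fit inside the range")
    out = list(base)
    for i in range(start, end):
        # Weight of the replacement: rises toward 1 after `start`, falls before `end`.
        rising = (i - start + 1) / (fade + 1)
        falling = (end - i) / (fade + 1)
        weight = min(1.0, rising, falling)
        out[i] = (1.0 - weight) * base[i] + weight * replacement[i]
    return out
```

With `fade=0` this reduces to a hard cut. The same weighting could be applied per pixel across a run of frames to obtain a dissolve for video.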

The editing system according to the embodiment of the present invention can also be applied to various devices that use video data, for example encoders, decoders, editing machines, material servers, and transmission servers.

The configuration and operation of the above embodiment are examples, and it goes without saying that they can be modified as appropriate without departing from the spirit of the present invention.

1 Analysis device
2 Storage server
3 Recording device
4 Editing device
5 Network
10 Control unit
11 Storage unit
30 Imaging unit
100 Unnecessary area identifying means
110 Unnecessary area processing means
120 Original video identifying means
130 Image quality enhancing means
140 Deletion location identifying means
150 Original audio identifying means
160 Sound quality enhancement processing means
200 Broadcast video data
210 Processed video data
220 Original video data
300 Audio data
310 Processed audio data
320 Original audio data
A1, A2, A3 Unnecessary areas
X Editing system

Claims

1. An editing system that retransmits broadcast video, comprising:
unnecessary area identifying means for identifying an unnecessary area that is displayed continuously at a specific location in the broadcast video;
unnecessary area processing means for creating a processed video in which the unnecessary area identified by the unnecessary area identifying means has been deleted from the broadcast video or made less noticeable;
original video identifying means for identifying a stored original video from the processed video processed by the unnecessary area processing means and/or the broadcast video; and
image quality enhancing means for enhancing the image quality of the processed video on the basis of the original video identified by the original video identifying means.

2. The editing system according to claim 1, wherein the unnecessary area identifying means identifies the unnecessary area using a model trained on the features of the area to be deleted.

3. The editing system according to claim 1 or 2, wherein the image quality enhancing means enhances the image quality of the processed video by performing edge enhancement or compositing that uses edge information and/or color information based on the original video, and/or compositing with portions cut out of the original video.

4. The editing system according to claim 1, wherein the original video identifying means extracts common points in the images between the processed video and the original video, and identifies the original video on the basis of the extracted common points.

5. The editing system according to any one of claims 1 to 4, further comprising:
deletion location identifying means for analyzing audio corresponding to the broadcast video and identifying a deletion location;
original audio identifying means for identifying original audio corresponding to the original video; and
sound quality enhancement processing means for enhancing the sound quality of the deletion location in the audio identified by the deletion location identifying means, on the basis of the original audio identified by the original audio identifying means.

6. The editing system according to claim 5, wherein the deletion location identifying means performs audio analysis using a specific model and identifies the location of an alarm sound in the audio.