JP2019193023A - Desired video information notification system

Info

Publication number: JP2019193023A
Application number: JP2018081824A
Authority: JP
Inventors: 孝利石井; Takatoshi Ishii
Original assignee: JCC KK
Current assignee: JCC KK
Priority date: 2018-04-20
Filing date: 2018-04-20
Publication date: 2019-10-31

Abstract

PROBLEM TO BE SOLVED: To provide a desired video information notification system that can prompt users to view only the useful parts of a video by notifying them when content that meets a desired condition, contained in a single piece of content, is output.
SOLUTION: The desired video information notification system includes: determination means for determining whether the data composing the content being output includes content that meets a desired condition, while learning optimal conditions from accumulated summaries stored in advance on the basis of previous content; and notification means for notifying the user of the content being output to that effect when the determination means determines that such content is included.
SELECTED DRAWING: Figure 9

Description

The present invention relates to a desired video information notification system, and more particularly to a desired video information notification system for notifying a user of desired video information in content.

Conventionally, techniques are known for searching video that is being output, for example as a television broadcast, for portions similar to a given section of a pre-registered video.

Such search techniques are used, for example, to detect a specific title roll in a television broadcast signal in order to start and stop real-time recording, or to detect the same news material broadcast at different times or by different stations in order to analyze the structure of the video (see, for example, Patent Document 1).

Such search techniques are not limited to television broadcasts; they can also target distribution data such as video content received over the Internet (see, for example, Patent Document 2).

Furthermore, such search techniques are not limited to video (moving or still images) and can also handle text, for example. Specifically, in addition to subtitle text included in the content, they can use metadata provided by the provider of a paid service (also called a program metadata service) that distributes, after a broadcast program ends, metadata such as the broadcast start time, broadcast end time, performers, and a summary of the content for each segment of the program, as well as text describing the content that the user enters with a keyboard or the like (see, for example, Patent Document 2).

Patent Document 1: JP 2010-262413 A
Patent Document 2: JP 2012-038239 A

However, these techniques target, for example, a single program or piece of content as a whole, and therefore cannot fully accommodate user preferences.

For example, in television broadcasting a specific news program can be registered as a preference, but a specific segment or a specific news item within that program cannot.

To solve the above problem, an object of the present invention is to provide a desired video information notification system that can prompt a user to view only the useful parts of a video.

To achieve the above object, the desired video information notification system according to the present invention includes: determination means for determining whether the data composing the content being output includes content that meets a desired condition, while learning optimal conditions from accumulated summaries stored in advance on the basis of previous content; and notification means for notifying the user of the content being output to that effect when the determination means determines that such content is included.

According to the present invention, the determination means can monitor the content being output, so that the user need not view the entire content and can view it only when content that meets the desired condition is output.

The user therefore needs to watch only when content that meets the desired condition is output, and can do other work at all other times, that is, watch "while doing something else".
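The determine-then-notify behavior described above can be sketched as follows. This is a minimal illustration and not the claimed implementation: it assumes the content has already been converted to text segments, models the desired condition as a plain keyword list, and omits the learning of optimal conditions. The names `Segment`, `DesiredCondition`, and `monitor` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Segment:
    start_s: float  # start time of the segment in seconds
    text: str       # recognized text for the segment (assumed to exist already)


@dataclass
class DesiredCondition:
    keywords: List[str]  # assumption: the condition is a simple keyword list

    def matches(self, segment: Segment) -> bool:
        text = segment.text.lower()
        return any(k.lower() in text for k in self.keywords)


def monitor(segments: Iterable[Segment],
            condition: DesiredCondition,
            notify: Callable[[Segment], None]) -> int:
    """Play the role of the determination means: check each segment of the
    content being output and call `notify` (the notification means) on a hit.
    Returns the number of notifications issued."""
    hits = 0
    for seg in segments:
        if condition.matches(seg):
            notify(seg)
            hits += 1
    return hits
```

Passing `print` as `notify` is enough to try it out; a real system would instead drive the display or the audio output.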

The invention according to claim 2 is the desired video information notification system according to claim 1, further including activation means for causing the display means to display the content being output, wherein the notification means outputs a notification signal to the activation means to display the content being output when the determination means determines that such content is included.

That is, the content need not be output to display means (a display screen) such as a monitor from the start; for example, it may only be recorded as output.

The determination means performs video data analysis and audio data analysis while the content is being recorded to determine whether content that meets the desired condition is included.

When the determination means determines that content that meets the desired condition is included, the activation means may, as the notification, turn on the display means and start the actual display.

Also, for example, when a main screen and a wipe (inset) screen are shown on one display screen and it is determined that a person or the like matching the desired condition appears on the wipe screen, the activation may be controlled so as to swap the display states of the main screen and the wipe screen.
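The two activation behaviors described above (turning the display on, and swapping the main and wipe screens) can be sketched as a small state update. The `Display` class, its field names, and the source labels are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Display:
    powered_on: bool = False        # the content may be recorded with the display off
    main_source: str = "program"    # what the main screen currently shows
    wipe_source: str = "guest_cam"  # what the inset (wipe) screen currently shows


def activate(display: Display, match_on_wipe: bool) -> Display:
    """Sketch of the activation means reacting to a notification signal.

    The display is always turned on; if the match was found on the wipe
    screen, the main and wipe sources are swapped so the match fills the
    main screen.
    """
    display.powered_on = True
    if match_on_wipe:
        display.main_source, display.wipe_source = (
            display.wipe_source,
            display.main_source,
        )
    return display
```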

The invention according to claim 3 is the desired video information notification system according to claim 1, wherein, when the content is a television broadcast, the determination means determines in real time whether at least one of the audio data and the video data included in the broadcast data being viewed in real time includes content that meets the desired condition, and outputs a notification signal to the notification means when it determines that such content is included.

When the content is a television broadcast, the user does not necessarily want to watch an entire program; even within a single program, the user may want to watch only a specific segment or performer.

According to the present invention, the determination means determines in real time whether at least one of the audio data and the video data included in the broadcast data being viewed in real time includes content that meets the desired condition. When the broadcast reaches a part containing content that meets the condition the user wants to watch, the notification means outputs a notification signal, so the user can view the desired output.

The invention according to claim 4 is the desired video information notification system according to claim 1, wherein, when the content is distribution data received through a telecommunication line, the determination means determines in real time whether at least one of the audio data and the video data included in the distribution data being viewed in real time includes content that meets the desired condition, and outputs a notification signal to the notification means when it determines that such content is included.

When the content is distribution data such as video content received over a telecommunication line such as the Internet, the user does not necessarily want to watch the entire content, even if the video content has been edited; the user may want to watch only part of it.

According to the present invention, the determination means determines in real time whether at least one of the audio data and the video data included in the distribution data being viewed in real time includes content that meets the desired condition. When the output reaches a part containing content that meets the condition the user wants to watch, the notification means outputs a notification signal, so the user can view the desired output.

The invention according to claim 5 is the desired video information notification system according to claim 1, wherein, when the content is distribution data received through a telecommunication line, the determination means determines in advance whether at least one of the audio data and the video data included in distribution data received ahead of the distribution data being viewed in real time includes content that meets the desired condition, and outputs a notification signal to the notification means when the part determined to include such content is output.

When the content is distribution data such as video content received over a telecommunication line such as the Internet, distribution data for a part ahead of the part being output may already have been received, depending on, for example, the number of accesses to the Internet server, the reception speed of the telecommunication line, and the capabilities of the receiving and playback terminal, such as a personal computer or smartphone.

According to the present invention, when the content is distribution data received through a telecommunication line, the determination means determines in advance whether at least one of the audio data and the video data included in distribution data received ahead of the distribution data being viewed in real time includes content that meets the desired condition, so that the notification means can notify the user when the output reaches that part.
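The look-ahead determination described above can be sketched in two steps: judge the already-received (buffered) data once, then fire a notification when playback crosses a marked position. This is only a sketch under assumptions: buffered data is modeled as `(start_time_seconds, text)` pairs, and `prejudge`, `due_notifications`, and `is_desired` are hypothetical names.

```python
import bisect
from typing import Callable, List, Sequence, Tuple


def prejudge(buffered: Sequence[Tuple[float, str]],
             is_desired: Callable[[str], bool]) -> List[float]:
    """Judge buffered data ahead of playback and return the sorted start
    times of the parts that meet the desired condition."""
    return sorted(t for t, text in buffered if is_desired(text))


def due_notifications(marks: List[float], prev_pos: float, pos: float) -> List[float]:
    """Return the marked start times that playback passed while moving from
    prev_pos (exclusive) to pos (inclusive). Start with prev_pos below 0 so
    that a mark at time 0 is not missed."""
    lo = bisect.bisect_right(marks, prev_pos)
    hi = bisect.bisect_right(marks, pos)
    return marks[lo:hi]
```

Because the marks are computed before playback reaches them, the per-tick work during playback is reduced to two binary searches.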

The invention according to claim 6 is the desired video information notification system according to claim 1, wherein, when the content is broadcast data or distribution data stored in advance in storage means, the determination means determines in advance whether at least one of the audio data and the video data included in distribution data received ahead of the distribution data being viewed in real time includes content that meets the desired condition, and outputs a notification signal to the notification means when the part determined to include such content is output.

Some users cannot view content in real time because of other commitments and instead store it in storage means (that is, record it).

According to the present invention, when broadcast data or distribution data stored in advance in the storage means is being output (played back), the determination means determines in advance whether at least one of the audio data and the video data included in distribution data stored ahead of the distribution data being viewed (played back) in real time includes content that meets the desired condition, so that the notification means can notify the user when the output reaches that part.

The invention according to claim 7 is the desired video information notification system according to claim 6, wherein, while broadcast data or distribution data stored in advance in the storage means is being output at a high speed faster than a standard speed, at which elapsed time and output speed match, or at a low speed slower than the standard speed, the notification means switches the output speed of the content being output to the standard speed when the determination means determines that content that meets the desired condition is included.

When broadcast data or distribution data stored in advance in the storage means is output (played back), it may be output at a high speed faster than the original output speed, that is, the standard speed at which elapsed time and output speed match (or at a low speed slower than the standard speed).

According to the present invention, when the output speed is not the standard speed, the notification means, as its notification, switches the output speed of the content being output to the standard speed when the determination means determines that content that meets the desired condition is included, so that the user can start viewing at the desired timing.
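The speed-switching rule described above reduces to a small pure function. The sketch below assumes playback speed is expressed as a multiplier where 1.0 is the standard speed; `STANDARD_SPEED` and `next_speed` are hypothetical names.

```python
STANDARD_SPEED = 1.0  # assumption: speed is a multiplier, 1.0 = real time


def next_speed(current_speed: float, desired_content: bool) -> float:
    """Return the playback speed to use for the next part of the content.

    Fast-forward (> 1.0) or slow (< 1.0) playback drops back to the standard
    speed when the determination result is positive; otherwise the current
    speed is kept.
    """
    if desired_content and current_speed != STANDARD_SPEED:
        return STANDARD_SPEED
    return current_speed
```

For example, a viewer skimming a recording at 2.0x would see playback return to 1.0x at the start of a matching part, which itself serves as the notification.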

The invention according to claim 8 is the desired video information notification system according to any one of claims 1 to 7, wherein the notification means increases the volume of the sound being output from an audio output unit on the basis of the audio data when the determination means determines that content that meets the desired condition is included.

The increase in volume lets the viewer easily recognize that the broadcast (playback) has reached a part that meets the desired condition.

The invention according to claim 9 is the desired video information notification system according to any one of claims 1 to 7, wherein the notification means outputs a notification sound different from the sound being output from the audio output unit on the basis of the audio data when the determination means determines that content that meets the desired condition is included.

Because a sound different from the audio included in the content is output, the viewer can easily recognize that the broadcast (playback) has reached a part that meets the desired condition.
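The two audio notification styles of claims 8 and 9 can be sketched side by side. The volume scale (0 to 100), the boost step, and the names `audio_notification`, `mode`, and `play_chime` are assumptions made for illustration only.

```python
from typing import Tuple


def audio_notification(volume: int, desired_content: bool, mode: str = "boost",
                       boost: int = 10, max_volume: int = 100) -> Tuple[int, bool]:
    """Return (new_volume, play_chime) for the audio output unit.

    mode="boost": raise the volume of the content audio (claim 8 style).
    mode="chime": keep the volume and request a separate notification sound
                  (claim 9 style).
    """
    if not desired_content:
        return volume, False
    if mode == "boost":
        return min(volume + boost, max_volume), False
    if mode == "chime":
        return volume, True
    raise ValueError(f"unknown mode: {mode}")
```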

As a result, viewers are not tied to the broadcast schedule and can watch only the video parts they want or need; particularly when they are busy, watching only the video that follows a notification can improve their time efficiency.

Likewise, with a cloud-based video distribution service, users can watch only the video parts they want or need without watching an entire program.

According to the present invention, when content that meets a desired condition and is contained in a single piece of content is output, the user is notified to that effect and can thereby be prompted to view only the useful parts of the video.

FIG. 1 is a block diagram showing the overall configuration of an embodiment of the summary creation system according to the present invention.
FIG. 2 shows the utterance text conversion unit in an embodiment of the summary creation system according to the present invention, where (a) is a block diagram and (b) is a diagram showing the processing flow.
FIG. 3 shows the telop text conversion unit in an embodiment of the summary creation system according to the present invention, where (a) is a block diagram and (b) is a diagram showing the processing flow.
FIG. 4 shows the background image text conversion unit in an embodiment of the summary creation system according to the present invention, where (a) is a block diagram and (b) is a diagram showing the processing flow.
FIG. 5 shows the logo mark text conversion unit in an embodiment of the summary creation system according to the present invention, where (a) is a block diagram and (b) is a diagram showing the processing flow.
FIG. 6 is a block diagram showing the text integration unit in an embodiment of the summary creation system according to the present invention.
FIG. 7 is a block diagram showing the summary creation unit in an embodiment of the summary creation system according to the present invention.
FIG. 8 is a flowchart showing the operation of an embodiment of the summary creation system according to the present invention.
FIG. 9 is a block diagram showing the overall configuration of the desired video information notification system in an embodiment of the summary creation system according to the present invention.
FIG. 10 shows an application example of the desired video information notification system in an embodiment of the summary creation system according to the present invention, where (A) is an explanatory diagram of a case in which character recognition determines that the desired condition is met, and (B) is an explanatory diagram of a case in which voice recognition determines that the desired condition is met.

FIG. 1 is a block diagram showing the overall configuration of a summary creation system used to realize the desired video information notification system according to an embodiment of the present invention.

<Overall configuration of summary creation system 10>
As shown in FIG. 1, the summary creation system 10 includes a video signal separation unit 20, an utterance text conversion unit 100, a telop text conversion unit 200, a background image text conversion unit 300, a logo mark text conversion unit 400, a text integration unit 500, and a summary creation unit 600. In the present embodiment, the summary creation system 10 acquires a video signal from a program broadcast by a television broadcast station 30 or from moving images distributed over the Internet 40. The system may also be configured with at least two units selected from the telop text conversion unit 200, the background image text conversion unit 300, the logo mark text conversion unit 400, and the text integration unit 500.

The video signal V, which contains an audio signal and a video signal, is separated by the video signal separation unit 20 into an audio signal A and a video signal B. The audio signal A is input to the utterance text conversion unit 100, and the video signal B is input to the telop text conversion unit 200, the background image text conversion unit 300, and the logo mark text conversion unit 400.
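The signal routing described above (V is split into A and B, A goes to the utterance unit, B goes to the three image-based units) can be sketched as follows. The text conversion units are modeled as plain callables and the signal as two byte strings; `VideoSignal`, `separate`, and `route` are hypothetical names, and no real demultiplexing is performed.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

Unit = Callable[[bytes], str]  # assumption: a unit maps a raw signal to text


@dataclass
class VideoSignal:
    audio: bytes  # audio signal A
    video: bytes  # video signal B


def separate(signal: VideoSignal) -> Tuple[bytes, bytes]:
    """Stand-in for the video signal separation unit 20."""
    return signal.audio, signal.video


def route(signal: VideoSignal,
          audio_units: Dict[str, Unit],
          video_units: Dict[str, Unit]) -> Dict[str, str]:
    """Feed A to the audio-based units and B to the image-based units,
    returning one text per unit name."""
    a, b = separate(signal)
    texts = {name: unit(a) for name, unit in audio_units.items()}
    texts.update({name: unit(b) for name, unit in video_units.items()})
    return texts
```

The resulting dictionary of per-unit texts corresponds to the input of the text integration unit 500.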

<Utterance text conversion unit 100>
The utterance text conversion unit 100 receives the audio signal A and outputs utterance text, which is text describing what people say in the content. The utterance text conversion unit 100 includes an utterance information extraction unit 110, an utterance content recognition unit 120, and an utterance content text conversion unit 130.

The utterance information extraction unit 110 extracts utterance information from the audio signal A of the video signal V. That is, it removes noise from the audio signal A and extracts the information of human speech. This utterance information may include sound effects and characteristic music.

The utterance content recognition unit 120 recognizes the utterance content from the utterance information. That is, it analyzes the utterance information acoustically and grammatically and recognizes the utterance content as language. The parameters, conditions, and the like used for this recognition can be generated by machine learning from accumulated generation data of past speech text, as described later.

The utterance content text conversion unit 130 converts the utterance content into text and outputs it. The parameters, conditions, and the like used for this recognition can be generated by machine learning from accumulated input data and generation data of past speech text, as described later.

<Telop text conversion unit 200>
The telop text conversion unit 200 receives the video signal B and outputs telop text, which is text describing the telop (on-screen caption) content in the content. The telop text conversion unit 200 includes a telop information extraction unit 210, a telop content recognition unit 220, and a telop content text conversion unit 230.

The telop information extraction unit 210 extracts telop information from the video signal B of the video signal V. That is, it removes the background from the video signal B and extracts only the information of the telop image.

The telop content recognition unit 220 recognizes the telop content from the telop image information. That is, it analyzes the telop information linguistically and grammatically and recognizes the displayed telop content as language. The parameters, conditions, and the like used for this recognition can be generated by machine learning from accumulated input data and generation data of past telop text, as described later.

The telop content text conversion unit 230 converts the telop content into text and outputs it. The parameters, conditions, and the like used for this recognition can be generated by machine learning from accumulated input data and generation data of past telop text, as described later.

<Background image text conversion unit 300>
The background image text conversion unit 300 receives the video signal B and outputs background image text, which is text describing the background image content in the content. Background images include at least one of scenes, situations, articles, and events, for example, people, their belongings, their facial expressions, scenery, the condition of buildings, indoor conditions, animals, vehicles, and other articles. The background image text conversion unit 300 includes a background image information extraction unit 310, a background image content recognition unit 320, and a background image content text conversion unit 330.

The background image information extraction unit 310 extracts background image information from the video signal B of the video signal V. That is, it removes telops and unclear images from the video signal B and extracts only the information of recognizable background images.

The background image content recognition unit 320 recognizes the content of the background image from the background image information. That is, it analyzes the background image information and recognizes the people, their belongings, their facial expressions, scenery, the condition of buildings, indoor conditions, animals, vehicles, and other articles shown. The parameters, conditions, and the like used for this recognition can be generated by machine learning from accumulated input data and generation data of past background image text, as described later.

The background image content text conversion unit 330 converts the background image content into text and outputs it. The parameters, conditions, and the like used for this recognition can be generated by machine learning from accumulated input data and generation data of past background image text, as described later.

<Logo mark text conversion unit 400>
The logo mark text conversion unit 400 receives the video signal B and outputs logo mark text, which is text describing the logo mark content in the content. Examples of logo marks include trademarks indicating the origin of goods, marks containing symbols or codes, and other marks. The logo mark text conversion unit 400 includes a logo mark image information extraction unit 410, a logo mark content recognition unit 420, and a logo mark content text conversion unit 430.

The logo mark image information extraction unit 410 extracts logo mark image information from the video signal B of the video signal V. That is, it removes telops and background images from the video signal B and extracts only the information of recognizable logo mark images.

The logo mark content recognition unit 420 recognizes the content of the logo mark from the logo mark image information. That is, it analyzes the logo mark image information and recognizes the product, service, store, facility, or the like that the logo mark represents. The parameters, conditions, and the like used for this recognition can be generated by machine learning from accumulated input data and generation data of past logo mark text, as described later.

The logo mark content text conversion unit 430 converts the logo mark image content into text and outputs it. The parameters, conditions, and the like used for this recognition can be generated by machine learning from accumulated input data and generation data of past logo mark text, as described later.

＜テキスト統合部５００＞
テキスト統合部５００は、発話テキスト化部１００からの発話テキスト、テロップテキスト化部２００からのテロップテキスト、背景画像テキスト化部３００からの背景テキスト、ロゴマークテキスト化部４００からの背景テキストを統合する。即ち、各テキストにおける矛盾や誤りを訂正して、統合テキストを生成する。このテキストの統合に使用するパラメータ、条件等は後述するように蓄積された過去のテキスト統合の入力、出力データから機械学習により生成できる。 <Text integration unit 500>
The text integration unit 500 integrates the utterance text from the utterance text conversion unit 100, the telop text from the telop text conversion unit 200, the background text from the background image text conversion unit 300, and the logo mark text from the logo mark text conversion unit 400. That is, it corrects inconsistencies and errors among the individual texts and generates an integrated text. Parameters, conditions, and the like used for this text integration can be generated by machine learning from accumulated input and output data of past text integration, as described later.

＜要約作成部６００＞
要約作成部６００は、テキスト統合部５００からの統合テキストを要約する。即ち、要約テキストの内容を要約して指定された文字数とする。この要約に使用するパラメータ、条件等は後述するように蓄積された過去のようよう役処理の入力データ、出力データから機械学習により生成できる。 <Summary creation unit 600>
The summary creation unit 600 summarizes the integrated text from the text integration unit 500. That is, it condenses the content of the integrated text to a designated number of characters. Parameters, conditions, and the like used for this summarization can be generated by machine learning from accumulated input and output data of past summarization processing, as described later.
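
As a rough illustration of condensing text to a designated number of characters, the following Python sketch keeps leading sentences while they fit within the limit. It is an extractive stand-in and not the learned summarization described here; the function name `summarize_to_length` and the parameter `max_chars` are assumptions introduced only for this example.

```python
import re


def summarize_to_length(integrated_text: str, max_chars: int) -> str:
    """Keep leading sentences while the result fits within max_chars."""
    # Split after sentence-ending punctuation (ASCII marks or the Japanese full stop).
    parts = re.split(r"(?<=[.!?。])\s*", integrated_text)
    sentences = [p.strip() for p in parts if p.strip()]

    summary = ""
    for sentence in sentences:
        candidate = (summary + " " + sentence).strip()
        if len(candidate) > max_chars:
            break
        summary = candidate

    # Fall back to a hard cut when even the first sentence is too long.
    return summary if summary else integrated_text[:max_chars]


if __name__ == "__main__":
    text = "Rain is expected. Trains are delayed. Schools are closed."
    print(summarize_to_length(text, 40))
```

A learned summarizer would choose what to keep based on its settings; this sketch only shows how a character budget can be enforced.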

次に、各部の機械学習処理について説明する。
＜発話テキスト化部１００の機械学習処理＞
図２は同要約作成システムの発話テキスト化部を示すものであり、（ａ）はブロック図、（ｂ）は処理の流れを示す図である。同図（ａ）に示すように、発話テキスト化部１００は、発話情報抽出部１１０、発話内容認識部１２０、発話内容テキスト化部１３０の他、機械学習部１４０、内容認識テキスト作成設定部１５０、比較評価部１６０を備える。また発話テキスト化部１００には、既存データ格納部７００が接続されている。 Next, machine learning processing of each unit will be described.
<Machine learning process of speech text unit 100>
FIG. 2 shows the utterance text conversion unit of the summary creation system, where FIG. 2A is a block diagram and FIG. 2B is a diagram showing the processing flow. As shown in FIG. 2A, the utterance text conversion unit 100 includes, in addition to the utterance information extraction unit 110, the utterance content recognition unit 120, and the utterance content text conversion unit 130, a machine learning unit 140, a content recognition text creation setting unit 150, and a comparative evaluation unit 160. An existing data storage unit 700 is connected to the utterance text conversion unit 100.

発話テキスト化部１００は既存データ格納部７００が格納する既存のビデオデータと既存の発話テキストに基づいて機械学習を行い、発話内容認識部１２０及び発話内容テキスト化部１３０を最適化する。既存データ格納部７００には、過去に人が発話テキストを作成したときに使用した多数のビデオデータを格納した既存ビデオデータ格納部７１０と、このビデオデータの発話内容から作成した発話テキストを格納した既存発話テキスト格納部７２０を備える。これらのビデオデータ及び発話テキストは機械学習の教材となる。 The utterance text conversion unit 100 performs machine learning based on the existing video data and the existing utterance text stored in the existing data storage unit 700, and optimizes the utterance content recognition unit 120 and the utterance content text conversion unit 130. The existing data storage unit 700 includes an existing video data storage unit 710, which stores a large number of video data that people used when creating utterance text in the past, and an existing utterance text storage unit 720, which stores the utterance text created from the utterance content of that video data. These video data and utterance texts serve as training material for the machine learning.

また、発話テキスト化部１００には、機械学習を行うときと、新規のビデオデータから発話内容テキストを作成するときにデータ出力の切り換えを行う切換部１７０、１８０を備える。 Further, the utterance text conversion unit 100 includes switching units 170 and 180 that perform data output switching when machine learning is performed and when an utterance content text is created from new video data.

内容認識テキスト作成設定部１５０は、発話情報抽出部１１０の発話情報の抽出処理の設定と、発話内容認識部１２０の発話内容認識処理の設定と、発話内容テキスト化部１３０のテキスト化処理の設定とが格納されている。発話情報抽出部１１０、発話内容認識部１２０及び発話内容テキスト化部１３０は内容認識テキスト作成設定部１５０の設定した条件、パラメータに従って発話情報抽出と、発話内容の認識、テキスト化とを行う。 The content recognition text creation setting unit 150 stores the settings for the utterance information extraction processing of the utterance information extraction unit 110, the utterance content recognition processing of the utterance content recognition unit 120, and the text conversion processing of the utterance content text conversion unit 130. The utterance information extraction unit 110, the utterance content recognition unit 120, and the utterance content text conversion unit 130 perform utterance information extraction, utterance content recognition, and text conversion according to the conditions and parameters set in the content recognition text creation setting unit 150.

比較評価部１６０は、比較部１６１と評価部１６２とを備える。比較部１６１は、既存ビデオデータ格納部７１０からの既存ビデオデータを受けて発話内容テキスト化部１３０が作成した発話テキストと、既存発話テキスト格納部７２０からの既存発話テキストとを比較する。評価部１６２は比較部１６１の比較結果に基づいて評価を行い、よく一致した場合は高い点数を与える。 The comparative evaluation unit 160 includes a comparison unit 161 and an evaluation unit 162. The comparison unit 161 receives the existing video data from the existing video data storage unit 710 and compares the utterance text created by the utterance content text conversion unit 130 with the existing utterance text from the existing utterance text storage unit 720. The evaluation unit 162 performs an evaluation based on the comparison result of the comparison unit 161, and gives a high score when the values match well.
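
The comparison and scoring described above can be sketched in Python as follows. The source does not specify how agreement is measured, so this example assumes a character-level similarity from the standard library's `difflib.SequenceMatcher`; the function names `compare` and `evaluate` and the 0 to 100 scale are likewise assumptions made for illustration.

```python
from difflib import SequenceMatcher


def compare(generated_text: str, existing_text: str) -> float:
    """Comparison step: similarity in [0, 1] between generated and existing text."""
    return SequenceMatcher(None, generated_text, existing_text).ratio()


def evaluate(similarity: float) -> int:
    """Evaluation step: a closer match receives a higher score (0 to 100)."""
    return round(similarity * 100)


if __name__ == "__main__":
    reference = "today is sunny"
    for candidate in ("today is sunny", "today is rainy", "stock prices fell"):
        print(candidate, "->", evaluate(compare(candidate, reference)))
```

Any measure that grows with agreement could be substituted, for example a word-level error rate converted into a score.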

機械学習部１４０は、評価部１６２からの評価を受け、内容認識テキスト作成設定部１５０の設定状態を変更する。この処理を同一のビデオデータについて繰り返し行い、評価部１６２の評価値をできるだけ高いものとする。この処理は複数のビデオデータについて繰り返し行うことができる。 The machine learning unit 140 receives the evaluation from the evaluation unit 162 and changes the setting state of the content recognition text creation setting unit 150. This process is repeated for the same video data to make the evaluation value of the evaluation unit 162 as high as possible. This process can be repeated for a plurality of video data.
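
One way to picture this repeat-and-improve cycle is a simple search that changes one setting at a time and keeps the change only when the evaluation improves. The Python sketch below is an assumption-laden illustration, not the learning method of the source: the hill-climbing strategy, the `tune_settings` function, the toy `convert` function, and the single `skip` setting are all hypothetical.

```python
import random
from difflib import SequenceMatcher


def score(generated: str, reference: str) -> float:
    """Evaluation value: higher when the generated text matches the reference."""
    return SequenceMatcher(None, generated, reference).ratio()


def tune_settings(convert, settings, training_pairs, rounds=200, seed=0):
    """Repeatedly adjust integer settings, keeping changes that raise the total score."""
    rng = random.Random(seed)

    def total(candidate):
        return sum(score(convert(video, candidate), ref) for video, ref in training_pairs)

    best, best_score = dict(settings), total(settings)
    for _ in range(rounds):
        trial = dict(best)
        key = rng.choice(sorted(trial))        # pick one setting to change
        trial[key] += rng.choice([-1, 1])      # nudge it up or down
        trial_score = total(trial)
        if trial_score > best_score:           # keep only improvements
            best, best_score = trial, trial_score
    return best, best_score


if __name__ == "__main__":
    # Toy stand-in for recognition and text conversion: drop `skip` leading characters.
    def convert(video, s):
        return video[max(s["skip"], 0):]

    pairs = [("###hello world", "hello world"), ("###good night", "good night")]
    print(tune_settings(convert, {"skip": 0}, pairs))
```

In this toy setup the existing texts play the role of the stored utterance texts, and the loop stops changing the setting once the generated text matches them.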

このような機械学習を行うことにより、発話内容認識部１２０及び発話内容テキスト化部１３０の能力が向上する。所定の機械学習を終了した後、発話テキスト化部１００は新規ビデオデータを処理して、最適な発話テキストを出力できる状態となる。 By performing such machine learning, the abilities of the utterance content recognition unit 120 and the utterance content text conversion unit 130 are improved. After the predetermined machine learning is completed, the utterance text converting unit 100 processes the new video data and is in a state where the optimum utterance text can be output.

発話テキスト化部１００の処理について説明する。図２（ｂ）に示すように、まず内容認識テキスト作成設定部１５０に音声認識及びテキスト化の特徴量を設定する（ステップＳＡ１）。この設定は機械学習部１４０の学習結果により行う。 Processing of the utterance text unit 100 will be described. As shown in FIG. 2B, first, a feature amount for speech recognition and text conversion is set in the content recognition text creation setting unit 150 (step SA1). This setting is performed based on the learning result of the machine learning unit 140.

次いで、発話情報抽出部１１０が、設定された特徴に基づいて音声を大量の音声信号の中から抽出する（ステップＳＡ２）。 Next, the utterance information extraction unit 110 extracts speech from a large amount of speech signals based on the set features (step SA2).

更に、発話内容認識部１２０が、設定された特徴に基づいて抽出した音声を解析する（ステップＳＡ３）。 Further, the utterance content recognition unit 120 analyzes the extracted voice based on the set feature (step SA3).

そして、発話内容テキスト化部１３０が、設定された特徴に基づいて音声をテキスト化して発話テキストを出力する（ステップＳＡ４）。 Then, the utterance content text conversion unit 130 converts the voice into text based on the set feature and outputs the utterance text (step SA4).
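
Steps SA1 to SA4 can be read as a small pipeline in which every stage consults the same settings object. The following Python sketch shows only that structure, under heavy simplification: audio is modeled as a list of `(energy, label)` frames, and the names `Settings`, `min_energy`, `vocabulary`, and the three stage functions are hypothetical and do not come from the source.

```python
from dataclasses import dataclass, field


@dataclass
class Settings:
    """SA1: values that would be set from the learning result (assumed fields)."""
    min_energy: float = 0.5
    vocabulary: dict = field(default_factory=dict)
    separator: str = " "


def extract_speech(frames, settings):
    """SA2: keep only frames that look like speech under the current settings."""
    return [label for energy, label in frames if energy >= settings.min_energy]


def recognize(labels, settings):
    """SA3: analyze the extracted speech, here by a simple vocabulary lookup."""
    return [settings.vocabulary[label] for label in labels if label in settings.vocabulary]


def to_text(words, settings):
    """SA4: turn the recognized content into utterance text."""
    return settings.separator.join(words)


def utterance_text(frames, settings):
    return to_text(recognize(extract_speech(frames, settings), settings), settings)


if __name__ == "__main__":
    settings = Settings(vocabulary={"k1": "good", "k2": "morning"})
    frames = [(0.9, "k1"), (0.1, "noise"), (0.8, "k2")]
    print(utterance_text(frames, settings))
```

Because all three stages read from one settings object, changing the settings changes the behavior of extraction, recognition, and text conversion together, which is the point of the shared setting unit.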

＜テロップテキスト化部２００の機械学習＞
図３は同要約作成システムのテロップテキスト化部を示すものであり、（ａ）はブロック図、（ｂ）は処理の流れを示す図である。同図（ａ）に示すように、テロップテキスト化部２００は、テロップ情報抽出部２１０、テロップ内容認識部２２０、テロップ内容テキスト化部２３０の他、機械学習部２４０、内容認識テキスト作成設定部２５０、比較評価部２６０を備える。またテロップテキスト化部２００には、既存データ格納部７００が接続されている。 <Machine learning of telop text unit 200>
FIG. 3 shows the telop text conversion unit of the summary creation system, where FIG. 3A is a block diagram and FIG. 3B is a diagram showing the processing flow. As shown in FIG. 3A, the telop text conversion unit 200 includes, in addition to the telop information extraction unit 210, the telop content recognition unit 220, and the telop content text conversion unit 230, a machine learning unit 240, a content recognition text creation setting unit 250, and a comparative evaluation unit 260. An existing data storage unit 700 is connected to the telop text conversion unit 200.

テロップテキスト化部２００は既存データ格納部７００が格納する既存のビデオデータと既存のテロップテキストに基づいて機械学習を行い、テロップ内容認識部２２０及びテロップ内容テキスト化部２３０を最適化する。既存データ格納部７００には、過去に人がテロップテキストを作成したときに使用した多数のビデオデータを格納した既存ビデオデータ格納部７１０と、このビデオデータの発話内容から作成したテロップテキストを格納した既存テロップテキスト格納部７３０を備える。これらのビデオデータ及び発話テキストは機械学習の教材となる。 The telop text conversion unit 200 performs machine learning based on the existing video data and the existing telop text stored in the existing data storage unit 700, and optimizes the telop content recognition unit 220 and the telop content text conversion unit 230. The existing data storage unit 700 includes an existing video data storage unit 710, which stores a large number of video data that people used when creating telop text in the past, and an existing telop text storage unit 730, which stores the telop text created from that video data. These video data and telop texts serve as training material for the machine learning.

また、テロップテキスト化部２００には、機械学習を行うときと、新規のビデオデータから発話内容テキストを作成するときにデータ出力の切り換えを行う切換部２７０、２８０を備える。 The telop text conversion unit 200 also includes switching units 270 and 280 that switch the data output between when machine learning is performed and when telop text is created from new video data.

内容認識テキスト作成設定部２５０は、テロップ情報抽出部２１０のテロップ情報抽出の設定と、テロップ内容認識部２２０のテキスト内容認識処理の設定と、テロップ内容テキスト化部２３０のテキスト化処理の設定とが格納されている。テロップ情報抽出部２１０、テロップ内容認識部２２０及びテロップ内容テキスト化部２３０は内容認識テキスト作成設定部２５０の設定した条件、パラメータに従ってテロップの抽出、内容認識、及びテキスト化を行う。 The content recognition text creation setting unit 250 stores the settings for the telop information extraction of the telop information extraction unit 210, the text content recognition processing of the telop content recognition unit 220, and the text conversion processing of the telop content text conversion unit 230. The telop information extraction unit 210, the telop content recognition unit 220, and the telop content text conversion unit 230 perform telop extraction, content recognition, and text conversion according to the conditions and parameters set in the content recognition text creation setting unit 250.

比較評価部２６０は、比較部２６１と評価部２６２とを備える。比較部２６１は、既存ビデオデータ格納部７１０からの既存ビデオデータを受けてテロップ内容テキスト化部２３０が作成したテロップテキストと、既存テロップテキスト格納部７３０からの既存テロップテキストとを比較する。評価部２６２は比較部２６１の比較結果に基づいて評価を行い、よく一致した場合は高い点数を与える。 The comparative evaluation unit 260 includes a comparison unit 261 and an evaluation unit 262. The comparison unit 261 receives the existing video data from the existing video data storage unit 710 and compares the telop text created by the telop content text unit 230 with the existing telop text from the existing telop text storage unit 730. The evaluation unit 262 performs an evaluation based on the comparison result of the comparison unit 261, and gives a high score when the values match well.

機械学習部２４０は、評価部２６２からの評価を受け、内容認識テキスト作成設定部２５０の設定状態を変更する。この処理を同一のビデオデータについて繰り返し行い、評価部２６２の評価値をできるだけ高いものとする。この処理は複数のビデオデータについて繰り返し行うことができる。 The machine learning unit 240 receives the evaluation from the evaluation unit 262 and changes the setting state of the content recognition text creation setting unit 250. This process is repeated for the same video data to make the evaluation value of the evaluation unit 262 as high as possible. This process can be repeated for a plurality of video data.

このような機械学習を行うことにより、テロップ内容認識部２２０及びテロップ内容テキスト化部２３０の能力が向上する。所定の機械学習を終了した後、テロップテキスト化部２００は新規ビデオデータを処理して、最適なテロップテキストを出力できる状態となる。 By performing such machine learning, the capabilities of the telop content recognition unit 220 and the telop content text conversion unit 230 are improved. After completing the predetermined machine learning, the telop text conversion unit 200 processes the new video data and is in a state where it can output the optimum telop text.

テロップテキスト化部２００の処理について説明する。図３（ｂ）に示すように、まず内容認識テキスト作成設定部２５０にテロップ情報抽出、内容抽出、及びテキスト化の特徴量を設定する（ステップＳＢ１）。この設定は機械学習部２４０の学習結果により行う（ステップＳＢ２）。 The processing of the telop text conversion unit 200 will now be described. As shown in FIG. 3B, feature amounts for telop information extraction, content extraction, and text conversion are first set in the content recognition text creation setting unit 250 (step SB1). This setting is made based on the learning result of the machine learning unit 240.

次いで、テロップ情報抽出部２１０が、設定された特徴に基づいてテロップを大量の映像信号の中から抽出する（ステップＳＢ２）。 Next, the telop information extraction unit 210 extracts a telop from a large amount of video signals based on the set feature (step SB2).

更に、テロップ内容認識部２２０が、設定された特徴に基づいて抽出したテロップを解析する（ステップＳＢ３）。 Further, the telop content recognition unit 220 analyzes the telop extracted based on the set feature (step SB3).

そして、テロップ内容テキスト化部２３０が、設定された特徴に基づいてテロップの内容をテキスト化してテロップテキストとして出力する（ステップＳＢ４）。 Then, the telop content text conversion unit 230 converts the telop content into text based on the set feature and outputs it as telop text (step SB4).

＜背景画像テキスト化部３００の機械学習＞
図４は同要約作成システムの背景画像テキスト化部を示すものであり、（ａ）はブロック図、（ｂ）は処理の流れを示す図である。同図（ａ）に示すように、背景画像テキスト化部３００は、背景画像情報抽出部３１０、背景画像内容認識部３２０、背景画像内容テキスト化部３３０の他、機械学習部３４０、内容認識テキスト作成設定部３５０、比較評価部３６０を備える。また背景画像テキスト化部３００には、既存データ格納部７００が接続されている。 <Machine learning of background image text unit 300>
FIG. 4 shows the background image text conversion unit of the summary creation system, where FIG. 4A is a block diagram and FIG. 4B is a diagram showing the processing flow. As shown in FIG. 4A, the background image text conversion unit 300 includes, in addition to the background image information extraction unit 310, the background image content recognition unit 320, and the background image content text conversion unit 330, a machine learning unit 340, a content recognition text creation setting unit 350, and a comparative evaluation unit 360. An existing data storage unit 700 is connected to the background image text conversion unit 300.

背景画像テキスト化部３００は既存データ格納部７００が格納する既存のビデオデータと既存の背景画像テキストに基づいて機械学習を行い、背景画像内容認識部３２０及び背景画像内容テキスト化部３３０を最適化する。既存データ格納部７００には、過去に人がテロップテキストを作成したときに使用した多数のビデオデータを格納した既存ビデオデータ格納部７１０と、このビデオデータの発話内容から作成した背景画像テキストを格納した既存背景画像テキスト格納部７４０を備える。これらのビデオデータ及び背景画像テキストは機械学習の教材となる。 The background image text conversion unit 300 performs machine learning based on the existing video data and the existing background image text stored in the existing data storage unit 700, and optimizes the background image content recognition unit 320 and the background image content text conversion unit 330. The existing data storage unit 700 includes an existing video data storage unit 710, which stores a large number of video data that people used when creating background image text in the past, and an existing background image text storage unit 740, which stores the background image text created from that video data. These video data and background image texts serve as training material for the machine learning.

また、背景画像テキスト化部３００には、機械学習を行うときと、新規のビデオデータから発話内容テキストを作成するときにデータ出力の切り換えを行う切換部３７０、３８０を備える。 The background image text conversion unit 300 also includes switching units 370 and 380 that switch the data output between when machine learning is performed and when background image text is created from new video data.

内容認識テキスト作成設定部３５０は、背景画像情報抽出部３１０の背景画像抽出処理の設定と、背景画像内容認識部３２０の背景画像内容認識処理の設定と、背景画像内容テキスト化部３３０のテキスト化処理の設定とが格納されている。背景画像情報抽出部３１０、背景画像内容認識部３２０及び背景画像内容テキスト化部３３０は内容認識テキスト作成設定部３５０の設定した条件、パラメータに従って背景画像の抽出、背景画像の内容認識及びテキスト化を行う。 The content recognition text creation setting unit 350 stores the settings for the background image extraction processing of the background image information extraction unit 310, the background image content recognition processing of the background image content recognition unit 320, and the text conversion processing of the background image content text conversion unit 330. The background image information extraction unit 310, the background image content recognition unit 320, and the background image content text conversion unit 330 perform background image extraction, background image content recognition, and text conversion according to the conditions and parameters set in the content recognition text creation setting unit 350.

比較評価部３６０は、比較部３６１と評価部３６２とを備える。比較部３６１は、既存ビデオデータ格納部７１０からの既存ビデオデータを受けて背景画像内容テキスト化部３３０が作成した背景画像テキストと、既存背景画像テキスト格納部７４０からの既存背景画像テキストとを比較する。評価部３６２は比較部３６１の比較結果に基づいて評価を行い、よく一致した場合は高い点数を与える。 The comparative evaluation unit 360 includes a comparison unit 361 and an evaluation unit 362. The comparison unit 361 compares the background image text that the background image content text conversion unit 330 created from the existing video data supplied by the existing video data storage unit 710 with the existing background image text from the existing background image text storage unit 740. The evaluation unit 362 performs an evaluation based on the comparison result of the comparison unit 361 and gives a high score when the two texts match well.

機械学習部３４０は、評価部３６２からの評価を受け、内容認識テキスト作成設定部３５０の設定状態を変更する。この処理を同一のビデオデータについて繰り返し行い、評価部３６２の評価値をできるだけ高いものとする。この処理は複数のビデオデータについて繰り返し行うことができる。 The machine learning unit 340 receives the evaluation from the evaluation unit 362 and changes the setting state of the content recognition text creation setting unit 350. This process is repeated for the same video data to make the evaluation value of the evaluation unit 362 as high as possible. This process can be repeated for a plurality of video data.

このような機械学習を行うことにより、背景画像内容認識部３２０及び背景画像内容テキスト化部３３０の能力が向上する。所定の機械学習を終了した後、背景画像テキスト化部３００は新規ビデオデータを処理して、最適な背景画像テキストを出力できる状態となる。 By performing such machine learning, the capabilities of the background image content recognition unit 320 and the background image content text conversion unit 330 are improved. After the predetermined machine learning is completed, the background image text converting unit 300 processes the new video data and is in a state where the optimum background image text can be output.

背景画像テキスト化部３００の処理について説明する。図４（ｂ）に示すように、まず内容認識テキスト作成設定部３５０に背景画像情報抽出、背景画像認識、及びテキスト化の特徴量を設定する（ステップＳＣ１）。この設定は機械学習部３４０の学習結果により行う。 The process of the background image text unit 300 will be described. As shown in FIG. 4B, first, background image information extraction, background image recognition, and text conversion feature quantities are set in the content recognition text creation setting unit 350 (step SC1). This setting is performed based on the learning result of the machine learning unit 340.

次いで、背景画像情報抽出部３１０が、設定された特徴に基づいて背景画像を大量の映像信号の中から抽出する（ステップＳＣ２）。 Next, the background image information extraction unit 310 extracts a background image from a large amount of video signals based on the set feature (step SC2).

更に、背景画像内容認識部３２０が、設定された特徴に基づいて抽出した背景画像を解析する（ステップＳＣ３）。 Further, the background image content recognition unit 320 analyzes the background image extracted based on the set feature (step SC3).

そして、背景画像内容テキスト化部３３０が、設定された特徴に基づいて背景画像の内容をテキスト化して背景画像テキストとして出力する（ステップＳＣ４）。 Then, the background image content text conversion unit 330 converts the content of the background image into text based on the set feature and outputs it as background image text (step SC4).

＜ロゴマークテキスト化部４００の機械学習＞
図５は同要約作成システムのロゴマークテキスト化部を示すものであり、（ａ）はブロック図、（ｂ）は処理の流れを示す図である。ロゴマークテキスト化部４００は、ロゴマーク画像情報抽出部４１０、ロゴマーク内容認識部４２０、ロゴマーク内容テキスト化部４３０の他、機械学習部４４０、内容認識テキスト作成設定部４５０、比較評価部４６０を備える。またロゴマークテキスト化部４００には、既存データ格納部７００が接続されている。 <Machine learning of logo mark text unit 400>
FIG. 5 shows the logo mark text conversion unit of the summary creation system, where FIG. 5A is a block diagram and FIG. 5B is a diagram showing the processing flow. The logo mark text conversion unit 400 includes, in addition to the logo mark image information extraction unit 410, the logo mark content recognition unit 420, and the logo mark content text conversion unit 430, a machine learning unit 440, a content recognition text creation setting unit 450, and a comparative evaluation unit 460. An existing data storage unit 700 is connected to the logo mark text conversion unit 400.

ロゴマークテキスト化部４００は既存データ格納部７００が格納する既存のビデオデータと既存のロゴマークテキストに基づいて機械学習を行い、ロゴマーク画像情報抽出部４１０、ロゴマーク内容認識部４２０及びロゴマーク内容テキスト化部４３０を最適化する。既存データ格納部７００には、過去に人がロゴマークテキストを作成したときに使用した多数のビデオデータを格納した既存ビデオデータ格納部７１０と、このビデオデータの発話内容から作成したロゴマークテキストを格納した既存ロゴマークテキスト格納部７５０を備える。これらのビデオデータ及びロゴマークテキストは機械学習の教材となる。 The logo mark text conversion unit 400 performs machine learning based on the existing video data and the existing logo mark text stored in the existing data storage unit 700, and optimizes the logo mark image information extraction unit 410, the logo mark content recognition unit 420, and the logo mark content text conversion unit 430. The existing data storage unit 700 includes an existing video data storage unit 710, which stores a large number of video data that people used when creating logo mark text in the past, and an existing logo mark text storage unit 750, which stores the logo mark text created from that video data. These video data and logo mark texts serve as training material for the machine learning.

また、ロゴマークテキスト化部４００には、機械学習を行うときと、新規のビデオデータから発話内容テキストを作成するときにデータ出力の切り換えを行う切換部４７０、４８０を備える。 The logo mark text conversion unit 400 also includes switching units 470 and 480 that switch the data output between when machine learning is performed and when logo mark text is created from new video data.

内容認識テキスト作成設定部４５０は、ロゴマーク内容認識部４２０のロゴマーク画像内容認識処理の設定と、ロゴマーク内容テキスト化部４３０のテキスト化処理の設定が格納されている。ロゴマーク画像情報抽出部４１０、ロゴマーク内容認識部４２０及びロゴマーク内容テキスト化部４３０は内容認識テキスト作成設定部４５０の設定した条件、パラメータに従ってロゴマークの抽出、内容認識及びテキスト化を行う。 The content recognition text creation setting unit 450 stores the settings of the logo mark image content recognition processing of the logo mark content recognition unit 420 and the settings of the text conversion processing of the logo mark content text conversion unit 430. The logo mark image information extraction unit 410, the logo mark content recognition unit 420, and the logo mark content text conversion unit 430 perform logo mark extraction, content recognition, and text conversion according to the conditions and parameters set by the content recognition text creation setting unit 450.

比較評価部４６０は、比較部４６１と評価部４６２とを備える。比較部４６１は、既存ビデオデータ格納部７１０からの既存ビデオデータを受けてロゴマーク内容テキスト化部４３０が作成したテキストと、既存ロゴマークテキスト格納部７５０からの既存背景画像テキストとを比較する。評価部４６２は比較部４６１の比較結果に基づいて評価を行い、よく一致した場合は高い点数を与える。 The comparative evaluation unit 460 includes a comparison unit 461 and an evaluation unit 462. The comparison unit 461 compares the text that the logo mark content text conversion unit 430 created from the existing video data supplied by the existing video data storage unit 710 with the existing logo mark text from the existing logo mark text storage unit 750. The evaluation unit 462 performs an evaluation based on the comparison result of the comparison unit 461 and gives a high score when the two texts match well.

機械学習部４４０は、評価部４６２からの評価を受け、内容認識テキスト作成設定部４５０の設定状態を変更する。この処理を同一のビデオデータについて繰り返し行い、評価部４６２の評価値をできるだけ高いものとする。この処理は複数のビデオデータについて繰り返し行うことができる。 The machine learning unit 440 receives the evaluation from the evaluation unit 462 and changes the setting state of the content recognition text creation setting unit 450. This process is repeated for the same video data to make the evaluation value of the evaluation unit 462 as high as possible. This process can be repeated for a plurality of video data.

このような機械学習を行うことにより、ロゴマーク内容認識部４２０及びロゴマーク内容テキスト化部４３０の能力が向上する。所定の機械学習を終了した後、ロゴマークテキスト化部４００は新規ビデオデータを処理して、最適な背景画像テキストを出力できる状態となる。 By performing such machine learning, the capabilities of the logo mark content recognition unit 420 and the logo mark content text conversion unit 430 are improved. After the predetermined machine learning is completed, the logo mark text conversion unit 400 is ready to process new video data and output optimum logo mark text.

ロゴマークテキスト化部４００の処理について説明する。図５（ｂ）に示すように、まず内容認識テキスト作成設定部４５０にロゴマークの特徴量を設定する（ステップＳＣ１）。この設定は機械学習部３４０の学習結果により行う。 The processing of the logo mark text conversion unit 400 will now be described. As shown in FIG. 5B, feature amounts of the logo mark are first set in the content recognition text creation setting unit 450 (step SD1). This setting is made based on the learning result of the machine learning unit 440.

次いで、ロゴマーク画像情報抽出部４１０が、設定された特徴に基づいてロゴマークを大量の映像信号の中から抽出する（ステップＳＤ２）。 Next, the logo mark image information extraction unit 410 extracts a logo mark from a large amount of video signals based on the set feature (step SD2).

更に、ロゴマーク内容認識部４２０が、設定された特徴に基づいて抽出した背景画像を解析し、自動的に確認して登録する（ステップＳＤ３）。 Further, the logo mark content recognition unit 420 analyzes the logo mark image extracted based on the set features, and automatically confirms and registers it (step SD3).

更に、ロゴマーク内容認識部４２０が、登録されたロゴマークや特定のロゴマークに合致したものをロゴマークとして認識する（ステップＳＤ４）。 Further, the logo mark content recognition unit 420 recognizes a registered logo mark or a thing that matches a specific logo mark as a logo mark (step SD4).

そして、ロゴマーク内容テキスト化部４３０が設定された特徴に基づいてロゴマークの内容をテキスト化してロゴマークテキストとして出力する（ステップＳＤ５）。 Then, the logo mark content text converting unit 430 converts the logo mark content into text based on the set feature and outputs it as logo mark text (step SD5).
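
The register-then-match behavior of steps SD3 to SD5 can be sketched as a lookup against a registry of known logo marks. The Python example below is a minimal sketch under stated assumptions: logo marks are represented by short numeric feature tuples, matching uses Euclidean distance with a hypothetical threshold `max_distance`, and the names `register`, `recognize_logo`, `logo_text`, "ACME", and "Globex" are invented for illustration.

```python
import math


def register(registry: dict, name: str, features) -> None:
    """SD3: store the features of a confirmed logo mark under its name."""
    registry[name] = tuple(features)


def recognize_logo(registry: dict, features, max_distance: float = 0.5):
    """SD4: return the registered logo mark closest to the features, if close enough."""
    best_name, best_distance = None, max_distance
    for name, reference in registry.items():
        distance = math.dist(reference, features)
        if distance <= best_distance:
            best_name, best_distance = name, distance
    return best_name


def logo_text(name) -> str:
    """SD5: express the recognized logo mark as text (empty when nothing matched)."""
    return f"Logo mark: {name}" if name else ""


if __name__ == "__main__":
    registry = {}
    register(registry, "ACME", (0.0, 0.0))
    register(registry, "Globex", (1.0, 1.0))
    print(logo_text(recognize_logo(registry, (0.1, 0.0))))
```

In the described system the features and the matching conditions would come from the content recognition text creation setting unit 450; here they are fixed by hand.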

＜テキスト統合部５００の機械学習＞
図６は同要約作成システムのテキスト統合部を示すブロック図である。テキスト統合部５００は、統合テキスト作成部５１０、統合テキスト作成設定部５２０、機械学習部５３０、比較評価部５４０を備える。テキスト統合部５００には、既存データ格納部７００が接続されている。 <Machine learning of text integration unit 500>
FIG. 6 is a block diagram showing a text integration unit of the summary creation system. The text integration unit 500 includes an integrated text creation unit 510, an integrated text creation setting unit 520, a machine learning unit 530, and a comparative evaluation unit 540. An existing data storage unit 700 is connected to the text integration unit 500.

テキスト統合部５００は、既存データ格納部７００が格納する既存の各種、すなわち、発話テキスト、テロップテキスト、背景テキスト及びロゴマークテキストと既存の統合テキストに基づいて機械学習を行い、統合テキスト作成部５１０の動作を最適化する。既存データ格納部７００には、過去に統合テキストを作成したときに使用した各種テキストデータを格納した既存各種テキスト格納部７６０と、この各種テキストから作成した統合テキストを格納した既存統合テキスト格納部７７０とを備える。これらの各種テキスト及び統合テキストは機械学習の教材となる。 The text integration unit 500 performs machine learning based on the various existing texts stored in the existing data storage unit 700, namely the utterance text, telop text, background text, and logo mark text, together with the existing integrated text, and optimizes the operation of the integrated text creation unit 510. The existing data storage unit 700 includes an existing various text storage unit 760, which stores the various text data used when integrated text was created in the past, and an existing integrated text storage unit 770, which stores the integrated text created from those various texts. These various texts and integrated texts serve as training material for the machine learning.

また、テキスト統合部５００には、機械学習を行うときと、新規の各種テキストから新たな統合テキストを作成するときにデータ出力の切り換えを行う切換部５７０、５８０を備える。 In addition, the text integration unit 500 includes switching units 570 and 580 that perform data output switching when performing machine learning and when creating a new integrated text from various new texts.

統合テキスト作成設定部５２０は、統合テキスト作成部５１０のテキスト統合処理の設定が格納されている。統合テキスト作成部５１０は統合テキスト作成設定部５２０の設定した条件、パラメータに従ってテキスト統合処理を行う。 The integrated text creation setting unit 520 stores text integration processing settings of the integrated text creation unit 510. The integrated text creation unit 510 performs text integration processing according to the conditions and parameters set by the integrated text creation setting unit 520.

比較評価部５４０は、比較部５４１と評価部５４２とを備える。比較部５４１は、既存各種テキスト格納部７６０からの既存各種テキストを受けて統合テキスト作成部５１０が作成した統合テキストと、既存統合テキスト格納部７７０からの既存統合テキストとを比較する。評価部５４２は比較部５４１の比較結果に基づいて評価を行い、よく一致した場合は高い点数を与える。 The comparative evaluation unit 540 includes a comparison unit 541 and an evaluation unit 542. The comparison unit 541 receives the existing various texts from the existing various text storage units 760 and compares the integrated text created by the integrated text creation unit 510 with the existing integrated texts from the existing integrated text storage unit 770. The evaluation unit 542 performs an evaluation based on the comparison result of the comparison unit 541, and gives a high score when the values match well.

機械学習部５３０は、評価部５４２からの評価を受け、統合テキスト作成設定部５２０の設定状態を変更する。この処理を同一の各種テキストデータについて繰り返し行い、評価部５４２の評価値をできるだけ高いものとする。この処理は複数の各種テキストデータについて繰り返し行うことができる。 The machine learning unit 530 receives the evaluation from the evaluation unit 542 and changes the setting state of the integrated text creation setting unit 520. This process is repeated for the same various text data to make the evaluation value of the evaluation unit 542 as high as possible. This process can be repeated for a plurality of various text data.

このような機械学習を行うことにより、統合テキスト作成部５１０の能力が向上する。所定の機械学習を終了した後、テキスト統合部５００は新規ビデオデータを処理して、最適な統合テキストを出力できる状態となる。 By performing such machine learning, the ability of the integrated text creation unit 510 is improved. After completing the predetermined machine learning, the text integration unit 500 processes the new video data and is in a state where it can output the optimum integrated text.

＜要約作成部６００の機械学習＞
図７は同要約作成システムの要約作成部を示すブロック図である。要約作成部６００は、要約テキスト作成部６１０、要約作成設定部６２０、機械学習部６３０、比較評価部６４０を備える。要約作成部６００には、既存データ格納部７００が接続されている。 <Machine learning of summary creation unit 600>
FIG. 7 is a block diagram showing a summary creation unit of the summary creation system. The summary creation unit 600 includes a summary text creation unit 610, a summary creation setting unit 620, a machine learning unit 630, and a comparative evaluation unit 640. An existing data storage unit 700 is connected to the summary creation unit 600.

要約作成部６００は既存データ格納部７００が格納する統合テキストと要約テキストに基づいて機械学習を行い、要約テキスト作成部６１０の動作を最適化する。既存データ格納部７００には、過去に要約テキストを作成したときに使用した統合テキストデータを格納した既存統合テキスト格納部７７０と、この統合テキストから作成した要約テキストを格納した既存要約テキスト格納部７８０とを備える。これらの統合テキスト及び要約テキストは機械学習の教材となる。 The summary creation unit 600 performs machine learning based on the integrated text and the summary text stored in the existing data storage unit 700, and optimizes the operation of the summary text creation unit 610. The existing data storage unit 700 includes an existing integrated text storage unit 770, which stores the integrated text data used when summary text was created in the past, and an existing summary text storage unit 780, which stores the summary text created from that integrated text. These integrated texts and summary texts serve as training material for the machine learning.

また、要約作成部６００には、機械学習を行うときと、新規の統合テキストから新たな要約テキストを作成するときにデータ出力の切り換えを行う切換部６７０、６８０を備える。 The summary creation unit 600 includes switching units 670 and 680 that perform data output switching when machine learning is performed and when a new summary text is created from a new integrated text.

要約作成設定部６２０には、要約テキスト作成部６１０の要約処理の設定が格納されている。要約テキスト作成部６１０は要約作成設定部６２０の設定した条件、パラメータに従ってテキスト要約処理を行う。 The summary creation setting unit 620 stores the summary processing settings of the summary text creation unit 610. The summary text creation unit 610 performs text summary processing according to the conditions and parameters set by the summary creation setting unit 620.

比較評価部６４０は、比較部６４１と評価部６４２とを備える。比較部６４１は、既存統合テキスト格納部７７０からの既存統合テキストを受けて要約テキスト作成部６１０が作成した要約テキストと、既存要約テキスト格納部７８０からの要約テキストとを比較する。評価部６４２は比較部６４１の比較結果に基づいて評価を行い、よく一致した場合は高い点数を与える。 The comparative evaluation unit 640 includes a comparison unit 641 and an evaluation unit 642. The comparison unit 641 compares the summary text created by the summary text creation unit 610 in response to the existing integration text from the existing integration text storage unit 770 and the summary text from the existing summary text storage unit 780. The evaluation unit 642 performs an evaluation based on the comparison result of the comparison unit 641 and gives a high score when the values match well.

機械学習部６３０は、評価部６４２からの評価を受け、要約作成設定部６２０の設定状態を変更する。この処理を同一の各種テキストデータについて繰り返し行い、評価部６４２の評価値をできるだけ高いものとする。この処理は複数の統合テキストデータについて繰り返し行うことができる。 The machine learning unit 630 receives the evaluation from the evaluation unit 642 and changes the setting state of the summary creation setting unit 620. This process is repeated for the same various text data to make the evaluation value of the evaluation unit 642 as high as possible. This process can be repeated for a plurality of integrated text data.

このような機械学習を行うことにより、要約テキスト作成部６１０の能力が向上する。所定の機械学習を終了した後、要約作成部６００は新規ビデオデータを処理して、最適な要約テキストを出力できる状態となる。 By performing such machine learning, the capability of the summary text creation unit 610 is improved. After completing the predetermined machine learning, the summary creation unit 600 can process the new video data and output an optimum summary text.

次に、要約作成システム１０の処理について説明する。図８は同要約作成システムの動作を示すフローチャートである。 Next, the processing of the summary creation system 10 will be described. FIG. 8 is a flowchart showing the operation of the summary creation system.
まず、既存データ格納部７００の既存ビデオデータ格納部７１０、既存発話テキスト格納部７２０、既存テロップテキスト格納部７３０、既存背景画像テキスト格納部７４０、既存ロゴマークテキスト格納部７５０、既存各種テキスト格納部７６０、既存統合テキスト格納部７７０、既存要約テキスト格納部７８０に既存のビデオ信号、各種テキストデータを読み込む（ステップＳＴ１）。 First, existing video signals and various text data are read into the existing video data storage unit 710, the existing utterance text storage unit 720, the existing telop text storage unit 730, the existing background image text storage unit 740, the existing logo mark text storage unit 750, the existing various text storage unit 760, the existing integrated text storage unit 770, and the existing summary text storage unit 780 of the existing data storage unit 700 (step ST1).

次いで発話テキスト化部１００、テロップテキスト化部２００、背景画像テキスト化部３００、ロゴマークテキスト化部４００において、機械学習処理を行う（ステップＳＴ２ａ、ＳＴ２ｂ、ＳＴ２ｃ、ＳＴ２ｄ）。この学習処理は逐次的に行うこともできる。 Next, machine learning processing is performed in the utterance text conversion unit 100, the telop text conversion unit 200, the background image text conversion unit 300, and the logo mark text conversion unit 400 (steps ST2a, ST2b, ST2c, and ST2d). This learning process can also be performed sequentially.

次に、テキスト統合部５００の既存データ格納部５５０、要約作成部６００の既存データ格納部６５０に既存の入力データ、出力データを読み込む（ステップＳＴ３）。次いで、テキスト統合部５００、要約作成部６００において機械学習処理を行う（ステップＳＴ３ａ、３ｂ）。この学習処理は逐次的に行うこともできる。なお、発話テキスト化部１００、テロップテキスト化部２００、背景画像テキスト化部３００、及びロゴマークテキスト化部４００の機械学習処理と、及びテキスト統合部５００及び要約作成部６００の機械学習処理とは処理の順序を問わず、逆の順序で行うことができる。 Next, the existing input data and output data are read into the existing data storage unit 550 of the text integration unit 500 and the existing data storage unit 650 of the summary creation unit 600 (step ST3). Machine learning processing is then performed in the text integration unit 500 and the summary creation unit 600 (steps ST3a and ST3b). This learning process can also be performed sequentially. Note that the machine learning processing of the utterance text conversion unit 100, the telop text conversion unit 200, the background image text conversion unit 300, and the logo mark text conversion unit 400, and the machine learning processing of the text integration unit 500 and the summary creation unit 600, may be performed in either order, including the reverse of the order described here.

学習処理が終了すると（ステップＳＴ４のyes）、処理対象となるビデオ信号をビデオ信号分離部２０に入力する（ステップＳＴ５）。これにより、発話テキスト化部１００、テロップテキスト化部２００、背景画像テキスト化部３００、ロゴマークテキスト化部４００は、テキスト化処理を実行する（ステップＳＴ６ａ、ＳＴ６ｂ、ＳＴ６ｃ、ＳＴ６ｄ）。 When the learning process is completed (yes in step ST4), the video signal to be processed is input to the video signal separation unit 20 (step ST5). Thereby, the utterance text conversion unit 100, the telop text conversion unit 200, the background image text conversion unit 300, and the logo mark text conversion unit 400 execute text conversion processing (steps ST6a, ST6b, ST6c, ST6d).

そして、各テキストをテキスト統合部５００で統合処理し（ステップＳＴ７）、更に統合されたテキストを要約作成部６００で要約処理し（ステップＳＴ８）、要約テキストを出力し、要約作成システム１０の処理は終了する。 Then, the texts are integrated by the text integration unit 500 (step ST7), the integrated text is summarized by the summary creation unit 600 (step ST8), and the summary text is output, which completes the processing of the summary creation system 10.

次の要約作成処理からは、機械学習処理（ステップＳＴ１〜ＳＴ４）は行わなくて直ちに要約作成の対象ビデオ信号の入力（ステップＳＴ５）をするだけで、最適な要約作成を行うことができる。また、機械学習処理は必要に応じて行うことができる。 From the next summary creation process, the optimum summary creation can be performed only by inputting the target video signal for summary creation (step ST5) immediately without performing the machine learning process (steps ST1 to ST4). The machine learning process can be performed as necessary.

以上のシステムは、処理装置としてのＣＰＵ（ＣｅｎｔｒａｌＰｒｏｃｅｓｓｉｎｇＵｎｉｔ）、記憶装置としてＲＡＭ（ＲａｎｄｏｍＡｃｃｅｓｓＭｅｍｏｒｙ）、ＲＯＭ（ＲｅａｄＯｎｌｙＭｅｍｏｒｙ）、ＨＤＤ（ＨａｒｄＤｉｓｃＤｒｉｖｅ）、ＳＳＤ（ＳｏｌｉｄＳｔａｔｅＤｒｉｖｅ）等を備えたコンピュータシステムでアプリケーションションソフトウエアを実行して実現できる。また、各部は同一ヶ所に配置される必要はなく、一部をクラウド上に配置してネットワークで接続して実現することができる。また、これらの処理は、大量のデータを対象とするためＧＰＵ（ＧｒａｐｈｉｃｓＰｒｏｃｅｓｓｉｎｇＵｎｉｔ）を使用して処理することが好ましい。 The above system can be realized by executing application software on a computer system that includes a CPU (Central Processing Unit) as a processing device and a RAM (Random Access Memory), a ROM (Read Only Memory), an HDD (Hard Disc Drive), an SSD (Solid State Drive), and the like as storage devices. The units do not need to be located in the same place; some of them can be placed on the cloud and connected via a network. Because these processes handle a large amount of data, it is preferable to use a GPU (Graphics Processing Unit) for the processing.

すなわち、統合テキストは、単に、音声、文字,背景映像等の文字化してものであり、膨大な文章についてのデータである。コのため、ＧＰＵをテキスト処理に特化することにより高速に処理できる。 That is, the integrated text is simply speech, on-screen characters, background video, and the like converted into text, and amounts to data on an enormous volume of sentences. For this reason, dedicating the GPU to text processing allows the data to be processed at high speed.

また、テキスト統合部５００によるテキスト入力は、発話テキスト、テロップテキスト、背景画像テキスト及びロゴマークテキストに限定されない。 Further, text input by the text integration unit 500 is not limited to speech text, telop text, background image text, and logo mark text.

例えば、テレビ番組（地上デジタルテレビ放送番組）を対象とする場合、電子番組表（ＥＰＧ）、字幕放送、解説放送（二か国語放送等を含む）から取得した文字や音声をテキストとして取得して入力することができる。これにより、統合テキストの質と量とを向上させるとともに、テキストの汎用性や嗜好性を向上させることができる。 For example, when a television program (terrestrial digital television broadcast program) is the target, characters and audio obtained from an electronic program guide (EPG), closed-caption broadcasts, and commentary broadcasts (including bilingual broadcasts and the like) can be acquired as text and input. This improves the quality and quantity of the integrated text, as well as the versatility of the text and its fit to user preferences.

同様に、インターネット映像配信を対象とする場合、第三者の評価（コメントを含む）や評判をテキストとして取得して入力できる。これにより、統合テキストの質と量とを向上させるとともに、テキストの汎用性や嗜好性を向上させることができる。 Similarly, when Internet video distribution is the target, third-party evaluations (including comments) and reputation can be acquired as text and input. This improves the quality and quantity of the integrated text, as well as the versatility of the text and its fit to user preferences.

なお、後述する「報知」のためのトリガーとして、これらの字幕放送や解説放送、あるいは、第三者の評価・評判等も対象とすることができる。 In addition, as a trigger for “notification” to be described later, these caption broadcasting and commentary broadcasting, or evaluation / reputation of a third party, etc. can be targeted.

［他の実施形態］ [Other Embodiments]
本発明にあってはデータ処理をＡＩ（人工知能：ＡｒｔｉｆｉｃｉａｌＩｎｔｅｌｌｉｇｅｎｃｅ）処理により高速且つ適切に処理して要約化する。ＡＩ処理は、上述した機械学習（ＭＬ：ＭａｃｈｉｎｅＬｅａｒｎｉｎｇ）により実現できる。更に、機械学習として、既存データを正解とする教師有り学習が採用できる。また、機械学習としてディープラーニング（深層学習：ＤＬ：ＤｅｅｐＬｅａｒｎｉｎｇ）により行うと効果的である。 In the present invention, data processing is performed quickly and appropriately by AI (Artificial Intelligence) processing to produce summaries. The AI processing can be realized by the machine learning (ML) described above. Furthermore, supervised learning that treats existing data as the correct answers can be adopted as the machine learning. It is also effective to perform the machine learning by deep learning (DL).

ディープラーニングでは、既存の多数のビデオデータ、各ビデオデータに対応する各種テキストデータ、統合テキスト、要約テキストをビッグデータとして学習を行う。この、各機械学習部は、入力層、複数の中間層、出力層を備え、多数のニューロンを備えたニューラルネットワークにより処理を行い。すなわち、本発明に係る要約作成システムに入力された新規ビデオデータ、このビデオデータによる各種テキスト、統合テキスト、要約を入力とした出力が、既存の各種テキスト、統合テキスト、要約に近づくように中間層のニューロンにおける重み、パラメータを最小二乗法等の手法で適正化する。 In deep learning, learning is performed using a large number of existing video data, the various text data corresponding to each video data, the integrated texts, and the summary texts as big data. Each machine learning unit performs processing with a neural network that has an input layer, a plurality of intermediate layers, and an output layer, and that contains a large number of neurons. That is, the weights and parameters of the neurons in the intermediate layers are optimized by a method such as the least squares method so that the outputs obtained when the new video data input to the summary creation system according to the present invention, the various texts derived from that video data, the integrated text, and the summary are given as inputs approach the existing various texts, integrated texts, and summaries.

上記の基本構成を一例として、本願発明は、例えば、一つの番組（コンテンツ）に含まれる所望の条件に適合した内容が出力される際に、その旨を利用者（視聴者・オペレータ等）に報知することにより、利用者に有用な映像部分のみを視聴するよう喚起することを目的として、先のコンテンツに基づいて予め蓄積された要約を参照して出力中のコンテンツを構成するデータに所望の条件に適合した内容が含まれているか否かを判定する判定手段と、前記判定手段で含まれていると判定した場合に出力中のコンテンツの利用者に対してその旨を報知する報知手段と、を備えるものである。 Taking the above basic configuration as an example, the present invention aims to encourage the user (viewer, operator, or the like) to view only the useful portions of a video by notifying the user when content that meets a desired condition, contained in one program (content), is output. To that end, the invention includes: determination means that refers to summaries accumulated in advance on the basis of previous content and determines whether the data constituting the content being output includes content that meets the desired condition; and notification means that notifies the user of the content being output when the determination means determines that such content is included.

図９は、本発明の実施形態に係る映像情報提供システムの全体構成を示すブロック図である。 FIG. 9 is a block diagram showing the overall configuration of the video information providing system according to the embodiment of the present invention.

なお、図９において、映像情報提供システム１は、上述した要約作成システム１０を専用の管理サーバ等によって構成し、その管理サーバによって作成された要約に基づいて稼働する映像出力システム部分を、例えば、コンピュータ機能を備えるテレビ、パーソナルコンピュータ、スマートフォン、タブレット端末等（以下、「再生装置９」と総称する。）で実現してもよい。なお、再生装置９は、１台での利用のほか、複数台での利用も可能である。この場合、各台で所望の条件を変更することも可能である。 In FIG. 9, the video information providing system 1 may be configured such that the summary creation system 10 described above is implemented on a dedicated management server or the like, and the video output portion that operates on the basis of the summaries created by that management server is implemented on, for example, a television with computer functions, a personal computer, a smartphone, a tablet terminal, or the like (hereinafter collectively referred to as the “playback device 9”). The playback device 9 may be used as a single unit or as multiple units. In the latter case, a different desired condition can be set on each unit.

また、以下の説明においては、テレビ放映の場合を主として説明するとともに、クラウド上の映像配信の固有の場合は適宜説明し、テレビ放映の利用形態と同一若しくは実質的に同一のクラウド上の映像配信の利用形態に関しては、その説明を省略する。 In the following description, the case of broadcasting on television will be mainly described, and the case of video distribution specific to the cloud will be described as appropriate, and the distribution of video on the cloud that is the same as or substantially the same as the usage form of television broadcasting. The description of the usage form is omitted.

テレビ放映には、地上波デジタル放送、衛星放送、ワンセグ放送、インターネット放送等、特に放送形態や受信形態は問わない。 Television broadcasting here includes terrestrial digital broadcasting, satellite broadcasting, one-segment broadcasting, Internet broadcasting, and the like; the form of broadcasting or reception is not particularly limited.

図９において、映像情報提供システム１は、上述したテレビ局３０（若しくはクラウド上の映像配信サーバ）からコンテンツに関するビデオ信号を受信するチューナ等を備える受信部２と、再生装置９に装備の操作部（リモコン等を含む）３と、再生装置９としての各種機能を実現するためのアプリケーションを格納した記憶部４と、記憶部４に記憶したアプリケーションに基づいて各種機能を処理する制御回路部５と、上述した要約作成システム（ビデオ信号分離部２０、発話テキスト化部１００、テロップテキスト化部２００、背景画像テキスト化部３００、ロゴマークテキスト化部４００、テキスト統合部５００、及び要約作成部６００）によって作成した要約並びにコンテンツの録画用の各種データを記憶する大容量記憶部６と、音声出力用のスピーカや映像出力用のモニタを含む出力部７と、を備える。 In FIG. 9, the video information providing system 1 includes: a receiving unit 2 equipped with a tuner or the like that receives video signals for content from the television station 30 described above (or from a video distribution server on the cloud); an operation unit 3 (including a remote control or the like) provided on the playback device 9; a storage unit 4 that stores applications for realizing the various functions of the playback device 9; a control circuit unit 5 that executes the various functions on the basis of the applications stored in the storage unit 4; a large-capacity storage unit 6 that stores the summaries created by the summary creation system described above (the video signal separation unit 20, the utterance text conversion unit 100, the telop text conversion unit 200, the background image text conversion unit 300, the logo mark text conversion unit 400, the text integration unit 500, and the summary creation unit 600) together with various data for recording content; and an output unit 7 that includes a speaker for audio output and a monitor for video output.

なお、図９において、発話テキスト化部１００、テロップテキスト化部２００、背景画像テキスト化部３００、ロゴマークテキスト化部４００は、その全体をビデオ信号処理部８として説明する。したがって、再生装置９は、図９に示した構成要件のうち、ビデオ信号処理部８を除く、受信部２、操作部３、記憶部４、制御回路部５、大容量記憶部６、出力部７を有している。また、制御回路部５は判定手段としての機能を具備し、出力部７は、例えば、重み付け付与手段としての制御回路部５で算出した重み付け付与に基づく、付与結果をモニタ出力或いはプリンタ出力する機能を具備している。 In FIG. 9, the utterance text conversion unit 100, the telop text conversion unit 200, the background image text conversion unit 300, and the logo mark text conversion unit 400 are collectively described as the video signal processing unit 8. Accordingly, of the components shown in FIG. 9, the playback device 9 has the receiving unit 2, the operation unit 3, the storage unit 4, the control circuit unit 5, the large-capacity storage unit 6, and the output unit 7, that is, everything except the video signal processing unit 8. The control circuit unit 5 functions as the determination means, and the output unit 7 has, for example, a function of outputting to a monitor or a printer the result of the weighting calculated by the control circuit unit 5 acting as weighting means.

＜映像メタデータの制作・配信＞ <Production and distribution of video metadata>
ここでは、映像メタデータを制作して配信若しくは配信可能とする場合の一例として、テレビ放送内容を日本語処理してデータベース化する場合を説明する。また、この場合にコンテンツとは、一つの番組（又はコーナー）を対象として例示する。 Here, as an example of producing video metadata and distributing it or making it available for distribution, a case is described in which the content of television broadcasts is subjected to Japanese language processing and compiled into a database. In this case, one program (or one segment of a program) is used as the example of content.

テレビ番組において、特に、刻々と放送されるニュース・放送番組にあっては、「即時性」や「正確性」が重要となっている。 Among television programs, “immediacy” and “accuracy” are particularly important for news and broadcast programs that air continuously throughout the day.

その一方で、テレビ放送におけるこのようなニュース・放送番組にあっては、一部のニュース内容が時間帯の異なる他のニュース番組等（放送局の相違は問わない）で放送されることはあるものの、同一番組が異なる曜日に再放送されることはなく、消えゆく情報ともいえる。 On the other hand, in such news / broadcast programs in television broadcasting, some news content may be broadcast on other news programs with different time zones (regardless of differences in broadcasting stations). However, it can be said that the same program is not rebroadcast on different days of the week and disappears.

このような「即時性」や「正確性」を有する情報にあっては、ニュース内容によって、社会的な重要性やニーズ、或は、新情報が明らかになる、などの条件によって継続性を有する場合があるため、例えば、出現回数等が所定値に達するなどの重要度・ニーズ度等に応じてニュースが重み付けされるのが望ましい。 Information with such “immediacy” and “accuracy” may, depending on the news content, persist over time because of conditions such as its social importance, demand for it, or the emergence of new information. It is therefore desirable to weight news items according to their degree of importance or demand, for example according to whether the number of appearances has reached a predetermined value.

ここで、重要度・ニーズ度には、短期的、長期的、時期的な要素を有していることから、例えば、週間、月間、季間（旬間）、年間、別の統計によって重み付けしたグラフを作成することも可能である。この際、作成されたグラフは、出力部７からモニタ出力又はプリンタ出力が可能である。 Here, because the degree of importance or demand has short-term, long-term, and seasonal components, it is also possible to create graphs weighted by separate weekly, monthly, seasonal, and annual statistics, for example. The resulting graphs can be output from the output unit 7 to a monitor or a printer.

これにより、短期間での重要度・ニーズ度は高いものの。年間を通じた場合に重要度・ニーズ度が低くなってしまうことを抑制することができるうえ、対応する時期における重要度・ニーズ度が高いという重み付けを付与することができる。 This prevents an item whose importance or demand is high over a short period from being rated low when viewed over the whole year, and allows a weighting to be assigned that reflects its high importance or demand in the corresponding period.

具体的には、「桜の開花予想」「桜の名所」「オリンピック」などの特定の周期で重要度・ニーズ度が高くなる場合等に有効な重み付けを付与することができる。 Specifically, an effective weighting can be given when the degree of importance / needs increases in a specific cycle such as “forecasting of cherry blossoms”, “famous spots for cherry blossoms”, and “the Olympics”.
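The period-based weighting described above can be illustrated with a small sketch. The input format (a list of dated topic labels) and the weight definition (the topic's share of all items in each period) are assumptions chosen for the example; the patent does not define a specific formula.

```python
from collections import Counter
from datetime import date
from typing import Dict, Iterable, Tuple


def period_key(d: date, period: str) -> Tuple[int, ...]:
    """Map a date to a bucket for the chosen statistics period."""
    if period == "week":
        iso = d.isocalendar()
        return (iso[0], iso[1])          # ISO year and ISO week number
    if period == "month":
        return (d.year, d.month)
    if period == "year":
        return (d.year,)
    raise ValueError(f"unsupported period: {period}")


def topic_share_by_period(
    items: Iterable[Tuple[date, str]], topic: str, period: str
) -> Dict[Tuple[int, ...], float]:
    """Return, for each period, the fraction of news items that concern `topic`."""
    totals: Counter = Counter()
    hits: Counter = Counter()
    for d, label in items:
        key = period_key(d, period)
        totals[key] += 1
        if label == topic:
            hits[key] += 1
    return {key: hits[key] / totals[key] for key in totals}
```

With data in which a topic such as cherry blossoms is concentrated in one month, the monthly share for that month is high even though the annual share is modest, which is the effect the passage describes.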

また、新たに放送されるビデオ情報に対するメタデータは、１０分程度のタイムラグで逐次更新することができ、最新の情報に基づいた重要度等に更新することができる。この際、複数の放送局の番組を同時に受信して最新の情報に更新することも可能である。 Also, metadata for newly broadcast video information can be sequentially updated with a time lag of about 10 minutes, and can be updated to importance based on the latest information. At this time, it is also possible to simultaneously receive programs from a plurality of broadcasting stations and update them to the latest information.

メタデータには、放送局や放送時間等の基本情報に加え、ニュースのタイトル、内容の抄録、コメンテータの氏名や目立つロゴ、といったテキスト情報に加え、背景画像等の画像認証、キャスターの顔認証、声紋分析、等によってより細かい映像メタデータを制作・配信することができる。 In addition to basic information such as the broadcasting station and broadcast time, and text information such as the news title, an abstract of the content, commentators' names, and prominent logos, more detailed video metadata can be produced and distributed by means of image authentication of background images and the like, face authentication of newscasters, voiceprint analysis, and so on.

さらに、その結果は、Ｗｅｂやメールにより、ユーザ側で確認することも可能となっている。したがって、ユーザ側において、これらの映像メタデータをハードディスク等の大容量記憶媒体に保存・蓄積していけば、さまざまな活用場面に利用することができる。 Further, the result can be confirmed on the user side by Web or mail. Therefore, if the video metadata is stored and accumulated on a large-capacity storage medium such as a hard disk on the user side, it can be used in various utilization situations.

具体的には、日々のニュース放送から、特定のコメンテータの言動をクローズアップして詳細を「完全収録」し、追って、その内容を検証することも可能となる。また、その特定のコメンテータを報知条件（トリガー）としておけば、現在放送中のニュース番組、或は、録画したニュース番組において、そのコメンテータがコメントしている際に、スポット的にボリュームを上げる等の報知も可能となる。 Specifically, from daily news broadcasts, it is possible to close up the behavior of a particular commentator and “completely record” the details, and to verify the contents later. In addition, if that particular commentator is used as a notification condition (trigger), when the commentator is commenting on a news program currently being broadcast or a recorded news program, the volume is increased in a spot manner. Notification is also possible.

なお、このような報知条件に適合した内容が含まれている場合に、出力中のコンテンツの利用者（視聴者）に対する報知には、上述したボリュームを上げる場合のほか、メッセージ等を発音するなどの利用者の聴覚に対して行うことができる。また、利用者の聴覚に対する報知のほか、例えば、表示画面７ａ（図１０参照）の明暗反転の繰り返しや専用ランプの点灯・点滅など、利用者の視覚に対する報知でもよい。また、これら聴覚と視覚との併用でもよい。さらに、単なる報知にとどまらず、他の動作（例えば、録画）を開始するためのトリガー信号として利用することも可能である。 In addition, when contents conforming to such notification conditions are included, notification to the user (viewer) of the content being output includes not only increasing the volume described above but also sounding a message or the like This can be done for the user's hearing. In addition to notification to the user's hearing, notification to the user's vision, such as repeated reversal of the brightness of the display screen 7a (see FIG. 10) or lighting / flashing of a dedicated lamp, may be used. Further, a combination of hearing and vision may be used. Furthermore, it can be used as a trigger signal for starting other operations (for example, recording) as well as mere notification.

さらに、番組中に流れる映像中の登場人物、例えば、上述した特定コメンテータのコメント時間や論調分析、放送された内容中（番組中）に紹介された政治家（政党）やスポーツ選手の映像等を含む放送時間といった、映像メタデータのデータベース化を行うとともに、クラスタリング（データを外的基準なしに自動的に分類する機能等）を行うことにより、人・物のＣＭ換算値を算出するといった重み付けの付与も可能である。 Furthermore, by compiling into a database video metadata about the people who appear in the video shown during a program, such as the comment time and tone analysis of the specific commentator mentioned above, or the broadcast time that includes footage of politicians (political parties) or athletes introduced in the broadcast content (during the program), and by performing clustering (for example, a function that automatically classifies data without external criteria), it is also possible to assign weightings such as calculating a CM conversion value for a person or an object.

なお、蓄積された過去の要約作成結果の入力データと出力データとを教材として最適な要約作成設定を学習する要約作成システム１０の機能である要約作成処理（ＡＩ処理）を利用して上述したような重み付けを付与する場合、ＡＩ処理とは別に、視聴率、或は、新聞や雑誌等の映像メタデータに含まれていない情報に基づいたオペレータの手動入力により、ＣＭ換算値を人物毎に評価価格（単位時間当たりの単価）に変換してもよい。 As described above using the summary creation process (AI process) that is a function of the summary creation system 10 that learns the optimum summary creation setting using the input data and output data of the past summary creation results accumulated as teaching materials. In addition to the AI processing, the CM conversion value is evaluated for each person by the manual input of the operator based on the audience rating or information not included in the video metadata such as newspapers and magazines. You may convert into a price (unit price per unit time).

さらに、重み付けされたＣＭ換算値は、例えば、単一放送局、単一番組、複数放送局（例えば、関東エリアのキー局）等を対象として映像メタデータを制作し、週報／月報／旬報（四半期）／半期／通期／単位でまとめることができる。なお、まとめたデータはグラフや一覧表（例えば、上位１００人を対象として）等によって出力部７からモニタ出力又はプリンタ出力が可能である。 Furthermore, for the weighted CM conversion values, video metadata can be produced for, for example, a single broadcasting station, a single program, or a plurality of broadcasting stations (for example, the key stations in the Kanto area), and compiled in weekly, monthly, seasonal (quarterly), half-year, or full-year units. The compiled data can be output from the output unit 7 to a monitor or a printer as graphs, lists (for example, covering the top 100 people), and the like.

さらに、テキスト化した映像メタデータは、同時放送中の文字放送として利用することができるうえ、例えば、テレビのニュース・放送番組、ワイドショー、討論番組、政治・経済番組、政治・経済バラエティなど、１日単位で延べ１００時間以上にもおよぶ国営放送局及び民放キー局の情報番組について、その内容や記事単位の詳細情報をオペレータによって作成するためのテキスト情報として利用することも可能である。 In addition, text-formatted video metadata can be used as teletext broadcasting during simultaneous broadcasting. For example, TV news / broadcast programs, wide shows, discussion programs, political / economic programs, political / economic variety, etc. It is also possible to use the contents and the detailed information for each article as text information to be created by the operator for the information programs of the state-run broadcasting stations and private key stations that extend over 100 hours per day.

＜再生装置９＞ <Playback device 9>
再生装置９には、受信部２として、テレビ放送（地デジ・衛星放送・ワンセグ）用のチューナ機能、或は、インターネット配信映像を受信する受信機能、を有し、図１０に示すように、その映像を出力部７の表示画面７ａに出力することが可能であれば、テレビ、パーソナルコンピュータ、スマートフォン、タブレット端末、等を利用することができる。 The playback device 9 may be a television, a personal computer, a smartphone, a tablet terminal, or the like, provided that it has, as the receiving unit 2, a tuner function for television broadcasting (terrestrial digital, satellite, or one-segment broadcasting) or a receiving function for Internet-distributed video, and can output the video to the display screen 7a of the output unit 7 as shown in FIG. 10.

受信部２は、要約作成システム１０によって作成した要約を適宜（又は逐次）受信する機能を有する。なお、受信部２で受信した要約は、大容量記憶部６に記憶（又は更新）される。 The receiving unit 2 has a function of appropriately (or sequentially) receiving the summary created by the summary creation system 10. The summary received by the receiving unit 2 is stored (or updated) in the mass storage unit 6.

操作部３は、テレビに付帯の各種スイッチ等、テレビに付属のリモートコントロール装置、コンピュータ用のマウスやキーボード、スマートフォンやタブレット端末に付帯の各種スイッチやタッチパネル、等を利用することができる。 The operation unit 3 can use various switches attached to the television, a remote control device attached to the television, a mouse and keyboard for computers, various switches and touch panels attached to smartphones and tablet terminals, and the like.

これにより、ニュース番組において、利用者がスポーツニュースの結果のみを知りたい場合、ニュース番組全体を視聴するのではなく、制御回路部（判定手段）５の監視により、例えば、図１０（Ａ）に示すように、表示画面７ａに「スポーツ」の文字がテロップ表示された場合や、図１０（Ｂ）に示すように、キャスターが「スポーツ」を含むアナウンス原稿を読み上げたときに、利用者に報知することができる。 As a result, when a user watching a news program wants to know only the sports news results, the user does not need to watch the entire program. Instead, through monitoring by the control circuit unit (determination means) 5, the user can be notified when, for example, the word “sports” appears as a telop on the display screen 7a as shown in FIG. 10(A), or when the newscaster reads out an announcement script containing “sports” as shown in FIG. 10(B).
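The monitoring described above can be sketched as a keyword check over the telop text and speech text of each segment. The `Segment` structure, its field names, and the case-insensitive substring match are assumptions made for illustration; the patent leaves the matching method to the learned determination means.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Segment:
    """One short stretch of content with its recognized text (hypothetical structure)."""
    start_sec: float
    telop_text: str = ""    # text recognized from on-screen captions
    speech_text: str = ""   # text recognized from the announcer's speech


def matches_condition(segment: Segment, keywords: Iterable[str]) -> bool:
    """True if any keyword appears in the segment's telop or speech text."""
    text = f"{segment.telop_text} {segment.speech_text}".casefold()
    return any(keyword.casefold() in text for keyword in keywords)


def notification_times(segments: Iterable[Segment], keywords: List[str]) -> List[float]:
    """Start times of the segments for which the user should be notified."""
    return [s.start_sec for s in segments if matches_condition(s, keywords)]
```

In a playback device, each returned time would trigger one of the notification forms listed earlier, such as raising the volume or flashing the display.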

ところで、上述したテレビ放送において、ニュースでは、ある事件が起きると、複数局あるテレビ放送局が繰り返し同じシーンを放送する。このような場合、各テレビメディアが何をいつどう放送したか、一つ一つ把握しても全体像を容易に認識することはできない場合が多い。 Incidentally, in the television news described above, when an incident occurs, multiple television stations broadcast the same scenes repeatedly. In such cases, it is often difficult to grasp the overall picture even by checking one by one what each television outlet broadcast, when, and how.

そこで、このような事件を所望の条件として設定すれば、指定した全てのニュース放送番組の内容を秒単位でテキストデータ化したうえでデータベース化し、要約を作成することができる。 Therefore, if such an incident is set as the desired condition, the contents of all designated news broadcast programs can be converted into text data at one-second granularity, compiled into a database, and summarized.

そして、その要約の内容を同一テーマ毎に分類（クラスター化）した結果分析（例えば、利用者や契約した専用会社のオペレータの処理）すれば、なにが、いつ、どの局で、どのくらい放送されたか、定量化された情報を得ることも可能となる。 Then, if the contents of the summary are classified (clustered) by the same theme and analyzed (for example, processing by the user or the operator of the contracted dedicated company), what will be broadcast at what station, when and how much. It is also possible to obtain quantified information.

そして、このような定量化された情報を、所望の条件として設定することにより、以降のニュース放送では、より最新の正確な条件を設定することも可能となり、上述した事件に関する放送の場合には報知視聴、他のニュース放送に関しては通常視聴、といったように切り替えることができる。 And by setting such quantified information as a desired condition, it becomes possible to set a more recent and accurate condition in subsequent news broadcasts. It is possible to switch between notification viewing and other news broadcasts such as normal viewing.

この定量化に際し、例えば、事件の映像部分（例えば、原子力発電所の事故処理の経過に関する映像部分）を大容量記憶部６に自動録画するなどの出力機能を重み付けとして付与することも可能である。 At the time of this quantification, for example, an output function such as automatically recording the video portion of the incident (for example, the video portion regarding the progress of accident processing at the nuclear power plant) in the large-capacity storage unit 6 can be given as a weight. .

また、上述したように、このような事件・事故に関する放送がテレビメディアでどのくらい扱われたか、どの局がどのテーマを時間・回数的にどう扱ってきたかをグラフ化するといった利用形態への重み付けも可能である。 In addition, as described above, weighting of usage patterns such as how much such broadcasts related to incidents and accidents were handled in television media, and which stations have handled which themes in terms of time and frequency, is also weighted. Is possible.

さらに、このような要約には、ニュース放送に限らず、各種エンターテイメント番組の内容を多角的に分析することも可能である。 Furthermore, such a summary is not limited to news broadcasting, but it is also possible to analyze the contents of various entertainment programs from various perspectives.

これにより、例えば、網羅的に構築されたエンタメ・データベースをもとに、ドラマ、映画、バラエティなどのエンターテイメント番組の内容やジャンル比較、時間帯把握など、多角的な観点で分析することができる。 Thereby, for example, based on an entertainment database constructed in an exhaustive manner, it is possible to analyze from various viewpoints such as contents of entertainment programs such as dramas, movies, variety, genre comparison, and time zone grasp.

したがって、例えば、バラエティ番組の出演者のうち、顔認証による特定の出演者の映像が出力された際の時間、音声認証（声紋）による特定の出演者の音声が出力された際の時間や内容、等を学習し、以降の放送での当該特定の出演者が出演している番組中の当該特定の出演者が画面上で放送されている場合や発言している場合を利用者に報知することができ、視聴者の汎用性を向上することができる。 Therefore, for example, among the performers in a variety program, the system can learn the times at which video of a specific performer identified by face authentication is output, and the times and content of the audio of that performer identified by voice authentication (voiceprint). In subsequent broadcasts of programs in which that performer appears, the system can then notify the user when the performer is on screen or speaking, which improves versatility for the viewer.

さらに、当該特定の出演者の出演時間を換算し、例えば、日・週・月単位での出演割合等からその演者価値を容易に算出することができる。 Furthermore, the performance time of the specific performer can be converted, and for example, the performer value can be easily calculated from the appearance ratio in units of days, weeks, and months.
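As a rough illustration of the appearance ratio mentioned above, the sketch below divides a performer's on-screen seconds by the total monitored broadcast seconds for each day. The input shapes and the choice of a per-day ratio are assumptions; how the ratio is turned into a performer value is not specified in the patent and is left out.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, Tuple


def appearance_ratio_by_day(
    appearances: Iterable[Tuple[date, float]],
    broadcast_seconds: Dict[date, float],
) -> Dict[date, float]:
    """Share of monitored airtime in which the performer appeared, per day.

    `appearances` holds (day, seconds on screen) records, for example taken
    from face-authentication results; `broadcast_seconds` holds the total
    monitored airtime for each day. Days with no airtime are skipped.
    """
    on_screen: Dict[date, float] = defaultdict(float)
    for day, seconds in appearances:
        on_screen[day] += seconds
    return {
        day: on_screen[day] / total
        for day, total in broadcast_seconds.items()
        if total > 0
    }
```

Weekly or monthly ratios follow the same pattern by grouping the records under a week or month key instead of the calendar day.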

また、上述した出演者の音声は、音声認識後のテキスト化のための形態素解析の際に、方言を標準語へと変換する重み付けを付与することも可能である。 In addition, the voice of the performer described above can be given a weight for converting a dialect into a standard word when performing morphological analysis for text conversion after voice recognition.

したがって、制御回路部５は、要約作成システム１０によって作成した要約を適宜（又は逐次）受信して大容量記憶部６に蓄積するとともに、その要約の蓄積結果に基づいて（重み付け付与のための）最適な条件を学習しつつ、複数のコンテンツに対して要約に含まれる一つ以上の所定の条件に特化した重み付けを付与することができる。 Therefore, the control circuit unit 5 receives the summary created by the summary creation system 10 as appropriate (or sequentially) and stores it in the large-capacity storage unit 6, and based on the storage result of the summary (for weighting). While learning the optimum conditions, weighting specialized for one or more predetermined conditions included in the summary can be given to a plurality of contents.

同様に、制御回路部５は、要約作成システム１０によって作成した要約を適宜（又は逐次）受信して大容量記憶部６に蓄積するとともに、その要約の蓄積結果に基づいて（報知ための）最適な条件（例えば、顔認証や声紋認証による人物の特定）を学習して所望の条件に適合した内容が含まれているか否かの判定精度を向上させつつ、出力中のコンテンツの利用者に対して所望の条件に適合した旨を出力部７で報知させることができる。 Similarly, the control circuit unit 5 receives the summaries created by the summary creation system 10 as appropriate (or sequentially) and accumulates them in the large-capacity storage unit 6. Based on the accumulated summaries, it learns the optimum conditions (for notification), for example the identification of a person by face authentication or voiceprint authentication, thereby improving the accuracy of determining whether content that meets the desired condition is included, and it causes the output unit 7 to notify the user of the content being output that the desired condition has been met.

このように、希望映像情報報知システム１は、先のコンテンツに基づいて予め蓄積された要約を参照して出力中のコンテンツを構成するデータに所望の条件に適合した内容が含まれているか否かを制御回路部５で判定（監視）し、その判定結果が含まれているとした場合には、出力中のコンテンツの利用者に対してその旨を出力部７から報知することによって、コンテンツを出力している際に、コンテンツ全体を視聴するのではなく、所望の条件に適合した内容が出力される場合にのみ視聴を行うことができる。 In this way, the desired video information notification system 1 refers to summaries accumulated in advance on the basis of previous content, and the control circuit unit 5 determines (monitors) whether the data constituting the content being output includes content that meets the desired condition. When it determines that such content is included, the output unit 7 notifies the user of the content being output to that effect. As a result, while content is being output, the user need not view the entire content and can view only when content that meets the desired condition is output.

したがって、出力中のコンテンツに所望の条件に適合した内容が出力される場合にのみ視聴を行えばよいため、それ以外の出力中は他の作業を行うなどの、「ながら視聴」を行うことができ、汎用性を向上することができる。 Therefore, the user only needs to watch when content that meets the desired condition is being output, and can do other work at other times. This enables “viewing while doing something else” and improves versatility.

また、コンテンツがテレビ放映である場合、例えば、一つの番組であっても利用者によって視聴したいのは番組全体とは限らず、特定のコーナーや出演者のみである場合がある。 Further, when the content is a television broadcast, what the user wants to watch is not necessarily the entire program, even for a single program; it may be only a specific segment or performer.

そこで、リアルタイムで視聴している放映データに含まれる音声データ又は映像データの少なくとも一方に所望の条件に適合した内容が含まれているか否かをリアルタイムで判定手段が判定することにより、利用者が視聴したいとする所望の条件に適合した内容が含まれている部分に差し掛かったときに、報知信号を出力すれば、所望の出力を容易に視聴することが可能となる。 Therefore, the determination means determines in real time whether at least one of the audio data or the video data included in the broadcast data being viewed in real time includes content that meets a desired condition. If a notification signal is output when a portion that contains content that conforms to a desired condition to be viewed is reached, the desired output can be easily viewed.

また、コンテンツがインターネット回線等の電気通信回線を利用して受信した映像コンテンツ等の配信データである場合、その映像コンテンツが編集されたものであっても利用者によって視聴したいのはコンテンツ全体とは限らず、その一部のみである場合がある。 Also, if the content is distribution data such as video content received using a telecommunication line such as the Internet line, what the user wants to view is the entire content even if the video content is edited There is a case where it is not limited but only a part thereof.

そこで、リアルタイムで視聴している配信データに含まれる音声データ又は映像データの少なくとも一方に所望の条件に適合した内容が含まれているか否かをリアルタイムで判定手段が判定することにより、利用者が視聴したいとする所望の条件に適合した内容が含まれている部分に差し掛かったときに、報知信号を出力することにより、所望の出力を視聴することが可能となる。 Therefore, the determination means determines in real time whether at least one of the audio data or the video data included in the distribution data that is viewed in real time includes content that meets a desired condition. When a portion that contains content that meets a desired condition to be viewed is reached, a desired output can be viewed by outputting a notification signal.

また、コンテンツがインターネット回線等の電気通信回線を利用して受信した映像コンテンツ等の配信データである場合、例えば、インターネットサーバへのアクセス数、電気通信回線の受信速度、パーソナルコンピュータやスマートフォン等の受信・再生端末の機能、とうによっては、出力部分よりも先の部分の配信データを予め受信している場合がある。 Further, when the content is distribution data such as video content received over a telecommunication line such as the Internet, distribution data for a portion ahead of the portion currently being output may already have been received, depending on, for example, the number of accesses to the Internet server, the reception speed of the telecommunication line, and the capabilities of the receiving and playback terminal such as a personal computer or smartphone.

Accordingly, when the content is distribution data received over a telecommunication line, the determination means can determine in advance whether at least one of the audio data and the video data included in distribution data received ahead of the distribution data being viewed in real time includes content that meets a desired condition. The user can then be notified when output reaches that portion.
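The look-ahead determination described above can be illustrated with a minimal sketch. It assumes the received-ahead data has already been reduced to text per time segment, and it uses simple keyword matching in place of the learned determination. The names `Segment`, `find_matches` and `should_notify` are hypothetical and do not appear in this specification.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float  # seconds from the start of the content
    end: float    # seconds from the start of the content
    text: str     # assumed: text recognised from this segment's audio/video


def find_matches(buffered, keywords):
    """Determine in advance which buffered segments meet the desired condition.

    Keyword containment is an illustrative stand-in for the determination means.
    """
    lowered = [k.lower() for k in keywords]
    return [s for s in buffered if any(k in s.text.lower() for k in lowered)]


def should_notify(position, matches):
    """Return True when the playback position lies inside a matching segment."""
    return any(s.start <= position < s.end for s in matches)


# Distribution data received ahead of the portion currently being output.
buffered = [
    Segment(0.0, 30.0, "opening and headlines"),
    Segment(30.0, 90.0, "weather forecast"),
    Segment(90.0, 150.0, "sports segment: baseball results"),
]
matches = find_matches(buffered, ["sports"])
```

Because the matching segments are computed before playback reaches them, the check made at output time is reduced to a comparison of the current position against stored time ranges.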

Some users are unable to view content in real time for other reasons and instead store it in the storage means (so-called recording).

Accordingly, when broadcast data or distribution data stored in advance in the storage means is being output (played back), the determination means can determine in advance whether at least one of the audio data and the video data included in distribution data stored ahead of the distribution data being viewed (played back) in real time includes content that meets a desired condition. The notification means can then notify the user when output reaches that portion.

When broadcast data or distribution data stored in advance in the storage means is output (played back), it may be output at a speed other than the original output speed, that is, the standard speed at which elapsed time and output speed coincide. The output may be a high-speed output faster than the standard speed or a low-speed output slower than the standard speed.

Accordingly, when the output speed is not the standard speed, one example of notification is to switch the output speed of the content being output to the standard speed when it is determined that content meeting the desired condition is included. This allows the user to view the content from the desired timing.
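As a minimal sketch of this speed-switching form of notification, the function below returns the speed to use after each determination. The constant `STANDARD_SPEED`, the function name `next_speed` and the representation of speed as a multiplier are assumptions made for illustration only.

```python
STANDARD_SPEED = 1.0  # assumed: playback speed expressed as a multiplier


def next_speed(current_speed, match_detected):
    """Return the output speed to use after a determination result.

    High-speed (> 1.0) or low-speed (< 1.0) output is switched to the
    standard speed only when matching content has been determined.
    """
    if match_detected and current_speed != STANDARD_SPEED:
        return STANDARD_SPEED
    return current_speed
```

For example, `next_speed(2.0, True)` yields the standard speed, while `next_speed(2.0, False)` leaves double-speed playback unchanged.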

With such notification, the viewer can easily recognize from the increase in volume that the broadcast (playback) has reached the portion meeting the desired condition.

Similarly, when a sound different from the audio included in the content is output, the viewer can easily recognize that the broadcast (playback) has reached the portion meeting the desired condition.

Furthermore, an output form is also conceivable in which, for example, the content is not displayed on the display screen 7a while the video is being recorded in the mass storage device 6.

In such a case, the control circuit unit 5 may analyze, in real time, the video data and audio data included in the content being output (recorded), and may perform activation control so that the content is actually displayed on the display screen 7a when it determines during that analysis that content meeting the desired condition is included.

Also, for example, as shown in FIG. 10, when a main screen (caster screen) and a wipe screen (sports screen) are displayed on a single display screen 7a, activation control may be performed so that, at the point when it is determined that the sports segment has started, the display states of the main screen and the wipe screen are swapped, or the wipe screen is switched to full-screen display.
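The two activation-control variants above (displaying content that was only being recorded, and swapping or enlarging the wipe screen) can be sketched as transitions on a small display state. This is an illustrative model only. `DisplayState`, `on_match` and the source labels are hypothetical names, not part of the disclosed control circuit unit 5.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class DisplayState:
    visible: bool         # whether the display screen is showing content
    main: Optional[str]   # source shown on the main screen
    wipe: Optional[str]   # source shown on the wipe (inset) screen, if any


def swap_main_and_wipe(state):
    """Exchange the sources shown on the main screen and the wipe screen."""
    return replace(state, main=state.wipe, wipe=state.main)


def wipe_to_full_screen(state):
    """Show the wipe source alone on the whole screen."""
    return replace(state, main=state.wipe, wipe=None)


def on_match(state, full_screen=False):
    """Apply activation control once matching content has been determined."""
    state = replace(state, visible=True)  # start displaying if only recording
    if state.wipe is None:
        return state
    return wipe_to_full_screen(state) if full_screen else swap_main_and_wipe(state)


recording_only = DisplayState(visible=False, main="news", wipe=None)
two_screens = DisplayState(visible=True, main="caster", wipe="sports")
```

Calling `on_match(two_screens)` swaps the caster and sports sources, and `on_match(two_screens, full_screen=True)` shows the sports source alone.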

1: Desired video information notification system
2: Receiving unit
3: Operation unit
4: Storage unit
5: Control circuit unit (determination means)
6: Mass storage unit
7: Output unit (output means)
7a: Display screen
8: Video signal processing unit
9: Playback device
10: Summary creation system
20: Video signal separation unit
30: TV station
100: Utterance text conversion unit
110: Utterance information extraction unit
120: Utterance content recognition unit
130: Utterance content text conversion unit
140: Machine learning unit
150: Content recognition text creation setting unit
160: Comparative evaluation unit
161: Comparison unit
162: Evaluation unit
170: Switching unit
180: Switching unit
200: Telop text conversion unit
210: Telop information extraction unit
220: Telop content recognition unit
230: Telop content text conversion unit
240: Machine learning unit
250: Content recognition text creation setting unit
260: Comparative evaluation unit
261: Comparison unit
262: Evaluation unit
270: Switching unit
280: Switching unit
300: Background image text conversion unit
310: Background image information extraction unit
320: Background image content recognition unit
330: Background image content text conversion unit
340: Machine learning unit
350: Content recognition text creation setting unit
360: Comparative evaluation unit
361: Comparison unit
362: Evaluation unit
370: Switching unit
380: Switching unit
400: Logo mark text conversion unit
410: Logo mark image information extraction unit
420: Logo mark content recognition unit
430: Logo mark content text conversion unit
440: Machine learning unit
450: Content recognition text creation setting unit
460: Comparative evaluation unit
461: Comparison unit
462: Evaluation unit
470: Switching unit
480: Switching unit
500: Text integration unit
510: Integrated text creation unit
520: Integrated text creation setting unit
530: Machine learning unit
540: Comparative evaluation unit
541: Comparison unit
542: Evaluation unit
550: Existing data storage unit
570: Switching unit
580: Switching unit
600: Summary creation unit
610: Summary text creation unit
620: Summary creation setting unit
630: Machine learning unit
640: Comparative evaluation unit
641: Comparison unit
642: Evaluation unit
650: Existing data storage unit
670: Switching unit
680: Switching unit
700: Existing data storage unit
710: Existing video data storage unit
720: Existing utterance text storage unit
730: Existing telop text storage unit
740: Existing background image text storage unit
750: Existing logo mark text storage unit
760: Existing various text storage unit
770: Existing integrated text storage unit
780: Existing summary text storage unit

Claims

A desired video information notification system comprising: determination means for determining whether data constituting content being output includes content that meets a desired condition, while learning optimal conditions based on accumulated results of summaries stored in advance on the basis of previous content displayable on display means; and notification means for notifying a user of the content being output when the determination means determines that such content is included.

The desired video information notification system according to claim 1, further comprising activation means for activating the display means so that the content being output is displayed on it, wherein the notification means outputs a notification signal to the activation means when the determination means determines that such content is included, so that the content being output is displayed.

The desired video information notification system according to claim 1, wherein, when the content is a television broadcast, the determination means determines in real time whether at least one of audio data and video data included in broadcast data being viewed in real time includes content that meets a desired condition, and outputs a notification signal to the notification means when it determines that content meeting the desired condition is included.

The desired video information notification system according to claim 1, wherein, when the content is distribution data received through a telecommunication line, the determination means determines in real time whether at least one of audio data and video data included in the distribution data being viewed in real time includes content that meets a desired condition, and outputs a notification signal to the notification means when it determines that content meeting the desired condition is included.

The desired video information notification system according to claim 1, wherein, when the content is distribution data received through a telecommunication line, the determination means determines in advance whether at least one of audio data and video data included in distribution data received ahead of the distribution data being viewed in real time includes content that meets a desired condition, and outputs a notification signal to the notification means when the portion determined to include content meeting the desired condition is output.

The desired video information notification system according to claim 1, wherein, when the content is broadcast data or distribution data stored in advance in storage means, the determination means determines in advance whether at least one of audio data and video data included in distribution data received ahead of the distribution data being viewed in real time includes content that meets a desired condition, and outputs a determination signal to the notification means when the portion determined to include content meeting the desired condition is output.

The desired video information notification system according to claim 6, wherein, when broadcast data or distribution data stored in advance in the storage means is being output at a high speed faster than a standard speed at which elapsed time and output speed coincide, or at a low speed slower than the standard speed, the notification means switches the output speed of the content being output to the standard speed when the determination means determines that content meeting a desired condition is included.

The desired video information notification system according to any one of claims 1 to 7, wherein the notification means increases the volume of the sound output from a sound output unit based on the audio data when the determination means determines that content meeting a desired condition is included.

The desired video information notification system according to any one of claims 1 to 7, wherein the notification means outputs a notification sound different from the sound output from a sound output unit based on the audio data when the determination means determines that content meeting a desired condition is included.