JP7137825B2

JP7137825B2 - Video information provision system

Info

Publication number: JP7137825B2
Application number: JP2018107238A
Authority: JP
Inventors: 孝利石井
Original assignee: Jcc株式会社
Priority date: 2018-06-04
Filing date: 2018-06-04
Publication date: 2022-09-15
Anticipated expiration: 2038-06-04
Also published as: JP2019213038A

Description

The present invention relates to a video information providing system, and more particularly to a video information providing system that determines how frequently video content appears on TV and the web and provides frequently appearing video content to users in a timely manner.

Currently, a large and diverse volume of video information is continuously distributed to viewers and users via TV and the web. It is becoming difficult for viewers and users to extract, appropriately and in a timely manner, the video content that is useful to them from the large amount of video information provided automatically around the clock.

In other words, busy modern people are increasingly unable to do what they once did: sit in front of a TV receiver for a set period, watch all of the programs and the large amount of video information distributed by broadcasting stations, and pick out the video content and information they need.

A similar situation applies to obtaining information from the web. When trying to obtain necessary information from the web, a user runs a keyword search on an Internet search engine such as Google (registered trademark), but the keyword search also returns large amounts of so-called "noise": information that merely resembles the desired information.

Therefore, even when using a search engine, the user still has to pick out the desired information from a large amount of related information, so the search remains cumbersome.

Meanwhile, a technique is conventionally known that searches, over a certain period, video output as video content (for example, by TV broadcasting) for video similar to a pre-registered video.

Such search techniques are used, for example, to detect a specific title roll in a television broadcast signal in order to start and stop real-time recording, or to detect the same news material broadcast at different times and on different stations in order to analyze the structure of the video (see, for example, Patent Document 1).

Such search techniques are not limited to television broadcasting; they can also target distribution data such as video content received over an Internet connection (see, for example, Patent Document 2).

Furthermore, such search techniques are not limited to video and can also handle text. Specifically, in addition to caption text included in the video content, they can use metadata supplied by the provider of a paid service (also called a program metadata service) that distributes, after a broadcast program ends, metadata such as the broadcast start time, broadcast end time, performers, and a summary of each segment of the program, as well as text describing the video content that a user enters with a keyboard or the like (see, for example, Patent Document 2).

Patent Document 1: JP 2010-262413 A
Patent Document 2: JP 2012-038239 A

However, these techniques target, for example, a single program or piece of video content. They do not meet the needs described above: that viewers and users want to obtain, time-efficiently, useful video content of high importance or interest from among all programs provided on TV, and likewise from among all video content uploaded to the web.

The object of the present invention is, in order to solve the above problems, to provide a video information providing system capable of quickly and appropriately supplying video content useful to users from among a large number of distributed video contents.

To achieve the above object, a video information providing system according to the present invention comprises: receiving means for receiving distributed video content; storage means for storing and accumulating the video content received by the receiving means; component appearance frequency determination means for analyzing the components of the distributed video content and comparing the analyzed components with a large number of past video contents accumulated in the storage means to determine the appearance frequency of the same component; video content extraction means for extracting, based on the determination by the component appearance frequency determination means, video content containing the component from the large number of past video contents accumulated in the storage means when the appearance frequency is equal to or greater than a predetermined value; and video content providing means for providing the video extracted by the video content extraction means to a viewer. The component appearance frequency determination means analyzes the components included in the distributed video content and compares the analyzed components with the large number of past video contents accumulated in the storage means to determine the appearance frequency of the same component. When it is determined that the appearance frequency of a given component has reached a predetermined value within a predetermined period, the video content extraction means extracts the video content containing that component from the large number of video contents accumulated in the storage means, and the video content providing means provides the extracted video content to the viewer.

For distributed video content, the component appearance frequency determination means analyzes the components included in the video content and compares them with previously accumulated video content to determine the appearance frequency of the same component. The video content here consists mainly of moving images but also includes still images. Audio and text present in a moving image as video components are also subject to determination.

When the component appearance frequency determination means determines that the appearance frequency of a given component has reached a predetermined value within a predetermined period, the video content extraction means extracts the video content containing that component from the large number of accumulated video contents.
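As a rough illustration of the determination described above, the following Python sketch counts appearances of a component inside a sliding time window and returns the matching content once a threshold is reached. The class name `AppearanceCounter`, the string component labels, and the window and threshold values are assumptions made for this example; the patent does not specify how components are represented or what the predetermined period and value are.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta


class AppearanceCounter:
    """Hypothetical helper: tracks component appearances in a sliding window."""

    def __init__(self, window: timedelta, threshold: int):
        self.window = window          # the "predetermined period" (assumed value)
        self.threshold = threshold    # the "predetermined value" (assumed value)
        self._events = defaultdict(deque)  # component -> deque of (time, content_id)

    def observe(self, component: str, content_id: str, when: datetime) -> list[str]:
        """Record one appearance; return content ids once the threshold is met.

        Assumes observations arrive in chronological order.
        """
        events = self._events[component]
        events.append((when, content_id))
        # Discard appearances that have fallen outside the window.
        while events and when - events[0][0] > self.window:
            events.popleft()
        if len(events) >= self.threshold:
            return [cid for _, cid in events]
        return []
```

An empty list means the component has not yet appeared often enough within the window; a non-empty list corresponds to the content that the extraction step would pull from storage.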

The extracted video content is then provided to the viewer by the video content providing means. Providing video content here includes not only displaying it but also recording it, distributing it, and other forms of output.

In this case, AI (artificial intelligence) is used in the component appearance frequency determination means and the video content extraction means so that data is processed at high speed.

In the invention according to claim 2, the component appearance frequency determination means determines the appearance frequency of the components based on viewing preferences registered by the viewer.

Therefore, a viewer who is a user of this system can register, within an appropriate specified range, the video he or she wishes to view and its classification, type, genre, and so on, and the component appearance frequency determination means determines the appearance frequency of components based on those preferences.
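One way to picture determination based on registered preferences is to count components only in content whose genre the viewer has registered. The sketch below assumes each content item is a dictionary with hypothetical `genre` and `components` keys; this data layout and the function name are illustrative and not taken from the patent.

```python
from collections import Counter


def count_within_preferences(contents: list[dict], registered_genres: set[str]) -> Counter:
    """Count component appearances, restricted to the viewer's registered genres."""
    counts = Counter()
    for item in contents:
        if item["genre"] in registered_genres:
            # A component is counted at most once per content item.
            counts.update(set(item["components"]))
    return counts
```

Content outside the registered genres contributes nothing to the counts, so only components from the viewer's chosen range can reach the frequency threshold.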

In the invention according to claim 3, the component is text, audio, a video object that is the theme of the video content, a person appearing in the video, or the background of a video object.

For moving images, the component appearance frequency determination means recognizes and identifies components in a given video content in terms of "text, audio, a video object that is the theme of the video content, a person appearing in the video, or the background of a video object".

Here, "text" means characters that appear in a moving image, such as telops (on-screen captions); it is identified using character recognition technology together with morphological analysis.

"Audio" means the various sounds and human voices included in a moving image or the like, including background sound, theme songs, the speech of people appearing in the video, and sound effects. Where necessary, only the speech of a specific person about the topic of the video is extracted from these sounds, and the component appearance frequency determination means determines its appearance frequency.

To isolate only the speech of the person who is the subject of a particular video, the frequency band of that person's voice is identified, which makes it possible to extract only that person's speech.

"A video object that is the theme" means the object that is the subject of the video, and "a person appearing in the video" includes people who appear in a leading role or in a supporting role in relation to the theme of the video. Face recognition technology or the like is used to identify such people. "The background of a video object" means whatever appears behind a person or behind the object that is the subject of the video, such as buildings, the sea, mountains, the sky, or plains.

In the invention according to claim 4, the component appearance frequency determination means determines the appearance frequency of components by referring to a title portion displayed as text on the video screen.

That is, in news video distributed by a TV broadcasting station, a title portion consisting of text may be displayed, for example, at the upper right of the on-screen video. The component appearance frequency determination means refers to this title portion when performing video recognition, speech recognition, person recognition, background recognition, and so on, and determines the appearance frequency of the components.

In the invention according to claim 5, the video content is distributed from a server via a telecommunication line.

Therefore, in the invention according to claim 5, the component appearance frequency determination means determines the appearance frequency of the same component for video content, including moving images, distributed via the web. Here, "moving images distributed via the web" covers everything of this kind: videos uploaded to the web, videos on SNS (social networking services), and videos distributed over the Internet (for example, NETFLIX [registered trademark] and Amazon Fire TV).

In the following description, "web" is not limited to web systems viewed in a browser and also includes cloud systems. Video content therefore includes content that resides on either a web server or a cloud server.

In the invention according to claim 6, the video content is distributed from a broadcasting station via a telecommunication line.

Therefore, in the invention according to claim 6, the component appearance frequency determination means determines the appearance frequency of the same component for all video content displayed as TV video.

The video information providing system according to claim 7 further comprises: summary creation means for creating a summary by converting into text the audio data or video data of a plurality of video contents, extracted from the video signal of each of the video contents; and weighting means for assigning to the plurality of video contents a weighting specific to one or more predetermined conditions included in the summary, while learning the optimum conditions from the accumulated summaries created by the summary creation means. The component appearance frequency determination means determines the appearance frequency of components based on the summary.

In the invention according to claim 7, the component appearance frequency determination means determines the appearance frequency of video content, that is, its importance, on the basis of a summary of the video content.

That is, the appearance frequency of components is determined from the summary in which the summary creation means has converted into text the audio data or video data of the plurality of video contents, extracted from the video signal of each video content.

In this case, the weighting means applies to the summaries created by the summary creation means a weighting specific to at least one predetermined condition across the plurality of video contents, so that users are supplied with more useful information of high importance and interest drawn from those video contents.

In the invention according to claim 8, in the video information providing system according to claim 7, the weighting means looks up duplicated text in the audio data or video data included in the plurality of video contents and, when the result is equal to or greater than a predetermined value, judges that text to be important text meeting the predetermined condition and assigns it a weighting.

For example, in daily news broadcasts, on the web, and on SNS, topics of great news value and information on incidents and accidents can be said to be of high social importance and interest. As a result, information about such events often appears as multiple video contents across multiple media.

Accordingly, across the plurality of video contents, text found to be duplicated, for example by morphological analysis, is treated as the same component, and its importance is set appropriately, for example in graded steps, according to its appearance frequency, such as the number of appearances, the appearance time, or the appearance rate.
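The graded importance setting can be sketched as follows, using only the number of appearances. The function assumes that each video content has already been reduced to a set of extracted terms (for example, the output of morphological analysis), and the grade boundaries in `levels` are hypothetical values chosen for the example; the patent gives no concrete thresholds.

```python
from collections import Counter


def grade_importance(terms_per_content: list[set[str]], levels=(2, 4, 8)) -> dict[str, int]:
    """Assign each duplicated term a grade based on how many contents contain it.

    `levels` are assumed grade boundaries: a term found in at least levels[0]
    contents gets grade 1, at least levels[1] gets grade 2, and so on.
    """
    counts = Counter()
    for terms in terms_per_content:
        counts.update(terms)  # each content counts a term at most once
    grades = {}
    for term, n in counts.items():
        grade = sum(n >= boundary for boundary in levels)
        if grade > 0:
            grades[term] = grade
    return grades
```

Terms that appear in fewer contents than the lowest boundary are left out, which corresponds to text that is not judged important.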

In the invention according to claim 9, in the video information providing system according to claim 8, the weighting means determines whether text is important text by looking at the plurality of video contents within a preset period.

In other words, the social importance of information such as the topics, incidents, or accidents described above is judged within a predetermined period.

In the invention according to claim 10, in the video information providing system according to claim 8 or claim 9, the weighting means starts recording a new video content when it determines that the video content includes the important text.

In daily news broadcasts and the like, major incidents and accidents can be said to be of high social importance and interest. If such incidents and accidents are recorded in a large-capacity storage unit, for example separately from the data used for summary creation, they can be kept for editing by an operator or as archival footage.

In the invention according to claim 11, in the video information providing system according to claim 6, the weighting means targets a specific person or object in the text included in at least one of the audio data and the video data of the plurality of video contents, and calculates a CM conversion value from its appearance frequency.

The specific conditions mentioned above can include specific persons, objects, incidents, and accidents. For example, by targeting a specific person (including corporate names, organizations, and the like) as the specific condition, the appearance frequency of that person across the plurality of video contents, such as the number of appearances, the appearance time, and the appearance rate, can be worked out.
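The three measures named above (number of appearances, appearance time, appearance rate) could be computed along the following lines. The `Appearance` record, the function name, and in particular the definition of the rate as the share of contents in which the person appears are assumptions for illustration; the patent does not define these quantities precisely, and no formula for the CM conversion value is attempted here.

```python
from dataclasses import dataclass


@dataclass
class Appearance:
    content_id: str  # which video content the person appeared in
    seconds: float   # duration of this appearance


def appearance_stats(appearances: list[Appearance], total_contents: int) -> dict[str, float]:
    """Summarize one person's appearances across a set of video contents."""
    if total_contents <= 0:
        raise ValueError("total_contents must be positive")
    distinct_contents = {a.content_id for a in appearances}
    return {
        "count": len(appearances),                              # number of appearances
        "total_seconds": sum(a.seconds for a in appearances),   # appearance time
        "rate": len(distinct_contents) / total_contents,        # assumed definition
    }
```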

From the number of appearances, appearance time, appearance rate, and so on, a CM conversion value can then be calculated as a guide to how much value the person has in the world of mass media and information (for example, for setting an appearance fee). When a specific corporate name is targeted, the results can also be used for analysis (weighting) such as the corporation's share of sponsorship across all commercials, the importance of the various products the corporation sells (how much effort it is putting in, or identifying new products), and the correlation with stock price movements.

In the invention according to claim 12, the weighting means targets a specific corporate name in the text included in at least one of the audio data and the video data of the plurality of video contents.

Therefore, in the invention according to claim 12, because a specific corporate name is the component whose appearance frequency is determined, video content in which that corporate name appears is extracted and provided to the corporation concerned, which is the viewer and user.

In the inventions according to claims 1 to 12, the video system can automatically provide users with more useful information drawn from a plurality of video contents.

That is, for distributed video content consisting of moving images or still images, the component appearance frequency determination means analyzes the components included in the video content and compares them with previously accumulated video content to determine the appearance frequency of the same component. When the appearance frequency of a given component is determined to have reached a predetermined value within a predetermined period, the video content extraction means extracts the video content containing that component from the large number of accumulated video contents, and the video content providing means provides the extracted video content to the viewer. The viewer therefore does not need to watch the video content distributed on TV and the web at all times, and can check and grasp important information simply by viewing the automatically provided video content that appears on TV or the web at or above a predetermined frequency.

Because frequently appearing video content is often important information or information that is trending at that time, viewers do not need to constantly check video on multiple TV channels or information on the web. Without being tied to a TV receiver, personal computer, or smartphone, they can time-efficiently view only the video content that the system according to the present invention provides, consisting of important information and information trending at that time.

Therefore, even a busy viewer can efficiently obtain the important information he or she needs on a TV, personal computer, or smartphone, without constantly checking TV programs on a TV receiver or frequently checking web information on a personal computer or smartphone.

In the invention according to claim 2, the component appearance frequency determination means determines the appearance frequency of components based on viewing preferences registered by the viewer. A viewer who is a user of this system can register, within an appropriate specified range, the video he or she wishes to view and its classification, type, genre, and so on, and the determination is made on that basis. A user can therefore register the desired genres and types of video content in the system in advance and then automatically and efficiently view the video content that appears frequently within those genres and types.

Also, for example, if a particular company that is a user wishes to manage in one place the information related to it that appears on TV or the web, it can register its company name, corporate identity, abbreviated name, and so on. All TV video content and web video information related to the company can then be automatically collected and viewed, and used for the company's business management from various points of view.

In the invention according to claim 4, for video information transmitted from a TV broadcasting station, for example a news program, the title of the news item may be displayed as short text at a position such as the upper right of the screen. In such cases, the title portion can be referred to when performing the video recognition, speech recognition, person recognition, and background recognition used to determine the appearance frequency of components. Referring to the title portion in this way allows more accurate recognition and a more efficient determination of the appearance frequency of components.

In the invention according to claim 5, the component appearance frequency determination means determines appearance frequency for videos uploaded to the web, all videos on SNS (social networking services), videos distributed over the Internet, and so on. It is therefore possible to extract video content of high importance from all the video content circulating via the web, searching comprehensively and providing it to the viewer, who is the user, quickly and automatically.

In the invention according to claim 6, the component appearance frequency determination means determines the appearance frequency of the same component for all video content displayed as TV video. It is therefore possible to extract video content of high importance from all the video content circulating via TV, searching comprehensively and providing it to the viewer, who is the user, quickly and automatically.

In the inventions according to claims 7 to 12, the component appearance frequency determination means determines the appearance frequency of video content, that is, its importance, on the basis of summaries of the video content. Compared with narrowing down the subject of each video directly from the video content using advanced techniques such as face recognition, speech recognition, and morphological recognition, this allows the appearance frequency of the component to be determined more quickly and accurately.

In the invention according to claim 7, because information about social events often appears as multiple video contents across multiple media, text found to be duplicated across the plurality of video contents, for example by morphological analysis, is treated as the same component, and its importance is set appropriately, for example in graded steps, according to its appearance frequency, such as the number of appearances, the appearance time, or the appearance rate. The appearance frequency of components, that is, their social importance, is thus determined accurately, and the user is provided with video content of appropriate importance.

In the invention according to claim 9, social importance is determined within a predetermined period, so socially important video content can be provided to the user quickly.

In the invention according to claim 10, socially important incidents, accidents, and the like are recorded in the large-capacity storage unit, for example separately from the data used for summary creation, so that they can be kept for editing by an operator or as recorded footage. This provides a video information providing system in which collected video content that is important to the user can be reused as needed for analysis and similar purposes.

The invention according to claim 11 provides a video information calculation system in which the CM (commercial) conversion value calculated by the weighting means can be used as an analytical indicator for various economic activities.

In the invention according to claim 12, a specific corporate name is a component subject to appearance frequency determination, so video content in which that corporate name appears is extracted and provided to the corporation, which is the viewer as user. The corporation can thus collect information on, for example, its public evaluation and reputation quickly, appropriately, and comprehensively, and reflect it promptly in its management.

FIG. 1 is a block diagram showing the overall configuration of a first embodiment of the video information providing system according to the present invention.
FIG. 2 is a block diagram showing the overall configuration of a second embodiment of the video information providing system according to the present invention.
FIG. 3 is a block diagram showing the overall configuration of the summary creation system in the second embodiment.
FIG. 4 shows the speech-to-text unit in the second embodiment, where (a) is a block diagram and (b) is a diagram showing the processing flow.
FIG. 5 shows the telop text conversion unit of the summary creation system in the second embodiment, where (a) is a block diagram and (b) is a diagram showing the processing flow.
FIG. 6 shows the background image text conversion unit of the summary creation system in the second embodiment, where (a) is a block diagram and (b) is a diagram showing the processing flow.
FIG. 7 shows the logo mark text conversion unit of the summary creation system in the second embodiment, where (a) is a block diagram and (b) is a diagram showing the processing flow.
FIG. 8 is a block diagram showing the text integration unit of the summary creation system in the second embodiment.
FIG. 9 is a block diagram showing the summary creation unit of the summary creation system in the second embodiment.
FIG. 10 is a flowchart showing the operating steps of the summary creation system in the second embodiment.
FIG. 11 shows an application example of the video information providing system according to the present invention, where (A) is an explanatory diagram for the case in which character recognition determines that the desired condition is met, and (B) is an explanatory diagram for the case in which voice recognition determines that the desired condition is met.

FIG. 1 is a block diagram showing a first embodiment for realizing the video information providing system 1 according to the present invention.
The video information providing system 1 according to this embodiment is intended to supply video content that is useful to the user, quickly and appropriately, from among a large number of distributed video contents. It comprises component appearance frequency determination means 5a, which determines the appearance frequency of the same component on the basis of the components of the distributed video content; video content extraction means 5b, which extracts the video content containing a component when the determination by the component appearance frequency determination means 5a shows that its appearance frequency is equal to or greater than a predetermined value; and video content providing means 7, which provides the video extracted by the video content extraction means 5b to the viewer.

In this embodiment, as shown in FIG. 1, the video information providing system 1 can be used with a single playback device 9 alone, such as a television with computer functions, a personal computer, a smartphone, or a tablet terminal. It comprises a receiving unit 2 with a tuner or the like that receives video signals for content from a television station (television broadcasting station) or a web video distribution server; an operation unit 3 (including a remote control or the like) provided on the playback device 9; a storage unit 4 storing applications that implement the various functions of the playback device 9; a control circuit unit 5 that performs the various functions based on the applications stored in the storage unit 4; a large-capacity storage unit 6 that stores video content; and an output unit 7 including a speaker for audio output and a monitor for video output.

The receiving unit 2 includes a receiving antenna for television broadcasts, including satellite broadcasts, a modem for telecommunication lines such as an Internet connection, a tuner, and the like, and can receive all kinds of data relating to video content.

When the playback device 9 is a receiver such as a television (including a video recorder or the like), the operation unit 3 consists of the controls the user operates to view desired video content, such as the main power switch, channel selection, and volume up/down switches (including a remote control device or the like). When the playback device 9 is a personal computer or the like, the operation unit 3 is a mouse, keyboard, touch panel, or similar device that the user operates to view desired video content.

The storage unit 4 stores, for example, the various applications that the control circuit unit 5 executes to operate the playback device 9. These applications include those for the functions that implement this embodiment.

The control circuit unit 5 enables viewing of received video content based on the applications stored in the storage unit 4, and has the component appearance frequency determination means 5a and the video content extraction means 5b.

The large-capacity storage unit 6 stores the video content data received by the receiving unit 2. It can also accumulate the results of the video content analysis performed by the control circuit unit 5.

Specifically, video content distributed from a web server (cloud server), a television broadcasting station, or the like and received by the receiving unit 2 is accumulated in the large-capacity storage unit 6.

For each component contained in the accumulated video content, such as text, audio, the video object that is the theme of the video content, the people appearing, or the screen background, the component appearance frequency determination means 5a determines the appearance frequency in accordance with the application stored in the storage unit 4.

That is, in order to calculate the appearance frequency of each target component successively, for example the number of appearances, the appearance time, or the appearance rate, the control circuit unit 5 stores data associating each calculation result with its component in the large-capacity storage unit 6.
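
As a non-limiting illustration, the three example measures named above (number of appearances, appearance time, appearance rate) could be computed per component as in the following Python sketch. The `Appearance` record, the function name, and in particular the definition of the rate as the share of video contents containing the component are assumptions made for illustration; the specification does not define them.

```python
from dataclasses import dataclass

@dataclass
class Appearance:
    component: str    # hypothetical label, e.g. a word, a person's name, a logo
    content_id: str   # video content in which the component was detected
    seconds: float    # how long the component was visible or audible

def appearance_metrics(appearances, component, total_contents):
    """Return (count, total seconds, rate) for one component.

    The rate is assumed here to be the fraction of all video contents
    in which the component appears at least once.
    """
    hits = [a for a in appearances if a.component == component]
    count = len(hits)
    total_seconds = sum(a.seconds for a in hits)
    contents_with_hit = {a.content_id for a in hits}
    rate = len(contents_with_hit) / total_contents if total_contents else 0.0
    return count, total_seconds, rate
```

Storing the returned tuple keyed by the component label would correspond to the association between calculation result and component described above.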

The control circuit unit 5 includes the component appearance frequency determination means 5a and the video content extraction means 5b, and the display screen of the output unit 7 constitutes the video content providing means 7.

The control circuit unit 5 determines whether the successively calculated appearance frequency of each component has reached a predetermined value. When it determines that the appearance frequency has reached the predetermined value, it extracts the video content containing that component and displays it on the display screen of the output unit 7, which is the content providing means.
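
The threshold determination and extraction described above can be sketched as follows. This is a minimal Python sketch under stated assumptions: each video content is represented by a set of already recognized component labels, the appearance frequency is taken to be the number of contents containing the component, and the function name is hypothetical.

```python
from collections import Counter

def extract_frequent_contents(contents, threshold):
    """contents: dict mapping content_id -> set of component labels.

    Returns the ids of contents that contain at least one component
    whose appearance count is at or above the threshold.
    """
    counts = Counter()
    for components in contents.values():
        counts.update(components)  # each content counts once per component
    frequent = {c for c, n in counts.items() if n >= threshold}
    return [cid for cid, components in contents.items() if components & frequent]
```

In this sketch the returned ids would then be handed to whatever plays the role of the output unit 7.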

The output unit 7 may also be configured as, for example, a storage device or slot for storing the extracted video content on an external recording medium (not shown), such as an SD memory card or USB memory stick.

When the extracted video content data is recorded on such an external recording medium, the system may be configured, for example, to generate backup video data with a lower resolution than the video data of the video content stored in the large-capacity storage unit 6, store it temporarily in the storage unit 4, and then record that backup video data.

In this configuration, the component appearance frequency determination means 5a analyzes the components contained in distributed video content and compares them with previously accumulated video content to determine the appearance frequency of the same component. Video content here consists mainly of moving images but also includes still images.

When the component appearance frequency determination means 5a determines that the appearance frequency of a given component has reached a predetermined value within a predetermined period, the video content extraction means 5b extracts the video content containing that component from the large number of accumulated video contents.
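
One way to picture "a predetermined value within a predetermined period" is a sliding time window per component. The Python sketch below is an illustration only: the class name, the use of a deque, and the assumption that observations arrive in non-decreasing time order are not taken from the specification.

```python
from collections import deque
from datetime import datetime, timedelta

class WindowedCounter:
    """Counts appearances of each component inside a sliding time window."""

    def __init__(self, window, threshold):
        self.window = window        # timedelta: the predetermined period
        self.threshold = threshold  # int: the predetermined value
        self.times = {}             # component -> deque of timestamps

    def observe(self, component, when):
        """Record one appearance; return True once the threshold is reached.

        Assumes calls arrive in non-decreasing order of `when`.
        """
        q = self.times.setdefault(component, deque())
        q.append(when)
        # Drop appearances that have fallen out of the window.
        while q and when - q[0] > self.window:
            q.popleft()
        return len(q) >= self.threshold
```

A `True` result would be the point at which the extraction of the video content containing that component is triggered.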

The extracted video content is then provided to the viewer by the video content providing means 7. Provision here includes not only displaying the video content on a suitable display but also recording or distributing it and other forms of output.

In this case, AI (artificial intelligence) is used in the component appearance frequency determination means 5a and the video content extraction means 5b to process data at high speed.

In other words, for distributed video content consisting of moving or still images, the component appearance frequency determination means 5a analyzes the elements contained in the content and compares them with previously accumulated video content to determine the appearance frequency of the same component. When the appearance frequency of a given component is determined to have reached a predetermined value within a predetermined period, the video content extraction means 5b extracts the video content containing that component from the large number of accumulated video contents, and the video content providing means 7 provides the extracted content to the viewer. The viewer therefore does not need to watch the video content distributed on TV or the web at all times, and can instead view the automatically provided video content that appears on TV or the web at or above the predetermined frequency.

In this embodiment, the control circuit unit 5 also includes desired video registration means 5c. Via the operation unit 3, the viewer who is the user of this system can use the desired video registration means 5c to register, within a suitable specified range, the videos he or she wishes to view and their classification, type, genre, and so on, and the component appearance frequency determination means 5a determines the appearance frequency of components on the basis of the registered preferences.

In this embodiment, the component appearance frequency determination means 5a recognizes and identifies specific video content in moving images from the standpoint of "text, audio, the video object that is the theme of the video content, the people appearing, or the background of the video object".

Accordingly, in this embodiment, the components whose appearance frequency is determined are text, audio, the video object that is the theme of the video content, the people appearing, and the background of the video object.

"Audio" refers to the various sounds contained in a moving image or the like, including background sound, theme songs, the voices of the people appearing, and sound effects.

Where necessary, only the voice of a specific person speaking about the topic that is the theme of the video is extracted from these various sounds, and its appearance frequency is determined by the component appearance frequency determination means 5a.

"The video object that is the theme" is the object that is the subject of the video, and "people appearing" includes people who appear in leading or supporting roles in relation to the theme of the video. Face recognition technology or the like is used to identify them. "The background of the video object" refers to what is shown behind the people appearing or behind the object that is the subject of the video, for example buildings, the sea, mountains, the sky, or plains.

The component appearance frequency determination means 5a is thus configured to determine the appearance frequency of components on the basis of the viewing preferences registered by the viewer. Because the viewer who is the user of this system can register, within a suitable specified range, the videos he or she wishes to view and their classification, type, genre, and so on, and the determination is made on the basis of those preferences, the user can register preferred genres and types of video content in the system in advance and then view, automatically and efficiently, the video content that appears frequently within those genres and types.

For example, when a particular company that is a user wishes to manage centrally the information about itself that appears on TV or the web, it can register its company name, corporate identity, abbreviated company name, and so on. All TV video content and web video information related to the company is then collected automatically for viewing and can be put to use in the management of the company from various perspectives.
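
The registration of a company name together with its abbreviations can be illustrated by a simple alias match. The following Python sketch assumes that text has already been recognized from each video content (for example from captions or speech) and uses plain case-insensitive substring matching; both the function name and the matching method are assumptions for illustration, and the company names in the usage note are fictitious.

```python
def contents_mentioning(registered_aliases, recognized_text):
    """Return ids of contents whose recognized text mentions any alias.

    registered_aliases: list of names registered by the user.
    recognized_text: dict mapping content_id -> text recognized from it.
    """
    needles = [alias.casefold() for alias in registered_aliases]
    return [
        content_id
        for content_id, text in recognized_text.items()
        if any(needle in text.casefold() for needle in needles)
    ]
```

For instance, registering both "Example Corporation" and "ExCorp" would collect contents that mention either form.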

When the video content is distributed from a web server (or cloud server) via a telecommunication line, the component appearance frequency determination means determines the appearance frequency of the same component across the distributed video content, including moving images. Here, "moving images distributed via the web" covers all kinds of videos uploaded to the web, videos on SNS (social networking services), videos on services that provide video over the Internet, and so on.

Because the component appearance frequency determination means thus determines appearance frequency across the various videos uploaded to the web, all videos on SNS (social networking services), and videos distributed over the Internet, it is possible to extract highly important video content from all the video content circulating via the web and to provide the viewer who is the user, comprehensively and automatically, with video content of high importance or interest.

When the video content is distributed from a broadcasting station via a telecommunication line, the component appearance frequency determination means 5a determines the appearance frequency of the same component across all video content displayed as TV video.

Accordingly, because the component appearance frequency determination means 5a determines the appearance frequency of the same component across all video content displayed as TV video, it is possible to extract video content of high importance or interest from all the video content circulating via TV and to provide it to the viewer who is the user comprehensively and automatically.

The control circuit unit 5 can also determine the appearance frequency of components under the various weighting conditions described later.
In this embodiment, for video information transmitted from a TV broadcasting station, in a news program for example, the title of the news item may be displayed as short text at a position such as the upper right of the screen. In such cases, the video recognition, voice recognition, person recognition, and background recognition used to determine the appearance frequency of components can be performed with reference to the title.
Performing video recognition, voice recognition, person recognition, and background recognition with reference to the title in this way allows more accurate recognition and more efficient determination of the appearance frequency of components.

Next, a second embodiment is described, which is configured to create a summary from the video content and use that summary to determine the appearance frequency of components.
FIG. 2 is a block diagram showing the overall configuration of a video information providing system 11 according to this embodiment of the present invention. The basic configuration of the system as a whole is the same as in the first embodiment, but this embodiment differs in that it includes a summary creation system 10.

That is, the video information providing system according to the second embodiment is based on the configuration of the video information providing system 1 according to the first embodiment and further comprises summary creation means 10, which converts the audio data or video data extracted from the video signals of multiple pieces of video content into text and creates a summary, and weighting means 15d, which assigns to the multiple pieces of video content a weighting specialized for one or more predetermined conditions contained in the summary while learning the optimum conditions from the accumulated summaries created by the summary creation means 10. The component appearance frequency determination means 15a is configured to determine the appearance frequency of components on the basis of the summary.
In this embodiment, the control circuit unit 15 also has desired video registration means 15c. As in the first embodiment, the user can use the desired video registration means 15c, via the operation unit 13, to register within a suitable specified range the videos he or she wishes to view and their classification, type, genre, and so on; the component appearance frequency determination means 15a determines the appearance frequency of components on the basis of the registered preferences; the video content extraction means 15b extracts the corresponding video content; and the output unit 17, which constitutes the video content output means, displays it on a display or the like.

In this embodiment, the summary creation system 10 that forms part of the video information providing system 11 may be provided inside the playback device 19. Alternatively, the summary creation system 10 may be implemented on a dedicated management server or the like, with the video output part of the system, which operates on the summaries created by that management server, realized by, for example, a television with computer functions, a personal computer, a smartphone, a tablet terminal, or the like (hereinafter "playback device 19"). The playback device 19 may be used singly or in multiples.

The following description deals mainly with television broadcasting. Matters specific to web video distribution are described where appropriate, and descriptions of web video distribution usage that is the same or substantially the same as for television broadcasting are omitted.

Television broadcasting includes terrestrial digital broadcasting, satellite broadcasting, One-Seg broadcasting, Internet broadcasting, and the like, regardless of the form of broadcasting or reception.

As shown in FIGS. 2 and 3, the video information providing system 11 comprises a receiving unit 12 with a tuner or the like that receives video signals for video content from a television station 30 or a web video distribution server 40; an operation unit 13 (including a remote control or the like) provided on the playback device 19; a storage unit 14 storing applications that implement the various functions of the playback device 19; a control circuit unit 15 that performs the various functions based on the applications stored in the storage unit 14; the summary creation system 10 described above; a large-capacity storage unit 16 that stores the created summaries and various data for recording video content; and an output unit 17 including a speaker for audio output and a monitor for video output.

The summary creation system 10 consists of a video signal separation unit 20, a video signal processing unit 18, a text integration unit 500, and a summary creation unit 600. As shown in FIG. 3, the video signal processing unit 18 consists of a speech-to-text unit 100, a telop text conversion unit 200, a background image text conversion unit 300, and a logo mark text conversion unit 400.
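
The arrangement of these units can be pictured as a simple pipeline: separation, four text conversions, integration, and summary creation. The Python sketch below shows only that data flow. Every function passed in is a hypothetical stand-in for the corresponding unit, and the split in which the speech-to-text step receives the audio while the telop, background image, and logo mark steps receive the video frames is an assumption inferred from the unit names.

```python
def create_summary(signal, separate, speech_to_text, frame_converters,
                   integrate, summarize):
    """Run one video signal through the assumed summary creation pipeline.

    separate:         stand-in for the video signal separation unit 20
    speech_to_text:   stand-in for the speech-to-text unit 100
    frame_converters: stand-ins for units 200, 300 and 400
    integrate:        stand-in for the text integration unit 500
    summarize:        stand-in for the summary creation unit 600
    """
    audio, frames = separate(signal)
    texts = [speech_to_text(audio)]
    texts += [convert(frames) for convert in frame_converters]
    return summarize(integrate(texts))
```

Passing the units in as functions keeps the sketch independent of any particular recognition technology, which the description leaves open.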

As shown in FIG. 2, the playback device 19 has the receiving unit 12, the operation unit 13, the storage unit 14, the control circuit unit 15, the summary creation system 10, the large-capacity storage unit 16, and the output unit 17.
The output unit 17 constitutes video content providing means that has, for example, a function of outputting to a monitor or printer the results of the weighting calculated by the control circuit unit 15 acting as the weighting means.
The control circuit unit 15 is the same as in the first embodiment in that it has component appearance frequency determination means 15a, video content extraction means 15b, and desired video registration means 15c; in this embodiment it additionally has weighting means 15d.

<Creation of summaries (video metadata)>
Here, as an example of producing video metadata as a summary, the case of processing television broadcast content with Japanese language processing and building a database is described. In this case, a piece of video content is taken to be, for example, a single program or segment.

In television programs, and particularly in news and broadcast programs aired from moment to moment, "immediacy" and "accuracy" are important.

At the same time, although part of the news content in such programs may be broadcast again in other news programs in different time slots (on the same or a different station), the same program is not rebroadcast on a different day, so this can be called vanishing information.

Information characterized by such "immediacy" and "accuracy" may persist depending on conditions such as the social importance of or demand for the news content, or the emergence of new information. It is therefore desirable to weight news according to its degree of importance or demand, for example whether its appearance frequency reaches a predetermined value. The number of appearances, the appearance time, the appearance rate, and so on can be used as the appearance frequency.

ここで、重要度・ニーズ度には、短期的、長期的、時期的な要素を有していることから、例えば、週間、月間、季間（旬間）、年間、別の統計によって重み付けしたグラフを作成することも可能である。この際、作成されたグラフは、出力部１７からモニタ出力又はプリンタ出力が可能である。 Here, since importance and need have short-term, long-term, and seasonal components, it is also possible to create graphs weighted by separate statistics for, for example, each week, month, season, or year. The graphs created in this way can be output from the output unit 17 to a monitor or a printer.

これにより、短期間での重要度・ニーズ度は高いが年間を通じた場合に重要度・ニーズ度が低くなってしまうことを抑制することができるうえ、対応する時期における重要度・ニーズ度が高いという重み付けを付与することができる。 This prevents a topic whose importance or need is high over a short period from being undervalued when measured over the whole year, and also allows a weight to be assigned that reflects its high importance or need in the corresponding period.

具体的には、「桜の開花予想」、「桜の名所」、「オリンピック」などの特定の周期で重要度・ニーズ度が高くなる場合等に有効な重み付けを付与することができる。 Specifically, effective weighting can be assigned to topics whose importance or need rises in a specific cycle, such as "cherry blossom bloom forecasts", "famous cherry blossom spots", and the "Olympics".
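One way to picture period-specific weighting is to compare a topic's count in a given month with its average monthly count over the year. The sketch below assumes that ratio as the weight; the formula and the function names are illustrative assumptions, since the description does not specify how the weight is calculated.

```python
from collections import Counter
from datetime import date


def monthly_counts(dates):
    """Count mentions of one topic per (year, month)."""
    return Counter((d.year, d.month) for d in dates)


def seasonal_weight(dates, year, month):
    """Ratio of the month's count to the year's mean monthly count (assumed formula)."""
    counts = monthly_counts(dates)
    annual = sum(c for (y, _), c in counts.items() if y == year)
    if annual == 0:
        return 0.0
    return counts[(year, month)] / (annual / 12)
```

With nine mentions in April and three in October of the same year, April receives a weight of 9.0 and October 3.0, so a topic concentrated in one season is not averaged away by the annual total.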

また、新たに放送されるビデオ情報に対するメタデータは、１０分程度のタイムラグで逐次更新することができ、最新の情報に基づいた重要度等に更新することができる。この際、複数の放送局の番組を同時に受信して最新の情報に更新することも可能である。 Metadata for newly broadcast video information can be successively updated with a time lag of about 10 minutes, and can be updated to the degree of importance based on the latest information. At this time, it is possible to simultaneously receive programs from a plurality of broadcasting stations and update to the latest information.

メタデータには、放送局や放送時間等の基本情報に加え、ニュースのタイトル、内容の抄録、コメンテータの氏名や目立つロゴ、といったテキスト情報に加え、背景画像等の画像認証、キャスターの顔認証、声紋分析、等によってより細かい映像メタデータを制作・配信することができる。 In addition to basic information such as the broadcasting station and broadcast time, and text information such as the news title, an abstract of the content, commentators' names, and prominent logos, finer-grained video metadata can be produced and distributed by means of image recognition of background images and the like, face recognition of newscasters, voiceprint analysis, and so on.
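The kinds of information listed above could be held in a record along the following lines. All field names are hypothetical and chosen only for illustration; the description does not define a concrete data format.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class VideoMetadata:
    # Basic information
    station: str
    start: datetime
    # Text information
    title: str
    abstract: str = ""
    commentators: list[str] = field(default_factory=list)
    logos: list[str] = field(default_factory=list)
    # Finer-grained results of the recognition steps (hypothetical field names)
    background_tags: list[str] = field(default_factory=list)
    recognized_faces: list[str] = field(default_factory=list)
    recognized_voices: list[str] = field(default_factory=list)

    def people(self) -> set[str]:
        """Everyone identified by name, face, or voiceprint."""
        return set(self.commentators) | set(self.recognized_faces) | set(self.recognized_voices)
```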

さらに、その結果は、ウェブやメールにより、ユーザー側で確認することも可能となっている。したがって、ユーザー側において、これらの映像メタデータをハードディスク等の大容量記憶媒体に保存・蓄積していけば、さまざまな活用場面に利用することができる。 Furthermore, the results can be confirmed by the user through the web or by e-mail. Therefore, if the user saves and accumulates the video metadata in a large-capacity storage medium such as a hard disk, it can be used in various situations.

具体的には、日々のニュース放送から、特定のコメンテータの言動をクローズアップして詳細を完全収録し、追って、その内容を検証することも可能となる。 Specifically, from daily news broadcasts, the statements and conduct of a specific commentator can be singled out and recorded in full detail, and the content can be verified later.
なお、その特定のコメンテータを条件とし、特定のコメンテータが出演している場合には、必ず、ユーザーに視聴を報知するように構成することもでき、現在放送中のニュース番組、或は、録画したニュース番組において、そのコメンテータがコメントしている際に、スポット的にボリュームを上げることによりユーザーに視聴を促す報知を行うように構成することも可能となる。 The system can also be configured, with that specific commentator set as a condition, to always notify the user whenever the commentator appears. In a news program currently being broadcast, or in a recorded news program, the notification prompting the user to watch can be made by momentarily raising the volume while the commentator is commenting.
このように構成した場合には、ユーザーはＴＶの番組映像をすべて見る必要はなく、報知されて場合にのみ視聴すればよいことから、ユーザーの時間的自由度を確保しつつ、ユーザーにとって有用な情報の取得を可能とするものである。 With this configuration, the user does not need to watch all of the TV program video and only needs to watch when notified, so information useful to the user can be obtained while preserving the user's freedom over their own time.

なお、ユーザーに対して、ユーザーが多忙でＴＶを常時視聴する時間的余裕のない場合には、要約において所定の映像に対応する条件を設定し、当該条件に合致した場合に、映像が表示されている旨の報知をユーザーに対して行うように構成することもできる。 If the user is busy and has no time to watch TV constantly, a condition corresponding to a predetermined video can be set in the summary, and the system can be configured to notify the user that the video is being displayed when the condition is met.
この場合、報知条件に適合した内容が含まれている場合に、出力中の映像コンテンツのユーザーである視聴者に対して報知には、上述したボリュームを上げる場合のほか、メッセージ等を発音するなどの利用者の聴覚に対して行うことができる。 In this case, when content matching the notification condition is included, the viewer, who is the user of the video content being output, can be notified audibly, either by raising the volume as described above or by sounding a message or the like.
また、利用者の聴覚に対する報知のほか、例えば、図１１に示す表示映像７ａの明暗反転の繰り返しや専用ランプの点灯・点滅など、利用者の視覚に対する報知でもよい。また、これら聴覚と視覚との併用でもよい。さらに、単なる報知にとどまらず、他の動作（例えば、録画）を開始するためのトリガー信号として利用することも可能である。 Besides audible notification, visual notification may be used, for example by repeatedly inverting the brightness of the display image 7a shown in FIG. 11 or by lighting or blinking a dedicated lamp. Audible and visual notification may also be combined. Furthermore, beyond mere notification, the signal can be used as a trigger for starting another operation (for example, recording).

また、例えば、ニュース番組において、利用者がスポーツニュースの結果のみを知りたい場合、ユーザーが希望映像登録手段１５Ｃによりその旨を登録しておいた場合には、ニュース番組全体を視聴するのではなく、制御回路部に設けられた構成要素出現頻度判定手段１５により、例えば、図１１（Ａ）に示すように、表示画面１７ａに「スポーツ」の文字がテロップ表示された場合や、図１１（Ｂ）に示すように、キャスターが「スポーツ」を含むアナウンス原稿を読み上げたときに、利用者に報知することができる。 Further, for example, if the user wants to know only the sports results in a news program and has registered this in advance through the desired video registration means 15C, the user does not need to watch the entire news program. Instead, the component appearance frequency determination means 15 provided in the control circuit section can notify the user when, for example, the word "sports" is displayed as a telop on the display screen 17a as shown in FIG. 11(A), or when the newscaster reads out an announcement script containing "sports" as shown in FIG. 11(B).
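The keyword-triggered notification described above can be sketched as a simple match against the telop text and the speech text of each segment. The `Segment` record and the action names are hypothetical, and plain substring matching stands in for whatever matching the actual system performs.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    telop_text: str   # text recognized from on-screen telops
    speech_text: str  # text recognized from the newscaster's speech


def matched_keywords(segment, registered_keywords):
    """Return the registered keywords found in the telop or the speech text."""
    text = segment.telop_text + "\n" + segment.speech_text
    return [kw for kw in registered_keywords if kw in text]


def notify_actions(matches):
    """Map a match to audible, visual, and trigger actions (hypothetical names)."""
    if not matches:
        return []
    return ["raise_volume", "flash_screen", "start_recording"]
```

A segment whose telop reads 「スポーツ」, or whose speech text contains it, matches the registered keyword and yields the notification actions; a weather segment yields none.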

また、映像メタデータの利用の態様としては、番組中に流れる映像中の登場人物、例えば、上述した特定コメンテータのコメント時間や論調分析、放送された内容中（番組中）に紹介された政治家（政党）やスポーツ選手の映像等を含む放送時間といった、映像メタデータのデータベース化を行うとともに、クラスタリング（データを外的基準なしに自動的に分類する機能の意）を行うことにより、人・物のＣＭ換算値を算出するといった重み付けの付与も可能である。 As another use of video metadata, data on the people appearing in program video can be compiled into a database, for example the comment time and tone analysis of the specific commentator mentioned above, or the airtime containing footage of politicians (political parties) or athletes introduced in the broadcast content. By also applying clustering (meaning a function that automatically classifies data without external criteria), weighting such as calculating a CM conversion value for a person or object becomes possible.

なお、蓄積された過去の要約作成結果の入力データと出力データとを教材として最適な要約作成設定を学習する要約作成システム１０の機能である要約作成処理（ＡＩ処理）を利用して上述したような重み付けを付与する場合、ＡＩ処理とは別に、視聴率、或は、新聞や雑誌等の映像メタデータに含まれていない情報に基づいたオペレータの手動入力により、ＣＭ換算値を人物毎に評価価格（単位時間当たりの単価）に変換してもよい。 When such weighting is assigned using the summary creation process (AI processing), a function of the summary creation system 10 that learns optimal summary creation settings from the input and output data of accumulated past summary creation results, the CM conversion value may also be converted, separately from the AI processing, into an evaluation price for each person (a unit price per unit time) through manual input by an operator based on information not contained in the video metadata, such as audience ratings or newspapers and magazines.

さらに、重み付けされたＣＭ換算値は、例えば、単一放送局、単一番組、複数放送局（例えば、関東エリアのキー局）等を対象として映像メタデータを制作し、週報／月報／旬報（四半期）／半期／通期／単位でまとめることができる。なお、まとめたデータはグラフや一覧表（例えば、上位１００人を対象として）等によって出力部７からモニタ出力又はプリンタ出力が可能である。 Furthermore, the weighted CM conversion values can be compiled by producing video metadata for, for example, a single broadcasting station, a single program, or multiple broadcasting stations (for example, the key stations in the Kanto area), and summarizing it in weekly, monthly, quarterly, half-year, or full-year reports. The compiled data can be output from the output unit 7 to a monitor or a printer as graphs, lists (for example, covering the top 100 people), and the like.
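As an illustration of converting airtime into a per-person value and listing the top entries, the sketch below multiplies each person's total airtime by a unit price per second. That product is an assumed formula, and the function names and the price table are hypothetical; the description leaves the actual calculation open.

```python
from collections import defaultdict


def cm_conversion_values(appearances, unit_price_per_second):
    """appearances: iterable of (person, seconds). Returns {person: value}."""
    seconds = defaultdict(float)
    for person, secs in appearances:
        seconds[person] += secs
    # Assumed formula: total airtime multiplied by an operator-supplied unit price.
    return {p: s * unit_price_per_second.get(p, 0.0) for p, s in seconds.items()}


def top_n(values, n=100):
    """Ranked list of the n highest values, e.g. for a top-100 report."""
    return sorted(values.items(), key=lambda kv: kv[1], reverse=True)[:n]
```

The unit prices here play the role of the operator's manual input; a person with no entered price simply receives a value of zero.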

さらに、テキスト化した映像メタデータは、同時放送中の文字放送として利用することができるうえ、例えば、テレビのニュース・放送番組、ワイドショー、討論番組、政治・経済番組、政治・経済バラエティなど、１日単位で延べ１００時間以上にもおよぶ国営放送局及び民放キー局の情報番組について、その内容や記事単位の詳細情報をオペレータによって作成するためのテキスト情報として利用することも可能である。 In addition, textual video metadata can be used as text broadcast during simulcast, for example, TV news/broadcast programs, wide shows, debate programs, political/economic programs, political/economic variety, etc. It is also possible to use as text information for the operator to create detailed information on the contents and article units of the information programs of state-run broadcasting stations and commercial key stations, which total more than 100 hours per day.

＜再生装置１９＞ <Playback device 19>
再生装置１９には、受信部２として、テレビ放送（地デジ・衛星放送・ワンセグを含む）用のチューナ機能、或は、インターネット配信映像を受信する受信機能、を有し、図１１に示すように、その映像を出力部７の表示画面１７ａに出力することが可能であることから、テレビ、パーソナルコンピュータ、スマートフォン、タブレット端末、等を利用することができる。 The playback device 19 has, as the receiving unit 2, a tuner function for television broadcasting (including terrestrial digital, satellite, and One-Seg broadcasting) or a receiving function for Internet-delivered video, and can output that video to the display screen 17a of the output unit 7 as shown in FIG. 11. A television, personal computer, smartphone, tablet terminal, or the like can therefore be used as the playback device.

受信部１２は、ＴＶ局からのＴＶ電波又はインターネットを介して発信される通信電波を受信する。また、再生装置１９外部で要約が作成される場合には、通信手段を介して供給された要約情報を受信する機能を有するように構成されていてもよい。受信部１２で受信した要約は、大容量記憶部１６に記憶、又は更新される。 The receiving unit 12 receives TV radio waves from a TV station or communication radio waves transmitted via the Internet. Further, when a summary is created outside the reproducing device 19, it may be configured to have a function of receiving summary information supplied via communication means. The abstract received by the receiving unit 12 is stored or updated in the large-capacity storage unit 16 .

操作部１３は、テレビに付帯の各種スイッチ等、テレビに付属のリモートコントロール装置、コンピュータ用のマウスやキーボード、スマートフォンやタブレット端末に付帯の各種スイッチやタッチパネル、等を利用することができる。 The operation unit 13 can use various switches and the like attached to the television, remote control devices attached to the television, mice and keyboards for computers, various switches and touch panels attached to smartphones and tablet terminals, and the like.

ところで、上述したテレビ放送において、ニュースでは、ある事件が起きると、複数局あるテレビ放送局が繰り返し同じシーンを放送する。このような場合、各テレビメディアが何をいつどう放送したか、一つ一つ把握しても全体像を容易に認識することはできない場合が多い。 By the way, in the above-mentioned television broadcasting, when a certain incident occurs in the news, a plurality of television broadcasting stations repeatedly broadcast the same scene. In such a case, it is often not possible to easily recognize the whole picture even if one understands what, when and how each television media broadcasted.

そこで、このような事件を所望の条件として設定すれば、指定した全てのニュース放送番組の内容を秒単位でテキストデータ化したうえでデータベース化し、要約を作成することができる。 Therefore, if such incidents are set as desired conditions, the contents of all designated news broadcast programs can be converted into text data in units of seconds and stored in a database to create a summary.

そして、その要約の内容を同一テーマ毎に分類（クラスター化）した結果を分析し、例えば、利用者や契約した専用会社のオペレータが処理すれば、なにが、いつ、どの局で、どのくらい放送されたか、定量化された情報を得ることも可能となる。 Then, by analyzing the result of classifying (clustering) the content of the summaries by theme, for example through processing by the user or by an operator of a contracted specialist company, quantified information can be obtained on what was broadcast, when, by which station, and for how long.
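Once each summary entry carries a theme label, the "what, when, which station, how long" figures reduce to a grouped aggregation. The sketch below assumes the clustering step has already assigned the `theme` field; the record and function names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass
class SummaryEntry:
    theme: str       # cluster label, assumed to be assigned by an earlier step
    station: str
    start: datetime
    seconds: int


def quantify(entries):
    """Return {theme: {station: {"count", "seconds", "first", "last"}}}."""
    report = defaultdict(dict)
    for e in entries:
        row = report[e.theme].setdefault(
            e.station, {"count": 0, "seconds": 0, "first": e.start, "last": e.start}
        )
        row["count"] += 1
        row["seconds"] += e.seconds
        row["first"] = min(row["first"], e.start)
        row["last"] = max(row["last"], e.start)
    return report
```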

そして、このような定量化された情報を、所望の条件として設定することにより、以降のニュース放送では、より最新の正確な条件を設定することも可能となり、上述した事件に関する放送の場合には報知による番組の部分的視聴、他のニュース放送に関しては番組全体を視聴する、といったような選択を行うことができる。 By setting such quantified information as desired conditions, it becomes possible to set more up-to-date and accurate conditions in subsequent news broadcasts. It is possible to make selections such as partial viewing of the program by notification and viewing of the entire program for other news broadcasts.

この定量化に際し、例えば、事件の映像部分（例えば、原子力発電所の事故処理の経過に関する映像部分）を大容量記憶部１６に自動録画するなどの出力機能において重み付けを付与することも可能である。 In this quantification, for example, it is possible to give weight to the output function such as automatically recording the video portion of the incident (for example, the video portion related to the progress of the accident processing at the nuclear power plant) in the large-capacity storage unit 16. .

また、上述したように、このような事件・事故に関する放送がテレビメディアでどのくらい扱われたか、どの局がどのテーマを時間・回数的にどう扱ってきたかをグラフ化するといった利用形態において重み付けを付与することも可能である。 Also, as described above, weighting can be assigned in usage forms such as graphing how much coverage such incidents and accidents received in television media, and how each station handled each theme in terms of airtime and number of broadcasts.

さらに、このような要約には、ニュース放送に限らず、各種エンターテーメント番組の内容を多角的に分析することも可能である。 Furthermore, such summaries are not limited to news broadcasts, and it is possible to analyze the contents of various entertainment programs from various angles.

これにより、例えば、網羅的に構築されたエンタメ・データベースを基に、ドラマ、映画、バラエティなどのエンターテーメント番組の内容やジャンル比較、時間帯把握など、多角的な観点で分析することができる。 As a result, for example, based on a comprehensively constructed entertainment database, entertainment programs such as dramas, movies, and variety shows can be analyzed from multiple perspectives, such as content, genre comparison, and time zone understanding. .

さらに、当該特定の出演者の出演時間を換算し、例えば、日・週・月単位での出演割合等からその演者価値を容易に算出することができる。 Furthermore, the appearance time of the specific performer can be converted, and the performer value can be easily calculated from, for example, the appearance rate on a daily/weekly/monthly basis.

また、上述した出演者の音声は、音声認識後のテキスト化のための形態素解析の際に、方言を標準語へと変換する重み付けを付与することも可能である。 Also, the voice of the performer described above can be weighted to convert the dialect into a standard language at the time of morphological analysis for text conversion after voice recognition.

制御回路部１５は、要約作成システム１０によって作成した要約を適宜（又は逐次）受信して大容量記憶部１６に蓄積するとともに、その要約の蓄積結果に基づいて重み付け付与のための最適な条件を学習しつつ、複数の映像コンテンツに対して要約に含まれる一つ以上の所定の条件に特化した重み付けを付与することができる。 The control circuit unit 15 appropriately (or sequentially) receives the summaries created by the summary creation system 10, stores them in the large-capacity storage unit 16, and determines the optimum conditions for weighting based on the summaries. While learning, weights specific to one or more predetermined conditions included in the summary can be assigned to multiple video contents.

このように、映像情報提供システム１１は、要約書作成手段で作成された要約は、重み付け付与手段によって複数の映像コンテンツを対象として少なくとも一つ以上の所定の条件に特化した重み付けを付与することにより、複数の映像コンテンツを対象として利用者により有用な情報を供給することができる。 In this way, the video information providing system 11 applies weights specific to at least one or more predetermined conditions to a plurality of video contents by the weighting means to the summary created by the summary creating means. Therefore, more useful information can be supplied to the user for a plurality of video contents.

また、複数の映像コンテンツにおいて、出現頻度に応じて（段階的等の）重要度を適正に設定することができる。また、日々放送されるニュース番組等においては、大きな事件や事故などは社会的な重要度（又は関心度）が高いといえる。そこで、そのような事件・事故等の重要度の比較的短期間を対象とすることにより、自動的に所定の条件とすることができる。 In addition, the degree of importance (for example, in graded levels) can be set appropriately according to the appearance frequency across a plurality of video contents. In news programs broadcast daily, major incidents and accidents can be said to have high social importance (or interest). Accordingly, by evaluating the importance of such incidents and accidents over a relatively short period, they can be set automatically as the predetermined condition.

また、そのような事件・事故等を、例えば、要約作成用とは別に、大容量記憶部６に録画記憶しておけば、オペレータ等の編集、記録映像等として残しておくことができる。 In addition, if such incidents, accidents, etc., are recorded and stored in the large-capacity storage unit 6 separately from, for example, for preparing summaries, they can be edited by operators, etc., and can be left as recorded images.

上述した特定の条件には、特定の人・物・事件・事故を含ませることができる。ここで、特定の条件に、例えば、特定の人物、会社、各種団体等を対象とすることにより、複数の映像コンテンツから特定の人物、会社、団体の出現頻度、例えば、出現回数、出現時間、出現率等を割り出すことができる。 The specific conditions mentioned above can include specific persons, objects, incidents, and accidents. Here, by targeting, for example, a specific person, company, various groups, etc., as a specific condition, the frequency of appearance of a specific person, company, or group from a plurality of video contents, such as the number of appearances, the appearance time, The appearance rate, etc. can be calculated.

そして、その出現頻度に基づいて、どの程度の人物価値（例えば、ギャラの設定など）があるかの目安となるＣＭ換算値を算出することができる。 Then, based on that appearance frequency, a CM conversion value can be calculated as a guide to how much a person is worth (for example, when setting an appearance fee).

なお、特定の法人名を対象とした場合には、ＣＭ全体に対する提供割合、法人が販売する各種商品の案分（力の入れ具合や新商品の特定）、株価変動との相互関係、といった解析（重み付け）等に利用することもできる。 When a specific corporate name is targeted, the results can also be used for analysis (weighting) of, for example, the corporation's share of sponsorship among all commercials, the apportionment among the various products the corporation sells (the degree of emphasis, or the identification of new products), and the correlation with stock price movements.

出現頻度は、日・週・月・年単位で集計することができるほか、ＴＶ放送局ごと、時間帯ごと、に分けて集計することもできる。 Appearance frequencies can be aggregated on a daily/weekly/monthly/yearly basis, and can also be aggregated separately for each TV broadcasting station and for each time period.

また、出現頻度には、例えば、特定の番組において、番組単位で出現回数を１回とする場合と、番組内における出現回数を対象とする場合を、重み付けの条件として含ませることができる。 For the appearance frequency, the weighting conditions can also include, for a given program, either counting one appearance per program or counting the number of appearances within the program.

ここで、番組内における出現回数には、映像として出現した回数や出現時間を対象とする場合と、例えば、司会者等から氏名を呼び掛けられた回数を対象とする場合と、を含ませることができる。 Here, the number of appearances in the program may include the number of appearances in the video or the appearance time, and the number of times the name is called out by the moderator, etc., for example. can.
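The two counting modes, and the distinction between on-screen appearances and being called by name, can be sketched as follows. The tuple layout and the `kind` labels are assumptions made for illustration.

```python
from collections import Counter


def count_appearances(mentions, per_program=True,
                      kinds=("on_screen", "called_by_name")):
    """mentions: iterable of (program_id, person, kind).

    per_program=True counts at most one appearance per person per program;
    per_program=False counts every appearance within each program.
    """
    selected = [(prog, person) for prog, person, kind in mentions if kind in kinds]
    if per_program:
        selected = set(selected)  # collapse repeats within the same program
    return Counter(person for _, person in selected)
```

Passing `kinds=("called_by_name",)` restricts the count to the times a person's name was spoken, for example by the host.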

なお、氏名での呼び掛けには、「○×△□」のフルネームの場合、「○×さん」等の氏だけの場合、「△□ちゃん」や「△ちゃん」等の名又は呼称の場合を条件とすることができる。 For addressing by name, the conditions can cover the full name "○×△□", the family name only, such as "○×-san", and the given name or a nickname, such as "△□-chan" or "△-chan".

さらに、その出演者がグループに所属している人であれば、グループ名での呼び掛けの場合と、氏名での呼び掛けの場合と、を（重み付けとして）含ませることができる。同様に、例えば、スポーツ等の分野において過去に好成績を残したことにより、後輩の選手に「○○２世」などと呼称されている場合も、先人と後人の両方を重み付け条件とすることができる。 Furthermore, if the performer belongs to a group, it is possible to include (as weighting) the case of addressing by group name and the case of addressing by name. Similarly, for example, even if a junior player is called "○○ second generation" because he has achieved good results in a field such as sports in the past, both the predecessor and the successor are used as weighting conditions. be able to.

なお、例えば、「○×」の氏での呼び掛けの場合、同一番組中に同じ氏の人が含まれている場合は少ないものの、業界全体としては複数の人が存在する可能性が高い。したがって、このような場合には、番組に出演していない人を除き、番組中に出演している人のみを重み付けの条件として含ませることができる。 For example, in the case of calling by the name of “○×”, although it is rare that a person with the same name is included in the same program, there is a high possibility that there are multiple people in the industry as a whole. Therefore, in such a case, only persons appearing in the program can be included as weighting conditions, excluding persons not appearing in the program.

これとは逆に、例えば、番組に出演はしていない人ではあるものの、出演者が所属するグループに所属する他の人物の氏名や映像等が出た場合には、その人物は出演しているものとして扱うこともできる。 Conversely, when the name or image of a person who does not appear in the program but belongs to the same group as a performer is presented, that person can also be treated as appearing in the program.
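The name-resolution rules described in the preceding paragraphs (full name, family name, nickname, group name, restriction to the cast, and the group-mate exception) can be sketched as below. The data structures and function names are hypothetical.

```python
def resolve_name(spoken, cast, aliases, groups):
    """Return the cast members credited for one spoken name.

    cast:    set of people appearing in the program
    aliases: {person: set of spoken forms such as family name or nickname}
    groups:  {group_name: set of members}
    """
    credited = set()
    # Family name, given name, nickname: only people actually in the cast count.
    for person in cast:
        if spoken == person or spoken in aliases.get(person, set()):
            credited.add(person)
    # Group name: credit the group members who are in the cast.
    if spoken in groups:
        credited |= groups[spoken] & cast
    return credited


def counted_as_appearing(person_shown, cast, groups):
    """A non-cast person may be treated as appearing if a group-mate is in the cast."""
    return any(person_shown in members and bool(members & cast)
               for members in groups.values())
```

With two people sharing the family name "Sato" in the industry but only one in the cast, a call of "Sato" is credited to the cast member alone.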

このように、本実施の形態においては、複数の映像コンテンツについての各ビデオ信号から抽出した複数の映像コンテンツにおける音声データ又は映像データをテキスト化して要約を作成する要約作成手段６００と、要約作成手段６００で作成した要約の蓄積結果に基づいて最適な条件を学習しつつ、複数の映像コンテンツに対して要約に含まれる一つ以上の所定の条件に特化した重み付けを付与する重み付け付与手段１５ｄとを備え、構成要素出現頻度判断手段１５ａは要約に基づき構成要素の出現頻度を判断するように構成されている。 As described above, the present embodiment comprises summary creation means 600, which creates a summary by converting into text the audio data or video data of a plurality of video contents extracted from their respective video signals, and weighting means 15d, which assigns to the plurality of video contents weights specific to one or more predetermined conditions included in the summary while learning optimal conditions from the accumulated summaries created by the summary creation means 600. The component appearance frequency determination means 15a is configured to determine the appearance frequency of a component based on the summary.

構成要素出現頻度判断手段１５ａは映像コンテンツの要約を基礎として映像コンテンツの出現頻度、即ち重要度の判断を行う。 The component appearance frequency determination means 15a determines the frequency of appearance of the video content, that is, the degree of importance, based on the summary of the video content.

即ち、要約作成手段６００により、複数の映像コンテンツについての各ビデオ信号から抽出した複数の映像コンテンツにおける音声データ又は映像データをテキスト化された要約に基づき、映像コンテンツの構成要素の出現頻度が判断される。 That is, the appearance frequency of the components of the video content is determined based on the summary in which the summary creation means 600 has converted into text the audio data or video data of the plurality of video contents extracted from their respective video signals.

この場合、要約作成手段６００により作成された要約は、重み付け付与手段１５ｄによって複数の映像コンテンツを対象として少なくとも一つ以上の所定の条件に特化した重み付けを付与することにより、複数の映像コンテンツを対象として利用者に対して、重要度又は関心度の高いより有用な情報を供給する。 In this case, the weighting means 15d assigns to the summary created by the summary creation means 600 weights specific to at least one predetermined condition across the plurality of video contents, so that more useful information of high importance or interest, drawn from the plurality of video contents, is supplied to the user.

従って、第二実施の形態に係る映像情報システム１１にあっては、構成要素出現頻度判断手段１５ａは映像コンテンツの要約を基礎として映像コンテンツの出現頻度、即ち重要度の判断を行うように構成されていることから、直接に映像コンテンツから、顔認識技術、音声認識、形態認識技術等の各種の高度な技術を各映像の主題を絞り込む場合に比して、より迅速かつ正確に当該構成要素の出現頻度の判断を行うことが可能となる。 Accordingly, in the video information system 11 according to the second embodiment, the component appearance frequency determination means 15a is configured to determine the appearance frequency of the video content, that is, its degree of importance, based on the summary of the video content. Compared with narrowing down the subject of each video directly from the video content using various advanced technologies such as face recognition, speech recognition, and morphological recognition, this allows the appearance frequency of the component to be determined more quickly and accurately.

また、重み付け付与手段１５ｄは、複数の映像コンテンツに含まれる音声データ又は映像データから重複するテキストを参照したうえで、その参照結果が所定値以上である場合に、そのテキストを所定の条件に合致した重要テキストであると判定して重み付けを付与するものである。 The weighting means 15d also looks up text that overlaps across the audio data or video data contained in the plurality of video contents and, when the lookup result is equal to or greater than a predetermined value, determines that the text is important text matching the predetermined condition and assigns a weight to it.
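The important-text test can be pictured as counting in how many video contents each term occurs and comparing that count with a threshold. Treating the "lookup result" as the number of contents containing the term is an interpretation made for this sketch, and the input is assumed to be already tokenized (for example by morphological analysis).

```python
from collections import Counter


def important_texts(contents, threshold):
    """contents: list of token lists, one list per video content.

    A term is counted once per content; terms found in at least
    `threshold` contents are returned as important text.
    """
    content_freq = Counter()
    for tokens in contents:
        content_freq.update(set(tokens))  # one vote per content
    return {term for term, n in content_freq.items() if n >= threshold}
```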

このように、社会的事象に関する情報は複数のメディアにおいて、複数の映像コンテンツとして出現する場合が多いことから、複数の映像コンテンツにおいて、テキストの形態素解析等によって重複するテキストを同一の構成要素として判断するとともに、その出現頻度、例えば、出現回数、出現時間、出現率等に応じて、段階的等の重要度を適正に設定するように構成されていることから、正確に構成要素の出現頻度、即ち、社会的重要度の判断が行われ、ユーザーに対して適切な動画コンテンツが提供される。 In this way, information related to social events often appears as multiple video contents in multiple media. Therefore, in multiple video contents, it is possible to judge overlapping text as the same component by morphological analysis of the text. In addition, since it is configured to appropriately set the degree of importance, such as in stages, according to the appearance frequency, for example, the number of appearances, the appearance time, the appearance rate, etc., the appearance frequency of the constituent elements can be accurately set. That is, the degree of social importance is determined, and appropriate video content is provided to the user.

また、重み付け付与手段１５ｄは、予め設定された期間内における複数映像コンテンツを対象として重要テキストであるか否かを判定するものである。 Also, the weighting unit 15d determines whether or not a plurality of video contents within a preset period are important texts.

このような話題、事件又は事故等の情報の社会的な重要度の判断に要する期間を所定の期間内において判断するものである。従って、社会的な重要度の判断に要する期間を所定の期間内において判断されることから、ユーザーに対して迅速に社会的に重要な映像コンテンツを提供することが可能となる。 The period required for judging the social importance of information such as topics, incidents or accidents is judged within a predetermined period. Therefore, since the period required for determining the degree of social importance can be determined within a predetermined period, it is possible to quickly provide socially important video content to users.

このように、社会的に重要な事件・事故等を、例えば、要約作成用とは別個に、大容量記憶部に録画記憶しておけば、オペレータ等の編集、記録映像等として残しておくことができ、収集した当該ユーザーにとって重要な映像コンテンツを適宜、分析等に再利用することができる映像情報提供システムを提供することができる。 In this way, if socially important incidents, accidents, and the like are recorded and stored in the large-capacity storage unit separately from, for example, the data used for summary creation, they can be retained for editing by operators or as recorded footage. This provides a video information providing system in which the collected video content that is important to the user can be reused as needed for analysis and other purposes.

また、重み付け付与手段１５ｄは、新たな映像コンテンツを対象として、重要テキストを含む映像コンテンツであると判定した場合には、当該映像コンテンツの録画を開始するものである。 Further, when the new video content is determined to be video content including important text, the weighting unit 15d starts recording the video content.

また、重み付け付与手段１５ｄは、複数の映像コンテンツの音声データ又は映像データの少なくとも一方に含まれるテキストから特定の人・物を対象として、その出現頻度からＣＭ換算値を算出するものである。 Further, the weighting unit 15d calculates a CM conversion value from the frequency of appearance of a specific person or object from texts included in at least one of audio data and video data of a plurality of video contents.

そして、その出現回数、出現時間、出現率等に基づいて、マスコミ、情報の世界においてどのくらいの人物価値（例えば、ギャラの設定など）があるかの目安となるＣＭ換算値を算出することができる。なお、特定の法人名を対象とした場合には、ＣＭ全体に対する提供割合、法人が販売する各種商品の案分（力の入れ具合や新商品の特定）、株価変動との相互関係、といった解析に利用することもできる。 Then, based on the number of appearances, appearance time, appearance rate, and so on, a CM conversion value can be calculated as a guide to how much a person is worth in the world of mass media and information (for example, when setting an appearance fee). When a specific corporate name is targeted, the results can also be used for analysis of, for example, the corporation's share of sponsorship among all commercials, the apportionment among the various products the corporation sells (the degree of emphasis, or the identification of new products), and the correlation with stock price movements.

従って、重み付け付与手段１５ｄにより算出されたＣＭ換算値を利用して様々な経済活動の分析指標として使用することができる映像情報算出システムを提供することができる。 Therefore, it is possible to provide a video information calculation system that can use the CM conversion value calculated by the weighting means 15d as an analytical index for various economic activities.

また、重み付け付与手段１５ｄは、複数の映像コンテンツの音声データ又は映像データの少なくとも一方に含まれるテキストから特定の法人名を対象とするものである。 Also, the weighting means 15d targets a specific corporate name from texts included in at least one of audio data and video data of a plurality of video contents.

従って、特定の法人名が出現頻度判断の対象となる構成要素とされることから、特定の法人名が現れる映像コンテンツが抽出され、ユーザーとしての視聴者である当該法人に提供される。 Therefore, since a specific corporate name is used as a component for determining the appearance frequency, video content in which the specific corporate name appears is extracted and provided to the corporation, which is a viewer as a user.

従って、特定の法人名が出現頻度判断の対象となる構成要素とされることから、特定の法人名が現れる映像コンテンツが抽出され、ユーザーとしての視聴者である当該法人に提供されることから、例えば、自社の社会的評価、評判等に関する情報を迅速、適切、かつ網羅的に収集することができ、自社の経営に敏速に反映させることが可能となる。 Therefore, since a specific corporate name is a component that is the target of appearance frequency determination, video content in which a specific corporate name appears is extracted and provided to the corporation, who is a viewer as a user, For example, it is possible to quickly, appropriately, and comprehensively collect information about the company's social evaluation, reputation, etc., and promptly reflect it in the management of the company.

以下、第二実施の形態における要約作成システム１０について説明する。 The summary creation system 10 in the second embodiment is described below.
＜要約作成システム１０の全体構成＞ <Overall configuration of summary creation system 10>
図３に示すように、要約作成システム１０は、ビデオ信号分離部２０、発話テキスト化部１００、テロップテキスト化部２００、背景画像テキスト化部３００、ロゴマークテキスト化部４００、テキスト統合部５００、及び要約作成部６００を備える。本実施形態では要約作成システム１０はビデオ信号をテレビ局３０からのテレビ放送から取得する。なお、ビデオ信号は、インターネットにおける映像から取得することができる。 As shown in FIG. 3, the summary creation system 10 includes a video signal separation unit 20, a speech text conversion unit 100, a telop text conversion unit 200, a background image text conversion unit 300, a logo mark text conversion unit 400, a text integration unit 500, and a summary creation unit 600. In this embodiment, the summary creation system 10 obtains the video signal from a television broadcast from the television station 30. The video signal can also be obtained from video on the Internet.

音声信号と映像信号を含むビデオ信号Ｖは、ビデオ信号分離部２０で音声信号Ａと映像信号Ｂとに分離される。音声信号Ａは発話テキスト化部１００に入力され、映像信号Ｂはテロップテキスト化部２００、背景画像テキスト化部３００、及びロゴマークテキスト化部４００に入力される。 A video signal V containing an audio signal and a video signal is separated into an audio signal A and a video signal B by the video signal separation unit 20. The audio signal A is input to the speech text conversion unit 100, and the video signal B is input to the telop text conversion unit 200, the background image text conversion unit 300, and the logo mark text conversion unit 400.
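The routing of the separated signals to the four text conversion units can be sketched as follows. The units are passed in as plain callables, and the `VideoSignal` type and function names are hypothetical; the sketch shows only which signal reaches which unit, not the recognition itself.

```python
from dataclasses import dataclass


@dataclass
class VideoSignal:
    audio: bytes  # audio signal A
    video: bytes  # video signal B


def separate(signal):
    """Video signal separation unit 20: split V into (A, B)."""
    return signal.audio, signal.video


def route(signal, speech_unit, telop_unit, background_unit, logo_unit):
    """Feed A to the speech unit and B to the three image-based units."""
    audio_a, video_b = separate(signal)
    return {
        "speech": speech_unit(audio_a),          # unit 100
        "telop": telop_unit(video_b),            # unit 200
        "background": background_unit(video_b),  # unit 300
        "logo": logo_unit(video_b),              # unit 400
    }
```

The four outputs would then be merged by the text integration unit 500 before the summary is created.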

＜発話テキスト化部１００＞ <Speech text conversion unit 100>
図４に示すように、発話テキスト化部１００は音声信号Ａを受けて映像コンテンツにおける人の発話内容を記述したテキストである発話テキストを出力する。発話テキスト化部１００は、発話情報抽出部１１０、発話内容認識部１２０、発話内容テキスト化部１３０を備える。 As shown in FIG. 4, the speech text conversion unit 100 receives the audio signal A and outputs speech text, which is text describing what people say in the video content. The speech text conversion unit 100 includes a speech information extraction unit 110, a speech content recognition unit 120, and a speech content text conversion unit 130.

発話情報抽出部１１０は、ビデオ信号Ｖの音声信号Ａから発話情報を抽出する。すなわち、音声信号Ａ中の雑音を取り除き、人の発話音声の情報を抽出する。この発話情報として効果音や特徴的な音楽を含むことができる。 The speech information extraction unit 110 extracts speech information from the audio signal A of the video signal V. That is, it removes noise from the audio signal A and extracts the information of human speech. This speech information can include sound effects and characteristic music.

発話内容認識部１２０は、発話情報から発話内容を認識する。すなわち、発話情報を音響的、文法的に解析して発話内容を言語として認識する。この認識に使用するパラメータ、条件等は後述するように蓄積された過去の音声テキストの生成データから機械学習により生成できる。 The utterance content recognition unit 120 recognizes the utterance content from the utterance information. That is, the utterance information is acoustically and grammatically analyzed to recognize the contents of the utterance as language. The parameters, conditions, and the like used for this recognition can be generated by machine learning from accumulated past speech text generation data, as will be described later.

発話内容テキスト化部１３０は発話内容をテキスト化して出力する。この認識に使用するパラメータ、条件等は後述するように蓄積された過去の音声テキストの入力データ及び生成データから機械学習により生成できる。 The utterance content text conversion unit 130 converts the utterance content into text and outputs it. The parameters, conditions, and the like used for this recognition can be generated by machine learning from input data and generated data of past voice texts that have been accumulated as will be described later.

＜テロップテキスト化部２００＞ <Telop text conversion unit 200>
図５に示すように、テロップテキスト化部２００は映像信号Ｂを受けて映像コンテンツにおけるテロップ内容を記述したテキストであるテロップテキストを出力する。テロップテキスト化部２００は、テロップ情報抽出部２１０、テロップ内容認識部２２０、テロップ内容テキスト化部２３０を備える。 As shown in FIG. 5, the telop text conversion unit 200 receives the video signal B and outputs telop text, which is text describing the content of the telops in the video content. The telop text conversion unit 200 includes a telop information extraction unit 210, a telop content recognition unit 220, and a telop content text conversion unit 230.

テロップ情報抽出部２１０は、ビデオ信号Ｖの映像信号Ｂからテロップ情報を抽出する。すなわち、映像信号Ｂ中の背景を取り除き、テロップ画像だけの情報を抽出する。 The telop information extraction unit 210 extracts telop information from the video signal B of the video signal V. That is, it removes the background from the video signal B and extracts the information of the telop image alone.

発話内容認識部１２０は、テロップ画像情報からテロップ内容を認識する。すなわち、テロップ情報を言語的、文法的に解析してテロップ表示内容を言語として認識する。この認識に使用するパラメータ、条件等は後述するように蓄積された過去のテロップテキストの入力データ及び生成データから機械学習により生成できる。 The telop content recognition unit 220 (written in the original text as "utterance content recognition unit 120") recognizes the telop content from the telop image information. That is, it analyzes the telop information linguistically and grammatically and recognizes the displayed telop content as language. The parameters, conditions, and the like used for this recognition can be generated by machine learning from the accumulated input data and generated data of past telop text, as described later.

テロップ内容テキスト化部２３０はテロップ内容をテキスト化して出力する。この認識に使用するパラメータ、条件等は後述するように蓄積された過去のテロップテキストの入力データ及び生成データから機械学習により生成できる。 The telop content text conversion unit 230 converts the telop content into text and outputs it. The parameters, conditions, and the like used for this recognition can be generated by machine learning from accumulated past telop text input data and generated data, as will be described later.
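The three-stage structure described above (extraction, recognition, text conversion) can be sketched as a simple function pipeline. This is only an illustration under stated assumptions, not the patented implementation: the class name `TextConversionUnit`, the dictionary-shaped `frame`, and the toy lambdas standing in for units 210, 220, and 230 are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class TextConversionUnit:
    """Hypothetical model of one text conversion unit (here, unit 200)."""
    extract: Callable[[Any], Any]    # stands in for information extraction unit 210
    recognize: Callable[[Any], Any]  # stands in for content recognition unit 220
    to_text: Callable[[Any], str]    # stands in for content text conversion unit 230

    def run(self, signal: Any) -> str:
        # Each stage consumes the output of the previous one.
        return self.to_text(self.recognize(self.extract(signal)))


# Toy stand-ins: a "frame" is assumed to carry its telop as a raw string.
telop_unit = TextConversionUnit(
    extract=lambda frame: frame["telop"],             # drop the background
    recognize=lambda region: region.strip().split(),  # "recognize" words
    to_text=lambda words: " ".join(words),            # emit normalized text
)

frame = {"telop": "  Heavy rain   warning ", "background": "street"}
telop_text = telop_unit.run(frame)
print(telop_text)
```

The same shape would apply to the other text conversion units, with different functions plugged into the three slots.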

<Background image text conversion unit 300>
As shown in FIG. 6, the background image text conversion unit 300 receives the video signal B and outputs background image text, which is text describing the content of the background images in the video content. Background images include people, their belongings, their facial expressions, landscapes, the state of buildings, the state of interiors, animals, vehicles, and other objects. The background image text conversion unit 300 includes a background image information extraction unit 310, a background image content recognition unit 320, and a background image content text conversion unit 330.

The background image information extraction unit 310 extracts background image information from the video signal B of the video signal V. That is, it removes telops and unclear images from the video signal B and extracts only the recognizable background image information.

The background image content recognition unit 320 recognizes the content of the background image from the background image information. That is, it analyzes the background image information and recognizes the people, their belongings, their facial expressions, landscapes, the state of buildings, the state of interiors, animals, vehicles, and other objects depicted in it. The parameters, conditions, and the like used for this recognition can be generated by machine learning from the accumulated input data and generated data of past background image texts, as described later.

The background image content text conversion unit 330 converts the background image content into text and outputs it. The parameters, conditions, and the like used for this conversion can be generated by machine learning from the accumulated input data and generated data of past background image texts, as described later.

<Logo mark text conversion unit 400>
As shown in FIG. 7, the logo mark text conversion unit 400 receives the video signal B and outputs logo mark text, which is text describing the content of the logo marks in the video content. Logo marks include trademarks that indicate the origin of goods, and other marks.

The logo mark text conversion unit 400 includes a logo mark image information extraction unit 410, a logo mark content recognition unit 420, and a logo mark content text conversion unit 430.

The logo mark image information extraction unit 410 extracts logo mark image information from the video signal B of the video signal V. That is, it removes telops and background images from the video signal B and extracts only the recognizable logo mark image information.

The logo mark content recognition unit 420 recognizes the content of the logo mark from the logo mark image information. That is, it analyzes the logo mark image information and recognizes the goods, services, stores, facilities, and the like that the logo mark represents. The parameters, conditions, and the like used for this recognition can be generated by machine learning from the accumulated input data and generated data of past logo mark texts, as described later.

The logo mark content text conversion unit 430 converts the logo mark image content into text and outputs it. The parameters, conditions, and the like used for this conversion can be generated by machine learning from the accumulated input data and generated data of past logo mark texts, as described later.

<Text integration unit 500>
As shown in FIG. 8, the text integration unit 500 integrates the utterance text from the utterance text conversion unit 100, the telop text from the telop text conversion unit 200, the background image text from the background image text conversion unit 300, and the logo mark text from the logo mark text conversion unit 400. That is, it corrects contradictions and errors among the texts and generates an integrated text. The parameters, conditions, and the like used for this integration can be generated by machine learning from the accumulated input and output data of past text integration, as described later.
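As a rough illustration of merging the four texts into one, the sketch below concatenates them in a fixed order and drops sentences that have already appeared. The description does not say how contradictions and errors are actually resolved, so this duplicate removal is only a stand-in; the function name `integrate_texts` and the period-based sentence splitting are assumptions made here.

```python
def integrate_texts(utterance: str, telop: str, background: str, logo: str) -> str:
    """Merge four source texts, keeping the first occurrence of each sentence.

    Hypothetical placeholder for the text integration unit 500: it removes
    exact duplicates (case-insensitive) but does not resolve contradictions.
    """
    seen = set()
    merged = []
    for text in (utterance, telop, background, logo):
        for sentence in text.split("."):
            cleaned = sentence.strip()
            key = cleaned.lower()
            if cleaned and key not in seen:
                seen.add(key)
                merged.append(cleaned)
    return (". ".join(merged) + ".") if merged else ""


integrated = integrate_texts(
    "Rain is expected. Trains are delayed.",
    "Trains are delayed.",
    "A station platform.",
    "",
)
print(integrated)
```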

<Summary creation unit 600>
As shown in FIG. 9, the summary creation unit 600 summarizes the integrated text from the text integration unit 500. That is, it condenses the content of the integrated text to a specified number of characters. The parameters, conditions, and the like used for this summarization can be generated by machine learning from the accumulated input data and output data of past summarization processing, as described later.
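The constraint of fitting a summary into a specified number of characters can be illustrated with a simple greedy rule: keep leading sentences while the result stays within the limit. This is not the learned summarization the description refers to; the function name `summarize` and the leading-sentence heuristic are assumptions chosen only to make the character budget concrete.

```python
def summarize(integrated_text: str, max_chars: int) -> str:
    """Keep leading sentences while the summary fits in max_chars characters.

    Hypothetical stand-in for the summary creation unit 600.
    """
    sentences = [s.strip() for s in integrated_text.split(".") if s.strip()]
    summary = ""
    for sentence in sentences:
        candidate = (summary + " " + sentence + ".").strip()
        if len(candidate) > max_chars:
            break  # adding this sentence would exceed the character budget
        summary = candidate
    return summary


text = "Rain is expected. Trains are delayed. A station platform."
short_summary = summarize(text, max_chars=40)
print(short_summary, len(short_summary))
```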

Next, the machine learning processing of each unit will be described.
<Machine learning processing of the utterance text conversion unit 100>
FIG. 3 is a block diagram showing the utterance text conversion unit of the summary creation system. In addition to the utterance information extraction unit 110, the utterance content recognition unit 120, and the utterance content text conversion unit 130, the utterance text conversion unit 100 includes a machine learning unit 140, a content recognition text creation setting unit 150, and a comparison evaluation unit 160. An existing data storage unit 700 is connected to the utterance text conversion unit 100.

The utterance text conversion unit 100 performs machine learning based on the existing video data and existing utterance texts stored in the existing data storage unit 700, and optimizes the utterance content recognition unit 120 and the utterance content text conversion unit 130. The existing data storage unit 700 includes an existing video data storage unit 710, which stores a large number of video data items that people used in the past to create utterance texts, and an existing utterance text storage unit 720, which stores the utterance texts created from the utterance content of that video data. These video data and utterance texts serve as training material for machine learning.

The utterance text conversion unit 100 also includes switching units 170 and 180, which switch the data output between performing machine learning and creating utterance text from new video data.

The content recognition text creation setting unit 150 stores the settings for the utterance content recognition processing of the utterance content recognition unit 120 and the settings for the text conversion processing of the utterance content text conversion unit 130. The utterance content recognition unit 120 and the utterance content text conversion unit 130 recognize the utterance content and convert it into text according to the conditions and parameters set by the content recognition text creation setting unit 150.

The comparison evaluation unit 160 includes a comparison unit 161 and an evaluation unit 162. The comparison unit 161 compares the utterance text that the utterance content text conversion unit 130 created from existing video data supplied by the existing video data storage unit 710 with the corresponding existing utterance text from the existing utterance text storage unit 720. The evaluation unit 162 evaluates the comparison result and gives a high score when the two texts match well.
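The comparison and scoring step can be sketched with a generic string similarity measure. The description does not name a metric, so the use of Python's `difflib.SequenceMatcher` and the 0 to 100 score scale are assumptions; `compare` and `evaluate` are hypothetical names loosely mirroring the comparison unit 161 and the evaluation unit 162.

```python
from difflib import SequenceMatcher


def compare(generated_text: str, existing_text: str) -> float:
    """Return a similarity ratio in [0.0, 1.0] between the two texts."""
    return SequenceMatcher(None, generated_text, existing_text).ratio()


def evaluate(similarity: float, max_score: int = 100) -> int:
    """Map the similarity to an integer score; a closer match scores higher."""
    return round(similarity * max_score)


exact = evaluate(compare("heavy rain warning", "heavy rain warning"))
partial = evaluate(compare("heavy rain", "heavy rain warning"))
unrelated = evaluate(compare("abc", "xyz"))
print(exact, partial, unrelated)
```

A character-level ratio is a crude proxy for agreement between texts; a word-level or task-specific metric could be substituted without changing the surrounding structure.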

The machine learning unit 140 receives the evaluation from the evaluation unit 162 and changes the settings held in the content recognition text creation setting unit 150. This processing is repeated for the same video data so that the evaluation value from the evaluation unit 162 becomes as high as possible. The processing can also be repeated for multiple video data items.

Such machine learning improves the capabilities of the utterance content recognition unit 120 and the utterance content text conversion unit 130. After the predetermined machine learning is finished, the utterance text conversion unit 100 is ready to process new video data and output optimal utterance text.
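The loop described above (generate text, score it against the existing text, change the settings, repeat) can be sketched as a search over candidate settings. The description does not specify the learning algorithm, so this plain try-and-keep-the-best search is an assumption, as are the single `threshold` setting, the toy `convert` function, and the representation of video data as (word, confidence) pairs.

```python
from difflib import SequenceMatcher


def convert(video, settings):
    """Toy converter: keep recognized words whose confidence meets the threshold."""
    return " ".join(word for word, conf in video if conf >= settings["threshold"])


def score(generated_text, existing_text):
    """Similarity in [0.0, 1.0]; stands in for the comparison and evaluation units."""
    return SequenceMatcher(None, generated_text, existing_text).ratio()


def train(video, existing_text, settings, candidates):
    """Try each candidate setting and keep it only if the score improves."""
    best = score(convert(video, settings), existing_text)
    for value in candidates:
        trial = {**settings, "threshold": value}
        trial_score = score(convert(video, trial), existing_text)
        if trial_score > best:
            settings, best = trial, trial_score
    return settings, best


# Hypothetical training pair: recognized words with confidences, plus the
# human-made text for the same video.
video = [("heavy", 0.9), ("rain", 0.8), ("uh", 0.3), ("warning", 0.7)]
existing_text = "heavy rain warning"

settings, best = train(video, existing_text, {"threshold": 0.0}, [0.2, 0.5, 0.95])
print(settings, best)
```

Running `train` over several video and text pairs in turn would correspond to repeating the processing for multiple video data items.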

<Machine learning of the telop text conversion unit 200>
FIG. 4 is a block diagram showing the telop text conversion unit of the summary creation system. In addition to the telop information extraction unit 210, the telop content recognition unit 220, and the telop content text conversion unit 230, the telop text conversion unit 200 includes a machine learning unit 240, a content recognition text creation setting unit 250, and a comparison evaluation unit 260. The existing data storage unit 700 is connected to the telop text conversion unit 200.

The telop text conversion unit 200 performs machine learning based on the existing video data and existing telop texts stored in the existing data storage unit 700, and optimizes the telop content recognition unit 220 and the telop content text conversion unit 230. The existing data storage unit 700 includes the existing video data storage unit 710, which stores a large number of video data items that people used in the past to create telop texts, and an existing telop text storage unit 730, which stores the telop texts created from the telop content of that video data. These video data and telop texts serve as training material for machine learning.

The telop text conversion unit 200 also includes switching units 270 and 280, which switch the data output between performing machine learning and creating telop text from new video data.

The content recognition text creation setting unit 250 stores the settings for the telop content recognition processing of the telop content recognition unit 220 and the settings for the text conversion processing of the telop content text conversion unit 230. The telop content recognition unit 220 and the telop content text conversion unit 230 recognize the telop content and convert it into text according to the conditions and parameters set by the content recognition text creation setting unit 250.

The comparison evaluation unit 260 includes a comparison unit 261 and an evaluation unit 262. The comparison unit 261 compares the telop text that the telop content text conversion unit 230 created from existing video data supplied by the existing video data storage unit 710 with the corresponding existing telop text from the existing telop text storage unit 730. The evaluation unit 262 evaluates the comparison result and gives a high score when the two texts match well.

The machine learning unit 240 receives the evaluation from the evaluation unit 262 and changes the settings held in the content recognition text creation setting unit 250. This processing is repeated for the same video data so that the evaluation value from the evaluation unit 262 becomes as high as possible. The processing can also be repeated for multiple video data items.

Such machine learning improves the capabilities of the telop content recognition unit 220 and the telop content text conversion unit 230. After the predetermined machine learning is finished, the telop text conversion unit 200 is ready to process new video data and output optimal telop text.

<Machine learning of the background image text conversion unit 300>
FIG. 6 is a block diagram showing the background image text conversion unit of the summary creation system. In addition to the background image information extraction unit 310, the background image content recognition unit 320, and the background image content text conversion unit 330, the background image text conversion unit 300 includes a machine learning unit 340, a content recognition text creation setting unit 350, and a comparison evaluation unit 360. The existing data storage unit 700 is connected to the background image text conversion unit 300.

The background image text conversion unit 300 performs machine learning based on the existing video data and existing background image texts stored in the existing data storage unit 700, and optimizes the background image content recognition unit 320 and the background image content text conversion unit 330. The existing data storage unit 700 includes the existing video data storage unit 710, which stores a large number of video data items that people used in the past to create background image texts, and an existing background image text storage unit 740, which stores the background image texts created from the background image content of that video data. These video data and background image texts serve as training material for machine learning.

The background image text conversion unit 300 also includes switching units 370 and 380, which switch the data output between performing machine learning and creating background image text from new video data.

The content recognition text creation setting unit 350 stores the settings for the background image content recognition processing of the background image content recognition unit 320 and the settings for the text conversion processing of the background image content text conversion unit 330. The background image content recognition unit 320 and the background image content text conversion unit 330 recognize the background image content and convert it into text according to the conditions and parameters set by the content recognition text creation setting unit 350.

The comparison evaluation unit 360 includes a comparison unit 361 and an evaluation unit 362. The comparison unit 361 compares the background image text that the background image content text conversion unit 330 created from existing video data supplied by the existing video data storage unit 710 with the corresponding existing background image text from the existing background image text storage unit 740. The evaluation unit 362 evaluates the comparison result and gives a high score when the two texts match well.

The machine learning unit 340 receives the evaluation from the evaluation unit 362 and changes the settings held in the content recognition text creation setting unit 350. This processing is repeated for the same video data so that the evaluation value from the evaluation unit 362 becomes as high as possible. The processing can also be repeated for multiple video data items.

Such machine learning improves the capabilities of the background image content recognition unit 320 and the background image content text conversion unit 330. After the predetermined machine learning is finished, the background image text conversion unit 300 is ready to process new video data and output optimal background image text.

<Machine learning of the logo mark text conversion unit 400>
FIG. 7 is a block diagram showing the logo mark text conversion unit of the summary creation system. In addition to the logo mark image information extraction unit 410, the logo mark content recognition unit 420, and the logo mark content text conversion unit 430, the logo mark text conversion unit 400 includes a machine learning unit 440, a content recognition text creation setting unit 450, and a comparison evaluation unit 460. The existing data storage unit 700 is connected to the logo mark text conversion unit 400.

The logo mark text conversion unit 400 performs machine learning based on the existing video data and existing logo mark texts stored in the existing data storage unit 700, and optimizes the logo mark content recognition unit 420 and the logo mark content text conversion unit 430. The existing data storage unit 700 includes the existing video data storage unit 710, which stores a large number of video data items that people used in the past to create logo mark texts, and an existing logo mark text storage unit 750, which stores the logo mark texts created from the logo mark content of that video data. These video data and logo mark texts serve as training material for machine learning.

The logo mark text conversion unit 400 also includes switching units 470 and 480, which switch the data output between performing machine learning and creating logo mark text from new video data.

The content recognition text creation setting unit 450 stores the settings for the logo mark image content recognition processing of the logo mark content recognition unit 420 and the settings for the text conversion processing of the logo mark content text conversion unit 430. The logo mark content recognition unit 420 and the logo mark content text conversion unit 430 recognize the logo mark content and convert it into text according to the conditions and parameters set by the content recognition text creation setting unit 450.

The comparison evaluation unit 460 includes a comparison unit 461 and an evaluation unit 462. The comparison unit 461 compares the logo mark text that the logo mark content text conversion unit 430 created from existing video data supplied by the existing video data storage unit 710 with the corresponding existing logo mark text from the existing logo mark text storage unit 750. The evaluation unit 462 evaluates the comparison result and gives a high score when the two texts match well.

The machine learning unit 440 receives the evaluation from the evaluation unit 462 and changes the settings held in the content recognition text creation setting unit 450. This processing is repeated for the same video data so that the evaluation value from the evaluation unit 462 becomes as high as possible. The processing can also be repeated for multiple video data items.

Such machine learning improves the capabilities of the logo mark content recognition unit 420 and the logo mark content text conversion unit 430. After the predetermined machine learning is finished, the logo mark text conversion unit 400 is ready to process new video data and output optimal logo mark text.

<Machine learning of the text integration unit 500>
FIG. 8 is a block diagram showing the text integration unit of the summary creation system. The text integration unit 500 includes an integrated text creation unit 510, an integrated text creation setting unit 520, a machine learning unit 530, and a comparison evaluation unit 540. The existing data storage unit 700 is connected to the text integration unit 500.

The text integration unit 500 performs machine learning based on the existing texts of each type stored in the existing data storage unit 700, namely the utterance text, telop text, background image text, and logo mark text, and on the existing integrated texts, and optimizes the operation of the integrated text creation unit 510. The existing data storage unit 700 includes an existing various text storage unit 760, which stores the various text data used in the past to create integrated texts, and an existing integrated text storage unit 770, which stores the integrated texts created from those various texts. These various texts and integrated texts serve as training material for machine learning.

The text integration unit 500 also includes switching units 570 and 580, which switch the data output between performing machine learning and creating a new integrated text from new various texts.

The integrated text creation setting unit 520 stores the settings for the text integration processing of the integrated text creation unit 510. The integrated text creation unit 510 performs text integration processing according to the conditions and parameters set by the integrated text creation setting unit 520.

The comparison evaluation unit 540 includes a comparison unit 541 and an evaluation unit 542. The comparison unit 541 compares the integrated text that the integrated text creation unit 510 created from existing various texts supplied by the existing various text storage unit 760 with the corresponding existing integrated text from the existing integrated text storage unit 770. The evaluation unit 542 evaluates the comparison result and gives a high score when the two texts match well.

The machine learning unit 530 receives the evaluation from the evaluation unit 542 and changes the settings held in the integrated text creation setting unit 520. This processing is repeated for the same various text data so that the evaluation value from the evaluation unit 542 becomes as high as possible. The processing can also be repeated for multiple sets of various text data.

Such machine learning improves the capability of the integrated text creation unit 510. After the predetermined machine learning is finished, the text integration unit 500 is ready to process the texts derived from new video data and output an optimal integrated text.

<Machine learning of the summary creation unit 600>
FIG. 9 is a block diagram showing the summary creation unit of the summary creation system. The summary creation unit 600 includes a summary text creation unit 610, a summary creation setting unit 620, a machine learning unit 630, and a comparison evaluation unit 640. The existing data storage unit 700 is connected to the summary creation unit 600.

The summary creation unit 600 performs machine learning based on the integrated texts and summary texts stored in the existing data storage unit 700, and optimizes the operation of the summary text creation unit 610. The existing data storage unit 700 includes the existing integrated text storage unit 770, which stores the integrated text data used in the past to create summary texts, and an existing summary text storage unit 780, which stores the summary texts created from those integrated texts. These integrated texts and summary texts serve as training material for machine learning.

The summary creation unit 600 also includes switching units 670 and 680, which switch the data output between performing machine learning and creating a new summary text from a new integrated text.

The summary creation setting unit 620 stores the settings for the summarization processing of the summary text creation unit 610. The summary text creation unit 610 performs text summarization processing according to the conditions and parameters set by the summary creation setting unit 620.

The comparison evaluation unit 640 includes a comparison unit 641 and an evaluation unit 642. The comparison unit 641 compares the summary text that the summary text creation unit 610 created from an existing integrated text supplied by the existing integrated text storage unit 770 with the corresponding existing summary text from the existing summary text storage unit 780. The evaluation unit 642 evaluates the comparison result and gives a high score when the two texts match well.

The machine learning unit 630 receives the evaluation from the evaluation unit 642 and changes the settings held in the summary creation setting unit 620. This processing is repeated for the same integrated text data so that the evaluation value from the evaluation unit 642 becomes as high as possible. The processing can also be repeated for multiple integrated text data items.

Such machine learning improves the capability of the summary text creation unit 610. After the predetermined machine learning is finished, the summary creation unit 600 is ready to process the integrated text derived from new video data and output an optimal summary text.

Next, the processing of the summary creation system 10 will be described. FIG. 10 is a flowchart showing the operation of the summary creation system.

First, existing video signals and various text data are read into the existing video data storage unit 710, the existing utterance text storage unit 720, the existing telop text storage unit 730, the existing background image text storage unit 740, the existing logo mark text storage unit 750, the existing various text storage unit 760, the existing integrated text storage unit 770, and the existing summary text storage unit 780 of the existing data storage unit 700 (step ST1).
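The layout of the existing data storage unit 700 can be pictured as one container with a field per sub-unit. The sketch below is a hypothetical in-memory model: the class name, the field names, and the assumption that inputs and human-made texts are aligned by list index are introduced here for illustration and are not stated in the description.

```python
from dataclasses import dataclass, field


@dataclass
class ExistingDataStore:
    """Hypothetical in-memory model of the existing data storage unit 700."""
    video_data: list = field(default_factory=list)              # unit 710
    utterance_texts: list = field(default_factory=list)         # unit 720
    telop_texts: list = field(default_factory=list)             # unit 730
    background_image_texts: list = field(default_factory=list)  # unit 740
    logo_mark_texts: list = field(default_factory=list)         # unit 750
    various_texts: list = field(default_factory=list)           # unit 760
    integrated_texts: list = field(default_factory=list)        # unit 770
    summary_texts: list = field(default_factory=list)           # unit 780

    def training_pairs(self, inputs: str, targets: str) -> list:
        """Pair each stored input with the existing text at the same index."""
        return list(zip(getattr(self, inputs), getattr(self, targets)))


# Step ST1, in miniature: load existing data, then form training pairs.
store = ExistingDataStore(
    video_data=["video_1", "video_2"],
    utterance_texts=["utterance text 1", "utterance text 2"],
)
pairs = store.training_pairs("video_data", "utterance_texts")
print(pairs)
```

The same accessor would yield (integrated text, summary text) pairs for the summary creation unit via `training_pairs("integrated_texts", "summary_texts")`.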

Next, machine learning processing is performed in the speech text conversion unit 100, the telop text conversion unit 200, the background image text conversion unit 300, and the logo mark text conversion unit 400 (steps ST2a, ST2b, ST2c, ST2d). This learning processing can also be performed sequentially.

Next, existing input data and output data are read into the existing data storage unit 550 of the text integration unit 500 and the existing data storage unit 650 of the summary creation unit 600 (step ST3). Machine learning processing is then performed in the text integration unit 500 and the summary creation unit 600 (steps ST3a, ST3b). This learning processing can also be performed sequentially. Note that the machine learning processing of the speech text conversion unit 100, the telop text conversion unit 200, the background image text conversion unit 300, and the logo mark text conversion unit 400, and the machine learning processing of the text integration unit 500 and the summary creation unit 600, may be performed in either order.

When the learning processing is complete (yes in step ST4), the video signal to be processed is input to the video signal separation unit 20 (step ST5). The speech text conversion unit 100, the telop text conversion unit 200, the background image text conversion unit 300, and the logo mark text conversion unit 400 then execute text conversion processing (steps ST6a, ST6b, ST6c, ST6d).

The text integration unit 500 then integrates the texts (step ST7), the summary creation unit 600 summarizes the integrated text (step ST8), the summary text is output, and the processing of the summary creation system 10 ends.

From the next summary creation onward, the machine learning processing (steps ST1 to ST4) need not be performed; an optimal summary can be created simply by inputting the video signal to be summarized (step ST5). The machine learning processing can also be performed again as needed.
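The flow of steps ST1 to ST8 can be outlined roughly as below. This is a sketch only: the class `SummaryPipeline`, its fields, and the callables passed to it are hypothetical names introduced for illustration, and the training step is reduced to a counter because the description leaves the learning internals to each unit.

```python
from dataclasses import dataclass
from typing import Callable, Dict

Converter = Callable[[bytes], str]  # one text conversion unit (100, 200, 300, 400)


@dataclass
class SummaryPipeline:
    converters: Dict[str, Converter]           # steps ST6a to ST6d
    integrate: Callable[[Dict[str, str]], str]  # text integration unit 500 (ST7)
    summarize: Callable[[str], str]             # summary creation unit 600 (ST8)
    train_count: int = 0

    def train(self) -> None:
        """Placeholder for steps ST1 to ST4 (load existing data, run learning)."""
        self.train_count += 1

    def run(self, video_signal: bytes) -> str:
        """Step ST5 onward; learning runs only if it has not been done yet."""
        if self.train_count == 0:
            self.train()
        texts = {name: convert(video_signal) for name, convert in self.converters.items()}
        integrated = self.integrate(texts)
        return self.summarize(integrated)


pipeline = SummaryPipeline(
    converters={"speech": lambda v: "speech text", "telop": lambda v: "telop text"},
    integrate=lambda texts: " / ".join(texts[key] for key in sorted(texts)),
    summarize=lambda text: text.upper(),
)
```

Calling `pipeline.run(...)` a second time skips the learning step, which mirrors the statement that steps ST1 to ST4 are not repeated for later summaries unless needed.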

The system described above can be realized by executing application software on a computer system equipped with a CPU (Central Processing Unit) as a processing device and with RAM (Random Access Memory), ROM (Read Only Memory), an HDD (Hard Disk Drive), an SSD (Solid State Drive), or the like as storage devices. The units need not be located in the same place; some of them may be placed on the web and connected via a network. Because this processing handles a large amount of data, it is preferably performed using a GPU (Graphics Processing Unit).

That is, the integrated text is simply a textualization of audio, characters, background video, and the like, and amounts to a huge volume of text data. For this reason, dedicating the GPU to text processing enables high-speed processing.

The text input handled by the text integration unit 500 is not limited to speech text, telop text, background image text, and logo mark text.

For example, when television programs (digital terrestrial television broadcast programs) are targeted, characters and audio obtained from the electronic program guide (EPG), closed-caption broadcasts, and commentary broadcasts can be acquired as text and input. This improves the quality and quantity of the integrated text, as well as the versatility of the text and its match to viewer tastes.

Similarly, when Internet video distribution is targeted, third-party evaluations (including comments) and reputation can be acquired as text and input. This improves the quality and quantity of the integrated text, as well as the versatility of the text and its match to viewer tastes.

For the "weighting", the number of views (playback count) and third-party ratings (good/bad) can also be used, for example as the ratio of ratings to the number of views or as the ratio of good to bad ratings.
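As a rough illustration of the two quantities mentioned above, the sketch below computes a rating-to-view ratio and a good/bad measure. The function names are hypothetical, expressing the good/bad ratio as the share of good ratings (to avoid division by zero when there are no bad ratings) is an assumption, and multiplying the two values into one weight is likewise an assumption, since the description does not say how they are combined.

```python
def evaluation_rate(views: int, good: int, bad: int) -> float:
    """Ratio of ratings (good + bad) to the number of views."""
    return (good + bad) / views if views > 0 else 0.0


def good_share(good: int, bad: int) -> float:
    """Share of good ratings among all ratings (assumed form of the good/bad ratio)."""
    total = good + bad
    return good / total if total > 0 else 0.0


def combined_weight(views: int, good: int, bad: int) -> float:
    """Hypothetical combination: engagement multiplied by approval."""
    return evaluation_rate(views, good, bad) * good_share(good, bad)
```

For a video with 1000 views, 80 good ratings, and 20 bad ratings, `evaluation_rate` is 100/1000 and `good_share` is 80/100.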

As described above, in the video information providing system 11 according to the second embodiment, the weighting means 15d gives the summary created by the summary creation means 600 a weighting specialized for at least one predetermined condition across a plurality of video contents, so that more useful information covering the plurality of video contents can be supplied to the user.

In the second embodiment, data is processed and summarized quickly and appropriately by AI (artificial intelligence) processing. The AI processing can be realized by the machine learning (ML) described above. Supervised learning, in which existing data serves as the correct answer, can be adopted as the machine learning. It is also effective to perform the machine learning by deep learning (DL).

In deep learning, a large number of existing video data, together with the various text data, integrated texts, and summary texts corresponding to each video data, are used as big data for learning. Each machine learning unit performs processing with a neural network that has an input layer, a plurality of intermediate layers, and an output layer and contains a large number of neurons. The inputs are the new video data supplied to the summary creation system according to the present invention and the various texts, integrated text, and summary derived from that video data. The weights and parameters of the neurons in the intermediate layers are optimized by a technique such as the method of least squares so that the resulting output approaches the existing various texts, integrated texts, and summaries.
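One common way to write the least-squares criterion mentioned above is shown below. The notation is an assumption introduced here for illustration: $f_\theta$ is the neural network with weights and parameters $\theta$, $x_i$ is the $i$-th input, $t_i$ is the corresponding existing (correct-answer) output encoded as a numeric vector, and $N$ is the number of training pairs. The gradient-descent update with learning rate $\eta$ is one possible optimization method, not one named in the description.

```latex
E(\theta) = \frac{1}{2} \sum_{i=1}^{N} \left\| f_\theta(x_i) - t_i \right\|^2,
\qquad
\theta \leftarrow \theta - \eta \, \nabla_\theta E(\theta)
```

Minimizing $E(\theta)$ drives the network output toward the existing texts, which corresponds to raising the score given by each evaluation unit.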

The scope of rights of the present invention is not limited to the configurations described in the first and second embodiments, but extends to all configurations included within the scope of the present invention.

1: Video information providing system
2: Reception unit
3: Operation unit
4: Storage unit
5: Control circuit unit
5a: Component appearance frequency determination means
5b: Video content extraction means
5c: Desired video registration means
6: Mass storage unit
7: Output unit (video content providing means, output means)
7a: Display video object
9: Playback device
10: Summary creation system (summary creation means)
11: Video information providing system
12: Reception unit
13: Operation unit
14: Storage unit
15: Control circuit unit
15a: Component appearance frequency determination means
15b: Video content extraction means
15c: Desired video registration means
15d: Weighting means
16: Mass storage unit
17: Output unit (output means)
17a: Display screen
18: Video signal processing unit
19: Playback device
20: Video signal separation unit
30: TV station
40: Internet
100: Speech text conversion unit
110: Speech information extraction unit
120: Speech content recognition unit
130: Speech content text conversion unit
140: Machine learning unit
150: Content recognition text creation setting unit
160: Comparison evaluation unit
161: Comparison unit
162: Evaluation unit
170: Switching unit
180: Switching unit
200: Telop text conversion unit
210: Telop information extraction unit
220: Telop content recognition unit
230: Telop content text conversion unit
240: Machine learning unit
250: Content recognition text creation setting unit
260: Comparison evaluation unit
261: Comparison unit
262: Evaluation unit
270: Switching unit
280: Switching unit
300: Background image text conversion unit
310: Background image information extraction unit
320: Background image content recognition unit
330: Background image content text conversion unit
340: Machine learning unit
350: Content recognition text creation setting unit
360: Comparison evaluation unit
361: Comparison unit
362: Evaluation unit
370: Switching unit
380: Switching unit
400: Logo mark text conversion unit
410: Logo mark image information extraction unit
420: Logo mark content recognition unit
430: Logo mark content text conversion unit
440: Machine learning unit
450: Content recognition text creation setting unit
460: Comparison evaluation unit
461: Comparison unit
462: Evaluation unit
470: Switching unit
480: Switching unit
500: Text integration unit
510: Integrated text creation unit
520: Integrated text creation setting unit
530: Machine learning unit
540: Comparison evaluation unit
541: Comparison unit
542: Evaluation unit
550: Existing data storage unit
570: Switching unit
580: Switching unit
600: Summary creation unit (summary creation means)
610: Summary text creation unit
620: Summary creation setting unit
630: Machine learning unit
640: Comparison evaluation unit
641: Comparison unit
642: Evaluation unit
650: Existing data storage unit
670: Switching unit
680: Switching unit
700: Existing data storage unit
710: Existing video data storage unit
720: Existing speech text storage unit
730: Existing telop text storage unit
740: Existing background image text storage unit
750: Existing logo mark text storage unit
760: Existing various text storage unit
770: Existing integrated text storage unit
780: Existing summary text storage unit

Claims

1. A video information providing system comprising:
receiving means for receiving distributed video content;
storage means for storing and accumulating the video content received by the receiving means;
component appearance frequency determination means for analyzing the components of the distributed video content, comparing the analyzed components with a large number of past video contents accumulated in the storage means, and determining the appearance frequency of the same component;
video content extraction means for extracting, from the large number of past video contents accumulated in the storage means, video content containing the component when the appearance frequency is equal to or greater than a predetermined value based on the determination by the component appearance frequency determination means; and
video content providing means for providing a viewer with the video content extracted by the video content extraction means,
wherein the component appearance frequency determination means analyzes components included in the distributed video content and compares the analyzed components with the large number of past video contents accumulated in the storage means to determine the appearance frequency of the same component, and
wherein, when it is determined that the appearance frequency of a predetermined component has reached a predetermined value within a predetermined period, the video content extraction means extracts video content including the component from the large number of video contents stored in the storage means, and the video content providing means provides the extracted video content to a viewer.

2. The video information providing system according to claim 1, wherein said component appearance frequency determination means determines the appearance frequency of said component based on a viewing request registered by a viewer.

3. The video information providing system according to claim 1, wherein said components are characters, sounds, video objects that are themes of said video content, characters, or backgrounds of video objects.

4. The structural element appearance frequency determining means determines the frequency of appearance of the structural elements by referring to the title portion displayed by characters in the video screen. or the image information providing system according to item 1.

5. The video information providing system according to any one of claims 1, 2, 3, and 4, wherein said video content is distributed from a server via an electric communication line.

6. The video information providing system according to any one of claims 1, 2, 3, and 4, wherein said video content is distributed from a broadcasting station via an electric communication line.

7. The video information providing system according to claim 1, further comprising:
summary creation means for creating a summary by converting audio data or video data in the video content, extracted from each video signal of the video content, into text; and
weighting means for applying, to a plurality of video contents, appropriate weighting specialized for one or more predetermined conditions included in the summary, while learning the optimum condition based on the accumulated results of the summaries created by the summary creation means,
wherein the component appearance frequency determination means determines the appearance frequency of the component based on the summary.

8. The video information providing system according to claim 7, wherein the weighting means refers to overlapping text derived from audio data or video data included in the plurality of video contents and, if the reference result is equal to or greater than a predetermined value, determines that the text is important text meeting the predetermined condition and weights it.

9. The video information providing system according to claim 8, wherein said weighting means determines, for a plurality of video contents within a preset period, whether the text is the important text.

10. The weight assigning means starts recording of new video content when it is determined that the new video content includes the important text. The video information providing system according to any one of 1.

11. The video information providing system according to claim 7, wherein the weighting means calculates a CM conversion value from the frequency of appearance of a specific person or object in the text included in at least one of the audio data and the video data of the plurality of video contents.

12. The video information providing system according to claim 7, wherein said weighting means targets a specific corporate name in text included in at least one of said audio data and said video data of said plurality of video contents.