JP2019213038A - Video information providing system

Info

Publication number: JP2019213038A
Application number: JP2018107238A
Authority: JP
Inventors: 孝利石井; Takatoshi Ishii
Original assignee: JCC KK
Current assignee: JCC KK
Priority date: 2018-06-04
Filing date: 2018-06-04
Publication date: 2019-12-12
Anticipated expiration: 2038-06-04
Also published as: JP7137825B2

Abstract

To provide a video information providing system capable of quickly and appropriately supplying useful video content to a user from among a large number of distributed video contents. SOLUTION: A system comprises: component appearance frequency determining means for determining the appearance frequency of an identical component on the basis of the components of distributed video content; video content extracting means for extracting video content that includes the component when, on the basis of the determination by the component appearance frequency determining means, the appearance frequency is equal to or higher than a predetermined value; and video content providing means for providing a viewer with the video extracted by the video content extracting means. SELECTED DRAWING: Figure 1

Description

The present invention relates to a video information providing system, and more particularly to a video information providing system that determines the appearance frequency of video content appearing on TV and the web and provides video content with a high appearance frequency to the user in a timely manner.

At present, a large volume of widely varied video information is continuously distributed to viewers and users via TV and the web. Because this information is supplied automatically around the clock, it is becoming difficult for viewers and users to extract the video content that is useful to them appropriately and in a timely manner.

That is, busy modern people are increasingly unable to do what they once did: sit in front of a TV receiver for a set length of time, watch all of the programs and the large volume of video information distributed by broadcasting stations, and pick out the video content and information they need.

The same applies to obtaining information from the web. To obtain necessary information from the web, a user runs a keyword search on an Internet search engine such as Google (registered trademark). However, a keyword search also returns a large amount of so-called "noise", that is, information that merely resembles the desired information.

Consequently, even when using a search engine, the user must still pick out the desired information from a large amount of related information, so the search task remains cumbersome.

Meanwhile, a technique is known for searching video that is being output as video content, for example in a television broadcast, for video similar to video registered in advance, within a certain period.

Such a search technique is used, for example, to detect a specific title roll in a television broadcast signal in order to start and stop real-time recording, or to detect the same news material broadcast at different times or by different stations in order to analyze the structure of the video (see, for example, Patent Document 1).

Such a search technique is not limited to television broadcasts; it can also be applied to distribution data such as video content received over an Internet connection (see, for example, Patent Document 2).

Furthermore, such a search technique is not limited to video and can also handle text, for example. Specifically, in addition to subtitle text included in the video content, it can use metadata supplied by the provider of a paid service (also called a program metadata service) that distributes, after a broadcast program ends, metadata such as the broadcast start time, broadcast end time, performers, and a summary of the content of each segment of the program. It can also use text describing the video content that a user enters by operating a keyboard or the like (see, for example, Patent Document 2).

Patent Document 1: JP 2010-262413 A
Patent Document 2: JP 2012-038239 A

However, these techniques target, for example, a single program or a single piece of video content. They therefore could not meet the demand described above: viewers and users want to obtain, in a time-efficient manner, useful video content of high importance or high interest from among all programs provided on TV, and likewise from among all video content uploaded to the web.

To solve the problems described above, an object of the present invention is to provide a video information providing system that can quickly and appropriately supply video content useful to the user from among a large number of distributed video contents.

To achieve the above object, the video information providing system according to the present invention comprises: component appearance frequency determining means for determining the appearance frequency of an identical component on the basis of the components of distributed video content; video content extracting means for extracting video content that includes the component when, on the basis of the determination by the component appearance frequency determining means, the appearance frequency is equal to or higher than a predetermined value; and video content providing means for providing a viewer with the video extracted by the video content extracting means.

For distributed video content, the component appearance frequency determining means analyzes the components included in the video content and compares them with video content accumulated in the past, thereby determining the appearance frequency of an identical component. Here, the video content consists mainly of moving images but also includes still images. Audio and text present in a moving image as video components are also subject to the determination.

When the component appearance frequency determining means determines that the appearance frequency of a given component has reached a predetermined value within a predetermined period, the video content extracting means extracts the video content that includes that component from the large number of accumulated video contents.
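
The determination and extraction steps described above can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the `Content` record, the `extract_frequent` function, and the sample data are all hypothetical, and the components are assumed to have already been recognized (for example as words or names) by some upstream step.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Content:
    """One accumulated video content with its already-recognized components (hypothetical)."""
    content_id: str
    aired_at: datetime
    components: frozenset  # e.g. recognized caption words or person names


def extract_frequent(contents, now, period, threshold):
    """Return {component: [content_id, ...]} for every component that appears
    in at least `threshold` contents aired within [now - period, now]."""
    start = now - period
    hits = defaultdict(list)
    for c in contents:
        if start <= c.aired_at <= now:
            for comp in c.components:
                hits[comp].append(c.content_id)
    return {comp: ids for comp, ids in hits.items() if len(ids) >= threshold}


now = datetime(2018, 6, 4, 12, 0)
archive = [
    Content("tv-1", now - timedelta(hours=1), frozenset({"typhoon", "Tokyo"})),
    Content("web-7", now - timedelta(hours=3), frozenset({"typhoon"})),
    Content("tv-2", now - timedelta(days=3), frozenset({"typhoon"})),  # outside a 24-hour period
    Content("web-9", now - timedelta(hours=2), frozenset({"election"})),
]
frequent = extract_frequent(archive, now, timedelta(hours=24), threshold=2)
print(frequent)  # {'typhoon': ['tv-1', 'web-7']}
```

Counting each content at most once per component is a design choice of this sketch; the description also mentions appearance time and appearance rate as possible measures.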

The video content providing means then provides the extracted video content to the viewer. Providing video content here covers not only displaying it but also recording it, distributing it, and other forms of output.

In this case, AI (artificial intelligence) is used for the component appearance frequency determining means and the video content extracting means, so that data is processed at high speed.

In the invention according to claim 2, the component appearance frequency determining means determines the appearance frequency of the component on the basis of viewing preferences registered by the viewer.

Accordingly, a viewer who is a user of this system can register, within an appropriate designated range, the video he or she wishes to watch and the classification, type, genre, and so on of that video, and the component appearance frequency determining means determines the appearance frequency of components on the basis of those preferences.
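
One way to picture the preference-based determination is to restrict the count to contents whose genre the viewer has registered. The sketch below assumes each content is already labeled with a genre and a list of recognized components; the function name and the sample genres are hypothetical.

```python
from collections import Counter


def count_within_preferences(contents, registered_genres):
    """Count component appearances only in contents whose genre the viewer registered.

    `contents` is an iterable of (genre, components) pairs (an assumed layout).
    """
    counts = Counter()
    for genre, components in contents:
        if genre in registered_genres:
            counts.update(set(components))  # count each component once per content
    return counts


contents = [
    ("sports", ["marathon", "Tokyo"]),
    ("economy", ["stock market", "Tokyo"]),
    ("sports", ["marathon"]),
    ("weather", ["typhoon"]),
]
counts = count_within_preferences(contents, registered_genres={"sports", "economy"})
print(counts["marathon"], counts["Tokyo"], counts["typhoon"])  # 2 2 0
```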

In the invention according to claim 3, the component is text, audio, a video object that is the theme of the video content, a person appearing in it, or the background of the video object.

In the case of a moving image, the component appearance frequency determining means recognizes and identifies the components in a given video content from the standpoint of "text, audio, a video object that is the theme of the video content, a person appearing in it, or the background of the video object".

Here, "text" means characters such as telops (on-screen captions) appearing in the moving image; it is recognized using so-called character recognition technology, and morphological analysis is applied.

"Audio" means the various sounds and human voices contained in a moving image or the like, including background sound, theme songs, the speech of persons appearing, and sound effects. Where necessary, only the speech of a specific person concerning the topic that is the theme of the video is extracted from these sounds, and the component appearance frequency determining means determines its appearance frequency.

To isolate only the speech of the person who is the subject of a given video, the frequency band of that person's voice is identified, which makes it possible to extract only that person's speech.

"A video object that is the theme" means the object that is the subject of the video, and "a person appearing" includes a person who appears in a leading role or in a supporting role in relation to the theme of the video. Face recognition technology or the like is used to identify such persons. "The background of the video object" refers to whatever appears behind the person or the object that is the subject of the video, for example a building, the sea, mountains, the sky, or a plain.

In the invention according to claim 4, the component appearance frequency determining means determines the appearance frequency of a component by referring to a title part displayed as text on the video screen.

That is, in news video distributed by a TV broadcasting station, a title part consisting of text may be displayed, for example at the upper right of the on-screen video. The component appearance frequency determining means refers to this title part when performing video recognition, speech recognition, person recognition, background recognition, and so on, and determines the appearance frequency of the component.

In the invention according to claim 5, the video content is distributed from a server via a telecommunication line.

Accordingly, in the invention according to claim 5, the component appearance frequency determining means determines the appearance frequency of an identical component for video content, including moving images, distributed via the web. Here, "moving images distributed via the web" covers everything: the various videos uploaded to the web, videos on SNS (social networking services), and videos distributed over the Internet (for example, NETFLIX (registered trademark) and Amazon FIRE TV).

In the following description, "web" is not limited to web systems browsed with a browser and also includes cloud systems. The video content may therefore reside on either a web server or a cloud server.

In the invention according to claim 6, the video content is distributed from a broadcasting station via a telecommunication line.

Accordingly, in the invention according to claim 6, the component appearance frequency determining means determines the appearance frequency of an identical component for all video content displayed as TV video.

The video information providing system according to claim 7 further comprises: summary creating means for creating a summary by converting into text the audio data or video data of a plurality of video contents, extracted from the respective video signals of those video contents; and weighting means for assigning to the plurality of video contents a weighting specific to one or more predetermined conditions included in the summary, while learning optimum conditions from the accumulated summaries created by the summary creating means. The component appearance frequency determining means determines the appearance frequency of a component on the basis of the summary.

In the invention according to claim 7, the component appearance frequency determining means determines the appearance frequency of video content, that is, its importance, on the basis of the summary of the video content.

That is, the appearance frequency of a component is determined on the basis of the summary that the summary creating means produces by converting into text the audio data or video data of the plurality of video contents, extracted from their respective video signals.

In this case, the weighting means assigns to the summaries created by the summary creating means, across the plurality of video contents, a weighting specific to at least one predetermined condition, so that more useful information of high importance and high interest is supplied to the user from the plurality of video contents.

The invention according to claim 8 is the video information providing system according to claim 7, wherein the weighting means looks up overlapping text in the audio data or video data included in the plurality of video contents and, when the result of that lookup is equal to or greater than a predetermined value, determines that the text is important text meeting the predetermined condition and assigns a weighting to it.

For example, in daily news programs, on the web, on SNS, and so on, information about topics, incidents, and accidents with great news value can be said to be of high social importance and interest. As a result, information about such social events often appears as multiple video contents across multiple media.

Accordingly, across the plurality of video contents, text found to overlap through morphological analysis or the like is treated as an identical component, and an importance level, for example a graded one, is set appropriately according to its appearance frequency, for example the number of appearances, the appearance time, or the appearance rate.
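
The overlap check and graded importance described above might look like the following sketch. Two assumptions are worth stating plainly: whitespace splitting stands in for real morphological analysis (which Japanese text would require), and the step values `(2, 4, 8)` are made-up thresholds, since the description does not give concrete numbers.

```python
from collections import Counter


def overlapping_terms(summaries, min_contents):
    """Return {term: n} for terms that appear in at least `min_contents` summaries."""
    doc_freq = Counter()
    for text in summaries:
        # Whitespace tokenization is only a stand-in for morphological analysis.
        doc_freq.update(set(text.lower().split()))
    return {term: n for term, n in doc_freq.items() if n >= min_contents}


def importance_level(count, steps=(2, 4, 8)):
    """Graded importance: how many of the (assumed) steps the count has reached."""
    return sum(count >= s for s in steps)


summaries = [
    "earthquake hits northern region",
    "earthquake damage reported",
    "northern region earthquake aftershocks",
    "earthquake relief begins",
    "stock prices rise",
]
terms = overlapping_terms(summaries, min_contents=2)
levels = {term: importance_level(n) for term, n in terms.items()}
print(levels)  # earthquake -> 2, northern -> 1, region -> 1 (key order may vary)
```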

The invention according to claim 9 is the video information providing system according to claim 8, wherein the weighting means determines whether text is important text for a plurality of video contents within a preset period.

That is, the social importance of information such as the topics, incidents, or accidents mentioned above is determined within a predetermined period.

The invention according to claim 10 is the video information providing system according to claim 8 or claim 9, wherein the weighting means starts recording a new video content when it determines that the video content includes the important text.
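
As a rough illustration of the recording trigger in claim 10, the sketch below starts a recording when the text of a newly arriving content contains a term already judged important. The `Recorder` class is a hypothetical stand-in for whatever actually controls capture and storage, and the important terms are assumed to be stored in lowercase.

```python
class Recorder:
    """Hypothetical recorder; a real system would drive a capture device or storage unit."""

    def __init__(self):
        self.recording = []

    def start(self, content_id):
        self.recording.append(content_id)


def on_new_content(content_id, text, important_terms, recorder):
    """Start recording if the text contains any important term; return the matched terms."""
    lowered = text.lower()
    found = {term for term in important_terms if term in lowered}
    if found:
        recorder.start(content_id)
    return found


recorder = Recorder()
matched = on_new_content("tv-10", "Earthquake update from the northern region",
                         {"earthquake"}, recorder)
skipped = on_new_content("tv-11", "Weekend weather forecast", {"earthquake"}, recorder)
print(matched, skipped, recorder.recording)  # {'earthquake'} set() ['tv-10']
```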

In daily news programs and the like, major incidents and accidents can be said to be of high social importance and interest. If such incidents and accidents are recorded and stored in a large-capacity storage unit, for example separately from the material used for summary creation, they can be retained for editing by an operator or as archival footage.

The invention according to claim 11 is the video information providing system according to claim 6, wherein the weighting means targets a specific person or object in the text included in at least one of the audio data and the video data of the plurality of video contents, and calculates a CM (commercial message) conversion value from its appearance frequency.

The specific conditions mentioned above can include a specific person, object, incident, or accident. If a specific person (including corporate names, various organizations, and the like) is set as the specific condition, for example, the appearance frequency of that person across the plurality of video contents, for example the number of appearances, the appearance time, or the appearance rate, can be worked out.

Then, from the number of appearances, appearance time, appearance rate, and so on, a CM conversion value can be calculated as a rough measure of how much that person is worth in the world of mass media and information (for example, for setting appearance fees). When a specific corporate name is targeted, the value can also be used for analysis (weighting) of, for example, the corporation's share of sponsorship across all CMs, the importance of the various products it sells (how much effort is being put into them, or identification of new products), and the correlation with share price movements.
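
The description does not state how the CM conversion value is computed. Purely as an assumption for illustration, the sketch below derives the three appearance measures named above and then multiplies total appearance time by a caller-supplied advertising rate; the function names, the formula, and the sample figures are all hypothetical.

```python
def appearance_stats(appearance_seconds, total_broadcast_seconds):
    """Number of appearances, total appearance time, and appearance rate."""
    total = sum(appearance_seconds)
    return {
        "count": len(appearance_seconds),
        "seconds": total,
        "rate": total / total_broadcast_seconds,
    }


def cm_conversion_value(appearance_seconds, rate_per_second):
    """Assumed formula: total appearance time multiplied by an advertising rate per second."""
    return sum(appearance_seconds) * rate_per_second


durations = [15, 30, 45]  # seconds per appearance (made-up figures)
stats = appearance_stats(durations, total_broadcast_seconds=3600)
value = cm_conversion_value(durations, rate_per_second=1000)
print(stats["count"], stats["seconds"], value)  # 3 90 90000
```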

In the invention according to claim 12, the weighting means targets a specific corporate name in the text included in at least one of the audio data and the video data of the plurality of video contents.

Accordingly, in the invention according to claim 12, because a specific corporate name is the component whose appearance frequency is determined, video content in which that corporate name appears is extracted and provided to the corporation concerned, which is the viewer in its capacity as user.
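
Extraction by corporate name can be sketched as a simple alias match over text already derived from each content. The company names and the `contents_mentioning` helper are hypothetical, and plain substring matching is an assumption; a real system would more likely match on analyzed tokens.

```python
def contents_mentioning(contents, aliases):
    """Return the ids of contents whose text mentions any registered alias (case-insensitive)."""
    needles = [alias.lower() for alias in aliases]
    return [cid for cid, text in contents if any(n in text.lower() for n in needles)]


contents = [
    ("tv-1", "Example Corp announces a new product"),
    ("web-2", "Weather forecast for the weekend"),
    ("web-3", "Shares of ExCo rose after the announcement"),
]
mentions = contents_mentioning(contents, ["Example Corp", "ExCo"])
print(mentions)  # ['tv-1', 'web-3']
```

Registering several aliases for one corporation follows the description's remark that a company may register its name, corporate identity, and abbreviated name.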

The inventions according to claims 1 to 12 allow the video system to automatically provide the user with more useful information drawn from a plurality of video contents.

That is, for video content consisting of distributed moving images or still images, the component appearance frequency determining means analyzes the elements included in the content and compares them with video content accumulated in the past to determine the appearance frequency of an identical component. When the appearance frequency of a given component is determined to have reached a predetermined value within a predetermined period, the video content extracting means extracts the video content including that component from the large number of accumulated video contents, and the video content providing means provides the extracted content to the viewer. The viewer therefore does not need to watch the video content distributed on TV and the web constantly; important information can be checked and grasped simply by watching the automatically provided video content that appears on TV or the web at or above the predetermined frequency.

Video content with a high appearance frequency is in many cases important information or information that is trending at that moment. The viewer therefore does not need to check multiple TV channels or web information constantly and, without being tied to a TV receiver, personal computer, or smartphone, can watch in a time-efficient manner only the video content that the system according to the present invention provides, consisting of important information and information trending at that moment.

Consequently, even a busy viewer can efficiently obtain the important information he or she needs on a TV, personal computer, or smartphone, without constantly watching TV programs on a TV receiver or frequently checking web information on a personal computer or smartphone.

In the invention according to claim 2, the component appearance frequency determining means determines the appearance frequency of the component on the basis of viewing preferences registered by the viewer. A viewer who is a user of this system can register, within an appropriate designated range, the video he or she wishes to watch and its classification, type, genre, and so on, and the determining means determines the appearance frequency on the basis of those preferences. By registering the desired genres and types of video content in the system in advance, the user can therefore automatically and efficiently watch the video content that appears frequently within those genres and types.

Also, for example, when a particular company that is a user wishes to manage in one place the information related to itself that appears on TV or the web, it can register its company name, corporate identity, abbreviated company name, and so on. All TV video content and web video information related to the company is then collected automatically for viewing and can be put to use in managing the company from various viewpoints.

In the invention according to claim 4, video information transmitted from a TV broadcasting station, for example a news program, may display the title of the news item as short text at a position such as the upper right of the screen. In such a case, the video recognition, speech recognition, person recognition, and background recognition used to determine the appearance frequency of a component can be performed with reference to the title part. Doing so allows more accurate recognition and hence a more efficient determination of the appearance frequency of components.

In the invention according to claim 5, the component appearance frequency determining means determines the appearance frequency for the various videos uploaded to the web, all videos on SNS (social networking services), the various videos distributed over the Internet, and so on. It is therefore possible to extract video content of high importance from all the video content circulating via the web, searching comprehensively and providing it quickly and automatically to the viewer who is the user.

In the invention according to claim 6, the component appearance frequency determining means determines the appearance frequency of an identical component for all video content displayed as TV video. It is therefore possible to extract video content of high importance from all the video content circulating via TV, searching comprehensively and providing it quickly and automatically to the viewer who is the user.

請求項７〜１２記載の発明にあっては、前記構成要素出現頻度判断手段は映像コンテンツの要約を基礎として映像コンテンツの出現頻度、即ち重要度の判断を行うように構成されていることから、直接に映像コンテンツから、顔認識技術、音声認識、形態認識技術等の各種の高度な技術を各映像の主題を絞り込む場合に比して、より迅速かつ正確に当該構成要素の出現頻度の判断を行うことが可能となる。 In the inventions according to claims 7 to 12, since the component appearance frequency determining means is configured to determine the appearance frequency of the video content, that is, the importance, based on the summary of the video content. Compared to video content directly, various advanced technologies such as face recognition technology, voice recognition, form recognition technology, etc., to narrow down the subject of each video, more quickly and accurately determine the appearance frequency of the component. Can be done.

請求項７に記載の発明にあっては、社会的事象に関する情報は複数のメディアにおいて、複数の映像コンテンツとして出現する場合が多いことから、複数の映像コンテンツにおいて、テキストの形態素解析等によって重複するテキストを同一の構成要素として判断するとともに、その出現頻度、例えば、出現回数、出現時間、出現率等に応じて、段階的等の重要度を適正に設定するように構成されていることから、正確に構成要素の出現頻度、即ち、社会的重要度の判断が行われ、ユーザーに対して適切な重要度の動画コンテンツが提供される。 In the invention described in claim 7, since information on social events often appears as a plurality of video contents in a plurality of media, the information is duplicated in a plurality of video contents by morphological analysis of text or the like. Since the text is determined as the same component, and the frequency of appearance, for example, the number of times of appearance, the time of appearance, the appearance rate, etc., it is configured to appropriately set the degree of importance, such as stepwise, The appearance frequency of the constituent elements, that is, the social importance level is accurately determined, and the moving image content having an appropriate importance level is provided to the user.

請求項９記載の発明は、社会的な重要度の判断に要する期間を所定の期間内において判断されることから、ユーザーに対して、迅速に、社会的に重要な映像コンテンツを提供することが可能となる。 According to the ninth aspect of the invention, since the period required for determining the social importance level is determined within a predetermined period, it is possible to quickly provide socially important video content to the user. It becomes possible.

請求項１０記載の発明は、社会的に重要な事件・事故等を、例えば、要約作成用とは別個に、大容量記憶部に録画記憶しておけば、オペレータ等の編集、記録映像等として残しておくことができ、収集した当該ユーザーにとって重要な映像コンテンツを適宜、分析等に再利用することができる映像情報提供システムを提供することができる。 The invention described in claim 10 can be used for editing, recording video, etc. of an operator if socially important incidents / accidents are recorded and stored in a large-capacity storage unit, for example, separately from the summary creation It is possible to provide a video information providing system that can be left and can be reused for analysis or the like as appropriate for the collected video content important to the user.

The invention according to claim 11 can provide a video information calculation system in which the CM (commercial message) conversion value calculated by the weighting means can be used as an analysis index for various economic activities.

In the invention according to claim 12, a specific corporate name is the component whose appearance frequency is determined. Video content in which that corporate name appears is extracted and provided to the corporation as the viewing user, so that the corporation can, for example, collect information on its own social evaluation and reputation quickly, appropriately, and comprehensively, and reflect it promptly in its management.

A block diagram showing the overall configuration of the first embodiment of the video information providing system according to the present invention.
A block diagram showing the overall configuration of the second embodiment of the video information providing system according to the present invention.
A block diagram showing the overall configuration of the summary creation system in the second embodiment of the video information providing system according to the present invention.
Diagrams showing the speech text conversion unit in the second embodiment of the video information providing system according to the present invention, in which (a) is a block diagram and (b) shows the processing flow.
Diagrams showing the telop text conversion unit of the summary creation system in the second embodiment of the video information providing system according to the present invention, in which (a) is a block diagram and (b) shows the processing flow.
Diagrams showing the background image text conversion unit of the summary creation system in the second embodiment of the video information providing system according to the present invention, in which (a) is a block diagram and (b) shows the processing flow.
Diagrams showing the logo mark text conversion unit of the summary creation system in the second embodiment of the video information providing system according to the present invention, in which (a) is a block diagram and (b) shows the processing flow.
A block diagram showing the text integration unit of the summary creation system in the second embodiment of the video information providing system according to the present invention.
A block diagram showing the summary creation unit of the summary creation system in the second embodiment of the video information providing system according to the present invention.
A flowchart showing the operating steps of the summary creation system in the second embodiment of the video information providing system according to the present invention.
Diagrams showing an application example of the video information providing system according to the present invention, in which (A) illustrates the case where character recognition determines that a desired condition is met and (B) illustrates the case where voice recognition determines that a desired condition is met.

FIG. 1 is a block diagram showing a first embodiment for realizing a video information providing system 1 according to the present invention.
The video information providing system 1 according to the present embodiment is intended to supply video content that is useful to the user quickly and appropriately from among a large number of distributed items of video content. It comprises component appearance frequency determining means 5a for determining, on the basis of the components of distributed video content, the appearance frequency of an identical component; video content extracting means 5b for extracting video content that includes the component when, according to the determination by the component appearance frequency determining means 5a, the appearance frequency is equal to or higher than a predetermined value; and video content providing means 7 for providing the viewer with the video extracted by the video content extracting means 5b.

In the present embodiment, as shown in FIG. 1, the video information providing system 1 can be used with a single playback device 9, for example a television with computer functions, a personal computer, a smartphone, or a tablet terminal. It comprises a receiving unit 2 including a tuner or the like that receives video signals for content from a television station (television broadcasting station) or a web video distribution server; an operation unit 3 (including a remote controller or the like) provided on the playback device 9; a storage unit 4 storing applications for realizing the various functions of the playback device 9; a control circuit unit 5 that executes the various functions based on the applications stored in the storage unit 4; a large-capacity storage unit 6 that stores video metadata; and an output unit 7 including a speaker for audio output and a monitor for video output.

The receiving unit 2 includes a receiving antenna for television broadcasting, including satellite broadcasting, a modem for telecommunication lines such as an Internet line, a tuner, and the like, and can receive all data relating to video content.

When the playback device 9 is a receiver such as a television (including a video deck or the like), the operation unit 3 comprises the controls that the user operates to view desired video content, such as the main power switch, channel selection, and volume up/down switches (including a remote control device or the like). When the playback device 9 is a personal computer or the like, the operation unit 3 comprises a mouse, a keyboard, a touch panel, or the like that the user operates to view desired video content.

The storage unit 4 stores, for example, the various applications that the control circuit unit 5 executes in order for the playback device 9 to operate. The applications stored in the storage unit 4 include applications for the functions that realize the present embodiment.

The control circuit unit 5 enables viewing of the received video content based on the applications stored in the storage unit 4, and includes the component appearance frequency determining means 5a and the video content extracting means 5b.

The large-capacity storage unit 6 stores the video content data received by the receiving unit 2. It can also accumulate the results of the analysis of the video content by the control circuit unit 5.

Specifically, video content distributed from a web server (cloud server), a television broadcasting station, or the like and received by the receiving unit 2 is accumulated in the large-capacity storage unit 6.

Then, for the components included in the accumulated video content, such as text, audio, the video object that is the theme of the video content, the persons appearing, and the screen background, the component appearance frequency determining means 5a determines the appearance frequency of each component in accordance with the applications stored in the storage unit 4.

That is, in order to calculate successively the appearance frequency of each target component, for example the number of appearances, the appearance time, or the appearance rate, the control circuit unit 5 stores data associating the calculation results with each component in the large-capacity storage unit 6.
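The three frequency measures named above (number of appearances, appearance time, appearance rate) could be computed per component roughly as in the following sketch. The record type `Appearance`, the function `appearance_stats`, and the choice to count one appearance per item of content are assumptions for illustration; the specification does not define how each measure is calculated.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Appearance:
    component: str   # e.g. a recognized word, person label, or logo name
    content_id: str  # item of video content in which the component was found
    seconds: float   # how long the component was on screen or audible


def appearance_stats(appearances, total_contents):
    """Return {component: (count, total_seconds, rate)}.

    count is the number of distinct items of content containing the
    component, and rate is count / total_contents (assumed > 0).
    """
    contents = defaultdict(set)
    seconds = defaultdict(float)
    for a in appearances:
        contents[a.component].add(a.content_id)
        seconds[a.component] += a.seconds
    return {
        c: (len(ids), seconds[c], len(ids) / total_contents)
        for c, ids in contents.items()
    }
```

The resulting dictionary corresponds to the "data associating the calculation results with each component" that would be written to storage.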

The control circuit unit 5 includes the component appearance frequency determining means 5a and the video content extracting means 5b, and the display screen of the output unit 7 constitutes the video content providing means 7.

The control circuit unit 5 determines whether the successively calculated appearance frequency of each component has reached a predetermined value. When it determines that the appearance frequency has reached or exceeded the predetermined value, it extracts the video content that includes the component and displays it on the display screen of the output unit 7, which serves as the content providing means.

The output unit 7 may also be configured as storage or a slot for storing the extracted video content on an external recording medium (not shown), for example an SD memory card or a USB memory stick.

When the extracted video content data is recorded on such an external recording medium, the system may be configured, for example, to generate backup video data in which the resolution of the video data of the video content stored in the large-capacity storage unit 6 is reduced, store it temporarily in the storage unit 4, and then record that backup video data.

In this configuration, the component appearance frequency determining means 5a analyzes the components included in distributed video content and compares them with previously accumulated video content, thereby determining the appearance frequency of an identical component. Here, video content consists mainly of moving images but also includes still images.

When the component appearance frequency determining means 5a determines that the appearance frequency of a given component has reached a predetermined value within a predetermined period, the video content extracting means 5b extracts the video content that includes that component from the large number of accumulated items of video content.
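The determination and extraction steps described above can be sketched as follows, using the number of items of content as the frequency measure. The function name `extract_frequent`, the record layout, and the window and threshold values are hypothetical; how components are recognized in the video is outside the scope of this sketch.

```python
from datetime import datetime, timedelta


def extract_frequent(records, now, window, threshold):
    """Select components that appear often enough within a period.

    records: iterable of (content_id, timestamp, set_of_components).
    Returns {component: [content_id, ...]} for every component found
    in at least `threshold` items of content whose timestamp lies in
    the interval [now - window, now].
    """
    start = now - window
    hits = {}
    for content_id, timestamp, components in records:
        if start <= timestamp <= now:
            for component in components:
                hits.setdefault(component, []).append(content_id)
    return {c: ids for c, ids in hits.items() if len(ids) >= threshold}


# Example call with illustrative values: a one-day window, threshold 2.
now = datetime(2018, 6, 4, 12, 0)
records = [
    ("a", now - timedelta(hours=1), {"typhoon", "tokyo"}),
    ("b", now - timedelta(hours=2), {"typhoon"}),
    ("c", now - timedelta(days=3), {"typhoon", "tokyo"}),
]
frequent = extract_frequent(records, now, timedelta(days=1), 2)
```

In this example only "typhoon" qualifies, because item "c" falls outside the one-day window and "tokyo" then appears in a single item.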

The extracted video content is then provided to the viewer by the video content providing means 7. Providing video content here includes not only displaying it on a suitable display but also recording and distributing it and other forms of output.

In this case, AI (artificial intelligence) is used in the component appearance frequency determining means 5a and the video content extracting means 5b, and data is processed at high speed.

That is, for distributed video content consisting of moving images or still images, the component appearance frequency determining means 5a analyzes the components included in the video content and compares them with previously accumulated video content to determine the appearance frequency of an identical component. When the appearance frequency of a given component is determined to have reached a predetermined value within a predetermined period, the video content extracting means 5b extracts the video content that includes that component from the large number of accumulated items of video content, and the video content providing means 7 provides the extracted video content to the viewer. The viewer therefore does not need to watch video content distributed on TV or the web constantly, and can view the automatically provided video content that appears on TV or the web at or above a predetermined frequency.

Accordingly, even a busy viewer can efficiently obtain the important information they need on a TV, personal computer, or smartphone without constantly checking TV programs on a TV receiver or frequently checking web information on a personal computer or smartphone.

In the present embodiment, the control circuit unit 5 also includes desired video registration means 5c. A viewer who is a user of the system can therefore use the desired video registration means 5c, via the operation unit 3, to register the videos that the user wishes to view, or the classification, type, genre, and the like of such videos, within an appropriate designated range, and the component appearance frequency determining means 5a determines the appearance frequency of components based on the registered desired videos.

In the present embodiment, the way in which the component appearance frequency determining means 5a recognizes and identifies specific video content in moving images is based on the following aspects: text, audio, the video object that is the theme of the video content, the persons appearing, and the background of the video object.

Accordingly, in the present embodiment, the components whose appearance frequency is determined are text, audio, the video object that is the theme of the video content, the persons appearing, and the background of the video object.

"Audio" means the various sounds included in a moving image or the like, including background sound, theme songs, the speech of the persons appearing, and sound effects.

Where necessary, only the speech of a specific person relating to the topic that is the theme of the video is extracted from these various sounds, and the component appearance frequency determining means 5a determines its appearance frequency.

The "video object that is the theme" is the object that is the subject of the video, and the "persons appearing" include persons who appear in a leading role or in a supporting role in connection with the theme of the video. Face authentication technology or the like is used to identify the persons appearing. The "background of the video object" refers to whatever is shown behind the persons appearing or the object that is the subject of the video, for example buildings, the sea, mountains, the sky, or plains.

The component appearance frequency determining means 5a is thus configured to determine the appearance frequency of components based on the viewing preferences registered by the viewer. Because a viewer who is a user of the system can register the videos they wish to view, or the classification, type, genre, and the like of such videos, within an appropriate designated range, and the component appearance frequency determining means determines the appearance frequency based on those preferences, the user can register the desired genres and types of video content in the system as appropriate and then automatically and efficiently view video content that appears frequently within those genres and types.

For example, when a specific company that is a user wishes to manage collectively the information relating to itself that appears on TV or the web, registering the company name, its corporate identity, abbreviations of the company name, and the like allows all TV video content or web video information relating to the company to be collected automatically and viewed, and used in the company's management from various perspectives.
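A minimal sketch of matching registered names against text obtained from video content is given below. It assumes that the text (for example recognized captions or speech) is already available as a string; the function name `matches_registered` and the company names in the example are hypothetical.

```python
def matches_registered(text, registered_names):
    """Return the registered names that occur in `text`.

    Matching is a case-insensitive substring test, so a registered
    company name and its abbreviations are each checked separately.
    """
    lowered = text.casefold()
    return [name for name in registered_names if name.casefold() in lowered]


# Example with hypothetical names: only the first registered name occurs.
found = matches_registered(
    "ACME Corp. announced a recall on the evening news.",
    ["Acme", "Acme Holdings"],
)
```

Plain substring matching can produce false positives for short abbreviations, so a practical system would probably combine it with word segmentation or logo recognition.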

When the video content is distributed from a web server (or cloud server) via a telecommunication line, the component appearance frequency determining means determines the appearance frequency of an identical component in the video content, including the distributed moving images. Here, "moving images distributed via the web" can cover all moving images, including the various moving images uploaded to the web, moving images on SNS (social networking services), and moving images on services that provide moving images over the Internet.

Because the component appearance frequency determining means thus determines appearance frequency across the various moving images uploaded to the web, all moving images on SNS (social networking services), and moving images distributed over the Internet, video content of high importance can be extracted from all the video content circulating via the web, and video content of high importance or interest to the viewer who is the user can be provided comprehensively and automatically.

When the video content is distributed from a broadcasting station via a telecommunication line, the component appearance frequency determining means 5a determines the appearance frequency of an identical component across all video content displayed as TV video.

Because the component appearance frequency determining means 5a thus determines the appearance frequency of an identical component across all video content displayed as TV video, video content of high importance or interest can be extracted from all the video content circulating via TV and provided comprehensively and automatically to the viewer who is the user.

The control circuit unit 5 can also determine the appearance frequency of components under the various weighting conditions described later.
In the present embodiment, for video information transmitted from a TV broadcasting station, in the case of a news program, for example, the title of the news item may be displayed as short text at a position such as the upper right of the screen. In such a case, the title can be referred to when performing the video recognition, voice recognition, person recognition, and background recognition used to determine the appearance frequency of components.
When video recognition, voice recognition, person recognition, and background recognition are performed with reference to the title in this way, recognition is more accurate and the appearance frequency of components can be determined more efficiently.

Next, a second embodiment is described, in which a summary is created from the video content and the summary is used to determine the appearance frequency of components.
FIG. 2 is a block diagram showing the overall configuration of a video information providing system 11 according to this embodiment of the present invention. The basic configuration of the system as a whole is the same as in the first embodiment, but the present embodiment differs in that it includes a summary creation system 10.

That is, the video information providing system according to the second embodiment is based on the configuration of the video information providing system 1 according to the first embodiment and further comprises summary creating means 10 for creating a summary by converting into text the audio data or video data of multiple items of video content extracted from their respective video signals, and weighting means 15d for applying to the multiple items of video content a weighting specific to one or more predetermined conditions included in the summary, while learning the optimum conditions from the accumulated summaries created by the summary creating means 10. The component appearance frequency determining means 15a is configured to determine the appearance frequency of components based on the summary.
In the present embodiment, the control circuit unit 15 also includes desired video registration means 15c. As in the first embodiment, the user can use the desired video registration means 15c, via the operation unit 13, to register the videos that the user wishes to view, or their classification, type, genre, and the like, within an appropriate designated range; the component appearance frequency determining means 15a determines the appearance frequency of components based on the registered desired videos; the video content extracting means 15b extracts the relevant video content; and the output unit 17, which constitutes the video content output means, displays it on a display or the like.

In the present embodiment, the summary creation system 10 of the video information providing system 11 may be provided within the playback device 19. Alternatively, the summary creation system 10 may be implemented on a dedicated management server or the like, with the video output part of the system, which operates on the summaries created by that management server, realized by, for example, a television with computer functions, a personal computer, a smartphone, or a tablet terminal (hereinafter, "playback device 19"). The playback device 19 may be used as a single unit or as multiple units.

The following description deals mainly with television broadcasting. Matters specific to web video distribution are described where appropriate, and the description of web video distribution use cases that are identical or substantially identical to those of television broadcasting is omitted.

Television broadcasting includes terrestrial digital broadcasting, satellite broadcasting, One-Seg broadcasting, Internet broadcasting, and the like, regardless of the form of broadcasting or reception.

As shown in FIGS. 2 and 3, the video information providing system 11 comprises a receiving unit 12 including a tuner or the like that receives video signals for video content from a television station 30 or a web video distribution server 40; an operation unit 13 (including a remote controller or the like) provided on the playback device 19; a storage unit 14 storing applications for realizing the various functions of the playback device 19; a control circuit unit 15 that executes the various functions based on the applications stored in the storage unit 14; the summary creation system 10 described above; a large-capacity storage unit 16 that stores the created summaries and various data for recording video content; and an output unit 17 including a speaker for audio output and a monitor for video output.

The summary creation system 10 comprises a video signal separation unit 20, a video signal processing unit 18, a text integration unit 500, and a summary creation unit 600. As shown in FIG. 3, the video signal processing unit 18 comprises a speech text conversion unit 100, a telop (on-screen caption) text conversion unit 200, a background image text conversion unit 300, and a logo mark text conversion unit 400.

As shown in FIG. 2, the playback device 19 includes the receiving unit 12, the operation unit 13, the storage unit 14, the control circuit unit 15, the summary creation system 10, the large-capacity storage unit 16, and the output unit 17.
The output unit 17 constitutes video content providing means having the function of outputting to a monitor or printer, for example, the results of the weighting calculated by the control circuit unit 15 acting as the weighting means.
The control circuit unit 15 is the same as in the first embodiment in that it includes component appearance frequency determining means 15a, video content extracting means 15b, and desired video registration means 15c; in the present embodiment it further includes weighting means 15d.

<Creation of the summary (video metadata)>
As an example of producing video metadata as a summary, the following describes the case in which the content of television broadcasts is subjected to Japanese-language processing and compiled into a database. In this case, one program or one segment of a program is taken as an example of an item of video content.

In television programs, and particularly in news and other programs broadcast from moment to moment, immediacy and accuracy are important.

On the other hand, although part of the content of such news programs may be broadcast again in other news programs in different time slots (on the same or a different station), the same program is not rebroadcast on a different day, so the information can be said to be ephemeral.

For information characterized by such immediacy and accuracy, a news item may have continuity depending on conditions such as its social importance, the demand for it, or the emergence of new information. It is therefore desirable to weight news according to its degree of importance or demand, for example whether its appearance frequency has reached a predetermined value. The appearance frequency may be, for example, the number of appearances, the appearance time, or the appearance rate.

Here, because the degree of importance or demand has short-term, long-term, and seasonal components, it is also possible to create graphs weighted by, for example, weekly, monthly, seasonal, and annual statistics. The created graph can be output from the output unit 17 to a monitor or a printer.
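Counting appearances per week, month, season, or year could look like the sketch below. The function names are hypothetical, and treating a "season" as a calendar quarter is an assumption made for illustration.

```python
from collections import Counter
from datetime import date


def period_key(d, period):
    """Map a broadcast date to a label for the requested statistical period."""
    if period == "week":
        iso = d.isocalendar()  # (ISO year, ISO week number, ISO weekday)
        return f"{iso[0]}-W{iso[1]:02d}"
    if period == "month":
        return f"{d.year}-{d.month:02d}"
    if period == "season":
        # Assumption: a season is approximated by a calendar quarter.
        return f"{d.year}-Q{(d.month - 1) // 3 + 1}"
    if period == "year":
        return str(d.year)
    raise ValueError(f"unknown period: {period}")


def count_by_period(broadcast_dates, period):
    """Count appearances per period; the result can be plotted as a graph."""
    return Counter(period_key(d, period) for d in broadcast_dates)
```

Keeping one counter per period type lets a topic that peaks in a single season stay visible even when its annual total is small.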

This prevents an item whose importance or demand is high over a short period from being rated low merely because the whole year is considered, and makes it possible to assign a weight reflecting that its importance or demand is high in the corresponding period.

Specifically, effective weighting can be assigned to items whose importance or demand rises in a specific cycle, such as "cherry blossom forecast", "famous cherry blossom spots", and "Olympics".

Metadata for newly broadcast video information can be updated successively with a time lag of about 10 minutes, so that the importance and the like reflect the latest information. Programs from a plurality of broadcasting stations can also be received simultaneously and the metadata updated to the latest information.

In addition to basic information such as the broadcasting station and broadcast time, and text information such as the news title, an abstract of the content, the names of commentators, and prominent logos, finer-grained video metadata can be produced and distributed by means of image recognition of background images and the like, face recognition of newscasters, voiceprint analysis, and so on.

Further, the user can check the results via the web or e-mail. Accordingly, if the user saves and accumulates this video metadata on a large-capacity storage medium such as a hard disk, it can be put to use in a variety of situations.

Specifically, the remarks and behavior of a particular commentator can be picked out from daily news broadcasts and recorded in full detail, so that their content can be verified later.
With that particular commentator set as a condition, the system can also be configured to always notify the user whenever the commentator appears. In a news program currently being broadcast, or in a recorded news program, the notification prompting the user to watch can be given by momentarily raising the volume while the commentator is commenting.
With this configuration, the user does not have to watch the entire TV program and only needs to watch when notified, so the user can obtain useful information while keeping his or her time free.

When the user is busy and has no time to watch TV continuously, a condition corresponding to a predetermined video can be set against the summary, and the user can be notified that the video is being displayed when the condition is met.
In this case, when content matching the notification condition is included, the viewer of the video content being output can be notified audibly, for example by raising the volume as described above or by sounding a message.
Besides audible notification, the notification may be visual, such as repeatedly inverting the brightness of the display image 7a shown in FIG. 11 or lighting or flashing a dedicated lamp. Audible and visual notification may also be combined. Furthermore, beyond mere notification, the signal can be used as a trigger for starting another operation (for example, recording).

Also, for example, when the user wants to know only the sports results in a news program and has registered this with the desired video registration means 15C, the user need not watch the whole news program. Instead, the component appearance frequency determination means 15 provided in the control circuit unit can notify the user when, for example, the word "Sports" is displayed as a telop on the display screen 17a as shown in FIG. 11(A), or when the newscaster reads out a script containing "sports" as shown in FIG. 11(B).
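The keyword-based notification described above can be sketched as follows, assuming the telop and speech text have already been extracted and time-stamped. The segment layout `(time_s, source, text)` and the function name are hypothetical.

```python
def find_notifications(segments, keywords):
    """Return (time_s, source, keyword) for each registered keyword found.

    segments: iterable of (time_s, source, text), where source is "telop"
    or "speech" (an assumed layout for already-extracted text).
    keywords: words registered by the user, for example "sports".
    """
    hits = []
    for time_s, source, text in segments:
        lowered = text.lower()
        for keyword in keywords:
            if keyword.lower() in lowered:
                hits.append((time_s, source, keyword))
    return hits
```

Each hit could drive an audible or visual notification, or act as a trigger for another operation such as starting a recording.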

As another mode of using video metadata, a database of video metadata can be built covering, for example, the comment time and tone analysis of persons appearing in the video of a program (such as the specific commentator described above), or the airtime that includes footage of politicians (political parties) or athletes introduced in the broadcast content. By then performing clustering (meaning a function that automatically classifies data without external criteria), it is also possible to assign weights such as a calculated CM conversion value for a person or thing.

The weighting described above may be assigned using the summary creation process (AI process), a function of the summary creation system 10 that learns the optimum summary creation settings using the input data and output data of accumulated past summary creation results as training material. In that case, separately from the AI process, the CM conversion value may be converted into an evaluation price (unit price per unit time) for each person by manual input from an operator based on audience ratings or on information not included in the video metadata, such as newspapers and magazines.
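The specification does not give a formula for this conversion. Purely as an illustration, if the operator supplies a unit price per second for a person, an evaluation amount could be derived from that person's appearance time as below. The function name, the per-second unit, and the simple multiplication are all assumptions.

```python
def evaluation_amount(appearance_time_s, unit_price_per_s):
    """Multiply appearance time by an operator-supplied unit price (assumed model)."""
    if appearance_time_s < 0 or unit_price_per_s < 0:
        raise ValueError("time and unit price must be non-negative")
    return appearance_time_s * unit_price_per_s
```

In practice the unit price would come from the operator's judgment of audience ratings and outside sources, so this function only shows where that manual input would enter the calculation.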

Furthermore, the weighted CM conversion values can be compiled into weekly, monthly, seasonal (quarterly), half-year, and full-year reports by producing video metadata for, for example, a single broadcasting station, a single program, or a plurality of broadcasting stations (for example, the key stations in the Kanto area). The compiled data can be output from the output unit 7 to a monitor or a printer as a graph or a list (for example, covering the top 100 persons).

Furthermore, the video metadata converted to text can be used as teletext accompanying the broadcast. It can also be used as text from which an operator prepares the content and article-level details of information programs on the state-run broadcaster and the commercial key stations, which total more than 100 hours per day. Such programs include television news and broadcast programs, "wide shows", debate programs, political and economic programs, and political and economic variety shows.

<Playback device 19>
The playback device 19 has, as the receiving unit 2, a tuner function for television broadcasts (including terrestrial digital, satellite, and One-Seg broadcasts) or a receiving function for video distributed over the Internet, and can output that video to the display screen 17a of the output unit 7 as shown in FIG. 11. A television, a personal computer, a smartphone, a tablet terminal, or the like can therefore be used as the playback device 19.

The receiving unit 12 receives TV radio waves from TV stations or communication signals transmitted via the Internet. When the summary is created outside the playback device 19, the receiving unit 12 may also be configured to receive summary information supplied via communication means. The summary received by the receiving unit 12 is stored in, or used to update, the large-capacity storage unit 16.

As the operation unit 13, it is possible to use switches provided on a television, a remote control supplied with a television, a computer mouse or keyboard, or switches or a touch panel provided on a smartphone or tablet terminal.

Incidentally, in the television news described above, when an incident occurs, a plurality of television stations broadcast the same scenes repeatedly. In such a case, even if one tracks individually what each television outlet broadcast, when, and how, it is often not easy to grasp the overall picture.

If such an incident is set as the desired condition, the content of all designated news programs can be converted to text data second by second, compiled into a database, and summarized.

The content of the summaries is then classified (clustered) by theme and the result is analyzed. When, for example, the user or an operator of a contracted specialist company processes the result, quantified information can be obtained on what was broadcast, when, by which station, and for how long.

By setting such quantified information as the desired condition, more up-to-date and accurate conditions can be set for subsequent news broadcasts. The user can then choose, for example, to watch only the notified portions of broadcasts about the incident and to watch other news broadcasts in full.

In this quantification, weighting can also be applied to an output function, for example automatically recording the video portion concerning the incident (for example, footage on the progress of accident response at a nuclear power plant) in the large-capacity storage unit 16.

As described above, weighting can also be applied in uses such as graphing how much television coverage such incidents and accidents received, and how each station treated each theme in terms of airtime and number of broadcasts.

Such summaries are not limited to news broadcasts; the content of various entertainment programs can also be analyzed from multiple angles.

Thus, for example, on the basis of a comprehensively built entertainment database, entertainment programs such as dramas, movies, and variety shows can be analyzed from multiple viewpoints, such as content, genre comparison, and time-slot distribution.

Furthermore, the appearance time of a specific performer can be totaled, and the performer's value can easily be calculated from, for example, the proportion of appearances per day, week, or month.

For the performers' speech described above, weighting that converts dialect into standard language can also be applied during the morphological analysis performed for text conversion after speech recognition.

The control circuit unit 15 receives the summaries created by the summary creation system 10 as appropriate (or successively) and accumulates them in the large-capacity storage unit 16. While learning the optimum conditions for weighting from the accumulated summaries, it can assign to a plurality of video contents weights specialized for one or more predetermined conditions included in the summaries.

In this way, in the video information providing system 11, the weighting means assigns to the summaries created by the summary creation means weights specialized for at least one predetermined condition across a plurality of video contents, so that more useful information can be supplied to the user across a plurality of video contents.

In addition, the degree of importance (for example, in graded steps) can be set appropriately for a plurality of video contents according to the appearance frequency. In news programs and the like that are broadcast daily, major incidents and accidents can be said to have high social importance (or interest). By evaluating the importance of such incidents and accidents over a relatively short period, they can automatically be made the predetermined condition.

If such incidents and accidents are also recorded and stored in the large-capacity storage unit 6, for example separately from the data used for summary creation, they can be retained for editing by an operator, as archival footage, and so on.

The specific conditions described above can include specific persons, things, incidents, and accidents. By targeting, for example, a specific person, company, or organization as the specific condition, the appearance frequency of that person, company, or organization (for example, the number of appearances, the appearance time, or the appearance rate) can be determined from a plurality of video contents.

Based on that appearance frequency, a CM conversion value can be calculated as a guide to how much the person is worth (for example, for setting an appearance fee).

When a specific corporate name is targeted, the result can also be used for analysis (weighting) of, for example, the corporation's share of all commercials, the breakdown among the products it sells (how much effort it puts into each, or identification of new products), and correlation with stock price movements.

The appearance frequency can be totaled by day, week, month, or year, and can also be broken down by TV station or by time slot.

As weighting conditions, the appearance frequency can include, for a given program, both the case where an appearance counts once per program and the case where the number of appearances within the program is counted.

The number of appearances within a program can include the number of times or the length of time the person appears on screen, and, for example, the number of times the person is addressed by name by the host or others.
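The two counting modes can be contrasted in a short sketch. The function name and the input layout (one program ID per detected appearance) are assumptions.

```python
def appearance_count(program_ids, once_per_program):
    """Count appearances under one of the two counting modes.

    program_ids: one entry per detected appearance, giving the program in
    which it occurred (an assumed input layout).
    once_per_program: if True, each program counts at most once.
    """
    if once_per_program:
        return len(set(program_ids))
    return len(program_ids)
```

The same detections can therefore yield two different frequencies, and either can be chosen as the weighting condition.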

Being addressed by name can cover the full name "○×△□", the family name alone such as "○×-san", and the given name or a nickname such as "△□-chan" or "△-chan".

Furthermore, if the performer belongs to a group, both being addressed by the group name and being addressed by personal name can be included (as weights). Similarly, when a younger athlete is called, for example, "the second ○○" because a predecessor achieved outstanding results in a sport, both the predecessor and the successor can be made weighting conditions.

In the case of being addressed by the family name "○×", for example, it is rare for the same program to include more than one person with that family name, but it is highly likely that several such people exist in the industry as a whole. In such a case, people not appearing in the program are excluded, and only people appearing in the program are included in the weighting conditions.
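Resolving a form of address to a person while restricting the candidates to the program's cast could be sketched as follows. The alias table, the cast set, and the example names in the usage note are hypothetical.

```python
def resolve_mention(mention, aliases, cast):
    """Return the cast members that a form of address may refer to.

    aliases: dict mapping a person to the set of forms by which that person
    may be addressed (full name, family name, nickname, and so on).
    cast: set of persons appearing in the program being analyzed.
    """
    candidates = [person for person, forms in aliases.items() if mention in forms]
    # Exclude people who share the form of address but are not in this program.
    return sorted(person for person in candidates if person in cast)
```

For instance, with two hypothetical people who are both addressed as "Yamada-san", only the one in the cast is returned. The converse rule in the text, treating non-appearing group members as appearing, would need a separate group table and is not shown.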

Conversely, when the name or image of a person who is not appearing in the program but belongs to the same group as a performer comes up, that person can be treated as appearing.

As described above, the present embodiment includes summary creation means 600, which creates a summary by converting into text the audio data or video data of a plurality of video contents extracted from their respective video signals, and weighting means 15d, which, while learning the optimum conditions from the accumulated summaries created by the summary creation means 600, assigns to the plurality of video contents weights specialized for one or more predetermined conditions included in the summaries. The component appearance frequency determination means 15a is configured to determine the appearance frequency of a component on the basis of the summary.

The component appearance frequency determination means 15a determines the appearance frequency of video content, that is, its importance, on the basis of the summary of the video content.

That is, the appearance frequency of the components of the video content is determined on the basis of the summary in which the summary creation means 600 has converted into text the audio data or video data of the plurality of video contents extracted from their respective video signals.

In this case, the weighting means 15d assigns to the summaries created by the summary creation means 600 weights specialized for at least one predetermined condition across the plurality of video contents, so that more useful information of high importance or interest is supplied to the user.

Accordingly, in the video information providing system 11 according to the second embodiment, the component appearance frequency determination means 15a determines the appearance frequency of video content, that is, its importance, on the basis of the summary of the video content. The appearance frequency of a component can therefore be determined more quickly and accurately than when the subject of each video is narrowed down directly from the video content using advanced techniques such as face recognition, speech recognition, and shape recognition.

The weighting means 15d also looks up text that recurs in the audio data or video data of a plurality of video contents and, when the result is equal to or greater than a predetermined value, determines that the text is important text matching the predetermined condition and assigns a weight to it.
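The important-text determination can be sketched as a document-frequency count with a threshold. This assumes each video content has already been reduced to a list of terms (for example by morphological analysis, which is not shown), and that "the result" is the number of contents in which a term recurs. The function name and input layout are hypothetical.

```python
from collections import Counter


def important_terms(contents, threshold):
    """Return terms appearing in at least `threshold` different video contents.

    contents: dict mapping a content ID to an iterable of already-tokenized
    terms (an assumed input layout).
    """
    content_frequency = Counter()
    for terms in contents.values():
        # set() counts each term once per content, however often it repeats.
        content_frequency.update(set(terms))
    return {term: n for term, n in content_frequency.items() if n >= threshold}
```

Restricting `contents` to items broadcast within a preset period gives a period-limited judgment.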

For example, in daily news programs, on the web, on SNS, and so on, information on topics, incidents, and accidents with high news value is of high social importance and interest. As a result, information on such social events often appears as a plurality of video contents across a plurality of media.

Because information on social events often appears as a plurality of video contents across a plurality of media in this way, text that recurs across video contents is identified as the same component, for example by morphological analysis, and a degree of importance (for example, in graded steps) is set according to its appearance frequency, such as the number of appearances, the appearance time, or the appearance rate. The appearance frequency of the component, that is, its social importance, is thus determined accurately, and appropriate video content is provided to the user.

The weighting means 15d also determines whether text is important text across a plurality of video contents within a preset period.

That is, the social importance of information on such topics, incidents, or accidents is judged within a predetermined period. Because the judgment is confined to a predetermined period, socially important video content can be provided to the user quickly.

If socially important incidents and accidents are also recorded and stored in the large-capacity storage unit, for example separately from the data used for summary creation, they can be retained for editing by an operator, as archival footage, and so on. This provides a video information providing system in which collected video content important to the user can be reused for analysis and other purposes as needed.

When the weighting means 15d determines that a new video content contains important text, it starts recording that video content.

The weighting means 15d also calculates a CM conversion value for a specific person or thing from the frequency with which that person or thing appears in the text contained in at least one of the audio data and the video data of a plurality of video contents.

Based on the number of appearances, the appearance time, the appearance rate, and so on, a CM conversion value can be calculated as a guide to how much the person is worth in the world of mass media and information (for example, for setting an appearance fee). When a specific corporate name is targeted, the result can also be used to analyze the corporation's share of all commercials, the breakdown among the products it sells (how much effort it puts into each, or identification of new products), and correlation with stock price movements.

Accordingly, a video information calculation system can be provided in which the CM conversion value calculated by the weighting means 15d can be used as an analytical indicator for various economic activities.

The weighting means 15d may also target a specific corporate name in the text contained in at least one of the audio data and the video data of a plurality of video contents.

Accordingly, because the specific corporate name is treated as a component subject to appearance frequency determination, video content in which that name appears is extracted and provided to the corporation as the user (viewer). The corporation can thus collect information on, for example, its public standing and reputation quickly, appropriately, and comprehensively, and reflect it promptly in its management.

The summary creation system 10 in the second embodiment is described below.
<Overall configuration of summary creation system 10>
As shown in FIG. 3, the summary creation system 10 includes a video signal separation unit 20, an utterance text conversion unit 100, a telop text conversion unit 200, a background image text conversion unit 300, a logo mark text conversion unit 400, a text integration unit 500, and a summary creation unit 600. In the present embodiment, the summary creation system 10 acquires the video signal from television broadcasts of the television station 30. The video signal can also be acquired from video on the Internet.

The video signal V, which contains audio and video components, is separated by the video signal separation unit 20 into an audio signal A and a video signal B. The audio signal A is input to the utterance text conversion unit 100, and the video signal B is input to the telop text conversion unit 200, the background image text conversion unit 300, and the logo mark text conversion unit 400.
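The signal routing described above can be sketched as a function that wires the units together. Each unit is passed in as a callable, and all parameter names are hypothetical. That the text integration unit 500 feeds the summary creation unit 600 is inferred from the unit names rather than stated in this passage.

```python
def run_pipeline(video_signal, separate, speech_to_text, telop_to_text,
                 background_to_text, logo_to_text, integrate, summarize):
    """Route video signal V through the units and return a summary."""
    # Video signal separation unit 20: V -> (audio signal A, video signal B).
    audio_a, video_b = separate(video_signal)
    texts = {
        "speech": speech_to_text(audio_a),          # unit 100 receives A
        "telop": telop_to_text(video_b),            # unit 200 receives B
        "background": background_to_text(video_b),  # unit 300 receives B
        "logo": logo_to_text(video_b),              # unit 400 receives B
    }
    # Assumed order: text integration unit 500, then summary creation unit 600.
    return summarize(integrate(texts))
```

Passing the units as callables keeps the routing separate from the recognition techniques, so any unit can be swapped without changing the wiring.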

<Utterance text conversion unit 100>
As shown in FIG. 4, the utterance text conversion unit 100 receives the audio signal A and outputs utterance text, that is, text describing what people say in the video content. The utterance text conversion unit 100 includes an utterance information extraction unit 110, an utterance content recognition unit 120, and an utterance content text conversion unit 130.

The utterance information extraction unit 110 extracts utterance information from the audio signal A of the video signal V. That is, it removes noise from the audio signal A and extracts the human speech. The utterance information may include sound effects and distinctive music.

The utterance content recognition unit 120 recognizes the utterance content from the utterance information. That is, it analyzes the utterance information acoustically and grammatically and recognizes the utterance content as language. The parameters, conditions, and the like used for this recognition can be generated by machine learning from accumulated generation data of past speech text, as described later.

The utterance content text conversion unit 130 converts the utterance content into text and outputs it. The parameters, conditions, and the like used for this recognition can be generated by machine learning from accumulated input data and generation data of past speech text, as described later.

<Telop text conversion unit 200>
As shown in FIG. 5, the telop text conversion unit 200 receives the video signal B and outputs telop text, that is, text describing the telop content in the video content. The telop text conversion unit 200 includes a telop information extraction unit 210, a telop content recognition unit 220, and a telop content text conversion unit 230.

The telop information extraction unit 210 extracts telop information from the video signal B of the video signal V. That is, it removes the background from the video signal B and extracts only the telop image information.

テロップ内容認識部２２０は、テロップ画像情報からテロップ内容を認識する。すなわち、テロップ情報を言語的、文法的に解析してテロップ表示内容を言語として認識する。この認識に使用するパラメータ、条件等は後述するように蓄積された過去のテロップテキストの入力データ及び生成データから機械学習により生成できる。 The telop content recognition unit 220 recognizes the telop content from the telop image information. That is, the telop information is analyzed linguistically and grammatically to recognize the displayed telop content as language. Parameters, conditions, and the like used for this recognition can be generated by machine learning from the accumulated past telop text input data and generation data, as described later.

テロップ内容テキスト化部２３０はテロップ内容をテキスト化して出力する。この認識に使用するパラメータ、条件等は後述するように蓄積された過去のテロップテキストの入力データ及び生成データから機械学習により生成できる。 The telop content text conversion unit 230 converts the telop content into text and outputs it. Parameters, conditions, and the like used for this recognition can be generated by machine learning from the past telop text input data and generation data accumulated as described later.

＜背景画像テキスト化部３００＞ <Background image text conversion unit 300>
図６に示すように、背景画像テキスト化部３００は映像信号Ｂを受けて映像コンテンツにおける背景画像内容を記述したテキストである背景画像テキストを出力する。背景画像としては、人物、人物の持ち物、人物の表情、風景、建築物の状況、室内の状況、動物、乗物、その他の物品を挙げることができる。背景画像テキスト化部３００は、背景画像情報抽出部３１０、背景画像内容認識部３２０、背景画像内容テキスト化部３３０を備える。 As shown in FIG. 6, the background image text conversion unit 300 receives the video signal B and outputs background image text, which is text describing the background image content in the video content. Examples of the background image include a person, a person's belongings, a person's facial expression, a landscape, the state of a building, the state of a room, an animal, a vehicle, and other articles. The background image text conversion unit 300 includes a background image information extraction unit 310, a background image content recognition unit 320, and a background image content text conversion unit 330.

背景画像情報抽出部３１０は、ビデオ信号Ｖの映像信号Ｂから背景画像情報を抽出する。すなわち、映像信号Ｂ中のテロップや不鮮明な画像を取り除き、認識可能な背景画像だけの情報を抽出する。 The background image information extraction unit 310 extracts background image information from the video signal B of the video signal V. That is, the telop and unclear image in the video signal B are removed, and only the recognizable background image information is extracted.

背景画像内容認識部３２０は、背景画像情報から背景画像の内容を認識する。すなわち、背景画像情報を解析して表されている人物、人物の持ち物、人物の表情、風景、建築物の状況、室内の状況、動物、乗物、その他の物品を認識する。この認識に使用するパラメータ、条件等は後述するように蓄積された過去の背景画像テキストの入力データ及び生成データから機械学習により生成できる。 The background image content recognition unit 320 recognizes the content of the background image from the background image information. That is, it analyzes the background image information and recognizes the persons, belongings, facial expressions, landscapes, building conditions, indoor conditions, animals, vehicles, and other articles shown in it. Parameters, conditions, and the like used for this recognition can be generated by machine learning from the accumulated past background image text input data and generation data, as described later.

背景画像内容テキスト化部３３０は背景画像内容をテキスト化して出力する。この認識に使用するパラメータ、条件等は後述するように蓄積された過去の背景画像テキストの入力データ及び生成データから機械学習により生成できる。 The background image content text unit 330 converts the background image content into text and outputs it. Parameters, conditions, and the like used for this recognition can be generated by machine learning from input data and generation data of past background image text accumulated as will be described later.

＜ロゴマークテキスト化部４００＞ <Logo mark text conversion unit 400>
図７に示すように、ロゴマークテキスト化部４００は映像信号Ｂを受けて映像コンテンツにおけるロゴマーク内容を記述したテキストであるロゴマークテキストを出力する。ロゴマークとしては、商品の出所を表示する商標、その他の標章を挙げることができる。 As shown in FIG. 7, the logo mark text conversion unit 400 receives the video signal B and outputs logo mark text, which is text describing the logo mark content in the video content. Examples of the logo mark include a trademark indicating the origin of a product and other marks.

ロゴマークテキスト化部４００は、ロゴマーク画像情報抽出部４１０、ロゴマーク内容認識部４２０、ロゴマーク内容テキスト化部４３０を備える。 The logo mark text conversion unit 400 includes a logo mark image information extraction unit 410, a logo mark content recognition unit 420, and a logo mark content text conversion unit 430.

ロゴマーク画像情報抽出部４１０は、ビデオ信号Ｖの映像信号Ｂからロゴマーク画像情報を抽出する。すなわち、映像信号Ｂ中のテロップや背景画像を取り除き、認識可能なロゴマーク画像だけの情報を抽出する。 The logo mark image information extraction unit 410 extracts logo mark image information from the video signal B of the video signal V. That is, the telop and the background image in the video signal B are removed, and only the recognizable logo mark image information is extracted.

ロゴマーク内容認識部４２０は、ロゴマーク画像情報からロゴマークの内容を認識する。すなわち、ロゴマーク画像情報を解析して表されている商品、サービス、店舗、施設等を認識する。この認識に使用するパラメータ、条件等は後述するように蓄積された過去のロゴマークテキストの入力データ及び生成データから機械学習により生成できる。 The logo mark content recognition unit 420 recognizes the content of the logo mark from the logo mark image information. That is, it analyzes the logo mark image information and recognizes the products, services, stores, facilities, and the like that the logo mark represents. Parameters, conditions, and the like used for this recognition can be generated by machine learning from the accumulated past logo mark text input data and generation data, as described later.

ロゴマーク内容テキスト化部４３０はロゴマーク画像内容をテキスト化して出力する。この認識に使用するパラメータ、条件等は後述するように蓄積された過去のロゴマークテキストの入力データ及び生成データから機械学習により生成できる。 The logo mark content text conversion unit 430 converts the logo mark image content into text and outputs it. Parameters, conditions, and the like used for this recognition can be generated by machine learning from the past logo mark text input data and generation data accumulated as described later.

＜テキスト統合部５００＞ <Text integration unit 500>
図８に示すように、テキスト統合部５００は、発話テキスト化部１００からの発話テキスト、テロップテキスト化部２００からのテロップテキスト、背景画像テキスト化部３００からの背景テキスト、ロゴマークテキスト化部４００からのロゴマークテキストを統合する。すなわち、各テキストにおける矛盾や誤りを訂正して、統合テキストを生成する。このテキストの統合に使用するパラメータ、条件等は後述するように蓄積された過去のテキスト統合の入力、出力データから機械学習により生成できる。 As shown in FIG. 8, the text integration unit 500 integrates the utterance text from the utterance text conversion unit 100, the telop text from the telop text conversion unit 200, the background text from the background image text conversion unit 300, and the logo mark text from the logo mark text conversion unit 400. That is, inconsistencies and errors in the individual texts are corrected, and an integrated text is generated. Parameters, conditions, and the like used for this text integration can be generated by machine learning from the accumulated input and output data of past text integration, as described later.
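The description does not say how inconsistencies between the individual texts are corrected. As one possible illustration only, the sketch below assumes that on-screen telop text is spelled more reliably than recognized speech, and replaces any utterance word that closely resembles a telop word with the telop spelling. The function name `integrate`, the returned dictionary layout, and the 0.75 similarity cutoff are all hypothetical choices, not taken from the source.

```python
from difflib import get_close_matches


def integrate(utterance: str, telop: str, background: str, logo: str) -> dict:
    """Toy stand-in for unit 500: fix utterance words using telop spellings."""
    telop_words = telop.split()
    corrected = []
    for word in utterance.split():
        # Assumption: a near-identical telop word is the intended spelling.
        match = get_close_matches(word, telop_words, n=1, cutoff=0.75)
        corrected.append(match[0] if match else word)
    return {
        "utterance": " ".join(corrected),
        "telop": telop,
        "background": background,
        "logo": logo,
    }
```

In the system described here, the correction rule would not be hand-written as above; the parameters and conditions for integration are said to be obtained by machine learning from past integration inputs and outputs.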

＜要約作成部６００＞ <Summary creation unit 600>
図９に示すように、要約作成部６００は、テキスト統合部５００からの統合テキストを要約する。すなわち、統合テキストの内容を要約して指定された文字数とする。この要約に使用するパラメータ、条件等は後述するように蓄積された過去の要約処理の入力データ、出力データから機械学習により生成できる。 As shown in FIG. 9, the summary creation unit 600 summarizes the integrated text from the text integration unit 500. That is, the content of the integrated text is condensed to a specified number of characters. Parameters, conditions, and the like used for this summarization can be generated by machine learning from the accumulated input data and output data of past summarization processing, as described later.
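To make the "specified number of characters" constraint concrete, the sketch below keeps leading sentences of the integrated text for as long as the result stays within a character budget. This is a deliberately naive stand-in: the source describes a summarizer whose parameters are learned, not a leading-sentence rule, and the function name `summarize` and the period-based sentence split are assumptions made for illustration.

```python
def summarize(integrated_text: str, max_chars: int) -> str:
    """Toy stand-in for unit 600: keep leading sentences within max_chars."""
    sentences = [s.strip() for s in integrated_text.split(".") if s.strip()]
    summary = ""
    for sentence in sentences:
        candidate = (summary + " " + sentence + ".").strip()
        if len(candidate) > max_chars:
            break  # adding this sentence would exceed the character budget
        summary = candidate
    return summary
```

Whatever summarization method is used, the length check is the part that corresponds directly to the description: the output must never exceed the specified character count.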

次に、各部の機械学習処理について説明する。 Next, the machine learning processing of each unit will be described.
＜発話テキスト化部１００の機械学習処理＞ <Machine learning processing of utterance text conversion unit 100>
図３は同要約作成システムの発話テキスト化部を示すブロック図である。発話テキスト化部１００は、発話情報抽出部１１０、発話内容認識部１２０、発話内容テキスト化部１３０の他、機械学習部１４０、内容認識テキスト作成設定部１５０、比較評価部１６０を備える。また発話テキスト化部１００には、既存データ格納部７００が接続されている。 FIG. 3 is a block diagram showing the utterance text conversion unit of the summary creation system. In addition to the utterance information extraction unit 110, the utterance content recognition unit 120, and the utterance content text conversion unit 130, the utterance text conversion unit 100 includes a machine learning unit 140, a content recognition text creation setting unit 150, and a comparative evaluation unit 160. An existing data storage unit 700 is connected to the utterance text conversion unit 100.

発話テキスト化部１００は既存データ格納部７００が格納する既存のビデオデータと既存の発話テキストに基づいて機械学習を行い、発話内容認識部１２０及び発話内容テキスト化部１３０を最適化する。既存データ格納部７００には、過去に人が発話テキストを作成したときに使用した多数のビデオデータを格納した既存ビデオデータ格納部７１０と、このビデオデータの発話内容から作成した発話テキストを格納した既存発話テキスト格納部７２０を備える。これらのビデオデータ及び発話テキストは機械学習の教材となる。 The utterance text conversion unit 100 performs machine learning based on the existing video data and the existing utterance text stored in the existing data storage unit 700, and optimizes the utterance content recognition unit 120 and the utterance content text conversion unit 130. The existing data storage unit 700 includes an existing video data storage unit 710, which stores a large amount of video data used when people created utterance text in the past, and an existing utterance text storage unit 720, which stores the utterance text created from the utterance content of that video data. These video data and utterance texts serve as machine learning materials.

また、発話テキスト化部１００には、機械学習を行うときと、新規のビデオデータから発話内容テキストを作成するときにデータ出力の切り換えを行う切換部１７０、１８０を備える。 Further, the utterance text conversion unit 100 includes switching units 170 and 180 that perform data output switching when machine learning is performed and when an utterance content text is created from new video data.
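The role of the two switching units can be sketched as a mode flag that selects where data comes from and where the generated text goes. The description does not spell out which switch handles input and which handles output, so the assignment below (unit 170 on the input side, unit 180 on the output side) is an assumption, as are the `Mode` enum and the function names.

```python
from enum import Enum
from typing import List


class Mode(Enum):
    TRAINING = "training"     # learn from existing video data
    INFERENCE = "inference"   # create text from new video data


def select_input(mode: Mode, existing_video: str, new_video: str) -> str:
    """Assumed role of switching unit 170: pick the data source for the mode."""
    return existing_video if mode is Mode.TRAINING else new_video


def route_output(mode: Mode, text: str,
                 to_comparison: List[str], to_output: List[str]) -> None:
    """Assumed role of switching unit 180: send text to evaluation or to output."""
    target = to_comparison if mode is Mode.TRAINING else to_output
    target.append(text)
```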

内容認識テキスト作成設定部１５０は、発話内容認識部１２０の発話内容認識処理の設定と、発話内容テキスト化部１３０のテキスト化処理の設定が格納されている。発話内容認識部１２０及び発話内容テキスト化部１３０は内容認識テキスト作成設定部１５０の設定した条件、パラメータに従って発話内容の認識とテキスト化とを行う。 The content recognition text creation setting unit 150 stores the settings of the speech content recognition processing of the speech content recognition unit 120 and the settings of the text conversion processing of the speech content text conversion unit 130. The utterance content recognition unit 120 and the utterance content text conversion unit 130 recognize the utterance content and convert it into text according to the conditions and parameters set by the content recognition text creation setting unit 150.

比較評価部１６０は、比較部１６１と評価部１６２とを備える。比較部１６１は、既存ビデオデータ格納部７１０からの既存ビデオデータを受けて発話内容テキスト化部１３０が作成した発話テキストと、既存発話テキスト格納部７２０からの既存発話テキストとを比較する。評価部１６２は比較部１６１の比較結果に基づいて評価を行い、よく一致した場合は高い点数を与える。 The comparative evaluation unit 160 includes a comparison unit 161 and an evaluation unit 162. The comparison unit 161 receives the existing video data from the existing video data storage unit 710 and compares the utterance text created by the utterance content text conversion unit 130 with the existing utterance text from the existing utterance text storage unit 720. The evaluation unit 162 performs an evaluation based on the comparison result of the comparison unit 161, and gives a high score when the values match well.
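The comparison and scoring steps can be illustrated with a character-level similarity measure. The source does not name a metric, so the sketch below assumes Python's standard `difflib.SequenceMatcher` ratio for the comparison unit and a 0 to 100 point scale for the evaluation unit. The function names and the point scale are hypothetical.

```python
from difflib import SequenceMatcher


def compare(generated_text: str, existing_text: str) -> float:
    """Stand-in for comparison unit 161: similarity in the range 0.0 to 1.0."""
    return SequenceMatcher(None, generated_text, existing_text).ratio()


def evaluate(similarity: float, max_points: int = 100) -> int:
    """Stand-in for evaluation unit 162: a closer match earns more points."""
    return round(similarity * max_points)
```

For instance, `evaluate(compare(text, text))` gives the maximum score for any text, while a generated text that shares nothing with the human-made reference scores 0.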

機械学習部１４０は、評価部１６２からの評価を受け、内容認識テキスト作成設定部１５０の設定状態を変更する。この処理を同一のビデオデータについて繰り返し行い、評価部１６２の評価値をできるだけ高いものとする。この処理は複数のビデオデータについて繰り返し行うことができる。 The machine learning unit 140 receives the evaluation from the evaluation unit 162 and changes the setting state of the content recognition text creation setting unit 150. This process is repeated for the same video data to make the evaluation value of the evaluation unit 162 as high as possible. This process can be repeated for a plurality of video data.
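The feedback loop described above (change the settings, regenerate the text, re-evaluate, keep whatever scores higher) can be sketched as a simple search over candidate settings. Everything concrete here is an assumption for illustration: the single `min_confidence` setting, the token/confidence input format, the candidate list, and the use of `difflib` as the score. The source only states that the machine learning unit changes the settings so as to raise the evaluation value.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Tuple

Tokens = List[Tuple[str, float]]  # (word, recognizer confidence), hypothetical


def textify(tokens: Tokens, settings: Dict[str, float]) -> str:
    """Toy stand-in for units 120/130: keep words above a confidence setting."""
    return " ".join(w for w, conf in tokens if conf >= settings["min_confidence"])


def average_score(pairs: List[Tuple[Tokens, str]], settings: Dict[str, float]) -> float:
    """Stand-in for unit 160: mean similarity to the human-made reference texts."""
    scores = [SequenceMatcher(None, textify(tokens, settings), ref).ratio()
              for tokens, ref in pairs]
    return sum(scores) / len(scores)


def train(pairs: List[Tuple[Tokens, str]], settings: Dict[str, float],
          candidates: List[float]) -> Tuple[Dict[str, float], float]:
    """Stand-in for unit 140: keep a changed setting only if the score rises."""
    best = dict(settings)
    best_score = average_score(pairs, best)
    for value in candidates:
        trial = {**best, "min_confidence": value}
        trial_score = average_score(pairs, trial)
        if trial_score > best_score:
            best, best_score = trial, trial_score
    return best, best_score
```

A real implementation would adjust many parameters at once, typically with gradient-based training. The sketch only shows the control flow in which the evaluation result drives the setting update.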

このような機械学習を行うことにより、発話内容認識部１２０及び発話内容テキスト化部１３０の能力が向上する。所定の機械学習を終了した後、発話テキスト化部１００は新規ビデオデータを処理して、最適な発話テキストを出力できる状態となる。 By performing such machine learning, the abilities of the utterance content recognition unit 120 and the utterance content text conversion unit 130 are improved. After the predetermined machine learning is completed, the utterance text converting unit 100 processes the new video data and is in a state where the optimum utterance text can be output.

＜テロップテキスト化部２００の機械学習＞ <Machine learning of telop text conversion unit 200>
図４は同要約作成システムのテロップテキスト化部を示すブロック図である。テロップテキスト化部２００は、テロップ情報抽出部２１０、テロップ内容認識部２２０、テロップ内容テキスト化部２３０の他、機械学習部２４０、内容認識テキスト作成設定部２５０、比較評価部２６０を備える。またテロップテキスト化部２００には、既存データ格納部７００が接続されている。 FIG. 4 is a block diagram showing the telop text conversion unit of the summary creation system. In addition to the telop information extraction unit 210, the telop content recognition unit 220, and the telop content text conversion unit 230, the telop text conversion unit 200 includes a machine learning unit 240, a content recognition text creation setting unit 250, and a comparative evaluation unit 260. An existing data storage unit 700 is connected to the telop text conversion unit 200.

テロップテキスト化部２００は既存データ格納部７００が格納する既存のビデオデータと既存のテロップテキストに基づいて機械学習を行い、テロップ内容認識部２２０及びテロップ内容テキスト化部２３０を最適化する。既存データ格納部７００には、過去に人がテロップテキストを作成したときに使用した多数のビデオデータを格納した既存ビデオデータ格納部７１０と、このビデオデータのテロップ内容から作成したテロップテキストを格納した既存テロップテキスト格納部７３０を備える。これらのビデオデータ及びテロップテキストは機械学習の教材となる。 The telop text conversion unit 200 performs machine learning based on the existing video data and the existing telop text stored in the existing data storage unit 700, and optimizes the telop content recognition unit 220 and the telop content text conversion unit 230. The existing data storage unit 700 includes an existing video data storage unit 710, which stores a large amount of video data used when people created telop text in the past, and an existing telop text storage unit 730, which stores the telop text created from the telop content of that video data. These video data and telop texts serve as machine learning materials.

また、テロップテキスト化部２００には、機械学習を行うときと、新規のビデオデータからテロップテキストを作成するときにデータ出力の切り換えを行う切換部２７０、２８０を備える。 The telop text conversion unit 200 also includes switching units 270 and 280 that switch the data output between performing machine learning and creating telop text from new video data.

内容認識テキスト作成設定部２５０は、テロップ内容認識部２２０のテロップ内容認識処理の設定と、テロップ内容テキスト化部２３０のテキスト化処理の設定が格納されている。テロップ内容認識部２２０及びテロップ内容テキスト化部２３０は内容認識テキスト作成設定部２５０の設定した条件、パラメータに従ってテロップの内容認識及びテキスト化を行う。 The content recognition text creation setting unit 250 stores the settings for the telop content recognition processing of the telop content recognition unit 220 and the settings for the text conversion processing of the telop content text conversion unit 230. The telop content recognition unit 220 and the telop content text conversion unit 230 perform content recognition and text conversion of the telop according to the conditions and parameters set by the content recognition text creation setting unit 250.

比較評価部２６０は、比較部２６１と評価部２６２とを備える。比較部２６１は、既存ビデオデータ格納部７１０からの既存ビデオデータを受けてテロップ内容テキスト化部２３０が作成したテロップテキストと、既存テロップテキスト格納部７３０からの既存テロップテキストとを比較する。評価部２６２は比較部２６１の比較結果に基づいて評価を行い、よく一致した場合は高い点数を与える。 The comparative evaluation unit 260 includes a comparison unit 261 and an evaluation unit 262. The comparison unit 261 receives the existing video data from the existing video data storage unit 710 and compares the telop text created by the telop content text unit 230 with the existing telop text from the existing telop text storage unit 730. The evaluation unit 262 performs an evaluation based on the comparison result of the comparison unit 261, and gives a high score when the values match well.

機械学習部２４０は、評価部２６２からの評価を受け、内容認識テキスト作成設定部２５０の設定状態を変更する。この処理を同一のビデオデータについて繰り返し行い、評価部２６２の評価値をできるだけ高いものとする。この処理は複数のビデオデータについて繰り返し行うことができる。 The machine learning unit 240 receives the evaluation from the evaluation unit 262 and changes the setting state of the content recognition text creation setting unit 250. This process is repeated for the same video data to make the evaluation value of the evaluation unit 262 as high as possible. This process can be repeated for a plurality of video data.

このような機械学習を行うことにより、テロップ内容認識部２２０及びテロップ内容テキスト化部２３０の能力が向上する。所定の機械学習を終了した後、テロップテキスト化部２００は新規ビデオデータを処理して、最適なテロップテキストを出力できる状態となる。 By performing such machine learning, the capabilities of the telop content recognition unit 220 and the telop content text conversion unit 230 are improved. After completing the predetermined machine learning, the telop text conversion unit 200 processes the new video data and is in a state where it can output the optimum telop text.

＜背景画像テキスト化部３００の機械学習＞ <Machine learning of background image text conversion unit 300>
図６は同要約作成システムの背景画像テキスト化部を示すブロック図である。背景画像テキスト化部３００は、背景画像情報抽出部３１０、背景画像内容認識部３２０、背景画像内容テキスト化部３３０の他、機械学習部３４０、内容認識テキスト作成設定部３５０、比較評価部３６０を備える。また背景画像テキスト化部３００には、既存データ格納部７００が接続されている。 FIG. 6 is a block diagram showing the background image text conversion unit of the summary creation system. In addition to the background image information extraction unit 310, the background image content recognition unit 320, and the background image content text conversion unit 330, the background image text conversion unit 300 includes a machine learning unit 340, a content recognition text creation setting unit 350, and a comparative evaluation unit 360. An existing data storage unit 700 is connected to the background image text conversion unit 300.

背景画像テキスト化部３００は既存データ格納部７００が格納する既存のビデオデータと既存の背景画像テキストに基づいて機械学習を行い、背景画像内容認識部３２０及び背景画像内容テキスト化部３３０を最適化する。既存データ格納部７００には、過去に人が背景画像テキストを作成したときに使用した多数のビデオデータを格納した既存ビデオデータ格納部７１０と、このビデオデータの背景画像内容から作成した背景画像テキストを格納した既存背景画像テキスト格納部７４０を備える。これらのビデオデータ及び背景画像テキストは機械学習の教材となる。 The background image text conversion unit 300 performs machine learning based on the existing video data and the existing background image text stored in the existing data storage unit 700, and optimizes the background image content recognition unit 320 and the background image content text conversion unit 330. The existing data storage unit 700 includes an existing video data storage unit 710, which stores a large amount of video data used when people created background image text in the past, and an existing background image text storage unit 740, which stores the background image text created from the background image content of that video data. These video data and background image texts serve as machine learning materials.

また、背景画像テキスト化部３００には、機械学習を行うときと、新規のビデオデータから背景画像テキストを作成するときにデータ出力の切り換えを行う切換部３７０、３８０を備える。 The background image text conversion unit 300 also includes switching units 370 and 380 that switch the data output between performing machine learning and creating background image text from new video data.

内容認識テキスト作成設定部３５０は、背景画像内容認識部３２０の背景画像内容認識処理の設定と、背景画像内容テキスト化部３３０のテキスト化処理の設定が格納されている。背景画像内容認識部３２０及び背景画像内容テキスト化部３３０は内容認識テキスト作成設定部３５０の設定した条件、パラメータに従って背景画像の内容認識及びテキスト化を行う。 The content recognition text creation setting unit 350 stores the settings for the background image content recognition processing of the background image content recognition unit 320 and the settings for the text conversion processing of the background image content text conversion unit 330. The background image content recognition unit 320 and the background image content text conversion unit 330 perform content recognition and text conversion of the background image according to the conditions and parameters set by the content recognition text creation setting unit 350.

比較評価部３６０は、比較部３６１と評価部３６２とを備える。比較部３６１は、既存ビデオデータ格納部７１０からの既存ビデオデータを受けて背景画像内容テキスト化部３３０が作成した背景画像テキストと、既存背景画像テキスト格納部７４０からの既存背景画像テキストとを比較する。評価部３６２は比較部３６１の比較結果に基づいて評価を行い、よく一致した場合は高い点数を与える。 The comparative evaluation unit 360 includes a comparison unit 361 and an evaluation unit 362. The comparison unit 361 compares the background image text that the background image content text conversion unit 330 created from the existing video data received from the existing video data storage unit 710 with the existing background image text from the existing background image text storage unit 740. The evaluation unit 362 performs an evaluation based on the comparison result of the comparison unit 361, and gives a high score when the values match well.

機械学習部３４０は、評価部３６２からの評価を受け、内容認識テキスト作成設定部３５０の設定状態を変更する。この処理を同一のビデオデータについて繰り返し行い、評価部３６２の評価値をできるだけ高いものとする。この処理は複数のビデオデータについて繰り返し行うことができる。 The machine learning unit 340 receives the evaluation from the evaluation unit 362 and changes the setting state of the content recognition text creation setting unit 350. This process is repeated for the same video data to make the evaluation value of the evaluation unit 362 as high as possible. This process can be repeated for a plurality of video data.

このような機械学習を行うことにより、背景画像内容認識部３２０及び背景画像内容テキスト化部３３０の能力が向上する。所定の機械学習を終了した後、背景画像テキスト化部３００は新規ビデオデータを処理して、最適な背景画像テキストを出力できる状態となる。 By performing such machine learning, the capabilities of the background image content recognition unit 320 and the background image content text conversion unit 330 are improved. After the predetermined machine learning is completed, the background image text converting unit 300 processes the new video data and can output an optimum background image text.

＜ロゴマークテキスト化部４００の機械学習＞ <Machine learning of logo mark text conversion unit 400>
図７は同要約作成システムのロゴマークテキスト化部を示すブロック図である。ロゴマークテキスト化部４００は、ロゴマーク画像情報抽出部４１０、ロゴマーク内容認識部４２０、ロゴマーク内容テキスト化部４３０の他、機械学習部４４０、内容認識テキスト作成設定部４５０、比較評価部４６０を備える。またロゴマークテキスト化部４００には、既存データ格納部７００が接続されている。 FIG. 7 is a block diagram showing the logo mark text conversion unit of the summary creation system. In addition to the logo mark image information extraction unit 410, the logo mark content recognition unit 420, and the logo mark content text conversion unit 430, the logo mark text conversion unit 400 includes a machine learning unit 440, a content recognition text creation setting unit 450, and a comparative evaluation unit 460. An existing data storage unit 700 is connected to the logo mark text conversion unit 400.

ロゴマークテキスト化部４００は、既存データ格納部７００が格納する既存のビデオデータと既存のロゴマークテキストに基づいて機械学習を行い、ロゴマーク内容認識部４２０及びロゴマーク内容テキスト化部４３０を最適化する。既存データ格納部７００には、過去に人がロゴマークテキストを作成したときに使用した多数のビデオデータを格納した既存ビデオデータ格納部７１０と、このビデオデータのロゴマーク内容から作成したロゴマークテキストを格納した既存ロゴマークテキスト格納部７５０を備える。これらのビデオデータ及びロゴマークテキストは機械学習の教材となる。 The logo mark text conversion unit 400 performs machine learning based on the existing video data and the existing logo mark text stored in the existing data storage unit 700, and optimizes the logo mark content recognition unit 420 and the logo mark content text conversion unit 430. The existing data storage unit 700 includes an existing video data storage unit 710, which stores a large amount of video data used when people created logo mark text in the past, and an existing logo mark text storage unit 750, which stores the logo mark text created from the logo mark content of that video data. These video data and logo mark texts serve as machine learning materials.

また、ロゴマークテキスト化部４００には、機械学習を行うときと、新規のビデオデータからロゴマークテキストを作成するときにデータ出力の切り換えを行う切換部４７０、４８０を備える。 The logo mark text conversion unit 400 also includes switching units 470 and 480 that switch the data output between performing machine learning and creating logo mark text from new video data.

内容認識テキスト作成設定部４５０は、ロゴマーク内容認識部４２０のロゴマーク画像内容認識処理の設定と、ロゴマーク内容テキスト化部４３０のテキスト化処理の設定が格納されている。ロゴマーク内容認識部４２０及びロゴマーク内容テキスト化部４３０は内容認識テキスト作成設定部４５０の設定した条件、パラメータに従ってロゴマークの内容認識及びテキスト化を行う。 The content recognition text creation setting unit 450 stores the settings of the logo mark image content recognition processing of the logo mark content recognition unit 420 and the settings of the text conversion processing of the logo mark content text conversion unit 430. The logo mark content recognition unit 420 and the logo mark content text conversion unit 430 perform content recognition and text conversion of the logo mark according to the conditions and parameters set by the content recognition text creation setting unit 450.

比較評価部４６０は、比較部４６１と評価部４６２とを備える。比較部４６１は、既存ビデオデータ格納部７１０からの既存ビデオデータを受けてロゴマーク内容テキスト化部４３０が作成したテキストと、既存ロゴマークテキスト格納部７５０からの既存ロゴマークテキストとを比較する。評価部４６２は比較部４６１の比較結果に基づいて評価を行い、よく一致した場合は高い点数を与える。 The comparative evaluation unit 460 includes a comparison unit 461 and an evaluation unit 462. The comparison unit 461 compares the text that the logo mark content text conversion unit 430 created from the existing video data received from the existing video data storage unit 710 with the existing logo mark text from the existing logo mark text storage unit 750. The evaluation unit 462 performs an evaluation based on the comparison result of the comparison unit 461, and gives a high score when the values match well.

機械学習部４４０は、評価部４６２からの評価を受け、内容認識テキスト作成設定部４５０の設定状態を変更する。この処理を同一のビデオデータについて繰り返し行い、評価部４６２の評価値をできるだけ高いものとする。この処理は複数のビデオデータについて繰り返し行うことができる。 The machine learning unit 440 receives the evaluation from the evaluation unit 462 and changes the setting state of the content recognition text creation setting unit 450. This process is repeated for the same video data to make the evaluation value of the evaluation unit 462 as high as possible. This process can be repeated for a plurality of video data.

このような機械学習を行うことにより、ロゴマーク内容認識部４２０及びロゴマーク内容テキスト化部４３０の能力が向上する。所定の機械学習を終了した後、ロゴマークテキスト化部４００は新規ビデオデータを処理して、最適なロゴマークテキストを出力できる状態となる。 By performing such machine learning, the capabilities of the logo mark content recognition unit 420 and the logo mark content text conversion unit 430 are improved. After the predetermined machine learning is completed, the logo mark text conversion unit 400 is ready to process new video data and output optimum logo mark text.

＜テキスト統合部５００の機械学習＞ <Machine learning of text integration unit 500>
図８は同要約作成システムのテキスト統合部を示すブロック図である。テキスト統合部５００は、統合テキスト作成部５１０、統合テキスト作成設定部５２０、機械学習部５３０、比較評価部５４０を備える。テキスト統合部５００には、既存データ格納部７００が接続されている。 FIG. 8 is a block diagram showing the text integration unit of the summary creation system. The text integration unit 500 includes an integrated text creation unit 510, an integrated text creation setting unit 520, a machine learning unit 530, and a comparative evaluation unit 540. An existing data storage unit 700 is connected to the text integration unit 500.

テキスト統合部５００は、既存データ格納部７００が格納する既存の各種、すなわち、発話テキスト、テロップテキスト、背景テキスト及びロゴマークテキストと既存の統合テキストに基づいて機械学習を行い、統合テキスト作成部５１０の動作を最適化する。既存データ格納部７００には、過去に統合テキストを作成したときに使用した各種テキストデータを格納した既存各種テキスト格納部７６０と、この各種テキストから作成した統合テキストを格納した既存統合テキスト格納部７７０とを備える。これらの各種テキスト及び統合テキストは機械学習の教材となる。 The text integration unit 500 performs machine learning based on the existing texts of each type stored in the existing data storage unit 700 (utterance text, telop text, background text, and logo mark text) and the existing integrated text, and optimizes the operation of the integrated text creation unit 510. The existing data storage unit 700 includes an existing various text storage unit 760, which stores the various text data used when integrated texts were created in the past, and an existing integrated text storage unit 770, which stores the integrated texts created from those various texts. These various texts and integrated texts serve as machine learning materials.

また、テキスト統合部５００には、機械学習を行うときと、新規の各種テキストから新たな統合テキストを作成するときにデータ出力の切り換えを行う切換部５７０、５８０を備える。 In addition, the text integration unit 500 includes switching units 570 and 580 that perform data output switching when performing machine learning and when creating a new integrated text from various new texts.

統合テキスト作成設定部５２０は、統合テキスト作成部５１０のテキスト統合処理の設定が格納されている。統合テキスト作成部５１０は統合テキスト作成設定部５２０の設定した条件、パラメータに従ってテキスト統合処理を行う。 The integrated text creation setting unit 520 stores text integration processing settings of the integrated text creation unit 510. The integrated text creation unit 510 performs text integration processing according to the conditions and parameters set by the integrated text creation setting unit 520.

比較評価部５４０は、比較部５４１と評価部５４２とを備える。比較部５４１は、既存各種テキスト格納部７６０からの既存各種テキストを受けて統合テキスト作成部５１０が作成した統合テキストと、既存統合テキスト格納部７７０からの既存統合テキストとを比較する。評価部５４２は比較部５４１の比較結果に基づいて評価を行い、よく一致した場合は高い点数を与える。 The comparative evaluation unit 540 includes a comparison unit 541 and an evaluation unit 542. The comparison unit 541 receives the existing various texts from the existing various text storage units 760 and compares the integrated text created by the integrated text creation unit 510 with the existing integrated texts from the existing integrated text storage unit 770. The evaluation unit 542 performs an evaluation based on the comparison result of the comparison unit 541, and gives a high score when the values match well.

機械学習部５３０は、評価部５４２からの評価を受け、統合テキスト作成設定部５２０の設定状態を変更する。この処理を同一の各種テキストデータについて繰り返し行い、評価部５４２の評価値をできるだけ高いものとする。この処理は複数の各種テキストデータについて繰り返し行うことができる。 The machine learning unit 530 receives the evaluation from the evaluation unit 542 and changes the setting state of the integrated text creation setting unit 520. This process is repeated for the same various text data to make the evaluation value of the evaluation unit 542 as high as possible. This process can be repeated for a plurality of various text data.

このような機械学習を行うことにより、統合テキスト作成部５１０の能力が向上する。所定の機械学習を終了した後、テキスト統合部５００は新規ビデオデータを処理して、最適な統合テキストを出力できる状態となる。 By performing such machine learning, the ability of the integrated text creation unit 510 is improved. After completing the predetermined machine learning, the text integration unit 500 processes the new video data and is in a state where it can output the optimum integrated text.

＜要約作成部６００の機械学習＞ <Machine learning of summary creation unit 600>
図９は同要約作成システムの要約作成部を示すブロック図である。要約作成部６００は、要約テキスト作成部６１０、要約作成設定部６２０、機械学習部６３０、比較評価部６４０を備える。要約作成部６００には、既存データ格納部７００が接続されている。 FIG. 9 is a block diagram showing the summary creation unit of the summary creation system. The summary creation unit 600 includes a summary text creation unit 610, a summary creation setting unit 620, a machine learning unit 630, and a comparative evaluation unit 640. An existing data storage unit 700 is connected to the summary creation unit 600.

要約作成部６００は既存データ格納部７００が格納する統合テキストと要約テキストに基づいて機械学習を行い、要約テキスト作成部６１０の動作を最適化する。既存データ格納部７００には、過去に要約テキストを作成したときに使用した統合テキストデータを格納した既存統合テキスト格納部７７０と、この統合テキストから作成した要約テキストを格納した既存要約テキスト格納部７８０とを備える。これらの統合テキスト及び要約テキストは機械学習の教材となる。 The summary creation unit 600 performs machine learning based on the integrated text and the summary text stored in the existing data storage unit 700, and optimizes the operation of the summary text creation unit 610. The existing data storage unit 700 includes an existing integrated text storage unit 770, which stores the integrated text data used when summary texts were created in the past, and an existing summary text storage unit 780, which stores the summary texts created from those integrated texts. These integrated texts and summary texts serve as machine learning materials.

また、要約作成部６００には、機械学習を行うときと、新規の統合テキストから新たな要約テキストを作成するときにデータ出力の切り換えを行う切換部６７０、６８０を備える。 The summary creation unit 600 includes switching units 670 and 680 that perform data output switching when machine learning is performed and when a new summary text is created from a new integrated text.

要約作成設定部６２０には、要約テキスト作成部６１０の要約処理の設定が格納されている。要約テキスト作成部６１０は要約作成設定部６２０の設定した条件、パラメータに従ってテキスト要約処理を行う。 The summary creation setting unit 620 stores the summary processing settings of the summary text creation unit 610. The summary text creation unit 610 performs text summary processing according to the conditions and parameters set by the summary creation setting unit 620.

比較評価部６４０は、比較部６４１と評価部６４２とを備える。比較部６４１は、既存統合テキスト格納部７７０からの既存統合テキストを受けて要約テキスト作成部６１０が作成した要約テキストと、既存要約テキスト格納部７８０からの要約テキストとを比較する。評価部６４２は比較部６４１の比較結果に基づいて評価を行い、よく一致した場合は高い点数を与える。 The comparative evaluation unit 640 includes a comparison unit 641 and an evaluation unit 642. The comparison unit 641 compares the summary text created by the summary text creation unit 610 in response to the existing integration text from the existing integration text storage unit 770 and the summary text from the existing summary text storage unit 780. The evaluation unit 642 performs an evaluation based on the comparison result of the comparison unit 641 and gives a high score when the values match well.

機械学習部６３０は、評価部６４２からの評価を受け、要約作成設定部６２０の設定状態を変更する。この処理を同一の統合テキストデータについて繰り返し行い、評価部６４２の評価値をできるだけ高いものとする。この処理は複数の統合テキストデータについて繰り返し行うことができる。 The machine learning unit 630 receives the evaluation from the evaluation unit 642 and changes the setting state of the summary creation setting unit 620. This process is repeated for the same integrated text data to make the evaluation value of the evaluation unit 642 as high as possible. This process can be repeated for a plurality of integrated text data.

このような機械学習を行うことにより、要約テキスト作成部６１０の能力が向上する。所定の機械学習を終了した後、要約作成部６００は新規ビデオデータを処理して、最適な要約テキストを出力できる状態となる。 By performing such machine learning, the capability of the summary text creation unit 610 is improved. After completing the predetermined machine learning, the summary creation unit 600 can process the new video data and output an optimum summary text.

次に、要約作成システム１０の処理について説明する。図１０は同要約作成システムの動作を示すフローチャートである。 Next, processing of the summary creation system 10 will be described. FIG. 10 is a flowchart showing the operation of the summary creation system.

まず、既存データ格納部７００の既存ビデオデータ格納部７１０、既存発話テキスト格納部７２０、既存テロップテキスト格納部７３０、既存背景画像テキスト格納部７４０、既存ロゴマークテキスト格納部７５０、既存各種テキスト格納部７６０、既存統合テキスト格納部７７０、既存要約テキスト格納部７８０に既存のビデオ信号、各種テキストデータを読み込む（ステップＳＴ１）。 First, existing video signals and the various existing text data are read into the existing video data storage unit 710, the existing utterance text storage unit 720, the existing telop text storage unit 730, the existing background image text storage unit 740, the existing logo mark text storage unit 750, the existing various text storage unit 760, the existing integrated text storage unit 770, and the existing summary text storage unit 780 of the existing data storage unit 700 (step ST1).

次いで発話テキスト化部１００、テロップテキスト化部２００、背景画像テキスト化部３００、ロゴマークテキスト化部４００において、機械学習処理を行う（ステップＳＴ２ａ、ＳＴ２ｂ、ＳＴ２ｃ、ＳＴ２ｄ）。この学習処理は逐次的に行うこともできる。 Next, machine learning processing is performed in the utterance text conversion unit 100, the telop text conversion unit 200, the background image text conversion unit 300, and the logo mark text conversion unit 400 (steps ST2a, ST2b, ST2c, and ST2d). These learning processes can also be performed sequentially.

次に、テキスト統合部５００の既存データ格納部５５０、要約作成部６００の既存データ格納部６５０に既存の入力データ、出力データを読み込む（ステップＳＴ３）。次いで、テキスト統合部５００、要約作成部６００において機械学習処理を行う（ステップＳＴ３ａ、３ｂ）。この学習処理は逐次的に行うこともできる。なお、発話テキスト化部１００、テロップテキスト化部２００、背景画像テキスト化部３００、及びロゴマークテキスト化部４００の機械学習処理と、及びテキスト統合部５００及び要約作成部６００の機械学習処理とは処理の順序を問わず、逆の順序で行うことができる。 Next, the existing input data and output data are read into the existing data storage unit 550 of the text integration unit 500 and the existing data storage unit 650 of the summary creation unit 600 (step ST3). Machine learning processing is then performed in the text integration unit 500 and the summary creation unit 600 (steps ST3a and ST3b). These learning processes can also be performed sequentially. Note that the machine learning processing of the utterance text conversion unit 100, the telop text conversion unit 200, the background image text conversion unit 300, and the logo mark text conversion unit 400 and the machine learning processing of the text integration unit 500 and the summary creation unit 600 may be performed in either order, including the reverse of the order described here.

学習処理が終了すると（ステップＳＴ４のｙｅｓ）、処理対象となるビデオ信号をビデオ信号分離部２０に入力する（ステップＳＴ５）。これにより、発話テキスト化部１００、テロップテキスト化部２００、背景画像テキスト化部３００、ロゴマークテキスト化部４００は、テキスト化処理を実行する（ステップＳＴ６ａ、ＳＴ６ｂ、ＳＴ６ｃ、ＳＴ６ｄ）。 When the learning process is completed (yes in step ST4), the video signal to be processed is input to the video signal separation unit 20 (step ST5). Thereby, the utterance text conversion unit 100, the telop text conversion unit 200, the background image text conversion unit 300, and the logo mark text conversion unit 400 execute text conversion processing (steps ST6a, ST6b, ST6c, ST6d).

そして、各テキストをテキスト統合部５００で統合処理し（ステップＳＴ７）、更に統合されたテキストを要約作成部６００で要約処理し（ステップＳＴ８）、要約テキストを出力し、要約作成システム１０の処理は終了する。 Then, the texts are integrated by the text integration unit 500 (step ST7), the integrated text is summarized by the summary creation unit 600 (step ST8), the summary text is output, and the processing of the summary creation system 10 ends.

次の要約作成処理からは、機械学習処理（ステップＳＴ１〜ＳＴ４）は行わなくて直ちに要約作成の対象ビデオ信号の入力（ステップＳＴ５）をするだけで、最適な要約作成を行うことができる。また、機械学習処理は必要に応じて行うことができる。 From the next summary creation process, the optimum summary creation can be performed only by inputting the target video signal for summary creation (step ST5) immediately without performing the machine learning process (steps ST1 to ST4). The machine learning process can be performed as necessary.
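
The flow of steps ST1 to ST8, including the point that learning is skipped on later runs, can be sketched as below. The class name `SummaryPipeline`, the dictionary keys, and the placeholder logic in each stage (joining strings, truncating to 40 characters) are assumptions made purely for illustration; they do not represent the actual processing of units 100 to 600.

```python
class SummaryPipeline:
    """Illustrative sketch of the ST1 to ST8 flow; every stage is a placeholder."""

    CHANNELS = ("speech", "telop", "background", "logo")

    def __init__(self):
        self.trained = False
        self.log = []  # records which steps ran, for inspection

    def train(self, existing_data):
        # ST1 to ST4: load existing data and run the learning steps once.
        self.log.append("ST1-ST4")
        self.trained = True

    def textualize(self, video):
        # ST6a to ST6d: one text per information channel (placeholder logic).
        return [video.get(key, "") for key in self.CHANNELS]

    def run(self, video, existing_data=None):
        if not self.trained:
            self.train(existing_data)
        self.log.append("ST5")  # input the target video signal
        texts = self.textualize(video)  # ST6a to ST6d
        integrated = " ".join(t for t in texts if t)  # ST7: integrate the texts
        return integrated[:40]  # ST8: placeholder "summary"


pipeline = SummaryPipeline()
first = pipeline.run({"speech": "hello", "telop": "news"})
second = pipeline.run({"speech": "bye"})
```

After the two calls, `pipeline.log` shows that the learning steps ran only before the first video and that the second video went straight to step ST5.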

以上のシステムは、処理装置としてのＣＰＵ(Central Processing Unit)、記憶装置としてＲＡＭ（Random Access Memory）、ＲＯＭ（Read Only Memory）、ＨＤＤ（Hard Disc Drive）、ＳＳＤ（Solid State Drive）等を備えたコンピュータシステムでアプリケーションソフトウエアを実行して実現できる。また、各部は同一ヶ所に配置される必要はなく、一部をウェブに配置してネットワークで接続して実現することができる。また、これらの処理は、多量のデータを対象とするためＧＰＵ（Graphics Processing Unit）を使用して処理することが好ましい。 The above system can be realized by executing application software on a computer system that includes a CPU (Central Processing Unit) as a processing device and a RAM (Random Access Memory), a ROM (Read Only Memory), an HDD (Hard Disc Drive), an SSD (Solid State Drive), and the like as storage devices. The units need not be located in the same place; some of them can be placed on the web and connected via a network. Because these processes handle a large amount of data, it is preferable to perform them using a GPU (Graphics Processing Unit).

すなわち、統合テキストは、単に、音声、文字,背景映像等の文字化したものであり、膨大な文章についてのデータである。このため、ＧＰＵをテキスト処理に特化することにより高速に処理できる。 That is, the integrated text is simply a transcription into text of audio, characters, background video, and the like, and constitutes data on a huge volume of text. For this reason, it can be processed at high speed by dedicating the GPU to text processing.

また、テキスト統合部５００によるテキスト入力は、発話テキスト、テロップテキスト、背景画像テキスト及びロゴマークテキストに限定されない。 Further, the text input to the text integration unit 500 is not limited to utterance text, telop text, background image text, and logo mark text.

例えば、テレビ番組（地上デジタルテレビ放送番組）を対象とする場合、電子番組表（ＥＰＧ）、字幕放送、解説放送から取得した文字や音声をテキストとして取得して入力することができる。これにより、統合テキストの質と量とを向上させるとともに、テキストの汎用性や嗜好性を向上させることができる。 For example, when a TV program (terrestrial digital TV broadcast program) is targeted, characters and audio obtained from an electronic program guide (EPG), caption broadcasting, and commentary broadcasting can be input as text. This improves the quality and quantity of the integrated text as well as the versatility and preference of the text.

同様に、インターネット映像配信を対象とする場合、第三者の評価（コメントを含む）や評判をテキストとして取得して入力できる。これにより、統合テキストの質と量とを向上させるとともに、テキストの汎用性や嗜好性を向上させることができる。 Similarly, when Internet video distribution is targeted, third-party evaluations (including comments) and reputation can be acquired and input as text. This improves the quality and quantity of the integrated text as well as the versatility and preference of the text.

なお、「重み付け」には、視聴回数（再生回数）や第三者の評価（ｇｏｏｄ・ｂａｄ）を利用して、視聴回数に対する評価割合、或いは、ｇｏｏｄ／ｂａｄの比率等を利用することも可能である。 Note that the "weighting" can also use the number of views (number of playbacks) and third-party ratings (good/bad), for example the ratio of ratings to the number of views or the good/bad ratio.
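
As a small worked example of these two quantities, the following sketch computes the share of views that produced a rating and the good/bad ratio. The function name `weighting_factors`, the decision to count both good and bad ratings in the rating share, and the handling of zero counts are assumptions made for illustration; the source does not specify how the values are combined into a final weight.

```python
def weighting_factors(views, good, bad):
    """Return (rating_rate, good_bad_ratio) for one video; names are illustrative."""
    ratings = good + bad
    # Share of views that led to any rating; 0.0 when the video has no views.
    rating_rate = ratings / views if views else 0.0
    # good / bad ratio; None when there are no "bad" ratings to divide by.
    good_bad_ratio = good / bad if bad else None
    return rating_rate, good_bad_ratio


rate, ratio = weighting_factors(views=1000, good=80, bad=20)
```

Returning `None` rather than infinity for a video without bad ratings is one possible design choice; a real system would need to decide how such videos are ranked.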

以上のように、第二の実施形態に係る映像情報提供システム１１にあっては、要約書作成手段６００により作成された要約は、重み付け付与手段１５ｄによって複数の映像コンテンツを対象として少なくとも一つ以上の所定の条件に特化した重み付けを付与されることにより、複数の映像コンテンツを対象として利用者により有用な情報を供給することができる。 As described above, in the video information providing system 11 according to the second embodiment, the summary created by the summary creation means 600 is given, by the weighting means 15d, a weight specialized to at least one predetermined condition across a plurality of video contents, so that more useful information can be supplied to the user for the plurality of video contents.

第二の実施の形態にあってはデータ処理をＡＩ（人工知能：artificial intelligence）処理により高速且つ適切に処理して要約化する。ＡＩ処理は、上述した機械学習（ＭＬ：machine learning）により実現できる。更に、機械学習として、既存データを正解とする教師有り学習が採用できる。また、機械学習としてディープラーニング（深層学習：ＤＬ：Deep Learning）により行うと効果的である。 In the second embodiment, the data are processed and summarized quickly and appropriately by AI (artificial intelligence) processing. The AI processing can be realized by the machine learning (ML) described above. Furthermore, supervised learning that treats the existing data as correct answers can be adopted as the machine learning. It is also effective to perform the machine learning by deep learning (DL).

ディープラーニングでは、既存の多数のビデオデータ、各ビデオデータに対応する各種テキストデータ、統合テキスト、要約テキストをビッグデータとして学習を行う。この、各機械学習部は、入力層、複数の中間層、出力層を備え、多数のニューロンを備えたニューラルネットワークにより処理を行う。すなわち、本発明に係る要約作成システムに入力された新規ビデオデータ、このビデオデータによる各種テキスト、統合テキスト、要約を入力とした出力が、既存の各種テキスト、統合テキスト、要約に近づくように中間層のニューロンにおける重み、パラメータを最小二乗法等の手法で適正化する。 In deep learning, a large number of existing video data and the various text data, integrated texts, and summary texts corresponding to each video data are learned as big data. Each machine learning unit performs processing with a neural network that has an input layer, a plurality of intermediate layers, and an output layer and includes a large number of neurons. That is, the weights and parameters of the neurons in the intermediate layers are optimized by a method such as the least squares method so that the output obtained from the new video data input to the summary creation system according to the present invention, and from the various texts, integrated text, and summary derived from that video data, approaches the existing various texts, integrated texts, and summaries.
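
The idea of adjusting a weight so that the output approaches an existing (correct) value in the least squares sense can be illustrated with a deliberately tiny case. The sketch below fits a single weight `w` in `y = w * x` by gradient descent on the mean squared error. The single-weight model, the learning rate, and the number of steps are assumptions chosen for illustration; the neural networks described in the source have many layers and neurons, and their actual training procedure is not specified.

```python
def fit_weight(inputs, targets, learning_rate=0.05, steps=200):
    """Fit y = w * x by minimizing the mean squared error with gradient descent."""
    w = 0.0
    n = len(inputs)
    for _ in range(steps):
        # Gradient of (1/n) * sum((w * x - y) ** 2) with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in zip(inputs, targets)) / n
        w -= learning_rate * grad
    return w


# Toy "existing data": the targets are exactly twice the inputs.
w = fit_weight([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```

With this toy data the fitted weight converges toward 2, the value that makes the squared error zero.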

本発明の権利範囲は、前記第一及び第二の実施の形態に記載された構成には限定されず、本発明の範囲内に含まれる全ての構成に及ぶものである。 The scope of rights of the present invention is not limited to the configurations described in the first and second embodiments, but covers all configurations included within the scope of the present invention.

１：映像情報提供システム
２：受信部
３：操作部
４：記憶部
５：制御回路部
５ａ：構成要素出現頻度判断手段
５ｂ：映像コンテンツ抽出手段
５Ｃ：希望映像登録手段
６：大容量記憶部
７：出力部（映像コンテンツ提供手段、出力手段）
７ａ：表示映像対象物
９：再生装置
１０：要約作成システム（要約作成手段）
１１：映像情報提供システム
１２：受信部
１３：操作部
１４：記憶部
１５：制御回路部
１５ａ：構成要件出現頻度判断手段
１５ｂ：映像コンテンツ抽出手段
１５ｃ：希望映像登録手段
１５ｄ：重み付け付与手段
１６：大容量記憶部
１７：出力部（出力手段）
１７ａ：表示画面
１８：ビデオ信号処理部
１９：再生装置
２０：ビデオ信号分離部
３０：テレビ局
４０：インターネット
１００：発話テキスト化部
１１０：発話情報抽出部
１２０：発話内容認識部
１３０：発話内容テキスト化部
１４０：機械学習部
１５０：内容認識テキスト作成設定部
１６０：比較評価部
１６１：比較部
１６２：評価部
１７０：切換部
１８０：切換部
２００：テロップテキスト化部
２１０：テロップ情報抽出部
２２０：テロップ内容認識部
２３０：テロップ内容テキスト化部
２４０：機械学習部
２５０：内容認識テキスト作成設定部
２６０：比較評価部
２６１：比較部
２６２：評価部
２７０：切換部
２８０：切換部
３００：背景画像テキスト化部
３１０：背景画像情報抽出部
３２０：背景画像内容認識部
３３０：背景画像内容テキスト化部
３４０：機械学習部
３５０：内容認識テキスト作成設定部
３６０：比較評価部
３６１：比較部
３６２：評価部
３７０：切換部
３８０：切換部
４００：ロゴマークテキスト化部
４１０：ロゴマーク画像情報抽出部
４２０：ロゴマーク内容認識部
４３０：ロゴマーク内容テキスト化部
４４０：機械学習部
４５０：内容認識テキスト作成設定部
４６０：比較評価部
４６１：比較部
４６２：評価部
４７０：切換部
４８０：切換部
５００：テキスト統合部
５１０：統合テキスト作成部
５２０：統合テキスト作成設定部
５３０：機械学習部
５４０：比較評価部
５４１：比較部
５４２：評価部
５５０：既存データ格納部
５７０：切換部
５８０：切換部
６００：要約作成部（要約作成手段）
６１０：要約テキスト作成部
６２０：要約作成設定部
６３０：機械学習部
６４０：比較評価部
６４１：比較部
６４２：評価部
６５０：既存データ格納部
６７０：切換部
６８０：切換部
７００：既存データ格納部
７１０：既存ビデオデータ格納部
７２０：既存発話テキスト格納部
７３０：既存テロップテキスト格納部
７４０：既存背景画像テキスト格納部
７５０：既存ロゴマークテキスト格納部
７６０：既存各種テキスト格納部
７７０：既存統合テキスト格納部
７８０：既存要約テキスト格納部

1: Video information providing system
2: Receiving unit
3: Operation unit
4: Storage unit
5: Control circuit unit
5a: Component appearance frequency determination unit
5b: Video content extraction unit
5C: Desired video registration unit
6: Large-capacity storage unit
7: Output unit (video content providing means, output means)
7a: Display video object
9: Playback device
10: Summary creation system (summary creation means)
11: Video information providing system
12: Receiving unit
13: Operation unit
14: Storage unit
15: Control circuit unit
15a: Component appearance frequency determining unit
15b: Video content extracting unit
15c: Desired video registration unit
15d: Weighting unit
16: Large-capacity storage unit
17: Output unit (output means)
17a: Display screen
18: Video signal processing unit
19: Playback device
20: Video signal separation unit
30: TV station
40: Internet
100: Utterance text conversion unit
110: Utterance information extraction unit
120: Utterance content recognition unit
130: Utterance content text conversion unit
140: Machine learning unit
150: Content recognition text creation setting unit
160: Comparative evaluation unit
161: Comparison unit
162: Evaluation unit
170: Switching unit
180: Switching unit
200: Telop text conversion unit
210: Telop information extraction unit
220: Telop content recognition unit
230: Telop content text conversion unit
240: Machine learning unit
250: Content recognition text creation setting unit
260: Comparative evaluation unit
261: Comparison unit
262: Evaluation unit
270: Switching unit
280: Switching unit
300: Background image text conversion unit
310: Background image information extraction unit
320: Background image content recognition unit
330: Background image content text conversion unit
340: Machine learning unit
350: Content recognition text creation setting unit
360: Comparative evaluation unit
361: Comparison unit
362: Evaluation unit
370: Switching unit
380: Switching unit
400: Logo mark text conversion unit
410: Logo mark image information extraction unit
420: Logo mark content recognition unit
430: Logo mark content text conversion unit
440: Machine learning unit
450: Content recognition text creation setting unit
460: Comparative evaluation unit
461: Comparison unit
462: Evaluation unit
470: Switching unit
480: Switching unit
500: Text integration unit
510: Integrated text creation unit
520: Integrated text creation setting unit
530: Machine learning unit
540: Comparative evaluation unit
541: Comparison unit
542: Evaluation unit
550: Existing data storage unit
570: Switching unit
580: Switching unit
600: Summary creation unit (summary creation means)
610: Summary text creation unit
620: Summary creation setting unit
630: Machine learning unit
640: Comparative evaluation unit
641: Comparison unit
642: Evaluation unit
650: Existing data storage unit
670: Switching unit
680: Switching unit
700: Existing data storage unit
710: Existing video data storage unit
720: Existing utterance text storage unit
730: Existing telop text storage unit
740: Existing background image text storage unit
750: Existing logo mark text storage unit
760: Existing various text storage unit
770: Existing integrated text storage unit
780: Existing summary text storage unit

Claims

1. A video information providing system comprising: component appearance frequency determining means for determining the appearance frequency of an identical component on the basis of the components of distributed video content; video content extracting means for extracting video content including the component when, on the basis of the determination by the component appearance frequency determining means, the appearance frequency is equal to or higher than a predetermined value; and video content providing means for providing a viewer with the video content extracted by the video content extracting means.

2. The video information providing system according to claim 1, wherein the component appearance frequency determining means determines the appearance frequency of the component based on a viewing request registered by a viewer.

3. The video information providing system according to claim 1, wherein the component is a character, sound, a video object, a character, or a background of the video object as a theme of the video content.

4. The video information providing system according to claim 1, wherein the component appearance frequency determining means determines the appearance frequency of the component by referring to a title portion displayed as characters on the video screen.

5. The video information providing system according to claim 1, wherein the video content is distributed from a server via a telecommunication line.

6. The video information providing system according to claim 1, wherein the video content is distributed from a broadcasting station via a telecommunication line.

7. The video information providing system according to claim 1, further comprising: summary creating means for creating a summary by converting into text the audio data or video data extracted from each video signal of the video content; and weighting means for, while learning the optimum condition based on the accumulated summaries created by the summary creating means, giving a plurality of video contents a weight specialized to one or more predetermined conditions included in the summary, wherein the component appearance frequency determining means determines the appearance frequency of the component based on the summary.

8. The video information providing system according to claim 7, wherein the weighting means refers to text duplicated among the audio data or video data included in the plurality of video contents and, when the reference result is equal to or greater than a predetermined value, determines that the text is important text matching the predetermined condition and gives the weighting.

9. The video information providing system according to claim 8, wherein the weighting unit determines whether the text is an important text for a plurality of video contents within a preset period.

10. The video information providing system according to any one of the above, wherein the weighting means starts recording new video content when it determines that the new video content includes the important text.

11. The video information providing system according to claim 7, wherein the weighting means calculates a CM conversion value from the appearance frequency of a specific person or thing in the text included in at least one of the audio data or the video data of the plurality of video contents.

12. The video information providing system according to claim 7, wherein the weighting means targets a specific corporate name in the text included in at least one of the audio data or the video data of the plurality of video contents.