JPWO2008050718A1

JPWO2008050718A1 - Rights information extraction device, rights information extraction method and program

Info

Publication number: JPWO2008050718A1
Application number: JP2008540979A
Authority: JP
Inventors: 亮磨大網
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2006-10-26
Filing date: 2007-10-22
Publication date: 2010-02-25
Anticipated expiration: 2027-10-22
Also published as: JP5218766B2; WO2008050718A1

Abstract

The credit recognition means 100 comprises credit information superimposition section detection means 400, which detects a credit information superimposition section (a section of content such as video in which credit information such as music titles and character names is superimposed), and credit information reading means 600, which reads the credit information from the credit information superimposition section and outputs the result as credit candidate information.

Description

The present invention relates to a rights information extraction device, a rights information extraction method, and a program, and more particularly to a rights information extraction device, method, and program that extract information on rights such as copyright and neighboring rights from content such as broadcast programs.

Conventionally, a rights management system that manages rights such as copyright attached to content is described in, for example, Patent Document 1.

The rights management system described in Patent Document 1 manages rights such as copyright centrally on a content management server and, by cooperating with a contract management server, a billing server, an authentication server, and the like, realizes automatic contracting in response to content users' requests and secure distribution of content.

However, this system assumes that rights information is registered manually by an intermediary. In other words, the rights information database currently depends on manual maintenance, and for content produced in the past to be handled by a system such as that of Patent Document 1, someone must extract the rights information and register it in the database through an intermediary or the like.

For past content, however, details of the contract information often no longer exist, so it is first necessary to establish who holds the rights to the content. Conventionally this has been done by manual checking and registration, which consumes an enormous number of man-hours. This has been one reason why high-quality content, such as dramas broadcast on television, does not reach the secondary distribution market.

Patent Document 1: JP 2002-109254 A

The first problem is that the identification of rights information related to content such as video is not automated. Consequently, when past content is to be used, rights information must be extracted manually to identify the rights holders, which takes enormous effort.

The second problem is that accuracy is insufficient when rights information is extracted with techniques such as telop (on-screen caption) recognition. Ordinary telop recognition is not specialized for rights information, so its accuracy is low. In addition, the recognized text contains much information unrelated to rights, which makes it difficult to identify the rights information.

The present invention has been made in view of the above problems, and its object is to provide a rights information extraction device and a rights information extraction method that can automatically extract information on rights from content such as video.

The present invention, which solves the above problems, is characterized by having credit information recognition means that reads credit information related to rights from content and outputs the result as credit candidate information.

The present invention, which solves the above problems, is a rights information extraction device that extracts rights information from content, characterized by having: credit information recognition means that reads credit information related to rights from the content and outputs the result as credit candidate information; object recognition means that analyzes the content, recognizes objects in the content that relate to rights, and outputs the result as object identification information; and integration means that integrates the credit candidate information and the object identification information and outputs the result as rights information.

The present invention, which solves the above problems, is a rights information extraction device that extracts rights information from content, characterized by having: credit information recognition means that reads credit information related to rights from the content and outputs the result as credit candidate information; object recognition means that refers to the credit candidate information, analyzes the content, recognizes objects in the content that relate to rights, and outputs the result as object identification information; and integration means that integrates the credit candidate information and the object identification information and outputs the result as rights information.

The present invention, which solves the above problems, is a rights information extraction method characterized by reading credit information related to rights from content and outputting the result as credit candidate information.

The present invention, which solves the above problems, is a rights information extraction method that extracts rights information from content, characterized by having: a process of reading credit information related to rights from the content and outputting the result as credit candidate information; a process of analyzing the content, recognizing objects in the content that relate to rights, and outputting the result as object identification information; and a process of integrating the credit candidate information and the object identification information and outputting the result as rights information.

The present invention, which solves the above problems, is a rights information extraction method that extracts rights information from content, characterized by having: a process of reading credit information related to rights from the content and outputting the result as credit candidate information; a process of referring to the credit candidate information, analyzing the content, recognizing objects in the content that relate to rights, and outputting the result as object identification information; and a process of integrating the credit candidate information and the object identification information and outputting the result as rights information.

The present invention, which solves the above problems, is a program characterized by causing an information processing device to execute a process of reading credit information related to rights from content and outputting the result as credit candidate information.

The present invention, which solves the above problems, is a program characterized by causing an information processing device to execute: a process of reading credit information related to rights from content and outputting the result as credit candidate information; a process of analyzing the content, recognizing objects in the content that relate to rights, and outputting the result as object identification information; and a process of integrating the credit candidate information and the object identification information and outputting the result as rights information.

The present invention, which solves the above problems, is a program characterized by causing an information processing device to execute: a process of reading credit information related to rights from content and outputting the result as credit candidate information; a process of referring to the credit candidate information, analyzing the content, recognizing objects in the content that relate to rights, and outputting the result as object identification information; and a process of integrating the credit candidate information and the object identification information and outputting the result as rights information.

Because the present invention reads credit information related to rights from content and outputs the result as credit candidate information, it can automatically and accurately extract information on the rights related to the content.

In addition, the work of extracting information on rights related to content, such as copyright, from the content can be reduced, because rights information such as copyright can be extracted from the content automatically.

FIG. 1 is a block diagram of the credit recognition means 100.
FIG. 2 is a diagram showing the configuration of the credit information superimposition section detection means 400.
FIG. 3 is a diagram showing a specific configuration of the theme song detection means 410.
FIG. 4 is a diagram showing another specific configuration of the theme song detection means 410.
FIG. 5 is a diagram showing another specific configuration of the theme song detection means 410.
FIG. 6 is a diagram showing another specific configuration of the theme song detection means 410.
FIG. 7 is a diagram showing another specific configuration of the theme song detection means 410.
FIG. 8 is a diagram showing another specific configuration of the theme song detection means 410.
FIG. 9 is a diagram showing another configuration of the credit information superimposition section detection means 400.
FIG. 10 is a diagram showing another configuration of the credit information superimposition section detection means 400.
FIG. 11 is a diagram showing another specific configuration of the credit information superimposition section detection means 400.
FIG. 12 is a diagram showing an example of the credit information reading means 600.
FIG. 13 is a diagram showing another configuration of the credit information reading means 600.
FIG. 14 is a diagram showing the configuration of the theme song background video generation means 620.
FIG. 15 is a diagram showing another configuration of the credit information reading means 600.
FIG. 16 is a diagram showing the configuration of the second embodiment of the present invention.
FIG. 17 is a diagram showing the configuration of the third embodiment of the present invention.
FIG. 18 is a diagram showing a configuration example of the music work recognition means 101.
FIG. 19 is a diagram showing another configuration of the music work recognition means 101.
FIG. 20 is a diagram showing the configuration of the music work matching means 822.
FIG. 21 is a diagram showing the configuration of the character recognition means 102.
FIG. 22 is a diagram showing another configuration of the character recognition means 102.
FIG. 23 is a diagram showing another configuration of the third embodiment of the present invention.
FIG. 24 is a diagram showing another configuration of the third embodiment of the present invention.

Explanation of symbols

100 Credit information recognition means
103 Integration means
105 Object recognition means
120 Integration means
130 Integration means
400 Credit information superimposition section detection means
410 Theme song detection means
420 Video cutout means
430 Acoustic feature extraction means
431 Acoustic feature matching means
432 Theme song acoustic feature database
435 Theme song acoustic feature database
440 Acoustic feature extraction means
441 Acoustic feature matching means
442 Visual feature extraction means
443 Visual feature matching means
445 Acoustic feature matching means
450 Continuous acoustic section extraction means
451 Theme song section determination means
452 Theme song candidate section determination means
470 Continuous telop detection means
480 Roll telop detection means
481 Selection means
600 Credit information reading means
610 Telop reading means
620 Theme song background video generation means
630 Theme song background difference video generation means
640 Telop reading means
650 Telop reading result comprehensive judgment
700 Background video generation means
710 Corresponding frame calculation means
720 Visual feature extraction means
800 Music candidate information extraction means
801 Candidate acoustic feature selection means
802 Music work matching means
803 Music acoustic feature database
820 Music-related production information extraction means
821 Music work matching parameter selection means
822 Music work matching means
823 Music work matching parameter database
900 Performer candidate information extraction means
901 Candidate person feature selection means
902 Performer matching means
903 Person feature database
920 Performer affiliation extraction means
921 Performer matching parameter selection means
922 Performer matching means
923 Person matching parameter database
950 Speech superimposition determination means
951 Acoustic feature matching means

<First Embodiment>
The first embodiment will be described.

The first embodiment describes credit information recognition means 100, which analyzes content, reads credit information, and outputs information that is a candidate for credit information.

FIG. 1 is a block diagram of the credit recognition means 100.

The credit information recognition means 100 extracts from the content a section that is likely to contain credit information (hereinafter called a credit information superimposition section). Next, it analyzes the video and audio contained in the credit information superimposition section and reads telop (on-screen caption) information and speech from the content as credit information. It then outputs the result as credit candidate information.
When reading credit information, the credit information recognition means 100 may read not only sections in which credit information is likely to be superimposed but also sections in which it is unlikely. Furthermore, the credit information that is read may include credit information that does not relate to rights in the content.

Here, content consists of video, audio, and the like, for example a television program, a radio program, or a movie. It is not necessarily limited to material that is broadcast or publicly released, and also includes material stored on a recording medium such as a DVD.

Credit information is a telop or speech, superimposed on the theme song portion, the final portion, or a similar part of the content, that gives information such as the original author, screenwriter, performers, theme song, cooperating organizations, and sponsoring companies.

The content input to the credit information recognition means 100 may be input in a compressed format such as MPEG, or may be input after it has been decoded. When compressed video is input, the credit information recognition means analyzes the video while decoding it. The program video may be the video of one particular broadcast, or the videos of multiple episodes of the same program (for example, episodes 1 through 10 of a drama) may be input together.

Furthermore, the credit candidate information may include each recognized character string together with its time information and its position information in the image (coordinates within the frame). It may also include an index of the certainty of the telop or speech recognition. For each recognized character string, the credit candidate information may contain a single result or multiple candidate character strings. When a telop cannot be read, information specifying the spatio-temporal position of that telop in the video may be included in the credit candidate information. The video information at that spatio-temporal position may itself also be included.
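The fields listed above can be pictured as a small record type. The following is only a sketch under assumptions: the class names, field names, the confidence range, and the `image_ref` pointer are hypothetical and are not defined in this document.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class TextHypothesis:
    """One candidate character string with its recognition certainty."""
    text: str
    confidence: float  # assumed here to lie in [0, 1]


@dataclass
class CreditCandidate:
    """One piece of credit candidate information (hypothetical layout)."""
    start_sec: float  # time information: where the telop or speech begins
    end_sec: float    # time information: where it ends
    # Position in the frame as (x, y, width, height); None for speech.
    bbox: Optional[Tuple[int, int, int, int]] = None
    # Zero, one, or several candidate strings for the same telop.
    hypotheses: List[TextHypothesis] = field(default_factory=list)
    # Hypothetical pointer to the raw video region, kept when nothing was read.
    image_ref: Optional[str] = None

    @property
    def is_read(self) -> bool:
        """True when at least one character string was recognized."""
        return bool(self.hypotheses)

    def best(self) -> Optional[TextHypothesis]:
        """Return the most certain hypothesis, or None if the telop was unreadable."""
        return max(self.hypotheses, key=lambda h: h.confidence, default=None)
```

An unreadable telop is represented here by an empty `hypotheses` list together with its time span and bounding box, which corresponds to outputting only the spatio-temporal position.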

Next, each component of the credit information recognition means 100 shown in FIG. 1 is described. In the following description, program video is used as the example of content.

Referring to FIG. 1, the credit information recognition means 100 consists of credit information superimposition section detection means 400 and credit information reading means 600.

The credit information superimposition section detection means 400 takes program video as input, and its output is connected to the credit information reading means 600. The credit information reading means 600 takes as input the credit information superimposition section video data output from the credit information superimposition section detection means 400, and outputs credit candidate information.

Next, the operation of the credit information recognition means 100 shown in FIG. 1 is described.

The program video is first input to the credit information superimposition section detection means 400, which identifies the credit information superimposition section using features such as visual features, acoustic features, or the appearance pattern of telops. The details of this method are described later. The video data of the identified time section is then output as credit information superimposition section video data.

The credit information superimposition section video data is input to the credit information reading means 600. The credit information reading means 600 performs telop recognition on the input video, or on video obtained by processing it, and outputs the recognition result as credit candidate information.

In this way, the credit information recognition means 100 of the present invention identifies where credit information is superimposed and concentrates recognition on that section. Compared with simply applying telop recognition to the whole video, this makes it possible to extract credit information efficiently and accurately.
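The two-stage flow of FIG. 1 can be sketched as a function that runs the reader only on the sections returned by the detector. This is an illustrative outline, not the implementation of means 400 or 600: the function names, the `Section` type, and the stub stages are all assumptions introduced for the example.

```python
from typing import Callable, Iterable, List, Tuple

# (start_sec, end_sec) of a credit information superimposition section.
Section = Tuple[float, float]


def recognize_credits(
    video: object,
    detect_sections: Callable[[object], Iterable[Section]],
    read_telops: Callable[[object, Section], List[str]],
) -> List[Tuple[Section, str]]:
    """Run the reader only on detected sections and collect (section, text) pairs."""
    candidates: List[Tuple[Section, str]] = []
    for section in detect_sections(video):      # stands in for means 400
        for text in read_telops(video, section):  # stands in for means 600
            candidates.append((section, text))
    return candidates


# Stub stages, for illustration only; real stages would analyze the video.
def fake_detector(video: object) -> List[Section]:
    return [(1500.0, 1590.0)]


def fake_reader(video: object, section: Section) -> List[str]:
    return ["Theme song", "Cast"]
```

Passing the stages in as callables keeps the sketch independent of how the section is detected, which matches the several alternative detector configurations described for means 400.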

<Specific Configuration of Credit Information Superimposition Section Detection Means 400>
1. Detecting the credit information superimposition section by focusing on the theme song
A specific configuration of the credit information superimposition section detection means 400 is described. The credit information superimposition section detection means 400 described below is a specific example that exploits the fact that credit information is often superimposed on the theme song portion of the audio in the content. The theme song is used here as one example of such audio; anything of a similar kind may be used instead.

FIG. 2 shows the configuration of the credit information superimposition section detection means 400, which consists of theme song detection means 410 and video cutout means 420.

The theme song detection means 410 takes program video as input, and its output, section designation time information, is connected to the video cutout means 420. The video cutout means 420 takes as input the program video and the section designation time information output from the theme song detection means 410, and outputs credit information superimposition section video data.

Next, the operation of the credit information superimposition section detection means 400 shown in FIG. 2 is described.

The program video is first input to the theme song detection means 410, which extracts the section of the video that contains the theme song. This is because, in video such as dramas, credit information is in most cases superimposed on the theme song portion; the time section of the theme song can therefore be regarded as the credit information superimposition section. The method of extracting the theme song time section from the program video is described in detail later. Time information identifying the extracted theme song portion is output as section designation time information.

The section designation time information is input to the video cutout means 420 together with the program video. The video cutout means 420 identifies the video data in the program video that is designated by the section designation time information and outputs it as credit information superimposition section video data. The identified video may actually be cut out of the original program video and output. Alternatively, without actually cutting it out, the means may simply obtain the information needed to jump to the start and end of the section (for example, the number of bytes from the beginning of the program) so that the identified section can be cued immediately. In that case too, the start of the designated section can be accessed immediately, so subsequent processing proceeds just as if the section had been cut out.
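The second option, obtaining jump positions instead of cutting the video, can be sketched as a lookup in a seek table. The table of (timestamp, byte offset) entries and the function name are assumptions made for this example; the document does not say how the byte positions are obtained.

```python
from bisect import bisect_right
from typing import List, Tuple


def seek_offsets(
    index: List[Tuple[float, int]], start_sec: float, end_sec: float
) -> Tuple[int, int]:
    """Map a designated time section to byte offsets without cutting the video.

    `index` is a hypothetical seek table of (timestamp_sec, byte_offset)
    entries sorted by timestamp. For each of the two times, the offset of
    the last entry at or before that time is returned; times earlier than
    the first entry fall back to the first entry.
    """
    times = [t for t, _ in index]

    def at_or_before(t: float) -> int:
        i = bisect_right(times, t) - 1
        return index[max(i, 0)][1]

    return at_or_before(start_sec), at_or_before(end_sec)
```

With a table such as `[(0.0, 0), (2.0, 1000), (4.0, 2100), (6.0, 3300)]`, a section from 2.5 s to 5.0 s maps to the offsets of the entries at 2.0 s and 4.0 s. A real decoder would normally need the entries to be positions at which decoding can start, which is an additional assumption.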

In this way, the credit information superimposition section detection means shown in FIG. 2 can obtain the credit information superimposition section accurately by detecting the theme song.

(1) Specific configuration example 1 of the theme song detection means 410
A specific configuration of the theme song detection means 410 is described.

Referring to FIG. 3, the theme song detection means 410 consists of acoustic feature extraction means 430, acoustic feature matching means 431, and a theme song acoustic feature database 432. The acoustic feature extraction means 430 takes program video as input, and its output, the acoustic features, is input to the acoustic feature matching means 431. The acoustic feature matching means 431 takes as input the acoustic features output from the acoustic feature extraction means 430 and the acoustic features from the theme song acoustic feature database 432, and outputs section designation time information.

Next, the operation of the theme song detection means 410 shown in FIG. 3 is described.

The program video is first input to the acoustic feature extraction means 430, which analyzes the audio signal of the program video and extracts acoustic features. The extracted acoustic features are output to the acoustic feature matching means 431. The acoustic feature matching means 431 matches the acoustic features of the program video input from the acoustic feature extraction means 430 against the theme song acoustic features in the theme song acoustic feature database 432. It then outputs, as section designation time information, time information (such as the start point, end point, and duration of the section) identifying the time section of the program acoustic features that matched.
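The matching step can be illustrated with a simple sliding-window search. This is a sketch under stated assumptions, not the matching method of means 431: the document does not specify the feature type or the distance measure, so the example uses one scalar feature per frame, a mean absolute difference, and a hypothetical threshold parameter.

```python
from typing import List, Optional, Tuple


def find_theme_song(
    program: List[float],
    theme: List[float],
    hop_sec: float,
    max_mean_dist: float,
) -> Optional[Tuple[float, float, float]]:
    """Locate a registered theme song feature sequence inside a program.

    `program` and `theme` hold one scalar feature per frame (an assumption),
    `hop_sec` is the time between frames, and `max_mean_dist` is a
    hypothetical acceptance threshold. Returns (start_sec, end_sec,
    duration_sec) of the best-matching window, or None if no window is
    close enough.
    """
    n, m = len(program), len(theme)
    if m == 0 or m > n:
        return None
    best_pos, best_dist = -1, float("inf")
    for pos in range(n - m + 1):
        # Mean absolute difference between the window and the theme song.
        dist = sum(abs(program[pos + k] - theme[k]) for k in range(m)) / m
        if dist < best_dist:
            best_pos, best_dist = pos, dist
    if best_dist > max_mean_dist:
        return None
    start = best_pos * hop_sec
    duration = m * hop_sec
    return start, start + duration, duration
```

The returned tuple carries the start point, end point, and duration, which are the examples of section designation time information named above. The exhaustive search costs time proportional to the product of the two sequence lengths, so a practical system would likely use a faster index.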

Here, the theme song acoustic features are acoustic features extracted from the theme song in advance. They are registered beforehand and used for theme song detection. If the theme song is known in advance, its acoustic features can be extracted from a sound source of the song (such as a CD). Alternatively, for a drama series or the like, the theme song section can be identified in the video of one episode, and the acoustic features of that portion can be used to detect the theme song in the videos of the other episodes.

Because the theme song detection means 410 described above performs matching using features registered in the theme song acoustic feature database, it can extract the theme song portion reliably.

（２）主題歌検知手段４１０の具体的構成例２ (2) Specific configuration example 2 of the theme song detection means 410
主題歌検知手段４１０の他の具体的な構成について説明する。 Another specific configuration of the theme song detection means 410 will be described.

図４を参照すると、主題歌検知手段４１０の他の具体的な構成例が示されており、音響特徴量抽出手段４４０と音響特徴量照合手段４４１とからなる。音響特徴量抽出手段４４０は、番組映像を入力とし、その出力である音響特徴量は音響特徴量照合手段４４１へ入力される。音響特徴量照合手段４４１は、音響特徴量抽出手段４４０から出力される音響特徴量を入力とし、区間指定時刻情報を出力する。 Referring to FIG. 4, another specific configuration example of the theme song detection unit 410 is shown, which includes an acoustic feature amount extraction unit 440 and an acoustic feature amount comparison unit 441. The acoustic feature quantity extraction means 440 receives the program video, and the output acoustic feature quantity is input to the acoustic feature quantity verification means 441. The acoustic feature quantity matching unit 441 receives the acoustic feature quantity output from the acoustic feature quantity extracting unit 440 and outputs section designation time information.

次に、図４に示す主題歌検知手段４１０の動作について述べる。 Next, the operation of the theme song detection means 410 shown in FIG. 4 will be described.

番組映像は、まず、音響特徴量抽出手段４４０へ入力される。ここで、番組映像は、単一の回の番組映像ではなく、複数回の番組映像をまとめて入力するものとする。例えば、シリーズもののドラマの場合には、数話分のドラマ映像がまとめて入力されるものとする。音響特徴量抽出手段４４０では、この複数回のドラマ映像それぞれに対して音響特徴量の抽出を行う。抽出された各回の音響特徴量は、音響特徴量照合手段４４１へ出力される。 The program video is first input to the acoustic feature quantity extraction means 440. Here, the program video is not a single-time program video, but a plurality of times of program video are input together. For example, in the case of a series drama, it is assumed that drama videos for several episodes are input together. The acoustic feature quantity extraction means 440 extracts an acoustic feature quantity for each of the plurality of drama videos. The extracted acoustic feature amount at each time is output to the acoustic feature amount matching unit 441.

音響特徴量照合手段４４１では、入力される複数回の番組の音響特徴量間で照合を行う。この際、照合は各回の番組全体で行うのではなく、番組から切り出される任意長の区間同士で行う。これにより、各回で音響特徴量が一致する区間が求まる。このようにして求まった区間のうち、一定区間長以上のものは、主題歌に相当する可能性が高いと考えられる。よって、上記で求まった一定区間長以上の区間を指定する時刻情報を区間指定時刻情報として出力する。あるいは、さらに区間の位置情報を用いて判定してもよい。即ち、主題歌は番組の冒頭か最後に流れる場合が多いことを利用して主題歌の区間を特定してもよい。この情報は、各回の番組に対して出力される。 The acoustic feature amount matching unit 441 performs matching between acoustic feature amounts of a plurality of input programs. At this time, collation is not performed for the entire program of each time, but between sections of an arbitrary length cut out from the program. As a result, a section in which the acoustic feature amounts coincide with each other is obtained. Of the sections obtained in this way, those having a certain section length or more are considered to be highly likely to correspond to the theme song. Therefore, the time information for designating a section longer than the predetermined section length obtained above is output as the section designation time information. Or you may determine using the positional information on an area further. That is, the section of the theme song may be specified using the fact that the theme song often flows at the beginning or end of the program. This information is output for each program.

図４に示す主題歌検知手段４１０は、主題歌が何であるかを知っていなくても、複数回の映像を用いて同じ音響のパターンを有するところを見つけることで、主題歌部分を特定できる。すなわち、主題歌特徴量を格納したデータベースが不要となる。 Even if the theme song detection means 410 shown in FIG. 4 does not know what the theme song is, the theme song portion can be specified by finding a place having the same acoustic pattern using a plurality of images. That is, a database storing theme song feature quantities is not required.

また、はじめの数回で主題歌部分を特定し、主題歌の特徴量を抽出すれば、それ以降の回では、抽出した主題歌の特徴量を用いて図３と同様にして主題歌部分を特定することもできる。これにより、複数回の動画間全体で共通部分を見つける場合に比べ、より少ない演算量で確実に主題歌部分を特定できる。 Alternatively, once the theme song part has been identified in the first few episodes and its feature quantity extracted, the theme song part in subsequent episodes can be identified in the same manner as in FIG. 3, using the extracted feature quantity. This makes it possible to identify the theme song part reliably with less computation than searching for common parts across all of the episodes.
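
The cross-episode matching of FIG. 4 can be sketched as follows. This is a simplified illustration, not the patent's method: it assumes the acoustic feature quantity of each frame has already been quantized to an integer code so that frames can be compared for equality, and it finds maximal runs of identical codes shared by two episodes with a dynamic-programming table. The name `common_sections` and the parameter `min_len` are hypothetical.

```python
from typing import List, Tuple


def common_sections(ep_a: List[int], ep_b: List[int],
                    min_len: int) -> List[Tuple[int, int, int]]:
    """Return (start_in_a, start_in_b, length) for every maximal run of
    identical feature codes shared by both episodes with length >= min_len."""
    results = []
    prev = [0] * (len(ep_b) + 1)
    for i in range(1, len(ep_a) + 1):
        cur = [0] * (len(ep_b) + 1)
        for j in range(1, len(ep_b) + 1):
            if ep_a[i - 1] == ep_b[j - 1]:
                # Extend the run that ended at the previous frame pair.
                cur[j] = prev[j - 1] + 1
                # The run is maximal if the next frame pair does not match.
                run_ends = (i == len(ep_a) or j == len(ep_b)
                            or ep_a[i] != ep_b[j])
                if run_ends and cur[j] >= min_len:
                    results.append((i - cur[j], j - cur[j], cur[j]))
        prev = cur
    return results
```

Runs of at least `min_len` frames correspond to the "sections of a certain length or more" that are likely to be the theme song. Real acoustic features rarely match exactly, so an approximate comparison would be needed in practice.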

（３）主題歌検知手段４１０の具体的構成例３ (3) Specific configuration example 3 of the theme song detection means 410
主題歌検知手段４１０の他の具体的な構成について説明する。 Another specific configuration of the theme song detection means 410 will be described.

図５を参照すると、主題歌検知手段４１０の他の具体的な構成の一例が示されており、連続音響区間抽出手段４５０と主題歌区間判定手段４５１とからなる。連続音響区間抽出手段４５０は、番組映像を入力とし、その出力である連続音響時間情報は主題歌区間判定手段４５１へ入力される。主題歌区間判定手段４５１は、連続音響区間抽出手段４５０から出力される連続音響時間情報を入力とし、区間指定時刻情報を出力する。 Referring to FIG. 5, an example of another specific configuration of the theme song detection unit 410 is shown, which includes a continuous sound segment extraction unit 450 and a theme song segment determination unit 451. The continuous sound segment extraction means 450 receives a program video as input, and the continuous sound time information as the output is input to the theme song segment determination means 451. The theme song section determination means 451 receives the continuous sound time information output from the continuous sound section extraction means 450 and outputs the section designated time information.

次に、図５に示す主題歌検知手段４１０の動作について述べる。 Next, the operation of the theme song detection means 410 shown in FIG. 5 will be described.

番組映像は、まず、連続音響区間抽出手段４５０へ入力される。ここでは、映像中の音響信号から音響の連続性（持続性）を分析する。そして、連続する音響区間がみつかった場合には、その時刻情報を連続音響時間情報として主題歌区間判定手段４５１へ出力する。 The program video is first input to the continuous sound segment extraction means 450. Here, the continuity (sustainability) of the sound is analyzed from the sound signal in the video. And when a continuous acoustic section is found, the time information is output to the theme song section determination means 451 as continuous acoustic time information.

連続音響の分析は、例えば、番組映像の音響信号のパワーから無音区間を見つけ、無音区間で挟まれる区間を連続音響区間とする方式が考えられる。この際、音響特徴量を分析して音響信号の楽曲らしさを判定し、これが高いときのみ、連続音響区間として出力するようにしてもよい。この分析には、音響データに基づいて学習したサポートベクターマシンなどの音響判別器を用いることができる。 For example, a continuous sound analysis may be performed by finding a silent section from the power of an audio signal of a program video and setting a section between the silent sections as a continuous acoustic section. At this time, it is possible to analyze the acoustic feature amount to determine the musicalness of the acoustic signal and output it as a continuous acoustic section only when it is high. For this analysis, an acoustic discriminator such as a support vector machine learned based on acoustic data can be used.

主題歌区間判定手段４５１では、入力される連続音響時間情報から主題歌に相当する時間区間を選択し、区間指定時刻情報として出力する。この際、主題歌部分は音響が長く続くこと、および、主題歌は番組のはじめか終わりに近い部分に存在することなどの条件を用いて主題歌区間を判定する。 The theme song section determination means 451 selects a time section corresponding to the theme song from the input continuous sound time information and outputs it as section designated time information. At this time, the theme song section is determined using conditions such as that the sound of the theme song portion lasts for a long time and that the theme song exists near the beginning or end of the program.

このように、本説明の主題歌検知手段４１０では、音響が連続して続く場所は主題歌やBGMの部分が多く、特に、主題歌の部分は、音響信号が長く続く（数十秒から数分）ことを利用して主題歌の部分を特定している。これにより、実際に詳細な音響解析を行わなくても簡易に主題歌部分を特定できる。 As described above, this theme song detection means 410 identifies the theme song part by exploiting the fact that places where sound continues without interruption are mostly theme songs or BGM, and in particular that the acoustic signal of a theme song lasts a long time (several tens of seconds to several minutes). This makes it possible to identify the theme song part simply, without performing detailed acoustic analysis.
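
The two steps of FIG. 5 can be sketched as follows. This is a minimal illustration under assumptions: the input is a per-frame signal power, a frame counts as silent when its power is at or below a hypothetical `silence_threshold`, and the selection rule keeps the longest section that is long enough and close to the start or end of the program. All names and default values are illustrative and do not come from the patent.

```python
from typing import List, Optional, Tuple

Section = Tuple[float, float]


def continuous_sections(power: List[float], silence_threshold: float,
                        frame_sec: float = 1.0) -> List[Section]:
    """Return the sections bounded by silence as (start_sec, end_sec)."""
    sections = []
    start = None
    for i, p in enumerate(power):
        if p > silence_threshold:
            if start is None:
                start = i  # sound begins
        elif start is not None:
            sections.append((start * frame_sec, i * frame_sec))
            start = None  # silence ends the section
    if start is not None:
        sections.append((start * frame_sec, len(power) * frame_sec))
    return sections


def pick_theme_song(sections: List[Section], program_sec: float,
                    min_len_sec: float = 30.0,
                    edge_sec: float = 300.0) -> Optional[Section]:
    """Pick the longest section that is long enough and near the start or end."""
    candidates = [s for s in sections
                  if s[1] - s[0] >= min_len_sec
                  and (s[0] <= edge_sec or s[1] >= program_sec - edge_sec)]
    return max(candidates, key=lambda s: s[1] - s[0], default=None)
```

For example, with one frame per second, `continuous_sections` turns a power sequence into candidate sections, and `pick_theme_song` applies the duration and position conditions described above.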

（４）主題歌検知手段４１０の具体的構成例４ (4) Specific configuration example 4 of the theme song detection means 410
主題歌検知手段４１０の他の具体的な構成について説明する。 Another specific configuration of the theme song detection means 410 will be described.

図６を参照すると、主題歌検知手段４１０の他の具体的な構成の一例が示されており、視覚特徴量抽出手段４４２と視覚特徴量照合手段４４３とからなる。視覚特徴量抽出手段４４２は、番組映像を入力とし、その出力である番組視覚特徴量は視覚特徴量照合手段４４３へ入力される。視覚特徴量照合手段４４３は、視覚特徴量抽出手段４４２から出力される視覚特徴量を入力とし、区間指定時刻情報を出力する。 Referring to FIG. 6, an example of another specific configuration of the theme song detection unit 410 is shown, which includes a visual feature amount extraction unit 442 and a visual feature amount comparison unit 443. The visual feature quantity extraction unit 442 receives a program video as an input, and the program visual feature quantity as an output is input to the visual feature quantity collating unit 443. The visual feature amount matching unit 443 receives the visual feature amount output from the visual feature amount extraction unit 442 and outputs the section designation time information.

次に、図６に示す主題歌検知手段４１０の動作について説明する。 Next, the operation of the theme song detection means 410 shown in FIG. 6 will be described.

番組映像は、まず、視覚特徴量抽出手段４４２へ入力される。ここで、番組映像は、図４と同様に、複数回の番組映像がまとめて入力されるものとする。視覚特徴量抽出手段４４２では、この複数回の番組映像それぞれに対して視覚特徴量の抽出を行う。抽出された各回の視覚特徴量は、視覚特徴量照合手段４４３へ出力される。 The program video is first input to the visual feature quantity extraction means 442. Here, as in the case of FIG. 4, it is assumed that the program video is input a plurality of times. The visual feature amount extraction means 442 extracts a visual feature amount for each of the plurality of program videos. The extracted visual feature amount at each time is output to the visual feature amount matching unit 443.

視覚特徴量照合手段４４３では、入力される複数回の番組の視覚特徴量間で照合を行う。この際、照合は各回の番組全体で行うのではなく、番組から切り出される任意長の区間同士で行う。これにより、各回で視覚特徴量が一致する区間が求まる。このようにして求まった区間のうち、一定区間長以上のものは、主題歌に相当する可能性が高いと考えられる。よって上記で求まった一定区間以上の区間を指定する時刻情報を区間指定時刻情報として出力する。この情報は、各回の番組に対して出力される。 The visual feature amount matching unit 443 performs matching between visual feature amounts of a plurality of input programs. At this time, collation is not performed for the entire program of each time, but between sections of an arbitrary length cut out from the program. Thereby, the section where visual feature-values correspond in each time is obtained. Of the sections obtained in this way, those having a certain section length or more are considered to be highly likely to correspond to the theme song. Therefore, the time information for designating a section that is equal to or greater than the predetermined section obtained above is output as the section designation time information. This information is output for each program.

図６に示す主題歌検知手段４１０も、図４の場合と同様に、主題歌が何であるかを知っていなくても複数回の映像を用いて同じ視覚パターンを有するところを見つけることで、主題歌部分を特定できる。 Like the configuration of FIG. 4, the theme song detection means 410 shown in FIG. 6 can identify the theme song part without knowing what the theme song is, by using the videos of multiple episodes to find the places that share the same visual pattern.

なお、視覚特徴量は、画面全体から算出するようになっていてもよいし、画面の一部分のみから抽出するようになっていてもよい。後者の場合には、主題歌背景の一部に本編映像が重ね合わせられるような場合にも対処できるようになる。 Note that the visual feature amount may be calculated from the entire screen, or may be extracted from only a part of the screen. In the latter case, it is possible to cope with the case where the main video is superimposed on a part of the theme song background.

さらに、図６の視覚特徴量による照合結果と、図４の音響特徴量による照合結果を組み合わせることも可能である。これにより、より高精度に主題歌区間を検知できるようになる。特に、背景映像は、各回によって出現順が入れ替わる場合もあるが、音響特徴量を組み合わせることで、このような場合であっても、確実に主題歌区間を特定できるようになる。また、本編の音声が主題歌と重なって音響による全区間の特定が困難な場合であっても、視覚特徴量による照合の結果を組み合わせることで、補完することが可能となる。 Furthermore, the collation result based on the visual feature quantity in FIG. 6 and the collation result based on the acoustic feature quantity in FIG. 4 can be combined. As a result, the theme song section can be detected with higher accuracy. In particular, the appearance order of the background video may be changed every time, but by combining the acoustic feature amounts, the theme song section can be reliably specified even in such a case. Further, even when the sound of the main part overlaps with the theme song and it is difficult to specify all sections by sound, it can be complemented by combining the results of collation by visual feature amount.

（５）主題歌検知手段４１０の具体的構成例５ (5) Specific configuration example 5 of the theme song detection means 410
主題歌検知手段４１０の他の具体的な構成について説明する。 Another specific configuration of the theme song detection means 410 will be described.

図７を参照すると、主題歌検知手段４１０の他の具体的な構成の一例が示されており、音響特徴量抽出手段４４０と音響特徴量照合手段４４５と音響特徴量抽出手段４３０と音響特徴量照合手段４３１と主題歌音響特徴量データベース４３５とからなる。 Referring to FIG. 7, an example of another specific configuration of the theme song detection means 410 is shown, which consists of acoustic feature quantity extraction means 440, acoustic feature quantity matching means 445, acoustic feature quantity extraction means 430, acoustic feature quantity matching means 431, and a theme song acoustic feature quantity database 435.

音響特徴量抽出手段４４０は、番組映像を入力とし、その出力である番組音響特徴量は音響特徴量照合手段４４５へ接続される。音響特徴量照合手段４４５は、音響特徴量抽出手段４４０から出力される番組音響特徴量を入力とし、主題歌音響特徴量を主題歌音響特徴量データベース４３５へ出力するとともに、区間指定時刻情報を出力する。主題歌音響特徴量データベース４３５は、音響特徴量照合手段４４５からの出力される主題歌音響特徴量を入力とし、それを音響特徴量照合手段４３１へ出力する。音響特徴量抽出手段４３０は、番組情報を入力とし、その出力である音響特徴量を音響特徴量照合手段４３１へ出力する。音響特徴量照合手段４３１は、主題歌音響特徴量データベース４３５から出力される主題歌音響特徴量と音響特徴量抽出手段４３０から出力される音響特徴量を入力とし、照合結果を出力する。 The acoustic feature quantity extraction means 440 receives the program video as input, and its output, the program acoustic feature quantity, is input to the acoustic feature quantity matching means 445. The acoustic feature quantity matching means 445 receives the program acoustic feature quantity output from the acoustic feature quantity extraction means 440, outputs the theme song acoustic feature quantity to the theme song acoustic feature quantity database 435, and also outputs section designation time information. The theme song acoustic feature quantity database 435 receives the theme song acoustic feature quantity output from the acoustic feature quantity matching means 445 and outputs it to the acoustic feature quantity matching means 431. The acoustic feature quantity extraction means 430 receives the program information as input and outputs the resulting acoustic feature quantity to the acoustic feature quantity matching means 431. The acoustic feature quantity matching means 431 receives the theme song acoustic feature quantity output from the theme song acoustic feature quantity database 435 and the acoustic feature quantity output from the acoustic feature quantity extraction means 430, and outputs a matching result.

次に、図７に示す主題歌検知手段の動作について述べる。 Next, the operation of the theme song detection means shown in FIG. 7 will be described.

番組映像は、複数の回からなる映像であるとする。音響特徴量抽出手段４４０の動作は図４の場合と同じである。音響特徴量照合手段４４５の動作も、図４の音響特徴量照合手段４４１の動作と同様であるが、さらに、検知された主題歌音響特徴量を主題歌音響特徴量データベース４３５へ出力する。主題歌音響特徴量データベース４３５は、音響特徴量照合手段４４５から出力される主題歌音響特徴量を蓄積しておき、音響特徴量照合手段４３１へ出力する。 It is assumed that the program video is a video composed of a plurality of times. The operation of the acoustic feature quantity extraction means 440 is the same as that in FIG. The operation of the acoustic feature quantity matching unit 445 is similar to the operation of the acoustic feature quantity matching unit 441 in FIG. 4, but further outputs the detected theme song acoustic feature quantity to the theme song acoustic feature database 435. The theme song acoustic feature quantity database 435 stores the theme song acoustic feature quantity output from the acoustic feature quantity collating means 445 and outputs it to the acoustic feature quantity collating means 431.

音響特徴量抽出手段４３０へは、番組映像のうち、残りの複数回の映像が入力される。音響特徴量抽出手段４３０、音響特徴量照合手段４３１の動作は、図３の場合と同様である。 Of the program video, the remaining multiple times of video are input to the acoustic feature quantity extraction means 430. The operations of the acoustic feature quantity extraction unit 430 and the acoustic feature quantity verification unit 431 are the same as those in the case of FIG.

これにより、複数回の動画間全体で共通部分を見つける場合に比べ、より少ない演算量で確実に主題歌部分を特定できる。また、図７では、音響特徴量を用いた場合の構成について述べたが、視覚特徴量や、音響特徴量と視覚特徴量を用いた場合もまったく同様にして主題歌区間を検知できる。 This makes it possible to reliably identify the theme song portion with a smaller amount of computation than in the case where a common portion is found in a plurality of moving images as a whole. In addition, although the configuration in the case where the acoustic feature amount is used is described in FIG. 7, the theme song section can be detected in exactly the same manner when the visual feature amount or the acoustic feature amount and the visual feature amount are used.

（６）主題歌検知手段４１０の具体的構成例６ (6) Specific configuration example 6 of the theme song detection means 410
次に、主題歌検知手段４１０の他の具体的な構成について説明する。 Next, another specific configuration of the theme song detection means 410 will be described.

図８を参照すると、主題歌検知手段４１０の他の具体的な構成が示されており、連続音響区間抽出手段４５０、主題歌候補区間判定手段４５２、音響特徴量抽出手段４３３、音響特徴量照合手段４３１、主題歌音響特徴量データベース４３２とからなる。連続音響区間抽出手段４５０は番組映像を入力とし、その出力である連続音響時間情報を主題歌候補区間判定手段４５２へ出力する。主題歌候補区間判定手段４５２は、連続音響区間抽出手段４５０から出力される連続音響時間情報を入力とし、その出力である主題歌候補区間時刻情報を音響特徴量抽出手段４３３へ出力する。音響特徴量抽出手段４３３は、番組映像と主題歌候補区間判定手段４５２から出力される主題歌候補区間時刻情報を入力とし、その出力である音響特徴量を音響特徴量照合手段４３１へ出力する。音響特徴量照合手段４３１は、音響特徴量抽出手段４３３から出力される音響特徴量と主題歌音響特徴量データベース４３２から出力される主題歌音響特徴量を入力とし、区間指定時刻情報を出力する。 Referring to FIG. 8, another specific configuration of the theme song detection means 410 is shown, which consists of continuous sound section extraction means 450, theme song candidate section determination means 452, acoustic feature quantity extraction means 433, acoustic feature quantity matching means 431, and a theme song acoustic feature quantity database 432. The continuous sound section extraction means 450 receives the program video as input and outputs continuous sound time information to the theme song candidate section determination means 452. The theme song candidate section determination means 452 receives the continuous sound time information output from the continuous sound section extraction means 450 and outputs theme song candidate section time information to the acoustic feature quantity extraction means 433. The acoustic feature quantity extraction means 433 receives the program video and the theme song candidate section time information output from the theme song candidate section determination means 452, and outputs the resulting acoustic feature quantity to the acoustic feature quantity matching means 431. The acoustic feature quantity matching means 431 receives the acoustic feature quantity output from the acoustic feature quantity extraction means 433 and the theme song acoustic feature quantity output from the theme song acoustic feature quantity database 432, and outputs section designation time information.

次に、図８に示す主題歌検知手段４１０の動作について説明する。 Next, the operation of the theme song detection means 410 shown in FIG. 8 will be described.

番組映像は、連続音響区間抽出手段４５０へ入力される。連続音響区間抽出手段４５０の動作は、図５の場合と同様であり、求まった連続音響時間情報を主題歌候補区間判定手段４５２へ出力する。 The program video is input to continuous sound segment extraction means 450. The operation of the continuous sound section extraction unit 450 is the same as that in FIG. 5, and the obtained continuous sound time information is output to the theme song candidate section determination unit 452.

主題歌候補区間判定手段４５２の動作も基本的には、図５の主題歌区間判定手段４５１と同様であるが、ここでは、完全に主題歌区間を特定する必要はなく、候補となる区間を抽出するのみでよいため、図５の場合よりもゆるい判定基準を用いてもよい。求まった主題歌候補区間時刻情報は音響特徴量抽出手段４３３へ出力される。 The operation of the theme song candidate section determination means 452 is basically the same as that of the theme song section determination means 451 in FIG. 5, but here it is not necessary to completely specify the theme song section, and the candidate section is selected. Since it only needs to be extracted, a criterion that is looser than in the case of FIG. 5 may be used. The obtained theme song candidate section time information is output to the acoustic feature quantity extraction means 433.

音響特徴量抽出手段４３３へは、番組映像も入力され、音響特徴量を抽出する。ただし、ここでは、主題歌候補区間時刻情報で指定された区間に対してのみ音響特徴量を抽出する。抽出された音響特徴量は、音響特徴量照合手段４３１へ出力される。 A program video is also input to the acoustic feature amount extraction means 433, and an acoustic feature amount is extracted. However, here, the acoustic feature quantity is extracted only for the section specified by the theme song candidate section time information. The extracted acoustic feature amount is output to the acoustic feature amount matching unit 431.

音響特徴量照合手段４３１、主題歌音響特徴量データベース４３２の動作は、図３の場合と同様である。 The operations of the acoustic feature quantity matching means 431 and the theme song acoustic feature quantity database 432 are the same as in the case of FIG.

図８に示す主題歌検知手段４１０では、主題歌候補区間に対してのみ音響特徴量を抽出・照合するため、番組全体に対して特徴量抽出を行う場合に比べ、処理量を軽減できる。なお、このような絞込みは、図４、図６、図７などに示す主題歌検知手段４１０に対しても適用可能であり、処理量の低減が図れる。 In the theme song detection means 410 shown in FIG. 8, since the acoustic feature quantity is extracted and collated only for the theme song candidate section, the processing amount can be reduced as compared with the case where the feature quantity is extracted for the entire program. Such narrowing down can also be applied to the theme song detection means 410 shown in FIGS. 4, 6, 7, etc., and the amount of processing can be reduced.

２．テロップが連続的に出現するという特性に着目してクレジット情報重畳区間を検出 2. Detecting a credit information superimposing section by paying attention to the characteristic that telops appear continuously
クレジット情報重畳区間検出手段４００の具体的な他の構成について説明する。以下に説明するクレジット情報重畳区間検出手段４００は、コンテンツでは、クレジット情報が重畳されているテロップは連続的に出現するという特性を利用した具体例である。 Another specific configuration of the credit information superimposing section detecting means 400 will be described. The credit information superimposing section detecting means 400 described below is a specific example that uses the characteristic that, in content, telops on which credit information is superimposed appear continuously.

図９を参照するとクレジット情報重畳区間検出手段４００の実施の形態の一例が示されており、連続テロップ検知手段４７０と映像切り出し手段４２０とからなる。連続テロップ検知手段４７０は、番組映像を入力とし、その出力である区間指定時刻情報は映像切り出し手段４２０へ接続される。映像切り出し手段４２０は、番組映像と連続テロップ検知手段４７０から出力される区間指定時刻情報を入力とし、クレジット情報重畳区間映像データを出力する。 Referring to FIG. 9, an example of an embodiment of the credit information superimposing section detecting means 400 is shown, and it comprises a continuous telop detecting means 470 and a video clipping means 420. The continuous telop detection means 470 receives a program video as input, and the section designation time information as the output is connected to the video cutout means 420. The video cutout unit 420 receives the program video and the section designation time information output from the continuous telop detection unit 470, and outputs credit information superimposed section video data.

次に、図９に示すクレジット情報重畳区間検出手段４００の動作について述べる。 Next, the operation of the credit information superimposition section detecting means 400 shown in FIG. 9 will be described.
番組映像は、まず、連続テロップ検知手段４７０へ入力される。連続テロップ検知手段４７０では、テロップが連続して現れる区間を抽出する。これは、ドラマやバラエティ番組などで、クレジット情報がテロップとして重畳される区間では、テロップが連続的に出現するという特性に基づく。そして、この時間区間を区間指定時刻情報として出力する。 The program video is first input to the continuous telop detection means 470. The continuous telop detection means 470 extracts sections where telops appear continuously. This is based on the characteristic that telops appear continuously in sections where credit information is superimposed as telops in drama and variety programs. Then, this time section is output as section designation time information.
具体的には、番組映像に対してテロップ検出を行い、テロップが検出できた場合には、その開始時刻と終了時刻を求める処理を繰り返す。次に、開始時刻と終了時刻を解析し、複数のテロップがほとんど間を空けずに次々と出現する時間区間を求める。あるいは、異なるテロップ間の時間間隔を解析するかわりに、１画面中のテロップ占有面積を求め、ある一定領域以上の占有面積が断続的に続く区間として、区間指定時刻情報を求めてもよい。区間指定時刻情報は、番組映像とともに、映像切り出し手段４２０へ入力される。映像切り出し手段４２０の動作は、図２の場合と同様である。 Specifically, telop detection is performed on the program video, and whenever a telop is detected, the process of obtaining its start time and end time is repeated. Next, the start times and end times are analyzed, and a time section in which a plurality of telops appear one after another with almost no gap is obtained. Alternatively, instead of analyzing the time interval between different telops, the telop occupation area in one screen may be obtained, and the section designation time information may be obtained as a section in which an occupation area of a certain size or more continues intermittently. The section designation time information is input to the video cutout means 420 together with the program video. The operation of the video cutout means 420 is the same as in the case of FIG. 2.

このように図９に示すクレジット情報重畳区間検出手段は、音響特徴用の解析などの複雑な処理を行わなくても、テロップ出現のパターン情報のみを用いてクレジット情報重畳区間を求めることを可能にする。特に、静止テロップでクレジット情報が表示される番組に対して有効である。 As described above, the credit information superimposing section detection means shown in FIG. 9 makes it possible to obtain the credit information superimposing section using only the pattern of telop appearances, without complicated processing such as acoustic feature analysis. This is particularly effective for programs in which credit information is displayed as stationary telops.
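
The gap analysis performed by the continuous telop detection means 470 can be sketched as follows. This is a minimal illustration under assumptions: telop detection itself is taken as given and supplies a list of (start, end) times in seconds, telops separated by at most a hypothetical `max_gap_sec` are grouped into one run, and a run counts as a credit section only if it contains at least `min_count` telops. None of these names or values come from the patent.

```python
from typing import List, Tuple

Section = Tuple[float, float]


def continuous_telop_sections(telops: List[Section],
                              max_gap_sec: float = 1.0,
                              min_count: int = 3) -> List[Section]:
    """Group telops that follow each other with almost no gap and return
    the time sections covered by groups of at least min_count telops."""
    sections = []
    run_start = run_end = 0.0
    count = 0
    for start, end in sorted(telops):
        if count and start - run_end > max_gap_sec:
            # The gap is too long: close the current run.
            if count >= min_count:
                sections.append((run_start, run_end))
            count = 0
        if count == 0:
            run_start, run_end = start, end
        else:
            run_end = max(run_end, end)
        count += 1
    if count >= min_count:
        sections.append((run_start, run_end))
    return sections
```

Isolated telops, such as a single caption in the main part of the program, are discarded because their run never reaches `min_count`.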

３．ロールテロップ上にクレジット情報が連続的に出現するという特性に着目してクレジット情報重畳区間を検出 3. Detecting a credit information superimposition section by paying attention to the characteristic that credit information appears continuously on a roll telop
図１０を参照するとクレジット情報重畳区間検出手段４００の他の例が示されており、ロールテロップ検知手段４８０と映像切り出し手段４２０とからなる。ロールテロップ検知手段４８０は、番組映像を入力とし、その出力である区間指定時刻情報は映像切り出し手段４２０へ接続される。映像切り出し手段４２０は、番組映像とロールテロップ検知手段４８０から出力される区間指定時刻情報を入力とし、クレジット情報重畳区間映像データを出力する。 Referring to FIG. 10, another example of the credit information superimposition section detection means 400 is shown, which consists of roll telop detection means 480 and video cutout means 420. The roll telop detection means 480 receives the program video as input, and its output, the section designation time information, is input to the video cutout means 420. The video cutout means 420 receives the program video and the section designation time information output from the roll telop detection means 480, and outputs credit information superimposed section video data.

次に、図１０に示すクレジット情報重畳区間検出手段４００の動作について述べる。 Next, the operation of the credit information superimposition section detecting means 400 shown in FIG. 10 will be described.

番組映像は、まず、ロールテロップ検知手段４８０へ入力される。ロールテロップ検知手段４８０では、水平方向、あるいは垂直方向にスクロールするロールテロップを検知し、ロールテロップの存在する区間を区間指定時刻情報として出力する。これは、ドラマやバラエティ番組などで、クレジット情報が水平方向、あるいは垂直方向にスクロールしながら表示される場合が多いことに基づく。 The program video is first input to the roll telop detection means 480. The roll telop detection unit 480 detects a roll telop that scrolls in the horizontal direction or the vertical direction, and outputs a section where the roll telop exists as section designation time information. This is based on the fact that credit information is often displayed while scrolling horizontally or vertically in a drama or a variety program.

このタイプのクレジットは、たいてい番組の最後であるため、エンドロールと呼ばれることもある。このため、ロールテロップを検知する際、時刻情報も併用し、映像の終わりに近い部分に対してロールテロップ検知を行うようになっていてもよい。これにより、番組映像全体に対してロールテロップ検知を行う場合に比べ、処理量を大幅に低減できる。 Because this type of credit usually appears at the end of a program, it is sometimes called an end roll. For this reason, when detecting a roll telop, time information may also be used so that roll telop detection is performed only on the portion near the end of the video. Compared with performing roll telop detection on the entire program video, this can greatly reduce the processing amount.

具体的なロールテロップの検知方法としては、フレーム間で動き推定を行い、水平または垂直方向に等速直線運動を行っている領域を探す。そして、この等速直線運動が一定の時間間隔続く場合にロールテロップとして検知する。動き推定には、例えばブロックマッチングや、一般化ハフ変換を用いることができる。 As a specific roll telop detection method, motion estimation is performed between frames to search for a region in which a uniform linear motion is performed in the horizontal or vertical direction. And when this constant velocity linear motion continues for a fixed time interval, it detects as a roll telop. For motion estimation, for example, block matching or generalized Hough transform can be used.

求まった区間指定時刻情報は、番組映像とともに、映像切り出し手段４２０へ入力される。映像切り出し手段４２０の動作は、図２の場合と同様である。 The obtained section designation time information is input to the video cutout means 420 together with the program video. The operation of the video cutout means 420 is the same as in the case of FIG.

このように、図１０に示すクレジット情報重畳区間検出手段は、ロールテロップを検知することで、音響信号を用いずとも、クレジット重畳区間を検知できる。これは、映画やドラマなど、コンテンツの最後でクレジット情報が縦や横方向にスクロールしていく場合に特に有効である。また、バラエティなど、主題歌がなく、音響情報が使えない場合であっても、ロールテロップを検知することで、クレジット重畳区間を求めることができる。 Thus, the credit information superimposition section detection means shown in FIG. 10 can detect the credit superimposition section without using the acoustic signal by detecting the roll telop. This is particularly effective when the credit information scrolls vertically or horizontally at the end of the content, such as a movie or drama. Further, even when there is no theme song such as variety and acoustic information cannot be used, a credit superimposition section can be obtained by detecting a roll telop.
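
The decision rule of the roll telop detection means 480 can be sketched as follows. This is a simplified illustration, not the patent's implementation: the motion estimation itself (for example block matching) is assumed to have already produced one displacement (dx, dy) in pixels per frame for a candidate region, and the sketch only checks for horizontal or vertical motion at constant speed lasting at least `min_frames`. The names `roll_telop_sections`, `min_frames` and `tol` are hypothetical.

```python
from typing import List, Tuple


def roll_telop_sections(motions: List[Tuple[float, float]],
                        min_frames: int = 5,
                        tol: float = 0.5) -> List[Tuple[int, int]]:
    """Return (start_frame, end_frame) ranges, end exclusive, in which the
    region moves along exactly one axis at a constant speed."""
    sections = []
    start = None
    for i, (dx, dy) in enumerate(motions):
        # Exactly one of the two components is (almost) zero.
        axis_aligned = (abs(dx) <= tol) != (abs(dy) <= tol)
        same_speed = (start is not None
                      and abs(dx - motions[start][0]) <= tol
                      and abs(dy - motions[start][1]) <= tol)
        if axis_aligned and (start is None or same_speed):
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_frames:
                sections.append((start, i))
            # A new run may begin at this frame if it scrolls differently.
            start = i if axis_aligned else None
    if start is not None and len(motions) - start >= min_frames:
        sections.append((start, len(motions)))
    return sections
```

Multiplying the frame indices by the frame period gives the section designation time information passed to the video cutout means 420.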

４．上述した構成の組み合わせによりクレジット情報重畳区間を検出 4. Detection of the credit information superimposition section by a combination of the above-described configurations
次に、クレジット情報重畳区間検出手段４００の他の具体的な構成について説明する。図１１を参照すると、クレジット情報重畳区間検出手段４００の他の具体的な構成が示されており、主題歌検知手段４１０、ロールテロップ検知手段４８０、連続テロップ検知手段４７０、選択手段４８１、映像切り出し手段４２０とからなる。主題歌検知手段４１０、ロールテロップ検知手段４８０、連続テロップ検知手段４７０は、すべて、番組映像を入力とし、区間指定時刻情報を選択手段４８１へ出力する。選択手段４８１は、主題歌検知手段４１０から出力される区間指定時刻情報と、ロールテロップ検知手段４８０から出力される区間指定時刻情報と、連続テロップ検知手段４７０から出力される区間指定時刻情報とを入力とし、区間指定時刻情報を映像切り出し手段４２０へ出力する。映像切り出し手段４２０は、番組映像と選択手段４８１から出力される区間指定時刻情報とを入力とし、クレジット情報重畳区間映像データを出力する。 Next, another specific configuration of the credit information superimposition section detecting means 400 will be described. Referring to FIG. 11, another specific configuration of the credit information superimposition section detecting means 400 is shown, which consists of theme song detection means 410, roll telop detection means 480, continuous telop detection means 470, selection means 481, and video cutout means 420. The theme song detection means 410, the roll telop detection means 480, and the continuous telop detection means 470 all receive the program video as input and output section designation time information to the selection means 481. The selection means 481 receives the section designation time information output from the theme song detection means 410, the section designation time information output from the roll telop detection means 480, and the section designation time information output from the continuous telop detection means 470, and outputs section designation time information to the video cutout means 420. The video cutout means 420 receives the program video and the section designation time information output from the selection means 481, and outputs credit information superimposed section video data.

次に、図１１に示すクレジット情報重畳区間検出手段４００の動作について説明する。番組映像は、主題歌検知手段４１０、ロールテロップ検知手段４８０、連続テロップ検知手段４７０へ入力される。主題歌検知手段４１０、ロールテロップ検知手段４８０、連続テロップ検知手段４７０の動作は、前述のものと同様である。これらから出力される区間指定時刻情報は選択手段４８１へ入力される。選択手段４８１では、入力される区間指定時刻情報のうち、確からしいものを選択して出力する。もし、入力のうち、どれか1つのみしか区間指定時刻情報が入力されない場合には、その区間指定時刻情報を出力する。一方、複数の区間指定時刻情報が重なる場合（例えば、主題歌中にロールテロップが現れる場合など）には、重複する区間指定時刻情報を出力する。ただし、各検知手段で部分的にしか検知できない場合もあるため、全体のORをとるようにして区間指定時刻情報を求めてもよい。求まった区間指定時刻情報は、映像切り出し手段４２０へ出力される。 Next, the operation of the credit information superimposition section detecting means 400 shown in FIG. 11 will be described. The program video is input to the theme song detection means 410, the roll telop detection means 480, and the continuous telop detection means 470. The operations of the theme song detection means 410, the roll telop detection means 480, and the continuous telop detection means 470 are the same as those described above. The section designation time information output from these is input to the selection means 481. The selection means 481 selects and outputs a probable piece of input section designation time information. If only one of the inputs is input with the section specified time information, the section specified time information is output. On the other hand, when a plurality of section designation time information overlaps (for example, when a roll telop appears in the theme song), overlapping section designation time information is output. However, since each detection means may be able to detect only partly, the section designation time information may be obtained by taking the entire OR. The obtained section designation time information is output to the video cutout means 420.

The operation of the video cutout means 420 is the same as in the case of FIG. 7.

The credit information superimposition section detection means of FIG. 11 has the advantage that it can adapt to various credit appearance patterns. In addition, using multiple sources improves the detection accuracy of the credit information superimposed section.

<Specific configuration of the credit information reading means 600>
(1) Specific configuration example 1 of the credit information reading means 600
Next, an example of a specific configuration of the credit information reading means 600 will be described.

FIG. 12 shows an example of the credit information reading means 600, which consists of telop reading means 610. The telop reading means 610 receives the credit information superimposed section video data as input and outputs credit candidate information.

The operation of the credit information reading means 600 of FIG. 12 is as follows.

The credit information superimposed section video data is input to the telop reading means 610. The telop reading means 610 performs telop recognition on the input video and outputs the recognition result as credit candidate information. The recognition rate can be improved by customizing the telop recognition for credits. For example, a dictionary trained with emphasis on important words that appear frequently in credit information, such as "screenplay" (脚本) and "theme song" (主題歌), can be used. Alternatively, such specific words can be registered in advance and the recognizer can test whether each of them has appeared, which allows the words to be extracted more accurately. These specific words may also be learned for multiple fonts; when a character string appears, the font is estimated, and a telop recognition dictionary customized for that font is selected and used to read the rest of the credit information. In addition, names of people who may appear in credits can be registered in a database by attribute (for example, occupation, such as screenwriter or actor). For the screenplay credit, for instance, the name is looked up in the screenwriter database, which can greatly improve the accuracy of name identification. With this name database, candidates can be narrowed down efficiently even when part of a name cannot be read. The order and pattern in which credit information appears also show some regularity (for example, screenwriter and original-author credits tend to appear earlier than cast credits, and are often displayed alone), so reflecting this information in telop identification can improve accuracy further. Hereinafter, the parameters used for such telop recognition are referred to as telop recognition parameters.
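The narrowing of name candidates with an attribute-keyed name database can be sketched as follows. This is only an illustration under stated assumptions: the database contents are placeholder names, the unreadable-character marker `?` and the function name `name_candidates` are hypothetical, and a real telop recognizer would typically supply per-character confidence scores rather than a hard "unreadable" marker.

```python
from typing import Dict, List

# Hypothetical attribute-keyed name database (placeholder entries only).
NAME_DB: Dict[str, List[str]] = {
    "screenplay": ["Hanako Yamada", "Taro Suzuki"],
    "cast": ["Ichiro Sato", "Taro Suzuki", "Jiro Tanaka"],
}

UNREADABLE = "?"  # assumed marker for a character the recognizer could not read


def name_candidates(role: str, partial: str) -> List[str]:
    """Return the names registered for `role` that fit a partially read name."""
    def fits(name: str) -> bool:
        # Same length, and every readable character agrees with the name.
        return len(name) == len(partial) and all(
            p == UNREADABLE or p == c for p, c in zip(partial, name)
        )
    return [name for name in NAME_DB.get(role, []) if fits(name)]
```

For example, `name_candidates("screenplay", "T?ro Suz?ki")` searches only the screenwriter list, so two unreadable characters still leave a single candidate.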

Because this credit information reading means 600 consists only of telop reading means, it can be configured simply. Moreover, because the input to the credit information reading means 600 is credit information superimposed section video data, that is, video on which telops are superimposed, unnecessary processing is avoided and the processing load is lower than when telops are read over the entire program. In other words, only the portion on which telops are superimposed needs to be analyzed, so it can be analyzed in more detail and more efficiently than the whole program. The reading algorithm can therefore be specialized for telop reading, which improves the reading accuracy of the credit information.

(2) Specific configuration example 2 of the credit information reading means 600
Another specific configuration of the credit information reading means 600 will be described. In this example, credit information is read by focusing on the video during which the theme song is playing, the theme song being one of the sounds included in the content.

FIG. 13 shows an example of an embodiment of the credit information reading means 600, which comprises theme song background video generation means 620, theme song background difference video generation means 630, and telop reading means 640.

The theme song background video generation means 620 receives the credit information superimposed section video data as input and outputs the theme song background video to the theme song background difference video generation means 630. The theme song background difference video generation means 630 receives the credit information superimposed section video data and the theme song background video output from the theme song background video generation means 620, and outputs the theme song background difference video to the telop reading means 640. The telop reading means 640 receives the theme song background difference video output from the theme song background difference video generation means 630 and outputs credit candidate information.

Next, the operation of the credit information reading means 600 of FIG. 13 will be described.
First, the credit information superimposed section video data is input to the theme song background video generation means 620. Here, the credit information superimposed section video data is assumed to contain video from multiple episodes.

The theme song background video generation means 620 associates frames whose background (the portion other than the credit information) is the same across the episodes. Image processing is performed on the associated frames to create the background video of the theme song, which is output to the theme song background difference video generation means 630. Details of the image processing are described later.

The theme song background difference video generation means 630 computes the difference between the input theme song background video and the credit information superimposed section video data, and generates the theme song background difference video from this difference. Specifically, pixels with a large difference keep their original value, and all other pixels are set to 0. This yields a theme song background difference video in which only the credit portion remains. The theme song background difference video is output to the telop reading means 640.
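The per-pixel rule of the theme song background difference video generation means 630 can be sketched for a single frame as follows. This is a minimal sketch assuming grayscale frames stored as nested lists; the type alias `Frame`, the function name `background_difference`, and the default threshold of 30 are illustrative assumptions, since the text does not state what counts as a "large" difference.

```python
from typing import List

Frame = List[List[int]]  # grayscale frame as rows of pixel values (assumed layout)


def background_difference(frame: Frame, background: Frame,
                          threshold: int = 30) -> Frame:
    """Keep pixels that differ strongly from the background; zero the rest."""
    return [
        [p if abs(p - b) > threshold else 0 for p, b in zip(frame_row, bg_row)]
        for frame_row, bg_row in zip(frame, background)
    ]
```

Pixels that belong to a superimposed credit differ from the generated background and survive with their original value, while background pixels are cleared before telop recognition.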

The telop reading means 640 performs telop recognition on the input video and outputs the recognition result as credit candidate information.

The credit information reading means 600 of FIG. 13 removes the influence of the background from telop recognition, which improves reading accuracy.

The theme song background video generation means 620 will now be described.

FIG. 14 shows an example of an embodiment of the theme song background video generation means 620, which comprises visual feature extraction means 720, corresponding frame calculation means 710, and background video generation means 700. The visual feature extraction means 720 receives the credit information superimposed section video data as input and outputs theme song background visual features to the corresponding frame calculation means 710. The corresponding frame calculation means 710 receives the theme song background visual features output from the visual feature extraction means 720 and outputs frame correspondence information to the background video generation means 700. The background video generation means 700 receives the credit information superimposed section video data and the frame correspondence information output from the corresponding frame calculation means 710, and outputs the theme song background video.

Next, the operation of the theme song background video generation means 620 of FIG. 14 will be described.

First, the credit information superimposed section video data is input to the visual feature extraction means 720. Here, the credit information superimposed section video data consists of credit information superimposed section video from multiple episodes. For a drama series, for example, the video for several episodes is input together. The visual feature extraction means 720 extracts visual features from the video of each episode. The extracted visual features are output to the corresponding frame calculation means 710 as theme song background visual features.

The corresponding frame calculation means 710 collates the input visual features of the episodes against one another. The collation is performed not on each episode's features as a whole but between sections of arbitrary length cut out of each episode's credit information superimposed section video. This identifies the sections whose visual features match across episodes. Once the sections are identified, the correspondence between individual video frames follows. Because the section alignment may be off by a few frames in either direction, a mechanism to compensate for this shift may be added. For example, the inter-frame difference can be computed against the frames just before and after the nominally corresponding frame and the frame with the smallest difference chosen, or the frame with the largest number of matching pixels can be chosen. The frame-to-frame correspondence obtained in this way for each episode is output to the background video generation means 700 as frame correspondence information.
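The shift compensation described above (choosing, among the neighbours of the nominally corresponding frame, the one with the smallest inter-frame difference) can be sketched as follows. The frame representation, the function names, and the search radius of two frames are assumptions made for illustration; the sum of absolute differences is one simple choice of inter-frame difference.

```python
from typing import List

Frame = List[List[int]]  # grayscale frame as rows of pixel values (assumed layout)


def frame_difference(a: Frame, b: Frame) -> int:
    """Sum of absolute pixel differences between two equally sized frames."""
    return sum(abs(pa - pb)
               for row_a, row_b in zip(a, b)
               for pa, pb in zip(row_a, row_b))


def refine_correspondence(reference: Frame, other: List[Frame],
                          nominal: int, search: int = 2) -> int:
    """Return the index near `nominal` in `other` that best matches `reference`."""
    lo = max(0, nominal - search)
    hi = min(len(other) - 1, nominal + search)
    return min(range(lo, hi + 1),
               key=lambda k: frame_difference(reference, other[k]))
```

Because credits overlaid on the frames also contribute to the difference, a practical version might restrict the comparison to regions unlikely to contain telops.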

The background video generation means 700 generates the theme song background video from the input credit information superimposed section video data of each episode and the frame correspondence information output from the corresponding frame calculation means 710. The background is generated by applying statistical processing to the pixel values at corresponding positions in the frames of each episode that the frame correspondence information associates with one another.

Next, the details of this algorithm will be described. Let $F_{n,m}(i,j)$ be the pixel value at position $(i,j)$ of the $m$-th frame of the $n$-th video. Let $N$ be the number of input programs, and let the $m_n$-th frame of the $n$-th video be the frame that corresponds to the $m$-th frame of the background. Let $B_m(i,j)$ denote the pixel value at position $(i,j)$ of the $m$-th frame of the background video to be generated. Then $B_m(i,j)$ is calculated from $F_{n,m_n}(i,j)$ $(n = 1, \ldots, N)$.
First, for each pixel $(i,j)$, the variance $\sigma(i,j)$ of $F_{n,m_n}(i,j)$ $(n = 1, \ldots, N)$ is obtained. If it is sufficiently small, no telop is considered to be present at this position in any episode, so $B_m(i,j)$ can be calculated as the simple average:

$$B_m(i,j) = \frac{1}{N}\sum_{n=1}^{N} F_{n,m_n}(i,j)$$

If, on the other hand, the variance $\sigma(i,j)$ is large, a telop is likely to be superimposed at this position. Taking the simple average in this case would let the telop leak into the background, and the background video would not be generated properly. Therefore, when the variance $\sigma(i,j)$ is large, the median of $F_{n,m_n}(i,j)$ $(n = 1, \ldots, N)$ is used as $B_m(i,j)$, for example. As long as the episodes with a telop at this position are in the minority, this removes the influence of the telop from the generated background image.
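The mean-or-median rule can be sketched as follows for frames that have already been aligned across episodes. This is a sketch under stated assumptions: frames are nested lists of grayscale values, the variance threshold of 100 is an arbitrary illustrative value (the text only says "sufficiently small"), and the function names are hypothetical.

```python
from statistics import mean, median, pvariance
from typing import List

Frame = List[List[float]]  # grayscale frame as rows of pixel values (assumed layout)


def background_pixel(values: List[float], var_threshold: float = 100.0) -> float:
    """Combine the values of one pixel position across the N aligned frames."""
    if pvariance(values) <= var_threshold:
        # Low variance: no telop in any episode, so the plain average is safe.
        return mean(values)
    # High variance: some episodes carry a telop here; the median ignores
    # them as long as they are in the minority.
    return median(values)


def background_frame(aligned: List[Frame], var_threshold: float = 100.0) -> Frame:
    """Apply background_pixel at every position of N aligned frames."""
    height, width = len(aligned[0]), len(aligned[0][0])
    return [
        [background_pixel([f[i][j] for f in aligned], var_threshold)
         for j in range(width)]
        for i in range(height)
    ]
```

With five episodes of which one shows a bright telop pixel, the median keeps the background value, whereas the mean would be pulled toward the telop.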

However, there may also be positions $(i,j)$ where the telop is present in more episodes than not. For such cases, an index is defined for each episode's frame $F_{n,m_n}(i,j)$ $(n = 1, \ldots, N)$ that expresses how likely the target pixel $(i,j)$ is to belong to a telop, and a weighted average is taken in which a larger index gives a smaller weight. Pixel values from episodes without a telop then receive larger weights, which reduces the influence of telops on the background video.

Let $R_{n,m_n}(i,j)$ denote this index of telop-likeness. $R_{n,m_n}(i,j)$ takes a non-negative value, and a larger value means the pixel is more likely to be part of a telop. Using this index, the pixel value $B_m(i,j)$ of the background video is calculated by the following equation:

$$B_m(i,j) = \frac{\sum_{n=1}^{N} g\bigl(R_{n,m_n}(i,j)\bigr)\,F_{n,m_n}(i,j)}{\sum_{n=1}^{N} g\bigl(R_{n,m_n}(i,j)\bigr)}$$

Here, $g(x)$ is a monotonically decreasing function of $x$ that returns a non-negative value. In this way, a background video that is little affected by telops can be generated. As $R_{n,m_n}(i,j)$, for example, the number of edges or the magnitude of the gradient near position $(i,j)$ can be used. Alternatively, telop-likeness may be judged with a classifier, such as a neural network, trained on telop-like patterns.
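The weighted average driven by the telop-likeness index can be sketched as follows for a single pixel position. The choice $g(x) = 1/(1+x)$ is only one possible non-negative, monotonically decreasing function and is an assumption of this sketch, as are the function names; the telop-likeness scores are taken as given rather than computed from edges or a classifier.

```python
from typing import List


def g(x: float) -> float:
    """Assumed weight function: non-negative and monotonically decreasing."""
    return 1.0 / (1.0 + x)


def weighted_background_pixel(values: List[float],
                              telop_scores: List[float]) -> float:
    """Weighted average of one pixel position across the N aligned frames.

    values[n] is the pixel value in episode n and telop_scores[n] is its
    non-negative telop-likeness index; a higher score means a lower weight.
    """
    weights = [g(r) for r in telop_scores]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)
```

When all scores are equal the result reduces to the plain average, and an episode with a high telop-likeness score contributes only weakly to the background value.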

Using the background video obtained in this way makes it possible to generate video consisting only of the credit portion, free of background influence, which helps improve the accuracy of the telop recognition performed in the subsequent stage.

(3) Specific configuration example 3 of the credit information reading means 600
FIG. 15 shows another example of the credit information reading means 600, which comprises theme song background video generation means 620, theme song background difference video generation means 630, first telop reading means 610, second telop reading means 640, and telop reading result integration means 650. The first telop reading means 610 receives the credit information superimposed section video data as input and outputs first credit candidate information to the telop reading result integration means 650. The theme song background video generation means 620 receives the credit information superimposed section video data as input and outputs the theme song background video to the theme song background difference video generation means 630. The theme song background difference video generation means 630 receives the credit information superimposed section video data and the theme song background video output from the theme song background video generation means 620, and outputs the theme song background difference video to the second telop reading means 640. The second telop reading means 640 receives the theme song background difference video output from the theme song background difference video generation means 630 and outputs second credit candidate information to the telop reading result integration means 650. The telop reading result integration means 650 receives the first credit candidate information output from the first telop reading means 610 and the second credit candidate information output from the second telop reading means 640, and outputs credit candidate information.

Next, the operation of the credit information reading means 600 of FIG. 15 will be described. The first telop reading means 610 operates in the same way as the telop reading means 610 of FIG. 12, and outputs the first credit candidate information to the telop reading result integration means 650. The theme song background video generation means 620 and the theme song background difference video generation means 630 operate in the same way as those of FIG. 13. The second telop reading means 640 likewise operates in the same way as the telop reading means 640 of FIG. 13, and outputs the second credit candidate information to the telop reading result integration means 650.

The telop reading result integration means 650 integrates the first credit candidate information and the second credit candidate information to generate and output credit candidate information. Several integration methods are possible: for example, outputting the candidates from both together as the candidate information, outputting whichever of the two has the higher telop recognition reliability, or outputting all candidates from either whose reliability exceeds a certain threshold. Any other method that integrates the two to generate an output may also be used.
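The three integration strategies listed above can be sketched as follows. The `Candidate` record with a text and a confidence score, and all function names, are assumptions made for illustration; the text does not specify how reliability is represented.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    text: str          # recognized credit string
    confidence: float  # assumed reliability score in [0, 1]


def merge_all(first: List[Candidate], second: List[Candidate]) -> List[Candidate]:
    """Strategy 1: output the candidates from both readers together."""
    return first + second


def pick_more_reliable(first: Candidate, second: Candidate) -> Candidate:
    """Strategy 2: for one credit, keep the reading with higher reliability."""
    return first if first.confidence >= second.confidence else second


def keep_reliable(first: List[Candidate], second: List[Candidate],
                  threshold: float) -> List[Candidate]:
    """Strategy 3: keep every candidate whose reliability exceeds the threshold."""
    return [c for c in first + second if c.confidence > threshold]
```

Strategy 2 presupposes that the two readings have already been paired as readings of the same credit, for example by their position and time in the video.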

In the credit information reading means 600 of FIG. 15, it is sufficient for the text to be read correctly from either the normal image data or the theme song background difference image data, so recognition accuracy is higher than when either is used alone as in FIG. 12 or FIG. 13.

Because the credit information reading means 600 described here selects the more reliable of the reading results from the first and second reading means and merges them, reading accuracy is higher than when only one of them is used. For example, when the same credit information is superimposed at the same position over the theme song background in every episode, the credit character string cannot be extracted by background difference, so reading the telop directly is more accurate. Conversely, when the telop position or content differs from episode to episode, the credit information can be read from the background difference even if the background is so complex that normal telop reading fails. Merging the two in this way improves the accuracy of credit reading.

<Second embodiment>
A second embodiment of the present invention will be described in detail with reference to the drawings.

FIG. 16 shows an example of the second embodiment of the present invention, which includes credit information recognition means 100, object recognition means 105, and integration means 103.

The credit information recognition means 100 receives the program video as input, and its output is connected to the integration means 103. The object recognition means 105 likewise receives the program video as input, and its output is connected to the integration means 103. The integration means 103 receives the outputs of the credit information recognition means 100 and the object recognition means 105 as input and outputs rights information.

Next, the operation of the second embodiment will be described.

The program video is input to the credit information recognition means 100 and the object recognition means 105.

The credit information recognition means 100 operates as in the first embodiment or the examples described above, and outputs credit candidate information to the integration means 103.

The object recognition means 105 recognizes objects in the content that relate to rights; such objects include musical works and characters appearing in the content.

For example, when the object is a musical work, acoustic features are extracted from the program video and collated with acoustic features already registered in a database. The collation also permits matching on only part of a musical work rather than the whole. If the collation determines that the audio is the same as a piece in the database, music identification information specifying that piece (for example, an ID assigned to the piece) is output. If several sound sources are registered in the database for the same piece and one of them matches, information specifying that sound source may also be included. If only part of the piece matches, information specifying the matched section may be included in the music identification information. An index expressing the certainty of the identification may also be included. The music identification information may consist of a single result per piece or of multiple candidates. If the extracted acoustic features match none of the acoustic features against which collation was attempted, information specifying the video section containing those acoustic features may be included in the music identification information and output. The audio signal of that section may be output together with it. The music identification information obtained in this way is output to the integration means 103.

When the object is a character, person features of the characters appearing in the video are extracted and collated. That is, person features are extracted from the video information and collated with person features already registered in a database. If the collation determines that the person is the same as a person in the database, person identification information specifying that person (for example, an ID assigned to the person) is output. An index expressing the certainty of the identification may also be included. The person identification information may consist of a single result per character or of multiple candidates. If the extracted person features match none of the person features against which collation was attempted, information specifying the video section containing those features, or their spatio-temporal position in the video, may be included in the person identification information and output. The video information at that spatio-temporal position may be output together with it. The person identification information obtained in this way is output to the integration means 103. The person features may be features describing a face or features of a person's voice. They may also be a combination of these, or any other features that can be used to identify a person.

The integration means 103 integrates the credit candidate information output from the credit information recognition means 100 and the object identification information output from the object recognition means 105, and outputs the result as rights information.

One integration method is simply to output both the credit candidate information from the credit information recognition means 100 and the object identification information from the object recognition means 105.

Another integration method is to collate the credit candidate information output from the credit information recognition means 100 with the object identification information output from the object recognition means 105, group the matching items, and output the groups. Priorities may be assigned according to reliability. Alternatively, only the item with the highest reliability, or only the items whose reliability is at or above a certain level, may be selected.

As a collation method, when the object is music, the theme song is selected from among the identified pieces of music on the basis of the length of continuous audio. The title of the selected piece and its attribute information (lyricist, composer, singer, or performer name) are collated with the music information in the credit candidate information, and the two are regarded as the same piece if the degree of match is at or above a certain level. Alternatively, whether a piece is the theme song may be judged from its appearance time in the program (that is, whether it appears near the beginning or near the end of the program), and duplication is then judged by the same criterion as above.

Further, when the object is a person, one method is simply to collate the name obtained by person identification with the performer name obtained from the credit candidate information, and to regard them as the same person if the degree of match is at or above a certain level. For example, they are regarded as the same person when at least a certain number of characters match, or when the similarity of character shapes over the whole name is at or above a certain level. The similarity may also take into account how common the matched characters are. For example, 拓也 (Takuya) identifies a person more readily than 木村 (Kimura) even though both are two characters long, so a match on the former may be given a higher similarity. A method such as TF-IDF can be used for this judgment. Another method groups the person identification information judged to belong to the same person, calculates or estimates that person's appearance frequency or appearance time, judges whether the person is a lead, and then decides whether the two are the same person by taking into account the order of appearance in the credit candidate information and the appearance pattern (supporting roles are displayed several at a time, whereas leads are displayed alone or, in a roll telop, with a gap before and after).
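Weighting matched characters by how rare they are can be sketched as follows. This is a simplified, IDF-style character weighting rather than a full TF-IDF implementation, and everything in it is an assumption made for illustration: the function names, the smoothed IDF formula, the position-by-position comparison, and the placeholder name list used to estimate character frequencies.

```python
import math
from typing import Dict, List


def build_char_idf(names: List[str]) -> Dict[str, float]:
    """Smoothed inverse document frequency of each character over a name list."""
    n = len(names)
    doc_freq: Dict[str, int] = {}
    for name in names:
        for ch in set(name):
            doc_freq[ch] = doc_freq.get(ch, 0) + 1
    return {ch: math.log((1 + n) / (1 + df)) + 1.0 for ch, df in doc_freq.items()}


def weighted_similarity(reference: str, read: str, idf: Dict[str, float]) -> float:
    """Share of the reference name's IDF weight that the read string matches.

    Characters are compared position by position; characters missing from
    the IDF table get the largest known weight (treated as rare).
    """
    default = max(idf.values(), default=1.0)
    total = sum(idf.get(ch, default) for ch in reference)
    matched = sum(idf.get(r, default) for r, o in zip(reference, read) if r == o)
    return matched / total if total else 0.0


# Placeholder name list: the family name 木村 is common, the given name 拓也 is rare.
NAMES = ["木村拓也", "木村一郎", "木村花子", "田中次郎"]
IDF = build_char_idf(NAMES)
```

With this table, a reading that matches only 拓也 scores higher against 木村拓也 than a reading that matches only 木村, which reflects the intent that rarer characters identify a person more strongly.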

Using matching methods such as these, the credit candidate information and the object identification information are grouped and output as rights information.

Next, the effects of the second embodiment of the present invention will be described.

In the second embodiment, the credit information recognition means and the object recognition means operate independently and their results are integrated, so information on rights can be obtained more accurately than when credit candidate information alone is used.

<Third Embodiment>
The third embodiment will be described.

The third embodiment combines the first and second embodiments described above, and is further characterized in that the object recognition means 105 uses the credit candidate information from the credit information recognition means. In the following description, the music work recognition means 101 and the character recognition means 102 are used as examples of the object recognition means.

Referring to FIG. 17, the third embodiment of the present invention includes credit information recognition means 100, music work recognition means 101, character recognition means 102, and integration means 103. The credit information recognition means 100 receives the program video as input, and its output is connected to the music work recognition means 101, the character recognition means 102, and the integration means 103. The music work recognition means 101 receives the program video and the output of the credit information recognition means 100 as inputs, and its output is connected to the integration means 103. The character recognition means 102 likewise receives the program video and the output of the credit information recognition means 100 as inputs, and its output is connected to the integration means 103. The integration means 103 receives the outputs of the credit information recognition means 100, the music work recognition means 101, and the character recognition means 102, and outputs rights information.

Next, the operation of the embodiment of FIG. 17 will be described.

The program video is input to the credit information recognition means 100, the music work recognition means 101, and the character recognition means 102. Of these three means, the credit information recognition means 100 analyzes the program video first.

The credit information recognition means 100 analyzes the input program video, reads the credit information superimposed on the video, and outputs information that is a candidate for credit information.

Here, as described above, credit information is a telop (on-screen caption) or audio that carries information such as the original author, screenwriter, performers, and theme song, and that is superimposed during the theme song or the final part of the program. The program video may be input in a compressed format such as MPEG, or may be input after it has already been decoded. When it is input as compressed video, the credit information recognition means analyzes the video while decoding it. The program video may be the video of one particular broadcast, or the configuration may allow the videos of several episodes of the same program to be input together.

The credit information recognition means 100 extracts from the program video the credit information superimposition section, in which credit information is superimposed. It then analyzes the video contained in that section and reads the telop information from the video. The result is output as credit candidate information. The credit candidate information may include each recognized character string together with its time information and its position in the image (coordinates within the frame). It may also include an index representing the confidence of the telop recognition. For each recognized character string, the credit candidate information may carry a single result or several candidate character strings. The credit candidate information obtained in this way is output to the integration means 103 and also to the music work recognition means 101 and the character recognition means 102.

The credit information recognition means 100 can use any of the specific configurations described above.

The music work recognition means 101 analyzes the input program video and the credit candidate information, and extracts and collates information on the music used in the video. That is, it first extracts acoustic features from the program video and then collates them with acoustic features already registered in a database. In doing so, it also accepts a match against only part of a music work rather than the whole work. It also extracts music-related information from the credit candidate information and uses it to control the database used for collation or to adjust the collation parameters. When collation determines that the music is the same as a piece in the database, music identification information specifying that piece (for example, an ID assigned to the piece) is output. When several recordings of the same piece are registered in the database and one of them is matched, information specifying that recording may also be included. When only part of a piece is matched rather than the whole, the music identification information may include information specifying the matched section. It may further include an index representing the confidence of the music identification. For each piece, the music identification information may carry a single result or several candidates. The music identification information obtained in this way is output to the integration means 103.

The character recognition means 102 analyzes the input program video and the credit candidate information, and extracts and collates person features of the characters appearing in the video. That is, it first extracts person features from the video information and then collates them with person features already registered in a database. In doing so, it extracts information related to the characters from the credit candidate information and uses it to control the database used for collation or to adjust the collation parameters. The collation result is output as person identification information. When collation determines that a person is the same as a person in the database, person identification information specifying that person (for example, an ID assigned to the person) is output. An index representing the confidence of the person identification may also be included. For each character, the person identification information may carry a single result or several candidates. The person identification information obtained in this way is output to the integration means 103.

Here, the person feature may be a feature describing the face or a feature of the person's voice. It may also be a combination of these, or any other feature that can be used to identify a person.

The integration means 103 integrates the credit candidate information output from the credit information recognition means 100, the music identification information output from the music work recognition means 101, and the person identification information output from the character recognition means 102, and outputs the result as rights information.

For the integration, the method described for the integration means 103 of the second embodiment may be used. Alternatively, the recognized character strings may be associated with each type of rights holder, such as original author, screenwriter, or performer, based on the relationship between each string and its position, and output as rights information.

In the case of music works, the recognized music title and music identification information are output for each piece. Alternatively, when a database storing the copyright information of each piece is accessible, the rights information attached to the music may be looked up from the music identification information and output. For persons, the person identification information may be output as it is, or the person's name may be output together with it. These items of rights information need not be narrowed down to a single result. All candidates may be output and a human may make the final confirmation, which makes it easier to correct rights information that was recognized incorrectly.

<Configuration Example of the Music Work Recognition Means 101>
(1) Specific configuration example 1 of the music work recognition means 101
Referring to FIG. 18, a configuration example of the music work recognition means 101 is shown, which consists of music candidate extraction means 800, candidate acoustic feature selection means 801, music work collation means 802, and a music acoustic feature database 803. The music candidate extraction means 800 receives the credit candidate information as input and outputs music candidate information to the candidate acoustic feature selection means 801. The candidate acoustic feature selection means 801 selects music acoustic features from the music acoustic feature database 803 based on the music candidate information output from the music candidate extraction means 800, and outputs them as candidate acoustic features to the music work collation means 802. The music work collation means 802 receives the program video and the candidate acoustic features output from the candidate acoustic feature selection means 801 as inputs, and outputs music identification information.

Next, the operation of the music work recognition means 101 of FIG. 18 will be described.

The credit candidate information is input to the music candidate information extraction means 800. The music candidate information extraction means 800 extracts from the credit candidate information the candidate information on music used in the program, such as theme songs and insert songs. For example, music-related keywords such as "主題歌" (theme song), "挿入歌" (insert song), and "テーマソング" (theme song) are registered in advance, and when one of these keywords is detected, the recognition result of the character string displayed alongside it or immediately after it is extracted as music candidate information. The information obtained here includes the title of the piece, the names of the singer and performers, and the names of the lyricist and composer. The resulting music candidate information is output to the candidate acoustic feature selection means 801.
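The keyword-driven extraction described above can be sketched as follows. This is a simplified illustration under assumptions: the recognized credit text is taken to be a plain list of strings in display order, "displayed alongside or immediately after" is approximated as "rest of the same line, otherwise the next line", and the function name and sample credits are invented for the example.

```python
MUSIC_KEYWORDS = ("主題歌", "挿入歌", "テーマソング")


def extract_music_candidates(lines, keywords=MUSIC_KEYWORDS):
    """Return (keyword, text) pairs for credit lines that mention music.

    lines: recognized credit strings in display order.
    For a line containing a keyword, the text after the keyword is used;
    if nothing follows on that line, the next line is used instead.
    """
    candidates = []
    for i, text in enumerate(lines):
        for kw in keywords:
            pos = text.find(kw)
            if pos < 0:
                continue
            # Drop ASCII/ideographic spaces and colons around the remainder.
            rest = text[pos + len(kw):].strip(" \u3000:：")
            if not rest and i + 1 < len(lines):
                rest = lines[i + 1].strip()
            if rest:
                candidates.append((kw, rest))
            break  # one keyword per line is enough for this sketch
    return candidates
```

A real system would also use the time and position information carried in the credit candidate information to decide which strings are displayed together.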

The candidate acoustic feature selection means 801 selects from the music acoustic feature database 803 the features of pieces associated with a title or person name that matches or resembles the obtained music candidate information. The selected acoustic feature data are output to the music work collation means 802 as candidate acoustic features.
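Selecting database entries whose title or associated names match or resemble a recognized string can be sketched as follows. The specification does not name a string similarity measure, so `difflib.SequenceMatcher` from the Python standard library is used here as a stand-in. The database layout, the function name, and the 0.6 threshold are assumptions for the example.

```python
from difflib import SequenceMatcher


def select_candidates(db, query, threshold=0.6):
    """Return (song_id, score) pairs whose title or people resemble the query.

    db: {song_id: {"title": str, "people": [str, ...], "feature": object}}
    query: a recognized string from the credits, possibly with OCR errors.
    """
    selected = []
    for song_id, entry in db.items():
        keys = [entry["title"], *entry["people"]]
        best = max(SequenceMatcher(None, query, k).ratio() for k in keys)
        if best >= threshold:
            selected.append((song_id, best))
    # Most similar entries first.
    return sorted(selected, key=lambda item: -item[1])
```

Only the features of the returned entries would then be passed on for acoustic collation, which is what lets an imperfectly recognized title still narrow the search.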

The music work collation means 802 first extracts acoustic features from the program video. It collates these with the candidate acoustic features output from the candidate acoustic feature selection means 801 and, when a match is found, outputs the identification information of that music. In doing so, it also accepts a match against only part of a music work rather than the whole work.

In this way, even when the credit information cannot be read completely, the music work recognition means 101 of FIG. 18 collates the music actually used against the registered features, which improves the accuracy of extracting copyright information on music.

(2) Specific configuration example 2 of the music work recognition means 101
Referring to FIG. 19, another configuration example of the music work recognition means 101 is shown, which consists of music-related production information extraction means 820, music work collation parameter selection means 821, music work collation means 822, a music work collation parameter database 823, and the music acoustic feature database 803. The music-related production information extraction means 820 receives the credit candidate information as input and outputs music-related production information to the music work collation parameter selection means 821. The music work collation parameter selection means 821 receives the music-related production information output from the music-related production information extraction means 820 as input, selects music work collation parameters from the music work collation parameter database 823, and outputs them to the music work collation means 822. The music work collation means 822 receives as inputs the program video, the music work collation parameters output from the music work collation parameter selection means 821, and the music acoustic features stored in the music acoustic feature database 803, and outputs music identification information.

Next, the operation of the music work recognition means 101 of FIG. 19 will be described.
The credit candidate information is input to the music-related production information extraction means 820, which extracts music-related production information from it. Here, music-related production information is information related to music in the production of the program, such as the person in charge of music, a record company that provided music cooperation, or the person in charge of music selection. As with the music candidate information extraction means 800 described above, keywords such as "音楽" (music) and "選曲" (music selection) are registered in advance, and when one of these keywords is detected, the recognition result of the character string displayed alongside it or immediately after it is extracted as music-related production information. The extraction result is output to the music work collation parameter selection means 821 as music-related production information.

The music work collation parameter selection means 821 selects, according to the input music-related production information, the parameters to be used for collating music works from those stored in the music work collation parameter database 823. Alternatively, it controls the music work collation parameters based on the selected information. For example, when the character string extracted as music-related production information is the name of a record company, the music work collation parameters are adjusted so that pieces owned by that record company are selected preferentially. When the music acoustic features are stored in the music acoustic feature database 803 grouped by record company, or in separate databases, the information designating that group or database is selected as the music work collation parameter. When the music-related production information is the name of a person or organization involved in selecting music such as BGM, the music work collation parameters may be adjusted according to that person's past history of music use. The music work collation parameters selected in this way are input to the music work collation means 822.
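One way to picture the parameter selection above is a lookup from a recognized record company name to a database group, followed by a small score bonus for pieces in that group. This is a hypothetical sketch: the specification does not define what a collation parameter looks like, so the `priority_groups` dictionary, the additive bonus, and all names below are assumptions.

```python
def select_params(production_names, param_db):
    """Collect the database groups to prioritise for names found in the credits.

    param_db: {recognized name: group identifier}, e.g. record company -> group.
    """
    groups = []
    for name in production_names:
        group = param_db.get(name)
        if group is not None and group not in groups:
            groups.append(group)
    return {"priority_groups": groups}


def rank_with_params(raw_scores, song_groups, params, bonus=0.1):
    """Rank songs by collation score, favouring songs in a prioritised group.

    raw_scores: {song_id: score from acoustic collation}
    song_groups: {song_id: group identifier}
    """
    adjusted = {
        song: score
        + (bonus if song_groups.get(song) in params["priority_groups"] else 0.0)
        for song, score in raw_scores.items()
    }
    return sorted(adjusted, key=adjusted.get, reverse=True)
```

Restricting the search to the designated group, rather than adding a bonus, would be an equally valid reading of the paragraph above.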

The operation of the music work collation means 822 is basically the same as that of the music work collation means 802 of FIG. 18. The difference is that it additionally receives the music work collation parameters from the music work collation parameter selection means 821, which allow the collation parameters to be adjusted. The collation result is output as music identification information.

By adjusting the collation parameters, the music work recognition means 101 of FIG. 19 can raise recognition accuracy.

The music work collation means 822 in the music work recognition means 101 of FIG. 19 will now be described in more detail.

Referring to FIG. 20, an example of an embodiment of the music work collation means 822 is shown, which consists of speech superimposition determination means 950 and acoustic feature collation means 951. The speech superimposition determination means 950 receives the program video as input and outputs speech superimposition section time information to the acoustic feature collation means 951. The acoustic feature collation means 951 receives as inputs the program video, the speech superimposition section time information output from the speech superimposition determination means 950, and the music work collation parameters, and outputs music identification information.

Next, the operation of the music work collation means 822 of FIG. 20 will be described.

The program video is input to the speech superimposition determination means 950, which analyzes the acoustic signal and determines whether speech is superimposed. For example, frequency analysis is performed on the acoustic signal, and speech is judged to be superimposed when the signal has characteristics close to those of a human voice. Any other method capable of determining speech superimposition may also be used. When speech is judged to be superimposed, the time information of the section in which it is superimposed (section start point, end point, section length, and so on) is output to the acoustic feature collation means 951 as speech superimposition section time information.

The acoustic feature collation means 951 extracts acoustic features from the input program video and collates them with the candidate acoustic features. In doing so, it adjusts the collation method using the speech superimposition section time information output from the speech superimposition determination means 950. Possible methods include performing collation with the sections in which speech is superimposed left out, or lowering the weight of the speech frequency band in those sections. The music work collation parameters are also input and may be used to adjust the collation method. The collation result is output as music identification information.
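The first adjustment mentioned above, leaving out or down-weighting the sections where speech is superimposed, can be sketched as follows. This is an illustration under assumptions: features are taken to be one vector per fixed-length frame, per-frame cosine similarity stands in for whatever acoustic collation is actually used, and the weight is applied per frame rather than per frequency band, so the band-weighting variant is not covered.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def masked_similarity(query, reference, frame_sec, speech_intervals,
                      speech_weight=0.0):
    """Weighted mean of per-frame similarity between two feature sequences.

    Frames whose start time falls inside a (start_sec, end_sec) speech
    interval receive speech_weight; all other frames receive weight 1.
    speech_weight=0.0 omits the speech sections entirely.
    """
    num = den = 0.0
    for i, (q, r) in enumerate(zip(query, reference)):
        t = i * frame_sec
        in_speech = any(start <= t < end for start, end in speech_intervals)
        w = speech_weight if in_speech else 1.0
        num += w * cosine(q, r)
        den += w
    return num / den if den else 0.0
```

When a frame corrupted by dialogue is masked out, the similarity to the correct piece is no longer pulled down by that frame.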

The music work collation means 822 of FIG. 20 keeps the influence of speech small even when speech overlaps the BGM, and can thereby improve recognition accuracy.

<Configuration Example of the Character Recognition Means 102>
(1) Specific configuration example 1 of the character recognition means 102
Referring to FIG. 21, an example of an embodiment of the character recognition means 102 is shown, which consists of performer candidate information extraction means 900, candidate person feature selection means 901, performer collation means 902, and a person feature database 903. The performer candidate information extraction means 900 receives the credit candidate information as input and outputs performer candidate information to the candidate person feature selection means 901. The candidate person feature selection means 901 receives the performer candidate information output from the performer candidate information extraction means 900 as input, selects candidate person features from the person feature database 903, and outputs them to the performer collation means 902. The performer collation means 902 receives the program video and the candidate person features output from the candidate person feature selection means 901 as inputs, and outputs person identification information.

Next, the operation of the character recognition means 102 of FIG. 21 will be described.

The credit candidate information is input to the performer candidate information extraction means 900, which extracts the portion corresponding to performers and outputs it as performer candidate information. Possible methods include extracting person names that are written together with what is presumed to be a role name, extracting person names displayed at the same time as or immediately after the word "出演" (cast), and identifying names of people who clearly do not appear in the program, such as screenwriters and producers, and extracting the remaining person names. The extracted performer candidate information is output to the candidate person feature selection means 901.

The candidate person feature selection means 901 selects from the person feature database 903 the features of persons whose names match or are close to the candidate name. It need not output exactly one person feature for each recognized name candidate; it may output the person features of several people with similar names. The selected candidate person features are output to the performer collation means 902.

The performer collation means 902 first extracts person features from the input program video. For example, when the person feature is a facial feature, face detection is performed on the video and the facial feature of each detected region is then calculated. When the person feature is a voice feature, sections containing speech are first extracted and the voice feature of each section is then extracted. The extracted person features are collated with each of the candidate person features input from the candidate person feature selection means 901. When collation judges them to be the same, information identifying that person is output as person identification information.

By collating the actual person features, the character recognition means 102 shown in FIG. 21 makes it possible to extract performer information correctly even when the credit information could not be recognized with complete accuracy or is ambiguous because, for example, two people share the same full name.

(2) Specific configuration example 2 of the character recognition means 102
Referring to FIG. 22, an example of an embodiment of the character recognition means 102 is shown, which consists of performer affiliation extraction means 920, performer collation parameter selection means 921, performer collation means 922, the person feature database 903, and a person collation parameter database 923.

The performer affiliation extraction means 920 receives the credit candidate information as input and outputs performer affiliation related information to the performer collation parameter selection means 921. The performer collation parameter selection means 921 receives the performer affiliation related information output from the performer affiliation extraction means 920 as input, selects performer collation parameters from the person collation parameter database 923, and outputs them to the performer collation means 922. The performer collation means 922 receives as inputs the program video, the performer collation parameters output from the performer collation parameter selection means 921, and the person features stored in the person feature database 903, and outputs person identification information.

Next, the operation of the character recognition means 102 of FIG. 22 will be described.

The credit candidate information is input to the performer affiliation extraction means 920, which extracts information related to performers' affiliations, such as theater company names and talent agency names. Specifically, a dictionary of performer affiliations is prepared, and the information is extracted by collating the credit candidate information with the names registered in this dictionary. The extracted result is output to the performer collation parameter selection means 921 as performer affiliation related information.

The performer collation parameter selection means 921 selects a performer collation parameter from the person collation parameter database 923. For example, when the person feature quantities in the person feature quantity database 903 are stored in groups by affiliated organization, or in separate databases, information designating the relevant group or database is selected as the performer collation parameter. The selected performer collation parameter is output to the performer collation means 922.
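As a rough sketch of this selection step, the person collation parameter database can be modeled as a mapping from affiliation name to a group key. The mapping contents, the key format, and the function name `select_collation_parameters` are hypothetical; the specification only says that the parameter designates a group or database.

```python
from typing import Dict, List

# Hypothetical stand-in for the person collation parameter database 923:
# affiliation name -> key of the group inside the person feature database.
PERSON_COLLATION_PARAMETERS: Dict[str, str] = {
    "Example Theater Company": "group_theater_a",
    "Sample Talent Agency": "group_agency_b",
}


def select_collation_parameters(affiliations: List[str],
                                parameter_db: Dict[str, str]) -> List[str]:
    """Return the group keys for the known affiliations.

    Unknown affiliations are skipped; keys keep input order and are
    not repeated.
    """
    selected: List[str] = []
    for name in affiliations:
        key = parameter_db.get(name)
        if key is not None and key not in selected:
            selected.append(key)
    return selected
```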

The operation of the performer collation means 922 is basically the same as that of the performer collation means 902 of FIG. 21. The difference is that it additionally receives the performer collation parameter output from the performer collation parameter selection means 921, which allows the collation parameters to be adjusted. The collation result is output as person identification information.
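One way to picture how a collation parameter narrows the search is the sketch below, which compares a query feature vector only against persons in the selected groups. The nested-dictionary database layout, the Euclidean distance, and the `max_distance` threshold are illustrative assumptions and are not taken from the specification.

```python
import math
from typing import Dict, List, Optional, Sequence

# Hypothetical grouped person feature database:
# group key -> {person id: feature vector}.
FeatureDB = Dict[str, Dict[str, Sequence[float]]]


def collate_performer(query: Sequence[float],
                      feature_db: FeatureDB,
                      group_keys: List[str],
                      max_distance: float = 0.5) -> Optional[str]:
    """Return the closest person in the selected groups, or None.

    Only the groups named by the collation parameter are searched, so
    persons from unrelated organizations are never compared.
    """
    best_distance = math.inf
    best_person: Optional[str] = None
    for key in group_keys:
        for person_id, feature in feature_db.get(key, {}).items():
            distance = math.dist(query, feature)
            if distance < best_distance:
                best_distance = distance
                best_person = person_id
    # Reject the match when even the closest candidate is too far away.
    return best_person if best_distance <= max_distance else None
```

Restricting the candidates in this way reduces the number of comparisons and removes look-alike candidates from other organizations, which is the kind of efficiency gain the passage describes.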

With the character recognition means 102 of FIG. 22, even when the credits list only an organization name, such as the name of a theater company, it becomes possible to efficiently determine which members of that organization appeared.

The effects of the third embodiment of the present invention will now be described.

In the present embodiment, the sections of the video in which credits are superimposed are found and telop recognition is performed on them, so credit information, which is important as rights information, can be obtained directly. In addition, because processing is restricted to the superimposition sections, the computational load is lower than when telop recognition is applied to the entire program.

Furthermore, because this credit information is also used to identify music works, identification accuracy can be higher than with ordinary music identification. Likewise, because the credit information is also used to identify characters, identification accuracy can be higher than with person identification alone.

In the above description, the music work recognition means 101 and the character recognition means 102 were given as examples of the object recognition means. However, the invention is not limited to this example; a configuration that uses only one of them, as in FIGS. 23 and 24, is also possible. The object recognition means may also combine the specific configurations described above.

This application claims priority based on Japanese Patent Application No. 2006-291442, filed on October 26, 2006, the entire disclosure of which is incorporated herein.

Claims

A rights information extraction device comprising credit information recognition means for reading credit information related to rights from content and outputting the result as credit candidate information.

A rights information extraction device that extracts rights information related to rights from content, comprising:
Credit information recognition means for reading credit information related to rights from the content and outputting the result as credit candidate information;
Object recognition means for analyzing the content, recognizing an object related to rights in the content, and outputting the result as object identification information; and
Integration means for integrating the credit candidate information and the object identification information and outputting the result as rights information.

A rights information extraction device that extracts rights information related to rights from content, comprising:
Credit information recognition means for reading credit information related to rights from the content and outputting the result as credit candidate information;
Object recognition means for referring to the credit candidate information, analyzing the content, recognizing an object related to rights in the content, and outputting the result as object identification information; and
Integration means for integrating the credit candidate information and the object identification information and outputting the result as rights information.

The rights information extraction device according to any one of claims 1 to 3, wherein the credit information recognition means comprises:
Credit information section detection means for detecting, in the content, a credit information section that includes credit information; and
Credit information reading means for reading credit information from the credit information section and outputting the result as credit candidate information.

The rights information extraction device according to claim 4, wherein the credit information section detection means detects, from the content, a video section in which credit information is superimposed on the video, and outputs credit information section video data, which is the video data of that video section.

The rights information extraction device according to claim 4, wherein the credit information section detection means comprises:
Acoustic detection means for detecting an acoustic section from the content and outputting it as acoustic section information; and
Means for outputting the section of the content specified by the acoustic section information as the credit information section.

The rights information extraction device according to claim 6, wherein the acoustic detection means comprises:
Continuous acoustic time measurement means for measuring the duration over which acoustic information continues in the content and outputting it as a continuous acoustic time; and
Acoustic section determination means for determining an acoustic section using the continuous acoustic time and outputting it as acoustic section information.

The rights information extraction device according to claim 6 or 7, wherein the acoustic detection means comprises:
Acoustic feature quantity extraction means for extracting an acoustic feature quantity from each of a plurality of episodes constituting the content and outputting the acoustic feature quantities; and
Means for collating the acoustic feature quantities with one another, detecting an acoustic section by identifying a section whose acoustic feature quantities are common, and outputting it as acoustic section information.

The rights information extraction device according to claim 5, wherein the credit information section detection means comprises:
Continuous telop detection means for detecting, from the content, a video section in which telop candidate areas appear continuously and outputting it as continuous telop appearance section information; and
Means for outputting the video section of the program video specified by the continuous telop appearance section information as the credit information section video data.

The rights information extraction device according to claim 5, wherein the credit information section detection means comprises:
Roll telop detection means for detecting a roll telop from the content and outputting the time information of its video section as roll telop section information; and
Means for outputting the video section of the content specified by the roll telop section information as the credit information section video data.

The rights information extraction device according to claim 5, wherein the credit information reading means performs telop recognition on the credit information section video data and outputs the credit candidate information.

The rights information extraction device according to claim 5, wherein the credit information reading means comprises:
Acoustic background video generation means for receiving credit information section video data for a plurality of episodes and generating and outputting an acoustic background video having the features common to the plurality of credit information section video data;
Acoustic background difference video generation means for generating and outputting an acoustic background difference video by subtracting the acoustic background video from the credit information section video data; and
Telop reading means for applying telop recognition to the acoustic background difference video to obtain and output the credit candidate information.

The rights information extraction device according to claim 5, wherein the credit information reading means comprises:
First telop reading means for applying telop recognition to the credit information section video data to obtain and output first credit candidate information;
Acoustic background video generation means for receiving credit information section video data for a plurality of episodes and generating and outputting an acoustic background video having the features common to the plurality of credit information section video data;
Acoustic background difference video generation means for generating and outputting an acoustic background difference video by subtracting the acoustic background video from the credit information section video data;
Second telop reading means for applying telop recognition to the acoustic background difference video to obtain and output second credit candidate information; and
Telop reading result integration means for integrating the first credit candidate information and the second credit candidate information to obtain and output the credit candidate information.

The rights information extraction device according to claim 12 or 13, wherein the acoustic background video generation means comprises:
Visual feature quantity extraction means for extracting a visual feature quantity from each episode of the credit information section video data and outputting it as an acoustic background visual feature quantity;
Corresponding frame calculation means for collating the acoustic background visual feature quantities with one another, associating video frames that share a common background, and outputting the result as frame correspondence information; and
Background video generation means for calculating the value of each pixel of the acoustic background by statistical processing of the pixel values across the frames associated by the frame correspondence information, and generating and outputting the acoustic background video.

The rights information extraction device according to claim 14, wherein the background video generation means uses a median as the statistical processing when the variation in pixel values between corresponding frames is large.

The rights information extraction device according to claim 14, wherein, when the variation in pixel values between corresponding frames is large, the background video generation means calculates, from the pixel values in the neighborhood of each pixel, an index of how likely the pixel is to belong to a telop area, and performs the statistical processing by giving greater weight to pixels for which the index is smaller.

The rights information extraction device according to any one of the preceding claims, wherein, when there is a credit candidate area that could not be read, the credit information recognition means outputs information specifying the spatio-temporal position in the video that includes the credit candidate area, together with the credit candidate information.

The rights information extraction device according to claim 2, wherein the object recognition means is music work recognition means that analyzes acoustic feature quantities of the content, recognizes a music work in the content based on the acoustic feature quantities and the credit candidate information, and outputs the result as music identification information.

The rights information extraction device according to claim 2, wherein the object recognition means is character recognition means that analyzes person feature quantities of the content, recognizes the characters in the content based on the person feature quantities and the credit candidate information, and outputs the result as person identification information.

The rights information extraction device according to claim 18, wherein the music work recognition means comprises:
Music candidate information extraction means for extracting candidate information on the music used from the credit candidate information and outputting it as used music candidate information;
Candidate acoustic feature quantity selection means for selecting, from an acoustic feature quantity database, the acoustic feature quantities of music close to the used music candidate information and outputting them as candidate acoustic feature quantities; and
Music work collation means for collating the candidate acoustic feature quantities with the acoustic feature quantities extracted from the content and outputting the music identification information when they are determined to match.

The rights information extraction device according to claim 18, wherein the music work recognition means comprises:
Music-related production information extraction means for extracting, from the credit candidate information, information on the persons, groups, or record producers involved in producing the music and outputting it as music-related production information;
Music work collation parameter selection means for selecting, according to the music-related production information, collation parameters such as variables used for music collation and music database selection information; and
Music work collation means for collating, using the collation parameters, the acoustic feature quantities in a music acoustic feature quantity database with the acoustic feature quantities extracted from the content and outputting the music identification information when they are determined to match.

The rights information extraction device according to claim 20 or 21, wherein the music work collation means comprises:
Voice superimposition determination means for analyzing the content to determine voice superimposition sections, which are sections that include voice, and outputting the time information of those sections as voice superimposition section time information; and
Acoustic feature quantity collation means for extracting acoustic feature quantities from the content, detecting, based on the voice superimposition section time information, voice non-superimposition sections in which no voice is superimposed, collating with the candidate acoustic feature quantities using the music work collation parameters only in the voice non-superimposition sections, and outputting the collation result as the music identification information.

The rights information extraction device according to claim 20 or 21, wherein the music work collation means comprises:
Voice superimposition determination means for analyzing the content to determine voice superimposition sections, which are sections that include voice, and outputting the time information of those sections as voice superimposition section time information; and
Acoustic feature quantity collation means for extracting acoustic feature quantities from the content, detecting, based on the voice superimposition section time information, voice non-superimposition sections in which no voice is superimposed, collating with the candidate acoustic feature quantities using the music work collation parameters in the voice non-superimposition sections, collating with the candidate acoustic feature quantities using the music work collation parameters while suppressing the influence of signals in the voice frequency band in the voice superimposition sections, and outputting the collation result as the music identification information.

The rights information extraction device according to any one of claims 18 to 23, wherein, when an acoustic feature quantity extracted from the program video matches none of the acoustic feature quantities against which collation was attempted, the music work recognition means outputs information identifying the video section that contains the acoustic feature quantity, together with the music identification information.

The rights information extraction device according to claim 19, wherein the character recognition means comprises:
Performer candidate information extraction means for extracting candidate information on persons related to the performers from the credit candidate information and outputting it as performer candidate information;
Candidate person feature quantity selection means for selecting, from a person feature quantity database, the person feature quantities of persons close to the performer candidate information and outputting them as candidate person feature quantities; and
Performer collation means for collating the candidate person feature quantities with the person feature quantities extracted from the content and outputting the person identification information when they are determined to match.

The rights information extraction device according to claim 19, wherein the character recognition means comprises:
Performer affiliation organization extraction means for extracting information on the organizations to which the performers belong from the credit candidate information and outputting it as performer affiliation related information;
Performer collation parameter selection means for selecting collation parameters according to the performer affiliation related information; and
Performer collation means for collating, using the collation parameters, the person feature quantities in a person feature quantity database with the person feature quantities extracted from the content and outputting the person identification information when they are determined to match.

The rights information extraction device according to claim 19, 25 or 26, wherein the person feature quantity includes at least a facial feature quantity of a person.

The rights information extraction device according to claim 19, 25 or 26, wherein the person feature quantity includes at least a feature quantity of a person's voice.

The rights information extraction device according to claim 19, wherein, when a person feature quantity extracted from the content matches none of the person feature quantities against which collation was attempted, the character recognition means outputs information identifying the video section that contains the person feature quantity, or its spatio-temporal position in the video, together with the person identification information.

A rights information extraction method comprising reading credit information related to rights from content and outputting the result as credit candidate information.

A rights information extraction method for extracting rights information related to rights from content, comprising:
A process of reading credit information related to rights from the content and outputting the result as credit candidate information;
A process of analyzing the content, recognizing an object related to rights in the content, and outputting the result as object identification information; and
A process of integrating the credit candidate information and the object identification information and outputting the result as rights information.

A rights information extraction method for extracting rights information related to rights from content, comprising:
A process of reading credit information related to rights from the content and outputting the result as credit candidate information;
A process of referring to the credit candidate information, analyzing the content, recognizing an object related to rights in the content, and outputting the result as object identification information; and
A process of integrating the credit candidate information and the object identification information and outputting the result as rights information.

The rights information extraction method according to claim 30, wherein the process of outputting the credit candidate information comprises:
A process of detecting a credit information section in the content in which credit information is superimposed; and
A process of reading credit information from the credit information section and outputting the result as credit candidate information.

A program for causing an information processing apparatus to execute a process of reading credit information related to rights from content and outputting the result as credit candidate information.

A program for causing an information processing apparatus to execute:
A process of reading credit information related to rights from content and outputting the result as credit candidate information;
A process of analyzing the content, recognizing an object related to rights in the content, and outputting the result as object identification information; and
A process of integrating the credit candidate information and the object identification information and outputting the result as rights information.

A program for causing an information processing apparatus to execute:
A process of reading credit information related to rights from content and outputting the result as credit candidate information;
A process of referring to the credit candidate information, analyzing the content, recognizing an object related to rights in the content, and outputting the result as object identification information; and
A process of integrating the credit candidate information and the object identification information and outputting the result as rights information.

The program according to claim 34, wherein the process of outputting the credit candidate information comprises:
A process of detecting a credit information section in the content in which credit information is superimposed; and
A process of reading credit information from the credit information section and outputting the result as credit candidate information.