JP2009060567A

JP2009060567A - Information processing apparatus, method, and program

Info

Publication number: JP2009060567A
Application number: JP2007303993A
Authority: JP
Inventors: Takeshi Takagi; 剛高木
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 2007-08-07
Filing date: 2007-11-26
Publication date: 2009-03-19
Anticipated expiration: 2027-11-26
Also published as: JP5051448B2

Abstract

PROBLEM TO BE SOLVED: To efficiently extract information of names of performers of a program within information included in an electronic program guide (EPG). SOLUTION: An EPG text data extraction part 13 extracts text data of an electronic program guide; a morphological analyzing part 15 performs morphological analysis on text information of the electronic program guide; a pattern comparison part 42 compares morphological analysis results of the morphological analyzing part 15 with a plurality of list patterns of predetermined performer names; and when there is a list pattern of predetermined performer names having matched at least one part or more out of the morphological analysis results on the basis of the comparison results, a performer name extraction means 43 extracts a performer name with the list pattern of the matched predetermined performer name. This invention can be applied to a content management system. COPYRIGHT: (C)2009,JPO&INPIT

Description

The present invention relates to an information processing apparatus, method, and program, and in particular to an information processing apparatus, method, and program that can efficiently extract the names of a content item's performers from the information included in the content's metadata.

Techniques for selecting a program (content) using an electronic program guide composed of content metadata, called an EPG (Electronic Program Guide), and for reserving a program selected on the EPG are becoming widespread.

A technique has been proposed that makes it possible to reliably and easily extract more appropriate keywords, as information, for use in automatic recording (see Patent Document 1).

A technique has also been proposed for reliably finding a desired program even when the program name included in the EPG is abbreviated as the program's time passes (see Patent Document 2).

Patent Document 1: JP 2006-339947 A
Patent Document 2: JP 2004-134858 A

Conventionally, however, when trying to extract the names of a program's performers from content metadata such as an EPG, morphological analysis can find person names but cannot tell whether a given person name is a role name or a performer name. A simple attempt to extract performer names therefore sometimes also extracted role names and other person names.

The present invention has been made in view of this situation and, in particular, makes it possible to efficiently extract the names of the performers of a program (content) from the information included in content metadata such as an electronic program guide (EPG).

An information processing apparatus according to one aspect of the present invention includes: acquisition means for acquiring content metadata; morphological analysis means for performing morphological analysis on text information included in the content metadata; comparison means for comparing the morphological analysis result of the morphological analysis means with a plurality of predetermined enumeration patterns of performer names; and extraction means for extracting performer names using the matched enumeration pattern when the comparison result of the comparison means shows that a predetermined enumeration pattern of performer names matches the morphological analysis result in at least one place.

The apparatus may further include layout recognition means for recognizing, from the morphological analysis result of the morphological analysis means, the layout of each item of described content. The comparison means may then compare the information outside the performer name column, within the layout of the morphological analysis result recognized by the layout recognition means, with the plurality of predetermined enumeration patterns of performer names.

The apparatus may further include: layout recognition means for recognizing, from the morphological analysis result of the morphological analysis means, the layout of each item of described content; similarity calculation means for calculating a similarity distance between the information inside the performer name column, within the layout of the morphological analysis result recognized by the layout recognition means, and each of the plurality of predetermined enumeration patterns of performer names; and second extraction means for extracting performer names from the morphological analysis result, based on the calculation result of the similarity calculation means, using the predetermined enumeration pattern of performer names whose similarity distance is smallest.

The predetermined enumeration patterns of performer names may include the patterns "performer name, symbol, performer name, symbol, ..."; "performer name, symbol, role name, performer name, ..."; "role name, symbol, performer name, role name, ..."; and "performer name, performer name, ...".

The content may include television programs, and the metadata may include information about those television programs.

An information processing method according to one aspect of the present invention includes: an acquisition step of acquiring content metadata; a morphological analysis step of performing morphological analysis on text information included in the content metadata; a comparison step of comparing the morphological analysis result obtained in the morphological analysis step with a plurality of predetermined enumeration patterns of performer names; and an extraction step of extracting performer names using the matched enumeration pattern when the comparison result obtained in the comparison step shows that a predetermined enumeration pattern of performer names matches the morphological analysis result in at least one place.

A program according to one aspect of the present invention causes a computer to execute processing that includes: an acquisition step of acquiring content metadata; a morphological analysis step of performing morphological analysis on text information included in the content metadata; a comparison step of comparing the morphological analysis result obtained in the morphological analysis step with a plurality of predetermined enumeration patterns of performer names; and an extraction step of extracting performer names using the matched enumeration pattern when the comparison result obtained in the comparison step shows that a predetermined enumeration pattern of performer names matches the morphological analysis result in at least one place.

The program storage medium of the present invention stores the program according to claim 4.

In the information processing apparatus, method, and program according to one aspect of the present invention, content metadata is acquired; text information included in the content metadata is morphologically analyzed; the morphological analysis result is compared with a plurality of predetermined enumeration patterns of performer names; and, when the comparison result shows that a predetermined enumeration pattern of performer names matches the morphological analysis result in at least one place, performer names are extracted using the matched enumeration pattern.

The information processing apparatus of the present invention may be an independent apparatus or a block that performs information processing.

According to one aspect of the present invention, the names of a content item's performers can be extracted efficiently from the information included in the content's metadata.

Embodiments of the present invention are described below. The correspondence between the invention described in this specification and the embodiments is illustrated as follows. This description is intended to confirm that embodiments supporting the invention described in this specification are in fact described in it. Accordingly, even if an embodiment appears among the embodiments of the invention but is not listed here as corresponding to the invention, that does not mean the embodiment does not correspond to the invention. Conversely, even if an embodiment is listed here as corresponding to the invention, that does not mean the embodiment does not correspond to inventions other than that invention.

Furthermore, this description does not cover every invention described in this specification. In other words, it does not deny the existence of inventions that are described in this specification but not claimed in this application, that is, inventions that may in the future be filed as divisional applications, or appear or be added through amendment.

That is, an information processing apparatus according to one aspect of the present invention includes: acquisition means for acquiring content metadata (for example, the EPG acquisition unit 12 or the iEPG acquisition unit 14 in FIG. 1); morphological analysis means for performing morphological analysis on text information included in the content metadata (for example, the morphological analysis unit 15 in FIG. 1); comparison means for comparing the morphological analysis result of the morphological analysis means with a plurality of predetermined enumeration patterns of performer names (for example, the pattern comparison unit 42 in FIG. 1); and extraction means for extracting performer names using the matched enumeration pattern when the comparison result of the comparison means shows that a predetermined enumeration pattern of performer names matches the morphological analysis result in at least one place (for example, the performer name extraction unit 43 in FIG. 1).

The apparatus may further include layout recognition means (for example, the layout recognition unit 20 in FIG. 1) for recognizing, from the morphological analysis result of the morphological analysis means, the layout of each item of described content. The comparison means (for example, the pattern comparison unit 42 in FIG. 1) may then compare the information outside the performer name column, within the layout of the morphological analysis result recognized by the layout recognition means, with the plurality of predetermined enumeration patterns of performer names.

The apparatus may further include: layout recognition means (for example, the layout recognition unit 20 in FIG. 1) for recognizing, from the morphological analysis result of the morphological analysis means, the layout of each item of described content; similarity calculation means (for example, the similarity distance calculation unit 33 in FIG. 1) for calculating a similarity distance between the information inside the performer name column, within the layout of the morphological analysis result recognized by the layout recognition means, and each of the plurality of predetermined enumeration patterns of performer names; and second extraction means (for example, the performer name extraction unit 35 in FIG. 1) for extracting performer names from the morphological analysis result, based on the calculation result of the similarity calculation means, using the predetermined enumeration pattern of performer names whose similarity distance is smallest.

An information processing method according to one aspect of the present invention includes: an EPG acquisition step of acquiring content metadata (for example, step S2 in FIG. 3); a morphological analysis step of performing morphological analysis on text information included in the content metadata (for example, step S4 in FIG. 3); a comparison step of comparing the morphological analysis result obtained in the morphological analysis step with a plurality of predetermined enumeration patterns of performer names (for example, step S53 in FIG. 10); and an extraction step of extracting performer names using the matched enumeration pattern when the comparison result obtained in the comparison step shows that a predetermined enumeration pattern of performer names matches the morphological analysis result in at least one place (for example, step S55 in FIG. 10).

FIG. 1 shows the configuration of an information processing apparatus according to an embodiment to which the present invention is applied.

The information processing apparatus 1 acquires an EPG (electronic program guide) composed of content metadata distributed over a network, typified by the Internet, or by broadcast waves. From the information on the programs (content) included in the EPG, it extracts performer names as keywords, and it displays the programs corresponding to the performer name selected from among the extracted names with the operation unit 5, such as a remote controller with operation buttons or a keyboard.

The receiving unit 11 receives broadcast waves via the antenna 2 and supplies them to the EPG acquisition unit 12 and the tuner 26. The EPG acquisition unit 12 acquires the EPG (electronic program guide) information from the signal supplied from the receiving unit 11 and supplies it to the EPG text data extraction unit 13, the layout recognition unit 20, and the program search unit 25.

The iEPG acquisition unit 14 accesses the EPG distribution server 4, specified by a predetermined URL (Uniform Resource Locator) or the like, via the network 3, typified by the Internet. It acquires the EPG information and supplies it to the EPG text data extraction unit 13, the layout recognition unit 20, and the program search unit 25.

The EPG text data extraction unit 13 extracts text data from the EPG information supplied from the EPG acquisition unit 12 or from the iEPG acquisition unit 14, and supplies the text data to the morphological analysis unit 15.

The morphological analysis unit 15 performs morphological analysis by dividing the text data of the EPG information into the smallest units of language (hereinafter referred to as words) and identifying the part of speech of each word by checking it against the information registered in the dictionary storage unit 16. It stores the result in the morphological analysis result buffer 17.
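The dictionary lookup described above can be pictured with a toy example. The following is a minimal sketch only, assuming a greedy longest-match segmenter over a tiny hand-written dictionary; the names `DICTIONARY` and `analyze`, the attribute labels, and the sample entries are all hypothetical and are not taken from the patent, and a real morphological analyzer for Japanese is considerably more involved.

```python
# Hypothetical toy dictionary: surface form -> attribute label.
# A real dictionary storage unit would hold far more entries and richer data.
DICTIONARY = {
    "Taro Yamada": "person",
    "Detective Sato": "person",
    "(": "symbol",
    ")": "symbol",
    ",": "symbol",
    " ": "symbol",
}


def analyze(text, dictionary=DICTIONARY):
    """Split text into (word, attribute) pairs by greedy longest match."""
    # Try longer dictionary entries first so multi-word names win over
    # the single characters they contain.
    entries = sorted(dictionary, key=len, reverse=True)
    result = []
    i = 0
    while i < len(text):
        for entry in entries:
            if text.startswith(entry, i):
                result.append((entry, dictionary[entry]))
                i += len(entry)
                break
        else:
            # No entry matched: emit the single character as "unknown".
            result.append((text[i], "unknown"))
            i += 1
    return result


# The list returned here plays the role of the analysis result buffer.
buffer_contents = analyze("Taro Yamada(Detective Sato)")
```

Note that both names come back with the same `person` attribute, which is exactly the ambiguity between performer names and role names that the enumeration patterns are meant to resolve.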

Based on the EPG information supplied from the EPG acquisition unit 12 or the iEPG acquisition unit 14, the layout recognition unit 20 recognizes the layout of each item of information displayed as the EPG and supplies the recognized layout information to the classification extraction unit 21.

The classification extraction unit 21 recognizes the position of the performer column, in which performer names are listed. Based on the layout information supplied from the layout recognition unit 20, it reads the information inside the performer column from the morphological analysis result buffer 17 and supplies it to the inside-performer-column determination unit 24, and it reads the information outside the performer column from the morphological analysis result buffer 17 and supplies it to the outside-performer-column determination unit 18. The performer column is described in detail later.
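As a rough illustration of this split, the sketch below assumes that each analyzed word carries a character offset and that the layout information reduces to a single half-open offset range for the performer column. Both assumptions, and the name `split_by_performer_column`, are hypothetical; the text does not specify how layout information is represented.

```python
def split_by_performer_column(tokens, column_span):
    """Partition (word, attribute, offset) tokens by a performer-column span.

    column_span is an assumed (start, end) half-open character range.
    Returns (inside, outside) lists of (word, attribute) pairs.
    """
    start, end = column_span
    inside, outside = [], []
    for word, attribute, offset in tokens:
        # Tokens whose offset falls in the span go to the in-column list.
        target = inside if start <= offset < end else outside
        target.append((word, attribute))
    return inside, outside


tokens = [
    ("Drama", "noun", 0),
    ("Taro", "person", 10),
    ("Hanako", "person", 15),
]
inside, outside = split_by_performer_column(tokens, (10, 20))
```

In this sketch `inside` would be handed to the in-column path and `outside` to the out-of-column path.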

The outside-performer-column determination unit 18 extracts performer names based on the morphological analysis results for the regions of the displayed EPG information other than the performer column, and stores them in the performer name extraction result storage unit 22.

The pattern extraction unit 41 of the outside-performer-column determination unit 18 sequentially reads each of the plurality of attribute enumeration patterns stored in the pattern storage unit 19. Following that enumeration pattern, it extracts the words in the region outside the performer column stored in the morphological analysis result buffer 17, together with their attribute information, and supplies them to the pattern comparison unit 42.

Here, an attribute pattern is an enumeration pattern over the attributes performer name, role name, foreign performer, Japanese voice actor, foreign role name, reading (kana), and group name; for example, the first through eighth patterns shown in FIG. 2.

The first pattern is, for example, "performer name，performer name", "performer name、performer name", "performer name・performer name", "performer name performer name", "performer name／performer name", or "performer name (line break) performer name": performer names are listed consecutively, with some symbol (including a space or a line break) between one performer name and the next.

The second pattern is, for example, "performer name (role name)" or "performer name → role name": a role name follows each performer name, set off by some symbol (including a space or a line break), and such pairs are listed consecutively.

The third pattern is, for example, "role name：performer name", "role name・・・performer name", "role name＿＿＿＿performer name", "role name＿performer name", or "role name・・performer name": a performer name follows each role name, separated by some symbol (including a space or a line break), and such pairs are listed consecutively.

The fourth pattern is, for example, "performer name (group name)": each performer name is followed by the name of the group to which the performer belongs, and such pairs are listed consecutively.

The fifth pattern is, for example, "foreign performer...Japanese voice actor" or "foreign performer (Japanese voice actor)": each foreign performer name is followed by the name of the Japanese dubbing actor, set off by some symbol, and such pairs are listed consecutively. A foreign performer name here is a person name written in katakana or the Latin alphabet.

The sixth pattern is, for example, "foreign role name＝foreign performer (Japanese voice actor)": a foreign role name is followed by a symbol, then by the foreign performer name, and then by the name of the Japanese dubbing actor in parentheses, and such groups are listed consecutively.

The seventh pattern is, for example, "foreign performer reading": each foreign performer name is followed by its reading in kana, and such pairs are listed consecutively.

The eighth pattern is, for example, "foreign role name (reading)・・・foreign performer (reading)": a foreign role name is followed by its reading in parentheses, then by some symbol, then by the foreign performer name, and then by that name's reading in parentheses, and such groups are listed consecutively.

In the first through eighth patterns, a performer name covers not only words whose part of speech is person name but also words with attributes that identify well-known people, such as actress name, actor name, and singer name. A role name covers not only words whose attribute denotes a position, such as "host" or "producer", but also the names of characters who appear in a story.
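One way to picture the eight patterns is as repeating units of attributes. The table below is only a sketch of that idea: the attribute labels, the names `PATTERN_UNITS` and `expand_pattern`, and the decision to leave separator symbols out of the units are all assumptions made for illustration, not the representation used in FIG. 2.

```python
# Each pattern is modeled as one repeating unit of attribute labels.
# Separator symbols (commas, brackets, spaces, line breaks) are omitted
# here as a simplification; the labels themselves are hypothetical.
PATTERN_UNITS = {
    1: ("performer",),
    2: ("performer", "role"),
    3: ("role", "performer"),
    4: ("performer", "group"),
    5: ("foreign_performer", "japanese_voice"),
    6: ("foreign_role", "foreign_performer", "japanese_voice"),
    7: ("foreign_performer", "reading"),
    8: ("foreign_role", "reading", "foreign_performer", "reading"),
}


def expand_pattern(pattern_id, repeats):
    """Return the attribute sequence for `repeats` consecutive units."""
    return PATTERN_UNITS[pattern_id] * repeats


# Two consecutive "performer name (role name)" entries:
two_entries = expand_pattern(2, 2)
```

Modeling each pattern as a unit that repeats keeps the table small while still covering listings of any length.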

The pattern extraction unit 41 extracts an enumeration of attributes from the morphological analysis result buffer 17 on the assumption that it follows one of the first through eighth enumeration patterns stored in the pattern storage unit 19. The pattern comparison unit 42 compares the extracted enumeration of attributes with the assumed enumeration pattern and determines whether they match.

Based on the comparison result of the pattern comparison unit 42, the performer name extraction unit 43 extracts the performer names using the matched enumeration pattern and stores them in the performer name extraction result storage unit 22.
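The compare-then-extract flow can be sketched as follows. This is a simplified illustration under several assumptions: only three patterns are modeled, separator symbols are ignored, the whole token run must consist of complete repetitions of a unit (the text only requires a match in at least one place), and the names `SLOT_ACCEPTS`, `UNITS`, `matches`, and `extract_performers` are hypothetical. The key point it shows is that a word tagged merely as a person name is assigned to "performer" or "role" by its position in the matched pattern.

```python
# Which word attributes each pattern slot accepts (assumed labels).
SLOT_ACCEPTS = {
    "performer": {"person", "actor", "actress", "singer"},
    "role": {"person", "title"},
}

# A reduced, hypothetical pattern table: one repeating unit per pattern.
UNITS = {
    1: ("performer",),
    2: ("performer", "role"),
    3: ("role", "performer"),
}


def matches(words, unit):
    """True if the words are one or more whole repetitions of the unit."""
    if not words or len(words) % len(unit) != 0:
        return False
    return all(
        attribute in SLOT_ACCEPTS[unit[i % len(unit)]]
        for i, (_, attribute) in enumerate(words)
    )


def extract_performers(tokens, units=UNITS):
    """Try each pattern in turn; extract names from its performer slots."""
    words = [(w, a) for w, a in tokens if a != "symbol"]
    for pattern_id, unit in units.items():
        if matches(words, unit):
            names = [
                w for i, (w, _) in enumerate(words)
                if unit[i % len(unit)] == "performer"
            ]
            return pattern_id, names
    return None, []


tokens = [
    ("Host", "title"), (":", "symbol"), ("Taro", "person"), ("/", "symbol"),
    ("Guest", "title"), (":", "symbol"), ("Hanako", "actress"),
]
pattern_id, names = extract_performers(tokens)
```

Because patterns are tried in table order, an ambiguous run such as several bare person names is resolved in favor of the earliest pattern that fits; that tie-breaking rule is a choice made for this sketch.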

The inside-performer-column determination unit 24 extracts performer names based on the morphological analysis results for the region of the displayed EPG information inside the performer column, and stores them in the performer name extraction result storage unit 22.

The attribute determination unit 31 determines the attribute of each word supplied from the classification extraction unit 21 and supplies the results to the pattern extraction unit 32. The pattern extraction unit 32 extracts an attribute pattern based on the attribute determination results supplied from the attribute determination unit 31 and supplies it to the similarity distance calculation unit 33. The similarity distance calculation unit 33 calculates a similarity distance, which indicates how similar the pattern supplied from the pattern extraction unit 32 is to each pattern stored in the pattern storage unit 19, and supplies the distances one after another to the pattern determination unit 34. Based on the similarity distances supplied from the similarity distance calculation unit 33, the pattern determination unit 34 takes the stored pattern with the smallest similarity distance to be the pattern extracted by the pattern extraction unit 32, and supplies the pattern thus determined to the performer name extraction unit 35. Based on the pattern supplied from the pattern determination unit 34, the performer name extraction unit 35 extracts only the performer names from the words supplied from the classification extraction unit 21 and stores them in the performer name extraction result storage unit 22.
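The passage above does not say how the similarity distance is computed. Purely as an illustration, the sketch below substitutes the Levenshtein edit distance between attribute sequences and picks the stored pattern with the smallest distance. The pattern table, the attribute labels, and the function names `edit_distance`, `expand`, and `closest_pattern` are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (assumed metric)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,             # delete x
                cur[j - 1] + 1,          # insert y
                prev[j - 1] + (x != y),  # substitute, or keep if equal
            ))
        prev = cur
    return prev[-1]


def expand(unit, length):
    """Repeat a pattern unit and cut it to the given length."""
    reps = -(-length // len(unit))  # ceiling division
    return (unit * reps)[:length]


def closest_pattern(observed, units):
    """Return (pattern_id, distance) for the smallest distance."""
    best_id, best = None, None
    for pattern_id, unit in units.items():
        d = edit_distance(observed, expand(unit, len(observed)))
        if best is None or d < best:
            best_id, best = pattern_id, d
    return best_id, best


UNITS = {
    1: ("performer",),
    2: ("performer", "role"),
    3: ("role", "performer"),
}

# One entry deviates from a clean "role, performer" listing.
observed = ("role", "performer", "role", "performer", "performer")
result = closest_pattern(observed, UNITS)
```

A distance-based choice tolerates small irregularities inside the performer column, where an exact match against a stored pattern would fail.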

The output unit 23 outputs the performer names stored in the performer name extraction result storage unit 22.

Next, the performer name extraction process is described with reference to the flowchart of FIG. 3.

In step S1, the EPG acquisition unit 12 or the iEPG acquisition unit 14 determines whether the operation unit 5 has been operated to request display of performer names, and repeats this check until it determines that the request has been made. For example, when an option tab 101 such as that shown in FIG. 4 is displayed and the button 116 labeled "Person name", which requests that performer names be displayed as person names, is operated, display of performer names is deemed to have been requested and processing proceeds to step S2.

FIG. 4 shows an example of an image displayed on the display unit 6. A display field 102 for the ordinary broadcast program selected by the tuner 26 is provided to the left of the option tab 101. The option tab 101 shows buttons 111 through 117, labeled from the top "HDD information", "DVD information", "Picture and sound quality settings", "Program recording", "Program description", "Person name", and "Keyword". The button 111 is operated to display information on programs recorded on an HDD (Hard Disk Drive), not shown. The button 112 is operated to display information on programs recorded on a DVD inserted in a DVD (Digital Versatile Disk) drive, not shown. The button 113 is operated to set picture and sound quality. The button 114 is operated to record a program. The button 115 is operated to display the program description, included in the EPG, of the program shown in the display field 102. The button 116 is operated to display, as person names, the performer names, included in the EPG, of the program shown in the display field 102. The button 117 is operated to display the keywords, included in the EPG, of the program shown in the display field 102.

In step S2, the EPG acquisition unit 12 acquires the EPG information of a predetermined program contained in the broadcast wave received by the antenna 2 via the reception unit 11, and supplies it to the EPG text data extraction unit 13 and the layout recognition unit 20. Alternatively, the iEPG acquisition unit 14 accesses the EPG distribution server 4 on the network 3, designated by a predetermined URL, acquires the EPG information of a predetermined program, and supplies it to the EPG text data extraction unit 13 and the layout recognition unit 20.

In step S3, the EPG text data extraction unit 13 extracts text data from the supplied EPG information and supplies it to the morphological analysis unit 15.

In step S4, the morphological analysis unit 15 divides the supplied EPG text data into words on the basis of the information stored in the dictionary storage unit 16, identifies the part of speech of each word, and stores the result in the morphological analysis result buffer 17. In this analysis, a noun that is a person's name can be assigned "person name" as its part of speech, and a person name such as that of a famous actor, actress, or singer can further be given an attribute specifying that it is an actor name, actress name, or singer name. The morphological analysis unit 15 therefore not only identifies the grammatical part of speech of each word but also classifies nouns as person names, product names, place names, and so on, and further classifies person names by attribute, such as actor name, actress name, or singer name.

In step S5, the layout recognition unit 20 recognizes the layout on the basis of the EPG display information supplied from the EPG acquisition unit 12 or the iEPG acquisition unit 14, and supplies the recognition result to the classification extraction unit 21. For example, when the EPG information is displayed as shown in FIG. 5, the layout recognition unit 20 recognizes the layout as follows.

In the case of FIG. 5, the layout recognition unit 20 recognizes the topmost area Z1, which reads "タコの瞳に恋してる〜あなたは運命の出会...", as the title display field. It recognizes the middle area Z2, which reads "The angelic woman whom the protagonist 山田おさむ (稲田吾郎) met turned out to be an octopus.... His colleague 竹内武 (大林南朋) ...", as the story description field. It further recognizes the bottom area Z3, which reads "Performers 稲田吾郎 (山田おさむ) 村下知子 (太口美幸) 蟹原友里 (蟹原友美) ＭＥＧＵＭＵ (代々木翔子) Screenplay マザー Director 三上義重 橋本圭太 and others Music 三菱紀人 Theme song: "恋の花" 倖田未来 (リズムゾンビ) Program description Three years ago, a married couple was born. They have amply met "being good at work", the age-old standard by which men are judged ...", as the detailed program description field. Although the order in which areas such as Z1 to Z3 are arranged may differ, EPG display screens contain areas of substantially the same structure, so the layout recognition unit 20 recognizes (estimates) these areas from their attributes.

Furthermore, within area Z3 the layout recognition unit 20 recognizes in particular the portion reading "出演者稲田吾郎（山田おさむ）村下知子（太口美幸）蟹原友里（蟹原友美）ＭＥＧＵＭＵ（代々木翔子）" (area Z3′, described later) as the performer column. That is, in the case of FIG. 5, the layout recognition unit 20 recognizes as the performer column the area forming a cluster of words that includes the label "出演者" ("Performers").

In step S6, the classification extraction unit 21 extracts, on the basis of the layout information, the words inside the performer column from the morphological analysis result buffer 17 and supplies them to the in-column performer determination unit 24.

In step S7, the classification extraction unit 21 extracts, on the basis of the layout information, the words outside the performer column from the morphological analysis result buffer 17 and supplies them to the out-of-column performer determination unit 18.

In step S8, the in-column performer determination unit 24 executes the in-column performer determination process, extracts the performer words from the words inside the performer column, and stores them in the performer name extraction result storage unit 22.

The in-column performer determination process is now described with reference to the flowchart of FIG. 6.

In step S31, the attribute determination unit 31 determines, for every word supplied from the classification extraction unit 21, whether it is a word registered under an attribute such as actor or actress, and supplies the determination results to the pattern extraction unit 32. That is, for each word in the supplied performer column, the attribute determination unit 31 determines whether it is a person name registered with an attribute denoting a real person, such as an actor or actress, rather than a name denoting a non-existent person, such as a role name.

In step S32, the pattern extraction unit 32 generates a determination pattern from the pattern of registered and unregistered results supplied by the attribute determination unit 31. For example, suppose the performer column is area Z3′ shown in the upper part of FIG. 7, the words "稲田吾郎", "村下知子", and "ＭＥＧＵＭＵ" are recognized as names of real persons such as actors and actresses, and the remaining words "山田おさむ", "太口美幸", "蟹原友里", "蟹原友美", and "代々木翔子" are recognized as names of non-existent persons. The registration pattern is then "yes", "no", "yes", "no", "no", "no", "yes", "no", as shown in the lower part of FIG. 7. Because performer names are names of real persons, a "yes" result indicates a performer name and a "no" result indicates a role name. The pattern extraction unit 32 therefore generates "出", "役", "出", "役", "役", "役", "出", "役" as the determination pattern and supplies it to the similarity distance calculation unit 33. Here, "出" denotes a performer name and "役" denotes a role name.
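As an illustration only, step S32 can be sketched in Python as follows. The function name `determination_pattern` and the use of a plain set for the registered names are assumptions made for this sketch; the specification does not prescribe a data structure.

```python
# Labels used in FIG. 7: "出" = performer name, "役" = role name.
PERFORMER = "出"
ROLE = "役"


def determination_pattern(names, registered):
    """Label each name as PERFORMER if it is registered as a real person, else ROLE.

    `registered` stands in for the dictionary lookup of step S31 (hypothetical).
    """
    return [PERFORMER if name in registered else ROLE for name in names]


# Words of area Z3' in the order in which they appear in FIG. 7.
names = ["稲田吾郎", "山田おさむ", "村下知子", "太口美幸",
         "蟹原友里", "蟹原友美", "ＭＥＧＵＭＵ", "代々木翔子"]
registered = {"稲田吾郎", "村下知子", "ＭＥＧＵＭＵ"}

pattern = determination_pattern(names, registered)
print(pattern)
```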

In step S33, the similarity distance calculation unit 33 initializes to 1 a counter i (not shown) that identifies the pattern.

In step S34, the similarity distance calculation unit 33 compares the determination pattern with the i-th pattern stored in the pattern storage unit 19 and counts the matching and non-matching elements. For example, suppose the determination pattern is "出", "役", "出", "役", "役", "役", "出", "役", as shown in the lower part of FIG. 7. When the counter i = 1, the first pattern P1 is "出", "出", "出", "出", "出", "出", "出", "出", as shown in the upper part of FIG. 8. Every "出" in the determination pattern is counted as a match and every "役" as a mismatch, so in this case there are 3 matches and 5 mismatches.

When the counter i = 2, the second pattern P2 is "出", "役", "出", "役", "出", "役", "出", "役", as shown in the middle part of FIG. 8, so there are 7 matches and 1 mismatch.

When the counter i = 3, the third pattern P3 is "役", "出", "役", "出", "役", "出", "役", "出", as shown in the lower part of FIG. 8, so there is 1 match and there are 7 mismatches.

In step S35, the similarity distance calculation unit 33 calculates the similarity distance between the determination pattern and the i-th pattern on the basis of these counts and supplies it to the pattern determination unit 34. Specifically, when the counter i is 1, the determination pattern consists of 8 elements, 5 of which do not match, so the similarity distance is calculated as 62.5% (= 5/8 × 100). The more similar the patterns, the closer the similarity distance is to 0%. Likewise, when the counter i is 2, the similarity distance is calculated as 12.5% (= 1/8 × 100), and when the counter i is 3, as 87.5% (= 7/8 × 100). The fourth to eighth patterns are processed in the same way, and their description is omitted.
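The count-based similarity distance of steps S34 and S35 can be sketched as below. This is a minimal illustration, assuming that the determination pattern and the stored pattern have the same length; the function name `similarity_distance` is hypothetical. Counting position by position, the FIG. 7 determination pattern differs from P1 at its five "役" positions, from P2 at one position, and from P3 at seven positions.

```python
def similarity_distance(determination, stored):
    """Percentage of positions at which the two patterns differ (0% = identical)."""
    if len(determination) != len(stored):
        raise ValueError("patterns must have the same length in this sketch")
    mismatches = sum(d != s for d, s in zip(determination, stored))
    return 100.0 * mismatches / len(determination)


determination = list("出役出役役役出役")   # determination pattern of FIG. 7
p1 = list("出" * 8)                        # pattern P1 of FIG. 8
p2 = list("出役" * 4)                      # pattern P2 of FIG. 8
p3 = list("役出" * 4)                      # pattern P3 of FIG. 8

distances = [similarity_distance(determination, p) for p in (p1, p2, p3)]
print(distances)
```

Of the three patterns, P2 gives the smallest distance, which is the pattern the later selection step picks for this example.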

The similarity distance is not limited to the definition given above; any other measure that expresses similarity quantitatively may be used, for example an edit graph algorithm.

The edit graph algorithm is, for example, as illustrated in FIG. 9. FIG. 9 shows, from left to right, the cases in which the counter i is 1, 2, and 3, with the i-th pattern and the determination pattern laid out at unit intervals along the X axis and the Y axis, respectively. Starting from the origin, the pattern elements at positions where the X and Y coordinates coincide are compared in turn. Where they match, a diagonal line is drawn; where they do not match, one unit line is drawn in the X direction and one in the Y direction. For the resulting graph, the similarity distance according to the edit graph algorithm is the sum obtained by counting each diagonal line as 0 and each horizontal or vertical line as 1.

Accordingly, when the counter i is 1, as shown in the left part of FIG. 9, every position corresponding to "役" in the determination pattern is a mismatch. The segment from (0, 0) to (1, 1) is a diagonal line; from (1, 1) to (2, 2) there are two lines, one in the X direction and one in the Y direction; from (2, 2) to (3, 3) there is a diagonal line; from (3, 3) to (6, 6) there are two lines per step, one in each direction; from (6, 6) to (7, 7) there is a diagonal line; and from (7, 7) to (8, 8) there are two lines, one in each direction. As a result, when the counter i is 1, the similarity distance using the edit graph algorithm is 10. Similarly, when the counter i is 2, as shown in the central part of FIG. 9, every segment is a diagonal line except the one from (4, 4) to (5, 5), which consists of two lines, so the similarity distance is 2. When the counter i is 3, as shown in the right part of FIG. 9, every segment consists of two lines except the one from (4, 4) to (5, 5), which is a diagonal line, so the similarity distance is 14.
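The edit-graph distance as it is described here, with elements compared only at coinciding X and Y positions, can be sketched as follows. This is a simplified illustration of that description, not a general shortest-path search over the whole edit graph, and the function name `edit_graph_distance` is hypothetical.

```python
def edit_graph_distance(determination, stored):
    """Sum of line costs along the path described for FIG. 9.

    A matching position contributes a diagonal line (cost 0); a non-matching
    position contributes one horizontal and one vertical unit line (cost 2).
    """
    distance = 0
    for d, s in zip(determination, stored):
        if d != s:
            distance += 2  # one line in the X direction + one in the Y direction
    return distance


determination = list("出役出役役役出役")   # determination pattern of FIG. 7
p1 = list("出" * 8)                        # pattern P1
p2 = list("出役" * 4)                      # pattern P2
p3 = list("役出" * 4)                      # pattern P3

edit_distances = [edit_graph_distance(determination, p) for p in (p1, p2, p3)]
print(edit_distances)
```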

In step S36, the similarity distance calculation unit 33 determines whether similarity distances have been calculated between the determination pattern and all patterns stored in the pattern storage unit 19. If they have not, the process proceeds to step S37, the counter i is incremented by 1, and the process returns to step S34. That is, steps S34 to S37 are repeated until the similarity distances to all patterns have been calculated. When it is determined in step S36 that the similarity distances to all patterns have been obtained, in step S38 the pattern determination unit 34 determines whether the smallest of the calculated similarity distances is smaller than a predetermined threshold, that is, whether the minimum similarity distance is a reliable value. If it is determined in step S38 that the minimum similarity distance is smaller than the predetermined threshold, and is therefore reliable, the process proceeds to step S39.

In step S39, the pattern determination unit 34 selects the pattern with the minimum similarity distance as the pattern to be used for extracting performer names, and supplies information on that pattern to the performer name extraction unit 35. The performer name extraction unit 35 extracts performer names from the words supplied from the classification extraction unit 21 on the basis of the pattern supplied from the pattern determination unit 34. For example, when area Z3′ of FIG. 7 is supplied as the performer column, the second pattern has the minimum similarity distance among the first to third patterns, so the performer name extraction unit 35 extracts "稲田吾郎", "村下知子", "蟹原友里", and "ＭＥＧＵＭＵ" in turn as performer names and, in step S40, stores them in the performer name extraction result storage unit 22.

On the other hand, if it is determined in step S38 that the minimum similarity distance is larger than the predetermined value and is therefore not reliable, in step S41 the pattern determination unit 34 uses the first pattern, so that all person names are extracted as performers, and in step S40 they are stored in the performer name extraction result storage unit 22.
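Steps S36 to S41 amount to choosing the closest stored pattern and falling back to the first pattern when even the closest one is not close enough. A minimal sketch under stated assumptions: the threshold values 30.0 and 10.0 are arbitrary illustrative numbers (the specification only speaks of a "predetermined threshold"), only the first three stored patterns are modeled, and all function names are hypothetical.

```python
PERFORMER = "出"  # performer-name label; "役" marks a role name


def mismatch_percent(determination, stored):
    """Count-based similarity distance in percent (0% = identical)."""
    mismatches = sum(d != s for d, s in zip(determination, stored))
    return 100.0 * mismatches / len(determination)


def choose_pattern(determination, stored_patterns, threshold):
    """Return the closest stored pattern, or the first pattern if it is unreliable."""
    distances = [mismatch_percent(determination, p) for p in stored_patterns]
    best = min(range(len(distances)), key=distances.__getitem__)
    if distances[best] < threshold:      # step S38: minimum distance is reliable
        return stored_patterns[best]     # step S39
    return stored_patterns[0]            # step S41: first pattern, all names kept


def extract_performers(names, pattern):
    """Keep the names whose position is labeled as a performer name."""
    return [name for name, label in zip(names, pattern) if label == PERFORMER]


names = ["稲田吾郎", "山田おさむ", "村下知子", "太口美幸",
         "蟹原友里", "蟹原友美", "ＭＥＧＵＭＵ", "代々木翔子"]
determination = list("出役出役役役出役")
stored = [list("出" * 8), list("出役" * 4), list("役出" * 4)]

confident = extract_performers(names, choose_pattern(determination, stored, 30.0))
fallback = extract_performers(names, choose_pattern(determination, stored, 10.0))
print(confident)
print(fallback)
```

With the looser threshold the second pattern is selected and every other name is kept; with the stricter one the first pattern is used and every name in the column is kept, trading precision for fewer omissions.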

Through the above processing, the performer column is identified within the EPG display screen, the pattern of performer names is determined, and performer names are extracted. The pattern in which performer names are arranged can thus be determined, using the similarity distance, within the performer column, an area in which performer names are likely to be arranged relatively regularly. As a result, the accuracy of recognizing the arrangement pattern of performer names can be improved. In addition, when the reliability of the similarity distance is low, extracting all person names in the performer column as performer names suppresses omissions in performer name extraction.

The description now returns to the flowchart of FIG. 3.

When the in-column performer determination process ends in step S8, in step S9 the out-of-column performer determination unit 18 executes the out-of-column performer determination process, extracts the performer words from the words outside the performer column, and stores them in the performer name extraction result storage unit 22.

The out-of-column performer determination process is now described with reference to the flowchart of FIG. 10.

In step S51, the pattern extraction unit 41 initializes to 1 a counter i (not shown) that identifies the pattern.

In step S52, the pattern extraction unit 41, assuming the enumeration pattern of attributes corresponding to the i-th pattern, sequentially extracts words from the morphological analysis results for the text outside the performer column supplied from the classification extraction unit 21, and supplies them to the pattern comparison unit 42. At this time, the pattern extraction unit 41 notifies the pattern comparison unit 42 that extraction is being performed with the i-th pattern.

In step S53, the pattern comparison unit 42 compares the attributes of the words that the pattern extraction unit 41 has sequentially extracted from the out-of-column morphological analysis results with the enumeration pattern of attributes in the i-th pattern.

For example, when text data such as that shown in FIG. 11 is extracted by the EPG text data extraction unit 13, the following comparison is made. In FIG. 11, the extracted text data reads: "This time, newlywed character actor 鳥見辰吾 and hard-drinking veteran actress 藤川弓子 take on Mr. Millionaire. The one to watch is 鳥見, who wins the center seat right away. Third time lucky, his head-to-head with Mr. Millionaire's のみたんも finally comes about. His 10-million-yen dream is "to give my friends luxury bicycles and go cycling". 鳥見's challenge begins.

However, the quiz is set aside as talk turns to behind-the-scenes stories from the filming of 『３年C組金九先生』. His "mentor" 武川鉄矢 reached the 14th question. Using his lifelines at the right moments, he somehow reaches the same 14th question as his mentor. What awaits 鳥見 is a sports question. Can 鳥見 surpass his mentor!? In addition, 藤川, who "came to gaze into のみ-san's eyes", has a 10-million-yen dream of "funds for the troupe's Tokyo performance". 出演者司会：のみたんもゲスト挑戦者：鳥見辰吾藤川弓子他 (Performers Host: のみたんも Guest challengers: 鳥見辰吾 藤川弓子 and others)".

Of this text, the phrase "新婚演技派俳優・鳥見辰吾、酒豪のベテラン女優・藤川弓子" ("newlywed character actor 鳥見辰吾, hard-drinking veteran actress 藤川弓子"), for example, is decomposed by morphological analysis into "新婚" (newlywed), "演技" (acting), "派" (school), "俳優" (actor), "・", "鳥見辰吾", "、", "酒豪" (heavy drinker), "の" (particle), "ベテラン" (veteran), "女優" (actress), "・", and "藤川弓子". When i = 1, that is, for the first pattern, the pattern extraction unit 41 assumes the sequence "performer name", "symbol", "performer name" and extracts successive runs of three consecutive words as patterns: first "新婚", "演技", "派"; then "演技", "派", "俳優"; then "派", "俳優", "・"; and so on, supplying each run to the pattern comparison unit 42.

The pattern comparison unit 42 compares the enumeration pattern of attributes of these three words, supplied from the pattern extraction unit 41, with the enumeration pattern of attributes in the first pattern.

In step S54, the pattern comparison unit 42 determines whether the enumeration patterns match. For example, in the phrase "新婚演技派俳優・鳥見辰吾、酒豪のベテラン女優・藤川弓子", the only person names are "鳥見辰吾" and "藤川弓子". Even if these are recognized as an actor name and an actress name, respectively, the pattern "performer name", "symbol", "performer name" is not formed, so it is determined that there is no match, and the process proceeds to step S55.

In step S55, the pattern comparison unit 42 determines whether all patterns stored in the pattern storage unit 19 have been tried. If they have not, in step S56 the pattern extraction unit 41 increments the counter i by 1, and the process returns to step S52.

On the other hand, in the latter part of the text data, the portion "出演者司会：のみたんもゲスト挑戦者：鳥見辰吾藤川弓子他" is split into the words "出演者" (performers), "司会" (host), "：", "のみたんも", "ゲスト" (guest), "挑戦者" (challenger), "：", "鳥見辰吾", "藤川弓子", and "他" (others). When the counter i = 2, the pattern extraction unit 41 assumes the sequence "role name", "symbol", "performer name" and extracts successive runs of three words: first "出演者", "司会", "："; then "司会", "：", "のみたんも"; then "：", "のみたんも", "ゲスト"; and so on, supplying each run to the pattern comparison unit 42.

In this case, if among the extracted words "司会", "：", "のみたんも" the word "司会" has the role-name attribute, "：" is a symbol, and "のみたんも" is registered as a celebrity, then in step S54 the pattern comparison unit 42 regards them as matching the third pattern, and the process proceeds to step S55.

In step S55, the pattern comparison unit 42 instructs the performer name extraction unit 43 to extract performer names using the matched pattern. The performer name extraction unit 43 then extracts performer names on the basis of the third pattern, "role name", "symbol", "performer name", and stores them in the performer name extraction result storage unit 22. The process then proceeds to step S56.

That is, in the latter part of the text data of FIG. 11, the word placed after the "symbol" is a performer name. The sequence "司会", "：", "のみたんも" has the attribute pattern "role name", "symbol", "performer name", so "のみたんも" is extracted as a performer; likewise, from the sequence "挑戦者", "：", "鳥見辰吾", "鳥見辰吾" is extracted as a performer. Both are stored in the performer name extraction result storage unit 22.

If it is determined in step S56 that all patterns have been tried, that is, in this example, in which the counter i identifying the enumeration pattern runs up to 8, if the counter i is greater than 8, then in step S58 the pattern comparison unit 42 determines whether none of the patterns matched. In this case the third pattern matched, so the processing of step S59 is skipped.

On the other hand, if it is determined in step S58 that no pattern matched, in step S59 the pattern comparison unit 42 instructs the performer name extraction unit 43 to extract performer names using the first pattern. That is, when no pattern matches, no person name would otherwise be extracted as a performer, so every word that can be read as a person name is extracted, provided it appears in a listing of names separated by some symbol.
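The out-of-column matching of steps S52 to S59 can be pictured as sliding a three-word window over the analyzed words and comparing the window's attributes with each stored enumeration pattern. The sketch below is illustrative only: the attribute labels ("performer", "role", "symbol", "person", "other"), the function name, and the restriction to two stored patterns are assumptions, and the fallback is simplified to return every word tagged as a person name without checking for separating symbols.

```python
# Two of the stored enumeration patterns (hypothetical attribute labels).
PATTERNS = [
    ("performer", "symbol", "performer"),
    ("role", "symbol", "performer"),
]


def extract_outside_column(tokens, patterns=PATTERNS):
    """tokens: list of (word, attribute) pairs from the morphological analysis."""
    found = []
    for pattern in patterns:                       # counter i over stored patterns
        width = len(pattern)
        for start in range(len(tokens) - width + 1):
            window = tokens[start:start + width]
            if tuple(attr for _, attr in window) == pattern:
                for word, attr in window:
                    if attr == "performer" and word not in found:
                        found.append(word)
    if found:
        return found
    # Simplified fallback: no pattern matched, so keep every person name.
    return [word for word, attr in tokens if attr in ("performer", "person")]


# Latter part of the FIG. 11 text, with assumed attributes.
fig11_tail = [
    ("出演者", "other"), ("司会", "role"), ("：", "symbol"),
    ("のみたんも", "performer"), ("ゲスト", "other"), ("挑戦者", "role"),
    ("：", "symbol"), ("鳥見辰吾", "performer"), ("藤川弓子", "performer"),
    ("他", "other"),
]
matched = extract_outside_column(fig11_tail)

# Names recognized only as generic person names: no pattern matches.
generic = [("黒石鈴子", "person"), ("：", "symbol"), ("谷川京子", "person")]
unmatched = extract_outside_column(generic)
print(matched, unmatched)
```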

Further, when text data such as that shown in FIG. 12 is extracted from the EPG data, the listing in its latter part, "黒石鈴子：谷川京子葛山春樹：小田恵介島中沙織：大林麻央大河外民雄：東村雅彦深倉ミチル：サチコ柏本マキ：大池栄子大河外孝信：岩井正則黒石徹：天野ひろき葛山道造：中爪功", follows either the first pattern, "performer name", "symbol", "performer name", or the second pattern, "role name", "symbol", "performer name". That is, in a drama or the like, a person name may be recognized either as a role name or as a performer name. For example, supposing that the listing actually follows the third pattern, if "小田恵介" is recognized by the morphological analysis as the name of a famous actor, then at least "葛山春樹", "：", "小田恵介" is recognized as "role name", "symbol", "performer name". At least one sequence is thus regarded as matching the third pattern, so "谷川京子", "小田恵介", "大林麻央", "東村雅彦", "サチコ", "大池栄子", "岩井正則", "天野ひろき", and "中爪功" are extracted as performer names.

Also, if every name in the latter part of FIG. 12 can be recognized only as a person name, no pattern is matched in step S58, so in step S59 all the person names, namely "黒石鈴子", "谷川京子", "葛山春樹", "小田恵介", "島中沙織", "大林麻央", "大河外民雄", "東村雅彦", "深倉ミチル", "サチコ", "柏本マキ", "大池栄子", "大河外孝信", "岩井正則", "黒石徹", "天野ひろき", "葛山道造", and "中爪功", are extracted as performer names. In this case the extracted names may include errors, but at least all performer names are displayed.

Through the above processing, enumeration patterns of attributes in which performers appear are set in advance, the morphological analysis results are compared with these patterns, and performers are extracted on the basis of the matching enumeration pattern, which makes it possible to extract performers efficiently.

When the out-of-column performer determination process ends in step S9, in step S10 the output unit 23 reads the performer names stored in the performer name extraction result storage unit 22 and displays them on the display unit 6.

As a result of this processing, the display unit 6 displays the performer names as person names on a screen such as the one shown in FIG. 13. In FIG. 13, a person name display field 121 is provided to the right of the ordinary broadcast program display field 102, and buttons 131 to 133, which are operated to select an extracted person name, are provided for the respective extracted performer names. In FIG. 13, the button 131 is provided for the performer name "Shacho Bucho", the button 132 for the performer name "Bekio", and the button 133 for the performer name "Emiri Henna".

In step S11, the program search unit 25 determines whether the operation unit 5 has been operated to press one of the buttons 131 to 133 and thereby select a person name, that is, a performer name. For example, when the button 131 in FIG. 13 is operated through the operation unit 5 and the keyword "Shacho Bucho" is selected, in step S12 the program search unit 25 searches the EPG information supplied from the EPG acquisition unit 12 or the iEPG acquisition unit 14 for programs whose program information contains the keyword "Shacho Bucho". In step S13, it displays the search result on the display unit 6, for example as shown in FIG. 14. If no name is selected in step S11, the processing of steps S12 and S13 is skipped.
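The keyword search of step S12 amounts to a substring test over the program information. The following is a minimal sketch under assumed data: the record layout with `start`, `title`, and `info` keys and the sample programs are hypothetical and are not taken from the EPG format.

```python
def search_programs(programs, keyword):
    """Return the programs whose program information contains the keyword."""
    return [p for p in programs if keyword in p["info"]]


# Hypothetical EPG records; the field names are assumptions for this sketch.
programs = [
    {"start": "1:05 AM", "title": "Movie Theater",
     "info": "Cast: Shacho Bucho, Bekio"},
    {"start": "9:30 PM", "title": "Thursday Western Movie Theater",
     "info": "Cast: Emiri Henna"},
    {"start": "11:00 PM", "title": "Movie (free broadcast)",
     "info": "Guest: Shacho Bucho"},
]

for hit in search_programs(programs, "Shacho Bucho"):
    print(hit["start"], hit["title"])
```

A plain substring test keeps the sketch short; it would also match a keyword that happens to be part of a longer name, which a real search might need to guard against.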

In FIG. 14, a selected keyword tab 151 shows the selected keyword, here "Shacho Bucho". Below it is a search result display field 152, which lists the programs found with the selected keyword. In FIG. 14, the rows read, from the top: "Tomorrow 1:05 AM Movie Theater 'Beyond the Table'"; "2:30 AM Howbiz Extra #201"; "9:30 PM Thursday Western Movie Theater 'Indian Game'"; "0:00 AM Indies Movie Festival - Independent Productions"; "050 AM Movie Theater 'My Home'"; "2:30 AM Billy Talks About Himself"; and "11:00 PM Movie 'Tomb and Marriage' (free broadcast)". Each row shows a program name and its broadcast time. Selecting one of these rows may, for example, allow a recording reservation to be made. Below the search result display field 152, a button 153 labeled "Back" is provided on the right; it is operated to end the display of the selected keyword tab 151 and return to the previous screen. To the left of the button 153 is a button 154 labeled "Options", which is operated to execute an option operation.

In step S14, the program search unit 25 determines whether the operation unit 5 has been operated to instruct word registration. For example, when the button 154 is operated through the operation unit 5, an option operation dialog box 171 is displayed as shown in FIG. 15. When the button 181 labeled "Register word" in the option operation dialog box 171 is then pressed to instruct word registration, in step S15 the program search unit 25 stores the currently selected word, "Shacho Bucho", in the performer name extraction result storage unit 22. As a result, a performer name for which word registration has been instructed is always displayed in the person name display field 121, even when it is not contained in the EPG data.

The option operation dialog box 171 of FIG. 15 contains the button 181, which is operated to instruct "Register word", and a button 182, which is operated to cancel the option operation.

If word registration is not instructed in step S14, the processing of step S15 is skipped.
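The effect of word registration in steps S14 and S15 can be illustrated with a small store that keeps registered names separate from the names extracted for the current EPG data. This is only a sketch: the class `PerformerNameStore` and its methods are hypothetical, and the specification does not say how the performer name extraction result storage unit 22 organizes its contents.

```python
class PerformerNameStore:
    """Hypothetical stand-in for the performer name extraction result storage."""

    def __init__(self):
        self.extracted = []   # names extracted from the current EPG data
        self.registered = []  # names kept by an explicit "Register word"

    def set_extracted(self, names):
        """Replace the extracted names when new EPG data is processed."""
        self.extracted = list(names)

    def register(self, name):
        """Keep a name permanently, ignoring duplicate registrations."""
        if name not in self.registered:
            self.registered.append(name)

    def names_to_display(self):
        """Registered names first, then extracted names, without duplicates."""
        seen, result = set(), []
        for name in self.registered + self.extracted:
            if name not in seen:
                seen.add(name)
                result.append(name)
        return result


store = PerformerNameStore()
store.set_extracted(["Shacho Bucho", "Bekio"])
store.register("Shacho Bucho")
store.set_extracted(["Emiri Henna"])  # new EPG data without the registered name
print(store.names_to_display())
```

Keeping the two lists apart is one way to make a registered name survive a change of EPG data; showing registered names first is an arbitrary choice made for the example.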

In step S16, it is determined whether termination has been instructed. If not, the processing returns to step S11; if so, the processing ends.

According to the above processing, the area of the performer column is identified from the layout information on the basis of the information contained in the electronic program guide (EPG). Because performer names within the performer column are likely to be arranged regularly, the pattern is analyzed from the arrangement of the information that excludes symbols, namely performer names and role names, and performer names are extracted on the basis of the analyzed pattern. Performer names can therefore be extracted with higher accuracy.

For information outside the performer column, performer names may be arranged less regularly than within the column. The pattern is therefore analyzed from the arrangement of symbols in addition to performer names and role names, and performer names are extracted on the basis of the analyzed pattern. Here, too, performer names can be extracted with higher accuracy.

As a result, switching the extraction method according to whether the text lies inside or outside the performer column makes it possible to extract performer names both accurately and efficiently.
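For text inside the performer column, the claims describe choosing the enumeration pattern with the minimum similarity distance. One way to picture this is an edit distance between the attribute sequence and each candidate pattern. The sketch below uses a standard Levenshtein distance and repeats each pattern to the length of the input; both choices, along with the attribute labels and pattern names, are assumptions for illustration and may differ from the edit graph computation described in the specification.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def tile(pattern, length):
    """Repeat a pattern cyclically until it has the given length."""
    return [pattern[i % len(pattern)] for i in range(length)]


def closest_pattern(attrs, patterns):
    """Return the name of the pattern with the minimum edit distance."""
    return min(patterns,
               key=lambda name: edit_distance(attrs, tile(patterns[name], len(attrs))))


# Hypothetical patterns without symbols, as used inside the performer column.
patterns = {
    "performers_only": ("performer",),
    "role_then_performer": ("role", "performer"),
}
attrs = ["role", "performer", "role", "performer", "role"]
print(closest_pattern(attrs, patterns))
```

Outside the performer column the same attribute sequence would instead be compared against patterns that include symbols, as in the partial matching described for FIG. 12.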

The above description deals with an example in which the content metadata is an EPG. However, any metadata that serves as additional information for content may be used instead of an EPG, for example an ECG (Electronic Contents Guide).

Furthermore, the above description deals with an example in which the content is a television program. However, any content provided with metadata may be used. For example, the content may be moving image content or music content downloaded via a network, or moving image content or music content stored on a data storage medium such as a DVD (Digital Versatile Disc) or a BD (Blu-Ray Disc).

The series of text processing operations described above can be executed by hardware, but it can also be executed by software. When the series of operations is executed by software, the program constituting the software is installed from a recording medium onto a computer built into dedicated hardware or onto, for example, a general-purpose personal computer that can execute various functions when various programs are installed.

FIG. 16 shows a configuration example of a general-purpose personal computer. The personal computer incorporates a CPU (Central Processing Unit) 1001. An input/output interface 1005 is connected to the CPU 1001 via a bus 1004. A ROM (Read Only Memory) 1002 and a RAM (Random Access Memory) 1003 are also connected to the bus 1004.

Connected to the input/output interface 1005 are an input unit 1006 made up of input devices such as a keyboard and a mouse with which the user enters operation commands, an output unit 1007 that outputs processing operation screens and images of processing results to a display device, a storage unit 1008 made up of a hard disk drive or the like that stores programs and various data, and a communication unit 1009 made up of a LAN (Local Area Network) adapter or the like that executes communication processing via a network such as the Internet. Also connected is a drive 1010 that reads and writes data from and to a removable medium 1011 such as a magnetic disk (including a flexible disk), an optical disc (including a CD-ROM (Compact Disc-Read Only Memory) and a DVD (Digital Versatile Disc)), a magneto-optical disk (including an MD (Mini Disc)), or a semiconductor memory.

The CPU 1001 executes various kinds of processing in accordance with a program stored in the ROM 1002, or in accordance with a program that has been read from the removable medium 1011 (a magnetic disk, an optical disc, a magneto-optical disk, a semiconductor memory, or the like), installed in the storage unit 1008, and loaded from the storage unit 1008 into the RAM 1003. The RAM 1003 also stores, as appropriate, the data that the CPU 1001 needs in order to execute the various kinds of processing.

In this specification, the steps describing the program recorded on the recording medium include not only processing performed in time series in the described order but also processing that is not necessarily performed in time series and is instead executed in parallel or individually.

FIG. 1 is a block diagram showing a configuration example of an information processing apparatus to which the present invention is applied.
FIG. 2 is a diagram explaining enumeration patterns of performer names.
FIG. 3 is a flowchart explaining the performer name extraction process.
FIG. 4 is a diagram showing an example of a display image when the performer name extraction process is executed.
FIG. 5 is a diagram explaining the performer name extraction process.
FIG. 6 is a flowchart explaining the performer in-column determination process.
FIG. 7 is a diagram explaining the performer in-column determination process.
FIG. 8 is a diagram explaining a method of calculating the similarity distance.
FIG. 9 is a diagram explaining a method of calculating the similarity distance using an edit graph algorithm.
FIG. 10 is a flowchart explaining the performer out-of-column determination process.
FIG. 11 is a diagram explaining the performer out-of-column determination process.
FIG. 12 is a diagram explaining the performer out-of-column determination process.
FIG. 13 is a diagram explaining a display example of the person name display screen.
FIG. 14 is a diagram explaining a display example of the display screen when a person name is selected.
FIG. 15 is a diagram explaining a display example of the display screen when word registration is instructed.
FIG. 16 is a diagram explaining a configuration example of a personal computer.

Explanation of symbols

1 information processing apparatus, 13 EPG text data extraction unit, 15 morphological analysis unit, 16 dictionary storage unit, 17 morphological analysis result buffer, 18 performer out-of-column determination unit, 19 pattern storage unit, 20 layout recognition unit, 21 sorting extraction unit, 22 performer name extraction storage unit, 23 output unit, 24 performer in-column determination unit, 25 program search unit, 26 tuner, 31 attribute determination unit, 32 pattern extraction unit, 33 similarity distance calculation unit, 34 pattern determination unit, 35 performer name extraction unit, 41 pattern extraction unit, 42 pattern comparison unit, 43 performer name extraction unit

Claims

An information processing apparatus comprising:
acquisition means for acquiring metadata of content;
morphological analysis means for performing morphological analysis on text information included in the metadata of the content;
comparison means for comparing a morphological analysis result of the morphological analysis means with a plurality of predetermined performer-name enumeration patterns; and
first extraction means for extracting, when the comparison result of the comparison means shows that a predetermined performer-name enumeration pattern matches at least one place in the morphological analysis result, performer names using the matched predetermined performer-name enumeration pattern.

The information processing apparatus according to claim 1, further comprising layout recognition means for recognizing, from the morphological analysis result of the morphological analysis means, a layout for each item of described content,
wherein the comparison means compares information outside the performer name column, within the layout of the morphological analysis result recognized by the layout recognition means, with the plurality of predetermined performer-name enumeration patterns.

The information processing apparatus according to claim 1, further comprising:
layout recognition means for recognizing, from the morphological analysis result of the morphological analysis means, a layout for each item of described content;
similarity distance calculation means for calculating a similarity distance between information inside the performer name column, within the layout of the morphological analysis result recognized by the layout recognition means, and each of the plurality of predetermined performer-name enumeration patterns; and
second extraction means for extracting, on the basis of the calculation result of the similarity distance calculation means, performer names from the morphological analysis result using the predetermined performer-name enumeration pattern having the minimum similarity distance.

The information processing apparatus according to claim 1, wherein the predetermined performer-name enumeration patterns include an enumeration pattern of performer name, symbol, performer name, symbol, ...; performer name, symbol, role name, performer name, ...; or role name, symbol, performer name, role name, ....

The information processing apparatus according to claim 1, wherein the content includes a television program, and the metadata includes information related to the television program.

An information processing method comprising:
an acquisition step of acquiring metadata of content;
a morphological analysis step of performing morphological analysis on text information included in the metadata of the content;
a comparison step of comparing a morphological analysis result obtained in the morphological analysis step with a plurality of predetermined performer-name enumeration patterns; and
a first extraction step of extracting, when the comparison result of the comparison step shows that a predetermined performer-name enumeration pattern matches at least one place in the morphological analysis result, performer names using the matched predetermined performer-name enumeration pattern.

A program for causing a computer to execute processing comprising:
an acquisition step of acquiring metadata of content;
a morphological analysis step of performing morphological analysis on text information included in the metadata of the content;
a comparison step of comparing a morphological analysis result obtained in the morphological analysis step with a plurality of predetermined performer-name enumeration patterns; and
a first extraction step of extracting, when the comparison result of the comparison step shows that a predetermined performer-name enumeration pattern matches at least one place in the morphological analysis result, performer names using the matched predetermined performer-name enumeration pattern.

A program storage medium in which the program according to claim 7 is stored.