JP2009043189A

JP2009043189A - Information processor, information processing method, and program

Info

Publication number: JP2009043189A
Application number: JP2007210309A
Authority: JP
Inventors: Shunji Yoshimura
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 2007-08-10
Filing date: 2007-08-10
Publication date: 2009-02-26

Abstract

PROBLEM TO BE SOLVED: To allow a user to see what a piece of content is about, and to designate a playback position, from words selected on the basis of a value that takes into account both the appearance frequency of each word and the bias of its appearance positions. SOLUTION: A moving window is set for every range of a predetermined number of words in the word sequence obtained by morphological analysis of subtitle data. All the words in each moving window are treated as one document, and a TF-IDF (Term Frequency-Inverted Document Frequency) value is calculated for each word. For example, the TF-IDF values of the words are compared with one another, and words are selected as characteristic words in descending order of TF-IDF value. When the user selects one of the characteristic words displayed on the program detail screen, playback of the recorded program starts from the position at which a sentence containing the selected characteristic word is displayed as a subtitle. The invention is applicable to a digital recording apparatus. COPYRIGHT: (C)2009,JPO&INPIT

Description

The present invention relates to an information processing apparatus, an information processing method, and a program. In particular, it relates to an apparatus, method, and program that let a user see what a piece of content is about, and designate a playback position, from words selected on the basis of a value calculated from the number of appearances of each word and the bias of its appearance positions.

Various methods have been proposed for designating the playback position of content such as a recorded program. For example, Patent Literature 1 discloses a technique that searches the subtitles of content for a keyword entered by the user. It then outputs, as a scene of interest to the user, a scene in which a sentence containing that keyword is displayed as a subtitle.

Patent Literature 1: JP 2006-129122 A
Patent Literature 2: JP 2003-16091 A

The method described above works when the user knows clearly what kind of scene in the content they want to see. It cannot be used when they do not.

In addition, the user does not know how important the entered keyword is within the content. As a result, a scene may be played back in which the keyword appears in the subtitles even though the scene itself has little to do with it.

Some apparatuses therefore analyze the subtitles to select words that represent the features of the content. They present the selected words and let the user enter a keyword by choosing from among them. However, simply selecting the most frequent words is not appropriate. Words that appear often in the subtitles of every scene, and that do not characterize any particular scene, would then be presented to the user and used for scene selection.

For scene selection, it is therefore preferable to present words that rarely appear in the subtitles of other scenes but are concentrated in a specific scene.

The present invention has been made in view of these circumstances. It allows a user to see what a piece of content is about, and to designate a playback position, from words selected on the basis of a value calculated from the number of appearances of each word and the bias of its appearance positions.

An information processing apparatus according to one aspect of the present invention includes the following means:
analysis means for analyzing the subtitles of content and acquiring the words constituting the subtitles;
calculation means for calculating, for each word acquired by the analysis means, a weighting factor that takes into account the number of appearances and the bias of the appearance positions;
selection means for selecting, on the basis of the weighting factors calculated by the calculation means, a predetermined number of feature words representing the features of the content; and
playback means for displaying the feature words selected by the selection means, and for playing back the content from the position at which a sentence containing the feature word chosen from the displayed words is displayed as a subtitle.

The calculation means may generate a sequence in which the words acquired by the analysis means are arranged in display order, and set a detection window for every range of a predetermined number of words in that sequence. Focusing on each word in each detection window, it then calculates the weighting factor so that the factor increases as TF increases and decreases as DF increases. Here TF is the number of appearances of the word of interest within the detection window containing it. DF is the number of detection windows, among all the detection windows, in which the word of interest appears.

The selection means may compare the weighting factors calculated by the calculation means and select the feature words in one of two ways. It may take a predetermined number of words in descending order of their maximum value within the same content. Alternatively, it may take the words whose maximum value within the same content exceeds a threshold.

The playback means may display the feature words selected by the selection means together with information about the content acquired from EPG data.

An information processing method or program according to one aspect of the present invention includes the following steps:
analyzing the subtitles of content and acquiring the words constituting them;
calculating, for each acquired word, a weighting factor that takes into account the number of appearances and the bias of the appearance positions;
selecting, on the basis of the calculated weighting factors, a predetermined number of feature words representing the features of the content;
displaying the selected feature words; and
playing back the content from the position at which a sentence containing the feature word chosen from the displayed words is displayed as a subtitle.

In one aspect of the present invention, the subtitles of content are analyzed and the words constituting them are acquired. For each acquired word, a weighting factor is calculated that takes into account the number of appearances and the bias of the appearance positions. On the basis of these factors, a predetermined number of feature words representing the features of the content are selected and displayed. The content is then played back from the position at which a sentence containing the feature word chosen from the displayed words is displayed as a subtitle.

According to one aspect of the present invention, the user can see what the content is about, and designate a playback position, from words selected on the basis of a value calculated from the number of appearances of each word and the bias of its appearance positions.

Embodiments of the present invention are described below. The correspondence between the constituent elements of the present invention and the embodiments in the specification or drawings is exemplified as follows. This description confirms that embodiments supporting the present invention appear in the specification or drawings. An embodiment may appear in the specification or drawings without being listed here as corresponding to a constituent element of the present invention. That does not mean the embodiment fails to correspond to that element. Conversely, an embodiment listed here as corresponding to an element may also correspond to elements other than that one.

An information processing apparatus according to one aspect of the present invention (for example, the information processing apparatus 1 of FIG. 1) includes the following means:
analysis means (for example, the morphological analysis unit 33 of FIG. 3) for analyzing the subtitles of content and acquiring the words constituting them;
calculation means (for example, the TF-IDF value calculation unit 34 of FIG. 3) for calculating, for each word acquired by the analysis means, a weighting factor that takes into account the number of appearances and the bias of the appearance positions;
selection means (for example, the feature word selection unit 35 of FIG. 3) for selecting, on the basis of the calculated weighting factors, a predetermined number of feature words representing the features of the content; and
playback means (for example, the content playback unit 37 of FIG. 3) for displaying the selected feature words, and for playing back the content from the position at which a sentence containing the feature word chosen from the displayed words is displayed as a subtitle.

An information processing method or program according to one aspect of the present invention includes the following steps:
analyzing the subtitles of content and acquiring the words constituting them;
calculating, for each acquired word, a weighting factor that takes into account the number of appearances and the bias of the appearance positions;
selecting, on the basis of the calculated weighting factors, a predetermined number of feature words representing the features of the content;
displaying the selected feature words; and
playing back the content from the position at which a sentence containing the feature word chosen from the displayed words is displayed as a subtitle (for example, step S16 of FIG. 12).

Embodiments of the present invention are described below with reference to the drawings.

FIG. 1 shows an information processing apparatus 1 according to an embodiment of the present invention.

The information processing apparatus 1 is a digital recording device with a built-in hard disk. A television receiver 2 is connected to the information processing apparatus 1 via a cable.

The information processing apparatus 1 handles programs (content) provided by BS (Broadcasting Satellite) and CS (Communications Satellite) digital broadcasting, terrestrial digital broadcasting, or broadcasting over the Internet. It outputs their video and audio from the television receiver 2, and records programs by storing their data on the hard disk. To this end, signals from an antenna (not shown) and the like are supplied to the information processing apparatus 1. It also plays back a recorded program in response to a user instruction and outputs the program's video and audio from the television receiver 2.

The information processing apparatus 1 also acquires and manages EPG data distributed by broadcasting stations and others via broadcast waves or the Internet. For each program, the EPG data includes information such as the program title, broadcast date and time, performers, and a synopsis.

The information processing apparatus 1 can extract words considered to represent the characteristics of each recorded program. It does so by analyzing the subtitle data included in the program data, such as closed caption data. The extracted words are not general words that also appear in the subtitle data of other programs. As described later, they are words that appear frequently within the subtitle data of a specific program and whose appearance positions are biased.

The information processing apparatus 1 presents the extracted words to the user together with information such as the program title. They are shown either on the title list screen, which lists the recorded programs, or on the program detail screen displayed when a program is selected from the title list. When the user selects one of the presented words and instructs playback, the apparatus starts playing the recorded program from the position at which a sentence containing the selected word is displayed as a subtitle.

Because such words are presented for each recorded program, the user can get an overview of the topics covered in each program from words that represent its characteristics more appropriately. Words that appear frequently within one program's subtitle data and whose appearance positions are biased characterize the program better than general words, which also appear often in the subtitle data of other programs.

The user can also select, from among these words, a word related to a scene of interest. Playback then starts from the position at which a sentence containing the selected word is displayed as a subtitle.

The processing by which the information processing apparatus 1 extracts characteristic words, presents them to the user, and controls playback of recorded programs is described later with reference to a flowchart.

FIG. 2 is a block diagram showing an example hardware configuration of the information processing apparatus 1.

A CPU (Central Processing Unit) 11 executes various processes according to programs recorded in a ROM (Read Only Memory) 12 or a recording unit 19. A RAM (Random Access Memory) 13 stores programs executed by the CPU 11, data, and the like as appropriate. The CPU 11, ROM 12, and RAM 13 are interconnected by a bus 14.

An input/output interface 15 is also connected to the CPU 11 via the bus 14. A receiving unit 16, an input unit 17, an output unit 18, the recording unit 19, a communication unit 20, and a drive 21 are connected to the input/output interface 15.

The receiving unit 16 receives and demodulates broadcast signals from an antenna 16A to obtain an MPEG-TS (Moving Picture Experts Group Transport Stream). From the MPEG-TS it extracts the data of programs scheduled for recording and the EPG data. It outputs them to the recording unit 19 via the input/output interface 15.

The input unit 17 receives signals from a remote controller and outputs information representing the user's operations to the CPU 11 via the input/output interface 15 and the bus 14.

The output unit 18 decodes the data of the program whose playback has been instructed. It then displays the program's video on the television receiver 2 on the basis of the resulting video signal.

The recording unit 19 consists of, for example, a hard disk. It records various data, such as programs executed by the CPU 11, program data supplied from the receiving unit 16 via the input/output interface 15, and EPG data. The data of each recorded program is associated with the words extracted from its subtitle data and with information about the program acquired from the EPG data.

The communication unit 20 communicates with a server. It acquires the data of programs distributed by broadcasting over the Internet, as well as EPG data distributed by the server. The communication unit 20 outputs the acquired data to the recording unit 19 via the input/output interface 15 for recording.

When removable media 22 is loaded, such as a magnetic disk, optical disk, magneto-optical disk, or semiconductor memory, the drive 21 drives it and acquires the programs and data recorded on it. The acquired programs and data are transferred to and recorded in the recording unit 19 as necessary.

FIG. 3 is a block diagram showing an example functional configuration of the information processing apparatus 1. At least some of the functional units shown in FIG. 3 are implemented by the CPU 11 of FIG. 2 executing a predetermined program.

As shown in FIG. 3, the information processing apparatus 1 implements the following units: a content recording unit 31, a subtitle data acquisition unit 32, a morphological analysis unit 33, a TF-IDF (Term Frequency-Inverted Document Frequency) value calculation unit 34, a feature word selection unit 35, a content information display unit 36, and a content playback unit 37.

The content recording unit 31 is implemented in the recording unit 19 of FIG. 2. It manages program data in association with information about each program acquired from the EPG data. Words representing a program's characteristics are extracted from the subtitle data and supplied by the feature word selection unit 35. The content recording unit 31 also manages these words in association with the program data.

The subtitle data acquisition unit 32 takes each program whose data is recorded in the content recording unit 31 in turn as the program of interest. It reads the subtitle data included in that program's data from the content recording unit 31 and outputs it to the morphological analysis unit 33.

The morphological analysis unit 33 performs morphological analysis on the subtitle data supplied from the subtitle data acquisition unit 32. It divides the sentences displayed as subtitles into words and generates a word sequence in which the words are arranged in display order. It outputs the generated word sequence to the TF-IDF value calculation unit 34.
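To make the shape of this word sequence concrete, the Python sketch below pairs each word with the display time of its subtitle and keeps display order. It is only an illustration under stated assumptions. The `Subtitle` record and the `build_word_sequence` helper are hypothetical names. Splitting on whitespace stands in for the morphological analysis described above; Japanese subtitles would actually require a morphological analyzer.

```python
from dataclasses import dataclass


@dataclass
class Subtitle:
    start_sec: float  # hypothetical: display time measured from the start of the program
    text: str


def build_word_sequence(subtitles):
    """Return (word, display_time) pairs for all subtitles in display order.

    Whitespace splitting is a placeholder for real morphological analysis.
    """
    sequence = []
    for sub in sorted(subtitles, key=lambda s: s.start_sec):
        for word in sub.text.split():
            sequence.append((word, sub.start_sec))
    return sequence


subs = [Subtitle(12.0, "add the soba"), Subtitle(3.5, "today we cook")]
print(build_word_sequence(subs))
# [('today', 3.5), ('we', 3.5), ('cook', 3.5), ('add', 12.0), ('the', 12.0), ('soba', 12.0)]
```

The display time kept with each word is what later lets playback jump to the subtitle that contains a selected word.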

The TF-IDF value calculation unit 34 calculates the TF-IDF value of each word in the sequence supplied from the morphological analysis unit 33. Large TF-IDF values are obtained for words that are not general words but appear frequently and have biased appearance positions.

The calculation of TF-IDF values is described below.

FIG. 4 shows an example of how the TF-IDF value calculation unit 34 sets moving windows.

In FIG. 4, the wide horizontal dotted line in the upper part represents the sentences displayed as subtitles. The horizontal dotted line in the lower part represents the word sequence that the morphological analysis unit 33 generates by morphological analysis. During playback of the recorded program, subtitles and words further to the left are displayed earlier on the television receiver 2.

The TF-IDF value calculation unit 34 sets a moving window (detection window) for every range of a predetermined number of words in the word sequence, such as 50 or 100 words. It then calculates the TF-IDF value of each word in each moving window. In other words, all the words in a moving window are treated as one document for the TF-IDF calculation.

In the example of FIG. 4, a moving window W₁ is set on the word sequence generated by morphological analysis. A moving window W₂ is then set by shifting the range by a predetermined number of words in display order.

For example, suppose each moving window covers 100 words and each successive window is shifted by 10 words. The first moving window then covers the 1st through 100th words of the sequence arranged in display order. The second covers the 11th through 110th words, and the third covers the 21st through 120th words.
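The window placement in this example can be checked with a short sketch. The function below is a hypothetical illustration, not part of the described apparatus. It uses the 100-word window and 10-word shift from the example and reports 1-indexed, inclusive word positions.

```python
def window_ranges(num_words, size=100, stride=10):
    """Return the (first, last) word positions, 1-indexed and inclusive,
    covered by each moving window over a sequence of num_words words."""
    if num_words == 0:
        return []
    if num_words <= size:
        return [(1, num_words)]  # a short sequence fits in a single window
    return [(start + 1, start + size)
            for start in range(0, num_words - size + 1, stride)]


print(window_ranges(120))  # [(1, 100), (11, 110), (21, 120)]
```

Under this simple rule, some trailing words are left uncovered when (num_words - size) is not a multiple of the shift. The text does not say how the apparatus handles the end of the sequence.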

The TF-IDF value calculation unit 34 focuses on each word in each moving window and calculates its TF-IDF value by the following equation (1).

$$\mathrm{TF\text{-}IDF} = \mathrm{TF} \times \log\frac{N}{\mathrm{DF}} \qquad (1)$$

In equation (1), TF is the number of appearances of the word of interest within the moving window containing it, and N is the total number of moving windows. DF is the number of moving windows, among all the moving windows, in which the word of interest appears.

In other words, the TF-IDF value is the product of two factors. TF quantifies how often the word of interest appears within a given moving window. IDF quantifies how few moving windows the word of interest appears in. The TF-IDF value is therefore a numerical index that accounts for both the bias of the appearance positions and the appearance frequency.
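The following Python sketch computes a TF-IDF value for every word in every moving window. It assumes the common logarithmic form IDF = log(N / DF). The window size and shift default to the example values of 100 and 10 words, and the function name is hypothetical.

```python
import math
from collections import Counter


def tfidf_per_window(words, size=100, stride=10):
    """Return one dict per moving window mapping each word to TF * log(N / DF).

    words: words in display order; size: words per window; stride: shift between windows.
    """
    if len(words) <= size:
        starts = [0]  # a short sequence forms a single window
    else:
        starts = range(0, len(words) - size + 1, stride)
    windows = [Counter(words[s:s + size]) for s in starts]  # TF within each window
    n = len(windows)  # N: total number of moving windows
    df = Counter()    # DF: number of windows in which each word appears
    for counts in windows:
        df.update(counts.keys())
    return [
        {word: tf * math.log(n / df[word]) for word, tf in counts.items()}
        for counts in windows
    ]


words = ["soba", "the", "soba", "the", "pork", "the"]
for scores in tfidf_per_window(words, size=3, stride=3):
    print(scores)  # "the" appears in both windows, so its value is 0.0
```

A word that appears in every window has DF equal to N and therefore scores 0. This is how general words common to the whole program are suppressed, while a word concentrated in a few windows keeps a large value.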

The TF-IDF values obtained by the TF-IDF value calculation unit 34 are output to the feature word selection unit 35. Each value is accompanied by information representing the word's appearance position, such as its display time measured from the beginning of the program.

The feature word selection unit 35 selects feature words, that is, words considered to represent the characteristics of the program of interest, on the basis of the TF-IDF values supplied from the TF-IDF value calculation unit 34. For example, it compares the maximum TF-IDF values of the words and selects a predetermined number of them, such as 10, as feature words in descending order of maximum TF-IDF value.

FIGS. 5 to 7 show examples of TF-IDF values obtained for a certain program of interest.

FIG. 5 shows an example of the TF-IDF values calculated for “soba” (そば, buckwheat noodles), a word included in the subtitles of the program of interest.

In the example of FIG. 5, TF-IDF values of 40, 35, and 55 are calculated for “soba” appearing at times t₁, t₂, and t₃, respectively. Likewise, TF-IDF values of 40 and 20 are calculated for “soba” appearing at times t₄ and t₅. The maximum TF-IDF value is 55, calculated for “soba” appearing at time t₃.

Depending on how the moving windows are set, several moving windows may contain the same occurrence of “soba”, for example the one appearing at time t₁. A different TF-IDF value is then calculated for that occurrence in each window. For convenience of explanation, only one representative value is shown here for the word appearing at each time. The same applies to FIGS. 6 and 7.

FIG. 6 shows an example of the TF-IDF values calculated for “service” (サービス).

In the example of FIG. 6, TF-IDF values of 10, 15, and 20 are calculated for “service” appearing at times t₁₁, t₁₂, and t₁₃, respectively. Likewise, TF-IDF values of 15 and 10 are calculated for “service” appearing at times t₁₄ and t₁₅. The maximum TF-IDF value is 20, calculated for “service” appearing at time t₁₃.

FIG. 7 shows an example of the TF-IDF values calculated for “pork” (豚).

In the example of FIG. 7, TF-IDF values of 65, 60, and 40 are calculated for “pork” appearing at times t₂₁, t₂₂, and t₂₃, respectively. Likewise, TF-IDF values of 55 and 50 are calculated for “pork” appearing at times t₂₄ and t₂₅. The maximum TF-IDF value is 65, calculated for “pork” appearing at time t₂₁.

The feature word selection unit 35 compares the maximum TF-IDF value of every word, including 55 for “soba”, 20 for “service”, and 65 for “pork”. It then selects a predetermined number of feature words in descending order of maximum TF-IDF value.
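Selection by maximum TF-IDF value can be sketched as follows, using the representative values of FIGS. 5 to 7 as input. The function name is hypothetical, and the per-word lists simply collect the values shown in the figures.

```python
def select_feature_words(values_by_word, k=10):
    """Return up to k words in descending order of their maximum TF-IDF value."""
    best = {word: max(values) for word, values in values_by_word.items() if values}
    return sorted(best, key=best.get, reverse=True)[:k]


# Representative TF-IDF values read from FIGS. 5 to 7.
values_by_word = {
    "soba": [40, 35, 55, 40, 20],
    "service": [10, 15, 20, 15, 10],
    "pork": [65, 60, 40, 55, 50],
}
print(select_feature_words(values_by_word, k=2))  # ['pork', 'soba']
```

Ranking by each word's maximum, rather than by its total count, lets a word that is concentrated in one part of the program outrank a word that is spread thinly across it.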

The feature words selected by the feature word selection unit 35, together with information on the appearance position at which the maximum TF-IDF value of each feature word was calculated, are output to the content recording unit 31 and recorded in association with the program data. Instead of selecting a predetermined number of words in descending order of maximum TF-IDF value, words whose maximum TF-IDF value exceeds a threshold may be selected as feature words.
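The threshold alternative, together with keeping each word's appearance position so that it can be recorded and later used as a playback start point, might look like the following sketch. The threshold of 50 is a hypothetical value chosen for illustration, and `select_above_threshold` is a made-up helper name.

```python
from typing import Dict, List, Tuple

# (maximum TF-IDF value, appearance time of that maximum) per word,
# using the values quoted for FIGS. 5 to 7.
BEST: Dict[str, Tuple[float, str]] = {
    "soba": (55, "t3"),
    "service": (20, "t13"),
    "pig": (65, "t21"),
}


def select_above_threshold(best: Dict[str, Tuple[float, str]],
                           threshold: float) -> List[Tuple[str, str]]:
    """Return (word, appearance time) for every word whose maximum exceeds threshold.

    Results are ordered by descending maximum value; the appearance time is
    kept so that it can be recorded alongside the program data.
    """
    kept = [(word, value, time) for word, (value, time) in best.items()
            if value > threshold]
    kept.sort(key=lambda item: item[1], reverse=True)
    return [(word, time) for word, _, time in kept]


print(select_above_threshold(BEST, 50))  # [('pig', 't21'), ('soba', 't3')]
```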

When the user instructs display of the title list and information indicating this instruction is supplied from the input unit 17, the content information display unit 36 reads information about the recorded programs from the content recording unit 31 and controls the output unit 18 to display the title list screen on the television receiver 2. The information read from the content recording unit 31 includes the title of each recorded program, a representative image captured from the video at a predetermined position, and the feature words.

FIG. 8 is a diagram illustrating an example of the title list screen.

In the example of FIG. 8, the titles of three recorded programs are displayed: “Today's Menu: ‘Tomatoes Are Amazing’”, “Travel, Good Mood: ‘Ibaraki, Fukushima Delicious...’”, and “Sake Brewery Visit: Tohoku”.

Image P₁, displayed to the left of the title “Today's Menu: ‘Tomatoes Are Amazing’”, is the representative image of this program, and “green onion”, “bacon”, “chicken wings”, and “simmer”, displayed below the title, are feature words selected from the program's subtitles.

Image P₂, displayed to the left of the title “Travel, Good Mood: ‘Ibaraki, Fukushima Delicious...’”, is the representative image of this program, and “soba”, “pig”, “esthetic”, and “collagen”, displayed below the title, are feature words selected from the program's subtitles.

FIGS. 5 to 7 show examples of TF-IDF values calculated for the words contained in the subtitles of the program “Travel, Good Mood: ‘Ibaraki, Fukushima Delicious...’”. Words such as “soba” and “pig” are selected as feature words on the basis of the TF-IDF values shown in FIGS. 5 to 7.

Image P₃, displayed to the left of the title “Sake Brewery Visit: Tohoku”, is the representative image of this program, and “rice”, “water”, and “sake brewery”, displayed below the title, are feature words selected from the program's subtitles.

Because the feature words selected from the subtitles are displayed in this way, the user can grasp the outline of each program from its feature words and choose the program to watch. The title list screen may also display only those feature words whose maximum TF-IDF value exceeds a threshold.

FIG. 9 is a diagram illustrating an example of the program detail information screen.

For example, when the title “Travel, Good Mood: ‘Ibaraki, Fukushima Delicious...’” is selected on the screen shown in FIG. 8 and a predetermined operation is performed, the screen of FIG. 9 is displayed in place of the screen shown in FIG. 8.

In the example of FIG. 9, the same image as image P₂ is displayed at the upper center of the screen. Below it are the full title “Travel, Good Mood: ‘Ibaraki, Fukushima Delicious Food Tour’” and the feature words “soba”, “pig”, “esthetic”, and “collagen”, which were also shown on the title list screen, plus the feature words “vegetable” and “anglerfish”. The maximum TF-IDF values of “soba”, “pig”, “esthetic”, and “collagen” exceed the threshold, so these words are considered to represent the program's characteristics especially well; they are therefore displayed in a more conspicuous color than the other feature words, “vegetable” and “anglerfish”. Feature words whose maximum TF-IDF value exceeds the threshold may instead be emphasized in another way, for example by displaying them in larger characters.

A cursor C is displayed under “soba”. By operating buttons on the remote controller to move cursor C and then performing a confirm operation, the user can start playback of the program “Travel, Good Mood: ‘Ibaraki, Fukushima Delicious Food Tour’” from the appearance position of the feature word on which cursor C rests at that time.

In addition to the title, various information acquired from the EPG data, such as the program's cast and synopsis, may also be displayed on the program detail information screen.

Returning to the description of FIG. 3, when a feature word is selected on a program detail information screen such as the one shown in FIG. 9 and the user instructs that playback of the recorded program start from the appearance position of that feature word, the content reproduction unit 37 reads the program data from the playback start position onward from the content recording unit 31 and, based on the read data, controls the output unit 18 to display the program on the television receiver 2.

FIG. 10 is a diagram illustrating an example of the program screen displayed by the content reproduction unit 37 when the confirm operation is performed in the state of FIG. 9, with the feature word “soba” selected.

When the confirm operation is performed in the state of FIG. 9 with the feature word “soba” selected, playback of the program “Travel, Good Mood: ‘Ibaraki, Fukushima Delicious Food Tour’” starts, as shown in FIG. 10, from the position at which a sentence containing “soba” is displayed as a subtitle. At the bottom of the screen of FIG. 10, the subtitle “It's new-soba season, isn't it? This soba set here is...” is displayed. The screen of FIG. 10 is the screen at the position specified by time t₃ (FIG. 5), which is the appearance position of the “soba” for which the maximum TF-IDF value of 55 was calculated.

In this way, the user can start playback of a program from the appearance position of a word that is considered to represent the program's characteristics well.

Next, the processing of the information processing apparatus 1 configured as described above will be described.

First, the feature word selection processing of the information processing apparatus 1 will be described with reference to the flowchart of FIG. 11. This processing is performed for each recorded program.

In step S1, the subtitle data acquisition unit 32 reads and acquires the subtitle data contained in the data of the program of interest from the content recording unit 31, and outputs the acquired subtitle data to the morphological analysis unit 33.

In step S2, the morphological analysis unit 33 performs morphological analysis on the subtitle data supplied from the subtitle data acquisition unit 32 and generates a word sequence in which the words making up the subtitles are arranged in display order. The morphological analysis unit 33 outputs the generated word sequence to the TF-IDF value calculation unit 34.

In step S3, the TF-IDF value calculation unit 34 calculates, as described above, the TF-IDF value of each word in the sequence supplied from the morphological analysis unit 33. The TF-IDF value calculation unit 34 outputs the calculated TF-IDF values, together with information indicating the appearance position of each word, to the feature word selection unit 35.

In step S4, the feature word selection unit 35 compares the TF-IDF values of the words and selects a predetermined number of words as feature words in descending order of maximum TF-IDF value. The feature word selection unit 35 outputs the selected feature words, together with information on the appearance position at which the maximum TF-IDF value of each feature word was calculated, to the content recording unit 31.

In step S5, the content recording unit 31 records the feature words and appearance position information supplied from the feature word selection unit 35 in association with the program data, and the processing ends.
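Steps S1 to S5 can be sketched end to end as follows. This is an illustration under stated assumptions, not the patented implementation: whitespace splitting stands in for the morphological analysis of step S2 (Japanese subtitles would need a real morphological analyzer); the windows are assumed to be non-overlapping runs of `window_size` words (the actual window placement is the one described with FIG. 4); and the weight is assumed to be the common TF × log(N/DF) form, which increases with TF and decreases with DF as claim 2 requires. The word list is a toy example, and the function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def windowed_tfidf(words: List[str], window_size: int) -> List[Tuple[int, str, float]]:
    """Score every word occurrence with a TF-IDF value computed over windows.

    Assumptions: windows are non-overlapping runs of `window_size` words,
    TF is the count of the word inside its own window, and
    IDF = log(N / DF), with N the number of windows and DF the number
    of windows that contain the word.
    """
    windows = [words[i:i + window_size] for i in range(0, len(words), window_size)]
    n = len(windows)
    df = Counter(w for window in windows for w in set(window))
    scored = []
    for w_idx, window in enumerate(windows):
        tf = Counter(window)
        for offset, word in enumerate(window):
            position = w_idx * window_size + offset  # index in the word sequence
            scored.append((position, word, tf[word] * math.log(n / df[word])))
    return scored


def select_feature_words(words: List[str], window_size: int,
                         k: int) -> List[Tuple[str, int]]:
    """Steps S3 and S4: keep each word's best-scoring position, return the top k."""
    best: Dict[str, Tuple[float, int]] = {}
    for position, word, score in windowed_tfidf(words, window_size):
        if word not in best or score > best[word][0]:
            best[word] = (score, position)
    ranked = sorted(best.items(), key=lambda item: item[1][0], reverse=True)
    # Step S5 would record these (word, position) pairs with the program data.
    return [(word, position) for word, (_, position) in ranked[:k]]


# Stand-in for steps S1 and S2: a toy subtitle split on whitespace.
subtitle_words = "soba soba soba shop pig pig shop rice shop soba rice pig".split()
print(select_feature_words(subtitle_words, window_size=4, k=2))  # [('soba', 0), ('pig', 4)]
```

In this toy input “shop” occurs in every window, so its IDF, and therefore every one of its scores, is zero; this is the “bias of the appearance position” effect the weighting is meant to capture.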

Next, the processing by which the information processing apparatus 1 plays back a recorded program will be described with reference to the flowchart of FIG. 12. This processing starts, for example, when the user instructs display of the title list.

In step S11, the content information display unit 36 reads the feature words from the content recording unit 31, together with program information such as the titles and representative images of the recorded programs.

In step S12, the content information display unit 36 displays the title list screen on the television receiver 2 based on the information read from the content recording unit 31. As shown in FIG. 8, the title list screen lists the representative image, title, feature words, and other information for each recorded program.

In step S13, the content information display unit 36 determines whether a program has been selected from those displayed in the title list and display of that program's detail information screen has been instructed, and waits until it determines that such an instruction has been given.

If it is determined in step S13 that display of the detail information screen has been instructed, then in step S14 the content information display unit 36 displays a program detail information screen including a list of feature words, as shown in FIG. 9.

In step S15, the content reproduction unit 37 determines whether the user has selected a feature word on the program detail information screen and instructed that playback of the program start from the appearance position of that feature word, and waits until it determines that such an instruction has been given.

If it is determined in step S15 that a feature word has been selected and the user has instructed that playback start from its appearance position, then in step S16 the content reproduction unit 37 reads from the content recording unit 31, and plays back, the program data from the appearance position of the selected feature word onward, that is, from the appearance position at which the maximum TF-IDF value of that feature word was calculated. A program screen such as the one shown in FIG. 10 is displayed on the television receiver 2. When playback of the recorded program ends, the processing ends.
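The playback side (steps S11 to S16) needs only the feature words and positions recorded in step S5. A minimal sketch, assuming a hypothetical `RecordedProgram` record: positions are kept as the time labels used in the figures (such as t₃) rather than real timestamps, and the actual seek and decode performed by the content reproduction unit 37 are not modeled.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class RecordedProgram:
    """Hypothetical record written in step S5."""
    title: str
    # Feature word -> appearance position of its maximum TF-IDF value.
    # Positions are figure labels here (e.g. "t3"); a real recorder would
    # store a timestamp or stream offset instead.
    feature_positions: Dict[str, str] = field(default_factory=dict)


def detail_screen(program: RecordedProgram) -> List[str]:
    """Step S14: the feature words offered on the program detail screen."""
    return list(program.feature_positions)


def playback_start(program: RecordedProgram, chosen_word: str) -> str:
    """Steps S15 and S16: the position from which playback should begin."""
    try:
        return program.feature_positions[chosen_word]
    except KeyError:
        raise ValueError(f"{chosen_word!r} is not a feature word of this program") from None


travel = RecordedProgram("Travel, Good Mood", {"soba": "t3", "pig": "t21"})
print(detail_screen(travel))           # ['soba', 'pig']
print(playback_start(travel, "soba"))  # t3
```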

The above description covers displaying information about, and playing back, recorded programs. However, feature words selected on the basis of TF-IDF values as described above may also be presented to the user when displaying information about other content that includes subtitle data, such as broadcast programs or movies recorded on packaged media.

Also, in the above description, moving windows are set over the subtitles of a single piece of content, and the TF-IDF values are calculated by regarding all the words in each moving window as one document. Alternatively, the TF-IDF values may be calculated by regarding the entire subtitles of one piece of content as one document.

When the TF-IDF values are calculated in this way, a large TF-IDF value is obtained for a word that, across the plurality of contents as a whole, appears in only some of the contents and appears frequently in them. As in the moving-window case, the TF-IDF values of the words may be compared, and a predetermined number of words may be selected as feature words in descending order of maximum TF-IDF value and presented to the user.
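For the variant that treats the whole subtitles of one content as one document, each program (here, each episode of a series) becomes one document. The sketch below assumes TF is the raw count of a word in a program and IDF is log(N/DF) over the N programs. The episode word lists are invented toy data, not taken from FIG. 13, and “hero” is a hypothetical stand-in for a recurring cast member's name.

```python
import math
from collections import Counter
from typing import Dict, List


def document_tfidf(docs: Dict[str, List[str]]) -> Dict[str, Dict[str, float]]:
    """TF-IDF where each program's whole subtitle word list is one document.

    Assumes TF = raw count in the program and IDF = log(N / DF), with N the
    number of programs and DF the number of programs containing the word.
    """
    n = len(docs)
    df = Counter(word for words in docs.values() for word in set(words))
    return {
        title: {word: count * math.log(n / df[word])
                for word, count in Counter(words).items()}
        for title, words in docs.items()
    }


def top_words(scores: Dict[str, float], k: int) -> List[str]:
    """The k highest-scoring words of one program, highest first."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy word lists (hypothetical); "hero" appears in every episode.
episodes = {
    "Episode 1": ["hero", "hero", "hero", "horse", "horse", "hokkaido"],
    "Episode 2": ["hero", "hero", "america", "america", "airplane"],
    "Episode 3": ["hero", "hero", "japan", "tokyo", "tokyo"],
}
scores = document_tfidf(episodes)
print(top_words(scores["Episode 1"], 2))              # ['horse', 'hokkaido']
print(Counter(episodes["Episode 1"]).most_common(1))  # [('hero', 3)]
```

Ranking by raw frequency would put “hero” first in Episode 1, whereas its TF-IDF value is zero because it occurs in all three episodes.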

FIG. 13 is a diagram illustrating another example of the title list screen.

In the example of FIG. 13, the titles of three recorded programs are displayed: “Drama Series ‘Hello’ Episode 1”, “Drama Series ‘Hello’ Episode 2”, and “Drama Series ‘Hello’ Episode 3”.

Image P₁₁, displayed to the left of the title “Drama Series ‘Hello’ Episode 1”, is the representative image of this program. “Horse”, “Hokkaido”, and “ramen”, displayed below the title, are feature words selected on the basis of TF-IDF values calculated by regarding the entire subtitles of “Drama Series ‘Hello’ Episode 1” as one document.

Image P₁₂, displayed to the left of the title “Drama Series ‘Hello’ Episode 2”, is the representative image of this program. “America”, “airplane”, and “horse racing”, displayed below the title, are feature words selected on the basis of TF-IDF values calculated by regarding the entire subtitles of “Drama Series ‘Hello’ Episode 2” as one document.

Image P₁₃, displayed to the left of the title “Drama Series ‘Hello’ Episode 3”, is the representative image of this program. “Japan”, “Tokyo”, and “returning home”, displayed below the title, are feature words selected on the basis of TF-IDF values calculated by regarding the entire subtitles of “Drama Series ‘Hello’ Episode 3” as one document.

When a program is selected on a screen such as the one shown in FIG. 13, a program detail information screen including the feature words and other information is displayed. When a feature word is selected from those displayed on the program detail information screen and playback is instructed, playback of the content starts from the appearance position of that feature word, as described above.

Presenting feature words selected on the basis of TF-IDF values lets the user check what each piece of content is about. If the most frequent words were simply selected as feature words, then for the subtitles of each episode of a drama series, cast members' names and similar words would often be selected. Selecting feature words on the basis of TF-IDF values instead presents words that appear only in the subtitles of a particular episode, rather than words common to the subtitles of all episodes.
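The effect described above can be checked with a short worked calculation. It assumes the common weighting tfidf = tf · ln(N/df) with N = 3 episodes; this form is an assumption, since claim 2 states only that the weight increases with TF and decreases with DF.

```latex
\begin{aligned}
\text{word in all 3 episodes (e.g.\ a cast name):}\quad
  &\mathrm{tfidf} = \mathrm{tf}\cdot\ln\tfrac{3}{3} = \mathrm{tf}\cdot 0 = 0,\\
\text{word in only 1 episode:}\quad
  &\mathrm{tfidf} = \mathrm{tf}\cdot\ln\tfrac{3}{1} \approx 1.10\,\mathrm{tf}.
\end{aligned}
```

However often a cast name occurs, its weight stays at zero under this form, while a word confined to one episode is weighted in proportion to its frequency there.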

The series of processes described above can be executed by hardware or by software. When the series of processes is executed by software, the program constituting the software is installed on a computer built into dedicated hardware, or on a computer such as a general-purpose personal computer that can execute various functions when various programs are installed on it.

The program to be installed and executed is provided either recorded on the removable medium 22 shown in FIG. 2, which is a package medium such as a magnetic disk (including a flexible disk), an optical disc (CD-ROM (Compact Disc-Read Only Memory), DVD (Digital Versatile Disc), etc.), a magneto-optical disk, or a semiconductor memory, or via a wired or wireless transmission medium such as a local area network, the Internet, or digital satellite broadcasting. The program can also be installed in advance in the ROM 12 or the recording unit 19.

The program executed by the computer may be a program whose processing is performed in time series in the order described in this specification, or a program whose processing is performed in parallel or at a necessary timing, such as when it is called.

Embodiments of the present invention are not limited to the embodiment described above, and various modifications can be made without departing from the gist of the present invention.

FIG. 1 is a diagram showing an example of an information processing apparatus according to an embodiment of the present invention.
FIG. 2 is a block diagram showing an example hardware configuration of the information processing apparatus.
FIG. 3 is a block diagram showing an example functional configuration of the information processing apparatus.
FIG. 4 is a diagram showing an example of moving window settings.
FIG. 5 is a diagram showing an example of TF-IDF values.
FIG. 6 is a diagram showing another example of TF-IDF values.
FIG. 7 is a diagram showing still another example of TF-IDF values.
FIG. 8 is a diagram showing an example of a title list screen.
FIG. 9 is a diagram showing an example of a program detail information screen.
FIG. 10 is a diagram showing an example of a program screen.
FIG. 11 is a flowchart explaining the feature word selection processing of the information processing apparatus.
FIG. 12 is a flowchart explaining the program playback processing of the information processing apparatus.
FIG. 13 is a diagram showing an example of another title list screen.

Explanation of symbols

1 Information processing apparatus
2 Television receiver
31 Content recording unit
32 Subtitle data acquisition unit
33 Morphological analysis unit
34 TF-IDF value calculation unit
35 Feature word selection unit
36 Content information display unit
37 Content reproduction unit

Claims

An information processing apparatus comprising:
analysis means for analyzing the subtitles of content and acquiring the words that make up the subtitles;
calculation means for calculating, for each word acquired by the analysis means, a weighting factor that takes into account the number of appearances of the word and the bias of its appearance positions;
selection means for selecting, based on the weighting factors calculated by the calculation means, a predetermined number of feature words representing features of the content; and
reproduction means for displaying the feature words selected by the selection means and reproducing the content from a position at which a sentence containing a feature word selected from among the displayed feature words is displayed as a subtitle.

The information processing apparatus according to claim 1, wherein the calculation means generates a sequence in which the words acquired by the analysis means are arranged in display order, sets a detection window for each range of a predetermined number of words constituting the sequence, and calculates, for each word included in each detection window, a weighting factor that increases as TF increases and decreases as DF increases, where TF is the number of occurrences of the word of interest within the detection window containing it, and DF is the number of detection windows, among all the detection windows, in which the word of interest appears.

The information processing apparatus according to claim 1, wherein the selection means compares the weighting factors calculated by the calculation means and selects as the feature words either a predetermined number of words in descending order of the maximum weighting factor obtained within the same content, or the words whose maximum weighting factor obtained within the same content is larger than a threshold.

The information processing apparatus according to claim 1, wherein the reproduction means displays the feature words selected by the selection means together with information on the content acquired from EPG data.

An information processing method comprising the steps of:
analyzing the subtitles of content and acquiring the words that make up the subtitles;
calculating, for each acquired word, a weighting factor that takes into account the number of appearances of the word and the bias of its appearance positions;
selecting, based on the calculated weighting factors, a predetermined number of feature words representing features of the content; and
displaying the selected feature words and reproducing the content from a position at which a sentence containing a feature word selected from among the displayed feature words is displayed as a subtitle.

A program causing a computer to execute processing comprising the steps of:
analyzing the subtitles of content and acquiring the words that make up the subtitles;
calculating, for each acquired word, a weighting factor that takes into account the number of appearances of the word and the bias of its appearance positions;
selecting, based on the calculated weighting factors, a predetermined number of feature words representing features of the content; and
displaying the selected feature words and reproducing the content from a position at which a sentence containing a feature word selected from among the displayed feature words is displayed as a subtitle.