JP2008009552A

JP2008009552A - Index generation device, index generation method and index generation program

Info

Publication number: JP2008009552A
Application number: JP2006177146A
Authority: JP
Inventors: Miki Sakai; 美樹境; Daishiro Yokozeki; 大子郎横関; Shinya Takada; 慎也高田
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2006-06-27
Filing date: 2006-06-27
Publication date: 2008-01-17

Abstract

PROBLEM TO BE SOLVED: To create a sufficiently useful and detailed index of entire audio content.

SOLUTION: When audio content (audio data) is input and an index creation command is received, an index generation device 20 generates a speech word list from words included in text data obtained through speech recognition of the audio content, calculates speech densities from the generated speech word list, extracts words whose speech density exceeds a predetermined value as topic words per unit time, generates a topic word list enumerating the extracted topic words in association with the predetermined unit time between the speech start time and speech end time of the audio content, extracts time points without any enumerated topic words on the generated topic word list as topic change points, and generates an index by sections delimited by the extracted topic change points from the topic words enumerated in each section on the topic word list.

COPYRIGHT: (C)2008,JPO&INPIT

Description

The present invention relates to an index generation device, an index generation method, and an index generation program that create text data by performing speech recognition on audio content and generate, from the text data, an index to be assigned to the audio content.

Conventionally, with the spread of broadband across society, the infrastructure for distributing large volumes of video and audio content over the Internet has been maturing. Searches for such video and audio content on the Internet and elsewhere are mainly performed by converting the content to text and searching by document directory or title.

Accordingly, various techniques have been disclosed that perform speech recognition on audio content (for example, audio data) and index it (extract important words) so that the audio content can be searched as a document.

For example, Patent Document 1 (Japanese Unexamined Patent Application Publication No. 9-114494) discloses a technique that extracts important words from an input document using word appearance frequencies and appearance intervals and generates an index composed of the extracted important words. The technique rests on the premise that "a noun that appears for the first time, or a noun that has not been used for a long time since its last appearance, may be the topic word of a new topic. Conversely, if a noun that appeared in the immediately preceding utterance also appears in the next utterance, that noun is likely to be a factor in maintaining the topic."

Patent Document 1: JP-A-9-114494

However, the conventional technique described above merely creates a single index for the audio content as a whole. Even with this index, it is not possible to search within the audio content for a desired topic portion (a portion in which a certain topic is being spoken about).

The present invention has been made to solve the above problem of the prior art, and its object is to provide an index generation device, an index generation method, and an index generation program capable of creating a sufficiently useful and detailed index for audio content as a whole.

To solve the above problem and achieve the object, the invention according to claim 1 is an index generation device that creates text data by performing speech recognition on audio content and generates, from the text data, an index to be assigned to the audio content, the device comprising: utterance word list generation means for extracting the words contained in the text data and the utterance time of each word, and generating an utterance word list that lists the words in association with their utterance times; topic word extraction means for calculating, for each word in the utterance word list generated by the utterance word list generation means, an utterance density indicating the number of utterances per predetermined unit time, and extracting words whose utterance density exceeds a predetermined value as topic words for that unit time; topic word list generation means for generating a topic word list that lists, for each predetermined unit time from the utterance start time to the utterance end time of the audio content, the topic words extracted by the topic word extraction means; and index generation means for generating an index from the topic words listed in the topic word list generated by the topic word list generation means.

In the invention according to claim 2, the above invention further comprises topic turning point extraction means for extracting, as a topic turning point, a time in the topic word list generated by the topic word list generation means at which none of the topic words continues to be listed, and the index generation means generates an index, for each section delimited by the topic turning points extracted by the topic turning point extraction means, from the topic words listed in that section of the topic word list.

In the invention according to claim 3, the above invention further comprises distance dictionary storage means for storing a distance dictionary that defines the semantic distances between a plurality of words, and, when calculating the utterance density per predetermined unit time for a given word in the utterance word list, the topic word extraction means treats any word whose semantic distance from the given word in the distance dictionary stored in the distance dictionary storage means is within a predetermined range as being the same word as the given word.
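As a non-limiting illustration of the density calculation of claim 3, the following Python sketch counts a target word together with every word that the distance dictionary places within a given range of it. The dictionary entries, the 0-to-1 distance scale, and the names `DISTANCE`, `semantic_distance`, and `density_with_near_words` are assumptions introduced here; the specification does not fix a dictionary format.

```python
# Hypothetical distance dictionary: a smaller value means a closer meaning.
# The entries and the 0-to-1 scale are assumptions for illustration.
DISTANCE = {
    ("car", "automobile"): 0.1,
    ("car", "library"): 0.9,
}

def semantic_distance(a, b):
    """Look up the distance between two words; unknown pairs count as far."""
    if a == b:
        return 0.0
    return DISTANCE.get((a, b), DISTANCE.get((b, a), 1.0))

def density_with_near_words(target, words_in_unit, unit_seconds, max_distance):
    """Utterance density of target, counting near words as the same word."""
    hits = sum(1 for w in words_in_unit
               if semantic_distance(target, w) <= max_distance)
    return hits / unit_seconds

# Words uttered within one unit time of 60 seconds.
words = ["car", "automobile", "library", "car"]
print(density_with_near_words("car", words, unit_seconds=60, max_distance=0.2))
```

With `max_distance=0.2`, "automobile" is counted as an utterance of "car" while "library" is not, so three utterances contribute to the density instead of two.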

In the invention according to claim 4, the above invention further comprises topic word extraction list storage means for storing a topic word extraction list that enumerates the words to be extracted as topic words, and the topic word extraction means extracts a word whose utterance density exceeds the predetermined value as a topic word for the unit time only on the condition that the word is listed in the topic word extraction list stored in the topic word extraction list storage means.

In the invention according to claim 5, in the above invention, when the same topic word is listed in the topic word list without a predetermined time interval elapsing between listings, the topic turning point extraction means regards the listing of that topic word as having continued when it extracts the topic turning points.
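As a non-limiting illustration of the continuation rule of claim 5, the following Python sketch merges the listing times of one topic word into runs, treating two listings as one continuous run when the gap between them does not exceed a tolerance. The function name `merge_runs`, the `max_gap` parameter, and the use of seconds are assumptions introduced here.

```python
def merge_runs(times, max_gap):
    """Group the sorted listing times (in seconds) of one topic word.

    Two consecutive listings belong to the same run when they are at most
    max_gap seconds apart; otherwise a new run starts.
    """
    runs = []
    for t in times:
        if runs and t - runs[-1][1] <= max_gap:
            runs[-1][1] = t          # extend the current run
        else:
            runs.append([t, t])      # start a new run
    return [tuple(run) for run in runs]

# The word reappears after short gaps, then again after a long silence.
print(merge_runs([0, 60, 300, 360, 1200], max_gap=300))
```

A larger `max_gap` yields fewer, longer runs and therefore fewer topic turning points, which matches the stated aim of avoiding needlessly fine segmentation.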

In the invention according to claim 6, the above invention further comprises: distance dictionary storage means for storing a distance dictionary that defines the semantic distances between a plurality of words; and word removal means for evaluating, using the distance dictionary stored in the distance dictionary storage means, the semantic distances between the words in the utterance word list generated by the utterance word list generation means and removing words with a low evaluation from the utterance word list. The topic word extraction means calculates, based on the utterance word list after word removal by the word removal means, the utterance density per predetermined unit time for each word in that list and extracts words whose utterance density exceeds the predetermined value as topic words for the unit time.
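As a non-limiting illustration of the word removal of claim 6, the following Python sketch scores each word by its mean semantic distance to the other words in the list and drops words that are far from all of them. The specification does not state how the evaluation is computed, so the mean-distance score, the threshold, the dictionary entries, and the names `DISTANCE`, `semantic_distance`, and `remove_distant_words` are all assumptions introduced here.

```python
# Hypothetical distance dictionary (smaller value = closer meaning).
DISTANCE = {
    frozenset(("environment", "education")): 0.3,
    frozenset(("environment", "recycling")): 0.2,
    frozenset(("education", "recycling")): 0.4,
}

def semantic_distance(a, b):
    """Unknown pairs are treated as maximally distant (1.0)."""
    return 0.0 if a == b else DISTANCE.get(frozenset((a, b)), 1.0)

def remove_distant_words(words, max_mean_distance):
    """Drop words whose mean distance to the other distinct words is too large."""
    unique = sorted(set(words))
    kept = set()
    for w in unique:
        others = [o for o in unique if o != w]
        mean = (sum(semantic_distance(w, o) for o in others) / len(others)
                if others else 0.0)
        if mean <= max_mean_distance:
            kept.add(w)
    return [w for w in words if w in kept]

# "baseball" stands in for a misrecognized or off-topic word.
words = ["environment", "education", "baseball", "recycling", "environment"]
print(remove_distant_words(words, max_mean_distance=0.6))
```

In this example "baseball" has no dictionary entry linking it to the other words, so it is removed before the utterance density is calculated.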

In the invention according to claim 7, the above invention further comprises word complementing means for extracting from the distance dictionary, using the distance dictionary stored in the distance dictionary storage means, words with a high semantic distance evaluation relative to the words listed in the utterance word list after word removal by the word removal means, and adding the extracted words to the utterance word list. The topic word extraction means calculates, based on the utterance word list after word complementing by the word complementing means, the utterance density per predetermined unit time for each word in that list and extracts words whose utterance density exceeds the predetermined value as topic words for the unit time.
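As a non-limiting illustration of the word complementing of claim 7, the following Python sketch adds to the utterance word list every dictionary word that lies within a given distance of a word already in the list. The dictionary entries, the threshold, and the names `DISTANCE` and `complement_words` are assumptions introduced here; the sketch also leaves open which utterance time a complemented word should receive, since the claim does not say.

```python
# Hypothetical distance dictionary (smaller value = closer meaning).
DISTANCE = {
    ("environment", "ecology"): 0.1,
    ("environment", "education"): 0.3,
    ("education", "school"): 0.15,
}

def complement_words(words, max_distance):
    """Append dictionary words that are close to a word already listed."""
    present = set(words)
    added = []
    for (a, b), d in DISTANCE.items():
        if d > max_distance:
            continue
        # The pair is unordered, so check both directions.
        for known, candidate in ((a, b), (b, a)):
            if known in present and candidate not in present \
                    and candidate not in added:
                added.append(candidate)
    return list(words) + added

print(complement_words(["environment", "education"], max_distance=0.2))
```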

In the invention according to claim 8, the above invention further comprises important word extraction means for calculating, for each section delimited by the topic turning points extracted by the topic turning point extraction means, the utterance frequency of each topic word listed in that section of the topic word list, and extracting topic words whose utterance frequency exceeds a predetermined value as important words. The index generation means generates, for each section delimited by the topic turning points, an index that lists the topic words extracted as important words by the important word extraction means.
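As a non-limiting illustration of the important word extraction of claim 8, the following Python sketch counts how often each topic word is listed within one section and keeps the words whose count exceeds a threshold. Measuring the utterance frequency as the number of topic word list rows containing the word, and the names `important_words` and `min_count`, are assumptions introduced here.

```python
from collections import Counter

def important_words(section_rows, min_count):
    """Return topic words listed more than min_count times in one section.

    section_rows: list of (time, [topic words]) rows belonging to the section.
    """
    counts = Counter(word for _, words in section_rows for word in words)
    return [word for word, count in counts.most_common() if count > min_count]

section = [
    ("9:00:00", ["word A"]),
    ("9:10:06", ["word A", "word B"]),
    ("9:17:00", ["word A", "word B"]),
]
print(important_words(section, min_count=2))
```

Here "word A" is listed three times and "word B" twice, so only "word A" exceeds a threshold of 2.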

In the invention according to claim 9, in the above invention, the index generation means generates the index by editing the topic words, for each section delimited by the topic turning points, based on one or more of the utterance time, the utterance density, the utterance frequency, and the semantic distance.

The invention according to claim 10 is an index generation device that creates text data by performing speech recognition on audio content and generates, from the text data, an index to be assigned to the audio content, the device comprising: distance dictionary storage means for storing a distance dictionary that defines the semantic distances between a plurality of words; utterance word list generation means for extracting the words contained in the text data and the utterance time of each word, and generating an utterance word list that lists the words in association with their utterance times; word removal means for evaluating, using the distance dictionary stored in the distance dictionary storage means, the semantic distances between the words in the utterance word list generated by the utterance word list generation means and removing words with a low evaluation from the utterance word list; and index generation means for generating an index based on the utterance word list after word removal by the word removal means.

In the invention according to claim 11, the above invention further comprises word complementing means for extracting from the distance dictionary, using the distance dictionary stored in the distance dictionary storage means, words with a high semantic distance evaluation relative to the words listed in the utterance word list after word removal by the word removal means, and adding the extracted words to the utterance word list, and the index generation means generates the index based on the utterance word list after word complementing by the word complementing means.

The invention according to claim 12 is an index generation method suited to creating text data by performing speech recognition on audio content and generating, from the text data, an index to be assigned to the audio content, the method comprising: an utterance word list generation step of extracting the words contained in the text data and the utterance time of each word, and generating an utterance word list that lists the words in association with their utterance times; a topic word extraction step of calculating, for each word in the utterance word list generated in the utterance word list generation step, an utterance density indicating the number of utterances per predetermined unit time, and extracting words whose utterance density exceeds a predetermined value as topic words for that unit time; a topic word list generation step of generating a topic word list that lists, for each predetermined unit time from the utterance start time to the utterance end time of the audio content, the topic words extracted in the topic word extraction step; and an index generation step of generating an index from the topic words listed in the topic word list generated in the topic word list generation step.

The invention according to claim 13 is an index creation program that causes a computer to execute an index generation method suited to creating text data by performing speech recognition on audio content and generating, from the text data, an index to be assigned to the audio content, the program causing the computer to execute: an utterance word list generation procedure of extracting the words contained in the text data and the utterance time of each word, and generating an utterance word list that lists the words in association with their utterance times; a topic word extraction procedure of calculating, for each word in the utterance word list generated by the utterance word list generation procedure, an utterance density indicating the number of utterances per predetermined unit time, and extracting words whose utterance density exceeds a predetermined value as topic words for that unit time; a topic word list generation step of generating a topic word list that lists, for each predetermined unit time from the utterance start time to the utterance end time of the audio content, the topic words extracted by the topic word extraction procedure; and an index generation procedure of generating an index from the topic words listed in the topic word list generated by the topic word list generation procedure.

According to the inventions of claims 1, 10, 11, 12, and 13, the words contained in the text data and the utterance time of each word are extracted; an utterance word list that lists the words in association with their utterance times is generated; an utterance density indicating the number of utterances per predetermined unit time is calculated for each word in the generated utterance word list; words whose utterance density exceeds a predetermined value are extracted as topic words for that unit time; a topic word list that lists the extracted topic words for each predetermined unit time from the utterance start time to the utterance end time of the audio content is generated; a time in the generated topic word list at which none of the topic words continues to be listed is extracted as a topic turning point; and an index is generated, for each section delimited by the extracted topic turning points, from the topic words listed in that section of the topic word list. It is therefore possible to create a sufficiently useful and detailed index for the audio content as a whole.

According to the invention of claim 2, a time in the generated topic word list at which none of the topic words continues to be listed is extracted as a topic turning point, and an index is generated, for each section delimited by the extracted topic turning points, from the topic words listed in that section of the topic word list. Multiple topics contained in the audio content can thus be extracted and an index created for each topic, so that an even more useful and detailed index can be created for the audio content as a whole.

According to the invention of claim 3, a distance dictionary that defines the semantic distances between a plurality of words is stored, and, when the utterance density per predetermined unit time is calculated for a given word in the utterance word list, any word whose semantic distance from the given word in the stored distance dictionary is within a predetermined range is treated as being the same word as the given word. It is therefore possible to create a still more useful and detailed index for the audio content as a whole.

According to the invention of claim 4, a topic word extraction list that enumerates the words to be extracted as topic words is stored, and a word whose utterance density exceeds the predetermined value is extracted as a topic word for the unit time only on the condition that it is listed in the stored topic word extraction list. Topic turning points can therefore be extracted more accurately.

According to the invention of claim 5, when the same topic word is listed in the topic word list without a predetermined time interval elapsing between listings, the listing of that topic word is regarded as having continued when the topic turning points are extracted. This prevents topic words from being extracted at a needlessly fine granularity, which would put unnecessary information into the index as a whole.

According to the invention of claim 6, a distance dictionary that defines the semantic distances between a plurality of words is stored; the semantic distances between the words in the generated utterance word list are evaluated using the stored distance dictionary; words with a low evaluation are removed from the utterance word list; the utterance density per predetermined unit time is calculated for each word in the utterance word list after word removal; and words whose utterance density exceeds the predetermined value are extracted as topic words for the unit time. Topic words can therefore be identified more accurately than when they are extracted from word data from which semantically distant words have not been removed.

According to the invention of claim 7, words with a high semantic distance evaluation relative to the words listed in the utterance word list after word removal are extracted from the stored distance dictionary; the extracted words are added to the utterance word list; the utterance density per predetermined unit time is calculated for each word in the utterance word list after word complementing; and words whose utterance density exceeds the predetermined value are extracted as topic words for the unit time. Topic words can therefore be identified more accurately than when semantically close words are not added.

According to the invention of claim 8, for each section delimited by the extracted topic turning points, the utterance frequency of each topic word listed in that section of the topic word list is calculated; topic words whose utterance frequency exceeds a predetermined value are extracted as important words; and an index that lists the topic words extracted as important words is generated for each section. More useful words can therefore be indexed for each topic.

According to the invention of claim 9, the index is generated by editing the topic words, for each section delimited by the topic turning points, based on one or more of the utterance time, the utterance density, the utterance frequency, and the semantic distance. The index can therefore be presented in a form suited to its purpose, such as Internet search or agenda search over meeting recordings.

Embodiments of the index generation device, index generation method, and index generation program according to the present invention are described in detail below with reference to the accompanying drawings. The following describes, in order, the main terms used in the embodiments, the outline and features of the index generation device according to Embodiment 1, the configuration and processing flow of the index generation device, and the effects of Embodiment 1.

[Explanation of Terms]
First, the main terms used in the following embodiments are explained. The "audio content (audio data)" used in the embodiments is not conventional text data but a collection of human speech, music, and video represented as digital data, which has come into wide use on the Internet and elsewhere in recent years. Specifically, data such as that shown in FIG. 3 corresponds to this.

"Speech recognition" is a technique for converting the audio content described above into text. Specifically, a computer analyzes the spoken language and extracts what is being said as character data. For example, performing speech recognition on the audio content (audio data) shown in FIG. 3 yields text data such as that shown in FIG. 4. Using text data obtained from audio content in this way makes it possible to search for audio content on the Internet and elsewhere.

An "index" is data that attaches a lookup key to a specific item in the data handled by a computer so that the item can be referenced quickly. As examples of indexes to be assigned, "strategy meeting" may be assigned to audio content recording a meeting, and a "song title" to musical audio content.

[Outline and Features of the Index Generation Device (Embodiment 1)]
Next, the outline and features of the index generation device according to Embodiment 1 are described with reference to FIG. 1. FIG. 1 is a diagram for explaining the outline and features of the index generation device according to Embodiment 1.

As shown in FIG. 1, when audio content (audio data) is input and an index creation instruction is received, the index generation device 20 performs speech recognition on the input audio content. As a specific example, when audio content such as that shown in FIG. 3 is input, the index generation device 20 creates the text data shown in FIG. 4 (more specifically, text data in which the data is associated with each utterance time, as shown in FIG. 5). The audio content may be input from another computer or the like over a network, or from a recording medium such as a floppy disk (FD) or CD.

In outline, the index generation device 20 according to Embodiment 1 generates an index to be assigned to the audio content from the text data obtained by speech recognition of that content. Its main feature is that it can generate a sufficiently useful and detailed index for the audio content as a whole.

To describe this main feature concretely, the index generation device 20 extracts the words contained in the text data and the utterance time of each word, and generates an utterance word list that lists the words in association with their utterance times (see (1) in FIG. 1). As a specific example, the device associates each word that appears in the text data (a word that was actually uttered) with the time at which it appeared (the time at which it was uttered), extracts utterance words such as "10:06:00, environment, education" and "10:06:04, library", and generates the utterance word list (see FIG. 6).
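As a non-limiting illustration of step (1), the following Python sketch builds an utterance word list from time-stamped recognition results. The input format, the stop-word set, and the name `build_utterance_word_list` are assumptions introduced here; real Japanese text would require morphological analysis to split words, whereas this sketch assumes space-separated words.

```python
from datetime import datetime

# Assumed stop words to discard; the specification does not define such a set.
STOP_WORDS = {"and", "at", "the"}

def build_utterance_word_list(segments):
    """Turn (utterance time, recognized text) pairs into (time, [words]) rows."""
    word_list = []
    for stamp, text in segments:
        t = datetime.strptime(stamp, "%H:%M:%S").time()
        words = [w for w in text.lower().split() if w not in STOP_WORDS]
        if words:
            word_list.append((t, words))
    return word_list

# Hypothetical recognizer output modeled on the example above.
segments = [
    ("10:06:00", "environment and education"),
    ("10:06:04", "at the library"),
]
for t, words in build_utterance_word_list(segments):
    print(t.strftime("%H:%M:%S") + ", " + ", ".join(words))
```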

The index generation device 20 then calculates, for each word in the generated utterance word list, an utterance density indicating the number of utterances per predetermined unit time, and extracts words whose utterance density exceeds a predetermined value as topic words for that unit time (see (2) in FIG. 1). Specifically, for each word in the generated utterance word list, the device calculates the utterance density X = Δh/Δt, where Δh is the number of utterances within the predetermined unit time Δt, and extracts words whose utterance density X exceeds a predetermined value (for example, 20) as topic words for that unit time (see FIG. 7). In this example, the index generation device 20 extracts "word A", "word B", and "word C" as topic words.
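As a non-limiting illustration of step (2), the following Python sketch divides the time axis into fixed windows of length Δt, counts the utterances Δh of each word per window, and keeps the words whose density Δh/Δt exceeds a threshold. The fixed, non-overlapping windows, the use of seconds, the threshold value, and the name `extract_topic_words` are assumptions introduced here; the text gives 20 as an example threshold without stating its units.

```python
from collections import Counter

def extract_topic_words(utterances, unit_seconds, threshold):
    """Return {window start: set of topic words}.

    utterances: iterable of (time in seconds, word).
    A word is a topic word in a window when count / unit_seconds > threshold.
    """
    counts = {}
    for t, word in utterances:
        window = int(t // unit_seconds) * unit_seconds
        counts.setdefault(window, Counter())[word] += 1
    topic = {}
    for window, counter in counts.items():
        hot = {w for w, h in counter.items() if h / unit_seconds > threshold}
        if hot:
            topic[window] = hot
    return topic

# "A" is uttered four times in the first minute, "B" once per minute.
utterances = [(0, "A"), (10, "A"), (20, "A"), (30, "A"), (5, "B"), (70, "B")]
print(extract_topic_words(utterances, unit_seconds=60, threshold=0.05))
```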

Next, the index generation device 20 generates a topic word list that lists the extracted topic words for each predetermined unit time from the utterance start time to the utterance end time of the audio content (see (3) in FIG. 1). In the example above, the index generation device 20 creates, in the form "utterance (appearance) time, word name", the entries "9:00:00, word A", "9:10:06, word A, word B", "9:17:00, word A, word B", and "9:20:12, word C" as the topic word list.
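As a non-limiting illustration of step (3), the following Python sketch lays out one row per unit time between the utterance start and end times and attaches the topic words found for that unit, leaving the row empty when there are none. Keying rows by window start time in seconds and the name `build_topic_word_list` are assumptions introduced here; the example in the text keys its rows by appearance time instead.

```python
def build_topic_word_list(topic_by_window, start, end, unit_seconds):
    """Return [(window start, [topic words])] covering start up to end."""
    rows = []
    t = start
    while t < end:
        rows.append((t, sorted(topic_by_window.get(t, ()))))
        t += unit_seconds
    return rows

# Hypothetical output of the topic word extraction, keyed by window start.
topic_by_window = {0: {"word A"}, 60: {"word A", "word B"}, 180: {"word C"}}
for row in build_topic_word_list(topic_by_window, start=0, end=240, unit_seconds=60):
    print(row)
```

Keeping empty rows makes gaps in the topic words visible to the subsequent turning point extraction.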

The index generation device 20 then extracts, as a topic turning point, each time in the generated topic word list at which none of the topic words continues to be listed (see (4) in FIG. 1). In the example above, the index generation device 20 plots the appearance density of each word in the topic word list in order of appearance (utterance), as shown in (4) of FIG. 1. It then extracts the time at which the plotted sections do not overlap (are discontinuous), that is, the appearance time of "word C" (T = 9:20:12), as the topic turning point.
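The turning-point rule can be illustrated with a minimal Python sketch. It rests on two assumptions that the description does not state: the topic word list is represented as `(time, words)` pairs, and every entry lists at least one topic word. The function name `find_turning_points` is hypothetical. A time is reported when its topic words share nothing with those of the preceding entry.

```python
def find_turning_points(topic_word_list):
    """Return the times at which no topic word of the previous entry continues.

    topic_word_list: list of (time, words) pairs in chronological order.
    Assumption: every entry lists at least one topic word.
    """
    turning_points = []
    previous = None
    for time, words in topic_word_list:
        current = set(words)
        # A turning point is a time whose topic words do not overlap
        # with the topic words listed immediately before it.
        if previous is not None and previous.isdisjoint(current):
            turning_points.append(time)
        previous = current
    return turning_points


topic_word_list = [
    ("9:00:00", ["word A"]),
    ("9:10:06", ["word A", "word B"]),
    ("9:17:00", ["word A", "word B"]),
    ("9:20:12", ["word C"]),
]
print(find_turning_points(topic_word_list))  # ['9:20:12']
```

With the example list from the text, only the entry for "word C" is disjoint from its predecessor, so 9:20:12 is the single turning point.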

Next, for each section delimited by the extracted topic turning points, the index generation device 20 generates an index from the topic words listed for that section in the topic word list (see (5) in FIG. 1). In the example above, the period "T0 to T" before the extracted topic turning point (the appearance time T of "word C") becomes "topic section 1", whose topic concerns "word A" and "word B", and the period "T to T1" from the appearance time T onward becomes "topic section 2", whose topic concerns "word C". For these topic sections the device generates indexes such as "9:00:00 to 9:20:11, word A, word B" and "9:20:12 to 9:30:00, word C".
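As a hedged illustration of this step, the sketch below builds the two index entries of the example from a topic word list and one turning point. The helper names (`to_seconds`, `to_hms`, `generate_index`) and the `(time, words)` representation are assumptions made for the sketch. Following the example, each section is taken to end one second before the next turning point.

```python
def to_seconds(hms):
    """Convert 'H:MM:SS' to seconds since midnight."""
    h, m, s = (int(p) for p in hms.split(":"))
    return h * 3600 + m * 60 + s


def to_hms(seconds):
    """Convert seconds since midnight back to 'H:MM:SS'."""
    h, rest = divmod(seconds, 3600)
    m, s = divmod(rest, 60)
    return f"{h}:{m:02d}:{s:02d}"


def generate_index(topic_word_list, turning_points, start, end):
    """Return [(section_start, section_end, [topic words])], times as 'H:MM:SS'."""
    # Each section runs from one boundary up to one second before the next.
    bounds = [to_seconds(start)] + sorted(to_seconds(t) for t in turning_points)
    bounds.append(to_seconds(end) + 1)
    index = []
    for begin, stop in zip(bounds, bounds[1:]):
        words = []
        for time, entry_words in topic_word_list:
            if begin <= to_seconds(time) < stop:
                # Collect each topic word once, in order of first appearance.
                words += [w for w in entry_words if w not in words]
        index.append((to_hms(begin), to_hms(stop - 1), words))
    return index


topic_word_list = [
    ("9:00:00", ["word A"]),
    ("9:10:06", ["word A", "word B"]),
    ("9:17:00", ["word A", "word B"]),
    ("9:20:12", ["word C"]),
]
for entry in generate_index(topic_word_list, ["9:20:12"], "9:00:00", "9:30:00"):
    print(entry)
# ('9:00:00', '9:20:11', ['word A', 'word B'])
# ('9:20:12', '9:30:00', ['word C'])
```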

In this way, a topic can be identified for a given time span of the audio content, an index identifying that topic can be assigned to that span, and a different index can be assigned to the span of a different topic. Because multiple topics contained in the audio content are extracted and an index is created for each of them, a sufficiently useful and detailed index can be created for the entire audio content, which is the main feature described above.

[Configuration of the Index Generation Device (Embodiment 1)]
Next, the configuration of the index generation device 20 shown in FIG. 1 is described with reference to FIGS. 2 to 10. FIG. 2 is a block diagram showing the configuration of the index generation device 20. FIG. 3 shows an example of the structure of the information stored in the audio content. FIGS. 4 and 5 show examples of the structure of the information stored in the text data. FIG. 6 shows an example of the structure of the information stored in the utterance word list. FIG. 7 shows an example of the utterance density calculation method according to Embodiment 1. FIG. 8 shows an example of the structure of the information stored in the topic word list. FIG. 9 shows an example of the topic turning point calculation method according to Embodiment 1. FIG. 10 shows an example of the structure of the information stored in the index.

As shown in FIG. 2, the index generation device 20 comprises a communication control I/F unit 21, a storage unit 22, and a control unit 23. The communication control I/F unit 21 controls communication of the various kinds of information exchanged with other devices connected to the index generation device 20 via a network or the like. For example, it receives from another device the audio content to be input to the index generation device 20.

The storage unit 22 stores the data and programs required for the various processes performed by the control unit 23. In addition, as elements closely related to the present invention, it holds text data 22a, an utterance word list 22b, a topic word list 22c, and an index 22d.

The text data 22a holds the text data obtained by speech recognition of the input audio content. For example, when audio content such as that shown in FIG. 3 is input, the index generation device 20 performs speech recognition and stores text such as that shown in FIG. 4, or the text further associated with times as shown in FIG. 5.

The utterance word list 22b holds the data generated by the utterance word list generation unit 23b described later. For example, as shown in FIG. 6, it stores each "utterance word" in association with "the time at which that word was spoken", as in "10:06:00, environment, education" and "10:06:02, elderly, environment, education, disabled".

The topic word list 22c holds the topic words listed by the topic word list generation unit 23d described later. For example, as shown in FIG. 8, each topic word extracted by the topic word extraction unit 23c is stored in association with the position on the time axis at which it appeared.

The index 22d holds the data generated by the index generation unit 23f described later. For example, as shown in FIG. 10, it stores each "topic section" delimited by the topic turning points extracted by the topic word turning point extraction unit 23e (described later) in association with the "index" generated by the index generation unit 23f, as in "10:00:00 to 10:06:00, future library, construction, central", "10:06:01 to 10:17:20, library, future library, land", and "10:17:21 to 10:30:00, library, request, Self-Defense Forces".

The control unit 23 has an internal memory for storing a control program such as an OS (Operating System), programs defining various processing procedures, and required data. In addition, as elements closely related to the present invention, it comprises a speech recognition unit 23a, an utterance word list generation unit 23b, a topic word extraction unit 23c, a topic word list generation unit 23d, a topic word turning point extraction unit 23e, and an index generation unit 23f, and executes various processes by means of these units. The utterance word list generation unit 23b corresponds to the "utterance word list generation means" recited in the claims. Likewise, the topic word extraction unit 23c corresponds to the "topic word extraction means", the topic word list generation unit 23d to the "topic word list generation means", the topic word turning point extraction unit 23e to the "topic word turning point extraction means", and the index generation unit 23f to the "index generation means".

The speech recognition unit 23a generates the text data 22a from the input audio content and stores it in the storage unit 22. For example, when audio content is input and an index generation instruction is received, the speech recognition unit 23a analyzes the input audio content (see FIG. 3), generates text data (see FIGS. 4 and 5), and stores it in the storage unit 22. The audio content handled by the speech recognition unit 23a may be data input in real time from a microphone or the like, or data that was recorded once and saved on a recording medium such as a CD or DVD. When speech data input in real time is recognized, the time of the speech recognition processing is used as the time information. When saved speech data is recognized, the time of the speech recognition processing corrected by the recording start time is used as the time information.

The utterance word list generation unit 23b extracts the words contained in the text data 22a together with the utterance time of each word, and generates the utterance word list 22b, which lists each word in association with its utterance time. For example, it reads the text data 22a that was generated by the speech recognition unit 23a and stored in the storage unit 22, extracts the words contained in the text data 22a and their utterance times, associates each word with its utterance time as in "10:06:00, environment, education" and "10:06:04, library", generates the utterance word list 22b (see FIG. 6), and stores it in the storage unit 22.
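One possible in-memory representation of the utterance word list is sketched below. It assumes each entry arrives as a string of the form "H:MM:SS, word, word, ...", as in the examples above. How the words themselves are segmented from the recognized text is outside the sketch. The names `to_seconds`, `parse_entry`, and `build_utterance_word_list` are hypothetical.

```python
from __future__ import annotations


def to_seconds(hms: str) -> int:
    """Convert 'H:MM:SS' into seconds since midnight."""
    h, m, s = (int(part) for part in hms.strip().split(":"))
    return h * 3600 + m * 60 + s


def parse_entry(entry: str) -> tuple[int, list[str]]:
    """Split 'time, word, word, ...' into (seconds, [words])."""
    time_part, *words = [field.strip() for field in entry.split(",")]
    return to_seconds(time_part), [w for w in words if w]


def build_utterance_word_list(entries: list[str]) -> list[tuple[int, list[str]]]:
    """Return the parsed entries sorted by utterance time."""
    return sorted((parse_entry(e) for e in entries), key=lambda item: item[0])


entries = ["10:06:04, library", "10:06:00, environment, education"]
print(build_utterance_word_list(entries))
# [(36360, ['environment', 'education']), (36364, ['library'])]
```

Storing times as seconds keeps the later per-unit-time counting a matter of integer arithmetic.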

The topic word extraction unit 23c calculates, for each word in the utterance word list 22b generated by the utterance word list generation unit 23b, an utterance density indicating the number of utterances per predetermined unit time, and extracts each word whose utterance density exceeds a predetermined value as a topic word for that unit time. Specifically, it reads the utterance word list 22b that was generated by the utterance word list generation unit 23b and stored in the storage unit 22. Then, as shown in (1) and (2) of FIG. 7, it calculates for each word in the utterance word list 22b the utterance density X, that is, the number of utterances Δh per predetermined unit time Δt, and extracts the words whose utterance density X exceeds a predetermined value (for example, 20) as topic words for that unit time.

As an example, let the unit time be "1 minute" and the threshold be "20". Calculating the utterance density X = Δh/Δt for the utterance word list covering "10:06:00 to 10:06:57" shown in FIG. 6 gives, as shown in (3) of FIG. 7, the following pairs of "word" and "utterance density": "environment, 15", "education, 4", "elderly, 7", "disabled, 1", and "library, 21". Of these, "library", which exceeds the threshold of 20, is extracted as the topic word for "10:06:00 to 10:06:57".
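The worked example can be reproduced with a short Python sketch. The function names `utterance_density` and `extract_topic_words` are hypothetical, and the input list is synthetic, built only to match the counts quoted from FIG. 7. The comparison is strict because the text says the density must exceed the threshold.

```python
from collections import Counter


def utterance_density(words, unit_minutes=1.0):
    """Return X = Δh / Δt for each word: utterance count divided by unit time."""
    counts = Counter(words)
    return {word: count / unit_minutes for word, count in counts.items()}


def extract_topic_words(words, unit_minutes=1.0, threshold=20.0):
    """Return the words whose utterance density exceeds the threshold."""
    density = utterance_density(words, unit_minutes)
    return sorted(word for word, x in density.items() if x > threshold)


# Synthetic one-minute utterance list with the counts from the example.
words = (
    ["environment"] * 15
    + ["education"] * 4
    + ["elderly"] * 7
    + ["disabled"] * 1
    + ["library"] * 21
)
print(extract_topic_words(words))  # ['library']
```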

The topic word list generation unit 23d generates the topic word list 22c, which lists the topic words extracted by the topic word extraction unit 23c in association with each predetermined unit time from the utterance start time to the utterance end time of the audio content. For example, as shown in FIGS. 8 and 9, it generates a topic word list 22c in which the topic words extracted by the topic word extraction unit 23c ("question", "reform", "parking lot", "library", "Yokosuka", "shopping street", and so on) are associated with the times at which each of them was spoken (appeared), and stores it in the storage unit 22. In the example above, if the input audio content is 10 minutes long, then over the 10 minutes from the utterance start time to the utterance end time it associates "times at 1-minute intervals" with "topic words", as in "10:06:00 to 10:06:57, library", "10:07:00 to 10:08:57, library, environment", and "10:09:00 to 10:09:57, education", generates a topic word list enumerating the topic words for each minute, and stores it in the storage unit 22.
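A minimal sketch of building such a per-unit-time topic word list follows. It assumes the utterances are available as `(second, word)` pairs and that windows are half-open intervals of `unit` seconds. The function name `build_topic_word_list` and its parameters are hypothetical. The small threshold in the usage lines is chosen only to keep the sample data short.

```python
from collections import Counter, defaultdict


def build_topic_word_list(utterances, start, end, unit=60, threshold=20):
    """Return [(window_start, [topic words])] for every unit window in [start, end).

    utterances: iterable of (second, word) pairs; all times are in seconds.
    """
    counts = defaultdict(Counter)
    for second, word in utterances:
        if start <= second < end:
            counts[(second - start) // unit][word] += 1

    topic_word_list = []
    for window, window_start in enumerate(range(start, end, unit)):
        # One window is one unit time, so the per-window count is the density.
        topics = sorted(w for w, n in counts[window].items() if n > threshold)
        topic_word_list.append((window_start, topics))
    return topic_word_list


utterances = [(0, "library")] * 3 + [(10, "environment")] + [(70, "education")] * 3
print(build_topic_word_list(utterances, start=0, end=180, threshold=2))
# [(0, ['library']), (60, ['education']), (120, [])]
```

Windows without any topic word are kept as empty entries so that the list covers the whole span from start to end.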

The topic word turning point extraction unit 23e extracts, as a topic turning point, each time in the topic word list 22c generated by the topic word list generation unit 23d at which none of the topic words continues to be listed. For example, as shown in FIG. 10, it reads "word A", "word B", and "word C" from the topic word list 22c that was created by the topic word list generation unit 23d and stored in the storage unit 22, and plots the appearance density of each word in order of appearance (utterance). Plotting "word A" and "word B" yields a continuous region in which their appearance times overlap (topic section 1), whereas plotting "word C" yields a region that is discontinuous from "word A" and "word B" (topic section 2). The unit therefore extracts, as the topic turning point, the time at which none of the topic words continues to be listed, that is, the appearance time "T" of "word C", where the regions become discontinuous.

The index generation unit 23f generates, for each section delimited by the topic turning points extracted by the topic word turning point extraction unit 23e, an index from the topic words listed for that section in the topic word list. For example, suppose the topic word turning point extraction unit 23e extracts the topic turning points "10:06:00, future library", "10:06:01, library", and "10:17:21, Self-Defense Forces". Then, as shown in FIG. 10, the index generation unit 23f generates, for each section delimited by these topic turning points, indexes from the topic words listed for that section in the topic word list, such as "10:00:00 to 10:06:00, future library, construction, central", "10:06:01 to 10:17:20, library, future library, land", and "10:17:21 to 10:30:00, library, request, Self-Defense Forces", and stores them in the storage unit 22.

[Processing by the Index Generation Device (Embodiment 1)]
Next, the processing performed by the index generation device is described with reference to FIG. 11. FIG. 11 is a flowchart showing the flow of the index generation processing.

As shown in FIG. 11, when audio content (audio data) is input and an index generation instruction is received (Yes in step S1101), the speech recognition unit 23a of the index generation device 20 generates the text data 22a from the input audio content by speech recognition and stores it in the storage unit 22 (step S1102).

Next, the utterance word list generation unit 23b of the index generation device 20 reads the stored text data 22a from the storage unit 22, extracts the words contained in the text data 22a together with the utterance time of each word, generates the utterance word list 22b listing each word in association with its utterance time, and stores it in the storage unit 22 (step S1103).

The topic word extraction unit 23c of the index generation device 20 then reads from the storage unit 22 the utterance word list 22b generated and stored by the utterance word list generation unit 23b, calculates for each word in the utterance word list 22b the utterance density indicating the number of utterances per predetermined unit time, and extracts each word whose utterance density exceeds a predetermined value as a topic word for that unit time (step S1104).

Next, the topic word list generation unit 23d generates the topic word list 22c, which lists the topic words extracted by the topic word extraction unit 23c in association with each predetermined unit time from the utterance start time to the utterance end time of the audio content, and stores it in the storage unit 22 (step S1105).

The topic word turning point extraction unit 23e then reads from the storage unit 22 the topic word list 22c generated by the topic word list generation unit 23d, and extracts as a topic turning point each time at which none of the topic words continues to be listed (step S1106).

Finally, the index generation unit 23f generates, for each section delimited by the topic turning points extracted by the topic word turning point extraction unit 23e, an index from the topic words listed for that section in the topic word list (step S1107).
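Steps S1103 to S1107 can be strung together in one compact sketch. Speech recognition (step S1102) is out of scope, so the sketch assumes its output is already available as `(second, word)` pairs. It also assumes, as a choice the description does not make, that windows without any topic word are skipped when looking for turning points. The function name `run_index_pipeline` and all parameters are hypothetical.

```python
from collections import Counter


def run_index_pipeline(utterances, start, end, unit=60, threshold=20):
    """Return [(section_start, section_end, [topic words])], times in seconds.

    utterances: (second, word) pairs; [start, end) is the span of the content.
    """
    # S1103: utterance word list, ordered by utterance time.
    word_list = sorted(utterances)

    # S1104 and S1105: topic words for every unit-time window.
    windows = []
    for w_start in range(start, end, unit):
        counts = Counter(w for t, w in word_list if w_start <= t < w_start + unit)
        windows.append((w_start, {w for w, n in counts.items() if n > threshold}))

    # S1106: a turning point is a window sharing no topic word with the
    # previous non-empty window (empty windows are skipped by assumption).
    turning_points, previous = [], set()
    for w_start, topics in windows:
        if not topics:
            continue
        if previous and previous.isdisjoint(topics):
            turning_points.append(w_start)
        previous = topics

    # S1107: one index entry per section between turning points.
    bounds = [start] + turning_points + [end]
    index = []
    for begin, stop in zip(bounds, bounds[1:]):
        words = set()
        for w_start, topics in windows:
            if begin <= w_start < stop:
                words |= topics
        index.append((begin, stop - 1, sorted(words)))
    return index


utterances = [(0, "a"), (5, "a"), (61, "a"), (62, "a"),
              (63, "b"), (64, "b"), (121, "c"), (122, "c")]
print(run_index_pipeline(utterances, start=0, end=180, threshold=1))
# [(0, 119, ['a', 'b']), (120, 179, ['c'])]
```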

[Effects of Embodiment 1]
As described above, Embodiment 1 extracts the words contained in the text data together with the utterance time of each word and generates an utterance word list listing each word in association with its utterance time. It calculates, for each word in the generated utterance word list, the utterance density indicating the number of utterances per predetermined unit time, and extracts each word whose utterance density exceeds a predetermined value as a topic word for that unit time. It generates a topic word list listing the extracted topic words in association with each predetermined unit time from the utterance start time to the utterance end time of the audio content, extracts as a topic turning point each time in the generated topic word list at which none of the topic words continues to be listed, and generates, for each section delimited by the topic turning points, an index from the topic words listed for that section in the topic word list. Consequently, a topic can be identified for a given time span of the audio content, an index identifying that topic can be assigned to that span, and a different index can be assigned to the span of a different topic, so that a sufficiently useful and detailed index can be created for the entire audio content.

Furthermore, Embodiment 1 extracts as a topic turning point each time in the generated topic word list at which none of the topic words continues to be listed, and generates, for each section delimited by the extracted topic turning points, an index from the topic words listed for that section in the topic word list. Multiple topics contained in the audio content can therefore be extracted and an index created for each of them, so that an even more useful and detailed index can be created for the entire audio content.

An embodiment of the present invention has been described above as Embodiment 1, but the present invention may be implemented in various forms other than the embodiment described. Further variations are therefore described below with reference to FIGS. 12 to 14 under three headings: (1) calculating utterance density while treating semantically close words as identical, (2) extracting topic words using a topic word list prepared in advance, and (3) treating slightly discontinuous time spans as continuous.

FIG. 12 shows an example of the distance dictionary according to Embodiment 2. FIG. 13 shows an example of the structure of the information stored in the utterance word list prepared in advance. FIG. 14 is a diagram for explaining how discontinuous time spans of a topic word are regarded as the same time span according to Embodiment 2.

(1) Calculating utterance density while treating semantically close words as identical
Embodiment 1 described the case in which topic words are extracted by calculating the utterance density of the words contained in the generated utterance word list. The present invention is not limited to this, however, and topic words may also be extracted by calculating the utterance density for words that are semantically close to the words contained in the utterance word list.

Specifically, the index generation device stores in advance a distance dictionary that defines the semantic distances between multiple words. When calculating the utterance density per predetermined unit time for a given word in the utterance word list, it treats any word whose semantic distance from the given word falls within a predetermined range in the stored distance dictionary as the same word as the given word, and calculates the utterance density per predetermined unit time accordingly.

For example, as shown in FIG. 12, when words close to "library" (collection, dictionary, IT adoption, location, user, book, and so on) are contained in the utterance word list, the index generation device regards "library" as having been spoken, even though it was not actually spoken and does not appear in the utterance word list. It calculates the utterance density per predetermined unit time for "library" as well, creates the topic word list, extracts the topic turning points, and generates the index.

In this variation, a distance dictionary defining the semantic distances between multiple words is stored, and when the utterance density per predetermined unit time is calculated for a given word in the utterance word list, any word whose semantic distance from the given word falls within a predetermined range in the stored distance dictionary is treated as the same word as the given word. For example, suppose "museum" and "citizen" are determined to be topic words from the words contained in the audio content. Words such as "construction plan" and "construction site", which were not contained in the audio content, can then be taken into account in the utterance density calculation on the grounds that they are semantically close to "museum" and "citizen", so that the topic words "museum" and "citizen" are identified more reliably. As a result, an even more useful and detailed index can be created for the entire audio content.
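The following sketch shows one way the distance dictionary could feed into the density calculation. The dictionary layout (a mapping from `(head_word, other_word)` to a numeric distance), the distance values, and the name `density_with_distance_dictionary` are all assumptions; the actual contents of FIG. 12 are not reproduced here. Utterances of any word within `max_distance` of a head word are counted toward that head word, even if the head word itself was never spoken.

```python
from collections import Counter


def density_with_distance_dictionary(words, distance, max_distance=1, unit=1.0):
    """Return per-word utterance density, crediting near words to their head word.

    distance: {(head_word, other_word): semantic distance} (hypothetical layout).
    """
    counts = Counter(words)
    merged = Counter(counts)
    for (head, other), d in distance.items():
        # A sufficiently close word is treated as an utterance of the head word.
        if d <= max_distance and other in counts:
            merged[head] += counts[other]
    return {word: n / unit for word, n in merged.items()}


# Hypothetical distances; "library" itself is never spoken in this sample.
distance = {
    ("library", "collection"): 1,
    ("library", "book"): 1,
    ("library", "tax"): 5,
}
words = ["collection"] * 2 + ["book"] * 3 + ["tax"]
print(density_with_distance_dictionary(words, distance)["library"])  # 5.0
```

The near words keep their own counts in this sketch; whether they should also remain candidates for topic words is a design choice the text leaves open.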

(2) Extracting topic words using a topic word list prepared in advance
Embodiment 1 described the case in which each word whose calculated utterance density exceeds a predetermined value is extracted as a topic word for the unit time. The present invention is not limited to this, however, and a word whose calculated utterance density exceeds the predetermined value may be extracted as a topic word for the unit time only on condition that it is contained in a topic word list stored in advance.

Specifically, the index generation device stores a topic word extraction list such as that shown in FIG. 13, and extracts a word whose utterance density exceeds the predetermined value as a topic word for the unit time only on condition that the word is listed in the stored topic word extraction list.

For example, a word whose calculated utterance density exceeds the predetermined value is extracted as a topic word if it is contained in a word list such as that shown in FIG. 13 (for example, decentralization, home ownership, traffic safety, tax, library).

In this variation, a topic word extraction list enumerating the words to be extracted as topic words is stored, and a word whose utterance density exceeds the predetermined value is extracted as a topic word for the unit time only on condition that it is listed in the stored topic word extraction list. For example, specific words such as "employment", "bathing beach", and "volunteer" are prepared in advance, and each extracted word is checked against them. Topic words can thus be identified more accurately, and as a result the topic turning points can be extracted more accurately.
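The additional condition amounts to a membership test on top of the threshold test, as the sketch below shows. The function name `extract_listed_topic_words` and the density values are hypothetical; the extraction list reuses the example words given for FIG. 13.

```python
def extract_listed_topic_words(density, extraction_list, threshold=20):
    """Return words that exceed the threshold and appear in the extraction list."""
    return sorted(
        word
        for word, x in density.items()
        if x > threshold and word in extraction_list
    )


extraction_list = {"decentralization", "home ownership", "traffic safety", "tax", "library"}
density = {"library": 21, "environment": 25, "tax": 4}  # hypothetical densities
print(extract_listed_topic_words(density, extraction_list))  # ['library']
```

Here "environment" is dense enough but not listed, and "tax" is listed but not dense enough, so only "library" remains.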

(3) Treating slightly discontinuous time spans as continuous
Embodiment 1 described the case in which each time in the generated topic word list at which none of the topic words continues to be listed is extracted as a topic turning point. The present invention is not limited to this, however, and when the same topic word is listed again before a predetermined time interval has elapsed, the listing of that topic word may be regarded as having continued.

For example, as shown in FIG. 14, when the same topic word is listed again before the predetermined time interval has elapsed ((1) to (3) in FIG. 14), the topic turning points are extracted on the assumption that the listing of that topic word ("land") has continued.

In this variation, when the same topic word is listed again in the topic word list before the predetermined time interval has elapsed, the topic turning points are extracted on the assumption that the listing of that topic word has continued. For example, when a topic word is extracted at discrete times, the sections in which it was detected can be merged and the topic extracted as a single section. This prevents topic words from being extracted in needlessly fine detail and unnecessary information from being included in the index as a whole.
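The merging of nearby appearances can be sketched as a simple interval merge. The name `merge_appearances`, the representation of appearance times in seconds, and the `max_gap` parameter (standing in for the "predetermined time interval") are assumptions made for the sketch.

```python
def merge_appearances(times, max_gap):
    """Merge appearance times of one topic word into continuous (start, end) spans.

    Two consecutive appearances belong to the same span when they are
    no more than max_gap seconds apart.
    """
    spans = []
    for t in sorted(times):
        if spans and t - spans[-1][1] <= max_gap:
            spans[-1][1] = t  # extend the current span
        else:
            spans.append([t, t])  # start a new span
    return [tuple(span) for span in spans]


# Appearances at 0 s, 60 s and 180 s are merged; 600 s starts a new span.
print(merge_appearances([0, 60, 180, 600], max_gap=120))
# [(0, 180), (600, 600)]
```

Turning-point extraction would then operate on the merged spans rather than on the raw appearance times.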

The embodiments above described the case in which topic words are extracted from the utterance word list created from the text data. The present invention is not limited to this, however, and topic words may be extracted from the result of applying processing such as word removal and word completion to the created utterance word list.

Accordingly, Embodiment 3 below describes an example in which topic words are extracted from the result of applying processing such as word removal and word completion to the created utterance word list. The following sections describe the overview and features of the index generation device according to Embodiment 3, the configuration of the index generation device, and the effects of Embodiment 3.

[Overview and Features of the Index Generation Device (Embodiment 3)]
Next, the overview and features of the index generation device according to Embodiment 3 are described with reference to FIG. 15. FIG. 15 is a diagram showing the overview and features of the index generation device according to Embodiment 3.

As shown in FIG. 15, the index generation device 40 stores, in the storage unit 22, a distance dictionary that defines the semantic distances between multiple words. As in Embodiment 1, when audio content (audio data) is input and an index creation instruction is received, the index generation device 40 performs speech recognition on the input audio content.

Subsequently, the index generation device 40 extracts the words contained in the text data created by speech recognition together with the utterance time of each word, and generates an utterance word list in which the words and utterance times are listed in association with each other. As a specific example, the index generation device 40 associates each word with its utterance time and creates an utterance word list such as "9:00:00, word A", "9:06:10, word B", "9:20:00, word C", "9:25:00, word D", and "9:30:00, word E".

Then, using the distance dictionary stored in the storage unit 42 (see FIGS. 17 to 19), the index generation device 40 evaluates the semantic distances between the words contained in the generated utterance word list and removes words with a low evaluation from the list. In the example above, from the entries "9:00:00, word A", "9:06:10, word B", "9:20:00, word C", "9:25:00, word D", and "9:30:00, word E" in the created utterance word list, the index generation device 40 uses the distance dictionary stored in the storage unit 22 (see FIGS. 17 to 19) to delete the semantically distant entry "9:06:10, word B". As a result, the utterance word list holds "9:00:00, word A", "9:20:00, word C", "9:25:00, word D", and "9:30:00, word E".

In more detail, suppose the utterance word list contains "art museum", "art", and "polarization". As shown in FIG. 19, "art museum" and "art" are semantically close to each other, whereas "polarization" is semantically far from both, so "polarization" is deleted from the utterance word list. Here, two words are judged to be "semantically close" when they are directly connected or adjacent in FIG. 19, and "semantically far" when they are located apart from each other.

Subsequently, using the distance dictionary stored in the storage unit 42 (see FIGS. 17 to 19), the index generation device 40 extracts from the distance dictionary words that have a high semantic-distance evaluation with respect to the words listed in the utterance word list after word removal, and adds the extracted words to the utterance word list. In the example above, the index generation device 40 uses the distance dictionary stored in the storage unit 22 to extract "word F", which is semantically close to both "word A" and "word C" in the utterance word list from which "word B" has been removed, and adds it to the utterance word list.

In more detail, suppose the utterance word list contains "art museum" and "art". As shown in FIG. 19, "activity" is semantically close to both "art museum" and "art", so "activity" is added to the utterance word list.

Then, for each word contained in the generated utterance word list, the index generation device 40 calculates an utterance density indicating the number of utterances per predetermined unit time, extracts words whose utterance density exceeds a predetermined value as the topic words for that unit time, and generates a topic word list that lists the extracted topic words for each predetermined unit time from the utterance start time to the utterance end time of the audio content. The specific processing, such as the utterance density calculation, is the same as in Embodiment 1 and is not described in detail here. In the example above, "word A", "word C", "word D", "word E", and "word F" are assumed to be extracted as topic words.
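The density-based extraction described above can be illustrated with a minimal sketch. Because the details of the calculation are given in Embodiment 1 rather than here, the sketch assumes fixed, non-overlapping unit-time windows; the function name `extract_topic_words`, the time representation in seconds, and the sample data are all hypothetical.

```python
from collections import Counter, defaultdict


def extract_topic_words(utterances, start, unit_seconds, threshold):
    """Return {slot: [topic words]} for each unit-time slot.

    utterances: list of (time_in_seconds, word) pairs (hypothetical format).
    A word is a topic word in a slot when its count in that slot
    (its utterance density) exceeds the threshold.
    """
    counts = defaultdict(Counter)
    for seconds, word in utterances:
        slot = (seconds - start) // unit_seconds  # index of the unit-time window
        counts[slot][word] += 1
    return {
        slot: sorted(w for w, n in counter.items() if n > threshold)
        for slot, counter in sorted(counts.items())
    }


# Hypothetical utterance word list: times are seconds from the start.
utterances = [
    (0, "word A"), (30, "word A"), (370, "word B"),
    (620, "word C"), (700, "word C"), (900, "word C"),
]
topic_list = extract_topic_words(utterances, start=0, unit_seconds=600, threshold=1)
```

In this sample, "word B" is spoken only once in its window and is therefore not listed, while "word A" and "word C" become the topic words of the first and second windows.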

Then, the index generation device 40 extracts, as a topic turning point, a time in the generated topic word list at which none of the topic words continues to be listed. As described in Embodiment 1, topic turning points are extracted by plotting the topic words as in FIG. 4. Here, the appearance time of "word C" is assumed to be extracted as a topic turning point.
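One possible reading of this rule is sketched below. It assumes that the topic word list is a sequence of sets, one per unit time, and that a turning point is a slot in which no topic word from the preceding slot is still listed. This interpretation, the function name `find_turning_points`, and the sample data are assumptions; the plotting procedure of FIG. 4 is not reproduced.

```python
def find_turning_points(topic_list):
    """Return indices of slots at which no previous topic word continues.

    topic_list: list of sets of topic words, one set per unit time
    (hypothetical representation of the topic word list).
    """
    points = []
    for i in range(1, len(topic_list)):
        previous, current = topic_list[i - 1], topic_list[i]
        # A turning point: the previous slot had topic words,
        # and none of them is still listed in the current slot.
        if previous and not (previous & current):
            points.append(i)
    return points


# Hypothetical topic word list over four unit-time slots.
slots = [
    {"word A", "word F"},
    {"word A"},
    {"word C", "word D"},
    {"word D", "word E"},
]
turning_points = find_turning_points(slots)
```

With this data the only turning point is slot 2, where "word C" first appears and "word A" stops being listed.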

Then, for each section delimited by the extracted topic turning points, the index generation device 40 calculates the utterance frequency of each topic word listed in that section of the topic word list, extracts topic words whose utterance frequency exceeds a predetermined value as important words, and generates an index that lists, for each section, the topic words extracted as important words. As a specific example, words whose utterance frequency is less than 1 and words contained in the unnecessary word list are removed before extraction.

In the example above, suppose that in "section 1", delimited by the extracted topic turning point "word C", "word A" and "word F" have an utterance frequency of 1 or more; that in "section 2", "word C", "word D", "word E", and "word F" also have an utterance frequency of 1 or more; and that "word D" appears in the unnecessary word list. The index generation device 40 then generates an index that lists "word A" and "word F" for "section 1" and "word C", "word E", and "word F" for "section 2" as the extracted topic words.
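The worked example above can be sketched as a simple filter. The function name `build_index` and the concrete frequency values are hypothetical; the text only states that each frequency is 1 or more and that "word D" is on the unnecessary word list.

```python
def build_index(sections, min_frequency, unnecessary_words):
    """Keep, per section, the words that are frequent enough and not excluded."""
    index = {}
    for name, frequencies in sections.items():
        index[name] = [
            word
            for word, count in frequencies.items()
            if count >= min_frequency and word not in unnecessary_words
        ]
    return index


# Utterance frequencies per section (the counts themselves are hypothetical).
sections = {
    "section 1": {"word A": 2, "word F": 1},
    "section 2": {"word C": 3, "word D": 1, "word E": 1, "word F": 2},
}
index = build_index(sections, min_frequency=1, unnecessary_words={"word D"})
```

The resulting index lists "word A" and "word F" for section 1 and "word C", "word E", and "word F" for section 2, matching the example in the text.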

Subsequently, for each section delimited by the topic turning points, the index generation device 40 edits the topic words on the basis of one or more of utterance time, utterance density, utterance frequency, and semantic distance to generate the index. For example, the index generation device 40 edits the index generated from the extracted topic words by combining one or more of these criteria, such as "display from the left in order of earliest utterance time" or "display with font sizes varied in descending order of utterance density" (see FIGS. 38 and 39).
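The two editing rules quoted above might be sketched as follows. The entry fields, the font sizes, and the function names are hypothetical and do not reflect the layouts of FIGS. 38 and 39.

```python
def order_by_time(entries):
    """Words ordered left to right by earliest utterance time."""
    return [e["word"] for e in sorted(entries, key=lambda e: e["first_time"])]


def font_sizes(entries, sizes=(24, 18, 12)):
    """Assign larger fonts to words with higher utterance density.

    sizes is a hypothetical list of point sizes; words beyond its length
    share the smallest size.
    """
    ranked = sorted(entries, key=lambda e: e["density"], reverse=True)
    return {e["word"]: sizes[min(i, len(sizes) - 1)] for i, e in enumerate(ranked)}


# Hypothetical entries for one section (times in seconds from the start).
entries = [
    {"word": "word E", "first_time": 1800, "density": 2},
    {"word": "word C", "first_time": 1200, "density": 5},
    {"word": "word F", "first_time": 1500, "density": 3},
]
ordered = order_by_time(entries)
sizes = font_sizes(entries)
```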

Consequently, according to Embodiment 3, topic words can be identified more accurately than when they are extracted from extracted word data from which semantically distant words have not been deleted. Topic words can also be identified more accurately than when semantically close words are not added. Furthermore, more useful words can be created as an index for each topic, and the index can be displayed in a manner suited to the purpose of use, such as Internet search or agenda search over conference recordings.

[Configuration of the Index Generation Device (Embodiment 3)]
Next, the configuration of the index generation device 40 shown in FIG. 15 will be described with reference to FIGS. 16 to 39. FIG. 16 is a block diagram showing the configuration of the index generation device 40. FIGS. 17 to 19 show examples of the distance dictionary. FIG. 20 shows an example of utterance word extraction conditions, and FIG. 21 an example of utterance word extraction results. FIG. 22 shows an example of shortest distance calculation conditions, and FIG. 23 an example of shortest distance calculation results. FIG. 24 shows an example of keyword extraction conditions, FIG. 25 an example of evaluation by cumulative count per shortest inter-word distance, and FIG. 26 an example of the word extraction result (utterance word list) after word deletion. FIG. 27 shows an example of keyword extraction conditions, and FIG. 28 an example of evaluation by average distance and shortened distance.

FIG. 29 shows an example of the word extraction result (utterance word list) after word deletion, FIG. 30 an example of complementary word extraction conditions, and FIG. 31 an example of the word extraction result (utterance word list) after word complementing. FIG. 32 shows an example of utterance density aggregation conditions, and FIG. 33 an example of utterance density aggregation results. FIG. 34 shows an example of important word extraction conditions, FIG. 35 an example of a list of words that should not be extracted as an index, and FIG. 36 an example of the word extraction result after such words have been deleted. FIG. 37 shows an example of index generation conditions, and FIGS. 38 and 39 show examples of generated indexes.

As shown in the figure, the index generation device 40 comprises a communication control I/F unit 41, a storage unit 42, and a control unit 43. The storage unit 42 holds text data 42a, an utterance word list 42b, a topic word list 42c, and an index 42d. The control unit 43 includes a speech recognition unit 43a, an utterance word list generation unit 43b, a topic word extraction unit 43c, a topic word list generation unit 43d, a topic word turning point extraction unit 43e, an index generation unit 43f, a word removal unit 51, a word complementing unit 52, and an important word extraction unit 53. The following description follows the processing flow given above under "overall configuration of the index generation device".

The utterance word list generation unit 43b corresponds to the "utterance word list generation means" recited in the claims. Likewise, the topic word extraction unit 43c corresponds to the "topic word extraction means", the topic word list generation unit 43d to the "topic word list generation means", the topic word turning point extraction unit 43e to the "topic word turning point extraction means", the index generation unit 43f to the "index generation means", the word removal unit 51 to the "word removal means", the word complementing unit 52 to the "word complementing means", and the important word extraction unit 53 to the "important word extraction means".

The speech recognition unit 43a, the utterance word list generation unit 43b, the text data 42a, and the topic word list 42c have the same functions as the speech recognition unit 23a, the utterance word list generation unit 23b, the text data 22a, and the topic word list 22c described in Embodiment 1, so their detailed description is omitted here.

The utterance word list 42b stores data in which the words contained in the text data and their utterance times, extracted by the utterance word list generation unit 43b, are listed in association with each other. Specifically, as shown in FIG. 20, the utterance word list 42b holds word extraction conditions such as "processing execution unit (step), 1" under the columns "item", which indicates the setting item, and "value", which indicates the value to be set. Based on these conditions, as shown in FIG. 21, it stores entries such as "1, city, 16:29:00" and "2, opinion, 16:29:10" under the columns "extracted word node number", which indicates the node number of the extracted word, "extracted word", which indicates the extracted word, and "time information", which indicates the time at which the word appeared. The word extraction conditions shown in FIG. 20 can be changed arbitrarily; here, because "value" is set to 1, word extraction is executed one step (one word) at a time.

The distance dictionary storage unit 50 stores a distance dictionary that defines the semantic distances between a plurality of words. Specifically, as shown in FIG. 17, it stores entries such as "1, city, noun, 2.971" and "2, opinion, noun, 3.429" under the columns "node number", which uniquely identifies a word, "node name", which gives the name of the word, "part of speech", and "all-node average distance", which is used as evaluation information. As shown in FIG. 18, it also stores pairs such as "1, 9" and "2, 4" under the columns "link source node number" and "link destination node number", which indicate the semantic connections between words. Further, as shown in FIG. 19, it stores a knowledge network or the like that depicts the semantic distances between words on the basis of FIGS. 17 and 18.
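As a rough illustration, the node table and link table described above could be held as a dictionary and an adjacency map. Only the two node rows and two link pairs quoted in the text are used; the field names, the function name `build_adjacency`, and the treatment of links as undirected are assumptions.

```python
from collections import defaultdict

# Node table: only the two rows quoted from FIG. 17 are reproduced.
nodes = {
    1: {"name": "city", "part_of_speech": "noun", "all_node_average": 2.971},
    2: {"name": "opinion", "part_of_speech": "noun", "all_node_average": 3.429},
}

# Link table: only the two pairs quoted from FIG. 18 are reproduced.
links = [(1, 9), (2, 4)]


def build_adjacency(links):
    """Build an adjacency map, treating each link as undirected (an assumption)."""
    adjacency = defaultdict(set)
    for source, destination in links:
        adjacency[source].add(destination)
        adjacency[destination].add(source)
    return dict(adjacency)


adjacency = build_adjacency(links)
```

The adjacency map plays the role of the knowledge network of FIG. 19: the semantic distance between two words is the number of links on the shortest path between their nodes.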

The word removal unit 51 uses the distance dictionary stored in the distance dictionary storage unit 50 to evaluate the semantic distances between the words contained in the utterance word list 42b generated by the utterance word list generation unit 43b, and removes words with a low evaluation from the utterance word list. Specifically, it evaluates semantic distance by either "evaluation by cumulative count per shortest inter-word distance" or "evaluation by average distance and shortened distance of words", and removes words with a low evaluation from the utterance word list.

(Shortest inter-word distance calculation)
First, the shortest inter-word distance calculation will be described. As shown in FIG. 22, the word removal unit 51 holds shortest distance calculation conditions such as "5, 4" for the "distance calculation window width", which is the maximum number of words whose shortest distance calculation results are held in a memory (not shown), and the "calculation target shortest distance", which is the maximum value up to which the shortest inter-word distance is calculated. When the number of input words exceeds the distance calculation window width, the oldest entries are overwritten first; when a shortest distance calculation result exceeds the calculation target shortest distance, "N" is entered. These conditions need not be held in advance and may be set manually each time the processing is executed.

Then, using the distance dictionary stored in the distance dictionary storage unit 50 and the held shortest distance calculation conditions, the word removal unit 51 calculates the shortest inter-word distance for each word stored in the utterance word list 42b and stores the result in the memory as "shortest distance data". Specifically, as shown in FIG. 23, the shortest inter-word distance of each word stored in the utterance word list 42b is calculated under the columns "area, node name, node number, time information, shortest distance". For example, using FIG. 19, the word removal unit 51 searches from both the start node number and the end node number of the two nodes to be calculated, and takes the number of steps required by the search as the shortest distance.

To explain the above example more concretely, FIG. 19 shows that the semantic distance between "city" and "opinion" is 3 and that the semantic distance between "opinion" and "cholera" is 4. In this way, the word removal unit 51 uses FIG. 19 to calculate the semantic distance (shortest distance) between words, obtains the shortest inter-word distance for each word stored in the utterance word list 42b as shown in FIG. 23, and stores the "shortest distance data" in the memory. The shortest distance may also be calculated with a known method such as Dijkstra's algorithm or the Warshall-Floyd algorithm.
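A minimal sketch of the distance calculation follows. It uses a plain breadth-first search from one side rather than the two-sided search described above, which gives the same distance on an unweighted network. Node numbers 1 (city) and 2 (opinion) and the links (1, 9) and (2, 4) come from the text; every other node and link, including the use of node 3 for "cholera", is hypothetical and chosen only so that the quoted distances of 3 and 4 are reproduced.

```python
from collections import defaultdict, deque

# Links (1, 9) and (2, 4) are quoted from the text; the rest are hypothetical.
LINKS = [(1, 9), (2, 4), (4, 9), (4, 5), (5, 6), (6, 3)]


def build_adjacency(links):
    """Undirected adjacency map (treating links as undirected is an assumption)."""
    adjacency = defaultdict(set)
    for a, b in links:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return dict(adjacency)


def shortest_distance(adjacency, start, goal, max_distance):
    """Breadth-first search; returns "N" when the distance exceeds max_distance."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, distance = queue.popleft()
        if distance == max_distance:
            continue  # do not expand beyond the calculation target distance
        for neighbor in adjacency.get(node, ()):
            if neighbor == goal:
                return distance + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, distance + 1))
    return "N"


adjacency = build_adjacency(LINKS)
city_opinion = shortest_distance(adjacency, 1, 2, max_distance=4)
opinion_cholera = shortest_distance(adjacency, 2, 3, max_distance=4)
city_cholera = shortest_distance(adjacency, 1, 3, max_distance=4)
```

On this toy network the city-to-cholera path has five links, so with a calculation target shortest distance of 4 the result is recorded as "N".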

(Evaluation by cumulative count per shortest inter-word distance)
Next, evaluation by cumulative count per shortest inter-word distance will be described. The word removal unit 51 reads the calculated "shortest distance data" from the memory and, for each word, uses as its evaluation the cumulative number of shortest distances to the other words in the utterance word list 42b, counted separately for each distance. Words with a high evaluation are extracted as keywords and stored in the utterance word list 42b. In other words, the word removal unit 51 deletes from the utterance word list 42b the words that were not extracted as keywords, on the grounds that their evaluation is low.

Specifically, as shown in FIG. 24, the word removal unit 51 holds keyword extraction conditions such as "5, 1, 1, 1, 3" for the "evaluation window width", which is the maximum number of words subject to keyword extraction evaluation, and for the "cumulative count for distance 0", "cumulative count for distance 1", "cumulative count for distance 2", and "cumulative count for distance 3", each of which is a threshold used in keyword extraction evaluation. These conditions need not be held in advance and may be set manually each time the processing is executed.

With this configuration, the word removal unit 51 evaluates, for each distance, the cumulative count of shortest distances between the words contained in the utterance word list 42b, on the basis of the held keyword extraction conditions and the distance dictionary (FIGS. 17 to 19) from the distance dictionary storage unit 50. The result is shown in FIG. 25. When any of a word's per-distance cumulative counts is equal to or greater than the extraction target cumulative count for that distance, the word removal unit 51 extracts the word as a keyword.

For example, when keywords are extracted under the above conditions (shortest distance 0 = 1, shortest distance 1 = 1, shortest distance 2 = 1, shortest distance 3 = 3), in (1) to (3) of FIG. 25 none of the per-distance cumulative counts of the words (city, opinion, cholera) reaches the extraction target cumulative count for its distance, so none of the words is extracted as a keyword. In (4) of FIG. 25, the cumulative count for shortest distance 3 of "city" in area 1 becomes 3, which meets the keyword extraction condition (shortest distance 3 = 3), so the word removal unit 51 extracts "city" as a keyword. Similarly, for "2, opinion" in areas 2 and 4, the cumulative count for shortest distance 1 becomes 1, which meets the keyword extraction condition (shortest distance 1 = 1), so the word removal unit 51 extracts each of them as a keyword. The result of keyword extraction is shown in FIG. 26.
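The threshold check described above can be sketched as follows, assuming a single evaluation window and a lookup table of pairwise distances. The thresholds are the ones quoted in the text. The distances city-opinion = 3 and opinion-cholera = 4 are also quoted; the remaining distances, the window contents, and the function names are hypothetical, so the sketch does not reproduce the step-by-step progression of FIG. 25.

```python
from collections import Counter

# Thresholds quoted in the text: distance -> required cumulative count.
THRESHOLDS = {0: 1, 1: 1, 2: 1, 3: 3}

# Pairwise shortest distances; only the first two values are from the text.
PAIRS = {
    frozenset(("city", "opinion")): 3,
    frozenset(("opinion", "cholera")): 4,
    frozenset(("city", "citizen")): 1,     # hypothetical
    frozenset(("opinion", "citizen")): 2,  # hypothetical
}


def lookup(a, b):
    """Return the stored distance, 0 for identical words, or "N" if unknown."""
    if a == b:
        return 0
    return PAIRS.get(frozenset((a, b)), "N")


def extract_keywords(window, distance, thresholds):
    """Keep words whose count at any distance reaches that distance's threshold."""
    keywords = []
    for i, word in enumerate(window):
        counts = Counter()
        for j, other in enumerate(window):
            if i == j:
                continue
            d = distance(word, other)
            if d != "N":
                counts[d] += 1
        if any(counts[d] >= need for d, need in thresholds.items()):
            keywords.append(word)
    return keywords


window = ["city", "opinion", "cholera", "citizen"]
keywords = extract_keywords(window, lookup, THRESHOLDS)
```

Here "cholera" has only one recorded distance, of length 4, which is outside the evaluated distances 0 to 3, so it is the only word not extracted.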

As a comparison of FIG. 25 (before semantically distant words are removed) with FIG. 26 (after removal) shows, the word removal unit 51 extracts "city", "opinion", "citizen", and "art museum" as keywords and does not extract "cholera" or "disappointment". In other words, the word removal unit 51 has deleted "cholera" and "disappointment" from the utterance word list 42b.

(Evaluation by average distance and shortened distance of words)
Next, evaluation by the average distance and shortened distance of words will be described. The word removal unit 51 reads the calculated "shortest distance data" from the memory, evaluates each word by the average distance and shortened distance between the words contained in the utterance word list 42b, extracts words with a high evaluation as keywords, and stores them in the utterance word list 42b.

Specifically, as shown in FIG. 27, the word removal unit 51 holds keyword extraction conditions such as "5, 3, 3.0, 1.0" for the "evaluation window width", which is the maximum number of words subject to keyword extraction evaluation, the "number of distance calculation results to average", which is the maximum number of distance calculation results used to calculate the average distance, the "extraction target average distance", which is the average distance threshold for keyword extraction, and the "extraction target shortened distance", which is the threshold for the shortened distance (all-node average distance minus average distance).

With this configuration, the word removal unit 51 evaluates the words contained in the utterance word list 42b by average distance and shortened distance, on the basis of the held keyword extraction conditions and the distance dictionary (FIGS. 17 to 19) from the distance dictionary storage unit 50. The result is shown in FIG. 28. When a word's average distance is equal to or less than the extraction target average distance setting, or its shortened distance is equal to or greater than the extraction target shortened distance setting, the word removal unit 51 extracts the word as a keyword.

When calculating the average shortest distance, the "number of distance calculation results to average" is read from the keyword extraction conditions. If the evaluation window contains more shortest distance calculation results than this setting, only that number of results, taken from the shortest distances upward, is used for the average. In the average calculation, results with a distance of 0 are included, and results of "N" are excluded. The result of keyword extraction is shown in FIG. 29.
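The averaging rule and the two extraction conditions can be sketched as below. The default settings (3 results, 3.0, 1.0) and the all-node average distance 2.971 for "city" are quoted from the text; the distance lists, the value 4.2, the function names, and the choice to reject a word with no usable distances are hypothetical.

```python
def average_distance(results, max_results):
    """Average of the shortest `max_results` distances; "N" entries are skipped."""
    usable = sorted(d for d in results if d != "N")[:max_results]
    if not usable:
        return None  # no usable result in the window
    return sum(usable) / len(usable)


def is_keyword(all_node_average, results, max_results=3,
               max_average=3.0, min_shortened=1.0):
    """Apply the two conditions: small average distance or large shortened distance."""
    average = average_distance(results, max_results)
    if average is None:
        return False  # assumption: a word with no usable distances is not extracted
    shortened = all_node_average - average  # all-node average minus average distance
    return average <= max_average or shortened >= min_shortened


# "city": all-node average 2.971 is from the text; the distances are hypothetical.
city_average = average_distance([3, 1, "N", 2, 4], 3)
city_is_keyword = is_keyword(2.971, [3, 1, "N", 2, 4])

# A hypothetical distant word with all-node average 4.2 and one usable distance.
distant_is_keyword = is_keyword(4.2, [4, "N", "N"])
```

For the "city" sample, the three shortest usable distances are 1, 2, and 3, giving an average of 2.0, which satisfies the average distance condition.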

As a comparison of FIG. 28 (before semantically distant words are removed) with FIG. 29 (after removal) shows, the word removal unit 51 extracts "city", "opinion", "citizen", and "art museum" as keywords and does not extract "cholera" or "disappointment". In other words, the word removal unit 51 has deleted "cholera" and "disappointment" from the utterance word list 42b.

In the "evaluation by cumulative count per shortest inter-word distance" and the "evaluation by average distance and shortened distance of words" described above, index values used in general network analysis, such as the all-node average distance, the number of links, closeness centrality, and betweenness centrality, may also be used as keyword extraction conditions. In addition, an upper limit may be placed on the number of keywords to be extracted, and keywords may be extracted in order of the magnitude of such index values.

Returning to FIG. 16, the word complementing unit 52 uses the distance dictionary stored in the storage unit 42 to extract from the distance dictionary words that have a high semantic-distance evaluation with respect to the words listed in the utterance word list 42b after word removal, and adds the extracted words to the utterance word list. Specifically, words that are semantically close to the words contained in the utterance word list 42b extracted by the word removal unit 51, shown in FIG. 29, are extracted using the distance dictionary shown in FIGS. 17 to 19 and added to the utterance word list 42b.

For example, as shown in FIG. 30, the word complementing unit 52 holds extraction conditions such as "2, 1, 0" for the "extraction target distance", which is the maximum distance from a keyword at which words are extracted, the "distance 1 output upper limit", which is the maximum number of words output at distance 1 from a keyword, and the "distance 2 output upper limit", which is the maximum number of words output at distance 2 from a keyword.

With this configuration, the word complementing unit 52 extracts from the distance dictionary words that have a high semantic-distance evaluation with respect to the words listed in the utterance word list 42b extracted by the word removal unit 51, shown in FIG. 29, and adds the extracted words to the utterance word list. The result of this complementing is shown in FIG. 31.

As a comparison of FIG. 29 (before semantically close words are added) with FIG. 31 (after they are added) shows, the word complementing unit 52 adds "activity" as a word close in meaning to "art museum, opinion, citizen", and further adds "art". The words added in this way are stored in the utterance word list 42b.
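One way to picture the complementing step is the sketch below, which applies the quoted limits (up to 1 word at distance 1, 0 words at distance 2) separately to each keyword and picks candidates in alphabetical order. Both of those choices are assumptions, since the text does not state how candidates are ranked or whether the limits apply per keyword. The toy network is hypothetical apart from "activity" being close to both "art museum" and "art".

```python
from collections import defaultdict

# Toy knowledge network over word names; the links are hypothetical except that
# "activity" is adjacent to both "art museum" and "art".
LINKS = [
    ("art museum", "art"), ("art museum", "activity"), ("art", "activity"),
    ("citizen", "activity"), ("opinion", "art"),
]


def build_adjacency(links):
    adjacency = defaultdict(set)
    for a, b in links:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency


def complement(adjacency, keywords, limits):
    """Return the words added, honoring an output limit per distance.

    limits: {distance: max words added at that distance, per keyword}.
    """
    present = set(keywords)
    added = []
    for keyword in keywords:
        seen = {keyword}
        frontier = {keyword}
        for distance in range(1, max(limits) + 1):
            # Nodes at exactly this distance from the keyword.
            frontier = {n for f in frontier for n in adjacency.get(f, ())
                        if n not in seen}
            seen |= frontier
            candidates = [n for n in sorted(frontier) if n not in present]
            for word in candidates[:limits.get(distance, 0)]:
                present.add(word)
                added.append(word)
    return added


adjacency = build_adjacency(LINKS)
added = complement(adjacency, ["art museum", "opinion", "citizen"], {1: 1, 2: 0})
```

On this toy network the sketch adds "activity" (next to "art museum") and then "art" (next to "opinion"), which mirrors the outcome described for FIG. 31 without claiming to reproduce how it was obtained.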

Returning to FIG. 16, the topic word extraction unit 43c takes the utterance word list 42b, from which the word removal unit 51 has deleted semantically distant words and to which the word complementing unit 52 has added semantically close words, and calculates for each word in it an utterance density, that is, the number of utterances per predetermined unit time. It then extracts words whose utterance density exceeds a predetermined value as the topic words for that unit time.
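
As a rough illustration of the density calculation, the following Python sketch counts utterances per fixed time unit and keeps the words whose count exceeds a threshold. The function name `topic_words_by_unit`, the 60-second unit, the threshold of 1, and the representation of the utterance word list as `(seconds, word)` pairs are assumptions, not values taken from the embodiment.

```python
from collections import Counter


def topic_words_by_unit(utterances, unit_seconds=60, threshold=1):
    """Map each time unit to the words whose utterance density exceeds threshold.

    utterances: iterable of (time_in_seconds, word) pairs, an assumed
                stand-in for the utterance word list.
    """
    counts = {}
    for seconds, word in utterances:
        unit = int(seconds // unit_seconds)  # index of the unit time slot
        counts.setdefault(unit, Counter())[word] += 1
    return {
        unit: sorted(w for w, c in counter.items() if c > threshold)
        for unit, counter in sorted(counts.items())
    }


sample = [
    (5, "museum"), (20, "museum"), (30, "citizen"),
    (70, "citizen"), (80, "citizen"), (90, "museum"),
]
topics = topic_words_by_unit(sample)
```

Here "museum" is spoken twice in the first minute and "citizen" twice in the second, so each becomes the topic word of its own unit, while words spoken only once in a unit are dropped.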

Next, the topic word turning point extraction unit 43e extracts words whose utterance density exceeds a predetermined value as topic words for the unit time, on the condition that they are listed in the topic word extraction list stored in the topic word list 42c. The topic word extraction unit 43c and the topic word turning point extraction unit 43e have the same functions as the topic word extraction unit 23c and the topic word turning point extraction unit 23e described in Embodiment 1, so a detailed description is omitted here.

For each section delimited by the topic words (topic turning points) extracted by the topic word turning point extraction unit 43e, the important word extraction unit 53 calculates the utterance frequency of each topic word listed for that section in the topic word list 42c, and extracts topic words whose utterance frequency exceeds a predetermined value as important words.

Specifically, as shown in FIG. 32, the important word extraction unit 53 holds aggregation conditions consisting of an "aggregation period" (the period at which data is aggregated), an "aggregation time width" (the time width before and after each period time), and an "associated word aggregation setting" (whether associated words are extracted), set to values such as "2 (minutes), 1 (minute), 1". An associated word is a word that is semantically close to a given word. When the "associated word aggregation setting" is "0", only keywords (important words) are extracted; when it is "1", both keywords and associated words are extracted. The important word extraction unit 53 also holds a list of words that should not be extracted as an index (unnecessary word list), as shown in FIG. 35, containing words such as "oneself" and "people" that are poorly suited to identifying an index.

In addition, as shown in FIG. 34, the important word extraction unit 53 holds important word extraction conditions consisting of an "index word extraction target frequency" (the minimum cumulative frequency for extraction as an index word), an "excluded word removal setting" (whether words not subject to extraction are removed), and a "neighboring word aggregation setting" (the condition for extracting associated words), set to values such as "2, 1, 1". When the "excluded word removal setting" is "0", the unnecessary word list is ignored; when it is "1", the unnecessary word list is applied. When the "neighboring word aggregation setting" is "0", no neighboring words are extracted; when it is "1", "2", or "3", only neighboring words at distance "1", "2", or "3", respectively, are extracted. A neighboring word means the same as the associated word described above.

With this configuration, suppose the topic turning point is "16:30:00". Based on the aggregation conditions it holds (see FIG. 32), the important word extraction unit 53 calculates, for each section delimited by topic turning points (here "16:28:00 to 16:30:00"), the utterance frequency of each topic word listed for that section in the topic word list 42c, and obtains the result shown in FIG. 33. FIG. 33 lists results in the form "time information, node number, aggregated word, cumulative frequency", such as "16:28:10, 4, citizen, 7" and "16:29:40, 58, awareness".

Next, based on the important word extraction conditions it holds (see FIG. 34) and the unnecessary word list of FIG. 35, the important word extraction unit 53 extracts from FIG. 33 the topic words whose utterance frequency exceeds a predetermined value (for example, 1) as important words, and obtains the result shown in FIG. 36. FIG. 36 is what remains of FIG. 33 after deleting words whose utterance frequency is less than "1" and words contained in the unnecessary word list of FIG. 35.
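
The per-section filtering can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `extract_important_words`, its parameters, and the sample words are hypothetical, and the threshold is applied as "strictly greater than", following the wording "exceeds a predetermined value".

```python
from collections import Counter


def extract_important_words(section_words, min_frequency=1,
                            stop_words=frozenset(), use_stop_words=True):
    """Return {word: frequency} for the important words of one section.

    section_words:  topic words listed within one section (repeats allowed).
    min_frequency:  a word is kept only if its frequency exceeds this value.
    stop_words:     assumed stand-in for the unnecessary word list.
    use_stop_words: mirrors the "excluded word removal setting" (True = "1").
    """
    important = {}
    for word, freq in Counter(section_words).items():
        if freq <= min_frequency:
            continue  # frequency does not exceed the threshold
        if use_stop_words and word in stop_words:
            continue  # word is on the unnecessary word list
        important[word] = freq
    return important


section = ["citizen"] * 7 + ["awareness"] + ["people"] * 3 + ["museum"] * 2
important = extract_important_words(section, stop_words={"oneself", "people"})
```

In this toy section "awareness" is dropped for low frequency and "people" for being on the unnecessary word list, leaving "citizen" and "museum".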

Returning to FIG. 16, the index generation unit 43f generates an index for each section delimited by topic turning points by editing the topic words based on one or more of utterance time, utterance density, utterance frequency, and semantic distance.

Specifically, the index generation unit 43f holds index generation conditions as shown in FIG. 37, consisting of a "generation order setting" (the order in which the index is generated) and a "font setting" (the display font), set to values such as "2, 1", and generates indexes such as those of FIG. 38 and FIG. 39 according to these conditions. When the "generation order setting" is "0", words are displayed in order of earliest appearance time; when it is "1", in descending order of cumulative frequency; and when it is "2", in order of earliest appearance time and largest cumulative frequency. When the "font setting" is "0", all words are generated at the same size; when it is "1", each word is generated at a font size proportional to its cumulative frequency.
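
One way to read these generation conditions is sketched below in Python. The function name `build_index`, the `(first_time, word, frequency)` tuple layout, and the base font size are hypothetical. In particular, treating setting "2" as "sort by appearance time, then break ties by larger cumulative frequency" is an assumption, since the text does not say how the two criteria are combined.

```python
def build_index(entries, order_setting=2, font_setting=1, base_size=10):
    """Return a list of (word, font_size) for one section.

    entries:       list of (first_time_seconds, word, cumulative_frequency).
    order_setting: 0 = earliest appearance first, 1 = highest frequency first,
                   2 = earliest first, ties broken by higher frequency
                   (assumed interpretation).
    font_setting:  0 = uniform size, 1 = size proportional to frequency.
    """
    if order_setting == 0:
        key = lambda e: e[0]
    elif order_setting == 1:
        key = lambda e: -e[2]
    else:
        key = lambda e: (e[0], -e[2])
    ordered = sorted(entries, key=key)
    return [
        (word, base_size * freq if font_setting == 1 else base_size)
        for _, word, freq in ordered
    ]


entries = [(10, "citizen", 7), (5, "museum", 3), (5, "plan", 4)]
index = build_index(entries)  # conditions "2, 1"
```

Keeping the ordering rule and the font rule as independent parameters mirrors the two separate settings of FIG. 37, so either can be changed without affecting the other.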

When semantically close adjacent words are extracted, chunk processing may be performed that merges two index words in a dependency relation into a single word, for example merging "opposition" and "opinion" into "opposing opinion".
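
A minimal Python sketch of such chunk processing is shown below. It assumes that the pairs to be merged are supplied in a lookup table (`compounds`, a hypothetical name); the text does not specify how dependency relations are actually detected, so that part is left out.

```python
def merge_chunks(words, compounds):
    """Merge adjacent word pairs found in `compounds` into a single entry.

    compounds: assumed mapping (first_word, second_word) -> merged word.
    """
    merged = []
    i = 0
    while i < len(words):
        pair = tuple(words[i:i + 2])
        if len(pair) == 2 and pair in compounds:
            merged.append(compounds[pair])
            i += 2  # both words were consumed by the merge
        else:
            merged.append(words[i])
            i += 1
    return merged


compounds = {("opposition", "opinion"): "opposing opinion"}
chunked = merge_chunks(["citizen", "opposition", "opinion", "museum"], compounds)
```

Only adjacent pairs are merged, so "opposition" and "opinion" remain separate index words when another word sits between them.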

Returning to FIG. 16, the index 42d stores the index created by the index generation unit 43f. As a concrete example, as shown in FIG. 38, it stores, as "time period (topic section) and index", an index such as "16:27 to 16:30: museum, construction plan, citizen, approval, opposition, opinion, construction site, activity, art, regular meeting, problem, talk, plan", generated by the index generation unit 43f with a varied generation order and display font. As shown in FIG. 39, it also divides the entire audio content into a plurality of topics (sections) and stores an index for each.

[Effects of Embodiment 3]
As described above, according to Embodiment 3, a distance dictionary defining the semantic distances among a plurality of words is stored; the stored distance dictionary is used to evaluate the semantic distances between the words in the generated utterance word list; words with a low evaluation are removed from the utterance word list; the utterance density per predetermined unit time is calculated for each word in the utterance word list after word removal; and words whose utterance density exceeds a predetermined value are extracted as topic words for the unit time. It is therefore possible, for example, to create extracted word data in which "cholera", which is semantically distant from the other words among "museum", "cholera", "art", and so on, has been deleted, and to identify the topic from that data. As a result, topic words can be identified more accurately than when they are extracted from extracted word data in which semantically distant words remain.

Also according to Embodiment 3, the stored distance dictionary is used to extract from it words rated as semantically close to the words listed in the utterance word list after word removal; the extracted words are added to the utterance word list; the utterance density per predetermined unit time is calculated for each word in the utterance word list after complementing; and words whose utterance density exceeds a predetermined value are extracted as topic words for the unit time. For example, when "museum" and "construction site" have been extracted from the audio content but "construction plan" has not, "construction plan", "museum", and "construction site" are presumed to be semantically close to one another, so "construction plan" is newly extracted. Words that were not extracted from the audio content but are semantically close can thus be added, and as a result topic words can be identified more accurately than when semantically close words are not added.

Also according to Embodiment 3, for each section delimited by the extracted topic turning points, the utterance frequency of each topic word listed for that section in the topic word list is calculated, topic words whose utterance frequency exceeds a predetermined value are extracted as important words, and an index listing the topic words extracted as important words is generated for each section. Thus, for example, a word that appears infrequently in the audio content as a whole but is important for identifying the topic of a particular section can still be extracted, and as a result more useful words can be produced as the index for each topic.

Also according to Embodiment 3, for each section delimited by topic turning points, the index is generated by editing the topic words based on one or more of utterance time, utterance density, utterance frequency, and semantic distance. The index can therefore be displayed in a way suited to the intended use, such as Internet search or agenda search over recorded meeting data, for example by enlarging the display font in order of appearance time or appearance frequency.

Embodiments of the present invention have been described so far, but the present invention may be implemented in various forms other than the embodiments described above. Other embodiments included in the present invention are therefore described below as Embodiment 4.

(1) Processing order
For example, in Embodiment 3 the utterance word list is created, the word removal processing and the word complementing processing are performed, and then the topic word extraction processing is performed. The present invention is not limited to this, and only one of these processes may be performed. For example, only the word removal processing may be performed after the utterance word list is created, or the index may be created from the utterance word list after one or both of the word removal processing and the word complementing processing, without performing the topic word extraction processing. The processing order can be changed as circumstances require.

(2) System configuration, etc.
Of the processes described in the embodiments, all or part of those described as being performed automatically may be performed manually, and all or part of those described as being performed manually may be performed automatically by a known method. In addition, the processing procedures, control procedures, specific names, and information including various data and parameters shown in the above description and drawings (the setting conditions and the various extracted lists and indexes described in Embodiments 1 and 2) may be changed as desired unless otherwise specified.

The components of each illustrated device are functional and conceptual, and need not be physically configured as illustrated. That is, the specific form of distribution and integration of each device is not limited to that illustrated, and all or part of it may be functionally or physically distributed or integrated in arbitrary units according to various loads, usage conditions, and the like (for example, the topic word extraction unit and the topic word list generation unit may be integrated). Furthermore, all or any part of the processing functions performed in each device may be realized by a CPU and a program analyzed and executed by that CPU, or as hardware using wired logic.

The above embodiments describe the index generation device that realizes the present invention in terms of its functions, but each function of the index generation device can also be realized by having a computer such as a personal computer or workstation execute a program. That is, the various processing procedures described in the above embodiments can be realized by executing a program prepared in advance on a computer. These programs can be distributed via a network such as the Internet. They can also be recorded on a computer-readable recording medium such as a hard disk, flexible disk (FD), CD-ROM, MO, or DVD and executed by being read from the recording medium by a computer. For example, a CD-ROM storing the index generation program described in the embodiments (which may be a separate CD-ROM for each device) may be distributed, and each computer may read and execute the program stored on it.

As described above, the index generation device, index generation method, and index generation program according to the present invention are useful for creating text data by speech recognition of audio content and generating from that text data an index to be assigned to the audio content, and are particularly suited to creating a sufficiently useful and detailed index for the entire audio content.

Brief description of the drawings

FIG. 1 is a diagram for explaining the overview and features of the index generation device according to Embodiment 1.
FIG. 2 is a block diagram showing the configuration of the index device according to Embodiment 1.
FIG. 3 is a diagram showing an example of the structure of the information stored in the audio content.
FIG. 4 is a diagram showing an example of the structure of the information stored in the text data.
FIG. 5 is a diagram showing an example of the structure of the information stored in the text data.
FIG. 6 is a diagram showing an example of the structure of the information stored in the utterance word list.
FIG. 7 is a diagram showing an example of the method of calculating utterance density according to Embodiment 1.
FIG. 8 is a diagram showing an example of the structure of the information stored in the topic word list.
FIG. 9 is a diagram showing an example of the method of calculating topic turning points.
FIG. 10 is a diagram showing an example of the structure of the information stored in the index.
FIG. 11 is a flowchart showing the flow of processing in the index generation device according to Embodiment 1.
FIG. 12 is a diagram showing an example of the distance dictionary according to Embodiment 2.
FIG. 13 is a diagram showing an example of the structure of the information stored in an utterance word list prepared in advance.
FIG. 14 is a diagram for explaining that discontinuous time periods of a topic word are regarded as the same time period, according to Embodiment 2.
FIG. 15 is a diagram for explaining the overview and features of the index generation device according to Embodiment 3.
FIG. 16 is a block diagram showing the configuration of the index device according to Embodiment 3.
FIG. 17 is a diagram showing an example of the distance dictionary.
FIG. 18 is a diagram showing an example of the distance dictionary.
FIG. 19 is a diagram showing an example of the distance dictionary (knowledge network).
FIG. 20 is a diagram showing an example of utterance word extraction conditions.
FIG. 21 is a diagram showing an example of utterance word extraction results.
FIG. 22 is a diagram showing an example of shortest distance calculation conditions.
FIG. 23 is a diagram showing an example of shortest distance calculation results.
FIG. 24 is a diagram showing an example of keyword extraction conditions.
FIG. 25 is a diagram showing an example of evaluation by cumulative count per shortest inter-word distance.
FIG. 26 is a diagram showing an example of the word extraction result (utterance word list) after word deletion.
FIG. 27 is a diagram showing an example of keyword extraction conditions.
FIG. 28 is a diagram showing an example of evaluation by average distance and shortened distance.
FIG. 29 is a diagram showing an example of the word extraction result (utterance word list) after word deletion.
FIG. 30 is a diagram showing an example of complementary word extraction conditions.
FIG. 31 is a diagram showing an example of the word extraction result (utterance word list) after word complementing.
FIG. 32 is a diagram showing an example of utterance density aggregation conditions.
FIG. 33 is a diagram showing an example of utterance density aggregation results.
FIG. 34 is a diagram showing an example of important word extraction conditions.
FIG. 35 is a diagram showing an example of the list of words that should not be extracted as an index.
FIG. 36 is a diagram showing an example of the word extraction result after deleting words that should not be extracted as an index.
FIG. 37 is a diagram showing an example of index generation conditions.
FIG. 38 is a diagram showing an example of a generated index.
FIG. 39 is a diagram showing an example of a generated index.

Explanation of symbols

20 Index generation device
21 Communication control I/F unit
22 Storage unit
22a Text data
22b Utterance word list
22c Topic word list
22d Index
23 Control unit
23a Speech recognition unit
23b Utterance word list generation unit
23c Topic word extraction unit
23d Topic word list generation unit
23e Topic word turning point extraction unit
23f Index generation unit
40 Index generation device
41 Communication control I/F unit
42 Storage unit
42a Text data
42b Utterance word list
42c Topic word list
42d Index
43 Control unit
43a Speech recognition unit
43b Utterance word list generation unit
43c Topic word extraction unit
43d Topic word list generation unit
43e Topic word turning point extraction unit
43f Index generation unit
50 Distance dictionary storage unit
51 Word removal unit
52 Word complementing unit
53 Important word extraction unit

Claims

An index generation device that creates text data by speech recognition of audio content and generates, from the text data, an index to be assigned to the audio content, the device comprising:
utterance word list generation means for extracting words contained in the text data and the utterance times of those words, and generating an utterance word list in which each word is listed in association with its utterance time;
topic word extraction means for calculating, for each word contained in the utterance word list generated by the utterance word list generation means, an utterance density indicating the number of utterances per predetermined unit time, and extracting words whose utterance density exceeds a predetermined value as topic words for the unit time;
topic word list generation means for generating a topic word list in which the topic words extracted by the topic word extraction means are listed in association with each predetermined unit time from the utterance start time to the utterance end time of the audio content; and
index generation means for generating an index from the topic words listed in the topic word list generated by the topic word list generation means.

The index generation device according to claim 1, further comprising topic turning point extraction means for extracting, as a topic turning point, a time at which no topic word is listed in the topic word list generated by the topic word list generation means,
wherein the index generation means generates, for each section delimited by the topic turning points extracted by the topic turning point extraction means, an index from the topic words listed for that section in the topic word list.

The index generation device according to claim 1 or 2, further comprising distance dictionary storage means for storing a distance dictionary that defines the semantic distances among a plurality of words,
wherein, when calculating the utterance density per predetermined unit time for a given word contained in the utterance word list, the topic word extraction means also treats words whose semantic distance from the given word in the distance dictionary stored in the distance dictionary storage means is within a predetermined range as the same word as the given word.

The index generation device according to claim 1, 2 or 3, further comprising topic word extraction list storage means for storing a topic word extraction list that lists the words to be extracted as topic words,
wherein the topic word extraction means extracts words whose utterance density exceeds a predetermined value as topic words for the unit time, on the condition that the words are listed in the topic word extraction list stored in the topic word extraction list storage means.

The index generation device according to claim 1, wherein, when the same topic word is listed in the topic word list without a gap of a predetermined time interval, the topic turning point extraction means regards that topic word as being listed continuously when extracting the topic turning points.

The index generation device according to claim 1, further comprising:
distance dictionary storage means for storing a distance dictionary that defines the semantic distances among a plurality of words; and
word removal means for evaluating, using the distance dictionary stored in the distance dictionary storage means, the semantic distances between the words contained in the utterance word list generated by the utterance word list generation means, and removing words with a low evaluation from the utterance word list,
wherein the topic word extraction means calculates, based on the utterance word list after word removal by the word removal means, the utterance density per predetermined unit time for each word contained in that list, and extracts words whose utterance density exceeds a predetermined value as topic words for the unit time.

The index generation device according to claim 6, further comprising word complementing means for extracting from the distance dictionary, using the distance dictionary stored in the distance dictionary storage means, words rated as semantically close to the words listed in the utterance word list after word removal by the word removal means, and adding the extracted words to the utterance word list,
wherein the topic word extraction means calculates, based on the utterance word list after word complementing by the word complementing means, the utterance density per predetermined unit time for each word contained in that list, and extracts words whose utterance density exceeds a predetermined value as topic words for the unit time.

The index generation device according to any one of the above, further comprising important word extraction means for calculating, for each section delimited by the topic turning points extracted by the topic turning point extraction means, the utterance frequency of each topic word listed for that section in the topic word list, and extracting topic words whose utterance frequency exceeds a predetermined value as important words,
wherein the index generation means generates, for each section delimited by the topic turning points, an index listing the topic words extracted as important words by the important word extraction means.

The index generation device according to claim 1, wherein the index generation means generates the index for each section delimited by the topic turning points by editing the topic words based on one or more of utterance time, utterance density, utterance frequency, and semantic distance.

An index generation device that creates text data by speech recognition of audio content and generates, from the text data, an index to be assigned to the audio content, the device comprising:
distance dictionary storage means for storing a distance dictionary that defines the semantic distances among a plurality of words;
utterance word list generation means for extracting words contained in the text data and the utterance times of those words, and generating an utterance word list in which each word is listed in association with its utterance time;
word removal means for evaluating, using the distance dictionary stored in the distance dictionary storage means, the semantic distances between the words contained in the utterance word list generated by the utterance word list generation means, and removing words with a low evaluation from the utterance word list; and
index generation means for generating an index based on the utterance word list after word removal by the word removal means.

The index generation device according to claim 10, further comprising word complementing means for extracting from the distance dictionary, using the distance dictionary stored in the distance dictionary storage means, words rated as semantically close to the words listed in the utterance word list after word removal by the word removal means, and adding the extracted words to the utterance word list,
wherein the index generation means generates an index based on the utterance word list after word complementing by the word complementing means.

An index generation method for generating text data by speech recognition of audio content and generating, from the text data, an index to be given to the audio content, the method comprising:
an utterance word list generation step of extracting the words included in the text data and the utterance time of each word, and generating an utterance word list in which each word is listed in association with its utterance time;
a topic word extraction step of calculating, for each word included in the utterance word list generated in the utterance word list generation step, an utterance density indicating the number of utterances per predetermined unit time, and extracting words whose utterance density exceeds a predetermined value as topic words for that unit time;
a topic word list generation step of generating a topic word list in which the topic words extracted in the topic word extraction step are listed in association with each predetermined unit time from the utterance start time to the utterance end time of the audio content; and
an index generation step of generating an index from the topic words listed in the topic word list generated in the topic word list generation step.
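
The steps of this method can be sketched as follows, purely as an illustration. The sketch assumes that utterance density is the count of a word within a fixed-length time slot, that the unit time is 60 seconds, and that the predetermined value is 2; the function names and the flat list-of-tuples index format are likewise assumptions and are not taken from the claim.

```python
import math
from collections import Counter


def topic_word_list(utterances, start, end, unit=60.0, threshold=2):
    """Return [(slot_start, [topic words])] for each unit-time slot.

    utterances: list of (word, utterance_time_in_seconds) pairs.
    A word is a topic word in a slot when its count there exceeds threshold.
    """
    n_slots = max(1, math.ceil((end - start) / unit))
    counts = [Counter() for _ in range(n_slots)]
    for word, t in utterances:
        if start <= t <= end:
            # An utterance exactly at the end time falls into the last slot.
            slot = min(int((t - start) // unit), n_slots - 1)
            counts[slot][word] += 1
    return [
        (start + i * unit, sorted(w for w, c in counts[i].items() if c > threshold))
        for i in range(n_slots)
    ]


def generate_index(topics):
    """Keep only the slots that actually list topic words."""
    return [(slot_start, words) for slot_start, words in topics if words]


utterances = [
    ("budget", 5), ("budget", 20), ("budget", 40), ("weather", 50),
    ("travel", 70), ("travel", 80), ("travel", 100), ("travel", 110),
]
topics = topic_word_list(utterances, start=0, end=120)
index = generate_index(topics)
```

With these inputs, "budget" is uttered three times in the first minute and "travel" four times in the second, so each becomes the topic word of its slot, while "weather" (one utterance) is not listed.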

An index generation program for causing a computer to execute an index generation method for generating text data by speech recognition of audio content and generating, from the text data, an index to be given to the audio content, the program causing the computer to execute:
an utterance word list generation procedure for extracting the words included in the text data and the utterance time of each word, and generating an utterance word list in which each word is listed in association with its utterance time;
a topic word extraction procedure for calculating, for each word included in the utterance word list generated by the utterance word list generation procedure, an utterance density indicating the number of utterances per predetermined unit time, and extracting words whose utterance density exceeds a predetermined value as topic words for that unit time;
a topic word list generation procedure for generating a topic word list in which the topic words extracted by the topic word extraction procedure are listed in association with each predetermined unit time from the utterance start time to the utterance end time of the audio content; and
an index generation procedure for generating an index from the topic words listed in the topic word list generated by the topic word list generation procedure.