JP2014052803A - Voice document retrieval method and voice document retrieval system

Info

Publication number: JP2014052803A
Application number: JP2012196357A
Authority: JP
Inventors: Hiroteru Nanjo; 浩輝南條; Tomohiro Nishio; 友宏西尾; Takehiko Yoshimi; 毅彦吉見
Original assignee: Ryukoku University
Current assignee: Ryukoku University
Priority date: 2012-09-06
Filing date: 2012-09-06
Publication date: 2014-03-20

Abstract

PROBLEM TO BE SOLVED: To provide a speech document retrieval method and the like that can accurately retrieve, from a relatively long speech document, the speech segment containing the content being sought. SOLUTION: Speech data stored in a database 2 is read out and transcribed into text by speech recognition means 3, and speech segments of a given number of utterances (for example, 15, 30, or 60 utterances) are extracted from the transcribed data. Related words for the input query are then extracted from these speech segments by pseudo-relevance feedback, and a search is performed with the extracted related words and the original query, using second speech segments as the search index.

Description

The present invention relates to a speech document retrieval method that finds, within a collection of speech, the speech segment containing the content being sought; for example, a method that can find the speech segment with the most suitable content within a long audio file such as a whole lecture.

Retrieval techniques such as query expansion have long been used to find a desired speech segment in a speech document. In query expansion, several related words are extracted for the initial query entered by the user, the user selects among them, and the selected words are used for retrieval together with the original query. A development of query expansion is pseudo-relevance feedback, in which the top few results retrieved with the original query are treated as pseudo-relevant documents and the search is repeated together with the original query (see, for example, Patent Document 1).

Patent Document 1: JP 2003-242170 A

However, when a technique such as pseudo-relevance feedback is used to retrieve a speech segment containing a desired topic from very long speech documents, such as a set of lectures, the following problems arise.

Pseudo-relevance feedback extracts related words by treating the top results of the search with the initial query as pseudo-relevant documents. In speech document retrieval, therefore, misrecognized words may be added during query expansion. In addition, when the retrieval unit is a relatively long lecture covering many topics, added words unrelated to the query may be extracted. Consequently, when a speech segment containing a desired topic is to be extracted from many speech documents such as lectures, each long, lecture-length speech document that is returned must be played back from the beginning to judge whether it is a relevant result, which takes an enormous amount of time.

The present invention has been made in view of the above problem, and its object is to provide a speech document retrieval method and the like that can accurately retrieve, from a relatively long speech document, the speech segment representing the content being sought.

To solve the above problem, the present invention comprises a step of dividing a speech document to be searched into predetermined time units and extracting speech segments, a step of extracting related words related to a query within the extracted speech segments, and a step of performing a search using the extracted related words.

Compared with extracting related words from a very long span such as a whole lecture, this makes it less likely that words from unrelated topics are added as related words, and so improves retrieval accuracy.

Further, a time unit delimited by a predetermined number of words or by breath pauses is used as the predetermined time unit.

Because the speech segments used for related-word extraction are then delimited by words or breath pauses, related words are easier to extract than when the speech is cut uniformly into segments of a fixed number of seconds.

Further, speech segments divided into second time units, different from the predetermined time units, are used as the search index, and the search is performed with the extracted related words.

Because the search can then be performed over each of the second time units, matching speech segments can be extracted efficiently from a long speech document.

In the present invention, a speech document to be searched is divided into predetermined time units to extract speech segments, and related words related to the query are extracted within those segments and used for the search. Compared with extracting related words from a very long span such as a whole lecture, words from unrelated topics are therefore less likely to be added as related words, and retrieval accuracy improves.

FIG. 1 is a functional block diagram of a speech document retrieval system according to an embodiment of the present invention.
FIG. 2 is a diagram showing query expansion using pseudo-relevance feedback.
FIG. 3 is a diagram showing an overview of pseudo-relevance feedback in an embodiment of the present invention.
FIG. 4 is a flowchart of the system in the same embodiment.
FIG. 5 is a table showing the effect of pseudo-relevance feedback.
FIG. 6 is an example showing the influence of pseudo-relevance feedback.

A speech document retrieval system 1 according to one embodiment of the present invention is described below with reference to the drawings.

The speech document retrieval system 1 of this embodiment transcribes speech data stored in a database 2 using speech recognition means 3, extracts related words for an input query from the transcripts by pseudo-relevance feedback, and performs a search. Characteristically, the transcribed text is divided into units of a predetermined number of words (word strings) or breath pauses (utterances), for example 15, 30, or 60 utterances, and related words are extracted by pseudo-relevance feedback within those speech segments. The transcribed text is also divided into second time units different from the first (for example, from 30 utterances up to the whole document), and these serve as the search index for the search with the related words. Each functional means of the embodiment is described below with reference to the functional block diagram of FIG. 1 and other figures.

First, speech data is stored in the database 2. Here the speech data is assumed to be speeches given by speakers, such as academic conference presentations and simulated lectures, but it is not limited to these. Such recordings usually last about 10 to 25 minutes, though some exceed an hour, and the simulated lectures include speeches of about 12 minutes by ordinary speakers on everyday topics, so the data often covers a variety of topics.

The speech recognition means 3 reads the speech data stored in the database 2 and transcribes it using speech recognition technology. Any of the various previously proposed methods may be used to produce the text documents. Recognition accuracy varies with the method used and is typically about 65% to 95%.

The first speech segment extraction means 4 extracts speech segments of a first time length (first speech segments) from the recognized text. The segments could be of a uniform length in seconds (for example, 30 seconds, 1 minute, or 2 minutes), but here the length is defined in breath-pause (utterance) units: the text is divided into segments of a predetermined number of utterances (for example, 15, 30, or 60 utterances). Segmenting by breath pauses at least never cuts an utterance in the middle, so related words can be extracted more accurately than with uniform segmentation by seconds. Although the first time length is delimited here by breath pauses, it may instead be delimited by word strings. In that case, a typical sentence-final expression allows segmentation at a sentence boundary, and a word string typical of a topic allows segmentation at a topic boundary, so related words can again be extracted more accurately than with uniform segmentation by seconds.
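The utterance-based segmentation described above can be illustrated with a minimal sketch. It assumes the transcript is already available as a list of utterances split at breath pauses; the function name `split_into_segments` is hypothetical and does not come from the patent.

```python
from typing import List


def split_into_segments(utterances: List[str], unit: int) -> List[List[str]]:
    """Cut a lecture into consecutive blocks of `unit` utterances.

    Blocks are taken in order from the start of the lecture; the last
    block may be shorter than `unit`.
    """
    if unit <= 0:
        raise ValueError("unit must be a positive number of utterances")
    return [utterances[i:i + unit] for i in range(0, len(utterances), unit)]


if __name__ == "__main__":
    lecture = [f"utterance {i}" for i in range(35)]
    for unit in (15, 30, 60):
        sizes = [len(segment) for segment in split_into_segments(lecture, unit)]
        print(unit, sizes)
```

Because the cut points always fall between utterances, no utterance is split, which is the property the text contrasts with fixed-length cuts in seconds.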

The query input means 5 accepts a query from the user. The query may be accepted as text or as speech. In the example described later, the query "In which country do Aboriginal people live?" is used to illustrate the influence of pseudo-relevance feedback.

The related word extraction means 7 extracts related words for the input query, using the text of the first speech segments extracted by the first speech segment extraction means 4 as an index. The concept of query expansion is explained first.

In a retrieval system based on a vector space, a query is generally represented as a vector whose elements are word occurrence frequencies. Let q be the original query vector, let the new index terms to be added to the query form a second vector, and let α be a weight; the expanded query vector q_n is then given by Equation 1. In the vector space model, query expansion is the replacement of q by q_n.

In general α < 1, but to keep the elements of q_n positive numbers, query expansion is performed with the weight β of Equation 2 set to a positive integer. That is, the weight is applied to the original query q instead, and the query is expanded as in Equation 2.
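Equations 1 and 2 are not reproduced in this text. One form consistent with the surrounding definitions is sketched below. This is a hedged reconstruction, not the published equations, and the symbol q_e for the vector of added index terms is an assumption introduced here for illustration.

```latex
% Assumed form of Equation 1: added terms scaled by \alpha (typically \alpha < 1)
\mathbf{q}_n = \mathbf{q} + \alpha\,\mathbf{q}_e

% Assumed form of Equation 2: the original query scaled by a positive integer \beta
\mathbf{q}_n = \beta\,\mathbf{q} + \mathbf{q}_e
```

Under this reading, choosing β = 1/α gives the same direction for q_n in both forms, while the second form keeps every element an integer multiple of a term frequency.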

Relevance feedback is explained next. In relevance feedback, the user marks which of the retrieved documents are relevant and which are non-relevant, and the query vector q is revised to q_n accordingly (see Equation 3).

Here d_r and d_n are vectors whose elements are the occurrence frequencies of the words in the set of relevant documents and in the set of non-relevant documents, respectively, and ω₀, ω₁, and ω₂ are constants greater than or equal to 0.
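Equation 3 is not reproduced in this text. Given the symbols defined above, it plausibly has the familiar Rocchio-style form shown below; this is an assumption based on those definitions rather than the published equation.

```latex
% Assumed form of Equation 3 (relevance feedback):
% \mathbf{d}_r: term frequencies over the relevant set,
% \mathbf{d}_n: term frequencies over the non-relevant set,
% \omega_0, \omega_1, \omega_2 \ge 0
\mathbf{q}_n = \omega_0\,\mathbf{q} + \omega_1\,\mathbf{d}_r - \omega_2\,\mathbf{d}_n
```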

Pseudo-relevance feedback is explained next. The relevance feedback described above requires the user to mark documents as relevant or non-relevant; pseudo-relevance feedback requires no such judgments and expands the query by extracting related words without any interaction with the user. The method is shown in FIG. 2. First, search results are obtained with the input query q, using the documents recognized from the speech data as the index. The top few results are then treated as a set of pseudo-relevant documents, related words are extracted from them, and the query is expanded by adding these words to the original query. Because non-relevant documents are not used, pseudo-relevance feedback is expressed by Equation 4.
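A minimal sketch of the query expansion step follows. It assumes Equation 4 has the form q_n = β·q + d_r, and it assumes related words are chosen as the most frequent words in the pseudo-relevant documents; the text does not state the selection criterion, so both points and the function name `expand_query` are illustrative assumptions.

```python
from collections import Counter
from typing import List


def expand_query(query: Counter, pseudo_relevant_docs: List[Counter],
                 beta: int, n_terms: int) -> Counter:
    """Return beta * q plus the n_terms most frequent words of d_r.

    query and each document are term-frequency vectors stored as Counters.
    """
    # d_r: summed term frequencies over the pseudo-relevant documents.
    d_r: Counter = Counter()
    for doc in pseudo_relevant_docs:
        d_r.update(doc)

    # Re-weight the original query terms by beta.
    expanded = Counter({term: beta * freq for term, freq in query.items()})

    # Add the selected related words with their frequencies in d_r.
    for term, freq in d_r.most_common(n_terms):
        expanded[term] += freq
    return expanded


if __name__ == "__main__":
    q = Counter({"aborigine": 1, "country": 1})
    top_docs = [Counter({"australia": 3, "aborigine": 2, "er": 1}),
                Counter({"australia": 2, "culture": 1})]
    print(expand_query(q, top_docs, beta=3, n_terms=2))
```

A larger β keeps the expanded query closer to the user's original wording, which limits the damage when a misrecognized word slips into the related words.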

The search means 8 searches the search index using the extracted related words and the original query. The search index is built by the second speech segment extraction means 6, which extracts document units longer than the first time length used for related-word extraction, for example 30 utterances, 60 utterances, or the whole document, and the search is run against these units with the related words and the original query. Ordinarily, extracting related words from whole documents picks up words related to other topics, and speech recognition errors also lower retrieval accuracy. Extracting the related words from divided documents makes such unrelated words less likely and improves retrieval accuracy.

The output means 9 outputs the speech documents found by the search. The results are displayed in descending order of score so that they can be selected, and selecting one starts playback from the corresponding second speech segment of the speech data. The user therefore does not have to play back each retrieved recording lecture by lecture to check whether it is relevant, and playback can start from the speech segment corresponding to the relevant document in a very short time. Although playback here starts from the selected second speech segment, it may instead start a predetermined length before that segment.

Next, the method by which the speech document retrieval system configured as above retrieves the speech data matching a query from the speech data stored in the database is described with reference to the overview in FIG. 3 and the flowchart in FIG. 4.

First, when the user enters a query via the query input means 5 (step S1), the speech document retrieval system 1 reads out the speech data stored in the database 2 (step S2) and recognizes it using the speech recognition means 3 (step S3). Here the speech is transcribed after the query is entered, but the speech data in the database 2 may be transcribed in advance.

The transcribed speech document is then morphologically analyzed (step S4) and divided into first speech segments of, for example, 15 utterances (step S5). A search is run over these first speech segments with the input query (step S6), the results are ranked, the top few are treated as pseudo-relevant documents and related words are extracted from them (step S7), and the search is run again with the related words and the original query by pseudo-relevance feedback (step S9).

For this second search, second speech segments are extracted from the text transcribed in step S3 (step S8), and the matching speech documents are retrieved from among these segments (step S9).
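Steps S5 to S9 can be sketched as a two-index search: related words come from short first segments, and the final ranking runs over longer second segments. The sketch below is illustrative only. It treats one lecture as a list of tokenized utterances, uses plain cosine similarity as a stand-in for the SMART similarity named in the experiments, and selects related words by frequency; all function names are hypothetical.

```python
import math
from collections import Counter
from typing import List, Tuple


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def segment(utterances: List[List[str]], unit: int) -> List[Counter]:
    """Build one term-frequency vector per block of `unit` utterances."""
    return [Counter(w for u in utterances[i:i + unit] for w in u)
            for i in range(0, len(utterances), unit)]


def rank(query: Counter, index: List[Counter]) -> List[Tuple[int, float]]:
    """Return (segment number, score) pairs with score > 0, best first."""
    scored = [(i, cosine(query, doc)) for i, doc in enumerate(index)]
    return sorted((s for s in scored if s[1] > 0),
                  key=lambda s: s[1], reverse=True)


def two_stage_search(utterances: List[List[str]], query_words: List[str],
                     first_unit: int, second_unit: int,
                     top_docs: int, n_terms: int,
                     beta: int) -> List[Tuple[int, float]]:
    query = Counter(query_words)
    first_index = segment(utterances, first_unit)    # step S5
    second_index = segment(utterances, second_unit)  # step S8

    # Steps S6 and S7: search the short segments and pool the top hits.
    d_r: Counter = Counter()
    for i, _ in rank(query, first_index)[:top_docs]:
        d_r.update(first_index[i])

    # Expand the query: beta * q plus the most frequent pooled words.
    expanded = Counter({t: beta * f for t, f in query.items()})
    for term, freq in d_r.most_common(n_terms):
        expanded[term] += freq

    # Step S9: search the longer segments with the expanded query.
    return rank(expanded, second_index)
```

Keeping the two indexes separate means the expansion unit and the retrieval unit can be tuned independently, which is the point of the second time unit.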

The user then selects the desired speech document from the search results, the speech data associated with that document is read from the database, and the speech data of the corresponding speech segment is output (step S10).

According to the above embodiment, a speech document to be searched is divided into predetermined utterance units to extract speech segments, and related words related to the query are extracted within those segments and used for the search. Compared with extracting related words from a very long span such as a whole lecture, words from unrelated topics are therefore less likely to be added as related words, and retrieval accuracy improves.

Experimental results for the present invention are shown next. The experiments use the test collection used in NTCIR-9 Spoken Doc. The retrieval targets are 2,702 lectures from the Corpus of Spontaneous Japanese (hereinafter CSJ): 987 academic presentations and 1,715 simulated lectures. Speech recognition results are needed for each CSJ lecture; those included in the spoken document retrieval test collection (recognition rate 65% to 95%) were used. Passages of 15, 30, and 60 utterances were used as retrieval targets. For the 15-utterance unit, each lecture is cut into consecutive blocks of 15 utterances from its beginning, and each block is a document to be retrieved. The 30- and 60-utterance units are formed in the same way, by cutting each lecture into blocks of 30 and 60 utterances from its beginning.

The queries are 125 texts written as natural-language sentences. Relevance information for each query is given as answers of the form "from which utterance to which utterance of which lecture", and two relevance labels exist: relevant (R) and partially relevant (P). Here, spans labeled relevant (R) were treated as the correct answers to a query.

The retrieval performance of the speech document retrieval system 1 of this embodiment would ordinarily be evaluated using recall and precision. Here, 11-point average precision, a measure that combines the two, is used. It is obtained by Equation 5, and the closer its value is to 1, the better the retrieval results match.
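Equation 5 is not reproduced in this text. The sketch below shows the usual definition of 11-point average precision, the mean of the interpolated precision at recall levels 0.0, 0.1, ..., 1.0, on the assumption that Equation 5 follows that definition. The function name is hypothetical.

```python
from typing import List, Set


def eleven_point_average_precision(ranked: List[str],
                                   relevant: Set[str]) -> float:
    """Mean interpolated precision at recall 0.0, 0.1, ..., 1.0."""
    if not relevant:
        return 0.0

    # (recall, precision) measured after each rank position.
    points = []
    hits = 0
    for k, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
        points.append((hits / len(relevant), hits / k))

    total = 0.0
    for i in range(11):
        level = i / 10
        # Interpolated precision: best precision at any recall >= level,
        # or 0 if that recall level is never reached.
        total += max((p for r, p in points if r >= level), default=0.0)
    return total / 11


if __name__ == "__main__":
    print(eleven_point_average_precision(["a", "x", "b"], {"a", "b"}))
```

The value is 1 only when every relevant item is ranked above every non-relevant one, which matches the statement that values closer to 1 indicate better retrieval.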

1. Effect of pseudo-relevance feedback in lecture document retrieval
(1) Baseline retrieval results
A document retrieval system based on the vector space model was used. SMART was used as the similarity between vectors. The index terms were the base forms of morphemes (nouns and verbs only), and the baseline retrieval system was built with the general-purpose associative computation engine GETA as the search engine. Given a query q, the similarity SMART(q, Di) was computed for every document Di, and all documents with similarity greater than 0 were output in descending order of similarity.

FIG. 5 shows the results of this baseline system. The 11-point average precision for retrieval in lecture, 60-utterance, 30-utterance, and 15-utterance units was 0.531, 0.294, 0.241, and 0.183, respectively.

(2) Pseudo-relevance feedback results for each retrieval unit
Pseudo-relevance feedback has three parameters. The first is how many of the top results retrieved with the original query q are treated as pseudo-relevant documents. The second is how many words are extracted when related words are extracted from the relevant documents using Equation 5. The third is the weight β applied to the original query in Equation 4. In this embodiment, leave-one-out cross-validation was performed with the 125 queries. Specifically, each parameter was estimated on 124 queries so as to maximize retrieval accuracy, and the estimated parameters were used in the retrieval experiment for the remaining query. The number of pseudo-relevant documents was varied from 1 to 5 in steps of 1, the number of related words from 10 to 50 in steps of 10, and β from 1 to 10 in steps of 1. This was repeated for every split. Based on preliminary experiments, low-frequency words were assumed to include misrecognized words, so words with a document frequency of 1 (DF = 1) were excluded from the related words.
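The leave-one-out parameter search described above can be sketched as follows. The parameter grid matches the ranges given in the text; the callback `score`, which would run retrieval for one query with one parameter setting and return its 11-point average precision, is a hypothetical placeholder and not part of the patent.

```python
from itertools import product
from statistics import mean
from typing import Callable, List, Tuple

# (pseudo-relevant documents, related words, beta)
Params = Tuple[int, int, int]

# 1-5 documents, 10-50 related words in steps of 10, beta 1-10.
GRID: List[Params] = list(product(range(1, 6), range(10, 51, 10), range(1, 11)))


def leave_one_out(queries: List[str],
                  score: Callable[[str, Params], float]
                  ) -> List[Tuple[str, Params, float]]:
    """For each query, tune on all other queries and test on the held-out one."""
    results = []
    for i, held_out in enumerate(queries):
        train = queries[:i] + queries[i + 1:]
        # Pick the setting with the best mean score on the training queries.
        best = max(GRID, key=lambda p: mean(score(q, p) for q in train))
        results.append((held_out, best, score(held_out, best)))
    return results
```

With 125 queries this evaluates 250 parameter settings on 124 training queries for each of the 125 splits, so in practice the per-query scores for each setting would be computed once and reused across splits.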

The results of query expansion by pseudo-relevance feedback are shown in the middle column of FIG. 5. Here the retrieval unit and the unit used for related-word extraction are the same. For lecture and 60-utterance units, 11-point average precision fell from 0.531 to 0.470 and from 0.294 to 0.262, respectively. For 30-utterance and 15-utterance units, it rose from 0.241 to 0.253 and from 0.183 to 0.210, respectively. These results show that the smaller the documents being retrieved, the greater the effect of pseudo-relevance feedback.

Looking at the results query by query, retrieval accuracy improved for 28.8% of queries (36 of 125) in lecture units, 32.0% (40 of 125) in 60-utterance units, 54.4% (68 of 125) in 30-utterance units, and 66.4% (83 of 125) in 15-utterance units. FIG. 6 shows an example of the influence of pseudo-relevance feedback. In lecture units, many of the added words are drawn from the broad topic that contains the query's search intent. In 15-utterance units, by contrast, several of the added words relate to the search intent itself. The likely explanation is that, although lecture-level retrieval itself works well, the retrieved documents contain several subtopics around a main topic, so words from subtopics unrelated to the query are extracted as added words during pseudo-relevance feedback. In 15-utterance retrieval, the retrieved documents are small and unlikely to contain several topics, so words strongly related to the query are extracted.

2. Pseudo-relevance feedback with a separate index for related-word extraction
(1) How the related-word extraction index is used
This section describes a method of performing pseudo-relevance feedback in which the related words are extracted from an utterance unit that is independent of the utterance unit being retrieved.

Here, related words are extracted in 15-utterance units, and these added words are used for pseudo-relevance feedback when retrieving in lecture, 60-utterance, and 30-utterance units. The aim is to prevent words unrelated to the query from being added to it when retrieving large utterance units that may contain several topics.

(2) Experimental results
The added words obtained during pseudo-relevance feedback in 15-utterance retrieval were used to perform pseudo-relevance feedback in lecture, 60-utterance, and 30-utterance units. The results are shown in the rightmost column of FIG. 5. Retrieval accuracy improved overall. The 11-point average precision was 0.500 in lecture units, 0.313 in 60-utterance units, and 0.262 in 30-utterance units. These results show that the smaller the documents being retrieved, the higher the rate of increase in retrieval accuracy. Query by query, retrieval accuracy improved over the baseline for 33.6% of queries (42 of 125) in lecture units, 59.2% (74 of 125) in 60-utterance units, and 60.0% (75 of 125) in 30-utterance units. Compared with ordinary pseudo-relevance feedback, retrieval accuracy improved for 56.0% of queries (70 of 125) in lecture units, 69.6% (85 of 125) in 60-utterance units, and 55.2% (69 of 125) in 30-utterance units. In lecture units, retrieval accuracy was unchanged for 4.0% of queries (5 of 125). These results show the effect of performing pseudo-relevance feedback in lecture, 60-utterance, and 30-utterance units with added words extracted in 15-utterance units. The likely reason is that most correct answer spans are 15 utterances or shorter, with many spanning only 1 to 3 utterances. Future work includes extracting added words in even shorter utterance units and using them for pseudo-relevance feedback.

In a retrieval task over CSJ lectures, the effect of pseudo-relevance feedback was clarified using a sufficiently large set of queries. A method of using pseudo-relevance feedback with a separate index for related-word extraction was also proposed, and using added words obtained by retrieving short utterance units for pseudo-relevance feedback was found to be effective.

Description of reference signs
1 ... Speech document retrieval system
2 ... Database
3 ... Speech recognition means
4 ... First speech segment extraction means
5 ... Query input means
6 ... Related word extraction means
7 ... Second speech segment extraction means
8 ... Search means
9 ... Output means

Claims

1. A speech document retrieval method comprising:
a step of dividing a speech document to be searched into predetermined time units and extracting speech segments;
a step of extracting related words related to a query within the extracted speech segments; and
a step of performing a search using the extracted related words.

2. The speech document retrieval method according to claim 1, wherein the predetermined time unit is a time unit delimited by a predetermined number of words or by breath pauses.

3. The speech document retrieval method according to claim 1, wherein the step of performing a search using the extracted related words performs the search with the extracted related words using, as a search index, speech segments divided into second time units different from the predetermined time units.

4. A speech document retrieval system comprising:
first speech segment extraction means for dividing a speech document to be searched into predetermined time units and extracting speech segments;
related word extraction means for extracting, using pseudo-relevance feedback, related words related to a query within the extracted speech segments; and
search means for performing a search using the extracted related words.

5. The speech document retrieval system according to claim 4, wherein the predetermined time unit is a time unit delimited by a predetermined number of words or by breath pauses.

6. The speech document retrieval system according to claim 4, wherein the search means performs the search with the extracted related words using, as a search index, speech segments divided into second time units different from the predetermined time units.