JP2002278579A

JP2002278579A - Voice data retrieving device

Info

Publication number: JP2002278579A
Application number: JP2001077107A
Authority: JP
Inventors: Hiroo Kitagawa; 博雄北川
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 2001-03-16
Filing date: 2001-03-16
Publication date: 2002-09-27

Abstract

PROBLEM TO BE SOLVED: To provide a voice data retrieving device that quickly and accurately finds the part containing a desired utterance in a large volume of stored voice data. SOLUTION: The voice data retrieving device comprises a voice data registration part 1 that converts digitized voice waveform data 4 into a preset voice symbol sequence and records it, a candidate voice section detection part 2 that converts a retrieval word into voice symbols and retrieves matching parts from the registered symbol sequence 7, and a retrieval word voice determining part 3 that determines whether each candidate section detected by the candidate voice section detection part matches the retrieval word. The candidate sections are thus narrowed down at high speed at the symbol level, and accurate detection is then performed by matching at the voice waveform level.

Description

DETAILED DESCRIPTION OF THE INVENTION

【０００１】[0001]

[Technical Field of the Invention] The present invention relates to a voice data search device that quickly and accurately retrieves utterance portions containing desired content from a large amount of accumulated voice waveform data.

【０００２】[0002]

[Prior Art] A voice data search device that retrieves desired voice data from a voice data storage means in which voice data is stored has been proposed (Japanese Patent Laid-Open No. 2000-020551). At registration time, this device divides the voice data into sections and, for each voice section, calculates and stores the existence probabilities of words selected in advance as vocabulary. At search time, it expands the input search word, including its synonyms, into a group of search words and outputs the voice section in which their existence probabilities are highest.

【０００３】[0003]

[Problems to Be Solved by the Invention] However, this method can handle only search words registered in a dictionary in advance, and enlarging the vocabulary to relax this restriction greatly increases the required storage capacity. Furthermore, because the existence probabilities of the search word group are calculated by a full search over all voice sections, the search time grows in proportion to the volume of voice data.

[0004] The present invention has been made in view of these problems, and its object is to provide a voice data search device that can quickly and accurately cue up the portion containing a desired utterance in a large amount of accumulated voice data.

【０００５】[0005]

[Means for Solving the Problems] The above problems are solved by the following means of the present invention.

[0006] The voice data search device of the invention according to claim 1 comprises: a voice data registration unit that converts digitized voice waveform data into a preset voice symbol string and records it; a candidate voice section detection unit that converts a search word into voice symbols and searches the registered symbol sequence for matching portions; and a search word voice determination unit that determines whether each candidate section detected by the candidate voice section detection unit matches the search word.

[0007] The voice data search device of the invention according to claim 2 is the voice data search device according to claim 1, wherein the voice data registration unit stores the voice parameter sequence used when extracting voice symbols from the voice data, and the search word voice determination unit shares that voice parameter sequence.

[0008] The voice data search device of the invention according to claim 3 is the voice data search device according to claim 1, wherein the candidate voice section detection unit uses a full-text search system to search for voice symbols.

[0009] The voice data search device of the invention according to claim 4 is the voice data search device according to claim 1, wherein the search word voice determination unit uses, for the determination, a word spotting speech recognition device that can handle speech with free start and end points.

[0010] The voice data search device of the invention according to claim 5 is the voice data search device according to claim 1, wherein the voice data registration unit converts only voiced vowels, long vowels, moraic nasals, and silent sections into the voice symbol string.

[0011] The voice data search device of the invention according to claim 6 is the voice data search device according to claim 1, wherein the voice data registration unit uses, as the voice symbol string, a symbol group that is based on single syllables but in which consonants prone to confusion are grouped and treated as one syllable.

[0012] The voice data search device of the invention according to claim 7 is the voice data search device according to claim 5 or 6, wherein the voice data registration unit registers frequently occurring words in a dictionary in advance and adds each such word as a single voice symbol.

[0013] In the present invention, the voice data is first converted into symbols that can be expressed as characters, and candidate voice sections are extracted by matching at this character level. Pattern matching on the voice waveform data is then performed in each detected voice section to determine whether it actually contains the search word. This two-stage processing achieves both speed and accuracy.

【００１４】[0014]

[Embodiments of the Invention] Embodiments of the present invention are described concretely below with reference to the drawings.

[0015] FIG. 1 is a configuration diagram of a voice data search device according to an embodiment of the present invention. As shown in FIG. 1, the voice data search device consists of a voice data registration unit 1, a candidate voice section detection unit 2, and a search word voice determination unit 3. The voice data registration unit 1 converts digitized voice waveform data 4 into a preset voice symbol string 7 and records it. The candidate voice section detection unit 2 converts an input search word into voice symbols and searches the registered voice symbol sequence 7 for matching portions. The search word voice determination unit 3 uses the original voice waveform data 4 to determine, for every voice section detected by the character string search 10, whether the section contains the target search word.

[0016] First, the voice data registration unit 1 shown in FIG. 1 is described. The target voice waveform data 4 is assumed to have been digitized in advance and stored in a storage device such as a hard disk. Acoustic analysis 5 converts the voice waveform data 4 into feature quantities such as spectral information and power for each short time unit of about 5 to 10 msec and outputs them as voice parameters. Phoneme recognition 4 then identifies the phoneme types from the time-series data of these voice parameters. Each extracted phoneme is mapped to a preset symbol group and stored in the storage device together with its correspondence to the voice section (voice symbol string 7). Because the accuracy of conversion into the voice symbol string 7 determines the detection accuracy at search time, the voice symbol group does not represent the reading exactly; instead, easily mistaken phonemes are ignored and several similar phonemes are treated as one.
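
The registration step in paragraph [0016] can be pictured with the following minimal Python sketch. It assumes phoneme recognition has already produced labeled, time-stamped phonemes; the `Phoneme` type, the `SYMBOL_MAP` table, and the symbol names are hypothetical illustrations and are not taken from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phoneme:
    label: str     # phoneme label from the recognizer, e.g. "o", "N", "s"
    start_ms: int  # start of the voice section in the waveform
    end_ms: int    # end of the voice section in the waveform


# Hypothetical symbol group: phonemes missing from the table are ignored.
SYMBOL_MAP = {"a": "A", "i": "I", "u": "U", "e": "E", "o": "O", "N": "N", "sil": "_"}


def register(phonemes):
    """Map phonemes to symbols and keep the voice section of each symbol."""
    symbols = []
    sections = []
    for p in phonemes:
        symbol = SYMBOL_MAP.get(p.label)
        if symbol is None:
            continue  # easily mistaken phoneme: dropped from the symbol string
        symbols.append(symbol)
        sections.append((p.start_ms, p.end_ms))
    return "".join(symbols), sections
```

Because `sections[i]` holds the time span of `symbols[i]`, a match found later in the symbol string can be traced back to a position in the waveform.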

[0017] Next, the candidate voice section detection unit 2 used when searching the voice data is described with reference to FIG. 1. When a search word is entered with an input device such as a keyboard (search word input 8) and passed to the candidate voice section detection unit 2, the search word is first mapped to a sequence of symbols from the same symbol group that was used when the voice data was registered (conversion to voice symbols 9). The search word expresses the utterance content to be found and is a phonetic reading written in hiragana, katakana, or romaji. Ordinary mixed kanji-kana notation can also be accepted at search word input if a separate word dictionary is provided.

[0018] The character string search 10 extracts, from the voice symbol string 7 registered by the voice data registration unit 1, the portions that match the voice symbol sequence of the search word. If there are multiple matching portions, all of their detected positions are output. This allows the candidate sections to be narrowed down quickly at the symbol level.
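
As a rough illustration of the character string search in paragraph [0018], the sketch below returns every position at which the query symbols occur, including overlapping matches. The function name `find_all` and the use of a plain Python string for the symbol string are assumptions made for this example.

```python
def find_all(symbols, query):
    """Return the start index of every occurrence of query in symbols."""
    positions = []
    if not query:
        return positions  # an empty query matches nothing useful
    start = symbols.find(query)
    while start != -1:
        positions.append(start)
        # Advance by one symbol so that overlapping matches are also reported.
        start = symbols.find(query, start + 1)
    return positions
```

Each returned index can then be converted to a time span through the section table stored at registration time.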

[0019] Next, the search word voice determination unit 3 used when searching the voice data is described with reference to FIG. 1. For every voice section detected by the character string search 10, the search word voice determination unit 3 uses the original voice waveform data 4 to verify whether the section actually contains the target search word (word speech recognition 12). Ordinary word speech recognition identifies one word from among multiple registered words; here, however, only the search word is given, its similarity is calculated, and the decision is made by comparison with a threshold. This enables detection at the voice waveform data level. When word speech recognition 12 judges that the search word is present, its position is output as a search result (search result output 13).
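
The patent does not say how the similarity in paragraph [0019] is computed. Purely as an illustration, the sketch below substitutes dynamic time warping (DTW) over one-dimensional feature sequences and accepts a candidate when the normalized distance is at or below a threshold. DTW, the scalar features, the normalization, and the threshold value are all assumptions and not the patent's method.

```python
def dtw_distance(a, b):
    """Length-normalized DTW distance between two non-empty 1-D sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # d[i][j] = cheapest cost of aligning a[:i] with b[:j]
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m] / (n + m)


def contains_word(section_features, template_features, threshold=0.1):
    """Accept the candidate section if it is close enough to the template."""
    return dtw_distance(section_features, template_features) <= threshold
```

A real system would compare multi-dimensional spectral features and would tune the threshold on recorded data.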

[0020] Next, a voice data search device that uses stored voice parameters is described with reference to FIG. 2. FIG. 2 shows another configuration of the voice data search device according to the embodiment of the present invention. As shown in FIG. 2, this configuration is characterized in that the voice parameters 26 obtained by the acoustic analysis 25 in the voice data registration unit 21 are stored in association with the voice waveform so that they can also be used for the search word voice determination in the search word voice determination unit 23 at search time. Storage capacity increases slightly, but the amount of computation at search time is reduced, enabling further speed-up. The processing procedure is the same as that described above, so its description is omitted here.
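
One way to picture the parameter sharing of paragraph [0020] is a store that keeps one feature value per analysis frame and hands back the frames covering a candidate section, so that acoustic analysis need not be rerun at search time. The class name `ParameterStore`, the fixed 10 ms frame length, and the in-memory list are hypothetical choices for this sketch.

```python
class ParameterStore:
    """Keeps per-frame voice parameters computed at registration time."""

    def __init__(self, frame_ms=10):
        self.frame_ms = frame_ms
        self._features = []

    def save(self, features):
        # Called once at registration, right after acoustic analysis.
        self._features = list(features)

    def frames_for(self, start_ms, end_ms):
        # Called at search time: return the frames overlapping the section.
        first = start_ms // self.frame_ms
        last = -(-end_ms // self.frame_ms)  # ceiling division
        return self._features[first:last]
```

The trade-off matches the text: the features occupy extra storage, but the determination step only slices them.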

[0021] Next, the case where a full-text search system is introduced for the voice symbol search in the character string search 10 at search time is described. In the candidate voice section detection unit 2 shown in FIG. 1, introducing a full-text search system for the voice symbol search in the character string search 10 allows the candidate voice sections to be narrowed down faster than scanning the voice symbol string 11 sequentially from the beginning on every search. The effect is particularly large when the amount of voice data is large. However, because a full-text search system uses a large index file, the storage capacity required of the system increases.
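
The patent does not name a particular full-text search system. As one common way such systems avoid a sequential scan, the sketch below builds an n-gram inverted index over the symbol string and verifies only the positions listed for the first n-gram of the query. The function names and the bigram default are assumptions for illustration.

```python
from collections import defaultdict


def build_index(symbols, n=2):
    """Map every n-gram of the symbol string to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(symbols) - n + 1):
        index[symbols[i:i + n]].append(i)
    return index


def search(symbols, index, query, n=2):
    """Find all occurrences of query, using the index to pick candidates."""
    if not query:
        return []
    if len(query) < n:
        # Query shorter than one n-gram: fall back to a sequential scan.
        return [i for i in range(len(symbols)) if symbols.startswith(query, i)]
    candidates = index.get(query[:n], [])
    return [i for i in candidates if symbols.startswith(query, i)]
```

The index holds one entry per position in the symbol string, which reflects the storage cost that paragraph [0021] mentions.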

[0022] Next, the case where word spotting speech recognition is used for the search word voice determination performed by the search word voice determination unit 3 at search time is described. Word spotting is a recognition method that detects a target word by comparing the voice pattern against a reference pattern without restricting the section and looking for portions with a high matching likelihood.

[0023] When word spotting speech recognition, which can handle speech with free start and end points, is used for the search word voice determination performed by the search word voice determination unit 3 in FIG. 1, both ends of each candidate voice section are widened slightly before the voice waveform data is passed to the word spotting speech recognition. This reduces misrecognition caused by, for example, a dropped word-initial phoneme and makes the determination more accurate. In addition, because word spotting speech recognition does not require the speech to be segmented before it is supplied, deciding whether the specified word is present is part of its internal mechanism, and no separate threshold processing is needed.
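
The widening of candidate sections described in paragraph [0023] can be sketched as follows. The 100 ms default margin is an assumed value; the patent only says the ends are widened "slightly".

```python
def widen(start_ms, end_ms, total_ms, margin_ms=100):
    """Extend a candidate section on both sides, clamped to the recording."""
    return max(0, start_ms - margin_ms), min(total_ms, end_ms + margin_ms)
```

For example, `widen(1000, 1500, 10000)` yields `(900, 1600)`, and a section near the start or end of the recording is clipped to the recording bounds.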

[0024] Next, methods of defining the voice symbol group in the voice data registration unit 1 are described.

[0025] Here, all consonants, which are prone to misrecognition, are ignored, and only voiced vowels, long vowels, moraic nasals, and silent sections are converted into the voice symbol string. This reduces the errors that arise when narrowing down candidate sections and yields highly accurate detection results. For example, if the utterance is 「音声の検索について・・・」 (onsei no kensaku ni tsuite ...), its vowel sequence 「オンエーオエンアウイウイエ・・・」 (o-n-ē-o-e-n-a-u-i-u-i-e ...) is extracted as the voice symbol string. In this case, the search word entered at search word input 8 must be converted into symbols in the same way at search time. When 「オンセー」 (onsē) is entered as the search word at search word input 8 in FIG. 1, the consonants are removed from its pronunciation to give 「オンエー」 (on-ē), and the registered voice symbol string 11 is searched for it (character string search 10). The character string search 10 therefore detects every word with the same vowel sequence, such as 「尊敬」 (sonkei, "respect"), in addition to 「音声」 (onsei, "voice"). Finally, all detected voice sections are checked by word speech recognition 12, and only the target sections are output as search results (search result output 13).

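The vowel-only symbolization of paragraph [0025] can be tried out with the short sketch below. It assumes the reading is supplied in lowercase romaji with "-" marking a long vowel, and it treats "n" as a moraic nasal unless a vowel or "y" follows; the function name and these input conventions are assumptions for the example, since the patent itself works with kana readings.

```python
VOWELS = set("aiueo")


def vowel_symbols(reading):
    """Keep vowels, long-vowel marks, and moraic nasals; drop consonants."""
    out = []
    for i, ch in enumerate(reading):
        if ch in VOWELS:
            out.append(ch.upper())
        elif ch == "-":
            out.append("-")  # long vowel mark
        elif ch == "n":
            nxt = reading[i + 1] if i + 1 < len(reading) else ""
            if nxt not in VOWELS and nxt != "y":
                out.append("N")  # moraic nasal, kept as its own symbol
        # every other character is a consonant and is ignored
    return "".join(out)
```

With this convention, `vowel_symbols("onse-")` and `vowel_symbols("sonke-")` both give `"ONE-"`, which is why the symbol-level search also returns "sonkei" and the waveform-level check is still needed.
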
[0026] Next, the case where the voice symbol group groups easily confused consonants and treats them as a single syllable is described. By using as the voice symbol string 7 a symbol group that is based on single syllables but in which easily confused consonants are grouped and treated as one syllable, the number of candidate sections can be reduced while errors in narrowing down candidate sections are kept low, enabling a faster search. For example, if the voiceless plosives "P" and "T" are hard to distinguish, the syllables of the パ (pa) row and the タ (ta) row are treated as the same. The notation can be chosen freely: unify on the パ row, unify on the タ row, or assign an entirely different symbol. The processing procedure is the same as that described above, so its description is omitted here.
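
The consonant grouping of paragraph [0026] can be sketched as a table lookup on the onset of each syllable. The romaji syllable input, the `CONSONANT_CLASS` table, and the choice of "P" as the shared class symbol are hypothetical; only the P/T example itself comes from the text.

```python
# Hypothetical confusion classes: "p" and "t" onsets share one symbol.
CONSONANT_CLASS = {"p": "P", "t": "P"}


def group_syllable(syllable):
    """Replace the onset consonant by its class symbol, if it has one."""
    onset, vowel = syllable[:-1], syllable[-1]
    return CONSONANT_CLASS.get(onset, onset) + vowel


def to_symbols(syllables):
    return [group_syllable(s) for s in syllables]
```

Both the registered speech and the search word must pass through the same table, so "pa" and "ta" end up as the same symbol on either side of the match.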

[0027] Next, the case where the voice symbol group assigns a single symbol not only to short units such as vowels and single syllables but also to words is described. By registering frequently occurring words in a dictionary in advance and assigning each word one voice symbol, the voice symbol string 7 becomes shorter and, at the same time, fewer candidate sections are detected, enabling a faster search. When voice data is registered in the voice data registration unit 1 in FIG. 1, the target word portions may be extracted from the detected phoneme sequence, or a separate word speech recognition device may be provided.
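
For the first option in paragraph [0027], extracting dictionary words from the detected symbol sequence, a minimal sketch is a greedy longest-match replacement. The dictionary contents, the `<KENSAKU>` symbol, and the list-of-syllables representation are assumptions made for this example.

```python
def compress(symbols, word_dict):
    """Replace each dictionary word's symbol run with its single word symbol."""
    # Try longer entries first; skip empty keys, which could never advance.
    entries = sorted((seq for seq in word_dict if seq), key=len, reverse=True)
    out = []
    i = 0
    while i < len(symbols):
        for seq in entries:
            if tuple(symbols[i:i + len(seq)]) == seq:
                out.append(word_dict[seq])
                i += len(seq)
                break
        else:
            out.append(symbols[i])  # no dictionary word starts here
            i += 1
    return out
```

Applying the same replacement to the search word keeps the two sides comparable, and the shorter symbol string leaves fewer positions to match.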

[0028] The present invention can also be applied to, for example, a system that retrieves and plays back the part of a recorded meeting tape where a desired discussion took place, or a system that extracts a desired scene from a recorded videotape using the audio as the key.

[0029] Preferred embodiments of the present invention have been described in detail above, but the present invention is not limited to these specific embodiments, and various modifications and changes are possible within the scope of the gist of the present invention as set forth in the claims.

【００３０】[0030]

[Effects of the Invention] As is clear from the detailed description above, the invention according to claim 1 comprises a voice data registration unit that converts digitized voice waveform data into a preset voice symbol string and records it, a candidate voice section detection unit that converts a search word into voice symbols and searches the registered symbol sequence for matching portions, and a search word voice determination unit that determines whether each candidate section matches the search word. This enables fast narrowing of candidate sections at the symbol level and accurate detection by matching at the voice waveform level.

[0031] In the invention according to claim 2, the voice data registration unit stores the voice parameter sequence used when extracting voice symbols from the voice data, and the search word voice determination unit shares that voice parameter sequence. This reduces the amount of computation at search time and enables further speed-up.

[0032] In the invention according to claim 3, the candidate voice section detection unit uses a full-text search system to search for voice symbols, which further speeds up the narrowing of candidate sections. The effect is particularly large when handling a large amount of voice data.

[0033] In the invention according to claim 4, the search word voice determination unit uses, for the determination, a word spotting speech recognition device that can handle speech with free start and end points. This reduces misrecognition in word speech recognition and yields highly accurate detection results.

[0034] In the invention according to claim 5, consonants prone to misrecognition are ignored, and only voiced vowels, long vowels, moraic nasals, and silent sections are converted into the voice symbol string. This reduces the errors that arise when narrowing down candidate sections and yields highly accurate detection results.

[0035] In the invention according to claim 6, a symbol group that is based on single syllables but in which consonants prone to confusion are grouped and treated as one syllable is used as the voice symbol string. This reduces the number of candidate sections while keeping errors in narrowing down candidate sections low, enabling a faster search.

[0036] In the invention according to claim 7, frequently occurring words are registered in a dictionary in advance and each word is assigned one voice symbol. The voice symbol string therefore becomes shorter and, at the same time, fewer candidate sections are detected, enabling a faster search.

[Brief description of the drawings]

[FIG. 1] A configuration diagram of a voice data search device according to an embodiment of the present invention.

[FIG. 2] Another configuration diagram of the voice data search device according to the embodiment of the present invention.

[Explanation of symbols]

1 voice data registration unit
2 candidate voice section detection unit
3 search word voice determination unit

Continued from the front page: (51) Int.Cl.7, identification symbol, FI, theme code (reference): G10L 15/00; G10L 3/00 551P; 15/28; 571E; 5/06 F

Claims

1. A voice data search device comprising: a voice data registration unit that converts digitized voice waveform data into a preset voice symbol string and records it; a candidate voice section detection unit that converts a search word into voice symbols and searches the registered symbol sequence for matching portions; and a search word voice determination unit that determines whether each candidate section detected by the candidate voice section detection unit matches the search word.

2. The voice data search device according to claim 1, wherein the voice data registration unit stores the voice parameter sequence used when extracting voice symbols from the voice data, and the search word voice determination unit shares that voice parameter sequence.

3. The voice data search device according to claim 1, wherein the candidate voice section detection unit uses a full-text search system to search for voice symbols.

4. The voice data search device according to claim 1, wherein the search word voice determination unit uses, for the determination, a word spotting speech recognition device that can handle speech with free start and end points.

5. The voice data search device according to claim 1, wherein the voice data registration unit converts only voiced vowels, long vowels, moraic nasals, and silent sections into the voice symbol string.

6. The voice data search device according to claim 1, wherein the voice data registration unit uses, as the voice symbol string, a symbol group that is based on single syllables but in which consonants prone to confusion are grouped and treated as one syllable.

7. The voice data search device according to claim 5 or 6, wherein the voice data registration unit registers frequently occurring words in a dictionary in advance and adds each such word as a single voice symbol.