JP2005257954A

JP2005257954A - Speech retrieval apparatus, speech retrieval method, and speech retrieval program

Info

Publication number: JP2005257954A
Application number: JP2004068177A
Authority: JP
Inventors: Riyouko Imai; 亮子今井; Takafumi Koshinaka; 孝文越仲; Satoshi Nakazawa; 聡中澤
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2004-03-10
Filing date: 2004-03-10
Publication date: 2005-09-22

Abstract

PROBLEM TO BE SOLVED: To provide a speech retrieval apparatus, a speech retrieval program, and a speech retrieval method that obtain an effective retrieval result, with a small amount of information processing, when retrieving an arbitrary word from speech data that contains recognition errors. SOLUTION: Speech retrieval is carried out using a speech retrieval apparatus equipped with: a word/phrase expansion part which converts an inputted word to generate a phoneme string or syllable string; a phoneme string conversion part which adds a new phoneme to, or substitutes a new phoneme for a phoneme of, the phoneme string or syllable string to generate a new phoneme string or new syllable string; a phoneme/syllable data storage part which stores phoneme/syllable data to be retrieved; and a collation part which collates the new phoneme string or new syllable string with the phoneme/syllable data to be retrieved. COPYRIGHT: (C)2005,JPO&NCIPI

Description

The present invention relates to a speech search device, a speech search method, and a speech search program, and more particularly to a technique for searching spoken audio for a specific word.

As storage devices have grown in capacity with advances in computer technology, and as broadband data communication has spread with the development of network technology, speech has begun to be digitized. A technique that can appropriately search speech data, that is, digitized speech, is therefore desired. A phoneme string transcribed by conventional speech recognition may contain recognition errors. Consequently, even for utterances that a human listener would judge to have the same reading, a speech recognition device (or software) may not generate the same phoneme string. Techniques are known that allow a search with an input search keyword even when recognition errors are present (see, for example, Patent Document 1 and Non-Patent Document 1).

FIG. 1 is a block diagram showing the configuration of the speech search device described in Patent Document 1. Referring to FIG. 1, the device comprises a speech/electric signal conversion unit 101, a speech data storage unit 102, a phoneme or syllable recognition unit 103, a speech data start position storage unit 104, a phoneme or syllable sequence storage unit 105, a search word/phrase phoneme or syllable string storage unit 106, a matching unit 107, a likelihood threshold storage unit 108, a comparison unit 109, a speech data playback start pointer 110, and an electric signal/speech conversion unit 111.

The conventional search device configured in this way operates as follows. Sentence speech is converted into an electric signal by the speech/electric signal conversion unit 101; the resulting speech data is stored in the speech data storage unit 102 and is also input to the phoneme or syllable recognition unit 103. The phoneme or syllable recognition unit 103 stores its recognition result, the phoneme or syllable sequence of the sentence speech, in the phoneme or syllable sequence storage unit 105, and stores the start position in the speech data of each recognized phoneme or syllable in the speech data start position storage unit 104.

Next, the phoneme or syllable string of the word or phrase to be searched for, entered as text from a keyboard or the like, is stored in the search word/phrase phoneme or syllable string storage unit 106. The phoneme or syllable sequence of the sentence speech held in the phoneme or syllable sequence storage unit 105 and the phoneme or syllable string of the search word or phrase held in the storage unit 106 are input to the matching unit 107, which calculates the likelihood of a match with the search word or phrase within the phoneme or syllable sequence of the sentence speech.

The calculated matching likelihood and the likelihood threshold preset in the likelihood threshold storage unit 108 are input to the comparison unit 109, which detects the sections of the phoneme or syllable sequence of the sentence speech whose likelihood exceeds the threshold and outputs their positions. Given the phonemes or syllables of a detected section and the start positions held in the speech data start position storage unit 104, the speech data playback start pointer 110 points to the corresponding position in the sentence speech data held in the speech data storage unit 102, and the electric signal representing the speech data from that position is input to the electric signal/speech conversion unit 111. The electric signal/speech conversion unit 111 can thus output the search result as speech. Because this technique derives search results from likelihoods, it can search with relatively high accuracy even when recognition errors are present.

The technique described in Non-Patent Document 1 searches a corpus of read-aloud news articles for the audio files of articles that contain the speech of a query word. The query is given as a phoneme string, and the search runs over the phoneme strings, including recognition errors, obtained by phoneme recognition of the speech corpus. Continuous DP matching, which calculates the distance between the query and a phoneme string starting at an arbitrary point in the continuous phoneme sequence, absorbs a certain number of insertions, deletions, and substitutions. A confusion matrix, which summarizes which phonemes each phoneme tends to be misrecognized as, is used together with DP matching to make the search more effective.
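The continuous DP matching idea described above can be illustrated with a generic approximate-substring dynamic program. The following is only a minimal sketch under stated assumptions: the function name `continuous_dp_match`, the unit insertion/deletion cost, and the optional `sub_cost` callback (standing in for a confusion matrix) are all hypothetical and are not taken from Non-Patent Document 1, whose exact formulation is not given here.

```python
def continuous_dp_match(query, corpus, sub_cost=None, indel_cost=1.0):
    """Find the corpus position where some substring best matches the query.

    Returns (best_distance, end_index), where end_index counts corpus
    symbols consumed (1-based end of the best-matching substring).
    All names and costs here are illustrative assumptions.
    """
    if sub_cost is None:
        # Default: plain edit distance (0 for a match, 1 for a substitution).
        def sub_cost(a, b):
            return 0.0 if a == b else 1.0

    m = len(query)
    # prev[i]: cost of aligning query[:i] before any corpus symbol is read.
    prev = [i * indel_cost for i in range(m + 1)]
    best = (prev[m], 0)
    for j, c in enumerate(corpus, start=1):
        # cur[0] = 0 lets a match start at any corpus position for free.
        cur = [0.0] * (m + 1)
        for i in range(1, m + 1):
            cur[i] = min(
                prev[i - 1] + sub_cost(query[i - 1], c),  # match or substitution
                prev[i] + indel_cost,                     # extra corpus symbol
                cur[i - 1] + indel_cost,                  # missing query symbol
            )
        if cur[m] < best[0]:
            best = (cur[m], j)
        prev = cur
    return best


if __name__ == "__main__":
    query = "k a i g i".split()
    corpus = "a s a k a i k i d a".split()
    print(continuous_dp_match(query, corpus))
```

Passing a `sub_cost` that charges less for frequently confused pairs (for example a low cost for `g` versus `k`) plays the role of the confusion matrix. Note that this approach scans the whole corpus with a dynamic program for every query, which is the kind of processing cost the present invention aims to reduce.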

Patent Document 1: Japanese Examined Patent Publication No. 7-69708
Non-Patent Document 1: Yuki Maeda, Akira Shimazu, "Full-text speech search based on phoneme recognition," Japanese Society for Artificial Intelligence SIG materials, SIG-SLUD-A102, pp. 1-6, 2001

The problem to be solved by the present invention is to provide a speech search device, a speech search program, and a speech search method that, when searching speech data containing recognition errors for an arbitrary word (or phrase), perform the search with a small amount of information processing and obtain effective search results.

Means for solving the problem are described below using the reference numerals used in [Best Mode for Carrying Out the Invention]. These numerals are added to clarify the correspondence between the description in [Claims] and [Best Mode for Carrying Out the Invention]. They must not, however, be used to interpret the technical scope of the invention described in [Claims].

A speech search on data containing recognition errors is performed using a speech search device comprising: a phrase expansion unit (6) that converts an input word to generate a phoneme string or syllable string; a phoneme string conversion unit (7) that generates a new phoneme string or new syllable string by adding a phoneme to or removing a phoneme from the phoneme string or syllable string, or by replacing at least one phoneme constituting the phoneme string or syllable string with another phoneme; a phoneme/syllable data storage unit (9) that stores search target phoneme/syllable data; and a collation unit (4) that collates the phoneme string or syllable string with the search target phoneme/syllable data and also collates the new phoneme string or new syllable string with the search target phoneme/syllable data.
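The division of work among the units described above can be sketched as follows. This is an illustrative sketch only: the dictionary `PHONEME_DICT`, the table `SUBSTITUTIONS`, and every function name are hypothetical, only substitution is shown (addition and removal of phonemes would be handled analogously), and the collation is assumed to be a plain exact comparison against the stored phoneme data.

```python
# Hypothetical word-to-phoneme dictionary (stands in for the phrase expansion unit).
PHONEME_DICT = {"kaigi": ["k", "a", "i", "g", "i"]}

# Hypothetical substitution table: phoneme in the query -> phonemes it may be
# misrecognized as in the stored data.
SUBSTITUTIONS = {"g": ["k"], "b": ["p"]}


def expand_word(word):
    """Phrase expansion: convert an input word into a phoneme string."""
    return list(PHONEME_DICT[word])


def convert_phoneme_string(phonemes, substitutions):
    """Phoneme string conversion: make new strings by one substitution each."""
    variants = []
    for i, p in enumerate(phonemes):
        for q in substitutions.get(p, []):
            variants.append(phonemes[:i] + [q] + phonemes[i + 1:])
    return variants


def collate(pattern, target):
    """Collation: positions in target where pattern occurs exactly."""
    n = len(pattern)
    return [i for i in range(len(target) - n + 1) if target[i:i + n] == pattern]


def search(word, target):
    """Collate the original string and every new string with the target data."""
    base = expand_word(word)
    hits = {}
    for candidate in [base] + convert_phoneme_string(base, SUBSTITUTIONS):
        for pos in collate(candidate, target):
            hits.setdefault(pos, candidate)
    return hits


if __name__ == "__main__":
    target = "a s a k a i k i d a k a i g i".split()
    print(search("kaigi", target))
```

In this sketch the tolerance for recognition errors is moved into the query side (a handful of alternative phoneme strings), so each candidate can be compared with the stored data by simple exact matching rather than by a per-position distance computation.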

In that speech search device, the search target phoneme/syllable data is composed of a plurality of phonemes, and the collation unit (4) detects, through the collation, locations in the search target phoneme/syllable data that match the new phoneme string or new syllable string. In addition, the phoneme string conversion unit (7) stores an expansion rule (25), which is a rule for generating the new phoneme string or new syllable string, and generates the new phoneme string or new syllable string based on the expansion rule (25). A speech search on data containing recognition errors is performed using a speech search device configured in this way.

Further, in that speech search device, the search target phoneme/syllable data is generated from speech data produced by sequentially executing information processing that converts speech input through a speech input device into electronic data. The speech data may instead be speech data (30) stored in advance. The expansion rule (25) is then set based on a comparison between the phonemes constituting the search target phoneme/syllable data and the phonemes or syllables constituting correct answer data (28), which is the result of correctly recognizing the phonemes and syllables of the speech data (30). A speech search on data containing recognition errors is performed using a speech search device configured in this way. The speech search device according to the present invention places no restriction on the format of the data from which the search target phoneme/syllable data is generated.
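One way to picture how a rule could be set from such a comparison is to align the correct phoneme sequence with the recognized one and count the substitutions that appear. The sketch below is an assumption-laden illustration, not the method of the patent: the function name `learn_substitution_rules`, the `min_count` threshold, and the use of Python's `difflib.SequenceMatcher` as the alignment step are all choices made here for brevity.

```python
from collections import Counter
from difflib import SequenceMatcher


def learn_substitution_rules(correct, recognized, min_count=1):
    """Derive substitution rules by comparing correct and recognized phonemes.

    Returns a dict mapping a correct phoneme to the phonemes it was
    recognized as at least `min_count` times. Only equal-length
    'replace' regions of the alignment are counted in this sketch.
    """
    counts = Counter()
    matcher = SequenceMatcher(a=correct, b=recognized, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "replace" and (i2 - i1) == (j2 - j1):
            counts.update(zip(correct[i1:i2], recognized[j1:j2]))

    rules = {}
    for (src, dst), n in counts.items():
        if n >= min_count:
            rules.setdefault(src, []).append(dst)
    return rules


if __name__ == "__main__":
    correct = "k a i g i d a".split()      # correct answer data (assumed example)
    recognized = "k a i k i d a".split()   # recognizer output (assumed example)
    print(learn_substitution_rules(correct, recognized))
```

The rules map from the correct phoneme to its misrecognized form because the query is entered correctly while the stored data carries the errors; insertions and deletions found by the alignment could be turned into addition and removal rules in the same way.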

In that speech search device, the search target phoneme/syllable data is generated, as above, from speech data produced from speech input through a speech input device or from speech data (30) stored in advance, and the expansion rule (25) is set based on statistics of the appearance frequencies of the phonemes constituting the search target phoneme/syllable data. A speech search on data containing recognition errors is performed using a speech search device configured in this way.

In that speech search device, the phrase expansion unit (6) includes morphological analysis means (50) that analyzes the morphemes constituting the input word, and the phoneme string conversion unit (7) generates the new phoneme string or new syllable string based on the analysis result output from the morphological analysis means (50) and on the phoneme string or syllable string. The phrase expansion unit (6) further includes a registered phrase determination unit (51), which determines whether each morpheme obtained by the morphological analysis means (50) is registered in advance; the phoneme string conversion unit (7) generates the new phoneme string or new syllable string based on the determination result. A speech search on data containing recognition errors is performed using a speech search device configured in this way.
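As a rough illustration of how a registration check could steer the expansion, the sketch below expands only the morphemes that are not registered and leaves registered ones unchanged. That policy is an assumption made for this example; the text above states only that the new strings are generated based on the determination result. The tables `REGISTERED`, `MORPHEME_PHONEMES`, and `SUBSTITUTIONS` and all function names are hypothetical, and the morphological analysis itself is assumed to have already produced the morpheme list.

```python
from itertools import product

# Hypothetical data: morphemes assumed to be registered in advance.
REGISTERED = {"kaigi"}

# Hypothetical phoneme strings for each morpheme.
MORPHEME_PHONEMES = {
    "kaigi": ["k", "a", "i", "g", "i"],
    "nec": ["n", "e", "k", "u"],
}

# Hypothetical substitution rules (correct phoneme -> likely misrecognitions).
SUBSTITUTIONS = {"k": ["g"], "n": ["m"]}


def variants(phonemes):
    """The original phoneme string plus every single-substitution variant."""
    out = [list(phonemes)]
    for i, p in enumerate(phonemes):
        for q in SUBSTITUTIONS.get(p, []):
            out.append(phonemes[:i] + [q] + phonemes[i + 1:])
    return out


def expand_query(morphemes):
    """Build candidate phoneme strings for a query given as a morpheme list.

    Assumed policy: registered morphemes are kept as they are, and only
    unregistered morphemes are expanded into variants.
    """
    per_morpheme = []
    for m in morphemes:
        phonemes = MORPHEME_PHONEMES[m]
        if m in REGISTERED:
            per_morpheme.append([list(phonemes)])
        else:
            per_morpheme.append(variants(phonemes))
    # Concatenate one choice per morpheme in every combination.
    return [sum(combo, []) for combo in product(*per_morpheme)]


if __name__ == "__main__":
    for candidate in expand_query(["nec", "kaigi"]):
        print(" ".join(candidate))
```

Restricting expansion to part of the query keeps the number of candidate strings small, which matters because each candidate is collated against the stored data separately.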

In that speech search device, the search target phoneme/syllable data is generated, as above, from speech data produced by sequentially executing information processing that converts speech input through a speech input device into electronic data. The speech data may instead be speech data (30) stored in advance. The search target phoneme/syllable data is generated from a speech recognition result obtained by performing speech recognition on that speech data, or from a phoneme recognition result obtained by performing phoneme recognition on it. Here, the speech data is generated from a set of spoken utterances. A speech search on data containing recognition errors is performed using a speech search device configured in this way. Also, in that speech search device, the search target phoneme/syllable data is generated based on a language model stored in advance, the language model being information that describes constraints on how words connect. Also, in that speech search device, when the input word is one for which no new phoneme string or new syllable string needs to be generated, the phrase expansion unit (6) outputs the phoneme string or syllable string to the collation unit (4).

In that speech search device, the device has a predetermined storage area that holds phoneme dictionary data or syllable dictionary data for converting the input word. The phrase expansion unit (6) uses the phoneme dictionary data when generating a phoneme string and the syllable dictionary data when generating a syllable string. The phrase expansion unit (6) thereby converts the input word into phonemes or syllables and generates the phoneme string or syllable string from them. A speech search is performed using a speech search device configured in this way.

When the above problem is to be solved by a computer program, a speech search is performed by installing on a given computer, and executing, a program that causes the computer to carry out a method comprising the steps of: converting an input word to generate a phoneme string or syllable string; generating a new phoneme string or new syllable string by adding a phoneme to or removing a phoneme from the phoneme string or syllable string, or by replacing a phoneme constituting the phoneme string or syllable string with another phoneme; reading stored search target phoneme/syllable data; and collating the new phoneme string or new syllable string with the search target phoneme/syllable data.

In that program, a speech search is performed by installing on a given computer, and executing, a program that causes the computer to carry out a method comprising the steps of: reading the search target phoneme/syllable data, which is composed of a plurality of phonemes; and detecting, through the collation, locations in the search target phoneme/syllable data that match the new phoneme string or new syllable string.

In that program, a speech search is performed by installing on a given computer, and executing, a program that causes the computer to carry out a method comprising the steps of: reading an expansion rule (25), which is a rule for generating the new phoneme string or new syllable string; and generating the new phoneme string or new syllable string based on the expansion rule (25).

In that program, a speech search is performed by installing on a given computer, and executing, a program that causes the computer to carry out a method comprising the steps of: generating the search target phoneme/syllable data from speech data produced by sequentially executing information processing that converts speech input through a speech input device into electronic data, or from speech data (30) stored in advance; and setting the expansion rule (25) based on a comparison between the phonemes constituting the search target phoneme/syllable data and the phonemes constituting correct answer data (28), which is the result of correctly performing phoneme recognition on the speech data. The speech search device according to the present invention places no restriction on the format of the data from which the search target phoneme/syllable data is generated.

In that program, a speech search is performed by installing on a given computer, and executing, a program that causes the computer to carry out a method comprising the steps of: generating the search target phoneme/syllable data, as above, from speech data produced from speech input through a speech input device or from speech data (30) stored in advance; and setting the expansion rule (25) based on statistics of the appearance frequencies of the phonemes constituting the search target phoneme/syllable data.

In that program, a speech search is performed by installing on a given computer, and executing, a program that causes the computer to carry out a method comprising the steps of: analyzing the morphemes constituting the input word; and generating the new phoneme string or new syllable string based on the analysis result and on the phoneme string or syllable string.

In that program, a speech search is performed by installing on a given computer, and executing, a program that causes the computer to carry out a method comprising the steps of: determining whether each of the morphemes is registered in advance; and generating the new phoneme string or new syllable string based on the determination result.

In that program, a speech search is performed by installing on a given computer, and executing, a program that causes the computer to carry out a method comprising the steps of: reading speech data (30); and generating the search target phoneme/syllable data from a speech recognition result obtained by performing speech recognition on the speech data, or from a phoneme/syllable recognition result obtained by performing phoneme/syllable recognition on it. The speech data is a set of spoken utterances and, as above, is produced by sequentially converting speech input through a speech input device into electronic data. The speech data may also be stored in advance.

In that program, a speech search is performed by installing on a given computer, and executing, a program that causes the computer to carry out a method comprising the steps of: reading a language model stored in advance, the language model being information that describes constraints on how words connect; and generating the search target phoneme/syllable data based on the language model. Also, in that program, a speech search is performed by installing on a given computer, and executing, a program comprising the steps of: determining whether a new phoneme string or new syllable string needs to be generated for the input word; and, when the determination finds that no new phoneme string or new syllable string needs to be generated for the input word, outputting the phoneme string or syllable string without converting it into a new phoneme string or new syllable string.

In that program, a speech search is performed by installing on a given computer, and executing, a program that causes the computer to carry out a method comprising the steps of: reading phoneme dictionary data or syllable dictionary data, stored in a predetermined storage area, for converting the input word; converting the input word into phonemes or syllables based on the phoneme dictionary data or syllable dictionary data; and generating the phoneme string or syllable string from the converted phonemes or syllables.

Further, as a method for solving the above problem, a speech search on data containing recognition errors is performed using a speech search method comprising the steps of: converting an input word to generate a phoneme string or syllable string; generating a new phoneme string or new syllable string by adding a phoneme to or removing a phoneme from the phoneme string or syllable string, or by replacing a phoneme constituting the phoneme string or syllable string with another phoneme; reading stored search target phoneme/syllable data; and collating the new phoneme string or new syllable string with the search target phoneme/syllable data.

In that speech search method, a speech search on data containing recognition errors is performed using a speech search method comprising the steps of: reading the search target phoneme/syllable data, which is composed of a plurality of phonemes; and detecting, through the collation, locations in the search target phoneme/syllable data that match the new phoneme string or new syllable string.

In that speech search method, a speech search on data containing recognition errors is performed using a speech search method comprising the steps of: reading an expansion rule (25), which is a rule for generating the new phoneme string or new syllable string; and generating the new phoneme string or new syllable string based on the expansion rule (25).

In that speech search method, a speech search on data containing recognition errors is performed using a speech search method comprising the steps of: generating the search target phoneme/syllable data from speech data obtained by converting speech input through a speech input device into electronic data in real time, or from speech data (30) stored in advance; and setting the expansion rule (25) based on a comparison between the phonemes constituting the search target phoneme/syllable data and the phonemes or syllables constituting correct answer data (28), which is the result of correctly recognizing the phonemes and syllables of the speech data. The speech search method according to the present invention places no restriction on the format of the data from which the search target phoneme/syllable data is generated.

In that speech search method, a speech search on data containing recognition errors is performed using a speech search method comprising the steps of: generating the search target phoneme/syllable data from speech data obtained by converting speech input through a speech input device into electronic data in real time, or from speech data (30) stored in advance; and setting the expansion rule (25) based on statistics of the appearance frequencies of the phonemes constituting the search target phoneme/syllable data.

In that speech search method, a speech search on data containing recognition errors is performed using a speech search method comprising the steps of: analyzing the morphemes constituting the input word; and generating the new phoneme string or new syllable string based on the analysis result and on the phoneme string or syllable string.

In that speech search method, a speech search on data containing recognition errors is performed using a speech search method comprising the steps of: determining whether each of the morphemes is registered in advance; and generating the new phoneme string or new syllable string based on the determination result.

In that speech search method, a speech search on data containing recognition errors is performed using a speech search method comprising the steps of: reading speech data obtained by converting speech input through a speech input device into electronic data in real time, or speech data (30) stored in advance; and generating the search target phoneme/syllable data from a speech recognition result obtained by performing speech recognition on the speech data, or from a phoneme recognition result obtained by performing phoneme recognition on it. The speech data is a set of spoken utterances and, as above, is generated from speech data obtained by sequentially converting speech input through a speech input device into electronic data. The speech data may also be stored in advance.

In that speech search method, a speech search on data containing recognition errors is performed using a speech search method comprising the steps of: reading a language model stored in advance, the language model being information that describes constraints on how words connect; and generating the search target phoneme/syllable data based on the language model.

In that speech search method, a speech search on data containing recognition errors is performed using a speech search method comprising the steps of: reading phoneme dictionary data or syllable dictionary data, stored in a predetermined storage area, for converting the input word; converting the input word into phonemes or syllables based on the phoneme dictionary data or syllable dictionary data; and generating the phoneme string or syllable string from the converted phonemes or syllables.

According to the present invention, when a search keyword that is an arbitrary word (or phrase) is used to locate the corresponding portions of the speech data being searched, search results are obtained with a small amount of information processing.

Furthermore, according to the present invention, when a phoneme string is generated from an arbitrary input word (or phrase), the search takes into account the possibility that recognition errors have occurred, so more effective search results are obtained.

The best mode for carrying out the present invention is described below with reference to the drawings.

[Configuration of First Embodiment]
FIG. 2 is a block diagram showing the configuration of the first embodiment for carrying out the present invention. As the block diagram shows, the speech search apparatus of the first embodiment consists of a search keyword input unit 1, a phoneme/syllable processing unit 2, a phoneme/syllable data output unit 3, a matching unit 4, and an output unit 5. The embodiments described below take as an example the case where the phoneme/syllable data to be searched is generated from speech data stored in advance, but this does not limit the state of the speech data in the present invention. For example, a speech search can also be performed in real time by successively running recognition on input speech (such as spoken utterances) to create the speech data.

The search keyword input unit 1 is an information input device that outputs words, phrases, and sentences entered as text or speech (hereinafter referred to as keywords) as data that a computer can process.

The phoneme/syllable processing unit 2 is a data processing function block that converts a search keyword into a phoneme string or a syllable string (hereinafter referred to as a phoneme string or the like). The phoneme/syllable processing unit 2 further includes a phrase expansion unit 6 and a phoneme string conversion unit 7. The phrase expansion unit 6 is a data processing function block that converts the keyword output from the search keyword input unit 1 into a phoneme string or the like. Using a phoneme/syllable dictionary (not shown) provided in the speech search apparatus, the phrase expansion unit 6 determines which phonemes (or syllables) make up the keyword, and converts the keyword by generating a phoneme string or the like from the phonemes (or syllables) so determined. The phoneme string conversion unit 7 is a data processing function block that generates a new phoneme string or the like from an input phoneme string or the like according to predetermined processing rules. The phoneme string conversion unit 7 is described in detail later.
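The dictionary lookup performed by the phrase expansion unit 6 can be pictured with a minimal sketch. The dictionary contents, the function name `keyword_to_phonemes`, and the handling of the long-vowel mark are all assumptions made for illustration; the source does not show the actual phoneme/syllable dictionary.

```python
# Hypothetical toy syllable dictionary covering only the characters used in
# the example keywords; a real phoneme/syllable dictionary would be far larger.
KANA_TO_PHONEMES = {"イ": "i", "ン": "N", "タ": "ta", "ホ": "ho"}
VOWELS = "aiueo"


def keyword_to_phonemes(keyword):
    """Convert a katakana keyword into a phoneme string by dictionary lookup."""
    pieces = []
    for ch in keyword:
        if ch == "ー":
            # Assumed convention: the long-vowel mark repeats the last vowel.
            if not pieces or pieces[-1][-1] not in VOWELS:
                raise ValueError("long-vowel mark without a preceding vowel")
            pieces.append(pieces[-1][-1])
        elif ch in KANA_TO_PHONEMES:
            pieces.append(KANA_TO_PHONEMES[ch])
        else:
            raise KeyError(f"no dictionary entry for {ch!r}")
    return "".join(pieces)
```

With this toy dictionary, two spellings that differ only in where the long-vowel mark falls produce different phoneme strings, which is the kind of variation the phoneme string conversion unit 7 is meant to absorb.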

The phoneme/syllable data output unit 3 is a data output function block that outputs search target phoneme/syllable data on which a speech search can be run. The phoneme/syllable data output unit 3 further includes a phoneme/syllable data generation unit 8 and a phoneme/syllable data storage unit 9. The phoneme/syllable data generation unit 8 is a data generation function block that generates computer-processable speech data from input speech and generates the search target phoneme/syllable data from that speech data. The phoneme/syllable data storage unit 9 is an information storage function block that stores the search target phoneme/syllable data output from the phoneme/syllable data generation unit 8.

The matching unit 4 is a collation function block that matches the phoneme string or the like output from the phoneme/syllable processing unit 2 against the search target phoneme/syllable data output from the phoneme/syllable data output unit 3. When the search target phoneme/syllable data contains a portion that matches the phoneme string or the like, the matching unit 4 extracts information for identifying the matching location (for example, the simple sentence containing the match, or a compound sentence containing that simple sentence). When matching, it is also possible to build indexes from pairs of phonemes or syllables contained in the search target phoneme/syllable data and in the phoneme string or the like, and to compare those indexes so that the search runs more efficiently. The output unit 5 is an information output device that receives the information output from the matching unit 4, converts it into an outputtable data format, and outputs it.
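One way to read the index-based matching described above is as an index over adjacent phoneme pairs. The following is a minimal sketch under that assumption; the function names and the choice of adjacent pairs (bigrams) are illustrative and are not taken from the source.

```python
from collections import defaultdict


def build_bigram_index(utterances):
    """Map each adjacent phoneme pair to the ids of utterances containing it."""
    index = defaultdict(set)
    for uid, phonemes in enumerate(utterances):
        for i in range(len(phonemes) - 1):
            index[tuple(phonemes[i:i + 2])].add(uid)
    return index


def find_sublist(haystack, needle):
    """Return every start offset at which needle occurs in haystack."""
    n = len(needle)
    return [i for i in range(len(haystack) - n + 1) if haystack[i:i + n] == needle]


def search(utterances, index, query):
    """Return (utterance id, start offset) for each exact occurrence of query."""
    bigrams = [tuple(query[i:i + 2]) for i in range(len(query) - 1)]
    if bigrams:
        # Only utterances containing every pair of the query can match.
        candidates = set.intersection(*(index.get(b, set()) for b in bigrams))
    else:
        candidates = set(range(len(utterances)))
    hits = []
    for uid in sorted(candidates):
        for pos in find_sublist(utterances[uid], query):
            hits.append((uid, pos))
    return hits
```

The index only narrows the candidate set; the final comparison is still an exact scan of each candidate, so the index affects speed rather than results.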

FIG. 3 is a block diagram showing in detail the configuration of the phoneme/syllable data generation unit 8 and the phoneme/syllable data storage unit 9 provided in the phoneme/syllable data output unit 3. Referring to FIG. 3, the phoneme/syllable data generation unit 8 includes a speech data storage unit 30, a phoneme/syllable recognition unit 31 connected to the speech data storage unit 30, and an acoustic model storage unit 32 connected to the phoneme/syllable recognition unit 31. The speech data storage unit 30 is an information storage function block that stores data generated by converting previously input speech into a computer-processable format. The phoneme/syllable recognition unit 31 is an information processing function block that generates the search target phoneme/syllable data corresponding to the speech data output from the speech data storage unit 30. The acoustic model storage unit 32 is an information storage function block that stores an acoustic model, that is, information on the acoustic features of the recognition units.

The phoneme/syllable recognition unit 31 performs phoneme/syllable recognition based on the speech data output from the speech data storage unit 30 and the acoustic model output from the acoustic model storage unit 32, and outputs the resulting phoneme/syllable data to the phoneme/syllable data storage unit 9. The phoneme/syllable data storage unit 9 stores the phoneme/syllable data output from the phoneme/syllable data generation unit 8 as search target phoneme/syllable data. In response to a request from the matching unit 4, the phoneme/syllable data output unit 3 outputs the search target phoneme/syllable data stored in the phoneme/syllable data storage unit 9.

FIG. 4 is a block diagram showing in detail the configuration of the phoneme/syllable data generation unit 8 and the phoneme/syllable data storage unit 9 provided in the phoneme/syllable data output unit 3. Referring to FIG. 4, the phoneme/syllable data generation unit 8 consists of a speech data storage unit 30, a speech recognition unit 33, a phoneme/syllable conversion unit 34, an acoustic model storage unit 35, and a language model storage unit 36. The speech data storage unit 30 and the phoneme/syllable data storage unit 9 shown in FIG. 4 are the same as those shown in FIG. 3, so their detailed description is omitted. The speech recognition unit 33 is an information processing function block that performs speech recognition on input speech data. It performs large vocabulary continuous speech recognition using the acoustic model stored in the acoustic model storage unit 35 (information indicating the acoustic features of the recognition units) and the language model stored in the language model storage unit 36 (information indicating connection constraints between the words to be recognized). The phoneme/syllable conversion unit 34 is an information processing function block that generates the search target phoneme/syllable data from the recognition result of the large vocabulary continuous speech recognition performed by the speech recognition unit 33.

The phoneme/syllable data generation unit 8 uses the acoustic model stored in the acoustic model storage unit 32 (or the acoustic model storage unit 35) described above to generate the search target phoneme/syllable data stored in the phoneme/syllable data storage unit 9. Any acoustic model may be selected for storage in the acoustic model storage unit 32 (or the acoustic model storage unit 35). For example, a configuration that normally generates phoneme/syllable data with an HMM (Hidden Markov Model), which is widely used for speech recognition, and uses other acoustic models as needed enables more accurate searching. Likewise, any language model may be selected for storage in the language model storage unit 36. For example, a configuration that normally generates phoneme/syllable data with a word N-gram model, which is widely used for speech recognition, and uses other language models as needed enables more accurate searching.

Configuring the phoneme/syllable data generation unit 8 to include the language model storage unit 36 shown in FIG. 4 makes large vocabulary continuous speech recognition possible. Compared with the output of a phoneme/syllable data generation unit 8 that has no language model, the search target phoneme/syllable data obtained by large vocabulary continuous speech recognition is more accurate in the portions relating to known words (words registered in the language model). By providing both kinds of phoneme/syllable data generation unit 8 and switching between them according to the relative importance of processing speed and phoneme/syllable recognition accuracy, a highly convenient search apparatus can be configured.

FIG. 5 is a block diagram showing in detail the configuration of the phoneme string conversion unit 7. Referring to FIG. 5, the phoneme string conversion unit 7 includes an expansion rule output unit 20 and an expansion execution unit 21, which are connected to each other. The expansion rule output unit 20 is an information output function block that outputs expansion rules, which are the processing rules for the information processing performed by the expansion execution unit 21. The expansion execution unit 21 is an information processing function block that transforms the phoneme string or the like output from the phrase expansion unit 6 according to the expansion rules output from the expansion rule output unit 20.

The expansion rule output unit 20 further includes an expansion rule creation phoneme/syllable data storage unit 22, a statistical processing unit 23, an expansion rule creation unit 24, and an expansion rule storage unit 25. The expansion rule creation phoneme/syllable data storage unit 22 is an information storage function block that stores the phoneme/syllable data used to create expansion rules. The phoneme/syllable data stored there can be changed freely. For example, if the stored data is extracted from the phoneme/syllable data storage unit 9 and the expansion rules are generated from it, expansion rules tailored to specific speech data can be created.

The statistical processing unit 23 is a statistical processing function block that performs statistical processing on the phoneme/syllable data stored in the expansion rule creation phoneme/syllable data storage unit 22. The expansion rule creation unit 24 is an information creation function block that creates expansion rules based on the statistical processing result output from the statistical processing unit 23.

The expansion rule creation process performed by the expansion rule creation unit 24 is described concretely below, taking as an example the case where the statistical processing unit 23 compiles n-gram statistics. The statistical processing unit 23 examines in advance the phoneme n-gram statistics of the phoneme strings contained in the rule creation phoneme/syllable data and orders the expansion rules by phoneme n-gram frequency. It outputs this result to the expansion rule creation unit 24, which sets a predetermined threshold on it. The expansion rule creation unit 24 also uses the phoneme/syllable data stored in the expansion rule creation phoneme/syllable data storage unit 22 to create multiple expansion rules while varying the preceding and following phoneme conditions. It then narrows down the expansion rules based on the created rules and the thresholded statistics. For example, it sets a threshold below which a phoneme n-gram is so infrequent that the rule would almost never apply to a keyword, and deletes the expansion rules at or below that threshold. Conversely, it deletes expansion rules that would cause the new phoneme string generated by the expansion execution unit 21 to contain a high-frequency n-gram context. Processing of this kind yields an efficient and accurate set of expansion rules.
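The two pruning criteria can be sketched as follows. This is only one possible reading: the rule format `(prev, src, next, dst)`, the use of trigrams, and the way each rule is tied to an n-gram count are assumptions for illustration, since the source does not specify them.

```python
from collections import Counter


def ngram_counts(sequences, n=3):
    """Count phoneme n-grams over a collection of phoneme sequences."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - n + 1):
            counts[tuple(seq[i:i + n])] += 1
    return counts


def prune_rules(rules, counts, min_source_count, max_result_count):
    """Keep rules whose source context is frequent enough and whose result
    context is not already a very frequent n-gram."""
    kept = []
    for prev, src, nxt, dst in rules:
        source_count = counts[(prev, src, nxt)]
        result_count = counts[(prev, dst, nxt)]
        if source_count <= min_source_count:
            continue  # context too rare to be worth a rule
        if result_count >= max_result_count:
            continue  # result would coincide with a very common context
        kept.append((prev, src, nxt, dst))
    return kept
```

Dropping rules whose result is a very common context limits the number of spurious matches an expanded keyword can produce, at the cost of missing some genuine recognition errors.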

FIG. 6 is a block diagram showing in detail the configuration of the phoneme string conversion unit 7. Referring to FIG. 6, the phoneme string conversion unit 7 includes an expansion rule output unit 20 and an expansion execution unit 21, which are connected to each other. The expansion rule output unit 20 further includes an expansion rule creation speech data storage unit 26, an expansion rule creation phoneme/syllable data storage unit 27, a correct rule storage unit 28, a statistical processing unit 23, an expansion rule creation unit 24, and an expansion rule storage unit 25. The expansion execution unit 21, the statistical processing unit 23, the expansion rule creation unit 24, and the expansion rule storage unit 25 are the same as those shown in FIG. 5, so their detailed description is omitted.

The expansion rule creation speech data storage unit 26 is an information storage function block that stores the speech data used to create expansion rules. The speech data stored there can be changed freely. For example, if the stored speech data is extracted from the speech data storage unit 30 and the expansion rules are generated from it, expansion rules tailored to specific speech data can be created. The expansion rule creation phoneme/syllable data storage unit 27 is an information storage function block that stores the expansion rule creation phoneme/syllable data created from the speech data stored in the expansion rule creation speech data storage unit 26. This data is either the phoneme/syllable recognition result obtained by running phoneme/syllable recognition on that speech data, or the speech recognition result obtained by running speech recognition on it and then converting the result into a phoneme string or syllable string. The expansion rule creation phoneme/syllable data may contain recognition errors.

The correct rule storage unit 28 is an information storage function block that stores correct data. The correct data is the correct phoneme string (or syllable string) for the expansion rule creation phoneme/syllable data stored in the expansion rule creation phoneme/syllable data storage unit 27. It may be created for all of the expansion rule creation speech data or for only part of it.

The statistical processing unit 23 is an information processing function block that matches the phoneme strings (or syllable strings) of the rule creation phoneme/syllable data against the phoneme strings (or syllable strings) of the correct data. From the matching result, it compiles statistics on what each phoneme string (or syllable string) in the correct data became in the expansion rule creation phoneme/syllable data, and outputs those statistics to the expansion rule creation unit 24.

The expansion rule creation unit 24 is an information generation function block that generates expansion rules based on those statistics. For example, it performs DP (dynamic programming) matching between the phoneme strings of the rule creation phoneme/syllable data and the phoneme strings of the correct data and, for each phoneme taken together with its preceding and following phonemes, examines how often that phoneme in the correct data was substituted by (or deleted from, or inserted into) each phoneme in the rule creation phoneme/syllable data.
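The DP matching step can be illustrated with a standard edit-distance alignment followed by a count of context-dependent substitutions. This is a minimal sketch: unit edit costs, the tie-breaking order in the backtrace, and the tuple layout `(prev, correct, next, recognized)` are assumptions, not details given in the source.

```python
from collections import Counter


def align(ref, hyp):
    """Align two phoneme sequences by edit distance.
    Returns pairs (ref index or None, hyp index or None)."""
    n, m = len(ref), len(hyp)
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            cost[i][j] = min(diag, cost[i - 1][j] + 1, cost[i][j - 1] + 1)
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]):
            pairs.append((i - 1, j - 1))  # match or substitution
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            pairs.append((i - 1, None))   # phoneme dropped by the recognizer
            i -= 1
        else:
            pairs.append((None, j - 1))   # phoneme inserted by the recognizer
            j -= 1
    pairs.reverse()
    return pairs


def substitution_counts(ref, hyp):
    """Count substitutions keyed by (prev, correct, next, recognized)."""
    counts = Counter()
    for ri, hj in align(ref, hyp):
        if ri is None or hj is None or ref[ri] == hyp[hj]:
            continue
        prev = ref[ri - 1] if ri > 0 else None
        nxt = ref[ri + 1] if ri + 1 < len(ref) else None
        counts[(prev, ref[ri], nxt, hyp[hj])] += 1
    return counts
```

Summing these counters over all aligned pairs of correct and recognized strings gives a substitution table; deletions and insertions could be tallied in the same loop from the pairs that contain `None`.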

An example of creating expansion rules based on substitution is described below. Expansion rules for deletions and insertions can be created by the same process. Each row of the following example list, which is an excerpt, gives from left to right: a phoneme in the correct data together with its preceding and following phonemes (in parentheses), the phoneme it became in the expansion rule creation phoneme/syllable data, and the number of times that substitution occurred.
(a) t (o) k 50
(a) t (o) d 40
(a) t (o) p 20
(i) k (a) g 400
(i) k (a) t 100
(e) s (u) z 3
(e) s (u) c 2
For example, "(a) t (o) k 50" indicates that the phoneme t in the correct data, when preceded by a and followed by o, became the phoneme k in the rule creation phoneme/syllable data 50 times.

The set of expansion rules is then assembled according to specific conditions. For example, suppose the condition is "sort by number of occurrences, most frequent first, and adopt those at or above a certain threshold." If the threshold is 50, the expansion rules
(i) k (a) → g, (i) k (a) → t, (a) t (o) → k
are adopted.
Alternatively, suppose the condition is "for each phoneme in the correct data, compute the relative frequency of each substitute phoneme, and adopt those at or above a certain threshold." If the phoneme "t" is extracted from the correct data and (a) t (o) is found to appear 200 times in total, then
(a) t (o) → k: 50/200 = 0.25,
(a) t (o) → d: 40/200 = 0.20,
(a) t (o) → p: 20/200 = 0.10.
With a threshold of 0.20, the expansion rules adopted for the phoneme (a) t (o) are (a) t (o) → k and (a) t (o) → d.
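The two selection conditions can be reproduced with the figures from the example list. The sketch below is illustrative only; the tuple layout and the function names `select_by_count` and `select_by_ratio` are assumptions, and the total of 200 occurrences is the figure assumed in the text.

```python
# Rows from the example list: (prev, correct phoneme, next, recognized, count).
OBSERVED = [
    ("a", "t", "o", "k", 50),
    ("a", "t", "o", "d", 40),
    ("a", "t", "o", "p", 20),
    ("i", "k", "a", "g", 400),
    ("i", "k", "a", "t", 100),
    ("e", "s", "u", "z", 3),
    ("e", "s", "u", "c", 2),
]


def select_by_count(observed, min_count):
    """Sort by count, most frequent first, and keep rows at or above min_count."""
    rows = sorted(observed, key=lambda row: row[4], reverse=True)
    return [row[:4] for row in rows if row[4] >= min_count]


def select_by_ratio(observed, context_totals, min_ratio):
    """Keep rows whose share of the context's total occurrences is at least min_ratio.
    Contexts without a known total are skipped."""
    kept = []
    for prev, src, nxt, dst, count in observed:
        total = context_totals.get((prev, src, nxt))
        if total and count / total >= min_ratio:
            kept.append((prev, src, nxt, dst))
    return kept
```

Calling `select_by_count(OBSERVED, 50)` keeps the three rules named in the text, and `select_by_ratio(OBSERVED, {("a", "t", "o"): 200}, 0.20)` keeps the k and d substitutions for (a) t (o).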
[Operation of First Embodiment]
FIG. 7 is a flowchart showing the operation of the first exemplary embodiment of the present invention. Referring to FIG. 7, the operation of the first embodiment starts when content to be searched is determined. In step S101, a search keyword is input to execute a search for the phoneme string or syllable string in the determined content. The input search keyword is output from the search keyword input unit 1 and input to the phoneme / syllable processing unit 2.

In step S102, the phoneme/syllable processing unit 2 passes the search keyword to the phrase expansion unit 6. The phrase expansion unit 6 converts the search keyword into a phoneme string or a syllable string (hereinafter referred to as a phoneme string or the like, as in the configuration section), and the process proceeds to step S103. In step S103, the phrase expansion unit 6 determines whether the converted phoneme string or the like should be output to the phoneme string conversion unit 7. If it determines that the phoneme string or the like does not need to be processed by the phoneme string conversion unit 7, the process proceeds to step S106. For example, if the input search keyword is a word for which speech recognition makes relatively few errors, the phrase expansion unit 6 outputs the word's phoneme string or the like to the matching unit 4 without passing it to the phoneme string conversion unit 7. This makes it possible to reduce the amount of data processing depending on the search keyword entered.

In step S104, the phoneme string conversion unit 7 receives the phoneme string or the like output from the phrase expansion unit 6 and passes it to the expansion execution unit 21. In response, the expansion execution unit 21 extracts the expansion rules stored in the expansion rule storage unit 25. In step S105, the expansion execution unit 21 applies the extracted expansion rules to generate new phoneme strings or the like corresponding to the search keyword's phoneme string or the like sent from the phrase expansion unit 6.

In step S106, the phoneme string conversion unit 7 outputs the new phoneme strings or the like generated by the expansion execution unit 21 to the matching unit 4. In response to receiving at least one of the search keyword's phoneme string or the like and the new phoneme strings (or syllable strings), the matching unit 4 requests the search target phoneme/syllable data for the content being searched from the phoneme/syllable data output unit 3 (step S107). In response to the request, the phoneme/syllable data output unit 3 extracts the corresponding search target phoneme/syllable data from the phoneme/syllable data storage unit 9 and outputs it to the matching unit 4.

In step S108, the matching unit 4 matches the phoneme string or the like output from the phoneme/syllable processing unit 2 against the search target phoneme/syllable data output from the phoneme/syllable data output unit 3, and outputs the result to the output unit 5.
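The flow of steps S102 to S108 can be summarized in a short sketch. Every name here is hypothetical: the conversion, the decision of step S103, and the expansion are passed in as plain functions, and matching is reduced to exact substring search. Searching with the unexpanded phoneme string alongside the expanded ones is also an assumption.

```python
def run_search(keyword, to_phonemes, needs_expansion, expand, target_data):
    """Return (utterance id, matched query, offset) for each hit."""
    base = to_phonemes(keyword)                  # S102: keyword to phoneme string
    if needs_expansion(base):                    # S103: is expansion required?
        queries = {base} | set(expand(base))     # S104-S105: apply expansion rules
    else:
        queries = {base}                         # skip straight to matching
    hits = []
    for uid, phonemes in enumerate(target_data):  # S107: fetch search target data
        for query in sorted(queries):             # S108: match each query
            pos = phonemes.find(query)
            if pos != -1:
                hits.append((uid, query, pos))
    return hits
```

Skipping expansion in step S103 keeps the query set at a single string, which is where the reduction in processing for reliably recognized keywords comes from.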

As a concrete example of expansion rules, consider rules that insert a duplicate vowel so that variation between short and long vowels can be handled, namely
"a" → "aa", "i" → "ii", "u" → "uu", "e" → "ee", "o" → "oo"
together with expansion rules that delete a duplicate vowel, namely
"aa" → "a", "ii" → "i", "uu" → "u", "ee" → "e", "oo" → "o"
The generation of new phoneme strings or the like is described below for the case where these rules are set.

When "インタホーン" (intahōn) is entered as the search keyword, applying the above expansion rules to its phoneme string "iNtahooN" produces expanded phoneme strings such as "iNtaahooN", "iNtaahoN", and "iNtahoN". If the search target is a phoneme recognition result and the recognition result corresponding to "インタホーン" is stored in the phoneme/syllable data storage unit 9 as "iNtaahooN", the search can be performed with the expanded phoneme string "iNtaahooN". Likewise, suppose the search target is a large vocabulary continuous speech recognition result converted to phoneme strings, and the recognition dictionary contains only "インターホン" (intāhon) rather than "インタホーン", so that "インターホン" becomes the recognition result and its phoneme string is "iNtaahoN". Even then, the search can be performed with the expanded phoneme string "iNtaahoN".
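The vowel-length expansion in this example can be sketched by letting every short or long vowel independently take either length. This is an illustrative simplification: it treats each phoneme as a single character and caps vowel runs at two, neither of which is stated in the source, and the function name is hypothetical.

```python
from itertools import groupby, product

VOWELS = set("aiueo")


def vowel_length_variants(phonemes):
    """Return every string obtained by making each vowel run short or long."""
    options = []
    for symbol, run in groupby(phonemes):
        run = "".join(run)
        if symbol in VOWELS and len(run) <= 2:
            options.append([symbol, symbol * 2])  # short or long form
        else:
            options.append([run])                 # consonants are left as they are
    return {"".join(parts) for parts in product(*options)}
```

For "iNtahooN" this yields eight variants, among them the three expanded strings named in the text. The number of variants doubles with each vowel, which is one reason to prune rarely useful rules before expansion.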

As another set of expansion rules, consider the case where rules describing which phonemes are easily mistaken for which are set in advance, such as the consonant substitutions "t" → "d", "k" → "g", "d" → "r", and "sh" → "j", insertion of "w" or "y", and deletion of the vowels "a", "i", "u", "e", and "o". The generation of new phoneme strings (or syllable strings) in this case is described below.

For example, suppose that the user wants to search for 「礼文島」 (Rebun Island), and that in the phoneme recognition result to be searched, 「礼文島」 is stored in the phoneme/syllable data storage unit 9 as "debuNto".

In this case, the search keyword 「礼文島」 is first converted into the phoneme string "rebuNtoo", and phoneme expansion is performed on this phoneme string. Here, the "d" → "r" substitution rule and the deletion of "o" yield the expanded phoneme string "debuNto", with which the desired portion of the content can be retrieved.

As another example, suppose that the user wants to search for 「小泉首相」 (Prime Minister Koizumi), and that in the phoneme string to be searched, 「小泉首相」 appears as "koizumijushoo", obtained by converting the speech recognition result 「小泉受賞」 (Koizumi award) into a phoneme string. In this case, the search keyword 「小泉首相」 is first converted into the phoneme string "koizumishushoo", and phoneme expansion is performed on this phoneme string. The expanded phoneme string "koizumijushoo", produced by the "sh" → "j" substitution, makes it possible to retrieve the desired portion of the content.
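The two worked examples above can be reproduced with a small sketch of confusion-based expansion. The names `CONFUSABLE`, `one_edit`, and `expand` are hypothetical. Two simplifications are assumptions rather than statements from the text: each confusable pair is applied in both directions (the "rebuNtoo" example needs "r" to become "d"), and the insertion of "w" and "y" is omitted for brevity.

```python
# Confusable consonant pairs listed in the text; treating them as
# bidirectional is an assumption made for this sketch.
CONFUSABLE = [("t", "d"), ("k", "g"), ("d", "r"), ("sh", "j")]
VOWELS = "aiueo"


def one_edit(phonemes: str) -> set[str]:
    """All strings reachable by one substitution or one vowel deletion."""
    out = set()
    pairs = CONFUSABLE + [(b, a) for a, b in CONFUSABLE]
    for src, dst in pairs:
        start = phonemes.find(src)
        while start != -1:
            out.add(phonemes[:start] + dst + phonemes[start + len(src):])
            start = phonemes.find(src, start + 1)
    for i, symbol in enumerate(phonemes):
        if symbol in VOWELS:
            out.add(phonemes[:i] + phonemes[i + 1:])
    out.discard(phonemes)
    return out


def expand(phonemes: str, max_edits: int = 1) -> set[str]:
    """All strings reachable with at most `max_edits` edits."""
    seen = {phonemes}
    frontier = {phonemes}
    for _ in range(max_edits):
        frontier = {v for f in frontier for v in one_edit(f)} - seen
        seen |= frontier
    return seen - {phonemes}
```

Under these assumptions, "debuNto" is reachable from "rebuNtoo" with two edits, and "koizumijushoo" from "koizumishushoo" with one. The candidate set grows quickly with `max_edits`, which is one motivation for the expansion adjustment described in the second embodiment.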

By performing voice search with this configuration and operation, an effective search can be executed even when the content to be searched contains recognition errors, without increasing the processing load.

[Configuration of Second Embodiment]
FIG. 8 is a block diagram showing the configuration of the second embodiment of the present invention. The second embodiment includes an expansion adjustment unit that determines which rule is applied when executing a search in the case where a plurality of expansion rules exist. Referring to FIG. 8, the second embodiment includes a phoneme/syllable number counting unit 40 connected to the phrase expansion unit 6, and an expansion adjustment unit 41 connected to each of the phoneme/syllable number counting unit 40 and the phoneme string conversion unit 7, and adjusts the expansion according to the number of phonemes or syllables in the keyword.

The phoneme/syllable number counting unit 40 is a counter that counts the number of phonemes (or syllables) contained in the keyword's phoneme string or the like output by the phoneme/syllable conversion means 2. The expansion adjustment unit 41 is an information processing function block that adjusts the expansion rules used by the phoneme string conversion unit 7, based on the keyword's phoneme string or the like output by the phrase expansion unit 6 and the counted number of phonemes (or syllables) output by the phoneme/syllable number counting unit 40. The expansion adjustment unit 41 performs this adjustment by adding restriction rules or extension rules to the expansion rules used by the phoneme string conversion unit 7, based on the number of phonemes (or syllables) output from the phoneme/syllable number counting unit 40 and a preset threshold.

The specific operation of applying a restriction rule is described below. In the following example, the expansion adjustment unit 41 stores a threshold to be applied to the number of phonemes (or syllables) counted by the phoneme/syllable number counting unit 40, and the restriction rule is "do not perform expansion" when the number of phonemes in the search keyword is smaller than the threshold. In general, the shorter the search keyword, the more false detections occur, so providing an expansion adjustment unit 41 that can impose such a restriction enables more effective voice search. The restriction rule is not limited to "do not perform expansion" and can be changed arbitrarily. For example, restriction rules such as "do not delete phonemes" or "keep the number of expanded phoneme/syllable strings below a certain threshold" can also be used. A plurality of such restriction rules can also be stored and used in combination.

In addition, the expansion adjustment unit 41 can store a threshold to be applied to the number of phonemes (or syllables) counted by the phoneme/syllable number counting unit 40 and, when the number of phonemes in the search keyword is larger than that threshold, use the extension rule "perform insertion, substitution, and deletion at multiple locations in the keyword", which enables effective voice search.
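The restriction and extension rules described above amount to choosing an expansion policy from the keyword length. The following sketch illustrates that decision only; the names `ExpansionPolicy` and `adjust_expansion` and the threshold values 5 and 10 are hypothetical, since the text does not give concrete thresholds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExpansionPolicy:
    expand: bool          # whether any expansion is performed
    max_edit_sites: int   # how many locations in the keyword may be edited


def adjust_expansion(n_phonemes: int, lower: int = 5, upper: int = 10) -> ExpansionPolicy:
    """Pick a policy from the keyword's phoneme count.

    `lower` and `upper` are illustrative thresholds, not values from the text.
    """
    if n_phonemes < lower:
        # Restriction rule: short keywords cause false detections, so do not expand.
        return ExpansionPolicy(expand=False, max_edit_sites=0)
    if n_phonemes > upper:
        # Extension rule: long keywords may be edited at multiple locations.
        return ExpansionPolicy(expand=True, max_edit_sites=2)
    return ExpansionPolicy(expand=True, max_edit_sites=1)
```

A caller would pass the count produced by the phoneme/syllable number counting step and use `max_edit_sites` to bound the expansion performed afterwards.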

FIG. 9 is a block diagram showing another configuration of the second embodiment of the present invention. Referring to FIG. 9, this configuration includes a context investigation unit 43 connected to the phrase expansion unit 6, and an expansion adjustment unit 44 connected to each of the context investigation unit 43 and the phoneme string conversion unit 7. The expansion adjustment unit 44 shown in FIG. 9 adjusts the expansion according to the phoneme/syllable contexts contained in the keyword.

The context investigation unit 43 is an information processing function block that examines the contexts in the search keyword's phoneme string or the like output by the phrase expansion unit 6. The expansion adjustment unit 44 is an information processing function block that adjusts the expansion rules used by the phoneme string conversion unit 7, based on the phoneme (or syllable) contexts output by the context investigation unit 43. The expansion adjustment unit 44 decides, according to the phoneme/syllable contexts contained in the search keyword, whether or not to expand each context portion.

The expansion adjustment unit 44 stores in advance statistical information, obtained from a specific set of sentences, on the phoneme/syllable contexts contained in known words and in unknown words. The expansion adjustment unit 44 analyzes the phoneme/syllable contexts in the search keyword output from the context investigation unit 43; context portions that occur frequently in unknown words are expanded, and context portions that occur frequently in known words are not expanded.

This enables appropriate voice search when a speech recognition result converted into phonemes is used as the search target. A known word, that is, a word registered in the language model used for speech recognition, is likely to appear correctly in the phoneme string to be searched, so skipping expansion in that case reduces the processing load. An unknown word, that is, a word not registered in the language model, is likely to be misrecognized, and the misrecognition often causes the phoneme string to deviate considerably from the original one. Performing expansion before the voice search therefore makes the search effective. Furthermore, by examining in advance the frequency of contexts in a specific set of sentences, and not performing expansions that would produce phoneme strings containing contexts that cause many false detections, an increase in false detections due to expanded phoneme strings can also be prevented.
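The context-based adjustment can be illustrated with phoneme bigrams standing in for "contexts". The text does not specify the context unit or the comparison criterion, so the bigram choice, the relative-frequency comparison, the helper names, and the toy word lists below are all assumptions made for this sketch.

```python
from collections import Counter


def bigrams(phonemes: str) -> list[str]:
    """Adjacent phoneme pairs, treating each character as one phoneme symbol."""
    return [phonemes[i:i + 2] for i in range(len(phonemes) - 1)]


def context_stats(words: list[str]) -> dict[str, float]:
    """Relative frequency of each bigram over a list of phoneme strings."""
    counts = Counter()
    for word in words:
        counts.update(bigrams(word))
    total = sum(counts.values())
    return {gram: n / total for gram, n in counts.items()} if total else {}


def expandable_positions(keyword: str,
                         known: dict[str, float],
                         unknown: dict[str, float]) -> list[int]:
    """Indices of keyword bigrams that are more typical of unknown words."""
    return [i for i, gram in enumerate(bigrams(keyword))
            if unknown.get(gram, 0.0) > known.get(gram, 0.0)]


# Toy statistics for illustration only; real statistics would come from a corpus.
known_stats = context_stats(["iNfure", "keezai"])
unknown_stats = context_stats(["taagetyiNgu"])
```

Only the positions returned by `expandable_positions` would then be passed to the expansion step, while the remaining positions are left unexpanded.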

FIG. 10 is a block diagram showing another configuration of the second embodiment of the present invention. Referring to FIG. 10, this configuration includes a map 45, an expansion rule set storage unit 46, an expansion adjustment unit 47, and a context investigation unit 48. The expansion adjustment unit 47 is connected to the phrase expansion unit 6, the phoneme string conversion unit 7, the map 45, the expansion rule set storage unit 46, and the context investigation unit 48, and the context investigation unit 48 is connected to the phoneme/syllable data storage unit 9. The following description takes as an example the case where the nature of the content can be identified from the distribution of phoneme/syllable n-grams created from that content.

The expansion adjustment unit 47 is an information processing function block that selects, from a plurality of expansion rules, the expansion rules to be applied by the phoneme string conversion unit 7, based on the map 45 and the expansion rule set storage unit 46. The map 45 is an information storage function block that stores a map indicating which expansion rule set is effective for which kind of content to be searched. The expansion rule set storage unit 46 is an information storage function block that stores a plurality of expansion rules (46_1, 46_2, ..., 46_n) in advance. The context investigation unit 48 is an information processing function block that examines the contexts in the phoneme strings or the like output by the phoneme/syllable data storage unit 9.

The context investigation unit 48 examines the phoneme/syllable n-grams of the phoneme/syllable data storage unit 9 to be searched, an expansion rule set suited to the search target is selected, and the selected expansion rules are applied to the keyword's phoneme/syllable string. As a result, even when the search target changes, an expansion rule set suited to the search target can be selected automatically from the plurality of expansion rules prepared in advance. When the expansion rules are applied, the expansion pattern may be further adjusted according to the number of phonemes or syllables in the keyword or the phoneme/syllable context, as described above.
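One way to picture the selection of a rule set from n-gram statistics is to compare the n-gram profile of the search target with reference profiles held in the map. The text does not say how the map 45 is represented or how profiles are compared, so the dictionary layout, the cosine similarity measure, and the names below are assumptions for illustration.

```python
import math
from collections import Counter


def ngram_profile(phonemes: str, n: int = 2) -> dict[str, float]:
    """Relative frequency of each phoneme n-gram in the given string."""
    counts = Counter(phonemes[i:i + n] for i in range(len(phonemes) - n + 1))
    total = sum(counts.values())
    return {gram: c / total for gram, c in counts.items()} if total else {}


def cosine(p: dict[str, float], q: dict[str, float]) -> float:
    """Cosine similarity between two sparse profiles."""
    dot = sum(value * q.get(gram, 0.0) for gram, value in p.items())
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0


def select_rule_set(target_data: str, content_map: dict[str, dict[str, float]]) -> str:
    """Return the name of the rule set whose reference profile is closest."""
    profile = ngram_profile(target_data)
    return max(content_map, key=lambda name: cosine(profile, content_map[name]))


# Hypothetical map: rule-set name -> reference n-gram profile of typical content.
content_map = {
    "rules_news": ngram_profile("koizumishushoo"),
    "rules_travel": ngram_profile("rebuNtoo"),
}
chosen = select_rule_set("debuNto", content_map)
```

In this toy setup the stored data "debuNto" shares several bigrams with the travel profile and none with the news profile, so the travel rule set is chosen.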

[Operation of Second Embodiment]
FIG. 11 is a flowchart showing an example of the operation of the second embodiment. Referring to FIG. 11, the operation of the second embodiment starts when the content to be searched is determined. In step S101, a search keyword is input in order to search the phoneme strings (or syllable strings) in the determined content. The input search keyword is output from the search keyword input unit 1 and input to the phoneme/syllable processing unit 2.

In step S102, the phoneme/syllable processing unit 2 inputs the search keyword to the phrase expansion unit 6. The phrase expansion unit 6 converts the search keyword into a phoneme string or the like, and the process proceeds to step S103. In step S103, the phrase expansion unit 6 determines whether the converted phoneme string should be output to the phoneme string conversion unit 7. If it is determined that the phoneme string or the like does not need to be processed by the phoneme string conversion unit 7, the process proceeds to step S106. For example, if the input search keyword is a word that is relatively rarely misrecognized in speech recognition, the phrase expansion unit 6 outputs the phoneme string or the like of that word to the matching unit 4 without outputting it to the phoneme string conversion unit 7. If it is determined in step S103 that the phoneme string or the like requires processing by the phoneme string conversion unit 7, the process proceeds to step S201.

In step S201, expansion adjustment corresponding to the search keyword converted into a phoneme string or the like is executed. The expansion adjustment unit (41, 44, 47) outputs the processing result of step S201 to the phoneme string conversion unit 7. In step S104, the phoneme string conversion unit 7 extracts expansion rules based on the processing result output from the expansion adjustment unit (41, 44, 47). The subsequent processing is the same as in the first embodiment.

This makes it possible to apply expansion patterns suited to the keyword and the search target, for example by restricting or extending the expansion rules or by using a variety of expansion rules, so that more accurate and more efficient searches can be performed.

[Configuration of Third Embodiment]
FIG. 12 is a block diagram showing the configuration of the third embodiment of the present invention. Referring to FIG. 12, the third embodiment includes a morpheme analysis unit 50 and a known word/unknown word determination unit 51. The morpheme analysis unit 50 is an information processing function block that receives the search keyword output from the search keyword input unit 1, divides it into morphemes, and generates a morpheme string. The morpheme analysis unit 50 is connected to the known word/unknown word determination unit 51 and outputs the generated morpheme string to it. The known word/unknown word determination unit 51 is an information processing function block that receives the morpheme string output from the morpheme analysis unit 50 and determines whether each morpheme is a known word or an unknown word. The known word/unknown word determination unit 51 checks the input morpheme string against the language model used when creating the phoneme/syllable data stored in the phoneme/syllable data storage unit 9, determines whether each morpheme is a known word or an unknown word, and outputs the determination result for each morpheme to the phrase expansion unit 6.

[Operation of Third Embodiment]
FIG. 13 is a flowchart showing an example of the operation of the third embodiment of the present invention. Referring to FIG. 13, the operation of the third embodiment starts when the content to be searched is determined. In step S101, a search keyword is input in order to search the phoneme strings (or syllable strings) in the determined content. The input search keyword is output from the search keyword input unit 1 and input to the morpheme analysis unit 50.

In step S301, the morpheme analysis unit 50 divides the input search keyword into morphemes and generates a morpheme string. The generated morpheme string is output to the known word/unknown word determination unit 51. In step S302, the known word/unknown word determination unit 51 checks the input morpheme string against the language model used when creating the search target phoneme/syllable data stored in the phoneme/syllable data storage unit 9, determines whether each morpheme is a known word or an unknown word, and outputs the determination result for each morpheme together with the search keyword to the phrase expansion unit 6.

In step S102, the phrase expansion unit 6 receives the morpheme-delimited keyword output by the known word/unknown word determination unit 51 and the known/unknown determination result for each morpheme, converts each morpheme into a phoneme string or syllable string, and outputs the morpheme-delimited phoneme or syllable strings to the phoneme string conversion unit 7 together with the known/unknown determination result for each morpheme. The subsequent operation is the same as in the first embodiment or the second embodiment.

This makes it possible to specify the expansion rules to be applied according to the known words and unknown words contained in the keyword, so that searches can be performed accurately and efficiently. Furthermore, in combination with the second embodiment, the expansion pattern and the number of expansions can also be adjusted according to the known words and unknown words contained in the keyword, so that searches can be performed even more accurately and efficiently.

For example, suppose that 「インフレターゲティング」 (inflation targeting) is input as the keyword. Morphological analysis of this keyword yields 「インフレ」 (inflation) and 「ターゲティング」 (targeting). The known word/unknown word determination unit 51 checks each morpheme against the language model used for speech recognition and determines whether it is a known word or an unknown word. Suppose that 「インフレ」 is a known word and 「ターゲティング」 is an unknown word. A known word is likely to appear correctly in the recognition result, whereas for an unknown word the phoneme string in the search target is likely to deviate from the keyword's phoneme string. The expansion adjustment unit therefore uses different expansion patterns for known and unknown words: the phoneme string "iNfure" of the known word 「インフレ」 is expanded only slightly, and the phoneme string "taagetyiNgu" of the unknown word 「ターゲティング」 is expanded extensively. As a result, a highly accurate search can be performed while reducing the processing load.
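The per-morpheme decision in this example can be sketched as a vocabulary lookup followed by the assignment of an expansion budget. The function name `plan_expansion`, the budget values, and the representation of the language model as a plain set of registered words are assumptions for illustration; morphological analysis itself is taken as already done.

```python
def plan_expansion(morphemes: list[tuple[str, str]],
                   vocabulary: set[str],
                   known_budget: int = 1,
                   unknown_budget: int = 3) -> list[tuple[str, bool, int]]:
    """For each (surface, phonemes) pair, decide how much to expand.

    Returns (phonemes, is_known, budget) triples. The budgets are
    illustrative values, not figures from the text.
    """
    plan = []
    for surface, phonemes in morphemes:
        is_known = surface in vocabulary
        plan.append((phonemes, is_known, known_budget if is_known else unknown_budget))
    return plan


# Output of a hypothetical morphological analysis of the example keyword.
keyword_morphemes = [("インフレ", "iNfure"), ("ターゲティング", "taagetyiNgu")]
# Hypothetical vocabulary of the language model used for recognition.
language_model_words = {"インフレ", "経済"}

plan = plan_expansion(keyword_morphemes, language_model_words)
```

Here the known word receives the small budget and the unknown word the large one, mirroring the split described in the text.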

FIG. 1 is a block diagram showing the configuration of a conventional voice search device.
FIG. 2 is a block diagram showing the configuration of the first embodiment.
FIG. 3 is a block diagram showing details of the configuration of the phoneme/syllable data output unit 3.
FIG. 4 is a block diagram showing in detail another configuration of the phoneme/syllable data output unit 3.
FIG. 5 is a block diagram showing in detail the configuration of the phoneme string conversion unit 7.
FIG. 6 is a block diagram showing in detail another configuration of the phoneme string conversion unit 7.
FIG. 7 is a flowchart showing the operation of the first embodiment.
FIG. 8 is a block diagram showing the configuration of the second embodiment.
FIG. 9 is a block diagram showing another configuration of the second embodiment.
FIG. 10 is a flowchart showing the operation of the second embodiment.
FIG. 11 is a block diagram showing the configuration of the third embodiment.
FIG. 12 is a block diagram showing another configuration of the third embodiment.
FIG. 13 is a flowchart showing the operation of the third embodiment.

Explanation of symbols

1 ... Search keyword input unit
2 ... Phoneme/syllable processing unit
3 ... Phoneme/syllable data output unit
4 ... Matching unit
5 ... Output unit
6 ... Phrase expansion unit
7 ... Phoneme string conversion unit
8 ... Phoneme/syllable data generation unit
9 ... Phoneme/syllable data storage unit
20 ... Expansion rule output unit
21 ... Expansion execution unit
22 ... Phoneme/syllable data storage unit for expansion rule creation
23 ... Statistical processing unit
24 ... Expansion rule creation unit
25 ... Expansion rule storage unit
26 ... Voice data storage unit for expansion rule creation
27 ... Phoneme/syllable data storage unit for expansion rule creation
28 ... Correct rule storage unit
30 ... Voice data storage unit
31 ... Phoneme/syllable recognition unit
32 ... Acoustic model storage unit
33 ... Speech recognition unit
34 ... Phoneme/syllable conversion unit
35 ... Acoustic model storage unit
36 ... Language model storage unit
40 ... Phoneme/syllable number counting unit
41 ... Expansion adjustment unit
43 ... Context investigation unit
44 ... Expansion adjustment unit
45 ... Map
46 ... Expansion rule set storage unit
46_1, 46_2 to 46_n ... Expansion rules
47 ... Expansion adjustment unit
48 ... Context investigation unit
50 ... Morpheme analysis unit
51 ... Known word/unknown word determination unit

Claims

A voice search device comprising:
a phrase expansion unit that converts an input word to generate a phoneme string or a syllable string;
a phoneme string conversion unit that generates a new phoneme string or a new syllable string by adding a new phoneme to, or removing a phoneme from, the phoneme string or the syllable string, or by replacing a phoneme constituting the phoneme string or the syllable string with another phoneme;
a phoneme/syllable data storage unit that stores search target phoneme/syllable data; and
a collation unit that collates the phoneme string or the syllable string with the search target phoneme/syllable data, and collates the new phoneme string or the new syllable string with the search target phoneme/syllable data.

The voice search device according to claim 1, wherein
the search target phoneme/syllable data is composed of a plurality of phonemes, and
the collation unit detects, from the search target phoneme/syllable data, a part that matches the new phoneme string or the new syllable string by the collation.

The voice search device according to claim 1 or 2, wherein
the phoneme string conversion unit stores an expansion rule, which is a rule for generating the new phoneme string or the new syllable string, and generates the new phoneme string or the new syllable string based on the expansion rule.

The voice search device according to claim 3, wherein
the search target phoneme/syllable data is generated based on input voice data, and
the expansion rule is set based on a comparison between a plurality of phonemes constituting the search target phoneme/syllable data and the phonemes constituting correct data, which is the result of correct phoneme recognition of the voice data.

The voice search device according to claim 3, wherein
the search target phoneme/syllable data is generated based on input voice data, and
the expansion rule is set based on statistics of the appearance frequencies of a plurality of phonemes constituting the search target phoneme/syllable data.

The voice search device according to any one of claims 1 to 5, wherein
the phrase expansion unit includes morpheme analysis means for analyzing the morphemes constituting the input word, and
the phoneme string conversion unit generates the new phoneme string or the new syllable string based on the analysis result output from the morpheme analysis means and on the phoneme string or the syllable string.

The voice search device according to claim 6, wherein
the phrase expansion unit includes a registered word determination unit,
the registered word determination unit determines whether each morpheme resulting from the analysis by the morpheme analysis means is registered in advance, and
the phoneme string conversion unit generates the new phoneme string or the new syllable string based on the determination result.

The voice search device according to any one of claims 1 to 7, wherein
the search target phoneme/syllable data is generated based on a speech recognition result obtained by speech recognition of input voice data, or a phoneme recognition result obtained by phoneme recognition of the voice data, and
the voice data is a set of uttered speech.

The voice search device according to claim 8, wherein
the search target phoneme/syllable data is generated based on a language model stored in advance, and
the language model is information describing connection constraints between words.

The voice search device according to any one of claims 1 to 9, wherein
the phrase expansion unit outputs the phoneme string or the syllable string to the collation unit when the input word is a word for which the new phoneme string or the new syllable string does not need to be generated.

The voice search device according to any one of claims 1 to 10, further comprising
phoneme dictionary data or syllable dictionary data for converting the input word, wherein
the phrase expansion unit converts the input word into phonemes or syllables based on the phoneme dictionary data or the syllable dictionary data, and
generates the phoneme string or the syllable string based on the converted phonemes or syllables.

A computer-executable program comprising:
a step of converting an input word to generate a phoneme string or a syllable string;
a step of generating a new phoneme string or a new syllable string by adding a new phoneme to, or removing a phoneme from, the phoneme string or the syllable string, or by replacing a phoneme constituting the phoneme string or the syllable string with another phoneme;
a step of reading stored search target phoneme/syllable data; and
a step of collating the phoneme string or the syllable string with the search target phoneme/syllable data, and collating the new phoneme string or the new syllable string with the search target phoneme/syllable data.

The program according to claim 12, comprising:
a step of reading the search target phoneme/syllable data composed of a plurality of phonemes; and
a step of detecting, from the search target phoneme/syllable data, a part that matches the new phoneme string or the new syllable string by the collation.

The program according to claim 12 or 13, comprising:
a step of reading an expansion rule, the expansion rule being a rule for generating the new phoneme string or the new syllable string; and
a step of generating the new phoneme string or the new syllable string based on the expansion rule.

The program according to claim 14, comprising:
a step of generating the search target phoneme/syllable data based on input voice data; and
a step of setting the expansion rule based on a comparison between a plurality of phonemes constituting the search target phoneme/syllable data and the phonemes constituting correct data, which is the result of correct phoneme recognition of the voice data.

The program according to claim 14, comprising:
a step of generating the search target phoneme/syllable data based on input voice data; and
a step of setting the expansion rule based on statistics of the appearance frequencies of a plurality of phonemes constituting the search target phoneme/syllable data.

The program according to any one of claims 12 to 16, comprising:
a step of analyzing the morphemes constituting the input word; and
a step of generating the new phoneme string or the new syllable string based on the analysis result and on the phoneme string or the syllable string.

The program according to claim 17, comprising:
a step of determining whether each of the morphemes is registered in advance; and
a step of generating the new phoneme string or the new syllable string based on the determination result.

The program according to any one of claims 12 to 18,
Reading the input voice data, the voice data being a set of spoken utterances;
A computer-executable program comprising a step of generating the search target phoneme / syllable data based on a voice recognition result obtained by voice recognition of the voice data or a phoneme recognition result obtained by phoneme recognition of the voice data.

The program according to claim 19, wherein
Reading a language model stored in advance, the language model being information describing connection constraints between words;
A computer-executable program comprising the step of generating the search target phoneme / syllable data based on the language model.

The program according to any one of claims 12 to 20,
Determining whether the new phoneme string or the new syllable string needs to be generated for the input word;
A computer-executable program comprising a step of outputting the phoneme string or the syllable string when the determination finds that the input word does not require generation of the new phoneme string or the new syllable string.

The program according to any one of claims 12 to 21,
Reading phoneme dictionary data or syllable dictionary data for converting input words;
Converting the input word into phonemes or syllables based on the phoneme dictionary data or the syllable dictionary data;
A computer-executable program comprising a step of generating the phoneme string or the syllable string based on the converted phoneme or syllable.
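The dictionary-based conversion can be illustrated with a small sketch. It assumes a syllable dictionary keyed by single katakana characters; the dictionary contents and the name `to_phonemes` are hypothetical, and a real dictionary would also need entries for multi-character syllables.

```python
from typing import Dict, List

# Hypothetical syllable dictionary: each kana maps to its phonemes.
SYLLABLE_DICT: Dict[str, List[str]] = {
    "カ": ["k", "a"],
    "メ": ["m", "e"],
    "ラ": ["r", "a"],
}


def to_phonemes(word: str, dictionary: Dict[str, List[str]]) -> List[str]:
    """Convert a word to a phoneme string by looking up each character."""
    phonemes: List[str] = []
    for ch in word:
        if ch not in dictionary:
            raise KeyError(f"no dictionary entry for {ch!r}")
        phonemes.extend(dictionary[ch])
    return phonemes


print(to_phonemes("カメラ", SYLLABLE_DICT))  # prints ['k', 'a', 'm', 'e', 'r', 'a']
```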

Converting input words to generate phoneme sequences or syllable sequences;
Generating a new phoneme string or a new syllable string by adding a new phoneme to, or removing a phoneme from, the phoneme string or the syllable string, or by replacing a phoneme constituting the phoneme string or the syllable string with another phoneme;
Reading stored search target phoneme / syllable data;
A voice search method comprising a step of collating the phoneme string or the syllable string with the search target phoneme / syllable data, and collating the new phoneme string or the new syllable string with the search target phoneme / syllable data.

The voice search method according to claim 23, wherein
Reading the search target phoneme / syllable data composed of a plurality of phonemes;
A speech search method comprising a step of detecting, from the search target phoneme / syllable data, a part that matches the new phoneme string or the new syllable string by the collation.

The voice search method according to claim 23 or 24,
Reading an expansion rule, the expansion rule being a rule for generating the new phoneme string or the new syllable string;
A speech search method comprising the step of generating the new phoneme string or the new syllable string based on the expansion rule.

The voice search method according to claim 25, wherein
Generating the search target phoneme / syllable data based on the input voice data;
A speech search method comprising a step of setting the expansion rule based on a comparison between the plurality of phonemes constituting the search target phoneme / syllable data and the phonemes constituting correct data, the correct data being the result of correct phoneme recognition of the speech data.

The voice search method according to claim 25, wherein
Generating the search target phoneme / syllable data based on the input voice data;
A speech search method comprising a step of setting the expansion rule based on statistics of appearance frequencies of a plurality of phonemes constituting the search target phoneme / syllable data.

The voice search method according to any one of claims 23 to 27,
Analyzing the morphemes that make up the input word;
A speech search method comprising a step of generating the new phoneme string or the new syllable string based on the analysis result and the phoneme string or the syllable string.

The voice search method according to claim 28, wherein
Determining whether each of the morphemes is pre-registered;
A speech search method comprising a step of generating the new phoneme string or the new syllable string based on the determination result.

The voice search method according to any one of claims 23 to 29,
Reading the input voice data, the voice data being a set of spoken utterances;
A speech search method comprising a step of generating the search target phoneme / syllable data based on a speech recognition result obtained by speech recognition of the speech data or a phoneme recognition result obtained by phoneme recognition of the speech data.

The voice search method according to claim 30, wherein
Reading a language model stored in advance, the language model being information describing connection constraints between words;
A speech search method comprising: generating the search target phoneme / syllable data based on the language model.

The voice search method according to any one of claims 23 to 31,
Determining whether the new phoneme string or the new syllable string needs to be generated for the input word;
A speech search method comprising a step of outputting the phoneme string or the syllable string when the determination finds that the input word does not require generation of the new phoneme string or the new syllable string.

The voice search method according to any one of claims 23 to 32,
Reading phoneme dictionary data or syllable dictionary data for converting input words;
Converting the input word into phonemes or syllables based on the phoneme dictionary data or the syllable dictionary data;
A speech search method comprising: generating the phoneme string or the syllable string based on the converted phoneme or syllable.