JPH0656556B2

JPH0656556B2 - Word detection method

Info

Publication number: JPH0656556B2
Application number: JP61190258A
Authority: JP
Inventors: 畑﨑 香一郎
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1986-08-12
Filing date: 1986-08-12
Publication date: 1994-07-27
Anticipated expiration: 2009-07-27
Also published as: JPS6344700A

Description

DETAILED DESCRIPTION OF THE INVENTION

(Field of Industrial Application) The present invention relates to a word detection method, used in speech recognition devices, speech input devices and the like, for detecting a word contained in input speech and the position of that word within the speech.

(Prior Art) One known method for detecting a word and its position in input speech, in speech recognition devices, speech input devices and the like, treats the input speech as a sequence of categories such as syllables, phonemes or phoneme classes. Each category and its position in the input speech are extracted, and if a category string built from the extracted categories corresponds to the category string of some word, that word and the position of the category string in the input speech are output as the detection result.
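A minimal sketch of this prior-art matching step, assuming the ideal case in which extraction is error-free, so that the extracted categories form one sequence. `Category` and `find_word` are hypothetical names, and the times in the example are made up for illustration.

```python
from typing import NamedTuple, Optional

class Category(NamedTuple):
    name: str   # category name, e.g. the syllable "ニ"
    start: int  # start time in the input speech
    end: int    # end time in the input speech

def find_word(extracted: list[Category], word: list[str]) -> Optional[tuple[int, int]]:
    """Return (start, end) of the first run of consecutive extracted
    categories whose names spell `word`, or None if there is none."""
    n = len(word)
    if n == 0:
        return None
    for i in range(len(extracted) - n + 1):
        window = extracted[i:i + n]
        if [c.name for c in window] == list(word):
            return window[0].start, window[-1].end
    return None

# Hypothetical, error-free extraction (times are illustrative only).
extracted = [Category("ニ", 10, 13), Category("ン", 13, 16),
             Category("シ", 16, 17), Category("キ", 17, 19)]
print(find_word(extracted, ["ニ", "ン", "シ", "キ"]))  # (10, 19)
```

Because every category must be present, removing the "シ" entry from `extracted` already makes the match return `None`, which is the weakness the following paragraphs address.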

In general, these categories are short in duration and have similar counterparts, so it is difficult to extract the categories in the input speech completely without error. Conventionally, therefore, several category candidates are extracted for each category segment of the input speech, and the category candidate string corresponding to a word is found by repeatedly generating partial category candidate strings from these candidates, in order from one end of the input speech, and matching each against the category string of the word. Details of this method are given in, for example, Japanese Patent Application No. Sho 58-214544, "Pattern Recognition Device", and are omitted here.

Furthermore, at the category-extraction stage, a category may vanish because of lax articulation or coarticulation between adjacent categories (for example, syllables), or its presence may simply go undetected; as a result, the categories before and after it are extracted as if they were adjacent. This phenomenon is hereinafter called category dropout.

To cope with this, the conventional approach is to investigate in advance which category sequences give rise to dropouts and, for the relatively frequent ones, to prepare category-string correction rules that convert a category string in which a dropout has occurred into one in which the dropped category is restored. Applying these rules to the category candidate strings during word detection restores the dropped category in the case of relatively frequent dropouts. Details of this method are given in, for example, Reference 2: Namiki, Hamada and Nakatsu, "Japanese Input Method Using Speech Recognition", Trans. IECE Japan (信学論), Vol. J67-D, No. 4, April 1984, and are omitted here.

(Problems to Be Solved by the Invention) In the conventional method above, category candidate strings are first generated from the candidates extracted from the input speech and only then matched against the category string of the word. Many candidate strings that are ultimately useless are therefore generated, which requires a large amount of computation.

Moreover, even when the segment of the word to be detected occupies only part of the input speech, the conventional method has to match every category candidate against the categories of the word, starting from one end of the input speech and including the segments where the word is absent. This wastes computation time, and it takes a long time before the word is detected.

Furthermore, the category-string correction rules are applied equally to candidate strings in which a dropout has occurred and to those in which none has. In many cases, several correction rules are also applied individually to a single candidate string. One candidate string therefore spawns many others, and many candidate strings must be examined before the one corresponding to the word to be detected is found. Moreover, most of these candidate strings do not match the category string of the word and are ultimately wasted.

In addition, the correction rules can restore only categories lost through relatively frequent dropouts; relatively rare dropouts cannot be restored. Covering more kinds of dropout requires more correction rules, which in turn further increases the number of candidate strings generated.

For example, suppose syllable candidates are extracted from speech uttered as "オンセイニンシキトワ" (音声認識とは, "speech recognition is ..."). Because the syllable "シ" (shi) is short and its boundaries with the neighbouring syllables "ン" (n) and "キ" (ki) lie close together, the presence of "シ" is not detected, and the candidates for "ン" and "キ" are extracted at adjacent positions. Even if correct candidates are obtained for every syllable other than "シ", the generated syllable candidate string is "オンセイニンキトワ", and the dropped "シ" has to be restored by a correction rule. This kind of dropout is relatively rare, however, and a rule correcting it is unlikely to have been prepared. Even if such a rule is available, rules such as "ニン" → "ニイン" and "イ" → "イイ" are often prepared as well, and applying them also generates useless syllable candidate strings such as "オンセイイニンキトワ", "オンセイニインキトワ" and "オンセイイニインキトワ".
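To make the blow-up concrete, the sketch below treats correction rules as plain string rewrites and applies every non-empty combination of them, each once at its first match, to the candidate string オンセイニンキトワ. The helper name `expand_with_rules` is hypothetical, the rule "ンキ" → "ンシキ" is an assumed form of the シ-restoring rule mentioned above (the text gives no form), and the application policy is also an assumption, since the text says only that rules are applied individually.

```python
from itertools import combinations

def expand_with_rules(seq: str, rules: list[tuple[str, str]]) -> set[str]:
    """Apply every non-empty subset of `rules` to `seq`, in list order,
    rewriting only the first match of each rule; return the new strings."""
    variants = set()
    for k in range(1, len(rules) + 1):
        for subset in combinations(rules, k):
            s = seq
            for src, dst in subset:
                if src in s:
                    s = s.replace(src, dst, 1)  # first occurrence only
            if s != seq:
                variants.add(s)
    return variants

rules = [
    ("ンキ", "ンシキ"),  # hypothetical form of a rule restoring a dropped シ
    ("ニン", "ニイン"),
    ("イ", "イイ"),
]
variants = expand_with_rules("オンセイニンキトワ", rules)
print(len(variants))  # 7
```

Seven strings come out, including the three listed above, but only オンセイニンシキトワ is the actual utterance; the other six are exactly the kind of wasted candidate strings the text describes.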

An object of the present invention is to provide a word detection method that generates no useless category candidate strings and that efficiently detects a word and its position in input speech, even when the segment of the word occupies only a small part of the whole input speech and even when some categories of the word have dropped out of the input speech.

(Means for Solving the Problems) To solve these problems and achieve the above object, the present invention provides a word detection method that detects a word in input speech, and the position at which it appears, by generating a category candidate string corresponding to the category string of the word, using a plurality of category candidates and their position information extracted from the input speech, which is a sequence of categories such as syllables, phonemes or phoneme classes. The method is characterized in that: each of the category candidates obtained from the input speech is stored, together with its position information, classified by category name; following the order of the categories in the word, the candidate (with its position information) corresponding to each category is selected from among the stored candidates classified under the same name as that category; and, when the first and last of three consecutive categories in the word correspond respectively to the two candidates of a pair of consecutive category candidates in the input speech, the sequence of three categories is made to correspond to the sequence of two candidates while the category candidate string is generated.

(Operation) In the method of the present invention, the category candidate string corresponding to the word is generated using only those candidates extracted from the input speech whose names match categories contained in the word to be detected, while following the order of the categories in the word. Consequently, only candidate strings that correspond to the category string of the word, or to part of it, are generated, and the generation of useless category strings is avoided.

In addition, because candidate strings are built only from those candidates in the input speech that correspond to categories of the word, the word can be detected quickly even when its segment is a very small part of the whole input speech, and wherever in the input speech that segment lies.

When a category of a word in the input speech drops out, the candidates for the two categories immediately before and after it are adjacent to each other in the input speech. That is, if the category string in the word is $C_{i-1} C_i C_{i+1}$ and category $C_i$ drops out, the candidates are extracted as though $C_{i+1}$ directly followed $C_{i-1}$.

Accordingly, when the candidate string corresponding to the word is generated by following the order of the categories in the word, if a candidate corresponding to $C_{i+1}$ follows a candidate corresponding to $C_{i-1}$ in the input speech, that pair of candidates is made to correspond to the category string $C_{i-1} C_i C_{i+1}$ of the word. In this way the candidate string and the category string can be correctly aligned even though $C_i$ has dropped out. And because only candidate strings corresponding to the category string of the word are generated, the generation of useless candidate strings is still avoided.
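One way to state this rule formally is given below. The notation is ours, not the patent's: a candidate $x$ has name $n(x)$, start time $s(x)$ and end time $e(x)$; $\delta$ is the timing tolerance (±1 in the embodiment described later); and, as in that embodiment, the first and last categories of the word are assumed to be detected themselves.

```latex
\[
\operatorname{follows}(x, y) \iff \lvert s(y) - e(x) \rvert \le \delta
\]
\[
x_1 \cdots x_m \sim C_1 \cdots C_N
\iff \exists\, p_1, \dots, p_m :\;
\begin{cases}
p_1 = 1,\quad p_m = N, \\
n(x_k) = C_{p_k} & (1 \le k \le m), \\
p_{k+1} - p_k \in \{1, 2\} & (1 \le k < m), \\
\operatorname{follows}(x_k, x_{k+1}) & (1 \le k < m).
\end{cases}
\]
```

A step with $p_{k+1} - p_k = 2$ is the three-to-two correspondence: $x_k$ and $x_{k+1}$ stand for $C_{i-1} C_i C_{i+1}$ with $i = p_k + 1$ dropped. Because no step exceeds 2, two adjacent categories never drop out together.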

(Embodiment) The present invention will now be described in more detail by way of an embodiment, with reference to the drawings.

FIG. 1 is a block diagram showing an embodiment of the present invention.

In this embodiment, Japanese speech is assumed to be input, and syllables are used as the categories.

The syllable extraction unit 1 detects syllable candidates in the input speech and stores them in the syllable candidate storage unit 2. As an example, suppose the utterance "オンセイニンシキトハ" (音声認識とは, "speech recognition is ...") is input. Syllable recognition then yields syllable candidates such as those shown in FIG. 2, in which each arrowed line marks the segment of a syllable candidate and several candidates have been extracted for each segment. These syllable candidates are classified by syllable name and stored in the syllable candidate storage unit 2, whose contents then become as shown in FIG. 3. In that figure each syllable candidate is written in the form "syllable name/start time:end time".
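A sketch of how the contents of the syllable candidate storage unit 2 could be held in memory, assuming each entry arrives as text in the FIG. 3 notation. `parse_entry` and `store_by_name` are hypothetical names, and only the entries for ニ, ン, シ and キ that the text quotes in the steps below are used; the remaining FIG. 3 entries are not given in the text.

```python
import re
from collections import defaultdict
from typing import NamedTuple

class Candidate(NamedTuple):
    name: str   # syllable name
    start: int  # start time
    end: int    # end time

ENTRY = re.compile(r"^(?P<name>[^/]+)/(?P<start>\d+):(?P<end>\d+)$")

def parse_entry(text: str) -> Candidate:
    """Parse one entry written as 'syllable name/start time:end time'."""
    m = ENTRY.match(text.strip())
    if m is None:
        raise ValueError(f"not a candidate entry: {text!r}")
    return Candidate(m["name"], int(m["start"]), int(m["end"]))

def store_by_name(entries: list[str]) -> dict[str, list[Candidate]]:
    """Classify candidates by syllable name, keeping their input order."""
    table = defaultdict(list)
    for e in entries:
        c = parse_entry(e)
        table[c.name].append(c)
    return dict(table)

storage = store_by_name(["ニ/2:4", "ニ/10:13", "ン/2:4", "ン/13:16", "シ/4:7", "キ/16:19"])
print(storage["ニ"])  # [Candidate(name='ニ', start=2, end=4), Candidate(name='ニ', start=10, end=13)]
```

Keying the table by name is what lets the later steps fetch, say, all "ン" candidates directly instead of scanning every candidate in the input speech.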

The word storage unit 3 stores the syllable strings of the words to be detected. One of these words is fetched into the word buffer 4, and it is then examined whether the input speech contains that word. Suppose that the word buffer 4 now holds the syllable string "ニンシキ" (ninshiki) of the word 認識 ("recognition").

The syllable candidate string generation unit 5 builds syllable candidate strings from the candidates in the syllable candidate storage unit 2, following the order of the syllables in the word held in the word buffer 4, and stores each resulting candidate string, together with the syllable string it corresponds to, in the syllable candidate string storage unit 6. In this embodiment the strings are built starting from the first syllable of the word.

First, since the first syllable in the word buffer 4 is "ニ" (ni), the generation unit 5 takes out the candidates stored under "ニ" in the syllable candidate storage unit 2 and stores each of them, as a syllable candidate string of length 1, in the syllable candidate string storage unit 6 together with the syllable "ニ". As a result, storage unit 6 holds two syllable candidate strings:

ニ/2:4 (ニ)
ニ/10:13 (ニ)

where the part in parentheses is the corresponding syllable string.

Next, the generation unit 5 looks at the next syllable "ン" (n) in the word buffer 4 and at the syllable after it, "シ" (shi). For each candidate stored under "ン" or "シ" in the syllable candidate storage unit 2, it checks whether that candidate follows, in the input speech, the last candidate of any syllable candidate string in the syllable candidate string storage unit 6. If a candidate does follow, it is appended to the end of that string to form a new syllable candidate string, which is stored in storage unit 6. The candidates stored under "ン" or "シ" are the three candidates ン/2:4, ン/13:16 and シ/4:7. Whether a syllable candidate A follows another syllable candidate B can be decided by comparing the start time of A with the end time of B; here A is judged to follow B when the two times differ by no more than ±1. In the present case, therefore, candidate ン/13:16 is appended to the string ニ/10:13 and made to correspond to the syllable string "ニン", and candidate シ/4:7 is appended to the string ニ/2:4 and made to correspond to the syllable string "ニンシ", that is, the three syllables ニンシ are matched by the two candidates ニ and シ. The candidate strings previously held in storage unit 6 are then deleted. As a result, storage unit 6 retains two syllable candidate strings:

ニ/10:13-ン/13:16 (ニン)
ニ/2:4-シ/4:7 (ニンシ)
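The step just described can be sketched as below, assuming candidates are (name, start, end) tuples; `follows` and `extend` are hypothetical names. The sketch follows the text literally: it only tests followership with the ±1 tolerance and discards the old strings, and it does not itself record which syllable of the word each new string has reached (ニン or ニンシ).

```python
from typing import NamedTuple

class Candidate(NamedTuple):
    name: str
    start: int
    end: int

def follows(prev: Candidate, nxt: Candidate, tol: int = 1) -> bool:
    """True if nxt starts within ±tol of prev's end time."""
    return abs(nxt.start - prev.end) <= tol

def extend(strings: list[list[Candidate]], pool: list[Candidate],
           tol: int = 1) -> list[list[Candidate]]:
    """Append each pool candidate that follows the last element of a string.
    The old strings are not kept, mirroring the deletion step in the text."""
    return [s + [c] for s in strings for c in pool if follows(s[-1], c, tol)]

start = [[Candidate("ニ", 2, 4)], [Candidate("ニ", 10, 13)]]  # length-1 strings
pool = [Candidate("ン", 2, 4), Candidate("ン", 13, 16), Candidate("シ", 4, 7)]  # "ン" or "シ"
for s in extend(start, pool):
    print("-".join(f"{c.name}/{c.start}:{c.end}" for c in s))
# ニ/2:4-シ/4:7
# ニ/10:13-ン/13:16
```

Note that ン/2:4 is rejected for ニ/2:4: it starts at 2 while ニ/2:4 ends at 4, a gap of 2, which exceeds the tolerance.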

Processing then moves on to the syllable "シ". The candidates stored under "シ" or the following syllable "キ" (ki) in the syllable candidate storage unit 2 are シ/4:7 and キ/16:19. Checking each of them against the last candidate of the two strings above shows that キ/16:19 follows ン/13:16, the last candidate of the string ニ/10:13-ン/13:16. The new string formed by appending キ/16:19 to that string is made to correspond to the syllable string "ニンシキ" and stored in the syllable candidate string storage unit 6. The contents of storage unit 6 are therefore:

ニ/10:13-ン/13:16-キ/16:19 (ニンシキ)

Since the last syllable in the word buffer 4 has now been reached, the generation unit 5 outputs that the word 認識 ("recognition") is present in the segment of the input speech from time 10 to time 19.
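The whole procedure for one word can be sketched as follows. This is an illustrative reimplementation under stated assumptions, not the patent's own: `Candidate`, `follows` and `detect` are hypothetical names, each syllable name is assumed to be a single character so that a word can be passed as a string, and, unlike the text, each partial string records the word position it has reached, so that a skip over one dropped syllable is only allowed between that syllable's two neighbours. Only the six candidates quoted above are used; other FIG. 3 entries would never be looked up for this word.

```python
from collections import defaultdict
from collections.abc import Sequence
from typing import NamedTuple

class Candidate(NamedTuple):
    name: str
    start: int
    end: int

def follows(prev: Candidate, nxt: Candidate, tol: int = 1) -> bool:
    """True if nxt starts within ±tol of prev's end time."""
    return abs(nxt.start - prev.end) <= tol

def detect(word: Sequence[str], candidates: list[Candidate], tol: int = 1):
    """Return (start, end, string) for every candidate string matching `word`.

    Two adjacent candidates may stand for three word categories (the middle
    one dropped), so dropouts are allowed but never two in a row; the first
    and last categories of the word must themselves be present.
    """
    by_name = defaultdict(list)  # classify candidates by name
    for c in candidates:
        by_name[c.name].append(c)

    last = len(word) - 1
    # Each partial string: (candidates so far, word position of its last candidate).
    frontier = [([c], 0) for c in by_name[word[0]]]
    hits = []
    while frontier:
        next_frontier = []
        for seq, pos in frontier:
            if pos == last:
                hits.append((seq[0].start, seq[-1].end, seq))
                continue
            for nxt_pos in (pos + 1, pos + 2):  # pos + 2: word[pos + 1] dropped
                if nxt_pos > last:
                    continue
                for c in by_name[word[nxt_pos]]:
                    if follows(seq[-1], c, tol):
                        next_frontier.append((seq + [c], nxt_pos))
        frontier = next_frontier
    return hits

cands = [Candidate("ニ", 2, 4), Candidate("ニ", 10, 13), Candidate("ン", 2, 4),
         Candidate("ン", 13, 16), Candidate("シ", 4, 7), Candidate("キ", 16, 19)]
for start, end, seq in detect("ニンシキ", cands):
    print(start, end, "-".join(f"{c.name}/{c.start}:{c.end}" for c in seq))
# 10 19 ニ/10:13-ン/13:16-キ/16:19
```

The partial string ニ/2:4-シ/4:7 dies because nothing follows シ/4:7 under "キ". With hypothetical candidates a/0:1, c/1:2, e/2:3 the word "abcde" is still found (two non-consecutive dropouts), whereas a/0:1, d/1:2, e/2:3 yields nothing because b and c would have to drop out in a row.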

An embodiment of the present invention has been described above. Note that several syllable dropouts may occur within one word, provided that they are not consecutive.

(Effects of the Invention) As described above, the present invention provides a word detection method that can detect the presence of a word and its position in the input speech even when several non-consecutive syllables of the word have dropped out at the syllable-candidate extraction stage, and that performs word detection efficiently because very few syllable candidate strings are generated during the detection process.

[Brief description of drawings]

FIG. 1 is a block diagram showing an embodiment of the present invention; FIG. 2 shows an example of the input speech and the syllable candidates extracted from it in the embodiment of FIG. 1; and FIG. 3 shows an example of the contents of the syllable candidate storage unit in the embodiment of FIG. 1.

1 ... syllable detection unit, 2 ... syllable candidate storage unit, 3 ... word storage unit, 4 ... word buffer, 5 ... syllable string generation unit, 6 ... syllable string storage unit.

Claims

1. A word detection method for detecting a word in input speech and the position at which it appears, by generating a category candidate string corresponding to the category string of the word using a plurality of category candidates, together with their position information, extracted from the input speech, which is a sequence of categories such as syllables, phonemes or phoneme classes, the method being characterized in that: each of the plurality of category candidates obtained from the input speech is stored, together with its position information, classified by category name; following the order of the categories in the word, the category candidate and position information corresponding to each category are selected from among the category candidates stored under the same name as that category; and, when the first and last categories of a sequence of three consecutive categories in the word correspond respectively to the two category candidates of a sequence of two consecutive category candidates in the input speech, the category candidate string is generated by making the sequence of three categories correspond to the sequence of two category candidates.