JPS6344700A

JPS6344700A - Word detection system

Info

Publication number: JPS6344700A
Application number: JP61190258A
Authority: JP
Inventors: 畑崎　香一郎
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1986-08-12
Filing date: 1986-08-12
Publication date: 1988-02-25
Anticipated expiration: 2009-07-27
Also published as: JPH0656556B2

Abstract

(57) [Abstract] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

[Detailed Description of the Invention] (Field of Industrial Application) The present invention relates to a word detection system used in speech recognition devices, speech input devices, and the like, which detects the words contained in input speech and the positions of those words within the speech.

(Prior Art) One method of detecting words and their positions in input speech, in speech recognition devices, speech input devices, and the like, works as follows. Each category and its position information are extracted from the input speech, which is a sequence of categories such as syllables, phonemes, or phoneme classes. If a category sequence built from the extracted categories corresponds to the category sequence of some word, that word and the position of the category sequence in the input speech are output as the detection result.

In general, because these categories are short in duration and similar categories exist, it is difficult to extract the categories in input speech entirely without error. Conventionally, therefore, multiple category candidates are extracted for each category interval in the input speech. Then, working in order from the end of the input speech, partial category candidate sequences are generated from the category candidates and matched against the category sequence of a word, and this process is repeated until a category candidate sequence corresponding to the word is found. Details of this method are given, for example, in Reference 1 (Japanese Patent Application No. 58-214544, "Pattern Recognition Apparatus") and are omitted here.

In addition, at the stage of extracting categories from the input speech, a category may disappear, or its presence may go undetected, because of lazy articulation or coarticulation between adjacent categories (for example, syllables). As a result, the categories before and after it may be extracted as if they were adjacent.

This phenomenon is hereinafter called category dropout.

To deal with this, the conventional approach is to investigate in advance which arrangements of categories give rise to category dropout. For the dropouts found to be relatively frequent, category sequence correction rules are prepared, each of which converts a category sequence in which the dropout has occurred into a category sequence with the dropped category restored. By applying these rules to category candidate sequences during word detection, the dropped category can be restored for relatively frequent dropouts. Details of this method are given, for example, in Reference 2 (Namiki, Hamada, Nakatsu, "Japanese Input Method Using Speech Recognition," Shingakuron, Vol. J67-D, No. 4, 1984.4) and are omitted here.

(Problems to Be Solved by the Invention) In the conventional method described above, category candidate sequences are first generated from the category candidates extracted from the input speech and only then matched against the category sequence of a word. Consequently, a large number of category candidate sequences that ultimately go to waste are generated, which requires a great deal of computation.

Moreover, even when the word to be detected occupies only part of the input speech, the conventional method must match every category candidate against the categories in the word, starting from the end of the input speech and including intervals where the word is not present. This wastes computation time, and it takes a long time before the word is detected.

Furthermore, the category sequence correction rules described above are applied equally to category candidate sequences in which no category dropout has occurred, not only to those in which one has. In addition, several correction rules can often be applied separately to a single category candidate sequence. As a result, many category candidate sequences are generated from one category candidate sequence, and many of them must be examined before the one corresponding to the word to be detected is found. Moreover, most of those sequences do not match the category sequence of the word to be detected and are ultimately wasted.

Also, the categories that the correction rules can restore are limited to those lost through relatively frequent dropouts; dropouts that occur relatively rarely cannot be restored. Restoring more kinds of dropout requires more correction rules, which in turn further increases the number of category candidate sequences generated.

For example, suppose syllable candidates are to be extracted from speech uttered as "オンセイニンシキトワ" (o-n-se-i-ni-n-shi-ki-to-wa; 音声認識とは, "what is speech recognition"). Suppose further that the syllable "シ" (shi) is short in duration and that its boundaries with the preceding syllable "ン" (n) and the following syllable "キ" (ki) lie close together, so that "シ" goes undetected and the syllable candidates for "ン" and "キ" are extracted at adjacent positions. Then, even if correct syllable candidates are obtained for every syllable other than "シ", the generated syllable candidate sequence is "オンセイニンキトワ", and the dropped syllable "シ" must be restored by a correction rule. However, a dropout of this kind is relatively rare, and a rule to correct it is unlikely to have been prepared. Even if such a rule has been prepared, other correction rules such as "ニン" → "ニイン" and "イ" → "イイ" are often prepared as well, and applying them also generates useless syllable candidate sequences such as "オンセイイニンキトワ", "オンセイニイントワ", and "オンセイイニイントワ".

An object of the present invention is to provide a word detection system that does not generate wasted category candidate sequences, and that can efficiently detect a word and its position from input speech even when the word to be detected occupies only a small part of the whole input speech and even when some of the categories in that word have dropped out of the input speech.

(Means for Solving the Problems) To solve the above problems and achieve the above object, the present invention provides a word detection system that detects a word in input speech and the position at which it appears by generating a category candidate sequence corresponding to the category sequence of the word, using a plurality of category candidates extracted from the input speech, which is a sequence of categories such as syllables, phonemes, or phoneme classes, together with their position information. The system is characterized in that each of the category candidates obtained from the input speech is classified by its category name and stored; that, following the order of the categories in the word, the category candidate corresponding to each category is selected from among the stored candidates classified under the same name as that category; and that, when the first and last categories of three consecutive categories in the word correspond respectively to the two candidates of a sequence of two consecutive category candidates in the input speech, the sequence of three categories is made to correspond to the sequence of two category candidates in generating the category candidate sequence.

(Operation) In the system of the present invention, of the category candidates extracted from the input speech, only those with the same name as a category contained in the word to be detected are used, and the corresponding category candidate sequence is generated while tracing the order of the categories in the word. Consequently, only category candidate sequences that correspond to the category sequence of the word, or to a subsequence of it, are generated, and the generation of wasted category sequences can be avoided.

In addition, because category candidate sequences are built up from those category candidates in the input speech that correspond to categories in the word, the word can be detected quickly even when it occupies only a small part of the whole input speech, and wherever in the input speech it lies.

When a category within a word in the input speech drops out, the category candidates for the two categories immediately before and after it become adjacent to each other in the input speech.

That is, if the category sequence in the word is $C_{i-1}\,C_i\,C_{i+1}$ and category $C_i$ drops out, the category candidates are extracted with $C_{i+1}$ directly following $C_{i-1}$.

Therefore, when the category candidate sequence for a word is generated while tracing the order of the categories in the word to be detected, if the category candidate corresponding to $C_{i+1}$ follows the category candidate corresponding to $C_{i-1}$ in the input speech, that pair of category candidates is made to correspond to the category sequence $C_{i-1}\,C_i\,C_{i+1}$ in the word. In this way, even if category $C_i$ has dropped out, the category candidate sequence can be correctly aligned with the category sequence.
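
The rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the patent's implementation: the function name `next_positions` and the use of list indices to track the last matched category are hypothetical. Given the index of the last matched category in the word, a candidate name may match either the next category (normal case) or the one after it (the skipped category is treated as dropped).

```python
from typing import List, Sequence


def next_positions(word: Sequence[str], pos: int, name: str) -> List[int]:
    """Indices in `word` that a candidate called `name` may be aligned to,
    given that the category at index `pos` was the last one matched.

    pos + 1 is the ordinary successor; pos + 2 skips exactly one category,
    which is then regarded as having dropped out of the input speech.
    """
    return [p for p in (pos + 1, pos + 2) if p < len(word) and word[p] == name]


# Hypothetical usage with the syllables of the word used in the embodiment.
word = ["ニ", "ン", "シ", "キ"]
print(next_positions(word, 0, "ン"))  # [1]  normal successor
print(next_positions(word, 0, "シ"))  # [2]  "ン" treated as dropped
print(next_positions(word, 0, "キ"))  # []   two categories cannot be skipped
```

Because at most one category is skipped between two matched candidates, consecutive dropouts are never bridged in this sketch.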

Moreover, since only category candidate sequences that correspond to the category sequence of the word are generated, the generation of wasted category candidate sequences can be avoided.

(Embodiment) The present invention is described below in more detail by way of an embodiment, with reference to the drawings.

FIG. 1 is a block diagram showing an embodiment of the present invention.

In this embodiment, the input is assumed to be Japanese speech, and syllables are used as the categories.

The syllable extraction unit 1 detects syllable candidates in the input speech and stores them in the syllable candidate storage unit 2. As an example, suppose the speech "オンセイニンシキトハ" (音声認識とは, "what is speech recognition") is input. In this case, syllable recognition extracts syllable candidates such as those shown in FIG. 2. In FIG. 2, each arrowed line marks the interval of a syllable candidate, and multiple syllable candidates are extracted for each interval. These syllable candidates are classified by syllable name and stored in the syllable candidate storage unit 2, so that its contents become as shown in FIG. 3. In that figure, each syllable candidate is written in the form "syllable name/start time:end time".
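
As a rough illustration of storing candidates classified by syllable name, the following Python sketch groups candidates in a dictionary keyed by name. The names `Candidate` and `store_by_name` are hypothetical, and the data is limited to the six candidates that the walkthrough of this embodiment mentions (FIG. 3 itself is not reproduced here). The end time of the "シ" candidate is garbled in the source text; reading it as 4:7 is an assumption.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class Candidate(NamedTuple):
    name: str   # syllable name
    start: int  # start time
    end: int    # end time

    def label(self) -> str:
        """Render in the 'syllable name/start time:end time' form."""
        return f"{self.name}/{self.start}:{self.end}"


def store_by_name(cands: Iterable[Candidate]) -> Dict[str, List[Candidate]]:
    """Classify candidates by syllable name, preserving input order."""
    store: Dict[str, List[Candidate]] = defaultdict(list)
    for c in cands:
        store[c.name].append(c)
    return dict(store)


# Candidates named in the walkthrough; シ/4:7 is an assumed reading.
extracted = [
    Candidate("ニ", 2, 4), Candidate("ン", 2, 4), Candidate("シ", 4, 7),
    Candidate("ニ", 10, 13), Candidate("ン", 13, 16), Candidate("キ", 16, 19),
]
store = store_by_name(extracted)
print([c.label() for c in store["ニ"]])  # ['ニ/2:4', 'ニ/10:13']
```

Keying the store by name means that looking up every candidate for a given syllable of the word does not require scanning the whole input.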

The word storage unit 3 stores the syllable sequences of the words to be detected. One of these words is loaded into the word buffer 4, and the input speech is then checked for that word. Suppose the word buffer 4 now holds the syllable sequence "ニンシキ" (ni-n-shi-ki) of the word 「認識」 (ninshiki, "recognition").

The syllable candidate sequence generation unit 5 builds syllable candidate sequences from the syllable candidates in the syllable candidate storage unit 2, following the order of the syllables in the word held in the word buffer 4, and stores each resulting syllable candidate sequence, together with its corresponding syllable sequence, in the syllable candidate sequence storage unit 6.

In this embodiment, the syllable sequence is built up in order, starting from the first syllable of the word.

First, since the first syllable in the word buffer 4 is "ニ", the syllable candidate sequence generation unit 5 retrieves the syllable candidates stored under "ニ" in the syllable candidate storage unit 2 and stores each of them in the syllable candidate sequence storage unit 6 as a syllable candidate sequence of length 1, together with the syllable "ニ". As a result, the syllable candidate sequence storage unit 6 holds two syllable candidate sequences:
(1) ニ/2:4 (ニ)
(2) ニ/10:13 (ニ)
The syllable sequence in parentheses is the one to which each candidate sequence corresponds.

Next, the syllable candidate sequence generation unit 5 turns to the next syllable in the word buffer 4, "ン", and the syllable after it, "シ". For each syllable candidate stored under "ン" or "シ" in the syllable candidate storage unit 2, it checks whether that candidate follows, in the input speech, the last syllable candidate of any syllable candidate sequence in the syllable candidate sequence storage unit 6. If it does, the candidate is appended to the end of that syllable candidate sequence to form a new syllable candidate sequence, which is stored in the syllable candidate sequence storage unit 6. Three syllable candidates are stored under "ン" or "シ": ン/2:4, ン/13:16, and シ/4:7.

Whether a syllable candidate A follows another syllable candidate B can be determined by comparing the end time of syllable candidate A with the start time of syllable candidate B. Here, a candidate is judged to follow when the difference between those times is within plus or minus 1. In the present case, therefore, the syllable candidate ン/13:16 is appended to syllable candidate sequence (2) (ニ/10:13) and made to correspond to the syllable sequence "ニン", and the syllable candidate シ/4:7 is appended to syllable candidate sequence (1) (ニ/2:4) and made to correspond to the syllable sequence "ニンシ". The syllable candidate sequences previously held in the syllable candidate sequence storage unit 6 are deleted. As a result, two syllable candidate sequences remain in the syllable candidate sequence storage unit 6:
(3) ニ/10:13 - ン/13:16 (ニン)
(4) ニ/2:4 - シ/4:7 (ニンシ)
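
The adjacency test can be sketched as follows. The function name `follows` and the tuple layout `(name, start, end)` are hypothetical. The check is read here as comparing the end time of the earlier candidate with the start time of the later one, which is the reading consistent with the worked example (ン/13:16 follows ニ/10:13); the シ/4:7 time is an assumed reading of garbled source text.

```python
from typing import Tuple

Cand = Tuple[str, int, int]  # (syllable name, start time, end time)


def follows(later: Cand, earlier: Cand, tolerance: int = 1) -> bool:
    """True if `later` starts within `tolerance` of where `earlier` ends."""
    return abs(later[1] - earlier[2]) <= tolerance


ni_a, ni_b = ("ニ", 2, 4), ("ニ", 10, 13)
for cand in [("ン", 2, 4), ("ン", 13, 16), ("シ", 4, 7)]:
    print(cand, follows(cand, ni_a), follows(cand, ni_b))
# ('ン', 2, 4) False False
# ('ン', 13, 16) False True
# ('シ', 4, 7) True False
```

The tolerance of 1 mirrors the "plus or minus 1" criterion in the text; the time unit is whatever the candidate extractor uses.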

Processing then moves on to the syllable "シ". Two syllable candidates are stored in the syllable candidate storage unit 2 under "シ" or the following syllable "キ": シ/4:7 and キ/16:19. Checking whether each of them follows, in the input speech, the last syllable candidate of syllable candidate sequence (3) or (4) shows that キ/16:19 follows ン/13:16, the last syllable candidate of sequence (3). The new syllable candidate sequence (5), formed by appending キ/16:19 to sequence (3), is therefore stored in the syllable candidate sequence storage unit 6 in correspondence with the syllable sequence "ニンシキ".

The contents of the syllable candidate sequence storage unit 6 therefore become:
(5) ニ/10:13 - ン/13:16 - キ/16:19 (ニンシキ)

Since the last syllable in the word buffer 4 has now been reached, the syllable candidate sequence generation unit 5 outputs the result that the word 「認識」 is present in the interval from time 10 to time 19 of the input speech.
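
The whole walkthrough can be reproduced with a short Python sketch. It is a simplified illustration, not the patent's implementation: `detect_word`, `Candidate`, and `follows` are hypothetical names, the partial sequences are explored depth-first rather than syllable by syllable, each partial sequence tracks the index of the last matched syllable explicitly, and only the candidates named in the text are used (with シ/4:7 as an assumed reading of garbled source text). Dropout of the first or last syllable of the word is not handled.

```python
from collections import defaultdict
from typing import List, NamedTuple, Sequence, Tuple


class Candidate(NamedTuple):
    name: str
    start: int
    end: int


def follows(later: Candidate, earlier: Candidate, tolerance: int = 1) -> bool:
    """True if `later` starts within `tolerance` of where `earlier` ends."""
    return abs(later.start - earlier.end) <= tolerance


def detect_word(word: Sequence[str], candidates: Sequence[Candidate],
                tolerance: int = 1) -> List[Tuple[int, int]]:
    """Return the sorted (start, end) intervals in which `word` is found."""
    store = defaultdict(list)              # candidates classified by name
    for c in candidates:
        store[c.name].append(c)

    # A partial sequence is (index of last matched syllable, candidate list).
    partial = [(0, [c]) for c in store[word[0]]]
    found = set()
    while partial:
        pos, seq = partial.pop()
        if pos == len(word) - 1:           # last syllable of the word reached
            found.add((seq[0].start, seq[-1].end))
            continue
        for nxt in (pos + 1, pos + 2):     # next syllable, or skip one (dropout)
            if nxt >= len(word):
                break
            for c in store[word[nxt]]:
                if follows(c, seq[-1], tolerance):
                    partial.append((nxt, seq + [c]))
    return sorted(found)


candidates = [
    Candidate("ニ", 2, 4), Candidate("ン", 2, 4), Candidate("シ", 4, 7),
    Candidate("ニ", 10, 13), Candidate("ン", 13, 16), Candidate("キ", 16, 19),
]
print(detect_word(["ニ", "ン", "シ", "キ"], candidates))  # [(10, 19)]
```

Only candidates whose names occur in the word are ever touched, and a sequence is extended only when the next candidate is adjacent in time, so the sequence starting at ニ/2:4 is abandoned after シ/4:7 because no "キ" candidate follows it.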

An embodiment of the present invention has been described above. Note that syllable dropout may occur more than once within a single word, provided the dropped syllables are not consecutive.

(Effects of the Invention) As described above, the present invention provides a word detection system that can detect the presence of a word and its position in the input speech even when several non-consecutive syllables of the word to be detected have dropped out at the stage of extracting syllable candidates from the input speech. In addition, the number of syllable candidate sequences generated during detection is extremely small, so that word detection is efficient.

[Brief Description of the Drawings]

FIG. 1 is a block diagram showing an embodiment of the present invention; FIG. 2 shows an example of input speech and the syllable candidates extracted from it in the embodiment of FIG. 1; and FIG. 3 shows an example of the contents of the syllable candidate storage unit in the embodiment of FIG. 1.

1: syllable detection unit; 2: syllable candidate storage unit; 3: word storage unit; 4: word buffer; 5: syllable sequence generation unit; 6: syllable sequence storage unit.

Claims

[Claims]

A word detection system that detects a word in input speech and the position at which it appears by generating a category candidate sequence corresponding to the category sequence of the word, using a plurality of category candidates extracted from the input speech, which is a sequence of categories such as syllables, phonemes, or phoneme classes, together with their position information, the system being characterized in that: each of the category candidates obtained from the input speech is classified by its category name and stored; following the order of the categories in the word, the category candidate corresponding to each category is selected from among the stored candidates classified under the same name as that category; and, when the first and last categories of three consecutive categories in the word correspond respectively to the two candidates of a sequence of two consecutive category candidates in the input speech, the sequence of three categories is made to correspond to the sequence of two category candidates in generating the category candidate sequence.