JPS62219000A

JPS62219000A - Word voice recognition equipment

Info

Publication number: JPS62219000A
Application number: JP61060970A
Authority: JP
Inventors: 教幸藤本; 佐藤　泰雄
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1986-03-20
Filing date: 1986-03-20
Publication date: 1987-09-26

Abstract

(57) [Abstract] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

[Detailed Description of the Invention] [Summary] An input spoken word is compared with a series of pre-registered standard patterns, one per syllable, to extract a syllable candidate string. Among the character sequences of the words to be recognized, those having the same number of syllables as the syllable candidate string are selected. The per-digit distance between each such character sequence and the syllable candidate string is obtained from an inter-syllable distance memory, in which the distance between every pair of syllables is registered. The word with the smallest distance is output as the recognition result. This makes it possible to recognize spoken words from a virtually unlimited or large vocabulary using a small-capacity storage device, and it reduces the amount of recognition processing.

[Industrial application field]

The present invention relates to a word speech recognition device that uses syllables as the recognition unit. In particular, it relates to a word speech recognition device suited to recognizing large-vocabulary word sets.

[Conventional technology]

Conventionally, there are two representative methods for recognizing spoken words:
- a method that uses the word as the recognition unit, as shown in FIG. 5(A);
- a method that uses a unit smaller than the word, namely the syllable (or phoneme), as shown in FIG. 5(B).

FIG. 5(A) shows the method that uses words as recognition units. In it, a speech analysis unit 210 analyzes the input speech, extracts feature parameters and detects the speech interval. From these it creates an input speech pattern and feeds it to a word recognition unit 220.

A word standard pattern unit 230 holds the analysis results of the words to be recognized, registered in advance as standard patterns.

The word recognition unit 220 compares the parameters of the input speech pattern with the standard pattern of each word in the word standard pattern unit 230. It outputs the word whose standard pattern has the smallest distance as the recognition result.
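The FIG. 5(A) scheme can be roughly illustrated by picking the word whose stored pattern is nearest to the input. The sketch below is for illustration only. It assumes that each pattern is a fixed-length feature vector and that patterns are compared by Euclidean distance. The text does not say which distance is used, and practical systems of this kind usually align variable-length patterns, for example by DP matching. The names `recognize_word` and `templates` are hypothetical.

```python
import math
from typing import Dict, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def recognize_word(input_pattern: Sequence[float],
                   templates: Dict[str, Sequence[float]]) -> str:
    """Return the word whose standard pattern is closest to the input pattern."""
    return min(templates, key=lambda word: euclidean(input_pattern, templates[word]))


# Hypothetical word standard patterns: one stored pattern per vocabulary word.
templates = {"yes": [1.0, 0.2, 0.0], "no": [0.0, 0.9, 0.4]}
print(recognize_word([0.9, 0.3, 0.1], templates))  # yes
```

There is one stored pattern per word, and every input is compared against all of them. Storage and matching cost therefore grow in proportion to vocabulary size, which is the drawback discussed below.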

FIG. 5(B) shows the method that uses syllables (or phonemes) as recognition units. In it, a speech analysis unit 240 extracts, for each short interval of the input speech, parameters representing the features of syllables (or phonemes). From these it creates an input syllable pattern and feeds it to a syllable (or phoneme) recognition unit 250. A phoneme is a vowel or a consonant. A syllable is made up of phonemes and is usually formed by combining one vowel with one or two consonants.

A syllable (or phoneme) standard pattern unit 260 holds the analysis results of the syllables (or phonemes) to be recognized, registered in advance as standard patterns.

The syllable (or phoneme) recognition unit 250 compares the parameters of the input syllable (or phoneme) pattern with the standard pattern of each syllable (or phoneme) in the standard pattern unit 260. It obtains the respective distances and feeds them to a word recognition unit 270.

A word dictionary unit 280 stores each word to be recognized, expressed as a sequence of its syllables (or phonemes).

The word recognition unit 270 uses the distance information for each syllable (or phoneme) supplied by the syllable (or phoneme) recognition unit 250. From it, the unit computes the distance of the syllable (or phoneme) sequence of each word in the word dictionary unit 280. It outputs the word with the smallest value as the recognition result.
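One plausible reading of the FIG. 5(B) scheme works as follows. The syllable recognizer hands over, for each input position, a distance to every syllable class. A word's score is then the sum of the distances of its own syllables. The one-to-one alignment, the `Lattice` layout and all numbers below are assumptions for illustration. The text does not specify how unit 270 combines the distances.

```python
import math
from typing import Dict, List

# For each input syllable position: distance from the input to every syllable class.
Lattice = List[Dict[str, float]]


def word_distance(lattice: Lattice, word: List[str]) -> float:
    """Sum, position by position, the recognizer's distance for the word's syllable."""
    if len(word) != len(lattice):
        return math.inf  # simplifying assumption: only same-length words are scored
    return sum(scores[syl] for scores, syl in zip(lattice, word))


def recognize(lattice: Lattice, dictionary: Dict[str, List[str]]) -> str:
    """Return the dictionary word with the smallest summed distance."""
    return min(dictionary, key=lambda w: word_distance(lattice, dictionary[w]))


# Hypothetical distances for a three-syllable input.
lattice = [
    {"a": 2.0, "ha": 1.0, "sa": 5.0},
    {"ki": 0.5, "ka": 3.0},
    {"ta": 2.5, "sa": 1.5, "i": 4.0},
]
dictionary = {"akita": ["a", "ki", "ta"], "sakai": ["sa", "ka", "i"]}
print(recognize(lattice, dictionary))  # akita
```

Every position carries a score for every syllable class. The invention described later passes on only the single best syllable per position.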

The word-unit method gives a good recognition rate. However, its processing load and the storage needed for standard patterns grow rapidly as the number of words increases.

It is therefore at a disadvantage for large vocabularies, for example 1,000 words or more.

By contrast, the syllable (or phoneme) unit method has a somewhat lower recognition rate. However, the distance calculations it needs for recognition are performed between the input speech and the standard patterns of syllables (or phonemes). As the number of words increases, its processing load and standard-pattern storage therefore grow far less than those of the word-unit method.

For these reasons, the word-unit method is generally considered advantageous for small vocabularies, and the syllable (or phoneme) unit method for large vocabularies.

[Problem that the invention seeks to solve]

As the number of words to be recognized increases, the processing load and storage of the syllable-unit method grow far less than those of the word-unit method. In practice, however, large-vocabulary word speech recognition still requires a large absolute processing load and a large word dictionary unit. For a word speech recognition device targeting large-vocabulary word sets, the problem is therefore to reduce both further.

An object of the present invention is to provide a word speech recognition device that uses syllables as the recognition unit. The device is to require less processing and less word dictionary storage than conventional devices.

[Means for solving problems]

The means adopted by the present invention to solve the above problems of the conventional syllable-unit method are explained with reference to FIG. 1.

FIG. 1 is a block diagram showing the basic configuration of the present invention.

In FIG. 1, 110 is a syllable candidate string extraction unit. It compares the input spoken word with a series of pre-registered standard patterns, one per syllable, and outputs a single syllable candidate string.

120 is a word recognition unit. It has an inter-syllable distance memory 121 that stores the distance between every pair of syllables. Among the character sequences of the words to be recognized, the unit selects those of each word having the same number of syllables as the syllable candidate string. It obtains the per-digit distance between each such character sequence and the syllable candidate string from the inter-syllable distance memory 121. It outputs the word with the smallest sum of distances as the recognition result.
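The matching performed by the word recognition unit 120 can be sketched as follows. Words whose syllable count differs from the candidate string are excluded, and the remaining words are scored by summing one table lookup per digit. This is a minimal sketch. The function names, the romanized spelling (with "φ" for the virtual consonant introduced later) and the 0/1 stand-in distance `toy_dist` are hypothetical placeholders for the inter-syllable distance memory 121.

```python
import math
from typing import Callable, Dict, List, Optional, Tuple

DistFn = Callable[[str, str], float]


def string_distance(candidate: List[str], word: List[str], dist: DistFn) -> float:
    """Sum of per-digit syllable distances; a length mismatch gives infinity."""
    if len(word) != len(candidate):
        return math.inf
    return sum(dist(c, w) for c, w in zip(candidate, word))


def recognize(candidate: List[str], dictionary: Dict[str, List[str]],
              dist: DistFn) -> Tuple[Optional[str], float]:
    """Return (word, distance) for the word with the smallest summed distance."""
    best_word, best_dist = None, math.inf
    for word, syllables in dictionary.items():
        d = string_distance(candidate, syllables, dist)
        if d < best_dist:  # strict '<': the first word listed wins a tie
            best_word, best_dist = word, d
    return best_word, best_dist


def toy_dist(a: str, b: str) -> float:
    """Hypothetical stand-in for memory 121: 0 if identical, else 1."""
    return 0.0 if a == b else 1.0


dictionary = {
    "Akita": ["φa", "ki", "ta"],
    "Sakai": ["sa", "ka", "φi"],
    "Okinawa": ["φo", "ki", "na", "wa"],
}
print(recognize(["ha", "ki", "sa"], dictionary, toy_dist))  # ('Akita', 2.0)
```

If no word has the right length, the sketch returns `(None, inf)` rather than an arbitrary word.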

A character sequence is the sequence of syllables that make up a word, arranged in pronunciation order. A digit expresses the position of each syllable: counting from the beginning, the syllables of a word are called the first digit, the second digit, and so on.

[Operation]

When a spoken word is input, for example from a microphone, the syllable candidate string extraction unit 110 compares it with the pre-registered series of standard patterns, one per syllable. It extracts a syllable candidate string and feeds it to the word recognition unit 120.

This syllable candidate string is simply a sequence of syllables and carries no distance information.

Among the character sequences of the words to be recognized, the word recognition unit 120 selects those of each word having the same number of syllables as the input syllable candidate string. It obtains the per-digit distance between each such character sequence and the syllable candidate string from the inter-syllable distance memory 121. It outputs the word with the smallest sum of distances as the recognition result.

Because only one syllable candidate string is needed, less processing is required both for candidate extraction and for word recognition. In addition, the device can connect to a host system and receive the word set information for a desired category. This allows a word speech recognition system with a virtually unlimited or large vocabulary to be built from small-capacity storage devices.

[Embodiment]

An embodiment of the present invention is described with reference to FIGS. 2 to 4.

FIG. 2 is a block diagram of the configuration of one embodiment of the present invention. FIG. 3 illustrates one embodiment of the inter-syllable distance memory, and FIG. 4 illustrates another.

(A) Configuration of the embodiment

In FIG. 2, the syllable candidate string extraction unit 110, word recognition unit 120 and inter-syllable distance memory 121 are as described for FIG. 1.

130 is a microphone, which feeds the word spoken by the speaker to the syllable candidate string extraction unit 110.

140 is a host system called an expert system, which performs various processes based on the recognized words. It also sends the word recognition unit 120 the word set information for the category to be recognized.

In the syllable candidate string extraction unit 110, 111 is a parameter extraction unit. It extracts from the input speech the parameters that represent the features of the speech pattern. 112 is an interval detection unit. It detects the interval of each single syllable based on the parameters extracted by the parameter extraction unit 111.

113 is a monosyllable dictionary unit. Standard patterns for each syllable are registered in it in advance, either extracted from specific spoken words or created from monosyllabic speech.

114 is a syllable matching unit. It matches the parameters of the syllable-segmented word speech pattern from the interval detection unit 112 against each series of per-syllable standard patterns in the monosyllable dictionary unit 113. It outputs the standard pattern series with the smallest distance as the syllable candidate string.

115 is a switching circuit. It routes the output of the interval detection unit 112 either to the monosyllable dictionary unit 113 or to the syllable matching unit 114.
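The output of the syllable matching unit 114 can be sketched under one simplifying assumption: the interval detection unit 112 has already cut the input into one fixed-length feature vector per syllable. With a fixed segmentation, the series with the smallest total distance is obtained by choosing the nearest standard pattern independently for each segment. The feature values, the squared-Euclidean distance and the function names are hypothetical.

```python
from typing import Dict, List, Sequence


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def syllable_candidate_string(segments: List[Sequence[float]],
                              patterns: Dict[str, Sequence[float]]) -> List[str]:
    """Label each detected segment with its nearest syllable standard pattern."""
    return [min(patterns, key=lambda syl: sq_dist(seg, patterns[syl]))
            for seg in segments]


# Hypothetical contents of the monosyllable dictionary unit 113.
patterns = {"ha": [1.0, 0.0], "ki": [0.0, 1.0], "sa": [1.0, 1.0]}
segments = [[0.9, 0.1], [0.1, 0.8], [0.8, 0.9]]
print(syllable_candidate_string(segments, patterns))  # ['ha', 'ki', 'sa']
```

Only the labels are passed on and the distances are discarded. This matches the statement that the candidate string carries no distance information.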

In the word recognition unit 120, 122 is a used word dictionary unit. The word information for the category to which the input spoken word belongs is registered in it. The expert system sends this word information in advance, before the spoken word is input.

123 is a character string matching unit. It reads the character sequences of the word information from the used word dictionary unit 122 and selects those of each word having the same number of syllables as the input syllable candidate string. It obtains the per-digit distance between each such character sequence and the syllable candidate string from the inter-syllable distance memory 121. It sends the word with the smallest sum of distances to the expert system 140 as the recognition result.

(B) Operation of the embodiment

The operation of the embodiment of FIG. 2 is explained with reference to FIGS. 3 and 4.

Before recognition of a word spoken by a speaker (not shown) begins, two things are registered:
- standard patterns for each syllable, in the monosyllable dictionary unit 113;
- the word information for the category to which the input spoken word belongs, in the used word dictionary unit 122.

To register per-syllable standard patterns in the monosyllable dictionary unit 113, the switching circuit 115 is connected to the monosyllable dictionary unit 113 side. Specific spoken words or monosyllabic speech are then input from the microphone 130. The parameter extraction unit 111 and interval detection unit 112 extract the per-syllable standard patterns from this input, and the patterns are registered in the monosyllable dictionary unit 113. The configuration and operation of the parameter extraction unit 111 and interval detection unit 112 are well known, so a detailed description is omitted.

There are only about 100 monosyllables, and their number does not depend on the number of words. The capacity of the monosyllable dictionary unit 113 therefore does not grow, even for large-vocabulary word speech recognition.

The expert system 140 sends the word set information for the category to which the input spoken word belongs, and it is stored in the used word dictionary unit 122. When that category changes, the expert system 140 sends the word set information for the new category, and the contents of the used word dictionary unit 122 are rewritten.
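The category switching can be modelled as a dictionary whose entire contents the host overwrites, so its capacity only has to fit one category at a time. This is a hypothetical sketch. The class name, the `host_categories` data and the romanized entries are illustrative, and the text does not describe the transfer format. The `words_with_length` helper reflects the syllable-count filter applied during matching.

```python
from typing import Dict, List, Optional


class UsedWordDictionary:
    """Hypothetical model of unit 122: holds only the current category's word set."""

    def __init__(self) -> None:
        self.category: Optional[str] = None
        self.words: Dict[str, List[str]] = {}

    def load_category(self, category: str, word_set: Dict[str, List[str]]) -> None:
        """Overwrite the whole contents with the word set sent by the host."""
        self.category = category
        self.words = {word: list(syls) for word, syls in word_set.items()}

    def words_with_length(self, n: int) -> Dict[str, List[str]]:
        """Words whose syllable count equals n (the only ones worth matching)."""
        return {w: s for w, s in self.words.items() if len(s) == n}


# Hypothetical word sets held by the host (expert) system.
host_categories = {
    "place names": {"Akita": ["φa", "ki", "ta"], "Sakai": ["sa", "ka", "φi"]},
    "commands": {"kaishi": ["ka", "φi", "shi"]},
}
d = UsedWordDictionary()
d.load_category("place names", host_categories["place names"])
print(sorted(d.words_with_length(3)))  # ['Akita', 'Sakai']
d.load_category("commands", host_categories["commands"])
print(sorted(d.words))  # ['kaishi']
```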

In this way, a word speech recognition system with a virtually unlimited or large vocabulary can be built using a small-capacity used word dictionary unit 122.

Next, to recognize an input spoken word, the switching circuit 115 is connected to the syllable matching unit 114 side.

A spoken word is input through the microphone 130. As in the registration of standard patterns described above, the parameter extraction unit 111 and interval detection unit 112 create a syllable-segmented input word speech pattern from it. This pattern is fed to the syllable matching unit 114.

The syllable matching unit 114 matches the syllable-segmented input word speech pattern from the interval detection unit 112 against the parameters of each per-syllable standard pattern series in the monosyllable dictionary unit 113. It extracts the syllable series of the standard pattern with the smallest distance as the syllable candidate string. This extraction can be performed, for example, by DP (dynamic programming) matching. The syllable candidate string with the smallest distance, that is, the first-ranked one, is extracted as the desired syllable candidate string. Because the second and lower-ranked candidate strings need not be computed, the processing load is reduced accordingly.
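The text names DP (dynamic programming) matching without giving its recurrence. The sketch below uses the common symmetric dynamic-time-warping form with a Euclidean frame distance, which is an assumption. `best_syllable` keeps only the first-ranked syllable, mirroring the point that lower-ranked candidates are not needed. All names and data are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Frames = List[Sequence[float]]


def dtw_distance(x: Frames, y: Frames) -> float:
    """Accumulated distance of the best monotonic alignment of two frame sequences."""
    n, m = len(x), len(y)
    acc = [[math.inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(x[i - 1], y[j - 1])  # local frame distance
            acc[i][j] = cost + min(acc[i - 1][j],      # x advances
                                   acc[i][j - 1],      # y advances
                                   acc[i - 1][j - 1])  # both advance
    return acc[n][m]


def best_syllable(segment: Frames, patterns: Dict[str, Frames]) -> str:
    """First-ranked syllable for one segment; lower ranks are never computed."""
    return min(patterns, key=lambda syl: dtw_distance(segment, patterns[syl]))


# Hypothetical one-dimensional frames: a time-stretched copy aligns at zero cost.
print(dtw_distance([[0.0], [1.0], [2.0]], [[0.0], [0.0], [1.0], [2.0]]))  # 0.0
```

Each comparison fills an n × m table. The table lookups of the character string matching unit avoid this per-word cost.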

The character string matching unit 123 reads the character sequences of the word information from the used word dictionary unit 122. It selects those of each word having the same number of syllables as the input syllable candidate string and obtains the per-digit distance to the candidate string from the inter-syllable distance memory 121. It outputs the word with the smallest sum of distances as the recognition result.

Next, the operation of the character string matching unit 123 is explained concretely with reference to the inter-syllable distance memories of FIGS. 3 and 4.

The inter-syllable distance memory of FIG. 3(A) shows, in matrix form, the distances between syllables in terms of their vowels (a, i, u, e, o, N). For example, the distance between a and i is "50", and the distance between u and o is "20". The distance between identical vowels is naturally "0". N denotes "ん", the syllabic nasal.
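A lookup into the FIG. 3(A) matrix can be sketched as below. Storing each unordered pair once assumes that the matrix is symmetric, so that the distance from a to i equals that from i to a. The text implies this but does not state it. `VOWEL_DIST` and `vowel_distance` are hypothetical names.

```python
from typing import Dict, FrozenSet

# Only the two vowel distances quoted in the text are filled in; the other
# pairs among a, i, u, e, o and N would come from FIG. 3(A), not reproduced here.
VOWEL_DIST: Dict[FrozenSet[str], int] = {
    frozenset({"a", "i"}): 50,
    frozenset({"u", "o"}): 20,
}


def vowel_distance(v1: str, v2: str) -> int:
    """Look up a vowel distance; identical vowels give 0 and order does not matter."""
    if v1 == v2:
        return 0
    return VOWEL_DIST[frozenset({v1, v2})]  # KeyError for pairs not listed above


print(vowel_distance("i", "a"), vowel_distance("o", "u"), vowel_distance("e", "e"))  # 50 20 0
```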

The inter-syllable distance memory of FIG. 3(B) likewise shows, in matrix form, the distances between syllables in terms of their consonants. There are about 20 kinds, such as φ, k, s, t, ..., p. For example, the distance between k and [illegible] is "5", and the distance between k and n is "10". The distance between identical consonants is naturally "0".

"φ" denotes a virtual consonant assigned to the vowel syllables "あ, い, う, え, お" (a, i, u, e, o). For example, the vowel "あ" is written "(φ)a". Every syllable is thus formally a combination of a consonant and a vowel, so processing can be done uniformly and efficiently.
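The effect of the virtual consonant φ can be sketched as a function that turns any romanized syllable into a (consonant, vowel) pair. N appears in the vowel table of FIG. 3(A), so ん (N) is treated here as φ + N. That reading is an assumption, as are the romanized spellings and the function name.

```python
from typing import Tuple

PHI = "φ"  # virtual consonant for vowel-only syllables
VOWELS = {"a", "i", "u", "e", "o", "N"}


def split_syllable(syllable: str) -> Tuple[str, str]:
    """Split a romanized syllable into (consonant, vowel), inserting φ when needed."""
    if syllable.startswith(PHI):  # already written as "(φ)a", i.e. "φa"
        syllable = syllable[len(PHI):]
    if syllable in VOWELS:  # あ, い, う, え, お and (assumed) ん
        return PHI, syllable
    if not syllable or syllable[-1] not in VOWELS:
        raise ValueError(f"not a consonant-vowel syllable: {syllable!r}")
    return syllable[:-1], syllable[-1]  # everything before the vowel is the consonant


print(split_syllable("ka"), split_syllable("a"), split_syllable("φu"))
```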

The distance between syllables is determined by taking into account phonetic similarity, results obtained from actual word recognition, and similar evidence. The same applies to the inter-syllable distance memory of FIG. 4, described next.

The inter-syllable distance memory of FIG. 4 shows, in matrix form, the distances among about 100 kinds of syllables such as "あ, い, う, え, ..., ん" (a, i, u, e, ..., N). Each distance in it matches the distance obtained from FIGS. 3(A) and 3(B).
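One way to derive a FIG. 4 style table from the two FIG. 3 tables is to add the consonant distance and the vowel distance. The text does not state the combining rule. Addition is an assumption, chosen because it reproduces the worked example given later (5 + 0 + 4 = 9). The sketch also counts entries to show the trade-off: the full table needs one lookup per digit, while the two small tables need two lookups and an addition. The tiny tables and their values are hypothetical placeholders.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

Syl = Tuple[str, str]  # (consonant, vowel)


def build_syllable_matrix(consonants: List[str], vowels: List[str],
                          c_dist: Callable[[str, str], int],
                          v_dist: Callable[[str, str], int]) -> Dict[Tuple[Syl, Syl], int]:
    """Precompute every syllable-pair distance as consonant distance + vowel distance."""
    syllables = [(c, v) for c in consonants for v in vowels]
    return {(s1, s2): c_dist(s1[0], s2[0]) + v_dist(s1[1], s2[1])
            for s1, s2 in product(syllables, repeat=2)}


# Hypothetical two-consonant, two-vowel tables (values are placeholders).
C = {frozenset({"φ", "k"}): 3}
V = {frozenset({"a", "u"}): 7}


def c_dist(x: str, y: str) -> int:
    return 0 if x == y else C[frozenset({x, y})]


def v_dist(x: str, y: str) -> int:
    return 0 if x == y else V[frozenset({x, y})]


matrix = build_syllable_matrix(["φ", "k"], ["a", "u"], c_dist, v_dist)
print(matrix[(("k", "a"), ("φ", "u"))])  # 10
print(len(matrix))  # 16 entries for 4 syllables

# Rough sizing with the counts given in the text (about 20 consonants,
# 6 vowel classes, about 100 syllables), storing full square matrices:
print(20 * 20 + 6 * 6, 100 * 100)  # 436 10000
```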

For example, the distance between "か" (ka) and "う" ((φ)u) has the same value, "55", whichever inter-syllable distance memory is used.

There are only about 100 syllables, and their number does not depend on the number of words. Either inter-syllable distance memory therefore allows large-vocabulary word speech to be recognized with a small memory.

Now suppose that the syllable candidate string input from the syllable matching unit 114 consists of the three syllables "ha ki sa".

The character string matching unit 123 reads from the used word dictionary unit 122 the character sequences of each word with the same number of syllables as the candidate string, namely three. Referring to the inter-syllable distance memory, it obtains the per-digit distance to the input syllable candidate string.

Table 1 shows an example that uses the inter-syllable distance memory of FIGS. 3(A) and 3(B). It obtains the distances between the syllable candidate string "ha ki sa" and the character sequences of two three-syllable words read from the used word dictionary unit 122: "(φ)a ki ta" for Akita (秋田) and "sa ka (φ)i" for Sakai (堺).

Compare "ha ki sa" with the character sequence "(φ)a ki ta" for Akita. The distance between "ha" and "(φ)a" is "5", between "ki" and "ki" it is "0", and between "sa" and "ta" it is "4", for a total of "9".

Similarly, the distance between "ha ki sa" and the character sequence "sa ka (φ)i" for Sakai is "106". The inter-syllable distance memory of FIG. 4 gives the same distance values.

Table 1(C) shows a word with a different number of syllables, Okinawa (沖縄), "(φ)o ki na wa". Such a word can be regarded as having no chance of being the recognized word. It is therefore assigned a distance or code, such as ∞ (infinity), that excludes it from recognition.

An inter-syllable distance memory such as those of FIGS. 3 and 4 is used to obtain the distance between the syllable candidate string and the character sequences of the words read from the used word dictionary unit 122. This requires far less processing than conventional DP matching.

The word with the smallest distance obtained in this way ("Akita" in the example of Table 1) is extracted as the recognized word and sent to the expert system 140.

One embodiment of the present invention has been described above, but the invention can also be carried out as follows.

The used word dictionary unit 122 may be omitted from the word recognition unit 120. In that case, the same content as the character sequences that would be read from it is received directly from the expert system 140 during word recognition.

In addition, the parameter-level matching in the syllable candidate string extraction unit 110 may not detect the number of syllables with high accuracy. In that case, several syllable series with different numbers of syllables are extracted from the unknown input spoken word. For each length, a syllable candidate string and its accumulated distance are obtained and fed to the character string matching unit 123 of the word recognition unit 120.

The character string matching unit 123 performs character string matching for each of these candidate strings. It obtains the recognition result from a weighted sum of the parameter-level matching distance and the character string matching distance, which improves accuracy.
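The variant with several length hypotheses can be sketched as follows. Each hypothesis carries its own candidate string and accumulated parameter-level distance. Only words of matching length are compared, and the score is a weighted sum of the two distances. The weights `w_param` and `w_string`, the 0/1 stand-in distance and all data are hypothetical, and the text does not give weight values.

```python
import math
from typing import Callable, Dict, List, Optional, Sequence, Tuple

DistFn = Callable[[str, str], float]


def recognize_multi_length(hypotheses: Sequence[Tuple[List[str], float]],
                           dictionary: Dict[str, List[str]],
                           dist: DistFn,
                           w_param: float = 1.0,
                           w_string: float = 1.0) -> Tuple[Optional[str], float]:
    """Pick the word minimizing w_param * parameter distance + w_string * string distance."""
    best_word, best_score = None, math.inf
    for candidate, param_dist in hypotheses:
        for word, syllables in dictionary.items():
            if len(syllables) != len(candidate):
                continue  # only words of the hypothesized length are compared
            string_dist = sum(dist(c, w) for c, w in zip(candidate, syllables))
            score = w_param * param_dist + w_string * string_dist
            if score < best_score:
                best_word, best_score = word, score
    return best_word, best_score


def toy_dist(a: str, b: str) -> float:
    """Hypothetical stand-in for the inter-syllable distance memory: 0 or 1."""
    return 0.0 if a == b else 1.0


dictionary = {"Akita": ["φa", "ki", "ta"], "Okinawa": ["φo", "ki", "na", "wa"]}
hypotheses = [(["ha", "ki", "sa"], 10.0), (["ha", "ki", "sa", "a"], 12.0)]
print(recognize_multi_length(hypotheses, dictionary, toy_dist))  # ('Akita', 12.0)
```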

[Effect of the Invention]

As explained above, the present invention provides the following effects.

(a) The device connects easily to a host device such as an expert system and can receive the desired word set information from it. The words to be recognized can therefore be changed at any time using various small-capacity storage devices. This makes it possible to build a word speech recognition system with a virtually unlimited or large vocabulary.

(b) Only the first-ranked syllable candidate string, the one with the smallest distance, is extracted for word recognition. This reduces the processing required for candidate extraction and for word recognition.

[Brief explanation of drawings]

FIG. 1: explanatory diagram of the basic configuration of the present invention.
FIG. 2: explanatory diagram of one embodiment of the present invention.
FIG. 3: explanatory diagram of the inter-syllable distance memory used in that embodiment.
FIG. 4: explanatory diagram of another inter-syllable distance memory used in that embodiment.
FIG. 5: explanatory diagram of conventional word speech recognition methods.

In FIGS. 1 and 2: 110, syllable candidate string extraction unit; 120, word recognition unit; 121, inter-syllable distance memory; 122, used word dictionary unit; 123, character string matching unit; 130, microphone; 140, expert system.

Claims

[Claims]

(1) A word speech recognition device whose recognition unit is the syllable, comprising:
(a) a syllable candidate string extraction unit (110) that compares an input spoken word with a series of pre-registered standard patterns corresponding to syllables and outputs a single syllable candidate string; and
(b) a word recognition unit (120) that:
- has an inter-syllable distance memory (121) storing the distance between every pair of syllables;
- selects, among the character sequences corresponding to the words to be recognized, the character sequence of each word having the number of syllables corresponding to that of the syllable candidate string;
- obtains from the inter-syllable distance memory (121) the per-digit distance between each such character sequence and the syllable candidate string; and
- outputs the word with the smallest sum of these distances as the recognition result.

(2) The word speech recognition device according to claim 1, wherein the word recognition unit (120):
- comprises a used word dictionary unit (122) that, before the spoken word is input, receives from a host device and stores the word information of the category to which the input spoken word belongs;
- obtains from the inter-syllable distance memory (121) the per-digit distance between the syllable candidate string and the character sequence of each word, read from the used word dictionary unit (122), that has the number of syllables corresponding to that of the syllable candidate string; and
- outputs the word with the smallest sum of distances as the recognition result.

(3) The word recognition unit (120) receives from a host device the word information of the category to which the input spoken word belongs. Among the character sequences corresponding to that word information, it selects the character sequence of each word having the number of syllables corresponding to that of the syllable candidate string. It obtains from the inter-syllable distance memory (121) the per-digit distance between each such character sequence and the syllable candidate string. It outputs the word with the smallest sum of distances as the recognition result.