JPS6057395A

JPS6057395A - Voice recognition equipment

Info

Publication number: JPS6057395A
Application number: JP16554783A
Authority: JP
Inventors: 郁夫井上; 二矢田　勝行; 藤井　諭; 森井　秀司
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1983-09-08
Filing date: 1983-09-08
Publication date: 1985-04-03
Also published as: JPH042198B2

Abstract

(57) [Abstract] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

DETAILED DESCRIPTION OF THE INVENTION

Field of Industrial Application
The present invention relates to a speech recognition device that enables a machine to perform operations in response to speech uttered by a human.

Conventional Configurations and Their Problems
Speech recognition devices that recognize human speech are attracting attention as an effective input means for carrying out work smoothly in many fields. Examples include input to computers and word processors and the sorting of goods by voice.

In a conventional word speech recognition device, a word spoken in isolation is input through a microphone and acoustically analyzed. Feature parameters are then extracted and compared with word reference patterns stored as a word dictionary. The word with the highest degree of certainty (likelihood) is output as the recognition result.

With this method, however, enlarging the recognition vocabulary increases the proportion of similar words in the dictionary, and the recognition rate drops sharply. To keep the recognition rate at a practical level, designers have had to choose among three workarounds:

- limit the vocabulary to a small number of words;
- give the device an interactive function that outputs the next most likely word when the first is wrong;
- output several candidates and have the user choose one.

These workarounds either restrict the range of applications or make operation laborious.

Object of the Invention
The object of the present invention is to solve the above problems of conventional word speech input devices. It aims to realize a more practical speech recognition device that can handle a large vocabulary, achieves a high recognition rate, and is simple to operate.

Structure of the Invention
To achieve the above object, the present invention provides a speech recognition device comprising at least the following units.

A word dictionary group storage unit holds n word dictionary strings. Each string has at least one word dictionary consisting of a plurality of word dictionary entries. The strings are arranged so that a correspondence exists between the word dictionary entries making up the word dictionaries of the i-th string (i = 1, ..., n-1) and the word dictionaries in the (i+1)-th string.

A word dictionary selection unit receives speech consisting of n words. According to the recognition result for the i-th input word, it selects the corresponding word dictionary in the (i+1)-th string of the word dictionary group storage unit.

A word recognition unit compares the (i+1)-th input word with the selected corresponding word dictionary. It recognizes the (i+1)-th input word based on the degree of certainty.

Description of the Embodiment
When we look for a particular item among many, searching through them one by one from the beginning takes a great deal of time. Suppose instead that the items are systematically classified and organized according to their attributes. Then a particular item can be found easily by following its attributes in order, starting from the broadest classification.

The same approach applied in a word speech recognition device makes recognition more efficient. It also yields more reliable recognition results.

Consider a word speech recognition device that handles a large vocabulary, where the main goal is to recognize one word out of many more accurately. This goal can be achieved by the approach described above.

For example, suppose the recognition targets are Japanese city names. A tree-type hierarchical structure is prepared in advance:

- the city names belonging to each city or ward form one city-name word group;
- the names of the cities and wards to which the cities belong are collected into a city (ward) name word group;
- in the same way, the names of the prefectures to which the cities and wards belong are collected into a prefecture-name word group.

In a single input, the speaker utters three words in succession: prefecture name, city (ward) name, city name. The recognition device processes the layers in this order and obtains several recognition candidates at each layer. It then determines the city name from two things: how the candidates of the different layers are positioned relative to one another on the tree, and the likelihood of each word. This makes it possible to recognize words from a large vocabulary without a substantial drop in recognition rate.
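The tree-type hierarchy described above can be sketched as a mapping from the path of higher-layer words to the word dictionary searched at the next layer. This is a minimal sketch under our own assumptions. The flat (prefecture, city_or_ward, city) records, the helper name build_dictionary_tree, and the romanized sample names are illustrative and are not taken from the patent. Each dictionary is keyed by the full path rather than by the parent word alone. This keeps identically named wards in different prefectures (for example, the Chuo-ku in Tokyo and the one in Osaka) in separate dictionaries.

```python
from collections import defaultdict


def build_dictionary_tree(records):
    """Group flat records into a tree of word dictionaries (hypothetical helper).

    Each record is a tuple of words ordered from the top layer down, e.g.
    (prefecture, city_or_ward, city). The returned dict maps the tuple of
    higher-layer words (the empty tuple for the first layer) to the sorted
    word dictionary that must be searched at the next layer.
    """
    children = defaultdict(set)
    for record in records:
        for depth in range(len(record)):
            # record[:depth] is the path above this layer; record[depth] is the word in it
            children[tuple(record[:depth])].add(record[depth])
    return {path: sorted(words) for path, words in children.items()}


# Illustrative sample data only.
records = [
    ("Tokyo", "Chiyoda-ku", "Kanda"),
    ("Tokyo", "Chiyoda-ku", "Marunouchi"),
    ("Tokyo", "Chuo-ku", "Ginza"),
    ("Osaka", "Chuo-ku", "Shinsaibashi"),
]
tree = build_dictionary_tree(records)
print(tree[()])                       # ['Osaka', 'Tokyo']
print(tree[("Tokyo",)])               # ['Chiyoda-ku', 'Chuo-ku']
print(tree[("Tokyo", "Chiyoda-ku")])  # ['Kanda', 'Marunouchi']
```

Only the bottom layer holds the words actually being recognized. The upper-layer dictionaries are the extra entries that the hierarchy costs.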

Furthermore, with this method the whole utterance is given in a single input. The cumbersome question-and-answer procedures of conventional methods are therefore unnecessary.

The likelihood values used for word recognition can be obtained as follows.

Let Ω be a word dictionary and D a dictionary entry. For given input information X, the recognition result $\hat{D}$ is the entry D that maximizes the posterior probability of D given X:

$$\hat{D} = \arg\max_{D \in \Omega} P(D \mid X)$$

By Bayes' theorem,

$$P(D \mid X) = \frac{P(X \mid D)\,P(D)}{P(X)}$$

$P(D)$ may be taken as constant, and $P(X)$ is the same for every D. It therefore suffices to find the D that satisfies

$$\hat{D} = \arg\max_{D \in \Omega} P(X \mid D)$$

The value of $P(X \mid D)$ for that D is the likelihood of the dictionary entry.
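The decision rule above reduces to picking the entry with the largest likelihood. This is a minimal sketch. It assumes the likelihoods are supplied as log P(X | D) values, since working in logs avoids numeric underflow. The function name recognize and the sample numbers are hypothetical. Passing log priors shows how the decision would change if P(D) were not taken as constant.

```python
import math


def recognize(log_likelihoods, log_priors=None):
    """Return the dictionary entry D that maximizes P(D | X) (hypothetical helper).

    log_likelihoods maps each entry D of the word dictionary to log P(X | D).
    With log_priors=None every P(D) is treated as equal, so the result is the
    entry with the largest likelihood. P(X) is common to all entries and is
    never computed.
    """
    def log_posterior_up_to_constant(entry):
        prior = 0.0 if log_priors is None else log_priors[entry]
        return log_likelihoods[entry] + prior

    return max(log_likelihoods, key=log_posterior_up_to_constant)


# Illustrative likelihoods P(X | D) for one input word.
scores = {"Tokyo": math.log(0.020), "Kyoto": math.log(0.015), "Osaka": math.log(0.001)}
print(recognize(scores))  # Tokyo

# A non-uniform prior P(D) can change the decision.
priors = {"Tokyo": math.log(0.1), "Kyoto": math.log(0.8), "Osaka": math.log(0.1)}
print(recognize(scores, priors))  # Kyoto
```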

As described above, in the present invention several words are uttered in one input, and recognition proceeds hierarchically:

- the first word is recognized from among a first attribute (for example, prefecture names);
- the second word is recognized from among a second attribute (for example, city names) determined by the recognition result of the first word;
- the third word is recognized from among a third attribute (for example, ward names) determined by the recognition result of the second word.

The details of this embodiment are explained below with reference to the drawings.

FIG. 1 shows an example of the structure of the word dictionary tree used for recognition in this embodiment. FIG. 2 shows an example of the configuration of a word speech recognition device according to this embodiment. This embodiment is described with reference to FIGS. 1 and 2, taking the recognition of city name words as an example.

In FIG. 1, reference numerals 1 to 6 each denote a word dictionary. Reference numeral 6 denotes the word dictionary string of the first layer, 7 that of the second layer, and 8 that of the third layer. Each word dictionary consists of words having the attribute indicated by a word in the layer above.

The three strings hold the following dictionaries:

- The third-layer word dictionary string 8 holds the city name words to be recognized.
- The second-layer word dictionary string 7 holds word dictionaries of the names of the cities (wards) to which the cities belong.
- The first-layer word dictionary string 6 holds word dictionaries of the names of the prefectures to which the cities (wards) belong.

To recognize a city name word, the speaker utters three words in one input, pausing between them: the name of the prefecture to which the city belongs, then the city (ward) name, then the city name. At each layer the input word is matched against the dictionary words. The words with the highest resulting likelihoods become the recognition candidates, in descending order of likelihood.

First, consider the simplest case. The top-ranked candidate at each layer is followed down the tree from the top layer, and the recognized city name is output. We describe the case where the target city name is /KANDA/ (Kanda). The speaker utters /TOOKYOOTO/ (Tokyo), /CHIYODAKU/ (Chiyoda-ku), and /KANDA/ (Kanda), pausing between words.

In FIG. 2, the hierarchy counter 13 is initially set to 1, and the word dictionary selection unit 12 selects the prefecture name dictionary in the first-layer word dictionary string 6. The three words are then processed as follows.

First word. When the first word is input from the voice input unit 9, the word recognition unit 10 retrieves the prefecture name dictionary selected by the word dictionary selection unit 12 from the word dictionary group storage unit 11. It recognizes one of its words (for example, Tokyo) and sends the result to the word determination unit 14 and the word dictionary selection unit 12. The hierarchy counter 13 is then incremented by 1, indicating that the next recognition is against the second-layer word dictionary string 7. The word dictionary selection unit 12 uses the recognition result of the first word and the value of the hierarchy counter 13 to select the word dictionary to which the next input word should belong.

Second word. When the second word is input from the voice input unit 9, the word recognition unit 10 retrieves the ward name dictionary in the second-layer word dictionary string 7, as selected by the word dictionary selection unit 12, from the word dictionary group storage unit 11. It recognizes one of its words (for example, Chiyoda-ku) and sends the result to the word determination unit 14 and the word dictionary selection unit 12. The hierarchy counter 13 is incremented by 1 again, indicating that the next recognition is against the third-layer word dictionary string 8. The word dictionary selection unit 12 uses the recognition result of the second word and the value of the hierarchy counter 13 to select the word dictionary to which the next input word should belong.

Third word. When the third word is input to the voice input unit 9, the word recognition unit 10 retrieves the city name dictionary in the third-layer word dictionary string 8, as selected by the word dictionary selection unit 12, from the word dictionary group storage unit 11. It recognizes one of its words (for example, Kanda) and sends the result to the word determination unit 14.

Once word recognition has been completed for all layers in this way, the word determination unit 14 holds the recognition result /Tokyo/Chiyoda-ku/Kanda/. All or part of this result is output as the final recognition result.

In this case, /KANDA/ is recognized only if all three layers are recognized correctly: /TOOKYOOTO/ in the first-layer string, /CHIYODAKU/ in the second-layer string, and /KANDA/ in the third-layer string. However, each layer's word dictionary has far fewer entries than a conventional dictionary that puts all city names into a single word dictionary. The word recognition rate at each layer is therefore very high, and so is their product, which is the final word recognition rate.
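The single-candidate procedure of FIG. 2 can be sketched as a loop. Each iteration corresponds to one increment of the hierarchy counter 13, and the path of words recognized so far selects the next dictionary, as unit 12 does. This is a minimal sketch under stated assumptions:

- The tree is a dict keyed by the path of higher-layer words.
- toy_likelihood is a hypothetical stand-in for acoustic matching. It scores the string similarity of romanized text with difflib.SequenceMatcher, whereas a real device would match feature parameters against reference patterns.
- The sample tree contains only the branches this example needs.

```python
from difflib import SequenceMatcher


def toy_likelihood(utterance, entry):
    # Hypothetical stand-in for acoustic matching: similarity of romanized strings.
    return SequenceMatcher(None, utterance.lower(), entry.lower()).ratio()


def recognize_cascade(tree, utterances, likelihood=toy_likelihood):
    """Recognize one word per layer, keeping only the best candidate (FIG. 3 case)."""
    path = ()
    for utterance in utterances:          # one iteration per hierarchy-counter value
        dictionary = tree[path]           # dictionary selected by the words so far
        best = max(dictionary, key=lambda entry: likelihood(utterance, entry))
        path += (best,)                   # this result drives the next selection
    return path


# Illustrative tree containing only the branches needed for the example.
tree = {
    (): ["TOOKYOOTO", "KYOOTOFU", "OOSAKAFU"],
    ("TOOKYOOTO",): ["CHIYODAKU", "CHUUOOKU", "MINATOKU"],
    ("TOOKYOOTO", "CHIYODAKU"): ["KANDA", "MARUNOUCHI", "KOOJIMACHI"],
}
print(recognize_cascade(tree, ["TOKYOTO", "CHIYODAKU", "KANDA"]))
```

With a full tree, a wrong first-layer choice would send every later layer to the wrong dictionary. That weakness is what the multi-candidate variant described next addresses.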

As in this embodiment, the number of entries in each word dictionary can be limited so that misrecognition is sufficiently rare. In that case the word determination unit 14 is not strictly necessary. FIG. 3 shows the word recognition flowchart for three input words, a three-layer word dictionary structure, and one recognition candidate per layer.

Next, suppose that at each layer several candidates are allowed instead of only one. The probability that the correct word is among them then increases further. In the example of FIG. 2, the procedure becomes:

- First word (prefecture name). The word recognition unit 10 ranks the entries of the prefecture name word dictionary by likelihood against the input word. It takes the highest-ranked entries, for example the top three, as recognition candidates and sends them to the word determination unit 14 and the word dictionary selection unit 12. The word dictionary selection unit 12 selects the three city (ward) name word dictionaries corresponding to these three candidates.
- Second word (city (ward) name). For each of the three city (ward) name word dictionaries, the word recognition unit 10 takes the entries with the highest likelihoods against the input word, for example the top three. It sends them as recognition candidates to the word determination unit 14 and the word dictionary selection unit 12. The word dictionary selection unit 12 then selects the city name word dictionaries corresponding to these candidates, nine in total.
- Third word (city name). For each of the nine city name word dictionaries, the word recognition unit 10 takes the entry with the highest likelihood against the input word. It sends that entry to the word determination unit 14 as a recognition candidate.
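Keeping several candidates per layer turns the cascade into a small beam search over the tree. This is a minimal sketch, assuming a path-keyed tree like the one sketched for the single-candidate case. The names expand_candidates, beam_sizes, and toy_likelihood, and the synthetic tree, are hypothetical. With beam_sizes=(3, 3, 1), as in the example above, the search visits three second-layer dictionaries and then nine third-layer dictionaries. The result is nine word tree sequences, each carrying its per-layer likelihoods for the decision step.

```python
from difflib import SequenceMatcher


def toy_likelihood(utterance, entry):
    # Hypothetical stand-in for acoustic matching: similarity of romanized strings.
    return SequenceMatcher(None, utterance.lower(), entry.lower()).ratio()


def expand_candidates(tree, utterances, beam_sizes, likelihood=toy_likelihood):
    """Return (path, per-layer likelihoods) for every surviving word tree sequence.

    beam_sizes[k] is how many top candidates are kept from each dictionary
    searched at layer k + 1.
    """
    hypotheses = [((), ())]                       # (words so far, likelihoods so far)
    for utterance, k in zip(utterances, beam_sizes):
        expanded = []
        for path, scores in hypotheses:
            scored = [(likelihood(utterance, w), w) for w in tree.get(path, [])]
            scored.sort(key=lambda pair: pair[0], reverse=True)  # stable for ties
            for score, word in scored[:k]:
                expanded.append((path + (word,), scores + (score,)))
        hypotheses = expanded
    return hypotheses


# Synthetic tree: 4 prefectures, 4 cities/wards each, 4 city names each.
tree = {(): ["P1", "P2", "P3", "P4"]}
for p in tree[()]:
    tree[(p,)] = [f"{p}C{j}" for j in range(1, 5)]
    for c in tree[(p,)]:
        tree[(p, c)] = [f"{c}T{t}" for t in range(1, 5)]

sequences = expand_candidates(tree, ["P2", "P2C3", "P2C3T1"], beam_sizes=(3, 3, 1))
print(len(sequences))  # 9
```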

When word recognition has been completed for all layers in this way, the word determination unit 14 holds nine word tree sequences. Each sequence links a prefecture name, a city (ward) name, and a city name. Define the likelihoods as follows:

- $S_i$ (i = 1, 2, 3): the likelihoods of the dictionary entries for the first input word, in descending order;
- $S_{ij}$ (j = 1, 2, 3): the likelihoods of the entries of the i-th dictionary for the second input word, in descending order;
- $S_{ij1}$: the largest likelihood among the entries of the ij-th dictionary for the third input word.

One possible decision rule for the word determination unit 14 computes, for each of the nine word tree sequences, the likelihood sum

$$L = S_i + S_{ij} + S_{ij1} \qquad (i = 1, 2, 3;\ j = 1, 2, 3)$$

An alternative is the likelihood sum weighted according to the number of dictionary entries,

$$L_\omega = \omega_i S_i + \omega_{ij} S_{ij} + \omega_{ij1} S_{ij1} \qquad (i = 1, 2, 3;\ j = 1, 2, 3)$$

where $\omega_i$, $\omega_{ij}$, $\omega_{ij1}$ are weights based on the number of dictionary entries at each layer. The recognition result is the word tree sequence for which this value is largest: either all of its words or only its city name word.

Several recognition candidates are allowed at each layer, and the recognition result is decided per word tree sequence from the likelihood values at all layers. This prevents the branch of the tree leading to the correct word from being dropped at an upper-layer stage, which further improves the recognition rate.
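The decision rule can then be applied to the candidate sequences. This is a minimal sketch that assumes one weight per layer. The patent ties ω to the number of dictionary entries but gives no formula here, so the weight values below are arbitrary. The helper best_sequence and the sample likelihoods are hypothetical. The example illustrates the point made above: the correct sequence wins on the summed likelihood even though a wrong prefecture had the higher first-layer likelihood.

```python
def best_sequence(candidates, weights=None):
    """Pick the word tree sequence with the largest (weighted) likelihood sum.

    candidates: list of (path, (S_i, S_ij, S_ij1)) pairs.
    weights:    per-layer weights (w_1, w_2, w_3); None gives the plain sum L.
    """
    def total(candidate):
        _, scores = candidate
        w = weights if weights is not None else (1.0,) * len(scores)
        return sum(wk * sk for wk, sk in zip(w, scores))

    return max(candidates, key=total)


# Illustrative likelihoods for two of the nine sequences.
candidates = [
    (("OOSAKAFU", "KITAKU", "UMEDA"), (0.85, 0.40, 0.60)),      # best first layer, L = 1.85
    (("TOOKYOOTO", "CHIYODAKU", "KANDA"), (0.80, 0.90, 0.95)),  # L = 2.65
]
print(best_sequence(candidates)[0])                      # ('TOOKYOOTO', 'CHIYODAKU', 'KANDA')
print(best_sequence(candidates, weights=(20, 1, 1))[0])  # ('OOSAKAFU', 'KITAKU', 'UMEDA')
```

The second call only illustrates that the weights can override the lower layers. In the patent, the weights are meant to reflect dictionary sizes, not to be chosen freely.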

FIG. 4 shows the word recognition flowchart for three input words and a three-layer word dictionary structure, with several recognition candidates allowed. Here the number of recognition candidates is fixed at m for the first input word and at n for the second input word.

In this embodiment, assume for example that every word dictionary in the first- to third-layer word dictionary strings contains 100 words. Assume also that each is an ordinary dictionary without any special recognition tendency. Consider a word speech recognition device with a speaker-independent recognition rate of 96.6% on 100 words. Under these assumptions, it can achieve a recognition rate of 90% or more on 1,000,000 words. Moreover, the additional word dictionaries needed for the hierarchical structure increase the dictionary size by only 1.1%, which is entirely practical.

In addition, consider three layers in which every word dictionary has n entries. Compare the number of matching operations between the reference patterns of the dictionary entries and the input pattern in two cases: with the hierarchical structure, selecting only the first candidate, and without the hierarchical structure. The ratio is approximately 3n : n^3. Using this embodiment therefore reduces computation time substantially.

Effects of the Invention
In short, the present invention provides a speech recognition device comprising at least three units:

- a word dictionary group storage unit holding n word dictionary strings, each having at least one word dictionary consisting of a plurality of word dictionary entries, and arranged so that a correspondence exists between the entries of the word dictionaries in the i-th string (i = 1, ..., n-1) and the word dictionaries in the (i+1)-th string;
- a word dictionary selection unit that receives speech consisting of n words and, according to the recognition result for the i-th input word, selects the corresponding word dictionary in the (i+1)-th string of the word dictionary group storage unit;
- a word recognition unit that compares the (i+1)-th input word with the selected corresponding word dictionary and recognizes the (i+1)-th input word based on the degree of certainty.

The hierarchy is built with tens to hundreds of words per word dictionary. This keeps the increase in the number of newly created dictionary entries to a few percent or less. Words can then be recognized at a high rate from vocabularies of thousands to tens of thousands of words with a two-layer structure, and from tens of thousands to millions of words with a three-layer structure. Large-vocabulary word speech recognition devices, which had not previously been put into practical use, can thus be realized easily. For large vocabularies, computation time is also greatly shortened, because the input pattern does not have to be matched against the reference patterns of every dictionary entry.
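The operation count 3n : n^3 stated for three layers of n-entry dictionaries can be checked directly. This is a minimal sketch. matching_counts is a hypothetical helper, and it counts only template matchings in the first-candidate-only case.

```python
def matching_counts(n, layers=3):
    """Template matchings per utterance (first candidate only at every layer).

    hierarchical: one n-entry dictionary searched per layer -> layers * n
    flat:         a single dictionary of all n ** layers words -> n ** layers
    """
    return layers * n, n ** layers


hierarchical, flat = matching_counts(100)
print(hierarchical, flat, flat // hierarchical)  # 300 1000000 3333
```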

[Brief explanation of drawings]

FIG. 1 shows an example of the structure of the word dictionary tree of the present invention. FIG. 2 is a block diagram of a word speech recognition device according to an embodiment of the present invention. FIGS. 3 and 4 are flowcharts of this embodiment.

Reference numerals:
- 1 to 6: word dictionaries
- 6: first-layer word dictionary string
- 7: second-layer word dictionary string
- 8: third-layer word dictionary string
- 9: voice input unit
- 10: word recognition unit
- 11: word dictionary group storage unit
- 12: word dictionary selection unit
- 13: hierarchy counter
- 14: word determination unit

Claims

[Claims]

(1) A speech recognition device comprising at least:
- a word dictionary group storage unit holding n word dictionary strings, each having at least one word dictionary consisting of a plurality of word dictionary entries, and arranged so that a correspondence exists between the entries of the word dictionaries in the i-th string (i = 1, ..., n-1) and the word dictionaries in the (i+1)-th string;
- a word dictionary selection unit that receives speech consisting of n words and, according to the recognition result for the i-th input word, selects the corresponding word dictionary in the (i+1)-th word dictionary string of the word dictionary group storage unit;
- a word recognition unit that compares the (i+1)-th input word with the selected corresponding word dictionary and recognizes the (i+1)-th input word based on its degree of certainty.

(2) The speech recognition device according to claim 1, wherein:
- the recognition results of the word recognition unit are ranked, and a plurality of recognition result candidates are selected;
- a word dictionary is selected for each of the candidates;
- the input words are recognized based on the degree of certainty across the groups of recognition result candidates for the entire input speech.