JPS62234198A

JPS62234198A - Voice recognition equipment

Info

Publication number: JPS62234198A
Application number: JP61077900A
Authority: JP
Inventors: 花岡　忠史
Original assignee: Citizen Watch Co Ltd
Current assignee: Citizen Watch Co Ltd
Priority date: 1986-04-04
Filing date: 1986-04-04
Publication date: 1987-10-14

Abstract

(57) [Abstract] This publication contains application data filed before the introduction of electronic filing, so no abstract data is recorded.

Description

[Detailed Description of the Invention] [Industrial Field of Application] The present invention relates to equipment that uses speech as a control input, and in particular to means for greatly increasing the number of spoken words that a speech recognition device can recognize without affecting its recognition error rate or the time required for recognition.

[Conventional technology]

Speech recognition devices are broadly divided into two types: speaker-dependent devices and speaker-independent devices. Various recognition algorithms have been proposed for each, but every speech recognition device proposed to date rests on the same principle: the pattern of temporal change in the frequency components, etc., of the input speech signal is compared with patterns prepared in advance, and the matching one is sought. Consequently, the more spoken words are prepared in advance, the harder accurate discrimination becomes and the longer the discrimination takes.

Attempting to solve this problem and increase the number of recognizable spoken words requires both more accurate extraction of the features of the input speech signal and a high-speed data processing device. The speech recognition device then abruptly becomes large in scale, which not only raises its cost but also destroys its portability and limits the fields in which it can be applied.

FIG. 2 is a block diagram showing an application example of a speech recognition device built into a toy robot. This example belongs to the prior art, in that it uses a function that recognizes only a limited number of spoken words; its operation is outlined below.

Speech input is converted into an electrical signal by the microphone 1, amplified and given analog preprocessing in the amplifier circuit 2, and passed to the feature extraction circuit 3.

The feature extraction circuit 3 extracts the temporal changes in the frequency components of the speech signal, generates a time-ordered series of digital data, referred to as "analysis data", and transfers it to the buffer memory 4, where it is held.

The signal processing described so far is called the "speech analysis process". Because the speech of one word is generally processed as one unit, a time of 0.1 to 2 seconds is allotted to it.

As soon as the speech analysis process ends, the "data matching process" begins. In the data matching process, the "feature data" of the ten-odd spoken words stored in the feature storage circuit 5 are compared one after another with the analysis data held in the buffer memory 4, and feature data with strong similarity is sought. The matching process proceeds as follows.

When the array number generation circuit 6 generates the voice number specifying the first spoken word, the address designation circuit 7 generates an address signal corresponding to that voice number and reads the corresponding feature data from the feature storage circuit 5. The comparison circuit 8 compares this feature data with the analysis data and quantifies their similarity. If the similarity is at or below a fixed value, the speech input is judged to differ from the first spoken word, and the array number generation circuit 6 generates the voice number specifying the second spoken word.

In this way, feature data for different spoken words are read out one after another and compared with the analysis data until feature data whose similarity is at or above the fixed value is found.

When feature data whose similarity is at or above the fixed value is found, the comparison circuit 8 outputs a speech recognition end signal. The array number generation circuit 6 then stops updating the number, and the spoken word specified by the voice number at that moment is judged to be the speech that was input.
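The sequential matching described above can be sketched in a few lines of Python. This is a minimal illustration, not the circuit itself: the patent does not state how similarity is quantified, so the `similarity` function (an inverse of the summed absolute difference), the equal-length feature vectors, and the names `match_first` and `threshold` are all assumptions made for the example.

```python
from typing import Optional, Sequence


def similarity(analysis: Sequence[float], template: Sequence[float]) -> float:
    """Hypothetical similarity score in (0, 1]; 1.0 means identical patterns.

    Assumes both patterns have the same length. The source does not define
    the measure actually used by comparison circuit 8.
    """
    distance = sum(abs(a - t) for a, t in zip(analysis, template))
    return 1.0 / (1.0 + distance)


def match_first(
    analysis: Sequence[float],
    templates: Sequence[Sequence[float]],
    threshold: float,
) -> Optional[int]:
    """Step through the voice numbers in order and stop at the first
    template whose similarity reaches the threshold."""
    for voice_number, template in enumerate(templates):
        if similarity(analysis, template) >= threshold:
            # Corresponds to the recognition end signal: the number
            # stops updating and identifies the spoken word.
            return voice_number
    return None  # no stored word was close enough
```

Because the loop visits templates one by one, the worst-case matching time grows in proportion to the number of stored words, which is the limitation the following sections address.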

The voice number found in this way and the speech recognition end signal are passed to the command execution means 9, which produces the movements of the robot's limbs and neck. The command execution means 9 decodes the voice number and operates the servo circuits needed to produce the movement matching the meaning of the speech input, for as long as necessary.

[Problem that the invention seeks to solve]

A toy robot equipped with a speech recognition device as described above can recognize spoken commands such as "advance", "stop", "back up", "turn right", "turn left", "greet", "raise your hand", "lower your hand", "grasp", and "release", and can move according to their meaning.

However, two major problems arise when the robot's motor functions are made more sophisticated and it is made to understand more commands. The first is that commands such as "go right" and "go left" are easily misrecognized as the earlier commands "turn right" and "turn left". In other words, given the basic constraint that spoken words with closely similar sounds are prone to recognition errors, the recognition error rate rises as the number of target spoken words increases.

The second is that, in the data matching process, the growing number of feature data items to be matched lengthens the time required for matching, so that rapid speech recognition becomes impossible.

The following methods have been tried in the past to solve these two problems. Against the rise in the recognition error rate, the density and information precision of both sets of data being compared have been increased, or the required level of similarity has been raised while the analysis data is processed and transformed under various conditions before comparison. Against the matching-time problem, the data processing device has been made faster to shorten the time. However, all of these measures enlarge the device and raise its price, and so created new problems of practicality.

The object of the present invention is to overcome these drawbacks of the prior art and to realize, without relying on expensive, large-scale equipment, a speech recognition device that can recognize more spoken words and understand complex content.

[Means for solving problems]

The invention concerns an electronic apparatus with a speech recognition device, comprising: speech input means; speech matching means that extracts the features of the speech signal captured by the speech input means, compares them with the feature data of a plurality of spoken words prepared in advance, determines which of the prepared feature data the speech signal corresponds to, and outputs a code signal representing the number of that feature data; and feature storage means that stores the feature data groups of the plurality of spoken words prepared in advance and supplies its stored contents to the speech matching means as needed. The speech recognition device is characterized in that the spoken words prepared in advance are classified into a plurality of groups; the feature storage means is divided into the same number of blocks as there are groups, each block storing the feature data group of one group; and block selection means is provided, such that a block selection signal output from the block selection means selects one of the blocks of the feature storage means and the feature data group stored in that block is supplied to the speech matching means.

[Effect]

In the above configuration, in the initial state the block selection means selects the first block of the feature storage means, so that only the feature data group stored in that block can be supplied to the speech matching means. When the first speech is input to the speech input means in this state, it is converted into an electrical speech signal and passed to the speech matching means. The speech matching means analyzes the speech signal to extract its features, compares them with the feature data group supplied from the first block of the feature storage means, and evaluates the similarity. When highly similar feature data is found, it outputs a code signal indicating the number of that feature data. Combined with the block selection signal output from the block selection means, this code signal serves as the code that identifies the input speech, and the first speech recognition ends.

The block selection means then selects the second block of the feature storage means, so that only the feature data group stored in that block can be supplied to the speech matching means. Depending on the recognition result, of course, the first block may instead remain selected.
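As a rough sketch of this arrangement, the feature storage can be modelled as a list of blocks, with matching restricted to whichever block is currently selected. The similarity measure, the `threshold` parameter, and the function name `recognize_in_block` are assumptions introduced for illustration; only the idea of returning the block selection together with the in-block number comes from the text.

```python
from typing import List, Optional, Sequence, Tuple

Template = Sequence[float]


def similarity(analysis: Template, template: Template) -> float:
    """Hypothetical similarity score; the source does not define one."""
    distance = sum(abs(a - t) for a, t in zip(analysis, template))
    return 1.0 / (1.0 + distance)


def recognize_in_block(
    analysis: Template,
    blocks: List[List[Template]],
    selected_block: int,
    threshold: float,
) -> Optional[Tuple[int, int]]:
    """Compare the analysis data only with the selected block's templates.

    Returns (block, number-within-block); the pair, not the number alone,
    identifies the recognized word.
    """
    for number, template in enumerate(blocks[selected_block]):
        if similarity(analysis, template) >= threshold:
            return (selected_block, number)
    return None
```

Templates stored in the other blocks are never compared, so a word that sounds similar but belongs to a different group cannot be returned by mistake.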

Thus, each time recognition of one spoken word ends, the group to which the next input should belong is estimated, and the block of the feature storage means holding the feature data group for that group is selected in turn. This estimation is possible because the sequence of spoken words to be recognized is not completely random but subject to certain constraints. In general, when commands are input to a machine through a speech recognition device, each command is formed by arranging several spoken words according to a grammar. In Japanese, for example, a command meaning "raise your hand slowly" is spoken in the order "hand" (object), "slowly" (modifier), "raise" (predicate). Therefore, when the spoken words prepared in advance as recognition targets are divided into groups, it is advisable to divide them into a group of nouns serving as objects, a group of adverbs serving as modifiers, and a group of verbs serving as predicates, and to store each feature data group in its own block of the feature storage means.

With this grouping, in the initial state of waiting for a command the device need only be able to recognize words that serve as objects (nouns). After an object has been recognized, it need only be able to recognize words that serve as modifiers, and after a modifier has been recognized, it need only be able to recognize words that serve as predicates (verbs).

Once the predicate has been recognized, one command is complete, so the device returns to the initial state and again waits for input of an object word.
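The fixed object, modifier, predicate cycle can be illustrated with a small state machine. The group names and the one-word-per-group lexicon below are taken from the example sentence in the text; the function names and the use of English words in place of acoustic feature data are assumptions for the sketch.

```python
from typing import Dict, Sequence, Set

# Order in which word groups are expected within one command.
GROUP_ORDER = ("object", "modifier", "predicate")


def next_group(current: str) -> str:
    """Return the group expected after `current`, wrapping back to the
    initial state once the predicate has been recognized."""
    position = GROUP_ORDER.index(current)
    return GROUP_ORDER[(position + 1) % len(GROUP_ORDER)]


def is_complete_command_sequence(
    words: Sequence[str], lexicon: Dict[str, Set[str]]
) -> bool:
    """Check that each word belongs to the group expected at its position
    and that the sequence ends exactly when a command is complete."""
    group = GROUP_ORDER[0]
    for word in words:
        if word not in lexicon[group]:
            return False  # the word is not in the currently selected group
        group = next_group(group)
    return bool(words) and group == GROUP_ORDER[0]


# Illustrative lexicon based on the example sentence.
LEXICON = {"object": {"hand"}, "modifier": {"slowly"}, "predicate": {"raise"}}
```

For instance, `is_complete_command_sequence(["hand", "slowly", "raise"], LEXICON)` is true, while the same words in any other order are rejected at the first word that falls outside the expected group.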

With this scheme, the number of spoken words the speech recognition device must discriminate among at any one time is greatly reduced, while sentence structure can be made as complex as desired, so an extremely wide variety of information can be input to a machine by speech.
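The trade-off can be made concrete with a little arithmetic. The helper below is only an illustration: the group sizes passed to it are made-up figures, not values from the patent, and `vocabulary_stats` is a hypothetical name.

```python
from math import prod
from typing import Dict, Sequence


def vocabulary_stats(group_sizes: Sequence[int]) -> Dict[str, int]:
    """Summarize a grouped vocabulary with one block per word group."""
    return {
        # Total feature data items that must be stored.
        "stored_templates": sum(group_sizes),
        # Largest number of candidates compared for any single word.
        "max_candidates_per_word": max(group_sizes),
        # Sentences expressible by taking one word from each group.
        "distinct_sentences": prod(group_sizes),
    }
```

With three groups of ten words each (an assumed figure), `vocabulary_stats([10, 10, 10])` reports 30 stored templates and 1000 distinct sentences, yet at most 10 candidates are compared for any one word, which is fewer than the 30 a single undivided vocabulary would require.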

[Example]

FIG. 1 is a block diagram showing one embodiment of the speech recognition device according to the present invention. This embodiment comprises: a microphone 1 serving as the speech input means; an amplifier circuit 2 that amplifies the output signal of the microphone 1; a feature extraction circuit 3 that analyzes the speech signal output by the amplifier circuit 2 and extracts the features of the speech; a buffer memory 4 that temporarily holds the analysis data output by the feature extraction circuit 3; a comparison circuit 8 that compares the analysis data held in the buffer memory 4 with the feature data groups prepared in advance and searches for highly similar feature data; a feature storage circuit 5 consisting of three blocks, the feature data being divided into three groups and each block storing the feature data group of one group; an array number generation circuit 6 that generates an array number signal indicating the position of feature data within each block of the feature storage circuit 5; an address designation circuit 7 that receives the array number signal and generates the in-block address signal of the feature storage circuit 5 at which the feature data designated by that array number is stored; a block control circuit 10 that functions as the block selection means, predicting, each time one speech recognition ends, the group of the spoken word expected next and outputting a selection signal that selects one of the three blocks of the feature storage circuit 5; and command execution means 9 that receives the array number signal output by the array number generation circuit 6, the block selection signal output by the block control circuit 10, and the speech recognition end signal output by the comparison circuit 8, and turns the recognition result into a concrete action.

In the embodiment shown in this block diagram, speech recognition proceeds as follows.

Speech is converted into an electrical speech signal by the microphone 1 and passed to the amplifier circuit 2. There the speech signal is amplified, given some analog preprocessing, and passed to the feature extraction circuit 3. The feature extraction circuit 3 divides the speech signal into several frequency components and extracts the temporal change of each component, the changes in the mutual phase relationships among the components, and so on, yielding a time-ordered series of digital data, namely the analysis data. The analysis data is passed to the buffer memory 4 and held there temporarily. This completes the speech analysis process, which is the same as in the prior art.

When input of one spoken word ends, the speech analysis process ends and the speech matching process begins. Assume that at this point the block control circuit 10 is outputting the signal that selects block 1 of the feature storage circuit 5, so that only the feature data group stored in block 1 is passed to the comparison circuit 8. In the speech matching process, the array number generation circuit 6 first outputs an array number signal indicating the array number of the first feature data. On receiving it, the address designation circuit 7 sends the feature storage circuit 5 an address signal designating the address at which the feature data with that array number is stored; the first feature data is thereby read from block 1 of the feature storage circuit 5 and passed to the comparison circuit 8. The comparison circuit 8 fetches the analysis data held in the buffer memory 4, compares it with the first feature data, and quantifies their similarity. If the similarity is at or below a fixed value, the input speech is judged not to be the spoken word from which the first feature data was derived, and the array number generation circuit 6 outputs the array number signal of the second feature data. Feature data are thus read out one after another and compared with the analysis data until feature data whose similarity is at or above the fixed value is found.

When such data is found, the comparison circuit 8 outputs the speech recognition end signal, which is passed to the array number generation circuit 6 and stops the updating of the array number signal. The speech recognition end signal is also passed to the command execution means 9, which at that moment takes in the array number signal being output by the array number generation circuit 6 and the block selection signal being output by the block control circuit 10. The command execution means 9 interprets the meaning of the input speech from the combination of the block selection signal and the array number signal, and carries out what it means.
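The point that a word is identified by the pair of block selection and array number, rather than by the array number alone, can be shown with a small lookup. The table contents are hypothetical: the three words come from the example sentence used earlier, and the block and array numbers assigned to them are invented for the sketch.

```python
from typing import Dict, Optional, Tuple

# Hypothetical meaning table for command execution means 9:
# (block number, array number within the block) -> word.
MEANINGS: Dict[Tuple[int, int], str] = {
    (1, 0): "hand",    # block 1: objects
    (2, 0): "slowly",  # block 2: modifiers
    (3, 0): "raise",   # block 3: predicates
}


def interpret(block: int, array_number: int) -> Optional[str]:
    """Decode a recognition result; the same array number means a
    different word in each block."""
    return MEANINGS.get((block, array_number))
```

Here array number 0 decodes to three different words depending on which block was selected when recognition ended.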

The speech recognition end signal is also passed to the block control circuit 10, which updates the block selection signal. Block 2 of the feature storage circuit 5 is thereby selected, and the device waits for the next speech input.

Speech recognition for one spoken input is completed in this way. Repeating the operation three times completes one sentence consisting of three spoken words, and the command execution means 9 can then perform one complete action. The block control circuit 10 then again outputs the selection signal that selects block 1 of the feature storage circuit 5, returning to the initial state, ready to accept the next sentence.

FIG. 3 shows a more concrete embodiment of the speech recognition device of the present invention. In this embodiment the microphone 1 is connected to the input terminal of the speech matching circuit 11. The speech matching circuit 11 is coupled to a read-only memory (hereinafter ROM) 12 through its address output port Aout and data input port Din. The ROM 12 serves as the storage means for the feature data groups and has a data output port Dout and an address input port Ain.

The data output port Dout is coupled to the data input port Din of the speech matching circuit 11, and the lower bits of the address input port Ain are coupled to the address output port Aout of the speech matching circuit 11.

The speech matching circuit 11 also has a data output port Nout, which outputs the array number signal of the feature data resulting from speech recognition, and an output terminal Eout, which outputs a speech-recognition-in-progress signal. The in-progress signal is used as an input to the counter 13, a two-bit counter consisting of flip-flops Q1 and Q2, and to the three logic gates 14, 15 and 16, which form a decoder for the output of the counter 13. The output of the counter 13 is coupled to the upper two bits of the address input port Ain of the ROM 12, and is also input to the three logic gates 14, 15 and 16 and to the AND gate 17. The outputs of the three logic gates 14, 15 and 16 are coupled to the sampling control input terminals of the three latch circuits 19, 20 and 21. The data input terminals of the latch circuits 19, 20 and 21 receive the output signal from the data output port Nout of the speech matching circuit 11, and their data output signals are input in parallel to the command execution circuit 22. The output of the AND gate 17 acts as the input signal of the one-shot circuit 18, whose output signal is led to the reset input terminal of the counter 13 and at the same time passed to the start signal input terminal S of the command execution circuit 22. In the configuration of FIG. 3, therefore, the counter 13, the AND gate 17 and the one-shot circuit 18 constitute the block control circuit 40.
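The way the ROM address is assembled, with the counter output on the upper two bits and the matching circuit's address output on the lower bits, can be sketched as follows. The width of the lower address field is not given in the source, so `low_bits` is an assumed parameter, and `rom_address` is a hypothetical helper name.

```python
def rom_address(block_count: int, word_address: int, low_bits: int) -> int:
    """Combine the 2-bit counter value (upper bits) with the address
    output by the matching circuit (lower bits) into one ROM address.

    `low_bits` is the assumed width of the lower address field.
    """
    if not 0 <= block_count < 4:
        raise ValueError("counter output is two bits wide")
    if not 0 <= word_address < (1 << low_bits):
        raise ValueError("word address does not fit in the lower bits")
    return (block_count << low_bits) | word_address
```

With an assumed 8-bit lower field, counter value 0 maps word address 5 to ROM address 5, while counter value 1 maps the same word address to 261, so changing the counter moves every access into a different block without the matching circuit being aware of it.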

In the above configuration, the counter 13 is cleared in the initial state, and the outputs of the two flip-flops Q1 and Q2 are both logic 0. The first block of the ROM 12, the one at the lowest addresses, is therefore selected. The operation of the embodiment of FIG. 3 is described below with reference to the waveform diagram of FIG. 4.

When the first speech signal (waveform a) is input to the microphone 1, it is passed to the speech matching circuit 11 and the speech analysis process (waveform b) begins.

At the same time, the speech-recognition-in-progress signal (waveform e) output from the output terminal Eout of the speech matching circuit 11 becomes logic 1. When the input speech signal ends, or when a fixed time has elapsed, the speech analysis process (waveform b) ends and the data matching process (waveform c) begins. During the data matching process, the speech matching circuit 11 outputs address signals one after another from the output port Aout, reads feature data from the ROM 12, compares it with the analysis data, and searches for highly similar feature data. Meanwhile, the array numbers of the feature data (waveform d) appear one after another at the data output port Nout of the speech matching circuit 11.

When highly similar feature data is eventually found, the data matching process (waveform c) ends with the array number of that feature data held at the data output port Nout, and the in-progress signal (waveform e) returns to logic 0. This completes the first speech recognition. During this period the logic gate 14 outputs a sampling signal (waveform h) synchronized with the in-progress signal (waveform e), and at the falling edge of that signal the latch circuit 19 stores the array number signal (waveform k). This array number signal is the data giving the result of the first speech recognition.

The falling edge of the in-progress signal (waveform e) is also input to the counter 13, which advances by one count, so that the flip-flops Q1 and Q2 become logic 1 and logic 0 respectively (waveforms f and g). As a result the second block of the ROM 12 is selected, and the device waits for input of the second speech signal. When the second speech signal is input, speech analysis proceeds as before, but the sampling signal is now output from the logic gate 15 (waveform i), and the latch circuit 20 stores the array number signal that is the second recognition result (waveform l). Likewise, for the third speech signal the third block of the ROM 12 is selected, the sampling signal is output from the logic gate 16 (waveform j), and the latch circuit 21 stores the array number signal that is the third recognition result (waveform m).

When three speech recognitions have been completed in this way, the three latch circuits 19, 20 and 21 hold the array number signals for the first, second and third blocks of the ROM 12, respectively. These three array numbers are passed in parallel to the command execution circuit 22. Meanwhile, when the third speech recognition ends the counter 13 advances, and the flip-flops Q1 and Q2 both become logic 1. The output of the AND gate 17 therefore also becomes logic 1. The one-shot circuit 18 shapes this output into a pulse, which is passed to the reset terminal of the counter 13 to reset and initialize it, and to the start terminal of the command execution circuit 22 to start the action described by the sentence formed from the spoken words designated by the three array numbers.
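The counter, decoder, latch and reset behaviour described for FIG. 3 can be mimicked by a short behavioural model. This is a software sketch under simplifying assumptions, not a gate-level description: timing, the one-shot pulse width and the waveforms of FIG. 4 are not modelled, Q1 is taken to be the least significant counter bit, and the class and method names are invented for the example.

```python
from typing import List, Optional, Tuple


class BlockController:
    """Behavioural sketch of counter 13, decoder gates 14 to 16, AND gate 17,
    one-shot circuit 18 and latch circuits 19 to 21."""

    def __init__(self) -> None:
        self.count = 0  # 2-bit counter 13 (Q1 = bit 0, Q2 = bit 1)
        self.latches: List[Optional[int]] = [None, None, None]  # latches 19, 20, 21
        self.executed: List[Tuple[Optional[int], ...]] = []  # sentences started

    @property
    def selected_block(self) -> int:
        """Counter output as seen on the upper two ROM address bits."""
        return self.count

    def recognition_finished(self, array_number: int) -> None:
        """Model the falling edge of the recognition-in-progress signal."""
        # The decoder gate for the current count strobes its latch.
        self.latches[self.count] = array_number
        self.count += 1  # counter advances on the same edge
        q1, q2 = self.count & 1, (self.count >> 1) & 1
        if q1 and q2:  # AND gate 17 fires the one-shot circuit 18
            self.executed.append(tuple(self.latches))  # start signal to circuit 22
            self.count = 0  # reset pulse clears counter 13
```

Feeding three recognition results in a row, for example 4, 7 and 2, leaves `executed` holding the single tuple `(4, 7, 2)` and returns `selected_block` to 0, ready for the next sentence.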
Both become logic 1. As a result, the output of AND gate 17 also becomes logic 1. The output of the AND gate 17 is transformed into a pulse signal by the one-shot circuit 18, and is transmitted to the reset terminal of the counter 13 to reset and initialize the counter 16, and is transmitted to the start terminal of the instruction execution circuit 22. The content of the sentence is started based on the combination of audio languages specified by the three array numbers.

This embodiment presupposes that the kinds of spoken words input in the three speech recognitions follow a fixed convention. Using the earlier example, to recognize a command such as "your hand - slowly - raise", the first, second, and third blocks of the ROM 12 store, respectively, the feature data group for spoken words representing objects, the feature data group for spoken words representing modifiers, and the feature data group for spoken words representing predicates. Selecting the blocks in a fixed order then yields a sentence that follows a single grammar.
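The fixed-order operation described above can be sketched in software as follows. This is only an illustration under stated assumptions, not the circuit of the embodiment: the class name `SequentialRecognizer`, the words other than "hand", "slowly", and "raise", and the use of an exact list lookup in place of feature comparison are all hypothetical.

```python
class SequentialRecognizer:
    """Software analogue of the fixed-grammar embodiment (counter, ROM blocks, latches)."""

    def __init__(self, blocks):
        self.blocks = blocks                  # one word list per ROM block
        self.counter = 0                      # plays the role of counter 13
        self.latches = [None] * len(blocks)   # play the role of latch circuits 19 to 21

    def hear(self, word):
        # Only the block currently selected by the counter is searched.
        block = self.blocks[self.counter]
        array_number = block.index(word)      # stand-in for feature matching (assumption)
        self.latches[self.counter] = array_number
        self.counter += 1
        if self.counter < len(self.blocks):
            return None                       # still waiting for further words
        command = tuple(self.latches)         # all array numbers are handed over together
        self.counter = 0                      # reset, as the one-shot pulse does
        return command


# Hypothetical vocabulary: objects, modifiers, predicates.
blocks = [["hand", "foot"], ["slowly", "quickly"], ["raise", "lower"]]
recognizer = SequentialRecognizer(blocks)
results = [recognizer.hear(word) for word in ["hand", "slowly", "raise"]]
```

In this sketch a word that does not belong to the currently selected block raises `ValueError`; how the actual device treats such an input is not described in this passage.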

However, when a sentence is built from several words, always being bound to a single grammar leaves little freedom, and freer expression is not possible. To improve on this, the word group that must be recognized next can be estimated from the meaning of the word that has just been recognized. In other words, giving the grammar branching makes it possible to recognize spoken expressions with more complex content.

FIG. 5 is a block diagram of a speech recognition device, one embodiment of the present invention, that can recognize sentences having a branching grammar. This embodiment has roughly the same configuration as the embodiment shown in FIG. 1, but differs functionally in that the array number signal output from the array number generation circuit 6 is used as an input signal to the block control circuit 10. In this embodiment, the block control circuit 10 interprets the meaning of the input spoken word from the block selection signal it is currently outputting and the array number signal found as the result of speech recognition. It then estimates, according to that meaning, what kind of spoken word should come next, and outputs a new block selection signal, that is, a signal that selects from the feature storage circuit 5 the block containing the feature data of that group.
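The behaviour of the block control circuit can be pictured as a lookup from the pair (current block, array number) to the next block. The sketch below is a hypothetical illustration: the group names, the words, and the rule table `NEXT_BLOCK` are invented for the example, since the description gives no concrete branching vocabulary at this point.

```python
# Hypothetical groups and words (assumption, not taken from the patent).
VOCABULARY = {
    "object": ["hand", "everything"],
    "modifier": ["slowly", "quickly"],
    "predicate": ["raise", "stop"],
}

# (current block, recognized word) -> block to select next.
# Pairs that are not listed fall back to the first block.
NEXT_BLOCK = {
    ("object", "hand"): "modifier",
    ("object", "everything"): "predicate",   # branch: this object skips the modifier
    ("modifier", "slowly"): "predicate",
    ("modifier", "quickly"): "predicate",
}


def next_block(current_block, array_number):
    """Interpret (block, array number) as a word, then choose the next block."""
    word = VOCABULARY[current_block][array_number]
    return NEXT_BLOCK.get((current_block, word), "object")
```

The same array number means different words in different blocks, which is why the current block selection has to be part of the lookup key.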

FIG. 6 is a more specific circuit diagram of the speech recognition device with a branching grammar explained with reference to FIG. 5, and is one embodiment of the present invention. In this embodiment, the microphone 1, the voice matching circuit 11, and the ROM 12 serving as the storage means for the feature data groups are the same as those described for the embodiment of FIG. 3, so their explanation is omitted. The data output port Nout of the voice matching circuit 11 is connected to the data input of the instruction execution circuit 22 and to the input of the memory circuit 26. The memory circuit 26 uses the "speech recognition in progress" signal, which is the output signal of the output terminal Eout of the voice matching circuit 11, as its sampling signal, and reads in the output signal of the data output port Nout at the falling edge of this sampling signal. The output signal of the memory circuit 26 is used as part of the address input signal of the grammar-decoding read-only memory 24 (hereinafter called the grammar ROM). The output terminals of the grammar ROM 24 are connected to the upper bits of the address input port Ain of the ROM 12, to the data input of the instruction execution circuit 22, and to the input of the latch circuit 25. The output data of the latch circuit 25 is fed back as part of the address input signal of the grammar ROM 24.

Besides being used as the sampling signal mentioned above, the "speech recognition in progress" signal output from the output terminal Eout of the voice matching circuit 11 is also sent to the sampling control input terminal of the latch circuit 25 and to the start signal input terminal of the instruction execution circuit 22. In the configuration of FIG. 6, the memory circuit 26, the grammar ROM 24, and the latch circuit 25 together constitute the block control circuit 40.

In this configuration, the memory circuit 26 and the latch circuit 25 are each preset to a fixed initial value in the initial state, and their contents determine the address input signal of the grammar ROM 24. As a result, the grammar ROM 24 outputs a block selection signal that selects the first block of the ROM 12, which serves as the feature data storage circuit. When the first voice signal is input to the microphone 1 in this state, the speech recognition operation starts, and after a certain time the first array number signal is output at the data output port Nout of the voice matching circuit 11 as the speech recognition result.

At the same time, the "speech recognition in progress" signal is output from the output terminal Eout of the voice matching circuit 11. The speech recognition operation up to this point is the same as in the embodiment of FIG. 3, so a detailed explanation is omitted.

When the "speech recognition in progress" signal changes to logic 0, the instruction execution circuit 22 reads the array number signal obtained as the speech recognition result and the block selection signal, interprets the meaning of the input spoken word, and performs the operation corresponding to that meaning.

Meanwhile, at the falling edge of the "speech recognition in progress" signal, the array number signal that is the speech recognition result is read into the memory circuit 26 and held there. At the same time, the first block selection signal is latched in the latch circuit 25.

As a result, the address input signal of the grammar ROM 24 is now formed from the array number signal of the first voice input and the block selection signal that selects the first block of the ROM 12. Based on this address input signal, the grammar ROM 24 outputs a new block selection signal and selects the second block of the ROM 12. Depending on the value of the array number signal (that is, the meaning of the first input spoken word), however, the new block selection signal may remain unchanged and continue to select the first block.

This is what gives the grammar its branching.

Once the new block selection signal has been determined in this way, the device waits for input of the second voice signal, and the kind of word spoken second determines the third block selection signal. Here again the block selection signal is not necessarily a new one; the first or second block selection signal may be output again.

With this configuration, the meaning of each speech recognition result can be examined, the group to which the next spoken word should belong can be estimated, and the block holding the corresponding feature data group can be selected, one recognition after another.
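The feedback arrangement of FIG. 6 can be modelled at the register level as shown below. This is a sketch under assumptions, not a description of the actual ROM contents: the 2-bit signal widths, the choice of which signal forms the upper address bits, the reserved start value of the latch, the zero-based block numbers, and the rule table are all hypothetical.

```python
ARRAY_BITS = 2   # assumed width of the array number signal
START = 3        # assumed preset value of latch circuit 25 in the initial state


def build_grammar_rom(rules, default=0):
    """Fill a ROM image addressed by (latched block select << ARRAY_BITS) | array number."""
    rom = [default] * (4 << ARRAY_BITS)
    for (block, array_number), following_block in rules.items():
        rom[(block << ARRAY_BITS) | array_number] = following_block
    return rom


class BlockControl:
    """Memory circuit 26, grammar ROM 24, and latch circuit 25 modelled in software."""

    def __init__(self, rom, initial_memory=0, initial_latch=START):
        self.rom = rom
        self.memory = initial_memory   # last array number (memory circuit 26)
        self.latch = initial_latch     # previous block selection (latch circuit 25)

    @property
    def block_select(self):
        # The ROM output depends only on its current address inputs.
        return self.rom[(self.latch << ARRAY_BITS) | self.memory]

    def recognition_finished(self, array_number):
        """Called at the falling edge of the 'speech recognition in progress' signal."""
        previous = self.block_select   # sampled before either register changes
        self.memory = array_number
        self.latch = previous
        return self.block_select


# Hypothetical grammar: (block, array number) -> next block; blocks are numbered from 0.
rules = {
    (0, 0): 1, (0, 1): 2, (0, 2): 0,   # word 2 of block 0 keeps block 0 selected
    (1, 0): 2, (1, 1): 0,
    (2, 0): 0, (2, 1): 1,
}
control = BlockControl(build_grammar_rom(rules))
```

Reserving a separate start value for the latch is a design choice of this sketch; it keeps the initial address distinct from the address formed after a real recognition in block 0.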

Next, a voice-controlled digital watch is taken up as an example of an electronic device whose command system is based on such a branching grammar, and an example of how the grammar is used is explained.

The voice-controlled digital watch described here is realized by the present invention. It makes it possible to control entirely by voice input a multifunction digital watch of the kind that has until now been extremely troublesome to operate.

FIG. 7 is a state transition diagram explaining the functions of the watch in this embodiment and how those functions are called up. As FIG. 7 shows, the watch has three basic modes: mode 26 for the time display function, which displays the time; mode 27 for the alarm function, which sounds an alarm at a preset time; and mode 28 for the number memo function, which stores many telephone numbers and displays one of them on command.

The time display function has a time display mode 26a, which displays the time, and a time correction mode 26b, which operates the time correction function. The alarm function has an alarm time display mode 27a, which displays the alarm time and the on/off state of the alarm function, and an alarm time set mode 27b, which is used to preset the alarm time and to switch the alarm function on or off. The number memo function has a number memo memory map display mode 28a, which displays the memory usage of the number memo; a number memo registration mode 28b, in which a number and the name for that number are input and registered as a memo; and a number memo read-out mode 28c, in which inputting a name causes the number corresponding to that name to be read from memory and displayed.

All transitions among these seven modes are made by voice commands. The spoken words used for these state transitions form the first group, which consists of six words: "Jikoku" (time), "Araamu" (alarm), "Memo" (memo), "Shuusei" (correct), "Touroku" (register), and "Yomidashi" (read out). The function of each of these spoken words is shown by the arrows indicating the direction of state transition in FIG. 7. The spoken word "Owari" (end) shown in the figure is also used as a state transition command, but functionally it belongs to the second spoken-word group.

The second spoken-word group consists of 16 words: "Owari" together with "Zero", "Ichi", "Ni", "San", "Yon", "Go", "Roku", "Nana", "Hachi", "Kyuu" (the digits 0 to 9), "On", "Off", "Gozen" (a.m.), "Gogo" (p.m.), and "Motoe" (go back).

The spoken words of the second group are used in the time correction mode 26b, the alarm time set mode 27b, and the number memo registration mode 28b to input numerical data, to correct mistakes in the data that has been input, and to switch the alarm function on or off. For example, in the alarm time set mode 27b, speaking "Gozen" - "Zero" - "Go" presets 6:45 a.m. as the alarm time. Speaking "On" turns the alarm function on, and speaking "Off" turns it off. Speaking "Owari" causes a transition to the alarm time display mode 27a.

The third spoken-word group can be recognized only in the number memo read-out mode 28c and consists of the names of the number memos themselves. The voice feature data of each name is extracted when the name is spoken in the number memo registration mode 28b and is stored in RAM. Names such as "Yamada", "Kobayashi", "Kakkou", and "Hokenshou bangou" (health insurance card number) are stored in this way; when one of them is spoken, it is recognized and the number paired with that name is retrieved from memory and displayed. The third group consists of such names and "Owari".

All operations of the watch can be performed using the three groups of spoken words described above, but the speech recognition device in the watch never has to consider all of these words at once. The watch is always in one of the seven modes shown in FIG. 7, and the voice inputs that are meaningful in that mode are limited to one of the three groups. It is therefore sufficient to select the block of ROM (RAM for the third group) that stores the feature data of that group. The block is switched when one of the four words "Shuusei", "Touroku", "Yomidashi", or "Owari" is input, but even the same word "Shuusei" leads to a different destination depending on whether the current mode is the time display mode 26a or the alarm time display mode 27a. There are also cases, such as "Shuusei" in the number memo memory map display mode 28a, in which a word is recognized but has no function.
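The way the current mode restricts the active vocabulary can be sketched as a pair of lookup tables. The sketch is hypothetical in several respects, because the arrows of FIG. 7 are not reproduced in the text: the description states only that the entry modes 26b, 27b, and 28b use the second group, that mode 28c uses the third group, that "Owari" leads from 27b to 27a, and that "Shuusei" does nothing in 28a. The assignment of the first group to the display modes and all other transitions below are assumptions.

```python
GROUP_OF_MODE = {
    "26a": 1, "27a": 1, "28a": 1,   # display modes: assumed to listen for group 1
    "26b": 2, "27b": 2, "28b": 2,   # entry modes: group 2 (stated in the description)
    "28c": 3,                       # read-out mode: group 3 (stated in the description)
}

GROUPS = {
    1: ["JIKOKU", "ARAAMU", "MEMO", "SHUUSEI", "TOUROKU", "YOMIDASHI"],
    2: ["OWARI", "ZERO", "ICHI", "NI", "SAN", "YON", "GO", "ROKU", "NANA",
        "HACHI", "KYUU", "ON", "OFF", "GOZEN", "GOGO", "MOTOE"],
    3: ["OWARI"],                   # plus the registered names held in RAM
}

# Mode-dependent transitions; only ("27b", "OWARI") is stated, the rest are assumed.
TRANSITIONS = {
    ("26a", "SHUUSEI"): "26b",
    ("27a", "SHUUSEI"): "27b",
    ("28a", "TOUROKU"): "28b",
    ("28a", "YOMIDASHI"): "28c",
    ("26b", "OWARI"): "26a",
    ("27b", "OWARI"): "27a",
    ("28b", "OWARI"): "28a",
    ("28c", "OWARI"): "28a",
}

# Assumed: these group 1 words switch between the three basic display modes.
BASIC_MODE = {"JIKOKU": "26a", "ARAAMU": "27a", "MEMO": "28a"}


def active_vocabulary(mode, registered_names=()):
    """Words the recognizer needs to compare against in the given mode."""
    group = GROUP_OF_MODE[mode]
    words = list(GROUPS[group])
    if group == 3:
        words += list(registered_names)
    return words


def step(mode, word, registered_names=()):
    """Return the mode that follows when `word` is spoken in `mode`."""
    if word not in active_vocabulary(mode, registered_names):
        return mode                  # word is outside the selected block
    if (mode, word) in TRANSITIONS:
        return TRANSITIONS[(mode, word)]
    if GROUP_OF_MODE[mode] == 1 and word in BASIC_MODE:
        return BASIC_MODE[word]
    return mode                      # recognized, but no mode change here
```

Under these assumptions, "SHUUSEI" spoken in mode 28a is part of the active vocabulary but leaves the mode unchanged, which corresponds to a word that is recognized without having a function.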

[Effect of the invention]

The speech recognition device according to the present invention has been described in detail above. According to the present invention, the number of spoken words that may be input in any one speech recognition can be limited to a fixed number or fewer. This greatly reduces the recognition error rate and makes reliable speech recognition possible. The time required for speech recognition can also be kept within a fixed value, so rapid speech recognition becomes possible without using an expensive, large, high-speed data processing device.

On the other hand, because speech recognition can be carried out in several or more separate steps, the vocabulary that can be recognized becomes very rich. Beyond the greater freedom of expression that a larger vocabulary alone provides, the words can be combined into sentences, so very complex content can be conveyed by voice. Moreover, these sentences are not limited to a single fixed grammar; a branching grammar can also be used, which increases the freedom of expression still further.
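The trade-off claimed here can be checked with simple counting. The sketch below assumes a fixed-order grammar in which every combination of one word per block is a valid sentence, and assumes that each recognition compares the input against every template in the selected block; the function name and the block sizes are hypothetical.

```python
from math import prod


def capacity(block_sizes):
    """Count sentences, stored templates, and worst-case comparisons per spoken word."""
    return {
        "sentences": prod(block_sizes),
        "templates_stored": sum(block_sizes),
        "max_comparisons_per_word": max(block_sizes),
    }


summary = capacity([16, 16, 16])
# summary == {"sentences": 4096, "templates_stored": 48, "max_comparisons_per_word": 16}
```

With three blocks of 16 words each, 4096 distinct sentences can be expressed while no single recognition has to consider more than 16 candidates.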

The speech recognition device of the present invention, with these advantages, is especially useful when applied to portable electronic equipment powered mainly by batteries. In a voice-controlled digital watch such as the one described above, for example, the device can be made small and light enough to be fully practical as a wristwatch. Enabling voice operation of a wristwatch is significant because it offers a way to eliminate mechanical input means (crowns, push-button switches, and the like), which have been an obstacle to adding functions and to achieving high reliability in conventional wristwatches. In addition, voice operation lets the user state the intended operation directly, so the repeated, complicated switch operations required by conventional digital watches are no longer needed, and a multifunction watch that even people unaccustomed to operating machines can use easily becomes possible. In data lookup functions such as a telephone number memo in particular, having the name of the data recognized by voice allows the data to be retrieved directly, a marked advance over conventional wristwatches with memo functions.

[Brief explanation of drawings]

FIG. 2 is a block diagram showing the configuration of a speech recognition device according to the prior art. FIGS. 1 and 5 are block diagrams each showing an embodiment of the speech recognition device according to the present invention. FIG. 3 is a more specific circuit diagram of a speech recognition device based on the same technique as the embodiment of FIG. 1. FIG. 4 is a signal waveform diagram explaining the operation of the circuit of FIG. 3. FIG. 6 is a more specific circuit diagram of a speech recognition device based on the same technique as the embodiment of FIG. 5. FIG. 7 is a state transition diagram explaining the functions of a voice-controlled digital watch equipped with the speech recognition device shown in FIG. 2 or FIG. 6.

1: microphone
2: amplifier circuit
3: feature extraction circuit
4: buffer memory
5: feature storage circuit
6: array number generation circuit
7: address designation circuit
8: comparison circuit
9: instruction execution means
10, 40: block control circuit
13: counter
18: one-shot circuit

Procedural amendment (formality)
Date: illegible in the source
To: Commissioner of the Patent Office, 宇賀道部
1. Indication of the case: Patent Application No. 77900 of 1986 (Showa 61)
2. Title of the invention: Voice recognition device
3. Person making the amendment: relationship to the case: patent applicant; address: 2-1-1 Nishi-Shinjuku, Shinjuku-ku, Tokyo; telephone (03) 342-1231
June 24, 1986 (dispatch date)
5. Number of inventions added by the amendment: none
6. Subject of the amendment: all drawings

Claims

[Claims]

A speech recognition device comprising: voice input means; voice matching means that extracts features of the voice signal taken in by the voice input means, compares them with feature data of a plurality of spoken words prepared in advance, determines which of the prepared feature data the voice signal corresponds to, and outputs a code signal representing the number of that feature data; and feature storage means that stores the feature data groups of the plurality of spoken words prepared in advance and supplies the stored contents to the voice matching means as needed; characterized in that the feature storage means is composed of a plurality of blocks, each storing one of a plurality of groups into which the spoken words prepared in advance are classified, and in that the device further comprises block selection means that selectively supplies the stored information of the plurality of blocks to the voice matching means, and command execution means that receives a plurality of items of voice signal information, each matched against the feature data group of a block supplied in sequence to the voice matching means by the block selection means, and executes a command according to their combination.