JPS6118199B2

Info

Publication number: JPS6118199B2
Application number: JP51061984A
Authority: JP
Inventors: Hiroaki Sekoe; Shigemi Chiba
Original assignee: Nippon Electric Co Ltd
Current assignee: NEC Corp
Priority date: 1976-05-27
Filing date: 1976-05-27
Publication date: 1986-05-10
Also published as: JPS52144205A

Description

[Detailed description of the invention]

The present invention relates to an improvement in speech recognition devices.

A speech recognition device, which automatically recognizes speech uttered by a person, is regarded as a highly effective new means of entering data and commands from people into computers and other machines. For example, a device that recognizes spoken digits can be connected to a computer and used to enter numerical data such as the figures on vouchers. Because speech signals can be transmitted easily over inexpensive telephone lines, this also enables a new and useful mode of use in which data is entered from a remote location. Likewise, a device that recognizes the command words needed to run industrial machinery lets an operator control a machine by voice alone. The operator's hands, feet, and even eyes are then free for other purposes, so several tasks can be carried out at once and a work efficiency higher than before can be achieved.

However, the speech recognition devices developed to date may fail to operate correctly and produce recognition errors when ambient noise is present or when the utterance is unclear. For applications in which errors cannot be tolerated at all, the device must therefore be configured so that, once the user has finished speaking and the decision result has been fixed, the result is displayed and the user is asked to confirm it. In this case the user alternates between speaking and checking the displayed result: as long as there is no error the user goes on to the next item of data, and when an error is found at the confirmation stage the user speaks that item again. To carry out this procedure quickly, the recognition device must operate at high speed and display the decision result early so that the user can confirm it.

The difficulty is that the recognition result cannot be fixed for several hundred milliseconds after the utterance has actually ended. Most recognition devices to date detect the end of the input speech by testing its amplitude level. Even when the amplitude level momentarily falls to 0 (in practice, to a sufficiently small value), that instant cannot be taken as the end of the input speech; the end can be confirmed only after the amplitude level has remained at 0 for, say, 250 milliseconds. If the instant at which the amplitude level reaches 0 were immediately taken as the end, the following problem would arise. When the digit "6" (/roku/) is spoken, a pause normally appears between /ro/ and /ku/, and the amplitude level falls to 0 (this pause is hereinafter called a pause interval within a word). If the instant at which the amplitude level reaches 0 were treated as the end of the word, /ro/ would be taken as a complete word and would probably be judged to be "5" (/go/). To avoid this, it is necessary to determine accurately whether an interval of amplitude level 0, that is, a pause interval, lies within a word or is truly the pause interval at the end of a word. A pause interval within a word usually lasts no more than about 200 milliseconds. Using this fact, a pause interval of amplitude level 0 is judged to lie within a word if it is no longer than a predetermined length, for example 250 milliseconds, and to be the pause interval at the end of a word if it exceeds that length. Consequently, the end of the word is not fixed for, say, 250 milliseconds after the utterance ends, and the decision result cannot be displayed during that time.
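The conventional rule described above can be sketched in a few lines. This is only an illustration: the function name and the constant name are hypothetical, and the 250 millisecond limit is the example value given in the text.

```python
PAUSE_LIMIT_MS = 250  # example limit taken from the text


def classify_pause(duration_ms: float, limit_ms: float = PAUSE_LIMIT_MS) -> str:
    """Conventional rule: a pause no longer than the limit lies inside a word."""
    return "within word" if duration_ms <= limit_ms else "end of word"


# A typical in-word pause (about 200 ms or less) versus a longer silence.
short_pause = classify_pause(180)
long_pause = classify_pause(400)
```

The cost of this rule is that a decision about any pause is available only after the limit has elapsed, which is the delay the invention sets out to hide from the speaker.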

For these reasons, the user of a conventional speech recognition device cannot learn the recognition result until several hundred milliseconds have been spent idly after the utterance ends, and cannot speak the next item of data until the recognition result has been displayed and confirmed. With conventional speech recognition devices it has therefore been impossible to enter data at high speed.

The object of the present invention is to remedy the above drawback of conventional speech recognition devices, namely that the recognition result cannot be known until several hundred milliseconds after the utterance ends and the data input speed is consequently low, and to realize a speech recognition device in which the interim decision result is displayed promptly once the utterance ends and the data input speed is consequently high.

The speech recognition device according to the present invention comprises: an end detection section that generates a pause-interval detection signal when a pause interval of the input speech is detected and generates an end detection signal when the duration of the pause interval exceeds a predetermined threshold; a recognition section that, when the pause-interval detection signal is generated, treats the input speech as having ended, performs the recognition operation, and promptly outputs a recognition result signal; a recognition result display section that displays the recognition result signal; and an output section that passes the recognition result signal through as output when the end detection signal is generated.

With this configuration, the recognition result is displayed promptly when the utterance ends, and confirmation can be carried out quickly. Moreover, the recognition result is output only after the pause interval has continued long enough for the end to be confirmed, so an erroneous recognition result produced at a pause interval within a word is never output.

FIG. 1 is a block diagram showing one embodiment of the present invention, and FIG. 2 is a time chart showing an example of its operation. In the figures, 1 is a microphone, which converts the input speech wave into an electrical signal and sends it to analysis section 10 via signal line i. Analysis section 10 analyzes the input speech waveform arriving on signal line i, extracts the recognition parameter a and the amplitude level l, and sends the former to recognition section 13 and the latter to start detection section 11 and end detection section 12. In short, the analysis section is a circuit that extracts from the input speech waveform the parameters needed for recognition; it can be built from, for example, a band-pass filter analyzer, an autocorrelation analyzer, or a linear prediction analyzer.

FIG. 3 shows an example of an analysis section built around a band-pass filter analyzer. Block 110, drawn with a broken line, is a well-known band-pass filter analyzer; it splits the input signal i into, for example, 10 frequency bands and outputs the output amplitude of each channel as the analog signals $\alpha_1$ to $\alpha_{10}$. These signals are time-division multiplexed by analog multiplexer 120 and converted by A/D converter 130 into the digital signal a, which is sent to recognition section 13. Meanwhile, the input signal i is also applied directly to rectifier circuit 140, rectified, smoothed by low-pass filter 150, converted to a digital signal by A/D converter 160, and output as the amplitude level signal l. Multiplexer 120 and A/D converter 160 operate in synchronization with the pulses of a separately supplied frame period signal Cl. The pulse interval of this frame period signal is set to 10 milliseconds as an example. The output signal a of analysis section 10 is therefore a time series of input vectors

$$a_i = (\alpha_{1i}, \alpha_{2i}, \ldots, \alpha_{xi}, \ldots, \alpha_{10i}) \tag{1}$$

and the output signal l is a time series of amplitude values

$$l_i \tag{2}$$

Here $\alpha_{xi}$ is the analysis output of the x-th channel at time sample point i, that is, the digitized signal $\alpha_x$.
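The amplitude-level path (rectifier 140, low-pass filter 150, A/D converter 160 clocked every 10 milliseconds) can be imitated in software as a rough sketch. Everything beyond the 10 millisecond frame period is an assumption made for illustration: the 8 kHz sample rate, the one-pole smoothing filter and its 20 millisecond time constant, and the function name `amplitude_levels`.

```python
import math


def amplitude_levels(samples, sample_rate_hz, frame_ms=10.0, smooth_ms=20.0):
    """Rectify, smooth with a one-pole low-pass filter, and sample the
    envelope once per frame to obtain the sequence l_i."""
    # Smoothing coefficient for the assumed time constant smooth_ms.
    alpha = 1.0 - math.exp(-1.0 / (sample_rate_hz * smooth_ms / 1000.0))
    frame_len = int(round(sample_rate_hz * frame_ms / 1000.0))
    envelope = 0.0
    levels = []
    for n, s in enumerate(samples, start=1):
        # Full-wave rectification followed by low-pass smoothing.
        envelope += alpha * (abs(s) - envelope)
        if n % frame_len == 0:
            # Frame pulse: record one amplitude value l_i.
            levels.append(envelope)
    return levels


# 100 ms of a 440 Hz tone followed by 100 ms of silence, at an assumed 8 kHz.
rate = 8000
tone = [math.sin(2 * math.pi * 440 * n / rate) for n in range(800)]
levels = amplitude_levels(tone + [0.0] * 800, rate)
```

The smoothing time constant trades ripple against how quickly $l_i$ falls once the speech stops, which in turn affects how soon a pause can be detected.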

Start detection section 11 tests the values $l_i$ of equation (2), supplied as the signal l, to detect the start of the input speech, and generates a start detection signal pulse S at that time. Various methods of start detection are conceivable; the simplest, as one example, is to compare $l_i$ with a predetermined threshold $\theta_b$ and take as the start the time point i = b at which

$$l_i > \theta_b \tag{3}$$

first holds. In any case, when the start of the input speech is detected, the start detection signal pulse S is generated and sent to recognition section 13.
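The simplest start rule, equation (3), amounts to finding the first frame whose level exceeds the threshold. A minimal sketch follows; the function name `detect_start` and the convention of returning `None` when no start is found are illustrative assumptions.

```python
def detect_start(levels, theta_b):
    """Return the first frame index b with l_b > theta_b, or None if none exists."""
    for i, level in enumerate(levels):
        if level > theta_b:
            return i
    return None


# Levels rise above the threshold 4 for the first time at frame 2.
b = detect_start([0, 1, 5, 9, 2], theta_b=4)
```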

End detection section 12 detects pause intervals and the end of the speech. A pause interval is defined as an interval in which the amplitude level is smaller than a predetermined threshold $\theta_e$. The end is defined as the time point i = e immediately preceding a pause interval that has continued for longer than a predetermined length L. FIG. 4 illustrates these definitions. The upper part of FIG. 4 shows how the amplitude level $l_i$ varies with time when, for example, the word "3" (/san/) is spoken. The pause interval begins at the time point at which $l_i$ falls below the predetermined threshold $\theta_e$, that is, at the time point indicated by reference numeral 40. If the pause interval exceeds the predetermined length L, it is regarded as marking the end of the word. In that case the time point immediately before reference numeral 40, indicated by reference numeral 41, is the end i = e of the input speech.

FIG. 5 shows an example configuration of the end detection section. The amplitude level signal $l_i$ from analysis section 10 is compared by comparison circuit 51 with the preset threshold $\theta_e$. While $l_i > \theta_e$, the outputs of the comparison circuit are y = 0 and $\bar{y}$ = 1. The former signal, y, is output unchanged as the pause-interval detection signal e2. The latter signal, $\bar{y}$, is applied to counting circuit 53 as a reset signal, so the output m of counting circuit 53 is 0 as long as $l_i > \theta_e$. The signal m, which carries the contents of this counter, is compared by comparison circuit 54 with the preset threshold L. As an example, L = 25, which corresponds to 250 milliseconds when counted in frame periods of 10 milliseconds. The output e1 of comparison circuit 54 is 1 only when m ≥ L and 0 when m < L. Hence the end detection signal e1 is 0 as long as $l_i > \theta_e$.

When the pause interval at the end of the input speech begins and $l_i < \theta_e$, the outputs of comparison circuit 51 become y = 1 and $\bar{y}$ = 0. The former is output as the pause-interval detection signal, so e2 = 1. In addition, the logical product Z of the signal y and the frame period signal pulses Cl is computed by logic circuit 52 and applied to counting circuit 53. Because counting circuit 53 is no longer held in reset at this time, it counts the pulses of signal Z. The contents m of counting circuit 53 therefore increase by 1 in synchronization with each frame period signal pulse Cl for as long as $l_i < \theta_e$, that is, throughout the pause interval. When the contents of counting circuit 53 reach the constant L or more, the output of comparison circuit 54, that is, the end detection signal e1, becomes 1.
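The behaviour of the circuit in FIG. 5 can be mirrored frame by frame in software. The sketch below is an illustration rather than the patented circuit: the class name `EndDetector` and its method are hypothetical, and a level exactly equal to the threshold is treated as speech, a case the text leaves open.

```python
class EndDetector:
    """Software analogue of comparison circuit 51, counter 53 and comparison circuit 54."""

    def __init__(self, theta_e, limit_frames=25):
        self.theta_e = theta_e            # pause threshold
        self.limit_frames = limit_frames  # L, in frames (25 frames = 250 ms at 10 ms)
        self.m = 0                        # contents of the counter

    def step(self, level):
        """Process one frame and return (e2, e1)."""
        y = 1 if level < self.theta_e else 0
        if y == 0:
            self.m = 0        # the complement of y resets the counter
        else:
            self.m += 1       # one frame pulse passes through the AND gate
        e2 = y                                        # pause-interval detection signal
        e1 = 1 if self.m >= self.limit_frames else 0  # end detection signal
        return e2, e1


# Short demonstration with L = 3 frames so the trace stays readable.
det = EndDetector(theta_e=1.0, limit_frames=3)
trace = [det.step(level) for level in [5, 5, 0, 0, 5, 0, 0, 0, 0]]
```

In the demonstration the two-frame pause raises e2 but never e1, while the final pause raises e1 on its third frame.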

While the end detection section is detecting the end of the input speech as described above, recognition section 13 operates as follows. From the time the start detection signal S (a pulse) arrives from start detection section 11, that is, from i = b onward, it receives the input vectors $a_i$ of equation (1) from analysis section 10 and carries out the recognition operation. After the start has been detected, the input vectors arrive one after another in synchronization with the pulses of the frame period signal Cl. Until the pause-interval detection signal e2 is given, the recognition section mainly performs preprocessing for recognition and makes no final decision. At the time the pause-interval detection signal e2 is given, the following sequence of input vectors has been received and is held in recognition section 13.

$$A = a_b, a_{b+1}, \ldots, a_i, \ldots, a_e \tag{4}$$

This sequence of input vectors is hereinafter called the input pattern A. Recognition section 13 treats this input pattern A as one complete input utterance (that is, it assumes that the input speech ended when the pause-interval detection signal e2 became 1) and makes its decision.

In practice, the operation during this period takes various forms depending on the recognition principle adopted in recognition section 13. One of the features claimed by the present invention, however, is that when an interval of low amplitude level (a pause interval) is detected, that is, when the pause-interval detection signal e2 is received, the input speech is assumed to be complete at that point and the decision is made immediately. Various recognition principles are conceivable for recognition section 13, and the invention is not restricted to any one of them. One example is the DP matching method described in the paper "Comparison of various DP matching methods in speech recognition," presented as document S73-22 of the research committee of the Acoustical Society (issued December 1973). In that method, a standard pattern

$$C^n = C^n_1, C^n_2, \ldots, C^n_{J_n} \tag{5}$$

is stored for every required word n (n = 1 to N). When the pause-interval detection signal becomes 1, DP matching is carried out between the input pattern A and each standard pattern $C^n$ to compute the similarity

$$S(A, C^n) \tag{6}$$

The similarities computed for the standard patterns $C^n$ are then compared, and the word n = r giving the maximum is found, whereby the input pattern A is judged to be the word r. The specific configuration of this recognition section is not claimed as part of the present invention and is therefore omitted.
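To make the decision rule of equations (5) and (6) concrete, the sketch below uses a generic dynamic-programming alignment (dynamic time warping with unit-weight steps). It is not claimed to be the variant in the cited paper: the step pattern, the Euclidean frame distance, the use of negative distance as the similarity S, and the names `dtw_distance` and `recognize` are all assumptions made for illustration.

```python
import math


def dtw_distance(pattern_a, pattern_c):
    """Accumulated frame distance along the best alignment of two vector sequences."""
    I, J = len(pattern_a), len(pattern_c)
    inf = float("inf")
    g = [[inf] * (J + 1) for _ in range(I + 1)]
    g[0][0] = 0.0
    for i in range(1, I + 1):
        for j in range(1, J + 1):
            d = math.dist(pattern_a[i - 1], pattern_c[j - 1])
            # Allow a horizontal, vertical or diagonal step into cell (i, j).
            g[i][j] = d + min(g[i - 1][j], g[i][j - 1], g[i - 1][j - 1])
    return g[I][J]


def recognize(pattern_a, standard_patterns):
    """Return the word r whose standard pattern is most similar to the input pattern."""
    # Similarity is taken here as the negative DTW distance (an assumption).
    return max(standard_patterns,
               key=lambda n: -dtw_distance(pattern_a, standard_patterns[n]))


# Toy two-dimensional feature vectors; real input vectors would have 10 channels.
standards = {
    "5": [(1.0, 0.0), (1.0, 0.0), (0.0, 1.0)],
    "6": [(0.0, 1.0), (1.0, 1.0), (1.0, 1.0)],
}
r = recognize([(0.0, 1.0), (0.0, 1.0), (1.0, 1.0)], standards)
```

Because the alignment may stretch or compress time, an input spoken at a different tempo from the stored pattern can still match it closely.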

The recognition result r determined by recognition section 13 is sent over signal line r in FIG. 1 to output section 16 and recognition result display control section 14. Output section 16 outputs the recognition result signal r as the output signal x only while the end detection signal e1 from end detection section 12 is 1. No output x is therefore produced while the end detection signal e1 is 0. Recognition result display control section 14 sends the recognition result signal r over signal line d to recognition result display section 17 for display while the pause-interval detection signal e2 supplied by end detection section 12 is 1. Recognition result display section 17 consists of, for example, a light-emitting-diode character display; by displaying the character specified on signal line d, it promptly informs the speaker of the recognition result r.

As the pause interval continues and the end detection signal e1 eventually becomes 1, output section 16 outputs the recognition result signal r as the output signal x. This output signal x is sent to another device that is connected to and controlled by the speech recognition device described above (hereinafter called the target device).
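The two gates described above, display on e2 and output on e1, can be written as one small function. This is a sketch only: the name `route_result` and the use of `None` to stand for "nothing displayed" or "nothing output" are illustrative choices.

```python
def route_result(r, e2, e1):
    """Return (d, x): the value shown to the speaker and the value sent to the target device."""
    d = r if e2 == 1 else None   # display control section 14 passes r while e2 is 1
    x = r if e1 == 1 else None   # output section 16 passes r only while e1 is 1
    return d, x


during_speech = route_result("6", e2=0, e1=0)   # nothing shown, nothing sent
pause_started = route_result("6", e2=1, e1=0)   # shown for confirmation only
end_confirmed = route_result("6", e2=1, e1=1)   # shown and sent
```

Separating the two gates is what lets the display run ahead of the output: the speaker sees the result as soon as a pause starts, while the target device receives it only once the pause has lasted L frames.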

Input-ready state display section 15 and indicator lamp 18 let the speaker know when the input speech may be uttered. When the end detection signal e1 is 1, current is passed through signal line t and indicator lamp 18 is lit. When e1 is 0, no current is passed through signal line t and indicator lamp 18 is turned off. Indicator lamp 18 therefore stays off from the time the start of the speech is detected until its end is detected. When the end is detected, the lamp is lit again, indicating that the next input may be spoken.

With the operation described above, the final recognition result is fixed at the time point indicated by reference numeral 21 in FIG. 2, but it is actually displayed at the time point indicated by reference numeral 20. The time from time point 22, at which the pause interval is detected, to time point 20, at which the recognition result is displayed, can be made sufficiently short by speeding up the operation of recognition section 13, so the speaker in practice learns the recognition result and can confirm it almost as soon as the utterance ends. Moreover, the recognition result is output to the target device at time point 21, after the pause interval has continued long enough, so a pause interval within a word is not mistaken for the pause interval at the end of a word, the problem described at the outset.

FIG. 6 is a time chart that clarifies the operation when there is a pause interval within a word. When the digit "6" (/roku/) is spoken, the pause interval indicated by reference numeral 61 occurs between /o/ and /k/, so the pause-interval detection signal e2 becomes 1; in response, a recognition result r is generated and displayed as d. The input pattern of equation (4) held in recognition section 13 at this point extends only as far as /ro/, so "5" (/go/) may be displayed in error. However, this pause interval is shorter than the length L set in the end detection section (for example, 250 milliseconds), so the end detection signal e1 does not become 1 and the erroneous result is not output as the output signal x. When the utterance has been completed through /ku/, the pause interval at the end of the word, indicated by reference numeral 62, is detected, and by the same operation as above a recognition result r is determined and displayed as d. This time the whole word "6" has been input, so it is correctly recognized and displayed as "6". Only when pause interval 62 has continued for L or longer and the time point indicated by reference numeral 63 is reached does the end detection signal e1 become 1, and the output signal x is sent to the target device. With this operation an erroneous recognition result is displayed once during the utterance, but the speaker knows that the utterance is still in progress, so no particular problem arises as long as the speaker ignores the result displayed partway through.
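The sequence of events in FIG. 6 can be reproduced with a small frame-by-frame simulation of a single utterance. The sketch makes several assumptions for illustration: the amplitude values, the segment lengths, the function names, and above all `toy_recognizer`, a stand-in that answers "5" when only the first segment has been heard and "6" otherwise. A real recognition section would compare feature patterns instead.

```python
def toy_recognizer(speech_frames_held):
    """Hypothetical stand-in for recognition section 13."""
    return "5" if speech_frames_held <= 20 else "6"


def simulate(levels, theta_e=1.0, limit_frames=25):
    """Return (frame, event, result) tuples for one utterance."""
    events = []
    m = 0          # contents of the pause counter
    held = 0       # speech frames received so far
    prev_e2 = prev_e1 = 0
    result = None
    for i, level in enumerate(levels):
        e2 = 1 if level < theta_e else 0
        m = m + 1 if e2 else 0
        e1 = 1 if m >= limit_frames else 0
        if not e2:
            held += 1
        if e2 and not prev_e2 and held > 0:
            # A pause has just begun: decide at once and display the result.
            result = toy_recognizer(held)
            events.append((i, "display", result))
        if e1 and not prev_e1 and result is not None:
            # The pause has lasted L frames: release the result to the target device.
            events.append((i, "output", result))
        prev_e2, prev_e1 = e2, e1
    return events


# /ro/ for 200 ms, an 80 ms in-word pause, /ku/ for 150 ms, then 300 ms of silence.
levels = [10] * 20 + [0] * 8 + [10] * 15 + [0] * 30
events = simulate(levels)
```

In this trace the provisional "5" is displayed when the in-word pause begins but is never output, and "6" is displayed at the start of the final pause and output 25 frames into it.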

By displaying and outputting the recognition result in this way, the speaker can learn and confirm the recognition result promptly, and a pause interval in the middle of a word is never mistaken for the pause interval at the end of a word, so no recognition error results from it. As a result, the speech recognition device can be used quickly and smoothly.

The principle of the present invention has been described above on the basis of an embodiment, but this description does not limit the scope of the invention; in particular, various modifications of the end detection method used in the end detection section are conceivable. For example, a time window k − A ≤ i ≤ k covering roughly the past 250 milliseconds may be set, and the end detection signal generated at the time point k at which the sum of the amplitude levels within the window,

$$l_{k-A} + l_{k-A+1} + \cdots + l_k$$

becomes larger than a predetermined threshold $\theta'_e$. Because taking the sum weakens the influence of momentary noise to some extent, this method can be expected to operate more stably than the method described in the embodiment above. Alternatively, use may be made of the property that the zero-crossing count of the input speech wave is low in pause intervals: the amplitude level and the zero-crossing count are combined in an overall decision to detect pause intervals and the end.
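The windowed-sum variant can be sketched as follows. The text states the condition as the sum becoming larger than $\theta'_e$; since a pause is an interval of low amplitude, this sketch assumes instead that the end is signalled when the sum falls below the threshold. That reading, the function name, the window length A = 25 frames and the threshold value are all assumptions for illustration.

```python
def windowed_end_frames(levels, window_frames=25, theta_sum=5.0):
    """Return the frame indices k at which the sum l_{k-A} + ... + l_k is below theta_sum."""
    A = window_frames
    hits = []
    for k in range(A, len(levels)):
        total = sum(levels[k - A:k + 1])   # A + 1 frames ending at frame k
        if total < theta_sum:              # assumed reading of the condition
            hits.append(k)
    return hits


# 100 ms of speech, then silence containing a single noisy frame at index 30.
levels = [10] * 10 + [0] * 40
levels[30] = 2
hits = windowed_end_frames(levels)
```

In this example a per-frame counter with a threshold below 2 would restart at the noisy frame, whereas the windowed sum stays under its threshold, which illustrates the stabilising effect the text attributes to summation.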

In the embodiment above, the end detection signal e1, the pause-interval detection signal e2, and the like were described as switched DC signals, but the same operation can be achieved with pulse signals. It is also evident that the principle of the present invention can be carried out when the amplitude level signal $l_i$ and the like are processed as analog signals.

[Brief explanation of the drawing]

FIG. 1 is a block diagram showing one embodiment of the present invention, in which 1 is a microphone, 10 is an analysis section, 11 is a start detection section, 12 is an end detection section, 13 is a recognition section, 14 is a recognition result display control section, 15 is an input-ready state display section, 16 is an output section, 17 is a recognition result display section, and 18 is an indicator lamp.

FIG. 2 is a time chart for explaining the operation of the embodiment shown in FIG. 1, and FIG. 3 is a block diagram showing an example configuration of the analysis section. In FIG. 3, block 110, drawn with a broken line, is a band-pass filter analyzer, block 120 is an analog multiplexer, 130 is an A/D converter, 140 is a rectifier, 150 is a low-pass filter, and 160 is an A/D converter.

FIG. 4 is a diagram for explaining the operation of end detection section 12, and FIG. 5 is a block diagram showing an example configuration of end detection section 12. In FIG. 5, 51 is a comparison circuit, 52 is an AND circuit, 53 is a counting circuit, and 54 is a comparison circuit. FIG. 6 is a time chart showing part of the operation of the device of the embodiment of FIG. 1.

Claims

[Claims]

1. A speech recognition device comprising: an end detection section that generates a pause-interval detection signal indicating that a pause interval of the input speech has been detected and an end detection signal indicating that the end of the input speech has been detected; a recognition section that, treating the input speech as having ended at the time the pause-interval detection signal is generated, performs a recognition operation and outputs a recognition result signal; a recognition result display section that displays the recognition result signal; and an output section that outputs the recognition result signal at the time the end detection signal is generated.