JPH02275497A

JPH02275497A - Voice recognition device

Info

Publication number: JPH02275497A
Application number: JP1096706A
Authority: JP
Inventors: Shoichi Kamei; 亀井　正一
Original assignee: Sanyo Electric Co Ltd
Current assignee: Sanyo Electric Co Ltd
Priority date: 1989-04-17
Filing date: 1989-04-17
Publication date: 1990-11-09

Abstract

PURPOSE: To improve the recognition rate by performing matching in two stages, with the input speech that can be recognized at each stage restricted. CONSTITUTION: The device is equipped with standard pattern storage means 5 and 6, in which standard patterns classified into two groups are stored group by group, and with first-stage and second-stage matching means that perform similarity calculation. It is further provided with a control means 9 that, when the maximum similarity value obtained by the second-stage matching is smaller than a specific threshold, returns processing from the second-stage matching means to the first-stage matching means while the input pattern held in the input pattern storage means at that time is retained. Therefore, if a misrecognition occurs in the first step, the first-step speech recognition can be performed again within the second step, without going back from the second step to the first step. Consequently, a high recognition rate can be obtained.

Description

DETAILED DESCRIPTION OF THE INVENTION

(A) Field of Industrial Application

The present invention relates to a speech recognition device that recognizes speech in order to control a target electrical appliance.

(B) Prior Art

In recent years, as the recognition rate of speech recognition devices has improved, voice-controllable electronic devices, such as telephones capable of automatic dialing, have been coming into practical use (Japanese Unexamined Patent Publication No. 62-81152).

For example, in a voice-recognition auto-dial telephone, the most practical speech recognition device is one that adopts a two-stage recognition scheme: the first step recognizes the spoken dial destination name (a personal name, company name, etc.), and the second step recognizes a spoken command (dial, cancel, etc.).

That is, in two-stage recognition the telephone number of the destination is not dialed immediately upon recognizing the destination name uttered by the speaker. Instead, the recognition result is displayed or output as synthesized speech for the speaker to confirm, and the dial command is given by voice only when there is no misrecognition. Dialing errors caused by misrecognized speech can thus be prevented.

A pattern-matching speech recognition device that performs this two-stage recognition uses, in the first step, a group of standard patterns (the first group) for the words required only in that step, for example the spoken names of multiple dial destinations. In the second step, it uses a group of standard patterns for the command utterances (the second group) in order to recognize the multiple spoken commands.

The standard patterns of the first and second groups could instead be left unclassified and used as a single group in both the first and second steps. In that case, however, the amount of pattern matching (computing the error between the input speech pattern and each standard pattern) required in each step increases, and so does the probability of misrecognition. To avoid this, the standard speech patterns are divided into groups corresponding to the recognition processing of each step, as described above.
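The trade-off described above can be illustrated with a minimal sketch. The vocabulary entries and counts below are hypothetical and chosen only to show how grouping reduces the number of comparisons per step; apart from the words taken from the example later in the text, they do not come from the patent.

```python
# Hypothetical vocabularies; the entries and counts are illustrative only.
groups = {
    "destination_names": ["sanyo", "matsushita", "tanaka", "suzuki"],
    "commands": ["send", "candidate"],
}


def comparisons_per_step(groups, grouped):
    """Return how many standard patterns each step must be matched against."""
    total = sum(len(words) for words in groups.values())
    if grouped:
        # Each step searches only its own group.
        return {name: len(words) for name, words in groups.items()}
    # A single merged group is searched in full at every step.
    return {name: total for name in groups}


grouped_cost = comparisons_per_step(groups, grouped=True)
merged_cost = comparisons_per_step(groups, grouped=False)
```

With a merged group, the command step also has to reject every destination name, which is where the added risk of misrecognition comes from.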

(C) Problems to be Solved by the Invention

As described above, in a conventional speech recognition device that performs recognition in multiple steps, for example in an auto-dial telephone, if the dial destination name uttered in the first step is misrecognized, the speaker must utter the command "cancel" in the second step. Only on condition that this command is itself recognized correctly does the device return to the first step, where the speaker must utter the destination name again for recognition. This procedure for re-entering speech is very cumbersome and places a heavy burden on the speaker.

The present invention has been made to eliminate this drawback. It realizes a speech recognition device in which, when a misrecognition occurs in the first step, the first-step speech recognition can be performed again within the second step, without returning from the second step to the first step.

(D) Means for Solving the Problems

The speech recognition device of the present invention comprises: standard pattern storage means storing standard patterns classified into at least two groups, group by group; input pattern storage means storing the input pattern of the most recent input speech; first-stage matching means that calculates similarity by pattern matching between each standard pattern of the first group in the standard pattern storage means and the input speech pattern in the input pattern storage means; second-stage matching means that calculates similarity by pattern matching between each standard pattern of the second group in the standard pattern storage means and the input speech pattern in the input pattern storage means; notification means that reports the speech recognized by the first-stage matching means; and control means that, when the maximum similarity value obtained in the second-stage matching is smaller than a predetermined threshold, returns processing from the second-stage matching means to the first-stage matching means while the input pattern held in the input pattern storage means at that time is retained.

(E) Operation

The speech recognition device of the present invention performs at least two stages of matching, with the input speech that can be recognized at each stage restricted. When the speaker finds that the speech recognized in the first-stage matching was misrecognized and utters the misrecognized word again, the maximum similarity value obtained in the subsequent second-stage matching falls below the predetermined threshold. With the input pattern in the input pattern storage means retained, processing can then be returned automatically from the second-stage matching to the preceding first-stage matching.
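The return condition can be written compactly. The symbols below are notation introduced here for illustration and do not appear in the patent: x is the stored input pattern, G_1 and G_2 are the first and second groups of standard patterns, S is the similarity measure, θ is the predetermined threshold, and ŷ is the recognition result. The text does not say what happens when the maximum similarity exactly equals θ; treating that case as accepted is an assumption.

```latex
\hat{y} =
\begin{cases}
\arg\max_{c \in G_2} S(x, c) & \text{if } \max_{c \in G_2} S(x, c) \ge \theta, \\
\arg\max_{n \in G_1} S(x, n) & \text{if } \max_{c \in G_2} S(x, c) < \theta.
\end{cases}
```

The same x appears in both branches, which reflects the requirement that the input pattern be retained rather than spoken again.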

(F) Embodiment

FIG. 1 shows the configuration of the speech recognition device of the present invention.

The device shown in the figure has the following basic configuration: an input unit (1) for inputting speech; a preprocessing unit (2) that extracts feature parameters from the input speech; two groups of standard patterns prepared in advance, namely standard patterns (5) that are matched against the first input speech and standard patterns (6) that are matched against the second input speech; and an identification unit (4) that calculates the distance (equivalent to error, and inversely related to similarity) between these standard patterns and the input pattern extracted by the preprocessing unit (2), and outputs the minimum-distance pattern as the recognition result.
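The role of the identification unit (4) can be sketched as a nearest-pattern search. This is a minimal illustration under stated assumptions: the patent does not specify the feature representation or the distance measure, so the fixed-length feature vectors, the Euclidean distance, and the names `distance`, `identify`, and `standard_patterns_5` are all hypothetical.

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length feature vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def identify(input_pattern, standard_patterns):
    """Return (label, distance) of the standard pattern closest to the input."""
    best_label = min(
        standard_patterns,
        key=lambda label: distance(input_pattern, standard_patterns[label]),
    )
    return best_label, distance(input_pattern, standard_patterns[best_label])


# Toy two-dimensional "feature vectors"; real speech patterns would be far richer.
standard_patterns_5 = {"sanyo": [1.0, 0.0], "matsushita": [0.0, 1.0]}
label, dist = identify([0.9, 0.1], standard_patterns_5)
```

Returning the distance along with the label matters because the distance comparison unit (8) needs that value, not just the winning pattern.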

The configuration of the embodiment shown in the figure is described in detail below.

First, when the first speech is input to the input unit (1), the preprocessing unit (2) extracts its feature parameters, the identification unit (4) calculates the distances to the standard patterns (5), and the recognition result is stored in the result storage unit (7). The control unit (9) then puts the device into a state of waiting for the second speech input, which is for a voice command. When the second speech is input to the input unit (1), the preprocessing unit (2) extracts its feature parameters and the resulting feature pattern is stored in the input pattern storage unit (3). The identification unit (4) then calculates the distances to the standard patterns (6), and the distance comparison unit (8) compares the resulting matching distance with a predetermined threshold. The smaller the distance, the greater the similarity. Accordingly, if this distance is larger than the threshold (that is, if the similarity is smaller than a predetermined value), a recognition-result-invalid signal is sent to the control unit (9). On receiving it, the control unit (9) sends an erase signal to the result storage unit (7) to erase the recognition result for the first input speech. It also sends an input-pattern output signal to the input pattern storage unit (3); the identification unit (4) then calculates the distances between that pattern and the standard patterns (5), and the result is stored in the result storage unit (7).

If, on the other hand, the distance is smaller than the threshold (that is, if the similarity is larger than the predetermined value), the recognition result for the second speech is sent to the control unit (9), which outputs the corresponding control signal.
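The signal flow just described can be sketched as a small state-holding class. This is not the patent's implementation: the class and method names, the Euclidean distance, the tuple feature vectors, and the toy numbers are all assumptions made for illustration. The text also does not say how a distance exactly equal to the threshold is handled; the sketch treats it as accepted.

```python
import math


def nearest(pattern, templates):
    """Return (label, distance) of the closest template (Euclidean distance assumed)."""
    scored = {
        label: math.sqrt(sum((p - r) ** 2 for p, r in zip(pattern, ref)))
        for label, ref in templates.items()
    }
    label = min(scored, key=scored.get)
    return label, scored[label]


class TwoStageRecognizer:
    def __init__(self, names, commands, threshold):
        self.names = names          # standard patterns (5)
        self.commands = commands    # standard patterns (6)
        self.threshold = threshold  # used by the distance comparison (8)
        self.result = None          # result storage (7)
        self.input_pattern = None   # input pattern storage (3)

    def first_input(self, pattern):
        """Recognize a destination name and store the result."""
        self.result, _ = nearest(pattern, self.names)
        return self.result

    def second_input(self, pattern):
        """Recognize a command, or fall back to name recognition on a poor match."""
        self.input_pattern = pattern  # retained for possible reuse
        command, dist = nearest(pattern, self.commands)
        if dist <= self.threshold:  # equality accepted: an assumption
            return ("command", command)
        # Poor command match: erase the old result, then rematch the stored
        # pattern against the destination names without new speech input.
        self.result = None
        self.result, _ = nearest(self.input_pattern, self.names)
        return ("name", self.result)


names = {"sanyo": (1.0, 0.0), "matsushita": (0.0, 1.0)}
commands = {"send": (5.0, 5.0), "candidate": (-5.0, -5.0)}
rec = TwoStageRecognizer(names, commands, threshold=1.0)
first = rec.first_input((0.4, 0.6))    # toy input that lands nearer "matsushita"
second = rec.second_input((0.9, 0.1))  # name repeated while a command is awaited
third = rec.second_input((5.1, 4.9))   # a clear "send" command
```

In this toy run the repeated name is far from both command patterns, so it is rematched against the destination names and replaces the earlier result, after which the "send" command is accepted normally.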

Next, an example in which the speech recognition device of the present invention is applied to an auto-dial telephone is described below.

First, the many dial destination names listed in the table below are prepared as the first-group standard patterns in the first standard pattern memory (5), and the two voice commands listed in the table below are prepared as the second-group standard patterns in the second standard pattern memory (6).

Table: Speech registered as standard patterns

In the table above, the voice command "soushutsu" means "send" (send out the dial signal corresponding to the dial destination name), and "kouho" means "candidate" (output the next candidate of the recognition result).

First, suppose that in the first step the speaker utters the first speech (the other party, that is, the dial destination name) "Sanyo", and that recognition (distance calculation) using the first-group standard patterns in the first standard pattern memory (5) wrongly yields "Matsushita". In the next step the speaker could utter the second speech (a voice command) "candidate", which is recognized using the second-group standard patterns in the second standard pattern memory (6), so that the next candidate is output. However, the intended "Sanyo" may be slow to appear among the lower-ranked candidates. In such a case it is troublesome to return to the first step with an instruction word such as "cancel", as in the conventional device, wait for the first speech input, and utter the dial destination name again.

In the present invention, if the speaker simply utters "Sanyo" while the device is waiting for a voice command in the second step, the pattern of this input speech is stored in the input pattern storage unit (3) and matched against the "send" and "candidate" patterns of the second-group standard patterns. Because the matching distance exceeds the predetermined value, the target standard patterns are switched from the second-group standard patterns in the second standard pattern memory (6) to the first-group standard patterns in the first standard pattern memory (5), matching is performed again, and the result is output. Since there are only two target words for voice commands, the threshold can be set strictly (that is, small).

(G) Effects of the Invention

According to the speech recognition device of the present invention, grouping the standard patterns by recognition target words makes it possible to obtain a high recognition rate even when the recognition threshold is made strict. In addition, when performing two-stage control consisting of recognition for vocabulary selection and recognition for voice commands, no voice command is needed to select in advance which of the different standard patterns to match against, so the burden on the user is reduced.

[Brief explanation of drawings]

FIG. 1 is a block diagram of the speech recognition device of the present invention.

(1): input unit, (2): preprocessing unit, (3): input pattern storage unit, (4): identification unit, (5): first standard pattern memory, (6): second standard pattern memory, (7): result storage unit, (8): distance comparison unit, (9): control unit.

Claims

(1) A speech recognition device that compares an input speech pattern with a large number of standard speech patterns prepared in advance and recognizes the speech of the most similar standard pattern as the input speech, the device comprising: standard pattern storage means storing standard patterns classified into at least two groups, group by group; input pattern storage means storing the input pattern of the most recent input speech; first-stage matching means that calculates similarity by pattern matching between each standard pattern of the first group in the standard pattern storage means and the input speech pattern in the input pattern storage means; second-stage matching means that calculates similarity by pattern matching between each standard pattern of the second group in the standard pattern storage means and the input speech pattern in the input pattern storage means; notification means that reports the speech recognized by the first-stage matching means; and control means that, when the maximum similarity value obtained in the second-stage matching is smaller than a predetermined threshold, returns processing from the second-stage matching means to the first-stage matching means while the input pattern held in the input pattern storage means at that time is retained.