JPH02275497A - Voice recognition device - Google Patents

Voice recognition device

Info

Publication number
JPH02275497A
Authority
JP
Japan
Prior art keywords
input
pattern
matching
voice
speech
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP1096706A
Other languages
Japanese (ja)
Inventor
Shoichi Kamei
亀井 正一
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sanyo Electric Co Ltd
Original Assignee
Sanyo Electric Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sanyo Electric Co Ltd
Priority to JP1096706A
Publication of JPH02275497A
Priority to US07/896,414 (US5301227A)
Legal status: Pending

Abstract

PURPOSE: To improve the recognition rate by performing matching in two stages, with the input speech that each stage can recognize restricted to that stage's vocabulary.
CONSTITUTION: The device has standard pattern storage means 5 and 6, which hold standard patterns classified into two groups and stored group by group, and first-stage and second-stage matching means that perform similarity calculation. It also has a control means 9 which, when the maximum similarity value obtained in the second-stage matching is smaller than a specified threshold, returns processing from the second-stage matching means to the first-stage matching means while keeping the input pattern currently held in the input pattern storage means. Consequently, if misrecognition occurs in the first step, first-step recognition can be performed again from within the second step, without the user explicitly moving back from the second step to the first. A high recognition rate is thereby obtained.

Description

DETAILED DESCRIPTION OF THE INVENTION
(A) Field of Industrial Application
The present invention relates to a voice recognition device that recognizes speech in order to control a target electrical appliance.

(B) Prior Art
In recent years, as the recognition rate of voice recognition devices has improved, voice-controlled electronic devices such as telephones with automatic dialing have been coming into practical use (Japanese Patent Laid-Open No. 62-81152).

For example, for a voice-recognition auto-dial telephone, the most practical voice recognition device is one that adopts a two-stage recognition scheme: the first step recognizes the dial destination name (a personal name, company name, etc.), and the second step recognizes a command utterance (dial, cancel, etc.).

That is, in two-stage recognition the telephone number of the destination is not dialed immediately on the basis of the recognized destination name. Instead, the recognition result is shown on a display or read back as synthesized speech so that the speaker can confirm it, and the dial command is given by voice only when there is no misrecognition. Dialing errors caused by misrecognition can thus be prevented.

A pattern-matching voice recognition device that performs such two-stage recognition uses, in the first step, a group of standard patterns for the dial destination names (the first group) in order to recognize the words required only in that step, for example the utterances of multiple dial destination names. In the second step it uses a group of standard patterns for the command utterances (the second group) in order to recognize multiple command utterances.

The standard patterns of the first and second groups could instead be left unclassified and used as a single group for both recognition steps. In that case, however, the amount of pattern matching (calculating the error between the input speech pattern and each standard pattern) required in each step grows, and the probability of misrecognition rises. To avoid this, the standard speech patterns are divided into groups corresponding to each recognition step, as described above.

(C) Problems to Be Solved by the Invention
As described above, in a conventional voice recognition device that performs recognition in multiple steps, for example in an auto-dial telephone, if the destination name spoken in the first step is misrecognized, the speaker must utter the command "cancel" in the second step, have it recognized correctly so that the device returns to the first step, and then speak the destination name again for recognition. This procedure for re-entering speech is very cumbersome and places a heavy burden on the speaker.

The present invention has been made to eliminate this drawback. It provides a voice recognition device in which, when misrecognition occurs in the first step, first-step recognition can be performed again from within the second step, without an explicit return from the second step to the first.

(D) Means for Solving the Problems
The voice recognition device of the present invention comprises: standard pattern storage means that stores standard patterns, classified into at least two groups, group by group; input pattern storage means that stores the input pattern of the most recent input speech; first-stage matching means that calculates similarity by pattern matching between each standard pattern of the first group in the standard pattern storage means and the input speech pattern in the input pattern storage means; second-stage matching means that calculates similarity by pattern matching between each standard pattern of the second group in the standard pattern storage means and the input speech pattern in the input pattern storage means; notification means that reports the speech recognized by the first-stage matching means; and control means that, when the maximum similarity value obtained in the second-stage matching is smaller than a predetermined threshold, returns processing from the second-stage matching means to the first-stage matching means while retaining the input pattern held in the input pattern storage means at that time.

(E) Operation
The voice recognition device of the present invention performs at least two stages of matching, in each of which the input speech that can be recognized is restricted. When the speaker finds that the speech recognized in the first-stage matching was misrecognized and simply speaks the misrecognized word again, the maximum similarity value obtained in the following second-stage matching falls below the predetermined threshold. The device can therefore return automatically from the second-stage matching to the preceding first-stage matching while retaining the input pattern held in the input pattern storage means.

(F) Embodiment
FIG. 1 shows the configuration of the voice recognition device of the present invention.

The device shown in FIG. 1 basically consists of: an input unit (1) that receives speech; a preprocessing unit (2) that extracts feature parameters from the input speech; two groups of standard patterns prepared in advance, namely standard patterns (5) matched against the first input speech and standard patterns (6) matched against the second input speech; and an identification unit (4) that calculates the distance (equivalent to the error, and inversely related to the similarity) between these standard patterns and the input pattern extracted by the preprocessing unit (2), and outputs the pattern with the minimum distance as the recognition result.
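The identification unit's rule of picking the standard pattern at minimum distance can be sketched as follows. This is only an illustration under stated assumptions, not the patent's implementation: the toy feature vectors, the Euclidean distance, and the mapping `similarity = 1 / (1 + distance)` are all hypothetical, chosen only because the text says that distance and similarity are inversely related.

```python
import math

def distance(a, b):
    """Euclidean distance between two equal-length feature vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def similarity(d):
    """Hypothetical reciprocal mapping: a smaller distance gives a larger similarity."""
    return 1.0 / (1.0 + d)

def identify(input_pattern, standard_patterns):
    """Return (label, distance) of the standard pattern closest to the input."""
    return min(
        ((label, distance(input_pattern, ref)) for label, ref in standard_patterns.items()),
        key=lambda pair: pair[1],
    )

# Toy standard patterns; the labels and vectors are made up for illustration.
group = {"name_a": [1.0, 0.0], "name_b": [0.0, 1.0]}
label, d = identify([0.9, 0.1], group)
print(label, round(d, 3), round(similarity(d), 3))
```

Because any such mapping is monotonically decreasing, "minimum distance larger than a threshold" and "maximum similarity smaller than a threshold" describe the same rejection condition, which is why the text can use the two phrasings interchangeably.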

The configuration of the embodiment shown in the figure is described in more detail below.

First, when the first utterance is input to the input unit (1), the preprocessing unit (2) extracts its feature parameters, the identification unit (4) calculates the distances to the standard patterns (5), and the recognition result is stored in the result storage unit (7). The control unit (9) then puts the device into a state of waiting for the second utterance, which is the voice command. When the second utterance is input to the input unit (1), the preprocessing unit (2) extracts its feature parameters and the resulting feature pattern is stored in the input pattern storage unit (3). The identification unit (4) then calculates the distances to the standard patterns (6), and the distance comparison unit (8) compares the resulting matching distance with a predetermined threshold. The smaller the distance, the greater the similarity. Accordingly, when this distance is larger than the threshold (that is, when the similarity is smaller than a predetermined value), a recognition-result-invalid signal is issued to the control unit (9). On receiving it, the control unit (9) sends an erase signal to the result storage unit (7) to erase the recognition result for the first utterance. It also sends an input-pattern output signal to the input pattern storage unit (3), whereupon the identification unit (4) calculates the distances between the stored input pattern and the standard patterns (5) and stores the result in the result storage unit (7).

Conversely, when the distance is smaller than the threshold (that is, when the similarity is larger than the predetermined value), the recognition result for the second utterance is sent to the control unit (9), which outputs the corresponding control signal.
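The control flow described above, including the automatic fallback to the first group, can be sketched as a small class. This is a minimal sketch under assumptions: the class and method names, the Euclidean distance, the toy vectors, and the threshold value are all hypothetical and are not taken from the patent; only the order of operations follows the text.

```python
import math

def distance(a, b):
    """Euclidean distance between feature vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest(pattern, group):
    """Return (label, distance) of the closest standard pattern in a group."""
    return min(((k, distance(pattern, v)) for k, v in group.items()),
               key=lambda kv: kv[1])

class TwoStageRecognizer:
    """Illustrative model of units (3) to (9); all names here are hypothetical."""

    def __init__(self, names, commands, threshold):
        self.names = names          # standard patterns (5), first group
        self.commands = commands    # standard patterns (6), second group
        self.threshold = threshold  # used by the distance comparison unit (8)
        self.input_pattern = None   # input pattern storage unit (3)
        self.result = None          # result storage unit (7)

    def first_input(self, pattern):
        """First utterance: match against group 1 and store the result."""
        self.result, _ = nearest(pattern, self.names)
        return self.result

    def second_input(self, pattern):
        """Second utterance: try group 2, fall back to group 1 if rejected."""
        self.input_pattern = pattern                  # keep the pattern for reuse
        command, d = nearest(pattern, self.commands)
        if d > self.threshold:                        # recognition result invalid
            self.result = None                        # erase the first result
            self.result, _ = nearest(self.input_pattern, self.names)
            return ("name", self.result)
        return ("command", command)

# Toy data: name_b lies close to name_a, so a sloppy utterance is misrecognized.
rec = TwoStageRecognizer(
    names={"name_a": [1.0, 0.0, 0.0], "name_b": [0.8, 0.3, 0.0]},
    commands={"command_x": [0.0, 0.0, 1.0], "command_y": [0.0, 1.0, 1.0]},
    threshold=0.5,
)
print(rec.first_input([0.85, 0.25, 0.0]))   # intended name_a, recognized as name_b
print(rec.second_input([0.98, 0.02, 0.0]))  # name repeated in the command stage
print(rec.second_input([0.0, 0.1, 0.9]))    # a genuine command utterance
```

In this toy run the repeated name is far from both command patterns, so the second call is rejected by the threshold and re-matched against the first group without any new speech input. The stored pattern is reused, which is the point of keeping the input pattern storage unit (3).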

Next, an example in which the voice recognition device of the present invention is applied to an auto-dial telephone is described below.

First, a large number of dial destination names are prepared as the first-group standard patterns in the first standard pattern memory (5), and the two voice command words in the table below are prepared as the second-group standard patterns in the second standard pattern memory (6).

Table: Speech registered as standard patterns
In the table above, the voice command "soushutsu" means "send" (dial the number corresponding to the dial destination name), and "kouho" means "candidate" (output the next candidate of the recognition result).

Suppose that in the first step the speaker utters the first speech (the other party, that is, the dial destination name) "Sanyo", and that recognition (distance calculation) using the first-group standard patterns in the first standard pattern memory (5) wrongly yields "Matsushita". In the next step the speaker can utter the second speech (the voice command) "candidate", which is recognized using the second-group standard patterns in the second standard pattern memory (6), so that the next candidate is output. However, the intended "Sanyo" may be slow to appear among the lower-ranked candidates. In such a case it is troublesome to return to the first step with an instruction word such as "cancel", as in the conventional device, wait for the first speech input, and speak the destination name again.
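The "candidate" command can be pictured as stepping through the first-stage results in order of increasing distance. The sketch below assumes the first-stage distances are already available and that candidates are simply ranked by distance; the patent does not say how the ranking is stored, and the helper names, the extra destination names, and the distance values are made up for illustration.

```python
def rank_candidates(distances):
    """Sort destination names from best (smallest distance) to worst."""
    return [name for name, _ in sorted(distances.items(), key=lambda kv: kv[1])]

class CandidateList:
    """Hypothetical helper that yields the next-best candidate on request."""

    def __init__(self, distances):
        self.ranked = rank_candidates(distances)
        self.index = 0

    def current(self):
        """Return the candidate currently presented to the speaker."""
        return self.ranked[self.index]

    def next_candidate(self):
        """Advance to the next candidate; stay on the last one when exhausted."""
        if self.index < len(self.ranked) - 1:
            self.index += 1
        return self.current()

# Made-up first-stage distances for an utterance intended as "sanyo".
distances = {"matsushita": 0.21, "sanyo": 0.45, "tanaka": 0.30, "suzuki": 0.38}
candidates = CandidateList(distances)
steps = [candidates.current()] + [candidates.next_candidate() for _ in range(3)]
print(steps)
```

With these made-up distances the intended name is ranked last, so the speaker would have to say "candidate" three times before it appears. This is the situation the text describes as the intended name being slow to emerge from the lower-ranked candidates.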

In the present invention, if the speaker simply says "Sanyo" while the device is waiting for a voice command in the second step, the pattern of this input speech is stored in the input pattern storage unit (3) and matched against the "send" and "candidate" patterns of the second-group standard patterns. Because the matching distance exceeds the predetermined value, the target standard patterns are switched from the second-group standard patterns in the second standard pattern memory (6) to the first-group standard patterns in the first standard pattern memory (5), matching is performed again, and the result is output. Since there are only two target words for voice commands, the threshold can be set strictly (that is, to a small value).

(G) Effects of the Invention
According to the voice recognition device of the present invention, grouping the standard patterns by recognition target words makes it possible to obtain a high recognition rate even when the recognition threshold is set strictly. In addition, when performing two-stage control consisting of recognition for vocabulary selection and recognition for voice commands, the user does not need to issue a voice command to select in advance which set of standard patterns to match against, so the burden on the user is reduced.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram of the voice recognition device of the present invention.
(1): input unit, (2): preprocessing unit, (3): input pattern storage unit, (4): identification unit, (5): first standard pattern memory, (6): second standard pattern memory, (7): result storage unit, (8): distance comparison unit, (9): control unit.

Claims (1)

[Claims]
(1) A voice recognition device that compares an input speech pattern with a large number of standard speech patterns prepared in advance and recognizes the speech of the most similar standard pattern as the input speech, comprising: standard pattern storage means that stores standard patterns, classified into at least two groups, group by group; input pattern storage means that stores the input pattern of the most recent input speech; first-stage matching means that calculates similarity by pattern matching between each standard pattern of the first group in the standard pattern storage means and the input speech pattern in the input pattern storage means; second-stage matching means that calculates similarity by pattern matching between each standard pattern of the second group in the standard pattern storage means and the input speech pattern in the input pattern storage means; notification means that reports the speech recognized by the first-stage matching means; and control means that, when the maximum similarity value obtained in the second-stage matching is smaller than a predetermined threshold, returns processing from the second-stage matching means to the first-stage matching means while retaining the input pattern held in the input pattern storage means at that time.
JP1096706A 1989-04-17 1989-04-17 Voice recognition device Pending JPH02275497A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
JP1096706A JPH02275497A (en) 1989-04-17 1989-04-17 Voice recognition device
US07/896,414 US5301227A (en) 1989-04-17 1992-06-10 Automatic dial telephone

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP1096706A JPH02275497A (en) 1989-04-17 1989-04-17 Voice recognition device

Publications (1)

Publication Number Publication Date
JPH02275497A true JPH02275497A (en) 1990-11-09

Family

ID=14172199

Family Applications (1)

Application Number Title Priority Date Filing Date
JP1096706A Pending JPH02275497A (en) 1989-04-17 1989-04-17 Voice recognition device

Country Status (1)

Country Link
JP (1) JPH02275497A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2000348187A (en) * 1999-03-29 2000-12-15 Sony Corp Method and device for picture processing and recording medium
JP4496595B2 (en) * 1999-03-29 2010-07-07 ソニー株式会社 Image processing apparatus, image processing method, and recording medium

Similar Documents

Publication Publication Date Title
US5719921A (en) Methods and apparatus for activating telephone services in response to speech
JP3150085B2 (en) Multi-modal telephone
JP3168033B2 (en) Voice telephone dialing
US5917889A (en) Capture of alphabetic or alphanumeric character strings in an automated call processing environment
JP3204632B2 (en) Voice dial server
CA2058644C (en) Voice activated telephone set
JPH02275497A (en) Voice recognition device
JPH03248199A (en) Voice recognition system
JPS5823097A (en) Voice recognition apparatus
US6801890B1 (en) Method for enhancing recognition probability in voice recognition systems
JPH0432900A (en) Sound recognizing device
JP2788658B2 (en) Voice dialing device
JPS6361300A (en) Voice recognition system
JP3112556B2 (en) Voice dialer
KR0173914B1 (en) Name Search Method in Voice Dialing System
KR100230972B1 (en) Voice cognition service apparatus of full electronic exchange
JPS638798A (en) Voice recognition equipment
JPS605337A (en) Voice inputting system
JPH03180897A (en) Voice recognition device
JPS63259599A (en) Voice recognition equipment
JPH02278297A (en) Voice recognizing device
JP2005159395A (en) System for telephone reception and translation
JPH0197044A (en) Voice dialing device
JPH03153300A (en) Voice input device
JPH01293397A (en) Speech answer system