JPH0421880B2 - Voice recognition system - Google Patents

Info

Publication number
JPH0421880B2
JPH0421880B2
Authority
JP
Japan
Prior art keywords
standard patterns
recognition
standard
continuous
speech
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Lifetime
Application number
JP57071225A
Other languages
Japanese (ja)
Other versions
JPS58189694A (en)
Inventor
Hiroshi Ichikawa
Nobuo Hataoka
Toshihiro Kimura
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hitachi Ltd
Original Assignee
Hitachi Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hitachi Ltd filed Critical Hitachi Ltd
Priority to JP57071225A priority Critical patent/JPS58189694A/en
Publication of JPS58189694A publication Critical patent/JPS58189694A/en
Publication of JPH0421880B2 publication Critical patent/JPH0421880B2/ja
Granted legal-status Critical Current

Description

DETAILED DESCRIPTION OF THE INVENTION

The present invention relates to a speech recognition device, and more particularly to an improved device for recognizing continuous word speech uttered by unspecified speakers.

Conventional methods for recognizing speech uttered by unspecified speakers include a learning method, which examines the characteristics of the input speech and deforms the standard patterns to match them; conversely, a normalization method, which deforms the input speech to match the standard patterns; a multi-standard method, which estimates in advance the range of variation in speech caused by differences between speakers and places a large number of standard patterns across that range; and a discriminant-function method combined with a suitable preprocessing technique. Of these, only the multi-standard method and the discriminant-function method currently offer recognition performance at a practical level. Moreover, when the capability is to be extended to continuous word recognition, the multi-standard method is virtually the only realistic approach.

In continuous word recognition, however, the number of possible word-sequence combinations means that the amount of processing required for recognition increases greatly even when techniques such as the two-stage DP method or the continuous DP method are used. Consequently, a system that performs speaker-independent continuous word recognition with the multi-standard method as it stands requires a very large pattern matching section and the like, and is unrealistic from an economic standpoint.
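To make the cost argument concrete, the sketch below shows dynamic-programming (DTW-style) matching of an input pattern against a store of standard patterns: one full DP run is needed per stored pattern, so the matching load grows directly with the size of a multi-standard pattern store. This is only an illustrative sketch under simple assumptions (fixed-length feature frames, a symmetric step pattern, Euclidean frame distance); it is not the two-stage DP or continuous DP formulation referred to above.

    import numpy as np

    def dtw_distance(inp, ref):
        # inp, ref: arrays of shape (frames, features)
        n, m = len(inp), len(ref)
        d = np.full((n + 1, m + 1), np.inf)
        d[0, 0] = 0.0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                cost = np.linalg.norm(inp[i - 1] - ref[j - 1])   # frame-to-frame distance
                d[i, j] = cost + min(d[i - 1, j], d[i, j - 1], d[i - 1, j - 1])
        return d[n, m] / (n + m)                                 # length-normalized score

    def best_match(inp, standard_patterns):
        # standard_patterns: dict mapping a label to one reference pattern;
        # one full DTW run per stored pattern, so cost grows linearly with the store size.
        return min(standard_patterns,
                   key=lambda label: dtw_distance(inp, standard_patterns[label]))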

The object of the present invention is to overcome these problems.

The invention takes note of the fact that, in any given usage situation, the person speaking into the speech recognition device cannot change during use, for example from male to female or from an adult to a child.

Specifically, in the present invention the recognition device operates in two states: a state for recognizing discretely uttered speech (state 1), which requires relatively little recognition processing, and a state for recognizing continuously uttered speech (state 2), which requires a large amount of processing. The input speech is first recognized in state 1 to narrow down the speaker's category, and recognition in state 2 is then performed using only the set of standard patterns common to that category, thereby reducing the processing load in state 2.
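As a rough illustration of this two-state control, the sketch below first matches a discrete utterance against every group's patterns, fixes the speaker group from the best match, and then restricts continuous matching to that group. The group labels, the pattern-store layout, and the matcher interface (distance, match_sequence) are assumptions made only for this example and do not appear in the patent.

    class TwoStateRecognizer:
        def __init__(self, pattern_store, matcher):
            # pattern_store: {group: {word: [pattern, ...]}}, e.g. groups "male"/"female"/"child"
            self.store = pattern_store
            self.matcher = matcher
            self.group = None                          # unknown until state 1 completes

        def recognize_discrete(self, speech, vocabulary):
            # State 1: small vocabulary, but every group's patterns take part in matching.
            candidates = [(g, w, p)
                          for g, words in self.store.items()
                          for w, pats in words.items() if w in vocabulary
                          for p in pats]
            best_g, best_w, _ = min(((g, w, self.matcher.distance(speech, p))
                                     for g, w, p in candidates), key=lambda t: t[2])
            self.group = best_g                        # speaker category is now fixed
            return best_w

        def recognize_continuous(self, speech, vocabulary):
            # State 2: only the identified group's patterns are used, cutting the match count.
            assert self.group is not None, "state 1 must run first"
            restricted = {w: self.store[self.group][w] for w in vocabulary}
            return self.matcher.match_sequence(speech, restricted)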

The present invention will now be described with reference to an embodiment.

Fig. 1 shows an example configuration of a telephone information service system to which the present invention is applied. It consists of a system control unit 1, a speech recognition unit 2 according to the present invention, a voice response unit 3, a hybrid coil 4, and a subscriber telephone 5; only the parts of the telephone information service system needed to explain the present invention are shown.

Fig. 2 shows an example configuration of a speech recognition device (speech recognition unit 2) used to explain the present invention. In Fig. 2, the control unit 21 exchanges commands and results 27 with the system control unit 1 of Fig. 1 and also controls the speech recognition unit 2. For the input speech analyzed by the analysis unit 22, the similarity to the standard pattern data in the standard pattern memory 24 is computed by the similarity computation unit 23, and the continuous pattern matching unit 25 computes the optimum matching value against each standard pattern. The result is evaluated by the decision unit 26, and the decision result is sent to the control unit 21. The configuration of a recognition device that performs continuous pattern matching is already known (see Japanese Patent Laid-Open No. 55-2205), so its description is omitted here. In this example device the input pattern is always matched against the designated standard patterns, so if it is known in advance that the input is a discrete utterance, the output of the matching unit need only be evaluated by the decision unit 26 as a discretely uttered word, whereas for continuous word input it is evaluated as a sequence of words; the operation of the continuous pattern matching unit 25 is the same in both cases. The following description applies equally to a device in which the operation of the matching unit 25 is switched between discrete utterances and continuous utterances (see, for example, Japanese Patent Application No. 55-158296).
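The data flow just described can be summarized schematically as below, with the block numbers of Fig. 2 kept as comments. The analyzer, matcher, and judge objects and their method names are placeholders assumed for the sketch; only the ordering of the stages follows the description.

    def recognize(speech, standard_patterns, analyzer, matcher, judge):
        # standard_patterns: {name: pattern} -- only the patterns the control unit 21 has selected
        features = analyzer.analyze(speech)                    # analysis unit 22
        results = {}
        for name, pattern in standard_patterns.items():        # standard pattern memory 24
            sim = matcher.similarity(features, pattern)        # similarity computation unit 23
            results[name] = matcher.optimal_match(sim)         # continuous pattern matching unit 25
        return judge.decide(results)                           # decision unit 26 -> control unit 21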

Suppose now that the registered vocabulary consists of the words "yes" and "no" and the digits 0 through 9, and that, to allow for differences between speakers, five standard patterns are registered for each word and digit for each of the groups male, female, and child, i.e. 3 x 5 = 15 patterns per word.

Taking a bank balance inquiry as an example and returning to Fig. 1, when a call from a user enters the system, the voice response unit 3 first asks the user "Is this a balance inquiry?", and the speech recognition unit 2, under a command from the system control unit 1, waits for input in a mode (state 1) that recognizes the two words "yes" and "no" as discrete input. When the user answers "yes" or "no", the speech recognition unit 2 need only match the input against the 15 standard patterns for each of the two words "yes" and "no", i.e. 30 patterns in total. If, as a result, the best-matching standard pattern belongs to the male group (or the female or child group), the control unit 21 issues a control command so that thereafter, in state 2 (the continuous word recognition state), only the digit standard patterns belonging to the male (or female, or child) group are used. In the next step, the voice response device (voice response unit 3) prompts the user by voice with "Please enter your personal identification number", and the speech recognition unit 2 enters state 2, in which continuous digits can be recognized. When the user speaks a personal identification number such as "1234", the speech recognition unit 2 need only match the input against the 10 x 5 = 50 digit standard patterns belonging to the male (or female, or child) group. Accordingly, the matching capacity of the speech recognition unit 2 need cover at most 50 standard patterns, whereas recognition without narrowing down the group in state 1 would require matching against 10 x 15 = 150 standard patterns.
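The pattern counts in this example can be reproduced directly; the figures (3 speaker groups, 5 patterns per word per group, a two-word state-1 vocabulary, ten digits) come from the description above, while the variable names are arbitrary.

    groups, patterns_per_word = 3, 5
    yes_no_words, digit_words = 2, 10

    state1_matches = yes_no_words * groups * patterns_per_word        # 2 * 15  = 30
    state2_restricted = digit_words * patterns_per_word               # 10 * 5  = 50
    state2_unrestricted = digit_words * groups * patterns_per_word    # 10 * 15 = 150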

As described above, the present invention makes it possible to realize, economically, a system that recognizes continuously uttered speech from unspecified speakers, and its effect is therefore significant.

BRIEF DESCRIPTION OF THE DRAWINGS

Fig. 1 shows an example configuration of a telephone information service system to which the present invention is applied, and Fig. 2 shows the block configuration of a speech recognition device according to the present invention. 25: continuous pattern matching unit.

Claims (1)

[Scope of Claims]

1. A speech recognition device characterized by comprising: means for storing a plurality of sets of standard patterns, each set consisting of a plurality of standard patterns having the same meaning, the sets corresponding to speakers of different characteristics; means for matching input speech against said standard patterns; and control means for causing said matching means to match discretely uttered input speech, which is requested first of an unspecified speaker in a series of responses, against said plurality of sets of standard patterns stored in said storage means, for identifying the characteristic of the speaker according to the result of said matching means so as to restrict the patterns to one of said plurality of sets of standard patterns stored in said storage means, and for causing said matching means to match continuously uttered input speech from a speaker of said identified characteristic against said restricted one set of standard patterns.
JP57071225A 1982-04-30 1982-04-30 Voice recognition system Granted JPS58189694A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP57071225A JPS58189694A (en) 1982-04-30 1982-04-30 Voice recognition system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP57071225A JPS58189694A (en) 1982-04-30 1982-04-30 Voice recognition system

Publications (2)

Publication Number Publication Date
JPS58189694A JPS58189694A (en) 1983-11-05
JPH0421880B2 true JPH0421880B2 (en) 1992-04-14

Family

ID=13454518

Family Applications (1)

Application Number Title Priority Date Filing Date
JP57071225A Granted JPS58189694A (en) 1982-04-30 1982-04-30 Voice recognition system

Country Status (1)

Country Link
JP (1) JPS58189694A (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH01290000A (en) * 1988-05-17 1989-11-21 Sharp Corp Voice recognition device

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS493507A (en) * 1972-04-19 1974-01-12
JPS56119199A (en) * 1980-02-26 1981-09-18 Sanyo Electric Co Voice identifying device

Also Published As

Publication number Publication date
JPS58189694A (en) 1983-11-05

Similar Documents

Publication Publication Date Title
US5895448A (en) Methods and apparatus for generating and using speaker independent garbage models for speaker dependent speech recognition purpose
US5842165A (en) Methods and apparatus for generating and using garbage models for speaker dependent speech recognition purposes
EP0647344B1 (en) Method for recognizing alphanumeric strings spoken over a telephone network
US6076054A (en) Methods and apparatus for generating and using out of vocabulary word models for speaker dependent speech recognition
US5125022A (en) Method for recognizing alphanumeric strings spoken over a telephone network
JP3968133B2 (en) Speech recognition dialogue processing method and speech recognition dialogue apparatus
US5297194A (en) Simultaneous speaker-independent voice recognition and verification over a telephone network
US5517558A (en) Voice-controlled account access over a telephone network
US8612235B2 (en) Method and system for considering information about an expected response when performing speech recognition
AU704831B2 (en) Method for reducing database requirements for speech recognition systems
CA1239478A (en) Method and apparatus for use in interactive dialogue
JPH0421880B2 (en)
JP2980382B2 (en) Speaker adaptive speech recognition method and apparatus
JP2000122693A (en) Speaker recognizing method and speaker recognizing device
JPH04280299A (en) Speech recognition device
JPH0217038B2 (en)
JPH04301695A (en) Dictionary control system for speech recognition device
JPS5993500A (en) Voice recognition equipment
JPS602998A (en) Method of composing voice dictionary for voice recognition system
JPS6348599A (en) Voice recognition response system
JPH03248198A (en) Voice recognition device
JPS61123889A (en) Voice recognition equipment
JPH04128898A (en) Voice recognition device
JPH053596B2 (en)
JPS60241097A (en) Voice recognition applying equipment