JPS58189694A - Voice recognition system - Google Patents

Voice recognition system

Info

Publication number
JPS58189694A
Authority
JP
Japan
Prior art keywords
recognition
standard
state
speech
continuous
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
JP57071225A
Other languages
Japanese (ja)
Other versions
JPH0421880B2 (en)
Inventor
市川 熹
畑岡 信夫
俊宏 木村
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hitachi Ltd
Original Assignee
Hitachi Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hitachi Ltd
Priority to JP57071225A
Publication of JPS58189694A
Publication of JPH0421880B2
Granted

Abstract

(57) [Abstract] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

DETAILED DESCRIPTION OF THE INVENTION

The present invention relates to speech recognition systems, and in particular to an improvement in a system for recognizing continuous word speech uttered by unspecified speakers.

Conventionally, several approaches have been proposed for recognizing speech from unspecified speakers: a learning approach, which examines the features of the input speech and deforms the standard patterns to fit them; a normalization approach, which conversely deforms the input speech to fit the standard patterns; a multi-standard approach, which predicts in advance the range over which speech varies between speakers and places a large number of standard patterns across that range; and a discriminant function approach combined with suitable preprocessing. Of these, only the multi-standard and discriminant function approaches currently achieve practical recognition performance. Furthermore, when the capability is to be extended to continuous word recognition, the multi-standard approach is arguably almost the only realistic option.

In continuous word recognition, however, the number of possible word-sequence combinations means that the amount of processing required for recognition increases greatly, even when techniques such as two-level DP matching or continuous DP matching are used. A system that performs speaker-independent continuous word recognition with the multi-standard approach as is would therefore require a very large pattern matching unit and related hardware, making it economically impractical.

The object of the present invention is to remedy this problem.

The invention focuses on the fact that, within a given usage session, the person speaking to the speech recognition device cannot change during use, for example from male to female or from adult to child.

Specifically, in the present invention the recognition device has two states: a recognition state for discretely uttered speech (state 1), which requires relatively little recognition processing, and a recognition state for continuously uttered speech (state 2), which requires a large amount of processing. The input speech is first recognized in state 1 to narrow down the speaker's characteristics. State 2 recognition is then performed using only the set of standard patterns that share those characteristics, which reduces the amount of processing in state 2.
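The two-state idea can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the patent: the nested-dictionary layout of the standard patterns, the function names, the toy distance measure, and the simplification that the state 2 input arrives already split into words (the patent's state 2 handles unsegmented continuous speech).

```python
from typing import Dict, List, Sequence, Tuple

Pattern = Sequence[float]
# Hypothetical layout: speaker group -> word -> list of standard patterns.
Bank = Dict[str, Dict[str, List[Pattern]]]


def toy_distance(a: Pattern, b: Pattern) -> float:
    """Toy stand-in for the matcher: summed frame difference plus a length penalty."""
    frame_cost = sum(abs(x - y) for x, y in zip(a, b))
    return frame_cost + abs(len(a) - len(b))


def best_match(utterance: Pattern, bank: Bank, words: Sequence[str]) -> Tuple[str, str]:
    """Return (word, group) of the closest standard pattern among the given words."""
    _, word, group = min(
        (toy_distance(utterance, p), word, group)
        for group, by_word in bank.items()
        for word in words
        for p in by_word[word]
    )
    return word, group


def restrict_to_group(bank: Bank, group: str) -> Bank:
    """Keep only the standard patterns of the group identified in state 1."""
    return {group: bank[group]}


def recognize_two_state(first_word: Pattern, later_words: List[Pattern], bank: Bank,
                        state1_vocab: Sequence[str], state2_vocab: Sequence[str]):
    # State 1: one discrete word, matched against every group's patterns.
    answer, group = best_match(first_word, bank, state1_vocab)
    # State 2: later input is matched only against the identified group.
    narrowed = restrict_to_group(bank, group)
    rest = [best_match(u, narrowed, state2_vocab)[0] for u in later_words]
    return answer, group, rest
```

The point of the sketch is only that the group decision made once in state 1 shrinks the search space for every later match in state 2.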

The present invention will now be described with reference to an embodiment.

FIG. 1 shows an example configuration of a telephone information service system to which the present invention is applied. It consists of a system control unit 1, a speech recognition unit 2 according to the present invention, a voice response unit 3, a hybrid coil 4, and a subscriber telephone 5. Only the parts of the telephone information service system needed to explain the present invention are shown.

FIG. 2 shows an example configuration of a speech recognition device used to explain the present invention. In FIG. 2, the control unit 21 exchanges commands and results 27 with the system control unit 1 of FIG. 1 and also controls the speech recognition unit 2. The input speech is analyzed by the analysis unit 22, the similarity calculation unit 23 computes its similarity to the standard pattern data in the standard pattern memory 24, and the continuous pattern matching unit 25 computes the optimum matching value against each standard pattern. The determination unit 26 evaluates the results and sends its decision to the control unit 21. The configuration of a recognition device that performs continuous pattern matching is already known (see Japanese Unexamined Patent Publication No. Sho 55-2205), so its description is omitted here. In this example device, the input pattern is always being compared with the designated standard patterns. If the input is known in advance to be a discrete utterance, the determination unit 26 simply treats the output of the matching unit as the result for a single discretely uttered word; for continuous word input, it decides the words one after another as a word sequence. The operation of the continuous pattern matching unit 25 is the same in both cases. The following description applies equally to a device in which the operation of the matching unit 25 is switched between discrete and continuous utterance (see, for example, Japanese Patent Application No. Sho 55-158296).
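The patent leaves the internals of the similarity calculation and matching units to the cited prior art. As a rough illustration only, the sketch below uses plain dynamic time warping (DTW) over one-dimensional feature sequences as a stand-in: the per-frame distance plays the role of the similarity calculation unit 23, and the dynamic-programming accumulation plays the role of the matching unit 25. It handles isolated words only and is not the continuous DP matching the patent refers to; the function names and the scalar features are assumptions.

```python
import math
from typing import Dict, Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Accumulated cost of the best monotonic alignment between two sequences."""
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])  # frame-level dissimilarity
            cost[i][j] = local + min(
                cost[i - 1][j],      # input frame repeated against the pattern
                cost[i][j - 1],      # pattern frame repeated against the input
                cost[i - 1][j - 1],  # both advance together
            )
    return cost[n][m]


def closest_pattern(utterance: Sequence[float],
                    patterns: Dict[str, Sequence[float]]) -> str:
    """Label of the standard pattern with the lowest DTW cost (the decision step)."""
    return min(patterns, key=lambda label: dtw_distance(utterance, patterns[label]))
```

Because the cost of one match grows with the product of the two sequence lengths, the total work scales directly with the number of standard patterns compared, which is why reducing that number matters.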

Suppose that the registered vocabulary consists of "yes", "no", and the digits 0 to 9. To allow for differences between speakers, five standard patterns are registered for each of the men, women, and children groups, that is, 3 × 5 = 15 patterns for each word and digit. Take a bank balance inquiry as an example. Returning to FIG. 1, when a call from a user reaches the system, the voice response unit 3 first asks the user "Is this a balance inquiry?" At the same time, on a command from the system control unit 1, the speech recognition unit 2 waits for input in a mode (state 1) that recognizes the two words "yes" and "no" as discrete input. When the user answers "yes" or "no", the recognition unit 2 only needs to match the input against 30 standard patterns in total, 15 for each of the two words. If the best-matching standard pattern belongs to the men's group (or the women's or children's group), the control unit 21 issues a command so that from then on, in state 2 (the continuous word recognition state), only the digit standard patterns belonging to the men's group (or the women's or children's group) are used. In the next step, the voice response unit 3 says "Please state your PIN" to the user, and the recognition unit 2 enters state 2, in which it can recognize continuous digits.

When the user then speaks a PIN such as "1234", the recognition unit 2 only needs to match it against the 10 × 5 = 50 digit standard patterns belonging to the men's group (or the women's or children's group). The matching capacity of the recognition unit 2 therefore needs to cover at most 50 standard patterns. By contrast, if the group were not determined in state 1, matching against 10 × 15 = 150 standard patterns would be required.
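The pattern counts in this example can be checked with a few lines of Python. The constants mirror the figures given in the description (three speaker groups, five patterns per group, two state 1 words, ten digits); the names are chosen here for illustration.

```python
GROUPS = ("men", "women", "children")
PATTERNS_PER_GROUP = 5
STATE1_WORDS = ("yes", "no")
DIGITS = tuple(str(d) for d in range(10))


def pattern_count(words, groups) -> int:
    """Number of standard patterns to match: words x groups x patterns per group."""
    return len(words) * len(groups) * PATTERNS_PER_GROUP


state1_total = pattern_count(STATE1_WORDS, GROUPS)       # 2 x 3 x 5
state2_one_group = pattern_count(DIGITS, GROUPS[:1])     # 10 x 1 x 5
state2_all_groups = pattern_count(DIGITS, GROUPS)        # 10 x 3 x 5

print(state1_total, state2_one_group, state2_all_groups)
```

With these figures, fixing the group in state 1 cuts the state 2 matching load to one third, at the cost of 30 matches on a single short word.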

As described above, the present invention makes it possible to realize, economically, a system that recognizes continuously uttered speech from unspecified speakers, which is a significant benefit.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 shows an example configuration of a telephone information service system to which the present invention is applied, and FIG. 2 shows the block configuration of a speech recognition device according to the present invention.

Claims (1)

A speech recognition system having a first recognition state for recognizing discretely uttered speech and a second recognition state for recognizing continuously uttered speech, using a plurality of sets of standard patterns prepared, for speech patterns having the same meaning, for each of a plurality of kinds of speaker groups with differing characteristics, characterized in that the characteristics of the input speech are recognized in the first recognition state, and the set of standard patterns to be used in the second recognition state is limited based on the recognized characteristics.
JP57071225A 1982-04-30 1982-04-30 Voice recognition system Granted JPS58189694A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP57071225A JPS58189694A (en) 1982-04-30 1982-04-30 Voice recognition system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP57071225A JPS58189694A (en) 1982-04-30 1982-04-30 Voice recognition system

Publications (2)

Publication Number Publication Date
JPS58189694A true JPS58189694A (en) 1983-11-05
JPH0421880B2 JPH0421880B2 (en) 1992-04-14

Family

ID=13454518

Family Applications (1)

Application Number Title Priority Date Filing Date
JP57071225A Granted JPS58189694A (en) 1982-04-30 1982-04-30 Voice recognition system

Country Status (1)

Country Link
JP (1) JPS58189694A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH01290000A (en) * 1988-05-17 1989-11-21 Sharp Corp Voice recognition device

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS493507A (en) * 1972-04-19 1974-01-12
JPS56119199A (en) * 1980-02-26 1981-09-18 Sanyo Electric Co Voice identifying device

Also Published As

Publication number Publication date
JPH0421880B2 (en) 1992-04-14

Similar Documents

Publication Publication Date Title
EP0647344B1 (en) Method for recognizing alphanumeric strings spoken over a telephone network
US5895448A (en) Methods and apparatus for generating and using speaker independent garbage models for speaker dependent speech recognition purpose
US5842165A (en) Methods and apparatus for generating and using garbage models for speaker dependent speech recognition purposes
JP3968133B2 (en) Speech recognition dialogue processing method and speech recognition dialogue apparatus
US6076054A (en) Methods and apparatus for generating and using out of vocabulary word models for speaker dependent speech recognition
US5127043A (en) Simultaneous speaker-independent voice recognition and verification over a telephone network
US5125022A (en) Method for recognizing alphanumeric strings spoken over a telephone network
US5517558A (en) Voice-controlled account access over a telephone network
US5365574A (en) Telephone network voice recognition and verification using selectively-adjustable signal thresholds
CA2189011C (en) Method for reducing database requirements for speech recognition systems
US20010056345A1 (en) Method and system for speech recognition of the alphabet
CA1239478A (en) Method and apparatus for use in interactive dialogue
JPS58189694A (en) Voice recognition system
JP3919314B2 (en) Speaker recognition apparatus and method
JP2980382B2 (en) Speaker adaptive speech recognition method and apparatus
JPH10116093A (en) Voice recognition device
JPH01179198A (en) Rejecting system
KR20200134868A (en) Speech synthesis device and speech synthesis method
JPS5934596A (en) Voice recognition processing system
JPS6348599A (en) Voice recognition response system
JPH01197795A (en) Voice recognizing device
JPH04199199A (en) Speech recognition device
JPS60172099A (en) Speaker recognition equipment
JPS60241097A (en) Voice recognition applying equipment
JPH053596B2 (en)