JPH0421880B2 - Voice recognition system - Google Patents

Info

Publication number
JPH0421880B2
JPH0421880B2
Authority
JP
Japan
Prior art keywords
standard patterns
recognition
standard
continuous
speech
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Lifetime
Application number
JP57071225A
Other languages
Japanese (ja)
Other versions
JPS58189694A (en)
Inventor
Hiroshi Ichikawa
Nobuo Hataoka
Toshihiro Kimura
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hitachi Ltd
Original Assignee
Hitachi Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hitachi Ltd filed Critical Hitachi Ltd
Priority to JP57071225A priority Critical patent/JPS58189694A/en
Publication of JPS58189694A publication Critical patent/JPS58189694A/en
Publication of JPH0421880B2 publication Critical patent/JPH0421880B2/ja
Granted legal-status Critical Current

Description

DETAILED DESCRIPTION OF THE INVENTION

The present invention relates to a speech recognition device, and more particularly to an improved device for recognizing continuous word speech uttered by unspecified speakers.

Conventional methods for recognizing speech uttered by unspecified speakers include a learning method, which examines the characteristics of the input speech and deforms the standard patterns to match them; conversely, a normalization method, which deforms the input speech to match the standard patterns; a multi-standard method, which estimates in advance the range of variation in speech caused by differences between speakers and places a large number of standard patterns across that range; and a discriminant-function method combined with a suitable preprocessing technique. Of these, only the multi-standard method and the discriminant-function method currently offer recognition performance at a practical level. Moreover, when the capability is to be extended to continuous word recognition, the multi-standard method is virtually the only realistic approach.

In continuous word recognition, however, the number of possible word-sequence combinations means that the amount of processing required for recognition increases greatly even when techniques such as the two-stage DP method or the continuous DP method are used. Consequently, a system that performs speaker-independent continuous word recognition with the multi-standard method as it stands requires a very large pattern matching section and the like, and is unrealistic from an economic standpoint.
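To make the cost argument concrete, the sketch below shows dynamic-programming (DTW-style) matching of an input pattern against a store of standard patterns: one full DP run is needed per stored pattern, so the matching load grows directly with the size of a multi-standard pattern store. This is only an illustrative sketch under simple assumptions (fixed-length feature frames, a symmetric step pattern, Euclidean frame distance); it is not the two-stage DP or continuous DP formulation referred to above.

    import numpy as np

    def dtw_distance(inp, ref):
        # inp, ref: arrays of shape (frames, features)
        n, m = len(inp), len(ref)
        d = np.full((n + 1, m + 1), np.inf)
        d[0, 0] = 0.0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                cost = np.linalg.norm(inp[i - 1] - ref[j - 1])   # frame-to-frame distance
                d[i, j] = cost + min(d[i - 1, j], d[i, j - 1], d[i - 1, j - 1])
        return d[n, m] / (n + m)                                 # length-normalized score

    def best_match(inp, standard_patterns):
        # standard_patterns: dict mapping a label to one reference pattern;
        # one full DTW run per stored pattern, so cost grows linearly with the store size.
        return min(standard_patterns,
                   key=lambda label: dtw_distance(inp, standard_patterns[label]))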

The object of the present invention is to overcome these problems.

The invention takes note of the fact that, in any given usage situation, the person speaking into the speech recognition device cannot change during use, for example from male to female or from an adult to a child.

Specifically, in the present invention the recognition device operates in two states: a state for recognizing discretely uttered speech (state 1), which requires relatively little recognition processing, and a state for recognizing continuously uttered speech (state 2), which requires a large amount of processing. The input speech is first recognized in state 1 to narrow down the speaker's category, and recognition in state 2 is then performed using only the set of standard patterns common to that category, thereby reducing the processing load in state 2.
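As a rough illustration of this two-state control, the sketch below first matches a discrete utterance against every group's patterns, fixes the speaker group from the best match, and then restricts continuous matching to that group. The group labels, the pattern-store layout, and the matcher interface (distance, match_sequence) are assumptions made only for this example and do not appear in the patent.

    class TwoStateRecognizer:
        def __init__(self, pattern_store, matcher):
            # pattern_store: {group: {word: [pattern, ...]}}, e.g. groups "male"/"female"/"child"
            self.store = pattern_store
            self.matcher = matcher
            self.group = None                          # unknown until state 1 completes

        def recognize_discrete(self, speech, vocabulary):
            # State 1: small vocabulary, but every group's patterns take part in matching.
            candidates = [(g, w, p)
                          for g, words in self.store.items()
                          for w, pats in words.items() if w in vocabulary
                          for p in pats]
            best_g, best_w, _ = min(((g, w, self.matcher.distance(speech, p))
                                     for g, w, p in candidates), key=lambda t: t[2])
            self.group = best_g                        # speaker category is now fixed
            return best_w

        def recognize_continuous(self, speech, vocabulary):
            # State 2: only the identified group's patterns are used, cutting the match count.
            assert self.group is not None, "state 1 must run first"
            restricted = {w: self.store[self.group][w] for w in vocabulary}
            return self.matcher.match_sequence(speech, restricted)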

The present invention will now be described with reference to an embodiment.

Fig. 1 shows an example configuration of a telephone information service system to which the present invention is applied. It consists of a system control unit 1, a speech recognition unit 2 according to the present invention, a voice response unit 3, a hybrid coil 4, and a subscriber telephone 5; only the parts of the telephone information service system needed to explain the present invention are shown.

Fig. 2 shows an example configuration of a speech recognition device (speech recognition unit 2) used to explain the present invention. In Fig. 2, the control unit 21 exchanges commands and results 27 with the system control unit 1 of Fig. 1 and also controls the speech recognition unit 2. For the input speech analyzed by the analysis unit 22, the similarity to the standard pattern data in the standard pattern memory 24 is computed by the similarity computation unit 23, and the continuous pattern matching unit 25 computes the optimum matching value against each standard pattern. The result is evaluated by the decision unit 26, and the decision result is sent to the control unit 21. The configuration of a recognition device that performs continuous pattern matching is already known (see Japanese Patent Laid-Open No. 55-2205), so its description is omitted here. In this example device the input pattern is always matched against the designated standard patterns, so if it is known in advance that the input is a discrete utterance, the output of the matching unit need only be evaluated by the decision unit 26 as a discretely uttered word, whereas for continuous word input it is evaluated as a sequence of words; the operation of the continuous pattern matching unit 25 is the same in both cases. The following description applies equally to a device in which the operation of the matching unit 25 is switched between discrete utterances and continuous utterances (see, for example, Japanese Patent Application No. 55-158296).
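The data flow just described can be summarized schematically as below, with the block numbers of Fig. 2 kept as comments. The analyzer, matcher, and judge objects and their method names are placeholders assumed for the sketch; only the ordering of the stages follows the description.

    def recognize(speech, standard_patterns, analyzer, matcher, judge):
        # standard_patterns: {name: pattern} -- only the patterns the control unit 21 has selected
        features = analyzer.analyze(speech)                    # analysis unit 22
        results = {}
        for name, pattern in standard_patterns.items():        # standard pattern memory 24
            sim = matcher.similarity(features, pattern)        # similarity computation unit 23
            results[name] = matcher.optimal_match(sim)         # continuous pattern matching unit 25
        return judge.decide(results)                           # decision unit 26 -> control unit 21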

Suppose now that the registered vocabulary consists of the words "yes" and "no" and the digits 0 through 9, and that, to allow for differences between speakers, five standard patterns are registered for each word and digit for each of the groups male, female, and child, i.e. 3 x 5 = 15 patterns per word.

Taking a bank balance inquiry as an example and returning to Fig. 1, when a call from a user enters the system, the voice response unit 3 first asks the user "Is this a balance inquiry?", and the speech recognition unit 2, under a command from the system control unit 1, waits for input in a mode (state 1) that recognizes the two words "yes" and "no" as discrete input. When the user answers "yes" or "no", the speech recognition unit 2 need only match the input against the 15 standard patterns for each of the two words "yes" and "no", i.e. 30 patterns in total. If, as a result, the best-matching standard pattern belongs to the male group (or the female or child group), the control unit 21 issues a control command so that thereafter, in state 2 (the continuous word recognition state), only the digit standard patterns belonging to the male (or female, or child) group are used. In the next step, the voice response device (voice response unit 3) prompts the user by voice with "Please enter your personal identification number", and the speech recognition unit 2 enters state 2, in which continuous digits can be recognized. When the user speaks a personal identification number such as "1234", the speech recognition unit 2 need only match the input against the 10 x 5 = 50 digit standard patterns belonging to the male (or female, or child) group. Accordingly, the matching capacity of the speech recognition unit 2 need cover at most 50 standard patterns, whereas recognition without narrowing down the group in state 1 would require matching against 10 x 15 = 150 standard patterns.
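The pattern counts in this example can be reproduced directly; the figures (3 speaker groups, 5 patterns per word per group, a two-word state-1 vocabulary, ten digits) come from the description above, while the variable names are arbitrary.

    groups, patterns_per_word = 3, 5
    yes_no_words, digit_words = 2, 10

    state1_matches = yes_no_words * groups * patterns_per_word        # 2 * 15  = 30
    state2_restricted = digit_words * patterns_per_word               # 10 * 5  = 50
    state2_unrestricted = digit_words * groups * patterns_per_word    # 10 * 15 = 150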

As described above, the present invention makes it possible to realize, economically, a system that recognizes continuously uttered speech from unspecified speakers, and its effect is therefore significant.

BRIEF DESCRIPTION OF THE DRAWINGS

Fig. 1 shows an example configuration of a telephone information service system to which the present invention is applied, and Fig. 2 shows the block configuration of a speech recognition device according to the present invention. 25: continuous pattern matching unit.

Claims (1)

[Scope of Claims]

1. A speech recognition device characterized by comprising: means for storing a plurality of sets of standard patterns, each set consisting of a plurality of standard patterns having the same meaning, the sets corresponding to speakers of different characteristics; means for matching input speech against said standard patterns; and control means for causing said matching means to match discretely uttered input speech, which is requested first of an unspecified speaker in a series of responses, against said plurality of sets of standard patterns stored in said storage means, for identifying the characteristic of the speaker according to the result of said matching means so as to restrict the patterns to one of said plurality of sets of standard patterns stored in said storage means, and for causing said matching means to match continuously uttered input speech from a speaker of said identified characteristic against said restricted one set of standard patterns.
JP57071225A 1982-04-30 1982-04-30 Voice recognition system Granted JPS58189694A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP57071225A JPS58189694A (en) 1982-04-30 1982-04-30 Voice recognition system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP57071225A JPS58189694A (en) 1982-04-30 1982-04-30 Voice recognition system

Publications (2)

Publication Number Publication Date
JPS58189694A JPS58189694A (en) 1983-11-05
JPH0421880B2 true JPH0421880B2 (en) 1992-04-14

Family

ID=13454518

Family Applications (1)

Application Number Title Priority Date Filing Date
JP57071225A Granted JPS58189694A (en) 1982-04-30 1982-04-30 Voice recognition system

Country Status (1)

Country Link
JP (1) JPS58189694A (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH01290000A (en) * 1988-05-17 1989-11-21 Sharp Corp Voice recognition device

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS493507A (en) * 1972-04-19 1974-01-12
JPS56119199A (en) * 1980-02-26 1981-09-18 Sanyo Electric Co Voice identifying device

Also Published As

Publication number Publication date
JPS58189694A (en) 1983-11-05

Similar Documents

Publication Publication Date Title
US5895448A (en) Methods and apparatus for generating and using speaker independent garbage models for speaker dependent speech recognition purpose
US5842165A (en) Methods and apparatus for generating and using garbage models for speaker dependent speech recognition purposes
EP0647344B1 (en) Method for recognizing alphanumeric strings spoken over a telephone network
US6076054A (en) Methods and apparatus for generating and using out of vocabulary word models for speaker dependent speech recognition
US5125022A (en) Method for recognizing alphanumeric strings spoken over a telephone network
JP3968133B2 (en) Speech recognition dialogue processing method and speech recognition dialogue apparatus
US5297194A (en) Simultaneous speaker-independent voice recognition and verification over a telephone network
US5517558A (en) Voice-controlled account access over a telephone network
US8612235B2 (en) Method and system for considering information about an expected response when performing speech recognition
AU704831B2 (en) Method for reducing database requirements for speech recognition systems
CA1239478A (en) Method and apparatus for use in interactive dialogue
JPH0421880B2 (en)
JP2980382B2 (en) Speaker adaptive speech recognition method and apparatus
JP2000122693A (en) Speaker recognizing method and speaker recognizing device
JPH04280299A (en) Speech recognition device
JPH0217038B2 (en)
JPH04301695A (en) Dictionary control system for speech recognition device
JPS5993500A (en) Voice recognition equipment
JPS602998A (en) Method of composing voice dictionary for voice recognition system
JPS6348599A (en) Voice recognition response system
JPH03248198A (en) Voice recognition device
JPS61123889A (en) Voice recognition equipment
JPH04128898A (en) Voice recognition device
JPH053596B2 (en)
JPS60241097A (en) Voice recognition applying equipment