JPS593494A

JPS593494A - Voice input recognition equipment

Info

Publication number: JPS593494A
Application number: JP57112775A
Authority: JP
Inventors: 浜田　隆史; 荻田　隆彦
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1982-06-30
Filing date: 1982-06-30
Publication date: 1984-01-10

Abstract

(57) [Abstract] This publication contains application data filed before electronic filing was introduced, so no abstract data is recorded.

Description

DETAILED DESCRIPTION OF THE INVENTION
(1) Technical Field of the Invention
The present invention relates to a speech recognition device, and more particularly to a voice input recognition device in which the stored feature data used for recognition is switched by voice input.

(2) Background of the Technology
Most speech recognition devices compare the feature data of the input speech with the feature data of various utterances stored in advance, and recognize the input as the utterance whose stored feature data is closest to that of the input. In other words, they extract features and perform pattern matching.
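The matching step described above can be sketched as a nearest-template search. This is a minimal illustration only: the source does not say how closeness is measured, so the Euclidean distance, the function names, and the three-element feature vectors below are all assumptions.

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length feature vectors (assumed metric)."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def recognize(features, templates):
    """Return the label of the stored template closest to `features`."""
    if not templates:
        raise ValueError("no templates stored")
    return min(templates, key=lambda label: distance(features, templates[label]))


# Hypothetical stored feature data; the values are illustrative only.
templates = {
    "zero": [0.9, 0.1, 0.3],
    "one": [0.2, 0.8, 0.5],
    "two": [0.4, 0.4, 0.9],
}

result = recognize([0.25, 0.75, 0.45], templates)
print(result)
```

A real device would compare sequences of feature frames rather than a single short vector, but the decision rule (pick the closest stored pattern) is the same.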

Most of these features are based on the formants of speech, that is, on characteristics obtained by converting the speech into a spectrum. In some cases pattern matching is performed on the formants as they are; in others the first formant frequency, the second formant frequency, and so on are extracted and pattern matching is performed on those values.
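The second approach, extracting formant frequencies as numeric values, might look like the following sketch. The source does not specify an extraction method; simple peak picking on a magnitude spectrum is substituted here purely for illustration, and the function name, the 100 Hz bin spacing, and the spectrum values are hypothetical.

```python
def formant_frequencies(spectrum, bin_hz, count=2):
    """Return the frequencies of the first `count` local maxima of a magnitude spectrum.

    `spectrum[i]` is the magnitude at frequency `i * bin_hz`. Peak picking is an
    assumed, simplified stand-in for whatever extraction a real device uses.
    """
    peaks = []
    for i in range(1, len(spectrum) - 1):
        if spectrum[i] > spectrum[i - 1] and spectrum[i] >= spectrum[i + 1]:
            peaks.append(i * bin_hz)
            if len(peaks) == count:
                break
    return peaks


# Hypothetical smoothed spectrum sampled every 100 Hz (illustrative values).
spectrum = [1, 2, 5, 9, 6, 3, 2, 2, 3, 4, 7, 10, 8, 4, 2, 1]
f1, f2 = formant_frequencies(spectrum, bin_hz=100.0)
print(f1, f2)
```

The resulting pair of frequencies can then serve as a compact feature vector for pattern matching, in place of the full spectrum.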

(3) Prior Art and Problems
Speech recognition is divided into speaker-independent recognition, which recognizes input from an unspecified number of speakers by matching it against a common standard pattern, and speaker-dependent (specific-speaker) recognition, in which the speaker registers voice patterns in advance and voice input is matched against that speaker's own registered patterns. The former requires no registration work, but a high recognition rate is difficult to obtain; the latter achieves a high recognition rate but requires registration work. Speaker-dependent recognition is the more widely used because of its generally high recognition rate, but registering the patterns involves cumbersome key operations.

FIG. 1 shows a block diagram of the speaker-dependent recognition described above.

When a pattern is registered or modified, features are extracted from the input voice signal S in the voice input unit 1. The result is stored in the registered pattern storage unit 3 via the voice pattern registration/correction unit 2, which handles both the initial registration of patterns and their later correction, for example to account for changes over time.

This registration or modification is carried out by the control unit 5, which controls the voice pattern registration/correction unit 2 according to key operation data entered from the operation unit 4.

For speech recognition, a user ID is first entered as key operation data from the operation unit 4. The control unit 5 then selects the registered pattern corresponding to the entered ID from the registered pattern storage unit 3 and uses it as the matching data for recognition. The recognition unit compares this data with the voice pattern obtained from the voice input unit 1 and outputs the result to the control unit 5. In this comparison, the recognition unit treats the input as the same utterance as the pattern with the highest similarity. The display unit 7 monitors key input from the operation unit and voice input, and displays messages from the system.

As is clear from the above description, both the initial registration of voice patterns and the instruction to select the registered pattern belonging to a given user involve key operations.

That is, the conventional voice input device requires a key operation unit 4 for the key operations performed in the initial state and for specifying the registered pattern when the power is turned on.

For this reason, the conventional method did not free the user from cumbersome key operations and did not make full use of the advantages of a voice input device.

(4) Object of the Invention
The present invention solves the problems described above. Its object is to provide a voice input recognition device that eliminates key operations by switching the stored feature data by voice input.

(5) Structure of the Invention
The present invention is characterized as a voice input recognition device comprising: voice input means for extracting features of speech; first storage means in which standard voice pattern data is stored; second storage means in which voice pattern data of a specific speaker is stored; switching means for switching between the outputs of the first and second storage means; and recognition means for comparing the voice pattern data selected by the switching means with the voice pattern data obtained from the voice input means, wherein initial setting is performed by the first storage means and the recognition means, and voice recognition is then performed by the second storage means and the recognition means.

(6) Embodiment of the Invention
The present invention is described in detail below with reference to the drawings.

FIG. 2 shows a block diagram of an embodiment of the present invention. The voice signal S is input to the voice input unit 1, whose output is connected to the recognition unit 6 and to the voice pattern registration/correction unit 2. The output of the voice pattern registration/correction unit 2 is input to the storage unit 3'. The output of the recognition unit 6 is input to the control unit 5'. The control lines of the control unit 5' are connected to the voice pattern registration/correction unit 2, the display unit 7, and the storage unit 3', respectively. The output of the storage unit 3' is input to the recognition unit 6.

The storage unit 3' consists of a standard pattern memory 3'-1, a registered pattern memory 3'-2, and a switching unit 3'-3. In response to a control signal from the control unit 5', it stores the feature data obtained from the voice pattern registration/correction unit 2 in the registered pattern memory. Also in response to a control signal from the control unit 5', the switching unit 3'-3 selects the contents of either the standard pattern memory 3'-1 or the registered pattern memory 3'-2 and outputs them to the recognition unit.
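The storage unit 3' described above can be modeled as two pattern memories behind a selector. This is a sketch under stated assumptions, not the circuit of the patent: the class and method names are hypothetical, and keying the registered memory by user ID is an assumption suggested by the ID input described in the embodiment.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PatternStorage:
    """Hypothetical model of storage unit 3' (names are illustrative)."""

    standard: dict                                   # standard pattern memory 3'-1
    registered: dict = field(default_factory=dict)   # registered pattern memory 3'-2
    selected_user: Optional[str] = None              # None means standard memory selected

    def register(self, user_id, label, features):
        """Store feature data from the registration/correction unit."""
        self.registered.setdefault(user_id, {})[label] = list(features)

    def select_standard(self):
        """Switch the output to the standard pattern memory."""
        self.selected_user = None

    def select_registered(self, user_id):
        """Switch the output to one user's registered patterns."""
        if user_id not in self.registered:
            raise KeyError(f"no patterns registered for user {user_id!r}")
        self.selected_user = user_id

    def output(self):
        """Patterns currently passed to the recognition unit (switching unit 3'-3)."""
        if self.selected_user is None:
            return self.standard
        return self.registered[self.selected_user]


storage = PatternStorage(standard={"0": [0.0, 0.0], "1": [1.0, 0.0]})
storage.register("42", "open", [0.3, 0.7])
storage.select_registered("42")
print(storage.output())
```

Keeping the selection inside the storage object means the recognition step only ever sees one set of patterns, which mirrors the single output line from the storage unit 3' to the recognition unit 6.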

In the initial state, the embodiment selects the contents of the standard pattern memory and uses them for matching against the input speech. The standard voice patterns here may be the digits (0 to 9) and a few simple commands. When the user speaks ID information (for example, a number of several digits), features are extracted from the input voice signal and passed to the recognition unit. The recognition unit compares the data obtained from the voice input unit 1 with the standard pattern memory, which is the memory selected in the initial state, and outputs the recognition result to the control unit 5'. Recognition in the initial state serves to determine whether patterns for the speaker of the voice input S that is about to follow are already registered in the registered pattern memory 3'-2. If they are, the control unit 5' takes the feature data of the specific speaker, that is, the current user, stored in the registered pattern memory 3'-2 as the matching data. If they are not, the device may, for example, start the registration procedure or request that the ID be entered again.
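The initial-state procedure can be sketched as follows: each spoken ID digit is recognized against the standard patterns, and the resulting ID decides which registered data is used afterwards. The function names, the squared-distance measure, and the two-element feature vectors are assumptions made for illustration; the source does not define them.

```python
def nearest(features, patterns):
    """Label of the pattern with the smallest squared distance to `features` (assumed metric)."""
    return min(
        patterns,
        key=lambda label: sum((x - y) ** 2 for x, y in zip(features, patterns[label])),
    )


def initial_setup(id_utterances, standard_patterns, registered_memory):
    """Recognize a spoken ID with the standard patterns and look up that user's data.

    Returns (user_id, patterns). `patterns` is None when the ID is not registered;
    the caller would then start registration or ask for the ID again.
    """
    user_id = "".join(nearest(u, standard_patterns) for u in id_utterances)
    return user_id, registered_memory.get(user_id)


# Hypothetical standard digit patterns and registered user data (illustrative values).
standard_patterns = {"0": [0.0, 0.0], "1": [1.0, 0.0], "2": [0.0, 1.0]}
registered_memory = {"12": {"start": [0.3, 0.7], "stop": [0.8, 0.2]}}

user_id, patterns = initial_setup(
    [[0.9, 0.1], [0.1, 0.8]], standard_patterns, registered_memory
)
print(user_id, patterns is not None)
```

Once `patterns` is found, all later utterances would be matched against that user's registered data instead of the standard patterns, which is the switch the embodiment describes.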

The display unit 7 conveys these operations to the operator, and the user proceeds according to the messages shown on it.

Once the above operation is complete, the user's own registered data is compared with the feature data input from the voice input unit for recognition, and the result is output to the control unit 5'.

The display unit 7 in the embodiment may be a CRT, a plasma display, or a similar device that displays characters and the like. The display output may also be given as speech: the control unit 5' may select, from pronunciation data provided in advance, the speech corresponding to the message data to be announced, and output it.

(7) Effects of the Invention
As described above, in the initial state the present invention selects the user's voice feature data by comparing the features of the input speech with the features of standard speech, and thereafter performs recognition using the pattern data that the user registered. As a result, the key operations required conventionally are unnecessary, and speech recognition with a high recognition rate is possible after the initial state.

[Brief Description of the Drawings]

FIG. 1 is a circuit configuration diagram of conventional speech recognition, and FIG. 2 is a circuit configuration diagram of speech recognition according to the present invention.
1: voice input unit; 2: voice pattern registration/correction unit; 3': storage unit; 3'-1: standard pattern memory; 3'-2: registered pattern memory; 3'-3: switching unit; 5': control unit; 6: recognition unit.
Patent applicant: Fujitsu Limited

Claims

[Claims]

(1) A voice input recognition device comprising: voice input means for extracting features of speech; first storage means in which standard voice pattern data is stored; second storage means in which voice pattern data of a specific speaker is stored; switching means for switching between the outputs of the first and second storage means; and recognition means for comparing the voice pattern data selected by the switching means with the voice pattern data obtained from the voice input means, wherein initial setting is performed by the first storage means and the recognition means, and voice recognition is then performed by the second storage means and the recognition means.