JPS6329278B2

Info

Publication number: JPS6329278B2
Application number: JP56079579A
Authority: JP
Inventors: Kyoshi Tajima; Hiroki Oonishi; Masanori Myatake
Original assignee: Sanyo Electric Co Ltd
Current assignee: Sanyo Electric Co Ltd
Priority date: 1981-05-25
Filing date: 1981-05-25
Publication date: 1988-06-13
Also published as: JPS57195297A

Description

[Detailed description of the invention]

The present invention relates to a voice device, and more particularly to a voice device that registers voices in a registration mode and then recognizes those registered voices in a subsequent recognition mode.

Voice devices that extract the features of a voice and perform recognition by comparing those extracted features with the features of subsequently input voices have appeared and are being put into practical use. Such devices fall into two types: those that recognize the voice of a specific person and those that recognize the voices of unspecified speakers. In view of recognition time, recognition rate, and device configuration, most devices at present are of the former, specific-speaker type.

The present invention aims to improve the usability of such a voice device, and is described in detail below with reference to the drawing.

Reference numeral 1 denotes a microphone that converts voice into an electrical signal, and 2 denotes a feature extraction circuit that extracts the features of the voice electrical signal obtained from the microphone; it consists of a zero-cross detection circuit, a voice spectrum extraction circuit, a voice region detection circuit, and the like. 3 is a mode changeover switch that switches between the voice registration mode and the voice recognition mode. 4 is a first feature pattern memory connected to the registration mode side of the mode changeover switch 3 via a first input gate group 5; it stores the voice feature patterns extracted by the feature extraction circuit 2. 6 is a second feature pattern memory, likewise connected to the registration mode side of the mode changeover switch 3 via a second input gate group 7, and it also stores the voice feature patterns extracted by the feature extraction circuit 2. Each of the feature pattern memories 4 and 6 has a capacity of, for example, about 8 words of voice feature patterns, and the timings at which feature patterns are written to and read from the two memories are not the same but offset from each other, for example T₁ to T₈ and T₉ to T₁₆ respectively. 8 is a buffer memory connected to the recognition mode side of the mode changeover switch 3; it temporarily stores the feature pattern of the unknown voice from the feature extraction circuit 2. 9 and 10 are first and second output gate groups provided on the output side of the first and second feature pattern memories 4 and 6. 11 is a comparison recognition circuit that compares the registered patterns obtained through the output gate groups 9 and 10 with the unknown voice pattern from the buffer memory 8 and performs recognition. 12 is a speaker selection switch that supplies switching signals to the input and output gate groups 5, 7, 9, and 10; it selects among three speakers: a first speaker A, a second speaker B, and a specific speaker Z.

The internal configuration of the input and output gate groups 5, 7, 9, and 10 is as follows. The first input and output gate groups 5 and 9 consist of OR gates 20 and 22, which take the logical sum of the first speaker A and the specific speaker Z, and AND gates 21 and 23, which take the logical product of that logical sum output and the voice feature pattern. The second input and output gate groups 7 and 10 consist of OR gates 24 and 26, which take the logical sum of the second speaker B and the specific speaker Z, and AND gates 25 and 27, which take the logical product of that logical sum output and the voice pattern.
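The gate logic described above can be illustrated with a minimal Python sketch. The names `Speaker`, `gate_enables`, and `and_gate` are hypothetical and do not appear in the patent, and representing a feature pattern as a list of bits is an assumption made only for illustration.

```python
from enum import Enum
from typing import List, Tuple


class Speaker(Enum):
    """The three positions of speaker selection switch 12."""
    A = "first speaker"
    B = "second speaker"
    Z = "specific speaker"


def gate_enables(selected: Speaker) -> Tuple[bool, bool]:
    """Return (enable for gate groups 5 and 9, enable for gate groups 7 and 10)."""
    sel_a = selected is Speaker.A
    sel_b = selected is Speaker.B
    sel_z = selected is Speaker.Z
    first_enable = sel_a or sel_z    # OR gates 20, 22: A OR Z
    second_enable = sel_b or sel_z   # OR gates 24, 26: B OR Z
    return first_enable, second_enable


def and_gate(enable: bool, pattern_bits: List[int]) -> List[int]:
    """AND gates 21, 23, 25, 27: pass the pattern bits only while enabled."""
    return [bit if enable else 0 for bit in pattern_bits]
```

In this sketch, selecting speaker Z enables both gate paths at once, which is what lets a single switch position reach both feature pattern memories.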

First, the voice registration operation is described. When the mode changeover switch 3 is set to the registration mode side and the first speaker A is selected as the speaker to be registered, the feature patterns of the first speaker's voice obtained through the microphone 1 and the feature extraction circuit 2 pass through the first input gate group 5 and are stored in the first feature pattern memory 4, up to a maximum of 8 words. When the second speaker B is selected, the second input gate group 7 opens and the voice feature patterns of the second speaker are stored in the second feature pattern memory 6, up to a maximum of 8 words. When the specific speaker Z is selected with the speaker selection switch 12, both input gate groups 5 and 7 open and the feature patterns of the specific speaker are led to both the first and second feature pattern memories 4 and 6. Because the write timings of the two memories 4 and 6 are offset as described above, 8 + 8, that is, a maximum of 16 words of feature patterns are stored across the two memories 4 and 6.
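The registration behaviour, including the 8 + 8 = 16 word case, can be sketched as follows. This is only an illustrative model: the function name `register`, the dictionary layout keyed by memory number, and the mapping of timing slots T1 to T8 onto memory 4 and T9 to T16 onto memory 6 are assumptions based on the example timings in the description.

```python
WORDS_PER_MEMORY = 8  # "about 8 words" per memory in the description


def register(memories, speaker, patterns):
    """Store patterns in memory 4 and/or memory 6 according to the selected speaker.

    memories: dict such as {4: [None] * 8, 6: [None] * 8}
    speaker:  "A", "B", or "Z" (position of speaker selection switch 12)
    Timing slots 1..8 address memory 4 and slots 9..16 address memory 6
    (assumed mapping of the offset write timings).
    """
    if speaker == "A":
        slots = range(1, WORDS_PER_MEMORY + 1)
    elif speaker == "B":
        slots = range(WORDS_PER_MEMORY + 1, 2 * WORDS_PER_MEMORY + 1)
    elif speaker == "Z":
        slots = range(1, 2 * WORDS_PER_MEMORY + 1)
    else:
        raise ValueError("unknown speaker selection: %r" % (speaker,))
    if len(patterns) > len(slots):
        raise ValueError("too many words for the selected speaker")
    for slot, pattern in zip(slots, patterns):
        mem_id = 4 if slot <= WORDS_PER_MEMORY else 6
        memories[mem_id][(slot - 1) % WORDS_PER_MEMORY] = pattern
    return memories
```

Because the two slot ranges never overlap, a word written while speaker Z is selected lands in exactly one memory even though both input gate groups are open.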

Next, the recognition operation using the voices registered in this way is described. The mode changeover switch 3 is set to the recognition mode side, and to perform recognition for the first speaker, the first speaker A is designated with the speaker selection switch 12. When the first speaker then utters an unknown voice into the microphone 1, its features are extracted by the feature extraction circuit 2 and temporarily stored in the buffer memory 8. The temporarily stored feature pattern of the unknown voice is compared in the comparison recognition circuit 11 with the feature patterns of the registered voices in the first feature pattern memory 4, and the unknown feature pattern is identified as one of the registered patterns. When the second speaker B is selected, the feature pattern of the second speaker B's unknown voice is compared with the patterns registered in the second feature pattern memory 6 and recognized by the comparison recognition circuit 11.
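The comparison step can be sketched as a nearest-pattern search. The patent does not state how the comparison recognition circuit 11 measures similarity, so the Euclidean distance used here, the numeric feature vectors, and the function name `recognize` are all assumptions for illustration only.

```python
import math


def recognize(unknown, registered):
    """Return the index of the registered pattern closest to the unknown pattern.

    unknown:    feature vector held in the buffer memory (list of floats)
    registered: list of feature vectors read from one feature pattern memory
    Similarity is Euclidean distance (an assumed measure, not from the patent).
    """
    if not registered:
        raise ValueError("no registered patterns to compare against")
    distances = [math.dist(unknown, pattern) for pattern in registered]
    return min(range(len(registered)), key=distances.__getitem__)
```

For example, `recognize([1.0, 0.0], [[0.0, 1.0], [0.9, 0.1]])` selects the second registered pattern, since it lies closer to the unknown vector.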

Next, consider the case that most characterizes the present invention, in which the specific speaker Z is selected with the selection switch 12. If the voices of the first and second speakers have been registered in the first and second feature pattern memories 4 and 6 respectively in the registration mode, and the specific speaker Z is then selected, the unknown voice temporarily stored in the buffer memory 8 is compared in the comparison recognition circuit 11 with the registered voice patterns of both the first and second speakers, and the most similar registered pattern is selected to recognize the unknown voice. In this state, therefore, the voices of two speakers are recognized.

On the other hand, if (8 + 8) words of the specific speaker have been registered in both feature pattern memories 4 and 6 in the registration mode, and the specific speaker is then selected with the selection switch 12, the contents of the buffer memory 8 are compared in the comparison recognition circuit 11 with the specific speaker's 16 words held in both feature pattern memories 4 and 6, and the unknown voice is recognized by selecting one of those 16 words.
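Both uses of the specific speaker Z selection, two-speaker recognition and 16-word recognition, amount to searching both memories instead of one. The sketch below makes that explicit; the names `candidate_memories` and `recognize_selected`, the squared-distance measure, and the use of `None` for an empty word slot are hypothetical choices, not details from the patent.

```python
def candidate_memories(speaker):
    """Memories whose output gate groups are open for the selected speaker."""
    return {"A": (4,), "B": (6,), "Z": (4, 6)}[speaker]


def recognize_selected(unknown, memories, speaker):
    """Return (memory number, word index) of the best matching registered pattern.

    unknown:  feature vector from the buffer memory
    memories: dict {4: [...], 6: [...]} of feature vectors, None for empty slots
    Similarity is the sum of squared differences (an assumed measure).
    """
    best = None
    for mem_id in candidate_memories(speaker):
        for index, pattern in enumerate(memories[mem_id]):
            if pattern is None:
                continue  # skip word slots that were never registered
            distance = sum((u - p) ** 2 for u, p in zip(unknown, pattern))
            if best is None or distance < best[0]:
                best = (distance, mem_id, index)
    if best is None:
        raise ValueError("no registered patterns for the selected speaker")
    return best[1], best[2]
```

Whether memories 4 and 6 hold two different speakers' words or 16 words from one speaker, the search is identical; only what was registered beforehand differs.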

As is clear from the above description, the present invention performs recognition individually for the voices of the first and second speakers. When the voices of both speakers are registered and the specific speaker is selected for recognition, the unknown voice is recognized against the voices of both speakers. When the specific speaker's voice is registered and that specific speaker is selected for recognition, twice the usual number of voices can be recognized. The device can therefore handle many forms of voice recognition, which increases its versatility as a voice recognition device, while the added hardware burden is slight, so that a highly practical voice device is obtained.

[Brief explanation of the drawing]

The figure is a block diagram showing the configuration of a voice device that implements the voice recognition processing method of the present invention, in which 2 is the feature extraction circuit, 4 and 6 are the feature pattern memories, 8 is the buffer memory, 11 is the comparison recognition circuit, and 12 is the speaker selection switch.

Claims

[Claims]

1. A voice recognition processing method that registers voices in a registration mode and recognizes the registered voices in a subsequent recognition mode, using: a microphone that converts voice into an electrical signal; a feature extraction circuit that extracts the features of the voice electrical signal from the microphone; first and second feature pattern memories, each capable of storing feature patterns of n words (n is an integer of 1 or more) extracted by the feature extraction circuit; a buffer memory that temporarily stores the feature pattern of the input voice obtained from the feature extraction circuit; a recognition circuit that compares the contents of the buffer memory with the contents of the first and second feature pattern memories and performs recognition; a mode switching means for switching between a voice registration mode and a voice recognition mode; and a speaker selection means for selecting among a first speaker, a second speaker, and a specific speaker; wherein, in the registration mode, feature patterns of n words of the first speaker's voice are stored in the first feature pattern memory, feature patterns of n words of the second speaker's voice are stored in the second feature pattern memory, and feature patterns of 2n words of the specific speaker are stored across the first and second feature pattern memories; when the first speaker is selected in the recognition mode, the recognition circuit compares the contents led to the buffer memory with the contents stored in the first feature pattern memory and performs recognition; when the second speaker is selected, the recognition circuit compares the contents led to the buffer memory with the contents stored in the second feature pattern memory and performs recognition; when the specific speaker is selected in the recognition mode with the voice feature patterns of the first and second speakers stored in the first and second feature pattern memories respectively, the contents of the buffer memory are compared with the contents of both feature pattern memories to recognize the voices of the two speakers; and when the specific speaker is selected in the recognition mode with feature patterns of 2n words stored in both feature pattern memories, the contents of the buffer memory are compared with the contents of both feature pattern memories to perform voice recognition of 2n words.