JPS6250838B2

Info

Publication number: JPS6250838B2
Application number: JP56001838A
Authority: JP
Inventors: Kyoshi Tajima; Masayuki Iida; Hiroki Oonishi
Original assignee: Sanyo Electric Co Ltd
Current assignee: Sanyo Electric Co Ltd
Priority date: 1981-01-08
Filing date: 1981-01-08
Publication date: 1987-10-27
Also published as: JPS57115600A

Description

[Detailed description of the invention]

The present invention relates to a speech recognition device capable of distinguishing human speech, and is characterized in particular by the speech registration method of a speaker-dependent speech recognition device.

FIG. 1 is a block diagram of an existing speech recognition device. In the figure, 1 is a microphone that converts speech into an electrical signal, and 2 is a feature extractor that extracts a speech feature pattern from the speech signal input from the microphone 1. The feature extractor 2 consists of a speech region detector 3, a spectrum extractor 4, a zero-cross detector 5, and a pattern creation section 6. The speech region detector 3 monitors changes in the amplitude of the input signal and detects the time region in which a speech signal is present. The spectrum extractor 4 extracts the frequency spectrum from the input speech signal and obtains a spectrum value for each frequency, including the formant information in which the characteristics of the speech appear. The zero-cross detector 5 counts the zero crossings in the high-frequency components of the input speech signal and judges from the size of that count whether the speech is voiced or unvoiced. The pattern creation section 6 creates, based on the outputs of the detectors 3, 4, and 5, a feature pattern that includes the spectral information and the voiced/unvoiced information of the speech. 7 is a registered pattern memory in which standard speech feature patterns are stored in advance as registered patterns for a large number of words; these registered patterns are obtained by the feature extractor 2 from speech signals input from the microphone 1 in a registration mode carried out beforehand. 8 is a recognition processing unit that, in the recognition mode for recognizing speech, compares the feature pattern of the input speech obtained from the feature extractor 2 with the many registered patterns in the registered pattern memory 7, and produces an output signal corresponding to the input speech at that time.
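The roles of the speech region detector 3 and the zero-cross detector 5 can be illustrated with a minimal Python sketch. This is not the circuitry of the patent: the function names, the frame-based processing, and both thresholds are assumptions made for illustration, and the zero crossings are counted on the frame as given, whereas the device described above counts them in the high-frequency components only.

```python
def zero_crossings(frame):
    """Count sign changes between consecutive samples of one frame."""
    return sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))


def is_voiced(frame, max_crossings):
    """Treat a frame with few zero crossings as voiced.

    max_crossings is a hypothetical threshold; the source gives no value.
    """
    return zero_crossings(frame) <= max_crossings


def speech_region(samples, frame_len, amp_threshold):
    """Return (start, end) sample indices spanning all frames whose peak
    amplitude reaches amp_threshold, or None if no frame does.

    amp_threshold is a hypothetical value; the source only says that the
    detector monitors changes in amplitude.
    """
    active = [
        i
        for i in range(0, len(samples) - frame_len + 1, frame_len)
        if max(abs(s) for s in samples[i:i + frame_len]) >= amp_threshold
    ]
    if not active:
        return None
    return active[0], active[-1] + frame_len
```

Because both decisions depend on amplitude and on sign changes of the waveform, removing low-frequency or high-frequency components from the signal can shift the detected region and the voiced/unvoiced decision.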

Attempts have been made to connect such a speech recognition device to a home telephone so that, when the speaker calls home from outside, the device recognizes the speech and the recognition result is used to control the operation of a desired electrical appliance in the home.

In this case, as long as a speaker-dependent speech recognition device such as the one described above is used, the registration mode is carried out at home, and the stored registered patterns are based on speech signals having the natural speech band (100 Hz to 6000 Hz) obtained from the microphone. Consequently, in the recognition mode, a considerable difference arises between the input pattern, which is based on a speech signal whose frequency band has been limited to 300 Hz to 3400 Hz by the transmission characteristics peculiar to telephone lines, and the registered pattern of the same utterance, so that a recognition rate of practical value could not be obtained.

That is, the difference between the input pattern and the registered pattern caused by the difference in the frequency band of the speech signals is attributable to three factors: the presence or absence of spectrum values in the low band below 300 Hz and the high band above 3400 Hz in the spectrum extractor 4; an error in the speech time region in the speech region detector 3, caused by the difference in the amplitude variation of the low-frequency components, which carry much of the speech energy; and a voiced/unvoiced judgment error in the zero-cross detector 5, caused by the difference in the number of zero crossings of the high-frequency components.

The present invention has been made in view of these points, and aims to equalize the frequency bands of the speech signal obtained in the registration mode and the speech signal obtained in the recognition mode.

FIG. 2 shows a block diagram of the speech recognition device of the present invention. In the figure, 1, 2, 7, and 8 denote, as in the existing device shown in FIG. 1, the microphone, the feature extractor, the registered pattern memory, and the recognition processing unit, respectively. 9 is a telephone outside the home, and 10 is the telephone line, including the telephone exchange to which the telephone 9 is connected; one line of this network is led into the home. 11 is a band-pass filter that limits the frequency band of the speech signal from the microphone 1 and, as shown in the frequency characteristic diagram of FIG. 3, has a passband of 300 Hz to 3400 Hz, equivalent to the transmission characteristics of the telephone line 10. S₁ and S₂ are ganged first and second changeover switches for mode switching. In the registration mode, both switches S₁ and S₂ are on the b-terminal side: the speech signal from the microphone 1, band-limited by the band-pass filter 11, is input to the feature extractor 2 via the first changeover switch S₁, and the registered pattern of the speech obtained from the feature extractor 2 is stored in the registered pattern memory 7 via the second changeover switch S₂. In the recognition mode, both switches S₁ and S₂ are on the a-terminal side: the speech signal received over the telephone line 10 from the outside telephone 9 is input to the feature extractor 2 via the first changeover switch S₁, and the input pattern of the speech obtained from the feature extractor 2 is input to the recognition processing unit 8 via the second changeover switch S₂.
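The band limiting and input selection described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the filter circuit of the patent: `band_limit` is an idealized brick-wall filter built on a naive discrete Fourier transform, whereas the band-pass filter 11 would be a circuit with the gradual characteristic of FIG. 3. The function names, the mode strings, and the sample rate parameter are all hypothetical.

```python
import cmath
import math


def band_limit(samples, rate, low_hz=300.0, high_hz=3400.0):
    """Zero every DFT bin outside [low_hz, high_hz] and transform back.

    Idealized stand-in for the band-pass filter 11 (assumption: a
    brick-wall response; O(n^2), so only suitable for short signals).
    """
    n = len(samples)
    spectrum = [
        sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    for k in range(n):
        # Bins k and n - k represent the same physical frequency.
        freq = min(k, n - k) * rate / n
        if not (low_hz <= freq <= high_hz):
            spectrum[k] = 0
    return [
        (sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]


def select_input(mode, mic_signal, line_signal, rate):
    """Mimic changeover switch S1: choose what reaches the feature extractor."""
    if mode == "registration":   # b-terminal side: microphone through filter
        return band_limit(mic_signal, rate)
    if mode == "recognition":    # a-terminal side: telephone line as received
        return line_signal
    raise ValueError(f"unknown mode: {mode!r}")
```

For example, a microphone signal containing a 100 Hz tone and a 1000 Hz tone leaves `band_limit` with only the 1000 Hz component, which is the kind of signal the telephone line would deliver in the recognition mode.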

With the configuration described above, the registered pattern memory 7 stores, for a large number of words, registered patterns extracted from speech signals in the natural speech frequency band (100 Hz to 6000 Hz) obtained from the microphone 1 and limited by the band-pass filter 11 to a passband of 300 Hz to 3400 Hz, equivalent to the transmission characteristics of the telephone line 10. Using these registered patterns, the recognition processing unit 8 compares and recognizes the feature pattern based on the speech signal received from the outside telephone 9 over the telephone line 10, and outputs a control signal corresponding to the input speech, for example "Light on", whereby the gate light of the home is turned on.
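The registration and recognition flow can be sketched in Python as follows. The source does not specify how the recognition processing unit 8 compares patterns, so the nearest-neighbor match by Euclidean distance used here is an assumption, as are the class name, the fixed-length numeric feature vectors, and the control signal names.

```python
import math


class Recognizer:
    """Toy model of registered pattern memory 7 and recognition unit 8."""

    def __init__(self):
        self.registered = {}  # word -> feature pattern (list of floats)

    def register(self, word, pattern):
        """Registration mode: store the pattern extracted for a word."""
        self.registered[word] = list(pattern)

    def recognize(self, pattern):
        """Recognition mode: return the word whose registered pattern is
        closest to the input pattern (assumed Euclidean distance)."""
        if not self.registered:
            return None
        return min(
            self.registered,
            key=lambda word: math.dist(self.registered[word], pattern),
        )


# Hypothetical mapping from recognized words to appliance control signals.
CONTROL_SIGNALS = {"light on": "GATE_LIGHT_ON", "light off": "GATE_LIGHT_OFF"}


def control_signal_for(recognizer, pattern):
    """Return the control signal for the recognized word, if any."""
    return CONTROL_SIGNALS.get(recognizer.recognize(pattern))
```

A match of this kind only works well when registered and input patterns are extracted from signals with the same bandwidth, which is the condition the band-pass filter 11 is meant to establish.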

As is clear from the above description, the speech recognition device of the present invention includes a filter circuit having a frequency passband equivalent to the transmission frequency characteristics of a telephone line. In the registration mode, it stores a large number of registered patterns based on speech signals obtained from the microphone through the filter circuit; in the recognition mode, it compares these registered patterns with an input pattern based on the speech signal obtained from the telephone line and recognizes the input speech at that time. The frequency bands of the speech signal obtained in the registration mode and the speech signal obtained in the recognition mode can therefore be equalized.

Accordingly, the drop in recognition rate that the transmission characteristics of the telephone line cause in this type of speech recognition device can be avoided, and remote control of various electrical appliances by telephone becomes reliable.

[Brief description of the drawings]

FIG. 1 is a block diagram of an existing speech recognition device, FIG. 2 is a block diagram of the speech recognition device of the present invention, and FIG. 3 is a frequency characteristic diagram of the band-pass filter used in the device of the present invention. 1 is a microphone, 2 is a feature extractor, 7 is a registered pattern memory, 8 is a recognition processing unit, 9 is a telephone, 10 is a telephone line, and 11 is a band-pass filter.

Claims

[Claims]

1. A speech recognition device comprising: a feature extractor that extracts a speech feature pattern from an input speech signal; a registered pattern memory in which a large number of registered patterns are stored in advance; a recognition processing unit that compares and recognizes an input pattern against the registered patterns; and a filter circuit having a frequency passband equivalent to the transmission frequency characteristics of a telephone line, characterized in that, in a speech registration mode, a speech signal obtained from a microphone is introduced into the feature extractor via the filter circuit and the registered pattern obtained from the feature extractor is stored in the registered pattern memory, and, in a speech recognition mode, a speech signal obtained from the telephone line is introduced into the feature extractor and the speech at that time is recognized by the recognition processing unit.