JPS6214200A

JPS6214200A - Voice recognition equipment

Info

Publication number: JPS6214200A
Application number: JP60153515A
Authority: JP
Inventors: 敏雄吉原
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1985-07-11
Filing date: 1985-07-11
Publication date: 1987-01-22

Abstract

(57) [Summary] This publication contains application data filed before electronic filing was introduced, so no abstract data is recorded.

Description

DETAILED DESCRIPTION OF THE INVENTION

[Field of industrial application]

The present invention relates to speech recognition technology, and in particular to a method for preventing the degradation of recognition performance caused by changes in the speaker's emotional state.

[Conventional technology]

In conventional speech recognition devices, when the data patterns obtained at registration and at recognition are compared, only parameters that vary relatively little are extracted and matched.

[Problem that the invention seeks to solve]

The conventional speech recognition devices described above perform recognition by extracting components that are relatively unaffected by changes in the speaker's emotional state. The influence is not eliminated entirely, however, and it remains a cause of variation in recognition performance.

In particular, these devices have the drawback that the recognition rate drops significantly when the speaker is tense.

[Means for solving the problem]

The recognition device of the present invention has means for extracting, from the uttered speech, a component that indicates the speaker's degree of tension; means for comparing and judging that component between registration and recognition; and means for feeding the result of that comparison back to the speaker.

[Example]

Next, the present invention will be described with reference to the drawing.

FIG. 1 is a block diagram of one embodiment of the present invention.

In the figure, 1 is the input terminal for the speech signal, 2 is the speech pattern data analysis unit, 3 is the fundamental pitch period detection unit, 4 is the changeover switch for the output signal of unit 2, 5 is the changeover switch for the output signal of unit 3, 6 is the matching processing unit that operates on speech pattern data, 7 is the comparison and judgment unit that compares the fundamental pitch period at registration with the fundamental pitch period at recognition, 8 is the storage unit that stores the pattern data and the fundamental pitch period of the speech signal, 9 is the output terminal for the speech recognition result, and 10 is the acoustic transducer that feeds back the result of the fundamental pitch period comparison as a melody or as voice.

Next, the operation of this embodiment will be described.

In this embodiment, the fundamental pitch period of the input speech signal is used as the component that indicates the speaker's degree of tension.
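The source does not say how the pitch period is detected. As an illustration only, the following minimal sketch estimates it with normalized autocorrelation, a common technique that is swapped in here as an assumption. The function name `estimate_pitch_period`, the 60 to 400 Hz search range, and the tie-break tolerance are all hypothetical and do not come from the patent.

```python
import math


def estimate_pitch_period(samples, sample_rate, min_hz=60.0, max_hz=400.0,
                          tolerance=1e-3):
    """Estimate the pitch period of one voiced frame, in seconds.

    Assumption: normalized autocorrelation is used; the patent does not
    name a detection method. Among lags scoring within `tolerance` of the
    best score, the shortest is returned, which avoids picking a multiple
    of the true period.
    """
    min_lag = int(sample_rate / max_hz)
    max_lag = int(sample_rate / min_hz)
    if len(samples) <= max_lag:
        raise ValueError("frame too short for the requested pitch range")

    # Remove the DC offset so it does not dominate the correlation.
    mean = sum(samples) / len(samples)
    x = [s - mean for s in samples]

    scores = {}
    for lag in range(min_lag, max_lag + 1):
        n = len(x) - lag
        num = sum(x[i] * x[i + lag] for i in range(n))
        energy_a = sum(v * v for v in x[:n])
        energy_b = sum(v * v for v in x[lag:])
        den = math.sqrt(energy_a * energy_b)
        scores[lag] = num / den if den > 0 else 0.0

    best = max(scores.values())
    for lag in range(min_lag, max_lag + 1):
        if scores[lag] >= best - tolerance:
            return lag / sample_rate
    raise RuntimeError("unreachable: the best lag always qualifies")
```

A frame of a tenth of a second or so is enough for the default range. Real speech would also need a voiced/unvoiced decision before this step, which the sketch leaves out.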

At registration, switch 4 and switch 5 are both connected to the storage unit 8 side, and the data extracted from the input speech signal by the speech pattern data analysis unit 2 and by the fundamental pitch period detection unit 3 are both stored in the storage unit 8.

At recognition, switch 4 and switch 5 are connected to the matching processing unit 6 and the fundamental pitch period comparison and judgment unit 7, respectively. The speech pattern data analysis unit 2 extracts speech pattern data from the speech signal input at input terminal 1, the matching processing unit 6 compares it with the speech pattern data in the storage unit 8, and the recognition result is output from output terminal 9.
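The two switch positions can be pictured as two code paths over one shared store. The sketch below is illustrative only: the class names `Template` and `Store`, the use of a plain feature vector as the pattern data, and nearest-neighbor matching by Euclidean distance are assumptions, since the source does not specify the pattern representation or the matching algorithm.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Template:
    pattern: list        # hypothetical feature vector from analysis unit 2
    pitch_period: float  # seconds, from pitch period detection unit 3


@dataclass
class Store:
    """Plays the role of storage unit 8 (hypothetical structure)."""
    templates: dict = field(default_factory=dict)

    def register(self, word, pattern, pitch_period):
        # Registration path: both outputs are routed to storage.
        self.templates[word] = Template(list(pattern), pitch_period)

    def recognize(self, pattern):
        # Recognition path: the pattern is routed to matching instead.
        # Assumption: the nearest stored pattern wins.
        if not self.templates:
            raise ValueError("no registered templates")
        return min(
            self.templates,
            key=lambda w: math.dist(self.templates[w].pattern, pattern),
        )
```

Keeping the pitch period next to each pattern means the value recorded at registration is available for the tension comparison at recognition time.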

Simultaneously with the above operation, the fundamental pitch period data extracted from the input speech signal by the fundamental pitch period detection unit 3 is compared with the data in the storage unit 8 by the fundamental pitch period comparison and judgment unit 7. If the change is at or above a certain amount, a melody, a voice message, or the like is output from the acoustic transducer 10 in order to ease the speaker's tension.
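The comparison step can be sketched as a simple threshold test. The source says only that feedback is given when the change reaches "a certain amount", so the relative-change measure, the 10% default, and the names `pitch_change_exceeds` and `feedback_for` below are assumptions chosen for illustration.

```python
def pitch_change_exceeds(registered_period, observed_period,
                         max_relative_change=0.10):
    """Return True when the pitch period has shifted by the threshold or more.

    Assumption: the change is measured relative to the registered value.
    The patent does not state the measure or the threshold.
    """
    if registered_period <= 0 or observed_period <= 0:
        raise ValueError("pitch periods must be positive")
    change = abs(observed_period - registered_period) / registered_period
    return change >= max_relative_change


def feedback_for(registered_period, observed_period,
                 max_relative_change=0.10):
    """Choose a feedback action for the acoustic transducer, or None."""
    if pitch_change_exceeds(registered_period, observed_period,
                            max_relative_change):
        # Hypothetical action label; the source mentions a melody or a
        # voice message as possible forms of feedback.
        return "play_calming_melody"
    return None
```

For example, `feedback_for(0.005, 0.004)` selects the melody because the period has shortened by about 20%, while a shift from 0.005 s to 0.0051 s stays under the assumed threshold and yields `None`.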

[Effect of the invention]

As explained above, the present invention extracts from the input speech a component that depends strongly on the speaker's emotional state, which conventional speech recognition devices do not use, and feeds it back to the speaker. This has the effect of improving the average speech recognition rate or of shortening the time required for training.

[Brief explanation of drawings]

FIG. 1 is a block diagram of one embodiment of the present invention.

1: speech signal input terminal, 2: speech pattern data analysis unit, 3: fundamental pitch period detection unit, 4: speech pattern data changeover switch, 5: fundamental pitch period changeover switch, 6: matching processing unit, 7: fundamental pitch period comparison and judgment unit, 8: storage unit, 9: recognition result output terminal, 10: acoustic transducer.

Agent: Patent attorney 内原

Claims

[Claims]

A speaker-dependent speech recognition device that performs training before the recognition operation, characterized by having: means for analyzing the speech signal input during registration and extracting pattern data; means for extracting and storing a component correlated with the speaker's degree of tension; means for performing pattern matching between the stored pattern data and the input speech during recognition; means for extracting the component correlated with the degree of tension during recognition and comparing and judging it; and feedback means for easing the speaker's tension when the component correlated with the degree of tension changes by a certain amount or more between registration and recognition.