JPH071437B2

JPH071437B2 - Voice recognizer

Info

Publication number: JPH071437B2
Application number: JP63095697A
Authority: JP
Inventors: 洋一元田
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1988-04-20
Filing date: 1988-04-20
Publication date: 1995-01-11
Anticipated expiration: 2010-01-11
Also published as: JPH01267699A

Description

DETAILED DESCRIPTION OF THE INVENTION

[Industrial field of application]

The present invention relates to a speech recognition apparatus that performs recognition by DP (dynamic programming) matching of uttered speech against standard patterns and finding the standard pattern that gives the minimum dissimilarity, and in particular to a speech recognition apparatus that is not easily affected by environmental noise.

[Conventional technology]

In a conventional speech recognition apparatus, uttered speech is input through a microphone, and speech is detected by examining the amplitude (including power), the spectrum, and similar properties of the speech signal wave after it has been converted into an electric signal. The speech in the detected interval is then recognized.

Usually, the points where the amplitude level rises above and falls below a certain threshold are taken as the start and end points, or points near them where the spectrum changes abruptly are taken as the start and end points, and recognition processing is performed on that speech interval. Pause intervals (silence) are observed within words such as the numeral "1" (/icni/) and Sapporo (/sapporo/), and between words uttered in succession. A pause interval within a word may either be included in the speech interval or excluded from it.
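The threshold-based detection described above can be sketched as follows. This is a minimal illustration, not the apparatus's actual detector: the function names, the per-frame amplitude list, and the `min_len` parameter are assumptions introduced here, and the spectral-change criterion is omitted.

```python
def detect_speech_interval(amplitudes, threshold):
    """Return (start, end) frame indices of the detected speech interval, or None.

    start is the first frame above the threshold and end is the last one,
    so pauses inside a word remain part of the interval.
    """
    above = [i for i, a in enumerate(amplitudes) if a > threshold]
    if not above:
        return None
    return above[0], above[-1]


def find_pauses(amplitudes, threshold, start, end, min_len=2):
    """Return (first, last) frame ranges inside [start, end] at or below the threshold.

    min_len (hypothetical) is the shortest run of quiet frames counted as a pause.
    """
    pauses = []
    run_start = None
    for i in range(start, end + 1):
        if amplitudes[i] <= threshold:
            if run_start is None:
                run_start = i          # a quiet run begins
        elif run_start is not None:
            if i - run_start >= min_len:
                pauses.append((run_start, i - 1))
            run_start = None           # the quiet run has ended
    return pauses


# Hypothetical frame amplitudes: two voiced bursts separated by a pause.
amps = [0, 0, 5, 6, 0, 0, 0, 7, 8, 0]
interval = detect_speech_interval(amps, threshold=1)
pauses = find_pauses(amps, 1, *interval)
```

Whether the pause found this way is kept inside the interval or cut out corresponds to the two handling methods mentioned above.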

On the other hand, speech data is entered not only in quiet offices but often in places such as factories or outdoors, where various machines generate noise. Speech recognition apparatuses generally use a noise-canceling close-talking microphone to improve noise tolerance, but this is still not sufficient. If the noise level exceeds the speech detection threshold, or if the spectrum of the noise itself changes just before or after the true start or end of the utterance, speech detection fails. In addition, if noise is superimposed on a pause interval within a word or between words and the detected speech interval is wrong, the pattern appears longer than it is, matching against the standard pattern becomes difficult, and the recognition result for the whole utterance is wrong.

When the noise is not very loud, its influence can be reduced to some extent by raising the threshold and excluding in-word pause intervals from the speech interval. However, when the amplitude or spectrum of the noise changes greatly in a short time, that is, when the noise is non-stationary, the threshold has to be set above the peak value of the noise. It then becomes difficult to detect low-amplitude portions and consonants near the start, the end, and the pause points of the utterance, so recognition performance drops sharply and this approach is not practical.
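The trade-off described above can be illustrated with a toy example. The amplitude values and the helper `detected_length` are hypothetical and serve only to show the two failure modes: a low threshold lets a noise burst lengthen the detected pattern, while a threshold above the noise peak drops the weak onset of the word.

```python
def detected_length(amplitudes, threshold):
    """Frames from the first to the last frame above the threshold (0 if none)."""
    above = [i for i, a in enumerate(amplitudes) if a > threshold]
    return above[-1] - above[0] + 1 if above else 0


# Hypothetical frames: weak consonant (2), vowel (9, 9), then silence.
clean = [0, 2, 9, 9, 0, 0, 0, 0]
# The same utterance with a non-stationary noise burst (peak 5) after the word.
noisy = [0, 2, 9, 9, 0, 0, 5, 0]

low_clean = detected_length(clean, 1)   # true word length is found
low_noisy = detected_length(noisy, 1)   # burst is included, pattern grows
high_noisy = detected_length(noisy, 5)  # burst rejected, but weak onset is lost
```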

To reduce the effect of such speech detection errors, there is a so-called start/end-free recognition method, in which the start and end of the utterance are not fixed but are each given a range. Start/end-free recognition is realized by comparing and matching the speech pattern of every interval formed by a possible combination of start candidate point and end candidate point, and taking the most likely recognition result as the final result. An example is described in detail in the specification of Japanese Patent Application No. 61-31179. Endpoint-free recognition can reduce errors in detecting the start and end of the speech interval, but it has no effect on the problem of noise that enters a pause interval of the utterance and ends up inside the speech interval, so correct recognition results are often not obtained.
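The exhaustive comparison over start and end candidates can be sketched as below. This is an illustration under stated assumptions, not the method of the cited application: features are single numbers per frame, the local distance is an absolute difference, and the DP step rule (the input index advances by 0, 1, or 2 per standard-pattern frame) is chosen here only for simplicity.

```python
def dp_distance(inp, ref):
    """Fixed-endpoint DP matching of an input segment against a standard pattern.

    Each standard-pattern frame is paired with exactly one input frame, so the
    accumulated cost has one term per standard-pattern frame.
    """
    inf = float("inf")
    g = [inf] * len(inp)
    g[0] = abs(inp[0] - ref[0])              # path must start at the first input frame
    for r in ref[1:]:
        new = [inf] * len(inp)
        for i, x in enumerate(inp):
            # best predecessor among input frames i-2, i-1, i
            new[i] = min(g[max(i - 2, 0): i + 1]) + abs(x - r)
        g = new
    return g[-1]                             # path must end at the last input frame


def endpoint_free_distance(inp, ref, starts, ends):
    """Try every (start, end) candidate pair; return (best cost, start, end)."""
    best = (float("inf"), None, None)
    for s in starts:
        for e in ends:
            if e < s:
                continue
            d = dp_distance(inp[s:e + 1], ref)
            if d < best[0]:
                best = (d, s, e)
    return best


# Hypothetical input: frames 0 and 5 are noise around the word.
result = endpoint_free_distance([7, 1, 5, 5, 2, 7], [1, 5, 2],
                                starts=[0, 1], ends=[4, 5])
```

Note that this only repairs errors at the two ends of the interval; noise in the middle of the segment still contributes to the cost, which is the limitation pointed out above.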

[Problems to be Solved by the Invention]

In the conventional speech recognition methods described above, when speech is detected from the amplitude level or spectral change of the signal and recognized with free start and end points, insertion errors caused by extraneous sound entering a pause interval during the utterance remain unresolved.

[Means for Solving the Problems]

The speech recognition apparatus according to the present invention performs recognition by DP matching of uttered speech against standard patterns and finding the standard pattern that gives the minimum dissimilarity. It has a DP matching unit that calculates a dissimilarity proportional to the standard pattern time length. It further comprises means that, when the DP matching path passes through a pause position on the standard pattern side, finds a provisional pause point that gives the minimum dissimilarity in the neighborhood of the corresponding position on the input pattern side and, using this dissimilarity as a boundary condition, continues the DP matching in an endpoint-free manner between the portion of the standard pattern after the pause position and the portion of the input pattern in the neighborhood of a point ahead of the provisional pause point.

[Action]

In the present invention, matching is performed in an endpoint-free manner also before and after the input pattern position that corresponds to the pause position on the standard pattern side, so accurate detection of the pause interval in the input pattern is not required.

[Example]

Embodiments of the present invention are described in detail below with reference to the drawings.

FIG. 1 is a block diagram showing an embodiment of the present invention.

In the figure, 1 is an input unit that receives a speech signal wave s; 2 is a standard pattern memory that receives the output of the input unit 1; 3 is an input pattern memory that receives the output of the input unit 1; and 4 is a DP matching unit that receives the input pattern i from the input pattern memory 3 and the standard pattern j from the standard pattern memory 2 and calculates a dissimilarity proportional to the standard pattern time length. Together these constitute the means that, when the DP matching path passes through a pause position on the standard pattern side, finds a provisional pause point that gives the minimum dissimilarity in the neighborhood of the corresponding position on the input pattern side and, using this dissimilarity as a boundary condition, continues the DP matching in an endpoint-free manner between the portion of the standard pattern after the pause position and the portion of the input pattern in the neighborhood of a point ahead of the provisional pause point.

The operation of the embodiment shown in FIG. 1 is described next.

First, the input unit 1 detects, as a speech interval, the interval in which the amplitude level of the input speech signal wave s is higher than a predetermined threshold, and converts it into a time-series pattern of feature parameters. If there is a pause interval within the word, its position is also detected. At registration, the time-series pattern and the pause position are stored in the standard pattern memory 2. At recognition, the time-series pattern is stored temporarily in the input pattern memory 3.

Next, the DP matching unit 4 calculates a dissimilarity proportional to the standard pattern time length from the input pattern i output from the input pattern memory 3 and the standard pattern j output from the standard pattern memory 2. The standard pattern memory 2 also supplies the pause position information q of the standard pattern to the DP matching unit 4.

FIG. 2 illustrates the DP matching process and serves to explain the operation of FIG. 1. The horizontal axis represents the input pattern i and the vertical axis represents the standard pattern j.

When the DP matching path D passes through the pause position Q on the standard pattern side, the point P₁ (the provisional pause point) that gives the minimum dissimilarity within the corresponding neighborhood K₁ on the input pattern side is found. Using the dissimilarity at this point P₁ as a boundary condition, DP matching is continued in an endpoint-free manner between the partial pattern after the pause position Q on the standard pattern side and the neighborhood K₂ of a point ahead of the provisional pause point in the input pattern. FIG. 2 shows a case in which, as a result of the endpoint-free matching, DP matching continued from the point P₂. The span from point P₁ to point P₂ is thereby treated as the pause interval on the input pattern side.
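The two-stage matching around the pause can be sketched as follows. This is a simplified reading of the description, not the patented implementation: it assumes scalar features with an absolute-difference local distance, a step rule in which the input index advances by 0, 1, or 2 per standard-pattern frame, a single pause, a fixed start and end, a neighborhood K₁ passed in by the caller as `k1`, and a neighborhood K₂ taken as the `max_skip` frames after P₁. All names are hypothetical.

```python
INF = float("inf")


def forward(inp, ref, init):
    """DP over `ref`; init[i] is the boundary cost for starting at input frame i.

    Returns (g, origin): g[i] is the minimum accumulated cost with the last
    frame of `ref` placed on input frame i, and origin[i] is the input frame
    where that path started.
    """
    n = len(inp)
    g = [init[i] + abs(inp[i] - ref[0]) for i in range(n)]
    origin = list(range(n))
    for r in ref[1:]:
        new_g, new_o = [INF] * n, [None] * n
        for i in range(n):
            # best predecessor among input frames i-2, i-1, i
            k = min(range(max(i - 2, 0), i + 1), key=lambda p: g[p])
            new_g[i] = g[k] + abs(inp[i] - r)
            new_o[i] = origin[k]
        g, origin = new_g, new_o
    return g, origin


def match_with_pause(inp, ref, q, k1, max_skip):
    """ref[:q] precedes the pause and ref[q:] follows it.

    Returns (total cost, P1, P2): P1 is the provisional pause point and P2 the
    input frame where matching of the second part resumed.
    """
    n = len(inp)
    start = [0 if i == 0 else INF for i in range(n)]       # fixed start at frame 0
    g1, _ = forward(inp, ref[:q], start)
    p1 = min(k1, key=lambda i: g1[i])                      # minimum within K1
    # Boundary condition: any frame of K2 may resume the match at cost g1[P1].
    resume = [g1[p1] if p1 < i <= p1 + max_skip else INF for i in range(n)]
    g2, origin = forward(inp, ref[q:], resume)
    return g2[-1], p1, origin[-1]


# Hypothetical data: the two 9s are noise that fell into the pause.
inp = [1, 5, 9, 9, 2, 6]
ref = [1, 5, 2, 6]                      # pause lies between ref[1] and ref[2]
skipping = match_with_pause(inp, ref, q=2, k1=range(0, 4), max_skip=3)
plain = forward(inp, ref, [0] + [INF] * 5)[0][-1]   # ordinary DP, no skipping
```

In this toy input the frames between P₁ and P₂ contribute nothing to the accumulated dissimilarity, whereas the ordinary DP path has to absorb part of the noise.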

Thus, even when noise enters a pause interval and speech detection fails, DP matching is performed with the noise component of the input pattern skipped. The pause positions and their number differ from one standard pattern to another, but by repeating the above calculation the dissimilarity between the standard pattern and the input pattern from start to end is obtained, and finally the standard pattern whose dissimilarity normalized by the standard pattern length J is smallest is output as the result R.
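The final selection step can be sketched as below. The dictionaries, word labels, and numbers are hypothetical; the only point taken from the text is that the accumulated dissimilarity of each standard pattern is divided by its length J before the minimum is chosen, so that long and short standard patterns are compared fairly.

```python
def select_result(dissimilarity, length):
    """Return the label whose dissimilarity normalized by pattern length J is smallest.

    dissimilarity[label] is the accumulated DP dissimilarity for that standard
    pattern and length[label] is its length J in frames.
    """
    return min(dissimilarity, key=lambda label: dissimilarity[label] / length[label])


# Hypothetical accumulated dissimilarities and standard pattern lengths.
dissimilarity = {"ichi": 12.0, "sapporo": 15.0}
length = {"ichi": 20, "sapporo": 50}
result = select_result(dissimilarity, length)   # 12/20 = 0.6 versus 15/50 = 0.3
```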

[Effect of the invention]

As described above, in the present invention matching is performed in an endpoint-free manner also before and after the input pattern position that corresponds to the pause position on the standard pattern side. Accurate detection of the pause interval in the input pattern is therefore not required, and the problem of noise being superimposed on a pause interval of the utterance, lengthening the input pattern and causing misrecognition, can be solved. As a result, normal recognition performance can be maintained even in the presence of non-stationary noise. Furthermore, the present invention is applicable not only to pause intervals within a word but also to pause intervals between words, so it is also effective in continuous word recognition.

[Brief description of drawings]

FIG. 1 is a block diagram showing an embodiment of the present invention, and FIG. 2 is an explanatory diagram of the DP matching process used to explain the operation of FIG. 1.

1: input unit, 2: standard pattern memory, 3: input pattern memory, 4: DP matching unit.

Claims

[Claims]

1. A speech recognition apparatus that performs recognition by DP matching of uttered speech against standard patterns and finding the standard pattern that gives the minimum dissimilarity, the apparatus comprising: a DP matching unit that calculates a dissimilarity proportional to the standard pattern time length; and means for, when the DP matching path passes through a pause position on the standard pattern side, finding a provisional pause point that gives the minimum dissimilarity in the neighborhood of the corresponding position on the input pattern side and, using this dissimilarity as a boundary condition, continuing the DP matching in an endpoint-free manner between the portion of the standard pattern after the pause position and the portion of the input pattern in the neighborhood of a point ahead of the provisional pause point.