JPH06105400B2

JPH06105400B2 - Voice recognition system

Info

Publication number: JPH06105400B2
Application number: JP63279132A
Authority: JP
Inventors: 勇一郎藤橋
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1988-11-04
Filing date: 1988-11-04
Publication date: 1994-12-21
Anticipated expiration: 2009-12-21
Also published as: JPH02125299A

Description

Technical Field

The present invention relates to a speech recognition system, and in particular to one that performs pattern matching while shifting the end point of the input pattern in the time direction at each time step, calculates the distance from each standard pattern, and evaluates the distance at each time step to determine the recognition result.

Prior Art

Conventionally, the recognition result determination unit of this type of speech recognition system has used only a condition defined on the evaluation distance as the end condition for recognition result determination.

Because the recognition result determination unit of the conventional speech recognition system described above uses only a condition defined on the evaluation distance as its end condition, it cannot end recognition result determination at the optimum position. This can cause misrecognition, so a correct recognition result is not always obtained.

Object of the Invention

An object of the present invention is to provide a speech recognition system that ends recognition result determination at the optimum position and thereby obtains a correct recognition result.

Constitution of the Invention

According to the present invention, there is provided a speech recognition system that determines a speech recognition result by setting an evaluation distance threshold and an evaluation distance duration threshold for the evaluation distance, which is time-series data. The system additionally sets a power threshold and a power duration threshold for the power of the input speech, which is also time-series data. It ends recognition result determination and outputs the recognition result when the evaluation distance of the input speech satisfies the condition defined by the evaluation distance threshold and the evaluation distance duration threshold, and the power of the input speech satisfies the condition defined by the power threshold and the power duration threshold.

Embodiment

Next, an embodiment of the present invention will be described with reference to the drawings.

FIG. 1 is a block diagram of an embodiment of the present invention. The embodiment comprises a speech analysis unit 1, a standard pattern unit 2, a pattern matching unit 3, a recognition result determination unit 4, and a power calculation unit 5.

Input speech 10 is fed to both the speech analysis unit 1 and the power calculation unit 5. The speech analysis unit 1 analyzes it into a feature pattern, which is passed to the pattern matching unit 3 as input pattern 11. The power calculation unit 5 calculates the power of input speech 10 and outputs it to the recognition result determination unit 4 as input power 15. The pattern matching unit 3 reads standard pattern 12 from the standard pattern unit 2, matches it against input pattern 11 at every frame to calculate the inter-pattern distance, and outputs that distance to the recognition result determination unit 4 as evaluation distance 13.
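The patent does not say how the power calculation unit 5 computes power. As a minimal sketch, assuming mean squared amplitude over non-overlapping frames (the function name `frame_power` and the parameter `frame_length` are hypothetical, not from the source), the per-frame power series could be produced like this:

```python
from typing import List, Sequence


def frame_power(samples: Sequence[float], frame_length: int) -> List[float]:
    """Return one power value per complete, non-overlapping frame.

    Assumption: power is the mean of the squared samples in the frame.
    The source only states that power is calculated for each input frame.
    """
    powers = []
    # Step through the signal one whole frame at a time; a trailing
    # partial frame is ignored.
    for start in range(0, len(samples) - frame_length + 1, frame_length):
        frame = samples[start:start + frame_length]
        powers.append(sum(s * s for s in frame) / frame_length)
    return powers
```

The resulting list plays the role of input power 15: one value per input frame, aligned with the evaluation distance series.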

Pattern matching is performed at each frame after shifting the end point of the input pattern one frame later than the end point used for the previous frame. The pattern matching method is not particularly limited; this embodiment describes an example that uses start-free DP matching.

FIG. 2 shows the DP path permissible region for pattern matching when start-free DP matching is used, and the DP path shown in FIG. 3 is used. In both FIG. 2 and FIG. 3, the horizontal axis is the frame of the input pattern and the vertical axis is the frame of the standard pattern; i denotes a frame of the input pattern and j denotes a frame of the standard pattern.

Start-free DP matching finds, among the DP matching paths that reach the end point, the one that gives the minimum inter-pattern distance, without fixing the start point. In the example of FIG. 2, when frame I of the input pattern is taken as the end point, any frame from IS1 to IS2 may be the start point. Shifting the end point of the input pattern at each frame and performing pattern matching corresponds to sliding the DP path permissible region of FIG. 2 along the horizontal axis i.
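The idea of start-free matching with a sliding end point can be sketched as follows. This is an illustration under stated assumptions, not the patent's exact procedure: the local path (predecessors (i-1, j), (i-1, j-1) and (i, j-1)) stands in for the DP path of FIG. 3, which is not reproduced in the text; the local distance is a city-block distance; no slope constraint restricts the start point to the IS1 to IS2 range; and the names `local_distance` and `start_free_dp` are hypothetical.

```python
from typing import List, Sequence


def local_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """City-block distance between two feature vectors (an assumption)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def start_free_dp(input_frames: Sequence[Sequence[float]],
                  reference_frames: Sequence[Sequence[float]]) -> List[float]:
    """Return the evaluation distance for every input frame taken as end point."""
    inf = float("inf")
    n_ref = len(reference_frames)
    prev = [inf] * n_ref          # accumulated distances for input frame i-1
    evaluation = []
    for frame in input_frames:    # each new frame shifts the end point by one
        cur = [inf] * n_ref
        for j, ref in enumerate(reference_frames):
            d = local_distance(frame, ref)
            if j == 0:
                best = 0.0        # free start: a path may begin at any input frame
            else:
                best = min(prev[j], prev[j - 1], cur[j - 1])
            cur[j] = d + best
        evaluation.append(cur[-1])  # path must end at the last reference frame
        prev = cur
    return evaluation
```

For example, matching the reference `[[1.0], [2.0], [3.0]]` against the input `[[9.0], [1.0], [2.0], [3.0], [9.0]]` gives the lowest evaluation distance at the fourth input frame, where the embedded copy of the reference ends. Because only the previous column is kept, one pass over the input yields the whole evaluation distance time series.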

Next, the recognition result determination unit 4 will be described with reference to FIG. 4 and FIG. 5. Because evaluation distance 13 is calculated for each input frame, it forms time-series data over the input frames, as shown in FIG. 4(a) and FIG. 4(b). In FIG. 4, the horizontal axis is the input frame and the vertical axis is the evaluation distance. FIG. 4(a) is a graph of the evaluation distance for the correct standard pattern, and FIG. 4(b) is a graph of the evaluation distance for an incorrect standard pattern.

Because power 15 is calculated for each input frame, it likewise forms time-series data over the input frames, as shown in FIG. 5. In FIG. 5, the horizontal axis is the input frame and the vertical axis is the power.

The recognition result determination unit 4 sets an evaluation distance threshold DTH and an evaluation distance duration threshold LDTH, as shown in FIG. 4. It determines whether the evaluation distance stays continuously below DTH for LDTH frames or more. A section that lasts LDTH frames or more is called a valley section, and the existence of a valley section is the condition on the evaluation distance. The unit likewise sets a power threshold PTH and a power duration threshold LPTH, as shown in FIG. 5, and the condition on the power is that the power stays below PTH for LPTH or more consecutive frames. When both conditions are satisfied, recognition determination ends and recognition result 14 is output.
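The two-part end condition can be sketched in a few lines. This is a minimal illustration for a single evaluation distance series, assuming that the distance condition stays satisfied once a valley section has been observed and that "below" means strictly less than the threshold; the function names `run_lengths_below` and `find_end_frame` are hypothetical, while the parameters mirror DTH, LDTH, PTH and LPTH from the text.

```python
from typing import List, Optional, Sequence


def run_lengths_below(values: Sequence[float], threshold: float) -> List[int]:
    """For each frame, the length of the current run of frames below threshold."""
    runs = []
    current = 0
    for v in values:
        current = current + 1 if v < threshold else 0
        runs.append(current)
    return runs


def find_end_frame(distance: Sequence[float], power: Sequence[float],
                   dth: float, ldth: int,
                   pth: float, lpth: int) -> Optional[int]:
    """Return the first frame index at which both conditions hold, else None."""
    d_runs = run_lengths_below(distance, dth)
    p_runs = run_lengths_below(power, pth)
    valley_seen = False
    for i, (d_run, p_run) in enumerate(zip(d_runs, p_runs)):
        if d_run >= ldth:
            valley_seen = True       # distance condition: a valley section exists
        if valley_seen and p_run >= lpth:
            return i                 # power condition also met: stop here
    return None
```

With `distance = [2, 2, 2, 9, 9, 9, 9, 9]`, `power = [8, 1, 8, 8, 8, 8, 8, 8]`, `dth=5`, `ldth=3`, `pth=3` and `lpth=2`, a valley section forms but the power dips below the threshold for only one frame, so the function returns `None` and determination continues.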

The example shown in FIG. 4 and FIG. 5 is a case in which noise precedes the speech. For the evaluation distance against the incorrect standard pattern in FIG. 4(b), matching against the noise causes the evaluation distance to fall below DTH, and because its duration LD2 is LDTH or more, a valley section forms and a spotting error occurs. The lowest point of that valley section is V2. If only the condition on the evaluation distance were used as the end condition, recognition determination would end at this point and recognition result 14 would be output, resulting in misrecognition.

In the present invention, however, recognition determination ends only when both the condition on the evaluation distance and the condition on the power are satisfied. In the section corresponding to the valley section whose lowest point is V2, the power does fall below PTH, but its duration LP1 is shorter than LPTH, so the condition on the power is not satisfied. The end condition therefore does not hold, and misrecognition is prevented.

Next, for the evaluation distance against the correct standard pattern in FIG. 4(a), the evaluation distance falls below DTH and its duration LD1 is LDTH or more, so a valley section forms, with lowest point V1. In the section corresponding to the valley section whose lowest point is V1, the power falls below PTH, and at the time Iend when its duration reaches LPTH or more, both the condition on the power and the condition on the evaluation distance are satisfied and the end condition holds.

Meanwhile, the evaluation distance against the incorrect standard pattern in FIG. 4(b) again falls below DTH, and because its duration LD3 is LDTH or more, another valley section forms, with lowest point V3. In this embodiment, the evaluation distance at the lowest point of a valley section is used as the decision score. What is used as the decision score is not particularly limited, however.

In this way, decision scores are obtained for all standard patterns, and the standard pattern with the smallest decision score is selected and output as recognition result 14. In the example of FIG. 4(a) and FIG. 4(b), V1 and V3 are the decision scores, and because V1 < V3, the correct standard pattern is selected and output as recognition result 14.
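The selection step can be illustrated with a short sketch. It assumes that each standard pattern is scored by the lowest point of its most recent valley section (the text uses V3 rather than V2 for the incorrect pattern, but does not state the rule explicitly) and that patterns with no valley section are skipped; `valley_minima` and `select_pattern` are hypothetical names.

```python
from typing import Dict, List, Optional, Sequence


def valley_minima(distance: Sequence[float], dth: float, ldth: int) -> List[float]:
    """Lowest distance of each run below dth that lasts at least ldth frames."""
    minima = []
    run: List[float] = []
    # The infinite sentinel closes a run that reaches the end of the series.
    for d in list(distance) + [float("inf")]:
        if d < dth:
            run.append(d)
        else:
            if len(run) >= ldth:
                minima.append(min(run))
            run = []
    return minima


def select_pattern(distances: Dict[str, Sequence[float]],
                   dth: float, ldth: int) -> Optional[str]:
    """Return the name of the standard pattern with the smallest decision score."""
    best_name, best_score = None, float("inf")
    for name, series in distances.items():
        minima = valley_minima(series, dth, ldth)
        if not minima:
            continue                 # no valley section: pattern is not a candidate
        score = minima[-1]           # assumption: score the most recent valley
        if score < best_score:
            best_name, best_score = name, score
    return best_name
```

Given a series `[9, 9, 9, 9, 2, 1, 2, 9]` for one pattern and `[3, 2, 3, 9, 4, 3, 4, 9]` for another, with `dth=5` and `ldth=3`, the first pattern scores 1 and the second scores 3, so the first is selected, mirroring the V1 < V3 comparison above.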

As is clear from the examples in FIG. 4(a), FIG. 4(b) and FIG. 5, when only the condition on the evaluation distance is used as the end condition, recognition determination cannot end at an appropriate position if a spotting error occurs, and misrecognition results. By contrast, when both the condition on the power and the condition on the evaluation distance are used as the end condition, as in this embodiment, recognition determination can end at an appropriate position even if a spotting error occurs. The misrecognition that occurred in the conventional system is eliminated and a correct recognition result is obtained.

Effects of the Invention

As described above, the present invention does not end recognition result determination on the condition for the evaluation distance alone. It ends determination and outputs the recognition result only when both the condition for the evaluation distance and the condition for the power of the input speech are satisfied. This yields accurate recognition results and reduces misrecognition.

[Brief description of drawings]

FIG. 1 is a block diagram of an embodiment of the present invention. FIG. 2 shows the DP path permissible region for start-free DP matching. FIG. 3 shows the DP path for start-free DP matching. FIG. 4(a) shows the relationship between the correct standard pattern and the evaluation distance. FIG. 4(b) shows the relationship between an incorrect standard pattern and the evaluation distance. FIG. 5 shows an example of a power graph of the input speech.

Description of reference numerals for main parts: 1, speech analysis unit; 2, standard pattern unit; 3, pattern matching unit; 4, recognition result determination unit; 5, power calculation unit.

Claims

[Claims]

1. A speech recognition system comprising: a standard pattern unit that stores, as standard patterns, feature patterns of the words or phonemes to be recognized; a speech analysis unit that analyzes input speech at a predetermined frame period and outputs an input pattern, which is the feature pattern of the input speech; a pattern matching unit that calculates an evaluation distance by matching a standard pattern against the input pattern; a recognition result determination unit that determines a recognition result based on the evaluation distance; and a power calculation unit that calculates the power of the input speech, wherein the pattern matching unit performs pattern matching while shifting the end point of the input pattern by one frame for each frame of the input pattern, calculates the inter-pattern distance for each frame as the evaluation distance, and outputs the evaluation distance to the recognition result determination unit as time-series data, and wherein the recognition result determination unit, which determines the recognition result by providing an evaluation distance threshold and an evaluation distance duration threshold for the evaluation distance that is time-series data, further provides a power threshold and a power duration threshold for the power of the input speech that is time-series data, and ends recognition result determination and outputs the recognition result when the evaluation distance of the input speech satisfies the condition defined by the evaluation distance threshold and the evaluation distance duration threshold and the power of the input speech satisfies the condition defined by the power threshold and the power duration threshold.