JP2712586B2 - Pattern matching method for word speech recognition device

Info

Publication number: JP2712586B2
Application number: JP1173141A
Authority: JP
Inventors: 潤亀谷
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1989-07-06
Filing date: 1989-07-06
Publication date: 1998-02-16
Anticipated expiration: 2013-02-16
Also published as: JPH0339797A

Description

[Detailed Description of the Invention]

[Industrial Field of Application] The present invention relates to a pattern-registration type word speech recognition device, and in particular to a pattern matching method in which a word standard pattern is divided into consonant parts, transient parts, and stationary vowel parts, and linear or nonlinear matching is applied adaptively.

[Prior Art] A conventional word speech recognition device of this type is configured as shown in FIG. 2. In the figure, 1 is a voice input unit such as a microphone; 2 is a start/end detection unit that separates the input speech signal from the silent sections; 3 is an acoustic analysis unit that extracts feature parameters suitable for recognition from the speech signal; 10 is a pattern matching unit that performs DP matching or the like between the standard patterns and the extracted feature parameters; 6 is a word standard pattern memory unit that stores the registered standard patterns; and 8 is a recognition result processing unit that determines the recognition result from the results of pattern matching.

[Problems to Be Solved by the Invention] The conventional word speech recognition device described above generally adopts a nonlinear matching method, typified by DP matching, as its pattern matching method. Compared with linear matching, nonlinear matching methods such as DP matching adapt extremely well to the expansion and contraction of a pattern along the time axis caused by differences in speaking rate, and have the advantage of achieving a high word recognition rate.

However, the expansion and contraction of a word pattern caused by differences in speaking rate occurs almost entirely in the stationary vowel parts; the consonant and transient parts undergo little temporal expansion or contraction. Consequently, when nonlinear matching is applied across the entire word pattern, the vowel parts carry a large weight, the share of the evaluation value contributed by the consonant and transient parts falls in relative terms, and misrecognition can result. For example, words with similar vowel sequences, such as /gosjaku/ (五尺, "five shaku") and /osjaku/ (お杓, "ladle"), or /jonnijon/ ("424") and /jonjon/ ("44"), are easily confused. To avoid this, DP matching employs measures such as imposing slope constraints on the matching path, but at present there is no definitive solution.
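For orientation, the conventional whole-word DP matching discussed above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the Euclidean frame distance, the symmetric step pattern without slope constraints, and the names `frame_distance` and `dp_match` are all assumptions made for the example.

```python
import math

def frame_distance(a, b):
    """Euclidean distance between two feature vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dp_match(ref, inp):
    """Accumulated distance along the best warping path over the whole word.

    Every frame may be stretched or compressed, so long vowel runs
    contribute many terms to the total.
    """
    n, m = len(ref), len(inp)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(ref[i - 1], inp[j - 1])
            # Diagonal, vertical, and horizontal predecessors (no slope limit).
            cost[i][j] = d + min(cost[i - 1][j - 1],
                                 cost[i - 1][j],
                                 cost[i][j - 1])
    return cost[n][m]
```

Because every frame is free to warp, a slowly spoken input matches its template at no extra cost, which is the adaptability the text credits to DP matching; the same freedom is what lets the vowel frames dominate the accumulated score.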

[Means for Solving the Problems] The pattern matching method for a word speech recognition device according to the present invention comprises: stationary vowel standard pattern storage means for storing only the standard patterns of the stationary parts of vowels; word standard pattern correcting means for collating the stationary vowel standard patterns stored in the stationary vowel standard pattern storage means with an input word standard pattern, identifying the stationary vowel sections in the input word standard pattern, and correcting the word standard pattern; and adaptive pattern matching means for adaptively performing linear and nonlinear pattern matching on an input word pattern using the word standard pattern corrected by the word standard pattern correcting means. The method is characterized in that linear matching is performed for consonant sections of the input word pattern and nonlinear matching is performed for vowel sections.

[Embodiment] Next, the present invention will be described with reference to the drawings.

FIG. 1 is a block diagram showing the configuration of a word speech recognition device to which a pattern matching method according to one embodiment of the present invention is applied.

The voice input unit 1 digitizes the speech signal input through a microphone or the like. The start/end detection unit 2 separates the speech section from the silent sections that precede and follow it in the signal supplied by the voice input unit 1. The acoustic analysis unit 3 analyzes the speech signal separated by the start/end detection unit 2 and extracts feature parameters that are effective for recognition. The stationary vowel standard pattern memory unit 4 stores only the feature parameter sequences corresponding to the stationary parts of vowels. The standard pattern correction unit 5 refers to the stationary vowel standard patterns stored in the stationary vowel standard pattern memory unit 4 to identify the stationary vowel parts of a word standard pattern and correct the pattern accordingly. The word standard pattern memory unit 6 stores the corrected word standard patterns. The adaptive pattern matching unit 7 adaptively performs linear and nonlinear matching between the feature parameter sequence of the input speech signal and a word standard pattern, in accordance with the information recorded in the word standard pattern. The recognition result processing unit 8 performs word recognition processing based on the evaluation values obtained from the adaptive pattern matching.

The operation of this embodiment is described below.

First, the standard patterns must be registered before any recognition operation can be performed. Registration begins with the speaker uttering each vowel in isolation; the utterance is input to the voice input unit 1 through a microphone or the like. In the voice input unit 1, the input signal is converted from analog to digital and passed to the next unit. The start/end detection unit 2 groups the incoming signal into frames of several tens of samples each and decides, frame by frame, whether each frame is silence or speech on the basis of information such as the average power and the number of zero crossings within the frame. Frames judged to be speech are sent in sequence to the acoustic analysis unit 3. The acoustic analysis unit 3 performs cepstrum analysis or the like on each frame and sends the resulting feature parameters and power information to the standard pattern correction unit 5. The standard pattern correction unit 5 examines the per-frame power information and sends only the feature parameters of the frames in which the power is nearly constant and flat to the stationary vowel standard pattern memory unit 4, which stores this feature parameter sequence as the stationary vowel standard pattern for the uttered vowel.
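The frame-level processing in the vowel registration step can be sketched roughly as follows. The patent does not give thresholds, a decision rule, or a definition of "nearly constant" power, so the OR-combination in `is_speech`, the relative tolerance `rel_tol`, and all function names here are assumptions for illustration only.

```python
def frame_signal(samples, frame_len):
    """Split samples into consecutive non-overlapping frames (tail dropped)."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, frame_len)]

def average_power(frame):
    return sum(s * s for s in frame) / len(frame)

def zero_crossings(frame):
    """Count sign changes between adjacent samples."""
    return sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))

def is_speech(frame, power_thresh, zc_thresh):
    # Assumed rule: speech if the frame is loud enough or crosses zero often.
    return (average_power(frame) >= power_thresh
            or zero_crossings(frame) >= zc_thresh)

def flat_power_run(powers, rel_tol=0.1):
    """Return (start, end) of the longest run of frames whose power stays
    within rel_tol of the power of the run's first frame (end exclusive)."""
    best = (0, 0)
    start = 0
    for i in range(1, len(powers) + 1):
        run_broken = (i == len(powers)
                      or abs(powers[i] - powers[start]) > rel_tol * powers[start])
        if run_broken:
            if i - start > best[1] - best[0]:
                best = (start, i)
            start = i
    return best

def stationary_vowel_template(features, powers, rel_tol=0.1):
    """Keep only the feature frames from the flat-power section."""
    start, end = flat_power_run(powers, rel_tol)
    return features[start:end]
```

In this sketch, `features` and `powers` are parallel per-frame lists as produced by some acoustic analysis stage (cepstrum analysis in the patent); the analysis itself is outside the scope of the example.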

Next, to register a word standard pattern, the speaker utters the word to be registered, and the utterance is processed in the same way from the voice input unit 1 through the acoustic analysis unit 3. In this case, however, the acoustic analysis unit 3 sends only the per-frame feature parameters to the standard pattern correction unit 5; power information is not sent during word standard pattern registration. The standard pattern correction unit 5 collates the received feature parameters with the stationary vowel standard patterns already registered in the stationary vowel standard pattern memory unit 4. Specifically, endpoint-free DP matching against each stationary vowel standard pattern is performed over the entire feature parameter sequence of the received word utterance. The frame section in which the matching evaluation value is at or below a certain threshold and is minimal is taken as a stationary vowel section of the word standard pattern, and a stationary vowel flag is set at the end of the feature parameters for that section. This processing is repeated for the standard pattern of each vowel and ends when the matching evaluation value no longer satisfies the threshold condition. If the stationary vowel sections found by this matching overlap, the section with the smaller evaluation value is adopted. The word standard pattern whose stationary vowel sections have been identified is sent to the word standard pattern memory unit 6 and stored as a corrected word standard pattern.
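One way to picture the endpoint-free DP matching used to locate stationary vowel sections is the sketch below. It is an assumption-laden illustration: the step pattern (diagonal and horizontal moves only, so each template frame covers at least one word frame), the normalization of the score by template length, the Euclidean distance, and the simplification of finding at most one section per vowel are all choices made for this example and are not stated in the patent.

```python
import math

def frame_distance(a, b):
    """Euclidean distance between two feature vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def find_vowel_section(vowel, word, threshold):
    """Locate the word section that best matches a vowel template.

    Start and end points in the word are free. Returns (start, end, score)
    with end exclusive, or None if the best score exceeds the threshold.
    """
    n, m = len(vowel), len(word)
    if n == 0 or m == 0:
        return None
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    start = [[0] * (m + 1) for _ in range(n + 1)]
    for j in range(m + 1):
        cost[0][j] = 0.0      # free start: the path may begin at any word frame
        start[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(vowel[i - 1], word[j - 1])
            prev = min((cost[i - 1][j - 1], start[i - 1][j - 1]),  # diagonal
                       (cost[i][j - 1], start[i][j - 1]))          # stretch template
            cost[i][j] = d + prev[0]
            start[i][j] = prev[1]
    end = min(range(1, m + 1), key=lambda j: cost[n][j])  # free end
    score = cost[n][end] / n
    if score > threshold:
        return None
    return start[n][end], end, score

def flag_stationary_vowels(word, vowel_templates, threshold):
    """Return one boolean flag per word frame marking stationary vowel frames."""
    found = []
    for template in vowel_templates.values():
        section = find_vowel_section(template, word, threshold)
        if section is not None:
            found.append(section)
    found.sort(key=lambda s: s[2])  # smallest evaluation value first
    flags = [False] * len(word)
    for s, e, _ in found:
        if not any(flags[s:e]):     # on overlap, the better-scoring section wins
            flags[s:e] = [True] * (e - s)
    return flags
```

The resulting flag list plays the role of the stationary vowel flags stored with the corrected word standard pattern. A fuller version would repeat the search per vowel until no remaining section meets the threshold, as the text describes.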

Recognition becomes possible only after the above two operations have been completed. During recognition, the speaker utters the word to be recognized; the speech signal input to the voice input unit 1 through a microphone or the like is processed up to the acoustic analysis unit 3 in the same way as during registration. During recognition, the feature parameter sequence computed by the acoustic analysis unit 3 is sent to the adaptive pattern matching unit 7, where it is collated in turn with each of the word standard patterns stored in the word standard pattern memory unit 6. The pattern matching proceeds as follows: linear matching is performed one-to-one in sequence, starting from the beginning of the feature parameter sequence of the input speech and the beginning of the feature parameter sequence of the word standard pattern, and the cumulative sum of the inter-parameter distance values at the point where the ends of the two sequences coincide is taken as the evaluation value. In sections where the stationary vowel flag is set in the feature parameters of the standard pattern, however, DP matching is performed between the input pattern and the standard pattern. In other words, the cumulative distance between the patterns is obtained by DP matching in the stationary vowel sections of the standard pattern and by direct matching, with no expansion or contraction of the time axis, in all other sections, and this cumulative distance is used as the evaluation value. The evaluation value obtained for a given word standard pattern is sent to the recognition result processing unit 8 together with the ID code of that standard pattern. The recognition result processing unit 8 executes a recognition algorithm on the received pattern matching results, such as selecting the three candidates with the smallest evaluation values, and transmits the result to an external unit such as a host computer. Word-level recognition is performed by the above procedure.
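The adaptive matching described above can be sketched as a single dynamic-programming pass in which the allowed moves depend on the stationary vowel flag of the current standard-pattern frame. This formulation is an assumption: the patent describes the behavior (one-to-one matching outside vowel sections, DP matching inside them, ends required to coincide) but not a concrete recurrence. The names `adaptive_match` and `recognize`, the Euclidean distance, and the rule that time warping is allowed whenever the current standard-pattern frame is flagged are all illustrative.

```python
import math

def frame_distance(a, b):
    """Euclidean distance between two feature vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def adaptive_match(ref, flags, inp):
    """Cumulative distance between a flagged standard pattern and an input.

    Unflagged frames (consonant, transient) advance strictly one-to-one;
    flagged frames (stationary vowel) may also stretch or compress.
    Returns inf if the two sequence ends cannot be made to coincide.
    """
    n, m = len(ref), len(inp)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best = cost[i - 1][j - 1]              # one-to-one step
            if flags[i - 1]:                       # vowel frame: allow warping
                best = min(best, cost[i - 1][j], cost[i][j - 1])
            if best < inf:
                cost[i][j] = frame_distance(ref[i - 1], inp[j - 1]) + best
    return cost[n][m]

def recognize(templates, inp, n_best=3):
    """Score every word standard pattern and return the n_best smallest.

    templates maps a word ID to a (ref, flags) pair.
    """
    scored = [(adaptive_match(ref, flags, inp), word_id)
              for word_id, (ref, flags) in templates.items()]
    scored.sort()
    return scored[:n_best]
```

A small usage example with one-dimensional toy features: the input lengthens the vowel of word "A", which the flagged section absorbs at no cost, while word "B" lacks the initial consonant frame and is penalized for it.

```python
templates = {
    "A": ([[5], [1], [1], [9]], [False, True, True, False]),
    "B": ([[1], [1], [9]], [True, True, False]),
}
inp = [[5], [1], [1], [1], [1], [9]]
print(recognize(templates, inp))
```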

[Effects of the Invention] As described above, the present invention performs DP matching in the stationary vowel sections of a word standard pattern and direct linear matching in the remaining consonant and transient sections. This raises the share of the evaluation value contributed by the consonant parts and thereby reduces the misrecognition between words with similar vowel sequences that frequently occurs with conventional DP matching. In other words, the invention has the effect of improving the recognition rate of a pattern-registration type word recognition method over that of the conventional method.

[Brief description of the drawings]

FIG. 1 is a block diagram showing the configuration of a word speech recognition device to which a pattern matching method according to one embodiment of the present invention is applied, and FIG. 2 is a block diagram showing the configuration of a conventional word speech recognition device.

1: voice input unit, 2: start/end detection unit, 3: acoustic analysis unit, 4: stationary vowel standard pattern memory unit, 5: standard pattern correction unit, 6: word standard pattern memory unit, 7: adaptive pattern matching unit, 8: recognition result processing unit, 9: host computer, 10: pattern matching unit.

Claims

(57) [Claims]

1. A pattern matching method for a word speech recognition device, comprising: stationary vowel standard pattern storage means for storing only a standard pattern of a stationary part of a vowel; word standard pattern correcting means for collating the stationary vowel standard pattern stored in the stationary vowel standard pattern storage means with an input word standard pattern, identifying a stationary vowel section in the input word standard pattern, and correcting the word standard pattern; and adaptive pattern matching means for adaptively performing linear and nonlinear pattern matching on an input word pattern using the word standard pattern corrected by the word standard pattern correcting means, wherein linear matching is performed for consonant sections in the input word pattern and nonlinear matching is performed for vowel sections.