JPH04128900A

JPH04128900A - Voice recognition device

Info

Publication number: JPH04128900A
Application number: JP2250868A
Authority: JP
Inventors: Shinji Osaki; 大崎　真司
Original assignee: NEC IC Microcomputer Systems Co Ltd
Current assignee: NEC IC Microcomputer Systems Co Ltd
Priority date: 1990-09-20
Filing date: 1990-09-20
Publication date: 1992-04-30

Abstract

PURPOSE: To prevent voice misrecognition due to noise mixed into an input voice signal by comparing the time-expanded/compressed data of a voice pitch pattern with that of an input voice analytic pattern and deciding on the result. CONSTITUTION: A voice analysis part 3 analyzes the frequency of the input voice signal 2 and outputs the voice analytic pattern 22. An analytic data DP matching part 4 performs DP matching between the pattern 22 and a pattern 23 outputted by a pattern storage part 7 and outputs the time-expanded/compressed data 25. A voice pitch extraction part 8, on the other hand, finds the voice pitch frequency of the input voice signal 2 at each time by an autocorrelation method and outputs the voice pitch pattern 28. DP matching is then performed between the pitch pattern 28 and the previously stored pitch pattern 26 of the same voice as the pattern 23 to obtain time-expanded/compressed data 27. A time expansion/compression comparison part 14 converts the data 25 and 27 into vector data, adds the differences between all pairs of vectors, and outputs the result as a time-expansion/compression similarity signal 15. A decision part 16 makes a decision according to the similarity signal 15.

Description

[Detailed description of the invention]

[Industrial application field]

The present invention relates to a speech recognition device.

[Conventional technology]

An example of the configuration of a conventional speech recognition device is shown in FIG. 9. The input voice signal 2 collected by the microphone 1 is frequency-analyzed by the voice analysis section 3. As shown at 42 in FIG. 10, the input voice signal is sampled at times t1 to tn, the voice level of each sample is determined for the frequencies f1 Hz to fn Hz, and the result is output to the analysis data DP matching section 4.

Meanwhile, the voice analysis pattern storage section 7 stores in advance the voice analysis pattern 43 of each voice to be recognized, as shown in FIG. 11.

The analysis data DP matching section 4 receives two inputs: the voice analysis pattern 42 of the input voice signal 2, output from the voice analysis section 3 and shown in FIG. 10, and the pre-registered voice analysis pattern 43, output from the voice analysis pattern storage section 7.

Using the DP matching method, which absorbs the time expansion and contraction between these two patterns (Speech Recognition, Kyoritsu Shuppan Co., Ltd., May 10, 1980, pp. 107-108), the input was compared against every voice stored in the voice analysis pattern storage section 7, and the most similar word was taken as the recognized word.
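The DP matching step described above can be illustrated with a minimal sketch of dynamic-programming alignment (dynamic time warping). It is not taken from the patent: the function name `dp_match`, the city-block frame distance, and the three allowed steps (diagonal, vertical, horizontal) are assumptions chosen for illustration. The returned path plays the role of the time expansion/contraction data.

```python
def dp_match(input_frames, stored_frames):
    """Align two sequences of feature vectors.

    Returns (total_cost, path), where path is a list of
    (input_index, stored_index) pairs from start to end.
    """
    n, m = len(input_frames), len(stored_frames)

    def frame_distance(a, b):
        # City-block distance between two frames (an assumed choice).
        return sum(abs(x - y) for x, y in zip(a, b))

    # cost[i][j]: cheapest accumulated distance that aligns
    # input_frames[:i + 1] with stored_frames[:j + 1].
    cost = [[float("inf")] * m for _ in range(n)]
    back = [[None] * m for _ in range(n)]

    for i in range(n):
        for j in range(m):
            d = frame_distance(input_frames[i], stored_frames[j])
            if i == 0 and j == 0:
                cost[i][j] = d
                continue
            candidates = []
            if i > 0 and j > 0:
                candidates.append((cost[i - 1][j - 1], (i - 1, j - 1)))
            if i > 0:
                candidates.append((cost[i - 1][j], (i - 1, j)))
            if j > 0:
                candidates.append((cost[i][j - 1], (i, j - 1)))
            best_cost, best_prev = min(candidates)
            cost[i][j] = d + best_cost
            back[i][j] = best_prev

    # Trace the cheapest path back from the final cell to (0, 0).
    path = []
    node = (n - 1, m - 1)
    while node is not None:
        path.append(node)
        node = back[node[0]][node[1]]
    path.reverse()
    return cost[n - 1][m - 1], path


if __name__ == "__main__":
    spoken = [[0], [1], [1], [2]]   # input lingers one frame on the value 1
    template = [[0], [1], [2]]
    print(dp_match(spoken, template))
```

In this sketch a conventional recognizer would call `dp_match` once per registered word and pick the word with the smallest total cost.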

[Problem to be solved by the invention]

In the conventional speech recognition device described above, when an input voice signal 2 with a poor signal-to-noise ratio is input, the voice analysis pattern 42 output from the voice analysis section 3 contains noise 41, as shown in FIG. 10. When the analysis data DP matching section 4 then performs DP matching against the noise-free voice analysis pattern 43 registered in the voice analysis pattern storage section 7, the noise 41 causes an erroneous time expansion/contraction, as shown at 45. The drawback is that a different voice that merely resembles the desired voice is recognized.

An object of the present invention is to provide a speech recognition device capable of highly accurate speech recognition. It does so by comparing the time expansion/contraction data of the voice pitch pattern extracted from the input voice signal with the time expansion/contraction data of the input voice analysis pattern, thereby preventing the misrecognition that results from erroneous time expansion/contraction caused by noise mixed into the input voice signal.

[Means to solve the problem]

The speech recognition device of the present invention comprises: a voice analysis section that receives an input voice signal and performs frequency analysis on it; a voice analysis pattern storage section in which the analysis patterns of the voices to be recognized are registered in advance; an analysis data DP matching section that applies the DP matching method to the voice analysis pattern output by the voice analysis section and the voice analysis pattern output by the voice analysis pattern storage section to generate time expansion/contraction data for each word; a voice pitch extraction section that calculates a voice pitch pattern from the input voice signal; a voice pitch pattern storage section in which the voice pitch patterns of the voices to be recognized are registered in advance; a voice pitch DP matching section that applies the DP matching method to the voice pitch pattern output by the voice pitch extraction section and the voice pitch pattern output by the voice pitch pattern storage section to generate time expansion/contraction data for each word; a time expansion/contraction comparison section that converts the time expansion/contraction data of each word generated by the analysis data DP matching section and by the voice pitch DP matching section into vectors and linearly compares the differences between the vectors; and a determination section that outputs, as the recognized word, the word with the highest inter-vector similarity among the time expansion/contraction similarity signals output by the time expansion/contraction comparison section.

[Example]

Next, the present invention will be explained with reference to the drawings.

FIG. 1 is a block diagram showing one embodiment of the present invention.

FIGS. 2 and 5 show the time expansion/contraction between the analysis pattern of the input voice signal and the voice analysis pattern registered in advance in the voice analysis pattern storage section. FIGS. 3 and 6 show, for the cases of FIGS. 2 and 5 respectively, the time expansion/contraction between the pitch variation of the input voice signal and the voice pitch pattern registered in advance in the voice pitch pattern storage section. FIG. 4(A) shows the time expansion/contraction data of FIG. 2 converted into vectors, and FIG. 4(B) shows the same for FIG. 3. FIG. 7(A) is a vector diagram of the time expansion/contraction data of FIG. 5, and FIG. 7(B) is a vector diagram of the time expansion/contraction data of FIG. 6. FIG. 8 shows the method of calculating the difference between the vectors of FIGS. 4(A) and 4(B).

The device of FIG. 1 comprises: a microphone 1; a voice analysis section 3 that receives the input voice signal 2 output by the microphone 1, performs frequency analysis on it, and outputs the input voice analysis pattern 22; a voice analysis pattern storage section 7 in which the analysis patterns of the voices to be recognized are registered in advance; an analysis data DP matching section 4 that performs DP matching between the voice analysis pattern 22 output by the voice analysis section 3 and the voice analysis pattern 23 output by the voice analysis pattern storage section 7 to generate time expansion/contraction data 25; a voice pitch extraction section 8 that calculates the voice pitch pattern 28 from the input voice signal 2; a voice pitch pattern storage section 13 in which the voice pitch pattern 26 is registered in advance; a voice pitch DP matching section 10 that performs DP matching between the voice pitch pattern 28 and the voice pitch pattern 26 to generate time expansion/contraction data 27; a time expansion/contraction comparison section 14 that converts the time expansion/contraction data 25 output by the analysis data DP matching section 4 and the time expansion/contraction data 27 output by the voice pitch DP matching section 10 into vectors and then linearly compares the differences between the vectors; and a determination section 16 that outputs, as the recognized word, the word with the highest inter-vector similarity among the time expansion/contraction similarity signals 15 output by the time expansion/contraction comparison section 14.

FIG. 2 shows the relationship, expressed as time expansion/contraction data 25, between the voice analysis pattern 22 of an input voice signal 2 containing noise 21 and a voice analysis pattern 23, stored in advance in the voice analysis pattern storage section 7, that resembles the noise 21 of the input voice signal 2.

FIG. 3 shows the relationship, expressed as time expansion/contraction data 27, between the voice pitch pattern 28 of the input voice signal 2 at that time and the voice pitch pattern 26 stored in advance in the voice pitch pattern storage section 13.

FIG. 4(A) shows the vectors obtained by plotting, on a matrix of the sampling times, the time expansion/contraction data 25 between the voice analysis pattern 22 of the input voice signal 2 and the pre-registered voice analysis pattern 23 of FIG. 2. FIG. 4(B) shows the vectors obtained by plotting, on a matrix of the sampling times, the time expansion/contraction data 27 between the voice pitch pattern 28 and the voice pitch pattern 26 of FIG. 3.

FIG. 5 shows the time expansion/contraction data 35 between the voice analysis pattern 32 of an input voice signal 2 containing noise 31 and a voice analysis pattern 33, stored in advance in the voice analysis pattern storage section 7, that does not resemble the noise 31 in the voice analysis pattern of the input voice signal 2. FIG. 6 shows the time expansion/contraction data 37 between the voice pitch pattern 38 of the input voice signal 2 at that time and the voice pitch pattern 36 stored in advance in the voice pitch pattern storage section 13.

FIG. 7(A) shows the vectors of the time expansion/contraction data 35 of FIG. 5, and FIG. 7(B) shows the vectors of the time expansion/contraction data 37 of FIG. 6.

Next, the operation will be explained.

The voice analysis section 3 performs frequency analysis on the input voice signal 2 at each time t1 to tn, as shown in FIG. 2, calculates the frequency components f1 Hz to fn Hz, and outputs the voice analysis pattern 22. In the voice analysis pattern storage section 7, each voice to be recognized is analyzed and registered in advance, as shown at 23 in FIG. 2. The analysis data DP matching section 4 performs DP matching between the voice analysis pattern 22 and the voice analysis pattern 23 output by the voice analysis pattern storage section 7, and outputs time expansion/contraction data 25.

At this point, it cannot be determined from the input voice signal analysis pattern 22 that 21 is noise. DP matching therefore aligns the noise 21 with pattern 24, a feature of the voice analysis pattern 23 that resembles the noise 21. An erroneous time expansion/contraction is performed, and the time expansion/contraction data 25 is the result.

Meanwhile, the voice pitch extraction section 8 determines the voice pitch frequency of the input voice signal 2 at each time t1 to tn by the autocorrelation method (Speech Recognition, Kyoritsu Shuppan Co., Ltd., May 10, 1980, pp. 32-34 and 42-43) and outputs the voice pitch pattern 28. The voice pitch DP matching section 10 compares the voice pitch pattern 28, by the DP matching method, with the voice pitch pattern 26 of the same voice as the voice analysis pattern 23, which is stored in advance in the voice pitch pattern storage section 13, and obtains time expansion/contraction data 27. The time expansion/contraction comparison section 14 converts the time expansion/contraction data 25 and 27 into vector data as in FIGS. 4(A) and 4(B), determines the difference between the vectors of each pair in order starting from the vector pair at time t1, adds the differences of all vector pairs, and outputs the sum as the time expansion/contraction similarity signal 15 (not shown).

For this vector conversion, the input voice analysis pattern 22 of FIG. 2 is placed on the vertical axis and the voice analysis pattern 23 on the horizontal axis. Each sampling time of the voice analysis pattern 22 is plotted on the matrix against the corresponding sampling time of the voice analysis pattern 23, pairing t1 with t1, t2 with t2, ..., t7 with t7, and onward to t10, as shown in FIG. 4(A). This converts the time expansion/contraction data 25 into vectors.
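The autocorrelation pitch estimate used by the voice pitch extraction section can be sketched as follows. This is a generic illustration, not the procedure from the cited textbook: the function name `pitch_by_autocorrelation`, the 60 to 400 Hz search range, and the unnormalized autocorrelation are assumptions. Calling it once per analysis frame t1 to tn would yield a sequence comparable to the voice pitch pattern 28.

```python
import math


def pitch_by_autocorrelation(frame, sample_rate, min_hz=60.0, max_hz=400.0):
    """Estimate the pitch of one frame in Hz, or 0.0 if none is found.

    The lag with the largest autocorrelation inside the assumed
    pitch range is taken as the pitch period.
    """
    min_lag = max(1, int(sample_rate / max_hz))
    max_lag = min(int(sample_rate / min_hz), len(frame) - 1)

    best_lag, best_value = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        # Unnormalized autocorrelation of the frame at this lag.
        value = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if value > best_value:
            best_lag, best_value = lag, value

    return sample_rate / best_lag if best_lag else 0.0


def sine_frame(freq_hz, sample_rate, length):
    """Hypothetical helper: a pure tone used only to exercise the estimator."""
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(length)]


if __name__ == "__main__":
    tone = sine_frame(200.0, 8000, 400)
    print(pitch_by_autocorrelation(tone, 8000))
```

A silent frame produces no positive autocorrelation peak, so the sketch returns 0.0 for it. A practical extractor would also need a voiced/unvoiced decision, which is left out here.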

FIG. 3 is converted into vectors in the same way, as shown in FIG. 4(B), with the input voice pitch pattern 28 on the vertical axis and the voice pitch pattern 26 on the horizontal axis. The difference between two vectors is obtained by expressing the vectors in orthogonal coordinates and adding the absolute value of the difference along the vertical axis to the absolute value of the difference along the horizontal axis, as shown in FIG. 8.
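The inter-vector difference can be sketched as below. The patent does not spell out how the plotted points become vectors, so this sketch assumes that each vector is the step between two consecutive points on the matrix and that both point lists have one entry per input sampling time. The names `path_to_vectors` and `warp_difference` are hypothetical.

```python
def path_to_vectors(points):
    """Turn consecutive (vertical, horizontal) matrix points into step vectors."""
    return [(b[0] - a[0], b[1] - a[1]) for a, b in zip(points, points[1:])]


def warp_difference(points_a, points_b):
    """Sum, over all vector pairs, of |vertical diff| + |horizontal diff|."""
    vectors_a = path_to_vectors(points_a)
    vectors_b = path_to_vectors(points_b)
    if len(vectors_a) != len(vectors_b):
        raise ValueError("both paths must cover the same sampling times")
    return sum(
        abs(va[0] - vb[0]) + abs(va[1] - vb[1])
        for va, vb in zip(vectors_a, vectors_b)
    )


if __name__ == "__main__":
    # Spectral alignment stalls at stored time 3, then jumps to 5.
    spectral = [(1, 1), (2, 2), (3, 3), (4, 3), (5, 5)]
    # Pitch alignment advances one step at a time.
    pitch = [(1, 1), (2, 2), (3, 3), (4, 4), (5, 5)]
    print(warp_difference(spectral, pitch))
```

Two identical alignments give a sum of zero, and any stall or jump in one alignment that the other does not share adds to the sum.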

If the threshold of the determination section 16 is set to "1", for example, the inter-vector difference sum of the time expansion/contraction similarity signal 15 is "5", as shown in FIG. 8. Because this exceeds the threshold, that time expansion/contraction similarity signal 15 is discarded.

Next, the case in which a noise-free voice analysis pattern is stored in the voice analysis pattern storage section 7 will be described.

In FIG. 5, as in FIG. 2, the voice analysis result 32 of the voice analysis section 3 contains noise 31. The analysis data DP matching section 4 receives the noise-free voice analysis pattern 33 stored in advance in the voice analysis pattern storage section 7 and the voice analysis pattern 32 output from the voice analysis section 3, performs DP matching, and outputs time expansion/contraction data 35. Because the voice analysis pattern 33 stored in the voice analysis pattern storage section 7 contains nothing that resembles the noise 31, this time expansion/contraction data 35 indicates that the voice analysis pattern 32 and the voice analysis pattern 33 of FIG. 5(A) are not similar.

Meanwhile, the voice pitch extraction section 8 determines the voice pitch frequency of the input voice signal 2 at each time t1 to tn by the autocorrelation method and outputs the voice pitch pattern 38. The voice pitch DP matching section 10 compares the voice pitch pattern 38, by the DP matching method, with the voice pitch pattern 36 of the same voice as the voice analysis pattern 33, and obtains time expansion/contraction data 37. The time expansion/contraction comparison section 14 receives the time expansion/contraction data 35 output by the analysis data DP matching section 4 and the time expansion/contraction data 37 output by the voice pitch DP matching section 10, converts them into vectors as in FIGS. 7(A) and 7(B) respectively, adds the differences between the two, and outputs the time expansion/contraction similarity signal 15 to the determination section 16, as in the case of FIG. 2.

Here, the input voice analysis pattern 32 and the voice analysis pattern 33 are not similar, because pattern 33 contains no noise 31. However, the vectors are exactly the same, as shown in FIGS. 7(A) and 7(B), so the inter-vector difference sum is "0". Since this sum is at or below the predetermined threshold "1", the voice analysis pattern 33 of FIG. 5 is output as the speech recognition result.

In this way, if the inter-vector difference sum that constitutes the time expansion/contraction similarity signal exceeds the predetermined threshold, the similarity is judged to be low and that signal is discarded. If the sum is at or below the threshold, the similarity is judged to be high and that signal is retained, and the sum of the next time expansion/contraction similarity signal is then compared with the threshold. This operation is performed for every pre-registered word, and the word that remains at the end has the highest similarity and is output as the speech recognition result.
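The decision rule above can be sketched as a single pass over the registered words. The function name `decide` and the dictionary input are hypothetical. The text does not say how several words that all pass the threshold are ranked, so this sketch assumes the word with the smallest difference sum wins.

```python
def decide(difference_sums, threshold=1):
    """Pick the recognized word from {word: inter-vector difference sum}.

    Words whose sum exceeds the threshold are discarded. Among the rest,
    the smallest sum is kept (an assumed tie-breaking rule).
    Returns None when every word is discarded.
    """
    kept_word, kept_sum = None, None
    for word, total in difference_sums.items():
        if total > threshold:
            continue  # low similarity: discard this candidate
        if kept_sum is None or total < kept_sum:
            kept_word, kept_sum = word, total
    return kept_word


if __name__ == "__main__":
    # Mirrors the worked example: a sum of 5 is rejected, a sum of 0 is accepted.
    print(decide({"noise_like_word": 5, "intended_word": 0}))
```

Returning `None` when nothing passes is also an assumption; the source does not describe what the determination section outputs in that case.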

[Effect of the invention]

As explained above, when noise is mixed into the input voice signal, the present invention takes the time expansion/contraction data that was produced by erroneous DP matching under the influence of the noise in the voice analysis pattern output from the voice analysis section, and checks it against the time expansion/contraction data obtained by DP matching between the voice pitch pattern output from the voice pitch extraction section and the voice pitch pattern stored in advance in the voice pitch pattern storage section. This prevents the speech misrecognition caused by erroneous time expansion/contraction and improves speech recognition performance.

[Brief explanation of drawings]

FIG. 1 is a block diagram showing one embodiment of the present invention.
FIGS. 2 and 5 show the time expansion/contraction between the analysis pattern of the input voice signal and the voice analysis pattern registered in advance in the voice analysis pattern storage section.
FIGS. 3 and 6 show, for the cases of FIGS. 2 and 5, the time expansion/contraction between the pitch variation of the input voice signal and the voice pitch pattern registered in advance in the voice pitch pattern storage section.
FIG. 4(A) shows the time expansion/contraction data of FIG. 2 converted into vectors.
FIG. 4(B) shows the time expansion/contraction data of FIG. 3 converted into vectors.
FIG. 7(A) is a vector diagram of the time expansion/contraction data of FIG. 5.
FIG. 7(B) is a vector diagram of the time expansion/contraction data of FIG. 6.
FIG. 8 shows the method of calculating the difference between the vectors of FIGS. 4(A) and 4(B).
FIG. 9 is a block diagram showing an example of a conventional speech recognition device.
FIG. 10 shows, for the device of FIG. 9, the time expansion/contraction between the voice analysis pattern of the input voice signal and the voice analysis pattern registered in the voice analysis pattern storage section.
FIG. 11 shows a voice analysis pattern held in the voice analysis pattern storage section.

1: microphone; 2: input voice signal; 3: voice analysis section; 4: analysis data DP matching section; 7: voice analysis pattern storage section; 8: voice pitch extraction section; 10: voice pitch DP matching section; 13: voice pitch pattern storage section; 14: time expansion/contraction comparison section; 16: determination section; 21, 31, 41: noise; 22, 32, 42: voice analysis pattern of the input voice signal; 23, 33, 43: voice analysis pattern; 24: noise-like voice analysis pattern; 25, 35, 45: time expansion/contraction data of the voice analysis pattern; 26, 36: voice pitch pattern; 28, 38: voice pitch pattern of the input voice signal; 27, 37: time expansion/contraction data of the voice pitch pattern.

Claims

[Claims]

A speech recognition device comprising: a voice analysis section that receives an input voice signal and performs frequency analysis on it; a voice analysis pattern storage section in which the analysis patterns of the voices to be recognized are registered in advance; an analysis data DP matching section that applies the DP matching method to the voice analysis pattern output by the voice analysis section and the voice analysis pattern output by the voice analysis pattern storage section to generate time expansion/contraction data for each word; a voice pitch extraction section that calculates a voice pitch pattern from the input voice signal; a voice pitch pattern storage section in which the voice pitch patterns of the voices to be recognized are registered in advance; a voice pitch DP matching section that applies the DP matching method to the voice pitch pattern output by the voice pitch extraction section and the voice pitch pattern output by the voice pitch pattern storage section to generate time expansion/contraction data for each word; a time expansion/contraction comparison section that converts the time expansion/contraction data of each word generated by the analysis data DP matching section and the time expansion/contraction data of each word generated by the voice pitch DP matching section into vectors and linearly compares the differences between the vectors; and a determination section that outputs, as the recognized word, the word with the highest inter-vector similarity among the time expansion/contraction similarity signals output by the time expansion/contraction comparison section.