JPH0527120B2 - - Google Patents

Info

Publication number
JPH0527120B2
JPH0527120B2 (JP H0527120 B2); application JP58094750A (JP 9475083 A)
Authority
JP
Japan
Prior art keywords
speech
patterns
pattern
voice
frequency
Prior art date
Legal status
Expired - Lifetime
Application number
JP58094750A
Other languages
Japanese (ja)
Other versions
JPS59219800A (en)
Inventor
Junichiro Fujimoto
Current Assignee
Ricoh Co Ltd
Original Assignee
Ricoh Co Ltd
Priority date
Filing date
Publication date
Application filed by Ricoh Co Ltd filed Critical Ricoh Co Ltd
Priority to JP9475083A priority Critical patent/JPS59219800A/en
Publication of JPS59219800A publication Critical patent/JPS59219800A/en
Publication of JPH0527120B2 publication Critical patent/JPH0527120B2/ja
Granted legal-status Critical Current

Description

DETAILED DESCRIPTION OF THE INVENTION

Technical Field

The present invention relates to a speech pattern comparison method for use in a speech recognition device.

Prior Art

In recent years, speech recognition devices have been put into practical use to realize man-machine dialogue. A critical part of speech recognition is the matching section, which compares the feature patterns registered in a dictionary against the feature pattern of the input speech. This matching ordinarily faces two problems: first, the duration of an utterance varies from one utterance to the next; second, formants differ from speaker to speaker, producing frequency variation. To absorb the first kind of variation, a pattern matching method based on dynamic programming (DP) is known. To absorb variation in the time direction, this DP matching method stretches or compresses the time length of the two patterns being compared so as to maximize their similarity. However, because the method evaluates similarity over every possible correspondence between the two patterns, the amount of computation is large, and absorbing frequency variation as well would require an enormous amount of DP computation. No method for absorbing the second kind of variation, frequency variation, has yet been established.

Purpose

The present invention has been made in view of the above circumstances, and aims to provide a speech pattern comparison method that absorbs both time variation and frequency variation with a small amount of computation and thereby matches patterns accurately.

Configuration

To achieve the above object, the present invention provides, first, a speech pattern comparison method for comparing speech patterns each consisting of a time series of feature values obtained by frequency analysis of speech, in which the result of the frequency analysis is binarized to obtain a feature distribution, the feature distribution is thinned in the frequency axis direction to form a speech pattern, that thinned pattern is used as at least one of the patterns to be compared, and the two speech patterns are compared after being aligned in the time axis direction by dynamic programming. Alternatively, the invention provides, second, a speech pattern comparison method in which the peaks of the band powers obtained by frequency analysis are detected, the speech pattern formed by the time series of those peaks is used as at least one of the patterns to be compared, and the two speech patterns are again compared after being aligned in the time axis direction by dynamic programming. Embodiments are described below.
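As a rough illustration of the binarize-and-thin step described above, the following Python sketch reduces each vertical run of 1s within a frame to its single center band. The array layout (frequency bands by frames) and the center-of-run rule are assumptions for illustration, not details fixed by the patent.

```python
import numpy as np

def thin_frequency(binary):
    """Thin a binarized feature distribution along the frequency axis.

    binary: shape (I, J) array of 0/1 values (bands x frames).
    Each vertical run of 1s within a frame is reduced to its single
    center band, giving a thin-line pattern like B(i, j) in the text.
    """
    I, J = binary.shape
    thin = np.zeros((I, J), dtype=int)
    for j in range(J):
        i = 0
        while i < I:
            if binary[i, j] == 1:
                start = i
                while i < I and binary[i, j] == 1:
                    i += 1
                thin[(start + i - 1) // 2, j] = 1  # keep center of the run
            else:
                i += 1
    return thin
```

A run of 1s spanning bands 1..3 of a frame, for example, collapses to a single 1 in band 2.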

First, the ordinary DP matching method will be explained with reference to Figs. 1 and 2.

Consider comparing the pattern of Fig. 1a with that of Fig. 1b. The figure shows patterns 1, 2, ... obtained by sampling a speech pattern at fixed intervals along the time axis; each of these samples is called a frame. The DP method first associates the first frame of Fig. 1a with the first frame of Fig. 1b, computes the difference between the two waveforms, and obtains the shaded portion of Fig. 2. In the same way it associates the first frame of a with the second frame of b, the first frame of a with the third frame of b, ..., the second frame of a with the first frame of b, the second frame of a with the second frame of b, and so on, and establishes the frame-to-frame (that is, time axis) correspondence that minimizes the waveform difference. The method is therefore effective for patterns with little frequency variation, such as two utterances produced by the same person. However, when two waveforms are similar in shape but shifted in frequency, like the broken and solid curves of Fig. 2, it cannot treat them as the same waveform. This situation arises when the speakers of utterances a and b differ, and is caused by the formant differences between individuals.

The present invention was made to overcome this drawback of the DP matching method. Its operating principle will be explained with reference to Fig. 3. First, the pattern sampled as described above is sampled along the frequency axis and along the time axis; let i = 1, 2, ..., I index the frequency bands from the lowest upward, let j = 1, 2, ..., J index the time axis, and denote the two patterns A(i, j) and B(i, j). The pattern to be registered in the dictionary is divided by the filter bank into bands i = 1, 2, ..., I, binarized with a threshold, and registered as A(i, j_A), j_A = 1, ..., J_A. The speech to be recognized is likewise binarized and then thinned, giving B(i, j_B), j_B = 1, 2, ..., J_B. The question is then how to associate j_A with j_B; this association is shown in Fig. 3, where A(i, j_A) drawn on the i-j_A plane appears as in panel a, B(i, j_B) drawn on the i-j_B plane appears as in panel b, and the parts binarized to 1 (rather than 0) are hatched. At each point of the mesh (j_A, j_B) formed by the sample points j_A and j_B, the similarity r(j_A, j_B) between A(i, j_A) and B(i, j_B) is defined by

r(j_A, j_B) = Σ_{i=1}^{I} A(i, j_A) · B(i, j_B) ...(1)

and, writing R(j_A, j_B) for the cumulative similarity from j_A = 1, j_B = 1 up to (j_A, j_B), the points (j_A, j_B) are determined so that

R(j_A, j_B) = r(j_A, j_B) + max{R(j_A, j_B - 1), R(j_A - 1, j_B - 1), R(j_A - 1, j_B)} ...(2)

where max denotes taking the largest of the values in braces. Although expression (1) is written as a product, it may equally be computed as a logical operation, or the sum may be restricted to the values of i at which B(i, j_B) is 1. The result of expression (2) may also be normalized by the number of frames, I + J. The start frames of the two patterns, and likewise their end frames, are assumed to be made to correspond.
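Expressions (1) and (2) can be sketched directly in Python. The function below accumulates the similarity over a monotone path with the start and end frames forced to correspond; the array shapes, names, and the use of numpy are illustrative assumptions, not part of the patent.

```python
import numpy as np

def cumulative_similarity(A, B):
    """DP alignment of two binary spectro-temporal patterns.

    A: dictionary pattern, shape (I, JA), binarized band energies (0/1).
    B: input pattern, shape (I, JB), binarized and thinned (0/1).
    Returns the maximum cumulative similarity R(JA, JB), with the start
    and end frames of the two patterns forced to correspond.
    """
    I, JA = A.shape
    _, JB = B.shape
    # r(jA, jB) = sum_i A(i, jA) * B(i, jB)   ... expression (1)
    r = A.T @ B                        # shape (JA, JB)
    R = np.full((JA, JB), -np.inf)
    R[0, 0] = r[0, 0]                  # start frames correspond
    for ja in range(JA):
        for jb in range(JB):
            if ja == 0 and jb == 0:
                continue
            # R = r + max{R(jA, jB-1), R(jA-1, jB-1), R(jA-1, jB)}  ... (2)
            best = max(
                R[ja, jb - 1] if jb > 0 else -np.inf,
                R[ja - 1, jb - 1] if ja > 0 and jb > 0 else -np.inf,
                R[ja - 1, jb] if ja > 0 else -np.inf,
            )
            R[ja, jb] = r[ja, jb] + best
    return R[JA - 1, JB - 1]           # end frames correspond
```

As the text notes, the returned score could further be divided by the number of frames for normalization before comparing candidates of different lengths.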

Fig. 4 is a block diagram showing one embodiment of the present invention constructed according to the above operating principle. In the figure, 1 is a microphone, 2 a filter bank, 3 a speech interval detection section, 4 a binarization section, 5 a switch, 6 a dictionary section, 7 a thinning section, 8 a similarity calculation section, 9 a j_A, j_B updating section, 10 a similarity detection section, 11 an R calculation section, 12 a stepping section that advances j_A or j_B by one step, 13 a section that finds the maximum of R, and 14 a recognition result output section. According to the present invention, pattern a has width in the frequency axis direction while pattern b is narrow, so even when the frequency varies with the speaker and pattern b therefore shifts along the frequency axis, the variation is absorbed as long as b does not stray outside the width of pattern a.

Fig. 5 shows another embodiment of the present invention. Here a peak detection section 15 is provided in front of the dictionary section 6; it detects the peaks along the frequency axis of the speech feature pattern, and the resulting peak pattern is registered in the dictionary. When a pattern to be matched is input, it is binarized to 0 or 1 with a certain threshold (the parts that become 1 are called the feature distribution), and the similarity between it and the dictionary pattern is maximized by time warping under dynamic programming according to expression (2) above.
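As a sketch of the kind of peak picking the peak detection section 15 might perform, the following marks strict local maxima of band power along the frequency axis within each frame. The frame layout and the strict-maximum rule are assumptions for illustration; the patent does not specify the peak criterion.

```python
import numpy as np

def peak_pattern(power):
    """Mark local maxima of band power along the frequency axis.

    power: shape (I, J) array of band powers (frequency bands x frames).
    Returns a binary (I, J) pattern with 1 at each within-frame peak,
    i.e. the time series of spectral peaks registered in the dictionary.
    """
    I, J = power.shape
    peaks = np.zeros((I, J), dtype=int)
    for j in range(J):
        col = power[:, j]
        for i in range(I):
            left = col[i - 1] if i > 0 else -np.inf
            right = col[i + 1] if i < I - 1 else -np.inf
            if col[i] > left and col[i] > right:
                peaks[i, j] = 1  # strict local maximum in this frame
    return peaks
```

A pattern produced this way is narrow along the frequency axis, like the thinned pattern b, and so can be matched against a binarized dictionary pattern that retains its frequency width.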

Effects

As is clear from the above description, the present invention provides a highly accurate speech pattern comparison method that can absorb both time variation and frequency variation with a small amount of computation.

BRIEF DESCRIPTION OF THE DRAWINGS

Figs. 1 and 2 are diagrams for explaining the DP matching method, Fig. 3 is a diagram for explaining the principle of the present invention, and Figs. 4 and 5 are block diagrams for explaining embodiments of the present invention.

1: microphone; 2: filter bank; 3: speech interval detection section; 4: binarization section; 5: switch; 6: dictionary section; 7: thinning section; 8: similarity calculation section; 9: j_A, j_B updating section; 10: maximum similarity calculation section; 11: R calculation section; 12: j_A (j_B) stepping section; 13: maximum-of-R calculation section; 14: result output section; 15: peak detection section.

Claims (1)

[Claims]

1. A speech pattern comparison method for comparing speech patterns each consisting of a time series of feature values obtained by frequency analysis of speech, characterized in that the result of the frequency analysis is binarized to obtain a feature distribution, the feature distribution is thinned in the frequency axis direction to form a speech pattern, the thinned pattern is used as at least one of the speech patterns to be compared, and the comparison is performed after aligning both speech patterns in the time axis direction by dynamic programming.

2. A speech pattern comparison method for comparing speech patterns each consisting of a time series of feature values obtained by frequency analysis of speech, characterized in that the peaks of the band powers obtained as a result of the frequency analysis are detected, the speech pattern showing those peaks as a time series is used as at least one of the speech patterns to be compared, and the comparison is performed after aligning both speech patterns in the time axis direction by dynamic programming.
JP9475083A 1983-05-27 1983-05-27 Voice pattern collator Granted JPS59219800A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP9475083A JPS59219800A (en) 1983-05-27 1983-05-27 Voice pattern collator

Publications (2)

Publication Number Publication Date
JPS59219800A JPS59219800A (en) 1984-12-11
JPH0527120B2 (en) 1993-04-20

Family

ID=14118797

Family Applications (1)

Application Number Title Priority Date Filing Date
JP9475083A Granted JPS59219800A (en) 1983-05-27 1983-05-27 Voice pattern collator

Country Status (1)

Country Link
JP (1) JPS59219800A (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5313531A (en) * 1990-11-05 1994-05-17 International Business Machines Corporation Method and apparatus for speech analysis and speech recognition

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS5023941A (en) * 1973-07-02 1975-03-14

Also Published As

Publication number Publication date
JPS59219800A (en) 1984-12-11

Similar Documents

Publication Publication Date Title
JPS58130393A (en) Voice recognition equipment
JPH07104952B2 (en) Pattern matching device
Elenius et al. Effects of emphasizing transitional or stationary parts of the speech signal in a discrete utterance recognition system
JPH0527120B2 (en)
JP2997007B2 (en) Voice pattern matching method
Saha et al. Modified mel-frequency cepstral coefficient
Niederjohn et al. Computer recognition of the continuant phonemes in connected English speech
Tanaka A dynamic processing approach to phoneme recognition (part I)--Feature extraction
JP2557497B2 (en) How to identify male and female voices
JPS61260299A (en) Voice recognition equipment
JP2514983B2 (en) Voice recognition system
JPS61233791A (en) Voice section detection system for voice recognition equipment
JP2655637B2 (en) Voice pattern matching method
JP2996977B2 (en) Voice recognition device
JP2901976B2 (en) Pattern matching preliminary selection method
JPH07104675B2 (en) Speech recognition method
JPS63223698A (en) Monosyllable voice recognition equipment
Chaudhuri et al. Automatic Recognition of Isolated Spoken Words with New Features
JPS61203498A (en) Preselection system for voice recognition equipment
JPH0367279B2 (en)
Haus et al. Department of Electrical Engineering and Computer Science and Research Laboratory of Electronics Massachusetts Institute of Technology
JPS6229798B2 (en)
JPS6075895A (en) Voice pattern analogy calculation system
JPS61252595A (en) Voice recognition processing system
JPS60262198A (en) Consonant section detector