JPS6283798A

JPS6283798A - Continuous voice recognition equipment

Info

Publication number: JPS6283798A
Application number: JP60224549A
Authority: JP
Inventors: 英一坪香
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1985-10-08
Filing date: 1985-10-08
Publication date: 1987-04-17

Abstract

(57) [Abstract] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

DETAILED DESCRIPTION OF THE INVENTION

Field of Industrial Application

The present invention relates to a speech recognition device that identifies input speech by comparing an input pattern with a plurality of standard patterns, each represented as a sequence of feature vectors, and in particular to a speech recognition device applicable to the recognition of continuously uttered words.

Prior Art

Speech recognition devices based on speaker-dependent registration have already been put into practical use. In this approach, the speaker who intends to use the device first utters every word to be recognized; each utterance is converted into a sequence of feature vectors and registered in a word dictionary as a standard pattern. At recognition time, the uttered speech is likewise converted into a sequence of feature vectors, its closeness to each word in the dictionary is computed according to a predetermined rule, and the most similar word is output as the recognition result.

This method works well when the vocabulary is small, but as the vocabulary grows to hundreds or thousands of words, the following three problems can no longer be ignored.

(1) The burden on the speaker at registration time increases markedly.
(2) The time needed to compute the similarity or distance between the uttered speech and the standard patterns increases markedly, so the recognition device responds more slowly.
(3) The memory required for the word dictionary becomes very large.

One way to avoid these drawbacks is to make the unit of recognition a consonant + vowel monosyllable (hereinafter written CV, where C denotes a consonant and V a vowel). Monosyllables are registered as standard patterns in the form of feature vector sequences, and at recognition time the input speech, converted into a feature vector sequence, is matched against the monosyllable standard patterns and thereby converted into a sequence of monosyllables. Japanese has at most 101 monosyllables, and each corresponds to a kana character, so this method can convert (recognize) any Japanese word or sentence into a monosyllable string, and problems (1) to (3) above are all resolved.

One remaining problem, however, is segmentation, that is, dividing continuously uttered speech into monosyllable units. No definitive method for doing this reliably has yet been found. The current workaround is to utter each monosyllable separately, and some devices that work this way are in practical use.

Problems to be Solved by the Invention

Entering Japanese text by uttering monosyllables one at a time, however, is a strain on the speaker; it is desirable to be able to enter text by continuous utterance.

An object of the present invention is to provide a speech recognition device that solves the segmentation problem described above for speech entered by continuous utterance.

Means for Solving the Problems

The present invention is a continuous speech recognition device comprising: feature extraction means for converting an input signal into a sequence of feature vectors a_1, a_2, ..., a_i, ..., a_I; standard pattern storage means for storing standard patterns R^n (n = 1 to N), each consisting of a sequence of feature vectors b^n_1, b^n_2, ..., b^n_j, ..., b^n_{J^n}; segment marker input means for inputting a signal synchronized with the utterance of each syllable (hereinafter called a segment marker); DP matching means which, on a lattice graph with input frame i on the horizontal axis and standard pattern frame j on the vertical axis, performs DP matching with free start and end points, taking the standard pattern side as the basic axis, between the standard pattern R^n and a partial pattern of the input pattern whose start point lies near the position corresponding to each segment marker and whose end point is a frame within a predetermined range from that position; and syllable recognition means for finding, among the optimum values obtained for the standard patterns R^n by this DP matching, the optimum with respect to n, and taking that n as the syllable recognition result for the partial pattern of the input pattern.

Operation

With the above configuration, the feature extraction means converts the input signal into a sequence of feature vectors a_1, a_2, ..., a_i, ..., a_I, and the segment marker input means inputs a segment marker, a signal synchronized with the utterance of each syllable. On a lattice graph with input frame i on the horizontal axis and standard pattern frame j on the vertical axis, DP matching with free start and end points, taking the standard pattern side as the basic axis, is performed between a partial pattern of the input pattern, whose start point lies near the position corresponding to each segment marker and whose end point is a frame within a predetermined range from that position, and each standard pattern R^n (n = 1 to N) consisting of the feature vector sequence b^n_1, b^n_2, ..., b^n_j, ..., b^n_{J^n}. Among the optimum values obtained for the standard patterns R^n by this DP matching, the optimum with respect to n is then found, and that n is taken as the syllable recognition result for the partial pattern of the input pattern.

Embodiment

One approach to speech recognition is known as word spotting: given continuous speech produced by uttering basic recognition units such as words or syllables in succession, it finds the candidate intervals in which each basic unit may be present. The present invention is based on word spotting, so its principle is explained first.

FIG. 1b shows an example of the path constraints used in DP matching for word spotting. In DP matching, the input pattern is a sequence of feature vectors a_1, a_2, ..., a_i, ..., a_I and the standard pattern R^n is a sequence of feature vectors b^n_1, b^n_2, ..., b^n_j, ..., b^n_{J^n}. Dynamic programming (DP) is used to find the correspondence between the feature vectors of the two sequences that maximizes (or minimizes) the weighted sum of the similarities (or distances) between corresponding feature vectors, and the resulting value is taken as the cumulative similarity (or cumulative distance) between the two sequences.

Graphically, consider a lattice graph with the feature vectors of the input pattern on the horizontal axis and those of the standard pattern on the vertical axis. Each lattice point represents a correspondence between one feature vector from each pattern, so the maximization (minimization) problem becomes that of finding the sequence of lattice points, that is, the path, that maximizes (or minimizes) the weighted sum of the similarities (or distances) between the sequences. To avoid extreme correspondences, constraints are placed on how the path may be chosen. FIG. 1b is one example: with i the coordinate of the input pattern and j that of the standard pattern on the lattice graph, the path reaching point (i, j) must pass through one of the points (i-2, j-1), (i-1, j-1), or (i-1, j-2). The number attached to each path segment in the figure is the weighting coefficient applied when that segment is chosen; the figure is one example of how the weights are set for word spotting.
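The optimization described in the preceding passage can be written compactly as follows. This is only a restatement for the distance case; the path length K, the index functions i(k), j(k), and the weights w(k) are notation introduced here for illustration and do not appear in the original text.

```latex
% Cumulative distance between input pattern A and standard pattern R^n:
% minimize the weighted sum of local distances over all admissible paths.
D(A, R^n) \;=\; \min_{\{(i(k),\, j(k))\}_{k=1}^{K}} \;
  \sum_{k=1}^{K} w(k)\, d^n\bigl(i(k),\, j(k)\bigr),
\qquad
d^n(i, j) = \text{distance between } a_i \text{ and } b^n_j .

% Admissible paths obey the local constraint of FIG. 1b:
% (i(k-1), j(k-1)) \in \{ (i(k)-2,\, j(k)-1),\; (i(k)-1,\, j(k)-1),\; (i(k)-1,\, j(k)-2) \}.
```

For the similarity case, the minimum is replaced by a maximum.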

[Equation (1): recurrence for the cumulative distance D^n(i, j); not reproduced in the source text] ... (1)

Here d^n(i, j) is the distance between feature vector a_i and feature vector b^n_j; in the simplest case d^n(i, j) = |a_i - b^n_j|. When similarity is used instead, min is replaced by max. If equation (1) is evaluated with the initial value D^n(i, 1) = d^n(i, 1), then D^n(i, j) is the minimum over m of the minimum cumulative distance between the partial pattern A(m, i) of the input pattern, running from frame m to frame i, and the partial pattern R^n(1, j) of the standard pattern R^n, running from frame 1 to frame j. To recover this m, a back pointer B^n(i, j) is computed and stored each time D^n(i, j) is determined in equation (1); the optimum is then m = B^n(i, j).

With the weighting coefficients set in this way, the sum of the weights in matching R^n(1, j) against any section of the input pattern is constant. For example, paths 301 and 302 in FIG. 6 have the same weight sum, so deciding whether partial pattern 303 or partial pattern 304 of the input is more similar to standard pattern R^n requires only a comparison of the cumulative distances along the two paths, regardless of the lengths of the input sections.

As is clear from FIG. 1b and equation (1), computing the cumulative distance up to point (i, j) requires only the cumulative distances at points (i-2, j-1), (i-1, j-1), and (i-1, j-2). In practice, therefore, equation (1) is evaluated for j = 1 to J^n at each frame i = 1 to I in turn; if the minimum of D^n(i, J^n) over i is D^n(i*, J^n), the partial pattern of the input closest to standard pattern R^n is obtained as A(B^n(i*, J^n), i*).
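The word spotting procedure just described can be sketched in Python. Because the recurrence of equation (1) is not legible in this text, the weights below are an assumption: each move adds the local distance multiplied by the number of standard pattern frames it advances (1, 1, 2), which keeps the weight sum equal to the standard pattern length for every path, as the passage requires. The function name `spot` and the 0-based frame indices are also illustrative choices.

```python
import math

def spot(inp, ref, dist):
    """Start- and end-free DP matching of `ref` against every sub-span of `inp`.

    Returns (best_distance, start_frame, end_frame), 0-based inclusive frames.
    """
    I, J = len(inp), len(ref)
    INF = math.inf
    D = [[INF] * J for _ in range(I)]   # cumulative distance D(i, j)
    B = [[-1] * J for _ in range(I)]    # back pointer: input frame where the path began
    for i in range(I):
        for j in range(J):
            d = dist(inp[i], ref[j])
            if j == 0:
                # Free start: a path may begin at any input frame.
                D[i][0], B[i][0] = d, i
                continue
            # Predecessors allowed by the path constraint; weights are ASSUMED.
            cands = [(i - 2, j - 1, 1), (i - 1, j - 1, 1), (i - 1, j - 2, 2)]
            for pi, pj, w in cands:
                if pi < 0 or pj < 0:
                    continue
                c = D[pi][pj] + w * d
                if c < D[i][j]:
                    D[i][j], B[i][j] = c, B[pi][pj]
    # Free end: pick the input frame whose path through the last reference frame is cheapest.
    end = min(range(I), key=lambda i: D[i][J - 1])
    return D[end][J - 1], B[end][J - 1], end
```

With scalar "feature vectors" and absolute difference as the distance, `spot([0, 0, 1, 2, 3, 0, 0], [1, 2, 3], lambda a, b: abs(a - b))` locates the reference in input frames 2 to 4.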

The present invention aims to raise recognition accuracy by using reasonably reliable information to restrict the candidate intervals of the input partial patterns that are matched against the basic recognition units. In this embodiment the basic recognition unit is the syllable, and the candidate interval of each syllable is restricted on the basis of segment markers entered in synchrony with the utterance of the syllables.

FIG. 3 is a time chart showing how the speech power varies when "yokohama" (よこはま) is uttered, together with the timing of the segment markers b entered manually with a key or similar. In practice, too, it has been observed that the onsets of continuously uttered syllables agree quite well with the positions of hand-entered segment markers.

The principle of the invention is as follows. The start of each syllable lies near its segment marker, and if the syllable standard pattern covers the span from the onset of the syllable to the steady-state part of the vowel, the end of each syllable can be assumed to lie between that marker and the next one. The word spotting range is therefore restricted to this span, and DP matching with free start and end points is carried out within it. The syllable standard pattern is limited to the span from the onset to the steady-state part of the vowel because the information in a syllable can be divided into three parts: the consonant, the transition from consonant to vowel, and the vowel. The vowel has the longest duration, so if it were taken whole into the standard pattern it would carry too much weight relative to the other two parts, which is undesirable for recognition.
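As a rough illustration of how the segment markers restrict the search range, the sketch below derives one frame window per marker. The function name `match_windows`, the clipping at the utterance boundaries, and the exact form of the start limit (marker minus r1) are assumptions made for illustration; the text only states that the start lies near the marker and the end lies before the next marker, with preset tolerances.

```python
def match_windows(markers, n_frames, r1, r2):
    """Return one (s, t) frame window per segment marker.

    s: earliest frame at which the syllable may start (marker - r1, clipped at 0).
    t: latest frame at which it may end: marker + r2, but never past the next
       marker (or the last frame, for the final syllable).
    """
    windows = []
    for k, mk in enumerate(markers):
        nxt = markers[k + 1] if k + 1 < len(markers) else n_frames - 1
        s = max(0, mk - r1)
        t = min(mk + r2, nxt)
        windows.append((s, t))
    return windows
```

For four markers at frames 0, 20, 45, and 70 in a 100-frame utterance, with r1 = 3 and r2 = 30, every window ends at the following marker (or the last frame) because the markers are closer together than r2.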

FIG. 1a is a block diagram showing an embodiment of the present invention based on the above principle.

Reference numeral 1 denotes a feature extraction unit, composed of a filter bank or the like, which converts the input speech signal into a sequence A of feature vectors a_i.

4 is a standard pattern storage unit, in which the feature vector sequences R^n (n = 1 to N) described above, one for each monosyllable to be recognized, are registered in advance.

5 is an inter-vector distance calculation unit, which computes the distance d^n(i, j) between the feature vector a_i of the i-th frame of the input pattern and the feature vector b^n_j of the n-th standard pattern R^n.

3 is a standard pattern counter, which sets the standard pattern R^n to be matched, stepping n from 1 to N.

2 is a standard pattern frame counter, which points in turn to the frames j = 1 to J^n of standard pattern R^n.

8 is a segment information input unit, through which the segment information is entered manually, by pressing a key or the like, in synchrony with the syllables being spoken.

9 is a speech interval detection unit, which detects the speech interval by a well-known method from, for example, the magnitude of the input signal.

10 is an input frame counter; when the speech interval detection unit 9 detects that speech input has begun, the input frame counter 10 starts counting frame by frame.

7 is a matching start/end frame determination unit, which, from the output of the input frame counter 10 and the output of the segment information input unit 8, determines as explained above the section of the input pattern to be matched against standard pattern R^n.

6 is an intra-segment counter, which, in each segment, counts the frames from the matching start frame to the matching end frame determined by the matching start/end frame determination unit 7 and points to each frame of the input pattern within the segment.

The inter-vector distance calculation unit 5 computes the inter-vector distance between the input pattern frame designated by the intra-segment counter 6 and the standard pattern frame designated by the standard pattern counter 3 and the standard pattern frame counter 2.

11 is an inter-vector distance storage unit, which temporarily stores the inter-vector distances computed by the inter-vector distance calculation unit 5 until they are no longer needed.

13 is a cumulative distance calculation unit, which computes the cumulative distance according to equation (1).

12 is a cumulative distance storage unit, which temporarily stores the cumulative distances computed by the cumulative distance calculation unit 13 until they are no longer needed.

14 is a minimum value determination unit, which, from the cumulative distances stored in the cumulative distance storage unit 12, determines for each segment the minimum, over the frames in that segment, of the cumulative distance for standard pattern R^n.

15 is a syllable determination unit, which, from the minimum cumulative distances determined by the minimum value determination unit 14 for each standard pattern R^n, further finds the minimum with respect to n; with n* the minimizing n, n* is taken as the syllable recognition result for the segment.

FIG. 2 is a flow diagram explaining the detailed processing procedure of the embodiment. A software implementation can of course follow the same procedure.

In the figure, the DO ... ENDDO notation means that Y is executed while X is satisfied, and the IF ... ELSE ... ENDIF notation means that Y is executed if condition X is satisfied and Z is executed otherwise.

In addition, the notation for the cumulative distances, inter-vector distances, and so on is changed as follows in order to save memory for intermediate results. When the paths shown in FIG. 1b are used, computing the cumulative distances for the i-th frame of the input pattern requires only the cumulative distances of stages i-1 and i-2 and the inter-vector distances of stage i-1. The recurrence can therefore be computed if three input frames' worth of D^n(i, j) and two input frames' worth of d^n(i, j) are stored.

Furthermore, the cumulative distances can be computed in the order j = J^n down to 1. In that case the value D^n(i-2, j-1) used in computing D^n(i, j) is never used again, so the newly computed D^n(i, j) can be stored in the location that held D^n(i-2, j), and the memory required for each of D^n(i, j) and d^n(i, j) is two input frames' worth.

Finally, if the computation for one segment is carried out for one standard pattern n at a time, the storage for D^n(i, j) and d^n(i, j) can be shared across all n. They can therefore be written D(m, j) and d(m, j) (where m = 0, 1 and j = 1 to J^n), which saves a large amount of memory.
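The two-slot storage scheme can be sketched as follows. This is an illustration under assumptions, not the procedure of FIG. 2: the move weights (1, 1, 2) are assumed because equation (1) is not legible here, and with those assumed weights only the current frame's local distances are needed, so the separate two-frame buffer for d is omitted. The name `spot_rolling` is hypothetical.

```python
import math

def spot_rolling(inp, ref, dist):
    """Start/end-free DP using only two columns of cumulative distances.

    Returns (best_distance, start_frame, end_frame), 0-based inclusive frames.
    """
    J = len(ref)
    INF = math.inf
    D = [[INF] * J, [INF] * J]   # D[m][j], slots m = 0, 1 used cyclically
    B = [[-1] * J, [-1] * J]     # start-frame back pointers, same layout
    best = (INF, -1, -1)
    for i, a in enumerate(inp):
        m = i % 2                # slot holding frame i-2; overwritten with frame i
        p = 1 - m                # slot holding frame i-1
        # Descending j: D[m][j-1] still holds the frame i-2 value when it is read.
        for j in range(J - 1, -1, -1):
            d = dist(a, ref[j])
            if j == 0:
                D[m][0], B[m][0] = d, i          # free start at any input frame
                continue
            cands = [(D[m][j - 1] + d, B[m][j - 1]),   # from (i-2, j-1)
                     (D[p][j - 1] + d, B[p][j - 1])]   # from (i-1, j-1)
            if j >= 2:
                cands.append((D[p][j - 2] + 2 * d, B[p][j - 2]))  # from (i-1, j-2)
            D[m][j], B[m][j] = min(cands, key=lambda c: c[0])
        if D[m][J - 1] < best[0]:                # free end: track the best end frame
            best = (D[m][J - 1], B[m][J - 1], i)
    return best
```

The key point is the loop direction: because j runs downward, writing D(i, j) into the slot of D(i-2, j) never destroys a value that is still needed.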

In step 101, k is the number of a segment marker in the order in which the markers were entered; k = 0 denotes the start of the utterance.

Step 102 sets the matching start frame s for the k-th segment. i_k is the input frame number of the k-th segment marker, and r_1, which is set in advance, defines the range over which the start point is free when the matching for the k-th segment is computed.

Step 103 sets the matching end frame t for the k-th segment. r_2, also set in advance, defines the range over which the end point is free when the matching for the k-th segment is computed. t is taken to be the smaller of i_{k-1} + r_2 and i_k.

Step 104 initializes to infinity the memory that holds the minimum, over n = 1 to N, of the distances found in the range s to t between each standard pattern R^n and the input partial pattern that best matches it.

Step 105 performs the matching computation between the input pattern and standard pattern R^n over the interval s to t.

Step 106 initializes to infinity the memory that holds the minimum distance found so far, up to the current frame of the current segment, in the search over the range s to t for the input partial pattern that best matches standard pattern R^n.

Step 107 initializes the cumulative distances to infinity before the matching computation for the segment begins.

Step 108 determines which memory is used first in each segment, since the memories that hold the cumulative distances, back pointers, and inter-vector distances are used cyclically.

Step 109 finds, in the range s to t, the input partial pattern that best matches standard pattern R^n and places the resulting value in memory.

Step 110 switches the memories so that those holding the cumulative distances and inter-vector distances are used cyclically; the operation shown there denotes the negation of m.

Step 111 computes, for j = 1 to J^n, the inter-vector distance between input frame a_i and frame b^n_j of standard pattern R^n. This embodiment uses the so-called city-block distance.

Step 112 performs the computation corresponding to equation (1) for j = J^n down to 3.

Step 113 computes the cumulative distance for the case j = 2. The paths in this case are as shown in FIG. 4.

Step 114 computes the cumulative distance for the case j = 1.

Step 115 stores in memory the minimum cumulative distance for standard pattern R^n over frames s to t.

Step 116 normalizes that minimum by the number of frames of standard pattern R^n and stores the result.

Step 118 finds the minimum over n of these normalized values and stores the minimizing n as N(k). N(k) is thus the standard pattern closest to the partial pattern in the k-th segment of the input.

From the N(k) obtained in this way, the input is recognized as the syllable string N(1), N(2), ..., N(K).
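The overall procedure (city-block local distance, start/end-free matching inside each segment window, normalization by standard pattern length, and selection of the best n) can be sketched end to end as below. It is a simplified illustration rather than a transcription of FIG. 2: the move weights (1, 1, 2) are assumed, back pointers are omitted because only the winning syllable is reported, and the names `city_block`, `best_match`, `recognize`, and the dictionary of templates are hypothetical.

```python
import math

def city_block(a, b):
    """City-block (L1) distance between two feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))

def best_match(frames, ref):
    """Minimum cumulative distance of `ref` against any sub-span of `frames`."""
    J, INF = len(ref), math.inf
    prev2, prev1 = [INF] * J, [INF] * J      # D(i-2, .), D(i-1, .)
    best = INF
    for a in frames:
        cur = [INF] * J
        for j in range(J):
            d = city_block(a, ref[j])
            if j == 0:
                cur[0] = d                   # free start
                continue
            c = min(prev2[j - 1], prev1[j - 1]) + d      # moves from (i-2,j-1), (i-1,j-1)
            if j >= 2:
                c = min(c, prev1[j - 2] + 2 * d)         # move from (i-1,j-2), assumed weight 2
            cur[j] = c
        best = min(best, cur[J - 1])         # free end
        prev2, prev1 = prev1, cur
    return best

def recognize(frames, windows, templates):
    """templates: dict mapping syllable -> list of feature vectors.

    Returns the recognized syllable for each (s, t) window, in order.
    """
    result = []
    for s, t in windows:
        segment = frames[s:t + 1]
        # Normalize by standard pattern length so templates of different length compare fairly.
        scores = {n: best_match(segment, ref) / len(ref) for n, ref in templates.items()}
        result.append(min(scores, key=scores.get))
    return result
```

A toy run with two-dimensional feature vectors and two invented templates:

```python
templates = {"ka": [[1, 0], [2, 0], [3, 0]], "ma": [[0, 1], [0, 2]]}
frames = [[0, 0], [1, 0], [2, 0], [3, 0], [0, 0], [0, 1], [0, 2], [0, 0]]
print(recognize(frames, [(0, 4), (4, 7)], templates))
```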

Effects of the Invention

As described above, in the present invention a segment marker is entered manually for each syllable while syllables are uttered continuously, and DP matching with free start and end points is performed by applying the word spotting approach around the marker positions. This not only eliminates segmentation errors but also makes highly accurate segmentation possible.

[Brief explanation of drawings]

FIG. 1 is a block diagram showing an embodiment of the present invention; FIG. 2 is a flow diagram explaining in detail the processing procedure of that embodiment; FIG. 3 is a diagram showing a concrete example used to explain the invention; FIG. 4 is an explanatory diagram of the exception handling in the DP matching of this embodiment; and FIG. 6 is an explanatory diagram of the property of the DP matching used in the present invention.

1: feature extraction unit; 2: standard pattern frame counter; 3: standard pattern counter; 4: standard pattern storage unit; 5: inter-vector distance calculation unit; 6: intra-segment counter; 7: matching start/end frame determination unit; 8: segment information input unit; 9: speech interval detection unit; 10: input frame counter; 11: inter-vector distance storage unit; 12: cumulative distance storage unit; 13: minimum value determination unit; 14: syllable string determination unit.

Claims

[Claims]

A continuous speech recognition device comprising: feature extraction means for converting an input signal into a sequence of feature vectors a_1, a_2, ..., a_i, ..., a_I; standard pattern storage means for storing standard patterns R^n (where n = 1 to N), each consisting of a sequence of feature vectors b^n_1, b^n_2, ..., b^n_j, ..., b^n_{J^n}; segment marker input means for inputting a signal synchronized with the utterance of each syllable (hereinafter referred to as a segment marker); DP matching means which, on a lattice graph with input frame i on the horizontal axis and standard pattern frame j on the vertical axis, performs DP matching with free start and end points, taking the standard pattern side as the basic axis, between the standard pattern R^n and a partial pattern of the input pattern whose start point lies near the position corresponding to each segment marker of the input signal and whose end point is a frame within a predetermined range from that position; and syllable recognition means for finding, among the optimum values obtained for the standard patterns R^n as a result of this DP matching, the optimum value with respect to n, and taking that n as the syllable recognition result for the partial pattern of the input pattern.