JPS58176699A

JPS58176699A - Audio standard pattern registration method

Info

Publication number: JPS58176699A
Application number: JP57058113A
Authority: JP
Inventors: 吉明北爪; 遠藤　武之; 目瀬　道弘; 栄二大平; 利一安江
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1982-04-09
Filing date: 1982-04-09
Publication date: 1983-10-17

Abstract

(57) [Abstract] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

[Detailed Description of the Invention]
The present invention relates to speech recognition devices and, in particular, to a standard pattern registration method suitable for registering the standard patterns that a speaker-dependent speech recognition device uses to recognize a large vocabulary and, ultimately, an arbitrary vocabulary.

Conventional large-vocabulary recognition uses monosyllables as standard patterns and recognizes discretely uttered speech. Because this approach does not take coarticulation (the influence of the preceding and following phonemes) into account, the recognition rate it achieves is low. One way to raise the recognition rate is to register VCV (vowel-consonant-vowel) units, which do account for coarticulation, as standard patterns. The problem with this approach is that a large number of VCV standard patterns must be registered, which takes a great deal of time. In speaker-dependent recognition in particular, every user must utter the VCV patterns personally. Uttering a large number of meaningless VCV patterns is naturally tiring, and because the speaker cannot settle into a natural rhythm, individual VCV patterns may not be uttered clearly.
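As an illustration of what a VCV inventory covers, the sketch below splits a romanized phoneme string into overlapping vowel-to-vowel units. It is not part of the patent: it assumes one letter per phoneme and the five vowels A, I, U, E, O, and the function name `vcv_units` is hypothetical.

```python
VOWELS = frozenset("AIUEO")  # assumed vowel inventory (Japanese)


def vcv_units(phonemes: str) -> list[str]:
    """Split a romanized phoneme string into overlapping VCV (or VV) units.

    Each unit runs from one vowel to the next vowel, inclusive, so adjacent
    units share their boundary vowel. Assumes one character per phoneme.
    """
    vowel_positions = [i for i, p in enumerate(phonemes) if p in VOWELS]
    return [phonemes[a:b + 1] for a, b in zip(vowel_positions, vowel_positions[1:])]


print(vcv_units("AKANEIRO"))  # ['AKA', 'ANE', 'EI', 'IRO']
```

For /AKANEIRO/ it yields /AKA/, /ANE/, /EI/, and /IRO/; adjacent units share their boundary vowel.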

The present invention was therefore made to solve the above problems. Its object is to provide a method for the case where VCV patterns, currently considered the best standard patterns for realizing large-vocabulary recognition, are used, particularly in speaker-dependent recognition: the user utters meaningful words or sentences, and the method automatically cuts the VCV patterns out of that speech and registers them as standard patterns.

The present invention exploits the ability of a continuous nonlinear or linear matching circuit to "segment each region of an input pattern by matching it against standard patterns," and uses that ability to create and register standard patterns. This resolves the problems caused by having to utter meaningless sounds (psychological fatigue, patterns not being uttered clearly, and so on): the speaker need only utter natural, meaningful speech such as words or sentences.

The present invention is described in detail below on the basis of an embodiment.

FIG. 1 shows the block configuration of a speech recognition device in which standard pattern registration according to the present invention is performed, FIG. 2 shows the principle of known continuous nonlinear matching, and FIG. 3 shows an example of standard pattern creation according to the present invention.

As can be seen from FIGS. 1 to 3, to register standard patterns the user utters words or sentences in a predetermined order. When, for example, the utterance /AKANEIRO/ shown in FIG. 3(1) is input, it is digitized by the speech input unit 1 of FIG. 1.

The digital signal a is frequency-analyzed by the feature extraction unit 2.

The analysis result b (for example, filter bank output values) is stored in the feature parameter memory 3. As described later, the parameters stored in this memory are transferred to the standard pattern memory 6.
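The filter bank is named only as an example, and no channel count, frame length, or sampling rate is given. The following minimal sketch assumes 64-sample Hann-windowed frames with a 32-sample hop and 8 equal-width frequency bands, and computes log band energies from a direct DFT. The function names are hypothetical, and a practical analyzer would typically use an FFT or dedicated band-pass filters instead.

```python
import math


def band_energies(frame, n_bands):
    """Log energy in n_bands equal-width bands of one frame's power spectrum."""
    n = len(frame)
    power = []
    for k in range(n // 2 + 1):  # DFT bins from 0 Hz up to the Nyquist frequency
        re = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(frame))
        im = -sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(frame))
        power.append(re * re + im * im)
    edges = [round(b * len(power) / n_bands) for b in range(n_bands + 1)]
    return [math.log(sum(power[edges[b]:edges[b + 1]]) + 1e-10) for b in range(n_bands)]


def filter_bank_features(signal, frame_len=64, hop=32, n_bands=8):
    """Return one vector of n_bands log band energies per Hann-windowed frame."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / (frame_len - 1)) for i in range(frame_len)]
    features = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [s * w for s, w in zip(signal[start:start + frame_len], window)]
        features.append(band_energies(frame, n_bands))
    return features


sr = 8000  # assumed sampling rate for this demo signal
tone = [math.sin(2 * math.pi * 250 * n / sr) for n in range(640)]
feats = filter_bank_features(tone)
print(len(feats), len(feats[0]))  # 19 8
```

Each row of the result plays the role of one frame of the feature parameters b held in the feature parameter memory 3.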

The signal b is also input to the distance calculation unit 4, which computes its distance from the output of memory 5, where the reference patterns created offline in advance are stored.
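The distance measure used by the distance calculation unit 4 is not stated. The sketch below assumes a Euclidean distance between feature vectors and builds the table d(t, τ) for one reference pattern, where t indexes input frames and τ indexes reference frames; `distance_table` is a hypothetical name.

```python
import math


def distance_table(inputs, reference):
    """Return d[t][tau]: distance from input frame t to reference frame tau.

    Euclidean distance between feature vectors is an assumption; another
    frame distance (e.g. city-block) could be substituted.
    """
    return [[math.dist(x, r) for r in reference] for x in inputs]


d = distance_table([[0.0, 0.0], [3.0, 4.0]], [[0.0, 0.0], [0.0, 4.0]])
print(d)  # [[0.0, 4.0], [5.0, 3.0]]
```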

In this embodiment, V*V patterns (vowel pairs) are used as the reference patterns.

In this way, for the utterance input as in FIG. 3(1), the distance to each reference pattern (A' = A*A, B' = A*E, C' = EI) is computed at every point of the input frame sequence, as shown in FIG. 3(2).
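The V*V notation can be read as fixing the two vowels and leaving the material between them open, so /AKA/ is found with A*A and /ANE/ with A*E, while a vowel-vowel unit such as /EI/ serves as its own reference. A minimal sketch of that labeling, assuming one letter per phoneme and the vowels A, I, U, E, O (the helper `reference_label` is hypothetical):

```python
VOWELS = frozenset("AIUEO")  # assumed vowel inventory


def reference_label(unit: str) -> str:
    """Map a VCV or VV unit to the label of the V*V reference used to find it.

    A unit with material between its vowels (e.g. 'AKA') maps to 'A*A';
    a vowel-vowel unit (e.g. 'EI') is its own reference.
    """
    if len(unit) < 2 or unit[0] not in VOWELS or unit[-1] not in VOWELS:
        raise ValueError(f"not a VCV or VV unit: {unit!r}")
    return unit if len(unit) == 2 else f"{unit[0]}*{unit[-1]}"


print([reference_label(u) for u in ["AKA", "ANE", "EI"]])  # ['A*A', 'A*E', 'EI']
```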

That is, at each input frame, the distance calculation unit 4 of FIG. 1 computes the distance to every frame of every reference pattern and sends the results to the known continuous nonlinear matching unit 7.

As shown in FIG. 2(2), the continuous nonlinear matching unit 7 uses the distances d received from the distance calculation unit 4 to perform an overall matching and determines which reference pattern matches which part of the input speech. FIG. 2(1) shows the result of matching against reference pattern B'.

As shown in FIG. 2(3), the input pattern is taken to be the sequence A, B, C, ..., where, for example, A is /AKA/, B is /ANE/, and C is /EI/.

Accordingly, for speech uttered as A B C ..., the continuous nonlinear matching unit 7 takes as input the distances to the reference patterns A', B', C', ..., that is, the distances d(t, τ) of FIG. 2(2), where t indexes input frames and τ indexes reference-pattern frames. It evaluates recurrence formula (1) to determine the position of each reference pattern within the input frames.

Formula (1) realizes the path selection shown in FIG. 2: it selects the optimal path from the current distance d(t, τ) and the past matching values. In practice the matching values must be normalized by path length, but that explanation is omitted here. The continuous nonlinear matching unit 7 works on the following principle: formula (1) is evaluated at every input frame using the distances d(t, τ) to all frames of all reference patterns, and the matching value is judged only at the last frame of each reference pattern, which reveals the input frame at which each reference pattern matches best. Details are given in Japanese Patent Application No. 55-158296. In FIG. 2, the optimal match with reference pattern B' occurs at time t, which determines pattern B as the segment starting at t − T0.

Based on this principle, the position of a reference pattern lying in any region of the input speech is determined.

We now return to FIGS. 1 and 3.

The utterance /AKANEIRO/ is matched against reference patterns such as /A*A/, /A*E/, and /EI/, and the regions occupied by /AKA/, /ANE/, /EI/, ... are determined.

The determination unit 8 then outputs a control signal f to the feature parameter memory 3, and the feature parameters /AKA/, /ANE/, and /EI/ are transferred from their addresses in memory 3 to predetermined addresses in the standard pattern memory 6.
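The transfer triggered by control signal f amounts to copying the frames of each detected segment from the feature parameter memory 3 into a predetermined slot of the standard pattern memory 6. The sketch below models both memories as Python containers and takes the detected segments as (label, start, end) tuples with inclusive frame indices; this data layout and the name `register_segments` are assumptions made for illustration.

```python
def register_segments(feature_memory, segments, standard_memory):
    """Copy each detected segment's feature frames into the standard pattern memory.

    feature_memory:  list of feature vectors, one per input frame.
    segments:        iterable of (label, start, end), end inclusive.
    standard_memory: dict mapping a unit label to its registered frames.
    """
    for label, start, end in segments:
        if not 0 <= start <= end < len(feature_memory):
            raise ValueError(f"segment {label!r} out of range: {start}..{end}")
        standard_memory[label] = [list(f) for f in feature_memory[start:end + 1]]
    return standard_memory


frames = [[0.1], [0.2], [0.3], [0.4], [0.5]]
memory = register_segments(frames, [("AKA", 0, 2), ("ANE", 2, 4)], {})
print(memory)  # {'AKA': [[0.1], [0.2], [0.3]], 'ANE': [[0.3], [0.4], [0.5]]}
```

Overlapping segments are allowed, since consecutive VCV units share a vowel.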

In this way, when the user utters words or sentences in a predetermined order, the determination unit specifies the corresponding predetermined reference patterns and continuous nonlinear matching is performed. The standard patterns contained in the input are thereby segmented and can be created directly, so the user no longer needs to utter meaningless patterns such as VCV units.

Although this embodiment uses V*V patterns as the reference, the standard patterns in the input speech can likewise be segmented by a vowel sequence using single vowels V alone. In that case, the continuous linear matching function is used.

Continuous nonlinear matching is used for units whose duration varies greatly, such as VCV (including V-V) units, while continuous linear matching is used for short patterns of few frames, such as vowels. As described with reference to FIG. 2, both functions determine which reference pattern is present in which part of the input speech.
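Continuous linear matching is not defined in detail here. One simple interpretation, sketched as an assumption, linearly stretches or compresses the reference to a few candidate lengths, slides each version along the input, and keeps the window with the smallest mean Euclidean frame distance. The length ratios (0.75, 1.0, 1.25) and the function names are hypothetical.

```python
import math


def linear_warp(reference, length):
    """Resample a reference to `length` frames by linear index mapping (nearest frame)."""
    n = len(reference)
    if length == 1:
        return [reference[0]]
    return [reference[round(i * (n - 1) / (length - 1))] for i in range(length)]


def continuous_linear_match(inputs, reference, ratios=(0.75, 1.0, 1.25)):
    """Slide linearly warped copies of `reference` over `inputs`.

    Returns (mean_distance, start, end) of the best window, end inclusive.
    """
    best = (math.inf, -1, -1)
    for ratio in ratios:
        length = max(1, round(len(reference) * ratio))
        warped = linear_warp(reference, length)
        for start in range(len(inputs) - length + 1):
            window = inputs[start:start + length]
            score = sum(math.dist(x, r) for x, r in zip(window, warped)) / length
            if score < best[0]:
                best = (score, start, start + length - 1)
    return best


print(continuous_linear_match([[9], [1], [2], [3], [4], [9]], [[1], [2], [3], [4]]))  # (0.0, 1, 4)
```

Unlike the DP sketch, every frame of a window is stretched by the same factor, which is adequate for short, steady patterns such as vowels.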

Once the VCV patterns have been registered as standard patterns, in the next phase they are stored on a floppy disk or similar medium. When the user reaches the recognition phase, the standard patterns are transferred from the floppy disk to the standard pattern memory, and in the system of FIG. 1 the contents of the standard pattern memory 6 are sent to the distance calculation unit 4. Offline creation of the reference patterns is important in this invention, but it poses no difficulty: there are at most 40 to 50 types of V*V or vowel patterns, so average patterns can easily be made from patterns collected from multiple speakers.
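How the speaker average is formed is not specified. A minimal sketch, assuming each speaker's token of a unit is a list of equal-dimension feature vectors, linearly time-normalizes every token to a common length (by default the rounded mean token length) and averages them frame by frame. The function names are hypothetical, and aligning tokens by DP matching would be a more careful alternative to linear normalization.

```python
def time_normalize(pattern, length):
    """Resample a pattern to `length` frames by nearest-frame linear index mapping."""
    n = len(pattern)
    if length == 1:
        return [list(pattern[0])]
    return [list(pattern[round(i * (n - 1) / (length - 1))]) for i in range(length)]


def average_pattern(tokens, length=None):
    """Average several speakers' tokens of one unit into a single reference pattern."""
    if not tokens:
        raise ValueError("need at least one token")
    if length is None:
        length = round(sum(len(t) for t in tokens) / len(tokens))  # mean token length
    aligned = [time_normalize(t, length) for t in tokens]
    dims = len(aligned[0][0])
    return [[sum(tok[f][k] for tok in aligned) / len(aligned) for k in range(dims)]
            for f in range(length)]


print(average_pattern([[[0.0], [2.0]], [[2.0], [4.0]]]))  # [[1.0], [3.0]]
```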

As explained above, according to the present invention, when VCV patterns, currently considered the best standard patterns for large-vocabulary recognition, are used in speaker-dependent recognition, the system creates the standard patterns automatically once the user utters meaningful words or sentences. This reduces the user's burden, which is a significant benefit.

[Brief explanation of drawings]

FIG. 1 shows the block configuration of a speech recognition device implementing the present invention, FIG. 2 shows the principle of continuous nonlinear matching, and FIG. 3 shows an example of standard pattern creation according to the present invention.
3: feature parameter memory; 5: reference pattern memory; 6: standard pattern memory.
Agent: Patent Attorney Toshiyuki Usuda

Claims

[Claims]

1. Continuous matching calculations are performed between a speech reference pattern created offline in advance and a series of feature parameters extracted from speech patterns uttered in a predetermined order so as to include the standard pattern to be registered, and the optimal matching interval is determined. A speech standard pattern registration method characterized by registering feature parameters in a section as a standard pattern in a memory each time it is detected.