JPH0277100A

JPH0277100A - Preliminary selecting device for speech recognition

Info

Publication number: JPH0277100A
Application number: JP63227584A
Authority: JP
Inventors: Makoto Okazaki (岡崎真); Koji Eto (江藤公二)
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1988-09-13
Filing date: 1988-09-13
Publication date: 1990-03-16

Abstract

PURPOSE: To enable effective preliminary selection by deleting, from a distance data storage means, the distance data corresponding to all feature patterns of the group to which the selected and output preliminary selection feature pattern belongs. CONSTITUTION: A preliminary selection means 4 selects and outputs the preliminary selection feature pattern with the shortest of the distances stored in a distance data temporary storage means 3. A link data storage means 5 stores in advance the relationship between each feature pattern and all feature patterns of the group to which it belongs. All feature patterns of the group to which the selected preliminary selection feature pattern belongs are output from the link data storage means 5 and input to the distance data storage means 3, which thereby deletes the distance data corresponding to all of those feature patterns. Consequently, only a single preliminary selection feature pattern is obtained for the contents of one input voice, and the range of the preliminary selection is never narrowed.

Description

[Detailed Description of the Invention]

[Summary]

This invention relates to an improvement of a preliminary selection device for speech recognition that, during recognition, preselects a small number of feature patterns from a speech dictionary holding a plurality of feature patterns for each input speech content. Its object is to provide a preliminary selection device capable of effective preliminary selection without narrowing the range of the preliminary selection. The device comprises: a preliminary selection feature pattern storage means that registers and stores in advance, for each input speech content, a group of a plurality of preliminary selection feature patterns; a distance calculation means that calculates the distance between an input feature pattern created from the input speech content and each preliminary selection feature pattern stored in the preliminary selection feature pattern storage means; a distance data temporary storage means that temporarily stores the distance data corresponding to each preliminary selection feature pattern; a preliminary selection means that selects and outputs the preliminary selection feature pattern showing the minimum distance among the distances stored in the distance data temporary storage means; and a link data storage means that stores in advance the relationship between each feature pattern and all feature patterns of the group to which that feature pattern belongs. All feature patterns of the group to which the selected preliminary selection feature pattern belongs are output from the link data storage means and input to the distance data storage means, so that the distance data corresponding to all feature patterns of that group are deleted from the distance data storage means.

[Industrial application field]

The present invention relates to an improvement of a preliminary selection device that, during speech recognition, preselects a small number of feature patterns from a speech dictionary holding a plurality of feature patterns for each input speech content.

In recent years, advances in speech recognition processing and in LSI technology have led to the development of many speech recognition devices, and applications such as man-machine interfaces are being considered.

For a speech recognition device to be used as a man-machine interface, it must meet requirements such as small size, low cost, and high performance. In particular, utterances of a word with the same content are not always constant and stable. Because of utterance-to-utterance fluctuation, differences between speakers, the speaker's mood, the environment, and so on, they may differ in temporal factors such as utterance duration and rhythm, in level factors such as loudness and accent, and in frequency-component factors such as intonation and shifts in formant position. Misrecognition must be avoided even in such cases.

To improve performance, therefore, a method has been proposed in which several utterances are used at registration time and a feature pattern is created and registered for each of them (the multi-template registration method). For example, for the same content "1", separate feature patterns are created and registered for a short pronunciation "ichi" and a long pronunciation "iichi". This reduces the chance of misrecognition caused by changes in speech rhythm when the input speech varies at recognition time.

With this registration method, however, several feature patterns are created for each input speech content, so the number of feature patterns registered in the speech dictionary grows. As a result, the amount of distance computation between the feature pattern created from the input speech and the feature patterns in the speech dictionary increases at recognition time. A responsive system requires fast computation, and such an increase in distance computation not only works against that need but also risks conflicting with the requirements for small size and low cost.

One way to solve this problem is the preliminary selection method: a simple distance is computed using only specific parameters of the feature patterns stored in the speech dictionary, a preliminary selection is made with this simple distance, and high-precision matching is then performed only among the preselected candidates.

FIG. 4 is a block diagram showing the system configuration of a speech recognition device including the preliminary selection method that forms the background of the present invention. The present invention relates to an improvement of the preliminary selection processing section 405 in the system configuration shown in FIG. 4.

[Prior Art]

FIG. 5 is a block diagram showing an example of a conventional preliminary selection device. In the conventional preliminary selection method shown in the figure, a simple distance calculation unit 503 for preliminary selection calculates a simple distance, using specific parameters, between the feature pattern of the input speech obtained from an input pattern register 501 and every feature pattern in a speech dictionary memory 502. A distance data temporary stack memory 505, a minimum distance determination unit 506, and a counter 507 then select the top N feature patterns in ascending order of this simple distance. A registered pattern number data memory 504 outputs pattern numbers to the dictionary memory 502. The pattern number obtained at the output of the minimum distance determination unit 506 is fed back to the distance data temporary stack memory 505, and that pattern number is deleted from the memory 505. When the counter 507 has counted N pattern numbers output from the minimum distance determination unit 506, it stops the output of the minimum distance determination unit 506. The outputs of the minimum distance determination unit 506 are stored in a memory 508 as the preliminary selection result.
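The conventional FIG. 5 loop can be sketched in Python as follows. This is a minimal illustration that assumes the simple distances have already been computed and are held in a dict keyed by two-digit pattern numbers, with the upper digit as the voice number, as in the embodiment below. The function name and data layout are illustrative only. Because the loop deletes only the selected pattern number, it can select several templates of the same word:

```python
def conventional_preselect(distances, n):
    """Conventional top-N preselection (sketch of the FIG. 5 loop).

    distances: dict mapping pattern number -> simple distance.
    Returns up to n pattern numbers in ascending order of distance.
    """
    stack = dict(distances)               # stands in for stack memory 505
    result = []
    while stack and len(result) < n:      # counter 507 stops after n outputs
        best = min(stack, key=stack.get)  # minimum distance determination 506
        result.append(best)
        del stack[best]                   # only the selected pattern is removed
    return result


# Hypothetical distances: upper digit = voice number, lower digit = template.
dists = {"11": 4.0, "12": 2.0, "13": 3.0, "21": 1.0, "22": 5.0, "23": 9.0}
print(conventional_preselect(dists, 3))  # ['21', '12', '13']
```

In this example, three selections cover only two words, because voice 1 appears twice. This is the narrowing that the invention addresses.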

However, when the speech dictionary memory 502 holds several feature patterns for one input speech content, as described above, applying the conventional preliminary selection method as is can preselect several feature patterns for the same speech content.

[Problem to be Solved by the Invention]

In such a case, even though N feature patterns have been preselected, they cover only N or fewer registered speech contents, and the range of the preliminary selection is narrowed.

The object of the present invention is to solve this problem and provide a preliminary selection device capable of effective preliminary selection without narrowing the range of the preliminary selection.

[Means to solve the problem]

FIG. 1 is a block diagram showing the principle of the present invention. As shown in the figure, the preliminary selection device for speech recognition according to the present invention comprises a preliminary selection feature pattern storage means 1, a distance calculation means 2, a distance data temporary storage means 3, a preliminary selection means 4, and a link data storage means 5.

The preliminary selection feature pattern storage means 1 registers and stores in advance, for each input speech content, a group consisting of a plurality of preliminary selection feature patterns.

The distance calculation means 2 calculates the distance between the input feature pattern created from the input speech content and each preliminary selection feature pattern stored in the preliminary selection feature pattern storage means 1.

The distance data temporary storage means 3 temporarily stores the distance data corresponding to each preliminary selection feature pattern.

The preliminary selection means 4 selects and outputs the preliminary selection feature pattern showing the minimum distance among the distances stored in the distance data temporary storage means 3.

The link data storage means 5 stores in advance the relationship between each feature pattern and all feature patterns of the group to which that feature pattern belongs.

All feature patterns of the group to which the selected preliminary selection feature pattern belongs are output from the link data storage means 5 and input to the distance data storage means 3, whereby the distance data corresponding to all feature patterns of that group are deleted from the distance data storage means 3.

[Operation]

The link data storage means 5 identifies the group to which the preliminary selection feature pattern selected and output by the preliminary selection means 4 belongs, and the distance data corresponding to the feature patterns so identified are deleted from the distance data temporary storage means 3. The output of the preliminary selection means 4 therefore contains only a single preliminary selection feature pattern for each input speech content, rather than several feature patterns for the same content as in the conventional method, so the range of the preliminary selection is not narrowed.
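The principle can be sketched in Python as follows. The sketch assumes that the distances and the link data are plain dicts, and `preselect_by_group` is a hypothetical name. The only change from a plain top-N loop is that, after each selection, every pattern in the selected pattern's group is removed from the working distance store:

```python
def preselect_by_group(distances, group_of, n):
    """Preselection with group deletion (sketch of the FIG. 1 principle).

    distances: dict pattern number -> distance (temporary storage means 3).
    group_of:  dict pattern number -> group id (link data, means 5).
    Returns up to n pattern numbers, at most one per group.
    """
    members = {}
    for p, g in group_of.items():
        members.setdefault(g, []).append(p)
    stack = dict(distances)
    result = []
    while stack and len(result) < n:
        best = min(stack, key=stack.get)   # preliminary selection means 4
        result.append(best)
        for p in members[group_of[best]]:  # every pattern of the same group
            stack.pop(p, None)             # delete its distance data
    return result


# Hypothetical data: three words (upper digit), three templates each.
dists = {"11": 4.0, "12": 2.0, "13": 3.0, "21": 1.0, "22": 5.0,
         "23": 9.0, "31": 7.0, "32": 6.0, "33": 8.0}
group_of = {p: p[0] for p in dists}
print(preselect_by_group(dists, group_of, 3))  # ['21', '12', '32']
```

If there are fewer groups than N, the loop stops early and returns one pattern per group.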

[Example]

FIG. 2 is a block diagram showing a preliminary selection device according to an embodiment of the present invention. In this figure, parts identical to those in FIG. 1 bear the same reference numerals. The preliminary selection feature pattern storage means 1 comprises a registered pattern number data memory 202 and a dictionary memory 203. The distance calculation means 2 consists of a simple distance calculation unit 204 for preliminary selection. The distance data temporary storage means 3 consists of a distance data temporary stack memory 205, and the preliminary selection means 4 consists of a minimum distance determination unit 206. The link data storage means 5 comprises a pattern number-voice number link data memory 209, a verification unit 210, and a re-verification unit 211.

The dictionary memory 203 holds a plurality of feature patterns and the corresponding pattern numbers.

The pattern number-voice number link data memory 209 stores link data that associate pattern numbers with voice numbers.

Unknown input speech produces a single input feature pattern at the output of the input pattern register 201. To preselect N feature patterns for it from those stored in the dictionary memory 203, the simple distance calculation unit 204 for preliminary selection calculates the distance between the input feature pattern and the feature pattern for every pattern number in the dictionary memory 203. The distance data temporary stack memory 205 temporarily stores each distance calculated by the unit 204, together with the pattern number of the feature pattern used in that calculation. The minimum distance determination unit 206 searches the distance data stored in the memory 205 for the smallest value and outputs the pattern number corresponding to that distance.

In the link data storage means 5, the verification unit 210 retrieves from the pattern number-voice number link data memory 209 the voice number corresponding to the pattern number output by the minimum distance determination unit 206, and inputs that voice number to the re-verification unit 211. The re-verification unit 211 looks up the one or more pattern numbers belonging to the input voice number and outputs them.

The distance data temporary stack memory 205 deletes, from its stored distance data, the data corresponding to the pattern numbers output by the re-verification unit 211.

When the counter 208 has counted N pattern numbers, it causes the minimum distance determination unit 206 to stop its output. The N pattern numbers are stored in order in the preliminary selection result memory 207.

In this way, no two of the N pattern numbers output by the minimum distance determination unit 206 correspond to the same voice number.

As described above, this embodiment uses the pattern number-voice number link data during preliminary selection. While N feature patterns are being preselected, the verification unit 210 and the re-verification unit 211 detect, for each pattern number output by the minimum distance determination unit 206, all pattern numbers that have the same voice number. The data stored in the distance data temporary stack memory 205 for the pattern numbers output by the re-verification unit 211 are then deleted.

Consequently, a pattern number output by the minimum distance determination unit 206 never has the same voice number as any pattern number output before it.

FIG. 3 is a block diagram showing the configuration of the main selection processing section 407 in the system shown in FIG. 4. As shown in the figure, the main selection processing section 407 comprises an input pattern register 301, a dictionary memory 302, a main selection distance calculation unit 303, a minimum distance calculation unit 304, a pattern number-voice number link data memory 305, and a verification unit 306.

The input pattern register 301 is identical to the input pattern register 201 and outputs one feature pattern for one input utterance. Based on the pattern numbers output from the preliminary selection result memory 207 shown in FIG. 2, the dictionary memory 302 outputs the N preselected feature patterns. The main selection distance calculation unit 303 calculates the distance between the feature pattern from the input pattern register 301 and each feature pattern from the dictionary memory 302. The minimum distance calculation unit 304 determines the minimum of these distances and outputs the corresponding pattern number. The verification unit 306 reads the voice number corresponding to that pattern number from the pattern number-voice number link data memory 305 and outputs it.
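The FIG. 3 flow can be sketched roughly in Python as follows. The sketch assumes that the N preselected patterns are given as a dict and that the main selection distance is passed in as a function. In the embodiment this distance is a DP distance. All names are hypothetical. The toy example uses scalar "patterns" with an absolute-difference distance purely for illustration:

```python
def main_select(input_pattern, candidates, voice_of, distance):
    """Return the voice number of the closest preselected pattern.

    candidates: dict pattern number -> stored feature pattern (the N preselected).
    voice_of:   dict pattern number -> voice number (like link data memory 305).
    distance:   callable(input_pattern, stored_pattern) -> float.
    """
    # Minimum distance calculation unit 304: find the closest candidate.
    best = min(candidates, key=lambda p: distance(input_pattern, candidates[p]))
    # Verification unit 306: map the pattern number to its voice number.
    return voice_of[best]


candidates = {"12": 1.0, "21": 5.0, "32": 9.0}
voice_of = {"12": 1, "21": 2, "32": 3}
print(main_select(4.2, candidates, voice_of, lambda a, b: abs(a - b)))  # 2
```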

The following describes the operation of an embodiment that builds an automatic dialing device for recognition of the ten spoken digits. It uses the system configuration of FIG. 4, with the configuration of FIG. 2 as the preliminary selection processing section 405 and the configuration of FIG. 3 as the main selection processing section 407. Each block is described first.

Feature extraction section 402

The input speech is split into bands by a 12-channel band-pass filter (BPF) bank. For each channel the signal is rectified (the absolute value is taken and then smoothed by an LPF), and the value of each channel is obtained every 10 ms.

The 12 channel outputs produced every 10 ms are log-transformed. The mean of the 12 channel values is then computed, and the difference between each channel value and that mean is output.
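The per-frame post-processing can be sketched as follows. The sketch assumes that the band-pass filtering, rectification, and smoothing have already produced 12 non-negative channel values per 10 ms frame. The natural logarithm and the small floor value are assumptions, because the source specifies neither the log base nor how zero outputs are handled. Subtracting the channel mean makes the result independent of overall loudness: scaling every channel by the same factor only shifts all logs by the same amount.

```python
import math


def log_mean_normalize(frame, floor=1e-10):
    """Post-process one 10 ms frame of 12 rectified, smoothed BPF outputs.

    Takes the log of each channel value, then subtracts the mean of the logs.
    `floor` is an assumed guard against log(0); the source does not give one.
    """
    logs = [math.log(max(v, floor)) for v in frame]
    mean = sum(logs) / len(logs)
    return [x - mean for x in logs]


frame = [float(c + 1) for c in range(12)]   # hypothetical channel values
normalized = log_mean_normalize(frame)
```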

Voice section detection section 403

The start and end of a speech section are detected by judging whether the input speech power is above or below a single threshold level. The data output from the feature extraction section 402 within that section are stored.

(Start detection) When the input speech power exceeds the threshold for 5 or more consecutive frames (50 ms), the point at which the power changed from low to high is taken as the start.

(End detection) After the start has been detected, when the input speech power stays below the threshold for 30 or more consecutive frames (300 ms), the point at which the power changed from high to low is taken as the end.
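A hedged sketch of the start and end rules follows. It assumes that frame power values arrive as a list (one value per 10 ms) and that a single threshold is used. Two details are assumptions not stated in the source: a value equal to the threshold counts as neither above nor below, and the section closes at the last frame when no 30-frame silence follows.

```python
def detect_endpoints(power, threshold, start_frames=5, end_frames=30):
    """Return (start, end) frame indices of a speech section, end exclusive.

    Returns None if no run of start_frames frames above threshold exists.
    """
    start = None
    run = 0
    for i, p in enumerate(power):
        if p > threshold:
            run += 1
            if run >= start_frames:
                start = i - run + 1  # frame where power went from low to high
                break
        else:
            run = 0
    if start is None:
        return None
    run = 0
    for i in range(start, len(power)):
        if power[i] < threshold:
            run += 1
            if run >= end_frames:
                return start, i - run + 1  # frame where power went high to low
        else:
            run = 0
    return start, len(power)  # assumption: input ended before a long silence


power = [0.0] * 10 + [10.0] * 40 + [0.0] * 35
print(detect_endpoints(power, 5.0))  # (10, 50)
```

A dip shorter than 30 frames inside a word does not end the section.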

Normalization processing section 404

Speech sections of different lengths, from start to end, are divided into 8 segments along the time axis, and the data are averaged within each segment. This yields a feature pattern of 12 (channels) x 8 (frames) = 96 parameters.
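The time-axis normalization can be sketched as follows. The sketch assumes that the stored section is a list of frames, each a list of 12 channel values, and that it contains at least 8 frames. The source does not say how segment boundaries are rounded when the length is not divisible by 8, so integer division is used here as an assumption:

```python
def time_normalize(frames, segments=8):
    """Average a variable-length section into `segments` time slices.

    frames: list of T frames, each a list of channel values, with T >= segments.
    Returns a segments x channels pattern (8 x 12 = 96 parameters here).
    """
    t = len(frames)
    if t < segments:
        raise ValueError("need at least one frame per segment")
    channels = len(frames[0])
    pattern = []
    for k in range(segments):
        lo = k * t // segments          # integer boundaries (assumed rounding)
        hi = (k + 1) * t // segments
        seg = frames[lo:hi]
        pattern.append([sum(f[c] for f in seg) / len(seg) for c in range(channels)])
    return pattern


frames = [[float(i)] * 12 for i in range(16)]   # hypothetical 16-frame section
pattern = time_normalize(frames)
print(pattern[0][0], pattern[7][0])  # 0.5 14.5
```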

Dictionary

Training speech is processed with three thresholds of different levels. For each speech input, three feature patterns are created and recorded by means of the voice section detection section and the normalization processing section described above.

The registered speech contents are "ichi" for voice number 1, "ni" for voice number 2, and so on for all ten digit words. Voice number 10 is assigned to "zero".

In other words, 30 feature patterns are registered in total.

Preliminary selection processing section 405

This is the part to which the present invention is applied; its configuration is as shown in FIG. 2.

In this embodiment a feature pattern has 12 channels x 8 frames = 96 parameters. For the preliminary selection distance, the absolute difference between corresponding parameters of the input feature pattern and the dictionary feature pattern is taken for every frame in the even channels (2, 4, 6, 8, 10, 12), and the sum of these differences is used.
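The preliminary selection distance can be sketched briefly. The sketch assumes that the 96 parameters are stored as 8 frames of 12 channel values (`pattern[frame][channel]`, 0-based). The 1-based even channels 2, 4, ..., 12 are then indices 1, 3, ..., 11, so only 48 of the 96 parameters enter the sum:

```python
EVEN_CHANNELS = range(1, 12, 2)  # 0-based indices of channels 2, 4, ..., 12


def simple_distance(inp, ref):
    """Sum of |inp - ref| over all frames of the even channels only.

    inp, ref: 8 x 12 feature patterns indexed [frame][channel].
    """
    return sum(abs(a[c] - b[c]) for a, b in zip(inp, ref) for c in EVEN_CHANNELS)


zero = [[0.0] * 12 for _ in range(8)]
ones = [[1.0] * 12 for _ in range(8)]
print(simple_distance(zero, ones))  # 48.0
```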

The number N of patterns chosen by the preliminary selection is 10.

The preliminary selection from the 30 patterns in the dictionary therefore effectively creates a new dictionary of 10 patterns.

Main selection processing section 407

One pattern (one spoken word) is selected from the 10 patterns produced by the preliminary selection processing section 405. Its configuration is as shown in FIG. 3.

The main selection uses the DP distance, a precise distance that accounts for temporal variation.

DP (dynamic programming) matching is the representative method of nonlinear matching. Because it stretches and compresses the patterns like rubber during matching, it is also called rubber matching.

When computing the distance between a reference pattern and an input pattern, DP matching does not pair the two time series frame by frame in a fixed one-to-one way. Instead, it aligns them while locally shifting the input side so that the distance between the two patterns is minimized.
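For illustration, here is a textbook symmetric DP (dynamic time warping) distance in Python. It assumes that each pattern is a sequence of frame vectors and uses the sum of absolute differences as the local frame distance. The source does not give its exact recurrence, slope constraints, or normalization, so this is a generic example of the technique and not the embodiment's formula:

```python
def dp_distance(x, y):
    """Accumulated DP (DTW) distance between two sequences of frame vectors.

    Local cost: sum of absolute differences between two frames.
    Allowed steps: (i-1, j), (i, j-1), (i-1, j-1), with no slope constraint.
    """
    inf = float("inf")
    n, m = len(x), len(y)
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = sum(abs(a - b) for a, b in zip(x[i - 1], y[j - 1]))
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


fast = [[0.0], [1.0], [2.0]]
slow = [[0.0], [1.0], [1.0], [2.0]]   # same word, middle frame held longer
print(dp_distance(fast, slow))  # 0.0
```

Because the alignment path may repeat frames of either pattern, a slowly spoken version of a word can still reach distance 0 against a faster one. This is the "rubber" behaviour described above.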

Dialer 409

The dialer receives the voice number output from the main selection processing section 407 and sends the corresponding pulses to the line.

Next, the operation of the system when the user dials "1" is described.

First, the user says "ichi" into the microphone 401.

At the same time, the voice section detection section 403 detects the start and end of "ichi" and stores the data output from the feature extraction section during this speech section.

Next, the normalization processing section 404 normalizes the stored speech features along the time axis to create a 96-parameter feature pattern.

Next, the preliminary selection processing section 405, configured as shown in FIG. 2, calculates the preliminary selection distance between the feature pattern created from the input speech and each of the 30 feature patterns in the dictionary. It then selects 10 feature patterns and records them in the preliminary selection result dictionary 408.

Next, the main selection processing section 407, configured as shown in FIG. 3, calculates the main selection distance between the feature pattern created from the input speech and each of the 10 feature patterns in the preliminary selection result dictionary 408. It finds the pattern number of the feature pattern with the minimum distance and outputs the corresponding voice number (1). The dialer 409 then sends a "1" signal to the line according to the received voice number (1).

The operation of the preliminary selection processing section 405 is now explained in more detail with reference to FIG. 2.

The input pattern register 201 stores the feature pattern created from the input speech.

The dictionary memory 203 holds the 30 feature patterns of the 10 registered words.

The registered pattern number data memory 202 stores data such as (11, 12, 13, 21, 22, 23, ..., 01, 02, 03). Here the upper digit is the voice number and the lower digit is the threshold number, so the entries 01, 02, 03 belong to voice number 10 ("zero"). This means that three patterns X1, X2, and X3 were registered for each word X at registration time.
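The numbering scheme and the two link-data lookups can be sketched as follows. The string representation of pattern numbers and the function names (`verify`, `re_verify`) are hypothetical; they mirror the roles of the verification unit 210 and the re-verification unit 211. Voice number 10 ("zero") is taken to use upper digit 0, as in the stored entries 01, 02, 03:

```python
def make_pattern_numbers(num_voices=10, num_thresholds=3):
    """Two-digit pattern numbers: upper digit voice, lower digit threshold."""
    numbers = []
    for voice in range(1, num_voices + 1):
        upper = voice % 10  # voice number 10 ("zero") is written with digit 0
        for t in range(1, num_thresholds + 1):
            numbers.append(f"{upper}{t}")
    return numbers


def build_link_data(pattern_numbers):
    """Pattern number -> voice number table (like link data memory 209)."""
    return {p: (10 if p[0] == "0" else int(p[0])) for p in pattern_numbers}


def verify(link, pattern_number):
    """Hypothetical analogue of verification unit 210: pattern -> voice."""
    return link[pattern_number]


def re_verify(link, voice_number):
    """Hypothetical analogue of re-verification unit 211: voice -> patterns."""
    return [p for p, v in link.items() if v == voice_number]


link = build_link_data(make_pattern_numbers())
print(re_verify(link, verify(link, "21")))  # ['21', '22', '23']
```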

The simple distance calculation unit 204 for preliminary selection uses all parameters of the even channels. It computes the absolute value of the difference between the input feature pattern and each dictionary feature pattern, and sums these values to obtain the simple distance for preliminary selection.

The preliminary selection distance of each feature pattern from the input pattern is recorded in the distance data temporary stack memory 205 together with its pattern number.

Next, the minimum distance determination unit 206 finds the smallest distance in these distance data and outputs its pattern number.

The pattern number output at this point does not necessarily belong to voice number 1 (11, 12, 13). It is nevertheless treated as a candidate and recorded in the preliminary selection result memory 207 as a preliminary selection result.

At the same time, the output pattern number, for example 21, is input to the verification unit 210. The verification unit 210 reads the voice number 2 corresponding to pattern number 21 from the pattern number-voice number link data memory 209 and outputs it.

The voice number 2 output from the verification unit 210 is input to the re-verification unit 211. The re-verification unit 211 refers to the data stored in the pattern number-voice number link data memory 209 and outputs the pattern numbers (21, 22, 23) corresponding to voice number 2.

Next, the distance data for the feature patterns with these pattern numbers (21, 22, 23) are deleted from the distance data temporary stack memory 205.

Then, as before, the pattern number obtained by the minimum distance determination unit 206 is added to the preliminary selection result.

This operation is repeated until the preliminary selection result contains 10 patterns.

[Effect of the Invention]

As explained above, the present invention deletes all pattern numbers corresponding to the same voice number, so the feature patterns chosen in the preliminary selection never include several patterns with the same speech content. This contributes substantially to improving the recognition rate.

[Brief Description of the Drawings]

FIG. 1 is a block diagram showing the principle of the present invention; FIG. 2 is a block diagram showing a preliminary selection device according to an embodiment of the present invention; FIG. 3 is a block diagram showing an example configuration of the main selection processing section used after the preliminary selection according to the present invention; FIG. 4 is a diagram showing the system configuration of a speech recognition device that forms the background of the present invention; and FIG. 5 is a block diagram showing an example of a conventional preliminary selection device.

In FIG. 2: 1... preliminary selection feature pattern storage means, 2... distance calculation means, 3... distance data temporary storage means, 4... preliminary selection means, 5... link data storage means, 201... input pattern register, 202... registered pattern number data memory, 203... dictionary memory, 204... simple distance calculation unit for preliminary selection, 205... distance data temporary stack memory, 206... minimum distance determination unit, 207... preliminary selection result memory, 208... counter, 209... pattern number-voice number link data memory, 210... verification unit, 211... re-verification unit.

Claims

[Claims] 1. A preliminary selection device for speech recognition, comprising: preliminary selection feature pattern storage means (1) for registering and storing in advance, for each input speech content, a group of a plurality of preliminary selection feature patterns; distance calculation means (2) for calculating the distance between an input feature pattern created from the input speech content and each preliminary selection feature pattern stored in the preliminary selection feature pattern storage means (1); distance data temporary storage means (3) for temporarily storing distance data corresponding to each preliminary selection feature pattern; preliminary selection means (4) for selecting and outputting the preliminary selection feature pattern showing the minimum distance among the distances stored in the distance data temporary storage means (3); and link data storage means (5) for storing in advance the relationship between each feature pattern and all feature patterns of the group to which that feature pattern belongs; characterized in that all feature patterns of the group to which the selected preliminary selection feature pattern belongs are output from the link data storage means (5) and input to the distance data storage means (3), whereby the distance data corresponding to all feature patterns of that group are deleted from the distance data storage means (3).