JP2001290494A - Method and device for generating registered word dictionary, and method and device for speech recognition

Info

Publication number: JP2001290494A
Application number: JP2000103055A
Authority: JP
Inventors: Sumiyuki Okimoto; 純幸沖本; Tatsuya Kimura; 達也木村; Taisuke Watanabe; 泰助渡辺
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 2000-04-05
Filing date: 2000-04-05
Publication date: 2001-10-19

Abstract

PROBLEM TO BE SOLVED: To drastically reduce the dictionary area while maintaining a high recognition rate, and to simplify hardware implementation by reducing the number of computations. SOLUTION: A parameter time series computed from input speech is coded using a similarity vector codebook 106 and a regression coefficient codebook 108 that are generated beforehand from training speech. The coded series are registered in a coded registered word dictionary 114 as word patterns. To recognize speech, the word patterns stored in the dictionary 114 are restored into parameter time series using the codebooks 106 and 108, and a recognition result is obtained by comparing the restored parameter time series with the parameter series computed from the speech to be recognized.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to a method and device for creating a registered word dictionary, in which the recognition vocabulary is registered by voice and then recognized, and to a speech recognition method and device that use the dictionary.

[0002]

[Prior Art] Conventionally, when the word dictionary that determines the recognition vocabulary is created from speech, the registrant utters the same word one or more times, a dictionary word pattern is created from the parameters of those utterances, and the patterns created for each of a plurality of words are collected to form the registered word dictionary. When speech is recognized using this dictionary, the input speech is recognized by comparing it with the dictionary word patterns in the dictionary. Speech recognition of this utterance-registration type is described in, for example, "登録型英語用モデル音声法の検討" ("A Study of the Model Speech Method for Registration-Type English"), Proceedings of the Acoustical Society of Japan, 1-Q-11, March 1995.

[0003] FIGS. 6 and 7 are block diagrams of this technique. FIG. 6 shows the configuration for creating a dictionary word pattern from input speech and registering it in the registered word dictionary, and FIG. 7 shows the configuration for recognizing input speech using that registered word dictionary. In FIG. 6, 1 is an acoustic analysis unit, 2 is a similarity calculation unit, 3 is a standard pattern storage unit, 4 is a regression coefficient calculation unit, 9 is a word pattern storage unit, 10 is a dictionary word pattern creation unit, and 14 is the registered word dictionary. In the recognition configuration of FIG. 7, units 1 to 4 are the same as in FIG. 6, 15 is a word pattern creation unit, 14 is the registered word dictionary, and 17 is a word recognition unit.

[0004] The standard pattern storage unit 3, common to FIGS. 6 and 7, stores phoneme standard patterns created in advance from training data uttered by many speakers. For each of n phonemes, the temporal position that best expresses the features of that phoneme (the feature frame) is found, and the phoneme standard pattern is created from the time pattern of feature parameters centered on this feature frame. As the time pattern, LPC cepstrum coefficients (C0 to C8) are calculated for several frames before and after the feature frame and arranged into a one-dimensional parameter series. The mean vector of the elements of this parameter series and the covariance matrix between the elements are then obtained and stored as the standard pattern.
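The construction of one phoneme standard pattern described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names `stack_window` and `standard_pattern`, the window half-width, and the use of the population covariance (division by the sample count) are all assumptions introduced here.

```python
from typing import List, Tuple

Vector = List[float]


def stack_window(cepstra: List[Vector], center: int, half_width: int) -> Vector:
    """Concatenate the cepstrum vectors of the frames around the feature frame.

    `half_width` (frames taken on each side) is a hypothetical parameter;
    the source only says "several frames before and after".
    """
    window = cepstra[center - half_width : center + half_width + 1]
    return [c for frame in window for c in frame]


def standard_pattern(samples: List[Vector]) -> Tuple[Vector, List[Vector]]:
    """Return the mean vector and covariance matrix of the stacked samples.

    Assumption: population covariance (divide by the number of samples).
    """
    count = len(samples)
    dim = len(samples[0])
    mean = [sum(s[i] for s in samples) / count for i in range(dim)]
    cov = [
        [
            sum((s[i] - mean[i]) * (s[j] - mean[j]) for s in samples) / count
            for j in range(dim)
        ]
        for i in range(dim)
    ]
    return mean, cov
```

In use, `stack_window` would be applied at the feature frame of every training token of a phoneme, and `standard_pattern` would be called once per phoneme on the collected vectors, giving n standard patterns in total.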

[0005] The acoustic analysis unit 1, similarity calculation unit 2, standard pattern storage unit 3, and regression coefficient calculation unit 4 in FIGS. 6 and 7 together obtain, from the input speech, a series of n-dimensional similarity vectors and a series of n-dimensional regression coefficient vectors. These series are obtained as follows.

[0006] The acoustic analysis unit 1 performs acoustic analysis on the input speech for each analysis frame and calculates a series of LPC cepstrum coefficients. The similarity calculation unit 2 compares the series of LPC cepstrum coefficients obtained by the acoustic analysis unit 1 with each of the n patterns in the standard pattern storage unit 3 to obtain n similarities, and treats them as one vector, yielding a series of n-dimensional similarity vectors. The regression coefficient calculation unit 4 calculates, from the time series of these similarity vectors, a regression coefficient for each vector component at each frame, yielding a series of n-dimensional regression coefficient vectors.
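The regression coefficient step can be illustrated with a short sketch. It assumes that the regression coefficient is the least-squares slope of each similarity component over a symmetric window of frames, and that frames beyond the ends of the utterance are replaced by the first or last frame. The window size, the edge handling, and the name `regression_coefficients` are assumptions made for illustration; the source does not specify them in this section.

```python
from typing import List

Vector = List[float]


def regression_coefficients(series: List[Vector], half_width: int = 2) -> List[Vector]:
    """Per-frame, per-component least-squares slope over +/- half_width frames.

    For a symmetric window the slope reduces to sum(k * x[t + k]) / sum(k * k).
    """
    num_frames = len(series)
    dim = len(series[0])
    denom = sum(k * k for k in range(-half_width, half_width + 1))
    result = []
    for t in range(num_frames):
        slopes = []
        for i in range(dim):
            acc = 0.0
            for k in range(-half_width, half_width + 1):
                # Assumption: clamp at the edges by repeating the first/last frame.
                u = min(max(t + k, 0), num_frames - 1)
                acc += k * series[u][i]
            slopes.append(acc / denom)
        result.append(slopes)
    return result
```

Applied to a series of n-dimensional similarity vectors, the function returns a series of the same length whose frames are n-dimensional regression coefficient vectors.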

[0007] To create a dictionary word pattern for the registered word dictionary, in FIG. 6 the similarity vector series and regression coefficient vector series described above are each obtained from the speech data of the same word uttered twice, and each pair is stored in the word pattern storage unit 9. The dictionary word pattern creation unit 10 time-aligns, by DP matching, the pairs of similarity vector series and regression coefficient vector series for the two utterances stored in the word pattern storage unit 9. According to the alignment, it obtains the averaged similarity vector series and the averaged regression coefficient vector series, and stores them in the registered word dictionary 14 as the dictionary word pattern of the word.
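The time alignment and averaging of two utterances can be sketched as below. This is an illustrative sketch under stated assumptions: the local cost is the squared Euclidean distance, the DP path may move one frame in either or both sequences per step, and the averaged pattern has one frame per path step. None of these choices, nor the function names, are given by the source.

```python
from typing import List, Tuple

Vector = List[float]


def sq_dist(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def dp_align(a: List[Vector], b: List[Vector]) -> List[Tuple[int, int]]:
    """Return the minimum-cost frame alignment path between sequences a and b."""
    inf = float("inf")
    rows, cols = len(a), len(b)
    cost = [[inf] * cols for _ in range(rows)]
    cost[0][0] = sq_dist(a[0], b[0])
    for i in range(rows):
        for j in range(cols):
            if i == 0 and j == 0:
                continue
            best_prev = min(
                cost[i - 1][j] if i > 0 else inf,
                cost[i][j - 1] if j > 0 else inf,
                cost[i - 1][j - 1] if i > 0 and j > 0 else inf,
            )
            cost[i][j] = sq_dist(a[i], b[j]) + best_prev
    # Backtrack from the last frame pair to the first to recover the path.
    i, j = rows - 1, cols - 1
    path = [(i, j)]
    while (i, j) != (0, 0):
        candidates = []
        if i > 0 and j > 0:
            candidates.append((cost[i - 1][j - 1], i - 1, j - 1))
        if i > 0:
            candidates.append((cost[i - 1][j], i - 1, j))
        if j > 0:
            candidates.append((cost[i][j - 1], i, j - 1))
        _, i, j = min(candidates)
        path.append((i, j))
    path.reverse()
    return path


def average_along_path(
    a: List[Vector], b: List[Vector], path: List[Tuple[int, int]]
) -> List[Vector]:
    """Average the aligned frame pairs, producing one frame per path step."""
    return [[(x + y) / 2 for x, y in zip(a[i], b[j])] for i, j in path]
```

The same path would be applied to both the similarity vector series and the regression coefficient vector series, so that the two averaged series stay frame-synchronous.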

[0008] To recognize input speech, in FIG. 7 a pair consisting of a similarity vector series and a regression coefficient vector series is obtained from the input speech by the acoustic analysis unit 1, similarity calculation unit 2, standard pattern storage unit 3, and regression coefficient calculation unit 4.

[0009] The word pattern creation unit 15 combines these two vector series into the word pattern of the input speech. The word recognition unit 17 compares this word pattern with each dictionary word pattern stored in the registered word dictionary 14, and the word that obtains the highest score is taken as the recognition result. Here too, the comparison between the word pattern of the input speech and a dictionary word pattern in the registered word dictionary uses a distance obtained with time alignment by DP matching.
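A compact sketch of this recognition step follows. It assumes that each frame of a word pattern is the concatenation of the similarity vector and the regression coefficient vector, that the local cost is the squared Euclidean distance, and that the "highest score" corresponds to the smallest accumulated DP distance. The names `dp_distance` and `recognize` and the example dictionary are hypothetical.

```python
from typing import Dict, List

Vector = List[float]


def dp_distance(a: List[Vector], b: List[Vector]) -> float:
    """Accumulated squared distance along the best DP alignment of a and b."""
    inf = float("inf")
    prev = [inf] * (len(b) + 1)
    prev[0] = 0.0  # cost before any frame has been consumed
    for frame_a in a:
        cur = [inf] * (len(b) + 1)
        for j, frame_b in enumerate(b, start=1):
            local = sum((x - y) ** 2 for x, y in zip(frame_a, frame_b))
            cur[j] = local + min(prev[j], cur[j - 1], prev[j - 1])
        prev = cur
    return prev[len(b)]


def recognize(word_pattern: List[Vector], dictionary: Dict[str, List[Vector]]) -> str:
    """Return the registered word whose pattern is closest to the input."""
    return min(dictionary, key=lambda w: dp_distance(word_pattern, dictionary[w]))
```

For example, with a dictionary holding the patterns `[[0.0], [1.0], [2.0]]` and `[[5.0], [5.0]]`, an input of `[[0.0], [1.0], [1.0], [2.0]]` aligns to the first pattern with zero accumulated distance despite the extra frame.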

[0010]

[Problems to Be Solved by the Invention] In the conventional method described above, storing each dictionary word pattern in the registered word dictionary requires storing an n-dimensional similarity vector and an n-dimensional regression coefficient vector for every frame of the word. The registered word dictionary therefore needs a very large memory area, and this must be reduced.

[0011] In addition, comparing the input speech with a dictionary word pattern in the registered word dictionary requires 2 × n multiplications per frame of a word, so the computation is time-consuming, and this too must be reduced.

[0012] The present invention solves these problems. Its object is to represent the dictionary word patterns in the registered word dictionary using a compressed codebook, thereby greatly reducing the dictionary area while maintaining a high recognition rate, reducing the number of operations, and facilitating hardware implementation.

[0013]

[Means for Solving the Problems] To achieve this object, when creating the registered word dictionary the present invention encodes the time series of parameters calculated from the uttered speech using a codebook created in advance from training speech, and registers the resulting code sequence in the registered word dictionary as a word pattern. When recognizing speech, the word patterns stored in the registered word dictionary are restored to time series of parameters using the codebook, and the recognition result is obtained by comparing them with the parameter series calculated from the speech to be recognized.
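The encoding and restoration described here correspond to vector quantization against a fixed codebook. The sketch below assumes nearest-neighbour encoding under squared Euclidean distance; the distance measure, the function names, and the tiny example codebook are illustrative assumptions rather than details taken from the source.

```python
from typing import List

Vector = List[float]


def encode(series: List[Vector], codebook: List[Vector]) -> List[int]:
    """Replace each frame by the index of its nearest code vector."""

    def nearest(v: Vector) -> int:
        return min(
            range(len(codebook)),
            key=lambda c: sum((x - y) ** 2 for x, y in zip(v, codebook[c])),
        )

    return [nearest(v) for v in series]


def decode(codes: List[int], codebook: List[Vector]) -> List[Vector]:
    """Restore a parameter time series by looking each code up in the codebook."""
    return [list(codebook[c]) for c in codes]
```

Only the list of integer codes needs to be stored per word. The restored series is an approximation of the original, with each frame replaced by its code vector.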

[0014] With the present invention, the dictionary word patterns of the registered word dictionary, conventionally represented as an n-dimensional similarity vector series or an n-dimensional regression coefficient vector series, are represented as a simple one-dimensional code sequence for each of the similarity vector series and the regression coefficient vector series. The memory area for the registered word dictionary is thus reduced without lowering recognition accuracy, and hardware implementation becomes easier.
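The size of the saving can be estimated with simple arithmetic. The sketch below is only a back-of-the-envelope calculation: the frame count, n, codebook size, and storage widths used in the example are hypothetical values chosen for illustration, not figures from the source.

```python
def raw_pattern_bytes(frames: int, n: int, bytes_per_value: int) -> int:
    """Conventional pattern: two n-dimensional vectors stored per frame."""
    return frames * 2 * n * bytes_per_value


def coded_pattern_bytes(frames: int, bytes_per_code: int) -> int:
    """Coded pattern: one code per frame for each of the two series."""
    return frames * 2 * bytes_per_code


def codebook_bytes(entries: int, n: int, bytes_per_value: int) -> int:
    """One codebook of `entries` n-dimensional code vectors."""
    return entries * n * bytes_per_value


# Hypothetical example: 50 frames per word, n = 20, 2-byte values, 1-byte codes.
print(raw_pattern_bytes(50, 20, 2))    # 4000 bytes per word
print(coded_pattern_bytes(50, 1))      # 100 bytes per word
print(codebook_bytes(256, 20, 2))      # 10240 bytes per codebook
```

Under these assumed numbers the per-word cost falls by a factor of 40, while each codebook is a fixed cost shared by all registered words.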

[0015] The present invention newly requires a memory area for the codebooks used to encode and decode the similarity vectors and regression coefficient vectors. However, for the n-dimensional similarity vectors and the n-dimensional regression coefficient vectors that make up the codebooks, only N and M effective vector components, respectively, are selected, where N and M are smaller than n. This reduces the number of operations needed to compare the input speech with the dictionary word patterns in the registered word dictionary, and also makes the codebooks themselves smaller.
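The effect of keeping only some components of each code vector can be sketched as follows. Two assumptions are made loudly here because this section of the source does not state them: the retained components are taken to be those with the largest magnitude, and the per-frame comparison is taken to be an inner product. The names `reduce_code_vector` and `sparse_dot` are hypothetical.

```python
from typing import Dict, List

Vector = List[float]


def reduce_code_vector(code_vector: Vector, keep: int) -> Dict[int, float]:
    """Keep only `keep` components, stored as component index -> value.

    Assumption: the "effective" components are those of largest magnitude.
    """
    order = sorted(
        range(len(code_vector)), key=lambda i: abs(code_vector[i]), reverse=True
    )
    return {i: code_vector[i] for i in sorted(order[:keep])}


def sparse_dot(reduced: Dict[int, float], frame: Vector) -> float:
    """Inner product costing one multiplication per retained component."""
    return sum(value * frame[i] for i, value in reduced.items())
```

With `keep` set to N for the similarity codebook and M for the regression coefficient codebook, a frame comparison of this kind would need N + M multiplications instead of 2 × n, and each code vector would occupy N or M stored values (plus their indices) instead of n.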

[0016]

[Embodiments of the Invention] The invention according to claim 1 is characterized in that a word pattern obtained from speech uttered for vocabulary registration is encoded using a codebook, and the resulting code sequence is registered in a registered word dictionary as a dictionary word pattern. This has the effect that the area for storing the registered word dictionary can be greatly reduced.

[0017] The invention according to claim 2 is characterized in that: a registrant utters a word to be registered in the dictionary; the m acoustic feature parameters (m is an integer) obtained for each frame are matched against the m feature parameters of each of n types (n is an integer) of standard patterns created in advance from training data; n similarities are obtained for each frame to form an n-dimensional similarity vector series; this series is encoded frame by frame using a codebook obtained in advance from the similarity vectors of the training data; and the resulting code sequence is registered in the registered word dictionary as a dictionary word pattern. This has the effect that the memory area for a registered word dictionary whose dictionary word patterns are similarity vector series can be greatly reduced.
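The matching step that turns m feature parameters per frame into an n-dimensional similarity vector can be sketched as below. The similarity measure is not stated in this section, so the negative squared Euclidean distance to each standard pattern is used purely as a stand-in. It is an assumption, as are the function names.

```python
from typing import List

Vector = List[float]


def similarity_vector(frame: Vector, standard_patterns: List[Vector]) -> Vector:
    """One similarity per standard pattern (larger means more similar).

    Assumption: similarity = negative squared Euclidean distance between the
    frame's m parameters and the pattern's m parameters.
    """
    return [
        -sum((x - p) ** 2 for x, p in zip(frame, pattern))
        for pattern in standard_patterns
    ]


def similarity_series(frames: List[Vector], standard_patterns: List[Vector]) -> List[Vector]:
    """n-dimensional similarity vector series for a whole utterance."""
    return [similarity_vector(f, standard_patterns) for f in frames]
```

Each element of the returned series is one n-dimensional similarity vector, which is the unit that is then encoded frame by frame against the codebook.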

[0018] The invention according to claim 3 is characterized in that: a registrant utters a word to be registered in the dictionary; the m acoustic feature parameters (m is an integer) obtained for each frame are matched against the m feature parameters of each of n types (n is an integer) of standard patterns created in advance from training data; n similarities are obtained for each frame to form an n-dimensional similarity vector series; for each of the n similarity time series, the temporal change of the similarity is obtained for each frame; the resulting series of n-dimensional vectors of temporal change of similarity is encoded frame by frame using a codebook obtained in advance from the n-dimensional vectors of temporal change of similarity of the training data; and the resulting code sequence is registered as a dictionary word pattern of the registered word dictionary. This has the effect that the memory area for a registered word dictionary whose word patterns are vector series of temporal change of similarity can be greatly reduced.

[0019] The invention according to claim 4 is the registered word dictionary creation method according to claim 2 or 3, characterized in that the code sequence obtained by encoding the n-dimensional similarity vector series using a codebook and the code sequence obtained by encoding the n-dimensional vector series of temporal change of similarity using a codebook are combined and registered as the dictionary word pattern. This has the effect that the memory area for a registered word dictionary whose word patterns use both the similarity vector series and the vector series of temporal change of similarity can be greatly reduced.
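One way to combine the two code sequences into a single stored dictionary word pattern is sketched below. The byte layout is an assumption made for illustration: each codebook is assumed to have at most 256 entries so that a code fits in one byte, and the two codes of a frame are stored side by side. The function names are hypothetical.

```python
from typing import List, Tuple


def pack_word_pattern(sim_codes: List[int], reg_codes: List[int]) -> bytes:
    """Interleave similarity and regression codes, two bytes per frame.

    Assumption: every code is in 0..255 (bytes() raises ValueError otherwise).
    """
    if len(sim_codes) != len(reg_codes):
        raise ValueError("both code sequences must cover the same frames")
    packed = bytearray()
    for s, r in zip(sim_codes, reg_codes):
        packed += bytes((s, r))
    return bytes(packed)


def unpack_word_pattern(data: bytes) -> Tuple[List[int], List[int]]:
    """Split a packed pattern back into its two code sequences."""
    return list(data[0::2]), list(data[1::2])
```

With this layout a word of T frames occupies 2 × T bytes, and the two sequences can be decoded independently against their respective codebooks.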

[0020] The invention according to claim 5 is the registered word dictionary creation method according to claim 2, characterized in that, when the codebook is created from the similarity vectors of the training data, the codebook is built from code vectors in which only N components, N being smaller than n, are selected from each n-dimensional code vector. This has the effect that the size of the codebook needed to encode word patterns made of similarity vector series can be reduced.

[0021] The invention according to claim 6 is the registered word dictionary creation method according to claim 3, characterized in that, when the codebook of vectors of temporal change of similarity is created, the codebook is built from code vectors in which only M components, M being smaller than n, are selected from each n-dimensional code vector. This has the effect that the size of the codebook needed to encode word patterns made of vector series of temporal change of similarity can be reduced.

[0022] The invention according to claim 7 is the registered word dictionary creation method according to claim 4, characterized in that two codebooks are used together: a codebook of similarity vectors built from code vectors in which only N components, N being smaller than n, are selected, and a codebook of vectors of temporal change of similarity built from code vectors in which only M components, M being smaller than n, are selected. This has the effect that the size of the codebooks needed to encode word patterns that use both the similarity vector series and the vector series of temporal change of similarity can be reduced.

[0023] The invention according to claim 8 is characterized in that each encoded dictionary word pattern from a registered word dictionary created in advance is decoded using a codebook obtained in advance from the similarity vectors of training data, and uttered input speech is recognized using the decoded dictionary word patterns. This has the effect that input speech can be recognized using an encoded and compressed word dictionary.

[0024] The invention according to claim 9 is characterized in that: for uttered input speech, the m acoustic feature parameters (m is an integer) obtained for each frame are matched against the m feature parameters of each of n types (n is an integer) of standard patterns created in advance from training data; n similarities are obtained for each frame to form an n-dimensional similarity vector series; this series is restored frame by frame as a dictionary word pattern from a registered word dictionary created in advance, using a codebook obtained in advance from the similarity vectors of the training data; and the input speech is recognized using this dictionary word pattern. This has the effect that input speech can be recognized using an encoded and compressed word dictionary.

[0025] The invention according to claim 10 is characterized in that: for uttered input speech, the m acoustic feature parameters (m is an integer) obtained for each frame are matched against the m feature parameters of each of n types (n is an integer) of standard patterns created in advance from training data; n similarities are obtained for each frame to form an n-dimensional similarity vector series; for each of the n similarity time series, the temporal change of the similarity is obtained for each frame; this series of n-dimensional vectors of temporal change of similarity is restored frame by frame as a dictionary word pattern from a registered word dictionary created in advance, using a codebook obtained in advance from the n-dimensional vectors of temporal change of similarity of the training data; and the input speech is recognized using this dictionary word pattern. This has the effect that input speech can be recognized using coded series of vectors of temporal change of similarity.

[0026] The invention according to claim 11 is characterized in that: for uttered input speech, the m acoustic feature parameters (m is an integer) obtained for each frame are matched against the m feature parameters of each of n types (n is an integer) of standard patterns created in advance from training data; n similarities are obtained for each frame to form an n-dimensional similarity vector series; for each of the n similarity time series, the temporal change of the similarity is obtained for each frame; the n-dimensional similarity vector series and the series of n-dimensional vectors of temporal change of similarity are combined into one word pattern of the input speech; this word pattern is restored frame by frame as a dictionary word pattern from a registered word dictionary created in advance, using a codebook obtained in advance from the similarities of the training data; and the input speech is recognized using this dictionary word pattern. This has the effect that input speech can be recognized using coded series of similarity vectors and of vectors of temporal change of similarity.

[0027] The invention according to claim 12 is the speech recognition method according to any one of claims 8 to 11, characterized in that the registered word dictionary is a dictionary created by the method according to any one of claims 1 to 7. This has the effect that input speech can be recognized using an encoded and compressed word dictionary.

[0028] The invention according to claim 13 is characterized by comprising: acoustic analysis means for obtaining the m acoustic feature parameters (m is an integer) obtained for each frame when a registrant utters a word to be registered in the dictionary; standard pattern storage means storing the m feature parameters of each of n types (n is an integer) of standard patterns created in advance from training data; similarity calculation means for matching the m feature parameters from the standard pattern storage means against the m acoustic feature parameters from the acoustic analysis means and obtaining n similarities for each frame as an n-dimensional similarity vector series; similarity vector encoding means for encoding this series frame by frame using a codebook obtained in advance from the similarity vectors of the training data; and coded registered word dictionary storage means for registering the encoded code sequence in the registered word dictionary as a dictionary word pattern. This has the effect that the memory area for a registered word dictionary whose dictionary word patterns are similarity vector series can be greatly reduced.

[0029] The invention according to claim 14 is characterized by comprising: acoustic analysis means for obtaining the m acoustic feature parameters (m is an integer) obtained for each frame when a registrant utters a word to be registered in the dictionary; standard pattern storage means storing the m feature parameters of each of n types (n is an integer) of standard patterns created in advance from training data; similarity calculation means for matching the m feature parameters from the standard pattern storage means against the m acoustic feature parameters from the acoustic analysis means and obtaining n similarities for each frame as an n-dimensional similarity vector series; regression coefficient calculation means for obtaining, for each of the n similarity time series, the temporal change of the similarity for each frame, and thus a series of n-dimensional vectors of temporal change of similarity; regression coefficient vector encoding means for encoding this series frame by frame using a codebook obtained in advance from the n-dimensional vectors of temporal change of similarity of the training data; and coded registered word dictionary storage means for registering the encoded code sequence in the registered word dictionary as a dictionary word pattern. This has the effect that the memory area for a registered word dictionary whose word patterns are vector series of temporal change of similarity can be greatly reduced.

[0030] The invention according to claim 15 is the registered word dictionary creation device according to claim 13 or 14, characterized in that the coded registered word dictionary storage means combines the code sequence obtained by encoding the n-dimensional similarity vector series using a codebook with the code sequence obtained by encoding the n-dimensional vector series of temporal change of similarity using a codebook, and registers the combination as the dictionary word pattern. This has the effect that the memory area for a registered word dictionary whose word patterns use both the similarity vector series and the vector series of temporal change of similarity can be greatly reduced.

[0031] The invention according to claim 16 is characterized by comprising: acoustic analysis means for obtaining, from uttered input speech, the m acoustic feature parameters (m is an integer) obtained for each frame; standard pattern storage means in which the m feature parameters of each of n types (n is an integer) of standard patterns created in advance from training data are registered; similarity calculation means for matching the m acoustic feature parameters against the m feature parameters and obtaining n similarities for each frame as an n-dimensional similarity vector series; word pattern restoration means for restoring the n-dimensional similarity vector series, frame by frame, as a dictionary word pattern from a registered word dictionary created in advance, using a codebook obtained in advance from the similarity vectors of the training data; and word recognition means for recognizing the input speech using the restored dictionary word pattern. This has the effect that input speech can be recognized using an encoded and compressed word dictionary.

[0032] The invention according to claim 17 is characterized by comprising: acoustic analysis means for obtaining, from uttered input speech, the m acoustic feature parameters (m is an integer) obtained for each frame; standard pattern storage means in which the m feature parameters of each of n types (n is an integer) of standard patterns created in advance from training data are registered; similarity calculation means for matching the m acoustic feature parameters against the m feature parameters and obtaining n similarities for each frame as an n-dimensional similarity vector series; regression coefficient calculation means for obtaining, for each of the n similarity time series, the temporal change of the similarity for each frame, and thus a series of n-dimensional vectors of temporal change of similarity; word pattern restoration means for restoring the n-dimensional similarity vector series, frame by frame, as a dictionary word pattern from a registered word dictionary created in advance, using a codebook obtained in advance from the n-dimensional vectors of temporal change of similarity of the training data; and word recognition means for recognizing the input speech using the restored dictionary word pattern. This has the effect that input speech can be recognized using an encoded and compressed word dictionary.

【００３３】The invention according to claim 18 comprises: acoustic analysis means for obtaining, from uttered input speech, m (m is an integer) acoustic feature parameters for each frame; standard pattern storage means in which the m feature parameters of each of n (n is an integer) kinds of standard patterns, created in advance from learning data, are registered; similarity calculation means for matching the m acoustic feature parameters against the m feature parameters of each standard pattern to obtain n similarities for each frame, yielding an n-dimensional similarity vector series; regression coefficient calculation means for obtaining, for each of the n similarity time series, the temporal change of the similarity for each frame, yielding a series of n-dimensional vectors of the temporal change of similarity; word pattern creation means for combining the n-dimensional similarity vector series from the similarity calculation means and the series of n-dimensional vectors of the temporal change of similarity from the regression coefficient calculation means into a single word pattern of the input speech; word pattern restoration means for restoring, frame by frame, a dictionary word pattern from a previously created registered word dictionary, using a codebook obtained in advance from n-dimensional vectors of the temporal change of similarity of the learning data; and word recognition means for recognizing the input speech using the restored dictionary word pattern. This has the effect that input speech can be recognized using an encoded and compressed word dictionary.

【００３４】The invention according to claim 19 is the speech recognition apparatus according to any one of claims 16 to 18, wherein the registered word dictionary is one created by the registered word dictionary creation apparatus according to any one of claims 13 to 15. This has the effect that input speech can be recognized using an encoded and compressed word dictionary.

【００３５】Embodiments of the present invention are described below with reference to FIGS. 1 to 5.

【００３６】(Embodiment 1) FIG. 1 is a block diagram of the registered word dictionary creation apparatus according to Embodiment 1 of the present invention, which is described below.

【００３７】In FIG. 1, reference numeral 1 denotes an acoustic analysis unit that obtains the LPC cepstrum from input speech; 2 denotes a similarity calculation unit that obtains a similarity vector by comparing the input speech with standard patterns; 3 denotes a standard pattern storage unit used in the similarity calculation; 4 denotes a regression coefficient calculation unit that calculates a regression coefficient vector from the time series of similarity vectors; S1 denotes a switch that routes the calculated similarity vectors according to whether a codebook is being created or a word is being registered; and S2 denotes a switch that routes the calculated regression coefficient vectors in the same way.

【００３８】Reference numeral 105 denotes a similarity code vector creation unit that clusters the many similarity vectors produced by similarity calculation unit 2 from training speech, obtains one or more representative (average) vectors, and uses them as code vectors. 106 denotes a similarity vector codebook that collects the similarity code vectors obtained by similarity code vector creation unit 105 into a codebook. 107 denotes a regression coefficient code vector creation unit that clusters the many regression coefficient vectors produced by regression coefficient calculation unit 4 from training speech, obtains one or more representative (for example, average) vectors, and uses them as code vectors. 108 denotes a regression coefficient vector codebook that collects the regression coefficient code vectors obtained by regression coefficient code vector creation unit 107 into a codebook. 9 denotes a word pattern storage unit that stores, as a pair, the time series of similarity vectors and the time series of regression coefficient vectors obtained by similarity calculation unit 2 and regression coefficient calculation unit 4. 110 denotes an average word pattern creation unit that creates an average pattern from the word patterns of two utterances stored in word pattern storage unit 9.

【００３９】Reference numeral 111 denotes a similarity vector encoding unit that encodes the similarity vector series in the average word pattern using the codebook stored in similarity vector codebook 106, converting it into a code sequence. 112 denotes a regression coefficient vector encoding unit that encodes the regression coefficient vector series using the codebook stored in regression coefficient vector codebook 108, converting it into a code sequence. 113 denotes a coded dictionary word pattern creation unit that combines the code sequence for the similarity vector series and the code sequence for the regression coefficient vector series into a dictionary word pattern. 114 denotes a coded registered word dictionary that stores the dictionary word patterns created by coded dictionary word pattern creation unit 113.

【００４０】The operation of the above configuration is described below, beginning with an outline. The speech recognition apparatus compares the word pattern obtained from the input speech with the word patterns stored in the word dictionary and selects the most similar pattern. Because the word patterns in the dictionary are coded, a dictionary word pattern must be restored before it can be compared with the input speech pattern. In this embodiment, the restoration is not performed all at once for each word pattern but incrementally. That is, each time the vector for one frame of the speech pattern is calculated from the input speech, the dictionary word pattern vectors needed for distance comparison with that vector are restored on demand. By repeating this process frame by frame, the recognition calculation can be carried out while keeping the memory area needed to expand dictionary word patterns small.

【００４１】The operation is now described in detail. First, when creating the codebooks, similarity vector changeover switch S1 in FIG. 1 is connected to side a, and regression coefficient vector changeover switch S2 is also connected to side a.

【００４２】Utterance data from many speakers, prepared in advance, is used as the speech for codebook creation. Acoustic analysis unit 1 performs linear prediction analysis on this speech at each analysis interval (called a frame; 10 ms in this embodiment) and obtains the LPC cepstrum coefficients (C0 to C11). Similarity calculation unit 2 uses these LPC cepstrum coefficients, together with the phoneme standard patterns stored in standard pattern storage unit 3, to obtain a similarity vector.

【００４３】Standard pattern storage unit 3 stores, as standard patterns, parameters describing the statistical distribution of the LPC cepstrum for each of 24 Japanese phonemes. These are created from speech data of many speakers, prepared separately for phoneme standard pattern creation and accurately labeled by visual inspection, using the LPC cepstrum coefficients (C0 to C11) in a range of 10 frames (100 ms) centered on the portion that best represents the characteristics of each phoneme.

【００４４】Similarity calculation unit 2 obtains the statistical distance between the LPC cepstrum coefficients of the input speech from acoustic analysis unit 1 and each of the 24 phoneme standard patterns stored in standard pattern storage unit 3, and creates a 24-dimensional vector whose components are these values.

【００４５】Regression coefficient calculation unit 4 calculates, for each dimension of the 24-dimensional similarity vector obtained by similarity calculation unit 2, a regression coefficient along the time axis as a parameter representing that dimension's change over time. The resulting values form the regression coefficient vector.
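The per-dimension regression coefficient described above can be sketched as a least-squares slope over a short window of frames. This is only an illustration: the window of two frames on each side, the edge handling, and the function name `regression_coefficients` are assumptions, since the text does not state how the regression is computed.

```python
def regression_coefficients(frames, half_window=2):
    """Per-dimension least-squares slope over +/- half_window frames.

    frames: list of T similarity vectors, each a list of n floats.
    The window size is an assumption; frames beyond either end of the
    utterance are replaced by the nearest edge frame.
    """
    total = len(frames)
    dim = len(frames[0])
    offsets = range(-half_window, half_window + 1)
    denom = sum(k * k for k in offsets)
    result = []
    for t in range(total):
        slope = [0.0] * dim
        for k in offsets:
            src = frames[min(max(t + k, 0), total - 1)]
            for d in range(dim):
                slope[d] += k * src[d]
        result.append([s / denom for s in slope])
    return result
```

For a similarity value that rises linearly over time, the interior frames of the output hold the slope of that line.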

【００４６】The many similarity vectors and regression coefficient vectors obtained in this way are sent to similarity code vector creation unit 105 and regression coefficient code vector creation unit 107, respectively.

【００４７】Similarity code vector creation unit 105 performs vector quantization (VQ) by K-means clustering on the many collected similarity vectors and calculates one or more representative vectors that express the spatial distribution of the similarity vectors.
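A minimal sketch of codebook creation by K-means clustering follows. It assumes the correlation cosine as the similarity measure (the experiments reported later in this document use it), cluster means as representative vectors, and random initialization from the data; the function names, iteration count, and seed are hypothetical.

```python
import math
import random

def cosine(a, b):
    """Correlation cosine between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0

def build_codebook(vectors, k, iterations=20, seed=0):
    """Return k representative (mean) vectors found by K-means clustering."""
    rng = random.Random(seed)
    codebook = [list(v) for v in rng.sample(vectors, k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in vectors:
            best = max(range(k), key=lambda c: cosine(v, codebook[c]))
            clusters[best].append(v)
        for c, members in enumerate(clusters):
            if members:  # keep the old code vector if a cluster is empty
                codebook[c] = [sum(col) / len(members) for col in zip(*members)]
    return codebook
```

The position of each vector in the returned list can serve as its identifying code.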

【００４８】These representative vectors could be stored in similarity vector codebook 106 as code vectors as they are, each paired with an identifying code as shown in FIG. 2(a). Here, however, the six components with the largest absolute values are further selected from each vector to form a compressed code vector. When a compressed code vector is stored in similarity vector codebook 106, it is paired not only with its identifying code but also with indices indicating the selected components, in the structure shown in FIG. 2(b).
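The component selection described above can be sketched as follows. The function names are hypothetical, and treating unselected components as zero when a vector is expanded again is an assumption; the text only states that the selected values and their indices are stored.

```python
def compress_code_vector(vector, keep=6):
    """Keep the `keep` components with the largest absolute value.

    Returns (indices, values), with indices in ascending order.
    """
    order = sorted(range(len(vector)), key=lambda i: abs(vector[i]), reverse=True)
    indices = sorted(order[:keep])
    return indices, [vector[i] for i in indices]

def expand_code_vector(indices, values, dim=24):
    """Rebuild a full-length vector; dropped components become 0.0 (assumption)."""
    full = [0.0] * dim
    for i, v in zip(indices, values):
        full[i] = v
    return full
```

Storing indices alongside values is what lets a distance calculation touch only the six retained components.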

【００４９】Similarly, regression coefficient code vector creation unit 107 performs vector quantization on the many collected regression coefficient vectors and calculates one or more representative vectors. For each vector, only the six components with the largest absolute values are selected to form a compressed code vector. Regression coefficient vector codebook 108 stores each compressed regression coefficient code vector together with its identifying code and the indices indicating the selected components.

【００５０】Next, the method of registering a word pattern in the registered word dictionary is described. When registering a word pattern in the dictionary, similarity vector changeover switch S1 and regression coefficient vector changeover switch S2 are both connected to side b.

【００５１】At word registration, the registrant utters the same word one or more times. For each utterance, a series of similarity vectors and a series of regression coefficient vectors are calculated frame by frame through acoustic analysis unit 1, similarity calculation unit 2, and regression coefficient calculation unit 4, in the same way as during codebook creation. These two vector series are stored in word pattern storage unit 9 as the word pattern for that utterance.

【００５２】Average word pattern creation unit 110 obtains an average word pattern from the word patterns of the one or more utterances stored in word pattern storage unit 9. If the word was uttered only once for registration, that pattern is used as the average word pattern as it is.

【００５３】For each vector in the similarity vector series of the average word pattern, similarity vector encoding unit 111 selects the most similar code vector among those stored in similarity vector codebook 106 and converts the vector into the identifying code of that code vector. The series of similarity vectors is thereby converted into a one-dimensional code sequence.

【００５４】Similarly, for each vector in the regression coefficient vector series of the average word pattern, regression coefficient vector encoding unit 112 selects the most similar code vector among those stored in regression coefficient vector codebook 108 and converts the vector into the corresponding identifying code. The series of regression coefficient vectors is thereby converted into a one-dimensional code sequence.
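The encoding step for either vector series can be sketched as a nearest-code-vector lookup. This assumes the correlation cosine as the similarity measure and uses the position of a code vector in the codebook as its identifying code; both choices, and the function names, are illustrative.

```python
import math

def cosine(a, b):
    """Correlation cosine between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0

def encode_sequence(frames, codebook):
    """Replace each frame vector by the code of its most similar code vector."""
    return [
        max(range(len(codebook)), key=lambda c: cosine(frame, codebook[c]))
        for frame in frames
    ]
```

Each frame of the word pattern is thus reduced from a 24-dimensional vector to a single integer code.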

【００５５】The code sequence of the similarity vectors and the code sequence of the regression coefficient vectors obtained in this way are combined by coded dictionary word pattern creation unit 113 into parameters representing one word pattern and sent to coded registered word dictionary 114. In this way, coded registered word dictionary 114 stores as many word patterns, each expressed as code sequences, as there are words in the recognition vocabulary.

【００５６】In the above embodiment, vector quantization by clustering was shown as the method of creating the codebooks for the similarity vectors and regression coefficient vectors. It is also effective to start from an arbitrarily created codebook and optimize it with a learning method such as discriminative learning.

【００５７】In the same embodiment, the components of each code vector were ranked by absolute value and only the top six were selected. Retaining the components that are statistically most significant is also an effective selection criterion. In addition, although the number of selected components was fixed at six, it may instead be varied freely from vector to vector.

【００５８】The embodiment showed a method in which both the similarity vectors and the regression coefficient vectors are encoded with codebooks, but a method using only one of the two is also effective.

【００５９】Furthermore, the embodiment showed similarity vector codebook 106 and regression coefficient vector codebook 108 being created independently. It is also effective to create them jointly as a single codebook and use it to encode both the similarity vectors and the regression coefficient vectors.

【００６０】(Embodiment 2) FIG. 3 shows the configuration of the speech recognition apparatus according to Embodiment 2 of the present invention, which is described below.

【００６１】In FIG. 3, acoustic analysis unit 1, similarity calculation unit 2, standard pattern storage unit 3, and regression coefficient calculation unit 4 are the same as the corresponding units in FIG. 1 of Embodiment 1, so their detailed description is omitted. Reference numeral 15 denotes a word pattern creation unit that combines the similarity vector series and the regression coefficient vector series obtained for the input speech into a word pattern. 114 denotes the coded registered word dictionary storing the word patterns created in Embodiment 1. 106 denotes the similarity vector codebook created in Embodiment 1, and 108 denotes the regression coefficient vector codebook, also created in Embodiment 1. 116 denotes a word pattern restoration unit that restores the code sequences registered as dictionary word patterns in the coded registered word dictionary into a similarity vector series and a regression coefficient vector series. 17 denotes a word recognition unit that compares the input speech with the restored dictionary word patterns to obtain the recognition result.

【００６２】Next, the operation of the speech recognition apparatus configured as above is described. First, by the same method as in Embodiment 1, a series of similarity vectors and a series of regression coefficient vectors are calculated from the input speech at each 10 ms analysis interval, using acoustic analysis unit 1, similarity calculation unit 2, standard pattern storage unit 3, and regression coefficient calculation unit 4.

【００６３】Word pattern creation unit 15 combines the two vectors, the similarity vector and the regression coefficient vector, into one, which forms one frame of the word pattern of the input speech.

【００６４】Each time one frame of the input speech word pattern is obtained, word recognition unit 17 advances, step by step, time-aligned matching by DP matching against the frames of each word pattern stored in coded registered word dictionary 114 and computes a score. Because the word patterns stored in coded registered word dictionary 114 are encoded, word pattern restoration unit 116 restores them successively into similarity vectors and regression coefficient vectors using similarity vector codebook 106 and regression coefficient vector codebook 108.

【００６５】That is, when the DP matching in word recognition unit 17 requests the i-th frame of word w stored in coded registered word dictionary 114, word pattern restoration unit 116 uses similarity vector codebook 106 and regression coefficient vector codebook 108 to restore the codes of that frame to the corresponding code vectors, which are passed to word recognition unit 17.

【００６６】Repeating this process enables a recognition calculation that does not require a large area for storing restored registered word patterns. The above processing is carried out for every word in the dictionary, and the word that yields the highest score is the recognition result.
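The frame-synchronous matching with on-demand restoration can be sketched as below. This is an illustration under several assumptions not stated in the text: the local distance is one minus the correlation cosine of the concatenated similarity and regression vectors, the DP path allows the three usual steps, the accumulated distance is normalized by the sum of the two lengths, and the lowest distance stands in for the highest score. Uncompressed codebooks are used for brevity, and all function names are hypothetical.

```python
import math

def cosine(a, b):
    """Correlation cosine between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0

def restore_frame(codes, sim_codebook, reg_codebook):
    """Look up one coded frame: (similarity code, regression code) -> vector."""
    sim_code, reg_code = codes
    return sim_codebook[sim_code] + reg_codebook[reg_code]

def dp_distance(input_frames, word_codes, sim_codebook, reg_codebook):
    """Normalized DP distance; dictionary frames are restored only when needed.

    Only two columns of the DP table are kept, so memory does not grow
    with the number of input frames.
    """
    inf = float("inf")
    length = len(word_codes)
    prev = None
    for x in input_frames:
        cur = [inf] * length
        for j in range(length):
            ref = restore_frame(word_codes[j], sim_codebook, reg_codebook)
            local = 1.0 - cosine(x, ref)
            if prev is None:
                best = 0.0 if j == 0 else cur[j - 1]
            else:
                diag = prev[j - 1] if j > 0 else inf
                left = cur[j - 1] if j > 0 else inf
                best = min(prev[j], diag, left)
            cur[j] = local + best
        prev = cur
    return prev[-1] / (len(input_frames) + length)

def recognize(input_frames, dictionary, sim_codebook, reg_codebook):
    """Return the dictionary word whose coded pattern is closest to the input."""
    return min(
        dictionary,
        key=lambda w: dp_distance(input_frames, dictionary[w], sim_codebook, reg_codebook),
    )
```

Here `dictionary` maps each word to a list of (similarity code, regression code) pairs, one pair per frame, so the full restored pattern of a word never has to be held in memory at once.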

【００６７】In this embodiment, similarity vector codebook 106 and regression coefficient vector codebook 108 used to restore word patterns from the registered word dictionary were the same codebooks used when the registered word dictionary was created. They need not be identical, however; codebooks optimized for word pattern restoration may be used instead.

【００６８】DP matching was used to compare the word pattern of the input speech with the word patterns restored from the registered word dictionary, but other matching methods based on time alignment are also effective.

【００６９】(Examples) Two recognition experiments were carried out using Embodiments 1 and 2 described above. The first examined the relationship between the size of similarity vector codebook 106 and regression coefficient vector codebook 108 and the word recognition rate. The second determined, for the code vector compression method, the number of components to select from each code vector. For codebook creation, data consisting of 543 Japanese words uttered by 10 speakers (5 male, 5 female) was used. For determining the codebook size and the number of selected code vector components, data consisting of 100 Japanese place names uttered three times each by 50 speakers (25 male, 25 female) was used; two of the three utterances were used for word registration, and the remaining one was used for recognition evaluation.

【００７０】In these recognition experiments, patterns of 24 Japanese phonemes were used as the standard patterns for obtaining the similarity vector series from the input speech, so the similarity vectors and regression coefficient vectors each had 24 dimensions. The codebooks for the similarity vectors and regression coefficient vectors were created by VQ (vector quantization), and the correlation cosine defined by (Equation 1) was used as the distance between vectors.

【００７１】[0071]

【数１】 (Equation 1)
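The image for (Equation 1) did not survive in this text. For reference, the correlation cosine between two vectors is conventionally defined as shown below. The symbols x, y, and n (here n = 24) are notation chosen for this note and are not taken from the original equation.

```latex
\cos(\mathbf{x}, \mathbf{y}) =
  \frac{\sum_{i=1}^{n} x_i \, y_i}
       {\sqrt{\sum_{i=1}^{n} x_i^{2}} \; \sqrt{\sum_{i=1}^{n} y_i^{2}}}
```

A value of 1 means the two vectors point in the same direction, so a larger value indicates a closer match.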

【００７２】DP matching was used both for obtaining the average pattern from multiple word patterns of the registration speech and for matching the input speech against the dictionary word patterns restored from the registered word dictionary.

【００７３】FIG. 4 shows how the recognition rate changes with the size of similarity vector codebook 106 and regression coefficient vector codebook 108. The two codebooks were given the same size; the horizontal axis shows the number of code vectors in each codebook, and the vertical axis shows the recognition rate. In this experiment, no component selection was applied to the code vectors in either codebook.

【００７４】According to this result, with a codebook size of 40 the recognition rate is 97.4%, a drop of only 1.0 percentage point from the 98.4% of the conventional method. At the same time, the registered word dictionary, which requires about 5.5 kilobytes per registered word on average with the conventional method, is compressed to about 120 bytes on average, a compression to 1/48.
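The reported figures can be checked with simple arithmetic. The storage widths below (2 bytes per vector component, 1 byte per code) are assumptions, not stated in the text; they are chosen because they reproduce the 1/48 ratio exactly.

```python
DIM = 24                 # dimensions of each vector (from the text)
BYTES_PER_VALUE = 2      # assumption: each component stored in 2 bytes
BYTES_PER_CODE = 1       # assumption: one of 40 codes fits in 1 byte

# Conventional: one similarity vector plus one regression vector per frame.
raw_bytes_per_frame = 2 * DIM * BYTES_PER_VALUE        # 96
# Coded: one similarity code plus one regression code per frame.
coded_bytes_per_frame = 2 * BYTES_PER_CODE             # 2

ratio = raw_bytes_per_frame // coded_bytes_per_frame   # 48

# An average coded word of about 120 bytes implies about 60 frames (0.6 s).
frames_per_word = 120 // coded_bytes_per_frame         # 60
raw_bytes_per_word = frames_per_word * raw_bytes_per_frame  # 5760
```

Under these assumptions an average word of 60 frames takes 5760 bytes uncoded, close to the roughly 5.5 kilobytes quoted for the conventional method.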

【００７５】FIG. 5 shows how the recognition rate changes when components are selected from each code vector in similarity vector codebook 106 and regression coefficient vector codebook 108. The same number of components was selected for the similarity vectors and the regression coefficient vectors; the horizontal axis shows that number, and the vertical axis shows the recognition rate. In this experiment, both codebooks had the same size of 40 code vectors. According to this result, even when only six of the 24 components of each code vector are selected, the recognition rate is 97.3%, a drop of only 0.1 percentage point from the 97.4% obtained without component selection.

【００７６】In this case, the size of the codebook is reduced from 48 bytes to 18 bytes per code vector.
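One storage layout that reproduces these two figures is sketched below. The widths (2 bytes per component value, 1 byte per component index) are assumptions; the text gives only the totals.

```python
DIM = 24              # components in a full code vector (from the text)
KEPT = 6              # components retained after selection (from the text)
BYTES_PER_VALUE = 2   # assumption
BYTES_PER_INDEX = 1   # assumption: an index 0..23 fits in 1 byte

full_size = DIM * BYTES_PER_VALUE                              # 48
compressed_size = KEPT * (BYTES_PER_VALUE + BYTES_PER_INDEX)   # 18
```

With these widths the compressed code vector is 18/48 of the full one, which is below one half.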

【００７７】[0077]

【発明の効果】[Effects of the Invention] As described above, by obtaining a small number of code vectors that represent the spatial distribution of the many similarity vectors and regression coefficient vectors obtained from training speech data, the present invention makes it possible to replace parameters that conventionally had to be stored as continuous-valued vectors with one-dimensional parameters that represent them with only a small error. As a result, the size of the registered word dictionary can be reduced substantially, to 1/48 of that of the conventional method, with almost no loss in word recognition rate.

【００７８】The codebooks that are additionally required can also be reduced to less than half the size of an ordinarily created codebook by selecting code vector components, and the number of multiplications needed for distance calculation is reduced as well. The present invention thus contributes greatly to the practical application of utterance-registration speech recognition apparatus.

[Brief description of the drawings]

【図１】FIG. 1 is a block diagram of the registered word dictionary creation apparatus according to Embodiment 1 of the present invention.

【図２】FIG. 2 is a structural diagram of an ordinary codebook and a compressed codebook in the same embodiment.

【図３】FIG. 3 is a block diagram of the speech recognition apparatus according to Embodiment 2 of the present invention.

【図４】FIG. 4 is a diagram showing the experimental result on the change in recognition rate with codebook size in the same embodiment.

【図５】FIG. 5 is a diagram showing the experimental result on the change in recognition rate when components of the code vectors in the codebook are selected in the same embodiment.

【図６】FIG. 6 is a block diagram of a registered word dictionary creation apparatus using a conventional utterance registration method.

【図７】FIG. 7 is a block diagram of a speech recognition apparatus using a conventional utterance registration method.

[Explanation of symbols]

1 acoustic analysis unit
2 similarity calculation unit
3 standard pattern storage unit
4 regression coefficient calculation unit
9 word pattern storage unit
10 dictionary word pattern creation unit
14 registered word dictionary
15 word pattern creation unit
17 word recognition unit
18 code storage unit
19 code vector storage unit
20 compressed code vector storage unit
105 similarity code vector creation unit
106 similarity vector codebook creation unit
107 regression coefficient code vector creation unit
108 regression coefficient vector codebook
110 average word pattern creation unit
111 similarity vector encoding unit
112 regression coefficient vector encoding unit
113 coded dictionary word pattern creation unit
114 coded registered word dictionary
116 word pattern restoration unit

Continuation of front page

(72) Inventor: Taisuke Watanabe, c/o Matsushita Giken Co., Ltd., 3-10-1 Higashi-Mita, Tama-ku, Kawasaki-shi, Kanagawa

F-terms (reference): 5D015 AA02 BB01 FF05 GG04; 5D045 CB01 DA11; 9A001 EE04 HH17

Claims

[Claims]

1. A method for creating a registered word dictionary, characterized by encoding, using a codebook, a word pattern obtained from speech uttered for vocabulary registration, and registering the resulting code sequence in a registered word dictionary as a dictionary word pattern.

2. A method for creating a registered word dictionary, characterized in that: m (m is an integer) acoustic feature parameters, obtained for each frame from a dictionary registration word uttered by a registrant, are matched against the m feature parameters of each of n (n is an integer) standard patterns created in advance from learning data; n similarities are thereby obtained for each frame to form an n-dimensional similarity vector sequence; this sequence is encoded frame by frame using a codebook obtained in advance from similarity vectors of the learning data; and the resulting code sequence is registered in a registered word dictionary as a dictionary word pattern.

3. A method for creating a registered word dictionary, characterized in that: m (m is an integer) acoustic feature parameters, obtained for each frame from a dictionary registration word uttered by a registrant, are matched against the m feature parameters of each of n (n is an integer) standard patterns created in advance from learning data; n similarities are thereby obtained for each frame to form an n-dimensional similarity vector sequence; for each of the n time series of similarities, the time variation of the similarity is calculated for each frame; the resulting sequence of n-dimensional vectors of the time variation of the similarity is encoded frame by frame using a codebook obtained in advance from n-dimensional vectors of the time variation of the similarity of the learning data; and the resulting code sequence is registered as a dictionary word pattern of a registered word dictionary.

4. The method for creating a registered word dictionary according to claim 2, wherein a code sequence obtained by encoding the n-dimensional similarity vector sequence using a codebook and a code sequence obtained by encoding the sequence of n-dimensional vectors of the time variation of the similarity using a codebook are registered as the dictionary word pattern.

5. The method for creating a registered word dictionary according to claim 2, wherein, when the codebook is created from the similarity vectors of the learning data, the codebook is built from code vectors in which only N components, N being smaller than n, are selected from each n-dimensional code vector.

6. The method for creating a registered word dictionary according to claim 3, wherein, when the codebook of vectors of the time variation of the similarity is created, the codebook is built from code vectors in which only M components, M being smaller than n, are selected from each n-dimensional code vector.

7. The method for creating a registered word dictionary according to claim 4, wherein the registered word dictionary is created using a codebook of similarity vectors built from code vectors in which only N components, N being smaller than n, are selected, and a codebook of vectors of the time variation of the similarity built from code vectors in which only M components, M being smaller than n, are selected.

8. A speech recognition method, characterized in that a coded dictionary word pattern from a registered word dictionary created in advance is decoded using a codebook obtained in advance from similarity vectors of learning data, and uttered input speech is recognized using the decoded dictionary word pattern.

9. A speech recognition method, characterized in that: m (m is an integer) acoustic feature parameters, obtained for each frame from uttered input speech, are matched against the m feature parameters of each of n (n is an integer) standard patterns created in advance from learning data; n similarities are thereby obtained for each frame to form an n-dimensional similarity vector sequence; a dictionary word pattern is restored frame by frame from a registered word dictionary created in advance, using a codebook obtained in advance from similarity vectors of the learning data; and the input speech is recognized using this dictionary word pattern.

10. A speech recognition method, characterized in that: m (m is an integer) acoustic feature parameters, obtained for each frame from uttered input speech, are matched against the m feature parameters of each of n (n is an integer) standard patterns created in advance from learning data; n similarities are thereby obtained for each frame to form an n-dimensional similarity vector sequence; for each of the n time series of similarities, the time variation of the similarity is obtained for each frame to form a sequence of n-dimensional vectors of the time variation of the similarity; a dictionary word pattern is restored frame by frame from a registered word dictionary created in advance, using a codebook obtained in advance from n-dimensional vectors of the time variation of the similarity of the learning data; and the input speech is recognized using this dictionary word pattern.

11. A speech recognition method, characterized in that: m (m is an integer) acoustic feature parameters, obtained for each frame from uttered input speech, are matched against the m feature parameters of each of n (n is an integer) standard patterns created in advance from learning data; n similarities are thereby obtained for each frame to form an n-dimensional similarity vector sequence; for each of the n time series of similarities, the time variation of the similarity is obtained for each frame; the n-dimensional similarity vector sequence and the sequence of n-dimensional vectors of the time variation of the similarity are combined into one to form the word pattern of the input speech; a dictionary word pattern is restored frame by frame from a registered word dictionary created in advance, using a codebook obtained in advance from the similarities of the learning data; and the input speech is recognized using this dictionary word pattern.

12. The speech recognition method according to claim 8, wherein the registered word dictionary is a dictionary created by the method according to any one of claims 1 to 7.

13. A registered word dictionary creating apparatus, comprising: acoustic analysis means for obtaining, for each frame, m (m is an integer) acoustic feature parameters from a dictionary registration word uttered by a registrant; standard pattern storage means for storing the m feature parameters of each of n (n is an integer) standard patterns created in advance from learning data; similarity calculating means for matching the m feature parameters from the standard pattern storage means against the m acoustic feature parameters from the acoustic analysis means, obtaining n similarities for each frame, and forming an n-dimensional similarity vector sequence; similarity vector encoding means for encoding the sequence frame by frame using a codebook obtained in advance from similarity vectors of the learning data; and coded registered word dictionary storage means for registering the encoded code sequence in a registered word dictionary as a dictionary word pattern.

14. A registered word dictionary creating apparatus, comprising: acoustic analysis means for obtaining, for each frame, m (m is an integer) acoustic feature parameters from a dictionary registration word uttered by a registrant; standard pattern storage means for storing the m feature parameters of each of n (n is an integer) standard patterns created in advance from learning data; similarity calculating means for matching the m feature parameters from the standard pattern storage means against the m acoustic feature parameters from the acoustic analysis means, obtaining n similarities for each frame, and forming an n-dimensional similarity vector sequence; regression coefficient calculating means for obtaining, for each of the n time series of similarities, the time variation of the similarity for each frame, and forming a sequence of n-dimensional vectors of the time variation of the similarity; regression coefficient vector encoding means for encoding that sequence frame by frame using a codebook obtained in advance from n-dimensional vectors of the time variation of the similarity of the learning data; and coded registered word dictionary storage means for registering the encoded code sequence in a registered word dictionary as a dictionary word pattern.

15. The registered word dictionary creating apparatus, wherein the coded registered word dictionary storage means registers, as the dictionary word pattern, a code sequence obtained by encoding the n-dimensional similarity vector sequence using a codebook and a code sequence obtained by encoding the sequence of n-dimensional vectors of the time variation of the similarity using a codebook.

16. A speech recognition apparatus, comprising: acoustic analysis means for obtaining, for each frame, m (m is an integer) acoustic feature parameters from uttered input speech; standard pattern storage means for registering the m feature parameters of each of n (n is an integer) standard patterns created in advance from learning data; similarity calculating means for matching the m acoustic feature parameters against the m feature parameters, obtaining n similarities for each frame, and forming an n-dimensional similarity vector sequence; word pattern restoring means for restoring, frame by frame, an n-dimensional similarity vector sequence as a dictionary word pattern from a registered word dictionary created in advance, using a codebook obtained in advance from similarity vectors of the learning data; and word recognition means for recognizing the input speech using the restored dictionary word pattern.

17. A speech recognition apparatus, comprising: acoustic analysis means for obtaining, for each frame, m (m is an integer) acoustic feature parameters from uttered input speech; standard pattern storage means for registering the m feature parameters of each of n (n is an integer) standard patterns created in advance from learning data; similarity calculating means for matching the m acoustic feature parameters against the m feature parameters, obtaining n similarities for each frame, and forming an n-dimensional similarity vector sequence; regression coefficient calculating means for obtaining, for each of the n time series of similarities, the time variation of the similarity for each frame, and forming a sequence of n-dimensional vectors of the time variation of the similarity; word pattern restoring means for restoring, frame by frame, a dictionary word pattern from a registered word dictionary created in advance, using a codebook obtained in advance from n-dimensional vectors of the time variation of the similarity of the learning data; and word recognition means for recognizing the input speech using the restored dictionary word pattern.

18. A speech recognition apparatus, comprising: acoustic analysis means for obtaining, for each frame, m (m is an integer) acoustic feature parameters from uttered input speech; standard pattern storage means for registering the m feature parameters of each of n (n is an integer) standard patterns created in advance from learning data; similarity calculating means for matching the m acoustic feature parameters against the m feature parameters, obtaining n similarities for each frame, and forming an n-dimensional similarity vector sequence; regression coefficient calculating means for obtaining, for each of the n time series of similarities, the time variation of the similarity for each frame, and forming a sequence of n-dimensional vectors of the time variation of the similarity; word pattern creating means for combining the n-dimensional similarity vector sequence from the similarity calculating means and the sequence of n-dimensional vectors of the time variation of the similarity from the regression coefficient calculating means into one to form the word pattern of the input speech; word pattern restoring means for restoring, frame by frame, a dictionary word pattern from a registered word dictionary, using a codebook obtained in advance from n-dimensional vectors of the time variation of the similarity of the learning data; and word recognition means for recognizing the input speech using the restored dictionary word pattern.

19. The speech recognition apparatus according to any one of claims 16 to 18, wherein the registered word dictionary is a dictionary created by the registered word dictionary creating apparatus according to claim 13.
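
As an illustration only, the flow recited in the claims (encode the per-frame similarity vectors and their time variation with codebooks at registration, then restore the dictionary word pattern from the codes and match it against the input at recognition) can be sketched roughly as follows. All function names, the toy two-dimensional codebooks, the three-frame regression window, the nearest-neighbor Euclidean quantizer, and the use of plain dynamic time warping as the matcher are assumptions made for this sketch; the claims do not specify them.

```python
import math

def nearest_code(vec, codebook):
    """Index of the code vector closest to vec (squared Euclidean distance)."""
    return min(range(len(codebook)),
               key=lambda k: sum((a - b) ** 2 for a, b in zip(vec, codebook[k])))

def time_variation(seq, half_width=1):
    """Least-squares slope of each similarity over +/- half_width frames (assumed window)."""
    last = len(seq) - 1
    den = sum(d * d for d in range(-half_width, half_width + 1))
    out = []
    for t in range(len(seq)):
        slope = [0.0] * len(seq[0])
        for d in range(-half_width, half_width + 1):
            frame = seq[min(max(t + d, 0), last)]  # clamp at the utterance edges
            for i, x in enumerate(frame):
                slope[i] += d * x / den
        out.append(slope)
    return out

def encode_word(sim_seq, sim_cb, reg_cb):
    """Registration: one (similarity code, time-variation code) pair per frame."""
    return [(nearest_code(s, sim_cb), nearest_code(r, reg_cb))
            for s, r in zip(sim_seq, time_variation(sim_seq))]

def decode_word(codes, sim_cb, reg_cb):
    """Recognition: restore the dictionary word pattern from its code sequence."""
    return [sim_cb[i] + reg_cb[j] for i, j in codes]

def dtw_distance(a, b):
    """Dynamic time warping with Euclidean local distance (assumed matcher)."""
    inf = float("inf")
    acc = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    acc[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = math.dist(a[i - 1], b[j - 1])
            acc[i][j] = cost + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    return acc[len(a)][len(b)]

def recognize(sim_seq, dictionary, sim_cb, reg_cb):
    """Return the registered word whose restored pattern is closest to the input."""
    pattern = [s + r for s, r in zip(sim_seq, time_variation(sim_seq))]
    return min(dictionary,
               key=lambda w: dtw_distance(pattern, decode_word(dictionary[w], sim_cb, reg_cb)))

# Hypothetical toy codebooks (n = 2) and two registered words.
SIM_CB = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
REG_CB = [[0.0, 0.0], [-0.5, 0.5], [0.5, -0.5]]
dictionary = {
    "up": encode_word([[0.9, 0.1], [0.6, 0.4], [0.1, 0.9]], SIM_CB, REG_CB),
    "down": encode_word([[0.1, 0.9], [0.5, 0.5], [0.9, 0.2]], SIM_CB, REG_CB),
}
print(recognize([[0.8, 0.2], [0.7, 0.3], [0.4, 0.6], [0.2, 0.8]], dictionary, SIM_CB, REG_CB))
```

In this sketch the dictionary holds only two small integers per frame instead of 2n real values, which is the storage saving the codebook approach aims at; the input utterance itself is left unquantized and only the dictionary side is restored from codes.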