JPH04298795A - Standard pattern generation device

Info

Publication number: JPH04298795A
Application number: JP3051559A
Authority: JP
Inventors: Tadashi Suzuki (鈴木 忠); Kunio Nakajima (中島 邦男)
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 1991-03-15
Filing date: 1991-03-15
Publication date: 1992-10-22
Anticipated expiration: 2015-02-14
Also published as: JP3008520B2

Abstract

PURPOSE: To obtain a standard pattern generation device that can generate code label time series with different articulation modes without increasing the number of standard speakers, in a standard pattern generation device that generates such code label time series using a plurality of standard speakers.

CONSTITUTION: A weighted distance calculation means 11 calculates, according to weight data input from a weight data input terminal 10, the weighted distance between each dimension-expanded feature vector of a dimension-expanded feature vector time series, obtained from the feature vector time series group of the speech of a plurality of standard speakers, and each code in a dimension-expanded codebook, and outputs the result as distance data. A minimum distance code search means 12 receives the distance data output by the weighted distance calculation means 11, searches for the code with the minimum distance to each dimension-expanded feature vector, generates a code label time series, and writes it to a dictionary memory 7.

Description

[Detailed description of the invention]

[0001]

[Industrial Field of Application] The present invention relates to a standard pattern creation device for speech recognition.

[0002]

[Prior Art] FIG. 2 is a block diagram of a conventional standard pattern creation device described in the Proceedings of the 1990 Spring Meeting of the Acoustical Society of Japan, 2-3-3, "Study of a speaker-adaptive speaker-independent word speech recognition method" (Suzuki, Nakajima; March 1990). In the figure, 1 is an input terminal that receives speech uttered by a standard speaker. 2 is an acoustic analysis means that analyzes the speech input from input terminal 1 and converts it into a feature vector time series. 3 is a time axis normalization means: when speech of the same category uttered by a plurality of standard speakers is input, it normalizes the feature vector time series group output by acoustic analysis means 2 along the time axis and outputs an equal-length feature vector time series group. 4 is a dimension expansion means that generates a dimension-expanded feature vector time series by multiplexing the feature vectors of the equal-length feature vector time series group at each common time point. 5 is a dimension-expanded codebook memory that stores a dimension-expanded codebook. 6 is a dimension expansion quantization means that vector-quantizes the dimension-expanded feature vector time series using the dimension-expanded codebook and converts it into a code label time series. 7 is a dictionary memory that stores the code label time series. 8 is a feature vector extraction means that extracts the feature vectors corresponding to each standard speaker from the dimension-expanded codebook and generates a codebook for each standard speaker. 9 is a vector quantization means that vector-quantizes the feature vector time series of the standard speaker's speech output by the acoustic analysis means, using the per-speaker codebooks output by the feature vector extraction means, to generate code label time series and write them to the dictionary memory.

[0003] Next, the operation is described. As an example, consider creating the standard patterns belonging to category i (i is an integer from 1 to I, where I is the number of categories) using N standard speakers (N > 1). It is assumed that the dimension-expanded codebook of Math 1 (J is the codebook size), created in advance from the speech of these N standard speakers, is already stored in the dimension-expanded codebook memory.

[0004]

[Math 1]

[0005] Here, Vj is the codeword numbered j and is expressed by Math 2.

[0006]

[Math 2]

[0007] In Math 2, Math 3

[0008]

[Math 3]

[0009] is the M-dimensional feature vector corresponding to the speech of the standard speaker assigned speaker number n (n is an integer from 1 to N; hereinafter, standard speaker n), and is expressed by Math 4.

[0010]

[Math 4]

[0011] Hence Vj is an M×N-dimensional vector. The construction of the dimension-expanded codebook is described in detail in the Proceedings of the 1988 Autumn Meeting of the Acoustical Society of Japan, 1-3-22, "Application of a speaker adaptation method based on vector quantization to continuous speech recognition" (Suzuki, Nakajima, Ishikawa; October 1988).

[0012] Speech of category i uttered by standard speaker n is input to acoustic analysis means 2 through input terminal 1. Acoustic analysis means 2 analyzes the input speech and outputs the feature vector time series expressed by Math 5.

[0013]

[Math 5]

[0014] Here, Tn,i is the length of the sequence.

[0015] Performing this acoustic analysis for all standard speakers (n = 1...N) produces the feature vector time series group expressed by Math 6.

[0016]

[Math 6]

[0017] The feature vector time series group of Math 6 is input to time axis normalization means 3 and converted into the equal-length feature vector time series group of Math 7, in which all sequences have the same length.

[0018]

[Math 7] Here,

[0019]

[Math 8]

[0020] (where Ti is the sequence length and does not depend on speaker n).
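The text does not say how time axis normalization means 3 brings the sequences to a common length Ti. The following is only a minimal sketch that assumes linear interpolation along the time axis, a technique substituted here for illustration and not taken from the patent. The names `resample` and `normalize_time_axis` are hypothetical.

```python
from typing import List

Vector = List[float]


def resample(sequence: List[Vector], target_length: int) -> List[Vector]:
    """Linearly interpolate a feature vector sequence to target_length frames.

    Assumption: linear interpolation stands in for the unspecified
    normalization method of time axis normalization means 3.
    """
    source_length = len(sequence)
    if source_length == 1 or target_length == 1:
        return [list(sequence[0]) for _ in range(target_length)]
    result = []
    for t in range(target_length):
        # Position of output frame t on the input time axis.
        pos = t * (source_length - 1) / (target_length - 1)
        lo = int(pos)
        hi = min(lo + 1, source_length - 1)
        frac = pos - lo
        result.append(
            [(1 - frac) * a + frac * b for a, b in zip(sequence[lo], sequence[hi])]
        )
    return result


def normalize_time_axis(group: List[List[Vector]], target_length: int) -> List[List[Vector]]:
    """Bring every speaker's sequence in the group to the same length."""
    return [resample(sequence, target_length) for sequence in group]


# Two speakers with different sequence lengths (M = 1 for brevity).
group = [[[0.0], [2.0]], [[0.0], [1.0], [2.0], [3.0]]]
equal_length = normalize_time_axis(group, 3)
```

After this step every sequence in `equal_length` has three frames, which is the precondition for the frame-by-frame multiplexing performed by dimension expansion means 4.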

[0021] Dimension expansion means 4 receives the equal-length feature vector time series group of Math 7 and, from each equal-length feature vector time series, extracts the feature vector of Math 9 at the same time point t,

[0022]

[Math 9]

[0023] and multiplexes these to generate the dimension-expanded feature vector of Math 10.

[0024]

[Math 10]

[0025] Expressed as a formula, this is as follows.

[0026]

[Math 11]

[0027] Here, Math 9 is the feature vector at time t of the equal-length feature vector time series of Math 8 and is expressed by Math 12.

[0028]

[Math 12]

[0029] Here, Math 13 is the m-th element of the feature vector of Math 9, and M is the dimensionality of the feature vector.

[0030]

[Math 13]

[0031] The dimension-expanded feature vector therefore has M×N dimensions. The dimension-expanded feature vector of Math 10 is computed for t = 1...Ti, and the dimension-expanded feature vector time series expressed by Math 14 is output.
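The multiplexing step can be sketched as follows. This is an illustrative reading of the description, not the patent's own formula: it assumes that multiplexing means concatenating the N speakers' M-dimensional vectors at each time point in speaker order, and the function name `expand_dimensions` is hypothetical.

```python
from typing import List

Vector = List[float]


def expand_dimensions(sequences: List[List[Vector]]) -> List[Vector]:
    """Concatenate the time-t vectors of N equal-length sequences.

    sequences[n][t] is the M-dimensional vector of speaker n at time t.
    Returns Ti vectors, each of dimension M * N (speaker-major layout,
    which is an assumption of this sketch).
    """
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("all sequences must have the same length")
    return [
        [value for seq in sequences for value in seq[t]]
        for t in range(length)
    ]


# N = 2 speakers, M = 2 dimensions, Ti = 2 frames.
speaker_1 = [[1.0, 2.0], [3.0, 4.0]]
speaker_2 = [[5.0, 6.0], [7.0, 8.0]]
expanded = expand_dimensions([speaker_1, speaker_2])
```

Each element of `expanded` has M×N = 4 components, matching the dimensionality stated in the text.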

[0032]

[Math 14]

[0033] Dimension expansion quantization means 6 receives the dimension-expanded feature vector time series of Math 14. For each dimension-expanded feature vector of Math 10, it searches the dimension-expanded codebook of Math 1 stored in dimension-expanded codebook memory 5 for the code that minimizes the Euclidean distance, and obtains the code label corresponding to that code. This yields the code label time series expressed by Math 15, which is written to dictionary memory 7.
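The conventional quantization step is an ordinary nearest-neighbour search. The sketch below is a minimal illustration under the assumption that code labels are simply the codeword numbers j = 1...J; the function name `quantize` is hypothetical.

```python
import math
from typing import List

Vector = List[float]


def quantize(sequence: List[Vector], codebook: List[Vector]) -> List[int]:
    """Return, for each vector, the 1-based number of the nearest codeword.

    Distance is the plain Euclidean distance; ties go to the lowest number.
    """
    labels = []
    for vector in sequence:
        distances = [math.dist(vector, code) for code in codebook]
        labels.append(distances.index(min(distances)) + 1)
    return labels


# J = 2 codewords of dimension M * N = 4, and a three-frame input.
codebook = [[0.0, 0.0, 0.0, 0.0], [1.0, 1.0, 1.0, 1.0]]
sequence = [[0.1, 0.0, 0.0, 0.2], [0.9, 1.0, 1.1, 1.0], [0.2, 0.1, 0.0, 0.0]]
labels = quantize(sequence, codebook)
```

Because all N speakers contribute equally to the distance, the resulting label sequence reflects an average over the speakers, which is the property the text attributes to this code label time series.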

[0034]

[Math 15]

[0035] Here, Math 16 is a code label.

[0036]

[Math 16]

[0037] As described above, the code label time series of Math 15 is created by vector-quantizing the dimension-expanded feature vectors obtained by superimposing the feature vectors of the speech of a plurality of standard speakers, so it has an articulation mode that is the average over those standard speakers.

[0038] Feature vector extraction means 8 extracts the feature vector of standard speaker n from codeword Vj of the dimension-expanded codebook of Math 1 stored in dimension-expanded codebook memory 5, following the relation of Math 2, and obtains the codeword of Math 4. Doing this for j = 1...J generates the codebook of Math 17 for standard speaker n.

[0039]

[Math 17]

[0040] Performing this processing for all standard speakers creates the codebooks of Math 18 for all standard speakers.
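Per-speaker codebook extraction amounts to slicing each M×N-dimensional codeword into N blocks of M components. The sketch below assumes a speaker-major layout in which speaker n's components are contiguous; the function name `split_codebook` is hypothetical.

```python
from typing import List

Vector = List[float]


def split_codebook(expanded_codebook: List[Vector], n_speakers: int) -> List[List[Vector]]:
    """Split a dimension-expanded codebook into one codebook per speaker.

    Assumes each codeword stores speaker n's M components contiguously,
    so result[n][j] is speaker n's part of codeword j (0-based indices).
    """
    dim = len(expanded_codebook[0]) // n_speakers
    return [
        [code[n * dim:(n + 1) * dim] for code in expanded_codebook]
        for n in range(n_speakers)
    ]


# J = 2 codewords, N = 2 speakers, M = 2 dimensions.
expanded_codebook = [[0.0, 0.0, 10.0, 10.0], [1.0, 1.0, 11.0, 11.0]]
per_speaker = split_codebook(expanded_codebook, 2)
```

Each per-speaker codebook keeps the same codeword numbering as the dimension-expanded codebook, so labels produced with different speakers' codebooks refer to corresponding codes.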

[0041]

[Math 18]

[0042] Vector quantization means 9 vector-quantizes the feature vector time series of Math 5, which belongs to the feature vector time series group of Math 6 output by acoustic analysis means 2, using the codebook of Math 17 for standard speaker n output by feature vector extraction means 8, and generates the code label time series of Math 19

[0043]

[Math 19]

[0044] (where Math 20 is a code label and Tn,i is the sequence length).

[0045]

[Math 20]

[0046] Doing this for n = 1...N generates the code label time series group of Math 21, which is written to the dictionary memory.

[0047]

[Math 21]

[0048] As described above, the generated code label time series are each created from the speech of an individual standard speaker, so they have mutually different articulation modes.

[0049] Through the above operations, N+1 code label time series with mutually different articulation modes, corresponding to speech of category i, are stored in the dictionary memory.

[0050]

[Problems to be Solved by the Invention] Because the conventional standard pattern creation device is configured as described above, it generates, for speech of category i, one code label time series that averages the articulation modes of the N standard speakers plus N code label time series that each follow the articulation mode of one standard speaker, for a total of N+1 code label time series. When such a device is used in a speaker-adaptive speech recognition system to improve the recognition rate by covering the variety of articulation modes of input speakers, the number of standard speakers has to be increased in order to increase the number of code label time series with different articulation modes.

[0051] The present invention was made to solve this problem, and its object is to obtain a standard pattern creation device that can create code label time series with different articulation modes without increasing the number of standard speakers.

[0052]

[Means for Solving the Problems] The standard pattern creation device according to the present invention comprises: acoustic analysis means that receives speech of the same category uttered by a plurality of standard speakers, performs signal analysis, and converts the speech into a feature vector time series group; time axis normalization means that normalizes the time axis of the feature vector time series group output by the acoustic analysis means; dimension expansion means that generates a dimension-expanded feature vector time series from the equal-length feature vector time series group output by the time axis normalization means; a dimension-expanded codebook memory that stores a dimension-expanded codebook; weighted distance calculation means that calculates, according to weight data input from a weight data input terminal, the weighted distance between each code of the dimension-expanded codebook and each dimension-expanded feature vector of the dimension-expanded feature vector time series; minimum distance code search means that receives the distance data obtained by the weighted distance calculation means and outputs a code label time series; and a dictionary memory that stores the code label time series. Hereinafter, the weighted distance calculation means and the minimum distance code search means are together referred to as the weighted vector quantization means.

[0053]

[Operation] In the weighted vector quantization means of the present invention, the weighted distance calculation means computes, according to the weight data input from the weight data input terminal, the weighted distance between each dimension-expanded feature vector of the dimension-expanded feature vector time series and each code stored in the dimension-expanded codebook memory. The minimum distance code search means, the other component of the weighted vector quantization means, receives the distance data output by the weighted distance calculation means, searches for the code that gives the minimum distance to each dimension-expanded feature vector, creates a code label time series, and writes it to the dictionary memory. Therefore, each time the weight data input from the weight data input terminal is changed and the processing of the weighted vector quantization means is repeated, a code label time series with a different articulation mode is generated.

[0054]

[Embodiment] An embodiment of the present invention is described below with reference to the drawings. In FIG. 1, 1 is an input terminal, 2 is an acoustic analysis means, 3 is a time axis normalization means, 4 is a dimension expansion means, 5 is a dimension-expanded codebook memory, and 7 is a dictionary memory. These are identical or equivalent to the parts of the conventional device bearing the same reference numerals in FIG. 2, so their detailed description is omitted.

[0055] 10 is a weight data input terminal for inputting weight data. 11 is a weighted distance calculation means that calculates, according to the weight data input from weight data input terminal 10, the weighted distance between each feature vector of the dimension-expanded feature vector time series output by dimension expansion means 4 and each code of the dimension-expanded codebook stored in dimension-expanded codebook memory 5. 12 is a minimum distance code search means that receives the distance data output by weighted distance calculation means 11, searches for the code that gives the minimum distance to each vector of the dimension-expanded feature vector time series, creates a code label time series, and writes it to the dictionary memory. 13 is a weighted vector quantization means composed of weighted distance calculation means 11 and minimum distance code search means 12.

[0056] Next, the operation is described. As in the conventional case, there are N standard speakers (N > 1), and the dimension-expanded codebook is assumed to be already stored in the dimension-expanded codebook memory. Also as before, the example considered is the creation of code label time series for speech of category i and their storage in the dictionary memory. The operation from speech input through the dimension expansion means is the same as in the conventional device and is not repeated; the description starts from weight data input terminal 10.

[0057] Weight data Wn (n = 1...N) for each standard speaker n is input from weight data input terminal 10. The weight data Wn (n = 1...N) determines how much the articulation mode of each standard speaker contributes to the articulation mode of the code label time series that weighted vector quantization means 13 writes to dictionary memory 7, and satisfies Math 22.

[0058]

[Math 22]

[0059] Weighted distance calculation means 11 uses the weight data Wn (n = 1...N) input from weight data input terminal 10 to compute the weighted distance Di(j, t) between the dimension-expanded feature vector of Math 11, taken from the dimension-expanded feature vector time series of Math 14 that corresponds to speech of category i and is output by dimension expansion means 4, and the codeword of Math 2 in the dimension-expanded codebook of Math 1 (J is the codebook size) stored in dimension-expanded codebook memory 5, as in the following equation.

[0060]

[Math 23]

[0061] Here, Math 3 is an M-dimensional feature vector as shown in Math 4, obtained by extracting the feature vector corresponding to standard speaker n from codeword Vj of the dimension-expanded codebook of Math 1. Similarly, Math 9 is an M-dimensional feature vector as shown in Math 12, obtained by extracting the feature vector corresponding to standard speaker n from the dimension-expanded feature vector of Math 10. Math 24 is the Euclidean distance between the feature vectors of Math 3 and Math 9,

[0062]

[Math 24]

[0063] and is given by Math 25.

[0064]

[Math 25]

[0065] In this way, the weighted distance Di(j, t) is obtained as follows: take the Euclidean distance between the feature vector of Math 3, which corresponds to the standard speaker with speaker number n and is extracted from codeword Vj of the dimension-expanded codebook of Math 1, and the feature vector of Math 9 for the same standard speaker, extracted from the dimension-expanded feature vector of Math 10; weight it by the weight data Wn; and sum over speaker numbers n = 1...N. The contribution of each standard speaker to the weighted distance Di(j, t) therefore depends on the weight data Wn (n = 1...N). The weighted distance Di(j, t) is computed for j = 1...J and t = 1...Ti and output as distance data.
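The weighted distance described above can be sketched as a sum, over speakers, of the weight Wn times the Euclidean distance between speaker n's block of the codeword and speaker n's block of the input vector. The sketch makes several assumptions: a speaker-major layout of the M×N components, the plain (unsquared) Euclidean distance, and no enforcement of the constraint of Math 22, whose content is not reproduced in this text. The function name `weighted_distances` is hypothetical.

```python
import math
from typing import List

Vector = List[float]


def weighted_distances(
    expanded_sequence: List[Vector],
    expanded_codebook: List[Vector],
    weights: List[float],
) -> List[List[float]]:
    """Return D[j][t]: the weighted distance between codeword j and frame t.

    Each expanded vector is treated as N contiguous blocks of M components,
    one block per standard speaker (layout is an assumption of this sketch).
    """
    n_speakers = len(weights)
    dim = len(expanded_codebook[0]) // n_speakers

    def block(vector: Vector, n: int) -> Vector:
        return vector[n * dim:(n + 1) * dim]

    return [
        [
            sum(w * math.dist(block(code, n), block(frame, n))
                for n, w in enumerate(weights))
            for frame in expanded_sequence
        ]
        for code in expanded_codebook
    ]


# J = 2 codewords, Ti = 2 frames, N = 2 speakers, M = 2 dimensions.
codebook = [[0.0, 0.0, 0.0, 0.0], [3.0, 4.0, 0.0, 0.0]]
sequence = [[0.0, 0.0, 0.0, 0.0], [3.0, 4.0, 0.0, 0.0]]
distance_data = weighted_distances(sequence, codebook, [0.5, 0.5])
ignoring_speaker_1 = weighted_distances(sequence, codebook, [0.0, 1.0])
```

With weights (0.5, 0.5) the two codewords are 2.5 apart from the non-matching frames, while with weights (0, 1) speaker 1's block is ignored and every distance becomes zero, which illustrates how Wn controls each speaker's contribution.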

[0066] Minimum distance code search means 12 receives the distance data Di(j, t) (j = 1...J, t = 1...Ti) output by weighted distance calculation means 11. For each t, it searches for the j that gives the smallest Di(j, t) and obtains the code label of the code corresponding to that j, thereby creating the code label time series of Math 26

[0067]

[Math 26]

[0068] (where Math 27 is a code label), which it writes to dictionary memory 7.
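The minimum distance code search reduces to an argmin over j for each time point t. The sketch below assumes the distance data is held as a table `distances[j][t]` and that code labels are the 1-based codeword numbers; the function name `min_distance_labels` is hypothetical.

```python
from typing import List


def min_distance_labels(distances: List[List[float]]) -> List[int]:
    """For each frame t, return the 1-based number j of the smallest D[j][t].

    distances[j][t] holds the weighted distance between codeword j and
    frame t; ties go to the lowest codeword number.
    """
    n_frames = len(distances[0])
    labels = []
    for t in range(n_frames):
        column = [row[t] for row in distances]
        labels.append(column.index(min(column)) + 1)
    return labels


# J = 2 codewords, Ti = 3 frames.
distances = [[0.0, 2.0, 5.0], [2.5, 1.0, 5.0]]
labels = min_distance_labels(distances)
```

Separating the distance computation from the search mirrors the split between means 11 and means 12 in the text: the search itself never sees the weights.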

[0069]

[Math 27]

[0070] By repeating the operations of weighted distance calculation means 11 and minimum distance code search means 12 while changing the weight data input from weight data input terminal 10, a plurality of code label time series are created in the dictionary memory.
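The repetition over weight settings can be sketched end to end as follows. This is an illustration under assumptions, not the patent's implementation: the M×N components are laid out speaker by speaker, the distance is the plain Euclidean distance, code labels are 1-based codeword numbers, and the dictionary memory is modelled as a Python list. The name `weighted_vq` and the particular weight settings are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def weighted_vq(sequence: List[Vector], codebook: List[Vector], weights: List[float]) -> List[int]:
    """Weighted vector quantization: weighted distance, then minimum search."""
    n_speakers = len(weights)
    dim = len(codebook[0]) // n_speakers
    labels = []
    for frame in sequence:
        distances = [
            sum(
                w * math.dist(code[n * dim:(n + 1) * dim], frame[n * dim:(n + 1) * dim])
                for n, w in enumerate(weights)
            )
            for code in codebook
        ]
        labels.append(distances.index(min(distances)) + 1)
    return labels


# N = 2 speakers, M = 2 dimensions, J = 2 codewords, Ti = 1 frame.
codebook = [[0.0, 0.0, 1.0, 1.0], [1.0, 1.0, 0.0, 0.0]]
sequence = [[0.1, 0.1, 0.2, 0.2]]

# Repeat the quantization with different weight data; each pass appends
# one code label time series to the (modelled) dictionary memory.
dictionary = []
for weights in ([1.0, 0.0], [0.5, 0.5], [0.0, 1.0]):
    dictionary.append(weighted_vq(sequence, codebook, weights))
```

In this toy example the frame is labelled 1 when speaker 1 dominates and 2 when speaker 2 dominates, so changing only the weights yields different label sequences from the same N speakers.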

[0071] As described above, the code label time series corresponding to speech of category i is created by minimum distance code search means 12 from the distance data output by weighted distance calculation means 11, so it has an articulation mode that is the average over the standard speakers weighted by the weight data Wn. The plurality of code label time series in the dictionary memory that correspond to speech of category i are each created with different weight data, so they have mutually different articulation modes. When such a standard pattern creation device is used in a speaker-adaptive speech recognition system to improve the recognition rate by covering the variety of articulation modes of input speakers, standard patterns with different articulation modes can be created without increasing the number of standard speakers. Although the above embodiment is implemented in dedicated hardware, it may instead be realized by software running on a general-purpose computer or a signal processor.

[0072]

[Effects of the Invention] As described above, the present invention comprises: acoustic analysis means that receives speech of the same category uttered by a plurality of standard speakers, performs signal analysis, and converts the speech into a feature vector time series group; time axis normalization means that normalizes the time axis of the feature vector time series group output by the acoustic analysis means; dimension expansion means that generates a dimension-expanded feature vector time series from the equal-length feature vector time series group output by the time axis normalization means; a dimension-expanded codebook memory that stores a dimension-expanded codebook; weighted distance calculation means that calculates, according to weight data input from a weight data input terminal, the weighted distance between each code of the dimension-expanded codebook and each dimension-expanded feature vector of the dimension-expanded feature vector time series; minimum distance code search means that receives the distance data obtained by the weighted distance calculation means and outputs a code label time series; and a dictionary memory that stores the code label time series. A plurality of code label time series with different articulation modes can therefore be created regardless of the number of standard speakers. When such a standard pattern creation device is used in a speaker-adaptive speech recognition system to improve the recognition rate by covering the variety of articulation modes of input speakers, standard patterns with different articulation modes can be created without increasing the number of standard speakers.

[0073]

[Brief explanation of drawings]

[FIG. 1] A block diagram showing the configuration of a standard pattern creation device according to an embodiment of the present invention.

[FIG. 2] A block diagram showing the configuration of a conventional standard pattern creation device.

[Explanation of symbols]

1 Speech signal input terminal
2 Acoustic analysis means
3 Time axis normalization means
4 Dimension expansion means
5 Dimension-expanded codebook memory
6 Dimension expansion quantization means
7 Dictionary memory
8 Feature vector extraction means
9 Vector quantization means
10 Weight data input terminal
11 Weighted distance calculation means
12 Minimum distance code search means
13 Weighted vector quantization means

Claims

[Claims]

Claim 1: A standard pattern creation device comprising: acoustic analysis means that receives speech of the same category uttered by a plurality of standard speakers, performs acoustic analysis, and outputs a feature vector time series group; time axis normalization means that normalizes the time axis of the feature vector time series group output by the acoustic analysis means; dimension expansion means that generates a dimension-expanded feature vector time series from the equal-length feature vector time series group output by the time axis normalization means; a dimension-expanded codebook memory that stores a dimension-expanded codebook; weighted distance calculation means that calculates, according to weight data input from a weight data input terminal, the weighted distance between each code of the dimension-expanded codebook and each dimension-expanded feature vector of the dimension-expanded feature vector time series; minimum distance code search means that receives the distance data obtained by the weighted distance calculation means and outputs a code label time series; and a dictionary memory that stores the code label time series.