JP3008520B2

JP3008520B2 - Standard pattern making device

Info

Publication number: JP3008520B2
Application number: JP3051559A
Authority: JP
Inventors: 鈴木 忠; 中島 邦男
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 1991-03-15
Filing date: 1991-03-15
Publication date: 2000-02-14
Anticipated expiration: 2015-02-14
Also published as: JPH04298795A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Field of Industrial Application] The present invention relates to an apparatus for creating standard patterns for speech recognition.

[0002]

[Description of the Related Art] FIG. 2 is a block diagram of a conventional standard pattern creating apparatus. It is described in Suzuki and Nakajima, "Study of a speaker-adaptive speaker-independent word speech recognition method," Proceedings of the Spring Research Meeting of the Acoustical Society of Japan, 1990, paper 2-3-3 (March 1990).

In the figure, reference numeral 1 denotes an input terminal to which speech uttered by a standard speaker is input.

Reference numeral 2 denotes acoustic analysis means. It analyzes the speech input from the input terminal 1 and converts it into a feature vector time series.

Reference numeral 3 denotes time axis normalizing means. When speech of the same category uttered by a plurality of standard speakers is input, the acoustic analysis means 2 outputs a group of feature vector time series. The time axis normalizing means 3 normalizes this group along the time axis and outputs a group of equal-length feature vector time series.

Reference numeral 4 denotes dimension extending means. It generates a dimension-extended feature vector time series by multiplexing (concatenating) the feature vectors of the equal-length feature vector time series group that share the same time point.

Reference numeral 5 denotes a dimension-extended codebook memory that stores a dimension-extended codebook.

Reference numeral 6 denotes dimension-extended quantization means. It vector-quantizes the dimension-extended feature vector time series using the dimension-extended codebook and converts it into a code label time series.

Reference numeral 7 denotes a dictionary memory that stores the code label time series.

Reference numeral 8 denotes feature vector extracting means. It extracts the feature vectors corresponding to each standard speaker from the dimension-extended codebook and generates a codebook for each standard speaker.

Reference numeral 9 denotes vector quantization means. It vector-quantizes the feature vector time series of each standard speaker's speech output from the acoustic analysis means. For this it uses the per-speaker codebooks output from the feature vector extracting means. The resulting code label time series are written into the dictionary memory.

[0003] Next, the operation will be described. As an example, consider creating the standard patterns for category i using N standard speakers (N > 1). Here i is an integer from 1 to I, and I is the number of categories.

It is also assumed that the dimension-extended codebook (Equation 1) is already stored in the dimension-extended codebook memory, where J is the codebook size. This codebook was created beforehand from the speech of the same N standard speakers.

[0004]

(Equation 1)

[0005] Here, Vj is the codeword numbered j and is expressed by Equation 2.

[0006]

(Equation 2)

[0007] In Equation 2, the term given by Equation 3,

[0008]

(Equation 3)

[0009] is the M-dimensional feature vector corresponding to the speech of the standard speaker numbered n (n is an integer from 1 to N; hereinafter called standard speaker n), and is expressed by Equation 4.

[0010]

(Equation 4)

[0011] Vj is therefore an (M × N)-dimensional vector. The method of creating a dimension-extended codebook is described in detail in Suzuki, Nakajima, and Ishikawa, "Application of a speaker adaptation method using vector quantization to continuous speech recognition," Proceedings of the Autumn Research Meeting of the Acoustical Society of Japan, 1988, paper 1-3-22 (October 1988).

[0012] The speech of category i uttered by standard speaker n is input to the acoustic analysis means 2 through the input terminal 1. The acoustic analysis means 2 acoustically analyzes the input speech and outputs the feature vector time series expressed by Equation 5.

[0013]

(Equation 5)

[0014] Here, Tn,i is the length (number of frames) of the series.

[0015] Performing the above acoustic analysis for all standard speakers (n = 1 ... N) yields the group of feature vector time series expressed by Equation 6.

[0016]

(Equation 6)

[0017] The feature vector time series group (Equation 6) is input to the time axis normalizing means 3. There it is converted into the equal-length feature vector time series group (Equation 7), in which all series have the same length.

[0018]

(Equation 7)

where

[0019]

(Equation 8)

[0020] (Ti is the series length, which does not depend on speaker n).
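The patent does not say how the time axis normalizing means 3 equalizes the series lengths. DTW alignment to a reference series is another plausible choice. The sketch below assumes the simplest option, linear time warping: each speaker's (Tn,i × M) series is resampled to a common length Ti by per-dimension linear interpolation. The function name and the (frames × M) array layout are hypothetical.

```python
import numpy as np

def linear_time_normalize(series_group, target_len):
    """Resample each (T_n, M) feature series to shape (target_len, M).

    Assumption: linear time warping via per-dimension linear interpolation;
    the patent does not specify the normalization method.
    """
    out = []
    for x in series_group:
        x = np.asarray(x, dtype=float)
        src = np.linspace(0.0, 1.0, num=x.shape[0])   # original frame positions
        dst = np.linspace(0.0, 1.0, num=target_len)   # normalized frame positions
        # Interpolate every feature dimension independently.
        resampled = np.stack(
            [np.interp(dst, src, x[:, m]) for m in range(x.shape[1])], axis=1
        )
        out.append(resampled)
    return np.stack(out)  # shape (N, target_len, M): equal-length series group

group = [np.arange(6.0).reshape(3, 2), np.arange(10.0).reshape(5, 2)]
equal_length = linear_time_normalize(group, 4)  # shape (2, 4, 2)
```

Linear warping keeps the first and last frames of every speaker fixed. It does not align phonetic events the way DTW would.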

[0021] The dimension extending means 4 takes the equal-length feature vector time series group (Equation 7) as input. For each equal-length feature vector time series, it extracts the feature vector at the same time point t (Equation 9),

[0022]

(Equation 9)

[0023] and multiplexes these to generate the dimension-extended feature vector (Equation 10).

[0024]

(Equation 10)

[0025] Expressed as a formula, this is as follows.

[0026]

(Equation 11)

[0027] Here, Equation 9 is the feature vector at time t of the equal-length vector time series (Equation 8) and is expressed by Equation 12.

[0028]

(Equation 12)

[0029] Here, Equation 13 is the m-th element of the feature vector (Equation 9), and M is the number of dimensions of the feature vector.

[0030]

(Equation 13)

[0031] The dimension-extended feature vector therefore has M × N dimensions. It is computed for t = 1 ... Ti, and the dimension-extended feature vector time series expressed by Equation 14 is output.

[0032]

(Equation 14)
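The dimension extension step amounts to concatenating, frame by frame, the N speakers' M-dimensional vectors into one (M × N)-dimensional vector. The sketch below shows this with NumPy. It assumes a speaker-major layout (speaker 1's M values first, then speaker 2's, and so on), which is our reading of Equations 2 and 10 since the equations themselves are not reproduced. The function name is hypothetical.

```python
import numpy as np

def dimension_extend(equal_length_group):
    """Concatenate the N speakers' M-dim vectors at each frame t.

    equal_length_group: array of shape (N, T, M), one equal-length series per speaker.
    Returns shape (T, N*M); row t is (x_t^1, ..., x_t^N) in speaker-major order (assumed).
    """
    g = np.asarray(equal_length_group, dtype=float)
    n_speakers, n_frames, m_dims = g.shape
    # Put the frame axis first, then flatten (speaker, dim) into one vector per frame.
    return g.transpose(1, 0, 2).reshape(n_frames, n_speakers * m_dims)

extended = dimension_extend(np.arange(12).reshape(2, 3, 2))  # shape (3, 4)
```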

[0033] The dimension-extended quantization means 6 takes the dimension-extended feature vector time series (Equation 14) as input. For each dimension-extended feature vector (Equation 10), it searches the dimension-extended codebook (Equation 1) stored in the dimension-extended codebook memory 5 for the code with the minimum Euclidean distance. It then obtains the code label corresponding to that code. This generates the code label time series expressed by Equation 15, which is written into the dictionary memory 7.

[0034]

(Equation 15)

[0035] Here, Equation 16 is a code label.

[0036]

(Equation 16)
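The conventional dimension-extended quantization means 6 performs ordinary nearest-codeword vector quantization in the (M × N)-dimensional space. A minimal NumPy sketch follows. Labels here are 0-based indices, whereas the patent numbers codes j = 1 ... J, and the function name is hypothetical.

```python
import numpy as np

def vector_quantize(vectors, codebook):
    """Return, for each row of `vectors`, the index of the Euclidean-nearest codeword.

    vectors: (T, D) array; codebook: (J, D) array; returns a length-T label array.
    """
    v = np.asarray(vectors, dtype=float)
    c = np.asarray(codebook, dtype=float)
    # Pairwise Euclidean distances between every frame and every codeword, shape (T, J).
    dists = np.linalg.norm(v[:, None, :] - c[None, :, :], axis=2)
    return np.argmin(dists, axis=1)

labels = vector_quantize([[1, 1, 1, 1], [9, 9, 9, 9]],
                         [[0, 0, 0, 0], [10, 10, 10, 10]])
```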

[0037] As described above, the code label time series (Equation 15) is created by vector-quantizing dimension-extended feature vectors. These vectors are obtained by superimposing the feature vectors of several standard speakers' speech. The resulting series therefore has an articulation style averaged over those standard speakers.

[0038] The feature vector extracting means 8 works on the dimension-extended codebook (Equation 1) stored in the dimension-extended codebook memory 5. Following the relationship in Equation 2, it extracts from each codeword Vj the feature vector belonging to standard speaker n, giving the codeword of Equation 4. Doing this for j = 1 ... J generates the codebook of standard speaker n (Equation 17).

[0039]

(Equation 17)

[0040] Performing this process for all standard speakers creates the codebooks for all standard speakers (Equation 18).

[0041]

(Equation 18)
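Because each codeword Vj is a concatenation of N speaker sub-vectors, the feature vector extracting means 8 amounts to slicing the extended codebook. The sketch assumes the same speaker-major layout as the dimension extension (Equation 2 is not reproduced, so the ordering is an assumption). The function name is hypothetical.

```python
import numpy as np

def split_codebook(extended_codebook, n_speakers):
    """Split a (J, N*M) dimension-extended codebook into N per-speaker codebooks.

    Assumes each codeword is laid out as (v_j^1, ..., v_j^N), speaker-major.
    Returns shape (N, J, M); entry [n] is speaker n's (J, M) codebook.
    """
    cb = np.asarray(extended_codebook, dtype=float)
    n_codes, total_dims = cb.shape
    if total_dims % n_speakers:
        raise ValueError("codeword length is not a multiple of the speaker count")
    m_dims = total_dims // n_speakers
    return cb.reshape(n_codes, n_speakers, m_dims).transpose(1, 0, 2)

per_speaker = split_codebook(np.arange(8).reshape(2, 4), 2)  # shape (2, 2, 2)
```

Code j in every per-speaker codebook is derived from the same extended codeword Vj. Labels produced with different speakers' codebooks therefore refer to corresponding codes.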

[0042] The vector quantization means 9 takes the feature vector time series (Equation 5) output by the acoustic analysis means 2, one member of the group in Equation 6. It vector-quantizes this series using the codebook for standard speaker n (Equation 17) output by the feature vector extracting means 8. This generates the code label time series (Equation 19)

[0043]

(Equation 19)

[0044] (where Equation 20 is a code label and Tn,i is the series length).

[0045]

(Equation 20)

[0046] Doing this for n = 1 ... N generates the code label time series group (Equation 21), which is written into the dictionary memory.

[0047]

(Equation 21)
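The conventional per-speaker step quantizes each speaker's own, un-normalized feature series with that speaker's sub-codebook, yielding N label series. A self-contained sketch follows, assuming the speaker-major codebook layout used above. The function names are hypothetical and labels are 0-based.

```python
import numpy as np

def nearest_labels(vectors, codebook):
    # Index of the Euclidean-nearest codeword for each row of `vectors`.
    d = np.linalg.norm(vectors[:, None, :] - codebook[None, :, :], axis=2)
    return np.argmin(d, axis=1)

def per_speaker_labels(speaker_series, extended_codebook):
    """Quantize each speaker's series with that speaker's sub-codebook.

    speaker_series: list of N arrays of shape (T_n, M); lengths may differ, since
    these are the acoustic analysis outputs before time axis normalization.
    extended_codebook: (J, N*M) array, assumed speaker-major.
    Returns a list of N label sequences.
    """
    cb = np.asarray(extended_codebook, dtype=float)
    n = len(speaker_series)
    j_size, total = cb.shape
    sub = cb.reshape(j_size, n, total // n)  # sub[:, k, :] is speaker k's codebook
    return [nearest_labels(np.asarray(x, dtype=float), sub[:, k, :])
            for k, x in enumerate(speaker_series)]
```

Together with the single averaged series produced by the dimension-extended quantization means 6, this gives the N + 1 series described in the following paragraphs.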

[0048] As described above, each of these code label time series is created from the speech of an individual standard speaker, so the series have mutually different articulation styles.

[0049] Through the above operation, the dictionary memory comes to hold N + 1 code label time series for the speech of category i, each with a different articulation style.

[0050]

[Problem to Be Solved by the Invention] With the conventional configuration, the speech of category i yields N + 1 code label time series in total:

- one series that averages the articulation styles of the N standard speakers;
- N series, one per standard speaker, each following that speaker's articulation style.

Such an apparatus may be adopted in a speaker-adaptive speech recognition system to raise the recognition rate by covering the diversity of input speakers' articulation styles. In that case, the only way to obtain more code label time series with different articulation styles is to increase the number of standard speakers.

[0051] The present invention has been made to solve the above problem. Its object is to provide a standard pattern creating apparatus that can create code label time series with different articulation styles without increasing the number of standard speakers.

[0052]

[Means for Solving the Problem] The standard pattern creating apparatus according to the present invention comprises:

- acoustic analysis means that takes speech of the same category uttered by a plurality of standard speakers as input, performs signal analysis, and converts it into a group of feature vector time series;
- time axis normalizing means that normalizes the time axis of the feature vector time series group output by the acoustic analysis means;
- dimension extending means that generates a dimension-extended feature vector time series from the group of equal-length feature vector time series output by the time axis normalizing means;
- a dimension-extended codebook memory that stores a dimension-extended codebook;
- weighted distance calculating means that calculates a weighted distance (hereinafter, the standard-speaker-weighted distance) between each code of the dimension-extended codebook and each dimension-extended feature vector of the dimension-extended feature vector time series;
- minimum distance code searching means that takes the standard-speaker-weighted distance data obtained by the weighted distance calculating means as input and outputs a code label time series;
- a dictionary memory that stores the code label time series.

The weighted distance is computed using weight data between 0 and 1 inclusive. This weight data is selected for each standard speaker, input from a weight data input terminal, and hereinafter called standard speaker weight data.

Hereinafter, the weighted distance calculating means and the minimum distance code searching means are collectively called weighted vector quantization means.

[0053]

[Operation] The weighted vector quantization means of the present invention has two components.

The weighted distance calculating means uses the standard speaker weight data input from the weight data input terminal. With it, it calculates the standard-speaker-weighted distance between each dimension-extended feature vector of the dimension-extended feature vector time series and each code stored in the dimension-extended codebook memory.

The minimum distance code searching means takes the distance data output by the weighted distance calculating means as input. For each dimension-extended feature vector, it searches for the code that gives the minimum distance. The resulting code label time series is written into the dictionary memory.

Each time the standard speaker weight data is changed and this processing is repeated, a code label time series with a different articulation style is generated.

[0054]

[Embodiment] An embodiment of the present invention will be described below with reference to the drawings. In FIG. 1, 1 is an input terminal, 2 is acoustic analysis means, 3 is time axis normalizing means, 4 is dimension extending means, 5 is a dimension-extended codebook memory, and 7 is a dictionary memory. These are the same as, or equivalent to, the parts of the conventional apparatus with the same reference numerals in FIG. 2, so their detailed description is omitted.

[0055] Reference numeral 10 denotes a weight data input terminal to which the standard speaker weight data is input.

Reference numeral 11 denotes weighted distance calculating means. Using the standard speaker weight data input from the weight data input terminal 10, it calculates a weighted distance between two kinds of vectors: each feature vector of the dimension-extended feature vector time series output by the dimension extending means 4, and each code of the dimension-extended codebook stored in the dimension-extended codebook memory 5.

Reference numeral 12 denotes minimum distance code searching means. It takes the distance data output by the weighted distance calculating means 11 as input. For each vector of the dimension-extended feature vector time series, it searches for the code that gives the minimum distance. It thereby creates a code label time series and writes it into the dictionary memory.

Reference numeral 13 denotes weighted vector quantization means, composed of the weighted distance calculating means 11 and the minimum distance code searching means 12.

[0056] Next, the operation will be described. As in the conventional case, there are N standard speakers (N > 1), and the dimension-extended codebook is assumed to be already stored in the dimension-extended codebook memory. Also as before, the example is creating code label time series for the speech of category i and storing them in the dictionary memory. The operation from speech input through the dimension extending means is unchanged, so the description starts from the weight data input terminal 10.

[0057] Weight data Wn (n = 1 ... N) for each standard speaker n is input from the weight data input terminal 10. The weight data Wn satisfy Equation 22. They determine how much each standard speaker's articulation style contributes to the articulation style of the code label time series that the weighted vector quantization means 13 writes into the dictionary memory 7.

[0058]

(Equation 22)

[0059] The weighted distance calculating means 11 uses the weight data Wn (n = 1 ... N) input from the weight data input terminal 10. With it, it computes the standard-speaker-weighted distance Di(j,t) between two quantities, as shown below:

- the dimension-extended feature vector (Equation 11) of the time series (Equation 14) for category i speech, which is the output of the dimension extending means 4;
- the codeword (Equation 2) of the dimension-extended codebook (Equation 1) stored in the dimension-extended codebook memory 5, where J is the codebook size.

[0060]

(Equation 23)

[0061] Here, Equation 3 is an M-dimensional feature vector of the form shown in Equation 4: the part of codeword Vj of the dimension-extended codebook (Equation 1) that belongs to standard speaker n. Likewise, Equation 9 is an M-dimensional feature vector of the form shown in Equation 12: the part of the dimension-extended feature vector (Equation 10) that belongs to standard speaker n. Equation 24 is the Euclidean distance between the feature vectors of Equation 3 and Equation 9,

[0062]

(Equation 24)

[0063] and is obtained by Equation 25.

[0064]

(Equation 25)

[0065] In this way, Di(j,t) is built from one Euclidean distance per standard speaker n. Each distance is taken between:

- the feature vector of Equation 3 for standard speaker n, extracted from codeword Vj of the dimension-extended codebook (Equation 1);
- the feature vector of Equation 9 for the same speaker, extracted from the dimension-extended feature vector (Equation 10).

Each distance is multiplied by the weight data Wn, and the weighted distances are summed over n = 1 ... N. The contribution of each standard speaker to Di(j,t) is therefore set by the weight data Wn (n = 1 ... N). Di(j,t) is computed for j = 1 ... J and t = 1 ... Ti and output as distance data.
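Equations 23 to 25 are not reproduced here. From the prose, Di(j,t) is the sum over n of Wn times the Euclidean distance between the speaker-n parts of Vj and of the extended vector at frame t. The sketch below assumes three things: the unsquared Euclidean norm (Equation 25 could equally be the squared distance), the speaker-major layout used above, and the 0-to-1 weight range stated in the claims. The function name is hypothetical.

```python
import numpy as np

def weighted_distances(extended_vectors, extended_codebook, weights):
    """Standard-speaker-weighted distance D[j, t] = sum_n W_n * ||v_j^n - x_t^n||.

    extended_vectors: (T, N*M) dimension-extended feature vectors.
    extended_codebook: (J, N*M) dimension-extended codewords V_j.
    weights: length-N sequence of W_n, each required to lie in [0, 1].
    """
    w = np.asarray(weights, dtype=float)
    if np.any((w < 0) | (w > 1)):
        raise ValueError("each weight W_n must lie in [0, 1]")
    x = np.asarray(extended_vectors, dtype=float)
    v = np.asarray(extended_codebook, dtype=float)
    n = w.size
    xs = x.reshape(x.shape[0], n, -1)  # (T, N, M): speaker parts x_t^n
    vs = v.reshape(v.shape[0], n, -1)  # (J, N, M): speaker parts v_j^n
    # Per-speaker Euclidean distances, shape (J, T, N).
    per_speaker = np.linalg.norm(vs[:, None, :, :] - xs[None, :, :, :], axis=3)
    return per_speaker @ w             # weighted sum over speakers -> (J, T)

D = weighted_distances([[0, 0, 3, 4]], [[0, 0, 0, 0], [3, 4, 3, 4]], [1, 0])
```

With a weight of 1 for one speaker and 0 for the rest, only that speaker's sub-vectors affect the distance. Intermediate weights blend the speakers' contributions.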

[0066] The minimum distance code searching means 12 takes as input the standard-speaker-weighted distance data Di(j,t) (j = 1 ... J, t = 1 ... Ti) output by the weighted distance calculating means 11. For each t, it searches for the j that minimizes Di(j,t) and obtains the code label of the corresponding code. This creates the code label time series (Equation 26)

[0067]

(Equation 26)

[0068] (where Equation 27 is a code label), which is written into the dictionary memory 7.

[0069]

(Equation 27)
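Given the distance data as a (J × Ti) array, the minimum distance code search is a column-wise argmin. A minimal sketch follows, with 0-based labels and a hypothetical function name. Ties resolve to the lowest j, which is NumPy's `argmin` behaviour; the patent does not specify a tie-breaking rule.

```python
import numpy as np

def min_distance_labels(distance_data):
    """Given D[j, t] of shape (J, T), return for each t the j minimizing D[j, t]."""
    d = np.asarray(distance_data, dtype=float)
    return np.argmin(d, axis=0)

labels = min_distance_labels([[3, 0, 2], [1, 5, 2]])
```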

[0070] The operations of the weighted distance calculating means 11 and the minimum distance code searching means 12 are repeated, each time with different standard speaker weight data from the weight data input terminal 10. This creates a plurality of code label time series in the dictionary memory.
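The repetition over weight sets can be sketched end to end. Each weight vector gives one pass of weighted vector quantization and therefore one code label time series. The example uses a single frame and a 3-word codebook, chosen so that three weight sets pick three different codes. The assumptions are the same as above: unsquared Euclidean distance, speaker-major layout, 0-based labels, and hypothetical function names.

```python
import numpy as np

def weighted_vq(extended_vectors, extended_codebook, weights):
    """One pass of weighted vector quantization (sketch).

    D[j, t] = sum_n W_n * ||v_j^n - x_t^n||, then argmin over j for every frame t.
    """
    w = np.asarray(weights, dtype=float)
    x = np.asarray(extended_vectors, dtype=float)
    v = np.asarray(extended_codebook, dtype=float)
    xs = x.reshape(x.shape[0], w.size, -1)                        # (T, N, M)
    vs = v.reshape(v.shape[0], w.size, -1)                        # (J, N, M)
    per_speaker = np.linalg.norm(vs[:, None] - xs[None], axis=3)  # (J, T, N)
    return np.argmin(per_speaker @ w, axis=0)

def build_dictionary(extended_vectors, extended_codebook, weight_sets):
    """Repeat weighted VQ once per weight set; each pass yields one label series."""
    return [weighted_vq(extended_vectors, extended_codebook, w) for w in weight_sets]

# N = 2 speakers, M = 1: speaker 1 observed 0, speaker 2 observed 10 at the single frame.
series = build_dictionary([[0, 10]], [[0, 0], [10, 10], [1, 9]],
                          [[1, 0], [0, 1], [0.5, 0.5]])
```

Weight (1, 0) selects code 0, which matches speaker 1 exactly, and weight (0, 1) selects code 1, which matches speaker 2 exactly. The even weighting (0.5, 0.5) prefers code 2 (weighted distance 1.0 versus 5.0 for the other codes), a compromise between the two speakers. Different weights thus give different label series from the same N speakers.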

[0071] As described above, each code label time series for category i speech is created by the minimum distance code searching means 12 from the distance data output by the weighted distance calculating means 11. Each series therefore has an articulation style that is a weighted average over the standard speakers, with weights Wn. The code label time series for category i in the dictionary memory are each created with different weight data, so their articulation styles differ from one another.

Consequently, when such a standard pattern creating apparatus is adopted in a speaker-adaptive speech recognition system to cover the diversity of input speakers' articulation styles, standard patterns with different articulation styles can be created without increasing the number of standard speakers.

Although the above embodiment uses dedicated hardware, the same processing may instead be implemented in software on a general-purpose computer or a signal processor.

[0072]

[Effects of the Invention] As described above, the present invention comprises:

- acoustic analysis means that takes speech of the same category uttered by a plurality of standard speakers as input, performs signal analysis, and converts it into a group of feature vector time series;
- time axis normalizing means that normalizes the time axis of the feature vector time series group output by the acoustic analysis means;
- dimension extending means that generates a dimension-extended feature vector time series from the group of equal-length feature vector time series output by the time axis normalizing means;
- a dimension-extended codebook memory that stores a dimension-extended codebook;
- weighted distance calculating means that, using the standard speaker weight data input from the weight data input terminal, calculates the standard-speaker-weighted distance between each code of the dimension-extended codebook and each dimension-extended feature vector of the dimension-extended feature vector time series;
- minimum distance code searching means that takes the distance data obtained by the weighted distance calculating means as input and outputs a code label time series;
- a dictionary memory that stores the code label time series.

Consequently, a plurality of code label time series with different articulation styles can be created regardless of the number of standard speakers. When such an apparatus is adopted in a speaker-adaptive speech recognition system to cover the diversity of input speakers' articulation styles, standard patterns with different articulation styles can be created without increasing the number of standard speakers.

[0073]

[Brief description of the drawings]

FIG. 1 is a block diagram showing the configuration of a standard pattern creating apparatus according to an embodiment of the present invention.

FIG. 2 is a block diagram showing the configuration of a conventional standard pattern creating apparatus.

[Explanation of symbols]

1 speech signal input terminal
2 acoustic analysis means
3 time axis normalizing means
4 dimension extending means
5 dimension-extended codebook memory
6 dimension-extended quantization means
7 dictionary memory
8 feature vector extracting means
9 vector quantization means
10 weight data input terminal
11 weighted distance calculating means
12 minimum distance code searching means
13 weighted vector quantization means

Continuation of the front page

(56) References cited:
JP-A-2-251999 (JP, A)
JP-A-61-261799 (JP, A)
Suzuki and Nakajima, "Study of a speaker-adaptive speaker-independent word recognition method," Proceedings of the Acoustical Society of Japan, Spring 1990, pp. 55-56
Suzuki and Nakajima, "Application of a speaker adaptation method using vector quantization to continuous speech recognition," Proceedings of the Acoustical Society of Japan, Autumn 1988, pp. 43-44

(58) Fields searched (Int.Cl.7, DB name): G10L 3/00 - 9/20; JICST file (JOIS)

Claims

(57) [Claims]

1. A standard pattern creating apparatus comprising:
acoustic analysis means that performs acoustic analysis on speech of the same category uttered by a plurality of standard speakers and outputs a group of feature vector time series;
time axis normalizing means that normalizes the time axis of the feature vector time series group output by the acoustic analysis means;
dimension extending means that generates dimension-extended feature vectors from the group of equal-length feature vector time series output by the time axis normalizing means;
a dimension-extended codebook memory that stores a dimension-extended codebook;
weighted distance calculating means that uses weight data between 0 and 1 inclusive, selected for each standard speaker and input from a weight data input terminal, and calculates, for each code of the dimension-extended codebook and each dimension-extended feature vector of the dimension-extended feature vector time series, a distance weighted by the weight data selected for each standard speaker;
minimum distance code searching means that takes the distance data obtained by the weighted distance calculating means as input and outputs a code label time series; and
a dictionary memory that stores the code label time series.