JP3061912B2

JP3061912B2 - Voice recognition device

Info

Publication number: JP3061912B2
Application number: JP3285678A
Authority: JP
Inventors: 泰山崎; 晋太木村
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1991-10-04
Filing date: 1991-10-04
Publication date: 2000-07-10
Anticipated expiration: 2015-07-10
Also published as: JPH05100695A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Field of Industrial Application] The present invention relates to a speech recognition device that spectrally analyzes training speech, extracts characteristic time series from the analysis results and stores them in a template storage unit, and, at recognition time, reads those analysis results from the template storage unit and matches them against the speech to be identified, thereby recognizing the speech.

[0002]

[Prior Art] FIG. 11 shows a conventional speech recognition device. In the figure, 101 is a spectrum analysis unit, 102 is a template extraction unit, 103 is a template storage unit, 104 is a template reading unit, and 105 is a matching unit. The spectrum analysis unit 101 spectrally analyzes the speech to be recognized or the training speech; the template extraction unit 102 extracts a plurality of templates for each speech sound from the spectrum analysis results; the template storage unit 103 stores the extracted templates; the template reading unit 104 reads the templates stored in the template storage unit 103; and the matching unit 105 matches the spectrum analysis result of the speech to be recognized against the templates stored in the template storage unit 103.

[0003] Next, the operation of the conventional device is described. During training, when the speaker inputs speech in response to, for example, an instruction from the system, the spectrum analysis unit 101 spectrally analyzes the input speech. The template extraction unit 102 extracts part of the spectrum analysis result as a characteristic time-series pattern {a_ij} (i is the input frame, j is the frequency).

[0004] FIG. 12 is a block diagram showing one example of how feature parameters are extracted from speech. First, the fast Fourier transform unit 110 applies a fast Fourier transform to the speech to obtain its frequency components. The mel-scale division unit 111 then divides these components according to the mel scale, a scale of human auditory perception, to obtain the feature parameters of the speech or of speech units comparable to phonemes. The extracted characteristic time-series pattern of the training speech is stored in the template storage unit 103 in association with the category of that speech.
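The feature extraction of FIG. 12 can be sketched roughly as follows. This is only an illustration under stated assumptions: the mel formula used is the common 2595·log10(1 + f/700) form, which the text does not specify; a naive discrete Fourier transform stands in for the fast Fourier transform (it yields the same spectrum, only more slowly); and the function names and the band count are hypothetical.

```python
import cmath
import math


def hz_to_mel(f_hz):
    # Common mel-scale formula; an assumption, since the text gives none.
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def power_spectrum(frame):
    """Naive DFT power spectrum of one frame, bins 0 .. N/2."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                  for t, x in enumerate(frame))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def mel_band_energies(frame, sample_rate, n_bands):
    """Sum the power spectrum into n_bands bands of equal width in mel."""
    power = power_spectrum(frame)
    n = len(frame)
    top_mel = hz_to_mel(sample_rate / 2.0)
    energies = [0.0] * n_bands
    for k, p in enumerate(power):
        freq = k * sample_rate / n
        # Map the bin frequency to a band index; clamp the Nyquist bin.
        band = min(int(hz_to_mel(freq) / top_mel * n_bands), n_bands - 1)
        energies[band] += p
    return energies
```

Applying `mel_band_energies` to each successive frame gives one vector over j per frame i, that is, a pattern of the form {a_ij}.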

[0005] FIG. 13 schematically shows the data structure of the templates. In the figure, 121 is a category label, and 122 to 125 are templates obtained from input speech. "cat" denotes the speech category; the figure shows the templates corresponding to speech of the categories "A", "I", and so on. k is the category number, l is the template number within the category, and j is the frequency.

[0006] A template is denoted {b_klj}, and, as the figure shows, a plurality of templates l (multi-templates) is extracted for each category k. During training, speech of the various categories is input in this way, a plurality of templates is created for each speech sound, and training ends.
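The multi-template structure of FIG. 13 might be held in memory along the following lines. This is a minimal sketch, not the layout used by the device: the class name `TemplateStore` and its methods are hypothetical, and each template b_kl is assumed to be a plain list of values indexed by frequency j.

```python
from collections import defaultdict


class TemplateStore:
    """Multi-template store: category label -> list of feature vectors b_kl."""

    def __init__(self):
        self._templates = defaultdict(list)

    def add(self, label, feature_vector):
        # Copy the vector so later changes by the caller do not leak in.
        self._templates[label].append(list(feature_vector))

    def get(self, label):
        # Unknown labels give an empty list and are not added to the store.
        return list(self._templates.get(label, []))

    def labels(self):
        return list(self._templates)

    def count(self):
        return sum(len(v) for v in self._templates.values())
```

The index l of {b_klj} corresponds to the position of a vector in the list returned by `get`.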

[0007] During speech recognition, the spectrum analysis unit 101 spectrally analyzes the speech to be recognized. The template reading unit 104 then reads the templates that were created during training and stored in the template storage unit 103, and the matching unit 105 matches them against the spectrum analysis result of the speech to be recognized.

[0008] The distance between the spectrum analysis result of the input speech and each template stored in the template storage unit 103 is computed, and the speech sound associated with the nearest template is recognized as the input speech.

[0009] The matching unit 105 matches speech by the following method. Here, {a_ij} is the input pattern (feature parameters), where i is time and j is frequency; {b_klj} is a template (feature parameters), where k is the category, l is the number within the category, and j is frequency; and {c_i} is the matching result (a sequence of category labels), where i is time.

[0010]
Step 1. For a given category k, calculate the distance between the input pattern a_ij and each template b_klj by equation (1).
Step 2. Using equation (2), take the minimum of the distances obtained by equation (1) as the distance D_ik to this category.
Step 3. Execute steps 1 and 2 for each category (phoneme, etc.).
Step 4. Using equation (3), find the smallest of the category distances thus obtained and determine the nearest category.
Step 5. Execute steps 1 to 4 for each input frame.
In equations (1) to (3), cat() is the category label, D_ik is the distance to category k, and d_ikl is the distance to template l of category k. min denotes the minimum value, and argmin denotes the argument that yields the minimum; equation (3) thus means finding the k that minimizes D_ik and setting c_i = cat(k).
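The frame-by-frame matching procedure of [0010] can be sketched as below. The distance measure is an assumption: equation (1) is not reproduced in this text, so a city-block distance over frequency j is used purely for illustration, and the function names are hypothetical.

```python
def frame_distance(a_i, b_kl):
    # Assumed city-block distance; the actual equation (1) is not available.
    return sum(abs(a - b) for a, b in zip(a_i, b_kl))


def classify_frames(frames, templates, distance=frame_distance):
    """Return the label sequence {c_i} for the input frames {a_ij}.

    `templates` maps each category label to a non-empty list of
    templates b_kl, each a vector over frequency j.
    """
    labels = []
    for a_i in frames:
        best_label, best_dist = None, float("inf")
        for label, category_templates in templates.items():
            # Equation (2): distance to a category is the minimum over l.
            d_ik = min(distance(a_i, b_kl) for b_kl in category_templates)
            # Equation (3): keep the category with the smallest D_ik.
            if d_ik < best_dist:
                best_label, best_dist = label, d_ik
        labels.append(best_label)
    return labels
```

Because the category distance is a minimum over its templates, adding templates to a category can only lower or preserve that distance, which is why a larger template set helps cover more variation of the same sound.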

[0011]

[Equation 1] (equations (1) to (3); the formula image is not reproduced in this text)
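Since the formula image is missing, the following is a hedged reconstruction from the definitions in [0009] and [0010]. Equations (2) and (3) follow directly from the description of min and argmin. The distance measure in equation (1) cannot be recovered from the text, so the city-block form shown is only an assumed example.

```latex
\begin{align}
d_{ikl} &= \sum_{j} \lvert a_{ij} - b_{klj} \rvert
  && \text{(1), assumed form} \\
D_{ik} &= \min_{l} \, d_{ikl}
  && \text{(2)} \\
c_{i} &= \mathrm{cat}\Bigl( \operatorname*{arg\,min}_{k} D_{ik} \Bigr)
  && \text{(3)}
\end{align}
```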

[0012]

[Problem to Be Solved by the Invention] When speech is recognized using templates, recognition performance rises with the number of templates available for each speech category. Increasing the number of templates, however, requires more training words, which increases the burden on the user. The present invention was made in view of this point. Its object, in a speech recognition device that extracts a plurality of templates for speech of the same category and recognizes speech by comparing the input speech with those templates at recognition time, is to increase the number of templates and improve recognition performance without increasing the number of training words.

[0013]

[Means for Solving the Problem] To solve the above problem, the present invention, as shown in FIGS. 1 and 2, concerns a speech recognition device comprising: a spectrum analysis unit 1 that spectrally analyzes input speech; a template extraction unit 2 that extracts, from the analysis result, a characteristic time-series pattern of a phoneme or of a speech unit comparable to a phoneme; a template storage unit 3 that stores the extracted templates as multi-templates for the same category; a template reading unit 4 that reads the stored templates; and a matching unit 5 that matches the input speech against the templates read out. During training, the device extracts a characteristic time-series pattern from the input speech and stores it as a template in the template storage unit 3; during recognition, it reads the templates stored in the template storage unit 3 and recognizes speech by matching them in the matching unit 5 against the characteristic time-series pattern of the input speech. In such a device, claim 1 provides a template inversion unit 6 that inverts the time series of an extracted template and stores the inverted template in the template storage unit 3; the number of templates is increased by using the inverted template for matching as a template of the same category as the template before inversion. Claim 2 adds to the speech recognition device of claim 1 a template selection unit 7, which selects templates whose input-speech features are not lost even when the characteristic time-series pattern is inverted and supplies the selected templates to the template inversion unit 6. Claim 3 adds to the speech recognition device of claim 1 or claim 2 an inversion flag adding unit 8 that attaches an inversion flag to each inverted template, so that matching takes the inversion flag into account. Claim 4 provides an inverted template reading unit 9 that inverts the time series of a template read from the storage unit 3; the inverted template is read out as a template of the same category as the template before inversion and matched against the input speech in the matching unit, and speech is recognized by integrating the result of matching the non-inverted templates against the characteristic time-series pattern of the input speech with the result of matching the inverted templates against that pattern. Claim 5 adds to the speech recognition device of claim 4 a template selection unit 7 that selects templates whose input-speech features are not lost even when the characteristic time-series pattern is inverted, and an inversion flag adding unit 8 that attaches an inversion flag to the templates selected by the template selection unit 7; when the inverted template reading unit 9 inverts and reads templates, it selects the templates carrying the inversion flag and reads them in inverted form. Claim 6 provides an input inversion unit 10 that inverts the time series of the input speech during recognition; the input speech inverted by the input inversion unit 10 is matched against the templates read by the template reading unit, and speech is recognized by integrating the result of matching the characteristic time-series pattern of the non-inverted input speech against the templates with the result of matching the characteristic time-series pattern of the inverted input speech against the templates. Claim 7 adds to the speech recognition device of claim 6 a template selection unit 11 that selects templates whose input-speech features are not lost even when the characteristic time-series pattern is inverted, so that the templates selected by the template selection unit 11 are matched against the output of the input inversion unit 10.

[0014]

[Operation] In claim 1, the template storage unit 3 holds templates obtained by inverting the characteristic time-series patterns of the input speech, so the number of templates can be increased without increasing the input of training words, and speech recognition performance can be improved. In claim 2, the template selection unit 7 is added to the device of claim 1; only templates whose original speech features survive inversion are selected by the template selection unit 7 and inverted by the template inversion unit 6. Unneeded templates are therefore not stored in the template storage unit 3, and the capacity of the storage unit can be reduced. Moreover, the speech to be recognized is never matched against a template that has lost the features of the original speech, so matching performance improves. In claim 3, the inversion flag adding unit 8 is added to the device of claim 1 or claim 2 and attaches an inversion flag to each inverted template so that ordinary templates and inverted templates can be told apart, which improves speech discrimination during matching. In claim 4, the inverted template reading unit 9 is provided; the inverted template is read out as a template of the same category as the template before inversion and matched against the input speech in the matching unit, and speech is recognized by integrating the result of matching the non-inverted templates against the characteristic time-series pattern of the input speech with the result of matching the inverted templates against that pattern. Inverted templates therefore need not be stored in the template storage unit 3, and the same effect as increasing the number of templates is obtained while the capacity of the template storage unit 3 stays equal to that of the conventional device. In claim 5, the template selection unit 7 and the inversion flag adding unit 8 are added to the device of claim 4 so that invertible templates can be identified when templates are read from the template storage unit 3, and only the invertible ones are inverted on reading. The matching unit 5 therefore never matches against a template whose features are lost by inversion, and matching performance improves. In claim 6, the input speech inverted by the input inversion unit 10 is matched against the templates read by the template reading unit, and speech is recognized by integrating the result of matching the characteristic time-series pattern of the non-inverted input speech against the templates with the result of matching the characteristic time-series pattern of the inverted input speech against the templates. An effect exactly equivalent to inverting the time series of the templates is therefore obtained while the capacity of the template storage unit 3 stays equal to that of the conventional device. In claim 7, the template selection unit 7 is added to the device of claim 6 and selects templates whose features are not lost by inversion, so the templates are never matched against inverted speech that has lost the features of the original speech, and the matching performance of the matching unit 5 improves.
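The equivalence stated for claim 6, that inverting the input has the same effect as inverting the templates, can be illustrated with a small sketch. Several things here are assumptions rather than part of the text: the input and the template are taken to be equal-length frame sequences, the distance is a frame-aligned city-block sum, and the two matching results are "integrated" by taking the smaller distance. The function names are hypothetical.

```python
def sequence_distance(xs, ts):
    """Frame-aligned city-block distance between equal-length sequences."""
    if len(xs) != len(ts):
        raise ValueError("sequences must have the same number of frames")
    return sum(abs(x - t)
               for frame_x, frame_t in zip(xs, ts)
               for x, t in zip(frame_x, frame_t))


def best_of_both(input_frames, template):
    """Match the input as-is and time-inverted; keep the smaller distance.

    Taking the minimum is one assumed way to integrate the two results.
    """
    direct = sequence_distance(input_frames, template)
    inverted_input = sequence_distance(input_frames[::-1], template)
    return min(direct, inverted_input)
```

Under this distance, `sequence_distance(x[::-1], t)` and `sequence_distance(x, t[::-1])` pair up exactly the same frames, so they are always equal; no inverted template has to be stored to obtain the second matching result.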

[0015]

[Embodiments] FIG. 3 shows a first embodiment of the present invention. In the figure, 31 is a spectrum analysis unit, 32 is a template extraction unit, 33 is a template storage unit, 34 is a template reading unit, 35 is a matching unit, and 36 is a template inversion unit.

[0016] The spectrum analysis unit 31, template extraction unit 32, template storage unit 33, template reading unit 34, and matching unit 35 in FIG. 3 have the same functions as their counterparts in the conventional example; this embodiment adds to the conventional configuration a template inversion unit 36 that inverts the time series of a template.

[0017] Next, the operation of the device of FIG. 3 is described. During training, as in the conventional example, when the speaker inputs speech in response to, for example, an instruction from the system, the spectrum analysis unit 31 spectrally analyzes the input speech. The template extraction unit 32 extracts part of the spectrum analysis result as a characteristic time-series pattern {a_ij} (i is the input frame, j is the frequency). The extracted characteristic time-series pattern of the training speech is stored in the template storage unit 33 in association with that speech.

[0018] Meanwhile, the template inversion unit 36 inverts the time series of the characteristic time-series pattern {a_ij} extracted by the template extraction unit 32. For example, if {a_1j}, {a_2j}, {a_3j}, {a_4j}, {a_5j} is extracted as a template, the template inversion unit 36 inverts its time series to create the template {a_5j}, {a_4j}, {a_3j}, {a_2j}, {a_1j}. The template inverted in this way is stored in the template storage unit 33 in the same manner as the template extracted by the template extraction unit 32.
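The inversion performed by the template inversion unit 36 amounts to reversing the order of the frames while leaving each frame's frequency content untouched. A minimal sketch, assuming a template is a list of frames and each frame a list of values over j (the function names are hypothetical):

```python
def invert_template(template):
    """Return a copy of the template with its frame order reversed."""
    return [list(frame) for frame in reversed(template)]


def augment_with_inversions(templates_by_category):
    """Store each template together with its time-inverted copy."""
    augmented = {}
    for label, templates in templates_by_category.items():
        inverted = [invert_template(t) for t in templates]
        # The inverted copies keep the category of their originals.
        augmented[label] = [*templates, *inverted]
    return augmented
```

Each category ends up with twice as many templates as were extracted, without any additional training speech.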

[0019] The waveforms of vowels such as A, I, U, E, and O and of some consonants, for example "A" and "I", are repetitions of the same waveform, as shown in FIGS. 4a and 4b, so inverting the time series of a template extracted from such speech does not destroy its features. A template whose time series has been inverted as described above can therefore be used as a template for that speech.

[0020] During speech recognition, as in the conventional example, the spectrum analysis unit 31 spectrally analyzes the speech to be recognized. The template reading unit 34 then reads both the templates stored in the template storage unit 33 and the templates created by inverting their time series, and the matching unit 35 matches them against the spectrum analysis result of the speech to be recognized.

[0021] The distance between the spectrum analysis result of the input speech and each template stored in the template storage unit is computed, and the speech sound associated with the nearest template is recognized as the input speech. As described above, the template storage unit 33 holds, in addition to the templates created from the speech input during training, templates whose time series have been inverted. The number of templates can therefore be increased without increasing the input of training words, and speech recognition performance can be improved.

[0022] FIG. 5 shows a second embodiment of the present invention. In the figure, 31 is a spectrum analysis unit, 32 is a template extraction unit, 33 is a template storage unit, 34 is a template reading unit, 35 is a matching unit, 36 is a template inversion unit, and 37 is a template selection unit. The spectrum analysis unit 31, template extraction unit 32, template storage unit 33, template reading unit 34, matching unit 35, and template inversion unit 36 in FIG. 5 have the same functions as in the first embodiment; this embodiment adds a template selection unit 37 to the configuration of the first embodiment.

[0023] From among the templates extracted by the template extraction unit 32, the template selection unit 37 selects only those whose speech features are not lost by inversion and supplies them to the template inversion unit 36. Speech containing plosives, for example B, K, P, and F, is not periodic, as shown in FIG. 4c (which illustrates "k"); inverting it in time destroys the features of the original speech, so templates made by inverting the time series of such speech are unsuitable for speech discrimination. In this embodiment, therefore, the template selection unit 37 is provided to select the templates whose original speech features survive inversion, and these are supplied to the template inversion unit 36.
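The selection step can be sketched as a filter placed in front of the inversion. The text does not say how the template selection unit 37 decides which templates are safe to invert; the sketch below assumes a fixed set of category labels (the vowels named in [0019]), and both that set and the function name are hypothetical.

```python
# Assumed set of categories whose features survive time inversion.
INVERTIBLE_CATEGORIES = frozenset({"A", "I", "U", "E", "O"})


def build_templates(extracted, invertible=INVERTIBLE_CATEGORIES):
    """Return the templates to store, given (label, template) pairs.

    Every extracted template is kept; an inverted copy is added only
    for categories in `invertible`.
    """
    stored = []
    for label, template in extracted:
        stored.append((label, template))
        if label in invertible:
            stored.append((label, template[::-1]))
    return stored
```

Templates of plosive categories such as "K" pass through unchanged, so no inverted copy of them ever reaches the store or the matching step.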

[0024] In this embodiment, only templates whose original speech features survive inversion are selected and inverted by the template inversion unit 36. Unneeded templates are therefore not stored in the template storage unit 33, so the capacity of the storage unit can be reduced; and because the speech to be recognized is never matched against a template that has lost the features of the original speech, matching performance improves.

[0025] FIG. 6 shows a third embodiment of the present invention. In the figure, 31 is a spectrum analysis unit, 32 is a template extraction unit, 33 is a template storage unit, 34 is a template reading unit, 35 is a matching unit, 36 is a template inversion unit, 37 is a template selection unit, and 38 is an inversion flag adding unit. The spectrum analysis unit 31, template extraction unit 32, template storage unit 33, template reading unit 34, template inversion unit 36, and template selection unit 37 in FIG. 6 have the same functions as in the second embodiment. This embodiment differs from the second embodiment in that an inversion flag adding unit 38 is added to its configuration and in that the matching unit 35 takes the inversion flag attached to a template into account when matching the templates read out against the speech to be recognized; in other respects it is the same as the second embodiment.

[0026] The inversion flag adding unit 38 in this embodiment attaches an inversion flag to each template inverted by the template inversion unit 36 so that templates created by inversion can be distinguished during matching in the matching unit 35. A template created by inversion is strictly secondary to an ordinary template, and its features are less reliable than those of an ordinary template.

[0027] Therefore, when the matching unit 35 matches templates against the speech to be recognized, the feature distortion caused by inversion is taken into account, for example by applying a penalty to inverted templates. One way to apply the penalty is as follows. In the first step of the matching method described for the matching unit of the conventional example, when there is no inversion flag, the distance between the input pattern a_ij and the template b_klj is calculated for each template of a given category k by equation (1). When there is an inversion flag, the distance is calculated with a penalty P (P > 1.0, for example 1.1) applied, as shown in equation (4).
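A penalty of this kind can be sketched as follows. Equation (4) is not reproduced in this text; since P is specified as greater than 1.0, the sketch assumes the penalty multiplies the distance of equation (1), and it again assumes a city-block distance. Both choices, and all names, are illustrative only.

```python
def city_block(a, b):
    # Assumed stand-in for equation (1).
    return sum(abs(x - y) for x, y in zip(a, b))


def penalized_distance(pattern, template, inverted, penalty=1.1):
    """Distance to a template, scaled by `penalty` if it carries the flag."""
    d = city_block(pattern, template)
    return d * penalty if inverted else d


def nearest_label(pattern, flagged_templates, penalty=1.1):
    """Pick the label of the nearest template.

    `flagged_templates` is a list of (label, template, inverted) tuples.
    """
    best = min(flagged_templates,
               key=lambda t: penalized_distance(pattern, t[1], t[2], penalty))
    return best[0]
```

With P slightly above 1.0, an inverted template wins only when it is clearly closer than every ordinary template, which reflects its lower reliability.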

[0028]

[Equation 2] (equation (4); the formula image is not reproduced in this text)

[0029] In this embodiment, ordinary templates and inverted templates can be distinguished during matching as described above, so speech discrimination performance can be improved.

[0030] FIG. 7 shows a fourth embodiment of the present invention. In the figure, 31 is a spectrum analysis unit, 32 is a template extraction unit, 33 is a template storage unit, 34 is a template reading unit, 35 is a matching unit, and 39 is an inverted template reading unit. The spectrum analysis unit 31, template extraction unit 32, template storage unit 33, template reading unit 34, and matching unit 35 in FIG. 7 have the same functions as in the first and second embodiments. This embodiment differs from the first embodiment in that the template inversion unit 36 is removed and an inverted template reading unit 39 is provided in its place; in other respects it is the same as the first embodiment.

[0031] The inverted template reading unit 39 in FIG. 7 inverts each template as it is read from the template storage unit 33, creating an inverted template that it supplies to the matching unit 35. It differs from the first embodiment in that the means for inverting templates is placed on the output side of the template storage unit 33. Because templates are inverted when they are read from the template storage unit 33, inverted templates need not be stored there, and the number of templates can be increased while the capacity of the template storage unit 33 stays equal to that of the conventional device.
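Inverting at read time rather than at storage time can be sketched with a generator that produces each stored template followed by its time-inverted form. This is an illustration under assumptions: the store is taken to be a mapping from category label to a list of templates, each template a list of frames, and the function name is hypothetical.

```python
def read_templates(store):
    """Yield (label, template) for each stored template, then its inversion.

    The inverted form is built on the fly and never written to `store`.
    """
    for label, templates in store.items():
        for template in templates:
            yield label, template
            yield label, template[::-1]
```

The matching step sees twice as many templates as the store holds, while the store itself stays the size of the conventional one.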

【００３２】図８は本発明の第５の実施例を示す図であ
り、同図において、３１はスペクトル分析部、３２はテ
ンプレート抽出部、３３はテンプレート記憶部、３４は
テンプレート読み出し部、３５は照合部、３７はテンプ
レート選択部、３８は反転フラグ付加部、３９は反転テ
ンプレート読み出し部である。図８におけるスペクトル
分析部３１、テンプレート抽出部３２、テンプレート記
憶部３３、テンプレート読み出し部３４、照合部３５、
テンプレート選択部３７、反転フラグ付加部３８は第３
の実施例のものと同様の機能を持つ手段である。そし
て、本実施例においては第３の実施例の構成のものと較
べ、テンプレート反転部３６が除去され、かわりに、反
転テンプレート読み出し部３９が設けられている点で相
違し、その他の点においては第３の実施例のものと同様
である。また、反転テンプレート読み出し部３９は第４
の実施例における反転テンプレート読み出し部３９と同
等の機能を持つ。FIG. 8 shows a fifth embodiment of the present invention. In FIG. 8, reference numeral 31 denotes a spectrum analyzing unit, 32 a template extracting unit, 33 a template storage unit, 34 a template reading unit, 35 a collating unit, 37 a template selecting unit, 38 an inversion flag adding unit, and 39 an inverted template reading unit. The spectrum analyzing unit 31, template extracting unit 32, template storage unit 33, template reading unit 34, collating unit 35, template selecting unit 37, and inversion flag adding unit 38 in FIG. 8 have the same functions as those of the third embodiment. This embodiment differs from the configuration of the third embodiment in that the template inverting unit 36 is removed and an inverted template reading unit 39 is provided instead; in all other respects it is the same as the third embodiment. The inverted template reading unit 39 has a function equivalent to that of the inverted template reading unit 39 in the fourth embodiment.

【００３３】図８のものは、テンプレート選択部３７に
おいて、反転しても音声の特徴の失われないテンプレー
トを選択し、反転フラグ付加部３８においてそのテンプ
レートに反転フラグを付加し、テンプレート読み出し
時、反転フラグが付されたテンプレートのみを反転す
る。In the apparatus of FIG. 8, the template selecting unit 37 selects templates whose speech features are not lost by inversion, the inversion flag adding unit 38 adds an inversion flag to those templates, and at template read-out only the templates carrying the inversion flag are inverted.

【００３４】本実施例においては、反転可能なテンプレ
ートのみを反転しているため、反転することによりと特
徴の失われるテンプレートを用いて照合部３５において
照合することがなく、照合時の性能が向上するととも
に、第４の実施例と同様、テンプレート記憶部３３から
テンプレートの読み出し時に反転したテンプレートを作
成しているため、テンプレート記憶部３３に反転テンプ
レートを格納する必要はなく、テンプレート記憶部３３
の容量を従来の装置と同等としたまま、テンプレートの
数を増やすことができる。In this embodiment, only invertible templates are inverted, so the collating unit 35 never matches against a template whose features have been lost through inversion, and matching performance improves. In addition, as in the fourth embodiment, inverted templates are created when templates are read from the template storage unit 33, so inverted templates need not be stored there, and the number of templates can be increased while the capacity of the template storage unit 33 remains the same as in the conventional apparatus.
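
The flag-controlled read-out of the fifth embodiment can be sketched in the same way. In the Python fragment below, the inversion flag is modeled as a boolean field named `invertible`, and the names `StoredTemplate` and `read_flagged` are hypothetical. How a template is judged to be invertible is outside the scope of this sketch, which simply takes the flag as given.

```python
from typing import List, NamedTuple, Tuple


class StoredTemplate(NamedTuple):
    category: str
    frames: List[List[float]]
    invertible: bool  # inversion flag (assumed here to be a plain boolean)


def read_flagged(storage: List[StoredTemplate]) -> List[Tuple[str, List[List[float]]]]:
    """Return every stored template, plus a reversed copy of each flagged one."""
    out = []
    for t in storage:
        out.append((t.category, t.frames))
        if t.invertible:
            # Only templates carrying the inversion flag are inverted on read-out.
            out.append((t.category, t.frames[::-1]))
    return out


storage = [
    StoredTemplate("a", [[1.0], [2.0]], True),
    StoredTemplate("k", [[5.0], [9.0]], False),
]
read_out = read_flagged(storage)
```

Here the unflagged template is passed through unchanged, so no reversed copy of it ever reaches the collating stage.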

【００３５】図９は本発明の第６の実施例を示す図であ
り、同図において、３１はスペクトル分析部、３２はテ
ンプレート抽出部、３３はテンプレート記憶部、３４は
テンプレート読み出し部、３５は照合部、３５−１は通
常照合部、３５−２は反転照合部、３５−３は統合部、
４０は入力反転部である。FIG. 9 shows a sixth embodiment of the present invention. In FIG. 9, reference numeral 31 denotes a spectrum analyzing unit, 32 a template extracting unit, 33 a template storage unit, 34 a template reading unit, 35 a collating unit, 35-1 a normal collating unit, 35-2 an inverted collating unit, 35-3 an integrating unit, and 40 an input inverting unit.

【００３６】図９におけるスペクトル分析部３１、テン
プレート抽出部３２、テンプレート記憶部３３、テンプ
レート読み出し部３４は第１，２の実施例のものと同等
の機能を持つ手段であり、本実施例においては上記構成
に加え、入力反転部４０が設けられているとともに、照
合部３５に通常照合部３５−１、反転照合部３５−２、
統合部３５−３が設けられている点が相違している。図
９における入力反転部４０は入力音声を反転する手段で
あり、また、照合部３５における通常照合部３５−１は
入力音声とテンプレート読み出し部３４により読み出さ
れた通常のテンプレートとを照合する手段、反転照合部
３５−２は入力反転部４０で反転された入力音声とテン
プレート読み出し部３４により読み出された通常のテン
プレートとを照合する手段、統合部３５−３は通常照合
部３５−１と反転照合部３５−２における照合結果を統
合する手段である。The spectrum analyzing unit 31, template extracting unit 32, template storage unit 33, and template reading unit 34 in FIG. 9 have functions equivalent to those of the first and second embodiments. This embodiment differs in that, in addition to the above configuration, an input inverting unit 40 is provided and the collating unit 35 contains a normal collating unit 35-1, an inverted collating unit 35-2, and an integrating unit 35-3. The input inverting unit 40 in FIG. 9 inverts the input speech. Within the collating unit 35, the normal collating unit 35-1 matches the input speech against the normal templates read by the template reading unit 34, the inverted collating unit 35-2 matches the input speech inverted by the input inverting unit 40 against the normal templates read by the template reading unit 34, and the integrating unit 35-3 integrates the matching results of the normal collating unit 35-1 and the inverted collating unit 35-2.

【００３７】図９において、音声の学習時には、第１の
実施例のものと同様、テンプレート抽出部３２において
学習用の音声のスペクトル分析結果からテンプレートを
抽出し、テンプレート記憶部３３に記憶させる。音声の
認識時には、認識音声とテンプレート記憶部３３に記憶
されたテンプレートを照合部３５に設けられた通常照合
部３５−１で比較するとともに、入力反転部２０におい
て認識音声を反転し、反転した音声とテンプレート記憶
部３３に記憶されたテンプレートを反転照合部３５−２
において照合する。In FIG. 9, during learning, as in the first embodiment, the template extracting unit 32 extracts templates from the spectrum analysis result of the training speech and stores them in the template storage unit 33. During recognition, the speech to be recognized is compared with the templates stored in the template storage unit 33 by the normal collating unit 35-1 provided in the collating unit 35, while the input inverting unit 40 inverts the speech to be recognized and the inverted collating unit 35-2 matches the inverted speech against the templates stored in the template storage unit 33.

【００３８】統合部３５−３は通常照合部３５−１にお
ける照合結果と反転照合部３５−２における照合結果を
統合して、音声の認識結果を出力する。例えば、通常照
合部３５−１において求めたテンプレートと認識音声の
間の距離と反転照合部３５−２において求めたテンプレ
ートと認識音声の間の距離の近い方を統合部で選択し、
認識音声とする。本実施例においては、識別時の入力音
声を反転しているので、第４，５の実施例のようにテン
プレート記憶部３３の容量を従来の装置と同等としたま
ま、テンプレートの時系列を反転した場合と全く同等の
効果をうることができる。The integrating unit 35-3 integrates the matching result of the normal collating unit 35-1 and the matching result of the inverted collating unit 35-2 and outputs the speech recognition result. For example, the integrating unit compares the distance between the template and the speech to be recognized obtained by the normal collating unit 35-1 with the corresponding distance obtained by the inverted collating unit 35-2, selects the smaller of the two, and takes it as the recognition result. In this embodiment, the input speech is inverted at recognition time, so, as in the fourth and fifth embodiments, exactly the same effect as inverting the time series of the templates is obtained while the capacity of the template storage unit 33 remains the same as in the conventional apparatus.
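
A minimal Python sketch of this integration step follows. The patent text here does not specify the distance measure, so a plain dynamic time warping (DTW) distance over scalar frames is used as a stand-in; `dtw_distance` and `recognize` are hypothetical names, and the comments map each step to the unit numbers of FIG. 9.

```python
from typing import List, Optional, Sequence, Tuple


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain DTW with absolute-difference frame cost (assumed stand-in measure)."""
    inf = float("inf")
    n, m = len(a), len(b)
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


def recognize(
    input_seq: List[float], templates: List[Tuple[str, List[float]]]
) -> Optional[Tuple[str, float]]:
    """Return (category, distance) of the closest template, or None if there are none."""
    best = None
    inverted_input = input_seq[::-1]                 # input inverting unit 40
    for category, tmpl in templates:
        normal = dtw_distance(input_seq, tmpl)       # normal collating unit 35-1
        inverted = dtw_distance(inverted_input, tmpl)  # inverted collating unit 35-2
        dist = min(normal, inverted)                 # integrating unit 35-3
        if best is None or dist < best[1]:
            best = (category, dist)
    return best


templates = [("up", [1.0, 2.0, 3.0]), ("flat", [2.0, 2.0, 2.0])]
result = recognize([3.0, 2.0, 1.0], templates)
```

With this particular DTW, matching the reversed input against a template gives the same distance as matching the original input against the reversed template, which is consistent with the equivalence the paragraph states. That property depends on the distance measure chosen and is not guaranteed for every matcher.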

【００３９】図１０は本発明の第７の実施例を示す図で
あり、同図において、３１はスペクトル分析部、３２は
テンプレート抽出部、３３はテンプレート記憶部、３４
はテンプレート読み出し部、３５は照合部、３５−１は
通常照合部、３５−２は反転照合部、３５−３は統合
部、３７はテンプレート選択部、４０は入力反転部であ
る。FIG. 10 shows a seventh embodiment of the present invention. In FIG. 10, reference numeral 31 denotes a spectrum analyzing unit, 32 a template extracting unit, 33 a template storage unit, 34 a template reading unit, 35 a collating unit, 35-1 a normal collating unit, 35-2 an inverted collating unit, 35-3 an integrating unit, 37 a template selecting unit, and 40 an input inverting unit.

【００４０】図１０におけるスペクトル分析部３１、テ
ンプレート抽出部３２、テンプレート記憶部３３、テン
プレート読み出し部３４、照合部３５、通常照合部３５
−１、反転照合部３５−２、統合部３５−３、入力反転
部４０は第６の実施例のものと同等の機能を持つ手段で
あり、本実施例においては上記構成に加え、テンプレー
ト選択部３７設けられている点が相違している。The spectrum analyzing unit 31, template extracting unit 32, template storage unit 33, template reading unit 34, collating unit 35, normal collating unit 35-1, inverted collating unit 35-2, integrating unit 35-3, and input inverting unit 40 in FIG. 10 have functions equivalent to those of the sixth embodiment. This embodiment differs in that, in addition to the above configuration, a template selecting unit 37 is provided.

【００４１】図１０におけるテンプレート選択部３７
は、第２の実施例のものと同様、テンプレート読み出し
部３４から読み出されたテンプレートのうち、反転する
ことにより音声の特徴が失われないテンプレートのみを
選択し出力する手段である。図１０において、音声の学
習時には、第１の実施例のものと同様、テンプレート抽
出部３２において学習用の音声のスペクトル分析結果か
らテンプレートを抽出し、テンプレート記憶部３３に記
憶させる。音声の認識時には、認識音声とテンプレート
記憶部３３に記憶されたテンプレートを照合部３５に設
けられた通常照合部３５−１で比較するとともに、入力
反転部４０において認識音声を反転する。テンプレート
選択部３７は、反転することにより音声の特徴が失われ
ないテンプレートのみを選択し反転照合部３５−２にあ
たえる。反転照合部３５−２は反転した音声とテンプレ
ート選択部３７により選択されたテンプレートを反転照
合部３５−２において照合する。As in the second embodiment, the template selecting unit 37 in FIG. 10 selects and outputs, from the templates read by the template reading unit 34, only those templates whose speech features are not lost by inversion. In FIG. 10, during learning, as in the first embodiment, the template extracting unit 32 extracts templates from the spectrum analysis result of the training speech and stores them in the template storage unit 33. During recognition, the speech to be recognized is compared with the templates stored in the template storage unit 33 by the normal collating unit 35-1 provided in the collating unit 35, and the input inverting unit 40 inverts the speech to be recognized. The template selecting unit 37 selects only the templates whose speech features are not lost by inversion and supplies them to the inverted collating unit 35-2. The inverted collating unit 35-2 matches the inverted speech against the templates selected by the template selecting unit 37.

【００４２】本実施例においては、反転しても特徴の失
われることのないテンプレートをテンプレート選択部３
７で選択しているので、反転照合部３５−２における照
合時の性能を向上させることができる。なお、第６およ
び第７の実施例において、反転照合部３５−２における
照合時、第３の実施例に示したように、ペナルティーを
つけて照合することもできる。ペナルティーをつけるこ
とにより、反転することによって生ずる特徴の歪みを考
慮することができるので、音声の識別性能を一層向上さ
せることができる。In this embodiment, the template selecting unit 37 selects templates whose features are not lost by inversion, so the matching performance of the inverted collating unit 35-2 can be improved. In the sixth and seventh embodiments, the inverted collating unit 35-2 can also apply a penalty during matching, as shown in the third embodiment. Applying a penalty makes it possible to account for the feature distortion caused by inversion, so speech discrimination performance can be improved further.
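
The combination of template selection and a penalty on inverted matching can be sketched as follows. The additive form of the penalty, its default value, the placeholder `frame_distance`, and the name `recognize_with_penalty` are all assumptions for illustration; the text above says only that a penalty may be applied, not how it is computed.

```python
from typing import List, Optional, Sequence, Tuple


def frame_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Placeholder distance: summed absolute difference of equal-length sequences."""
    return float(sum(abs(x - y) for x, y in zip(a, b)))


def recognize_with_penalty(
    input_seq: List[float],
    templates: List[Tuple[str, List[float], bool]],
    penalty: float = 0.5,
) -> Optional[Tuple[str, float]]:
    """templates holds (category, sequence, invertible) triples."""
    best = None
    inverted_input = input_seq[::-1]  # inverted speech to be recognized
    for category, tmpl, invertible in templates:
        dist = frame_distance(input_seq, tmpl)  # normal matching
        if invertible:
            # Inverted matching is tried only for selected templates and is
            # handicapped by an (assumed additive) penalty.
            dist = min(dist, frame_distance(inverted_input, tmpl) + penalty)
        if best is None or dist < best[1]:
            best = (category, dist)
    return best


templates = [("a", [1.0, 2.0, 3.0], True), ("b", [3.0, 2.0, 2.0], False)]
result = recognize_with_penalty([3.0, 2.0, 1.0], templates)
```

With a larger penalty, for example `penalty=2.0`, the same input is assigned to category "b" instead, which shows how the penalty trades off trust in inverted matches against normal ones.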

【００４３】[0043]

【発明の効果】以上説明したように、本発明によれば、
学習用の単語を増やすことなく、音声識別性能を向上す
ることができる。特に請求項１ないし請求項７の発明に
よれば、次の効果を得ることができる。請求項１によれ
ば、学習時、入力音声の時系列パターンを反転したテン
プレートを作成して記憶し、認識時に上記反転テンプレ
ートを用いて認識音声と照合するので、学習用の単語を
増やすことなく、テンプレートの個数を増やすことがで
き、音声認識装置性能を向上することができる。請求項
２によれば、請求項１の効果に加え、反転しても元の音
声の特徴が失われないテンプレートのみをテンプレート
選択部により選択して、テンプレート反転部１６で反転
しているので、テンプレート記憶部の容量を小さくでき
るとともに、元の音声の特徴が失われたテンプレートと
認識音声を照合することがないので、照合時の性能を向
上させることができる。請求項３によれば、請求項１の
効果に加え、反転したテンプレートに反転フラグを付加
し、通常のテンプレートと反転テンプレートを区別でき
るようにしているので、照合時の音声識別性能を向上さ
せることができる。請求項４によれば、テンプレート記
憶部からテンプレートを読み出す時にテンプレートを反
転しているため、テンプレート記憶部３３の容量を従来
の装置と同等としたまま、テンプレートの数を増やした
のと同等の効果を得ることができる。請求項５によれ
ば、請求項４の効果に加え、反転可能なテンプレートに
反転フラグをつけてテンプレート記憶部に格納している
ため、反転可能なもののみを反転して読み出だすことが
でき、照合時の性能を向上させることができる。請求項
６によれば、識別時の入力音声を反転し照合部で照合し
ているので、テンプレート記憶部の容量を従来の装置と
同等としたまま、テンプレートの時系列を反転した場合
と全く同等の効果を得ることができる。請求項７によれ
ば、請求項６の効果に加え、反転しても特徴の失われる
ことのないテンプレートを選択して反転した入力音声と
照合しているので、照合時、元の音声の特徴が失われた
反転音声とテンプレートを照合することがなく、照合時
の性能を向上させることができる。As described above, according to the present invention, speech recognition performance can be improved without increasing the number of training words. In particular, the inventions of claims 1 to 7 provide the following effects. According to claim 1, during learning a template in which the time-series pattern of the input speech is inverted is created and stored, and during recognition the inverted template is matched against the speech to be recognized, so the number of templates can be increased without increasing the number of training words, and the performance of the speech recognition apparatus can be improved. According to claim 2, in addition to the effect of claim 1, only templates that do not lose the features of the original speech when inverted are selected by the template selecting unit and inverted by the template inverting unit 16, so the capacity of the template storage unit can be reduced, and templates that have lost the features of the original speech are never matched against the speech to be recognized, which improves matching performance. According to claim 3, in addition to the effect of claim 1, an inversion flag is added to each inverted template so that normal and inverted templates can be distinguished, which improves speech discrimination performance during matching. According to claim 4, templates are inverted when they are read from the template storage unit, so the same effect as increasing the number of templates is obtained while the capacity of the template storage unit 33 remains the same as in the conventional apparatus. According to claim 5, in addition to the effect of claim 4, invertible templates are stored in the template storage unit with an inversion flag, so only invertible templates are inverted on read-out, which improves matching performance. According to claim 6, the input speech is inverted at recognition time and matched by the collating unit, so exactly the same effect as inverting the time series of the templates is obtained while the capacity of the template storage unit remains the same as in the conventional apparatus. According to claim 7, in addition to the effect of claim 6, templates that do not lose their features when inverted are selected and matched against the inverted input speech, so a template is never matched against inverted speech in which the features of the original speech have been lost, which improves matching performance.

[Brief description of the drawings]

【図１】本発明の構成を示す図である。FIG. 1 is a diagram showing a configuration of the present invention.

【図２】本発明の構成を示す図である。FIG. 2 is a diagram showing a configuration of the present invention.

【図３】本発明の第１の実施例を示す図である。FIG. 3 is a diagram showing a first embodiment of the present invention.

【図４】音声の波形を示す図である。FIG. 4 is a diagram showing a waveform of a sound.

【図５】本発明の第２の実施例を示す図である。FIG. 5 is a diagram showing a second embodiment of the present invention.

【図６】本発明の第３の実施例を示す図である。FIG. 6 is a diagram showing a third embodiment of the present invention.

【図７】本発明の第４の実施例を示す図である。FIG. 7 is a diagram showing a fourth embodiment of the present invention.

【図８】本発明の第５の実施例を示す図である。FIG. 8 is a diagram showing a fifth embodiment of the present invention.

【図９】本発明の第６の実施例を示す図である。FIG. 9 is a diagram showing a sixth embodiment of the present invention.

【図１０】本発明の第７の実施例を示す図である。FIG. 10 is a diagram showing a seventh embodiment of the present invention.

【図１１】従来例を示す図である。FIG. 11 is a diagram showing a conventional example.

【図１２】音声から特徴パラメータを抽出するためのブ
ロック図である。FIG. 12 is a block diagram for extracting feature parameters from speech.

【図１３】テンプレートのデータ構造を示す図である。FIG. 13 is a diagram showing a data structure of a template.

[Explanation of symbols]

３１ スペクトル分析部、３２ テンプレート抽出部、３３ テンプレート記憶部、３４ テンプレート読み出し部、３５ 照合部、３５−１ 通常照合部、３５−２ 反転照合部、３５−３ 統合部、３６ テンプレート反転部、３７ テンプレート選択部、３８ 反転フラグ付加部、３９ 反転テンプレート読み出し部、４０ 入力反転部
31: spectrum analyzing unit; 32: template extracting unit; 33: template storage unit; 34: template reading unit; 35: collating unit; 35-1: normal collating unit; 35-2: inverted collating unit; 35-3: integrating unit; 36: template inverting unit; 37: template selecting unit; 38: inversion flag adding unit; 39: inverted template reading unit; 40: input inverting unit

フロントページの続き (58)調査した分野(Int.Cl.⁷，ＤＢ名) G10L 15/00 - 17/00 Continuation of front page (58) Field searched (Int. Cl.⁷, DB name): G10L 15/00 - 17/00

Claims

(57) [Claims]

1. A speech recognition apparatus comprising: a spectrum analysis unit that analyzes the spectrum of input speech; a template extraction unit that extracts, from the analysis result, a characteristic time-series pattern of a phoneme or of a speech unit comparable to a phoneme; a template storage unit that stores a plurality of the extracted templates for the same category; a template reading unit that reads the stored templates; and a matching unit that matches the input speech against the read templates, wherein, during learning, a characteristic time-series pattern is extracted from the input speech and stored in the template storage unit as a template, and, during recognition, the templates stored in the template storage unit are read out and matched by the matching unit against the characteristic time-series pattern of the input speech to recognize the speech, the apparatus being characterized in that a template inverting unit that inverts the time series of a template is provided, and the inverted template is stored in the template storage unit as a template of the same category as the template before inversion.

2. The speech recognition apparatus according to claim 1, further comprising a template selection unit that selects templates that do not lose the characteristics of the input speech even when their characteristic time-series pattern is inverted, wherein the template inverting unit inverts the templates selected by the template selection unit.

3. The speech recognition apparatus according to claim 1, further comprising an inversion flag adding unit that adds an inversion flag to an inverted template, wherein the matching is performed in consideration of the inversion flag.

4. A speech recognition apparatus comprising: a spectrum analysis unit that analyzes the spectrum of input speech; a template extraction unit that extracts, from the analysis result, a characteristic time-series pattern of a phoneme or of a speech unit comparable to a phoneme; a template storage unit that stores a plurality of the extracted templates for the same category; a template reading unit that reads the stored templates; and a matching unit that matches the input speech against the read templates, wherein, during learning, a characteristic time-series pattern is extracted from the input speech and stored in the template storage unit as a template, and, during recognition, the templates stored in the template storage unit are read out and matched by the matching unit against the characteristic time-series pattern of the input speech to recognize the speech, the apparatus being characterized in that an inverted template reading unit that inverts the time series of a template read from the template storage unit is provided, the inverted template is read out by the inverted template reading unit as a template of the same category as the template before inversion and matched by the matching unit against the input speech, and the result of matching the non-inverted template against the characteristic time-series pattern of the input speech and the result of matching the inverted template against the characteristic time-series pattern of the input speech are integrated to recognize the speech.

5. The speech recognition apparatus according to claim 4, further comprising: a template selection unit that selects templates that do not lose the characteristics of the input speech even when their characteristic time-series pattern is inverted; and an inversion flag adding unit that adds an inversion flag to the templates selected by the template selection unit, wherein, when templates are inverted and read by the inverted template reading unit, the templates to which the inversion flag has been added are selected for inverted reading.

6. A speech recognition apparatus comprising: a spectrum analysis unit that analyzes the spectrum of input speech; a template extraction unit that extracts, from the analysis result, a characteristic time-series pattern of a phoneme or of a speech unit comparable to a phoneme; a template storage unit that stores a plurality of the extracted templates for the same category; a template reading unit that reads the stored templates; and a matching unit that matches the input speech against the read templates, wherein, during learning, a characteristic time-series pattern is extracted from the input speech and stored in the template storage unit as a template, and, during recognition, the templates stored in the template storage unit are read out and matched by the matching unit against the characteristic time-series pattern of the input speech to recognize the speech, the apparatus being characterized in that an input inverting unit that inverts the time series of the input speech during recognition is provided, the input speech inverted by the input inverting unit is matched against the templates read by the template reading unit, and the result of matching the characteristic time-series pattern of the non-inverted input speech against the templates and the result of matching the characteristic time-series pattern of the inverted input speech against the templates are integrated to recognize the speech.

7. The speech recognition apparatus according to claim 6, further comprising a template selection unit that selects templates that do not lose the characteristics of the input speech even when their characteristic time-series pattern is inverted, wherein the templates selected by the template selection unit are matched against the output of the input inverting unit.