JPS62145298A

JPS62145298A - Voice recognition equipment

Info

Publication number: JPS62145298A
Application number: JP28579485A
Authority: JP
Inventors: 宮芝　晃一; 恭則大洞
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 1985-12-20
Filing date: 1985-12-20
Publication date: 1987-06-29

Abstract

(57) [Abstract] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

DETAILED DESCRIPTION OF THE INVENTION [Field of Industrial Application] The present invention relates to a speech recognition device that recognizes input speech information.

[Prior Art] A speech recognition device first performs A/D conversion on the input speech and sends the output to a feature extraction unit. The feature extraction unit calculates the power information of the speech and computes its spectral information using techniques such as the fast Fourier transform.

The types of information held by the standard patterns stored in the standard pattern storage unit match the types of information calculated by the feature extraction unit. In pattern matching, similarity is first calculated separately for each type of information shared by the input speech and the standard pattern; the final similarity is then obtained by multiplying each of these individual values by a predetermined weight and adding the results together.
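The weighted combination described above can be sketched as follows. This is a minimal illustration, not the device's actual computation: the feature names (`power`, `spectrum`), the weight values, and the function name `combined_similarity` are all assumptions introduced here.

```python
def combined_similarity(per_feature_similarity, weights):
    """Weighted sum of per-feature similarities.

    Both arguments are dicts keyed by feature kind (hypothetical names).
    """
    if per_feature_similarity.keys() != weights.keys():
        raise ValueError("similarities and weights must cover the same feature kinds")
    return sum(weights[kind] * per_feature_similarity[kind]
               for kind in per_feature_similarity)


# Example with made-up similarities and weights.
score = combined_similarity(
    {"power": 0.5, "spectrum": 1.0},
    {"power": 0.25, "spectrum": 0.75},
)
```

The weights play the role of the "predetermined values" in the text; how they are chosen is not stated in the source.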

In conventional speech recognition devices, the discrimination between voiced and unvoiced sounds, the discrimination between silence and voiceless consonants among unvoiced sounds, and the discrimination between vowels and nasal consonants among voiced sounds have been made by using the power information of the speech, or by dividing the frequency band into low, middle, and high bands and using the ratio of the frequency components contained in each band.

[Problems to Be Solved by the Invention] However, when the input speech contains a large amount of noise, the power information of a word-initial consonant is often buried in the power of the noise. Even for a consonant in the middle of a word, its spectrum is pulled toward the spectra of the preceding and following vowels, so qualitative information is lost and the consonant is not easy to identify.

In addition, the spectrum of the vowel /u/ is very similar to the spectra of the nasal consonants /m/ and /n/, so the rate of misidentification among them was also high.

The present invention has been made in view of the above shortcomings of the prior art. Its object is to provide a speech recognition device that, by incorporating information about the input speech waveform into the speech recognition method, shortens the processing time during matching and achieves a high recognition rate.

[Means for Solving the Problems] As one means of solving these problems, the speech recognition device of this embodiment, for example, comprises first calculating means for obtaining the peak level of the waveform of the speech information, second calculating means for calculating the temporal change in the peak level obtained by the first calculating means, and integrating means for integrating the temporal change in the peak level of the speech information obtained by the second calculating means into the features of the speech information.

[Operation] In the configuration of this embodiment, the temporal change in the peak level of the input speech information is newly integrated into the features of the speech information, and the optimal recognition result obtained in this way is output.

[Embodiments] An embodiment of the present invention is described in detail below with reference to the accompanying drawings.

Fig. 1 is a block diagram of the speech recognition device of this embodiment.

In the figure, 1 is a microphone, the speech input unit that converts speech into an electrical signal. 2 is an A/D converter that samples the speech converted into an electrical signal, for example every 5 to 10 msec, and quantizes it, converting analog to digital. 3 is a buffer memory that temporarily stores the output of the A/D converter 2. 4 is a peak value detection circuit that sequentially reads data from the buffer memory 3 and obtains the peak values; 4a is the central processing unit (CPU) in the peak value detection circuit 4, 4b is a ROM that stores the program for the flowchart shown in Fig. 3 (described later), and 4c is a RAM used as a work area or as the buffer for d(1), d(2), and d(3), which are used when obtaining the peak values as described later. 5 is a buffer memory that temporarily stores the output of the peak value detection circuit 4, and 6 is a peak value change calculation circuit that calculates the temporal change in the peak values. 7 is a feature extraction unit that has a bank of band-pass filters dividing the frequency range of 200 to 6000 Hz into 8 to 30 channels and extracts features such as power information and spectral information. 8 is a buffer memory that holds the input speech features until the features related to the peak values have been calculated. 9 is a feature pattern integration unit that integrates the output of the feature extraction unit with the features related to the peak values to create the feature pattern of the input speech. 10 is a memory that stores the standard patterns. 11 is a pattern matching unit that compares the input speech features with the standard patterns read from the memory 10 and calculates the similarity between them. 12 is a recognition result output unit that outputs, as the recognition result, the standard pattern with the highest similarity calculated by the pattern matching unit.

The operation of this embodiment is described in detail below.

First, the input speech is converted into digital values by the A/D converter 2, and the output is sent via the buffer memory 3 to the peak value detection circuit 4 and the feature extraction unit 7. The sampling frequency of the A/D conversion and the number of quantization bits per sample are variable; in this embodiment, sampling is performed at 12 kHz and each sample is quantized with 12 bits (one of the 12 bits being a sign bit). In this case, one second of speech is represented by 12,000 data points.

Fig. 2 is a graphical display of the output of the A/D converter 2.

Because the A/D conversion is performed in real time, the buffer memory 3 is placed in front of the peak value detection circuit 4, and the peak value detection circuit 4 carries out its processing by sequentially reading sample data from the buffer memory 3.

Fig. 3 is a flowchart of the processing performed by the CPU 4a inside the peak value detection circuit 4.

Here, d(1), d(2), and d(3) form an array that stores the data read from the buffer memory 3, where 1, 2, and 3 indicate the order of the sample data. d+ denotes a positive peak value and d- denotes a negative peak value.

First, in step S1, the first two data items stored in the buffer memory 3 are read into d(1) and d(2), respectively. In step S2, it is determined whether all the data in the buffer memory 3 has been read; if so, the processing ends at step S9. At this point the data has not yet been exhausted, so the processing proceeds to step S3, where the next data item is read from the buffer memory 3 into d(3).

In steps S4 and S6, the magnitude relationships among d(1), d(2), and d(3) are examined. For example, d(2) is a positive peak value when d(1) < d(2) and d(2) > d(3) with d(2) > 0, and d(2) is a negative peak value when d(1) > d(2) and d(2) < d(3) with d(2) < 0. When the respective condition is satisfied, d(2) is stored in d+ or d- in step S5 or S7, together with the sequence number of that data item, and the processing proceeds to step S8.

If neither condition is satisfied, the processing proceeds directly to step S8. In step S8, the current d(2) is stored in d(1) and, likewise, d(3) is stored in d(2). The processing then returns to step S2, where it is determined whether the data has been exhausted. If not, new data is read from the buffer memory 3 and stored in d(3), and the same processing is repeated until all the data has been processed.

In this case, the number of data points is the measurement time in seconds multiplied by 12,000.

As described above, the peak values are obtained as follows.

[Obtaining a positive peak value] If d(1) ≦ d(2), d(2) is then compared with d(3). If d(3) < d(2), d(2) is a peak, so the value of d(2) is a peak value. Whether it is a positive peak value can be determined by checking the sign of d(2). If d(2) > 0, d(2) is assigned to d+, the values of d+ and n are stored, and the processing from step S8 onward is performed.

In all other cases, for example when d(3) ≧ d(2) or d(2) ≦ 0, the processing from step S8 onward is performed directly.

[Obtaining a negative peak value] If d(1) ≧ d(2), d(2) is then compared with d(3). If d(3) > d(2), d(2) is a peak, so the value of d(2) is a peak value. Whether it is a negative peak value can be determined by checking the sign of d(2). If d(2) < 0, d(2) is assigned to d-, the values of d- and n are stored, and the processing from step S8 onward is performed.

In all other cases, for example when d(3) ≦ d(2) or d(2) ≧ 0, the processing from step S8 onward is performed directly.
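The peak value detection loop of Fig. 3 can be sketched in Python as below. This is an illustration under assumptions, not the program stored in ROM 4b: the function name `detect_peaks` is hypothetical, the buffer memory is modeled as a plain list, indices are 0-based, and the comparisons follow the detailed explanation above (d(1) ≦ d(2), d(3) < d(2), d(2) > 0 for a positive peak, and the mirrored conditions for a negative peak).

```python
def detect_peaks(samples):
    """Return (positive_peaks, negative_peaks) as lists of (n, value).

    n is the 0-based sample index of the peak; value is the sample there.
    """
    positive, negative = [], []
    if len(samples) < 3:
        return positive, negative

    d1, d2 = samples[0], samples[1]            # step S1: read the first two samples
    for n in range(2, len(samples)):           # steps S2/S3: read the next sample
        d3 = samples[n]
        if d1 <= d2 and d3 < d2 and d2 > 0:    # steps S4/S5: positive peak at d(2)
            positive.append((n - 1, d2))
        elif d1 >= d2 and d3 > d2 and d2 < 0:  # steps S6/S7: negative peak at d(2)
            negative.append((n - 1, d2))
        d1, d2 = d2, d3                        # step S8: shift the three-sample window
    return positive, negative


pos, neg = detect_peaks([0, 3, 1, -2, -5, -1, 2, 4, 2])
# pos == [(1, 3), (7, 4)], neg == [(4, -5)]
```

Only three samples are held at any time, which mirrors the use of d(1), d(2), and d(3) as a small buffer in RAM 4c.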

In Fig. 2, examples of the positions of the positive peak values d+ and the negative peak values d- obtained by the peak value detection circuit 4 are each indicated by a marker symbol.

The peak value change calculation circuit 6 of this embodiment calculates the following features from the output of the peak value detection circuit 4.

In the expressions that follow, d+(n) and d-(n) denote the pairing of the time information n with the peak value information d+ and d-, respectively.

Ratio of the sum of the positive peak values to the sum of the negative peak values within a fixed time T:

$$p_1 = \frac{\sum_{n \le T} d^{+}(n)}{\sum_{n \le T} d^{-}(n)}$$

Ratio of adjacent peak values of the same sign, and the distance between them:

$$p_2 = \frac{d^{+}(n-1)}{d^{+}(n)}, \qquad p_2(n,t) = t[d^{+}(n)] - t[d^{+}(n-1)]$$

$$p_3 = \frac{|d^{-}(n-1)|}{|d^{-}(n)|}, \qquad p_3(n,t) = t[d^{-}(n)] - t[d^{-}(n-1)]$$

Ratio of adjacent peak values of different sign, and the distance between them:

$$p_4(n,+) = \frac{d^{+}(n-1)}{|d^{-}(n)|}, \qquad p_4(n,t) = t[d^{+}(n)] - t[d^{-}(n-1)]$$

$$p_5(n,-) = \frac{|d^{-}(n-1)|}{d^{+}(n)}, \qquad p_5(n,t) = t[d^{+}(n)] - t[d^{-}(n-1)]$$

Here $t[\cdot]$ denotes the time at which the given peak value occurs.

The feature pattern integration unit 9 integrates the feature pattern output from the feature extraction unit 7 and stored in the buffer memory 8 with the output of the peak value change calculation circuit 6 to create a new feature pattern of the input speech. Hereafter, this newly created pattern is simply called the feature pattern.
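The peak-value features p1 to p5 described above can be sketched in Python as follows. Several points are assumptions made for this illustration and are not stated in the source: peaks are given as time-ordered lists of `(time, value)` pairs, p1 uses the magnitude of the negative sum so that the ratio is positive, and p4/p5 are computed over consecutive peaks of opposite sign in the merged time sequence (p4 for a positive peak followed by a negative one, p5 for the reverse). The function names are hypothetical.

```python
def same_sign_ratios(peaks):
    """(|d(n-1)| / |d(n)|, time distance) for consecutive peaks of one sign."""
    return [
        (abs(prev_v) / abs(cur_v), cur_t - prev_t)
        for (prev_t, prev_v), (cur_t, cur_v) in zip(peaks, peaks[1:])
    ]


def peak_features(positive, negative):
    """Compute p1..p5 from time-ordered (time, value) peak lists."""
    sum_neg = sum(v for _, v in negative)
    # Assumption: magnitude of the negative sum, so that p1 is positive.
    p1 = sum(v for _, v in positive) / abs(sum_neg) if sum_neg else None

    p2 = same_sign_ratios(positive)
    p3 = same_sign_ratios(negative)

    # Assumption: "adjacent peaks of different sign" means neighbours in time.
    merged = sorted(positive + negative)
    p4, p5 = [], []
    for (prev_t, prev_v), (cur_t, cur_v) in zip(merged, merged[1:]):
        if prev_v > 0 > cur_v:      # positive peak followed by negative peak
            p4.append((prev_v / abs(cur_v), cur_t - prev_t))
        elif prev_v < 0 < cur_v:    # negative peak followed by positive peak
            p5.append((abs(prev_v) / cur_v, cur_t - prev_t))
    return {"p1": p1, "p2": p2, "p3": p3, "p4": p4, "p5": p5}


features = peak_features([(1, 3), (7, 4)], [(4, -5)])
```

Each entry of p2 to p5 pairs a ratio with the corresponding time distance, matching the two quantities the text defines for each feature.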

In this embodiment, with sampling at 12 kHz, the feature pattern integration unit 9 sets the conditions for the new feature pattern according to the following criteria.

The standard pattern storage unit 7b is selected.

1) Discrimination between voiced and unvoiced sounds
If the difference between the values of d+(n) and d-(n) is 100 or more, and p4(n,+) > 1.3 or p5(n,-) > 0.713 is satisfied, the sound is judged to be voiced. Otherwise, it is judged to be unvoiced.

2) Discrimination between silence and voiceless consonants
For a sound judged to be unvoiced in 1): if p2(n,t) < 3 or p3(n,t) < 3 is satisfied, it is judged to be silence. Otherwise, it is judged to be a voiceless consonant.

3) Discrimination between vowels and consonants
For a sound judged to be voiced in 1): if p1 > 1.5, it is judged to be a vowel. Otherwise, it is judged to be a consonant.
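The three discrimination criteria above can be sketched as a small decision function. This is an illustration only: the function and parameter names are hypothetical, the thresholds are taken from the text as printed, the unit of the time distances is not given in the source, and the grouping of the voiced condition as "difference ≧ 100 and (p4 > 1.3 or p5 > 0.713)" is an assumed reading of an ambiguous sentence.

```python
def classify_segment(d_pos, d_neg, p1, p2_t, p3_t, p4_ratio, p5_ratio):
    """Classify one segment from its peak-value features.

    d_pos, d_neg : positive and negative peak values d+(n), d-(n)
    p1           : ratio of summed positive to summed negative peak values
    p2_t, p3_t   : time distances p2(n,t), p3(n,t)
    p4_ratio, p5_ratio : ratios p4(n,+), p5(n,-)
    """
    # 1) voiced vs. unvoiced (assumed grouping of the "and"/"or" condition)
    voiced = (d_pos - d_neg >= 100) and (p4_ratio > 1.3 or p5_ratio > 0.713)
    if voiced:
        # 3) vowel vs. consonant, applied to voiced sounds
        return "vowel" if p1 > 1.5 else "consonant"
    # 2) silence vs. voiceless consonant, applied to unvoiced sounds
    if p2_t < 3 or p3_t < 3:
        return "silence"
    return "voiceless consonant"


label = classify_segment(d_pos=200, d_neg=-150, p1=1.6,
                         p2_t=10, p3_t=10, p4_ratio=1.4, p5_ratio=0.5)
# label == "vowel"
```

The resulting label is the kind of phoneme discrimination result that the text says is integrated into the feature pattern.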

As described above, by having the feature pattern integration unit 9 newly integrate the temporal change in the peak values of the speech information and the discrimination result for each phoneme into the feature pattern of the speech information, the features of the speech information can be set more accurately.

The pattern matching unit 11 sequentially reads the standard patterns from the memory 10, calculates the similarity between each of them and the feature pattern output from the feature pattern integration unit, and passes the standard pattern with the highest similarity to the recognition result output unit 12, which outputs that standard pattern.
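The matching step can be sketched as a search for the most similar standard pattern. The source does not specify the similarity measure, so the negative squared Euclidean distance used here is an assumption, as are the function names and the example labels and vectors.

```python
def similarity(feature, pattern):
    """Assumed measure: negative squared Euclidean distance (larger is more similar)."""
    if len(feature) != len(pattern):
        raise ValueError("feature and pattern must have the same length")
    return -sum((f - p) ** 2 for f, p in zip(feature, pattern))


def best_match(feature, standard_patterns):
    """Return the label of the standard pattern most similar to the feature pattern."""
    return max(standard_patterns,
               key=lambda label: similarity(feature, standard_patterns[label]))


# Hypothetical standard patterns keyed by label.
patterns = {"a": [1.0, 0.0], "i": [0.0, 1.0]}
result = best_match([0.9, 0.2], patterns)
# result == "a"
```

Reading the patterns one at a time and keeping the best score corresponds to the sequential read-out from memory 10 described in the text.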

In this embodiment, recognition was performed by newly integrating the temporal change information of the peak values as a feature parameter of the speech information. An equivalent effect can also be obtained by using other quantities, such as the ratio of the number of zero crossings of the speech waveform per unit time or the intensity ratio of the speech spectrum per unit time.
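As an illustration of the zero-crossing alternative mentioned above, the sketch below counts sign changes in consecutive fixed-length frames; ratios between frames could then be formed from these counts. The frame length, the treatment of zero as non-negative, and the function name are assumptions, since the source gives no details.

```python
def zero_crossings_per_frame(samples, frame_len):
    """Count sign changes within each consecutive, non-overlapping frame."""
    counts = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        # A crossing occurs when neighbouring samples differ in sign
        # (zero is treated as non-negative here).
        counts.append(sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0)))
    return counts


counts = zero_crossings_per_frame([1, -1, 1, -1, 1, 1, 1, 1], frame_len=4)
# counts == [3, 0]
```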

Although sampling was performed at 12 kHz in this embodiment, the invention is not limited to this rate. Likewise, each sample is 12 bits in this embodiment, but the invention is not limited to 12 bits.

[Effects of the Invention] As described above, according to the present invention, adding temporal change information of the speech, not limited to peak value information, to the speech features makes it possible to improve the recognition rate of speech recognition processing.

In addition, because the temporal change information of the speech narrows down the matching candidates, the invention is also effective in shortening the processing time.

[Brief explanation of drawings]

Fig. 1 is a block diagram of the speech recognition device of this embodiment. Fig. 2 is a graphical display of the output data of the input speech after A/D conversion. Fig. 3 is a flowchart showing the peak value detection processing of the embodiment.

In the figures: 1 ... microphone, 2 ... A/D converter, 3, 5, 8 ... buffer memory, 4 ... peak value detection circuit, 4a ... CPU, 4b ... ROM, 4c ... RAM, 6 ... peak value change calculation circuit, 7 ... feature extraction unit, 9 ... feature pattern integration unit, 10 ... memory, 11 ... pattern matching unit, 12 ... recognition result output unit.

Patent applicant: Canon Inc.

Claims

[Claims]

(1) A speech recognition device that extracts a feature pattern of input speech information and recognizes a standard pattern from the feature pattern, the device comprising: first calculating means for obtaining a peak level of a waveform of the speech information; second calculating means for calculating a temporal change in the peak level obtained by the first calculating means; and integrating means for integrating the temporal change in the peak level of the speech information obtained by the second calculating means into the feature pattern of the speech information.

(2) The speech recognition device according to claim 1, wherein the integrating means includes discriminating means for discriminating the type of phoneme of the speech information from the temporal change in the peak level calculated by the second calculating means, and synthesis means for synthesizing the phoneme type determined by the discriminating means as part of the feature pattern.