JPH0677198B2

JPH0677198B2 - Speech recognition method

Info

Publication number: JPH0677198B2
Application number: JP60285792A
Authority: JP
Inventors: 晃一宮芝; 恭則大洞
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 1985-12-20
Filing date: 1985-12-20
Publication date: 1994-09-28
Anticipated expiration: 2009-09-28
Also published as: JPS62145299A

Description

[Detailed Description of the Invention]

[Industrial Field of Application] The present invention relates to a speech recognition method for recognizing input speech information.

[Prior Art] A speech recognition device first A/D-converts the input speech and sends the result to a feature extraction unit. The feature extraction unit computes the power of the speech. It also computes the spectrum of the speech with a technique such as the fast Fourier transform (FFT).
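The two prior-art features named above, frame power and spectrum, can be sketched as follows. This is a minimal illustration only. The function names are hypothetical, power is taken as the mean square of the samples in one frame, and a naive O(N²) DFT stands in for the FFT. It yields the same magnitudes, only more slowly.

```python
import cmath
import math

def frame_power(frame):
    """Mean-square power of one frame of samples."""
    return sum(x * x for x in frame) / len(frame)

def dft_magnitude(frame):
    """Magnitude spectrum by a naive O(N^2) DFT (a stand-in for an FFT).

    Returns bins 0..N/2, i.e. the non-negative frequencies only.
    """
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        mags.append(abs(acc))
    return mags

# A 1500 Hz tone sampled at 12 kHz: 48 samples hold exactly 6 cycles,
# so the energy falls in bin 6 (bin spacing 12000 / 48 = 250 Hz).
frame = [math.sin(2 * math.pi * 1500 * t / 12000) for t in range(48)]
spectrum = dft_magnitude(frame)
```

With an integer number of cycles in the frame, the tone's power comes out at 0.5 and its energy falls in a single bin.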

The standard patterns in the reference pattern storage unit hold the same types of information that the feature extraction unit computes. Pattern matching proceeds in two stages:

- First, a similarity between the input speech and a standard pattern is computed separately for each type of information.
- The final similarity is then obtained by multiplying each of these individual similarities by a predetermined weight and summing the results.
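The weighted combination described above can be sketched like this. The text does not specify the per-type similarity measure, so negative Euclidean distance is assumed here. The feature-type names ("power", "spectrum") and the weights are hypothetical.

```python
import math

def type_similarity(a, b):
    """Similarity for one information type; assumed: negative Euclidean distance."""
    return -math.dist(a, b)

def total_similarity(input_feats, pattern_feats, weights):
    """Multiply each per-type similarity by its predetermined weight and sum."""
    return sum(
        weights[name] * type_similarity(input_feats[name], pattern_feats[name])
        for name in weights
    )

# Hypothetical features: identical power, spectra 5 units apart.
inp = {"power": [1.0], "spectrum": [0.0, 0.0]}
ref = {"power": [1.0], "spectrum": [3.0, 4.0]}
score = total_similarity(inp, ref, {"power": 2.0, "spectrum": 0.5})
```

Here the power term contributes 0, and the spectrum term contributes 0.5 × (−5) = −2.5.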

[Problems to Be Solved by the Invention] Conventional speech recognition devices make three distinctions:

- voiced versus unvoiced sounds;
- within unvoiced sounds, silence versus unvoiced consonants;
- within voiced sounds, vowels versus nasal consonants.

They have made these distinctions either by using the power of the speech, or by dividing the frequency range into low, middle, and high bands and using the ratios of the frequency components in those bands.

However, noise causes two problems when the input speech contains a lot of it:

- The power of a word-initial consonant is often buried in the noise power.
- The spectrum of a word-medial consonant is pulled toward the spectra of the preceding and following vowels, so its characteristic qualitative information is lost.

As a result, consonants were not easy to identify.

In addition, the spectrum of the vowel /u/ is very similar to the spectra of the nasal consonants /m/ and /n/, so the rate of misidentification among these sounds was also high.

[Purpose] The present invention was made in view of the above problems of the prior art. Its object is to provide a speech recognition method that raises the recognition rate and shortens the processing time. It does so by preselecting recognition results for the input speech information based on the relative relationship among a plurality of peak levels contained in that information.

[Means for Solving the Problems] To solve the above problems, the speech recognition method of the present invention comprises the following steps. In a speech recognition method for recognizing input speech information:

- a plurality of peak levels contained in the input speech information are detected;
- the relative relationship among the detected peak levels is calculated;
- recognition candidates for the input speech information are preselected on the basis of the calculated relative relationship.

[Operation] In these steps of the present invention, a plurality of peak levels contained in the input speech information are detected. The relative relationship among the detected peak levels is then calculated. Finally, recognition candidates for the input speech information are preselected on the basis of that relative relationship.

[Embodiment] An embodiment of the present invention is described in detail below with reference to the accompanying drawings.

FIG. 1 is a block diagram of the speech recognition apparatus of this embodiment.

In the figure, the components are as follows.

- 1 is a microphone, serving as the speech input unit, that converts speech into an electrical signal.
- 2 is an A/D converter that converts the analog electrical signal into digital form by sampling it (for example, every 5 to 10 ms) and quantizing it.
- 3 is a buffer memory that temporarily stores part of the output of the A/D converter 2.
- 4 is a peak value detection circuit that reads data sequentially from the buffer memory 3 and finds the peak values. Inside it:
  - 4a is a central processing unit (CPU);
  - 4b is a ROM that stores the program of the flowchart of FIG. 3, described later;
  - 4c is a RAM used as a work area and as the buffers d(1), d(2), d(3) used when finding the peak values, also described later.
- 5 is a peak value change calculation circuit that calculates how the peak values change over time.
- 6 is a discrimination circuit. From the peak values obtained by the circuit 5, it decides whether the input speech is voiced or unvoiced, and it further distinguishes silence from unvoiced consonants and vowels from nasal consonants.
- 7b is a standard pattern storage unit that stores standard patterns classified by peak value, group by group. 7a is a memory containing the standard pattern storage unit 7b.
- 8 is a feature extraction unit. It consists of a bank of band-pass filters that divides the frequency range of 200 to 6000 Hz into 8 to 30 channels, and it extracts features such as the power signal and spectral information.
- 9 is a buffer memory that holds the input speech features until the standard pattern storage unit for the appropriate peak value class has been selected.
- 10 is a switch that selects the standard pattern storage unit 7b determined by the discrimination circuit 6.
- 11 is a pattern matching unit that compares the input speech features with the standard patterns read out through the switch 10 and calculates the similarity between them.
- 12 is a recognition result output unit that outputs, as the recognition result, the standard pattern with the highest similarity calculated by the pattern matching unit 11.
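The filter bank of the feature extraction unit 8 is specified only by its range (200 to 6000 Hz) and channel count (8 to 30). The sketch below computes band edges under one common assumption: contiguous bands of equal width on a logarithmic frequency scale. The spacing rule and the function name are hypothetical, and the filter design itself is not shown.

```python
def band_edges(n_channels, f_lo=200.0, f_hi=6000.0):
    """Edges (Hz) of n_channels contiguous bands, equally spaced in log frequency.

    Assumption: log spacing; the source gives only the range and channel count.
    """
    if not 8 <= n_channels <= 30:
        raise ValueError("the embodiment uses 8 to 30 channels")
    ratio = (f_hi / f_lo) ** (1.0 / n_channels)  # constant upper/lower edge ratio
    return [f_lo * ratio ** i for i in range(n_channels + 1)]

edges = band_edges(8)  # 9 edges delimiting 8 channels
```

With linear spacing instead, each of the 8 channels would be 725 Hz wide. Log spacing gives the low-frequency region, where vowel formants lie, finer resolution.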

The operation of this embodiment is described in detail below.

First, the A/D converter 2 converts the input speech into digital values. Its output is sent through the buffer memory 3 to the peak value detection circuit 4 and the feature extraction unit 8.

The sampling frequency and the number of quantization bits per sample are variable. In this embodiment, sampling is performed at 12 kHz, and each sample is quantized to 12 bits, one of which is a sign bit. One second of speech is therefore represented by 12,000 data points.
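The arithmetic above can be checked with a short sketch. It assumes a symmetric sign-and-magnitude code: 1 sign bit plus 11 magnitude bits gives codes from −2047 to +2047. The input is assumed to be normalized to [−1, 1]. The names are illustrative only.

```python
FS_HZ = 12_000                   # sampling frequency of the embodiment
BITS = 12                        # bits per sample, one of them the sign bit
MAX_CODE = 2 ** (BITS - 1) - 1   # 2047: largest magnitude with 11 magnitude bits

def quantize(x):
    """Map an analog value in [-1.0, 1.0] to a signed 12-bit code, clipping outside."""
    x = max(-1.0, min(1.0, x))
    return round(x * MAX_CODE)

def sample_count(duration_s, fs_hz=FS_HZ):
    """Number of samples for a recording of duration_s seconds."""
    return round(duration_s * fs_hz)
```

For example, `sample_count(1.0)` gives the 12,000 points stated above. The same relation is used later, where the number of data values is the measurement time multiplied by 12,000.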

FIG. 2 is a graph of the output of the A/D converter 2.

Because the A/D conversion runs in real time, the buffer memory 3 is placed in front of the peak value detection circuit 4. The circuit 4 then reads the sample data sequentially from this buffer memory.

FIG. 3 is a flowchart of the processing performed by the CPU 4a inside the peak value detection circuit 4.

Here, d(1), d(2), d(3) form an array that stores the data read from the buffer memory 3. The indices 1, 2, 3 give the order of the samples. d+ denotes a positive peak value and d− a negative peak value.

Processing begins as follows:

- Step S1: the first two data values in the buffer memory 3 are read into d(1) and d(2), respectively.
- Step S2: checks whether all the data in the buffer memory 3 have been read. If so, processing ends at step S9. At this point reading is not yet finished, so processing proceeds to step S3.
- Step S3: the next data value is read from the buffer memory 3 into d(3).

Steps S4 and S6 examine the relative magnitudes of d(1), d(2), and d(3):

- If d(1) < d(2), d(2) > d(3), and d(2) > 0, then d(2) is a positive peak value.
- If d(1) > d(2), d(2) < d(3), and d(2) < 0, then d(2) is a negative peak value.

When one of these conditions holds, d(2) is stored in d+ (step S5) or in d− (step S7), together with its position in the data sequence. Processing then proceeds to step S8.

If neither condition holds, processing proceeds directly to step S8. In step S8, the current d(2) is moved to d(1) and d(3) is moved to d(2). Processing then returns to step S2 to check whether all the data have been processed. If not, the next value is read from the buffer memory 3 into d(3), and the same processing is repeated until every data value has been handled.

The number of data values equals the measurement time in seconds multiplied by 12,000.

In summary, the peak values are found as follows.

[Finding Positive Peak Values] If d(1) ≤ d(2), d(2) is further compared with d(3).

- If d(3) < d(2), the second sample is a local maximum, so d(2) is a peak value.
- To decide whether it is a positive peak value, check the sign of d(2). If d(2) > 0, assign d(2) to d+, store the values of d+ and n (the sample position), and continue from step S8.

In all other cases, for example when d(3) ≥ d(2) or d(2) ≤ 0, processing proceeds directly to step S8.

[Finding Negative Peak Values] If d(1) ≥ d(2), d(2) is further compared with d(3).

- If d(3) > d(2), the second sample is a local minimum, so d(2) is a peak value.
- To decide whether it is a negative peak value, check the sign of d(2). If d(2) < 0, assign d(2) to d−, save the values of d− and n, and continue from step S8.
The values of − and n are saved, and the processing from step S8 onward is performed.

In all other cases, for example when d(3) ≤ d(2) or d(2) ≥ 0, processing proceeds directly to step S8.
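The three-sample peak search of FIG. 3 can be sketched in Python as below. It follows the ≤/≥ form of the conditions given in the explanation of positive and negative peaks. The function name and the (n, value) output format are illustrative assumptions, not taken from the patent.

```python
def detect_peaks(samples):
    """Three-sample peak search following the flowchart of FIG. 3 (steps S1 to S9).

    Returns (pos, neg): lists of (n, value) pairs for positive peaks d+ and
    negative peaks d-, where n is the index of the peak sample.
    """
    pos, neg = [], []
    if len(samples) < 3:
        return pos, neg
    d1, d2 = samples[0], samples[1]              # S1: read the first two samples
    for n in range(2, len(samples)):             # S2/S3: read the next sample into d3
        d3 = samples[n]
        if d1 <= d2 and d3 < d2 and d2 > 0:      # S4: local maximum above zero
            pos.append((n - 1, d2))              # S5: store d+ and its position
        elif d1 >= d2 and d3 > d2 and d2 < 0:    # S6: local minimum below zero
            neg.append((n - 1, d2))              # S7: store d- and its position
        d1, d2 = d2, d3                          # S8: shift the window by one sample
    return pos, neg                              # S9: all data processed

pos, neg = detect_peaks([0, 5, 3, -4, -2, 6, 6, 1])
```

The non-strict comparison with d(1) makes a flat top such as `6, 6` report its last sample (index 6) as the peak. The strict form given for steps S4 and S6 would skip such a plateau entirely.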

In FIG. 2, example positions of the positive peak values d+ and the negative peak values d− found by the peak value detection circuit 4 are marked with the symbols ▽ and △, respectively.

The peak value change calculation circuit 5 of this embodiment calculates the following features from the output of the peak value detection circuit 4. In the formulas below, d+(n) and d−(n) each denote a pair consisting of the time information n and the corresponding peak value d+ or d−.

Ratio of the sum of the positive peak values to the sum of the negative peak values within a fixed time T:

$$p_1 = \frac{\sum_{n \le T} d^{+}(n)}{\sum_{n \le T} d^{-}(n)}$$

Ratio of adjacent peak values of the same sign, and the distance between them. Here t(·) denotes the time (sample position) at which a peak occurs:

$$p_2 = \frac{d^{+}(n-1)}{d^{+}(n)}, \qquad p_2(n,t) = t\bigl(d^{+}(n)\bigr) - t\bigl(d^{+}(n-1)\bigr)$$

$$p_3 = \frac{\lvert d^{-}(n-1)\rvert}{\lvert d^{-}(n)\rvert}, \qquad p_3(n,t) = t\bigl(d^{-}(n)\bigr) - t\bigl(d^{-}(n-1)\bigr)$$

Ratio of adjacent peak values of opposite sign, and the distance between them:

$$p_4(n,+) = \frac{d^{+}(n-1)}{\lvert d^{-}(n)\rvert}, \qquad p_4(n,t) = t\bigl(d^{-}(n)\bigr) - t\bigl(d^{+}(n-1)\bigr)$$

$$p_5(n,-) = \frac{\lvert d^{-}(n-1)\rvert}{d^{+}(n)}, \qquad p_5(n,t) = t\bigl(d^{+}(n)\bigr) - t\bigl(d^{-}(n-1)\bigr)$$

The discrimination circuit 6 combines the magnitudes of the peak values with the features obtained by the peak value change calculation circuit 5. From these it distinguishes, in the input speech:

- voiced sounds from unvoiced sounds;
- among the unvoiced sounds, silence from unvoiced consonants;
- among the voiced sounds, vowels from nasal consonants.
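The features p1 to p5 can be computed from the peak lists with a sketch like the one below. It rests on several assumptions that the text does not settle:

- d+(n) and d−(n) are taken to be the n-th entries of separate lists of positive and negative peaks, each a (time, value) pair.
- Each distance is measured between the two peaks that form the corresponding ratio.
- For p1, the magnitudes of the negative peaks are summed. The negative values themselves would make the ratio negative, and the threshold p1 > 1.5 applied to it later could then never be met.

All function and key names are hypothetical.

```python
def p1(pos, neg, T):
    """Sum of positive peak values over sum of negative peak magnitudes, times <= T.

    Assumption: magnitudes of the negative peaks are used so the ratio is positive.
    """
    num = sum(v for t, v in pos if t <= T)
    den = sum(-v for t, v in neg if t <= T)
    return num / den if den else float("inf")

def pair_features(pos, neg, n):
    """p2..p5 and their distances for n >= 1.

    pos[n] and neg[n] are the n-th positive and negative peaks as (time, value).
    Each distance is taken between the two peaks forming the matching ratio.
    """
    (tp0, vp0), (tp1, vp1) = pos[n - 1], pos[n]
    (tn0, vn0), (tn1, vn1) = neg[n - 1], neg[n]
    return {
        "p2": vp0 / vp1, "p2_t": tp1 - tp0,             # adjacent positive peaks
        "p3": abs(vn0) / abs(vn1), "p3_t": tn1 - tn0,   # adjacent negative peaks
        "p4": vp0 / abs(vn1), "p4_t": tn1 - tp0,        # d+(n-1) against d-(n)
        "p5": abs(vn0) / vp1, "p5_t": tp1 - tn0,        # d-(n-1) against d+(n)
    }

pos = [(1, 100), (11, 80)]
neg = [(6, -50), (16, -40)]
feats = pair_features(pos, neg, 1)
```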

In this embodiment, with sampling at 12 kHz, the standard patterns stored in the standard pattern storage unit 7b were selected according to the following criteria.

1) Voiced versus unvoiced: the sound is judged to be voiced if both of the following hold:
   - the difference between the values of d+(n) and d−(n) is 100 or more;
   - p4(n,+) > 1.3 or p5(n,−) > 0.76.

   Otherwise, it is judged to be unvoiced.

2) Silence versus unvoiced consonant: for a sound judged unvoiced in 1), if p2(n,t) < 3 or p3(n,t) < 3, it is judged to be silence. Otherwise, it is judged to be an unvoiced consonant.

3) Vowel versus consonant: for a sound judged voiced in 1), if p1 > 1.5, it is judged to be a vowel. Otherwise, it is judged to be a consonant.
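Criteria 1) to 3) form a small decision tree, sketched below. The thresholds (100, 1.3, 0.76, 3, 1.5) are the ones stated for 12 kHz, 12-bit sampling. The "difference between d+(n) and d−(n)" is assumed to mean d+(n) − d−(n), i.e. the peak-to-peak amplitude, since d−(n) is negative. The argument layout (a dict with keys `p2_t`, `p3_t`, `p4`, `p5`) is hypothetical.

```python
def classify(d_plus, d_minus, f, p1_value):
    """Apply criteria 1) to 3) of the embodiment at one peak position.

    d_plus, d_minus: peak values d+(n), d-(n) on the 12-bit sample scale.
    f: dict with keys "p2_t", "p3_t", "p4", "p5" for the same n (hypothetical layout).
    """
    # Criterion 1): large peak-to-peak swing plus a qualifying opposite-sign ratio.
    voiced = (d_plus - d_minus >= 100) and (f["p4"] > 1.3 or f["p5"] > 0.76)
    if voiced:
        # Criterion 3): vowel versus consonant among voiced sounds.
        return "vowel" if p1_value > 1.5 else "consonant"
    # Criterion 2): silence versus unvoiced consonant among unvoiced sounds.
    if f["p2_t"] < 3 or f["p3_t"] < 3:
        return "silence"
    return "unvoiced consonant"
```

Each label would then pick the group of standard patterns searched by the matching stage.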

The candidate standard patterns chosen by these criteria are stored in the standard pattern storage unit 7b, and one group is selected from among them with the switch 10 in FIG. 1. Recognition then proceeds in three steps:

- Standard patterns are read sequentially from the selected standard pattern storage unit 7b and sent to the pattern matching unit 11.
- The feature pattern of the input speech, output by the feature extraction unit 8 and held temporarily in the buffer memory 9, is sent to the same unit.
- The pattern matching unit 11 calculates the similarity between the two, and the recognition result output unit 12 outputs the standard pattern with the highest similarity as the recognition result.
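The effect of preselection on matching can be illustrated as follows. Only the standard patterns in the group chosen by the discrimination step are compared, and the best match is returned. The group dictionary, the labels, and the choice of negative Euclidean distance as the similarity are assumptions made for this sketch.

```python
import math

def recognize(input_feats, category, groups):
    """Match the input only against the standard patterns of the preselected group.

    groups: hypothetical dict mapping a discrimination result (e.g. "vowel")
    to a list of (label, feature_vector) standard patterns.
    Similarity is assumed to be negative Euclidean distance.
    """
    candidates = groups[category]        # switch 10 selects one group
    best_label, _ = max(candidates, key=lambda lp: -math.dist(input_feats, lp[1]))
    return best_label                    # pattern with the highest similarity

groups = {
    "vowel": [("a", [1.0, 0.0]), ("i", [0.0, 1.0])],
    "consonant": [("m", [0.9, 0.1])],
}
```

In this toy data, the pattern "m" matches the input `[0.9, 0.1]` exactly. It is still never considered when the input has been classified as a vowel, which is how preselection both narrows the search and avoids vowel/nasal confusions.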

As described above, this embodiment preselects several candidates for the input speech information, so that a more accurate recognition result is output.

In this embodiment, a group of standard patterns is selected on the basis of information on how the peak values change over time. An equivalent effect can also be obtained without dividing the standard pattern storage unit into groups. Instead, the time-change information of the peak values is incorporated into the features of each standard pattern. Other usable time-change information includes the spectral envelope and zero crossings. Furthermore, this embodiment can be applied to build a fast and highly reliable speech recognition device, such as a typewriter with speech recognition.
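Zero crossings, mentioned above as another kind of time-change information, are simple to count. The sketch below counts sign changes between consecutive nonzero samples and skips exact zeros. Skipping zeros is an assumption, chosen so that a sample sitting exactly on zero is not counted twice. The function name is illustrative.

```python
def zero_crossings(samples):
    """Count sign changes between consecutive nonzero samples."""
    count, prev = 0, 0
    for x in samples:
        if x != 0:
            if prev != 0 and (x > 0) != (prev > 0):
                count += 1
            prev = x
    return count
```

Dividing the count by the frame duration gives a zero-crossing rate. Unvoiced consonants typically show a higher rate than vowels.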

Although this embodiment samples at 12 kHz, the sampling frequency is not limited to this value. Likewise, each sample is quantized to 12 bits, but the number of bits is not limited to 12 either.

[Effect] As described above, the present invention preselects recognition results for input speech information on the basis of the relative relationship among a plurality of peak levels contained in that information. This provides a speech recognition method that raises the recognition rate and shortens the processing time.

[Brief description of drawings]

FIG. 1 is a block diagram of the speech recognition apparatus of this embodiment. FIG. 2 is a graph of the input speech data after A/D conversion. FIG. 3 is a flowchart of the peak value detection processing of the embodiment.

In the figures:

- 1 … microphone
- 2 … A/D converter
- 3, 9 … buffer memories
- 4 … peak value detection circuit
- 4a … CPU
- 4b … ROM
- 4c … RAM
- 5 … peak value change calculation circuit
- 6 … discrimination circuit
- 7a … memory
- 7b … standard pattern storage unit
- 8 … feature extraction unit
- 10 … switch
- 11 … pattern matching unit
- 12 … recognition result output unit

Claims

[Claims]

1. A speech recognition method for recognizing input speech information, characterized in that:
   - a plurality of peak levels contained in the input speech information are detected;
   - a relative relationship among the detected peak levels is calculated;
   - recognition candidates for the input speech information are preselected on the basis of the calculated relative relationship of the peak levels.