JPH02134700A

JPH02134700A - Voice recognizing device

Info

Publication number: JPH02134700A
Application number: JP28678688A
Authority: JP
Inventors: Shigemi Otsu; 茂実大津
Original assignee: Fuji Xerox Co Ltd
Current assignee: Fujifilm Business Innovation Corp
Priority date: 1988-11-15
Filing date: 1988-11-15
Publication date: 1990-05-23

Abstract

PURPOSE: To speed up speech recognition by discriminating speech according to feature parameters supplied by a parameter extracting means, which extracts them according to detected formant frequencies. CONSTITUTION: When a speech signal S is input to a speech signal dividing means 1, it is output as divided speech signals Si, one per specified frequency band. Adjacent divided signals Si-1 and Si are compared by a comparing means 2 and converted into binary signals, and this comparison information is input to a parameter extracting means 3. The parameter extracting means 3 detects the formant frequencies from the comparison information and extracts feature parameters from the detected formant frequencies, and a speech discrimination means 4 then discriminates the input speech signal S according to the feature parameters. The formant frequency detection time is thereby shortened, which speeds up speech recognition.

Description

DETAILED DESCRIPTION OF THE INVENTION

[Field of Industrial Application] The present invention relates to a speech recognition device, and in particular to an improvement of the type of speech recognition device that extracts, as feature parameters, either the formant frequencies themselves or information derived from the formant frequencies.

[Prior Art] A widely adopted speech recognition method is to frequency-analyze the speech signal of the vowel to be extracted ("u" in this example; see FIG. 8(a)), apply linear predictive analysis (LPC analysis) to the frequency-analyzed result (see FIG. 8(b)), and extract the absolute values of the first and second formant frequencies as feature parameters from the resulting characteristic curve (see FIG. 8(c)). In FIG. 8(c), F1 to F4 denote the first to fourth formants.

A known device for carrying out this conventional speech recognition method is shown in FIG. 9. It comprises: a microphone 10 that converts speech such as a vowel into an electrical speech signal; a signal conditioning circuit 11 consisting of an amplifier and a low-pass filter, which amplify the speech signal from the microphone 10 and then cut unnecessary high-frequency components, and an automatic gain control circuit (hereinafter "AGC circuit") that adjusts the amplitude level of the speech signal; an AD converter 12 that digitizes the analog signal from the signal conditioning circuit 11; a frequency analysis unit 13 that samples the digitized speech signal at regular intervals and performs frequency analysis; a formant detection unit 14 that uses linear predictive analysis or the like to extract the spectral peaks at or below 5 kHz from the output of the frequency analysis unit 13 and detects the first and second formant frequencies; and a speech discrimination unit 15 that takes the formant frequencies detected by the formant detection unit 14 as feature parameters and discriminates the vowel based on them.

[Problem to Be Solved by the Invention] In such a conventional speech recognition device, however, the speech signal is digitized and the formant frequencies are then detected by software processing on a CPU board or the like. Detecting the formant frequencies therefore inevitably takes time, which makes high-speed speech recognition difficult.

This problem is not limited to devices that extract the formant frequencies directly as feature parameters; it can equally arise in devices that detect the formant frequencies and then extract feature parameters based on them.

The present invention has been made in view of the above problem, and provides a speech recognition device that shortens the formant frequency detection time and thereby speeds up speech recognition.

[Means for Solving the Problem] As shown in FIG. 1, the present invention is a speech recognition device comprising: a speech signal dividing means 1 that divides a speech signal S into predetermined frequency band components and outputs them; a comparing means 2 that compares the output levels of the divided speech signals Si (i = 1 to n) for the respective frequency bands between adjacent channels and binarizes the result; a parameter extraction means 3 that detects formant frequencies based on the comparison information from the comparing means 2 and extracts feature parameters based on the detected formant frequencies; and a speech discrimination means 4 that discriminates the speech based on the feature parameters from the parameter extraction means 3.

With these technical means, the invention is applicable to any speech recognition in which formant frequencies can be detected.

In this case, the feature parameters used to recognize speech may be chosen as appropriate, for example the formant frequencies themselves or their logarithms. From the standpoint of keeping individual differences small when recognizing speech from unspecified speakers, however, it is preferable to use relative quantities that do not depend on the formant frequencies themselves, such as the ratios between formant frequencies or the logarithmic differences between formant frequencies.
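The benefit of relative features can be illustrated with a small sketch. It assumes, purely for illustration, that a second speaker's formants are those of the first speaker scaled by a common factor; the formant values and the helper name `log_differences` are hypothetical and do not come from the patent.

```python
import math

def log_differences(formants_hz):
    """Return the log2 differences between adjacent formant frequencies."""
    return [math.log2(hi) - math.log2(lo)
            for lo, hi in zip(formants_hz, formants_hz[1:])]

# Hypothetical formants (Hz) for one speaker, and the same vowel from a
# second speaker modelled as a uniform 20 % upward shift (an assumption).
speaker_a = [300.0, 1200.0, 2400.0]
speaker_b = [f * 1.2 for f in speaker_a]

print(log_differences(speaker_a))
print(log_differences(speaker_b))
```

Under this simplified scaling model the absolute frequencies differ between the two speakers, while the logarithmic differences agree up to floating-point rounding.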

The number of divisions of the speech signal dividing means 1, the range of the frequency band to be divided, and so on may be chosen as appropriate according to the formant frequencies of the various speech sounds. When logarithmic information of the formant frequencies is used as the feature parameter, the logarithmic conversion may be performed in the parameter extraction means 3. To speed up recognition further, however, it is preferable to build the speech signal dividing means 1 from a logarithmically arranged group of band-pass filters, or from a group of band-pass filters combined with a logarithmic conversion means, so that the formant frequencies are in effect logarithmically converted in advance by the speech signal dividing means 1.

The design of the comparing means 2 may be changed as appropriate, as long as it can compare the adjacent divided speech signals Si-1 and Si, which are analog signals, and binarize the result.

Likewise, the design of the parameter extraction means 3 may be changed as appropriate, as long as it has a formant detection section that detects the formant frequencies based on the comparison information from the comparing means 2 and a parameter calculation section that calculates the feature parameters based on the detection information from the formant detection section. When voiced consonants are to be recognized, each formant changes over time, so the device must be designed so that the time variation can be stored in memory when the feature parameters are extracted.

The design of the speech discrimination means 4 may also be changed as appropriate, as long as it holds reference information on the speech to be recognized, compares the feature parameters with that reference information, and recognizes the matching entry as the input speech.

[Operation] With the technical means described above, when the speech signal S is input to the speech signal dividing means 1, it is output as divided speech signals Si, one for each predetermined frequency band. Adjacent divided speech signals Si-1 and Si are then compared and binarized by the comparing means 2, and this comparison information is input to the parameter extraction means 3.

The parameter extraction means 3 then detects the formant frequencies based on the input comparison information and extracts the feature parameters based on the detected formant frequencies, after which the speech discrimination means 4 discriminates the input speech signal S based on those feature parameters.

[Embodiments] The present invention is described in detail below based on the embodiment shown in the accompanying drawings.

FIG. 2 is a block diagram showing one embodiment of the speech recognition device according to the present invention. It uses the logarithmic differences between adjacent formant frequencies as the speech feature parameters and can recognize vowels and voiced consonants.

In FIG. 2, 21 is a microphone that converts speech into an electrical speech signal. 22 is a signal conditioning circuit consisting of an amplifier and a low-pass filter, which amplify the speech signal from the microphone 21 and then cut unnecessary high-frequency components, and an AGC circuit that adjusts the level of the speech signal. 23 is a bank of band-pass filters arranged in parallel (specifically, channels ch(i), i = 1 to 15) that divides the 150 to 4800 Hz range of the speech signal logarithmically into 15 equal channels of 1/3 octave each (corresponding to the speech signal dividing means 1 of the invention). 24 is a binarization circuit that compares the output levels of adjacent channels ch(i-1) and ch(i) of the band-pass filters 23 and outputs the binarized result (corresponding to the comparing means 2 of the invention). 25 is a CPU board that determines which vowel or voiced consonant the signal from the binarization circuit 24 represents (corresponding to the parameter extraction means 3 and the speech discrimination means 4 of the invention). The output of each band-pass filter 23 passes through a rectifying and smoothing circuit (not shown) and is converted into a DC signal carrying only intensity information.
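The channel layout can be checked with a short calculation. The sketch below assumes contiguous bands whose edges lie at 150 Hz multiplied by powers of 2^(1/3); the patent gives only the overall range, the channel count, and the 1/3-octave width, so the exact edge placement and the function name `third_octave_edges` are assumptions.

```python
def third_octave_edges(f_low_hz=150.0, n_channels=15):
    """Band edges for n_channels contiguous 1/3-octave bands starting at f_low_hz."""
    return [f_low_hz * 2.0 ** (k / 3.0) for k in range(n_channels + 1)]

edges = third_octave_edges()
bands = list(zip(edges[:-1], edges[1:]))  # (lower, upper) edge of ch(1)..ch(15)
for i, (lo, hi) in enumerate(bands, start=1):
    print(f"ch({i}): {lo:7.1f} Hz to {hi:7.1f} Hz")
```

Fifteen bands of 1/3 octave span five octaves, and 150 Hz times 2^5 is 4800 Hz, which matches the range stated for the band-pass filters 23.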

In this embodiment, the binarization circuit 24 consists of 14 comparators 24(j) (j = 1 to 14), as shown in FIG. 3. Each comparator 24(j) outputs Cj = "1" when ch(j+1) > ch(j) and Cj = "0" otherwise.
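The comparator rule can be emulated in a few lines of software. This is only an illustration of the logic; in the embodiment the comparison is done by analog comparators, and the channel levels below are hypothetical values in arbitrary units.

```python
def binarize_adjacent(levels):
    """Emulate the comparators: C_j = 1 if ch(j+1) > ch(j), else 0."""
    return [1 if nxt > cur else 0 for cur, nxt in zip(levels, levels[1:])]

# Hypothetical smoothed output levels of ch(1)..ch(15) (arbitrary units).
levels = [2, 5, 9, 6, 3, 4, 8, 12, 7, 4, 3, 5, 6, 4, 2]
bits = binarize_adjacent(levels)
print(bits)  # [1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0]
```

Fifteen channel levels yield fourteen comparator outputs, one per adjacent pair.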

The CPU board 25 performs control operations according to the flowchart shown in FIG. 4. Specifically, the CPU board 25 examines the comparison signals Cj (the sign of ch(j+1) - ch(j)) from the binarization circuit 24 to find where the sign changes from positive to negative (step 1), and detects the numbers of the channels ch(i) at which it changes from positive to negative (m1 to m3 in this embodiment) as the channels containing formants (step 2). The CPU board 25 then extracts the difference between the channel number m2 of the second formant and the channel number m1 of the first formant as a first parameter Lf12, and the difference between the channel number m3 of the third formant and the channel number m2 of the second formant as a second parameter Lf23 (steps 3 and 4). It then stores the time variation of the first and second parameters Lf12 and Lf23 in memory and performs DP matching (steps 5 and 6).
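Steps 1 to 4 can be sketched as follows. The comparator outputs are hypothetical, the function names are illustrative, and the handling of spectra with fewer than three peaks or with a peak in the first or last channel is left out because the patent does not describe it.

```python
def detect_formant_channels(bits, max_formants=3):
    """Return channel numbers where the comparator output changes from 1 to 0.

    bits[k] holds C_(k+1); a 1 -> 0 transition between C_j and C_(j+1)
    means ch(j+1) is a local maximum of the spectrum envelope.
    """
    peaks = [k + 2 for k in range(len(bits) - 1)
             if bits[k] == 1 and bits[k + 1] == 0]
    return peaks[:max_formants]

def channel_differences(peaks):
    """Lf12 = m2 - m1 and Lf23 = m3 - m2 (requires three detected peaks)."""
    m1, m2, m3 = peaks
    return m2 - m1, m3 - m2

# Hypothetical comparator outputs C_1..C_14.
bits = [1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0]
peaks = detect_formant_channels(bits)
print(peaks, channel_differences(peaks))  # [3, 8, 13] (5, 5)
```

Only integer comparisons and subtractions are involved, which is consistent with the patent's point that the CPU board needs no multiplication or division for feature extraction.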

After this, the CPU board 25 compares the first and second parameters Lf12 and Lf23 with preset reference data, determines what the input speech signal S is, and outputs the result (step 7).

Next, recognition of vowels, as an example, by the speech recognition device of this embodiment is described.

The values of the first and second parameters Lf12 and Lf23 were obtained for each of the vowels "a, i, u, e, o" spoken by unspecified speakers, a mixed group of several men (MALE) and women (FEMALE), and plotted in a speech feature space whose coordinate axes are Lf12 and Lf23. The distribution shown in FIG. 5 was obtained. In FIG. 5, MALE is plotted with filled symbols and FEMALE with open symbols, and each vowel has its own symbol shape (a: ○, u: □, o: ◇; the symbols for i and e are not legible in this text).

This distribution shows that, regardless of whether the speaker is male or female, each vowel falls evenly within a relatively narrow region Sa, Si, Su, Se, So (enclosed by dotted lines in the figure). In other words, the distribution regions of the vowels hardly overlap one another in this feature space, and the regions Sa to So can be reliably separated by partition lines l1 to l5.

Therefore, if the reference values of the first and second parameters Lf12 and Lf23 for each vowel "a, i, u, e, o" are set in advance, based on the values of the partition lines l1 to l5, so that they contain the respective regions Sa to So, the speech discrimination section of the CPU board 25 (step 7) can determine which region the first and second parameters Lf12 and Lf23 of the input speech fall in, and the input speech is discriminated accurately.
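A region test of this kind might look like the sketch below. It assumes each vowel region is an axis-aligned rectangle in the (Lf12, Lf23) plane; the bounds are placeholders chosen only so that the regions do not overlap, and are not values read from FIG. 5.

```python
# Hypothetical axis-aligned regions in the (Lf12, Lf23) plane:
# vowel -> (Lf12_min, Lf12_max, Lf23_min, Lf23_max), bounds inclusive.
# These numbers are placeholders, not values read from FIG. 5.
REGIONS = {
    "a": (2, 3, 3, 5),
    "i": (8, 10, 1, 2),
    "u": (5, 7, 3, 4),
    "e": (5, 7, 1, 2),
    "o": (2, 3, 6, 8),
}

def classify(lf12, lf23, regions=REGIONS):
    """Return the vowel whose region contains the point, or None."""
    for vowel, (x0, x1, y0, y1) in regions.items():
        if x0 <= lf12 <= x1 and y0 <= lf23 <= y1:
            return vowel
    return None

print(classify(6, 3))    # u
print(classify(12, 12))  # None
```

Because the partition lines are parallel to the coordinate axes, the test reduces to a handful of integer comparisons per vowel.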

Voiced consonants can be discriminated in the same way by setting time-varying reference data in advance and comparing the input against it.
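The patent names DP matching for comparing time-varying parameters but gives no details. The sketch below uses a textbook dynamic time warping recurrence with a city-block frame distance as one possible reading; the recurrence, the distance measure, the template labels, and all numeric values are assumptions made for illustration.

```python
def dtw_distance(seq_a, seq_b):
    """Dynamic-programming (DTW) distance between two sequences of
    (Lf12, Lf23) pairs, using city-block distance between frames."""
    inf = float("inf")
    n, m = len(seq_a), len(seq_b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (abs(seq_a[i - 1][0] - seq_b[j - 1][0])
                 + abs(seq_a[i - 1][1] - seq_b[j - 1][1]))
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

def recognize(track, templates):
    """Return the label of the template with the smallest DTW distance."""
    return min(templates, key=lambda label: dtw_distance(track, templates[label]))

# Hypothetical reference tracks; the labels and values are placeholders.
templates = {
    "ba": [(3, 6), (3, 5), (2, 4), (2, 4)],
    "da": [(6, 2), (5, 3), (3, 4), (2, 4)],
}
observed = [(3, 6), (3, 6), (3, 5), (2, 4)]
print(recognize(observed, templates))  # ba
```

The warping lets an utterance that lingers on one frame still match a reference track with different timing, which is why the parameter history has to be stored in memory first.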

In this recognition process, the band-pass filters 23 and the binarization circuit 24 are used to detect the formant frequencies, so the CPU board 25 can detect them simply by examining the sign of the output signals of the binarization circuit 24. Most of the formant frequency detection can therefore be implemented in hardware, and the detection time is shortened accordingly.

In addition, because this embodiment obtains the logarithmic information of the formant frequencies in hardware through the special band-pass filters 23, the CPU board 25 does not need to compute the logarithmic differences of the formant frequencies, nor does it need the multiplications and divisions that would be required, for example, to take ratios of formant frequencies. The amount of computation in the CPU board 25 is thus reduced, the feature parameter extraction time is shortened accordingly, and integration, for example in an LSI, becomes easier.
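Why a channel-number difference can stand in for a logarithmic difference can be written out as follows. The derivation assumes, as an idealization not stated in the patent, that channel ch(m) is centered at a frequency f_m = f_0 · 2^(m/3) for some fixed f_0, which is what a 1/3-octave spacing implies.

```latex
f_m = f_0 \cdot 2^{m/3}
\quad\Longrightarrow\quad
\log_2 f_{m_2} - \log_2 f_{m_1}
  = \frac{m_2 - m_1}{3}
  = \frac{L_{f12}}{3},
\qquad
\log_2 f_{m_3} - \log_2 f_{m_2}
  = \frac{m_3 - m_2}{3}
  = \frac{L_{f23}}{3}.
```

Under this assumption Lf12 and Lf23 are proportional to the logarithms of the ratios f2/f1 and f3/f2, quantized to 1/3-octave steps, so an integer subtraction replaces both the logarithm and the division.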

Moreover, because this embodiment uses the logarithmic difference information Lf12 and Lf23 of the formant frequencies as feature parameters, the speech recognition rate is high.

This is explained below by reference to Comparative Example 1 and Comparative Example 2.

Comparative Example 1 uses the formant frequencies themselves as the speech feature parameters. The frequencies of the first formant f1 and second formant f2 were obtained for each of the vowels "a, i, u, e, o" spoken by unspecified speakers, a mixed group of several men and women, and plotted in a feature space whose coordinate axes are f1 and f2. The distribution shown in FIG. 6 was obtained.

In the distribution for Comparative Example 1, the vowel regions Sa to So are spread widely and lie close to one another; in particular, the regions for "u" and "e" partly overlap.

This makes it difficult to partition the speech distribution regions clearly, so misrecognition occurs more readily, and it is especially frequent between "u" and "e".

Comparative Example 2 uses the ratios of adjacent formant frequencies as the speech feature parameters. The frequencies of the first formant f1 to the third formant f3 were obtained for each of the vowels "a, i, u, e, o" spoken by unspecified speakers, a mixed group of several men and women, and the values were plotted in a feature space whose coordinate axes are the ratio of the first and second formants, f1/f2, and the ratio of the second and third formants, f3/f2. The distribution shown in FIG. 7 was obtained.

In the distribution for Comparative Example 2, the vowel regions are more clearly separated than in Comparative Example 1, but the region Si for the vowel "i" is spread much more widely than the others, and the sizes of the vowel regions vary. The region Si of the vowel "i" is therefore unnatural compared with the regions of the other vowels. In addition, when the region sizes vary, the weighting coefficients differ from vowel to vowel if weighting or similar processing is applied to improve discrimination accuracy, which makes that processing more cumbersome.

Comparative Examples 1 and 2 therefore give a lower speech recognition rate than the embodiment. Moreover, these comparative examples cannot use the subtraction-only method of the embodiment and necessarily require multiplication and division, which is a disadvantage in computation time.

Furthermore, if the logarithmic differences of the first and second formant frequencies relative to the third formant frequency were used as feature parameters, the distribution of the embodiment would appear rotated clockwise, and the straight lines separating the vowels would become more complicated. It would then be difficult to separate the vowels with straight lines parallel to the coordinate axes, as in the embodiment, and the discrimination calculation would take longer, which is undesirable.

Although this embodiment recognizes vowels and voiced consonants, unvoiced consonants can also be handled. In that case, as shown by the phantom lines in FIG. 2, the parallel divided speech signals Si from the band-pass filters are converted into a serial signal by a multiplexer 31, digitized by an AD converter 32, and fed to the CPU board 25, and an unvoiced consonant discrimination section (not shown) provided in the CPU board 25 processes the data from the AD converter 32 so that unvoiced consonants can be discriminated as well.

[Effects of the Invention] As described above, the speech recognition device according to the present invention shortens the formant frequency detection time through the device configuration used for detecting the formant frequencies of the speech signal, so speech recognition based on feature parameters derived from the formant frequencies can be performed at high speed.

Furthermore, the speech recognition device according to claim 2 uses logarithmic difference information of the formant frequencies as the feature parameters, so it can form a speech feature space that corresponds to the auditory space and does not depend directly on the formant frequencies. The feature parameters of the same sound spoken by unspecified speakers are therefore distributed within a relatively narrow range, the distribution regions of different sounds can be reliably separated, and misrecognition is reduced accordingly.

In particular, in the speech recognition device according to claim 3, the formant extraction means does not need to compute the logarithmic differences of the formant frequencies, so the feature parameter extraction time is shorter than when the formant extraction means performs that computation in software.

[Brief Description of the Drawings]

FIG. 1 is an explanatory diagram showing the schematic configuration of the speech recognition device according to the present invention. FIG. 2 is a block diagram showing one embodiment of the speech recognition device according to the present invention. FIG. 3 is an explanatory diagram showing details of the binarization circuit of the embodiment. FIG. 4 is a flowchart showing the specific processing of the CPU board of the embodiment. FIG. 5 is a graph showing the speech feature space of the embodiment. FIGS. 6 and 7 are graphs showing the speech feature spaces of Comparative Example 1 and Comparative Example 2. FIG. 8 is a graph showing the formant extraction process in the conventional speech recognition method. FIG. 9 is an explanatory diagram showing an example of a conventional speech recognition device.

[Explanation of Symbols] S: speech signal; Si: divided speech signal; 1: speech signal dividing means; 2: comparing means; 3: formant extraction means; 4: speech discrimination means

Claims

[Claims]

1) A speech recognition device comprising: a speech signal dividing means (1) that divides a speech signal (S) into predetermined frequency band components and outputs them; a comparing means (2) that compares the output levels of the divided speech signals (Si) for the respective frequency bands between adjacent channels and binarizes the result; a parameter extraction means (3) that detects formant frequencies based on the comparison information from the comparing means (2) and extracts feature parameters based on the detected formant frequencies; and a speech discrimination means (4) that discriminates speech based on the feature parameters from the parameter extraction means (3).

2) The speech recognition device according to claim 1, characterized in that logarithmic difference information of the formant frequencies is used as the speech feature parameters.

3) A speech recognition device in which the speech signal dividing means (1) consists of a logarithmically arranged group of band-pass filters, and the parameter extraction means (3) comprises a formant detection section that detects the formant frequencies and a parameter calculation section that calculates, as a feature parameter, the difference between the channel numbers of the speech signal dividing means (1) corresponding to the detected formant frequencies.