JPS6069700A - Voice recognition equipment - Google Patents

Voice recognition equipment

Info

Publication number
JPS6069700A
JPS6069700A JP58178467A JP17846783A
Authority
JP
Japan
Prior art keywords
analysis
phoneme
recognition
unit
speech
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP58178467A
Other languages
Japanese (ja)
Inventor
小林 敦仁
奈良 泰弘
裕二 木島
晋太 木村
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fujitsu Ltd
Original Assignee
Fujitsu Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fujitsu Ltd filed Critical Fujitsu Ltd
Priority to JP58178467A priority Critical patent/JPS6069700A/en
Publication of JPS6069700A publication Critical patent/JPS6069700A/en
Pending legal-status Critical Current


Abstract

(57) [Abstract] Because this publication contains application data filed before electronic filing, no abstract data is recorded.

Description

DETAILED DESCRIPTION OF THE INVENTION
(a) Technical Field of the Invention
The present invention relates to a speech recognition device that recognizes input speech using phonemes as the recognition unit.

(b) Technical Background
Speech recognition devices are classified, according to their recognition unit, into monosyllabic speech recognition devices, word speech recognition devices, and continuous speech recognition devices. Monosyllabic and word speech recognition devices for specific speakers, as well as small-vocabulary word speech recognition devices for unspecified speakers, have already been put into practical use.

However, a continuous speech recognition device is the most desirable means of entering information into an electronic computer system, and to bring such devices into practical use, speech recognition devices that use phonemes as the recognition unit have recently been under development.

(c) Prior Art and Problems
A speech recognition device that uses phonemes as the recognition unit has, as its main components, an analysis unit that analyzes input speech and extracts its features, a phoneme dictionary that stores the standard features of each phoneme, and a phoneme recognition unit that recognizes the phonemes of the input speech by comparing the features obtained by the analysis unit with the standard features stored in the phoneme dictionary.

The analysis unit uses techniques such as the fast Fourier transform (FFT), filter analysis, and linear prediction (LPC). In an FFT-based analysis, for example, a widely used method is to analyze the A/D-converted input speech data with a fixed analysis frame length and a fixed analysis frame period, obtain a logarithmic spectrum for each analysis frame, and compare it with the standard features of each phoneme stored in advance in the phoneme dictionary.
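As an illustrative sketch only (this code is not part of the patent text), the conventional fixed-condition analysis can be outlined in Python as follows. The 10 kHz sampling rate, Hamming window, Euclidean-distance matching, and the default 12.8 ms / 6.4 ms frame values (taken from the embodiment described later) are assumptions.

```python
import numpy as np

def fixed_frame_log_spectra(samples, fs=10_000,
                            frame_len_ms=12.8, frame_period_ms=6.4):
    """Split A/D-converted speech into fixed-length frames and return the
    logarithmic power spectrum of each frame (conventional fixed-condition
    analysis; frame values follow the embodiment described below)."""
    frame_len = int(fs * frame_len_ms / 1000)   # samples per analysis frame
    hop = int(fs * frame_period_ms / 1000)      # samples per frame period
    window = np.hamming(frame_len)
    spectra = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = np.asarray(samples[start:start + frame_len]) * window
        power = np.abs(np.fft.rfft(frame)) ** 2
        spectra.append(np.log(power + 1e-10))   # log power spectrum of the frame
    return np.array(spectra)

def nearest_phoneme(log_spectrum, phoneme_dictionary):
    """Match one frame against the standard log power spectrum stored for
    each phoneme (Euclidean distance stands in for the patent's unspecified
    matching rule) and return the closest phoneme label."""
    return min(phoneme_dictionary,
               key=lambda p: np.linalg.norm(log_spectrum - phoneme_dictionary[p]))
```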

In this way, analysis conditions such as the analysis frame length and the analysis frame period have conventionally been kept constant.

As a result, phonemes whose duration is very short compared with that of vowels, such as the voiceless plosives (P, T, K), are easily affected by the preceding and following phonemes and are therefore difficult to recognize; this has been a drawback of the conventional method.

(d) Object of the Invention
The object of the present invention is to improve the recognition rate of a speech recognition device that recognizes input speech using phonemes as the recognition unit.

(e) Structure of the Invention
The speech recognition device according to the present invention comprises an analysis unit that analyzes input speech and extracts its features, a phoneme dictionary that stores the standard features of each phoneme, and a phoneme recognition unit that recognizes the phonemes of the input speech by comparing the features obtained by the analysis unit with the standard features stored in the phoneme dictionary. In this device, which recognizes input speech using phonemes as the recognition unit, an analysis condition setting unit is provided that sets the analysis conditions of the analysis unit according to the recognition result of the phoneme recognition unit, so that when recognition fails the analysis conditions of the input speech are changed and set according to the recognition result, the speech is re-analyzed, and recognition is performed again.
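This structure is essentially a retry loop around the analysis unit. The Python sketch below is only an illustration of that control flow, not text from the patent; the callable parameters stand in for the analysis unit (3), the phoneme recognition unit (6), and the analysis condition setting unit (7) of the embodiment described later, and the default frame values are taken from that embodiment.

```python
from typing import Callable, Optional, Sequence, Tuple

Condition = Tuple[float, float]   # (analysis frame length in ms, frame period in ms)

def recognize_with_reanalysis(
    speech: Sequence[float],
    analyze: Callable[[Sequence[float], Condition], Tuple[object, bool]],
    recognize: Callable[[object], Optional[list]],
    pick_retry_condition: Callable[[bool], Condition],
    default_condition: Condition = (12.8, 6.4),
) -> Optional[list]:
    """Analyze under the default condition; if phoneme recognition fails,
    let the condition setting unit choose new analysis conditions from the
    voiced/unvoiced judgement and re-analyze the same buffered speech."""
    features, voiced = analyze(speech, default_condition)
    result = recognize(features)
    if result is not None:
        return result                      # recognized on the first pass
    retry_condition = pick_retry_condition(voiced)
    features, _ = analyze(speech, retry_condition)
    return recognize(features)             # second attempt after re-analysis
```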

(f) Embodiment of the Invention
The substance of the present invention is explained in concrete terms below by way of an embodiment.

The figure is a block diagram showing the configuration of one embodiment of the present invention. In the figure, 1 is an A/D converter that samples the input speech every 0.1 ms and converts it into 12-bit multi-level digital data; 2 is a buffer that temporarily stores the output of the A/D converter 1; and 3 is an analysis unit that reads the contents of the buffer 2, analyzes the input speech for each analysis frame length, and extracts the logarithmic spectrum and the pitch frequency as its features.

4 is a determination unit that determines, from the presence or absence of the pitch frequency obtained by the analysis unit 3, whether the speech in each analysis frame is a voiced or an unvoiced sound; 5 is a phoneme dictionary that stores the standard logarithmic power spectrum of each phoneme; 6 is a phoneme recognition unit that recognizes the phonemes of the input speech by comparing the logarithmic power spectrum obtained by the analysis unit 3 with the standard phoneme logarithmic power spectra stored in the phoneme dictionary 5; and 7 is an analysis condition setting unit that sets the analysis conditions for the analysis unit 3.

The analysis unit 3 normally analyzes the input speech under the first analysis condition described below; if the phoneme recognition unit 6 fails to recognize a phoneme, it analyzes the input speech under either the second or the third analysis condition described below, according to the result of the determination unit 4.

That is, under the first analysis condition, the analysis frame length is set to 12.8 ms and the analysis frame period to 6.4 ms; the logarithmic power spectrum is obtained by FFT, and the pitch frequency is obtained from the cepstrum given by its inverse FFT.
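For illustration only (not part of the patent), computing the log power spectrum by FFT and then taking the pitch frequency from the cepstrum obtained by the inverse FFT might look like the sketch below; the 10 kHz sampling rate, the 50-400 Hz pitch search range, and the peak threshold are assumptions.

```python
import numpy as np

def log_spectrum_and_pitch(frame, fs=10_000, fmin=50.0, fmax=400.0, threshold=0.5):
    """Compute a frame's log power spectrum by FFT, take its inverse FFT to
    obtain the cepstrum, and search the cepstrum for a pitch peak.

    Returns (log_spectrum, pitch_hz); pitch_hz is None when no sufficiently
    strong cepstral peak is found, i.e. the frame is treated as unvoiced."""
    windowed = np.asarray(frame) * np.hamming(len(frame))
    log_spectrum = np.log(np.abs(np.fft.rfft(windowed)) ** 2 + 1e-10)
    cepstrum = np.fft.irfft(log_spectrum)        # inverse FFT of the log power spectrum
    lo = int(fs / fmax)                          # shortest pitch period, in samples
    hi = min(int(fs / fmin), len(cepstrum) - 1)  # longest pitch period, in samples
    if hi <= lo:
        return log_spectrum, None                # frame too short for a pitch search
    peak = lo + int(np.argmax(cepstrum[lo:hi]))
    if cepstrum[peak] < threshold:               # assumed voicing threshold
        return log_spectrum, None
    return log_spectrum, fs / peak               # pitch frequency in Hz
```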

Thereafter, if the phoneme recognition unit 6 fails to recognize a phoneme and the determination unit 4 judges the sound to be unvoiced, the analysis frame length is set to 6.4 ms and the analysis frame period to 3.2 ms as the second analysis condition.

The logarithmic power spectrum is obtained by FFT, and the pitch frequency is obtained from the cepstrum given by its inverse FFT. Because both the analysis frame length and the analysis frame period of the second analysis condition are shorter than those of the first, the analysis is less affected by the adjacent preceding and following phonemes, and the recognition rate for voiceless plosives of short duration can therefore be improved.

Likewise, if the phoneme recognition unit 6 fails to recognize a phoneme and the determination unit 4 judges the sound to be voiced, the analysis frame length is set to 25.6 ms and the analysis frame period to 3.2 ms as the third analysis condition; the logarithmic power spectrum is obtained by FFT, and the pitch frequency is obtained from the cepstrum given by its inverse FFT. Because its analysis frame length is longer than that of the first analysis condition, the third analysis condition is suited to recognizing vowels of long duration, and recognition accuracy is further improved in particular by shortening the analysis frame period.
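For reference (again not part of the patent text), the three analysis conditions of the embodiment and the choice made by the analysis condition setting unit 7 can be summarized as follows; the dictionary keys and the function name are hypothetical.

```python
# Analysis frame length / frame period (ms) for the embodiment's three conditions.
ANALYSIS_CONDITIONS = {
    "first":  {"frame_len_ms": 12.8, "frame_period_ms": 6.4},  # default pass
    "second": {"frame_len_ms": 6.4,  "frame_period_ms": 3.2},  # retry for unvoiced sounds
    "third":  {"frame_len_ms": 25.6, "frame_period_ms": 3.2},  # retry for voiced sounds
}

def pick_retry_condition(voiced: bool) -> dict:
    """Analysis condition setting unit (7): after a recognition failure, use
    the third condition when the determination unit (4) judged the sound
    voiced (long vowels), and the second when it judged it unvoiced (short
    voiceless plosives)."""
    return ANALYSIS_CONDITIONS["third" if voiced else "second"]
```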

(g) Effects of the Invention
As explained above, the present invention has the effect of improving the recognition rate of a speech recognition device that recognizes input speech using phonemes as the recognition unit.

BRIEF DESCRIPTION OF THE DRAWING

The figure is a block diagram showing the configuration of an embodiment of the present invention. In the figure, 3 is the analysis unit, 5 is the phoneme dictionary, 6 is the phoneme recognition unit, and 7 is the analysis condition setting unit.

Claims (1)

[Claims] A speech recognition device that recognizes input speech using phonemes as the recognition unit, comprising an analysis unit that analyzes input speech and extracts its features, a phoneme dictionary that stores the standard features of each phoneme, and a phoneme recognition unit that recognizes the phonemes of the input speech by comparing the features obtained by the analysis unit with the standard features stored in the phoneme dictionary, characterized in that an analysis condition setting unit is provided that sets the analysis conditions of the analysis unit according to the recognition result of the phoneme recognition unit, and in that the analysis conditions of the input speech are changed and set according to the recognition result of the phoneme recognition unit and the speech is re-analyzed.
JP58178467A 1983-09-27 1983-09-27 Voice recognition equipment Pending JPS6069700A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP58178467A JPS6069700A (en) 1983-09-27 1983-09-27 Voice recognition equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP58178467A JPS6069700A (en) 1983-09-27 1983-09-27 Voice recognition equipment

Publications (1)

Publication Number Publication Date
JPS6069700A true JPS6069700A (en) 1985-04-20

Family

ID=16049020

Family Applications (1)

Application Number Title Priority Date Filing Date
JP58178467A Pending JPS6069700A (en) 1983-09-27 1983-09-27 Voice recognition equipment

Country Status (1)

Country Link
JP (1) JPS6069700A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2006350090A (en) * 2005-06-17 2006-12-28 Nippon Telegr & Teleph Corp <Ntt> Client/server speech recognizing method, speech recognizing method of server computer, speech feature quantity extracting/transmitting method, and system and device using these methods, and program and recording medium
JP4603429B2 (en) * 2005-06-17 2010-12-22 日本電信電話株式会社 Client / server speech recognition method, speech recognition method in server computer, speech feature extraction / transmission method, system, apparatus, program, and recording medium using these methods

Similar Documents

Publication Publication Date Title
US11056097B2 (en) Method and system for generating advanced feature discrimination vectors for use in speech recognition
JP3162994B2 (en) Method for recognizing speech words and system for recognizing speech words
US8326610B2 (en) Producing phonitos based on feature vectors
Bezoui et al. Feature extraction of some Quranic recitation using mel-frequency cepstral coeficients (MFCC)
Nwe et al. Detection of stress and emotion in speech using traditional and FFT based log energy features
Shahin Gender-dependent emotion recognition based on HMMs and SPHMMs
Pao et al. Combining acoustic features for improved emotion recognition in mandarin speech
Priyadarshani et al. Dynamic time warping based speech recognition for isolated Sinhala words
Safavi et al. Identification of gender from children's speech by computers and humans.
Deiv et al. Automatic gender identification for hindi speech recognition
Olive et al. Speech resynthesis from phoneme‐related parameters
Ishihara et al. Automatic transformation of environmental sounds into sound-imitation words based on Japanese syllable structure.
Nagaraja et al. Mono and cross lingual speaker identification with the constraint of limited data
JPS6069700A (en) Voice recognition equipment
Mahmood et al. Multidirectional local feature for speaker recognition
Varma et al. Segmentation algorithm using temporal features and group delay for speech signals
Alotaibi et al. Noise Effect on Arabic Alphadigits in Automatic Speech Recognition.
Undhad et al. Exploiting speech source information for vowel landmark detection for low resource language
Basztura et al. Automatic speech signal segmentation with chosen parametrization method
Latha et al. Performance Analysis of Kannada Phonetics: Vowels, Fricatives and Stop Consonants Using LP Spectrum
Uthiraa et al. Analysis of Mandarin vs English Language for Emotional Voice Conversion
Thankappan et al. Language independent voice-based gender identification system
CN115019775A (en) Phoneme-based language identification method for language distinguishing characteristics
Naveena et al. Extraction of Prosodic Features to Automatically Recognize Tamil Dialects
JPS61249099A (en) Voice recognition equipment