JPS6069700A - Voice recognition equipment - Google Patents

Voice recognition equipment

Info

Publication number
JPS6069700A
JPS6069700A JP58178467A JP17846783A
Authority
JP
Japan
Prior art keywords
analysis
phoneme
recognition
unit
speech
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP58178467A
Other languages
Japanese (ja)
Inventor
小林 敦仁
奈良 泰弘
裕二 木島
晋太 木村
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fujitsu Ltd
Original Assignee
Fujitsu Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fujitsu Ltd filed Critical Fujitsu Ltd
Priority to JP58178467A priority Critical patent/JPS6069700A/en
Publication of JPS6069700A publication Critical patent/JPS6069700A/en
Pending legal-status Critical Current


Abstract

(57) [Abstract] Because this publication contains application data filed before electronic filing, no abstract data is recorded.

Description

DETAILED DESCRIPTION OF THE INVENTION
(a) Technical Field of the Invention
The present invention relates to a speech recognition device that recognizes input speech using phonemes as the recognition unit.

(b) Technical Background
Speech recognition devices are classified, according to their recognition unit, into monosyllabic speech recognition devices, word speech recognition devices, and continuous speech recognition devices. Monosyllabic and word speech recognition devices for specific speakers, as well as small-vocabulary word speech recognition devices for unspecified speakers, have already been put into practical use.

However, a continuous speech recognition device is the most desirable means of entering information into an electronic computer system, and to bring such devices into practical use, speech recognition devices that use phonemes as the recognition unit have recently been under development.

(c) Prior Art and Problems
A speech recognition device that uses phonemes as the recognition unit has, as its main components, an analysis unit that analyzes input speech and extracts its features, a phoneme dictionary that stores the standard features of each phoneme, and a phoneme recognition unit that recognizes the phonemes of the input speech by comparing the features obtained by the analysis unit with the standard features stored in the phoneme dictionary.

The analysis unit uses techniques such as the fast Fourier transform (FFT), filter analysis, and linear prediction (LPC). In an FFT-based analysis, for example, a widely used method is to analyze the A/D-converted input speech data with a fixed analysis frame length and a fixed analysis frame period, obtain a logarithmic spectrum for each analysis frame, and compare it with the standard features of each phoneme stored in advance in the phoneme dictionary.
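As an illustrative sketch only (this code is not part of the patent text), the conventional fixed-condition analysis can be outlined in Python as follows. The 10 kHz sampling rate, Hamming window, Euclidean-distance matching, and the default 12.8 ms / 6.4 ms frame values (taken from the embodiment described later) are assumptions.

```python
import numpy as np

def fixed_frame_log_spectra(samples, fs=10_000,
                            frame_len_ms=12.8, frame_period_ms=6.4):
    """Split A/D-converted speech into fixed-length frames and return the
    logarithmic power spectrum of each frame (conventional fixed-condition
    analysis; frame values follow the embodiment described below)."""
    frame_len = int(fs * frame_len_ms / 1000)   # samples per analysis frame
    hop = int(fs * frame_period_ms / 1000)      # samples per frame period
    window = np.hamming(frame_len)
    spectra = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = np.asarray(samples[start:start + frame_len]) * window
        power = np.abs(np.fft.rfft(frame)) ** 2
        spectra.append(np.log(power + 1e-10))   # log power spectrum of the frame
    return np.array(spectra)

def nearest_phoneme(log_spectrum, phoneme_dictionary):
    """Match one frame against the standard log power spectrum stored for
    each phoneme (Euclidean distance stands in for the patent's unspecified
    matching rule) and return the closest phoneme label."""
    return min(phoneme_dictionary,
               key=lambda p: np.linalg.norm(log_spectrum - phoneme_dictionary[p]))
```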

In this way, analysis conditions such as the analysis frame length and the analysis frame period have conventionally been kept constant.

As a result, phonemes whose duration is very short compared with that of vowels, such as the voiceless plosives (P, T, K), are easily affected by the preceding and following phonemes and are therefore difficult to recognize; this has been a drawback of the conventional method.

(d) Object of the Invention
The object of the present invention is to improve the recognition rate of a speech recognition device that recognizes input speech using phonemes as the recognition unit.

(e) Structure of the Invention
The speech recognition device according to the present invention comprises an analysis unit that analyzes input speech and extracts its features, a phoneme dictionary that stores the standard features of each phoneme, and a phoneme recognition unit that recognizes the phonemes of the input speech by comparing the features obtained by the analysis unit with the standard features stored in the phoneme dictionary. In this device, which recognizes input speech using phonemes as the recognition unit, an analysis condition setting unit is provided that sets the analysis conditions of the analysis unit according to the recognition result of the phoneme recognition unit, so that when recognition fails the analysis conditions of the input speech are changed and set according to the recognition result, the speech is re-analyzed, and recognition is performed again.
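This structure is essentially a retry loop around the analysis unit. The Python sketch below is only an illustration of that control flow, not text from the patent; the callable parameters stand in for the analysis unit (3), the phoneme recognition unit (6), and the analysis condition setting unit (7) of the embodiment described later, and the default frame values are taken from that embodiment.

```python
from typing import Callable, Optional, Sequence, Tuple

Condition = Tuple[float, float]   # (analysis frame length in ms, frame period in ms)

def recognize_with_reanalysis(
    speech: Sequence[float],
    analyze: Callable[[Sequence[float], Condition], Tuple[object, bool]],
    recognize: Callable[[object], Optional[list]],
    pick_retry_condition: Callable[[bool], Condition],
    default_condition: Condition = (12.8, 6.4),
) -> Optional[list]:
    """Analyze under the default condition; if phoneme recognition fails,
    let the condition setting unit choose new analysis conditions from the
    voiced/unvoiced judgement and re-analyze the same buffered speech."""
    features, voiced = analyze(speech, default_condition)
    result = recognize(features)
    if result is not None:
        return result                      # recognized on the first pass
    retry_condition = pick_retry_condition(voiced)
    features, _ = analyze(speech, retry_condition)
    return recognize(features)             # second attempt after re-analysis
```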

(f) Embodiment of the Invention
The substance of the present invention is explained in concrete terms below by way of an embodiment.

The figure is a block diagram showing the configuration of one embodiment of the present invention. In the figure, 1 is an A/D converter that samples the input speech every 0.1 ms and converts it into 12-bit multi-level digital data; 2 is a buffer that temporarily stores the output of the A/D converter 1; and 3 is an analysis unit that reads the contents of the buffer 2, analyzes the input speech for each analysis frame length, and extracts the logarithmic spectrum and the pitch frequency as its features.

4 is a determination unit that determines, from the presence or absence of the pitch frequency obtained by the analysis unit 3, whether the speech in each analysis frame is a voiced or an unvoiced sound; 5 is a phoneme dictionary that stores the standard logarithmic power spectrum of each phoneme; 6 is a phoneme recognition unit that recognizes the phonemes of the input speech by comparing the logarithmic power spectrum obtained by the analysis unit 3 with the standard phoneme logarithmic power spectra stored in the phoneme dictionary 5; and 7 is an analysis condition setting unit that sets the analysis conditions for the analysis unit 3.

The analysis unit 3 normally analyzes the input speech under the first analysis condition described below; if the phoneme recognition unit 6 fails to recognize a phoneme, it analyzes the input speech under either the second or the third analysis condition described below, according to the result of the determination unit 4.

That is, under the first analysis condition, the analysis frame length is set to 12.8 ms and the analysis frame period to 6.4 ms; the logarithmic power spectrum is obtained by FFT, and the pitch frequency is obtained from the cepstrum given by its inverse FFT.
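For illustration only (not part of the patent), computing the log power spectrum by FFT and then taking the pitch frequency from the cepstrum obtained by the inverse FFT might look like the sketch below; the 10 kHz sampling rate, the 50-400 Hz pitch search range, and the peak threshold are assumptions.

```python
import numpy as np

def log_spectrum_and_pitch(frame, fs=10_000, fmin=50.0, fmax=400.0, threshold=0.5):
    """Compute a frame's log power spectrum by FFT, take its inverse FFT to
    obtain the cepstrum, and search the cepstrum for a pitch peak.

    Returns (log_spectrum, pitch_hz); pitch_hz is None when no sufficiently
    strong cepstral peak is found, i.e. the frame is treated as unvoiced."""
    windowed = np.asarray(frame) * np.hamming(len(frame))
    log_spectrum = np.log(np.abs(np.fft.rfft(windowed)) ** 2 + 1e-10)
    cepstrum = np.fft.irfft(log_spectrum)        # inverse FFT of the log power spectrum
    lo = int(fs / fmax)                          # shortest pitch period, in samples
    hi = min(int(fs / fmin), len(cepstrum) - 1)  # longest pitch period, in samples
    if hi <= lo:
        return log_spectrum, None                # frame too short for a pitch search
    peak = lo + int(np.argmax(cepstrum[lo:hi]))
    if cepstrum[peak] < threshold:               # assumed voicing threshold
        return log_spectrum, None
    return log_spectrum, fs / peak               # pitch frequency in Hz
```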

Thereafter, if the phoneme recognition unit 6 fails to recognize a phoneme and the determination unit 4 judges the sound to be unvoiced, the analysis frame length is set to 6.4 ms and the analysis frame period to 3.2 ms as the second analysis condition.

The logarithmic power spectrum is obtained by FFT, and the pitch frequency is obtained from the cepstrum given by its inverse FFT. Because both the analysis frame length and the analysis frame period of the second analysis condition are shorter than those of the first, the analysis is less affected by the adjacent preceding and following phonemes, and the recognition rate for voiceless plosives of short duration can therefore be improved.

Likewise, if the phoneme recognition unit 6 fails to recognize a phoneme and the determination unit 4 judges the sound to be voiced, the analysis frame length is set to 25.6 ms and the analysis frame period to 3.2 ms as the third analysis condition; the logarithmic power spectrum is obtained by FFT, and the pitch frequency is obtained from the cepstrum given by its inverse FFT. Because its analysis frame length is longer than that of the first analysis condition, the third analysis condition is suited to recognizing vowels of long duration, and recognition accuracy is further improved in particular by shortening the analysis frame period.
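For reference (again not part of the patent text), the three analysis conditions of the embodiment and the choice made by the analysis condition setting unit 7 can be summarized as follows; the dictionary keys and the function name are hypothetical.

```python
# Analysis frame length / frame period (ms) for the embodiment's three conditions.
ANALYSIS_CONDITIONS = {
    "first":  {"frame_len_ms": 12.8, "frame_period_ms": 6.4},  # default pass
    "second": {"frame_len_ms": 6.4,  "frame_period_ms": 3.2},  # retry for unvoiced sounds
    "third":  {"frame_len_ms": 25.6, "frame_period_ms": 3.2},  # retry for voiced sounds
}

def pick_retry_condition(voiced: bool) -> dict:
    """Analysis condition setting unit (7): after a recognition failure, use
    the third condition when the determination unit (4) judged the sound
    voiced (long vowels), and the second when it judged it unvoiced (short
    voiceless plosives)."""
    return ANALYSIS_CONDITIONS["third" if voiced else "second"]
```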

(g) Effects of the Invention
As explained above, the present invention has the effect of improving the recognition rate of a speech recognition device that recognizes input speech using phonemes as the recognition unit.

BRIEF DESCRIPTION OF THE DRAWING

The figure is a block diagram showing the configuration of an embodiment of the present invention. In the figure, 3 is the analysis unit, 5 is the phoneme dictionary, 6 is the phoneme recognition unit, and 7 is the analysis condition setting unit.

Claims (1)

[Claims] A speech recognition device that recognizes input speech using phonemes as the recognition unit, comprising an analysis unit that analyzes input speech and extracts its features, a phoneme dictionary that stores the standard features of each phoneme, and a phoneme recognition unit that recognizes the phonemes of the input speech by comparing the features obtained by the analysis unit with the standard features stored in the phoneme dictionary, characterized in that an analysis condition setting unit is provided that sets the analysis conditions of the analysis unit according to the recognition result of the phoneme recognition unit, and in that the analysis conditions of the input speech are changed and set according to the recognition result of the phoneme recognition unit and the speech is re-analyzed.
JP58178467A 1983-09-27 1983-09-27 Voice recognition equipment Pending JPS6069700A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP58178467A JPS6069700A (en) 1983-09-27 1983-09-27 Voice recognition equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP58178467A JPS6069700A (en) 1983-09-27 1983-09-27 Voice recognition equipment

Publications (1)

Publication Number Publication Date
JPS6069700A true JPS6069700A (en) 1985-04-20

Family

ID=16049020

Family Applications (1)

Application Number Title Priority Date Filing Date
JP58178467A Pending JPS6069700A (en) 1983-09-27 1983-09-27 Voice recognition equipment

Country Status (1)

Country Link
JP (1) JPS6069700A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2006350090A (en) * 2005-06-17 2006-12-28 Nippon Telegr & Teleph Corp <Ntt> Client/server speech recognizing method, speech recognizing method of server computer, speech feature quantity extracting/transmitting method, and system and device using these methods, and program and recording medium
JP4603429B2 (en) * 2005-06-17 2010-12-22 日本電信電話株式会社 Client / server speech recognition method, speech recognition method in server computer, speech feature extraction / transmission method, system, apparatus, program, and recording medium using these methods

Similar Documents

Publication Publication Date Title
US11056097B2 (en) Method and system for generating advanced feature discrimination vectors for use in speech recognition
JP3162994B2 (en) Method for recognizing speech words and system for recognizing speech words
US8326610B2 (en) Producing phonitos based on feature vectors
Bezoui et al. Feature extraction of some Quranic recitation using mel-frequency cepstral coeficients (MFCC)
Nwe et al. Detection of stress and emotion in speech using traditional and FFT based log energy features
Shahin Gender-dependent emotion recognition based on HMMs and SPHMMs
Pao et al. Combining acoustic features for improved emotion recognition in mandarin speech
Priyadarshani et al. Dynamic time warping based speech recognition for isolated Sinhala words
Safavi et al. Identification of gender from children's speech by computers and humans.
Deiv et al. Automatic gender identification for hindi speech recognition
Olive et al. Speech resynthesis from phoneme‐related parameters
Ishihara et al. Automatic transformation of environmental sounds into sound-imitation words based on Japanese syllable structure.
Nagaraja et al. Mono and cross lingual speaker identification with the constraint of limited data
JPS6069700A (en) Voice recognition equipment
Mahmood et al. Multidirectional local feature for speaker recognition
Varma et al. Segmentation algorithm using temporal features and group delay for speech signals
Alotaibi et al. Noise Effect on Arabic Alphadigits in Automatic Speech Recognition.
Undhad et al. Exploiting speech source information for vowel landmark detection for low resource language
Basztura et al. Automatic speech signal segmentation with chosen parametrization method
Latha et al. Performance Analysis of Kannada Phonetics: Vowels, Fricatives and Stop Consonants Using LP Spectrum
Uthiraa et al. Analysis of Mandarin vs English Language for Emotional Voice Conversion
Thankappan et al. Language independent voice-based gender identification system
CN115019775A (en) Phoneme-based language identification method for language distinguishing characteristics
Naveena et al. Extraction of Prosodic Features to Automatically Recognize Tamil Dialects
JPS61249099A (en) Voice recognition equipment