JPS62191899A - Voiced plosive consonant identification system

Info

Publication number: JPS62191899A
Application number: JP61033560A
Authority: JP
Inventors: 小林　敦仁
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1986-02-18
Filing date: 1986-02-18
Publication date: 1987-08-22
Anticipated expiration: 2010-12-20
Also published as: JPH07120153B2

Abstract

(57) [Summary] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

[Detailed Description of the Invention]

[Summary]

The present invention relates to a system for distinguishing voiced plosive consonants from one another. To capture the characteristics of each voiced plosive consonant stably and reliably, the spectral sequence of the transition from the burst point to the vowel is extracted at analysis positions determined according to the time length from the burst point to the vowel onset point and according to the pitch of the speech signal. This allows the transition information to be captured stably and voiced plosive consonants to be distinguished from one another at a high rate.

[Industrial application field]

The present invention relates to a speech recognition device, and in particular to a system for distinguishing among voiced plosive consonants.

Speech recognition devices, and monosyllable recognition devices in particular, can be used for text input in place of a keyboard, because essentially all Japanese text can be composed from 68 monosyllables; efforts are currently under way to put them into practical use as voice word processors. In monosyllable recognition, however, differences in the word-initial consonant carry great weight, and the speech analysis methods conventionally used for word recognition and the like cannot fully capture these characteristics, so accuracy is low. Many more technical problems must be overcome before practical use.

Among these problems, discrimination within a group of consonants whose articulation mechanisms are very similar is especially difficult, and a high-accuracy identification method needs to be established.

[Conventional technology]

Typical conventional methods for identifying voiced plosive consonants include a method that uses the burst spectrum immediately after the burst point as a feature and a method that uses the formant locus as a feature. Methods using various other features as identification parameters have also been proposed.

The former is based on the idea that the static spectrum of the burst contains information useful for identification. The latter uses as its feature the formant locus, that is, the transition start frequencies of the second and third formants near the burst point, estimated from the formant frequency trajectories.

In addition to the above, there are many methods that use the dynamic characteristics of the transition from the burst point to the following vowel.

[Problems to be solved by the invention]

Conventional methods that distinguish among voiced plosive consonants using the characteristics of the transition from the burst point to the following vowel perform identification using, as features, dynamic time-series information from the burst point into the following vowel. Because the speech is analyzed from the burst point toward the vowel under fixed analysis conditions, the analysis is strongly affected by utterance-to-utterance variation, by consonant-dependent variation in the rise to the vowel, and by pitch-related variation, so stable analysis results cannot be obtained.

[Means for solving problems]

FIG. 1 is a block diagram of the principle of the present invention. In the figure, 1 is a microphone, 2 is an A/D converter, 3 is a burst point detection section, 10 is a provisional analysis position determination section, 16 is a determination section, 17 is a speech analysis section, 18 is a time length calculation section, 19 is an analysis position correction section, and 20 is a short-interval speech analysis section.

The characteristic component of the present invention is the speech analysis means 17, which extracts the spectral sequence of the transition. The speech analysis means 17 consists of means 18 for calculating the time length from the burst point to the vowel onset point, means 10 for determining the analysis positions of the spectral sequence according to that time length, means 19 for correcting the analysis positions in synchronization with the pitch of the speech signal, and means 20 for performing short-interval speech analysis.

[Effect]

In the present invention, the means for calculating the time length from the burst point to the vowel onset point and the means for determining the analysis positions of the spectral sequence according to that time length absorb utterance-to-utterance variation and consonant-dependent variation in the rise to the vowel through prior time normalization. In addition, the means for correcting the analysis positions in synchronization with the pitch of the speech signal reduces the influence of pitch.

[Example]

FIG. 2 is a block diagram of an embodiment of the present invention.

In the figure, 1 is a microphone, 2 is an A/D converter, 3 is a burst point detection section, 4 is a speech data memory, 5 is a speech power calculation section, 6 is a speech power sequence memory, 7 is a vowel steady-state detection section, 8 is a vowel onset detection section, 9 is a consonant time determination section, 10 is a provisional analysis position determination section, 11 is a pitch synchronization correction section, 12 is an analysis position instruction section, 13 is a frequency analysis section, 14 is a dictionary memory, 15 is a distance calculation section, 16 is a determination section, 17 is a speech analysis section, 18 is a time length calculation section, 19 is an analysis position correction section, and 20 is a short-interval speech analysis section.

FIG. 3 shows the extraction of the vowel steady-state start point and the vowel onset point, FIG. 4 shows the setting of the provisional analysis positions, FIG. 5 shows the setting of the actual analysis positions, and FIG. 6 shows the extraction of an actual analysis position.

The operation of the embodiment is described below.

Discrete monosyllabic speech (a voiced plosive) input from the microphone 1 is converted from analog to digital by the A/D converter 2 and stored in the speech data memory 4. From the speech signal stored in the memory 4, the burst point detection section 3 detects the burst point, that is, the time at which the burst, a phenomenon peculiar to voiced plosives, occurs. Various methods can be used for this detection; for example, the point at which the waveform amplitude changes abruptly can be detected.
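The amplitude-jump idea mentioned above can be sketched as follows. This is only an illustration: the function name `detect_burst_point`, the frame length, the jump ratio, and the noise floor are all assumptions, since the text does not specify how the abrupt change is measured.

```python
def detect_burst_point(samples, frame_len=16, jump_ratio=4.0, floor=1e-3):
    """Return the start index of the first frame whose mean absolute
    amplitude jumps sharply relative to the previous frame, or None.

    frame_len, jump_ratio and floor are illustrative assumptions,
    not values taken from the patent text.
    """
    prev = None
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        level = sum(abs(v) for v in frame) / frame_len
        # A frame counts as the burst when it is above the noise floor and
        # much louder than the frame before it.
        if prev is not None and level > floor and level > jump_ratio * max(prev, floor):
            return start
        prev = level
    return None
```

With 64 samples of silence followed by a strong oscillation, the sketch reports the first frame of the oscillation as the burst point.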

Next, the speech power calculation section 5 calculates the speech power. Letting X(i) be the speech signal stored in the memory 4, the speech power is defined by equation (1) (the equation itself is not reproduced in this text), where N is the length of the calculation interval.

According to the above definition, a speech power sequence is computed at analysis period M for the speech signal after the detected burst point and stored in the speech power sequence memory 6.
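Because the power definition itself is missing from this text, the sketch below assumes the common short-time form, the mean of the squared samples over a window of N samples, evaluated every M samples from the burst point. The function name `power_series` and the mean-square definition are assumptions, not the patent's equation.

```python
def power_series(x, start, n, m):
    """Short-time power of x from index `start` onward.

    n: calculation interval length (N in the text)
    m: analysis period (M in the text)
    Assumption: power is the mean of squared samples in each window.
    """
    series = []
    pos = start
    while pos + n <= len(x):
        window = x[pos:pos + n]
        series.append(sum(v * v for v in window) / n)
        pos += m
    return series
```

For example, `power_series([1.0] * 4 + [2.0] * 4, 0, 4, 2)` yields one value per window position: the first window lies entirely in the quiet part, the second straddles both parts, and the third lies entirely in the loud part.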

Next, based on this speech power time series, the vowel steady-state detection section 7 extracts the steady-state start point of the following vowel.

In the speech power sequence after the burst point, the point at which the power variation falls to or below a certain threshold is taken as the start point of the vowel steady-state portion.

Letting $P(l_i)$ be the speech power time series, the steady-state start point is the time $l_i$ at which

$|P(l_i) - P(l_{i+1})| \le TH$ (threshold) ... (2)

holds (point S in FIG. 3).

Next, the vowel onset detection section 8 detects the vowel onset point. Here, the point at which the power is 20% below the speech power at the vowel steady-state start point is found and taken as the vowel onset point (point T in FIG. 3).

The interval between this vowel onset point and the burst point is taken as the consonant interval of the voiced plosive, and the consonant length is obtained.
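A minimal sketch of these two detections on a power sequence is given below. The function names are hypothetical, and two points are assumptions about details the text leaves open: the onset is searched backward from the steady-state start point, and "20% below" is read as power at or below 0.8 times the steady-state power.

```python
def vowel_steady_start(power, th):
    """First frame index i with |P[i] - P[i+1]| <= th, as in condition (2)."""
    for i in range(len(power) - 1):
        if abs(power[i] - power[i + 1]) <= th:
            return i
    return None


def vowel_onset(power, steady_idx, drop=0.2):
    """Walk back from the steady-state start to the nearest frame whose power
    is at or below (1 - drop) times the steady-state power.

    Assumptions: backward search, and drop=0.2 as a fraction of the
    steady-state power. Falls back to frame 0 if no frame qualifies.
    """
    target = (1.0 - drop) * power[steady_idx]
    for i in range(steady_idx, -1, -1):
        if power[i] <= target:
            return i
    return 0
```

Since the power sequence starts at the burst point, the returned onset index multiplied by the analysis period gives the consonant length in samples.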

The consonant length is obtained by the consonant time determination section 9 (see FIG. 3).

Next, based on the information obtained so far, namely the burst point, the vowel onset point, and the consonant length, the provisional analysis position determination section 10 provisionally determines the analysis positions. Here, three analysis positions are provisionally set in the consonant interval: the analysis start positions are placed at the burst point, the center of the consonant interval, and the vowel onset point. (Because three analysis positions are set in this embodiment, the consonant length is not substantially used; if N positions are used, for example, N analysis positions are set from the burst point at a period equal to the consonant length divided by (N-1).) (See FIG. 4.)

Next, of the provisionally set analysis positions, the two at the center of the consonant interval and at the vowel onset point are corrected so that their waveform phases are aligned (pitch synchronization). This function is performed by the pitch synchronization correction section 11.
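The provisional placement described above, N positions spaced by the consonant length divided by (N-1), can be written as a short sketch. The function name `provisional_positions` and the use of sample indices are assumptions made for illustration.

```python
def provisional_positions(burst, onset, n=3):
    """Return n analysis start positions (sample indices) evenly spaced
    from the burst point to the vowel onset point, both inclusive.
    """
    if n < 2:
        raise ValueError("n must be at least 2")
    step = (onset - burst) / (n - 1)  # consonant length divided by (n - 1)
    return [round(burst + k * step) for k in range(n)]
```

With the default n=3 this reduces to the burst point, the center of the consonant interval, and the vowel onset point, which is the case used in the embodiment.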

This is done as follows.

As shown in FIG. 6, a search window of about 10 ms centered on the provisional analysis position is applied to the speech signal, and the point of maximum amplitude within it is detected.

This maximum point corresponds to an excitation point of the vocal cords. In this way, the actual analysis positions are determined as shown in FIG. 5.
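The correction step can be sketched as a peak search in a window around the provisional position. The function name `correct_position`, the sampling-rate parameter, and the use of absolute amplitude are assumptions; the text only says that the maximum-amplitude point within a window of about 10 ms is taken.

```python
def correct_position(x, pos, rate_hz, window_ms=10.0):
    """Move a provisional analysis position to the sample of largest
    absolute amplitude inside a window centered on it.

    Assumption: 'maximum amplitude' is taken as max |x[i]|.
    """
    half = int(rate_hz * window_ms / 1000.0) // 2
    lo = max(0, pos - half)
    hi = min(len(x), pos + half + 1)
    # max() returns the earliest index when several samples tie.
    return max(range(lo, hi), key=lambda i: abs(x[i]))
```

At an assumed 8 kHz sampling rate the window spans 40 samples on each side of the provisional position, so a larger peak outside that range does not affect the result.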

The analysis positions obtained by the above method are indicated by the analysis position instruction section 12, and the frequency analysis section 13 performs spectral analysis of the speech signal at those positions. To increase the time resolution, a short analysis window of a few milliseconds is used.
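The text does not say which spectral analysis the frequency analysis section uses. As one possible illustration, the sketch below computes a magnitude spectrum with a Hamming window and a direct DFT over a short frame starting at an analysis position; the function name, the window shape, and the DFT itself are all assumptions.

```python
import cmath
import math


def short_time_spectrum(x, pos, win_len):
    """Magnitude spectrum (bins 0 .. win_len // 2) of the frame
    x[pos : pos + win_len], Hamming-windowed. Requires win_len >= 2.

    Assumption: a plain DFT stands in for the unspecified analysis method.
    """
    frame = x[pos:pos + win_len]
    if len(frame) < win_len:
        raise ValueError("frame runs past the end of the signal")
    n = win_len
    windowed = [
        frame[t] * (0.54 - 0.46 * math.cos(2 * math.pi * t / (n - 1)))
        for t in range(n)
    ]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(windowed[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spectrum.append(abs(acc))
    return spectrum
```

At an assumed 8 kHz sampling rate, a 32-sample frame corresponds to 4 ms, which is in the range of the "few milliseconds" mentioned above.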

The distance between the obtained input spectral sequence $I = (I_1, I_2, \ldots, I_K)$ and each standard spectral sequence $S = (S_1, S_2, \ldots, S_K)$ stored in advance in the dictionary memory 14 is calculated according to the following distance definition.

$D = \sum_{k=1}^{K} |I_k - S_k|$ ... (3)

This is performed by the distance calculation section 15.

The determination section 16 outputs the result (the category having the minimum distance) based on this distance value.
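The matching and decision steps can be sketched as a nearest-template search. The absolute-difference distance, the function names, and the category labels "b", "d", "g" are assumptions for illustration; the exact form of the distance is not legible in this text.

```python
def sequence_distance(inp, ref):
    """Sum of absolute differences between two spectral sequences,
    taken over every analysis position and every frequency bin.
    """
    return sum(
        abs(a - b)
        for frame_in, frame_ref in zip(inp, ref)
        for a, b in zip(frame_in, frame_ref)
    )


def classify(inp, dictionary):
    """Return the label of the standard sequence closest to the input."""
    return min(dictionary, key=lambda label: sequence_distance(inp, dictionary[label]))
```

Here `dictionary` maps each category label to a standard spectral sequence with the same number of analysis positions and bins as the input.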

Consonant-dependent variation in the rise to the vowel can be absorbed by time normalization in advance, and the influence of pitch-induced spectral variation can be reduced, so the rapid changes of voiced plosive consonants can be captured stably. This makes it possible to discriminate among voiced plosive consonants at a high rate.

[Brief explanation of drawings]

FIG. 1 is a block diagram of the principle of the present invention, FIG. 2 is a block diagram of a voiced plosive consonant identification device showing an embodiment of the present invention, FIG. 3 shows the extraction of the vowel steady-state start point and the vowel onset point, FIG. 4 shows the setting of the provisional analysis positions, FIG. 5 shows the setting of the actual analysis positions, and FIG. 6 shows the extraction of an actual analysis position.

In FIG. 1, 3 is a burst point detection section, 10 is a provisional analysis position determination section, 16 is a determination section, 17 is a speech analysis section, 18 is a time length calculation section, 19 is an analysis position correction section, and 20 is a short-interval speech analysis section.

Agent: Patent attorney 井桁 貞一

Claims

[Scope of Claims] A voiced plosive consonant identification system for a voiced plosive consonant identification device comprising means (3) for detecting the burst point of a voiced plosive consonant, speech analysis means (17) for extracting a spectral sequence from the speech signal in the transition from the burst point to the vowel, and means (16) for performing identification based on the spectral sequence information, characterized in that the speech analysis means (17) comprises: means (18) for calculating the time length from the burst point to the vowel onset point; means (10) for determining the analysis positions of the spectral sequence according to the time length; means (19) for correcting the analysis positions in synchronization with the pitch of the speech signal; and means (20) for performing short-interval speech analysis at the corrected analysis positions.