JPS59123894A - Head phoneme initial extraction processing system - Google Patents



Publication number
JPS59123894A
Authority
JP
Japan
Legal status
Pending
Application number
JP57229269A
Other languages
Japanese (ja)
Inventor
佐藤 泰雄
大山 隆之
Current Assignee
Fujitsu Ltd
Original Assignee
Fujitsu Ltd
Priority date
1982-12-29
Filing date
1982-12-29
Publication date
1984-07-17
Application filed by Fujitsu Ltd
Priority to JP57229269A
Publication of JPS59123894A
Legal status: Pending


Abstract

(57) [Abstract] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

Detailed Description of the Invention
(A) Technical Field of the Invention
The present invention relates to a leading-phoneme onset extraction processing method. More specifically, it concerns a speech processing system that, in order to extract the onset of the leading phoneme of input speech precisely, first determines a tentative onset on the basis of energy and then extracts the true onset. In this system, a tentative onset is extracted separately for the mid/high-frequency component and for the low-frequency component of the input speech, and each extraction is designed to reduce the influence of energy leaking in from the other component.

(B) Background of the Technology and Its Problems
In monosyllable recognition, for example, the starting point of the leading phoneme of the input speech must be determined precisely. A conventional method for this takes the point at which the energy of the input speech exceeds a threshold as a tentative onset, and then extracts the true onset, for example by extracting features of the phoneme relative to that tentative onset. In such recognition, when distinguishing the syllable "KA" (unvoiced consonant) from the syllable "GA" (voiced consonant), it is effective to separate the mid/high-frequency component from the low-frequency component, extract the onset of each, and discriminate on the basis of whether each component is present and where its onset lies.
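The conventional energy-threshold step just described can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes energy is the mean squared amplitude of non-overlapping frames, and the frame length, threshold, and input signal in the usage lines are hypothetical. The later step that derives the true onset from phoneme features is not shown.

```python
from typing import List, Optional, Sequence


def frame_energies(samples: Sequence[float], frame_len: int) -> List[float]:
    """Mean squared amplitude of consecutive, non-overlapping frames (assumed energy measure)."""
    return [
        sum(x * x for x in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def tentative_onset(energies: Sequence[float], threshold: float) -> Optional[int]:
    """Index of the first frame whose energy exceeds the threshold, or None if none does."""
    for i, e in enumerate(energies):
        if e > threshold:
            return i
    return None


# Hypothetical input: 100 samples of silence followed by an alternating +1/-1 signal.
samples = [0.0] * 100 + [1.0, -1.0] * 100
print(tentative_onset(frame_energies(samples, 20), 0.5))  # → 5
```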

Figure 1(A) shows the energy change of the syllable "KA", which begins with an unvoiced consonant; Figure 1(B) shows the energy change of its low-frequency component, and Figure 1(C) shows the energy change of its mid/high-frequency component.

For the consonant of "KA", the mid/high-frequency component appears from the very beginning and the low-frequency component appears later. Figure 2(A) shows the energy change of the syllable "GA", which begins with a voiced consonant; Figure 2(B) shows the energy change of its low-frequency component, and Figure 2(C) shows the energy change of its mid/high-frequency component. For the consonant of "GA", the low-frequency component appears from the beginning and the mid/high-frequency component appears later.
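Figures 1 and 2 differ mainly in which band starts first. The toy helper below only reports that ordering from two onset times; it assumes a band that was not detected is passed as None, and it is an illustration rather than the recognizer the text refers to, which also uses whether each component is present at all.

```python
from typing import Optional


def onset_order(t_low: Optional[float], t_high: Optional[float]) -> str:
    """Report which band's onset comes first; None means the band was not detected."""
    if t_low is None and t_high is None:
        return "none"
    if t_low is None or (t_high is not None and t_high < t_low):
        return "mid/high first"  # pattern of Figure 1, e.g. "KA"
    if t_high is None or t_low < t_high:
        return "low first"       # pattern of Figure 2, e.g. "GA"
    return "simultaneous"


print(onset_order(t_low=12.0, t_high=2.0))  # mid/high first
print(onset_order(t_low=2.0, t_high=12.0))  # low first
```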

Against this background, when extracting the onset of the leading phoneme of the input speech, it is desirable to extract the onset separately for the low-frequency component and for the mid/high-frequency component.

(C) Object and Configuration of the Invention
The object of the present invention is to address the above point and, in doing so, to compensate for energy leakage between the two components so that the onset can be extracted correctly. To this end, the leading-phoneme onset extraction processing method of the present invention is a method in which, when the onset of the leading phoneme of input speech is to be extracted, the onset is tentatively determined by comparing the energy at that point with a threshold, and the onset is then extracted at or near the tentatively determined onset. The method is characterized in that a mid/high-band filter and a low-band filter are provided to extract the mid/high-frequency component and the low-frequency component of the input speech, respectively, and the onset of each is detected; the mid/high-frequency onset detector, which detects the onset of the mid/high-frequency component, comprises at least a decision unit using a threshold that combines a predetermined value with a value proportional to the energy of the low-frequency component; the low-frequency onset detector, which detects the onset of the low-frequency component, comprises at least a decision unit using a threshold that combines a predetermined value with a value proportional to the energy of the mid/high-frequency component; and the two detectors each extract their tentative onset independently.

The invention is described below with reference to the drawings.

(D) Embodiment
Figure 3 shows the configuration of one embodiment of the present invention.

In Figure 3, 1 is a low-band filter that passes frequency components of, for example, 50 Hz to 350 Hz; 2 is a mid/high-band filter that passes frequency components of, for example, 1 kHz to 4.9 kHz; 3 and 4 are power calculation units that compute the energy of the respective extracted frequency components; 5 and 6 are threshold determination units; and 7 and 8 are onset detectors, which detect the tentative onsets referred to in the present invention.

As explained with reference to Figures 1 and 2, filters 1 and 2 are provided as shown in Figure 3 so that the low-frequency component and the mid/high-frequency component can be extracted separately.

Power calculation unit 3 computes the energy PwL of the low-frequency component that has passed through filter 1, and power calculation unit 4 computes the energy PwH of the mid/high-frequency component that has passed through filter 2.
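A rough software analogue of filters 1 and 2 and power calculation units 3 and 4 is sketched below. Only the pass bands (50 Hz to 350 Hz and 1 kHz to 4.9 kHz) come from the text. The following are assumptions: a 10 kHz sampling rate, second-order band-pass biquads from the RBJ Audio EQ Cookbook standing in for filters whose order and design the text does not state, and energy taken as the mean squared filter output.

```python
import math
from typing import List, Sequence, Tuple

FS = 10_000  # ASSUMED sampling rate in Hz; the source does not state one

Coeffs = Tuple[Tuple[float, float, float], Tuple[float, float]]


def bandpass_biquad(f_lo: float, f_hi: float, fs: float = FS) -> Coeffs:
    """RBJ-cookbook band-pass biquad (0 dB peak gain), coefficients normalized by a0."""
    f0 = math.sqrt(f_lo * f_hi)  # geometric centre of the pass band
    q = f0 / (f_hi - f_lo)       # quality factor derived from the bandwidth
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * q)
    a0 = 1 + alpha
    b = (alpha / a0, 0.0, -alpha / a0)
    a = (-2 * math.cos(w0) / a0, (1 - alpha) / a0)
    return b, a


def biquad_filter(x: Sequence[float], coeffs: Coeffs) -> List[float]:
    """Apply the biquad with a direct-form I difference equation."""
    (b0, b1, b2), (a1, a2) = coeffs
    x1 = x2 = y1 = y2 = 0.0
    y: List[float] = []
    for xn in x:
        yn = b0 * xn + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y


def power(x: Sequence[float]) -> float:
    """Mean squared amplitude (stand-in for power calculation units 3 and 4)."""
    return sum(v * v for v in x) / len(x)


LOW_BAND = bandpass_biquad(50.0, 350.0)          # stand-in for filter 1
MID_HIGH_BAND = bandpass_biquad(1000.0, 4900.0)  # stand-in for filter 2


def band_energies(x: Sequence[float]) -> Tuple[float, float]:
    """Return (PwL, PwH) for a block of samples."""
    return power(biquad_filter(x, LOW_BAND)), power(biquad_filter(x, MID_HIGH_BAND))


tone = [math.sin(2 * math.pi * 200 * n / FS) for n in range(5000)]
pw_l, pw_h = band_energies(tone)
print(pw_l > pw_h)  # a 200 Hz tone lands mostly in the low band
```

A real front end would compute PwL and PwH over short frames rather than one long block; the single-block call above only checks that each band responds to the right tone.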

The voiced onset detector 7 basically extracts the tentative onset tv' of a voiced consonant when the energy PwL exceeds a predetermined threshold. However, even when the true low-frequency energy is sufficiently small, a large mid/high-frequency energy PwH can leak into the low band, so that the apparent energy computed by power calculation unit 3 exceeds that threshold. Threshold determination unit 5 therefore takes the value of the energy PwH into account when determining its threshold. Threshold determination unit 6 works the same way, taking the value of the energy PwL into account when determining its threshold.

Onset detectors 8 and 7 extract the tentative onsets tu' and tv' as follows.

(I) Processing in the unvoiced onset detector 8

An observation window 10 ms wide is scanned over the energy PwH, taking sample points at 2 ms intervals. When the energy within the observation window at some time position exceeds the threshold

$$\mathrm{TH}_H = 3.0 + 0.1 \times \mathrm{PwL} \qquad (1)$$

the tentative onset tu' of the unvoiced consonant is extracted.
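A sketch of the scan performed by the unvoiced onset detector 8, under assumptions the text leaves open: the energy tracks PwH and PwL are sampled every 0.5 ms (the hypothetical constant `FRAME_MS`), "energy within the observation window" is the mean over the window, PwL in equation (1) is measured over the same window, and tu' is reported as the window's start time.

```python
from typing import Optional, Sequence

FRAME_MS = 0.5  # ASSUMED resolution of the energy tracks, in ms per frame


def window_energy(track: Sequence[float], start: int, length: int) -> float:
    """Mean energy of track[start:start + length] (assumed window measure)."""
    return sum(track[start:start + length]) / length


def unvoiced_tentative_onset(pw_h: Sequence[float], pw_l: Sequence[float],
                             win_ms: float = 10.0, hop_ms: float = 2.0) -> Optional[float]:
    """Scan a 10 ms window over PwH in 2 ms steps; return tu' in ms, or None."""
    win = int(round(win_ms / FRAME_MS))
    hop = int(round(hop_ms / FRAME_MS))
    for start in range(0, len(pw_h) - win + 1, hop):
        th_h = 3.0 + 0.1 * window_energy(pw_l, start, win)  # equation (1)
        if window_energy(pw_h, start, win) > th_h:
            return start * FRAME_MS
    return None


# Hypothetical tracks: mid/high energy of 10.0 starting at 20 ms, silent low band.
pw_h = [0.0] * 40 + [10.0] * 60
print(unvoiced_tentative_onset(pw_h, [0.0] * 100))  # → 14.0
```

The reported time precedes the 20 ms step because a window starting earlier already averages above the threshold; reporting the window centre instead would be an equally plausible reading of the text.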

(II) Processing in the voiced onset detector 7

An observation window 5 ms wide is scanned over the energy PwL, taking sample points at 2 ms intervals. The tentative onset tv' of the voiced consonant is extracted when either (i) the energy within the observation window at some time position exceeds the threshold

$$\mathrm{TH}_L = 0.5 \qquad (2)$$

or (ii), in the case where the tentative onset tv' obtained with equation (2) lags behind the tentative onset tu', the energy within the observation window at that time position exceeds the threshold

$$\mathrm{TH}_L' = 10.0 + 0.5 \times \mathrm{PwH} \qquad (3)$$
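The two-stage rule for tv' can be sketched as below, with the same assumptions as for detector 8 (0.5 ms energy tracks via the hypothetical `FRAME_MS`, mean window energy, PwH measured over the same window, window start time reported). Because TH_L' is always at least 10.0, any window that passes equation (3) also passes equation (2), so rescanning from the beginning with equation (3) can only move tv' later, never earlier.

```python
from typing import Optional, Sequence

FRAME_MS = 0.5  # ASSUMED resolution of the energy tracks, in ms per frame


def window_energy(track: Sequence[float], start: int, length: int) -> float:
    """Mean energy of track[start:start + length] (assumed window measure)."""
    return sum(track[start:start + length]) / length


def voiced_tentative_onset(pw_l: Sequence[float], pw_h: Sequence[float],
                           tu_prime_ms: Optional[float],
                           win_ms: float = 5.0, hop_ms: float = 2.0) -> Optional[float]:
    """Return tv' in ms, or None if the low band never crosses its threshold."""
    win = int(round(win_ms / FRAME_MS))
    hop = int(round(hop_ms / FRAME_MS))

    def first_crossing(compensated: bool) -> Optional[float]:
        for start in range(0, len(pw_l) - win + 1, hop):
            if compensated:
                th = 10.0 + 0.5 * window_energy(pw_h, start, win)  # equation (3)
            else:
                th = 0.5                                            # equation (2)
            if window_energy(pw_l, start, win) > th:
                return start * FRAME_MS
        return None

    tv = first_crossing(compensated=False)
    # Condition (ii): if tv' from equation (2) lags tu', require equation (3) instead.
    if tv is not None and tu_prime_ms is not None and tv > tu_prime_ms:
        tv = first_crossing(compensated=True)
    return tv


# Hypothetical "KA"-like tracks: mid/high energy from 10 ms, low-band leakage (2.0)
# from 10 ms, genuine voicing (50.0) from 30 ms.
pw_h = [0.0] * 20 + [20.0] * 80
pw_l = [0.0] * 20 + [2.0] * 40 + [50.0] * 40
print(voiced_tentative_onset(pw_l, pw_h, tu_prime_ms=2.0))   # → 28.0
print(voiced_tentative_onset(pw_l, pw_h, tu_prime_ms=20.0))  # → 8.0
```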

The tentative onsets tu' and tv' are obtained as described above. The onsets tu and tv themselves are then extracted by the following processing in the neighborhood of these tentative onsets.

(III) Unvoiced onset tu
If a point at which the energy PwH changes abruptly is detected in the neighborhood of the tentative onset tu', that is, in the range from time position (tu' − 5 ms) to time position (tu' + 20 ms), that point is taken as the onset tu. If none is detected, the tentative onset tu' is taken as the onset tu.

However, when tv' > tu' and (tu' + 20 ms) > tv', the range is taken from (tu' − 5 ms) to tv'.
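The neighborhood rule for tu, including the clipping applied when tv' falls inside the default range, can be written as below. Treating the range as inclusive and taking the earliest abrupt-change point inside it are assumptions. The abrupt-change times are taken as given here (a hypothetical input list `change_points`).

```python
from typing import Iterable, Optional, Tuple


def unvoiced_search_range(tu_p: float, tv_p: Optional[float] = None) -> Tuple[float, float]:
    """Neighborhood (ms) searched for the unvoiced onset tu around tu'."""
    lo, hi = tu_p - 5.0, tu_p + 20.0
    if tv_p is not None and tv_p > tu_p and tu_p + 20.0 > tv_p:
        hi = tv_p  # tv' lies inside the default range: stop the search at tv'
    return lo, hi


def refine_onset(tentative: float, change_points: Iterable[float],
                 lo: float, hi: float) -> float:
    """Earliest abrupt-change time in [lo, hi]; fall back to the tentative onset."""
    inside = [t for t in change_points if lo <= t <= hi]
    return min(inside) if inside else tentative


lo, hi = unvoiced_search_range(30.0, tv_p=40.0)  # (25.0, 40.0)
print(refine_onset(30.0, [27.5, 45.0], lo, hi))  # → 27.5
```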

(IV) Voiced onset tv

If a point at which the energy PwL changes abruptly is detected in the neighborhood of the tentative onset tv', that is, in the range from time position tv' to time position (tv' + 5 ms), that point is taken as the onset tv. If none is detected, the tentative onset tv' is taken as the onset tv.
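The corresponding rule for tv is simpler, since its search range always runs forward from tv' to tv' + 5 ms. As before, inclusive bounds and "earliest change point wins" are assumptions, and `change_points` is a hypothetical input.

```python
from typing import Iterable


def refine_voiced_onset(tv_p: float, change_points: Iterable[float]) -> float:
    """Earliest abrupt-change time (ms) in [tv', tv' + 5 ms], else tv' itself."""
    inside = [t for t in change_points if tv_p <= t <= tv_p + 5.0]
    return min(inside) if inside else tv_p


print(refine_voiced_onset(28.0, [20.0, 31.5]))  # → 31.5 (20.0 lies before tv')
print(refine_voiced_onset(28.0, [40.0]))        # → 28.0 (no change in range)
```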

The onsets tu and tv are extracted as described above, and the points of abrupt energy change referred to there can be considered to be found as follows. For extraction of the onset tu, two observation windows, each 3 ms wide, are placed, for example, back to back over the energy PwH and scanned so as to take sample points at 0.5 ms intervals.

From the energy Pw1 in the first observation window and the energy Pw2 in the second observation window, the change measure DPW of equation (4) is computed (equation (4) is illegible in the source text). When its value exceeds the threshold 3.0, an abrupt change is regarded as having occurred at the time position corresponding to, for example, the junction of the two windows. Likewise, for extraction of the onset tv, two observation windows, each 3 ms wide, are joined over the energy PwL and scanned so as to take sample points at 0.5 ms intervals.

When the value of DPW given by equation (4) exceeds the threshold 2.0, an abrupt change is regarded as having occurred at the time position corresponding to the junction.
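The abrupt-change test can be sketched as two back-to-back 3 ms windows slid in 0.5 ms steps. Because equation (4) is illegible in this copy, `dpw_ratio` below is a HYPOTHETICAL stand-in (the later window's mean energy divided by the earlier one's) and should be replaced by the real DPW definition if it is available. The thresholds 3.0 (for PwH) and 2.0 (for PwL) are from the text; the 0.5 ms track resolution and the use of the window junction as the reported time are assumptions.

```python
from typing import Callable, List, Sequence

FRAME_MS = 0.5  # ASSUMED resolution of the energy tracks, in ms per frame


def dpw_ratio(pw1: float, pw2: float, eps: float = 1e-9) -> float:
    """HYPOTHETICAL stand-in for equation (4): later-window energy over earlier-window energy."""
    return pw2 / (pw1 + eps)


def abrupt_changes(track: Sequence[float], threshold: float,
                   win_ms: float = 3.0, hop_ms: float = 0.5,
                   dpw: Callable[[float, float], float] = dpw_ratio) -> List[float]:
    """Junction times (ms) at which dpw(Pw1, Pw2) exceeds the threshold."""
    win = int(round(win_ms / FRAME_MS))
    hop = int(round(hop_ms / FRAME_MS))
    times: List[float] = []
    for start in range(0, len(track) - 2 * win + 1, hop):
        pw1 = sum(track[start:start + win]) / win            # first window
        pw2 = sum(track[start + win:start + 2 * win]) / win  # second window
        if dpw(pw1, pw2) > threshold:
            times.append((start + win) * FRAME_MS)           # junction of the windows
    return times


# Hypothetical PwH track stepping from 1.0 to 10.0 at 10 ms.
track = [1.0] * 20 + [10.0] * 20
print(abrupt_changes(track, 3.0))  # → [8.0, 8.5, 9.0, 9.5, 10.0, 10.5]
```

Overlapping windows report a short run of junctions around the step; the neighborhood rules for tu and tv then pick one point from that run.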

(E) Effects of the Invention
As described above, according to the present invention, the onset of the leading phoneme of input speech is extracted separately for the mid/high-frequency component and for the low-frequency component. Even if energy leaks between these components, the tentative onsets can be extracted correctly, and the true onsets can accordingly be extracted correctly as well.

Brief Description of the Drawings

Figures 1 and 2 are explanatory diagrams illustrating the problem underlying the present invention, and Figure 3 shows the configuration of one embodiment of the present invention. In the figures, 1 is a low-band filter, 2 is a mid/high-band filter, 3 and 4 are power calculation units, 5 and 6 are threshold determination units, and 7 and 8 are onset detectors.
Patent applicant: Fujitsu Ltd

Claims (1)

A leading-phoneme onset extraction processing method in which, when the onset of the leading phoneme of input speech is to be extracted, the onset of the leading phoneme is tentatively determined by comparing the energy at that onset with a threshold, and the onset is then extracted at or near the tentatively determined onset, the method being characterized in that: a mid/high-band filter and a low-band filter are provided to extract the mid/high-frequency component and the low-frequency component of the input speech, respectively, and the onset of each component is detected; a mid/high-frequency onset detector, which detects the onset of the mid/high-frequency component, comprises at least a decision unit using a threshold that combines a predetermined value with a value proportional to the energy of the low-frequency component; a low-frequency onset detector, which detects the onset of the low-frequency component, comprises at least a decision unit using a threshold that combines a predetermined value with a value proportional to the energy of the mid/high-frequency component; and the mid/high-frequency onset detector and the low-frequency onset detector each extract their tentative onset independently.

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP57229269A JPS59123894A (en) 1982-12-29 1982-12-29 Head phoneme initial extraction processing system


Publications (1)

Publication Number Publication Date
JPS59123894A 1984-07-17

Family

ID=16889460

Family Applications (1)

Application Number Title Priority Date Filing Date
JP57229269A Pending JPS59123894A (en) 1982-12-29 1982-12-29 Head phoneme initial extraction processing system

Country Status (1)

Country Link
JP (1) JPS59123894A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2012036305A1 (en) * 2010-09-17 2012-03-22 日本電気株式会社 Voice recognition device, voice recognition method, and program


Similar Documents

Publication Publication Date Title
US4720862A (en) Method and apparatus for speech signal detection and classification of the detected signal into a voiced sound, an unvoiced sound and silence
CN106448659B (en) A kind of sound end detecting method based on short-time energy and fractal dimension
Lovekin et al. Developing usable speech criteria for speaker identification technology
CN101625860A (en) Method for self-adaptively adjusting background noise in voice endpoint detection
CN101625858A (en) Method for extracting short-time energy frequency value in voice endpoint detection
JPS59123894A (en) Head phoneme initial extraction processing system
Sundaram et al. Usable Speech Detection Using Linear Predictive Analysis–A Model-Based Approach
JP2992324B2 (en) Voice section detection method
KR100835993B1 (en) Pre-processing Method and Device for Clean Speech Feature Estimation based on Masking Probability
Niederjohn et al. Computer recognition of the continuant phonemes in connected English speech
KR100273395B1 (en) Voice duration detection method for voice recognizing system
JPS61233791A (en) Voice section detection system for voice recognition equipment
US20220199074A1 (en) A dialog detector
Ru-Wei et al. Pitch detection method for noisy speech signals based on wavelet transform and autocorrelation function
JPS5936759B2 (en) Voice recognition method
JPS61238099A (en) Word voice recognition equipment
JP3008404B2 (en) Voice recognition device
Elghonemy et al. Speaker independent isolated Arabic word recognition system
JPS62238599A (en) Voice section detecting system
JPS62293299A (en) Voice recognition
JPS6217800A (en) Voice section decision system
Sundaram et al. Usable speech detection using linear predictive analysis
JPH0312699A (en) Voice recognition device
JPS60260096A (en) Correction system for voice section detecting threshold in voice recognition
JPH0289097A (en) Syllable pattern segmenting system