JPS59123894A

JPS59123894A - Leading phoneme start-point extraction processing method

Info

Publication number: JPS59123894A
Application number: JP57229269A
Authority: JP
Inventors: 佐藤　泰雄; 大山　隆之
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1982-12-29
Filing date: 1982-12-29
Publication date: 1984-07-17

Abstract

(57) [Abstract] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

[Detailed Description of the Invention]

(A) Technical Field of the Invention

The present invention relates to a leading phoneme start-point extraction processing method. More particularly, it concerns a speech processing system that, in order to extract the start point of the leading phoneme of input speech precisely, first determines a tentative start point on the basis of energy and then extracts the true start point. The system is configured to extract a tentative start point for each of the mid-high band component and the low-band component of the input speech, and to reduce, in that extraction, the influence of components leaking in from the other band.

(B) Background of the Technology and Problems

In processing such as monosyllable recognition, the start point of the leading phoneme of the input speech must be determined precisely. A known approach is to take the point at which the energy of the input speech exceeds a threshold as a tentative start point, and then to extract the true start point by, for example, extracting features of the phoneme on the basis of this tentative start point. In such recognition processing, when an unvoiced consonant such as "KA" is to be distinguished from a voiced consonant such as "GA", it is effective to separate the mid-high band component from the low-band component, extract the start point of each, and discriminate according to the presence or absence of each component and the positions of their start points.

FIG. 1(A) shows the energy change of the unvoiced consonant "KA", FIG. 1(B) shows the energy change of its low-band component, and FIG. 1(C) shows the energy change of its mid-high band component.

In the case of the consonant "KA", the mid-high band component appears from the beginning and the low-band component appears later. FIG. 2(A) shows the energy change of the voiced consonant "GA", FIG. 2(B) shows the energy change of its low-band component, and FIG. 2(C) shows the energy change of its mid-high band component. In the case of the consonant "GA", the low-band component appears from the beginning and the mid-high band component appears later.

Against this background, when extracting the start point of the leading phoneme of the input speech, it is desirable to extract the start points of the low-band component and the mid-high band component individually.

(C) Object and Constitution of the Invention

The object of the present invention is to address the above points and, in doing so, to take the leakage of energy between the two components into account so that the start point can be extracted correctly. To that end, the leading phoneme start-point extraction processing method of the present invention is a method in which, when extracting the start point of the leading phoneme of input speech, the start point is tentatively determined by comparing the energy with a threshold, and the start point is then extracted from the tentative start point or its vicinity. The method is characterized in that a mid-high band filter and a low-band filter are provided so that the mid-high band component and the low-band component of the input speech are each extracted and their respective start points detected; in that a mid-high band component start-point detection unit, which detects the start point of the mid-high band component, includes at least a determination processing unit using a threshold that combines a predetermined value with a value proportional to the energy of the low-band component; in that a low-band component start-point detection unit, which detects the start point of the low-band component, includes at least a determination processing unit using a threshold that combines a predetermined value with a value proportional to the energy of the mid-high band component; and in that the mid-high band component start-point detection unit and the low-band component start-point detection unit each extract the tentative start point individually.

The invention is described below with reference to the drawings.

(D) Embodiment of the Invention

FIG. 3 shows the configuration of one embodiment of the present invention.

In FIG. 3, 1 is a low-band filter that passes frequency components of, for example, 50 Hz to 350 Hz; 2 is a mid-high band filter that passes frequency components of, for example, 1 kHz to 4.9 kHz; 3 and 4 are power calculation units, each of which calculates the energy of the frequency components extracted for it; 5 and 6 are threshold determination units; and 7 and 8 are start-point detection units, which detect the tentative start points referred to in the present invention.

As explained in connection with FIGS. 1 and 2, filters 1 and 2 are provided as shown in FIG. 3 so that the low-band component and the mid-high band component can each be extracted individually.

The power calculation unit 3 calculates the energy PwL of the low-band component that has passed through filter 1, and the power calculation unit 4 calculates the energy PwH of the mid-high band component that has passed through filter 2.

The voiced start-point detection unit 7 basically extracts the tentative start point tv' for a voiced consonant when the energy PwL exceeds a predetermined threshold. However, even when the true energy PwL is sufficiently small, a large mid-high band energy PwH can leak into the low band, so that the apparent energy at the power calculation unit 3 becomes larger than that threshold. For this reason, the threshold determination unit 5 is configured to bring the value of the energy PwH into the determination of the threshold. The threshold determination unit 6 works in the same way, bringing the value of the energy PwL into the determination of its threshold.
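The leakage compensation described above can be sketched as a threshold made of a fixed part plus a part proportional to the other band's energy. This is a minimal illustration only: the function names and all numeric values in it are hypothetical and are not taken from the patent.

```python
def composite_threshold(fixed: float, gain: float, other_band_energy: float) -> float:
    """Predetermined value plus a value proportional to the other band's energy."""
    return fixed + gain * other_band_energy


def exceeds(own_band_energy: float, fixed: float, gain: float,
            other_band_energy: float) -> bool:
    """True when the band's own energy is above its composite threshold."""
    return own_band_energy > composite_threshold(fixed, gain, other_band_energy)


# Hypothetical situation: the low-band reading is mostly leakage from a
# strong mid-high band component.
apparent_pw_l = 2.0
pw_h = 40.0

# With a purely fixed threshold (gain = 0) the leaked energy triggers detection.
fixed_only = exceeds(apparent_pw_l, fixed=1.0, gain=0.0, other_band_energy=pw_h)

# With the composite threshold the same reading is rejected.
compensated = exceeds(apparent_pw_l, fixed=1.0, gain=0.5, other_band_energy=pw_h)

print(fixed_only, compensated)
```

The gain plays the role of an assumed leakage ratio: the larger the other band's energy, the more the threshold is raised.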

The tentative start points tv' and tu' are extracted in the start-point detection units 7 and 8 as follows.

(I) Processing in the unvoiced start-point detection unit 8.

An observation window with a time width of 10 ms is scanned over the energy PwH, taking sample points at 2 ms intervals. When the energy within the observation window at a given time position exceeds the threshold

$$TH_H = 3.0 + 0.1 \times \mathrm{PwL} \qquad (1)$$

the tentative start point tu' for an unvoiced consonant is extracted.
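The scan in (I) can be sketched as follows. The 10 ms window, the 2 ms step, and equation (1) come from the text; everything else is an assumption made for illustration: the energy tracks are plain lists sampled every `dt_ms` milliseconds, the "energy within the window" is taken as the mean over the window, PwL is evaluated over the same window, and the reported time is the start of the window.

```python
from typing import Optional, Sequence


def mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)


def tentative_unvoiced_start(pw_h: Sequence[float], pw_l: Sequence[float],
                             dt_ms: float = 0.5, window_ms: float = 10.0,
                             step_ms: float = 2.0) -> Optional[float]:
    """Return tu' in ms (window start), or None if no window exceeds TH_H."""
    win = round(window_ms / dt_ms)    # window length in samples
    step = round(step_ms / dt_ms)     # scan step in samples
    for start in range(0, len(pw_h) - win + 1, step):
        e_h = mean(pw_h[start:start + win])
        e_l = mean(pw_l[start:start + win])
        th_h = 3.0 + 0.1 * e_l        # equation (1)
        if e_h > th_h:
            return start * dt_ms
    return None


# Hypothetical energy tracks: mid-high band energy switches on at 20 ms.
pw_h = [0.0] * 40 + [10.0] * 60
quiet_low = [0.0] * 100
loud_low = [50.0] * 100

print(tentative_unvoiced_start(pw_h, quiet_low))
print(tentative_unvoiced_start(pw_h, loud_low))
```

With a loud low band the threshold rises, so the same mid-high band track is accepted later than with a quiet low band.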

(II) Processing in the voiced start-point detection unit 7.

An observation window with a time width of 5 ms is scanned over the energy PwL, taking sample points at 2 ms intervals. The tentative start point tv' for a voiced consonant is extracted in either of the following cases: (i) the energy within the observation window at a given time position exceeds the threshold

$$TH_L = 0.5 \qquad (2)$$

or (ii) when the tentative start point tv' obtained with equation (2) is later than the tentative start point tu', the energy within the observation window at that time position exceeds the threshold

$$TH_L' = 10.0 + 0.5 \times \mathrm{PwH} \qquad (3)$$
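One possible reading of (II) is a two-stage scan: first detect tv' with the fixed threshold of equation (2); if that result is later than tu', repeat the scan with the leakage-compensated threshold of equation (3). The sketch below follows that reading, which is an interpretation rather than something the text states outright. It also assumes list-based energy tracks sampled every `dt_ms` milliseconds, mean energy per window, PwH evaluated over the same window, and the window start as the reported time.

```python
from typing import Callable, Optional, Sequence


def mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)


def scan(pw_l: Sequence[float], pw_h: Sequence[float],
         threshold: Callable[[float], float], dt_ms: float = 0.5,
         window_ms: float = 5.0, step_ms: float = 2.0) -> Optional[float]:
    """First window start (ms) where low-band energy exceeds threshold(PwH)."""
    win = round(window_ms / dt_ms)
    step = round(step_ms / dt_ms)
    for start in range(0, len(pw_l) - win + 1, step):
        e_l = mean(pw_l[start:start + win])
        e_h = mean(pw_h[start:start + win])
        if e_l > threshold(e_h):
            return start * dt_ms
    return None


def tentative_voiced_start(pw_l: Sequence[float], pw_h: Sequence[float],
                           tu_tentative: Optional[float],
                           dt_ms: float = 0.5) -> Optional[float]:
    tv = scan(pw_l, pw_h, lambda e_h: 0.5, dt_ms)               # equation (2)
    if tv is not None and tu_tentative is not None and tv > tu_tentative:
        # tv' is later than tu': the low-band reading may be leakage,
        # so re-scan with the compensated threshold of equation (3).
        tv = scan(pw_l, pw_h, lambda e_h: 10.0 + 0.5 * e_h, dt_ms)
    return tv


# Hypothetical "GA"-like tracks: low band starts first, no unvoiced lead.
ga_low = [0.0] * 20 + [30.0] * 80
ga_high = [0.0] * 60 + [10.0] * 40
# Hypothetical "KA"-like tracks: mid-high band starts first and leaks
# a little energy into the low band before voicing starts at 30 ms.
ka_high = [0.0] * 20 + [40.0] * 80
ka_low = [0.0] * 20 + [4.0] * 40 + [60.0] * 40

print(tentative_voiced_start(ga_low, ga_high, tu_tentative=30.0))
print(tentative_voiced_start(ka_low, ka_high, tu_tentative=4.0))
```

In the second case the fixed threshold alone would fire on the leaked energy, while the compensated threshold waits until the low band rises on its own.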

The tentative start points tu' and tv' are obtained as described above. The start points tu and tv are then extracted by carrying out the following processing in the vicinity of these tentative start points.

(III) Unvoiced start point tu.

In the vicinity of the tentative start point tu', that is, within the range from the time position (tu' - 5 ms) to the time position (tu' + 20 ms), if a point at which the energy PwH changes abruptly is detected, that point is taken as the start point tu. If no such point is detected, the tentative start point tu' is taken as the start point tu.

Note that when tv' > tu' and (tu' + 20 ms) > tv', the above range is taken to be from (tu' - 5 ms) to tv'.
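The search range for the unvoiced start point, including the shortened range when tv' falls inside it, can be written as a small helper. The function name and the use of `None` for "no voiced tentative start point" are assumptions for illustration; the 5 ms and 20 ms margins are from the text.

```python
from typing import Optional, Tuple


def unvoiced_search_range(tu_tentative: float,
                          tv_tentative: Optional[float] = None
                          ) -> Tuple[float, float]:
    """Return (lower, upper) bounds in ms of the range searched for tu."""
    lower = tu_tentative - 5.0
    upper = tu_tentative + 20.0
    # If tv' lies after tu' but before tu' + 20 ms, stop the search at tv'.
    if tv_tentative is not None and tu_tentative < tv_tentative < upper:
        upper = tv_tentative
    return lower, upper


print(unvoiced_search_range(100.0))
print(unvoiced_search_range(100.0, 110.0))
print(unvoiced_search_range(100.0, 130.0))
```

Cutting the range at tv' keeps the search for the unvoiced start point from running into the onset of voicing.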

(IV) Voiced start point tv.

In the vicinity of the tentative start point tv', that is, within the range from the time position tv' to the time position (tv' + 5 ms), if a point at which the energy PwL changes abruptly is detected, that point is taken as the start point tv. If no such point is detected, the tentative start point tv' is taken as the start point tv.
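The refinement rule in (IV), with its fallback to the tentative start point, can be sketched as below. The abrupt-change detector is passed in as a callable so that the sketch does not commit to any particular detector; that interface, and the function names, are assumptions for illustration.

```python
from typing import Callable, Optional

# Hypothetical interface: given a search range in ms, return the time of
# an abrupt change in PwL, or None if there is none in that range.
ChangeFinder = Callable[[float, float], Optional[float]]


def refine_voiced_start(tv_tentative: float, find_change: ChangeFinder) -> float:
    """Return tv: the abrupt-change time in [tv', tv' + 5 ms], else tv'."""
    lower = tv_tentative
    upper = tv_tentative + 5.0
    change_time = find_change(lower, upper)
    return tv_tentative if change_time is None else change_time


# Stand-in detectors for demonstration only.
print(refine_voiced_start(30.0, lambda lo, hi: 32.5))   # change found
print(refine_voiced_start(30.0, lambda lo, hi: None))   # fallback to tv'
```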

The start points tu and tv are extracted as described above. The point of abrupt energy change mentioned there may be regarded as being extracted in the following way. For the extraction of the start point tu, two observation windows, each with a time width of 3 ms, are joined end to end, for example, and scanned over the energy PwH, taking sample points at 0.5 ms intervals.

For the energy Pw1 in the first observation window and the energy Pw2 in the second observation window, the quantity DPW of equation (4) is calculated (the expression for equation (4) is missing from this text). When its value exceeds the threshold 3.0, an abrupt change is deemed to have occurred at the time position corresponding to, for example, the junction of the two windows. For the extraction of the start point tv, two observation windows, each with a time width of 3 ms, are likewise joined and scanned over the energy PwL, taking sample points at 0.5 ms intervals.

When the value of DPW corresponding to equation (4) exceeds the threshold 2.0, an abrupt change is deemed to have occurred at the time position corresponding to the junction of the two windows.
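The paired-window scan can be sketched as follows. Because equation (4) is not available in this text, the change measure is supplied by the caller as `dpw`; the simple difference `pw2 - pw1` used in the demonstration is a placeholder chosen for illustration and is not the patent's formula. The list-based energy track sampled every `dt_ms` milliseconds and the use of mean energy per window are likewise assumptions. The 3 ms windows, the 0.5 ms step, and the thresholds 3.0 (for tu) and 2.0 (for tv) are from the text.

```python
import math
from typing import Callable, Optional, Sequence


def mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)


def find_sudden_change(track: Sequence[float], lower_ms: float, upper_ms: float,
                       threshold: float, dpw: Callable[[float, float], float],
                       dt_ms: float = 0.5, window_ms: float = 3.0,
                       step_ms: float = 0.5) -> Optional[float]:
    """Return the first junction time (ms) in [lower_ms, upper_ms] where
    dpw(Pw1, Pw2) exceeds the threshold, or None if there is none."""
    win = round(window_ms / dt_ms)
    step = max(1, round(step_ms / dt_ms))
    # The junction index j separates window 1 (before j) from window 2 (from j).
    first = max(win, math.ceil(lower_ms / dt_ms))
    last = min(len(track) - win, math.floor(upper_ms / dt_ms))
    for j in range(first, last + 1, step):
        pw1 = mean(track[j - win:j])
        pw2 = mean(track[j:j + win])
        if dpw(pw1, pw2) > threshold:
            return j * dt_ms
    return None


# Placeholder change measure, NOT equation (4) of the patent.
def difference(pw1: float, pw2: float) -> float:
    return pw2 - pw1


# Hypothetical mid-high band energy track with a step at 20 ms.
pw_h = [1.0] * 40 + [9.0] * 40
print(find_sudden_change(pw_h, 15.0, 25.0, threshold=3.0, dpw=difference))
```

For the voiced start point the same function would be called on the PwL track with `threshold=2.0`.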

(E) Effects of the Invention

As described above, according to the present invention, the start point of the leading phoneme of the input speech is extracted individually for the mid-high band component and the low-band component. Even when energy leaks between the two components, the tentative start points can be extracted correctly, and as a result the start points themselves can be extracted correctly.

[Brief explanation of drawings]

FIGS. 1 and 2 are explanatory diagrams illustrating the problem underlying the present invention, and FIG. 3 shows the configuration of one embodiment of the present invention. In the figures, 1 is a low-band filter, 2 is a mid-high band filter, 3 and 4 are power calculation units, 5 and 6 are threshold determination units, and 7 and 8 are start-point detection units.

Patent applicant: Fujitsu Ltd

Claims

[Claims]

A leading phoneme start-point extraction processing method in which, when extracting the start point of the leading phoneme of input speech, the start point of the leading phoneme is tentatively determined by comparing the energy with a threshold, and the start point is extracted from the tentatively determined tentative start point or its vicinity, characterized in that: a mid-high band filter and a low-band filter are provided so that the mid-high band component and the low-band component of the input speech are each extracted and their respective start points detected; a mid-high band component start-point detection unit, which detects the start point of the mid-high band component, includes at least a determination processing unit using a threshold that is a composite of a predetermined value and a value proportional to the energy of the low-band component; a low-band component start-point detection unit, which detects the start point of the low-band component, includes at least a determination processing unit using a threshold that is a composite of a predetermined value and a value proportional to the energy of the mid-high band component; and the mid-high band component start-point detection unit and the low-band component start-point detection unit each extract the tentative start point individually.