JPH0588840B2 - - Google Patents

Publication number
JPH0588840B2
JPH0588840B2 JP60282481A JP28248185A JPH0588840B2 JP H0588840 B2 JPH0588840 B2 JP H0588840B2 JP 60282481 A JP60282481 A JP 60282481A JP 28248185 A JP28248185 A JP 28248185A JP H0588840 B2 JPH0588840 B2 JP H0588840B2
Authority
JP
Japan
Prior art keywords
peak
speech
speech segment
candidate
section
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Lifetime
Application number
JP60282481A
Other languages
Japanese (ja)
Other versions
JPS62141595A (en)
Inventor
Juichiro Fujihashi
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
NEC Corp
Original Assignee
Nippon Electric Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nippon Electric Co Ltd
Priority to JP60282481A
Publication of JPS62141595A
Publication of JPH0588840B2
Granted

Description

DETAILED DESCRIPTION OF THE INVENTION (Field of Industrial Application) The present invention relates to a speech detection method used in speech recognition devices and the like to determine the time interval in which speech is present.

(Prior Art) Conventional speech detection methods of this kind have commonly taken the start of speech to be the point at which the speech power level has stayed above a threshold for at least a certain duration, and the end of speech to be the point at which it has stayed below the threshold for at least a certain duration.
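The conventional scheme can be sketched as follows, for comparison with the method this patent introduces. This is only an illustration under stated assumptions: the function name `detect_by_duration`, the frame-indexed power list, and the parameters `min_on` and `min_off` are hypothetical and do not appear in the patent.

```python
def detect_by_duration(power, threshold, min_on, min_off):
    """Conventional duration-based detector (illustrative sketch).

    power     : list of per-frame power values (assumed representation)
    threshold : power level separating speech from silence
    min_on    : frames the power must stay above threshold to mark a start
    min_off   : frames the power must stay below threshold to mark an end
    Returns (start, end) frame indices, or None if no start is found.
    """
    start = None
    run = 0
    for i, p in enumerate(power):
        run = run + 1 if p > threshold else 0
        if run >= min_on:
            start = i - min_on + 1  # first frame of the above-threshold run
            break
    if start is None:
        return None

    end = len(power) - 1  # default: speech lasts until the input ends
    run = 0
    for i in range(start + min_on, len(power)):
        run = run + 1 if power[i] < threshold else 0
        if run >= min_off:
            end = i - min_off  # last frame before the below-threshold run
            break
    return start, end


clean = [0, 0, 5, 5, 5, 5, 0, 0, 0, 0]
noisy = [0, 0, 5, 5, 5, 5, 0, 0, 5, 0, 0, 0]  # one-frame burst after the word
print(detect_by_duration(clean, 1, 2, 3))
print(detect_by_duration(noisy, 1, 2, 3))
```

In the second call the single-frame burst at frame 8 restarts the below-threshold count, so the detected end moves from frame 5 to frame 8 and the burst is included in the segment.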

(Problems to Be Solved by the Invention) Because the conventional speech detection method described above detects speech segments from how long a level persists, the beginning of a word may be dropped when the speech has a deep power dip, and when even a momentary noise lies close to the end of a word, the end point is extended so that the noise is included in the speech segment. The conventional method thus has the drawback of detecting speech segments incorrectly.

(Means for Solving the Problems) To solve the problems described above, the present invention provides: a power calculation unit that calculates the power of a speech signal; a power smoothing unit that smooths the power calculated by the power calculation unit to obtain smoothed power; a peak detection unit that detects, as peak candidates of the smoothed power, the inflection points at which the rate of change of the smoothed power changes from positive to negative; a threshold calculation unit that selects the peak candidate with the highest level as the maximum peak, calculates a peak selection threshold from the level of the maximum peak and a predetermined peak selection coefficient, and calculates a peak width calculation threshold from the level of the maximum peak and a predetermined peak width calculation coefficient; a peak selection unit that compares the level of each peak candidate detected by the peak detection unit with the peak selection threshold and selects as peaks only those peak candidates whose level is equal to or higher than the peak selection threshold; a peak width calculation unit that calculates, as a peak width, the time during which the smoothed power is equal to or higher than the peak width calculation threshold and which contains a peak selected by the peak selection unit; a peak width comparison unit that outputs, as speech segment candidates, those peak widths that are wider than a predetermined peak width threshold; a speech segment candidate time difference calculation unit that, when the peak width comparison unit yields a plurality of speech segment candidates, calculates as a speech segment candidate time difference the time from the end of the earlier of two adjacent speech segment candidates to the start of the later one; and a speech segment determination unit that determines the speech segment from the outputs of the peak width comparison unit and the speech segment candidate time difference calculation unit.
The speech segment determination unit is characterized in that, when there is only one speech segment candidate, it determines that candidate to be the speech segment as is; when there are a plurality of speech segment candidates and the speech segment candidate time difference between adjacent candidates is shorter than a predetermined speech segment candidate time difference threshold, it performs a merging process that combines those candidates into one, taking the span from the start of the earlier candidate to the end of the later candidate as a new speech segment candidate; and it repeats this merging process and takes one or more of the speech segment candidates that finally remain as the speech segment.

(Embodiment) Next, the present invention will be described with reference to the drawings.
FIG. 1 is a block diagram of one embodiment of the present invention.

This embodiment comprises a power calculation unit 1, a power smoothing unit 2, a peak detection unit 3, a threshold calculation unit 4, a peak selection unit 5, a peak width calculation unit 6, a peak width comparison unit 7, a speech segment determination unit 8, and a speech segment candidate time difference calculation unit 22. Input speech 10 is fed to the power calculation unit 1, the calculated power 11 is fed to the power smoothing unit 2, and the smoothed power 12 is fed to both the peak detection unit 3 and the peak width calculation unit 6. The peak detection unit 3 detects, as peak candidates 13 of the smoothed power, the inflection points at which the rate of change of the smoothed power 12 changes from positive to negative (that is, its local maxima), and outputs the detected peak candidates 13 to the threshold calculation unit 4 and the peak selection unit 5. The threshold calculation unit 4 determines the maximum peak level among the peak candidates 13, combines it with the peak selection coefficient 19 to calculate the peak selection threshold 14, which it outputs to the peak selection unit 5, and combines it with the peak width calculation coefficient 20 to calculate the peak width calculation threshold 15, which it outputs to the peak width calculation unit 6. The peak selection unit 5 compares the peak level of each peak candidate 13 with the peak selection threshold 14 and outputs only the peak candidates whose peak level is equal to or higher than the threshold to the peak width calculation unit 6 as peaks 25.
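The front end described above (units 1 through 5) can be sketched as follows. The patent does not state how the power is computed, how it is smoothed, or how the coefficients 19 and 20 are combined with the maximum peak level, so the mean-square frame power, the centered moving average, and the plain multiplication used here are all assumptions, as are every function and parameter name.

```python
def frame_power(samples, frame_len):
    """Mean-square power of each non-overlapping frame (assumed definition)."""
    return [
        sum(s * s for s in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def smooth(power, half_window):
    """Centered moving average (assumed smoothing method)."""
    out = []
    for i in range(len(power)):
        lo = max(0, i - half_window)
        hi = min(len(power), i + half_window + 1)
        out.append(sum(power[lo:hi]) / (hi - lo))
    return out


def peak_candidates(smoothed):
    """Frames where the slope changes from positive to negative."""
    return [
        i for i in range(1, len(smoothed) - 1)
        if smoothed[i] - smoothed[i - 1] > 0 and smoothed[i + 1] - smoothed[i] < 0
    ]


def select_peaks(smoothed, candidates, selection_coeff, width_coeff):
    """Derive both thresholds from the maximum peak and keep the tall peaks.

    Assumes each threshold is (maximum peak level) * (coefficient).
    Returns (peaks, selection_threshold, width_threshold).
    """
    if not candidates:
        return [], None, None
    max_level = max(smoothed[i] for i in candidates)
    selection_threshold = max_level * selection_coeff
    width_threshold = max_level * width_coeff
    peaks = [i for i in candidates if smoothed[i] >= selection_threshold]
    return peaks, selection_threshold, width_threshold


smoothed = [0, 1, 0, 6, 3, 10, 2, 0, 5, 0]  # synthetic smoothed power
cands = peak_candidates(smoothed)
print(cands, select_peaks(smoothed, cands, 0.25, 0.125))
```

Because both thresholds scale with the maximum peak, the selection adapts to the overall loudness of the input rather than relying on a fixed absolute level.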

The peak width calculation unit 6 outputs, as a peak width 16, each time interval in which the smoothed power 12 is at or above the peak width calculation threshold 15 and which contains a peak 25. The peak width 16 thus represents the time interval, around the peak of the smoothed power 12 designated by the peak 25, over which the smoothed power 12 is at or above the peak width calculation threshold 15. The peak width 16 is output to the peak width comparison unit 7. The peak width comparison unit 7 compares the peak width threshold 21 with the peak width 16 of each peak and outputs the start and end points of the peaks whose peak width is equal to or greater than the threshold 21, as speech segment candidates 17, to the speech segment determination unit 8 and the speech segment candidate time difference calculation unit 22.
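A minimal sketch of the peak width calculation and comparison (units 6 and 7) might look like this. The names `peak_intervals` and `segment_candidates` are hypothetical. Two choices here are assumptions rather than statements from the patent: peaks that fall inside the same above-threshold interval are reported once, and the width test uses "equal to or greater than", following the embodiment text (the claim wording says "wider than").

```python
def peak_intervals(smoothed, peaks, width_threshold):
    """For each selected peak, find the contiguous run of frames around it
    in which the smoothed power is at or above width_threshold.

    Returns a list of (start, end) frame indices, both inclusive.
    """
    intervals = []
    for p in peaks:
        if smoothed[p] < width_threshold:
            continue  # peak itself lies below the width threshold: no interval
        start = p
        while start > 0 and smoothed[start - 1] >= width_threshold:
            start -= 1
        end = p
        while end < len(smoothed) - 1 and smoothed[end + 1] >= width_threshold:
            end += 1
        if (start, end) not in intervals:  # assumed: report shared intervals once
            intervals.append((start, end))
    return intervals


def segment_candidates(intervals, min_width):
    """Keep intervals at least min_width frames long."""
    return [(s, e) for s, e in intervals if e - s + 1 >= min_width]


smoothed = [0, 1, 0, 6, 3, 10, 2, 0, 5, 0]  # synthetic smoothed power
intervals = peak_intervals(smoothed, [3, 5, 8], 1.25)
print(intervals, segment_candidates(intervals, 2))
```

In this synthetic input the one-frame peak at index 8 passes the height test but fails the width test, which is how a tall but momentary noise is rejected.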

When there is only one speech segment candidate 17, the speech segment determination unit 8 outputs that candidate as is as the speech segment 18. When there are a plurality of speech segment candidates 17, the speech segment candidate time difference calculation unit 22 calculates, as the speech segment candidate time difference 23, the time from the end of the earlier of two adjacent speech segment candidates to the start of the later one. The speech segment determination unit 8 then repeatedly merges adjacent speech segment candidates into one whenever their speech segment candidate time difference 23 is smaller than the speech segment candidate time difference threshold 24. If a single candidate finally remains, that merged candidate is output as the speech segment 18; if more than one remains, the merged candidate that has the maximum peak level is output as the speech segment 18.
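The merging and final decision can be sketched as below. The tuple layout `(start, end, peak_level)` and the names `merge_candidates`, `decide_segments`, and `keep_all` are assumptions made for illustration. For candidates sorted by start time, a single left-to-right pass gives the same result as repeating the pairwise merge until no gap is below the threshold.

```python
def merge_candidates(candidates, max_gap):
    """Merge adjacent candidates whose gap is shorter than max_gap.

    candidates : list of (start, end, peak_level), sorted by start
    max_gap    : speech segment candidate time difference threshold
    """
    merged = []
    for start, end, level in candidates:
        if merged and start - merged[-1][1] < max_gap:
            prev_start, prev_end, prev_level = merged[-1]
            # New candidate spans from the earlier start to the later end.
            merged[-1] = (prev_start, max(prev_end, end), max(prev_level, level))
        else:
            merged.append((start, end, level))
    return merged


def decide_segments(candidates, max_gap, keep_all=False):
    """Return the final speech segment(s) as a list of (start, end).

    keep_all=False keeps only the candidate with the highest peak level;
    keep_all=True treats every remaining candidate as a separate segment.
    """
    merged = merge_candidates(candidates, max_gap)
    if keep_all or len(merged) <= 1:
        return [(s, e) for s, e, _ in merged]
    s, e, _ = max(merged, key=lambda c: c[2])
    return [(s, e)]


cands = [(10, 20, 6.0), (23, 40, 10.0), (60, 70, 5.0)]  # synthetic values
print(decide_segments(cands, 5))
print(decide_segments(cands, 5, keep_all=True))
```

With these synthetic values the first two candidates are 3 frames apart and merge into (10, 40), while the third is 20 frames away and stays separate, so the default decision returns only (10, 40).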

This approach of discarding every candidate other than the one with the maximum peak level when several speech segment candidates finally remain is effective for removing noise segments. Clearly, however, if the speech segment determination unit 8 instead determines each remaining candidate to be a separate speech segment when several remain, the method is effective for tasks such as separating speech segments when speech is uttered continuously.

FIG. 2 shows the relationship between the waveform of the smoothed speech power 12, the speech detection thresholds, and the detected speech segment in the embodiment of FIG. 1. With reference to FIG. 2, and in relation to FIG. 1, the following explains in detail how the embodiment of FIG. 1 removes noise with a low peak level and noise close to the speech, and prevents the beginning of a word from being dropped even for speech with a deep power dip. In FIG. 2, the horizontal axis 30 represents time and the vertical axis 31 represents smoothed power; the waveform shown is that of the smoothed power 12 output by the power smoothing unit 2 of FIG. 1.

The peak detection unit 3 of FIG. 1 detects the four peak candidates 32, 33, 34, and 35 of FIG. 2, and the threshold calculation unit 4 of FIG. 1 calculates the peak selection threshold 14 and the peak width calculation threshold 15 from peak candidate 34, which is the maximum peak. The peak selection unit 5 uses the peak selection threshold 14 to remove peak candidate 32, whose peak level is low, and outputs peak candidates 33, 34, and 35 as peaks. The peak width calculation unit 6 uses the peak width calculation threshold 15 to calculate the peak widths 38, 39, and 40 of peaks 33, 34, and 35. The peak width comparison unit 7 compares the peak width threshold 21 with each of the peak widths 38, 39, and 40; in the example of FIG. 2 all the peak widths are wider than the threshold 21, so the interval from the start to the end of each of peaks 33, 34, and 35 is output as a speech segment candidate 17.

The speech segment candidate time difference calculation unit 22 calculates the speech segment candidate time difference 43 between peaks 33 and 34 and the speech segment candidate time difference 44 between peaks 34 and 35. The speech segment determination unit 8 compares the speech segment candidate time difference threshold 24 with each of the time differences 43 and 44. Because the time difference 43 is shorter than the threshold 24, the speech segment candidates of peaks 33 and 34 are merged into one, and the span from the start of peak 33 to the end of peak 34 becomes a new speech segment candidate. Because the time difference 44 is longer than the threshold 24, peak 35 cannot be merged, and two speech segment candidates remain. The speech segment determination unit 8 then compares the peak levels of the two candidates, determines the candidate from start point 41 to end point 42, which contains the maximum peak 34, to be the speech segment 18, and outputs it.
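The walk-through of FIG. 2 can be reproduced on made-up data with a compact end-to-end sketch. The values in `FIG2_LIKE` are synthetic and only imitate the shape described in the text (a low noise peak, two speech peaks split by a deep dip, and a later noise peak); they are not read from the figure. The function name, the parameter names, and the use of multiplication for the thresholds are likewise assumptions.

```python
def detect_speech_segment(smoothed, sel_coeff, width_coeff, min_width, max_gap):
    """Return (start, end) frame indices of the detected segment, or None."""
    n = len(smoothed)
    # Peak candidates: frames where the slope turns from positive to negative.
    cands = [i for i in range(1, n - 1)
             if smoothed[i - 1] < smoothed[i] > smoothed[i + 1]]
    if not cands:
        return None
    top = max(smoothed[i] for i in cands)
    sel_thr = top * sel_coeff      # peak selection threshold (assumed product)
    width_thr = top * width_coeff  # peak width calculation threshold (assumed product)

    segments = []  # each entry: [start, end, highest peak level inside]
    for p in cands:
        if smoothed[p] < sel_thr or smoothed[p] < width_thr:
            continue  # peak too low
        s = e = p
        while s > 0 and smoothed[s - 1] >= width_thr:
            s -= 1
        while e < n - 1 and smoothed[e + 1] >= width_thr:
            e += 1
        if e - s + 1 < min_width:
            continue  # peak too narrow
        if segments and s - segments[-1][1] < max_gap:
            # Gap to the previous candidate is short: merge the two.
            segments[-1][1] = max(segments[-1][1], e)
            segments[-1][2] = max(segments[-1][2], smoothed[p])
        else:
            segments.append([s, e, smoothed[p]])
    if not segments:
        return None
    best = max(segments, key=lambda seg: seg[2])  # keep the highest-peak candidate
    return best[0], best[1]


# Synthetic smoothed power: peaks at frames 2, 7, 14 and 27 stand in for
# peak candidates 32, 33, 34 and 35 of FIG. 2.
FIG2_LIKE = [0, 0, 2, 0, 0, 3, 7, 8, 6, 2, 1, 2, 6, 10, 12, 9, 5, 1,
             0, 0, 0, 0, 0, 0, 0, 0, 5, 9, 4, 0, 0]

print(detect_speech_segment(FIG2_LIKE, 0.5, 0.25, 3, 6))
```

With a gap threshold of 6 frames the two speech peaks merge and the result spans frames 5 to 16; lowering the gap threshold to 3 leaves them separate, and only the interval around the taller peak (frames 12 to 16) is returned, which illustrates the dropped word beginning that the merging step is meant to prevent.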

Thus, according to the embodiment of FIG. 1, as in the example of FIG. 2, peaks 32 and 35, which are noise, are removed, and the speech segment is detected correctly even for speech whose deep power dip splits it into peaks 33 and 34.

(Effects of the Invention) As described above, the present invention detects the peaks of a smoothed power waveform; calculates a peak selection threshold and a peak width calculation threshold from the level of the highest peak; uses the peak width calculation threshold to calculate the peak width of each peak whose level is at or above the peak selection threshold; determines peaks whose width is at or above a predetermined width to be speech segment candidates; when several peaks are determined to be candidates, calculates the time differences between them and repeatedly merges candidates separated by less than a predetermined time into one; and, if the candidates do not finally reduce to one, determines one of them (for example, the candidate with the maximum peak level) or several of them to be the speech segment. The speech segment can therefore be determined from the height and width of each peak and its time difference from adjacent peaks. Noise with a momentary peak can be removed even when it lies close to the speech, and loss of the word-initial peak can be prevented even for speech with a deep power dip. The drawbacks of the conventional method described above are thereby eliminated, and the recognition rate can be improved when the method is used in a speech recognition device.

[Brief Description of the Drawings]

FIG. 1 is a block diagram of one embodiment of the present invention, and FIG. 2 is a diagram showing the waveform of the smoothed speech power in this embodiment.
1: power calculation unit, 2: power smoothing unit, 3: peak detection unit, 4: threshold calculation unit, 5: peak selection unit, 6: peak width calculation unit, 7: peak width comparison unit, 8: speech segment determination unit, 10: input speech, 11: power, 12: smoothed power, 13: peak candidate, 14: peak selection threshold, 15: peak width calculation threshold, 16: peak width, 17: speech segment candidate, 18: speech segment, 19: peak selection coefficient, 20: peak width calculation coefficient, 21: peak width threshold, 22: speech segment candidate time difference calculation unit, 23: speech segment candidate time difference, 24: speech segment candidate time difference threshold, 30: horizontal axis (time), 31: vertical axis (smoothed power), 32 to 35: peak candidates, 38 to 40: peak widths, 41: start point, 42: end point, 43, 44: speech segment candidate time differences.

Claims (1)

[Claims]
1. A speech detection method comprising: a power calculation unit that calculates the power of a speech signal; a power smoothing unit that smooths the power calculated by the power calculation unit to obtain smoothed power; a peak detection unit that detects, as peak candidates of the smoothed power, the inflection points at which the rate of change of the smoothed power changes from positive to negative; a threshold calculation unit that selects the peak candidate with the highest level as the maximum peak, calculates a peak selection threshold from the level of the maximum peak and a predetermined peak selection coefficient, and calculates a peak width calculation threshold from the level of the maximum peak and a predetermined peak width calculation coefficient; a peak selection unit that compares the level of each peak candidate detected by the peak detection unit with the peak selection threshold and selects as peaks only those peak candidates whose level is equal to or higher than the peak selection threshold; a peak width calculation unit that calculates, as a peak width, the time during which the smoothed power is equal to or higher than the peak width calculation threshold and which contains a peak selected by the peak selection unit; a peak width comparison unit that outputs, as speech segment candidates, those peak widths that are wider than a predetermined width threshold; a speech segment candidate time difference calculation unit that, when the peak width comparison unit yields a plurality of speech segment candidates, calculates as a speech segment candidate time difference the time from the end of the earlier of two adjacent speech segment candidates to the start of the later one; and a speech segment determination unit that determines the speech segment from the outputs of the peak width comparison unit and the speech segment candidate time difference calculation unit; wherein the speech segment determination unit, when there is only one speech segment candidate, determines that candidate to be the speech segment as is; when there are a plurality of speech segment candidates and the speech segment candidate time difference between adjacent candidates is shorter than a predetermined speech segment candidate time difference threshold, performs a merging process that combines those candidates into one, taking the span from the start of the earlier candidate to the end of the later candidate as a new speech segment candidate; and repeats this merging process and takes one or more of the speech segment candidates that finally remain as the speech segment.
2. The speech detection method according to claim 1, wherein, when a plurality of speech segment candidates finally remain, the speech segment determination unit takes as the speech segment only the candidate whose peak level is the highest among them.
3. The speech detection method according to claim 1, wherein, when a plurality of speech segment candidates finally remain, the speech segment determination unit determines each of the speech segment candidates to be a separate speech segment.
JP60282481A 1985-12-16 1985-12-16 Voice detection system Granted JPS62141595A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP60282481A JPS62141595A (en) 1985-12-16 1985-12-16 Voice detection system

Publications (2)

Publication Number Publication Date
JPS62141595A JPS62141595A (en) 1987-06-25
JPH0588840B2 JPH0588840B2 (en) 1993-12-24

Family

ID=17652995

Family Applications (1)

Application Number Title Priority Date Filing Date
JP60282481A Granted JPS62141595A (en) 1985-12-16 1985-12-16 Voice detection system

Country Status (1)

Country Link
JP (1) JPS62141595A (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2658712B2 (en) * 1992-02-19 1997-09-30 ティアック株式会社 Audio signal detection circuit
JP5109050B2 (en) * 2007-07-13 2012-12-26 学校法人早稲田大学 Voice processing apparatus and program
JP5867066B2 (en) 2011-12-26 2016-02-24 富士ゼロックス株式会社 Speech analyzer
JP6031761B2 (en) 2011-12-28 2016-11-24 富士ゼロックス株式会社 Speech analysis apparatus and speech analysis system
