JPH06266383A

JPH06266383A - Speech segmentation device

Info

Publication number: JPH06266383A
Application number: JP5050928A
Authority: JP
Inventors: Katsumi Hama; 勝巳濱
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1993-03-11
Filing date: 1993-03-11
Publication date: 1994-09-22

Abstract

PURPOSE: To improve the speech recognition rate by accurately detecting the end point of speech and extracting the speech signal from an input signal. CONSTITUTION: This speech segmentation device is equipped with a start point detecting means 2 which detects a start point A where the speech power calculated from the speech signal increases, a temporary end point detecting means 3 which detects a temporary end point C where the speech power decreases below a 1st threshold value, and a temporary end point moving means 4 which, after the temporary end point C is detected by the temporary end point detecting means 3, sequentially moves the temporary end point to each position where the speech power rises above a 2nd threshold value and then falls below it. When the temporary end point moving means 4 determines the final temporary end point as the end point H, the speech signal from the start point A to the end point H is segmented and used as the object of speech recognition.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a speech segmentation device that extracts a speech signal from an input signal.

[0002]

[Prior Art] Conventionally, an analog speech signal picked up by a microphone is digitized, and the speech power is computed from the resulting digital speech signal. This speech power, shown in FIG. 5, is compared with the first threshold shown in the figure; when the power stays below the first threshold for a fixed time or longer, the signal is regarded as noise even if the power is not zero. The point C' at which the power fell below the first threshold is taken as the end point C' of the speech, and the speech from the speech start point to this end point C' is segmented and passed to speech recognition. The configuration and operation of FIG. 5 are briefly described below.

[0003] FIG. 5 is an explanatory diagram of the prior art. In FIG. 5(A), the speech power is the power computed from the digital speech signal obtained by digitizing the sound picked up by the microphone. The vertical axis represents the speech power level, and the horizontal axis represents time.

[0004] A start point A' (not shown) is determined as the point at which the illustrated speech power, computed from speech picked up by the microphone, rises to a predetermined value or more. Next, as the speech power gradually weakens, the point at which it has remained below the first threshold for a fixed time or longer is determined as the end point C'. The speech signal in the interval from the start point A' to the end point C' is then segmented and used as the target of speech recognition.

[0005]

[Problems to Be Solved by the Invention] With the conventional speech segmentation described above, there was the problem that, as shown in FIG. 5(B), the speech may not actually have ended even though the speech power has stayed below the first threshold for the fixed time or longer. As a result, only the speech from the segmentation start point A' to the end point C' is extracted and recognized, and the recognition rate drops because the end point C' of the speech is wrong.
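The conventional rule and its failure mode can be illustrated with a minimal Python sketch. It assumes the speech power is already available as a list of per-frame values; the function name `prior_art_end_point`, the parameter `min_frames` (standing in for the "fixed time"), and the sample values are hypothetical and do not come from the patent. Following paragraph [0002], the end point C' is taken here as the frame at which the power first dropped below the first threshold.

```python
def prior_art_end_point(power, start, first_threshold, min_frames):
    """Return the index of end point C', or None if no end point is found.

    C' is the first frame of a run of at least `min_frames` consecutive
    frames whose power is below `first_threshold` (illustrative assumption).
    """
    run_start = None
    for t in range(start, len(power)):
        if power[t] < first_threshold:
            if run_start is None:
                run_start = t          # power has just dropped below the threshold
            if t - run_start + 1 >= min_frames:
                return run_start       # stayed low long enough: fix C' here
        else:
            run_start = None           # power recovered; discard the run
    return None


# Hypothetical per-frame power values with a weak tail at frame 6.
power = [0, 5, 9, 8, 2, 1, 4, 1, 0, 0, 0]
c_prime = prior_art_end_point(power, start=1, first_threshold=3, min_frames=2)
```

In this made-up sequence the power rises again at frame 6, but the conventional rule has already fixed C' at frame 4, so that tail would be left out of the segment.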

[0006] To solve these problems, the object of the present invention is to accurately detect the end point of speech, segment the speech signal accordingly, and thereby improve the speech recognition rate.

[0007]

[Means for Solving the Problems] FIG. 1 shows the principle configuration of the present invention. In FIG. 1, the start point detecting means 2 detects the start point A of the speech signal 1.

[0008] The temporary end point detecting means 3 detects the temporary end point C at which the speech power becomes equal to or less than the first threshold. The end point moving means 4, after the temporary end point C has been detected, sequentially moves the temporary end point to each position at which the speech power falls below the second threshold after having exceeded it.

[0009] The segmenting means 5 extracts the speech signal from the start point A to the end point H.

[0010]

[Operation] In the present invention, as shown in FIG. 1, the following takes place on the basis of the speech power computed from the speech signal 1. The start point detecting means 2 detects the start point A at which the speech power increases. The temporary end point detecting means 3 detects the temporary end point C at which the speech power becomes equal to or less than the first threshold. After the temporary end point C has been detected, the temporary end point moving means 4 repeatedly moves it to each position at which the speech power falls below a predetermined second threshold after having exceeded it. Once the final end point H has been detected, the segmenting means 5 extracts the speech signal from the start point A to the end point H as the target of speech recognition.

[0011] Here, the second threshold is set so that its value gradually increases with the time elapsed from the temporary end point C. Accordingly, by accurately detecting the end point H of the speech and extracting the speech signal from the start point A to the end point H as the target of speech recognition, the speech recognition rate can be improved.

[0012]

[Embodiment] Next, the configuration and operation of an embodiment of the present invention are described in detail, in order, with reference to FIGS. 1 to 4.

[0013] FIG. 1 shows the principle configuration of the present invention. In FIG. 1, the speech input 1 is a speech signal picked up by a microphone; it is the input from which speech is segmented for recognition.

[0014] The start point detecting means 2 calculates the speech power from the speech signal 1 and determines the start point A of this speech power (see FIG. 4). The temporary end point detecting means 3, after the start point A of the speech power has been detected by the start point detecting means 2, detects as the temporary end point C the point at which the speech power becomes equal to or less than a preset first threshold (see FIG. 4).

[0015] The end point moving means 4, after the temporary end point detecting means 3 has detected the temporary end point C, sequentially moves the temporary end point C to each position at which the speech power falls below a preset second threshold after having exceeded it. The second threshold is set so that its value gradually increases from the temporary end point C onward.

[0016] The segmenting means 5 extracts the speech signal from the start point A detected by the start point detecting means 2 to the end point H, which is the final point to which the end point moving means 4 moved the temporary end point. This extracted speech signal is the target of speech recognition.

[0017] Next, the operation of the configuration of FIG. 1 is described in detail with reference to the flowchart of FIG. 2. In FIG. 2, speech is input in S1: the operator speaks into the microphone, and the speech signal picked up by the microphone is captured.

[0018] In S2, the speech is digitized: the analog speech signal captured in S1 is converted into a digital speech signal. In S3, the speech power is calculated from the digital speech signal obtained in S2.
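The patent does not state how the speech power in S3 is computed. As one possible reading, the sketch below uses the mean squared amplitude over fixed-length, non-overlapping frames; the function name `frame_power`, the parameter `frame_len`, and the choice of mean-square power are all assumptions made for illustration.

```python
def frame_power(samples, frame_len):
    """Short-time power: mean of squared samples per non-overlapping frame.

    A trailing partial frame is dropped (an arbitrary choice for this sketch).
    """
    powers = []
    for i in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[i:i + frame_len]
        powers.append(sum(x * x for x in frame) / frame_len)
    return powers


# Hypothetical digitized samples, two samples per frame.
powers = frame_power([1, -1, 2, -2, 0, 0], frame_len=2)
```

Any monotone measure of signal energy could be substituted here; the thresholds in the later steps would simply need to be expressed in the same units.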

[0019] In S4, the start point is detected: the start point A is detected from the speech power calculated from the speech signal in S3, for example the speech power of FIG. 4(A). In S5, a comparison with the first threshold is made: as shown in FIG. 4(A), the speech power is compared with the preset first threshold.

[0020] In S6, the temporary end point C is detected: based on the comparison in S5, the point at which the speech power becomes smaller than the first threshold is detected as the temporary end point C (see FIG. 4(A)).
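Steps S4 to S6 can be sketched as a single pass over the per-frame power values. This is a minimal illustration, not the patented implementation: the function name is hypothetical, and `start_threshold` is an assumption, since the embodiment only says that the start point A is where the speech power increases.

```python
def detect_start_and_temporary_end(power, start_threshold, first_threshold):
    """Return (A, C) as frame indices; either may be None if not found.

    A: first frame whose power reaches `start_threshold` (assumed rule).
    C: first later frame whose power is smaller than `first_threshold` (S5/S6).
    """
    start = None
    for t, p in enumerate(power):
        if start is None:
            if p >= start_threshold:
                start = t              # S4: start point A
        elif p < first_threshold:
            return start, t            # S6: temporary end point C
    return start, None


# Hypothetical per-frame power values.
a, c = detect_start_and_temporary_end(
    [0, 5, 9, 8, 2, 1, 4, 1, 0, 0, 0], start_threshold=3, first_threshold=3
)
```

With these made-up values, A is frame 1 and C is frame 4; the remaining steps then decide how far past C the segment should extend.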

[0021] In S7, a comparison with the second threshold is made: since the temporary end point C has been detected in S6, the speech power is next compared with the second threshold, as shown in FIG. 4(A) described later.

[0022] In S8, the end point is moved: the temporary end point C is moved to the point at which, according to the comparison in S7, the speech power fell below the second threshold after having exceeded it, for example point F in FIG. 4. This movement of the temporary end point is then repeated.

[0023] In S9, the speech is segmented: after the movement of the temporary end point C has been repeated in S8 and the final end point H has been obtained, the speech signal from the start point A to the end point H is extracted and used as the target of speech recognition.

[0024] As described above, the speech signal is converted into a digital speech signal, the speech power is calculated, and the start point A of this speech power is determined. The speech power is compared with the first threshold, and the point at which it falls below the first threshold is taken as the temporary end point C. Next, the speech power is compared with the second threshold, and the temporary end point C is repeatedly moved to each point at which the speech power falls below the second threshold after having exceeded it, which yields the final end point H. The speech signal in the interval from the detected start point A to the final point H is then extracted as the target of speech recognition. As a result, even if the signal level at the tail of the speech signal becomes small, that portion is not left out of the segment; the signal within the extent of the speech is extracted accurately, and the speech recognition rate can consequently be improved.

[0025] FIG. 3 is a flowchart of the operation of the main part of the present invention. It details the procedure for detecting the final end point H after the temporary end point C has been detected in S6 of FIG. 2.

[0026] In FIG. 3, the temporary end point of the speech is detected in S11. This corresponds to S6 of FIG. 2: the temporary end point C at which the speech power of FIG. 4(A) becomes equal to or less than the first threshold is detected.

[0027] In S12, a second threshold defined along the time axis is set: for example, as shown by the second threshold in FIG. 4, a second threshold whose value gradually increases with the time elapsed from the temporary end point C.

[0028] In S13, it is determined whether the period X has ended, that is, whether the preset period X has elapsed since the temporary end point C. If YES, processing ends (END). If NO, processing proceeds to S14.

[0029] In S14, it is determined whether the speech power value is below the second threshold. If YES, the end point of the speech is set in S15 (that is, the temporary end point C is moved to this point), and processing proceeds to S16. If NO, processing returns to S13.

[0030] In S16, it is determined whether the speech power value is equal to or greater than the second threshold, that is, whether the power, having fallen below the second threshold (YES in S14), has since risen to the second threshold or above. If YES, processing returns to S13. If NO, S17 determines whether the period X has ended; if YES, processing ends (END), and if NO, processing returns to S16.

[0031] As described above, moving the temporary end point C in S15 to the point at which the speech power value fell below the second threshold is repeated until the period X ends. In this way, the temporary end point C is moved sequentially to the points within the period X at which the power fell below the second threshold, and the last such position is determined as the end point H.
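The loop of FIG. 3 (S13 to S17) can be written as a small two-state scan over the frames that follow the temporary end point C. The sketch below is one interpretation under stated assumptions: power is given per frame, the period X is counted in frames and includes its last frame, and the second threshold is passed in as a function of the number of frames elapsed since C. The names `find_final_end_point`, `period_x`, and `second_threshold` are hypothetical.

```python
def find_final_end_point(power, c, period_x, second_threshold):
    """Return the final end point H as a frame index.

    power            -- per-frame speech power values
    c                -- index of the temporary end point C
    period_x         -- length of period X in frames (assumed unit)
    second_threshold -- callable: frames elapsed since C -> threshold value
    """
    end_point = c
    waiting_for_drop = True                      # True: in S14, False: in S16
    last = min(c + period_x, len(power) - 1)     # S13/S17: stop when X ends
    for t in range(c, last + 1):
        thr = second_threshold(t - c)
        if waiting_for_drop:
            if power[t] < thr:                   # S14 YES
                end_point = t                    # S15: move the end point here
                waiting_for_drop = False
        elif power[t] >= thr:                    # S16 YES: power rose again
            waiting_for_drop = True
    return end_point


# Hypothetical values: C at frame 4, threshold rising linearly from 1.5.
power = [0, 5, 9, 8, 2, 1, 4, 1, 0, 0, 0]
h = find_final_end_point(power, c=4, period_x=5,
                         second_threshold=lambda t: 1.5 + 0.5 * t)
```

With these made-up values the end point first moves to frame 5, the power then exceeds the threshold at frame 6, and the end point moves again to frame 7, which becomes H when the period X runs out.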

[0032] FIG. 4 is an explanatory diagram of the operation of the present invention. FIG. 4(A) shows the change in the speech power and how the temporary end point C and the end point H are derived. Here:
・The speech power is the power computed from the speech signal.

[0033] ・The first threshold is used to find the temporary end point C. The point at which the speech power falls below this first threshold is taken as the temporary end point C.
・The second threshold is used to move the temporary end point C sequentially, from C onward, and thereby derive the final end point H. Its value gradually increases with time, starting from the temporary end point C.

[0034] ・The start point A is the start point of the speech power.
・The temporary end point C is the point at which the speech power fell below the first threshold.
・The final end point H is the last position reached when, after the temporary end point C has been found, the temporary end point is moved sequentially, within the preset period X, to each point at which the speech power falls below the second threshold after having exceeded it.

[0035] Next, the procedure for finding the final end point H, given the speech power, first threshold, second threshold, and period X of FIG. 4(A), is described.
(1) The start point A at which the speech power began to increase is determined.

[0036] (2) The temporary end point C at which the speech power, having risen, falls below the first threshold is determined.
(3) From the temporary end point C onward, the temporary end point C is moved sequentially to each point at which the speech power falls below the second threshold after having exceeded it. This continues until the period X ends, and the last point reached is determined as the end point H.

[0037] (4) The speech signal within the range between the start point A found in (1) and the final end point H determined in (3) is extracted and used as the target of speech recognition.

FIG. 4(B) shows an example of the second threshold. Its value gradually increases from the temporary end point C within the period X; in this example the increment itself is made to grow gradually (whereas the second threshold in FIG. 4(A) increases linearly). Consequently, near the temporary end point C even a very small speech signal is taken into the segmentation range, which improves the speech recognition rate, while farther from the temporary end point C, noise and the like are kept from abnormally widening the segmentation range and lowering the speech recognition rate.
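The two threshold shapes mentioned for FIG. 4 can be sketched as functions of the time elapsed since the temporary end point C. The patent gives no formulas, so the linear form, the quadratic form (chosen only because its increment grows with time), and the parameter names `base`, `slope`, and `coeff` are illustrative assumptions.

```python
def linear_threshold(base, slope):
    """Second threshold that rises by a constant amount per time step."""
    def thr(t):
        return base + slope * t
    return thr


def accelerating_threshold(base, coeff):
    """Second threshold whose per-step increment itself grows with time."""
    def thr(t):
        return base + coeff * t * t
    return thr


lin = linear_threshold(1.0, 0.5)          # constant increment of 0.5
acc = accelerating_threshold(1.0, 0.25)   # increments 0.25, 0.75, 1.25, ...
```

Both functions start at the same level at C; the accelerating one stays lower at first, so weak speech just after C can still exceed it, and rises faster later, so distant noise is less likely to exceed it.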

[0038]

[Effects of the Invention] As described above, according to the present invention, after the temporary end point C at which the speech power falls below the first threshold has been found, the temporary end point is moved sequentially to each position at which the power falls below the second threshold after having exceeded it, so as to obtain the final end point H, and the speech signal is extracted over the range from the start point A to the end point H as the target of speech recognition. The end point H of the speech can therefore be detected accurately, the speech signal from the start point A to the end point H can be extracted for recognition, and the speech recognition rate can be improved. In addition, by setting the second threshold so that it gradually increases from the temporary end point C, the speech signal near the temporary end point C is captured, which improves the recognition rate, while noise and the like are increasingly excluded with distance from the temporary end point C, which prevents a drop in the recognition rate.

[Brief description of drawings]

FIG. 1 is a principle configuration diagram of the present invention.

FIG. 2 is an operation flowchart of the configuration of the embodiment of the present invention.

FIG. 3 is an operation flowchart of the main part of the present invention.

FIG. 4 is an explanatory diagram of the operation of the present invention.

FIG. 5 is an explanatory diagram of the prior art.

[Explanation of symbols]

1: Speech input
2: Start point detecting means
3: Temporary end point detecting means
4: End point moving means
5: Segmenting means
A: Start point
C: Temporary end point
H: Final end point
X: Period, starting from the temporary end point C, within which the speech signal is segmented

Claims

[Claims]

1. A speech segmentation device for extracting a speech signal from a signal, comprising: start point detecting means (2) for detecting, on the basis of speech power calculated from the speech signal, a start point A at which the speech power increases; temporary end point detecting means (3) for detecting a temporary end point C at which the speech power becomes equal to or less than a first threshold; and temporary end point moving means (4) for sequentially moving the temporary end point, after the temporary end point C has been detected by the temporary end point detecting means (3), to a position at which the speech power falls below a predetermined second threshold after having exceeded it, wherein, in response to the temporary end point moving means (4) determining the final temporary end point as an end point H, the speech signal from the start point A to the end point H is extracted and used as the target of speech recognition.

2. The speech segmentation device according to claim 1, wherein the second threshold is set so that its value gradually increases with the time elapsed from the temporary end point C.