JPH071433B2 - Voice start time detection method - Google Patents

Voice start time detection method

Info

Publication number
JPH071433B2
Authority
JP
Japan
Prior art keywords
voice
start time
sum
values
sampling
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Lifetime
Application number
JP61101806A
Other languages
Japanese (ja)
Other versions
JPS62258499A (en)
Inventor
博行 関根
潤一 瀧口
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nippon Mektron KK
Original Assignee
Nippon Mektron KK
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nippon Mektron KK
Priority to JP61101806A
Publication of JPS62258499A
Publication of JPH071433B2

Description

[Detailed Description of the Invention]
[Object of the Invention]
(Field of Industrial Application) The present invention relates to a voice start time detection method for detecting the point in time at which a voice begins.

(Prior Art) A voice recognition device performs spectral analysis on an input voice waveform and stores the result in a pattern memory as a voice input pattern. It compares this voice input pattern with standard patterns registered in a dictionary in advance, calculates the similarity to each, and outputs the standard pattern with the highest similarity as the recognition result.

Such voice recognition devices include word voice recognition devices, which recognize input speech in units of words, and monosyllabic voice recognition devices, which recognize it in units of single syllables. Both kinds must determine the point at which the voice starts. Conventionally, the device simply checked whether the level of the voice waveform exceeded a predetermined value and took the moment the level was exceeded as the voice start time.
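The conventional level check described above can be sketched as follows. This is only an illustration: the function name `naive_start_index`, the use of the sample magnitude as the "level", and the sample data are all assumptions, not details from the patent.

```python
def naive_start_index(samples, level):
    """Return the index of the first sample whose magnitude exceeds `level`.

    Returns None if no sample exceeds the level.
    """
    for i, s in enumerate(samples):
        if abs(s) > level:
            return i
    return None


# Hypothetical data: a single sharp spike at index 2, voice from index 5 on.
signal = [0, 1, 90, 0, 0, 40, 55, 60]
print(naive_start_index(signal, level=30))  # 2
```

Because only one sample is examined at a time, the isolated spike is reported as the voice start, which is the weakness the following section addresses.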

(Problems to be Solved by the Invention) However, the conventional method cannot reliably distinguish noise from voice. Sharp noise is especially hard to tell apart from consonants, so the voice start time is detected incorrectly and, as a result, the voice is misrecognized.

The present invention has been made in view of the above circumstances, and its object is to provide a voice start time detection method that can detect the voice start time accurately.

[Structure of the Invention]

(Means for Solving the Problems) To achieve the above object, the voice start time detection method according to the present invention takes a plurality of sampling values of the input voice waveform at each of a plurality of time points spaced at fixed intervals backward along the time axis from the current time, calculates for each time point a sampling value sum, which is the sum of the absolute values of the sampling values taken there, and, when all of these sampling value sums exceed a predetermined threshold value, determines the voice start time with reference to the current time.
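A minimal sketch of the check described above, under stated assumptions: the function name `detect_start` and its parameters (`spacing`, `count`, `width`) are hypothetical, samples are held in a plain Python list, and "exceeds" is read as a strict comparison.

```python
def detect_start(samples, now, spacing, count, width, threshold):
    """Return True if every window sum exceeds the threshold.

    samples   -- sequence of sampled values
    now       -- index of the current sample
    spacing   -- distance in samples between successive window starts
    count     -- number of windows looking back from `now`
    width     -- number of samples summed in each window
    threshold -- value every window sum must exceed
    """
    for k in range(1, count + 1):
        start = now - k * spacing
        if start < 0:
            return False  # not enough history recorded yet
        window = samples[start:start + width]
        if sum(abs(s) for s in window) <= threshold:
            return False  # one quiet window is enough to reject
    return True


# Hypothetical data: silence followed by a sustained oscillation.
samples = [0] * 10 + [5, -5] * 10
print(detect_start(samples, now=29, spacing=4, count=3, width=2, threshold=8))  # True
print(detect_start(samples, now=14, spacing=4, count=3, width=2, threshold=8))  # False
```

In the embodiment that follows, the corresponding values would be a spacing of one frame (128 samples) and three windows.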

(Embodiment) A method for cutting out the consonant of a monosyllabic voice, using the voice start time detection method according to one embodiment of the present invention, is described with reference to the flowchart of FIG. 1. In this embodiment, sampling is performed every 100 μsec, 128 sampling values Si form one frame, and voice recognition is performed in units of one frame.
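The timing implied by these figures can be worked out directly. The constant names below are chosen for illustration; only the 100 μsec interval and the 128 samples per frame come from the text.

```python
SAMPLE_PERIOD_US = 100    # sampling interval stated in the embodiment
SAMPLES_PER_FRAME = 128   # samples per frame stated in the embodiment

# One sample every 100 microseconds corresponds to 10,000 samples per second.
sample_rate_hz = 1_000_000 // SAMPLE_PERIOD_US

# One frame spans 128 sampling intervals.
frame_duration_ms = SAMPLE_PERIOD_US * SAMPLES_PER_FRAME / 1000

print(sample_rate_hz)     # 10000
print(frame_duration_ms)  # 12.8
```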

First, an interrupt occurs every 100 μsec, the voice waveform is sampled, and the sampling value Si is stored in memory (step 10).
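Step 10 can be pictured as appending each new sample to a bounded history. The patent does not say how the memory is organized; the `SampleStore` class, its method names, and the use of a fixed-length `collections.deque` are assumptions made for this sketch.

```python
from collections import deque

# Enough history to reach three frames (3 * 128 samples) behind the newest sample.
HISTORY = 3 * 128 + 1


class SampleStore:
    """Hypothetical bounded store for the most recent sampling values."""

    def __init__(self, size=HISTORY):
        self._buf = deque(maxlen=size)  # oldest samples drop off automatically

    def on_sample(self, value):
        """Record one sampling value; called once per sampling interrupt."""
        self._buf.append(value)

    def at(self, n):
        """Return the sample taken n intervals before the newest one."""
        return self._buf[-1 - n]

    def __len__(self):
        return len(self._buf)


store = SampleStore(size=4)
for v in [1, 2, 3, 4, 5]:
    store.on_sample(v)
print(store.at(0), store.at(3), len(store))  # 5 2 4
```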

Next, the calculation for determining the voice start time is performed (step 11). Let S_0 be the sampling value at the time of the interrupt and S_{-n} the sampling value 100 × n μsec earlier. The sampling value sums R_1, R_2, and R_3 given by the following equation are then obtained.
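The equation itself does not survive in this text. One form consistent with the m = 8 example given in the next paragraph is shown here as a reconstruction under that assumption, not as the patent's exact notation:

```latex
R_k = \sum_{i=0}^{m} \left| S_{-128k + i} \right|, \qquad k = 1, 2, 3
```

With m = 8 this gives R_1 as the sum over S_{-128} to S_{-120}, R_2 over S_{-256} to S_{-248}, and R_3 over S_{-384} to S_{-376}. The inclusive upper limit is inferred from those index ranges.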

For example, with m = 8, as shown in FIG. 2, the sampling value sum R_1 is the sum of the absolute values of the sampling values S_{-128} to S_{-120} one frame earlier, R_2 is the sum of the absolute values of S_{-256} to S_{-248} two frames earlier, and R_3 is the sum of the absolute values of S_{-384} to S_{-376} three frames earlier.
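The three sums for m = 8 can be computed as in the sketch below. It assumes the history is a list whose last element is the current sample, and that each range is inclusive at both ends, as the index ranges in the text suggest (nine samples per sum). The name `window_sum` and the test data are hypothetical.

```python
FRAME = 128  # samples per frame in the embodiment


def window_sum(history, k, m=8):
    """Sum the absolute values of the m + 1 samples starting k frames back.

    history[-1] is taken to be the current sample; the window starts
    FRAME * k samples before it.
    """
    newest = len(history) - 1
    first = newest - FRAME * k
    if first < 0:
        raise ValueError("history is shorter than k frames")
    return sum(abs(s) for s in history[first:first + m + 1])


# Hypothetical data: an alternating signal of amplitude 2, 385 samples long.
history = [(-1) ** i * 2 for i in range(385)]
print([window_sum(history, k) for k in (1, 2, 3)])  # [18, 18, 18]
```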

Next, it is determined whether all of the sampling value sums R_1, R_2, and R_3 are larger than a predetermined threshold value R_TH (step 12). If even one of the sampling value sums R_1, R_2, R_3 is smaller than the threshold value R_TH, the input is judged to be noise and the process returns to step 10. If all of the sampling value sums are larger than R_TH, the input is judged to be actual voice, and the voice start time is determined in step 13. In this embodiment, the voice start time is the point three frames before the current time, that is, the sampling value S_{-384}.
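Steps 12 and 13 amount to an all-or-nothing comparison followed by a fixed look-back. In this sketch the function name `decide_start`, its parameters, and the numeric values are illustrative; a sum exactly equal to the threshold is treated as noise, which is an assumption because the text does not cover that case.

```python
def decide_start(r_sums, r_th, frames_back=3, frame=128):
    """Return how many samples back the voice starts, or None for noise.

    r_sums      -- the sampling value sums (three in the embodiment)
    r_th        -- the threshold every sum must exceed
    frames_back -- how many frames before the current time the start is placed
    frame       -- samples per frame
    """
    if all(r > r_th for r in r_sums):
        return frames_back * frame
    return None


print(decide_start([900, 850, 700], r_th=500))  # 384
print(decide_start([900, 20, 700], r_th=500))   # None
```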

Next, a predetermined number of frames, for example 8 frames, counted from this voice start time are cut out as the consonant portion (step 14).
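Step 14 can be sketched as a simple slice. The sketch assumes the frames are counted forward from the start point and that the samples are available as a list; `cut_consonant` and the sample data are hypothetical.

```python
def cut_consonant(samples, start, frames=8, frame=128):
    """Return `frames` frames of samples beginning at index `start`.

    Returns None if fewer than frames * frame samples are available.
    """
    end = start + frames * frame
    if end > len(samples):
        return None  # not enough samples recorded yet
    return samples[start:end]


samples = list(range(2000))
part = cut_consonant(samples, start=100)
print(len(part), part[0], part[-1])  # 1024 100 1123
```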

Thus, according to this embodiment, whether the input is voice is judged over a relatively long period of 38.4 msec, so sharp noise with a large amplitude can be distinguished from voice.
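The 38.4 msec figure follows from three frames of 128 samples at 100 μsec each. The sketch below checks that arithmetic and shows, on made-up data, why an isolated spike fails the test: it raises only one of the three sums. The helper `three_sums`, the inclusive nine-sample window, and the threshold of 100 are assumptions.

```python
FRAME = 128             # samples per frame
SAMPLE_PERIOD_US = 100  # sampling interval in microseconds

span_ms = 3 * FRAME * SAMPLE_PERIOD_US / 1000
print(span_ms)  # 38.4


def three_sums(history, m=8):
    """Return the sums of |sample| over m + 1 samples at 1, 2 and 3 frames back."""
    newest = len(history) - 1
    sums = []
    for k in (1, 2, 3):
        first = newest - FRAME * k
        sums.append(sum(abs(s) for s in history[first:first + m + 1]))
    return sums


# Hypothetical history: silence with one large spike exactly one frame back.
spike = [0] * 385
spike[384 - 128] = 1000
r = three_sums(spike)
print(r, all(x > 100 for x in r))  # [1000, 0, 0] False
```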

The present invention is not limited to the above embodiment, and various modifications are possible. For example, any number of sampling values may be used to calculate each sampling value sum. The number of frames used to judge whether the voice has started is not limited to three and may be any number. Likewise, the voice start time need not be placed three frames before the current time; any number of frames may be used.

Furthermore, although the above embodiment applies the invention to cutting out the consonant of a monosyllabic voice, it goes without saying that the invention can also be applied to detecting the start time of word voice or continuous voice.

[Effect of the Invention]

As described above, according to the present invention, the voice start time can be detected accurately.

[Brief Description of the Drawings]

FIG. 1 is a flowchart of a voice start time detection method according to an embodiment of the present invention, and FIG. 2 is a waveform diagram showing an input voice waveform.

Claims (1)

[Claims]
1. A voice start time detection method characterized in that a plurality of sampling values of an input voice waveform are taken at each of a plurality of time points spaced at fixed intervals backward along the time axis from the current time, a sampling value sum, which is the sum of the absolute values of the sampling values taken, is calculated for each time point, and, when all of these sampling value sums exceed a predetermined threshold value, the voice start time is determined with reference to the current time.
JP61101806A 1986-05-01 1986-05-01 Voice start time detection method Expired - Lifetime JPH071433B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP61101806A JPH071433B2 (en) 1986-05-01 1986-05-01 Voice start time detection method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP61101806A JPH071433B2 (en) 1986-05-01 1986-05-01 Voice start time detection method

Publications (2)

Publication Number Publication Date
JPS62258499A JPS62258499A (en) 1987-11-10
JPH071433B2 JPH071433B2 (en) 1995-01-11

Family

ID=14310377

Family Applications (1)

Application Number Title Priority Date Filing Date
JP61101806A Expired - Lifetime JPH071433B2 (en) 1986-05-01 1986-05-01 Voice start time detection method

Country Status (1)

Country Link
JP (1) JPH071433B2 (en)

Also Published As

Publication number Publication date
JPS62258499A (en) 1987-11-10

Similar Documents

Publication Publication Date Title
US4769844A (en) Voice recognition system having a check scheme for registration of reference data
JP2829014B2 (en) Speech recognition device and method
JPH071433B2 (en) Voice start time detection method
EP0109140B1 (en) Recognition of continuous speech
JPH01159697A (en) Voice recognition apparatus
JP3091537B2 (en) How to create voice patterns
JP3360978B2 (en) Voice recognition device
JP3031081B2 (en) Voice recognition device
JP4604424B2 (en) Speech recognition apparatus and method, and program
JP2844592B2 (en) Discrete word speech recognition device
JPH0651792A (en) Speech recognizing device
JP2679039B2 (en) Vowel cutting device
JP3484559B2 (en) Voice recognition device and voice recognition method
JP3474949B2 (en) Voice recognition device
JPH0643893A (en) Voice recognition method
JP2892004B2 (en) Word speech recognition device
JPH0792679B2 (en) Monosyllabic speech recognizer
JPS61292199A (en) Voice recognition equipment
JPS61260299A (en) Voice recognition equipment
WO1987003127A1 (en) System and method for sound recognition with feature selection synchronized to voice pitch
JP2744622B2 (en) Plosive consonant identification method
JPH09138695A (en) Voice recognition device
JPS62115498A (en) Voiceless plosive consonant identification system
JPH0693196B2 (en) Pitch extractor
JPH05165491A (en) Voice recognizing device