JPS63163495A

JPS63163495A - Voice section detector

Info

Publication number: JPS63163495A
Application number: JP61312193A
Authority: JP
Inventors: 北野　正明
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1986-12-26
Filing date: 1986-12-26
Publication date: 1988-07-06
Anticipated expiration: 2010-11-01
Also published as: JPH07101354B2

Abstract

(57) [Abstract] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

Detailed Description of the Invention

Industrial Application Field

The present invention relates to a speech section detection device used in speech recognition devices and the like.

Prior Art

In recent years, advances in speech recognition technology have given rise to various devices that use speech recognition. One of the problems in recognition technology, however, is accurately extracting speech sections from the input signal, and various speech section detection devices have been developed (for example, Japanese Patent Application Laid-open No. 6゜-52900).

A conventional speech section detection device is described below. FIG. 2 is a block diagram of a conventional speech section detection device. In the figure, 1 is a speech input terminal; 8 is a BPF (band-pass filter) that extracts the features of the speech signal and cuts noise; 3 is an energy level comparator that determines the speech section by comparing the energy level of the signal passed through the BPF 8 with a preset threshold; 2 is a speech section output terminal; and 9 is a speech recognition matching unit that produces a recognition result from the speech signal output from the speech section output terminal 2.

The operation of the conventional speech section detection device configured as above is described below.

First, a speech signal is input from the speech input terminal 1 and passes through the BPF 8, whose band is set in advance.

The band of the BPF 8 is set to a band that represents the features of speech well and in which noise does not appear. In the energy level comparator 3, the energy of the speech signal that has passed through the BPF 8 is compared with a preset threshold; a section in which the energy is greater than the threshold is regarded as a speech section and is output to the speech section output terminal 2. The speech recognition matching unit compares the speech signal output from the speech section output terminal 2 with prestored standard speech patterns and outputs a recognition result.
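
The energy comparison described above can be sketched in Python as follows. This is a minimal illustration, not the circuit in the publication: the band-pass filter is omitted, and the function names, the frame length, the mean-square energy measure, and the threshold value are all assumptions made for the example.

```python
def frame_energies(samples, frame_len):
    """Mean squared amplitude of each full, non-overlapping frame (assumed energy measure)."""
    n_frames = len(samples) // frame_len
    return [
        sum(x * x for x in samples[i * frame_len:(i + 1) * frame_len]) / frame_len
        for i in range(n_frames)
    ]


def sections_above(values, threshold):
    """Return (start, end) frame-index pairs, end exclusive, where value > threshold."""
    sections, start = [], None
    for i, v in enumerate(values):
        if v > threshold and start is None:
            start = i                      # a section opens
        elif v <= threshold and start is not None:
            sections.append((start, i))    # the section closes
            start = None
    if start is not None:
        sections.append((start, len(values)))
    return sections


# Hypothetical input: silence, a loud burst, silence.
signal = [0.0] * 8 + [1.0, -1.0] * 4 + [0.0] * 8
energies = frame_energies(signal, frame_len=4)
speech = sections_above(energies, threshold=0.5)
```

With this input only the two middle frames exceed the threshold, so a single section covering those frames is reported. Any loud random noise would be reported in exactly the same way, which is the weakness the following section addresses.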

Problems to Be Solved by the Invention

However, in the conventional configuration described above, speech sections were detected from the energy in a frequency band that excludes the noise frequency band, so random noise (noise is random noise in many cases) was mistakenly detected as speech. It was also difficult to set a band that carries the features of speech.

The present invention solves the above conventional problems, and its object is to provide a speech section detection device that can accurately separate speech sections from noise sections.

Means for Solving the Problems

To achieve this object, the speech section detection device of the present invention comprises an energy level comparator that compares the energy of the speech input with a preset threshold and detects a provisional speech section, and a noise detector that detects whether noise is included in the provisional speech section detected by the energy level comparator and, if noise is included, removes the noise section.

The noise detector comprises: a pitch randomness calculation means that calculates the randomness, in the time direction, of the pitch frequency of the speech input; a noise determination means that compares the pitch randomness calculated by the pitch randomness calculation means with a preset threshold and determines a section to be noise when the state in which the pitch randomness is greater than the threshold lasts for at least a preset threshold time and the speech input passed through a low-pass filter is equal to or greater than a preset threshold; and a noise section removal means that removes the noise section detected by the noise determination means from the provisional speech section.

Operation

With the above configuration, the present invention first detects a provisional speech section in the energy level comparator by comparing the energy of the speech input with a preset threshold. Next, the pitch randomness calculation means of the noise detector calculates the randomness, in the time direction, of the pitch frequency of the speech input. The noise determination means of the noise detector determines a section to be noise when the state in which this pitch randomness is greater than a preset threshold continues for at least a preset threshold time and the energy below a preset frequency is greater than a preset threshold. The noise section removal means of the noise detector then removes the noise section from the provisional speech section and outputs the result, so that the speech section can be extracted accurately.
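
The noise determination rule can be sketched as below. The text does not say whether the low-band energy test is applied per frame or over the whole run, so this sketch assumes a frame-by-frame test inside each qualifying run; the function name, argument names, and all numeric values are hypothetical.

```python
def noise_flags(randomness, low_energy, rand_thresh, min_frames, energy_thresh):
    """Flag frames as noise.

    A frame is flagged when it lies in a run of at least `min_frames`
    consecutive frames whose pitch randomness exceeds `rand_thresh`, and its
    low-band energy exceeds `energy_thresh` (per-frame test is an assumption).
    """
    flags = [False] * len(randomness)
    i = 0
    while i < len(randomness):
        if randomness[i] > rand_thresh:
            j = i
            # Extend the run while randomness stays above the threshold.
            while j < len(randomness) and randomness[j] > rand_thresh:
                j += 1
            if j - i >= min_frames:        # duration condition
                for k in range(i, j):      # low-band energy condition
                    flags[k] = low_energy[k] > energy_thresh
            i = j
        else:
            i += 1
    return flags


# Hypothetical per-frame values.
randomness = [1, 9, 9, 9, 1, 9, 1]
low_energy = [5, 5, 5, 0, 5, 5, 5]
flags = noise_flags(randomness, low_energy,
                    rand_thresh=5, min_frames=3, energy_thresh=1)
```

In this example the three-frame run qualifies by duration, but its last frame fails the energy test, and the isolated high-randomness frame is too short to count.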

Embodiment

An embodiment of the present invention is described below with reference to the drawings.

FIG. 1 is a block diagram of a speech section detection device according to an embodiment of the present invention. In FIG. 1, 1 is a speech input terminal; 3 is an energy level comparator that detects a provisional speech section by comparing the energy level of the speech input from the speech input terminal 1 with a preset threshold; 4 is an LPF that passes only the components of the speech input at 400 Hz or below; 5 is a pitch detector that detects the pitch frequency of the speech input; 71 is a pitch randomness calculation means that calculates the degree of randomness of the pitch frequency detected by the pitch detector 5; 72 is a noise determination means that, for the provisional speech section detected by the energy level comparator 3, evaluates the pitch randomness calculated by the pitch randomness calculation means 71 and the energy of the speech input passed through the LPF 4 to decide between noise and speech; and 73 is a noise section removal means that removes the section determined to be noise by the noise determination means 72. 2 is a speech section output terminal and 9 is a speech recognition matching unit; these are the same as in the conventional example. The noise detector 7 is composed of the pitch randomness calculation means 71, the noise determination means 72, and the noise section removal means 73.

The operation of the speech section detection device of this embodiment, configured as above, is described below.

First, when a speech input arrives at the speech input terminal 1, the energy level comparator 3 compares the energy of the speech input with a preset threshold and regards a section in which the energy is greater than the threshold as a provisional speech section. The LPF 4 passes only the components of the speech input at 400 Hz or below. Meanwhile, the pitch detector 5 calculates the pitch frequency, and the pitch randomness calculation means 71 calculates the randomness of that pitch frequency.
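
The publication does not say how the pitch detector 5 works. As one common technique swapped in purely for illustration, the sketch below estimates the pitch of a frame by picking the autocorrelation peak. The function name, the 50 to 500 Hz search range, the 8 kHz sample rate, and the test tone are all assumptions, and the example frame is longer than the 12 msec frame mentioned in the embodiment so that it holds several pitch periods.

```python
import math


def estimate_pitch_hz(frame, sample_rate, fmin=50.0, fmax=500.0):
    """Estimate pitch from the autocorrelation peak (assumed method, not from the patent)."""
    lag_min = int(sample_rate / fmax)                       # shortest period searched
    lag_max = min(int(sample_rate / fmin), len(frame) - 1)  # longest period searched
    best_lag, best_val = 0, 0.0
    for lag in range(lag_min, lag_max + 1):
        # Unnormalised autocorrelation at this lag.
        val = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if val > best_val:
            best_lag, best_val = lag, val
    # Return 0.0 when no positive correlation peak was found.
    return sample_rate / best_lag if best_lag else 0.0


# Hypothetical test signal: a pure 100 Hz tone sampled at 8 kHz.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 100 * n / sample_rate) for n in range(400)]
pitch = estimate_pitch_hz(tone, sample_rate)
```

For a periodic input such as this tone the estimate is stable from frame to frame, whereas for random noise the peak position jumps around, which is the property the randomness measure relies on.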

One embodiment of the pitch randomness calculation means 71 is as follows. The pitch detector 5 detects a pitch frequency value for each frame (for example, every 12 msec); the differences between the values of adjacent frames are taken, and the 5-point median of these differences is used as the pitch randomness of the frame. Next, when the state in which the pitch randomness is equal to or greater than a predetermined threshold continues for at least a predetermined number of frames, and the energy of the speech input passed through the LPF 4 is equal to or greater than a predetermined threshold, the noise determination means 72 regards the section of large pitch randomness as a noise section. The noise section removal means 73 then removes the noise section from the provisional speech section detected by the energy level comparator 3 and outputs the result to the speech section output terminal 2 as the input to the speech recognition matching unit 9.
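
The randomness measure and the removal step can be sketched as follows. The text specifies a 5-point median of inter-frame pitch differences but not the details, so the use of absolute differences, the centring of the window on the frame, the shortened window at the edges, and all names and numbers below are assumptions.

```python
from statistics import median


def pitch_randomness(pitch_hz, window=5):
    """Median of absolute frame-to-frame pitch differences in a centred window."""
    # The first frame has no predecessor, so its difference is taken as 0.
    diffs = [0.0] + [abs(b - a) for a, b in zip(pitch_hz, pitch_hz[1:])]
    half = window // 2
    out = []
    for i in range(len(diffs)):
        lo, hi = max(0, i - half), min(len(diffs), i + half + 1)
        out.append(median(diffs[lo:hi]))   # window shrinks at the edges
    return out


def remove_noise(section, noise):
    """Split a (start, end) frame section, end exclusive, keeping only non-noise frames."""
    start, end = section
    kept, run_start = [], None
    for i in range(start, end):
        if not noise[i] and run_start is None:
            run_start = i
        elif noise[i] and run_start is not None:
            kept.append((run_start, i))
            run_start = None
    if run_start is not None:
        kept.append((run_start, end))
    return kept


# Hypothetical pitch tracks: a steady voice and an erratic, noise-like track.
steady = pitch_randomness([120.0] * 7)
jumpy = pitch_randomness([100, 300, 80, 350, 90, 310, 70])

# Hypothetical provisional section (frames 2..9) with frames 5..7 flagged as noise.
kept = remove_noise((2, 10), [False] * 5 + [True] * 3 + [False] * 2)
```

The steady track yields zero randomness in every frame while the erratic track yields large values, and the provisional section is split into the parts on either side of the flagged frames. Using a median rather than a mean keeps a single pitch-tracking error from raising the measure on its own.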

As described above, according to this embodiment, the noise detector 7 detects noise contained in the provisional speech section detected by the energy level comparator 3 and, if noise is present, removes the noise section, so that an accurate speech section can be detected. Furthermore, because the noise detector 7 includes the noise determination means 72, which detects noise from the pitch randomness, its duration, and the energy of the speech signal passing through the LPF, noise can be detected with high accuracy. In addition, the LPF 4 and the pitch detector 5 are also used for extracting feature patterns for speech recognition, so they need not be added specially for the speech section detection device of the present invention; the device is therefore also advantageous in terms of size, cost, and processing speed.

Effects of the Invention

By providing a noise detector, the present invention detects noise contained in the speech section detected by the energy comparator and, when noise is present, removes the noise section. Furthermore, because the noise detector detects noise from the pitch randomness, its duration, and the energy of the speech signal passing through the LPF, noise can be detected with high accuracy.

[Brief Description of the Drawings]

FIG. 1 is a block diagram of a speech section detection device according to an embodiment of the present invention, and FIG. 2 is a block diagram of a conventional speech section detection device.

1: speech input terminal; 2: speech section output terminal; 3: energy level comparator; 4: LPF; 5: pitch detector; 7: noise detector; 71: pitch randomness calculation means; 72: noise determination means; 73: noise section removal means; 9: speech recognition matching unit; 8: BPF.

Claims

1. A speech section detection device comprising: an energy level comparator that detects a provisional speech section by comparing the energy of a speech input with a preset threshold; and a noise detector comprising three means: a pitch randomness calculation means that calculates the randomness, in the time direction, of the pitch frequency of the speech input; a noise determination means that compares the pitch randomness calculated by the pitch randomness calculation means with a preset threshold and determines a section to be noise when the state in which the pitch randomness is greater than the threshold lasts for at least a preset threshold time and the speech input passed through a low-pass filter is equal to or greater than a preset threshold; and a noise section removal means that removes the noise section detected by the noise determination means from the provisional speech section.