JP2835483B2

JP2835483B2 - Voice discrimination device and sound reproduction device

Info

Publication number: JP2835483B2
Application number: JP5151664A
Authority: JP
Inventors: 武志則松; 良久中藤
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1993-06-23
Filing date: 1993-06-23
Publication date: 1998-12-14
Anticipated expiration: 2013-12-14
Also published as: JPH0713586A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Industrial Field of Application] The present invention relates to a voice discrimination device and to a sound reproduction device that uses it. The voice discrimination device serves as a pre-processing device for video equipment, audio equipment, and the like, and automatically determines whether a continuously input acoustic signal is speech.

[0002]

[Prior Art] In recent years, stereo systems, television receivers (hereinafter, televisions), and similar equipment have been fitted with sound-effect functions such as "surround". These functions work very well for sources such as music, but they reduce clarity for speech-centered sources such as news programs. If a device could determine automatically whether a source is mainly speech or something else, it could control the sound field and frequency characteristics optimally according to the result.

[0003] Conventional voice discrimination devices exploit the fact that the input signal is a stereo signal. For a source such as music, the left-channel (hereinafter, L channel) and right-channel (hereinafter, R channel) signals are independent of each other, so the correlation between the two channels is low. Conversely, a speech-centered source such as a news program is localized at the center, and the left signal (hereinafter, L signal) and right signal (hereinafter, R signal) are nearly identical, so the correlation between the two channels is high. The conventional device therefore computes the difference between the amplitudes of the L and R signals and classifies the input as speech when the difference is small and as non-speech when it is large. Alternatively, it can compute the correlation between the L and R signals and classify the input as speech when the correlation is large and as non-speech when it is small.
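A minimal sketch of the correlation-based prior-art rule described above, assuming a zero-lag normalized correlation over a block of samples. The function names and the 0.9 threshold are hypothetical placeholders, not taken from the patent. Because a monaural source has identical L and R signals, its correlation is always 1.0, which illustrates why this approach cannot separate speech from music on mono material.

```python
import math


def lr_correlation(left, right):
    """Zero-lag normalized correlation between the L and R channels (range -1..1)."""
    n = len(left)
    if n == 0 or n != len(right):
        raise ValueError("left and right must be non-empty and of equal length")
    mean_l = sum(left) / n
    mean_r = sum(right) / n
    cov = sum((l - mean_l) * (r - mean_r) for l, r in zip(left, right))
    var_l = sum((l - mean_l) ** 2 for l in left)
    var_r = sum((r - mean_r) ** 2 for r in right)
    if var_l == 0 or var_r == 0:
        return 0.0  # a constant channel carries no correlation information
    return cov / math.sqrt(var_l * var_r)


def conventional_is_speech(left, right, threshold=0.9):
    """Prior-art style rule: high L/R correlation -> center-panned speech.

    The threshold is a hypothetical placeholder, not a value from the patent.
    """
    return lr_correlation(left, right) > threshold
```

Any mono input, whether speech or music, passes `conventional_is_speech`, because `lr_correlation(x, x)` is 1.0 for every non-constant `x`.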

[0004]

[Problems to Be Solved by the Invention] Such conventional voice discrimination devices work for stereo sources, but they cannot discriminate monaural sources, in which the L and R signals do not differ.

[0005] The present invention solves the above problem. Its objects are to provide a voice discrimination device that can accurately determine whether a signal is speech for both monaural and stereo signals, and a sound reproduction device that uses this voice discrimination device to control acoustic characteristics automatically according to the source.

[0006]

[Means for Solving the Problems] The invention according to claim 1 is a voice discrimination device comprising: a power calculation unit that calculates the acoustic power of an acoustic signal for each frame of fixed duration; a voiced/silent determination unit that compares the calculated acoustic power with a preset threshold to determine whether the frame is voiced or silent; a zero-crossing calculation unit that calculates, for each frame, the number of zero crossings of the acoustic signal waveform; a consonant determination unit that compares the calculated number of zero crossings with a preset threshold to determine the consonant-likeness of the frame; a stationarity determination unit that detects the maximum and minimum power over a predetermined number of consecutive frames and calculates the difference between them; and a speech determination unit. The speech determination unit determines the acoustic signal in those frames to be speech when the proportion of frames determined to be silent, the proportion of frames determined to be highly consonant-like, and the difference value all exceed their respective preset thresholds. When the signal is not determined to be speech, it determines the signal to be non-speech if the proportion of silent frames and the difference value are each smaller than thresholds preset lower than those thresholds, and undefined otherwise. It outputs a determination result for every such group of frames. The invention according to claim 2 is a sound reproduction device comprising the voice discrimination device according to claim 1, which receives an acoustic signal and determines whether it is speech or non-speech, and a frequency characteristic control unit that receives the acoustic signal and the speech/non-speech result output by the voice discrimination device at predetermined intervals, and changes the frequency characteristic of the acoustic signal step by step toward the optimal frequency characteristic according to that result.

[0007]

[Operation] In the invention according to claim 1, the power calculation unit calculates the signal power of each frame of the acoustic signal, and whether the frame is voiced or silent is determined from the magnitude of that power. The zero-crossing calculation unit counts the zero crossings in each frame, and the consonant determination unit judges the consonant-likeness of the frame from the size of that count. The stationarity determination unit calculates the difference between the maximum and minimum power over a group of consecutive frames. Over that group of frames, the speech determination unit determines the acoustic signal to be speech when the proportion of silent frames, the proportion of consonant-like frames, and the power difference each exceed their respective thresholds. When the signal cannot be determined to be speech, it is determined to be non-speech if the proportion of silent frames and the power difference are each smaller than thresholds set lower than the corresponding speech thresholds; otherwise it is determined to be undefined.

[0008] In the invention according to claim 2, the voice discrimination unit determines whether the acoustic signal is speech, and the frequency characteristic control unit, based on that result, switches the frequency characteristic of the input acoustic signal step by step to one suited to that signal and outputs the result.

[0009]

[Embodiments] (Embodiment 1) An embodiment of the voice discrimination device of the present invention is described below with reference to the drawings.

[0010] Fig. 1 is a block diagram showing the configuration of this embodiment. In the figure, 1 is a power calculation unit that calculates the power of the input signal; 2 is a zero-crossing calculation unit that counts the zero crossings of the waveform in each frame; 3 is a voiced/silent determination unit that determines whether the input signal in a frame is voiced or silent by comparing the calculated power with a threshold; 4 is a consonant determination unit that determines whether a frame is consonant-like from its zero-crossing count; 5 is a stationarity determination unit that judges stationarity from the difference between the maximum and minimum power over a fixed group of frames; and 6 is a speech determination unit that decides, for each group of frames, whether the signal is speech or non-speech from three quantities: the proportion of frames in the group determined to be silent, the difference between the maximum and minimum power in the group, and the proportion of frames in the group whose zero-crossing count is at or above a given number.

[0011] The interrelationship and operation of these components are described next. Here the input is a stereo signal from equipment such as audio equipment or a television. The L and R signals of the stereo input are mixed and fed to the power calculation unit 1 as an (L+R) signal. For each frame at a fixed time interval, the power calculation unit 1 calculates the cumulative or average amplitude over that interval as the power of the frame. For each frame, the zero-crossing calculation unit 2 counts the number of times the input waveform crosses zero amplitude; this count is the zero-crossing count Z0. In speech, the zero-crossing count is especially large for unvoiced fricative consonants. The consonant determination unit 4 judges a frame to be highly consonant-like if its zero-crossing count Z0 from the zero-crossing calculation unit 2 satisfies Z0 > Zt, where Zt is a threshold preset for judging consonant-likeness. Experiments show that about 40 is a suitable value of Zt for a sampling frequency of 10 kHz and a frame length of 20 ms. The number of frames judged highly consonant-like is accumulated over each fixed group of frames; this accumulated value is denoted NZ.
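The per-frame processing of paragraph [0011] can be sketched as below, assuming non-overlapping frames of 200 samples (20 ms at 10 kHz), mean absolute amplitude as the frame power, and a sample value of exactly 0 counted as non-negative when detecting sign changes. These choices and the function names are illustrative assumptions; the patent allows either a cumulative or an average amplitude and does not specify how frames are cut or how zero-valued samples are treated.

```python
def mix_to_mono(left, right):
    """(L + R) mix that is fed to the per-frame analysis."""
    return [l + r for l, r in zip(left, right)]


def split_frames(signal, frame_len=200):
    """Non-overlapping frames; a trailing partial frame is dropped (assumption)."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, frame_len)]


def frame_power(frame):
    """Mean absolute amplitude of the frame, used as its power value P."""
    return sum(abs(x) for x in frame) / len(frame)


def zero_crossings(frame):
    """Z0: number of sign changes between consecutive samples (0 counted as non-negative)."""
    return sum(1 for prev, cur in zip(frame, frame[1:]) if (prev >= 0) != (cur >= 0))


def is_consonant_like(frame, zt=40):
    """A frame is highly consonant-like if Z0 > Zt (Zt = 40 for 10 kHz, 20 ms frames)."""
    return zero_crossings(frame) > zt
```

A low-frequency square wave stays below Zt, while a rapidly alternating, noise-like frame (as in an unvoiced fricative) exceeds it.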

[0012] The voiced/silent determination unit 3 uses the power value from the power calculation unit 1 to decide, for each frame, whether it is voiced or silent. Letting P be the power of the current frame and Pt the voiced/silent threshold, a frame is determined to be silent when P < Pt, and the number of frames determined to be silent is accumulated over each fixed group of frames. This accumulated value is denoted NP.

[0013] The threshold Pt is a preset value here, but it may instead be determined adaptively according to changes in the input level. All of the processing described so far is performed frame by frame.

[0014] The following processing treats a group of F frames as one unit. The processing interval F should be the smallest unit over which the features of speech can be confirmed. In practice, for continuous speech, it may be set so that the interval contains two or three syllables on average (for example, between 1 and 2 seconds). A larger F allows speech-likeness to be detected more accurately but lengthens the time needed for each determination, so F is chosen as a trade-off between the two.
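As a worked example, assuming the 20 ms frame length of the experiment in paragraph [0011] (the patent does not give F as a frame count), the 1 to 2 second interval corresponds to:

$$F = \frac{T_{\mathrm{block}}}{T_{\mathrm{frame}}}, \qquad \frac{1\ \mathrm{s}}{20\ \mathrm{ms}} = 50 \;\le\; F \;\le\; \frac{2\ \mathrm{s}}{20\ \mathrm{ms}} = 100$$

At a 10 kHz sampling frequency each 20 ms frame holds 200 samples, so one F-frame block spans 10,000 to 20,000 samples.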

[0015] From the accumulated number NZ of frames judged highly consonant-like and the accumulated number NP of frames judged silent within the F-frame interval, the proportion of highly consonant-like frames in the interval is NZ/F and the proportion of silent frames is NP/F.
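The two proportions of paragraph [0015] can be computed from per-frame flags as in this sketch. Representing each frame's result as a boolean list is an illustrative assumption; the patent describes running counters instead.

```python
def existence_ratios(consonant_flags, silent_flags):
    """Return (NZ / F, NP / F) for one block of F frames.

    consonant_flags[i] is True when frame i satisfied Z0 > Zt;
    silent_flags[i] is True when frame i satisfied P < Pt.
    """
    f = len(consonant_flags)
    if f == 0 or f != len(silent_flags):
        raise ValueError("flags must describe the same non-empty block of F frames")
    n_z = sum(consonant_flags)  # True counts as 1
    n_p = sum(silent_flags)
    return n_z / f, n_p / f
```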

[0016] For every F frames, the stationarity determination unit 5 detects the maximum and minimum power within that interval and calculates their difference Pd. Continuous speech is a repetition of vowels, consonants, and silences, so over an interval of this length (here, F frames) its power varies widely and Pd is large. The magnitude of Pd is therefore used as one criterion of speech-likeness.
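The stationarity measure of paragraph [0016] is simply the range of the frame powers within one block. The sample power contours in the usage note below are made-up illustrations, not measured data.

```python
def power_range(powers):
    """Pd: maximum minus minimum frame power over one F-frame block."""
    if not powers:
        raise ValueError("empty block")
    return max(powers) - min(powers)
```

For a speech-like contour such as `[0.01, 0.8, 0.3, 0.02, 0.9]` (vowel peaks separated by near-silence), Pd is about 0.89, while a steady music-like contour such as `[0.5, 0.55, 0.52, 0.48]` gives Pd of about 0.07.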

[0017] Using NP, NZ, and Pd obtained by the voiced/silent determination unit 3, the consonant determination unit 4, and the stationarity determination unit 5 respectively, the speech determination unit 6 determines the signal to be speech when the conditions on the proportion of silent frames, the proportion of highly consonant-like frames, and the power difference are all satisfied, that is, when all of the following inequalities hold.

[0018]

$$a < \frac{N_Z}{F} < b, \qquad \frac{N_P}{F} > c, \qquad P_d > P_{dtv}$$

Here a, b, c, and Pdtv are the thresholds for the respective parameters of the speech determination, and their optimal values are determined experimentally: a and b are the lower and upper thresholds on the proportion of highly consonant-like frames, c is the threshold on the proportion of silent frames, and Pdtv is the threshold on the degree of power variation. With this processing, when silent and consonant intervals each occupy at least a certain share of the F frames and the power variation is large, the source is judged likely to be speech and is determined to be speech. For the non-speech determination, consider in particular the case where non-speech is restricted to music, which has almost no silent intervals and little power variation (it is stationary). The signal is determined to be non-speech (music) only when both of the following hold:

$$\frac{N_P}{F} < d, \qquad P_d < P_{dtu}$$

Here d is the threshold on the proportion of silent frames for the non-speech determination and Pdtu is the threshold on the degree of power variation for the non-speech determination; relative to the speech thresholds c and Pdtv, d < c and Pdtu < Pdtv.
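The three-way decision of paragraphs [0017] and [0018] can be sketched as below. The patent determines a, b, c, d, Pdtv, and Pdtu experimentally and does not publish values, so every default in `Thresholds` is a hypothetical placeholder; only the structure of the inequalities and the orderings d < c and Pdtu < Pdtv come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    # Placeholder values for illustration only; the patent sets them experimentally.
    a: float = 0.05    # lower bound on NZ/F for speech
    b: float = 0.5     # upper bound on NZ/F for speech
    c: float = 0.1     # NP/F must exceed this for speech
    d: float = 0.02    # NP/F must be below this for non-speech (d < c)
    pdtv: float = 0.3  # Pd must exceed this for speech
    pdtu: float = 0.1  # Pd must be below this for non-speech (pdtu < pdtv)

    def __post_init__(self):
        if not (self.a < self.b and self.d < self.c and self.pdtu < self.pdtv):
            raise ValueError("require a < b, d < c and pdtu < pdtv")


def classify_block(nz_ratio, np_ratio, pd, t=Thresholds()):
    """Return 'speech', 'non-speech' or 'undefined' for one F-frame block."""
    # Speech: a < NZ/F < b, NP/F > c and Pd > Pdtv must all hold.
    if t.a < nz_ratio < t.b and np_ratio > t.c and pd > t.pdtv:
        return "speech"
    # Non-speech (music): few silent frames and little power variation.
    if np_ratio < t.d and pd < t.pdtu:
        return "non-speech"
    return "undefined"
```

Because d < c and Pdtu < Pdtv, a block can never satisfy both rule sets, and the gap between them is where "undefined" results arise.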

[0019] If neither the speech condition nor the non-speech condition is satisfied, the result "undefined" is output, because neither can be decided. Returning "undefined" prevents erroneous determinations, and holding the previous result unchanged when the result is undefined prevents the speech/non-speech determination from flipping back and forth over short periods. This series of results is output continuously from the speech determination unit 6 once every F frames.
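Holding the previous result whenever a block is undefined, as described in paragraph [0019], can be sketched like this. The string labels and the choice of "undefined" as the state before any definite result are illustrative assumptions.

```python
def hold_on_undefined(raw_results, initial="undefined"):
    """Replace each 'undefined' block result with the most recent definite result."""
    held = initial
    smoothed = []
    for result in raw_results:
        if result != "undefined":
            held = result  # a definite decision updates the held state
        smoothed.append(held)
    return smoothed
```

For the raw sequence `["speech", "undefined", "non-speech", "undefined"]`, the output is `["speech", "speech", "non-speech", "non-speech"]`, so a single ambiguous block never causes a switch on its own.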

[0020] As described above, according to this embodiment, the acoustic signal is determined to be speech when the proportion of consonant-like frames, the proportion of silent frames, and the difference between the maximum and minimum acoustic power each exceed their respective predetermined values, so whether the signal is speech can be determined regardless of whether it is monaural or stereo. Furthermore, when the signal is not speech, it can be accurately identified as non-speech such as music from the facts that it is continuous (has few silent frames) and that its max-min power difference is smaller than that of speech. In addition, determining "undefined" when the signal can be judged neither speech nor non-speech prevents abrupt switching caused by speech/non-speech misjudgments.

[0021] (Embodiment 2) An embodiment of the sound reproduction device of the present invention according to claim 2 is described below with reference to the drawings.

[0022] Fig. 2 is a block diagram showing the configuration of this embodiment. In the figure, 7 is a speech/music discrimination unit that outputs, at fixed intervals, a determination of whether the interval is speech or music. 8 is a frequency characteristic control unit that gradually switches to a frequency characteristic suited to speech or to music based on the result from the speech/music discrimination unit 7. Fig. 3 shows an example of the frequency characteristics among which the frequency characteristic control unit 8 switches.

[0023] The operation of this configuration is as follows. First, the speech/music discrimination unit 7 receives the (L+R) signal, determines at fixed intervals (every F frames) whether the signal is speech, music, or undefined, and outputs the result to the frequency characteristic control unit 8. The speech/music discrimination unit 7 operates in the same way as the voice discrimination device of Embodiment 1, so its description is omitted. Non-speech is treated as music here. The frequency characteristic control unit 8 holds a set of preset frequency characteristics, for example the ten shown in Fig. 3, and steers the output so that it eventually reaches characteristic 1 for a speech source and characteristic 10 for a music source.

[0024] Suppose characteristic 5 is set as the initial frequency characteristic. When a "speech" result is received from the speech/music discrimination unit 7, the characteristic is changed to 4, one step closer to the speech characteristic 1. Conversely, when a "music" result is received, the characteristic moves one step toward characteristic 10, to 6. When the result is undefined, the current state 5 is kept. This operation is repeated for each speech/music result delivered every F frames. A run of "speech" results, for example, moves the characteristic gradually toward one suited to speech reproduction until it finally reaches characteristic 1, where it stays until the next "music" result is received.
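The stepwise control of paragraphs [0023] and [0024] amounts to a saturating counter over characteristics 1 to 10, moved one step per F-frame result. This sketch only tracks the characteristic index; the actual filter curves of Fig. 3 are not reproduced, and the function names are illustrative.

```python
SPEECH_END = 1   # characteristic best suited to speech
MUSIC_END = 10   # characteristic best suited to music


def step_characteristic(current, decision):
    """Move one step toward 1 on 'speech', toward 10 on 'music', stay on 'undefined'."""
    if decision == "speech":
        return max(SPEECH_END, current - 1)
    if decision == "music":
        return min(MUSIC_END, current + 1)
    return current


def run_controller(decisions, initial=5):
    """Apply one step per F-frame decision and return the characteristic after each."""
    history = []
    current = initial
    for decision in decisions:
        current = step_characteristic(current, decision)
        history.append(current)
    return history
```

Starting from characteristic 5, six consecutive "speech" results give `[4, 3, 2, 1, 1, 1]`: the setting saturates at 1 and stays there until a "music" result arrives.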

[0025] As described above, this embodiment provides the speech/music discrimination unit 7, which determines whether the source is speech or music, and the frequency characteristic control unit 8, which moves the frequency characteristic step by step toward one suited to the source based on that result. The frequency characteristic of the device therefore changes automatically to suit the input source, giving a sound reproduction device that is easy to listen to. Because the characteristic is switched step by step rather than jumping at once to the optimum for speech or music, the change in frequency characteristic does not sound unnatural. Also, by returning "undefined" when the signal can be judged neither speech nor music and leaving the characteristic unchanged in that case, the device prevents the characteristic from switching over short periods because of misjudgments and outputs reproduced sound suited to the source without sounding unnatural.

[0026]

[Effects of the Invention] As is clear from the above description, the invention according to claim 1 comprises: a power calculation unit that calculates the acoustic power of an acoustic signal for each frame of fixed duration; a voiced/silent determination unit that compares the calculated acoustic power with a preset threshold to determine whether the frame is voiced or silent; a zero-crossing calculation unit that calculates, for each frame, the number of zero crossings of the acoustic signal waveform; a consonant determination unit that compares the calculated number of zero crossings with a preset threshold to determine the consonant-likeness of the frame; a stationarity determination unit that detects the maximum and minimum power over a predetermined number of consecutive frames and calculates their difference; and a speech determination unit that determines the acoustic signal in those frames to be speech when the proportion of frames determined to be silent, the proportion of frames determined to be highly consonant-like, and the difference value all exceed their respective preset thresholds, determines it to be non-speech when it is not determined to be speech and the proportion of silent frames and the difference value are each smaller than thresholds preset lower than those thresholds, determines it to be undefined otherwise, and outputs a determination result for every group of frames. Whether an acoustic signal is speech or non-speech can therefore be determined accurately regardless of whether the signal is monaural or stereo. In addition, determining "undefined" when the speech/non-speech decision is uncertain prevents the determination from switching abruptly because of a misjudgment.

[0027] The invention according to claim 2 is a sound reproduction device comprising a voice discrimination device that receives an acoustic signal and determines whether it is speech or non-speech, and a frequency characteristic control unit that receives the acoustic signal and the speech/non-speech result output by the voice discrimination device at predetermined intervals, changes the frequency characteristic step by step to the one best suited to the acoustic signal according to that result, and outputs the signal. The signal is thus reproduced with a frequency characteristic that automatically matches whether it is speech, regardless of whether it is monaural or stereo. Moreover, keeping the previous characteristic when the result is undefined prevents abrupt changes in the frequency characteristic caused by misjudgments, as well as unstable swings between the speech-side and non-speech-side characteristics, providing a sound reproduction device that is very easy to listen to.

[Brief description of the drawings]

[Fig. 1] Block diagram showing the configuration of an embodiment of the voice discrimination device of the present invention

[Fig. 2] Block diagram showing the configuration of an embodiment of the sound reproduction device of the present invention

[Fig. 3] Frequency characteristic diagram of an embodiment in which the frequency characteristic control unit of the sound reproduction device of the present invention switches characteristics step by step

[Description of Reference Numerals]
1 Power calculation unit
2 Zero-crossing calculation unit
3 Voiced/silent determination unit
4 Consonant determination unit
5 Stationarity determination unit
6 Speech determination unit
7 Speech/music discrimination unit
8 Frequency characteristic control unit

Continuation of front page: (58) Fields searched (Int. Cl.⁶, DB name): G10L 3/00 531; G10L 3/00 513; G10L 9/12 301

Claims

(57) [Claims]

1. A voice discrimination device comprising: a power calculation unit that calculates the acoustic power of an acoustic signal for each frame of predetermined duration; a voiced/silent determination unit that compares the calculated acoustic power value with a preset threshold to determine whether the frame is voiced or silent; a zero-crossing calculation unit that calculates, for each frame, the number of zero crossings of the waveform of the acoustic signal; a consonant determination unit that compares the calculated number of zero crossings with a preset threshold to determine the consonant-likeness of the frame; a stationarity determination unit that detects the maximum and minimum power values over a predetermined plurality of consecutive frames and calculates their difference; and a speech determination unit that determines the acoustic signal in the plurality of frames to be speech when the proportion of frames in the plurality of frames determined to be silent, the proportion of frames determined to be highly consonant-like, and the difference value are all larger than respective preset thresholds, determines the acoustic signal in the plurality of frames to be non-speech when it is not determined to be speech and the proportion of frames in the plurality of frames determined to be silent and the difference value are each smaller than thresholds preset lower than the respective thresholds, determines the acoustic signal in the plurality of frames to be undefined otherwise, and outputs a determination result for each plurality of frames.

2. A sound reproduction device comprising: the voice discrimination device according to claim 1, which receives an acoustic signal and determines whether it is speech or non-speech; and a frequency characteristic control unit that receives the acoustic signal and the speech/non-speech determination result output by the voice discrimination device at predetermined intervals, and changes the frequency characteristic of the acoustic signal step by step to the optimal frequency characteristic according to the speech/non-speech determination result.