JP2006148608A

JP2006148608A - Sound signal discriminating device, tone controlling device, broadcast receiving unit, program, and recording medium

Info

Publication number: JP2006148608A
Application number: JP2004336823A
Authority: JP
Inventors: Sunao Onishi (大西 直); Tomoya Nakamura (中村 智也); Koichiro Matsuhisa (松久 浩一郎)
Original assignee: Sharp Corp
Current assignee: Sharp Corp
Priority date: 2004-11-22
Filing date: 2004-11-22
Publication date: 2006-06-08
Anticipated expiration: 2024-11-22
Also published as: JP4275054B2

Abstract

PROBLEM TO BE SOLVED: To provide a sound signal discriminating device that accurately discriminates speech from non-speech in an input sound signal.
SOLUTION: The sound signal discriminating device comprises: a speech/non-speech judging means 12 that judges whether the input sound signal corresponds to speech or non-speech; a monaural/stereo judging means 13 that judges whether the input sound signal is monaural or stereo; and a reference optimizing means 14 that optimizes the judgment criterion of the speech/non-speech judging means 12 based on the judgment result of the monaural/stereo judging means 13.
COPYRIGHT: (C)2006,JPO&NCIPI

Description

The present invention relates to an audio signal discriminating device, a sound quality adjusting device, a broadcast receiver, a program, and a recording medium. More particularly, it relates to an audio signal discriminating device that discriminates speech from non-speech in an audio signal, a sound quality adjusting device including that device, a broadcast receiver including that sound quality adjusting device, programs for them, and a computer-readable recording medium on which such a program is recorded.

Conventionally, general audio equipment has provided various sound quality adjustments, such as bass adjustment, which adjusts the output frequency response in the low range; treble adjustment, which adjusts it in the high range; and loudness adjustment, which emphasizes both the low and high ranges.

As one such sound quality adjusting device, a device has been proposed that detects, from the audio information of the input signal itself, whether the signal is periodic, uses this to decide whether the input is music or other information, and controls the acoustic parameters according to the result (see, for example, Patent Document 1).
Patent Document 1: JP-A-61-93712

However, particularly in devices that receive television or radio broadcasts, judging whether content is music from the audio information alone can lead to unexpected misjudgments.

For example, when an a cappella piece is played in a music program, its style may prevent any sense of rhythm from being detected. The content is then judged not to be music, and the equalizer or similar stage fails to select the acoustic parameters best suited to music; it may instead select, for example, parameters optimized for speech. As a result, a cappella music, for which the natural resonance of the voices matters most, is output with acoustic characteristics that favor the intelligibility of words (a relatively emphasized midrange), which is not the sound the user actually wants to hear.

Conversely, while a news program is being watched, it is normally preferable to select parameters optimized for speech, which emphasize intelligibility. Depending on the news item, however, audio recorded on location is sometimes output as-is alongside the announcer's speech. If that location audio contains music, the music may, depending on the relative volumes, dominate the announcer's speech. This is the opposite of the a cappella case described above and is just as likely to occur.

The present invention has been made in view of the above circumstances. Its object is to provide an audio signal discriminating device capable of accurately discriminating speech from non-speech in an input audio signal, a sound quality adjusting device including that device, a broadcast receiver including that sound quality adjusting device, programs for them, and a computer-readable recording medium on which such a program is recorded.

To solve the above problems, the present invention comprises the following technical means.
The first technical means is an audio signal discriminating device comprising: speech/non-speech determination means that makes a determination for discriminating whether an input audio signal corresponds to speech or non-speech; monaural/stereo determination means that determines whether the input audio signal is a monaural signal or a stereo signal; and reference optimization means that optimizes the determination criterion used by the speech/non-speech determination means based on the determination result of the monaural/stereo determination means.

The second technical means is the first technical means, wherein the speech/non-speech determination means makes its determination using a plurality of signal analyses, and the reference optimization means changes, based on the monaural/stereo determination, the set of thresholds applied to the respective signal analyses as the determination criterion.

The third technical means is a sound quality adjusting device including the audio signal discriminating device of the first or second technical means, which adjusts an audio signal that the audio signal discriminating device has discriminated as speech or non-speech to a different sound quality for speech than for non-speech.

The fourth technical means is a broadcast receiver comprising the sound quality adjusting device of the third technical means and a broadcast receiving device, wherein the audio signal contained in a broadcast signal received by the broadcast receiving device is input to the sound quality adjusting device, and the audio is output with adjusted sound quality.

The fifth technical means is a program for causing a computer to execute: a monaural/stereo determination step of determining whether an input audio signal is a monaural signal or a stereo signal; a reference optimization step of optimizing a predetermined criterion based on the result of that determination; and a speech/non-speech determination step of determining, using the predetermined criterion, whether the input audio signal corresponds to speech or non-speech.

The sixth technical means is the fifth technical means, wherein the determination in the speech/non-speech determination step is made using a plurality of signal analyses, and the reference optimization step changes, based on the monaural/stereo determination, the set of thresholds applied to the respective signal analyses as the predetermined criterion.

The seventh technical means is the program of the fifth or sixth technical means, further causing the computer to execute a step of adjusting the audio signal discriminated as speech or non-speech in the speech/non-speech determination step to a different sound quality for speech than for non-speech.

The eighth technical means is a computer-readable recording medium on which the program of any one of the fifth to seventh technical means is recorded.

According to the present invention, speech and non-speech can be accurately discriminated in an input audio signal.

The audio signal discriminating device according to the present invention comprises speech/non-speech determination means, monaural/stereo determination means, and reference optimization means. The following describes a sound quality adjusting device that includes such an audio signal discriminating device and adjusts sound quality based on its discrimination result. The audio signal discriminating device is not limited to sound quality adjustment, however; it can also be applied, for example, to sorting content (content containing the audio signal) into categories when recording it, based on the discrimination result.

FIG. 1 is a block diagram showing a configuration example of a sound quality adjusting device according to an embodiment of the present invention. In the figure, 1 is the sound quality adjusting device, 11 is audio signal input means, 12 is speech/non-speech determination means, 13 is monaural/stereo determination means, 14 is reference optimization means, 14a is a switch, 14b is means for setting the threshold to V_SL1, 14c is means for setting the threshold to V_SL2, 15 is sound quality adjustment means, and 16 is audio signal output means.

The speech/non-speech determination means 12 makes a determination for discriminating whether the audio signal input through the audio signal input means 11 corresponds to speech or non-speech. The audio signal input means 11 may use any input source and input method. The speech/non-speech determination means 12 may be implemented wholly or partly in hardware or in software. In the audio signal discriminating device according to the present invention, it is the speech/non-speech determination means 12 that discriminates whether the audio signal is speech or non-speech.

The audio signal discriminating device according to the present invention exploits the empirical rule that news programs and the like are generally broadcast in mono, whereas commercials featuring music and music programs are often broadcast in stereo. By detecting the mono/stereo signal superimposed on the audio signal, it judges whether the program currently being broadcast is better treated as speech or as non-speech (music). For this purpose, the device includes the monaural/stereo determination means 13 and the reference optimization means 14. The sound quality adjusting device according to the present invention then controls the acoustic parameters based on this judgment.

The monaural/stereo determination means 13 determines whether the input audio signal is a monaural signal or a stereo signal. It too may be implemented wholly or partly in hardware or in software, and it may simply base its determination on information such as the mono/stereo switching that accompanies the input audio signal. Furthermore, when the content from which the audio signal originates is listed in an electronic program guide (EPG), for example so that it can be timer-recorded, the EPG also carries mono/stereo information, so the determination can also be made by acquiring that information.
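The paragraph above names three possible sources for the mono/stereo decision: switching information accompanying the signal, EPG data, and analysis of the signal itself. A minimal sketch of one way to combine them, assuming a hypothetical priority order (broadcast flag first, then EPG, then signal analysis); the parameter names and the "mono"/"stereo" string values are illustrative assumptions, not fields of any real tuner or EPG interface:

```python
from typing import Callable, Optional


def decide_mono(
    broadcast_flag: Optional[str],
    epg_audio_mode: Optional[str],
    analyze_signal: Callable[[], bool],
) -> bool:
    """Return True for mono, False for stereo.

    broadcast_flag, epg_audio_mode: hypothetical strings such as "mono" or
    "stereo"; None means the information is unavailable.
    analyze_signal: fallback that inspects the audio itself (True = mono).
    """
    # Hypothetical priority: explicit broadcast flag, then EPG metadata.
    for source in (broadcast_flag, epg_audio_mode):
        if source is not None:
            mode = source.strip().lower()
            if mode in ("mono", "stereo"):
                return mode == "mono"
    # Unknown or missing metadata: fall back to signal analysis.
    return analyze_signal()
```

Values other than "mono" or "stereo" (for example a bilingual mode) fall through to the next source, so the signal analysis acts as the final authority only when no usable metadata exists.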

The reference optimization means 14 optimizes the determination criterion of the speech/non-speech determination means 12 based on the determination result of the monaural/stereo determination means 13. Accurate speech/non-speech determination therefore presupposes that the monaural/stereo determination and criterion optimization have already been performed for the audio signal in question. This can be achieved with a delay or similar buffering, or the device may simply perform the monaural/stereo determination, criterion optimization, and speech/non-speech determination in sequence each time an audio signal is input.

The sound quality adjustment means 15 adjusts the audio signal that the audio signal discriminating device, comprising the above components, has discriminated as speech or non-speech to a different sound quality for speech than for non-speech. Any method of setting the sound quality may be used, as long as the setting values, the boost/cut amounts, or the settings in each frequency band differ between speech and non-speech. For example, the setting may be one in which the equalizer center frequencies and filter Q values (the sharpness of the peak or dip in the curve of one band) are fixed, as in a graphic equalizer, or one in which these can also be changed, as in a parametric equalizer. The audio signal output means 16 outputs the audio signal adjusted by the sound quality adjustment means 15.

FIG. 2 is a flowchart for explaining an example of the sound quality adjustment process in the sound quality adjusting device of FIG. 1, and FIG. 3 is a diagram showing an example of the equalizer settings used in that process.

For simplicity, the speech/non-speech criterion is assumed here to consist of a single threshold test. First, when an audio signal is input, the monaural/stereo determination means 13 performs the monaural/stereo determination (step S1). For this determination, with L denoting the left input signal and R the right input signal, the operation (L − R)/(L + R) may, for example, be performed on the input signal to judge the phase difference.
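Taken sample by sample, (L − R)/(L + R) is undefined wherever L + R = 0, so one practical reading, assumed here rather than stated in the source, is an energy ratio of the side signal (L − R) to the mid signal (L + R) over a block of samples. A ratio near zero means the channels are nearly identical (mono); a larger ratio means they differ (stereo). The cutoff `max_ratio` is a hypothetical tuning value:

```python
import math


def side_to_mid_ratio(left, right):
    """Energy ratio sum((L-R)^2) / sum((L+R)^2) over one block of samples."""
    if len(left) != len(right):
        raise ValueError("left and right blocks must have the same length")
    side = sum((l - r) ** 2 for l, r in zip(left, right))
    mid = sum((l + r) ** 2 for l, r in zip(left, right))
    if mid == 0.0:
        # Silence counts as mono; pure anti-phase content as maximally stereo.
        return 0.0 if side == 0.0 else math.inf
    return side / mid


def is_mono(left, right, max_ratio=0.01):
    """Mono if side energy is at most max_ratio of mid energy (hypothetical -20 dB)."""
    return side_to_mid_ratio(left, right) <= max_ratio
```

For example, feeding the same sine block to both channels gives a ratio of 0 and a mono verdict, while a sine on the left and the matching cosine on the right gives a ratio close to 1 and a stereo verdict.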

If this determination finds a monaural signal, the reference optimization means 14 connects the switch 14a to the V_SL1 setting means 14b, setting the threshold of the speech/non-speech determination means 12 to V_SL1 (step S2). If step S1 instead finds a stereo signal, the reference optimization means 14 connects the switch 14a to the V_SL2 setting means 14c, setting the threshold to V_SL2 (step S3). Optimizing the threshold in this way makes it easier to judge monaural signals, such as news, as speech, and stereo signals, which more often contain music including BGM, as non-speech. The configuration of the reference optimization means 14 is not limited to the one illustrated.

Next, the speech/non-speech determination means 12 judges speech or non-speech using the threshold V_SL1 or V_SL2 set in step S2 or S3 (step S4). If the signal is judged to be speech, sound quality setting A is selected (step S5) and the process ends. If step S4 judges it to be non-speech, sound quality setting B is selected (step S6) and the process ends.
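Steps S1 to S6 can be sketched as a per-block function, which corresponds to the option of running the whole chain each time a block of audio is input. This sketch assumes a speech score in [0, 1] where higher means more speech-like; the threshold values, the score direction, and the function names are hypothetical, chosen only so that V_SL1 (mono) makes a speech verdict easier than V_SL2 (stereo). The mono/stereo test and the speech score are passed in as functions:

```python
from typing import Callable, Sequence

# Hypothetical values: the source requires only that mono favor "speech".
V_SL1 = 0.4  # threshold for monaural input (step S2)
V_SL2 = 0.6  # threshold for stereo input (step S3)

Block = Sequence[float]


def select_threshold(mono: bool) -> float:
    """Reference optimization: the role of switch 14a and setting means 14b/14c."""
    return V_SL1 if mono else V_SL2


def process_block(
    left: Block,
    right: Block,
    is_mono: Callable[[Block, Block], bool],
    speech_score: Callable[[Block, Block], float],
) -> str:
    """Run steps S1-S6 on one block and return the chosen sound quality setting."""
    mono = is_mono(left, right)                # step S1
    threshold = select_threshold(mono)         # step S2 or S3
    score = speech_score(left, right)          # step S4 (higher = more speech-like)
    return "A" if score >= threshold else "B"  # step S5 (speech) or S6 (non-speech)
```

With these example thresholds, a borderline score of 0.5 selects setting A for a mono block but setting B for a stereo block, which is the bias the text describes.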

An example of the difference between sound quality settings A and B is described with reference to FIG. 3. For sound quality setting A (speech), the equalizer frequency response is set as shown by graph 21; for sound quality setting B (non-speech), it is set as shown by graph 22. Graph 22 differs from graph 21 in that, for non-speech, the regions around a predetermined low frequency 22a and a predetermined high frequency 22b are emphasized relative to speech.
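One way to realize settings like A and B with a parametric equalizer is a cascade of peaking filters, each defined by a center frequency, a Q, and a gain. The sketch below uses the peaking-EQ biquad from Robert Bristow-Johnson's Audio EQ Cookbook, a technique chosen here for illustration rather than taken from the source. The source gives no values for frequencies 22a and 22b or for the boost, so the 100 Hz and 10 kHz centers, Q of 0.7, +6 dB gain, and the flat setting A are all hypothetical:

```python
import cmath
import math


def peaking_biquad(fs, f0, q, gain_db):
    """Peaking-EQ biquad (RBJ Audio EQ Cookbook), normalized so that a0 = 1.

    Returns (b0, b1, b2) and (a1, a2).
    """
    a = 10 ** (gain_db / 40)
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * q)
    cos_w0 = math.cos(w0)
    a0 = 1 + alpha / a
    b = ((1 + alpha * a) / a0, -2 * cos_w0 / a0, (1 - alpha * a) / a0)
    den = (-2 * cos_w0 / a0, (1 - alpha / a) / a0)
    return b, den


def response_db(sections, fs, f):
    """Magnitude response in dB of a cascade of biquad sections at frequency f."""
    z1 = cmath.exp(-2j * math.pi * f / fs)  # z^-1 on the unit circle
    h = 1 + 0j
    for b, den in sections:
        h *= (b[0] + b[1] * z1 + b[2] * z1 * z1) / (1 + den[0] * z1 + den[1] * z1 * z1)
    return 20 * math.log10(abs(h))


FS = 48_000
# Hypothetical presets; the source specifies only the shape of the difference.
SETTING_A = []  # speech (graph 21): flat in this sketch
SETTING_B = [
    peaking_biquad(FS, 100, 0.7, 6.0),     # boost around low frequency 22a
    peaking_biquad(FS, 10_000, 0.7, 6.0),  # boost around high frequency 22b
]
```

Each peaking section has unity gain at DC and exactly `gain_db` at its center frequency, so setting B lifts the ends of the spectrum by about 6 dB while leaving the speech midrange close to setting A.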

As described above, the audio signal discriminating device according to this embodiment uses a monaural/stereo determination to optimize the criterion of the automatic speech detection function, which improves the accuracy of that function. Speech and non-speech can therefore be discriminated accurately in the input audio signal; that is, speech/non-speech detection is adapted to whether the audio signal is monaural or stereo. For example, the criterion can be optimized so that monaural signals such as news are readily judged as speech, and stereo signals, which more often contain music including BGM, are readily judged as non-speech.

Furthermore, the sound quality adjusting device according to this embodiment chooses the acoustic parameters for the audio signal not only from the audio information in the signal but also from a speech/non-speech judgment consistent with the nature of the program that contains the signal. This minimizes misjudgments in controlling the equalizer or other acoustic parameters caused by the characteristics of the input audio signal, enabling accurate control of acoustic parameters and accurate sound quality adjustment. More generally, in any equipment that includes the audio signal discriminating device according to the present invention, including this sound quality adjusting device, the nature of a program can be judged from, for example, the mono/stereo signal superimposed on the audio signal together with the audio information, and the criterion for judging whether the input audio signal is speech or non-speech (music) can be optimized accordingly. This allows speech/non-speech detection to be controlled flexibly according to the content and characteristics of the broadcast program, and equipment to be controlled on that basis (for example, sound quality adjustment or sorted recording).

In another embodiment of the present invention, the speech/non-speech determination means 12 preferably makes its determination using a plurality of signal analyses. In this embodiment, several signal analyses are performed on the input audio signal in step S4 of FIG. 2, such as analysis of the signal's energy change over time, analysis of syllable uniformity, and analysis of audio intensity versus frequency. Such analyses yield, for example, (I) the energy change of the signal over time, (II) audio intensity versus frequency, (III) the order of vowels and consonants, (IV) syllable length, and (V) the energy of consonants and vowels.
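The source does not say how each analysis is computed. As an illustration, two of them can be approximated with simple measures: for (I), the share of low-energy frames (a widely used heuristic in speech/music discrimination, since speech pauses between syllables), and for (II), the share of spectral energy in the 100 Hz to 3 kHz band named later in the text. The frame length, the 0.5 fraction, and the function names are assumptions; a plain DFT is used only to keep the sketch self-contained:

```python
import cmath
import math


def low_energy_frame_ratio(samples, frame_len=160, fraction=0.5):
    """Share of frames whose energy is below `fraction` times the mean frame energy.

    Rough proxy for analysis (I): quiet gaps between syllables raise this
    ratio, so it tends to be higher for speech than for music.
    """
    energies = [
        sum(x * x for x in samples[i:i + frame_len])
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]
    if not energies:
        return 0.0
    mean = sum(energies) / len(energies)
    return sum(e < fraction * mean for e in energies) / len(energies)


def midband_energy_ratio(samples, fs, lo=100.0, hi=3000.0):
    """Share of spectral energy between lo and hi Hz, a proxy for analysis (II).

    Uses an O(N^2) DFT for self-containment; real code would use an FFT.
    """
    n = len(samples)
    total = band = 0.0
    for k in range(1, n // 2 + 1):  # skip DC, positive frequencies only
        x_k = sum(s * cmath.exp(-2j * math.pi * k * i / n) for i, s in enumerate(samples))
        power = abs(x_k) ** 2
        total += power
        if lo <= k * fs / n <= hi:
            band += power
    return band / total if total > 0 else 0.0
```

For a 1 kHz tone sampled at 16 kHz the midband ratio is close to 1, and for a 6 kHz tone it is close to 0; a signal alternating between equal-length loud and silent frames gives a low-energy frame ratio of 0.5.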

Speech or non-speech can be judged from these results by taking into account points such as the following. (I) Speech has low-energy segments between syllables (which have high energy), whereas non-speech often has no such segments. (II) Speech is strong in the 100 Hz to 3 kHz midrange, whereas non-speech is strong in the low and high ranges. (III) In speech, a consonant is often followed by a vowel within a syllable. (IV) Speech often has uniform syllable lengths. (V) In speech, vowel energy is often greater than consonant energy. Items (I) to (V) can then be weighted, summed, and statistically processed to obtain a final signal analysis result, and speech or non-speech determined by comparing that value with the threshold V_SL1 for a monaural signal or V_SL2 for a stereo signal. Alternatively, in this embodiment the reference optimization means 14 may change, based on the monaural/stereo determination, the set of thresholds applied to the respective signal analyses as the determination criterion.
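Both variants described above can be sketched side by side: a single weighted score compared with V_SL1 or V_SL2, and a per-analysis threshold set that is swapped as a whole by the mono/stereo result. The source says only that the results are "weighted, summed, and statistically processed", so the feature names, weights, threshold values, and the majority-vote rule below are hypothetical, and each feature is assumed to be pre-normalized to [0, 1] with 1 meaning most speech-like:

```python
# Hypothetical weights for analyses (I)-(V); features assumed normalized to [0, 1].
WEIGHTS = {
    "pauses": 0.30,        # (I) low-energy gaps between syllables
    "midband": 0.30,       # (II) energy concentrated in 100 Hz to 3 kHz
    "cv_order": 0.10,      # (III) consonant-then-vowel ordering
    "syllable_len": 0.15,  # (IV) uniform syllable lengths
    "vowel_energy": 0.15,  # (V) vowel energy above consonant energy
}


def speech_score(features):
    """Weighted average of the normalized analysis results."""
    total = sum(WEIGHTS.values())
    return sum(w * features[name] for name, w in WEIGHTS.items()) / total


def is_speech_weighted(features, mono, v_sl1=0.4, v_sl2=0.6):
    """Variant 1: one combined score, compared with V_SL1 (mono) or V_SL2 (stereo)."""
    return speech_score(features) >= (v_sl1 if mono else v_sl2)


# Variant 2: one threshold per analysis; the whole set is switched by mono/stereo.
THRESHOLD_SETS = {
    True: {name: 0.4 for name in WEIGHTS},   # mono: lenient set
    False: {name: 0.6 for name in WEIGHTS},  # stereo: strict set
}


def is_speech_per_analysis(features, mono, min_votes=3):
    """Speech if at least `min_votes` analyses pass their thresholds (hypothetical rule)."""
    thresholds = THRESHOLD_SETS[mono]
    votes = sum(features[name] >= t for name, t in thresholds.items())
    return votes >= min_votes
```

In both variants, the same borderline feature vector (every value 0.5) is classified as speech for a mono signal and as non-speech for a stereo one.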

In this embodiment too, as already described, step S1 performs, for example, the (L − R)/(L + R) operation on the input signal to judge the phase difference, and the result determines the criterion threshold (V_SL1 or V_SL2) in step S2 or S3. The sound quality setting (A or B) is then applied according to the signal analysis result (step S5 or S6), and the audio signal is output.

The audio signal discriminating device, the sound quality adjusting device 1, and their constituent means described above with reference to FIGS. 1 to 3 may be implemented in hardware, as stated above, or partly in software. For example, they may be implemented by installing a program on a general-purpose computer such as a PC (personal computer). The processing in that case is described with reference to the configuration example of a general information processing apparatus shown in FIG. 4. FIG. 4 is a block diagram showing a configuration example of a general information processing apparatus. In the figure, 3 is the information processing apparatus, 31 is a CPU (Central Processing Unit), 32 is a RAM (Random Access Memory), 33 is a rewritable ROM (Read Only Memory), 34 is an input device, 35 is a display device, 36 is an output device, and 37 is a bus.

A program that causes the computer to function as the device and each of the means of the present invention is stored in the ROM 33 and executed when read by the CPU 31. When installed on a computer or the like, this program controls the CPU 31 and other components so that the computer functions as each of the means described above. Information handled by the device and means according to the present invention is held temporarily in the RAM 32 during processing, then stored in the ROM 33, and read, modified, and written by the CPU 31 as needed. Such information includes, for example, the thresholds, and the audio signal that is input through an audio signal input means (one of the input devices 34) and then analyzed. The currently selected threshold among those stored in the ROM 33 may also be read into the RAM 32 so that the threshold setting is held for as long as it applies.

Intermediate progress and results of the processing are presented to the user on the display device 35, such as an LCD, PDP, organic EL display, or CRT. Where necessary, the user enters the parameters required for processing through the input device 34, such as a keyboard or mouse (pointing device), for example to specify the audio signal to be input or the content that contains it. The program may also provide a graphical user interface (GUI) on the display device 35 for ease of use. The output device 36 includes a speaker for outputting the audio signal, communication equipment such as a network board for connecting to a network, and other output devices such as a printer. The CPU 31, RAM 32, ROM 33, input device 34, display device 35, and output device 36 need only be connected by, for example, the bus 37.

Specific examples of recording media on which such a program can be recorded include CD-ROMs, magneto-optical disks, DVD-ROMs, FDs (floppy disks), flash memory, various other ROMs (including rewritable ROMs), and RAM. Recording a program that causes a computer to execute the functions of the embodiments described above on such media and distributing it makes those functions easy to realize. The functions according to the present invention can then be executed either by loading such a recording medium into an information processing apparatus such as a computer and having the apparatus read the program, or by storing the program on a recording medium built into the information processing apparatus and reading it out as needed.

The present invention is also applicable to a broadcast receiver comprising the audio signal discriminating device (or sound quality adjusting device) of the above embodiments and a broadcast receiving device that receives television or radio broadcast signals, whether digital or analog. In this case, the audio signal in the broadcast signal received by the broadcast receiving device is input to the sound quality adjusting device, and the audio is output with adjusted sound quality. When the audio signal discrimination according to the present invention is applied to content recording, as mentioned above, such a broadcast receiver may additionally be able to record, or timer-record, the received broadcast. The discrimination is also suitable for content other than received broadcasts, such as when re-recording content obtained via a network or a recording medium; for example, various recorders can use the speech/non-speech determination for commercial (CM) detection or other sorted recording.
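For commercial detection or sorted recording, the per-block speech/non-speech decisions have to be turned into time segments that a recorder can act on. The source does not describe this step; the sketch below is a hypothetical illustration that groups consecutive decisions with `itertools.groupby` and merges runs shorter than `min_blocks` into the preceding segment, so that a brief misjudgment does not split a recording. The block duration and minimum run length are assumed parameters:

```python
from itertools import groupby


def label_segments(decisions, block_sec, min_blocks=3):
    """Turn per-block speech flags into (start_s, end_s, label) segments.

    decisions: iterable of booleans, True = block judged as speech.
    Runs shorter than `min_blocks` are absorbed into the preceding segment.
    """
    segments = []
    index = 0
    for is_speech, run in groupby(decisions):
        length = len(list(run))
        label = "speech" if is_speech else "non-speech"
        end = (index + length) * block_sec
        if segments and (length < min_blocks or segments[-1][2] == label):
            # Extend the previous segment (short run, or same label after a merge).
            start, _, prev_label = segments[-1]
            segments[-1] = (start, end, prev_label)
        else:
            segments.append((index * block_sec, end, label))
        index += length
    return segments
```

For example, three speech blocks, one stray non-speech block, and three more speech blocks yield a single speech segment rather than three fragments.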

FIG. 1 is a block diagram showing a configuration example of a sound quality adjusting device according to an embodiment of the present invention.
FIG. 2 is a flowchart for explaining an example of the sound quality adjustment process in the sound quality adjusting device of FIG. 1.
FIG. 3 is a diagram showing an example of the sound quality setting equalization used in the sound quality adjustment process of the sound quality adjusting device of FIG. 1.
FIG. 4 is a block diagram showing a configuration example of a general information processing device.

Explanation of symbols

1 ... sound quality adjusting device, 3 ... information processing device, 11 ... audio signal input means, 12 ... speech/non-speech determination means, 13 ... monaural/stereo determination means, 14 ... reference optimization means, 14a ... switch, 14b ... means for setting threshold V_SL1, 14c ... means for setting threshold V_SL2, 15 ... sound quality adjusting means, 16 ... audio signal output means, 31 ... CPU, 32 ... RAM, 33 ... rewritable ROM, 34 ... input device, 35 ... display device, 36 ... output device, 37 ... bus.
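The symbols list suggests that the reference optimization means 14 uses switch 14a to choose between setting threshold V_SL1 (14b) and threshold V_SL2 (14c) according to the result of the monaural/stereo determination means 13. The Python sketch below illustrates such a switch under stated assumptions only: the mid/side energy test used for monaural detection, the threshold values, and the mapping of V_SL1 to stereo and V_SL2 to monaural input are all hypothetical, since this section specifies none of them.

```python
import math

# Hypothetical values: the document names V_SL1 and V_SL2 but this section
# gives neither their values nor which input mode each one serves.
V_SL1 = 0.30  # assumed threshold for stereo input (setting means 14b)
V_SL2 = 0.45  # assumed threshold for monaural input (setting means 14c)


def is_monaural(left, right, ratio=0.01):
    """Stand-in for means 13 (assumed method): report monaural when the side
    signal L-R carries almost no energy relative to the mid signal L+R."""
    mid = sum((l + r) ** 2 for l, r in zip(left, right))
    side = sum((l - r) ** 2 for l, r in zip(left, right))
    if mid == 0.0:
        # Silence counts as monaural; pure anti-phase content does not.
        return side == 0.0
    return side / mid < ratio


def select_threshold(left, right):
    """Stand-in for means 14: switch 14a routes to V_SL2 or V_SL1."""
    return V_SL2 if is_monaural(left, right) else V_SL1


if __name__ == "__main__":
    n = 8000
    tone = [math.sin(2 * math.pi * 440 * i / n) for i in range(n)]
    other = [math.sin(2 * math.pi * 660 * i / n) for i in range(n)]
    print(select_threshold(tone, tone))   # identical channels: monaural threshold
    print(select_threshold(tone, other))  # different channels: stereo threshold
```

The mid/side comparison is only one common way to detect a monaural source carried on two channels; a broadcast receiver might instead read a mono/stereo flag from the broadcast signal itself.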

Claims

An audio signal discriminating device comprising: speech/non-speech determination means for determining whether an input audio signal corresponds to speech or non-speech; monaural/stereo determination means for determining whether the input audio signal is a monaural signal or a stereo signal; and reference optimization means for optimizing a determination criterion of the speech/non-speech determination means on the basis of a determination result of the monaural/stereo determination means.

The audio signal discriminating device according to claim 1, wherein the determination by the speech/non-speech determination means is performed using a plurality of signal analyses, and the reference optimization means changes, as the determination criterion, a set of thresholds for the respective signal analyses on the basis of the monaural/stereo determination.

A sound quality adjusting device comprising the audio signal discriminating device according to claim 1, wherein the audio signal discriminated as speech or non-speech by the audio signal discriminating device is adjusted to a sound quality that differs between speech and non-speech.

A broadcast receiver comprising the sound quality adjusting device according to claim 3 and a broadcast receiving device, wherein an audio signal from a broadcast signal received by the broadcast receiving device is input to the sound quality adjusting device, and sound is output with its sound quality adjusted.

A program for causing a computer to execute: a monaural/stereo determination step of determining whether an input audio signal is a monaural signal or a stereo signal; a reference optimization step of optimizing a predetermined criterion on the basis of the result of that determination; and a speech/non-speech determination step of determining, on the basis of the predetermined criterion, whether the input audio signal corresponds to speech or non-speech.

The program according to claim 5, wherein the determination in the speech/non-speech determination step is performed using a plurality of signal analyses, and the reference optimization step changes, as the predetermined criterion, a set of thresholds for the respective signal analyses on the basis of the monaural/stereo determination.

The program according to claim 5 or 6, further causing the computer to execute a step of adjusting the audio signal discriminated as speech or non-speech in the speech/non-speech determination step to a sound quality that differs between speech and non-speech.

A computer-readable recording medium on which the program according to any one of claims 5 to 7 is recorded.
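Read together, claims 5 to 7 describe a processing order: monaural/stereo determination, reference optimization, speech/non-speech determination using several signal analyses with one threshold per analysis (claim 6), and a sound quality adjustment that differs for speech and non-speech (claim 7). The Python sketch below strings those steps together for one stereo frame. It is not the patented implementation: the claims do not name the signal analyses, so sub-frame energy variation and zero-crossing rate are swapped in as common, hypothetical features. Requiring every analysis to pass, the mid/side monaural test, all threshold values, and the three-band equalizer gains are likewise assumptions.

```python
import math
from statistics import mean, pstdev

# Hypothetical threshold sets, one threshold per signal analysis (claims 2, 6).
MONO_THRESHOLDS = {"energy_cv_min": 0.8, "zcr_max": 0.25}
STEREO_THRESHOLDS = {"energy_cv_min": 1.2, "zcr_max": 0.20}

# Hypothetical equalizer gains in dB for (low, mid, high) bands (claims 3, 7).
EQ_SPEECH = (-2.0, 4.0, 1.0)
EQ_NON_SPEECH = (3.0, 0.0, 2.0)


def is_monaural(left, right, ratio=0.01):
    """Assumed mono test: side (L-R) energy negligible versus mid (L+R)."""
    mid = sum((l + r) ** 2 for l, r in zip(left, right))
    side = sum((l - r) ** 2 for l, r in zip(left, right))
    return side == 0.0 if mid == 0.0 else side / mid < ratio


def energy_cv(x, n_sub=8):
    """Analysis 1 (assumed): coefficient of variation of sub-frame energies.
    Speech tends to fluctuate syllable by syllable; steady music does not."""
    size = len(x) // n_sub
    energies = [sum(s * s for s in x[k * size:(k + 1) * size]) for k in range(n_sub)]
    m = mean(energies)
    return pstdev(energies) / m if m > 0 else 0.0


def zero_crossing_rate(x):
    """Analysis 2 (assumed): fraction of adjacent sample pairs that change sign."""
    crossings = sum(1 for a, b in zip(x, x[1:]) if (a >= 0) != (b >= 0))
    return crossings / max(len(x) - 1, 1)


def process_frame(left, right):
    """Run the claimed steps on one stereo frame; returns (mono, speech, eq)."""
    # Monaural/stereo determination step.
    mono = is_monaural(left, right)
    # Reference optimization step: swap the whole threshold set at once.
    thr = MONO_THRESHOLDS if mono else STEREO_THRESHOLDS
    # Speech/non-speech determination step: every analysis must pass.
    mix = [(l + r) / 2 for l, r in zip(left, right)]
    speech = (energy_cv(mix) >= thr["energy_cv_min"]
              and zero_crossing_rate(mix) <= thr["zcr_max"])
    # Sound quality adjustment step: different settings for speech and non-speech.
    return mono, speech, EQ_SPEECH if speech else EQ_NON_SPEECH


if __name__ == "__main__":
    fs = 8000
    gate = [(i // 1000) % 2 == 0 for i in range(fs)]  # on/off every 1000 samples
    b200 = [math.sin(2 * math.pi * 200 * i / fs) if g else 0.0 for i, g in enumerate(gate)]
    b300 = [math.sin(2 * math.pi * 300 * i / fs) if g else 0.0 for i, g in enumerate(gate)]
    print(process_frame(b200, b200))  # identical channels: monaural set applied
    print(process_frame(b200, b300))  # different channels: stereo set applied
```

In the demo, a gated tone on identical channels passes the monaural threshold set, while a gated two-tone stereo frame whose mixed energy fluctuates just as much fails the stricter (hypothetical) stereo set. This shows why swapping the entire threshold set, rather than a single value, changes the outcome of the speech/non-speech step.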