JP2006171458A - Tone quality controller, content display device, program, and recording medium

Info

Publication number: JP2006171458A
Application number: JP2004364813A
Authority: JP
Inventors: Tomoya Nakamura; 智也中村; Sunao Onishi; 直大西; Koichiro Matsuhisa; 浩一郎松久
Original assignee: Sharp Corp
Current assignee: Sharp Corp
Priority date: 2004-12-16
Filing date: 2004-12-16
Publication date: 2006-06-29

Abstract

PROBLEM TO BE SOLVED: To provide a tone quality controller that decides whether an input audio signal is speech or non-speech and lets the user view the decision result.

SOLUTION: The tone quality controller includes a speech/non-speech decision means 12 that decides whether the input audio signal corresponds to speech or non-speech, a tone quality adjusting means 15 that adjusts the audio signal classified by the speech/non-speech decision means 12 to a different tone quality for speech and for non-speech, and a decision result display means 10 that displays the decision result of the speech/non-speech decision means 12. The decision result display means 10 displays the decision result to the user in steps according to the degree of speech or non-speech.

COPYRIGHT: (C)2006, JPO&NCIPI

Description

The present invention relates to a sound quality adjustment device, a content display device, a program, and a recording medium. More specifically, it relates to a sound quality adjustment device having an audio signal determination function that performs speech/non-speech determination on an audio signal, a content display device including that sound quality adjustment device, programs for these devices, and a computer-readable recording medium on which such a program is recorded.

Conventionally, general audio devices have been provided with various sound quality adjustment devices, such as bass adjustment, which adjusts the output frequency characteristics of the low range; treble adjustment, which adjusts the output frequency characteristics of the high range; and loudness adjustment, which emphasizes both the low and high ranges.

As one such sound quality adjustment device, a device has been proposed that detects the presence or absence of periodicity in the audio information of the input audio signal itself, thereby judges whether the input signal is music information or other information, and controls acoustic parameters according to the result (see, for example, Patent Document 1).
Patent Document 1: JP-A-61-93712

However, particularly in devices that receive television or radio broadcasts, judging whether a signal is music information from the audio information alone can lead to unexpected misjudgments.

For example, when an a cappella performance is played in a music program, its style may prevent the rhythm from being detected, so the signal is judged not to be music information and the equalizer or the like does not select the acoustic parameters best suited to it. The equalizer may then select, for example, the acoustic parameters best suited to speech. As a result, a cappella music, for which the natural resonance of the sound should be valued, is output with acoustic characteristics that prioritize the intelligibility of words (with the midrange relatively emphasized), and the user does not get the acoustic setting actually wanted.

Conversely, while a news program is being viewed, it is preferable to select parameters suited to speech, which prioritize the intelligibility of language. Depending on the news item, however, sound collected at the reporting site is sometimes output as-is in parallel with the announcer's speech. If music is mixed into that collected sound, then depending on the volume balance between the two, the music in the collected sound may dominate the speech of the news program. This is the reverse of the a cappella example above and is equally likely to occur.

Furthermore, even a device that solves the above problems and can perform accurate speech/non-speech determination on the input audio signal carries out both the determination and the resulting sound quality adjustment internally, so the user cannot understand why the sound quality has changed. In particular, when the sound output as a result of sound quality adjustment based on such speech/non-speech determination is not to the user's liking, the user does not know what caused the adjustment and cannot change the setting, and is inevitably left dissatisfied.

The present invention has been made in view of the above circumstances. Its object is to provide a sound quality adjustment device that, when determining speech/non-speech for an input audio signal and adjusting the sound quality based on the determination result, allows the user to see that determination result, as well as a content display device including the sound quality adjustment device, programs for these devices, and a computer-readable recording medium on which such a program is recorded.

To solve the above problems, the present invention is constituted by the following technical means.
The first technical means is a sound quality adjustment device having: speech/non-speech determination means that performs a determination for discriminating whether an input audio signal corresponds to speech or to non-speech; sound quality adjustment means that adjusts the audio signal, as discriminated into speech or non-speech by the speech/non-speech determination means, to a sound quality that differs between speech and non-speech; and determination result display means that displays the determination result of the speech/non-speech determination means, characterized in that the determination result display means displays the determination result to the user in stages according to the degree of speech or non-speech.

The second technical means is characterized in that, in the first technical means, the sound quality adjustment means has adjustment setting means for setting whether or not to execute the sound quality adjustment based on the determination result of the speech/non-speech determination means, and the determination result display means displays the determination result only when the adjustment setting means is set to execute the sound quality adjustment.

The third technical means is characterized in that, in the first or second technical means, the determination result display means has display setting means for setting whether or not to display the determination result, and displays the determination result only when the display setting means is set to execute the determination result display.

The fourth technical means is a content display device comprising the sound quality adjustment device of any one of the first to third technical means and a content input device, characterized in that an audio signal included in content input through the content input device is input to the sound quality adjustment device, adjusted in sound quality, and output as sound, while a video signal included in the content is displayed and, as necessary, the determination result is displayed by the determination result display means.

The fifth technical means is a program for causing a computer to execute: a speech/non-speech determination step of performing a determination for discriminating whether an input audio signal corresponds to speech or to non-speech; a sound quality adjustment step of adjusting the audio signal, as discriminated into speech or non-speech in the speech/non-speech determination step, to a sound quality that differs between speech and non-speech; and a determination result display step of displaying the determination result of the speech/non-speech determination step, characterized in that the determination result display step displays the determination result to the user in stages according to the degree of speech or non-speech.

The sixth technical means is characterized in that, in the fifth technical means, the sound quality adjustment step includes an adjustment setting step of setting whether or not to execute the sound quality adjustment based on the determination result of the speech/non-speech determination step, and the determination result display step displays the determination result only when the adjustment setting step has set the sound quality adjustment to be executed.

The seventh technical means is characterized in that, in the fifth or sixth technical means, the determination result display step includes a display setting step of setting whether or not to display the determination result, and displays the determination result only when the display setting step has set the determination result display to be executed.

The eighth technical means is a computer-readable recording medium on which the program of any one of the fifth to seventh technical means is recorded.

According to the present invention, when speech/non-speech is determined for an input audio signal and the sound quality is adjusted based on the determination result, the user can be allowed to see that determination result.

The sound quality adjustment device according to the present invention includes speech/non-speech determination means, sound quality adjustment means, and determination result display means. The following description uses a preferred example in which monaural/stereo determination is performed and the criterion for speech/non-speech determination is optimized based on its result. The present invention can, of course, also be embodied without such monaural/stereo determination and optimization.

FIG. 1 is a block diagram showing a configuration example of a sound quality adjustment device according to one embodiment of the present invention. In the figure, 1 is the sound quality adjustment device, 10 is determination result display means, 11 is audio signal input means, 12 is speech/non-speech determination means, 13 is monaural/stereo determination means, 14 is criterion optimization means, 14a is a switch, 14b is means for setting the threshold to V_SL1, 14c is means for setting the threshold to V_SL2, 15 is sound quality adjustment means, and 16 is audio signal output means.

The speech/non-speech determination means 12 determines whether the audio signal input through the audio signal input means 11 corresponds to speech or to non-speech. The audio signal input means 11 is not limited in its input source or input method. The speech/non-speech determination means 12 may be implemented, in whole or in part, in either hardware or software.

The speech/non-speech determination means 12 preferably also makes use of the empirical rule that news programs and the like are generally broadcast in monaural, whereas commercials with music and music programs are often broadcast in stereo, and detects the monaural/stereo signal superimposed on the audio signal to judge whether the program currently being broadcast is better suited to speech or to non-speech (music). For this reason, the sound quality adjustment device described here includes monaural/stereo determination means 13 and criterion optimization means 14, which optimize the speech/non-speech determination; the acoustic parameters are then controlled based on that determination.

The monaural/stereo determination means 13 determines whether the input audio signal is a monaural signal or a stereo signal. The monaural/stereo determination means 13 may likewise be implemented, in whole or in part, in either hardware or software, and may make its determination simply from information such as the monaural/stereo switching that occurs when the audio signal is input. Furthermore, when the content from which the audio signal originates is listed in an electronic program guide (EPG) so that it can be scheduled for recording, the EPG also carries monaural/stereo information, and the determination can be made by acquiring that information.

The criterion optimization means 14 optimizes the determination criterion of the speech/non-speech determination means 12 based on the determination result of the monaural/stereo determination means 13. Optimizing the criterion of the automatic speech detection function by monaural/stereo determination in this way improves the accuracy of the detection. Speech/non-speech can therefore be discriminated accurately for the input audio signal; that is, speech/non-speech detection suited to whether the audio signal is monaural or stereo becomes possible. For example, the optimization can be controlled so that a monaural signal, such as news, is more readily determined to be speech, and a stereo signal, which often contains music including background music, is more readily determined to be non-speech. In this example it is assumed that monaural/stereo determination and criterion optimization have already been performed on the audio signal before its speech/non-speech determination, so that the determination is accurate. A delay or the like may be used for this purpose, or the monaural/stereo determination and criterion optimization may simply be performed successively each time an audio signal is input, followed by the speech/non-speech determination.

The determination by the speech/non-speech determination means 12 is preferably made by applying a plurality of signal analyses to the input audio signal. Examples of such analyses include analysis of the signal's energy change over time, analysis of syllable uniformity, and analysis of sound intensity versus frequency. These analyses yield, for example, (I) the energy change of the signal over time, (II) sound intensity versus frequency, (III) the order of vowels and consonants, (IV) syllable length, and (V) the energy of consonants and vowels.

Speech/non-speech may be determined from these results by taking into account, for example, the following points. (I) In speech, segments of low audio energy exist between syllables (which have high audio energy), whereas such segments are often absent in non-speech. (II) Speech is strong in the midrange of 100 Hz to 3 kHz, whereas non-speech is strong in the low and high ranges. (III) In speech, the order within a syllable often runs from consonant to vowel. (IV) In speech, syllable lengths are often uniform. (V) In speech, the energy of vowels is often greater than that of consonants. Further, (I) to (V) may be weighted and summed, subjected to statistical processing, and so on, to obtain a final signal analysis result. Speech/non-speech (for example, the degree of likelihood of speech) may then be determined by comparing that value with the threshold V_SL1 provided for the monaural case or with the threshold V_SL2 provided for the stereo case. As another method, the criterion optimization means 14 may change the set of thresholds applied to the individual signal analyses, which serves as the speech/non-speech determination criterion, based on the monaural/stereo determination.
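As a minimal sketch of the weighted determination described above, the following Python fragment combines five feature scores and compares the sum with a threshold chosen by the monaural/stereo result. The feature names, the weights, the numeric values of V_SL1 and V_SL2, and the convention that a higher score means "more speech-like" are all assumptions made for illustration; the specification gives none of these values.

```python
from dataclasses import dataclass


@dataclass
class FeatureScores:
    """Hypothetical per-feature speech-likeness scores, each in [0, 1]."""
    energy_gaps: float          # (I) low-energy segments between syllables
    midrange_strength: float    # (II) strength of the 100 Hz to 3 kHz band
    consonant_vowel: float      # (III) consonant-to-vowel ordering
    syllable_uniformity: float  # (IV) uniformity of syllable length
    vowel_dominance: float      # (V) vowel energy exceeding consonant energy


# Assumed weights (sum to 1.0) and thresholds; not taken from the source.
WEIGHTS = (0.3, 0.3, 0.1, 0.15, 0.15)
V_SL1 = 0.4  # monaural: lower threshold, speech is decided more readily
V_SL2 = 0.6  # stereo: higher threshold, non-speech is decided more readily


def speech_score(f: FeatureScores) -> float:
    """Weighted sum of the five feature scores."""
    values = (f.energy_gaps, f.midrange_strength, f.consonant_vowel,
              f.syllable_uniformity, f.vowel_dominance)
    return sum(w * v for w, v in zip(WEIGHTS, values))


def is_speech(f: FeatureScores, stereo: bool) -> bool:
    """Compare the score with the threshold for the mono or stereo case."""
    threshold = V_SL2 if stereo else V_SL1
    return speech_score(f) >= threshold
```

With these assumed values, the same borderline signal is treated as speech when it arrives in monaural and as non-speech when it arrives in stereo, which is the bias the text describes.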

The sound quality adjustment means 15 adjusts the audio signal, as discriminated into speech or non-speech by the configuration described above, to a sound quality that differs at least between speech and non-speech. Any method of setting the sound quality may be used, provided that the set values, the amounts of increase or decrease, or the set values in each frequency band differ according to the degree of likelihood of speech or non-speech. For example, the sound quality setting may be one in which the center frequency of the equalizer and the Q value of the filter (the sharpness of the peak or dip in the curve for one band of a graphic equalizer) are fixed, as in a graphic equalizer, or one in which these can also be changed, as in a parametric equalizer. The audio signal output means 16 then outputs the audio signal adjusted by the sound quality adjustment means 15.

The determination result display means 10, which is the characteristic feature of the present invention, displays the determination result of the speech/non-speech determination means 12 to the user in stages according to the degree of speech or non-speech (for example, the proportion of speech or the likelihood that the signal is speech). In practice, the speech/non-speech determination means 12 detects speech or non-speech, applies a predetermined threshold to the detection result, and decides whether the signal is speech or non-speech. The determination result display means 10 may display the speech detection result or non-speech detection result underlying that decision in stages according to its detection level (for example, the degree of speech). When such a staged display is used, it becomes more effective if multi-stage threshold processing is also performed (preferably with at least two sets of thresholds prepared according to the degree of monaural/stereo) and the sound quality is adjusted to match each stage.

The sound quality adjustment means 15 may also have adjustment setting means for setting whether or not the sound quality adjustment means 15 executes sound quality adjustment based on the determination result of the speech/non-speech determination means 12. Sound quality adjustment arising from anything other than speech/non-speech determination may be set separately. The adjustment setting means is set by user operation. The setting is made by the user selecting from options such as: (a) performing sound quality adjustment automatically based on speech/non-speech determination; (b) fixing the sound quality adjustment (for example, to the adjustment applied to a predetermined kind of speech); and (c) not performing sound quality adjustment (meaning only the adjustment based on speech/non-speech determination). Based on the user setting in the adjustment setting means, the sound quality adjustment means 15 performs the adjustment matching (a), (b), or (c), and the determination result display means 10 displays the determination result (detection result) in case (a) and does not display it in cases (b) and (c). In this way, the determination result display means 10 displays the determination result only when the adjustment setting means is set to execute the sound quality adjustment based on the determination. For example, when only a speech-oriented adjustment is applied, as in (b) above, the determination result is not displayed.

Furthermore, the determination result display means 10 may have display setting means for setting whether or not to display the determination result. In that case, the determination result display means 10 displays the determination result only when the display setting means is set to execute the determination result display. The display setting means may be provided whether or not the adjustment setting means described above is provided. When both are provided, the determination result display means 10 displays the determination result only when the adjustment setting means is set to execute sound quality adjustment based on the determination result and the display setting means is set to execute the determination result display.
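The combined effect of the adjustment setting and the display setting can be sketched as follows. This is only an illustration under stated assumptions: the enum `AdjustMode` and the function `should_show_result` are hypothetical names that do not appear in the specification, and the three members simply mirror options (a), (b), and (c) above.

```python
from enum import Enum


class AdjustMode(Enum):
    """Hypothetical encoding of the user-selectable options (a) to (c)."""
    AUTO = "a"   # adjust automatically from the speech/non-speech result
    FIXED = "b"  # fixed adjustment, e.g. always the speech-oriented setting
    OFF = "c"    # no adjustment based on the speech/non-speech result


def should_show_result(mode: AdjustMode, display_enabled: bool = True) -> bool:
    """Show the result only in mode (a) and only if display is enabled."""
    return mode is AdjustMode.AUTO and display_enabled
```

A device without display setting means corresponds to leaving `display_enabled` at its default, so that only the adjustment mode gates the display.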

FIG. 2 is a flowchart for explaining an example of the sound quality adjustment processing and the determination result display processing in the sound quality adjustment device of FIG. 1. FIG. 3 shows an example of the sound quality setting equalization used in the sound quality adjustment processing of that device. FIG. 4 shows an example of the screen display in the determination result display processing of FIG. 2.

For simplicity, the description below assumes that the speech/non-speech determination criterion is a single threshold. When multi-stage threshold processing is performed, "threshold" in the following description should be read as "set of thresholds". First, when an audio signal is input, the monaural/stereo determination means 13 performs monaural/stereo determination (step S1). For this determination, letting L be the left input signal and R the right input signal, the operation (L - R)/(L + R) may, for example, be applied to the input signal to perform a phase difference determination.
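One way to read the (L - R)/(L + R) operation is as a ratio between the difference and sum of the two channels, which is near zero when both channels carry the same signal. The sketch below is an assumption about how this could be computed over a block of samples: it sums absolute values so that the ratio is defined for a whole block, and the decision threshold of 0.05 is a hypothetical value, not one given in the specification.

```python
from typing import Sequence


def stereo_ratio(left: Sequence[float], right: Sequence[float]) -> float:
    """Ratio of summed |L - R| to summed |L + R| over one block of samples."""
    diff = sum(abs(l - r) for l, r in zip(left, right))
    total = sum(abs(l + r) for l, r in zip(left, right))
    if total == 0:
        # Silence gives 0.0; fully out-of-phase channels give infinity.
        return 0.0 if diff == 0 else float("inf")
    return diff / total


def is_stereo(left: Sequence[float], right: Sequence[float],
              threshold: float = 0.05) -> bool:
    """Treat the block as stereo when the channels differ noticeably."""
    return stereo_ratio(left, right) > threshold
```

Identical channels yield a ratio of zero and are classified as monaural, while channels with independent content yield a large ratio and are classified as stereo.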

When this determination finds the signal to be monaural, the criterion optimization means 14 connects the switch 14a to the side of the means 14b for setting the threshold to V_SL1, and the determination threshold of the speech/non-speech determination means 12 is set to V_SL1 (step S2). When step S1 finds the signal to be stereo, the criterion optimization means 14 connects the switch 14a to the side of the means 14c for setting the threshold to V_SL2, and the determination threshold of the speech/non-speech determination means 12 is set to V_SL2 (step S3). Optimizing the threshold setting in this way makes it possible to control the determination so that a monaural signal, such as news, is more readily determined to be speech, and a stereo signal, which often contains music including background music, is more readily determined to be non-speech. The configuration of the criterion optimization means 14 is not limited to the one illustrated.

Next, the speech/non-speech determination means 12 determines speech/non-speech based on the threshold V_SL1 or V_SL2 set in step S2 or S3 (step S4). When the signal is determined to be speech, sound quality setting A is selected and the sound quality is adjusted (step S5). When the signal is determined to be non-speech in step S4, sound quality setting B is selected and the sound quality is adjusted (step S6).
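Steps S1 to S6 can be summarized in a short sketch. It assumes that the signal analysis has already produced a single speech score and that a higher score means "more speech-like"; the function name `run_flow`, the `FlowResult` type, and the numeric thresholds are hypothetical and serve only to make the control flow concrete.

```python
from typing import NamedTuple

# Assumed threshold values; the specification does not give numbers.
V_SL1 = 0.4  # used for monaural input (step S2)
V_SL2 = 0.6  # used for stereo input (step S3)


class FlowResult(NamedTuple):
    threshold: float  # threshold selected in step S2 or S3
    is_speech: bool   # outcome of step S4
    setting: str      # "A" (step S5) or "B" (step S6)


def run_flow(score: float, stereo: bool) -> FlowResult:
    """Follow steps S1 to S6 for one already-analyzed block of audio."""
    threshold = V_SL2 if stereo else V_SL1  # S1 selects S2 or S3
    speech = score >= threshold             # S4: speech/non-speech decision
    setting = "A" if speech else "B"        # S5 or S6: pick the EQ setting
    return FlowResult(threshold, speech, setting)
```

For instance, under these assumptions a score of 0.5 selects setting A for monaural input and setting B for stereo input.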

An example of the difference between sound quality setting A and sound quality setting B will now be described with reference to FIG. 3. For sound quality setting A (speech), the frequency characteristic of the equalizer is set as shown by graph 21; for sound quality setting B (non-speech), it is set as shown by graph 22. Graph 22 differs from graph 21 in that, for non-speech, the vicinity of a predetermined low frequency 22a and the vicinity of a predetermined high frequency 22b are emphasized relative to speech.
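The relationship between the two settings can be illustrated with a table of per-band gains. The band frequencies and the 6 dB boost below are invented for illustration; FIG. 3 is not reproduced here and the specification states no numeric gains. The only property carried over from the text is that setting B emphasizes a low band and a high band relative to setting A.

```python
# Hypothetical band centers in Hz and gains in dB for each setting.
SETTING_A = {100: 0.0, 1000: 0.0, 10000: 0.0}  # speech: no low/high emphasis
SETTING_B = {100: 6.0, 1000: 0.0, 10000: 6.0}  # non-speech: low and high boosted


def band_gains(is_speech: bool) -> dict:
    """Return the per-band gain table in dB for the decided class."""
    return SETTING_A if is_speech else SETTING_B


def linear_gain(db: float) -> float:
    """Convert a gain in dB to a linear amplitude factor."""
    return 10 ** (db / 20)
```

A real equalizer would apply these gains through band filters; the sketch stops at selecting the gain table.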

The determination result is displayed before or after the processing of step S5 or S6 (at least after the speech/non-speech determination in step S4) (step S7). The result may be displayed with LEDs on the sound quality adjustment device, or, when the audio signal is input together with a video signal, as an OSD (On Screen Display) on the screen 31 that displays the video signal, as illustrated in FIG. 4.

When the determination result is displayed in step S7, it is displayed in stages so that the degree of speech (or of non-speech) found by the speech/non-speech determination can be seen. The minimum staged display corresponds to the case where speech/non-speech determination is performed with a single threshold and the sound quality is adjusted accordingly; it shows at least two stages, speech or non-speech. Taking as an example a display that lets the user see the degree of speech, as illustrated in FIG. 4, characters 32 or the like representing the "speech degree" may be displayed on the screen 31 together with a number of marks 33 corresponding to the speech degree (speech detection level). The number of marks 33 corresponds to the speech degree. The marks may also be called speech sensor marks, and in effect indicate how far the sound quality adjustment has been shifted toward speech. As an example of the mark 33, a green speech mark depicting the face of a person with an open mouth may be displayed. In addition, user settings may allow selection of the color (for example, green for Japanese and orange for English text) and of the shape (a speaker mark, a sine or cosine mark, flashing, and so on). In the example of FIG. 4, the characters 32 representing the "speech degree" show the name of the sound quality adjustment based on speech/non-speech determination (here named "Ikiiki Voice", literally "lively voice").

また、判定結果表示に際しては、マーク３３のごとく画面３１の下部に顔イメージを横方向に表示するようにしてもよいし、自動又は手動によって表示位置を任意の位置に移動できるようにすること、さらには縦型表示／横型表示を変更することも可能としておくとよい。また、表示位置を移動する方法として、例えば画面の下部や上部に文字が表示された場合は、それらの文字と重ならない位置に移動できるようにするとよい。より具体的には、例えば、音声多重放送の日本語の吹き替え表示や画面の下部にデータ放送のニュース情報等の文字表示と重ならない位置などに移動すればよい。また、ＥＰＧから番組種別情報（例えば歌番組かそれ以外の番組）を取得して、歌番組の場合に表示の大きさを小さく又は大きくするとともに、画面に表示される歌詞の表示と重ならない位置に表示するなどの応用も可能である。 When displaying the determination result, the face image may be displayed in the horizontal direction at the bottom of the screen 31 like the mark 33, or the display position may be moved to an arbitrary position automatically or manually. Furthermore, it is preferable that the vertical display / horizontal display can be changed. Further, as a method of moving the display position, for example, when characters are displayed at the lower or upper part of the screen, it is preferable that the display position can be moved to a position that does not overlap those characters. More specifically, for example, it may be moved to a position where it does not overlap with text display such as Japanese dubbing display of audio multiplex broadcasting or news information of data broadcasting at the bottom of the screen. In addition, the program type information (for example, a song program or other program) is acquired from the EPG, and in the case of a song program, the display size is reduced or increased, and the position that does not overlap with the display of lyrics displayed on the screen Applications such as displaying on the screen are also possible.

さらに、本発明は、上述のごとき音質調整装置とコンテンツ入力装置とを備えたコンテンツ表示装置（例えば、デジタル／アナログに限らずテレビジョン放送やラジオ放送の放送信号を受信する放送受信装置）にも適用可能である。このコンテンツ表示装置では、コンテンツ入力装置で入力されたコンテンツに含まれる音声信号を音質調整装置に入力し、音質を調整して音声出力し、且つ、コンテンツに含まれる映像信号を表示すると共に、必要に応じて判定結果表示手段１０による判定結果表示を行う。本発明に係るコンテンツ表示装置は、例えば、テレビジョン受信機をはじめ、コンテンツ再生プログラム，ビデオカード（ビデオアダプタともいう）等のモジュールを備えた汎用のパーソナルコンピュータ（以下、ＰＣと略す）などにも、後述するように適用可能である。また、本発明においては、コンテンツの配信及び放送形態は基本的に問わない。次に、音質調整装置を組み込んだコンテンツ表示装置の例としてテレビ受信機（テレビ受像機）を挙げて、より具体的に説明する。 Furthermore, the present invention is also applied to a content display device (for example, a broadcast receiving device that receives a broadcast signal of a television broadcast or a radio broadcast as well as a digital / analog) including the sound quality adjusting device and the content input device as described above. Applicable. In this content display device, the audio signal included in the content input by the content input device is input to the sound quality adjustment device, the sound quality is adjusted and the sound is output, the video signal included in the content is displayed, and necessary. In response to this, the determination result display means 10 displays the determination result. The content display device according to the present invention is also applicable to, for example, a general-purpose personal computer (hereinafter abbreviated as a PC) including modules such as a television receiver, a content reproduction program, and a video card (also referred to as a video adapter). It is applicable as described later. Further, in the present invention, the distribution and broadcasting form of the content are basically not questioned. Next, a television receiver (television receiver) is given as an example of a content display device incorporating a sound quality adjusting device, and will be described more specifically.

図５は、図１の音質調整装置における適用例の一つであるテレビ受像機の一構成例を示すブロック図で、図６は、図５におけるマイコン内に格納されているマーク表示目標テーブルの一例を示す図である。また、図７は、図５のテレビ受像機における判定結果表示処理を説明するためのフロー図で、図２のフロー図における判定結果表示処理を抜粋して詳細に説明するためのフロー図でもある。さらに、図８乃至図１０は、図１の音質調整装置における判定結果表示の設定画面の一例を示す図で、図８は音声調整の設定項目例を、図９は図８の設定項目例のうちの本発明に係る音質調整に対する動作設定の項目例を、図１０は図８の設定項目例のうちの本発明に係る音質調整に対する表示設定の項目例を、それぞれ示している。 FIG. 5 is a block diagram showing a configuration example of a television receiver, which is one application example of the sound quality adjustment device of FIG. 1, and FIG. 6 is a diagram showing an example of the mark display target table stored in the microcomputer of FIG. 5. FIG. 7 is a flowchart for explaining the determination result display processing in the television receiver of FIG. 5; it also extracts and details the determination result display processing in the flowchart of FIG. 2. FIGS. 8 to 10 show examples of setting screens for the determination result display in the sound quality adjustment device of FIG. 1: FIG. 8 shows example setting items for audio adjustment, FIG. 9 shows, among the setting items of FIG. 8, example operation setting items for the sound quality adjustment according to the present invention, and FIG. 10 shows, among the setting items of FIG. 8, example display setting items for that sound quality adjustment.

図５において、４はテレビ受像機本体、４０はチューナ部、４１は外部入力部、４２は本体操作部、４３は映像処理ＩＣ（ＩｎｔｅｇｒａｔｅｄＣｉｒｃｕｉｔ）、４４は本体のマイクロコンピュータ（以下、マイコン）、４５は音声処理ＩＣ、４６はディスプレイ、４７Ｌは左スピーカ、４７Ｒは右スピーカ、４８は受光部、４９はリモートコントローラユニット（以下、リモコン）である。また、図６、及び図８乃至図１０において、５は、マイコン４４内のＲＯＭ（ＲｅａｄＯｎｌｙＭｅｍｏｒｙ）等に格納されたスピーチセンサマーク表示目標テーブル、６は音声調整の設定画面例、６１は設定メニュー一覧、６２は音声調整項目一覧、６３は動作設定項目、６４は表示設定項目である。 In FIG. 5, 4 is a television receiver main body, 40 is a tuner unit, 41 is an external input unit, 42 is a main body operation unit, 43 is a video processing IC (Integrated Circuit), 44 is a main body microcomputer (hereinafter referred to as a microcomputer), 45 is an audio processing IC, 46 is a display, 47L is a left speaker, 47R is a right speaker, 48 is a light receiving unit, and 49 is a remote controller unit (hereinafter referred to as a remote controller). 6 and 8 to 10, 5 is a speech sensor mark display target table stored in a ROM (Read Only Memory) or the like in the microcomputer 44, 6 is an example of a voice adjustment setting screen, and 61 is a setting. A menu list, 62 is an audio adjustment item list, 63 is an operation setting item, and 64 is a display setting item.

ここで例示するテレビ受像機本体４は、主として、制御手段の一例としての本体マイコン４４、アンテナ及びチューナ部４０や外部入力部４１などの映像・音声入力部、入力した映像信号に対し各種映像処理を施す映像処理ＩＣ４３、入力した音声信号に対し各種音声処理を施す音声処理ＩＣ４５、ユーザ操作を受け付ける本体操作部４２、映像処理した映像信号を映し出すＬＣＤ，ＰＤＰ，有機ＥＬ等のディスプレイ（表示デバイス）４６、音声処理した音声信号を出力する左右のスピーカ４７Ｌ，４７Ｒ、リモコン４９からの光を受光する受光部４８により構成される。そして、マイコン４４内のＲＯＭ等には、スピーチセンサマーク表示目標テーブル５が格納されているものとする。なお、マイコン４４及び音声処理ＩＣ４５（及び映像処理ＩＣ４３）は、システムＬＳＩ（ＬａｒｇｅＳｃａｌｅＩｎｔｅｇｒａｔｅｄＣｉｒｃｕｉｔ）としても組み込むこともできる。 The television receiver main body 4 exemplified here mainly includes a main body microcomputer 44 as an example of a control means, a video / audio input unit such as an antenna and a tuner unit 40 and an external input unit 41, and various video processings for input video signals. A video processing IC 43 that performs various processing on the input audio signal, a main body operation unit 42 that accepts user operations, a display (display device) such as an LCD, PDP, or organic EL that displays the video signal that has undergone the video processing 46, left and right speakers 47L and 47R that output audio signals subjected to audio processing, and a light receiving unit 48 that receives light from the remote controller 49. The speech sensor mark display target table 5 is stored in the ROM or the like in the microcomputer 44. The microcomputer 44 and the audio processing IC 45 (and the video processing IC 43) can also be incorporated as a system LSI (Large Scale Integrated Circuit).

また、周期処理時間の設定を、テレビ受像機４における調整工程で設定しておく。この周期処理時間の設定は、本発明に係る判定結果表示処理を行うに際し、音声処理ＩＣ４５でなされるスピーチ／非スピーチの判定結果をマイコン４４で読み取る周期を設定する処理であり、例えば１００ｍｓ単位で読み取る設定しておくとよい。ここでは、例えば１００ｍｓ〜２０００ｍｓの間で可変としてもよく、調整工程だけでなくユーザ設定によっても可変としてもよい。このように読み取り時間をある程度固定しないと、判定結果表示の滑らかさに影響してしまう。実際にここで設定された周期で読み取られるデータとしては、例えばレジスタの可動範囲として−１６０〜＋１６０（ＦＦＦＦ６０〜００００００〜００００Ａ０）を用意しておき、このレジスタの初期設定値を「００００００」としておく。そして、音質調整自体は、このレジスタ値が正方向でスピーチ、負方向で非スピーチの音質設定となるように制御しておく。また、表示目標数を下式、並びに下式におけるＭＩＮ及びＭＡＸの値の設定などにより、予め設定しておく。ここで、各表示数の設定値は「以上未満」とする。なお、下式をスピーチセンサマーク表示目標テーブル５などとして格納しておけばよい。 The cyclic processing period is set in advance in the adjustment process of the television receiver 4. This setting determines the period at which the microcomputer 44 reads the speech/non-speech determination result produced by the audio processing IC 45 when the determination result display processing according to the present invention is performed; for example, it may be set so that the result is read every 100 ms. The period may be variable, for example between 100 ms and 2000 ms, and may be changed not only in the adjustment process but also by user setting. Unless the reading period is fixed to some extent in this way, the smoothness of the determination result display is affected. As the data actually read at this period, a register with a range of, for example, −160 to +160 (FFFF60 to 000000 to 0000A0) is prepared, and its initial value is set to "000000". The sound quality adjustment itself is controlled so that a positive register value gives a speech-oriented sound quality setting and a negative value gives a non-speech-oriented setting. The display target number is set in advance by the following expression and by the MIN and MAX values used in it. Here, the range for each display count is "greater than or equal to the lower boundary and less than the upper boundary". The following expression may be stored as the speech sensor mark display target table 5 or the like.

ＭＩＮ＋（ＭＡＸ−ＭＩＮ）×変数［１〜９］÷９ MIN + (MAX−MIN) × variables [1-9] ÷ 9

上式において、ＭＡＸ及びＭＩＮは、上述した例でいうところの−１６０〜＋１６０の間の値として予め設定される最大値及び最小値であり、例えばＭＩＮを０、ＭＡＸを１５０などと予め設定しておけばよい。さらに下式では、判定結果表示を１０段階（つまりＭＡＸ）で行うものとして、すなわち表示の個数の一例として図４のマーク３３が０〜１０個表示できるように予め設定されているものとして例示しているが、これに限ったものではない。 In the above expression, MAX and MIN are a maximum value and a minimum value set in advance within the range of −160 to +160 in the above example; for example, MIN may be set to 0 and MAX to 150. The expression is shown on the assumption that the determination result is displayed in 10 levels (up to MAX), that is, that 0 to 10 of the marks 33 in FIG. 4 can be displayed, but the display is not limited to this.
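The mapping from the register value to a number of marks can be sketched as follows. This is only an illustration under stated assumptions: the hexadecimal range FFFF60 to 0000A0 is read as a 24-bit two's-complement word (which is consistent with −160 to +160), the boundaries are taken as MIN + (MAX − MIN) × k ÷ 9 for k = 0 to 9 with each level inclusive at its lower boundary and exclusive at its upper boundary, and the function names `decode_register` and `mark_count` are hypothetical.

```python
def decode_register(raw24: int) -> int:
    """Interpret a 24-bit register word as a signed two's-complement value.

    Assumption: the register is 24 bits wide, so 0xFFFF60 reads as -160
    and 0x0000A0 reads as +160.
    """
    raw24 &= 0xFFFFFF
    return raw24 - 0x1000000 if raw24 & 0x800000 else raw24


def mark_count(value: int, vmin: int = 0, vmax: int = 150, steps: int = 9) -> int:
    """Return how many marks (0 .. steps + 1) to show for a register value.

    Boundaries are vmin + (vmax - vmin) * k / steps for k = 0 .. steps.
    A value below vmin shows no mark; a value at or above vmax shows all marks.
    Requires vmax > vmin.
    """
    span = vmax - vmin
    # Count the boundaries that the value has reached, using integer
    # arithmetic so that no rounding of the division by `steps` is involved.
    return sum(1 for k in range(steps + 1) if (value - vmin) * steps >= span * k)


if __name__ == "__main__":
    for raw in (0xFFFF60, 0x000000, 0x000011, 0x0000A0):
        value = decode_register(raw)
        print(f"{raw:06X} -> {value:+d} -> {mark_count(value)} marks")
```

With the example values MIN = 0 and MAX = 150, a negative (non-speech) register value shows no marks and any value of 150 or more shows all ten.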

上述のごときテレビ受像機４におけるマイコン４４の処理は、まず、上述のごとく設定された周期での周期処理（例えば１００ｍｓ単位）を行う（ステップＳ１１）。ステップＳ１１では、処理周期の到来によって、以下のステップＳ１２〜Ｓ２２を実行させることになる。まずステップＳ１２では、マイコン４４で読み取った音声処理ＩＣ４５における判定結果を上式（テーブル５）に代入することで、表示目標値を設定、すなわち表示数を決定する。 In the processing of the microcomputer 44 in the television receiver 4 as described above, first, periodic processing (for example, in units of 100 ms) is performed with the period set as described above (step S11). In step S11, the following steps S12 to S22 are executed according to the arrival of the processing cycle. First, in step S12, the display target value is set, that is, the number of displays is determined by substituting the determination result in the voice processing IC 45 read by the microcomputer 44 into the above equation (Table 5).

ここで、同期無し時及び無音時は表示を即時に“０”とする（ステップＳ１３，Ｓ１４）。ステップＳ１３において、入力信号の同期の有無の判定及び無音状態の判定を行い、入力信号同期が無かった場合或いは無音状態であった場合、ステップＳ１４において「強制的に“０”」とする計算を行って、ステップＳ２０へ進む。無音状態の判定については他の実施形態で後述する。なお、ステップＳ１３の判断及びステップＳ１４における計算は、例えばユーザがニュース番組を視聴していて次に選曲によって砂嵐の画面が表示された場合などに有効である。このような場合、またスピーチ／非スピーチの判定結果としては例えばスピーチであるとの判定結果（例えばレジスタ値が＋１６０）が徐々に０に落ちてはいくがレジスタに残ってしまっており、周期的な表示がそのレジスタ値（その残った値）を読み取って実行するようになっていることから、スピーチ／非スピーチの判定が実行できない砂嵐に対しても実行されているようにユーザが勘違いしてしまう。従って、このような勘違いを防止するために強制的にレジスタ値を０にする必要がある。 Here, when there is no synchronization or when there is silence, the display is immediately set to "0" (steps S13 and S14). In step S13, the presence or absence of input signal synchronization and the silent state are determined; if there is no input signal synchronization or the input is silent, a calculation that forces the value to "0" is performed in step S14, and the process proceeds to step S20. The determination of the silent state is described later in another embodiment. The determination in step S13 and the calculation in step S14 are effective, for example, when the user has been watching a news program and then changes the channel so that a noise-only ("sandstorm") screen is displayed. In such a case, the previous determination result, for example speech (register value +160), gradually falls toward 0 but still remains in the register, and the periodic display reads and uses that remaining value. The user may therefore mistakenly believe that the speech/non-speech determination is being performed even on the noise screen, for which it cannot be performed. To prevent this misunderstanding, the register value must be forcibly set to 0.

一方、ステップＳ１３でＮＯの場合、前周期の表示数がステップＳ１２で設定された表示目標値であるか否かを判定する（ステップＳ１５）。ステップＳ１５でＹＥＳの場合、その表示数を維持し（ステップＳ１６）、ステップＳ２０へ進む。ステップＳ１５でＮＯの場合、前周期の表示数がステップＳ１２で設定された表示目標値より小さいか否かを判定する（ステップＳ１７）。ステップＳ１７でＹＥＳの場合、「前周期の表示数＋１」の計算を実行し（ステップＳ１８）、ステップＳ２０へ進む。ステップＳ１７でＮＯの場合、「前周期の表示数−１」の計算を実行し（ステップＳ１９）、ステップＳ２０へ進む。 On the other hand, in the case of NO in step S13, it is determined whether or not the display number of the previous cycle is the display target value set in step S12 (step S15). If YES in step S15, the display number is maintained (step S16), and the process proceeds to step S20. In the case of NO in step S15, it is determined whether or not the display number of the previous cycle is smaller than the display target value set in step S12 (step S17). If “YES” in the step S17, a calculation of “the display number of the previous cycle + 1” is executed (step S18), and the process proceeds to the step S20. In the case of NO in step S17, the calculation of “display number of previous period−1” is executed (step S19), and the process proceeds to step S20.

そして、ステップＳ１４，Ｓ１６，Ｓ１８，Ｓ１９の後、表示数を前周期の表示数に格納し（ステップＳ２０）、表示するか否かの判定を行って（ステップＳ２１）、表示すると判定された場合には画面に表示を行い（ステップＳ２２）、そうでない場合にはそのままこの周期での処理を終了して次の周期の到来を待つ。このように、マイコン４４では、ＲＯＭ内に格納されたテーブル５を元に、上述のごとき周期処理及び計算がなされる。 Then, after Steps S14, S16, S18, and S19, the display number is stored in the display number of the previous period (Step S20), and it is determined whether or not to display (Step S21). Is displayed on the screen (step S22), otherwise, the processing in this cycle is terminated and the next cycle is awaited. As described above, the microcomputer 44 performs the periodic processing and calculation as described above based on the table 5 stored in the ROM.
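The per-cycle update in steps S13 to S20 can be sketched as a small function. This is a minimal illustration, not the microcomputer's actual firmware; the name `next_display_count` and the boolean flag are hypothetical. It shows why the display looks smooth: the count moves toward the target by at most one mark per cycle, except for the forced reset to zero.

```python
def next_display_count(previous: int, target: int, no_sync_or_silent: bool) -> int:
    """Return the mark count to show in this cycle.

    previous: count shown in the previous cycle (the value stored in S20)
    target: display target value computed in S12
    no_sync_or_silent: result of the check in S13
    """
    if no_sync_or_silent:       # S13 YES -> S14: force the display to 0 at once
        return 0
    if previous == target:      # S15 YES -> S16: keep the current count
        return previous
    if previous < target:       # S17 YES -> S18: step up by one
        return previous + 1
    return previous - 1         # S17 NO  -> S19: step down by one


if __name__ == "__main__":
    count = 0
    # Hypothetical sequence of (target, no_sync_or_silent) pairs, one per cycle.
    cycles = [(3, False), (3, False), (3, False), (3, False), (1, False), (1, True)]
    for target, muted in cycles:
        count = next_display_count(count, target, muted)  # S20 stores the result
        print(count)
```

Whether the stored count is actually drawn on the screen is a separate decision (steps S21 and S22).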

次に、ステップＳ２１における判定に関して説明する。この判定は、デフォルト値或いはユーザ設定を読み取ることでなされる。ここで、ユーザ設定は、上述した調整設定手段並びに表示設定手段における設定がそれに相当し、次のような手順でなされる。まず、図８に示すようにユーザメニュー一覧６１（映像調整，音声調整，本体設定，機能切替）を表示し、ユーザが音声調整を選択することで、音声調整に関する項目一覧６２（高音，低音，バランス，サラウンド，いきいきボイス，リセット）を表示する。ユーザが、その中から本発明に係る音質調整（「いきいきボイス」６２ａ）を選択することで、図９或いは図１０のように、動作設定項目（調整設定手段における設定項目）６３及び表示設定項目６４（表示設定手段における設定項目）を表示する。 Next, the determination in step S21 will be described. This determination is made by reading a default value or a user setting. Here, the user setting corresponds to the setting in the adjustment setting means and the display setting means described above, and is performed in the following procedure. First, as shown in FIG. 8, a user menu list 61 (video adjustment, audio adjustment, main unit setting, function switching) is displayed, and when the user selects audio adjustment, an item list 62 relating to audio adjustment (high, low, (Balance, surround, lively voice, reset) is displayed. When the user selects the sound quality adjustment (“live voice” 62a) according to the present invention from among them, the operation setting item (setting item in the adjustment setting means) 63 and the display setting item as shown in FIG. 9 or FIG. 64 (setting item in the display setting means) is displayed.

動作設定項目６３としては、例えば、本発明に係る音質調整を行わない設定に相当する「切」６３ａ、スピーチ／非スピーチの判定無しで或いは判定に依らずにスピーチ（又は非スピーチ）寄りの音質に調整するための設定に相当する「固定」６３ｂ、及び自動でスピーチ／非スピーチの判定並びにその判定結果に基づく音質調整を行う設定に相当する「自動」６３ｃを用意しておく。そして、「動作設定」が「自動」６３ｃの時にスピーチセンサマークを表示し、「固定」６３ｂ，「切」６３ａの時にはスピーチセンサマークを表示しない。なお、フローのように、「切」６４ａに設定されている時でもデータの読み取りを行っておくとよい。一方、表示設定項目６４としては、「表示なし」６４ａ及び「表示あり」６４ｂを用意しておき、「表示設定」が「表示あり」６４ｂの時だけ、スピーチセンサマークを表示する。勿論、設定周期（例えば１００ｍｓ単位）毎にデータを読み取って画面下部にスピーチセンサマークを表示すること自体を、「表示あり」６４ｂに設定されている時のみ実行してもよい。 As the operation setting items 63, for example, the following are provided: "OFF" 63a, corresponding to a setting in which the sound quality adjustment according to the present invention is not performed; "Fixed" 63b, corresponding to a setting that adjusts to a speech-oriented (or non-speech-oriented) sound quality without, or regardless of, the speech/non-speech determination; and "Auto" 63c, corresponding to a setting that automatically performs the speech/non-speech determination and the sound quality adjustment based on its result. The speech sensor mark is displayed when the operation setting is "Auto" 63c and is not displayed when it is "Fixed" 63b or "OFF" 63a. As in the flowchart, the data should preferably be read even when "OFF" 64a is set. As the display setting items 64, "No display" 64a and "Display" 64b are provided, and the speech sensor mark is displayed only when the display setting is "Display" 64b. Of course, reading the data at every set period (for example, every 100 ms) and displaying the speech sensor mark at the bottom of the screen may itself be performed only when "Display" 64b is set.
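The decision in step S21 can be read as a combination of the two settings described above. The following sketch assumes that the two conditions are combined with a logical AND, which the text suggests but does not state as a formula; the enum and function names are hypothetical.

```python
from enum import Enum


class OperationSetting(Enum):
    """Operation setting items 63 (names chosen for this sketch)."""
    OFF = "off"      # 63a: no sound quality adjustment
    FIXED = "fixed"  # 63b: fixed speech-oriented (or non-speech-oriented) quality
    AUTO = "auto"    # 63c: automatic determination and adjustment


def should_show_mark(operation: OperationSetting, display_on: bool) -> bool:
    """Return True when the speech sensor mark should be drawn (step S21).

    The mark is shown only in AUTO operation, and only when the display
    setting 64 is "Display" (64b), represented here by display_on=True.
    """
    return operation is OperationSetting.AUTO and display_on


if __name__ == "__main__":
    for op in OperationSetting:
        for shown in (False, True):
            print(op.name, shown, should_show_mark(op, shown))
```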

上述のごとき構成及び処理により、本実施形態では、入力された音声信号に対してスピーチ／非スピーチを判定する際に、その判定結果をユーザに視認させることが可能となる。このような判定結果をユーザに視認させることによって、その判定結果に基づいて処理されている音質調整の要因もユーザに把握させることが可能となる。また、その視認によって、さらなるユーザ設定も可能になる。また、スピーチ／非スピーチを判定する際にモノラル／ステレオ判定を行うことで、音声信号の音声情報だけからではなく番組（その音声信号を含む番組）の主旨に沿った判断（スピーチ／非スピーチの判断）も同時になすことで、入力された音声信号の特性によるイコライザ等の音響パラメータ制御の誤判定を極力低減し、的確な音響パラメータの制御及び的確な音質調整が可能となる。また、例えば、音声信号に音声情報と同時に重畳されたモノラル／ステレオ信号によってその番組の主旨を判定し、その結果に応じて入力された音声信号がスピーチか非スピーチ（音楽）かを判断するための判断基準を最適化することによって、放送された番組の内容、特性に応じたスピーチ／非スピーチ検出の自由な制御、及びその制御に基づく機器の制御（例えば音質調整や分別録画等）も可能になる。 With the configuration and processing described above, in this embodiment, when speech/non-speech is determined for an input audio signal, the determination result can be made visible to the user. By letting the user see the determination result, the user can also understand what is driving the sound quality adjustment performed on the basis of that result. Seeing the result also enables further user settings. In addition, by performing a monaural/stereo determination when determining speech/non-speech, the speech/non-speech judgment is made not only from the audio information of the audio signal but also in line with the nature of the program containing that signal. This minimizes erroneous control of acoustic parameters such as an equalizer caused by the characteristics of the input audio signal, and enables accurate acoustic parameter control and accurate sound quality adjustment. Further, for example, by determining the nature of the program from the monaural/stereo signal superimposed on the audio signal together with the audio information, and optimizing, according to the result, the criterion for judging whether the input audio signal is speech or non-speech (music), it becomes possible to control speech/non-speech detection freely according to the content and characteristics of the broadcast program, and to control equipment on the basis of that detection (for example, sound quality adjustment or classified recording).

また、本実施形態に係るコンテンツ表示装置では、例えば、スピーチ自動検出機能を使用し、ＴＶ番組やビデオ／ＤＶＤ等がスピーチ音声か非スピーチ音声かを視覚的に認識できる表示機能を備えることで、現在表示しているコンテンツがスピーチ音声か非スピーチ音声かをユーザに視覚的に認識させることが可能となる。すなわち、リアルタイムにＴＶ番組やビデオ／ＤＶＤ等の音声体系（スピーチ／非スピーチ）が視覚的にわかる。また、上述したスピーチ／非スピーチの判定をコンテンツの記録（再録画も含む）に適用してもよく、その場合には、コンテンツ表示装置に、コンテンツを放送経由，ネットワーク経由，記録媒体経由などで取得するだけでなく取得したコンテンツを記録或いは予約記録する機能を付加しておくとよい。例えば、各種レコーダなどでスピーチ／非スピーチ判定をＣＭ判定やその他の分別録画に利用することもでき、そのときに、併せてそのコンテンツがスピーチに相当するのか、或いは非スピーチに相当するのかをユーザに視認可能なように表示すればよい。 In addition, the content display device according to the present embodiment includes a display function capable of visually recognizing whether a TV program, a video / DVD, or the like is a speech sound or a non-speech sound by using a speech automatic detection function, for example. It is possible to make the user visually recognize whether the currently displayed content is speech sound or non-speech sound. That is, an audio system (speech / non-speech) such as a TV program or video / DVD can be visually recognized in real time. In addition, the speech / non-speech determination described above may be applied to content recording (including re-recording). In this case, the content is transmitted to the content display device via broadcast, via a network, via a recording medium, or the like. It is preferable to add a function to record or reserve record the acquired content as well as the acquired content. For example, speech / non-speech determination can be used for CM determination and other separate recordings with various recorders, etc., and at that time, whether the content corresponds to speech or non-speech at the same time It may be displayed so as to be visible.

また、図１乃至図１０で上述した音質調整装置１やテレビ受像機４等のコンテンツ表示装置、さらにはそれらの構成要素となる各手段は、上述したように、ハードウェアで構成してもよいがその一部をソフトウェアで構成してもよい。例えば、図５のマイコンで示したようなコンピュータやＰＣ等の汎用コンピュータなどにプログラムを組み込むことで構成してもよく、その場合の各種処理について、図１１に示す一般的な情報処理装置の構成例を参照して説明する。図１１は、一般的な情報処理装置の構成例を示すブロック図で、図中、７は情報処理装置、７１はＣＰＵ（ＣｅｎｔｒａｌＰｒｏｃｅｓｓｉｎｇＵｎｉｔ）、７２はＲＡＭ（ＲａｎｄｏｍＡｃｃｅｓｓＭｅｍｏｒｙ）、７３は書き換え可能なＲＯＭ、７４は入力装置、７５は表示装置、７６は出力装置、７７はバスである。 Further, the content display devices such as the sound quality adjusting device 1 and the television receiver 4 described above with reference to FIGS. 1 to 10, and further, the respective constituent elements thereof may be configured with hardware as described above. However, a part thereof may be configured by software. For example, the program may be incorporated in a general-purpose computer such as a computer or a PC as shown by the microcomputer in FIG. 5, and the configuration of the general information processing apparatus shown in FIG. This will be described with reference to an example. FIG. 11 is a block diagram showing a configuration example of a general information processing apparatus. In the figure, 7 is an information processing apparatus, 71 is a CPU (Central Processing Unit), 72 is a RAM (Random Access Memory), and 73 is rewritable. ROM, 74 is an input device, 75 is a display device, 76 is an output device, and 77 is a bus.

また、コンピュータを本発明に係る装置や各手段として機能させるためのプログラム、或いは各処理ステップをコンピュータに実行させるためのプログラムは、ＲＯＭ７３に蓄積されており、ＣＰＵ７１が読み出すことによって実行される。コンピュータ等に搭載される場合のこのプログラムは、上述の各手段としてコンピュータのＣＰＵ７１等を制御するプログラム（コンピュータを機能させるプログラム）である。本発明に係る装置や各手段で取り扱われる情報は、その処理時に一時的にＲＡＭ７２に蓄積され、その後、各種ＲＯＭ７３に格納され、必要に応じて、ＣＰＵ７１によって読み出し、修正・書き込みが行われる。ここで本発明に関連する情報としては、ユーザ選択された項目の情報や、閾値や入力装置７４の一つとしての音声信号入力手段によって入力され信号解析される時の音声信号などが挙げられる。また、例えばＲＯＭ７３に記憶された設定選択肢のうち設定された値をＲＡＭ７２に読み出すことでその設定をその間維持するようにしてもよい。 A program for causing a computer to function as the device or each means according to the present invention, or a program for causing a computer to execute each processing step, is stored in the ROM 73 and is executed by being read out by the CPU 71. When installed in a computer or the like, this program controls the CPU 71 and other parts of the computer so that they act as each of the means described above (that is, it causes the computer to function as those means). Information handled by the device and each means according to the present invention is temporarily held in the RAM 72 during processing, then stored in the various ROMs 73, and read out, modified, and written by the CPU 71 as necessary. Information related to the present invention includes information on items selected by the user, thresholds, and the audio signal that is input through the audio signal input means, one of the input devices 74, and then analyzed. Also, for example, a selected value among the setting options stored in the ROM 73 may be read into the RAM 72 so that the setting is maintained in the meantime.

また、処理の途中経過や結果は、ＬＣＤ，ＰＤＰ，有機ＥＬ，ＣＲＴ等の表示装置７５を通して装置ユーザに提示され、ユーザ設定が必要な場合には、キーボード，マウス（ポインティングデバイス）等の入力装置７４から装置ユーザが処理に必要なパラメータを入力指定或いは選択入力すればよい（例えば入力する音声信号或いはそれを含むコンテンツの指定、各種ユーザ設定項目の選択など）。また、このプログラムは、装置ユーザが使用する際に容易となるように、表示装置７５用のグラフィカルユーザインターフェース（ＧＵＩ）を備えるようにするとよい。ＧＵＩの例は、図８乃至図１０でも例示している。出力装置７６としては、音声信号の出力装置であるスピーカをはじめとして、ネットワークに接続して通信を行うためのネットワークボード等の通信機器や、その他、印刷装置等の出力デバイス用の出力装置がある。なお、ＣＰＵ７１，ＲＡＭ７２，ＲＯＭ７３，入力装置７４，表示装置７５，出力装置７６は、バス７７などで接続されていればよい。 The progress and results of processing are presented to the device user through a display device 75 such as an LCD, PDP, organic EL, or CRT display. When user settings are required, the device user may enter or select the parameters needed for processing from an input device 74 such as a keyboard or a mouse (pointing device), for example to specify the audio signal to be input or the content containing it, or to select the various user setting items. The program should preferably provide a graphical user interface (GUI) for the display device 75 so that it is easy for the device user to use. Examples of the GUI are illustrated in FIGS. 8 to 10. The output device 76 includes a speaker, which outputs the audio signal, as well as communication equipment such as a network board for connecting to and communicating over a network, and output devices such as a printer. The CPU 71, RAM 72, ROM 73, input device 74, display device 75, and output device 76 need only be connected by a bus 77 or the like.

また、上述のごときプログラムを記録した記録媒体としては、具体的には、ＣＤ−ＲＯＭ、光磁気ディスク、ＤＶＤ−ＲＯＭ、ＦＤ、フラッシュメモリ、及びその他各種ＲＯＭ（書き換え可能なＲＯＭも含む）やＲＡＭ等が想定でき、上述した本発明の各実施形態の機能をコンピュータに実行させるプログラムを、これら記録媒体に記録して流通させることにより、当機能の実現を容易にする。そして、コンピュータ等の情報処理装置に、上述のごとくの記録媒体を装着して情報処理装置によりプログラムを読み出すか、若しくは情報処理装置が備えている記録媒体に当プログラムを記憶させておき、必要に応じて読み出すことにより、本発明に係わる機能を実行することができる。 Further, as a recording medium on which the program as described above is recorded, specifically, a CD-ROM, a magneto-optical disk, a DVD-ROM, an FD, a flash memory, and various other ROMs (including a rewritable ROM) and a RAM The above functions can be easily realized by recording and distributing a program for causing a computer to execute the functions of the above-described embodiments of the present invention on these recording media. Then, the information processing apparatus such as a computer is loaded with the recording medium as described above, and the program is read by the information processing apparatus, or the program is stored in the recording medium included in the information processing apparatus, By reading in response, the function according to the present invention can be executed.

図１２は、本発明の他の実施形態に係る音質調整装置の一構成例を示すブロック図で、図中、８は音質調整装置、８０は判定結果表示手段、８１は音声信号入力手段、８２はスピーチ／非スピーチ判定手段、８３は有音／無音判定手段、８５は音質調整手段、８６は音声信号出力手段である。 FIG. 12 is a block diagram showing a configuration example of a sound quality adjusting apparatus according to another embodiment of the present invention, in which 8 is a sound quality adjusting apparatus, 80 is a determination result display means, 81 is an audio signal input means, and 82. Is speech / non-speech determination means, 83 is sound / silence determination means, 85 is sound quality adjustment means, and 86 is audio signal output means.

本実施形態に係る音質調整装置８は、有音／無音判定手段８３、スピーチ／非スピーチ判定手段８２、音質調整手段８５、及び判定結果表示手段８０を備えるものとする。有音／無音判定手段８３は、音声信号入力手段８１で入力された音声信号が有音の状態か無音の状態かを判定する。音声信号入力手段８１では、その入力元や入力方法は問わない。また、有音／無音判定手段８３では、例えば入力音声信号の信号レベルを検出すること（所定レベル以上を有音とするなど）で、有音／無音のいずれの状態であるかを判定すればよい。なお、有音／無音判定手段８３は、その全体又は一部をハードウェアで構成してもソフトウェアで構成してもよい。 The sound quality adjustment apparatus 8 according to the present embodiment includes a sound / silence determination unit 83, a speech / non-speech determination unit 82, a sound quality adjustment unit 85, and a determination result display unit 80. The voice / silence determination unit 83 determines whether the voice signal input by the voice signal input unit 81 is in a voiced state or in a silent state. In the audio signal input means 81, the input source and input method are not limited. In addition, the sound / silence determination means 83 detects the state of sound / silence, for example, by detecting the signal level of the input sound signal (eg, a sound level above a predetermined level). Good. The voice / silence determination means 83 may be configured in whole or in part by hardware or software.
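A level-based sound/silence determination of the kind described for means 83 might look like the sketch below. The text only says that the signal level is compared with a predetermined level; the use of an RMS level in dB relative to full scale, the −50 dB default, and the name `is_voiced` are assumptions made for illustration.

```python
import math
from typing import Sequence


def is_voiced(samples: Sequence[float], threshold_db: float = -50.0) -> bool:
    """Return True if a block of samples is at or above the level threshold.

    samples: audio samples normalized to the range -1.0 .. 1.0 (assumed)
    threshold_db: hypothetical threshold in dB relative to full scale
    """
    if not samples:
        return False
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms == 0.0:
        return False  # digital silence; avoids taking log10 of zero
    return 20.0 * math.log10(rms) >= threshold_db


if __name__ == "__main__":
    print(is_voiced([0.0] * 480))         # all-zero block
    print(is_voiced([0.5, -0.5] * 240))   # roughly -6 dB block
    print(is_voiced([0.001] * 480))       # roughly -60 dB block
```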

音質調整手段８５は、有音／無音判定手段８３での判定結果に基づいて、音声信号を有音と無音とで異なる音質に設定し、その設定に基づいて音質を調整する。なお、音質調整手段８５は、その全体又は一部をハードウェアで構成してもソフトウェアで構成してもよい。そして、音質調整手段８５による無音時の音質設定は、有音／無音判定手段８３で無音と判定された直前の有音時の音質設定に基づき、その一部のみの変更により行う。例えば、無音の場合には所定の低域帯及び所定の高域帯の出力レベルを有音の場合に比べ１〜２ｄＢ下げるなどすればよい。一部のみの変更により、直前の有音時の設定値に近い設定値で調整することとなり、無音時から再度有音状態に移行した際、この状態が上記直前の有音時と近い信号レベルを持つ状態と想定されることから、設定値の変更が一部で済み、素早い復帰が可能となる。なお、この効果は、音質調整手段８５に基づく音質の設定をハードウェアで構成することでより顕著になる。そして、音声信号出力手段８６は、音質調整手段８５で調整された音声信号を出力する。 The sound quality adjustment means 85 sets different sound qualities for the voiced and silent states based on the determination result of the sound/silence determination means 83, and adjusts the sound quality according to that setting. The sound quality adjustment means 85 may be implemented wholly or partly in hardware or in software. The sound quality setting for the silent state is made by changing only part of the sound quality setting that was in effect in the voiced state immediately before the sound/silence determination means 83 determined silence. For example, in the silent state, the output levels of a predetermined low band and a predetermined high band may be lowered by 1 to 2 dB compared with the voiced state. Because only part of the setting is changed, the adjustment uses values close to those of the immediately preceding voiced state. When the input returns from silence to the voiced state, that state is expected to have a signal level close to that of the preceding voiced state, so only part of the settings needs to change and a quick return is possible. This effect is more pronounced when the sound quality setting of the sound quality adjustment means 85 is implemented in hardware. The audio signal output means 86 outputs the audio signal adjusted by the sound quality adjustment means 85.
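The idea of deriving the silent-state setting from the last voiced-state setting by a partial change can be sketched as follows. The band names, the gain values in dB, and the function name `silent_settings` are hypothetical; only the 1 to 2 dB reduction of a low band and a high band comes from the text.

```python
from typing import Dict, Iterable


def silent_settings(last_voiced: Dict[str, float],
                    bands_to_lower: Iterable[str] = ("low", "high"),
                    drop_db: float = 2.0) -> Dict[str, float]:
    """Derive silent-state equalizer gains from the last voiced-state gains.

    Only the listed bands are changed, each lowered by drop_db; every other
    band keeps its voiced-state value, so returning to the voiced state
    requires restoring just those few bands.
    """
    adjusted = dict(last_voiced)  # copy so the voiced-state setting is preserved
    for band in bands_to_lower:
        if band in adjusted:
            adjusted[band] -= drop_db
    return adjusted


if __name__ == "__main__":
    voiced = {"low": 3.0, "mid": 1.0, "high": 2.0}  # hypothetical gains in dB
    print(silent_settings(voiced))
    print(voiced)  # unchanged, ready to be restored on the next voiced block
```

Keeping the voiced-state setting untouched reflects the quick-return behavior described above: only the bands that were lowered need to be rewritten when sound resumes.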

また、スピーチ／非スピーチ判定手段８２については、図１で説明した通りであるが、ここではモノラル／ステレオ判定に基づく閾値の最適化を行わない例を示している。なお、モノラル／ステレオ判定によってスピーチ自動検出機能の判定基準を最適化させる方が、検出機能の精度を向上させることができる。また、スピーチ／非スピーチ判定手段８２の代わりに、ＥＰＧ情報によってコンテンツの詳細な時系列の情報を取得するよう構成してもよく、その場合にはその取得した情報を元に判定結果表示も行うこととなる。また、スピーチ／非スピーチ判定手段８２の配置は、図１２で示したものに限らない。そして、この形態における音質調整手段８５は、スピーチ／非スピーチ判定手段８２における判定結果に基づいて、スピーチと非スピーチとで、上記一部のみの変更の値を異ならしめればよい。 Further, the speech / non-speech determination means 82 is as described with reference to FIG. 1, but here, an example in which threshold optimization based on monaural / stereo determination is not performed is shown. Note that the accuracy of the detection function can be improved by optimizing the determination standard of the automatic speech detection function by monaural / stereo determination. Further, instead of the speech / non-speech determination means 82, it may be configured to acquire detailed time-series information of the content based on the EPG information. In this case, the determination result is also displayed based on the acquired information. It will be. Further, the arrangement of the speech / non-speech determination means 82 is not limited to that shown in FIG. Then, the sound quality adjusting means 85 in this embodiment may make the change values of only a part of the speech and non-speech different based on the determination result in the speech / non-speech determination means 82.

ここでの音質設定の方法は任意であり、スピーチ／非スピーチにより、その設定値や増減の設定値、或いは各周波数帯での設定値などが異なっていればよい。例えば、グラフィックイコライザのごときイコライザの中心周波数とフィルタのＱ値が固定されている音質設定や、パラメトリックイコライザのごとくこれらも変更可能な音質設定であってもよいが、上述したように、基本的に有音から無音に移行した際の音質設定は直前の有音時のそれに一部変更したものとなる。さらに、上記一部のみの変更は、無音の場合には所定の低域帯及び所定の高域帯の出力レベルを有音の場合に比べ１〜２ｄＢ下げるなどとして例示したように、一部の周波数帯域で局所的に出力レベルを低減させる変更とすることが好ましい。 The sound quality setting method here is arbitrary, and it suffices if the setting value, the increase / decrease setting value, or the setting value in each frequency band differs depending on speech / non-speech. For example, a sound quality setting such as a graphic equalizer in which the center frequency of the equalizer and the Q value of the filter are fixed, or a sound quality setting that can be changed like a parametric equalizer may be used. The sound quality setting at the time of transition from sound to silence is partially changed from that at the previous sound. In addition, only a part of the change described above is illustrated as a case where the output level of the predetermined low-frequency band and the predetermined high-frequency band is lowered by 1 to 2 dB in the case of silence as compared to the case of sound. It is preferable to change the output level locally in the frequency band.

The determination result display means 80 is a means for letting the user see the result of the speech / non-speech determination; it may likewise let the user see the result of the sound / silence determination.

FIG. 13 is a flowchart for explaining an example of the sound quality adjustment process in the sound quality adjustment device of FIG. 12, and FIG. 14 is a diagram showing an example of the sound quality setting equalization used in that process. FIG. 14(A) shows an example for speech, and FIG. 14(B) shows an example for non-speech.

The following description assumes that the sound quality is initially set to the basic sound quality. It focuses on an example in which speech / non-speech is determined from the audio signal, and the sound quality is set to A when speech is determined and to B when non-speech is determined.

First, the sound / silence determination means 83 checks the input level (step S31). If sound is present, the process proceeds to step S33; if the input is silent, the basic sound quality is corrected (step S32) and the input level is checked again in step S31. In step S32, the correction of the basic sound quality may be skipped when step S31 has determined silence for the second or a subsequent time; whether or not the correction is repeated, the setting is kept in effect. Steps S31 and S32 are performed after an audio signal is input and before the sound quality is first set to either sound quality A or B; thereafter, the settings are changed and held by the processing from step S33 onward.
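The input level check in step S31 could take a form such as the following Python sketch. The text only states that the input level is checked, so the RMS measure, the threshold value, and the names `level_dbfs` and `has_sound` are all assumptions made for illustration.

```python
import math
from typing import Sequence

# Hypothetical threshold; the text does not give a value for the level check.
SILENCE_THRESHOLD_DBFS = -50.0


def level_dbfs(samples: Sequence[float]) -> float:
    """RMS level of a block of samples in dB relative to full scale (1.0)."""
    if not samples:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20.0 * math.log10(rms) if rms > 0.0 else float("-inf")


def has_sound(samples: Sequence[float],
              threshold_dbfs: float = SILENCE_THRESHOLD_DBFS) -> bool:
    """True when the block level exceeds the threshold.

    A block with no signal, or with only a small signal such as background
    noise, is treated as silent.
    """
    return level_dbfs(samples) > threshold_dbfs
```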

In step S33, speech / non-speech is determined. The criterion for speech / non-speech may be a single threshold process or a threshold process on multiple parameters. The sound quality is set and adjusted based on the determination in step S33 (steps S34 and S35): when speech is determined, sound quality A is selected and the sound quality is adjusted (step S34); when non-speech is determined, sound quality B is selected and the sound quality is adjusted (step S35).
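The two forms of criterion mentioned for step S33, and the selection made in steps S34 and S35, might be sketched in Python as follows. The feature names, the threshold values, and the rule of requiring every parameter to reach its threshold are assumptions; the text does not name the parameters or say how multiple thresholds are combined.

```python
from typing import Mapping

# Hypothetical feature names and thresholds, for illustration only.
THRESHOLDS = {"speech_band_ratio": 0.6, "pause_rate": 0.2}


def is_speech_single(score: float, threshold: float = 0.5) -> bool:
    """Single-threshold criterion: speech when the score reaches the threshold."""
    return score >= threshold


def is_speech_multi(features: Mapping[str, float],
                    thresholds: Mapping[str, float] = THRESHOLDS) -> bool:
    """Multi-parameter criterion: speech only when every feature reaches its
    threshold (combining the thresholds with AND is an assumption)."""
    return all(features[name] >= limit for name, limit in thresholds.items())


def select_setting(speech: bool) -> str:
    """Step S34 selects sound quality A for speech; step S35 selects B for non-speech."""
    return "A" if speech else "B"
```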

An example of the difference between sound quality setting A and sound quality setting B will now be described with reference to FIG. 14. For sound quality setting A (speech), the frequency characteristic of the equalizer is set as shown by graph 91; for sound quality setting B (non-speech), it is set as shown by graph 93. Graph 93 differs from graph 91 in that, for non-speech, the output levels near the predetermined low frequency 93a and near the predetermined high frequency 93b are emphasized relative to the output levels near the predetermined low frequency 91a and near the predetermined high frequency 91b for speech.

In steps S34 and S35, the selected sound quality is held, and in step S36 the speech / non-speech determination result on which it is based is displayed. The sound / silence determination means 83 then checks the input level (step S37). If sound is present, the process ends; if the input is silent, the sound quality is adjusted by modifying it according to the preceding state (step S38). If the held sound quality (the sound quality before the silence) was sound quality A, it is modified to sound quality A′ as shown by graph 92 in FIG. 14(A); if it was sound quality B, it is modified to sound quality B′ as shown by graph 94 in FIG. 14(B). For speech, graph 92 differs from graph 91 in the emphasis near the predetermined low frequency 91a and near the predetermined high frequency 91b. Likewise, for non-speech, graph 94 differs from graph 93 in the emphasis near the predetermined low frequency 93a and near the predetermined high frequency 93b. In this embodiment, when the automatic speech detection function is in use, sound quality settings for the silent state such as A′ and B′, that is, settings for when there is no audio input signal or when the input signal is small (background noise), are provided in addition to the sound quality settings A and B used in the presence of sound.

Next, it is determined whether the input has returned from the silent state to the state with sound (step S39). If it has not returned and remains silent, the current settings (sound quality parameters and the like) are kept unchanged while waiting for the return to sound. If it has returned, sound quality A′ or B′ is restored to the sound quality setting A or B used in the presence of sound (step S40), and the process ends.
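The flow of steps S31 to S40 can be summarized as a small state machine. The Python sketch below is one possible reading, not the flowchart of FIG. 13 itself: the class name, the setting labels, and the assumption that the flow is restarted for each new block of input after it ends are hypothetical, and the display of step S36 is omitted.

```python
from typing import Optional


class SoundQualityFlow:
    """Tracks which sound quality setting is active as input blocks arrive."""

    def __init__(self) -> None:
        self.held: Optional[str] = None  # "A" or "B", held by steps S34/S35
        self.silent = False              # True while a silent-state setting is active
        self.setting = "basic"           # the basic sound quality is the initial setting

    def step(self, sound: bool, speech: bool) -> str:
        """Process one block; `sound` and `speech` are the two determination results."""
        if not sound:
            # S32 before the first A/B decision, S38 afterwards:
            # modify the current setting instead of choosing a new one.
            self.setting = "basic-corrected" if self.held is None else self.held + "'"
            self.silent = True
        elif self.silent and self.held is not None:
            # S39/S40: sound has returned, restore the held setting A or B.
            self.setting = self.held
            self.silent = False
        else:
            # S33 to S35: decide speech / non-speech and hold the selection.
            self.held = "A" if speech else "B"
            self.setting = self.held
            self.silent = False
        return self.setting
```

In this reading, a return from silence restores the held setting immediately, and the speech / non-speech decision is made again only on the following block.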

By performing the sound / silence determination as in this embodiment, the following problems of the prior art can be solved. In the prior art, not only is accurate sound quality adjustment difficult because of erroneous determinations that arise from judging, from the audio information alone, whether the input is music, but low- and high-frequency noise is also output from the speaker when the audio signal is silent or has a low input level. Even if the equipment is configured to adjust the sound quality so as to shut out the input signal when the signal level is zero or low, the sound quality cannot be set accurately and quickly when the signal level rises and the sound returns. This is a particular problem for audio signals whose level changes abruptly, for example when a recording medium is loaded, when the input is switched to or from an external input, when the content being viewed switches from speech to non-speech, when the received channel is changed, or at the transition from a commercial to the main program.

That is, the sound quality adjustment device according to this embodiment reduces the low- and high-frequency noise output from the speaker during silence and, by setting the sound quality in a state close to the preceding one, can respond quickly (set the sound quality) when the sound returns. In other words, even for audio signals whose input level changes abruptly, this device can set the sound quality so that noise output during silence is properly reduced and the state with sound is quickly restored.

According to this embodiment, in addition to these effects, the speech / non-speech determination is made not only from the audio information of the audio signal but also in line with the nature of the program that contains the audio signal. This minimizes erroneous control of acoustic parameters such as those of the equalizer caused by the characteristics of the input audio signal, and enables accurate acoustic parameter control and accurate sound quality adjustment. Furthermore, as the main effect of the present invention, the result of the speech / non-speech determination on the input audio signal can be made visible to the user. For example, the nature of the program is determined from the monaural / stereo signal superimposed on the audio signal together with the audio information, and the criterion for judging whether the input audio signal is speech or non-speech (music) is optimized according to that result. This enables flexible control of speech / non-speech detection according to the content and characteristics of the broadcast program, sound quality adjustment based on that control, and presentation of the detection result to the user.

The sound quality adjustment device 8 described above with reference to FIGS. 12 to 14 can also be incorporated into a content display device, like the sound quality adjustment device shown in FIG. 1 and elsewhere. Each means constituting the sound quality adjustment device 8 or the content display device may be implemented in hardware, or partly in software. An example configured by installing a program in a general-purpose computer such as a PC (personal computer), and an example of a computer-readable recording medium on which the program is recorded, are as described with reference to FIG. 11, except that the program stored in the ROM differs. This program causes a computer to execute processing steps corresponding to the means described above, namely a sound / silence determination step, a speech / non-speech determination step, a sound quality adjustment step, and a determination result display step based on the speech / non-speech determination. The sound quality setting for silence in the sound quality adjustment step is made by changing only part of the sound quality setting in effect during the sound immediately before silence was determined in the sound / silence determination step. When the sound quality adjustment is performed by a sound quality adjuster (hardware), the sound quality adjustment step is a step of controlling the sound quality adjuster so that it adjusts the sound quality of the audio signal based on the sound quality setting.

FIG. 1 is a block diagram showing a configuration example of a sound quality adjustment device according to an embodiment of the present invention.
FIG. 2 is a flowchart for explaining an example of the sound quality adjustment process and the determination result display process in the sound quality adjustment device of FIG. 1.
FIG. 3 is a diagram showing an example of the sound quality setting equalization used in the sound quality adjustment process in the sound quality adjustment device of FIG. 1.
FIG. 4 is a diagram showing a screen display example in the determination result display process of FIG. 2.
FIG. 5 is a block diagram showing a configuration example of a television receiver, which is one application example of the sound quality adjustment device of FIG. 1.
FIG. 6 is a diagram showing an example of the mark display target table stored in the microcomputer in FIG. 5.
FIG. 7 is a flowchart for explaining the determination result display process in the television receiver of FIG. 5, and also a flowchart that extracts and explains in detail the determination result display process in the flowchart of FIG. 2.
FIG. 8 is a diagram showing an example of a setting screen for the determination result display in the sound quality adjustment device of FIG. 1.
FIG. 9 is a diagram showing an example of a setting screen for the determination result display in the sound quality adjustment device of FIG. 1.
FIG. 10 is a diagram showing an example of a setting screen for the determination result display in the sound quality adjustment device of FIG. 1.
FIG. 11 is a block diagram showing a configuration example of a general information processing apparatus.
FIG. 12 is a block diagram showing a configuration example of a sound quality adjustment device according to another embodiment of the present invention.
FIG. 13 is a flowchart for explaining an example of the sound quality adjustment process in the sound quality adjustment device of FIG. 12.
FIG. 14 is a diagram showing an example of the sound quality setting equalization used in the sound quality adjustment process in the sound quality adjustment device of FIG. 12.

Explanation of symbols

1, 8 ... sound quality adjustment device, 4 ... television receiver, 7 ... information processing apparatus, 10, 80 ... determination result display means, 11, 81 ... audio signal input means, 12, 82 ... speech / non-speech determination means, 13 ... monaural / stereo determination means, 14 ... criterion optimization means, 14a ... switch, 14b ... means for setting to threshold V_SL1, 14c ... means for setting to threshold V_SL2, 15, 85 ... sound quality adjusting means, 16, 86 ... audio signal output means, 40 ... tuner unit, 41 ... external input unit, 42 ... main body operation unit, 43 ... video processing IC, 44 ... microcomputer, 45 ... audio processing IC, 46 ... display, 47L, 47R ... speakers, 48 ... light receiving unit, 49 ... remote control, 71 ... CPU, 72 ... RAM, 73 ... rewritable ROM, 74 ... input device, 75 ... display device, 76 ... output device, 77 ... bus, 83 ... sound / silence determination means.

Claims

1. A sound quality adjustment device comprising: speech / non-speech determination means for determining whether an input audio signal corresponds to speech or non-speech; sound quality adjusting means for adjusting the audio signal, after the determination by the speech / non-speech determination means, to a sound quality that differs between speech and non-speech; and determination result display means for displaying the determination result of the speech / non-speech determination means, wherein the determination result display means displays the determination result to the user in stages according to the degree of speech or non-speech.

2. The sound quality adjustment device according to claim 1, wherein the sound quality adjusting means has adjustment setting means for setting whether to execute the sound quality adjustment based on the determination result of the speech / non-speech determination means, and the determination result display means displays the determination result only when the adjustment setting means has set the sound quality adjustment to be executed.

3. The sound quality adjustment device according to claim 1 or 2, wherein the determination result display means has display setting means for setting whether to display the determination result, and displays the determination result only when the display setting means has set the display of the determination result to be executed.

4. A content display device comprising the sound quality adjustment device according to any one of claims 1 to 3 and a content input device, wherein the audio signal included in content input through the content input device is output as sound after its sound quality is adjusted by the sound quality adjustment device, the video signal included in the content is displayed, and the determination result display means displays the determination result as necessary.

5. A program for causing a computer to execute: a speech / non-speech determination step of determining whether an input audio signal corresponds to speech or non-speech; a sound quality adjustment step of adjusting the audio signal, after the determination in the speech / non-speech determination step, to a sound quality that differs between speech and non-speech; and a determination result display step of displaying the determination result of the speech / non-speech determination step, wherein the determination result display step displays the determination result to the user in stages according to the degree of speech or non-speech.

6. The program according to claim 5, wherein the sound quality adjustment step includes an adjustment setting step of setting whether to execute the sound quality adjustment based on the determination result of the speech / non-speech determination step, and the determination result display step displays the determination result only when the adjustment setting step has set the sound quality adjustment to be executed.

7. The program according to claim 5 or 6, wherein the determination result display step includes a display setting step of setting whether to display the determination result, and displays the determination result only when the display setting step has set the display of the determination result to be executed.

8. A computer-readable recording medium on which the program according to any one of claims 5 to 7 is recorded.