JP5695896B2

JP5695896B2 - SOUND QUALITY CONTROL DEVICE, SOUND QUALITY CONTROL METHOD, AND SOUND QUALITY CONTROL PROGRAM

Info

Publication number: JP5695896B2
Application number: JP2010286276A
Authority: JP
Inventors: 竹内 広和; 米久保 裕
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 2010-12-22
Filing date: 2010-12-22
Publication date: 2015-04-08
Anticipated expiration: 2030-12-22
Also published as: JP2012134842A

Description

この発明の実施の形態は、再生すべきオーディオ（可聴周波数）信号に含まれる音声信号と音楽信号とに対して、それぞれ適応的に音質制御処理を施す音質制御装置、音質制御方法及び音質制御用プログラムに関する。 Embodiments of the present invention relate to a sound quality control device, a sound quality control method, and a sound quality control program that adaptively apply sound quality control processing to each of the voice signal and the music signal contained in an audio (audible frequency) signal to be reproduced.

周知のように、例えばテレビジョン放送を受信する放送受信機器や、情報記録媒体からその記録情報を再生する情報再生機器等にあっては、受信した放送信号や情報記録媒体から読み取った信号等からオーディオ信号を再生する際に、オーディオ信号に音質制御処理を施すことによって、より一層の高音質化を図るようにしている。また、テレビ等の視聴時に周囲の背景雑音（環境音）によってテレビのコンテンツ再生音が聞きづらい状況において、再生音を補正する方法が提案されている。 As is well known, in devices such as a broadcast receiver that receives television broadcasts or an information reproducing device that reproduces recorded information from an information recording medium, sound quality control processing is applied to the audio signal when it is reproduced from the received broadcast signal or from the signal read from the recording medium, in order to achieve higher sound quality. Methods have also been proposed for correcting the playback sound when ambient background noise (environmental sound) makes the content playback sound of a television hard to hear during viewing.

このような状況において、特許文献１では、コンテンツのオーディオ再生信号とマイクから取得される環境音のラウドネス（あるいはレベル）との比較や再生信号の有音声・無音声判定に基づく音量制御や、環境音のスペクトル重心周波数に応じたイコライジング処理による補正を行うことで、環境音に応じたオーディオ再生信号の出力レベルを制御する技術が開示されている。 Under such circumstances, Patent Document 1 discloses a technique for controlling the output level of an audio playback signal according to the environmental sound. It performs volume control based on a comparison between the content's audio playback signal and the loudness (or level) of the environmental sound picked up by a microphone and on a voice-present/voice-absent determination of the playback signal, and it applies correction by equalization according to the spectral centroid frequency of the environmental sound.

しかしながら、上記技術は、コンテンツのオーディオ再生信号の解析は有音声か無音声かの２値判定であり、またその判定結果に応じて音声であればより音量を大きく制御するものである。この場合、有音声判定の場合でも環境音の信号特性によっては必ずしも音声が聞きづらいとは限らず、その場合にはより過剰に音量が増大されることになり、不快な音量になる可能性がある。 However, in the above technique, the analysis of the content's audio playback signal is only a binary determination of whether voice is present or absent, and the volume is raised further when the result is voice. Even when voice is detected, the voice is not necessarily hard to hear, depending on the signal characteristics of the environmental sound; in that case the volume is increased excessively and may become unpleasant.

また、上記技術は、コンテンツのオーディオ再生信号と環境音のラウドネス(あるいはレベル)との比較に応じた音量制御を行っているが、オーディオ再生信号の音種別に合った音質制御をしている訳ではなく、音量以外の音質制御（サラウンド、イコライザ、センター強調等）としては、必ずしも適切に制御されない。 In addition, the above technique controls the volume according to a comparison between the content's audio playback signal and the loudness (or level) of the environmental sound, but it does not perform sound quality control suited to the sound type of the audio playback signal. Sound quality controls other than volume (surround, equalizer, center emphasis, and so on) are therefore not necessarily controlled appropriately.

特開２０１０−１５４３８８号公報JP 2010-154388 A

オーディオ信号に対して、再生信号の特性と視聴時の周囲の環境音の特性に応じた適切な音質制御処理を施すことを可能とした音質制御装置、音質制御方法及び音質制御用プログラムを提供することを目的とする。 An object is to provide a sound quality control device, a sound quality control method, and a sound quality control program that can apply to an audio signal sound quality control processing appropriate to the characteristics of the playback signal and of the ambient environmental sound at the time of viewing.

実施形態に係る音質制御装置は、入力オーディオ信号に対してその再生音が周囲の環境音にマスクされないように周波数帯域毎にゲインを補正するための補正ゲインを算出する補正ゲイン算出手段と、入力オーディオ信号に含まれる１以上の音種別のうち、支配的な音種別に応じた周波数帯域毎の重み係数に基づいて、周波数帯域毎の補正ゲインを補正する補正ゲイン補正手段と、前記補正ゲイン補正手段で補正された周波数帯域毎の補正ゲインを用いて生成される音質制御信号に基づいて、入力オーディオ信号に対して音質制御処理を施す音質制御手段とを具備する。 The sound quality control device according to the embodiment comprises: correction gain calculation means for calculating, for an input audio signal, a correction gain that corrects the gain for each frequency band so that the reproduced sound is not masked by the ambient environmental sound; correction gain correction means for correcting the correction gain for each frequency band based on a weighting coefficient for each frequency band corresponding to the dominant sound type among the one or more sound types contained in the input audio signal; and sound quality control means for applying sound quality control processing to the input audio signal based on a sound quality control signal generated using the correction gain for each frequency band corrected by the correction gain correction means.

実施の形態におけるデジタルテレビジョン放送受信装置とそれを中心としたネットワークシステムの一例とを概略的に説明するために示す図。 A diagram schematically illustrating the digital television broadcast receiver of the embodiment and an example of a network system built around it.
同実施の形態におけるデジタルテレビジョン放送受信装置の主要な信号処理系の一例を説明するために示すブロック構成図。 A block diagram illustrating an example of the main signal processing system of the digital television broadcast receiver in the embodiment.
同実施の形態におけるデジタルテレビジョン放送受信装置のオーディオ処理部に含まれる音質制御処理部の一例を説明するために示すブロック構成図。 A block diagram illustrating an example of the sound quality control processing unit included in the audio processing unit of the digital television broadcast receiver in the embodiment.
同実施の形態における音質制御処理部に含まれる特徴パラメータ算出部が行なう動作の一例を説明するために示す図。 A diagram illustrating an example of the operation performed by the feature parameter calculation unit included in the sound quality control processing unit in the embodiment.
同実施の形態における特徴パラメータ算出部が行なう主要な処理動作の一例を説明するために示すフローチャート。 A flowchart illustrating an example of the main processing operation performed by the feature parameter calculation unit in the embodiment.
同実施の形態における音質制御処理部に含まれる音声・音楽識別スコア算出部及び音楽・背景音識別スコア算出部が行なう動作の一例を説明するために示すフローチャート。 A flowchart illustrating an example of the operation performed by the voice/music discrimination score calculation unit and the music/background sound discrimination score calculation unit included in the sound quality control processing unit in the embodiment.
同実施の形態における音質制御処理部に含まれる検出スコア算出部が行なう主要な処理動作の一例の一部を説明するために示すフローチャート。 A flowchart illustrating part of an example of the main processing operation performed by the detection score calculation unit included in the sound quality control processing unit in the embodiment.
同実施の形態における検出スコア算出部が行なう主要な処理動作の一例の他の部分を説明するために示すフローチャート。 A flowchart illustrating another part of the example of the main processing operation performed by the detection score calculation unit in the embodiment.
同実施の形態における検出スコア算出部が行なう主要な処理動作の一例の残部を説明するために示すフローチャート。 A flowchart illustrating the remainder of the example of the main processing operation performed by the detection score calculation unit in the embodiment.
同実施の形態における音質制御処理部に含まれるマスキング補正ゲイン算出部が行なう動作の一例を説明するために示す図。 A diagram illustrating an example of the operation performed by the masking correction gain calculation unit included in the sound quality control processing unit in the embodiment.
同実施の形態における音質制御処理部に含まれる補正特性制御部が行なう主要な処理動作の一例の一部を説明するために示すフローチャート。 A flowchart illustrating part of an example of the main processing operation performed by the correction characteristic control unit included in the sound quality control processing unit in the embodiment.
同実施の形態における補正特性制御部が行なう主要な処理動作の一例の残部を説明するために示すフローチャート。 A flowchart illustrating the remainder of the example of the main processing operation performed by the correction characteristic control unit in the embodiment.
同実施の形態における補正特性制御部が主要な処理動作中に使用する補正特性算出重み係数の一例を説明するために示す図。 A diagram illustrating an example of the correction characteristic calculation weighting coefficients that the correction characteristic control unit uses during its main processing operation in the embodiment.
同実施の形態における音質制御処理部に含まれる音質制御部の一例を説明するために示すブロック構成図。 A block diagram illustrating an example of a sound quality control unit included in the sound quality control processing unit in the embodiment.
同実施の形態における補正特性制御部が行なう主要な処理動作の他の例を説明するために示すフローチャート。 A flowchart illustrating another example of the main processing operation performed by the correction characteristic control unit in the embodiment.
同実施の形態における補正特性制御部が主要な処理動作中に使用する補正強度算出重み係数の一例を説明するために示す図。 A diagram illustrating an example of the correction strength calculation weighting coefficients that the correction characteristic control unit uses during its main processing operation in the embodiment.

以下、実施の形態について図面を参照して詳細に説明する。図１は、この実施の形態で説明するデジタルテレビジョン放送受信装置１１の外観と、このデジタルテレビジョン放送受信装置１１を中心として構成されるネットワークシステムの一例とを概略的に示している。 Hereinafter, embodiments will be described in detail with reference to the drawings. FIG. 1 schematically shows an external appearance of a digital television broadcast receiving apparatus 11 described in this embodiment and an example of a network system configured around the digital television broadcast receiving apparatus 11.

すなわち、デジタルテレビジョン放送受信装置１１は、主として、薄型のキャビネット１２と、このキャビネット１２を起立させて支持する支持台１３とから構成されている。そして、このキャビネット１２には、例えばＳＥＤ（surface-conduction electron-emitter display）表示パネルまたは液晶表示パネル等でなる平面パネル型の映像表示器１４、一対のスピーカ１５，１５、操作部１６、リモートコントローラ１７から送信される操作情報を受ける受光部１８、マイクロホンＭＩＣ等が設置されている。 That is, the digital television broadcast receiver 11 mainly consists of a thin cabinet 12 and a support base 13 that holds the cabinet 12 upright. The cabinet 12 houses a flat-panel video display 14 made of, for example, an SED (surface-conduction electron-emitter display) panel or a liquid crystal display panel, a pair of speakers 15 and 15, an operation unit 16, a light receiving unit 18 that receives operation information transmitted from a remote controller 17, a microphone MIC, and so on.

また、このデジタルテレビジョン放送受信装置１１には、例えばＳＤ（secure digital）メモリカード、ＭＭＣ（multimedia card）及びメモリスティック等の第１のメモリカード１９が着脱可能となっており、この第１のメモリカード１９に対して番組や写真等の情報の記録再生が行なわれるようになっている。 In addition, for example, a first memory card 19 such as an SD (secure digital) memory card, an MMC (multimedia card), and a memory stick can be attached to and detached from the digital television broadcast receiver 11. Information such as programs and photographs is recorded on and reproduced from the memory card 19.

さらに、このデジタルテレビジョン放送受信装置１１には、例えば契約情報等の記録された第２のメモリカード［ＩＣ（integrated circuit）カード等］２０が着脱可能となっており、この第２のメモリカード２０に対して情報の記録再生が行なわれるようになっている。 Further, a second memory card 20 [an IC (integrated circuit) card or the like] on which, for example, contract information is recorded can be attached to and detached from the digital television broadcast receiver 11, and information is recorded on and reproduced from this second memory card 20.

また、このデジタルテレビジョン放送受信装置１１は、第１のＬＡＮ（local area network）端子２１、第２のＬＡＮ端子２２、ＵＳＢ（universal serial bus）端子２３及びＩＥＥＥ（institute of electrical and electronics engineers）１３９４端子２４を備えている。 The digital television broadcast receiver 11 also has a first LAN (local area network) terminal 21, a second LAN terminal 22, a USB (universal serial bus) terminal 23, and an IEEE (Institute of Electrical and Electronics Engineers) 1394 terminal 24.

このうち、第１のＬＡＮ端子２１は、ＬＡＮ対応ＨＤＤ（hard disk drive）専用ポートとして使用される。すなわち、この第１のＬＡＮ端子２１は、それに接続されたＮＡＳ（network attached storage）であるＬＡＮ対応のＨＤＤ２５に対して、イーサネット（登録商標）により情報の記録再生を行なうために使用される。 Of these, the first LAN terminal 21 is used as a dedicated port for a LAN-compatible HDD (hard disk drive). That is, the first LAN terminal 21 is used to record and reproduce information over Ethernet (registered trademark) on a LAN-compatible HDD 25, which is NAS (network attached storage) connected to it.

このように、デジタルテレビジョン放送受信装置１１にＬＡＮ対応ＨＤＤ専用ポートとしての第１のＬＡＮ端子２１を設けることにより、他のネットワーク環境やネットワーク使用状況等に影響されることなく、ＨＤＤ２５に対してハイビジョン画質による放送番組の情報記録を安定して行なうことができる。 Providing the digital television broadcast receiver 11 with the first LAN terminal 21 as a dedicated port for a LAN-compatible HDD in this way allows broadcast program information to be recorded stably on the HDD 25 in high-definition quality, unaffected by other network environments or network usage conditions.

また、第２のＬＡＮ端子２２は、イーサネット（登録商標）を用いた一般的なＬＡＮ対応ポートとして使用される。すなわち、この第２のＬＡＮ端子２２は、ハブ２６を介して、ＬＡＮ対応のＨＤＤ２７、ＰＣ（personal computer）２８、ＨＤＤ内蔵のＤＶＤ（digital versatile disk）レコーダ２９等の機器を接続して、例えば家庭内ネットワークを構築し、これらの機器と情報伝送を行なうために使用される。 The second LAN terminal 22 is used as a general LAN-compatible port using Ethernet (registered trademark). That is, the second LAN terminal 22 is used to connect devices such as a LAN-compatible HDD 27, a PC (personal computer) 28, and a DVD (digital versatile disk) recorder 29 with a built-in HDD via a hub 26, for example to build a home network, and to exchange information with these devices.

この場合、ＰＣ２８及びＤＶＤレコーダ２９については、それぞれ、家庭内ネットワークにおいてコンテンツのサーバ機器として動作するための機能を持ち、さらにコンテンツのアクセスに必要なＵＲＩ（uniform resource identifier）情報を提供するサービスを備えたＵＰｎＰ（universal plug and play）対応機器として構成される。 In this case, each of the PC 28 and the DVD recorder 29 has a function for operating as a content server device in a home network, and further includes a service for providing URI (uniform resource identifier) information necessary for accessing the content. It is configured as a UPnP (universal plug and play) compatible device.

なお、ＤＶＤレコーダ２９については、第２のＬＡＮ端子２２を介して通信されるデジタル情報が制御系のみの情報であるため、デジタルテレビジョン放送受信装置１１との間でアナログの映像及びオーディオ情報を伝送するために、専用のアナログ伝送路３０が設けられている。 As for the DVD recorder 29, since the digital information communicated via the second LAN terminal 22 is information only for the control system, analog video and audio information is exchanged with the digital television broadcast receiver 11. A dedicated analog transmission line 30 is provided for transmission.

さらに、この第２のＬＡＮ端子２２は、ハブ２６に接続されたブロードバンドルータ３１を介して、例えばインターネット等の外部のネットワーク３２に接続される。そして、この第２のＬＡＮ端子２２は、ネットワーク３２を介してＰＣ３３や携帯電話３４等と情報伝送を行なうためにも使用される。 Further, the second LAN terminal 22 is connected to an external network 32 such as the Internet via a broadband router 31 connected to the hub 26. The second LAN terminal 22 is also used to transmit information with the PC 33, the mobile phone 34, etc. via the network 32.

また、上記ＵＳＢ端子２３は、一般的なＵＳＢ対応ポートとして使用されるもので、例えばハブ３５を介して、携帯電話３６、デジタルカメラ３７、メモリカードに対するカードリーダ／ライタ３８、ＨＤＤ３９、キーボード４０等のＵＳＢ機器を接続し、これらのＵＳＢ機器と情報伝送を行なうために使用される。 The USB terminal 23 is used as a general USB-compatible port. For example, USB devices such as a mobile phone 36, a digital camera 37, a card reader/writer 38 for memory cards, an HDD 39, and a keyboard 40 are connected to it via a hub 35, and it is used to exchange information with these USB devices.

さらに、上記ＩＥＥＥ１３９４端子２４は、例えばＡＶ−ＨＤＤ４１及びＤ（digital）−ＶＨＳ（video home system）４２等のような複数の情報記録再生機器をシリアル接続し、各機器と選択的に情報伝送を行なうために使用される。 Further, the IEEE 1394 terminal 24 is used to serially connect a plurality of information recording and reproducing devices, such as an AV-HDD 41 and a D (digital)-VHS (video home system) 42, and to exchange information selectively with each device.

図２は、上記したデジタルテレビジョン放送受信装置１１の主要な信号処理系を示している。すなわち、ＢＳ／ＣＳ（broadcasting satellite／communication satellite）デジタル放送受信用のアンテナ４３で受信した衛星デジタルテレビジョン放送信号は、入力端子４４を介して衛星デジタル放送用のチューナ４５に供給されることにより、所望のチャンネルの放送信号が選局される。 FIG. 2 shows a main signal processing system of the digital television broadcast receiver 11 described above. That is, the satellite digital television broadcast signal received by the BS / CS (broadcasting satellite / communication satellite) digital broadcast receiving antenna 43 is supplied to the satellite digital broadcast tuner 45 via the input terminal 44. A broadcast signal of a desired channel is selected.

そして、このチューナ４５で選局された放送信号は、ＰＳＫ（phase shift keying）復調器４６及びＴＳ（transport stream）復号器４７に順次供給されることにより、デジタルの映像信号及びオーディオ信号に復調された後、信号処理部４８に出力される。 The broadcast signal selected by the tuner 45 is sequentially supplied to a PSK (phase shift keying) demodulator 46 and a TS (transport stream) decoder 47 to be demodulated into a digital video signal and an audio signal. And then output to the signal processing unit 48.

また、地上波放送受信用のアンテナ４９で受信した地上デジタルテレビジョン放送信号は、入力端子５０を介して地上デジタル放送用のチューナ５１に供給されることにより、所望のチャンネルの放送信号が選局される。 A terrestrial digital television broadcast signal received by the terrestrial broadcast receiving antenna 49 is supplied via the input terminal 50 to the terrestrial digital broadcast tuner 51, which selects the broadcast signal of the desired channel.

そして、このチューナ５１で選局された放送信号は、例えば日本ではＯＦＤＭ（orthogonal frequency division multiplexing）復調器５２及びＴＳ復号器５３に順次供給されることにより、デジタルの映像信号及びオーディオ信号に復調された後、上記信号処理部４８に出力される。 The broadcast signal selected by the tuner 51 is demodulated into a digital video signal and an audio signal by being sequentially supplied to an OFDM (orthogonal frequency division multiplexing) demodulator 52 and a TS decoder 53 in Japan, for example. After that, it is output to the signal processing unit 48.

また、上記地上波放送受信用のアンテナ４９で受信した地上アナログテレビジョン放送信号は、入力端子５０を介して地上アナログ放送用のチューナ５４に供給されることにより、所望のチャンネルの放送信号が選局される。そして、このチューナ５４で選局された放送信号は、アナログ復調器５５に供給されてアナログの映像信号及びオーディオ信号に復調された後、上記信号処理部４８に出力される。 A terrestrial analog television broadcast signal received by the terrestrial broadcast receiving antenna 49 is supplied via the input terminal 50 to the terrestrial analog broadcast tuner 54, which selects the broadcast signal of the desired channel. The broadcast signal selected by the tuner 54 is supplied to the analog demodulator 55, demodulated into analog video and audio signals, and then output to the signal processing unit 48.

ここで、上記信号処理部４８は、ＴＳ復号器４７，５３からそれぞれ供給されたデジタルの映像信号及びオーディオ信号に対して、選択的に所定のデジタル信号処理を施し、グラフィック処理部５６及びオーディオ処理部５７に出力している。 Here, the signal processing unit 48 selectively applies predetermined digital signal processing to the digital video and audio signals supplied from the TS decoders 47 and 53, and outputs the results to the graphic processing unit 56 and the audio processing unit 57.

また、上記信号処理部４８には、複数（図示の場合は４つ）の入力端子５８ａ，５８ｂ，５８ｃ，５８ｄが接続されている。これら入力端子５８ａ〜５８ｄは、それぞれ、アナログの映像信号及びオーディオ信号を、デジタルテレビジョン放送受信装置１１の外部から入力可能とするものである。 The signal processing unit 48 is connected to a plurality (four in the illustrated case) of input terminals 58a, 58b, 58c, and 58d. These input terminals 58a to 58d can input analog video signals and audio signals from the outside of the digital television broadcast receiving apparatus 11, respectively.

そして、上記信号処理部４８は、上記アナログ復調器５５及び各入力端子５８ａ〜５８ｄからそれぞれ供給されたアナログの映像信号及びオーディオ信号を選択的にデジタル化し、このデジタル化された映像信号及びオーディオ信号に対して所定のデジタル信号処理を施した後、グラフィック処理部５６及びオーディオ処理部５７に出力する。 The signal processing unit 48 also selectively digitizes the analog video and audio signals supplied from the analog demodulator 55 and the input terminals 58a to 58d, applies predetermined digital signal processing to the digitized video and audio signals, and then outputs them to the graphic processing unit 56 and the audio processing unit 57.

グラフィック処理部５６は、信号処理部４８から供給されるデジタルの映像信号に、ＯＳＤ（on screen display）信号生成部５９で生成されるＯＳＤ信号を重畳して出力する機能を有する。このグラフィック処理部５６は、信号処理部４８の出力映像信号と、ＯＳＤ信号生成部５９の出力ＯＳＤ信号とを選択的に出力すること、また、両出力をそれぞれ画面の半分を構成するように組み合わせて出力することができる。 The graphic processing unit 56 has a function of superimposing and outputting the OSD signal generated by the OSD (on screen display) signal generation unit 59 on the digital video signal supplied from the signal processing unit 48. The graphic processing unit 56 selectively outputs the output video signal of the signal processing unit 48 and the output OSD signal of the OSD signal generation unit 59, and combines both outputs so as to constitute half of the screen. Can be output.

グラフィック処理部５６から出力されたデジタルの映像信号は、映像処理部６０に供給される。この映像処理部６０は、入力されたデジタルの映像信号を、前記映像表示器１４で表示可能なフォーマットのアナログ映像信号に変換した後、映像表示器１４に出力して映像表示させるとともに、出力端子６１を介して外部に導出させる。 The digital video signal output from the graphic processing unit 56 is supplied to the video processing unit 60. The video processing unit 60 converts the input digital video signal into an analog video signal in a format that the video display 14 can display, outputs it to the video display 14 for display, and also leads it out to the outside via the output terminal 61.

また、上記オーディオ処理部５７は、入力されたデジタルのオーディオ信号に対して、後述する音質制御処理を施した後、前記スピーカ１５で再生可能なフォーマットのアナログオーディオ信号に変換している。そして、このアナログオーディオ信号は、スピーカ１５に出力されてオーディオ再生に供されるとともに、出力端子６２を介して外部に導出される。 The audio processing unit 57 performs a sound quality control process, which will be described later, on the input digital audio signal, and then converts it into an analog audio signal in a format that can be reproduced by the speaker 15. The analog audio signal is output to the speaker 15 for audio reproduction, and is derived to the outside via the output terminal 62.

さらに、このオーディオ処理部５７には、前記マイクロホンＭＩＣが接続されており、マイクロホンＭＩＣによって採取した周囲の環境音に対応した信号が供給されるようになっている。 Furthermore, the microphone MIC is connected to the audio processing unit 57, and a signal corresponding to the ambient environmental sound collected by the microphone MIC is supplied.

ここで、このデジタルテレビジョン放送受信装置１１は、上記した各種の受信動作を含むその全ての動作を制御部６３によって統括的に制御されている。この制御部６３は、ＣＰＵ（central processing unit）６３ａを内蔵しており、前記操作部１６からの操作情報、または、リモートコントローラ１７から送出され前記受光部１８に受信された操作情報を受けて、その操作内容が反映されるように各部をそれぞれ制御している。 Here, in the digital television broadcast receiving apparatus 11, all operations including the above-described various reception operations are comprehensively controlled by the control unit 63. The control unit 63 includes a CPU (central processing unit) 63a, and receives operation information from the operation unit 16 or operation information sent from the remote controller 17 and received by the light receiving unit 18, Each unit is controlled to reflect the operation content.

この場合、制御部６３は、主として、そのＣＰＵ６３ａが実行する制御プログラムを格納したＲＯＭ（read only memory）６３ｂと、該ＣＰＵ６３ａに作業エリアを提供するＲＡＭ（random access memory）６３ｃと、各種の設定情報及び制御情報等が格納される不揮発性メモリ６３ｄとを利用している。 In this case, the control unit 63 mainly includes a read only memory (ROM) 63b that stores a control program executed by the CPU 63a, a random access memory (RAM) 63c that provides a work area for the CPU 63a, and various setting information. And a nonvolatile memory 63d in which control information and the like are stored.

また、この制御部６３は、カードＩ／Ｆ（interface）６４を介して、前記第１のメモリカード１９が装着可能なカードホルダ６５に接続されている。これによって、制御部６３は、カードホルダ６５に装着された第１のメモリカード１９と、カードＩ／Ｆ６４を介して情報伝送を行なうことができる。 The control unit 63 is connected via a card I / F (interface) 64 to a card holder 65 in which the first memory card 19 can be mounted. Thereby, the control unit 63 can perform information transmission via the card I / F 64 with the first memory card 19 mounted in the card holder 65.

さらに、上記制御部６３は、カードＩ／Ｆ６６を介して、前記第２のメモリカード２０が装着可能なカードホルダ６７に接続されている。これにより、制御部６３は、カードホルダ６７に装着された第２のメモリカード２０と、カードＩ／Ｆ６６を介して情報伝送を行なうことができる。 Further, the control unit 63 is connected to a card holder 67 into which the second memory card 20 can be mounted via a card I / F 66. Accordingly, the control unit 63 can perform information transmission via the card I / F 66 with the second memory card 20 mounted on the card holder 67.

また、上記制御部６３は、通信Ｉ／Ｆ６８を介して第１のＬＡＮ端子２１に接続されている。これにより、制御部６３は、第１のＬＡＮ端子２１に接続されたＬＡＮ対応のＨＤＤ２５と、通信Ｉ／Ｆ６８を介して情報伝送を行なうことができる。この場合、制御部６３は、ＤＨＣＰ（dynamic host configuration protocol）サーバ機能を有し、第１のＬＡＮ端子２１に接続されたＬＡＮ対応のＨＤＤ２５にＩＰ（internet protocol）アドレスを割り当てて制御している。 The control unit 63 is connected to the first LAN terminal 21 via the communication I / F 68. Accordingly, the control unit 63 can perform information transmission via the communication I / F 68 with the LAN-compatible HDD 25 connected to the first LAN terminal 21. In this case, the control unit 63 has a DHCP (dynamic host configuration protocol) server function, and assigns and controls an IP (internet protocol) address to the LAN-compatible HDD 25 connected to the first LAN terminal 21.

さらに、上記制御部６３は、通信Ｉ／Ｆ６９を介して第２のＬＡＮ端子２２に接続されている。これにより、制御部６３は、第２のＬＡＮ端子２２に接続された各機器（図１参照）と、通信Ｉ／Ｆ６９を介して情報伝送を行なうことができる。 Further, the control unit 63 is connected to the second LAN terminal 22 via the communication I / F 69. Thereby, the control unit 63 can perform information transmission with each device (see FIG. 1) connected to the second LAN terminal 22 via the communication I / F 69.

また、上記制御部６３は、ＵＳＢＩ／Ｆ７０を介して前記ＵＳＢ端子２３に接続されている。これにより、制御部６３は、ＵＳＢ端子２３に接続された各機器（図１参照）と、ＵＳＢＩ／Ｆ７０を介して情報伝送を行なうことができる。 The control unit 63 is connected to the USB terminal 23 via the USB I / F 70. Accordingly, the control unit 63 can perform information transmission with each device (see FIG. 1) connected to the USB terminal 23 via the USB I / F 70.

さらに、上記制御部６３は、ＩＥＥＥ１３９４Ｉ／Ｆ７１を介してＩＥＥＥ１３９４端子２４に接続されている。これにより、制御部６３は、ＩＥＥＥ１３９４端子２４に接続された各機器（図１参照）と、ＩＥＥＥ１３９４Ｉ／Ｆ７１を介して情報伝送を行なうことができる。 Further, the control unit 63 is connected to the IEEE 1394 terminal 24 via the IEEE 1394 I/F 71. Accordingly, the control unit 63 can perform information transmission with each device (see FIG. 1) connected to the IEEE 1394 terminal 24 via the IEEE 1394 I/F 71.

図３は、上記オーディオ処理部５７内に備えられる音質制御処理部７２を示している。この音質制御処理部７２では、入力端子７３に供給されたオーディオ信号が、直列接続された複数（図示の場合は４つ）の音質制御部７４，７５，７６，７７によって、それぞれ異なる種類の音質制御処理を施された後、出力端子７８から取り出される。 FIG. 3 shows a sound quality control processing unit 72 provided in the audio processing unit 57. In the sound quality control processing unit 72, the audio signal supplied to the input terminal 73 is converted into different types of sound quality by a plurality of (four in the illustrated example) sound quality control units 74, 75, 76, 77 connected in series. After being subjected to the control process, it is taken out from the output terminal 78.

一例を言えば、音質制御部７４は入力オーディオ信号にリバーブ処理を施し、音質制御部７５は入力オーディオ信号にワイドステレオ処理を施し、音質制御部７６は入力オーディオ信号にセンター強調処理を施し、音質制御部７７は入力オーディオ信号にイコライザ処理を施している。 For example, the sound quality control unit 74 applies reverb processing to the input audio signal, the sound quality control unit 75 applies wide stereo processing, the sound quality control unit 76 applies center emphasis processing, and the sound quality control unit 77 applies equalizer processing.

そして、これらの音質制御部７４〜７７にあっては、後述する補正特性制御部７９から各音質制御部７４〜７７に対してそれぞれ別個に生成されて出力される音質制御信号に基づいて、入力オーディオ信号に施す音質制御処理の強度が独立に制御されるようになっている。 In each of the sound quality control units 74 to 77, the strength of the sound quality control processing applied to the input audio signal is controlled independently, based on sound quality control signals that a correction characteristic control unit 79, described later, generates separately for and outputs to each of the sound quality control units 74 to 77.
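The series connection with independently controlled strength can be pictured with a minimal Python sketch. Everything in it is an illustrative assumption rather than the patent's implementation: the names `make_stage` and `run_chain` are hypothetical, the strength is modeled as a dry/wet blend in the range 0 to 1, and the effects passed in are trivial stand-ins for reverb, wide stereo, center emphasis, and equalizer processing.

```python
def make_stage(effect):
    """Wrap an effect so its output is blended with its input by `strength` in [0, 1].

    The dry/wet blend is an assumed way to model "strength"; the source only
    states that each unit's strength is controlled independently.
    """
    def stage(samples, strength):
        wet = effect(samples)
        return [(1.0 - strength) * d + strength * w for d, w in zip(samples, wet)]
    return stage


def run_chain(samples, stages, strengths):
    """Pass the signal through the stages in series, each with its own strength."""
    for stage, strength in zip(stages, strengths):
        samples = stage(samples, strength)
    return samples


# Stand-in effects (hypothetical): a gain of 2 and a mute.
stages = [make_stage(lambda s: [2.0 * x for x in s]),
          make_stage(lambda s: [0.0 for _ in s])]
print(run_chain([1.0, 2.0], stages, [1.0, 0.0]))
```

Because each stage receives its own strength value, one control signal per unit is enough to turn a single effect up or down without touching the others.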

一方、上記音質制御処理部７２では、入力端子７３に供給されたオーディオ信号が特徴パラメータ算出部８０に供給されている。この特徴パラメータ算出部８０は、入力されたオーディオ信号から、音声信号と音楽信号とを判別するための各種の特徴パラメータ、音楽信号と例えばＢＧＭ（back ground music）、拍手及び歓声等の背景音となる背景音信号とを判別するための各種の特徴パラメータ、音声や音楽の信号とノイズ信号とを判別するための各種の特徴パラメータ等を算出している。 On the other hand, in the sound quality control processing unit 72, the audio signal supplied to the input terminal 73 is supplied to the feature parameter calculation unit 80. The feature parameter calculation unit 80 determines various feature parameters for discriminating a voice signal and a music signal from the input audio signal, a music signal and background sounds such as BGM (back ground music), applause and cheers. Various feature parameters for discriminating a background sound signal, various feature parameters for discriminating a voice or music signal and a noise signal, and the like are calculated.

この場合、特徴パラメータ算出部８０は、入力されたオーディオ信号を、図４（ａ）に示すように、数１００ｍｓｅｃ程度のフレーム単位に切り出し、さらに、図４（ｂ）に示すように、各フレームを数１０ｍｓｅｃ程度のサブフレームに分割する。そして、サブフレーム単位で各種の特徴パラメータを生成するための判別情報を取得し、取得した判別情報のフレーム単位での統計量を算出することにより、特徴パラメータを算出する処理を行なっている。 In this case, the feature parameter calculation unit 80 cuts the input audio signal into frames of about several hundred milliseconds, as shown in FIG. 4(a), and further divides each frame into subframes of about several tens of milliseconds, as shown in FIG. 4(b). It then acquires, in subframe units, the discrimination information used to generate the various feature parameters, and calculates the feature parameters by computing frame-level statistics of the acquired discrimination information.

すなわち、特徴パラメータ算出部８０では、入力されたオーディオ信号から、サブフレーム単位で、音声信号と音楽信号とを判別するための各種の判別情報、音楽信号と背景音信号とを判別するための各種の判別情報、音声や音楽の信号とノイズ信号とを判別するための各種の判別情報等を取得し、取得した各種の判別情報それぞれについて、フレーム単位での統計量（例えば平均，分散，最大，最小等）を求めることにより、種々の特徴パラメータを算出している。 That is, the feature parameter calculation unit 80 acquires from the input audio signal, in subframe units, various kinds of discrimination information for distinguishing voice signals from music signals, for distinguishing music signals from background sound signals, and for distinguishing voice or music signals from noise signals. For each kind of acquired discrimination information, it then obtains frame-level statistics (for example, mean, variance, maximum, and minimum) to calculate the various feature parameters.
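The two-level scheme (a per-subframe measurement followed by per-frame statistics) might be sketched as follows. This is an illustration under assumptions: the helper names `split` and `frame_statistics` are hypothetical, frames and subframes are taken as non-overlapping, and the population variance is used; the description does not specify any of these details.

```python
from statistics import fmean, pvariance


def split(samples, size):
    """Cut a sample list into consecutive, non-overlapping blocks of `size` samples.

    A trailing partial block is dropped (an assumption made for simplicity).
    """
    return [samples[i:i + size] for i in range(0, len(samples) - size + 1, size)]


def frame_statistics(samples, frame_len, subframe_len, measure):
    """Apply `measure` to every subframe, then summarize the values per frame."""
    stats = []
    for frame in split(samples, frame_len):
        values = [measure(sub) for sub in split(frame, subframe_len)]
        stats.append({
            "mean": fmean(values),
            "variance": pvariance(values),
            "max": max(values),
            "min": min(values),
        })
    return stats


# Toy input: two frames of 8 samples, each split into two subframes of 4.
signal = [1.0] * 8 + [2.0] * 8
print(frame_statistics(signal, frame_len=8, subframe_len=4, measure=sum))
```

At a 48 kHz sampling rate, for instance, `frame_len=9600` and `subframe_len=960` would correspond to 200 ms frames and 20 ms subframes. Those values are illustrative only, since the description says just "several hundred" and "several tens" of milliseconds.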

例えば、特徴パラメータ算出部８０では、サブフレーム単位で入力オーディオ信号の信号振幅の二乗和であるパワー値を判別情報として算出し、その算出されたパワー値に対するフレーム単位での統計量を求めることにより、パワー値に関する特徴パラメータｐｗを生成している。 For example, the feature parameter calculation unit 80 calculates, as discrimination information, the power value in subframe units, which is the sum of the squared signal amplitudes of the input audio signal, and generates the feature parameter pw relating to the power value by obtaining frame-level statistics of the calculated power values.
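As a minimal sketch of the power value, the sum of squared amplitudes can be computed per subframe and then summarized over the frame. The function names are hypothetical, and returning the mean and population variance is an assumption; the description does not say which statistic makes up pw.

```python
from statistics import fmean, pvariance


def subframe_power(subframe):
    """Power value: sum of the squared sample amplitudes in one subframe."""
    return sum(x * x for x in subframe)


def power_feature(frame, subframe_len):
    """Return (mean, variance) of the subframe power values within one frame."""
    powers = [subframe_power(frame[i:i + subframe_len])
              for i in range(0, len(frame) - subframe_len + 1, subframe_len)]
    return fmean(powers), pvariance(powers)


print(subframe_power([1, -2, 3]))
print(power_feature([1.0, 1.0, 2.0, 2.0], subframe_len=2))
```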

また、特徴パラメータ算出部８０では、サブフレーム単位で入力オーディオ信号の時間波形が振幅方向に零を横切る回数である零交差周波数を判別情報として算出し、その算出された零交差周波数に対するフレーム単位での統計量を求めることにより、零交差周波数に関する特徴パラメータｚｃを生成している。 The feature parameter calculation unit 80 also calculates, as discrimination information, the zero-crossing frequency in subframe units, which is the number of times the time waveform of the input audio signal crosses zero in the amplitude direction, and generates the feature parameter zc relating to the zero-crossing frequency by obtaining frame-level statistics of the calculated zero-crossing frequencies.
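Counting zero crossings in one subframe could look like the sketch below. The function name is hypothetical, and treating a sample that is exactly zero as non-negative is an assumption; the description only defines the quantity as the number of times the waveform crosses zero.

```python
def zero_crossings(subframe):
    """Count sign changes between consecutive samples.

    A sample equal to zero is treated as non-negative (an assumed convention).
    """
    count = 0
    for prev, cur in zip(subframe, subframe[1:]):
        if (prev >= 0) != (cur >= 0):
            count += 1
    return count


print(zero_crossings([1, -1, 1, -1]))
```

Dividing the count by the subframe duration would turn it into a rate, which may be closer to the word "frequency" in the original term; the description itself speaks of a count.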

さらに、特徴パラメータ算出部８０では、サブフレーム単位で入力オーディオ信号の周波数領域でのスペクトル変動を判別情報として算出し、その算出されたスペクトル変動に対するフレーム単位での統計量を求めることにより、スペクトル変動に関する特徴パラメータｓｆを生成している。 Further, the feature parameter calculation unit 80 calculates, as discrimination information, the spectral variation of the input audio signal in the frequency domain in subframe units, and generates the feature parameter sf relating to the spectral variation by obtaining frame-level statistics of the calculated spectral variation.
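The description does not give a formula for the spectral variation. One common choice, used here purely as an assumption, is spectral flux: the sum of squared differences between the magnitude spectra of consecutive subframes. The sketch uses a naive DFT so that it needs only the standard library; the function names are hypothetical.

```python
import cmath
import math


def magnitude_spectrum(subframe):
    """Naive DFT magnitudes for bins 0..N/2 (O(N^2); fine for a short illustration)."""
    n = len(subframe)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(subframe)))
            for k in range(n // 2 + 1)]


def spectral_flux(prev_subframe, cur_subframe):
    """Sum of squared differences between two consecutive magnitude spectra."""
    prev_mag = magnitude_spectrum(prev_subframe)
    cur_mag = magnitude_spectrum(cur_subframe)
    return sum((c - p) ** 2 for p, c in zip(prev_mag, cur_mag))


# Two 8-sample sinusoids at different frequencies give a clearly non-zero flux.
low = [math.sin(2 * math.pi * t / 8) for t in range(8)]
high = [math.sin(2 * math.pi * 2 * t / 8) for t in range(8)]
print(spectral_flux(low, low), spectral_flux(low, high))
```

A steady tone yields a flux near zero from one subframe to the next, whereas a change in spectral content produces a larger value.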

また、特徴パラメータ算出部８０では、サブフレーム単位で入力オーディオ信号における２チャンネルステレオの左右（ＬＲ）信号のパワー比（ＬＲパワー比）を判別情報として算出し、その算出されたＬＲパワー比に対するフレーム単位での統計量を求めることにより、ＬＲパワー比に関する特徴パラメータｌｒを生成している。 The feature parameter calculation unit 80 also calculates, as discrimination information, the power ratio between the left and right (LR) signals of two-channel stereo in the input audio signal (the LR power ratio) in subframe units, and generates the feature parameter lr relating to the LR power ratio by obtaining frame-level statistics of the calculated LR power ratios.
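A sketch of the LR power ratio for one subframe follows. The direction of the ratio (left over right) and the small constant `eps` that guards against division by zero on silent input are both assumptions, and the function name is hypothetical; the description states only that a power ratio of the left and right signals is used.

```python
def lr_power_ratio(left, right, eps=1e-12):
    """Ratio of left-channel power to right-channel power for one subframe.

    `eps` keeps the ratio defined when a channel is silent (assumed safeguard).
    """
    p_left = sum(x * x for x in left)
    p_right = sum(x * x for x in right)
    return (p_left + eps) / (p_right + eps)


print(lr_power_ratio([2.0, 2.0], [1.0, 1.0]))
```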

さらに、特徴パラメータ算出部８０では、サブフレーム単位で入力オーディオ信号のスペクトル平坦度を判別情報として算出し、その算出されたスペクトル平坦度に対するフレーム単位での統計量を求めることにより、ノイズ信号に関する特徴パラメータＳＦＭを生成している。 Further, the feature parameter calculation unit 80 calculates, as discrimination information, the spectral flatness of the input audio signal in subframe units, and generates the feature parameter SFM relating to noise signals by obtaining frame-level statistics of the calculated spectral flatness.
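Spectral flatness is commonly defined as the ratio of the geometric mean to the arithmetic mean of the power spectrum. The sketch below uses that standard definition as an assumption, since the description does not state a formula. The function names, the naive DFT, the exclusion of the DC and Nyquist bins, and the constant `eps` are illustrative choices, and the subframe is assumed to hold at least four samples.

```python
import cmath
import math


def power_spectrum(subframe):
    """Naive DFT power for bins 1..N/2-1 (DC and Nyquist are skipped)."""
    n = len(subframe)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(subframe))) ** 2
            for k in range(1, n // 2)]


def spectral_flatness(subframe, eps=1e-12):
    """Geometric mean divided by arithmetic mean of the power spectrum."""
    spec = [p + eps for p in power_spectrum(subframe)]
    log_mean = sum(math.log(p) for p in spec) / len(spec)
    return math.exp(log_mean) / (sum(spec) / len(spec))


impulse = [1.0] + [0.0] * 7                                # flat spectrum
tone = [math.sin(2 * math.pi * t / 8) for t in range(8)]  # single spectral peak
print(spectral_flatness(impulse), spectral_flatness(tone))
```

Values close to 1 indicate a noise-like, flat spectrum and values close to 0 indicate a tonal one, which is why the measure suits separating noise from voice or music.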

FIG. 5 is a flowchart summarizing an example of the processing operation in which the feature parameter calculation unit 80 generates, from the input audio signal, the various feature parameters for discriminating between speech signals and music signals, between music signals and background sound signals, and between speech or music signals and noise signals.

First, when the process is started (step S5a), the feature parameter calculation unit 80 extracts subframes of about several tens of milliseconds from the input audio signal in step S5b. Then, in step S5c, the feature parameter calculation unit 80 calculates a power value for each subframe from the input audio signal.

Thereafter, the feature parameter calculation unit 80 calculates, from the input audio signal and in units of subframes, the zero-crossing frequency in step S5d, the spectral variation in step S5e, and the LR power ratio in step S5f.

In step S5g, the feature parameter calculation unit 80 calculates the spectral flatness from the input audio signal in units of subframes. Similarly, in step S5h, it calculates any other discrimination information that can be calculated from the input audio signal in units of subframes.

Thereafter, once the various items of discrimination information calculated in units of subframes have been accumulated in step S5i for one frame of about several hundred milliseconds, the feature parameter calculation unit 80 generates the various feature parameters in step S5j by obtaining a frame-level statistic for each item of discrimination information, and ends the process (step S5k).
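The flow of FIG. 5 can be sketched roughly as follows. This is only a minimal illustration, not the embodiment's implementation: the function names, the mono downmix, the choice of the mean as the statistic for lr, and the small constant `eps` are all assumptions, and the spectral variation and spectral flatness (steps S5e and S5g) are omitted for brevity.

```python
from statistics import pvariance


def subframe_features(left, right):
    """Discrimination information for one subframe of 2-channel audio."""
    mono = [(l + r) / 2 for l, r in zip(left, right)]  # assumed downmix
    power = sum(s * s for s in mono) / len(mono)
    # Count sign changes between neighbouring samples.
    zero_cross = sum(1 for a, b in zip(mono, mono[1:]) if (a < 0) != (b < 0))
    p_left = sum(s * s for s in left) / len(left)
    p_right = sum(s * s for s in right) / len(right)
    eps = 1e-12  # assumed guard against division by zero
    lr_ratio = (p_left + eps) / (p_right + eps)
    return power, float(zero_cross), lr_ratio


def frame_parameters(left, right, subframe_len):
    """Frame-level statistics over all complete subframes in the frame."""
    feats = [
        subframe_features(left[i:i + subframe_len], right[i:i + subframe_len])
        for i in range(0, len(left) - subframe_len + 1, subframe_len)
    ]
    powers, crossings, ratios = zip(*feats)
    return {
        "pw": pvariance(powers),          # variance of subframe power
        "zc": pvariance(crossings),       # variance of zero-crossing counts
        "lr": sum(ratios) / len(ratios),  # mean LR power ratio (assumed statistic)
    }
```

With a frame made of one alternating subframe and one constant subframe, pw and zc come out large, which matches the tendency described for speech-like signals.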

Referring again to FIG. 3, the various feature parameters generated by the feature parameter calculation unit 80 as described above are supplied to the speech/music identification score calculation unit 81, the music/background sound identification score calculation unit 82, and the detection score calculation unit 83.

Of these, the speech/music identification score calculation unit 81 calculates, based on the various feature parameters generated by the feature parameter calculation unit 80, a speech/music identification score S1 that quantitatively indicates whether the audio signal supplied to the input terminal 73 is closer to the characteristics of a speech signal, such as spoken voice, or to the characteristics of a music (musical piece) signal, and outputs the score to the detection score calculation unit 83.

The music/background sound identification score calculation unit 82 calculates, based on the various feature parameters generated by the feature parameter calculation unit 80, a music/background sound identification score S2 that quantitatively indicates whether the audio signal supplied to the input terminal 73 is closer to the characteristics of a music signal or to the characteristics of a background sound signal, and outputs the score to the detection score calculation unit 83.

As will be described in detail later, the detection score calculation unit 83 generates, based on the speech/music identification score S1, the music/background sound identification score S2, and the feature parameters, a speech score SS indicating the likelihood that the audio signal supplied to the input terminal 73 contains a speech signal, a music score SM indicating the likelihood that it contains a music signal, and a noise score SN indicating the likelihood that it contains a noise signal.

Before describing the calculation of the speech/music identification score S1 and the music/background sound identification score S2, the properties of the various feature parameters will be explained. First, the feature parameter pw relating to the power value is described. In speech, utterance sections and silent sections generally appear alternately, so the difference in signal power between subframes becomes large and, viewed in units of frames, the variance of the power values across subframes tends to be large. Here, power variation refers to a feature quantity that focuses on how the power values calculated for the subframes vary within a longer frame section; specifically, the variance of the power or a similar value is used.

Next, regarding the feature parameter zc relating to the zero-crossing frequency: in addition to the difference between utterance sections and silent sections described above, a speech signal has a high zero-crossing frequency for consonants and a low one for vowels, so, viewed in units of frames, the variance of the zero-crossing frequency across subframes tends to be large.

Regarding the feature parameter sf relating to the spectral variation: the frequency characteristics of a speech signal fluctuate more strongly than those of a tonal signal such as a music signal, so, viewed in units of frames, the variance of the spectral variation tends to be large.

Regarding the feature parameter lr relating to the LR power ratio: in a music signal, instruments other than the vocals are often localized away from the center, so the power ratio between the left and right channels tends to be large.

Finally, the feature parameter SFM relating to noise signals exploits the spectral flatness typically observed in noise signals, and can be generated by obtaining a frame-level statistic of this spectral flatness.

Next, the calculation of the speech/music identification score S1 and the music/background sound identification score S2 by the speech/music identification score calculation unit 81 and the music/background sound identification score calculation unit 82 will be described. The calculation is not restricted to any single method; here, a method using a linear discriminant function is described.

In the method using a linear discriminant function, the weighting coefficients by which the various feature parameters needed for the speech/music identification score S1 and the music/background sound identification score S2 are multiplied are calculated by offline learning. The more effective a feature parameter is for discriminating the signal type, the larger the weighting coefficient it is given.

For the speech/music identification score S1, the weighting coefficients are calculated by inputting a large number of known speech signals and music signals, prepared in advance, as reference data and learning the feature parameters of that reference data. For the music/background sound identification score S2, they are calculated in the same way from a large number of known music signals and background sound signals prepared in advance as reference data.

The calculation of the speech/music identification score S1 is described first. Let the feature parameter set of the k-th frame of the reference data to be learned be represented by a vector x, and let the signal section {speech, music} to which the input audio signal belongs be represented by z, as follows.

$$x^k = (1, x_1^k, x_2^k, \ldots, x_n^k) \tag{1}$$
$$z^k \in \{-1, +1\} \tag{2}$$

Here, the elements of equation (1) correspond to the n extracted feature parameters. The values -1 and +1 in equation (2) correspond to a speech section and a music section, respectively; they are binary labels assigned manually in advance to the sections of the speech/music discrimination reference data according to the correct signal type. Further, from equation (2), the following linear discriminant function is set up.

$$f(x) = A_0 + A_1 x_1 + A_2 x_2 + \cdots + A_n x_n \tag{3}$$

The vector x is extracted for k = 1 to N (where N is the number of input frames of the reference data), and the weighting coefficient A_i (i = 0 to n) for each feature parameter is determined by solving the normal equations that minimize equation (4), the sum of squared errors between the evaluation value of equation (3) and the correct signal type of equation (2).

Using the weighting coefficients determined by learning, the evaluation value of the audio signal actually being identified is calculated from equation (3); the section is judged to be a speech section if f(x) < 0 and a music section if f(x) > 0. The value of f(x) at this time corresponds to the speech/music identification score S1. Thus,

$$\mathrm{S1} = A_0 + A_1 x_1 + A_2 x_2 + \cdots + A_n x_n$$

is calculated.
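The offline learning and scoring described above can be illustrated with a small least-squares sketch. It assumes that the squared-error criterion is the sum over k of (z^k - f(x^k))^2, which is how the text describes it; the helper names `solve`, `train_weights`, and `score` are hypothetical, and a real system would more likely rely on a numerical library than on this hand-written elimination.

```python
def solve(a, b):
    """Solve the linear system a.w = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    w = [0.0] * n
    for r in reversed(range(n)):
        w[r] = (m[r][n] - sum(m[r][c] * w[c] for c in range(r + 1, n))) / m[r][r]
    return w


def train_weights(features, labels):
    """Return [A_0, ..., A_n] minimizing the sum of squared errors."""
    xs = [[1.0] + list(f) for f in features]  # prepend the constant 1 of eq. (1)
    d = len(xs[0])
    # Normal equations: (X^T X) A = X^T z
    xtx = [[sum(x[i] * x[j] for x in xs) for j in range(d)] for i in range(d)]
    xtz = [sum(x[i] * z for x, z in zip(xs, labels)) for i in range(d)]
    return solve(xtx, xtz)


def score(weights, feature):
    """Evaluate the linear discriminant function for one frame."""
    return weights[0] + sum(a * x for a, x in zip(weights[1:], feature))
```

For example, training on one feature with labels -1 (speech) for small values and +1 (music) for large values yields a score whose sign separates the two groups.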

Similarly, for the calculation of the music/background sound identification score S2, let the feature parameter set of the k-th frame of the reference data to be learned be represented by a vector y, and let the signal section {background sound, music} to which the input audio signal belongs be represented by z, as follows.

$$y^k = (1, y_1^k, y_2^k, \ldots, y_m^k) \tag{5}$$
$$z^k \in \{-1, +1\} \tag{6}$$

Here, the elements of equation (5) correspond to the m extracted feature parameters. The values -1 and +1 in equation (6) correspond to a background sound section and a music section, respectively; they are binary labels assigned manually in advance to the sections of the music/background sound discrimination reference data according to the correct signal type. Further, from equation (6), the following linear discriminant function is set up.

$$f(y) = B_0 + B_1 y_1 + B_2 y_2 + \cdots + B_m y_m \tag{7}$$

The vector y is extracted for k = 1 to N (where N is the number of input frames of the reference data), and the weighting coefficient B_i (i = 0 to m) for each feature parameter is determined by solving the normal equations that minimize equation (8), the sum of squared errors between the evaluation value of equation (7) and the correct signal type of equation (6).

Using the weighting coefficients determined by learning, the evaluation value of the audio signal actually being identified is calculated from equation (7); the section is judged to be a background sound section if f(y) < 0 and a music section if f(y) > 0. The value of f(y) at this time corresponds to the music/background sound identification score S2. Thus,

$$\mathrm{S2} = B_0 + B_1 y_1 + B_2 y_2 + \cdots + B_m y_m$$

is calculated.

Note that the calculation of the speech/music identification score S1 and the music/background sound identification score S2 is not limited to multiplying the feature parameters by weighting coefficients obtained by offline learning with the linear discriminant function described above. For example, it is also possible to set an empirical threshold for the calculated value of each feature parameter, assign each feature parameter a weighted number of points according to the result of comparison with its threshold, and calculate the score from those points.
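The threshold-based alternative can be pictured with the short sketch below. The rule table, the sign convention (negative points for speech-like evidence, positive for music-like evidence), and the function name `threshold_score` are purely hypothetical; the text only states that thresholds are empirical and that points are weighted.

```python
def threshold_score(params, rules):
    """Sum the weighted points of every feature parameter that exceeds its threshold.

    params: feature parameter values, e.g. {"pw": 0.9, "lr": 1.0}
    rules:  per-parameter (threshold, points) pairs chosen empirically
    """
    total = 0.0
    for name, (threshold, points) in rules.items():
        if params[name] > threshold:
            total += points
    return total
```

With rules such as `{"pw": (0.5, -2.0), "lr": (1.2, 1.0)}`, a frame with large power variance and a small LR power ratio receives a negative, speech-leaning score.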

FIG. 6 is a flowchart summarizing an example of the processing operation in which the speech/music identification score calculation unit 81 and the music/background sound identification score calculation unit 82 calculate the speech/music identification score S1 and the music/background sound identification score S2, based on the weighting coefficients of the feature parameters calculated by offline learning with the linear discriminant function as described above.

That is, when the process is started (step S6a), the speech/music identification score calculation unit 81, in step S6b, applies to the various feature parameters calculated by the feature parameter calculation unit 80 the weighting coefficients learned in advance from the feature parameters of the speech/music discrimination reference data, and calculates the weighted feature parameters. Then, in step S6c, it calculates the sum of the weighted feature parameters as the speech/music identification score S1.

Likewise, in step S6d, the music/background sound identification score calculation unit 82 applies to the various feature parameters calculated by the feature parameter calculation unit 80 the weighting coefficients learned in advance from the feature parameters of the music/background sound discrimination reference data, and calculates the weighted feature parameters. Then, in step S6e, the music/background sound identification score calculation unit 82 calculates the sum of the weighted feature parameters as the music/background sound identification score S2, and ends the process (step S6f).

FIGS. 7 to 9 are flowcharts summarizing an example of the processing operation in which the detection score calculation unit 83 generates the speech score SS, the music score SM, and the noise score SN based on the speech/music identification score S1, the music/background sound identification score S2, and the feature parameters. That is, when the process is started (step S7a), the speech/music identification score S1, the music/background sound identification score S2, and the feature parameters are supplied to the detection score calculation unit 83 in step S7b.

Then, in step S7c, the detection score calculation unit 83 determines whether the speech/music identification score S1 is negative (S1 < 0, that is, closer to speech than to music). If it is negative (YES), the unit determines in step S7d whether the music/background sound identification score S2 is positive (S2 > 0, that is, closer to music than to background sound).

If the music/background sound identification score S2 is determined to be positive (YES), that is, when S1 < 0 and S2 > 0, the detection score calculation unit 83, in step S7e, sets the absolute value |S1| as the speech score SS, since the speech/music identification score S1 is negative. Then, in step S7f, because the signal is close to speech signal characteristics, the detection score calculation unit 83 sets the music score SM to 0.

If it is determined in step S7d that the music/background sound identification score S2 is not positive (S2 < 0, that is, closer to background sound than to music) (NO), that is, when S1 < 0 and S2 < 0, the detection score calculation unit 83, in step S7g, sets as the speech score SS the value (|S1| + αs·|S2|) obtained by adding αs·|S2| to the absolute value |S1|, so as to take into account the speech component contained in the background sound. Since the music/background sound identification score S2 is negative here, its absolute value |S2| is multiplied by a predetermined weighting coefficient αs set in advance for the speech component. Then, in step S7h, because the signal is close to speech signal characteristics, the detection score calculation unit 83 sets the music score SM to 0.

After step S7f or step S7h, the detection score calculation unit 83 updates, in step S7i, the correction value SS3 for stabilizing the speech score SS and the correction value SM3 for stabilizing the music score SM. In this update, when the speech score SS has been positive (SS > 0) Cs or more times in succession, the value (SS3 + βs) obtained by adding a predetermined stabilization coefficient βs, set in advance for the speech component, to the already calculated stabilization correction value SS3 becomes the new stabilization correction value SS3 for the speech score SS. In addition, the value (SM3 - γm) obtained by subtracting a predetermined stabilization coefficient γm, set in advance for the music component, from the already calculated stabilization correction value SM3 becomes the new stabilization correction value SM3 for the music score SM.

On the other hand, if it is determined in step S7c that the speech/music identification score S1 is not negative (S1 > 0, that is, closer to music than to speech) (NO), the detection score calculation unit 83 determines in step S8a whether the music/background sound identification score S2 is positive (S2 > 0, that is, closer to music than to background sound).

If the music/background sound identification score S2 is determined to be positive (YES), that is, when S1 > 0 and S2 > 0, the detection score calculation unit 83, in step S8b, sets the speech score SS to 0 because the signal is close to music signal characteristics. Then, in step S8c, it sets the speech/music identification score S1 as the music score SM.

If it is determined in step S8a that the music/background sound identification score S2 is not positive (S2 < 0, that is, closer to background sound than to music) (NO), that is, when S1 > 0 and S2 < 0, the detection score calculation unit 83, in step S8d, sets as the speech score SS the value (-S1 + αs·|S2|) obtained by adding αs·|S2| to -S1, where -S1 is the speech/music identification score S1 negated so that it corresponds to the degree of speech and αs·|S2| takes into account the speech component contained in the background sound. Since the music/background sound identification score S2 is negative here, its absolute value |S2| is multiplied by the predetermined weighting coefficient αs set in advance for the speech component.

Then, in step S8e, the detection score calculation unit 83 sets as the music score SM the value (S1 - αm·|S2|) obtained by subtracting αm·|S2| from the speech/music identification score S1, so as to take into account the music component contained in the background sound. Since the music/background sound identification score S2 is negative here, its absolute value |S2| is multiplied by a predetermined weighting coefficient αm set in advance for the music component.

After step S8c or step S8e, the detection score calculation unit 83 updates, in step S8f, the correction value SS3 for stabilizing the speech score SS and the correction value SM3 for stabilizing the music score SM. In this update, when the music score SM has been positive (SM > 0) Cm or more times in succession, the value (SS3 - γs) obtained by subtracting a predetermined stabilization coefficient γs, set in advance for the speech component, from the already calculated stabilization correction value SS3 becomes the new stabilization correction value SS3 for the speech score SS. In addition, the value (SM3 + βm) obtained by adding a predetermined stabilization coefficient βm, set in advance for the music component, to the already calculated stabilization correction value SM3 becomes the new stabilization correction value SM3 for the music score SM.

After step S7i or step S8f, the detection score calculation unit 83 clips the stabilization correction values SS3 and SM3 in step S7j. The stabilization correction value SS3 for the speech score SS is kept within the range between a preset minimum value SS3min and a preset maximum value SS3max, that is, SS3min ≦ SS3 ≦ SS3max. Likewise, the stabilization correction value SM3 for the music score SM is kept within the range between a preset minimum value SM3min and a preset maximum value SM3max, that is, SM3min ≦ SM3 ≦ SM3max.

Thereafter, in step S9a, the detection score calculation unit 83 performs stabilization correction of the speech score SS by adding the clipped stabilization correction value SS3 to it, and performs stabilization correction of the music score SM by adding the clipped stabilization correction value SM3 to it.
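Steps S7c through S9a can be condensed into the following sketch. The four-way branch and the update, clipping, and addition of SS3 and SM3 follow the description; everything else is an assumption made for illustration, including the default parameter values, the use of one common clipping range for both correction values, and the way the consecutive-count counters are reset (the text does not say how the runs of Cs or Cm frames are tracked).

```python
from dataclasses import dataclass


@dataclass
class StabilizerState:
    ss3: float = 0.0     # stabilization correction value SS3
    sm3: float = 0.0     # stabilization correction value SM3
    speech_run: int = 0  # consecutive frames with SS > 0 (assumed counter)
    music_run: int = 0   # consecutive frames with SM > 0 (assumed counter)


def clip(value, lo, hi):
    return max(lo, min(hi, value))


def detect_scores(s1, s2, st, alpha_s=0.5, alpha_m=0.5, beta_s=0.1,
                  gamma_s=0.1, beta_m=0.1, gamma_m=0.1, c_s=3, c_m=3,
                  lo=-1.0, hi=1.0):
    """Return the stabilized (SS, SM) for one frame and update the state."""
    if s1 < 0:  # closer to speech: steps S7d to S7h
        ss = abs(s1) if s2 > 0 else abs(s1) + alpha_s * abs(s2)
        sm = 0.0
        st.speech_run = st.speech_run + 1 if ss > 0 else 0
        st.music_run = 0
        if st.speech_run >= c_s:  # step S7i
            st.ss3 += beta_s
            st.sm3 -= gamma_m
    else:  # closer to music: steps S8a to S8e
        if s2 > 0:
            ss, sm = 0.0, s1
        else:
            ss = -s1 + alpha_s * abs(s2)
            sm = s1 - alpha_m * abs(s2)
        st.music_run = st.music_run + 1 if sm > 0 else 0
        st.speech_run = 0
        if st.music_run >= c_m:  # step S8f
            st.ss3 -= gamma_s
            st.sm3 += beta_m
    st.ss3 = clip(st.ss3, lo, hi)  # step S7j
    st.sm3 = clip(st.sm3, lo, hi)
    return ss + st.ss3, sm + st.sm3  # step S9a
```

Calling `detect_scores` once per frame with a persistent `StabilizerState` shows the intended effect: after a run of speech-like frames the speech score is nudged upward and the music score downward, which damps frame-to-frame flicker in the decision.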

Next, in step S9b, the detection score calculation unit 83 calculates a noise/non-noise identification base score S3. This base score S3 uses the feature parameter SFM and is calculated by obtaining a statistic of the spectral flatness for each of a plurality of frequency bands (low, middle, and high).

Thereafter, in step S9c, the detection score calculation unit 83 determines whether the noise/non-noise identification base score S3 is positive (S3 > 0). If it is positive (YES), the unit sets the base score S3 as the noise score SN in step S9d. If it is determined in step S9c that the base score S3 is not positive (NO), the detection score calculation unit 83 sets the noise score SN to 0 in step S9e.

After step S9d or step S9e, the detection score calculation unit 83 applies stabilization correction and clipping to the noise score SN in step S9f, performs inter-score adjustment correction in step S9g, and ends the process (step S9h).

This inter-score adjustment correction balances the speech score SS, the music score SM, and the noise score SN against one another. For example, when the music score SM and the noise score SN are both larger than a specified value, the music score SM is corrected downward according to the noise score SN so as to match the subjective impression.
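The noise score assignment of steps S9c to S9e and the example adjustment just described might look like the sketch below. The text gives no formula for the adjustment, so the linear reduction `sm - k * sn`, the parameter names `limit` and `k`, and their default values are hypothetical.

```python
def noise_score(s3):
    """Steps S9c to S9e: keep S3 when positive, otherwise 0."""
    return s3 if s3 > 0 else 0.0


def adjust_music_score(sm, sn, limit=0.5, k=0.5):
    """Lower SM according to SN when both exceed the specified value.

    The linear form and the defaults are assumptions for illustration only.
    """
    if sm > limit and sn > limit:
        return sm - k * sn
    return sm
```

A frame with SM = 1.0 and SN = 0.8 would have its music score reduced, while a frame whose music score is already below the limit is left unchanged.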

The detection score calculation unit 83 then outputs the speech score SS, the music score SM, and the noise score SN, after the inter-score adjustment correction, to the correction characteristic control unit 79 (see FIG. 3).

Referring again to FIG. 3, the sound quality control processing unit 72 includes an environmental sound masking characteristic calculation unit 84. A signal corresponding to the ambient environmental sound is supplied to the environmental sound masking characteristic calculation unit 84 via an input terminal 85. The signal supplied to the input terminal 85 is obtained from the signal corresponding to the ambient environmental sound picked up by the microphone MIC by suppressing, with an echo canceller or the like, the component of the reproduced audio signal that leaks back into the microphone.

The environmental sound masking characteristic calculation unit 84 calculates a noise masking level for the level of the environmental sound signal supplied to the input terminal 85, with reference to the frequency masking characteristics of human hearing. The noise masking level is obtained by superimposing, over the frequency components of the entire band, the frequency masking characteristics based on the power of each frequency band obtained by time-frequency conversion of the environmental sound signal.

The noise masking level calculated by the environmental sound masking characteristic calculation unit 84 is supplied to a masking correction gain calculation unit 86. As shown in FIG. 10, for each band in which the frequency characteristic (power) of the audio signal is at or below the noise masking level calculated by the environmental sound masking characteristic calculation unit 84, the masking correction gain calculation unit 86 calculates, as a correction gain value for that frequency band, a gain coefficient that raises the signal to or above the noise masking level, as indicated by the arrows in the figure, so that the signal component is not buried in noise and made hard to hear.

However, excessive gain correction and abrupt changes of gain over time sound unnatural, so the correction gain value Gm[k] is the value obtained by applying clipping and time smoothing to the calculated gain coefficient. Here, k is an index indicating the frequency band. The masking correction gain calculation unit 86 outputs the calculated correction gain value Gm[k] to the correction characteristic control unit 79.
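The gain calculation with clipping and time smoothing might be sketched as follows. This is a minimal Python illustration, not the patented implementation: the function name, the upper clip `max_gain`, and the one-pole smoothing constant `smoothing` are assumptions, since the text does not specify how the clipping or smoothing is performed.

```python
import math


def masking_correction_gain(signal_power, masking_level, prev_gain,
                            max_gain=4.0, smoothing=0.9):
    """Return per-band amplitude gains Gm[k] (hypothetical helper).

    signal_power, masking_level: linear power per frequency band.
    prev_gain: Gm[k] of the previous frame, used for time smoothing.
    max_gain and smoothing are assumed values chosen for illustration.
    """
    gains = []
    for p, m, g_prev in zip(signal_power, masking_level, prev_gain):
        # Amplitude gain that lifts power p up to the masking level m.
        # Silent bands (p == 0) are left untouched in this sketch.
        raw = math.sqrt(m / p) if 0.0 < p < m else 1.0
        clipped = min(max(raw, 1.0), max_gain)  # clipping
        # One-pole smoothing across frames to avoid abrupt gain changes.
        gains.append(smoothing * g_prev + (1.0 - smoothing) * clipped)
    return gains
```

Setting `smoothing=0.0` disables the time smoothing, which is convenient for checking the clipped values in isolation.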

Based on the speech score SS, music score SM, and noise score SN supplied from the detection score calculation unit 83, the correction gain value Gm[k] supplied from the masking correction gain calculation unit 86, and the like, the correction characteristic control unit 79 generates, for each of the sound quality control units 74 to 77, a sound quality control signal that independently controls the strength of its sound quality control processing.

FIGS. 11 and 12 are flowcharts summarizing an example of the processing by which the correction characteristic control unit 79 controls the sound quality control unit 77, which applies equalizer processing to the input audio signal, based on the speech score SS, music score SM, noise score SN, correction gain value, and the like.

That is, when the processing starts (step S11a), the correction characteristic control unit 79 normalizes, in step S11b, the correction gain value Gm[k] (> 1.0) supplied from the masking correction gain calculation unit 86. The normalized correction gain is hereinafter written Gmn[k]. The gain component that lifts the entire band (from the minimum value 1 to the maximum value k of the frequency band index), that is,
Gmg = min(Gm[1], Gm[2], ..., Gm[k])
is calculated as the global correction gain Gmg, and the correction gain is normalized with respect to Gmg as follows:
Gmn[k] = Gm[k] / Gmg
Consequently, min(Gmn[k]) = 1.0.
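The normalization in step S11b can be written in a few lines. The following Python sketch uses a hypothetical function name and a plain list for the per-band gains; it only restates the two formulas above.

```python
def normalize_correction_gain(gm):
    """Split Gm[k] into a global gain Gmg and a normalized shape Gmn[k]."""
    gmg = min(gm)                # Gmg = min(Gm[1], ..., Gm[k])
    gmn = [g / gmg for g in gm]  # Gmn[k] = Gm[k] / Gmg
    return gmg, gmn
```

Because every band is divided by the smallest gain, the smallest normalized value is always 1.0, and the overall lift is carried separately by Gmg.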

Next, in step S11c, the correction characteristic control unit 79 compares the speech score SS, music score SM, and noise score SN supplied from the detection score calculation unit 83 and determines whether the sound type with the highest score, that is, the dominant sound type, is speech. If the dominant sound type is determined to be speech (that is, the speech score SS is the highest) (YES), the correction characteristic control unit 79 selects, in step S11d, the coefficient group preset for speech, an example of which is shown in FIG. 13(a), in order to obtain the correction characteristic calculation weight coefficients used in later processing. This coefficient group suppresses the correction gain outside the speech band, so that when the reproduced sound is speech, emphasis outside the speech band does not make the speech harder to hear.

Then, in step S11e, the correction characteristic control unit 79 corrects the score of the dominant sound type determined above (the speech score SS) by taking the scores of the other sound types into account, thereby generating a corrected speech score SS', which is used to determine the required coefficients from the correction characteristic calculation weight coefficient group shown in FIG. 13(a). Specifically, the corrected speech score SS' is obtained by subtracting the larger of the music score SM and the noise score SN from the speech score SS. That is,
SS' = SS - max(SM, SN)

If it is determined in step S11c that the dominant sound type is not speech (NO), the correction characteristic control unit 79 determines, in step S12a, whether the dominant sound type is music. If it is determined to be music (that is, the music score SM is the highest) (YES), the unit selects, in step S12b, the coefficient group preset for music, an example of which is shown in FIG. 13(b), in order to obtain the correction characteristic calculation weight coefficients used in later processing. This coefficient group suppresses the correction gain in the mid range, outside the low and high ranges that matter for the sense of presence in music, so that when the reproduced sound is music, emphasis outside the music bands (low and high ranges) does not reduce that sense of presence.

Then, in step S12c, the correction characteristic control unit 79 corrects the score of the dominant sound type determined above (the music score SM) by taking the scores of the other sound types into account, thereby generating a corrected music score SM', which is used to determine the required coefficients from the correction characteristic calculation weight coefficient group shown in FIG. 13(b). Specifically, the corrected music score SM' is obtained by subtracting the larger of the speech score SS and the noise score SN from the music score SM. That is,
SM' = SM - max(SS, SN)

If it is determined in step S12a that the dominant sound type is not music (NO), the correction characteristic control unit 79 judges that the dominant sound type is noise (that is, the noise score SN is the highest) and selects, in step S12d, the coefficient group preset for noise, an example of which is shown in FIG. 13(c), in order to obtain the correction characteristic calculation weight coefficients used in later processing. This coefficient group suppresses the correction gain over the entire band, so that when the reproduced sound is noise, emphasis by gain correction does not instead make the sound louder and harder to listen to.

Then, in step S12e, the correction characteristic control unit 79 corrects the score of the dominant sound type determined above (the noise score SN) by taking the scores of the other sound types into account, thereby generating a corrected noise score SN', which is used to determine the required coefficients from the correction characteristic calculation weight coefficient group shown in FIG. 13(c). Specifically, the corrected noise score SN' is obtained by subtracting the larger of the speech score SS and the music score SM from the noise score SN. That is,
SN' = SN - max(SS, SM)
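The three branches (speech, music, noise) follow one pattern: pick the highest score, then subtract the larger of the other two. A minimal Python sketch of that pattern is shown below. The function name and the string labels are hypothetical, and resolving ties in flowchart order (speech, then music, then noise) is an assumption, since the text does not say how equal scores are handled.

```python
def dominant_type_and_corrected_score(ss, sm, sn):
    """Return (dominant sound type, corrected score) from SS, SM and SN."""
    scores = {"speech": ss, "music": sm, "noise": sn}
    # Same order as the flowchart: speech is tested first, then music,
    # and noise is the remaining case. Ties resolve in that order (assumed).
    if ss >= max(sm, sn):
        dominant = "speech"
    elif sm >= sn:
        dominant = "music"
    else:
        dominant = "noise"
    others = [value for name, value in scores.items() if name != dominant]
    # For example SS' = SS - max(SM, SN) when speech is dominant.
    return dominant, scores[dominant] - max(others)
```

The corrected score is therefore a margin: it is large when one sound type clearly dominates and close to zero when the classification is ambiguous.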

After step S11e, step S12c, or step S12e, the correction characteristic control unit 79 determines, in step S11f, the coefficients from the corresponding correction characteristic calculation weight coefficient group based on the corrected speech score SS', the corrected music score SM', or the corrected noise score SN'.

For example, when the dominant sound type is speech, the larger the corrected speech score SS', the more heavily the selected coefficients weight the speech band. These coefficients do not emphasize the speech band; rather, they keep gain-correction emphasis outside the speech band from making the speech harder to hear. Similarly, for music the low and high ranges are weighted, and for noise, the larger the score, the more strongly emphasis is suppressed over the entire band.

The normalized correction gain Gmn[k] is then corrected based on the determined correction characteristic calculation weight coefficients. With the correction characteristic calculation weight coefficient written as Wg[k], the correction gain Gmnw[k] after weighting is
Gmnw[k] = Wg[k] × Gmn[k]

However, if the weight coefficient brings Gmnw[k] to 1.0 or below, Gmnw[k] is set to 1.0. This is a measure to keep the correction gain based on the masking characteristic of the environmental sound from causing excessive correction or a change in timbre for some audio signal characteristics (sound types), so that the gain correction tends toward a flat characteristic.
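The weighting and the lower clip at 1.0 can be combined in a single expression. The Python sketch below assumes that the weight coefficients Wg[k] have already been chosen; the function name and the example values are hypothetical, and the contents of FIG. 13 are not reproduced here.

```python
def apply_weight(gmn, wg):
    """Weight the normalized gains and clip the result at a minimum of 1.0."""
    # Gmnw[k] = Wg[k] * Gmn[k], never allowed to fall below 1.0.
    return [max(w * g, 1.0) for w, g in zip(wg, gmn)]
```

For instance, `apply_weight([1.0, 2.0, 4.0], [0.5, 1.0, 0.5])` leaves the middle band unchanged, halves the gain of the last band, and clips the first band back to 1.0 instead of attenuating it.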

For example, suppose the dominant sound type of the audio signal is speech and the correction gain values based on the masking characteristic of the environmental sound already emphasize the speech band. Applying the weight coefficients as they are would emphasize the speech band excessively, but clipping the correction gain value so that it does not fall below 1.0 suppresses the attenuation of the low-range and high-range frequency components (and hence the emphasis of the speech band).

Conversely, when the correction gain values based on the masking characteristic of the environmental sound emphasize bands outside the speech band (the low or high range), that is, when the correction is inconsistent with the sound type, the correction would make the speech harder to hear, so the weighting lowers these correction gains. As a result, the gain correction characteristic moves toward flat in the frequency domain, which suppresses changes in timbre. The correction gain suppressed by the weight coefficients is compensated for by correcting the global gain.

The same holds when the dominant sound type of the audio signal is music, although the frequency characteristics are reversed.

Next, in step S11g, to compensate for the bands in which the weight coefficients prevent the gain correction based on the masking characteristic of the environmental sound from being satisfied, the correction characteristic control unit 79 finds the gain whose correction gain value was lowered most by the weight coefficients, that is, the gain with the highest correction rate. In other words, it searches for
min(Gmnw[k] / Gmn[k]) (< 1.0)
and takes this value as the maximum correction rate Rmnw_max. The search takes into account that Gmnw[k] is clipped at a minimum of 1.0.

Then, in step S11h, the correction characteristic control unit 79 calculates Gmgw, the global correction gain Gmg corrected on the basis of the maximum correction rate Rmnw_max, by
Gmgw = Gmg / Rmnw_max
and ends the processing (step S11i).
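Steps S11g and S11h can be sketched together in Python as follows. The function name is hypothetical, and the sketch assumes that the weighted gains passed in have already been clipped at 1.0, which is one way to read the remark that the search takes the clipping into account.

```python
def compensate_global_gain(gmg, gmn, gmnw):
    """Return (Rmnw_max, Gmgw) for the global gain compensation.

    gmn:  normalized gains Gmn[k] before weighting.
    gmnw: weighted gains Gmnw[k], already clipped at a minimum of 1.0.
    """
    # Rmnw_max = min over k of Gmnw[k] / Gmn[k]
    rmnw_max = min(w / n for w, n in zip(gmnw, gmn))
    gmgw = gmg / rmnw_max  # Gmgw = Gmg / Rmnw_max
    return rmnw_max, gmgw
```

If the equalizer were to apply the product Gmgw × Gmnw[k] in each band (an assumption; the text only says that both values are passed on), the band reduced most by the weighting would end up with its original gain Gm[k], while the other bands would receive somewhat more gain than before.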

According to the embodiment described above, the corrected correction gain Gmnw[k] and the corrected global correction gain Gmgw are supplied to the sound quality control unit 77, which applies equalizer processing to the input audio signal. This makes it possible to apply sound quality control processing that is appropriate to the environmental sound and also suited to the sound type (speech, music, noise) of the audio signal.

That is, by modifying the per-band gain correction derived from the masking characteristic of the environmental sound in light of the sound type determination for the audio signal, it is possible to perform sound quality control appropriate to the environmental sound while suppressing excessive sound quality control and control that is inconsistent with the sound type determination, and thus to obtain reproduced sound of natural quality with little change in timbre.

For sound quality control that, unlike the equalizer processing by the sound quality control unit 77, does not change the correction strength for each frequency band, for example the reverb processing by the sound quality control unit 74, the wide stereo processing by the sound quality control unit 75, and the center emphasis processing by the sound quality control unit 76, the correction strength can be controlled by changing the mixing gain between the original sound and its delayed signal.

FIG. 14 shows an example of the sound quality control unit 74, which applies reverb processing to the input audio signal, among the sound quality control units 74 to 76 (that is, excluding the sound quality control unit 77). The other sound quality control units 75 and 76 have substantially the same configuration and operation as the sound quality control unit 74, so their description is omitted.

In the sound quality control unit 74, the audio signal supplied to an input terminal 74a is fed to both a reverb processing unit 74b and a delay compensation unit 74c. The reverb processing unit 74b applies reverb processing, which gives the input audio signal an echo effect, and outputs the result to a variable gain amplification unit 74d.

The variable gain amplification unit 74d amplifies its input signal with a correction strength based on the sound quality control signal that is output from the correction characteristic control unit 79 and supplied via an input terminal 74e. The gain G of the variable gain amplification unit 74d is varied in the range of 0.0 to 1.0 according to the sound quality control signal.

The delay compensation unit 74c is provided to absorb the processing delay between the input audio signal and the audio signal obtained from the reverb processing unit 74b. The audio signal output from the delay compensation unit 74c is supplied to a variable gain amplification unit 74f.

The variable gain amplification unit 74f amplifies its input signal with a gain of 1.0 - G, where G is the gain of the variable gain amplification unit 74d. The audio signals output from the variable gain amplification units 74d and 74f are added by an addition unit 74g, and the sum is taken out from an output terminal 74h.
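The signal path through the two variable gain amplification units and the addition unit amounts to a crossfade between the processed and the delay-compensated signal. A minimal Python sketch, assuming both signals are already time-aligned lists of samples and using hypothetical names, is:

```python
def mix_effect(dry, wet, g):
    """Crossfade between the delay-compensated input and the processed signal.

    dry: samples from the delay compensation path (gain 1.0 - G).
    wet: samples from the effect path, e.g. reverb (gain G).
    g:   gain G in the range 0.0 to 1.0, set by the sound quality control signal.
    """
    if not 0.0 <= g <= 1.0:
        raise ValueError("G must lie in the range 0.0 to 1.0")
    # out[n] = G * wet[n] + (1.0 - G) * dry[n]
    return [g * w + (1.0 - g) * d for d, w in zip(dry, wet)]
```

With G = 0.0 the output is the unprocessed signal and with G = 1.0 it is the fully processed one, so the two gains always sum to 1.0 and the effect strength can be changed without changing the overall mix level.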

In the other sound quality control units 75 and 76, the reverb processing unit 74b of the sound quality control unit 74 is replaced by a wide stereo processing unit, a center emphasis processing unit, or the like.

FIG. 15 is a flowchart summarizing an example of the processing by which the correction characteristic control unit 79 controls the sound quality control unit 74, which applies reverb processing to the input audio signal, based on the speech score SS, music score SM, noise score SN, correction gain value, and the like.

That is, when the processing starts (step S15a), the correction characteristic control unit 79 normalizes, in step S15b, the correction gain value Gm[k] supplied from the masking correction gain calculation unit 86. The normalization method is the same as that described for step S11b.

Next, in step S15c, the correction characteristic control unit 79 calculates a music gain correction base value Gbm from the normalized correction gain value Gmn[k], as a parameter for calculating the correction score used to modify the music score SM. The music gain correction base value Gbm is calculated from the normalized correction gain value Gmn[k] and a correction strength calculation weight coefficient Wsm[k], such as that shown in FIG. 16(b), by
Gbm = Σ(Wsm[k] × Gmn[k])
FIG. 16(b) shows an example of the correction strength calculation weight coefficient Wsm[k] preset for music, in which the mid range is weighted. In other words, the coefficients emphasize a frequency characteristic opposite to that typical of music. The music gain correction base value Gbm is therefore an index of how much of the correction gain Gmn[k] falls in bands that are relatively unimportant for a music signal. It reflects the degree of gain correction outside the music bands: the larger the value, the stronger the gain correction outside the music bands is presumed to be, and the more strongly the score is corrected in favor of music.

Next, in step S15d, the correction characteristic control unit 79 calculates, from the music gain correction base value Gbm, a music strength correction score Sbm for modifying the music score SM. The conversion is arranged so that Sbm increases with Gbm. For example, Gbm is converted by the linear function Sbm = α × Gbm (where α is a conversion coefficient), and the result is clipped at the maximum value of the music strength correction score Sbm.

Then, in step S15e, the correction characteristic control unit 79 adds the music strength correction score Sbm to the original music score SM, that is, it performs the operation
SM = SM + Sbm
so that the music score SM is corrected to strengthen the acoustic effect intended for music (in this case, the reverb processing).

Similarly, in step S15f, the correction characteristic control unit 79 calculates a speech gain correction base value Gbs from the normalized correction gain value Gmn[k], as a parameter for calculating the correction score used to modify the speech score SS. The speech gain correction base value Gbs is calculated from the normalized correction gain value Gmn[k] and a correction strength calculation weight coefficient Wss[k], such as that shown in FIG. 16(a), by
Gbs = Σ(Wss[k] × Gmn[k])
FIG. 16(a) shows an example of the correction strength calculation weight coefficient Wss[k] preset for speech, in which the bands outside the speech band (the low and high ranges) are weighted. In other words, the coefficients emphasize a frequency characteristic opposite to that typical of speech. The speech gain correction base value Gbs is therefore an index of how much of the correction gain Gmn[k] falls in bands that are relatively unimportant for a speech signal. It reflects the degree of gain correction outside the speech band: the larger the value, the stronger the gain correction outside the speech band is presumed to be, and the more strongly the score is corrected in favor of speech.

Next, in step S15g, the correction characteristic control unit 79 calculates, from the speech gain correction base value Gbs, a speech strength correction score Sbs for modifying the speech score SS. The conversion is arranged so that Sbs increases with Gbs. For example, Gbs is converted by the linear function Sbs = β × Gbs (where β is a conversion coefficient), and the result is clipped at the maximum value of the speech strength correction score Sbs.

Then, in step S15h, the correction characteristic control unit 79 adds the speech strength correction score Sbs to the original speech score SS, that is, it performs the operation
SS = SS + Sbs
so that the speech score SS is corrected to strengthen the acoustic effect intended for speech.
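The music and speech score corrections in steps S15c to S15h share the same three operations: a weighted sum, a linear conversion with clipping, and an addition. The Python sketch below expresses them once with hypothetical parameter names; the weights, the conversion coefficient (α or β), and the maximum value are placeholders, since the text gives no numeric values for them.

```python
def corrected_score(score, gmn, weights, scale, s_max):
    """Add a masking-dependent correction to a sound type score.

    score:   original score (SM or SS).
    gmn:     normalized correction gains Gmn[k].
    weights: correction strength calculation weights (Wsm[k] or Wss[k]).
    scale:   conversion coefficient (alpha or beta), an assumed value.
    s_max:   maximum value of the correction score, an assumed value.
    """
    base = sum(w * g for w, g in zip(weights, gmn))  # Gbm or Gbs
    bonus = min(scale * base, s_max)                 # Sbm or Sbs, clipped
    return score + bonus                             # SM + Sbm or SS + Sbs
```

Calling the function once with the music weights and once with the speech weights yields the two corrected scores from which the sound quality control signal is then generated.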

After that, in step S15i, the correction characteristic control unit 79 generates the sound quality control signal to be supplied to the input terminal 74e of the sound quality control unit 74, based on the music score SM corrected in step S15e and the speech score SS corrected in step S15h, outputs it to the sound quality control unit 74, and ends the processing (step S15j).

According to the embodiment described with reference to FIGS. 14 to 16, the music score SM and the speech score SS are corrected in consideration of the environmental sound, and the sound quality control signal generated from the corrected scores is supplied to the sound quality control unit 74, which applies reverb processing to the input audio signal. This makes it possible to apply sound quality control processing that is appropriate to the environmental sound and also suited to the sound type (speech, music) of the audio signal.

That is, by taking the masking characteristic of the environmental sound into account when performing sound quality control according to the sound type of the audio signal, it is possible to perform sound quality control appropriate to the sound type, to strengthen the sound quality control effect for audio signals that would otherwise be masked by the environmental sound and so achieve more effective control, and to prevent excessive sound quality correction caused by environmental sound that does not match the reproduced audio signal.

In the embodiments described above, reverb, wide stereo, center emphasis, equalizer, and the like were given as the sound quality elements to be corrected. The elements are not limited to these, however, and sound quality control can of course be applied to various other controllable elements, including, for example, surround.

The present invention is not limited to the embodiments exactly as described above; at the implementation stage, the constituent elements can be modified and embodied in various ways without departing from the gist of the invention. Various inventions can also be formed by appropriately combining the constituent elements disclosed in the embodiments. For example, some constituent elements may be deleted from all those shown in an embodiment, and constituent elements of different embodiments may be combined as appropriate.

DESCRIPTION OF SYMBOLS 11 ... digital television broadcast receiver, 12 ... cabinet, 13 ... support stand, 14 ... video display, 15 ... speaker, 16 ... operation unit, 17 ... remote controller, 18 ... light receiving unit, MIC ... microphone, 19 ... first memory card, 20 ... second memory card, 21 ... first LAN terminal, 22 ... second LAN terminal, 23 ... USB terminal, 24 ... IEEE1394 terminal, 25 ... HDD, 26 ... hub, 27 ... HDD, 28 ... PC, 29 ... DVD recorder, 30 ... analog transmission line, 31 ... broadband router, 32 ... network, 33 ... PC, 34 ... mobile phone, 35 ... hub, 36 ... mobile phone, 37 ... digital camera, 38 ... card reader/writer, 39 ... HDD, 40 ... keyboard, 41 ... AV-HDD, 42 ... D-VHS, 43 ... antenna, 44 ... input terminal, 45 ... tuner, 46 ... PSK demodulator, 47 ... TS decoder, 48 ... signal processing unit, 49 ... antenna, 50 ... input terminal, 51 ... tuner, 52 ... OFDM demodulator, 53 ... TS decoder, 54 ... tuner, 55 ... analog demodulator, 56 ... graphic processing unit, 57 ... audio processing unit, 58a to 58d ... input terminals, 59 ... OSD signal generation unit, 60 ... video processing unit, 61, 62 ... output terminals, 63 ... control unit, 63a ... CPU, 63b ... ROM, 63c ... RAM, 63d ... nonvolatile memory, 64 ... card I/F, 65 ... card holder, 66 ... card I/F, 67 ... card holder, 68, 69 ... communication I/F, 70 ... USB I/F, 71 ... IEEE1394 I/F, 72 ... sound quality control processing unit, 73 ... input terminal, 74 ... sound quality control unit, 74a ... input terminal, 74b ... reverb processing unit, 74c ... delay compensation unit, 74d ... variable gain amplification unit, 74e ... input terminal, 74f ... variable gain amplification unit, 74g ... addition unit, 74h ... output terminal, 75 to 77 ... sound quality control units, 78 ... output terminal, 79 ... correction characteristic control unit, 80 ... feature parameter calculation unit, 81 ... speech/music discrimination score calculation unit, 82 ... music/background sound discrimination score calculation unit, 83 ... detection score calculation unit, 84 ... environmental sound masking characteristic calculation unit, 85 ... input terminal, 86 ... masking correction gain calculation unit.

Claims

Correction gain calculating means for calculating a correction gain for correcting the gain for each frequency band so that the reproduced sound is not masked by the surrounding environmental sound with respect to the input audio signal;
Correction gain correction means for correcting a correction gain for each frequency band based on a weighting factor for each frequency band according to a dominant sound type among one or more sound types included in the input audio signal;
Sound quality control means for performing sound quality control processing on the input audio signal based on the sound quality control signal generated using the correction gain for each frequency band corrected by the correction gain correction means;
A sound quality control device comprising:
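
The three means recited in the claim above can be pictured with a minimal sketch. It assumes four frequency bands, levels expressed in dB, a correction gain equal to the shortfall of the signal below the masking level, and multiplicative per-band weights. The band count, the cap `max_gain_db`, the table `BAND_WEIGHTS`, and every numeric value are hypothetical and are not taken from the claims.

```python
# Hypothetical per-band weights for each dominant sound type (values are assumptions).
BAND_WEIGHTS = {
    "speech": [0.5, 1.0, 1.0, 0.5],
    "music": [1.0, 0.75, 0.75, 1.0],
}


def masking_correction_gain(signal_db, masking_db, max_gain_db=12.0):
    """Per-band boost in dB that lifts the signal up to the masking level, capped."""
    return [min(max(m - s, 0.0), max_gain_db) for s, m in zip(signal_db, masking_db)]


def weight_correction_gain(gain_db, dominant_type):
    """Scale each band's correction gain by the weight for the dominant sound type."""
    return [g * w for g, w in zip(gain_db, BAND_WEIGHTS[dominant_type])]


def apply_gain(signal_db, gain_db):
    """Apply the corrected per-band gain to the band levels (all values in dB)."""
    return [s + g for s, g in zip(signal_db, gain_db)]


signal_db = [60.0, 55.0, 50.0, 45.0]   # band levels of the input audio signal
masking_db = [58.0, 61.0, 70.0, 47.0]  # band levels masked by the environmental sound
gain = masking_correction_gain(signal_db, masking_db)
corrected = weight_correction_gain(gain, "speech")
output_db = apply_gain(signal_db, corrected)
```

With these assumed values, a speech-dominant input keeps the full boost in the two middle bands and halves it in the outer bands, which is one plausible reading of weighting the correction gain by sound type.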

Correction gain calculating means for calculating, for an input audio signal, a correction gain that corrects the gain of each frequency band so that the reproduced sound is not masked by surrounding environmental sound;
Score calculating means for calculating, from the input audio signal, a score for each sound type indicating the likelihood that the sound type is included;
Sound quality control means for performing sound quality control processing on the input audio signal based on a sound quality control signal supplied from outside;
Sound type determination means for comparing the scores for the respective sound types calculated by the score calculating means to determine the dominant sound type;
First selection means for selecting, from among a plurality of types of weighting factors that are preset for each sound type and each have a plurality of selectable coefficients for each frequency band of the input audio signal, the weighting factor corresponding to the sound type determined by the sound type determination means;
Second selection means for selecting a desired coefficient from the plurality of coefficients included in the weighting factor selected by the first selection means, based on a score corresponding to a sound type other than the sound type determined by the sound type determination means;
Correction gain correction means for correcting the correction gain for each frequency band calculated by the correction gain calculation means based on the coefficient for each frequency band of the input audio signal selected by the second selection means;
Sound quality control signal generation means for generating a sound quality control signal to be supplied to the sound quality control means based on the correction gain for each frequency band corrected by the correction gain correction means;
A sound quality control device comprising:
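
The two-stage selection in the claim above can be sketched as follows. This is a minimal illustration that assumes two sound types, two candidate coefficients per band, and a simple threshold on the highest non-dominant score. The table `WEIGHT_TABLES`, the `threshold` parameter, and all values are hypothetical; the claim does not specify how the second selection maps a score to a coefficient.

```python
# Hypothetical tables: for each sound type, every band holds two candidate
# coefficients (index 0: other scores are low, index 1: other scores are high).
WEIGHT_TABLES = {
    "speech": [(0.5, 0.25), (1.0, 0.75), (1.0, 0.75), (0.5, 0.25)],
    "music": [(1.0, 0.75), (0.75, 0.5), (0.75, 0.5), (1.0, 0.75)],
}


def dominant_sound_type(scores):
    """Compare the per-type scores and return the type with the highest score."""
    return max(scores, key=scores.get)


def select_band_coefficients(scores, threshold=0.5):
    """Return the dominant type and one selected coefficient per band."""
    dominant = dominant_sound_type(scores)
    table = WEIGHT_TABLES[dominant]  # first selection: table for the dominant type
    # Highest score among the non-dominant types (0.0 if there is none).
    other = max((v for k, v in scores.items() if k != dominant), default=0.0)
    index = 1 if other >= threshold else 0  # second selection (assumed rule)
    return dominant, [band[index] for band in table]


def correct_gain(gain_db, coefficients):
    """Correct the per-band correction gain with the selected coefficients."""
    return [g * c for g, c in zip(gain_db, coefficients)]


dominant, coefficients = select_band_coefficients({"speech": 0.8, "music": 0.3})
corrected_gain = correct_gain([0.0, 6.0, 12.0, 2.0], coefficients)
```

The thresholded index is only one way to let a non-dominant score influence the choice; an implementation could equally quantize that score into more than two levels.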

Correction gain calculating means for calculating, for an input audio signal, a correction gain that corrects the gain of each frequency band so that the reproduced sound is not masked by surrounding environmental sound;
Score calculating means for calculating, from the input audio signal, a score for each sound type indicating the likelihood that the sound type is included;
Sound quality control means for performing sound quality control processing on the input audio signal based on a sound quality control signal supplied from outside;
Score correcting means for correcting the score for each sound type calculated by the score calculating means, based on the correction gain for each frequency band of the input audio signal calculated by the correction gain calculating means and a weighting factor preset for each sound type;
Sound quality control signal generation means for generating a sound quality control signal to be supplied to the sound quality control means based on the score for each sound type corrected by the score correction means;
A sound quality control device comprising:
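
In the claim above the correction gain is not applied to the signal directly; it adjusts the per-type scores from which the sound quality control signal is generated. The sketch below shows one possible form, assuming scores in the range 0 to 1 and a multiplicative correction driven by the weighted, normalized band gains. The formula, the table `TYPE_WEIGHTS`, and the constant `MAX_GAIN_DB` are assumptions made for illustration only; the claim does not state how the correction is computed.

```python
# Hypothetical per-band weights preset for each sound type (values are assumptions).
TYPE_WEIGHTS = {
    "speech": [0.0, 0.5, 0.5, 0.0],
    "music": [0.25, 0.25, 0.25, 0.25],
}
MAX_GAIN_DB = 12.0  # assumed upper bound of the masking correction gain


def correct_score(score, gain_db, weights):
    """Raise a score in [0, 1] in proportion to the weighted band gains (assumed form)."""
    emphasis = sum(w * g / MAX_GAIN_DB for w, g in zip(weights, gain_db))
    return min(1.0, score * (1.0 + emphasis))


def sound_quality_control_signal(scores, gain_db):
    """Build a control signal as the corrected score for each sound type."""
    return {t: correct_score(s, gain_db, TYPE_WEIGHTS[t]) for t, s in scores.items()}


control = sound_quality_control_signal(
    {"speech": 0.4, "music": 0.5},  # scores from the score calculating means
    [0.0, 6.0, 12.0, 0.0],          # per-band masking correction gain in dB
)
```

With zero correction gain the scores pass through unchanged, so under this assumed form the control signal only departs from the raw scores when the environmental sound actually masks some band.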

Correction gain calculating means for calculating, for an input audio signal, a correction gain that corrects the gain of each frequency band so that the reproduced sound is not masked by surrounding environmental sound;
Score calculating means for calculating, from the input audio signal, a score for each sound type indicating the likelihood that the sound type is included;
Sound quality control means for performing sound quality control processing on the input audio signal based on a sound quality control signal supplied from outside;
Speech score correction means for correcting the speech score calculated by the score calculating means, which indicates the likelihood that a speech signal is included, based on the correction gain for each frequency band of the input audio signal calculated by the correction gain calculating means and a weighting factor preset for the speech signal included in the input audio signal;
Music score correction means for correcting the music score calculated by the score calculating means, which indicates the likelihood that a music signal is included, based on the correction gain for each frequency band of the input audio signal calculated by the correction gain calculating means and a weighting factor preset for the music signal included in the input audio signal;
Sound quality control signal generating means for generating a sound quality control signal to be supplied to the sound quality control means, based on the speech score corrected by the speech score correction means and the music score corrected by the music score correction means;
A sound quality control device comprising:

Correction gain calculating means for calculating, for an input audio signal, a correction gain that corrects the gain of each frequency band so that the reproduced sound is not masked by surrounding environmental sound;
Score calculating means for calculating, from the input audio signal, a score for each sound type indicating the likelihood that the sound type is included;
Sound quality control means for performing sound quality control processing on the input audio signal based on a sound quality control signal supplied from outside;
Feature parameter calculation means for calculating various feature parameters for determining the sound type from the input audio signal;
Speech/music identification score calculating means for calculating a speech/music identification score indicating whether the input audio signal is closer to a speech signal or a music signal, based on the various feature parameters calculated by the feature parameter calculating means;
Music/background sound identification score calculating means for calculating a music/background sound identification score indicating whether the input audio signal is closer to a music signal or a background sound signal, based on the various feature parameters calculated by the feature parameter calculating means;
Score calculation means for calculating, based on feature parameters for discriminating noise, the speech/music identification score, and the music/background sound identification score, a speech score indicating the likelihood that a speech signal is included, a music score indicating the likelihood that a music signal is included, and a noise score indicating the likelihood that a noise signal is included;
Sound quality control signal generating means for generating a sound quality control signal to be supplied to the sound quality control means, based on the correction gain for each frequency band calculated by the correction gain calculating means and the speech score, music score, and noise score calculated by the score calculation means;
A sound quality control device comprising:

The sound quality control device according to claim 1, wherein the sound quality control means performs at least one of reverberation processing, wide stereo processing, center enhancement processing, equalizer processing, and surround processing on the input audio signal.

Calculating, for an input audio signal, a correction gain that corrects the gain of each frequency band so that the reproduced sound is not masked by surrounding environmental sound;
Correcting the correction gain for each frequency band based on a weighting factor for each frequency band that depends on the dominant sound type among one or more sound types included in the input audio signal;
Performing a sound quality control process on the input audio signal based on the sound quality control signal generated using the corrected correction gain for each frequency band;
A sound quality control method comprising:

Calculating, for an input audio signal, a correction gain that corrects the gain of each frequency band so that the reproduced sound is not masked by surrounding environmental sound;
Calculating, from the input audio signal, a score for each sound type indicating the likelihood that the sound type is included;
Performing a sound quality control process on the input audio signal based on a sound quality control signal supplied from outside;
Comparing the calculated scores for the respective sound types to determine the dominant sound type;
Selecting, from among a plurality of types of weighting factors that are preset for each sound type and each have a plurality of selectable coefficients for each frequency band of the input audio signal, a first weighting factor corresponding to the determined sound type;
Selecting a desired second weighting factor from the plurality of coefficients included in the selected first weighting factor, based on a score corresponding to a sound type other than the determined sound type;
Correcting the calculated correction gain for each frequency band based on the selected second weighting factor for each frequency band of the input audio signal;
Generating the sound quality control signal based on the corrected correction gain for each frequency band;
A sound quality control method comprising:

Calculating, for an input audio signal, a correction gain that corrects the gain of each frequency band so that the reproduced sound is not masked by surrounding environmental sound;
Calculating, from the input audio signal, a score for each sound type indicating the likelihood that the sound type is included;
Performing a sound quality control process on the input audio signal based on a sound quality control signal supplied from outside;
Correcting the score for each calculated sound type based on the calculated correction gain for each frequency band of the input audio signal and a weighting factor set in advance for each sound type;
Generating the sound quality control signal based on the corrected score for each sound type;
A sound quality control method comprising:

Calculating, for an input audio signal, a correction gain that corrects the gain of each frequency band so that the reproduced sound is not masked by surrounding environmental sound;
Calculating, from the input audio signal, a score for each sound type indicating the likelihood that the sound type is included;
Performing a sound quality control process on the input audio signal based on a sound quality control signal supplied from outside;
Correcting the calculated speech score, which indicates the likelihood that a speech signal is included, based on the calculated correction gain for each frequency band of the input audio signal and a weighting factor preset for the speech signal included in the input audio signal;
Correcting the calculated music score, which indicates the likelihood that a music signal is included, based on the calculated correction gain for each frequency band of the input audio signal and a weighting factor preset for the music signal included in the input audio signal;
Generating the sound quality control signal based on the corrected speech score and the corrected music score;
A sound quality control method comprising:

Calculating, for an input audio signal, a correction gain that corrects the gain of each frequency band so that the reproduced sound is not masked by surrounding environmental sound;
Calculating, from the input audio signal, a score for each sound type indicating the likelihood that the sound type is included;
Performing a sound quality control process on the input audio signal based on a sound quality control signal supplied from outside;
Calculating various feature parameters for determining the type of sound from the input audio signal;
Calculating a speech/music identification score indicating whether the input audio signal is closer to a speech signal or a music signal, based on the calculated feature parameters;
Calculating a music/background sound identification score indicating whether the input audio signal is closer to a music signal or a background sound signal, based on the calculated feature parameters;
Calculating, based on feature parameters for discriminating noise, the speech/music identification score, and the music/background sound identification score, a speech score indicating the likelihood that a speech signal is included, a music score indicating the likelihood that a music signal is included, and a noise score indicating the likelihood that a noise signal is included;
Generating the sound quality control signal based on the calculated correction gain for each frequency band and the calculated speech score, music score, and noise score;
A sound quality control method comprising:

A process of calculating, for an input audio signal, a correction gain that corrects the gain of each frequency band so that the reproduced sound is not masked by surrounding environmental sound;
A process of correcting the correction gain for each frequency band based on a weighting factor for each frequency band that depends on the dominant sound type among one or more sound types included in the input audio signal;
A process of performing a sound quality control process on the input audio signal based on the sound quality control signal generated using the corrected correction gain for each frequency band;
A sound quality control program that causes a computer to execute the above processes.