JP4381892B2

JP4381892B2 - Transmitter and receiver for sound quality correction transmission

Info

Publication number: JP4381892B2
Application number: JP2004167087A
Authority: JP
Inventors: 馨渡辺; 靖茂中山; 智康小森
Original assignee: Japan Broadcasting Corp
Current assignee: Japan Broadcasting Corp
Priority date: 2004-06-04
Filing date: 2004-06-04
Publication date: 2009-12-09
Anticipated expiration: 2024-06-04
Also published as: JP2005348216A

Description

The present invention relates to a transmitting device and a receiving device for sound quality correction transmission. More particularly, it relates to a transmitting device and a receiving device that perform sound quality correction suited to each individual and thereby provide high-accuracy audio.

Conventionally, broadcast programs achieve an effective production by superimposing background sound (sound effects) such as BGM on the narration. The program audio is mixed at a fixed, predetermined mixing ratio, encoded, and delivered to the viewer as an audio signal. The viewer therefore either reproduces the audio signal decoded by a receiver as-is, or reproduces it with a preset sound quality correction characteristic. As a result, it has been difficult to perform high-accuracy correction that accounts for the characteristics of each program's audio signal. It has also been difficult to perform automatic sound quality correction suited to personal preference.

One conventional technique measures, during program production, the perceptual balance between the narration, the background sound, and so on. It then uses this measurement as a control signal to control the mixing (see, for example, Patent Document 1).

Another method lets viewers listen with the balance between "voice" and non-voice music or sound effects set to suit themselves (see, for example, Patent Document 2). In Patent Document 2, the normal audio signal and an audio signal other than the human voice are assigned to different channels and multiplexed.

Patent Document 1: JP 2001-76460 A
Patent Document 2: JP H10-327386 A

With the technique of Patent Document 1, the viewer can select a preferred version from audio produced in advance by the broadcaster. However, audio produced by the broadcaster is not necessarily optimal for each individual.

With the technique of Patent Document 2, the receiver gets two signals that differ in the mixing balance between the human voice and other sounds. This allows the mixing balance to be adjusted to the viewer to some extent, but the result may still not be optimal for every viewer.

The present invention has been made in view of the above problems. Its object is to provide a transmitting device and a receiving device for sound quality correction transmission that perform sound quality correction suited to each individual (each viewer) and thereby provide high-accuracy audio.

To solve the above problems, the present invention adopts the means described below.

The invention according to claim 1 is a transmitting device. It transmits a program signal consisting of video and audio, where the audio has an audio signal that includes narration sound and background sound. It causes the receiving side to correct the sound quality of the audio in the received program signal in accordance with viewer information, which includes auditory measurement data for each viewer.

The transmitting device comprises:
- Feature data generation means. This converts either the narration sound or the background sound in the audio signal into MDCT (Modified Discrete Cosine Transform) frequency data. The converted frequency data is made up of a plurality of frequency bands; predetermined bands are selected from them and combined. The maximum value of the total energy of the combined MDCT frequency data is generated as the audio feature data of those frequency bands. The generated audio feature data is then multiplexed with the audio.
- Sound quality correction data generation means. This generates sound quality correction data so that the receiving side can correct the sound quality of the audio on the basis of the viewer information. The data contains question information, including a questionnaire for each viewer about how easy sounds are to hear. It also contains audio information associated with the question information, including audio signals for measuring the individual's hearing.
- First transmission means, which transmits the program signal containing the audio feature data to the receiving side.
- Second transmission means, which transmits the sound quality correction data to the receiving side.

The viewer information includes auditory measurement data obtained as the viewer's responses to the sound quality correction data. This data consists of the preferred mixing ratio of narration sound to background sound and the hearing level for each predetermined frequency band.

The sound quality correction performed on the receiving side proceeds as follows:
1. From the viewer information, correction data is generated. It consists of a correction conversion table holding the mixing ratio and volume levels set according to the hearing level for each predetermined frequency band.
2. The audio signal containing the narration sound and background sound is transformed by the MDCT.
3. Each frequency band of the resulting MDCT frequency data is multiplied by the audio feature data.
4. The product is multiplied by, or has added to or subtracted from it, the values taken from the correction conversion table for each frequency band: the mixing ratio and the per-viewer volume level.

According to the invention of claim 1, the sound quality correction data is generated in correspondence with the audio feature data. The receiving side can therefore perform high-accuracy sound quality correction efficiently, suited to each individual. Because the question information comes with associated audio information, the viewer can easily make the settings on the receiving side, and the sound quality correction data can be generated efficiently. Correcting the narration sound and background sound, which are mixed and output together, relative to each other achieves a more effective production. Furthermore, the audio feature data can be generated efficiently and with high accuracy.

The invention according to claim 2 is a receiving device. It receives a program signal transmitted from the transmitting side. It then corrects the sound quality of the audio in that signal, which consists of an audio signal with narration sound and background sound, in accordance with viewer information for each viewer.

The receiving device comprises:
- Viewer information acquisition means. The sound quality correction data transmitted from the transmitting side contains question information, including a questionnaire for each viewer about how easy sounds are to hear. It also contains associated audio information, including audio signals for measuring the individual's hearing. Each viewer responds to this material, and this means acquires the resulting viewer information. That information includes auditory measurement data: the preferred mixing ratio of narration sound to background sound and the hearing level for each predetermined frequency band.
- Sound quality correction data generation means. From the auditory measurement data, this generates correction data consisting of a correction conversion table. The table holds the mixing ratio and volume levels set according to the hearing level for each predetermined frequency band.
- Correction means. This corrects the sound quality of the audio for each viewer, based on two inputs:
  - the correction data; and
  - audio feature data. To obtain it, one audio signal is converted into MDCT frequency data: either the program audio itself, or the narration sound or background sound it contains. Predetermined bands selected from the resulting frequency bands are combined, and the audio feature data is the maximum value of the total energy of the combined MDCT frequency data.

The correction means operates as follows:
1. It transforms the audio signal containing the narration sound and background sound by the MDCT.
2. It multiplies each frequency band of the resulting MDCT frequency data by the audio feature data.
3. It multiplies the product by, or adds to or subtracts from it, the values taken from the correction conversion table for each frequency band: the mixing ratio and the per-viewer volume level.

According to the invention of claim 2, correction data is generated in advance from the sound quality correction data. High-accuracy sound quality correction matched to the audio feature data can therefore be performed efficiently and suited to each individual. In addition, viewer information can be acquired easily using the question information for the viewer and the audio information associated with it.

According to the present invention, sound quality correction suited to each individual can be performed and high-accuracy audio provided.

<Outline of the present invention>
The present invention performs sound quality correction suited to each individual. The receiving device acquires a questionnaire about how easy sounds are to hear, together with audio signals accompanying the questionnaire. It obtains them over a communication network such as the Internet, or from a recording medium such as a CD-ROM. It then performs sound quality correction according to the acquired information, so personal sound quality correction data can be provided to each viewer.

More specifically, the device receives a digital audio encoded stream signal that carries an audio signal together with feature data of that audio signal, for example by digital broadcasting or Internet distribution. It decodes the stream and can correct sound quality using the feature data in the decoded digital audio signal. It combines the personal sound quality correction data it has acquired for the individual with the feature data of the audio signal. It then outputs an audio signal whose sound quality has been corrected to suit each individual.

<Embodiment>
Embodiments to which the present invention is applied are described below with reference to the drawings. FIG. 1 shows an example configuration of a sound quality correction transmission system according to the present invention.

The sound quality correction transmission system 10 shown in FIG. 1 includes a transmitting device 11 and a receiving device 12. The two devices are connected via a communication network 13, such as the Internet, so that data can be exchanged between them. The transmitting device 11 also delivers broadcast programs to the receiving device 12 via a broadcasting satellite 14.

The transmitting device 11 shown in FIG. 1 belongs to a provider, such as a broadcasting station or a content provider. It supplies programs consisting of video, audio, and so on, which each viewer can watch on a receiving device 12 installed at home or elsewhere. In this embodiment, the numbers of transmitting devices 11 and receiving devices 12 are not limited.

The transmitting device 11 can also record sound quality correction data on a recording medium 15 such as a CD-ROM. This data includes a questionnaire for acquiring viewer information and test audio data. The medium is then provided to the receiving device 12 that serves as the viewer's terminal.

The transmitting device 11 delivers the produced broadcast program to the receiving device 12 via the broadcasting satellite 14. Via the communication network 13, it also transmits sound quality correction data, which the receiving device 12 uses to correct the audio. This data consists of a questionnaire for the viewer and test audio data for measuring the viewer's listening ability and preferred sound. The same data can also be stored on a recording medium 15 such as a CD-ROM and delivered to the viewer by mail or similar means.

The receiving device 12 configures settings for each individual using the question information (such as the questionnaire) and the sound quality correction data from the transmitting device 11. From the questionnaire answers, measurement results, and similar information, it generates sound quality correction data. It stores this data and uses it to correct the audio contained in programs.

Next, the functional configurations of the transmitting device 11 and the receiving device 12 are described with reference to the drawings.

<Transmitting device 11>
FIG. 2 shows an example functional configuration of the transmitting device in this embodiment. The transmitting device 11 shown in FIG. 2 includes the following:
- audio feature data generation means 21;
- program production means 22;
- sound quality correction data generation means 23;
- first transmission means 24;
- second transmission means 25.

The audio feature data generation means 21 receives narration audio data and background sound audio data such as BGM. It generates audio feature data based either on the genre information of the broadcast program or on the audio signal of each program. It outputs the generated audio feature data to the program production means 22. Audio feature data generation is described in detail later.

The program production means 22 produces a program from the video data and the audio feature data obtained from the audio feature data generation means 21. It outputs the produced program signal to the first transmission means 24.

The sound quality correction data generation means 23 generates sound quality correction data from two sources: question information for the viewer, such as a questionnaire, and the audio information (test audio data) associated with it. The receiving device 12 uses this data to generate its own sound quality correction data and to perform the correction. The generation means 23 outputs the sound quality correction data to the second transmission means 25. It can also store the data on a recording medium 15 such as a CD-ROM, so the data can reach the receiving device 12 by mail or similar means.

The question information is, for example, data for a questionnaire and various questions asking the viewer how easy sounds are to hear. The test audio data is, for example, audio signals accompanying the questionnaire that are used to measure the individual's hearing. It also includes sample data of narration sound and background sound.

The first transmission means 24 transmits the program signal from the program production means 22 to the receiving device 12 via the broadcasting satellite 14. The second transmission means 25 transmits the sound quality correction data from the sound quality correction data generation means 23 to the receiving device 12 via the communication network 13 or the like.

The first transmission means 24 sends its signal for every program produced by the program production means 22. By contrast, the second transmission means 25 sends its signal only when the receiving device 12 is to be set up for correction, for example just once. That signal carries the sound quality correction data generated by the sound quality correction data generation means 23.

<Audio feature data generation means 21: example of audio feature data generation>
Next, an example of how the audio feature data generation means 21 generates audio feature data is described with reference to the drawings. FIG. 3 illustrates this example. Specifically, it shows processing blocks that generate audio feature data from the narration sound.

The example described here targets an information program such as news, which contains narration sound and background sound. It generates audio feature data that makes the narration clear and easy to hear. The audio used in this embodiment is not limited to this case, nor to two audio signals (narration sound and background sound).

In this example, the feature data is extracted from the narration sound. It is generated for the 49 auditory frequency bands (scale factor bands) of MPEG-2 AAC encoding (MPEG: Moving Picture Experts Group; AAC: Advanced Audio Coding).

The audio feature data generation means 21 shown in FIG. 3 includes the following:
- mixing means 31;
- MDCT (Modified Discrete Cosine Transform) means 32;
- MPEG-2 AAC encoding means 33;
- scale factor band combining means 34;
- scale factor band maximum value detection means 35;
- encoding means 36;
- multiplexing means 37.

Normally, the narration sound and the background sound are input to the transmitting device 11 separately. The mixing means 31 mixes them at an appropriate level ratio to produce the broadcast audio. In FIG. 3, only the narration sound is used to generate the audio feature data.
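The level-ratio mixing performed by the mixing means 31 can be sketched as follows. This is a minimal sketch that assumes the ratio is expressed as a dB attenuation applied to the background relative to its input level. The helper name `mix_at_level_ratio` is hypothetical and does not come from the patent.

```python
import numpy as np


def mix_at_level_ratio(narration: np.ndarray, background: np.ndarray,
                       background_atten_db: float) -> np.ndarray:
    """Hypothetical helper: mix narration with background attenuated by
    background_atten_db (dB) relative to the background's input level."""
    if narration.shape != background.shape:
        raise ValueError("narration and background must have the same shape")
    gain = 10.0 ** (-background_atten_db / 20.0)  # dB -> linear amplitude gain
    return narration + gain * background
```

For example, with a 20 dB attenuation the background is scaled by 0.1 before it is added to the narration.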

First, the narration audio is input to the MDCT means 32, together with the MDCT window information produced when the MPEG-2 AAC encoding means 33 encodes the audio. The MDCT means 32 converts the time-domain signal into MDCT frequency data.

FIG. 4 shows the frequency bands and the corresponding ranges of MDCT frequency data, for a sampling rate fs of 44.1 kHz or 48 kHz. For each scale factor band (swb) from 0 to 48 of the 49 auditory frequency bands (num_swb_long_window), it gives the MDCT frequency data range (swb_offset_long_window).

As an example, the audio feature data of the narration sound to be multiplexed is computed from the converted MDCT frequency data up to the 35th scale factor band (swb = 35) shown in FIG. 4.

That is, the scale factor band combining means 34 combines the data frequency-converted by the MDCT means 32. It calculates the total energy of the MDCT frequency data in each scale factor band shown in FIG. 4. It then outputs the combined signal, for example up to the 34th band, to the scale factor band maximum value detection means 35. The detection means 35 takes the maximum total energy of the MDCT frequency data in the input signal. It outputs this value to the encoding means 36 as the feature data of the scale factor bands.
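The band-energy and maximum-detection steps can be sketched with NumPy as follows. The patent text does not say exactly what the maximum is taken over, so this sketch adopts one reading: per frame, it computes the total energy of each scale factor band from 0 to `last_band`, then reports the largest of these and its band index.

The offset table is the standard AAC long-window `swb_offset` table for 44.1 kHz and 48 kHz as we understand it. Verify it against ISO/IEC 13818-7 before relying on it. The function names are hypothetical.

```python
import numpy as np

# swb_offset_long_window for fs = 44.1 kHz and 48 kHz: 49 bands over 1024 lines.
SWB_OFFSET_LONG_48K = np.array([
    0, 4, 8, 12, 16, 20, 24, 28, 32, 36, 40, 48, 56, 64, 72, 80, 88, 96,
    108, 120, 132, 144, 160, 176, 196, 216, 240, 264, 292, 320, 352, 384,
    416, 448, 480, 512, 544, 576, 608, 640, 672, 704, 736, 768, 800, 832,
    864, 896, 928, 1024,
])


def band_energies(mdct_coeffs: np.ndarray,
                  offsets: np.ndarray = SWB_OFFSET_LONG_48K) -> np.ndarray:
    """Total energy (sum of squared MDCT coefficients) per scale factor band.

    Accepts one frame of shape (1024,) or many frames of shape (T, 1024).
    """
    if mdct_coeffs.shape[-1] != offsets[-1]:
        raise ValueError("coefficient count must equal the last band offset")
    # reduceat sums squares[offsets[i]:offsets[i+1]]; the last band runs to the end
    return np.add.reduceat(np.square(mdct_coeffs), offsets[:-1], axis=-1)


def max_band_energy(mdct_coeffs: np.ndarray, last_band: int = 34):
    """Keep bands 0..last_band and return (largest band energy, its band index)."""
    e = band_energies(mdct_coeffs)[..., : last_band + 1]
    return e.max(axis=-1), e.argmax(axis=-1)
```

Energy above `last_band` is ignored by `max_band_energy`, which mirrors the band selection described above.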

The encoding means 36 encodes the input feature data and outputs it to the multiplexing means 37. The multiplexing means 37 multiplexes the digital audio stream with the audio feature data and outputs the result. The feature data is multiplexed when the narration sound is judged to be sufficiently loud. When the narration is judged to be close to silence, the feature data need not be multiplexed. An example multiplexing method is described next.
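The patent does not specify the encoding of the feature data or the loudness test. The sketch below makes three hypothetical choices:
- a -60 dB silence threshold;
- 0.5 dB quantizer steps;
- a one-byte code.

It returns `None` when the narration is near-silent, in which case nothing needs to be multiplexed.

```python
import math
from typing import Optional

SILENCE_THRESHOLD_DB = -60.0  # hypothetical: below this the narration counts as silent
STEP_DB = 0.5                 # hypothetical quantizer step size in dB


def encode_feature(max_energy: float, ref_energy: float = 1.0) -> Optional[bytes]:
    """Quantize a band-energy maximum to one byte, or return None when the
    narration is near-silent and the feature need not be multiplexed."""
    level_db = 10.0 * math.log10(max(max_energy, 1e-20) / ref_energy)
    if level_db < SILENCE_THRESHOLD_DB:
        return None
    code = round((level_db - SILENCE_THRESHOLD_DB) / STEP_DB)
    return bytes([min(code, 255)])  # saturate at the top of the byte range
```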

<Example of multiplexing feature data into the audio stream>
This section describes, with reference to the drawings, how the audio feature data generation means 21 can multiplex feature data into the audio stream. MPEG-2 AAC encoding is used as the example of a digital audio encoded stream that carries the audio feature data.

FIG. 5 shows an example of the raw_data_block syntax. The audio data of an MPEG-2 AAC digital audio encoded stream is multiplexed in this syntax. The syntax defines a fill_element area and a data_stream_element area, each described below.

FIG. 6 shows an example syntax of the fill_element area, and FIG. 7 shows an example syntax of the data_stream_element area. FIG. 8 shows an example syntax of the extension_payload area contained in the fill_element area of FIG. 6.

The audio feature data can be multiplexed in either of two places:
- as acoustic_feature_extension_data() in the extension_payload syntax of FIG. 8, which is contained in the fill_element area shown in FIG. 6; or
- as acoustic_feature_extension_data_byte in the data_stream_element area shown in FIG. 7.

Multiplexing the data as acoustic_feature_extension_data() in the extension_payload syntax would likely require approval through a standard such as MPEG. Multiplexing it as acoustic_feature_extension_data_byte in the data_stream_element area requires no such approval, so it is easy to implement. Multiplexing of feature data in the present invention is not limited to these examples.
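The data_stream_element route can be sketched with a small bit writer. The AAC fields follow the DSE syntax as we understand it:
- id_syn_ele = 4 (ID_DSE), 3 bits;
- element_instance_tag, 4 bits;
- data_byte_align_flag, 1 bit;
- count, 8 bits, with count = 255 signalling an extra esc_count byte;
- optional byte alignment, then the data bytes.

The `BitWriter` class is hypothetical. It also assumes alignment is measured from the start of the writer, which stands in for the start of the raw_data_block. Treat the payload as the opaque feature bytes the patent calls acoustic_feature_extension_data_byte.

```python
ID_DSE = 4  # id_syn_ele value for data_stream_element
ID_END = 7  # id_syn_ele value that terminates raw_data_block


class BitWriter:
    """Hypothetical minimal MSB-first bit writer."""

    def __init__(self) -> None:
        self.bits = []

    def write(self, value: int, nbits: int) -> None:
        if value < 0 or value >= (1 << nbits):
            raise ValueError(f"{value} does not fit in {nbits} bits")
        for i in reversed(range(nbits)):
            self.bits.append((value >> i) & 1)

    def byte_align(self) -> None:
        while len(self.bits) % 8:
            self.bits.append(0)

    def to_bytes(self) -> bytes:
        padded = self.bits + [0] * (-len(self.bits) % 8)  # zero-pad the tail
        out = bytearray()
        for i in range(0, len(padded), 8):
            byte = 0
            for bit in padded[i:i + 8]:
                byte = (byte << 1) | bit
            out.append(byte)
        return bytes(out)


def write_data_stream_element(bw: BitWriter, payload: bytes,
                              instance_tag: int = 0, align: bool = True) -> None:
    """Append one data_stream_element carrying `payload` (at most 510 bytes)."""
    if len(payload) > 255 + 255:
        raise ValueError("one DSE carries at most 510 bytes")
    bw.write(ID_DSE, 3)                 # id_syn_ele
    bw.write(instance_tag, 4)           # element_instance_tag
    bw.write(1 if align else 0, 1)      # data_byte_align_flag
    if len(payload) >= 255:
        bw.write(255, 8)                # count == 255 signals an escape byte
        bw.write(len(payload) - 255, 8) # esc_count
    else:
        bw.write(len(payload), 8)       # count
    if align:
        bw.byte_align()                 # assumes the writer began at raw_data_block start
    for b in payload:
        bw.write(b, 8)                  # data_stream_byte
```

A one-byte feature followed by the ID_END terminator serializes to `81 01 78 E0`.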

<Receiving device 12>
Next, the specific functional configuration of the receiving device 12 is described with reference to the drawings. FIG. 9 shows an example functional configuration of the receiving device in this embodiment. The receiving device 12 shown in FIG. 9 includes the following:
- receiving means 41;
- sound quality correction data generation means 42;
- sound quality correction data storage means 43;
- video/audio separation means 44;
- sound quality correction means 45;
- viewer information acquisition means 46;
- display means 47;
- audio output means 48;
- input means 49.

The receiving means 41 receives the program signal from the broadcasting satellite 14 and outputs it to the video/audio separation means 44. Via the communication network 13, it also acquires the sound quality correction data transmitted from the transmitting device 11. This data is the question information and test audio information described above, used to correct the audio for the viewer. The receiving means 41 outputs the acquired data to the viewer information acquisition means 46.

The sound quality correction data generating means 42 works from two inputs acquired by the viewer information acquisition means 46: the questionnaire answers and each individual's auditory measurement data. From these it generates an audio correction conversion table that can correct sound quality according to the audio feature data in the digital audio stream. It outputs the generated table to the sound quality correction data storage means 43, which stores one table for each viewer (individual).

The video/audio separating means 44 separates the input program signal into a video signal and an audio signal. It outputs the video signal to the display means 47 and the audio signal to the sound quality correcting means 45.

The sound quality correcting means 45 uses two inputs: the feature data contained in the decoded digital audio signal, and the viewer's sound quality correction conversion table stored in the sound quality correction data storage means 43. From these it generates an audio signal whose sound quality is optimally corrected for both the individual and the program, and outputs it to the audio output means 48. The output is, for example, an audio signal in which the narration is emphasized. The sound quality correction is described in detail later.

The viewer information acquisition means 46 receives the sound quality correction information obtained by the receiving means 41. As described above, this information consists of question information and test audio information for correcting the audio for the viewer. It can also be obtained from the recording medium 15 or the like.

The viewer information acquisition means 46 outputs the question information to the display means 47 and the test audio information to the audio output means 48. The viewer then responds through the input means 49, such as a keyboard, mouse, or remote control. The viewer information acquisition means 46 acquires the viewer's answers to the displayed questions, each individual's auditory measurement data, and setting information such as volume. It outputs the acquired data to the sound quality correction data generating means 42.

The display means 47 consists of a monitor or the like and displays the video obtained from the video/audio separating means 44. It also shows the viewer content input from the viewer information acquisition means 46, such as a questionnaire about audio correction.

The audio output means 48 consists of a speaker or the like. It outputs the audio signal that the sound quality correcting means 45 has corrected to suit the viewer's preference. It also outputs the test audio data input from the viewer information acquisition means 46.

The input means 49 consists of a keyboard, mouse, remote control, or the like. A single receiving device can store sound quality correction data for several viewers. To support this, each viewer uses the input means 49 to enter personal identification information that identifies his or her own correction data. The viewer also reviews the question information, listens to the test audio data, and enters answers. For example, the viewer answers whether the test audio signal is audible, or whether it is easy to hear.

While watching video or a program, the viewer may want to generate new correction data from the question information and test audio data. In that case, the input means 49 can instruct the viewer information acquisition means 46 to run the viewer information acquisition process.

<Example of Viewer Information Acquisition>
A specific example of audio settings for acquiring viewer information is described below.

First, the audio output means 48 plays a sine tone (for example, 500 Hz) at a predetermined output level. Through the input means 49, the viewer answers "Yes" if the tone is audible and "No" if it is not. The viewer information acquisition means 46 records the answer.

Next, the tone is played again at a new level: lower if the viewer heard it, higher if not. Repeating this finds the threshold level, below which the tone is inaudible and above which it is audible. This threshold is recorded as the respondent's auditory data for that sine tone.

The same procedure is then repeated with a different sine tone (for example, 1000 Hz), and so on for every sine tone to be measured.
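The threshold search above is a classic up-down staircase. The sketch below shows it in minimal form. Some details are assumptions not given in the text and are marked hypothetical: the step size starts at 8 dB and halves at each answer reversal, the search stops when the step drops below 1 dB, and the threshold is taken as the midpoint between the loudest "No" and the quietest "Yes". A simulated listener stands in for the viewer's Yes/No answers.

```python
from typing import Callable, Optional


def measure_threshold(is_audible: Callable[[float], bool],
                      start_db: float = 60.0,
                      step_db: float = 8.0,
                      min_step_db: float = 1.0,
                      max_trials: int = 100) -> float:
    """Estimate a hearing threshold with a simple up-down staircase.

    After a "Yes" the level is lowered and after a "No" it is raised.
    The step halving and stopping rule are hypothetical choices.
    """
    level = start_db
    step = step_db
    lowest_heard: Optional[float] = None    # quietest level answered "Yes"
    loudest_missed: Optional[float] = None  # loudest level answered "No"
    prev: Optional[bool] = None
    for _ in range(max_trials):
        heard = is_audible(level)
        if heard:
            lowest_heard = level if lowest_heard is None else min(lowest_heard, level)
        else:
            loudest_missed = level if loudest_missed is None else max(loudest_missed, level)
        if prev is not None and heard != prev:
            step /= 2  # answer reversed: refine the search around the threshold
            if step < min_step_db:
                break
        prev = heard
        level = level - step if heard else level + step
    if lowest_heard is None or loudest_missed is None:
        raise RuntimeError("threshold was not bracketed")
    # The threshold lies between the loudest "No" and the quietest "Yes".
    return (lowest_heard + loudest_missed) / 2


# Simulated listener with hypothetical true thresholds (dB) per test tone (Hz)
true_thresholds = {500: 23.0, 1000: 18.0}
measured = {
    freq: measure_threshold(lambda level, t=t: level >= t)
    for freq, t in true_thresholds.items()
}
print(measured)  # {500: 22.5, 1000: 17.5}
```

With a real viewer, `is_audible` would play the tone and wait for the Yes/No input. The result per frequency forms the individual hearing-level data used below.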

Next, the audio output means 48 plays two program sounds:
- a reference sound (Ref) that simulates a broadcast program, and
- a test sound (Tst) in which narration and sound effects are mixed at a set ratio.

The viewer answers "Yes" if Tst is preferred over Ref and "No" if Ref is preferred over Tst. The viewer information acquisition means 46 records the answer.

Next, the sounds are played again with an adjusted background ratio: lower if Tst was preferred, higher if Ref was preferred. Repeating this finds the threshold ratio, below which Tst is preferred over Ref and above which Ref is preferred over Tst. This ratio is recorded as the viewer's preference data for that program sound. The process is repeated for every program sound to be measured.

<Sound Quality Correction Data Generating Means 42>
Next, a specific example of table generation in the sound quality correction data generating means 42 is described. Once the viewer information acquisition means 46 has acquired a predetermined amount of data in the procedure above, a table is generated from that data.

Specifically, two kinds of collected data are used as the sound quality correction conversion table:
- the individual's hearing level at each frequency (the quietest level that person can hear), and
- the preferred mixing ratio between narration and background sound at each frequency.

Alternatively, the table can hold this hearing-level and mixing-ratio data separately for each volume setting. The present invention does not limit how the table is generated, and other methods may be used.
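The per-volume variant of the table described above can be held in a small nested mapping. The sketch below is illustrative only: the class name `CorrectionTable`, its field names, and the nearest-volume lookup rule are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class CorrectionTable:
    """Hypothetical per-viewer table: volume setting -> frequency (Hz) -> value."""
    hearing_level_db: Dict[int, Dict[int, float]] = field(default_factory=dict)
    mix_ratio: Dict[int, Dict[int, float]] = field(default_factory=dict)

    def lookup(self, volume: int, freq: int) -> Tuple[float, float]:
        """Return (hearing level, preferred mix ratio) for the nearest stored volume."""
        nearest = min(self.hearing_level_db, key=lambda v: abs(v - volume))
        return self.hearing_level_db[nearest][freq], self.mix_ratio[nearest][freq]


table = CorrectionTable(
    hearing_level_db={10: {500: 25.0, 1000: 20.0}, 20: {500: 22.0, 1000: 18.0}},
    mix_ratio={10: {500: 1.5, 1000: 1.2}, 20: {500: 1.3, 1000: 1.1}},
)
print(table.lookup(13, 500))  # (25.0, 1.5)
```

One such table per viewer would be kept by the storage means 43, keyed by the personal identification information entered through the input means 49.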

A more concrete conversion table is described next. The example builds a gain conversion table from the auditory data acquired for the sine tones.

FIG. 10 shows an example of the gain conversion table. As in the audio feature data generation example described above, MPEG-2 AAC assigns each scale factor band number (sfb number) a scale factor value (sfb value) in the range 0 to 255. The table is therefore a two-dimensional array indexed by sfb number and sfb value, with a predetermined gain value (A to ZZ48) set in each entry.

FIG. 11 shows an example of how the data change between before and after correction. Each gain value in the table of FIG. 10 can be the difference between the level after correction and the level before it, as shown in FIG. 11. This lets the table of FIG. 10 hold a different gain value for each scale factor band.

FIG. 11 illustrates a case where the auditory data collected from the viewer show that sounds below a certain input volume are hard to hear. Two corrections are then applied:
- inputs in a predetermined range (50 to 100) are brought to a constant output volume, and
- for volumes above a certain amount (200 or more in FIG. 11), the slope of the output relative to the input is halved.
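The sketch below turns a FIG. 11-style input/output curve into a FIG. 10-style table of gain differences, indexed by sfb number and sfb value. The exact curve is not given in the text, so the one here is hypothetical: inputs from 50 to 100 map to a constant output of 100, inputs from 200 upward rise at half slope, and all other inputs pass unchanged. The band count of 49 is also an assumption; it matches the AAC long-window scale factor bands at 44.1/48 kHz and the labels A to ZZ48.

```python
NUM_BANDS = 49            # assumed: AAC long-window band count at 44.1/48 kHz
SFB_VALUES = range(256)   # sfb values 0-255, as stated above


def corrected_level(x: int) -> float:
    """Hypothetical input/output curve modelled on the FIG. 11 description."""
    if 50 <= x <= 100:
        return 100.0                 # quiet range lifted to one constant level
    if x >= 200:
        return 200 + (x - 200) / 2   # output rises at half the input slope
    return float(x)                  # unchanged elsewhere


def build_gain_table(curve):
    """gain_table[sfb_number][sfb_value] = corrected level minus input level."""
    return [[curve(band, v) - v for v in SFB_VALUES] for band in range(NUM_BANDS)]


# Same curve in every band here; a per-band curve could use the band index.
gain_table = build_gain_table(lambda band, v: corrected_level(v))
print(gain_table[0][75], gain_table[0][150], gain_table[0][255])  # 25.0 0.0 -27.5
```

Because the table stores differences, the receiver only adds (or converts and multiplies) the looked-up value. A different curve per band would give the band-dependent gains mentioned for FIG. 10.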

With this correction, a program broadcast to every receiving device (for example, an information program such as news) can easily be reproduced with louder narration than other receiving devices would play.

A desired mixing ratio for program sound can also be derived from the auditory measurement data. The sine-tone measurements described above already set thresholds for the viewer's desired sound quality. From these it is possible to generate, for example, a table that reduces the dynamic range (the volume difference between the quietest and loudest sounds) below that of the broadcast sound.

It is also possible to generate a mixing ratio table that changes the broadcast balance between narration and background sound, such as sound effects. This table is used under two conditions:
- the digital audio encoded stream contains audio feature data, and
- in an information program such as news, feature data for the narration is transmitted so that the narration can be made clear and easy to hear.

The mixing ratio table has the same layout as the table in FIG. 10. As in the audio feature data generation example described above, MPEG-2 AAC assigns each sfb number an sfb value from 0 to 255. The mixing ratio is therefore set as a predetermined value (A to ZZ48) in each entry of a two-dimensional array indexed by sfb number and sfb value.

For example, if the auditory data collected for an information program such as news is a single value, the mixing ratio can be constant regardless of the sfb value. For such programs, the bands up to sfb number 35 can be corrected according to the collected auditory data, while the higher bands keep the same mixing ratio as normal audio.
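A minimal sketch of the band-limited mixing ratio table just described. The text gives only the band cutoff and the constant-across-sfb-values rule. The following are assumptions: the entries are narration gains relative to the broadcast mix (1.0 means unchanged), band 35 itself counts as corrected, and the viewer value of 2.0 is hypothetical.

```python
NUM_BANDS = 49
LAST_CORRECTED_BAND = 35  # "up to sfb number 35", treated as inclusive (assumption)


def build_mix_ratio_table(viewer_ratio: float, num_values: int = 256):
    """mix_table[sfb_number][sfb_value] -> narration gain relative to the broadcast mix.

    The value is constant across sfb values (single collected value), and bands
    above LAST_CORRECTED_BAND keep the broadcast mix (gain 1.0).
    """
    return [[viewer_ratio if band <= LAST_CORRECTED_BAND else 1.0
             for _ in range(num_values)]
            for band in range(NUM_BANDS)]


mix_table = build_mix_ratio_table(viewer_ratio=2.0)  # hypothetical: narration doubled
print(mix_table[10][0], mix_table[35][255], mix_table[36][0])  # 2.0 2.0 1.0
```

Leaving the upper bands untouched keeps the treble balance of the broadcast while boosting narration in the lower bands, where most speech energy lies.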

<Sound Quality Correcting Means 45>
Next, the sound quality correcting means 45 is described in detail. It uses the individual's sound quality correction conversion table to combine the acquired personal correction data with one of two inputs: the genre information of the broadcast program, or the feature data carried in each program's audio signal. It then corrects the sound quality so that each program's audio signal suits the viewer.

FIG. 12 shows an example of the correction processing blocks in the sound quality correcting means. This configuration makes narration clear and easy to hear. It comprises:
- separating means 61
- digital audio stream decoding means 62
- narration feature data decoding means 63
- MDCT means 64
- narration mix ratio conversion means 65
- dynamic range conversion means 66
- inverse MDCT means 67

In FIG. 12, the digital audio signal is first decoded and the correction is then applied to the decoded signal. For a simpler configuration, this function can instead be built into the digital audio stream decoding itself.

The separating means 61 separates the input MPEG-2 AAC digital audio stream from the narration feature data. It outputs the digital audio stream to the digital audio stream decoding means 62 and the narration feature data to the narration feature data decoding means 63.

The digital audio stream decoding means 62 decodes the digital audio stream from the separating means 61 into a digital audio signal and outputs it to the MDCT means 64. The narration feature data decoding means 63 decodes the narration feature data and likewise outputs it to the MDCT means 64.

The MDCT means 64 converts the time-domain signal back into MDCT frequency data. It uses the same MDCT window information as the corresponding frame of the MPEG-2 AAC digital audio stream.

Specifically, each scale factor band of the MDCT frequency data from the MDCT means 64 is multiplied by the narration feature data for that band. This reproduces the MDCT frequency data of the narration, which is output to the narration mix ratio conversion means 65.
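The band-wise multiplication can be sketched as follows. The text does not say how the transmitted feature data (described in the claims as a maximum total band energy) becomes a per-band multiplier. This sketch therefore assumes that each band's decoded feature value is already a ratio giving the narration's share of that band. The band offsets are also hypothetical stand-ins for the AAC scale factor band table.

```python
from typing import List, Sequence


def reconstruct_narration(mdct: Sequence[float],
                          band_offsets: Sequence[int],
                          narration_feature: Sequence[float]) -> List[float]:
    """Scale each scale factor band of the mixed spectrum by its feature value.

    Assumes narration_feature[b] is a multiplier for band b, and that
    band_offsets[b]:band_offsets[b + 1] are the coefficient indices of band b.
    """
    narration = [0.0] * len(mdct)
    for band, factor in enumerate(narration_feature):
        start, end = band_offsets[band], band_offsets[band + 1]
        for k in range(start, end):
            narration[k] = mdct[k] * factor
    return narration


# Toy example: 4 coefficients in 2 hypothetical bands
print(reconstruct_narration([1.0, 2.0, -4.0, 8.0], [0, 2, 4], [0.5, 0.25]))
# [0.5, 1.0, -1.0, 2.0]
```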

The narration mix ratio conversion means 65 corrects the sound quality of the original broadcast sound (narration and background mixed together). It multiplies, adds to, or subtracts from that sound's MDCT frequency data, using a mixing ratio appropriate for the viewer. The ratio comes from the sound quality correction table (mixing ratio table) in the sound quality correction data storage means 43.

The narration mix ratio conversion means 65 outputs the corrected audio to the dynamic range conversion means 66. The dynamic range conversion means 66 takes values from the sound quality correction table (gain conversion table) in the storage means 43. It multiplies, adds, or subtracts these values to the input signal, keeping the volume within a range appropriate for the viewer. It outputs the resulting signal to the inverse MDCT means 67.
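The two correction stages can be sketched for a single scale factor band. The sketch makes three assumptions beyond the text, which says only "multiplication or addition/subtraction":
- Background sound is estimated as the broadcast mix minus the reconstructed narration.
- The mixing-ratio entry is used as a narration gain.
- The gain-table entry is expressed in AAC scale factor steps. One step is an amplitude factor of 2**0.25, about 1.5 dB.

```python
from typing import List, Sequence


def correct_band(mixed: Sequence[float], narration: Sequence[float],
                 mix_gain: float, level_gain_sf: float) -> List[float]:
    """Remix one band's MDCT coefficients, then apply a dynamic-range gain."""
    # Background estimate: broadcast mix minus the reconstructed narration.
    background = [m - n for m, n in zip(mixed, narration)]
    # Narration mix ratio conversion: scale narration by the viewer's mix gain.
    remixed = [mix_gain * n + b for n, b in zip(narration, background)]
    # Dynamic range conversion: gain in AAC scale-factor steps (2**0.25 each).
    amp = 2.0 ** (level_gain_sf / 4.0)
    return [amp * r for r in remixed]


print(correct_band([1.0, 2.0], [0.5, 1.0], mix_gain=2.0, level_gain_sf=4))
# [3.0, 6.0]
```

In a full receiver this would run for every band of every frame, with `mix_gain` and `level_gain_sf` looked up from the viewer's tables by sfb number and sfb value, before the inverse MDCT.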

The inverse MDCT means 67 applies the inverse MDCT to the signal from the dynamic range conversion means 66 and outputs the corrected signal. The result is sound quality correction suited to each individual, delivering high-accuracy audio.

As described above, sound quality can thus be corrected to suit each individual and provide high-accuracy audio.

The present invention applies to services such as broadcasting, where many unspecified people listen at the same time. The transmitting side sends the same program, carrying feature data, to many receiving sides. A user who wants sound quality tailored to him or her obtains sound quality correction data in advance. This data consists of question information, such as a questionnaire on how easy the sound is to hear, together with the accompanying test audio data. The user's answers are used to generate personal sound quality correction data. That data is then combined with the audio feature data in the digital audio encoded stream, whether carried by a digital broadcast wave or by Internet audio. The result is sound quality correction suited to each individual and high-accuracy audio.

Broadcast programs span many genres, such as news, information programs like news commentary, sports broadcasts, and music programs. In news and similar information programs, the narration should be clear and easy to hear so that viewers can follow the content accurately. In music programs, preserving the atmosphere of the program sound matters more. Hearing ability and preference also differ from person to person, so sound quality must be corrected for each individual. The present invention provides an audio signal whose sound quality is optimally corrected for both the individual and the program. In particular, it makes dialogue, announcements, and narration easier to hear for listeners who found them hard to follow, such as the elderly.

A preferred embodiment of the present invention has been described in detail above. The invention is not limited to that specific embodiment, and various modifications and changes are possible within the scope of the invention as set out in the claims.

FIG. 1 shows an example configuration of the sound quality correction transmission system according to the present invention.
FIG. 2 shows an example of the functional configuration of the transmitting device according to the present embodiment.
FIG. 3 illustrates an example of generating audio feature data.
FIG. 4 shows the ranges of the scalefactor bands and the MDCT frequency data.
FIG. 5 shows an example of the raw_data_block syntax.
FIG. 6 shows an example of the syntax of the fill_element area.
FIG. 7 shows an example of the syntax of the data_stream_element area.
FIG. 8 shows an example of the syntax of the extension_payload area contained in the fill_element area.
FIG. 9 shows an example of the functional configuration of the receiving device according to the present embodiment.
FIG. 10 shows an example of the gain conversion table.
FIG. 11 shows an example of how the data change before and after correction.
FIG. 12 shows an example of the correction processing blocks in the sound quality correcting means.

Explanation of symbols

10 Sound quality correction transmission system
11 Transmitting device
12 Receiving device
13 Communication network
14 Broadcast satellite
15 Recording medium
21 Audio feature data generating means
22 Program production means
23 Sound quality correction data generating means
24 First transmitting means
25 Second transmitting means
31 Mixing means
32 MDCT means
33 MPEG-2 AAC encoding means
34 Scale factor band combining means
35 Scale factor band maximum value detecting means
36 Encoding means
37 Multiplexing means
41 Receiving means
42 Sound quality correction data generating means
43 Sound quality correction data storage means
44 Video/audio separating means
45 Sound quality correcting means
46 Viewer information acquisition means
47 Display means
48 Audio output means
49 Input means
61 Separating means
62 Digital audio stream decoding means
63 Narration feature data decoding means
64 MDCT means
65 Narration mix ratio conversion means
66 Dynamic range conversion means
67 Inverse MDCT means

Claims

A transmitting device that transmits a program signal comprising video and audio, the audio having an audio signal that includes narration sound and background sound, in association with viewer information containing auditory measurement data for each viewer, so that the sound quality of the audio included in the program signal received on the receiving side is corrected, the transmitting device comprising:
feature data generating means for converting either the narration sound or the background sound included in the audio signal into MDCT (Modified Discrete Cosine Transform) frequency data, generating as audio feature data of a frequency band the maximum value of the total energy of the MDCT frequency data obtained by combining predetermined frequency bands selected from the plurality of frequency bands constituting the frequency data of the converted audio signal, and multiplexing the generated audio feature data with the audio;
sound quality correction data generating means for generating, from the viewer information, sound quality correction data for correcting the sound quality of the audio on the receiving side, the sound quality correction data having question information, including a questionnaire on how easy the audio is to hear for each viewer, and audio information, associated with the question information, including an audio signal for measuring the individual's hearing;
first transmitting means for transmitting the program signal, including the audio feature data obtained by the feature data generating means, to the receiving side; and
second transmitting means for transmitting the sound quality correction data obtained by the sound quality correction data generating means to the receiving side,
wherein the viewer information includes auditory measurement data, obtained as answers to the sound quality correction data, that include a desired mixing ratio of the narration sound and the background sound and a hearing level for each predetermined frequency band, and
the sound quality correction performed on the receiving side generates, from the viewer information, correction data consisting of a correction conversion table of the mixing ratio and of a volume level set according to the hearing level for each predetermined frequency band; applies the MDCT to the audio signal including the narration sound and background sound contained in the audio; multiplies each frequency band of the resulting MDCT frequency data by the audio feature data; and multiplies, adds to, or subtracts from the MDCT frequency data obtained by that multiplication using the mixing ratio of the correction conversion table and the volume level for each viewer obtained for the frequency band.

A receiving device that receives a program signal transmitted from a transmitting side and corrects, in accordance with viewer information for each viewer, the sound quality of an audio signal having narration sound and background sound in the received program signal, the receiving device comprising:
viewer information acquisition means for acquiring viewer information including auditory measurement data, which each viewer inputs as answers to the sound quality correction data transmitted from the transmitting side, namely question information including a questionnaire on how easy the audio is to hear for each viewer and audio information associated with the question information including an audio signal for measuring the individual's hearing, the auditory measurement data including a desired mixing ratio of the narration sound and the background sound and a hearing level for each predetermined frequency band;
sound quality correction data generating means for generating, according to the auditory measurement data obtained by the viewer information acquisition means, correction data consisting of a correction conversion table of the mixing ratio and of a volume level set according to the hearing level for each predetermined frequency band; and
correcting means for correcting the sound quality of the audio for each viewer using the correction data obtained by the sound quality correction data generating means, on the basis of audio feature data consisting of the maximum value of the total energy of the MDCT frequency data obtained by converting one of the narration sound and background sound included in the audio signal of the audio contained in the program signal into MDCT frequency data and combining predetermined frequency bands selected from the plurality of frequency bands constituting the frequency data of the converted audio signal,
wherein the correcting means
applies the MDCT to the audio signal including the narration sound and background sound contained in the audio, multiplies each frequency band of the resulting MDCT frequency data by the audio feature data, and corrects the sound quality of the audio by multiplying, adding to, or subtracting from the MDCT frequency data obtained by that multiplication using the mixing ratio of the correction conversion table and the volume level for each viewer obtained for the frequency band.