JP2008301427A

JP2008301427A - Multichannel voice reproduction equipment

Info

Publication number: JP2008301427A
Application number: JP2007148390A
Authority: JP
Inventors: Joji Kasai; 譲治笠井; Tetsuo Nakatake; 哲郎中武; Kazunari Takemura; 和斉竹村
Original assignee: Onkyo Corp
Current assignee: Onkyo Corp
Priority date: 2007-06-04
Filing date: 2007-06-04
Publication date: 2008-12-11

Abstract

PROBLEM TO BE SOLVED: To provide multichannel voice reproduction equipment capable of improving voice clarity in content such as a movie by emphasizing the speech voice included mainly in a front center signal, and further making a listener feel as if the speech voice is reproduced at the listening position.

SOLUTION: The multichannel voice reproduction equipment includes a speech extracting means for outputting a speech extraction signal of the front center signal, a speech emphasizing means for applying a speech emphasizing process to the speech extraction signal and outputting the result as a speech emphasizing signal, and a crosstalk canceling means for branching the speech emphasizing signal into a first speech emphasizing signal and a second speech emphasizing signal, applying to each a crosstalk canceling process that eliminates the interaural crosstalk of the listener, and generating a first crosstalk cancel signal and a second crosstalk cancel signal. The first crosstalk cancel signal is reproduced from a left speaker, and the second crosstalk cancel signal is reproduced from a right speaker.

COPYRIGHT: (C)2009, JPO&INPIT

Description

The present invention relates to a multichannel audio reproduction apparatus that reproduces multichannel audio including surround sound, and in particular to one that emphasizes the dialogue (spoken lines) contained mainly in the front center signal to improve speech intelligibility in content such as movies.

Multichannel audio reproduction apparatuses are available that reproduce a surround sound field by playing back a so-called multichannel audio signal, which has more than two independent audio channels, either together with a video signal or on its own. Various schemes and formats have been proposed, differing in the speaker placement for each channel, the reproduction frequency band, the surround method, and so on. In many cases the multichannel audio signal comprises five channels in total: the front left signal L and front right signal R together with the front center signal C, the surround left signal SL, and the surround right signal SR. In some cases a low-frequency signal LFE is added to these to form the so-called 5.1-channel configuration.

Multichannel audio is intended to be reproduced from a group of speakers connected to the multichannel audio reproduction apparatus and placed to correspond to the individual channels. The front audio signals (the front left signal L, front right signal R, and front center signal C; the same applies hereinafter) are reproduced from front speakers placed in front of the viewer (the front left speaker Lsp, front right speaker Rsp, and front center speaker Csp; the same applies hereinafter). The surround audio signals (the surround left signal SL and surround right signal SR; the same applies hereinafter) are preferably reproduced from surround speakers placed to the side of or behind the viewer (the surround left speaker SLsp and surround right speaker SRsp; the same applies hereinafter).

However, the speaker group may consist only of front speakers placed in front of the viewer, with no surround speakers dedicated to the surround audio signals. In the minimal case, only the front left speaker Lsp and front right speaker Rsp of an ordinary stereo setup are available. The multichannel audio reproduction apparatus then generates a stereo audio signal by mixing the front center signal and the surround audio signals into the front left and front right signals, and reproduces it from the front speakers. This stereo signal may be generated by downmix processing, which adds the surround audio signals to the front audio signals (possibly with simple phase-shift processing), or by so-called virtual surround processing, which applies virtual localization based on head-related transfer functions (HRTFs) to the surround audio signals before adding them to the front audio signals. Unless the stereo signal is generated with the surround audio signals virtually localized in this way, the front left speaker Lsp and front right speaker Rsp cannot reproduce a sound field in which the surround sound appears to arrive from the side of or behind the listener.

Conventionally, however, in content such as movies whose multichannel audio signal has a wide dynamic range, the dialogue spoken by the characters can be hard to hear. In such content the dialogue is generally assigned to the front center signal of the multichannel audio signal. Some conventional techniques therefore try to improve dialogue intelligibility by raising the level of the frequency range corresponding to the human voice band.

For example, there is an audio circuit for a television receiver that includes frequency component extraction circuits for extracting, from the audio signal accompanying an image, a predetermined frequency component corresponding to the human voice. The left and right signals of a stereo signal are each passed through an extraction circuit and input to an adder circuit and a first subtractor circuit; the output of the adder circuit is output from a center speaker through a gain control circuit; and the gain control circuit is controlled by the output of the first subtractor circuit (Patent Document 1).

There is also a multichannel audio signal processing circuit comprising frequency characteristic correction means and output means. Among multichannel audio signals that include at least a right channel and a left channel, the correction means corrects the frequency characteristic of the audio signal of a channel containing audio in a specific frequency band, according to a correction characteristic determined from a head-related transfer function. The output means mixes the corrected audio signal into the right-channel and left-channel audio signals and outputs the results as a right-channel output audio signal and a left-channel output audio signal, respectively (Patent Document 2).

Patent Document 1: JP-A-4-249484 (FIG. 1)
Patent Document 2: JP 2004-266604 A (FIG. 1)

However, when a multichannel audio reproduction apparatus performs downmix processing or virtual surround processing that adds the surround audio signals to the front audio signals, the dynamic range tends to be reduced because signals are summed. The conventional approach of changing the voice level, as in Patent Document 1, can therefore improve intelligibility only to a limited extent. Even in Patent Document 2, which corrects according to a correction characteristic determined from a head-related transfer function, that characteristic serves to make sound arriving from the left and right of the viewer appear to arrive from the front; it merely alters the direction from which the viewer perceives the sound to arrive.

The present invention has been made to solve the above problems of the prior art. Its object is to provide a multichannel audio reproduction apparatus that reproduces multichannel audio including surround sound and that improves speech intelligibility in content such as movies by emphasizing the dialogue contained mainly in the front center signal and, further, by making the listener feel as if the dialogue is being reproduced at the listening position.

The multichannel audio reproduction apparatus of the present invention reproduces the dialogue signal contained in the front center signal of multichannel audio from a left speaker and a right speaker placed in front of the listener, and emphasizes the dialogue at the listener's listening position. It comprises: dialogue extraction means for outputting a dialogue extraction signal obtained by passing the dialogue voice band of the front center signal; dialogue emphasis means for performing dialogue emphasis processing that raises the level of the dialogue extraction signal in relative terms so that the level is highest near the upper limit of the dialogue voice band, and outputting the result as a dialogue emphasis signal; and crosstalk cancellation means for branching the dialogue emphasis signal into a first dialogue emphasis signal and a second dialogue emphasis signal and applying to each a crosstalk cancellation process that removes the listener's interaural crosstalk, thereby generating a first crosstalk cancellation signal and a second crosstalk cancellation signal. The first crosstalk cancellation signal is reproduced from the left speaker and the second crosstalk cancellation signal from the right speaker.

More preferably, the dialogue extraction means includes dialogue detection means for detecting whether the front center signal contains dialogue. The dialogue detection means controls the output level of the dialogue extraction signal according to the level of the 1 to 4 Hz component of the envelope of the dialogue extraction signal: it raises the output level when dialogue is detected and lowers it when dialogue is not detected.

Preferably, the apparatus also comprises dialogue removal means for generating a dialogue-removed front center signal by subtracting the dialogue extraction signal from the front center signal, and addition means for adding the dialogue-removed front center signal to each of the first and second crosstalk cancellation signals. The dialogue-removed front center signal and the first crosstalk cancellation signal are reproduced from the left speaker, and the dialogue-removed front center signal and the second crosstalk cancellation signal from the right speaker.
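As a rough illustration of the subtraction and summation just described, the following Python sketch treats each signal as a list of samples. The function and variable names are hypothetical, and the sketch assumes the dialogue extraction signal is time-aligned with the front center signal (a real filter introduces delay that would have to be compensated before subtracting).

```python
def remove_dialogue(center, extracted):
    """Dialogue-removed front center signal: center minus the extraction signal."""
    return [c - d for c, d in zip(center, extracted)]


def mix_to_speakers(cc1, cc2, residual):
    """Add the dialogue-removed center signal to both crosstalk cancellation signals."""
    left = [a + r for a, r in zip(cc1, residual)]
    right = [b + r for b, r in zip(cc2, residual)]
    return left, right


# Toy sample values, chosen only for illustration.
center = [0.5, 0.2, -0.1]
extracted = [0.25, 0.2, 0.0]
residual = remove_dialogue(center, extracted)
left, right = mix_to_speakers([1.0, 0.0, 0.0], [0.0, 1.0, 0.0], residual)
```

The same residual is fed to both speakers, so the non-dialogue content of the center channel stays localized between them as a phantom center.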

Alternatively, the apparatus preferably comprises dialogue removal means for generating a dialogue-removed front center signal by subtracting the dialogue extraction signal from the front center signal, and reproduces the dialogue-removed front center signal from a center speaker, the first crosstalk cancellation signal from the left speaker, and the second crosstalk cancellation signal from the right speaker.

Preferably, the crosstalk cancellation means includes selection input means that selects how the first and second dialogue emphasis signals are input to the crosstalk cancellation process: only one of the two, both in phase at the same level, or both in opposite phase at the same level. The selection switches the emphasized dialogue among three impressions, respectively: dialogue perceived near one of the listener's ears, dialogue perceived inside the listener's head, or dialogue perceived at the back of the listener's head.

Preferably, the apparatus further receives the front left signal and front right signal of the multichannel audio and comprises addition means for adding the front left signal to the first crosstalk cancellation signal and the front right signal to the second crosstalk cancellation signal.

Preferably, the apparatus further receives the surround left signal and surround right signal of the multichannel audio and comprises: crosstalk imparting means for applying to the surround left and surround right signals a crosstalk imparting process that imparts interaural crosstalk based on the transfer functions for a virtual arrival angle, thereby generating a first crosstalk-imparted signal and a second crosstalk-imparted signal; and addition means for adding the first crosstalk-imparted signal to the first dialogue emphasis signal and the second crosstalk-imparted signal to the second dialogue emphasis signal.

The operation of the present invention is described below.

The multichannel audio reproduction apparatus of the present invention reproduces the dialogue signal contained in the front center signal of multichannel audio from a left speaker and a right speaker placed in front of the listener, and emphasizes the dialogue at the listener's listening position. It also accommodates the case where it additionally receives the front left and front right signals, and the case where it additionally receives the surround left and surround right signals. When a center speaker and surround speakers placed to the side of or behind the viewer are provided in addition to the left and right speakers, the apparatus can reproduce the surround sound field from those speakers as well as from the left and right speakers.

The apparatus comprises: dialogue extraction means for outputting a dialogue extraction signal obtained by passing the dialogue voice band of the front center signal; dialogue emphasis means for performing dialogue emphasis processing that raises the level in relative terms so that it is highest near the upper limit of the dialogue voice band, and outputting the result as a dialogue emphasis signal; and crosstalk cancellation means for branching the dialogue emphasis signal into first and second dialogue emphasis signals and applying to each a crosstalk cancellation process that removes the listener's interaural crosstalk, thereby generating first and second crosstalk cancellation signals. The dialogue assigned to the front center signal of the multichannel audio signal can therefore be reproduced with improved intelligibility. The dialogue extraction means preferably passes a dialogue voice band of 200 Hz to 5 kHz.
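To make the 200 Hz to 5 kHz dialogue band concrete, here is a minimal Python sketch of a band-pass filter. The patent does not specify the filter type or order; this sketch assumes a single second-order section using the widely published "Audio EQ Cookbook" band-pass formulas, and a 48 kHz sample rate. All function names are hypothetical.

```python
import cmath
import math


def bandpass_coeffs(f_lo, f_hi, fs):
    """Second-order band-pass (0 dB peak gain) with -3 dB edges near f_lo and f_hi."""
    f0 = math.sqrt(f_lo * f_hi)      # geometric center frequency
    q = f0 / (f_hi - f_lo)           # Q from the -3 dB bandwidth
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b = (alpha / a0, 0.0, -alpha / a0)
    a = (1.0, -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0)
    return b, a


def biquad(b, a, x):
    """Filter a list of samples with the biquad (direct form I)."""
    y = []
    x1 = x2 = y1 = y2 = 0.0
    for xn in x:
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y


def gain_at(b, a, f, fs):
    """Magnitude response of the biquad at frequency f."""
    z1 = cmath.exp(-1j * 2.0 * math.pi * f / fs)   # z^-1 on the unit circle
    num = b[0] + b[1] * z1 + b[2] * z1 * z1
    den = a[0] + a[1] * z1 + a[2] * z1 * z1
    return abs(num / den)


b, a = bandpass_coeffs(200.0, 5000.0, 48000.0)
```

A single second-order section has gentle 6 dB/octave skirts; a practical implementation would likely cascade sections or use a different design to get sharper band edges.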

The dialogue extraction means may include dialogue detection means that detects whether the front center signal contains dialogue according to the level of the 1 to 4 Hz component of the envelope of the dialogue extraction signal. The apparatus may further comprise dialogue removal means for generating a dialogue-removed front center signal by subtracting the dialogue extraction signal from the front center signal. As a result, the dialogue extraction means and dialogue emphasis means emphasize the dialogue contained mainly in the front center signal, while the sounds in the front center signal other than dialogue are carried by the dialogue-removed front center signal and reproduced in the surround sound field without being lost.
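The detection criterion above (the level of the 1 to 4 Hz component of the envelope) can be sketched as follows. This is only one plausible reading: the envelope is assumed to be obtained by rectifying and block-averaging, the 1 to 4 Hz level is measured with a direct DFT over an analysis window, and the threshold and reduced gain are hypothetical values, none of which the patent specifies.

```python
import math


def envelope(signal, fs, env_rate=100):
    """Rectify and block-average to get a low-rate amplitude envelope."""
    block = fs // env_rate
    return [sum(abs(x) for x in signal[i:i + block]) / block
            for i in range(0, len(signal) - block + 1, block)]


def modulation_level(env, env_rate, f_lo=1.0, f_hi=4.0):
    """RMS-style level of the envelope's DFT bins lying between f_lo and f_hi."""
    n = len(env)
    mean = sum(env) / n
    centered = [e - mean for e in env]
    total = 0.0
    for k in range(1, n // 2):
        f = k * env_rate / n
        if f_lo <= f <= f_hi:
            re = sum(c * math.cos(2.0 * math.pi * k * i / n)
                     for i, c in enumerate(centered))
            im = sum(c * math.sin(2.0 * math.pi * k * i / n)
                     for i, c in enumerate(centered))
            total += (re * re + im * im) / (n * n)
    return math.sqrt(total)


def extraction_gain(level, threshold=0.05, low_gain=0.1):
    """Hypothetical level control: full gain when dialogue is detected, reduced otherwise."""
    return 1.0 if level >= threshold else low_gain
```

Speech has strong amplitude modulation at syllable rate, which falls in roughly this 1 to 4 Hz range, whereas steady tones and many sound effects do not; a tone modulated at 3 Hz therefore scores high here and an unmodulated tone scores near zero.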

Here, crosstalk cancellation processing is signal processing based on the head-related transfer functions (HRTFs) that arise from the spatial arrangement of the left and right speakers placed in front of the listener and the listener's two ears (left and right). It cancels the crosstalk of the acoustic paths from the left speaker to the right ear and from the right speaker to the left ear, so that, for example, sound reproduced from the left channel reaches only the left ear and sound from the right channel only the right ear. With crosstalk cancellation, almost no sound reaches the opposite ear (the ear farther from the speaker), so the reproduced sound field is close to that obtained when wearing headphones or when speakers are placed right next to the pinnae. The listener therefore gets the sensation that the sound source is located right beside the ear.
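The idea can be illustrated at a single frequency with a 2x2 matrix inversion. The sketch below assumes a left-right symmetric setup described by two complex transfer values, `h_same` (speaker to the ear on its own side) and `h_cross` (speaker to the opposite ear); these names and the numeric values are hypothetical, and a real canceller would apply such an inverse across the whole band as filters rather than at one frequency.

```python
import cmath


def crosstalk_canceller(h_same, h_cross):
    """Inverse of the symmetric 2x2 speaker-to-ear matrix [[h_same, h_cross], [h_cross, h_same]]."""
    det = h_same * h_same - h_cross * h_cross
    if abs(det) < 1e-12:
        raise ValueError("acoustic matrix is singular at this frequency")
    return ((h_same / det, -h_cross / det),
            (-h_cross / det, h_same / det))


def apply2x2(m, x):
    """Multiply a 2x2 matrix by a 2-element vector."""
    return (m[0][0] * x[0] + m[0][1] * x[1],
            m[1][0] * x[0] + m[1][1] * x[1])


# Hypothetical transfer values: the far-ear path is weaker and delayed.
h_same = 1.0 + 0.0j
h_cross = 0.5 * cmath.exp(-0.3j)
acoustic = ((h_same, h_cross), (h_cross, h_same))

canceller = crosstalk_canceller(h_same, h_cross)
speakers = apply2x2(canceller, (1.0, 0.0))   # signal intended for the left ear only
ears = apply2x2(acoustic, speakers)          # what actually arrives at (left, right) ears
```

With these values, `ears` comes out as approximately (1, 0): the left ear receives the signal and the contribution at the right ear cancels.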

In the multichannel audio reproduction apparatus of the present invention, the first crosstalk cancellation signal derived from the emphasized dialogue is reproduced from the left speaker and the second crosstalk cancellation signal from the right speaker, so the crosstalk cancellation process makes the listener perceive the dialogue as being reproduced right beside the ear. The dialogue is thus not only emphasized by the dialogue emphasis processing but also localized differently from the rest of the surround sound field, which improves the intelligibility of the dialogue for the listener.

The first and second crosstalk cancellation signals are generated by applying crosstalk cancellation to the first and second dialogue emphasis signals branched from the dialogue emphasis signal. By selecting how these two signals are input (only one of them, both in phase at the same level, or both in opposite phase at the same level), the kind of emphasized dialogue reproduction presented to the listener can be chosen. Specifically, the emphasized dialogue can be switched among being perceived near one of the listener's ears, inside the listener's head, or at the back of the listener's head. Thus, in addition to the dialogue extraction means ensuring that emphasis is applied only when dialogue is present, the listener can choose the dialogue localization that is easiest to hear, which further increases the perceived intelligibility of the dialogue.

Because the apparatus includes the crosstalk cancellation means, adding crosstalk imparting means that imparts to the surround left and surround right signals interaural crosstalk based on the transfer functions for a virtual arrival angle makes it possible to apply virtual surround processing to those signals. Thus, even without surround speakers, a surround sound field can be presented to the listener through virtual sound sources localized to the side or rear.

The multichannel audio reproduction apparatus of the present invention emphasizes the dialogue contained mainly in the front center signal and, through crosstalk cancellation, gives the listener the sensation that the sound source is located right beside the ear. By making the listener feel as if the dialogue is being reproduced at the listening position, it improves speech intelligibility in content such as movies.

The object of improving speech intelligibility in content such as movies is achieved by configuring the multichannel audio reproduction apparatus to comprise: dialogue extraction means for outputting a dialogue extraction signal obtained by passing the dialogue voice band of the front center signal; dialogue emphasis means for performing dialogue emphasis processing that raises the level in relative terms so that it is highest near the upper limit of the dialogue voice band, and outputting the result as a dialogue emphasis signal; and crosstalk cancellation means for branching the dialogue emphasis signal into first and second dialogue emphasis signals and applying to each a crosstalk cancellation process that removes the listener's interaural crosstalk, thereby generating first and second crosstalk cancellation signals; and to reproduce the first crosstalk cancellation signal from the left speaker and the second crosstalk cancellation signal from the right speaker.

A multichannel audio reproduction apparatus according to preferred embodiments of the present invention is described below, but the present invention is not limited to these embodiments.

FIG. 1 illustrates a multichannel audio reproduction apparatus 1 according to a preferred embodiment of the present invention. FIG. 1(a) is a block diagram of the DSP (Digital Signal Processor) 10 that performs the signal processing in the multichannel audio reproduction apparatus 1, and FIG. 1(b) is a graph of the frequency characteristic of the dialogue emphasis processing. The signal processing of the DSP 10 can be changed by changing its processing program and coefficients. In FIG. 1(a), the components of the multichannel audio reproduction apparatus 1 other than the DSP 10 (for example, the power supply circuit and the control circuit) are omitted. A left speaker Lsp and a right speaker Rsp, described later, are connected to the multichannel audio reproduction apparatus 1 of this embodiment.

Specifically, the multichannel audio reproduction apparatus 1 includes a decoding circuit (not shown) that decodes multichannel audio data input from another playback device such as a disc player (not shown) into a 5-channel surround audio signal, a D/A converter (not shown) that converts the output signals of the DSP 10 into analog signals, and an amplifier (not shown) that amplifies the output of the D/A converter and outputs it to a plurality of speakers (not shown). The decoding circuit may be included in the DSP 10. The 5-channel audio signal consists of five full-band channels in total, namely the front left signal L, front right signal R, front center signal C, surround left signal SL, and surround right signal SR, and may additionally include a band-limited low-frequency signal LFE or other surround channel signals.

The DSP 10 of the multichannel audio reproduction apparatus 1 applies emphasis processing and crosstalk cancellation processing to the dialogue assigned to the front center signal C. The front center signal C is input to the dialogue extraction circuit 11 and output as the dialogue extraction signal Cd0. The dialogue extraction circuit 11 includes a band-pass filter (BPF) 11a whose passband is set to 200 Hz to 5 kHz, corresponding to the human voice band. The dialogue extraction signal Cd0 is input to the dialogue emphasis circuit 12 and output as the dialogue emphasis signal Ce0. The dialogue emphasis circuit 12 raises the 200 Hz to 5 kHz range corresponding to the human voice band to a level higher than the other bands, and raises the band around 4 kHz, which corresponds to the higher-order formants of the human voice, to a still higher level; it is implemented with band-pass filters (not shown). The dialogue emphasis signal Ce0 thus has the frequency characteristic shown in FIG. 1(b) and is a signal with improved dialogue intelligibility.

The speech emphasis signal Ce0 is input to the selection input circuit 13. The selection input circuit 13 includes multipliers 13a and 13b that multiply the speech emphasis signal Ce0 by coefficients ke1 and ke2, respectively; the outputs of the multipliers 13a and 13b are the two branched signals, the first speech emphasis signal Ce1 and the second speech emphasis signal Ce2. As described later, the coefficients ke1 and ke2 are each selected from the values 1, 0, and −1 when switching among three cases: inputting only one of the first speech emphasis signal Ce1 and the second speech emphasis signal Ce2, inputting both in phase at the same level, and inputting both in opposite phase at the same level. The first speech emphasis signal Ce1 and the second speech emphasis signal Ce2 are input to the crosstalk cancellation circuit 14 via an adder 18 described later.

The crosstalk cancellation circuit 14 receives the first speech emphasis signal Ce1 and the second speech emphasis signal Ce2 and outputs the first crosstalk cancellation signal Cc1 and the second crosstalk cancellation signal Cc2. After passing through an adder 15, described later, and an amplifier (not shown), the first crosstalk cancellation signal Cc1 is reproduced from the left speaker Lsp and the second crosstalk cancellation signal Cc2 is reproduced from the right speaker Rsp. The first crosstalk cancellation signal Cc1 is contained in the output signal Lv of the DSP 10 and the second crosstalk cancellation signal Cc2 is contained in the output signal Rv of the DSP 10; these output signals are fed to the left speaker Lsp and the right speaker Rsp, respectively.

FIG. 2 illustrates the arrangement of the front speakers in the surround sound field reproduced by the multi-channel audio reproduction device 1 and the head-related transfer functions (HRTFs) of the listener 2. The crosstalk cancellation circuit 14 of the multi-channel audio reproduction device 1 has a lattice-type filter made up of filters with transfer functions H11, H12, H21, and H22, which are calculated from the head-related transfer functions h11, h12, h21, and h22. This lattice-type filter cancels the crosstalk that arises in spatial propagation: it attempts to bring the head-related transfer function h12, from the left speaker Lsp to the right ear 2R of the listener 2, or the head-related transfer function h21, from the right speaker Rsp to the left ear 2L of the listener 2, effectively to zero. If the front speakers are arranged symmetrically with respect to the listener 2, the lattice-type filter of the crosstalk cancellation circuit 14 may be replaced by a shuffler-type filter consisting of an adder, a subtractor, and two filters.
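As an illustration of how the filters H11 to H22 might be derived from h11 to h22, the following sketch inverts the 2×2 acoustic path matrix at a single frequency. The index conventions (hij as the path from speaker i to ear j, Hij as the filter from input i to speaker j, with 1 = left and 2 = right), the function names, and the example path gains are assumptions made for this sketch; the description does not give the formulas, and a practical design would repeat this per frequency and handle frequencies where the determinant is small.

```python
def canceller_filters(h11, h12, h21, h22):
    """Return (H11, H12, H21, H22) at one frequency by inverting the path matrix.

    Assumed conventions (not stated in the source text):
      hij: acoustic path from speaker i to ear j (1 = left, 2 = right)
      Hij: canceller filter from input i to speaker j
    """
    det = h11 * h22 - h12 * h21
    if det == 0:
        raise ValueError("path matrix is singular at this frequency")
    return h22 / det, -h12 / det, -h21 / det, h11 / det


def ear_signals(ce1, ce2, h, H):
    """Propagate the inputs Ce1, Ce2 through the canceller and the room to both ears."""
    h11, h12, h21, h22 = h
    H11, H12, H21, H22 = H
    cc1 = H11 * ce1 + H21 * ce2   # signal fed to the left speaker Lsp
    cc2 = H12 * ce1 + H22 * ce2   # signal fed to the right speaker Rsp
    left_ear = h11 * cc1 + h21 * cc2
    right_ear = h12 * cc1 + h22 * cc2
    return left_ear, right_ear


# Hypothetical complex path gains at one frequency (illustrative values only).
h = (1.0 + 0.0j, 0.4 - 0.2j, 0.4 - 0.2j, 1.0 + 0.0j)
H = canceller_filters(*h)
left_ear, right_ear = ear_signals(1.0, 0.0, h, H)
```

With only Ce1 driven, the left ear receives the input unchanged and the right ear receives essentially nothing, which is the behavior the canceller is meant to approximate.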

Consequently, when the first crosstalk cancellation signal Cc1 and the second crosstalk cancellation signal Cc2 are reproduced from the left speaker Lsp and the right speaker Rsp, respectively, the dialogue speech contained in the first speech emphasis signal Ce1 is delivered to the left ear 2L of the listener 2 and the dialogue speech contained in the second speech emphasis signal Ce2 is delivered to the right ear 2R. Because the multi-channel audio reproduction device 1 includes the crosstalk cancellation circuit 14, the listener 2 perceives the emphasized dialogue speech as if it were reproduced from virtual speakers (CEL and CER, drawn with dotted lines in FIG. 2) placed close to the pinnae of both ears.

FIG. 3 illustrates the sense of localization of the dialogue speech that the listener 2 obtains with the multi-channel audio reproduction device 1. Because the listener 2 perceives the virtual speaker CEL as located near the left ear 2L and the virtual speaker CER as located near the right ear 2R, changing the coefficients ke1 and ke2 of the multipliers 13a and 13b of the selection input circuit 13 produces the localization images indicated by the dotted region x in each of FIGS. 3(a) to 3(d). FIG. 3(a) shows the case in which the coefficients of the selection input circuit 13 are (ke1 = 1, ke2 = 0) and the emphasized dialogue speech is localized near the left ear 2L. FIG. 3(b) shows the case (ke1 = 0, ke2 = 1), in which it is localized near the right ear 2R. FIG. 3(c) shows the case (ke1 = kex, ke2 = kex, where 0.5 ≤ kex ≤ 1), in which it is localized inside the head of the listener 2. FIG. 3(d) shows the case (ke1 = kex, ke2 = −kex, where 0.5 ≤ kex ≤ 1), in which it is localized near the back of the head of the listener 2. By operating the multi-channel audio reproduction device 1, the listener 2 can change the coefficients ke1 and ke2 of the selection input circuit 13 and select the localization in which the emphasized dialogue speech is easiest to hear.
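The four coefficient settings of FIG. 3 can be sketched as a small lookup feeding the two multipliers of the selection input circuit 13. The function names and the mode names are hypothetical labels chosen for this sketch; only the coefficient values and the range of kex come from the description.

```python
def selection_input(ce0, ke1, ke2):
    """Multipliers 13a and 13b: branch Ce0 into Ce1 = ke1*Ce0 and Ce2 = ke2*Ce0."""
    return [ke1 * s for s in ce0], [ke2 * s for s in ce0]


def preset(mode, kex=1.0):
    """Return (ke1, ke2) for the four cases of FIG. 3; mode names are hypothetical."""
    if not 0.5 <= kex <= 1.0:
        raise ValueError("kex is expected in the range 0.5 to 1")
    table = {
        "left_ear": (1.0, 0.0),       # FIG. 3(a)
        "right_ear": (0.0, 1.0),      # FIG. 3(b)
        "in_head": (kex, kex),        # FIG. 3(c): in phase, same level
        "behind_head": (kex, -kex),   # FIG. 3(d): opposite phase, same level
    }
    return table[mode]


# Example: opposite-phase feed at the lowest allowed level.
ce1, ce2 = selection_input([0.5, -0.25], *preset("behind_head", kex=0.5))
```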

The DSP 10 of the multi-channel audio reproduction device 1 also includes a speech removal circuit 16 that generates a speech-removed front center signal Cf0 by subtracting the speech extraction signal Cd0 from the front center signal C. The speech removal circuit 16 includes a multiplier 16a that multiplies the speech extraction signal Cd0 by the coefficient (−1) to invert its phase, an adder 16b that subtracts the speech extraction signal Cd0 from the front center signal C to generate the speech-removed front center signal Cf0, and a multiplier 16c that multiplies the output of the adder 16b by a coefficient kc1 (= 0.707). The speech-removed front center signal Cf0 is input to the adder 15. The adder 15 includes an adder 15a that sums the front left signal L, the first crosstalk cancellation signal Cc1, and the speech-removed front center signal Cf0 to form the output signal Lv, and an adder 15b that sums the front right signal R, the second crosstalk cancellation signal Cc2, and the speech-removed front center signal Cf0 to form the output signal Rv.
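A minimal per-sample sketch of the speech removal circuit 16 and the adder 15 follows, assuming all signals are equal-length sequences of samples. The function and argument names are illustrative. The default kc1 = 0.707 is the value given in the description; it is approximately 1/√2 (about −3 dB), presumably so that the center content shared between the two front speakers keeps roughly the same acoustic power.

```python
def front_outputs(l, r, c, cd0, cc1, cc2, kc1=0.707):
    """Per-sample mix of circuit 16 and adder 15 (first embodiment).

    All arguments are equal-length sample sequences; returns (Lv, Rv).
    """
    lv, rv = [], []
    for ls, rs, cs, ds, c1, c2 in zip(l, r, c, cd0, cc1, cc2):
        cf0 = kc1 * (cs + (-1.0) * ds)   # multiplier 16a, adder 16b, multiplier 16c
        lv.append(ls + c1 + cf0)         # adder 15a
        rv.append(rs + c2 + cf0)         # adder 15b
    return lv, rv


# One sample in which the center channel holds music (1.0) plus dialogue (0.5).
lv, rv = front_outputs(l=[0.1], r=[0.2], c=[1.5], cd0=[0.5], cc1=[0.3], cc2=[-0.3])
```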

The speech-removed front center signal Cf0 is reproduced from the front left speaker Lsp and the front right speaker Rsp in phase and at the same level. The listener 2 therefore perceives the dialogue speech contained in the front center signal C as being reproduced from the virtual speakers CEL and CER near the pinnae of both ears, as described above, while perceiving the remaining content of the front center signal C (music and other components that are not human voice) as being reproduced from a virtual center speaker VC directly in front of the listener 2. The front left signal L is reproduced from the front left speaker Lsp via the adder 15, and the front right signal R is reproduced from the front right speaker Rsp via the adder 15. As a result, among the sounds reproduced from the front speakers, the listener 2 perceives everything other than the dialogue as localized in front and only the emphasized dialogue speech as coming from near the pinnae of both ears; because the dialogue speech is localized differently from the other sounds, its clarity is improved.

Furthermore, the surround left signal SL and the surround right signal SR are input to a crosstalk applying circuit 17 of the DSP 10. The crosstalk applying circuit 17 has a lattice-type filter made up of filters with transfer functions D11, D12, D21, and D22, which are calculated from the head-related transfer functions d11, d12, d21, and d22 assumed between the two ears 2L and 2R of the listener 2 and the virtual surround speakers VSL and VSR that are to be virtually localized. The head-related transfer functions d12 and d21 represent the crosstalk from each virtual surround speaker to the ear farther from it. The outputs vs1 and vs2 of the lattice-type filter of the crosstalk applying circuit 17 are added by the adder 18 to the first speech emphasis signal Ce1 and the second speech emphasis signal Ce2, respectively, and input to the crosstalk cancellation circuit 14. If the virtual surround speakers are arranged symmetrically with respect to the listener 2, the lattice-type filter of the crosstalk applying circuit 17 may likewise be replaced by a shuffler-type filter consisting of an adder, a subtractor, and two filters.
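Written in the frequency domain, the chain from the surround inputs to the canceller outputs can be sketched as follows. This is only an assumed reading of the lattice structure: the convention that Dij (and Hij) is the filter from input i to output j, and the names u1 and u2 for the outputs of the adder 18, are not given in the description.

```latex
\[
\begin{aligned}
v_{s1} &= D_{11}\,S_L + D_{21}\,S_R, &\quad v_{s2} &= D_{12}\,S_L + D_{22}\,S_R,\\
u_1 &= C_{e1} + v_{s1}, &\quad u_2 &= C_{e2} + v_{s2},\\
C_{c1} &= H_{11}\,u_1 + H_{21}\,u_2, &\quad C_{c2} &= H_{12}\,u_1 + H_{22}\,u_2.
\end{aligned}
\]
```

Under this reading the cross terms D21 and D12 add the far-ear crosstalk of the virtual surround speakers, and the canceller then removes the crosstalk of the real front speakers, so the signals u1 and u2 reach the left and right ears largely unchanged.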

The surround left signal SL and the surround right signal SR are thus reproduced from the front left speaker Lsp and the front right speaker Rsp after passing through the crosstalk applying circuit 17 and the crosstalk cancellation circuit 14, so the listener 2 perceives the surround sound as being reproduced from the virtual surround speakers VSL and VSR. As a result, even if the reproduction system that includes the multi-channel audio reproduction device 1 has no surround speakers, a surround sound field can be provided to the listener 2 by virtual sound sources localized to the sides (or the rear). In addition, within the surround sound field reproduced from the front speakers, the multi-channel audio reproduction device 1 makes only the emphasized dialogue speech seem to come from near the pinnae of both ears of the listener 2 while localizing the other sounds around the listener 2; the clarity of the dialogue speech, which is localized differently from the other sounds, is thereby improved for the listener 2.

FIG. 4 illustrates a multi-channel audio reproduction device 3 according to another preferred embodiment of the present invention. The multi-channel audio reproduction device 3 is the same as the multi-channel audio reproduction device 1 of the previous embodiment except for the configuration of the DSP 10 that performs the signal processing. In FIG. 4, components of the multi-channel audio reproduction device 3 other than the DSP 10 (for example, the power supply circuit and the control circuit) are omitted; parts shared with Embodiment 1 carry the same reference numerals and their description is omitted.

FIG. 5 illustrates the arrangement of the front speakers and the surround speakers in the surround sound field reproduced by the multi-channel audio reproduction device 3 and the head-related transfer functions (HRTFs) of the listener 2. In addition to the left speaker Lsp and the right speaker Rsp, a front center speaker Csp, a surround left speaker SLsp, and a surround right speaker SRsp are connected to the multi-channel audio reproduction device 3 of this embodiment.

As in the previous embodiment, the DSP 10 of the multi-channel audio reproduction device 3 applies emphasis processing and crosstalk cancellation processing to the dialogue speech allocated to the front center signal C. The front center signal C is input to the speech extraction circuit 11 and output as the speech extraction signal Cd0. The emphasis processing and the crosstalk cancellation processing for the dialogue speech are the same as in the previous embodiment, so their description is omitted. In this embodiment, however, the speech extraction circuit 11 includes, in addition to the band-pass filter (BPF) 11a whose passband is set to 200 Hz to 5 kHz, corresponding to the human voice band, a speech detection circuit 11b that detects whether the front center signal C contains dialogue speech according to the level of the 1 to 4 Hz component of the envelope signal of the output of the band-pass filter (BPF) 11a, and a multiplier 11c that controls the level of the output of the band-pass filter (BPF) 11a according to this envelope signal.

FIG. 6 illustrates the speech detection circuit 11b of this embodiment. FIG. 6(a) is a block diagram showing the configuration of the speech detection circuit 11b, and FIG. 6(b) is a flowchart of the algorithm that generates the coefficient kz of the multiplier 11c, which the speech detection circuit 11b outputs. The speech extraction circuit 11 operates adaptively depending on whether the front center signal C contains dialogue speech. When the speech detection circuit 11b detects that dialogue speech is present, it increases the coefficient kz of the multiplier 11c to raise the output level of the speech extraction signal Cd0; when it detects that dialogue speech is absent, it decreases the coefficient kz to lower the output level of the speech extraction signal Cd0.

The output signal Xn of the band-pass filter (BPF) 11a of the speech extraction circuit 11 mainly contains components in the 200 Hz to 5 kHz band, corresponding to the human voice band, so whether the signal is dialogue is judged by detecting a characteristic feature of dialogue speech signals. Dialogue speech, being human voice, is generally more intermittent than a music signal, and except for particularly rapid speech its envelope typically fluctuates at around a few hertz. Extracting the few-hertz component of the envelope of the signal, judging from its level whether the signal is dialogue, and changing the processing accordingly therefore allows the dialogue speech to be extracted more appropriately.

Specifically, as shown in FIG. 6(a), the speech detection circuit 11b first takes the absolute value of the output signal Xn of the band-pass filter (BPF) 11a to extract the envelope of the dialogue speech signal (S101). Next, a low-pass filter with a cutoff frequency of 4 Hz (LPF: S102) and a high-pass filter with a cutoff frequency of 1 Hz (HPF: S103) extract the 1 to 4 Hz component Qn of the envelope signal. The absolute value of Qn is smoothed to obtain the level Yn of the few-hertz envelope signal (S104), and the coefficient kz of the multiplier 11c is controlled according to this level Yn (S105).
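Steps S101 to S104 can be sketched with first-order filters as follows. The filter order, the 1 Hz cutoff used for the final smoothing, the sample rate, and the function names are all assumptions of this sketch; the description specifies only the 4 Hz and 1 Hz cutoffs. The high-pass stage is realized here as the input minus a 1 Hz low-pass of the input.

```python
import math


def one_pole_lowpass(samples, cutoff_hz, fs):
    """First-order low-pass: y[n] = y[n-1] + a * (x[n] - y[n-1])."""
    a = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / fs)
    y, out = 0.0, []
    for x in samples:
        y += a * (x - y)
        out.append(y)
    return out


def envelope_level(xn, fs):
    """Return the sequence Yn computed from the BPF output Xn (steps S101 to S104)."""
    env = [abs(x) for x in xn]                               # S101: rectify
    low = one_pole_lowpass(env, 4.0, fs)                     # S102: LPF, 4 Hz cutoff
    slow = one_pole_lowpass(low, 1.0, fs)
    qn = [a - b for a, b in zip(low, slow)]                  # S103: HPF, 1 Hz cutoff
    return one_pole_lowpass([abs(q) for q in qn], 1.0, fs)   # S104: smooth |Qn|


fs = 8000  # assumed sample rate for the demonstration
t = [n / fs for n in range(4 * fs)]
steady = [math.sin(2 * math.pi * 1000 * s) for s in t]       # constant-level tone
pulsed = [0.5 * (1 + math.cos(2 * math.pi * 3 * s)) * x      # same tone, 3 Hz envelope
          for s, x in zip(t, steady)]
y_steady = envelope_level(steady, fs)[-1]
y_pulsed = envelope_level(pulsed, fs)[-1]
```

A tone whose level pulses at 3 Hz, roughly the rhythm the description attributes to speech, yields a much larger Yn than the same tone at constant level, which is the property the detector relies on.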

The algorithm shown in FIG. 6(b), which generates the coefficient kz of the multiplier 11c from the level Yn of the envelope signal of the dialogue speech signal, is as follows. First, a preset minimum level value Ymin is assigned to a register value Yp and the coefficient kz is initialized to 0 (S201). When the level Yn of the envelope signal is input (S202), if the level Yn is greater than or equal to the register value Yp (S203: Yes), the level Yn is assigned to the register value Yp (S204) and processing proceeds to the next step (S208). If the level Yn is smaller than the register value Yp (S203: No), the register value Yp multiplied by the coefficient 0.9 becomes the new register value Yp (S205). If this register value Yp is less than or equal to the minimum level value Ymin (S206: Yes), the minimum level value Ymin is assigned to the register value Yp (S207) and processing proceeds to the next step (S208). If the register value Yp is greater than the minimum level value Ymin (S206: No), processing proceeds directly to the next step (S208).

Next, if the register value Yp is greater than or equal to a preset maximum level value Ymax (S208: Yes), the maximum level value Ymax is assigned to the register value Yp and the coefficient kz is set to 1 (S209). If the register value Yp is smaller than the maximum level value Ymax (S208: No) and the register value Yp equals the minimum level value Ymin (S210: Yes), the coefficient kz is set to 0 (S211). If the register value Yp does not equal the minimum level value Ymin (S210: No), (Yp − Ymin) / (Ymax − Ymin) is calculated and assigned to the coefficient kz (S212). Finally, the integer n is incremented in step with the input of the next envelope level Yn (S213).
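The flowchart steps S201 to S212 translate almost line for line into code. The sketch below follows the steps as described; the class name, the method name, and the threshold values in the example are hypothetical, and the rate at which levels Yn arrive (which sets how fast the 0.9 decay acts in real time) is not specified in the description.

```python
class GainTracker:
    """Peak-hold with decay that maps the envelope level Yn to the gain kz (FIG. 6(b))."""

    def __init__(self, y_min, y_max, decay=0.9):
        if not y_max > y_min:
            raise ValueError("y_max must exceed y_min")
        self.y_min, self.y_max, self.decay = y_min, y_max, decay
        self.yp = y_min   # S201: register starts at the minimum level
        self.kz = 0.0     # S201: gain starts at 0

    def update(self, yn):
        if yn >= self.yp:                      # S203
            self.yp = yn                       # S204: follow a rising level at once
        else:
            self.yp *= self.decay              # S205: let the register decay
            if self.yp <= self.y_min:          # S206
                self.yp = self.y_min           # S207: clamp at the floor
        if self.yp >= self.y_max:              # S208
            self.yp = self.y_max               # S209: clamp at the ceiling
            self.kz = 1.0
        elif self.yp == self.y_min:            # S210
            self.kz = 0.0                      # S211
        else:                                  # S212: linear interpolation
            self.kz = (self.yp - self.y_min) / (self.y_max - self.y_min)
        return self.kz


tracker = GainTracker(y_min=0.1, y_max=0.5)   # hypothetical thresholds
gains = [tracker.update(y) for y in (0.05, 0.3, 0.9, 0.0)]
```

The register rises immediately with the input but falls by only 10% per update, so kz opens quickly when speech-like modulation appears and closes gradually once it stops.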

In this way, the level Yn of the envelope signal of the dialogue speech contained in the front center signal C is obtained, and the coefficient kz of the multiplier 11c can be controlled according to that level. The coefficient kz thus changes gradually between 0 and 1 in accordance with the level Yn. The algorithm for controlling the coefficient kz of the multiplier 11c is not limited to the one above; any algorithm that changes the coefficient kz in relative terms to follow the fluctuation of the envelope level Yn may be used. Because the dialogue speech can be extracted by detecting what are, in effect, the fluctuations of the human voice, the difference in localization from the other sounds markedly increases its clarity for the listener 2.

This adaptive processing for detecting dialogue speech uses filters with low frequencies and long time constants, so detecting whether dialogue speech is present takes a certain amount of time. It is therefore preferable to provide a delay circuit TD in each signal path of the multi-channel audio, with a delay equal to the time the speech detection circuit 11b of the speech extraction circuit 11 needs to detect dialogue in the front center signal C (for example, about 0.5 seconds). When the multi-channel audio accompanies video, it is more preferable to apply the same delay to the video signal as well. In the speech extraction circuit 11, a delay circuit TD is provided between the output of the band-pass filter (BPF) 11a and the multiplier 11c. Because synchronization among the multi-channel audio signals is maintained, a good multi-channel sound field can be reproduced for the listener 2.
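The delay circuit TD and the gating by the multiplier 11c can be sketched as a simple sample delay followed by a multiplication. The class and function names and the 48 kHz sample rate are assumptions of this sketch; only the figure of about 0.5 seconds comes from the description.

```python
from collections import deque


class DelayLine:
    """Fixed delay of `delay_samples` samples (delay circuit TD)."""

    def __init__(self, delay_samples):
        self.buf = deque([0.0] * delay_samples)

    def process(self, x):
        self.buf.append(x)
        return self.buf.popleft()


def gated_extraction(bpf_out, kz_values, delay_samples):
    """Cd0[n] = kz[n] * Xn[n - delay]: delayed BPF output scaled by multiplier 11c."""
    td = DelayLine(delay_samples)
    return [kz * td.process(x) for x, kz in zip(bpf_out, kz_values)]


fs = 48000                              # assumed sample rate
detection_delay = int(0.5 * fs)         # about 0.5 s, the example figure in the text

# Tiny demonstration with a 2-sample delay so the effect is visible.
cd0 = gated_extraction([1.0, 2.0, 3.0, 4.0], [1.0, 1.0, 0.5, 0.5], delay_samples=2)
```

Delaying the audio by the detection time means the gain kz computed from a given stretch of signal is applied to that same stretch rather than to what follows it.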

The speech removal circuit 16 includes the multiplier 16a that multiplies the speech extraction signal Cd0 by the coefficient (−1) to invert its phase, the adder 16b that subtracts the speech extraction signal Cd0 from the front center signal C to generate the speech-removed front center signal Cf0, and the multiplier 16c that multiplies the speech-removed front center signal Cf0 by the coefficient kc1, and it further includes a multiplier 16d that multiplies the speech-removed front center signal Cf0 by a coefficient kc2. In this embodiment, the coefficient kc1 is set to 0 so that the speech-removed front center signal Cf0 is not input to the adder 15, and the coefficient kc2 is set to 1 so that the speech-removed front center signal Cf0 is output as the output signal Cv.

As in the previous embodiment, the DSP 10 of the multi-channel audio reproduction device 3 inputs the surround left signal SL and the surround right signal SR to the crosstalk applying circuit 17 of the DSP 10. The outputs vs1 and vs2 of the lattice-type filter of the crosstalk applying circuit 17 are input to the adder 18 via multipliers 19a and 19b, respectively, and delay circuits TD. In this embodiment, however, the surround left signal SL and the surround right signal SR are also branched before entering the crosstalk applying circuit 17 and are output as output signals Sl and Sr via multipliers 19c and 19d, respectively, and delay circuits TD. Accordingly, by setting the coefficient ks1 of the multipliers 19a and 19b to 0 and the coefficient ks2 of the multipliers 19c and 19d to 1, the surround left signal SL and the surround right signal SR are output directly to the surround left speaker SLsp and the surround right speaker SRsp without the virtual surround processing, which consists of the crosstalk applying processing and the crosstalk cancellation processing.

The output signal Cv is amplified and reproduced from the front center speaker Csp, so the content of the front center signal C other than the dialogue speech, that is, the sound that should by nature be localized in front of the listener 2, is reproduced from in front of the listener 2. The output signals Sl and Sr are output from the surround left speaker SLsp and the surround right speaker SRsp, so the surround sound is reproduced from both sides of the listener 2. Within the surround sound field reproduced from the front speakers and the surround speakers, the multi-channel audio reproduction device 1 therefore makes only the emphasized dialogue speech seem to come from near the pinnae of both ears of the listener 2, as in FIGS. 3(a) to 3(d) of the previous embodiment, while localizing the other sounds around the listener 2; the clarity of the dialogue speech, which is localized differently from the other sounds, is thereby improved for the listener 2.

In this embodiment, crosstalk cancellation processing is applied only to the dialogue speech extracted on the basis of the envelope signal of the dialogue speech. Signal summation is thus kept to a minimum, which suppresses the loss of dynamic range that summation causes. In this embodiment too, of course, the speech-removed front center signal Cf0 may be reproduced from the front left speaker Lsp and the front right speaker Rsp in phase and at the same level, as in the previous embodiment. Likewise, the surround left signal SL and the surround right signal SR may be given virtual surround processing and output from the front left speaker Lsp and the front right speaker Rsp.

When the multi-channel audio output by the decoding circuit 11 of the DSP 10 is so-called 7.1-channel audio, the multi-channel audio reproduction device 1 or 3 may further be provided with a crosstalk applying circuit that applies interaural crosstalk, based on transfer functions for other virtual arrival angles, to the additional surround back left and surround back right signals, with its output input to the crosstalk cancellation circuit.

Furthermore, the selection input circuit 13 of the multi-channel audio reproduction device 1 or 3 of these embodiments includes the multipliers 13a and 13b that multiply the speech emphasis signal Ce0 by the coefficients ke1 and ke2, respectively, and the coefficients ke1 and ke2 need not be limited to the values 1, 0, and −1: other combinations, including coefficients that vary over time, may be used. When the combination of the coefficients ke1 and ke2 varies over time, the emphasized dialogue speech seems to come from near the pinnae of both ears of the listener 2 and also to move; the contrast with the other sounds, which do not move, becomes clear, and the dialogue speech is made clearer as a result.

The multi-channel audio reproduction device of the present invention is not limited to the above embodiments; it is also suitable for multi-channel audio signal reproduction devices such as AV receivers that carry multi-channel audio signal formats such as MPEG-2/AAC, and disc players.

FIG. 1 illustrates the multi-channel audio reproduction device 1 according to a preferred embodiment of the present invention. (Embodiment 1)
FIG. 2 illustrates the arrangement of the front speakers in the surround sound field reproduced by the multi-channel audio reproduction device 1 and the head-related transfer functions (HRTFs) of the listener 2. (Embodiment 1)
FIG. 3 illustrates the sense of localization of the dialogue speech that the listener 2 obtains with the multi-channel audio reproduction device 1. (Embodiments 1 and 2)
FIG. 4 illustrates the multi-channel audio reproduction device 3 according to a preferred embodiment of the present invention. (Embodiment 2)
FIG. 5 illustrates the arrangement of the front speakers in the surround sound field reproduced by the multi-channel audio reproduction device 3 and the head-related transfer functions (HRTFs) of the listener 2. (Embodiment 2)
FIG. 6 illustrates the speech detection circuit 11b of the multi-channel audio reproduction device 3. (Embodiment 2)

Explanation of symbols

1 Multi-channel audio reproduction apparatus
2 Listener
3 Multi-channel audio reproduction apparatus
10 DSP
11 Speech extraction circuit
12 Speech emphasis circuit
13 Multiplier
14 Crosstalk cancellation circuit
15 Adder
16 Speech removal circuit
17 Crosstalk applying circuit
18 Adder
19 Multiplier

Claims

A multi-channel audio reproduction apparatus comprising:
speech extraction means for outputting a speech extraction signal obtained by passing the speech band of a front center signal;
speech emphasis means for applying, to the speech extraction signal, speech emphasis processing that relatively raises the level so that the upper limit of the speech band is at the maximum level, and for outputting the result as a speech emphasis signal; and
crosstalk cancellation means for branching the speech emphasis signal into a first speech emphasis signal and a second speech emphasis signal, applying to each of them crosstalk cancellation processing that removes interaural crosstalk at the listener, and generating a first crosstalk cancellation signal and a second crosstalk cancellation signal,
wherein the first crosstalk cancellation signal is reproduced from a left speaker and the second crosstalk cancellation signal is reproduced from a right speaker.

The multi-channel audio reproduction apparatus according to claim 1,
wherein the speech extraction means includes speech detection means for detecting whether speech is included in the front center signal, and
the speech detection means controls the output level of the speech extraction signal according to the magnitude of the 1 to 4 Hz component level of the envelope signal of the speech extraction signal, raising the output level of the speech extraction signal when it detects that speech is included and reducing the output level of the speech extraction signal when it detects that speech is not included.

The multi-channel audio reproduction apparatus according to claim 1 or 2, further comprising:
speech removal means for generating a speech-removed front center signal by subtracting the speech extraction signal from the front center signal; and
adding means for adding the speech-removed front center signal to each of the first and second crosstalk cancellation signals,
wherein the speech-removed front center signal and the first crosstalk cancellation signal are reproduced from the left speaker, and the speech-removed front center signal and the second crosstalk cancellation signal are reproduced from the right speaker.

The multi-channel audio reproduction apparatus according to claim 1 or 2, further comprising:
speech removal means for generating a speech-removed front center signal by subtracting the speech extraction signal from the front center signal,
wherein the speech-removed front center signal is reproduced from a center speaker, the first crosstalk cancellation signal is reproduced from the left speaker, and the second crosstalk cancellation signal is reproduced from the right speaker.

The multi-channel audio reproduction apparatus according to any one of claims 1 to 4,
wherein the crosstalk cancellation means includes selection input means for selecting, as the input to the crosstalk cancellation processing, one of the following cases: only one of the first speech emphasis signal and the second speech emphasis signal is input; both are input in phase at the same level; or both are input in opposite phase at the same level, and
the selection made by the selection input means emphasizes the speech for the listener and switches among a case in which the speech is perceived near one of the listener's ears, a case in which the speech is perceived inside the listener's head, and a case in which the speech is perceived at the back of the listener's head.

The multi-channel audio reproduction apparatus according to any one of claims 1 to 5, which further receives a front left signal and a front right signal of the multi-channel audio,
and further comprises adding means for adding the front left signal to the first crosstalk cancellation signal and adding the front right signal to the second crosstalk cancellation signal.

The multi-channel audio reproduction apparatus according to claim 1, which further receives a surround left signal and a surround right signal of the multi-channel audio, and further comprises:
crosstalk applying means for applying, to the surround left signal and the surround right signal, crosstalk applying processing that imparts interaural crosstalk based on a transfer function for a virtual arrival angle, thereby generating a first crosstalk applying signal and a second crosstalk applying signal; and
adding means for adding the first crosstalk applying signal to the first speech emphasis signal and adding the second crosstalk applying signal to the second speech emphasis signal.