JP5202021B2

JP5202021B2 - Audio signal conversion apparatus, audio signal conversion method, control program, and computer-readable recording medium

Info

Publication number: JP5202021B2
Application number: JP2008036589A
Authority: JP
Inventors: 修藤井
Original assignee: Sharp Corp
Current assignee: Sharp Corp
Priority date: 2008-02-18
Filing date: 2008-02-18
Publication date: 2013-06-05
Anticipated expiration: 2028-02-18
Also published as: JP2009193031A

Abstract

PROBLEM TO BE SOLVED: To provide a voice signal converter that makes voices easier to hear and heightens the sense of presence.

SOLUTION: The voice signal converter 1 converts a right voice signal corresponding to a right channel and a left voice signal corresponding to a left channel into a central voice output signal corresponding to a center channel, a right voice output signal corresponding to the right channel, and a left voice output signal corresponding to the left channel. A common component extracting part 3 extracts a common component contained in both the right voice signal and the left voice signal, and an inverter part 5 generates the central voice output signal from the common component. A multiplication part 4 attenuates the components obtained by subtracting the common component from the right voice signal and from the left voice signal, respectively, and the inverter part 5 generates the time signals of the right voice output signal and the left voice output signal.

COPYRIGHT: (C)2009,JPO&INPIT

Description

The present invention relates to an audio signal conversion apparatus, an audio signal conversion method, a control program, and a computer-readable recording medium that are provided in a television receiver or the like and emphasize the sound of, for example, a program being broadcast.

With advances in audio reproduction technology, users can enjoy at home the same natural reverberation and sense of presence as in a concert hall or movie theater, for example through high-volume music playback on HiFi (High Fidelity) audio equipment in a dedicated listening room, or through surround playback on a multi-channel home theater system.

In contrast, when viewing content such as television broadcasts, viewers usually watch at a low volume in a living room, kitchen, or similar space. Even at such a low volume, human voices such as dialogue must be accurately recognizable, and a high sense of presence is also required.

However, when a television broadcast is watched together with, for example, an elderly person whose hearing has declined, the volume is set higher than usual. Because noise and sound effects other than human voices also become louder, a person with normal hearing may find the voices harder to hear and the sound annoying. Therefore, when viewing at a higher-than-usual volume in a living room or the like, it is desirable that noise and sound effects not be emphasized, so that speech (human voices) is easy to hear.

For content being broadcast or played back, it is therefore necessary, depending on the situation, either to emphasize speech (human voices) while suppressing noise, music, and the like or, conversely, to emphasize music, sound effects, and the like in order to improve the sense of presence.

Patent Document 1 discloses a voice enhancement circuit that is applied to sound reproduction devices such as VTRs and tape recorders and makes human voices easier to hear. The voice enhancement circuit described in Patent Document 1 has a mid-range extraction amplifier that amplifies the mid-range component of the sum signal L + R of the left and right audio signals L and R, and left and right adders that add the output of the mid-range extraction amplifier to the left and right audio signals L and R.

According to the voice enhancement circuit described in Patent Document 1, the amplified mid-range component of the sum signal L + R is added to the left and right audio signals L and R, so that when the left and right audio is reproduced and listened to, the voice of a person located at the center becomes easier to hear.

Patent Document 2 discloses a vocal sound band emphasis circuit for an audio signal reproduction device. The vocal sound band emphasis circuit described in Patent Document 2 includes an in-phase component extraction circuit that extracts an in-phase component from the left and right channel signals, a band-pass filter that extracts the vocal sound band from the in-phase component, a notch filter that absorbs and attenuates a predetermined frequency component of the vocal sound band, an automatic level control circuit that amplifies the output signal of the notch filter, a microcomputer that controls the amplification level, and first and second synthesis circuits that combine the amplified output signal of the automatic level control circuit with the left and right channel signals and output the result as vocal-sound-band-emphasized left and right channel signals.

In the vocal sound band emphasis circuit described in Patent Document 2, the in-phase component extraction circuit adds the left and right channel signals. The vocal sound band is then extracted from the summed signal by the band-pass filter, a predetermined frequency component (about 1 kHz) is absorbed and attenuated by the notch filter, and the result is amplified by the automatic level control circuit and combined with the left and right channel signals.

Both the voice enhancement circuit described in Patent Document 1 and the vocal sound band emphasis circuit described in Patent Document 2 emphasize human voices by extracting and amplifying the vocal sound band component (mid-range component) of the signal obtained by adding the two input signals of the right and left channels, and then adding it to the left and right channel input signals.

Patent Document 3 discloses an audio apparatus that, when a two-channel audio signal is subjected to predetermined processing and reproduced from a plurality of speakers including surround speakers, can widen the sound image without changing the sound quality. The audio apparatus described in Patent Document 3 performs spectrum analysis of the left and right channels to extract the spectral components common to both, and generates front-channel and surround-channel waveforms based on the common spectral components, thereby obtaining a spacious acoustic space.

More specifically, the audio apparatus described in Patent Document 3 calculates the spectral components contained in both the left and right channels (the common component) and uses the common component to calculate separation coefficients for separating the two-channel input signal into four channels (front channels and surround channels). The spectrum of the input signal is then separated into a front spectrum and a surround spectrum using the separation coefficients, and the front-channel and surround-channel waveforms are obtained by applying an inverse Fourier transform to them.

JP-A-5-115100 (published May 7, 1993)
JP-A-2005-86462 (published March 31, 2005)
JP-A-11-113097 (published April 23, 1999)

However, in the configurations described in Patent Documents 1 and 2, that is, configurations in which the vocal sound band component (mid-range component) of the signal obtained by adding the two input signals of the right and left channels is extracted, amplified, and added to the left and right channel input signals, every sound contained in the mid-range frequency band is emphasized, so noise, music, and other sounds besides human voices are emphasized as well.

This is explained in more detail as follows. When the common component of the spectrum R of the right-channel signal and the spectrum L of the left-channel signal is denoted C, the spectra can be written R = C + R' and L = C + L', where R' is the difference between R and C and L' is the difference between L and C. The common component C corresponds mainly to human voices, such as vocals and dialogue, that are localized at the center. R' and L' correspond to ambient sounds other than human voices (noise, background music, sound effects, and so on).

In the configurations described in Patent Documents 1 and 2, the mid-range component of the signal obtained by adding the right-channel and left-channel signals is amplified. The spectrum of the summed signal is L + R (= 2C + L' + R'), and its value in the mid range is increased by the amplification. In this case, not only the common component C but also R' and L' increase. In other words, because R' and L', which correspond to ambient sounds, increase together with the common component C, which corresponds to human voices, the ambient sounds are emphasized even when only the voices are meant to be emphasized, and the voices do not necessarily become easier to hear.
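The argument above can be written out as a short worked equation. This is only an illustrative sketch: the scalar mid-band gain g is a hypothetical symbol not used in the source, and the band-pass filtering is idealized so that all terms lie inside the amplified band.

```latex
\begin{aligned}
R &= C + R', \qquad L = C + L' \\
g\,(L + R) &= 2gC + gL' + gR' \\
R_{\mathrm{out}} &= R + g\,(L + R) = (1 + 2g)\,C + (1 + g)\,R' + g\,L' \\
L_{\mathrm{out}} &= L + g\,(L + R) = (1 + 2g)\,C + (1 + g)\,L' + g\,R'
\end{aligned}
```

For any g > 0 the ambient terms R' and L' grow along with C, and each output channel additionally picks up the ambient component of the opposite channel, which is the drawback the passage describes.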

The present invention has been made in view of the above problems. Its first object is to provide an audio signal conversion apparatus, an audio signal conversion method, a control program, and a computer-readable recording medium for realizing three-channel audio output in which ambient sounds, that is, sounds other than human voices, can be suppressed in a program being broadcast or played back, so that human voices become easier to hear. Its second object is to provide an audio signal conversion apparatus, an audio signal conversion method, a control program, and a computer-readable recording medium for realizing three-channel audio output in which human voices can be suppressed in a program being broadcast or played back, so that sound effects, background music, and the like are emphasized and the sense of presence is improved.

In order to solve the above problems, an audio signal conversion apparatus according to the present invention converts a right audio signal corresponding to a right channel and a left audio signal corresponding to a left channel into a central audio output signal corresponding to a center channel, a right audio output signal corresponding to the right channel, and a left audio output signal corresponding to the left channel, and is characterized by comprising: common component extraction means for extracting a common component contained in both the right audio signal and the left audio signal; central audio output signal generation means for generating the central audio output signal from the common component; and left and right audio output signal generation means for generating the right audio output signal and the left audio output signal by subtracting the common component from the right audio signal and the left audio signal, respectively.

According to the above configuration, the audio signal conversion apparatus according to the present invention converts the right audio signal corresponding to the right channel and the left audio signal corresponding to the left channel into the central audio output signal corresponding to the center channel, the right audio output signal corresponding to the right channel, and the left audio output signal corresponding to the left channel. That is, the apparatus converts a two-channel audio input signal into a three-channel audio output signal. An example of a two-channel audio input signal is the stereo audio signal of a television broadcast.

Further, according to the above configuration, the common component extraction means extracts the common component contained in both the right audio signal and the left audio signal. The common component is the spectral component contained in both the spectrum of the right audio signal and the spectrum of the left audio signal. That is, the common component is obtained by taking, in every frequency band, whichever of the right-signal spectrum and the left-signal spectrum has the smaller absolute value.
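The selection rule described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: the naive `dft` helper, the function name `common_component`, and the choice to keep the complex bin value (rather than only its magnitude) are all assumptions made for the example.

```python
import cmath


def dft(x):
    """Naive O(N^2) discrete Fourier transform (illustrative helper)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def common_component(spec_r, spec_l):
    """For every frequency bin, keep whichever of the right and left
    spectral values has the smaller absolute value.

    Assumption: the complex bin value itself is kept, so the phase of
    the selected channel is preserved.
    """
    return [r if abs(r) <= abs(l) else l for r, l in zip(spec_r, spec_l)]
```

With a test frame in which both channels carry the same tone and each channel adds a different second tone, the bins of the shared tone survive in the result while the bins that are present in only one channel collapse to approximately zero.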

Further, according to the above configuration, the central audio output signal generation means generates the central audio output signal from the common component. The central audio output signal generation means can, for example, use an inverse fast Fourier transform (inverse FFT) to convert the common component, which is spectral information in the frequency domain, into the central audio output signal, which is a signal waveform in the time domain. The central audio output signal generation means may instead generate the central audio output signal using the inverse of a discrete Fourier transform (DFT), a modified discrete cosine transform (MDCT), or the like; the configuration is not particularly limited.

Further, according to the above configuration, the left and right audio output signal generation means generates the right audio output signal and the left audio output signal by subtracting the common component from the right audio signal and the left audio signal, respectively. For example, it can generate the right audio output signal by applying an inverse FFT or the like to the spectrum obtained by subtracting the common component from the spectrum of the right audio signal. In the same way, it can generate the left audio output signal by subtracting the common component from the spectrum of the left audio signal and transforming the result.
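Putting the extraction, subtraction, and inverse transform together, the two-channel to three-channel conversion of a single frame might be sketched as follows. The function names, the naive O(N^2) transforms, and the frame-at-a-time processing without windowing or overlap are assumptions made for illustration; the source does not specify these details.

```python
import cmath


def dft(x):
    """Naive forward discrete Fourier transform (illustrative helper)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spec):
    """Naive inverse transform; the inputs are assumed to come from real
    signals, so only the real part of each output sample is kept."""
    n = len(spec)
    return [
        sum(spec[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)).real / n
        for t in range(n)
    ]


def convert_2ch_to_3ch(right, left):
    """Return (center, right_out, left_out) for one frame of samples."""
    spec_r, spec_l = dft(right), dft(left)
    # Common component: per bin, the value with the smaller magnitude.
    common = [r if abs(r) <= abs(l) else l for r, l in zip(spec_r, spec_l)]
    center = idft(common)
    # Residual channels: each input spectrum minus the common component.
    right_out = idft([r - c for r, c in zip(spec_r, common)])
    left_out = idft([l - c for l, c in zip(spec_l, common)])
    return center, right_out, left_out
```

Because every step is linear once the common component has been chosen, `center` plus `right_out` reproduces the original right input sample by sample, and likewise for the left channel.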

Thus, according to the audio signal conversion apparatus of the present invention, a three-channel audio signal can be generated from a two-channel audio signal in which human voices and ambient sounds are mixed: one channel carrying an audio signal that mainly represents human voices (corresponding to the common component) and two channels carrying audio signals that represent ambient sounds (corresponding to the components obtained by subtracting the common component from the left and right spectral components). That is, the audio signal of human voices and the audio signal of ambient sounds can be separated. Here, ambient sounds are sounds other than human voices, such as sound effects in dramas, cheering in sports broadcasts, background music, and household and natural noise.

Therefore, the audio signal representing human voices and the audio signal representing ambient sounds can be adjusted independently. That is, the level balance between the two can be adjusted.

An audio signal conversion method according to the present invention converts a right audio signal corresponding to a right channel and a left audio signal corresponding to a left channel into a central audio output signal corresponding to a center channel, a right audio output signal corresponding to the right channel, and a left audio output signal corresponding to the left channel, and is characterized by including: a common component extraction step of extracting a common component contained in both the right audio signal and the left audio signal; a central audio output signal generation step of generating the central audio output signal from the common component; and a left and right audio output signal generation step of generating the right audio output signal and the left audio output signal by subtracting the common component from the right audio signal and the left audio signal, respectively.

The above method provides the same operation and effects as the audio signal conversion apparatus according to the present invention.

The audio signal conversion apparatus according to the present invention preferably further comprises left/right component reducing means for reducing the values of all components of the right audio output signal and of the left audio output signal.

According to the above configuration, the left/right component reducing means reduces the values of all components of the right audio output signal and the left audio output signal. That is, it attenuates the right and left audio signals after the subtraction and outputs them. The left/right component reducing means may, for example, multiply the spectral components of the right and left audio signals after the subtraction by a multiplier less than 1 and then apply an inverse FFT or the like. Alternatively, it may apply an inverse FFT or the like to those spectral components to convert them into audio signals representing time waveforms and then attenuate them with an attenuator. The configuration is not particularly limited.
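The two alternatives named above (scaling the spectrum before the inverse transform, or attenuating the waveform after it) give the same result because the inverse transform is linear. A minimal sketch, assuming a naive inverse DFT and hypothetical function names:

```python
import cmath


def idft(spec):
    """Naive inverse discrete Fourier transform, real part only
    (illustrative helper; assumes a conjugate-symmetric spectrum)."""
    n = len(spec)
    return [
        sum(spec[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)).real / n
        for t in range(n)
    ]


def reduce_in_frequency_domain(spec, multiplier):
    """Multiply every spectral component, then transform to a waveform."""
    return idft([multiplier * s for s in spec])


def reduce_in_time_domain(spec, multiplier):
    """Transform to a waveform first, then attenuate every sample."""
    return [multiplier * v for v in idft(spec)]
```

A multiplier of 0 silences the channel entirely, while any multiplier between 0 and 1 lowers the ambient level without changing its spectral shape.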

As a result, of the three output channels, the central audio output signal is output to the center channel without any reduction of its component values, while the right and left audio output signals are output to the right and left channels with the values of all their components reduced.

Therefore, the ambient sound represented by the right and left audio output signals is reduced and the human voice represented by the central audio output signal is emphasized, so that human voices become easier to hear, particularly when a program is viewed at low volume.

In the audio signal conversion apparatus according to the present invention, the left/right component reducing means preferably sets the values of all components of the right audio output signal and the left audio output signal to zero.

According to the above configuration, the left/right component reducing means sets the values of all components of the right audio output signal to zero. For example, it multiplies all components obtained by subtracting the common component from the spectrum of the right audio signal by a multiplier of 0, thereby setting the right audio output signal to 0. In the same way, it sets the values of all components of the left audio output signal to 0.

As a result, of the three output channels, the central audio output signal is output to the center channel without any reduction of its component values, while the right and left audio output signals are output with all component values at zero. That is, the magnitude of the audio output signals sent to the right and left channels is 0.

Therefore, sound is output based only on the audio signal generated from the common component, which mainly contains human voices, and ambient sounds (sounds other than human voices, such as noise) are not output. Consequently, even when, for example, an elderly person whose hearing has declined raises the volume, ambient sounds such as noise do not become louder and only the human voices are emphasized, so a person with normal hearing is much less likely to find the sound annoying.

The audio signal conversion apparatus according to the present invention preferably further comprises central audio output signal amplifying means for amplifying the values of all components of the central audio output signal.

According to the above configuration, the central audio output signal amplifying means amplifies the values of all components of the central audio output signal. It may amplify the central audio output signal in the frequency domain by multiplying its spectral data by a gain, or it may directly amplify the time waveform in the time domain. The configuration is not particularly limited.

As a result, the human voice represented by the central audio output signal can be emphasized, so human voices become easier to hear. Moreover, because the values of all components are amplified together, this improvement is easy to achieve.

The audio signal conversion apparatus according to the present invention preferably further comprises central level adjusting means for adjusting the value of the central audio output signal.

The central level adjusting means is configured, for example, as a parametric equalizer. Alternatively, it need not be a parametric equalizer and may instead be composed of an amplifier and a filter whose center frequency, Q, and gain are fixed rather than adjustable. The central level adjusting means can amplify only the components in a specific frequency band of the central audio output signal.

As a result, the human voice represented by the central audio output signal can be emphasized, so human voices become easier to hear. Moreover, because the value of the central audio output signal can be adjusted directly, finer adjustment is possible.

In the audio signal conversion apparatus according to the present invention, the central level adjusting means preferably adjusts the value so that the gain of the central audio output signal is at its maximum at approximately 2 kHz. This makes it possible to emphasize human voices further.
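One common way to realize a parametric equalizer band with its maximum gain at about 2 kHz is a second-order peaking filter. The sketch below uses the peaking-EQ coefficient formulas from Robert Bristow-Johnson's Audio EQ Cookbook, which is a substituted, well-known technique and not something the source prescribes. The 48 kHz sample rate, the 6 dB boost, and Q = 1 are hypothetical values chosen only for the example.

```python
import cmath
import math


def peaking_eq_coeffs(fs, f0, gain_db, q):
    """Biquad peaking-EQ coefficients (Audio EQ Cookbook form),
    normalized so that a[0] == 1."""
    amp = 10 ** (gain_db / 40)
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * q)
    b = [1 + alpha * amp, -2 * math.cos(w0), 1 - alpha * amp]
    a = [1 + alpha / amp, -2 * math.cos(w0), 1 - alpha / amp]
    return [v / a[0] for v in b], [v / a[0] for v in a]


def gain_db_at(b, a, fs, f):
    """Magnitude response of the biquad at frequency f, in dB."""
    z1 = cmath.exp(-2j * math.pi * f / fs)  # z^-1 on the unit circle
    h = (b[0] + b[1] * z1 + b[2] * z1 * z1) / (a[0] + a[1] * z1 + a[2] * z1 * z1)
    return 20 * math.log10(abs(h))


def biquad(samples, b, a):
    """Apply the filter to a list of samples (direct form I)."""
    out, x1, x2, y1, y2 = [], 0.0, 0.0, 0.0, 0.0
    for x in samples:
        y = b[0] * x + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out


# Hypothetical settings: 48 kHz sampling, +6 dB peak at 2 kHz, Q = 1.
B, A = peaking_eq_coeffs(48000, 2000, 6.0, 1.0)
```

Applying `biquad(center, B, A)` to the center-channel waveform would boost the band around 2 kHz while leaving frequencies far from the peak nearly unchanged; `gain_db_at` can be used to inspect the resulting response at any frequency.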

The audio signal conversion apparatus according to the present invention preferably further comprises central audio output signal reducing means for reducing the values of all components of the central audio output signal.

According to the above configuration, the central audio output signal reducing means reduces the values of all components of the central audio output signal. That is, it attenuates the central audio output signal and outputs it. The central audio output signal reducing means may, for example, multiply the spectral components of the common component by a multiplier less than 1 and then apply an inverse FFT or the like. Alternatively, it may apply an inverse FFT or the like to the spectral components to convert them into an audio signal representing a time waveform and then attenuate it with an attenuator. The configuration is not particularly limited.

As a result, of the three output channels, the central audio output signal is output to the center channel with the values of all its components reduced, while the right and left audio output signals are output to the right and left channels without any reduction of their component values.

Therefore, the human voice represented by the central audio output signal is reduced and the ambient sound represented by the right and left audio output signals is emphasized, so the sense of presence can be improved, particularly when a program is viewed at low volume.

In the audio signal conversion apparatus according to the present invention, the central audio output signal reducing means preferably sets the values of all components of the central audio output signal to zero.

According to the above configuration, the central audio output signal reducing means sets the values of all components of the central audio output signal to zero. For example, it multiplies the common component by 0, thereby setting the values of all components of the common component to 0.

As a result, of the three output channels, the central audio output signal is output to the center channel with all component values at zero, while the right and left audio output signals are output without any reduction of their component values. That is, the magnitude of the audio output signal sent to the center channel is 0.

Therefore, only ambient sounds (sounds other than human voices, such as noise) are output, and no sound based on the audio signal generated from the common component, which mainly contains human voices, is output. The sense of presence can thus be emphasized even further.
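The reduction options described so far differ only in which channels are scaled and by how much, so they can be summarized as pairs of gains applied to the separated channels. The preset names and the 0.5 values below are hypothetical, chosen only to illustrate the four cases (reduce or zero the left/right channels, reduce or zero the center channel).

```python
# (center_gain, side_gain) per mode; all names and values are illustrative.
PRESETS = {
    "voice_emphasis": (1.0, 0.5),     # left/right reduced
    "voice_only": (1.0, 0.0),         # left/right set to zero
    "ambience_emphasis": (0.5, 1.0),  # center reduced
    "ambience_only": (0.0, 1.0),      # center set to zero
}


def apply_preset(center, right, left, preset):
    """Scale the three separated time-domain channels by the preset gains."""
    center_gain, side_gain = PRESETS[preset]
    return (
        [center_gain * v for v in center],
        [side_gain * v for v in right],
        [side_gain * v for v in left],
    )
```

Selecting `"voice_only"` leaves the center channel untouched and silences both side channels, while `"ambience_only"` does the opposite.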

本発明に係る音声信号変換装置では、上記右側音声出力信号および上記左側音声出力信号の全成分の値をそれぞれ増幅する左右成分増幅手段をさらに備えていることが好ましい。 The audio signal conversion device according to the present invention preferably further includes left and right component amplifying means for amplifying the values of all components of the right audio output signal and the left audio output signal.

According to the above configuration, the left/right component amplifying means amplifies the values of all components of the right audio output signal and of the left audio output signal. The means may amplify these signals by multiplying their spectrum data in the frequency domain, or may directly amplify their time waveforms in the time domain; the configuration is not particularly limited.

Because the ambient sound represented by the left or right audio output signal can be emphasized, the sense of presence can be improved. Moreover, because the values of all components are amplified, the sense of presence can be improved easily.
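The two amplification options mentioned above (scaling the spectrum or scaling the time waveform) give the same result for a constant gain, because the Fourier transform is linear. The following is a minimal sketch of that equivalence, assuming a single unwindowed frame and a naive DFT; the function names are hypothetical and are not taken from the patent.

```python
import cmath


def dft(x):
    """Naive discrete Fourier transform of a real or complex sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Naive inverse discrete Fourier transform."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def amplify_time(x, gain):
    """Amplify the time waveform directly."""
    return [gain * s for s in x]


def amplify_freq(x, gain):
    """Amplify by scaling every spectral bin, then return to the time domain."""
    scaled = [gain * b for b in dft(x)]
    return [v.real for v in idft(scaled)]
```

For a frequency-dependent gain the two paths are no longer interchangeable; only the frequency-domain path (or a filter) can apply a different gain per band.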

The audio signal conversion apparatus according to the present invention preferably further includes left/right level adjusting means for adjusting the value of at least one of the right audio output signal and the left audio output signal.

The left/right level adjusting means is configured, for example, as a parametric equalizer. Alternatively, it need not be a parametric equalizer and may be built from an amplifier and a filter whose center frequency, Q, and gain are not adjustable; the configuration is not particularly limited. The left/right level adjusting means makes it possible to amplify only the components in a specific frequency band of the right and left audio output signals.

As a result, the ambient sound represented by the left or right audio output signal can be emphasized, which improves the sense of presence. Moreover, because the value of the left or right audio output signal is adjusted directly, finer adjustment is possible.

In the audio signal conversion apparatus according to the present invention, the left/right level adjusting means preferably adjusts the value so that the gain of at least one of the right audio output signal and the left audio output signal is at its minimum at approximately 4 kHz.

This further improves the sense of presence.
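One way a parametric equalizer could place a gain minimum near 4 kHz is a second-order peaking filter with negative gain. The sketch below uses the widely published "Audio EQ Cookbook" peaking-filter coefficients; the cut depth (-6 dB), Q (1.0), and sampling rate (44.1 kHz) are illustrative assumptions, since the text only states that the gain is minimal at approximately 4 kHz.

```python
import cmath
import math


def peaking_eq_coeffs(f0_hz, gain_db, q, fs_hz):
    """Return (b, a) biquad coefficients of a peaking EQ (cookbook form, unnormalized)."""
    amp = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * math.pi * f0_hz / fs_hz
    alpha = math.sin(w0) / (2.0 * q)
    b = (1.0 + alpha * amp, -2.0 * math.cos(w0), 1.0 - alpha * amp)
    a = (1.0 + alpha / amp, -2.0 * math.cos(w0), 1.0 - alpha / amp)
    return b, a


def gain_db_at(f_hz, b, a, fs_hz):
    """Magnitude response of the biquad at f_hz, in dB."""
    z1 = cmath.exp(-2j * math.pi * f_hz / fs_hz)  # z^-1 on the unit circle
    num = b[0] + b[1] * z1 + b[2] * z1 * z1
    den = a[0] + a[1] * z1 + a[2] * z1 * z1
    return 20.0 * math.log10(abs(num / den))


# Assumed settings: a 6 dB cut centred on 4 kHz at a 44.1 kHz sampling rate.
B, A = peaking_eq_coeffs(4000.0, -6.0, 1.0, 44100.0)
```

With these settings the response dips to the requested cut at the center frequency and returns toward 0 dB away from it; a fixed (non-adjustable) filter with the same response would satisfy the description equally well.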

The audio signal conversion apparatus according to the present invention converts a right audio signal corresponding to a right channel and a left audio signal corresponding to a left channel into a central audio output signal corresponding to a center channel, a right audio output signal corresponding to the right channel, and a left audio output signal corresponding to the left channel. The apparatus is characterized by comprising: high-frequency signal generating means for generating a right audio high-frequency signal, which is the high-frequency component of the right audio signal, and a left audio high-frequency signal, which is the high-frequency component of the left audio signal; low-frequency signal generating means for generating a right audio low-frequency signal, which is the low-frequency component of the right audio signal, and a left audio low-frequency signal, which is the low-frequency component of the left audio signal; common component extracting means for extracting a high-frequency common component contained in both the right audio high-frequency signal and the left audio high-frequency signal; common signal generating means for generating the central audio output signal from the high-frequency common component; and audio output signal generating means for generating the right audio output signal and the left audio output signal by subtracting the high-frequency common component from each of the right audio high-frequency signal and the left audio high-frequency signal, adding the right audio low-frequency signal to the right audio high-frequency signal after the subtraction, and adding the left audio low-frequency signal to the left audio high-frequency signal after the subtraction.

According to the above configuration, the audio signal conversion apparatus converts a two-channel audio input signal into a three-channel audio output signal. The high-frequency signal generating means generates the right audio high-frequency signal, which is the high-frequency component of the right audio signal, and the left audio high-frequency signal, which is the high-frequency component of the left audio signal. The low-frequency signal generating means generates the right audio low-frequency signal, which is the low-frequency component of the right audio signal, and the left audio low-frequency signal, which is the low-frequency component of the left audio signal. The high-frequency and low-frequency signal generating means may each be a filter capable of extracting the high-frequency or low-frequency signal, respectively; alternatively, one of them may be a filter while the other subtracts that filter's output from the original signal. The configuration is not particularly limited.

Further, according to the above configuration, the common component extracting means extracts the high-frequency common component contained in both the right audio high-frequency signal and the left audio high-frequency signal. Because the high-frequency audio signals contain no low-frequency sound, and sound other than the human voice has thus been removed, only components corresponding more strictly to the human voice are extracted as the high-frequency common component. The cutoff frequencies of the high-frequency and low-frequency signal generating means may be set according to the required accuracy and are not particularly limited. The common signal generating means then generates the central audio output signal from the high-frequency common component corresponding to the human voice, for example by means of an FFT.

Further, according to the above configuration, the audio output signal generating means subtracts the high-frequency common component from each of the right audio high-frequency signal and the left audio high-frequency signal. That is, it calculates the components of the right and left audio high-frequency signals that correspond to sounds other than the human voice, namely the ambient sound. It then adds the right audio low-frequency signal to the right audio high-frequency signal after the subtraction, and adds the left audio low-frequency signal to the left audio high-frequency signal after the subtraction, thereby generating right and left audio output signals that correspond more strictly to the ambient sound.

As a result, the audio signal conversion apparatus according to the present invention can separate the components representing the human voice from the other components more strictly. An audio output signal corresponding to the human voice and audio output signals corresponding to the ambient sound are therefore generated more accurately, and the level balance between them can be changed more precisely. In other words, playback that accurately emphasizes the human voice, or that emphasizes sounds other than the human voice (the sense of presence), becomes possible.

The audio signal conversion method according to the present invention converts a right audio signal corresponding to a right channel and a left audio signal corresponding to a left channel into a central audio output signal corresponding to a center channel, a right audio output signal corresponding to the right channel, and a left audio output signal corresponding to the left channel. The method is characterized by including: a high-frequency signal generating step of generating a right audio high-frequency signal, which is the high-frequency component of the right audio signal, and a left audio high-frequency signal, which is the high-frequency component of the left audio signal; a low-frequency signal generating step of generating a right audio low-frequency signal, which is the low-frequency component of the right audio signal, and a left audio low-frequency signal, which is the low-frequency component of the left audio signal; a common component extracting step of extracting a high-frequency common component contained in both the right audio high-frequency signal and the left audio high-frequency signal; a common signal generating step of generating the central audio output signal from the high-frequency common component; and an audio output signal generating step of generating the right audio output signal and the left audio output signal by subtracting the high-frequency common component from each of the right audio high-frequency signal and the left audio high-frequency signal, adding the right audio low-frequency signal to the right audio high-frequency signal after the subtraction, and adding the left audio low-frequency signal to the left audio high-frequency signal after the subtraction.

In the audio signal conversion apparatus according to the present invention, it is also preferable that the low-frequency signal generating means low-pass filters the right audio signal and the left audio signal to generate the right audio low-frequency signal and the left audio low-frequency signal, and that the high-frequency signal generating means gives the right audio signal and the left audio signal the same amount of delay as the low-frequency signal generating means, subtracts the right audio low-frequency signal from the delayed right audio signal, and subtracts the left audio low-frequency signal from the delayed left audio signal, thereby generating the right audio high-frequency signal and the left audio high-frequency signal.

According to the above configuration, the low-frequency signal generating means low-pass filters the right audio signal and the left audio signal. Meanwhile, the high-frequency signal generating means delays the right audio signal and the left audio signal by the same time as the delay of the low-frequency signal generating means. That is, it aligns the phase of the input audio signal with that of the audio signal that has passed through the low-frequency signal generating means. It then subtracts the right audio low-frequency signal from the delayed right audio signal, and the left audio low-frequency signal from the delayed left audio signal, to generate the right audio high-frequency signal and the left audio high-frequency signal.

As a result, the high-frequency signal generating means generates the high-frequency signals without needing a high-pass filtering function, so an audio signal conversion apparatus with low power consumption can be built from simple components.
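The delay-and-subtract arrangement described above can be sketched for one channel as follows. This is a minimal illustration, assuming a linear-phase FIR low-pass (here a simple odd-length moving average, chosen only for brevity and not as a 100 Hz design), whose delay is (taps - 1) / 2 samples; the function names are hypothetical.

```python
def moving_average_lowpass(x, taps):
    """Causal moving-average FIR low-pass with zero initial state."""
    out = []
    for n in range(len(x)):
        acc = 0.0
        for k in range(taps):
            if n - k >= 0:
                acc += x[n - k]
        out.append(acc / taps)
    return out


def delay(x, d):
    """Delay a sequence by d samples, padding the start with zeros."""
    return [0.0] * d + list(x[:len(x) - d])


def split_bands(x, taps=9):
    """Return (low, high, delayed) where high = delayed input - low."""
    if taps % 2 == 0:
        raise ValueError("taps must be odd so the filter delay is a whole number of samples")
    low = moving_average_lowpass(x, taps)
    delayed = delay(x, (taps - 1) // 2)  # match the low-pass filter's delay
    high = [d - l for d, l in zip(delayed, low)]
    return low, high, delayed
```

Because the high band is defined as the remainder, adding the low and high bands reproduces the delayed input exactly, so the later recombination of the two bands introduces no error of its own.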

In the audio signal conversion apparatus according to the present invention, the cutoff frequency of the low-pass filtering in the low-frequency signal generating means is preferably approximately 100 Hz.

This makes it possible to extract, as the high-frequency common component, components that correspond more strictly to the human voice, so the human voice can be emphasized more accurately.

In the audio signal conversion apparatus according to the present invention, it is preferable that the high-frequency signal generating means high-pass filters the right audio signal and the left audio signal to generate the right audio high-frequency signal and the left audio high-frequency signal, and that the low-frequency signal generating means gives the right audio signal and the left audio signal the same amount of delay as the high-frequency signal generating means, subtracts the right audio high-frequency signal from the delayed right audio signal, and subtracts the left audio high-frequency signal from the delayed left audio signal, thereby generating the right audio low-frequency signal and the left audio low-frequency signal.

According to the above configuration, the high-frequency signal generating means high-pass filters the right audio signal and the left audio signal. Meanwhile, the low-frequency signal generating means delays the right audio signal and the left audio signal by the same time as the delay of the high-frequency signal generating means. That is, it aligns the phase of the input audio signal with that of the audio signal that has passed through the high-frequency signal generating means. It then subtracts the right audio high-frequency signal from the delayed right audio signal, and the left audio high-frequency signal from the delayed left audio signal, to generate the right audio low-frequency signal and the left audio low-frequency signal.

As a result, the low-frequency signal generating means generates the low-frequency signals without needing a low-pass filtering function, so an audio signal conversion apparatus with low power consumption can be built from simple components.

In the audio signal conversion apparatus according to the present invention, the cutoff frequency of the high-pass filtering in the high-frequency signal generating means is preferably approximately 100 Hz.

This makes it possible to extract, as the high-frequency common component, components that correspond more strictly to the human voice, so the human voice can be emphasized more accurately.

In the audio signal conversion apparatus according to the present invention, it is also preferable that the low-frequency signal generating means low-pass filters the right audio signal and the left audio signal to generate the right audio low-frequency signal and the left audio low-frequency signal, and that the high-frequency signal generating means high-pass filters the right audio signal and the left audio signal to generate the right audio high-frequency signal and the left audio high-frequency signal.

In this case, the high-frequency signal generating means has a high-pass filtering function and the low-frequency signal generating means has a low-pass filtering function, so no delay element is needed and the audio signal conversion apparatus can be built from a small number of components.

In the audio signal conversion apparatus according to the present invention, the cutoff frequency of the low-pass filtering in the low-frequency signal generating means and the cutoff frequency of the high-pass filtering in the high-frequency signal generating means are preferably both approximately 100 Hz.

In the audio signal conversion apparatus according to the present invention, the delay of the low-frequency signal generating means is preferably equal to the delay of the high-frequency signal generating means. That is, it is preferable to align the phase of the input audio signal with that of the audio signal that has passed through the low-frequency signal generating means.

The audio signal conversion apparatus may be implemented by a computer. In that case, a control program that implements the audio signal conversion apparatus on a computer by causing the computer to operate as each of the above means, and a computer-readable recording medium on which that control program is recorded, also fall within the scope of the present invention.

The audio signal conversion apparatus according to the present invention converts a right audio signal corresponding to a right channel and a left audio signal corresponding to a left channel into a central audio output signal corresponding to a center channel, a right audio output signal corresponding to the right channel, and a left audio output signal corresponding to the left channel. The apparatus comprises: common component extracting means for extracting a common component contained in both the right audio signal and the left audio signal; central audio output signal generating means for generating the central audio output signal from the common component; and left/right audio output signal generating means for generating the right audio output signal and the left audio output signal by subtracting the common component from each of the right audio signal and the left audio signal.

The audio signal conversion apparatus according to the present invention can therefore adjust the audio signal representing the human voice and the audio signals representing the ambient sound independently. Because the level balance between them can be adjusted, the human voice can be emphasized and made easier to hear.

Further, the audio signal conversion apparatus according to the present invention converts a right audio signal corresponding to a right channel and a left audio signal corresponding to a left channel into a central audio output signal corresponding to a center channel, a right audio output signal corresponding to the right channel, and a left audio output signal corresponding to the left channel. The apparatus comprises: high-frequency signal generating means for generating a right audio high-frequency signal, which is the high-frequency component of the right audio signal, and a left audio high-frequency signal, which is the high-frequency component of the left audio signal; low-frequency signal generating means for generating a right audio low-frequency signal, which is the low-frequency component of the right audio signal, and a left audio low-frequency signal, which is the low-frequency component of the left audio signal; common component extracting means for extracting a high-frequency common component contained in both the right audio high-frequency signal and the left audio high-frequency signal; common signal generating means for generating the central audio output signal from the high-frequency common component; and audio output signal generating means for generating the right audio output signal and the left audio output signal by subtracting the high-frequency common component from each of the right audio high-frequency signal and the left audio high-frequency signal, adding the right audio low-frequency signal to the right audio high-frequency signal after the subtraction, and adding the left audio low-frequency signal to the left audio high-frequency signal after the subtraction.

The audio signal conversion apparatus according to the present invention can therefore separate the components representing the human voice from the other components more strictly, and can change the level balance between the audio output signal corresponding to the human voice and the audio output signals corresponding to the ambient sound more accurately. This improves accuracy both when the human voice is emphasized and when sounds other than the human voice (the sense of presence) are emphasized.

[Embodiment 1]
(Audio signal conversion apparatus 1)
FIG. 1 is a block diagram showing the configuration of an audio signal conversion apparatus 1 according to the present invention. The audio signal conversion apparatus 1 includes a spectrum conversion unit 2, a common component extraction unit 3 (common component extracting means), a multiplication unit 4, an inverse conversion unit 5, a parametric equalizer (PEQ) unit 6, subtractors 7 and 8, an input terminal 12, and an output terminal 13.

The spectrum conversion unit 2 includes spectrum conversion units 2a and 2b. The multiplication unit 4 includes a multiplication unit 4a (left/right component reducing means, left/right component amplifying means), a multiplication unit 4b (central audio output signal amplifying means, central audio output signal reducing means), and a multiplication unit 4c (left/right component reducing means, left/right component amplifying means). The inverse conversion unit 5 includes an inverse conversion unit 5a (left/right audio output signal generating means), an inverse conversion unit 5b (central audio output signal generating means), and an inverse conversion unit 5c (left/right audio output signal generating means). The PEQ unit 6 includes a PEQ unit 6a (left/right level adjusting means), a PEQ unit 6b (center level adjusting means), and a PEQ unit 6c (left/right level adjusting means). The input terminal 12 includes input terminals 12a and 12b. The output terminal 13 includes output terminals 13a, 13b, and 13c.

The audio signal conversion apparatus 1 is mounted in a television receiver or the like and emphasizes the voice in a program being broadcast. Here, "voice" refers to the human voice, such as dialogue and vocals, as distinguished from sounds other than the human voice (for example, ambient noise, background music, and sound effects). In other words, the audio signal conversion apparatus 1 emphasizes the human voice in a broadcast program. The term "audio signal", by contrast, refers to a signal representing all sounds in the program, including both voice and non-voice sounds.

In the present embodiment, a two-channel audio signal digitally encoded by PCM (Pulse Code Modulation) is input to the audio signal conversion apparatus 1. Normally, in stereo broadcasting and the like, different audio signals are supplied to the left and right speakers of the television on the basis of the input two-channel audio signal, and the left and right speakers output different sounds.

In the following, the audio signals supplied to the left and right speakers in ordinary stereo broadcasting are called the left audio signal (the left audio signal corresponding to the left channel) and the right audio signal (the right audio signal corresponding to the right channel), respectively. The right audio signal and the left audio signal are input to the audio signal conversion apparatus 1 via the input terminal 12a and the input terminal 12b, respectively.

In the present embodiment, the audio signal conversion apparatus 1 outputs sound through three speakers (left, right, and center) on the basis of the two-channel audio signal consisting of the right audio signal and the left audio signal. That is, the audio signal conversion apparatus 1 converts the input two-channel audio signal into three-channel audio output signals for the left channel, the right channel, and the center channel, and supplies them to the respective speakers.

The voice emphasis processing in the audio signal conversion apparatus 1 shown in FIG. 1 is described below.

The spectrum conversion unit 2 performs the processing needed to calculate the spectrum of the audio signal of each channel. The spectrum conversion unit 2 is described in detail below.

First, the spectrum conversion unit 2a divides the right audio signal input via the input terminal 12a into frames of 1024 samples each. When the sampling frequency of the audio signal is 44.1 kHz, the duration of one frame is approximately 23 ms (= (1 ÷ 44100) × 1024).
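The frame arithmetic above can be checked with a short sketch. The frame length and sampling rate come from the text; the use of non-overlapping frames and the zero-padding of a trailing partial frame are assumptions made only so the example is complete, and the function names are hypothetical.

```python
def frame_duration_ms(frame_len, sample_rate_hz):
    """Duration of one frame in milliseconds."""
    return 1000.0 * frame_len / sample_rate_hz


def split_into_frames(samples, frame_len=1024):
    """Split samples into consecutive frames of frame_len samples.

    Assumption: frames do not overlap, and a final partial frame is
    zero-padded. The text does not specify either choice.
    """
    frames = []
    for start in range(0, len(samples), frame_len):
        frame = list(samples[start:start + frame_len])
        frame.extend([0.0] * (frame_len - len(frame)))
        frames.append(frame)
    return frames
```

At 44.1 kHz, `frame_duration_ms(1024, 44100)` evaluates to roughly 23.2 ms, which matches the 23 ms figure quoted above.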

Next, the spectrum conversion unit 2a multiplies the framed audio signal by a window function such as a Hanning window. Applying a window function reduces the frequency analysis error for the framed audio signal. The present embodiment uses a Hanning window, but another window function may be used; the choice is not particularly limited.

Next, for each frame, the spectrum conversion unit 2a applies a fast Fourier transform (FFT) to the windowed audio signal, converting the time-domain audio signal into frequency-domain data, that is, a spectrum (hereinafter called the right audio signal spectrum), and outputs it to the common component extraction unit 3 and the subtractor 7.

Here, letting the right audio signal be xr(n), the right audio signal spectrum be XR(k), and the window function be w(n), the spectrum conversion unit 2a calculates the right audio signal spectrum XR(k) by the following equation, where n is the sample index. In the present embodiment, as described above, one frame contains 1024 samples, and the spectrum conversion unit 2a performs a 1024-point FFT.
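The equation referred to above is not reproduced in this text. As an assumption consistent with the symbols defined above (a windowed 1024-point discrete Fourier transform), and not necessarily the exact form or normalization used in the patent, it would read as follows. The Hanning window shown is one common definition; variants that divide by N instead of N - 1 are also in use.

```latex
X_R(k) = \sum_{n=0}^{N-1} w(n)\, x_r(n)\, e^{-j 2\pi k n / N},
\qquad k = 0, 1, \dots, N-1, \quad N = 1024

w(n) = \frac{1}{2} - \frac{1}{2}\cos\!\left(\frac{2\pi n}{N-1}\right)
```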

In the present embodiment, the FFT is used to calculate the frequency spectrum from the audio signal, but the frequency spectrum may instead be calculated by a modified discrete cosine transform (MDCT); the method is not particularly limited.

Similarly, the spectrum conversion unit 2b calculates the spectrum of the left audio signal input via the input terminal 12b (hereinafter referred to as the left audio signal spectrum) by the same processing as the spectrum conversion unit 2a, and outputs it to the common component extraction unit 3 and the subtractor 8. With xl(n) denoting the left audio signal, XL(k) the left audio signal spectrum, and w(n) the window function, the spectrum conversion unit 2b calculates XL(k) as the FFT of the windowed frame w(n)·xl(n).

The common component extraction unit 3 extracts the component common to the right audio signal spectrum and the left audio signal spectrum. FIG. 2 illustrates the common component: (a) shows the common component of the right audio signal spectrum (R channel) and the left audio signal spectrum (L channel), and (b) shows the common component alone.

The common component extraction unit 3 calculates the common component spectrum C(k) as C(k) = MIN(XL(k), XR(k)) and outputs it to the subtractors 7 and 8 and to the multiplication unit 4b. In other words, the common component extraction unit 3 takes the smaller of XR(k) and XL(k) as the common component.
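
The MIN operation can be sketched in Python as a per-bin minimum. The text does not say how the two spectra are compared when they are complex-valued; the sketch below assumes that magnitude spectra are compared, and the function name `common_component` is hypothetical.

```python
from typing import List, Sequence


def common_component(xl_mag: Sequence[float], xr_mag: Sequence[float]) -> List[float]:
    """Per-bin C(k) = MIN(XL(k), XR(k)), assuming magnitude spectra as input."""
    if len(xl_mag) != len(xr_mag):
        raise ValueError("spectra must have the same number of bins")
    return [min(l, r) for l, r in zip(xl_mag, xr_mag)]


# Toy four-bin magnitude spectra (made-up values).
xl = [0.9, 0.2, 0.5, 0.0]
xr = [0.4, 0.7, 0.5, 0.3]
c = common_component(xl, xr)
print(c)  # [0.4, 0.2, 0.5, 0.0]
```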

As described above, the audio signal conversion apparatus 1 of the present invention receives a two-channel input signal, for example from a stereo broadcast program. In a typical stereo broadcast program, the voice is recorded with a single-channel microphone dedicated to voice recording, while the BGM, sound effects, and other sounds apart from the vocals are recorded in advance with two microphones, left and right (stereo). When a program recorded with these three microphones is broadcast in two-channel stereo, the three-channel signal is downmixed to two channels. That is, the human voice signal recorded with the single-channel voice microphone is mixed with the ambient sound signals recorded with the left and right microphones, and the resulting two-channel audio signal is transmitted. The ratio at which the voice signal and the ambient sound signals are mixed is set at the broadcasting station.

In this case, the right audio signal is a mix of the sound recorded with the right microphone and with the single-channel voice microphone, and the left audio signal is a mix of the sound recorded with the left microphone and with the single-channel voice microphone. The audio signal representing the human voice is therefore contained in both the left audio signal and the right audio signal. Music that includes vocals is produced in the same way: the vocals are recorded with a single-channel voice microphone, the instruments are recorded with left and right microphones (stereo), and a recording engineer then downmixes the result to two channels.

Exploiting this background, the apparatus approximately restores the downmixed two-channel audio signal to the three channels that existed before the downmix. "Approximately" here means that the signals recorded in advance with the left and right microphones (stereo) also share a common component, so the restoration is not exact.

In other words, the common component extraction unit 3 extracts, as the common component, the audio signal component that is contained in both the right audio signal and the left audio signal and that mainly represents the human voice.

The subtractor 7 subtracts the common component spectrum C(k) output from the common component extraction unit 3 from the right audio signal spectrum XR(k) output from the spectrum conversion unit 2a to obtain the right component spectrum XR′(k), and outputs it to the multiplication unit 4a. That is, the subtractor 7 computes XR′(k) = XR(k) − C(k).

The subtractor 8 subtracts the common component spectrum C(k) output from the common component extraction unit 3 from the left audio signal spectrum XL(k) output from the spectrum conversion unit 2b to obtain the left component spectrum XL′(k), and outputs it to the multiplication unit 4c. That is, the subtractor 8 computes XL′(k) = XL(k) − C(k).
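
The two subtractions can be sketched together with the MIN extraction that feeds them. The sketch again assumes magnitude spectra, and the name `remove_common` is hypothetical. One consequence of taking the per-bin minimum is that, in every bin, at least one of the two residuals is exactly zero.

```python
from typing import List, Sequence, Tuple


def remove_common(
    xl_mag: Sequence[float], xr_mag: Sequence[float]
) -> Tuple[List[float], List[float], List[float]]:
    """Return (XL'(k), C(k), XR'(k)) with C = MIN(XL, XR), XL' = XL - C, XR' = XR - C."""
    common = [min(l, r) for l, r in zip(xl_mag, xr_mag)]
    xl_res = [l - c for l, c in zip(xl_mag, common)]
    xr_res = [r - c for r, c in zip(xr_mag, common)]
    return xl_res, common, xr_res


# Toy three-bin magnitude spectra (made-up values).
xl_res, common, xr_res = remove_common([0.9, 0.2, 0.5], [0.4, 0.7, 0.5])
```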

FIG. 3 shows the components that remain after the common component spectrum is removed from the right audio signal spectrum (R channel) and the left audio signal spectrum (L channel): (a) shows the left component spectrum XL′(k), and (b) shows the right component spectrum XR′(k).

Here, the left component spectrum XL′(k) and the right component spectrum XR′(k) are components that mainly represent sounds other than the human voice (ambient sounds such as BGM, sound effects, and noise).

The multiplication unit 4a multiplies XR′(k) output from the subtractor 7 by a multiplier M1 (0 ≤ M1 ≤ 1) to obtain XR″(k) (= M1 × XR′(k)), and outputs it to the inverse transform unit 5a. The multiplication unit 4b multiplies C(k) output from the common component extraction unit 3 by a multiplier M2 (0 ≤ M2 ≤ 1) to obtain C″(k) (= M2 × C(k)), and outputs it to the inverse transform unit 5b. The multiplication unit 4c multiplies XL′(k) output from the subtractor 8 by the multiplier M1 to obtain XL″(k) (= M1 × XL′(k)), and outputs it to the inverse transform unit 5c.
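
The three multiplications amount to per-bin scaling, as in the following sketch. The function name `apply_multipliers` and the sample values are illustrative assumptions; only the multipliers M1 and M2 and their roles come from the description.

```python
from typing import List, Sequence, Tuple


def apply_multipliers(
    xr_res: Sequence[float],
    common: Sequence[float],
    xl_res: Sequence[float],
    m1: float,
    m2: float,
) -> Tuple[List[float], List[float], List[float]]:
    """Return (XR''(k), C''(k), XL''(k)) = (M1*XR'(k), M2*C(k), M1*XL'(k))."""
    xr_out = [m1 * v for v in xr_res]
    c_out = [m2 * v for v in common]
    xl_out = [m1 * v for v in xl_res]
    return xr_out, c_out, xl_out


# M1 = 0.5 halves the ambient components while M2 = 1.0 leaves the common component as is.
xr_out, c_out, xl_out = apply_multipliers(
    [0.0, 0.5], [0.4, 0.2], [0.5, 0.0], m1=0.5, m2=1.0
)
```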

Hereinafter, XR″(k), C″(k), and XL″(k) are referred to as the right component output spectrum, the common component output spectrum, and the left component output spectrum, respectively.

FIG. 4 shows the right component output spectrum XR″(k) and the left component output spectrum XL″(k): (a) shows the right component output spectrum XR″(k), calculated by multiplying the right component spectrum shown in FIG. 3(a) by the multiplier M1, and (b) shows the left component output spectrum XL″(k), calculated by multiplying the left component spectrum shown in FIG. 3(b) by the multiplier M1.

The left component output spectrum XL″(k) and the right component output spectrum XR″(k) are the components of the audio signal that represent the ambient sounds (sounds other than the human voice).

The inverse transform unit 5a converts the right component output spectrum XR″(k), which is frequency-domain information, into a time-domain signal waveform by inverse FFT, thereby generating the audio output signal to be sent to the right speaker (the right audio output signal corresponding to the right channel), and outputs it to the PEQ unit 6a. The inverse transform unit 5b performs the same processing as the inverse transform unit 5a: it converts the common component output spectrum C″(k), which is frequency-domain information, into a time-domain signal waveform by inverse FFT, thereby generating the audio output signal to be sent to the center speaker (the center audio output signal corresponding to the center channel), and outputs it to the PEQ unit 6b. The inverse transform unit 5c likewise performs the same processing as the inverse transform unit 5a: it converts the left component output spectrum XL″(k), which is frequency-domain information, into a time-domain signal waveform by inverse FFT, thereby generating the audio output signal to be sent to the left speaker (the left audio output signal corresponding to the left channel), and outputs it to the PEQ unit 6c.

Note that when a time waveform is converted into the frequency domain by FFT, processed by the common component extraction and related steps described above, and then returned to a time-domain signal waveform by inverse FFT, a window function that smoothly tapers the beginning and end of each extracted segment toward zero is applied before the FFT in order to reduce the distortion (harmonic components) that arises at frame boundaries. In the present embodiment, where t is the frame length, successive frames are extracted with a shift of t/2, each extracted waveform is multiplied by the Hanning window function, and the inverse-FFT outputs are overlapped by t/2 with the preceding and following frames and added to restore a continuous time waveform. The overlap of t/2 is chosen to suit the shape of the Hanning window; the overlap length may be set according to the window shape and is not particularly limited.
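
The half-frame overlap can be illustrated with a small overlap-add sketch. It assumes the periodic form of the Hann window and, to keep the example short, skips the FFT, spectral processing, and inverse FFT, so each windowed frame is added back unchanged. The names `hann_window` and `overlap_add` are hypothetical. Under these assumptions, windows shifted by half a frame sum to 1, so the interior of the signal is reconstructed exactly.

```python
import math
from typing import List, Sequence


def hann_window(length: int) -> List[float]:
    """Periodic Hann window: w(n) = 0.5 - 0.5 * cos(2*pi*n / length)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / length) for n in range(length)]


def overlap_add(frames: Sequence[Sequence[float]], hop: int) -> List[float]:
    """Sum equal-length frames whose start positions are `hop` samples apart."""
    frame_len = len(frames[0])
    out = [0.0] * (hop * (len(frames) - 1) + frame_len)
    for i, frame in enumerate(frames):
        start = i * hop
        for n, value in enumerate(frame):
            out[start + n] += value
    return out


frame_len = 8
hop = frame_len // 2          # shift of t/2, as in the description
signal = [1.0] * 24           # constant test signal (made-up data)
window = hann_window(frame_len)
frames = [
    [signal[start + n] * window[n] for n in range(frame_len)]
    for start in range(0, len(signal) - frame_len + 1, hop)
]
rebuilt = overlap_add(frames, hop)
# Away from the first and last half frame, the overlapping windows sum to 1.
interior = rebuilt[hop:-hop]
```

Only the first and last half frame, which are covered by a single window, deviate from the input.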

To heighten the sense of presence, that is, to emphasize the ambient sounds, the PEQ unit 6a applies a parametric equalizer with the characteristic of an equal-loudness curve to the right-channel audio output signal output from the inverse transform unit 5a, and outputs the result to the right-channel speaker via the output terminal 13a. Alternatively, the sense of presence can also be heightened by having the multiplication unit 4a multiply the right audio signal after subtraction by a multiplier greater than 1.

FIG. 5 shows an example frequency characteristic of a parametric equalizer that emphasizes the human voice band with a peak at approximately 2 kHz, and FIG. 6 shows an example frequency characteristic of a parametric equalizer, created on the basis of an equal-loudness curve, that has its minimum at approximately 4 kHz. FIG. 7 shows the equal-loudness curves measured by Robinson et al.

To make the human voice easier to hear, that is, to emphasize the human voice, the PEQ unit 6b applies, as shown in FIG. 5, a parametric equalizer that emphasizes the voice band with a peak at 2 kHz to the center-channel audio output signal output from the inverse transform unit 5b, and outputs the result to the center-channel speaker via the output terminal 13b. Alternatively, the human voice can also be made easier to hear by having the multiplication unit 4b multiply the extracted common component by a multiplier greater than 1.

Like the PEQ unit 6a, when the sense of presence is to be heightened, that is, when the ambient sounds are to be emphasized, the PEQ unit 6c applies, as shown in FIG. 6, a parametric equalizer with the characteristic of an equal-loudness curve to the left-channel audio output signal output from the inverse transform unit 5c, and outputs the result to the left-channel speaker via the output terminal 13c. Alternatively, the sense of presence can also be heightened by having the multiplication unit 4c multiply the left audio signal after subtraction by a multiplier greater than 1.

Here, loudness is a numerical value that expresses, as a perceptual quantity, how loud a sound seems to a human listener. It is distinct from sound pressure, which is a physical quantity representing the intensity of sound. In general, human hearing is most sensitive around 4 kHz (for example, a baby's cry) and becomes less sensitive toward lower and higher frequencies. Consequently, sounds that are perceived as equally loud can have different physical sound pressure levels. Likewise, doubling the sound pressure does not make a sound seem twice as loud. An equal-loudness curve plots, for each frequency, the sound pressure that is heard as equal in loudness to a 1 kHz reference tone; as shown in FIG. 7, it is a roughly V-shaped curve with its minimum near 4 kHz. Because equal-loudness curves become flatter as the sound pressure increases, it is preferable to change the characteristic of the parametric equalizer shown in FIG. 6 according to the level of the input audio signal.

A parametric equalizer divides the audio frequency band into several bands and allows the pass-level gain (including gains of 1 or less) and other settings to be adjusted for each band. Its three parameters, center frequency, gain, and Q (quality factor), allow the center frequency and the bandwidth of the passband to be adjusted independently. Here, Q is the ratio Q = ω0/Δω of the center frequency ω0 to the bandwidth Δω between the points where the level is 3 dB lower or higher than the level at the center frequency.
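
One common way to realize a single band of a parametric equalizer is a second-order (biquad) peaking filter. The sketch below uses the peaking-filter formulas from Robert Bristow-Johnson's "Audio EQ Cookbook"; this is an assumption, since the description does not specify a filter design, and the cookbook's Q convention is not necessarily identical to the 3 dB definition given above. The 2 kHz center frequency and 44.1 kHz sampling rate come from the description, while the 6 dB gain, Q = 1, and all function names are illustrative.

```python
import cmath
import math
from typing import List, Sequence, Tuple


def peaking_eq_coefficients(
    center_hz: float, gain_db: float, q: float, sample_rate_hz: float
) -> Tuple[List[float], List[float]]:
    """Biquad peaking-filter coefficients (b, a), normalized so that a[0] == 1."""
    amp = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * math.pi * center_hz / sample_rate_hz
    alpha = math.sin(w0) / (2.0 * q)
    b = [1.0 + alpha * amp, -2.0 * math.cos(w0), 1.0 - alpha * amp]
    a = [1.0 + alpha / amp, -2.0 * math.cos(w0), 1.0 - alpha / amp]
    return [x / a[0] for x in b], [x / a[0] for x in a]


def gain_db_at(
    freq_hz: float, b: Sequence[float], a: Sequence[float], sample_rate_hz: float
) -> float:
    """Magnitude response of the biquad in dB at one frequency."""
    z_inv = cmath.exp(-2j * math.pi * freq_hz / sample_rate_hz)
    num = b[0] + b[1] * z_inv + b[2] * z_inv ** 2
    den = a[0] + a[1] * z_inv + a[2] * z_inv ** 2
    return 20.0 * math.log10(abs(num / den))


# Voice-band boost centered at 2 kHz; the gain and Q values are made up.
b, a = peaking_eq_coefficients(center_hz=2000.0, gain_db=6.0, q=1.0, sample_rate_hz=44100.0)
peak_gain = gain_db_at(2000.0, b, a, 44100.0)  # response at the center frequency
dc_gain = gain_db_at(0.0, b, a, 44100.0)       # response at 0 Hz
```

A negative `gain_db` turns the same filter into a dip, which is the kind of band that a roughly V-shaped characteristic with its minimum near 4 kHz would need.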

That is, the PEQ units 6a and 6c described above apply an equalizer whose center frequency, gain, and Q (quality factor) are set so as to give the characteristic of an equal-loudness curve, namely a roughly V-shaped characteristic in which the pass level is lowest at 4 kHz.

In the present embodiment, the PEQ unit 6b is used as the means for emphasizing the voice band with a peak at 2 kHz, but this may instead be realized with a combination of a filter and an amplifier other than a PEQ. Alternatively, the multiplication unit 4b may apply a weighting with a peak at 2 kHz directly to the spectrum after the FFT. Similarly, the PEQ units 6a and 6c are used as the means for emphasizing the ambient sounds, but this may instead be realized with a combination of a filter and an amplifier other than a PEQ, or the multiplication units 4a and 4c may apply a weighting with the characteristic of an equal-loudness curve to the spectrum after the FFT; the means is not particularly limited.

In the audio signal conversion apparatus 1 according to the present invention, the voice can be emphasized by reducing the multiplier M1 applied to the left component spectrum XL′(k) and the right component spectrum XR′(k). For example, suppose the common component output spectrum is generated with the multiplier for the common component spectrum set to 1, while the right and left component spectra are multiplied by a multiplier of less than 1 so that the right and left component output spectra become smaller. The audio output signal corresponding to the human voice then keeps its level and only the audio output signals corresponding to the ambient sounds become smaller, so the human voice is emphasized in the sound that the speakers output on the basis of the audio output signals generated from the common, left, and right component output spectra. Multiplying the right and left component spectra by 0 emphasizes the human voice still further.

Conversely, if the multiplier M1 applied to the left component spectrum XL′(k) and the right component spectrum XR′(k) is increased without changing the magnitude of the common component spectrum, the audio output signals corresponding to the ambient sounds become larger and the ambient sounds output from the speakers become louder, which heightens the sense of presence.

Likewise, the voice can be emphasized by increasing the multiplier applied to the common component spectrum C(k) without changing the magnitudes of the right and left component spectra. Conversely, reducing the multiplier applied to C(k) heightens the sense of presence, and multiplying the common component spectrum by 0 heightens it still further.

In the present embodiment, the right component output spectrum XR″(k), the common component output spectrum C″(k), and the left component output spectrum XL″(k) are calculated by multiplying the right component spectrum XR′(k), the common component spectrum C(k), and the left component spectrum XL′(k) by multipliers M1 and M2 with values between 0 and 1; however, multipliers of 1 or more may be used, and there is no particular limitation. The left component spectrum XL′(k) and the right component spectrum XR′(k) may also each be multiplied by a different multiplier.

In the present embodiment, the level balance of the audio output signals finally output to the left, right, and center channels is changed by multiplying the left component spectrum XL′(k), the right component spectrum XR′(k), and the common component spectrum C(k) by the multipliers M1 and M2 (values between 0 and 1). Alternatively, XL′(k), XR′(k), and C(k) may be converted into time waveforms by inverse FFT without being multiplied, and the resulting audio output signals for the left, right, and center channels may then be amplified or attenuated by amplifiers or attenuators having the same input/output characteristics as the multipliers M1 and M2 to change the level balance; there is no particular limitation.

That is, the multiplication unit 4 may be realized not only by multiplying spectral components by a multiplier but also by converting the spectral components into an audio signal representing a time waveform by inverse FFT or the like and then attenuating it with an attenuator or amplifying it with an amplifier; there is no particular limitation.
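
The two placements of the gain are interchangeable because the inverse transform is linear. The sketch below checks this with a small, direct inverse DFT in place of an inverse FFT; the four-bin spectrum, the value 0.4 for M1, and the function name `idft` are made up for illustration.

```python
import cmath
import math
from typing import List, Sequence


def idft(spectrum: Sequence[complex]) -> List[complex]:
    """Direct inverse DFT (O(N^2)), adequate for a short illustration."""
    n_bins = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * n / n_bins) for k in range(n_bins))
        / n_bins
        for n in range(n_bins)
    ]


spectrum = [1 + 0j, 0.5 - 0.5j, 0.25 + 0j, 0.5 + 0.5j]  # made-up four-bin spectrum
m1 = 0.4

scaled_in_frequency = idft([m1 * x for x in spectrum])   # multiply, then inverse transform
scaled_in_time = [m1 * v for v in idft(spectrum)]        # inverse transform, then attenuate
```

The equivalence holds for a single multiplier applied to all bins; a frequency-dependent weighting would need a filter rather than a plain amplifier or attenuator in the time domain.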

The processing that changes the level balance of the audio output signals may be realized in the PEQ unit 6 or with a combination of a filter and an amplifier other than the PEQ unit 6; there is no particular limitation. For example, amplifying the audio signal that mainly corresponds to the human voice in the PEQ unit 6b yields a configuration that emphasizes the human voice, and amplifying the audio signal corresponding to the ambient sounds in the PEQ unit 6a or 6c yields a configuration that heightens the sense of presence.

[Embodiment 2]
Audio signal conversion apparatuses 1a and 1b, which can emphasize the human voice more strongly, are described below with reference to FIGS. 8 and 9.

Like the audio signal conversion apparatus 1, the audio signal conversion apparatuses 1a and 1b are mounted in a television receiver or the like and emphasize the voice in a program being broadcast. Here, "voice" refers to human voices such as dialogue and vocals, as distinguished from sounds other than the human voice (for example, ambient noise, BGM, and sound effects). In other words, the audio signal conversion apparatus 1a emphasizes the human voices in a broadcast program. The term "audio signal" refers to a signal representing all sounds in the program, including both voice and non-voice sounds.

In the present embodiment, two-channel audio signals digitally encoded by PCM (pulse code modulation) are input to the audio signal conversion apparatuses 1a and 1b. Ordinarily, in stereo broadcasting and the like, different audio signals are supplied to the left and right speakers of the television on the basis of the input two-channel audio signals, and the left and right speakers output different sounds.

Hereinafter, the audio signals supplied to the left and right speakers in ordinary stereo broadcasting are called the left audio signal (corresponding to the left channel) and the right audio signal (corresponding to the right channel), respectively. The right audio signal and the left audio signal are input to the audio signal conversion apparatuses 1a and 1b via the input terminal 12a and the input terminal 12b, respectively.

Each of the audio signal conversion apparatuses 1a and 1b according to the present embodiment splits the input right and left audio signals into a high-frequency component and a low-frequency component, and extracts the common component from the high-frequency component of the right audio signal (hereinafter, the right audio high-frequency signal) and the high-frequency component of the left audio signal (hereinafter, the left audio high-frequency signal). The common component mainly corresponds to human voices such as vocals and dialogue, but strictly speaking it also contains the low notes of instruments, noise, and the like. If the common component is therefore extracted only from the high-frequency components, for example those at 100 Hz and above, which correspond to the human voice, components other than the human voice can be removed from the common component more rigorously, and the human voice can be emphasized more accurately. The voice emphasis processing in the audio signal conversion apparatuses 1a and 1b is described in more detail below.

(Audio signal conversion apparatus 1a)
The audio signal conversion apparatus 1a according to the present invention is described below with reference to FIG. 8. On the basis of the two-channel audio signals, namely the right audio signal and the left audio signal, the audio signal conversion apparatus 1a outputs sound through three speakers: left, right, and center. That is, it converts the input two-channel audio signals into three-channel audio output signals for the left, right, and center channels and supplies them to the respective speakers.

FIG. 8 is a block diagram showing the configuration of the audio signal conversion apparatus 1a according to the present invention. The audio signal conversion apparatus 1a includes a spectrum conversion unit 2, a common component extraction unit (common component extraction means) 3, a multiplication unit 4, an inverse transform unit (common signal generation means, audio output signal generation means) 5, a parametric equalizer (PEQ) unit 6, subtractors 7 and 8, an input terminal 12, an output terminal 13, delay units (high-frequency signal generation means) 21 and 23, subtractors (high-frequency signal generation means) 27 and 28, low-pass filter units (low-frequency signal generation means) 22 and 24, and adders (audio output signal generation means) 25 and 26.

The right audio signal and the left audio signal are input to the audio signal conversion apparatus 1a via the input terminal 12a and the input terminal 12b, respectively. The right audio signal input to the input terminal 12a is fed to the delay unit 21 and the low-pass filter unit 22 (for example, a low-pass filter). The left audio signal input to the input terminal 12b is fed to the delay unit 23 and the low-pass filter unit 24.

The low-pass filter unit 22 low-pass filters the input right audio signal and outputs the result to the adder 25 and the subtractor 27. That is, it passes only the low-frequency component of the right audio signal (hereinafter, the right audio low-frequency signal). In the present embodiment, the cutoff frequency of this low-pass filtering is approximately 100 Hz, but a different cutoff frequency may be used depending on the required accuracy; there is no particular limitation.

The delay unit 21 delays the input right audio signal and outputs it to the subtractor 27. The delay introduced by the delay unit 21 is preferably equal to the delay of the low-pass filter unit 22 (that is, the time taken for the input right audio signal to be low-pass filtered and output as the right audio low-frequency signal). This aligns the phase of the delayed right audio signal from the delay unit 21 with that of the right audio low-frequency signal from the low-pass filter unit 22.

The subtractor 27 subtracts the right audio low-frequency signal supplied by the low-pass filter unit 22 from the delayed right audio signal supplied by the delay unit 21, and outputs the result to the spectrum conversion unit 2a. As described above, the delayed right audio signal from the delay unit 21 and the right audio low-frequency signal from the low-pass filter unit 22 are in phase, so the subtractor 27 outputs the high-frequency component of the right audio signal (hereinafter, the right audio high-frequency signal).
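
The low-pass filter, delay, and subtractor arrangement can be sketched as follows. A moving-average FIR filter stands in for the unspecified low-pass filter unit; it is not a 100 Hz design, and the function names and the `taps` parameter are assumptions made for illustration. With an odd number of taps the filter delays the signal by (taps − 1)/2 samples, which is the delay given to the unfiltered path before the subtraction.

```python
from typing import List, Sequence, Tuple


def moving_average_lowpass(signal: Sequence[float], taps: int) -> List[float]:
    """Causal moving-average FIR low-pass; the input is taken as zero before n = 0."""
    out = []
    for n in range(len(signal)):
        acc = sum(signal[n - i] for i in range(taps) if n - i >= 0)
        out.append(acc / taps)
    return out


def delay(signal: Sequence[float], samples: int) -> List[float]:
    """Delay a signal by a whole number of samples, keeping its length."""
    return ([0.0] * samples + list(signal))[: len(signal)]


def split_bands(
    signal: Sequence[float], taps: int = 5
) -> Tuple[List[float], List[float], List[float]]:
    """Return (low, high, delayed) where high = delayed input - low-pass output."""
    if taps % 2 == 0:
        raise ValueError("taps must be odd so the filter delay is a whole number of samples")
    low = moving_average_lowpass(signal, taps)
    delayed = delay(signal, (taps - 1) // 2)
    high = [d - l for d, l in zip(delayed, low)]
    return low, high, delayed


# A constant (0 Hz) input should end up entirely in the low band once the filter has filled.
low, high, delayed = split_bands([1.0] * 20, taps=5)
```

Because the high band is formed by subtraction, adding the low and high bands back together returns the delayed input exactly.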

In the present embodiment, the low-frequency and high-frequency signals are produced by combining the low-pass filter unit 22 with the delay unit 21 and the subtractor 27. Alternatively, the high-frequency and low-frequency signals may be produced by combining a high-pass filter unit with a delay unit and a subtractor; the configuration is not particularly limited.

The spectrum conversion unit 2a calculates a frequency spectrum (hereinafter, the right audio high-frequency signal spectrum XR(k)) from the right audio high-frequency signal by an FFT or the like, and outputs it to the common component extraction unit 3 and the subtractor 7. The processing of the spectrum conversion unit 2 is the same as in the audio signal converter 1, so a detailed description is omitted.

Like the right audio signal input to the input terminal 12a, the left audio signal input to the input terminal 12b is supplied to the delay unit 23 and the low-pass filter unit 24, which output the delayed left audio signal and the low-frequency component of the left audio signal (hereinafter, the left audio low-frequency signal), respectively, to the subtractor 28. The delay amount of the delay unit 23 is preferably equal to that of the low-pass filter unit 24. The low-pass filter unit 24 also outputs the left audio low-frequency signal to the adder 26. The subtractor 28 subtracts the left audio low-frequency signal supplied by the low-pass filter unit 24 from the delayed left audio signal supplied by the delay unit 23, and outputs the high-frequency component of the left audio signal (hereinafter, the left audio high-frequency signal) to the spectrum conversion unit 2b. The spectrum conversion unit 2b calculates a frequency spectrum (hereinafter, the left audio high-frequency signal spectrum XL(k)) from the left audio high-frequency signal by an FFT or the like, and outputs it to the common component extraction unit 3 and the subtractor 8. The processing of the spectrum conversion unit 2 is the same as in the audio signal converter 1, so a detailed description is omitted.

The common component extraction unit 3 extracts, as the common component, the smaller of the right audio high-frequency signal spectrum XR(k) and the left audio high-frequency signal spectrum XL(k). In other words, it extracts the audio components contained in both the right audio high-frequency signal and the left audio high-frequency signal, which mainly represent the human voice, as the high-frequency common component C(k). The processing of the common component extraction unit 3 is the same as in the audio signal converter 1, so a detailed description is omitted.
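As a rough illustration of taking "the smaller of" the two spectra, the sketch below compares the two channels bin by bin and keeps the value with the smaller magnitude. The magnitude comparison and the function name `extract_common` are assumptions made for this example; the exact rule is the one described for the audio signal converter 1.

```python
def extract_common(xr, xl):
    """Per-bin common component C(k): keep the value with the smaller magnitude.

    xr, xl: lists of complex spectrum values XR(k) and XL(k) of equal length.
    Ties are resolved in favour of the right channel (an arbitrary choice).
    """
    return [r if abs(r) <= abs(l) else l for r, l in zip(xr, xl)]


# Hypothetical three-bin spectra, for illustration only.
xr = [3 + 0j, 1j, 2 + 0j]
xl = [1 + 0j, 2j, -2 + 0j]
c = extract_common(xr, xl)
```

Under this assumption, a component that is equally strong in both channels, as centred dialogue typically is, passes into C(k) almost unchanged, while a component present in only one channel contributes little.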

The subtractor 7 subtracts the high-frequency common component spectrum C(k) output by the common component extraction unit 3 from the right audio high-frequency signal spectrum XR(k) output by the spectrum conversion unit 2a to calculate the right high-frequency component spectrum XR'(k), and outputs it to the multiplication unit 4a. That is, the subtractor 7 computes XR'(k) = XR(k) - C(k).

The subtractor 8 subtracts the high-frequency common component spectrum C(k) output by the common component extraction unit 3 from the left audio high-frequency signal spectrum XL(k) output by the spectrum conversion unit 2b to calculate the left high-frequency component spectrum XL'(k), and outputs it to the multiplication unit 4c. That is, the subtractor 8 computes XL'(k) = XL(k) - C(k).

Here, the left high-frequency component spectrum XL'(k) and the right high-frequency component spectrum XR'(k) are components that mainly represent sounds other than the human voice (ambient sounds such as background music, sound effects, and noise).

Hereinafter, XR″(k), C″(k), and XL″(k) are referred to as the right high-frequency component output spectrum, the high-frequency common component output spectrum, and the left high-frequency component output spectrum, respectively.

The left high-frequency component output spectrum XL″(k) and the right high-frequency component output spectrum XR″(k) are audio signal components that represent ambient sounds (sounds other than the human voice).

The inverse transform unit 5a converts the right high-frequency component output spectrum XR″(k), which is frequency-domain information, into a time-domain signal waveform by an inverse FFT and outputs it to the adder 25. The inverse transform unit 5b performs the same processing as the inverse transform unit 5a: it converts the high-frequency common component output spectrum C″(k), which is frequency-domain information, into a time-domain signal waveform by an inverse FFT to generate the audio output signal to be output to the center speaker (the central audio output signal corresponding to the center channel), and outputs it to the PEQ unit 6b. The inverse transform unit 5c likewise performs the same processing as the inverse transform unit 5a: it converts the left high-frequency component output spectrum XL″(k), which is frequency-domain information, into a time-domain signal waveform by an inverse FFT and outputs it to the adder 26.

As described above, the adder 25 also receives the right audio low-frequency signal from the low-pass filter unit 22, and the adder 26 receives the left audio low-frequency signal from the low-pass filter unit 24.

The adder 25 adds the signal obtained by the inverse FFT of the right high-frequency component output spectrum XR″(k) to the right audio low-frequency signal to generate the right audio output signal corresponding to the right channel, and outputs it to the PEQ unit 6a. The adder 26 adds the signal obtained by the inverse FFT of the left high-frequency component output spectrum XL″(k) to the left audio low-frequency signal to generate the left audio output signal corresponding to the left channel, and outputs it to the PEQ unit 6c.
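The inverse transform followed by the addition of the low band can be sketched as below. This is a simplified single-frame illustration: it uses a naive inverse DFT in place of an inverse FFT, ignores framing and windowing, and the names `idft` and `recombine` are hypothetical.

```python
import cmath


def idft(spectrum):
    """Naive inverse DFT; returns the real part of each time-domain sample."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)).real / n
        for t in range(n)
    ]


def recombine(high_output_spectrum, low_band):
    """Role of inverse transform unit 5a plus adder 25 (or 5c plus adder 26).

    high_output_spectrum: output spectrum of one frame, e.g. XR''(k).
    low_band: time-domain low-frequency samples for the same frame.
    """
    high = idft(high_output_spectrum)
    return [h + l for h, l in zip(high, low_band)]


# Hypothetical frame: a constant high-band frame plus a constant low band.
right_out = recombine([4, 0, 0, 0], [0.5, 0.5, 0.5, 0.5])
```

The low band bypasses the frequency-domain processing entirely, so any reduction applied to the high-band ambient components leaves the content below the cutoff untouched.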

The PEQ unit 6a applies a parametric equalizer with an equal-loudness curve characteristic to the right audio output signal output by the adder 25, and outputs the result to the right channel speaker via the output terminal 13a. The PEQ unit 6b applies a parametric equalizer that emphasizes the voice band, with a peak at 2 kHz, to the center channel audio output signal output by the inverse transform unit 5b, and outputs the result to the center channel speaker via the output terminal 13b. The PEQ unit 6c applies a parametric equalizer with an equal-loudness curve characteristic to the left audio output signal output by the adder 26, and outputs the result to the left channel speaker via the output terminal 13c.
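A parametric equalizer band with a peak at 2 kHz, as applied by the PEQ unit 6b, is commonly realized as a peaking biquad filter. The sketch below uses the widely known Audio EQ Cookbook peaking-filter coefficients; the gain of 6 dB, the Q of 1.0, the 48 kHz sampling rate, and the function names are assumptions for illustration, since the embodiment specifies only the 2 kHz peak.

```python
import cmath
import math


def peaking_biquad(f0, gain_db, q, fs):
    """Peaking-EQ biquad coefficients (b, a) in the Audio EQ Cookbook form."""
    amp = 10 ** (gain_db / 40)
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * q)
    b = [1 + alpha * amp, -2 * math.cos(w0), 1 - alpha * amp]
    a = [1 + alpha / amp, -2 * math.cos(w0), 1 - alpha / amp]
    return b, a


def gain_db_at(b, a, f, fs):
    """Magnitude response of the biquad at frequency f, in dB."""
    z = cmath.exp(-2j * math.pi * f / fs)  # z^-1 evaluated on the unit circle
    h = (b[0] + b[1] * z + b[2] * z * z) / (a[0] + a[1] * z + a[2] * z * z)
    return 20 * math.log10(abs(h))


# Assumed settings: +6 dB at 2 kHz, Q = 1.0, 48 kHz sampling rate.
b, a = peaking_biquad(2000.0, 6.0, 1.0, 48000.0)
peak_gain = gain_db_at(b, a, 2000.0, 48000.0)
```

With these coefficients the response reaches the requested gain at the centre frequency and returns to 0 dB far from it, so only the band around 2 kHz is emphasized.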

As described above, the audio signal converter 1a extracts the common component from the high-frequency components of the input left and right audio signals, which makes it possible to separate the components representing the human voice from the other components more strictly. Audio output signals corresponding to the human voice and to the ambient sounds are therefore generated more precisely. As a result, the level balance between the audio output signal corresponding to the human voice and the audio output signal corresponding to the ambient sounds can be changed more accurately, which improves accuracy when the human voice is emphasized.

In the present embodiment, the right audio high-frequency signal and the left audio high-frequency signal are generated by subtracting the right audio low-frequency signal and the left audio low-frequency signal, obtained by low-pass filtering in the low-pass filter units 22 and 24, from the input signals delayed by the delay units 21 and 23. A configuration that further includes high-pass filter units is also possible. That is, the right audio low-frequency signal and the left audio low-frequency signal may be generated by subtracting the right audio high-frequency signal and the left audio high-frequency signal, obtained by high-pass filtering in the high-pass filter units, from the input signals delayed by the delay units. The configuration is not particularly limited.

(Audio signal converter 1b)
The audio signal converter 1b according to the present invention is described below with reference to FIG. 9. Based on the two-channel audio signals consisting of the right audio signal and the left audio signal described above, the audio signal converter 1b outputs sound through three speakers: left, right, and center. That is, the audio signal converter 1b converts the input two-channel audio signals into three-channel audio output signals for the left channel, the right channel, and the center channel, and supplies them to the respective speakers.

FIG. 9 is a block diagram showing the configuration of the audio signal converter 1b according to the present invention. The audio signal converter 1b includes a spectrum conversion unit 2, a common component extraction unit (common component extraction means) 3, a multiplication unit (component reduction means) 4, an inverse transform unit (common signal generating means, audio output signal generating means) 5, a parametric equalizer (PEQ) unit 6, subtractors 7 and 8, an input terminal 12, an output terminal 13, high-pass filter units (high-frequency signal generating means) 31 and 33, low-pass filter units (low-frequency signal generating means) 32 and 34, and adders (audio output signal generating means) 35 and 36.

Except for the high-pass filter units 31 and 33 and the low-pass filter units 32 and 34, the audio signal converter 1b has the same configuration as the audio signal converter 1a. Only the parts that differ from the audio signal converter 1a are therefore described below.

The right audio signal and the left audio signal are input to the audio signal converter 1b via the input terminal 12a and the input terminal 12b, respectively. The right audio signal input to the input terminal 12a is supplied to the high-pass filter unit 31 (for example, a high-pass filter) and the low-pass filter unit 32. The left audio signal input to the input terminal 12b is supplied to the high-pass filter unit 33 and the low-pass filter unit 34.

The high-pass filter unit 31 high-pass filters the input right audio signal and outputs the result to the spectrum conversion unit 2a. That is, it passes only the high-frequency component of the right audio signal (hereinafter, the right audio high-frequency signal). Similarly, the high-pass filter unit 33 high-pass filters the input left audio signal and outputs the result to the spectrum conversion unit 2b. That is, it passes only the high-frequency component of the left audio signal (hereinafter, the left audio high-frequency signal). In the present embodiment, the cutoff frequency of this high-pass filtering is approximately 100 Hz. The cutoff frequency is not limited to this value, however, and may be set to a frequency other than 100 Hz according to the required accuracy.

The low-pass filter unit 32 low-pass filters the input right audio signal and outputs the result to the adder 35. That is, it passes only the low-frequency component of the right audio signal (hereinafter, the right audio low-frequency signal). Similarly, the low-pass filter unit 34 low-pass filters the input left audio signal and outputs the result to the adder 36. That is, it passes only the low-frequency component of the left audio signal (hereinafter, the left audio low-frequency signal). In the present embodiment, the cutoff frequency of this low-pass filtering is approximately 100 Hz. The cutoff frequency is not limited to this value, however, and may be set to a frequency other than 100 Hz according to the required accuracy. The delay amount of the high-pass filter units 31 and 33 is preferably equal to the delay amount of the low-pass filter units 32 and 34.

The audio signal converter 1a uses the delay units 21 and 23 and the low-pass filter units 22 and 24 to extract the low-frequency component of the input audio signal directly, and obtains the high-frequency component by subtracting the low-frequency component from the original signal. In contrast, the audio signal converter 1b uses the high-pass filter units 31 and 33 and the low-pass filter units 32 and 34 to extract both the high-frequency component and the low-frequency component directly from the input audio signal. This is the only point in which it differs from the audio signal converter 1a. The other parts of the audio signal converter 1b operate in the same way as those of the audio signal converter 1a, and their description is omitted.

The present invention can also be expressed as follows.

(First configuration)
A first configuration is a playback apparatus that receives a two-channel stereo source signal containing speech and plays it back over three channels including a center component, the playback apparatus comprising: frequency conversion means for converting the input source from the time domain to the frequency domain; means for extracting the common component of the left and right channels of the frequency-converted spectra; means for subtracting the extracted common component from the spectra of the left channel and the right channel; playback means for outputting the common component to the center channel; and means for multiplying the left and right channel components after the subtraction by a multiplier from 0 to 1 and outputting them to the left and right channels.
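The chain of means in the first configuration can be sketched end to end for a single frame as below. This is a minimal sketch under several assumptions: a naive DFT stands in for the frequency conversion, the common component is taken as the per-bin value with the smaller magnitude, framing and windowing are omitted, and the names `dft`, `idft`, `stereo_to_three`, and `side_gain` are hypothetical.

```python
import cmath


def dft(x):
    """Naive DFT of a real or complex sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spec):
    """Naive inverse DFT; returns the real part of each sample."""
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]


def stereo_to_three(right, left, side_gain=0.5):
    """Convert one stereo frame into (right_out, center_out, left_out)."""
    if not 0.0 <= side_gain <= 1.0:
        raise ValueError("side_gain must be between 0 and 1")
    xr, xl = dft(right), dft(left)                        # frequency conversion
    # Assumed rule: the common component is the smaller-magnitude bin value.
    c = [r if abs(r) <= abs(l) else l for r, l in zip(xr, xl)]
    xr_out = [side_gain * (r - ck) for r, ck in zip(xr, c)]  # subtract, then scale
    xl_out = [side_gain * (l - ck) for l, ck in zip(xl, c)]
    return idft(xr_out), idft(c), idft(xl_out)


# Identical channels: everything is common and is routed to the center.
frame = [1.0, 0.0, -1.0, 0.0]
right_out, center_out, left_out = stereo_to_three(frame, frame)
```

A multiplier of 1 leaves the left and right residuals at their original level, while a multiplier of 0 removes them so that only the center channel carries signal.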

(Second configuration)
Finally, each block of the audio signal converters 1, 1a, and 1b may be implemented in hardware logic, or may be implemented in software using a CPU as follows.

That is, each of the audio signal converters 1, 1a, and 1b includes a CPU (central processing unit) that executes the instructions of a control program implementing each function, a ROM (read only memory) that stores the program, a RAM (random access memory) into which the program is loaded, and a storage device (recording medium) such as a memory that stores the program and various data. The object of the present invention can also be achieved by supplying the audio signal converters 1, 1a, and 1b with a recording medium on which the program code (executable program, intermediate code program, or source program) of the control program of the audio signal converters 1, 1a, and 1b, which is software implementing the functions described above, is recorded in computer-readable form, and by having the computer (or a CPU or MPU) read and execute the program code recorded on the recording medium.

Examples of the recording medium include tape media such as magnetic tapes and cassette tapes; disk media, including magnetic disks such as floppy (registered trademark) disks and hard disks, and optical disks such as CD-ROM, MO, MD, DVD, and CD-R; card media such as IC cards (including memory cards) and optical cards; and semiconductor memory media such as mask ROM, EPROM, EEPROM, and flash ROM.

The audio signal converters 1, 1a, and 1b may also be configured to be connectable to a communication network, and the program code may be supplied via the communication network. The communication network is not particularly limited; for example, the Internet, an intranet, an extranet, a LAN, ISDN, a VAN, a CATV communication network, a virtual private network, a telephone network, a mobile communication network, or a satellite communication network can be used. The transmission medium constituting the communication network is likewise not particularly limited; for example, wired media such as IEEE 1394, USB, power line carrier, cable TV lines, telephone lines, and ADSL lines can be used, as can wireless media such as infrared (IrDA or a remote control), Bluetooth (registered trademark), 802.11 wireless, HDR, mobile phone networks, satellite links, and terrestrial digital networks. The present invention can also be realized in the form of a computer data signal embedded in a carrier wave, in which the program code is embodied by electronic transmission.

The audio signal converter according to the present invention can emphasize human voices, such as vocals and dialogue, in content being broadcast or played back, and is therefore well suited for use in television receivers and the like.

FIG. 1 is a block diagram showing the configuration of the audio signal converter 1 according to the present invention.

FIG. 2 illustrates the common component: (a) shows the common component of the right audio signal spectrum (R channel) and the left audio signal spectrum (L channel), and (b) shows the common component alone.

FIG. 3 shows the components remaining after the common component spectrum is removed from the right audio signal spectrum (R channel) and the left audio signal spectrum (L channel): (a) shows the left component spectrum XL'(k), and (b) shows the right component spectrum XR'(k).

FIG. 4 shows the right component output spectrum XR″(k) and the left component output spectrum XL″(k): (a) shows the right component output spectrum XR″(k) calculated by multiplying the right component spectrum shown in FIG. 3(a) by a predetermined multiplier, and (b) shows the left component output spectrum XL″(k) calculated by multiplying the left component spectrum shown in FIG. 3(b) by a predetermined multiplier.

FIG. 5 shows an example of the frequency characteristic of a parametric equalizer that emphasizes the human voice band with a peak at approximately 2 kHz.

FIG. 6 shows an example of the frequency characteristic of a parametric equalizer created on the basis of an equal-loudness curve, with its minimum at approximately 4 kHz.

FIG. 7 shows the equal-loudness curves measured by Robinson et al.

FIG. 8 is a block diagram showing the configuration of the audio signal converter 1a according to the present invention.

FIG. 9 is a block diagram showing the configuration of the audio signal converter 1b according to the present invention.

Explanation of symbols

1 Audio signal converter (audio signal converter)
2 Spectrum conversion unit
3 Common component extraction unit (common component extraction means)
4 Multiplication unit
4a Multiplication unit (left/right component reduction means, left/right component amplification means, component amplification means)
4b Multiplication unit (central audio output signal amplification means, central audio output signal reduction means)
4c Multiplication unit (left/right component reduction means, left/right component amplification means, component amplification means)
5 Inverse transform unit
5a Inverse transform unit (left/right audio output signal generating means, audio output signal generating means)
5b Inverse transform unit (central audio output signal generating means, audio output signal generating means)
5c Inverse transform unit (left/right audio output signal generating means, audio output signal generating means)
6 PEQ unit
6a PEQ unit (left/right level adjustment means)
6b PEQ unit (center level adjustment means)
6c PEQ unit (left/right level adjustment means)
7 Subtractor (subtraction means)
8 Subtractor (subtraction means)
9 Adder (audio output signal generating means)
10 Adder (audio output signal generating means)
12 Input terminal
13 Output terminal
14 Output terminal
21, 23 Delay unit (high-frequency signal generating means)
22, 24 Low-pass filter unit (low-frequency signal generating means)
25, 26 Adder (audio output signal generating means)
27, 28 Subtractor (high-frequency signal generating means)
31, 33 High-pass filter unit (high-frequency signal generating means)
32, 34 Low-pass filter unit (low-frequency signal generating means)
35, 36 Adder (audio output signal generating means)

Claims

An audio signal converter that converts a right audio signal corresponding to a right channel and a left audio signal corresponding to a left channel into a central audio output signal corresponding to a center channel, a right audio output signal corresponding to the right channel, and a left audio output signal corresponding to the left channel, the audio signal converter comprising:
high-frequency signal generating means for generating a right audio high-frequency signal of 100 Hz or higher, which is a high-frequency component of the right audio signal, and a left audio high-frequency signal of 100 Hz or higher, which is a high-frequency component of the left audio signal;
low-frequency signal generating means for generating a right audio low-frequency signal of less than 100 Hz, which is a low-frequency component of the right audio signal, and a left audio low-frequency signal of less than 100 Hz, which is a low-frequency component of the left audio signal;
common component extraction means for extracting a high-frequency common component contained in both the right audio high-frequency signal and the left audio high-frequency signal;
common signal generating means for generating the central audio output signal from the high-frequency common component; and
audio output signal generating means for subtracting the high-frequency common component from each of the right audio high-frequency signal and the left audio high-frequency signal, and for generating the right audio output signal by adding the right audio low-frequency signal to the right audio high-frequency signal after the subtraction and the left audio output signal by adding the left audio low-frequency signal to the left audio high-frequency signal after the subtraction.

The audio signal converter according to claim 1, wherein
the low-frequency signal generating means low-pass filters the right audio signal and the left audio signal to generate the right audio low-frequency signal and the left audio low-frequency signal, and
the high-frequency signal generating means delays the right audio signal and the left audio signal by the same delay amount as that of the low-frequency signal generating means, subtracts the right audio low-frequency signal from the delayed right audio signal, and subtracts the left audio low-frequency signal from the delayed left audio signal, thereby generating the right audio high-frequency signal and the left audio high-frequency signal.

The audio signal converter according to claim 1, wherein
the high-frequency signal generating means high-pass filters the right audio signal and the left audio signal to generate the right audio high-frequency signal and the left audio high-frequency signal, and
the low-frequency signal generating means delays the right audio signal and the left audio signal by the same delay amount as that of the high-frequency signal generating means, subtracts the right audio high-frequency signal from the delayed right audio signal, and subtracts the left audio high-frequency signal from the delayed left audio signal, thereby generating the right audio low-frequency signal and the left audio low-frequency signal.

The audio signal converter according to claim 1, wherein
the low-frequency signal generating means low-pass filters the right audio signal and the left audio signal to generate the right audio low-frequency signal and the left audio low-frequency signal, and
the high-frequency signal generating means high-pass filters the right audio signal and the left audio signal to generate the right audio high-frequency signal and the left audio high-frequency signal.

5. The audio signal conversion apparatus according to claim 4, wherein a delay amount of the low-frequency signal generation unit is equal to a delay amount of the high-frequency signal generation unit.

An audio signal conversion method for converting a right audio signal corresponding to a right channel and a left audio signal corresponding to a left channel into a central audio output signal corresponding to a center channel, a right audio output signal corresponding to the right channel, and a left audio output signal corresponding to the left channel, the method comprising:
a high-frequency signal generation step of generating a right audio high-frequency signal of 100 Hz or higher, which is a high-frequency component of the right audio signal, and a left audio high-frequency signal of 100 Hz or higher, which is a high-frequency component of the left audio signal;
a low-frequency signal generation step of generating a right audio low-frequency signal of less than 100 Hz, which is a low-frequency component of the right audio signal, and a left audio low-frequency signal of less than 100 Hz, which is a low-frequency component of the left audio signal;
a common component extraction step of extracting a high-frequency common component contained in both the right audio high-frequency signal and the left audio high-frequency signal;
a common signal generation step of generating the central audio output signal from the high-frequency common component; and
an audio output signal generation step of subtracting the high-frequency common component from each of the right audio high-frequency signal and the left audio high-frequency signal, and generating the right audio output signal by adding the right audio low-frequency signal to the right audio high-frequency signal after the subtraction and the left audio output signal by adding the left audio low-frequency signal to the left audio high-frequency signal after the subtraction.

A control program for operating the audio signal converter according to any one of claims 1 to 5, wherein the control program causes a computer to function as each of the above means.

A computer-readable recording medium in which the control program according to claim 7 is recorded.