JP2011211312A - Sound image localization processing apparatus and sound image localization processing method - Google Patents

Sound image localization processing apparatus and sound image localization processing method

Info

Publication number
JP2011211312A
JP2011211312A (Application JP2010074669A)
Authority
JP
Japan
Prior art keywords
sound image
virtual sound
centroid
virtual
input signal
Prior art date
Legal status
Pending
Application number
JP2010074669A
Other languages
Japanese (ja)
Inventor
Junji Araki
Current Assignee
Panasonic Corp
Original Assignee
Panasonic Corp
Priority date
Filing date
Publication date
Application filed by Panasonic Corp
Priority to JP2010074669A
Publication of JP2011211312A

Landscapes

  • Stereophonic System (AREA)

Abstract

PROBLEM TO BE SOLVED: In a sound image localization processing system that achieves virtual sound image localization using one or more pairs of loudspeakers, when a virtual sound image is generated using panning or head-related transfer functions, the localization can be pulled toward the positions of the playback loudspeakers depending on the input signals being reproduced, weakening the sense of localization of the virtual sound image, so that a full sense of virtual sound image localization cannot be obtained.

SOLUTION: A centroid calculating section 2 calculates the azimuth and magnitude of the centroid of the sound field from the multi-channel input signals; a weight coefficient determination section 3 determines a weight coefficient corresponding to the calculated azimuth and magnitude of the centroid; and a virtual sound image generation processing section 4 performs virtual sound image generation processing based on the determined weight coefficient. The sense of localization of the virtual sound images is thereby enhanced adaptively to the input signals, producing a sensation of being surrounded by sound.

Description

The present invention relates to a sound image localization processing technique using a plurality of speakers, and more particularly to a sound image localization processing technique having a function of realizing virtual sound image localization at a desired position using panning and head-related transfer functions (HRTFs).

Among virtual sound image localization techniques, there is a method of realizing front and rear virtual sound image localization using panning and head-related transfer functions. In this method, a virtual sound image is generated as follows.

First, to prepare a head-related transfer function, a speaker is placed at the position where the virtual sound image is to be localized, and the head-related transfer function from this speaker to the entrance of the listener's ear canal is measured. This measurement becomes the head-related transfer function filter. The speaker placed at the desired virtual sound image position is used only for measuring the head-related transfer function and is not present during playback; only the plurality of speakers that reproduce the input signals are used for playback.

Next, to localize a virtual sound image between two adjacent playback speakers among the plurality of speakers, the signals reproduced from those two speakers are panned so that the virtual sound image is localized between them.

Finally, to further improve the localization accuracy of the virtual sound image, the head-related transfer function filter measured with the speaker placed at the desired position is convolved with the input signal before playback, realizing localization of the virtual sound image. Patent Document 1 describes a method of realizing virtual sound image localization in this way (see Patent Document 1).

JP 2002-135899 A

However, even when a plurality of virtual sound images are generated using panning and head-related transfer function filters as in the above method, the localization accuracy of a virtual sound image is still insufficient compared with placing a real speaker at the virtual sound image position, and as a result the same strong sense of envelopment obtained by placing a plurality of real speakers cannot be achieved.

To solve the above conventional problem, the sound image localization processing apparatus of the present invention is a sound reproduction apparatus that uses a plurality of speakers installed around a listening position to reproduce an input signal for creating a virtual sound image and localize it at a virtual position, comprising: virtual sound image generation processing means for processing the input signal for creating the virtual sound image and generating signals to be output to the two speakers flanking the virtual position; centroid calculating means for calculating the centroid of the multi-channel input signals, which include the input signal for creating the virtual sound image; and weight coefficient determination means for determining a weight coefficient according to the position and magnitude of the centroid calculated by the centroid calculating means, wherein the virtual sound image generation processing means multiplies the input signal for creating the virtual sound image by the weight coefficient determined by the weight coefficient determination means.

Further, the centroid calculating means converts each of the multi-channel input signals into a vector whose direction, with the listening position as the origin of coordinates, points toward the speaker position corresponding to that input signal, and whose magnitude is the average level of that input signal over each predetermined time interval, and combines these vectors to calculate the centroid.

Further, the weight coefficient determination means divides the region in which the centroid can exist into a plurality of regions based on azimuth and magnitude, and determines the weight coefficient according to which of these regions the azimuth and magnitude of the centroid calculated by the centroid calculating means belong to.

Further, the weight coefficient determination means calculates an azimuth weight coefficient according to which azimuth region the azimuth of the calculated centroid belongs to, calculates a magnitude weight coefficient according to which magnitude region the magnitude of the calculated centroid belongs to, and determines the product of the azimuth weight coefficient and the magnitude weight coefficient as the weight coefficient.

Further, the weight coefficient determination means calculates the azimuth weight coefficients so that azimuth regions closer to the azimuth of the centroid receive larger values, and calculates the magnitude weight coefficient so that it increases as the magnitude of the centroid increases.

Further, the sound image localization processing method of the present invention is a sound reproduction method that uses a plurality of speakers installed around a listening position to reproduce an input signal for creating a virtual sound image and localize it at a virtual position, comprising: a virtual sound image generation processing step of processing the input signal for creating the virtual sound image and generating signals to be output to the two speakers flanking the virtual position; a centroid calculating step of calculating the centroid of the multi-channel input signals, which include the input signal for creating the virtual sound image; and a weight coefficient determination step of determining a weight coefficient according to the position and magnitude of the centroid calculated in the centroid calculating step, wherein in the virtual sound image generation processing step, the input signal for creating the virtual sound image is multiplied by the weight coefficient determined in the weight coefficient determination step.

The program of the present invention causes a computer to execute each step of the above sound image localization processing method.

The recording medium of the present invention stores the above program.

According to the present invention, in a sound reproduction apparatus that localizes sound reproduced from two or more speakers installed around the listener at virtual positions, the centroid of the multi-channel input signals is calculated, and the input signals are reproduced with weight coefficients, determined according to the centroid position, reflected in the virtual sound image generation processing. This further emphasizes the localization effect of the virtual sound images and improves the sense of envelopment of the sound field.

Fig. 1: Block diagram of the sound image localization processing apparatus in Embodiment 1 of the present invention
Fig. 2: Block diagram showing the configuration of the virtual sound image generation processing unit in Embodiment 1
Fig. 3: Diagram showing the centroid position calculated by vector decomposition of the input signals in Embodiment 1
Fig. 4: Diagram showing the weighting regions for the azimuth of the centroid G in Embodiment 1
Fig. 5: Diagram showing the weighting regions for the magnitude of the centroid G in Embodiment 1

(Embodiment 1)
Embodiments of the present invention will be described below with reference to the drawings.

Fig. 1 is a block diagram for explaining the sound image localization processing of this embodiment, in which virtual sound image generation processing is applied to a multi-channel input signal such as a 5.1-channel signal, and virtual sound images 10 to 15 are localized using the playback speakers: a front L channel (FL) speaker 5, a center channel (C) speaker 6, a front R channel (FR) speaker 7, a surround L channel (SL) speaker 8, and a surround R channel (SR) speaker 9.

In Fig. 1, the multi-channel input signal enters at input terminal 1. The centroid calculation unit 2 calculates the centroid of the input signals. The weight coefficient determination unit 3 determines weight coefficients based on the centroid calculated by the centroid calculation unit 2. The virtual sound image generation processing unit 4 performs virtual sound image generation processing on the multi-channel input signal based on the weight coefficients determined by the weight coefficient determination unit 3, and generates the signals output to the playback speakers 5 to 9. With this configuration, in addition to the FL speaker 5, C speaker 6, FR speaker 7, SL speaker 8, and SR speaker 9, the listener 16 perceives reproduced sound as coming from the positions of the virtual sound images 10 to 15.

Here, virtual sound image 10 is localized by processing the front L channel (FL) signal with panning and a head-related transfer function filter and outputting it to the FL speaker 5 and the SL speaker 8. Virtual sound image 11 is localized by processing the surround L channel (SL) signal with panning and a head-related transfer function filter and outputting it to the FL speaker 5 and the SL speaker 8. Virtual sound image 12 is localized by processing the SL signal with panning and a head-related transfer function filter and outputting it to the SL speaker 8 and the SR speaker 9. The R side is handled in the same way.

The sound image localization processing apparatus configured as described above is explained below.

First, the virtual sound image generation processing unit 4 is described. Fig. 2 shows an example of its configuration. In Fig. 2, 211 to 216 are coefficient units that multiply the signals for creating the virtual sound images by weight coefficients K11 to K16, 221 to 232 are head-related transfer function filters with characteristics EQ21 to EQ32, 241 to 252 are coefficient units that multiply the outputs of the head-related transfer function filters 221 to 232 by panning coefficients K41 to K52, and 261 to 264 are adders.

Virtual sound image 10 is produced by multiplying the FL signal by weight coefficient K11 in coefficient unit 211 and outputting it to the FL speaker 5 through head-related transfer function filter 221 and coefficient unit 241, and to the SL speaker 8 through head-related transfer function filter 223 and coefficient unit 243. Virtual sound image 11 is produced by multiplying the SL signal by weight coefficient K12 in coefficient unit 212 and outputting it to the FL speaker 5 through head-related transfer function filter 222 and coefficient unit 242, and to the SL speaker 8 through head-related transfer function filter 224 and coefficient unit 244. Virtual sound image 12 is produced by multiplying the SL signal by weight coefficient K13 in coefficient unit 213 and outputting it to the SL speaker 8 through head-related transfer function filter 225 and coefficient unit 245, and to the SR speaker 9 through head-related transfer function filter 227 and coefficient unit 247. The R side is handled in the same way.
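The signal path just described (weight coefficient, then head-related transfer function filter, then panning coefficient, split to the two flanking speakers) can be sketched as follows. The direct-form convolution and all numeric values are illustrative assumptions; the patent does not specify filter lengths or an implementation.

```python
def convolve(x, h):
    """Direct FIR convolution, standing in for the head-related
    transfer function filters 221-232."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

def render_virtual_image(signal, k, eq_a, k_a, eq_b, k_b):
    """One virtual image's path, e.g. image 10: the input is scaled by
    weight K (coefficient unit 211), filtered by EQ21/EQ23 (filters
    221/223), scaled by panning gains K41/K43 (units 241/243), and the
    two results are the contributions to the two flanking speakers
    (FL speaker 5 and SL speaker 8)."""
    weighted = [k * s for s in signal]
    out_a = [g * k_a for g in convolve(weighted, eq_a)]
    out_b = [g * k_b for g in convolve(weighted, eq_b)]
    return out_a, out_b
```

The per-speaker outputs would then be summed across all virtual images and the direct channel feeds by the adders 261 to 264.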

The panning coefficients are set as follows. When two speakers A and B are used to generate a virtual sound image C at a position between them, with a the angle between speaker A and virtual sound image C and b the angle between speaker B and virtual sound image C, the ratio of the signal level PA output to speaker A to the signal level PB output to speaker B is as given by (Equation 1).

For example, the coefficients K41 and K43 of coefficient units 241 and 243, which pan virtual sound image 10, are set so that their ratio satisfies (Equation 1) and the total power is unchanged, i.e., their mean square is 1. The same applies to the other virtual sound images.
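As a minimal sketch: (Equation 1) itself is not reproduced in this text, so the sine-law ratio PA : PB = sin(b) : sin(a), a common amplitude-panning choice, is assumed below purely for illustration; the power normalization PA² + PB² = 1 is the constraint stated above.

```python
import math

def panning_gains(a_deg, b_deg):
    """Panning gains PA (to speaker A) and PB (to speaker B) for a
    virtual sound image C lying between the two speakers.

    a_deg: angle between speaker A and the virtual image C
    b_deg: angle between speaker B and the virtual image C

    A sine-law ratio PA : PB = sin(b) : sin(a) is ASSUMED here in place
    of the unreproduced (Equation 1); the gains are then scaled so that
    PA**2 + PB**2 = 1, keeping total power constant as required."""
    pa = math.sin(math.radians(b_deg))  # image close to A -> large PA
    pb = math.sin(math.radians(a_deg))
    norm = math.hypot(pa, pb)
    return pa / norm, pb / norm
```

With this choice, an image exactly halfway between the speakers (a = b) gets equal gains of 1/√2 on each side.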

Next, the characteristics of the head-related transfer function filters are described. To generate virtual sound image C, the head-related transfer function filter for the signal output to speaker A is the head-related transfer characteristic from virtual sound image C to the listener divided by the head-related transfer characteristic from speaker A to the listener, and the filter for the signal output to speaker B is the head-related transfer characteristic from virtual sound image C to the listener divided by the head-related transfer characteristic from speaker B to the listener.

For example, the characteristic EQ21 of head-related transfer function filter 221, used to generate virtual sound image 10, is the head-related transfer characteristic from virtual sound image 10 to the listener 16 divided by the head-related transfer characteristic from the FL speaker 5 to the listener 16, and the characteristic EQ23 of head-related transfer function filter 223 is the head-related transfer characteristic from virtual sound image 10 to the listener 16 divided by the head-related transfer characteristic from the SL speaker 8 to the listener 16. The same applies to the other virtual sound images.
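The division that defines EQ21 and EQ23 can be sketched per frequency bin. The complex spectra and the small regularization floor `eps` (added here only, to avoid dividing by near-zero bins) are assumptions of this sketch, not part of the patent.

```python
def hrtf_filter_spectrum(h_virtual, h_speaker, eps=1e-9):
    """Frequency response of one head-related transfer function filter,
    e.g. EQ21: the transfer characteristic from the virtual image to
    the listener divided, bin by bin, by the characteristic from the
    real playback speaker to the listener.

    h_virtual, h_speaker: complex spectra of equal length.
    eps: small floor (an assumption of this sketch) guarding against
    division by near-zero bins."""
    out = []
    for hv, hs in zip(h_virtual, h_speaker):
        if abs(hs) < eps:
            hs = eps
        out.append(hv / hs)
    return out
```

An inverse FFT of the resulting spectrum would give the impulse response to convolve with the input signal.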

Note that the head-related transfer function filters 221 to 232 may additionally have delay characteristics in order to separate the virtual sound images from the signals they are derived from.

If the weight coefficients K11 to K16 are all set to 1, the head-related transfer function filters 221 to 232 and the coefficient units 241 to 252 can localize the virtual sound images 10 to 15 to some extent, but virtual sound images localized in this way have a weaker sense of localization than real sound images produced by reproducing the original signals through real speakers, so a sufficient sense of envelopment cannot be obtained.

Therefore, in this embodiment, the azimuth and magnitude of the centroid G obtained by combining the input signals of the channels are detected in real time, the signals for creating virtual sound images closer to the azimuth of the centroid G are weighted more strongly, and the degree of enhancement is increased as the magnitude of the centroid G increases, thereby strengthening the sense of localization of the virtual sound images.

To perform this weighting, the centroid calculation unit 2 first calculates the centroid G of the input signals; based on that result, the weight coefficient determination unit 3 determines the weight coefficients, which are then set as the coefficients K11 to K16 of the coefficient units 211 to 216.

Next, the centroid calculation unit 2 is described. Fig. 3 shows how the input signal of each channel is decomposed into vectors and the centroid G is calculated. For the multi-channel signal entering at input terminal 1, the centroid calculation unit 2 decomposes the average level of each channel's input signal over a predetermined time interval into x-axis and y-axis components, in a vector coordinate system whose origin is the position of the listener 16. Denoting the x- and y-axis components of the FL signal (expressed as average levels) by FLx and FLy, those of the C signal by Cx and Cy, those of the FR signal by FRx and FRy, those of the SL signal by SLx and SLy, and those of the SR signal by SRx and SRy, the centroid G can be calculated by (Equation 2). The multiplication by (-1) replaces the centroid calculated from the input signals with the centroid relative to the listener; each x- and y-axis component is a scalar value whose direction is expressed by its sign.

Here, Gx and Gy are the x- and y-axis components of the centroid G, and |G| corresponds to the distance from the listener to the centroid G in Fig. 3 and represents the average level of the combined input signals of all channels. That is, the position obtained by vector composition of Gx and Gy relative to the position of the listener 16 is the position of the centroid G, and its magnitude is |G|.
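A minimal sketch of the centroid computation follows. The speaker azimuths are an assumed ITU-R-style 5.1 layout (the patent leaves the exact placement to Fig. 1), and the axis convention is likewise an assumption of this sketch; the vector sum and the final multiplication by (-1) follow the description of (Equation 2).

```python
import math

# ASSUMED speaker azimuths in degrees (0 = straight ahead, positive =
# counter-clockwise); ITU-R BS.775-style 5.1 placement, illustration only.
SPEAKER_AZIMUTH = {"FL": 30.0, "C": 0.0, "FR": -30.0, "SL": 110.0, "SR": -110.0}

def centroid(avg_levels):
    """Centroid G of the multi-channel input, following the description:
    each channel becomes a vector pointing toward its speaker whose
    length is that channel's average level over the analysis interval;
    the vectors are summed and multiplied by (-1), per (Equation 2), to
    express the centroid relative to the listener.

    avg_levels: dict mapping channel name to average level.
    Returns (Gx, Gy, |G|); x points to the listener's right, y to the
    front (an assumed convention)."""
    gx = gy = 0.0
    for ch, level in avg_levels.items():
        az = math.radians(SPEAKER_AZIMUTH[ch])
        gx += level * -math.sin(az)
        gy += level * math.cos(az)
    gx, gy = -gx, -gy  # the (-1) factor from (Equation 2)
    return gx, gy, math.hypot(gx, gy)
```

For a left/right-symmetric input (equal FL and FR levels, nothing else) the centroid lies on the front-back axis, as expected.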

Next, the weight coefficient determination unit 3 is described. The weight coefficient is determined according to the azimuth θ and the magnitude |G| of the centroid G.

Fig. 4 shows the weighting regions for the azimuth of the centroid G. First, the weight coefficient N for the azimuth θ of the centroid G is explained. Let θ1 be the azimuth region (e.g., ±30°) centered on the azimuth θ of the centroid G, θ2p the adjacent region counterclockwise (e.g., 60°), θ2m the adjacent region clockwise (e.g., 60°), and θ3p, θ3m, ... the regions adjacent to those (e.g., 60° each). The azimuth weight coefficients for generating virtual sound images belonging to the regions θ1, (θ2p, θ2m), (θ3p, θ3m), ... are then set to N1, N2, N3, ... respectively, with N1 > N2 > N3 > ... > 0 (for example, N1 = 1.0, N2 = 0.8, N3 = 0.6, ...).

That is, in Fig. 3, the azimuth weight coefficient is N1 for generating virtual sound images 10 and 11, N2 for virtual sound image 12, and N3 for virtual sound images 13 and 15, while virtual sound image 14 receives no weighting.
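A sketch of the azimuth-region lookup, using the example widths (±30°, then 60° bands) and the example values N1 = 1.0, N2 = 0.8, N3 = 0.6 given above; returning `None` for the "no weighting" case (virtual sound image 14) is a representation chosen for this sketch.

```python
def azimuth_weight(image_az_deg, centroid_az_deg, weights=(1.0, 0.8, 0.6)):
    """Azimuth weight N per Fig. 4.

    The +/-30 deg region theta1 centred on the centroid azimuth gets N1,
    the adjacent 60 deg regions (theta2p, theta2m) get N2, the next pair
    (theta3p, theta3m) gets N3; images outside all listed regions get no
    weighting, represented here as None.  Region widths and the default
    weights are the example values from the text."""
    # signed angular difference wrapped to [-180, 180), then made absolute
    d = abs((image_az_deg - centroid_az_deg + 180.0) % 360.0 - 180.0)
    for w, outer_edge in zip(weights, (30.0, 90.0, 150.0)):
        if d <= outer_edge:
            return w
    return None  # e.g. virtual sound image 14 in Fig. 3
```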

Next, the weight coefficient D for the magnitude |G| of the centroid G is described with reference to Fig. 5.

Consider regions delimited by concentric circles of radii d1, d2, d3, ... (0 < d1 < d2 < d3 < ...) centered on the listener's position. The weight coefficient D for the magnitude |G| of the centroid G is determined according to which region |G| falls in, as follows.

(For example, D1 = 0.1, D2 = 0.5, D3 = 1.0, ...)
In the case of Fig. 5, the weight coefficient D for the magnitude |G| of the centroid G is D2.
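A sketch of the magnitude-region lookup. The example values D1 = 0.1, D2 = 0.5, D3 = 1.0 are from the description; the radii d1, d2, d3 and the behavior beyond the outermost circle are illustrative assumptions.

```python
def magnitude_weight(g_mag, radii=(0.33, 0.66, 1.0), weights=(0.1, 0.5, 1.0)):
    """Magnitude weight D per Fig. 5: concentric circles of radii
    d1 < d2 < d3 centred on the listener partition the plane, and the
    ring that |G| falls in selects D1, D2 or D3.  The default weights
    are the example values from the text; the radii and the clamp to
    the outermost weight beyond d3 are assumptions of this sketch."""
    for r, w in zip(radii, weights):
        if g_mag <= r:
            return w
    return weights[-1]
```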

The product N·D of the azimuth weight coefficient N for the azimuth θ of the centroid G and the magnitude weight coefficient D for the magnitude |G| of the centroid G, determined as described above, is used as the weight coefficient K. This weight coefficient K is set into the coefficients K11 to K16 of the coefficient units 211 to 216 of the virtual sound image generation processing unit 4.
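Combining the two lookups, the product K = N·D and its multiplication into a virtual-image input signal (coefficient units 211 to 216) can be sketched as:

```python
def apply_weight(samples, n, d):
    """Scale one virtual-image input signal by K = N * D.  `n` may be
    None for the 'no weighting' case, in which case the signal passes
    through unchanged (an interpretation assumed for this sketch)."""
    k = 1.0 if n is None else n * d
    return [k * s for s in samples]
```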

By proceeding as above, the centroid G of the input signals of the channels is calculated in real time, the weight coefficient K is calculated from the centroid G, and the signals for creating the virtual sound images are multiplied by this weight coefficient. The weight coefficients thus adapt to the centroid as it changes with the input signals, and as a result a strong sense of envelopment is obtained in which the localization of the virtual sound images is emphasized according to the centroid position.

In the virtual sound image generation processing unit shown in Fig. 2, the weight coefficients are set into the coefficients K11 to K16 of the coefficient units 211 to 216; however, the coefficient units 211 to 216 can be omitted if the weight coefficients are instead multiplied into the coefficients K41 to K52 of the coefficient units 241 to 252.

Also, the virtual sound image generation processing unit shown in Fig. 2 uses both panning and head-related transfer function filters to generate the virtual sound images, but either one alone may be used.

In the above description, when the weight coefficient determination unit determines the azimuth weight coefficients, the azimuth regions θ1, θ2p, θ2m, θ3p, θ3m, ... are set relative to the azimuth θ of the centroid G. Instead, a plurality of fixed azimuth regions may be set in advance by dividing the full circle at predetermined angular intervals (e.g., 30°), with the region containing the azimuth θ of the centroid G taken as θ1, the regions on either side of it as θ2p and θ2m, and the regions adjacent to those as θ3p, θ3m, ....

INDUSTRIAL APPLICABILITY: The present invention is useful for equipment that can reproduce music signals and drives one or more pairs of speakers, such as surround systems, TVs, AV amplifiers, component stereos, mobile phones, and portable audio devices.

1 Input terminal
2 Centroid calculation unit
3 Weight coefficient determination unit
4 Virtual sound image generation processing unit
5 FL speaker
6 C speaker
7 FR speaker
8 SL speaker
9 SR speaker
10-15 Virtual sound images
16 Listener
211-216 Coefficient units
221-232 Head-related transfer function filters
241-252 Coefficient units
261-264 Adders

Claims (8)

A sound image localization processing apparatus which is a sound reproduction apparatus that uses a plurality of speakers installed around a listening position to reproduce an input signal for creating a virtual sound image and localize it at a virtual position, comprising:
virtual sound image generation processing means for processing the input signal for creating the virtual sound image and generating signals to be output to the two speakers flanking the virtual position;
centroid calculating means for calculating the centroid of the multi-channel input signals, which include the input signal for creating the virtual sound image; and
weight coefficient determination means for determining a weight coefficient according to the position and magnitude of the centroid calculated by the centroid calculating means,
wherein the virtual sound image generation processing means multiplies the input signal for creating the virtual sound image by the weight coefficient determined by the weight coefficient determination means.
前記重心算出手段は、前記マルチチャンネルのそれぞれの入力信号を、前記受聴位置を座標の中心とした前記それぞれの入力信号に対応したスピーカの位置へ向かう方位と前記それぞれの入力信号の所定時間間隔毎の平均レベルの大きさとを有するベクトルに変換し、これらのベクトルを合成して前記重心を算出することを特徴とする請求項1記載の音像定位処理装置。 The center-of-gravity calculation means calculates the input signals of the multi-channels at an orientation toward the speaker position corresponding to the respective input signals with the listening position as the center of coordinates and at predetermined time intervals of the respective input signals. The sound image localization processing apparatus according to claim 1, wherein the center of gravity is calculated by converting the vectors into a vector having a magnitude of the average level and combining the vectors. 前記重み係数決定手段は、重心が存在しうる領域を方位と大きさに基づいて複数の領域に分割し、前記重心算出手段により算出した重心の方位と大きさがそれぞれ前記複数の領域のうちのどの領域に属するかによって前記重み係数を決定することを特徴とする請求項1記載の音像定位処理装置。 The weighting factor determining unit divides a region where a centroid can exist into a plurality of regions based on an azimuth and a size, and the azimuth and size of the centroid calculated by the centroid calculating unit are respectively in the plurality of regions. The sound image localization processing apparatus according to claim 1, wherein the weighting coefficient is determined depending on which region it belongs to. 前記重み係数決定手段は、前記重心算出手段により算出した重心の方位がどの方位の領域に属するかによって方位の重み係数を算出するとともに、前記重心算出手段により算出した重心の大きさがどの大きさの領域に属するかによって大きさの重み係数を算出し、前記方位の重み係数と前記大きさの重み係数とを乗算したものを前記重み係数として決定することを特徴とする請求項3記載の音像定位処理装置。 The weighting factor determination unit calculates a weighting factor of the azimuth depending on which azimuth region the centroid direction calculated by the centroid calculation unit belongs to, and the magnitude of the centroid calculated by the centroid calculation unit 4. 
A sound image according to claim 3, wherein a weighting coefficient of a size is calculated depending on whether the weight belongs to a region, and the weighting coefficient obtained by multiplying the weighting coefficient of the direction and the weighting coefficient of the size is determined as the weighting coefficient. Stereotaxic equipment. 前記重み係数決定手段は、前記重心の方位に近い方位領域に属する方位の重み関数ほど大きくなるように算出し、かつ、前記重心の大きさが大きいほど前記大きさの重み係数が大きくなるように算出することを特徴とする請求項4記載の音像定位処理装置。 The weighting factor determination means calculates the weighting function of the azimuth belonging to the azimuth region close to the azimuth of the centroid so that the weighting factor of the magnitude increases as the size of the centroid increases. The sound image localization processing apparatus according to claim 4, wherein the sound image localization processing apparatus is calculated. 受聴位置の周囲に設置した複数のスピーカを用いて、仮想音像を作るための入力信号を再生して仮想位置に定位させる音響再生方法であって、
前記仮想音像を作るための入力信号を処理して、前記仮想位置を挟む2つのスピーカに出力する信号を生成する仮想音像生成処理ステップと、
前記仮想音像を作るための入力信号を含むマルチチャンネルの入力信号の重心を算出する重心算出ステップと、
前記重心算出ステップにおいて算出された重心の位置と大きさに応じて重み係数を決定する重み係数決定ステップとを有し、
前記仮想音像生成処理ステップにおいて、前記重み係数決定ステップで決定された重み係数を前記仮想音像を作るための入力信号に掛けることを特徴とする音像定位処理方法。
An acoustic reproduction method for reproducing an input signal for creating a virtual sound image by using a plurality of speakers installed around a listening position and localizing to a virtual position,
A virtual sound image generation processing step of processing an input signal for creating the virtual sound image and generating a signal to be output to two speakers sandwiching the virtual position;
A centroid calculating step of calculating a centroid of a multi-channel input signal including an input signal for creating the virtual sound image;
A weighting factor determination step for determining a weighting factor according to the position and size of the center of gravity calculated in the center of gravity calculation step;
A sound image localization processing method characterized in that, in the virtual sound image generation processing step, an input signal for creating the virtual sound image is multiplied by the weighting coefficient determined in the weighting coefficient determination step.
請求項6記載の音像定位処理方法の各ステップをコンピュータに実行させるためのプログラム。 A program for causing a computer to execute each step of the sound image localization processing method according to claim 6. 請求項7記載のプログラムを格納した記録媒体。 A recording medium storing the program according to claim 7.
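The centroid calculation and weighting described in claims 2 through 5 can be sketched as follows. This is an illustrative Python sketch only, not the patent's implementation: the speaker azimuths follow a common 5-channel layout, and the region boundaries and weight values are assumed for the example.

```python
import math

# Assumed speaker layout (azimuth in degrees, 0 = front centre): FL, C, FR, SL, SR
SPEAKER_AZIMUTHS = {"FL": -30.0, "C": 0.0, "FR": 30.0, "SL": -110.0, "SR": 110.0}

def centroid(avg_levels):
    """Claim 2: convert each channel's average level over a time interval into a
    vector pointing from the listening position toward its speaker, then sum the
    vectors. Returns (azimuth_deg, magnitude) of the resulting centroid G."""
    x = sum(lvl * math.sin(math.radians(SPEAKER_AZIMUTHS[ch]))
            for ch, lvl in avg_levels.items())
    y = sum(lvl * math.cos(math.radians(SPEAKER_AZIMUTHS[ch]))
            for ch, lvl in avg_levels.items())
    return math.degrees(math.atan2(x, y)), math.hypot(x, y)

def weight(virtual_azimuth, g_azimuth, g_magnitude):
    """Claims 3-5: the azimuth weight grows as the virtual image's azimuth gets
    closer to the centroid azimuth, and the magnitude weight grows with the
    centroid's magnitude. Region widths and weight values here are illustrative."""
    diff = abs((virtual_azimuth - g_azimuth + 180.0) % 360.0 - 180.0)
    w_azimuth = 1.2 if diff < 30 else 1.1 if diff < 90 else 1.0  # theta1 / theta2 / outer
    w_magnitude = 1.0 + min(g_magnitude, 1.0) * 0.5              # larger G -> larger weight
    return w_azimuth * w_magnitude

# Claim 1: scale the virtual-image input signal by the combined weight
levels = {"FL": 0.8, "C": 0.2, "FR": 0.8, "SL": 0.1, "SR": 0.1}
az, mag = centroid(levels)
scaled = [s * weight(0.0, az, mag) for s in [0.1, -0.2, 0.3]]  # one sample frame
```

With the front-heavy levels above, the centroid points straight ahead (azimuth 0°), so a virtual image placed at the front receives the largest weight, strengthening its localization as the claims describe.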
JP2010074669A 2010-03-29 2010-03-29 Sound image localization processing apparatus and sound image localization processing method Pending JP2011211312A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2010074669A JP2011211312A (en) 2010-03-29 2010-03-29 Sound image localization processing apparatus and sound image localization processing method


Publications (1)

Publication Number Publication Date
JP2011211312A true JP2011211312A (en) 2011-10-20

Family

ID=44941953

Family Applications (1)

Application Number Title Priority Date Filing Date
JP2010074669A Pending JP2011211312A (en) 2010-03-29 2010-03-29 Sound image localization processing apparatus and sound image localization processing method

Country Status (1)

Country Link
JP (1) JP2011211312A (en)

Cited By (27)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9563278B2 (en) 2011-12-19 2017-02-07 Qualcomm Incorporated Gesture controlled audio user interface
JP2015506035A (en) * 2011-12-19 2015-02-26 Qualcomm Incorporated Gesture controlled audio user interface
JP2017201815A (en) * 2012-12-04 2017-11-09 Samsung Electronics Co., Ltd. Audio providing apparatus and audio providing method
KR102149046B1 (en) * 2013-07-05 2020-08-28 Electronics and Telecommunications Research Institute Virtual sound image localization in two and three dimensional space
KR20150005477A (en) * 2013-07-05 2015-01-14 Electronics and Telecommunications Research Institute Virtual sound image localization in two and three dimensional space
CN107968985A (en) * 2013-07-05 2018-04-27 Electronics and Telecommunications Research Institute Virtual sound image localization method in two dimensional and three dimensional space
WO2015002517A1 (en) * 2013-07-05 2015-01-08 Electronics and Telecommunications Research Institute Virtual sound image localization method for two dimensional and three dimensional spaces
CN104982040A (en) * 2013-07-05 2015-10-14 韩国电子通信研究院 Virtual sound image localization method for two dimensional and three dimensional spaces
CN107968985B (en) * 2013-07-05 2020-03-10 韩国电子通信研究院 Virtual sound image localization method in two-dimensional and three-dimensional space
US10694308B2 (en) 2013-10-23 2020-06-23 Dolby Laboratories Licensing Corporation Method for and apparatus for decoding/rendering an ambisonics audio soundfield representation for audio playback using 2D setups
US11750996B2 (en) 2013-10-23 2023-09-05 Dolby Laboratories Licensing Corporation Method for and apparatus for decoding/rendering an Ambisonics audio soundfield representation for audio playback using 2D setups
US11451918B2 (en) 2013-10-23 2022-09-20 Dolby Laboratories Licensing Corporation Method for and apparatus for decoding/rendering an Ambisonics audio soundfield representation for audio playback using 2D setups
US11770667B2 (en) 2013-10-23 2023-09-26 Dolby Laboratories Licensing Corporation Method for and apparatus for decoding/rendering an ambisonics audio soundfield representation for audio playback using 2D setups
CN108777836A (en) * 2013-10-23 2018-11-09 杜比国际公司 The determination method and apparatus of decoding matrix for audio signal decoding
CN108777837A (en) * 2013-10-23 2018-11-09 杜比国际公司 Method and apparatus for audio signal decoding
US10986455B2 (en) 2013-10-23 2021-04-20 Dolby Laboratories Licensing Corporation Method for and apparatus for decoding/rendering an ambisonics audio soundfield representation for audio playback using 2D setups
US10097945B2 (en) 2014-01-07 2018-10-09 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and method for generating a plurality of audio channels
US10595153B2 (en) 2014-01-07 2020-03-17 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and method for generating a plurality of audio channels
US11785414B2 (en) 2014-01-07 2023-10-10 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E. V. Apparatus and method for generating a plurality of audio channels
US10904693B2 (en) 2014-01-07 2021-01-26 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and method for generating a plurality of audio channels
US11438723B2 (en) 2014-01-07 2022-09-06 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and method for generating a plurality of audio channels
JP2017507621A (en) * 2014-01-07 2017-03-16 Fraunhofer-Gesellschaft zur Foerderung der angewandten Forschung e.V. Apparatus and method for generating multiple audio channels
US10484810B2 (en) 2014-06-26 2019-11-19 Samsung Electronics Co., Ltd. Method and device for rendering acoustic signal, and computer-readable recording medium
CN110418274A (en) * 2014-06-26 2019-11-05 三星电子株式会社 For rendering the method and apparatus and computer readable recording medium of acoustic signal
RU2656986C1 (en) * 2014-06-26 2018-06-07 Samsung Electronics Co., Ltd. Method and device for acoustic signal rendering and machine-readable recording media
CN110611863A (en) * 2019-09-12 2019-12-24 苏州大学 360-degree sound source real-time playback system
CN110611863B (en) * 2019-09-12 2020-11-06 苏州大学 360-degree sound source real-time playback system
