WO2020057050A1 - Method for extracting direct sound and background sound, and loudspeaker system and sound reproduction method therefor - Google Patents

Method for extracting direct sound and background sound, and loudspeaker system and sound reproduction method therefor

Info

Publication number
WO2020057050A1
WO2020057050A1 (PCT/CN2019/075368)
Authority
WO
WIPO (PCT)
Prior art keywords
signal
sound
background
direct sound
channel
Prior art date
Application number
PCT/CN2019/075368
Other languages
French (fr)
Chinese (zh)
Inventor
叶超
蔡野锋
马登永
沐永生
Original Assignee
中科上声(苏州)电子有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中科上声(苏州)电子有限公司
Publication of WO2020057050A1 publication Critical patent/WO2020057050A1/en

Classifications

    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 — Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 — Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0272 — Voice signal separating

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Stereophonic System (AREA)

Abstract

A method for extracting a direct sound and a background sound, and a loudspeaker system and sound reproduction method therefor, capable of better distinguishing a direct sound from a background sound. The method comprises the following steps: S1, respectively performing a short-time Fourier transform on a left-channel signal x_L(n) and a right-channel signal x_R(n) to obtain X_L(m,k) and X_R(m,k) respectively corresponding to the left-channel signal and the right-channel signal, wherein n represents a time-domain sampling point, and m and k respectively represent a discrete time and a discrete frequency; S2, introducing a spatial factor to express the background signal as a signal generated by one signal passing through different transmission paths in a room, and respectively performing energy estimation on X_L(m,k) and X_R(m,k) to obtain the energies P_L(m,k) and P_R(m,k) of the left and right channels; S3, setting the value of the spatial factor and performing signal separation to obtain an estimate Ŝ(m,k) of the direct sound signal, an estimate N̂_L(m,k) of the left-channel background signal, and an estimate N̂_R(m,k) of the right-channel background signal in the time-frequency domain; and S4, performing the inverse Fourier transform to obtain the time-domain direct sound signal ŝ(n), left-channel background signal n̂_L(n), and right-channel background signal n̂_R(n).

Description

Direct sound and background sound extraction method, speaker system and sound reproduction method therefor

Technical Field
The present invention relates to methods for converting a stereo two-channel signal into a multi-channel signal, and in particular to a direct sound and background sound extraction method based on frequency-domain spatial decomposition, to a speaker system, and to a sound reproduction method thereof.
Background Art
At present most audio sources are still stereophonic: CDs, MP3 files, broadcast signals and so on are two-channel outputs with only a left and a right channel (L, R). All of the characteristic information, such as the direct sound signal, the reverberant sound signal, the sound source positions and the size of the sound field, is contained in these two channels. When a stereo source is reproduced over multiple loudspeakers, feeding the left and right channel signals directly to each loudspeaker confuses the spatial sound field. It is therefore necessary to use digital signal processing to convert the stereo signal into a multi-channel signal and reproduce it over a multi-loudspeaker system in order to build a realistic spatial sound field.
Traditional processing methods generally compute point by point in the time domain. When separating the direct sound and background sound signals, the point-by-point computation of the correlation coefficient easily introduces errors, so the direct sound and the background sound cannot be distinguished well.
For example, the methods based on principal component analysis (PCA) disclosed in US patent US6496584B2 and Chinese patent ZL01802081.X use the minimum mean square error method to compute weighting factors for the left and right channels and separate the speech sound from the background sound. By computing the correlation coefficient between the left and right channels, the vector relationship of the acoustic signals in three-dimensional coordinates is determined; the speech sound and background sound are then divided into four signals, left, centre, right and surround, according to the principle of energy conservation, and the surround signal is further split into left-rear and right-rear surround by a decorrelation filter, achieving a two-channel to five-channel conversion. This method computes in the time domain and is simple and fast, but the PCA analysis can only separate a single surround signal, and splitting the left-rear and right-rear surround with a decorrelation filter introduces a certain error.
Summary of the Invention
In view of the above problems, the present invention aims to provide a direct sound and background sound extraction method that can better distinguish the direct sound from the background sound. The invention also aims to provide a speaker system that performs sound reproduction based on this extraction method, and a sound reproduction method therefor.
According to a first aspect of the present invention, the technical solution adopted is as follows:
A direct sound and background sound extraction method comprises the following steps:
S1. Apply a short-time Fourier transform to the left channel signal x_L(n) and the right channel signal x_R(n) to obtain X_L(m,k) and X_R(m,k) corresponding to the left and right channel signals respectively, where n denotes the time-domain sampling point and m and k denote discrete time and discrete frequency respectively;
S2. Introduce spatial factors to express the background sound signal as a signal generated by one signal passing through different transmission paths in the room, and perform energy estimation on X_L(m,k) and X_R(m,k) respectively to obtain the energies P_L(m,k) and P_R(m,k) of the left and right channels;
S3. Set the value of the spatial factor and perform signal separation to obtain the time-frequency-domain estimate Ŝ(m,k) of the direct sound signal, the estimate N̂_L(m,k) of the left-channel background sound signal, and the estimate N̂_R(m,k) of the right-channel background sound signal;
S4. Apply the inverse Fourier transform to obtain the time-domain direct sound signal ŝ(n), left-channel background sound signal n̂_L(n), and right-channel background sound signal n̂_R(n).
In an embodiment, in step S1,
x_L(n) = Σ_{j=1}^{J} a_{L,j} · s_j(n) + n_L(n)

x_R(n) = Σ_{j=1}^{J} a_{R,j} · s_j(n) + n_R(n)

where J denotes the number of direct sound sources present in the space, s_j(n) denotes the direct sound signal of the j-th source at a given instant, a_{L,j} and a_{R,j} denote the coefficients with which the direct sound signal is assigned to the left and right channel signals respectively, and n_L(n) and n_R(n) denote the background signals of the left and right channels respectively;

X_L(m,k) = Σ_{j=1}^{J} A_{L,j}(m,k) · S_j(m,k) + N_L(m,k)

X_R(m,k) = Σ_{j=1}^{J} A_{R,j}(m,k) · S_j(m,k) + N_R(m,k)

where S_j(m,k) denotes the direct sound signal in the time-frequency domain, A_{L,j}(m,k) and A_{R,j}(m,k) denote the time-frequency-domain coefficients with which the direct sound signal is assigned to the left and right channel signals respectively, and N_L(m,k) and N_R(m,k) denote the time-frequency-domain background signals of the left and right channels respectively.
In an embodiment, step S2 specifically comprises:
S21. At a given time m and in a given frequency band k only one sound source S_i is present, i.e. S(m,k) = S_i(m,k), so that

X_L(m,k) = A_L(m,k) · S(m,k) + N_L(m,k)

X_R(m,k) = A_R(m,k) · S(m,k) + N_R(m,k)

where A_L and A_R denote the coefficients with which the direct sound signal is assigned to the left and right channel signals respectively;
S22. Introduce spatial factors B_L(m,k) and B_R(m,k) such that N_L(m,k) = B_L(m,k)·N(m,k) and N_R(m,k) = B_R(m,k)·N(m,k), with

B_L(m,k) = b_L(m,k) · e^{jφ_L(m,k)}

B_R(m,k) = b_R(m,k) · e^{jφ_R(m,k)}

|b_L(m,k)| ≤ 1, |b_R(m,k)| ≤ 1,

where N(m,k) denotes the time-frequency-domain background signal, b_L(m,k) and b_R(m,k) denote the amplitudes of the left and right channel spatial factors respectively, and φ_L(m,k) and φ_R(m,k) denote the phases of the left and right channel spatial factors respectively;
Then X_L(m,k) and X_R(m,k) simplify to:

X_L(m,k) = A_L(m,k)·S(m,k) + B_L(m,k)·N(m,k)

X_R(m,k) = A_R(m,k)·S(m,k) + B_R(m,k)·N(m,k)

The correlation coefficient between the left and right channel signals is

Φ(m,k) = E{X_L(m,k)·X_R*(m,k)} / sqrt( E{|X_L(m,k)|²} · E{|X_R(m,k)|²} )

where E{·} denotes the expectation of the signal;
S23. From the energy point of view, the energies P_L(m,k) and P_R(m,k) of the left and right channels are obtained as:

P_L(m,k) = E{|X_L(m,k)|²} = |A_L(m,k)|²·P_S(m,k) + b_L²(m,k)·P_N(m,k)

P_R(m,k) = E{|X_R(m,k)|²} = |A_R(m,k)|²·P_S(m,k) + b_R²(m,k)·P_N(m,k)

where P_S(m,k) = E{|S(m,k)|²} and P_N(m,k) = E{|N(m,k)|²} denote the energies of the direct sound signal and of the background signal respectively.
Preferably, step S3 specifically comprises: setting the value of the spatial factor so as to obtain an analytical solution for P_S(m,k), P_N(m,k), A_L(m,k) and A_R(m,k), and computing the direct sound estimate Ŝ(m,k) from formula (1) and the background estimate N̂(m,k) from formula (2); substituting the spatial factors B_L(m,k) and B_R(m,k) into formula (2) then yields N̂_L(m,k) and N̂_R(m,k).
More preferably, the value of the spatial factor is set to b_L(m,k) = b_R(m,k) = 1.
According to a second aspect of the present invention, the technical solution adopted is as follows:
A sound reproduction method for a speaker system, which uses the direct sound and background sound extraction method described above to separate the direct sound signal and the background sound signal, and distributes the direct sound signal and the background sound signal to the individual speakers of the speaker system for sound reproduction.
Specifically, the direct sound signal and the background sound signal are allocated to the individual speakers of the speaker system according to the position of the sound image in the stereo signal and the number and positions of the speakers of the speaker system.
According to a third aspect of the present invention, the technical solution adopted is as follows:
A speaker system comprising a plurality of speakers, characterised in that the speaker system further comprises an extraction device for carrying out the direct sound and background sound extraction method described above.
Specifically, the extraction device comprises an STFT module, an energy estimation module, a signal separation module and an ISTFT module connected in sequence,
the input of the STFT module being the left channel signal x_L(n) and the right channel signal x_R(n); after performing the short-time Fourier transform it outputs X_L(m,k) and X_R(m,k) corresponding to the left and right channel signals;
the energy estimation module receiving X_L(m,k) and X_R(m,k) output by the STFT module, introducing the spatial factors to express the background sound signal as a signal generated by one signal passing through different transmission paths in the room, performing energy estimation on X_L(m,k) and X_R(m,k) respectively, and outputting the resulting left and right channel energies P_L(m,k) and P_R(m,k) to the signal separation module;
the signal separation module setting the value of the spatial factor and performing the signal separation to obtain Ŝ(m,k), N̂_L(m,k) and N̂_R(m,k), which are output to the ISTFT module;
the ISTFT module performing the inverse Fourier transform and outputting the direct sound signal ŝ(n), the left-channel background sound signal n̂_L(n) and the right-channel background sound signal n̂_R(n).
By adopting the above scheme, the present invention has the following advantages over the prior art:

By defining spatial-factor variables between the left and right channel signals, the differences between the left and right channels that room reverberation, room size and other factors impose on the background sound signal during sound propagation are characterised; the background sound signals of the left and right channels can both be separated, whereas the traditional method can only separate a single background signal.
Brief Description of the Drawings

In order to explain the technical solution of the present invention more clearly, the drawings required for the description of the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention; a person of ordinary skill in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a signal-processing flowchart of the direct sound and background sound extraction method according to the present invention;

FIG. 2 shows the left and right channel signals;

FIG. 3 shows the correlation coefficient of the left and right channel signals at a certain moment;

FIGS. 4a, 4b and 4c show the separated direct sound signal, left-channel background sound signal and right-channel background sound signal, respectively.
Detailed Description

Preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings, so that the advantages and features of the invention can be understood more easily by those skilled in the art. It should be noted that the description of these embodiments is intended to help the understanding of the invention and does not limit it. In addition, the technical features involved in the embodiments of the invention described below can be combined with one another as long as they do not conflict.
This embodiment provides a direct sound and background sound extraction method. Referring to the signal flowchart shown in FIG. 1, the extraction method comprises the following steps:
S1. Apply a short-time Fourier transform (STFT) to the left channel signal x_L(n) and the right channel signal x_R(n) to obtain X_L(m,k) and X_R(m,k) corresponding to the left and right channel signals respectively, where n denotes the time-domain sampling point and m and k denote discrete time and discrete frequency respectively;
S2. Introduce spatial factors to express the background sound signal as a signal generated by one signal passing through different transmission paths in the room, and perform energy estimation on X_L(m,k) and X_R(m,k) respectively to obtain the energies P_L(m,k) and P_R(m,k) of the left and right channels;
S3. Set the value of the spatial factor and perform signal separation to obtain the time-frequency-domain estimate Ŝ(m,k) of the direct sound signal, the estimate N̂_L(m,k) of the left-channel background sound signal, and the estimate N̂_R(m,k) of the right-channel background sound signal;
S4. Apply the inverse short-time Fourier transform (ISTFT) to obtain the time-domain direct sound signal ŝ(n), left-channel background sound signal n̂_L(n), and right-channel background sound signal n̂_R(n).
Specifically, as shown in FIG. 2, the left and right channel signals are:

x_L(n) = Σ_{j=1}^{J} a_{L,j} · s_j(n) + n_L(n)

x_R(n) = Σ_{j=1}^{J} a_{R,j} · s_j(n) + n_R(n)

where s_j(n) denotes the direct sound signal of the j-th source at a given instant, a_{L,j} and a_{R,j} denote the coefficients with which the direct sound signal is assigned to the left and right channel signals, and n_L(n) and n_R(n) denote the background signals of the left and right channels.
S1. After the short-time Fourier transform (STFT), this becomes:

X_L(m,k) = Σ_{j=1}^{J} A_{L,j}(m,k) · S_j(m,k) + N_L(m,k)

X_R(m,k) = Σ_{j=1}^{J} A_{R,j}(m,k) · S_j(m,k) + N_R(m,k)

where m and k denote time and frequency, respectively.
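As an aside for readers who want to experiment with step S1, the following Python sketch (not part of the patent; scipy, the synthetic test material and the chosen window length are assumptions of this description) computes the short-time Fourier transforms X_L(m,k) and X_R(m,k) of a stereo pair.

```python
# Sketch of step S1: STFT of the left and right channel signals.
# The test material, sampling rate and window length are illustrative only.
import numpy as np
from scipy.signal import stft

fs = 44100
t = np.arange(fs) / fs                               # 1 s of test material
direct = np.sin(2 * np.pi * 440 * t)                 # a "direct" tone
x_L = 0.8 * direct + 0.05 * np.random.randn(fs)      # left channel + background
x_R = 0.6 * direct + 0.05 * np.random.randn(fs)      # right channel + background

nperseg = 1024                                       # STFT window length
_, _, X_L = stft(x_L, fs=fs, nperseg=nperseg)        # X_L(m, k): (freq, frame)
_, _, X_R = stft(x_R, fs=fs, nperseg=nperseg)        # X_R(m, k)
print(X_L.shape, X_R.shape)
```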
S2. Two assumptions are made:
S21. At a given time m and in a given frequency band k only one sound source S_i is present, i.e. S(m,k) = S_i(m,k). Therefore:

X_L(m,k) = A_L(m,k) · S(m,k) + N_L(m,k)

X_R(m,k) = A_R(m,k) · S(m,k) + N_R(m,k)

where A_L(m,k) and A_R(m,k) denote the coefficients with which the direct sound signal is assigned to the left and right channel signals.
S22. Introduce the spatial factor B, expressing the background sound signal as a signal generated by one signal passing through different transmission paths in the room, analogous to the expression of the direct sound signal, i.e. N_L(m,k) = B_L(m,k)·N(m,k) and N_R(m,k) = B_R(m,k)·N(m,k), with

B_L(m,k) = b_L(m,k) · e^{jφ_L(m,k)}

B_R(m,k) = b_R(m,k) · e^{jφ_R(m,k)}

|b_L(m,k)| ≤ 1, |b_R(m,k)| ≤ 1.
In this way, the above formulas simplify to:

X_L(m,k) = A_L(m,k)·S(m,k) + B_L(m,k)·N(m,k)

X_R(m,k) = A_R(m,k)·S(m,k) + B_R(m,k)·N(m,k)

The correlation coefficient between the left and right channel signals (shown in FIG. 3) is defined as

Φ(m,k) = E{X_L(m,k)·X_R*(m,k)} / sqrt( E{|X_L(m,k)|²} · E{|X_R(m,k)|²} )

where E{·} denotes the expectation of the signal.
S23. From the energy point of view, the energies of the left and right channels are obtained as:

P_L(m,k) = E{|X_L(m,k)|²} = |A_L(m,k)|²·P_S(m,k) + b_L²(m,k)·P_N(m,k)

P_R(m,k) = E{|X_R(m,k)|²} = |A_R(m,k)|²·P_S(m,k) + b_R²(m,k)·P_N(m,k)

where P_S(m,k) = E{|S(m,k)|²} and P_N(m,k) = E{|N(m,k)|²} denote the energies of the direct sound signal and of the background signal respectively.
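Continuing from the arrays X_L and X_R of the previous sketch, the following illustrates one way step S2 can be realised in code. The expectation E{·} is approximated here by first-order recursive smoothing over time frames, which is an assumption of this sketch; the patent only specifies the expectation operator.

```python
# Sketch of step S2: channel energies and inter-channel correlation coefficient.
# E{.} is approximated by recursive smoothing along the time-frame axis.
import numpy as np

def smooth(P, alpha=0.9):
    """First-order recursive smoothing over time frames (axis 1)."""
    out = np.empty_like(P)
    out[:, 0] = P[:, 0]
    for i in range(1, P.shape[1]):
        out[:, i] = alpha * out[:, i - 1] + (1 - alpha) * P[:, i]
    return out

P_L = smooth(np.abs(X_L) ** 2)              # P_L(m, k)
P_R = smooth(np.abs(X_R) ** 2)              # P_R(m, k)
cross = smooth(X_L * np.conj(X_R))          # smoothed E{X_L * conj(X_R)}

# Correlation coefficient between the left and right channel signals
Phi = cross / np.sqrt(P_L * P_R + 1e-12)    # complex-valued, |Phi| <= 1
```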
S3. In general the background sound energy is smaller than the direct sound energy, with P_N(m,k) much smaller than P_S(m,k), so the traditional approach is to ignore P_N(m,k) in the channel energies.
Here, however, the spatial factor is assumed to be b_L = b_R = 1 and the contribution of P_N(m,k) is not ignored; this yields analytical solutions for P_S(m,k), P_N(m,k), A_L(m,k) and A_R(m,k).
The estimates Ŝ(m,k) and N̂(m,k) can then be calculated, and substituting the spatial factors B_L(m,k) and B_R(m,k) gives N̂_L(m,k) and N̂_R(m,k).
S4. Finally, the inverse Fourier transform yields the direct sound signal ŝ(n) and the background sound signals n̂_L(n) and n̂_R(n), shown in FIGS. 4a, 4b and 4c respectively.
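Step S4 is a plain inverse STFT. The sketch below is an illustration rather than the patent's code: S_hat, NL_hat and NR_hat stand for the separated time-frequency arrays Ŝ(m,k), N̂_L(m,k) and N̂_R(m,k) produced by step S3 (however they are obtained), and fs and nperseg are the values used in the STFT sketch above.

```python
# Sketch of step S4: inverse STFT of the separated time-frequency signals.
# S_hat, NL_hat and NR_hat are assumed to be complex arrays with the same
# shape as X_L, produced by the separation of step S3.
from scipy.signal import istft

_, s_hat = istft(S_hat, fs=fs, nperseg=nperseg)      # direct sound s^(n)
_, nL_hat = istft(NL_hat, fs=fs, nperseg=nperseg)    # left background n^_L(n)
_, nR_hat = istft(NR_hat, fs=fs, nperseg=nperseg)    # right background n^_R(n)
```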
In this extraction method: (1) spatial-factor variables are defined between the left and right channel signals to characterise the differences between the two channels that room reverberation, room size and other factors impose on the background sound signal during sound propagation; (2) the background sound signals of the left and right channels can both be separated, whereas the traditional method can only separate a single background signal; (3) after the spatial factor is introduced the computation is relatively simple, and analytical solutions for the direct sound and the background sound are obtained.
This embodiment also provides a sound reproduction method for a speaker system comprising multiple speakers placed at different positions. The sound reproduction method converts a stereo signal into multi-channel sound signals and specifically comprises: separating the direct sound signal and the background sound signal with the direct sound and background sound extraction method described above, and distributing the direct sound signal and the background sound signal to the individual speakers of the speaker system according to the position of the sound image in the stereo signal and the number and positions of the speakers, thereby completing the sound reproduction.
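The patent does not prescribe a particular allocation rule, so the sketch below is only one plausible assignment used for illustration: the direct sound is panned across the front speakers according to the stereo sound-image position, and the left and right background sounds feed the surround speakers. The function name and the five-speaker layout are assumptions of this description.

```python
# Illustrative routing of the separated signals to a five-speaker layout.
# The allocation rule is an assumption of this sketch, not part of the patent.
import numpy as np

def distribute(s_hat, nL_hat, nR_hat, pan=0.5):
    """pan in [0, 1]: 0 = sound image fully left, 1 = fully right."""
    gain_l = np.cos(pan * np.pi / 2)        # constant-power panning gains
    gain_r = np.sin(pan * np.pi / 2)
    return {
        "front_left": gain_l * s_hat,
        "front_right": gain_r * s_hat,
        "centre": 0.5 * (gain_l + gain_r) * s_hat,
        "surround_left": nL_hat,
        "surround_right": nR_hat,
    }

speaker_feeds = distribute(s_hat, nL_hat, nR_hat, pan=0.4)
```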
This embodiment further provides a speaker system comprising a plurality of speakers, the speaker system further comprising an extraction device for carrying out the direct sound and background sound extraction method described above. As shown in FIG. 1, the extraction device comprises an STFT module, an energy estimation module, a signal separation module and an ISTFT module connected in sequence. The STFT module takes the left channel signal x_L(n) and the right channel signal x_R(n) as input and, after the short-time Fourier transform, outputs X_L(m,k) and X_R(m,k). The energy estimation module receives X_L(m,k) and X_R(m,k) from the STFT module, introduces the spatial factors to express the background sound signal as one signal passing through different transmission paths in the room, performs energy estimation on X_L(m,k) and X_R(m,k), and outputs the channel energies P_L(m,k) and P_R(m,k) together with A_L and A_R to the signal separation module. The signal separation module sets the value of the spatial factor, performs the signal separation to obtain Ŝ(m,k), N̂_L(m,k) and N̂_R(m,k), and outputs them to the ISTFT module. The ISTFT module applies the inverse Fourier transform and outputs the direct sound signal ŝ(n), the left-channel background sound signal n̂_L(n) and the right-channel background sound signal n̂_R(n).
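Viewed as software, the extraction device maps naturally onto four small stages. The sketch below mirrors that chain end to end; note that the separation stage is a deliberately crude correlation-based stand-in of our own, because the patent's formulas (1) and (2) are reproduced only as image equations in the original filing and are not restated here.

```python
# Sketch of the extraction device as a chain of four stages:
# STFT -> energy estimation -> signal separation -> ISTFT.
# separate() below is NOT the patent's formulas (1)/(2); it is a crude
# correlation-based stand-in that only keeps the pipeline runnable.
import numpy as np
from scipy.signal import stft, istft

def extract(x_L, x_R, fs=44100, nperseg=1024, alpha=0.9):
    # STFT module
    _, _, X_L = stft(x_L, fs=fs, nperseg=nperseg)
    _, _, X_R = stft(x_R, fs=fs, nperseg=nperseg)

    # Energy estimation module (expectations via recursive smoothing)
    def smooth(P):
        out = np.empty_like(P)
        out[:, 0] = P[:, 0]
        for i in range(1, P.shape[1]):
            out[:, i] = alpha * out[:, i - 1] + (1 - alpha) * P[:, i]
        return out

    P_L = smooth(np.abs(X_L) ** 2)
    P_R = smooth(np.abs(X_R) ** 2)
    coh = np.abs(smooth(X_L * np.conj(X_R))) / np.sqrt(P_L * P_R + 1e-12)

    # Signal separation module (placeholder, not the patent's formulas)
    S_hat = coh * 0.5 * (X_L + X_R)
    NL_hat = X_L - S_hat
    NR_hat = X_R - S_hat

    # ISTFT module
    _, s_hat = istft(S_hat, fs=fs, nperseg=nperseg)
    _, nL_hat = istft(NL_hat, fs=fs, nperseg=nperseg)
    _, nR_hat = istft(NR_hat, fs=fs, nperseg=nperseg)
    return s_hat, nL_hat, nR_hat
```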
The above embodiment merely illustrates the technical concept and features of the present invention. It is a preferred embodiment whose purpose is to enable persons familiar with the art to understand and implement the invention, and it does not limit the scope of protection of the invention.

Claims (9)

  1. A direct sound and background sound extraction method, characterised by comprising the following steps:
    S1. Apply a short-time Fourier transform to the left channel signal x_L(n) and the right channel signal x_R(n) to obtain X_L(m,k) and X_R(m,k) corresponding to the left and right channel signals respectively, where n denotes the time-domain sampling point and m and k denote discrete time and discrete frequency respectively;
    S2. Introduce spatial factors to express the background sound signal as a signal generated by one signal passing through different transmission paths in the room, and perform energy estimation on X_L(m,k) and X_R(m,k) respectively to obtain the energies P_L(m,k) and P_R(m,k) of the left and right channels;
    S3. Set the value of the spatial factor and perform signal separation to obtain the time-frequency-domain estimate Ŝ(m,k) of the direct sound signal, the estimate N̂_L(m,k) of the left-channel background sound signal, and the estimate N̂_R(m,k) of the right-channel background sound signal;
    S4. Apply the inverse Fourier transform to obtain the time-domain direct sound signal ŝ(n), left-channel background sound signal n̂_L(n), and right-channel background sound signal n̂_R(n).
  2. The direct sound and background sound extraction method according to claim 1, characterised in that, in step S1,
    x_L(n) = Σ_{j=1}^{J} a_{L,j} · s_j(n) + n_L(n)

    x_R(n) = Σ_{j=1}^{J} a_{R,j} · s_j(n) + n_R(n)

    where J denotes the number of direct sound sources present in the space, s_j(n) denotes the direct sound signal of the j-th source at a given instant, a_{L,j} and a_{R,j} denote the coefficients with which the direct sound signal is assigned to the left and right channel signals respectively, and n_L(n) and n_R(n) denote the background signals of the left and right channels respectively;

    X_L(m,k) = Σ_{j=1}^{J} A_{L,j}(m,k) · S_j(m,k) + N_L(m,k)

    X_R(m,k) = Σ_{j=1}^{J} A_{R,j}(m,k) · S_j(m,k) + N_R(m,k)

    where S_j(m,k) denotes the direct sound signal in the time-frequency domain, A_{L,j}(m,k) and A_{R,j}(m,k) denote the time-frequency-domain coefficients with which the direct sound signal is assigned to the left and right channel signals respectively, and N_L(m,k) and N_R(m,k) denote the time-frequency-domain background signals of the left and right channels respectively.
  3. The direct sound and background sound extraction method according to claim 1 or 2, characterised in that step S2 specifically comprises:
    S21. At a given time m and in a given frequency band k only one sound source S_i is present, i.e. S(m,k) = S_i(m,k), so that

    X_L(m,k) = A_L(m,k) · S(m,k) + N_L(m,k)

    X_R(m,k) = A_R(m,k) · S(m,k) + N_R(m,k)

    where A_L and A_R denote the coefficients with which the direct sound signal is assigned to the left and right channel signals respectively;
    S22. Introduce spatial factors B_L(m,k) and B_R(m,k) such that N_L(m,k) = B_L(m,k)·N(m,k) and N_R(m,k) = B_R(m,k)·N(m,k), with

    B_L(m,k) = b_L(m,k) · e^{jφ_L(m,k)}

    B_R(m,k) = b_R(m,k) · e^{jφ_R(m,k)}

    |b_L(m,k)| ≤ 1, |b_R(m,k)| ≤ 1,

    where N(m,k) denotes the time-frequency-domain background signal, b_L(m,k) and b_R(m,k) denote the amplitudes of the left and right channel spatial factors respectively, and φ_L(m,k) and φ_R(m,k) denote the phases of the left and right channel spatial factors respectively;
    Then X_L(m,k) and X_R(m,k) simplify to:

    X_L(m,k) = A_L(m,k)·S(m,k) + B_L(m,k)·N(m,k)

    X_R(m,k) = A_R(m,k)·S(m,k) + B_R(m,k)·N(m,k)

    The correlation coefficient between the left and right channel signals is

    Φ(m,k) = E{X_L(m,k)·X_R*(m,k)} / sqrt( E{|X_L(m,k)|²} · E{|X_R(m,k)|²} )

    where E{·} denotes the expectation of the signal;
    S23. From the energy point of view, the energies P_L(m,k) and P_R(m,k) of the left and right channels are obtained as:

    P_L(m,k) = E{|X_L(m,k)|²} = |A_L(m,k)|²·P_S(m,k) + b_L²(m,k)·P_N(m,k)

    P_R(m,k) = E{|X_R(m,k)|²} = |A_R(m,k)|²·P_S(m,k) + b_R²(m,k)·P_N(m,k)

    where P_S(m,k) = E{|S(m,k)|²} and P_N(m,k) = E{|N(m,k)|²} denote the energies of the direct sound signal and of the background signal respectively.
  4. The direct sound and background sound extraction method according to claim 3, characterised in that step S3 specifically comprises: setting the value of the spatial factor so as to obtain an analytical solution for P_S(m,k), P_N(m,k), A_L(m,k) and A_R(m,k), and computing the direct sound estimate Ŝ(m,k) from formula (1) and the background estimate N̂(m,k) from formula (2); substituting the spatial factors B_L(m,k) and B_R(m,k) into formula (2) then yields N̂_L(m,k) and N̂_R(m,k).
  5. The direct sound and background sound extraction method according to claim 4, characterised in that the value of the spatial factor is set to b_L(m,k) = b_R(m,k) = 1.
  6. A sound reproduction method for a speaker system, characterised in that the direct sound and background sound extraction method according to any one of claims 1-5 is used to separate the direct sound signal and the background sound signal, and the direct sound signal and the background sound signal are distributed to the individual speakers of the speaker system for sound reproduction.
  7. The sound reproduction method according to claim 6, characterised in that the direct sound signal and the background sound signal are allocated to the individual speakers of the speaker system according to the position of the sound image in the stereo signal and the number and positions of the speakers of the speaker system.
  8. A speaker system comprising a plurality of speakers, characterised in that the speaker system further comprises an extraction device for carrying out the direct sound and background sound extraction method according to any one of claims 1-5.
  9. The speaker system according to claim 8, characterised in that the extraction device comprises an STFT module, an energy estimation module, a signal separation module and an ISTFT module connected in sequence,
    the input of the STFT module being the left channel signal x_L(n) and the right channel signal x_R(n); after performing the short-time Fourier transform it outputs X_L(m,k) and X_R(m,k) corresponding to the left and right channel signals;
    the energy estimation module receiving X_L(m,k) and X_R(m,k) output by the STFT module, introducing the spatial factors to express the background sound signal as a signal generated by one signal passing through different transmission paths in the room, performing energy estimation on X_L(m,k) and X_R(m,k) respectively, and outputting the resulting left and right channel energies P_L(m,k) and P_R(m,k) to the signal separation module;
    the signal separation module setting the value of the spatial factor and performing the signal separation to obtain Ŝ(m,k), N̂_L(m,k) and N̂_R(m,k), which are output to the ISTFT module;
    the ISTFT module performing the inverse Fourier transform and outputting the direct sound signal ŝ(n), the left-channel background sound signal n̂_L(n) and the right-channel background sound signal n̂_R(n).
PCT/CN2019/075368 2018-09-17 2019-02-18 Method for extracting direct sound and background sound, and loudspeaker system and sound reproduction method therefor WO2020057050A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811072475.6A CN109036455B (en) 2018-09-17 2018-09-17 Direct sound and background sound extraction method, loudspeaker system and sound reproduction method thereof
CN201811072475.6 2018-09-17

Publications (1)

Publication Number Publication Date
WO2020057050A1 true

Family

ID=64621766

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/075368 WO2020057050A1 (en) 2018-09-17 2019-02-18 Method for extracting direct sound and background sound, and loudspeaker system and sound reproduction method therefor

Country Status (2)

Country Link
CN (1) CN109036455B (en)
WO (1) WO2020057050A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109036455B (en) * 2018-09-17 2020-11-06 中科上声(苏州)电子有限公司 Direct sound and background sound extraction method, loudspeaker system and sound reproduction method thereof
CN111669697B (en) * 2020-05-25 2021-05-18 中国科学院声学研究所 Coherent sound and environmental sound extraction method and system of multichannel signal

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101341793A (en) * 2005-09-02 2009-01-07 Lg电子株式会社 Method to generate multi-channel audio signals from stereo signals
US20090198356A1 (en) * 2008-02-04 2009-08-06 Creative Technology Ltd Primary-Ambient Decomposition of Stereo Audio Signals Using a Complex Similarity Index
CN101622669A (en) * 2007-02-26 2010-01-06 高通股份有限公司 Systems, methods, and apparatus for signal separation
CN102804264A (en) * 2010-01-15 2012-11-28 弗兰霍菲尔运输应用研究公司 Apparatus and method for extracting a direct/ambience signal from a downmix signal and spatial parametric information
CN105409247A (en) * 2013-03-05 2016-03-16 弗劳恩霍夫应用研究促进协会 Apparatus and method for multichannel direct-ambient decomposition for audio signal processing
CN109036455A (en) * 2018-09-17 2018-12-18 中科上声(苏州)电子有限公司 Direct sound wave and background sound extracting method, speaker system and its sound playback method

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1286333B1 (en) * 2001-08-21 2004-10-06 Culturecom Technology (Macau) Ltd. Method and apparatus for processing a sound signal
JP5082327B2 (en) * 2006-08-09 2012-11-28 ソニー株式会社 Audio signal processing apparatus, audio signal processing method, and audio signal processing program
US8385556B1 (en) * 2007-08-17 2013-02-26 Dts, Inc. Parametric stereo conversion system and method
CN101894559B (en) * 2010-08-05 2012-06-06 展讯通信(上海)有限公司 Audio processing method and device thereof
CN103000179B (en) * 2011-09-16 2014-11-12 中国科学院声学研究所 Multichannel audio coding/decoding system and method
CN102610237A (en) * 2012-03-21 2012-07-25 山东大学 Digital signal processor (DSP) implementation system for two-channel convolution mixed voice signal blind source separation algorithm
CN104078051B (en) * 2013-03-29 2018-09-25 南京中兴软件有限责任公司 A kind of voice extracting method, system and voice audio frequency playing method and device
CN107146630B (en) * 2017-04-27 2020-02-14 同济大学 STFT-based dual-channel speech sound separation method

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101341793A (en) * 2005-09-02 2009-01-07 Lg电子株式会社 Method to generate multi-channel audio signals from stereo signals
CN101622669A (en) * 2007-02-26 2010-01-06 高通股份有限公司 Systems, methods, and apparatus for signal separation
US20090198356A1 (en) * 2008-02-04 2009-08-06 Creative Technology Ltd Primary-Ambient Decomposition of Stereo Audio Signals Using a Complex Similarity Index
CN102804264A (en) * 2010-01-15 2012-11-28 弗兰霍菲尔运输应用研究公司 Apparatus and method for extracting a direct/ambience signal from a downmix signal and spatial parametric information
CN105409247A (en) * 2013-03-05 2016-03-16 弗劳恩霍夫应用研究促进协会 Apparatus and method for multichannel direct-ambient decomposition for audio signal processing
CN109036455A (en) * 2018-09-17 2018-12-18 中科上声(苏州)电子有限公司 Direct sound wave and background sound extracting method, speaker system and its sound playback method

Also Published As

Publication number Publication date
CN109036455B (en) 2020-11-06
CN109036455A (en) 2018-12-18

Similar Documents

Publication Publication Date Title
US10531198B2 (en) Apparatus and method for decomposing an input signal using a downmixer
EP3320692B1 (en) Spatial audio processing apparatus
KR101341523B1 (en) Method to generate multi-channel audio signals from stereo signals
JP5081838B2 (en) Audio encoding and decoding
KR101984115B1 (en) Apparatus and method for multichannel direct-ambient decomposition for audio signal processing
CA2835463C (en) Apparatus and method for generating an output signal employing a decomposer
CN101842834B (en) Device and method for generating a multi-channel signal using voice signal processing
JP6620235B2 (en) Apparatus and method for sound stage expansion
US7567845B1 (en) Ambience generation for stereo signals
JP6284480B2 (en) Audio signal reproducing apparatus, method, program, and recording medium
CN102907120A (en) System and method for sound processing
WO2020057051A1 (en) Multi-channel signal conversion method for vehicle audio system and, vehicle audio system
US10523171B2 (en) Method for dynamic sound equalization
WO2020057050A1 (en) Method for extracting direct sound and background sound, and loudspeaker system and sound reproduction method therefor
JP2020508590A (en) Apparatus and method for downmixing multi-channel audio signals
KR20110041062A (en) Virtual speaker apparatus and method for porocessing virtual speaker
CN109036456B (en) Method for extracting source component environment component for stereo
Kinoshita et al. Blind upmix of stereo music signals using multi-step linear prediction based reverberation extraction
AU2015238777B2 (en) Apparatus and Method for Generating an Output Signal having at least two Output Channels
JP2017163458A (en) Up-mix device and program
AU2012252490A1 (en) Apparatus and method for generating an output signal employing a decomposer

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19862882

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19862882

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 19.11.2021)
