CN111193990B

CN111193990B - 3D audio system capable of resisting high-frequency spatial aliasing and implementation method

Info

Publication number: CN111193990B
Application number: CN202010009944.0A
Authority: CN
Inventors: 曲天书; 吴玺宏; 林晶
Original assignee: Peking University
Current assignee: Peking University
Priority date: 2020-01-06
Filing date: 2020-01-06
Publication date: 2021-01-19
Anticipated expiration: 2040-01-06
Also published as: CN111193990A

Abstract

The invention discloses a 3D audio system for resisting high-frequency spatial aliasing and an implementation method. The method comprises the following steps: 1) for a given spherical microphone array, sampling the spherical sound pressure and performing a discrete spherical Fourier transform on the sampled spherical sound pressure, the expansion order of the discrete spherical Fourier transform being no greater than the truncation order N; 2) obtaining a spatial aliasing matrix E from the relationship between the expansion coefficients of the discrete spherical Fourier transform in step 1) and the true coefficients of the spherical sound pressure expansion; 3) through the formula min(||s||₁),

solving to obtain a signal s; 4) from the resulting signal s, encoding s to a higher order N by the formula B_N = Y_N s to obtain a higher-order HOA signal B_N; 5) multiplying the obtained HOA signal by an inverse matrix of the spherical Fourier transform to reconstruct the sound field and obtain 3D audio.

Description

3D audio system capable of resisting high-frequency spatial aliasing and implementation method

Technical Field

The invention belongs to the technical field of 3D audio, and particularly relates to a 3D audio system capable of resisting high-frequency spatial aliasing and an implementation method.

Background

The 3D audio technology mainly refers to a related technology adopted for a listener to obtain a corresponding spatial hearing sensation at the time of audio playback.

The sound image reconstructed by the commonly used stereo or surround sound systems has freedom only in the horizontal direction: it cannot leave the plane in which the loudspeakers lie, does not even reach the 2D specification, and falls far short of the definition of 3D spatial audio. Because 3D audio technology has lagged behind 3D video technology, mainstream 3D multimedia systems, whether in a cinema or at home, adopt a "3D video + stereo/surround sound" scheme. This implementation suffers from inconsistent visual and auditory perception, resulting in insufficient immersion and realism, and makes an immersive effect difficult to achieve. With growing demands for realistic and immersive sound and the rise of virtual reality technologies, 3D audio playback is gradually gaining importance.

In 3D audio playback, the most direct approach is to simulate human perception of a sound source at any azimuth in space by using the Head Related Transfer Function (HRTF). However, this method can only realize audio playback in a specific direction, and has side effects such as front-back confusion and in-head localization. Other mainstream methods include Vector-Based Amplitude Panning (VBAP), Wave Field Synthesis (WFS), and Ambisonics; of these, Ambisonics-based 3D audio systems are the most promising because of their unique advantages. First, recording is convenient: the recording end and the playback end are independent of each other, and the loudspeaker layout used for playback need not be considered during recording. Second, the system is compatible with existing non-3D spatial audio playback systems such as stereo and 5.1/7.1. Third, it offers several playback modes, supporting both loudspeaker and headphone playback. Finally, it enables binaural playback based on head tracking.

Ambisonics has a long history of development: in the early 1970s, Michael Gerzon proposed the implementation of first-order Ambisonics. Because the low spatial resolution of first-order Ambisonics does not meet practical needs, many researchers began to study Higher Order Ambisonics (HOA). HOA uses the spherical harmonic functions as a set of orthogonal bases of the space to perform a spherical harmonic decomposition of the sound field, yielding multichannel HOA signals from which the sound field is analyzed and reconstructed. In theory, the higher the HOA order used, the larger the sound field region that can be reconstructed accurately; in practice, the order is limited by the number of microphones and loudspeakers, and the required number of microphones and loudspeakers grows quadratically with the coding order.
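To make the quadratic growth concrete, the sketch below counts the spherical-harmonic channels per order and, assuming the common requirement that an array have at least (N+1)² microphones to encode order N (an assumption here; the text states only that the growth is quadratic), finds the largest order a given array supports. The function names are illustrative.

```python
import math

def hoa_channels(order):
    """Number of spherical-harmonic channels for orders 0..order."""
    return (order + 1) ** 2

def max_order(num_microphones):
    """Largest order whose channel count does not exceed the microphone count
    (assumes the usual (N+1)^2 <= number-of-microphones requirement)."""
    return math.isqrt(num_microphones) - 1

for n in range(1, 7):
    print(f"order {n}: {hoa_channels(n)} channels")
print("largest order for 32 microphones:", max_order(32))
```

With 32 microphones this gives order 4, which matches the 32-microphone, 4th-order configuration discussed in this document.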

Ambisonics-based 3D audio systems can provide users with sufficient realism and immersion, but they face a key problem in practical applications: the usable frequency band is narrow, because severe spatial aliasing and directional confusion occur at high frequencies. The upper cut-off frequency of a 3D audio system with 4th-order HOA coding using 32 microphones is 5.4 kHz, which is unacceptable in application scenarios that require high frequencies (e.g. concert recording).

High-frequency spatial aliasing occurs because the limited number of spherical microphones fails to satisfy the Nyquist spatial sampling theorem. A relatively straightforward remedy is to increase the number of microphones and to reduce the radius of the array. Increasing the number of microphones relieves spatial aliasing, but the number of microphones grows with the square of the cut-off frequency, so the number required rises sharply as the cut-off frequency increases and the approach is impractical. Reducing the radius of the array without changing the number of microphones is limited by the manufacturing process on the one hand, and on the other hand raises the frequency below which low-frequency noise is amplified. Multi-radius spherical microphone array structures have also been proposed to broaden the usable frequency band, but they require complex and expensive array designs and are limited in practical applications. The above analysis shows that expanding the usable frequency band at the hardware level is costly. A new anti-spatial-aliasing HOA coding algorithm is therefore needed, one that substantially raises the upper cut-off frequency without changing the hardware structure and thereby solves the narrow usable band of Ambisonics-based 3D audio systems.

Disclosure of Invention

The problem to be solved by the invention is that the usable frequency band of current Ambisonics-based 3D audio systems is narrow, which limits their application in scenarios with high requirements on sound, such as concert recording. To address this problem, the invention provides a method for implementing a 3D audio system that resists high-frequency spatial aliasing. The method exploits the inherent aliasing mode with which a spherical microphone array produces spatial aliasing and combines it with a sparse recovery method, so that HOA coding at high frequencies is not affected by spatial aliasing.

The technical scheme of the invention is as follows:

a method for implementing a 3D audio system with high frequency spatial aliasing rejection, comprising the steps of:

1) for a given spherical microphone array, sampling spherical sound pressure, and performing discrete spherical Fourier transform on the sampled spherical sound pressure; the expansion order of the discrete sphere Fourier transform is not more than the truncation order N;

2) obtaining a spatial aliasing matrix E from the relationship between the expansion coefficients of the discrete spherical Fourier transform in step 1) and the true coefficients p_nm of the spherical sound pressure expansion;

3) through the formula min(||s||₁),

solving to obtain a signal s; where Y_N is a spherical Fourier transform matrix of order N, B'_N is the N-order HOA signal (with aliasing errors) obtained by HOA coding from the signals of the spherical microphone array, and ε is a set value;

4) according to the signal s obtained in step 3), encoding s to a higher order N by the formula B_N = Y_N s to obtain a higher-order HOA signal B_N without aliasing errors;

5) Multiplying the HOA signal obtained in the step 4) by an inverse matrix of the spherical Fourier transform to reconstruct a sound field and obtain 3D audio.

Further, the frequency f of the signal collected by the spherical microphone array satisfies

where c is the speed of sound and r is the radius of the spherical microphone array.

Further, the truncation order N < (M+1)², where M is the number of spherical microphones in the spherical microphone array.

Further, the spatial aliasing matrix E is a

matrix with elements

where the expansion order is the spherical Fourier expansion order of the spherical sound pressure and Q is the number of spherical microphones.

Further, each loudspeaker signal obtained when reconstructing the sound field is convolved with the head-related impulse response of the corresponding loudspeaker, and the results are superimposed to obtain a binaural signal, realizing a headphone-based 3D audio system.

A 3D audio system for resisting high-frequency spatial aliasing, characterized by comprising a high-order HOA signal generation module and a sound field reconstruction module, wherein:

the high-order HOA signal generation module is used for sampling the spherical sound pressure of the spherical microphone array and performing a discrete spherical Fourier transform on the sampled spherical sound pressure, the expansion order of the discrete spherical Fourier transform being no greater than the truncation order N; then obtaining a spatial aliasing matrix E from the relationship between the expansion coefficients of the discrete spherical Fourier transform and the true coefficients p_nm of the spherical sound pressure expansion; then, through the formula min(||s||₁),

solving to obtain a signal s, where Y_N is a spherical Fourier transform matrix of order N, B'_N is the N-order HOA signal obtained by HOA coding from the signals of the spherical microphone array, and ε is a set value; then encoding s to order N by the formula B_N = Y_N s to obtain the N-order HOA signal B_N;

the sound field reconstruction module is used for multiplying the obtained HOA signal by an inverse matrix of the spherical Fourier transform to reconstruct the sound field and obtain 3D audio.

The invention has the beneficial effects that:

The upper cut-off frequency of a spherical microphone array (32 microphones, 4th-order HOA coding) is raised from 5.4 kHz to 10 kHz, which solves the problem of high-frequency spatial aliasing and the problem of the limited applicability of Ambisonics-based 3D audio systems in different scenarios.

Drawings

FIG. 1 is a global scheme of an Ambisonics-based 3D audio system;

FIG. 2 is a flow chart of anti-spatial-aliasing HOA (Higher Order Ambisonics) coding;

FIG. 3 is a diagram of the spatial aliasing pattern of a spherical microphone array (32 microphones, rigid sphere) with a radius of 5 cm;

FIG. 4 shows the spatial orientation maps versus frequency for the single-source experiment;

(a) the ideal HOA signal is used, (b) the conventional HOA coding scheme,

(c) the coding method of the invention, (d) the optimized coding method of the invention;

FIG. 5 shows the spatial orientation maps versus frequency for the two-source experiment;

(a) the ideal HOA signal is used, (b) the conventional HOA coding scheme,

(c) the coding method of the present invention, and (d) the optimized coding method of the present invention.

Detailed Description

The following describes a method for implementing a 3D audio system for resisting high-frequency spatial aliasing according to the present invention with reference to the accompanying drawings and embodiments.

Fig. 1 is a global scheme of an Ambisonics-based 3D audio system, and specific implementation steps of the system include spatial aliasing matrix solution, anti-spatial aliasing HOA coding, and experimental verification. FIG. 2 is a flow chart of a spatial aliasing matrix solution. The concrete realization of each step is as follows:

1. spatial aliasing matrix solution

For a given spherical microphone array, the mode in which spatial aliasing occurs is determined, so that the information of the spatial aliasing mode can be used to achieve an anti-spatial aliasing effect. The spatial aliasing pattern of an array of spherical microphones that obey an approximately uniform distribution is analyzed as follows:

A spherical coordinate system is adopted, in which θ is the elevation angle (ranging from 0 to π) and φ is the horizontal angle (increasing anticlockwise, ranging from 0 to 2π). For a rigid sphere of radius r, the sound pressure on its surface can be expanded in spherical harmonic functions according to formula (1):

Here W_n(kr) is a radial function, n is the order of the spherical harmonic expansion of the spherical sound pressure, k is the wavenumber, and r is the radius of the spherical microphone array. If the spherical sound pressure is subjected to a spherical Fourier transform, the result is denoted p_nm (the spherical Fourier transform coefficient of order n and degree m obtained from the continuous spherical sound pressure):

In practical applications, the spherical sound pressure must be sampled and the expansion order truncated to N; the spherical sound pressure expansion can then be written in matrix form as follows:

The result of the discrete spherical Fourier transform of the discrete spherical sound pressure is denoted

(the discrete spherical Fourier transform coefficient of order n and degree m obtained from the discrete spherical sound pressure):

where Q is the number of spherical microphones. When the order of the array is N (the order of the array is determined by the number of spherical microphones and the sampling scheme, N < (M+1)², where M is the number of spherical microphones; in HOA encoding the truncation order generally equals the order of the array), the high-order parts with order greater than N are superimposed onto the low orders in a certain pattern after the discrete spherical Fourier expansion of the spherical sound pressure, so that the low-order components are contaminated; this is spatial aliasing. For spatial aliasing not to occur, the spherical sound pressure function must be of finite order, and that order must be less than N. The expansion order of the spherical sound pressure function rises as the frequency increases. Thus, for a known array structure, when the signal frequency f satisfies

(c is the speed of sound), the spatial aliasing error can be considered negligible; this bound is referred to herein as the upper cut-off frequency. More severe spatial aliasing occurs when the signal frequency exceeds the upper cut-off frequency, but the aliasing pattern of a fixed array is fixed, so the spatial aliasing problem can be mitigated by analyzing and exploiting that pattern.
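The frequency condition depends on the speed of sound c and the array radius r through the dimensionless product kr. The exact inequality is not legible in this text, so the sketch below does not assert it; it only evaluates kr for the 5 cm array used in the experiments, assuming c = 343 m/s (a value not given in the document).

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at about 20 degrees Celsius

def kr(frequency_hz, radius_m, c=SPEED_OF_SOUND):
    """Product of the wavenumber k = 2*pi*f/c and the array radius r."""
    return 2.0 * math.pi * frequency_hz * radius_m / c

for f in (2000.0, 5400.0, 10000.0):
    print(f"{f:7.0f} Hz -> kr = {kr(f, 0.05):.2f}")
```

Under this assumption kr is roughly 4.9 at the quoted 5.4 kHz cut-off and roughly 9.2 at 10 kHz, which gives a feel for how far above the conventional limit the 10 kHz target lies.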

Let the coefficients of the spherical Fourier expansion of the spherical sound pressure be p_nm, with expansion order

The relationship between the coefficients obtained by calculation

and the true coefficients p_nm of the spherical sound pressure expansion (the true coefficients p_nm are derived from the formula for the continuous sphere) is analyzed to study how aliasing occurs.

where

Here α_q is a parameter related to the distribution of the spherical microphones; a common sampling scheme is approximately uniform sampling, in which case the parameter can be regarded as 1. Y_n,m(θ_q,φ_q) and Y_n',m'(θ_q,φ_q) are entries of the spherical Fourier transform matrices, representing the values of the spherical harmonic functions at each sampling point. E is called the spatial aliasing matrix and reflects the aliasing mode of the array. The elements of E are visualized in FIG. 3.

As can be seen from equation (5), for spatial aliasing not to occur, the element of E must equal 1 when (n', m') = (n, m) and 0 in all other cases.

For the array whose aliasing matrix is shown in FIG. 3, the leading (N+1)²×(N+1)² part of E is an identity matrix. If the coding order is fixed at 4, the requirement is met when the spherical sound pressure expansion order is less than 5, and no spatial aliasing occurs; if the order of the spherical sound pressure is more than 5, the computed coefficients differ from the ideal coefficients by an aliasing error e_nm caused by the higher-order components, as shown in equation (6).

However, spatial aliasing does not pollute all coefficients in all cases, and as can be seen from fig. 3, when the expansion order of the spherical sound pressure is 6, only the fourth order of the calculated coefficients deviates from the ideal value, and the coefficients of other lower orders are correct. That is, when the signal frequency exceeds the upper cut-off frequency, spatial aliasing errors contaminate the higher order components first, affecting the lower order components gradually as the frequency increases.
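The structure of the aliasing matrix can be explored numerically. The sketch below is illustrative only: it assumes real-valued orthonormal spherical harmonics, a stand-in sampling grid of 5 Gauss-Legendre colatitudes times 10 uniform azimuths (50 points, not the 32-microphone layout of FIG. 3), quadrature weights in the role of α_q, and elements formed as the weighted sum over sampling points of Y_nm times Y_n'm'. All function names are hypothetical.

```python
import math

def assoc_legendre(n, m, x):
    """Associated Legendre function P_n^m(x) for 0 <= m <= n, by upward recursion."""
    pmm = 1.0
    if m > 0:
        root = math.sqrt(max(0.0, 1.0 - x * x))
        odd = 1.0
        for _ in range(m):
            pmm *= -odd * root
            odd += 2.0
    if n == m:
        return pmm
    pmmp1 = x * (2 * m + 1) * pmm
    for l in range(m + 2, n + 1):
        pmm, pmmp1 = pmmp1, ((2 * l - 1) * x * pmmp1 - (l + m - 1) * pmm) / (l - m)
    return pmmp1

def real_sh(n, m, theta, phi):
    """Real orthonormal spherical harmonic; theta is colatitude, phi is azimuth."""
    am = abs(m)
    norm = math.sqrt((2 * n + 1) / (4 * math.pi)
                     * math.factorial(n - am) / math.factorial(n + am))
    leg = assoc_legendre(n, am, math.cos(theta))
    if m == 0:
        return norm * leg
    trig = math.cos(am * phi) if m > 0 else math.sin(am * phi)
    return math.sqrt(2.0) * norm * leg * trig

def sh_vector(order, theta, phi):
    """All harmonics of orders 0..order at one direction, ordered by (n, m)."""
    return [real_sh(n, m, theta, phi) for n in range(order + 1) for m in range(-n, n + 1)]

def gauss_legendre_grid():
    """Assumed stand-in array: (colatitude, azimuth, weight) for 5 x 10 points."""
    a, s70 = math.sqrt(10.0 / 7.0), math.sqrt(70.0)
    inner, outer = math.sqrt(5 - 2 * a) / 3, math.sqrt(5 + 2 * a) / 3
    w_in, w_out = (322 + 13 * s70) / 900, (322 - 13 * s70) / 900
    nodes = [(0.0, 128 / 225), (inner, w_in), (-inner, w_in), (outer, w_out), (-outer, w_out)]
    return [(math.acos(x), 2 * math.pi * j / 10, w * 2 * math.pi / 10)
            for x, w in nodes for j in range(10)]

def aliasing_matrix(array_order, field_order, points):
    """E[i][j] = sum_q alpha_q * Y_i(q) * Y_j(q); rows up to array_order, columns up to field_order."""
    rows, cols = (array_order + 1) ** 2, (field_order + 1) ** 2
    E = [[0.0] * cols for _ in range(rows)]
    for theta, phi, alpha in points:
        y = sh_vector(field_order, theta, phi)  # first `rows` entries are orders 0..array_order
        for i in range(rows):
            for j in range(cols):
                E[i][j] += alpha * y[i] * y[j]
    return E

E = aliasing_matrix(4, 6, gauss_legendre_grid())

def worst_leak(rows, cols):
    return max(abs(E[i][j]) for i in rows for j in cols)

print("order 5 into orders 0..4:", worst_leak(range(25), range(25, 36)))
print("order 6 into orders 0..3:", worst_leak(range(16), range(36, 49)))
print("order 6 into order 4:    ", worst_leak(range(16, 25), range(36, 49)))
```

On this stand-in grid the leading 25×25 block is the identity, the first two printed values are zero to rounding error, and the third is not. This is the same qualitative pattern the text describes: when the sound field reaches order 6, only the fourth-order coefficients are contaminated.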

2. Anti-spatial aliasing HOA coding method

When the signal frequency exceeds the upper cut-off frequency, not all orders are contaminated at once; rather, as the frequency increases, the contamination spreads from the higher orders to the lower orders. A direct idea is therefore to discard the contaminated higher-order components and encode using only a lower order, but this yields low spatial resolution. To give high frequencies a larger listening area in the reconstruction, the encoded low-order signal that is unaffected by spatial aliasing needs further processing. Up-scaling a low-order HOA signal to a high order can partially solve the spatial aliasing problem: because some low-order components are unaffected by spatial aliasing within a certain frequency range, the high-order components can be recovered from these correct components, which removes the effect of spatial aliasing within that range. Given the HOA signal of lower order N′, the up-scaling algorithm is as follows:

B′_N′＝Y_N′s (7)

Here B'_N is the N-order HOA signal with aliasing errors, and B'_N′ denotes B'_N truncated to order N′ (N > N′). Y_N′ is the spherical Fourier transform matrix of order N′. s = [s₁, s₂, …, s_L]^T is the virtual loudspeaker signal, for L virtual loudspeakers placed in space, and B₀₀ to B_NN is the modified HOA signal. The virtual loudspeakers are uniformly distributed on a sphere (an approximately uniform distribution can be used). To raise the HOA signal from order N′ to order N, the conditions L > (N′+1)² and L > (N+1)² must be satisfied.

Solving for s from formula (7) means solving an underdetermined equation with infinitely many solutions. To obtain a better solution, a sound source sparsity assumption is introduced: if the sound sources are sparse at each time-frequency point, the solution of the equation can be constrained by the following formula:

min(||s||₁)

||·||_p denotes the p-norm, and ε is a small-valued parameter that accounts for the fact that the plane-wave dictionary cannot contain every possible sound source direction. This method has some disadvantages. It fails once the signal frequency is so high that even the first-order components contain large aliasing errors. Moreover, when the available orders are fixed, its performance deteriorates rapidly as the number of sound sources increases: the low-order components describe only a coarse part of the sound field, so in some situations the low-order components of multiple sound sources may match those of a single sound source, and the sparsity constraint then selects the sparser solution and discards the true multi-source solution. The aliasing matrix describes the aliasing relationship among the components; using this information allows more components to be used to improve the result. Equation (8) becomes:

min(||s||₁)

where B'_N is the N-order HOA signal, containing spatial aliasing errors, obtained by HOA encoding from the signals of the spherical microphone array. The HOA coefficients B_N free of spatial aliasing errors are recovered from the HOA signal with spatial aliasing errors.

Unlike equation (8), equation (9) uses all signals up to order N, whereas the method of equation (8) uses only signals up to order N′ (N > N′).
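The constraint lines of equations (8) and (9) are not legible in this text. One reading consistent with the surrounding definitions is sketched below. It assumes an ℓ2 residual bounded by ε and writes Ñ for the expansion order of the spherical sound pressure (a symbol introduced here for illustration); it is an interpretation, not the patent's verbatim formulas.

```latex
\begin{aligned}
\text{(8)}\quad & \min_{s}\ \|s\|_1 \quad \text{s.t.}\quad \bigl\| B'_{N'} - Y_{N'}\, s \bigr\|_2 \le \varepsilon \\
\text{(9)}\quad & \min_{s}\ \|s\|_1 \quad \text{s.t.}\quad \bigl\| B'_{N} - E\, Y_{\tilde N}\, s \bigr\|_2 \le \varepsilon
\end{aligned}
```

Under this reading, the second problem models what the array actually measures: the virtual loudspeaker signals s are encoded to order Ñ, and E folds those (Ñ+1)² coefficients onto the (N+1)² aliased coefficients B'_N, so every measured order contributes to the fit instead of only the uncontaminated low orders.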

Whether s is obtained from (8) or (9), the resulting more accurate s can be encoded to order N by equation (10), giving an N-order HOA signal:

B_N＝Y_Ns (10)

where Y_N is a spherical Fourier transform matrix of order N.
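The up-scaling chain of equations (7) and (10) can be illustrated end to end on an idealized case. In the sketch below, greedy matching pursuit stands in for the ℓ1 minimization described above (a deliberate simplification, not the patent's solver). The harmonics are real-valued with no radial term, the virtual loudspeaker layout is a hypothetical 5 × 36 grid, and the single source lies exactly on that grid. All names are illustrative.

```python
import math

def assoc_legendre(n, m, x):
    """Associated Legendre function P_n^m(x) for 0 <= m <= n, by upward recursion."""
    pmm = 1.0
    if m > 0:
        root = math.sqrt(max(0.0, 1.0 - x * x))
        odd = 1.0
        for _ in range(m):
            pmm *= -odd * root
            odd += 2.0
    if n == m:
        return pmm
    pmmp1 = x * (2 * m + 1) * pmm
    for l in range(m + 2, n + 1):
        pmm, pmmp1 = pmmp1, ((2 * l - 1) * x * pmmp1 - (l + m - 1) * pmm) / (l - m)
    return pmmp1

def real_sh(n, m, theta, phi):
    """Real orthonormal spherical harmonic; theta is colatitude, phi is azimuth."""
    am = abs(m)
    norm = math.sqrt((2 * n + 1) / (4 * math.pi)
                     * math.factorial(n - am) / math.factorial(n + am))
    leg = assoc_legendre(n, am, math.cos(theta))
    if m == 0:
        return norm * leg
    trig = math.cos(am * phi) if m > 0 else math.sin(am * phi)
    return math.sqrt(2.0) * norm * leg * trig

def sh_vector(order, theta, phi):
    return [real_sh(n, m, theta, phi) for n in range(order + 1) for m in range(-n, n + 1)]

def dictionary(order, directions):
    """Columns of Y_order: one harmonic vector per virtual loudspeaker direction."""
    return [sh_vector(order, th, ph) for th, ph in directions]

def matching_pursuit(atoms, target, max_atoms=5, tol=1e-9):
    """Greedy sparse fit of target ~= sum_l s[l] * atoms[l] (stand-in for the l1 problem)."""
    residual, s = list(target), [0.0] * len(atoms)
    norms = [math.sqrt(sum(a * a for a in atom)) for atom in atoms]
    for _ in range(max_atoms):
        if math.sqrt(sum(r * r for r in residual)) < tol:
            break
        corr = [sum(a * r for a, r in zip(atom, residual)) / nrm
                for atom, nrm in zip(atoms, norms)]
        best = max(range(len(atoms)), key=lambda l: abs(corr[l]))
        step = corr[best] / norms[best]
        s[best] += step
        residual = [r - step * a for r, a in zip(residual, atoms[best])]
    return s

# Hypothetical layout: 5 colatitudes x 36 azimuths = 180 directions, more than (N+1)^2 = 25.
directions = [(math.radians(t), math.radians(p))
              for t in (30, 60, 90, 120, 150) for p in range(0, 360, 10)]
low, high = 2, 4
source = 2 * 36 + 5                                  # colatitude 90 deg, azimuth 50 deg
B_low = sh_vector(low, *directions[source])          # uncontaminated low-order signal
s = matching_pursuit(dictionary(low, directions), B_low)   # sparse solution of (7)
Y_high = dictionary(high, directions)
B_high = [sum(s[l] * Y_high[l][i] for l in range(len(s)))  # re-encoding as in (10)
          for i in range((high + 1) ** 2)]
print("active virtual loudspeakers:", [l for l, v in enumerate(s) if abs(v) > 1e-9])
```

In this idealized setting a single virtual loudspeaker is selected, and the re-encoded fourth-order signal coincides with the ideal fourth-order encoding of the source. Off-grid sources, several simultaneous sources, and noise are the cases where the choice of solver and of ε starts to matter.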

3. Spatial decoding

After the HOA signal is obtained, it is multiplied by an inverse matrix of the spherical Fourier transform, i.e. the sound field is reconstructed by the matrix inversion method. The basic principle is as follows: when the spherical harmonic expansion of the superimposed sound field produced by the loudspeaker array equals the spherical harmonic expansion of the original sound field, the sound field reconstructed by the loudspeaker array is equivalent to the original sound field.

[s₁, s₂, …, s_L] is the loudspeaker signal, and L is the number of loudspeakers. The loudspeaker signals obtained by matrix inversion are used for loudspeaker playback, or are further converted into a binaural signal for headphone playback.

Each of the obtained speaker signals is convolved with its corresponding Head Related Impulse Response (HRIR), and then superimposed to obtain a binaural signal.

In this way, both loudspeaker-based and headphone-based 3D audio systems can be implemented.
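The binaural step amounts to one convolution per loudspeaker and ear, followed by a sum over loudspeakers. A minimal sketch, assuming all loudspeaker signals share one length and all impulse responses another; the toy numbers are purely illustrative and are not measured HRIRs.

```python
def convolve(signal, impulse_response):
    """Direct-form linear convolution of two finite sequences."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, x in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += x * h
    return out

def binauralize(speaker_signals, hrirs):
    """hrirs[l] = (left_ir, right_ir) for loudspeaker l; returns (left, right) ear signals."""
    n = len(speaker_signals[0]) + len(hrirs[0][0]) - 1
    left, right = [0.0] * n, [0.0] * n
    for sig, (h_left, h_right) in zip(speaker_signals, hrirs):
        for k, v in enumerate(convolve(sig, h_left)):
            left[k] += v
        for k, v in enumerate(convolve(sig, h_right)):
            right[k] += v
    return left, right

# Two loudspeakers with made-up two-tap impulse responses.
speaker_signals = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
hrirs = [([1.0, 0.5], [0.25, 0.0]), ([0.0, 0.5], [1.0, 0.5])]
left, right = binauralize(speaker_signals, hrirs)
print(left, right)
```

In practice the impulse responses would come from a measured HRIR set matched to the loudspeaker directions, and FFT-based convolution would replace the direct-form loop for long signals.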

THE ADVANTAGES OF THE PRESENT INVENTION

The advantages of the present invention will be described below with reference to practical results.

The invention uses a spherical microphone array consisting of a rigid sphere of radius 5 cm with 32 microphones as the acquisition device for spatial audio, and computes the spatial direction of the sound source from the HOA signals obtained by spatial coding. To assess the effectiveness of the method across frequencies, experiments were performed from 2 kHz to 10 kHz.

The spatial orientation map of the sound source can be calculated by:

b_N(Ω) = y^T B_N, (9)

where y = [Y₀₀(Ω), ..., Y_NN(Ω)]^T is the vector formed by the spherical harmonic functions of each order in the direction Ω, and B_N is the computed HOA signal. It can be seen from the formula that sweeping Ω from 0 to 2π draws the spatial orientation map of the horizontal plane.

FIG. 4 shows the experimental results for a single sound source of unit amplitude incident from the 50-degree direction in the horizontal plane; FIG. 5 shows the experimental results for two sound sources of unit amplitude incident from the 50-degree and 310-degree directions in the horizontal plane, respectively. The experimental results show that above 5.4 kHz the conventional HOA coding method is severely affected by spatial aliasing and the estimated directions become disordered at high frequencies, whereas the method provided by the invention effectively solves this problem and is almost unaffected by spatial aliasing from 5.4 kHz to 10 kHz.
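The orientation map b_N(Ω) = y^T B_N can be reproduced for the ideal case of panel (a). The sketch below assumes real orthonormal spherical harmonics and takes the ideal HOA signal of a unit plane wave to be the harmonic vector at the source direction, with no radial term (both are assumptions of this illustration). It sweeps the horizontal plane in 1-degree steps for a source at 50 degrees.

```python
import math

def assoc_legendre(n, m, x):
    """Associated Legendre function P_n^m(x) for 0 <= m <= n, by upward recursion."""
    pmm = 1.0
    if m > 0:
        root = math.sqrt(max(0.0, 1.0 - x * x))
        odd = 1.0
        for _ in range(m):
            pmm *= -odd * root
            odd += 2.0
    if n == m:
        return pmm
    pmmp1 = x * (2 * m + 1) * pmm
    for l in range(m + 2, n + 1):
        pmm, pmmp1 = pmmp1, ((2 * l - 1) * x * pmmp1 - (l + m - 1) * pmm) / (l - m)
    return pmmp1

def real_sh(n, m, theta, phi):
    """Real orthonormal spherical harmonic; theta is colatitude, phi is azimuth."""
    am = abs(m)
    norm = math.sqrt((2 * n + 1) / (4 * math.pi)
                     * math.factorial(n - am) / math.factorial(n + am))
    leg = assoc_legendre(n, am, math.cos(theta))
    if m == 0:
        return norm * leg
    trig = math.cos(am * phi) if m > 0 else math.sin(am * phi)
    return math.sqrt(2.0) * norm * leg * trig

def sh_vector(order, theta, phi):
    return [real_sh(n, m, theta, phi) for n in range(order + 1) for m in range(-n, n + 1)]

def orientation_map(B, order, azimuths_deg):
    """b_N(Omega) = y(Omega)^T B_N on the horizontal plane (colatitude 90 degrees)."""
    values = []
    for az in azimuths_deg:
        y = sh_vector(order, math.pi / 2, math.radians(az))
        values.append(sum(yi * bi for yi, bi in zip(y, B)))
    return values

order = 4
B_ideal = sh_vector(order, math.pi / 2, math.radians(50))  # assumed ideal encoding
azimuths = list(range(360))
b = orientation_map(B_ideal, order, azimuths)
peak = max(range(len(b)), key=lambda i: b[i])
print("peak azimuth (deg):", azimuths[peak])
```

For an aliasing-free signal the map has a single main lobe at the source direction. Feeding in an aliased B'_N instead of B_ideal is how the disorder reported for the conventional method would show up in such a plot.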

Although specific embodiments of the invention and the accompanying drawings have been disclosed for illustrative purposes, to aid understanding of the invention and its implementation, those skilled in the art will appreciate that various substitutions, changes and modifications are possible without departing from the spirit and scope of the present invention and the appended claims. Therefore, the present invention should not be limited to the disclosure of the preferred embodiments and the accompanying drawings.

Claims

1. A method for implementing a 3D audio system with high frequency spatial aliasing rejection, comprising the steps of:

3) through the formula min(||s||₁),

solving to obtain a signal s; where Y_N is a spherical Fourier transform matrix of order N, B'_N is the N-order HOA signal obtained by HOA coding from the signals of the spherical microphone array, and ε is a set value;

4) according to the signal s obtained in step 3), encoding s to a higher order N by the formula B_N = Y_N s to obtain a higher-order HOA signal B_N;

2. The method of claim 1, wherein the frequency f of the signal collected by the spherical microphone array satisfies

3. The method according to claim 1 or 2, characterized in that the truncation order N < (M+1)², where M is the number of spherical microphones in the spherical microphone array.

4. The method of claim 1, wherein the spatial aliasing matrix E is a

matrix with elements

where the expansion order is the spherical Fourier expansion order of the spherical sound pressure, Q is the number of spherical microphones, α_q is a parameter related to the distribution of the spherical microphones, Y_n,m(θ_q,φ_q) is the entry of the spherical Fourier transform matrix of order n and degree m, representing the value of the spherical harmonic function at the point (θ_q,φ_q), Y_n',m'(θ_q,φ_q) is the entry of the spherical Fourier transform matrix of order n' and degree m', representing the value of the spherical harmonic function at the point (θ_q,φ_q), θ_q is the elevation angle of point q in the spherical coordinate system, and φ_q is the horizontal angle of point q in the spherical coordinate system.

5. The method of claim 1, wherein each speaker signal obtained when reconstructing the sound field is convolved with a head-related impulse response of a corresponding speaker and then superimposed to obtain a binaural signal, implementing a headphone-based 3D audio system.

6. A 3D audio system for resisting high-frequency spatial aliasing, characterized by comprising a high-order HOA signal generation module and a sound field reconstruction module, wherein:

solving to obtain a signal s, where Y_N is a spherical Fourier transform matrix of order N, B'_N is the N-order HOA signal obtained by HOA coding from the signals of the spherical microphone array, and ε is a set value; then encoding s to a higher order N by the formula B_N = Y_N s to obtain a higher-order HOA signal B_N;

7. The 3D audio system according to claim 6, characterized in that the truncation order N < (M+1)², where M is the number of spherical microphones in the spherical microphone array.

8. The 3D audio system of claim 6, wherein the frequency f of the signal picked up by the spherical microphone array satisfies