CN117061985A

CN117061985A - Spatial audio processing method and device

Info

Publication number: CN117061985A
Application number: CN202311172643.XA
Authority: CN
Inventors: 张腾蔚; 陈运达; 何海峰
Original assignee: Guangzhou Rantion Technology Co Ltd
Current assignee: Guangzhou Rantion Technology Co Ltd
Priority date: 2023-09-11
Filing date: 2023-09-11
Publication date: 2023-11-14

Abstract

The invention discloses a spatial audio processing method, which relates to the technical field of audio processing and comprises the following steps: the method comprises the steps of obtaining dynamic positioning information of a multi-azimuth virtual sound source, wherein the positioning information comprises azimuth information, distance information and motion speed information of the virtual sound source relative to a user, processing audio signals input by the virtual sound source by utilizing a dynamically updated virtual sound source rendering algorithm to obtain virtual sound source rendering signals, processing the audio signals input by the virtual sound source in each azimuth by utilizing a preset environment rendering algorithm to obtain environment rendering signals, determining the environment rendering algorithm based on a first-order sound field coding algorithm and a first-order sound field decoding algorithm, and determining audio signals with spatial sound effects according to the environment rendering signals and superposition signals of the virtual sound source rendering signals after cross transition processing. The method can render a natural dynamic virtual sound source, the space azimuth of the dynamic sound source is transited naturally, the reverberation is proper, and the immersion feeling is good.

Description

Spatial audio processing method and device

Technical Field

The invention relates to the technical field of audio processing, in particular to a spatial audio processing method and device.

Background

Spatial audio processing is an important audio signal processing technology in virtual, enhanced and mixed reality applications, the spatial audio technology can replay positioning information of audio, a vivid acoustic environment is synthesized, immersive audio experience is created, spatial hearing research shows that spatial perception of sound by people is mainly influenced by binaural factors, virtual replay technology based on binaural sound signals is mainly used for signal processing by using Head-related transfer functions (Head-Related Transfer Function, HRTF) or binaural room impulse responses (Binaural Room Impulse Response, BRIR), binaural sound signals are obtained, users can perceive corresponding virtual sound sources through headphones, consumption cost of consumers or personal users is limited, virtual hearing replay based on the HRTF has natural advantages, virtual hearing replay technology based on the HRTF in the prior art is mainly applied to static free field sound source scenes, and processing of dynamic sound sources in the reverberant environment is insufficient.

Disclosure of Invention

The present invention aims to solve at least one of the technical problems existing in the prior art. To this end, a first aspect of the present invention proposes a spatial audio processing method, comprising:

acquiring dynamic positioning information of a multidirectional virtual sound source, wherein the positioning information comprises azimuth information, distance information and movement speed information of the virtual sound source relative to a user;

processing an audio signal input by a virtual sound source by using a dynamically updated virtual sound source rendering algorithm to obtain a virtual sound source rendering signal, wherein the virtual sound source rendering algorithm is determined based on a preset filter construction algorithm and a preset dynamic effect processing algorithm, parameters of the virtual sound source filtering algorithm are updated according to dynamic azimuth information of a multi-azimuth virtual sound source, and parameters of the dynamic effect processing algorithm are updated according to dynamic distance information and motion speed information of the multi-azimuth virtual sound source;

processing audio signals input by virtual sound sources in each direction by using a preset environment rendering algorithm to obtain environment rendering signals, wherein the environment rendering algorithm is determined based on a first-order sound field coding algorithm and a first-order sound field decoding algorithm;

and determining the audio signal with the spatial sound effect according to the superposition signal of the environment rendering signal and the virtual sound source rendering signal after the cross transition processing.

Optionally, the step of processing the audio signal input by the virtual sound source by using a dynamically updated virtual sound source rendering algorithm to obtain a virtual sound source rendering signal includes:

dynamically processing audio signals input by a multidirectional virtual sound source by using a preset filter construction algorithm, and outputting multipath filtering signals;

and processing the output multipath filtering signals by using a preset dynamic effect processing algorithm to obtain virtual sound source rendering signals.

Optionally, the step of acquiring dynamic positioning information of the multi-azimuth virtual sound source includes:

acquiring the source of an audio signal input by a virtual sound source, wherein the source of the audio signal comprises program preset, user dynamic input or sensor field acquisition;

and determining the positioning information of the virtual sound source at any moment relative to the user according to the source of the audio signal.

Optionally, the step of dynamically processing audio signals input by the virtual sound source in multiple directions by using a preset filter construction algorithm and outputting multipath filtered signals includes:

respectively obtaining a base vector, a left ear weight coefficient, a right ear weight coefficient and a head related delay parameter according to the preprocessed head related transfer function database;

according to the azimuth information of the virtual sound source relative to the user at any moment, determining the space angle information of the virtual sound source relative to the user at any moment;

determining multidirectional control coefficients at any moment according to the space angle information of the virtual sound source at any moment relative to a user, wherein the control coefficients comprise left and right ear weight coefficients and head-related delay parameters;

updating parameters of a preset filter construction algorithm based on the base vector and multidirectional control coefficients at any moment;

and processing the audio signals input by the multidirectional virtual sound source by using a preset filter construction algorithm after parameter updating to obtain multipath filtering signals.

Optionally, before the step of updating parameters of the preset filter construction algorithm based on the basis vector and the multidirectional control coefficient at any time, the method further comprises the step of processing the control coefficient by using a spatial smoothing algorithm, wherein the spatial smoothing algorithm comprises a bilinear difference algorithm, and the step of processing the control coefficient by using a bilinear difference algorithm comprises the following steps:

acquiring azimuth information of each virtual sound source at any moment relative to a user, wherein the azimuth information comprises first three-dimensional coordinate information of the virtual sound source taking the user as a center point;

determining a linear expression of each first three-dimensional coordinate information according to four second three-dimensional coordinate information of adjacent directions of each first three-dimensional coordinate information;

and calculating the interpolation coefficient of the linear expression of each first three-dimensional coordinate information by using a preset interpolation coefficient calculation formula, thereby obtaining the control coefficient of each virtual sound source after smoothing processing relative to the user.

Optionally, the step of processing the output multipath filtered signal by using a preset dynamic effect processing algorithm to obtain a virtual sound source rendering signal includes:

according to the distance information and the motion speed information of the virtual sound source relative to the user at any moment, respectively obtaining a corresponding distance gain coefficient and a Doppler effect coefficient;

updating parameters of a preset dynamic effect processing algorithm by using the distance gain coefficient and the Doppler effect coefficient at any moment;

and processing the output multipath filtering signals by using a dynamic effect processing algorithm after parameter updating to obtain virtual sound source rendering signals.

Optionally, the step of processing the audio signal input by the virtual sound source in each azimuth by using a preset environment rendering algorithm to obtain an environment rendering signal includes:

processing an audio signal input by a virtual sound source in each azimuth by using a first-order sound field coding algorithm to obtain a coded audio signal, wherein the expression of the first-order sound field coding algorithm comprises:

wherein s is _i Is the ith input audio signal, θ, of the N virtual sound sources _i ，φ _i The angle of the horizontal direction and the angle of the vertical direction of the ith virtual sound source are respectively represented, and W, X, Y and Z respectively represent first output data, second output data, third output data and fourth output data of a first-order sound field coding algorithm;

decoding the encoded audio signal using a matrix constructed of binaural room impulse response signals and spherical harmonic matrices to obtain an environment rendering signal, the matrix constructed of binaural room impulse response signals and spherical harmonic matrices comprising:

D＝Y ^H (Y ^* Y ^H ) ^-1 B

y represents omega in different directions _S B represents binaural room impulse response signals of audio signals input by virtual sound sources of respective orientations.

Another aspect of the present invention also provides a spatial audio processing apparatus, including:

the information acquisition module is used for acquiring dynamic positioning information of the multi-azimuth virtual sound source, wherein the positioning information comprises azimuth information, distance information and movement speed information of the virtual sound source relative to a user;

the first processing module is used for processing an audio signal input by the virtual sound source by utilizing a dynamically updated virtual sound source rendering algorithm to obtain a virtual sound source rendering signal, wherein the virtual sound source rendering algorithm is determined based on a preset filter construction algorithm and a preset dynamic effect processing algorithm, parameters of the virtual sound source filtering algorithm are updated according to dynamic azimuth information of the multi-azimuth virtual sound source, and parameters of the dynamic effect processing algorithm are updated according to dynamic distance information and motion speed information of the multi-azimuth virtual sound source;

the second processing module is used for processing the audio signals input by the virtual sound sources in each direction by utilizing a preset environment rendering algorithm to obtain environment rendering signals, and the environment rendering algorithm is determined based on a first-order sound field coding algorithm and a first-order sound field decoding algorithm;

and the output module is used for determining an audio signal with spatial sound effect according to the environment rendering signal and the superposition signal of the virtual sound source rendering signal after the cross over processing.

In another aspect, the present invention also provides an electronic device, including a processor and a memory, where at least one instruction, at least one program, a set of codes, or a set of instructions is stored, where at least one instruction, at least one program, a set of codes, or a set of instructions is loaded and executed by the processor to implement the spatial audio processing method according to any one of the first aspects 1-7.

In another aspect, the present invention also provides a computer readable storage medium having stored therein at least one instruction, at least one program, code set, or instruction set, the at least one instruction, at least one program, code set, or instruction set being loaded and executed by a processor to implement a spatial audio processing method as in any of the first aspects 1-7.

Compared with the prior art, the embodiment of the invention provides a spatial audio processing method and device, which have the following beneficial effects:

the method can be used for rendering a natural dynamic virtual sound source, the spatial azimuth transition of the dynamic sound source is natural, reverberation is proper, and the method has good immersion sense.

Drawings

In order to more clearly illustrate the technical solution of the present invention, the following description will make a brief introduction to the drawings used in the description of the embodiments or the prior art. It should be apparent that the drawings in the following description are only some embodiments of the present invention, and that other drawings can be obtained from these drawings without inventive effort to those of ordinary skill in the art.

Fig. 1 is a flowchart of a spatial audio processing method according to an embodiment of the present invention;

fig. 2 is a step chart of acquiring dynamic positioning information of a virtual sound source in multiple directions in a spatial audio processing method according to an embodiment of the present invention;

fig. 3 is a step chart of obtaining a virtual sound source rendering signal in the spatial audio processing method according to the embodiment of the present invention;

fig. 4 is a step diagram of outputting a multipath filtered signal in a spatial audio processing method according to an embodiment of the present invention;

fig. 5 is a step diagram of obtaining a virtual sound source rendering signal in a spatial audio processing method according to an embodiment of the present invention;

fig. 6 is a step diagram of obtaining an environment rendering signal in a spatial audio processing method according to an embodiment of the present invention;

fig. 7 is a schematic diagram of encoding and decoding an environment rendering signal of a spatial audio processing method according to an embodiment of the present invention;

fig. 8 is a schematic structural diagram of a spatial audio processing method according to an embodiment of the present invention.

Detailed Description

The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

The present specification provides method operational steps as described in the examples or flowcharts, but may include more or fewer operational steps based on conventional or non-inventive labor. When implemented in a real system or server product, the methods illustrated in the embodiments or figures may be performed sequentially or in parallel (e.g., in a parallel processor or multithreaded environment).

Fig. 1 is a flowchart of a spatial audio processing method according to an embodiment of the present invention, where, as shown in fig. 1, the processing method includes:

step 101, dynamic positioning information of a multi-azimuth virtual sound source is obtained, wherein the positioning information comprises azimuth information, distance information and movement speed information of the virtual sound source relative to a user.

It should be noted that, in the present invention, when a user moves, the virtual sound sources input audio information, wherein the number of the virtual sound sources is at least one and the virtual sound sources are located in different directions, when the user moves, the positioning information changes, so that the direction information transmitted to the user by the sound sources changes at the moment, meanwhile, the distance information and the movement speed information of the user change at the moment, and when the user is stationary, the positioning information does not change.

The Virtual sound source may be directed to a Virtual Reality technology (VR), or a Virtual usage scenario of the emitted audio signal in augmented Reality (Augmented Reality, AR).

Specifically, as shown in fig. 2, the step 101 of obtaining dynamic positioning information of a virtual sound source with multiple directions includes:

step 1011, obtaining the source of the audio signal input by the virtual sound source, wherein the source of the audio signal comprises program preset, user dynamic input or sensor field acquisition;

the program presets dynamic sound effects which are applied to program settings without human participation in a program specified scene, for example, when one end of a train picture is designed in a VR program to be led to the other end, the sound effects are dynamically changed;

the dynamic input of the user is applied to the action sound effect needing human participation, such as the dynamic process that the footstep sound of the opponent in the VR game is from far to near and from near to far;

the sensor field collection is to transmit sound effects at different positions by using sensors at different positions.

Step 1012, determining positioning information of the virtual sound source relative to the user at any moment according to the source of the audio signal.

Specifically, the audio signals of different sources, the determined positioning information of different types, such as the positioning information from a picture to a user, the positioning information from an operation action to the user, the distance information from a sensor to the user, and at least one source selected according to actual use to acquire the positioning information.

Step 102, processing an audio signal input by a virtual sound source by using a dynamically updated virtual sound source rendering algorithm to obtain a virtual sound source rendering signal, wherein the virtual sound source rendering algorithm is determined based on a preset filter construction algorithm and a preset dynamic effect processing algorithm, parameters of the virtual sound source filtering algorithm are updated according to dynamic azimuth information of a multi-azimuth virtual sound source, and parameters of the dynamic effect processing algorithm are updated according to dynamic distance information and motion speed information of the multi-azimuth virtual sound source.

Specifically, as shown in fig. 3, the step 102 of processing the audio signal input by the virtual sound source to obtain the virtual sound source rendering signal by using the dynamically updated virtual sound source rendering algorithm includes:

step 1021, dynamically processing audio signals input by the multi-azimuth virtual sound source by using a preset filter construction algorithm, and outputting multipath filtering signals.

As shown in fig. 4, the step 1021 of dynamically processing audio signals input by a multi-directional virtual sound source and outputting multi-path filtering signals by using a preset filter construction algorithm includes:

step 10211, respectively obtaining a base vector, a left ear weight coefficient, a right ear weight coefficient and a head related delay parameter according to the preprocessed head related transfer function database;

wherein the step of preprocessing comprises:

and processing the head related transfer function database by using a phase transformation algorithm to obtain the head related transfer function database with the minimum phase characteristic.

And processing the head related transfer function database with the minimum phase characteristic by using a principal component analysis method to obtain a head related transfer function database with low data volume. The data volume after processing is about 4% of the original data volume, so that the data volume can be obviously reduced and the processing speed can be improved after the processing of the steps.

Step 10212, determining the space angle information of the virtual sound source relative to the user at any moment according to the azimuth information of the virtual sound source relative to the user at any moment.

The azimuth information of the virtual sound source relative to the user at any moment comprises three-dimensional coordinate information of the virtual sound source at any moment, and the three-dimensional coordinate information of the virtual sound source is converted into space angle information of the virtual sound source by utilizing a coordinate system conversion method of the three-dimensional coordinates and the space angles.

Step 10213, determining multidirectional control coefficients at any moment according to the spatial angle information of the virtual sound source at any moment relative to the user, wherein the control coefficients comprise left and right ear weight coefficients and head-related delay parameters.

The left and right ear weight coefficients and the head-related delay parameters stored in the HRTF database are determined based on the spatial angle information, and because the spatial angle of the virtual sound source at any moment relative to the user is changed, the virtual sound source can be matched with the HRTF database by using the spatial angle information, and the left and right ear weight coefficients and the head-related delay parameters corresponding to the spatial angle information of the virtual sound source stored in the HRTF database are selected, so that the virtual sound source can be used for determining different left and right ear weight coefficients and head-related delay parameters for different links of the virtual sound source.

Step 10214, updating parameters of a preset filter construction algorithm based on the base vector and multidirectional control coefficients at any moment;

the formula of the preset filter construction algorithm is as follows:

in the formula (i),

represents the ith filter, w _qi Left and right ear weight coefficient of ith filter, d _q Representing the basis vector of the filter, τ is the head-related delay parameter and t- τ represents the delay to the basis vector.

Step 10215, processing the audio signal input by the multi-azimuth virtual sound source by using the filter construction algorithm after parameter updating to obtain a multi-path filtering signal.

The filter constructed by the filter construction algorithm corresponds to the virtual sound source, the virtual sound source in each azimuth corresponds to one filter, and filtering processing is carried out on the virtual sound source to obtain a filtering signal.

And 1022, processing the output multipath filtering signals by using a preset dynamic effect processing algorithm to obtain virtual sound source rendering signals.

As shown in fig. 5, the step 1022 of processing the output multipath filtered signals by using a preset dynamic effect processing algorithm to obtain virtual sound source rendering signals includes:

step 10221, respectively obtaining a corresponding distance gain coefficient and a corresponding Doppler effect coefficient according to the distance information and the motion speed information of the virtual sound source relative to the user at any moment;

the dynamic effect processing algorithm calculates a distance gain coefficient and a Doppler effect coefficient according to the relative distance and the speed of the virtual sound source, wherein the two coefficients form a control coefficient, and a distance gain coefficient calculation formula is as follows:

d _ref is the reference distance, A _ref Is the attenuation coefficient of the reference distance, typically, the attenuation coefficient A of the reference distance is set _ref =0.5, i.e. the distance is doubled and the gain decays by 6dB.

Doppler coefficientThe calculation formula of (2) is as follows:

v _ls is the velocity of the user leaving the sound source, v _sl Is the speed at which the sound source leaves the user, who is the listener.

Step 10222, updating parameters of a preset dynamic effect processing algorithm by using the distance gain coefficient and the Doppler effect coefficient at any moment;

the preset dynamic effect processing algorithm is just to select a common dynamic effect device in the field, and the parameters of the common dynamic effect device are determined according to the distance gain coefficient and the Doppler effect coefficient of the virtual sound source at any moment.

Step 10223, processing the output multipath filtering signals by using a dynamic effect processing algorithm after parameter updating to obtain virtual sound source rendering signals.

After processing by this procedure, the input mono sound source is output as a binaural sound signal, and if it is a dynamic virtual sound source, the binaural sound signal is output as a dynamic virtual sound source.

Before the step of updating parameters of a preset filter construction algorithm based on the basis vector and the multidirectional control coefficient at any moment, the method further comprises the step of processing the control coefficient by using a spatial smoothing algorithm, wherein the spatial smoothing algorithm comprises a bilinear difference algorithm, and the step of processing the control coefficient by using the bilinear difference algorithm comprises the following steps:

w′＝a ₁ w ₁ +a ₂ w ₂ +a ₃ w ₃ +a ₄ w ₄

Where w' is the target azimuth (θ ₀ ,φ ₀ ) The coefficient vectors of the smoothed filter, w1, w2, w3, w4 are the target spatial orientation neighboring spatial orientations, respectively, (θ) ₁ ,φ ₁ ),(θ ₂ ,φ ₁ ),(θ ₁ ,φ ₂ ),(θ ₂ ,φ ₂ ) And 4 filter coefficient vectors representing four second three-dimensional coordinates of adjacent orientations of the first three-dimensional coordinates.

And 103, processing the audio signals input by the virtual sound sources in each azimuth by using a preset environment rendering algorithm to obtain environment rendering signals, wherein the environment rendering algorithm is determined based on a first-order sound field coding algorithm and a first-order sound field decoding algorithm.

Specifically, as shown in fig. 6 and 7, the step 103 of processing the audio signal input by the virtual sound source of each azimuth using a preset environment rendering algorithm to obtain an environment rendering signal includes:

step 1031, processing the audio signal input by the virtual sound source in each azimuth by using a first-order sound field coding algorithm, to obtain a coded audio signal, where the expression of the first-order sound field coding algorithm includes:

step 1032, decoding the encoded audio signal using a matrix constructed of binaural room impulse response signals and spherical harmonic matrices to obtain an environment rendering signal, the matrix constructed of binaural room impulse response signals and spherical harmonic matrices comprising:

D＝Y ^H (Y ^* Y ^H ) ^-1 B

y represents omega in different directions _S Spherical harmonic matrix of virtual sound source of (B) each representing a respectiveBinaural room impulse response signals of audio signals input by azimuth virtual sound sources.

And 104, determining the audio signal with the spatial sound effect according to the environment rendering signal and the superposition signal of the virtual sound source rendering signal after the cross over processing.

According to the invention, the environment reverberant sound rendered by the environment rendering algorithm is superimposed on the cross-transition processed virtual sound source rendering signal, namely the binaural sound signal, so that the realism and immersion of the virtual space sound source are enhanced.

In another aspect, the present invention further provides a spatial audio processing apparatus 200, as shown in fig. 8, including:

the information acquisition module 201 is configured to acquire dynamic positioning information of a multi-azimuth virtual sound source, where the positioning information includes azimuth information, distance information and movement speed information of the virtual sound source relative to a user;

the first processing module 202 is configured to process an audio signal input by a virtual sound source by using a dynamically updated virtual sound source rendering algorithm, to obtain a virtual sound source rendering signal, where the virtual sound source rendering algorithm is determined based on a preset filter construction algorithm and a preset dynamic effect processing algorithm, parameters of the virtual sound source filtering algorithm are updated according to dynamic azimuth information of a multi-azimuth virtual sound source, and parameters of the dynamic effect processing algorithm are updated according to dynamic distance information and motion speed information of the multi-azimuth virtual sound source;

the second processing module 203 is configured to process an audio signal input by a virtual sound source in each azimuth by using a preset environment rendering algorithm, to obtain an environment rendering signal, where the environment rendering algorithm is determined based on a first-order sound field coding algorithm and a first-order sound field decoding algorithm;

and the output module 204 is used for determining an audio signal with spatial sound effect according to the superposition signal of the environment rendering signal and the virtual sound source rendering signal after the cross over processing.

In yet another embodiment of the present invention, there is also provided an apparatus including a processor and a memory, where the memory stores at least one instruction, at least one program, a code set, or an instruction set, and the at least one instruction, the at least one program, the code set, or the instruction set is loaded and executed by the processor to implement a spatial audio processing method described in an embodiment of the present invention.

In yet another embodiment of the present invention, there is further provided a computer readable storage medium having stored therein at least one instruction, at least one program, a code set, or a set of instructions, which are loaded and executed by a processor to implement the spatial audio processing method in an embodiment of the present invention.

In the above embodiments, it may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product. The computer program product includes a plurality of computer instructions. When loaded and executed on a computer, produces a flow or function in accordance with embodiments of the present invention, in whole or in part. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable devices. The computer instructions may be stored in or transmitted from one computer-readable storage medium to another, for example, by wired (e.g., coaxial cable, optical fiber, digital Subscriber Line (DSL)), or wireless (e.g., infrared, wireless, microwave, etc.). The computer readable storage medium may be any available medium that can be accessed by a computer or a data storage device such as a server, data center, etc. that contains an integration of a plurality of available media. The usable medium may be a magnetic medium (e.g., floppy Disk, hard Disk, tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid State Disk (SSD)), etc.

It is noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.

In this specification, each embodiment is described in a related manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment mainly describes differences from other embodiments. In particular, for system embodiments, since they are substantially similar to method embodiments, the description is relatively simple, with reference to the description of method embodiments in part.

The foregoing description is only of the preferred embodiments of the present invention and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention are included in the protection scope of the present invention.

Claims

1. A method of spatial audio processing comprising:

2. The spatial audio processing method as set forth in claim 1, wherein the step of processing the audio signal inputted by the virtual sound source using a dynamically updated virtual sound source rendering algorithm to obtain the virtual sound source rendering signal comprises:

3. The spatial audio processing method as set forth in claim 1, wherein the step of acquiring dynamic positioning information of the virtual sound source of a plurality of directions comprises:

and determining the positioning information of the virtual sound source at any moment according to the source of the audio signal.

4. The spatial audio processing method as set forth in claim 2, wherein the step of dynamically processing audio signals inputted from a multi-directional virtual sound source using a preset filter construction algorithm, and outputting multi-path filtered signals, comprises:

5. The spatial audio processing method of claim 4, further comprising, before the step of updating parameters of a preset filter construction algorithm based on the basis vector and the multi-directional control coefficients at any time, processing the control coefficients using a spatial smoothing algorithm including a bilinear difference algorithm, the step of processing the control coefficients using a bilinear difference algorithm comprising:

6. The spatial audio processing method as set forth in claim 2, wherein the step of processing the output multi-path filtered signal using a preset dynamic effect processing algorithm to obtain the virtual sound source rendering signal comprises:

7. A spatial audio processing apparatus, comprising:

8. An electronic device comprising a processor and a memory having stored therein at least one instruction, at least one program, a set of codes, or a set of instructions, the at least one instruction, the at least one program, the set of codes, or the set of instructions being loaded and executed by the processor to implement the spatial audio processing method of any one of claims 1-6.

9. A computer readable storage medium having stored therein at least one instruction, at least one program, code set, or instruction set, the at least one instruction, the at least one program, the code set, or instruction set being loaded and executed by a processor to implement the spatial audio processing method of any of claims 1-6.