CN117061985A - Spatial audio processing method and device - Google Patents
- Publication number
- CN117061985A (application number CN202311172643.XA)
- Authority
- CN
- China
- Prior art keywords
- sound source
- virtual sound
- algorithm
- processing
- information
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- H—ELECTRICITY › H04—ELECTRIC COMMUNICATION TECHNIQUE › H04S—STEREOPHONIC SYSTEMS › H04S7/00—Indicating arrangements; Control arrangements, e.g. balance control › H04S7/30—Control circuits for electronic adaptation of the sound field
- H04S7/305—Electronic adaptation of stereophonic audio signals to reverberation of the listening space
- H04S7/302—Electronic adaptation of stereophonic sound system to listener position or orientation
Abstract
The invention discloses a spatial audio processing method in the technical field of audio processing, comprising the following steps: obtaining dynamic positioning information of multi-azimuth virtual sound sources, the positioning information including azimuth, distance and motion-speed information of each virtual sound source relative to the user; processing the audio signal input by each virtual sound source with a dynamically updated virtual sound source rendering algorithm to obtain virtual sound source rendering signals; processing the audio signals input by the virtual sound sources in each azimuth with a preset environment rendering algorithm, determined from a first-order sound field coding algorithm and a first-order sound field decoding algorithm, to obtain environment rendering signals; and determining the audio signal with spatial sound effect from the superposition of the environment rendering signals and the cross-transition-processed virtual sound source rendering signals. The method renders natural dynamic virtual sound sources: the spatial azimuth of a dynamic sound source transitions naturally, the reverberation is appropriate, and the sense of immersion is good.
Description
Technical Field
The invention relates to the technical field of audio processing, in particular to a spatial audio processing method and device.
Background
Spatial audio processing is an important audio-signal-processing technology in virtual, augmented and mixed reality applications. Spatial audio technology can replay the positioning information of audio, synthesize a vivid acoustic environment and create an immersive audio experience. Spatial-hearing research shows that human spatial perception of sound is mainly influenced by binaural factors. Virtual replay based on binaural signals mainly performs signal processing with head-related transfer functions (Head-Related Transfer Function, HRTF) or binaural room impulse responses (Binaural Room Impulse Response, BRIR) to obtain binaural sound signals, so that the user can perceive the corresponding virtual sound source through headphones. Because the cost that consumers or personal users can bear is limited, virtual auditory replay based on the HRTF has a natural advantage. However, HRTF-based virtual auditory replay in the prior art is mainly applied to static free-field sound source scenes, and its processing of dynamic sound sources in reverberant environments is insufficient.
Disclosure of Invention
The present invention aims to solve at least one of the technical problems existing in the prior art. To this end, a first aspect of the present invention proposes a spatial audio processing method, comprising:
acquiring dynamic positioning information of a multidirectional virtual sound source, wherein the positioning information comprises azimuth information, distance information and movement speed information of the virtual sound source relative to a user;
processing an audio signal input by a virtual sound source by using a dynamically updated virtual sound source rendering algorithm to obtain a virtual sound source rendering signal, wherein the virtual sound source rendering algorithm is determined based on a preset filter construction algorithm and a preset dynamic effect processing algorithm, parameters of the filter construction algorithm are updated according to dynamic azimuth information of the multi-azimuth virtual sound source, and parameters of the dynamic effect processing algorithm are updated according to dynamic distance information and motion speed information of the multi-azimuth virtual sound source;
processing audio signals input by virtual sound sources in each direction by using a preset environment rendering algorithm to obtain environment rendering signals, wherein the environment rendering algorithm is determined based on a first-order sound field coding algorithm and a first-order sound field decoding algorithm;
and determining the audio signal with the spatial sound effect according to the superposition signal of the environment rendering signal and the virtual sound source rendering signal after the cross transition processing.
Optionally, the step of processing the audio signal input by the virtual sound source by using a dynamically updated virtual sound source rendering algorithm to obtain a virtual sound source rendering signal includes:
dynamically processing audio signals input by a multidirectional virtual sound source by using a preset filter construction algorithm, and outputting multipath filtering signals;
and processing the output multipath filtering signals by using a preset dynamic effect processing algorithm to obtain virtual sound source rendering signals.
Optionally, the step of acquiring dynamic positioning information of the multi-azimuth virtual sound source includes:
acquiring the source of an audio signal input by a virtual sound source, wherein the source of the audio signal comprises program preset, user dynamic input or sensor field acquisition;
and determining the positioning information of the virtual sound source at any moment relative to the user according to the source of the audio signal.
Optionally, the step of dynamically processing audio signals input by the virtual sound source in multiple directions by using a preset filter construction algorithm and outputting multipath filtered signals includes:
respectively obtaining a base vector, a left ear weight coefficient, a right ear weight coefficient and a head related delay parameter according to the preprocessed head related transfer function database;
according to the azimuth information of the virtual sound source relative to the user at any moment, determining the space angle information of the virtual sound source relative to the user at any moment;
determining multidirectional control coefficients at any moment according to the space angle information of the virtual sound source at any moment relative to a user, wherein the control coefficients comprise left and right ear weight coefficients and head-related delay parameters;
updating parameters of a preset filter construction algorithm based on the base vector and multidirectional control coefficients at any moment;
and processing the audio signals input by the multidirectional virtual sound source by using a preset filter construction algorithm after parameter updating to obtain multipath filtering signals.
Optionally, before the step of updating parameters of the preset filter construction algorithm based on the basis vector and the multidirectional control coefficients at any time, the method further comprises processing the control coefficients with a spatial smoothing algorithm, wherein the spatial smoothing algorithm comprises a bilinear interpolation algorithm, and the step of processing the control coefficients with the bilinear interpolation algorithm comprises the following steps:
acquiring azimuth information of each virtual sound source at any moment relative to a user, wherein the azimuth information comprises first three-dimensional coordinate information of the virtual sound source taking the user as a center point;
determining a linear expression of each first three-dimensional coordinate information according to four second three-dimensional coordinate information of adjacent directions of each first three-dimensional coordinate information;
and calculating the interpolation coefficient of the linear expression of each first three-dimensional coordinate information by using a preset interpolation coefficient calculation formula, thereby obtaining the control coefficient of each virtual sound source after smoothing processing relative to the user.
Optionally, the step of processing the output multipath filtered signal by using a preset dynamic effect processing algorithm to obtain a virtual sound source rendering signal includes:
according to the distance information and the motion speed information of the virtual sound source relative to the user at any moment, respectively obtaining a corresponding distance gain coefficient and a Doppler effect coefficient;
updating parameters of a preset dynamic effect processing algorithm by using the distance gain coefficient and the Doppler effect coefficient at any moment;
and processing the output multipath filtering signals by using a dynamic effect processing algorithm after parameter updating to obtain virtual sound source rendering signals.
Optionally, the step of processing the audio signal input by the virtual sound source in each azimuth by using a preset environment rendering algorithm to obtain an environment rendering signal includes:
processing an audio signal input by a virtual sound source in each azimuth by using a first-order sound field coding algorithm to obtain a coded audio signal, wherein the expression of the first-order sound field coding algorithm comprises:

W = (1/√2) Σ_{i=1}^{N} s_i
X = Σ_{i=1}^{N} s_i cos θ_i cos φ_i
Y = Σ_{i=1}^{N} s_i sin θ_i cos φ_i
Z = Σ_{i=1}^{N} s_i sin φ_i

wherein s_i is the ith input audio signal of the N virtual sound sources, θ_i and φ_i respectively represent the horizontal angle and the vertical angle of the ith virtual sound source, and W, X, Y and Z respectively represent the first, second, third and fourth output data of the first-order sound field coding algorithm;
decoding the encoded audio signal using a matrix constructed from the binaural room impulse response signals and the spherical harmonic matrix to obtain an environment rendering signal, the matrix comprising:

D = Y^H (Y^* Y^H)^{-1} B

wherein Y represents the spherical harmonic matrix of the virtual sound sources in the different directions Ω_S, and B represents the binaural room impulse response signals of the audio signals input by the virtual sound sources of the respective azimuths.
Another aspect of the present invention also provides a spatial audio processing apparatus, including:
the information acquisition module is used for acquiring dynamic positioning information of the multi-azimuth virtual sound source, wherein the positioning information comprises azimuth information, distance information and movement speed information of the virtual sound source relative to a user;
the first processing module is used for processing an audio signal input by the virtual sound source by utilizing a dynamically updated virtual sound source rendering algorithm to obtain a virtual sound source rendering signal, wherein the virtual sound source rendering algorithm is determined based on a preset filter construction algorithm and a preset dynamic effect processing algorithm, parameters of the filter construction algorithm are updated according to dynamic azimuth information of the multi-azimuth virtual sound source, and parameters of the dynamic effect processing algorithm are updated according to dynamic distance information and motion speed information of the multi-azimuth virtual sound source;
the second processing module is used for processing the audio signals input by the virtual sound sources in each direction by utilizing a preset environment rendering algorithm to obtain environment rendering signals, and the environment rendering algorithm is determined based on a first-order sound field coding algorithm and a first-order sound field decoding algorithm;
and the output module is used for determining an audio signal with spatial sound effect according to the superposition signal of the environment rendering signal and the virtual sound source rendering signal after the cross-transition processing.
In another aspect, the present invention also provides an electronic device, including a processor and a memory, wherein at least one instruction, at least one program, a code set, or an instruction set is stored in the memory and is loaded and executed by the processor to implement the spatial audio processing method according to any one of the implementations of the first aspect.
In another aspect, the present invention also provides a computer-readable storage medium having stored therein at least one instruction, at least one program, a code set, or an instruction set, which is loaded and executed by a processor to implement the spatial audio processing method according to any one of the implementations of the first aspect.
Compared with the prior art, the embodiment of the invention provides a spatial audio processing method and device, which have the following beneficial effects:
the method can be used for rendering a natural dynamic virtual sound source, the spatial azimuth transition of the dynamic sound source is natural, reverberation is proper, and the method has good immersion sense.
Drawings
In order to illustrate the technical solution of the present invention more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. It should be apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings can be obtained from them by those of ordinary skill in the art without inventive effort.
Fig. 1 is a flowchart of a spatial audio processing method according to an embodiment of the present invention;
fig. 2 is a step chart of acquiring dynamic positioning information of a virtual sound source in multiple directions in a spatial audio processing method according to an embodiment of the present invention;
fig. 3 is a step chart of obtaining a virtual sound source rendering signal in the spatial audio processing method according to the embodiment of the present invention;
fig. 4 is a step diagram of outputting a multipath filtered signal in a spatial audio processing method according to an embodiment of the present invention;
fig. 5 is a step diagram of obtaining a virtual sound source rendering signal in a spatial audio processing method according to an embodiment of the present invention;
fig. 6 is a step diagram of obtaining an environment rendering signal in a spatial audio processing method according to an embodiment of the present invention;
fig. 7 is a schematic diagram of encoding and decoding an environment rendering signal of a spatial audio processing method according to an embodiment of the present invention;
fig. 8 is a schematic structural diagram of a spatial audio processing apparatus according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The present specification provides method operational steps as described in the examples or flowcharts, but may include more or fewer operational steps based on conventional or non-inventive labor. When implemented in a real system or server product, the methods illustrated in the embodiments or figures may be performed sequentially or in parallel (e.g., in a parallel processor or multithreaded environment).
Fig. 1 is a flowchart of a spatial audio processing method according to an embodiment of the present invention, where, as shown in fig. 1, the processing method includes:
step 101, dynamic positioning information of a multi-azimuth virtual sound source is obtained, wherein the positioning information comprises azimuth information, distance information and movement speed information of the virtual sound source relative to a user.
It should be noted that, in the present invention, the virtual sound sources input audio information, wherein there is at least one virtual sound source and the virtual sound sources are located in different directions. When the user moves, the positioning information changes, so the azimuth information conveyed from each sound source to the user changes at that moment, and the distance information and the movement speed information also change accordingly; when the user is stationary, the positioning information does not change.
The virtual sound source may serve usage scenarios in which the audio signal is emitted in Virtual Reality (VR) or Augmented Reality (AR) applications.
Specifically, as shown in fig. 2, the step 101 of obtaining dynamic positioning information of a virtual sound source with multiple directions includes:
step 1011, obtaining the source of the audio signal input by the virtual sound source, wherein the source of the audio signal comprises program preset, user dynamic input or sensor field acquisition;
the program presets dynamic sound effects which are applied to program settings without human participation in a program specified scene, for example, when one end of a train picture is designed in a VR program to be led to the other end, the sound effects are dynamically changed;
the dynamic input of the user is applied to the action sound effect needing human participation, such as the dynamic process that the footstep sound of the opponent in the VR game is from far to near and from near to far;
the sensor field collection is to transmit sound effects at different positions by using sensors at different positions.
Step 1012, determining positioning information of the virtual sound source relative to the user at any moment according to the source of the audio signal.
Specifically, audio signals from different sources determine different types of positioning information, for example positioning information of a picture relative to the user, positioning information of an operation action relative to the user, or distance information from a sensor to the user; at least one source is selected according to actual use to acquire the positioning information.
Step 102, processing an audio signal input by a virtual sound source by using a dynamically updated virtual sound source rendering algorithm to obtain a virtual sound source rendering signal, wherein the virtual sound source rendering algorithm is determined based on a preset filter construction algorithm and a preset dynamic effect processing algorithm, parameters of the filter construction algorithm are updated according to dynamic azimuth information of the multi-azimuth virtual sound source, and parameters of the dynamic effect processing algorithm are updated according to dynamic distance information and motion speed information of the multi-azimuth virtual sound source.
Specifically, as shown in fig. 3, the step 102 of processing the audio signal input by the virtual sound source to obtain the virtual sound source rendering signal by using the dynamically updated virtual sound source rendering algorithm includes:
step 1021, dynamically processing audio signals input by the multi-azimuth virtual sound source by using a preset filter construction algorithm, and outputting multipath filtering signals.
As shown in fig. 4, the step 1021 of dynamically processing audio signals input by a multi-directional virtual sound source and outputting multi-path filtering signals by using a preset filter construction algorithm includes:
step 10211, respectively obtaining a base vector, a left ear weight coefficient, a right ear weight coefficient and a head related delay parameter according to the preprocessed head related transfer function database;
wherein the step of preprocessing comprises:
and processing the head related transfer function database by using a phase transformation algorithm to obtain the head related transfer function database with the minimum phase characteristic.
And processing the head related transfer function database with the minimum phase characteristic by using a principal component analysis method to obtain a head related transfer function database with low data volume. The data volume after processing is about 4% of the original data volume, so that the data volume can be obviously reduced and the processing speed can be improved after the processing of the steps.
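The principal-component compression described above can be sketched as follows. This is an illustrative Python fragment, not the patent's actual implementation: the function names, the use of NumPy's SVD, and the matrix layout (directions × filter taps) are assumptions.

```python
import numpy as np

def pca_compress_hrtf(hrir_matrix, num_components):
    """Compress an HRIR database of shape (num_directions, filter_length)
    with a principal component analysis built from the SVD.

    Returns the mean response, the basis vectors d_q
    (num_components, filter_length) and the per-direction weight
    coefficients (num_directions, num_components) such that
    hrir ~= mean + weights @ basis.
    """
    mean = hrir_matrix.mean(axis=0)
    centered = hrir_matrix - mean
    # Rows of vt are orthonormal principal directions of the centered data.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:num_components]        # basis vectors d_q
    weights = centered @ basis.T       # weight coefficients per direction
    return mean, basis, weights

def reconstruct_hrtf(mean, basis, weights):
    # Rebuild the (approximate) HRIR database from the compressed form.
    return mean + weights @ basis
```

Keeping only the leading components is what reduces the stored data volume; the retained fraction would be tuned against reconstruction error.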
Step 10212, determining the space angle information of the virtual sound source relative to the user at any moment according to the azimuth information of the virtual sound source relative to the user at any moment.
The azimuth information of the virtual sound source relative to the user at any moment comprises three-dimensional coordinate information of the virtual sound source at any moment, and the three-dimensional coordinate information of the virtual sound source is converted into space angle information of the virtual sound source by utilizing a coordinate system conversion method of the three-dimensional coordinates and the space angles.
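The coordinate-system conversion mentioned above can be sketched as follows; the listener-centred axis convention (x forward, y left, z up) is an assumption, since the patent does not specify one.

```python
import math

def cartesian_to_angles(x, y, z):
    """Convert listener-centred Cartesian coordinates (x forward, y left,
    z up -- an assumed convention) to a horizontal azimuth angle and a
    vertical elevation angle, both in degrees."""
    azimuth = math.degrees(math.atan2(y, x))
    elevation = math.degrees(math.atan2(z, math.hypot(x, y)))
    return azimuth, elevation
```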
Step 10213, determining multidirectional control coefficients at any moment according to the spatial angle information of the virtual sound source at any moment relative to the user, wherein the control coefficients comprise left and right ear weight coefficients and head-related delay parameters.
The left and right ear weight coefficients and the head-related delay parameters stored in the HRTF database are indexed by spatial angle information. Because the spatial angle of the virtual sound source relative to the user changes from moment to moment, the spatial angle information can be matched against the HRTF database, and the left and right ear weight coefficients and head-related delay parameters corresponding to the spatial angle information of the virtual sound source are selected, so that different left and right ear weight coefficients and head-related delay parameters are determined for the virtual sound source at different moments.
Step 10214, updating parameters of a preset filter construction algorithm based on the base vector and multidirectional control coefficients at any moment;
the formula of the preset filter construction algorithm is as follows:
in the formula (i),
represents the ith filter, w qi Left and right ear weight coefficient of ith filter, d q Representing the basis vector of the filter, τ is the head-related delay parameter and t- τ represents the delay to the basis vector.
Step 10215, processing the audio signal input by the multi-azimuth virtual sound source by using the filter construction algorithm after parameter updating to obtain a multi-path filtering signal.
The filter constructed by the filter construction algorithm corresponds to the virtual sound source, the virtual sound source in each azimuth corresponds to one filter, and filtering processing is carried out on the virtual sound source to obtain a filtering signal.
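A minimal sketch of such a basis-plus-delay filter follows. The integer-sample delay, the function names, and the convolution-based application are illustrative assumptions, not the patent's stated implementation.

```python
import numpy as np

def build_filter(basis, weights, delay_samples):
    """Weighted sum of basis vectors, h(t) = sum_q w_q * d_q(t - tau),
    with the head-related delay tau realised as prepended zeros
    (an integer-sample simplification)."""
    h = weights @ basis                              # combine basis vectors
    return np.concatenate([np.zeros(delay_samples), h])

def apply_filter(signal, basis, weights, delay_samples):
    # Filter one azimuth's input signal with the constructed filter.
    return np.convolve(signal, build_filter(basis, weights, delay_samples))
```

In practice one filter per ear would be built from the left- and right-ear weight coefficients of the same basis.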
Step 1022, processing the output multipath filtering signals by using a preset dynamic effect processing algorithm to obtain virtual sound source rendering signals.
As shown in fig. 5, the step 1022 of processing the output multipath filtered signals by using a preset dynamic effect processing algorithm to obtain virtual sound source rendering signals includes:
step 10221, respectively obtaining a corresponding distance gain coefficient and a corresponding Doppler effect coefficient according to the distance information and the motion speed information of the virtual sound source relative to the user at any moment;
the dynamic effect processing algorithm calculates a distance gain coefficient and a Doppler effect coefficient according to the relative distance and the speed of the virtual sound source, wherein the two coefficients form a control coefficient, and a distance gain coefficient calculation formula is as follows:
d ref is the reference distance, A ref Is the attenuation coefficient of the reference distance, typically, the attenuation coefficient A of the reference distance is set ref =0.5, i.e. the distance is doubled and the gain decays by 6dB.
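A hedged sketch of a distance-gain rule consistent with the description above (gain multiplied by A_ref, typically 0.5, for each doubling of the distance). The clamping below the reference distance is an added assumption to keep the gain at most 1.

```python
import math

def distance_gain(d, d_ref=1.0, a_ref=0.5):
    """Gain multiplied by a_ref for each doubling of the distance;
    a_ref = 0.5 gives the stated 6 dB attenuation per doubling.
    Distances inside the reference distance are clamped (an added
    assumption)."""
    d = max(d, d_ref)
    return a_ref ** math.log2(d / d_ref)
```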
The calculation formula of the Doppler coefficient is:

α = (c − v_ls) / (c + v_sl)

wherein c is the speed of sound, v_ls is the speed at which the user (the listener) moves away from the sound source, and v_sl is the speed at which the sound source moves away from the user.
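A sketch of a classical Doppler frequency-scaling factor consistent with the velocities defined above. The exact formula used by the patent is not reproduced in the text, so this standard moving-source/moving-listener form and the speed of sound c = 343 m/s are assumptions.

```python
def doppler_coefficient(v_ls, v_sl, c=343.0):
    """Frequency-scaling factor for a listener receding from the source
    at v_ls and a source receding from the listener at v_sl (m/s,
    positive when moving apart); c is the assumed speed of sound."""
    return (c - v_ls) / (c + v_sl)
```

A factor below 1 lowers the perceived pitch (source and listener separating); a factor above 1 raises it (approaching).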
Step 10222, updating parameters of a preset dynamic effect processing algorithm by using the distance gain coefficient and the Doppler effect coefficient at any moment;
the preset dynamic effect processing algorithm is just to select a common dynamic effect device in the field, and the parameters of the common dynamic effect device are determined according to the distance gain coefficient and the Doppler effect coefficient of the virtual sound source at any moment.
Step 10223, processing the output multipath filtering signals by using a dynamic effect processing algorithm after parameter updating to obtain virtual sound source rendering signals.
After processing by this procedure, the input mono sound source is output as a binaural sound signal; if the input is a dynamic virtual sound source, a dynamic binaural virtual sound source is output.
Before the step of updating parameters of the preset filter construction algorithm based on the basis vector and the multidirectional control coefficients at any moment, the method further comprises processing the control coefficients with a spatial smoothing algorithm, wherein the spatial smoothing algorithm comprises a bilinear interpolation algorithm, and the step of processing the control coefficients with the bilinear interpolation algorithm comprises the following steps:
acquiring azimuth information of each virtual sound source at any moment relative to a user, wherein the azimuth information comprises first three-dimensional coordinate information of the virtual sound source taking the user as a center point;
determining a linear expression of each first three-dimensional coordinate information according to four second three-dimensional coordinate information of adjacent directions of each first three-dimensional coordinate information;
and calculating the interpolation coefficient of the linear expression of each first three-dimensional coordinate information by using a preset interpolation coefficient calculation formula, thereby obtaining the control coefficient of each virtual sound source after smoothing processing relative to the user.
w′ = a_1 w_1 + a_2 w_2 + a_3 w_3 + a_4 w_4

wherein w′ is the smoothed filter coefficient vector of the target azimuth (θ_0, φ_0), and w_1, w_2, w_3, w_4 are the filter coefficient vectors of the four spatial orientations adjacent to the target orientation, (θ_1, φ_1), (θ_2, φ_1), (θ_1, φ_2), (θ_2, φ_2), i.e. of the four second three-dimensional coordinates adjacent to the first three-dimensional coordinate.
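The four-point smoothing above can be sketched as ordinary bilinear interpolation over the (θ, φ) grid; the argument layout and function name are assumptions for illustration.

```python
import numpy as np

def bilinear_smooth(theta0, phi0, theta1, theta2, phi1, phi2,
                    w11, w21, w12, w22):
    """Bilinear interpolation of filter coefficient vectors over the
    (theta, phi) grid: w11 = w(theta1, phi1), w21 = w(theta2, phi1),
    w12 = w(theta1, phi2), w22 = w(theta2, phi2)."""
    t = (theta0 - theta1) / (theta2 - theta1)   # fractional azimuth position
    u = (phi0 - phi1) / (phi2 - phi1)           # fractional elevation position
    a1, a2 = (1 - t) * (1 - u), t * (1 - u)
    a3, a4 = (1 - t) * u, t * u
    return a1 * w11 + a2 * w21 + a3 * w12 + a4 * w22
```

The four weights sum to 1, so a target orientation lying on a grid point recovers that point's coefficients exactly.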
Step 103, processing the audio signals input by the virtual sound sources in each azimuth by using a preset environment rendering algorithm to obtain environment rendering signals, wherein the environment rendering algorithm is determined based on a first-order sound field coding algorithm and a first-order sound field decoding algorithm.
Specifically, as shown in fig. 6 and 7, the step 103 of processing the audio signal input by the virtual sound source of each azimuth using a preset environment rendering algorithm to obtain an environment rendering signal includes:
step 1031, processing the audio signal input by the virtual sound source in each azimuth with a first-order sound field coding algorithm to obtain a coded audio signal, where the first-order sound field coding algorithm may be expressed as (the standard first-order B-format encoding):

W = Σ_{i=1}^{N} s_i
X = Σ_{i=1}^{N} s_i cos θ_i cos φ_i
Y = Σ_{i=1}^{N} s_i sin θ_i cos φ_i
Z = Σ_{i=1}^{N} s_i sin φ_i

where s_i is the i-th input audio signal of the N virtual sound sources, θ_i and φ_i denote the horizontal and vertical angles of the i-th virtual sound source respectively, and W, X, Y and Z denote the first, second, third and fourth output data of the first-order sound field coding algorithm;
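The encoding step above corresponds to standard first-order (B-format) ambisonic encoding; a sketch under that assumption (`foa_encode` is an illustrative name, not from the source):

```python
import numpy as np

def foa_encode(signals, azimuths, elevations):
    """First-order (B-format) encoding of N virtual-source signals.

    signals:    (N, T) array, one row per virtual sound source
    azimuths:   horizontal angles theta_i in radians, length N
    elevations: vertical angles phi_i in radians, length N
    Returns the four output channels W, X, Y, Z, each of length T.
    """
    s = np.asarray(signals, dtype=float)
    th = np.asarray(azimuths, dtype=float)[:, None]
    ph = np.asarray(elevations, dtype=float)[:, None]
    W = np.sum(s, axis=0)                              # omnidirectional
    X = np.sum(s * np.cos(th) * np.cos(ph), axis=0)    # front-back
    Y = np.sum(s * np.sin(th) * np.cos(ph), axis=0)    # left-right
    Z = np.sum(s * np.sin(ph), axis=0)                 # up-down
    return W, X, Y, Z
```

A single source directly ahead (θ = φ = 0) contributes equally to W and X and nothing to Y or Z, matching the directional meaning of the four channels.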
step 1032, decoding the encoded audio signal using a matrix constructed of binaural room impulse response signals and spherical harmonic matrices to obtain an environment rendering signal, the matrix constructed of binaural room impulse response signals and spherical harmonic matrices comprising:
D = Y^H (Y Y^H)^{-1} B
where Y represents the spherical harmonic matrix of the virtual sound sources in the different directions Ω_S, and B represents the binaural room impulse response signals of the audio signals input by the virtual sound sources in each azimuth.
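The decoding matrix above is the right pseudoinverse of the spherical harmonic matrix Y applied to the BRIR matrix B; a sketch, assuming Y has full row rank:

```python
import numpy as np

def decoding_matrix(Y, B):
    """Compute D = Y^H (Y Y^H)^{-1} B, i.e. the right pseudoinverse of Y
    (valid when Y has full row rank) applied to the BRIR matrix B."""
    Yh = Y.conj().T
    return Yh @ np.linalg.inv(Y @ Yh) @ B
```

For a full-row-rank Y this coincides with `np.linalg.pinv(Y) @ B`; the pseudoinverse form is numerically safer when Y Y^H is close to singular.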
And step 104, determining the audio signal with the spatial sound effect according to the superposition of the environment rendering signal and the crossfade-processed virtual sound source rendering signal.
According to the invention, the ambient reverberant sound rendered by the environment rendering algorithm is superimposed on the crossfade-processed virtual sound source rendering signal, i.e. the binaural sound signal, so that the realism and immersion of the virtual spatial sound source are enhanced.
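The superposition of step 104 can be sketched as follows, where the crossfade ramps between the filter outputs before and after a parameter update; the `wet` ambience level and the linear ramp are assumptions, not values given in the text:

```python
import numpy as np

def crossfade(old, new):
    """Linearly crossfade from the pre-update to the post-update filter
    output over one processing block, avoiding audible discontinuities."""
    fade = np.linspace(0.0, 1.0, len(old))
    return (1.0 - fade) * old + fade * new

def spatial_audio_output(src, env, wet=0.3):
    """Superimpose the environment rendering signal `env` on the (already
    crossfaded) binaural virtual-source signal `src`.
    `wet` is an assumed ambience mixing level."""
    return src + wet * env
```

In practice the same superposition is applied independently to the left-ear and right-ear channels of the binaural signal.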
In another aspect, the present invention further provides a spatial audio processing apparatus 200, as shown in fig. 8, including:
the information acquisition module 201 is configured to acquire dynamic positioning information of a multi-azimuth virtual sound source, where the positioning information includes azimuth information, distance information and movement speed information of the virtual sound source relative to a user;
the first processing module 202 is configured to process an audio signal input by a virtual sound source by using a dynamically updated virtual sound source rendering algorithm, to obtain a virtual sound source rendering signal, where the virtual sound source rendering algorithm is determined based on a preset filter construction algorithm and a preset dynamic effect processing algorithm, parameters of the virtual sound source filtering algorithm are updated according to dynamic azimuth information of a multi-azimuth virtual sound source, and parameters of the dynamic effect processing algorithm are updated according to dynamic distance information and motion speed information of the multi-azimuth virtual sound source;
the second processing module 203 is configured to process an audio signal input by a virtual sound source in each azimuth by using a preset environment rendering algorithm, to obtain an environment rendering signal, where the environment rendering algorithm is determined based on a first-order sound field coding algorithm and a first-order sound field decoding algorithm;
and the output module 204 is configured to determine the audio signal with the spatial sound effect according to the superposition of the environment rendering signal and the crossfade-processed virtual sound source rendering signal.
In yet another embodiment of the present invention, there is also provided an apparatus including a processor and a memory, where the memory stores at least one instruction, at least one program, a code set, or an instruction set, and the at least one instruction, the at least one program, the code set, or the instruction set is loaded and executed by the processor to implement a spatial audio processing method described in an embodiment of the present invention.
In yet another embodiment of the present invention, there is further provided a computer readable storage medium having stored therein at least one instruction, at least one program, a code set, or a set of instructions, which are loaded and executed by a processor to implement the spatial audio processing method in an embodiment of the present invention.
In the above embodiments, implementation may be in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, implementation may be in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the flows or functions according to embodiments of the present invention are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example by wired means (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless means (e.g., infrared, radio, microwave). The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid state disk (SSD)), etc.
It is noted that relational terms such as "first" and "second" are used solely to distinguish one entity or action from another, and do not necessarily require or imply any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.
In this specification, the embodiments are described in a progressive manner; for identical or similar parts between the embodiments, reference may be made to one another, and each embodiment focuses on its differences from the other embodiments. In particular, the system embodiments are described relatively simply since they are substantially similar to the method embodiments; for the relevant parts, reference may be made to the description of the method embodiments.
The foregoing description is only of the preferred embodiments of the present invention and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention are included in the protection scope of the present invention.
Claims (9)
1. A method of spatial audio processing comprising:
acquiring dynamic positioning information of a multidirectional virtual sound source, wherein the positioning information comprises azimuth information, distance information and movement speed information of the virtual sound source relative to a user;
processing an audio signal input by a virtual sound source by using a dynamically updated virtual sound source rendering algorithm to obtain a virtual sound source rendering signal, wherein the virtual sound source rendering algorithm is determined based on a preset filter construction algorithm and a preset dynamic effect processing algorithm, parameters of the filter construction algorithm are updated according to dynamic azimuth information of the multi-azimuth virtual sound source, and parameters of the dynamic effect processing algorithm are updated according to dynamic distance information and motion speed information of the multi-azimuth virtual sound source;
processing audio signals input by virtual sound sources in each direction by using a preset environment rendering algorithm to obtain environment rendering signals, wherein the environment rendering algorithm is determined based on a first-order sound field coding algorithm and a first-order sound field decoding algorithm;
and determining the audio signal with the spatial sound effect according to the superposition of the environment rendering signal and the crossfade-processed virtual sound source rendering signal.
2. The spatial audio processing method as set forth in claim 1, wherein the step of processing the audio signal inputted by the virtual sound source using a dynamically updated virtual sound source rendering algorithm to obtain the virtual sound source rendering signal comprises:
dynamically processing audio signals input by a multidirectional virtual sound source by using a preset filter construction algorithm, and outputting multipath filtering signals;
and processing the output multipath filtering signals by using a preset dynamic effect processing algorithm to obtain virtual sound source rendering signals.
3. The spatial audio processing method as set forth in claim 1, wherein the step of acquiring dynamic positioning information of the virtual sound source of a plurality of directions comprises:
acquiring the source of an audio signal input by a virtual sound source, wherein the source of the audio signal comprises program preset, user dynamic input or sensor field acquisition;
and determining the positioning information of the virtual sound source at any moment according to the source of the audio signal.
4. The spatial audio processing method as set forth in claim 2, wherein the step of dynamically processing audio signals inputted from a multi-directional virtual sound source using a preset filter construction algorithm, and outputting multi-path filtered signals, comprises:
respectively obtaining a base vector, a left ear weight coefficient, a right ear weight coefficient and a head related delay parameter according to the preprocessed head related transfer function database;
according to the azimuth information of the virtual sound source relative to the user at any moment, determining the space angle information of the virtual sound source relative to the user at any moment;
determining multidirectional control coefficients at any moment according to the space angle information of the virtual sound source at any moment relative to a user, wherein the control coefficients comprise left and right ear weight coefficients and head-related delay parameters;
updating parameters of a preset filter construction algorithm based on the base vector and multidirectional control coefficients at any moment;
and processing the audio signals input by the multidirectional virtual sound source by using a preset filter construction algorithm after parameter updating to obtain multipath filtering signals.
5. The spatial audio processing method of claim 4, further comprising, before the step of updating the parameters of the preset filter construction algorithm based on the basis vectors and the multidirectional control coefficients at any moment, processing the control coefficients using a spatial smoothing algorithm, the spatial smoothing algorithm comprising a bilinear interpolation algorithm, wherein the step of processing the control coefficients using the bilinear interpolation algorithm comprises:
acquiring azimuth information of each virtual sound source at any moment relative to a user, wherein the azimuth information comprises first three-dimensional coordinate information of the virtual sound source taking the user as a center point;
determining a linear expression of each first three-dimensional coordinate information according to four second three-dimensional coordinate information of adjacent directions of each first three-dimensional coordinate information;
and calculating the interpolation coefficient of the linear expression of each first three-dimensional coordinate information by using a preset interpolation coefficient calculation formula, thereby obtaining the control coefficient of each virtual sound source after smoothing processing relative to the user.
6. The spatial audio processing method as set forth in claim 2, wherein the step of processing the output multi-path filtered signal using a preset dynamic effect processing algorithm to obtain the virtual sound source rendering signal comprises:
according to the distance information and the motion speed information of the virtual sound source relative to the user at any moment, respectively obtaining a corresponding distance gain coefficient and a Doppler effect coefficient;
updating parameters of a preset dynamic effect processing algorithm by using the distance gain coefficient and the Doppler effect coefficient at any moment;
and processing the output multipath filtering signals by using a dynamic effect processing algorithm after parameter updating to obtain virtual sound source rendering signals.
7. A spatial audio processing apparatus, comprising:
the information acquisition module is used for acquiring dynamic positioning information of the multi-azimuth virtual sound source, wherein the positioning information comprises azimuth information, distance information and movement speed information of the virtual sound source relative to a user;
the first processing module is used for processing an audio signal input by the virtual sound source by utilizing a dynamically updated virtual sound source rendering algorithm to obtain a virtual sound source rendering signal, wherein the virtual sound source rendering algorithm is determined based on a preset filter construction algorithm and a preset dynamic effect processing algorithm, parameters of the filter construction algorithm are updated according to dynamic azimuth information of the multi-azimuth virtual sound source, and parameters of the dynamic effect processing algorithm are updated according to dynamic distance information and motion speed information of the multi-azimuth virtual sound source;
the second processing module is used for processing the audio signals input by the virtual sound sources in each direction by utilizing a preset environment rendering algorithm to obtain environment rendering signals, and the environment rendering algorithm is determined based on a first-order sound field coding algorithm and a first-order sound field decoding algorithm;
and the output module is used for determining the audio signal with the spatial sound effect according to the superposition of the environment rendering signal and the crossfade-processed virtual sound source rendering signal.
8. An electronic device comprising a processor and a memory having stored therein at least one instruction, at least one program, a set of codes, or a set of instructions, the at least one instruction, the at least one program, the set of codes, or the set of instructions being loaded and executed by the processor to implement the spatial audio processing method of any one of claims 1-6.
9. A computer readable storage medium having stored therein at least one instruction, at least one program, code set, or instruction set, the at least one instruction, the at least one program, the code set, or instruction set being loaded and executed by a processor to implement the spatial audio processing method of any of claims 1-6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311172643.XA CN117061985A (en) | 2023-09-11 | 2023-09-11 | Spatial audio processing method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117061985A true CN117061985A (en) | 2023-11-14 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||