WO2024098221A1 - Audio signal rendering method, apparatus, device, and storage medium

Audio signal rendering method, apparatus, device, and storage medium

Info

Publication number
WO2024098221A1
Authority
WO
WIPO (PCT)
Prior art keywords
signal
gain
hrir
rendered
audio object
Prior art date
Application number
PCT/CN2022/130428
Other languages
English (en)
French (fr)
Inventor
胡晨昊
史润宇
王宾
Original Assignee
北京小米移动软件有限公司
Priority date
Filing date
Publication date
Application filed by 北京小米移动软件有限公司
Priority to PCT/CN2022/130428
Publication of WO2024098221A1


Classifications

    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10KSOUND-PRODUCING DEVICES; METHODS OR DEVICES FOR PROTECTING AGAINST, OR FOR DAMPING, NOISE OR OTHER ACOUSTIC WAVES IN GENERAL; ACOUSTICS NOT OTHERWISE PROVIDED FOR
    • G10K15/00Acoustics not otherwise provided for
    • G10K15/08Arrangements for producing a reverberation or echo sound
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04SSTEREOPHONIC SYSTEMS 
    • H04S7/00Indicating arrangements; Control arrangements, e.g. balance control

Definitions

  • the present disclosure relates to the field of communication technology, and in particular to an audio signal rendering method, device, equipment and storage medium.
  • Object-based spatial audio technology includes two main parts: encoding/decoding and rendering.
  • The encoding/decoding part transmits the audio object signal to be played at the user end, together with the corresponding metadata;
  • the rendering part reproduces spatial audio from the received audio object signal and its metadata.
  • the method of rendering the audio object signal includes at least one of the following:
  • Method 1: calculate the orientation gain parameter of the audio object signal according to the sound cone information of the audio object signal, and then render the audio object signal based on that orientation gain parameter.
  • Method 2: train a set of head-related transfer function (HRTF) filters to simulate how the sound transmitted to the human ear changes with the direction and orientation of the sound source; during rendering, obtain the orientation information of the sound source corresponding to the audio object signal and use the trained HRTF filters to obtain the rendered audio object signal based on that orientation information.
  • However, Method 2 is not suitable for personalized adjustment of various sound sources and has a small scope of application.
  • the audio signal rendering method, device, equipment and storage medium proposed in the present disclosure are intended to solve the technical problems of poor rendering effect and limited application scope in the related art.
  • an embodiment of the present disclosure provides an audio signal rendering method, which is executed by a signal receiving end and includes: determining a first gain head-related impulse response (HRIR) corresponding to a direct sound signal in an audio object signal to be rendered; determining a second gain HRIR corresponding to at least one reflected sound signal in the audio object signal to be rendered; determining a fused HRIR corresponding to the audio object signal to be rendered based on the first gain HRIR and the second gain HRIR; and rendering the audio object signal to be rendered based on the fused HRIR.
  • the signal receiving end determines the first gain HRIR corresponding to the direct sound signal in the audio object signal to be rendered; and also determines the second gain HRIR corresponding to at least one reflected sound signal in the audio object signal to be rendered; then, the fused HRIR corresponding to the audio object signal to be rendered is determined based on the first gain HRIR and the second gain HRIR; and the audio object signal to be rendered is rendered based on the fused HRIR.
  • the first gain HRIR can reflect the gain loss caused by the direction and distance of the sound source during the transmission of the direct sound signal.
  • the second gain HRIR can reflect at least one of the gain loss caused by the direction of the sound source, the gain loss caused by reflection, and the gain loss caused by the transmission distance during the transmission of the reflected sound signal. Therefore, when the audio object signal to be rendered is rendered based on the fused HRIR obtained from the first gain HRIR and the second gain HRIR, the rendering effect can be ensured: the real environment of the sound signal is restored as far as possible, the experience of rendering the direction of the sound source is improved, and the rendering result is closer to the real situation. Moreover, the rendering method of the present disclosure supports personalized processing of different types of sound sources and therefore has a wide range of applications. It also allows targeted processing of different sound sources and takes full account of the scene information of the scene in which the sound source is located, further ensuring the rendering effect.
  • an embodiment of the present disclosure provides a communication device, which is configured at a signal receiving end, including:
  • a processing module used for determining a first gain head-related impulse response HRIR corresponding to a direct sound signal in an audio object signal to be rendered;
  • the processing module is further used to determine a second gain HRIR corresponding to at least one reflected sound signal in the audio object signal to be rendered;
  • the processing module is further used to determine a fused HRIR corresponding to the audio object signal to be rendered based on the first gain HRIR and the second gain HRIR;
  • the processing module is further used to render the audio object signal to be rendered based on the fused HRIR.
  • an embodiment of the present disclosure provides a communication device, which includes a processor.
  • when the processor calls a computer program in a memory, the method described in the first aspect is executed.
  • an embodiment of the present disclosure provides a communication device, which includes a processor and a memory, in which a computer program is stored; the processor executes the computer program stored in the memory so that the communication device executes the method described in the first aspect above.
  • an embodiment of the present disclosure provides a communication device, which includes a processor and an interface circuit, wherein the interface circuit is used to receive code instructions and transmit them to the processor, and the processor is used to run the code instructions to enable the device to execute the method described in the first aspect above.
  • an embodiment of the present disclosure provides a communication system, the system includes the communication device described in the second aspect, or the system includes the communication device described in the third aspect, or the system includes the communication device described in the fourth aspect, or the system includes the communication device described in the fifth aspect.
  • an embodiment of the present disclosure provides a computer-readable storage medium for storing instructions which, when executed, cause the above-mentioned device to execute the method described in the first aspect.
  • the present disclosure further provides a computer program product comprising a computer program, which, when executed on a computer, enables the computer to execute the method described in the first aspect above.
  • the present disclosure provides a chip system, which includes at least one processor and an interface, for supporting a network device to implement the functions involved in the method described in the first aspect, for example, determining or processing at least one of the data and information involved in the above method.
  • the chip system also includes a memory, which is used to store the computer programs and data necessary for the above-mentioned device.
  • the chip system can be composed of a chip, or it can include a chip and other discrete devices.
  • the present disclosure provides a computer program, which, when executed on a computer, enables the computer to execute the method described in the first aspect.
  • FIG1 is a schematic diagram of the architecture of a communication system provided by an embodiment of the present disclosure.
  • FIG2a is a schematic flow chart of an audio signal rendering method provided by another embodiment of the present disclosure.
  • FIG2b is a schematic diagram of a direct sound signal and a reflected sound signal provided by an embodiment of the present disclosure
  • FIG3a is a schematic flow chart of an audio signal rendering method provided by yet another embodiment of the present disclosure.
  • FIG3b is a schematic diagram of an inner angle of a sound cone and an outer angle of a sound cone provided by an embodiment of the present disclosure.
  • FIG4 is a schematic flow chart of an audio signal rendering method provided by yet another embodiment of the present disclosure.
  • FIG5 is a schematic diagram of a flow chart of an audio signal rendering method provided by another embodiment of the present disclosure.
  • FIG6 is a schematic flow chart of an audio signal rendering method provided by yet another embodiment of the present disclosure.
  • FIG7 is a schematic flow chart of an audio signal rendering method provided by yet another embodiment of the present disclosure.
  • FIG8 is a schematic flow chart of an audio signal rendering method provided by yet another embodiment of the present disclosure.
  • FIG9 is a schematic flow chart of an audio signal rendering method provided by yet another embodiment of the present disclosure.
  • FIG10 is a schematic flow chart of an audio signal rendering method provided by yet another embodiment of the present disclosure.
  • FIG11 is a schematic flow chart of an audio signal rendering method provided by yet another embodiment of the present disclosure.
  • FIG12 is a schematic flow chart of an audio signal rendering method provided by yet another embodiment of the present disclosure.
  • FIG13 is a schematic diagram of the structure of a communication device provided by another embodiment of the present disclosure.
  • FIG14 is a schematic diagram of the structure of a communication device provided by an embodiment of the present disclosure.
  • FIG15 is a schematic diagram of the structure of a chip provided by an embodiment of the present disclosure.
  • The terms "first", "second", "third", etc. may be used to describe various information in the disclosed embodiments, but such information should not be limited to these terms. These terms are only used to distinguish information of the same type from one another.
  • For example, the first information may also be referred to as the second information, and similarly, the second information may also be referred to as the first information.
  • Depending on the context, the word "if" as used herein may be interpreted as "at", "when", or "in response to determining".
  • Figure 1 is a schematic diagram of the architecture of a communication system provided in an embodiment of the present disclosure.
  • the communication system may include but is not limited to a signal sending device and a signal receiving device, wherein the above-mentioned signal sending device and signal receiving device may be a network device or a terminal device.
  • the number and form of devices shown in Figure 1 are only used for example and do not constitute a limitation on the embodiment of the present disclosure. In actual applications, two or more signal sending devices and two or more signal receiving devices may be included.
  • the communication system shown in Figure 1 takes a signal sending device 11 and a signal receiving device 12 as an example.
  • the communication system may be, for example, a long term evolution (LTE) system, a fifth generation (5G) mobile communication system, or a 5G new radio (NR) system.
  • the network device in the embodiment of the present disclosure is an entity on the network side for transmitting or receiving signals.
  • the network device 11 may be an evolved NodeB (eNB), a transmission reception point (TRP), a next generation NodeB (gNB) in an NR system, a base station in other future mobile communication systems, or an access node in a wireless fidelity (WiFi) system.
  • the embodiment of the present disclosure does not limit the specific technology and specific device form adopted by the network device.
  • the network device provided in the embodiment of the present disclosure may be composed of a central unit (CU) and a distributed unit (DU), wherein the CU may also be referred to as a control unit.
  • the CU-DU structure may be used to split the protocol layers of a network device such as a base station: the functions of some protocol layers are placed in the CU for centralized control, while the functions of the remaining part or all of the protocol layers are distributed in the DU, with the DU centrally controlled by the CU.
  • the terminal device in the disclosed embodiment is an entity on the user side for receiving or transmitting signals, such as a mobile phone.
  • the terminal device may also be referred to as a terminal device (terminal), a user equipment (UE), a mobile station (MS), a mobile terminal device (MT), etc.
  • the terminal device may be a car with communication function, a smart car, a mobile phone (mobile phone), a wearable device, a tablet computer (Pad), a computer with wireless transceiver function, a virtual reality (VR) terminal device, an augmented reality (AR) terminal device, a wireless terminal device in industrial control (industrial control), a wireless terminal device in self-driving, a wireless terminal device in remote medical surgery, a wireless terminal device in smart grid (smart grid), a wireless terminal device in transportation safety (transportation safety), a wireless terminal device in a smart city (smart city), a wireless terminal device in a smart home (smart home), etc.
  • the embodiments of the present disclosure do not limit the specific technology and specific device form adopted by the terminal device.
  • the communication system described in the embodiment of the present disclosure is for the purpose of more clearly illustrating the technical solution of the embodiment of the present disclosure, and does not constitute a limitation on the technical solution provided by the embodiment of the present disclosure.
  • a person skilled in the art can know that with the evolution of the system architecture and the emergence of new business scenarios, the technical solution provided by the embodiment of the present disclosure is also applicable to similar technical problems.
  • the audio signal rendering method provided by any embodiment can be executed alone, and any implementation method in the embodiment can also be executed alone, or combined with other embodiments, or possible implementation methods in other embodiments, and can also be executed together with any technical solution in the related technology.
  • FIG2a is a flow chart of an audio signal rendering method provided by an embodiment of the present disclosure. The method is executed by a signal receiving end. As shown in FIG2a , the audio signal rendering method may include the following steps:
  • Step 201 determine a first gain head-related impulse response (HRIR) corresponding to a direct sound signal in an audio object signal to be rendered.
  • the direct sound signal may be a signal whose transmission path coincides with the straight line between the sound source and the listener, that is, it can be understood that the direct sound signal is a signal emitted by the sound source and directly transmitted to the listener's position without reflection, refraction or diffraction.
  • Each audio object signal includes a direct sound signal.
  • FIG2b is a schematic diagram of a direct sound signal and a reflected sound signal provided by an embodiment of the present disclosure. It can be seen from (1) and (2) in FIG2b that when the direction of the sound source changes, the two ears of the listener will receive different sound signals due to the possible obstacles (such as walls) in the space where the sound source is located.
  • In (1) of FIG2b, the sound source emits sound toward the listener: the listener's left ear receives a stronger direct sound signal, as well as a reflected sound signal that has undergone radiation attenuation by the sound cone and one reflection. In (2) of FIG2b, the sound source emits sound toward a certain area in front of the listener: the left ear receives the direct sound signal after radiation attenuation by the sound cone, and the right ear receives the reflected sound signal after one reflection.
  • the first gain HRIR mentioned above can be determined based on the direction gain parameter and distance gain parameter of the direct sound signal, and can be used to reflect the gain loss caused by the direction and distance of the sound source during the transmission of the direct sound signal.
  • determining the first gain HRIR corresponding to the direct sound signal in the audio object signal to be rendered may include: for each audio object signal to be rendered, determining the first gain HRIR corresponding to the direct sound signal in the audio object signal respectively.
  • Step 202 Determine a second gain HRIR corresponding to at least one reflected sound signal in the audio object signal to be rendered.
  • the above-mentioned reflected sound signal may be: a signal whose transmission path coincides with a straight line between the mirror image sound source position and the listener, wherein the mirror image sound source position may be: a mirror image position of the sound source position relative to the reflector, wherein how to specifically determine the mirror image sound source position will be described in detail in subsequent embodiments.
  • the reflected sound signal can be understood as: a signal emitted by a sound source and transmitted to the listener's position after being reflected.
  • the second gain HRIR may be determined based on at least one of a direction gain parameter, a reflection gain parameter, and a distance delay gain parameter of the reflected sound signal, and may be used to reflect at least one of the following gain losses of the reflected sound signal during transmission: the gain loss caused by the direction of the sound source, the gain loss caused by reflection, and the gain loss caused by the transmission distance.
  • the reason for calculating the above-mentioned first gain HRIR and second gain HRIR is mainly to enable the subsequent rendering of the audio object signal based on the calculated first gain HRIR and second gain HRIR, so as to ensure that the gain losses in various aspects can be taken into account during the rendering process, ensure the rendering accuracy, enhance the experience of rendering in the direction of the sound source, and make the rendering effect closer to the real situation.
  • the above-mentioned "determining the second gain HRIR corresponding to the reflected sound signal in the audio object signal to be rendered" mainly includes: for each audio object signal to be rendered, determining the second gain HRIR corresponding to the reflected sound signal in that audio object signal.
  • each audio object signal includes at least one reflected sound signal. If the second gain HRIR were calculated for every reflected sound signal in the audio object signal, the amount of calculation would be relatively large. Therefore, in one embodiment of the present disclosure, only the second gain HRIR corresponding to the early reflected sound is calculated, which reduces the amount of calculation; the early reflected sound is sufficient to provide a good sense of direction and largely preserves the rendering effect.
  • the early reflected sound signal may include at least one of the following:
  • First-order reflected sound signal: a sound signal that is reflected only once;
  • Second-order reflected sound signal: a sound signal that is reflected twice.
  • the distance gain parameter is used to reflect the loss of radiation attenuation caused by distance during the transmission of the acoustic signal.
  • the distance delay gain parameter is used to reflect the loss of radiation attenuation caused by distance during the transmission of the acoustic signal, as well as the transmission delay loss caused by distance.
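  • As an illustration only, the sketch below computes these two parameters under common assumptions that the present disclosure does not fix: inverse-distance (1/r) radiation attenuation and a speed of sound of 343 m/s; the function names are hypothetical.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s (assumed)

def distance_gain(source_pos, listener_pos, ref_dist=1.0):
    """Distance gain parameter: assumed 1/r radiation attenuation,
    clamped so the gain never exceeds 1 inside the reference distance."""
    r = np.linalg.norm(np.asarray(source_pos, float) - np.asarray(listener_pos, float))
    return ref_dist / max(r, ref_dist)

def distance_delay(source_pos, listener_pos, sample_rate=48000):
    """Transmission delay caused by distance, in whole samples."""
    r = np.linalg.norm(np.asarray(source_pos, float) - np.asarray(listener_pos, float))
    return int(round(r / SPEED_OF_SOUND * sample_rate))

# A source 2 m away: half the gain of a 1 m source, ~280 samples of delay.
print(distance_gain((2.0, 0.0, 0.0), (0.0, 0.0, 0.0)))   # 0.5
print(distance_delay((2.0, 0.0, 0.0), (0.0, 0.0, 0.0)))  # 280
```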
  • Step 203 Determine a fused HRIR corresponding to the audio object signal to be rendered based on the first gain HRIR and the second gain HRIR.
  • a fused HRIR can be obtained by weighted summing a first gain HRIR corresponding to a direct sound signal of an audio object signal to be rendered and a second gain HRIR corresponding to at least one reflected sound signal of the audio object signal to be rendered.
  • For each audio object signal to be rendered, a fused HRIR corresponding to that audio object signal is determined respectively.
  • Step 204 Render the audio object signal to be rendered based on the fused HRIR.
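  • These two steps can be pictured with the following minimal sketch; it assumes HRIRs stored as (taps, 2) NumPy arrays and a plain weighted sum as the fusion, since the present disclosure only specifies a weighted summation.

```python
import numpy as np

def fuse_hrirs(weighted_hrirs):
    """Fused HRIR: weighted sum of the first gain HRIR (direct sound)
    and the second gain HRIRs (reflected sounds), per ear.

    weighted_hrirs: list of (weight, hrir) pairs, hrir shaped (taps, 2).
    """
    taps = max(h.shape[0] for _, h in weighted_hrirs)
    fused = np.zeros((taps, 2))
    for w, h in weighted_hrirs:
        fused[:h.shape[0], :] += w * h
    return fused

def render_binaural(mono_signal, fused_hrir):
    """Render the audio object signal by convolving it with each ear's
    fused HRIR, yielding a (samples, 2) binaural output."""
    left = np.convolve(mono_signal, fused_hrir[:, 0])
    right = np.convolve(mono_signal, fused_hrir[:, 1])
    return np.stack([left, right], axis=-1)
```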
  • the fused HRIR is determined based on the first gain HRIR and the second gain HRIR corresponding to the audio object signal to be rendered, wherein the first gain HRIR can reflect the gain loss caused by the direction and distance of the sound source during the transmission of the direct sound signal.
  • the second gain HRIR can reflect at least one of the gain loss caused by the direction of the sound source, the gain loss caused by reflection, and the gain loss caused by the transmission distance during the transmission of the reflected sound signal.
  • each audio object signal to be rendered is rendered separately. It can be seen that the method of the present disclosure performs targeted rendering on each audio object signal to be rendered, that is, the present disclosure realizes targeted processing of different sound sources, thereby further ensuring the rendering effect.
  • the above-mentioned reflection gain parameter, distance gain parameter, and distance delay gain parameter can all reflect the scene information of the scene in which the sound source is located. It can be seen that when rendering the signal in the method of the present disclosure, the scene information of the scene in which the sound source is located can be fully considered to ensure the accuracy of the rendering.
  • When rendering the audio object signal, the method of the present disclosure relies not only on the orientation gain parameter but also on the reflection gain parameter and the distance gain parameter. Therefore, the method is not limited to sound sources of directional sound and can also be applied to personalized adjustment of other types of sound sources (such as sound sources of divergent sound), giving it a wide range of applications.
  • For the beneficial effects of this embodiment, please refer to the above description.
  • FIG3a is a flow chart of an audio signal rendering method provided by an embodiment of the present disclosure. The method is executed by a signal receiving end. As shown in FIG3a , the audio signal rendering method may include the following steps:
  • Step 301 Receive a code stream sent by a signal sending end.
  • Step 302 Decode the code stream to obtain at least one audio object signal to be rendered and metadata of at least one audio object signal to be rendered.
  • the metadata may include at least one of the following information:
  • Direction information of the sound source of the audio object signal to be rendered;
  • Sound cone information of the sound source;
  • Room information of the room where the sound source of the audio object signal to be rendered is located;
  • Spatial position information of the sound source;
  • Orientation information of the listener;
  • Spatial position information of the listener.
  • the above-mentioned “direction information of the sound source” may include the direction angle and/or direction azimuth of the sound source.
  • the above-mentioned “sound cone information of the sound source” may include at least one of the inner angle of the sound cone, the outer angle of the sound cone, and the outer angle gain value.
  • each sound source corresponds to an inner angle of the sound cone and an outer angle of the sound cone.
  • Figure 3b is a schematic diagram of the inner angle of the sound cone and the outer angle of the sound cone provided in an embodiment of the present disclosure.
  • the inner angle of the sound cone means that the sound signal within the inner angle range of the sound cone is considered to have no radiation attenuation due to the orientation;
  • the outer angle of the sound cone means that the sound signal outside the inner angle range of the sound cone and within the outer angle range of the sound cone is considered to have radiation attenuation due to the orientation, wherein the outer angle gain value can be used to reflect the maximum degree of radiation attenuation of this part of the signal due to the orientation.
  • outside the outer angle range of the sound cone, the sound signal emitted by the sound source will no longer be further attenuated due to orientation, and its gain remains at the outer angle gain value.
  • the above-mentioned room information may include at least one of the size of the room, the reflection coefficient of objects in the room (such as walls), etc.
  • the above-mentioned spatial orientation information of the sound source may include the absolute position of the sound source and/or the relative position between the sound source and the listener.
  • the above-mentioned orientation information of the listener may include the orientation angle and/or orientation direction of the listener.
  • the above-mentioned spatial orientation information of the listener may include the absolute position of the listener and/or the relative position of the listener with respect to the sound source.
  • the above-mentioned absolute position and relative position can be absolute coordinates or relative coordinates respectively, wherein the absolute coordinates and relative coordinates can be coordinate values in a specific coordinate system, and the specific coordinate system can be a three-dimensional coordinate system established based on a certain point in the room where the sound source is located as the origin.
  • the information included in the above-mentioned spatial position information of the sound source and the spatial position information of the listener should be able to determine the absolute position of the listener and the absolute position of the sound source.
  • the above-mentioned orientation information of the listener and/or the spatial orientation information of the listener may also be directly determined by the listener.
  • Step 303 Determine a first gain HRIR corresponding to a direct sound signal and a second gain HRIR corresponding to a reflected sound signal in the audio object signal to be rendered based on metadata of the audio object signal to be rendered.
  • For the beneficial effects of this embodiment, please refer to the above description.
  • FIG4 is a flow chart of an audio signal rendering method provided by an embodiment of the present disclosure. The method is executed by a signal receiving end. As shown in FIG4 , the audio signal rendering method may include the following steps:
  • Step 401 Determine an incident angle of a direct sound signal relative to a listener based on spatial orientation information of a sound source and/or spatial orientation information of a listener.
  • the positions of the sound source and the listener can be determined based on the spatial orientation information of the sound source and/or the spatial orientation information of the listener, and then the incident angle of the direct sound signal relative to the listener can be determined based on the straight line between the sound source position and the listener position.
  • the angle between the straight line between the sound source position and the listener position and the normal line of the incident surface is the incident angle of the direct sound signal relative to the listener.
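  • As a geometric illustration (not text from the present disclosure), the incident direction can be expressed as an azimuth relative to the listener's facing direction; a 2D sketch with a hypothetical helper name, elevation being handled analogously:

```python
import numpy as np

def incidence_azimuth(source_pos, listener_pos, listener_yaw_deg):
    """Azimuth of the ray from listener to source, relative to the
    listener's facing direction (2D sketch)."""
    d = np.asarray(source_pos, float) - np.asarray(listener_pos, float)
    world_az = np.degrees(np.arctan2(d[1], d[0]))
    az = world_az - listener_yaw_deg
    return (az + 180.0) % 360.0 - 180.0  # wrap to (-180, 180]

# A source directly to the left of a listener facing +x:
print(incidence_azimuth((0.0, 1.0), (0.0, 0.0), 0.0))  # 90.0
```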
  • Step 402 Determine a first HRIR corresponding to the direct sound signal based on the incident angle of the direct sound signal relative to the listener and the orientation information of the listener.
  • the first HRIR corresponding to the direct sound signal may be determined from a head-related transfer function database based on the incident angle of the direct sound signal relative to the listener and the orientation information of the listener.
  • Step 403 Determine a directional gain parameter of the direct sound signal based on the directional information of the sound source and the sound cone information.
  • the direction information and the sound cone information of the sound source may be input into the sound cone model, and the sound cone model outputs the direction gain parameter of the direct sound signal.
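  • The present disclosure does not spell out the sound cone model itself; the sketch below uses one common formulation (as in OpenAL-style cone models): full gain inside the inner angle, the outer angle gain value outside the outer angle, and linear interpolation in between.

```python
def cone_gain(angle_deg, inner_deg, outer_deg, outer_gain):
    """Orientation gain for a ray leaving the source at angle_deg from
    the source's facing direction (assumed interpolation scheme)."""
    half_inner, half_outer = inner_deg / 2.0, outer_deg / 2.0
    if angle_deg <= half_inner:
        return 1.0          # inside the inner angle: no orientation attenuation
    if angle_deg >= half_outer:
        return outer_gain   # outside the outer angle: attenuation stops increasing
    t = (angle_deg - half_inner) / (half_outer - half_inner)
    return 1.0 + t * (outer_gain - 1.0)

print(cone_gain(30.0, 40.0, 120.0, 0.25))  # 0.8125, partway between 1 and 0.25
```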
  • Step 404 Determine a distance gain parameter of the direct sound signal based on the spatial orientation information of the sound source and/or the spatial orientation information of the listener.
  • the sound source position and the listener position (the position here is an absolute position) can be determined based on the spatial orientation information of the sound source and/or the spatial orientation information of the listener, and then the distance gain parameter of the direct sound signal is determined based on the distance between the sound source position and the listener position.
  • Step 405 Determine the first gain HRIR based on the orientation gain parameter, the distance gain parameter and the first HRIR.
  • the first HRIR may be weighted by multiplying it by the orientation gain parameter and the distance gain parameter to obtain the first gain HRIR.
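  • A one-line sketch of step 405, assuming the first HRIR is a (taps, 2) array and the two gains are scalars:

```python
import numpy as np

def first_gain_hrir(first_hrir, orientation_gain, dist_gain):
    """Weight the direct-sound HRIR by the orientation and distance gains."""
    return orientation_gain * dist_gain * np.asarray(first_hrir, float)
```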
  • For the beneficial effects of this embodiment, please refer to the above description.
  • FIG5 is a flow chart of an audio signal rendering method provided by an embodiment of the present disclosure. The method is executed by a signal receiving end. As shown in FIG5 , the audio signal rendering method may include the following steps:
  • Step 501 Determine an incident angle of a reflected sound signal relative to a listener based on spatial orientation information of a sound source and/or spatial orientation information of a listener.
  • the above-mentioned “determining the incident angle of the reflected sound signal relative to the listener based on the spatial orientation information of the sound source and/or the spatial orientation information of the listener” may include the following steps:
  • Step a determining the sound source position and the listener position (the positions here are all absolute positions) based on the spatial orientation information.
  • Step b determine the mirror image sound source position relative to the reflector.
  • the reflector may be an object that reflects the acoustic signal, such as a wall.
  • different reflected sound signals correspond to different image sound source positions.
  • For a second-order reflected sound signal, its image sound source position is determined as follows: first determine the first reflector, off which the reflected sound signal is reflected for the first time, and the second reflector, off which it is reflected for the second time; then determine the first image sound source position of the sound source position relative to the first reflector; then determine the second image sound source position of the first image sound source position relative to the second reflector; the second image sound source position is the image sound source position of the second-order reflected sound signal.
  • Step c determining the incident angle of the reflected sound signal relative to the listener based on the image sound source position and the listener position.
  • the angle between the straight line between the image sound source position and the listener position and the normal line of the incident surface is the incident angle of the reflected sound signal relative to the listener.
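  • The mirroring itself is the classic image source construction: reflect the position across the reflector plane, and chain reflections for higher orders. A sketch with a hypothetical helper:

```python
import numpy as np

def mirror_across_plane(point, plane_point, plane_normal):
    """Mirror a position across a reflector plane (image source method)."""
    p = np.asarray(point, float)
    n = np.asarray(plane_normal, float)
    n = n / np.linalg.norm(n)
    d = np.dot(p - np.asarray(plane_point, float), n)
    return p - 2.0 * d * n

src = (2.0, 1.0, 1.5)
# First-order image across a wall at x = 5 (normal along +x):
img1 = mirror_across_plane(src, (5.0, 0.0, 0.0), (1.0, 0.0, 0.0))
print(img1)  # [8.  1.  1.5]
# Second order: mirror that image across a second wall at y = 0:
img2 = mirror_across_plane(img1, (0.0, 0.0, 0.0), (0.0, 1.0, 0.0))
print(img2)  # [ 8. -1.  1.5]
```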
  • Step 502 Determine a second HRIR corresponding to the reflected sound signal based on the incident angle of the reflected sound signal relative to the listener and the orientation information of the listener.
  • the second HRIR corresponding to the reflected sound signal may be determined from a head-related transfer function database based on the incident angle of the reflected sound signal relative to the listener and the orientation information of the listener.
  • Step 503 Determine at least one of a direction gain parameter, a reflection gain parameter, and a distance delay gain parameter of the reflected sound signal based on the information in the metadata.
  • the method for determining the directional gain parameter of the reflected sound signal based on the information in the metadata may include:
  • Step 1 Determine the mirror listener position of the listener relative to the reflector.
  • the reflector may be an object that reflects the acoustic signal, such as a wall.
  • different reflected sound signals correspond to different image listener positions.
  • For a second-order reflected sound signal, its mirror image listener position is determined as follows: first determine the first reflector, off which the reflected sound signal is reflected for the first time, and the second reflector, off which it is reflected for the second time; then determine the first mirror image listener position of the listener position relative to the first reflector; then determine the second mirror image listener position of the first mirror image listener position relative to the second reflector; the second mirror image listener position is the mirror image listener position of the second-order reflected sound signal.
  • Step 2 Determine the exit angle of the reflected sound signal relative to the sound source based on the mirror image listener position and the sound source position.
  • the angle between the straight line between the mirror image listener position and the sound source position and the normal line of the exit surface is the exit angle of the reflected sound signal relative to the sound source.
  • Step 3 Determine a directional gain parameter of the reflected sound signal based on the exit angle, the direction information of the sound source and the sound cone information.
  • the exit angle, the direction information of the sound source and the sound cone information may be input into a sound cone model, and the sound cone model outputs the direction gain parameter of the reflected sound signal.
  • determining the reflection gain parameter of the reflected sound signal based on the information in the metadata may include: determining the reflection gain parameter of the reflected sound signal based on the room information. Specifically, the reflection coefficient of the object reflecting the reflected sound signal may be determined based on the room information, and then the reflection gain parameter of the reflected sound signal may be determined based on the reflection coefficient.
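  • One plausible reading of this step, not fixed by the text: the reflection gain of a ray is the product of the reflection coefficients of the reflectors it hits (one factor for a first-order ray, two for a second-order ray).

```python
import numpy as np

def reflection_gain(reflection_coeffs):
    """Reflection gain parameter from per-bounce reflection coefficients
    taken from the room information (assumed combination rule)."""
    return float(np.prod(reflection_coeffs))

print(reflection_gain([0.8]))        # first-order ray: 0.8
print(reflection_gain([0.8, 0.6]))   # second-order ray: ~0.48
```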
  • determining the distance delay gain parameter of the reflected sound signal based on the information in the metadata may include:
  • Step 1 Determine the mirror image listener position of the listener relative to the reflector.
  • Step 2 Determine a distance delay gain parameter based on the distance between the sound source position and the image listener position, or the distance between the image sound source position and the listener position.
  • Step 504 determine the second gain HRIR based on at least one of an orientation gain parameter, a reflection gain parameter, and a distance delay gain parameter and the second HRIR.
  • the second HRIR may be weighted by multiplying it by at least one of the orientation gain parameter, the reflection gain parameter, and the distance delay gain parameter to obtain the second gain HRIR.
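  • A sketch of step 504, assuming scalar gains and an integer-sample delay derived from the distance delay gain parameter; gains that are not used simply default to 1:

```python
import numpy as np

def second_gain_hrir(second_hrir, orientation_gain=1.0, reflection_gain=1.0,
                     dist_gain=1.0, delay_samples=0):
    """Weight the reflected-sound HRIR by the available gain parameters
    and shift it by the propagation delay (zero-padding at the front)."""
    h = orientation_gain * reflection_gain * dist_gain * np.asarray(second_hrir, float)
    if delay_samples > 0:
        h = np.concatenate([np.zeros((delay_samples, h.shape[1])), h], axis=0)
    return h
```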
  • For the beneficial effects of this embodiment, please refer to the above description.
  • FIG6 is a flow chart of an audio signal rendering method provided by an embodiment of the present disclosure. The method is executed by a signal receiving end. As shown in FIG6 , the audio signal rendering method may include the following steps:
  • Step 601 Determine the sound source position and the listener position based on spatial orientation information.
  • Step 602 determine the mirror image sound source position of the sound source position relative to the reflector; the reflector is an object that reflects the sound signal;
  • Step 603 Determine an incident angle of the reflected sound signal relative to the listener based on the image sound source position and the listener position.
  • For a detailed description of steps 601 - 603, please refer to the above embodiment description.
  • For the beneficial effects of this embodiment, please refer to the above description.
  • FIG7 is a flow chart of an audio signal rendering method provided by an embodiment of the present disclosure. The method is executed by a signal receiving end. As shown in FIG7, the audio signal rendering method may include the following steps:
  • Step 701 Determine the mirror image listener position of the listener relative to the reflector.
  • Step 702 Determine the exit angle of the reflected sound signal relative to the sound source based on the mirror image listener position and the sound source position.
  • Step 703 Determine a directional gain parameter of the reflected sound signal based on the exit angle, the directional information of the sound source, and the sound cone information.
  • For a detailed description of steps 701 - 703, please refer to the above embodiment description.
  • For the beneficial effects of this embodiment, please refer to the above description.
  • FIG8 is a flow chart of an audio signal rendering method provided by an embodiment of the present disclosure. The method is executed by a signal receiving end. As shown in FIG8 , the audio signal rendering method may include the following steps:
  • Step 801 Determine a reflection gain parameter of the reflected sound signal based on the room information.
  • For a detailed description of step 801, please refer to the above embodiment description.
  • For the beneficial effects of this embodiment, please refer to the above description.
  • FIG9 is a flow chart of an audio signal rendering method provided by an embodiment of the present disclosure. The method is executed by a signal receiving end. As shown in FIG9 , the audio signal rendering method may include the following steps:
  • Step 901 Determine a mirror image listener position of the listener relative to a reflector.
  • Step 902 Determine the distance delay gain parameter based on the distance between the sound source position and the image listener position, or the distance between the image sound source position and the listener position.
  • For a detailed description of steps 901 - 902, please refer to the above embodiment description.
  • For the beneficial effects of this embodiment, please refer to the above description.
  • FIG10 is a flow chart of an audio signal rendering method provided by an embodiment of the present disclosure. The method is executed by a signal receiving end. As shown in FIG10 , the audio signal rendering method may include the following steps:
  • Step 1001 In response to there being a plurality of audio object signals to be rendered, downmix the plurality of rendered audio object signals to obtain the final output.
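  • A minimal sketch of the downmix, assuming each rendered object is a (samples, 2) binaural array and that a plain sum is the intended combination (the present disclosure does not fix the exact operation):

```python
import numpy as np

def downmix(rendered_signals):
    """Sum the binaural renderings of all objects into one output,
    zero-padding shorter signals to the longest length."""
    n = max(s.shape[0] for s in rendered_signals)
    out = np.zeros((n, 2))
    for s in rendered_signals:
        out[:s.shape[0], :] += s
    return out
```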
  • For the beneficial effects of this embodiment, please refer to the above description.
  • the method of the present disclosure mainly utilizes a sound cone model to improve the reverberation module and to simulate the reflected sound received by the two ears in different directions.
  • the rendering method involved in the present disclosure also belongs to the object-based spatial audio technology, and its overall workflow is shown in Figure 11.
  • the innovation of the present disclosure lies mainly in the "audio orientation rendering" module in Figure 11.
  • the orientation-related parameters (angle information, room information, sound cone information) in the metadata are used to render the audio object in the orientation.
  • the orientation rendering technology disclosed in the present disclosure generates binaural signals of the audio object in different orientations according to the scene (room information) and position (spatial orientation of the audio object in other metadata) of the audio object, thereby achieving a more realistic sense of the direction of the sound source.
  • the basic idea of this method is to use the image source and the sound cone model to process the direct sound and the early reflection sound.
  • the image source method is used to calculate the position of the sound source image and obtain the delayed HRIR after being absorbed by the wall. It should be noted that in the scenario of real-time communication, the cost of calculating and processing all reflected sound rays is too high. The first-order or second-order reflections are sufficient to provide a good sense of direction, and other reflections can be processed in the reverberation function.
  • the sound cone model is used to assign different gains to each reflected sound ray. In this way, when the direction changes, the left and right channels presented will be different.
  • orientation rendering is performed using the orientation rendering parameters (the sound source angle information, the room parameter information, and the sound cone information of the audio object) together with the spatial position of the audio object.
  • the spatial position of the audio object can be given as coordinates relative to the listener, or as absolute coordinates in the scene.
  • the process of orientation rendering is shown in Figure 12.
  • the sound source orientation rendering is divided into two parts. One part uses the sound cone information to compute the gain parameter of the direct sound; the direct sound obtains its corresponding HRIR by the usual spatial audio rendering techniques. The other part computes the reflected sound at the listener's position from the room information and the sound cone information.
  • the HRIR of the reflected sound and the related gain parameters are obtained from the position and orientation of the listener and the position and orientation of the sound source.
  • the gain parameters are used to process the HRIR of the reflected sound.
  • the two parts of the HRIR are then weighted and summed to obtain the HRIR that carries the object's sense of orientation, which is used to perform binaural rendering of the audio object signal. This process is repeated for multiple objects, which are then downmixed to obtain the final output.
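To make the weighted-sum step concrete, here is a hedged sketch of fusing the gain-processed HRIRs and performing binaural rendering by convolution. It assumes NumPy/SciPy, HRIRs stored as 2 x N arrays (left/right rows), integer sample delays, and a `length` chosen large enough to hold every delayed HRIR; none of these choices are mandated by the disclosure.

```python
import numpy as np
from scipy.signal import fftconvolve

def fuse_hrirs(direct_hrir, reflected_hrirs, length):
    """Weighted sum of the gain-processed direct-path HRIR and the
    gain/delay-processed reflected-path HRIRs into one (2, length) HRIR.
    `reflected_hrirs` is a list of (delay_in_samples, hrir) pairs."""
    fused = np.zeros((2, length))
    fused[:, :direct_hrir.shape[1]] += direct_hrir
    for delay, hrir in reflected_hrirs:
        fused[:, delay:delay + hrir.shape[1]] += hrir
    return fused

def render_object(signal, fused_hrir):
    """Binaural rendering: convolve the mono object signal with the
    fused left and right impulse responses."""
    left = fftconvolve(signal, fused_hrir[0])
    right = fftconvolve(signal, fused_hrir[1])
    return np.stack([left, right])
```

The downmix of multiple rendered objects is then simply a sum of their binaural outputs; a sketch of that step is given further below.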
  • for the reflected sound processing: first, compute the positions of the mirror sound sources and the mirror listeners; the line between the real listener and each mirror sound source forms a reflected sound ray. Then, use the mirror positions and the metadata to obtain the gain and delay of each reflected sound ray.
  • the gain includes the orientation gain value, the reflection gain value and the distance gain value.
  • the distance loss of each sound ray is based on the distance between the real listener position and the mirror sound source, or on the distance between the real sound source position and the mirror listener.
  • the reflection loss is based on the reflection coefficient in the metadata.
  • the orientation loss of the reflected sound is based on the sound cone model and the exit angle of the reflected sound ray relative to the direction of the sound source.
  • the distance loss and delay are determined according to the distance between the real listener position and the mirror sound source.
  • the head-related impulse response (HRIR) corresponding to each sound ray is obtained from the head-related transfer function database according to the angle of incidence of the reflected sound ray relative to the head orientation. All head-related impulse responses are summed and processed using the computed gains and delays to obtain the final head-related impulse response for binaural rendering.
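The mirror-source geometry described above can be sketched as follows for first-order reflections in an axis-aligned ("shoebox") room with one corner at the origin; the shoebox assumption, the 1/r distance law, and a single frequency-independent reflection coefficient are illustrative simplifications.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, an assumed value

def first_order_rays(src, listener, room_size, reflection_coeff, fs):
    """One ray per wall of a shoebox room: mirror the source across the
    wall plane, then derive the reflection gain, distance gain, and
    propagation delay of the ray from the mirror-source position."""
    rays = []
    for axis in range(3):
        for wall in (0.0, room_size[axis]):
            mirror = np.array(src, dtype=float)
            mirror[axis] = 2.0 * wall - mirror[axis]   # reflect across the wall
            d = float(np.linalg.norm(mirror - np.asarray(listener)))
            rays.append({
                "mirror_source": mirror,
                "reflection_gain": reflection_coeff,
                "distance_gain": 1.0 / max(d, 1.0),    # simple 1/r law
                "delay_samples": int(round(d / SPEED_OF_SOUND * fs)),
            })
    return rays
```

The orientation gain of each ray would then come from the sound cone model, evaluated at the ray's exit angle relative to the source's facing direction.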
  • This disclosure proposes a spatial audio rendering method with a sense of sound source orientation. On the basis of the existing technology, the sound cone model is combined to improve the reverberation algorithm, achieving rendering of the sense of sound source orientation.
  • the technical solution proposed in this disclosure can enhance the experience of sound source direction rendering and make the rendering effect closer to the real situation.
  • FIG. 13 is a schematic diagram of the structure of a communication device provided by an embodiment of the present disclosure. As shown in FIG. 13 , the device may include:
  • a processing module used for determining a first gain head-related impulse response HRIR corresponding to a direct sound signal in an audio object signal to be rendered;
  • the processing module is further used to determine a second gain HRIR corresponding to at least one reflected sound signal in the audio object signal to be rendered;
  • the processing module is further used to determine a fused HRIR corresponding to the audio object signal to be rendered based on the first gain HRIR and the second gain HRIR;
  • the processing module is further used to render the audio object signal to be rendered based on the fused HRIR.
  • with the communication apparatus provided, the signal receiving end determines the first gain HRIR corresponding to the direct sound signal in the audio object signal to be rendered; it also determines the second gain HRIR corresponding to at least one reflected sound signal in that audio object signal; it then determines the fused HRIR corresponding to the audio object signal based on the first gain HRIR and the second gain HRIR, and renders the audio object signal based on the fused HRIR.
  • the first gain HRIR reflects the gain loss caused by the source orientation and by the transmission distance during transmission of the direct sound signal.
  • the second gain HRIR reflects at least one of the gain loss caused by the source orientation, the gain loss caused by reflection, and the gain loss caused by the transmission distance during transmission of the reflected sound signal. Therefore, when the audio object signal to be rendered is rendered based on the fused HRIR obtained from the first gain HRIR and the second gain HRIR, the rendering effect is ensured, the real environment of the sound signal is restored as far as possible, the experience of source-orientation rendering is improved, and the rendering result is closer to the real situation. Moreover, the rendering method of the present disclosure allows personalized processing of different types of sound sources and has a wide scope of application; at the same time, during rendering, it allows targeted processing of different sound sources and fully takes into account the scene information of the scene where the source is located, which further ensures the rendering effect.
  • the processing module is further configured to: receive the code stream sent by the signal transmitting end; decode the code stream to obtain at least one audio object signal to be rendered and the metadata of the at least one audio object signal to be rendered;
  • based on the metadata of the audio object signal to be rendered, the first gain HRIR corresponding to the direct sound signal and the second gain HRIR corresponding to the reflected sound signal in the audio object signal to be rendered are determined.
  • the metadata includes at least one of the following:
  • orientation information of the sound source of the audio object signal to be rendered; sound cone information of the sound source; room information of the room where the sound source of the audio object signal to be rendered is located; spatial position information of the sound source;
  • orientation information of the listener; spatial position information of the listener.
  • the processing module is further configured to: determine the angle of incidence of the direct sound signal relative to the listener based on the spatial position information of the sound source and/or of the listener; determine the first HRIR corresponding to the direct sound signal based on that angle of incidence and the orientation information of the listener; determine the orientation gain parameter of the direct sound signal based on the orientation information of the sound source and the sound cone information; determine the distance gain parameter of the direct sound signal based on the spatial position information of the sound source and/or of the listener;
  • the first gain HRIR is determined based on the orientation gain parameter, the distance gain parameter, and the first HRIR.
  • the processing module is further configured to: determine the sound source position and the listener position based on the spatial position information of the sound source and/or of the listener;
  • a distance gain parameter of the direct sound signal is determined based on the distance between the sound source position and the listener position.
  • the processing module is further configured to: determine the angle of incidence of the reflected sound signal relative to the listener based on the spatial position information of the sound source and/or of the listener; determine the second HRIR corresponding to the reflected sound signal based on that angle of incidence and the orientation information of the listener; determine at least one of the orientation gain parameter, the reflection gain parameter, and the distance delay gain parameter of the reflected sound signal based on the information in the metadata;
  • the second gain HRIR is determined based on at least one of the orientation gain parameter, the reflection gain parameter, and the distance delay gain parameter, together with the second HRIR.
  • the processing module is further configured to: determine the sound source position and the listener position based on the spatial position information; determine the image sound source position of the sound source position with respect to the reflector;
  • the reflector is an object that has reflected the sound signal;
  • the angle of incidence of the reflected sound signal relative to the listener is determined based on the image sound source position and the listener position.
  • the processing module is further configured to: determine the image listener position of the listener with respect to the reflector; determine the exit angle of the reflected sound signal relative to the sound source based on the image listener position and the sound source position;
  • the orientation gain parameter of the reflected sound signal is determined based on the exit angle, the orientation information of the sound source, and the sound cone information.
  • the processing module is further configured to:
  • a reflection gain parameter of the reflected sound signal is determined based on the room information.
  • the processing module is further configured to: determine the image listener position of the listener with respect to the reflector;
  • the distance delay gain parameter is determined based on the distance between the sound source position and the image listener position, or the distance between the image sound source position and the listener position.
  • the processing module is further configured to:
  • the fused HRIR is obtained by a weighted summation of the first gain HRIR corresponding to the direct sound signal of the audio object signal to be rendered and the second gain HRIR corresponding to at least one reflected sound signal of the audio object signal to be rendered.
  • in response to there being a plurality of audio object signals to be rendered, the apparatus is further configured to: downmix the plurality of rendered audio object signals.
  • the reflected sound signal includes an early reflected sound signal.
  • the early reflected sound signal includes at least one of the following:
  • a first-order reflected sound signal; a second-order reflected sound signal.
  • FIG 14 is a schematic diagram of the structure of a communication device 1400 provided in an embodiment of the present application.
  • the communication device 1400 can be a network device, or a terminal device, or a chip, a chip system, or a processor that supports the network device to implement the above method, or a chip, a chip system, or a processor that supports the terminal device to implement the above method.
  • the device can be used to implement the method described in the above method embodiment, and the details can be referred to the description in the above method embodiment.
  • the communication device 1400 may include one or more processors 1401.
  • the processor 1401 may be a general-purpose processor or a dedicated processor, etc.
  • it may be a baseband processor or a central processing unit.
  • the baseband processor may be used to process the communication protocol and communication data
  • the central processing unit may be used to control the communication device (such as a base station, a baseband chip, a terminal device, a terminal device chip, a DU or a CU, etc.), execute a computer program, and process the data of the computer program.
  • the communication device 1400 may further include one or more memories 1402, on which a computer program 1404 may be stored, and the processor 1401 executes the computer program 1404 so that the communication device 1400 performs the method described in the above method embodiment.
  • data may also be stored in the memory 1402.
  • the communication device 1400 and the memory 1402 may be provided separately or integrated together.
  • the communication device 1400 may further include a transceiver 1405 and an antenna 1406.
  • the transceiver 1405 may be referred to as a transceiver unit, a transceiver, or a transceiver circuit, etc., and is used to implement a transceiver function.
  • the transceiver 1405 may include a receiver and a transmitter, the receiver may be referred to as a receiver or a receiving circuit, etc., and is used to implement a receiving function; the transmitter may be referred to as a transmitter or a transmitting circuit, etc., and is used to implement a transmitting function.
  • the communication device 1400 may further include one or more interface circuits 1407.
  • the interface circuit 1407 is used to receive code instructions and transmit them to the processor 1401.
  • the processor 1401 runs the code instructions to enable the communication device 1400 to perform the method described in the above method embodiment.
  • the processor 1401 may include a transceiver for implementing the receiving and sending functions.
  • the transceiver may be a transceiver circuit, an interface, or an interface circuit.
  • the transceiver circuit, interface, or interface circuit for implementing the receiving and sending functions may be separate or integrated.
  • the above-mentioned transceiver circuit, interface, or interface circuit may be used for reading and writing code/data, or the above-mentioned transceiver circuit, interface, or interface circuit may be used for transmitting or delivering signals.
  • the processor 1401 may store a computer program 1403, which runs on the processor 1401 and enables the communication device 1400 to perform the method described in the above method embodiment.
  • the computer program 1403 may be fixed in the processor 1401, in which case the processor 1401 may be implemented by hardware.
  • the communication device 1400 may include a circuit that can implement the functions of sending or receiving or communicating in the aforementioned method embodiments.
  • the processor and transceiver described in the present application can be implemented in an integrated circuit (IC), an analog IC, a radio frequency integrated circuit RFIC, a mixed signal IC, an application specific integrated circuit (ASIC), a printed circuit board (PCB), an electronic device, etc.
  • the processor and transceiver can also be manufactured using various IC process technologies, such as complementary metal oxide semiconductor (CMOS), N-type metal oxide semiconductor (nMetal-oxide-semiconductor, NMOS), P-type metal oxide semiconductor (positive channel metal oxide semiconductor, PMOS), bipolar junction transistor (bipolar junction transistor, BJT), bipolar CMOS (BiCMOS), silicon germanium (SiGe), gallium arsenide (GaAs), etc.
  • the communication device described in the above embodiments may be a network device or a terminal device, but the scope of the communication device described in the present application is not limited thereto, and the structure of the communication device may not be limited by FIG. 14.
  • the communication device may be an independent device or may be part of a larger device.
  • the communication device may be: (1) an independent integrated circuit (IC), a chip, or a chip system or subsystem;
  • (2) a set of one or more ICs; optionally, the IC set may also include a storage component for storing data and computer programs;
  • (3) an ASIC, such as a modem;
  • (4) a module that can be embedded in other devices;
  • (5) a receiver, terminal device, intelligent terminal device, cellular phone, wireless device, handset, mobile unit, vehicle-mounted device, network device, cloud device, artificial intelligence device, and so on;
  • (6) others.
  • for the case where the communication device is a chip or a chip system, refer to the schematic diagram of the chip structure shown in Figure 15; the chip shown in Figure 15 includes a processor 1501 and an interface 1502.
  • the number of processors 1501 can be one or more, and the number of interfaces 1502 can be multiple.
  • the chip further includes a memory 1503, and the memory 1503 is used to store necessary computer programs and data.
  • the present application also provides a readable storage medium having instructions stored thereon, which implement the functions of any of the above method embodiments when executed by a computer.
  • the present application also provides a computer program product, which implements the functions of any of the above method embodiments when executed by a computer.
  • the computer program product includes one or more computer programs.
  • the computer can be a general-purpose computer, a special-purpose computer, a computer network, or other programmable device.
  • the computer program can be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another computer-readable storage medium.
  • the computer program can be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wire (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wirelessly (e.g., infrared, radio, microwave).
  • the computer-readable storage medium can be any available medium accessible to a computer, or a data storage device such as a server or data center that integrates one or more available media.
  • the available medium may be a magnetic medium (e.g., a floppy disk, a hard disk, a magnetic tape), an optical medium (e.g., a high-density digital video disc (DVD)), or a semiconductor medium (e.g., a solid state disk (SSD)), etc.
  • At least one in the present application can also be described as one or more, and a plurality can be two, three, four or more, which is not limited in the present application.
  • for a given kind of technical feature, the technical features within it are distinguished by "first", "second", "third", "A", "B", "C", "D", etc., and there is no order of precedence or magnitude among the technical features so described.
  • the corresponding relationships shown in each table in the present application can be configured or predefined.
  • the values of the information in each table are only examples and can be configured as other values, which are not limited by the present application.
  • the corresponding relationships shown in some rows may not be configured.
  • appropriate adjustments, such as splitting and merging, can be made based on the above tables.
  • the names of the parameters shown in the titles in the above tables can also use other names that can be understood by the communication device, and the values or representations of the parameters can also be other values or representations that can be understood by the communication device.
  • other data structures can also be used, such as arrays, queues, containers, stacks, linear lists, pointers, linked lists, trees, graphs, structures, classes, heaps, or hash tables.
  • the predefined in the present application may be understood as defined, predefined, stored, pre-stored, pre-negotiated, pre-configured, solidified, or pre-burned.

Landscapes

  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Stereophonic System (AREA)

Abstract

The present disclosure provides an audio signal rendering method, apparatus, device, and storage medium. The method includes: determining a first gain HRIR corresponding to a direct sound signal in an audio object signal to be rendered; determining a second gain HRIR corresponding to at least one reflected sound signal in the audio object signal to be rendered; determining a fused HRIR corresponding to the audio object signal to be rendered based on the first gain HRIR and the second gain HRIR; and rendering the audio object signal to be rendered based on the fused HRIR. The method of the present disclosure ensures the rendering effect, restores the real environment of the sound signal as far as possible, improves the experience of source-orientation rendering, and brings the rendering result closer to the real situation.

Description

Audio signal rendering method, apparatus, device and storage medium — Technical Field
The present disclosure relates to the field of communication technology, and in particular to an audio signal rendering method, apparatus, device, and storage medium.
Background Art
Since spatial audio can give users a realistic sense of space and direction, spatial audio processing technology has been widely applied. One effective way of implementing spatial audio is object-based spatial audio technology, which mainly consists of two parts: codec and rendering. The codec part transmits the audio object signals to be played at the user end together with the corresponding metadata, and the rendering part reproduces the spatial audio from the received audio object signals and their metadata.
In the related art, methods for rendering an audio object signal include at least one of the following:
Method 1: compute the orientation gain parameter of the audio object signal from the sound cone information of the audio object signal, and then render the audio object signal based on that orientation gain parameter.
Method 2: train a set of head-related transfer functions (HRTF) to simulate how the sound reaching the human ear changes when the source is at different positions and orientations, obtain the orientation information of the source corresponding to the audio object signal, and during rendering use the trained HRTF filters to obtain the rendered audio object signal based on that orientation information.
However, the rendering effects of both method 1 and method 2 are poor, so the perceived source orientation is not very realistic. Moreover, method 2 is not suitable for personalized adjustment of different types of sound sources and has a narrow scope of application.
Summary
The audio signal rendering method, apparatus, device, and storage medium proposed by the present disclosure are intended to solve the technical problems of poor rendering effect and narrow scope of application in the related art.
In a first aspect, an embodiment of the present disclosure provides an audio signal rendering method, performed by a signal receiving end, including:
determining a first gain head-related impulse response (HRIR) corresponding to a direct sound signal in an audio object signal to be rendered;
determining a second gain HRIR corresponding to at least one reflected sound signal in the audio object signal to be rendered;
determining a fused HRIR corresponding to the audio object signal to be rendered based on the first gain HRIR and the second gain HRIR;
rendering the audio object signal to be rendered based on the fused HRIR.
In the present disclosure, the signal receiving end determines the first gain HRIR corresponding to the direct sound signal in the audio object signal to be rendered, and also determines the second gain HRIR corresponding to at least one reflected sound signal in the audio object signal to be rendered; it then determines the fused HRIR corresponding to the audio object signal to be rendered based on the first gain HRIR and the second gain HRIR, and renders the audio object signal to be rendered based on the fused HRIR. In the present disclosure, the first gain HRIR reflects the gain loss caused by the source orientation and by the distance during transmission of the direct sound signal, and the second gain HRIR reflects at least one of the gain loss caused by the source orientation, the gain loss caused by reflection, and the gain loss caused by the transmission distance during transmission of the reflected sound signal. Therefore, when the audio object signal to be rendered is rendered based on the fused HRIR obtained from the first gain HRIR and the second gain HRIR, the rendering effect is ensured, the real environment of the sound signal is restored as far as possible, the experience of source-orientation rendering is improved, and the rendering result is closer to the real situation. In addition, the rendering method of the present disclosure also allows personalized processing of different types of sound sources and has a wide scope of application; at the same time, during rendering, the present disclosure allows targeted processing of different sound sources and fully takes into account the scene information of the scene where the source is located, which further ensures the rendering effect.
In a second aspect, an embodiment of the present disclosure provides a communication apparatus, configured in a signal receiving end, including:
a processing module, configured to determine a first gain head-related impulse response (HRIR) corresponding to a direct sound signal in an audio object signal to be rendered;
the processing module being further configured to determine a second gain HRIR corresponding to at least one reflected sound signal in the audio object signal to be rendered;
the processing module being further configured to determine a fused HRIR corresponding to the audio object signal to be rendered based on the first gain HRIR and the second gain HRIR;
the processing module being further configured to render the audio object signal to be rendered based on the fused HRIR.
In a third aspect, an embodiment of the present disclosure provides a communication apparatus including a processor; when the processor calls a computer program in a memory, the method of the first aspect is performed.
In a fourth aspect, an embodiment of the present disclosure provides a communication apparatus including a processor and a memory in which a computer program is stored; the processor executes the computer program stored in the memory, so that the communication apparatus performs the method of the first aspect.
In a fifth aspect, an embodiment of the present disclosure provides a communication apparatus including a processor and an interface circuit; the interface circuit is configured to receive code instructions and transmit them to the processor, and the processor is configured to run the code instructions to cause the apparatus to perform the method of the first aspect.
In a sixth aspect, an embodiment of the present disclosure provides a communication system including the communication apparatus of the second aspect, or of the third aspect, or of the fourth aspect, or of the fifth aspect.
In a seventh aspect, an embodiment of the present disclosure provides a computer-readable storage medium for storing the instructions used by the above device; when the instructions are executed, the device is caused to perform the method of the first aspect.
In an eighth aspect, the present disclosure further provides a computer program product including a computer program which, when run on a computer, causes the computer to perform the method of the first aspect.
In a ninth aspect, the present disclosure provides a chip system including at least one processor and an interface for supporting a network device in implementing the functions involved in the method of the first aspect, for example, determining or processing at least one of the data and information involved in the above method. In a possible design, the chip system further includes a memory for storing the computer programs and data necessary for the source secondary node. The chip system may consist of chips, or may include chips and other discrete components.
In a tenth aspect, the present disclosure provides a computer program which, when run on a computer, causes the computer to perform the method of the first aspect.
Brief Description of the Drawings
The above and/or additional aspects and advantages of the present disclosure will become apparent and easy to understand from the following description of the embodiments in conjunction with the accompanying drawings, in which:
FIG. 1 is a schematic architectural diagram of a communication system provided by an embodiment of the present disclosure;
FIG. 2a is a schematic flowchart of an audio signal rendering method provided by another embodiment of the present disclosure;
FIG. 2b is a schematic diagram of a direct sound signal and a reflected sound signal provided by an embodiment of the present disclosure;
FIG. 3a is a schematic flowchart of an audio signal rendering method provided by yet another embodiment of the present disclosure;
FIG. 3b is a schematic diagram of the sound cone inner angle and outer angle provided by an embodiment of the present disclosure;
FIG. 4 is a schematic flowchart of an audio signal rendering method provided by still another embodiment of the present disclosure;
FIG. 5 is a schematic flowchart of an audio signal rendering method provided by another embodiment of the present disclosure;
FIG. 6 is a schematic flowchart of an audio signal rendering method provided by yet another embodiment of the present disclosure;
FIG. 7 is a schematic flowchart of an audio signal rendering method provided by still another embodiment of the present disclosure;
FIG. 8 is a schematic flowchart of an audio signal rendering method provided by still another embodiment of the present disclosure;
FIG. 9 is a schematic flowchart of an audio signal rendering method provided by still another embodiment of the present disclosure;
FIG. 10 is a schematic flowchart of an audio signal rendering method provided by still another embodiment of the present disclosure;
FIG. 11 is a schematic flowchart of an audio signal rendering method provided by still another embodiment of the present disclosure;
FIG. 12 is a schematic flowchart of an audio signal rendering method provided by still another embodiment of the present disclosure;
FIG. 13 is a schematic structural diagram of a communication apparatus provided by another embodiment of the present disclosure;
FIG. 14 is a schematic structural diagram of a communication apparatus provided by an embodiment of the present application;
FIG. 15 is a schematic structural diagram of a chip provided by an embodiment of the present disclosure.
具体实施方式
这里将详细地对示例性实施例进行说明,其示例表示在附图中。下面的描述涉及附图时,除非另有表示,不同附图中的相同数字表示相同或相似的要素。以下示例性实施例中所描述的实施方式并不代表 与本公开实施例相一致的所有实施方式。相反,它们仅是与如所附权利要求书中所详述的、本公开实施例的一些方面相一致的装置和方法的例子。
在本公开实施例使用的术语是仅仅出于描述特定实施例的目的,而非旨在限制本公开实施例。在本公开实施例和所附权利要求书中所使用的单数形式的“一种”和“该”也旨在包括多数形式,除非上下文清楚地表示其他含义。还应当理解,本文中使用的术语“和/或”是指并包含一个或多个相关联的列出项目的任何或所有可能组合。
应当理解,尽管在本公开实施例可能采用术语第一、第二、第三等来描述各种信息,但这些信息不应限于这些术语。这些术语仅用来将同一类型的信息彼此区分开。例如,在不脱离本公开实施例范围的情况下,第一信息也可以被称为第二信息,类似地,第二信息也可以被称为第一信息。取决于语境,如在此所使用的词语“如果”及“若”可以被解释成为“在……时”或“当……时”或“响应于确定”。
下面详细描述本公开的实施例,所述实施例的示例在附图中示出,其中自始至终相同或类似的标号表示相同或类似的要素。下面通过参考附图描述的实施例是示例性的,旨在用于解释本公开,而不能理解为对本公开的限制。
For a better understanding of the audio signal rendering method disclosed in the embodiments of the present disclosure, the communication system to which the embodiments apply is first described below.
Referring to FIG. 1, FIG. 1 is a schematic architectural diagram of a communication system provided by an embodiment of the present disclosure. The communication system may include, but is not limited to, one signal transmitting device and one signal receiving device, each of which may be a network device or a terminal device. The number and form of the devices shown in FIG. 1 are only an example and do not limit the embodiments of the present disclosure; a practical application may include two or more signal transmitting devices and two or more signal receiving devices. The communication system shown in FIG. 1 includes, by way of example, one signal transmitting device 11 and one signal receiving device 12.
It should be noted that the technical solutions of the embodiments of the present disclosure can be applied to various communication systems, for example a long term evolution (LTE) system, a 5th generation (5G) mobile communication system, a 5G new radio (NR) system, or other future mobile communication systems.
The network device in the embodiments of the present disclosure is an entity on the network side for transmitting or receiving signals. For example, the network device 11 may be an evolved NodeB (eNB), a transmission reception point (TRP), a next generation NodeB (gNB) in an NR system, a base station in other future mobile communication systems, or an access node in a wireless fidelity (WiFi) system. The embodiments of the present disclosure do not limit the specific technology and device form adopted by the network device. The network device provided by the embodiments of the present disclosure may consist of a central unit (CU) and distributed units (DU), where the CU may also be called a control unit. The CU-DU structure splits the protocol layers of a network device such as a base station: the functions of some protocol layers are placed under centralized control in the CU, while the functions of some or all of the remaining protocol layers are distributed in the DUs, which are centrally controlled by the CU.
The terminal device in the embodiments of the present disclosure is an entity on the user side for receiving or transmitting signals, such as a mobile phone. The terminal device may also be called a terminal, user equipment (UE), mobile station (MS), mobile terminal (MT), etc. The terminal device may be a vehicle with communication capability, a smart vehicle, a mobile phone, a wearable device, a tablet (Pad), a computer with wireless transceiver capability, a virtual reality (VR) terminal device, an augmented reality (AR) terminal device, a wireless terminal device in industrial control, in self-driving, in remote medical surgery, in a smart grid, in transportation safety, in a smart city, in a smart home, and so on. The embodiments of the present disclosure do not limit the specific technology and device form adopted by the terminal device.
It can be understood that the communication system described in the embodiments of the present disclosure is intended to explain the technical solutions of the embodiments more clearly and does not limit them. A person of ordinary skill in the art will appreciate that, as the system architecture evolves and new service scenarios emerge, the technical solutions provided are equally applicable to similar technical problems.
The audio signal rendering method, apparatus, device, and storage medium provided by the embodiments of the present disclosure are described in detail below with reference to the accompanying drawings.
It should be noted that in the present disclosure, the audio signal rendering method provided by any embodiment may be performed alone; any implementation within an embodiment may also be performed alone, or in combination with other embodiments or with possible implementations in other embodiments; and it may further be performed together with any technical solution in the related art.
FIG. 2a is a schematic flowchart of an audio signal rendering method provided by an embodiment of the present disclosure. The method is performed by a signal receiving end and, as shown in FIG. 2a, may include the following steps.
Step 201: determine a first gain head-related impulse response (HRIR) corresponding to the direct sound signal in the audio object signal to be rendered.
In an embodiment of the present disclosure, the direct sound signal may be the signal whose transmission path coincides with the straight line between the sound source and the listener; that is, the direct sound signal is the signal emitted by the source that reaches the listener's position directly, without reflection, refraction, or diffraction. Each audio object signal includes one direct sound signal.
FIG. 2b is a schematic diagram of a direct sound signal and a reflected sound signal provided by an embodiment of the present disclosure. As can be seen from parts (1) and (2) of FIG. 2b, when the source orientation changes, the listener's two ears receive different sound signals, because there may be obstacles (such as walls) in the space where the source is located. In part (1) of FIG. 2b, the source faces the listener: the listener's left ear receives a strong direct sound signal, and also receives a reflected sound signal that has undergone cone radiation attenuation followed by one reflection. In part (2) of FIG. 2b, the source faces a region in front of the listener: the listener's left ear receives a direct sound signal attenuated by cone radiation, and the right ear receives a reflected sound signal that has undergone one reflection.
Also, in an embodiment of the present disclosure, the first gain HRIR may be determined based on the orientation gain parameter and the distance gain parameter of the direct sound signal, and can be used to reflect the gain loss caused by the source orientation and by the distance during transmission of the direct sound signal.
Further, in an embodiment of the present disclosure, "determining the first gain HRIR corresponding to the direct sound signal in the audio object signal to be rendered" may include: determining, for each audio object signal to be rendered, the first gain HRIR corresponding to the direct sound signal in that audio object signal.
How the first gain HRIR is specifically determined is described in later embodiments.
Step 202: determine a second gain HRIR corresponding to at least one reflected sound signal in the audio object signal to be rendered.
In an embodiment of the present disclosure, the reflected sound signal may be a signal whose transmission path coincides with the straight line between an image source position and the listener, where the image source position is the mirror image of the source position with respect to a reflector; how the image source position is determined is explained in detail in later embodiments.
In an embodiment of the present disclosure, a reflected sound signal can be understood as a signal emitted by the source that reaches the listener's position after reflection.
The second gain HRIR may be determined based on at least one of the orientation gain parameter, the reflection gain parameter, and the distance delay gain parameter of the reflected sound signal, and can be used to reflect at least one of the following losses during transmission of the reflected sound signal:
the gain loss caused by the source orientation;
the gain loss caused by reflection;
the gain loss caused by the transmission distance, together with the transmission delay loss.
It should be noted that, in an embodiment of the present disclosure, the first gain HRIR and the second gain HRIR are computed mainly so that the audio object signal can subsequently be rendered based on them, ensuring that all the above gain losses are taken into account during rendering, guaranteeing rendering accuracy, improving the experience of source-orientation rendering, and bringing the rendering result closer to the real situation.
In an embodiment of the present disclosure, "determining the second gain HRIR corresponding to the reflected sound signal in the audio object signal to be rendered" mainly means: determining, for each audio object signal to be rendered, the second gain HRIRs corresponding to the reflected sound signals in that audio object signal, where each audio object signal includes at least one reflected sound signal. If a second gain HRIR were computed for every reflected sound signal of an audio object signal, the computational load would be large. Therefore, in an embodiment of the present disclosure, only the second gain HRIRs of the early reflections may be computed, to reduce the computational load; the second gain HRIRs of the early reflections are sufficient to provide a good sense of direction and largely preserve the rendering effect.
In an embodiment of the present disclosure, the early reflected sound signal may include at least one of the following:
a first-order reflected sound signal (i.e., a sound signal reflected only once);
a second-order reflected sound signal (i.e., a sound signal reflected twice).
In addition, it should be emphasized that both a "distance gain parameter" and a "distance delay gain parameter" appear above, and the two differ. The distance gain parameter reflects losses such as the radiation attenuation caused by distance during transmission of the sound signal; the distance delay gain parameter reflects both the radiation attenuation caused by distance and the transmission delay caused by distance.
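As a non-limiting illustration, one common way to formalize this distinction is

$$
g_{\text{dist}}(d)=\frac{1}{\max(d,\,d_{0})},\qquad
\tau(d)=\frac{d}{c},\qquad
n_{\tau}=\operatorname{round}\!\big(\tau(d)\,f_{s}\big),
$$

where $d$ is the propagation distance, $d_{0}$ is a reference distance that avoids amplification at very short ranges, $c \approx 343\ \text{m/s}$ is the speed of sound, and $f_{s}$ is the sample rate. The distance gain parameter would then carry only $g_{\text{dist}}$, while the distance delay gain parameter would carry both $g_{\text{dist}}$ and the delay of $n_{\tau}$ samples. The $1/r$ attenuation law is an assumption of this illustration; the disclosure does not prescribe a particular law.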
Step 203: determine the fused HRIR corresponding to the audio object signal to be rendered based on the first gain HRIR and the second gain HRIR.
In an embodiment of the present disclosure, the fused HRIR may be obtained by a weighted summation of the first gain HRIR corresponding to the direct sound signal of the audio object signal to be rendered and the second gain HRIRs corresponding to at least one reflected sound signal of the audio object signal to be rendered.
In an embodiment of the present disclosure, a fused HRIR is specifically determined for each audio object signal to be rendered.
Step 204: render the audio object signal to be rendered based on the fused HRIR.
As described above, in an embodiment of the present disclosure, the fused HRIR is determined from the first gain HRIR and the second gain HRIR corresponding to the audio object signal to be rendered, where the first gain HRIR reflects the gain loss caused by the source orientation and the distance during transmission of the direct sound signal, and the second gain HRIR reflects at least one of the gain loss caused by the source orientation, the gain loss caused by reflection, and the gain loss caused by the transmission distance during transmission of the reflected sound signal. On this basis, when the audio object signal to be rendered is rendered based on the fused HRIR, the rendering takes into account not only the gain loss caused by the source orientation during transmission of the direct sound signal, but also the gain losses caused by the source orientation, by reflection, and by the transmission distance during transmission of the reflected sound signals, thereby ensuring the rendering effect of the audio object signal, restoring the real environment of the sound signal as far as possible, improving the experience of source-orientation rendering, and bringing the rendering result closer to the real situation.
Moreover, in an embodiment of the present disclosure, "rendering the audio object signal to be rendered based on the fused HRIR" mainly means rendering each audio object signal to be rendered based on its own fused HRIR. Thus, the method of the present disclosure performs targeted rendering for each audio object signal to be rendered; that is, it achieves targeted processing of different sound sources, which further ensures the rendering effect.
Furthermore, in an embodiment of the present disclosure, the reflection gain parameter, the distance gain parameter, and the distance delay gain parameter can all reflect the scene information of the scene where the source is located. Thus, when rendering a signal, the method of the present disclosure fully takes into account the scene information of the scene where the source is located, ensuring rendering accuracy.
In addition, in an embodiment of the present disclosure, the rendering of the audio object signal does not depend only on the orientation gain parameter but also refers to the reflection gain parameter and the distance gain parameter. The method of the present disclosure is therefore not limited to processing directionally radiating sources and is also applicable to personalized adjustment of other types of sources (such as diffusely radiating sources), giving it a wide scope of application.
In summary, in the audio signal rendering method provided by the embodiments of the present disclosure, the signal receiving end determines the first gain HRIR corresponding to the direct sound signal in the audio object signal to be rendered, and also determines the second gain HRIR corresponding to at least one reflected sound signal in that audio object signal; it then determines the fused HRIR corresponding to the audio object signal based on the first gain HRIR and the second gain HRIR, and renders the audio object signal based on the fused HRIR. In the present disclosure, the first gain HRIR reflects the gain loss caused by the source orientation and the distance during transmission of the direct sound signal, and the second gain HRIR reflects at least one of the gain loss caused by the source orientation, the gain loss caused by reflection, and the gain loss caused by the transmission distance during transmission of the reflected sound signal. Therefore, when the audio object signal to be rendered is rendered based on the fused HRIR obtained from the first gain HRIR and the second gain HRIR, the rendering effect is ensured, the real environment of the sound signal is restored as far as possible, the experience of source-orientation rendering is improved, and the rendering result is closer to the real situation. Moreover, the rendering method of the present disclosure allows personalized processing of different types of sound sources and has a wide scope of application; at the same time, during rendering, it allows targeted processing of different sound sources and fully takes into account the scene information of the scene where the source is located, which further ensures the rendering effect.
FIG. 3a is a schematic flowchart of an audio signal rendering method provided by an embodiment of the present disclosure. The method is performed by a signal receiving end and, as shown in FIG. 3a, may include the following steps.
Step 301: receive the code stream sent by the signal transmitting end.
Step 302: decode the code stream to obtain at least one audio object signal to be rendered and the metadata of the at least one audio object signal to be rendered.
In an embodiment of the present disclosure, the metadata may include at least one of the following kinds of information:
orientation information of the source of the audio object signal to be rendered;
sound cone information of the source of the audio object signal to be rendered;
room information of the room where the source of the audio object signal to be rendered is located;
spatial position information of the source of the audio object signal to be rendered;
orientation information of the listener;
spatial position information of the listener.
Specifically, in an embodiment of the present disclosure, the above "orientation information of the source" may include the source's facing angle and/or facing direction.
The above "sound cone information of the source" may include at least one of the cone inner angle, the cone outer angle, and the outer-angle gain value.
Each source has a corresponding cone inner angle and cone outer angle; FIG. 3b is a schematic diagram of the cone inner angle and outer angle provided by an embodiment of the present disclosure. The cone inner angle means that a sound signal within the inner-angle range is considered to have no radiation attenuation due to orientation. The cone outer angle means that a sound signal outside the inner-angle range but within the outer-angle range is considered to have radiation attenuation due to orientation, where the outer-angle gain value can be used to express the maximum degree of that orientation-induced attenuation for this part of the signal. Outside the outer-angle range, the sound signal emitted by the source is considered not to attenuate any further.
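Under these semantics, one possible piecewise form of the orientation gain for an off-axis angle $\theta$, with inner angle $\theta_{\text{in}}$, outer angle $\theta_{\text{out}}$, and outer-angle gain value $g_{\text{out}}$, is

$$
g_{\text{cone}}(\theta)=
\begin{cases}
1, & \theta \le \theta_{\text{in}}/2,\\
1+\dfrac{\theta-\theta_{\text{in}}/2}{\theta_{\text{out}}/2-\theta_{\text{in}}/2}\,\big(g_{\text{out}}-1\big), & \theta_{\text{in}}/2<\theta<\theta_{\text{out}}/2,\\
g_{\text{out}}, & \theta \ge \theta_{\text{out}}/2.
\end{cases}
$$

The linear interpolation between the inner and outer angles is an illustrative assumption; the disclosure fixes only the three regimes, not the interpolation law.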
The above room information may include at least one of the size of the room and the reflection coefficients of objects (such as walls) in the room.
The above spatial position information of the source may include the absolute position of the source and/or the position of the source relative to the listener.
The above orientation information of the listener may include the listener's facing angle and/or facing direction.
The above spatial position information of the listener may include the absolute position of the listener and/or the position of the listener relative to the source.
The above absolute and relative positions may be absolute or relative coordinates, respectively. These may be coordinate values in a specific coordinate system, for example a three-dimensional coordinate system whose origin is some point in the room where the source is located.
It should be noted that the information contained in the above spatial position information of the source and of the listener should make it possible to determine the absolute positions of the listener and of the source.
In addition, in an embodiment of the present disclosure, the orientation information of the listener and/or the spatial position information of the listener may also be determined directly by the listener.
Step 303: based on the metadata of the audio object signal to be rendered, determine the first gain HRIR corresponding to the direct sound signal and the second gain HRIR corresponding to the reflected sound signal in the audio object signal to be rendered.
For a detailed description of this part, refer to the description of later embodiments.
FIG. 4 is a schematic flowchart of an audio signal rendering method provided by an embodiment of the present disclosure. The method is performed by a signal receiving end and, as shown in FIG. 4, may include the following steps.
Step 401: determine the angle of incidence of the direct sound signal relative to the listener based on the spatial position information of the source and/or the spatial position information of the listener.
In an embodiment of the present disclosure, the positions of the source and the listener (absolute positions here) may be determined from the spatial position information of the source and/or of the listener; the angle of incidence of the direct sound signal relative to the listener can then be determined from the straight line between the source position and the listener position. The angle between that line and the normal of the incidence surface is the angle of incidence of the direct sound signal relative to the listener.
Step 402: determine the first HRIR corresponding to the direct sound signal based on its angle of incidence relative to the listener and the orientation information of the listener.
Specifically, in an embodiment of the present disclosure, the first HRIR corresponding to the direct sound signal may be determined from the head-related transfer function database based on the angle of incidence of the direct sound signal relative to the listener and the orientation information of the listener.
Step 403: determine the orientation gain parameter of the direct sound signal based on the orientation information of the source and the sound cone information.
In an embodiment of the present disclosure, the orientation information of the source and the sound cone information may be input into a sound cone model, which outputs the orientation gain parameter of the direct sound signal.
Step 404: determine the distance gain parameter of the direct sound signal based on the spatial position information of the source and/or of the listener.
In an embodiment of the present disclosure, the source position and the listener position (absolute positions here) may be determined from the spatial position information of the source and/or of the listener, and the distance gain parameter of the direct sound signal may then be determined from the distance between the source position and the listener position.
Step 405: determine the first gain HRIR based on the orientation gain parameter, the distance gain parameter, and the first HRIR.
Specifically, the first HRIR may be weighted (multiplied) by the orientation gain parameter and the distance gain parameter to obtain the first gain HRIR.
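Purely as an illustrative sketch of steps 401 to 405, the first gain HRIR could be assembled as follows. The database accessor `hrir_db.lookup`, the helper `angle_between`, the attribute names on `cone`, and the reuse of the `cone_gain` helper from the earlier sketch are assumptions; a real HRIR lookup would use azimuth and elevation in the listener's frame rather than the single angle used here for brevity.

```python
import numpy as np

def angle_between(u, v):
    """Angle in degrees between two 3-D vectors."""
    cosang = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return float(np.degrees(np.arccos(np.clip(cosang, -1.0, 1.0))))

def first_gain_hrir(hrir_db, src_pos, src_facing,
                    listener_pos, listener_facing, cone):
    """Steps 401-405 in one place: look up the HRIR for the direct
    path's angle of incidence, then weight it by the orientation gain
    (sound cone) and the distance gain."""
    to_listener = np.asarray(listener_pos) - np.asarray(src_pos)
    d = float(np.linalg.norm(to_listener))

    # Steps 401/402: angle of incidence at the listener -> first HRIR.
    incidence = angle_between(-to_listener, listener_facing)
    hrir = hrir_db.lookup(incidence)                 # shape (2, L)

    # Step 403: exit angle relative to the source facing -> cone gain.
    exit_angle = angle_between(to_listener, src_facing)
    g_orient = cone_gain(exit_angle, cone.inner_deg,
                         cone.outer_deg, cone.outer_gain)

    # Steps 404/405: distance gain, then weighted multiplication.
    g_dist = 1.0 / max(d, 1.0)
    return g_orient * g_dist * hrir
```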
FIG. 5 is a schematic flowchart of an audio signal rendering method provided by an embodiment of the present disclosure. The method is performed by a signal receiving end and, as shown in FIG. 5, may include the following steps.
Step 501: determine the angle of incidence of the reflected sound signal relative to the listener based on the spatial position information of the source and/or the spatial position information of the listener.
In an embodiment of the present disclosure, this determination may include the following steps:
Step a: determine the source position and the listener position (both absolute positions here) based on the spatial position information.
Step b: determine the image source position of the source position with respect to the reflector.
In an embodiment of the present disclosure, the reflector may be an object that has reflected the sound signal, such as a wall.
Also, in an embodiment of the present disclosure, different reflected sound signals correspond to different image source positions.
For a first-order reflected sound signal, the image source position is the mirror image of the source position with respect to the reflector. For a second-order reflected sound signal, the image source position is determined as follows: first determine the first reflector, which reflected the signal the first time, and the second reflector, which reflected it the second time; then determine the first image source position of the source position with respect to the first reflector, and then the second image source position of the first image source position with respect to the second reflector; the second image source position is the image source position of the second-order reflected sound signal.
Step c: determine the angle of incidence of the reflected sound signal relative to the listener based on the image source position and the listener position.
The angle between the line from the image source position to the listener position and the normal of the incidence surface is the angle of incidence of the reflected sound signal relative to the listener.
Step 502: determine the second HRIR corresponding to the reflected sound signal based on its angle of incidence relative to the listener and the orientation information of the listener.
Specifically, in an embodiment of the present disclosure, the second HRIR corresponding to the reflected sound signal may be determined from the head-related transfer function database based on the angle of incidence of the reflected sound signal relative to the listener and the orientation information of the listener.
Step 503: determine at least one of the orientation gain parameter, the reflection gain parameter, and the distance delay gain parameter of the reflected sound signal based on the information in the metadata.
Optionally, in an embodiment of the present disclosure, determining the orientation gain parameter of the reflected sound signal based on the information in the metadata may include:
Step 1: determine the image listener position of the listener with respect to the reflector.
In an embodiment of the present disclosure, the reflector may be an object that has reflected the sound signal, such as a wall.
Also, different reflected sound signals correspond to different image listener positions. For a first-order reflected sound signal, the image listener position is the mirror image of the listener position with respect to the reflector. For a second-order reflected sound signal, the image listener position is determined as follows: first determine the first and second reflectors; then determine the first image listener position of the listener position with respect to the first reflector, then the second image listener position of the first image listener position with respect to the second reflector; the second image listener position is the image listener position of the second-order reflected sound signal.
Step 2: determine the exit angle of the reflected sound signal relative to the source based on the image listener position and the source position.
The angle between the line from the image listener position to the source position and the normal of the exit surface is the exit angle of the reflected sound signal relative to the source.
Step 3: determine the orientation gain parameter of the reflected sound signal based on the exit angle, the orientation information of the source, and the sound cone information.
In an embodiment of the present disclosure, the exit angle, the orientation information of the source, and the sound cone information may be input into the sound cone model, which outputs the orientation gain parameter of the reflected sound signal.
Optionally, in an embodiment of the present disclosure, determining the reflection gain parameter of the reflected sound signal based on the information in the metadata may include: determining the reflection gain parameter of the reflected sound signal based on the room information. Specifically, the reflection coefficient of the object that reflected the reflected sound signal may be determined from the room information, and the reflection gain parameter of the reflected sound signal may then be determined from that reflection coefficient.
Optionally, in an embodiment of the present disclosure, determining the distance delay gain parameter of the reflected sound signal based on the information in the metadata may include:
Step i: determine the image listener position of the listener with respect to the reflector.
Step ii: determine the distance delay gain parameter based on the distance between the source position and the image listener position, or the distance between the image source position and the listener position.
Step 504: determine the second gain HRIR based on at least one of the orientation gain parameter, the reflection gain parameter, and the distance delay gain parameter, together with the second HRIR.
Specifically, the second HRIR may be weighted (multiplied) by at least one of the orientation gain parameter, the reflection gain parameter, and the distance delay gain parameter to obtain the second gain HRIR.
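Correspondingly, a hedged sketch of steps 501 to 504 for a single reflected ray is given below. It reuses `cone_gain` and `angle_between` from the earlier sketches, takes precomputed image positions as inputs, and again collapses the HRIR lookup to a single angle for brevity; all of these are assumptions of the sketch.

```python
import numpy as np

def second_gain_hrir(hrir_db, mirror_source, mirror_listener,
                     listener_pos, listener_facing,
                     src_pos, src_facing, cone,
                     reflection_coeff, fs, c=343.0):
    """Steps 501-504 for one ray: look up the HRIR for the ray's angle
    of incidence at the listener, then weight it by the orientation,
    reflection, and distance gains, and attach the propagation delay."""
    to_listener = np.asarray(listener_pos) - np.asarray(mirror_source)
    d = float(np.linalg.norm(to_listener))

    # Steps 501/502: angle of incidence of the ray -> second HRIR.
    incidence = angle_between(-to_listener, listener_facing)
    hrir = hrir_db.lookup(incidence)                  # shape (2, L)

    # Step 503: orientation gain via the mirror listener's exit angle,
    # reflection gain from the room's reflection coefficient, and a
    # 1/r distance gain with a propagation delay of d/c seconds.
    exit_angle = angle_between(
        np.asarray(mirror_listener) - np.asarray(src_pos), src_facing)
    gain = (cone_gain(exit_angle, cone.inner_deg,
                      cone.outer_deg, cone.outer_gain)
            * reflection_coeff
            * (1.0 / max(d, 1.0)))
    delay_samples = int(round(d / c * fs))

    # Step 504: weighted multiplication; the delay is returned as well,
    # so that the fusion step can place the HRIR at the right offset.
    return delay_samples, gain * hrir
```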
FIG. 6 is a schematic flowchart of an audio signal rendering method provided by an embodiment of the present disclosure. The method is performed by a signal receiving end and, as shown in FIG. 6, may include the following steps.
Step 601: determine the source position and the listener position based on the spatial position information.
Step 602: determine the image source position of the source position with respect to the reflector, the reflector being an object that has reflected the sound signal.
Step 603: determine the angle of incidence of the reflected sound signal relative to the listener based on the image source position and the listener position.
For a detailed description of steps 601-603, refer to the description of the above embodiments.
FIG. 7 is a schematic flowchart of an audio signal rendering method provided by an embodiment of the present disclosure. The method is performed by a signal receiving end and, as shown in FIG. 7, may include the following steps.
Step 701: determine the image listener position of the listener with respect to the reflector.
Step 702: determine the exit angle of the reflected sound signal relative to the source based on the image listener position and the source position.
Step 703: determine the orientation gain parameter of the reflected sound signal based on the exit angle, the orientation information of the source, and the sound cone information.
For a detailed description of steps 701-703, refer to the description of the above embodiments.
FIG. 8 is a schematic flowchart of an audio signal rendering method provided by an embodiment of the present disclosure. The method is performed by a signal receiving end and, as shown in FIG. 8, may include the following step.
Step 801: determine the reflection gain parameter of the reflected sound signal based on the room information.
For a detailed description of step 801, refer to the description of the above embodiments.
FIG. 9 is a schematic flowchart of an audio signal rendering method provided by an embodiment of the present disclosure. The method is performed by a signal receiving end and, as shown in FIG. 9, may include the following steps.
Step 901: determine the image listener position of the listener with respect to the reflector.
Step 902: determine the distance delay gain parameter based on the distance between the source position and the image listener position, or the distance between the image source position and the listener position.
For a detailed description of steps 901-902, refer to the description of the above embodiments.
FIG. 10 is a schematic flowchart of an audio signal rendering method provided by an embodiment of the present disclosure. The method is performed by a signal receiving end and, as shown in FIG. 10, may include the following step.
Step 1001: in response to there being a plurality of audio object signals to be rendered, downmix the plurality of rendered audio object signals.
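A minimal downmix sketch for step 1001, assuming each rendered object is a 2 x N NumPy array of left/right samples; plain summation without normalization is an illustrative choice rather than a requirement of the disclosure.

```python
import numpy as np

def downmix(rendered_objects):
    """Sum the binaural signals of all rendered objects into one
    stereo output, zero-padding shorter signals to the longest one."""
    n = max(sig.shape[1] for sig in rendered_objects)
    out = np.zeros((2, n))
    for sig in rendered_objects:
        out[:, :sig.shape[1]] += sig
    return out
```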
The above method of the present disclosure is illustrated below by example.
The method of the present disclosure mainly uses a sound cone model to improve the reverberation module, simulating the reflected sound received by the two ears under different source orientations.
The rendering method involved in the present disclosure also belongs to object-based spatial audio technology; its overall workflow is shown in FIG. 11. The innovative part of the present disclosure is mainly the "audio orientation rendering" in FIG. 11: the orientation-related parameters in the metadata (angle information, room information, sound cone information) are used to perform orientation rendering of the audio object. The orientation rendering technology of the present disclosure generates binaural signals of the audio object under different orientations according to the scene where the audio object is located (room information) and its position (the spatial position of the audio object in the other metadata), thereby achieving a more realistic sense of source orientation.
In the real world, when the source orientation changes, the listener perceives a change in loudness and a difference between the two ear signals. The basic idea of this method is therefore to process the direct sound and the early reflections using image sources and the sound cone model. The image source method is used to compute the positions of the source images and obtain the delayed HRIRs after absorption by the walls. Note that in real-time communication scenarios the cost of computing and processing all reflected rays is too high; first- or second-order reflections are sufficient to provide a good sense of direction, and the other reflections can be handled in the reverberation function. Based on the source orientation and the source and receiver positions, the sound cone model is used to assign a different gain to each reflected ray, so that the presented left and right channels differ when the orientation changes.
In the rendering part of the present disclosure, orientation rendering uses the orientation rendering parameters (the source angle information, the room parameter information, and the sound cone information of the audio object) together with the spatial position of the audio object, which may be coordinates relative to the listener or absolute coordinates in the scene. The orientation rendering flow is shown in FIG. 12. In the present disclosure, source orientation rendering is carried out in two parts. One part uses the sound cone information to compute the gain parameter of the direct sound, and the direct sound obtains its corresponding HRIR by the usual spatial audio rendering techniques. The other part computes the reflected sound at the listener's position from the room information and the sound cone information; in this computation, the HRIRs of the reflected sound and the related gain parameters are obtained from the position and orientation of the listener and the position and orientation of the source, and the gain parameters are used to process the HRIRs of the reflected sound. The two parts of the HRIR are then weighted and summed to obtain the HRIR that carries the object's sense of orientation, which is used for binaural rendering of the audio object signal. This process is repeated for multiple objects, which are then downmixed to obtain the final output.
In the concrete implementation of the reflected sound processing: first, the positions of the image sources and the image listeners are computed; the line between the real listener and each image source forms a reflected sound ray. Then, the image positions and the metadata are used to obtain the gain and delay of each reflected ray. The gain includes the orientation gain value, the reflection gain value, and the distance gain value. The distance loss of each ray is based on the distance between the real listener position and the image source, or on the distance between the real source position and the image listener. The reflection loss is based on the reflection coefficients in the metadata. The orientation loss of the reflected sound is based on the sound cone model and the exit angle of the reflected ray relative to the source orientation. The distance loss and the delay are determined from the distance between the real listener position and the image source. Finally, the head-related impulse response (HRIR) corresponding to each ray is obtained from the head-related transfer function database according to the angle of incidence of the reflected ray relative to the head orientation; all the HRIRs are summed and processed with the computed gains and delays to obtain the final HRIR used for binaural rendering.
The present disclosure proposes a spatial audio rendering method with a sense of source orientation. On the basis of the existing technology, the sound cone model is combined to improve the reverberation algorithm and render the sense of source orientation. The technical solution proposed by the present disclosure improves the experience of source-orientation rendering and brings the rendering result closer to the real situation.
FIG. 13 is a schematic structural diagram of a communication apparatus provided by an embodiment of the present disclosure. As shown in FIG. 13, the apparatus may include:
a processing module, configured to determine a first gain head-related impulse response (HRIR) corresponding to the direct sound signal in the audio object signal to be rendered;
the processing module being further configured to determine a second gain HRIR corresponding to at least one reflected sound signal in the audio object signal to be rendered;
the processing module being further configured to determine a fused HRIR corresponding to the audio object signal to be rendered based on the first gain HRIR and the second gain HRIR;
the processing module being further configured to render the audio object signal to be rendered based on the fused HRIR.
In summary, with the communication apparatus provided by the embodiments of the present disclosure, the signal receiving end determines the first gain HRIR corresponding to the direct sound signal in the audio object signal to be rendered, and also determines the second gain HRIR corresponding to at least one reflected sound signal in that audio object signal; it then determines the fused HRIR corresponding to the audio object signal based on the first gain HRIR and the second gain HRIR, and renders the audio object signal based on the fused HRIR. In the present disclosure, the first gain HRIR reflects the gain loss caused by the source orientation and the distance during transmission of the direct sound signal, and the second gain HRIR reflects at least one of the gain loss caused by the source orientation, the gain loss caused by reflection, and the gain loss caused by the transmission distance during transmission of the reflected sound signal. Therefore, when the audio object signal to be rendered is rendered based on the fused HRIR obtained from the first gain HRIR and the second gain HRIR, the rendering effect is ensured, the real environment of the sound signal is restored as far as possible, the experience of source-orientation rendering is improved, and the rendering result is closer to the real situation. Moreover, the rendering method of the present disclosure allows personalized processing of different types of sound sources and has a wide scope of application; at the same time, during rendering, it allows targeted processing of different sound sources and fully takes into account the scene information of the scene where the source is located, which further ensures the rendering effect.
Optionally, in an embodiment of the present disclosure, the processing module is further configured to:
receive the code stream sent by the signal transmitting end;
decode the code stream to obtain at least one audio object signal to be rendered and the metadata of the at least one audio object signal to be rendered;
determine, based on the metadata of the audio object signal to be rendered, the first gain HRIR corresponding to the direct sound signal and the second gain HRIR corresponding to the reflected sound signal in the audio object signal to be rendered.
Optionally, in an embodiment of the present disclosure, the metadata includes at least one of the following:
orientation information of the source of the audio object signal to be rendered;
sound cone information of the source of the audio object signal to be rendered;
room information of the room where the source of the audio object signal to be rendered is located;
spatial position information of the source of the audio object signal to be rendered;
orientation information of the listener;
spatial position information of the listener.
Optionally, in an embodiment of the present disclosure, the processing module is further configured to:
determine the angle of incidence of the direct sound signal relative to the listener based on the spatial position information of the source and/or of the listener;
determine the first HRIR corresponding to the direct sound signal based on that angle of incidence and the orientation information of the listener;
determine the orientation gain parameter of the direct sound signal based on the orientation information of the source and the sound cone information;
determine the distance gain parameter of the direct sound signal based on the spatial position information of the source and/or of the listener;
determine the first gain HRIR based on the orientation gain parameter, the distance gain parameter, and the first HRIR.
Optionally, in an embodiment of the present disclosure, the processing module is further configured to:
determine the source position and the listener position based on the spatial position information of the source and/or of the listener;
determine the distance gain parameter of the direct sound signal based on the distance between the source position and the listener position.
Optionally, in an embodiment of the present disclosure, the processing module is further configured to:
determine the angle of incidence of the reflected sound signal relative to the listener based on the spatial position information of the source and/or of the listener;
determine the second HRIR corresponding to the reflected sound signal based on that angle of incidence and the orientation information of the listener;
determine at least one of the orientation gain parameter, the reflection gain parameter, and the distance delay gain parameter of the reflected sound signal based on the information in the metadata;
determine the second gain HRIR based on at least one of the orientation gain parameter, the reflection gain parameter, and the distance delay gain parameter, together with the second HRIR.
Optionally, in an embodiment of the present disclosure, the processing module is further configured to:
determine the source position and the listener position based on the spatial position information;
determine the image source position of the source position with respect to the reflector, the reflector being an object that has reflected the sound signal;
determine the angle of incidence of the reflected sound signal relative to the listener based on the image source position and the listener position.
Optionally, in an embodiment of the present disclosure, the processing module is further configured to:
determine the image listener position of the listener with respect to the reflector;
determine the exit angle of the reflected sound signal relative to the source based on the image listener position and the source position;
determine the orientation gain parameter of the reflected sound signal based on the exit angle, the orientation information of the source, and the sound cone information.
Optionally, in an embodiment of the present disclosure, the processing module is further configured to:
determine the reflection gain parameter of the reflected sound signal based on the room information.
Optionally, in an embodiment of the present disclosure, the processing module is further configured to:
determine the image listener position of the listener with respect to the reflector;
determine the distance delay gain parameter based on the distance between the source position and the image listener position, or the distance between the image source position and the listener position.
Optionally, in an embodiment of the present disclosure, the processing module is further configured to:
obtain the fused HRIR by weighted summation of the first gain HRIR corresponding to the direct sound signal of the audio object signal to be rendered and the second gain HRIR corresponding to at least one reflected sound signal of the audio object signal to be rendered.
Optionally, in an embodiment of the present disclosure, in response to there being a plurality of audio object signals to be rendered, the apparatus is further configured to:
downmix the plurality of rendered audio object signals.
Optionally, in an embodiment of the present disclosure, the reflected sound signal includes an early reflected sound signal.
Optionally, in an embodiment of the present disclosure, the early reflected sound signal includes at least one of the following:
a first-order reflected sound signal;
a second-order reflected sound signal.
Referring to FIG. 14, FIG. 14 is a schematic structural diagram of a communication apparatus 1400 provided by an embodiment of the present application. The communication apparatus 1400 may be a network device, a terminal device, a chip, a chip system, or a processor that supports a network device in implementing the above method, or a chip, a chip system, or a processor that supports a terminal device in implementing the above method. The apparatus can be used to implement the method described in the above method embodiments; for details, see the description in the above method embodiments.
The communication apparatus 1400 may include one or more processors 1401. The processor 1401 may be a general-purpose processor or a dedicated processor, for example a baseband processor or a central processing unit. The baseband processor may be used to process communication protocols and communication data; the central processing unit may be used to control the communication apparatus (e.g., a base station, a baseband chip, a terminal device, a terminal device chip, a DU, or a CU), execute computer programs, and process the data of the computer programs.
Optionally, the communication apparatus 1400 may further include one or more memories 1402, on which a computer program 1404 may be stored; the processor 1401 executes the computer program 1404 so that the communication apparatus 1400 performs the method described in the above method embodiments. Optionally, data may also be stored in the memory 1402. The communication apparatus 1400 and the memory 1402 may be provided separately or integrated together.
Optionally, the communication apparatus 1400 may further include a transceiver 1405 and an antenna 1406. The transceiver 1405 may be called a transceiver unit, a transceiver, or a transceiver circuit, and is used to implement the transceiving function. The transceiver 1405 may include a receiver and a transmitter; the receiver may be called a receiver or a receiving circuit and is used to implement the receiving function, and the transmitter may be called a transmitter or a transmitting circuit and is used to implement the transmitting function.
Optionally, the communication apparatus 1400 may further include one or more interface circuits 1407. The interface circuit 1407 is used to receive code instructions and transmit them to the processor 1401. The processor 1401 runs the code instructions to cause the communication apparatus 1400 to perform the method described in the above method embodiments.
In one implementation, the processor 1401 may include a transceiver for implementing the receiving and transmitting functions; for example, the transceiver may be a transceiver circuit, an interface, or an interface circuit. The transceiver circuit, interface, or interface circuit used to implement the receiving and transmitting functions may be separate or integrated. The above transceiver circuit, interface, or interface circuit may be used for reading and writing code/data, or for transferring or delivering signals.
In one implementation, the processor 1401 may store a computer program 1403 which runs on the processor 1401 and can cause the communication apparatus 1400 to perform the method described in the above method embodiments. The computer program 1403 may be solidified in the processor 1401, in which case the processor 1401 may be implemented by hardware.
In one implementation, the communication apparatus 1400 may include a circuit that implements the functions of sending, receiving, or communicating in the foregoing method embodiments. The processor and transceiver described in the present application may be implemented in an integrated circuit (IC), an analog IC, a radio frequency integrated circuit (RFIC), a mixed-signal IC, an application specific integrated circuit (ASIC), a printed circuit board (PCB), an electronic device, and so on. The processor and transceiver may also be manufactured with various IC process technologies, for example complementary metal oxide semiconductor (CMOS), N-type metal oxide semiconductor (NMOS), positive channel metal oxide semiconductor (PMOS), bipolar junction transistor (BJT), bipolar CMOS (BiCMOS), silicon germanium (SiGe), and gallium arsenide (GaAs).
The communication apparatus described in the above embodiments may be a network device or a terminal device, but the scope of the communication apparatus described in the present application is not limited thereto, and the structure of the communication apparatus is not limited by FIG. 14. The communication apparatus may be an independent device or part of a larger device. For example, the communication apparatus may be:
(1) an independent integrated circuit (IC), a chip, or a chip system or subsystem;
(2) a set of one or more ICs; optionally, the IC set may also include a storage component for storing data and computer programs;
(3) an ASIC, such as a modem;
(4) a module that can be embedded in other devices;
(5) a receiver, a terminal device, an intelligent terminal device, a cellular phone, a wireless device, a handset, a mobile unit, a vehicle-mounted device, a network device, a cloud device, an artificial intelligence device, and so on;
(6) others.
For the case where the communication apparatus is a chip or a chip system, see the schematic structural diagram of the chip shown in FIG. 15. The chip shown in FIG. 15 includes a processor 1501 and an interface 1502; the number of processors 1501 may be one or more, and the number of interfaces 1502 may be multiple.
Optionally, the chip further includes a memory 1503, the memory 1503 being used to store the necessary computer programs and data.
Those skilled in the art will also understand that the various illustrative logical blocks and steps listed in the embodiments of the present application can be implemented by electronic hardware, computer software, or a combination of the two. Whether such functions are implemented by hardware or software depends on the particular application and the design requirements of the overall system. Skilled persons may use various methods to implement the described functions for each particular application, but such implementations should not be construed as beyond the protection scope of the embodiments of the present application.
The present application also provides a readable storage medium with instructions stored thereon which, when executed by a computer, implement the functions of any of the above method embodiments.
The present application also provides a computer program product which, when executed by a computer, implements the functions of any of the above method embodiments.
In the above embodiments, implementation may be in whole or in part by software, hardware, firmware, or any combination thereof. When software is used, implementation may be in whole or in part in the form of a computer program product. The computer program product includes one or more computer programs. When the computer programs are loaded and executed on a computer, the processes or functions according to the embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a dedicated computer, a computer network, or another programmable apparatus. The computer programs may be stored in a computer-readable storage medium or transferred from one computer-readable storage medium to another; for example, the computer programs may be transferred from one website, computer, server, or data center to another website, computer, server, or data center by wire (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wirelessly (e.g., infrared, radio, microwave). The computer-readable storage medium may be any available medium accessible to a computer, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, a hard disk, a magnetic tape), an optical medium (e.g., a high-density digital video disc (DVD)), or a semiconductor medium (e.g., a solid state disk (SSD)).
A person of ordinary skill in the art will understand that the various numerical designations such as first and second involved in the present application are merely distinctions made for convenience of description; they are not used to limit the scope of the embodiments of the present application, nor do they indicate any order of precedence.
"At least one" in the present application may also be described as one or more, and "a plurality" may be two, three, four, or more, which is not limited by the present application. In the embodiments of the present application, for a kind of technical feature, the technical features within it are distinguished by "first", "second", "third", "A", "B", "C", "D", etc., and there is no order of precedence or magnitude among the technical features so described.
The correspondences shown in the tables of the present application may be configured or predefined. The values of the information in the tables are only examples and may be configured as other values, which is not limited by the present application. When configuring the correspondence between information and parameters, it is not always necessary to configure all the correspondences shown in the tables. For example, the correspondences shown in some rows of the tables in the present application may not be configured. As another example, appropriate adjustments, such as splitting and merging, may be made based on the above tables. The names of the parameters shown in the headings of the above tables may also be other names understandable by the communication apparatus, and the values or representations of the parameters may also be other values or representations understandable by the communication apparatus. When implemented, the above tables may also adopt other data structures, for example arrays, queues, containers, stacks, linear lists, pointers, linked lists, trees, graphs, structures, classes, heaps, or hash tables.
"Predefined" in the present application may be understood as defined, defined in advance, stored, pre-stored, pre-negotiated, pre-configured, solidified, or pre-burned.
A person of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented by electronic hardware or by a combination of computer software and electronic hardware. Whether these functions are executed by hardware or software depends on the particular application and the design constraints of the technical solution. Professionals may use different methods for each particular application to implement the described functions, but such implementations should not be considered beyond the scope of the present application.
Those skilled in the art can clearly understand that, for convenience and brevity of description, the specific working processes of the systems, apparatuses, and units described above may refer to the corresponding processes in the foregoing method embodiments and are not repeated here.
The above are only specific implementations of the present application, but the protection scope of the present application is not limited thereto. Any person skilled in the art can easily think of changes or substitutions within the technical scope disclosed by the present application, and these shall all be covered by the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (18)

  1. An audio signal rendering method, wherein the method is performed by a signal receiving end and comprises:
    determining a first gain head-related impulse response (HRIR) corresponding to a direct sound signal in an audio object signal to be rendered;
    determining a second gain HRIR corresponding to at least one reflected sound signal in the audio object signal to be rendered;
    determining a fused HRIR corresponding to the audio object signal to be rendered based on the first gain HRIR and the second gain HRIR;
    rendering the audio object signal to be rendered based on the fused HRIR.
  2. The method according to claim 1, wherein determining the first gain HRIR corresponding to the direct sound signal and the second gain HRIR corresponding to the reflected sound signal comprises:
    receiving a code stream sent by a signal transmitting end;
    decoding the code stream to obtain at least one audio object signal to be rendered and metadata of the at least one audio object signal to be rendered;
    determining, based on the metadata of the audio object signal to be rendered, the first gain HRIR corresponding to the direct sound signal and the second gain HRIR corresponding to the reflected sound signal in the audio object signal to be rendered.
  3. The method according to claim 2, wherein the metadata comprises at least one of the following:
    orientation information of a sound source of the audio object signal to be rendered;
    sound cone information of the sound source of the audio object signal to be rendered;
    room information of a room where the sound source of the audio object signal to be rendered is located;
    spatial position information of the sound source of the audio object signal to be rendered;
    orientation information of the listener;
    spatial position information of the listener.
  4. The method according to claim 3, wherein determining, based on the metadata of the audio object signal to be rendered, the first gain HRIR corresponding to the direct sound signal in the audio object signal to be rendered comprises:
    determining an angle of incidence of the direct sound signal relative to the listener based on the spatial position information of the sound source and/or the spatial position information of the listener;
    determining a first HRIR corresponding to the direct sound signal based on the angle of incidence of the direct sound signal relative to the listener and the orientation information of the listener;
    determining an orientation gain parameter of the direct sound signal based on the orientation information of the sound source and the sound cone information;
    determining a distance gain parameter of the direct sound signal based on the spatial position information of the sound source and/or the spatial position information of the listener;
    determining the first gain HRIR based on the orientation gain parameter, the distance gain parameter, and the first HRIR.
  5. The method according to claim 4, wherein determining the distance gain parameter of the direct sound signal based on the spatial position information of the sound source and/or of the listener comprises:
    determining a sound source position and a listener position based on the spatial position information of the sound source and/or of the listener;
    determining the distance gain parameter of the direct sound signal based on the distance between the sound source position and the listener position.
  6. The method according to claim 3, wherein determining, based on the metadata of the audio object signal to be rendered, the second gain HRIR corresponding to the reflected sound signal in the audio object signal to be rendered comprises:
    determining an angle of incidence of the reflected sound signal relative to the listener based on the spatial position information of the sound source and/or of the listener;
    determining a second HRIR corresponding to the reflected sound signal based on the angle of incidence of the reflected sound signal relative to the listener and the orientation information of the listener;
    determining at least one of an orientation gain parameter, a reflection gain parameter, and a distance delay gain parameter of the reflected sound signal based on information in the metadata;
    determining the second gain HRIR based on at least one of the orientation gain parameter, the reflection gain parameter, and the distance delay gain parameter, together with the second HRIR.
  7. The method according to claim 6, wherein determining the angle of incidence of the reflected sound signal relative to the listener based on the spatial position information comprises:
    determining the sound source position and the listener position based on the spatial position information;
    determining an image sound source position of the sound source position with respect to a reflector, the reflector being an object that has reflected the sound signal;
    determining the angle of incidence of the reflected sound signal relative to the listener based on the image sound source position and the listener position.
  8. The method according to claim 6, wherein determining the orientation gain parameter of the reflected sound signal based on information in the metadata comprises:
    determining an image listener position of the listener with respect to the reflector;
    determining an exit angle of the reflected sound signal relative to the sound source based on the image listener position and the sound source position;
    determining the orientation gain parameter of the reflected sound signal based on the exit angle, the orientation information of the sound source, and the sound cone information.
  9. The method according to claim 6, wherein determining the reflection gain parameter of the reflected sound signal based on information in the metadata comprises:
    determining the reflection gain parameter of the reflected sound signal based on the room information.
  10. The method according to claim 6, wherein determining the distance delay gain parameter of the reflected sound signal based on information in the metadata comprises:
    determining an image listener position of the listener with respect to the reflector;
    determining the distance delay gain parameter based on the distance between the sound source position and the image listener position, or the distance between the image sound source position and the listener position.
  11. The method according to claim 1, wherein determining the fused HRIR corresponding to the audio object signal to be rendered comprises:
    obtaining the fused HRIR by weighted summation of the first gain HRIR corresponding to the direct sound signal of the audio object signal to be rendered and the second gain HRIR corresponding to the at least one reflected sound signal of the audio object signal to be rendered.
  12. The method according to claim 1, wherein, in response to there being a plurality of audio object signals to be rendered, the method further comprises:
    downmixing the plurality of rendered audio object signals.
  13. The method according to any one of claims 1-12, wherein the reflected sound signal comprises an early reflected sound signal.
  14. The method according to claim 13, wherein the early reflected sound signal comprises at least one of the following:
    a first-order reflected sound signal;
    a second-order reflected sound signal.
  15. A communication apparatus, configured in a signal receiving end, comprising:
    a processing module, configured to determine a first gain head-related impulse response (HRIR) corresponding to a direct sound signal in an audio object signal to be rendered;
    the processing module being further configured to determine a second gain HRIR corresponding to at least one reflected sound signal in the audio object signal to be rendered;
    the processing module being further configured to determine a fused HRIR corresponding to the audio object signal to be rendered based on the first gain HRIR and the second gain HRIR;
    the processing module being further configured to render the audio object signal to be rendered based on the fused HRIR.
  16. A communication apparatus, comprising a processor and a memory, wherein a computer program is stored in the memory, and the processor executes the computer program stored in the memory to cause the apparatus to perform the method according to any one of claims 1 to 14.
  17. A communication apparatus, comprising a processor and an interface circuit, wherein
    the interface circuit is configured to receive code instructions and transmit them to the processor;
    the processor is configured to run the code instructions to perform the method according to any one of claims 1 to 14.
  18. A computer-readable storage medium storing instructions which, when executed, cause the method according to any one of claims 1 to 14 to be implemented.
PCT/CN2022/130428 2022-11-07 2022-11-07 Audio signal rendering method, apparatus, device and storage medium WO2024098221A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2022/130428 WO2024098221A1 (zh) 2022-11-07 2022-11-07 Audio signal rendering method, apparatus, device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2022/130428 WO2024098221A1 (zh) 2022-11-07 2022-11-07 Audio signal rendering method, apparatus, device and storage medium

Publications (1)

Publication Number Publication Date
WO2024098221A1

Family

ID=91031739

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/130428 WO2024098221A1 (zh) 2022-11-07 2022-11-07 一种音频信号渲染方法、装置、设备及存储介质

Country Status (1)

Country Link
WO (1) WO2024098221A1 (zh)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110268281A1 (en) * 2010-04-30 2011-11-03 Microsoft Corporation Audio spatialization using reflective room model
CN104604257A (zh) * 2012-08-31 2015-05-06 Dolby Laboratories Licensing Corporation System for rendering and playback of object-based audio in various listening environments
WO2019066348A1 (ko) * 2017-09-28 2019-04-04 Gaudio Lab, Inc. Audio signal processing method and apparatus
US20200037091A1 (en) * 2017-03-27 2020-01-30 Gaudio Lab, Inc. Audio signal processing method and device
CN114531640A (zh) * 2018-12-29 2022-05-24 Huawei Technologies Co., Ltd. Audio signal processing method and apparatus
CN115278471A (zh) * 2022-06-21 2022-11-01 MIGU Culture Technology Co., Ltd. Audio data processing method, apparatus, device and readable storage medium

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110268281A1 (en) * 2010-04-30 2011-11-03 Microsoft Corporation Audio spatialization using reflective room model
CN104604257A (zh) * 2012-08-31 2015-05-06 Dolby Laboratories Licensing Corporation System for rendering and playback of object-based audio in various listening environments
US20200037091A1 (en) * 2017-03-27 2020-01-30 Gaudio Lab, Inc. Audio signal processing method and device
WO2019066348A1 (ko) * 2017-09-28 2019-04-04 Gaudio Lab, Inc. Audio signal processing method and apparatus
CN114531640A (zh) * 2018-12-29 2022-05-24 Huawei Technologies Co., Ltd. Audio signal processing method and apparatus
CN115278471A (zh) * 2022-06-21 2022-11-01 MIGU Culture Technology Co., Ltd. Audio data processing method, apparatus, device and readable storage medium

Similar Documents

Publication Publication Date Title
CN112771894B (zh) Representing occlusion when rendering for computer-mediated reality systems
CN109791193A (zh) Automatic discovery and localization of loudspeaker positions in surround sound systems
US10757528B1 (en) Methods and systems for simulating spatially-varying acoustics of an extended reality world
KR102537714B1 (ko) 오디오 신호 처리 방법 및 장치
WO2017215640A1 (zh) Sound effect processing method and apparatus
TWI819344B (zh) Audio signal rendering method, apparatus, device and computer-readable storage medium
US20210006922A1 (en) Timer-based access for audio streaming and rendering
CN107301028B (zh) Audio data processing method and apparatus based on multi-person remote call
US20220394347A1 (en) Radio coexistence techniques for playback devices
US11641561B2 (en) Sharing locations where binaural sound externally localizes
TW202117500A (zh) 用於音訊呈現之隱私分區及授權
CN114072792A (zh) 用于音频渲染的基于密码的授权
US11653169B2 (en) Playing binaural sound clips during an electronic communication
WO2024098221A1 (zh) Audio signal rendering method, apparatus, device and storage medium
WO2022067652A1 (zh) Real-time communication method, apparatus and system
WO2024082181A1 (zh) Spatial audio collection method and apparatus
WO2023197646A1 (zh) Audio signal processing method and electronic device
US12028699B2 (en) Playing binaural sound clips during an electronic communication
WO2023197187A1 (zh) Channel state information processing method and apparatus
WO2023216034A1 (zh) Method and apparatus for verifying position information
US20210358506A1 (en) Audio signal processing method and apparatus
WO2024113254A1 (zh) Scatterer position determination method, apparatus and system
WO2022051897A1 (zh) Encoding method and apparatus
WO2024020747A1 (zh) Model generation method and apparatus
CN116724351A (zh) Quantization encoding method, apparatus, device and storage medium