WO2023212883A1 - Audio output method and apparatus, communication apparatus and storage medium - Google Patents

Audio output method and apparatus, communication apparatus and storage medium Download PDF

Info

Publication number
WO2023212883A1
WO2023212883A1 · PCT/CN2022/091055 · CN2022091055W
Authority
WO
WIPO (PCT)
Prior art keywords
audio
signal
virtual space
output
emission angle
Prior art date
Application number
PCT/CN2022/091055
Other languages
English (en)
French (fr)
Inventor
吕雪洋
吕柱良
史润宇
刘晗宇
Original Assignee
北京小米移动软件有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京小米移动软件有限公司 filed Critical 北京小米移动软件有限公司
Priority to PCT/CN2022/091055
Publication of WO2023212883A1


Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/439Processing of audio elementary streams

Definitions

  • the present disclosure relates to the field of communication technology, and specifically, to an audio output method, an audio output device, a communication device and a computer-readable storage medium.
  • The current application areas of VR (Virtual Reality) and AR (Augmented Reality) are mainly game and film/television scenes, and implementation has focused chiefly on the visual side; on the audio side, spatial audio has not yet been popularized.
  • Dolby Atmos audio is entirely post-produced: a track is recorded as an object sound, and production software then attaches a spatial position in metadata to each time point of that sound, which makes it suitable only as an audio storage format for offline playback.
  • Although Dolby Atmos adds the object's position information, it does not add the object's sound emission angle, so on playback the object's sound can only be reproduced as a non-directional sound. Dolby Atmos therefore carries only spatial position information at playback time, and the listening user cannot distinguish the source direction of the sound, which degrades the user's VR/AR experience.
  • Embodiments of the present disclosure provide an audio output method, an audio output device, a communication device, and a computer-readable storage medium, to solve technical problems in the related art.
  • An audio output method is provided, including: determining orientation information of an audio sending end in a virtual space, and angle information of an audio receiving end and the sending end in the virtual space; determining, according to the angle information and the orientation information, an audio emission angle of the sending end relative to the receiving end in the virtual space; rendering audio data at least according to the audio emission angle to obtain an audio signal; and outputting the audio signal.
  • Rendering the audio data at least according to the audio emission angle to obtain the audio signal includes: determining a first gain coefficient and/or a high-frequency attenuation coefficient of a low-pass filter according to the audio emission angle; and rendering the audio data according to the first gain coefficient and/or the low-pass filter to obtain the audio signal; wherein the first gain coefficient is positively correlated with the audio emission angle, and/or the high-frequency attenuation coefficient is negatively correlated with the audio emission angle.
  • Rendering the audio data at least according to the audio emission angle to obtain the audio signal includes: determining a distance between the receiving end and the sending end in the virtual space; determining a second gain coefficient according to the distance, wherein the second gain coefficient is inversely correlated with the distance within a preset distance range; rendering the audio data according to the second gain coefficient to obtain a signal to be output; and rendering the signal to be output according to the audio emission angle to obtain the audio signal.
  • Rendering the audio data at least according to the audio emission angle to obtain the audio signal includes: determining a first position of the sending end in the virtual space, a second position of the receiving end in the virtual space, and the three-dimensional shape and reflection coefficient of the room in which the sending end and the receiving end are located in the virtual space; rendering the audio data according to the audio emission angle to obtain a signal to be output; and generating reverberation according to the first position, the second position, the three-dimensional shape and reflection coefficient of the room, and the signal to be output, and adding it to the signal to be output to obtain the audio signal.
  • Rendering the audio data at least according to the audio emission angle to obtain the audio signal includes: determining, according to the angle information and the orientation information, an audio receiving angle of the receiving end relative to the sending end in the virtual space; rendering the audio data according to the audio emission angle to obtain a signal to be output; and rendering the signal to be output according to the head-related transfer function (HRTF) and/or the vector base amplitude panning (VBAP) algorithm and the audio receiving angle to obtain the audio signal.
  • An audio output device is provided, including: a processing module configured to determine orientation information of an audio sending end in a virtual space, and angle information of an audio receiving end and the sending end in the virtual space; determine, according to the angle information and the orientation information, an audio emission angle of the sending end relative to the receiving end in the virtual space; and render audio data at least according to the audio emission angle to obtain an audio signal; and an output module configured to output the audio signal.
  • The processing module is configured to determine a first gain coefficient and/or a high-frequency attenuation coefficient of a low-pass filter according to the audio emission angle, and render the audio data according to the first gain coefficient and/or the low-pass filter to obtain the audio signal; wherein the first gain coefficient is positively correlated with the audio emission angle, and/or the high-frequency attenuation coefficient is negatively correlated with the audio emission angle.
  • The processing module is configured to determine the distance between the receiving end and the sending end in the virtual space; determine a second gain coefficient according to the distance, wherein the second gain coefficient is inversely correlated with the distance within a preset distance range; render the audio data according to the second gain coefficient to obtain the signal to be output; and render the signal to be output according to the audio emission angle to obtain the audio signal.
  • The processing module is configured to determine the first position of the sending end in the virtual space, the second position of the receiving end in the virtual space, and the three-dimensional shape and reflection coefficient of the room in which the sending end and the receiving end are located in the virtual space; render the audio data according to the audio emission angle to obtain the signal to be output; and generate reverberation according to the first position, the second position, the three-dimensional shape and reflection coefficient of the room, and the signal to be output, and add it to the signal to be output to obtain the audio signal.
  • The processing module is configured to determine, according to the angle information and the orientation information, the audio receiving angle of the receiving end relative to the sending end in the virtual space; render the audio data according to the audio emission angle to obtain the signal to be output; and render the signal to be output according to the HRTF and/or the VBAP algorithm and the audio receiving angle to obtain the audio signal.
  • a communication device including: a processor; and a memory for storing a computer program; wherein when the computer program is executed by the processor, the above audio output method is implemented.
  • a computer-readable storage medium for storing a computer program that, when executed by a processor, implements the steps in the above audio output method.
  • When rendering the audio data, the audio emission angle of the sending end relative to the receiving end is taken into account, so that the rendered audio signal can contain features related to the audio emission angle; a user at the receiving end listening to the audio signal can then distinguish the source direction of the sound in the virtual space, which helps improve the user's communication experience in the virtual space.
  • FIG. 1 is a schematic flow chart of an audio output method according to an embodiment of the present disclosure.
  • Figure 2 is a schematic diagram of the spatial relationship between the receiving end and the transmitting end according to an embodiment of the present disclosure.
  • FIG. 3 is a schematic flow chart of another audio output method according to an embodiment of the present disclosure.
  • FIG. 4 is a schematic flow chart of yet another audio output method according to an embodiment of the present disclosure.
  • FIG. 5 is a schematic flow chart of yet another audio output method according to an embodiment of the present disclosure.
  • Figure 6 is a schematic flow chart of yet another audio output method according to an embodiment of the present disclosure.
  • FIG. 7 is a schematic flow chart of yet another audio output method according to an embodiment of the present disclosure.
  • FIG. 8 is a schematic block diagram of an audio output device according to an embodiment of the present disclosure.
  • FIG. 9 is a schematic block diagram of a device for audio output according to an embodiment of the present disclosure.
  • Although the terms first, second, third, etc. may be used in the embodiments of the present disclosure to describe various information, the information should not be limited to these terms; these terms are only used to distinguish information of the same type from one another. For example, first information may also be called second information and, similarly, second information may also be called first information. Depending on the context, the word "if" as used herein may be interpreted as "when", "upon", or "in response to determining".
  • For brevity and ease of understanding, this document uses the terms "greater than" or "less than", "higher than" or "lower than" when characterizing size relationships. Those skilled in the art will understand that "greater than" also covers "greater than or equal to", "less than" also covers "less than or equal to", "higher than" also covers "higher than or equal to", and "lower than" also covers "lower than or equal to".
  • FIG. 1 is a schematic flow chart of an audio output method according to an embodiment of the present disclosure.
  • the method shown in this embodiment can be applied to devices such as VR devices and AR devices that can serve as receivers of audio signals in virtual space.
  • the audio output method may include the following steps:
  • step S101 determine the orientation information of the audio sending end in the virtual space, and the angle information of the audio receiving end and the sending end in the virtual space;
  • step S102 determine the audio emission angle of the sending end relative to the receiving end in the virtual space according to the angle information and the orientation information;
  • step S103 render the audio data according to at least the audio emission angle to obtain an audio signal
  • step S104 the audio signal is output.
  • the virtual space may be a virtual space in a VR scene or a virtual space in an AR scene, which may be determined based on the application scenario.
  • the application scenarios of the embodiments of the present disclosure include but are not limited to at least one of the following:
  • remote multi-person virtual conference scenarios, remote online class scenarios, online concert scenarios, immersive game scenarios, and audio-visual interaction scenarios.
  • For example, in a remote multi-person virtual conference scenario, the virtual space may include a virtual conference room, the sending end may include the speakers in the virtual space, and the receiving end may be a user of a VR/AR device, e.g., a participant.
  • For example, in a remote online class scenario, the virtual scene may include a virtual classroom, the sending end may include the teacher and speaking students in the virtual space, and the receiving end may be a user of a VR/AR device, e.g., a student.
  • For example, in an online concert scenario, the virtual scene may include a virtual concert hall, the sending end may include performers and singers in the virtual space, and the receiving end may be a user of a VR/AR device, e.g., a listener.
  • For example, in an immersive game scenario, the virtual scene may include a game scene, the sending end may include other players and NPCs (Non-Player Characters) in the virtual space, and the receiving end may be a user of a VR/AR device, e.g., a player.
  • For example, in an audio-visual interaction scenario, the virtual scene may include a virtual cinema, the sending end may include loudspeakers in the virtual space, and the receiving end may be a user of a VR/AR device, e.g., an audience member.
  • Figure 2 is a schematic diagram of the spatial relationship between the receiving end and the transmitting end according to an embodiment of the present disclosure.
  • As shown in Figure 2, the sending end is located at point A in the virtual space and the receiving end at point B. A coordinate system is constructed with point B as the origin: the yBz plane is the reference plane of the receiving end, and the positive y-axis is the reference direction of the receiving end, i.e., a rotation angle of 0 degrees, with the rotation angle increasing counterclockwise in the xBy plane.
  • The angle information may include the rotation angle θ_ab and the pitch angle φ_ab from the sending end (point A) to the receiving end (point B), where θ_ab and φ_ab are computed from the first position (x_a, y_a, z_a) and the second position (x_b, y_b, z_b) by standard trigonometric relations (the formula images are not reproduced in the source text).
  • Note that Figure 2 shows a three-dimensional virtual space; in a two-dimensional virtual space, the angle information may include only the rotation angle and not the pitch angle.
  • In one embodiment, the first position, the second position, and the orientation information of the sending end can be obtained directly.
  • For example, the first position and the second position can be obtained by locating the sending end and the receiving end and mapping them into the virtual space; or, when the position of the receiving end is the origin, the first position and the second position (e.g., the origin) can be determined from the relative positional relationship between the position of the sending end generated in the virtual space (e.g., a virtual character) and the origin.
  • For example, the orientation information of the sending end in the virtual space can be determined from a gyroscope provided on the sending end; the orientation information may include the rotation orientation angle azim_a and the pitch orientation angle elev_a.
  • Then, the audio emission angle of the sending end relative to the receiving end in the virtual space can be determined based on the angle information and the orientation information; for example, the audio emission angle includes the angle θ_trans in the rotation direction and the angle φ_trans in the pitch direction, where θ_trans = azim_a − θ_ab and φ_trans = elev_a − φ_ab.
  • For example, when point A and point B are in the same horizontal plane, θ_trans = π when the sending end directly faces the receiving end, and θ_trans = 0 when it faces away. The audio data can then be rendered according to the audio emission angle to obtain an audio signal, and finally the audio signal is output.
  • When rendering the audio data, the audio emission angle of the sending end relative to the receiving end is taken into account, so that the rendered audio signal can contain features related to the audio emission angle; a user at the receiving end listening to the audio signal can then distinguish the source direction of the sound in the virtual space, which helps improve the user's communication experience in the virtual space.
  • In addition to rendering according to the audio emission angle, the embodiments of the present disclosure can also render the audio data in combination with other parameters, to ensure that the resulting audio signal better matches the virtual space in which the receiving end is located.
  • FIG. 3 is a schematic flow chart of another audio output method according to an embodiment of the present disclosure. As shown in Figure 3, rendering audio data according to at least the audio emission angle to obtain an audio signal includes:
  • step S301 determine the first gain coefficient and/or the high-frequency attenuation coefficient of the low-pass filter according to the audio emission angle
  • step S302 render the audio data according to the first gain coefficient and/or the low-pass filter to obtain the audio signal
  • the first gain coefficient is positively correlated with the audio emission angle, and/or the high frequency attenuation coefficient is negatively correlated with the audio emission angle.
  • In one embodiment, the first gain coefficient may be determined according to the audio emission angle, and the audio data may then be rendered according to the first gain coefficient. The first gain coefficient is positively correlated with the audio emission angle; for example, in the angle range from 0 to π, the first gain coefficient increases as the audio emission angle increases.
  • For example, when the audio emission angle is 0, i.e., when the sending end faces away from the receiving end, the first gain coefficient is the smallest, the audio signal obtained by rendering the audio data with this coefficient is relatively weak, and the volume of the sending end's audio heard at the receiving end is relatively low; when the audio emission angle is π, i.e., when the sending end directly faces the receiving end, the first gain coefficient is the largest, the rendered audio signal is relatively strong, and the volume heard at the receiving end is relatively high.
  • Accordingly, the closer the sending end is to directly facing the receiving end, the louder the audio the user at the receiving end hears; the closer the sending end is to facing away from the receiving end, the quieter the audio the user hears.
  • In one embodiment, the high-frequency attenuation coefficient of the low-pass filter can be determined based on the audio emission angle, and the audio data can then be rendered with the low-pass filter, specifically by filtering. The high-frequency attenuation coefficient is negatively correlated with the audio emission angle; for example, in the angle range from 0 to π, the high-frequency attenuation coefficient decreases as the audio emission angle increases.
  • For example, when the audio emission angle is 0, i.e., when the sending end faces away from the receiving end, the high-frequency attenuation coefficient is the largest, the audio signal obtained by filtering the audio data with the low-pass filter contains relatively little high-frequency content, and the receiving end hears relatively little of the high-frequency part of the sending end's audio; when the audio emission angle is π, i.e., when the sending end directly faces the receiving end, the high-frequency attenuation coefficient is the smallest (for example, the low-pass filter becomes an all-pass filter), and the receiving end hears relatively more of the high-frequency part.
  • Since the brightness and detail of a sound are determined mainly by its high-frequency part, and the high-frequency part is more directional, this embodiment ensures that the user at the receiving end can infer that the sending end is pointing toward them from richer high-frequency content, and that it is pointing away from them from poorer high-frequency content.
  • In one embodiment, the first gain coefficient and the low-pass filter can be combined to render the audio data: if the first gain coefficient is g_trans, the low-pass filter is LPF, the audio data is Au, and the rendered audio signal is Au', then Au' = g_trans · LPF(Au). The user can thus combine the volume and the high-frequency content of the audio signal to accurately identify its source direction.
  • FIG. 4 is a schematic flow chart of yet another audio output method according to an embodiment of the present disclosure. As shown in Figure 4, rendering audio data according to at least the audio emission angle to obtain an audio signal includes:
  • step S401 determine the distance between the receiving end and the sending end in the virtual space
  • step S402 determine a second gain coefficient according to the distance, wherein the second gain coefficient is inversely correlated with the distance within a preset distance range;
  • step S403 the audio data is rendered according to the second gain coefficient to obtain a signal to be output;
  • step S404 the signal to be output is rendered according to the audio emission angle to obtain the audio signal.
  • For example, the first position of the sending end in the virtual space is (x_a, y_a, z_a) and the second position of the receiving end is (x_b, y_b, z_b); the distance between the receiving end and the sending end in the virtual space is then d_ab = √((x_a − x_b)² + (y_a − y_b)² + (z_a − z_b)²).
  • The second gain coefficient g_d can be determined based on the distance; g_d is inversely correlated with the distance within a preset distance range, which can be set as needed, for example distances of more than 1 meter (the formula image for g_d is not reproduced in the source text).
  • The audio data is then rendered according to the second gain coefficient to obtain the signal to be output, and the signal to be output is rendered according to the audio emission angle to obtain the audio signal. Accordingly, when rendering the audio data, the distance from the sending end to the receiving end is taken into account, so that the rendered audio signal can contain distance-related features: a user at the receiving end can distinguish not only the source direction of the sound in the virtual space but also the distance in the virtual space, which helps improve the communication experience.
  • Within 1 meter, the second gain coefficient can be set to a fixed value of 1; that is, within 1 meter the rendered audio signal no longer grows as the distance decreases, which helps avoid an excessively loud volume degrading the user experience.
  • FIG. 5 is a schematic flow chart of yet another audio output method according to an embodiment of the present disclosure. As shown in Figure 5, rendering audio data according to at least the audio emission angle to obtain an audio signal includes:
  • step S501 determine the first position of the sending end in the virtual space, the second position of the receiving end in the virtual space, and the three-dimensional shape and reflection coefficient of the room in which the sending end and the receiving end are located in the virtual space;
  • step S502 the audio data is rendered according to the audio emission angle to obtain a signal to be output;
  • step S503 reverberation is generated according to the first position, the second position, the three-dimensional shape and reflection coefficient of the room, and the signal to be output, and is added to the signal to be output to obtain the audio signal.
  • For example, the first position of the sending end in the virtual space is (x_a, y_a, z_a), the second position of the receiving end is (x_b, y_b, z_b), and the room in which the sending end and the receiving end are located in the virtual space has three-dimensional shape (x_r, y_r, z_r) and reflection coefficient r_w; (x_r, y_r, z_r) and r_w can be combined as (x_r, y_r, z_r, r_w).
  • The audio data can first be rendered according to the audio emission angle to obtain the signal to be output; reverberation can then be generated from the first position, the second position, the three-dimensional shape and reflection coefficient of the room, and the signal to be output, and added to the signal to be output to obtain the audio signal.
  • For example, if the signal to be output is Au and the rendered audio signal is Au''', then Au''' = reverb(Au, (x_a, y_a, z_a), (x_b, y_b, z_b), (x_r, y_r, z_r, r_w)), where reverb denotes a function that computes reverberation and adds it to the signal to be output to obtain the audio signal.
  • Accordingly, when rendering the audio data, the three-dimensional shape and reflection coefficient of the room in which the sending end and the receiving end are located in the virtual space are taken into account, so that the rendered audio signal can contain reverberation-related features: a user at the receiving end listening to the audio signal can distinguish not only the source direction of the sound in the virtual space but also, from the reverberation, the character of the room in the virtual space, which helps improve the communication experience.
  • FIG. 6 is a schematic flow chart of yet another audio output method according to an embodiment of the present disclosure. As shown in Figure 6, rendering audio data according to at least the audio emission angle to obtain an audio signal includes:
  • step S601 determine the audio receiving angle of the receiving end relative to the sending end in the virtual space based on the angle information and the orientation information;
  • step S602 render the audio data according to the audio emission angle to obtain a signal to be output
  • step S603 the signal to be output is rendered according to the head-related transfer function (HRTF) and/or the vector base amplitude panning (VBAP) algorithm and the audio receiving angle to obtain the audio signal.
  • In one embodiment, the audio receiving angle of the receiving end relative to the sending end in the virtual space can be determined based on the angle information and the orientation information; for example, the audio receiving angle includes the angle θ_rece in the rotation direction and the angle φ_rece in the pitch direction, where θ_rece = θ_ab − azim_b and φ_rece = φ_ab − elev_b, with azim_b and elev_b being the rotation and pitch orientation angles of the receiving end.
  • The audio data can first be rendered according to the audio emission angle to obtain the signal to be output, and the signal to be output can then be rendered according to the HRTF and/or the VBAP algorithm and the audio receiving angle to obtain the audio signal.
  • For example, if the signal to be output is Au, the rendered audio signal is Au''''. Depending on how the receiving end listens to the audio signal, different rendering methods can be chosen: when listening through headphones, rendering can use the HRTF; when listening through loudspeakers, rendering can use VBAP.
  • Accordingly, when rendering the audio data, the audio receiving angle of the receiving end relative to the sending end in the virtual space is taken into account, so that the rendered audio signal can contain features related to the audio receiving angle: a user at the receiving end can distinguish the source direction of the sound in the virtual space, and the listening effect is also ensured when listening with headphones and/or loudspeakers, which helps improve the communication experience.
  • The embodiments of the present disclosure can be combined as needed; for example, the audio data Au can be rendered through the above steps in sequence: first according to the second gain coefficient, Au_1 = g_d · Au; then according to the first gain coefficient and the low-pass filter, Au_2 = g_trans · LPF(Au_1); next according to the first position, the second position, and the room's three-dimensional shape and reflection coefficient, Au_3 = reverb(Au_2, (x_a, y_a, z_a), (x_b, y_b, z_b), (x_r, y_r, z_r, r_w)); and finally according to the HRTF and/or the VBAP algorithm and the audio receiving angle to obtain the audio signal.
  • FIG. 7 is a schematic flow chart of yet another audio output method according to an embodiment of the present disclosure.
  • As shown in Figure 7, the sound emitted by the sending end may include two parts: the audio data (Audio) and the sending end's metadata. The metadata at least includes the audio emission angle, and specifically may include a gain coefficient and a low-pass filter (high-frequency attenuation coefficient); it may also include the distance between the receiving end and the sending end in the virtual space, the three-dimensional shape and reflection coefficient of the room in which they are located in the virtual space, and so on.
  • The audio data and metadata can be integrated into Object-format audio, which is then encoded and transmitted to the receiving end. The receiving end can decode the received content to obtain the audio data and the sending end's metadata, then render the audio data according to the sending end's metadata and the receiving end's metadata (e.g., at least including the audio receiving angle), and output (play) the resulting audio signal for the user at the receiving end to listen to.
  • embodiments of the present disclosure can be applied to real-time audio listening scenarios and can also be applied to audio playback scenarios.
  • the present disclosure also provides embodiments of an audio output device.
  • FIG. 8 is a schematic block diagram of an audio output device according to an embodiment of the present disclosure.
  • the device shown in this embodiment can be applied to VR equipment, AR equipment and other equipment that can serve as the receiving end of audio signals in virtual space.
  • the audio output device may include:
  • The processing module 801 is configured to determine orientation information of the audio sending end in the virtual space, and angle information of the audio receiving end and the sending end in the virtual space; determine, according to the angle information and the orientation information, the audio emission angle of the sending end relative to the receiving end in the virtual space; and render audio data at least according to the audio emission angle to obtain an audio signal;
  • the output module 802 is configured to output the audio signal.
  • In one embodiment, the processing module is configured to determine a first gain coefficient and/or a high-frequency attenuation coefficient of the low-pass filter according to the audio emission angle, and render the audio data according to the first gain coefficient and/or the low-pass filter to obtain the audio signal; wherein the first gain coefficient is positively correlated with the audio emission angle, and/or the high-frequency attenuation coefficient is negatively correlated with the audio emission angle.
  • In one embodiment, the processing module is configured to determine the distance between the receiving end and the sending end in the virtual space; determine a second gain coefficient according to the distance, wherein the second gain coefficient is inversely correlated with the distance within a preset distance range; render the audio data according to the second gain coefficient to obtain the signal to be output; and render the signal to be output according to the audio emission angle to obtain the audio signal.
  • In one embodiment, the processing module is configured to determine the first position of the sending end in the virtual space, the second position of the receiving end in the virtual space, and the three-dimensional shape and reflection coefficient of the room in which the sending end and the receiving end are located in the virtual space; render the audio data according to the audio emission angle to obtain the signal to be output; and generate reverberation according to the first position, the second position, the three-dimensional shape and reflection coefficient of the room, and the signal to be output, and add it to the signal to be output to obtain the audio signal.
  • In one embodiment, the processing module is configured to determine, according to the angle information and the orientation information, the audio receiving angle of the receiving end relative to the sending end in the virtual space; render the audio data according to the audio emission angle to obtain the signal to be output; and render the signal to be output according to the HRTF and/or the VBAP algorithm and the audio receiving angle to obtain the audio signal.
  • Since the device embodiments basically correspond to the method embodiments, reference may be made to the description of the method embodiments for relevant details.
  • The device embodiments described above are merely illustrative: the modules described as separate components may or may not be physically separate, and the components shown as modules may or may not be physical modules, i.e., they may be located in one place or distributed across multiple network modules. Some or all of the modules can be selected according to actual needs to achieve the purpose of this embodiment's solution, which persons of ordinary skill in the art can understand and implement without creative effort.
  • An embodiment of the present disclosure also provides a communication device, including: a processor; and a memory for storing a computer program; wherein, when the computer program is executed by the processor, the audio output method described in any of the above embodiments is implemented.
  • Embodiments of the present disclosure also provide a computer-readable storage medium for storing a computer program; when the computer program is executed by a processor, the steps in the audio output method described in any of the above embodiments are implemented.
  • FIG. 9 is a schematic block diagram of a device 900 for audio output according to an embodiment of the present disclosure.
  • the device 900 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, a personal digital assistant, or the like.
  • apparatus 900 may include one or more of the following components: a processing component 902, a memory 904, a power supply component 906, a multimedia component 908, an audio component 910, an input/output (I/O) interface 912, a sensor component 914, and Communication component 916.
  • Processing component 902 generally controls the overall operations of device 900, such as operations associated with display, phone calls, data communications, camera operations, and recording operations.
  • the processing component 902 may include one or more processors 920 to execute instructions to complete all or part of the steps of the above method.
  • processing component 902 may include one or more modules that facilitate interaction between processing component 902 and other components.
  • processing component 902 may include a multimedia module to facilitate interaction between multimedia component 908 and processing component 902.
  • Memory 904 is configured to store various types of data to support operation at device 900; examples of such data include instructions for any application or method operating on device 900, contact data, phonebook data, messages, pictures, videos, and so on. Memory 904 may be implemented by any type of volatile or non-volatile storage device, or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, or a magnetic or optical disk.
  • Power supply component 906 provides power to the various components of device 900 .
  • Power supply components 906 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power to device 900 .
  • Multimedia component 908 includes a screen that provides an output interface between the device 900 and the user.
  • the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user.
  • the touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide action.
  • multimedia component 908 includes a front-facing camera and/or a rear-facing camera.
  • the front camera and/or the rear camera may receive external multimedia data.
  • Each front-facing camera and rear-facing camera can be a fixed optical lens system or have a focal length and optical zoom capabilities.
  • Audio component 910 is configured to output and/or input audio signals.
  • audio component 910 includes a microphone (MIC) configured to receive external audio signals when device 900 is in operating modes, such as call mode, recording mode, and speech recognition mode. The received audio signals may be further stored in memory 904 or sent via communications component 916 .
  • audio component 910 also includes a speaker for outputting audio signals.
  • the I/O interface 912 provides an interface between the processing component 902 and a peripheral interface module, which may be a keyboard, a click wheel, a button, etc. These buttons may include, but are not limited to: Home button, Volume buttons, Start button, and Lock button.
  • Sensor component 914 includes one or more sensors that provide various aspects of status assessment for device 900 .
  • For example, the sensor component 914 can detect the open/closed state of the device 900 and the relative positioning of components (such as the display and keypad of the device 900), and can also detect a change in position of the device 900 or one of its components, the presence or absence of user contact with the device 900, the orientation or acceleration/deceleration of the device 900, and temperature changes of the device 900.
  • Sensor assembly 914 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact.
  • Sensor assembly 914 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications.
  • the sensor component 914 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
  • Communication component 916 is configured to facilitate wired or wireless communication between apparatus 900 and other devices.
  • the device 900 can access a wireless network based on a communication standard, such as WiFi, 2G, 3G, 4G LTE, 5G NR, or a combination thereof.
  • the communication component 916 receives broadcast signals or broadcast related information from an external broadcast management system via a broadcast channel.
  • the communications component 916 also includes a near field communications (NFC) module to facilitate short-range communications.
  • the NFC module can be implemented based on radio frequency identification (RFID) technology, infrared data association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology and other technologies.
  • In an exemplary embodiment, apparatus 900 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components, for executing the above method.
  • In an exemplary embodiment, a non-transitory computer-readable storage medium including instructions, such as the memory 904 including instructions executable by the processor 920 of the apparatus 900 to complete the above method, is also provided.
  • the non-transitory computer-readable storage medium may be ROM, random access memory (RAM), CD-ROM, magnetic tape, floppy disk, optical data storage device, etc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Stereophonic System (AREA)

Abstract

The present disclosure relates to an audio output method and apparatus, a communication apparatus, and a storage medium. The audio output method includes: determining orientation information of an audio sending end in a virtual space, and angle information of an audio receiving end and the sending end in the virtual space; determining, according to the angle information and the orientation information, an audio emission angle of the sending end relative to the receiving end in the virtual space; rendering audio data at least according to the audio emission angle to obtain an audio signal; and outputting the audio signal. According to the present disclosure, when rendering the audio data, the audio emission angle of the sending end relative to the receiving end is taken into account, so that the rendered audio signal can contain features related to the audio emission angle; a user at the receiving end listening to the audio signal can then distinguish the source direction of the sound in the virtual space, which helps improve the user's communication experience in the virtual space.

Description

Audio output method and apparatus, communication apparatus and storage medium

Technical Field

The present disclosure relates to the field of communication technology, and in particular to an audio output method, an audio output apparatus, a communication apparatus, and a computer-readable storage medium.
Background

The current application areas of VR (Virtual Reality) and AR (Augmented Reality) are mainly game and film/television scenes; implementation has focused chiefly on the visual side, and spatial audio has not yet been popularized on the audio side.

Realizing VR/AR audio communication requires spatial audio technology. Spatial audio is already fairly mature on the multimedia playback side, with audio formats such as Dolby Atmos, DTS (Digital Theatre System), and Sony 360, but these technologies currently have some problems.

Taking Dolby Atmos as an example, Dolby Atmos audio is entirely post-produced: a track is recorded as an object sound, and production software then attaches a spatial position in metadata to each time point of that sound, which makes it suitable only as an audio storage format for offline playback.

Although Dolby Atmos adds the object's position information, it does not add the object's sound emission angle, so on playback the object's sound can only be reproduced as a non-directional sound. Dolby Atmos therefore carries only spatial position information at playback time, and the listening user cannot distinguish the source direction of the sound, which degrades the user's VR/AR experience.
Summary

In view of this, embodiments of the present disclosure provide an audio output method, an audio output apparatus, a communication apparatus, and a computer-readable storage medium, to solve technical problems in the related art.

According to a first aspect of the embodiments of the present disclosure, an audio output method is provided, including: determining orientation information of an audio sending end in a virtual space, and angle information of an audio receiving end and the sending end in the virtual space; determining, according to the angle information and the orientation information, an audio emission angle of the sending end relative to the receiving end in the virtual space; rendering audio data at least according to the audio emission angle to obtain an audio signal; and outputting the audio signal.

Optionally, rendering the audio data at least according to the audio emission angle to obtain the audio signal includes: determining a first gain coefficient and/or a high-frequency attenuation coefficient of a low-pass filter according to the audio emission angle; and rendering the audio data according to the first gain coefficient and/or the low-pass filter to obtain the audio signal; wherein the first gain coefficient is positively correlated with the audio emission angle, and/or the high-frequency attenuation coefficient is negatively correlated with the audio emission angle.

Optionally, rendering the audio data at least according to the audio emission angle to obtain the audio signal includes: determining a distance between the receiving end and the sending end in the virtual space; determining a second gain coefficient according to the distance, wherein the second gain coefficient is inversely correlated with the distance within a preset distance range; rendering the audio data according to the second gain coefficient to obtain a signal to be output; and rendering the signal to be output according to the audio emission angle to obtain the audio signal.

Optionally, rendering the audio data at least according to the audio emission angle to obtain the audio signal includes: determining a first position of the sending end in the virtual space, a second position of the receiving end in the virtual space, and the three-dimensional shape and reflection coefficient of the room in which the sending end and the receiving end are located in the virtual space; rendering the audio data according to the audio emission angle to obtain a signal to be output; and generating reverberation according to the first position, the second position, the three-dimensional shape and reflection coefficient of the room, and the signal to be output, and adding it to the signal to be output to obtain the audio signal.

Optionally, rendering the audio data at least according to the audio emission angle to obtain the audio signal includes: determining, according to the angle information and the orientation information, an audio receiving angle of the receiving end relative to the sending end in the virtual space; rendering the audio data according to the audio emission angle to obtain a signal to be output; and rendering the signal to be output according to the head-related transfer function (HRTF) and/or the vector base amplitude panning (VBAP) algorithm and the audio receiving angle to obtain the audio signal.

According to a second aspect of the embodiments of the present disclosure, an audio output apparatus is provided, including: a processing module configured to determine orientation information of an audio sending end in a virtual space, and angle information of an audio receiving end and the sending end in the virtual space; determine, according to the angle information and the orientation information, an audio emission angle of the sending end relative to the receiving end in the virtual space; and render audio data at least according to the audio emission angle to obtain an audio signal; and an output module configured to output the audio signal.

Optionally, the processing module is configured to determine a first gain coefficient and/or a high-frequency attenuation coefficient of a low-pass filter according to the audio emission angle, and render the audio data according to the first gain coefficient and/or the low-pass filter to obtain the audio signal; wherein the first gain coefficient is positively correlated with the audio emission angle, and/or the high-frequency attenuation coefficient is negatively correlated with the audio emission angle.

Optionally, the processing module is configured to determine a distance between the receiving end and the sending end in the virtual space; determine a second gain coefficient according to the distance, wherein the second gain coefficient is inversely correlated with the distance within a preset distance range; render the audio data according to the second gain coefficient to obtain a signal to be output; and render the signal to be output according to the audio emission angle to obtain the audio signal.

Optionally, the processing module is configured to determine a first position of the sending end in the virtual space, a second position of the receiving end in the virtual space, and the three-dimensional shape and reflection coefficient of the room in which the sending end and the receiving end are located in the virtual space; render the audio data according to the audio emission angle to obtain a signal to be output; and generate reverberation according to the first position, the second position, the three-dimensional shape and reflection coefficient of the room, and the signal to be output, and add it to the signal to be output to obtain the audio signal.

Optionally, the processing module is configured to determine, according to the angle information and the orientation information, an audio receiving angle of the receiving end relative to the sending end in the virtual space; render the audio data according to the audio emission angle to obtain a signal to be output; and render the signal to be output according to the HRTF and/or the VBAP algorithm and the audio receiving angle to obtain the audio signal.

According to a third aspect of the embodiments of the present disclosure, a communication apparatus is provided, including: a processor; and a memory for storing a computer program; wherein, when the computer program is executed by the processor, the above audio output method is implemented.

According to a fourth aspect of the embodiments of the present disclosure, a computer-readable storage medium for storing a computer program is provided; when the computer program is executed by a processor, the steps in the above audio output method are implemented.

According to the embodiments of the present disclosure, when rendering the audio data, the audio emission angle of the sending end relative to the receiving end is taken into account, so that the rendered audio signal can contain features related to the audio emission angle; a user at the receiving end listening to the audio signal can then distinguish the source direction of the sound in the virtual space, which helps improve the user's communication experience in the virtual space.
Brief Description of the Drawings

In order to explain the technical solutions in the embodiments of the present disclosure more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present disclosure; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.

Figure 1 is a schematic flow chart of an audio output method according to an embodiment of the present disclosure.

Figure 2 is a schematic diagram of the spatial relationship between the receiving end and the sending end according to an embodiment of the present disclosure.

Figure 3 is a schematic flow chart of another audio output method according to an embodiment of the present disclosure.

Figure 4 is a schematic flow chart of yet another audio output method according to an embodiment of the present disclosure.

Figure 5 is a schematic flow chart of yet another audio output method according to an embodiment of the present disclosure.

Figure 6 is a schematic flow chart of yet another audio output method according to an embodiment of the present disclosure.

Figure 7 is a schematic flow chart of yet another audio output method according to an embodiment of the present disclosure.

Figure 8 is a schematic block diagram of an audio output apparatus according to an embodiment of the present disclosure.

Figure 9 is a schematic block diagram of a device for audio output according to an embodiment of the present disclosure.
Detailed Description

The technical solutions in the embodiments of the present disclosure will be described clearly and completely below with reference to the drawings in the embodiments of the present disclosure. Obviously, the described embodiments are only some of the embodiments of the present disclosure, not all of them. Based on the embodiments in the present disclosure, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the protection scope of the present disclosure.

The terms used in the embodiments of the present disclosure are for the purpose of describing particular embodiments only and are not intended to limit the embodiments of the present disclosure. The singular forms "a" and "the" used in the embodiments of the present disclosure and the appended claims are also intended to include the plural forms, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and includes any or all possible combinations of one or more of the associated listed items.

It should be understood that although the terms first, second, third, etc. may be used in the embodiments of the present disclosure to describe various information, the information should not be limited to these terms; these terms are only used to distinguish information of the same type from one another. For example, without departing from the scope of the embodiments of the present disclosure, first information may also be called second information and, similarly, second information may also be called first information. Depending on the context, the word "if" as used herein may be interpreted as "when", "upon", or "in response to determining".

For brevity and ease of understanding, this document uses the terms "greater than" or "less than", "higher than" or "lower than" when characterizing size relationships. Those skilled in the art will understand that "greater than" also covers "greater than or equal to", "less than" also covers "less than or equal to", "higher than" also covers "higher than or equal to", and "lower than" also covers "lower than or equal to".
Figure 1 is a schematic flow chart of an audio output method according to an embodiment of the present disclosure. The method shown in this embodiment can be applied to VR devices, AR devices, and other devices that can serve as the receiving end of audio signals in a virtual space.

As shown in Figure 1, the audio output method may include the following steps:

In step S101, orientation information of an audio sending end in a virtual space, and angle information of an audio receiving end and the sending end in the virtual space, are determined;

In step S102, an audio emission angle of the sending end relative to the receiving end in the virtual space is determined according to the angle information and the orientation information;

In step S103, audio data is rendered at least according to the audio emission angle to obtain an audio signal;

In step S104, the audio signal is output.
In one embodiment, the virtual space may be a virtual space in a VR scene or a virtual space in an AR scene, depending on the application scenario.

The application scenarios of the embodiments of the present disclosure include, but are not limited to, at least one of the following:

remote multi-person virtual conference scenarios, remote online class scenarios, online concert scenarios, immersive game scenarios, and audio-visual interaction scenarios.

For example, in a remote multi-person virtual conference scenario, the virtual space may include a virtual conference room, the sending end may include the speakers in the virtual space, and the receiving end may be a user of a VR/AR device, e.g., a participant.

For example, in a remote online class scenario, the virtual scene may include a virtual classroom, the sending end may include the teacher and speaking students in the virtual space, and the receiving end may be a user of a VR/AR device, e.g., a student.

For example, in an online concert scenario, the virtual scene may include a virtual concert hall, the sending end may include performers and singers in the virtual space, and the receiving end may be a user of a VR/AR device, e.g., a listener.

For example, in an immersive game scenario, the virtual scene may include a game scene, the sending end may include other players and NPCs (Non-Player Characters) in the virtual space, and the receiving end may be a user of a VR/AR device, e.g., a player.

For example, in an audio-visual interaction scenario, the virtual scene may include a virtual cinema, the sending end may include loudspeakers in the virtual space, and the receiving end may be a user of a VR/AR device, e.g., an audience member.
Figure 2 is a schematic diagram of the spatial relationship between the receiving end and the sending end according to an embodiment of the present disclosure.

As shown in Figure 2, the sending end is located at point A in the virtual space and the receiving end at point B. A coordinate system is constructed with point B as the origin: the yBz plane is the reference plane of the receiving end, and the positive y-axis is the reference direction of the receiving end, i.e., a rotation angle of 0 degrees, with the rotation angle increasing counterclockwise in the xBy plane.

The first position of the sending end in the virtual space is (x_a, y_a, z_a) and the second position of the receiving end is (x_b, y_b, z_b). The angle information may include the rotation angle θ_ab and the pitch angle φ_ab from the sending end (point A) to the receiving end (point B), where θ_ab and φ_ab are computed from the first and second positions by standard trigonometric relations (the formula images are not reproduced in the source text).
It should be noted that Figure 2 shows a three-dimensional virtual space; in a two-dimensional virtual space, the angle information may include only the rotation angle and not the pitch angle.

In one embodiment, the first position, the second position, and the orientation information of the sending end can be obtained directly.

For example, the first position and the second position can be obtained by locating the sending end and the receiving end and mapping them into the virtual space; or, when the position of the receiving end is the origin, the first position and the second position (e.g., the origin) can be determined from the relative positional relationship between the position of the sending end generated in the virtual space (e.g., a virtual character) and the origin.

For example, the orientation information of the sending end in the virtual space can be determined from a gyroscope provided on the sending end; the orientation information may include the rotation orientation angle azim_a and the pitch orientation angle elev_a.
Then, the audio emission angle of the sending end relative to the receiving end in the virtual space can be determined according to the angle information and the orientation information. For example, the audio emission angle includes the angle θ_trans in the rotation direction and the angle φ_trans in the pitch direction, where θ_trans = azim_a − θ_ab and φ_trans = elev_a − φ_ab.

For example, when point A and point B are in the same horizontal plane, θ_trans = π when the sending end directly faces the receiving end, and θ_trans = 0 when the sending end faces away from the receiving end. The audio data can then be rendered according to the audio emission angle to obtain the audio signal, and finally the audio signal is output.

According to the embodiments of the present disclosure, when rendering the audio data, the audio emission angle of the sending end relative to the receiving end is taken into account, so that the rendered audio signal can contain features related to the audio emission angle; a user at the receiving end listening to the audio signal can then distinguish the source direction of the sound in the virtual space, which helps improve the user's communication experience in the virtual space.
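To make the geometry concrete, here is a minimal Python sketch of the emission-angle computation above. Only θ_trans = azim_a − θ_ab is stated explicitly in the text; the atan2-based recovery of θ_ab and φ_ab, the sign conventions, and the pitch analogue are assumptions.

    import math

    def emission_angle(pos_a, pos_b, azim_a, elev_a):
        """Audio emission angle (theta_trans, phi_trans) of sender A relative
        to receiver B. pos_a/pos_b are (x, y, z) tuples in the virtual space;
        azim_a/elev_a are the sender's orientation angles (e.g., from its
        gyroscope), in the same angular convention."""
        dx, dy, dz = (pa - pb for pa, pb in zip(pos_a, pos_b))
        # Rotation angle of A as seen from B: y+ is the 0-degree reference
        # direction in the xBy plane (assumed convention).
        theta_ab = math.atan2(dx, dy)
        # Pitch angle relative to the horizontal plane (assumed convention).
        phi_ab = math.atan2(dz, math.hypot(dx, dy))
        # Given in the description: theta_trans = azim_a - theta_ab;
        # the pitch component is taken analogously.
        return azim_a - theta_ab, elev_a - phi_ab

Under these conventions, with A and B in the same horizontal plane and the sender facing B, theta_trans comes out as π, matching the example in the text.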
It should be noted that, in addition to rendering the audio data according to the audio emission angle to obtain the audio signal, the embodiments of the present disclosure can also render the audio data in combination with other parameters, to ensure that the resulting audio signal better matches the virtual space in which the receiving end is located.
Figure 3 is a schematic flow chart of another audio output method according to an embodiment of the present disclosure. As shown in Figure 3, rendering the audio data at least according to the audio emission angle to obtain the audio signal includes:

In step S301, a first gain coefficient and/or a high-frequency attenuation coefficient of a low-pass filter is determined according to the audio emission angle;

In step S302, the audio data is rendered according to the first gain coefficient and/or the low-pass filter to obtain the audio signal;

wherein the first gain coefficient is positively correlated with the audio emission angle, and/or the high-frequency attenuation coefficient is negatively correlated with the audio emission angle.

In one embodiment, the first gain coefficient may be determined according to the audio emission angle, and the audio data may then be rendered according to the first gain coefficient. The first gain coefficient is positively correlated with the audio emission angle; for example, in the angle range from 0 to π, the first gain coefficient increases as the audio emission angle increases.

For example, when the audio emission angle is 0, i.e., when the sending end faces away from the receiving end, the first gain coefficient is the smallest, the audio signal obtained by rendering the audio data with this coefficient is relatively weak, and the volume of the sending end's audio heard at the receiving end is relatively low; when the audio emission angle is π, i.e., when the sending end directly faces the receiving end, the first gain coefficient is the largest, the rendered audio signal is relatively strong, and the volume heard at the receiving end is relatively high.

Accordingly, the closer the sending end is to directly facing the receiving end, the louder the audio the user at the receiving end hears; the closer the sending end is to facing away from the receiving end, the quieter the audio the user hears.

In one embodiment, the high-frequency attenuation coefficient of the low-pass filter can be determined according to the audio emission angle, and the audio data can then be rendered with the low-pass filter, specifically by filtering. The high-frequency attenuation coefficient is negatively correlated with the audio emission angle; for example, in the angle range from 0 to π, the high-frequency attenuation coefficient decreases as the audio emission angle increases.

For example, when the audio emission angle is 0, i.e., when the sending end faces away from the receiving end, the high-frequency attenuation coefficient is the largest, the audio signal obtained by filtering the audio data with the low-pass filter contains relatively little high-frequency content, and the receiving end hears relatively little of the high-frequency part of the sending end's audio; when the audio emission angle is π, i.e., when the sending end directly faces the receiving end, the high-frequency attenuation coefficient is the smallest (for example, the low-pass filter becomes an all-pass filter), and the receiving end hears relatively more of the high-frequency part.

Accordingly, since the brightness and detail of a sound are determined mainly by its high-frequency part, and the high-frequency part is more directional, this embodiment ensures that the user at the receiving end can infer that the sending end is pointing toward them from richer high-frequency content, and that it is pointing away from them from poorer high-frequency content.

In one embodiment, the first gain coefficient and the low-pass filter can be combined to render the audio data: if the first gain coefficient is g_trans, the low-pass filter is LPF, the audio data is Au, and the rendered audio signal is Au', then Au' = g_trans · LPF(Au). The user can thus combine the volume and the high-frequency content of the audio signal to accurately identify its source direction.
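A minimal sketch of the Au' = g_trans · LPF(Au) step follows. The cosine gain ramp and the one-pole filter-coefficient mapping are assumptions; the text only fixes the correlations (gain rising, high-frequency attenuation falling, as the emission angle goes from 0 to π).

    import numpy as np

    def render_emission(au: np.ndarray, theta_trans: float) -> np.ndarray:
        """Au' = g_trans * LPF(Au): gain and low-pass filtering driven by the
        audio emission angle (0 = facing away, pi = facing the receiver)."""
        t = float(np.clip(abs(theta_trans), 0.0, np.pi))
        g_trans = 0.5 * (1.0 - np.cos(t))      # first gain coefficient, 0..1
        alpha = 0.05 + 0.95 * (t / np.pi)      # 1.0 -> all-pass when facing
        out = np.empty(len(au))
        y = 0.0
        for i, x in enumerate(au):
            y += alpha * (float(x) - y)        # one-pole low-pass filter
            out[i] = g_trans * y
        return out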
Figure 4 is a schematic flow chart of yet another audio output method according to an embodiment of the present disclosure. As shown in Figure 4, rendering the audio data at least according to the audio emission angle to obtain the audio signal includes:

In step S401, a distance between the receiving end and the sending end in the virtual space is determined;

In step S402, a second gain coefficient is determined according to the distance, wherein the second gain coefficient is inversely correlated with the distance within a preset distance range;

In step S403, the audio data is rendered according to the second gain coefficient to obtain a signal to be output;

In step S404, the signal to be output is rendered according to the audio emission angle to obtain the audio signal.

In one embodiment, the first position of the sending end in the virtual space can be determined as (x_a, y_a, z_a) and the second position of the receiving end as (x_b, y_b, z_b); the distance d_ab between the receiving end and the sending end in the virtual space is then calculated from the first and second positions as:

d_ab = √((x_a − x_b)² + (y_a − y_b)² + (z_a − z_b)²)

Since the farther the receiving end is from the sending end, the lower the volume of the sound it receives from the sending end, the second gain coefficient g_d can be determined according to the distance; g_d is inversely correlated with the distance within a preset distance range, which can be set as needed, e.g., distances of more than 1 meter (the formula image for g_d is not reproduced in the source text).

The audio data is then rendered according to the second gain coefficient to obtain the signal to be output: if the audio data is Au and the signal to be output is Au'', then Au'' = g_d · Au. The signal to be output is then rendered according to the audio emission angle to obtain the audio signal. Accordingly, when rendering the audio data, the distance from the sending end to the receiving end is taken into account, so that the rendered audio signal can contain distance-related features: a user at the receiving end can distinguish not only the source direction of the sound in the virtual space but also the distance in the virtual space, which helps improve the user's communication experience in the virtual space.

Within 1 meter, the second gain coefficient can be set to a fixed value of 1; that is, within 1 meter the rendered audio signal no longer grows as the distance decreases, which helps avoid an excessively loud volume degrading the user experience.
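A sketch of steps S401–S403, assuming a 1/d attenuation curve beyond the 1-meter threshold just mentioned; the text only specifies the inverse correlation and the clamp inside 1 meter, not the exact curve.

    import math

    def distance_gain(pos_a, pos_b, d_min=1.0):
        """Second gain coefficient g_d from the sender/receiver positions:
        fixed at 1 within d_min, inversely correlated with d_ab beyond it
        (the 1/d law is an assumption)."""
        d_ab = math.dist(pos_a, pos_b)          # Euclidean distance d_ab
        return 1.0 if d_ab <= d_min else d_min / d_ab

    # Au'' = g_d * Au, applied before the emission-angle rendering:
    # au_out = distance_gain((0, 3, 0), (0, 0, 0)) * au    # g_d = 1/3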
Figure 5 is a schematic flow chart of yet another audio output method according to an embodiment of the present disclosure. As shown in Figure 5, rendering the audio data at least according to the audio emission angle to obtain the audio signal includes:

In step S501, a first position of the sending end in the virtual space, a second position of the receiving end in the virtual space, and the three-dimensional shape and reflection coefficient of the room in which the sending end and the receiving end are located in the virtual space are determined;

In step S502, the audio data is rendered according to the audio emission angle to obtain a signal to be output;

In step S503, reverberation is generated according to the first position, the second position, the three-dimensional shape and reflection coefficient of the room, and the signal to be output, and is added to the signal to be output to obtain the audio signal.

In one embodiment, the first position of the sending end in the virtual space can be determined as (x_a, y_a, z_a), the second position of the receiving end as (x_b, y_b, z_b), and the room in which the sending end and the receiving end are located in the virtual space as having three-dimensional shape (x_r, y_r, z_r) and reflection coefficient r_w; (x_r, y_r, z_r) and r_w can be combined as (x_r, y_r, z_r, r_w).

The audio data can first be rendered according to the audio emission angle to obtain the signal to be output; reverberation can then be generated according to the first position, the second position, the three-dimensional shape and reflection coefficient of the room, and the signal to be output, and added to the signal to be output to obtain the audio signal.

For example, if the signal to be output is Au and the rendered audio signal is Au''', then Au''' = reverb(Au, (x_a, y_a, z_a), (x_b, y_b, z_b), (x_r, y_r, z_r, r_w)), where reverb denotes a function that computes reverberation and adds it to the signal to be output to obtain the audio signal.

Accordingly, when rendering the audio data, the three-dimensional shape and reflection coefficient of the room in which the sending end and the receiving end are located in the virtual space are taken into account, so that the rendered audio signal can contain reverberation-related features: a user at the receiving end listening to the audio signal can distinguish not only the source direction of the sound in the virtual space but also, from the reverberation, the character of the room in the virtual space, which helps improve the user's communication experience in the virtual space.
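The reverb(...) function itself is not specified in the text. The sketch below is a deliberately crude stand-in: a single feedback comb whose delay is derived from the room dimensions and whose feedback is the wall reflection coefficient r_w. A production renderer would instead trace early reflections (e.g., image sources) against the room's 3-D shape and the two positions.

    import numpy as np

    def reverb(au, pos_a, pos_b, room):
        """Stand-in for Au''' = reverb(Au, pos_a, pos_b, (x_r, y_r, z_r, r_w)).
        pos_a/pos_b are kept for signature parity with the description but
        are not used by this crude tail model."""
        x_r, y_r, z_r, r_w = room
        fs, c = 48000, 343.0                   # sample rate (Hz), speed of sound (m/s)
        mean_path = (x_r + y_r + z_r) / 3.0    # crude mean free path of the room
        delay = max(1, int(fs * mean_path / c))
        out = np.asarray(au, dtype=float).copy()
        for i in range(delay, len(out)):
            out[i] += r_w * out[i - delay]     # feedback comb: reverberant tail
        return out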
Figure 6 is a schematic flow chart of yet another audio output method according to an embodiment of the present disclosure. As shown in Figure 6, rendering the audio data at least according to the audio emission angle to obtain the audio signal includes:

In step S601, an audio receiving angle of the receiving end relative to the sending end in the virtual space is determined according to the angle information and the orientation information;

In step S602, the audio data is rendered according to the audio emission angle to obtain a signal to be output;

In step S603, the signal to be output is rendered according to the head-related transfer function (HRTF) and/or the vector base amplitude panning (VBAP) algorithm and the audio receiving angle to obtain the audio signal.

In one embodiment, the audio receiving angle of the receiving end relative to the sending end in the virtual space can be determined according to the angle information and the orientation information. For example, the audio receiving angle includes the angle θ_rece in the rotation direction and the angle φ_rece in the pitch direction, where θ_rece = θ_ab − azim_b and φ_rece = φ_ab − elev_b, with azim_b and elev_b being the rotation and pitch orientation angles of the receiving end.

The audio data can first be rendered according to the audio emission angle to obtain the signal to be output, and the signal to be output can then be rendered according to the HRTF and/or the VBAP algorithm and the audio receiving angle to obtain the audio signal.

For example, if the signal to be output is Au, the rendered audio signal is Au''''. Depending on how the receiving end listens to the audio signal, different rendering methods can be chosen: when listening through headphones, rendering can use the HRTF, and when listening through loudspeakers, rendering can use VBAP (the corresponding formula images are not reproduced in the source text).

Accordingly, when rendering the audio data, the audio receiving angle of the receiving end relative to the sending end in the virtual space is taken into account, so that the rendered audio signal can contain features related to the audio receiving angle: a user at the receiving end can distinguish the source direction of the sound in the virtual space, and the listening effect is also ensured when listening with headphones and/or loudspeakers, which helps improve the user's communication experience in the virtual space.
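For the loudspeaker branch, pairwise VBAP reduces to solving a small linear system for the panning gains. The sketch below handles one stereo pair at assumed ±45° positions (full VBAP selects the active pair or triplet from the whole layout); the HRTF branch would instead convolve the signal with the HRIR pair for (θ_rece, φ_rece), which requires a measured HRTF dataset and is not shown.

    import numpy as np

    def vbap_pair_gains(theta_rece, spk_angles=(np.pi / 4, -np.pi / 4)):
        """2-D VBAP gains for one loudspeaker pair. Angles use the document's
        convention (0 = the reference direction, counterclockwise positive);
        the +/-45 degree speaker placement is an assumption."""
        def unit(a):
            return np.array([np.sin(a), np.cos(a)])    # x to the side, y ahead
        base = np.column_stack([unit(a) for a in spk_angles])
        g = np.linalg.solve(base, unit(theta_rece))    # base @ g = source direction
        g = np.clip(g, 0.0, None)                      # no negative (phase-flipped) gains
        return g / (np.linalg.norm(g) + 1e-12)         # power normalization

    # Example: g_l, g_r = vbap_pair_gains(0.3); left, right = g_l * au, g_r * au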
It should be noted that the embodiments of the present disclosure can be combined with each other as needed. For example, several of the above embodiments can be applied together to render the audio data Au:

First, render according to the second gain coefficient: Au_1 = g_d · Au;

Then, render according to the first gain coefficient and the low-pass filter: Au_2 = g_trans · LPF(Au_1);

Next, render according to the first position, the second position, and the three-dimensional shape and reflection coefficient of the room: Au_3 = reverb(Au_2, (x_a, y_a, z_a), (x_b, y_b, z_b), (x_r, y_r, z_r, r_w));

Finally, render according to the HRTF and the audio receiving angle (θ_rece, φ_rece), or according to the VBAP algorithm and the audio receiving angle, to obtain the audio signal Au_4 (the corresponding formula images are not reproduced in the source text).
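The following sketch chains the steps in the order just given, reusing the sketch functions from the earlier examples (emission_angle, distance_gain, render_emission, reverb, vbap_pair_gains), which are assumed to be in scope:

    import math

    def render_pipeline(au, pos_a, pos_b, azim_a, elev_a, azim_b, room):
        """Au -> Au_1 -> Au_2 -> Au_3 -> Au_4 per the combined embodiment.
        The receiving-angle pitch term and the speaker layout remain
        assumptions inherited from the earlier sketches."""
        theta_trans, _ = emission_angle(pos_a, pos_b, azim_a, elev_a)
        au1 = distance_gain(pos_a, pos_b) * au           # second gain coefficient
        au2 = render_emission(au1, theta_trans)          # g_trans * LPF(.)
        au3 = reverb(au2, pos_a, pos_b, room)            # room reverberation
        # Receiving angle for loudspeaker playback: theta_rece = theta_ab - azim_b
        theta_ab = math.atan2(pos_a[0] - pos_b[0], pos_a[1] - pos_b[1])
        g_left, g_right = vbap_pair_gains(theta_ab - azim_b)
        return g_left * au3, g_right * au3               # Au_4 as a stereo pair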
Figure 7 is a schematic flow chart of yet another audio output method according to an embodiment of the present disclosure.

As shown in Figure 7, the sound emitted by the sending end may include two parts: the audio data (Audio) and the sending end's metadata. The metadata at least includes the audio emission angle, and specifically may include a gain coefficient and a low-pass filter (high-frequency attenuation coefficient); it may also include the distance between the receiving end and the sending end in the virtual space, the three-dimensional shape and reflection coefficient of the room in which the sending end and the receiving end are located in the virtual space, and so on.

The audio data and metadata can be integrated into Object-format audio, which is then encoded and transmitted to the receiving end. The receiving end can decode the received content to obtain the audio data and the sending end's metadata, then render the audio data according to the sending end's metadata and the receiving end's metadata (e.g., at least including the audio receiving angle), and output (play) the resulting audio signal for the user at the receiving end to listen to.
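A sketch of the Object-format packaging described here. Every field name and the JSON-header-plus-PCM framing are illustrative assumptions; the text does not specify a bitstream or codec.

    import json
    from dataclasses import dataclass, asdict
    from typing import Optional, Tuple

    @dataclass
    class SenderMetadata:
        """Hypothetical per-object metadata carried next to the audio track."""
        theta_trans: float                     # audio emission angle, rotation
        phi_trans: float                       # audio emission angle, pitch
        gain: float = 1.0                      # gain coefficient
        hf_attenuation: float = 0.0            # low-pass high-frequency attenuation
        distance: Optional[float] = None       # sender-receiver distance
        room: Optional[Tuple[float, float, float, float]] = None  # (x_r, y_r, z_r, r_w)

    def pack_object_frame(pcm: bytes, meta: SenderMetadata) -> bytes:
        """One Object-format frame: 4-byte header length + JSON header + PCM."""
        header = json.dumps(asdict(meta)).encode("utf-8")
        return len(header).to_bytes(4, "big") + header + pcm

    def unpack_object_frame(frame: bytes):
        """Receiver side: recover metadata and PCM before rendering."""
        n = int.from_bytes(frame[:4], "big")
        meta = json.loads(frame[4:4 + n].decode("utf-8"))
        return meta, frame[4 + n:]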
It should be noted that the embodiments of the present disclosure are applicable to real-time audio listening scenarios as well as audio playback scenarios.
Corresponding to the foregoing embodiments of the audio output method, the present disclosure also provides embodiments of an audio output apparatus.

Figure 8 is a schematic block diagram of an audio output apparatus according to an embodiment of the present disclosure. The apparatus shown in this embodiment can be applied to VR devices, AR devices, and other devices that can serve as the receiving end of audio signals in a virtual space.
As shown in Figure 8, the audio output apparatus may include:

a processing module 801, configured to determine orientation information of an audio sending end in a virtual space, and angle information of an audio receiving end and the sending end in the virtual space; determine, according to the angle information and the orientation information, an audio emission angle of the sending end relative to the receiving end in the virtual space; and render audio data at least according to the audio emission angle to obtain an audio signal;

an output module 802, configured to output the audio signal.

In one embodiment, the processing module is configured to determine a first gain coefficient and/or a high-frequency attenuation coefficient of a low-pass filter according to the audio emission angle, and render the audio data according to the first gain coefficient and/or the low-pass filter to obtain the audio signal;

wherein the first gain coefficient is positively correlated with the audio emission angle, and/or the high-frequency attenuation coefficient is negatively correlated with the audio emission angle.

In one embodiment, the processing module is configured to determine a distance between the receiving end and the sending end in the virtual space; determine a second gain coefficient according to the distance, wherein the second gain coefficient is inversely correlated with the distance within a preset distance range; render the audio data according to the second gain coefficient to obtain a signal to be output; and render the signal to be output according to the audio emission angle to obtain the audio signal.

In one embodiment, the processing module is configured to determine a first position of the sending end in the virtual space, a second position of the receiving end in the virtual space, and the three-dimensional shape and reflection coefficient of the room in which the sending end and the receiving end are located in the virtual space; render the audio data according to the audio emission angle to obtain a signal to be output; and generate reverberation according to the first position, the second position, the three-dimensional shape and reflection coefficient of the room, and the signal to be output, and add it to the signal to be output to obtain the audio signal.

In one embodiment, the processing module is configured to determine, according to the angle information and the orientation information, an audio receiving angle of the receiving end relative to the sending end in the virtual space; render the audio data according to the audio emission angle to obtain a signal to be output; and render the signal to be output according to the HRTF and/or the VBAP algorithm and the audio receiving angle to obtain the audio signal.
Regarding the apparatus in the above embodiments, the specific manner in which each module performs its operations has been described in detail in the embodiments of the related method, and will not be elaborated here.

Since the apparatus embodiments basically correspond to the method embodiments, reference may be made to the description of the method embodiments for relevant details. The apparatus embodiments described above are merely illustrative: the modules described as separate components may or may not be physically separate, and the components shown as modules may or may not be physical modules, i.e., they may be located in one place or distributed across multiple network modules. Some or all of the modules can be selected according to actual needs to achieve the purpose of this embodiment's solution, which persons of ordinary skill in the art can understand and implement without creative effort.

An embodiment of the present disclosure also provides a communication apparatus, including: a processor; and a memory for storing a computer program; wherein, when the computer program is executed by the processor, the audio output method described in any of the above embodiments is implemented.

An embodiment of the present disclosure also provides a computer-readable storage medium for storing a computer program; when the computer program is executed by a processor, the steps in the audio output method described in any of the above embodiments are implemented.
Figure 9 is a schematic block diagram of a device 900 for audio output according to an embodiment of the present disclosure. For example, the device 900 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, a personal digital assistant, or the like.

Referring to Figure 9, the device 900 may include one or more of the following components: a processing component 902, a memory 904, a power supply component 906, a multimedia component 908, an audio component 910, an input/output (I/O) interface 912, a sensor component 914, and a communication component 916.

The processing component 902 generally controls the overall operations of the device 900, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 902 may include one or more processors 920 to execute instructions to complete all or part of the steps of the above method. In addition, the processing component 902 may include one or more modules that facilitate interaction between the processing component 902 and other components; for example, it may include a multimedia module to facilitate interaction between the multimedia component 908 and the processing component 902.

The memory 904 is configured to store various types of data to support operation at the device 900; examples of such data include instructions for any application or method operating on the device 900, contact data, phonebook data, messages, pictures, videos, and so on. The memory 904 may be implemented by any type of volatile or non-volatile storage device, or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, or a magnetic or optical disk.

The power supply component 906 provides power to the various components of the device 900, and may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the device 900.

The multimedia component 908 includes a screen that provides an output interface between the device 900 and the user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, it may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel; the touch sensors may not only sense the boundary of a touch or swipe action but also detect the duration and pressure associated with it. In some embodiments, the multimedia component 908 includes a front-facing camera and/or a rear-facing camera which, when the device 900 is in an operating mode such as shooting mode or video mode, can receive external multimedia data. Each front-facing and rear-facing camera can be a fixed optical lens system or have focal length and optical zoom capability.

The audio component 910 is configured to output and/or input audio signals. For example, the audio component 910 includes a microphone (MIC) configured to receive external audio signals when the device 900 is in an operating mode such as call mode, recording mode, or speech recognition mode; the received audio signals may be further stored in the memory 904 or sent via the communication component 916. In some embodiments, the audio component 910 also includes a speaker for outputting audio signals.

The I/O interface 912 provides an interface between the processing component 902 and peripheral interface modules, which may be keyboards, click wheels, buttons, and the like. These buttons may include, but are not limited to, a home button, volume buttons, a start button, and a lock button.

The sensor component 914 includes one or more sensors that provide status assessments of various aspects for the device 900. For example, the sensor component 914 can detect the open/closed state of the device 900 and the relative positioning of components (such as the display and keypad of the device 900), and can also detect a change in position of the device 900 or one of its components, the presence or absence of user contact with the device 900, the orientation or acceleration/deceleration of the device 900, and temperature changes of the device 900. The sensor component 914 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact, and may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 914 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.

The communication component 916 is configured to facilitate wired or wireless communication between the device 900 and other devices. The device 900 can access a wireless network based on a communication standard, such as WiFi, 2G, 3G, 4G LTE, 5G NR, or a combination thereof. In one exemplary embodiment, the communication component 916 receives broadcast signals or broadcast-related information from an external broadcast management system via a broadcast channel. In one exemplary embodiment, the communication component 916 also includes a near field communication (NFC) module to facilitate short-range communication; for example, the NFC module can be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.

In an exemplary embodiment, the device 900 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components, for executing the above method.

In an exemplary embodiment, a non-transitory computer-readable storage medium including instructions, such as the memory 904 including instructions executable by the processor 920 of the device 900 to complete the above method, is also provided. For example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
Other embodiments of the present disclosure will readily occur to those skilled in the art upon consideration of the specification and practice of the disclosure disclosed herein. The present disclosure is intended to cover any variations, uses, or adaptations of the present disclosure that follow its general principles and include common knowledge or customary technical means in the technical field not disclosed by the present disclosure. The specification and embodiments are to be regarded as exemplary only, with the true scope and spirit of the present disclosure indicated by the following claims.

It should be understood that the present disclosure is not limited to the precise structures described above and shown in the drawings, and various modifications and changes can be made without departing from its scope. The scope of the present disclosure is limited only by the appended claims.

It should be noted that, herein, relational terms such as first and second are used only to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. The terms "comprise", "include", or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes the element.

The method and apparatus provided by the embodiments of the present disclosure have been introduced in detail above. Specific examples are used herein to illustrate the principles and implementations of the present disclosure, and the descriptions of the above embodiments are only intended to help understand the method of the present disclosure and its core idea. Meanwhile, for those of ordinary skill in the art, there will be changes in the specific implementation and application scope based on the ideas of the present disclosure. In summary, the contents of this specification should not be construed as limiting the present disclosure.

Claims (12)

  1. An audio output method, characterized by comprising:
    determining orientation information of an audio sending end in a virtual space, and angle information of an audio receiving end and the sending end in the virtual space;
    determining, according to the angle information and the orientation information, an audio emission angle of the sending end relative to the receiving end in the virtual space;
    rendering audio data at least according to the audio emission angle to obtain an audio signal;
    outputting the audio signal.
  2. The method according to claim 1, characterized in that rendering the audio data at least according to the audio emission angle to obtain the audio signal comprises:
    determining a first gain coefficient and/or a high-frequency attenuation coefficient of a low-pass filter according to the audio emission angle;
    rendering the audio data according to the first gain coefficient and/or the low-pass filter to obtain the audio signal;
    wherein the first gain coefficient is positively correlated with the audio emission angle, and/or the high-frequency attenuation coefficient is negatively correlated with the audio emission angle.
  3. The method according to claim 1 or 2, characterized in that rendering the audio data at least according to the audio emission angle to obtain the audio signal comprises:
    determining a distance between the receiving end and the sending end in the virtual space;
    determining a second gain coefficient according to the distance, wherein the second gain coefficient is inversely correlated with the distance within a preset distance range;
    rendering the audio data according to the second gain coefficient to obtain a signal to be output;
    rendering the signal to be output according to the audio emission angle to obtain the audio signal.
  4. The method according to claim 1 or 2, characterized in that rendering the audio data at least according to the audio emission angle to obtain the audio signal comprises:
    determining a first position of the sending end in the virtual space, a second position of the receiving end in the virtual space, and a three-dimensional shape and a reflection coefficient of a room in which the sending end and the receiving end are located in the virtual space;
    rendering the audio data according to the audio emission angle to obtain a signal to be output;
    generating reverberation according to the first position, the second position, the three-dimensional shape and reflection coefficient of the room, and the signal to be output, and adding it to the signal to be output to obtain the audio signal.
  5. The method according to claim 1 or 2, characterized in that rendering the audio data at least according to the audio emission angle to obtain the audio signal comprises:
    determining, according to the angle information and the orientation information, an audio receiving angle of the receiving end relative to the sending end in the virtual space;
    rendering the audio data according to the audio emission angle to obtain a signal to be output;
    rendering the signal to be output according to a head-related transfer function (HRTF) and/or a vector base amplitude panning (VBAP) algorithm and the audio receiving angle to obtain the audio signal.
  6. An audio output apparatus, characterized by comprising:
    a processing module configured to determine orientation information of an audio sending end in a virtual space, and angle information of an audio receiving end and the sending end in the virtual space; determine, according to the angle information and the orientation information, an audio emission angle of the sending end relative to the receiving end in the virtual space; and render audio data at least according to the audio emission angle to obtain an audio signal;
    an output module configured to output the audio signal.
  7. The apparatus according to claim 6, characterized in that the processing module is configured to determine a first gain coefficient and/or a high-frequency attenuation coefficient of a low-pass filter according to the audio emission angle, and render the audio data according to the first gain coefficient and/or the low-pass filter to obtain the audio signal;
    wherein the first gain coefficient is positively correlated with the audio emission angle, and/or the high-frequency attenuation coefficient is negatively correlated with the audio emission angle.
  8. The apparatus according to claim 6 or 7, characterized in that the processing module is configured to determine a distance between the receiving end and the sending end in the virtual space; determine a second gain coefficient according to the distance, wherein the second gain coefficient is inversely correlated with the distance within a preset distance range; render the audio data according to the second gain coefficient to obtain a signal to be output; and render the signal to be output according to the audio emission angle to obtain the audio signal.
  9. The apparatus according to claim 6 or 7, characterized in that the processing module is configured to determine a first position of the sending end in the virtual space, a second position of the receiving end in the virtual space, and a three-dimensional shape and a reflection coefficient of a room in which the sending end and the receiving end are located in the virtual space; render the audio data according to the audio emission angle to obtain a signal to be output; and generate reverberation according to the first position, the second position, the three-dimensional shape and reflection coefficient of the room, and the signal to be output, and add it to the signal to be output to obtain the audio signal.
  10. The apparatus according to claim 6 or 7, characterized in that the processing module is configured to determine, according to the angle information and the orientation information, an audio receiving angle of the receiving end relative to the sending end in the virtual space; render the audio data according to the audio emission angle to obtain a signal to be output; and render the signal to be output according to a head-related transfer function (HRTF) and/or a vector base amplitude panning (VBAP) algorithm and the audio receiving angle to obtain the audio signal.
  11. A communication apparatus, characterized by comprising:
    a processor;
    a memory for storing a computer program;
    wherein, when the computer program is executed by the processor, the audio output method according to any one of claims 1 to 5 is implemented.
  12. A computer-readable storage medium for storing a computer program, characterized in that, when the computer program is executed by a processor, the steps in the audio output method according to any one of claims 1 to 5 are implemented.
PCT/CN2022/091055 2022-05-05 2022-05-05 Audio output method and apparatus, communication apparatus and storage medium WO2023212883A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2022/091055 WO2023212883A1 (zh) 2022-05-05 2022-05-05 Audio output method and apparatus, communication apparatus and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2022/091055 WO2023212883A1 (zh) 2022-05-05 2022-05-05 Audio output method and apparatus, communication apparatus and storage medium

Publications (1)

Publication Number Publication Date
WO2023212883A1 (zh)

Family

ID=88646112

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/091055 WO2023212883A1 (zh) 2022-05-05 2022-05-05 音频输出方法和装置、通信装置和存储介质

Country Status (1)

Country Link
WO (1) WO2023212883A1 (zh)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106993249A * 2017-04-26 2017-07-28 深圳创维-Rgb电子有限公司 Method and device for processing audio data of a sound field
US20170265016A1 * 2016-03-11 2017-09-14 Gaudio Lab, Inc. Method and apparatus for processing audio signal
CN108346432A * 2017-01-25 2018-07-31 北京三星通信技术研究有限公司 Virtual reality (VR) audio processing method and corresponding device
CN111148013A * 2019-12-26 2020-05-12 上海大学 Virtual reality audio binaural reproduction system and method dynamically following the auditory perspective
US20210385608A1 * 2018-10-24 2021-12-09 Otto Engineering, Inc. Directional awareness audio communications system
CN114339297A * 2022-03-09 2022-04-12 央广新媒体文化传媒（北京）有限公司 Audio processing method and apparatus, electronic device, and computer-readable storage medium

Similar Documents

Publication Publication Date Title
US11785134B2 (en) User interface that controls where sound will localize
US11991315B2 (en) Audio conferencing using a distributed array of smartphones
US11706577B2 (en) Systems and methods for equalizing audio for playback on an electronic device
US20180332395A1 (en) Audio Mixing Based Upon Playing Device Location
CN106454644B (zh) Audio playing method and device
KR102538775B1 (ko) Audio playback method and audio playback apparatus, electronic device and storage medium
CN110049428B (zh) Method, playback device and system for realizing multi-channel surround sound playback
CN109121047B (zh) Stereo realization method for a dual-screen terminal, terminal, and computer-readable storage medium
TWI709131B (zh) Audio scene processing technology
WO2016123901A1 (zh) Terminal and method for directionally playing an audio signal
CN112770248B (zh) Loudspeaker control method, device and storage medium
WO2023212883A1 (zh) Audio output method and apparatus, communication apparatus and storage medium
WO2018058331A1 (zh) Method and device for controlling volume
WO2022059362A1 (ja) Information processing device, information processing method and information processing system
US10993064B2 (en) Apparatus and associated methods for presentation of audio content
US20210306448A1 (en) Controlling audio output
WO2024027315A1 (zh) Audio processing method and apparatus, electronic device, storage medium and program product
WO2023240467A1 (zh) Audio playback method and apparatus, and storage medium
JP2024041721A (ja) Video teleconference
CN116088786A (zh) Audio playback method and apparatus, electronic device and storage medium
CN117319889A (zh) Audio signal processing method and apparatus, electronic device and storage medium
CN118059485A (zh) Audio processing method and apparatus, electronic device and storage medium

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 22940579

Country of ref document: EP

Kind code of ref document: A1