WO2022067652A1 - Real-time communication method, apparatus and system
- Publication number: WO2022067652A1 (application PCT/CN2020/119357)
- Authority: WIPO (PCT)
- Prior art keywords: communication device, audio, communication, location, playback
Classifications
- G—PHYSICS
- G01—MEASURING; TESTING
- G01S—RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
- G01S5/00—Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations
- G01S5/02—Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations using radio waves
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L12/00—Data switching networks
- H04L12/02—Details
- H04L12/16—Arrangements for providing special services to substations
- H04L12/18—Arrangements for providing special services to substations for broadcast or conference, e.g. multicast
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W4/00—Services specially adapted for wireless communication networks; Facilities therefor
- H04W4/02—Services making use of location information
Definitions
- the present application relates to the field of communication technologies, and in particular, to a real-time communication method, apparatus and system.
- smart headphones have functions such as stereo sound effects, noise reduction, and biometric monitoring, which greatly improve the user's audio experience and have become products that users often use. Because of their portability, users often use smart headsets to talk with the other party in a call.
- however, when the user uses a chat tool to conduct a real-time voice call or video call (referred to as an audio-video call), the user can only obtain the voice content of the other party, which reduces the user's experience of the audio and video call.
- Embodiments of the present application provide a real-time communication method, device, and system, so that a user can listen to the other party's voice with position-directed sound effects, thereby improving the audio-video call experience.
- an embodiment of the present application provides a real-time communication method.
- the method includes: during an audio-video call between a first communication device and a second communication device, acquiring the location of the first communication device, and receiving the first audio from the first communication device.
- a second audio is generated according to the location of the first communication device, the first audio, and the parameters of the second communication device; the second audio is audio having position directivity. Therefore, when the second communication device plays the second audio, the user can listen to the other party's voice with position-directed sound effects, which improves the experience of audio and video calls.
- receiving the first audio from the first communication device should be understood as either receiving it directly, for example, receiving the first audio sent by the first communication device itself, or receiving it indirectly, for example, receiving the first audio of the first communication device forwarded by another device.
- acquiring the location of the first communication device specifically includes: receiving a first message from the first communication device, where the first message includes the location of the first communication device.
- the location of the first communication device is sent by the first communication device, so that the second communication device can directly obtain the location of the first communication device, which is more concise and efficient.
- acquiring the location of the first communication device is specifically: configuring a first virtual location for the first communication device.
- the second communication device directly assigns the first virtual location to the first communication device, without the need for the first communication device to send its location, so that the second communication device obtains the location of the first communication device more simply and efficiently.
- acquiring the location of the first communication device specifically includes: detecting a location keyword in the audio data of the first audio, where the location keyword is used to represent the location of the first communication device.
- the second communication device detects the position keyword contained in the audio data of the first audio of the first communication device, and analyzes the position keyword to determine the position of the first communication device, without relying on the first communication device to send its own location.
- a third audio is generated according to the position of the first communication device and the first audio, and the third audio includes relative position information of the first communication device and the second communication device.
- the second audio is generated based on the third audio and the parameters of the second communication device.
- the third audio may be, for example, a spatial audio object in a standard of "object-based audio immersive sound metadata and codestream".
- the spatial audio object contains a location field and a content field.
- the position field is the relative position information of the first communication device and the second communication device;
- the content field is the voice content information of the first audio of the first communication device.
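- to make the structure described above concrete, the following is a minimal sketch of such a spatial audio object; the field names `position` and `content` are illustrative assumptions, not the actual field names defined by the standard.

```python
from dataclasses import dataclass

# Hypothetical sketch of the spatial audio object described above. The real
# metadata layout is defined by the "object-based audio immersive sound
# metadata and codestream" standard; these names are placeholders.
@dataclass
class SpatialAudioObject:
    # position field: relative position of the first communication device
    # with respect to the second, e.g. (x, y, z) offsets in meters.
    position: tuple[float, float, float]
    # content field: voice content of the first audio, e.g. one PCM frame.
    content: bytes

def build_third_audio(relative_pos, first_audio_frame):
    """Wrap one frame of the first audio with relative position metadata."""
    return SpatialAudioObject(position=relative_pos, content=first_audio_frame)
```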
- the second communication device includes at least one playback device; the playback device includes a headset, a virtual reality (VR) device, or an augmented reality (AR) device.
- the second communication device is connected to a playback device.
- generating the second audio according to the position of the first communication device, the first audio, and the parameters of the second communication device is specifically: when the second communication device is connected to the playback device, generating the second audio according to the position of the first communication device, the first audio, and the parameters of the second communication device.
- the head-related transfer function on the playback device corresponding to the relative position information is acquired according to the relative position information included in the third audio.
- the first audio is processed with the head-related transfer function to obtain the second audio.
- the playback device is an earphone, and the earphone has a left-ear unit and a right-ear unit.
- the second audio is generated specifically as follows: according to the relative position information contained in the third audio, the head-related transfer function of the left ear and the head-related transfer function of the right ear corresponding to the relative position information are obtained.
- the first audio is processed with the head-related transfer function of the left ear and the head-related transfer function of the right ear, respectively, to obtain the left-ear audio and right-ear audio of the earphone.
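- as a sketch of this left/right processing step: convolving the mono first audio with left-ear and right-ear head-related impulse responses (the time-domain form of the HRTF) yields the two earphone channels. How the impulse responses are measured and stored is device-specific; here they are simply passed in as arrays.

```python
import numpy as np

def render_binaural(first_audio: np.ndarray,
                    hrir_left: np.ndarray,
                    hrir_right: np.ndarray) -> np.ndarray:
    """Render mono audio into left/right earphone channels.

    first_audio: mono PCM samples, shape (n,).
    hrir_left, hrir_right: head-related impulse responses for the direction
    given by the relative position information in the third audio.
    Returns the "second audio" as a stereo array of shape (n + m - 1, 2).
    """
    left = np.convolve(first_audio, hrir_left)    # left-ear audio
    right = np.convolve(first_audio, hrir_right)  # right-ear audio
    return np.stack([left, right], axis=1)
```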
- the second audio is used for playing by the second communication device.
- the second communication device can use the player configured by itself to play the second audio.
- the method further includes: sending the second audio to a third communication device, where the second audio is used to instruct the third communication device to play the second audio.
- the third communication device may be an external playback device of the second communication device, and the third communication device may know that the second audio should be played after receiving the second audio.
- the audio and video call includes one or more of a video call, a voice call, a voice conference, and a video conference.
- an embodiment of the present application provides a real-time communication method.
- the method includes: during an audio-video call between a first communication device and a second communication device, the first communication device sends first audio.
- the second communication device obtains the location of the first communication device.
- the second communication device receives the first audio from the first communication device.
- the second communication device generates a second audio according to the position of the first communication device, the first audio from the first communication device, and the parameters of the second communication device, and the second audio is audio with position directivity.
- the second communication device plays the second audio.
- the second communication device includes at least one playback device, and the second communication device plays the second audio, specifically: the second communication device sends the second audio to the at least one playback device.
- the playback device receives the second audio and plays the second audio.
- an embodiment of the present application provides a real-time communication method.
- the method includes: during an audio-video call between a first communication device and a second communication device, the first communication device sends a first audio of the first communication device.
- the server receives the first audio from the first communication device.
- the server obtains the location of the first communication device and the location of the second communication device.
- the server generates a second audio according to the position of the first communication device, the position of the second communication device, the first audio from the first communication device, and the parameters of the second communication device, and the second audio is audio with position directivity.
- the server sends the second audio to the second communication device.
- the second communication device plays the second audio.
- the second communication device includes at least one playback device, and the second communication device plays the second audio, specifically: the second communication device sends the second audio to the at least one playback device.
- the playback device receives the second audio and plays the second audio.
- an embodiment of the present application provides a communication device, including: an obtaining and receiving unit, configured to obtain the location of the first communication device during an audio and video call between the first communication device and the second communication device, and receive the first audio from the first communication device.
- the generating unit is configured to generate a second audio according to the position of the first communication device, the first audio, and the parameters of the second communication device, where the second audio is audio with position directivity.
- the acquiring and receiving unit is further configured to: receive a first message from the first communication device, where the first message includes the location of the first communication device; or, configure the first virtual location for the first communication device; or, detect a location keyword in the audio data of the first audio, where the location keyword is used to characterize the location of the first communication device.
- the generating unit is further configured to: generate a third audio according to the position of the first communication device and the first audio, where the third audio includes relative position information of the first communication device and the second communication device; and generate the second audio based on the third audio and the parameters of the second communication device.
- the second communication device includes at least one playback device.
- playback devices include headsets, virtual reality (VR) devices, or augmented reality (AR) devices.
- the second communication device is connected to a playback device.
- the generating unit is further configured to generate the second audio according to the position of the first communication device, the first audio, and the parameters of the second communication device when the second communication device is connected to the playback device.
- the generating unit is further configured to: obtain the head-related transfer function on the playback device corresponding to the relative position information according to the relative position information contained in the third audio; and process the first audio with the head-related transfer function to obtain the second audio.
- the playback device is an earphone, and the earphone has a left-ear unit and a right-ear unit.
- the generating unit is also used to: obtain the head-related transfer function of the left ear and the head-related transfer function of the right ear corresponding to the relative position information according to the relative position information contained in the third audio; and process the first audio with the head-related transfer function of the left ear and the head-related transfer function of the right ear, respectively, to obtain the left-ear audio and the right-ear audio of the earphone.
- the second audio is used for playing by the second communication device.
- the communication apparatus further includes: a sending unit configured to send the second audio to the third communication device, where the second audio is used to instruct the third communication device to play the second audio.
- the audio and video call includes one or more of a video call, a voice call, a voice conference, and a video conference.
- an embodiment of the present application provides a communication system, including: during a process of an audio-video call between a first communication device and a second communication device, the first communication device is configured to send the first audio.
- the second communication device is for receiving the first audio from the first communication device.
- the second communication device is used to obtain the position of the first communication device, and generate a second audio according to the position of the first communication device, the first audio from the first communication device, and the parameters of the second communication device, where the second audio is audio with position directivity.
- the second communication device is used to play the second audio.
- the second communication device includes at least one playback device, including: the second communication device is used to send the second audio to the at least one playback device; the playback device is used to receive the second audio and play the second audio.
- an embodiment of the present application provides a communication system, including: during a process of an audio-video call between a first communication device and a second communication device, the first communication device is used to send the first audio of the first communication device;
- the server is for receiving the first audio from the first communication device.
- the server is used to obtain the position of the first communication device and the position of the second communication device; the server is used to generate the second audio according to the position of the first communication device, the position of the second communication device, the first audio from the first communication device, and the parameters of the second communication device, where the second audio is audio with positional directivity; the server is used to send the second audio to the second communication device; and the second communication device is used to play the second audio.
- the second communication device includes at least one playback device, including: the second communication device is used to send the second audio to the at least one playback device; the playback device is used to receive the second audio and play the second audio.
- an embodiment of the present application provides an electronic device, the electronic device includes: a processor and a memory, where the memory is coupled to the processor and is used for storing computer program code, and the computer program code includes computer instructions; when the processor reads the computer instructions from the memory, the electronic device executes the real-time communication method described in the first aspect above or in any possible design of the above aspects.
- an embodiment of the present application provides a computer program product, where the computer program product includes computer instructions; when the computer instructions are run on a computer, the computer executes the real-time communication method described in the first aspect above or in any possible design of the above aspects.
- an embodiment of the present application provides a computer-readable storage medium that includes computer instructions; when the computer instructions are executed on a computer, the computer executes the real-time communication method described in the first aspect above or in any possible design of the above aspects.
- an embodiment of the present application provides a chip system that includes one or more processors; when the one or more processors execute instructions, the one or more processors execute the real-time communication method described above.
- in summary, the first audio from the first communication device is received, and the second audio with position directivity is generated according to the position of the first communication device, the first audio, and the parameters of the second communication device. Therefore, when the second communication device plays the second audio, the user can listen to the other party's voice with position-directed sound effects, which improves the experience of audio and video calls.
- FIG. 1a is a first schematic diagram of a practical application scenario of a smart speaker provided by an embodiment of the application;
- FIG. 1b is a second schematic diagram of a practical application scenario of a smart speaker provided by an embodiment of the application;
- FIG. 2a is a schematic diagram of a practical application scenario of an earphone provided by an embodiment of the application;
- FIG. 3a is a first schematic diagram of a practical application scenario of a vehicle-mounted device provided by an embodiment of the application;
- FIG. 3b is a second schematic diagram of a practical application scenario of a vehicle-mounted device provided by an embodiment of the application;
- FIG. 1 is a schematic diagram of the architecture of a communication system provided by an embodiment of the present application;
- FIG. 2 is a schematic diagram of the composition of an electronic device provided by an embodiment of the present application;
- FIG. 3 is a flowchart of a real-time communication method provided by an embodiment of the present application;
- FIG. 4 is a schematic diagram of an application scenario of a communication method provided by an embodiment of the present application;
- FIG. 5 is a schematic diagram of an application scenario of another communication method provided by an embodiment of the present application;
- FIG. 6 is a flowchart of another communication method provided by an embodiment of the present application;
- FIG. 7 is a flowchart of still another communication method provided by an embodiment of the present application;
- FIG. 8 is a schematic diagram of the composition of a communication apparatus according to an embodiment of the present application.
- smart headphones have functions such as stereo sound effects, noise reduction, and biometric monitoring, which greatly improve the user's audio experience and have become products that users often use.
- the stereo sound effect function utilizes the principle by which humans perceive the spatial orientation of sound.
- the principle is as follows: sound from a spatial sound source is transmitted through the air to a person's left and right ears. Because the sound waves travel different distances to the two ears, the waves arriving at the left and right ears differ, including in sound pressure and phase.
- from these differing sound waves of the same source, the left and right ears form a perception of the sound source's spatial orientation and distance. Exemplarily, when a user listens to music with a smart earphone, the user can hear the music with stereo sound effects.
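- as a rough illustration of the path-length difference the ears exploit, a common spherical-head approximation (Woodworth's formula) estimates the interaural time difference for a source at azimuth θ as ITD ≈ (a/c)(θ + sin θ), with head radius a and speed of sound c; the numbers below are illustrative defaults, not values from this application.

```python
import math

def interaural_time_difference(azimuth_rad: float,
                               head_radius_m: float = 0.0875,
                               speed_of_sound: float = 343.0) -> float:
    """Woodworth spherical-head approximation of the ITD, in seconds."""
    return (head_radius_m / speed_of_sound) * (azimuth_rad + math.sin(azimuth_rad))

# A source 90 degrees to one side arrives ~0.66 ms earlier at the near ear.
print(interaural_time_difference(math.pi / 2))  # ≈ 0.00066
```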
- the noise reduction function utilizes the principle of noise cancellation. Specifically, a microphone arranged inside the smart earphone detects low-frequency noise (100-1000 Hz) in the environment and transmits the detected noise to the control circuit in the earphone. The control circuit computes, in real time, a sound wave with the opposite phase and the same amplitude as the low-frequency noise, and controls the player to play it.
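- the cancellation step can be illustrated numerically: for a sampled noise frame, the anti-noise is the sample-wise negation (opposite phase, same amplitude), and summing the two ideally yields silence. A real control circuit must also compensate for the acoustic path delay, which this idealized sketch omits.

```python
import numpy as np

fs = 8000                                    # sample rate, Hz
t = np.arange(fs) / fs
noise = 0.5 * np.sin(2 * np.pi * 300 * t)    # 300 Hz low-frequency noise

anti_noise = -noise                          # opposite phase, same amplitude
residual = noise + anti_noise                # what the ear would hear

print(np.max(np.abs(residual)))              # 0.0 — ideal cancellation
```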
- existing smart earphones are thus powerful, and because of their portability, users often use smart headsets to talk with the other party in a call. However, when the user makes a real-time audio and video call using a chat tool, the user can only obtain the voice content of the other party in the call, and cannot effectively utilize the stereo sound effect function of the smart earphone.
- a real-time communication method is proposed.
- the method generates the third audio according to the acquired position of the first communication device and the first audio from the first communication device during the audio-video call between the first communication device and the second communication device.
- the third audio includes relative position information of the first communication device and the second communication device.
- based on the third audio and the parameters of the second communication device, a second audio with position directivity is generated. Therefore, when the second communication device plays the second audio, the user can listen to the other party's voice with position-directed sound effects, which improves the experience of audio and video calls.
- the second communication device described in the above embodiment may be a smart speaker.
- the first communication device may be an electronic device capable of making audio and video calls with the second communication device, for example, a mobile phone.
- FIG. 1a is a schematic diagram 1 of an actual application scenario of a smart speaker provided by an embodiment of the present application.
- user A is sitting on a sofa at home (i.e., location A), and user B is at location B by bus.
- position A is located in the due south direction of position B.
- user A uses the smart speaker 11 (i.e., the second communication device) to make an audio call with the mobile phone 12 (i.e., the first communication device) used by user B.
- the smart speaker 11 generates a second audio with positional directivity from the acquired location of the mobile phone 12 and the first audio.
- FIG. 1b is a second schematic diagram of a practical application scenario of a smart speaker provided by an embodiment of the present application.
- user A is sitting on a sofa at home facing north.
- user A can perceive that user B is speaking directly in front of user A, as if user A and user B were talking face-to-face at close range, improving the experience of audio and video calls.
- the second communication device described in the above embodiment may also be a smart screen, and user A can use the smart screen to conduct an audio call with the mobile phone used by user B.
- the smart screen can perform similar operations and the same effects as the above-mentioned smart speakers, which will not be repeated here.
- the second communication device described in the foregoing embodiment may be an earphone, and the number of earphones may be one or more.
- the first communication device may be an electronic device (eg, a mobile phone, a headset) capable of performing an audio and video call with the second communication device, and the number of the first communication device may be one or more.
- FIG. 2a is a schematic diagram of an actual application scenario of an earphone provided by an embodiment of the present application.
- user A is located at location A
- user B1 is located at location B1
- user B2 is located at location B2.
- the position B1 is located in the southwest direction of the position A
- the position B2 is located in the southeast direction of the position A.
- the headset adopted by user A conducts a conference call with user B1 and user B2.
- the earphone of user A acquires the location of the earphone of user B1 and the location of the earphone of user B2.
- the earphone of user A generates position-directed audio according to the position of user B1's earphone, the position of user B2's earphone, the audio from user B1's earphone, and the audio from user B2's earphone, so that the audio of different people has different position-directed sound effects.
- user A can perceive that user B1 speaks in front of user A's right and user B2 speaks in front of user A's left, providing user A with immersive sound. Therefore, when multiple people speak at the same time, the listener can distinguish different speakers according to the different positional sound effects, improving the audio and video call experience.
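- this multi-party effect amounts to rendering each remote party's audio with the head-related impulse responses for that party's direction and mixing the results; the sketch below reuses the same convolution idea as the binaural example earlier and is illustrative only.

```python
import numpy as np

def mix_conference(parties, n_out: int) -> np.ndarray:
    """Render and sum several remote parties, each from its own direction.

    parties: list of (mono_audio, hrir_left, hrir_right) tuples, one per
    remote speaker (e.g. user B1 front-right, user B2 front-left).
    n_out: output length in samples.
    """
    out = np.zeros((n_out, 2))
    for audio, hrir_l, hrir_r in parties:
        left = np.convolve(audio, hrir_l)
        right = np.convolve(audio, hrir_r)
        n = min(n_out, len(left))
        out[:n, 0] += left[:n]   # accumulate each speaker's left channel
        out[:n, 1] += right[:n]  # accumulate each speaker's right channel
    return out
```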
- the second communication device described in the above embodiments may be a vehicle-mounted device.
- the first communication device may be an electronic device capable of making audio and video calls with the second communication device, for example, a mobile phone.
- FIG. 3 a is a schematic diagram 1 of an actual application scenario of an in-vehicle device according to an embodiment of the present application.
- user A is driving at location A (ie, the location of the in-vehicle device)
- user B is at location B (ie, the location of the mobile phone) by bus, where location B is located due east of location A.
- User A uses the in-vehicle device 31 to make an audio and video call with the mobile phone 32 used by user B.
- prompt information such as "User B is in a call" may be displayed on the display screen of the in-vehicle device 31 .
- FIG. 3b is a second schematic diagram of an actual application scenario of an in-vehicle device provided by an embodiment of the present application. As shown in Figure 3b, user A can perceive that user B is speaking to the east (i.e., right) of user A, as if user B were sitting in the passenger seat talking with user A, improving the experience of audio and video calls.
- the first communication device and the second communication device may also be other devices, such as a television set or a camera, which are not listed one by one here. For details, refer to the following related content.
- the real-time communication method provided in this embodiment of the present application may be applied to the communication system shown in FIG. 1 .
- the communication system 100 may include a first communication device 110 and a second communication device 120 .
- the communication system 100 may also include a server 130 .
- the devices involved in the architecture shown in FIG. 1 are introduced below.
- the first communication device 110 may be a device for implementing a wireless communication function, such as a communication device or a chip that can be used in the communication device.
- the first communication device 110 may be a communication device having functional units such as a microphone, a display screen, a camera, and a player.
- the first communication device 110 may include user equipment (UE), a smart screen, an access terminal, a terminal unit, a terminal station, a mobile station, a remote station, a remote terminal, a mobile device, a wireless communication device, a terminal agent, or a terminal device, etc.
- the access terminal may be a cellular telephone, a cordless telephone, a session initiation protocol (SIP) telephone, a wireless local loop (WLL) station, a personal digital assistant (PDA), a handheld device with wireless communication functions, a computing device or other processing device connected to a wireless modem, an in-vehicle device, or a wearable device. The first communication device 110 may be mobile or stationary.
- the first communication device 110 can also be connected to a playback device.
- the audio may be played by the first communication device 110 using a player configured by itself, or an external playback device may be used to play the audio.
- the playback device may be a head-mounted playback device, and the head-mounted playback device may be a wired headset, a wireless headset (e.g., a TWS Bluetooth headset, a neck-mounted Bluetooth headset, or a head-mounted Bluetooth headset), a virtual reality (VR) device, an augmented reality (AR) device, etc.; this application does not specifically limit the specific form of the headset.
- the second communication device 120 and the first communication device 110 may be the same, and reference may be made to the above-mentioned description of the first communication device 110, and details are not repeated here.
- the first communication device and the second communication device are respectively connected to the head mounted playback device 140 .
- the first communication device 110 is connected to the head-mounted playback device 140
- the second communication device 120 is not connected to the head-mounted playback device 140 .
- the first communication device 110 is not connected to the head-mounted playback device 140
- the second communication device 120 is connected to the head-mounted playback device 140 .
- the embodiment of the present application is described by taking as an example that the first communication device 110 is not connected to the head-mounted playback device 140 , and the second communication device 120 is connected to the head-mounted playback device 140 .
- the server 130 may also be referred to as a service device, a cloud server, a cloud computing server, or a cloud host.
- the server in the embodiment of the present application may be used to provide audio and video call services, such as cellular calls or Internet calls.
- the first communication device 110 and the second communication device 120 conduct an audio and video call through the server 130 .
- the audio and video call may include one or more of a video call, a voice call, a voice conference and a video conference.
- the first communication device 110 and the second communication device 120 may use a session initiation protocol (session initiation protocol, SIP) and a real-time transport protocol (real-time transport protocol, RTP) for audio and video calls.
- the real-time communication method provided by the embodiment of the present application may be executed by the second communication device 120 or the server 130, and the details are as follows:
- the second communication device 120 is the execution body of the real-time communication method provided by the embodiment of the application.
- the second communication device 120 receives the first audio from the first communication device 110 .
- the second communication device 120 acquires the location of the first communication device 110 .
- the second communication device 120 generates a third audio according to the position of the first communication device 110 and the first audio of the first communication device 110 , and the third audio includes relative position information of the first communication device 110 and the second communication device 120 .
- the second communication device 120 generates a second audio with position directivity according to the third audio and the parameters of the second communication device 120 , and the second communication device 120 plays the second audio.
- when the second communication device receives the first audio from the first communication device, this should be understood as either receiving it directly, for example, the second communication device directly receives the first audio sent by the first communication device; or receiving it indirectly, for example, the first communication device sends the first audio to another device, and the second communication device receives the first audio of the first communication device forwarded by that device.
- the second communication device may acquire the location of the first communication device in any of the following ways: receiving a first message from the first communication device, where the first message includes the location of the first communication device; or, configuring the first virtual position for the first communication device; or, detecting a position keyword in the audio data of the first audio, where the position keyword is used to characterize the position of the first communication device.
- This embodiment of the present application does not specifically limit the manner in which the second communication device acquires the location of the first communication device.
- the third audio may be, for example, a spatial audio object in a standard of "object-based audio immersive sound metadata and codestream".
- the spatial audio object contains a location field and a content field.
- the position field is the relative position information of the first communication device and the second communication device;
- the content field is the voice content information of the first audio of the first communication device.
- the second communication device is connected to a playback device.
- the second communication device sends the second audio to the playback device, and the playback device receives the second audio and plays the second audio.
- the second communication device is connected to a playback device.
- the second communication device sends the third audio to the playback device, the playback device receives the third audio, and according to the third audio and the parameters of the playback device, generates a second audio with positional directivity, and plays the second audio.
- the playback device may be a head-mounted playback device, and the head-mounted playback device may be a wired headset, a wireless headset (for example, a TWS Bluetooth headset, a neck-mounted Bluetooth headset, or a head-mounted Bluetooth headset), a virtual reality (VR) device, an augmented reality (AR) device, etc.
- if the head-mounted playback device is an earphone, the head-mounted playback device generates the second audio according to the third audio and the parameters of the head-mounted playback device: it obtains the head-related transfer function of the left ear and the head-related transfer function of the right ear corresponding to the relative position information contained in the third audio, and processes the first audio with the head-related transfer function of the left ear and the head-related transfer function of the right ear, respectively, to obtain the left-ear audio and right-ear audio of the earphone.
- the left earphone and the right earphone are worn on the left ear and the right ear of the user, respectively.
- the headsets can communicate with each other through a wired connection or a wireless connection (path 11 as shown in Figure 1).
- the headset may also communicate with the second communication device via a wired or wireless connection (path 12 as shown in Figure 1).
- the wireless connection may be, for example, a connection manner such as Bluetooth, WiFi, NFC, and ZigBee.
- the path 12 may adopt, for example, BT, WLAN (such as Wi-Fi), Zigbee, FM, NFC, IR, or general 2.4G/5G wireless communication technology.
- the connection mode adopted by the path 12 and the connection mode adopted by the path 11 may be the same or different, which is not specifically limited in this embodiment of the present application.
- a playback device connected to the second communication device performs similar operations as the second communication device.
- the server may also be the execution body of the real-time communication method provided by the embodiment of the present application, in which case the server performs operations similar to those of the second communication device.
- the server 130 acquires the location of the first communication device 110 and the location of the second communication device 120 .
- the server 130 calculates the relative position of the first communication device 110 and the second communication device 120 according to the position of the first communication device 110 and the position of the second communication device 120 .
- the server 130 generates a third audio according to the relative positions of the first communication device 110 and the second communication device 120 and the first audio of the first communication device 110; the third audio includes the relative location information.
- the server 130 generates the second audio according to the third audio and the parameters of the head-mounted playback device 140 of the second communication device 120 .
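- a minimal sketch of the server-side relative-position step: given the two device positions as 2-D coordinates, the server computes the offset of the first device relative to the second, from which an azimuth for HRTF lookup follows. The east-x / north-y coordinate convention is an assumption for illustration.

```python
import math

def relative_position(pos_first, pos_second):
    """Offset of the first device relative to the second, plus an azimuth.

    pos_first, pos_second: (x, y) coordinates, assumed east-x / north-y.
    Returns ((dx, dy), azimuth_deg), where 0 deg is north and 90 deg is east.
    """
    dx = pos_first[0] - pos_second[0]
    dy = pos_first[1] - pos_second[1]
    azimuth = math.degrees(math.atan2(dx, dy)) % 360.0
    return (dx, dy), azimuth

# User B due east of user A: the voice should be rendered from the right.
print(relative_position((10.0, 0.0), (0.0, 0.0)))  # ((10.0, 0.0), 90.0)
```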
- FIG. 2 is a schematic structural diagram of an electronic device 200 according to an embodiment of the present application.
- the electronic device 200 may include a processor 201, a memory 202, a universal serial bus (USB) interface 203, an antenna 1, and an antenna 2 , a mobile communication module 204, a wireless communication module 205, an audio module 206, a microphone 206A, and a headphone jack 206B.
- the structures illustrated in the embodiments of the present invention do not constitute a specific limitation on the electronic device 200 .
- the electronic device 200 may include more or less components than shown, or combine some components, or separate some components, or arrange different components.
- the illustrated components may be implemented in hardware, software, or a combination of software and hardware.
- the processor 201 may include one or more processing units, for example, the processor 201 may include an application processor (application processor, AP), a modem processor, a graphics processor (graphics processing unit, GPU), an image signal processor (image signal processor, ISP), controller, video codec, digital signal processor (digital signal processor, DSP), baseband processor, and/or neural-network processing unit (neural-network processing unit, NPU), etc. Wherein, different processing units may be independent devices, or may be integrated in one or more processors.
- a memory may also be provided in the processor 201 for storing instructions and data.
- the memory in processor 201 is cache memory.
- the memory may hold instructions or data that have just been used or recycled by the processor 201 . If the processor 201 needs to use the instruction or data again, it can be called directly from the memory. Repeated access is avoided, and the waiting time of the processor 201 is reduced, thereby improving the efficiency of the system.
- the USB interface 203 is an interface that conforms to the USB standard specification, and may specifically be a Mini USB interface, a Micro USB interface, a USB Type C interface, and the like.
- the USB interface 203 can be used to connect an earphone and play audio through the earphone.
- the interface can also be used to connect other communication devices, such as AR devices.
- the interface connection relationship between the modules illustrated in the embodiment of the present invention is only a schematic illustration, and does not constitute a structural limitation of the electronic device 200 .
- the electronic device 200 may also adopt different interface connection manners in the foregoing embodiments, or a combination of multiple interface connection manners.
- the wireless communication function of the electronic device 200 can be implemented by the antenna 1, the antenna 2, the mobile communication module 204, the wireless communication module 205, the modulation and demodulation processor, the baseband processor, and the like.
- Antenna 1 and Antenna 2 are used to transmit and receive electromagnetic wave signals.
- Each antenna in electronic device 200 may be used to cover a single or multiple communication frequency bands. Different antennas can also be reused to improve antenna utilization.
- the antenna 1 can be multiplexed as a diversity antenna of the wireless local area network. In other embodiments, the antenna may be used in conjunction with a tuning switch.
- the mobile communication module 204 can provide wireless communication solutions including 2G/3G/4G/5G etc. applied on the electronic device 200 .
- the mobile communication module 204 may include at least one filter, switch, power amplifier, low noise amplifier (LNA), and the like.
- the mobile communication module 204 can receive electromagnetic waves from the antenna 1, filter and amplify the received electromagnetic waves, and transmit them to the modulation and demodulation processor for demodulation.
- the mobile communication module 204 can also amplify the signal modulated by the modulation and demodulation processor, and then convert it into electromagnetic waves and radiate it out through the antenna 1 .
- at least part of the functional modules of the mobile communication module 204 may be provided in the processor 201 .
- at least part of the functional modules of the mobile communication module 204 may be provided in the same device as at least part of the modules of the processor 201 .
- the wireless communication module 205 can provide wireless communication solutions applied on the electronic device 200, including wireless local area network (WLAN) (such as wireless fidelity (Wi-Fi) networks), Bluetooth (BT), global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), and infrared (IR) technology.
- the wireless communication module 205 may be one or more devices integrating at least one communication processing module.
- the wireless communication module 205 receives electromagnetic waves via the antenna 2 , modulates and filters the electromagnetic wave signals, and sends the processed signals to the processor 201 .
- the wireless communication module 205 can also receive the signal to be sent from the processor 201 , perform frequency modulation on it, amplify the signal, and then convert it into an electromagnetic wave for radiation through the antenna 2 .
- the antenna 1 of the electronic device 200 is coupled with the mobile communication module 204, and the antenna 2 is coupled with the wireless communication module 205, so that the electronic device 200 can communicate with the network and other devices through wireless communication technology.
- the wireless communication technology may include global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), wideband code division multiple access (WCDMA), time-division synchronous code division multiple access (TD-SCDMA), long term evolution (LTE), BT, GNSS, WLAN, NFC, FM, and/or IR technology, etc.
- the GNSS may include a global positioning system (GPS), a global navigation satellite system (GLONASS), a BeiDou navigation satellite system (BDS), a quasi-zenith satellite system (QZSS), and/or a satellite-based augmentation system (SBAS).
- Memory 202 may be used to store computer-executable program code, which includes instructions.
- the internal memory 202 may include a stored program area and a stored data area.
- the storage program area can store an operating system, an application program required for at least one function (such as a sound playback function, an image playback function, etc.), and the like.
- the storage data area can store data (such as audio data, phone book, etc.) created during the use of the electronic device 200 and the like.
- the internal memory 202 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, universal flash storage (UFS), and the like.
- the processor 201 executes various functional applications and data processing of the electronic device 200 by executing the instructions stored in the internal memory 202 and/or the instructions stored in the memory provided in the processor.
- the electronic device 200 can implement audio functions through the audio module 206, the microphone 206A, the headphone interface 206B, and the application processor. Such as music playback, recording, etc.
- the audio module 206 is used for converting digital audio information into analog audio signal output, and also for converting analog audio input into digital audio signal. Audio module 206 may also be used to encode and decode audio signals. In some embodiments, the audio module 206 may be provided in the processor 201 , or some functional modules of the audio module 206 may be provided in the processor 201 .
- Microphone 206A, also referred to as a "mic", is used to convert sound signals into electrical signals.
- the user can speak with the mouth close to the microphone 206A to input the sound signal into the microphone 206A.
- the electronic device 200 may be provided with at least one microphone 206A.
- the electronic device 200 may be provided with two microphones 206A, which may implement a noise reduction function in addition to collecting sound signals.
- the electronic device 200 may further be provided with three, four or more microphones 206A to collect sound signals, reduce noise, identify sound sources, and implement directional recording functions.
- the earphone jack 206B is used to connect wired earphones.
- the earphone interface 206B can be the USB interface 203, or can be a 3.5 mm open mobile terminal platform (OMTP) standard interface or a Cellular Telecommunications Industry Association of the USA (CTIA) standard interface.
- the electronic device 200 may also include a sensor module 207, a camera 208, a display screen 209, and the like.
- the sensor module 207 may include a gyro sensor 207A and the like. The embodiments of the present application are not described in detail.
- FIG. 1 and FIG. 2 are only exemplary architecture diagrams.
- the system may further include other functional units, which are not limited in this embodiment of the present application.
- the names of the devices in Figures 1 and 2 are not limited; each device may also be given another name, for example, replaced by the name of a network element with the same or similar functions, without restriction.
- the above-mentioned server may include a processor and a memory. Further, the server may also include a communication line and a communication interface. Wherein, the processor, the memory and the communication interface can be connected through a communication line.
- the processor can be a central processing unit (CPU), a general-purpose processor, a network processor (NP), a digital signal processor (DSP), a microprocessor, a microcontroller, a programmable logic device (PLD), or any combination thereof.
- the processor may also be other devices with processing functions, such as circuits, devices or software modules, which are not limited.
- a communication line used to transfer information between the components included in the server.
- a communication interface for communicating with other devices or other communication networks may be Ethernet, radio access network (RAN), wireless local area networks (WLAN) and the like.
- the communication interface may be a module, circuit, transceiver, or any device capable of enabling communication.
- the instructions may be computer programs.
- the memory may be a read-only memory (ROM) or another type of static storage device that can store static information and/or instructions, a random access memory (RAM) or another type of dynamic storage device that can store information and/or instructions, an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, and Blu-ray discs), a magnetic disk storage medium, or another magnetic storage device, without limitation.
- the memory may exist independently of the processor, or may be integrated with the processor.
- the memory can be used to store instructions or program code or some data or the like.
- the memory can be located inside the server or outside the server, without limitation.
- the processor is configured to execute the instructions stored in the memory to implement the real-time communication method provided by the following embodiments of the present application. For example, when the electronic device is a session management network element or a chip or a system-on-chip in the session management network element, the processor executes the instructions stored in the memory to implement the steps performed by the session management network element in the following embodiments of the present application. For another example, when the electronic device is a mobility management network element or a chip or a system-on-chip in the mobility management network element, the processor may execute the instructions stored in the memory to implement the steps performed by the mobility management network element in the following embodiments of the present application.
- a processor may include one or more CPUs.
- the server includes multiple processors.
- the server further includes an output device and an input device.
- the input device is a device such as a keyboard, a mouse, a microphone or a joystick
- the output device is a device such as a display screen, a speaker, and the like.
- the server may also include other functional units, which are not limited in this embodiment of the present application.
- the communication method provided by the embodiment of the present application is described below by taking the architecture shown in FIG. 1 as an example.
- Each network element in the following embodiments may have the components shown in FIG. 2 , which will not be repeated.
- the name of the message or the name of the parameter in the message interacted with each other is just an example, and other names may also be used in the specific implementation.
- "generate" in the embodiments of the present application can also be understood as "create" or "determine", and "including" can also be understood as "carrying"; this is described uniformly here, and the embodiments of the present application place no specific restrictions on it.
- FIG. 3 is a schematic flowchart of a real-time communication method provided by an embodiment of the present application. As shown in FIG. 3 , the method may include:
- the second communication device acquires the location of the first communication device, and receives the first audio from the first communication device.
- obtaining the location of the first communication device by the second communication device may be implemented in at least one of the following ways:
- the second communication device may receive a first message from the first communication device, where the first message includes the location of the first communication device.
- for example, the second communication device obtains the location of the first communication device carried in the audio and video call request, where the audio and video call request is sent by the first communication device when it connects to the second communication device for an audio and video call.
- the location of the first communication device is carried in the audio and video call request when the audio and video call is established, so that the second communication device acquires the location of the first communication device concisely and efficiently.
- alternatively, the second communication device obtains the location of the first communication device encapsulated in a media packet sent by the first communication device.
- that is, the first communication device sends a media packet; the second communication device receives the media packet sent by the first communication device and parses out the location of the first communication device encapsulated in it.
- the location of the first communication device is encapsulated in a media packet sent by the first communication device, so that the second communication device can directly obtain the location of the first communication device, which is concise and efficient. A sketch of one possible packet layout follows.
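- as an illustration of encapsulating the location in a media packet (the application does not fix a wire format, so this byte layout is purely an assumption): the sender could prepend latitude/longitude as two big-endian floats ahead of the audio payload, and the receiver parses them off before decoding.

```python
import struct

# Hypothetical layout: 8-byte big-endian (lat, lon) prefix, then audio payload.
_LOC_FMT = ">ff"
_LOC_SIZE = struct.calcsize(_LOC_FMT)

def pack_media_packet(lat: float, lon: float, audio_payload: bytes) -> bytes:
    """First communication device: prepend its location to one audio frame."""
    return struct.pack(_LOC_FMT, lat, lon) + audio_payload

def unpack_media_packet(packet: bytes):
    """Second communication device: parse out the sender's location."""
    lat, lon = struct.unpack(_LOC_FMT, packet[:_LOC_SIZE])
    return (lat, lon), packet[_LOC_SIZE:]
```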
- the second communication device configures the first virtual location for the first communication device. It should be understood that the second communication device randomly assigns a virtual location to the first communication device.
- the virtual position can be understood as a position in front of, to the left of, to the right of, or behind a reference object, with the reference object as the origin.
- the virtual position is set by coordinates on a coordinate system
- the coordinate system may be a two-dimensional coordinate system, such as a rectangular coordinate system.
- the coordinate system may also be a three-dimensional coordinate system, such as a three-dimensional Cartesian coordinate system.
- For example, the second communication device may use itself as the reference object and its own location as the coordinate origin, and designate as the first virtual position of the first communication device any coordinate on the positive half-axis of the x-axis (that is, to the left of the second communication device); or any coordinate on the negative half-axis of the x-axis (that is, to the right of the second communication device); or any coordinate on the positive half-axis of the y-axis (that is, in front of the second communication device); or any coordinate on the negative half-axis of the y-axis (that is, behind the second communication device).
- FIG. 4 is a schematic diagram of an application scenario of a communication method provided by an embodiment of the present application.
- the second communication device designates a position in front of the second communication device as the first virtual position of the first communication device (virtual position A as shown in FIG. 4).
- Alternatively, the second communication device may use the center position (that is, the conference center) as the reference object, with the center position as the coordinate origin, and arbitrarily designate any two positions around the conference center as the virtual positions of the first communication device and the second communication device.
- FIG. 5 is a schematic diagram of an application scenario of another communication method provided by an embodiment of the present application.
- the second communication device may use virtual location A as its own location, and the second communication device designates virtual location B as the first virtual location of the first communication device.
- Alternatively, the second communication device designates virtual position C as the first virtual position of the first communication device and uses virtual position D as its own position; or designates virtual position A as the first virtual position of the first communication device and uses virtual position E as its own position.
- other combinations are also possible, which will not be listed one by one here.
- In addition, the second communication device can also set a virtual target direction (for example, in a rectangular coordinate system, the direction along the positive half-axis of the x-axis), and use any position in the target direction as the first virtual position of the first communication device.
- In this mode, the second communication device directly assigns the first virtual location to the first communication device, without requiring the first communication device to send its location, so that the second communication device acquires the location of the first communication device concisely and efficiently.
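- A minimal sketch of this virtual-location mode, assuming, purely for illustration, that participants are spread evenly on a circle around the reference origin (the radius and spacing are not mandated by the method):

```python
import math

def assign_virtual_positions(device_ids, radius_m=2.0):
    """Place each call participant at an evenly spaced virtual position on a
    circle around the coordinate origin (e.g. the conference center or the
    second communication device itself)."""
    n = len(device_ids)
    return {
        dev: (radius_m * math.cos(2 * math.pi * k / n),
              radius_m * math.sin(2 * math.pi * k / n))
        for k, dev in enumerate(device_ids)
    }

# The second communication device assigns positions for itself and two peers:
print(assign_virtual_positions(["device_B", "device_A1", "device_A2"]))
```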
- the second communication device detects a position keyword in the audio data of the first audio of the first communication device, where the position keyword is used to characterize the position of the first communication device.
- the second communication device determines the location of the first communication device according to the location keyword.
- the second communication device detects that the audio data of the first audio of the first communication device contains "XXAXX", and A is a place name.
- the second communication device determines the position of A according to A, and uses the position of A as the position of the first communication device.
- the second communication device detects that the audio data of the first audio of the first communication device contains "I'm in A” or "Are you in A?", where A is a place name.
- the second communication device determines the position of A according to A, and uses the position of A as the position of the first communication device.
- the second communication device detects that the audio data of the first audio of the first communication device contains more occurrences of "A" than "B", and both A and B are place names.
- the second communication device selects A with the most occurrences, determines the position of A according to A, and uses the position information of A as the position of the first communication device.
- the second communication device detects that the audio data of the first audio of the first communication device contains "A”, "B” and "C", A, B and C are all place names, and B is the subordinate place name of A, C is the subordinate place name of B, and A appears the most times.
- The second communication device determines the position of C according to C, and uses the position of C as the position of the first communication device. The specific rule needs to be set according to the actual situation; the embodiments of the present application do not list all cases.
- In this mode, the second communication device detects the position keywords contained in the audio data of the first audio of the first communication device and analyzes them to determine the position of the first communication device, without relying on the first communication device to send its location.
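- A minimal sketch of the most-frequent-place-name rule described above, assuming a speech-recognition transcript of the first audio is available; the tiny gazetteer and the English tokenization are illustrative assumptions (a real system would use a geocoding database and could add the subordinate-place-name refinement):

```python
import re
from collections import Counter

# Hypothetical gazetteer mapping place names to (latitude, longitude).
GAZETTEER = {"Beijing": (39.9042, 116.4074), "Shanghai": (31.2304, 121.4737)}

def location_from_transcript(text: str):
    """Return the coordinates of the most frequently mentioned known place
    name in the transcript, or None if no place name is detected."""
    words = re.findall(r"[A-Za-z]+", text)
    hits = Counter(w for w in words if w in GAZETTEER)
    if not hits:
        return None
    place, _ = hits.most_common(1)[0]
    return GAZETTEER[place]

print(location_from_transcript("I'm in Beijing. Are you in Beijing or Shanghai?"))
# -> (39.9042, 116.4074): "Beijing" occurs more often than "Shanghai"
```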
- When the second communication device receives the first audio from the first communication device, this should be understood as directly receiving the first audio from the first communication device; for example, the second communication device directly receives the first audio sent by the first communication device.
- Alternatively, the second communication device indirectly receives the first audio from the first communication device; for example, the first communication device sends the first audio to another device, and the second communication device receives, from that other device, the first audio of the first communication device.
- S301: During the audio-video call between the first communication device and the second communication device, the second communication device generates a third audio according to the location of the first communication device and the first audio from the first communication device.
- the audio and video call may include one or more of a video call, a voice call, a voice conference, and a video conference.
- the third audio contains relative position information of the first communication device and the second communication device.
- the location of the first communication device may be represented by latitude and longitude. Of course, the location of the first communication device may also be represented by geographic location coordinates. Similarly, the location of the second communication device may be represented by latitude and longitude or geographic location coordinates. Therefore, the relative position information of the first communication device and the second communication device may be relative longitude and latitude, or may be relative geographic location coordinates.
- S301 may be specifically implemented as: the second communication device determines the relative position of the first communication device and the second communication device according to the position of the first communication device and the position of the second communication device.
- the second communication device generates the third audio according to the relative positions of the first communication device and the second communication device and the audio data of the first audio of the first communication device.
- the third audio includes relative position information of the first communication device and the second communication device.
- The third audio may also be, for example, a spatial audio object in an "Object-Based Audio Immersive Sound Metadata and Codestream" standard. The spatial audio object contains a position field and a content field:
- the position field corresponds to the relative position information of the first communication device and the second communication device;
- the content field corresponds to the voice content information of the first audio of the first communication device.
- From the two devices' positions, the x-axis and y-axis coordinates of the first communication device A relative to the second communication device B are calculated, and the position of A relative to B is obtained as A′(X, Y).
- the relative positions of the first communication device and the second communication device may be expressed in degrees of longitude and latitude, and need not be converted into geographic location coordinates. Since the distance corresponding to the degrees of latitude and longitude is relatively long (for example, each degree of latitude and longitude represents approximately 111 kilometers), when the embodiment of the present application is applied to the spatial audio object, the unit of the relative position of the first communication device and the second communication device may be centimeters or decimeters.
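- A hedged sketch of one common way to compute the relative coordinates A′(X, Y) in meters from the two devices' latitude and longitude, using the roughly 111 km-per-degree figure mentioned above; this equirectangular approximation is an assumption, not necessarily the exact formula of the embodiment:

```python
import math

M_PER_DEG = 111_000  # approximate meters per degree of latitude

def relative_position(lat_a, lon_a, lat_b, lon_b):
    """Approximate position of device A relative to device B as planar (x, y)
    in meters: x grows to the east, y to the north. Valid for short distances."""
    x = (lon_a - lon_b) * M_PER_DEG * math.cos(math.radians(lat_b))
    y = (lat_a - lat_b) * M_PER_DEG
    return x, y

# Device A slightly north-east of device B:
print(relative_position(39.9100, 116.4200, 39.9042, 116.4074))
```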
- the embodiments of the present application utilize the existing structure of spatial audio objects.
- the second communication device determines the position parameter of the third audio according to the position of the first communication device and the position of the second communication device.
- The second communication device determines the content parameter of the third audio based on the audio data of the first audio from the first communication device. That is, the second communication device replaces the position information corresponding to the position field of the third audio with the relative position of the first communication device and the second communication device.
- the second communication device replaces the content information corresponding to the content field of the third audio with the voice content information of the first audio of the first communication device, so as to obtain the third audio of the first communication device.
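- A minimal sketch of the third audio as a position-plus-content object; the field layout below is an illustrative stand-in, not the actual metadata structure defined by the codestream standard:

```python
from dataclasses import dataclass

@dataclass
class SpatialAudioObject:
    """Illustrative stand-in for an object-based spatial audio element with a
    position field and a content field."""
    position: tuple  # relative (x, y) of the peer device, e.g. in centimeters
    content: bytes   # voice content of the first audio (coded frames)

def build_third_audio(relative_pos, first_audio_frames):
    # The "third audio": the first audio payload tagged with the relative position.
    return SpatialAudioObject(position=relative_pos, content=first_audio_frames)

obj = build_third_audio((150, -80), b"coded-voice-frames")  # dummy payload
```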
- S302: The second communication device generates the second audio according to the third audio and the parameters of the second communication device.
- The second audio is used for playback; for example, the second audio is played by the second communication device itself.
- In some embodiments, the real-time communication method provided by the embodiments of the present application further includes: the second communication device sends the second audio to a third communication device, where the second audio is used to instruct the third communication device to play the second audio.
- the third communication device may be an external playback device of the second communication device, and after receiving the second audio, the third communication device may know that the second audio should be played.
- The second communication device has at least one playback device, which can be understood as follows: the playback device may be a part of the second communication device, that is, the playback device and the second communication device are the same device; or, the playback device and the second communication device may be mutually independent devices.
- The playback device may be a head-mounted playback device, and the head-mounted playback device may include headphones (including wired headphones, wireless headphones, and the like), a virtual reality (VR) device, an augmented reality (AR) device, and so on.
- Step S302 can be specifically implemented as: according to the relative position information contained in the third audio, acquiring the head-related transfer function on the playback device corresponding to the relative position information; and processing the first audio with the head-related transfer function to obtain the second audio.
- If the head-mounted playback device is an earphone having a left ear and a right ear, then generating the second audio according to the third audio and the parameters of the head-mounted playback device of the second communication device can be specifically implemented as follows:
- S3021: The second communication device acquires the left-ear HRTF and the right-ear HRTF corresponding to the position parameter according to the position parameter of the third audio.
- The HRTF, that is, the head-related transfer function, characterizes how sound travels from a spatial source to the two ears, and captures cues such as the interaural time difference (ITD) and the interaural level difference (ILD). The human auditory system uses the ITD, the ILD, and prior auditory experience to localize the sound source precisely.
- The left-ear HRTF and the right-ear HRTF essentially contain spatial orientation information; that is, different spatial orientations have completely different HRTFs. The HRTF is thus a representation of the transfer function from a sound source at a given spatial position to the two ears.
- S3022: The second communication device processes the third audio with the left-ear HRTF and the right-ear HRTF respectively, to obtain the left-ear audio and the right-ear audio of the earphone.
- Specifically, the second communication device spatially locates the user's head or the posture of the headset through the headset's sensors, sets that position as the coordinate origin, and takes the sound source the user should hear as the target. Since an open-source HRTF library does not contain an HRTF for every position, the second communication device performs interpolation based on the HRTFs of known azimuths around the target to obtain the HRTF of the target azimuth. In the time domain, the second communication device convolves the third audio with the HRTF of the target azimuth to obtain the left-ear audio and right-ear audio of the earphone (equivalently, the processing can be performed in the frequency domain and inverse-transformed back to time-domain signals). The earphone then plays them, and the user can experience spatial audio.
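- A minimal numpy sketch of the time-domain path just described. The head-related impulse response (HRIR) is the time-domain form of the HRTF; the linear blend of two neighbouring measured azimuths is a simplification of the interpolation an HRTF library would actually perform:

```python
import numpy as np

def interpolate_hrir(hrir_a: np.ndarray, hrir_b: np.ndarray, w: float) -> np.ndarray:
    """Crude linear interpolation between the HRIRs of two known azimuths
    surrounding the target azimuth (0 <= w <= 1)."""
    return (1.0 - w) * hrir_a + w * hrir_b

def render_binaural(mono: np.ndarray, hrir_left: np.ndarray,
                    hrir_right: np.ndarray) -> np.ndarray:
    """Convolve the mono third-audio signal with the left/right HRIRs of the
    target azimuth to obtain the earphone's left-ear and right-ear channels."""
    return np.stack([np.convolve(mono, hrir_left), np.convolve(mono, hrir_right)])
```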
- the above application embodiments are described based on an audio and video call between a first communication device and a second communication device.
- The following describes an audio and video call between one second communication device and multiple first communication devices, and an audio and video call between multiple first communication devices and multiple second communication devices. The details are as follows:
- In the first case, the second communication device can generate the third audio of each first communication device according to the first audio of that first communication device, the location of that first communication device, and the location of the second communication device.
- the second communication device generates the second audio according to the third audio of each of the first communication devices and the parameters of the second communication device.
- the second communication device B acquires the position A1 of the first communication device A1 and the position A2 of the first communication device A2.
- The second communication device B generates the third audio of the first communication device A1 according to the position A1 of the first communication device A1, the position B of the second communication device B, and the first audio from the first communication device A1.
- the second communication device B generates a third audio of the first communication device A2 according to the position A2 of the first communication device A2, the position B of the second communication device B, and the first audio from the first communication device A2.
- the second communication device B generates the second audio of the first communication device A1 according to the third audio of the first communication device A1 and the parameters of the second communication device B.
- the second communication device B generates the second audio of the first communication device A2 according to the third audio of the first communication device A2 and the parameters of the second communication device B.
- The second communication device B plays the second audio of the first communication device A1 and the second audio of the first communication device A2, so that user B of the second communication device B can listen to the position-directed voice of user A1 of the first communication device A1 and the position-directed voice of user A2 of the first communication device A2, which provides user B with an immersive sound experience and improves the audio and video call experience.
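- A minimal sketch of this multi-party case, assuming a spatialize(mono, rel_pos) helper that wraps the per-party third-audio and HRTF processing sketched earlier, and that all streams share one length and sample rate:

```python
import numpy as np

def mix_spatialized_parties(parties, spatialize):
    """Render each remote party's first audio at that party's relative
    position and sum the binaural results, so that A1 and A2 are heard
    from distinct directions."""
    rendered = [spatialize(mono, rel_pos) for mono, rel_pos in parties]
    return np.sum(rendered, axis=0)  # shape (2, n): left and right channels

# Hypothetical call with two remote devices at different relative positions:
# mixed = mix_spatialized_parties([(audio_a1, (150, 200)),
#                                  (audio_a2, (-150, 200))], spatialize)
```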
- the specific implementation process for the second communication device B to obtain the location A1 of the first communication device A1 and the location A2 of the first communication device A2 may use the relevant content in the foregoing embodiments, which will not be repeated in this embodiment of the present application.
- In the second case, a target second communication device among the multiple second communication devices generates the third audio of each first communication device according to the first audio of that first communication device, the location of that first communication device, and the location of the target second communication device.
- the target second communication device generates the second audio according to the third audio of each first communication device and the parameters of the target second communication device.
- For example, the second communication device B1 acquires the position of the first communication device A1, the position of the first communication device A2, and the position of the second communication device B2. During the audio and video call among the first communication device A1, the first communication device A2, the second communication device B1, and the second communication device B2, the second communication device B1 generates the second audio of the first communication device A1 according to the position of the first communication device A1, the position of the second communication device B1, the first audio from the first communication device A1, and the parameters of the second communication device B1. The second communication device B1 generates the second audio of the first communication device A2 according to the position of the first communication device A2, the position of the second communication device B1, the first audio from the first communication device A2, and the parameters of the second communication device B1.
- The second communication device B1 generates the second audio of the second communication device B2 according to the position of the second communication device B1, the position of the second communication device B2, the first audio of the second communication device B2, and the parameters of the second communication device B1.
- The second communication device B1 plays the second audio of the first communication device A1, the second audio of the first communication device A2, and the second audio of the second communication device B2, so that the user of the second communication device B1 can listen to each party's position-directed voice, which provides the user with an immersive sound experience and improves the audio and video call experience.
- In some embodiments, the real-time communication method provided by the embodiments of the present application further includes: the second communication device acquires the first audio of each first communication device from mixed audio, as follows.
- the second communication device receives mixed audio, where the mixed audio includes multiple audio channels.
- the second communication device performs sampling processing on the mixed audio, and extracts speech features of the sampled mixed audio.
- The second communication device inputs the mixed audio into a neural network model. The core of the attention mechanism in the neural network model is to screen out, from a large amount of information, the information that is most useful for the current task.
- The second communication device uses a k-means clustering algorithm to cluster the speech features output by the attention mechanism, obtaining the separated multi-channel audio.
- For example, the mixed audio stream received by the second communication device is x(n), containing two audio streams s1(n) and s2(n).
- The extracted speech features are processed by a long short-term memory (LSTM) network and the attention mechanism, whose output can be written as O_n = [a_1·i_1, a_2·i_2, …, a_n·i_n], where i_k are the input features and a_k the corresponding attention weights.
- The k-means clustering proceeds as follows: (1) determine the value of K, that is, the number of clusters, which may be set to 2 in this embodiment of the present application; (2) randomly select K data points from the dataset as the initial centroids; (3) for each point in the dataset, calculate the Euclidean distance d_k to each of the K centroids, and assign the point to the set of the nearest centroid; (4) for each of the K sets, recalculate its centroid; (5) if the new centroids no longer change, the clustering ends and the K sets obtained are the final division result; otherwise, return to step (3).
- In this way, the separated multi-channel first audios s1(n) and s2(n) are obtained.
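- A minimal sketch of the clustering stage in the style of deep clustering, assuming the LSTM/attention front end has already produced one embedding vector per time-frequency bin; masking the mixture spectrogram is one illustrative way to realize the separated s1(n) and s2(n):

```python
import numpy as np
from sklearn.cluster import KMeans

def separate_two_speakers(embeddings: np.ndarray, mixture_spec: np.ndarray):
    """Cluster per-time-frequency embeddings (shape (T, F, D)) into K = 2 sets
    with k-means, then use the binary masks to split the mixture spectrogram
    (shape (T, F)) into the two speaker streams."""
    t, f, d = embeddings.shape
    labels = KMeans(n_clusters=2, n_init=10).fit_predict(embeddings.reshape(-1, d))
    masks = labels.reshape(t, f)
    return mixture_spec * (masks == 0), mixture_spec * (masks == 1)
```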
- S302 may be specifically implemented as: when the second communication device detects that it is connected to a head-mounted playback device, the second communication device generates the second audio according to the third audio and the parameters of the second communication device.
- By performing the operation of generating the second audio only when the second communication device detects that it is connected to a head-mounted playback device, this embodiment of the present application effectively saves energy.
- In the above-mentioned embodiments, each step is performed by the same execution subject, for example, the second communication device.
- the execution body may also be a playback device, a server, etc., which will not be listed here.
- The following embodiments involve multiple execution subjects, for example, a first communication device and a second communication device; or, a first communication device, a second communication device, and a server.
- the specific implementation manner of the real-time communication method provided by the embodiment of the present application is as follows:
- FIG. 6 is a schematic flowchart of another real-time communication method provided by an embodiment of the present application. As shown in FIG. 6 , a first communication device and a second communication device perform an audio and video call, and the method may include:
- the first communication device sends the first audio.
- S601: The second communication device receives the first audio from the first communication device, and acquires the location of the first communication device.
- S602: The second communication device generates a third audio according to the first audio from the first communication device and the location of the first communication device.
- The playback device may be a part of the second communication device, that is, the playback device and the second communication device are the same device; alternatively, the playback device and the second communication device may be mutually independent devices.
- The playback device may be a head-mounted playback device, and the head-mounted playback device may include headphones (including wired headphones, wireless headphones, and the like), a virtual reality (VR) device, an augmented reality (AR) device, and so on.
- S603: The second communication device sends the third audio to the playback device.
- S604: The playback device receives the third audio, and generates the second audio according to the third audio and the parameters of the playback device.
- S604 includes S6041 and S6042. If the playback device is an earphone, they can be implemented as follows:
- S6041: The earphone acquires the left-ear HRTF and the right-ear HRTF corresponding to the position parameter according to the position parameter of the third audio.
- S6042: The earphone processes the third audio with the left-ear HRTF and the right-ear HRTF respectively, to obtain the left-ear audio and the right-ear audio of the earphone.
- S601 to S602 are similar to the above-mentioned S300 to S301, and reference may be made to the relevant descriptions in the above-mentioned S300 to S301.
- S604 and its included S6041 and S6042 are similar to the above-mentioned S302 and its included S3021 and S3022, and reference may be made to the relevant description in the above-mentioned S302 and its included S3021 and S3022.
- FIG. 7 is a schematic flowchart of another real-time communication method provided by an embodiment of the present application. As shown in FIG. 7 , a first communication device and a second communication device perform an audio and video call, and the method may include:
- S701: The first communication device sends the first audio of the first communication device to the server.
- the first audio of the first communication device should be understood as the audio sent by the first communication device.
- S702: The server receives the first audio from the first communication device, and the server acquires the location of the first communication device and the location of the second communication device.
- S702 can be specifically implemented in at least one of the following ways:
- the server may receive the location of the first communications device sent by the first communications device and the location of the second communications device sent by the second communications device.
- Method 1 can be subdivided into the following two situations:
- In the first situation, the server may obtain the location of the first communication device and the location of the second communication device as follows: the server obtains the location of the first communication device carried in the audio and video call request sent by the first communication device, and the location of the second communication device carried in the audio and video call request sent by the second communication device.
- The location of the first communication device and the location of the second communication device are carried in the audio-video call requests when the audio-video call is established, so that the server acquires both locations concisely and efficiently.
- the server obtains the media message sent by the first communication device.
- the server receives the media message sent by the first communication device, and parses out the location of the first communication device.
- the server receives the media message sent by the second communication device, and parses out the location of the second communication device.
- The location of the first communication device/second communication device is encapsulated in the media packets sent during the audio and video call between the first communication device and the second communication device, even while the first communication device/second communication device is in a mute state, so that the server can acquire the location of the first communication device and the location of the second communication device concisely and efficiently.
- the server configures the first virtual location for the first communication device.
- the server configures the second virtual location for the second communication device. It should be understood that the server randomly assigns virtual locations to the first communication device and the second communication device.
- the virtual position can be understood as taking the reference object as the origin, and is set as the front, left, right or rear position of the reference object.
- the virtual position is set by coordinates on a coordinate system
- the coordinate system may be a two-dimensional coordinate system, such as a rectangular coordinate system.
- the coordinate system may also be a three-dimensional coordinate system, such as a three-dimensional Cartesian coordinate system.
- For example, the server may use the second communication device as the reference object and the location of the second communication device as the coordinate origin, and designate any coordinate in the rectangular coordinate system as the first virtual location of the first communication device.
- the server designates a user-facing position of the second communication device as the first virtual position of the first communication device (the virtual position A shown in FIG. 4 ).
- the server may also set a virtual target direction, and use any position on the target direction as the first virtual position of the first communication device.
- the server may take the center position (ie the conference center) of the first communication device and the second communication device as a reference, that is, the center position as the coordinate origin.
- The server can also arbitrarily designate positions around the conference center as the first virtual location of the first communication device and the second virtual location of the second communication device.
- FIG. 5 is a schematic diagram of an application scenario of another communication method provided by an embodiment of the present application. As shown in FIG. 5 , the server may designate virtual location A as the second virtual location of the second communication device, and the server may designate virtual location B as the first virtual location of the first communication device.
- the server designates virtual location C as the first virtual location of the first communication device, and virtual location D as the second virtual location of the second communication device; or, the server designates virtual location A as the first virtual location of the first communication device , the virtual location E is used as the second virtual location of the second communication device.
- other combinations are also possible, which will not be listed one by one here.
- The server directly assigns virtual locations to the first communication device and the second communication device, without requiring them to send their respective locations, and uses the assigned locations of the first communication device and the second communication device to localize the sound sources, which is concise and efficient.
- the server detects a location keyword in the audio data of the first audio of the first communication device, where the location keyword is used to represent the location of the first communication device.
- the server determines the location of the first communication device according to the location keyword.
- the server detects a second position keyword in the audio data of the first audio of the second communication device, where the second position keyword is used to characterize the position of the second communication device.
- the server determines the location of the second communication device according to the second location keyword.
- the server detects that the audio data of the first audio of the first communication device contains "XXAXX", and A is a place name.
- the server determines the location of A according to A, and uses the location of A as the location of the first communication device. Similarly, the server determines the location of the second communication device.
- the server detects that the audio data of the first audio of the first communication device contains "I'm in A” or "Are you in A?", where A is a place name.
- the server determines the location of A according to A, and uses the location of A as the location of the first communication device. Similarly, the server determines the location of the second communication device.
- the server detects that the audio data of the first audio of the first communication device contains more occurrences of "A" than "B", and both A and B are place names.
- the server selects A with the most occurrences, determines the location of A according to A, and uses the location information of A as the location of the first communication device. Similarly, the server determines the location of the second communication device.
- The server detects that the audio data of the first audio of the first communication device contains "A", "B", and "C", where A, B, and C are all place names, B is a subordinate place name of A, C is a subordinate place name of B, and A appears the most times.
- The server determines the location of C according to C, and uses the location of C as the location of the first communication device. Similarly, the server determines the location of the second communication device. The specific rule needs to be set according to the actual situation; the embodiments of the present application do not list all cases.
- In this mode, the server detects the location keywords contained in the audio data of the first audio of the first communication device, and analyzes them to determine the location of the first communication device and the location of the second communication device, without relying on the first communication device and the second communication device to send their locations.
- S703: The server generates a third audio according to the first audio from the first communication device and the location of the first communication device.
- S704: The server sends the third audio to the second communication device.
- S705: The second communication device receives the third audio, and generates the second audio according to the third audio and the parameters of the second communication device.
- The playback device may be a part of the second communication device, that is, the playback device and the second communication device are the same device; alternatively, the playback device and the second communication device may be mutually independent devices.
- The playback device may be a head-mounted playback device, and the head-mounted playback device may include headphones (including wired headphones, wireless headphones, and the like), a virtual reality (VR) device, an augmented reality (AR) device, and so on.
- S705 includes S7051 and S7052, which can be implemented as follows:
- S7051: The second communication device sends the third audio to the playback device.
- S7052: The playback device receives the third audio, and generates the second audio according to the third audio and the parameters of the playback device.
- S7052 includes S70521 and S70522. If the playback device is an earphone, S70521 and S70522 can be implemented as follows:
- S70521: The earphone acquires the left-ear HRTF and the right-ear HRTF corresponding to the position parameter according to the position parameter of the third audio.
- S70522: The earphone processes the third audio with the left-ear HRTF and the right-ear HRTF respectively, to obtain the left-ear audio and the right-ear audio of the earphone.
- S702 to S703 are similar to the above-mentioned S300 to S301, and reference may be made to the relevant descriptions in the above-mentioned S300 to S301.
- S705, S7051, S7052 and the included S70521 and S70522 are similar to the above-mentioned S302 and its included S3021 and S3022, and reference may be made to the relevant description in the above-mentioned S302 and its included S3021 and S3022.
- the communication system described in this possible design is used to perform the functions of each device in the real-time communication method shown in FIG. 3 , so it can achieve the same effect as the above-mentioned real-time communication method.
- FIG. 8 is a communication apparatus provided by an embodiment of the present application.
- The communication apparatus 800 may include: an acquiring and receiving unit 810, configured to acquire the location of the first communication device and receive the first audio from the first communication device during the audio-video call between the first communication device and the second communication device; and a generating unit 820, configured to generate the second audio according to the location of the first communication device, the first audio, and the parameters of the second communication device, where the second audio is position-directed audio.
- The acquiring and receiving unit 810 is further configured to: receive a first message from the first communication device, where the first message includes the location of the first communication device; or configure the first virtual location for the first communication device; or detect a location keyword in the audio data of the first audio, where the location keyword is used to characterize the location of the first communication device.
- the acquiring and receiving unit 810 may include: a receiving subunit 811, a configuration subunit 812 and a detection subunit 813.
- The receiving subunit 811 is configured to receive a first message from the first communication device, where the first message includes the location of the first communication device; or, the configuration subunit 812 is configured to configure the first virtual location for the first communication device; or, the detection subunit 813 is configured to detect a position keyword in the audio data of the first audio, where the position keyword is used to characterize the position of the first communication device.
- The generating unit 820 is further configured to: generate a third audio according to the location of the first communication device and the first audio, where the third audio includes relative position information of the first communication device and the second communication device; and generate the second audio according to the third audio and the parameters of the second communication device.
- the generating unit 820 may include: a first generating subunit 821 and a second generating subunit 822 .
- The first generating subunit 821 is configured to generate a third audio according to the location of the first communication device and the first audio, where the third audio includes the relative position information of the first communication device and the second communication device; the second generating subunit 822 is configured to generate the second audio according to the third audio and the parameters of the second communication device.
- The second communication device includes at least one playback device; the playback device includes a headset, a virtual reality (VR) device, or an augmented reality (AR) device.
- The second communication device is connected to a playback device; the generating unit 820 is further configured to generate the second audio according to the location of the first communication device, the first audio, and the parameters of the second communication device when the second communication device and the playback device are connected.
- the second communication device is connected to a playback device; the generating unit 820 may further include: a third generating subunit 823 .
- the third generating subunit 823 is configured to generate the second audio according to the position of the first communication device, the first audio, and the parameters of the second communication device when the second communication device is connected to the playback device.
- The generating unit 820 is further configured to: acquire, according to the relative position information contained in the third audio, the head-related transfer function on the playback device corresponding to the relative position information; and process the first audio with the head-related transfer function to obtain the second audio.
- the second generating subunit 822 may further include: a first acquiring subunit 8221 and a first processing subunit 8222.
- The first acquiring subunit 8221 is configured to acquire, according to the relative position information contained in the third audio, the head-related transfer function on the playback device corresponding to the relative position information; the first processing subunit 8222 is configured to process the first audio with the head-related transfer function to obtain the second audio.
- The playback device is an earphone having a left ear and a right ear; the generating unit 820 is further configured to: acquire, according to the relative position information contained in the third audio, the left-ear head-related transfer function and the right-ear head-related transfer function corresponding to the relative position information; and process the first audio with the left-ear and right-ear head-related transfer functions respectively to obtain the left-ear audio and right-ear audio of the earphone.
- the second generating subunit 822 may further include: a second acquiring subunit 8223 and a second processing subunit 8224 .
- The second acquiring subunit 8223 is configured to acquire, according to the relative position information contained in the third audio, the left-ear head-related transfer function and the right-ear head-related transfer function corresponding to the relative position information; the second processing subunit 8224 is configured to process the first audio with the left-ear and right-ear head-related transfer functions respectively to obtain the left-ear audio and right-ear audio of the earphone.
- the second audio is played by the second communication device.
- The communication apparatus 800 may further include: a sending unit 830, configured to send the second audio to a third communication device, where the second audio is used to instruct the third communication device to play the second audio.
- the audio and video calls include one or more of video calls, voice calls, voice conferences, and video conferences.
- the above-mentioned communication apparatus 800 may be implemented by code or by circuit.
- the communication apparatus may be a complete machine of a terminal device.
- The acquiring and receiving unit 810 may be a receiving circuit, or may be implemented by the antenna 1 shown in FIG. 2 together with the mobile communication module shown in FIG. 2, or by the antenna 2 shown in FIG. 2 together with the wireless communication module shown in FIG. 2.
- the generating unit 820 may be a processor (such as the processor 201 shown in FIG. 2 ).
- The sending unit 830 may be a sending circuit, or may be implemented by the antenna 1 shown in FIG. 2 together with the mobile communication module shown in FIG. 2, or by the antenna 2 shown in FIG. 2 together with the wireless communication module shown in FIG. 2.
- An electronic device provided by an embodiment of the present application includes a processor and a memory coupled to the processor, where the memory is configured to store computer program code including computer instructions; when the processor reads the computer instructions from the memory, the electronic device executes the real-time communication methods shown in FIG. 3 to FIG. 7.
- A computer program product provided by an embodiment of the present application, when run on a computer, enables the computer to execute the real-time communication methods shown in FIG. 3 to FIG. 7.
- A computer-readable storage medium provided by an embodiment of the present application includes computer instructions; when the computer instructions run on a network device, the network device can execute the real-time communication methods shown in FIG. 3 to FIG. 7.
- A chip system provided by an embodiment of the present application includes one or more processors; when the one or more processors execute instructions, they execute the real-time communication methods shown in FIG. 3 to FIG. 7.
- The terms "first" and "second" are used for descriptive purposes only, and should not be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated.
- A feature defined as "first" or "second" may expressly or implicitly include one or more of that feature.
- In the embodiments of the present application, "plural" means two or more.
- words such as “exemplary” or “for example” are used to represent examples, illustrations or illustrations. Any embodiments or designs described in the embodiments of the present application as “exemplary” or “such as” should not be construed as preferred or advantageous over other embodiments or designs. Rather, the use of words such as “exemplary” or “such as” is intended to present the related concepts in a specific manner.
- the above-mentioned communication devices and the like include corresponding hardware structures and/or software modules for executing each function.
- the embodiments of the present application can be implemented in hardware or a combination of hardware and computer software. Whether a function is performed by hardware or computer software driving hardware depends on the specific application and design constraints of the technical solution. Skilled artisans may implement the described functionality using different methods for each particular application, but such implementations should not be considered beyond the scope of the embodiments of the present invention.
- each functional module may be divided corresponding to each function, or two or more functions may be integrated into one processing module.
- the above-mentioned integrated modules can be implemented in the form of hardware, and can also be implemented in the form of software function modules. It should be noted that, the division of modules in the embodiment of the present invention is schematic, and is only a logical function division, and there may be other division manners in actual implementation.
Abstract
Embodiments of the present application provide a real-time communication method, apparatus, and system, with which a user can listen to the other party's voice with position-directed sound effects during an audio and video call, improving the call experience. The method includes: during an audio and video call between a first communication device and a second communication device, generating second audio with position directivity according to the location of the first communication device, the first audio from the first communication device, and the parameters of the second communication device, the second audio being used for playback by a playback device.
Description
The present application relates to the field of communication technologies, and in particular, to a real-time communication method, apparatus, and system.
At present, smart headphones offer stereo sound effects, noise reduction, biometric monitoring, and other functions, which greatly improve the user's audio experience and have made them a product users use frequently. Because of their portability, users also often use smart headphones for voice interaction with the other party in a call. However, when a user conducts a real-time voice or video call (an audio-video call for short) through a chat tool, the user can only obtain the other party's voice content, which degrades the audio-video call experience.
Summary
Embodiments of the present application provide a real-time communication method, apparatus, and system, so that during an audio and video call a user can listen to the other party's voice with position-directed sound effects, improving the audio and video call experience.
To achieve the above purpose, the embodiments of the present application adopt the following technical solutions.
In a first aspect, an embodiment of the present application provides a real-time communication method. The method includes: during an audio and video call between a first communication device and a second communication device, acquiring the location of the first communication device and receiving first audio from the first communication device; and generating second audio, which is audio with position directivity, according to the location of the first communication device, the first audio, and the parameters of the second communication device. Therefore, when the second communication device plays the second audio, the user can listen to the other party's voice with position-directed sound effects, which improves the audio and video call experience.
Here, receiving the first audio from the first communication device should be understood as directly receiving the first audio from the first communication device, for example, directly receiving the first audio sent by the first communication device; or indirectly receiving the first audio from the first communication device, for example, receiving the first audio of the first communication device forwarded by another device.
In a specific implementation, acquiring the location of the first communication device is specifically: receiving a first message from the first communication device, where the first message includes the location of the first communication device. In this embodiment of the present application, the location of the first communication device is sent by the first communication device itself, so the second communication device can obtain it directly, which is concise and efficient.
In a specific implementation, acquiring the location of the first communication device is specifically: configuring a first virtual location for the first communication device. In this embodiment of the present application, the second communication device directly assigns the first virtual location to the first communication device without requiring the first communication device to send its location, which is concise and efficient.
In a specific implementation, acquiring the location of the first communication device is specifically: detecting a location keyword in the audio data of the first audio, where the location keyword is used to characterize the location of the first communication device. In this embodiment of the present application, the second communication device detects the location keyword contained in the audio data of the first audio of the first communication device and determines the location of the first communication device by analyzing it, without relying on the first communication device to send its location.
In a specific implementation, a third audio is generated according to the location of the first communication device and the first audio, where the third audio contains relative position information of the first communication device and the second communication device; the second audio is generated according to the third audio and the parameters of the second communication device.
The third audio may be, for example, a spatial audio object in an "Object-Based Audio Immersive Sound Metadata and Codestream" standard. The spatial audio object contains a position field and a content field, where the position field carries the relative position information of the first communication device and the second communication device, and the content field carries the voice content information of the first audio of the first communication device.
In a specific implementation, the second communication device includes at least one playback device; the playback device includes a headset, a virtual reality (VR) device, or an augmented reality (AR) device.
In a specific implementation, the second communication device is connected to an external playback device. Generating the second audio according to the location of the first communication device, the first audio, and the parameters of the second communication device is specifically: when the second communication device and the playback device are connected, generating the second audio according to the location of the first communication device, the first audio, and the parameters of the second communication device.
In a specific implementation, the head-related transfer function on the playback device corresponding to the relative position information is acquired according to the relative position information contained in the third audio, and the first audio is processed with the head-related transfer function to obtain the second audio.
In a specific implementation, the playback device is an earphone having a left ear and a right ear. Generating the second audio according to the location of the first communication device, the first audio, and the parameters of the second communication device is specifically: acquiring, according to the relative position information contained in the third audio, the left-ear head-related transfer function and the right-ear head-related transfer function corresponding to the relative position information; and processing the first audio with the left-ear and right-ear head-related transfer functions respectively to obtain the left-ear audio and right-ear audio of the earphone.
In a specific implementation, the second audio is played by the second communication device.
That is, the second communication device may play the second audio with its own player.
Based on the method of the first aspect, after the second audio is generated according to the location of the first communication device, the first audio, and the parameters of the second communication device, the method includes: sending the second audio to a third communication device, where the second audio is used to instruct the third communication device to play the second audio.
It should be understood that the third communication device may be an external playback device of the second communication device; after receiving the second audio, the third communication device knows that it should play the second audio.
In a specific implementation, the audio and video call includes one or more of a video call, a voice call, a voice conference, and a video conference.
In a second aspect, an embodiment of the present application provides a real-time communication method. The method includes: during an audio and video call between a first communication device and a second communication device, the first communication device sends first audio; the second communication device acquires the location of the first communication device and receives the first audio from the first communication device; the second communication device generates second audio, which is audio with position directivity, according to the location of the first communication device, the first audio from the first communication device, and the parameters of the second communication device; and the second communication device plays the second audio.
In a specific implementation, the second communication device includes at least one playback device, and the second communication device playing the second audio is specifically: the second communication device sends the second audio to the at least one playback device, and the playback device receives the second audio and plays it.
In a third aspect, an embodiment of the present application provides a real-time communication method. The method includes: during an audio and video call between a first communication device and a second communication device, the first communication device sends the first audio of the first communication device; a server receives the first audio from the first communication device; the server acquires the location of the first communication device and the location of the second communication device; the server generates second audio, which is audio with position directivity, according to the location of the first communication device, the location of the second communication device, the first audio from the first communication device, and the parameters of the second communication device; the server sends the second audio to the second communication device; and the second communication device plays the second audio.
In a specific implementation, the second communication device includes at least one playback device, and the second communication device playing the second audio is specifically: the second communication device sends the second audio to the at least one playback device, and the playback device receives the second audio and plays it.
In a fourth aspect, an embodiment of the present application provides a communication apparatus, including: an acquiring and receiving unit, configured to acquire the location of a first communication device and receive the first audio from the first communication device during an audio and video call between the first communication device and a second communication device; and a generating unit, configured to generate second audio, which is audio with position directivity, according to the location of the first communication device, the first audio, and the parameters of the second communication device.
In a specific implementation, the acquiring and receiving unit is further configured to: receive a first message from the first communication device, where the first message includes the location of the first communication device; or configure a first virtual location for the first communication device; or detect a location keyword in the audio data of the first audio, where the location keyword is used to characterize the location of the first communication device.
In a specific implementation, the generating unit is further configured to: generate a third audio according to the location of the first communication device and the first audio, where the third audio contains relative position information of the first communication device and the second communication device; and generate the second audio according to the third audio and the parameters of the second communication device.
In a specific implementation, the second communication device includes at least one playback device. The playback device includes a headset, a virtual reality (VR) device, or an augmented reality (AR) device.
In a specific implementation, the second communication device is connected to an external playback device. The generating unit is further configured to generate the second audio according to the location of the first communication device, the first audio, and the parameters of the second communication device when the second communication device and the playback device are connected.
In a specific implementation, the generating unit is further configured to: acquire, according to the relative position information contained in the third audio, the head-related transfer function on the playback device corresponding to the relative position information; and process the first audio with the head-related transfer function to obtain the second audio.
In a specific implementation, the playback device is an earphone having a left ear and a right ear. The generating unit is further configured to: acquire, according to the relative position information contained in the third audio, the left-ear head-related transfer function and the right-ear head-related transfer function corresponding to the relative position information; and process the first audio with the left-ear and right-ear head-related transfer functions respectively to obtain the left-ear audio and right-ear audio of the earphone.
In a specific implementation, the second audio is played by the second communication device.
Based on the communication apparatus of the fourth aspect, the communication apparatus further includes: a sending unit, configured to send the second audio to a third communication device, where the second audio is used to instruct the third communication device to play the second audio.
In a specific implementation, the audio and video call includes one or more of a video call, a voice call, a voice conference, and a video conference.
In a fifth aspect, an embodiment of the present application provides a communication system, including: during an audio and video call between a first communication device and a second communication device, the first communication device is configured to send first audio; the second communication device is configured to receive the first audio from the first communication device, acquire the location of the first communication device, and generate second audio, which is audio with position directivity, according to the location of the first communication device, the first audio from the first communication device, and the parameters of the second communication device; and the second communication device is configured to play the second audio.
In a specific implementation, the second communication device includes at least one playback device: the second communication device is configured to send the second audio to the at least one playback device, and the playback device is configured to receive the second audio and play it.
In a sixth aspect, an embodiment of the present application provides a communication system, including: during an audio and video call between a first communication device and a second communication device, the first communication device is configured to send the first audio of the first communication device; a server is configured to receive the first audio from the first communication device, acquire the location of the first communication device and the location of the second communication device, generate second audio, which is audio with position directivity, according to the location of the first communication device, the location of the second communication device, the first audio from the first communication device, and the parameters of the second communication device, and send the second audio to the second communication device; and the second communication device is configured to play the second audio.
In a specific implementation, the second communication device includes at least one playback device: the second communication device is configured to send the second audio to the at least one playback device, and the playback device is configured to receive the second audio and play it.
In a seventh aspect, an embodiment of the present application provides an electronic device, including a processor and a memory coupled to the processor, where the memory is configured to store computer program code including computer instructions; when the processor reads the computer instructions from the memory, the electronic device executes the real-time communication method described in the first aspect or any possible design of the above aspects.
In an eighth aspect, an embodiment of the present application provides a computer program product including computer instructions; when the computer instructions run on a computer, the computer executes the real-time communication method described in the first aspect or any possible design of the above aspects.
In a ninth aspect, an embodiment of the present application provides a computer-readable storage medium including computer instructions; when the computer instructions run on a computer, the computer executes the real-time communication method described in the first aspect or any possible design of the above aspects.
In a tenth aspect, an embodiment of the present application provides a chip system including one or more processors; when the one or more processors execute instructions, they execute the real-time communication method described in the first aspect or any possible design of the above aspects.
For the specific implementations and corresponding technical effects of the embodiments in the second to tenth aspects, reference may be made to the specific implementations and technical effects of the first aspect.
In the embodiments of the present application, during an audio and video call between a first communication device and a second communication device, second audio with position directivity is generated according to the acquired location of the first communication device, the received first audio from the first communication device, and the parameters of the second communication device. Therefore, when the second communication device plays the second audio, the user can listen to the other party's voice with position-directed sound effects, improving the audio and video call experience.
FIG. 1a is a first schematic diagram of a practical application scenario of a smart speaker according to an embodiment of the present application;
FIG. 1b is a second schematic diagram of a practical application scenario of a smart speaker according to an embodiment of the present application;
FIG. 2a is a schematic diagram of a practical application scenario of a headset according to an embodiment of the present application;
FIG. 3a is a first schematic diagram of a practical application scenario of an in-vehicle device according to an embodiment of the present application;
FIG. 3b is a second schematic diagram of a practical application scenario of an in-vehicle device according to an embodiment of the present application;
FIG. 1 is a schematic architectural diagram of a communication system according to an embodiment of the present application;
FIG. 2 is a schematic diagram of the composition of an electronic device according to an embodiment of the present application;
FIG. 3 is a flowchart of a real-time communication method according to an embodiment of the present application;
FIG. 4 is a schematic diagram of an application scenario of a communication method according to an embodiment of the present application;
FIG. 5 is a schematic diagram of an application scenario of another communication method according to an embodiment of the present application;
FIG. 6 is a flowchart of another communication method according to an embodiment of the present application;
FIG. 7 is a flowchart of yet another communication method according to an embodiment of the present application;
FIG. 8 is a schematic diagram of the composition of a communication apparatus according to an embodiment of the present application.
At present, smart headphones offer stereo sound effects, noise reduction, biometric monitoring, and other functions, which greatly improve the user's audio experience and have made them a product users use frequently.
The stereo sound effect function exploits the principle of human perception of the spatial orientation of sound: a spatial sound source propagates through the air to a person's left and right ears, and because the distances the sound waves travel to the two ears differ, the waves arriving at the two ears also differ, including in sound pressure and in phase. Based on the different waves from the same source, the two ears form a perception of the spatial orientation and distance of the sound source. For example, when a user listens to music with smart headphones, the user can hear music with stereo sound effects.
The noise reduction function exploits the principle of noise neutralization: a microphone placed inside the smart headphone detects low-frequency ambient noise (100 to 1000 Hz) and passes the detected noise to a control circuit in the headphone. The control circuit computes, in real time, a sound wave with the opposite phase and the same amplitude as the low-frequency noise, and controls the player to play it.
In summary, existing smart headphones are powerful, and because of their portability, users often use them for voice interaction with the other party in a call. However, during a real-time audio-video call conducted through a chat tool, the user can only obtain the other party's voice content and cannot effectively use the stereo sound effect capability of the smart headphone.
Therefore, to solve the above technical problem, an embodiment of the present application proposes a real-time communication method. In the method, during an audio and video call between a first communication device and a second communication device, a third audio is generated according to the acquired location of the first communication device and the first audio from the first communication device, where the third audio contains relative position information of the first communication device and the second communication device. Second audio with position directivity is then generated according to the third audio and the parameters of the second communication device. Therefore, when the second communication device plays the second audio, the user can listen to the other party's voice with position-directed sound effects, improving the audio and video call experience.
The technical solutions provided by the embodiments of the present application are briefly described below with reference to some specific application scenarios.
Scenario 1: Smart home
The second communication device described in the above embodiments may be a smart speaker. The first communication device may be an electronic device capable of conducting an audio and video call with the second communication device, for example, a mobile phone.
Taking the case where the first communication device is a mobile phone and the second communication device is a smart speaker as an example, FIG. 1a is a first schematic diagram of a practical application scenario of a smart speaker according to an embodiment of the present application. As shown in FIG. 1a, user A sits on a sofa at home (location A), and user B rides a bus at location B, where location A is geographically due south of location B. When user A uses the smart speaker 11 (the second communication device) to conduct an audio call with the mobile phone 12 (the first communication device) used by user B, the smart speaker 11 generates second audio with position directivity according to the acquired location of the mobile phone 12 and the first audio from the mobile phone 12. When the smart speaker 11 plays the second audio, user A can hear user B's voice with position directivity. For example, FIG. 1b is a second schematic diagram of a practical application scenario of a smart speaker according to an embodiment of the present application. As shown in FIG. 1b, user A sits on the sofa facing north and perceives user B as speaking directly in front of user A, so that users A and B seem to be talking face to face at close range, which improves the audio and video call experience.
Of course, the second communication device described in the above embodiments may also be a smart screen, and user A may use the smart screen to conduct an audio call with the mobile phone used by user B. The smart screen performs operations similar to those of the above smart speaker with the same effects, which are not repeated here.
Scenario 2: Multi-party conference
The second communication device described in the above embodiments may be a headset, and there may be one or more headsets. The first communication device may be an electronic device capable of conducting an audio and video call with the second communication device (for example, a mobile phone or a headset), and there may be one or more first communication devices.
Taking two first communication devices and one second communication device, all of which are headsets, as an example, FIG. 2a is a schematic diagram of a practical application scenario of a headset according to an embodiment of the present application. As shown in FIG. 2a, user A is at location A, user B1 is at location B1, and user B2 is at location B2, where location B1 is southwest of location A and location B2 is southeast of location A. User A's headset holds a conference call with users B1 and B2. User A's headset acquires the location of user B1's headset and the location of user B2's headset, and generates audio with position directivity according to those locations and the audio from the headsets of users B1 and B2, so that the audio of different speakers carries different position-directed sound effects. As shown in FIG. 2a, user A perceives user B1 as speaking to the front right of user A and user B2 as speaking to the front left, providing the listener with an immersive sound experience. Therefore, when several people speak at the same time, the listener can distinguish the different speakers by their different position-directed sound effects, improving the audio and video call experience.
Scenario 3: Driving
The second communication device described in the above embodiments may be an in-vehicle device. The first communication device may be an electronic device capable of conducting an audio and video call with the second communication device, for example, a mobile phone.
Taking the case where the first communication device is a mobile phone and the second communication device is an in-vehicle device as an example, FIG. 3a is a first schematic diagram of a practical application scenario of an in-vehicle device according to an embodiment of the present application. As shown in FIG. 3a, user A drives a car at location A (the location of the in-vehicle device), and user B rides a bus at location B (the location of the mobile phone), where location B is due east of location A. User A uses the in-vehicle device 31 to conduct an audio and video call with the mobile phone 32 used by user B. At this time, the display screen of the in-vehicle device 31 may show prompt information such as "User B on call". The in-vehicle device 31 acquires the location of the mobile phone 32 and generates speech with position directivity according to the location of the mobile phone 32 and the audio from the mobile phone 32. For example, FIG. 3b is a second schematic diagram of a practical application scenario of an in-vehicle device according to an embodiment of the present application. As shown in FIG. 3b, user A perceives user B as speaking to the east of user A (that is, to the right), so that user B seems to be talking with user A from the front passenger seat, which improves the audio and video call experience.
Of course, the first communication device and the second communication device may also be other devices, such as televisions and cameras, which are not listed one by one here. For details, see the related content below.
The real-time communication method provided by the embodiments of the present application is described below with reference to the accompanying drawings.
The real-time communication method provided by the embodiments of the present application can be applied to the communication system shown in FIG. 1. As shown in FIG. 1, the communication system 100 may include a first communication device 110 and a second communication device 120, and may further include a server 130. The devices involved in the architecture shown in FIG. 1 are introduced below.
The first communication device 110 may be a device for implementing a wireless communication function, for example, a communication device or a chip usable in a communication device. The first communication device 110 may be a communication device with functional units such as a microphone, a display screen, a camera, and a player. Specifically, the first communication device 110 may include user equipment (UE) in a 5G network or a future evolved communication system, a smart screen, an access terminal, a terminal unit, a terminal station, a mobile station, a remote station, a remote terminal, a mobile device, a wireless communication device, a terminal agent, or a terminal apparatus. The access terminal may be a cellular phone, a cordless phone, a session initiation protocol (SIP) phone, a wireless local loop (WLL) station, a personal digital assistant (PDA), a handheld device with a wireless communication function, a computing device or another processing device connected to a wireless modem, an in-vehicle device, or a wearable device. The first communication device 110 may be mobile or fixed. In some examples, the first communication device 110 may also be connected to an external playback device; it may play audio with its own player or with the external playback device.
The playback device may be a head-mounted playback device, which may be a wired headset, a wireless headset (for example, TWS Bluetooth earbuds, a neckband Bluetooth headset, or an over-ear Bluetooth headset), a virtual reality (VR) device, an augmented reality (AR) device, or the like; the present application does not specially limit the specific form of the headset.
The second communication device 120 may be the same as the first communication device 110; see the related description of the first communication device 110, which is not repeated here.
In one example, the first communication device and the second communication device are each connected to a head-mounted playback device 140; or the first communication device 110 is connected to the head-mounted playback device 140 and the second communication device 120 is not; or the first communication device 110 is not connected to the head-mounted playback device 140 and the second communication device 120 is. As shown in FIG. 1, the embodiments of the present application are described by taking as an example the case where the first communication device 110 is not connected to a head-mounted playback device 140 and the second communication device 120 is connected to a head-mounted playback device 140.
The server 130 may also be called a service device, a service apparatus, a cloud server, a cloud computing server, or a cloud host. The server in the embodiments of the present application can be used to provide audio and video call services, such as cellular calls or Internet calls. Specifically, the first communication device 110 and the second communication device 120 conduct an audio and video call through the server 130. The audio and video call may include one or more of a video call, a voice call, a voice conference, and a video conference. In one example, the first communication device 110 and the second communication device 120 may use the session initiation protocol (SIP) and the real-time transport protocol (RTP) for the audio and video call.
In a specific implementation, the real-time communication method provided by the embodiments of the present application may be performed by the second communication device 120 or the server 130, as follows:
The second communication device 120 is the execution body of the real-time communication method provided by the embodiments of the present application.
During an audio-video call between the first communication device 110 and the second communication device 120, the second communication device 120 receives a first audio from the first communication device 110 and acquires the location of the first communication device 110. According to the location of the first communication device 110 and the first audio of the first communication device 110, the second communication device 120 generates a third audio, the third audio containing relative location information of the first communication device 110 and the second communication device 120. According to the third audio and a parameter of the second communication device 120, the second communication device 120 generates a second audio with positional directivity, and the second communication device 120 plays the second audio.
Receiving the first audio from the first communication device should be understood as directly receiving the first audio from the first communication device, for example, the second communication device directly receives the first audio sent by the first communication device; or indirectly receiving the first audio from the first communication device, for example, the first communication device sends the first audio to another device, and the second communication device receives the first audio of the first communication device from that other device.
The second communication device may acquire the location of the first communication device in any of the following ways: receiving a first message from the first communication device, the first message including the location of the first communication device; or configuring a first virtual location for the first communication device; or detecting a location keyword in the audio data of the first audio, the location keyword being used to characterize the location of the first communication device. The embodiments of the present application do not specifically limit the way in which the second communication device acquires the location of the first communication device.
The third audio may be, for example, a spatial audio object in a standard such as "object-based immersive audio metadata and bitstream". The spatial audio object contains a position field and a content field, where the position field is the relative location information of the first communication device and the second communication device, and the content field is the speech content information of the first audio of the first communication device.
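The structure below is a minimal sketch of such a spatial audio object; the class name, field names, and units are assumptions for illustration and are not taken from the cited standard:

```python
from dataclasses import dataclass

@dataclass
class SpatialAudioObject:
    """Hypothetical container mirroring the position/content fields above."""
    position: tuple[float, float, float]  # relative position of device A w.r.t. device B, e.g. in cm
    content: bytes                        # encoded speech payload of the first audio

# Example: the caller is about 2 m to the front-right of the listener
obj = SpatialAudioObject(position=(141.4, 141.4, 0.0), content=b"<encoded-frame>")
```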
In an example, the second communication device is externally connected to a playback device. The second communication device sends the second audio to the playback device, and the playback device receives and plays the second audio.
In an example, the second communication device is externally connected to a playback device. The second communication device sends the third audio to the playback device; the playback device receives the third audio, generates the second audio with positional directivity according to the third audio and a parameter of the playback device, and plays the second audio.
Exemplarily, the playback device may be a head-mounted playback device, which may be wired earphones, wireless earphones (for example, TWS Bluetooth earphones, neck-band Bluetooth earphones, or head-mounted Bluetooth earphones), a virtual reality (virtual reality, VR) device, an augmented reality (augmented reality, AR) device, or the like. The present application does not specially limit the specific form of the head-mounted playback device.
If the head-mounted playback device is earphones, the head-mounted playback device generates the second audio according to the third audio and a parameter of the head-mounted playback device, which may be specifically implemented as: acquiring, according to the relative location information contained in the third audio, the left-ear head-related transfer function and the right-ear head-related transfer function corresponding to the relative location information; and processing the first audio with the left-ear head-related transfer function and the right-ear head-related transfer function, respectively, to obtain the left-ear audio and right-ear audio of the earphones.
In the following, wireless earphones are taken as an example. The earphones include a left earphone and a right earphone worn on the user's left ear and right ear, respectively. The earphones can communicate with each other through a wired or wireless connection (path 11 shown in FIG. 1). The earphones can also communicate with the second communication device through a wired or wireless connection (path 12 shown in FIG. 1). The wireless connection may be, for example, Bluetooth, Wi-Fi, NFC, or ZigBee. Path 12 may use, for example, BT, WLAN (such as Wi-Fi), ZigBee, FM, NFC, IR, or general 2.4G/5G wireless communication technologies. The connection used by path 12 may be the same as or different from the connection used by path 11, which is not specifically limited in the embodiments of the present application.
In an example, the playback device externally connected to the second communication device performs operations similar to those of the second communication device.
In addition, the server may also be the execution body of the real-time communication method provided by the embodiments of the present application. The server performs operations similar to those of the second communication device. For example, the server 130 acquires the location of the first communication device 110 and the location of the second communication device 120, and computes the relative position of the first communication device 110 and the second communication device 120 according to their locations. According to this relative position and the first audio of the first communication device 110, the server 130 generates a third audio containing the relative location information of the first communication device 110 and the second communication device 120. Further, the server 130 generates a second audio according to the third audio and a parameter of the head-mounted playback device 140 of the second communication device 120.
In a specific implementation, each of the above devices (e.g., the communication devices and the head-mounted playback device) may adopt the composition shown in FIG. 2 or include the components shown in FIG. 2. FIG. 2 is a schematic structural diagram of an electronic device 200 according to an embodiment of the present application. The electronic device 200 may include a processor 201, a memory 202, a universal serial bus (universal serial bus, USB) interface 203, antenna 1, antenna 2, a mobile communication module 204, a wireless communication module 205, an audio module 206, a microphone 206A, and an earphone interface 206B.
It can be understood that the structure illustrated in this embodiment of the present invention does not constitute a specific limitation on the electronic device 200. In other embodiments of the present application, the electronic device 200 may include more or fewer components than shown, combine some components, split some components, or arrange the components differently. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
The processor 201 may include one or more processing units. For example, the processor 201 may include an application processor (application processor, AP), a modem processor, a graphics processing unit (graphics processing unit, GPU), an image signal processor (image signal processor, ISP), a controller, a video codec, a digital signal processor (digital signal processor, DSP), a baseband processor, and/or a neural-network processing unit (neural-network processing unit, NPU), etc. Different processing units may be independent devices or may be integrated into one or more processors.
A memory may also be provided in the processor 201 to store instructions and data. In some embodiments, the memory in the processor 201 is a cache, which can hold instructions or data that the processor 201 has just used or uses cyclically. If the processor 201 needs to use the instructions or data again, it can call them directly from this memory. This avoids repeated access, reduces the waiting time of the processor 201, and thus improves system efficiency.
The USB interface 203 is an interface conforming to the USB standard specification, and may specifically be a Mini USB interface, a Micro USB interface, a USB Type-C interface, or the like. The USB interface 203 can be used to connect earphones and play audio through the earphones. The interface can also be used to connect other communication devices, such as AR devices.
It can be understood that the interface connection relationships between the modules illustrated in this embodiment of the present invention are only schematic and do not constitute a structural limitation on the electronic device 200. In other embodiments of the present application, the electronic device 200 may also adopt interface connection manners different from those in the above embodiments, or a combination of multiple interface connection manners.
The wireless communication function of the electronic device 200 can be implemented by antenna 1, antenna 2, the mobile communication module 204, the wireless communication module 205, the modem processor, the baseband processor, and the like.
Antenna 1 and antenna 2 are used to transmit and receive electromagnetic wave signals. Each antenna in the electronic device 200 can be used to cover a single communication band or multiple communication bands. Different antennas can also be multiplexed to improve antenna utilization; for example, antenna 1 can be multiplexed as a diversity antenna for a wireless local area network. In other embodiments, an antenna can be used in combination with a tuning switch.
The mobile communication module 204 can provide solutions for wireless communication applied to the electronic device 200, including 2G/3G/4G/5G. The mobile communication module 204 may include at least one filter, a switch, a power amplifier, a low noise amplifier (low noise amplifier, LNA), and the like. The mobile communication module 204 can receive electromagnetic waves through antenna 1, filter and amplify the received electromagnetic waves, and transmit them to the modem processor for demodulation. The mobile communication module 204 can also amplify signals modulated by the modem processor and convert them into electromagnetic waves for radiation through antenna 1. In some embodiments, at least some functional modules of the mobile communication module 204 may be disposed in the processor 201. In some embodiments, at least some functional modules of the mobile communication module 204 may be disposed in the same device as at least some modules of the processor 201.
The wireless communication module 205 can provide solutions for wireless communication applied to the electronic device 200, including wireless local area networks (wireless local area networks, WLAN) (such as wireless fidelity (wireless fidelity, Wi-Fi) networks), Bluetooth (bluetooth, BT), global navigation satellite system (global navigation satellite system, GNSS), frequency modulation (frequency modulation, FM), near field communication (near field communication, NFC), infrared (infrared, IR), and the like. The wireless communication module 205 may be one or more devices integrating at least one communication processing module. The wireless communication module 205 receives electromagnetic waves via antenna 2, frequency-modulates and filters the electromagnetic wave signals, and sends the processed signals to the processor 201. The wireless communication module 205 can also receive signals to be sent from the processor 201, frequency-modulate and amplify them, and convert them into electromagnetic waves for radiation through antenna 2.
In some embodiments, antenna 1 of the electronic device 200 is coupled to the mobile communication module 204 and antenna 2 is coupled to the wireless communication module 205, so that the electronic device 200 can communicate with networks and other devices through wireless communication technologies. The wireless communication technologies may include global system for mobile communications (global system for mobile communications, GSM), general packet radio service (general packet radio service, GPRS), code division multiple access (code division multiple access, CDMA), wideband code division multiple access (wideband code division multiple access, WCDMA), time-division code division multiple access (time-division code division multiple access, TD-SCDMA), long term evolution (long term evolution, LTE), BT, GNSS, WLAN, NFC, FM, and/or IR technologies. The GNSS may include the global positioning system (global positioning system, GPS), the global navigation satellite system (global navigation satellite system, GLONASS), the BeiDou navigation satellite system (beidou navigation satellite system, BDS), the quasi-zenith satellite system (quasi-zenith satellite system, QZSS), and/or satellite based augmentation systems (satellite based augmentation systems, SBAS).
The memory 202 can be used to store computer-executable program code, where the executable program code includes instructions. The internal memory 202 may include a program storage area and a data storage area. The program storage area can store an operating system and application programs required by at least one function (such as a sound playback function and an image playback function). The data storage area can store data created during the use of the electronic device 200 (such as audio data and a phone book). In addition, the internal memory 202 may include high-speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or universal flash storage (universal flash storage, UFS). The processor 201 executes various functional applications and data processing of the electronic device 200 by running the instructions stored in the internal memory 202 and/or the instructions stored in the memory disposed in the processor.
The electronic device 200 can implement audio functions, such as music playback and recording, through the audio module 206, the microphone 206A, the earphone interface 206B, the application processor, and the like.
The audio module 206 is used to convert digital audio information into an analog audio signal output, and also to convert an analog audio input into a digital audio signal. The audio module 206 can also be used to encode and decode audio signals. In some embodiments, the audio module 206 may be disposed in the processor 201, or some functional modules of the audio module 206 may be disposed in the processor 201.
The microphone 206A, also called a "mic" or "mouthpiece", is used to convert sound signals into electrical signals. When making a call or sending a voice message, the user can speak with the mouth close to the microphone 206A to input the sound signal into the microphone 206A. The electronic device 200 may be provided with at least one microphone 206A. In other embodiments, the electronic device 200 may be provided with two microphones 206A, which can implement a noise reduction function in addition to collecting sound signals. In other embodiments, the electronic device 200 may also be provided with three, four, or more microphones 206A to collect sound signals, reduce noise, identify sound sources, and implement directional recording functions.
The earphone interface 206B is used to connect wired earphones. The earphone interface 206B may be the USB interface 203, or a 3.5 mm open mobile terminal platform (open mobile terminal platform, OMTP) standard interface, or a cellular telecommunications industry association of the USA (cellular telecommunications industry association of the USA, CTIA) standard interface.
Of course, the electronic device 200 may also include a sensor module 207, a camera 208, a display 209, and the like, where the sensor module 207 may include a gyroscope sensor 207A and the like. The embodiments of the present application will not describe these in detail.
It should be noted that FIG. 1 and FIG. 2 are only exemplary architecture diagrams; in addition to the functional units shown in FIG. 1 and FIG. 2, the system may also include other functional units, which is not limited by the embodiments of the present application. In addition, the names of the devices in FIG. 1 and FIG. 2 are not limited; besides the names shown in FIG. 1 and FIG. 2, each device may also be given other names, for example replaced with the name of a network element with the same or similar function, without limitation.
In a specific implementation, the above server may include a processor and a memory. Further, the server may also include a communication line and a communication interface, where the processor, the memory, and the communication interface may be connected through the communication line.
The processor may be a central processing unit (central processing unit, CPU), a general-purpose processor, a network processor (network processor, NP), a digital signal processor (digital signal processing, DSP), a microprocessor, a microcontroller, a programmable logic device (programmable logic device, PLD), or any combination thereof. The processor may also be another apparatus with processing functions, such as a circuit, a device, or a software module, without limitation.
The communication line is used to transmit information between the components included in the server.
The communication interface is used to communicate with other devices or other communication networks. The other communication network may be an Ethernet, a radio access network (radio access network, RAN), a wireless local area network (wireless local area networks, WLAN), or the like. The communication interface may be a module, a circuit, a transceiver, or any apparatus capable of implementing communication.
The memory is used to store instructions, where the instructions may be a computer program.
The memory may be a read-only memory (read-only memory, ROM) or another type of static storage device that can store static information and/or instructions, a random access memory (random access memory, RAM) or another type of dynamic storage device that can store information and/or instructions, or an electrically erasable programmable read-only memory (electrically erasable programmable read-only memory, EEPROM), a compact disc read-only memory (compact disc read-only memory, CD-ROM) or other optical disc storage, optical disc storage (including compressed optical discs, laser discs, optical discs, digital versatile discs, and Blu-ray discs), magnetic disk storage media, or other magnetic storage devices, without limitation.
It should be noted that the memory may exist independently of the processor or may be integrated with the processor. The memory may be used to store instructions, program code, some data, or the like. The memory may be located inside or outside the server, without limitation.
The processor is configured to execute the instructions stored in the memory to implement the real-time communication methods provided by the following embodiments of the present application. For example, when the electronic device is a session management network element or a chip or system-on-chip in a session management network element, the processor executes the instructions stored in the memory to implement the steps performed by the session management network element in the following embodiments of the present application. For another example, when the electronic device is a mobility management network element or a chip or system-on-chip in a mobility management network element, the processor may execute the instructions stored in the memory to implement the steps performed by the mobility management network element in the following embodiments of the present application.
In an example, the processor may include one or more CPUs.
As an optional implementation, the server includes multiple processors.
As an optional implementation, the server further includes an output device and an input device. Exemplarily, the input device is a device such as a keyboard, a mouse, a microphone, or a joystick, and the output device is a device such as a display or a speaker.
Of course, the server may also include other functional units, which is not limited by the embodiments of the present application.
The communication method provided by the embodiments of the present application is described below, taking the architecture shown in FIG. 1 as an example. Each network element in the following embodiments may have the components shown in FIG. 2, which are not repeated. It should be noted that the names of the messages exchanged between the devices or the names of the parameters in the messages in the embodiments of the present application are only examples; other names may also be used in specific implementations. "Generate" (generate) in the embodiments of the present application may also be understood as "create" (create) or "determine", and "include" may also be understood as "carry". This is uniformly stated here, and the embodiments of the present application do not specifically limit these.
FIG. 3 is a schematic flowchart of a real-time communication method according to an embodiment of the present application. As shown in FIG. 3, the method may include:
S300. The second communication device acquires the location of the first communication device and receives the first audio from the first communication device.
The second communication device may acquire the location of the first communication device in at least one of the following ways:
Way 1: the second communication device may receive a first message from the first communication device, the first message including the location of the first communication device.
Specifically, the following manners may be used:
In the first manner, when the first communication device establishes the audio-video call connection with the second communication device, the second communication device obtains the location of the first communication device carried in an audio-video call request, where the audio-video call request is sent by the first communication device when establishing the audio-video call connection with the second communication device.
In this embodiment of the present application, the location of the first communication device is carried in the audio-video call request when the audio-video call is established, so that the second communication device obtains the location of the first communication device concisely and efficiently.
In the second manner, during the audio-video call between the first communication device and the second communication device, the second communication device obtains the location of the first communication device encapsulated in a media packet sent by the first communication device. The second communication device receives the media packet sent by the first communication device and parses out the location of the first communication device.
Specifically, during the audio-video call between the first communication device and the second communication device, and when the first communication device and the second communication device are in a muted state, the first communication device sends a media packet; the second communication device receives the media packet sent by the first communication device and parses out the location of the first communication device encapsulated in the media packet.
In this embodiment of the present application, the location of the first communication device is encapsulated in the media packet sent by the first communication device, so that the second communication device can directly obtain the location of the first communication device, which is more concise and efficient.
Way 2: the second communication device configures a first virtual location for the first communication device. This should be understood as the second communication device randomly assigning a virtual location to the first communication device.
The virtual location can be understood as a position set relative to a reference object taken as the origin, such as in front of, to the left of, to the right of, or behind the reference object.
Specifically, the virtual location is set through coordinates in a coordinate system. The coordinate system may be a two-dimensional coordinate system, such as a rectangular coordinate system; of course, it may also be a three-dimensional coordinate system, such as a three-dimensional Cartesian coordinate system.
Exemplarily, taking a rectangular coordinate system as an example, when the second communication device conducts a one-to-one audio-video call with the first communication device, the second communication device may take itself as the reference object, with its own position as the coordinate origin, and designate any coordinate on the positive x half-axis (i.e., to the left of the second communication device) as the first virtual location of the first communication device; or any coordinate on the negative x half-axis (i.e., to the right of the second communication device) as the first virtual location; or any coordinate on the positive y half-axis (i.e., in front of the second communication device) as the first virtual location; or any coordinate on the negative y half-axis (i.e., behind the second communication device) as the first virtual location.
In a specific implementation, FIG. 4 is a schematic diagram of an application scenario of a communication method according to an embodiment of the present application. The second communication device designates a position in front of the second communication device as the first virtual location of the first communication device (virtual location A shown in FIG. 4).
Exemplarily, taking a rectangular coordinate system as an example, when the second communication device conducts a one-to-one audio-video call with the first communication device, the second communication device may take the center position between the first communication device and the second communication device (i.e., the conference center) as the reference object, with the center position as the coordinate origin; the second communication device may also arbitrarily designate any two positions around the conference center as the virtual locations of the first communication device and the second communication device.
In a specific implementation, FIG. 5 is a schematic diagram of an application scenario of yet another communication method according to an embodiment of the present application. As shown in FIG. 5, the second communication device may take virtual location A as its own position and designate virtual location B as the first virtual location of the first communication device; or the second communication device designates virtual location C as the first virtual location of the first communication device and takes virtual location D as its own position; or the second communication device designates virtual location A as the first virtual location of the first communication device and takes virtual location E as its own position. Of course, other combinations are possible and are not listed one by one here.
It should be noted here that the second communication device may also set a virtual target direction (for example, along the positive x half-axis in a rectangular coordinate system) and take any position in the target direction as the first virtual location of the first communication device.
In this embodiment of the present application, the second communication device directly allocates the first virtual location to the first communication device without requiring the first communication device to send its location, so that the second communication device obtains the location of the first communication device more concisely and efficiently.
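A minimal sketch of the virtual-location assignment described above; the axis convention follows this example (+x = left, +y = front, origin at the second communication device), and the default distance and unit are assumptions for illustration:

```python
import random

# Axis convention from this example: +x = left, -x = right,
# +y = front, -y = back, with the second communication device at the origin.
DIRECTIONS = {
    "left":  lambda d: ( d, 0.0),
    "right": lambda d: (-d, 0.0),
    "front": lambda d: (0.0,  d),
    "back":  lambda d: (0.0, -d),
}

def assign_virtual_location(direction=None, distance=100.0):
    """Randomly pick (or honor) a direction and return a virtual
    coordinate for the first communication device, e.g. in cm."""
    if direction is None:
        direction = random.choice(list(DIRECTIONS))
    return DIRECTIONS[direction](distance)

print(assign_virtual_location("front"))  # (0.0, 100.0)
```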
Way 3: the second communication device detects a location keyword in the audio data of the first audio of the first communication device, where the location keyword is used to characterize the location of the first communication device. The second communication device determines the location of the first communication device according to the location keyword.
Exemplarily, the second communication device detects that the audio data of the first audio of the first communication device contains "XXAXX", where A is a place name. The second communication device determines the position of A according to A and takes the position of A as the location of the first communication device.
Exemplarily, the second communication device detects that the audio data of the first audio of the first communication device contains "I am in A" or "Are you in A?", where A is a place name. The second communication device determines the position of A according to A and takes the position of A as the location of the first communication device.
Exemplarily, the second communication device detects that "A" appears more often than "B" in the audio data of the first audio of the first communication device, where both A and B are place names. The second communication device selects A, which appears most often, determines the position of A according to A, and takes the position of A as the location of the first communication device.
Of course, other cases are possible. For example, the second communication device detects that the audio data of the first audio of the first communication device contains "A", "B", and "C", where A, B, and C are all place names, B is a subordinate place name of A, C is a subordinate place name of B, and A appears most often. The second communication device determines the position of C according to C and takes the position of C as the location of the first communication device. The specific rule needs to be set according to the actual situation, and the embodiments of the present application do not enumerate all cases.
In this embodiment of the present application, the second communication device detects the location keywords contained in the audio data of the first audio of the first communication device and determines the location of the first communication device by analyzing the location keywords, without relying on the first communication device to send its location.
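A simplified sketch of this keyword rule, assuming a transcript of the first audio and a place-name gazetteer with parent/child relations are available (both are assumptions; the patent does not name a speech-recognition or geocoding component). It prefers the most specific place mentioned and falls back to frequency, matching the A/B/C example above:

```python
from collections import Counter

# Hypothetical gazetteer: place name -> parent place name (None = top level)
GAZETTEER = {"A": None, "B": "A", "C": "B"}

def infer_location(transcript: str):
    """Pick the most specific (deepest) place name among those mentioned;
    break ties by mention frequency."""
    mentions = Counter(w for w in transcript.split() if w in GAZETTEER)
    if not mentions:
        return None
    def depth(name):
        d = 0
        while GAZETTEER[name] is not None:
            name, d = GAZETTEER[name], d + 1
        return d
    return max(mentions, key=lambda n: (depth(n), mentions[n]))

print(infer_location("A A A B C"))  # 'C' - the most specific place wins
```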
Receiving the first audio from the first communication device should be understood as directly receiving the first audio from the first communication device, for example, the second communication device directly receives the first audio sent by the first communication device; or indirectly receiving the first audio from the first communication device, for example, the first communication device sends the first audio to another device, and the second communication device receives the first audio of the first communication device from that other device. The present application does not specifically limit this.
S301. During the audio-video call between the first communication device and the second communication device, the second communication device generates a third audio according to the location of the first communication device and the first audio from the first communication device.
The audio-video call may include one or more of a video call, a voice call, a voice conference, and a video conference.
The third audio contains relative location information of the first communication device and the second communication device.
The location of the first communication device may be expressed in latitude and longitude; of course, it may also be expressed in geographic coordinates. Likewise, the location of the second communication device may be expressed in latitude and longitude or in geographic coordinates. Therefore, the relative location information of the first communication device and the second communication device may be a relative latitude/longitude or relative geographic coordinates.
S301 may be specifically implemented as: the second communication device determines the relative position of the first communication device and the second communication device according to the location of the first communication device and the location of the second communication device, and generates the third audio according to this relative position and the audio data of the first audio of the first communication device.
The third audio contains the relative location information of the first communication device and the second communication device. The third audio may also be a spatial audio object in a standard such as "object-based immersive audio metadata and bitstream". In the array fields corresponding to the spatial audio object, the position field corresponds to the relative location information of the first communication device and the second communication device, and the content field corresponds to the speech content information of the first audio of the first communication device.
Exemplarily, suppose the position coordinates of second communication device B are (Xb, Yb) and the position coordinates of first communication device A are (Xa, Ya). The x-axis and y-axis coordinates of first communication device A relative to second communication device B are:

X = Xa − Xb
Y = Ya − Yb

That is, the position coordinates of first communication device A relative to second communication device B are A′(X, Y).
It should be noted here that the relative position of the first communication device and the second communication device can be expressed in degrees of latitude and longitude without conversion to geographic coordinates. Since each degree of latitude or longitude corresponds to a long distance (roughly 111 km per degree), when applied to the spatial audio object in the embodiments of the present application, the unit of the relative position of the first communication device and the second communication device may be centimeters or decimeters.
Since the array fields corresponding to an existing spatial audio object include a position field and a content field, the embodiments of the present application reuse the structure of the existing spatial audio object. Specifically, the second communication device determines the position parameter of the third audio according to the location of the first communication device and the location of the second communication device, and determines the content parameter of the third audio according to the audio data of the first audio from the first communication device. That is, the second communication device replaces the position information corresponding to the position field of the third audio with the relative position of the first communication device and the second communication device, and replaces the content information corresponding to the content field of the third audio with the speech content information of the first audio of the first communication device, thereby obtaining the third audio of the first communication device.
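A minimal sketch of the relative-position step described above; the coordinate values and the degree-to-centimeter conversion factor are illustrative assumptions based on the "about 111 km per degree" figure:

```python
def relative_position(pos_a, pos_b):
    """Position of device A relative to device B: A' = A - B, component-wise."""
    xa, ya = pos_a
    xb, yb = pos_b
    return (xa - xb, ya - yb)

# If positions are given in latitude/longitude degrees, scale them to a unit
# suited to spatial audio objects (one degree is roughly 111 km = 1.11e7 cm).
CM_PER_DEGREE = 1.11e7

a_rel = relative_position((116.40, 39.91), (116.39, 39.90))
a_rel_cm = tuple(round(d * CM_PER_DEGREE) for d in a_rel)
print(a_rel, a_rel_cm)
```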
S302. The second communication device generates the second audio according to the third audio and the parameter of the second communication device.
The second audio is used for playback by the second communication device. In other words, the second audio is played by the second communication device.
Alternatively, the real-time communication method provided by the embodiments of the present application further includes: the second communication device sends the second audio to a third communication device, where the second audio is further used to instruct the third communication device to play the second audio. That is, the third communication device may be an external playback device of the second communication device; after receiving the second audio, the third communication device knows that it should play the second audio.
The second communication device has at least one playback device, which can be understood as follows: the playback device may be part of the second communication device, i.e., the playback device and the second communication device are the same device; or the playback device and the second communication device may be mutually independent components. The playback device may be a head-mounted playback device, which may include earphones (including wired earphones, wireless earphones, etc.), a virtual reality (virtual reality, VR) device, or an augmented reality (augmented reality, AR) device.
Step S302 may be specifically implemented as: acquiring, according to the relative location information contained in the third audio, the head-related transfer function on the playback device corresponding to the relative location information; and processing the first audio with the head-related transfer function to obtain the second audio.
Exemplarily, if the head-mounted playback device is earphones having a left ear and a right ear, the second communication device generates the second audio according to the third audio and the parameter of the head-mounted playback device of the second communication device, which may be specifically implemented as:
S3021. The second communication device acquires, according to the position parameter of the third audio, the left-ear HRTF and the right-ear HRTF corresponding to the position parameter.
The HRTF, or head-related transfer function (head related transfer function, HRTF), describes the scattering of sound waves by the human head, pinna, and other body parts, and the resulting interaural time difference (interaural time difference, ITD) and interaural level difference (interaural level difference, ILD); it reflects the transmission of sound waves from a sound source to the two ears. In practice, the human auditory system uses the ITD together with accumulated listening experience to locate a sound source accurately.
In other words, the left-ear HRTF and the right-ear HRTF essentially contain spatial direction information: the HRTFs for different spatial directions are completely different. The HRTF thus actually contains spatial information and characterizes the transfer functions from sound sources at different spatial positions to the two ears.
S3022. The second communication device processes the third audio with the left-ear HRTF and the right-ear HRTF, respectively, to obtain the left-ear audio and right-ear audio of the earphones.
This should be understood as follows: the second communication device spatially locates the posture of the user's head or the earphones through the sensors of the earphones, sets this spatial position as the coordinate origin, and sets the sound source the user should hear as the target. Since an open HRTF library does not contain an HRTF for every position, the second communication device interpolates among the HRTFs of known directions around the target to obtain the HRTF of the target direction. The second communication device then convolves the third audio with the HRTF of the target direction in the time domain (equivalently, multiplies in the frequency domain and applies an inverse transform back to the time domain) to obtain the left-ear audio and right-ear audio of the earphones; when the earphones play these signals, the user perceives spatial audio.
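The following sketch illustrates the convolution step in S3022, assuming the left/right HRIRs (the time-domain form of the HRTFs) for the target direction have already been obtained, e.g. by interpolating an open HRTF library; the signal contents here are random placeholders, not measured data:

```python
import numpy as np
from scipy.signal import fftconvolve

def spatialize(mono_audio, hrir_left, hrir_right):
    """Convolve mono speech with the left/right HRIRs of the target
    direction to produce binaural (left-ear / right-ear) audio."""
    left = fftconvolve(mono_audio, hrir_left, mode="full")
    right = fftconvolve(mono_audio, hrir_right, mode="full")
    return left, right

# Placeholder signals: 1 s of "speech" at 16 kHz and 256-tap HRIRs
rng = np.random.default_rng(0)
speech = rng.standard_normal(16000)
hrir_l = rng.standard_normal(256) * 0.01
hrir_r = rng.standard_normal(256) * 0.01

left_ear, right_ear = spatialize(speech, hrir_l, hrir_r)
```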
The above embodiments have been described for an audio-video call between one first communication device and one second communication device. The following describes an audio-video call between one second communication device and multiple first communication devices, and an audio-video call between multiple first communication devices and multiple second communication devices, as follows:
During an audio-video call between multiple first communication devices and one second communication device, the second communication device generates the third audio of each first communication device according to the first audio of each first communication device, the location of each first communication device, and the location of the second communication device. The second communication device then generates the second audio according to the third audio of each first communication device and the parameter of the second communication device.
Taking an audio-video call among first communication device A1, first communication device A2, and second communication device B as an example:
Exemplarily, second communication device B acquires location A1 of first communication device A1 and location A2 of first communication device A2. During the audio-video call among first communication device A1, first communication device A2, and second communication device B, second communication device B generates the third audio of first communication device A1 according to location A1 of first communication device A1, location B of second communication device B, and the first audio from first communication device A1. Meanwhile, second communication device B generates the third audio of first communication device A2 according to location A2 of first communication device A2, location B of second communication device B, and the first audio from first communication device A2. Second communication device B generates the second audio of first communication device A1 according to the third audio of first communication device A1 and the parameter of second communication device B, and generates the second audio of first communication device A2 according to the third audio of first communication device A2 and the parameter of second communication device B. Second communication device B plays the second audio of first communication device A1 and the second audio of first communication device A2, so that user B of second communication device B can hear the position-directed voice of user A1 of first communication device A1 and the position-directed voice of user A2 of first communication device A2, providing user B with an immersive experience and improving the audio-video call experience.
The specific implementation by which second communication device B acquires location A1 of first communication device A1 and location A2 of first communication device A2 may use the related content in the above embodiments, which is not repeated here.
During an audio-video call between multiple first communication devices and multiple second communication devices, a target second communication device among the multiple second communication devices generates the third audio of each first communication device according to the first audio of each first communication device, the location of each first communication device, and the location of the target second communication device. The target second communication device generates the second audio according to the third audio of each first communication device and the parameter of the target second communication device.
Taking an audio-video call among first communication device A1, first communication device A2, second communication device B1, and second communication device B2 as an example:
Second communication device B1 acquires the location of first communication device A1, the location of first communication device A2, and the location of second communication device B2. During the audio-video call among first communication device A1, first communication device A2, second communication device B1, and second communication device B2, second communication device B1 generates the second audio of first communication device A1 according to the location of first communication device A1, the location of second communication device B1, the first audio from first communication device A1, and the parameter of second communication device B1. Second communication device B1 generates the second audio of first communication device A2 according to the location of first communication device A2, the location of second communication device B1, the first audio from first communication device A2, and the parameter of second communication device B1. Second communication device B1 generates the second audio of second communication device B2 according to the location of second communication device B1, the location of second communication device B2, the first audio of second communication device B2, and the parameter of second communication device B1. Second communication device B1 plays the second audio of first communication device A1, the second audio of first communication device A2, and the second audio of second communication device B2, so that the user of second communication device B1 can hear the position-directed voices of the other parties, providing this user with an immersive experience and improving the audio-video call experience.
Of course, the operations of first communication device A1, first communication device A2, and second communication device B2 are similar to those of second communication device B1 and are not repeated here.
It should be noted here that, in the above embodiments of an audio-video call between one second communication device and multiple first communication devices, and between multiple first communication devices and multiple second communication devices, before the second communication device generates the third audio of each first communication device according to the first audio of each first communication device and the location of each first communication device, the real-time communication method provided by the embodiments of the present application includes: the second communication device acquires the first audio of each first communication device.
In one implementable manner, the second communication device receives a mixed audio containing multiple audio streams. The second communication device samples the mixed audio and extracts the speech features of the sampled mixed audio. The second communication device inputs the mixed audio into a neural network model; the core of the attention mechanism (also called the attention mechanism) in the neural network model is to select, from a large amount of information, the information more useful for the current task. The second communication device uses the k-means clustering algorithm (k-means clustering algorithm) to cluster the speech features processed by the attention mechanism, thereby obtaining the separated audio streams.
Exemplarily, suppose the mixed audio stream received by the second communication device is x(n), containing two audio streams s1(n) and s2(n). First, the mixed audio stream x(n) is downsampled to 8 kHz and a short-time Fourier transform is applied; in experiments, a 32 ms Hamming window with an 8 ms hop is used, and the mixed stream is segmented into 100-frame chunks to preserve the local consistency of the speech signal. Second, a bidirectional long short-term memory network (Long Short-Term Memory, LSTM) extracts the speech features of the sampled mixed stream; the LSTM addresses the long-range dependency problem of recurrent neural networks (recurrent neural network, RNN). Then, initially let Q = K = V = I, where I = [i1, i2, …, in] is the input vector and n is the vector dimension. The dot product of Q and K is computed and divided by the dimension of K, and the result is passed through the normalized exponential function (softmax) to obtain the weight a of each feature vector:

a = softmax(Q · K^T / d_K)

After the attention mechanism, the resulting vector is:

O_n = [a1·i1, a2·i2, …, an·in]

Finally: ① determine the value K, i.e., the number of clusters, which may be set to 2 in this embodiment of the present application; ② randomly select K data points from the data set as initial centroids; ③ for each point in the data set, compute its Euclidean distance d_k to each of the K centroids and assign it to the set of the nearest centroid; ④ for each of the K sets, recompute the centroid of the set; ⑤ if the new centroids no longer change, clustering ends and the resulting K sets are the final partition; otherwise return to ③. Through the above algorithm, the separated first audio streams s1(n) and s2(n) are obtained.
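A compact sketch of this separation pipeline under the stated assumptions: synthetic features stand in for the BiLSTM output, the attention step is interpreted as re-weighting the feature vectors with softmax(Q·K^T/d_k) where Q = K = the features, and n_sources=2 mirrors the K value above. Array shapes and the random data are placeholders:

```python
import numpy as np
from sklearn.cluster import KMeans

def attention_weights(features):
    """Self-attention with Q = K = features: row-wise softmax(Q K^T / d_k)."""
    d_k = features.shape[1]
    scores = features @ features.T / d_k
    scores -= scores.max(axis=1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights

def separate(features, n_sources=2):
    """Weight the feature vectors by attention, then cluster them with
    k-means; each cluster corresponds to one separated audio stream."""
    weighted = attention_weights(features) @ features
    labels = KMeans(n_clusters=n_sources, n_init=10, random_state=0).fit_predict(weighted)
    return labels

# Placeholder time-frequency features: 100 frames x 40 dims per chunk
feats = np.random.default_rng(1).standard_normal((100, 40))
print(separate(feats))  # cluster index (0 or 1) per frame
```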
In some embodiments, if the second communication device includes at least one playback device and the playback device is externally connected to the second communication device, S302 may be specifically implemented as: when the second communication device detects that the second communication device is connected to the head-mounted playback device, the second communication device generates the second audio according to the third audio and the parameter of the second communication device.
In this embodiment of the present application, the operation of generating the second audio is performed only when the second communication device detects that it is connected to the head-mounted playback device, which effectively saves energy.
In the embodiments described above, the steps are performed by the same execution body, such as the second communication device. Of course, the execution body may also be a playback device, a server, or the like, which are not listed one by one here. The following describes specific implementations of the real-time communication method provided by the embodiments of the present application for multiple execution bodies, for example, the first communication device and the second communication device; or the first communication device, the second communication device, and the server:
Manner 1: taking the first communication device and the second communication device as the execution bodies as an example:
FIG. 6 is a schematic flowchart of another real-time communication method according to an embodiment of the present application. As shown in FIG. 6, the first communication device conducts an audio-video call with the second communication device, and the method may include:
S600. The first communication device sends a first audio.
S601. The second communication device receives the first audio from the first communication device and acquires the location of the first communication device.
S602. The second communication device generates a third audio according to the first audio from the first communication device and the location of the first communication device.
If the second communication device has at least one playback device, the playback device may be part of the second communication device; or the playback device and the second communication device may be mutually independent components; or the playback device and the second communication device are the same device.
The playback device may be a head-mounted playback device, which may include earphones (including wired earphones, wireless earphones, etc.), a virtual reality (virtual reality, VR) device, or an augmented reality (augmented reality, AR) device.
S603. The second communication device sends the third audio to the playback device.
S604. The playback device receives the third audio and generates a second audio according to the third audio and a parameter of the playback device.
If the playback device is earphones, S604 includes S6041 and S6042, which may be specifically implemented as:
S6041. The earphones acquire, according to the position parameter of the third audio, the left-ear HRTF and the right-ear HRTF corresponding to the position parameter.
S6042. The earphones process the third audio with the left-ear HRTF and the right-ear HRTF, respectively, to obtain the left-ear audio and right-ear audio of the earphones.
S601 to S602 are similar to S300 to S301 above; refer to the related descriptions in S300 to S301. S604 and its sub-steps S6041 and S6042 are similar to S302 and its sub-steps S3021 and S3022; refer to the related descriptions in S302, S3021, and S3022.
Manner 2: taking the first communication device, the second communication device, and the server as the execution bodies as an example:
FIG. 7 is a schematic flowchart of yet another real-time communication method according to an embodiment of the present application. As shown in FIG. 7, the first communication device conducts an audio-video call with the second communication device, and the method may include:
S701. The first communication device sends the first audio of the first communication device to the server.
The first audio of the first communication device should be understood as the audio sent by the first communication device.
S702. The server receives the first audio from the first communication device, and the server acquires the location of the first communication device and the location of the second communication device.
S702 may be specifically implemented in at least one of the following ways:
Way 1: the server may receive the location of the first communication device sent by the first communication device and the location of the second communication device sent by the second communication device.
Way 1 can be subdivided into the following two cases:
In the first case, when the first communication device establishes the audio-video call connection with the second communication device, the server may acquire the location of the first communication device and the location of the second communication device. Specifically, the server obtains the location of the first communication device carried in the audio-video call request sent by the first communication device and the location of the second communication device carried in the audio-video call request sent by the second communication device.
In this embodiment of the present application, the location of the first communication device and the location of the second communication device are carried in the audio-video call requests when the audio-video call is established, so that the server obtains the locations of the first communication device and the second communication device concisely and efficiently.
In the second case, during the audio-video call between the first communication device and the second communication device, and when the first communication device and the second communication device are in a muted state, the server obtains the location of the first communication device encapsulated in the media packet sent by the first communication device and the location of the second communication device encapsulated in the media packet sent by the second communication device. The server receives the media packet sent by the first communication device and parses out the location of the first communication device, and receives the media packet sent by the second communication device and parses out the location of the second communication device.
In this embodiment of the present application, the location of the first/second communication device is encapsulated in the media packet sent by the first/second communication device during the audio-video call in the muted state, so that the server obtains the locations of the first communication device and the second communication device more concisely and efficiently.
Way 2: the server configures a first virtual location for the first communication device and a second virtual location for the second communication device. This should be understood as the server randomly assigning virtual locations to the first communication device and the second communication device.
The virtual location can be understood as a position set relative to a reference object taken as the origin, such as in front of, to the left of, to the right of, or behind the reference object.
Specifically, the virtual location is set through coordinates in a coordinate system. The coordinate system may be a two-dimensional coordinate system, such as a rectangular coordinate system; of course, it may also be a three-dimensional coordinate system, such as a three-dimensional Cartesian coordinate system.
Exemplarily, taking a rectangular coordinate system as an example, the server may take the second communication device as the reference object, with the position of the second communication device as the coordinate origin, and designate any coordinate in the rectangular coordinate system as the first virtual location of the first communication device. In a specific implementation, the server designates the position faced by the user of the second communication device as the first virtual location of the first communication device (virtual location A shown in FIG. 4).
It should be noted here that the server may also set a virtual target direction and take any position in the target direction as the first virtual location of the first communication device.
Of course, taking a rectangular coordinate system as an example, the server may take the center position between the first communication device and the second communication device (i.e., the conference center) as the reference object, with the center position as the coordinate origin. The server may also arbitrarily designate positions around the conference center as the first virtual location of the first communication device and the second virtual location of the second communication device. For example, FIG. 5 is a schematic diagram of an application scenario of yet another communication method according to an embodiment of the present application. As shown in FIG. 5, the server may designate virtual location A as the second virtual location of the second communication device and virtual location B as the first virtual location of the first communication device; or the server designates virtual location C as the first virtual location of the first communication device and virtual location D as the second virtual location of the second communication device; or the server designates virtual location A as the first virtual location of the first communication device and virtual location E as the second virtual location of the second communication device. Of course, other combinations are possible and are not listed one by one here.
In this embodiment of the present application, the server directly allocates virtual locations to the first communication device and the second communication device without requiring them to send their own locations, so that the server performs sound-source localization using the locations of the first communication device and the second communication device more concisely and efficiently.
Way 3: the server detects a location keyword in the audio data of the first audio of the first communication device, where the location keyword is used to characterize the location of the first communication device, and determines the location of the first communication device according to the location keyword. Similarly, the server detects a second location keyword in the audio data of the first audio of the second communication device, where the second location keyword is used to characterize the location of the second communication device, and determines the location of the second communication device according to the second location keyword.
Exemplarily, the server detects that the audio data of the first audio of the first communication device contains "XXAXX", where A is a place name. The server determines the position of A according to A and takes the position of A as the location of the first communication device. Similarly, the server determines the location of the second communication device.
Exemplarily, the server detects that the audio data of the first audio of the first communication device contains "I am in A" or "Are you in A?", where A is a place name. The server determines the position of A according to A and takes the position of A as the location of the first communication device. Similarly, the server determines the location of the second communication device.
Exemplarily, the server detects that "A" appears more often than "B" in the audio data of the first audio of the first communication device, where both A and B are place names. The server selects A, which appears most often, determines the position of A according to A, and takes the position of A as the location of the first communication device. Similarly, the server determines the location of the second communication device.
Of course, other cases are possible. For example, the server detects that the audio data of the first audio of the first communication device contains "A", "B", and "C", where A, B, and C are all place names, B is a subordinate place name of A, C is a subordinate place name of B, and A appears most often. The server determines the position of C according to C and takes the position of C as the location of the first communication device. Similarly, the server determines the location of the second communication device. The specific rule needs to be set according to the actual situation, and the embodiments of the present application do not enumerate all cases.
In this embodiment of the present application, the server detects the location keywords contained in the audio data of the first audio of the first communication device and determines the location of the first communication device and the location of the second communication device by analyzing the location keywords, without relying on the locations sent by the first communication device and the second communication device.
S703. The server generates a third audio according to the first audio from the first communication device and the location of the first communication device.
S704. The server sends the third audio to the second communication device.
S705. The second communication device receives the third audio and generates a second audio according to the third audio and the parameter of the second communication device.
If the second communication device has at least one playback device, the playback device may be part of the second communication device; or the playback device and the second communication device may be mutually independent components; or the playback device and the second communication device are the same device.
The playback device may be a head-mounted playback device, which may include earphones (including wired earphones, wireless earphones, etc.), a virtual reality (virtual reality, VR) device, or an augmented reality (augmented reality, AR) device.
S705 includes S7051 and S7052, which may be specifically implemented as:
S7051. The second communication device sends the third audio to the playback device.
S7052. The playback device receives the third audio and generates the second audio according to the third audio and a parameter of the playback device.
S7052 includes S70521 and S70522. If the playback device is earphones, S70521 and S70522 may be specifically implemented as:
S70521. The earphones acquire, according to the position parameter of the third audio, the left-ear HRTF and the right-ear HRTF corresponding to the position parameter.
S70522. The earphones process the third audio with the left-ear HRTF and the right-ear HRTF, respectively, to obtain the left-ear audio and right-ear audio of the earphones.
S702 to S703 are similar to S300 to S301 above; refer to the related descriptions in S300 to S301. S705, S7051, and S7052, together with S70521 and S70522, are similar to S302 and its sub-steps S3021 and S3022; refer to the related descriptions in S302, S3021, and S3022.
Specifically, the communication system described in this possible design is configured to perform the functions of each device in the real-time communication method shown in FIG. 3, and therefore can achieve the same effects as the above real-time communication method.
FIG. 8 shows a communication apparatus according to an embodiment of the present application. The communication apparatus 800 may include: an acquiring and receiving unit 810, configured to acquire the location of the first communication device and receive the first audio from the first communication device during an audio-video call between the first communication device and the second communication device; and a generating unit 820, configured to generate a second audio according to the location of the first communication device, the first audio, and the parameter of the second communication device, where the second audio is audio with positional directivity.
In some embodiments, the acquiring and receiving unit 810 is further configured to: receive a first message from the first communication device, the first message including the location of the first communication device; or configure a first virtual location for the first communication device; or detect a location keyword in the audio data of the first audio, the location keyword being used to characterize the location of the first communication device.
Optionally, the acquiring and receiving unit 810 may include a receiving subunit 811, a configuring subunit 812, and a detecting subunit 813, where the receiving subunit 811 is configured to receive a first message from the first communication device, the first message including the location of the first communication device; or the configuring subunit 812 is configured to configure a first virtual location for the first communication device; or the detecting subunit 813 is configured to detect a location keyword in the audio data of the first audio, the location keyword being used to characterize the location of the first communication device.
In some embodiments, the generating unit 820 is further configured to: generate a third audio according to the location of the first communication device and the first audio, the third audio containing the relative location information of the first communication device and the second communication device; and generate the second audio according to the third audio and the parameter of the second communication device.
Optionally, the generating unit 820 may include a first generating subunit 821 and a second generating subunit 822, where the first generating subunit 821 is configured to generate a third audio according to the location of the first communication device and the first audio, the third audio containing the relative location information of the first communication device and the second communication device; and the second generating subunit 822 is configured to generate the second audio according to the third audio and the parameter of the second communication device.
In some embodiments, the second communication device includes at least one playback device; the playback device includes earphones, a virtual reality (VR) device, or an augmented reality (AR) device.
In some embodiments, the playback device is externally connected to the second communication device; the generating unit 820 is further configured to generate the second audio according to the location of the first communication device, the first audio, and the parameter of the second communication device when the second communication device is connected to the playback device.
Optionally, the playback device is externally connected to the second communication device; the generating unit 820 may further include a third generating subunit 823, where the third generating subunit 823 is configured to generate the second audio according to the location of the first communication device, the first audio, and the parameter of the second communication device when the second communication device is connected to the playback device.
In some embodiments, the generating unit 820 is further configured to acquire, according to the relative location information contained in the third audio, the head-related transfer function on the playback device corresponding to the relative location information, and to process the first audio with the head-related transfer function to obtain the second audio.
Optionally, the second generating subunit 822 may further include a first acquiring subunit 8221 and a first processing subunit 8222, where the first acquiring subunit 8221 is configured to acquire, according to the relative location information contained in the third audio, the head-related transfer function on the playback device corresponding to the relative location information, and the first processing subunit 8222 is configured to process the first audio with the head-related transfer function to obtain the second audio.
In some embodiments, the playback device is earphones having a left ear and a right ear; the generating unit 820 is further configured to: acquire, according to the relative location information contained in the third audio, the left-ear head-related transfer function and the right-ear head-related transfer function corresponding to the relative location information; and process the first audio with the left-ear head-related transfer function and the right-ear head-related transfer function, respectively, to obtain the left-ear audio and right-ear audio of the earphones.
Optionally, the second generating subunit 822 may further include a second acquiring subunit 8223 and a second processing subunit 8224, where the second acquiring subunit 8223 is configured to acquire, according to the relative location information contained in the third audio, the left-ear head-related transfer function and the right-ear head-related transfer function corresponding to the relative location information, and the second processing subunit 8224 is configured to process the first audio with the left-ear head-related transfer function and the right-ear head-related transfer function, respectively, to obtain the left-ear audio and right-ear audio of the earphones.
In some embodiments, the second audio is used for playback by the second communication device.
In some embodiments, the communication apparatus 800 may include a sending unit 830, configured to send the second audio to a third communication device, the second audio being used to instruct the third communication device to play the second audio.
In some embodiments, the audio-video call includes one or more of a video call, a voice call, a voice conference, and a video conference.
Optionally, the above communication apparatus 800 may be implemented by code or by circuits; specifically, the communication apparatus may be a complete terminal device. Exemplarily, the acquiring and receiving unit 810 may be a receiving circuit, or may be implemented by an antenna (antenna 1 shown in FIG. 2) and a mobile communication module (the mobile communication module shown in FIG. 2), or by an antenna (antenna 2 shown in FIG. 2) and a wireless communication module (the wireless communication module shown in FIG. 2). The generating unit 820 may be a processor (the processor 201 shown in FIG. 2). The sending unit 830 may be a sending circuit, or may be implemented by an antenna (antenna 1 shown in FIG. 2) and a mobile communication module (the mobile communication module shown in FIG. 2), or by an antenna (antenna 2 shown in FIG. 2) and a wireless communication module (the wireless communication module shown in FIG. 2).
Optionally, in this possible design, all the related content of the steps of the electronic device involved in the method embodiments shown in FIG. 1a to FIG. 7 can be cited in the functional descriptions of the corresponding functional modules, which is not repeated here. The electronic device described in this possible design is configured to perform the functions of the electronic device in the real-time communication methods shown in FIG. 1a to FIG. 7 and therefore can achieve the same effects as the above real-time communication methods.
An embodiment of the present application provides an electronic device, including a processor and a memory, where the memory is coupled to the processor and is configured to store computer program code including computer instructions; when the processor reads the computer instructions from the memory, the electronic device is caused to perform the real-time communication methods shown in FIG. 3 to FIG. 7.
An embodiment of the present application provides a computer program product that, when run on a computer, causes the computer to perform the real-time communication methods shown in FIG. 3 to FIG. 7.
An embodiment of the present application provides a computer-readable storage medium including computer instructions that, when run on a terminal, cause the terminal to perform the real-time communication methods shown in FIG. 3 to FIG. 7.
An embodiment of the present application provides a chip system including one or more processors; when the one or more processors execute instructions, the one or more processors perform the real-time communication methods shown in FIG. 3 to FIG. 7.
The above are only specific implementations of the present invention, but the protection scope of the present invention is not limited thereto. Any person skilled in the art can readily conceive of changes or substitutions within the technical scope disclosed by the present invention, which shall all be covered within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
In the description of the embodiments of the present application, unless otherwise stated, "/" means "or"; for example, A/B may mean A or B. "And/or" herein merely describes an association relationship between associated objects, indicating that three relationships may exist; for example, A and/or B may mean: A alone, both A and B, or B alone.
Hereinafter, the terms "first" and "second" are used for descriptive purposes only and shall not be understood as indicating or implying relative importance or implicitly indicating the number of the indicated technical features. Thus, features defined with "first" or "second" may explicitly or implicitly include one or more of such features. In the description of the embodiments of the present application, unless otherwise stated, "multiple" means two or more.
In the embodiments of the present application, words such as "exemplary" or "for example" are used to denote an example, illustration, or explanation. Any embodiment or design described as "exemplary" or "for example" in the embodiments of the present application shall not be construed as more preferred or more advantageous than other embodiments or designs. Rather, such words are intended to present the related concepts in a concrete manner.
It can be understood that, to implement the above functions, the above communication devices and the like include corresponding hardware structures and/or software modules for performing each function. Those skilled in the art should readily appreciate that, in combination with the units and algorithm steps of the examples described in the embodiments disclosed herein, the embodiments of the present application can be implemented in the form of hardware or a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends on the specific application and design constraints of the technical solution. Skilled artisans may use different methods to implement the described functions for each particular application, but such implementations should not be considered beyond the scope of the embodiments of the present invention.
The embodiments of the present application may divide the above communication devices and the like into functional modules according to the above method examples; for example, each functional module may be divided corresponding to each function, or two or more functions may be integrated into one processing module. The integrated module may be implemented in the form of hardware or in the form of a software functional module. It should be noted that the division of modules in the embodiments of the present invention is illustrative and is merely a logical functional division; there may be other division manners in actual implementation.
From the description of the above implementations, those skilled in the art can clearly understand that, for convenience and brevity of description, only the division of the above functional modules is illustrated; in practical applications, the above functions can be allocated to different functional modules as needed, that is, the internal structure of the apparatus is divided into different functional modules to complete all or part of the functions described above. For the specific working processes of the system, apparatus, and units described above, refer to the corresponding processes in the foregoing method embodiments, which are not repeated here.
Claims (32)
- A real-time communication method, comprising: during an audio-video call between a first communication device and a second communication device, acquiring a location of the first communication device and receiving a first audio from the first communication device; and generating a second audio according to the location of the first communication device, the first audio, and a parameter of the second communication device, wherein the second audio is audio with positional directivity.
- The method according to claim 1, wherein acquiring the location of the first communication device comprises: receiving a first message from the first communication device, the first message comprising the location of the first communication device; or configuring a first virtual location for the first communication device; or detecting a location keyword in audio data of the first audio, the location keyword being used to characterize the location of the first communication device.
- The method according to claim 1 or 2, wherein generating the second audio according to the location of the first communication device, the first audio, and the parameter of the second communication device comprises: generating a third audio according to the location of the first communication device and the first audio, the third audio containing relative location information of the first communication device and the second communication device; and generating the second audio according to the third audio and the parameter of the second communication device.
- The method according to claim 3, wherein the second communication device comprises at least one playback device; and the playback device comprises earphones, a virtual reality (VR) device, or an augmented reality (AR) device.
- The method according to claim 4, wherein the playback device is externally connected to the second communication device; and generating the second audio according to the location of the first communication device, the first audio, and the parameter of the second communication device comprises: when the second communication device is connected to the playback device, generating the second audio according to the location of the first communication device, the first audio, and the parameter of the second communication device.
- The method according to claim 4, wherein generating the second audio according to the third audio and the parameter of the second communication device comprises: acquiring, according to the relative location information contained in the third audio, a head-related transfer function on the playback device corresponding to the relative location information; and processing the first audio with the head-related transfer function to obtain the second audio.
- The method according to claim 6, wherein the playback device is earphones having a left ear and a right ear; and generating the second audio according to the third audio and the parameter of the second communication device comprises: acquiring, according to the relative location information contained in the third audio, a left-ear head-related transfer function and a right-ear head-related transfer function corresponding to the relative location information; and processing the first audio with the left-ear head-related transfer function and the right-ear head-related transfer function, respectively, to obtain left-ear audio and right-ear audio of the earphones.
- The method according to any one of claims 1-7, wherein the second audio is used for playback by the second communication device.
- The method according to any one of claims 1-7, further comprising, after generating the second audio according to the location of the first communication device, the first audio, and the parameter of the second communication device: sending the second audio to a third communication device, the second audio being used to instruct the third communication device to play the second audio.
- The method according to any one of claims 1-9, wherein the audio-video call comprises one or more of a video call, a voice call, a voice conference, and a video conference.
- A real-time communication method, comprising: during an audio-video call between a first communication device and a second communication device, sending, by the first communication device, a first audio; acquiring, by the second communication device, a location of the first communication device, and receiving the first audio from the first communication device; generating, by the second communication device, a second audio according to the location of the first communication device, the first audio from the first communication device, and a parameter of the second communication device, the second audio being audio with positional directivity; and playing, by the second communication device, the second audio.
- The method according to claim 11, wherein the second communication device comprises at least one playback device, and playing, by the second communication device, the second audio comprises: sending, by the second communication device, the second audio to the at least one playback device; and receiving, by the playback device, the second audio and playing the second audio.
- A real-time communication method, comprising: during an audio-video call between a first communication device and a second communication device, sending, by the first communication device, a first audio of the first communication device; acquiring, by a server, a location of the first communication device and a location of the second communication device, and receiving the first audio from the first communication device; generating, by the server, a second audio according to the location of the first communication device, the location of the second communication device, the first audio from the first communication device, and a parameter of the second communication device, the second audio being audio with positional directivity; sending, by the server, the second audio to the second communication device; and receiving, by the second communication device, the second audio and playing the second audio.
- The method according to claim 13, wherein the second communication device comprises at least one playback device, and playing, by the second communication device, the second audio comprises: sending, by the second communication device, the second audio to the at least one playback device; and receiving, by the playback device, the second audio and playing the second audio.
- A communication apparatus, comprising: an acquiring and receiving unit configured to, during an audio-video call between a first communication device and a second communication device, acquire a location of the first communication device and receive a first audio from the first communication device; and a generating unit configured to generate a second audio according to the location of the first communication device, the first audio, and a parameter of the second communication device, the second audio being audio with positional directivity.
- The communication apparatus according to claim 15, wherein the acquiring and receiving unit is further configured to: receive a first message from the first communication device, the first message comprising the location of the first communication device; or configure a first virtual location for the first communication device; or detect a location keyword in audio data of the first audio, the location keyword being used to characterize the location of the first communication device.
- The communication apparatus according to claim 15 or 16, wherein the generating unit is further configured to: generate a third audio according to the location of the first communication device and the first audio, the third audio containing relative location information of the first communication device and the second communication device; and generate the second audio according to the third audio and the parameter of the second communication device.
- The communication apparatus according to claim 17, wherein the second communication device comprises at least one playback device; and the playback device comprises earphones, a virtual reality (VR) device, or an augmented reality (AR) device.
- The communication apparatus according to claim 18, wherein the playback device is externally connected to the second communication device; and the generating unit is further configured to generate the second audio according to the location of the first communication device, the first audio, and the parameter of the second communication device when the second communication device is connected to the playback device.
- The communication apparatus according to claim 19, wherein the generating unit is further configured to: acquire, according to the relative location information contained in the third audio, a head-related transfer function on the playback device corresponding to the relative location information; and process the first audio with the head-related transfer function to obtain the second audio.
- The communication apparatus according to claim 20, wherein the playback device is earphones having a left ear and a right ear; and the generating unit is further configured to: acquire, according to the relative location information contained in the third audio, a left-ear head-related transfer function and a right-ear head-related transfer function corresponding to the relative location information; and process the first audio with the left-ear head-related transfer function and the right-ear head-related transfer function, respectively, to obtain left-ear audio and right-ear audio of the earphones.
- The communication apparatus according to any one of claims 15-21, wherein the second audio is used for playback by the second communication device.
- The communication apparatus according to any one of claims 15-21, comprising: a sending unit configured to send the second audio to a third communication device, the second audio being used to instruct the third communication device to play the second audio.
- The communication apparatus according to any one of claims 15-23, wherein the audio-video call comprises one or more of a video call, a voice call, a voice conference, and a video conference.
- A communication system, comprising a first communication device and a second communication device, wherein during an audio-video call between the first communication device and the second communication device: the first communication device is configured to send a first audio; the second communication device is configured to acquire a location of the first communication device and receive the first audio from the first communication device; the second communication device is configured to generate a second audio according to the location of the first communication device, the first audio from the first communication device, and a parameter of the second communication device, the second audio being audio with positional directivity; and the second communication device is configured to play the second audio.
- The communication system according to claim 25, wherein the second communication device comprises at least one playback device: the second communication device is configured to send the second audio to the at least one playback device; and the playback device is configured to receive the second audio and play the second audio.
- A communication system, comprising a first communication device, a second communication device, and a server, wherein during an audio-video call between the first communication device and the second communication device: the first communication device is configured to send a first audio of the first communication device; the server is configured to acquire a location of the first communication device and a location of the second communication device, and to receive the first audio from the first communication device; the server is configured to generate a second audio according to the location of the first communication device, the location of the second communication device, the first audio from the first communication device, and a parameter of the second communication device, the second audio being audio with positional directivity; the server is configured to send the second audio to the second communication device; and the second communication device is configured to receive the second audio and play the second audio.
- The communication system according to claim 27, wherein the second communication device comprises at least one playback device: the second communication device is configured to send the second audio to the at least one playback device; and the playback device is configured to receive the second audio and play the second audio.
- An electronic device, comprising: a processor and a memory, wherein the memory is coupled to the processor and is configured to store computer program code including computer instructions; when the processor reads the computer instructions from the memory, the electronic device is caused to perform the real-time communication method according to any one of claims 1-10.
- A computer program product comprising computer instructions that, when run on a computer, cause the computer to perform the real-time communication method according to any one of claims 1-10.
- A computer-readable storage medium comprising computer instructions that, when run on a computer, cause the computer to perform the real-time communication method according to any one of claims 1-10.
- A chip system comprising one or more processors; when the one or more processors execute instructions, the one or more processors perform the real-time communication method according to any one of claims 1-10.
Priority Applications (2)

| Application Number | Priority Date | Filing Date | Title |
| --- | --- | --- | --- |
| PCT/CN2020/119357 (WO2022067652A1) | 2020-09-30 | 2020-09-30 | Real-time communication method, apparatus and system |
| CN202080036481.5A (CN114667744B) | 2020-09-30 | 2020-09-30 | Real-time communication method, apparatus and system |
Publications (1)

| Publication Number | Publication Date |
| --- | --- |
| WO2022067652A1 | 2022-04-07 |
Also Published As

| Publication Number | Publication Date |
| --- | --- |
| CN114667744B | 2024-03-01 |
| CN114667744A | 2022-06-24 |