CN113660063B - Spatial audio data processing method and device, storage medium and electronic equipment - Google Patents

Spatial audio data processing method and device, storage medium and electronic equipment

Info

Publication number
CN113660063B
Authority
CN
China
Prior art keywords
data
current frame
audio data
redundant
spatial
Prior art date
Legal status
Active
Application number
CN202110948128.0A
Other languages
Chinese (zh)
Other versions
CN113660063A (en)
Inventor
王兴鹤
阮良
陈功
陈丽
Current Assignee
Hangzhou Netease Zhiqi Technology Co Ltd
Original Assignee
Hangzhou Netease Zhiqi Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Hangzhou Netease Zhiqi Technology Co Ltd
Priority to CN202110948128.0A
Publication of CN113660063A
Application granted
Publication of CN113660063B
Legal status: Active
Anticipated expiration


Classifications

    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04L - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L65/00 - Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L65/80 - Responding to QoS
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04L - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L1/00 - Arrangements for detecting or preventing errors in the information received
    • H04L1/004 - Arrangements for detecting or preventing errors in the information received by using forward error control
    • H04L1/0072 - Error control for data other than payload data, e.g. control data

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Multimedia (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

Embodiments of the present disclosure relate to a spatial audio data processing method and apparatus, a storage medium, and an electronic device, in the technical field of data processing. The method comprises the following steps: acquiring a current frame spatial audio data packet; parsing current frame spatial audio data from the current frame spatial audio data packet, and determining at least one of redundant data of the current frame audio data and redundant data of the current frame spatial information data according to the current frame spatial audio data; encapsulating at least one of the redundant data of the current frame audio data and the redundant data of the current frame spatial information data, together with the current frame audio data and the current frame spatial information data, according to a second data encapsulation format to obtain a redundant data packet of the current frame spatial audio data; and sending the redundant data packet of the current frame spatial audio data to the second terminal. The method reduces the transmission delay of spatial audio data and improves the packet-loss resistance of the spatial information data.

Description

Spatial audio data processing method and device, storage medium and electronic equipment
Technical Field
Embodiments of the present disclosure relate to the field of data processing technology, and more particularly, to a spatial audio data processing method, a spatial audio data processing apparatus, a computer-readable storage medium, and an electronic device.
Background
Spatial audio technology transmits, together with the audio data, the spatial information data of the source that generates the audio data. When the audio data is played at the receiving end, the spatial information data corresponding to the audio data is used to create a three-dimensional spatial effect for the user, giving the user an immersive listening experience.
In the related art, a real-time audio communication framework can be used to transmit spatial audio data in real time, providing users with a richer listening experience.
This section is intended to provide a background or context for the embodiments of the disclosure recited in the claims; the description herein is not admitted to be prior art merely by inclusion in this section.
Disclosure of Invention
In this context, embodiments of the present disclosure are intended to provide a spatial audio data processing method and apparatus, a computer-readable storage medium, and an electronic device.
According to a first aspect of embodiments of the present disclosure, there is provided a spatial audio data processing method, the method being applied to a first terminal, the method comprising:
acquiring a current frame spatial audio data packet, wherein the spatial audio data packet comprises current frame spatial audio data encapsulated according to a first data encapsulation format, and the current frame spatial audio data comprises current frame audio data and current frame spatial information data;
parsing the current frame spatial audio data from the current frame spatial audio data packet, and determining at least one of redundant data of the current frame audio data and redundant data of the current frame spatial information data according to the current frame spatial audio data;
encapsulating at least one of the redundant data of the current frame audio data and the redundant data of the current frame spatial information data, together with the current frame audio data and the current frame spatial information data, according to a second data encapsulation format to obtain a redundant data packet of the current frame spatial audio data;
and sending the redundant data packet of the current frame spatial audio data to a second terminal.
In an alternative embodiment, the acquiring the current frame spatial audio data packet includes:
collecting current frame spatial audio data;
storing the current frame audio data in an audio data storage field corresponding to the first data encapsulation format, and storing the current frame spatial information data in a spatial information data storage field corresponding to the first data encapsulation format, so as to obtain a current frame spatial audio data packet.
In an optional implementation manner, the encapsulating, according to a second data encapsulation format, at least one of the redundant data of the current frame audio data and the redundant data of the current frame spatial information data, the current frame audio data, and the current frame spatial information data to obtain the redundant data packet of the current frame spatial audio data includes:
Storing the current frame audio data in an audio data storage field corresponding to the second data encapsulation format, storing the current frame spatial information data in a spatial information data storage field corresponding to the second data encapsulation format, and storing at least one of redundant data of the current frame audio data and redundant data of the current frame spatial information data in a redundant data storage field corresponding to the second data encapsulation format to obtain a redundant data packet of the current frame spatial audio data.
In an alternative embodiment, after storing at least one of the redundant data of the current frame audio data and the redundant data of the current frame spatial information data in the redundant data storage field corresponding to the second data encapsulation format, the method further includes:
and updating the field value of the redundant information identification field corresponding to the second data encapsulation format.
In an optional embodiment, the updating the field value of the redundant information identification field corresponding to the second data encapsulation format includes:
if the redundant data storage field comprises redundant data of the current frame audio data and redundant data of the current frame spatial information data, updating the field value of the redundant information identification field to be a first field value; or,
if the redundant data storage field comprises redundant data of the current frame spatial information data, updating the field value of the redundant information identification field to be a second field value.
In an alternative embodiment, the redundant data of the current frame audio data includes at least one frame of audio data preceding the current frame audio data.
In an alternative embodiment, the redundant data of the current frame spatial information data includes at least one frame of spatial information data preceding the current frame spatial information data.
According to a second aspect of embodiments of the present disclosure, there is provided a spatial audio data processing method, the method being applied to a second terminal, the method comprising:
receiving a redundant data packet of current frame spatial audio data sent by the first terminal, wherein the redundant data packet of the current frame spatial audio data comprises at least one of redundant data of the current frame audio data and redundant data of the current frame spatial information data, together with the current frame audio data and the current frame spatial information data;
and parsing a target redundant data packet to acquire the current frame spatial audio data, wherein the target redundant data packet comprises the redundant data packet of the current frame spatial audio data, or a redundant data packet of one or more subsequent frames of spatial audio data corresponding to the current frame spatial audio data.
In an optional implementation manner, if the redundant data packet of the current frame spatial audio data is not lost, parsing the target redundant data packet to acquire the current frame spatial audio data includes:
parsing the redundant data packet of the current frame spatial audio data to obtain the current frame spatial audio data.
In an optional implementation manner, parsing the redundant data packet of the current frame spatial audio data to obtain the current frame spatial audio data includes:
parsing the redundant data packet of the current frame spatial audio data, acquiring the current frame audio data from the audio data storage field corresponding to the second data encapsulation format, and acquiring the current frame spatial information data from the spatial information data storage field corresponding to the second data encapsulation format, to obtain the current frame spatial audio data.
In an optional implementation manner, if the redundant data packet of the current frame spatial audio data is lost, parsing the target redundant data packet to acquire the current frame spatial audio data includes:
parsing a redundant data packet of one or more subsequent frames of spatial audio data corresponding to the current frame spatial audio data to acquire the current frame spatial audio data.
In an optional implementation manner, parsing the redundant data packet of the one or more subsequent frames of spatial audio data corresponding to the current frame spatial audio data to acquire the current frame spatial audio data includes:
parsing the redundant data packet of the one or more subsequent frames of spatial audio data corresponding to the current frame spatial audio data, and if the field value of the redundant information identification field corresponding to the second data encapsulation format is the first field value, acquiring the current frame spatial audio data from the redundant data storage field corresponding to the second data encapsulation format.
In an optional implementation manner, parsing the redundant data packet of the one or more subsequent frames of spatial audio data corresponding to the current frame spatial audio data to acquire the current frame spatial audio data includes:
parsing the redundant data packet of the one or more subsequent frames of spatial audio data corresponding to the current frame spatial audio data, and if the field value of the redundant information identification field corresponding to the second data encapsulation format is the second field value, acquiring the current frame spatial information data from the redundant data storage field corresponding to the second data encapsulation format;
acquiring the current frame audio data corresponding to the current frame spatial audio data according to a packet loss processing result;
and combining the current frame audio data and the current frame spatial information data to obtain the current frame spatial audio data.
According to a third aspect of embodiments of the present disclosure, there is provided a spatial audio data processing apparatus, the apparatus being applied to a first terminal, the apparatus comprising:
the system comprises an acquisition module, a first data encapsulation module and a second data encapsulation module, wherein the acquisition module is configured to acquire a current frame space audio data packet, the space audio data packet comprises current frame space audio data encapsulated according to a first data encapsulation format, and the current frame space audio data comprises current frame audio data and current frame space information data;
a first parsing module configured to parse the current frame spatial audio data from the current frame spatial audio data packet, and determine at least one of redundant data of the current frame audio data and redundant data of the current frame spatial information data according to the current frame spatial audio data;
The encapsulation module is configured to encapsulate at least one of the redundant data of the current frame audio data and the redundant data of the current frame spatial information data, the current frame audio data and the current frame spatial information data according to a second data encapsulation format to obtain a redundant data packet of the current frame spatial audio data;
And the sending module is configured to send the redundant data packet of the current frame space audio data to the second terminal.
In an alternative embodiment, the acquiring module is configured to:
collecting current frame space audio data;
storing the current frame audio data in an audio data storage field corresponding to the first data encapsulation format, and storing the current frame spatial information data in a spatial information data storage field corresponding to the first data encapsulation format, so as to obtain a current frame spatial audio data packet.
In an alternative embodiment, the packaging module is configured to:
storing the current frame audio data in an audio data storage field corresponding to the second data encapsulation format, storing the current frame spatial information data in a spatial information data storage field corresponding to the second data encapsulation format, and storing at least one of redundant data of the current frame audio data and redundant data of the current frame spatial information data in a redundant data storage field corresponding to the second data encapsulation format to obtain a redundant data packet of the current frame spatial audio data.
In an alternative embodiment, the apparatus further comprises:
and the updating module is configured to update the field value of the redundant information identification field corresponding to the second data encapsulation format.
In an alternative embodiment, the update module is configured to:
if the redundant data storage field comprises redundant data of the current frame audio data and redundant data of the current frame space information data, updating a field value of the redundant information identification field to be a first field value; or,
and if the redundant data storage field comprises redundant data of the current frame space information data, updating the field value of the redundant information identification field to be a second field value.
In an alternative embodiment, the redundant data of the current frame audio data includes at least one frame of audio data preceding the current frame audio data.
In an alternative embodiment, the redundant data of the current frame spatial information data includes at least one frame of spatial information data preceding the current frame spatial information data.
According to a fourth aspect of embodiments of the present disclosure, there is provided a spatial audio data processing apparatus, the apparatus being applied to a second terminal, the apparatus comprising:
The receiving module is configured to receive a redundant data packet of current frame space audio data sent by the first terminal, wherein the redundant data packet of the current frame space audio data comprises at least one of redundant data of the current frame audio data and redundant data of the current frame space information data, the current frame audio data and the current frame space information data;
the second analyzing module is configured to analyze a target redundant data packet to obtain the current frame of spatial audio data, wherein the target redundant data packet comprises a redundant data packet of the current frame of spatial audio data or a redundant data packet of next frame or multi-frame spatial audio data corresponding to the current frame of spatial audio data.
In an optional embodiment, if the redundant packet of the current frame spatial audio data is not lost, the second parsing module is configured to:
and analyzing the redundant data packet of the current frame space audio data to obtain the current frame space audio data.
In an alternative embodiment, the second parsing module is configured to:
analyzing the redundant data packet of the current frame space audio data, acquiring the current frame audio data from the audio data storage field corresponding to the second data encapsulation format, and acquiring the current frame space information data from the space information data storage field corresponding to the second data encapsulation format to obtain the current frame space audio data.
In an optional embodiment, if the redundant packet of the current frame spatial audio data is lost, the second parsing module is configured to:
and analyzing a redundant data packet of the next frame or multiple frames of spatial audio data corresponding to the current frame of spatial audio data to acquire the current frame of spatial audio data.
In an alternative embodiment, the second parsing module is configured to:
analyzing a redundant data packet of next frame or multi-frame space audio data corresponding to the current frame space audio data, and if the field value of the redundant information identification field corresponding to the second data encapsulation format is a first field value, acquiring the current frame space audio data from the redundant data storage field corresponding to the second data encapsulation format.
In an alternative embodiment, the second parsing module is configured to:
analyzing a redundant data packet of next frame or multi-frame space audio data corresponding to the current frame space audio data, and if the field value of the redundant information identification field corresponding to the second data encapsulation format is a second field value, acquiring the current frame space information data from the redundant data storage field corresponding to the second data encapsulation format;
Acquiring current frame audio data corresponding to the current frame spatial audio data according to a packet loss processing result;
and combining the current frame audio data and the current frame space information data to obtain the current frame space audio data.
According to a fifth aspect of embodiments of the present disclosure, there is provided a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements any of the methods described above.
According to a sixth aspect of embodiments of the present disclosure, there is provided an electronic device, comprising: a processor; and a memory for storing executable instructions of the processor; wherein the processor is configured to perform any of the methods described above via execution of the executable instructions.
According to the spatial audio data processing method and apparatus, the computer-readable storage medium and the electronic device provided by the embodiments of the present disclosure, the audio data and the spatial information data can be transmitted simultaneously during spatial audio data transmission, which reduces the transmission delay of the spatial audio data and improves its transmission efficiency. In addition, the type of redundant data carried in the redundant data packet of the current frame spatial audio data can be determined flexibly according to the actual conditions of the current audio communication, which improves the packet-loss resistance of the spatial audio data, and of the spatial information data in particular, while reducing the waste of network resources.
Drawings
The above, as well as additional purposes, features, and advantages of exemplary embodiments of the present disclosure will become readily apparent from the following detailed description when read in conjunction with the accompanying drawings. Several embodiments of the present disclosure are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings, in which:
FIG. 1 illustrates a system architecture diagram of a spatial audio data processing method operating environment in accordance with an embodiment of the present disclosure;
FIG. 2 shows a flow diagram of a spatial audio data processing method according to an embodiment of the present disclosure;
FIG. 3 illustrates a schematic diagram of a first data encapsulation format according to an embodiment of the present disclosure;
FIG. 4 illustrates a schematic diagram of a second data encapsulation format, according to an embodiment of the present disclosure;
FIG. 5 shows a flow diagram of another spatial audio data processing method according to an embodiment of the present disclosure;
fig. 6 shows a block diagram of a spatial audio data processing device according to an embodiment of the present disclosure;
fig. 7 shows a block diagram of another spatial audio data processing device according to an embodiment of the present disclosure;
fig. 8 shows a block diagram of a structure of an electronic device according to an embodiment of the present disclosure.
In the drawings, the same or corresponding reference numerals indicate the same or corresponding parts.
Detailed Description
The principles and spirit of the present disclosure will be described below with reference to several exemplary embodiments. It should be understood that these embodiments are presented merely to enable one skilled in the art to better understand and practice the present disclosure and are not intended to limit the scope of the present disclosure in any way. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
Those skilled in the art will appreciate that embodiments of the present disclosure may be implemented as a system, apparatus, device, method, or computer program product. Accordingly, the present disclosure may be embodied in the following forms: entirely hardware, entirely software (including firmware, resident software, micro-code, etc.), or a combination of hardware and software.
According to embodiments of the present disclosure, a spatial audio data processing method, apparatus, computer-readable storage medium, and electronic device are provided.
Any number of elements in the figures are for illustration and not limitation, and any naming is used for distinction only, and not for any limiting sense.
The principles and spirit of the present disclosure are described in detail below with reference to several representative embodiments thereof.
Summary of the Invention
The inventors have found that when transmitting spatial audio data with an existing real-time audio communication framework (for example, the Web Real-Time Communication (WebRTC) framework), two separate transmission mechanisms may be used for the audio data and the spatial information data. Typically, the audio data is transmitted over the Real-time Transport Protocol (RTP), while the spatial information data is transmitted over the Real-time Transport Control Protocol (RTCP). Because no Quality of Service (QoS) control strategy is applied to the spatial information data, it may be lost under poor network conditions, and the receiving end then cannot create the three-dimensional effect of the audio data for the user, which degrades the listening experience. Meanwhile, the receiving end of the spatial audio data needs a data synchronization mechanism to synchronize the audio data with the corresponding spatial information data; the existing data synchronization techniques are complex to implement and introduce a large transmission delay for the spatial audio data, which also degrades the listening experience. Alternatively, the RTP protocol may be used to transmit the spatial information data, providing it with a certain QoS control strategy; however, the audio data and the spatial information data then belong to different data packets, a data synchronization mechanism is still required at the receiving end to synchronize the audio data with the corresponding spatial information data, and a large transmission delay is still introduced for the spatial audio data.
In view of the above, the basic idea of the present disclosure is as follows: a spatial audio data processing method and apparatus, a computer-readable storage medium and an electronic device are provided, which can acquire a current frame spatial audio data packet; parse current frame spatial audio data from the current frame spatial audio data packet, and determine at least one of redundant data of the current frame audio data and redundant data of the current frame spatial information data according to the current frame spatial audio data; encapsulate at least one of the redundant data of the current frame audio data and the redundant data of the current frame spatial information data, together with the current frame audio data and the current frame spatial information data, according to a second data encapsulation format to obtain a redundant data packet of the current frame spatial audio data; and send the redundant data packet of the current frame spatial audio data to a second terminal. The second terminal can receive the redundant data packet of the current frame spatial audio data and parse a target redundant data packet to obtain the current frame spatial audio data. Here, the current frame spatial audio data packet comprises current frame spatial audio data encapsulated according to a first data encapsulation format; the current frame spatial audio data comprises current frame audio data and current frame spatial information data; and the target redundant data packet comprises the redundant data packet of the current frame spatial audio data or a redundant data packet of one or more subsequent frames of spatial audio data corresponding to the current frame spatial audio data. In this way, the audio data and the spatial information data in the spatial audio data can be transmitted synchronously, the transmission efficiency of the spatial audio data is improved, the type of redundant data carried in the redundant data packet of the current frame spatial audio data can be determined according to the current conditions of the audio communication, and the packet-loss resistance of the spatial audio data, and of the spatial information data in particular, is improved, improving the user's listening experience.
Having described the basic principles of the present disclosure, various non-limiting embodiments of the present disclosure are specifically described below.
Application scene overview
It should be noted that the following application scenarios are only shown for facilitating understanding of the spirit and principles of the present disclosure, and embodiments of the present disclosure are not limited in this respect. Rather, embodiments of the present disclosure may be applied to any scenario where applicable.
The present disclosure may be applied to any scenario involving spatial audio data transmission. For example, during audio communication, an audio communication initiator may collect current frame spatial audio data, which comprises current frame audio data and current frame spatial information data, and encapsulate the current frame spatial audio data according to a first data encapsulation format to obtain a current frame spatial audio data packet; parse the current frame spatial audio data from the current frame spatial audio data packet, and determine at least one of redundant data of the current frame audio data and redundant data of the current frame spatial information data according to the current frame spatial audio data; encapsulate at least one of the redundant data of the current frame audio data and the redundant data of the current frame spatial information data, together with the current frame audio data and the current frame spatial information data, according to a second data encapsulation format to obtain a redundant data packet of the current frame spatial audio data; and transmit the redundant data packet of the current frame spatial audio data to an audio communication receiver. After receiving the redundant data packet of the current frame spatial audio data, the audio communication receiver may parse a target redundant data packet to obtain the current frame spatial audio data, where the target redundant data packet may comprise the redundant data packet of the current frame spatial audio data or a redundant data packet of one or more subsequent frames of spatial audio data corresponding to the current frame spatial audio data. This improves the transmission efficiency of the spatial audio data and the packet-loss resistance of the spatial audio data, and of the spatial information data in particular, thereby improving the user's listening experience.
Exemplary method
Exemplary embodiments of the present disclosure first provide a spatial audio data processing method, and fig. 1 shows a system architecture diagram of an operating environment of the method. As shown in fig. 1, the system architecture 100 may include: a first terminal 110, a server 120, and a second terminal 130. The first terminal 110 may be a terminal device used by an audio communication initiator, the second terminal 130 may be a terminal device used by an audio communication receiver, and the terminal device may be a smart phone, a tablet computer, a personal computer, an intelligent wearable device, an intelligent vehicle-mounted device, a game machine, or the like. The server 120 may include a background system of a third party platform, which may be a live service provider, an audio communication provider, or a gaming service provider, among others.
In general, the first terminal 110 interacts with the server 120, and the second terminal 130 interacts with the server 120. After the first terminal 110 initiates audio communication, it may obtain audio data and spatial information data in real time to obtain current frame spatial audio data, where the current frame spatial audio data includes current frame audio data and current frame spatial information data, and encapsulate the current frame spatial audio data according to a first data encapsulation format to obtain a current frame spatial audio data packet. The first terminal then parses the current frame spatial audio data from the current frame spatial audio data packet, determines at least one of redundant data of the current frame audio data and redundant data of the current frame spatial information data according to the current frame spatial audio data, encapsulates at least one of the redundant data of the current frame audio data and the redundant data of the current frame spatial information data, together with the current frame audio data and the current frame spatial information data, according to a second data encapsulation format to obtain a redundant data packet of the current frame spatial audio data, and sends the redundant data packet of the current frame spatial audio data to the server 120. The server 120 may receive the redundant data packet of the current frame spatial audio data sent by the first terminal 110 and forward it to the second terminal 130 that is in audio communication with the first terminal 110. The second terminal 130 may receive the redundant data packet of the current frame spatial audio data, parse a target redundant data packet to obtain the current frame spatial audio data, and play the current frame spatial audio data. The target redundant data packet comprises the redundant data packet of the current frame spatial audio data or a redundant data packet of one or more subsequent frames of spatial audio data corresponding to the current frame spatial audio data.
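The end-to-end flow described above can be summarized with the hedged sketch below; the function names are placeholders for the steps of Fig. 1, not part of any actual WebRTC API.

```python
def sender_flow(capture, pack_first_format, build_redundant_packet, send):
    # Current frame spatial audio data = current frame audio data + spatial information data.
    frame = capture()
    # First data encapsulation format (current frame spatial audio data packet).
    packet = pack_first_format(frame)
    # Second data encapsulation format (redundant data packet of the current frame).
    red_packet = build_redundant_packet(packet)
    # Sent to the server 120, which relays it to the second terminal 130.
    send(red_packet)

def receiver_flow(receive, parse_target_packet, play):
    red_packet = receive()
    # Target redundant data packet: the current frame's packet, or a subsequent frame's.
    frame = parse_target_packet(red_packet)
    play(frame)
```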
It should be noted that, in the embodiments of the present disclosure, the first terminal 110 and the second terminal 130 may be equipped with a spatial audio data transmission framework, for example the WebRTC framework, so that the encapsulation, packaging and transmission of spatial audio data can be implemented, and the server 120 may be a single server or a cluster of multiple servers.
The exemplary embodiments of the present disclosure first provide a spatial audio data processing method, which may be applied to a first terminal; the first terminal may be the sender of spatial audio data during audio communication. As shown in fig. 2, the method may include steps S201 to S204:
Step S201: a current frame spatial audio data packet is acquired.
In the embodiments of the present disclosure, the spatial audio data packet includes current frame spatial audio data encapsulated in a first data encapsulation format, and the current frame spatial audio data includes current frame audio data and current frame spatial information data.
Step S202: the current frame spatial audio data is parsed from the current frame spatial audio data packet, and at least one of redundant data of the current frame audio data and redundant data of the current frame spatial information data is determined according to the current frame spatial audio data.
Step S203: at least one of the redundant data of the current frame audio data and the redundant data of the current frame spatial information data, together with the current frame audio data and the current frame spatial information data, is encapsulated according to a second data encapsulation format to obtain a redundant data packet of the current frame spatial audio data.
In the embodiments of the present disclosure, to deal with the packet loss that easily occurs under poor network conditions, an anti-packet-loss policy may be applied during transmission of the current frame spatial audio data. The policy may be implemented based on redundancy coding. After the current frame spatial audio data packet is acquired, the current frame spatial audio data may be parsed from it, at least one of the redundant data of the current frame audio data and the redundant data of the current frame spatial information data may be determined as the redundant data of the current frame spatial audio data, and the current frame spatial audio data together with its redundant data may then be encapsulated in a second data encapsulation format to obtain a redundant data packet of the current frame spatial audio data. Here, the redundant data of the current frame audio data may include at least one frame of audio data preceding the current frame audio data, and the redundant data of the current frame spatial information data may include at least one frame of spatial information data preceding the current frame spatial information data.
It should be noted that, in the embodiments of the present disclosure, to reduce the waste of network resources and improve network resource utilization, the type of redundant data in the redundant data of the current frame spatial audio data may be determined according to the current data transmission environment, which may be the current network conditions, the current packet loss rate, or the current packet transmission delay.
Step S204: the redundant data packet of the current frame spatial audio data is sent to the second terminal.
In the embodiments of the present disclosure, because the second data encapsulation format supports encapsulating the audio data and the spatial information of the spatial audio data simultaneously, the second terminal can receive the redundant data packet of the current frame spatial audio data and, when it determines that the packet has not been lost, directly parse it and obtain the current frame spatial audio data in one pass, which improves the transmission efficiency of the spatial audio data. Moreover, the second data encapsulation format supports encapsulating redundant data of the spatial audio data, so when the second terminal determines that the redundant data packet of the current frame spatial audio data has been lost, it can parse a redundant data packet of one or more subsequent frames of spatial audio data corresponding to the current frame spatial audio data to obtain the current frame spatial audio data, which improves the packet-loss resistance of the spatial audio data.
In summary, the spatial audio data processing method provided by the embodiments of the present disclosure can transmit audio data and spatial information data simultaneously during spatial audio data transmission, which reduces the transmission delay of the spatial audio data and improves its transmission efficiency; it can also flexibly determine the type of redundant data carried in the redundant data packet of the current frame spatial audio data according to the actual conditions of the current audio communication, which improves the packet-loss resistance of the spatial audio data, and of the spatial information data in particular, and reduces the waste of network resources.
In an alternative embodiment, in step S201 the first terminal may acquire the current frame spatial audio data packet.
The process by which the first terminal obtains the current frame spatial audio data packet may include: collecting current frame spatial audio data; storing the current frame audio data in the audio data storage field corresponding to the first data encapsulation format, and storing the current frame spatial information data in the spatial information data storage field corresponding to the first data encapsulation format, to obtain the current frame spatial audio data packet. The first data encapsulation format may be determined according to the spatial audio data transmission framework used for transmitting the spatial audio data, which is not limited in the embodiments of the present disclosure.
It should be noted that, in the embodiments of the present disclosure, the way the current frame spatial audio data is collected may depend on the actual audio communication scene, which is not limited in the embodiments of the present disclosure. If the current audio communication scene is a real session scene, for example an audio conference, the audio data and the spatial information data of the source producing the audio data may be collected in real time to obtain the current frame spatial audio data. If the current audio communication scene is a virtual session scene, for example a conversation between virtual characters in a virtual game, the audio data may be collected in real time to obtain the current frame audio data, and the current frame spatial information data may be generated from the position information of the virtual character in the virtual game scene, to obtain the current frame spatial audio data.
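As an illustration of the virtual-session case, the current frame spatial information data might be derived from the virtual character's position in the game scene; the field layout below (coordinates plus an orientation angle) is an assumption made only for this sketch, not the patent's encoding.

```python
import struct

def spatial_info_from_character(x: float, y: float, z: float, yaw: float):
    # Data1: position coordinates of the virtual character; Data2: orientation.
    # Both layouts are assumptions introduced only for illustration.
    data1 = struct.pack("!fff", x, y, z)
    data2 = struct.pack("!f", yaw)
    return data1, data2
```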
For example, assuming the spatial audio data transmission framework on which the spatial audio data depends is the WebRTC framework, the spatial audio data may be transmitted over the RTP protocol, and the first data encapsulation format may be a data encapsulation format based on the RFC 3550 base protocol standard. As shown in fig. 3, the first data encapsulation format includes four parts: a fixed header 301, an extension header 302, a payload header 303 and a payload data field 304. The fields in the fixed header 301 store attribute information about the generated current frame spatial audio data packet: field V is the RTP version identification field, field P is the padding identification field, field X is the extension header identification field, field CC is the contributing source count field, field M is the marker field for important events in the data stream, field PT is the payload type identification field, field sequence number is the sequence number field, field timestamp is the timestamp field, and field Synchronization source is the synchronization source identification field. The fields in the extension header 302 store the current frame spatial information data; the extension header follows the data encapsulation format of the RFC 5285 protocol standard and can support one-byte or two-byte data encapsulation, and the extension header shown in fig. 3 supports two-byte data encapsulation. Field ID is the identification field of the extension header, field L is the data length field of the current frame spatial information data stored in the extension header, field subID1 is the identification field of the first current frame spatial information data stored in the extension header, field subLen1 is its data length field, and field Data1 stores the first current frame spatial information data; field subLen2 is the data length field of the second current frame spatial information data stored in the extension header, and field Data2 stores the second current frame spatial information data. The first current frame spatial information data and the second current frame spatial information data represent different items of spatial position information; for example, the first current frame spatial information data may be coordinate data of the current frame spatial information, and the second current frame spatial information data may be data relating to the sound source. The field in the payload header 303 stores attribute information about the payload data, the field payload header being the payload header identification field. The payload data field 304 stores the current frame audio data, the field payload data being the payload data identification field.
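For concreteness, the sketch below packs a current frame spatial audio data packet in the spirit of the first data encapsulation format described above: an RTP fixed header (RFC 3550), a two-byte header extension (RFC 5285) carrying the two items of spatial information data, and the current frame audio data as the payload. The sub-IDs, byte layouts and helper name are assumptions for illustration, not the patent's normative encoding.

```python
import struct

def pack_spatial_rtp(seq: int, timestamp: int, ssrc: int, payload_type: int,
                     audio_payload: bytes, coords: bytes, orientation: bytes) -> bytes:
    # RTP fixed header (RFC 3550): V=2, P=0, X=1 (header extension present), CC=0, M=0.
    first_byte = (2 << 6) | (1 << 4)
    second_byte = payload_type & 0x7F
    fixed = struct.pack("!BBHII", first_byte, second_byte, seq & 0xFFFF,
                        timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)

    # Two-byte header extension elements (RFC 5285): subID, subLen, data.
    elements = b""
    for sub_id, data in ((1, coords), (2, orientation)):   # sub-IDs are assumptions
        elements += struct.pack("!BB", sub_id, len(data)) + data
    elements += b"\x00" * ((-len(elements)) % 4)            # pad to 32-bit words
    # 0x1000 is the "defined by profile" value that marks the two-byte form.
    ext = struct.pack("!HH", 0x1000, len(elements) // 4) + elements

    # Payload data field: the current frame audio data.
    return fixed + ext + audio_payload
```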
It should be noted that, in the embodiments of the present disclosure, to keep the first data encapsulation format general, the extension header identification field X in the fixed header 301 may be 0, indicating that the first data encapsulation format contains no extension header; in that case ordinary audio data is transmitted.
It can be understood that, if the first data encapsulation format is a data encapsulation format based on the RFC 3550 base protocol standard, storing the current frame audio data in the audio data storage field corresponding to the first data encapsulation format and storing the current frame spatial information data in the spatial information data storage field corresponding to the first data encapsulation format to obtain the current frame spatial audio data packet may include: storing the current frame audio data in the payload data field of the RFC 3550-based data encapsulation format, and storing the current frame spatial information data as metadata in the extension header of the RFC 3550-based data encapsulation format.
In an optional embodiment, in step S202 the first terminal may parse the current frame spatial audio data from the current frame spatial audio data packet, and determine at least one of the redundant data of the current frame audio data and the redundant data of the current frame spatial information data according to the current frame spatial audio data.
In the embodiments of the present disclosure, to deal with the packet loss that easily occurs under poor network conditions, the current frame spatial audio data packet is not transmitted to the second terminal directly after it is acquired. Instead, the current frame spatial audio data may first be parsed from the current frame spatial audio data packet, the redundant data of the current frame spatial audio data may be determined, and the current frame spatial audio data together with its redundant data may be encapsulated according to the second data encapsulation format to obtain the redundant data packet of the current frame spatial audio data. After the second terminal determines that one or more previous frames of spatial audio data have been lost, it can recover them using the redundant data packet of the current frame spatial audio data.
In an alternative embodiment, determining at least one of the redundant data of the current frame audio data and the redundant data of the current frame spatial information data according to the current frame spatial audio data may include: determining the data transmission environment in which the current frame spatial audio data is transmitted, and determining the redundant data of the current frame spatial audio data according to that data transmission environment. The redundant data of the current frame spatial audio data may be the redundant data of the current frame audio data, the redundant data of the current frame spatial information data, or both. The redundant data of the current frame audio data may include at least one frame of audio data preceding the current frame audio data, and the redundant data of the current frame spatial information data may include at least one frame of spatial information data preceding the current frame spatial information data. The data transmission environment may be the current network conditions, the current packet loss rate, or the current packet transmission delay.
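A hedged sketch of this decision follows; the packet-loss thresholds and the dictionary layout are assumptions made only for illustration, and the XY values in the comments refer to the redundant information identification field described later in this section.

```python
def choose_redundancy(loss_rate: float, prev_audio_frames: list, prev_spatial_frames: list) -> dict:
    """Pick the redundancy to bundle with the current frame based on the
    data transmission environment; thresholds are purely illustrative."""
    if loss_rate > 0.10:
        # Poor network: protect both the audio data and the spatial information data (XY = 10).
        return {"audio_red": prev_audio_frames[-1:], "spatial_red": prev_spatial_frames[-1:]}
    if loss_rate > 0.02:
        # Moderate loss: the spatial information data is small, so protect it alone (XY = 11).
        return {"audio_red": [], "spatial_red": prev_spatial_frames[-1:]}
    # Otherwise fall back to ordinary audio redundancy only (XY = 00).
    return {"audio_red": prev_audio_frames[-1:], "spatial_red": []}
```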
In an optional embodiment, in step S203 the first terminal may encapsulate at least one of the redundant data of the current frame audio data and the redundant data of the current frame spatial information data, together with the current frame audio data and the current frame spatial information data, according to the second data encapsulation format to obtain the redundant data packet of the current frame spatial audio data.
In an alternative embodiment, the process by which the first terminal encapsulates at least one of the redundant data of the current frame audio data and the redundant data of the current frame spatial information data, together with the current frame audio data and the current frame spatial information data, according to the second data encapsulation format to obtain the redundant data packet of the current frame spatial audio data may include:
storing the current frame audio data in the audio data storage field corresponding to the second data encapsulation format, storing the current frame spatial information data in the spatial information data storage field corresponding to the second data encapsulation format, and storing at least one of the redundant data of the current frame audio data and the redundant data of the current frame spatial information data in the redundant data storage field corresponding to the second data encapsulation format, to obtain the redundant data packet of the current frame spatial audio data. The second data encapsulation format may be determined according to the spatial audio data transmission framework used for transmitting the spatial audio data, which is not limited in the embodiments of the present disclosure.
For example, assuming the spatial audio data transmission framework on which the spatial audio data depends is the WebRTC framework, the spatial audio data may be transmitted over the RTP protocol, and the second data encapsulation format may be a data encapsulation format based on the RFC 2198 protocol standard. As shown in fig. 4, the second data encapsulation format includes four parts: a fixed header 401, an extension header 402, a payload header and a payload data field 404. The fields in the fixed header 401 store attribute information about the generated redundant data packet of the current frame spatial audio data, and the fields in the extension header 402 store the current frame spatial information data; the fields of the fixed header 401 and the extension header 402 and their meanings correspond to those of the fixed header 301 and the extension header 302 of fig. 3 described above, and are not repeated here. The fields in the payload header store attribute information about the payload data; the payload header includes a redundant block header 4031 and a primary encoding block header 4032. In the redundant block header 4031, field F is the block header identification field: F = 1 indicates that another block header follows this block header, and F = 0 indicates that no further block header follows; field block PT identifies the type of data stored by the redundant block header, field XY is the redundant information identification field, field timestamp offset is the timestamp offset field, and field block length is the payload data length field. In the primary encoding block header 4032, field F is 0, indicating that the primary encoding block header is the last block header of the payload header, and field block PT identifies the type of data stored by the primary encoding block header. The payload data field 404 stores the current frame audio data and the redundant data of the current frame spatial audio data: the field redundancy data (4042) stores the redundant data of the current frame spatial audio data, and the field primary data (4041) stores the current frame audio data. In fig. 4, "LPC encoded redundant data" indicates that redundant data of the current frame spatial audio data encoded in the LPC (linear predictive coding) format is stored in the redundancy data field of the payload data field 404, and "DVI4 encoded primary data" indicates that current frame audio data encoded in the DVI4 format is stored in the primary data field of the payload data field 404.
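A minimal sketch of this payload layout follows, assuming the standard RFC 2198 block headers; because the exact bit position of the patent's XY redundant information identification field within the redundant block header is not reproduced here, the sketch simply returns the XY value alongside the packed payload.

```python
import struct

def pack_red_payload(red_pt: int, primary_pt: int, ts_offset: int, xy_bits: int,
                     redundant_block: bytes, primary_block: bytes):
    """Return (xy_bits, payload). RFC 2198 lays out the payload as:
    redundant block header (4 bytes) + primary block header (1 byte)
    + redundant block + primary block."""
    # Redundant block header: F=1 (1 bit) | block PT (7) | timestamp offset (14) | block length (10).
    word = (1 << 31) | ((red_pt & 0x7F) << 24) | ((ts_offset & 0x3FFF) << 10) \
           | (len(redundant_block) & 0x3FF)
    red_header = struct.pack("!I", word)
    # Primary encoding block header: F=0 (1 bit) | block PT (7).
    primary_header = struct.pack("!B", primary_pt & 0x7F)
    return xy_bits & 0x03, red_header + primary_header + redundant_block + primary_block
```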
The field value of the redundant information identification field indicates the type of redundant data of the current frame spatial audio data carried in the redundant data packet of the current frame spatial audio data. If the field value of the redundant information identification field XY is the first field value 10, the redundant data carried in the redundant data packet of the current frame spatial audio data includes the redundant data of the current frame audio data and the redundant data of the current frame spatial information data; if the field value of the redundant information identification field XY is the second field value 11, the redundant data carried in the redundant data packet of the current frame spatial audio data includes the redundant data of the current frame spatial information data.
It should be noted that, in the embodiments of the present disclosure, to keep the second data encapsulation format general, the field value of the redundant information identification field XY may also be 00, in which case the redundant data carried in the redundant data packet includes the redundant data of the current frame audio data and ordinary audio data is transmitted.
It can be understood that, if the second data encapsulation format is a data encapsulation format based on the RFC 2198 protocol standard, the process by which the first terminal encapsulates at least one of the redundant data of the current frame audio data and the redundant data of the current frame spatial information data, together with the current frame audio data and the current frame spatial information data, according to the second data encapsulation format to obtain the redundant data packet of the current frame spatial audio data may include:
storing the current frame audio data in the payload data field of the RFC 2198-based data encapsulation format, storing the current frame spatial information data in the extension header of the RFC 2198-based data encapsulation format, and storing at least one of the redundant data of the current frame audio data and the redundant data of the current frame spatial information data in the payload data field of the RFC 2198-based data encapsulation format, to obtain the redundant data packet of the current frame spatial audio data.
In an alternative embodiment, after at least one of the redundant data of the current frame audio data and the redundant data of the current frame spatial information data has been stored in the redundant data storage field corresponding to the second data encapsulation format, the first terminal may further update the field value of the redundant information identification field corresponding to the second data encapsulation format.
Updating the field value of the redundant information identification field corresponding to the second data encapsulation format may include: if the redundant data storage field includes the redundant data of the current frame audio data and the redundant data of the current frame spatial information data, updating the field value of the redundant information identification field to the first field value; or, if the redundant data storage field includes the redundant data of the current frame spatial information data, updating the field value of the redundant information identification field to the second field value.
In the embodiments of the present disclosure, when the second data encapsulation format is a data encapsulation format based on the RFC 2198 protocol standard, if the type of redundant data stored in the payload data field includes the redundant data of the current frame audio data and the redundant data of the current frame spatial information data, the field value of the redundant information identification field XY may be updated to the first field value 10; if the type of redundant data stored in the payload data field includes the redundant data of the current frame spatial information data, the field value of the redundant information identification field XY may be updated to the second field value 11.
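The update rule can be summarized as the small helper below, a sketch that follows the field values stated in this section (10, 11, and 00 for the ordinary-audio case).

```python
def xy_field_value(has_audio_red: bool, has_spatial_red: bool) -> int:
    # Mapping assumed from the first/second field values given above.
    if has_audio_red and has_spatial_red:
        return 0b10  # first field value: audio + spatial information redundancy
    if has_spatial_red:
        return 0b11  # second field value: spatial information redundancy only
    return 0b00      # ordinary audio redundancy (general-purpose case)
```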
In an alternative embodiment, the first terminal in step S204 may send the redundant data packet of the current frame spatial audio data to the second terminal.
In the embodiment of the disclosure, the second terminal may receive the redundant data packet of the current frame spatial audio data sent by the first terminal and parse it to obtain the current frame spatial audio data. However, if the current network condition is poor and the redundant data packet of the current frame spatial audio data is lost, the second terminal may instead parse the redundant data packet of the next frame or multiple frames of spatial audio data corresponding to the current frame spatial audio data to obtain the current frame spatial audio data. Therefore, the second terminal may parse a target redundant data packet to obtain the current frame spatial audio data, where the target redundant data packet includes the redundant data packet of the current frame spatial audio data, or the redundant data packet of the next frame or multiple frames of spatial audio data corresponding to the current frame spatial audio data. It may be understood that the redundant data packet of the next frame or multiple frames of spatial audio data corresponding to the current frame spatial audio data may include the redundant data packet of at least one frame of spatial audio data located after the current frame spatial audio data.
The process of the second terminal parsing the target redundant data packet to obtain the current frame spatial audio data may include, but is not limited to, the following optional embodiments:
in an alternative embodiment, if the redundant data packet of the current frame spatial audio data is not lost, the process of the second terminal analyzing the target redundant data packet to obtain the current frame spatial audio data may include: and analyzing the redundant data packet of the current frame space audio data to obtain the current frame space audio data.
The process of analyzing the redundant data packet of the current frame of spatial audio data and obtaining the current frame of spatial audio data may include: analyzing the redundant data packet of the current frame space audio data, acquiring the current frame audio data from the audio data storage field corresponding to the second data encapsulation format, and acquiring the current frame space information data from the space information data storage field corresponding to the second data encapsulation format, so as to obtain the current frame space audio data.
Further, the second terminal may also obtain the redundant data of the current frame spatial audio data carried in the redundant data packet of the current frame spatial audio data. The redundant data of the current frame spatial audio data may include one or more previous frames of spatial audio data, or one or more previous frames of spatial information data. If the redundant data packet of the previous frame or frames of spatial audio data is lost, the previous frame or frames of spatial audio data or spatial information data may be recovered by using the redundant data of the current frame spatial audio data; if the redundant data packet of the previous frame or frames of spatial audio data is not lost, the redundant data of the current frame spatial audio data can be discarded, thereby performing data deduplication.
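The deduplication behaviour just described can be illustrated with the short sketch below, which assumes the packet container from the previous sketch; the received_frames bookkeeping and the assumption that carried redundancy covers the immediately preceding frame are editorial simplifications, not requirements of the disclosure.

from typing import Dict, List

def apply_carried_redundancy(pkt: RedundantSpatialAudioPacket,
                             received_frames: Dict[int, List[bytes]]) -> None:
    """Use carried redundancy only if the previous frame's own packet was lost."""
    if not pkt.redundant_blocks:
        return
    prev_seq = pkt.seq - 1                        # assumed: redundancy covers the previous frame
    if prev_seq in received_frames:
        return                                    # previous frame already received: discard redundancy (deduplication)
    received_frames[prev_seq] = pkt.redundant_blocks  # recover the lost previous frame from its redundancy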
For example, if the second data encapsulation format is a data encapsulation format based on the RFC2198 protocol standard, the process of the second terminal analyzing the redundant data packet of the current frame spatial audio data to obtain the current frame spatial audio data may include: analyzing the redundant data packet of the current frame spatial audio data, acquiring the current frame audio data from the payload data field corresponding to the data encapsulation format based on the RFC2198 protocol standard, and acquiring the current frame spatial information data from the extension header of the data encapsulation format based on the RFC2198 protocol standard, so as to obtain the current frame spatial audio data.
In an alternative embodiment, if the redundant data packet of the current frame spatial audio data is lost, the process of the second terminal analyzing the target redundant data packet to obtain the current frame spatial audio data may include: and analyzing a redundant data packet of the next frame or multiple frames of spatial audio data corresponding to the spatial audio data of the current frame to acquire the spatial audio data of the current frame.
Optionally, the process of analyzing the redundant data packet of the next frame or multiple frames of spatial audio data corresponding to the current frame of spatial audio data to obtain the current frame of spatial audio data may include:
analyzing the redundant data packet of the next frame or multiple frames of spatial audio data corresponding to the current frame spatial audio data; if the field value of the redundant information identification field corresponding to the second data encapsulation format is the first field value, this indicates that the redundant data type carried in the redundant data packet of the next frame or multiple frames of spatial audio data includes the current frame audio data and the current frame spatial information data, and the current frame spatial audio data is acquired from the redundant data storage field corresponding to the second data encapsulation format.
Optionally, the process of analyzing the redundant data packet of the next frame or multiple frames of spatial audio data corresponding to the current frame spatial audio data to obtain the current frame spatial audio data may also include: analyzing the redundant data packet of the next frame or multiple frames of spatial audio data corresponding to the current frame spatial audio data; if the field value of the redundant information identification field corresponding to the second data encapsulation format is the second field value, this indicates that the redundant data type carried in the redundant data packet of the next frame or multiple frames of spatial audio data includes the current frame spatial information data, and the current frame spatial information data is acquired from the redundant data storage field corresponding to the second data encapsulation format; acquiring the current frame audio data corresponding to the current frame spatial audio data according to the packet loss processing result; and combining the current frame audio data and the current frame spatial information data to obtain the current frame spatial audio data. The packet loss processing result may be current frame audio data determined according to a data recovery algorithm, and the data recovery algorithm may be an audio data recovery algorithm such as in-band forward error correction or data retransmission.
For example, if the second data encapsulation format is a data encapsulation format based on the RFC2198 protocol standard, the process of the second terminal analyzing the redundant data packet of the next frame or multiple frames of spatial audio data corresponding to the current frame spatial audio data to obtain the current frame spatial audio data may include: analyzing the redundant data packet of the next frame or multiple frames of spatial audio data corresponding to the current frame spatial audio data; if the field value of the redundant information identification field XY corresponding to the data encapsulation format based on the RFC2198 protocol standard is the first field value 10, acquiring the current frame spatial audio data from the payload data field; if the field value of the redundant information identification field XY corresponding to the data encapsulation format based on the RFC2198 protocol standard is the second field value 11, acquiring the current frame spatial information data from the payload data field, and acquiring the current frame audio data corresponding to the current frame spatial information data according to the packet loss processing result, so as to obtain the current frame spatial audio data.
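The receiver-side branches described above (own packet available, or recovery from a later frame's redundant packet with field value 10 or 11) can be summarized in the following hedged sketch. It reuses the packet container and constants from the earlier sketches, and plc_recover_audio is a placeholder for whatever packet loss processing (in-band forward error correction, retransmission, and so on) produces the audio data; it is not an API defined by the disclosure.

from typing import Callable, Optional, Tuple

def recover_current_frame(own_pkt: Optional[RedundantSpatialAudioPacket],
                          later_pkt: Optional[RedundantSpatialAudioPacket],
                          plc_recover_audio: Callable[[int], bytes],
                          seq: int) -> Optional[Tuple[bytes, bytes]]:
    """Return (audio data, spatial information data) for frame `seq`, or None."""
    if own_pkt is not None:
        # Packet not lost: audio from the payload data field,
        # spatial information data from the extension header.
        return own_pkt.payload_audio, own_pkt.ext_spatial_info

    if later_pkt is None:
        return None                                 # nothing to recover from

    if later_pkt.xy == RedundancyType.AUDIO_AND_SPATIAL:
        # First field value 10: both redundant audio data and redundant
        # spatial information data sit in the redundant data storage field.
        red_audio, red_spatial = later_pkt.redundant_blocks
        return red_audio, red_spatial

    if later_pkt.xy == RedundancyType.SPATIAL_ONLY:
        # Second field value 11: only redundant spatial information data is
        # carried; the audio data comes from the packet loss processing result.
        red_spatial = later_pkt.redundant_blocks[0]
        return plc_recover_audio(seq), red_spatial

    return None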
An exemplary embodiment of the present disclosure provides a spatial audio data processing method, which may be applied to a second terminal, where the second terminal may be a receiving party of spatial audio data in an audio communication process. As shown in fig. 5, the method may include steps S501 to S502:
Step S501, receiving a redundant data packet of the current frame spatial audio data sent by the first terminal.
In an embodiment of the present disclosure, the redundant data packet of the current frame spatial audio data includes at least one of redundant data of the current frame audio data and redundant data of the current frame spatial information data, the current frame audio data, and the current frame spatial information data.
And S502, analyzing the target redundant data packet to obtain the current frame space audio data.
In an embodiment of the present disclosure, the target redundancy data packet includes a redundancy data packet of the current frame spatial audio data, or a redundancy data packet of the next frame or multiple frames of spatial audio data corresponding to the current frame spatial audio data.
In summary, according to the spatial audio data processing method provided by the embodiment of the disclosure, the audio data and the spatial information data can be obtained simultaneously during spatial audio data transmission, which reduces the transmission delay of the spatial audio data and improves its transmission efficiency. Moreover, when a data packet is lost, the current frame spatial audio data can be recovered from the redundant data packet of the next frame or multiple frames of spatial audio data, which improves the anti-packet-loss capability of the spatial audio data, especially of the spatial information data.
In an alternative embodiment, the second terminal in step S502 may parse the target redundancy data packet to obtain the current frame spatial audio data.
The process of the second terminal analyzing the target redundant data packet to obtain the current frame spatial audio data may include the following optional embodiments:
in an alternative embodiment, if the redundant data packet of the current frame spatial audio data is not lost, the process of the second terminal analyzing the target redundant data packet to obtain the current frame spatial audio data may include: and analyzing the redundant data packet of the current frame space audio data to obtain the current frame space audio data.
The process of analyzing the redundant data packet of the current frame of spatial audio data and obtaining the current frame of spatial audio data may include: analyzing the redundant data packet of the current frame space audio data, acquiring the current frame audio data from the audio data storage field corresponding to the second data encapsulation format, and acquiring the current frame space information data from the space information data storage field corresponding to the second data encapsulation format, so as to obtain the current frame space audio data.
Further, the second terminal may also obtain the redundant data of the current frame spatial audio data carried in the redundant data packet of the current frame spatial audio data. The redundant data of the current frame spatial audio data may include one or more previous frames of spatial audio data, or one or more previous frames of spatial information data. If the redundant data packet of the previous frame or frames of spatial audio data is lost, the previous frame or frames of spatial audio data or spatial information data may be recovered by using the redundant data of the current frame spatial audio data; if the redundant data packet of the previous frame or frames of spatial audio data is not lost, the redundant data of the current frame spatial audio data can be discarded, thereby performing data deduplication.
In an alternative embodiment, if the redundant data packet of the current frame spatial audio data is lost, the process of the second terminal analyzing the target redundant data packet to obtain the current frame spatial audio data may include: and analyzing a redundant data packet of the next frame or multiple frames of spatial audio data corresponding to the spatial audio data of the current frame to acquire the spatial audio data of the current frame.
Optionally, the process of analyzing the redundant data packet of the next frame or multiple frames of spatial audio data corresponding to the current frame of spatial audio data to obtain the current frame of spatial audio data may include:
analyzing the redundant data packet of the next frame or multiple frames of spatial audio data corresponding to the current frame spatial audio data; if the field value of the redundant information identification field corresponding to the second data encapsulation format is the first field value, this indicates that the redundant data type carried in the redundant data packet of the next frame or multiple frames of spatial audio data includes the current frame audio data and the current frame spatial information data, and the current frame spatial audio data is acquired from the redundant data storage field corresponding to the second data encapsulation format.
Optionally, the process of analyzing the redundant data packet of the next frame or multiple frames of spatial audio data corresponding to the current frame spatial audio data to obtain the current frame spatial audio data may also include: analyzing the redundant data packet of the next frame or multiple frames of spatial audio data corresponding to the current frame spatial audio data; if the field value of the redundant information identification field corresponding to the second data encapsulation format is the second field value, this indicates that the redundant data type carried in the redundant data packet of the next frame or multiple frames of spatial audio data includes the current frame spatial information data, and the current frame spatial information data is acquired from the redundant data storage field corresponding to the second data encapsulation format; acquiring the current frame audio data corresponding to the current frame spatial audio data according to the packet loss processing result; and combining the current frame audio data and the current frame spatial information data to obtain the current frame spatial audio data. The packet loss processing result may be current frame audio data determined according to a data recovery algorithm, and the data recovery algorithm may be in-band forward error correction or data retransmission.
Exemplary apparatus
Having described the method of exemplary embodiments of the present disclosure, next, an apparatus of exemplary embodiments of the present disclosure will be described.
Exemplary embodiments of the present disclosure first provide a spatial audio data processing apparatus that may be applied to a first terminal. As shown in fig. 6, the spatial audio data processing apparatus 600 may include:
an obtaining module 601 configured to obtain a current frame spatial audio data packet, where the spatial audio data packet includes current frame spatial audio data encapsulated according to a first data encapsulation format, and the current frame spatial audio data includes current frame audio data and current frame spatial information data;
a first parsing module 602 configured to parse the current frame spatial audio data from the current frame spatial audio data packet, and determine at least one of redundant data of the current frame audio data and redundant data of the current frame spatial information data according to the current frame spatial audio data;
The encapsulating module 603 is configured to encapsulate at least one of the redundant data of the current frame audio data and the redundant data of the current frame spatial information data, the current frame audio data and the current frame spatial information data according to the second data encapsulation format, so as to obtain a redundant data packet of the current frame spatial audio data;
and a sending module 604 configured to send the redundant data packet of the current frame spatial audio data to the second terminal.
In an alternative embodiment, the obtaining module 601 is configured to:
collecting current frame space audio data;
storing the current frame audio data in an audio data storage field corresponding to the first data encapsulation format, and storing the current frame spatial information data in a spatial information data storage field corresponding to the first data encapsulation format, so as to obtain a current frame spatial audio data packet.
In an alternative embodiment, the encapsulation module 603 is configured to:
storing the current frame audio data in an audio data storage field corresponding to the second data encapsulation format, storing the current frame spatial information data in a spatial information data storage field corresponding to the second data encapsulation format, and storing at least one of redundant data of the current frame audio data and redundant data of the current frame spatial information data in a redundant data storage field corresponding to the second data encapsulation format to obtain a redundant data packet of the current frame spatial audio data.
In an alternative embodiment, as shown in fig. 6, the spatial audio data processing device 600 further includes:
The updating module 605 is configured to update the field value of the redundant information identification field corresponding to the second data encapsulation format.
In an alternative embodiment, update module 605 is configured to:
if the redundant data storage field comprises redundant data of the current frame audio data and redundant data of the current frame space information data, updating a field value of the redundant information identification field to be a first field value; or,
and if the redundant data storage field comprises redundant data of the current frame space information data, updating the field value of the redundant information identification field to be a second field value.
In an alternative embodiment, in the spatial audio data processing device 600, the redundant data of the current frame audio data includes at least one frame of audio data located before the current frame audio data.
In an alternative embodiment, in the spatial audio data processing device 600, the redundant data of the current frame spatial information data includes at least one frame of spatial information data located before the current frame spatial information data.
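As a purely illustrative sketch, the four basic modules of the apparatus 600 described above could be composed as follows around the encapsulate() sketch given earlier; the class name, the transport callback, and the choice of the previous frame as redundancy are editorial assumptions, not part of the disclosure.

from typing import Callable, Optional

class SpatialAudioSender:
    """Toy composition of the obtaining/parsing/encapsulating/sending modules."""
    def __init__(self, transport: Callable[[RedundantSpatialAudioPacket], None]):
        self.transport = transport          # stands in for the sending module's network path
        self.prev_audio: Optional[bytes] = None
        self.prev_spatial: Optional[bytes] = None
        self.seq = 0

    def process_frame(self, audio: bytes, spatial_info: bytes) -> None:
        # obtaining + first parsing modules: take the current frame and choose
        # its redundancy (here simply the previous frame, an assumed policy)
        red_audio, red_spatial = self.prev_audio, self.prev_spatial
        # encapsulating module: build the redundant data packet
        pkt = encapsulate(self.seq, audio, spatial_info, red_audio, red_spatial)
        # sending module: hand the packet to the second terminal's transport
        self.transport(pkt)
        self.prev_audio, self.prev_spatial = audio, spatial_info
        self.seq += 1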
Exemplary embodiments of the present disclosure further provide a spatial audio data processing apparatus that may be applied to a second terminal. As shown in fig. 7, the spatial audio data processing apparatus 700 may include:
A receiving module 701, configured to receive a redundancy data packet of current frame spatial audio data sent by the first terminal, where the redundancy data packet of current frame spatial audio data includes at least one of redundancy data of current frame audio data and redundancy data of current frame spatial information data, the current frame audio data, and the current frame spatial information data;
the second parsing module 702 is configured to parse a target redundancy data packet to obtain the current frame spatial audio data, where the target redundancy data packet includes a redundancy data packet of the current frame spatial audio data, or a redundancy data packet of next frame or multiple frames of spatial audio data corresponding to the current frame spatial audio data.
In an alternative embodiment, if the redundant packet of the spatial audio data of the current frame is not lost, the second parsing module 702 is configured to:
and analyzing the redundant data packet of the current frame space audio data to obtain the current frame space audio data.
In an alternative embodiment, the second parsing module 702 is configured to:
analyzing the redundant data packet of the current frame space audio data, acquiring the current frame audio data from the audio data storage field corresponding to the second data encapsulation format, and acquiring the current frame space information data from the space information data storage field corresponding to the second data encapsulation format, so as to obtain the current frame space audio data.
In an alternative embodiment, if the redundant packet of the current frame spatial audio data is lost, the second parsing module 702 is configured to:
and analyzing a redundant data packet of the next frame or multiple frames of spatial audio data corresponding to the spatial audio data of the current frame to acquire the spatial audio data of the current frame.
In an alternative embodiment, the second parsing module 702 is configured to:
analyzing a redundant data packet of next frame or multi-frame space audio data corresponding to the current frame space audio data, and if the field value of the redundant information identification field corresponding to the second data encapsulation format is the first field value, acquiring the current frame space audio data from the redundant data storage field corresponding to the second data encapsulation format.
In an alternative embodiment, the second parsing module 702 is configured to:
analyzing a redundant data packet of next frame or multi-frame space audio data corresponding to the current frame space audio data, and if the field value of the redundant information identification field corresponding to the second data encapsulation format is a second field value, acquiring the current frame space information data from the redundant data storage field corresponding to the second data encapsulation format;
acquiring current frame audio data corresponding to the current frame spatial audio data obtained according to the packet loss processing result;
And combining the current frame audio data and the current frame space information data to obtain the current frame space audio data.
In addition, other specific details of the embodiments of the present disclosure are described in the foregoing embodiments of the method, and are not described herein.
Exemplary storage Medium
A storage medium according to an exemplary embodiment of the present disclosure is described below.
In the present exemplary embodiment, the above-described method may be implemented by a program product that includes program code and is stored on, for example, a portable compact disc read-only memory (CD-ROM), and that can be run on a device such as a personal computer. However, the program product of the present disclosure is not limited thereto; in this document, a readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium can be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium would include the following: an electrical connection having one or more wires, a portable disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The computer readable signal medium may include a data signal propagated in baseband or as part of a carrier wave with readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations of the present disclosure may be written in any combination of one or more programming languages, including an object-oriented programming language such as Java or C++ and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (e.g., via the Internet using an Internet service provider).
Exemplary electronic device
An electronic device of an exemplary embodiment of the present disclosure is described with reference to fig. 8.
The electronic device 800 shown in fig. 8 is merely an example and should not be construed to limit the functionality and scope of use of embodiments of the present disclosure in any way.
As shown in fig. 8, the electronic device 800 is embodied in the form of a general purpose computing device. Components of electronic device 800 may include, but are not limited to: at least one processing unit 810, at least one memory unit 820, a bus 830 connecting the different system components (including memory unit 820 and processing unit 810), and a display unit 840.
The storage unit stores program code that can be executed by the processing unit 810, so that the processing unit 810 performs the steps according to the various exemplary embodiments of the present disclosure described in the above section of this specification. For example, the processing unit 810 may perform the method steps described in the above exemplary method embodiments.
The storage unit 820 may include volatile storage units, such as a random access memory (RAM) 821 and/or a cache memory 822, and may further include a read-only memory (ROM) 823.
The storage unit 820 may also include a program/utility 824 having a set (at least one) of program modules 825, such program modules 825 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each or some combination of which may include an implementation of a network environment.
Bus 830 may include a data bus, an address bus, and a control bus.
The electronic device 800 may also communicate with one or more external devices 900 (e.g., a keyboard, a pointing device, a Bluetooth device, etc.) via an input/output (I/O) interface 850. The electronic device 800 further comprises a display unit 840 connected to the input/output (I/O) interface 850 for display. The electronic device 800 may also communicate with one or more networks, such as a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet, through a network adapter 860. As shown, the network adapter 860 communicates with the other modules of the electronic device 800 over the bus 830. It should be appreciated that, although not shown, other hardware and/or software modules may be used in conjunction with the electronic device 800, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and the like.
It should be noted that while several modules or sub-modules of the apparatus are mentioned in the detailed description above, such partitioning is merely exemplary and not mandatory. Indeed, the features and functionality of two or more units/modules described above may be embodied in one unit/module in accordance with embodiments of the present disclosure. Conversely, the features and functions of one unit/module described above may be further divided and embodied by a plurality of units/modules.
Furthermore, although the operations of the methods of the present disclosure are depicted in the drawings in a particular order, this does not require or imply that these operations must be performed in that particular order, or that all of the illustrated operations must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps may be combined into one step, and/or one step may be decomposed into multiple steps.
While the spirit and principles of the present disclosure have been described with reference to several particular embodiments, it is to be understood that the present disclosure is not limited to the particular embodiments disclosed, nor does the division into aspects imply that features in these aspects cannot be combined; this division is adopted only for convenience of description. The disclosure is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.

Claims (28)

1. A method for spatial audio data processing, the method being applied to a first terminal, the method comprising:
acquiring a current frame space audio data packet, wherein the space audio data packet comprises current frame space audio data encapsulated according to a first data encapsulation format, and the current frame space audio data comprises current frame audio data and current frame space information data;
Analyzing the current frame space audio data from the current frame space audio data packet, and determining at least one of redundant data of the current frame audio data and redundant data of the current frame space information data according to the current frame space audio data;
encapsulating at least one of the redundant data of the current frame audio data and the redundant data of the current frame spatial information data, the current frame audio data and the current frame spatial information data according to a second data encapsulation format to obtain a redundant data packet of the current frame spatial audio data;
and sending the redundant data packet of the current frame space audio data to a second terminal.
2. The method of claim 1, wherein said obtaining a current frame spatial audio data packet comprises:
collecting current frame space audio data;
storing the current frame audio data in an audio data storage field corresponding to the first data encapsulation format, and storing the current frame spatial information data in a spatial information data storage field corresponding to the first data encapsulation format, so as to obtain a current frame spatial audio data packet.
3. The method of claim 1, wherein encapsulating at least one of the redundant data of the current frame audio data and the redundant data of the current frame spatial information data, the current frame audio data, and the current frame spatial information data in the second data encapsulation format to obtain the redundant data packet of the current frame spatial audio data comprises:
Storing the current frame audio data in an audio data storage field corresponding to the second data encapsulation format, storing the current frame spatial information data in a spatial information data storage field corresponding to the second data encapsulation format, and storing at least one of redundant data of the current frame audio data and redundant data of the current frame spatial information data in a redundant data storage field corresponding to the second data encapsulation format to obtain a redundant data packet of the current frame spatial audio data.
4. The method of claim 3, wherein after storing at least one of the redundant data of the current frame audio data and the redundant data of the current frame spatial information data in the redundant data storage field corresponding to the second data encapsulation format, the method further comprises:
and updating the field value of the redundant information identification field corresponding to the second data encapsulation format.
5. The method of claim 4, wherein updating the field value of the redundant information identification field corresponding to the second data encapsulation format comprises:
if the redundant data storage field comprises redundant data of the current frame audio data and redundant data of the current frame space information data, updating a field value of the redundant information identification field to be a first field value; or,
And if the redundant data storage field comprises redundant data of the current frame space information data, updating the field value of the redundant information identification field to be a second field value.
6. The method according to claim 1, wherein the method further comprises: the redundant data of the current frame of audio data includes at least one frame of audio data preceding the current frame of audio data.
7. The method according to claim 1, wherein the method further comprises: the redundant data of the current frame space information data includes at least one frame space information data located before the current frame space information data.
8. A method for spatial audio data processing, the method being applied to a second terminal, the method comprising:
receiving a redundant data packet of current frame space audio data sent by a first terminal, wherein the redundant data packet of the current frame space audio data comprises at least one of redundant data of the current frame audio data and redundant data of current frame space information data, the current frame audio data and the current frame space information data;
and analyzing a target redundant data packet to acquire the current frame of spatial audio data, wherein the target redundant data packet comprises a redundant data packet of the current frame of spatial audio data or a redundant data packet of next frame or multi-frame spatial audio data corresponding to the current frame of spatial audio data.
9. The method of claim 8, wherein if the redundancy data packet of the current frame spatial audio data is not lost, the parsing the target redundancy data packet to obtain the current frame spatial audio data comprises:
and analyzing the redundant data packet of the current frame space audio data to obtain the current frame space audio data.
10. The method of claim 9, wherein said parsing the redundant data packets of the current frame spatial audio data to obtain the current frame spatial audio data comprises:
analyzing the redundant data packet of the current frame space audio data, acquiring the current frame audio data from an audio data storage field corresponding to a second data encapsulation format, and acquiring the current frame space information data from a space information data storage field corresponding to the second data encapsulation format to obtain the current frame space audio data.
11. The method of claim 8, wherein if the redundancy data packet of the current frame spatial audio data is lost, the parsing the target redundancy data packet to obtain the current frame spatial audio data comprises:
and analyzing a redundant data packet of the next frame or multiple frames of spatial audio data corresponding to the current frame of spatial audio data to acquire the current frame of spatial audio data.
12. The method of claim 11, wherein parsing the redundant data packet of the next frame or frames of spatial audio data corresponding to the current frame of spatial audio data to obtain the current frame of spatial audio data comprises:
analyzing a redundant data packet of the next frame or multi-frame spatial audio data corresponding to the current frame spatial audio data, and if the field value of the redundant information identification field corresponding to the second data encapsulation format is the first field value, acquiring the current frame spatial audio data from the redundant data storage field corresponding to the second data encapsulation format.
13. The method of claim 11, wherein parsing the redundant data packet of the next frame or frames of spatial audio data corresponding to the current frame of spatial audio data to obtain the current frame of spatial audio data comprises:
analyzing a redundant data packet of next frame or multi-frame space audio data corresponding to the current frame space audio data, and if the field value of the redundant information identification field corresponding to a second data encapsulation format is a second field value, acquiring the current frame space information data from a redundant data storage field corresponding to the second data encapsulation format;
Acquiring current frame audio data corresponding to the current frame spatial audio data according to a packet loss processing result;
and combining the current frame audio data and the current frame space information data to obtain the current frame space audio data.
14. A spatial audio data processing device, the device being applied to a first terminal, the device comprising:
the system comprises an acquisition module, a first data encapsulation module and a second data encapsulation module, wherein the acquisition module is configured to acquire a current frame space audio data packet, the space audio data packet comprises current frame space audio data encapsulated according to a first data encapsulation format, and the current frame space audio data comprises current frame audio data and current frame space information data;
a first parsing module configured to parse the current frame spatial audio data from the current frame spatial audio data packet, and determine at least one of redundant data of the current frame audio data and redundant data of the current frame spatial information data according to the current frame spatial audio data;
The encapsulation module is configured to encapsulate at least one of the redundant data of the current frame audio data and the redundant data of the current frame spatial information data, the current frame audio data and the current frame spatial information data according to a second data encapsulation format to obtain a redundant data packet of the current frame spatial audio data;
And the sending module is configured to send the redundant data packet of the current frame space audio data to the second terminal.
15. The apparatus of claim 14, wherein the acquisition module is configured to:
collecting current frame space audio data;
storing the current frame audio data in an audio data storage field corresponding to the first data encapsulation format, and storing the current frame spatial information data in a spatial information data storage field corresponding to the first data encapsulation format, so as to obtain a current frame spatial audio data packet.
16. The apparatus of claim 14, wherein the encapsulation module is configured to:
storing the current frame audio data in an audio data storage field corresponding to the second data encapsulation format, storing the current frame spatial information data in a spatial information data storage field corresponding to the second data encapsulation format, and storing at least one of redundant data of the current frame audio data and redundant data of the current frame spatial information data in a redundant data storage field corresponding to the second data encapsulation format to obtain a redundant data packet of the current frame spatial audio data.
17. The apparatus of claim 16, wherein the apparatus further comprises:
and the updating module is configured to update the field value of the redundant information identification field corresponding to the second data encapsulation format.
18. The apparatus of claim 17, wherein the update module is configured to:
if the redundant data storage field comprises redundant data of the current frame audio data and redundant data of the current frame space information data, updating a field value of the redundant information identification field to be a first field value; or,
and if the redundant data storage field comprises redundant data of the current frame space information data, updating the field value of the redundant information identification field to be a second field value.
19. The apparatus of claim 14, wherein the apparatus further comprises: the redundant data of the current frame of audio data includes at least one frame of audio data preceding the current frame of audio data.
20. The apparatus of claim 14, wherein the apparatus further comprises: the redundant data of the current frame space information data includes at least one frame space information data located before the current frame space information data.
21. A spatial audio data processing device, the device being applied to a second terminal, the device comprising:
the receiving module is configured to receive a redundant data packet of current frame space audio data sent by the first terminal, wherein the redundant data packet of the current frame space audio data comprises at least one of redundant data of the current frame audio data and redundant data of the current frame space information data, the current frame audio data and the current frame space information data;
the second analyzing module is configured to analyze a target redundant data packet to obtain the current frame of spatial audio data, wherein the target redundant data packet comprises a redundant data packet of the current frame of spatial audio data or a redundant data packet of next frame or multi-frame spatial audio data corresponding to the current frame of spatial audio data.
22. The apparatus of claim 21, wherein if the redundant data packet of the current frame spatial audio data is not lost, the second parsing module is configured to:
and analyzing the redundant data packet of the current frame space audio data to obtain the current frame space audio data.
23. The apparatus of claim 22, wherein the second parsing module is configured to:
Analyzing the redundant data packet of the current frame space audio data, acquiring the current frame audio data from an audio data storage field corresponding to a second data encapsulation format, and acquiring the current frame space information data from a space information data storage field corresponding to the second data encapsulation format to obtain the current frame space audio data.
24. The apparatus of claim 21, wherein if the redundant data packet of the current frame spatial audio data is lost, the second parsing module is configured to:
and analyzing a redundant data packet of the next frame or multiple frames of spatial audio data corresponding to the current frame of spatial audio data to acquire the current frame of spatial audio data.
25. The apparatus of claim 24, wherein the second parsing module is configured to:
analyzing a redundant data packet of the next frame or multi-frame spatial audio data corresponding to the current frame spatial audio data, and if the field value of the redundant information identification field corresponding to the second data encapsulation format is the first field value, acquiring the current frame spatial audio data from the redundant data storage field corresponding to the second data encapsulation format.
26. The apparatus of claim 24, wherein the second parsing module is configured to:
analyzing a redundant data packet of next frame or multi-frame space audio data corresponding to the current frame space audio data, and if the field value of the redundant information identification field corresponding to a second data encapsulation format is a second field value, acquiring the current frame space information data from a redundant data storage field corresponding to the second data encapsulation format;
acquiring current frame audio data corresponding to the current frame spatial audio data according to a packet loss processing result;
and combining the current frame audio data and the current frame space information data to obtain the current frame space audio data.
27. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the method of any one of claims 1 to 13.
28. An electronic device, comprising:
a processor; and
a memory for storing executable instructions of the processor;
wherein the processor is configured to perform the method of any one of claims 1 to 13 via execution of the executable instructions.
CN202110948128.0A 2021-08-18 2021-08-18 Spatial audio data processing method and device, storage medium and electronic equipment Active CN113660063B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110948128.0A CN113660063B (en) 2021-08-18 2021-08-18 Spatial audio data processing method and device, storage medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110948128.0A CN113660063B (en) 2021-08-18 2021-08-18 Spatial audio data processing method and device, storage medium and electronic equipment

Publications (2)

Publication Number Publication Date
CN113660063A CN113660063A (en) 2021-11-16
CN113660063B (en) 2023-12-08

Family

ID=78480837

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110948128.0A Active CN113660063B (en) 2021-08-18 2021-08-18 Spatial audio data processing method and device, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN113660063B (en)


Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11399253B2 (en) * 2019-06-06 2022-07-26 Insoundz Ltd. System and methods for vocal interaction preservation upon teleportation

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1714600A (en) * 2002-10-15 2005-12-28 韩国电子通信研究院 Method for generating and consuming 3d audio scene with extended spatiality of sound source
WO2012097549A1 (en) * 2011-01-18 2012-07-26 中兴通讯股份有限公司 Method and system for sharing audio and/or video
CN106774930A (en) * 2016-12-30 2017-05-31 中兴通讯股份有限公司 A kind of data processing method, device and collecting device
CN110191745A (en) * 2017-01-31 2019-08-30 微软技术许可有限责任公司 It is transmitted as a stream using the game of space audio
CN108111832A (en) * 2017-12-25 2018-06-01 北京麒麟合盛网络技术有限公司 The asynchronous interactive method and system of augmented reality AR videos
CN109887514A (en) * 2019-01-10 2019-06-14 广州视源电子科技股份有限公司 Audio frequency transmission method and device
CN109862440A (en) * 2019-02-22 2019-06-07 深圳市凯迪仕智能科技有限公司 Audio video transmission forward error correction, device, computer equipment and storage medium
CN110890945A (en) * 2019-11-20 2020-03-17 腾讯科技(深圳)有限公司 Data transmission method, device, terminal and storage medium
WO2021098405A1 (en) * 2019-11-20 2021-05-27 腾讯科技(深圳)有限公司 Data transmission method and apparatus, terminal, and storage medium
CN111193966A (en) * 2019-12-25 2020-05-22 北京佳讯飞鸿电气股份有限公司 Audio data transmission method and device, computer equipment and storage medium
CN111314335A (en) * 2020-02-10 2020-06-19 腾讯科技(深圳)有限公司 Data transmission method, device, terminal, storage medium and system

Also Published As

Publication number Publication date
CN113660063A (en) 2021-11-16

Similar Documents

Publication Publication Date Title
JP4363847B2 (en) Digital TV application protocol for interactive TV
US11792130B2 (en) Audio/video communication method, terminal, server, computer device, and storage medium
US20230118176A1 (en) Data transmission method and apparatus, computer-readable storage medium, electronic device, and computer program product
WO2020248649A1 (en) Audio and video data synchronous playback method, apparatus and system, electronic device and medium
US20230060066A1 (en) Data transmission method and apparatus, computer readable medium, and electronic device
CN104396263A (en) Methods and systems for real-time transmuxing of streaming media content
US20220256226A1 (en) Video data processing method, electronic device and computer-readable medium
US20230071243A1 (en) Conserving network resources during transmission of packets of interactive services
CN112953850B (en) Data transmission method and device, computer readable medium and electronic equipment
CN111669645B (en) Video playing method and device, electronic equipment and storage medium
CN113727184B (en) Video playing method, device, system, storage medium and electronic equipment
US20210375326A1 (en) Method, device, and computer program product for storing and providing video
CN114221954A (en) File transmission method and device, electronic equipment and storage medium
CN110545230A (en) method and device for forwarding VXLAN message
CN113660063B (en) Spatial audio data processing method and device, storage medium and electronic equipment
CN113115120A (en) Video slicing method and device, electronic equipment and storage medium
CN110996181A (en) Unified packaging method for multi-source content data
CN115766628A (en) Message combination method, device, equipment and storage medium
CN115022725A (en) Video playing method and device
US10116415B2 (en) Transmission device, receiving device, transmission method, and receiving method
CN112153322B (en) Data distribution method, device, equipment and storage medium
CN110855645B (en) Streaming media data playing method and device
CN113676397B (en) Spatial position data processing method and device, storage medium and electronic equipment
WO2024067405A1 (en) Video transmission method and apparatus, and electronic device, storage medium and program product
CN115942000B (en) H.264 format video stream transcoding method, device, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant