CN117676185A - Packet loss compensation method and device for audio data and related equipment

Info

Publication number: CN117676185A
Application number: CN202311661514.7A
Authority: CN
Inventors: 戴祖华; 葛文婷
Original assignee: Zhejiang Zhonggan Microelectronics Co ltd; Zgmicro Corp
Current assignee: Zhejiang Zhonggan Microelectronics Co ltd; Zgmicro Corp
Priority date: 2023-12-05
Filing date: 2023-12-05
Publication date: 2024-03-08

Abstract

The disclosure provides a packet loss compensation method, device and related equipment for audio data, and relates to the technical field of audio coding and decoding, wherein the method comprises the following steps: under the condition that the target audio data packet is detected to be missing in the cache, N reference audio data packets are obtained from the cache, wherein the audio playing time of any one of the reference audio data packets is earlier than the audio playing time of the target audio data packet, and the N reference audio data packets are continuous and adjacent to the target audio data packet; sub-band decomposition is carried out on the reference audio data packet to obtain M reference sub-band data sets, wherein M is an integer greater than 1; and predicting the target audio data packet according to the M reference subband data sets to obtain M subband compensating audio data packets, synthesizing the M subbands to obtain the compensating audio data packets, and storing the compensating audio data packets in the cache. The method and the device can enable the predicted compensation audio data packet to be more accurate.

Description

Packet loss compensation method and device for audio data and related equipment

Technical Field

The disclosure relates to the technical field of audio encoding and decoding, and in particular relates to a packet loss compensation method and device for audio data and related equipment.

Background

During audio data transmission, audio data packets may be lost due to signal attenuation, interference, network congestion, or other factors. Such packet loss may cause noticeable interruption or loss of sound quality in the audio signal and seriously affect the user's listening experience.

In the related art, a linear prediction algorithm is used to compensate for a lost audio data packet; that is, the lost audio data packet is estimated by analyzing the received audio data packets and extracting characteristic values of the audio signal, such as formants. In practice, however, the accuracy of audio data packets estimated in this way is low when dealing with diversified audio signals (such as the multimedia transmission of audio and video) and varying communication environments (such as a network environment that fluctuates greatly).

Disclosure of Invention

The disclosure aims to provide a packet loss compensation method, device and related equipment for audio data, which are used for solving the technical problem that the accuracy of an audio data packet estimated in a complex application environment by a related technology is low.

In a first aspect, an embodiment of the present disclosure provides a packet loss compensation method for audio data, including:

under the condition that a target audio data packet is detected to be missing in a cache, N reference audio data packets are obtained from the cache, wherein the audio playing time of any reference audio data packet is earlier than the audio playing time of the target audio data packet, the N reference audio data packets are continuous and adjacent to the target audio data packet, and N is an integer greater than 1;

respectively carrying out sub-band decomposition on the N reference audio data packets to obtain M reference sub-band data sets, wherein M is an integer greater than 1;

and predicting the target audio data packet according to the M reference subband data sets to obtain a compensation audio data packet, and storing the compensation audio data packet into the cache.

In one embodiment, the predicting the target audio data packet according to the M reference subband data sets to obtain a compensated audio data packet includes:

respectively carrying out downsampling treatment on the M reference subband data sets to obtain M downsampled data sets, wherein the multiple of the downsampling treatment is more than 1 and less than or equal to M;

and predicting the target audio data packet according to the M downsampled data sets to obtain a compensated audio data packet.

In one embodiment, the predicting the target audio data packet according to the M downsampled data sets to obtain the compensated audio data packet includes:

predicting the target audio data packet according to M downsampled data sets to obtain M target subbands, wherein the M target subbands are in one-to-one correspondence with the M downsampled data sets;

and carrying out data synthesis according to the M target sub-bands to obtain the compensation audio data packet.

In one embodiment, the synthesizing data according to the M target subbands to obtain the compensated audio data packet includes:

respectively carrying out up-sampling treatment on the M target sub-bands to obtain M up-sampling sub-bands, wherein the up-sampling multiple of each target sub-band is the same as the down-sampling multiple of the corresponding reference sub-band data set;

and carrying out data synthesis on the M up-sampled sub-bands to obtain the compensation audio data packet.

In one embodiment, the sub-band decomposing the N reference audio data packets to obtain M reference sub-band data sets includes:

sub-band decomposition is carried out on each reference audio data packet to obtain M pieces of reference sub-band data in one-to-one correspondence with M sub-band positions, wherein the N pieces of reference sub-band data of the N reference audio data packets at the same sub-band position form one reference sub-band data set, so that the reference sub-band data of the N reference audio data packets form M reference sub-band data sets;

the predicting the target audio data packet according to the M reference subband data sets to obtain a compensated audio data packet, including:

respectively carrying out downsampling treatment on the M reference subband data sets to obtain M downsampled data sets corresponding to the M subband positions one by one;

predicting the audio frequency sub-bands of the target audio frequency data packet at the same sub-band position according to each downsampling data set to obtain corresponding target sub-bands, wherein M sub-band positions are in one-to-one correspondence with M audio frequency sub-bands of the target audio frequency data packet;

and obtaining the compensation audio data packet according to the M target sub-bands.

In one embodiment, the buffer stores a plurality of audio data packets, each audio data packet corresponds to a serial number, and serial numbers corresponding to different audio data packets in the plurality of audio data packets are different;

the method further comprises the steps of:

and under the condition that a plurality of serial numbers corresponding to a plurality of audio data packets stored in the cache are not continuous, determining to detect that a target audio data packet is missing in the cache, wherein the target audio data packet is an audio data packet corresponding to a missing serial number in the plurality of serial numbers.

In one embodiment, after the storing the compensated audio data packets in the buffer, the method further comprises:

obtaining output configuration information, wherein the output configuration information is used for indicating time delay and/or audio duration of target audio data;

extracting the target audio data from the cache according to the output configuration information;

outputting the target audio data.

In a second aspect, an embodiment of the present disclosure provides a packet loss compensation apparatus for audio data, the apparatus including:

the acquisition module is used for acquiring N reference audio data packets from the cache under the condition that the target audio data packet is detected to be missing in the cache, wherein the audio playing time of any reference audio data packet is earlier than the audio playing time of the target audio data packet, the N reference audio data packets are continuous and adjacent to the target audio data packet, and N is an integer greater than 1;

the decomposition module is used for respectively carrying out sub-band decomposition on the N reference audio data packets to obtain M reference sub-band data sets, wherein M is an integer greater than 1;

and the compensation module is used for predicting the target audio data packet according to the M reference subband data sets to obtain a compensation audio data packet, and storing the compensation audio data packet into the cache.

In a third aspect, an embodiment of the present disclosure further provides an electronic device, including a processor, a memory, and a computer program stored on the memory and executable on the processor, where the computer program when executed by the processor implements the steps of the packet loss compensation method for audio data described above.

In a fourth aspect, the embodiments of the present disclosure further provide a computer readable storage medium having a computer program stored thereon, the computer program implementing the steps of the packet loss compensation method for audio data described above when executed by a processor.

In the embodiment of the disclosure, when the loss of the audio data packet is detected, a plurality of reference audio data packets adjacent to the lost audio data packet are acquired from a cache, the plurality of reference audio data packets are converted into a plurality of reference sub-band data sets in a sub-band decomposition mode, and then the lost audio data packet is predicted according to the plurality of reference sub-band data sets; the frequency spectrum representation of the audio signal corresponding to the audio data packet can be simplified through the measure of sub-band decomposition, so that the data error of subsequent prediction is reduced, and the audio data packet obtained by prediction is more accurate.

Drawings

Fig. 1 is a flowchart of a packet loss compensation method for audio data according to an embodiment of the present disclosure;

Fig. 2 is a flowchart of an audio signal packet loss compensation method according to an embodiment of the present disclosure;

Fig. 3 is a schematic structural diagram of a packet loss compensation device for audio data according to an embodiment of the present disclosure;

Fig. 4 is a schematic diagram of an electronic device according to an embodiment of the disclosure.

Detailed Description

The following description of the technical solutions in the embodiments of the present disclosure will be made clearly and completely with reference to the accompanying drawings in the embodiments of the present disclosure, and it is apparent that the described embodiments are some embodiments of the present disclosure, but not all embodiments. All other embodiments, which can be made by one of ordinary skill in the art without inventive effort, based on the embodiments in this disclosure are intended to be within the scope of this disclosure.

An embodiment of the present disclosure provides a packet loss compensation method for audio data, as shown in fig. 1, where the packet loss compensation method for audio data includes:

step 101, acquiring N reference audio data packets from a cache under the condition that the target audio data packet is detected to be missing in the cache.

The audio playing time of any one of the reference audio data packets is earlier than the audio playing time of the target audio data packet, and the N reference audio data packets are continuous and adjacent to the target audio data packet.

The method disclosed by the disclosure is applied to an audio transmission scene, and particularly relates to transmission of an audio stream, for example: a first audio stream for music playback, a second audio stream for forming a video audio track, a third audio stream for audio-video telephony, etc.

In the present disclosure, an audio stream is transmitted in the form of a plurality of audio data packets, and the audio data packets are transmitted sequentially in the order of their corresponding audio playing times in the audio stream. For example, given an audio data packet A1 whose audio playing time is the first second, an audio data packet A2 whose audio playing time is the second second, and an audio data packet A3 whose audio playing time is the third second, the audio data packet A1 is transmitted first, followed by the audio data packet A2, and then the audio data packet A3.

The cache is provided to facilitate management of the plurality of audio data packets forming the corresponding audio stream, that is, to monitor in a timely manner whether any of the audio data packets corresponding to the audio stream is missing. If so, the missing audio data packet is compensated by executing the subsequent steps, thereby ensuring the integrity of the audio data packets stored in the cache and avoiding playback interruption or loss of sound quality in the audio signal extracted and output from the cache.

If a plurality of audio data packets corresponding to the audio stream are not missing, the following steps are not executed.

For example, suppose the audio data packets expected to be transmitted into the cache are, in order: the audio data packet A1, the audio data packet A2, the audio data packet A3, and an audio data packet A4 whose audio playing time is the fourth second. If the target audio data packet missing from the cache is the audio data packet A4, then the audio data packets A1, A2, and A3 can all be understood as reference audio data packets.
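The selection of reference packets in this example can be sketched as follows. This is a minimal illustration, not the disclosed implementation: it assumes the cache is a dictionary keyed by sequence numbers that increase by 1 per packet, and the function name `get_reference_packets` is hypothetical.

```python
from typing import Dict, List, Optional


def get_reference_packets(
    cache: Dict[int, bytes], target_seq: int, n: int
) -> Optional[List[bytes]]:
    """Return the n packets immediately preceding target_seq, oldest first.

    Returns None if any of them is absent from the cache, because the
    reference packets must be continuous and adjacent to the target packet.
    """
    if n <= 1:
        raise ValueError("N must be an integer greater than 1")
    wanted = range(target_seq - n, target_seq)
    if any(seq not in cache for seq in wanted):
        return None
    return [cache[seq] for seq in wanted]


# Packets A1..A3 are cached; A4 (assumed sequence number 4) is missing.
cache = {1: b"A1", 2: b"A2", 3: b"A3"}
references = get_reference_packets(cache, target_seq=4, n=3)
print(references)
```

Returning `None` when the preceding run is itself broken is a design choice of this sketch; the disclosure does not say how that case is handled.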

Step 102, respectively carrying out sub-band decomposition on the N reference audio data packets to obtain M reference sub-band data sets.

It should be noted that each of the N reference audio data packets is decomposed into the same number of subbands. For example, each of the N reference audio data packets is decomposed into 4 or 8 subbands (also called frequency bands); after the N reference audio data packets have each been decomposed, the N pieces of data corresponding to the same subband form one reference subband data set, in which case M is 4 or 8.

In one example, subband decomposition processing may be performed using a cosine modulation scheme of subband Coding (SBC).

Illustratively, the number of reference subbands included in each reference subband data set is greater than 2, such as 4, 5, 8, etc., and the specific number of reference subbands included in each reference subband data set is not limited in the present disclosure, and a user may adaptively select based on the spectral characteristics and transmission requirements of an actual audio stream.
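To make the idea of subband decomposition concrete, the sketch below splits one packet into M = 2 subbands with a two-band Haar filter bank. This is a deliberately simpler stand-in for the cosine-modulated SBC filter bank mentioned above, not the scheme of the disclosure; note that the Haar analysis step also halves the number of samples per band. The function names are hypothetical.

```python
import math
from typing import List, Tuple


def haar_analysis(samples: List[float]) -> Tuple[List[float], List[float]]:
    """Split samples into a low band and a high band (each half as long)."""
    if len(samples) % 2:
        raise ValueError("expected an even number of samples")
    s = 1.0 / math.sqrt(2.0)
    low = [(samples[i] + samples[i + 1]) * s for i in range(0, len(samples), 2)]
    high = [(samples[i] - samples[i + 1]) * s for i in range(0, len(samples), 2)]
    return low, high


def haar_synthesis(low: List[float], high: List[float]) -> List[float]:
    """Recombine the two bands into the original sample sequence."""
    s = 1.0 / math.sqrt(2.0)
    out: List[float] = []
    for lo, hi in zip(low, high):
        out.append((lo + hi) * s)
        out.append((lo - hi) * s)
    return out


packet = [1.0, 3.0, 2.0, 2.0]
low_band, high_band = haar_analysis(packet)
rebuilt = haar_synthesis(low_band, high_band)
print(low_band, high_band, rebuilt)
```

The synthesis function is included only to show that the split loses no information: recombining the two bands reproduces the packet up to floating-point rounding.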

Step 103, predicting the target audio data packet according to the M reference subband data sets to obtain a compensation audio data packet, and storing the compensation audio data packet into the cache.

In one example, the M reference subband data sets may be processed based on the Burg algorithm to obtain the compensated audio data packet. It should be noted that, in this example, although performing subband decomposition may cause a certain loss of processing efficiency (compared with directly inputting the N reference audio data packets into the Burg algorithm), the input to the Burg algorithm is the M reference subband data sets obtained by subband decomposition, and the decomposed subbands correspond to different frequency band components. This arrangement simplifies the spectral representation of the audio signal corresponding to the reference audio data packets and therefore reduces the interference that the Burg algorithm is subject to when predicting the target audio data packet from that audio signal, so that the Burg algorithm can analyze and process the portions of the audio signal in different frequency ranges more finely and output a more accurate compensated audio data packet.

For example, if the subband decomposition corresponds to 3 subband positions (i.e., M = 3) and there are 4 reference audio data packets (i.e., N = 4), subband decomposition is performed on the 4 reference audio data packets to obtain 4 groups of reference subbands A1[a11, a12, a13], A2[a21, a22, a23], A3[a31, a32, a33], A4[a41, a42, a43], where a11, a21, a31, a41 each correspond to the first subband position, a12, a22, a32, a42 each correspond to the second subband position, and a13, a23, a33, a43 each correspond to the third subband position;

in this example, three reference subband data sets may be obtained from the 4 groups of reference subbands, namely a reference subband data set [a11, a21, a31, a41] corresponding to the first subband position, a reference subband data set [a12, a22, a32, a42] corresponding to the second subband position, and a reference subband data set [a13, a23, a33, a43] corresponding to the third subband position;

by predicting based on the reference subband data set [a11, a21, a31, a41], a target subband b1 corresponding to the first subband position is obtained;

by predicting based on the reference subband data set [a12, a22, a32, a42], a target subband b2 corresponding to the second subband position is obtained;

by predicting based on the reference subband data set [a13, a23, a33, a43], a target subband b3 corresponding to the third subband position is obtained;

the above-mentioned compensated audio data packet can be obtained according to the target subband b1, the target subband b2, and the target subband b3.
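The regrouping in this example, from one group of subbands per packet to one data set per subband position, amounts to a transpose. A minimal sketch, using strings as placeholders for the subband data and a hypothetical function name:

```python
from typing import List, Sequence


def group_by_subband_position(per_packet: Sequence[Sequence[str]]) -> List[List[str]]:
    """Turn N groups of M subbands into M data sets of N subbands each."""
    m = len(per_packet[0])
    if any(len(group) != m for group in per_packet):
        raise ValueError("every packet must be decomposed into the same number of subbands")
    return [list(column) for column in zip(*per_packet)]


# N = 4 reference packets, each decomposed into M = 3 subbands.
per_packet = [
    ["a11", "a12", "a13"],  # A1
    ["a21", "a22", "a23"],  # A2
    ["a31", "a32", "a33"],  # A3
    ["a41", "a42", "a43"],  # A4
]
reference_sets = group_by_subband_position(per_packet)
for position, data_set in enumerate(reference_sets, start=1):
    print(position, data_set)
```

Each of the three resulting data sets is then the input for predicting one target subband (b1, b2, or b3).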

In this embodiment, introducing the downsampling step reduces the amount of computation in the prediction process (the amount of effective information contained in the downsampled data is correspondingly reduced), which not only improves prediction efficiency but also reduces the error accumulation caused by an excessive amount of computation, so that the output compensated audio data packet is more accurate.

In this way, the multiple of the downsampling is limited to be not more than the number of the sub-bands corresponding to the sub-band decomposition step, so that the problem of information loss caused by excessive downsampling is avoided, and the accuracy of the obtained prediction result can be ensured.

In one example, if the above prediction processing is performed using the Burg algorithm, although the Burg algorithm predicts short-time stationary signals very well, the more prediction points are input to the algorithm, the more errors accumulate, which may reduce the accuracy of the prediction result finally output. After the above downsampling processing, the accumulation of errors can be suppressed by reducing the number of prediction points input to the Burg algorithm, further improving the accuracy of the prediction result finally output by the algorithm.

It should be noted that, in some embodiments, if the foregoing prediction processing is performed based on the Burg algorithm, the subband decomposition step may be skipped: the downsampling processing is performed directly on the N reference audio data packets, and prediction is then performed according to the downsampled N reference audio data packets, which can also improve the accuracy of the prediction result finally output by the algorithm.

In application, for any two different reference subband data sets among the M reference subband data sets, the downsampling multiples applied to the two sets may be the same or different, which makes the downsampling stage of the method more flexible.

For example, if the number of subbands M is 4 and the downsampling multiple L is 4, the N reference audio data packets are decomposed into reference subband data sets C1, C2, C3, and C4; each of these is downsampled by a factor of L to obtain four corresponding downsampled data sets, and prediction is performed on each downsampled data set to obtain the corresponding target subbands t1, t2, t3, and t4.
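The downsampling step and its constraint (a multiple greater than 1 and no greater than M) can be sketched as below. The sketch assumes an integer multiple and plain decimation, that is, keeping every L-th sample; the disclosure does not specify the decimation method, and the function name is hypothetical.

```python
from typing import List


def downsample(data: List[float], factor: int, num_subbands: int) -> List[float]:
    """Keep every factor-th sample, enforcing 1 < factor <= num_subbands."""
    if not 1 < factor <= num_subbands:
        raise ValueError("downsampling multiple must be > 1 and <= M")
    return data[::factor]


# M = 4 subbands, downsampling multiple L = 4.
subband_samples = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
reduced = downsample(subband_samples, factor=4, num_subbands=4)
print(reduced)
```

Because each subband occupies roughly 1/M of the original bandwidth, keeping the multiple at or below M is what lets the decimated subband retain its content, which matches the stated reason for the limit.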

Illustratively, the M target subbands may be data synthesized by a synthesis filter to obtain the aforementioned compensated audio data packet.

In this embodiment, after the subbands in the reference subband data sets have been downsampled, the predicted target subbands are correspondingly upsampled to restore their data amount, so as to ensure data consistency between the compensated audio data packet obtained by the subsequent synthesis and the reference audio data packets.

In one example, data synthesis may be performed on M target subbands to obtain an initial audio packet, and then up-sampling is performed on the initial audio packet to obtain the compensated audio packet, which also ensures data consistency between the obtained compensated audio packet and a reference audio packet.
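How upsampling restores the data amount can be illustrated with zero insertion, a common way to raise the sample rate before a synthesis filter. This is an assumption made for illustration; the disclosure does not state which upsampling method is used, and the function name is hypothetical.

```python
from typing import List


def upsample(subband: List[float], factor: int) -> List[float]:
    """Insert factor - 1 zeros after every sample (zero-stuffing)."""
    if factor < 1:
        raise ValueError("upsampling multiple must be at least 1")
    out: List[float] = []
    for value in subband:
        out.append(value)
        out.extend([0.0] * (factor - 1))
    return out


original = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
decimated = original[::4]            # downsampled by 4: two samples remain
restored = upsample(decimated, 4)    # upsampled by 4: eight samples again
print(len(original), len(decimated), len(restored))
```

Using the same multiple in both directions is what keeps the length of the compensated packet equal to that of a reference packet; in this sketch the inserted zeros would then be smoothed by the synthesis filter.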

the method further comprises the steps of:

In this embodiment, different serial numbers are set for a plurality of audio data packets corresponding to the same audio stream, so that whether the audio data packets are missing in the buffer memory or not is rapidly identified by monitoring a plurality of serial numbers corresponding to a plurality of audio data packets stored in the buffer memory, and further data compensation of the missing audio data packets is timely completed, so that continuity of a plurality of audio data packets stored in the buffer memory is ensured, audio data extracted and output from the buffer memory has better quality and more reliable data continuity, and a user obtains better hearing experience.

For example, the sequence numbers of the plurality of audio data packets stored in the buffer may be set to increase or decrease continuously. In this case, if the sequence numbers corresponding to the audio data packets stored in the buffer are discontinuous or a sequence number is missing, it may be determined that packet loss has occurred during transmission of the audio stream, that is, that a target audio data packet has been detected as missing from the buffer.
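Detection by sequence-number continuity can be sketched as follows, assuming integer sequence numbers that increase by 1 per packet. The function name is hypothetical.

```python
from typing import Iterable, List


def find_missing_sequence_numbers(received: Iterable[int]) -> List[int]:
    """Return the sequence numbers absent between the smallest and largest received."""
    present = set(received)
    if not present:
        return []
    return [seq for seq in range(min(present), max(present) + 1) if seq not in present]


buffered = [1, 2, 4, 5, 7]
missing = find_missing_sequence_numbers(buffered)
print(missing)
```

A gap check of this kind only finds losses between packets that did arrive. Detecting that the newest expected packet is missing would additionally require knowing which sequence number is due next, which this sketch does not model.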

outputting the target audio data.

As described above, based on the foregoing measures of packet loss compensation, continuity of the audio data packets stored in the buffer memory is guaranteed, in this case, the user may specify a time delay and/or a playing duration (i.e., an audio duration) of the desired target audio data based on the actual requirement, and accordingly extract the corresponding target audio data from the buffer memory, so as to meet the audio playing requirement or the audio processing requirement under different application scenarios.

Illustratively, the formula of the Burg algorithm corresponding to the prediction process is as follows:

$$X_t = a_1 X_{t-1} + a_2 X_{t-2} + \cdots + a_p X_{t-p} + \varepsilon_t$$

where X_t is the predicted signal (i.e., a target subband forming the compensated audio data packet); X_{t-p} is a historical signal (i.e., a reference subband of a reference audio data packet, with X_{t-1} … X_{t-p} corresponding respectively to the N reference audio data packets); p is the order of the Burg algorithm; t indicates the sequence number of the predicted signal in the signal stream (i.e., the sequence number of the target audio data packet in the audio stream); a_p is the recursive prediction coefficient of the corresponding historical signal; and ε_t is the random disturbance term (i.e., noise) corresponding to the predicted signal.
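As an illustration of how such coefficients can be estimated and used, the sketch below implements the textbook form of Burg's method for an order-p autoregressive model, in which each new sample is predicted as a weighted sum of the p previous samples, and then extrapolates past the end of the history. It is not the implementation of the disclosure; the function names, the sign convention, and the sinusoidal test signal are assumptions made for the example.

```python
import math
from typing import List


def burg_ar_coefficients(samples: List[float], order: int) -> List[float]:
    """Estimate a_1..a_p such that x[t] is approximated by sum(a_i * x[t - i])."""
    n = len(samples)
    if not 0 < order < n:
        raise ValueError("order must satisfy 0 < order < len(samples)")
    forward = list(samples)    # forward prediction errors
    backward = list(samples)   # backward prediction errors
    poly = [1.0]               # error filter 1 + c_1 z^-1 + ...; a_i = -c_i
    for m in range(order):
        num = 0.0
        den = 0.0
        for t in range(m + 1, n):
            num += forward[t] * backward[t - 1]
            den += forward[t] ** 2 + backward[t - 1] ** 2
        k = -2.0 * num / den if den > 0.0 else 0.0  # reflection coefficient
        # Levinson-style update of the error-filter polynomial.
        ext = poly + [0.0]
        poly = [ext[i] + k * ext[m + 1 - i] for i in range(m + 2)]
        # Lattice update, walking backwards so backward[t - 1] is still the
        # previous stage's value when it is read.
        for t in range(n - 1, m, -1):
            f_old, b_old = forward[t], backward[t - 1]
            forward[t] = f_old + k * b_old
            backward[t] = b_old + k * f_old
    return [-c for c in poly[1:]]


def extrapolate(samples: List[float], coeffs: List[float], count: int) -> List[float]:
    """Predict count further samples, feeding each prediction back as history."""
    history = list(samples)
    for _ in range(count):
        history.append(sum(c * history[-1 - i] for i, c in enumerate(coeffs)))
    return history[len(samples):]


# History: 17 samples of a sinusoid, standing in for one downsampled subband.
history = [math.cos(math.pi * t / 4) for t in range(17)]
coeffs = burg_ar_coefficients(history, order=2)
predicted = extrapolate(history, coeffs, count=3)
print(coeffs, predicted)
```

A pure sinusoid obeys an exact order-2 recursion, so it makes a convenient check; real subband data would only be approximately stationary, and the order p would need tuning.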

By applying the method disclosed by the invention, the packet loss can be effectively compensated while the audio quality is maintained, so that more stable and consistent audio transmission experience is provided for users, and the method has the advantages of applicability, self-adaption, high efficiency and the like in a multimedia data transmission scene.

For ease of understanding, examples are illustrated below:

referring to fig. 2, fig. 2 shows an audio signal packet loss compensation method, specifically:

step 1: inputting an audio data packet corresponding to the input audio into a cache; wherein, the buffer is used for storing the history data for packet loss compensation, and if no packet loss exists, the audio data is directly output from the buffer; if the packet loss exists, firstly storing the predicted audio data packet compensated by the packet loss into a buffer memory, and then taking out the corresponding audio data from the buffer memory to output.

Step 2: in the audio stream transmission process, a unique serial number is allocated to each audio data packet so as to be recombined and recovered at an audio receiving end; the receiving end detects the packet loss condition by monitoring the serial number of the received audio data packet.

In the case of no packet loss, the sequence number of the audio data should be continuously incremented; if the receiving end detects that the sequence number is discontinuous or missing, the receiving end judges that packet loss occurs.

When the packet loss is detected, the receiving end marks the audio data packet with the corresponding serial number as a lost state, and obtains the corresponding predicted audio data packet based on the packet loss compensation step.

The packet loss compensation steps are as follows:

firstly, the plurality of audio data packets in the cache that precede the audio data packet marked as lost are decomposed in sequence into different subbands (frequency bands), where each subband represents a part of the spectral information in the audio data packets;

secondly, sequentially carrying out downsampling treatment on the plurality of sub-bands obtained by decomposition so as to reduce the operand and improve the prediction precision;

then, adaptive prediction compensation is performed on the downsampled output using the Burg algorithm, a subband estimate of the lost audio data packet is generated in combination with the filtering process, and the subband estimate is then upsampled;

and applying the packet loss compensation step to each sub-band to obtain a sub-band estimated value of each sub-band, and finally, carrying out data synthesis on a plurality of sub-band estimated values through a synthesis filter to obtain a final predicted audio data packet.

Finally, the time delay and the length of the audio data extracted from the buffer memory are determined based on the user requirement, and the extracted audio data are sent to other modules for further processing.

Referring to fig. 3, fig. 3 is a packet loss compensation device 300 for audio data according to an embodiment of the present disclosure, as shown in fig. 3, the packet loss compensation device 300 for audio data includes:

the obtaining module 301 is configured to obtain N reference audio data packets from a cache when a target audio data packet is detected to be missing in the cache, where an audio playing time of any one of the reference audio data packets is earlier than an audio playing time of the target audio data packet, the N reference audio data packets are continuous and adjacent to the target audio data packet, and N is an integer greater than 1;

the decomposition module 302 is configured to perform subband decomposition on the N reference audio data packets respectively to obtain M reference subband data sets, where M is an integer greater than 1;

and the compensation module 303 is configured to predict the target audio data packet according to the M reference subband data sets, obtain a compensated audio data packet, and store the compensated audio data packet in the buffer.

In one embodiment, the decomposition module 302 is specifically configured to:

the compensation module 303 is specifically configured to:

In one embodiment, the compensation module 303 includes:

the downsampling unit is used for respectively performing downsampling processing on the M reference subband data sets to obtain M downsampled data sets, wherein the multiple of the downsampling processing is more than 1 and less than or equal to M;

and the prediction unit is used for predicting the target audio data packet according to the M downsampled data sets to obtain a compensated audio data packet.

In one embodiment, the prediction unit is specifically configured to:

In an embodiment, the prediction unit is further configured to:

In one embodiment, the downsampling process is by a factor greater than 1 and less than or equal to M.

the apparatus 300 further comprises:

and the monitoring module is used for determining that the target audio data packet is missed in the cache under the condition that a plurality of serial numbers corresponding to the plurality of audio data packets stored in the cache are not continuous, wherein the target audio data packet is the audio data packet corresponding to the missing serial number in the plurality of serial numbers.

In one embodiment, the apparatus 300 further comprises an output module for:

outputting the target audio data.

The packet loss compensation device 300 for audio data provided in the embodiment of the present disclosure can implement each process in the embodiment of the packet loss compensation method for audio data, and in order to avoid repetition, a description thereof is omitted here.

According to embodiments of the present disclosure, the present disclosure further provides an electronic device and a readable storage medium.

Fig. 4 illustrates a schematic block diagram of an example electronic device 400 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing devices, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be exemplary only and are not meant to limit implementations of the disclosure described and/or claimed herein.

As shown in fig. 4, the device 400 includes a computing unit 401 that can perform various appropriate actions and processes according to a computer program stored in a read-only memory (ROM) 402 or a computer program loaded from a storage unit 408 into a random access memory (RAM) 403. The RAM 403 may also store various programs and data required for the operation of the device 400. The computing unit 401, the ROM 402, and the RAM 403 are connected to each other by a bus 404. An input/output (I/O) interface 405 is also connected to the bus 404.

Various components in the device 400 are connected to the I/O interface 405, including: an input unit 406, such as a keyboard or a mouse; an output unit 407, such as various types of displays and speakers; a storage unit 408, such as a magnetic disk or an optical disk; and a communication unit 409, such as a network card, a modem, or a wireless communication transceiver. The communication unit 409 allows the device 400 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.

The computing unit 401 may be any of a variety of general-purpose and/or special-purpose processing components having processing and computing capabilities. Some examples of the computing unit 401 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various dedicated artificial intelligence (AI) computing chips, various computing units running machine learning model algorithms, a digital signal processor (DSP), and any suitable processor, controller, or microcontroller. The computing unit 401 performs the methods and processes described above, for example, the packet loss compensation method for audio data. In some embodiments, the packet loss compensation method for audio data may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 408. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 400 via the ROM 402 and/or the communication unit 409. When the computer program is loaded into the RAM 403 and executed by the computing unit 401, one or more steps of the packet loss compensation method for audio data described above may be performed. Alternatively, in other embodiments, the computing unit 401 may be configured to perform the packet loss compensation method for audio data in any other suitable way (e.g., by means of firmware).
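As a minimal, non-limiting sketch of how such a computer program might look when executed by the computing unit 401, the following Python code walks through the steps summarized in the abstract: detecting a missing packet in the cache, fetching N preceding reference packets, decomposing them into M subbands, predicting each subband, synthesizing the compensation packet, and storing it in the cache. Every name here (`haar_split`, `haar_merge`, `predict_band`, `compensate`) is hypothetical. The two-band Haar split (M = 2) and the "repeat the most recent samples" predictor are assumptions chosen only to keep the example short; the disclosure does not specify them, and a practical implementation would likely use a different filter bank and predictor. The sketch also assumes even-length packets and that all N reference packets are present in the cache.

```python
from typing import Dict, List

Packet = List[float]


def haar_split(x: Packet) -> List[Packet]:
    """Split an even-length signal into two half-rate subbands (assumed M = 2)."""
    low = [(x[i] + x[i + 1]) / 2 for i in range(0, len(x), 2)]
    high = [(x[i] - x[i + 1]) / 2 for i in range(0, len(x), 2)]
    return [low, high]


def haar_merge(low: Packet, high: Packet) -> Packet:
    """Inverse of haar_split: interleave the reconstructed sample pairs."""
    out: Packet = []
    for l, h in zip(low, high):
        out.extend([l + h, l - h])
    return out


def predict_band(band: Packet, length: int) -> Packet:
    """Placeholder predictor (an assumption): repeat the latest `length` samples."""
    return band[-length:]


def compensate(cache: Dict[int, Packet], target_seq: int, n_refs: int) -> Packet:
    """Return the packet for target_seq, synthesizing and caching it if missing."""
    if target_seq in cache:
        return cache[target_seq]  # nothing was lost
    # N consecutive reference packets immediately preceding the target.
    refs = [cache[s] for s in range(target_seq - n_refs, target_seq)]
    packet_len = len(refs[0])
    history = [sample for packet in refs for sample in packet]
    # Subband decomposition; each band runs at half the original rate.
    bands = haar_split(history)
    # Predict each subband independently, then synthesize the full-band packet.
    predicted = [predict_band(band, packet_len // 2) for band in bands]
    cache[target_seq] = haar_merge(*predicted)
    return cache[target_seq]


if __name__ == "__main__":
    cache = {0: [0.0, 1.0, 2.0, 3.0], 1: [4.0, 5.0, 6.0, 7.0]}
    print(compensate(cache, target_seq=2, n_refs=2))
```

With this placeholder predictor the compensation packet simply reproduces the last reference packet; the point of the sketch is the data flow (cache lookup, decomposition, per-subband prediction, synthesis, write-back), not the prediction quality.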

Various implementations of the systems and techniques described above can be implemented in digital electronic circuitry, integrated circuitry, field-programmable gate arrays (FPGA), application-specific integrated circuits (ASIC), application-specific standard products (ASSP), systems on chip (SoC), complex programmable logic devices (CPLD), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that can be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor that can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.

Program code for carrying out the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general-purpose computer, special-purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.

In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.

The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include local area networks (LANs), wide area networks (WANs), and the internet.

The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server incorporating a blockchain.

It should be appreciated that steps may be reordered, added, or deleted using the various forms of flow shown above. For example, the steps recited in the present disclosure may be performed in parallel, sequentially, or in a different order, provided that the desired results of the technical solutions of the present disclosure are achieved; no limitation is imposed herein.

The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims

1. A method for packet loss compensation of audio data, the method comprising:

2. The method of claim 1, wherein predicting the target audio data packet from the M reference subband data sets to obtain a compensated audio data packet comprises:

3. The method of claim 2, wherein predicting the target audio data packet from the M downsampled data sets to obtain a compensated audio data packet comprises:

4. A method according to claim 3, wherein said synthesizing data according to said M target subbands to obtain said compensated audio data packet comprises:

5. The method of claim 1, wherein sub-band decomposing the N reference audio data packets to obtain M reference sub-band data sets comprises:

6. The method of claim 1, wherein a plurality of audio data packets are stored in the buffer, each of the audio data packets corresponding to a sequence number, the sequence numbers corresponding to different audio data packets in the plurality of audio data packets being different;

the method further comprises the steps of:

7. The method of claim 1, wherein after storing the compensated audio data packets in the buffer, the method further comprises:

outputting the target audio data.

8. A packet loss compensation device for audio data, the device comprising:

9. An electronic device comprising a processor, a memory and a computer program stored on the memory and executable on the processor, which when executed by the processor performs the steps of the method according to any one of claims 1 to 7.

10. A computer readable storage medium, characterized in that it has stored thereon a computer program which, when executed by a processor, implements the steps of the method according to any of claims 1 to 7.