CN117676185A - Packet loss compensation method and device for audio data and related equipment - Google Patents


Info

Publication number
CN117676185A
Authority
CN
China
Legal status
Pending
Application number
CN202311661514.7A
Other languages
Chinese (zh)
Inventor
戴祖华
葛文婷
Current Assignee
Zhejiang Zhonggan Microelectronics Co ltd
Zgmicro Corp
Original Assignee
Zhejiang Zhonggan Microelectronics Co ltd
Zgmicro Corp
Application filed by Zhejiang Zhonggan Microelectronics Co ltd and Zgmicro Corp
Priority to CN202311661514.7A
Publication of CN117676185A

Landscapes

  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The disclosure provides a packet loss compensation method and device for audio data, and related equipment, in the technical field of audio coding and decoding. The method comprises: when a target audio data packet is detected to be missing from a cache, obtaining N reference audio data packets from the cache, where the audio playing time of every reference audio data packet is earlier than that of the target audio data packet and the N reference audio data packets are consecutive and immediately precede the target audio data packet; performing subband decomposition on the reference audio data packets to obtain M reference subband data sets, where M is an integer greater than 1; and predicting the target audio data packet from the M reference subband data sets to obtain M compensated subbands, synthesizing the M subbands into a compensated audio data packet, and storing the compensated audio data packet in the cache. The method and device make the predicted compensated audio data packet more accurate.

Description

Packet loss compensation method and device for audio data and related equipment
Technical Field
The disclosure relates to the technical field of audio encoding and decoding, and in particular to a packet loss compensation method and device for audio data and related equipment.
Background
During audio data transmission, audio data packets may be lost because of signal attenuation, interference, network congestion or other factors. Such packet loss can cause noticeable interruptions or degraded sound quality in the audio signal and seriously impair the user's listening experience.
In the related art, lost audio data packets are compensated with a linear prediction algorithm: the lost packet is estimated by analyzing the received audio data packets and extracting characteristic values of the audio signal, such as formants. In practice, however, packets estimated this way have been found to be inaccurate when facing diverse audio signals (for example, audio in multimedia audio-video transmission) and varying communication environments (for example, strongly fluctuating network conditions).
Disclosure of Invention
The disclosure aims to provide a packet loss compensation method and device for audio data, and related equipment, to solve the technical problem that audio data packets estimated by the related art in complex application environments are inaccurate.
In a first aspect, an embodiment of the present disclosure provides a packet loss compensation method for audio data, including:
when a target audio data packet is detected to be missing from a cache, obtaining N reference audio data packets from the cache, where the audio playing time of every reference audio data packet is earlier than that of the target audio data packet, the N reference audio data packets are consecutive and immediately precede the target audio data packet, and N is an integer greater than 1;
performing subband decomposition on each of the N reference audio data packets to obtain M reference subband data sets, where M is an integer greater than 1;
and predicting the target audio data packet from the M reference subband data sets to obtain a compensated audio data packet, and storing the compensated audio data packet in the cache.
In one embodiment, predicting the target audio data packet from the M reference subband data sets to obtain the compensated audio data packet includes:
downsampling each of the M reference subband data sets to obtain M downsampled data sets, where the downsampling factor is greater than 1 and less than or equal to M;
and predicting the target audio data packet from the M downsampled data sets to obtain the compensated audio data packet.
In one embodiment, predicting the target audio data packet from the M downsampled data sets to obtain the compensated audio data packet includes:
predicting the target audio data packet from the M downsampled data sets to obtain M target subbands in one-to-one correspondence with the M downsampled data sets;
and synthesizing the M target subbands to obtain the compensated audio data packet.
In one embodiment, synthesizing the M target subbands to obtain the compensated audio data packet includes:
upsampling each of the M target subbands to obtain M upsampled subbands, where the upsampling factor of each target subband equals the downsampling factor of the corresponding reference subband data set;
and synthesizing the M upsampled subbands to obtain the compensated audio data packet.
In one embodiment, performing subband decomposition on the N reference audio data packets to obtain M reference subband data sets includes:
performing subband decomposition on each reference audio data packet to obtain M reference subbands in one-to-one correspondence with M subband positions, where the N reference subbands of the N reference audio data packets at the same subband position form one reference subband data set, so that the reference subbands of the N reference audio data packets form M reference subband data sets;
predicting the target audio data packet from the M reference subband data sets to obtain the compensated audio data packet includes:
downsampling each of the M reference subband data sets to obtain M downsampled data sets in one-to-one correspondence with the M subband positions;
predicting, from each downsampled data set, the audio subband of the target audio data packet at the same subband position to obtain the corresponding target subband, where the M subband positions correspond one-to-one to the M audio subbands of the target audio data packet;
and obtaining the compensated audio data packet from the M target subbands.
In one embodiment, the cache stores a plurality of audio data packets, each corresponding to a sequence number, and different audio data packets have different sequence numbers;
the method further includes:
when the sequence numbers of the audio data packets stored in the cache are not consecutive, determining that a target audio data packet is missing from the cache, where the target audio data packet is the audio data packet corresponding to a missing sequence number.
In one embodiment, after the compensated audio data packet is stored in the cache, the method further includes:
obtaining output configuration information, which indicates the time delay and/or the audio duration of target audio data;
extracting the target audio data from the cache according to the output configuration information;
and outputting the target audio data.
In a second aspect, an embodiment of the present disclosure provides a packet loss compensation apparatus for audio data, including:
an acquisition module, configured to obtain N reference audio data packets from a cache when a target audio data packet is detected to be missing from the cache, where the audio playing time of every reference audio data packet is earlier than that of the target audio data packet, the N reference audio data packets are consecutive and immediately precede the target audio data packet, and N is an integer greater than 1;
a decomposition module, configured to perform subband decomposition on each of the N reference audio data packets to obtain M reference subband data sets, where M is an integer greater than 1;
and a compensation module, configured to predict the target audio data packet from the M reference subband data sets to obtain a compensated audio data packet and to store the compensated audio data packet in the cache.
In a third aspect, an embodiment of the present disclosure further provides an electronic device including a processor, a memory, and a computer program stored in the memory and executable on the processor, where the computer program, when executed by the processor, implements the steps of the packet loss compensation method for audio data described above.
In a fourth aspect, an embodiment of the present disclosure further provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the steps of the packet loss compensation method for audio data described above.
In the embodiments of the disclosure, when loss of an audio data packet is detected, several reference audio data packets adjacent to the lost packet are obtained from the cache and converted into several reference subband data sets by subband decomposition, and the lost packet is then predicted from these reference subband data sets. Subband decomposition simplifies the spectral representation of the audio signal carried by the packets, which reduces the error of the subsequent prediction and makes the predicted audio data packet more accurate.
Drawings
Fig. 1 is a flowchart of a packet loss compensation method for audio data according to an embodiment of the present disclosure;
Fig. 2 is a flowchart of an audio signal packet loss compensation method according to an embodiment of the present disclosure;
Fig. 3 is a schematic structural diagram of a packet loss compensation device for audio data according to an embodiment of the present disclosure;
Fig. 4 is a schematic diagram of an electronic device according to an embodiment of the present disclosure.
Detailed Description
The technical solutions in the embodiments of the present disclosure are described below clearly and completely with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present disclosure. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the scope of the present disclosure.
An embodiment of the present disclosure provides a packet loss compensation method for audio data. As shown in Fig. 1, the method includes:
Step 101: when a target audio data packet is detected to be missing from a cache, obtain N reference audio data packets from the cache.
The audio playing time of every reference audio data packet is earlier than that of the target audio data packet, and the N reference audio data packets are consecutive and immediately precede the target audio data packet.
The disclosed method is applied to audio transmission scenarios, in particular to the transmission of audio streams, for example a first audio stream for music playback, a second audio stream forming the audio track of a video, or a third audio stream for audio/video calls.
In the present disclosure, an audio stream is transmitted as a plurality of audio data packets, which are sent in order of their audio playing times within the stream. For example, given audio data packet A1 whose playing time is the first second, A2 whose playing time is the second second, and A3 whose playing time is the third second, A1 is transmitted first, followed by A2 and then A3.
The cache is provided to manage the audio data packets that make up an audio stream: it makes it possible to detect promptly whether any of them are missing and, if so, to compensate for the missing packets by performing the subsequent steps. This keeps the set of packets stored in the cache complete and avoids playback interruptions or loss of sound quality in the audio signal extracted from the cache for output.
If none of the audio data packets of the audio stream is missing, the subsequent steps are not performed.
For example, suppose the audio data packets expected to arrive in the cache are, in order, A1, A2, A3, and A4, where A4 corresponds to an audio playing time of the fourth second. If the target audio data packet missing from the cache is A4, then A1, A2, and A3 can all serve as reference audio data packets.
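The reference selection above can be sketched as follows, assuming the cache is a dictionary mapping a consecutively increasing sequence number to that packet's samples. `select_reference_packets` and this cache layout are hypothetical illustrations, not taken from the disclosure.

```python
from typing import Dict, List, Sequence

def select_reference_packets(
    cache: Dict[int, Sequence[float]], target_seq: int, n: int
) -> List[Sequence[float]]:
    """Return the n packets immediately preceding target_seq, oldest first."""
    if n < 2:
        raise ValueError("N must be an integer greater than 1")
    wanted = range(target_seq - n, target_seq)   # consecutive and adjacent to the target
    missing = [s for s in wanted if s not in cache]
    if missing:
        raise LookupError(f"reference packets not in cache: {missing}")
    return [cache[s] for s in wanted]
```

With packets 1, 2 and 3 cached and packet 4 lost, `select_reference_packets(cache, 4, 3)` returns packets 1, 2 and 3 in playback order.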
Step 102: perform subband decomposition on each of the N reference audio data packets to obtain M reference subband data sets.
It should be noted that every reference audio data packet is decomposed into the same number of subbands. For example, each of the N reference audio data packets is decomposed into 4 or 8 subbands (also called frequency bands); the N values at the same subband position, one from each packet, form one reference subband data set, so M is 4 or 8 in this case.
In one example, the subband decomposition may use the cosine-modulated filter bank of Sub-Band Coding (SBC).
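A minimal NumPy sketch of a cosine-modulated analysis filter bank follows. It is not the SBC reference filter bank: SBC uses its own fixed prototype coefficients and decimates by M. Here the prototype is assumed to be a Hann-windowed sinc low-pass with cutoff π/(2M), and the subband outputs stay at full rate so that a separate downsampling step can choose its own factor. `cosine_modulated_analysis` and `taps_per_band` are hypothetical names.

```python
import numpy as np

def cosine_modulated_analysis(x, M, taps_per_band=16):
    """Split signal x into M full-rate subband signals (rows of the result)."""
    x = np.asarray(x, dtype=float)
    L = taps_per_band * M + 1                  # odd length gives an integer group delay
    c = (L - 1) / 2
    n = np.arange(L)
    # Prototype low-pass, cutoff pi/(2M): windowed ideal sinc normalised to unit DC gain.
    proto = np.sinc((n - c) / (2 * M)) * np.hanning(L)
    proto /= proto.sum()
    bands = np.empty((M, len(x)))
    for k in range(M):
        # Shift the prototype to the centre of band k, (k + 0.5) * pi / M rad/sample.
        h_k = 2 * proto * np.cos(np.pi / M * (k + 0.5) * (n - c))
        bands[k] = np.convolve(x, h_k)[: len(x)]   # causal filtering, same length as x
    return bands
```

For M = 4, a tone at 2.5π/4 rad/sample sits at the centre of band 2, so almost all of its energy lands in row 2 of the result.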
Illustratively, each reference subband data set contains more than 2 reference subbands, for example 4, 5, or 8. The disclosure does not limit this number; it can be chosen to suit the spectral characteristics and transmission requirements of the actual audio stream.
Step 103: predict the target audio data packet from the M reference subband data sets to obtain a compensated audio data packet, and store the compensated audio data packet in the cache.
In the embodiments of the disclosure, when loss of an audio data packet is detected, several reference audio data packets adjacent to the lost packet are obtained from the cache and converted into several reference subband data sets by subband decomposition, and the lost packet is then predicted from these reference subband data sets. Subband decomposition simplifies the spectral representation of the audio signal carried by the packets, which reduces the error of the subsequent prediction and makes the predicted audio data packet more accurate.
In one example, the M reference subband data sets may be processed with the Burg algorithm to obtain the compensated audio data packet. Performing the subband decomposition costs some processing efficiency compared with feeding the N reference audio data packets directly into the Burg algorithm. However, because the Burg algorithm then receives M reference subband data sets, each covering a different frequency band, the spectral representation of the audio signal in the reference packets is simplified. This reduces the interference the Burg algorithm faces when predicting the target audio data packet, lets it analyze the signal components in each frequency range more precisely, and yields a more accurate compensated audio data packet.
In one embodiment, performing subband decomposition on the N reference audio data packets to obtain M reference subband data sets includes:
performing subband decomposition on each reference audio data packet to obtain M reference subbands in one-to-one correspondence with M subband positions, where the N reference subbands of the N reference audio data packets at the same subband position form one reference subband data set, so that the reference subbands of the N reference audio data packets form M reference subband data sets;
predicting the target audio data packet from the M reference subband data sets to obtain the compensated audio data packet includes:
downsampling each of the M reference subband data sets to obtain M downsampled data sets in one-to-one correspondence with the M subband positions;
predicting, from each downsampled data set, the audio subband of the target audio data packet at the same subband position to obtain the corresponding target subband, where the M subband positions correspond one-to-one to the M audio subbands of the target audio data packet;
and obtaining the compensated audio data packet from the M target subbands.
For example, suppose the subband decomposition has 3 subband positions (M = 3) and there are 4 reference audio data packets (N = 4). Decomposing the 4 reference packets yields 4 groups of reference subbands: A1[a11, a12, a13], A2[a21, a22, a23], A3[a31, a32, a33], and A4[a41, a42, a43], where a11, a21, a31, a41 correspond to the first subband position, a12, a22, a32, a42 to the second, and a13, a23, a33, a43 to the third;
from these 4 groups, three reference subband data sets are formed: [a11, a21, a31, a41] for the first subband position, [a12, a22, a32, a42] for the second, and [a13, a23, a33, a43] for the third;
prediction based on the reference subband data set [a11, a21, a31, a41] yields the target subband b1 for the first subband position;
prediction based on the reference subband data set [a12, a22, a32, a42] yields the target subband b2 for the second subband position;
prediction based on the reference subband data set [a13, a23, a33, a43] yields the target subband b3 for the third subband position;
the compensated audio data packet is then obtained from the target subbands b1, b2, and b3.
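The regrouping in this example is a transpose of an N × M table: row i holds packet i's subbands, and column j becomes the reference subband data set for subband position j. A minimal sketch, with `group_by_subband` as a hypothetical helper name:

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")

def group_by_subband(per_packet: Sequence[Sequence[T]]) -> List[List[T]]:
    """Turn N packets x M subbands into M reference subband data sets of N items each."""
    m = len(per_packet[0])
    if any(len(p) != m for p in per_packet):
        raise ValueError("every packet must be decomposed into the same M subbands")
    return [list(column) for column in zip(*per_packet)]

packets = [["a11", "a12", "a13"], ["a21", "a22", "a23"],
           ["a31", "a32", "a33"], ["a41", "a42", "a43"]]
print(group_by_subband(packets)[2])   # ['a13', 'a23', 'a33', 'a43']
```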
In one embodiment, predicting the target audio data packet from the M reference subband data sets to obtain the compensated audio data packet includes:
downsampling each of the M reference subband data sets to obtain M downsampled data sets, where the downsampling factor is greater than 1 and less than or equal to M;
and predicting the target audio data packet from the M downsampled data sets to obtain the compensated audio data packet.
In this embodiment, the downsampling step reduces the amount of computation in the prediction process (the downsampled data carry correspondingly less information to process). This not only improves prediction efficiency but also reduces the error accumulation caused by an excessive amount of computation, making the output compensated audio data packet more accurate.
Moreover, limiting the downsampling factor to no more than the number of subbands produced by the subband decomposition avoids the information loss caused by excessive downsampling and helps ensure the accuracy of the prediction result.
In one example, the prediction is performed with the Burg algorithm. Although the Burg algorithm predicts short-time stationary signals very well, the more prediction points it receives, the more error accumulates, which can lower the accuracy of its final output. Downsampling reduces the number of prediction points fed to the Burg algorithm, suppressing this error accumulation and further improving the accuracy of the final prediction.
It should be noted that in some embodiments where the prediction is based on the Burg algorithm, the subband decomposition step may be skipped: the N reference audio data packets are downsampled directly and the prediction is made from the downsampled packets, which can also improve the accuracy of the algorithm's final output.
In practice, any two of the M reference subband data sets may be downsampled by the same factor or by different factors, which makes the downsampling stage more flexible.
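A sketch of the per-set downsampling, enforcing 1 < factor ≤ M and allowing a different factor for each set. It keeps every d-th sample, assuming the subband signals are already band-limited by the analysis filter bank; the disclosure does not say whether an extra anti-aliasing filter is used. The name `downsample_sets` and the 128-sample set length in the usage line are hypothetical.

```python
import numpy as np
from typing import List, Sequence

def downsample_sets(sets: Sequence[np.ndarray], factors: Sequence[int]) -> List[np.ndarray]:
    """Downsample each reference subband data set by its own integer factor."""
    m = len(sets)                              # M = number of reference subband data sets
    if len(factors) != m:
        raise ValueError("one downsampling factor per subband data set is required")
    out = []
    for data, d in zip(sets, factors):
        if not 1 < d <= m:
            raise ValueError(f"factor {d} violates 1 < factor <= M = {m}")
        out.append(np.asarray(data)[::d])      # keep every d-th sample
    return out

sets = [np.arange(128.0) for _ in range(4)]    # M = 4 sets of 128 samples each
print([len(s) for s in downsample_sets(sets, [4, 4, 2, 2])])   # [32, 32, 64, 64]
```

With a factor of 4, each set hands the predictor 32 points instead of 128, which is the reduction in prediction points the paragraphs above rely on.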
In one embodiment, predicting the target audio data packet from the M downsampled data sets to obtain the compensated audio data packet includes:
predicting the target audio data packet from the M downsampled data sets to obtain M target subbands in one-to-one correspondence with the M downsampled data sets;
and synthesizing the M target subbands to obtain the compensated audio data packet.
For example, if the number of subbands M is 4 and the downsampling factor L is 4, the N reference audio data packets are decomposed into reference subband data sets C1, C2, C3, and C4. Each set is downsampled by a factor of L to obtain four downsampled data sets, and prediction on each of these yields the corresponding target subbands t1, t2, t3, and t4.
Illustratively, the M target subbands may be combined by a synthesis filter to obtain the compensated audio data packet.
In one embodiment, synthesizing the M target subbands to obtain the compensated audio data packet includes:
upsampling each of the M target subbands to obtain M upsampled subbands, where the upsampling factor of each target subband equals the downsampling factor of the corresponding reference subband data set;
and synthesizing the M upsampled subbands to obtain the compensated audio data packet.
In this embodiment, because the subbands in the reference subband data sets were downsampled, the predicted target subbands are upsampled correspondingly to restore their data amount. This keeps the subsequently synthesized compensated audio data packet consistent with the reference audio data packets.
In another example, the M target subbands may first be synthesized into an initial audio data packet, which is then upsampled to obtain the compensated audio data packet; this also keeps the compensated audio data packet consistent with the reference audio data packets.
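A sketch of the upsampling and synthesis steps, assuming NumPy. The disclosure names a synthesis filter without specifying it, so two stand-ins are used for brevity: linear interpolation as the upsampler, and plain summation of full-rate subbands as the synthesis. Summation only approximates reconstruction when the analysis filters add up to roughly a pure delay. Both function names are hypothetical.

```python
import numpy as np

def upsample_linear(x, factor):
    """Restore factor * len(x) samples by linear interpolation (one simple choice)."""
    x = np.asarray(x, dtype=float)
    coarse = np.arange(len(x)) * factor        # positions of the retained samples
    fine = np.arange(len(x) * factor)          # full-rate sample grid
    return np.interp(fine, coarse, x)          # holds the last value past the final sample

def synthesize(subbands):
    """Stand-in synthesis: sum the full-rate subbands sample by sample."""
    return np.sum(np.vstack(subbands), axis=0)
```

Using the same factor for upsampling as for downsampling, as the embodiment requires, returns each target subband to the sample count of the reference packets before synthesis.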
In one embodiment, the cache stores a plurality of audio data packets, each corresponding to a sequence number, and different audio data packets have different sequence numbers;
the method further includes:
when the sequence numbers of the audio data packets stored in the cache are not consecutive, determining that a target audio data packet is missing from the cache, where the target audio data packet is the audio data packet corresponding to a missing sequence number.
In this embodiment, assigning a different sequence number to each audio data packet of the same audio stream makes it possible to identify missing packets in the cache quickly by monitoring the sequence numbers of the stored packets. The missing packets can then be compensated promptly, which keeps the packets stored in the cache continuous, gives the audio data extracted from the cache better quality and more reliable continuity, and improves the user's listening experience.
For example, the sequence numbers of the audio data packets stored in the cache may be set to increase (or decrease) continuously. In that case, if the sequence numbers of the stored packets are discontinuous or have gaps, packet loss during audio streaming can be inferred, that is, a target audio data packet missing from the cache is detected.
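A minimal sketch of the gap check, assuming integer sequence numbers without wrap-around. Real transports such as RTP use 16-bit sequence numbers that wrap, which this ignores. `find_missing` is a hypothetical name. Sorting first means the same check works whether the numbering increases or decreases.

```python
from typing import Iterable, List

def find_missing(seqs: Iterable[int]) -> List[int]:
    """Return sequence numbers absent between the smallest and largest cached ones."""
    present = sorted(set(seqs))
    if not present:
        return []
    have = set(present)
    return [s for s in range(present[0], present[-1] + 1) if s not in have]
```

For cached sequence numbers 1, 2, 4 and 7 this reports 3, 5 and 6 as target audio data packets to compensate.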
In one embodiment, after the compensated audio data packet is stored in the cache, the method further includes:
obtaining output configuration information, which indicates the time delay and/or the audio duration of target audio data;
extracting the target audio data from the cache according to the output configuration information;
and outputting the target audio data.
As described above, the packet loss compensation keeps the audio data packets stored in the cache continuous. On that basis, the user can specify the time delay and/or the playing duration (that is, the audio duration) of the desired target audio data according to actual needs, and the corresponding target audio data are extracted from the cache, meeting the audio playback or audio processing requirements of different application scenarios.
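The disclosure does not define the units of the time delay and audio duration. The sketch below assumes both are counted in samples and that the delay is measured back from the newest cached sample. `extract_output` is a hypothetical helper.

```python
import numpy as np

def extract_output(stream, delay, duration):
    """Return `duration` samples ending `delay` samples before the newest one."""
    stream = np.asarray(stream)
    end = len(stream) - delay
    start = end - duration
    if delay < 0 or duration <= 0 or start < 0:
        raise ValueError("requested window is not fully cached")
    return stream[start:end]
```

A larger delay trades latency for a better chance that any lost packet in the window has already been compensated before it is read out.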
Illustratively, the Burg algorithm used for the prediction is based on the following autoregressive model:
$$X_t = \sum_{i=1}^{p} a_i X_{t-i} + \varepsilon_t$$
where $X_t$ is the predicted signal (that is, a target subband forming the compensated audio data packet); $X_{t-1}, \ldots, X_{t-p}$ are the historical signals (that is, the reference subbands of the reference audio data packets, corresponding to the reference audio data packets in turn); $p$ is the order of the Burg algorithm; $t$ is the index of the predicted signal in the signal stream (that is, the sequence number of the target audio data packet in the audio stream); $a_i$ is the recursive prediction coefficient of the corresponding historical signal $X_{t-i}$; and $\varepsilon_t$ is a random disturbance term (that is, noise) of the predicted signal.
With the disclosed method, packet loss can be compensated effectively while audio quality is maintained, giving users a more stable and consistent audio transmission experience; the method is broadly applicable, adaptive, and efficient in multimedia data transmission scenarios.
For ease of understanding, an example is given below:
Fig. 2 shows an audio signal packet loss compensation method, which proceeds as follows:
Step 1: the audio data packets of the input audio are written into the cache. The cache stores historical data for packet loss compensation. If there is no packet loss, the audio data are output directly from the cache; if there is packet loss, the predicted audio data packet produced by packet loss compensation is first stored in the cache, and the corresponding audio data are then taken from the cache and output.
Step 2: during audio stream transmission, each audio data packet is assigned a unique sequence number so that the packets can be reassembled and recovered at the audio receiving end; the receiving end detects packet loss by monitoring the sequence numbers of the received packets.
Without packet loss, the sequence numbers of the audio data increase continuously; if the receiving end detects that a sequence number is discontinuous or missing, it determines that packet loss has occurred.
When packet loss is detected, the receiving end marks the audio data packet with the corresponding sequence number as lost and obtains a corresponding predicted audio data packet through the packet loss compensation steps.
The packet loss compensation steps are as follows:
first, the audio data packets in the cache that precede the lost packet are each decomposed into different subbands (frequency bands), each subband representing part of the spectral information in the packet;
second, the decomposed subbands are each downsampled to reduce the amount of computation and improve prediction accuracy;
then, the Burg algorithm performs adaptive predictive compensation on the downsampled output and, together with filtering, generates a subband estimate for the lost audio data packet, which is then upsampled;
these compensation steps are applied to every subband to obtain an estimate for each, and finally a synthesis filter combines the subband estimates into the final predicted audio data packet.
Finally, the time delay and the length of the audio data extracted from the buffer memory are determined based on the user requirement, and the extracted audio data are sent to other modules for further processing.
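The cache behaviour in step 1 (conceal first, store the result, then always read from the cache) can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the class name `PlayoutCache`, the `conceal` callback standing in for the sub-band predictor, and the idea that stored predictions can serve as history for a following loss are all hypothetical.

```python
from typing import Callable, Dict, List

Packet = List[float]

class PlayoutCache:
    """Hypothetical cache keyed by sequence number (illustrative only)."""

    def __init__(self, conceal: Callable[[List[Packet]], Packet], history: int) -> None:
        self.packets: Dict[int, Packet] = {}
        self.conceal = conceal   # assumed predictor: earlier packets -> predicted packet
        self.history = history   # N: number of preceding packets used as references

    def put(self, seq: int, samples: Packet) -> None:
        self.packets[seq] = list(samples)

    def output(self, seq: int) -> Packet:
        if seq not in self.packets:
            # Packet lost: gather the preceding packets that are present, oldest first.
            refs = [self.packets[s] for s in range(seq - self.history, seq) if s in self.packets]
            # Store the predicted packet first (assumption: it may then serve as
            # history if the next packet is lost too).
            self.packets[seq] = self.conceal(refs)
        # In both cases the audio handed downstream is read from the cache.
        return self.packets[seq]
```

Any predictor with the signature `conceal(refs) -> packet` can be plugged in; a trivial "repeat the last packet" function is enough to exercise the control flow.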
Referring to FIG. 3, FIG. 3 shows a packet loss compensation device 300 for audio data according to an embodiment of the present disclosure. As shown in FIG. 3, the packet loss compensation device 300 for audio data includes:
the obtaining module 301, configured to obtain N reference audio data packets from a cache when it is detected that a target audio data packet is missing from the cache, where the audio playing time of each reference audio data packet is earlier than that of the target audio data packet, the N reference audio data packets are consecutive and adjacent to the target audio data packet, and N is an integer greater than 1;
the decomposition module 302, configured to perform sub-band decomposition on each of the N reference audio data packets to obtain M reference sub-band data sets, where M is an integer greater than 1; and
the compensation module 303, configured to predict the target audio data packet according to the M reference sub-band data sets to obtain a compensated audio data packet, and to store the compensated audio data packet in the cache.
In one embodiment, the decomposition module 302 is specifically configured to:
perform sub-band decomposition on each reference audio data packet to obtain M reference sub-bands in one-to-one correspondence with M sub-band positions, where the N reference sub-band data of the N reference audio data packets at the same sub-band position form one reference sub-band data set, so that the reference sub-band data of the N reference audio data packets form M reference sub-band data sets;
the compensation module 303 is specifically configured to:
downsample each of the M reference sub-band data sets to obtain M downsampled data sets in one-to-one correspondence with the M sub-band positions;
predict, according to each downsampled data set, the audio sub-band of the target audio data packet at the same sub-band position to obtain a corresponding target sub-band, where the M sub-band positions are in one-to-one correspondence with the M audio sub-bands of the target audio data packet; and
obtain the compensated audio data packet according to the M target sub-bands.
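The regrouping in this embodiment (N packets, each split into M sub-bands, collected by sub-band position into M reference sets) can be illustrated with the simplest possible filter bank. The sketch below assumes M = 2 and a two-band Haar split, low[n] = (x[n] + x[n-1])/2 and high[n] = (x[n] - x[n-1])/2, with the filter state carried across packet boundaries. The disclosure does not specify its analysis filters, so this filter choice and the names `haar_analysis` and `build_reference_sets` are hypothetical.

```python
from typing import List, Sequence, Tuple

def haar_analysis(packet: Sequence[float], prev: float = 0.0) -> Tuple[List[float], List[float]]:
    """Full-rate two-band Haar split of one packet; `prev` is the sample just before it."""
    low: List[float] = []
    high: List[float] = []
    for x in packet:
        low.append((x + prev) / 2.0)   # crude low-pass: average of neighbouring samples
        high.append((x - prev) / 2.0)  # crude high-pass: half their difference
        prev = x
    return low, high

def build_reference_sets(packets: Sequence[Sequence[float]]) -> List[List[List[float]]]:
    """Return sets[m][n]: sub-band m of reference packet n (N packets, M = 2 positions)."""
    per_packet: List[Tuple[List[float], List[float]]] = []
    prev = 0.0
    for packet in packets:
        per_packet.append(haar_analysis(packet, prev))
        if packet:
            prev = packet[-1]          # keep the split continuous across packet boundaries
    num_bands = 2                      # M for the Haar pair
    return [[bands[m] for bands in per_packet] for m in range(num_bands)]
```

With this pair, low[n] + high[n] = x[n] at full rate, so no information is lost before any downsampling.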
In one embodiment, the compensation module 303 includes:
the downsampling unit, configured to downsample each of the M reference sub-band data sets to obtain M downsampled data sets, where the downsampling factor is greater than 1 and less than or equal to M; and
the prediction unit, configured to predict the target audio data packet according to the M downsampled data sets to obtain a compensated audio data packet.
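The constraint 1 < D <= M on the downsampling factor D can be enforced directly; D = M corresponds to critical sampling for M equal-width sub-bands. The sketch below is an assumption-laden illustration: it uses plain decimation with no extra anti-aliasing filter, on the idealised assumption that the analysis filter bank has already band-limited each sub-band, and the names `downsample`, `downsample_sets`, and the `phase` parameter are hypothetical.

```python
from typing import List, Sequence

def downsample(band: Sequence[float], factor: int, num_bands: int, phase: int = 0) -> List[float]:
    """Keep every `factor`-th sample starting at `phase`, enforcing 1 < factor <= M."""
    if not 1 < factor <= num_bands:
        raise ValueError(f"downsampling factor must satisfy 1 < factor <= {num_bands}, got {factor}")
    if not 0 <= phase < factor:
        raise ValueError("phase must satisfy 0 <= phase < factor")
    return list(band[phase::factor])

def downsample_sets(reference_sets: Sequence[Sequence[Sequence[float]]], factor: int,
                    phase: int = 0) -> List[List[List[float]]]:
    """Apply the same decimation to every packet's sub-band in each of the M reference sets."""
    num_bands = len(reference_sets)    # M
    return [[downsample(band, factor, num_bands, phase) for band in ref_set]
            for ref_set in reference_sets]
```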
In one embodiment, the prediction unit is specifically configured to:
predict the target audio data packet according to the M downsampled data sets to obtain M target sub-bands in one-to-one correspondence with the M downsampled data sets; and
perform data synthesis on the M target sub-bands to obtain the compensated audio data packet.
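The FIG. 2 example names the Burg algorithm as the predictor. One plausible reading, sketched below, fits an autoregressive (AR) model to the history of one downsampled sub-band with Burg's method and extrapolates it recursively over the missing span. The model order, the way the history is assembled, the function names, and the omission of the additional filtering the example mentions are all assumptions; the disclosure does not give these details.

```python
from typing import List, Sequence

def burg_coefficients(x: Sequence[float], order: int) -> List[float]:
    """Burg estimate of the prediction-error filter a = [1, a1, ..., ap]."""
    if not 0 < order < len(x):
        raise ValueError("order must satisfy 0 < order < len(x)")
    f = [float(v) for v in x]   # forward prediction errors
    b = list(f)                 # backward prediction errors
    a = [1.0]
    for m in range(1, order + 1):
        ef, eb = f[1:], b[:-1]  # pair f[n] with b[n-1]
        den = sum(u * u + v * v for u, v in zip(ef, eb))
        if den == 0.0:          # residual energy exhausted (e.g. constant input): stop early
            break
        k = -2.0 * sum(u * v for u, v in zip(ef, eb)) / den  # reflection coefficient, |k| <= 1
        a = a + [0.0]
        a = [a[i] + k * a[m - i] for i in range(m + 1)]      # Levinson-style order update
        f = [u + k * v for u, v in zip(ef, eb)]
        b = [v + k * u for u, v in zip(ef, eb)]
    return a

def predict_subband(history: Sequence[float], order: int, steps: int) -> List[float]:
    """Extrapolate `steps` samples with x[n] = -(a1*x[n-1] + ... + ap*x[n-p])."""
    a = burg_coefficients(history, order)
    buf = [float(v) for v in history]
    out: List[float] = []
    for _ in range(steps):
        nxt = 0.0
        for i in range(1, len(a)):
            nxt -= a[i] * buf[-i]
        buf.append(nxt)
        out.append(nxt)
    return out
```

Under this reading, `history` would be one sub-band's downsampled samples from the N reference packets laid end to end, and `steps` the number of downsampled samples that sub-band contributes to one packet. As a sanity check, for a geometric input r^n an order-1 fit gives a1 = -2r/(1 + r^2), so r = 0.5 yields a1 = -0.8.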
In an embodiment, the prediction unit is further configured to:
upsample each of the M target sub-bands to obtain M upsampled sub-bands, where the upsampling factor of each target sub-band is the same as the downsampling factor of the corresponding reference sub-band data set; and
perform data synthesis on the M upsampled sub-bands to obtain the compensated audio data packet.
In one embodiment, the downsampling factor is greater than 1 and less than or equal to M.
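To make "upsample by the same factor, then synthesize" concrete, the sketch below assumes M = 2 with the Haar analysis pair low[n] = (x[n] + x[n-1])/2 and high[n] = (x[n] - x[n-1])/2, each decimated by 2 keeping the odd-indexed samples (phase 1). Zero-insertion upsampling with the same factor and phase, followed by the matching synthesis step, then recovers x[2k] = L[k] - H[k] and x[2k+1] = L[k] + H[k]. The filter bank and the names `upsample` and `haar_synthesis` are illustrative assumptions, not the disclosed design.

```python
from typing import List, Sequence

def upsample(band: Sequence[float], factor: int, phase: int) -> List[float]:
    """Zero-insertion upsampling; `factor` and `phase` must match the decimation step."""
    if not 0 <= phase < factor:
        raise ValueError("phase must satisfy 0 <= phase < factor")
    out = [0.0] * (len(band) * factor)
    for k, v in enumerate(band):
        out[k * factor + phase] = v
    return out

def haar_synthesis(low_up: Sequence[float], high_up: Sequence[float]) -> List[float]:
    """x[n] = low_up[n] + low_up[n+1] + high_up[n] - high_up[n+1] (zero past the end)."""
    n = len(low_up)
    x: List[float] = []
    for i in range(n):
        next_low = low_up[i + 1] if i + 1 < n else 0.0
        next_high = high_up[i + 1] if i + 1 < n else 0.0
        x.append(low_up[i] + high_up[i] + next_low - next_high)
    return x
```

For example, [2.0, 6.0] and [1.0, 1.0] are the phase-1 decimated Haar bands of the packet [1, 3, 5, 7] (with a zero preceding sample), and upsampling both by 2 at phase 1 before synthesis returns that packet exactly.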
In one embodiment, the cache stores a plurality of audio data packets, each audio data packet corresponds to a sequence number, and different audio data packets among the plurality of audio data packets correspond to different sequence numbers;
the apparatus 300 further comprises:
a monitoring module, configured to determine that the target audio data packet is missing from the cache when the sequence numbers corresponding to the plurality of audio data packets stored in the cache are not consecutive, where the target audio data packet is the audio data packet corresponding to a missing sequence number among the plurality of sequence numbers.
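The monitoring module's check amounts to finding gaps in the cached sequence numbers. A minimal sketch, assuming sequence numbers are plain integers that never wrap around; a real transport such as RTP uses a 16-bit counter that does wrap, which this sketch ignores. The function name `find_missing` is hypothetical.

```python
from typing import Iterable, List

def find_missing(sequence_numbers: Iterable[int]) -> List[int]:
    """Return the sequence numbers absent between the smallest and largest cached ones."""
    received = sorted(set(sequence_numbers))
    missing: List[int] = []
    for prev, cur in zip(received, received[1:]):
        missing.extend(range(prev + 1, cur))   # empty when prev and cur are consecutive
    return missing
```

Each returned number identifies one target audio data packet to be compensated.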
In one embodiment, the apparatus 300 further comprises an output module for:
obtain output configuration information, where the output configuration information indicates a delay and/or an audio duration of target audio data;
extract the target audio data from the cache according to the output configuration information; and
output the target audio data.
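The output module's use of a delay and a duration can be sketched as a read from the cache. The units are assumptions made for illustration (delay counted in packets behind the newest cached packet, duration in samples), as are the names `OutputConfig` and `extract_output`; the disclosure only states that the configuration indicates a delay and/or an audio duration.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class OutputConfig:
    delay_packets: int     # hypothetical unit: packets behind the newest cached packet
    duration_samples: int  # hypothetical unit: samples handed to the next module

def extract_output(cache: Dict[int, List[float]], newest_seq: int, cfg: OutputConfig) -> List[float]:
    """Read up to cfg.duration_samples samples starting cfg.delay_packets behind newest_seq."""
    seq = newest_seq - cfg.delay_packets
    out: List[float] = []
    while len(out) < cfg.duration_samples and seq in cache:
        out.extend(cache[seq])
        seq += 1
    return out[: cfg.duration_samples]
```

A larger delay gives late packets and concealment more time to fill the cache, at the cost of added latency; a smaller delay does the opposite.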
The packet loss compensation device 300 for audio data provided in this embodiment of the present disclosure can implement each process of the embodiments of the packet loss compensation method for audio data; to avoid repetition, details are not described here again.
According to embodiments of the present disclosure, the present disclosure further provides an electronic device and a readable storage medium.
FIG. 4 shows a schematic block diagram of an example electronic device 400 that may be used to implement embodiments of the present disclosure. The electronic device is intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other suitable computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are merely examples and are not intended to limit the implementations of the disclosure described and/or claimed herein.
As shown in FIG. 4, the device 400 includes a computing unit 401, which can perform various appropriate actions and processes according to a computer program stored in a read-only memory (ROM) 402 or a computer program loaded from a storage unit 408 into a random access memory (RAM) 403. The RAM 403 may also store various programs and data required for the operation of the device 400. The computing unit 401, the ROM 402, and the RAM 403 are connected to one another via a bus 404. An input/output (I/O) interface 405 is also connected to the bus 404.
A plurality of components in the device 400 are connected to the I/O interface 405, including: an input unit 406, such as a keyboard or a mouse; an output unit 407, such as various types of displays and speakers; a storage unit 408, such as a magnetic disk or an optical disc; and a communication unit 409, such as a network card, a modem, or a wireless communication transceiver. The communication unit 409 allows the device 400 to exchange information/data with other devices via a computer network such as the Internet and/or various telecommunication networks.
The computing unit 401 may be any of various general-purpose and/or special-purpose processing components having processing and computing capabilities. Some examples of the computing unit 401 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various dedicated artificial intelligence (AI) computing chips, various computing units running machine learning model algorithms, a digital signal processor (DSP), and any suitable processor, controller, microcontroller, and the like. The computing unit 401 performs the methods and processes described above, for example, the packet loss compensation method for audio data. For example, in some embodiments, the packet loss compensation method for audio data may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 408. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 400 via the ROM 402 and/or the communication unit 409. When the computer program is loaded into the RAM 403 and executed by the computing unit 401, one or more steps of the packet loss compensation method for audio data described above may be performed. Alternatively, in other embodiments, the computing unit 401 may be configured to perform the packet loss compensation method for audio data in any other suitable manner (for example, by means of firmware).
Various implementations of the systems and techniques described herein above may be realized in digital electronic circuitry, integrated circuitry, field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems on chip (SoCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that can be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor and can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general-purpose computer, a special-purpose computer, or another programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be implemented. The program code may execute entirely on the machine, partly on the machine, partly on the machine as a stand-alone software package and partly on a remote machine, or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described herein may be implemented in a computing system that includes a back-end component (e.g., a data server), or a middleware component (e.g., an application server), or a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described herein), or any combination of such back-end, middleware, or front-end components. The components of the system may be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include local area networks (LANs), wide area networks (WANs), and the Internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server incorporating a blockchain.
It should be understood that steps may be reordered, added, or deleted using the various forms of flows shown above. For example, the steps described in the present disclosure may be performed in parallel, sequentially, or in a different order, as long as the desired results of the technical solutions disclosed herein can be achieved; no limitation is imposed herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (10)

1. A method for packet loss compensation of audio data, the method comprising:
obtaining N reference audio data packets from a cache when it is detected that a target audio data packet is missing from the cache, wherein the audio playing time of each reference audio data packet is earlier than the audio playing time of the target audio data packet, the N reference audio data packets are consecutive and adjacent to the target audio data packet, and N is an integer greater than 1;
performing sub-band decomposition on each of the N reference audio data packets to obtain M reference sub-band data sets, wherein M is an integer greater than 1; and
predicting the target audio data packet according to the M reference sub-band data sets to obtain a compensated audio data packet, and storing the compensated audio data packet in the cache.
2. The method according to claim 1, wherein predicting the target audio data packet according to the M reference sub-band data sets to obtain a compensated audio data packet comprises:
downsampling each of the M reference sub-band data sets to obtain M downsampled data sets, wherein the downsampling factor is greater than 1 and less than or equal to M; and
predicting the target audio data packet according to the M downsampled data sets to obtain the compensated audio data packet.
3. The method according to claim 2, wherein predicting the target audio data packet according to the M downsampled data sets to obtain the compensated audio data packet comprises:
predicting the target audio data packet according to the M downsampled data sets to obtain M target sub-bands, wherein the M target sub-bands are in one-to-one correspondence with the M downsampled data sets; and
performing data synthesis on the M target sub-bands to obtain the compensated audio data packet.
4. The method according to claim 3, wherein performing data synthesis on the M target sub-bands to obtain the compensated audio data packet comprises:
upsampling each of the M target sub-bands to obtain M upsampled sub-bands, wherein the upsampling factor of each target sub-band is the same as the downsampling factor of the corresponding reference sub-band data set; and
performing data synthesis on the M upsampled sub-bands to obtain the compensated audio data packet.
5. The method according to claim 1, wherein performing sub-band decomposition on each of the N reference audio data packets to obtain M reference sub-band data sets comprises:
performing sub-band decomposition on each reference audio data packet to obtain M reference sub-bands in one-to-one correspondence with M sub-band positions, wherein the N reference sub-band data of the N reference audio data packets at the same sub-band position form one reference sub-band data set, and the reference sub-band data of the N reference audio data packets form the M reference sub-band data sets;
and predicting the target audio data packet according to the M reference sub-band data sets to obtain a compensated audio data packet comprises:
downsampling each of the M reference sub-band data sets to obtain M downsampled data sets in one-to-one correspondence with the M sub-band positions;
predicting, according to each downsampled data set, the audio sub-band of the target audio data packet at the same sub-band position to obtain a corresponding target sub-band, wherein the M sub-band positions are in one-to-one correspondence with the M audio sub-bands of the target audio data packet; and
obtaining the compensated audio data packet according to the M target sub-bands.
6. The method according to claim 1, wherein a plurality of audio data packets are stored in the cache, each audio data packet corresponds to a sequence number, and different audio data packets among the plurality of audio data packets correspond to different sequence numbers;
the method further comprises:
when the sequence numbers corresponding to the plurality of audio data packets stored in the cache are not consecutive, determining that a target audio data packet is detected to be missing from the cache, wherein the target audio data packet is the audio data packet corresponding to a missing sequence number among the plurality of sequence numbers.
7. The method according to claim 1, wherein after storing the compensated audio data packet in the cache, the method further comprises:
obtaining output configuration information, wherein the output configuration information indicates a delay and/or an audio duration of target audio data;
extracting the target audio data from the cache according to the output configuration information; and
outputting the target audio data.
8. A packet loss compensation device for audio data, the device comprising:
an acquisition module, configured to obtain N reference audio data packets from a cache when it is detected that a target audio data packet is missing from the cache, wherein the audio playing time of each reference audio data packet is earlier than the audio playing time of the target audio data packet, the N reference audio data packets are consecutive and adjacent to the target audio data packet, and N is an integer greater than 1;
a decomposition module, configured to perform sub-band decomposition on each of the N reference audio data packets to obtain M reference sub-band data sets, wherein M is an integer greater than 1; and
a compensation module, configured to predict the target audio data packet according to the M reference sub-band data sets to obtain a compensated audio data packet, and to store the compensated audio data packet in the cache.
9. An electronic device, comprising a processor, a memory, and a computer program stored in the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the steps of the method according to any one of claims 1 to 7.
10. A computer-readable storage medium, storing a computer program which, when executed by a processor, implements the steps of the method according to any one of claims 1 to 7.
CN202311661514.7A 2023-12-05 2023-12-05 Packet loss compensation method and device for audio data and related equipment Pending CN117676185A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311661514.7A CN117676185A (en) 2023-12-05 2023-12-05 Packet loss compensation method and device for audio data and related equipment

Publications (1)

Publication Number Publication Date
CN117676185A (en) 2024-03-08

Family

ID=90084072

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311661514.7A Pending CN117676185A (en) 2023-12-05 2023-12-05 Packet loss compensation method and device for audio data and related equipment

Country Status (1)

Country Link
CN (1) CN117676185A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111953694A (en) * 2020-08-13 2020-11-17 南京百家云科技有限公司 Live broadcast-based packet loss compensation method and device
CN113744714A (en) * 2021-09-27 2021-12-03 深圳市木愚科技有限公司 Speech synthesis method, speech synthesis device, computer equipment and storage medium
CN113763974A (en) * 2021-08-31 2021-12-07 易兆微电子(杭州)股份有限公司 Packet loss compensation method and device, electronic equipment and storage medium
US20230343344A1 (en) * 2020-06-11 2023-10-26 Dolby International Ab Frame loss concealment for a low-frequency effects channel
CN116959458A (en) * 2022-04-18 2023-10-27 腾讯科技(深圳)有限公司 Audio transmission method, device, terminal, storage medium and program product

Similar Documents

Publication Publication Date Title
JP6558748B2 (en) Voice / audio signal processing method and apparatus
CN111508519B (en) Method and device for enhancing voice of audio signal
EP3869775B1 (en) Double-talk state detection method and device, and electronic device
WO2023056783A1 (en) Audio processing method, related device, storage medium and program product
CN110931035B (en) Audio processing method, device, equipment and storage medium
US20170127089A1 (en) Switching Between Transforms
CN114360562A (en) Voice processing method, device, electronic equipment and storage medium
JP2022006158A (en) Video coding method, video coding apparatus, electronic device, computer-readable storage medium and computer program
CN111429926A (en) Method and device for optimizing audio coding speed
CN113270107A (en) Method and device for acquiring noise loudness in audio signal and electronic equipment
CN114757229A (en) Signal processing method, signal processing device, electronic apparatus, and medium
CN113205824B (en) Sound signal processing method, device, storage medium, chip and related equipment
CN113674752B (en) Noise reduction method and device for audio signal, readable medium and electronic equipment
CN112309418A (en) Method and device for inhibiting wind noise
CN117676185A (en) Packet loss compensation method and device for audio data and related equipment
CN114333912B (en) Voice activation detection method, device, electronic equipment and storage medium
CN113938749B (en) Audio data processing method, device, electronic equipment and storage medium
CN115273880A (en) Voice noise reduction method, model training method, device, equipment, medium and product
KR101748039B1 (en) Sampling rate conversion method and system for efficient voice call
CN113096670A (en) Audio data processing method, device, equipment and storage medium
CN114171038A (en) Voice noise reduction method, device, equipment, storage medium and program product
CN116013337B (en) Audio signal processing method, training method, device, equipment and medium for model
CN112634930B (en) Multichannel sound enhancement method and device and electronic equipment
CN114900730B (en) Method and device for acquiring delay estimation steady state value, electronic equipment and storage medium
US7739105B2 (en) System and method for processing audio frames

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination