WO2024099233A1 - Audio data encoding method, decoding method and device - Google Patents

Audio data encoding method, decoding method and device

Info

Publication number
WO2024099233A1
Authority
WO
WIPO (PCT)
Prior art keywords
data, audio frame, audio, encoding, sample point
Application number
PCT/CN2023/129685
Other languages
English (en)
French (fr)
Inventor
伍子谦
张德军
蒋佳为
王鹤
林坤鹏
肖益剑
丁飘
宋慎义
Original Assignee
抖音视界有限公司
Application filed by 抖音视界有限公司
Publication of WO2024099233A1

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 - Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/002 - Dynamic bit allocation
    • G10L19/04 - Speech or audio signals analysis-synthesis techniques using predictive techniques
    • G10L19/16 - Vocoder architecture
    • G10L19/18 - Vocoders using multiple modes
    • G10L19/22 - Mode decision, i.e. based on audio signal content versus external parameters

Definitions

  • the present disclosure relates to the field of data processing technology, and in particular to an audio data encoding method, a decoding method and a device.
  • In a VoIP (Voice over Internet Protocol) call, in order to improve the quality of the audio signal, the encoder adjusts the encoding mode according to real-time network conditions, for example switching between Multiple Description Coding (MDC) mode and Single Description Coding (SDC) mode.
  • parameters such as delay and sampling rate may differ between the two coding modes, which can cause audio discontinuity and/or noise when decoding audio data across a coding-mode switch.
  • the embodiments of the present disclosure provide an audio data encoding method, a decoding method, a processing method and a device, which are used to improve the quality of the audio signal when the encoding mode is switched.
  • an embodiment of the present disclosure provides a method for encoding audio data, comprising:
  • the third data is generated according to the first data, the second data and the first delay;
  • the first data is the low-frequency data obtained by frequency-dividing the original audio data of the first audio frame, and the second data is the low-frequency data obtained by frequency-dividing the original audio data of the second audio frame;
  • the first delay is the encoding delay of the multiple description coding;
  • Multiple description coding is performed on the third data to obtain coded data of the first audio frame.
  • the method further includes:
  • if the encoding mode of the first audio frame is different from the encoding mode of the second audio frame, and the encoding mode of the first audio frame is single description encoding, sixth data is generated according to the fourth data, the fifth data and the second delay;
  • the fourth data is the original audio data of the first audio frame,
  • the fifth data is the original audio data of the second audio frame, and
  • the second delay is the encoding delay of the single description encoding;
  • Single description encoding is performed on the sixth data to obtain encoded data of the first audio frame.
  • generating third data according to the first data, the second data and the first delay includes:
  • Samples having a length equal to the first delay are deleted from the tail end of the eighth data to obtain the third data.
  • generating sixth data according to the fourth data, the fifth data and the second delay includes:
  • the samples having a length of the second delay are deleted from the tail end of the tenth data to obtain the sixth data.
  • determining the encoding mode of the first audio frame includes:
  • the duration of the coding mode is the playback duration of audio frames continuously encoded in the current coding mode
  • the encoding mode of the first audio frame is determined according to network parameters of the audio encoding data transmission network.
  • determining whether a coding mode switching condition is met based on the duration of the coding mode and the signal type of the first audio frame includes:
  • if the duration of the coding mode is greater than the threshold duration, and the probability that the first audio frame is a speech audio frame is less than the threshold probability, it is determined that the coding mode switching condition is met;
  • if the duration of the coding mode is less than or equal to the threshold duration and/or the probability that the first audio frame is a speech audio frame is greater than or equal to the threshold probability, it is determined that the coding mode switching condition is not met.
  • determining the encoding mode of the first audio frame according to the network parameters of the audio encoding data transmission network includes:
  • the encoding mode of the first audio frame is single description coding.
  • an embodiment of the present disclosure provides a method for decoding audio data, comprising:
  • if the coding mode of the first audio frame is multiple description coding, packet loss compensation data is generated based on the second audio frame;
  • the decoded data is smoothed according to the delay data of the second audio frame and the packet loss compensation data to obtain playback data of the first audio frame.
  • the method further includes:
  • if the encoding mode of the first audio frame is different from the encoding mode of the second audio frame, and the encoding mode of the first audio frame is single description coding, packet loss compensation data is generated based on the second audio frame;
  • the smoothing result is delayed according to the packet loss compensation data and the number of delayed samples to obtain playback data of the first audio frame; the number of delayed samples is the number of delayed samples of multiple description coding.
  • the smoothing of the decoded data according to the packet loss compensation data to obtain a smoothing result corresponding to the decoded data includes:
  • the first sample point sequence is the sequence formed by the leading sample points of the decoded data whose count equals the first number, where the first number is the difference between a first preset number and the number of delayed sample points
  • the second sample sequence is a sample sequence consisting of sample points in the packet loss compensation data whose index values are from the number of delayed sample points to the first preset number
  • a third sample point sequence in the first replacement result and a fourth sample point sequence in the packet loss compensation data are windowed and superimposed based on a first window function to obtain a smoothed result corresponding to the decoded data, the third sample point sequence being a sample point sequence consisting of samples whose index values in the first replacement result are from the first number to the sum of the first number and a second preset number; and the fourth sample point sequence being a sample point sequence consisting of samples whose index values in the packet loss compensation data are from the first preset number to the sum of the first preset number and the second preset number.
  • the delay processing of the smoothing result according to the packet loss compensation data and the number of delayed samples to obtain the playback data of the first audio frame includes:
  • the fifth sample point sequence is a sample point sequence consisting of the leading sample points of the packet loss compensation data, the count of which equals the number of delayed sample points
  • the sixth sample point sequence is a sample point sequence consisting of the trailing sample points at the tail end of the first splicing result, the count of which equals the number of delayed sample points.
  • the smoothing of the decoded data according to the delay data of the second audio frame and the packet loss compensation data to obtain the playback data of the first audio frame includes:
  • the seventh sample point sequence is a sample point sequence consisting of the leading sample points of the decoded data, the count of which equals the number of delayed sample points
  • the eighth sample point sequence in the second replacement result and the ninth sample point sequence in the packet loss compensation data are windowed and superimposed based on a second window function to obtain playback data of the first audio frame, the eighth sample point sequence being a sample point sequence composed of samples whose index values in the second replacement result are from the number of delayed samples to the sum of the number of delayed samples and a third preset number; and the ninth sample point sequence is a sample point sequence composed of the first third preset number of samples of the packet loss compensation data.
  • the method further includes:
  • the decoded data is delayed according to the delay data of the second audio frame and the number of delayed samples to obtain the playback data of the first audio frame.
  • the delay processing of the decoded data according to the delay data of the second audio frame and the number of delay samples to obtain the playback data of the first audio frame includes:
  • the tenth sample sequence in the second splicing result is deleted to obtain playback data of the first audio frame, wherein the tenth sample sequence consists of the trailing sample points at the tail end of the second splicing result, the count of which equals the number of delayed sample points.
  • an audio data encoding device including:
  • a determining unit configured to determine a coding mode of a first audio frame
  • a determining unit configured to determine whether a coding mode of the first audio frame is the same as a coding mode of a second audio frame; the second audio frame being an audio frame preceding the first audio frame;
  • a generating unit configured to generate third data according to first data, second data, and a first delay, when the encoding mode of the first audio frame is different from the encoding mode of the second audio frame and the encoding mode of the first audio frame is multiple description encoding;
  • the first data is low-frequency data obtained by frequency-dividing original audio data of the first audio frame
  • the second data is low-frequency data obtained by frequency-dividing original audio data of the second audio frame
  • the first delay is an encoding delay of the multiple description encoding
  • the encoding unit is used to perform multiple description encoding on the third data to obtain encoded data of the first audio frame.
  • the generating unit is further used to generate sixth data according to the fourth data, the fifth data and the second delay when the encoding mode of the first audio frame is different from the encoding mode of the second audio frame and the encoding mode of the first audio frame is single description encoding;
  • the fourth data is the original audio data of the first audio frame
  • the fifth data is the original audio data of the second audio frame
  • the second delay is the encoding delay of the single description encoding;
  • the encoding unit is further configured to perform single description encoding on the sixth data to obtain encoded data of the first audio frame.
  • the generating unit is specifically used to: intercept samples with a length of the first delay from the tail end of the second data to obtain seventh data; splice the seventh data to the head end of the first data to obtain eighth data; delete samples with a length of the first delay from the tail end of the eighth data to obtain the third data.
  • the generating unit is specifically used to: cut off samples with a length of the second delay from the tail end of the fifth data to obtain ninth data; splice the ninth data to the head end of the fourth data to obtain tenth data; delete samples with a length of the second delay from the tail end of the tenth data to obtain the sixth data.
  • the determination unit is specifically used to: determine whether the coding mode switching condition is met based on the duration of the coding mode and the signal type of the first audio frame; the duration of the coding mode is the playback duration of the audio frames continuously encoded by the current coding mode; if not, determine the coding mode of the second audio frame as the coding mode of the first audio frame; if so, determine the coding mode of the first audio frame according to the network parameters of the audio coding data transmission network.
  • the determination unit is specifically used to: determine whether the duration of the encoding mode is greater than a threshold duration; determine whether the probability that the first audio frame is a speech audio frame is less than a threshold probability; if the duration of the encoding mode is greater than the threshold duration, and the probability that the first audio frame is a speech audio frame is less than the threshold probability, determine that the encoding mode switching condition is met; if the duration of the encoding mode is less than or equal to the threshold duration and/or the probability that the first audio frame is a speech audio frame is greater than or equal to the threshold probability, determine that the encoding mode switching condition is not met.
  • the determination unit is specifically used to: determine the packet loss rate of the audio encoding data transmission network according to the network parameters; determine whether the packet loss rate is greater than or equal to a threshold packet loss rate; if so, determine that the encoding mode of the first audio frame is multiple description coding; if not, determine that the encoding mode of the first audio frame is single description coding.
  • an embodiment of the present disclosure provides a decoding device for audio data, including:
  • a determining unit configured to determine an encoding mode of the first audio frame according to the encoding data of the first audio frame
  • a decoding unit configured to decode the encoded data of the first audio frame according to the encoding mode to obtain decoded data
  • a determining unit configured to determine whether a coding mode of the first audio frame is the same as a coding mode of a second audio frame; the second audio frame being an audio frame preceding the first audio frame;
  • a processing unit is used to generate packet loss compensation data based on the second audio frame when the encoding mode of the first audio frame is different from the encoding mode of the second audio frame and the encoding mode of the first audio frame is multiple description coding; and to smooth the decoded data according to the delay data of the second audio frame and the packet loss compensation data to obtain the playback data of the first audio frame.
  • the processing unit is also used to: when the encoding mode of the first audio frame is different from the encoding mode of the second audio frame, and the encoding mode of the first audio frame is single description encoding, generate packet loss compensation data based on the second audio frame; smooth the decoded data according to the packet loss compensation data to obtain a smoothed result corresponding to the decoded data; delay the smoothed result according to the packet loss compensation data and the number of delayed sample points to obtain playback data of the first audio frame; the number of delayed sample points is the number of delayed sample points of multiple description encoding.
  • the processing unit is specifically used to: replace a first sample point sequence in the decoded data with a second sample point sequence in the packet loss compensation data to obtain a first replacement result;
  • the first sample point sequence is the sequence formed by the leading sample points of the decoded data whose count equals the first number, and the first number is the difference between a first preset number and the number of delayed sample points;
  • the second sample point sequence is a sample point sequence composed of sample points with index values from the number of delayed sample points to the first preset number in the packet loss compensation data;
  • based on a first window function a third sample point sequence in the first replacement result and a fourth sample point sequence in the packet loss compensation data are windowed and superimposed to obtain a smoothing result corresponding to the decoded data
  • the third sample point sequence is a sample point sequence composed of sample points with index values from the first number to the sum of the first number and the second preset number in the first replacement result;
  • the fourth sample point sequence is a sample point sequence composed of sample points whose index values in the packet loss compensation data are from the first preset number to the sum of the first preset number and the second preset number.
  • the processing unit is specifically used to: obtain a fifth sample point sequence, the fifth sample point sequence is a sample point sequence composed of the number of delayed sample points before the packet loss compensation data; splice the fifth sample point sequence before the smoothing result to obtain a first splicing result; delete the sixth sample point sequence in the first splicing result to obtain the playback data of the first audio frame, the sixth sample point sequence is a sample point sequence composed of the number of delayed sample points after the first splicing result.
  • the processing unit is specifically used to: replace the seventh sample point sequence in the decoded data with the delayed data to obtain a second replacement result;
  • the seventh sample point sequence is a sample point sequence composed of the number of delayed sample points in the decoded data; based on a second window function, the eighth sample point sequence in the second replacement result and the ninth sample point sequence in the packet loss compensation data are windowed and superimposed to obtain the playback data of the first audio frame,
  • the eighth sample point sequence is a sample point sequence composed of samples whose index values in the second replacement result are from the number of delayed sample points to the sum of the number of delayed sample points and a third preset number;
  • the ninth sample point sequence is a sample point sequence composed of the third preset number of sample points in the packet loss compensation data.
  • the processing unit is further used for, when the encoding mode of the first audio frame is the same as the encoding mode of the second audio frame and the encoding mode of the first audio frame is single description encoding, delaying the decoded data according to the delay data of the second audio frame and the number of delayed sample points to obtain the playback data of the first audio frame.
  • the processing unit is specifically used to: splice the delayed data before the decoded data to obtain a second splicing result; delete the tenth sample point sequence in the second splicing result to obtain the playback data of the first audio frame, and the tenth sample point sequence is a sample point sequence consisting of the number of delayed sample points after the second splicing result.
  • an embodiment of the present disclosure provides an electronic device, comprising: a memory and a processor, wherein the memory is used to store a computer program; and the processor is used to enable the electronic device to implement the audio data encoding method or the audio data decoding method described in any of the above embodiments when executing the computer program.
  • an embodiment of the present disclosure provides a computer-readable storage medium, which, when the computer program is executed by a computing device, enables the computing device to implement the audio data encoding method or the audio data decoding method described in any of the above-mentioned embodiments.
  • an embodiment of the present disclosure provides a computer program product, which, when executed on a computer, enables the computer to implement the audio data encoding method or the audio data decoding method described in any one of the above-mentioned embodiments.
  • In the audio data encoding method and decoding method provided in the embodiments of the present disclosure, target data is generated through the following steps: determining the encoding mode of the first audio frame; judging whether the encoding mode of the first audio frame is the same as the encoding mode of the second audio frame; and if they are not the same and the encoding mode of the first audio frame is multiple description encoding, generating the target data according to the first data, the second data and the first delay.
  • In the audio data encoding method provided in the embodiments of the present disclosure, when the encoding mode of the first audio frame is different from the encoding mode of the second audio frame and the encoding mode of the first audio frame is multiple description encoding, the low-frequency data obtained by frequency-dividing the original audio data of the first audio frame is processed according to the low-frequency data obtained by frequency-dividing the original audio data of the second audio frame and the encoding delay of the multiple description encoding, and the processed third data is then encoded. Therefore, when the encoding mode is switched from single description encoding to multiple description encoding, the embodiments of the present disclosure can avoid audio discontinuity and noise, thereby improving the quality of the audio signal.
  • FIG. 1 is a first flowchart of the steps of the audio data encoding method provided by an embodiment of the present disclosure;
  • FIG. 2 is a first schematic diagram of the audio data encoding method provided by an embodiment of the present disclosure;
  • FIG. 3 is a second schematic diagram of the audio data encoding method provided by an embodiment of the present disclosure;
  • FIG. 4 is a second flowchart of the audio data encoding method provided by an embodiment of the present disclosure;
  • FIG. 5 is a third flowchart of the steps of the audio data encoding method provided by an embodiment of the present disclosure;
  • FIG. 6 is a first flowchart of the steps of the audio data decoding method provided by an embodiment of the present disclosure;
  • FIG. 7 is a second flowchart of the audio data decoding method provided by an embodiment of the present disclosure;
  • FIG. 8 is a first schematic diagram of the audio data decoding method provided by an embodiment of the present disclosure;
  • FIG. 9 is a second schematic diagram of the audio data decoding method provided by an embodiment of the present disclosure;
  • FIG. 10 is a third schematic diagram of the audio data decoding method provided by an embodiment of the present disclosure;
  • FIG. 11 is a fourth schematic diagram of the audio data decoding method provided by an embodiment of the present disclosure;
  • FIG. 12 is a fifth schematic diagram of the audio data decoding method provided by an embodiment of the present disclosure;
  • FIG. 13 is a third flowchart of the steps of the audio data decoding method provided by an embodiment of the present disclosure;
  • FIG. 14 is a sixth schematic diagram of the audio data decoding method provided by an embodiment of the present disclosure;
  • FIG. 15 is a schematic diagram of the structure of an audio data encoding device provided in an embodiment of the present disclosure;
  • FIG. 16 is a schematic diagram of the structure of an audio data decoding device provided in an embodiment of the present disclosure;
  • FIG. 17 is a schematic diagram of the hardware structure of an electronic device provided in an embodiment of the present disclosure.
  • words such as "exemplary" or "for example" are used to indicate examples, illustrations or descriptions. Any embodiment or design described as "exemplary" or "for example" in the embodiments of the present disclosure should not be interpreted as being more preferred or more advantageous than other embodiments or designs. Specifically, the use of words such as "exemplary" or "for example" is intended to present related concepts in a concrete way. In addition, in the description of the embodiments of the present disclosure, unless otherwise specified, the meaning of "multiple" refers to two or more.
  • the present disclosure provides a method for encoding audio data. As shown in FIG. 1 , the method for encoding audio data includes the following steps:
  • S101 Determine a coding mode of a first audio frame.
  • the encoding modes of audio frames include Single Description Coding (SDC) and Multiple Description Coding (MDC).
  • S102 Determine whether a coding mode of the first audio frame is the same as a coding mode of the second audio frame.
  • the second audio frame is an audio frame preceding the first audio frame.
  • step S102 if the coding mode of the first audio frame is different from the coding mode of the second audio frame, and the coding mode of the first audio frame is multiple description coding, the following steps S103 and S104 are performed:
  • the first data is low-frequency data obtained by dividing the original audio data of the first audio frame
  • the second data is low-frequency data obtained by dividing the original audio data of the second audio frame
  • the first delay is the encoding delay of the multiple description coding.
  • if the encoding mode of the current audio frame is multiple description encoding, the original data of the current audio frame is written into a delay buffer (delay_buffer), so that when the fifth data needs to be obtained, the original audio data of the previous audio frame can be read directly from the delay buffer;
  • if the encoding mode of the current audio frame is single description encoding, the low-frequency data obtained by frequency-dividing the original data of the current audio frame is written into the delay buffer, so that when the second data needs to be obtained, the low-frequency data obtained by frequency-dividing the original audio data of the previous audio frame can be read directly from the delay buffer.
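  • For illustration only, the following is a minimal Python sketch (not part of the patent text) of this per-frame buffering rule; the function name and the numpy representation are assumptions:

```python
import numpy as np

def update_delay_buffer(frame: np.ndarray, low_freq: np.ndarray,
                        mode: str) -> np.ndarray:
    """Return the data to cache in delay_buffer for the next frame.

    While encoding in MDC mode, a later switch to SDC will need the
    previous frame's original audio data (the fifth data); while in SDC
    mode, a later switch to MDC will need the previous frame's
    low-frequency data (the second data).
    """
    if mode == "MDC":
        return frame.copy()     # original audio data of the current frame
    return low_freq.copy()      # low-frequency (frequency-divided) data
```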
  • S104 Perform multiple description coding on the third data to obtain encoded data of the first audio frame.
  • step S102 if the encoding mode of the first audio frame is different from the encoding mode of the second audio frame, and the encoding mode of the first audio frame is single description encoding, the following steps S105 and S106 are performed:
  • the fourth data is the original audio data of the first audio frame
  • the fifth data is the original audio data of the second audio frame
  • the second delay is the coding delay of the single description coding.
  • In the audio data encoding method and decoding method provided in the embodiments of the present disclosure, target data is generated through the following steps: determining the encoding mode of the first audio frame; judging whether the encoding mode of the first audio frame is the same as the encoding mode of the second audio frame; and if they are not the same and the encoding mode of the first audio frame is multiple description encoding, generating the target data according to the first data, the second data and the first delay.
  • In the audio data encoding method provided in the embodiments of the present disclosure, when the encoding mode of the first audio frame is different from the encoding mode of the second audio frame and the encoding mode of the first audio frame is multiple description encoding, the low-frequency data obtained by frequency-dividing the original audio data of the first audio frame is processed according to the low-frequency data obtained by frequency-dividing the original audio data of the second audio frame and the encoding delay of the multiple description encoding, and the processed third data is then encoded. Therefore, when the encoding mode is switched from single description encoding to multiple description encoding, the embodiments of the present disclosure can avoid audio discontinuity and noise, thereby improving the quality of the audio signal.
  • the present disclosure provides a method for encoding audio data.
  • the method for encoding audio data includes the following steps:
  • S201 Determine a coding mode for a first audio frame.
  • the encoding mode of the current audio frame is determined.
  • S202 Determine whether a coding mode of the first audio frame is the same as a coding mode of the second audio frame.
  • the second audio frame is an audio frame preceding the first audio frame.
  • S203 Cut off samples with a length equal to the first delay from the tail end of the second data to obtain seventh data.
  • S206 Perform multiple description coding on the third data to obtain coded data of the first audio frame.
  • the first delay length in FIG3 is delay_8kHZ
  • the data cached in the delay buffer (delay_buffer) is the low-frequency data (second data 31) obtained by dividing the original data of the second audio frame
  • the input of the encoder in multiple description encoding is the low-frequency data (first data 32) obtained by dividing the first audio frame.
  • the data processing process of the above steps S203 to S205 includes: firstly, sample points with a length of delay_8kHZ are intercepted from the tail end of the second data 31 to obtain the seventh data 311; secondly, the seventh data 311 is spliced to the head end of the first data 32 to obtain the eighth data 33; and finally, sample points with a length of delay_8kHZ are deleted from the tail end of the eighth data 33 to obtain the third data 34.
  • the third data 34 consists of two parts: one part is the seventh data 311, and the other part is the remaining data of the first data 32 after samples with a length of delay_8kHZ are deleted from its tail end.
  • the first delay length in FIG4 is delay_16kHZ
  • the data cached in the delay buffer (delay_buffer) is the original audio data of the second audio frame (fifth data 41)
  • the input of the encoder in single description encoding is the original audio data of the first audio frame (fourth data 42).
  • the data processing process of the above steps S207 to S209 includes: firstly, sample points with a length of delay_16kHZ are intercepted from the tail end of the fifth data 41 to obtain the ninth data 411; secondly, the ninth data 411 is spliced to the head end of the fourth data 42 to obtain the tenth data 43; and finally, sample points with a length of delay_16kHZ are deleted from the tail end of the tenth data 43 to obtain the sixth data 44.
  • the sixth data 44 consists of two parts: one part is the ninth data 411, and the other part is the remaining data of the fourth data 42 after samples with a length of delay_16kHZ are deleted from its tail end.
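  • For illustration only, a minimal Python sketch (not part of the patent text) of the intercept-splice-delete construction shared by steps S203 to S205 and S207 to S209; the function name and the numpy representation are assumptions:

```python
import numpy as np

def build_transition_data(prev_cached: np.ndarray, cur: np.ndarray,
                          delay: int) -> np.ndarray:
    """Intercept `delay` samples from the tail of the previous frame's
    cached data, splice them to the head of the current frame's data,
    then delete `delay` samples from the tail, so the frame length is
    preserved (steps S203-S205 and S207-S209)."""
    head = prev_cached[-delay:]            # seventh data / ninth data
    spliced = np.concatenate([head, cur])  # eighth data / tenth data
    return spliced[:-delay]                # third data / sixth data

# Switching to MDC: low-frequency data with delay = delay_8kHZ samples.
# Switching to SDC: original audio data with delay = delay_16kHZ samples.
```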
  • the present disclosure provides a method for processing audio data.
  • the method for processing audio data includes:
  • S501 Determine whether a coding mode switching condition is met based on a coding mode duration and a signal type of the first audio frame.
  • the duration of the encoding mode is the duration of playing audio frames continuously encoded in the current encoding mode.
  • the implementation method of determining whether the coding mode switching condition is met based on the coding mode duration and the signal type of the first audio frame may include the following steps a to e:
  • Step a Determine whether the duration of the encoding mode is greater than a threshold duration.
  • the embodiment of the present application does not limit the threshold duration.
  • For example, the threshold duration may be 2 s.
  • After step a, if the duration of the encoding mode is less than or equal to the threshold duration, the following step b is executed:
  • Step b Determine that the coding mode switching condition is not met.
  • After step a, if the duration of the encoding mode is greater than the threshold duration, the following steps c to e are performed:
  • Step c Determine whether the probability that the first audio frame is a speech audio frame is less than a threshold probability.
  • After step c, if the probability that the first audio frame is a speech audio frame is less than the threshold probability, the following step d is performed:
  • Step d Determine that the coding mode switching condition is met.
  • After step c, if the probability that the first audio frame is a speech audio frame is greater than or equal to the threshold probability, the following step e is performed:
  • Step e Determine that the coding mode switching condition is not met.
  • That is, if the duration of the coding mode is less than or equal to the threshold duration and/or the probability that the first audio frame is a speech audio frame is greater than or equal to the threshold probability, it is determined that the coding mode switching condition is not met.
  • step S501 if the coding mode switching condition is not met, the following step S502 is performed:
  • S502 Determine the encoding mode of the second audio frame as the encoding mode of the first audio frame.
  • the encoding is performed using the encoding mode of the previous audio frame.
  • step S501 if the coding mode switching condition is met, the following step S503 is performed:
  • S503 Determine a coding mode for the first audio frame according to network parameters of an audio coding data transmission network.
  • the implementation method of the above step S503 (determining the encoding mode of the first audio frame according to the network parameters of the audio encoding data transmission network) includes the following steps 1 to 4:
  • Step 1 Determine the packet loss rate of the audio encoding data transmission network according to the network parameters.
  • the packet loss rate (Packet Loss Rate) in the embodiment of the present application refers to the ratio of the number of data packets lost during the data packet transmission process to the total number of data packets sent.
  • Step 2 Determine whether the packet loss rate is greater than or equal to a threshold packet loss rate.
  • the embodiment of the present application does not limit the threshold packet loss rate.
  • the threshold packet loss rate may be 5%.
  • After step 2, if the packet loss rate is greater than or equal to the threshold packet loss rate, the following step 3 is executed; if the packet loss rate is less than the threshold packet loss rate, the following step 4 is executed:
  • Step 3 Determine that the coding mode of the first audio frame is multiple description coding.
  • Step 4 Determine that the encoding mode of the first audio frame is single description encoding.
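  • For illustration only, a minimal Python sketch (not part of the patent text) of the decision logic of steps S501 to S503 and steps 1 to 4; the 2 s duration and 5% packet loss rate are the example thresholds given above, while threshold_prob and all names are assumptions:

```python
def decide_encoding_mode(prev_mode: str, mode_duration_s: float,
                         speech_prob: float, packet_loss_rate: float,
                         threshold_duration_s: float = 2.0,
                         threshold_prob: float = 0.5,
                         threshold_loss_rate: float = 0.05) -> str:
    """Steps S501 to S503. threshold_prob is not specified in the
    source and is a placeholder."""
    # S501: switching is allowed only after the current mode has run long
    # enough and when the frame is unlikely to be speech.
    if mode_duration_s <= threshold_duration_s or speech_prob >= threshold_prob:
        return prev_mode  # S502: keep the previous frame's mode
    # S503, steps 1 to 4: choose the mode from the packet loss rate.
    return "MDC" if packet_loss_rate >= threshold_loss_rate else "SDC"
```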
  • S504 Determine whether the encoding mode of the first audio frame is the same as the encoding mode of the second audio frame.
  • the second data is low-frequency data obtained by frequency-dividing original audio data of the second audio frame
  • the first delay is a coding delay of the multiple description coding
  • S506 Splice the seventh data to the head end of the first data to obtain eighth data.
  • the first data is low-frequency data obtained by frequency-dividing original audio data of the first audio frame.
  • S507 Delete samples having a length of the first delay from the end of the eighth data to obtain the third data.
  • S508 Perform multiple description coding on the third data to obtain coded data of the first audio frame.
  • S512 Perform single description encoding on the sixth data to obtain encoded data of the first audio frame.
  • if the encoding mode of the first audio frame is the same as the encoding mode of the second audio frame and the encoding mode of the first audio frame is multiple description coding,
  • multiple description coding is performed on the low-frequency data obtained by frequency division of the original audio data of the first audio frame to obtain the encoded data of the first audio frame.
  • if the encoding mode of the first audio frame is the same as the encoding mode of the second audio frame and the encoding mode of the first audio frame is single description encoding,
  • single description encoding is performed on the original audio data of the first audio frame to obtain the encoded data of the first audio frame.
  • the present disclosure provides a method for decoding audio data, referring to FIG. 6 , the method for decoding audio data includes:
  • S601 Determine a coding mode of a first audio frame according to coding data of a first audio frame.
  • S602 Decode the encoded data of the first audio frame according to the encoding mode to obtain decoded data.
  • S603 Determine whether the encoding mode of the first audio frame is the same as the encoding mode of the second audio frame.
  • the second audio frame is an audio frame preceding the first audio frame.
  • if the encoding mode of the first audio frame is different from the encoding mode of the second audio frame, and the encoding mode of the first audio frame is single description coding, the following S604 to S606 are performed:
  • S604 Generate packet loss compensation data based on the second audio frame.
  • the packet loss compensation data is data obtained based on the packet loss compensation mechanism (Packet Loss Concealment, PLC), which is used by the media engine to solve the network packet loss problem.
  • the packet loss compensation mechanism is not standardized, and it allows the media engine and codec to implement and expand it according to their own situation.
  • the packet loss compensation data in the embodiment of the present application may be data with a length of 10 ms.
  • S605 Smoothing the decoded data according to the packet loss compensation data to obtain a smoothing result corresponding to the decoded data.
  • S606 Perform delay processing on the smoothing result according to the packet loss compensation data and the number of delayed samples to obtain playback data of the first audio frame.
  • the number of delayed sample points is the number of delayed sample points of multiple description coding.
  • the output audio delay after decoding can be set to 0.
  • the encoding method of the first audio frame is SDC
  • if the coding mode of the first audio frame is different from the coding mode of the second audio frame, and the coding mode of the first audio frame is multiple description coding, the following S607 and S608 are performed:
  • S607 Generate packet loss compensation data based on the second audio frame.
  • S608 Smooth the decoded data according to the delay data of the second audio frame and the packet loss compensation data to obtain playback data of the first audio frame.
  • the encoding mode of the first audio frame is first determined according to the encoding data of the first audio frame, and then the encoding data of the first audio frame is decoded according to the encoding mode to obtain the decoded data, and then it is determined whether the encoding mode of the first audio frame is the same as the encoding mode of the second audio frame. If the encoding modes are not the same and the encoding mode of the first audio frame is single description encoding, packet loss compensation data is generated based on the second audio frame, and then the decoded data is smoothed according to the packet loss compensation data to obtain the smoothed result corresponding to the decoded data.
  • the smoothed result is delayed according to the packet loss compensation data and the number of delayed samples to obtain the playback data of the first audio frame; if the encoding modes are not the same and the encoding mode of the first audio frame is multiple description encoding, packet loss compensation data is generated based on the second audio frame, and then the decoded data is smoothed according to the delay data of the second audio frame and the packet loss compensation data to obtain the playback data of the first audio frame.
  • the audio data decoding method generates packet loss compensation data based on the second audio frame, and then smoothes the decoded data to obtain the playback data of the first audio frame when the encoding mode of the first audio frame is different from the encoding mode of the second audio frame and the encoding mode of the first audio frame is single-description encoding; when the encoding mode of the first audio frame is different from the encoding mode of the second audio frame and the encoding mode of the first audio frame is multi-description encoding, packet loss compensation data is generated based on the second audio frame, and then the playback data of the first audio frame is obtained in combination with the delay data of the second audio frame.
  • the embodiment of the present application can process the data according to the encoding mode of the current audio frame when the encoding mode of the first audio frame is different from the encoding mode of the second audio frame, thereby avoiding the problem of audio discontinuity and noise and improving the quality of the audio signal.
  • the present disclosure provides a method for decoding audio data.
  • the method for decoding audio data includes the following steps:
  • S702 Decode the encoded data of the first audio frame according to the encoding mode to obtain decoded data.
  • S703 Determine whether the encoding mode of the first audio frame is the same as the encoding mode of the second audio frame.
  • the second audio frame is an audio frame preceding the first audio frame.
  • the first sample point sequence is the sequence formed by the leading sample points of the decoded data whose count equals the first number, and the first number is the difference between a first preset number and the number of delayed sample points;
  • the second sample point sequence is a sample point sequence consisting of sample points with index values from the number of delayed sample points to the first preset number in the packet loss compensation data.
  • the original data of the current audio frame is written into a transition buffer (transition_buffer)
  • the decoded data will be stored in a pulse code modulation buffer (pcm_buffer), and the first replacement result will be written into the storage location of the original decoded data in the pulse code modulation buffer, and the second replacement result obtained in S707 below will also be written into the storage location of the original decoded data in the pulse code modulation buffer.
  • the decoded data is stored in the pulse code modulation buffer
  • the first sample point sequence in the decoded data is the sequence of the first F5-Fd sample points at the head of the pulse code modulation buffer.
  • the second sample sequence in the packet loss compensation data is the sample sequence with index values from Fd to F5 in the packet loss compensation data.
  • the transition buffer stores the packet loss compensation data (packet loss compensation data 81) generated based on the second audio frame; the sample point sequence composed of the sample points whose index values in the transition buffer range from the number of delayed sample points to the first preset number is the second sample point sequence 811; the data stored in the pulse code modulation buffer (decoded data 82) is the decoded data obtained by decoding the encoded data of the first audio frame according to the encoding mode; and the sequence composed of the leading sample points of the pulse code modulation buffer whose count equals the first number is the first sample point sequence 821.
  • the above step S704 is: replace the first sample sequence 821 in the decoded data 82 with the second sample sequence 811 in the packet loss compensation data 81 to obtain the first replacement result 83.
  • S705 Perform windowing and superposition on the third sample point sequence in the first replacement result and the fourth sample point sequence in the packet loss compensation data based on a first window function to obtain a smoothing result corresponding to the decoded data.
  • the third sample point sequence is a sample point sequence composed of samples whose index values in the first replacement result are from the first number to the sum of the first number and the second preset number; and the fourth sample point sequence is a sample point sequence composed of samples whose index values in the packet loss compensation data are from the first preset number to the sum of the first preset number and the second preset number.
  • Window function: the Fourier transform can only transform time-domain data of finite length, so the time-domain signal must be truncated. Even for a periodic signal, if the truncated length is not an integer multiple of the period (periodic truncation), the truncated signal will exhibit leakage. To minimize this leakage error, a weighting function, also called a window function, is needed. Windowing mainly makes the time-domain signal better satisfy the periodicity requirements of Fourier processing and reduces leakage. In this embodiment, smoothing is performed according to the switching type, and transition smoothing is performed using windowed smoothing.
  • the third sample point sequence is a sample point sequence with index values from F5-Fd to F5-Fd+F2.5 in the pulse code modulation buffer.
  • the fourth sample point sequence is a sample point sequence with index values from F5 to F5+F2.5 in the transition buffer.
  • w(i) is the expression of the window function
  • the smoothing method is to perform windowing and superposition on the corresponding part and the sample points with indexes from F5 to F5+F2.5 in the transition buffer to achieve the purpose of smooth transition.
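  • For illustration only (not part of the patent text): assuming the window w(i) fades from 1 to 0 across the overlap region of length F2.5, the windowed superposition can be written as smoothed(F5-Fd+i) = w(i) * transition_buffer(F5+i) + (1 - w(i)) * pcm_buffer(F5-Fd+i) for i = 0, ..., F2.5-1; the patent does not give the expression of w(i), so the fade direction is an assumption.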
  • On the basis of the embodiment shown in FIG. 8 above, as shown in FIG. 9, the second preset number is F2.5.
  • the sample point sequence composed of the sample points whose index values in the transition buffer are the first preset number to the sum of the first preset number and the second preset number is the fourth sample point sequence 812
  • the sample point sequence composed of the sample points whose index values in the first replacement result 83 in the pulse code modulation buffer are the first number to the sum of the first number and the second preset number is the third sample point sequence 831.
  • the third sample point sequence 831 in the first replacement result 83 and the fourth sample point sequence 812 in the packet loss compensation data 81 are windowed and superimposed to obtain the smoothing result 91 corresponding to the decoded data.
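  • For illustration only, a minimal Python sketch (not part of the patent text) of steps S704 and S705 as described above; the raised-cosine crossfade is an assumption, since the patent does not specify w(i):

```python
import numpy as np

def smooth_after_switch_to_sdc(pcm: np.ndarray, plc: np.ndarray,
                               f5: int, f25: int, fd: int) -> np.ndarray:
    """Steps S704 and S705. pcm is the decoded data (pcm_buffer), plc the
    packet loss compensation data (transition_buffer), f5/f25 the first
    and second preset sample counts, fd the MDC delayed-sample count."""
    out = pcm.copy()
    # S704: replace the first f5-fd decoded samples with plc[fd:f5].
    out[:f5 - fd] = plc[fd:f5]
    # S705: windowed superposition over the next f25 samples.
    i = np.arange(f25)
    w = np.cos(0.5 * np.pi * i / f25) ** 2  # assumed window, fades 1 -> 0
    out[f5 - fd:f5 - fd + f25] = (w * plc[f5:f5 + f25] +
                                  (1.0 - w) * out[f5 - fd:f5 - fd + f25])
    return out  # smoothing result corresponding to the decoded data
```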
  • the fifth sample point sequence is a sample point sequence consisting of the leading sample points of the packet loss compensation data, the count of which equals the number of delayed sample points.
  • that is, the fifth sample point sequence is the sequence of the first Fd sample points in the transition buffer.
  • S707 Splice the fifth sample point sequence before the smoothing result to obtain a first splicing result.
  • S708 Delete the sixth sample point sequence in the first splicing result to obtain playback data of the first audio frame.
  • the sixth sample point sequence is a sample point sequence consisting of the last Fd sample points at the tail end of the first splicing result.
  • the sample point sequence consisting of the first Fd sample points of the packet loss compensation data is the fifth sample point sequence 101.
  • the fifth sample point sequence 101 is spliced before the smoothing result 91 to obtain the first splicing result 102.
  • the sample point sequence composed of the last Fd sample points of the first splicing result 102 is the sixth sample point sequence 103.
  • the sixth sample point sequence 103 in the first splicing result 102 is deleted to obtain the playback data 104 of the first audio frame.
  • the playback data 104 of the first audio frame consists of the fifth sample point sequence 101, the second sample point sequence 811 and the remaining part of the first splicing result 102 after the sixth sample point sequence is deleted from its tail end.
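  • For illustration only, a minimal Python sketch (not part of the patent text) of the delay processing of steps S706 to S708; names are assumptions:

```python
import numpy as np

def delay_smoothing_result(smoothed: np.ndarray, plc: np.ndarray,
                           fd: int) -> np.ndarray:
    """Steps S706 to S708: prepend the first fd samples of the packet
    loss compensation data (fifth sequence), then delete the last fd
    samples (sixth sequence), delaying the output by fd samples while
    keeping the frame length unchanged."""
    spliced = np.concatenate([plc[:fd], smoothed])  # first splicing result
    return spliced[:-fd]                            # playback data
```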
  • the seventh sample point sequence is a sample point sequence consisting of the leading sample points of the decoded data, the count of which equals the number of delayed sample points.
  • the data stored in the pulse code modulation buffer is the decoded data (decoded data 112) obtained by decoding the encoded data of the first audio frame according to the encoding mode
  • the sequence of the first qmf_order-1 sample points in the pulse code modulation buffer is the seventh sample point sequence 1121
  • the sequence of the first qmf_order-1 sample points in the delay buffer is the delayed data 111.
  • the seventh sample sequence 1121 in the decoded data 112 is replaced by the delayed data 111 to obtain a second replacement result 113
  • the second replacement result 113 consists of the delayed data 111 and the remaining part of the tail end of the decoded data 112.
  • S710 Perform windowing and superposition on the eighth sample point sequence in the second replacement result and the ninth sample point sequence in the packet loss compensation data based on a second window function to obtain the playback data of the first audio frame.
  • the eighth sample point sequence is a sample point sequence composed of samples whose index values in the second replacement result are from the number of delayed sample points to the sum of the number of delayed sample points and the third preset number;
  • the sample point sequence composed of the first third preset number of sample points in the transition buffer is the ninth sample point sequence 1211
  • the sample point sequence composed of the samples whose index values are the number of delayed sample points to the sum of the number of delayed sample points and the third preset number in the second replacement result 113 in the pulse code modulation buffer is the eighth sample point sequence 1031
  • the result 122, obtained by windowing and superimposing the ninth sample point sequence 1211 in the packet loss compensation data 121 and the eighth sample point sequence 1031, consists of the delayed data 111, the smoothed segment 1221 produced by the windowed superposition of the ninth sample point sequence 1211 and the eighth sample point sequence 1031, and the remaining part at the tail end of the second replacement result 113.
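  • For illustration only, a minimal Python sketch (not part of the patent text) of steps S709 and S710; as before, the crossfade window is an assumption:

```python
import numpy as np

def smooth_after_switch_to_mdc(pcm: np.ndarray, delayed: np.ndarray,
                               plc: np.ndarray, f3: int) -> np.ndarray:
    """Steps S709 and S710. `delayed` holds the qmf_order-1 delayed
    samples of the second audio frame and f3 is the third preset sample
    count."""
    fd = len(delayed)           # number of delayed sample points
    out = pcm.copy()
    out[:fd] = delayed          # S709: second replacement result
    i = np.arange(f3)
    w = np.cos(0.5 * np.pi * i / f3) ** 2  # assumed window, fades 1 -> 0
    # S710: windowed superposition with the first f3 PLC samples.
    out[fd:fd + f3] = w * plc[:f3] + (1.0 - w) * out[fd:fd + f3]
    return out                  # playback data of the first audio frame
```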
  • the present disclosure provides a method for processing audio data.
  • the method for processing audio data includes the following steps:
  • S1302 Decode the encoded data of the first audio frame according to the encoding mode to obtain decoded data.
  • S1303 Determine whether the encoding mode of the first audio frame is the same as the encoding mode of the second audio frame.
  • Step a Splice the delayed data before the decoded data to obtain a second splicing result.
  • Step b Delete the tenth sample point sequence in the second splicing result to obtain playback data of the first audio frame.
  • the tenth sample point sequence is a sample point sequence consisting of the last sample points at the tail end of the second splicing result, the count of which equals the number of delayed sample points.
  • the above steps a and b may refer to FIG. 14
  • the delayed data 1411 is the first qmf_order-1 sample sequence in the delay buffer
  • the delayed data 1411 is spliced before the decoded data 142
  • the second splicing result 143 is obtained.
  • the sample point sequence consisting of the last sample points at the tail end of the second splicing result 143, the count of which equals the number of delayed sample points, is the tenth sample point sequence 1431
  • the tenth sample point sequence 1431 in the second splicing result 143 is deleted to obtain the playback data 144 of the first audio frame
  • the playback data 144 of the first audio frame consists of the delayed data 1411 and the remaining part of the decoded data 142 after its tail end is deleted.
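  • For illustration only, a minimal Python sketch (not part of the patent text) of steps a and b above; names are assumptions:

```python
import numpy as np

def delay_decoded_data(pcm: np.ndarray, delayed: np.ndarray) -> np.ndarray:
    """Steps a and b: splice the cached delayed data before the decoded
    data, then delete the same number of samples from the tail, so the
    output stays frame-aligned with the MDC path."""
    spliced = np.concatenate([delayed, pcm])  # second splicing result
    return spliced[:-len(delayed)]            # playback data
```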
  • the encoding mode of the first audio frame is different from the encoding mode of the second audio frame, and the encoding mode of the first audio frame is single description coding, then the following S1304 to S1306 are performed:
  • S1304 Generate packet loss compensation data based on the second audio frame.
  • the first sample point sequence is the sequence formed by the leading sample points of the decoded data whose count equals the first number, and the first number is the difference between a first preset number and the number of delayed sample points;
  • the second sample point sequence is a sample point sequence consisting of sample points with index values from the number of delayed sample points to the first preset number in the packet loss compensation data.
  • S1306 Perform windowing and superposition on the third sample point sequence in the first replacement result and the fourth sample point sequence in the packet loss compensation data based on a first window function to obtain a smoothing result corresponding to the decoded data.
  • the third sample point sequence is a sample point sequence composed of samples whose index values in the first replacement result are from the first number to the sum of the first number and the second preset number; and the fourth sample point sequence is a sample point sequence composed of samples whose index values in the packet loss compensation data are from the first preset number to the sum of the first preset number and the second preset number.
  • the fifth sample point sequence is a sample point sequence consisting of the number of delayed sample points of the packet loss compensation data.
  • S1308 Splice the fifth sample point sequence before the smoothing result to obtain a first splicing result.
  • the sixth sample point sequence is a sample point sequence consisting of the delayed sample points after the first splicing result.
  • S1312 Perform windowing and superposition on the eighth sample point sequence in the second replacement result and the ninth sample point sequence in the packet loss compensation data based on a second window function to obtain playback data of the first audio frame.
  • the eighth sample point sequence is a sample point sequence composed of samples whose index values in the second replacement result are the number of delayed sample points to the sum of the number of delayed sample points and a third preset number;
  • the ninth sample point sequence is a sample point sequence composed of the first third preset number of samples of the packet loss compensation data.
  • S1313 Perform delay processing on the smoothing result according to the packet loss compensation data and the number of delayed samples to obtain playback data of the first audio frame.
  • the embodiment of the present disclosure also provides an encoding device and a decoding device for audio data.
  • This embodiment corresponds to the above method embodiments.
  • For ease of reading, this embodiment does not repeat the details of the above method embodiments one by one, but it should be clear that the devices in this embodiment can implement all the contents of the above method embodiments.
  • The determining unit 1501 is configured to determine a coding mode of a first audio frame;
  • the judging unit 1502 is configured to determine whether the encoding mode of the first audio frame is the same as the encoding mode of a second audio frame, the second audio frame being the audio frame preceding the first audio frame;
  • the generating unit 1503 is configured to generate target data according to first data, second data and a first delay when the encoding mode of the first audio frame is different from the encoding mode of the second audio frame and the encoding mode of the first audio frame is multiple description coding, and is further configured to generate sixth data according to fourth data, fifth data and a second delay when the encoding mode of the first audio frame is different from the encoding mode of the second audio frame and the encoding mode of the first audio frame is single description coding;
  • the fourth data is the original audio data of the first audio frame;
  • the fifth data is the original audio data of the second audio frame;
  • the second delay is the encoding delay of the single description coding;
  • the encoding unit 1504 is configured to encode the target data according to the encoding mode of the first audio frame to obtain encoded data of the first audio frame.
  • The generating unit 1503 is specifically configured to: intercept, from the tail end of the second data, samples with a length of the first delay to obtain fifth data; splice the fifth data to the head end of the first data to obtain sixth data; and delete, from the tail end of the sixth data, samples with a length of the first delay to obtain the target data.
  • The generating unit 1503 is specifically configured to: intercept, from the tail end of the fifth data, samples with a length of the second delay to obtain seventh data; splice the seventh data to the head end of the fourth data to obtain eighth data; and delete, from the tail end of the eighth data, samples with a length of the second delay to obtain the target data.
  • The determining unit 1501 is specifically configured to: determine, based on a coding mode duration and the signal type of the first audio frame, whether the coding mode switching condition is met, the coding mode duration being the playback duration of the audio frames continuously encoded in the current coding mode; if not, determine the coding mode of the second audio frame as the coding mode of the first audio frame; and if so, determine the coding mode of the first audio frame according to network parameters of the audio coded data transmission network.
  • The determining unit 1501 is specifically configured to: determine whether the coding mode duration is greater than a threshold duration; determine whether the probability that the first audio frame is a speech audio frame is less than a threshold probability; if the coding mode duration is greater than the threshold duration and the probability that the first audio frame is a speech audio frame is less than the threshold probability, determine that the coding mode switching condition is met; and if the coding mode duration is less than or equal to the threshold duration and/or the probability that the first audio frame is a speech audio frame is greater than or equal to the threshold probability, determine that the coding mode switching condition is not met.
  • The determining unit 1501 is specifically configured to: determine the packet loss rate of the audio coded data transmission network according to the network parameters; determine whether the packet loss rate is greater than or equal to a threshold packet loss rate; if so, determine that the coding mode of the first audio frame is multiple description coding; and if not, determine that the coding mode of the first audio frame is single description coding.
  • FIG. 16 is a schematic diagram of the structure of the audio data decoding device. As shown in FIG. 16, the audio data decoding device 1600 includes:
  • the determining unit 1601, configured to determine a coding mode of a first audio frame according to encoded data of the first audio frame;
  • the decoding unit 1602, configured to decode the encoded data of the first audio frame according to the coding mode to obtain decoded data;
  • the judging unit 1603, configured to determine whether the encoding mode of the first audio frame is the same as the encoding mode of a second audio frame, the second audio frame being the audio frame preceding the first audio frame;
  • the processing unit 1604, configured to generate packet loss compensation data based on the second audio frame when the encoding mode of the first audio frame is different from the encoding mode of the second audio frame and the encoding mode of the first audio frame is single description coding, smooth the decoded data according to the packet loss compensation data to obtain a smoothing result corresponding to the decoded data, and perform delay processing on the smoothing result according to the packet loss compensation data and the number of delayed sample points to obtain playback data of the first audio frame, the number of delayed sample points being the number of delayed sample points of multiple description coding;
  • the processing unit 1604 is further configured to generate packet loss compensation data based on the second audio frame when the encoding mode of the first audio frame is different from the encoding mode of the second audio frame and the encoding mode of the first audio frame is multiple description coding, and smooth the decoded data according to the delayed data of the second audio frame and the packet loss compensation data to obtain the playback data of the first audio frame.
  • The processing unit 1604 is specifically configured to: replace the first sample point sequence in the decoded data with the second sample point sequence in the packet loss compensation data to obtain a first replacement result;
  • the first sample point sequence is the sequence consisting of the first samples of the decoded data whose count is the first number, the first number being the difference between the first preset number and the number of delayed sample points;
  • the second sample point sequence is the sequence consisting of the samples of the packet loss compensation data whose index values run from the number of delayed sample points to the first preset number; based on a first window function, the third sample point sequence in the first replacement result and the fourth sample point sequence in the packet loss compensation data are windowed and superimposed to obtain the smoothing result corresponding to the decoded data; the third sample point sequence is the sequence consisting of the samples of the first replacement result whose index values run from the first number to the sum of the first number and a second preset number, and the fourth sample point sequence is the sequence consisting of the samples of the packet loss compensation data whose index values run from the first preset number to the sum of the first preset number and the second preset number.
  • The processing unit 1604 is specifically configured to: obtain a fifth sample point sequence, the fifth sample point sequence being the sequence consisting of the first samples of the packet loss compensation data whose count is the number of delayed sample points; splice the fifth sample point sequence before the smoothing result to obtain a first splicing result; and delete the sixth sample point sequence in the first splicing result to obtain the playback data of the first audio frame, the sixth sample point sequence being the sequence consisting of the last samples of the first splicing result whose count is the number of delayed sample points.
  • The processing unit 1604 is specifically configured to: replace the seventh sample point sequence in the decoded data with the delayed data to obtain a second replacement result;
  • the seventh sample point sequence is the sequence consisting of the first samples of the decoded data whose count is the number of delayed sample points; based on a second window function, the eighth sample point sequence in the second replacement result and the ninth sample point sequence in the packet loss compensation data are windowed and superimposed to obtain the playback data of the first audio frame;
  • the eighth sample point sequence is the sequence consisting of the samples of the second replacement result whose index values run from the number of delayed sample points to the sum of the number of delayed sample points and a third preset number;
  • the ninth sample point sequence is the sequence consisting of the first samples of the packet loss compensation data whose count is the third preset number.
  • The processing unit 1604 is specifically configured to perform delay processing on the decoded data according to the delayed data of the second audio frame and the number of delayed sample points, to obtain the playback data of the first audio frame, if the encoding mode of the first audio frame is the same as the encoding mode of the second audio frame and the encoding mode of the first audio frame is single description coding.
  • The processing unit 1604 is specifically configured to: splice the delayed data before the decoded data to obtain a second splicing result; and delete the tenth sample point sequence in the second splicing result to obtain the playback data of the first audio frame, the tenth sample point sequence being the sequence consisting of the last samples of the second splicing result whose count is the number of delayed sample points.
  • The audio data processing devices provided in this embodiment can execute the audio data processing methods provided in the above method embodiments; their implementation principles and technical effects are similar and are not repeated here.
  • FIG17 is a schematic diagram of the structure of an electronic device provided by an embodiment of the present disclosure.
  • The electronic device provided by this embodiment includes a memory 1701 and a processor 1702, where the memory 1701 is configured to store a computer program, and the processor 1702 is configured to, when executing the computer program, perform the audio data processing methods provided by the above embodiments.
  • An embodiment of the present disclosure further provides a computer-readable storage medium on which a computer program is stored.
  • When the computer program is executed by a processor, the computing device implements the audio data processing methods provided in the above embodiments.
  • An embodiment of the present disclosure further provides a computer program product.
  • When the computer program product runs on a computer, the computing device implements the audio data processing methods provided in the above embodiments.
  • Those skilled in the art should understand that the embodiments of the present disclosure may be provided as methods, systems, or computer program products. Therefore, the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present disclosure may take the form of a computer program product implemented on one or more computer-usable storage media containing computer-usable program code.
  • The processor may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.
  • a general-purpose processor may be a microprocessor or the processor may be any conventional processor, etc.
  • Memory may include non-permanent storage in a computer-readable medium, in the form of random access memory (RAM) and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
  • Computer-readable media include permanent and non-permanent, removable and non-removable storage media. A storage medium may store information by any method or technology; the information may be computer-readable instructions, data structures, program modules or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory media such as modulated data signals and carrier waves.


Abstract

An audio data processing method and device, relating to the field of data processing technology. The method includes: determining a coding mode of a first audio frame (S101); determining whether the coding mode of the first audio frame is the same as the coding mode of a second audio frame (S102); if they are not the same and the coding mode of the first audio frame is multiple description coding, generating third data according to first data, second data and a first delay (S103); if they are not the same and the coding mode of the first audio frame is single description coding, generating sixth data according to fourth data, fifth data and a second delay (S105); and encoding target data according to the coding mode of the first audio frame to obtain encoded data of the first audio frame. The method is used to improve decoded audio quality when the coding mode is switched.

Description

Audio data encoding method, decoding method and device
CROSS-REFERENCE TO RELATED APPLICATIONS
This application is based on and claims priority to Chinese patent application No. 202211387602.8, filed on November 7, 2022, the disclosure of which is incorporated herein by reference in its entirety.
TECHNICAL FIELD
The present disclosure relates to the field of data processing technology, and in particular to an audio data encoding method, decoding method and device.
BACKGROUND
In a VOIP (Voice over Internet Protocol) call, in order to improve audio signal quality, the encoding end adjusts the coding mode according to real-time network conditions, for example switching between a Multiple Description Coding (MDC) mode and a Single Description Coding (SDC) mode.
Since the MDC mode and the SDC mode use different coding algorithms, parameters such as delay and sampling rate may be inconsistent between them, which may cause audio discontinuity and/or noise when audio data is decoded across a coding mode switch.
SUMMARY
In view of this, embodiments of the present disclosure provide an audio data encoding method, decoding method and device, used to improve audio signal quality when the coding mode is switched.
To achieve the above objective, the embodiments of the present disclosure provide the following technical solutions:
In a first aspect, an embodiment of the present disclosure provides an audio data encoding method, including:
determining a coding mode of a first audio frame;
determining whether the coding mode of the first audio frame is the same as the coding mode of a second audio frame, the second audio frame being the audio frame preceding the first audio frame;
if they are not the same and the coding mode of the first audio frame is multiple description coding, generating third data according to first data, second data and a first delay, wherein the first data is low-frequency data obtained by frequency division of the original audio data of the first audio frame, the second data is low-frequency data obtained by frequency division of the original audio data of the second audio frame, and the first delay is the encoding delay of the multiple description coding; and
performing multiple description coding on the third data to obtain encoded data of the first audio frame.
As an optional implementation of the embodiments of the present application, the method further includes:
if the coding mode of the first audio frame is not the same as the coding mode of the second audio frame and the coding mode of the first audio frame is single description coding, generating sixth data according to fourth data, fifth data and a second delay, wherein the fourth data is the original audio data of the first audio frame, the fifth data is the original audio data of the second audio frame, and the second delay is the encoding delay of the single description coding; and
performing single description coding on the sixth data to obtain encoded data of the first audio frame.
As an optional implementation of the embodiments of the present application, generating the third data according to the first data, the second data and the first delay includes:
intercepting, from the tail end of the second data, sample points with a length of the first delay to obtain seventh data;
splicing the seventh data to the head end of the first data to obtain eighth data; and
deleting, from the tail end of the eighth data, sample points with a length of the first delay to obtain the third data.
As an optional implementation of the embodiments of the present application, generating the sixth data according to the fourth data, the fifth data and the second delay includes:
intercepting, from the tail end of the fifth data, sample points with a length of the second delay to obtain ninth data;
splicing the ninth data to the head end of the fourth data to obtain tenth data; and
deleting, from the tail end of the tenth data, sample points with a length of the second delay to obtain the sixth data.
As an optional implementation of the embodiments of the present application, determining the coding mode of the first audio frame includes:
determining, based on a coding mode duration and a signal type of the first audio frame, whether a coding mode switching condition is met, the coding mode duration being the playback duration of the audio frames continuously encoded in the current coding mode;
if not, determining the coding mode of the second audio frame as the coding mode of the first audio frame; and
if so, determining the coding mode of the first audio frame according to network parameters of the audio coded data transmission network.
As an optional implementation of the embodiments of the present application, determining whether the coding mode switching condition is met based on the coding mode duration and the signal type of the first audio frame includes:
determining whether the coding mode duration is greater than a threshold duration;
determining whether the probability that the first audio frame is a speech audio frame is less than a threshold probability;
if the coding mode duration is greater than the threshold duration and the probability that the first audio frame is a speech audio frame is less than the threshold probability, determining that the coding mode switching condition is met; and
if the coding mode duration is less than or equal to the threshold duration and/or the probability that the first audio frame is a speech audio frame is greater than or equal to the threshold probability, determining that the coding mode switching condition is not met.
As an optional implementation of the embodiments of the present application, determining the coding mode of the first audio frame according to the network parameters of the audio coded data transmission network includes:
determining a packet loss rate of the audio coded data transmission network according to the network parameters;
determining whether the packet loss rate is greater than or equal to a threshold packet loss rate;
if so, determining that the coding mode of the first audio frame is multiple description coding; and
if not, determining that the coding mode of the first audio frame is single description coding.
In a second aspect, an embodiment of the present disclosure provides an audio data decoding method, including:
determining a coding mode of a first audio frame according to encoded data of the first audio frame;
decoding the encoded data of the first audio frame according to the coding mode to obtain decoded data;
determining whether the coding mode of the first audio frame is the same as the coding mode of a second audio frame, the second audio frame being the audio frame preceding the first audio frame;
if they are not the same and the coding mode of the first audio frame is multiple description coding, generating packet loss compensation data based on the second audio frame; and
smoothing the decoded data according to delayed data of the second audio frame and the packet loss compensation data to obtain playback data of the first audio frame.
As an optional implementation of the embodiments of the present application, the method further includes:
if the coding mode of the first audio frame is not the same as the coding mode of the second audio frame and the coding mode of the first audio frame is single description coding, generating packet loss compensation data based on the second audio frame;
smoothing the decoded data according to the packet loss compensation data to obtain a smoothing result corresponding to the decoded data; and
performing delay processing on the smoothing result according to the packet loss compensation data and a number of delayed sample points to obtain playback data of the first audio frame, the number of delayed sample points being the number of delayed sample points of multiple description coding.
As an optional implementation of the embodiments of the present application, smoothing the decoded data according to the packet loss compensation data to obtain the smoothing result corresponding to the decoded data includes:
replacing the first sample point sequence in the decoded data with the second sample point sequence in the packet loss compensation data to obtain a first replacement result, where the first sample point sequence is the sequence consisting of the first samples of the decoded data whose count is a first number, the first number being the difference between a first preset number and the number of delayed sample points, and the second sample point sequence is the sequence consisting of the samples of the packet loss compensation data whose index values run from the number of delayed sample points to the first preset number; and
performing, based on a first window function, windowed superposition on the third sample point sequence in the first replacement result and the fourth sample point sequence in the packet loss compensation data to obtain the smoothing result corresponding to the decoded data, where the third sample point sequence is the sequence consisting of the samples of the first replacement result whose index values run from the first number to the sum of the first number and a second preset number, and the fourth sample point sequence is the sequence consisting of the samples of the packet loss compensation data whose index values run from the first preset number to the sum of the first preset number and the second preset number.
As an optional implementation of the embodiments of the present application, performing delay processing on the smoothing result according to the packet loss compensation data and the number of delayed sample points to obtain the playback data of the first audio frame includes:
obtaining a fifth sample point sequence, the fifth sample point sequence being the sequence consisting of the first samples of the packet loss compensation data whose count is the number of delayed sample points;
splicing the fifth sample point sequence before the smoothing result to obtain a first splicing result; and
deleting the sixth sample point sequence in the first splicing result to obtain the playback data of the first audio frame, the sixth sample point sequence being the sequence consisting of the last samples of the first splicing result whose count is the number of delayed sample points.
As an optional implementation of the embodiments of the present application, smoothing the decoded data according to the delayed data of the second audio frame and the packet loss compensation data to obtain the playback data of the first audio frame includes:
replacing the seventh sample point sequence in the decoded data with the delayed data to obtain a second replacement result, the seventh sample point sequence being the sequence consisting of the first samples of the decoded data whose count is the number of delayed sample points; and
performing, based on a second window function, windowed superposition on the eighth sample point sequence in the second replacement result and the ninth sample point sequence in the packet loss compensation data to obtain the playback data of the first audio frame, where the eighth sample point sequence is the sequence consisting of the samples of the second replacement result whose index values run from the number of delayed sample points to the sum of the number of delayed sample points and a third preset number, and the ninth sample point sequence is the sequence consisting of the first samples of the packet loss compensation data whose count is the third preset number.
As an optional implementation of the embodiments of the present application, the method further includes:
if the coding mode of the first audio frame is the same as the coding mode of the second audio frame and the coding mode of the first audio frame is single description coding, performing delay processing on the decoded data according to the delayed data of the second audio frame and the number of delayed sample points to obtain the playback data of the first audio frame.
As an optional implementation of the embodiments of the present application, performing delay processing on the decoded data according to the delayed data of the second audio frame and the number of delayed sample points to obtain the playback data of the first audio frame includes:
splicing the delayed data before the decoded data to obtain a second splicing result; and
deleting the tenth sample point sequence in the second splicing result to obtain the playback data of the first audio frame, the tenth sample point sequence being the sequence consisting of the last samples of the second splicing result whose count is the number of delayed sample points.
In a third aspect, an embodiment of the present disclosure provides an audio data encoding device, including:
a determining unit, configured to determine a coding mode of a first audio frame;
a judging unit, configured to determine whether the coding mode of the first audio frame is the same as the coding mode of a second audio frame, the second audio frame being the audio frame preceding the first audio frame;
a generating unit, configured to generate third data according to first data, second data and a first delay when the coding mode of the first audio frame is not the same as the coding mode of the second audio frame and the coding mode of the first audio frame is multiple description coding, where the first data is low-frequency data obtained by frequency division of the original audio data of the first audio frame, the second data is low-frequency data obtained by frequency division of the original audio data of the second audio frame, and the first delay is the encoding delay of the multiple description coding; and
an encoding unit, configured to perform multiple description coding on the third data to obtain encoded data of the first audio frame.
As an optional implementation of the embodiments of the present application, the generating unit is further configured to generate sixth data according to fourth data, fifth data and a second delay when the coding mode of the first audio frame is not the same as the coding mode of the second audio frame and the coding mode of the first audio frame is single description coding, where the fourth data is the original audio data of the first audio frame, the fifth data is the original audio data of the second audio frame, and the second delay is the encoding delay of the single description coding;
the encoding unit is further configured to perform single description coding on the sixth data to obtain encoded data of the first audio frame.
As an optional implementation of the embodiments of the present application, the generating unit is specifically configured to: intercept, from the tail end of the second data, sample points with a length of the first delay to obtain seventh data; splice the seventh data to the head end of the first data to obtain eighth data; and delete, from the tail end of the eighth data, sample points with a length of the first delay to obtain the third data.
As an optional implementation of the embodiments of the present application, the generating unit is specifically configured to: intercept, from the tail end of the fifth data, sample points with a length of the second delay to obtain ninth data; splice the ninth data to the head end of the fourth data to obtain tenth data; and delete, from the tail end of the tenth data, sample points with a length of the second delay to obtain the sixth data.
As an optional implementation of the embodiments of the present application, the determining unit is specifically configured to: determine, based on a coding mode duration and a signal type of the first audio frame, whether a coding mode switching condition is met, the coding mode duration being the playback duration of the audio frames continuously encoded in the current coding mode; if not, determine the coding mode of the second audio frame as the coding mode of the first audio frame; and if so, determine the coding mode of the first audio frame according to network parameters of the audio coded data transmission network.
As an optional implementation of the embodiments of the present application, the determining unit is specifically configured to: determine whether the coding mode duration is greater than a threshold duration; determine whether the probability that the first audio frame is a speech audio frame is less than a threshold probability; if the coding mode duration is greater than the threshold duration and the probability that the first audio frame is a speech audio frame is less than the threshold probability, determine that the coding mode switching condition is met; and if the coding mode duration is less than or equal to the threshold duration and/or the probability that the first audio frame is a speech audio frame is greater than or equal to the threshold probability, determine that the coding mode switching condition is not met.
As an optional implementation of the embodiments of the present application, the determining unit is specifically configured to: determine a packet loss rate of the audio coded data transmission network according to the network parameters; determine whether the packet loss rate is greater than or equal to a threshold packet loss rate; if so, determine that the coding mode of the first audio frame is multiple description coding; and if not, determine that the coding mode of the first audio frame is single description coding.
In a fourth aspect, an embodiment of the present disclosure provides an audio data decoding device, including:
a determining unit, configured to determine a coding mode of a first audio frame according to encoded data of the first audio frame;
a decoding unit, configured to decode the encoded data of the first audio frame according to the coding mode to obtain decoded data;
a judging unit, configured to determine whether the coding mode of the first audio frame is the same as the coding mode of a second audio frame, the second audio frame being the audio frame preceding the first audio frame; and
a processing unit, configured to generate packet loss compensation data based on the second audio frame when the coding mode of the first audio frame is not the same as the coding mode of the second audio frame and the coding mode of the first audio frame is multiple description coding, and to smooth the decoded data according to delayed data of the second audio frame and the packet loss compensation data to obtain playback data of the first audio frame.
As an optional implementation of the embodiments of the present disclosure, the processing unit is further configured to: generate packet loss compensation data based on the second audio frame when the coding mode of the first audio frame is not the same as the coding mode of the second audio frame and the coding mode of the first audio frame is single description coding; smooth the decoded data according to the packet loss compensation data to obtain a smoothing result corresponding to the decoded data; and perform delay processing on the smoothing result according to the packet loss compensation data and a number of delayed sample points to obtain playback data of the first audio frame, the number of delayed sample points being the number of delayed sample points of multiple description coding.
As an optional implementation of the embodiments of the present application, the processing unit is specifically configured to: replace the first sample point sequence in the decoded data with the second sample point sequence in the packet loss compensation data to obtain a first replacement result, where the first sample point sequence is the sequence consisting of the first samples of the decoded data whose count is a first number, the first number being the difference between a first preset number and the number of delayed sample points, and the second sample point sequence is the sequence consisting of the samples of the packet loss compensation data whose index values run from the number of delayed sample points to the first preset number; and perform, based on a first window function, windowed superposition on the third sample point sequence in the first replacement result and the fourth sample point sequence in the packet loss compensation data to obtain the smoothing result corresponding to the decoded data, where the third sample point sequence is the sequence consisting of the samples of the first replacement result whose index values run from the first number to the sum of the first number and a second preset number, and the fourth sample point sequence is the sequence consisting of the samples of the packet loss compensation data whose index values run from the first preset number to the sum of the first preset number and the second preset number.
As an optional implementation of the embodiments of the present application, the processing unit is specifically configured to: obtain a fifth sample point sequence, the fifth sample point sequence being the sequence consisting of the first samples of the packet loss compensation data whose count is the number of delayed sample points; splice the fifth sample point sequence before the smoothing result to obtain a first splicing result; and delete the sixth sample point sequence in the first splicing result to obtain the playback data of the first audio frame, the sixth sample point sequence being the sequence consisting of the last samples of the first splicing result whose count is the number of delayed sample points.
As an optional implementation of the embodiments of the present application, the processing unit is specifically configured to: replace the seventh sample point sequence in the decoded data with the delayed data to obtain a second replacement result, the seventh sample point sequence being the sequence consisting of the first samples of the decoded data whose count is the number of delayed sample points; and perform, based on a second window function, windowed superposition on the eighth sample point sequence in the second replacement result and the ninth sample point sequence in the packet loss compensation data to obtain the playback data of the first audio frame, where the eighth sample point sequence is the sequence consisting of the samples of the second replacement result whose index values run from the number of delayed sample points to the sum of the number of delayed sample points and a third preset number, and the ninth sample point sequence is the sequence consisting of the first samples of the packet loss compensation data whose count is the third preset number.
As an optional implementation of the embodiments of the present application, the processing unit is further configured to perform, when the coding mode of the first audio frame is the same as the coding mode of the second audio frame and the coding mode of the first audio frame is single description coding, delay processing on the decoded data according to the delayed data of the second audio frame and the number of delayed sample points to obtain the playback data of the first audio frame.
As an optional implementation of the embodiments of the present application, the processing unit is specifically configured to: splice the delayed data before the decoded data to obtain a second splicing result; and delete the tenth sample point sequence in the second splicing result to obtain the playback data of the first audio frame, the tenth sample point sequence being the sequence consisting of the last samples of the second splicing result whose count is the number of delayed sample points.
In a fifth aspect, an embodiment of the present disclosure provides an electronic device, including a memory and a processor, where the memory is configured to store a computer program, and the processor is configured to, when executing the computer program, cause the electronic device to implement the audio data encoding method or the audio data decoding method of any one of the above implementations.
In a sixth aspect, an embodiment of the present disclosure provides a computer-readable storage medium; when the computer program stored thereon is executed by a computing device, the computing device implements the audio data encoding method or the audio data decoding method of any one of the above implementations.
In a seventh aspect, an embodiment of the present disclosure provides a computer program product; when the computer program product runs on a computer, the computer implements the audio data encoding method or the audio data decoding method of any one of the above implementations.
The audio data encoding and decoding methods provided by the embodiments of the present disclosure generate the target data through the following steps: determining the coding mode of the first audio frame; determining whether the coding mode of the first audio frame is the same as the coding mode of the second audio frame; and, if they are not the same and the coding mode of the first audio frame is multiple description coding, generating the target data according to the first data, the second data and the first delay. When the coding mode of the first audio frame differs from that of the second audio frame and the coding mode of the first audio frame is multiple description coding, the encoding method provided by the embodiments of the present disclosure processes the low-frequency data obtained by frequency division of the original audio data of the first audio frame according to the low-frequency data obtained by frequency division of the original audio data of the second audio frame and the encoding delay of the multiple description coding, and then encodes the resulting third data. Therefore, when the coding mode switches from single description coding to multiple description coding, the embodiments of the present application can avoid audio discontinuity and noise, thereby improving the quality of the audio signal.
BRIEF DESCRIPTION OF THE DRAWINGS
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the present disclosure.
In order to explain the technical solutions in the embodiments of the present disclosure or in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, those of ordinary skill in the art can obtain other drawings from these drawings without creative effort.
FIG. 1 is a first flowchart of the steps of an audio data encoding method provided by an embodiment of the present disclosure;
FIG. 2 is a first schematic diagram of an audio data encoding method provided by an embodiment of the present disclosure;
FIG. 3 is a second schematic diagram of an audio data encoding method provided by an embodiment of the present disclosure;
FIG. 4 is a second flowchart of the steps of an audio data encoding method provided by an embodiment of the present disclosure;
FIG. 5 is a third flowchart of the steps of an audio data encoding method provided by an embodiment of the present disclosure;
FIG. 6 is a first flowchart of the steps of an audio data decoding method provided by an embodiment of the present disclosure;
FIG. 7 is a second flowchart of the steps of an audio data decoding method provided by an embodiment of the present disclosure;
FIG. 8 is a first schematic diagram of an audio data decoding method provided by an embodiment of the present disclosure;
FIG. 9 is a second schematic diagram of an audio data decoding method provided by an embodiment of the present disclosure;
FIG. 10 is a third schematic diagram of an audio data decoding method provided by an embodiment of the present disclosure;
FIG. 11 is a fourth schematic diagram of an audio data decoding method provided by an embodiment of the present disclosure;
FIG. 12 is a fifth schematic diagram of an audio data decoding method provided by an embodiment of the present disclosure;
FIG. 13 is a third flowchart of the steps of an audio data decoding method provided by an embodiment of the present disclosure;
FIG. 14 is a sixth schematic diagram of an audio data decoding method provided by an embodiment of the present disclosure;
FIG. 15 is a schematic structural diagram of an audio data encoding device provided by an embodiment of the present disclosure;
FIG. 16 is a schematic structural diagram of an audio data decoding device provided by an embodiment of the present disclosure;
FIG. 17 is a schematic diagram of the hardware structure of an electronic device provided by an embodiment of the present disclosure.
DETAILED DESCRIPTION
In order that the above objectives, features and advantages of the present disclosure may be more clearly understood, the solutions of the present disclosure are further described below. It should be noted that, in the case of no conflict, the embodiments of the present disclosure and the features in the embodiments may be combined with each other.
Many specific details are set forth in the following description to facilitate a full understanding of the present disclosure, but the present disclosure may also be implemented in ways different from those described here; obviously, the embodiments in the specification are only a part, not all, of the embodiments of the present disclosure.
In the embodiments of the present disclosure, words such as "exemplary" or "for example" are used to indicate an example, illustration or explanation. Any embodiment or design described as "exemplary" or "for example" in the embodiments of the present disclosure should not be construed as being preferred over, or more advantageous than, other embodiments or designs. Rather, the use of words such as "exemplary" or "for example" is intended to present the relevant concepts in a concrete way. In addition, in the description of the embodiments of the present disclosure, unless otherwise specified, "a plurality of" means two or more.
An embodiment of the present disclosure provides an audio data encoding method. Referring to FIG. 1, the method includes the following steps:
S101: Determine a coding mode of a first audio frame.
In the embodiments of the present disclosure, the coding modes of an audio frame are single description coding (SDC) and multiple description coding (MDC).
S102: Determine whether the coding mode of the first audio frame is the same as the coding mode of a second audio frame.
The second audio frame is the audio frame preceding the first audio frame.
In the above step S102, if the coding mode of the first audio frame is different from the coding mode of the second audio frame and the coding mode of the first audio frame is multiple description coding, the following steps S103 and S104 are performed:
S103: Generate third data according to first data, second data and a first delay.
The first data is the low-frequency data obtained by frequency division of the original audio data of the first audio frame, the second data is the low-frequency data obtained by frequency division of the original audio data of the second audio frame, and the first delay is the encoding delay of the multiple description coding.
In some embodiments, if the coding mode of the current audio frame is multiple description coding, the original data of the current audio frame is written into a delay buffer (delay_buffer); if the coding mode of the current audio frame is single description coding, the low-frequency data obtained by frequency division of the original data of the current audio frame is written into the specified buffer, so that when the second data is needed, the low-frequency data obtained by frequency division of the original audio data of the previous audio frame can be read directly from the delay buffer.
S104: Perform multiple description coding on the third data to obtain encoded data of the first audio frame.
In the above step S102, if the coding mode of the first audio frame is different from the coding mode of the second audio frame and the coding mode of the first audio frame is single description coding, the following steps S105 and S106 are performed:
S105: Generate sixth data according to fourth data, fifth data and a second delay.
The fourth data is the original audio data of the first audio frame, and the fifth data is the original audio data of the second audio frame. The second delay is the encoding delay of the single description coding.
S106: Perform single description coding on the sixth data to obtain encoded data of the first audio frame.
The audio data encoding and decoding methods provided by the embodiments of the present disclosure generate the target data through the following steps: determining the coding mode of the first audio frame; determining whether the coding mode of the first audio frame is the same as the coding mode of the second audio frame; and, if they are not the same and the coding mode of the first audio frame is multiple description coding, generating the target data according to the first data, the second data and the first delay. When the coding modes differ and the coding mode of the first audio frame is multiple description coding, the low-frequency data of the first audio frame is processed according to the low-frequency data of the second audio frame and the encoding delay of the multiple description coding, and the resulting third data is then encoded. Therefore, when the coding mode switches from single description coding to multiple description coding, the embodiments of the present application can avoid audio discontinuity and noise, thereby improving the quality of the audio signal.
As a refinement and extension of the above embodiments, an embodiment of the present disclosure provides an audio data encoding method. Referring to FIG. 2, the method includes the following steps:
S201: Determine a coding mode of a first audio frame.
That is, determine the coding mode of the current audio frame.
S202: Determine whether the coding mode of the first audio frame is the same as the coding mode of a second audio frame.
The second audio frame is the audio frame preceding the first audio frame.
That is, determine whether the coding mode of the current audio frame is the same as that of the previous audio frame.
In the above S202, if the coding mode of the first audio frame is not the same as the coding mode of the second audio frame and the coding mode of the first audio frame is multiple description coding, the following S203 to S206 are performed:
S203: Intercept, from the tail end of the second data, samples with a length of the first delay to obtain seventh data.
S204: Splice the seventh data to the head end of the first data to obtain eighth data.
S205: Delete, from the tail end of the eighth data, samples with a length of the first delay to obtain the third data.
S206: Perform multiple description coding on the third data to obtain encoded data of the first audio frame.
When the coding mode of the first audio frame is not the same as the coding mode of the second audio frame and the coding mode of the first audio frame is multiple description coding, i.e. when the current audio frame is coded with multiple description coding and the previous audio frame was coded with single description coding, refer to FIG. 3. In FIG. 3, the first delay length is delay_8kHZ, the delay buffer (delay_buffer) holds the low-frequency data obtained by frequency division of the original data of the second audio frame (second data 31), and the encoder input for multiple description coding is the low-frequency data obtained by frequency division of the first audio frame (first data 32). The data processing of steps S203 to S205 is as follows: first, samples with a length of delay_8kHZ are intercepted from the tail end of the second data 31 to obtain the seventh data 311; next, the seventh data 311 is spliced to the head end of the first data 32 to obtain the eighth data 33; finally, samples with a length of delay_8kHZ are deleted from the tail end of the eighth data 33 to obtain the third data 34. As shown in FIG. 3, the third data 34 consists of two parts: the seventh data 311, and the data remaining after samples with a length of delay_8kHZ are deleted from the tail end of the first data 32.
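The splice-and-trim operation of S203 to S205 (and its single-description counterpart in S207 to S209 below) can be sketched as follows. This is a minimal illustration assuming NumPy arrays of samples; the function name and the example lengths are hypothetical, and `delay` stands for delay_8kHZ or delay_16kHZ depending on the target mode.

```python
import numpy as np

def build_encoder_input(prev_data, cur_data, delay):
    # Take `delay` samples from the tail of the previous frame's data (seventh/ninth data),
    # prepend them to the current frame's data (eighth/tenth data), then drop `delay`
    # samples from the tail so the frame length is unchanged (third/sixth data).
    tail = prev_data[-delay:]
    spliced = np.concatenate([tail, cur_data])
    return spliced[:-delay]

# Hypothetical usage: switching to MDC with a 32-sample encoding delay.
prev_low = np.zeros(160)
cur_low = np.ones(160)
third_data = build_encoder_input(prev_low, cur_low, 32)
assert len(third_data) == len(cur_low)
```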
In the above S202, if the coding mode of the first audio frame is not the same as the coding mode of the second audio frame and the coding mode of the first audio frame is single description coding, the following S207 to S210 are performed:
S207: Intercept, from the tail end of the fifth data, samples with a length of the second delay to obtain ninth data.
S208: Splice the ninth data to the head end of the fourth data to obtain tenth data.
S209: Delete, from the tail end of the tenth data, samples with a length of the second delay to obtain the sixth data.
S210: Perform single description coding on the sixth data to obtain encoded data of the first audio frame.
When the coding mode of the first audio frame is not the same as the coding mode of the second audio frame and the coding mode of the first audio frame is single description coding, i.e. when the current audio frame is coded with single description coding and the previous audio frame was coded with multiple description coding, refer to FIG. 4. In FIG. 4, the second delay length is delay_16kHZ, the delay buffer (delay_buffer) holds the original audio data of the second audio frame (fifth data 41), and the encoder input for single description coding is the original audio data of the first audio frame (fourth data 42). The data processing of steps S207 to S209 is as follows: first, samples with a length of delay_16kHZ are intercepted from the tail end of the fifth data 41 to obtain the ninth data 411; next, the ninth data 411 is spliced to the head end of the fourth data 42 to obtain the tenth data 43; finally, samples with a length of delay_16kHZ are deleted from the tail end of the tenth data 43 to obtain the sixth data 44. As shown in FIG. 4, the sixth data 44 consists of two parts: the ninth data 411, and the data remaining after samples with a length of delay_16kHZ are deleted from the tail end of the fourth data 42.
As a refinement and extension of the above embodiments, an embodiment of the present disclosure provides an audio data processing method. Referring to FIG. 5, the method includes:
S501: Determine, based on a coding mode duration and a signal type of the first audio frame, whether a coding mode switching condition is met.
The coding mode duration is the playback duration of the audio frames continuously encoded in the current coding mode.
In some embodiments, determining whether the coding mode switching condition is met based on the coding mode duration and the signal type of the first audio frame may include the following steps a to e:
Step a: Determine whether the coding mode duration is greater than a threshold duration.
The embodiments of the present application do not limit the threshold duration; exemplarily, the threshold duration may be 2 s.
In the above step a, if the coding mode duration is less than or equal to the threshold duration, the following step b is performed:
Step b: Determine that the coding mode switching condition is not met.
In the above step a, if the coding mode duration is greater than the threshold duration, the following steps c to e are performed:
Step c: Determine whether the probability that the first audio frame is a speech audio frame is less than a threshold probability.
In the above step c, if the probability that the first audio frame is a speech audio frame is less than the threshold probability, the following step d is performed:
Step d: Determine that the coding mode switching condition is met.
In the above step c, if the probability that the first audio frame is a speech audio frame is greater than or equal to the threshold probability, the following step e is performed:
Step e: Determine that the coding mode switching condition is not met.
That is, if the coding mode duration is less than or equal to the threshold duration and/or the probability that the first audio frame is a speech audio frame is greater than or equal to the threshold probability, it is determined that the coding mode switching condition is not met.
In the above step S501, if the coding mode switching condition is not met, the following step S502 is performed:
S502: Determine the coding mode of the second audio frame as the coding mode of the first audio frame.
That is, encoding continues in the coding mode of the previous audio frame.
In the above step S501, if the coding mode switching condition is met, the following step S503 is performed:
S503: Determine the coding mode of the first audio frame according to network parameters of the audio coded data transmission network.
In some embodiments, the above step S503 (determining the coding mode of the first audio frame according to network parameters of the audio coded data transmission network) may be implemented through the following steps 1 to 4:
Step 1: Determine a packet loss rate of the audio coded data transmission network according to the network parameters.
The packet loss rate in the embodiments of the present application refers to the ratio of the number of data packets lost during transmission to the total number of data packets sent.
Step 2: Determine whether the packet loss rate is greater than or equal to a threshold packet loss rate.
The embodiments of the present application do not limit the threshold packet loss rate; exemplarily, the threshold packet loss rate may be 5%.
In the above step 2, if the packet loss rate is greater than or equal to the threshold packet loss rate, the following step 3 is performed; if the packet loss rate is less than the threshold packet loss rate, the following step 4 is performed:
Step 3: Determine that the coding mode of the first audio frame is multiple description coding.
Step 4: Determine that the coding mode of the first audio frame is single description coding.
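Steps S501 to S503 amount to a small decision rule. The sketch below assumes the exemplary 2 s duration threshold and 5% packet loss threshold mentioned in the text; the 0.5 speech-probability threshold is purely a hypothetical placeholder, since the text does not give a value.

```python
def decide_coding_mode(prev_mode, mode_duration_ms, speech_prob, loss_rate,
                       threshold_ms=2000, threshold_speech=0.5, threshold_loss=0.05):
    # S501: switching requires a long-enough duration AND a low probability
    # that the current frame is speech.
    if mode_duration_ms <= threshold_ms or speech_prob >= threshold_speech:
        return prev_mode                      # S502: keep the previous mode
    # S503: choose by the packet loss rate of the transmission network.
    return "MDC" if loss_rate >= threshold_loss else "SDC"
```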
S504: Determine whether the coding mode of the first audio frame is the same as the coding mode of the second audio frame.
In the above S504, if the coding mode of the first audio frame is not the same as the coding mode of the second audio frame and the coding mode of the first audio frame is multiple description coding, the following S505 to S508 are performed:
S505: Intercept, from the tail end of the second data, samples with a length of the first delay to obtain seventh data.
The second data is the low-frequency data obtained by frequency division of the original audio data of the second audio frame, and the first delay is the encoding delay of the multiple description coding.
S506: Splice the seventh data to the head end of the first data to obtain eighth data.
The first data is the low-frequency data obtained by frequency division of the original audio data of the first audio frame.
S507: Delete, from the tail end of the eighth data, samples with a length of the first delay to obtain the third data.
S508: Perform multiple description coding on the third data to obtain the encoded data of the first audio frame.
In the above S504, if the coding mode of the first audio frame is not the same as the coding mode of the second audio frame and the coding mode of the first audio frame is single description coding, the following S509 to S512 are performed:
S509: Intercept, from the tail end of the fifth data, samples with a length of the second delay to obtain ninth data.
S510: Splice the ninth data to the head end of the fourth data to obtain tenth data.
S511: Delete, from the tail end of the tenth data, samples with a length of the second delay to obtain the sixth data.
S512: Perform single description coding on the sixth data to obtain the encoded data of the first audio frame.
In the above S504, if the coding mode of the first audio frame is the same as the coding mode of the second audio frame and the coding mode of the first audio frame is multiple description coding, multiple description coding is performed on the low-frequency data obtained by frequency division of the original audio data of the first audio frame to obtain the encoded data of the first audio frame. If the coding mode of the first audio frame is the same as the coding mode of the second audio frame and the coding mode of the first audio frame is single description coding, single description coding is performed on the original audio data of the first audio frame to obtain the encoded data of the first audio frame.
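Putting S504 to S512 together, the encoder-side flow can be sketched as below. The callables are hypothetical stand-ins: `split_low` for the frequency-division step, `build_encoder_input` for the splice-and-trim operation sketched earlier, and `mdc_encode`/`sdc_encode` for the two encoders.

```python
def encode_frame(frame, prev_frame, mode, prev_mode, d_mdc, d_sdc,
                 split_low, build_encoder_input, mdc_encode, sdc_encode):
    if mode == "MDC":
        low = split_low(frame)                                  # low-frequency data
        if mode != prev_mode:                                   # SDC -> MDC switch
            low = build_encoder_input(split_low(prev_frame), low, d_mdc)
        return mdc_encode(low)                                  # S505-S508
    data = frame                                                # original audio data
    if mode != prev_mode:                                       # MDC -> SDC switch
        data = build_encoder_input(prev_frame, data, d_sdc)
    return sdc_encode(data)                                     # S509-S512
```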
An embodiment of the present disclosure provides an audio data decoding method. Referring to FIG. 6, the method includes:
S601: Determine a coding mode of a first audio frame according to encoded data of the first audio frame.
S602: Decode the encoded data of the first audio frame according to the coding mode to obtain decoded data.
S603: Determine whether the coding mode of the first audio frame is the same as the coding mode of a second audio frame.
The second audio frame is the audio frame preceding the first audio frame.
In the above S603, if the coding mode of the first audio frame is not the same as the coding mode of the second audio frame and the coding mode of the first audio frame is single description coding, the following S604 to S606 are performed:
S604: Generate packet loss compensation data based on the second audio frame.
The packet loss compensation data is data obtained through a packet loss concealment (PLC) mechanism, which the media engine uses to cope with network packet loss. When the media engine receives a stream of media packets, it cannot guarantee that every packet will be received. If a packet is lost and forward error correction (FEC) is not being used, the packet loss concealment mechanism takes effect. Packet loss concealment is not fully standardized; media engines and codecs may implement and extend it according to their own circumstances.
In the embodiments of the present application, the packet loss compensation data may be 10 ms of data.
S605: Smooth the decoded data according to the packet loss compensation data to obtain a smoothing result corresponding to the decoded data.
S606: Perform delay processing on the smoothing result according to the packet loss compensation data and the number of delayed sample points to obtain playback data of the first audio frame.
The number of delayed sample points is the number of delayed sample points of multiple description coding.
In the embodiments of the present disclosure, since the MDC algorithm itself has an inherent delay of qmf_order-1 sample points, the decoded output delay can be set to 0 when the coding mode of the first audio frame is MDC; when the coding mode of the first audio frame is SDC, the decoded output delay needs to be set to qmf_order-1 to align with the delay of the MDC algorithm. The delays of the two algorithms can thus be aligned.
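The alignment formula itself is not reproduced in the source text; the minimal sketch below merely restates the rule just described (no added output delay for MDC, qmf_order-1 sample points for SDC) and should be read as an assumption-labeled illustration rather than the codec's actual code.

```python
def output_delay_samples(mode, qmf_order):
    # MDC already carries an inherent delay of qmf_order - 1 sample points,
    # so only the SDC output needs an explicit delay to stay aligned.
    return 0 if mode == "MDC" else qmf_order - 1
```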
In the above S603, if the coding mode of the first audio frame is not the same as the coding mode of the second audio frame and the coding mode of the first audio frame is multiple description coding, the following S607 and S608 are performed:
S607: Generate packet loss compensation data based on the second audio frame.
S608: Smooth the decoded data according to the delayed data of the second audio frame and the packet loss compensation data to obtain playback data of the first audio frame.
In the above embodiment, when decoding a data packet of the first audio frame, the coding mode of the first audio frame is first determined according to its encoded data, the encoded data is decoded according to that coding mode to obtain decoded data, and it is then determined whether the coding mode of the first audio frame is the same as that of the second audio frame. If the coding modes differ and the coding mode of the first audio frame is single description coding, packet loss compensation data is generated based on the second audio frame, the decoded data is smoothed according to the packet loss compensation data to obtain a smoothing result, and the smoothing result is delay-processed according to the packet loss compensation data and the number of delayed sample points to obtain the playback data of the first audio frame. If the coding modes differ and the coding mode of the first audio frame is multiple description coding, packet loss compensation data is generated based on the second audio frame, and the decoded data is smoothed according to the delayed data of the second audio frame and the packet loss compensation data to obtain the playback data of the first audio frame. Because the decoding method provided by the embodiments of the present disclosure processes the encoded data according to the coding mode type of the current audio frame whenever the coding mode of the first audio frame differs from that of the second audio frame, it can avoid audio discontinuity and noise, thereby improving the quality of the audio signal.
As a refinement and extension of the above embodiments, an embodiment of the present disclosure provides an audio data decoding method. Referring to FIG. 7, the method includes the following steps:
S701: Determine a coding mode of a first audio frame according to encoded data of the first audio frame.
S702: Decode the encoded data of the first audio frame according to the coding mode to obtain decoded data.
S703: Determine whether the coding mode of the first audio frame is the same as the coding mode of a second audio frame.
The second audio frame is the audio frame preceding the first audio frame.
In the above S703, if the coding mode of the first audio frame is not the same as the coding mode of the second audio frame and the coding mode of the first audio frame is single description coding, the following S704 to S708 are performed:
S704: Replace the first sample point sequence in the decoded data with the second sample point sequence in the packet loss compensation data to obtain a first replacement result.
The first sample point sequence is the sequence consisting of the first samples of the decoded data whose count is the first number, the first number being the difference between the first preset number and the number of delayed sample points; the second sample point sequence is the sequence consisting of the samples of the packet loss compensation data whose index values run from the number of delayed sample points to the first preset number.
In some embodiments, if the coding mode of the current audio frame is multiple description coding, the original data of the current audio frame is written into a transition buffer (transition_buffer); if the coding mode of the current audio frame is single description coding, the low-frequency data obtained by frequency division of the original data of the current audio frame is written into the specified buffer, so that when the second data is needed, the low-frequency data obtained by frequency division of the original audio data of the previous audio frame can be read directly from the delay buffer. The decoded data is stored in a pulse code modulation buffer (pcm_buffer); the first replacement result is written to the storage location of the original decoded data in the PCM buffer, and the second replacement result obtained in S709 below is likewise written to the storage location of the original decoded data in the PCM buffer.
In this embodiment, the decoded data is stored in the PCM buffer; the first sample point sequence of the decoded data is the first F5-Fd samples of the PCM buffer, and the second sample point sequence of the packet loss compensation data is the sequence of samples whose index values run from Fd to F5. The first replacement result can be obtained by the following formula:
pcm_buffer(i - Fd) = transition_buffer(i)
i = Fd, ..., F5 - 1
In this embodiment, referring to FIG. 8, the number of delayed sample points is Fd and the first preset number is F5. In FIG. 8, the transition buffer stores the packet loss compensation data generated based on the second audio frame (packet loss compensation data 81), and the sequence of samples in the transition buffer whose index values run from the number of delayed sample points to the first preset number is the second sample point sequence 811. The PCM buffer stores the decoded data obtained by decoding the encoded data of the first audio frame according to its coding mode (decoded data 82), and the sequence consisting of the first first-number samples in the PCM buffer is the first sample point sequence 821. Step S704 above is then: replace the first sample point sequence 821 in the decoded data 82 with the second sample point sequence 811 in the packet loss compensation data 81 to obtain the first replacement result 83.
S705: Perform windowed superposition, based on a first window function, on the third sample point sequence in the first replacement result and the fourth sample point sequence in the packet loss compensation data to obtain a smoothing result corresponding to the decoded data.
The third sample point sequence is the sequence consisting of the samples of the first replacement result whose index values run from the first number to the sum of the first number and the second preset number; the fourth sample point sequence is the sequence consisting of the samples of the packet loss compensation data whose index values run from the first preset number to the sum of the first preset number and the second preset number.
Window function: the Fourier transform can only transform time-domain data of finite length, so the time-domain signal must be truncated. Even for a periodic signal, if the truncation length is not an integer multiple of the period (periodic truncation), the truncated signal will exhibit leakage. To reduce this leakage error to a minimum, a weighting function, also called a window function, is used. Windowing mainly makes the time-domain signal better satisfy the periodicity requirement of Fourier processing and reduces leakage. In this embodiment, smoothing is performed according to the switch type, and windowed smoothing is used to achieve a smooth transition.
In this embodiment, the third sample point sequence is the sequence of samples in the PCM buffer whose index values run from F5-Fd to F5-Fd+F2.5, and the fourth sample point sequence is the sequence of samples in the transition buffer whose index values run from F5 to F5+F2.5. The smoothing result can be obtained by the following formula:
pcm_buffer(i + F5 - Fd) = w(i) * pcm_buffer(i + F5 - Fd) + (1 - w(i)) * transition_buffer(i + F5)
i = 0, 1, ..., F2.5 - 1
where w(i) is the window function. The smoothing method performs windowed superposition of the corresponding segment with the samples of the transition buffer indexed from F5 to F5+F2.5 to achieve a smooth transition.
On the basis of the embodiment shown in FIG. 8, referring to FIG. 9, the second preset number is F2.5. Building on S704 above, the sequence of samples in the transition buffer whose index values run from the first preset number to the sum of the first preset number and the second preset number is the fourth sample point sequence 812, and the sequence of samples in the first replacement result 83 in the PCM buffer whose index values run from the first number to the sum of the first number and the second preset number is the third sample point sequence 831. Windowed superposition of the third sample point sequence 831 in the first replacement result 83 and the fourth sample point sequence 812 in the packet loss compensation data 81 yields the smoothing result 91 corresponding to the decoded data.
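In terms of the buffers above, S704 and S705 can be sketched as a head replacement followed by a short crossfade. The sketch assumes NumPy arrays and a linear ramp for w(i); the text only names the window function, so the actual window shape may differ.

```python
import numpy as np

def smooth_switch_to_sdc(pcm, plc, fd, f5, f25):
    """fd: number of delayed sample points; f5/f25: first/second preset numbers."""
    out = pcm.copy()
    # S704: replace the first F5 - Fd samples with transition_buffer[Fd:F5].
    out[:f5 - fd] = plc[fd:f5]
    # S705: windowed superposition over the next F2.5 samples.
    w = np.linspace(0.0, 1.0, f25)            # hypothetical w(i)
    seg = slice(f5 - fd, f5 - fd + f25)
    out[seg] = w * out[seg] + (1.0 - w) * plc[f5:f5 + f25]
    return out
```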
S706: Obtain a fifth sample point sequence.
The fifth sample point sequence is the sequence consisting of the first samples of the packet loss compensation data whose count is the number of delayed sample points.
In this embodiment, the fifth sample point sequence is the first Fd samples of the transition buffer. It can be obtained by the following formula:
delay_buffer(i) = transition_buffer(i)
i = 0, 1, ..., Fd - 1
S707: Splice the fifth sample point sequence before the smoothing result to obtain a first splicing result.
S708: Delete the sixth sample point sequence in the first splicing result to obtain the playback data of the first audio frame.
The sixth sample point sequence is the sequence consisting of the last samples of the first splicing result whose count is the number of delayed sample points.
On the basis of the embodiment shown in FIG. 9, referring to FIG. 10, the sequence consisting of the first delayed-sample-count samples of the packet loss compensation data is the fifth sample point sequence 101. First, the fifth sample point sequence 101 is spliced before the smoothing result 91 to obtain the first splicing result 102; the sequence consisting of the last delayed-sample-count samples of the first splicing result 102 is the sixth sample point sequence 103. The sixth sample point sequence 103 is then deleted from the first splicing result 102 to obtain the playback data 104 of the first audio frame, which consists of the fifth sample point sequence 101, the second sample point sequence 811, and the remainder of the first splicing result 102 after its tail is removed.
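The delay processing of S706 to S708 is a prepend-and-trim on the smoothing result; a minimal sketch under the same NumPy assumption:

```python
import numpy as np

def delay_smoothing_result(smoothed, plc, fd):
    # S706/S707: splice the first fd PLC samples before the smoothing result;
    # S708: drop the last fd samples so the playback frame keeps its length.
    first_splice = np.concatenate([plc[:fd], smoothed])
    return first_splice[:-fd]
```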
In the above S703, if the coding mode of the first audio frame is not the same as the coding mode of the second audio frame and the coding mode of the first audio frame is multiple description coding, the following S709 and S710 are performed:
S709: Replace the seventh sample point sequence in the decoded data with the delayed data to obtain a second replacement result.
The seventh sample point sequence is the sequence consisting of the first samples of the decoded data whose count is the number of delayed sample points. The second replacement result can be obtained by the following formula:
pcm_buffer(i) = delay_buffer(i)
i = 0, 1, ..., qmf_order - 2
In this embodiment, referring to FIG. 11, the PCM buffer holds the decoded data obtained by decoding the encoded data of the first audio frame according to its coding mode (decoded data 112). The first qmf_order-1 samples of the PCM buffer form the seventh sample point sequence 1121, and the first qmf_order-1 samples of the delay buffer form the delayed data 111. Replacing the seventh sample point sequence 1121 in the decoded data 112 with the delayed data 111 yields the second replacement result 113, which consists of the delayed data 111 followed by the remaining tail portion of the decoded data 112.
S710: Perform windowed superposition, based on a second window function, on the eighth sample point sequence in the second replacement result and the ninth sample point sequence in the packet loss compensation data.
The eighth sample point sequence is the sequence consisting of the samples of the second replacement result whose index values run from the number of delayed sample points to the sum of the number of delayed sample points and a third preset number; the ninth sample point sequence is the sequence consisting of the first samples of the packet loss compensation data whose count is the third preset number. This windowed superposition can be performed by the following formula:
pcm_buffer(i + qmf_order - 1) = w(i) * pcm_buffer(i + qmf_order - 1) + (1 - w(i)) * transition_buffer(i)
i = 0, 1, ..., F2.5 - 1
In this embodiment, on the basis of the embodiment shown in FIG. 11 and referring to FIG. 12, the sequence consisting of the first third-preset-number samples of the transition buffer is the ninth sample point sequence 1211, and the sequence of samples in the second replacement result 113 in the PCM buffer whose index values run from the number of delayed sample points to the sum of the number of delayed sample points and the third preset number is the eighth sample point sequence 1031. Windowed superposition of the ninth sample point sequence 1211 in the packet loss compensation data 121 with the eighth sample point sequence 1031 yields the result 122, which consists of the delayed data 111, the smoothed segment 1221 obtained by windowing the ninth sample point sequence 1211 and the eighth sample point sequence 1031, and the remaining tail portion of the second replacement result 113.
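S709 and S710 mirror the single-description case: the head of the decoded data is replaced with the delayed data of the previous frame, followed by a short windowed crossfade into the packet loss compensation data. As before, the linear w(i) is a hypothetical stand-in for the unspecified second window function.

```python
import numpy as np

def smooth_switch_to_mdc(pcm, plc, delay_data, fd, f25):
    """fd = qmf_order - 1 delayed sample points; f25: third preset number."""
    out = pcm.copy()
    out[:fd] = delay_data[:fd]                # S709: head replaced by delayed data
    w = np.linspace(0.0, 1.0, f25)            # hypothetical w(i)
    seg = slice(fd, fd + f25)
    out[seg] = w * out[seg] + (1.0 - w) * plc[:f25]   # S710
    return out
```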
As a refinement and extension of the above embodiments, an embodiment of the present disclosure provides an audio data processing method. Referring to FIG. 13, the method includes the following steps:
S1301: Determine a coding mode of a first audio frame according to encoded data of the first audio frame.
S1302: Decode the encoded data of the first audio frame according to the coding mode to obtain decoded data.
S1303: Determine whether the coding mode of the first audio frame is the same as the coding mode of a second audio frame.
In the above S1303, if the coding mode of the first audio frame is the same as the coding mode of the second audio frame, the following steps a and b are performed:
Step a: Splice the delayed data before the decoded data to obtain a second splicing result.
Step b: Delete the tenth sample point sequence in the second splicing result to obtain the playback data of the first audio frame.
The tenth sample point sequence is the sequence consisting of the last samples of the second splicing result whose count is the number of delayed sample points.
In some embodiments, steps a and b can be understood with reference to FIG. 14. The delayed data 1411 is the first qmf_order-1 samples of the delay buffer. The delayed data 1411 is spliced before the decoded data 142 to obtain the second splicing result 143; the sequence consisting of the last delayed-sample-count samples of the second splicing result 143 is the tenth sample point sequence 1431. The tenth sample point sequence 1431 is then deleted from the second splicing result 143 to obtain the playback data 144 of the first audio frame, which consists of the delayed data 1411 followed by the remaining tail portion of the decoded data 142.
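Steps a and b are the same prepend-and-trim pattern, applied with the previous frame's delayed data instead of PLC data. A minimal sketch under the same assumptions as the earlier examples:

```python
import numpy as np

def delay_same_mode(pcm, delay_data, fd):
    # Step a: splice the delayed data (first fd samples) before the decoded data;
    # step b: drop the last fd samples of the second splicing result.
    second_splice = np.concatenate([delay_data[:fd], pcm])
    return second_splice[:-fd]
```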
In the above S1303, if the coding mode of the first audio frame is not the same as the coding mode of the second audio frame and the coding mode of the first audio frame is single description coding, the following S1304 to S1306 are performed:
S1304: Generate packet loss compensation data based on the second audio frame.
S1305: Replace the first sample point sequence in the decoded data with the second sample point sequence in the packet loss compensation data to obtain a first replacement result.
The first sample point sequence is the sequence consisting of the first samples of the decoded data whose count is the first number, the first number being the difference between the first preset number and the number of delayed sample points; the second sample point sequence is the sequence consisting of the samples of the packet loss compensation data whose index values run from the number of delayed sample points to the first preset number.
S1306: Perform windowed superposition, based on a first window function, on the third sample point sequence in the first replacement result and the fourth sample point sequence in the packet loss compensation data to obtain a smoothing result corresponding to the decoded data.
The third sample point sequence is the sequence consisting of the samples of the first replacement result whose index values run from the first number to the sum of the first number and the second preset number; the fourth sample point sequence is the sequence consisting of the samples of the packet loss compensation data whose index values run from the first preset number to the sum of the first preset number and the second preset number.
In the above S1303, if the coding mode of the first audio frame is not the same as the coding mode of the second audio frame and the coding mode of the first audio frame is multiple description coding, the following S1307 to S1313 are performed:
S1307: Obtain a fifth sample point sequence.
The fifth sample point sequence is the sequence consisting of the first samples of the packet loss compensation data whose count is the number of delayed sample points.
S1308: Splice the fifth sample point sequence before the smoothing result to obtain a first splicing result.
S1309: Delete the sixth sample point sequence in the first splicing result to obtain the playback data of the first audio frame.
The sixth sample point sequence is the sequence consisting of the last samples of the first splicing result whose count is the number of delayed sample points.
S1310: If the coding mode of the first audio frame is multiple description coding, generate packet loss compensation data based on the second audio frame.
S1311: Replace the seventh sample point sequence in the decoded data with the delayed data to obtain a second replacement result.
S1312: Perform windowed superposition, based on a second window function, on the eighth sample point sequence in the second replacement result and the ninth sample point sequence in the packet loss compensation data to obtain the playback data of the first audio frame.
The eighth sample point sequence is the sequence consisting of the samples of the second replacement result whose index values run from the number of delayed sample points to the sum of the number of delayed sample points and the third preset number; the ninth sample point sequence is the sequence consisting of the first samples of the packet loss compensation data whose count is the third preset number.
S1313: Perform delay processing on the smoothing result according to the packet loss compensation data and the number of delayed sample points to obtain the playback data of the first audio frame.
Based on the same inventive concept, and as an implementation of the above methods, the embodiments of the present disclosure further provide an audio data encoding device and an audio data decoding device. These device embodiments correspond to the foregoing method embodiments; for ease of reading, this embodiment does not repeat the details of the foregoing method embodiments one by one, but it should be clear that the devices in this embodiment can correspondingly implement all the contents of the foregoing method embodiments.
An embodiment of the present disclosure provides an audio data encoding device. FIG. 15 is a schematic structural diagram of the device. Referring to FIG. 15, the audio data processing device 1500 includes:
a determining unit 1501, configured to determine a coding mode of a first audio frame;
a judging unit 1502, configured to determine whether the coding mode of the first audio frame is the same as the coding mode of a second audio frame, the second audio frame being the audio frame preceding the first audio frame;
a generating unit 1503, configured to generate target data according to first data, second data and a first delay when the coding mode of the first audio frame is not the same as the coding mode of the second audio frame and the coding mode of the first audio frame is multiple description coding, where the first data is the low-frequency data obtained by frequency division of the original audio data of the first audio frame, the second data is the low-frequency data obtained by frequency division of the original audio data of the second audio frame, and the first delay is the encoding delay of the multiple description coding;
the generating unit 1503 is further configured to generate sixth data according to fourth data, fifth data and a second delay when the coding mode of the first audio frame is not the same as the coding mode of the second audio frame and the coding mode of the first audio frame is single description coding, where the fourth data is the original audio data of the first audio frame, the fifth data is the original audio data of the second audio frame, and the second delay is the encoding delay of the single description coding; and
an encoding unit 1504, configured to encode the target data according to the coding mode of the first audio frame to obtain encoded data of the first audio frame.
As an optional implementation of the embodiments of the present disclosure, the generating unit 1503 is specifically configured to: intercept, from the tail end of the second data, samples with a length of the first delay to obtain fifth data; splice the fifth data to the head end of the first data to obtain sixth data; and delete, from the tail end of the sixth data, samples with a length of the first delay to obtain the target data.
As an optional implementation of the embodiments of the present disclosure, the generating unit 1503 is specifically configured to: intercept, from the tail end of the fifth data, samples with a length of the second delay to obtain seventh data; splice the seventh data to the head end of the fourth data to obtain eighth data; and delete, from the tail end of the eighth data, samples with a length of the second delay to obtain the target data.
As an optional implementation of the embodiments of the present disclosure, the determining unit 1501 is specifically configured to: determine, based on a coding mode duration and a signal type of the first audio frame, whether a coding mode switching condition is met, the coding mode duration being the playback duration of the audio frames continuously encoded in the current coding mode; if not, determine the coding mode of the second audio frame as the coding mode of the first audio frame; and if so, determine the coding mode of the first audio frame according to network parameters of the audio coded data transmission network.
As an optional implementation of the embodiments of the present disclosure, the determining unit 1501 is specifically configured to: determine whether the coding mode duration is greater than a threshold duration; determine whether the probability that the first audio frame is a speech audio frame is less than a threshold probability; if the coding mode duration is greater than the threshold duration and the probability that the first audio frame is a speech audio frame is less than the threshold probability, determine that the coding mode switching condition is met; and if the coding mode duration is less than or equal to the threshold duration and/or the probability that the first audio frame is a speech audio frame is greater than or equal to the threshold probability, determine that the coding mode switching condition is not met.
As an optional implementation of the embodiments of the present disclosure, the determining unit 1501 is specifically configured to: determine the packet loss rate of the audio coded data transmission network according to the network parameters; determine whether the packet loss rate is greater than or equal to a threshold packet loss rate; if so, determine that the coding mode of the first audio frame is multiple description coding; and if not, determine that the coding mode of the first audio frame is single description coding.
An embodiment of the present disclosure provides an audio data decoding device. FIG. 16 is a schematic structural diagram of the device. Referring to FIG. 16, the audio data decoding device 1600 includes:
a determining unit 1601, configured to determine a coding mode of a first audio frame according to encoded data of the first audio frame;
a decoding unit 1602, configured to decode the encoded data of the first audio frame according to the coding mode to obtain decoded data;
a judging unit 1603, configured to determine whether the coding mode of the first audio frame is the same as the coding mode of a second audio frame, the second audio frame being the audio frame preceding the first audio frame;
a processing unit 1604, configured to, when the coding mode of the first audio frame is not the same as the coding mode of the second audio frame and the coding mode of the first audio frame is single description coding, generate packet loss compensation data based on the second audio frame, smooth the decoded data according to the packet loss compensation data to obtain a smoothing result corresponding to the decoded data, and perform delay processing on the smoothing result according to the packet loss compensation data and the number of delayed sample points to obtain playback data of the first audio frame, the number of delayed sample points being the number of delayed sample points of multiple description coding;
the processing unit 1604 is further configured to, when the coding mode of the first audio frame is not the same as the coding mode of the second audio frame and the coding mode of the first audio frame is multiple description coding, generate packet loss compensation data based on the second audio frame, and smooth the decoded data according to the delayed data of the second audio frame and the packet loss compensation data to obtain the playback data of the first audio frame.
As an optional implementation of the embodiments of the present disclosure, the processing unit 1604 is specifically configured to: replace the first sample point sequence in the decoded data with the second sample point sequence in the packet loss compensation data to obtain a first replacement result, where the first sample point sequence is the sequence consisting of the first samples of the decoded data whose count is the first number, the first number being the difference between the first preset number and the number of delayed sample points, and the second sample point sequence is the sequence consisting of the samples of the packet loss compensation data whose index values run from the number of delayed sample points to the first preset number; and perform, based on a first window function, windowed superposition on the third sample point sequence in the first replacement result and the fourth sample point sequence in the packet loss compensation data to obtain the smoothing result corresponding to the decoded data, where the third sample point sequence is the sequence consisting of the samples of the first replacement result whose index values run from the first number to the sum of the first number and the second preset number, and the fourth sample point sequence is the sequence consisting of the samples of the packet loss compensation data whose index values run from the first preset number to the sum of the first preset number and the second preset number.
As an optional implementation of the embodiments of the present disclosure, the processing unit 1604 is specifically configured to: obtain a fifth sample point sequence, the fifth sample point sequence being the sequence consisting of the first samples of the packet loss compensation data whose count is the number of delayed sample points; splice the fifth sample point sequence before the smoothing result to obtain a first splicing result; and delete the sixth sample point sequence in the first splicing result to obtain the playback data of the first audio frame, the sixth sample point sequence being the sequence consisting of the last samples of the first splicing result whose count is the number of delayed sample points.
As an optional implementation of the embodiments of the present disclosure, the processing unit 1604 is specifically configured to: replace the seventh sample point sequence in the decoded data with the delayed data to obtain a second replacement result, the seventh sample point sequence being the sequence consisting of the first samples of the decoded data whose count is the number of delayed sample points; and perform, based on a second window function, windowed superposition on the eighth sample point sequence in the second replacement result and the ninth sample point sequence in the packet loss compensation data to obtain the playback data of the first audio frame, where the eighth sample point sequence is the sequence consisting of the samples of the second replacement result whose index values run from the number of delayed sample points to the sum of the number of delayed sample points and the third preset number, and the ninth sample point sequence is the sequence consisting of the first samples of the packet loss compensation data whose count is the third preset number.
As an optional implementation of the embodiments of the present disclosure, the processing unit 1604 is further configured to, when the coding mode of the first audio frame is the same as the coding mode of the second audio frame and the coding mode of the first audio frame is single description coding, perform delay processing on the decoded data according to the delayed data of the second audio frame and the number of delayed sample points to obtain the playback data of the first audio frame.
As an optional implementation of the embodiments of the present disclosure, the processing unit 1604 is specifically configured to: splice the delayed data before the decoded data to obtain a second splicing result; and delete the tenth sample point sequence in the second splicing result to obtain the playback data of the first audio frame, the tenth sample point sequence being the sequence consisting of the last samples of the second splicing result whose count is the number of delayed sample points.
The audio data processing devices provided in this embodiment can execute the audio data processing methods provided in the above method embodiments; their implementation principles and technical effects are similar and are not repeated here.
Based on the same inventive concept, an embodiment of the present disclosure further provides an electronic device. FIG. 17 is a schematic structural diagram of the electronic device provided by an embodiment of the present disclosure. As shown in FIG. 17, the electronic device provided by this embodiment includes a memory 1701 and a processor 1702; the memory 1701 is configured to store a computer program, and the processor 1702 is configured to perform the audio data processing methods provided by the above embodiments when executing the computer program.
Based on the same inventive concept, an embodiment of the present disclosure further provides a computer-readable storage medium storing a computer program which, when executed by a processor, causes the computing device to implement the audio data processing methods provided by the above embodiments.
Based on the same inventive concept, an embodiment of the present disclosure further provides a computer program product which, when run on a computer, causes the computing device to implement the audio data processing methods provided by the above embodiments.
Those skilled in the art should understand that the embodiments of the present disclosure may be provided as methods, systems, or computer program products. Therefore, the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present disclosure may take the form of a computer program product implemented on one or more computer-usable storage media containing computer-usable program code.
The processor may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The memory may include non-permanent storage in a computer-readable medium, in the form of random access memory (RAM) and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). The memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable storage media. A storage medium may store information by any method or technology; the information may be computer-readable instructions, data structures, program modules or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory media such as modulated data signals and carrier waves.
Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present disclosure, not to limit them. Although the present disclosure has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions described in the foregoing embodiments, or substitute equivalents for some or all of the technical features therein; such modifications or substitutions do not depart the essence of the corresponding technical solutions from the scope of the technical solutions of the embodiments of the present disclosure.

Claims (19)

  1. An audio data encoding method, comprising:
    determining a coding mode of a first audio frame;
    determining whether the coding mode of the first audio frame is the same as the coding mode of a second audio frame, the second audio frame being the audio frame preceding the first audio frame;
    if they are not the same and the coding mode of the first audio frame is multiple description coding, generating third data according to first data, second data and a first delay, wherein the first data is low-frequency data obtained by frequency division of original audio data of the first audio frame, the second data is low-frequency data obtained by frequency division of original audio data of the second audio frame, and the first delay is an encoding delay of the multiple description coding; and
    performing multiple description coding on the third data to obtain encoded data of the first audio frame.
  2. The method according to claim 1, further comprising:
    if the coding mode of the first audio frame is not the same as the coding mode of the second audio frame and the coding mode of the first audio frame is single description coding, generating sixth data according to fourth data, fifth data and a second delay, wherein the fourth data is the original audio data of the first audio frame, the fifth data is the original audio data of the second audio frame, and the second delay is an encoding delay of the single description coding; and
    performing single description coding on the sixth data to obtain encoded data of the first audio frame.
  3. The method according to claim 1, wherein generating the third data according to the first data, the second data and the first delay comprises:
    intercepting, from the tail end of the second data, sample points with a length of the first delay to obtain seventh data;
    splicing the seventh data to the head end of the first data to obtain eighth data; and
    deleting, from the tail end of the eighth data, sample points with a length of the first delay to obtain the third data.
  4. The method according to claim 2, wherein generating the sixth data according to the fourth data, the fifth data and the second delay comprises:
    intercepting, from the tail end of the fifth data, sample points with a length of the second delay to obtain ninth data;
    splicing the ninth data to the head end of the fourth data to obtain tenth data; and
    deleting, from the tail end of the tenth data, sample points with a length of the second delay to obtain the sixth data.
  5. The method according to any one of claims 1 to 4, wherein determining the coding mode of the first audio frame comprises:
    determining, based on a coding mode duration and a signal type of the first audio frame, whether a coding mode switching condition is met, the coding mode duration being the playback duration of the audio frames continuously encoded in the current coding mode;
    if not, determining the coding mode of the second audio frame as the coding mode of the first audio frame; and
    if so, determining the coding mode of the first audio frame according to network parameters of the audio coded data transmission network.
  6. The method according to claim 5, wherein determining whether the coding mode switching condition is met based on the coding mode duration and the signal type of the first audio frame comprises:
    determining whether the coding mode duration is greater than a threshold duration;
    determining whether a probability that the first audio frame is a speech audio frame is less than a threshold probability;
    if the coding mode duration is greater than the threshold duration and the probability that the first audio frame is a speech audio frame is less than the threshold probability, determining that the coding mode switching condition is met; and
    if the coding mode duration is less than or equal to the threshold duration and/or the probability that the first audio frame is a speech audio frame is greater than or equal to the threshold probability, determining that the coding mode switching condition is not met.
  7. The method according to claim 5, wherein determining the coding mode of the first audio frame according to the network parameters of the audio coded data transmission network comprises:
    determining a packet loss rate of the audio coded data transmission network according to the network parameters;
    determining whether the packet loss rate is greater than or equal to a threshold packet loss rate;
    if so, determining that the coding mode of the first audio frame is multiple description coding; and
    if not, determining that the coding mode of the first audio frame is single description coding.
  8. An audio data decoding method, comprising:
    determining a coding mode of a first audio frame according to encoded data of the first audio frame;
    decoding the encoded data of the first audio frame according to the coding mode to obtain decoded data;
    determining whether the coding mode of the first audio frame is the same as the coding mode of a second audio frame, the second audio frame being the audio frame preceding the first audio frame;
    if they are not the same and the coding mode of the first audio frame is multiple description coding, generating packet loss compensation data based on the second audio frame; and
    smoothing the decoded data according to delayed data of the second audio frame and the packet loss compensation data to obtain playback data of the first audio frame.
  9. The method according to claim 8, further comprising:
    if the coding mode of the first audio frame is not the same as the coding mode of the second audio frame and the coding mode of the first audio frame is single description coding, generating packet loss compensation data based on the second audio frame;
    smoothing the decoded data according to the packet loss compensation data to obtain a smoothing result corresponding to the decoded data; and
    performing delay processing on the smoothing result according to the packet loss compensation data and a number of delayed sample points to obtain playback data of the first audio frame, the number of delayed sample points being the number of delayed sample points of multiple description coding.
  10. The method according to claim 9, wherein smoothing the decoded data according to the packet loss compensation data to obtain the smoothing result corresponding to the decoded data comprises:
    replacing a first sample point sequence in the decoded data with a second sample point sequence in the packet loss compensation data to obtain a first replacement result, wherein the first sample point sequence is the sequence consisting of the first samples of the decoded data whose count is a first number, the first number being the difference between a first preset number and the number of delayed sample points, and the second sample point sequence is the sequence consisting of the samples of the packet loss compensation data whose index values run from the number of delayed sample points to the first preset number; and
    performing, based on a first window function, windowed superposition on a third sample point sequence in the first replacement result and a fourth sample point sequence in the packet loss compensation data to obtain the smoothing result corresponding to the decoded data, wherein the third sample point sequence is the sequence consisting of the samples of the first replacement result whose index values run from the first number to the sum of the first number and a second preset number, and the fourth sample point sequence is the sequence consisting of the samples of the packet loss compensation data whose index values run from the first preset number to the sum of the first preset number and the second preset number.
  11. The method according to claim 9, wherein performing delay processing on the smoothing result according to the packet loss compensation data and the number of delayed sample points to obtain the playback data of the first audio frame comprises:
    obtaining a fifth sample point sequence, the fifth sample point sequence being the sequence consisting of the first samples of the packet loss compensation data whose count is the number of delayed sample points;
    splicing the fifth sample point sequence before the smoothing result to obtain a first splicing result; and
    deleting a sixth sample point sequence in the first splicing result to obtain the playback data of the first audio frame, the sixth sample point sequence being the sequence consisting of the last samples of the first splicing result whose count is the number of delayed sample points.
  12. The method according to claim 9, wherein smoothing the decoded data according to the delayed data of the second audio frame and the packet loss compensation data to obtain the playback data of the first audio frame comprises:
    replacing a seventh sample point sequence in the decoded data with the delayed data to obtain a second replacement result, the seventh sample point sequence being the sequence consisting of the first samples of the decoded data whose count is the number of delayed sample points; and
    performing, based on a second window function, windowed superposition on an eighth sample point sequence in the second replacement result and a ninth sample point sequence in the packet loss compensation data to obtain the playback data of the first audio frame, wherein the eighth sample point sequence is the sequence consisting of the samples of the second replacement result whose index values run from the number of delayed sample points to the sum of the number of delayed sample points and a third preset number, and the ninth sample point sequence is the sequence consisting of the first samples of the packet loss compensation data whose count is the third preset number.
  13. The method according to any one of claims 8 to 12, further comprising:
    if the coding mode of the first audio frame is the same as the coding mode of the second audio frame and the coding mode of the first audio frame is single description coding, performing delay processing on the decoded data according to the delayed data of the second audio frame and the number of delayed sample points to obtain the playback data of the first audio frame.
  14. The method according to claim 13, wherein performing delay processing on the decoded data according to the delayed data of the second audio frame and the number of delayed sample points to obtain the playback data of the first audio frame comprises:
    splicing the delayed data before the decoded data to obtain a second splicing result; and
    deleting a tenth sample point sequence in the second splicing result to obtain the playback data of the first audio frame, the tenth sample point sequence being the sequence consisting of the last samples of the second splicing result whose count is the number of delayed sample points.
  15. An audio data encoding device, comprising:
    a determining unit, configured to determine a coding mode of a first audio frame;
    a judging unit, configured to determine whether the coding mode of the first audio frame is the same as the coding mode of a second audio frame, the second audio frame being the audio frame preceding the first audio frame;
    a generating unit, configured to generate third data according to first data, second data and a first delay when the coding mode of the first audio frame is not the same as the coding mode of the second audio frame and the coding mode of the first audio frame is multiple description coding, wherein the first data is low-frequency data obtained by frequency division of original audio data of the first audio frame, the second data is low-frequency data obtained by frequency division of original audio data of the second audio frame, and the first delay is an encoding delay of the multiple description coding; and
    an encoding unit, configured to perform multiple description coding on the third data to obtain encoded data of the first audio frame.
  16. An audio data decoding device, comprising:
    a determining unit, configured to determine a coding mode of a first audio frame according to encoded data of the first audio frame;
    a decoding unit, configured to decode the encoded data of the first audio frame according to the coding mode to obtain decoded data;
    a judging unit, configured to determine whether the coding mode of the first audio frame is the same as the coding mode of a second audio frame, the second audio frame being the audio frame preceding the first audio frame; and
    a processing unit, configured to generate packet loss compensation data based on the second audio frame when the coding mode of the first audio frame is not the same as the coding mode of the second audio frame and the coding mode of the first audio frame is multiple description coding, and to smooth the decoded data according to delayed data of the second audio frame and the packet loss compensation data to obtain playback data of the first audio frame.
  17. An electronic device, comprising a memory and a processor, wherein the memory is configured to store a computer program, and the processor is configured to, when executing the computer program, cause the electronic device to implement the audio data encoding method according to any one of claims 1 to 7 or the audio data decoding method according to any one of claims 8 to 14.
  18. A computer-readable storage medium storing a computer program which, when executed by a computing device, causes the computing device to implement the audio data encoding method according to any one of claims 1 to 7 or the audio data decoding method according to any one of claims 8 to 14.
  19. A computer program product comprising a computer program which, when executed by a processor, implements the audio data encoding method according to any one of claims 1 to 7 or the audio data decoding method according to any one of claims 8 to 14.
PCT/CN2023/129685 2022-11-07 2023-11-03 Audio data encoding method, decoding method and device WO2024099233A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211387602.8 2022-11-07
CN202211387602.8A CN118038879A (zh) 2022-11-07 2022-11-07 Audio data encoding method, decoding method and device

Publications (1)

Publication Number Publication Date
WO2024099233A1 true WO2024099233A1 (zh) 2024-05-16

Family

ID=90988246

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/129685 WO2024099233A1 (zh) Audio data encoding method, decoding method and device

Country Status (2)

Country Link
CN (1) CN118038879A (zh)
WO (1) WO2024099233A1 (zh)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0856956A1 (en) * 1997-01-30 1998-08-05 AT&T Corp. Multiple description coding communication system
CN101115051A (zh) * 2006-07-25 2008-01-30 华为技术有限公司 音频信号处理方法、系统以及音频信号收发装置
CN101340261A (zh) * 2007-07-05 2009-01-07 华为技术有限公司 多描述编码和多描述解码的方法、装置及系统
JP2009089157A (ja) * 2007-10-01 2009-04-23 Yamaha Corp 配信システムおよび配信方法
CN101777960A (zh) * 2008-11-17 2010-07-14 华为终端有限公司 音频编码方法、音频解码方法、相关装置及通信系统
CN101833953A (zh) * 2009-03-12 2010-09-15 华为终端有限公司 降低多描述编解码冗余度的方法和装置
CN101989425A (zh) * 2009-07-30 2011-03-23 华为终端有限公司 多描述音频编解码的方法、装置及系统
CN109616129A (zh) * 2018-11-13 2019-04-12 南京南大电子智慧型服务机器人研究院有限公司 用于提升语音丢帧补偿性能的混合多描述正弦编码器方法
CN114333862A (zh) * 2021-11-10 2022-04-12 腾讯科技(深圳)有限公司 音频编码方法、解码方法、装置、设备、存储介质及产品

Also Published As

Publication number Publication date
CN118038879A (zh) 2024-05-14

Similar Documents

Publication Publication Date Title
US10964334B2 (en) Audio decoder and method for providing a decoded audio information using an error concealment modifying a time domain excitation signal
US10269359B2 (en) Audio decoder and method for providing a decoded audio information using an error concealment based on a time domain excitation signal
JP5072835B2 (ja) Robust decoder
JP5208901B2 (ja) Method for encoding speech signals and music signals
US20110145004A1 (en) Bitrate constrained variable bitrate audio encoding
BRPI0607247A2 (pt) Method for generating an output sequence of samples in response to a first and a second subsequence of samples, computer-executable program code, program storage device, and arrangement for receiving a digitized audio signal
WO2019228423A1 (zh) Stereo signal encoding method and apparatus
WO2001018790A1 (en) Method and apparatus in a telecommunications system
WO2024099233A1 (zh) Audio data encoding method, decoding method and device
US7363231B2 (en) Coding device, decoding device, and methods thereof
CN115867965A (zh) Frame loss concealment for a low-frequency effects channel
JP2005122034A (ja) Audio data compression method
CN116978388A (zh) Audio data processing method, processing system and storage medium
KR20100009411A (ko) Apparatus and method for encoding an audio signal while switching between a frequency-domain transform technique and a time-domain transform technique

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23887923

Country of ref document: EP

Kind code of ref document: A1