WO2023202250A1 - Audio transmission method, apparatus, terminal, storage medium and program product - Google Patents

Audio transmission method, apparatus, terminal, storage medium and program product

Info

Publication number
WO2023202250A1
Authority
WO
WIPO (PCT)
Prior art keywords
signal, band, subband, frequency
Prior art date
Application number
PCT/CN2023/079987
Other languages
English (en)
French (fr)
Inventor
梁俊斌
Original Assignee
腾讯科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 腾讯科技(深圳)有限公司 (Tencent Technology (Shenzhen) Co., Ltd.)
Publication of WO2023202250A1

Classifications

    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 — Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/0017 — Lossless audio signal coding; perfect reconstruction of coded audio signal by transmission of coding error
    • G10L19/005 — Correction of errors induced by the transmission channel, if related to the coding algorithm

Definitions

  • Embodiments of the present application relate to the field of multimedia transmission technology, and in particular to an audio transmission method, device, terminal, storage medium and program product.
  • Speech codec plays an important role in modern communication systems.
  • The signal sending end compresses and packages the sound signal through an encoder, then sends the data to the receiving end according to the network transmission format and protocol.
  • The receiving end unpacks and decodes the data packet to obtain the sound signal.
  • To address packet loss during transmission, the sending end usually uses forward error correction (FEC) technology to perform channel coding and generate redundant data packets.
  • When the receiving end detects packet loss, it can recover data based on the redundant data packets and obtain the complete multimedia data.
  • However, FEC redundant data packets consume additional transmission bandwidth, and the anti-packet-loss capability of the transmission system is positively correlated with the coding redundancy.
  • Under poor network conditions, the FEC coding redundancy needs to be increased, resulting in a significant increase in transmission bandwidth and operating costs.
  • Embodiments of the present application provide an audio transmission method, device, terminal, storage medium and program product.
  • The technical solutions are as follows:
  • In one aspect, this application provides an audio transmission method, executed by an audio sending end.
  • The method includes:
  • Based on the energy distribution of the input signal, determining second subband coded data from the first subband coded data, where the audio frequency band of the signal subband corresponding to the second subband coded data is the frequency band in which the signal energy is concentrated;
  • In another aspect, this application provides an audio transmission method, performed by an audio receiving end.
  • The method includes:
  • Receiving audio data packets, where the audio data packets include redundant data and at least two sets of first subband coded data; the redundant data is obtained by the audio sending end performing error correction coding on second subband coded data among the first subband coded data; the first subband coded data is obtained by the audio sending end performing subband decomposition and compression coding on an input signal, with different first subband coded data corresponding to signal subbands of different audio frequency bands in the input signal; and the audio frequency band of the second subband coded data is the frequency band in which the signal energy is concentrated;
  • In another aspect, this application provides an audio transmission device, which includes:
  • a subband coding module used to perform subband decomposition and compression coding on the input signal to obtain first subband coded data of at least two sets of signal subbands, where different signal subbands correspond to different audio frequency bands of the input signal;
  • a determining module, configured to determine second subband coded data from the first subband coded data based on the energy distribution of the input signal, where the audio frequency band of the signal subband corresponding to the second subband coded data is the frequency band in which the signal energy is concentrated;
  • an error correction coding module, configured to perform error correction coding on the second subband coded data to obtain redundant data; and
  • a data sending module, configured to send an audio data packet to an audio receiving end, where the audio data packet contains the first subband encoded data and the redundant data, and the audio receiving end is configured to, in the event of packet loss, perform data recovery on the first subband encoded data based on the redundant data.
  • this application provides an audio transmission device, which includes:
  • a data receiving module, configured to receive audio data packets, where the audio data packets contain redundant data and at least two sets of first subband encoded data; the redundant data is obtained by the audio sending end performing error correction coding on second subband encoded data among the first subband encoded data; the first subband encoded data is obtained by the audio sending end performing subband decomposition and compression coding on an input signal, with different first subband encoded data corresponding to signal subbands of different audio frequency bands in the input signal; and the audio frequency band of the second subband encoded data is the frequency band in which the signal energy is concentrated;
  • a packet loss detection module, configured to perform packet loss detection on the first subband encoded data; and
  • a decoding module, configured to perform data recovery on the first subband encoded data based on the redundant data to obtain an output signal when packet loss occurs in the first subband encoded data.
  • In another aspect, the present application provides a terminal, which includes a processor and a memory; at least one program is stored in the memory, and the at least one program is loaded and executed by the processor to implement the audio transmission method described in the above aspects.
  • In another aspect, the present application provides a computer-readable storage medium in which at least one computer program is stored; the computer program is loaded and executed by a processor to implement the audio transmission method described in the above aspect.
  • In another aspect, this application provides a computer program product including computer instructions stored in a computer-readable storage medium.
  • the processor of the terminal reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the terminal performs the audio transmission method provided in various optional implementations of the above aspect.
  • At least two sets of first subband coded data are obtained by performing subband decomposition and compression coding on the input signal, and error correction coding is performed only on the subband coded data in which the signal energy is concentrated, ensuring the audio receiving end's ability to recover the primary audio data.
  • Figure 1 is an audio transmission flow chart of a related technical solution.
  • Figure 2 is a schematic diagram of an implementation environment provided by an exemplary embodiment of the present application.
  • Figure 3 is a flow chart of an audio transmission method provided by an exemplary embodiment of the present application.
  • Figure 4 is a flow chart of an audio transmission method provided by another exemplary embodiment of the present application.
  • Figure 5 is a framework diagram of a subband coding model provided by an exemplary embodiment of the present application.
  • Figure 6 is a flow chart of an audio transmission method provided by another exemplary embodiment of the present application.
  • Figure 7 is a flow chart of an audio transmission method provided by another exemplary embodiment of the present application.
  • Figure 8 is a framework diagram of an audio coding and decoding system provided by an exemplary embodiment of the present application.
  • Figure 9 is a structural block diagram of an audio transmission device provided by an exemplary embodiment of the present application.
  • Figure 10 is a structural block diagram of an audio transmission device provided by another exemplary embodiment of the present application.
  • Figure 11 is a structural block diagram of a terminal provided by an exemplary embodiment of the present application.
  • Speech codec plays an important role in modern communication systems.
  • The sound signal is collected through a microphone, and the terminal (sender) converts the analog sound signal into a digital sound signal through an analog-to-digital conversion circuit.
  • The digital sound signal is compressed and encoded by the speech encoder, then packaged and sent to the receiving end according to the communication network transmission format and protocol.
  • After receiving the data packet, the receiving end device unpacks it to obtain the compressed speech bitstream, which the speech decoder decompresses to regenerate the digital speech signal.
  • The digital speech signal is then played back through a speaker.
  • Voice coding and decoding effectively reduces the bandwidth of voice signal transmission, plays a decisive role in saving voice information storage and transmission costs, and ensuring the integrity of voice information during communication network transmission.
  • However, the instability of the transmission network leads to packet loss during transmission, causing the sound at the receiving end to stutter and become incoherent, degrading the listening experience.
  • A variety of methods have been adopted to resist network packet loss, including forward error correction, packet loss concealment, and automatic retransmission requests.
  • Among these, the forward error correction anti-packet-loss solution can perfectly recover the data at packet loss positions.
  • the data after forward error correction encoding is packaged and sent to the receiving end.
  • the receiving end receives the forward error correction code and decodes it to recover the complete data at the packet loss location, achieving perfect recovery.
  • However, forward error correction consumes additional bandwidth: the higher the redundancy, the stronger the resistance to packet loss, but also the greater the bandwidth cost. Therefore, how to effectively control the forward error correction redundancy and reduce bandwidth consumption while achieving good end-to-end audio transmission quality is a topic worth studying.
  • FIG. 2 shows a schematic diagram of an implementation environment provided by an exemplary embodiment of this application.
  • the implementation environment includes: an audio sending end 110 and an audio receiving end 120.
  • the audio sending end 110 combines the sub-band encoding and decoding method to perform sub-band decomposition and compression coding on the input signal and performs signal classification. Based on the signal classification results, error correction encoding is performed on part of the sub-band encoding data with concentrated energy to generate redundant data.
  • the audio sending end 110 sends each group of subband coded data and redundant data to the audio receiving end 120.
  • The audio receiving end 120 receives and parses the data, and detects whether any subband encoded data has been lost. In the case of packet loss, the audio receiving end 120 can recover the signal subband of the energy-concentrated frequency band based on the redundant data, and then obtain a complete output signal through subband prediction.
  • The audio sending end 110 shown in the figure can also act as a receiving end to receive audio data, and the audio receiving end 120 can also act as a sending end to send audio data.
  • the figure only shows two terminals accessing the transmission network. In actual application scenarios (such as multi-person call scenarios or online conference scenarios, etc.), the number of terminals can be more. The embodiments of this application do not limit the number of terminals and device types.
  • FIG. 3 shows a flow chart of an audio transmission method provided by an exemplary embodiment of the present application.
  • This embodiment takes the method executed by the audio sending end as an example to illustrate.
  • the method includes the following steps:
  • Step 301 Perform sub-band decomposition and compression coding on the input signal to obtain first sub-band coded data of at least two sets of signal sub-bands. Different signal sub-bands correspond to different audio frequency bands of the input signal.
  • the input signal is the sound signal collected by the terminal through a device such as a microphone.
  • The audio sending end converts the input signal from the time domain to the frequency domain, performs subband decomposition in the frequency domain to obtain signal subbands of different audio frequency bands, and compresses and encodes each group of signal subbands separately to obtain the first subband coded data of each signal subband. Therefore, different signal subbands correspond to different audio frequency bands of the input signal.
  • The audio sending end may perform subband decomposition and compression coding on the input signal once to obtain the first subband coded data of each signal subband, or it may decompose the input signal multiple times (for example, first obtaining two sets of signal subbands through subband decomposition, then continuing to decompose some or all of those subbands) before compression coding.
  • For example, the frequency of human speech is usually distributed in the range of 500 Hz to 4 kHz. Therefore, for the transmission of 16 kHz audio, the audio sending end first performs subband decomposition and compression coding on the input signal to obtain first subband coded data for the two audio frequency bands 0-8 kHz and 8-16 kHz.
  • The audio sending end can use multiple band-pass filters (BPF) to divide the input signal into several continuous audio frequency bands. The input signal of each audio frequency band is called a signal subband; each signal subband is then compressed and encoded to obtain multiple sets of first subband coded data for the input signal.
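The patent does not include filter code; as a rough sketch, a brick-wall FFT split (a stand-in for a real band-pass filter bank, with `split_two_bands` and its parameters being illustrative names, not from the patent) could decompose one frame into a 0-8 kHz band and an 8-16 kHz band:

```python
import numpy as np

def split_two_bands(signal, sample_rate, cutoff_hz):
    """Split a time-domain frame into low/high subbands with an FFT
    brick-wall filter (a simple stand-in for a band-pass filter bank)."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    low_spec = np.where(freqs < cutoff_hz, spectrum, 0)    # keep bins below cutoff
    high_spec = np.where(freqs >= cutoff_hz, spectrum, 0)  # keep bins at/above cutoff
    low = np.fft.irfft(low_spec, n=len(signal))
    high = np.fft.irfft(high_spec, n=len(signal))
    return low, high
```

For example, with a 1 kHz tone plus a 6 kHz tone sampled at 16 kHz and a 4 kHz cutoff, the low output carries essentially all of the 1 kHz tone's energy and the high output the 6 kHz tone's. A production codec would use a proper filter bank (e.g. QMF, as mentioned below) rather than a brick-wall split.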
  • Step 302 Based on the energy distribution of the input signal, determine the second subband coded data from the first subband coded data.
  • The audio frequency band of the signal subband corresponding to the second subband coded data is the frequency band in which the signal energy is concentrated.
  • In the embodiments of this application, error correction coding is performed only on the subband coded data of certain key signal subbands of the input signal, to reduce the bandwidth consumed in transmitting redundant data (the coded data obtained after error correction coding).
  • the audio sending end first needs to determine the second subband coded data of the key signal subband from the first subband coded data.
  • Key signal subbands are usually the frequency bands in which the signal energy of the input signal is concentrated. For example, if most of the energy of the input signal is concentrated at low frequencies, the low-frequency subband of the input signal is the key signal subband; if most of the energy is concentrated at high frequencies, the high-frequency subband is the key signal subband.
  • Accordingly, the audio sending end can determine the audio frequency band in which the signal energy is concentrated by calculating the energy distribution of the input signal, and determine the first subband coded data corresponding to that audio frequency band as the second subband coded data, that is, the key subband coded data.
  • For example, if the audio sending end determines that the signal energy is concentrated in the 0-8 kHz audio frequency band, the first subband coded data of that band is determined as the second subband coded data.
  • The second subband coded data may be the single group of first subband coded data with the highest energy proportion, or, when the frequency bands are divided more finely, may include the multiple groups of first subband coded data with the highest energy proportions. The embodiments of the present application do not limit this.
  • Step 303 Perform error correction coding on the second subband coded data to obtain redundant data.
  • Error correction coding is also called channel coding. Related anti-packet-loss techniques include packet loss concealment (PLC), automatic repeat request (ARQ), forward error correction (FEC) coding, hybrid error correction coding, bit interleaving, and BCH error correction coding.
  • Forward error correction coding can be implemented through various algorithms such as Reed-Solomon codes (RS codes), Hamming codes, or low-density parity-check (LDPC) codes.
  • The audio sending end performs error correction coding on the second subband coded data to obtain redundant data, but does not perform error correction coding on the other first subband coded data. This ensures that, in the event of packet loss, the audio receiving end can preferentially recover the sound signal of the important (key) audio frequency band based on the redundant data, while reducing the transmission bandwidth consumed by redundant data.
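For illustration only: the patent names RS, Hamming, and LDPC as candidate FEC codes, and the XOR parity sketch below is not the patent's scheme. It is merely the smallest forward-error-correction code, showing how one redundant packet over the protected subband's packets lets the receiver restore any single lost packet:

```python
def xor_parity(packets):
    """Build one redundant packet as the byte-wise XOR of equal-length packets."""
    parity = bytearray(len(packets[0]))
    for pkt in packets:
        for i, b in enumerate(pkt):
            parity[i] ^= b
    return bytes(parity)

def recover_lost(received, parity):
    """Recover the single missing packet (marked None) by XOR-ing the
    parity packet with all packets that did arrive."""
    missing = bytearray(parity)
    for pkt in received:
        if pkt is not None:
            for i, b in enumerate(pkt):
                missing[i] ^= b
    return bytes(missing)
```

Real FEC codes such as RS or LDPC generalize this idea to recover multiple losses at a configurable redundancy level, which is exactly the redundancy/bandwidth trade-off the background section describes.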
  • Step 304 Send an audio data packet to the audio receiving end.
  • The audio data packet contains the first subband encoded data and the redundant data. The audio receiving end is configured to, in the event of packet loss, perform data recovery on the first subband encoded data based on the redundant data.
  • The audio sending end packages each set of first subband encoded data and the redundant data corresponding to the input signal and sends them to the audio receiving end, so that the audio receiving end performs decoding based on the first subband encoded data and redundant data, and finally outputs the sound signal.
  • In summary, at least two sets of first subband coded data are obtained by performing subband decomposition and compression coding on the input signal, and error correction coding is performed on the second subband coded data, in which the signal energy is concentrated. This ensures the audio receiving end's ability to recover the primary audio data. Compared with performing error correction coding directly on the complete input signal, this improves audio transmission quality while reducing the amount of redundant data, thereby reducing the transmission bandwidth and operating cost consumed by error correction coding.
  • In some embodiments, developers can set a fixed audio frequency band that requires error correction coding based on the actual application scenario. For example, in a voice call scenario, since human voice is usually a low-frequency signal, the audio sending end is configured to use the first subband coded data of the low-frequency subband as the second subband coded data; that is, only the first subband coded data of the low-frequency subband is error correction encoded. To improve audio coding and transmission quality, the audio sending end can also determine the second subband coded data from the first subband coded data by calculating the energy proportion.
  • FIG. 4 shows a flow chart of an audio transmission method provided by another exemplary embodiment of the present application.
  • This embodiment takes the method executed by the audio sending end as an example to illustrate.
  • the method includes the following steps:
  • Step 401 Perform analog-to-digital conversion on the analog sound signal collected by the microphone to generate a digital sound signal.
  • The sound signal is collected through a microphone, so the sound signal collected by the audio sending end is an analog signal.
  • the audio sending end converts the analog sound signal into a digital sound signal through an analog-to-digital conversion circuit for subsequent compression encoding, error correction encoding, and audio transmission.
  • Step 402 Perform Fourier transform on the digital sound signal to obtain a frequency domain signal.
  • Subband coding technology is a technology that converts the original signal from the time domain to the frequency domain, then divides it into several sub-bands, and digitally codes the signals of each sub-band. Since the audio transmitter needs to decompose the input signal into sub-bands, it first converts the time domain signal into a frequency domain signal. The audio sending end performs Fourier transform on the digital sound signal to obtain the frequency domain sound signal.
  • Step 403 Perform subband decomposition and compression coding on the frequency domain signal to generate first subband coded data of at least two sets of signal subbands.
  • the audio sending end decomposes the input signal into components of different frequency bands to remove signal correlation, and then samples, quantizes, and codes each group of components separately to obtain multiple groups of unrelated codewords.
  • the specific implementation of step 403 may include the following steps 403a to 403b (not shown in the figure):
  • Step 403a Perform sub-band decomposition of the frequency domain signal through at least two band-pass filters to obtain at least two signal sub-bands.
  • Different band-pass filters correspond to different audio frequency bands, and the audio frequency bands of each band-pass filter are continuous.
  • The basic idea of speech subband coding is that the audio sending end first decomposes the input signal into several signal subbands in different audio frequency bands through a set of band-pass filters, then converts these signal subbands into baseband signals by frequency shifting, and samples each baseband signal separately. The sampled signals are quantized, encoded, combined into a total code stream, and transmitted to the receiving end.
  • Subband coding can reasonably allocate the number of bits to each signal subband according to the hearing characteristics of the human ear to obtain better hearing quality, while also saving storage resources and reducing transmission bandwidth.
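As a minimal sketch of per-subband bit allocation, assuming a plain uniform quantizer over samples normalized to [-1, 1] (the function names and the bit depths are illustrative, not specified by the patent):

```python
import numpy as np

def quantize(samples, bits):
    """Uniformly quantize samples in [-1, 1] to integer indices at the
    given bit depth."""
    levels = 2 ** bits
    idx = np.round((samples + 1.0) / 2.0 * (levels - 1))
    return np.clip(idx, 0, levels - 1).astype(int)

def dequantize(idx, bits):
    """Map quantization indices back to reconstructed samples in [-1, 1]."""
    levels = 2 ** bits
    return idx / (levels - 1) * 2.0 - 1.0
```

A perceptually important low-frequency subband could then be quantized at, say, 8 bits per sample while a less critical high-frequency subband gets 4, spending bits where the ear is most sensitive.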
  • the audio sending end in the embodiment of the present application performs sub-band decomposition and compression coding on the input signal based on the above basic idea to obtain the first sub-band coded data of each signal sub-band.
  • Specifically, the audio sending end first uses a set of band-pass filters, such as quadrature mirror filters (QMF), to divide the frequency band of a frame of the input signal into several continuous audio frequency bands; each audio frequency band is called a signal subband.
  • Step 403b Perform frequency shifting and quantization coding on the signal subbands to obtain the first subband coded data of each group of signal subbands.
  • the audio sending end frequency-shifts each signal sub-band to the high-frequency end, and performs quantization coding on the frequency-shifted signal sub-band.
  • the audio sending end uses a unified coding scheme to code each group of signal subbands, or the audio sending end uses a separate coding scheme to code each group of signal subbands.
  • the embodiments of the present application do not limit this.
  • Step 404 Determine the low-frequency energy proportion of the low-frequency subband based on the sample signals of the input signal in each audio frequency band.
  • After decomposing the input signal into subbands, the audio sending end may perform compression coding and the calculation of the low-frequency energy proportion simultaneously, or it may first perform compression coding on the signal subbands and then calculate the low-frequency energy proportion of the input signal.
  • the embodiments of this application do not limit this.
  • The audio sending end determines the signal subband in which the energy is concentrated by calculating the energy proportion of the low-frequency subband. If the low-frequency subband's energy proportion is high, the signal energy is concentrated in the low-frequency subband; if it is low, the signal energy is concentrated in the high-frequency subbands. Here, the audio frequency of the low-frequency subband is lower than that of the other signal subbands.
  • Here, x(k, i) is the i-th sample of the k-th signal subband after the single-frame signal has been decomposed into subbands.
  • In some embodiments, the audio sending end only needs to calculate the energy proportion of one low-frequency signal subband. For example, it calculates the energy proportion of the group of signal subbands with the lowest frequency, or of several groups of signal subbands with the lowest frequencies. Developers can set the calculation method for the low-frequency energy proportion and the method for determining the second subband coded data based on factors such as the actual application scenario and audio file format.
  • For example, the audio sending end can first decompose the input signal into two frequency bands, 0-16 kHz and 16-32 kHz, then decompose the 0-16 kHz band into the two bands 0-8 kHz and 8-16 kHz, and calculate the low-frequency energy proportion over the 0-8 kHz and 8-16 kHz bands.
  • the embodiments of the present application do not limit this.
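The extraction above defines x(k, i) but does not reproduce the energy-ratio formula itself. One plausible form, assuming subband energy is the sum of squared samples, is E_k = Σ_i x(k, i)² with the low-frequency proportion E_0 / Σ_k E_k; a sketch (function name is illustrative):

```python
import numpy as np

def low_freq_energy_ratio(subbands):
    """subbands[k][i] corresponds to x(k, i): the i-th sample of the k-th
    signal subband, with subbands[0] the lowest-frequency band. Returns the
    low band's share of the frame's total energy."""
    energies = [float(np.sum(np.asarray(band) ** 2)) for band in subbands]
    total = sum(energies)
    return energies[0] / total if total > 0 else 0.0
```

With two subbands, a ratio near 1 means the frame is dominated by low-frequency energy and the low band should be protected; a ratio near 0 points at the high band.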
  • Step 405 Based on the low-frequency energy proportion, determine the second sub-band coded data from the first sub-band coded data.
  • the audio sending end determines the audio frequency band where the energy is concentrated based on the proportion of low-frequency energy, and then determines the first sub-band coded data of the audio frequency band (signal sub-band) where the energy is concentrated as the second sub-band coded data.
  • step 405 may specifically include the following steps 405a to 405b (not shown in the figure):
  • Step 405a When the proportion of low-frequency energy is higher than the threshold, determine the first sub-band coded data of the low-frequency subband as the second sub-band coded data.
  • The audio sending end stores a threshold value. After calculating the low-frequency energy proportion, the audio sending end compares it with the threshold and classifies the input signal according to the comparison result, determining whether the input signal is a low-frequency signal or a high-frequency signal, and accordingly selecting the first subband coded data of either the low-frequency subband or the high-frequency subband as the second subband coded data.
  • Here, the low-frequency subband is the signal subband whose audio frequency is lower than that of the other signal subbands, and the high-frequency subband is the signal subband whose audio frequency is higher than that of the other signal subbands.
  • If the audio sending end determines that the low-frequency energy proportion is higher than the threshold, the input signal is a low-frequency signal, and the low-frequency signal subband is the key signal subband of the input signal, whose recovery should later be prioritized. The audio sending end then directly determines the first subband coded data of the low-frequency signal subband as the second subband coded data.
  • When the input signal is decomposed into more signal subbands, the audio sending end can determine the subband with concentrated energy by calculating the energy proportions of the multiple groups of signal subbands, and then determine the second subband coded data.
  • For example, the input signal is decomposed into two signal subbands, 0-8 kHz and 8-16 kHz, with a threshold of 50%. If the low-frequency energy proportion in 0-8 kHz is higher than 50%, the input signal is determined to be a low-frequency signal, the low-frequency subband is the key signal subband, and the audio sending end accordingly determines the first subband coded data of the 0-8 kHz band as the second subband coded data.
  • Step 405b When the proportion of low-frequency energy is lower than the threshold, determine the first subband coded data of the high-frequency subband as the second subband coded data. The audio frequency of the high-frequency subband is higher than that of the other signal subbands.
  • the audio sending end directly determines the first sub-band coded data of the high-frequency signal sub-band as the second sub-band coded data.
  • Step 406 Perform error correction coding on the second subband coded data to obtain redundant data.
  • For the specific implementation of step 406, reference can be made to the above-mentioned step 303, which will not be described again in this embodiment of the application.
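The patent does not fix a particular error correction code for step 406. As one minimal, hypothetical stand-in for "error correction coding on the second subband coded data to obtain redundant data", the sketch below derives an XOR parity packet over a group of second-sub-band payloads and uses it to rebuild a single lost payload:

```python
# Hedged sketch of one simple forward-error-correction scheme: an XOR
# parity packet over a group of equal-length payloads. Any real erasure
# code (e.g. Reed-Solomon) could take its place.

def make_parity(packets):
    """XOR all equal-length payloads into one redundant packet."""
    parity = bytearray(len(packets[0]))
    for pkt in packets:
        for i, b in enumerate(pkt):
            parity[i] ^= b
    return bytes(parity)

def recover_lost(packets_with_gap, parity):
    """Rebuild the single missing payload (marked None) from the parity."""
    missing = bytearray(parity)
    for pkt in packets_with_gap:
        if pkt is not None:
            for i, b in enumerate(pkt):
                missing[i] ^= b
    return bytes(missing)

group = [b'\x01\x02', b'\x0f\x10', b'\xaa\x55']
red = make_parity(group)
print(recover_lost([group[0], None, group[2]], red))  # b'\x0f\x10'
```

A single parity packet can repair one lost payload per group; heavier codes trade more redundant data for more repairable losses, which is the bandwidth trade-off the embodiment discusses.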
  • Step 407 Generate a signal type identifier based on the low-frequency energy proportion.
  • The signal type identifier is used to indicate whether the input signal is a voiced signal or a non-voiced signal, where the low-frequency energy proportion of the voiced signal is higher than the threshold.
  • For a voiced signal, the low-frequency signal plays the key role in speech intelligibility, so error correction coding needs to be performed on the first sub-band coded data of the low-frequency signal sub-band so that the low-frequency signal can be restored in the event of packet loss;
  • for a non-voiced signal, the low-frequency energy proportion is lower than the threshold and the high-frequency signal plays the key role in speech intelligibility, so error correction coding needs to be performed on the first sub-band coded data of the high-frequency signal sub-band so that, in the event of packet loss, recovery can focus on the high-frequency signal.
  • the terminal classifies the input signal, and the signal type includes a voiced signal and a non-voiced signal.
  • the voiced signal refers to the sound signal whose energy is concentrated in the low-frequency area
  • the unvoiced signal refers to the sound signal whose energy is concentrated in the high-frequency area.
  • the signal type identifiers corresponding to voiced signals and non-voiced signals are different.
  • If the audio sending end determines that the low-frequency energy proportion of the input signal is higher than the threshold, it determines that the input signal is a voiced signal and sets the signal type identifier to the voiced signal identifier; if the audio sending end determines that the low-frequency energy proportion of the input signal is lower than the threshold, it determines that the input signal is a non-voiced signal and sets the signal type identifier to the non-voiced signal identifier.
  • After calculating the low-frequency energy proportion, the audio sending end first classifies the input signal and generates a signal type identifier, and carries the signal type identifier of the input signal in the audio data packet, so that when the audio receiving end determines that packet loss has occurred, it can determine the second sub-band encoded data that needs to be repaired from the first sub-band encoded data based on the signal type identifier, and then perform data recovery on the second sub-band encoded data based on the redundant data.
  • If the signal type identifier is the voiced signal identifier, it means that in the case of packet loss, data recovery mainly needs to be performed on the low-frequency signal subband; correspondingly, the audio receiving end determines the first subband encoded data of the low-frequency signal subband (the second subband encoded data) from the first subband encoded data, and then performs data recovery on the second subband encoded data based on the redundant data.
  • If the signal type identifier is the non-voiced signal identifier, it means that in the case of packet loss, data recovery mainly needs to be performed on the high-frequency signal subband; correspondingly, the audio receiving end determines the first subband encoded data of the high-frequency signal subband (the second subband encoded data) from the first subband encoded data, and then performs data recovery on the second subband encoded data based on the redundant data.
  • Accordingly, when the signal type identifier is the voiced signal identifier, the audio sending end performs error correction coding on the first subband coded data of the low-frequency subband; when the signal type identifier is the non-voiced signal identifier, the audio sending end performs error correction coding on the first subband coded data of the high-frequency subband.
  • Step 408 Pack the first subband coded data, redundant data and signal type identifier to generate an audio data packet.
  • The audio sending end packages the signal type identifier, the first subband encoded data and the redundant data and sends the packet to the audio receiving end, so that in the event of packet loss the audio receiving end can determine the second subband coded data from the first subband coded data based on the signal type identifier, and perform data recovery and signal subband prediction.
  • If the signal type identifier is the voiced signal identifier, the audio receiving end determines the first subband encoded data of the low-frequency subband as the second subband encoded data; if the signal type identifier is the non-voiced signal identifier, it determines the first subband coded data of the high-frequency subband as the second subband coded data.
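A hypothetical wire layout for the packing of step 408 could place a sequence number, the signal type identifier, the first subband payloads and the redundant data as follows; the field layout and sizes are illustrative assumptions only, since the patent does not specify a packet format:

```python
# Hedged sketch of step 408's packing: big-endian header (sequence number,
# signal-type flag, sub-band count), length-prefixed sub-band payloads,
# then length-prefixed redundant data. Layout is an assumption.
import struct

VOICED, UNVOICED = 1, 0

def pack_audio(seq, sig_type, subband_payloads, redundant):
    header = struct.pack('>HBB', seq, sig_type, len(subband_payloads))
    body = b''.join(struct.pack('>H', len(p)) + p for p in subband_payloads)
    return header + body + struct.pack('>H', len(redundant)) + redundant

def unpack_audio(packet):
    seq, sig_type, n = struct.unpack_from('>HBB', packet, 0)
    off, payloads = 4, []
    for _ in range(n):
        (ln,) = struct.unpack_from('>H', packet, off); off += 2
        payloads.append(packet[off:off + ln]); off += ln
    (rl,) = struct.unpack_from('>H', packet, off); off += 2
    return seq, sig_type, payloads, packet[off:off + rl]

pkt = pack_audio(7, VOICED, [b'low', b'high'], b'fec')
print(unpack_audio(pkt))  # (7, 1, [b'low', b'high'], b'fec')
```

Carrying the signal type flag in the header is what lets the receiving end pick the sub-band to repair without re-deriving the energy distribution.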
  • Step 409 Send the audio data packet to the audio receiving end.
  • After the audio sending end generates the audio data packet based on the signal type identifier, the first subband encoded data and the redundant data, it sends the audio data packet to the audio receiving end; correspondingly, the audio receiving end is used to detect packet loss, determine the second subband coded data from the first subband coded data based on the signal type identifier, and perform data recovery based on the second subband coded data and the redundant data.
  • In this embodiment, the audio sending end determines the frequency band where the energy is concentrated by calculating the low-frequency energy proportion of the low-frequency subband, and then determines the second subband encoded data, so that error correction coding is performed on the signal subbands that actually matter; this avoids the situation in which error correction coding of a fixed frequency band makes it impossible to recover a continuous signal when packets are lost, and improves signal transmission quality while reducing the transmission bandwidth.
  • Moreover, based on the signal type identifier, the audio receiving end can determine the second sub-band encoded data that needs to be repaired from the first subband encoded data and then restore it based on the redundant data and the second sub-band encoded data; the audio receiving end therefore does not need to repeatedly determine the signal sub-band that needs to be repaired and can accurately locate the signal subband for data recovery, which improves the accuracy of data recovery in the case of packet loss.
  • the above embodiments illustrate the process of subband coding and error correction coding performed by the audio sending end.
  • After receiving the audio data packet, the audio receiving end first determines whether packet loss has occurred. In the event of packet loss, the audio receiving end needs to perform data recovery and subband prediction on the first subband encoded data based on the redundant data, thereby outputting a continuous sound signal.
  • FIG. 6 shows a flow chart of an audio transmission method provided by an exemplary embodiment of the present application. This embodiment takes the method executed by the audio receiving end as an example to illustrate. The method includes the following steps:
  • Step 601 Receive audio data packets.
  • the audio data packet contains redundant data and at least two sets of first sub-band coded data.
  • the redundant data is obtained by the audio sending end performing error correction coding on the second sub-band coded data in the first sub-band coded data.
  • The first subband coded data is obtained by the audio sending end performing subband decomposition and compression coding on the input signal; different first subband coded data correspond to first signal subbands of different audio frequency bands in the input signal, and the audio frequency band corresponding to the second subband coded data is the frequency band where the signal energy is concentrated.
  • After receiving the audio data packet, the audio receiving end parses the data, obtains the first subband encoded data and the redundant data contained in the audio data packet, and caches them.
  • Step 602 Perform packet loss detection on the first subband encoded data.
  • the audio sending end adds consecutive numbers to the first subband encoded data according to the timing of signal collection. After the audio receiving end parses the data, it detects whether the numbers corresponding to the first subband encoded data are consecutive. If the numbers are continuous, it is determined that the first subband encoded data is not lost; if the numbers are discontinuous, it is determined that packet loss occurs.
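The sequence-number check just described can be sketched as follows; the helper name is illustrative:

```python
# Hedged sketch of step 602's packet-loss check: the sender numbers frames
# consecutively, so any gap in the received sequence numbers marks the
# frames lost in transit.

def find_lost(seq_numbers):
    """Return the sequence numbers missing between consecutive arrivals."""
    lost = []
    for prev, cur in zip(seq_numbers, seq_numbers[1:]):
        lost.extend(range(prev + 1, cur))
    return lost

print(find_lost([10, 11, 14, 15]))  # [12, 13]
```

An empty result means the numbers are continuous and no recovery step is needed.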
  • Step 603 In the case of packet loss of the first sub-band encoded data, perform data recovery on the first sub-band encoded data based on redundant data to obtain an output signal.
  • When no packet loss occurs, the audio receiving end directly performs the subband decoding process. If packet loss is detected, the audio receiving end first needs to obtain the redundant data and the adjacent data packets from the data buffer for error correction decoding to obtain the subband coded data at the packet loss position, and then obtains a continuous output signal through subband decoding and subband prediction.
  • the audio receiving end receives an audio data packet containing redundant data and first subband coded data.
  • the redundant data is obtained by the audio sending end performing error correction coding on the data in the energy-concentrated frequency band.
  • This not only improves resistance to network packet loss, but also reduces the amount of redundant data and the storage resources the audio receiving end consumes to cache data, and, on the other hand, reduces the transmission bandwidth and operating costs.
  • The redundant data is not obtained by error correction coding of the complete input signal; only the low-frequency subband or the high-frequency subband of the input signal is error correction coded, according to the signal type of the input signal. Therefore, only the low-frequency or high-frequency subband of the input signal can be recovered based on the redundant data.
  • FIG. 7 shows a flow chart of an audio transmission method provided by another exemplary embodiment of the present application.
  • This embodiment takes the method executed by the audio receiving end as an example to illustrate. The method includes the following steps:
  • Step 701 Receive audio data packets.
  • Step 702 Perform packet loss detection on the first subband encoded data.
  • For the specific implementation of steps 701 to 702, reference may be made to the above-mentioned steps 601 to 602, which will not be described again in this embodiment of the application.
  • Step 703 Determine the second subband coded data from the first subband coded data based on the signal type identifier.
  • the audio data packet also contains a signal type identifier, which is used to indicate that the input signal corresponding to the first subband encoded data belongs to a voiced signal or a non-voiced signal.
  • the second sub-band coded data of the voiced signal is the first sub-band coded data of the low-frequency subband
  • the second sub-band coded data of the unvoiced signal is the first sub-band coded data of the high-frequency subband.
  • the audio frequency of the low-frequency subband is lower than the audio frequency of other first signal subbands
  • the audio frequency of the high-frequency subband is higher than the audio frequency of other first signal subbands.
  • the voiced signal refers to a sound signal (input signal) whose signal energy is concentrated in a low-frequency region
  • the unvoiced signal refers to a sound signal (input signal) whose signal energy is concentrated in a non-low-frequency region.
  • When there is packet loss, the audio receiving end needs to read the relevant redundant data and adjacent data packets from the data buffer for error correction decoding.
  • Since the redundant data is obtained by the audio sending end performing error correction coding on the second sub-band encoded data, the audio receiving end first determines the second sub-band coded data from the at least two sets of first sub-band coded data based on the signal type (voiced signal or non-voiced signal) indicated by the signal type identifier.
  • If the signal type identifier indicates that the signal type is a voiced signal, the first subband coded data of the low-frequency subband is determined as the second subband coded data; if the signal type identifier indicates that the signal type is a non-voiced signal, the first subband coded data of the high-frequency subband is determined as the second subband coded data.
  • Step 704 Perform error correction decoding on the second subband encoded data based on the redundant data and the first subband encoded data in the adjacent audio data packet.
  • The audio receiving end uses the corresponding error correction decoding algorithm to perform error correction decoding, obtaining the subband coded data and the signal type identifier at the packet loss position.
  • Step 705 Perform subband decoding on the second subband coded data after error correction decoding to obtain the second signal subband.
  • After the audio receiving end recovers the second subband encoded data at the packet loss position, it performs compression decoding on the complete second subband encoded data to obtain the second signal subband.
  • Step 706 Perform data recovery on other first subband encoded data based on the second signal subband.
  • the redundant data is obtained by the audio sending end performing error correction coding on the second sub-band encoded data, and the audio receiving end also performs packet loss data recovery on the second sub-band encoded data based on the redundant data.
  • Audio data is transmitted in the channel in the form of data packets; when a packet is lost, the coded data of every sub-band in that packet is lost. Therefore, the audio receiving end also needs to perform subband prediction for the other subbands based on the recovered second signal subband and the signal type identifier to obtain a complete sound signal.
  • step 706 specifically includes the following steps 706a to 706c (not shown in the figure):
  • Step 706a When the signal type identifier belongs to the voiced signal identifier, perform feature extraction on the second signal subband to obtain the first signal feature.
  • The first signal feature includes at least one of the logarithmic power spectrum, the pitch period and the cross-correlation value.
  • the corresponding audio receiving end also needs to predict the high-frequency sub-band signal through the decoded signal of the low-frequency sub-band.
  • The relevant features of the low-frequency subband, such as the logarithmic power spectrum, the pitch period and the cross-correlation value, are extracted as input to the deep learning network.
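A minimal sketch of two of these feature extractors, assuming a plain DFT for the log power spectrum and an autocorrelation search for the pitch period; the frame length and lag bounds are illustrative choices, not values from the patent:

```python
# Hedged sketch of step 706a's feature extraction: log power spectrum via
# a naive O(n^2) DFT (dependency-free) and pitch period via the lag of the
# largest autocorrelation peak.
import cmath, math

def log_power_spectrum(frame):
    n = len(frame)
    spec = []
    for k in range(n // 2 + 1):
        x = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                for t in range(n))
        spec.append(math.log10(abs(x) ** 2 + 1e-12))  # floor avoids log(0)
    return spec

def pitch_period(frame, lo=20, hi=150):
    """Lag (in samples) of the largest autocorrelation value in [lo, hi)."""
    def r(lag):
        return sum(frame[t] * frame[t + lag] for t in range(len(frame) - lag))
    return max(range(lo, hi), key=r)

frame = [math.sin(2 * math.pi * t / 100) for t in range(600)]
print(pitch_period(frame))  # close to 100 samples
```

In the scheme above these features would feed the first deep learning network; the cross-correlation value mentioned in the text would be computed analogously between sub-band frames.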
  • Step 706b Input the first signal feature into the first deep learning network to obtain the high-frequency subband power spectrum output by the first deep learning network.
  • the first deep learning network is trained based on the signal characteristics of the sample low-frequency signal and the power spectrum of the sample high-frequency signal.
  • the sample low-frequency signal and the sample high-frequency signal belong to different signal subbands of the same sound signal.
  • the computer device performs sub-band decomposition on the sample sound signal to obtain a sample low-frequency signal and a sample high-frequency signal.
  • the computer device inputs the signal characteristics of the sample low-frequency signal into the first deep learning network to obtain the high-frequency subband power spectrum predicted by the first deep learning network.
  • the computer device performs backpropagation training on the first deep learning network based on the power spectrum of the sample high-frequency signal and the prediction results of the first deep learning network.
  • the first deep learning network can be a combination of multi-layer convolutional neural networks (Convolutional Neural Networks, CNN) and multi-layer long short-term memory networks (Long Short-Term Memory, LSTM).
  • Step 706c Perform inverse Fourier transform based on the high-frequency sub-band power spectrum and random phase values to obtain the high-frequency sub-band signal.
  • the high-frequency power spectrum value predicted by the first deep learning network is combined with the random phase value and undergoes inverse Fourier transformation to obtain the time-domain high-frequency sub-band signal.
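The random-phase reconstruction of step 706c can be sketched as below; a naive O(n^2) inverse transform keeps the example dependency-free, and the seeded random generator is only for reproducibility. Names and sizes are illustrative assumptions:

```python
# Hedged sketch of step 706c: combine a predicted power spectrum with
# random phase values and apply an inverse DFT to obtain a time-domain
# sub-band signal.
import cmath, math, random

def power_to_signal(power_spectrum, n, rng=random.Random(0)):
    """power_spectrum: bins 0..n//2 of an n-point real signal."""
    # Build a conjugate-symmetric spectrum so the output is real-valued.
    full = [0j] * n
    for k, p in enumerate(power_spectrum):
        mag = math.sqrt(p)
        phase = 0.0 if k in (0, n // 2) else rng.uniform(0, 2 * math.pi)
        full[k] = mag * cmath.exp(1j * phase)
        if 0 < k < n - k:
            full[n - k] = full[k].conjugate()
    # Naive inverse DFT, keeping the (numerically real) time-domain values.
    return [sum(full[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n for t in range(n)]

sig = power_to_signal([0.0, 4.0, 0.0, 0.0, 0.0], 8)
print(len(sig))  # 8
```

Because phases are random, the waveform differs from the original, but its power spectrum (and hence, by Parseval's theorem, its energy) matches the prediction, which is what matters perceptually for the concealed sub-band.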
  • The audio receiving end combines the second signal subband recovered based on the redundant data with the high-frequency subband signal predicted from the second signal subband, thereby obtaining a complete, data-recovered output signal.
  • When the received audio frame belongs to a non-voiced frame, step 706 specifically includes the following steps 706d to 706f (not shown in the figure):
  • Step 706d When the signal type identifier belongs to a non-voiced signal identifier, perform feature extraction on the second signal subband to obtain a second signal feature, where the second signal feature includes a logarithmic power spectrum.
  • For non-voiced frames, since the redundant data is obtained by error correction coding of the first sub-band coded data of the high-frequency sub-band, only the high-frequency second signal sub-band can be obtained after error correction decoding; in order to recover the complete input signal, the corresponding audio receiving end also needs to predict the low-frequency subband from the decoded signal of the high-frequency subband (the second signal subband).
  • the relevant features of the high-frequency signal are extracted as input to the deep learning network, such as the logarithmic power spectrum.
  • Step 706e Input the second signal feature into the second deep learning network to obtain the low-frequency subband power spectrum output by the second deep learning network.
  • the second deep learning network is trained based on the signal characteristics of the sample high-frequency signal and the power spectrum of the sample low-frequency signal.
  • the sample low-frequency signal and the sample high-frequency signal belong to different signal subbands of the same sound signal.
  • the computer device performs sub-band decomposition on the sample sound signal to obtain a sample low-frequency signal and a sample high-frequency signal.
  • the computer device inputs the signal characteristics of the sample high-frequency signal into the second deep learning network to obtain the low-frequency subband power spectrum predicted by the second deep learning network.
  • the computer device performs backpropagation training on the second deep learning network based on the power spectrum of the sample low-frequency signal and the prediction results of the second deep learning network.
  • the second deep learning network can be a combination of multi-layer CNN and multi-layer LSTM.
  • Step 706f Perform inverse Fourier transform based on the low-frequency sub-band power spectrum and random phase values to obtain the low-frequency sub-band signal.
  • the low-frequency signal power spectrum value predicted by the second deep learning network is combined with the random phase value, and then through the inverse Fourier transform, the time domain low-frequency sub-band signal can be obtained.
  • The audio receiving end combines the second signal subband recovered based on the redundant data with the low-frequency subband signal predicted from the second signal subband, thereby obtaining a complete, data-recovered output signal.
  • Step 707 Perform subband synthesis based on each second signal subband to obtain an output signal.
  • Through the above steps, the complete subband signals of all subbands are obtained. Subsequently, through sub-band synthesis, such as QMF sub-band synthesis, the multiple sets of sub-band signals are synthesized into a complete output signal.
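As a toy illustration of two-band analysis and synthesis, the sketch below uses the shortest perfect-reconstruction filter pair (the Haar pair); production QMF banks use much longer filters, so this only demonstrates the split/recombine structure, not a deployable filter design:

```python
# Hedged sketch of two-band sub-band analysis/synthesis with the Haar
# filter pair, the shortest perfect-reconstruction example.
import math

R = 1 / math.sqrt(2)

def analyze(x):
    """Split an even-length signal into low and high half-rate sub-bands."""
    low = [(x[2 * i] + x[2 * i + 1]) * R for i in range(len(x) // 2)]
    high = [(x[2 * i] - x[2 * i + 1]) * R for i in range(len(x) // 2)]
    return low, high

def synthesize(low, high):
    """Recombine the two sub-bands into the full-rate output signal."""
    out = []
    for s, d in zip(low, high):
        out.append((s + d) * R)
        out.append((s - d) * R)
    return out

x = [0.1, 0.4, -0.2, 0.3, 0.0, 0.5, -0.1, 0.2]
lo, hi = analyze(x)
y = synthesize(lo, hi)
print(all(abs(a - b) < 1e-12 for a, b in zip(x, y)))  # True
```

The same structure generalizes to more bands by cascading the split, which matches the "at least two signal sub-bands" described throughout the embodiments.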
  • Steps 703 to 707 describe the process in which, when sub-band encoded data packets are lost, the audio receiving end performs error correction decoding and sub-band prediction to obtain a complete sound signal.
  • In a possible implementation, when no packet loss is detected, the following steps (not shown in the figure) are performed after step 702:
  • sub-band decoding and sub-band synthesis are performed on the first sub-band coded data to obtain an output signal.
  • That is, when no packet loss occurs, the audio receiving end can directly perform compression decoding on each group of subband coded data to obtain the second signal subbands, and then obtains the output signal through inverse Fourier transform, sub-band synthesis and other processes.
  • In the case of packet loss, the audio receiving end can recover the key signal subbands based on the redundant data and then predict the other signal subbands from the key signal subbands, ensuring the transmission accuracy of the signal portions relevant to the clarity and intelligibility of the input signal and further improving the anti-packet-loss capability of the audio transmission network.
  • In summary, the audio sending end encodes the input signal as follows: it first performs sub-band decomposition and sub-band coding on the input signal and at the same time determines the type of the input signal, the signal types including voiced signals and non-voiced signals; for a voiced signal, the audio sending end performs error correction coding on the low-frequency sub-band coded stream together with the signal type identifier; for a non-voiced signal, the high-frequency sub-band coded stream and the signal type identifier are extracted for error correction coding; finally, the sub-band coded data and the error-correction-coded redundant data are packaged.
  • The audio receiving end decodes the received signal as follows: it first receives the data and caches it, then detects whether there is packet loss. If no packet loss occurs, the sub-band decoding process is performed, and the decoded sub-band signals are synthesized into the complete output signal through sub-band synthesis. If packet loss occurs, the relevant redundant data and adjacent data packets are obtained from the data cache for error correction decoding; after error correction decoding, the sub-band coded data and the signal type identifier at the packet loss position are obtained. Based on the sub-band code stream obtained by error correction decoding, the remaining sub-bands are predicted and restored to obtain all sub-band signals, and the complete output signal is then obtained through sub-band synthesis.
  • FIG. 9 is a structural block diagram of an audio transmission device provided by an exemplary embodiment of the present application.
  • the device includes the following structure:
  • the subband coding module 901 is used to perform subband decomposition and compression coding on the input signal to obtain the first subband coded data of at least two sets of signal subbands. Different signal subbands correspond to different audio frequency bands of the input signal;
  • Determining module 902 is configured to determine second sub-band encoded data from the first sub-band encoded data based on the energy distribution of the input signal, the audio frequency band of the signal sub-band corresponding to the second sub-band encoded data being the frequency band where the signal energy is concentrated;
  • the error correction coding module 903 is used to perform error correction coding on the second subband coded data to obtain redundant data;
  • the data sending module 904 is used to send audio data packets to the audio receiving end.
  • the audio data packets contain the first subband encoding data and the redundant data.
  • the audio receiving end is used to perform data recovery on the first subband encoded data based on the redundant data in the event of packet loss.
  • the determination module 902 is also used to:
  • determine the second sub-band coded data from the first sub-band coded data.
  • the determination module 902 is also used to:
  • when the low-frequency energy proportion is lower than the threshold, the first sub-band encoded data of the high-frequency sub-band is determined as the second sub-band encoded data, the audio frequency of the high-frequency sub-band being higher than that of the other signal subbands.
  • the device also includes:
  • An identification generation module, configured to generate a signal type identifier based on the low-frequency energy proportion, the signal type identifier being used to indicate whether the input signal is a voiced signal or a non-voiced signal, wherein the low-frequency energy proportion of the voiced signal is higher than the threshold and the low-frequency energy proportion of the non-voiced signal is lower than the threshold;
  • the data sending module 904 is also used to:
  • the subband encoding module 901 is also used to:
  • Subband decomposition and compression coding are performed on the frequency domain signal to generate the first subband coded data of at least two sets of signal subbands.
  • the subband encoding module 901 is also used to:
  • the frequency domain signal is decomposed into sub-bands through at least two band-pass filters to obtain at least two signal sub-bands, where different band-pass filters correspond to different audio frequency bands and the audio frequency bands of the band-pass filters are continuous;
  • Frequency shifting and quantization coding are performed on the signal subbands to obtain the first subband coded data of each group of the signal subbands.
  • FIG. 10 is a structural block diagram of an audio transmission device provided by another exemplary embodiment of the present application.
  • the device includes the following structure:
  • Data receiving module 1001 used to receive audio data packets.
  • the audio data packets contain redundant data and at least two sets of first sub-band encoding data.
  • the redundant data is obtained by the audio sending end performing error correction coding on the second sub-band coded data in the first sub-band coded data;
  • the first sub-band coded data is obtained by the audio sending end performing sub-band decomposition and compression coding on the input signal;
  • different first sub-band coded data correspond to first signal sub-bands of different audio frequency bands in the input signal, and the audio frequency band of the second sub-band encoded data is the frequency band where the signal energy is concentrated;
  • Packet loss detection module 1002 used to perform packet loss detection on the first subband encoded data
  • the decoding module 1003 is configured to perform data recovery on the first sub-band encoded data based on the redundant data to obtain an output signal when the first sub-band encoded data loses packets.
  • In an embodiment, the audio data packet also contains a signal type identifier used to indicate whether the input signal is a voiced signal or a non-voiced signal, wherein the second subband coded data of the voiced signal is the first sub-band coded data of the low-frequency subband, and the second sub-band coded data of the unvoiced signal is the first sub-band coded data of the high-frequency subband;
  • the audio frequency of the low-frequency subband is lower than the audio frequency of the other first signal sub-bands, and the audio frequency of the high-frequency sub-band is higher than the audio frequency of the other first signal sub-bands;
  • the decoding module 1003 is also used to:
  • Subband synthesis is performed based on each of the second signal subbands to obtain the output signal.
  • the decoding module 1003 is also used to:
  • when the signal type identifier is the voiced signal identifier, feature extraction is performed on the second signal subband to obtain a first signal feature, the first signal feature including at least one of the logarithmic power spectrum, the pitch period and the cross-correlation value;
  • the first deep learning network is trained based on the signal characteristics of the sample low-frequency signal and the power spectrum of the sample high-frequency signal, the sample low-frequency signal and the sample high-frequency signal belonging to different signal subbands of the same sound signal;
  • An inverse Fourier transform is performed based on the high-frequency sub-band power spectrum and random phase values to obtain a high-frequency sub-band signal.
  • the decoding module 1003 is also used to:
  • the signal type identifier belongs to a non-voiced signal identifier
  • the second deep learning network is trained based on the signal characteristics of the sample high-frequency signal and the power spectrum of the sample low-frequency signal, the sample low-frequency signal and the sample high-frequency signal belonging to different signal subbands of the same sound signal;
  • An inverse Fourier transform is performed based on the low-frequency sub-band power spectrum and random phase values to obtain a low-frequency sub-band signal.
  • the decoding module 1003 is also used to:
  • sub-band decoding and sub-band synthesis are performed on the first sub-band coded data to obtain the output signal.
  • In summary, at least two sets of sub-band coded data are obtained by performing frequency band decomposition and compression coding on the input signal, and error correction coding is performed on the portion of the sub-band coded data in which the signal energy is concentrated, ensuring the audio receiving end's ability to recover the primary audio data.
  • The terminal 1100 may be a portable mobile terminal, such as a smart phone, a tablet computer, a Moving Picture Experts Group Audio Layer III (MP3) player, or a Moving Picture Experts Group Audio Layer IV (MP4) player.
  • the terminal 1100 may also be called user equipment, portable terminal, or other names.
  • the terminal 1100 includes: a processor 1101 and a memory 1102.
  • The processor 1101 may include one or more processing cores, such as a 4-core processor or an 8-core processor. The processor 1101 can be implemented in at least one hardware form among a digital signal processor (DSP), a field-programmable gate array (FPGA), and a programmable logic array (PLA).
  • the processor 1101 may also include a main processor and a co-processor.
  • The main processor is a processor used to process data in the awake state, also called the central processing unit (CPU); the co-processor is a low-power processor used to process data in standby mode.
  • the processor 1101 may be integrated with a graphics processor (Graphics Processing Unit, GPU), and the GPU is responsible for rendering and drawing content that needs to be displayed on the display screen.
  • the processor 1101 may also include an artificial intelligence (Artificial Intelligence, AI) processor, which is used to process computing operations related to machine learning.
  • Memory 1102 may include one or more computer-readable storage media, which may be tangible and non-transitory. Memory 1102 may also include high-speed random access memory and non-volatile memory, such as one or more disk storage devices or flash memory storage devices. In some embodiments, the non-transitory computer-readable storage medium in the memory 1102 is used to store at least one instruction, and the at least one instruction is executed by the processor 1101 to implement the method provided by the embodiments of the present application.
  • the terminal 1100 optionally further includes: a peripheral device interface 1103.
  • the peripheral device interface 1103 may be used to connect at least one input/output (I/O) related peripheral device to the processor 1101 and the memory 1102.
  • the processor 1101, the memory 1102, and the peripheral device interface 1103 may be integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 1101, the memory 1102, and the peripheral device interface 1103 may be implemented on a separate chip or circuit board, which is not limited in this embodiment.
  • Embodiments of the present application also provide a computer-readable storage medium that stores at least one instruction.
  • the at least one instruction is loaded and executed by a processor to implement the audio transmission method described in each of the above embodiments.
  • a computer program product including computer instructions stored in a computer-readable storage medium.
  • the processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the computer device performs the audio transmission method provided in various optional implementations of the above aspect.
  • the information involved in this application (including but not limited to user equipment information and user personal information), the data involved (including but not limited to data used for analysis, stored data, and displayed data), and the signals involved are all authorized by the user or fully authorized by all parties, and the collection, use, and processing of the relevant data comply with the relevant laws, regulations, and standards of the relevant countries and regions.
  • the input signals, audio data, etc. involved in this application were obtained with full authorization.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

An audio transmission method and apparatus, a terminal (1100), a storage medium, and a program product, belonging to the field of multimedia transmission technology. The method comprises: performing sub-band decomposition and compression coding on an input signal to obtain first sub-band coded data of at least two groups of signal sub-bands (301); determining second sub-band coded data from the first sub-band coded data on the basis of the energy distribution of the input signal (302); performing error correction coding on the second sub-band coded data to obtain redundant data (303); and sending an audio data packet to an audio receiver (120), the audio data packet containing the first sub-band coded data and the redundant data (304). The audio transmission method can improve audio transmission quality while reducing the amount of redundant data during audio transmission.

Description

音频传输方法、装置、终端、存储介质及程序产品
本申请要求于2022年04月18日提交的申请号为202210405956.4、发明名称为“音频传输方法、装置、终端、存储介质及程序产品”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请实施例涉及多媒体传输技术领域,特别涉及一种音频传输方法、装置、终端、存储介质及程序产品。
背景技术
语音编解码在现代通讯系统中占有重要地位。在音视频通话场景中,信号发送端通过编码器对声音信号进行压缩和打包,而后按照网络传输格式和协议将数据发送至接收端。接收端对数据包进行解包和解码得到声音信号。
相关技术中,为了解决传输过程中的丢包问题,发送端通常采用前向纠错(ForwardErrorCorrection,FEC)技术进行信道编码,生成冗余数据包。接收端在确定存在丢包情况时,能够基于冗余数据包进行数据恢复,得到完整的多媒体数据。
然而,FEC冗余数据包会消耗额外的传输带宽,并且传输系统的抗丢包能力与编码冗余度正相关。为了保证通信质量,需要提高FEC编码冗余度,从而导致传输带宽和运行成本大幅度增加。
发明内容
本申请实施例提供了一种音频传输方法、装置、终端、存储介质及程序产品。所述技术方案如下:
一方面,本申请提供了一种音频传输方法,所述方法由音频发送端执行,所述方法包括:
对输入信号进行子带分解和压缩编码,得到至少两组信号子带的第一子带编码数据,不同信号子带对应所述输入信号的不同音频频段;
基于所述输入信号的能量分布情况,从所述第一子带编码数据中确定第二子带编码数据,所述第二子带编码数据对应信号子带的音频频段为信号能量集中频段;
对所述第二子带编码数据进行纠错编码,得到冗余数据;
向音频接收端发送音频数据包,所述音频数据包中包含所述第一子带编码数据和所述冗余数据,所述音频接收端用于在丢包的情况下基于所述冗余数据对所述第一子带编码数据进行数据恢复。
另一方面,本申请提供了一种音频传输方法,所述方法由音频接收端执行,所述方法包括:
接收音频数据包,所述音频数据包中包含冗余数据以及至少两组第一子带编码数据,所述冗余数据由音频发送端对所述第一子带编码数据中的第二子带编码数据进行纠错编码得到,所述第一子带编码数据由所述音频发送端对输入信号进行子带分解和压缩编码得到,不同子带编码数据对应所述输入信号中不同音频频段的第一信号子带,所述第二子带编码数据的音频频段为信号能量集中频段;
对所述第一子带编码数据进行丢包检测;
在所述第一子带编码数据丢包的情况下,基于所述冗余数据对所述第一子带编码数据进行数据恢复,得到输出信号。
另一方面,本申请提供了一种音频传输装置,所述装置包括:
子带编码模块,用于对输入信号进行子带分解和压缩编码,得到至少两组信号子带的第一子带编码数据,不同信号子带对应所述输入信号的不同音频频段;
确定模块,用于基于所述输入信号的能量分布情况,从所述第一子带编码数据中确定第二子带编码数据,所述第二子带编码数据对应信号子带的音频频段为信号能量集中频段;
纠错编码模块,用于对所述第二子带编码数据进行纠错编码,得到冗余数据;
数据发送模块,用于向音频接收端发送音频数据包,所述音频数据包中包含所述第一子带编码数据和所述冗余数据,所述音频接收端用于在丢包的情况下基于所述冗余数据对所述第一子带编码数据进行数据恢复。
另一方面,本申请提供了一种音频传输装置,所述装置包括:
数据接收模块,用于接收音频数据包,所述音频数据包中包含冗余数据以及至少两组第一子带编码数据,所述冗余数据由音频发送端对所述第一子带编码数据中的第二子带编码数据进行纠错编码得到,所述第一子带编码数据由所述音频发送端对输入信号进行子带分解和压缩编码得到,不同第一子带编码数据对应所述输入信号中不同音频频段的第一信号子带,所述第二子带编码数据的音频频段为信号能量集中频段;
丢包检测模块,用于对所述第一子带编码数据进行丢包检测;
解码模块,用于在所述第一子带编码数据丢包的情况下,基于所述冗余数据对所述第一子带编码数据进行数据恢复,得到输出信号。
另一方面,本申请提供了一种终端,所述终端包括处理器和存储器;所述存储器中存储有至少一段程序,所述至少一段程序由所述处理器加载并执行以实现如上述方面所述的音频传输方法。
另一方面,本申请提供了一种计算机可读存储介质,所述计算机可读存储介质中存储有至少一条计算机程序,所述计算机程序由处理器加载并执行以实现如上述方面所述的音频传输方法。
根据本申请的一个方面,提供了一种计算机程序产品,该计算机程序产品包括计算机指令,该计算机指令存储在计算机可读存储介质中。终端的处理器从计算机可读存储介质读取该计算机指令,处理器执行该计算机指令,使得该终端执行上述方面的各种可选实现方式中提供的音频传输方法。
本申请实施例提供的技术方案至少包括以下有益效果:
本申请实施例中,通过对输入信号进行分频段分解和压缩编码,得到至少两组第一子带编码数据,针对其中信号能量集中的部分子带编码数据进行纠错编码,确保音频接收端对主要音频数据的恢复能力。相比于直接对完整的输入信号进行纠错编码的方案,在提升音频传输质量的同时,能够降低冗余数据的数据量,从而降低纠错编码对传输带宽和运行成本的消耗。
附图说明
图1是相关技术方案的音频传输流程图;
图2是本申请一个示例性实施例提供的实施环境的示意图;
图3是本申请一个示例性实施例提供的音频传输方法的流程图;
图4是本申请另一个示例性实施例提供的音频传输方法的流程图;
图5是本申请一个示例性实施例提供的子带编码模型的框架图;
图6是本申请另一个示例性实施例提供的音频传输方法的流程图;
图7是本申请另一个示例性实施例提供的音频传输方法的流程图;
图8是本申请一个示例性实施例提供的音频编解码系统的框架图;
图9是本申请一个示例性实施例提供的音频传输装置的结构框图;
图10是本申请另一个示例性实施例提供的音频传输装置的结构框图;
图11是本申请一个示例性实施例提供的终端的结构框图。
具体实施方式
语音编解码在现代通讯系统中占有重要的地位。如图1所示,在语音通话场景中,声音信号经由麦克风采集得到,终端(发送端)通过模数转换电路将模拟的声音信号转换为数字声音信号。数字声音信号经过语音编码器进行压缩编码,而后按照通信网络传输格式和协议打包发送到接收端,接收端设备接收到数据包后解包输出语音编码压缩码流,通过语音解码器进行压缩解码后重新生成语音数字信号。最后语音数字信号通过扬声器播放声音信号。语音编解码有效地降低语音信号传输的带宽,对于节省语音信息存储传输成本,保障通信网络传输过程中的语音信息完整性方面起了决定性作用。
在实际应用中，传输网络的不稳定性会导致传输过程出现丢包现象，造成接收端声音的卡顿和不连贯，使收听者体验较差。为抵抗网络丢包采取了多种方法，包括：前向纠错、丢包隐藏、自动重传请求等，其中前向纠错抗丢包方案能够完美恢复丢包位置的信息。经过前向纠错编码后的数据打包发送到接收端，接收端接收到前向纠错码后进行解码从而能恢复出丢包位置的完整数据，实现完美恢复的效果。前向纠错需要额外消耗带宽，且前向纠错的冗余度越高，抗丢包能力越强，但同时也带来带宽的增加。因此如何有效控制前向纠错冗余度，减少带宽消耗的同时达到端到端较佳的音频传输效果，是值得研究的课题。
本申请提出了一种音频传输方法,请参考图2,其示出了本申请一个示例性实施例提供的实施环境的示意图。该实施环境中包括:音频发送端110和音频接收端120。
音频发送端110结合子带编解码方法,对输入信号进行子带分解和压缩编码并进行信号分类,根据信号分类结果,对能量集中的部分子带编码数据进行纠错编码,生成冗余数据。音频发送端110向音频接收端120发送各组子带编码数据以及冗余数据。音频接收端120接收并解析数据,检测子带编码数据是否丢包。在丢包的情况下,音频接收端120可以基于冗余数据恢复出能量集中频段的信号子带,进而通过子带预测得到完整的输出信号。通过结合子带编码以及纠错编码,传输部分子带的纠错编码,相比于相关技术中的纠错编码方案,能够有效降低纠错编码的比特消耗,从而降低传输带宽和运行成本。
值得一提的是,图中所示的音频发送端110也可以作为接收端接收音频数据,音频接收端120也可以作为发送端发送音频数据。并且,图中仅示出了两个终端接入传输网络,实际应用场景(比如多人通话场景或在线会议场景等)中终端的数量可以更多。本申请实施例对终端的数量和设备类型不加以限定。
请参考图3,其示出了本申请一个示例性实施例提供的音频传输方法的流程图。本实施例以该方法由音频发送端执行为例进行说明,该方法包括如下步骤:
步骤301,对输入信号进行子带分解和压缩编码,得到至少两组信号子带的第一子带编码数据,不同信号子带对应输入信号的不同音频频段。
输入信号为终端通过麦克风等装置采集到的声音信号。在一种可能的实施方式中,音频发送端将输入信号从时域转换至频域,在频域上对输入信号进行子带分解,得到不同音频频段的信号子带,并分别对各组信号子带的输入信号进行压缩编码,得到各组信号子带的第一子带编码数据。因此不同信号子带对应输入信号的不同音频频段。
可选的，音频发送端通过对输入信号进行一次子带分解和压缩编码得到各信号子带的第一子带编码数据，或者音频发送端对输入信号进行多次子带分解（例如先通过一次子带分解得到两组信号子带，然后继续对部分或全部信号子带进行再分解），然后进行压缩编码。若单次子带分解，则可以得到两组信号子带，对应压缩编码后得到两组第一子带编码数据；若经过多次子带分解，则可以得到三组及三组以上信号子带，对应压缩编码后得到三组及三组以上第一子带编码数据。
示意性的,在语音通话场景中,人说话声的频率通常分布在500Hz至4KHz的范围内,因此对于16KHz音频文件的传输,音频发送端首先对输入信号进行子带分解和压缩编码,得到0-8KHz以及8KHz-16KHz两个音频频段的第一子带编码数据。
可选的，音频发送端可以使用多个带通滤波器（Band-Pass Filter，BPF）将输入信号分成若干连续的音频频段，每个音频频段的输入信号称为信号子带，进而对每个信号子带分别进行压缩编码，从而得到输入信号的多组第一子带编码数据。
步骤302,基于输入信号的能量分布情况,从第一子带编码数据中确定第二子带编码数据,第二子带编码数据对应信号子带的音频频段为信号能量集中频段。
不同于相关技术中直接对全部原始数据包进行纠错编码,并将编码后数据发送给音频接收端,虽然抗丢包能力较强,但是相对会带来较多额外带宽的消耗;为了减少音频传输过程中的带宽消耗,本实施例中,通过仅提取输入信号中部分关键信号子带的子带编码数据进行纠错编码,以减少传输冗余数据(纠错编码后得到的编码数据)所需消耗的带宽。则为了可以使得后续可以对部分第一子带编码数据进行纠错编码,首先音频发送端需要从第一子带编码数据中确定关键信号子带的第二子带编码数据。
可选的，由于关键信号子带往往是输入信号中的信号能量集中频段，比如，若输入信号的绝大部分能量集中在低频，则输入信号中的低频子带即输入信号中的关键信号子带；若输入信号的绝大部分能量集中在高频，则输入信号中的高频子带即关键信号子带。对应音频发送端可以通过计算输入信号的能量分布情况，从输入信号中确定出信号能量集中的音频频段，将该音频频段对应的第一子带编码数据确定为第二子带编码数据，即关键子带编码数据。
例如,对于步骤301中的输入信号,若音频发送端确定信号能量集中在0-8KHz这一音频频段,则将该音频频段的第一子带编码数据确定为第二子带编码数据。
可选的,第二子带编码数据为能量占比最高的一组第一子带编码数据,或者,在频带划分较为精细的情况下,第二子带编码数据包含能量占比最高的多组第一子带编码数据。本申请实施例对此不作限定。
步骤303,对第二子带编码数据进行纠错编码,得到冗余数据。
在实际音频传输场景中,由于传输网络的不稳定性、设备硬件的故障等原因,导致音频数据传输过程出现丢包现象,从而造成音频接收端所播放声音的卡顿和不连贯,会使收听者体验较差。传输系统通常采用纠错编码的方式抵抗网络丢包。纠错编码又称为信道编码,主要包括丢包隐藏(Packet Loss Concealment,PLC)、自动重传请求(Automatic Repeat-reQuest,ARQ)、前向纠错(Forward Error Correction,FEC)编码、混合纠错编码、比特交织以及BCH纠错编码等技术。其中,前向纠错编码又可以通过里德-所罗门码(Reed-Solomoncode,RScode),汉明码(HammingCode)或低密度奇偶校验码(Low Density Parity Check Code,LDPC)等多种算法实现。
音频发送端对第二子带编码数据进行纠错编码得到冗余数据,而对于其它第一子带编码数据则不进行纠错编码。以此确保音频接收端能够在丢包的情况下基于冗余数据首先恢复出重要音频频段(关键音频频段)的声音信号。同时又能够降低冗余数据对传输带宽的损耗。
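The error-correction idea above can be illustrated with the simplest possible forward-error-correction scheme: a single XOR parity packet generated over a group of sub-band packets. This is only a hedged sketch — the text names RS, Hamming, and LDPC codes but does not fix a particular algorithm, and the function names here are invented for illustration:

```python
def make_parity(packets):
    """Build one redundancy packet as the byte-wise XOR of equal-length packets."""
    parity = bytearray(len(packets[0]))
    for pkt in packets:
        for i, b in enumerate(pkt):
            parity[i] ^= b
    return bytes(parity)

def recover_lost(received, parity):
    """Recover the single lost packet (marked as None in `received`)
    by XOR-ing all surviving packets with the parity packet."""
    rec = bytearray(parity)
    for pkt in received:
        if pkt is not None:
            for i, b in enumerate(pkt):
                rec[i] ^= b
    return bytes(rec)
```

A single parity packet can repair at most one loss per group; stronger codes buy more protection at the cost of more redundancy, which is precisely the bandwidth trade-off the text aims to control by protecting only the energy-concentrated sub-band.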
步骤304,向音频接收端发送音频数据包,音频数据包中包含第一子带编码数据和冗余数据,音频接收端用于在丢包的情况下基于冗余数据对所述第一子带编码数据进行数据恢复。
可选的,音频发送端将输入信号对应的各组第一子带编码数据以及冗余数据打包发送至音频接收端,使音频接收端基于第一子带编码数据和冗余数据进行解码,最终输出声音信号。
综上所述,本申请实施例中,通过对输入信号进行分频段分解和压缩编码,得到至少两组第一子带编码数据,针对其中信号能量集中的部分第二子带编码数据进行纠错编码,确保音频接收端对主要音频数据的恢复能力。相比于直接对完整的输入信号进行纠错编码的方案,在提升音频传输质量的同时,能够降低冗余数据的数据量,从而降低纠错编码对传输带宽和 运行成本的消耗。
在一种可能的实施方式中,开发人员可以基于实际应用场景,设置固定的需要进行纠错编码的音频频段,例如对于语音通话场景,由于人声通常为低频信号,因此设置音频发送端将低频子带的第一子带编码数据作为第二子带编码数据,也即仅对低频子带的第一子带编码数据进行纠错编码。而为了提高音频编码以及传输质量,音频发送端还可以通过计算能量占比从第一子带编码数据中确定第二子带编码数据。
请参考图4,其示出了本申请另一个示例性实施例提供的音频传输方法的流程图。本实施例以该方法由音频发送端执行为例进行说明,该方法包括如下步骤:
步骤401,对麦克风采集到的模拟声音信号进行模数转换,生成数字声音信号。
在语音通话场景中,声音信号经由麦克风采集得到,此时音频发送端采集到的声音信号为模拟信号。音频发送端通过模数转换电路将模拟的声音信号转换为数字声音信号,以便进行后续的压缩编码、纠错编码以及音频传输。
步骤402,对数字声音信号进行傅里叶变换,得到频域信号。
子带编码技术是将原始信号由时间域转变为频率域,然后将其分割为若干个子频带,并分别对各个子频带的信号进行数字编码的技术。由于音频发送端需要对输入信号进行子带分解,因此首先将时域的信号转换为频域的信号。音频发送端通过对数字声音信号进行傅里叶变换,得到频域声音信号。
步骤403,对频域信号进行子带分解和压缩编码,生成至少两组信号子带的第一子带编码数据。
音频发送端通过将输入信号分解成不同频带的分量以去除信号相关性,再将每组分量分别进行取样、量化、编码,从而得到多组互不相关的码字。在一种可能的实施方式中,步骤403具体实施方式可以包括如下步骤403a至步骤403b(图中未示出):
步骤403a,通过至少两个带通滤波器对频域信号进行子带分解,得到至少两个信号子带,不同带通滤波器对应不同音频频段,且各个带通滤波器的音频频段连续。
如图5所示,语音子带编码的基本思想是由音频发送端先通过一组带通滤波器将输入信号分解成若干个在不同音频频段上的信号子带,然后将这些信号子带经过频率搬移转变成基带信号,再分别对各个基带信号进行取样。取样后的信号经过量化、编码,合并成一个总的码流传输给接收端。子带编码可以根据人耳的听觉特性,合理分配各信号子带的比特数,以得到更好的听觉效果,同时还能够节省存储资源,降低传输带宽。
在一种可能的实施方式中,本申请实施例中的音频发送端基于上述基本思想对输入信号进行子带分解和压缩编码处理,得到各个信号子带的第一子带编码数据。音频发送端首先通过一组带通滤波器,例如正交镜像滤波器组(Quadrature Mirror Filter,QMF),将一帧输入信号的频带分成若干个连续的音频频段,每个音频频段称为信号子带。
步骤403b,对信号子带进行频率搬移以及量化编码,得到各组信号子带的第一子带编码数据。
音频发送端将各信号子带进行频率搬移,移至高频端,并对频率搬移后的信号子带进行量化编码。可选的,音频发送端采用统一的编码方案对各组信号子带进行编码,或者,音频发送端对每组信号子带采用单独的编码方案进行编码。本申请实施例对此不作限定。
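The analysis step above can be sketched as follows. This is a hedged stand-in: the text uses a QMF filter bank, while here a simpler complementary FFT-bin split plays the same role, and all names are illustrative:

```python
import numpy as np

def split_bands(x):
    """Split a real signal into complementary low/high bands.

    Simplified stand-in for the QMF analysis described in the text: the
    lower half of the FFT bins becomes the low band, the upper half the
    high band, so the two band signals sum back to the input exactly.
    """
    spec = np.fft.rfft(x)
    cut = len(spec) // 2
    low_spec = np.where(np.arange(len(spec)) < cut, spec, 0)
    high_spec = spec - low_spec
    low = np.fft.irfft(low_spec, n=len(x))
    high = np.fft.irfft(high_spec, n=len(x))
    return low, high
```

For a pure tone below the cut-off, essentially all of the signal lands in the low band, which is what makes the low-frequency energy ratio computed later a useful classifier.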
步骤404,基于输入信号在各音频频段内的样点信号,确定低频子带的低频能量占比。
可选的,音频发送端在对输入信号进行子带分解后,同步进行压缩编码和低频能量占比的计算,或者,音频发送端在对信号子带进行压缩编码之后再计算输入信号的低频能量占比,本申请实施例对此不作限定。
音频发送端通过计算低频子带的能量占比,确定能量集中的信号子带。若低频子带的能量占比高,则说明信号能量集中在低频子带;若低频子带的能量占比较低,则说明信号能量 集中在高频子带。其中,低频子带的音频频率低于其他信号子带的音频频率。
示意性的，低频能量占比的计算公式如下：
低频能量占比 = Σ_i x(1,i)^2 / ( Σ_{k=1}^{M} Σ_i x(k,i)^2 )
其中，x(k,i)为单帧信号经过子带分解后第k个信号子带的第i个样点信号，其中k值越大则对应的子带频率越高，k=1代表的是低频子带，M为总子带数。可选的，样点信号是各个信号子带中的采样点信号。
可选的,当输入信号被分解为两组信号子带时,音频发送端只需计算低频的一个信号子带的能量占比。当总信号子带数大于2时,音频发送端计算最低频的一组信号子带的能量占比,或者计算最低频的多组信号子带的能量占比,开发人员可以根据实际应用场景以及音频文件格式等因素,设置低频能量占比的计算方式以及第二子带编码数据的确定方式。例如,当终端传输的音频文件为32KHz时,音频发送端可以首先将输入信号分解为0-16KHz以及16-32KHz两个频段,然后再将0-16KHz的频段分解为0-8KHz和8-16KHz两个频段,并计算0-8KHz频段以及8-16KHz频段的低频能量占比。本申请实施例对此不作限定。
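The low-frequency energy ratio described above can be computed directly from the sub-band samples. A minimal sketch (numpy assumed; the function name is illustrative):

```python
import numpy as np

def low_band_energy_ratio(subbands):
    """subbands[k-1] holds the samples x(k, i) of the k-th sub-band, with
    subbands[0] being the lowest band (k = 1).  Returns the energy of the
    lowest sub-band divided by the total energy over all M sub-bands."""
    energies = [float(np.sum(np.asarray(b, dtype=float) ** 2)) for b in subbands]
    total = sum(energies)
    return energies[0] / total if total > 0 else 0.0
```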
步骤405,基于低频能量占比,从第一子带编码数据中确定第二子带编码数据。
可选的,音频发送端基于低频能量占比,判断能量所集中的音频频段,继而将能量集中的音频频段(信号子带)的第一子带编码数据确定为第二子带编码数据。在一种可能的实施方式中,步骤405具体可以包括如下步骤405a至步骤405b(图中未示出):
步骤405a,在低频能量占比高于阈值的情况下,将低频子带的第一子带编码数据确定为第二子带编码数据。
可选的,音频发送端中存储有阈值,当音频发送端计算出低频能量占比后,通过比较低频能量占比和阈值的大小关系,并根据比较结果对输入信号进行分类,以确定输入信号属于低频信号或高频信号,进而根据比较结果确定选取低频子带或高频子带的第一子带编码数据,作为第二子带编码数据。
可选的,低频子带是信号子带中音频频率低于其他信号子带的信号子带,高频子带是信号子带中音频频率高于其他信号子带的信号子带。
可选的，当音频发送端确定低频能量占比高于阈值时，表示输入信号为低频信号，低频信号子带即为输入信号的关键信号子带，后续需要对低频信号子带进行着重修复，则音频发送端直接将低频信号子带的第一子带编码数据确定为第二子带编码数据。在输入信号被分解为三组或三组以上信号子带的情况下，音频发送端可以通过计算多组信号子带的能量占比，确定能量集中的信号子带，进而确定第二子带编码数据。
示意性的,输入信号被分解为0-8KHz和8-16KHz两个频段的信号子带,阈值为50%,若0-8KHz的低频能量占比高于50%,则确定输入信号为低频信号,低频信号为关键信号子带,对应音频发送端将0-8KHz频段的第一子带编码数据确定为第二子带编码数据。
步骤405b,在低频能量占比低于阈值的情况下,将高频子带的第一子带编码数据确定为第二子带编码数据,高频子带的音频频率高于其它信号子带的音频频率。
可选的，在输入信号被分解为两组信号子带的情况下，若低频能量占比低于阈值，表示输入信号为高频信号，高频信号子带即为输入信号的关键信号子带，后续需要对高频信号子带进行着重修复，则音频发送端直接将高频信号子带的第一子带编码数据确定为第二子带编码数据。在输入信号被分解为三组或三组以上信号子带的情况下，音频发送端可以通过计算多组信号子带的能量占比，确定能量集中的信号子带，进而确定第二子带编码数据。
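Steps 405a and 405b reduce to a simple threshold test. A sketch under stated assumptions (the 0.5 threshold mirrors the 50% example in the text; real systems would tune it, and the names are illustrative):

```python
def select_protected_band(low_ratio, threshold=0.5):
    """Classify the frame and choose the sub-band whose coded data becomes
    the 'second sub-band coded data' protected by error correction.

    Returns (signal_type, band): ('voiced', 'low') when low-band energy
    dominates, ('unvoiced', 'high') otherwise.
    """
    if low_ratio > threshold:
        return "voiced", "low"
    return "unvoiced", "high"
```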
步骤406,对第二子带编码数据进行纠错编码,得到冗余数据。
步骤406的具体实施方式可以参考上述步骤303,本申请实施例在此不再赘述。
步骤407,基于低频能量占比生成信号类型标识。
其中,信号类型标识用于指示输入信号属于浊音信号或非浊音信号,其中,浊音信号的 低频能量占比高于阈值,低频信号对语音可懂度起到关键作用,需要对低频信号子带的第一子带编码数据进行纠错编码,以便在丢包的情况下可以着重恢复低频信号;非浊音信号的低频能量占比低于阈值,高频信号对语音可懂度起到关键作用,需要对高频信号子带的第一子带编码数据进行纠错编码,以便在丢包的情况下可以着重恢复高频信号。
在一种可能的实施方式中,终端在计算得到低频能量占比后,对输入信号进行分类,信号类型包括浊音信号和非浊音信号。浊音信号指能量集中在低频区域的声音信号,非浊音信号指能量集中在高频区域的声音信号。浊音信号与非浊音信号对应的信号类型标识不同。可选的,音频发送端在确定输入信号的低频能量占比高于阈值时,确定输入信号为浊音信号,设置浊音信号的信号类型标识;若音频发送端确定输入信号的高频能量占比低于阈值时,确定输入信号为非浊音信号,设置非浊音信号的信号类型标识。
在另一种可能的实施方式中,音频发送端在计算得到低频能量占比后,首先对输入信号进行分类,生成信号类型标识,并在音频数据包中携带输入信号的信号类型标识,使得音频接收端在确定丢包的情况下,可以根据信号类型标识从第一子带编码数据中确定需要着重修复的第二子带编码数据,进而基于冗余数据对第二子带编码数据进行纠错编码。可选的,当信号类型标识属于浊音信号标识时,表示在丢包情况下主要需要对低频信号子带进行数据恢复,对应音频接收端从第一子带编码数据中确定低频信号子带的第一子带编码数据(第二子带编码数据),进而基于冗余数据对第二子带编码数据进行数据恢复。当信号类型标识属于非浊音信号标识时,表示在丢包情况下主要需要对高频信号子带进行数据恢复,对应音频接收端从第一子带编码数据中确定高频信号子带的第一子带编码数据(第二子带编码数据),进而基于冗余数据对第二子带编码数据进行数据恢复。
可选的,在信号类型标识属于浊音信号标识时,音频发送端对低频子带的第一子带编码数据进行纠错编码;当信号类型标识属于非浊音信号标识时,音频发送端对高频子带的第一子带编码数据进行纠错编码。
步骤408,对第一子带编码数据、冗余数据以及信号类型标识进行打包,生成音频数据包。
可选的,音频发送端将信号类型标识与第一子带编码数据和冗余数据打包后一同发送至音频接收端,以便音频接收端在丢包的情况下基于信号类型标识,从第一子带编码数据中确定出第二子带编码数据,并进行数据恢复和信号子带预测。其中,若信号类型标识为浊音信号标识,音频接收端将低频子带的第一子带编码数据确定为第二子带编码数据;若信号类型标识为非浊音信号标识,信号接收端将高频子带的第一子带编码数据确定为第二子带编码数据。
步骤409,向音频接收端发送音频数据包。
可选的,当音频发送端基于信号类型标识、第一子带编码数据和冗余数据生成音频数据包后,即可以向音频接收端发送音频数据包,对应音频接收端用于在丢包的情况下,基于信号类型标识从第一子带编码数据中确定第二子带编码数据,以及基于第二子带编码数据和冗余数据进行数据恢复。
本申请实施例中，音频发送端通过计算低频子带的低频能量占比，确定能量集中的频段，进而确定第二子带编码数据，使得能够对实际重要的信号子带进行纠错编码，避免由于对固定频段进行纠错编码，导致丢包时无法恢复出连续信号的情况，在降低传输带宽的基础上提高了信号传输质量。此外，通过确定输入信号的信号类型标识，并在音频数据包中携带该信号类型标识，使得音频接收端在丢包的情况下，可以根据信号类型标识，从第一子带编码数据中确定出需要进行数据修复的第二子带编码数据，进而基于冗余数据和第二子带编码数据进行数据恢复；使得音频接收端无需重复确定需要进行修复的信号子带，且可以准确定位出需要进行数据恢复的信号子带，提高丢包情况下数据修复的准确性。
上述各个实施例示出了音频发送端进行子带编码以及纠错编码的过程。对于音频接收端,在接收到音频数据包后,首先判断是否存在丢包的情况。在发生丢包的情况下,需要音频接收端基于冗余数据对第一子带编码数据进行数据恢复和子带预测,从而输出连续的声音信号。请参考图6,其示出了本申请一个示例性实施例提供的音频传输方法的流程图。本实施例以该方法由音频接收端执行为例进行说明,该方法包括如下步骤:
步骤601,接收音频数据包。
音频数据包中包含冗余数据以及至少两组第一子带编码数据,冗余数据由音频发送端对第一子带编码数据中的第二子带编码数据进行纠错编码得到,所述第一子带编码数据由所述音频发送端对输入信号进行子带分解和压缩编码得到,不同第一子带编码数据对应输入信号中不同音频频段的第一信号子带,第二子带编码数据对应的音频频段为信号能量集中频段。
音频接收端接收音频数据包后进行数据解析,得到音频数据包中包含的第一子带编码数据和冗余数据并进行数据缓存。
步骤602,对第一子带编码数据进行丢包检测。
在一种可能的实施方式中,音频发送端在进行数据编码的过程中,按照信号采集的时序,对第一子带编码数据添加连续的编号。音频接收端解析数据后,检测第一子带编码数据对应的编号是否连续。若编号连续,则确定第一子带编码数据未丢包,若编号不连续,则确定存在丢包的情况。
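The sequence-number check above can be sketched as follows (a hedged illustration; the patent does not specify the packet header layout):

```python
def detect_lost_packets(seq_numbers):
    """Given the sequence numbers of the sub-band packets that arrived,
    return the sorted list of numbers missing between the first and last
    received packet; an empty list means no loss was detected."""
    received = set(seq_numbers)
    return [n for n in range(min(received), max(received) + 1)
            if n not in received]
```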
步骤603,在第一子带编码数据丢包的情况下,基于冗余数据对第一子带编码数据进行数据恢复,得到输出信号。
当未发生丢包时音频接收端直接进行子带解码流程。若检测到存在丢包的情况,则音频接收端需要先从数据缓存区中获取冗余数据以及相邻数据包进行纠错解码,得到丢包位置的子带编码数据,然后通过子带解码以及子带预测,得到连续的输出信号。
本申请实施例中，音频接收端接收包含冗余数据以及第一子带编码数据的音频数据包，其中冗余数据是音频发送端针对能量集中频段的数据进行纠错编码得到的，相比于直接对完整输入信号进行纠错编码的方式，在提升网络抗丢包能力的同时，一方面能够降低冗余数据的数据量，减少音频接收端缓存数据所消耗的存储资源，另一方面能够降低传输带宽和运行成本。
由于冗余数据并非是对完整输入信号进行纠错编码得到,而是根据输入信号的信号类型,仅对输入信号中的低频子带或者高频子带进行纠错编码得到。因此,根据冗余数据仅能恢复出低频子带或高频子带的输入信号,则在数据修复过程中,需要根据输入信号的信号类型标识,确定如何对第一子带编码数据进行数据修复。
请参考图7,其示出了本申请另一个示例性实施例提供的音频传输方法的流程图。本实施例以该方法由音频接收端执行为例进行说明,该方法包括如下步骤:
步骤701,接收音频数据包。
步骤702,对第一子带编码数据进行丢包检测。
步骤701至步骤702的具体实施方式可以参考上述步骤601至步骤602,本申请实施例在此不再赘述。
步骤703,基于信号类型标识从第一子带编码数据中确定第二子带编码数据。
在一种可能的实施方式中,音频数据包中还包含信号类型标识,该信号类型标识用于指示第一子带编码数据对应的输入信号属于浊音信号或非浊音信号。其中,浊音信号的第二子带编码数据为低频子带的第一子带编码数据,非浊音信号的第二子带编码数据为高频子带的第一子带编码数据。低频子带的音频频率低于其它第一信号子带的音频频率,高频子带的音频频率高于其它第一信号子带的音频频率。即,浊音信号指信号能量集中在低频区域的声音信号(输入信号),非浊音信号指信号能量集中在非低频区域的声音信号(输入信号)。
当存在丢包的情况时,音频接收端需要从数据缓存区中读取相关冗余数据以及相邻数据包进行纠错解码。而冗余数据是音频接收端针对第二子带编码数据进行纠错编码得到的,因此音频接收端首先基于信号类型标识所指示的信号类型(浊音信号或非浊音信号),从至少两组第一子带编码数据中确定出第二子带编码数据。若信号类型标识指示信号类型为浊音信号,则将低频子带的第一子带编码数据确定为第二子带编码数据;若信号类型标识指示信号类型为非浊音信号,则将高频子带的第一子带编码数据确定为第二子带编码数据。
步骤704,基于冗余数据以及相邻音频数据包中的第一子带编码数据,对第二子带编码数据进行纠错解码。
基于音频发送端的纠错编码算法,音频接收端采用对应的纠错解码算法进行纠错解码,得到丢包位置的子带编码数据和信号分类标识。
步骤705,对纠错解码后的第二子带编码数据进行子带解码,得到第二信号子带。
音频接收端恢复出丢包位置的第二子带编码数据后,对完整的第二子带编码数据进行压缩解码,得到第二信号子带。
步骤706,基于第二信号子带对其它第一子带编码数据进行数据恢复。
冗余数据是音频发送端对第二子带编码数据进行纠错编码得到的,音频接收端同样也是基于冗余数据对第二子带编码数据进行丢包数据恢复。而音频数据是以数据捆包的形式在信道中传输,丢包即意味着各个子带编码数据均存在丢包情况。因此音频接收端还需基于恢复出的第二信号子带和信号分类标识,对其它子带的数据进行子带预测,才能够得到完整声音信号。
本申请实施例采用深度学习的方法进行子带预测。在一种可能的实施方式中,当接收到的音频帧属于浊音帧时,步骤706具体包括如下步骤706a至步骤706c(图中未示出):
步骤706a，在信号类型标识属于浊音信号标识的情况下，对第二信号子带进行特征提取，得到第一信号特征，第一信号特征包括对数功率谱、基音周期以及互相关值中的至少一种。
对于浊音帧（浊音信号），由于冗余数据是低频子带的第一子带编码数据进行纠错编码得到的，则在纠错解码后，仅可以得到低频的第二信号子带，而为了恢复出完整输入信号，对应音频接收端还需要通过低频子带的解码信号预测高频子带信号。首先经过提取低频子带（第二信号子带为低频子带）的相关特征作为深度学习网络的输入，例如：对数功率谱、基音周期、互相关值。
步骤706b,将第一信号特征输入第一深度学习网络,得到第一深度学习网络输出的高频子带功率谱。
第一深度学习网络基于样本低频信号的信号特征以及样本高频信号的功率谱训练得到,样本低频信号以及样本高频信号属于同一声音信号的不同信号子带。
在一种可能的实施方式中,在模型训练阶段,计算机设备对样本声音信号进行子带分解,得到样本低频信号和样本高频信号。计算机设备将样本低频信号的信号特征输入第一深度学习网络,得到第一深度学习网络预测的高频子带功率谱。计算机设备基于样本高频信号的功率谱以及第一深度学习网络的预测结果,对第一深度学习网络进行反向传播训练。
第一深度学习网络可以是多层卷积神经网络(Convolutional Neural Networks,CNN)和多层长短期记忆网络(Long Short-Term Memory,LSTM)的结合。
步骤706c,基于高频子带功率谱以及随机相位值进行反傅里叶变换,得到高频子带信号。
经过第一深度学习网络预测得到的高频功率谱值,配合随机相位值,并经过反傅里叶变换,即可得到时域高频子带信号。
可选的,音频接收端将基于冗余数据恢复出的第二信号子带,以及根据第二信号子带预测得到的高频子带信号合并,即可以得到数据恢复出的完整输出信号。
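The "predicted power spectrum plus random phase plus inverse Fourier transform" step can be sketched as below (numpy assumed; the deep-learning prediction itself is out of scope here, and the names are illustrative):

```python
import numpy as np

def spectrum_to_signal(power_spectrum, n_samples, rng=None):
    """Turn a predicted sub-band power spectrum into a time-domain signal:
    magnitudes are the square roots of the power values, phases are drawn
    uniformly at random, and an inverse real FFT yields the waveform."""
    rng = np.random.default_rng() if rng is None else rng
    mag = np.sqrt(np.maximum(np.asarray(power_spectrum, dtype=float), 0.0))
    phase = rng.uniform(0.0, 2.0 * np.pi, size=mag.shape)
    return np.fft.irfft(mag * np.exp(1j * phase), n=n_samples)
```

Random phase is acceptable here because the human ear is far more sensitive to the magnitude spectrum than to phase in the predicted band.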
在一种可能的实施方式中,当接收到的音频帧属于非浊音帧时,步骤706具体包括如下步骤706d至步骤706f(图中未示出):
步骤706d,在信号类型标识属于非浊音信号标识的情况下,对第二信号子带进行特征提取,得到第二信号特征,第二信号特征包括对数功率谱。
对于非浊音帧（非浊音信号），由于冗余数据是高频子带的第一子带编码数据进行纠错编码得到的，则在纠错解码后，仅可以得到高频的第二信号子带，而为了恢复出完整输入信号，对应音频接收端还需要通过高频子带的解码信号（第二信号子带）预测低频子带。首先经过提取高频信号的相关特征作为深度学习网络的输入，例如对数功率谱。
步骤706e,将第二信号特征输入第二深度学习网络,得到第二深度学习网络输出的低频子带功率谱。
第二深度学习网络基于样本高频信号的信号特征以及样本低频信号的功率谱训练得到,样本低频信号以及样本高频信号属于同一声音信号的不同信号子带。
在一种可能的实施方式中,在模型训练阶段,计算机设备对样本声音信号进行子带分解,得到样本低频信号和样本高频信号。计算机设备将样本高频信号的信号特征输入第二深度学习网络,得到第二深度学习网络预测的低频子带功率谱。计算机设备基于样本低频信号的功率谱以及第二深度学习网络的预测结果,对第二深度学习网络进行反向传播训练。
第二深度学习网络可以是多层CNN和多层LSTM的结合。
步骤706f,基于低频子带功率谱以及随机相位值进行反傅里叶变换,得到低频子带信号。
经过第二深度学习网络预测得到的低频信号功率谱值，配合随机相位值，再经过反傅里叶变换，即可得到时域低频子带信号。
可选的,音频接收端将基于冗余数据恢复出的第二信号子带,以及根据第二信号子带预测得到的低频子带信号合并,即可以得到数据恢复出的完整输出信号。
步骤707,基于各个第二信号子带进行子带合成,得到输出信号。
音频接收端进行子带预测和恢复后,得到所有子带的完整子带信号。随后经过子带合成,例如QMF子带合成方法,将多组子带信号合成为一个完整的子带信号进行输出。
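Sub-band synthesis recombines the band signals into one output. With a complementary FFT-bin split standing in for QMF (an assumption — the text names QMF synthesis only as one example method), synthesis is a plain sum and reconstruction is exact:

```python
import numpy as np

def analyze(x, n_bands=2):
    """Split x into n_bands complementary FFT-bin bands (QMF stand-in)."""
    spec = np.fft.rfft(x)
    edges = np.linspace(0, len(spec), n_bands + 1, dtype=int)
    bands = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        band_spec = np.zeros_like(spec)
        band_spec[lo:hi] = spec[lo:hi]
        bands.append(np.fft.irfft(band_spec, n=len(x)))
    return bands

def synthesize(bands):
    """Recombine the band signals into the full-band output signal."""
    return np.sum(bands, axis=0)
```

Because the bands are disjoint and cover every FFT bin, summing them reverses the analysis exactly; a real QMF bank achieves the same perfect-reconstruction property with short time-domain filters instead.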
上述步骤703至步骤707是在子带编码数据丢包的情况下,音频接收端进行纠错解码和子带预测,得到完整声音信号的过程。在一种可能的实施方式中,步骤702之后还包括如下步骤(图中未示出):
在子带编码数据未丢包的情况下,对第一子带编码数据进行子带解码以及子带合成,得到输出信号。
若不存在丢包的情况,音频接收端可直接对各组子带编码数据进行压缩解码,得到第二信号子带。而后经过反傅里叶变换、子带合成等过程,得到输出信号。
本申请实施例中,在丢包的情况下,音频接收端可以基于冗余数据恢复出关键信号子带,进而基于关键信号子带预测得到其他信号子带,保证了输入信号中与可懂度相关的部分信号的传输准确性,进一步提高了音频传输网络的抗丢包能力。
如图8所示，其示出了音频发送端采集并发送音频以及音频接收端接收并输出音频的流程。音频发送端对输入信号进行编码：首先对输入信号进行子带分解和子带编码，同时确定输入信号的类型，信号类型包括浊音信号和非浊音信号；对于浊音信号，音频发送端提取低频子带编码码流和信号类型标识进行纠错编码，对于非浊音信号则提取高频子带编码码流和信号分类标识进行纠错编码；最终对子带编码数据、纠错编码冗余数据进行数据捆包发送至音频接收端。音频接收端对接收到的信号进行解码：首先接收数据并缓存；检测是否存在丢包情况，若没有发生丢包则进行子带解码流程，将解码得到的各子带信号经过子带合成得到完整输出信号，若发生丢包则从数据缓存区中获取相关冗余数据以及相邻数据包进行纠错解码，经过纠错解码得到丢包位置的子带编码数据和信号类型标识；基于纠错解码得到的子带码流，对其余子带进行预测和恢复，得到所有子带信号，随后经过子带合成得到完整的输出信号。
图9是本申请一个示例性实施例提供的音频传输装置的结构框图,该装置包括如下结构:
子带编码模块901,用于对输入信号进行子带分解和压缩编码,得到至少两组信号子带的第一子带编码数据,不同信号子带对应所述输入信号的不同音频频段;
确定模块902,用于基于所述输入信号的能量分布情况,从所述第一子带编码数据中确定第二子带编码数据,所述第二子带编码数据对应信号子带的音频频段为信号能量集中频段;
纠错编码模块903,用于对所述第二子带编码数据进行纠错编码,得到冗余数据;
数据发送模块904,用于向音频接收端发送音频数据包,所述音频数据包中包含所述第一子带编码数据和所述冗余数据,所述音频接收端用于在丢包的情况下基于所述冗余数据对所述第一子带编码数据进行数据恢复。
可选的,所述确定模块902,还用于:
基于所述输入信号在各音频频段内的样点信号,确定低频子带的低频能量占比,所述低频子带的音频频率低于其它信号子带的音频频率;
基于所述低频能量占比,从所述第一子带编码数据中确定所述第二子带编码数据。
可选的,所述确定模块902,还用于:
在所述低频能量占比高于阈值的情况下,将所述低频子带的所述第一子带编码数据确定为所述第二子带编码数据;
在所述低频能量占比低于所述阈值的情况下,将高频子带的所述第一子带编码数据确定为所述第二子带编码数据,所述高频子带的音频频率高于其它信号子带的音频频率。
可选的,所述装置还包括:
标识生成模块,用于基于所述低频能量占比生成信号类型标识,所述信号类型标识用于指示所述输入信号属于浊音信号或非浊音信号,其中,所述浊音信号的所述低频能量占比高于所述阈值,所述非浊音信号的所述低频能量占比低于所述阈值;
所述数据发送模块904,还用于:
对所述第一子带编码数据、所述冗余数据以及所述信号类型标识进行打包,生成所述音频数据包;
向所述音频接收端发送所述音频数据包,所述音频接收端用于在丢包的情况下,基于所述信号类型标识从所述第一子带编码数据中确定所述第二子带编码数据,以及基于所述第二子带编码数据和所述冗余数据进行数据恢复。
可选的,所述子带编码模块901,还用于:
对麦克风采集到的模拟声音信号进行模数转换,生成数字声音信号;
对所述数字声音信号进行傅里叶变换,得到频域信号;
对所述频域信号进行子带分解和压缩编码,生成至少两组信号子带的所述第一子带编码数据。
可选的,所述子带编码模块901,还用于:
通过至少两个带通滤波器对所述频域信号进行子带分解,得到至少两个所述信号子带,不同带通滤波器对应不同音频频段,且各个所述带通滤波器的所述音频频段连续;
对所述信号子带进行频率搬移以及量化编码,得到各组所述信号子带的所述第一子带编码数据。
图10是本申请另一个示例性实施例提供的音频传输装置的结构框图,该装置包括如下结构:
数据接收模块1001,用于接收音频数据包,所述音频数据包中包含冗余数据以及至少两组第一子带编码数据,所述冗余数据由音频发送端对所述第一子带编码数据中的第二子带编码数据进行纠错编码得到,所述第一子带编码数据由所述音频发送端对输入信号进行子带分解和压缩编码得到,不同第一子带编码数据对应所述输入信号中不同音频频段的第一信号子带,所述第二子带编码数据的音频频段为信号能量集中频段;
丢包检测模块1002,用于对所述第一子带编码数据进行丢包检测;
解码模块1003,用于在所述第一子带编码数据丢包的情况下,基于所述冗余数据对所述第一子带编码数据进行数据恢复,得到输出信号。
可选的,所述音频数据包中还包含信号类型标识,所述信号类型标识用于指示所述输入信号属于浊音信号或非浊音信号,其中,所述浊音信号的所述第二子带编码数据为低频子带的所述第一子带编码数据,所述非浊音信号的所述第二子带编码数据为高频子带的所述第一子带编码数据,所述低频子带的音频频率低于其它第一信号子带的音频频率,所述高频子带的音频频率高于其它第一信号子带的音频频率;
所述解码模块1003,还用于:
基于所述信号类型标识从所述第一子带编码数据中确定所述第二子带编码数据;
基于所述冗余数据以及相邻音频数据包中的第一子带编码数据,对所述第二子带编码数据进行纠错解码;
对纠错解码后的所述第二子带编码数据进行子带解码,得到第二信号子带;
基于所述第二信号子带对其它所述第一子带编码数据进行数据恢复;
基于各个所述第二信号子带进行子带合成,得到所述输出信号。
可选的,所述解码模块1003,还用于:
在所述信号类型标识属于浊音信号标识的情况下，对所述第二信号子带进行特征提取，得到第一信号特征，所述第一信号特征包括对数功率谱、基音周期以及互相关值中的至少一种；
将所述第一信号特征输入第一深度学习网络,得到所述第一深度学习网络输出的高频子带功率谱,所述第一深度学习网络基于样本低频信号的信号特征以及样本高频信号的功率谱训练得到,所述样本低频信号以及所述样本高频信号属于同一声音信号的不同信号子带;
基于所述高频子带功率谱以及随机相位值进行反傅里叶变换,得到高频子带信号。
可选的,所述解码模块1003,还用于:
在所述信号类型标识属于非浊音信号标识的情况下,对所述第二信号子带进行特征提取,得到第二信号特征,所述第二信号特征包括对数功率谱;
将所述第二信号特征输入第二深度学习网络,得到所述第二深度学习网络输出的低频子带功率谱,所述第二深度学习网络基于样本高频信号的信号特征以及样本低频信号的功率谱训练得到,所述样本低频信号以及所述样本高频信号属于同一声音信号的不同信号子带;
基于所述低频子带功率谱以及随机相位值进行反傅里叶变换,得到低频子带信号。
可选的,所述解码模块1003,还用于:
在所述子带编码数据未丢包的情况下,对所述第一子带编码数据进行子带解码以及子带合成,得到所述输出信号。
综上所述,本申请实施例中,通过对输入信号进行分频段分解和压缩编码,得到至少两组子带编码数据,针对其中信号能量集中的部分子带编码数据进行纠错编码,确保音频接收端对主要音频数据的恢复能力。相比于直接对完整的输入信号进行纠错编码的方案,在提升音频传输质量的同时,能够降低冗余数据的数据量,从而降低纠错编码对传输带宽和运行成本的消耗。
请参考图11,其示出了本申请一个示例性实施例提供的终端1100的结构框图。该终端1100可以是便携式移动终端,比如:智能手机、平板电脑、动态影像专家压缩标准音频层面3(Moving Picture Experts Group Audio Layer III,MP3)播放器、动态影像专家压缩标准音频层面4(Moving Picture Experts Group Audio Layer IV,MP4)播放器。终端1100还可能被称为用户设备、便携式终端等其他名称。
通常,终端1100包括有:处理器1101和存储器1102。
处理器1101可以包括一个或多个处理核心，比如4核心处理器、8核心处理器等。处理器1101可以采用数字信号处理（Digital Signal Processing，DSP）、现场可编程门阵列（Field-Programmable Gate Array，FPGA）、可编程逻辑阵列（Programmable Logic Array，PLA）中的至少一种硬件形式来实现。处理器1101也可以包括主处理器和协处理器，主处理器是用于对在唤醒状态下的数据进行处理的处理器，也称中央处理器（Central Processing Unit，CPU）；协处理器是用于对在待机状态下的数据进行处理的低功耗处理器。在一些实施例中，处理器1101可以集成有图像处理器（Graphics Processing Unit，GPU），GPU用于负责显示屏所需要显示的内容的渲染和绘制。一些实施例中，处理器1101还可以包括人工智能（Artificial Intelligence，AI）处理器，该AI处理器用于处理有关机器学习的计算操作。
存储器1102可以包括一个或多个计算机可读存储介质,该计算机可读存储介质可以是有形的和非暂态的。存储器1102还可包括高速随机存取存储器,以及非易失性存储器,比如一个或多个磁盘存储设备、闪存存储设备。在一些实施例中,存储器1102中的非暂态的计算机可读存储介质用于存储至少一个指令,该至少一个指令用于被处理器1101所执行以实现本申请实施例提供的方法。
在一些实施例中,终端1100还可选包括有:外围设备接口1103。
外围设备接口1103可被用于将输入/输出(Input/Output,I/O)相关的至少一个外围设备连接到处理器1101和存储器1102。在一些实施例中,处理器1101、存储器1102和外围设备接口1103被集成在同一芯片或电路板上;在一些其他实施例中,处理器1101、存储器1102和外围设备接口1103中的任意一个或两个可以在单独的芯片或电路板上实现,本实施例对此不加以限定。
本申请实施例还提供了一种计算机可读存储介质,该计算机可读存储介质存储有至少一条指令,所述至少一条指令由处理器加载并执行以实现如上各个实施例所述的音频传输方法。
根据本申请的一个方面,提供了一种计算机程序产品,该计算机程序产品包括计算机指令,该计算机指令存储在计算机可读存储介质中。计算机设备的处理器从计算机可读存储介质读取该计算机指令,处理器执行该计算机指令,使得该计算机设备执行上述方面的各种可选实现方式中提供的音频传输方法。
需要说明的是,本申请所涉及的信息(包括但不限于用户设备信息、用户个人信息等)、数据(包括但不限于用于分析的数据、存储的数据、展示的数据等)以及信号,均为经用户授权或者经过各方充分授权的,且相关数据的收集、使用和处理需要遵守相关国家和地区的相关法律法规和标准。例如,本申请中涉及到的输入信号、音频数据等都是在充分授权的情况下获取的。

Claims (20)

  1. 一种音频传输方法,所述方法由音频发送端执行,所述方法包括:
    对输入信号进行子带分解和压缩编码,得到至少两组信号子带的第一子带编码数据,不同信号子带对应所述输入信号的不同音频频段;
    基于所述输入信号的能量分布情况,从所述第一子带编码数据中确定第二子带编码数据,所述第二子带编码数据对应信号子带的音频频段为信号能量集中频段;
    对所述第二子带编码数据进行纠错编码,得到冗余数据;
    向音频接收端发送音频数据包,所述音频数据包中包含所述第一子带编码数据和所述冗余数据,所述音频接收端用于在丢包的情况下基于所述冗余数据对所述第一子带编码数据进行数据恢复。
  2. 根据权利要求1所述的方法,其中,所述基于所述输入信号的能量分布情况,从所述第一子带编码数据中确定第二子带编码数据,包括:
    基于所述输入信号在各音频频段内的样点信号,确定低频子带的低频能量占比,所述低频子带的音频频率低于其它信号子带的音频频率;
    基于所述低频能量占比,从所述第一子带编码数据中确定所述第二子带编码数据。
  3. 根据权利要求2所述的方法,其中,所述基于所述低频能量占比,从所述第一子带编码数据中确定所述第二子带编码数据,包括:
    在所述低频能量占比高于阈值的情况下,将所述低频子带的所述第一子带编码数据确定为所述第二子带编码数据;
    在所述低频能量占比低于所述阈值的情况下,将高频子带的所述第一子带编码数据确定为所述第二子带编码数据,所述高频子带的音频频率高于其它信号子带的音频频率。
  4. 根据权利要求3所述的方法,其中,所述基于所述输入信号在各音频频段内的样点信号,确定低频子带的低频能量占比之后,所述方法包括:
    基于所述低频能量占比生成信号类型标识,所述信号类型标识用于指示所述输入信号属于浊音信号或非浊音信号,其中,所述浊音信号的所述低频能量占比高于所述阈值,所述非浊音信号的所述低频能量占比低于所述阈值;
    所述向音频接收端发送音频数据包,包括:
    对所述第一子带编码数据、所述冗余数据以及所述信号类型标识进行打包,生成所述音频数据包;
    向所述音频接收端发送所述音频数据包,所述音频接收端用于在丢包的情况下,基于所述信号类型标识从所述第一子带编码数据中确定所述第二子带编码数据,以及基于所述第二子带编码数据和所述冗余数据进行数据恢复。
  5. 根据权利要求1至4任一所述的方法,其中,所述对输入信号进行子带分解和压缩编码,得到至少两组信号子带的第一子带编码数据,包括:
    对麦克风采集到的模拟声音信号进行模数转换,生成数字声音信号;
    对所述数字声音信号进行傅里叶变换,得到频域信号;
    对所述频域信号进行子带分解和压缩编码,生成至少两组所述信号子带的所述第一子带编码数据。
  6. 根据权利要求5所述的方法,其中,所述对所述频域信号进行子带分解和压缩编码, 生成至少两组信号子带的所述第一子带编码数据,包括:
    通过至少两个带通滤波器对所述频域信号进行子带分解,得到至少两个所述信号子带,不同带通滤波器对应不同音频频段,且各个所述带通滤波器的所述音频频段连续;
    对所述信号子带进行频率搬移以及量化编码,得到各组所述信号子带的所述第一子带编码数据。
  7. 一种音频传输方法,所述方法由音频接收端执行,所述方法包括:
    接收音频数据包,所述音频数据包中包含冗余数据以及至少两组第一子带编码数据,所述冗余数据由音频发送端对所述第一子带编码数据中的第二子带编码数据进行纠错编码得到,所述第一子带编码数据由所述音频发送端对输入信号进行子带分解和压缩编码得到,不同第一子带编码数据对应所述输入信号中不同音频频段的第一信号子带,所述第二子带编码数据的音频频段为信号能量集中频段;
    对所述第一子带编码数据进行丢包检测;
    在所述第一子带编码数据丢包的情况下,基于所述冗余数据对所述第一子带编码数据进行数据恢复,得到输出信号。
  8. 根据权利要求7所述的方法,其中,所述音频数据包中还包含信号类型标识,所述信号类型标识用于指示所述输入信号属于浊音信号或非浊音信号,其中,所述浊音信号的所述第二子带编码数据为低频子带的所述第一子带编码数据,所述非浊音信号的所述第二子带编码数据为高频子带的所述第一子带编码数据,所述低频子带的音频频率低于其它第一信号子带的音频频率,所述高频子带的音频频率高于其它第一信号子带的音频频率;
    所述在所述第一子带编码数据丢包的情况下,基于所述冗余数据对所述第一子带编码数据进行数据恢复,得到输出信号,包括:
    基于所述信号类型标识从所述第一子带编码数据中确定所述第二子带编码数据;
    基于所述冗余数据以及相邻音频数据包中的第一子带编码数据,对所述第二子带编码数据进行纠错解码;
    对纠错解码后的所述第二子带编码数据进行子带解码,得到第二信号子带;
    基于所述第二信号子带对其它所述第一子带编码数据进行数据恢复;
    基于各个所述第二信号子带进行子带合成,得到所述输出信号。
  9. 根据权利要求8所述的方法,其中,所述基于所述第二信号子带对其它第一子带编码数据进行数据恢复,包括:
    在所述信号类型标识属于浊音信号标识的情况下，对所述第二信号子带进行特征提取，得到第一信号特征，所述第一信号特征包括对数功率谱、基音周期以及互相关值中的至少一种；
    将所述第一信号特征输入第一深度学习网络,得到所述第一深度学习网络输出的高频子带功率谱,所述第一深度学习网络基于样本低频信号的信号特征以及样本高频信号的功率谱训练得到,所述样本低频信号以及所述样本高频信号属于同一声音信号的不同信号子带;
    基于所述高频子带功率谱以及随机相位值进行反傅里叶变换,得到高频子带信号。
  10. 根据权利要求8所述的方法,其中,所述基于所述第二信号子带对其它第一子带编码数据进行数据恢复,包括:
    在所述信号类型标识属于非浊音信号标识的情况下,对所述第二信号子带进行特征提取,得到第二信号特征,所述第二信号特征包括对数功率谱;
    将所述第二信号特征输入第二深度学习网络,得到所述第二深度学习网络输出的低频子带功率谱,所述第二深度学习网络基于样本高频信号的信号特征以及样本低频信号的功率谱 训练得到,所述样本低频信号以及所述样本高频信号属于同一声音信号的不同信号子带;
    基于所述低频子带功率谱以及随机相位值进行反傅里叶变换,得到低频子带信号。
  11. 根据权利要求7至10任一所述的方法,其中,所述对所述子带编码数据进行丢包检测之后,所述方法还包括:
    在所述子带编码数据未丢包的情况下,对所述第一子带编码数据进行子带解码以及子带合成,得到所述输出信号。
  12. 一种音频传输装置,所述装置包括:
    子带编码模块,用于对输入信号进行子带分解和压缩编码,得到至少两组信号子带的第一子带编码数据,不同信号子带对应所述输入信号的不同音频频段;
    确定模块,用于基于所述输入信号的能量分布情况,从所述第一子带编码数据中确定第二子带编码数据,所述第二子带编码数据对应信号子带的音频频段为信号能量集中频段;
    纠错编码模块,用于对所述第二子带编码数据进行纠错编码,得到冗余数据;
    数据发送模块,用于向音频接收端发送音频数据包,所述音频数据包中包含所述第一子带编码数据和所述冗余数据,所述音频接收端用于在丢包的情况下基于所述冗余数据对所述第一子带编码数据进行数据恢复。
  13. 根据权利要求12所述的装置,其中,所述确定模块,还用于:
    基于所述输入信号在各音频频段内的样点信号,确定低频子带的低频能量占比,所述低频子带的音频频率低于其它信号子带的音频频率;
    基于所述低频能量占比,从所述第一子带编码数据中确定所述第二子带编码数据。
  14. 根据权利要求13所述的装置,其中,所述确定模块,还用于:
    在所述低频能量占比高于阈值的情况下,将所述低频子带的所述第一子带编码数据确定为所述第二子带编码数据;
    在所述低频能量占比低于所述阈值的情况下,将高频子带的所述第一子带编码数据确定为所述第二子带编码数据,所述高频子带的音频频率高于其它信号子带的音频频率。
  15. 根据权利要求14所述的装置,其中,所述装置还包括:
    标识生成模块,用于基于所述低频能量占比生成信号类型标识,所述信号类型标识用于指示所述输入信号属于浊音信号或非浊音信号,其中,所述浊音信号的所述低频能量占比高于所述阈值,所述非浊音信号的所述低频能量占比低于所述阈值;
    所述数据发送模块,还用于:
    对所述第一子带编码数据、所述冗余数据以及所述信号类型标识进行打包,生成所述音频数据包;
    向所述音频接收端发送所述音频数据包,所述音频接收端用于在丢包的情况下,基于所述信号类型标识从所述第一子带编码数据中确定所述第二子带编码数据,以及基于所述第二子带编码数据和所述冗余数据进行数据恢复。
  16. 根据权利要求12至15任一所述的装置,其中,所述子带编码模块,还用于:
    对麦克风采集到的模拟声音信号进行模数转换,生成数字声音信号;
    对所述数字声音信号进行傅里叶变换,得到频域信号;
    对所述频域信号进行子带分解和压缩编码,生成至少两组所述信号子带的所述第一子带编码数据。
  17. 一种音频传输装置,所述装置包括:
    数据接收模块,用于接收音频数据包,所述音频数据包中包含冗余数据以及至少两组第一子带编码数据,所述冗余数据由音频发送端对所述第一子带编码数据中的第二子带编码数据进行纠错编码得到,所述第一子带编码数据由所述音频发送端对输入信号进行子带分解和压缩编码得到,不同第一子带编码数据对应所述输入信号中不同音频频段的第一信号子带,所述第二子带编码数据的音频频段为信号能量集中频段;
    丢包检测模块,用于对所述第一子带编码数据进行丢包检测;
    解码模块,用于在所述第一子带编码数据丢包的情况下,基于所述冗余数据对所述第一子带编码数据进行数据恢复,得到输出信号。
  18. 一种终端,所述终端包括处理器和存储器;所述存储器中存储有至少一段程序,所述至少一段程序由所述处理器加载并执行以实现如权利要求1至6任一所述的音频传输方法或权利要求7至11任一所述的音频传输方法。
  19. 一种计算机可读存储介质,所述计算机可读存储介质中存储有至少一条计算机程序,所述计算机程序由处理器加载并执行以实现如权利要求1至6任一所述的音频传输方法或权利要求7至11任一所述的音频传输方法。
  20. 一种计算机程序产品,所述计算机程序产品包括计算机指令,所述计算机指令存储在计算机可读存储介质中;终端的处理器从所述计算机可读存储介质读取所述计算机指令,所述处理器执行所述计算机指令,使得所述终端执行如权利要求1至6任一所述的音频传输方法或权利要求7至11任一所述的音频传输方法。
PCT/CN2023/079987 2022-04-18 2023-03-07 音频传输方法、装置、终端、存储介质及程序产品 WO2023202250A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210405956.4 2022-04-18
CN202210405956.4A CN116959458A (zh) 2022-04-18 2022-04-18 音频传输方法、装置、终端、存储介质及程序产品

Publications (1)

Publication Number Publication Date
WO2023202250A1 true WO2023202250A1 (zh) 2023-10-26

Family

ID=88419044

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/079987 WO2023202250A1 (zh) 2022-04-18 2023-03-07 音频传输方法、装置、终端、存储介质及程序产品

Country Status (2)

Country Link
CN (1) CN116959458A (zh)
WO (1) WO2023202250A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117676185A (zh) * 2023-12-05 2024-03-08 无锡中感微电子股份有限公司 一种音频数据的丢包补偿方法、装置及相关设备

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2010093755A (ja) * 2008-10-10 2010-04-22 Tamura Seisakusho Co Ltd 送信機、受信機及び送受信システム及び方法
US20100312552A1 (en) * 2009-06-04 2010-12-09 Qualcomm Incorporated Systems and methods for preventing the loss of information within a speech frame
US20170103761A1 (en) * 2015-10-10 2017-04-13 Dolby Laboratories Licensing Corporation Adaptive Forward Error Correction Redundant Payload Generation
CN111371957A (zh) * 2020-05-26 2020-07-03 腾讯科技(深圳)有限公司 一种冗余度控制方法、装置、电子设备和存储介质
CN112489665A (zh) * 2020-11-11 2021-03-12 北京融讯科创技术有限公司 语音处理方法、装置以及电子设备
CN113890687A (zh) * 2021-11-15 2022-01-04 杭州叙简未兰电子有限公司 一种基于纠错码与纠删码混合高可靠音频传输方法与装置

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2010093755A (ja) * 2008-10-10 2010-04-22 Tamura Seisakusho Co Ltd 送信機、受信機及び送受信システム及び方法
US20100312552A1 (en) * 2009-06-04 2010-12-09 Qualcomm Incorporated Systems and methods for preventing the loss of information within a speech frame
US20170103761A1 (en) * 2015-10-10 2017-04-13 Dolby Laboratories Licensing Corporation Adaptive Forward Error Correction Redundant Payload Generation
CN111371957A (zh) * 2020-05-26 2020-07-03 腾讯科技(深圳)有限公司 一种冗余度控制方法、装置、电子设备和存储介质
CN112489665A (zh) * 2020-11-11 2021-03-12 北京融讯科创技术有限公司 语音处理方法、装置以及电子设备
CN113890687A (zh) * 2021-11-15 2022-01-04 杭州叙简未兰电子有限公司 一种基于纠错码与纠删码混合高可靠音频传输方法与装置

Also Published As

Publication number Publication date
CN116959458A (zh) 2023-10-27

Similar Documents

Publication Publication Date Title
KR101570589B1 (ko) 워터마킹된 신호를 인코딩 및 검출하는 디바이스들
CN101346760B (zh) 用于音频编码的编码器辅助的帧丢失隐藏技术
US10636432B2 (en) Method for predicting high frequency band signal, encoding device, and decoding device
CN1732512A (zh) 用于隐蔽压缩域分组丢失的方法和装置
KR101548846B1 (ko) 워터마킹된 신호의 적응적 인코딩 및 디코딩을 위한 디바이스
CN113470667A (zh) 语音信号的编解码方法、装置、电子设备及存储介质
WO2023202250A1 (zh) 音频传输方法、装置、终端、存储介质及程序产品
KR101590239B1 (ko) 워터마킹된 신호를 인코딩 및 디코딩하는 디바이스들
CN114550732B (zh) 一种高频音频信号的编解码方法和相关装置
CN110619881B (zh) 一种语音编码方法、装置及设备
CN103187065A (zh) 音频数据的处理方法、装置和系统
JP2017503192A (ja) 帯域幅拡張モード選択
CN115171709B (zh) 语音编码、解码方法、装置、计算机设备和存储介质
CN103915097A (zh) 一种语音信号处理方法、装置和系统
WO2015165264A1 (zh) 处理信号的方法及设备
WO2015000373A1 (zh) 信号编码和解码方法以及设备
CN112769524B (zh) 语音传输方法、装置、计算机设备和存储介质
CN115831132A (zh) 音频编解码方法、装置、介质及电子设备
CN112767955A (zh) 音频编码方法及装置、存储介质、电子设备
WO2022267754A1 (zh) 语音编码、语音解码方法、装置、计算机设备和存储介质
Dorogov et al. Overview of Technologies for Transmitting Audio Streams over Low-Speed and Unstable Communication Channels
CN117640015B (zh) 一种语音编码、解码方法、装置、电子设备及存储介质
CN116996489A (zh) 投屏码的传输、投屏方法、装置及设备
CN116137151A (zh) 低码率网络连接中提供高质量音频通信的系统和方法
CN115312069A (zh) 音频编解码方法、装置、计算机可读介质及电子设备

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23790911

Country of ref document: EP

Kind code of ref document: A1