WO2021098405A1 - Data transmission method, apparatus, terminal, and storage medium - Google Patents

Data transmission method, apparatus, terminal, and storage medium

Info

Publication number
WO2021098405A1
WO2021098405A1 · PCT/CN2020/120300
Authority
WO
WIPO (PCT)
Prior art keywords
transmitted · audio frame · audio · level · redundant
Prior art date
Application number
PCT/CN2020/120300
Other languages
English (en)
French (fr)
Inventor
梁俊斌 (Liang Junbin)
Original Assignee
腾讯科技(深圳)有限公司 (Tencent Technology (Shenzhen) Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology (Shenzhen) Co., Ltd. (腾讯科技(深圳)有限公司)
Publication of WO2021098405A1
Priority to US17/513,736 (published as US11798566B2)

Classifications

    • H04L 1/22 — Arrangements for detecting or preventing errors in the information received, using redundant apparatus to increase reliability
    • G10L 19/005 — Correction of errors induced by the transmission channel, if related to the coding algorithm
    • H04L 1/08 — Arrangements for detecting or preventing errors by repeating transmission, e.g. Verdan system
    • G10L 19/0017 — Lossless audio signal coding; perfect reconstruction of coded audio signal by transmission of coding error
    • G10L 19/167 — Audio streaming, i.e. formatting and decoding of an encoded audio signal representation into a data stream for transmission or storage purposes
    • G10L 25/48 — Speech or voice analysis techniques specially adapted for particular use
    • H04L 1/0011 — Systems modifying transmission characteristics according to link quality, by adapting the channel coding applied to payload information
    • H04L 1/0017 — Systems modifying transmission characteristics according to link quality, where the mode switching is based on Quality of Service requirements
    • H04L 1/1819 — Hybrid automatic repeat request [HARQ] with retransmission of additional or different redundancy
    • G10L 25/21 — Speech or voice analysis techniques characterised by the extracted parameters being power information
    • G10L 25/78 — Detection of presence or absence of voice signals
    • H04L 1/004 — Arrangements for detecting or preventing errors by using forward error control
    • H04L 1/18 — Automatic repetition systems, e.g. Van Duuren systems
    • H04L 47/32 — Flow control or congestion control by discarding or delaying data units, e.g. packets or frames

Definitions

  • the present disclosure relates to the field of network technology, and in particular to a data transmission method, device, terminal, and storage medium.
  • terminals can communicate with each other through VoIP (Voice over Internet Protocol) technology. Since the Internet is an unreliable transmission network, audio data that the sender transmits over the Internet is prone to packet loss.
  • VoIP — Voice over Internet Protocol (voice transmission over the Internet Protocol)
  • FEC — Forward Error Correction
  • PLC — Packet Loss Concealment
  • ARQ — Automatic Repeat Request
  • the embodiments of the present disclosure provide a data transmission method, device, terminal, and storage medium.
  • the technical solutions are as follows:
  • a data transmission method includes: performing voice criticality analysis on the audio to be transmitted to obtain the criticality level of at least one audio frame in it; obtaining the number of redundant transmissions of each audio frame according to the current redundancy multiple and the redundancy transmission factor corresponding to its criticality level; and copying each audio frame according to its number of redundant transmissions to obtain redundant data packets that are sent to the target terminal;
  • the current redundancy multiple is determined based on the current packet loss situation of the target terminal;
  • a data transmission device which includes:
  • An analysis module configured to perform voice criticality analysis on the audio to be transmitted to obtain the criticality level of at least one audio frame to be transmitted in the to-be-transmitted audio, where the criticality level is used to measure the amount of information carried by the audio frame;
  • the obtaining module is configured to obtain the number of redundant transmissions of the at least one audio frame to be transmitted according to the current redundancy multiple and the redundancy transmission factor corresponding to the criticality level, where the redundancy transmission factor is positively correlated with the criticality level, and the current redundancy multiple is determined based on the current packet loss situation of the target terminal;
  • the sending module is configured to copy the at least one audio frame to be transmitted according to the number of redundant sending times to obtain at least one redundant data packet, and send the at least one redundant data packet to the target terminal.
  • in one aspect, a terminal includes one or more processors and one or more memories in which at least one piece of program code is stored; the processor loads and executes the program code to implement the operations performed by the data transmission method in any of the foregoing possible implementations.
  • in one aspect, a storage medium stores at least one piece of program code, which is loaded and executed by a processor to implement the operations performed by the data transmission method in any of the foregoing possible implementations.
  • FIG. 1 is a schematic diagram of an implementation environment of a data transmission method provided by an embodiment of the present disclosure
  • FIG. 2 is a flowchart of a data transmission method provided by an embodiment of the present disclosure
  • FIG. 3 is a schematic diagram of a data transmission method provided by an embodiment of the present disclosure.
  • FIG. 4 is a schematic structural diagram of a data transmission device provided by an embodiment of the present disclosure.
  • FIG. 5 is a schematic structural diagram of a terminal provided by an embodiment of the present disclosure.
  • VoIP — Voice over Internet Protocol
  • IP — Internet Protocol
  • the sender device encodes and compresses audio data through an audio compression algorithm, packs the compressed audio data into audio data packets according to the network transmission protocol standard, and sends the packets over the IP network to the target IP address of the receiving end device; the receiving end device parses and decompresses the audio data packets to restore the original voice signal, thereby transmitting the voice signal through the Internet.
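As an illustration of the pack/parse step above, the following sketch builds a minimal packet with a sequence number and timestamp in front of the compressed payload; the 6-byte header layout is an assumed simplification of a real transport header such as RTP, not a format specified by this disclosure.

```python
import struct

def pack_audio_packet(seq: int, timestamp: int, payload: bytes) -> bytes:
    """Toy packetization: a minimal RTP-like header (16-bit sequence
    number + 32-bit timestamp) prepended to the compressed payload.
    Real VoIP stacks carry audio in RTP over UDP; this layout only
    illustrates the pack/parse round trip described above."""
    return struct.pack("!HI", seq, timestamp) + payload

def unpack_audio_packet(packet: bytes):
    """Parse the 6-byte header back out and return (seq, timestamp, payload)."""
    seq, timestamp = struct.unpack("!HI", packet[:6])
    return seq, timestamp, packet[6:]
```

A packet built by `pack_audio_packet(1, 960, frame_bytes)` parses back to the same fields on the receiving side.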
  • the embodiments of the present disclosure provide a data transmission method that can achieve a good anti-packet loss effect in various scenarios, which will be described in detail below.
  • Fig. 1 is a schematic diagram of an implementation environment of a data transmission method provided by an embodiment of the present disclosure.
  • this implementation environment may include a sending end device 101 and a receiving end device 102.
  • the sending end device 101 may be a terminal used by the first user, and an application supporting audio data transmission may be installed and running on the sending end device 101.
  • the application may be a call application, a social application, a live-streaming application, a food-delivery application, or a ride-hailing application; any of these can provide audio data transmission services through VoIP, voice broadcast, or live audio and video.
  • the receiving end device 102 may be a terminal used by a second user, and an application supporting audio data transmission may be installed and running on the receiving end device 102.
  • the application may be a call application, a social application, a live-streaming application, a food-delivery application, or a ride-hailing application; any of these can provide audio data transmission services through VoIP, voice broadcast, or live audio and video.
  • the sending end device 101 and the receiving end device 102 may be connected through a wired or wireless network.
  • the first user may trigger a call request for a VoIP call in the application on the sending end device 101, and the sending end device 101 sends the call request to the receiving end device 102 through the server;
  • the second user may trigger a call response in the application on the receiving end device 102; the call response indicates whether the VoIP call is answered or rejected, and the receiving end device 102 sends it to the sending end device 101 through the server;
  • the sending end device 101 receives and parses the call response; if the response indicates that the VoIP call is answered, a data channel for the VoIP call is established between the sending end device 101 and the receiving end device 102, and audio data is transmitted over that channel.
  • the server may be a computer device that provides VoIP call service.
  • the sending end device 101 and the receiving end device 102 respectively refer to the data sender and data receiver in a single data transmission.
  • during a VoIP call each device plays both roles: the sending end device 101 can also act as a data receiver and the receiving end device 102 can also act as a data sender, so the two devices send audio data to each other to achieve the interactive effect of a real-time voice call.
  • the applications installed on the sending end device 101 and the receiving end device 102 are the same, or the applications installed on the two terminals are the same type of application on different operating system platforms.
  • the sending end device 101 and the receiving end device 102 may each generally refer to one of multiple terminals; this embodiment takes only these two devices as examples.
  • the device types of the sending end device 101 and the receiving end device 102 are the same or different.
  • the device types include at least one of: smartphone, tablet, e-book reader, MP3 (Moving Picture Experts Group Audio Layer III) player, MP4 (Moving Picture Experts Group Audio Layer IV) player, laptop computer, and desktop computer.
  • the sending end device 101 and the receiving end device 102 may be smart phones or other handheld portable terminal devices.
  • the following description takes a smartphone as the example terminal.
  • the number of terminals may be larger or smaller: there may be only two, or, in a multi-party call scenario, dozens, hundreds, or more.
  • the embodiments of the present disclosure do not limit the number of terminals and device types.
  • Fig. 2 is a flowchart of a data transmission method provided by an embodiment of the present disclosure. Referring to FIG. 2, this embodiment can be applied to the interaction process between the sending end device 101 and the receiving end device 102 in the foregoing implementation environment.
  • the following takes the sending end device 101 as the terminal and the receiving end device 102 as the target terminal.
  • This embodiment includes the following steps:
  • Step 201: the terminal obtains audio to be transmitted.
  • the terminal may call the recording interface to record a piece of audio to be transmitted, or the terminal may select a piece of audio from locally pre-stored audio as the audio to be transmitted.
  • the embodiment of the present disclosure does not specifically limit the method of acquiring the audio to be transmitted.
  • a user can log in to an application that provides audio services on a terminal.
  • the terminal initiates a VoIP call request to the target terminal.
  • the application on the terminal calls the recording interface to record, and obtains the audio to be transmitted.
  • Step 202: the terminal acquires at least one of the energy change information, voice activity detection information, or fundamental frequency change information of any audio frame to be transmitted in the audio to be transmitted.
  • the audio to be transmitted may include at least one audio frame to be transmitted obtained by natural framing.
  • the terminal may also re-framing the audio to be transmitted to obtain at least one audio frame to be transmitted.
  • At least one audio frame to be transmitted may have an association in the time domain or the frequency domain.
  • when the terminal performs framing of the audio to be transmitted, it may first apply pre-emphasis to enhance the high-frequency components, and then divide the pre-emphasized audio through a window function into at least one audio frame of equal duration.
  • adjacent audio frames may overlap by a certain ratio, so that the edge features of each audio frame are not lost.
  • the window function may be at least one of a Hamming window, a Hanning window, or a rectangular window; the embodiment of the present disclosure does not specifically restrict the window type used for framing.
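The framing procedure above (pre-emphasis, then overlapping windowed frames) can be sketched as follows; the frame length, hop length, and pre-emphasis coefficient are illustrative values, not taken from this disclosure.

```python
import numpy as np

def frame_audio(samples, frame_len=320, hop_len=160, alpha=0.97):
    """Split audio into overlapping, windowed frames.

    frame_len=320 / hop_len=160 gives 20 ms frames with 50% overlap at
    16 kHz; alpha is the pre-emphasis coefficient. All three are
    assumed values for illustration.
    """
    # Pre-emphasis boosts high-frequency components: y[n] = x[n] - alpha * x[n-1]
    emphasized = np.append(samples[0], samples[1:] - alpha * samples[:-1])
    window = np.hamming(frame_len)
    frames = []
    # Overlapping frames preserve the edge features of each frame
    for start in range(0, len(emphasized) - frame_len + 1, hop_len):
        frames.append(emphasized[start:start + frame_len] * window)
    return np.array(frames)
```

For 1000 samples these parameters yield five overlapping 320-sample frames.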
  • the energy change information of an audio frame to be transmitted may be the short-term energy change between that frame and the preceding audio frame to be transmitted.
  • the voice activity detection (Voice Activity Detection, VAD) information of an audio frame to be transmitted may be the change in VAD value between that frame and the preceding audio frame to be transmitted;
  • the fundamental frequency change information of an audio frame to be transmitted may be the change in fundamental frequency value between that frame and the preceding audio frame to be transmitted.
  • the terminal may perform short-term energy detection on each audio frame to obtain its energy value; perform VAD detection based on features such as the short-term stationarity of the audio to obtain its VAD value; and perform fundamental frequency detection to obtain its fundamental frequency value. The fundamental frequency refers to the pitch, an important audio feature: different morphemes have different fundamental frequency values, and the fundamental frequency keeps changing during normal speech, so if it changes very little within a certain period, each audio frame in that period can be considered a noise frame.
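A minimal sketch of the three per-frame detectors described above. The disclosure does not specify the detectors themselves, so an energy-threshold VAD and an autocorrelation pitch estimate stand in here; the threshold and pitch search range are assumptions.

```python
import numpy as np

def frame_features(frame, sample_rate=16000, energy_vad_threshold=1e-3):
    """Return (energy, vad, f0) for one frame.

    The energy threshold and the 60-400 Hz pitch search range are
    illustrative stand-ins for the unspecified VAD and pitch detectors.
    """
    energy = float(np.sum(frame ** 2) / len(frame))   # short-term energy
    vad = 1 if energy > energy_vad_threshold else 0   # crude voice-activity decision
    # Autocorrelation-based fundamental frequency estimate
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = sample_rate // 400, sample_rate // 60    # lag range for 60-400 Hz
    lag = lo + int(np.argmax(ac[lo:hi])) if vad else 0
    f0 = sample_rate / lag if lag else 0.0
    return energy, vad, f0
```

A 200 Hz sine frame yields `vad == 1` and an f0 estimate near 200 Hz, while a silent frame yields `(0.0, 0, 0.0)`.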
  • Step 203: the terminal determines the criticality level of the audio frame to be transmitted according to at least one of its energy change information, voice activity detection information, or fundamental frequency change information.
  • the criticality level is used to measure the amount of information carried by the audio frame to be transmitted.
  • the amount of information carried by an audio frame is usually related to the possibility that it is a noise frame: the larger the amount of information carried, the less likely the frame is to be a noise frame.
  • the criticality level may include at least one level; as the levels descend from most to least critical, the amount of information carried by the audio frames at each level decreases.
  • the terminal can determine the criticality level of the audio frame to be transmitted in any of the following two ways:
  • Method 1: the terminal separately determines whether the audio frame to be transmitted meets the determination condition of each level; when the frame meets the conditions of at least one level, its criticality level is the most critical of those levels. The determination condition of each level is related to at least one of the energy change information, voice activity detection information, or fundamental frequency change information of the frame.
  • concretely, the terminal checks the condition of each level one by one: if the frame meets the condition of a single level only, that level is its criticality level; if it meets the conditions of several levels at once, the terminal takes the most critical of them. Because all levels' conditions are checked and the most critical match is selected, the conditions of different levels may overlap with each other without compromising the accuracy of the criticality level assigned to the frame.
  • Method 2: the terminal checks the determination conditions level by level in descending order of criticality, where each level's condition is again related to at least one of the energy change information, voice activity detection information, or fundamental frequency change information of the frame; if the frame meets the condition of the current level, that level is its criticality level, otherwise the terminal proceeds to determine whether the frame meets the condition of the next level.
  • starting from the most critical level means that as soon as a condition is met, no further levels need to be tested: if the frame does not meet the most critical condition, the terminal continues with the second most critical level, and so on. This avoids checking every frame against every level, optimizing the determination logic on the terminal and improving the efficiency of determining each frame's criticality level.
  • the criticality level may include the first level, the second level, the third level, and the fourth level.
  • the determination conditions of the four levels are described in detail below:
  • the first-level determination condition includes: the voice activity detection (VAD) value of the audio frame to be transmitted is 1, the VAD value of the preceding audio frame is 0, and the absolute difference between the fundamental frequency values of the two frames is greater than a target threshold.
  • the target threshold may be any value greater than or equal to 0; the embodiment of the present disclosure does not specifically limit its value.
  • a VAD transition from 0 to 1 indicates the frame is likely the starting frame of a transition from a noise (or silent) frame to a non-noise (or non-silent) frame; in addition, a large absolute difference in fundamental frequency indicates the frame is likely one where the voice pitch changes in the audio to be transmitted.
  • a frame that is both the starting frame of the transition from noise (or silence) to speech and a frame where the voice pitch changes carries the largest amount of information, so its criticality level is determined to be the first level, the most critical level.
  • the second-level determination conditions include: the VAD value of the audio frame to be transmitted is 1, the VAD value of the preceding audio frame is 1, and the energy value of the frame is greater than a target multiple of the preceding frame's energy value; or, the VAD value of the frame is 1, the VAD value of the preceding frame is 0, and the absolute difference between the two frames' fundamental frequency values is less than or equal to the target threshold.
  • the target multiple may be any value greater than or equal to 1; the embodiment of the present disclosure does not specifically limit its value.
  • when both VAD values are 1 and the frame's energy exceeds the target multiple of the preceding frame's energy, the frame is likely one that transitions from unvoiced to voiced sound in the audio to be transmitted.
  • when the VAD value goes from 0 to 1 but the fundamental frequency difference is within the target threshold, the frame, although unlikely to contain a voice pitch change, is still very likely the starting frame of a transition from a noise (or silent) frame to a non-noise (or non-silent) frame. Therefore, when a frame meets either of the above determination conditions, its criticality level is determined to be the second level.
  • the third-level determination condition includes: the VAD value of the audio frame to be transmitted is 1.
  • a VAD value of 1 indicates that the frame is more likely a non-noise (or non-silent) frame; therefore, when the VAD value of a frame is 1 and no higher-level condition is met, its criticality level is determined to be the third level.
  • the fourth-level determination condition includes: the VAD value of the audio frame to be transmitted is 0.
  • a VAD value of 0 indicates that the frame is likely a noise (or silent) frame; therefore, when the VAD value of a frame is 0, its criticality level is determined to be the fourth level.
  • the criticality level corresponding to each audio frame to be transmitted can be determined according to the above-mentioned method 1 or method 2.
  • the criticality levels may also be divided differently: the number of levels may be larger or smaller, and other determination conditions may be defined for each level; the embodiments of the present disclosure do not specifically limit the number of criticality levels or the determination conditions of each level.
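The four determination conditions above can be combined into Method 2's ordered check: test from the most critical level down and return at the first match. The target threshold (in Hz) and target multiple are assumed values, since the disclosure leaves them open.

```python
def criticality_level(vad, prev_vad, energy, prev_energy, f0, prev_f0,
                      f0_threshold=20.0, energy_multiple=2.0):
    """Method 2 ordering: check levels from most to least critical.

    f0_threshold and energy_multiple are illustrative stand-ins for the
    disclosure's unspecified 'target threshold' and 'target multiple'.
    """
    # Level 1: speech onset (VAD 0 -> 1) combined with a pitch change
    if vad == 1 and prev_vad == 0 and abs(f0 - prev_f0) > f0_threshold:
        return 1
    # Level 2: unvoiced-to-voiced energy jump, or speech onset without a pitch change
    if vad == 1 and ((prev_vad == 1 and energy > energy_multiple * prev_energy)
                     or (prev_vad == 0 and abs(f0 - prev_f0) <= f0_threshold)):
        return 2
    # Level 3: ordinary speech frame (VAD value 1)
    if vad == 1:
        return 3
    # Level 4: VAD value 0, likely a noise or silent frame
    return 4
```

Because each test returns immediately, overlapping conditions (a level-1 frame also satisfies level 3) are resolved in favor of the most critical level, as both Method 1 and Method 2 require.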
  • the above steps 202-203 take a single audio frame as an example of determining its criticality level; the terminal repeats them for each audio frame, thereby performing voice criticality analysis of the audio to be transmitted and obtaining the criticality level of each of its audio frames.
  • because every frame is assigned a criticality level, the subsequent steps can assign each frame a different modified redundancy multiple (that is, a different number of redundant transmissions): a larger modified redundancy multiple for frames with higher criticality levels and a smaller one for frames with lower levels. Compared with setting the same current redundancy multiple for all frames, this avoids occupying excessive system bandwidth and the network congestion that a uniform redundant multiple-transmission mechanism can cause, improving the anti-packet-loss effect of audio data transmission while avoiding congestion.
  • the terminal obtains a redundant multiple transmission factor corresponding to the criticality level of the audio frame to be transmitted.
• the criticality level has a positive correlation with the magnitude of the redundant multiple transmission factor, that is, the higher the criticality level, the greater the redundant multiple transmission factor, and the lower the criticality level, the smaller the redundant multiple transmission factor.
  • the redundancy factor can be any value greater than 0 and less than or equal to 1.
• according to the mapping relationship between criticality levels and redundant multiple transmission factors, the terminal can map the criticality level of any audio frame to be transmitted to the corresponding redundant multiple transmission factor. Optionally, the mapping relationship may be pre-stored locally on the terminal, downloaded from the server whenever the terminal needs to transmit audio data, or downloaded from the server periodically by the terminal. The server may also periodically update the mapping relationship and, after each update, deliver the updated mapping relationship to each terminal, so that the terminal replaces the existing mapping relationship with the updated one.
  • the terminal determines a value obtained by multiplying the redundant multiple transmission factor and the current redundancy multiple as the modified redundancy multiple of the audio frame to be transmitted.
  • the current redundancy multiple is determined based on the current packet loss situation of the target terminal, and the corrected redundancy multiple of each audio frame to be transmitted is used to indicate the number of redundant transmissions of each audio frame to be transmitted.
• the foregoing step 205 means that the terminal determines the value obtained by multiplying the redundant multiple transmission factor by the current redundancy multiple as the number of redundant transmissions of the audio frame to be transmitted.
  • the terminal can obtain the current packet loss situation of the target terminal from the target terminal.
  • the terminal determines the current redundancy multiple according to the current packet loss situation of the target terminal.
  • the current redundancy multiple is not static for the same target terminal. That is to say, as the current packet loss situation of the target terminal changes in different time periods, the current redundancy multiple of the target terminal will also change accordingly.
• the more serious the current packet loss situation (for example, the higher the packet loss rate), the larger the value of the current redundancy multiple; the better the current packet loss situation (for example, the lower the packet loss rate), the smaller the value of the current redundancy multiple.
  • the current packet loss situation may include at least one of the number of packets lost, the packet loss rate, network delay, or network fluctuation. The embodiment of the present disclosure does not specifically limit the content of the current packet loss situation.
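The mapping from the current packet loss situation to the current redundancy multiple can be sketched as follows. This is a minimal illustration: the disclosure does not fix concrete thresholds or multiples, so the values below are assumptions for demonstration only, and only the packet loss rate (one of the listed indicators) is used.

```python
def current_redundancy_multiple(packet_loss_rate: float) -> int:
    """Map the current packet loss rate (0.0-1.0) to a redundancy multiple.

    A more serious packet loss situation yields a larger multiple; a
    better situation yields a smaller one. Thresholds are illustrative.
    """
    if packet_loss_rate >= 0.30:
        return 4
    if packet_loss_rate >= 0.15:
        return 3
    if packet_loss_rate >= 0.05:
        return 2
    return 1
```

In a fuller sketch, network delay and network fluctuation could shift these thresholds; either the terminal or the target terminal may run this computation, as the text notes.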
  • the target terminal may also determine the current redundancy multiple of the target terminal by itself after counting the current packet loss situation, and then send the current redundancy multiple to the above-mentioned terminal.
• the embodiment of the present disclosure does not specifically limit whether the current redundancy multiple is determined by the terminal or by the target terminal.
• the terminal repeats the above steps 204-205 for each audio frame to be transmitted, obtains the redundant multiple transmission factor of each audio frame to be transmitted, and multiplies that factor by the current redundancy multiple to obtain the modified redundancy multiple (that is, the number of redundant transmissions) of each audio frame to be transmitted. In other words, according to the current redundancy multiple and the redundant multiple transmission factor corresponding to the criticality level of each audio frame to be transmitted, the terminal obtains the modified redundancy multiple, that is, the number of redundant transmissions, of at least one audio frame to be transmitted.
  • the modified redundancy multiple of different audio frames to be transmitted can be the same or different.
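Steps 204-205 can be sketched as follows. The per-level factor values are illustrative assumptions (the disclosure only requires each factor to lie in (0, 1] and to grow with criticality, level 1 being the most critical); rounding up to at least one transmission is likewise a sketch choice, not a requirement of the disclosure.

```python
# Assumed mapping: higher criticality level -> larger factor, each in (0, 1].
FACTOR_BY_LEVEL = {1: 1.0, 2: 0.75, 3: 0.5, 4: 0.25}

def modified_redundancy_multiple(level: int, current_multiple: int) -> int:
    """Step 205: factor * current redundancy multiple, as a whole number
    of redundant transmissions (at least one)."""
    factor = FACTOR_BY_LEVEL[level]
    return max(1, round(factor * current_multiple))
```

Different audio frames may thus end up with the same or different modified redundancy multiples, depending on their criticality levels and the shared current redundancy multiple.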
  • the terminal performs encoding processing on the audio to be transmitted to obtain an audio code stream, and sends the audio code stream to the target terminal.
  • the terminal can use any encoding method to encode the audio to be transmitted to obtain the audio code stream, the audio code stream can be encapsulated into at least one audio data packet, and the at least one audio data packet is added to the transmission of the network session In the queue, at least one audio data packet in the sending queue that carries the audio code stream is sent to the target terminal.
  • the encoding method adopted by the terminal may include any of the following: waveform encoding, parameter encoding, hybrid encoding, or FEC (Forward Error Correction) encoding.
• the embodiment of the present disclosure does not specifically limit the encoding method used for the audio to be transmitted.
• the audio code stream obtained by the terminal through encoding can also be in any format, such as the PCM (Pulse Code Modulation) format, the MP3 (Moving Picture Experts Group Audio Layer III) format, the OGG (Ogg Vorbis, an audio compression format) format, etc.
  • the embodiments of the present disclosure also do not specifically limit the format of the audio code stream.
• the terminal and the target terminal may pre-agree on one or more formats for the audio code stream transmitted between the two parties, so that the terminal encodes the to-be-transmitted audio into an audio code stream in the pre-agreed format or formats; for example, the one or more formats may be audio compression formats supported by both the terminal and the target terminal.
• step 206 can be performed before or after any of the above steps 202-205; that is, the embodiment of the present disclosure does not specifically limit the execution timing of step 206 relative to steps 202-205.
  • the terminal copies each audio frame to be transmitted in the audio code stream multiple times, until the copy times of each audio frame to be transmitted reach the corrected redundancy multiple of each audio frame to be transmitted, and multiple redundant audio frames are obtained.
  • the above step 207 that is, the terminal copies at least one audio frame to be transmitted in the audio code stream multiple times, until the number of copies reaches the redundant transmission times of the at least one audio frame to be transmitted, and multiple redundant audio frames are obtained.
• for example, if the modified redundancy multiple of an audio frame to be transmitted is r, the terminal copies that audio frame in the audio code stream r times to obtain r redundant audio frames, and so on: each audio frame to be transmitted is copied the number of times indicated by its modified redundancy multiple, yielding that many redundant audio frames. The set of redundant audio frames of all the audio frames to be transmitted constitutes the aforementioned multiple redundant audio frames.
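The duplication in step 207 can be sketched as follows. The representation of an encoded frame as a (sequence number, payload) pair is an assumption made for illustration; the disclosure does not prescribe a frame data structure.

```python
def duplicate_frames(frames, multiples):
    """Copy each frame until its copy count reaches its modified
    redundancy multiple (step 207).

    frames:    list of (seq, payload) pairs from the audio code stream
    multiples: dict mapping seq -> modified redundancy multiple
    Returns the flat list of redundant audio frames.
    """
    redundant = []
    for seq, payload in frames:
        # A frame with multiple r contributes r redundant copies.
        redundant.extend([(seq, payload)] * multiples[seq])
    return redundant
```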
• the audio frames to be transmitted with different criticality levels have different modified redundancy multiples, and therefore also correspond to different copy counts; that is, audio frames to be transmitted with different criticality levels each have a different number of redundant audio frames. Therefore, for noise frames (or silent frames) with lower criticality levels, the smaller the modified redundancy multiple, the fewer the copies, the fewer the redundant audio frames, and the less system bandwidth is occupied. This is because noise frames (or silent frames) will not have a large adverse effect on the audio data transmission effect even if network packet loss occurs, so there is no need to allocate more system bandwidth resources to ensure the transmission reliability of these audio frames to be transmitted.
• for non-noise frames (or non-silent frames) with higher criticality levels, the greater the modified redundancy multiple, the more copies and redundant audio frames there are, and the more system bandwidth resources are occupied. This is because when these audio frames to be transmitted suffer network packet loss, they have a greater adverse effect on the audio data transmission effect; therefore, under limited system bandwidth resources, more bandwidth can be allocated to ensure the transmission reliability of these audio frames to be transmitted, ensuring the rationality of system bandwidth resource allocation and avoiding network congestion caused by the redundant multiple transmission mechanism.
• the terminal encapsulates the multiple redundant audio frames into at least one redundant data packet, and sends the at least one redundant data packet carrying the redundant audio frames of each audio frame to be transmitted to the target terminal.
  • the terminal can add the at least one redundant data packet to the sending queue of the network session.
• the redundant data packets in the sending queue can be sorted according to a certain rule.
  • the ordering rule can be as follows: Obtain the cumulative weights of the criticality levels of all redundant audio frames carried in each redundant data packet, and perform the evaluation on each redundant data packet in the order of the cumulative weight from high to low. Sorting, where the higher the criticality level, the higher the weight, and the cumulative weight of a redundant data packet is the sum of the weights of the criticality levels of all redundant audio frames in the redundant data packet.
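The cumulative-weight ordering rule can be sketched as follows. The weight table (higher criticality level, higher weight) is an illustrative assumption; the disclosure only fixes the monotonic relationship and the sum-then-sort rule, not concrete weight values.

```python
# Assumed weights: level 1 is most critical and carries the largest weight.
WEIGHT_BY_LEVEL = {1: 4, 2: 3, 3: 2, 4: 1}

def sort_redundant_packets(packets):
    """Sort redundant data packets by descending cumulative weight.

    packets: list of packets, each represented (for illustration) as the
    list of criticality levels of the redundant audio frames it carries.
    A packet's cumulative weight is the sum of its frames' level weights.
    """
    return sorted(
        packets,
        key=lambda levels: sum(WEIGHT_BY_LEVEL[l] for l in levels),
        reverse=True,
    )
```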
  • the terminal respectively copies each audio frame to be transmitted according to the modified redundancy multiple of each audio frame to be transmitted to obtain at least one redundant data packet, and sends to the target terminal the data of each audio frame to be transmitted The at least one redundant data packet.
  • the terminal copies the at least one audio frame to be transmitted according to the number of redundant transmissions to obtain at least one redundant data packet, and sends the at least one redundant data packet to the target terminal.
• in this way, not only the audio code stream but also each redundant data packet is transmitted to the target terminal. Such a redundant multiple transmission mechanism can better resist the risk of packet loss during Internet transmission and improves the anti-packet-loss performance in various scenarios.
• when at least one audio data packet carrying the audio code stream and at least one redundant data packet carrying the multiple redundant audio frames are both added to the buffer area of the sending queue, the at least one audio data packet may be placed before the at least one redundant data packet in the sending sequence, so that the audio data packets are sent to the target terminal first. If packet loss occurs, the target terminal can obtain, from the subsequently received redundant data packets, the redundant audio frames corresponding to the lost audio frames to be transmitted, so that the anti-packet-loss performance of the target terminal is improved. Thus, whether in scenarios with heavy packet loss or in scenarios with high real-time requirements, the anti-packet-loss performance of data transmission is greatly improved without causing network congestion.
• the method provided by the embodiment of the present disclosure performs voice criticality analysis on the audio to be transmitted to obtain the criticality level of each audio frame to be transmitted, and obtains the modified redundancy multiple of each audio frame to be transmitted according to the current redundancy multiple and the redundant multiple transmission factor corresponding to each frame's criticality level. The current redundancy multiple is determined based on the current packet loss situation of the target terminal, and the modified redundancy multiple of each audio frame to be transmitted indicates its number of redundant transmissions. The criticality level is used to measure the amount of information carried by the audio frame; different criticality levels correspond to different redundant multiple transmission factors, and the criticality level is positively correlated with the magnitude of the factor. That is, the higher the criticality level of an audio frame to be transmitted, the greater the value of its redundant multiple transmission factor, and thus the greater its modified redundancy multiple. Each audio frame to be transmitted is then copied according to its modified redundancy multiple, so that different audio frames can have different copy counts, yielding at least one redundant data packet. By sending the at least one redundant data packet to the target terminal, the probability that the target terminal receives each audio frame to be transmitted is increased through the redundant multiple transmission mechanism, thereby combating packet loss in data transmission. Because audio frames carrying different amounts of information are assigned different modified redundancy multiples, network congestion caused by the redundant multiple transmission mechanism can be avoided, and the resource allocation scheme of the system bandwidth is optimized under the condition of limited system bandwidth resources.
• FIG. 3 is a schematic diagram of a data transmission method provided by an embodiment of the present disclosure. Referring to FIG. 3, it shows the audio data transmission process between the sending end device and the receiving end device. After the sending end device obtains the recording to be transmitted, on the one hand, it encodes the recording to generate an audio code stream; on the other hand, it performs voice criticality analysis on the recording to determine the criticality level of each audio frame to be transmitted in it, where audio frames with different criticality levels correspond to different redundant multiple transmission factors.
• the sending end device determines the current redundancy multiple that matches the current packet loss situation according to the receiving end device's feedback on that situation, and obtains the modified redundancy multiple of each audio frame to be transmitted according to the current redundancy multiple and the redundant multiple transmission factor corresponding to each frame's criticality level. The above process of determining the modified redundancy multiple of each audio frame to be transmitted can also be referred to as the "redundant multiple transmission decision".
• the sending end device makes multiple copies of each audio frame to be transmitted in the encoded audio code stream based on its modified redundancy multiple, so that the number of copies of each audio frame equals its modified redundancy multiple, encapsulates the copied audio frames into at least one redundant data packet, and, after sorting the redundant data packets according to a certain rule, distributes them evenly in the sending queue corresponding to the receiving end device.
  • the sending end device sends the audio data packet carrying the audio code stream and at least one redundant data packet to the receiving end device through the network.
• since the audio code stream and the redundant code stream (that is, the code stream formed by the at least one redundant data packet obtained by copying) actually carry the same code stream data, for the same piece of audio data the receiving end device only needs to obtain either the audio data packet or the redundant data packet from the network. The receiving end device sorts the data packets according to the sequence number of each data packet (audio data packet or redundant data packet); if it detects that multiple data packets with the same sequence number have been received, it retains any one data packet with that sequence number and filters out the other repeated data packets. After the filtering is completed, the receiving end device decodes the sorted data packets through the decoder to obtain the sound signal, which it can then play. The receiving end device can also collect statistics on the current packet loss situation, such as the packet loss rate, network delay, and network fluctuation, which serve as the feedback used to adjust the current redundancy multiple.
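The receiving-side filtering and sorting can be sketched as follows. Representing a packet as a (sequence number, payload) pair is an assumption for illustration; the key behavior is that any one packet per sequence number is retained, duplicates are dropped, and the survivors are ordered by sequence number before decoding.

```python
def filter_and_sort(received):
    """Deduplicate and order received packets (audio or redundant).

    received: iterable of (seq, payload) pairs in arrival order.
    Returns one packet per sequence number, sorted by sequence number.
    """
    seen = {}
    for seq, payload in received:
        if seq not in seen:  # retain any one data packet with this seq
            seen[seq] = payload
    return [(seq, seen[seq]) for seq in sorted(seen)]
```

Note this sketch ignores sequence-number wraparound, which a real RTP-style receiver would have to handle.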
• by analyzing the criticality (also called importance) of each audio frame to be transmitted, some non-speech frames, such as silent frames and noise frames, are classified as non-key frames; these non-key frames will not affect the audio transmission effect (such as the VoIP call effect) even if they suffer network packet loss.
• voice key frames, such as the voice start frame at the transition from a non-voice frame to a voice frame and voice frames whose pitch has changed, are key frames due to the large amount of information they carry. Based on the redundant multiple transmission decision, a key frame can be assigned a higher modified redundancy multiple to ensure the reliability of its transmission, that is, to ensure that the data packet containing the key frame is not lost, so as to solve the network packet loss problem for key frames. In this way, a system with limited bandwidth can achieve a good anti-packet-loss effect through a reasonable resource allocation plan, avoiding the network congestion that redundant multiple transmission decisions may cause, as well as the more serious network packet loss problems that such congestion could bring.
• a redundant multiple transmission method applied to audio transmission services is provided.
• the modified redundancy multiple of each audio frame to be transmitted can not only be adjusted at any time as the quality of the transmission network changes, but also varies with the criticality of the audio content currently being transmitted. In this way, the transmission of highly critical audio frames can be better guaranteed, while audio frames of low criticality, which have little impact on sound quality, are transmitted with a smaller modified redundancy multiple, making effective use of network bandwidth resources.
• the redundant multiple transmission mechanism provided by the embodiments of the present disclosure can ensure high-quality audio data transmission and achieve highly reliable real-time audio transmission for services such as VoIP, audio broadcasting, and audio and video live broadcasts. It not only has an excellent anti-packet-loss effect in scenarios with continuous bursts of heavy packet loss, but also significantly improves the anti-packet-loss effect in scenarios with high real-time requirements.
  • FIG. 4 is a schematic structural diagram of a data transmission device provided by an embodiment of the present disclosure. Referring to FIG. 4, the device includes:
• the analysis module 401 is used to perform voice criticality analysis on the audio to be transmitted to obtain the criticality level of each audio frame to be transmitted in the to-be-transmitted audio, where the criticality level is used to measure the amount of information carried by the audio frame, different criticality levels correspond to different redundant multiple transmission factors, and the criticality level is positively correlated with the magnitude of the redundant multiple transmission factor;
  • the analysis module 401 is configured to perform voice criticality analysis on the audio to be transmitted to obtain the criticality level of at least one audio frame to be transmitted in the audio to be transmitted, and the criticality level is used to measure the information carried by the audio frame the amount;
• the obtaining module 402 is configured to obtain the modified redundancy multiple of each audio frame to be transmitted according to the current redundancy multiple and the redundant multiple transmission factor corresponding to the criticality level of each audio frame to be transmitted, where the current redundancy multiple is determined based on the current packet loss situation of the target terminal and the modified redundancy multiple of each audio frame to be transmitted is used to indicate its number of redundant transmissions;
• the obtaining module 402 is configured to obtain the number of redundant transmissions of the at least one audio frame to be transmitted according to the current redundancy multiple and the redundant multiple transmission factor corresponding to the criticality level, where the criticality level is positively correlated with the magnitude of the redundant multiple transmission factor and the current redundancy multiple is determined based on the current packet loss situation of the target terminal;
  • the sending module 403 is configured to copy each audio frame to be transmitted according to the corrected redundancy multiple of each audio frame to be transmitted to obtain at least one redundant data packet, and send the at least one of each audio frame to be transmitted to the target terminal Redundant data packet;
  • the sending module 403 is configured to copy the at least one audio frame to be transmitted according to the number of redundant transmissions to obtain at least one redundant data packet, and send the at least one redundant data packet to the target terminal.
• the device provided by the embodiment of the present disclosure performs voice criticality analysis on the audio to be transmitted to obtain the criticality level of each audio frame to be transmitted, and obtains the modified redundancy multiple of each audio frame to be transmitted according to the current redundancy multiple and the redundant multiple transmission factor corresponding to each frame's criticality level. The current redundancy multiple is determined based on the current packet loss situation of the target terminal, and the modified redundancy multiple of each audio frame to be transmitted indicates its number of redundant transmissions. The criticality level is used to measure the amount of information carried by the audio frame; different criticality levels correspond to different redundant multiple transmission factors, and the criticality level is positively correlated with the magnitude of the factor. That is, the higher the criticality level of an audio frame to be transmitted, the greater the value of its redundant multiple transmission factor, and thus the greater its modified redundancy multiple. Each audio frame to be transmitted is then copied according to its modified redundancy multiple, so that different audio frames can have different copy counts, yielding at least one redundant data packet. By sending the at least one redundant data packet to the target terminal, the probability that the target terminal receives each audio frame to be transmitted is increased through the redundant multiple transmission mechanism, thereby combating packet loss in data transmission. Because audio frames carrying different amounts of information are assigned different modified redundancy multiples, network congestion caused by the redundant multiple transmission mechanism can be avoided, and the resource allocation scheme of the system bandwidth is optimized under the condition of limited system bandwidth resources.
  • the analysis module 401 includes:
  • the acquiring unit is configured to acquire at least one of energy change information, voice activity detection information, or fundamental frequency change information of any audio frame to be transmitted in the audio frame to be transmitted;
  • the determining unit is configured to determine the criticality level of the audio frame to be transmitted according to at least one of energy change information, voice activity detection information, or fundamental frequency change information of the audio frame to be transmitted.
• the determining unit is used to: when the audio frame to be transmitted meets the determination condition of at least one level, acquire the criticality level of the audio frame to be transmitted as the most critical among the at least one level whose determination condition is met, where the determination condition of each level is related to at least one of the energy change information, voice activity detection information, or fundamental frequency change information of the audio frame to be transmitted.
• the determining unit is used to: traverse the at least one level, where the determination condition of each level is related to at least one of the energy change information, voice activity detection information, or fundamental frequency change information of the audio frame to be transmitted; and when the audio frame to be transmitted meets the determination condition of the current level being traversed, acquire the criticality level of the audio frame to be transmitted as the current level.
  • the criticality level includes the first level, the second level, the third level, and the fourth level;
• the determination conditions of the first level include: the voice activity detection value of the audio frame to be transmitted is 1, the voice activity detection value of the previous audio frame to be transmitted is 0, and the absolute value of the difference between the fundamental frequency values of the audio frame to be transmitted and the previous audio frame to be transmitted is greater than the target threshold;
• the determination conditions of the second level include: the voice activity detection value of the audio frame to be transmitted is 1, the voice activity detection value of the previous audio frame to be transmitted is 1, and the energy value of the audio frame to be transmitted is greater than the target multiple of the energy value of the previous audio frame to be transmitted; or, the voice activity detection value of the audio frame to be transmitted is 1, the voice activity detection value of the previous audio frame to be transmitted is 0, and the absolute value of the difference between the fundamental frequency values of the audio frame to be transmitted and the previous audio frame to be transmitted is less than or equal to the target threshold;
  • the third-level determination condition includes: the voice activity detection value of the to-be-transmitted audio frame is 1;
  • the determination condition of the fourth level includes: the voice activity detection value of the to-be-transmitted audio frame is 0.
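The four-level decision above can be sketched as a single function, checked from the most critical level down. The argument names, and the default `target_threshold` and `target_multiple` values, are illustrative assumptions; the disclosure leaves those parameters unspecified.

```python
def criticality_level(vad, prev_vad, energy, prev_energy,
                      pitch, prev_pitch,
                      target_threshold=50.0, target_multiple=2.0):
    """Return the criticality level (1 = most critical) of a frame.

    vad / prev_vad:       voice activity detection values (0 or 1)
    energy / prev_energy: frame energy values
    pitch / prev_pitch:   fundamental frequency values
    """
    # First level: voice start with a large fundamental-frequency change.
    if vad == 1 and prev_vad == 0 and abs(pitch - prev_pitch) > target_threshold:
        return 1
    # Second level: large energy jump within voice, or a voice start
    # without a large fundamental-frequency change.
    if vad == 1 and ((prev_vad == 1 and energy > target_multiple * prev_energy)
                     or (prev_vad == 0
                         and abs(pitch - prev_pitch) <= target_threshold)):
        return 2
    # Third level: any other non-noise (non-silent) frame.
    if vad == 1:
        return 3
    # Fourth level: VAD value is 0 (noise or silent frame).
    return 4
```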
  • the obtaining module 402 is used to:
• the value obtained by multiplying the redundant multiple transmission factor corresponding to the criticality level of the audio frame to be transmitted by the current redundancy multiple is determined as the modified redundancy multiple (that is, the number of redundant transmissions) of the audio frame to be transmitted.
• the sending module 403 is used to: copy each audio frame to be transmitted in the audio code stream multiple times, until the copy count of each audio frame to be transmitted reaches its modified redundancy multiple, to obtain multiple redundant audio frames; and encapsulate the multiple redundant audio frames into the at least one redundant data packet.
  • the sending module 403 is used for:
  • At least one audio frame to be transmitted in the audio code stream is copied multiple times, until the number of copies reaches the redundant transmission times of the at least one audio frame to be transmitted, and multiple redundant audio frames are obtained;
  • the multiple redundant audio frames are encapsulated into the at least one redundant data packet.
• when the data transmission device provided in the above embodiment transmits data, only the division of the above functional modules is used as an example for illustration. In practical applications, the above functions can be allocated to different functional modules as needed, that is, the internal structure of the terminal device is divided into different functional modules to complete all or part of the functions described above.
  • the data transmission device and the data transmission method embodiment provided in the above embodiment belong to the same concept, and the implementation process is detailed in the data transmission method embodiment, which will not be repeated here.
  • FIG. 5 is a schematic structural diagram of a terminal provided by an embodiment of the present disclosure.
• the terminal 500 may be: a smart phone, a tablet computer, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a laptop, or a desktop computer.
  • the terminal 500 may also be called user equipment, portable terminal, laptop terminal, desktop terminal and other names.
  • the terminal 500 includes a processor 501 and a memory 502.
  • the processor 501 may include one or more processing cores, such as a 4-core processor, an 8-core processor, and so on.
• the processor 501 may be implemented in at least one hardware form among DSP (Digital Signal Processing), FPGA (Field-Programmable Gate Array), and PLA (Programmable Logic Array).
  • the processor 501 may also include a main processor and a coprocessor.
• the main processor is a processor used to process data in the awake state, also called a CPU (Central Processing Unit); the coprocessor is a low-power processor used to process data in the standby state.
• the processor 501 may be integrated with a GPU (Graphics Processing Unit), and the GPU is used to render and draw content that needs to be displayed on the display screen.
  • the processor 501 may further include an AI (Artificial Intelligence) processor, and the AI processor is used to process computing operations related to machine learning.
  • the memory 502 may include one or more computer-readable storage media, which may be non-transitory.
  • the memory 502 may also include high-speed random access memory and non-volatile memory, such as one or more magnetic disk storage devices and flash memory storage devices.
  • the non-transitory computer-readable storage medium in the memory 502 is used to store at least one instruction, and the at least one instruction is used to be executed by the processor 501 to implement the data transmission provided by the various embodiments of the present disclosure. method.
  • the terminal 500 may optionally further include: a peripheral device interface 503 and at least one peripheral device.
  • the processor 501, the memory 502, and the peripheral device interface 503 may be connected by a bus or a signal line.
  • Each peripheral device can be connected to the peripheral device interface 503 through a bus, a signal line, or a circuit board.
  • the peripheral device includes: at least one of a radio frequency circuit 504, a display screen 505, a camera component 506, an audio circuit 507, a positioning component 508, and a power supply 509.
  • the peripheral device interface 503 can be used to connect at least one peripheral device related to I/O (Input/Output) to the processor 501 and the memory 502.
  • in some embodiments, the processor 501, the memory 502, and the peripheral device interface 503 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 501, the memory 502, and the peripheral device interface 503 may be implemented on a separate chip or circuit board, which is not limited in this embodiment.
  • the radio frequency circuit 504 is used for receiving and transmitting RF (Radio Frequency, radio frequency) signals, also called electromagnetic signals.
  • the radio frequency circuit 504 communicates with a communication network and other communication devices through electromagnetic signals.
  • the radio frequency circuit 504 converts electrical signals into electromagnetic signals for transmission, or converts received electromagnetic signals into electrical signals.
  • the radio frequency circuit 504 includes: an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a user identity module card, and so on.
  • the radio frequency circuit 504 can communicate with other terminals through at least one wireless communication protocol.
  • the wireless communication protocol includes, but is not limited to: metropolitan area networks, various generations of mobile communication networks (2G, 3G, 4G, and 5G), wireless local area networks, and/or Wi-Fi (Wireless Fidelity) networks.
  • the radio frequency circuit 504 may also include a circuit related to NFC (Near Field Communication), which is not limited in the present disclosure.
  • the display screen 505 is used to display a UI (User Interface, user interface).
  • the UI can include graphics, text, icons, videos, and any combination thereof.
  • the display screen 505 also has the ability to collect touch signals on or above the surface of the display screen 505.
  • the touch signal may be input to the processor 501 as a control signal for processing.
  • the display screen 505 may also be used to provide virtual buttons and/or virtual keyboards, also called soft buttons and/or soft keyboards.
  • in some embodiments, there may be one display screen 505, provided on the front panel of the terminal 500; in other embodiments, there may be at least two display screens 505, respectively provided on different surfaces of the terminal 500 or in a folded design; in still other embodiments, the display screen 505 may be a flexible display screen arranged on a curved or folding surface of the terminal 500. Furthermore, the display screen 505 can also be set in a non-rectangular irregular pattern, that is, a special-shaped screen.
  • the display screen 505 may be made of materials such as LCD (Liquid Crystal Display) and OLED (Organic Light-Emitting Diode).
  • the camera assembly 506 is used to capture images or videos.
  • the camera assembly 506 includes a front camera and a rear camera.
  • the front camera is set on the front panel of the terminal, and the rear camera is set on the back of the terminal.
  • the camera assembly 506 may also include a flash.
  • the flash can be a single-color-temperature flash or a dual-color-temperature flash. A dual-color-temperature flash refers to a combination of a warm-light flash and a cold-light flash, which can be used for light compensation under different color temperatures.
  • the audio circuit 507 may include a microphone and a speaker.
  • the microphone is used to collect sound waves of the user and the environment, and convert the sound waves into electrical signals to be input to the processor 501 for processing, or input to the radio frequency circuit 504 to implement voice communication. For the purpose of stereo collection or noise reduction, there may be multiple microphones, which are respectively set in different parts of the terminal 500.
  • the microphone can also be an array microphone or an omnidirectional collection microphone.
  • the speaker is used to convert the electrical signal from the processor 501 or the radio frequency circuit 504 into sound waves.
  • the speaker can be a traditional thin-film speaker or a piezoelectric ceramic speaker.
  • when the speaker is a piezoelectric ceramic speaker, it can not only convert the electrical signal into sound waves audible to humans, but also convert the electrical signal into sound waves inaudible to humans for purposes such as distance measurement.
  • the audio circuit 507 may also include a headphone jack.
  • the positioning component 508 is used to locate the current geographic location of the terminal 500 to implement navigation or LBS (Location Based Service, location-based service).
  • the positioning component 508 may be a positioning component based on the GPS (Global Positioning System) of the United States, the BeiDou system of China, the GLONASS system of Russia, or the Galileo system of the European Union.
  • the power supply 509 is used to supply power to various components in the terminal 500.
  • the power source 509 may use alternating current, direct current, a disposable battery, or a rechargeable battery.
  • the rechargeable battery may support wired charging or wireless charging.
  • the rechargeable battery can also be used to support fast charging technology.
  • the terminal 500 further includes one or more sensors 510.
  • the one or more sensors 510 include, but are not limited to: an acceleration sensor 511, a gyroscope sensor 512, a pressure sensor 513, a fingerprint sensor 514, an optical sensor 515, and a proximity sensor 516.
  • the acceleration sensor 511 can detect the magnitude of acceleration on the three coordinate axes of the coordinate system established by the terminal 500.
  • the acceleration sensor 511 can be used to detect the components of gravitational acceleration on three coordinate axes.
  • the processor 501 may control the display screen 505 to display the user interface in a horizontal view or a vertical view according to the gravitational acceleration signal collected by the acceleration sensor 511.
  • the acceleration sensor 511 may also be used for the collection of game or user motion data.
  • the gyroscope sensor 512 can detect the body direction and rotation angle of the terminal 500, and the gyroscope sensor 512 can cooperate with the acceleration sensor 511 to collect the user's 3D actions on the terminal 500.
  • the processor 501 can implement the following functions according to the data collected by the gyroscope sensor 512: motion sensing (for example, changing the UI according to the user's tilt operation), image stabilization during shooting, game control, and inertial navigation.
  • the pressure sensor 513 may be arranged on the side frame of the terminal 500 and/or the lower layer of the display screen 505.
  • the processor 501 performs left and right hand recognition or quick operation according to the holding signal collected by the pressure sensor 513.
  • the processor 501 controls the operability controls on the UI interface according to the pressure operation of the user on the display screen 505.
  • the operability control includes at least one of a button control, a scroll bar control, an icon control, and a menu control.
  • the fingerprint sensor 514 is used to collect the user's fingerprint.
  • the processor 501 can identify the user's identity based on the fingerprint collected by the fingerprint sensor 514, or the fingerprint sensor 514 can identify the user's identity based on the collected fingerprint. When the user's identity is recognized as a trusted identity, the processor 501 authorizes the user to perform related sensitive operations, including unlocking the screen, viewing encrypted information, downloading software, making payments, and changing settings.
  • the fingerprint sensor 514 may be provided on the front, back or side of the terminal 500. When a physical button or a manufacturer logo is provided on the terminal 500, the fingerprint sensor 514 can be integrated with the physical button or the manufacturer logo.
  • the optical sensor 515 is used to collect the ambient light intensity.
  • the processor 501 may control the display brightness of the display screen 505 according to the ambient light intensity collected by the optical sensor 515.
  • the processor 501 may also dynamically adjust the shooting parameters of the camera assembly 506 according to the ambient light intensity collected by the optical sensor 515.
  • the proximity sensor 516, also called a distance sensor, is usually arranged on the front panel of the terminal 500.
  • the proximity sensor 516 is used to collect the distance between the user and the front of the terminal 500.
  • when the proximity sensor 516 detects that the distance between the user and the front of the terminal 500 gradually decreases, the processor 501 controls the display screen 505 to switch from the bright-screen state to the off-screen state; when the proximity sensor 516 detects that the distance between the user and the front of the terminal 500 gradually increases, the processor 501 controls the display screen 505 to switch from the off-screen state to the bright-screen state.
  • FIG. 5 does not constitute a limitation on the terminal 500, and may include more or fewer components than shown, or combine certain components, or adopt different component arrangements.
  • the terminal includes one or more processors and one or more memories, and at least one piece of program code is stored in the one or more memories, and the at least one piece of program code is loaded by the one or more processors And execute to achieve the following operations:
  • the number of redundant transmissions of the at least one audio frame to be transmitted is obtained according to the current redundancy multiple and the redundant multiple-transmission factor corresponding to the criticality level; the criticality level is positively correlated with the magnitude of the redundant multiple-transmission factor, and the current redundancy multiple is determined based on the current packet loss situation of the target terminal;
  • the at least one audio frame to be transmitted is copied according to the number of redundant transmissions to obtain at least one redundant data packet, and the at least one redundant data packet is sent to the target terminal.
  • the at least one piece of program code is loaded and executed by the one or more processors to implement the following operations:
  • for any audio frame to be transmitted, acquiring at least one of energy change information, voice activity detection information, or fundamental frequency change information of the audio frame to be transmitted;
  • the at least one piece of program code is loaded and executed by the one or more processors to implement the following operations:
  • the criticality level of the audio frame to be transmitted is acquired as the level with the highest criticality among the at least one level.
  • the determination condition of each level is related to at least one of energy change information, voice activity detection information, or fundamental frequency change information of the audio frame to be transmitted.
  • the at least one piece of program code is loaded and executed by the one or more processors to implement the following operations:
  • the determination conditions of each level are related to at least one of the energy change information, voice activity detection information, or fundamental frequency change information of the audio frame to be transmitted;
  • the criticality level of the audio frame to be transmitted is acquired as the current level;
  • the criticality level includes the first level, the second level, the third level, and the fourth level;
  • the determination conditions of the first level include: the voice activity detection value of the audio frame to be transmitted is 1, the voice activity detection value of the previous audio frame to be transmitted is 0, and the absolute value of the difference between the fundamental frequency values of the audio frame to be transmitted and the previous audio frame to be transmitted is greater than the target threshold;
  • the determination conditions of the second level include: the voice activity detection value of the audio frame to be transmitted is 1, the voice activity detection value of the previous audio frame to be transmitted is 1, and the energy value of the audio frame to be transmitted is greater than the target multiple of the energy value of the previous audio frame to be transmitted; or, the voice activity detection value of the audio frame to be transmitted is 1, the voice activity detection value of the previous audio frame to be transmitted is 0, and the absolute value of the difference between the fundamental frequency values of the audio frame to be transmitted and the previous audio frame to be transmitted is less than or equal to the target threshold;
  • the third-level determination condition includes: the voice activity detection value of the to-be-transmitted audio frame is 1;
  • the determination condition of the fourth level includes: the voice activity detection value of the to-be-transmitted audio frame is 0.
  • the at least one piece of program code is loaded and executed by the one or more processors to implement the following operations:
  • a value obtained by multiplying the redundancy multiple transmission factor corresponding to the criticality level of the audio frame to be transmitted by the current redundancy multiple is determined as the number of redundant transmissions of the audio frame to be transmitted.
  • the at least one piece of program code is loaded and executed by the one or more processors to implement the following operations:
  • the multiple redundant audio frames are encapsulated into the at least one redundant data packet.
  • a computer-readable storage medium such as a memory including at least one program code, which can be executed by a processor in a terminal to complete the data transmission method in the foregoing embodiment.
  • the computer-readable storage medium may be a ROM (Read-Only Memory), a RAM (Random-Access Memory), a CD-ROM (Compact Disc Read-Only Memory), a magnetic tape, a floppy disk, an optical data storage device, or the like.
  • the at least one piece of program code is loaded by the processor and performs the following operations:
  • the number of redundant transmissions of the at least one audio frame to be transmitted is obtained according to the current redundancy multiple and the redundant multiple-transmission factor corresponding to the criticality level; the criticality level is positively correlated with the magnitude of the redundant multiple-transmission factor, and the current redundancy multiple is determined based on the current packet loss situation of the target terminal;
  • the at least one audio frame to be transmitted is copied according to the number of redundant transmissions to obtain at least one redundant data packet, and the at least one redundant data packet is sent to the target terminal.
  • the at least one piece of program code is loaded by the processor and performs the following operations:
  • for any audio frame to be transmitted, acquiring at least one of energy change information, voice activity detection information, or fundamental frequency change information of the audio frame to be transmitted;
  • the at least one piece of program code is loaded by the processor and performs the following operations:
  • the criticality level of the audio frame to be transmitted is acquired as the level with the highest criticality among the at least one level.
  • the determination condition of each level is related to at least one of energy change information, voice activity detection information, or fundamental frequency change information of the audio frame to be transmitted.
  • the at least one piece of program code is loaded by the processor and performs the following operations:
  • the determination conditions of each level are related to at least one of the energy change information, voice activity detection information, or fundamental frequency change information of the audio frame to be transmitted;
  • the criticality level of the audio frame to be transmitted is acquired as the current level;
  • the criticality level includes the first level, the second level, the third level, and the fourth level;
  • the determination conditions of the first level include: the voice activity detection value of the audio frame to be transmitted is 1, the voice activity detection value of the previous audio frame to be transmitted is 0, and the absolute value of the difference between the fundamental frequency values of the audio frame to be transmitted and the previous audio frame to be transmitted is greater than the target threshold;
  • the determination conditions of the second level include: the voice activity detection value of the audio frame to be transmitted is 1, the voice activity detection value of the previous audio frame to be transmitted is 1, and the energy value of the audio frame to be transmitted is greater than the target multiple of the energy value of the previous audio frame to be transmitted; or, the voice activity detection value of the audio frame to be transmitted is 1, the voice activity detection value of the previous audio frame to be transmitted is 0, and the absolute value of the difference between the fundamental frequency values of the audio frame to be transmitted and the previous audio frame to be transmitted is less than or equal to the target threshold;
  • the third-level determination condition includes: the voice activity detection value of the to-be-transmitted audio frame is 1;
  • the determination condition of the fourth level includes: the voice activity detection value of the to-be-transmitted audio frame is 0.
  • the at least one piece of program code is loaded by the processor and performs the following operations:
  • a value obtained by multiplying the redundancy multiple transmission factor corresponding to the criticality level of the audio frame to be transmitted by the current redundancy multiple is determined as the number of redundant transmissions of the audio frame to be transmitted.
  • the at least one piece of program code is loaded by the processor and performs the following operations:
  • the multiple redundant audio frames are encapsulated into the at least one redundant data packet.
  • a computer program product including instructions is also provided, which, when run on a computer, causes the computer to execute any of the possible implementations of the data transmission methods provided in the foregoing embodiments, which will not be repeated here.

Abstract

The present disclosure discloses a data transmission method, apparatus, terminal, and storage medium, belonging to the field of network technology. The present disclosure performs speech criticality analysis on audio to be transmitted to obtain the criticality level of each audio frame to be transmitted in the audio; obtains the corrected redundancy multiple of each audio frame according to the current redundancy multiple and the redundant multiple-transmission factor corresponding to each frame's criticality level; duplicates each audio frame according to its corrected redundancy multiple to obtain at least one redundant data packet; and sends the at least one redundant data packet to the target terminal, thereby improving resistance to network packet loss without causing network congestion.

Description

Data transmission method, apparatus, terminal, and storage medium
The present disclosure claims priority to Chinese Patent Application No. 201911141212.0, entitled "Data transmission method, apparatus, terminal, and storage medium", filed on November 20, 2019, the entire contents of which are incorporated herein by reference.
Technical Field
The present disclosure relates to the field of network technology, and in particular to a data transmission method, apparatus, terminal, and storage medium.
Background
With the development of network technology, terminals can make calls with each other through VoIP (Voice over Internet Protocol) technology. Because the Internet is an unreliable transmission network, audio data transmitted by the sending end over the Internet is prone to packet loss.
At present, network packet loss can be resisted by means of FEC (Forward Error Correction), PLC (Packet Loss Concealment), or ARQ (Automatic Repeat Request). However, FEC and PLC perform poorly against large bursts of consecutive packet loss, and ARQ performs poorly in scenarios with high real-time requirements. Therefore, a data transmission method that improves resistance to network packet loss in all kinds of scenarios is urgently needed.
Summary
Embodiments of the present disclosure provide a data transmission method, apparatus, terminal, and storage medium. The technical solutions are as follows:
In one aspect, a data transmission method is provided, the method including:
performing speech criticality analysis on audio to be transmitted to obtain the criticality level of at least one audio frame to be transmitted in the audio, where the criticality level measures the amount of information carried by an audio frame;
obtaining the number of redundant transmissions of the at least one audio frame according to the current redundancy multiple and the redundant multiple-transmission factor corresponding to the criticality level, where the criticality level is positively correlated with the magnitude of the redundant multiple-transmission factor, and the current redundancy multiple is determined based on the current packet loss situation of the target terminal; and
duplicating the at least one audio frame according to the number of redundant transmissions to obtain at least one redundant data packet, and sending the at least one redundant data packet to the target terminal.
In one aspect, a data transmission apparatus is provided, the apparatus including:
an analysis module configured to perform speech criticality analysis on audio to be transmitted to obtain the criticality level of at least one audio frame to be transmitted in the audio, where the criticality level measures the amount of information carried by an audio frame;
an acquisition module configured to obtain the number of redundant transmissions of the at least one audio frame according to the current redundancy multiple and the redundant multiple-transmission factor corresponding to the criticality level, where the criticality level is positively correlated with the magnitude of the redundant multiple-transmission factor, and the current redundancy multiple is determined based on the current packet loss situation of the target terminal; and
a sending module configured to duplicate the at least one audio frame according to the number of redundant transmissions to obtain at least one redundant data packet, and send the at least one redundant data packet to the target terminal.
In one aspect, a terminal is provided, including one or more processors and one or more memories; at least one piece of program code is stored in the one or more memories, and the at least one piece of program code is loaded and executed by the one or more processors to implement the operations performed by the data transmission method in any of the above possible implementations.
In one aspect, a storage medium is provided, in which at least one piece of program code is stored; the at least one piece of program code is loaded and executed by a processor to implement the operations performed by the data transmission method in any of the above possible implementations.
Brief Description of the Drawings
In order to explain the technical solutions in the embodiments of the present disclosure more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present disclosure; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.
FIG. 1 is a schematic diagram of an implementation environment of a data transmission method provided by an embodiment of the present disclosure;
FIG. 2 is a flowchart of a data transmission method provided by an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of the principle of a data transmission method provided by an embodiment of the present disclosure;
FIG. 4 is a schematic structural diagram of a data transmission apparatus provided by an embodiment of the present disclosure;
FIG. 5 is a schematic structural diagram of a terminal provided by an embodiment of the present disclosure.
Detailed Description
To make the objectives, technical solutions, and advantages of the present disclosure clearer, the embodiments of the present disclosure are described in further detail below with reference to the drawings.
The terms involved in the embodiments of the present disclosure are explained below.
VoIP (Voice over Internet Protocol): a voice call mode based on an IP (Internet Protocol) network. Optionally, the sending end device encodes and compresses audio data with an audio compression algorithm, packages the encoded and compressed audio data according to a network transmission protocol standard to obtain audio data packets, and sends the audio data packets over the IP network to the target IP address corresponding to the receiving end device; the receiving end device parses and decompresses the audio data packets and restores them to the original voice signal, thereby achieving the purpose of transmitting voice signals over the Internet.
Because the Internet is not a reliable transmission network, its instability causes packet loss during audio transmission (that is, some or all audio data packets are lost in transit), so the receiving end device suffers stuttering or discontinuous sound, degrading the listener's user experience. Therefore, the main problem faced by audio applications carried on the Internet (such as VoIP, voice broadcast, and audio/video live streaming) is resisting packet loss during audio data transmission. In view of this, the embodiments of the present disclosure provide a data transmission method that achieves a good anti-packet-loss effect in all kinds of scenarios, which will be detailed below.
FIG. 1 is a schematic diagram of an implementation environment of a data transmission method provided by an embodiment of the present disclosure. Referring to FIG. 1, the implementation environment may include a sending end device 101 and a receiving end device 102.
The sending end device 101 may be a terminal used by a first user. An application supporting audio data transmission may be installed and run on the sending end device 101; the application may be any one of a call application, a social application, a live-streaming application, a food-delivery application, or a ride-hailing application, and may provide audio data transmission services through VoIP, voice broadcast, audio/video live streaming, and the like.
The receiving end device 102 may be a terminal used by a second user. An application supporting audio data transmission may be installed and run on the receiving end device 102; the application may be any one of a call application, a social application, a live-streaming application, a food-delivery application, or a ride-hailing application, and may provide audio data transmission services through VoIP, voice broadcast, audio/video live streaming, and the like.
The sending end device 101 and the receiving end device 102 may be connected through a wired or wireless network.
In an exemplary scenario, the first user may trigger a VoIP call request in the application on the sending end device 101, and the sending end device 101 may send the call request to the receiving end device 102 through a server; the second user triggers a call response in the application on the receiving end device 102, the call response indicating acceptance or rejection of the VoIP call, and the receiving end device 102 sends the call response to the sending end device 101 through the server; when the sending end device 101 receives the call response, it parses the response, and if the call response indicates acceptance of the VoIP call, a VoIP data channel is established between the sending end device 101 and the receiving end device 102, over which audio data is transmitted. The server may be a computer device providing a VoIP call service.
It should be noted that the sending end device 101 and the receiving end device 102 respectively refer to the data sender and the data receiver in one data transmission; during data interaction, the sending end device 101 may also act as a data receiver, and likewise the receiving end device 102 may act as a data sender. The sending end device 101 and the receiving end device 102 send audio data to each other during a VoIP call to achieve the interactive effect of a real-time voice call.
Optionally, the applications installed on the sending end device 101 and the receiving end device 102 are the same, or the applications installed on the two terminals are the same type of application on different operating system platforms. The sending end device 101 may generally refer to one of multiple terminals, and the receiving end device 102 may generally refer to one of multiple terminals; this embodiment is illustrated with only the sending end device 101 and the receiving end device 102. The device types of the sending end device 101 and the receiving end device 102 are the same or different, and include at least one of: a smart phone, a tablet computer, an e-book reader, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a laptop computer, and a desktop computer. For example, the sending end device 101 and the receiving end device 102 may be smart phones or other handheld portable terminal devices. In the following embodiments, a terminal that is a smart phone is used as an example.
Those skilled in the art will appreciate that the number of terminals may be larger or smaller. For example, in a two-party call scenario there may be only two terminals, while in a multi-party call scenario there may be dozens, hundreds, or more. The embodiments of the present disclosure do not limit the number or device type of the terminals.
FIG. 2 is a flowchart of a data transmission method provided by an embodiment of the present disclosure. Referring to FIG. 2, this embodiment may be applied to the interaction between the sending end device 101 and the receiving end device 102 in the above implementation environment. Taking the sending end device 101 as the terminal and the receiving end device 102 as the target terminal as an example, the embodiment includes the following steps:
201. The terminal acquires audio to be transmitted.
In the above process, the terminal may call a recording interface to record a piece of audio to be transmitted, or may select a piece of audio from locally prestored audio as the audio to be transmitted; the embodiments of the present disclosure do not specifically limit how the audio to be transmitted is acquired.
In an exemplary scenario, a user may log in to an application providing an audio service on the terminal. When a trigger operation on the VoIP call option of the application is detected, the terminal initiates a VoIP call request to the target terminal; when the target terminal returns an answer response to the VoIP call, the application on the terminal calls the recording interface to record, obtaining the audio to be transmitted.
202. For any audio frame to be transmitted in the audio, the terminal acquires at least one of energy change information, voice activity detection information, or fundamental frequency change information of the audio frame.
In the above process, the audio to be transmitted may include at least one audio frame obtained by natural framing. In some embodiments, the terminal may also re-frame the audio to obtain at least one audio frame to be transmitted, and the at least one audio frame may be correlated in the time domain or the frequency domain.
In some embodiments, when framing the audio to be transmitted, the terminal may first perform pre-emphasis processing on the audio to enhance its high-frequency components, and then use a window function to divide the pre-emphasized audio into at least one audio frame of equal duration. It should be noted that adjacent audio frames may overlap by a certain ratio, which ensures that the edge features of each audio frame are not lost. Optionally, the window function may be at least one of a Hamming window, a Hanning window, or a rectangular window; the embodiments of the present disclosure do not specifically limit the type of window function used for framing.
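The pre-emphasis and overlapping-window framing described above can be sketched as follows. This is a minimal illustration only: the 20 ms frame length, 50% overlap ratio, 16 kHz sample rate, and pre-emphasis coefficient 0.97 are assumptions chosen for the example, not values fixed by this disclosure.

```python
import math

def split_into_frames(audio, sample_rate=16000, frame_ms=20,
                      overlap=0.5, alpha=0.97):
    """Pre-emphasize a sample sequence, then split it into equal-length,
    partially overlapping frames weighted by a Hamming window."""
    # Pre-emphasis boosts high-frequency components: y[n] = x[n] - alpha * x[n-1]
    emphasized = [audio[0]] + [audio[n] - alpha * audio[n - 1]
                               for n in range(1, len(audio))]
    frame_len = int(sample_rate * frame_ms / 1000)   # samples per frame
    hop = int(frame_len * (1 - overlap))             # overlap keeps edge features intact
    # Hamming window coefficients
    window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(emphasized) - frame_len + 1, hop):
        frames.append([s * w for s, w in
                       zip(emphasized[start:start + frame_len], window)])
    return frames
```

With the assumed parameters, one second of 16 kHz audio yields 320-sample frames advanced by 160 samples, so each frame shares half its samples with its neighbor.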
In the above process, the energy change information of any audio frame to be transmitted may be the short-time energy change information between the audio frame and its previous audio frame, or the short-time average amplitude change information between them; the voice activity detection (VAD) information of any audio frame may be the VAD value change information between the audio frame and its previous audio frame; and the fundamental frequency change information of any audio frame may be the fundamental frequency value change information between the audio frame and its previous audio frame.
Optionally, the terminal may perform short-time energy detection on the at least one audio frame to obtain the energy value of each frame, perform VAD detection based on features such as short-time audio stationarity to obtain the VAD value of each frame, and perform fundamental frequency detection to obtain the fundamental frequency value of each frame. The fundamental frequency refers to pitch, an important feature of audio: different morphemes have different fundamental frequency values, and in a normal segment of speech the fundamental frequency value changes continuously. Therefore, if the fundamental frequency value changes very little within a period, the audio frames in that period can be considered noise frames.
203. The terminal determines the criticality level of the audio frame to be transmitted according to at least one of its energy change information, voice activity detection information, or fundamental frequency change information.
The criticality level measures the amount of information carried by an audio frame to be transmitted, which is generally related to the probability that the frame is a noise frame. That is, the greater the amount of information carried by a frame, the less likely the frame is to be a noise frame; the smaller the amount of information, the more likely the frame is to be a noise frame. Optionally, the criticality level may include at least one level, where levels ordered from high to low criticality correspond to frames carrying less and less information.
In some embodiments, for any audio frame to be transmitted, the terminal may determine its criticality level in either of the following two ways:
Way 1: The terminal determines whether the audio frame satisfies the determination conditions of each level; when the frame satisfies the determination conditions of at least one level, the criticality level of the frame is acquired as the level with the highest criticality among the at least one level, where the determination conditions of each level are related to at least one of the energy change information, voice activity detection information, or fundamental frequency change information of the frame.
That is, the terminal may check one by one whether the audio frame satisfies the determination conditions of each level. If the frame satisfies only one level's conditions, that level is acquired as its criticality level; if the frame simultaneously satisfies the conditions of multiple levels, the terminal may determine the level with the highest criticality among them as its criticality level. By checking all levels and selecting the level with the highest criticality, the terminal guarantees the accuracy of the criticality level assigned to the frame even when the determination conditions of different levels overlap.
Way 2: The terminal determines, in order of criticality from high to low, whether the audio frame satisfies the determination conditions of each level, where the determination conditions of each level are related to at least one of the energy change information, voice activity detection information, or fundamental frequency change information of the frame. If the frame satisfies the determination conditions of the current level, its criticality level is acquired as the current level; if not, the terminal proceeds to determine whether the frame satisfies the determination conditions of the next level.
That is, starting from the level with the highest criticality, if the frame satisfies that level's conditions, its criticality level is acquired as that level and no subsequent levels need to be checked; otherwise, the terminal continues with the level of second-highest criticality, and so on, which will not be repeated here. In this way, once the criticality level of a frame is determined, there is no need to check whether the frame satisfies the conditions of any lower level, which optimizes the terminal's decision logic for criticality levels and improves the efficiency of determining the criticality level of each audio frame.
In some embodiments, the criticality level may include a first level, a second level, a third level, and a fourth level. The determination conditions of the four levels are detailed below:
Optionally, the determination conditions of the first level include: the voice activity detection value (VAD value) of the audio frame to be transmitted is 1, the VAD value of the previous audio frame is 0, and the absolute value of the difference between the fundamental frequency values of the audio frame and the previous audio frame is greater than a target threshold.
The target threshold may be any value greater than or equal to 0; the embodiments of the present disclosure do not specifically limit its value.
Since the VAD value of the audio frame is 1 while that of the previous frame is 0, a VAD jump occurs at this frame, and the frame is very likely the starting frame of a transition from a noise frame (or silence frame) to a non-noise frame (or non-silence frame). Furthermore, a large absolute difference in fundamental frequency between the frame and the previous frame indicates that the frame is very likely one where the speech pitch changes. Therefore, when a frame satisfies all of the above conditions, it is both the starting frame of such a transition and a frame where the pitch changes, so its criticality level can be determined as the first level, the level of highest criticality, carrying the largest amount of information.
Optionally, the determination conditions of the second level include: the VAD value of the audio frame is 1, the VAD value of the previous audio frame is 1, and the energy value of the audio frame is greater than a target multiple of the energy value of the previous frame; or, the VAD value of the audio frame is 1, the VAD value of the previous frame is 0, and the absolute value of the difference between the fundamental frequency values of the frame and the previous frame is less than or equal to the target threshold.
The target multiple may be any value greater than or equal to 1; the embodiments of the present disclosure do not specifically limit its value.
On the one hand, if the VAD values of both frames are 1 and the energy of the frame exceeds the target multiple of the previous frame's energy, the frame is very likely a transition from unvoiced to voiced speech. On the other hand, if the VAD value of the frame is 1, that of the previous frame is 0, and the absolute fundamental-frequency difference is less than or equal to the target threshold, then although a pitch change is unlikely, the frame is still very likely the starting frame of a transition from a noise frame (or silence frame) to a non-noise frame (or non-silence frame). Therefore, when a frame satisfies the conditions of either aspect, its criticality level is determined as the second level.
Optionally, the determination condition of the third level includes: the VAD value of the audio frame is 1.
In the above process, a VAD value of 1 indicates that the frame is likely a non-noise frame (or non-silence frame); therefore, when a frame's VAD value is 1 and the frame does not meet the determination conditions of the first or second level, its criticality level is determined as the third level.
Optionally, the determination condition of the fourth level includes: the VAD value of the audio frame is 0.
In the above process, a VAD value of 0 indicates that the frame is likely a noise frame (or silence frame); therefore, when a frame's VAD value is 0, its criticality level is determined as the fourth level.
Through the above division into four criticality levels, the criticality level of each audio frame can be determined according to Way 1 or Way 2 above. Of course, in some embodiments, the criticality levels may be divided into more or fewer levels, and other determination conditions may be set for each level; the embodiments of the present disclosure do not specifically limit the number of criticality levels or the determination conditions of each level.
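The four-level decision above, checked in order of criticality from high to low (Way 2), can be sketched as follows. The target threshold of 20 Hz and target multiple of 2.0 are hypothetical example values; the disclosure only requires a threshold ≥ 0 and a multiple ≥ 1.

```python
def criticality_level(vad, prev_vad, energy, prev_energy, f0, prev_f0,
                      f0_threshold=20.0, energy_multiple=2.0):
    """Return 1 (highest criticality) through 4 (lowest) for one frame,
    checking the four determination conditions from high to low."""
    # First level: VAD jump 0 -> 1 together with a pitch change.
    if vad == 1 and prev_vad == 0 and abs(f0 - prev_f0) > f0_threshold:
        return 1
    # Second level: unvoiced-to-voiced energy jump, or a VAD jump without
    # a significant pitch change.
    if vad == 1 and ((prev_vad == 1 and energy > energy_multiple * prev_energy)
                     or (prev_vad == 0 and abs(f0 - prev_f0) <= f0_threshold)):
        return 2
    # Third level: ordinary speech frame (VAD value 1).
    if vad == 1:
        return 3
    # Fourth level: noise / silence frame (VAD value 0).
    return 4
```

Because the conditions are tested from the first level downward, a frame that satisfies several levels at once is assigned the highest-criticality one, matching both Way 1 and Way 2.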
It should be noted that steps 202-203 above use any one audio frame as an example of how to determine its criticality level. The terminal repeats steps 202-203 for each audio frame, thereby performing speech criticality analysis on the audio to be transmitted and obtaining the criticality level of at least one audio frame in the audio. Since each audio frame is assigned its own criticality level, different corrected redundancy multiples (that is, numbers of redundant transmissions) can be allocated to the frames in subsequent steps according to their levels: a larger corrected redundancy multiple for frames of higher criticality and a smaller one for frames of lower criticality. Compared with setting the same current redundancy multiple for all frames, this avoids occupying excessive system bandwidth resources and avoids network congestion caused by the redundant multiple-transmission mechanism, thereby improving the anti-packet-loss effect of audio data transmission while avoiding network congestion.
204. For any audio frame to be transmitted, the terminal acquires the redundant multiple-transmission factor corresponding to the criticality level of the frame.
For audio frames with different criticality levels, different criticality levels correspond to different redundant multiple-transmission factors, and the criticality level is positively correlated with the magnitude of the factor. That is, the higher the criticality level, the larger the factor, and the lower the level, the smaller the factor. The factor may be any value greater than 0 and less than or equal to 1.
For example, taking the four-level division above as an example, the factor a of each level may be configured as follows: first level a = 1, second level a = 0.7, third level a = 0.4, fourth level a = 0.2.
In the above process, the terminal may map the criticality level of any audio frame to the corresponding factor according to a mapping relationship between criticality levels and redundant multiple-transmission factors. Optionally, the mapping relationship may be prestored locally on the terminal, downloaded from a server whenever audio data needs to be transmitted, or downloaded from the server periodically. In some embodiments, the server may also update the mapping relationship periodically and deliver the updated mapping to each terminal, so that the terminal replaces the existing mapping with the updated one.
205. The terminal determines the value obtained by multiplying the redundant multiple-transmission factor by the current redundancy multiple as the corrected redundancy multiple of the audio frame.
The current redundancy multiple is determined based on the current packet loss situation of the target terminal, and the corrected redundancy multiple of each audio frame represents the number of redundant transmissions of that frame.
Step 205, in other words: the terminal determines the value obtained by multiplying the factor by the current redundancy multiple as the number of redundant transmissions of the audio frame.
Assuming a (0 < a ≤ 1) denotes the redundant multiple-transmission factor and r0 denotes the current redundancy multiple of the target terminal, the corrected redundancy multiple r of any audio frame can be expressed as: r = a * r0.
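The computation r = a * r0 with the example factor configuration can be sketched as follows. The factor table mirrors the illustrative values given above (1, 0.7, 0.4, 0.2); rounding the product up to a whole send count is an added assumption, since the disclosure only specifies the product itself.

```python
import math

# Illustrative mapping from criticality level to redundant
# multiple-transmission factor a (example configuration from the text).
FACTOR = {1: 1.0, 2: 0.7, 3: 0.4, 4: 0.2}

def corrected_redundancy(level, current_multiple):
    """r = a * r0: scale the current redundancy multiple r0 (derived from
    the target terminal's packet-loss feedback) by the level's factor a.
    Rounding up to an integer send count is an implementation assumption."""
    return math.ceil(FACTOR[level] * current_multiple)
```

For example, with a current redundancy multiple r0 = 5, a first-level frame keeps all 5 redundant transmissions while a fourth-level frame is sent redundantly only once.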
In the above process, the terminal may obtain the current packet loss situation of the target terminal from the target terminal and determine the current redundancy multiple according to it. For the same target terminal, the current redundancy multiple is not fixed: as the target terminal's packet loss situation changes over different periods, its current redundancy multiple changes as well. Optionally, when the current packet loss situation is severe (for example, a high packet loss rate), the current redundancy multiple is larger; when the situation is good (for example, a low packet loss rate), the current redundancy multiple is smaller. Optionally, the current packet loss situation may include at least one of a packet loss count, a packet loss rate, network delay, or network jitter; the embodiments of the present disclosure do not specifically limit its content.
In some embodiments, the target terminal may also determine its own current redundancy multiple after collecting statistics on the current packet loss situation, and then send the current redundancy multiple to the terminal; the embodiments of the present disclosure do not specifically limit whether the current redundancy multiple is determined by the terminal or by the target terminal.
The terminal repeats steps 204-205 for each audio frame, acquiring the redundant multiple-transmission factor of each frame and multiplying it by the current redundancy multiple to obtain the corrected redundancy multiple (that is, the number of redundant transmissions) of each frame. In other words, the terminal obtains the corrected redundancy multiple of each frame, i.e. the number of redundant transmissions of at least one audio frame, according to the current redundancy multiple and the factor corresponding to each frame's criticality level. The corrected redundancy multiples of different frames may be the same or different: generally, the higher the criticality level of a frame, the larger the factor and thus the larger the corrected redundancy multiple; the lower the level, the smaller the factor and thus the smaller the corrected redundancy multiple.
206. The terminal encodes the audio to be transmitted to obtain an audio bitstream, and sends the audio bitstream to the target terminal.
In the above process, the terminal may encode the audio in any coding mode to obtain the audio bitstream, encapsulate the bitstream into at least one audio data packet, add the at least one audio data packet to the send queue of the network session, and send to the target terminal the at least one audio data packet in the send queue carrying the audio bitstream.
Optionally, the coding mode adopted by the terminal may include any of the following: waveform coding, parametric coding, hybrid coding, or FEC (Forward Error Correction) coding; the embodiments of the present disclosure do not specifically limit the coding mode of the audio to be transmitted.
Optionally, the audio bitstream obtained by encoding may also be in any format, such as PCM (Pulse Code Modulation), MP3 (Moving Picture Experts Group Audio Layer III), or OGG (OGG Vorbis, an audio compression format); the embodiments of the present disclosure do not specifically limit the format of the audio bitstream either.
In some embodiments, the terminal and the target terminal may agree in advance on one or more formats for the audio bitstream transmitted between them, so that the terminal encodes the audio to be transmitted into bitstreams of the pre-agreed format(s); for example, the format(s) may be audio compression formats supported by both the terminal and the target terminal.
It should be noted that step 206 may be performed before or after any of steps 202-205; that is, the embodiments of the present disclosure do not specifically limit the execution order between step 206 and any of steps 202-205.
207. The terminal duplicates each audio frame in the audio bitstream multiple times, until the number of copies of each frame reaches the frame's corrected redundancy multiple, obtaining multiple redundant audio frames.
Step 207, in other words: the terminal duplicates at least one audio frame in the audio bitstream multiple times, until the number of copies reaches the number of redundant transmissions of the at least one frame, obtaining multiple redundant audio frames.
In the above process, assuming the corrected redundancy multiple of an audio frame is denoted r, the terminal copies that frame in the bitstream r times to obtain r redundant audio frames; by analogy, each frame is copied its own corrected-redundancy-multiple number of times to obtain that many redundant audio frames, and the collection of all frames' redundant copies may be called the multiple redundant audio frames.
In the above process, because frames of different criticality levels have different corrected redundancy multiples, they also correspond to different numbers of copies; that is, frames of different criticality levels each have different numbers of redundant audio frames. Thus, for some low-criticality noise frames (or silence frames), a smaller corrected redundancy multiple means fewer copies, fewer redundant frames, and less occupied system bandwidth; this is because even if a noise frame (or silence frame) suffers network packet loss, the transmission effect of the audio data is not seriously impaired, so there is no need to allocate much bandwidth to ensure the transmission reliability of these frames. Conversely, for some high-criticality non-noise frames (or non-silence frames), a larger corrected redundancy multiple means more copies, more redundant frames, and more occupied bandwidth; this is because network packet loss of these frames would seriously impair the transmission effect of the audio data. Therefore, with limited system bandwidth, more bandwidth can be allocated to guarantee the transmission reliability of these frames, ensuring a reasonable allocation of system bandwidth resources and avoiding network congestion caused by the redundant multiple-transmission mechanism.
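The per-frame duplication in step 207 can be sketched as follows; the frame payloads and send counts are placeholders for the encoded frames and the corrected redundancy multiples computed earlier.

```python
def make_redundant_frames(encoded_frames, send_counts):
    """Duplicate each encoded frame according to its corrected redundancy
    multiple (its number of redundant transmissions), so that frames of
    higher criticality yield more redundant copies than noise frames."""
    redundant = []
    for frame, count in zip(encoded_frames, send_counts):
        redundant.extend([frame] * count)  # count copies of this frame
    return redundant
```

A key frame with count 3 contributes three copies to the redundant stream, while a silence frame with count 1 contributes only one, which is how the scheme spends bandwidth where loss would hurt most.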
208. The terminal encapsulates the multiple redundant audio frames into at least one redundant data packet, and sends the at least one redundant data packet carrying the audio frames to the target terminal.
In the above process, after the terminal establishes a network session with the target terminal, the terminal may add the at least one redundant data packet to the send queue of the session. Optionally, the redundant data packets in the send queue may be ordered according to a certain rule.
Optionally, the ordering rule may be as follows: obtain the cumulative weight of the criticality levels of all redundant audio frames carried in each redundant data packet, and sort the redundant data packets in descending order of cumulative weight, where a higher criticality level corresponds to a higher weight, and the cumulative weight of a redundant data packet is the sum of the weights of the criticality levels of all redundant audio frames in that packet.
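The cumulative-weight ordering above can be sketched as follows. The weight table (level 1 → weight 4 down to level 4 → weight 1) is a hypothetical choice for illustration; the disclosure only requires that a higher criticality level carries a higher weight.

```python
def order_packets(packets):
    """Sort redundant data packets by the cumulative criticality weight of
    the frames they carry, highest weight first. Each packet is represented
    here simply as the list of criticality levels of its frames."""
    level_weight = {1: 4, 2: 3, 3: 2, 4: 1}  # hypothetical example weights
    return sorted(packets,
                  key=lambda pkt: sum(level_weight[lv] for lv in pkt),
                  reverse=True)
```

Python's sort is stable, so packets with equal cumulative weight keep their original relative order in the send queue.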
在上述步骤206-208中,终端按照该各个待传输音频帧的修正冗余倍数分别对各个待传输音频帧进行复制,得到至少一个冗余数据包,向该目标终端发送各个待传输音频帧的该至少一个冗余数据包。也即是说,终端按照冗余发送次数对该至少一个待传输音频帧进行复制,得到至少一个冗余数据包,向目标终端发送该至少一个冗余数据包。从而不但向目标终端传输音频码流,而且还向目标终端传输各个冗余数据包,这样一种冗余多发的机制,能够较好地抵抗互联网传输过程的丢包风险,提升各类场景下的抗丢包现象。
在一些实施例中,当携带音频码流的至少一个音频数据包与携带多个冗余音频帧的至少一个冗余数据包均添加到发送队列的缓存区时,可以将该至少一个音频数据包的发送顺序置于该至少一个冗余数据包之前,从而先向目标终端发送至少一个音频数据包,如果发生当前丢包情况,目标终端可以根据后续接收到的至少一个冗余数据包来获取传输中丢失的待传输音频帧对应的冗余音频帧,从而使得目标终端的抗丢包现象得到改善,这样不论在大量丢包场景还是在实时性较高的场景下,均能够在不造成网络拥堵的情况下,大大改善数据传输中的抗丢包现象。
上述所有可选技术方案,可以采用任意结合形成本公开的可选实施例,在 此不再一一赘述。
本公开实施例提供的方法,通过对待传输音频进行语音关键性分析,得到该待传输音频中各个待传输音频帧的关键性级别,根据当前冗余倍数以及各个待传输音频帧的关键性级别对应的冗余多发因子,获取各个待传输音频帧的修正冗余倍数,其中,该当前冗余倍数基于目标终端的当前丢包情况而确定,各个待传输音频帧的修正冗余倍数用于表示各个待传输音频帧的冗余发送次数,由于该关键性级别用于衡量音频帧承载的信息量,不同的关键性级别对应不同的冗余多发因子,该关键性级别与该冗余多发因子的大小呈正相关关系,也即是说关键性级别越高的待传输音频帧对应于数值越大的冗余多发因子,从而可以得到数值越大的修正冗余倍数,从而按照该各个待传输音频帧的修正冗余倍数分别对各个待传输音频帧进行复制,使得不同的待传输音频帧可以具有不同的复制份数,得到至少一个冗余数据包,通过向目标终端发送该至少一个冗余数据包,能够基于一种冗余多发机制增加目标终端接收到各个待传输音频帧的概率,从而对抗数据传输中的丢包现象,由于为承载了不同信息量的待传输音频帧分配不同的修正冗余倍数,因此能够避免由于冗余多发机制而造成的网络拥堵,优化了冗余多发机制下系统带宽的资源配置方案,不管是在连续突发的大量丢包场景,还是在实时性较高的场景下,均能够在不造成网络拥堵的情况下提升网络抗丢包效果。
图3是本公开实施例提供的一种数据传输方法的原理性示意图，参见图3，示出了发送端设备与接收端设备之间的音频数据传输流程。发送端设备获取待传输录音后，一方面，发送端设备对待传输录音进行编码生成音频码流；另一方面，发送端设备对待传输录音进行语音关键性分析，确认待传输录音中各个待传输音频帧的关键性级别，不同关键性级别的待传输音频帧对应于不同的冗余多发因子。发送端设备根据接收端设备对当前丢包情况的反馈信息，确定与当前丢包情况所匹配的当前冗余倍数，再根据当前冗余倍数与各个待传输音频帧的关键性级别对应的冗余多发因子，获取各个待传输音频帧的修正冗余倍数，上述确定各个待传输音频帧的修正冗余倍数的过程也可以称为"冗余多发决策"。发送端设备基于各个待传输音频帧的修正冗余倍数，对编码后音频码流的各个待传输音频帧分别进行多份复制，使得各个待传输音频帧的复制次数等于各个待传输音频帧的修正冗余倍数，将复制得到的各个待传输音频帧封装为至少一个冗余数据包，对各个冗余数据包按照一定规律进行排序后，均匀分布在与接收端设备对应的发送队列中。发送端设备通过网络将携带音频码流的音频数据包以及至少一个冗余数据包发送到接收端设备。由于音频码流和冗余码流（也即是复制得到的至少一个冗余数据包所形成的码流）实际上具有相同的码流数据，因此对同一段音频数据而言，接收端设备只需要从网络获取到音频数据包或者冗余数据包中的任一个即可。接收端设备按照各个数据包（音频数据包或冗余数据包）的序号对各个数据包进行整理，如果检测到接收了具有相同序号的多个数据包，保留具有该序号的任一个数据包，过滤掉接收到的其他重复数据包。接收端设备在过滤完毕后，通过解码器对整理后的各个数据包进行解码，得到声音信号，进而可以播放该声音信号。接收端设备还可以对当前丢包情况进行统计，例如统计丢包率、网络延时、网络波动等，向发送端设备发送统计得到的当前丢包情况，以便于发送端设备及时根据当前丢包情况调整下一次音频数据传输时采用的当前冗余倍数。
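上述接收端按序号整理并过滤重复数据包的过程，可以用如下示意性片段表示（数据结构为说明所作的假设，并非本公开的实际实现）：

```python
def dedupe_packets(received):
    """received: [(seq, payload), ...]，其中音频数据包与对应冗余数据包序号相同。
    示意：同一序号仅保留任一个数据包，过滤其余重复包，再按序号排序供解码器解码。"""
    kept = {}
    for seq, payload in received:
        kept.setdefault(seq, payload)  # 该序号已有数据包则过滤后到的重复包
    return [kept[s] for s in sorted(kept)]
```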
在上述过程中，通过分析各个待传输音频帧的关键性（也可以称为重要性），对一些非语音帧而言，例如静音帧、噪声帧等，由于承载的信息量较低，属于非关键帧，这些非关键帧即使遭受网络丢包也不会对音频传输效果（例如VoIP通话效果）造成影响，没有必要耗用过多的带宽资源对非关键帧进行冗余多发来确保其传输可靠性，因此可以基于冗余多发决策为非关键帧分配较低的修正冗余倍数；反之，对一些语音关键帧而言，例如从非语音帧到语音帧过渡的语音起始帧、语音音调发生变化的语音帧等，由于承载的信息量较大，属于关键帧，这些关键帧如果在弱网传输中遭受丢包，那么会对音频传输效果（例如VoIP通话效果）造成不良影响，使得接收端设备播放声音信号时出现声音卡顿、断续或者忽大忽小等问题，极大降低了语音收听者的用户体验，因此可以基于冗余多发决策为关键帧分配较高的修正冗余倍数来确保其传输可靠性，也即是确保关键帧所在的数据包不会被丢失，从而解决关键帧的网络丢包问题。这样能够在系统带宽有限的情况下，通过合理的资源分配方案利用有限的带宽资源来达到良好的抗丢包效果，避免由于冗余多发决策造成的网络拥堵，也就避免了由于网络拥堵所可能导致的更严重的网络丢包问题。
在本公开实施例中，提供了一种应用于音频传输业务的冗余多发传输方式，该传输方式的冗余发送决策中，各个待传输音频帧的修正冗余倍数不但因传输网络质量的变化而随时调整，而且还根据当前传输的音频内容的关键性而发生变化，能够让关键性高的待传输音频帧得到更好的传输保障，而由于关键性低的待传输音频帧对音质影响不大，则采用较小的修正冗余倍数进行传输，能够有效利用网络带宽资源。由于在音频实时传输的过程中音频内容的关键性存在较大波动，本公开实施例提供的冗余多发机制，能够确保高质量的音频数据传输，能够实现高可靠性的VoIP、音频广播、音视频直播等业务的实时音频传输，不但在连续突发的大量丢包场景下抗丢包效果优异，而且在实时性较高的场景下抗丢包效果也得到显著提升。
图4是本公开实施例提供的一种数据传输装置的结构示意图,参见图4,该装置包括:
分析模块401,用于对待传输音频进行语音关键性分析,得到该待传输音频中各个待传输音频帧的关键性级别,该关键性级别用于衡量音频帧承载的信息量,不同的关键性级别对应不同的冗余多发因子,该关键性级别与该冗余多发因子的大小呈正相关关系;
也即是说,该分析模块401,用于对待传输音频进行语音关键性分析,得到该待传输音频中至少一个待传输音频帧的关键性级别,该关键性级别用于衡量音频帧承载的信息量;
获取模块402,用于根据当前冗余倍数以及该各个待传输音频帧的关键性级别对应的冗余多发因子,获取各个待传输音频帧的修正冗余倍数,该当前冗余倍数基于目标终端的当前丢包情况而确定,该各个待传输音频帧的修正冗余倍数用于表示各个待传输音频帧的冗余发送次数;
也即是说,该获取模块402,用于根据当前冗余倍数以及该关键性级别对应的冗余多发因子,获取该至少一个待传输音频帧的冗余发送次数,该关键性级别与该冗余多发因子的大小呈正相关关系,该当前冗余倍数基于目标终端的当前丢包情况而确定;
发送模块403,用于按照该各个待传输音频帧的修正冗余倍数分别对各个待传输音频帧进行复制,得到至少一个冗余数据包,向该目标终端发送各个待传输音频帧的该至少一个冗余数据包;
也即是说，该发送模块403，用于按照该冗余发送次数对该至少一个待传输音频帧进行复制，得到至少一个冗余数据包，向该目标终端发送该至少一个冗余数据包。
本公开实施例提供的装置,通过对待传输音频进行语音关键性分析,得到该待传输音频中各个待传输音频帧的关键性级别,根据当前冗余倍数以及各个待传输音频帧的关键性级别对应的冗余多发因子,获取各个待传输音频帧的修正冗余倍数,其中,该当前冗余倍数基于目标终端的当前丢包情况而确定,各个待传输音频帧的修正冗余倍数用于表示各个待传输音频帧的冗余发送次数,由于该关键性级别用于衡量音频帧承载的信息量,不同的关键性级别对应不同的冗余多发因子,该关键性级别与该冗余多发因子的大小呈正相关关系,也即是说关键性级别越高的待传输音频帧对应于数值越大的冗余多发因子,从而可以得到数值越大的修正冗余倍数,从而按照该各个待传输音频帧的修正冗余倍数分别对各个待传输音频帧进行复制,使得不同的待传输音频帧可以具有不同的复制份数,得到至少一个冗余数据包,通过向目标终端发送该至少一个冗余数据包,能够基于一种冗余多发机制增加目标终端接收到各个待传输音频帧的概率,从而对抗数据传输中的丢包现象,由于为承载了不同信息量的待传输音频帧分配不同的修正冗余倍数,因此能够避免由于冗余多发机制而造成的网络拥堵,优化了冗余多发机制下系统带宽的资源配置方案,不管是在连续突发的大量丢包场景,还是在实时性较高的场景下,均能够在不造成网络拥堵的情况下提升网络抗丢包效果。
在一种可能实施方式中,基于图4的装置组成,该分析模块401包括:
获取单元,用于对该待传输音频中任一待传输音频帧,获取该待传输音频帧的能量变化信息、语音活跃检测信息或者基频变化信息中至少一项;
确定单元,用于根据该待传输音频帧的能量变化信息、语音活跃检测信息或者基频变化信息中至少一项,确定该待传输音频帧的关键性级别。
在一种可能实施方式中,该确定单元用于:
分别判定该待传输音频帧是否满足各个级别的判定条件,当该待传输音频帧满足至少一个级别的判定条件时,将该待传输音频帧的关键性级别获取为该至少一个级别中关键性最高的级别,其中,各个级别的判定条件与待传输音频帧的能量变化信息、语音活跃检测信息或者基频变化信息中至少一项相关。
在一种可能实施方式中,该确定单元用于:
按照关键性从高到低的顺序，分别判定该待传输音频帧是否满足各个级别的判定条件，其中，各个级别的判定条件与待传输音频帧的能量变化信息、语音活跃检测信息或者基频变化信息中至少一项相关；
若该待传输音频帧满足当前级别的判定条件,将该待传输音频帧的关键性级别获取为该当前级别;
若该待传输音频帧不满足该当前级别的判定条件,则执行判定该待传输音频帧是否满足下一级别的判定条件的操作。
在一种可能实施方式中,该关键性级别包括第一级、第二级、第三级以及第四级;
该第一级的判定条件包括:该待传输音频帧的语音活跃检测值为1、该待传输音频帧的前一待传输音频帧的语音活跃检测值为0且该待传输音频帧与该前一待传输音频帧之间基频值之差的绝对值大于目标阈值;
该第二级的判定条件包括:该待传输音频帧的语音活跃检测值为1、该待传输音频帧的前一待传输音频帧的语音活跃检测值为1且该待传输音频帧的能量值大于该前一待传输音频帧的能量值的目标倍数;或,该待传输音频帧的语音活跃检测值为1、该待传输音频帧的前一待传输音频帧的语音活跃检测值为0且该待传输音频帧与该前一待传输音频帧之间基频值之差的绝对值小于或等于该目标阈值;
该第三级的判定条件包括:该待传输音频帧的语音活跃检测值为1;
该第四级的判定条件包括:该待传输音频帧的语音活跃检测值为0。
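上述第一级至第四级的判定条件按关键性从高到低依次判定，可以用如下示意性的Python片段表示（其中目标阈值与目标倍数的具体取值均为说明所作的假设，并非本公开限定的数值）：

```python
def classify_frame(vad, prev_vad, pitch_diff, energy, prev_energy,
                   pitch_threshold=20.0, energy_multiple=2.0):
    """按关键性从高到低依次判定第一级至第四级，返回1~4（数值越小越关键）。
    vad/prev_vad: 本帧与前一帧的语音活跃检测值（0或1）；
    pitch_diff: 本帧与前一帧基频值之差；energy/prev_energy: 两帧能量值。"""
    # 第一级：语音起始帧且基频突变（基频差的绝对值大于目标阈值）
    if vad == 1 and prev_vad == 0 and abs(pitch_diff) > pitch_threshold:
        return 1
    # 第二级：能量突增（大于前一帧能量的目标倍数），或语音起始但基频差不超过阈值
    if vad == 1 and ((prev_vad == 1 and energy > energy_multiple * prev_energy)
                     or prev_vad == 0):
        return 2
    # 第三级：普通语音帧
    if vad == 1:
        return 3
    # 第四级：静音/噪声帧
    return 4
```

由于按关键性从高到低依次判定，到达第二级分支时"prev_vad == 0"已隐含基频差不超过阈值，与正文的逐级判定逻辑一致。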
在一种可能实施方式中,该获取模块402用于:
对任一待传输音频帧,将该待传输音频帧的关键性级别所对应的冗余多发因子与该当前冗余倍数相乘所得的数值确定为该待传输音频帧的修正冗余倍数(也即冗余发送次数)。
在一种可能实施方式中,该发送模块403用于:
对该待传输音频进行编码处理,得到音频码流;
分别对该音频码流中各个待传输音频帧进行多次复制,直到各个待传输音频帧的复制次数分别到达该各个待传输音频帧的修正冗余倍数,得到多个冗余音频帧;
将该多个冗余音频帧封装为该至少一个冗余数据包。
也即是说,该发送模块403用于:
对该待传输音频进行编码,得到音频码流;
分别对该音频码流中至少一个待传输音频帧进行多次复制，直到复制次数分别到达该至少一个待传输音频帧的冗余发送次数，得到多个冗余音频帧；
将该多个冗余音频帧封装为该至少一个冗余数据包。
上述所有可选技术方案,可以采用任意结合形成本公开的可选实施例,在此不再一一赘述。
需要说明的是:上述实施例提供的数据传输装置在传输数据时,仅以上述各功能模块的划分进行举例说明,实际应用中,可以根据需要而将上述功能分配由不同的功能模块完成,即将终端设备的内部结构划分成不同的功能模块,以完成以上描述的全部或者部分功能。另外,上述实施例提供的数据传输装置与数据传输方法实施例属于同一构思,其实现过程详见数据传输方法实施例,这里不再赘述。
图5是本公开实施例提供的一种终端的结构示意图。该终端500可以是:智能手机、平板电脑、MP3播放器(Moving Picture Experts Group Audio Layer III,动态影像专家压缩标准音频层面3)、MP4(Moving Picture Experts Group Audio Layer IV,动态影像专家压缩标准音频层面4)播放器、笔记本电脑或台式电脑。终端500还可能被称为用户设备、便携式终端、膝上型终端、台式终端等其他名称。
通常,终端500包括有:处理器501和存储器502。
处理器501可以包括一个或多个处理核心，比如4核心处理器、8核心处理器等。处理器501可以采用DSP(Digital Signal Processing，数字信号处理)、FPGA(Field-Programmable Gate Array，现场可编程门阵列)、PLA(Programmable Logic Array，可编程逻辑阵列)中的至少一种硬件形式来实现。处理器501也可以包括主处理器和协处理器，主处理器是用于对在唤醒状态下的数据进行处理的处理器，也称CPU(Central Processing Unit，中央处理器)；协处理器是用于对在待机状态下的数据进行处理的低功耗处理器。在一些实施例中，处理器501可以集成有GPU(Graphics Processing Unit，图像处理器)，GPU用于负责显示屏所需要显示的内容的渲染和绘制。在一些实施例中，处理器501还可以包括AI(Artificial Intelligence，人工智能)处理器，该AI处理器用于处理有关机器学习的计算操作。
存储器502可以包括一个或多个计算机可读存储介质，该计算机可读存储介质可以是非暂态的。存储器502还可包括高速随机存取存储器，以及非易失性存储器，比如一个或多个磁盘存储设备、闪存存储设备。在一些实施例中，存储器502中的非暂态的计算机可读存储介质用于存储至少一个指令，该至少一个指令用于被处理器501所执行以实现本公开中各个实施例提供的数据传输方法。
在一些实施例中,终端500还可选包括有:外围设备接口503和至少一个外围设备。处理器501、存储器502和外围设备接口503之间可以通过总线或信号线相连。各个外围设备可以通过总线、信号线或电路板与外围设备接口503相连。可选地,外围设备包括:射频电路504、显示屏505、摄像头组件506、音频电路507、定位组件508和电源509中的至少一种。
外围设备接口503可被用于将I/O(Input/Output,输入/输出)相关的至少一个外围设备连接到处理器501和存储器502。在一些实施例中,处理器501、存储器502和外围设备接口503被集成在同一芯片或电路板上;在一些其他实施例中,处理器501、存储器502和外围设备接口503中的任意一个或两个可以在单独的芯片或电路板上实现,本实施例对此不加以限定。
射频电路504用于接收和发射RF(Radio Frequency,射频)信号,也称电磁信号。射频电路504通过电磁信号与通信网络以及其他通信设备进行通信。射频电路504将电信号转换为电磁信号进行发送,或者,将接收到的电磁信号转换为电信号。可选地,射频电路504包括:天线系统、RF收发器、一个或多个放大器、调谐器、振荡器、数字信号处理器、编解码芯片组、用户身份模块卡等等。射频电路504可以通过至少一种无线通信协议来与其它终端进行通信。该无线通信协议包括但不限于:城域网、各代移动通信网络(2G、3G、4G及5G)、无线局域网和/或WiFi(Wireless Fidelity,无线保真)网络。在一些实施例中,射频电路504还可以包括NFC(Near Field Communication,近距离无线通信)有关的电路,本公开对此不加以限定。
显示屏505用于显示UI(User Interface，用户界面)。该UI可以包括图形、文本、图标、视频及其它们的任意组合。当显示屏505是触摸显示屏时，显示屏505还具有采集在显示屏505的表面或表面上方的触摸信号的能力。该触摸信号可以作为控制信号输入至处理器501进行处理。此时，显示屏505还可以用于提供虚拟按钮和/或虚拟键盘，也称软按钮和/或软键盘。在一些实施例中，显示屏505可以为一个，设置在终端500的前面板；在另一些实施例中，显示屏505可以为至少两个，分别设置在终端500的不同表面或呈折叠设计；在再一些实施例中，显示屏505可以是柔性显示屏，设置在终端500的弯曲表面上或折叠面上。甚至，显示屏505还可以设置成非矩形的不规则图形，也即异形屏。显示屏505可以采用LCD(Liquid Crystal Display，液晶显示屏)、OLED(Organic Light-Emitting Diode，有机发光二极管)等材质制备。
摄像头组件506用于采集图像或视频。可选地,摄像头组件506包括前置摄像头和后置摄像头。通常,前置摄像头设置在终端的前面板,后置摄像头设置在终端的背面。在一些实施例中,后置摄像头为至少两个,分别为主摄像头、景深摄像头、广角摄像头、长焦摄像头中的任意一种,以实现主摄像头和景深摄像头融合实现背景虚化功能、主摄像头和广角摄像头融合实现全景拍摄以及VR(Virtual Reality,虚拟现实)拍摄功能或者其它融合拍摄功能。在一些实施例中,摄像头组件506还可以包括闪光灯。闪光灯可以是单色温闪光灯,也可以是双色温闪光灯。双色温闪光灯是指暖光闪光灯和冷光闪光灯的组合,可以用于不同色温下的光线补偿。
音频电路507可以包括麦克风和扬声器。麦克风用于采集用户及环境的声波,并将声波转换为电信号输入至处理器501进行处理,或者输入至射频电路504以实现语音通信。出于立体声采集或降噪的目的,麦克风可以为多个,分别设置在终端500的不同部位。麦克风还可以是阵列麦克风或全向采集型麦克风。扬声器则用于将来自处理器501或射频电路504的电信号转换为声波。扬声器可以是传统的薄膜扬声器,也可以是压电陶瓷扬声器。当扬声器是压电陶瓷扬声器时,不仅可以将电信号转换为人类可听见的声波,也可以将电信号转换为人类听不见的声波以进行测距等用途。在一些实施例中,音频电路507还可以包括耳机插孔。
定位组件508用于定位终端500的当前地理位置,以实现导航或LBS(Location Based Service,基于位置的服务)。定位组件508可以是基于美国的GPS(Global Positioning System,全球定位系统)、中国的北斗系统、俄罗斯的格雷纳斯系统或欧盟的伽利略系统的定位组件。
电源509用于为终端500中的各个组件进行供电。电源509可以是交流电、直流电、一次性电池或可充电电池。当电源509包括可充电电池时,该可充电电池可以支持有线充电或无线充电。该可充电电池还可以用于支持快充技术。
在一些实施例中，终端500还包括有一个或多个传感器510。该一个或多个传感器510包括但不限于：加速度传感器511、陀螺仪传感器512、压力传感器513、指纹传感器514、光学传感器515以及接近传感器516。
加速度传感器511可以检测以终端500建立的坐标系的三个坐标轴上的加速度大小。比如,加速度传感器511可以用于检测重力加速度在三个坐标轴上的分量。处理器501可以根据加速度传感器511采集的重力加速度信号,控制显示屏505以横向视图或纵向视图进行用户界面的显示。加速度传感器511还可以用于游戏或者用户的运动数据的采集。
陀螺仪传感器512可以检测终端500的机体方向及转动角度,陀螺仪传感器512可以与加速度传感器511协同采集用户对终端500的3D动作。处理器501根据陀螺仪传感器512采集的数据,可以实现如下功能:动作感应(比如根据用户的倾斜操作来改变UI)、拍摄时的图像稳定、游戏控制以及惯性导航。
压力传感器513可以设置在终端500的侧边框和/或显示屏505的下层。当压力传感器513设置在终端500的侧边框时,可以检测用户对终端500的握持信号,由处理器501根据压力传感器513采集的握持信号进行左右手识别或快捷操作。当压力传感器513设置在显示屏505的下层时,由处理器501根据用户对显示屏505的压力操作,实现对UI界面上的可操作性控件进行控制。可操作性控件包括按钮控件、滚动条控件、图标控件、菜单控件中的至少一种。
指纹传感器514用于采集用户的指纹，由处理器501根据指纹传感器514采集到的指纹识别用户的身份，或者，由指纹传感器514根据采集到的指纹识别用户的身份。在识别出用户的身份为可信身份时，由处理器501授权该用户执行相关的敏感操作，该敏感操作包括解锁屏幕、查看加密信息、下载软件、支付及更改设置等。指纹传感器514可以被设置在终端500的正面、背面或侧面。当终端500上设置有物理按键或厂商Logo时，指纹传感器514可以与物理按键或厂商Logo集成在一起。
光学传感器515用于采集环境光强度。在一个实施例中,处理器501可以根据光学传感器515采集的环境光强度,控制显示屏505的显示亮度。可选地,当环境光强度较高时,调高显示屏505的显示亮度;当环境光强度较低时,调低显示屏505的显示亮度。在另一个实施例中,处理器501还可以根据光学传感器515采集的环境光强度,动态调整摄像头组件506的拍摄参数。
接近传感器516，也称距离传感器，通常设置在终端500的前面板。接近传感器516用于采集用户与终端500的正面之间的距离。在一个实施例中，当接近传感器516检测到用户与终端500的正面之间的距离逐渐变小时，由处理器501控制显示屏505从亮屏状态切换为息屏状态；当接近传感器516检测到用户与终端500的正面之间的距离逐渐变大时，由处理器501控制显示屏505从息屏状态切换为亮屏状态。
本领域技术人员可以理解,图5中示出的结构并不构成对终端500的限定,可以包括比图示更多或更少的组件,或者组合某些组件,或者采用不同的组件布置。
在一些实施例中,该终端包括一个或多个处理器和一个或多个存储器,该一个或多个存储器中存储有至少一条程序代码,该至少一条程序代码由该一个或多个处理器加载并执行以实现如下操作:
对待传输音频进行语音关键性分析,得到该待传输音频中至少一个待传输音频帧的关键性级别,该关键性级别用于衡量音频帧承载的信息量;
根据当前冗余倍数以及该关键性级别对应的冗余多发因子,获取该至少一个待传输音频帧的冗余发送次数,该关键性级别与该冗余多发因子的大小呈正相关,该当前冗余倍数基于目标终端的当前丢包情况而确定;
按照该冗余发送次数对该至少一个待传输音频帧进行复制,得到至少一个冗余数据包,向该目标终端发送该至少一个冗余数据包。
在一些实施例中,该至少一条程序代码由该一个或多个处理器加载并执行以实现如下操作:
对任一待传输音频帧,获取该待传输音频帧的能量变化信息、语音活跃检测信息或者基频变化信息中至少一项;
根据该待传输音频帧的能量变化信息、语音活跃检测信息或者基频变化信息中至少一项,确定该待传输音频帧的关键性级别。
在一些实施例中,该至少一条程序代码由该一个或多个处理器加载并执行以实现如下操作:
分别判定该待传输音频帧是否满足各个级别的判定条件,当该待传输音频帧满足至少一个级别的判定条件时,将该待传输音频帧的关键性级别获取为该至少一个级别中关键性最高的级别,其中,各个级别的判定条件与待传输音频帧的能量变化信息、语音活跃检测信息或者基频变化信息中至少一项相关。
在一些实施例中,该至少一条程序代码由该一个或多个处理器加载并执行以实现如下操作:
按照关键性从高到低的顺序，分别判定该待传输音频帧是否满足各个级别的判定条件，其中，各个级别的判定条件与待传输音频帧的能量变化信息、语音活跃检测信息或者基频变化信息中至少一项相关；
若该待传输音频帧满足当前级别的判定条件,将该待传输音频帧的关键性级别获取为该当前级别;
若该待传输音频帧不满足该当前级别的判定条件,则执行判定该待传输音频帧是否满足下一级别的判定条件的操作。
在一些实施例中,该关键性级别包括第一级、第二级、第三级以及第四级;
该第一级的判定条件包括:该待传输音频帧的语音活跃检测值为1、该待传输音频帧的前一待传输音频帧的语音活跃检测值为0且该待传输音频帧与该前一待传输音频帧之间基频值之差的绝对值大于目标阈值;
该第二级的判定条件包括:该待传输音频帧的语音活跃检测值为1、该待传输音频帧的前一待传输音频帧的语音活跃检测值为1且该待传输音频帧的能量值大于该前一待传输音频帧的能量值的目标倍数;或,该待传输音频帧的语音活跃检测值为1、该待传输音频帧的前一待传输音频帧的语音活跃检测值为0且该待传输音频帧与该前一待传输音频帧之间基频值之差的绝对值小于或等于该目标阈值;
该第三级的判定条件包括:该待传输音频帧的语音活跃检测值为1;
该第四级的判定条件包括:该待传输音频帧的语音活跃检测值为0。
在一些实施例中,该至少一条程序代码由该一个或多个处理器加载并执行以实现如下操作:
对任一待传输音频帧,将该待传输音频帧的关键性级别所对应的冗余多发因子与该当前冗余倍数相乘所得的数值确定为该待传输音频帧的冗余发送次数。
在一些实施例中,该至少一条程序代码由该一个或多个处理器加载并执行以实现如下操作:
对该待传输音频进行编码,得到音频码流;
分别对该音频码流中至少一个待传输音频帧进行多次复制,直到复制次数分别到达该至少一个待传输音频帧的冗余发送次数,得到多个冗余音频帧;
将该多个冗余音频帧封装为该至少一个冗余数据包。
在示例性实施例中，还提供了一种计算机可读存储介质，例如包括至少一条程序代码的存储器，上述至少一条程序代码可由终端中的处理器执行以完成上述实施例中的数据传输方法。例如，该计算机可读存储介质可以是ROM(Read-Only Memory，只读存储器)、RAM(Random-Access Memory，随机存取存储器)、CD-ROM(Compact Disc Read-Only Memory，只读光盘)、磁带、软盘和光数据存储设备等。
在一些实施例中,该至少一条程序代码由处理器加载并执行如下操作:
对待传输音频进行语音关键性分析,得到该待传输音频中至少一个待传输音频帧的关键性级别,该关键性级别用于衡量音频帧承载的信息量;
根据当前冗余倍数以及该关键性级别对应的冗余多发因子,获取该至少一个待传输音频帧的冗余发送次数,该关键性级别与该冗余多发因子的大小呈正相关,该当前冗余倍数基于目标终端的当前丢包情况而确定;
按照该冗余发送次数对该至少一个待传输音频帧进行复制,得到至少一个冗余数据包,向该目标终端发送该至少一个冗余数据包。
在一些实施例中,该至少一条程序代码由处理器加载并执行如下操作:
对任一待传输音频帧,获取该待传输音频帧的能量变化信息、语音活跃检测信息或者基频变化信息中至少一项;
根据该待传输音频帧的能量变化信息、语音活跃检测信息或者基频变化信息中至少一项,确定该待传输音频帧的关键性级别。
在一些实施例中,该至少一条程序代码由处理器加载并执行如下操作:
分别判定该待传输音频帧是否满足各个级别的判定条件,当该待传输音频帧满足至少一个级别的判定条件时,将该待传输音频帧的关键性级别获取为该至少一个级别中关键性最高的级别,其中,各个级别的判定条件与待传输音频帧的能量变化信息、语音活跃检测信息或者基频变化信息中至少一项相关。
在一些实施例中,该至少一条程序代码由处理器加载并执行如下操作:
按照关键性从高到低的顺序,分别判定该待传输音频帧是否满足各个级别的判定条件,其中,各个级别的判定条件与待传输音频帧的能量变化信息、语音活跃检测信息或者基频变化信息中至少一项相关;
若该待传输音频帧满足当前级别的判定条件,将该待传输音频帧的关键性级别获取为该当前级别;
若该待传输音频帧不满足该当前级别的判定条件,则执行判定该待传输音频帧是否满足下一级别的判定条件的操作。
在一些实施例中,该关键性级别包括第一级、第二级、第三级以及第四级;
该第一级的判定条件包括:该待传输音频帧的语音活跃检测值为1、该待传输音频帧的前一待传输音频帧的语音活跃检测值为0且该待传输音频帧与该前一待传输音频帧之间基频值之差的绝对值大于目标阈值;
该第二级的判定条件包括:该待传输音频帧的语音活跃检测值为1、该待传输音频帧的前一待传输音频帧的语音活跃检测值为1且该待传输音频帧的能量值大于该前一待传输音频帧的能量值的目标倍数;或,该待传输音频帧的语音活跃检测值为1、该待传输音频帧的前一待传输音频帧的语音活跃检测值为0且该待传输音频帧与该前一待传输音频帧之间基频值之差的绝对值小于或等于该目标阈值;
该第三级的判定条件包括:该待传输音频帧的语音活跃检测值为1;
该第四级的判定条件包括:该待传输音频帧的语音活跃检测值为0。
在一些实施例中,该至少一条程序代码由处理器加载并执行如下操作:
对任一待传输音频帧,将该待传输音频帧的关键性级别所对应的冗余多发因子与该当前冗余倍数相乘所得的数值确定为该待传输音频帧的冗余发送次数。
在一些实施例中,该至少一条程序代码由处理器加载并执行如下操作:
对该待传输音频进行编码,得到音频码流;
分别对该音频码流中至少一个待传输音频帧进行多次复制,直到复制次数分别到达该至少一个待传输音频帧的冗余发送次数,得到多个冗余音频帧;
将该多个冗余音频帧封装为该至少一个冗余数据包。
在一些实施例中,还提供一种包括指令的计算机程序产品,当其在计算机上运行时,使得计算机执行前述各个实施例所提供的数据传输方法中任一种可能实现方式,在此不作赘述。
本领域普通技术人员可以理解实现上述实施例的全部或部分步骤可以通过硬件来完成,也可以通过程序来指令相关的硬件完成,该程序可以存储于一种计算机可读存储介质中,上述提到的存储介质可以是只读存储器,磁盘或光盘等。
以上所述仅为本公开的可选实施例,并不用以限制本公开,凡在本公开的精神和原则之内,所作的任何修改、等同替换、改进等,均应包含在本公开的保护范围之内。

Claims (10)

  1. 一种数据传输方法,应用于终端,所述方法包括:
    对待传输音频进行语音关键性分析,得到所述待传输音频中至少一个待传输音频帧的关键性级别,所述关键性级别用于衡量音频帧承载的信息量;
    根据当前冗余倍数以及所述关键性级别对应的冗余多发因子,获取所述至少一个待传输音频帧的冗余发送次数,所述关键性级别与所述冗余多发因子的大小呈正相关,所述当前冗余倍数基于目标终端的当前丢包情况而确定;
    按照所述冗余发送次数对所述至少一个待传输音频帧进行复制,得到至少一个冗余数据包,向所述目标终端发送所述至少一个冗余数据包。
  2. 根据权利要求1所述的方法,其特征在于,所述对待传输音频进行语音关键性分析,得到所述待传输音频中至少一个待传输音频帧的关键性级别包括:
    对任一待传输音频帧,获取所述待传输音频帧的能量变化信息、语音活跃检测信息或者基频变化信息中至少一项;
    根据所述待传输音频帧的能量变化信息、语音活跃检测信息或者基频变化信息中至少一项,确定所述待传输音频帧的关键性级别。
  3. 根据权利要求2所述的方法,其特征在于,所述确定所述待传输音频帧的关键性级别包括:
    分别判定所述待传输音频帧是否满足各个级别的判定条件,当所述待传输音频帧满足至少一个级别的判定条件时,将所述待传输音频帧的关键性级别获取为所述至少一个级别中关键性最高的级别,其中,各个级别的判定条件与待传输音频帧的能量变化信息、语音活跃检测信息或者基频变化信息中至少一项相关。
  4. 根据权利要求2所述的方法,其特征在于,所述确定所述待传输音频帧的关键性级别包括:
    按照关键性从高到低的顺序,分别判定所述待传输音频帧是否满足各个级别的判定条件,其中,各个级别的判定条件与待传输音频帧的能量变化信息、语音活跃检测信息或者基频变化信息中至少一项相关;
    若所述待传输音频帧满足当前级别的判定条件,将所述待传输音频帧的关键性级别获取为所述当前级别;
    若所述待传输音频帧不满足所述当前级别的判定条件,则执行判定所述待传输音频帧是否满足下一级别的判定条件的操作。
  5. 根据权利要求2至4任一项所述的方法,其特征在于,所述关键性级别包括第一级、第二级、第三级以及第四级;
    所述第一级的判定条件包括:所述待传输音频帧的语音活跃检测值为1、所述待传输音频帧的前一待传输音频帧的语音活跃检测值为0且所述待传输音频帧与所述前一待传输音频帧之间基频值之差的绝对值大于目标阈值;
    所述第二级的判定条件包括:所述待传输音频帧的语音活跃检测值为1、所述待传输音频帧的前一待传输音频帧的语音活跃检测值为1且所述待传输音频帧的能量值大于所述前一待传输音频帧的能量值的目标倍数;或,所述待传输音频帧的语音活跃检测值为1、所述待传输音频帧的前一待传输音频帧的语音活跃检测值为0且所述待传输音频帧与所述前一待传输音频帧之间基频值之差的绝对值小于或等于所述目标阈值;
    所述第三级的判定条件包括:所述待传输音频帧的语音活跃检测值为1;
    所述第四级的判定条件包括:所述待传输音频帧的语音活跃检测值为0。
  6. 根据权利要求1所述的方法,其特征在于,所述根据当前冗余倍数以及所述关键性级别对应的冗余多发因子,获取所述至少一个待传输音频帧的冗余发送次数包括:
    对任一待传输音频帧,将所述待传输音频帧的关键性级别所对应的冗余多发因子与所述当前冗余倍数相乘所得的数值确定为所述待传输音频帧的冗余发送次数。
  7. 根据权利要求1所述的方法,其特征在于,所述按照所述冗余发送次数对所述至少一个待传输音频帧进行复制,得到至少一个冗余数据包包括:
    对所述待传输音频进行编码,得到音频码流;
    分别对所述音频码流中至少一个待传输音频帧进行多次复制，直到复制次数分别到达所述至少一个待传输音频帧的冗余发送次数，得到多个冗余音频帧；
    将所述多个冗余音频帧封装为所述至少一个冗余数据包。
  8. 一种数据传输装置,其特征在于,所述装置包括:
    分析模块,用于对待传输音频进行语音关键性分析,得到所述待传输音频中至少一个待传输音频帧的关键性级别,所述关键性级别用于衡量音频帧承载的信息量;
    获取模块,用于根据当前冗余倍数以及所述关键性级别对应的冗余多发因子,获取所述至少一个待传输音频帧的冗余发送次数,所述关键性级别与所述冗余多发因子的大小呈正相关关系,所述当前冗余倍数基于目标终端的当前丢包情况而确定;
    发送模块,用于按照所述冗余发送次数对所述至少一个待传输音频帧进行复制,得到至少一个冗余数据包,向所述目标终端发送所述至少一个冗余数据包。
  9. 一种终端,其特征在于,所述终端包括一个或多个处理器和一个或多个存储器,所述一个或多个存储器中存储有至少一条程序代码,所述至少一条程序代码由所述一个或多个处理器加载并执行以实现如权利要求1至权利要求7任一项所述的数据传输方法所执行的操作。
  10. 一种存储介质,其特征在于,所述存储介质中存储有至少一条程序代码,所述至少一条程序代码由处理器加载并执行以实现如权利要求1至权利要求7任一项所述的数据传输方法所执行的操作。
PCT/CN2020/120300 2019-11-20 2020-10-12 数据传输方法、装置、终端及存储介质 WO2021098405A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/513,736 US11798566B2 (en) 2019-11-20 2021-10-28 Data transmission method and apparatus, terminal, and storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911141212.0 2019-11-20
CN201911141212.0A CN110890945B (zh) 2019-11-20 2019-11-20 数据传输方法、装置、终端及存储介质

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/513,736 Continuation US11798566B2 (en) 2019-11-20 2021-10-28 Data transmission method and apparatus, terminal, and storage medium

Publications (1)

Publication Number Publication Date
WO2021098405A1 true WO2021098405A1 (zh) 2021-05-27

Family

ID=69748072

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/120300 WO2021098405A1 (zh) 2019-11-20 2020-10-12 数据传输方法、装置、终端及存储介质

Country Status (3)

Country Link
US (1) US11798566B2 (zh)
CN (1) CN110890945B (zh)
WO (1) WO2021098405A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113660063A (zh) * 2021-08-18 2021-11-16 杭州网易智企科技有限公司 空间音频数据处理方法、装置、存储介质及电子设备

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110890945B (zh) * 2019-11-20 2022-02-22 腾讯科技(深圳)有限公司 数据传输方法、装置、终端及存储介质
CN111464262B (zh) * 2020-03-18 2022-03-25 腾讯科技(深圳)有限公司 数据处理方法、装置、介质及电子设备
CN111628992B (zh) * 2020-05-26 2021-04-13 腾讯科技(深圳)有限公司 一种多人通话控制方法、装置、电子设备及存储介质
CN112767953B (zh) * 2020-06-24 2024-01-23 腾讯科技(深圳)有限公司 语音编码方法、装置、计算机设备和存储介质
CN113936669A (zh) * 2020-06-28 2022-01-14 腾讯科技(深圳)有限公司 数据传输方法、系统、装置、计算机可读存储介质及设备
CN113992547B (zh) * 2020-07-09 2023-03-14 福建天泉教育科技有限公司 一种实时语音中自动检测丢包率的测试方法及其系统
CN112767955B (zh) * 2020-07-22 2024-01-23 腾讯科技(深圳)有限公司 音频编码方法及装置、存储介质、电子设备
CN111916109B (zh) * 2020-08-12 2024-03-15 北京鸿联九五信息产业有限公司 一种基于特征的音频分类方法、装置及计算设备
CN112272168A (zh) * 2020-10-14 2021-01-26 天津津航计算技术研究所 一种轻量级udp通信冗余方法
CN112489665B (zh) * 2020-11-11 2024-02-23 北京融讯科创技术有限公司 语音处理方法、装置以及电子设备
CN113438057A (zh) * 2021-06-23 2021-09-24 中宇联云计算服务(上海)有限公司 基于sd-wan云网融合技术的数据包复制方法、系统以及设备
CN116073946A (zh) * 2021-11-01 2023-05-05 中兴通讯股份有限公司 抗丢包方法、装置、电子设备及存储介质
CN114582365B (zh) * 2022-05-05 2022-09-06 阿里巴巴(中国)有限公司 音频处理方法和装置、存储介质和电子设备

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103632671A (zh) * 2013-06-28 2014-03-12 华为软件技术有限公司 数据编解码方法、装置及数据通信系统
CN107181968A (zh) * 2016-03-11 2017-09-19 腾讯科技(深圳)有限公司 一种视频数据的冗余控制方法和装置
CN108075859A (zh) * 2016-11-17 2018-05-25 中国移动通信有限公司研究院 数据传输方法及装置
CN109951254A (zh) * 2019-03-21 2019-06-28 腾讯科技(深圳)有限公司 一种数据处理方法及装置、计算机可读存储介质
CN110890945A (zh) * 2019-11-20 2020-03-17 腾讯科技(深圳)有限公司 数据传输方法、装置、终端及存储介质

Family Cites Families (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6785261B1 (en) * 1999-05-28 2004-08-31 3Com Corporation Method and system for forward error correction with different frame sizes
US7668712B2 (en) * 2004-03-31 2010-02-23 Microsoft Corporation Audio encoding and decoding with intra frames and adaptive forward error correction
US8352252B2 (en) * 2009-06-04 2013-01-08 Qualcomm Incorporated Systems and methods for preventing the loss of information within a speech frame
CN102376306B (zh) * 2010-08-04 2013-01-23 华为技术有限公司 语音帧等级的获取方法及装置
CN102438152B (zh) * 2011-12-29 2013-06-19 中国科学技术大学 可伸缩视频编码容错传输方法、编码器、装置和系统
US9047863B2 (en) * 2012-01-12 2015-06-02 Qualcomm Incorporated Systems, methods, apparatus, and computer-readable media for criticality threshold control
GB201316575D0 (en) * 2013-09-18 2013-10-30 Hellosoft Inc Voice data transmission with adaptive redundancy
CN105050199A (zh) * 2015-06-09 2015-11-11 西北工业大学 一种基于正交频分多址接入机制的上行接入方法
US10504525B2 (en) * 2015-10-10 2019-12-10 Dolby Laboratories Licensing Corporation Adaptive forward error correction redundant payload generation
US10334518B2 (en) * 2015-10-20 2019-06-25 Qualcomm Incorporated Power gains and capacity gains for a relaxed frame erasure rate
CN105610635B (zh) * 2016-02-29 2018-12-07 腾讯科技(深圳)有限公司 语音编码发送方法和装置
CN105871514A (zh) * 2016-05-19 2016-08-17 乐视控股(北京)有限公司 一种数据传输方法以及数据发送装置

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113660063A (zh) * 2021-08-18 2021-11-16 杭州网易智企科技有限公司 空间音频数据处理方法、装置、存储介质及电子设备
CN113660063B (zh) * 2021-08-18 2023-12-08 杭州网易智企科技有限公司 空间音频数据处理方法、装置、存储介质及电子设备

Also Published As

Publication number Publication date
US11798566B2 (en) 2023-10-24
US20220059100A1 (en) 2022-02-24
CN110890945B (zh) 2022-02-22
CN110890945A (zh) 2020-03-17


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20890413

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20890413

Country of ref document: EP

Kind code of ref document: A1