WO2022001494A1 - Packet loss retransmission method, system, apparatus, computer-readable storage medium, and device - Google Patents

Packet loss retransmission method, system, apparatus, computer-readable storage medium, and device

Info

Publication number
WO2022001494A1
Authority
WO
WIPO (PCT)
Prior art keywords
audio data
data packet
audio
loudness
target audio
Prior art date
Application number
PCT/CN2021/095677
Other languages
English (en)
French (fr)
Inventor
梁俊斌
Original Assignee
腾讯科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 腾讯科技(深圳)有限公司
Publication of WO2022001494A1
Priority to US17/730,061 (US11908482B2)

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L1/00 Arrangements for detecting or preventing errors in the information received
    • H04L1/12 Arrangements for detecting or preventing errors in the information received by using return channel
    • H04L1/16 Arrangements for detecting or preventing errors in the information received by using return channel in which the return channel carries supervisory signals, e.g. repetition request signals
    • H04L1/18 Automatic repetition systems, e.g. Van Duuren systems
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/02 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/18 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/21 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being power information
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L1/00 Arrangements for detecting or preventing errors in the information received
    • H04L1/12 Arrangements for detecting or preventing errors in the information received by using return channel
    • H04L1/16 Arrangements for detecting or preventing errors in the information received by using return channel in which the return channel carries supervisory signals, e.g. repetition request signals
    • H04L1/18 Automatic repetition systems, e.g. Van Duuren systems
    • H04L1/1809 Selective-repeat protocols
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/005 Correction of errors induced by the transmission channel, if related to the coding algorithm
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/08 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H04L43/0823 Errors, e.g. transmission errors
    • H04L43/0829 Packet loss

Definitions

  • the present application relates to the field of computer technology, and in particular to packet loss retransmission technology.
  • packet loss during data transmission is usually caused by factors such as instability of the transmission network.
  • data retransmission is generally used to ensure that the receiver terminal receives complete data.
  • the existing packet loss retransmission method is usually as follows: when the packet loss situation fed back by the receiver terminal is detected, the lost data packets indicated by the packet loss situation are retransmitted.
  • however, for some audio data (for example, slight ambient sound), the existing packet loss retransmission method still feeds back the packet loss situation of such data, which tends to lengthen data retransmission time and in turn lower data transmission efficiency.
  • the purpose of this application is to provide a packet loss retransmission method, system, device, computer-readable storage medium and electronic device, which can alleviate the problem of long data retransmission time and improve data transmission efficiency.
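  • a method for retransmission of lost packets, including: obtaining the loudness corresponding to a target audio data packet; and, if a packet loss state indicating that the target audio data packet is lost is received, retransmitting the target audio data packet according to the loudness corresponding to the target audio data packet.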
  • a system for retransmission of lost packets including a sender terminal, a server and a receiver terminal, wherein:
  • the sender terminal is used to send the target audio data packet to the server;
  • the server is used to obtain the loudness corresponding to the target audio data packet;
  • the receiver terminal is configured to receive the target audio data packet, and to send to the server a packet loss state indicating that the target audio data packet is lost;
  • the server is further configured to retransmit the target audio data packet to the receiver terminal according to the corresponding loudness of the target audio data packet when the packet loss state is received.
  • a packet loss retransmission device comprising:
  • a loudness obtaining unit, used for obtaining the loudness corresponding to the target audio data packet;
  • a data sending unit, configured to retransmit the target audio data packet according to the loudness corresponding to the target audio data packet when a packet loss state indicating that the target audio data packet is lost is received.
  • an electronic device for packet loss retransmission, comprising: a processor; and a memory for storing executable instructions of the processor; wherein the processor is configured to perform any of the methods described above by executing the executable instructions.
  • a computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements any one of the methods described above.
  • the loudness corresponding to the target audio data packet can be obtained; if a packet loss state indicating that the target audio data packet is lost is received, the target audio data packet is retransmitted according to its corresponding loudness.
  • the present application can use the loudness corresponding to the audio data packet as a data retransmission condition, and can retransmit the target audio data packet in a targeted manner. This can reduce the amount of retransmitted data and the occupation of network resources, thereby alleviating the problem of long data retransmission time and improving data transmission efficiency.
  • FIG. 1 schematically shows a system architecture of a packet loss retransmission method provided by an embodiment of the present application.
  • FIG. 2 schematically shows the structure of a computer system of an electronic device provided by an embodiment of the present application.
  • FIG. 3 schematically shows an architecture diagram of a packet loss retransmission method provided by an embodiment of the present application.
  • FIG. 4 schematically shows a flow chart of a packet loss retransmission method according to an embodiment of the present application.
  • FIG. 5 schematically shows a loudness weighting graph according to an embodiment of the present application.
  • FIG. 6 schematically shows an acoustic equal-loudness curve diagram according to an embodiment of the present application.
  • FIG. 7 schematically shows a sequence diagram of a packet loss retransmission method according to an embodiment of the present application.
  • FIG. 8 schematically shows a structural block diagram of a packet loss retransmission system according to an embodiment of the present application.
  • FIG. 9 schematically shows a structural block diagram of a packet loss retransmission system according to an embodiment of the present application.
  • FIG. 10 schematically shows a structural block diagram of a packet loss retransmission system according to yet another embodiment of the present application.
  • FIG. 11 schematically shows a structural block diagram of a packet loss retransmission apparatus according to an embodiment of the present application.
  • Example embodiments will now be described more fully with reference to the accompanying drawings.
  • Example embodiments can be embodied in various forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this application will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art.
  • the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
  • numerous specific details are provided in order to give a thorough understanding of the embodiments of the present application.
  • those skilled in the art will appreciate that the technical solutions of the present application may be practiced without one or more of the specific details, or other methods, components, devices, steps, etc. may be employed.
  • well-known solutions have not been shown or described in detail to avoid obscuring aspects of the application.
  • FIG. 1 schematically shows a schematic diagram of a system architecture of a packet loss retransmission method provided by an embodiment of the present application.
  • the system architecture 100 may include one or more of terminal devices 101, 102, 103, a network 104 and a server 105.
  • the network 104 is a medium used to provide a communication link between the terminal devices 101, 102, 103 and the server 105.
  • the network 104 may include various connection types, such as wired or wireless communication links, or fiber optic cables, among others.
  • the terminal devices 101, 102, 103 may be various electronic devices with display screens, including but not limited to desktop computers, portable computers, smart phones, tablet computers, and the like. It should be understood that the numbers of terminal devices, networks and servers in FIG. 1 are merely illustrative.
  • the packet loss retransmission method provided in the embodiments of the present application is generally performed by the server 105, and accordingly, the packet loss retransmission device is generally set in the server 105.
  • the packet loss retransmission method provided in the embodiments of the present application can also be performed by the terminal device 101, 102 or 103, and correspondingly, the packet loss retransmission device can also be set in the terminal device 101, 102 or 103. No special limitation is made in this exemplary embodiment.
  • the server 105 may obtain the loudness corresponding to the target audio data packet; if a packet loss state indicating that the target audio data packet is lost is received, the server 105 retransmits the target audio data packet according to the loudness corresponding to the target audio data packet.
  • the terminal devices 101, 102 or 103 and the server 105 may be collectively referred to as electronic devices.
  • the terminal devices 101, 102 or 103 may be terminals independent of the sender terminal and the receiver terminal, or may be the sender terminal or the receiver terminal.
  • FIG. 2 shows a schematic structural diagram of a computer system suitable for implementing the electronic device according to the embodiment of the present application.
  • the computer system 200 includes a central processing unit (CPU) 201, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 202 or a program loaded from a storage section 208 into a random access memory (RAM) 203.
  • in the RAM 203, various programs and data required for system operation are also stored.
  • the CPU 201, the ROM 202, and the RAM 203 are connected to each other through a bus 204.
  • an input/output (I/O) interface 205 is also connected to the bus 204.
  • the following components are connected to the I/O interface 205: an input section 206 including a keyboard, a mouse, etc.; an output section 207 including a cathode ray tube (CRT), a liquid crystal display (LCD), etc., and a speaker, etc.; a storage section 208 including a hard disk, etc.; and a communication section 209 including a network interface card such as a local area network (LAN) card, a modem, and the like.
  • the communication section 209 performs communication processing via a network such as the Internet.
  • a drive 210 is also connected to the I/O interface 205 as needed.
  • a removable medium 211 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, etc., is mounted on the drive 210 as needed so that a computer program read therefrom is installed into the storage section 208 as needed.
  • embodiments of the present application include a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for performing the method illustrated in the flowchart.
  • the computer program may be downloaded and installed from the network via the communication section 209 and/or installed from the removable medium 211.
  • when the computer program is executed by the central processing unit (CPU) 201, various functions defined in the method and apparatus of the present application are performed.
  • the usual causes of packet loss are wireless channel interference on Wi-Fi or mobile networks, router congestion during peak hours, and insufficient performance of mobile devices.
  • packet loss is generally handled as follows: when the receiver receives the data transmitted by the sender, it performs packet loss detection; if packet loss is detected, the sender retransmits the data, and this is repeated until there is no packet loss in the received data.
  • FIG. 3 schematically shows an architecture diagram of a method for retransmitting a packet loss provided by an embodiment of the present application.
  • the architecture diagram shown in FIG. 3 includes a sender terminal 310, a server 320, and a receiver terminal 330. The sender terminal 310 includes a feature extraction module 311, a speech encoding module 312, and a data retransmission module 313; the server 320 includes a packet loss detection module 321, an audio routing module 322, and a data retransmission module 323.
  • the receiver terminal 330 includes a packet loss detection module 331, a voice decoding module 332, a sound mixing module 333, and a playback module 334.
  • the feature extraction module 311 can collect audio signals, perform feature extraction on the collected audio signals to obtain audio features, and send the collected audio signals to the speech encoding module 312.
  • the speech encoding module 312 can encode the audio signal to obtain a corresponding audio code stream, and send the audio code stream to the data retransmission module 313.
  • the data retransmission module 313 can package each matched audio code stream and its audio features into an audio data packet and, after packaging has produced a plurality of audio data packets, forward them to the server 320 as one data packet, so that the packet loss detection module 321 in the server 320 performs packet loss detection on the received data packet.
  • if a lost state is detected, the packet loss detection module 321 feeds back the lost state to the data retransmission module 313, so that the data retransmission module 313 retransmits the data for the lost state. Further, when the packet loss detection module 321 receives the retransmitted audio data packet and detects that no audio data packet in the data packet is in the lost state, the data packet is updated according to the retransmitted audio data packet, and the updated data packet is sent to the audio routing module 322.
  • the audio routing module 322 can screen the audio data packets in the data packet according to the audio features corresponding to each audio data packet, to obtain the target audio data packet; wherein the maximum energy amplitude corresponding to the target audio data packet is greater than that of the other audio data packets in the data packet, or the similarity between the energy spectrum distribution corresponding to the target audio data packet and a preset energy spectrum distribution is greater than that of the other audio data packets in the data packet.
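The energy-based screening performed by the audio routing module can be sketched roughly as follows. This is only an illustrative sketch: the `AudioPacket` type, its field names, and the `keep` parameter are hypothetical and do not come from the application; only the criterion (largest maximum energy amplitude) is taken from the text.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AudioPacket:
    channel: int             # hypothetical identifier of the audio channel
    amplitudes: List[float]  # energy amplitude per frequency point (audio feature)


def select_by_peak_energy(packets: List[AudioPacket], keep: int = 1) -> List[AudioPacket]:
    """Return the `keep` packets whose maximum energy amplitude is largest."""
    # A packet without amplitudes is treated as having a peak of 0.0 (assumption).
    ranked = sorted(packets, key=lambda p: max(p.amplitudes, default=0.0), reverse=True)
    return ranked[:keep]
```

The alternative criterion in the text, similarity to a preset energy spectrum distribution, would replace the sort key with a similarity score; the application does not fix a particular similarity measure in this passage.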
  • the audio routing module 322 may send the target audio data packet to the data retransmission module 323, so that the data retransmission module 323 forwards the target audio data packet to the receiver terminal 330.
  • the packet loss detection module 331 in the receiver terminal 330 performs packet loss detection on the target audio data packet.
  • if a lost state is detected, the packet loss detection module 331 feeds back the lost state to the data retransmission module 323, so that the data retransmission module 323 retransmits the data for the lost state. Further, when the packet loss detection module 331 receives the retransmitted target audio data packet and detects that none of the target audio data packets is in a lost state, it updates all target audio data packets according to the retransmitted target audio data packet, and sends all the updated target audio data packets to the voice decoding module 332.
  • the voice decoding module 332 can decode the audio code stream in each target audio data packet into an audio signal, and send the decoded audio signal to the sound mixing module 333, so that the sound mixing module 333 performs sound mixing processing on the audio signal. Further, the sound mixing module 333 may send the sound mixing result to the playback module 334, so that the playback module 334 plays the sound mixing result.
  • in the above process, each data packet transmission may include multiple audio data packets to be transmitted, and the data is forwarded only after the multiple audio data packets are completely received (that is, there is no packet loss).
  • in a multi-person conference, the amount of collected audio signals is large. If the above method is used to retransmit the lost packets, data transmission efficiency tends to be low, which leads to a long time difference between the moment when the receiver terminal plays the audio and the moment when the sender terminal sends the audio, affecting the real-time performance of the audio output in the multi-person conference and the user experience.
  • since an audio signal with low loudness is not easily perceived by the user's hearing, loudness can be used as a screening condition for audio data packets, so as to determine whether to retransmit the lost audio data packets.
  • the server can not only screen the audio data packets according to their energy, but also screen the resulting audio data packets a second time according to their loudness, and then transmit the audio data packets that pass the secondary screening to the receiver terminal. This can improve the efficiency of data transmission in the event of packet loss and shorten the time difference between the moment when the receiver terminal plays the audio and the moment when the sender terminal sends the audio, thereby improving the real-time performance of audio output in multi-person conferences and the user experience.
  • this exemplary embodiment provides a method for retransmitting lost packets.
  • the packet loss retransmission method may be applied to the foregoing server 105, and may also be applied to one or more of the foregoing terminal devices 101, 102, and 103, which is not particularly limited in this exemplary embodiment.
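  • the packet loss retransmission method is introduced below mainly by taking the server as the execution subject. Referring to FIG. 4, the packet loss retransmission method may include the following S410 to S420:
  • S410: obtain the loudness corresponding to the target audio data packet.
  • S420: if a packet loss state indicating that the target audio data packet is lost is received, retransmit the target audio data packet according to the loudness corresponding to the target audio data packet.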
  • S410 and S420 may be executed by the server.
  • the server may be a routing server for screening channel signals.
  • the loudness corresponding to the audio data packet can be used as the data retransmission condition, thereby alleviating the problem of long data retransmission time and improving data transmission efficiency.
  • the target audio data packet can also be retransmitted in a targeted manner. Compared with the related art, which retransmits all data packets sent at the same time as the target audio data packet, the amount of retransmitted data and the occupation of network resources can be reduced.
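Using loudness as a retransmission condition might look like the sketch below. It is an assumption-laden illustration: this passage does not state how loudness maps to a decision, so the simple threshold rule, the `threshold` parameter, and the function and argument names are all hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def packets_to_retransmit(
    lost_seqs: Iterable[int],
    sent: Dict[int, Tuple[float, bytes]],
    threshold: float,
) -> List[int]:
    """Pick the lost packets worth resending.

    `sent` maps a sequence number to (loudness, payload) for packets already sent.
    Assumed rule: a lost packet is resent only if its loudness reaches `threshold`.
    """
    chosen = []
    for seq in lost_seqs:
        entry = sent.get(seq)
        if entry is None:
            continue  # nothing cached for this sequence number
        loudness, _payload = entry
        if loudness >= threshold:
            chosen.append(seq)
    return chosen
```

Under this rule, quiet packets reported as lost are skipped, which is how the amount of retransmitted data would shrink relative to resending every lost packet.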
  • the server may receive multiple audio data packets sent by the sender terminal, and the target audio data packet is one or more of the multiple audio data packets. Not all members of a multi-person conference are talking at once; some may remain silent. In that case, the audio data packets sent by the sender terminals of the silent members do not need to be sent to the receiver terminal. Therefore, in this embodiment of the present application, before acquiring the loudness corresponding to the target audio data packet, the server may screen out, from the acquired audio data packets, the target audio data packets whose audio features meet a preset condition.
  • the audio data packet may include the loudness corresponding to the audio data packet, the audio code stream, and the audio features corresponding to the audio code stream; the audio features corresponding to the audio code stream may include the energy distribution corresponding to the audio code stream and the energy amplitude corresponding to each frequency point in the audio code stream.
  • the amount of forwarded data can be reduced by screening the acquired audio data packets, thereby reducing the consumption of network resources and improving data transmission efficiency.
  • packet loss may also occur when the sender terminal sends multiple audio data packets to the server. Therefore, before the target audio data packets whose audio features meet the preset condition are screened out from the acquired multiple audio data packets, packet loss detection is performed on the received multiple audio data packets; if the packet loss detection result includes a lost state, the lost state is fed back to the sender terminal, so that the sender terminal retransmits data for the lost state.
  • the number of sender terminals may be one or more, which is not limited in this embodiment of the present application.
  • a conference member terminal in the multi-person conference can serve as a sender terminal and can also serve as a receiver terminal.
  • multiple audio data packets may belong to the same data packet, the sender terminal may send one data packet at a time, each data packet includes one or more audio data packets, and the multiple audio data packets correspond to different audio channels.
  • the method of performing packet loss detection on the received multiple audio data packets may specifically be: determining the sequence numbers respectively corresponding to the multiple audio data packets received within a unit time (e.g., 5 s); then, according to the packet header (e.g., the Sequence Number field) of the data transmission protocol (e.g., the TCP protocol), detecting the continuity of the sequence numbers respectively corresponding to the multiple audio data packets; if the sequence numbers are not consecutive (for example, 1, 2, 4, 5), determining that packet loss has occurred; if the sequence numbers are consecutive (for example, 1, 2, 3, 4), determining that no packet loss has occurred, in which case the packet loss state corresponding to the multiple audio data packets is the non-lost state; further, determining the lost data packets according to the sequence numbers respectively corresponding to the multiple audio data packets, and generating a packet loss detection result according to the lost data packets; the number of lost data packets may be one or more, which is not limited in the embodiments of the present application.
  • the packet loss detection result may include at least one of the lost data packet, the sequence number corresponding to the lost data packet, and the packet loss state corresponding to the lost data packet; the packet loss state can be represented by a value of 0 or 1, where 0 represents the lost state and 1 represents the non-lost state.
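The sequence-number continuity check can be illustrated with a short sketch. The function name and the shape of the returned result are hypothetical; the 0/1 encoding of the packet loss state follows the text.

```python
from typing import Dict, List

LOST, NOT_LOST = 0, 1  # encoding from the text: 0 = lost state, 1 = non-lost state


def detect_packet_loss(received_seqs: List[int]) -> Dict[str, object]:
    """Check sequence-number continuity within one detection window.

    Returns the packet loss state and the missing sequence numbers that lie
    between the smallest and largest sequence numbers actually received.
    """
    if not received_seqs:
        return {"state": NOT_LOST, "lost": []}
    present = set(received_seqs)
    lost = [s for s in range(min(present), max(present) + 1) if s not in present]
    return {"state": LOST if lost else NOT_LOST, "lost": lost}
```

For the examples in the text, the sequence 1, 2, 4, 5 yields a lost state with sequence number 3 missing, and 1, 2, 3, 4 yields the non-lost state. A gap-based check like this cannot see packets lost at the very start or end of a window unless the expected range is known separately.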
  • the method of feeding back the lost state to the sender terminal may specifically be: feeding back a negative acknowledgement (NACK) signal including the packet loss detection result to the sender terminal.
  • the following step may also be included: sending an acknowledgement (ACK) signal to the sender terminal to indicate that data reception is normal.
  • the manner in which the sender terminal retransmits data for the lost state may specifically be: the sender terminal determines the lost data packet corresponding to the lost state and retransmits the lost data packet; or, the sender terminal retransmits the whole data packet corresponding to the lost state, wherein the data packet is composed of the above-mentioned multiple audio data packets.
  • the sender can be triggered to resend data when the packet loss state is detected, which can improve the integrity of the transmitted data.
  • the above method further includes: updating the multiple audio data packets according to the retransmitted data packets.
  • the number of retransmitted data packets may be one or more, and the updated multiple audio data packets include the above-mentioned retransmitted data packets.
  • the method of updating the multiple audio data packets according to the retransmitted data packets may specifically be: determining the sequence number corresponding to each retransmitted data packet and the sequence numbers respectively corresponding to the multiple audio data packets, and integrating the retransmitted data packets and the above-mentioned multiple audio data packets in sequence-number order, so as to update the multiple audio data packets.
  • the integrity of the data sent to the receiver terminal can be guaranteed by updating the data packet.
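Integrating retransmitted packets by sequence number could be sketched as below. The dictionary representation (sequence number mapped to payload) and the function name are assumptions made for illustration.

```python
from typing import Dict, List


def merge_retransmitted(received: Dict[int, bytes], retransmitted: Dict[int, bytes]) -> List[bytes]:
    """Merge retransmitted packets into the received set, ordered by sequence number."""
    merged = dict(received)
    merged.update(retransmitted)  # a retransmitted copy fills in (or replaces) its slot
    return [merged[seq] for seq in sorted(merged)]
```

For example, merging received packets 1, 2, 4 with a retransmitted packet 3 yields the payloads in the order 1, 2, 3, 4.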
  • the received multiple audio data packets may be sent by the sender terminal.
  • the sender terminal sends multiple audio data packets as follows: the sender terminal collects the audio signal and performs feature extraction on the audio signal to obtain the audio feature; the sender terminal encodes the audio signal to obtain the audio code stream; the sender terminal packs the audio code stream and audio features into audio data packets and sends them to the server.
  • the audio signal collected by the sender terminal may be an analog signal.
  • the audio feature corresponding to the audio signal may include at least one of zero-crossing rate, short-term energy, short-term autocorrelation function, and short-term average amplitude difference, which is not limited in this embodiment of the present application.
  • the way for the sender terminal to collect the audio signal and perform feature extraction on the audio signal to obtain the audio feature may be: calculating the zero-crossing rate of the audio signal as the audio feature of the audio signal; wherein N is the frame length of the audio signal, and n is the frame index of the audio signal.
  • the calculated zero-crossing rate can be used to characterize the number of times each frame of audio signal passes through the zero value.
  • the above-mentioned zero-crossing rate can be used to distinguish unvoiced sounds from voiced sounds in the audio signal, which helps the server screen the multiple received audio data packets.
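  • as an illustration of the zero-crossing rate, the sketch below uses the common textbook definition (half the sum of absolute differences between the signs of consecutive samples within a frame). The exact formula of the application is not reproduced in this text, so the definition, the sign convention for zero-valued samples, and the non-overlapping framing are assumptions.

```python
def sgn(x):
    # Assumed sign convention: non-negative samples count as +1.
    return 1 if x >= 0 else -1


def zero_crossing_rate(frame):
    """Number of sign changes between consecutive samples of one frame."""
    return sum(abs(sgn(frame[m]) - sgn(frame[m - 1])) for m in range(1, len(frame))) // 2


def frame_signal(samples, frame_len):
    """Split samples into consecutive non-overlapping frames of length N = frame_len."""
    return [samples[i:i + frame_len] for i in range(0, len(samples) - frame_len + 1, frame_len)]


samples = [1, -1, 1, -1, 2, 3, 4, 5]
frames = frame_signal(samples, 4)
print([zero_crossing_rate(f) for f in frames])  # [3, 0]
```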
  • the way for the sender terminal to collect the audio signal and perform feature extraction on the audio signal to obtain the audio feature may also be: detecting the short-term energy E_n of the n-th frame x_n(m) of the collected audio signal, and using the short-term energy corresponding to each frame as the audio feature of the audio signal; wherein N is the frame length of the audio signal, and n is a positive integer.
  • the energy of human voice is larger than that of noise, so the short-term energy corresponding to each frame can be used to distinguish human voice from noise in the audio signal, which helps the server perform noise filtering on the received audio data packets according to the short-term energy of each frame, thereby improving the processing efficiency of audio data.
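  • a short sketch of the short-term energy feature follows. It assumes the usual definition, the sum of squared samples of one frame, and a hypothetical `energy_threshold` for separating voice-like frames from noise-like frames; neither the threshold value nor the helper names come from the application.

```python
def short_term_energy(frame):
    """E_n: sum of squared samples of the n-th frame x_n(m), m = 0..N-1."""
    return sum(x * x for x in frame)


def voiced_frame_indices(frames, energy_threshold):
    """Return indices of frames whose energy exceeds the (assumed) threshold."""
    return [n for n, frame in enumerate(frames) if short_term_energy(frame) > energy_threshold]


frames = [[0.01, -0.02, 0.01, 0.0], [0.5, -0.6, 0.4, -0.5]]
print(voiced_frame_indices(frames, energy_threshold=0.1))  # [1]
```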
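  • the way for the sender terminal to collect the audio signal and perform feature extraction on the audio signal to obtain the audio feature may also be: calculating the short-term autocorrelation function of the audio signal as the audio feature of the audio signal.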
  • the calculated short-term autocorrelation function can be used to measure the similarity of the time waveform of the signal itself, which is helpful for the server to detect the similar characteristics of audio according to the short-term autocorrelation function, thereby improving the processing efficiency of audio data.
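  • the short-term autocorrelation function can be illustrated as below. The sketch assumes the common definition, in which the frame is multiplied with a copy of itself shifted by a lag k and the products are summed; the function name and the test waveform are chosen only for the example.

```python
def short_term_autocorrelation(frame, lag):
    """R_n(k): sum over m of x_n(m) * x_n(m + k) within one frame."""
    return sum(frame[m] * frame[m + lag] for m in range(len(frame) - lag))


frame = [1, 0, -1, 0, 1, 0, -1, 0]  # waveform with a period of 4 samples
print([short_term_autocorrelation(frame, k) for k in range(5)])  # [4, 0, -3, 0, 2]
```

  • in this example the largest value after lag 0 appears at lag 4, the period of the waveform, which is the kind of self-similarity the feature is meant to expose.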
  • the audio feature corresponding to the audio signal may also include features such as spectrogram, short-term power spectral density, spectral entropy, fundamental frequency, formant, etc., which are not limited in this embodiment of the present application.
  • the sender terminal can perform feature extraction on the audio signal, which helps the server select which audio paths to forward according to the result of the feature extraction, so that the real-time output of the audio of the multi-person conference is ensured while the quality of the audio output is also guaranteed.
  • the preset conditions include a preset energy amplitude and/or a preset signal-to-noise ratio.
  • the way to filter, from the multiple audio data packets, the target audio data packets whose audio features meet the preset conditions may be: if it is detected that at least one energy amplitude in the audio feature is greater than the preset energy amplitude, the audio data packet to which the audio code stream corresponding to the audio feature belongs is determined as the target audio data packet; and/or, if it is detected that at least one signal-to-noise ratio in the audio feature is greater than the preset signal-to-noise ratio, the audio data packet to which the audio code stream corresponding to the audio feature belongs is determined as the target audio data packet.
  • each audio frame in the audio code stream corresponds to an energy amplitude value, and the energy amplitude value is used to represent the short-term energy corresponding to the audio frame.
  • the audio data packets can be screened according to the energy amplitude or the signal-to-noise ratio, so as to simplify the audio data packets to be transmitted and improve the data transmission efficiency.
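  • the screening rule above can be sketched as follows. The `AudioPacket` type and its field names are hypothetical, and when both thresholds are supplied the sketch treats the "and/or" as a logical OR; this is one possible reading rather than the definitive behaviour.

```python
from dataclasses import dataclass, field


@dataclass
class AudioPacket:
    seq: int
    energy_amplitudes: list = field(default_factory=list)  # one value per audio frame
    snrs_db: list = field(default_factory=list)             # one value per audio frame


def is_target(packet, preset_energy=None, preset_snr_db=None):
    """A packet qualifies if any frame exceeds an enabled preset threshold."""
    by_energy = preset_energy is not None and any(
        e > preset_energy for e in packet.energy_amplitudes)
    by_snr = preset_snr_db is not None and any(
        s > preset_snr_db for s in packet.snrs_db)
    return by_energy or by_snr


def filter_targets(packets, preset_energy=None, preset_snr_db=None):
    """Keep only the packets that satisfy the preset condition."""
    return [p for p in packets if is_target(p, preset_energy, preset_snr_db)]


packets = [
    AudioPacket(1, [0.1, 0.2], [5.0, 6.0]),
    AudioPacket(2, [0.9, 0.1], [3.0, 4.0]),
    AudioPacket(3, [0.05], [25.0]),
]
targets = filter_targets(packets, preset_energy=0.5, preset_snr_db=20.0)
print([p.seq for p in targets])  # [2, 3]
```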
  • the method of obtaining the loudness corresponding to the target audio data packet may be: performing framing processing on the audio code stream in the target audio data packet according to a preset duration to obtain multiple audio frames; processing the multiple audio frames respectively with a preset window function to obtain multiple reference frames; calculating the power spectra corresponding to the multiple reference frames; and calculating the loudness corresponding to the target audio data packet according to the power spectra.
  • the loudness corresponding to the target audio data packet may be calculated by the electronic device that executes the method for retransmitting the lost packets according to the above method.
  • if the electronic device performing the packet loss retransmission method is a server, the server can calculate the loudness corresponding to the target audio data packet according to the above method.
  • the sender terminal can also calculate the loudness corresponding to the target audio data packet according to the above method, in which case the server directly obtains the loudness corresponding to the target audio data packet from the sender terminal.
  • the durations corresponding to the multiple audio frames are the same (for example, 10ms or 20ms), and the durations corresponding to the multiple reference frames and the audio frames are the same.
  • FFT stands for Fast Fourier Transform.
  • the preset window function is a Hanning window function, a Hamming window function, a Blackman window function, a Kaiser window function, a triangular window function or a rectangular window function.
  • the method of processing the multiple audio frames respectively with the preset window function to obtain the multiple reference frames may be: determining the time domain expressions corresponding to the multiple audio frames respectively; and multiplying each time domain expression point by point with the preset window function to obtain the reference frames corresponding to the multiple audio frames respectively.
  • the manner of calculating the power spectra corresponding to the multiple reference frames may be: performing fast Fourier transform on the multiple reference frames to determine the power spectra corresponding to the multiple reference frames.
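  • the framing, windowing and power spectrum steps can be sketched as follows. To stay self-contained the example evaluates a direct discrete Fourier transform, which yields the same values as an FFT but more slowly; the 8 kHz sample rate, the 10 ms frame, the Hamming window and all function names are assumptions chosen for the illustration.

```python
import cmath
import math


def hamming(n_points):
    """Hamming window, one of the window functions listed above."""
    if n_points == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * m / (n_points - 1)) for m in range(n_points)]


def split_into_frames(samples, frame_len):
    """Non-overlapping frames of equal duration."""
    return [samples[i:i + frame_len] for i in range(0, len(samples) - frame_len + 1, frame_len)]


def power_spectrum(reference_frame):
    """|X(k)|^2 for k = 0..N/2, computed with a direct DFT."""
    n = len(reference_frame)
    spectrum = []
    for k in range(n // 2 + 1):
        x_k = sum(reference_frame[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n))
        spectrum.append(abs(x_k) ** 2)
    return spectrum


sample_rate = 8000
frame_len = 80  # 10 ms at 8 kHz
samples = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(2 * frame_len)]
window = hamming(frame_len)
# Point-by-point multiplication of each audio frame with the window gives the reference frames.
reference_frames = [[x * w for x, w in zip(f, window)] for f in split_into_frames(samples, frame_len)]
spectra = [power_spectrum(f) for f in reference_frames]
peak_bin = max(range(len(spectra[0])), key=spectra[0].__getitem__)
print(peak_bin * sample_rate / frame_len)  # 1000.0
```

  • the peak of the power spectrum falls on the frequency point that corresponds to the 1000 Hz test tone, which is a quick sanity check that the windowed transform behaves as expected.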
  • the loudness of the target audio data packet can be calculated and obtained, and the loudness can be used as a data retransmission condition.
  • when the loudness is low, the server may choose not to retransmit the lost target audio data packet, so as to reduce the occupation of network resources.
  • a possible implementation of calculating the loudness corresponding to the target audio data packet according to the power spectrum may be: calculating the frequency point loudness of each frequency point in the power spectrum according to the energy amplitude of that frequency point; calculating the loudness weight of each frequency point in the power spectrum according to the frequency point loudness; calculating the weighted sum of the energy amplitude of each frequency point in the power spectrum and the loudness weight of that frequency point, as the loudness value of the reference frame corresponding to the power spectrum; and determining the sum of the loudness values corresponding to the multiple reference frames as the loudness corresponding to the target audio data packet.
  • a frequency point is the index of a fixed frequency and uniquely identifies that frequency.
  • the method of calculating the loudness of each frequency point in the power spectrum according to the energy amplitude of each frequency point in the power spectrum may specifically be: determining the energy amplitude of each frequency point i in the power spectrum.
  • the method of calculating the loudness weight of each frequency point in the power spectrum according to the frequency point loudness may specifically be: calculating the loudness weight cof(freq) of each frequency point freq in the power spectrum based on a formula defined in terms of the following quantities: freq is the frequency value of the frequency point for which the perceptual coefficient needs to be calculated (for example, the center frequency value of the frequency band corresponding to the frequency point); ff, af, bf and cf are data in the equal-loudness curve data table published in BS3383; loud denotes the loudness of the frequency point freq; and cof(freq) denotes the perceptual coefficient corresponding to the frequency point freq.
  • FIG. 5 schematically shows a loudness weighting graph according to an embodiment of the present application.
  • in FIG. 5, the horizontal axis represents the frequency, and the vertical axis represents the loudness weight.
  • the loudness value EP(i) of the reference frame corresponding to the power spectrum may be calculated as the weighted sum of the energy amplitude of each frequency point in the power spectrum and the loudness weight of that frequency point; wherein i is the frame index, and k is the frequency point index.
  • the loudness corresponding to the target audio data packet can be determined from the loudness corresponding to each frequency point and used as the data retransmission condition, thereby improving the data retransmission efficiency of the server.
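  • the weighted sum can be sketched as below. The loudness weights in the example are made-up placeholders; in the application they would come from cof(freq) and the equal-loudness data, which are not reproduced here. The function names are likewise hypothetical.

```python
def frame_loudness(power_spectrum, weights):
    """EP(i): sum over frequency points k of energy amplitude times loudness weight."""
    if len(power_spectrum) != len(weights):
        raise ValueError("one weight per frequency point is required")
    return sum(p * w for p, w in zip(power_spectrum, weights))


def packet_loudness(power_spectra, weights):
    """Loudness of the packet: sum of EP(i) over all reference frames i."""
    return sum(frame_loudness(ps, weights) for ps in power_spectra)


# Hypothetical weights for four frequency points; real values would come from cof(freq).
weights = [0.25, 1.0, 0.5, 0.0]
power_spectra = [[4.0, 2.0, 2.0, 8.0], [0.0, 1.0, 4.0, 8.0]]
print(packet_loudness(power_spectra, weights))  # 7.0
```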
  • the target audio data packet is retransmitted according to the loudness corresponding to the target audio data packet.
  • retransmitting the target audio data packet according to the loudness corresponding to the target audio data packet may be implemented as follows: if the loudness corresponding to the target audio data packet is greater than the preset loudness, the target audio data packet is retransmitted to the receiver terminal, so that the receiver terminal decodes and outputs the target audio data packets after retransmission and before retransmission; if the loudness corresponding to the target audio data packet is less than the preset loudness, the target audio data packet is not retransmitted to the receiver terminal.
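  • the retransmission decision can be sketched as follows. The helper names and the mapping from sequence number to loudness are assumptions; the text does not say what happens when the loudness equals the preset loudness, and the sketch simply treats that case as "do not retransmit".

```python
def should_retransmit(packet_loudness, preset_loudness):
    """Retransmit only when the lost packet's loudness exceeds the preset loudness."""
    return packet_loudness > preset_loudness


def handle_loss_report(lost_seqs, loudness_by_seq, preset_loudness):
    """Return the sequence numbers the server would resend to the receiver terminal."""
    return [s for s in lost_seqs if should_retransmit(loudness_by_seq[s], preset_loudness)]


loudness_by_seq = {7: 12.5, 8: 0.3, 9: 4.1}
print(handle_loss_report([7, 8, 9], loudness_by_seq, preset_loudness=1.0))  # [7, 9]
```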
  • loudness is a subjective quantity describing how loud a sound is perceived to be, and its unit is the sone. A 1000 Hz pure tone at a sound pressure level of 40 dB has a loudness of 1 sone; a sound of 2 sones is twice as loud as a 40-phon sound, and a sound of 4 sones is 4 times as loud as a 40-phon sound. That is to say, when the sound pressure level increases by 10 dB, the loudness doubles, and the human ear's perception of loudness varies with the sound pressure level.
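  • the doubling rule can be written as a small conversion between loudness level in phons and loudness in sones. The sketch uses the widely cited relation in which loudness doubles for every 10 phon above 40 phon; it is generally regarded as a good approximation only at roughly 40 phon and above, and the function name is chosen for the example.

```python
def phon_to_sone(loudness_level_phon):
    """Loudness in sones for a loudness level in phons (about 40 phon and above)."""
    return 2 ** ((loudness_level_phon - 40) / 10)


print([phon_to_sone(p) for p in (40, 50, 60)])  # [1.0, 2.0, 4.0]
```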
  • FIG. 6 schematically shows an acoustic equal loudness curve diagram according to an embodiment of the present application.
  • the acoustic equal-loudness curves shown in FIG. 6 cover several loudness levels (i.e., 100 phon, 80 phon, 60 phon, 40 phon, 20 phon, and the hearing threshold); the horizontal axis represents the frequency of sound waves (Frequency), and the vertical axis represents the sound pressure level (Sound Pressure Level). As can be seen from FIG. 6, an equal-loudness curve describes the relationship between the sound pressure level and the sound wave frequency under the condition of equal loudness: the lower the frequency, the greater the sound pressure intensity (energy) required for equal loudness. That is, more sound energy is needed for the human ear to have the same hearing experience.
  • the frequencies of different frequency bands correspond to different acoustic auditory perception characteristics.
  • from FIG. 6 it can be seen that if the audio data packets are filtered by loudness, sounds that the human ear perceives only weakly can be filtered out, which can improve the data transmission effect.
  • audio data packets can be screened according to loudness, thereby reducing the occupation of network resources by sounds with weak human ear perception ability, thereby improving data transmission efficiency.
  • because the present application can retransmit only the lost data packets, it can solve the problem of low transmission efficiency in the prior art, in which all data of the entire data packet needs to be retransmitted for each retransmission.
  • the method for the receiver terminal to decode and output the target audio data packets after retransmission and before retransmission may be as follows: the receiver terminal decodes the target audio data packets after retransmission and before retransmission to obtain multiple audio signals to be output; the receiver terminal performs mixing processing on the multiple audio signals to be output, and plays the resulting mixed signal.
  • before the receiver terminal performs mixing processing on the plurality of audio signals to be output, the following steps may be further included: the receiver terminal performs format normalization on the plurality of audio signals to be output, so as to ensure that their formats are unified; further, converts the plurality of audio signals to be output to a preset sampling rate (e.g., 16 kHz, 32 kHz, 44.1 kHz, 48 kHz, etc.); further, checks the consistency of the bit depth (Bit-Depth) or the sampling format (Sample Format) of the plurality of audio signals to be output, and if the bit depths or the sampling formats are inconsistent, performs the corresponding normalization processing on the plurality of audio signals to be output, so that the number of bits carrying the audio data of each sampling point is the same in every audio signal; further, detects whether the channels (such as mono or dual channels) of the audio signals to be output are consistent, and if they are inconsistent, outputs a prompt message indicating the inconsistency of the channels; if they are consistent, performs the operation of mixing the plurality of audio signals to be output.
  • the receiver terminal may also perform processing such as echo cancellation, noise suppression, and silence detection on the plurality of audio signals to be output, which is not limited in this embodiment of the present application.
  • the way for the receiver terminal to perform mixing processing on the plurality of audio signals to be output to obtain the mixed signal may be as follows: the receiver terminal equalizes the volumes of the plurality of audio signals to be output according to their respective energy amplitude values, and mixes the results of the equalization adjustment to obtain the mixed signal.
  • after the receiver terminal performs mixing processing on the plurality of audio signals to be output and obtains the mixed signal, the following steps may be further included: the receiver terminal performs overflow detection on the mixed signal, and if there is an overflow, performs overflow processing/smoothing processing on the overflowed sample points, and then plays the processing result.
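  • the equalize, mix and overflow-handling steps can be sketched as follows. The sketch assumes 16-bit samples, uses RMS amplitude as the energy measure for equalization, and handles overflow by hard clipping; smoothing, which the text also allows, is not shown. All names and the `target_rms` parameter are hypothetical.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def rms(signal):
    """Root-mean-square amplitude of one signal."""
    return (sum(x * x for x in signal) / len(signal)) ** 0.5


def equalize(signals, target_rms):
    """Scale each signal so its RMS amplitude matches target_rms (silent signals stay silent)."""
    out = []
    for s in signals:
        level = rms(s)
        gain = target_rms / level if level > 0 else 0.0
        out.append([x * gain for x in s])
    return out


def mix(signals):
    """Sum equal-length signals sample by sample, then clamp overflowing samples to 16 bits."""
    mixed = [sum(samples) for samples in zip(*signals)]
    return [max(INT16_MIN, min(INT16_MAX, int(round(x)))) for x in mixed]


a = [20000, -20000, 20000, -20000]
b = [10000, -10000, 10000, -10000]
print(mix(equalize([a, b], target_rms=20000)))  # [32767, -32768, 32767, -32768]
```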
  • the receiver terminal can mix and output multiple complete audio data packets after receiving them. The output result retains the important audio content in the conference and discards audio content that the human ear perceives only weakly, which can improve the message forwarding efficiency of multi-person conferences, ensure the real-time output of audio signals, and thus improve the user experience.
  • FIG. 7 schematically shows a sequence diagram of a method for retransmitting lost packets according to an embodiment of the present application. As shown in Figure 7, it includes S700-S780, wherein:
  • the sender terminal sends a plurality of audio data packets to the server.
  • the sender terminal can perform feature extraction on the collected audio signal to obtain audio features, encode the audio signal to obtain an audio code stream corresponding to the audio signal, and pack the audio code stream and the audio features into audio data packets.
  • the multiple audio data packets obtained by packaging are sent to the server.
  • the server may perform packet loss detection after receiving multiple audio data packets.
  • the server feeds back the loss state to the sender terminal.
  • the sender terminal resends the data to the server for the lost state.
  • the server forwards the target audio data packet determined from the plurality of audio data packets to the receiver terminal.
  • the server may update multiple audio data packets according to the retransmitted data packets, filter target audio data packets whose audio characteristics meet preset conditions from the multiple audio data packets, and forward the target audio data packets to the receiver terminal.
  • the receiver terminal may perform packet loss detection.
  • the receiver terminal feeds back the loss state of the target audio data packet to the server.
  • the server resends the target audio data packet to the receiver terminal.
  • the receiver terminal decodes and outputs the target audio data packet after retransmission and before retransmission.
  • the audio data packet may not include loudness.
  • the server may calculate the corresponding loudness of the audio signal according to the audio features in the audio data packet.
  • the data retransmission function can be enabled for audio data that the human ear perceives strongly and left disabled for audio data that the human ear perceives only weakly, so as to ensure data transmission quality. Furthermore, the transmission quality of the main sound source in multi-person network calls can be guaranteed to the greatest extent, unnecessary consumption of network bandwidth resources can be avoided, and the poor call experience caused by the increased call delay of retransmitting lost packets without any screening condition can be avoided.
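  • the overall flow of FIG. 7 can be simulated in a few lines. This is a deliberately simplified sketch under several assumptions: packets are represented only by their sequence numbers, loss is detected by comparing expected and received sequence numbers, every packet is treated as a target audio data packet, and a retransmitted packet always arrives. Uplink losses are always resent by the sender, while downlink losses are resent by the server only when the loudness exceeds the preset loudness.

```python
def detect_lost(expected_seqs, received_seqs):
    """Return the sequence numbers that were expected but not received, in order."""
    got = set(received_seqs)
    return [s for s in expected_seqs if s not in got]


def relay(sent, drop_uplink, drop_downlink, loudness, preset_loudness):
    """Simulate one pass of the flow and return what the receiver ends up with."""
    # Uplink: sender -> server; the sender resends whatever the server reports as lost.
    at_server = [s for s in sent if s not in drop_uplink]
    lost_up = detect_lost(sent, at_server)
    at_server = sorted(at_server + lost_up)
    # Downlink: server -> receiver; retransmission is gated by loudness.
    at_receiver = [s for s in at_server if s not in drop_downlink]
    lost_down = detect_lost(at_server, at_receiver)
    resent = [s for s in lost_down if loudness[s] > preset_loudness]
    return sorted(at_receiver + resent)


loudness = {1: 5.0, 2: 5.0, 3: 0.2, 4: 6.0}
print(relay([1, 2, 3, 4], {2}, {3, 4}, loudness, preset_loudness=1.0))  # [1, 2, 4]
```

  • in the example, packet 3 is lost on the downlink but is too quiet to be worth resending, so the receiver decodes and mixes packets 1, 2 and 4 only.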
  • a system 800 for retransmission of lost packets is also provided. Referring to FIG. 8, the system includes a sender terminal 801, a server 802 and a receiver terminal 803, wherein:
  • the sender terminal 801 is used to send the target audio data packet to the server 802;
  • a server 802 configured to obtain the loudness corresponding to the target audio data packet
  • the receiver terminal 803 is used to receive the target audio data packet, and send the packet loss status for indicating the loss of the target audio data packet to the server 802;
  • the server 802 is further configured to retransmit the target audio data packet to the receiver terminal 803 according to the loudness corresponding to the target audio data packet when the packet loss state is received.
  • the loudness corresponding to the audio data packet can be used as the data retransmission condition, thereby alleviating the problem of long data retransmission time and improving the data transmission efficiency.
  • the target audio data packet can also be retransmitted in a targeted manner. Compared with the prior art, which retransmits all data packets sent at the same time as the target audio data packet, the amount of retransmitted data and the occupancy of network resources can be reduced.
  • FIG. 9 schematically shows a structural block diagram of a system 900 for retransmission of lost packets according to another embodiment of the present application.
  • the structural block diagram shown in FIG. 9 includes: a sender terminal 910, a server 920 and a receiver terminal 930; wherein the sender terminal 910 includes a feature extraction module 911, a speech encoding module 912 and a data retransmission module 913, the server 920 includes a packet loss detection module 921, an audio routing module 922, a data retransmission module 923 and a perceptual analysis module 924, and the receiver terminal 930 includes a packet loss detection module 931, a voice decoding module 932, a mixing module 933 and a playback module 934.
  • the feature extraction module 911 can collect audio signals, perform feature extraction on the collected audio signals to obtain audio features, and send the collected audio signals to the speech encoding module 912.
  • the speech encoding module 912 can encode the audio signal to obtain a corresponding audio code stream, and send the audio code stream to the data retransmission module 913.
  • the data retransmission module 913 can package the matched audio code stream and audio feature into an audio data packet, and the audio data packet includes the loudness corresponding to the audio data packet, the audio code stream, and the audio feature corresponding to the audio code stream;
  • the multiple audio data packets are then forwarded to the server 920 as one data packet, so that the packet loss detection module 921 in the server 920 performs packet loss detection on the received data packet.
  • if a lost state is detected, the packet loss detection module 921 feeds back the lost state to the data retransmission module 913, so that the data retransmission module 913 retransmits the data for the lost state. Further, when the packet loss detection module 921 receives the retransmitted audio data packet and detects that no audio data packet in the data packet is in the lost state, the data packet is updated according to the retransmitted audio data packet, and the updated data packet is sent to the audio routing module 922.
  • the audio routing module 922 can screen the audio data packets in the data packet according to the audio features corresponding to each audio data packet in the data packet to obtain the target audio data packet; wherein the maximum energy amplitude corresponding to the target audio data packet is greater than that of the other audio data packets in the data packet, or the similarity between the energy spectrum distribution corresponding to the target audio data packet and the preset energy spectrum distribution is greater than that of the other audio data packets in the data packet.
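  • the first of the two routing criteria, picking the audio data packet whose maximum energy amplitude is the largest, can be sketched as follows. The dictionary layout and key names are hypothetical, and the alternative criterion based on similarity to a preset energy spectrum distribution is not shown.

```python
def select_target(packets):
    """Pick the packet whose maximum energy amplitude is the largest in the data packet."""
    return max(packets, key=lambda p: max(p["energy_amplitudes"]))


data_packet = [
    {"sender": "A", "energy_amplitudes": [0.2, 0.3, 0.1]},
    {"sender": "B", "energy_amplitudes": [0.1, 0.9, 0.2]},
    {"sender": "C", "energy_amplitudes": [0.4, 0.4, 0.4]},
]
print(select_target(data_packet)["sender"])  # B
```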
  • the audio routing module 922 may send the target audio data packet to the perceptual analysis module 924 and the data retransmission module 923, so that the data retransmission module 923 forwards the target audio data packet to the receiver terminal 930.
  • the packet loss detection module 931 in the receiver terminal 930 performs packet loss detection on the target audio data packets; wherein, the number of the target audio data packets may be one or more.
  • if a lost state is detected, the packet loss detection module 931 feeds back the lost state to the data retransmission module 923. Furthermore, if the loudness of the target audio data packet is greater than the preset loudness, the data retransmission module 923 may retransmit the data for the lost state; wherein there may be one or more target audio data packets in the lost state. Further, when the packet loss detection module 931 receives the retransmitted target audio data packet and detects that none of the target audio data packets is in a lost state, all target audio data packets are updated according to the retransmitted target audio data packet, and all the updated target audio data packets are sent to the voice decoding module 932.
  • the speech decoding module 932 can decode the audio code stream in each target audio data packet into an audio signal, and send the decoded audio signal to the sound mixing module 933, so that the sound mixing module 933 performs sound mixing processing on the audio signal. Furthermore, the sound mixing module 933 may send the sound mixing processing result to the playing module 934, so that the playing module 934 plays the sound mixing processing result.
  • the loudness corresponding to the audio data packet can be used as the data retransmission condition, thereby alleviating the problem of long data retransmission time and improving the data transmission efficiency.
  • the target audio data packet can also be retransmitted in a targeted manner. Compared with the prior art, which retransmits all data packets sent at the same time as the target audio data packet, the amount of retransmitted data and the occupancy of network resources can be reduced.
  • FIG. 10 schematically shows a structural block diagram of a system 1000 for retransmission of lost packets according to yet another embodiment of the present application. As shown in FIG. 10, it includes a sender terminal 1011, a sender terminal 1012, ..., a sender terminal 101n, a server cluster 1030, a receiver terminal 1021, a receiver terminal 1022, ..., a receiver terminal 102n; wherein, n is a positive integer greater than or equal to 3.
  • the server cluster 1030 in the present application may receive data packets sent from at least one of the sender terminal 1011, the sender terminal 1012, ..., and the sender terminal 101n, and each data packet may include one or more audio data packets. If an audio data packet satisfies the preset conditions, the audio data packet can be determined as the target audio data packet by the server cluster 1030 and forwarded to the receiver terminal 1021, the receiver terminal 1022, ..., and the receiver terminal 102n. If a packet loss state indicating the loss of the target audio data packet is received from any one of the receiver terminal 1021, the receiver terminal 1022, ..., and the receiver terminal 102n, the target audio data packet is retransmitted.
  • the loudness corresponding to the audio data packet can be used as the data retransmission condition, thereby alleviating the problem of long data retransmission time and improving the data transmission efficiency.
  • the target audio data packet can also be retransmitted in a targeted manner. Compared with the prior art, which retransmits all data packets sent at the same time as the target audio data packet, the amount of retransmitted data and the occupancy of network resources can be reduced.
  • the packet loss retransmission apparatus 1100 may include:
  • a loudness obtaining unit 1101, configured to obtain the loudness corresponding to the target audio data packet;
  • a data sending unit 1102, configured to retransmit the target audio data packet according to the loudness corresponding to the target audio data packet when receiving a packet loss state indicating that the target audio data packet is lost.
  • the loudness corresponding to the audio data packet can be used as the data retransmission condition, thereby alleviating the problem of long data retransmission time and improving the data transmission efficiency.
  • the target audio data packet can also be retransmitted in a targeted manner. Compared with the prior art, which retransmits all data packets sent at the same time as the target audio data packet, the amount of retransmitted data and the occupancy of network resources can be reduced.
  • the above-mentioned apparatus further includes a data packet screening unit (not shown), wherein:
  • the data packet screening unit is configured to filter target audio data packets whose audio characteristics meet preset conditions from the acquired multiple audio data packets before the loudness obtaining unit 1101 obtains the loudness corresponding to the target audio data packets.
  • the received audio data packets can be screened according to the audio features, so as to reduce the amount of data to be forwarded, thereby reducing the loss of network resources and improving the data transmission efficiency.
  • the above-mentioned apparatus further includes a packet loss detection unit (not shown) and a packet loss status feedback unit (not shown), wherein:
  • a packet loss detection unit configured to perform packet loss detection on the received multiple audio data packets before the data packet screening unit filters the target audio data packets whose audio characteristics meet the preset conditions from the received multiple audio data packets;
  • the packet loss state feedback unit is configured to feed back the loss state to the sender terminal if the packet loss detection result includes the loss state, so that the sender terminal retransmits data for the loss state.
  • the audio data packet includes the loudness corresponding to the audio data packet, the audio code stream, and the audio features corresponding to the audio code stream, and the audio features corresponding to the audio code stream include the energy distribution corresponding to the audio code stream and the energy amplitude corresponding to each frequency point in the audio code stream.
  • the sender terminal can be triggered to resend data when the packet loss state is detected, which can improve the integrity of the transmitted data.
  • the above-mentioned apparatus further includes a data packet updating unit (not shown), wherein:
  • the data packet updating unit is configured to update the multiple audio data packets according to the retransmitted data packets, after the sender terminal retransmits the data for the lost state and before the data packet screening unit filters the target audio data packets whose audio characteristics meet the preset conditions from the acquired plurality of audio data packets.
  • the integrity of the data sent to the receiver terminal can be guaranteed by updating the data packet.
  • the received multiple audio data packets are sent by the sender terminal;
  • the multiple audio data packets are obtained by the sender terminal packaging the audio code stream and the audio features; the audio features are obtained by the sender terminal collecting the audio signal and performing feature extraction on the audio signal; and the audio code stream is obtained by the sender terminal encoding the audio signal.
  • the sender terminal can perform feature extraction on the audio signal, which helps the server select which audio paths to forward according to the result of the feature extraction, so that the real-time output of the audio of the multi-person conference is ensured while the quality of the audio output is also guaranteed.
  • the preset condition includes a preset energy amplitude and/or a preset signal-to-noise ratio, and the data packet screening unit filtering, from the acquired multiple audio data packets, the target audio data packets whose audio features meet the preset condition includes:
  • if it is detected that at least one energy amplitude in the audio feature is greater than the preset energy amplitude, determining the audio data packet to which the audio code stream corresponding to the audio feature belongs as the target audio data packet; and/or,
  • if it is detected that at least one signal-to-noise ratio in the audio feature is greater than the preset signal-to-noise ratio, determining the audio data packet to which the audio code stream corresponding to the audio feature belongs as the target audio data packet.
  • the audio data packets can be screened according to the energy amplitude or the signal-to-noise ratio, so as to simplify the audio data packets to be transmitted and improve the data transmission efficiency.
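  • the loudness obtaining unit 1101 is configured to:
  • perform framing processing on the audio code stream in the target audio data packet according to the preset duration to obtain multiple audio frames;
  • process the multiple audio frames respectively with a preset window function to obtain multiple reference frames;
  • calculate the power spectra corresponding to the multiple reference frames;
  • calculate the loudness corresponding to the target audio data packet according to the power spectra.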
  • the preset window function is a Hanning window function, a Hamming window function, a Blackman window function, a Kaiser window function, a triangular window function or a rectangular window function.
  • the audio loudness of the target audio data packet can be calculated and obtained, and the audio loudness can be used as a data retransmission condition.
  • when the audio loudness is low, the server may choose not to retransmit the lost target audio data packet, so as to reduce the occupation of network resources.
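  • the loudness obtaining unit 1101 is specifically configured to:
  • calculate the frequency point loudness of each frequency point in the power spectrum according to the energy amplitude of that frequency point;
  • calculate the loudness weight of each frequency point in the power spectrum according to the frequency point loudness;
  • calculate the weighted sum of the energy amplitude of each frequency point in the power spectrum and the loudness weight of that frequency point, as the loudness value of the reference frame corresponding to the power spectrum;
  • determine the sum of the loudness values corresponding to the multiple reference frames as the loudness corresponding to the target audio data packet.
  • the loudness corresponding to the target audio data packet can be determined from the loudness corresponding to each frequency point and used as the data retransmission condition, thereby improving the data retransmission efficiency of the server.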
  • the data sending unit 1102 is configured to:
  • the target audio data packet is resent to the receiver terminal, so that the receiver terminal decodes and outputs the target audio data packet after retransmission and before retransmission.
  • audio data packets can be screened according to loudness, thereby reducing the occupation of network resources by sounds with weak human ear perception ability, thereby improving data transmission efficiency.
  • because the present application retransmits only the lost data packets, it can solve the prior-art problem of low transmission efficiency caused by having to retransmit all data of the entire data group on every retransmission.
  • the manner in which the receiver terminal decodes and outputs the target audio data packet after retransmission and before retransmission is specifically:
  • the receiver terminal decodes the target audio data packets after retransmission and before retransmission, and obtains a plurality of audio signals to be output;
  • the receiver terminal performs mixing processing on a plurality of audio signals to be outputted to obtain and play the mixed audio signals.
  • the receiver terminal can mix and output multiple complete audio data packets after receiving them.
  • the output result retains the important audio content in the conference and discards the audio content that the human ear perceives only weakly, which can improve the message forwarding efficiency of multi-person conferences, ensure the real-time output of audio signals, and thus improve the user experience.
  • each functional module of the packet loss retransmission apparatus corresponds to a step of the above-mentioned exemplary embodiment of the packet loss retransmission method; for details not disclosed in the apparatus embodiment of this application, please refer to the above-mentioned embodiment of the packet loss retransmission method.
  • the present application also provides a computer-readable medium.
  • the computer-readable medium may be included in the electronic device described in the above embodiments; it may also exist alone without being assembled into the electronic device.
  • the above-mentioned computer-readable medium carries one or more programs, and when the above-mentioned one or more programs are executed by an electronic device, enables the electronic device to implement the methods described in the above-mentioned embodiments.
  • the computer-readable medium shown in this application may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the above two.
  • the computer-readable storage medium can be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus or device, or a combination of any of the above. More specific examples of computer readable storage media may include, but are not limited to, electrical connections with one or more wires, portable computer disks, hard disks, random access memory (RAM), read only memory (ROM), erasable Programmable read only memory (EPROM or flash memory), fiber optics, portable compact disk read only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination of the foregoing.
  • a computer-readable storage medium can be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device.
  • a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code therein. Such propagated data signals may take a variety of forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • a computer-readable signal medium can also be any computer-readable medium other than a computer-readable storage medium that can transmit, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
  • Program code embodied on a computer readable medium may be transmitted using any suitable medium including, but not limited to, wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
  • each block in the flowchart or block diagrams may represent a module, segment, or portion of code that contains one or more executable instructions for implementing the specified logical functions.
  • the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved.
  • the units involved in the embodiments of the present application may be implemented in software or hardware, and the described units may also be provided in a processor. Among them, the names of these units do not constitute a limitation on the unit itself under certain circumstances.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Computational Linguistics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)
  • Telephonic Communication Services (AREA)

Abstract

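The present application provides a packet loss retransmission method, system, and apparatus, a computer-readable storage medium, and an electronic device, and relates to the field of computer technologies. The packet loss retransmission method includes: obtaining the loudness corresponding to a target audio data packet; and, when a packet loss state indicating that the target audio data packet has been lost is received, retransmitting the target audio data packet according to the loudness corresponding to the target audio data packet. Implementing the technical solution of the present application can therefore alleviate the problem of long data retransmission times and improve data transmission efficiency.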

Description

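Packet loss retransmission method, system, and apparatus, computer-readable storage medium, and device
This application claims priority to Chinese Patent Application No. 202010601648.X, entitled "Packet loss retransmission method, system, and apparatus, computer-readable storage medium, and device" and filed with the Chinese Patent Office on June 28, 2020, which is incorporated herein by reference in its entirety.
Technical Field
The present application relates to the field of computer technologies, and in particular to packet loss retransmission technology.
Background
During audio data transmission, packet loss often occurs for reasons such as an unstable transmission network. Data retransmission is generally used to ensure that the receiver terminal receives complete data. An existing packet loss retransmission approach typically works as follows: when a packet loss report fed back by the receiver terminal is detected, the lost data packets identified in the report are retransmitted.
However, in real-time audio data transmission, the following situation is common: the audio data in an audio data packet (for example, faint ambient sound) is not necessarily perceptible to the human ear after it is decoded and output. If the loss of such data is also reported according to the above approach, data retransmission tends to take a long time, which in turn tends to lower data transmission efficiency.
It should be noted that the information disclosed in the Background section above is only intended to enhance understanding of the background of the present application, and may therefore include information that does not constitute prior art already known to a person of ordinary skill in the art.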
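Summary
An objective of the present application is to provide a packet loss retransmission method, system, and apparatus, a computer-readable storage medium, and an electronic device that can alleviate the problem of long data retransmission times and improve data transmission efficiency.
Other features and advantages of the present application will become apparent from the following detailed description, or may be learned in part through practice of the present application.
According to one aspect of the present application, a packet loss retransmission method is provided, including:
obtaining the loudness corresponding to a target audio data packet; and
when a packet loss state indicating that the target audio data packet has been lost is received, retransmitting the target audio data packet according to the loudness corresponding to the target audio data packet.
According to one aspect of the present application, a packet loss retransmission system is provided, including a sender terminal, a server, and a receiver terminal, where:
the sender terminal is configured to send a target audio data packet to the server;
the server is configured to obtain the loudness corresponding to the target audio data packet;
the receiver terminal is configured to receive the target audio data packet and to send to the server a packet loss state indicating that the target audio data packet has been lost; and
the server is further configured to, on receiving the packet loss state, retransmit the target audio data packet to the receiver terminal according to the loudness corresponding to the target audio data packet.
According to one aspect of the present application, a packet loss retransmission apparatus is provided, including:
a loudness obtaining unit, configured to obtain the loudness corresponding to a target audio data packet; and
a data sending unit, configured to, on receiving a packet loss state indicating that the target audio data packet has been lost, retransmit the target audio data packet according to the loudness corresponding to the target audio data packet.
According to one aspect of the present application, an electronic device for packet loss retransmission is provided, including: a processor; and a memory configured to store executable instructions of the processor, where the processor is configured to perform any one of the methods described above by executing the executable instructions.
According to one aspect of the present application, a computer-readable storage medium is provided, storing a computer program that, when executed by a processor, implements any one of the methods described above.
Exemplary embodiments of the present application may have some or all of the following beneficial effects:
In the packet loss retransmission method provided by an example implementation of the present application, the loudness corresponding to a target audio data packet can be obtained, and when a packet loss state indicating that the target audio data packet has been lost is received, the target audio data packet is retransmitted according to its loudness. With this solution, the loudness corresponding to an audio data packet serves as the data retransmission condition, so the target audio data packet can be retransmitted selectively. Compared with the related art, which retransmits all data packets of the transmission that contained the target audio data packet, this reduces the amount of retransmitted data and the occupation of network resources, thereby alleviating the problem of long data retransmission times and improving data transmission efficiency.
It should be understood that the above general description and the following detailed description are merely exemplary and explanatory, and do not limit the present application.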
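Brief Description of the Drawings
The accompanying drawings are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application, and together with the specification serve to explain the principles of the present application. Evidently, the drawings described below are only some embodiments of the present application, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
FIG. 1 schematically shows the system architecture of a packet loss retransmission method according to an embodiment of the present application;
FIG. 2 schematically shows the structure of a computer system of an electronic device according to an embodiment of the present application;
FIG. 3 schematically shows an architecture diagram of a packet loss retransmission method according to an embodiment of the present application;
FIG. 4 schematically shows a flowchart of a packet loss retransmission method according to an embodiment of the present application;
FIG. 5 schematically shows a loudness weight curve according to an embodiment of the present application;
FIG. 6 schematically shows acoustic equal-loudness contours according to an embodiment of the present application;
FIG. 7 schematically shows a sequence diagram of a packet loss retransmission method according to an embodiment of the present application;
FIG. 8 schematically shows a structural block diagram of a packet loss retransmission system according to an embodiment of the present application;
FIG. 9 schematically shows a structural block diagram of a packet loss retransmission system according to an embodiment of the present application;
FIG. 10 schematically shows a structural block diagram of a packet loss retransmission system according to yet another embodiment of the present application;
FIG. 11 schematically shows a structural block diagram of a packet loss retransmission apparatus according to an embodiment of the present application.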
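Detailed Description
Example implementations are now described more fully with reference to the accompanying drawings. However, the example implementations can be embodied in many forms and should not be construed as limited to the examples set forth herein; rather, these implementations are provided so that the present application will be thorough and complete, and will fully convey the concept of the example implementations to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more implementations. In the following description, numerous specific details are provided to give a thorough understanding of the implementations of the present application. However, those skilled in the art will appreciate that the technical solutions of the present application may be practiced with one or more of the specific details omitted, or with other methods, components, apparatuses, steps, and the like. In other instances, well-known technical solutions are not shown or described in detail, to avoid obscuring aspects of the present application.
In addition, the drawings are merely schematic illustrations of the present application and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and repeated descriptions of them are therefore omitted. Some of the block diagrams shown in the drawings are functional entities and do not necessarily correspond to physically or logically independent entities. These functional entities may be implemented in software, in one or more hardware modules or integrated circuits, or in different networks and/or processor apparatuses and/or microcontroller apparatuses.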
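FIG. 1 schematically shows the system architecture of a packet loss retransmission method according to an embodiment of the present application.
As shown in FIG. 1, the system architecture 100 may include one or more of terminal devices 101, 102, and 103, a network 104, and a server 105. The network 104 is the medium that provides communication links between the terminal devices 101, 102, and 103 and the server 105. The network 104 may include various connection types, such as wired or wireless communication links or fiber-optic cables. The terminal devices 101, 102, and 103 may be various electronic devices with display screens, including but not limited to desktop computers, portable computers, smartphones, and tablet computers. It should be understood that the numbers of terminal devices, networks, and servers in FIG. 1 are merely illustrative.
The packet loss retransmission method provided by the embodiments of the present application is generally performed by the server 105, and accordingly the packet loss retransmission apparatus is generally deployed in the server 105. However, those skilled in the art will readily understand that the method may also be performed by the terminal device 101, 102, or 103, and accordingly the apparatus may also be deployed in the terminal device 101, 102, or 103; this exemplary embodiment imposes no particular limitation in this regard. For example, in one exemplary embodiment, the server 105 may obtain the loudness corresponding to a target audio data packet and, when a packet loss state indicating that the target audio data packet has been lost is received, retransmit the target audio data packet according to its loudness. There may be any number of terminal devices, networks, and servers, depending on implementation requirements.
The terminal device 101, 102, or 103 and the server 105 may be collectively referred to as electronic devices. The terminal device 101, 102, or 103 may be a terminal independent of the sender terminal and the receiver terminal, or may itself be the sender terminal or the receiver terminal.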
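FIG. 2 shows the structure of a computer system of an electronic device suitable for implementing the embodiments of the present application.
It should be noted that the computer system 200 of the electronic device shown in FIG. 2 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present application.
As shown in FIG. 2, the computer system 200 includes a central processing unit (CPU) 201, which can perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 202 or a program loaded from a storage portion 208 into a random access memory (RAM) 203. The RAM 203 also stores various programs and data required for system operation. The CPU 201, the ROM 202, and the RAM 203 are connected to one another through a bus 204. An input/output (I/O) interface 205 is also connected to the bus 204.
The following components are connected to the I/O interface 205: an input portion 206 including a keyboard, a mouse, and the like; an output portion 207 including a cathode ray tube (CRT) display, a liquid crystal display (LCD), or the like, as well as a speaker; a storage portion 208 including a hard disk and the like; and a communication portion 209 including a network interface card such as a local area network (LAN) card or a modem. The communication portion 209 performs communication processing via a network such as the Internet. A drive 210 is also connected to the I/O interface 205 as needed. A removable medium 211, such as a magnetic disk, an optical disc, a magneto-optical disc, or a semiconductor memory, is mounted on the drive 210 as needed, so that a computer program read from it can be installed into the storage portion 208 as needed.
In particular, according to the embodiments of the present application, the processes described below with reference to the flowcharts may be implemented as computer software programs. For example, an embodiment of the present application includes a computer program product that includes a computer program carried on a computer-readable medium, the computer program containing program code for performing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication portion 209, and/or installed from the removable medium 211. When the computer program is executed by the central processing unit (CPU) 201, the various functions defined in the method and apparatus of the present application are performed.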
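Generally, packet loss is difficult to avoid during audio transmission. Common causes include wireless channel interference on Wi-Fi or mobile networks, router congestion at peak times, and insufficient performance of mobile devices. When an audio data packet spends too long in transit on the network, that is, when it cannot be delivered in time for playback, it is deemed lost even if it arrives later. Existing packet loss handling generally works as follows: on receiving data from the sender, the receiver performs packet loss detection on it; if packets have been lost, the receiver reports this to the sender so that the sender retransmits the data, and this continues until no packet loss is detected in the received data.
Refer to FIG. 3, which schematically shows an architecture diagram of a packet loss retransmission method according to an embodiment of the present application. The architecture shown in FIG. 3 includes a sender terminal 310, a server 320, and a receiver terminal 330. The sender terminal 310 includes a feature extraction module 311, a speech encoding module 312, and a data retransmission module 313; the server 320 includes a packet loss detection module 321, an audio routing module 322, and a data retransmission module 323; and the receiver terminal 330 includes a packet loss detection module 331, a speech decoding module 332, a mixing module 333, and a playback module 334.
The feature extraction module 311 can capture an audio signal, perform feature extraction on it to obtain audio features, and send the captured audio signal to the speech encoding module 312. The speech encoding module 312 can encode the audio signal to obtain the corresponding audio code stream and send the audio code stream to the data retransmission module 313. The data retransmission module 313 can pack each audio code stream with its matching audio features into an audio data packet and, after packing multiple audio data packets, forward them to the server 320 as one data group, so that the packet loss detection module 321 in the server 320 performs packet loss detection on the received data group.
If the packet loss detection result includes an audio data packet in the lost state, the packet loss detection module 321 feeds the lost state back to the data retransmission module 313, so that the data retransmission module 313 retransmits data for that lost state. When the packet loss detection module 321 receives the retransmitted audio data packet and detects that no audio data packet in the data group is in the lost state, it updates the data group with the retransmitted audio data packet and sends the updated data group to the audio routing module 322. The audio routing module 322 can then screen the audio data packets in the data group according to the audio features of each packet to obtain the target audio data packets, where the maximum energy amplitude of a target audio data packet is greater than that of the other audio data packets in the data group, or the similarity between the energy spectrum distribution of a target audio data packet and a preset energy spectrum distribution is greater than that of the other audio data packets in the data group. The audio routing module 322 can then send the target audio data packets to the data retransmission module 323, so that the data retransmission module 323 forwards them to the receiver terminal 330. The packet loss detection module 331 in the receiver terminal 330 then performs packet loss detection on the target audio data packets.
If the packet loss detection result includes a target audio data packet in the lost state, the packet loss detection module 331 feeds the lost state back to the data retransmission module 323, so that the data retransmission module 323 retransmits data for that lost state. When the packet loss detection module 331 receives the retransmitted target audio data packet and detects that none of the target audio data packets is in the lost state, it updates the set of target audio data packets with the retransmitted packet and sends the updated set to the speech decoding module 332. The speech decoding module 332 can decode the audio code stream in each target audio data packet into an audio signal and send the decoded audio signals to the mixing module 333, so that the mixing module 333 mixes them. The mixing module 333 can then send the mixing result to the playback module 334, so that the playback module 334 plays it.
In the packet loss retransmission method described above, each transmission may contain multiple audio data packets, and the next module is triggered to perform its operation only after all of those packets have been received in full (that is, with no packet loss). In multi-person conferencing, where a large volume of audio is captured, retransmitting lost packets in this way tends to lower data transmission efficiency. This results in a long delay between the moment the sender terminal sends the audio and the moment the receiver terminal plays it, which harms the real-time quality of audio output in the conference and the user experience.
In addition, in a multi-person conference, users generally perceive quieter audio signals only weakly when multiple audio channels are played together. The embodiments of the present application can therefore use the fact that low-loudness audio is hard for users to hear as a screening condition for audio data packets, and thereby decide whether to retransmit a lost audio data packet. If the server not only screens audio data packets by energy but also applies a second screening by loudness before transmitting the remaining packets to the receiver terminal, data transmission efficiency can be improved when packet loss occurs, and the delay between sending and playback is shortened, improving both the real-time quality of audio output in the conference and the user experience.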
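In view of the above problems, this example implementation provides a packet loss retransmission method. The method may be applied to the server 105 described above, or to one or more of the terminal devices 101, 102, and 103; this exemplary embodiment imposes no particular limitation in this regard. The method is described below mainly with the server as the executing entity. Referring to FIG. 4, the packet loss retransmission method may include the following S410 and S420:
S410: Obtain the loudness corresponding to a target audio data packet.
S420: When a packet loss state indicating that the target audio data packet has been lost is received, retransmit the target audio data packet according to the loudness corresponding to the target audio data packet.
S410 and S420 may be performed by a server. For example, the server may be a routing server used to select channel signals.
By implementing the method shown in FIG. 4, the loudness corresponding to an audio data packet can serve as the data retransmission condition, alleviating the problem of long data retransmission times and improving data transmission efficiency. In addition, the target audio data packet can be retransmitted selectively; compared with the related art, which retransmits all data packets of the transmission that contained the target audio data packet, this reduces the amount of retransmitted data and the occupation of network resources.
The above steps of this example implementation are described in more detail below.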
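In S410, the loudness corresponding to a target audio data packet is obtained.
It should be noted that the server may receive multiple audio data packets sent by the sender terminal, and the target audio data packet is one or more of those packets. In a multi-person conference, not every member is speaking; some members may remain silent, and there is no need to forward to the receiver terminal the audio data packets sent by the sender terminals of silent members. Accordingly, in the embodiments of the present application, before obtaining the loudness corresponding to the target audio data packet, the server may select, from the multiple audio data packets it has obtained, the target audio data packets whose audio features meet a preset condition.
An audio data packet may include the loudness corresponding to the packet, an audio code stream, and the audio features corresponding to the audio code stream. The audio features corresponding to the audio code stream may include the energy distribution of the audio code stream and the energy amplitude of each frequency bin in the audio code stream. There may be one or more target audio data packets.
Implementing this optional embodiment therefore allows the obtained audio data packets to be screened, reducing the amount of forwarded data, lowering the consumption of network resources, and improving data transmission efficiency.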
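In the embodiments of the present application, packet loss may also occur when the sender terminal sends multiple audio data packets to the server. Therefore, before the target audio data packets whose audio features meet the preset condition are selected from the obtained audio data packets, packet loss detection may be performed on the received audio data packets; if the packet loss detection result includes a lost state, the lost state is fed back to the sender terminal so that the sender terminal retransmits data for the lost state.
It can be understood that there may be one or more sender terminals, which is not limited in the embodiments of the present application. When the embodiments are applied to a multi-person conference, the terminal of a conference member may act as a sender terminal and, at the same time, as a receiver terminal. In addition, multiple audio data packets may belong to the same data group. The sender terminal may send one data group at a time, each data group includes one or more audio data packets, and the multiple audio data packets correspond to different audio channels.
As an optional implementation, packet loss detection on the received audio data packets may be performed as follows: determine the sequence numbers of the audio data packets received within a unit of time (for example, 5 s); then check, based on the packet header of the data transmission protocol (for example, the Sequence Number field of TCP), whether these sequence numbers are continuous. If the sequence numbers are not continuous (for example, 1, 2, 4, 5), packet loss is determined to have occurred; if they are continuous (for example, 1, 2, 3, 4), no packet loss is determined to have occurred, in which case the packet loss state of each audio data packet is the not-lost state. The lost data packets are then identified from the sequence numbers, and a packet loss detection result is generated from the lost data packets. There may be one or more lost data packets, which is not limited in the embodiments of the present application. The packet loss detection result may include at least one of the lost data packets, the sequence numbers of the lost data packets, and the packet loss states of the lost data packets. The packet loss state may be represented by the value 0 or 1, where 0 denotes the lost state and 1 denotes the not-lost state.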
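The sequence-number continuity check described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names and the report layout are hypothetical, and the sketch only finds gaps between the lowest and highest sequence numbers seen in the window, so a loss at either end of the window would need extra information (such as the expected range).

```python
def find_lost_sequence_numbers(received):
    """Return the sequence numbers missing between the lowest and highest received ones."""
    if not received:
        return []
    seen = set(received)
    return [seq for seq in range(min(seen), max(seen) + 1) if seq not in seen]


def build_loss_report(received):
    """Build a hypothetical detection result: status 0 = lost, status 1 = not lost."""
    lost = find_lost_sequence_numbers(received)
    status = {seq: 1 for seq in received}
    status.update({seq: 0 for seq in lost})
    # A NACK is fed back when anything is missing, an ACK otherwise.
    feedback = "NACK" if lost else "ACK"
    return {"lost": lost, "status": status, "feedback": feedback}
```

With the examples from the text, the sequence 1, 2, 4, 5 yields packet 3 as lost, while 1, 2, 3, 4 yields no loss.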
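As an optional implementation, the lost state may be fed back to the sender terminal by sending the sender terminal a negative acknowledgement (NACK) signal containing the packet loss detection result. It can be understood that, in this optional implementation, if the packet loss detection result does not include a lost state, the following step may be included: sending the sender terminal a positive acknowledgement (ACK) signal indicating that the data was received normally.
As an optional implementation, the sender terminal may retransmit data for the lost state as follows: the sender terminal determines the lost data packets corresponding to the lost state and retransmits them; or the sender terminal retransmits the data group corresponding to the lost state, where the data group consists of the multiple audio data packets described above.
Implementing this optional embodiment therefore allows the sender to be triggered to resend data when a lost state is detected, which improves the integrity of the transmitted data.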
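In the embodiments of the present application, after the sender terminal retransmits data for the lost state and before the target audio data packets whose audio features meet the preset condition are selected from the obtained audio data packets, the method further includes: updating the multiple audio data packets according to the retransmitted data packets.
There may be one or more retransmitted data packets, and the updated set of audio data packets includes the retransmitted data packets.
As an optional implementation, the multiple audio data packets may be updated according to the retransmitted data packets as follows: determine the sequence numbers of the retransmitted data packets and of the multiple audio data packets, and merge the retransmitted data packets with the multiple audio data packets in sequence-number order, thereby updating the multiple audio data packets.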
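The merge by sequence number can be illustrated with a short sketch. It assumes, hypothetically, that each packet is represented as a `(sequence_number, payload)` tuple; the source does not specify a packet representation.

```python
def merge_retransmitted(packets, retransmitted):
    """Merge two lists of (sequence_number, payload) tuples in sequence-number order.

    If a retransmitted packet shares a sequence number with an existing packet,
    the retransmitted copy replaces the earlier one.
    """
    by_seq = dict(packets)
    by_seq.update(dict(retransmitted))
    return sorted(by_seq.items())
```

Keying by sequence number also makes the update idempotent: applying the same retransmission twice leaves the packet set unchanged.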
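Implementing this optional embodiment therefore ensures, through the packet update, the integrity of the data sent to the receiver terminal.
In the embodiments of the present application, the received audio data packets may be sent by the sender terminal.
The sender terminal sends the audio data packets as follows: the sender terminal captures an audio signal and performs feature extraction on it to obtain audio features; the sender terminal encodes the audio signal to obtain an audio code stream; and the sender terminal packs the audio code stream and the audio features into an audio data packet and sends it to the server.
It should be noted that the audio signal captured by the sender terminal may be an analog signal. The audio code stream represents the data rate used by the audio signal per unit of time: audio code stream = sampling rate * bits per sample * number of channels, for example, 44100 * 16 * 2 = 1.41 Mbit/s.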
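As a quick check of the arithmetic above, the following sketch computes the uncompressed bit rate from the three factors in the formula. The function name is hypothetical, and the formula describes raw PCM audio; an encoded stream would normally be smaller.

```python
def pcm_bit_rate(sample_rate_hz, bits_per_sample, channels):
    """Bit rate in bit/s of uncompressed audio: sample rate * bit depth * channels."""
    return sample_rate_hz * bits_per_sample * channels
```

For 44100 Hz, 16 bits, and 2 channels this gives 1,411,200 bit/s, which rounds to the 1.41 Mbit/s quoted in the text.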
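As an optional implementation, the audio features corresponding to the audio signal may include at least one of the zero-crossing rate, the short-time energy, the short-time autocorrelation function, and the short-time average magnitude difference, which is not limited in the embodiments of the present application.
When the audio features include the zero-crossing rate, the sender terminal may capture the audio signal and extract the audio features as follows: the sender terminal captures the audio signal and, according to
[Equation image: PCTCN2021095677-appb-000001]
[Equation image: PCTCN2021095677-appb-000002]
calculates the zero-crossing rate of the audio signal as its audio feature, where N is the frame length of the audio signal and n is the frame number of the audio signal. The calculated zero-crossing rate characterizes the number of times each frame of the audio signal crosses zero. It can be used to distinguish unvoiced from voiced sounds in the audio signal, which helps the server screen the received audio data packets.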
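Because the patent's own formula survives only as an image reference, the sketch below uses the common textbook definition of the zero-crossing count, half the sum of absolute differences between the signs of adjacent samples, as a stand-in. It is an assumption that this matches the patent's equation, and the function names are hypothetical.

```python
def sign(value):
    """Sign convention used here: +1 for non-negative samples, -1 otherwise."""
    return 1 if value >= 0 else -1


def zero_crossing_count(frame):
    """Count sign changes between adjacent samples of one frame."""
    total = sum(abs(sign(frame[m]) - sign(frame[m - 1])) for m in range(1, len(frame)))
    # Each sign change contributes |(+1) - (-1)| = 2, hence the division by 2.
    return total // 2
```

Unvoiced sounds tend to produce high counts and voiced sounds low counts, which is the property the text relies on for screening.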
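When the audio features include the short-time energy, the sender terminal may capture the audio signal and extract the audio features as follows: the sender terminal captures the audio signal and, according to
[Equation image: PCTCN2021095677-appb-000003]
[Equation image: PCTCN2021095677-appb-000004]
determines the short-time energy E_n of the n-th frame x_n(m) of the captured audio signal, obtaining the short-time energy of each frame as the audio feature, where N is the frame length of the audio signal and n is a positive integer. Generally, the energy of a human voice is greater than that of noise, so the per-frame short-time energy can be used to distinguish voice from noise in the audio signal. This helps the server filter noise out of the received audio data packets according to the per-frame energy, improving the efficiency of audio data processing.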
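A minimal sketch of per-frame short-time energy follows. It assumes the usual definition, the sum of squared samples of frame x_n(m) over its N samples; since the patent's equation is only an image reference here, this is an assumed form, and the helper names are hypothetical.

```python
def short_time_energy(frame):
    """Sum of squared samples of one frame."""
    return sum(sample * sample for sample in frame)


def frame_energies(samples, frame_len):
    """Energy of each complete, non-overlapping frame; trailing samples are dropped."""
    return [
        short_time_energy(samples[start:start + frame_len])
        for start in range(0, len(samples) - frame_len + 1, frame_len)
    ]
```

Frames dominated by speech would typically show markedly larger values than frames containing only background noise.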
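When the audio features include the short-time autocorrelation function, the sender terminal may capture the audio signal and extract the audio features as follows: the sender terminal captures the audio signal and, according to
[Equation image: PCTCN2021095677-appb-000005]
[Equation image: PCTCN2021095677-appb-000006]
calculates the short-time autocorrelation function of the audio signal as its audio feature, where N is the frame length of the audio signal, n is a positive integer, w denotes the window function, and w'(m) denotes the windowed audio frame. The calculated short-time autocorrelation function measures the similarity of the signal's own time waveform, which helps the server detect similarity characteristics of the audio and thus improves the efficiency of audio data processing.
When the audio features include the short-time average magnitude difference, the sender terminal may capture the audio signal and extract the audio features as follows: the sender terminal captures the audio signal and, according to
[Equation image: PCTCN2021095677-appb-000007]
[Equation image: PCTCN2021095677-appb-000008]
calculates the short-time average magnitude difference of the audio signal as its audio feature, where k = 0, 1, ..., N-1. The calculated short-time average magnitude difference can be used to measure changes in audio amplitude.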
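The two lag-based features above can be sketched with their common textbook forms. Both are assumptions standing in for the patent's image-only equations: the autocorrelation sums products of samples k apart, and the magnitude difference sums absolute differences of samples k apart. The sketch operates on an already windowed frame and omits any normalization; function names are hypothetical.

```python
def short_time_autocorrelation(frame, lag):
    """Sum of frame[m] * frame[m + lag] over all valid m (frame assumed already windowed)."""
    return sum(frame[m] * frame[m + lag] for m in range(len(frame) - lag))


def short_time_magnitude_difference(frame, lag):
    """Sum of |frame[m + lag] - frame[m]| over all valid m (unnormalized)."""
    return sum(abs(frame[m + lag] - frame[m]) for m in range(len(frame) - lag))
```

For a periodic frame, the autocorrelation peaks and the magnitude difference dips at lags equal to the period, which is why both are widely used as similarity measures.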
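In addition, the audio features corresponding to the audio signal may also include features such as the spectrogram, short-time power spectral density, spectral entropy, fundamental frequency, and formants, which is not limited in the embodiments of the present application.
Implementing this optional embodiment therefore allows the sender terminal to perform feature extraction on the audio signal, which helps the server select channel signals according to the feature extraction result, so that both the real-time output and the quality of the audio in a multi-person conference can be ensured.
In the embodiments of the present application, the preset condition used when selecting the target audio data packets includes a preset energy amplitude and/or a preset signal-to-noise ratio. The target audio data packets whose audio features meet the preset condition may therefore be selected from the obtained audio data packets as follows: if at least one energy amplitude greater than the preset energy amplitude is detected in the audio features, the audio data packet to which the corresponding audio code stream belongs is determined as a target audio data packet; and/or, if at least one signal-to-noise ratio greater than the preset signal-to-noise ratio is detected in the audio features, the audio data packet to which the corresponding audio code stream belongs is determined as a target audio data packet.
For example, each audio frame in the audio code stream corresponds to one energy amplitude, which represents the short-time energy of that frame. The signal-to-noise ratio is the ratio of the average power of the audio signal to the average power of the noise, that is: SNR (dB) = 10 * log10(S/N).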
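The screening rule above can be sketched as follows. The sketch treats the "and/or" in the text as a logical OR over whichever thresholds are configured, which is one possible reading; the function and parameter names are hypothetical, and the per-frame energies and SNR values are assumed to be carried in the packet's audio features.

```python
import math


def snr_db(signal_power, noise_power):
    """Signal-to-noise ratio in dB: 10 * log10(S / N)."""
    return 10.0 * math.log10(signal_power / noise_power)


def is_target_packet(energies, snrs_db, energy_threshold=None, snr_threshold_db=None):
    """True if any energy or any SNR value exceeds its configured threshold.

    A threshold left as None is treated as not configured.
    """
    energy_hit = energy_threshold is not None and any(e > energy_threshold for e in energies)
    snr_hit = snr_threshold_db is not None and any(s > snr_threshold_db for s in snrs_db)
    return energy_hit or snr_hit
```

A packet from a silent participant would fail both checks and be dropped before any loudness computation is attempted.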
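Implementing this optional embodiment therefore allows audio data packets to be screened by energy amplitude or signal-to-noise ratio, reducing the audio data packets that need to be transmitted and improving data transmission efficiency.
In the embodiments of the present application, the loudness corresponding to the target audio data packet may be obtained as follows: frame the audio code stream in the target audio data packet according to a preset duration to obtain multiple audio frames; process each audio frame with a preset window function to obtain multiple reference frames; calculate the power spectrum of each reference frame; and calculate the loudness corresponding to the target audio data packet from the power spectra.
It can be understood that, in the embodiments of the present application, the electronic device performing the packet loss retransmission method may calculate the loudness corresponding to the target audio data packet in the above manner. If that electronic device is a server, the server may calculate the loudness in the above manner; alternatively, the sender terminal may calculate the loudness in the above manner, and the server obtains the loudness directly from the sender terminal.
The audio frames all have the same duration (for example, 10 ms or 20 ms), and each reference frame has the same duration as its audio frame. In addition, because the fast Fourier transform (FFT) can only transform time-domain data of finite length, the time-domain signal must be truncated. Even for a periodic signal, if the truncated length is not an integer multiple of the period, the truncated signal is prone to spectral leakage. The present application therefore applies a preset window function that helps the time-domain audio signal satisfy the periodicity requirement of the Fourier transform, reducing leakage. The preset window function is a Hanning window, a Hamming window, a Blackman window, a Kaiser window, a triangular window, or a rectangular window.
As an optional implementation, the audio frames may be processed with the preset window function to obtain the reference frames as follows: determine the time-domain expression of each audio frame, and multiply it element-wise by the preset window function to obtain the reference frame corresponding to that audio frame.
As an optional implementation, the power spectrum of each reference frame may be calculated by performing a fast Fourier transform on the reference frames to determine their respective power spectra.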
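The framing, windowing, and power-spectrum steps can be sketched end to end. The assumptions are labeled: the sketch works on already decoded PCM samples rather than on the encoded code stream, uses a Hanning window from the listed options, and substitutes a direct DFT for the FFT to stay dependency-free (it yields the same values, only more slowly). All function names are hypothetical.

```python
import cmath
import math


def split_into_frames(samples, frame_len):
    """Split samples into complete, non-overlapping frames of frame_len samples."""
    return [samples[i:i + frame_len] for i in range(0, len(samples) - frame_len + 1, frame_len)]


def hann_window(length):
    """Symmetric Hanning window coefficients."""
    if length == 1:
        return [1.0]
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / (length - 1)) for n in range(length)]


def apply_window(frame, window):
    """Element-wise product of an audio frame and the window (the 'reference frame')."""
    return [sample * weight for sample, weight in zip(frame, window)]


def power_spectrum(frame):
    """Squared DFT magnitude for bins 0..N/2, computed directly in O(N^2)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(frame[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
        spectrum.append(abs(acc) ** 2)
    return spectrum
```

At a 16 kHz sampling rate, a 20 ms frame would be 320 samples, so `split_into_frames(samples, 320)` would produce the audio frames referred to in the text.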
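Implementing this optional embodiment therefore allows the loudness of the target audio data packet to be calculated. The loudness can serve as the data retransmission condition: when the loudness is low, the server may refrain from retransmitting the lost target audio data packet, reducing the occupation of network resources.
In the embodiments of the present application, one possible way to calculate the loudness of the target audio data packet from the power spectra is as follows: calculate the per-bin loudness of each frequency bin in the power spectrum from the energy amplitude of that bin; calculate the loudness weight of each frequency bin from the per-bin loudness; calculate the weighted sum of the energy amplitudes of the frequency bins and their loudness weights as the loudness value of the reference frame corresponding to the power spectrum; and determine the sum of the loudness values of the reference frames as the loudness corresponding to the target audio data packet.
Here, a frequency bin is the index of a fixed frequency and uniquely identifies that frequency.
As an optional implementation, the per-bin loudness may be calculated from the energy amplitudes as follows: determine the absolute value P(i, j) of the energy amplitude of each frequency bin in the power spectrum, where i is the frame index, j = 0 to K-1 is the frequency bin index, and K is the total number of frequency bins.
As an optional implementation, the loudness weight of each frequency bin may be calculated from the per-bin loudness as follows: calculate the loudness weight cof(freq) of each frequency bin freq in the power spectrum based on the following formulas, where
[Equation image: PCTCN2021095677-appb-000009]
[Equation image: PCTCN2021095677-appb-000010]
[Equation image: PCTCN2021095677-appb-000011]
[Equation image: PCTCN2021095677-appb-000012]
the loudness of freq is loud = 4.2 + afy * (dB - cfy) / (1 + bfy * (dB - cfy));
[Equation image: PCTCN2021095677-appb-000013]
Here, freq is the frequency value of the frequency bin whose perceptual coefficient is to be calculated (for example, the center frequency of the band corresponding to the bin); ff, af, bf, and cf are data from the equal-loudness contour data table published in BS 3383; loud denotes the loudness of the frequency bin freq; and cof(freq) denotes the perceptual coefficient corresponding to the frequency bin freq.
Refer to FIG. 5 together with the expression for cof(freq). FIG. 5 schematically shows a loudness weight curve according to an embodiment of the present application. In the curve shown in FIG. 5, the horizontal axis represents frequency and the vertical axis represents the loudness weight. When the loudness of a frequency bin is known, the weight corresponding to that bin can be determined from the curve shown in FIG. 5.
As an optional implementation, the weighted sum may be calculated as follows: the weighted sum of the energy amplitude of each frequency bin in the power spectrum and the loudness weight of that bin is taken as the loudness value EP(i) of the reference frame corresponding to the power spectrum, where i is the frame index and k is the frequency bin index.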
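The weighted sum described above can be read as EP(i) = sum over k of P(i, k) * cof(k), with the packet loudness being the sum of EP(i) over all frames i. The sketch below implements that reading. The weights are taken as given inputs, because the BS 3383 table values and the patent's interpolation equations are not recoverable from the text, and the function names are hypothetical.

```python
def frame_loudness(power_spectrum, weights):
    """Weighted sum of per-bin energy magnitudes for one reference frame."""
    if len(power_spectrum) != len(weights):
        raise ValueError("power spectrum and weights must have the same number of bins")
    return sum(abs(p) * w for p, w in zip(power_spectrum, weights))


def packet_loudness(frame_spectra, weights):
    """Sum of the frame loudness values over all reference frames of a packet."""
    return sum(frame_loudness(spectrum, weights) for spectrum in frame_spectra)
```

Bins where the ear is less sensitive receive smaller weights, so energy concentrated there contributes little to the packet's loudness.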
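Implementing this optional embodiment therefore allows the loudness of the target audio data packet, which serves as the data retransmission condition, to be determined from the loudness of each frequency bin, improving the server's data retransmission efficiency.
In S420, when a packet loss state indicating that the target audio data packet has been lost is received, the target audio data packet is retransmitted according to the loudness corresponding to the target audio data packet.
In the embodiments of the present application, retransmission according to loudness may be implemented as follows: if the loudness corresponding to the target audio data packet is greater than a preset loudness, retransmit the target audio data packet to the receiver terminal, so that the receiver terminal decodes and outputs the target audio data packets received before and after the retransmission; if the loudness corresponding to the target audio data packet is less than the preset loudness, do not retransmit the target audio data packet to the receiver terminal.
It should be noted that loudness is a subjective quantity describing how loud a sound is perceived to be, and its unit is the sone. A 1000 Hz pure tone at a sound pressure level of 40 dB has a loudness of 1 sone; a sound of 2 sones is twice as loud as a 40-phon sound, and a sound of 4 sones is four times as loud. In other words, each 10 dB increase in sound pressure level doubles the loudness, and the ear's perception of loudness varies with sound pressure level. Refer to FIG. 6, which schematically shows acoustic equal-loudness contours according to an embodiment of the present application. The contours in FIG. 6 cover several loudness levels (100 phon, 80 phon, 60 phon, 40 phon, 20 phon, and the threshold of hearing); the horizontal axis represents sound frequency (Frequency) and the vertical axis represents sound pressure level (Sound Pressure Level). As FIG. 6 shows, an equal-loudness contour describes the relationship between sound pressure level and frequency under equal-loudness conditions. In the low and mid frequencies (below 1 kHz), the lower the frequency, the greater the sound pressure (energy) required for equal loudness; that is, the greater the sound energy, the more consistent the auditory perception of the human ear. In the mid and high frequencies (above 1 kHz), different frequency bands correspond to different auditory perception characteristics. As FIG. 6 indicates, screening audio data packets by loudness filters out sounds that the human ear perceives only weakly, which improves data transmission.
As an optional implementation, if the loudness corresponding to the target audio data packet is greater than the preset loudness, the target audio data packet may be retransmitted to the receiver terminal as follows: check whether the loudness corresponding to the target audio data packet is greater than the preset loudness C; if so, enable the data retransmission function (that is, set the enable switch ArqEnable = 1) and retransmit the target audio data packet to the receiver terminal through the data retransmission function; if not, disable the data retransmission function (that is, set the enable switch ArqEnable = 0). In other words, the target audio data packet may be retransmitted to the receiver terminal according to the following expression (reconstructed from the description above; original equation image PCTCN2021095677-appb-000014):
$$\mathrm{ArqEnable} = \begin{cases} 1, & \text{loudness} > C \\ 0, & \text{otherwise} \end{cases}$$
Implementing this optional embodiment therefore allows audio data packets to be screened by loudness, reducing the network resources occupied by sounds that the human ear perceives only weakly and thus improving data transmission efficiency. In addition, because the present application retransmits only the lost data packets, it can solve the prior-art problem of low transmission efficiency caused by having to retransmit all data of the entire data group on every retransmission.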
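The loudness-gated retransmission decision can be sketched as below. `arq_enable` mirrors the ArqEnable switch named in the text; the surrounding helper and the dictionary of per-packet loudness values are hypothetical, and a loudness exactly equal to the preset value is treated as "do not retransmit", following the "greater than" wording.

```python
def arq_enable(loudness, preset_loudness):
    """Return 1 to enable retransmission when loudness exceeds the preset value C, else 0."""
    return 1 if loudness > preset_loudness else 0


def packets_to_retransmit(lost_seqs, loudness_by_seq, preset_loudness):
    """Keep only the lost sequence numbers whose packet loudness passes the gate."""
    return [
        seq for seq in lost_seqs
        if arq_enable(loudness_by_seq[seq], preset_loudness) == 1
    ]
```

In use, the server would apply this to the sequence numbers reported as lost by the receiver terminal and resend only the packets returned.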
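In the embodiments of the present application, the receiver terminal may decode and output the target audio data packets received before and after the retransmission as follows: the receiver terminal decodes these target audio data packets to obtain multiple audio signals to be output, and then mixes the audio signals to be output to obtain a mixed signal and plays it.
As an optional implementation, before the receiver terminal mixes the audio signals to be output, the following steps may also be included: the receiver terminal normalizes the format of the audio signals to be output, so that they share a uniform format; converts them to a preset sampling rate (for example, 16 kHz, 32 kHz, 44.1 kHz, or 48 kHz); and checks the consistency of their bit depth (Bit-Depth) or sample format (Sample Format). If the bit depths or sample formats are inconsistent, the signals are normalized accordingly so that every sample is carried by the same number of bits. The receiver terminal then checks whether the channel layouts of the signals (for example, mono or stereo) are consistent; if not, it outputs a prompt message indicating the inconsistency, and if so, it proceeds to mix the audio signals to be output. Optionally, before mixing, the receiver terminal may also apply echo cancellation, noise suppression, silence detection, and similar processing to the audio signals to be output, which is not limited in the embodiments of the present application.
As an optional implementation, the receiver terminal may mix the audio signals to be output as follows: the receiver terminal balances the volume of the audio signals according to their respective energy amplitudes and mixes the balanced results to obtain the mixed signal.
As an optional implementation, after the receiver terminal mixes the audio signals to be output and obtains the mixed signal, the following step may also be included: the receiver terminal performs overflow detection on the mixed signal and, if overflow exists, applies overflow handling or smoothing to the overflowing samples before playing the result.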
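A minimal sketch of mixing with overflow handling follows. It assumes the signals have already been normalized to the same sampling rate, 16-bit sample format, channel layout, and length, as the text requires. It sums samples and clamps (saturates) any overflow; the energy-based volume balancing and the smoothing option mentioned in the text are omitted, and the function name is hypothetical.

```python
INT16_MIN = -32768
INT16_MAX = 32767


def mix_int16(signals):
    """Mix equal-length 16-bit PCM signals by summation, clamping overflowing samples."""
    if not signals:
        return []
    length = len(signals[0])
    if any(len(signal) != length for signal in signals):
        raise ValueError("all signals must have the same length")
    mixed = []
    for i in range(length):
        total = sum(signal[i] for signal in signals)
        # Overflow handling: saturate instead of letting the value wrap around.
        mixed.append(max(INT16_MIN, min(INT16_MAX, total)))
    return mixed
```

Saturation avoids the loud clicks that integer wrap-around would cause, at the cost of some distortion on peaks; a smoothing or gain-reduction stage would be a gentler alternative.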
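Implementing this optional embodiment therefore allows the receiver terminal to mix and output the audio data packets once it has received them in full. The output retains the important audio content of the conference and discards the audio content that the human ear perceives only weakly, which can improve the message forwarding efficiency of multi-person conferences, ensure the real-time output of audio signals, and thus improve the user experience.
Refer to FIG. 7, which schematically shows a sequence diagram of a packet loss retransmission method according to an embodiment of the present application. As shown in FIG. 7, the method includes S700 to S780, where:
In S700, the sender terminal sends multiple audio data packets to the server.
The sender terminal may perform feature extraction on the captured audio signal to obtain audio features, encode the audio signal to obtain the corresponding audio code stream, and pack the loudness corresponding to the audio data packet, the audio code stream, and the audio features into an audio data packet. The resulting audio data packets are then sent to the server.
In S710, after receiving the audio data packets, the server may perform packet loss detection.
In S720, if the packet loss detection result includes a lost state, the server feeds the lost state back to the sender terminal.
In S730, the sender terminal retransmits data to the server for the lost state.
In S740, the server forwards the target audio data packets determined from the multiple audio data packets to the receiver terminal.
The server may update the multiple audio data packets according to the retransmitted data packets, select from them the target audio data packets whose audio features meet the preset condition, and forward the target audio data packets to the receiver terminal.
In S750, after receiving the target audio data packets, the receiver terminal may perform packet loss detection.
In S760, if the packet loss detection result includes a lost state, the receiver terminal feeds the lost state of the target audio data packet back to the server.
In S770, if the loudness of the target audio data packet is greater than the preset loudness, the server retransmits the target audio data packet to the receiver terminal.
In S780, the receiver terminal decodes and outputs the target audio data packets received before and after the retransmission.
Optionally, the audio data packet may not include the loudness; after the sender terminal sends the audio data packet to the server, the server may calculate the loudness corresponding to the audio signal from the audio features in the audio data packet.
Implementing the method shown in FIG. 7 therefore enables the data retransmission function for audio data that the human ear perceives strongly and leaves it disabled for audio data that the human ear perceives only weakly, safeguarding data transmission quality. This ensures that the transmission quality of the main sound sources in a multi-party network call is protected as far as possible, avoids unnecessary consumption of network bandwidth, and reduces the increase in call latency, and the resulting poor call experience, that unscreened packet loss retransmission would cause.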
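This example implementation also provides a packet loss retransmission system 800. Referring to FIG. 8, the system includes a sender terminal 801, a server 802, and a receiver terminal 803, where:
the sender terminal 801 is configured to send a target audio data packet to the server 802;
the server 802 is configured to obtain the loudness corresponding to the target audio data packet;
the receiver terminal 803 is configured to receive the target audio data packet and to send to the server 802 a packet loss state indicating that the target audio data packet has been lost; and
the server 802 is further configured to, on receiving the packet loss state, retransmit the target audio data packet to the receiver terminal 803 according to the loudness corresponding to the target audio data packet.
By implementing the system shown in FIG. 8, the loudness corresponding to an audio data packet can serve as the data retransmission condition, alleviating the problem of long data retransmission times and improving data transmission efficiency. In addition, the target audio data packet can be retransmitted selectively; compared with the prior art, which retransmits all data packets of the transmission that contained the target audio data packet, this reduces the amount of retransmitted data and the occupation of network resources.
Refer to FIG. 9, which schematically shows a structural block diagram of a packet loss retransmission system 900 according to another embodiment of the present application. The block diagram in FIG. 9 includes a sender terminal 910, a server 920, and a receiver terminal 930. The sender terminal 910 includes a feature extraction module 911, a speech encoding module 912, and a data retransmission module 913; the server 920 includes a packet loss detection module 921, an audio routing module 922, a data retransmission module 923, and a perception analysis module 924; and the receiver terminal 930 includes a packet loss detection module 931, a speech decoding module 932, a mixing module 933, and a playback module 934.
The feature extraction module 911 can capture an audio signal, perform feature extraction on it to obtain audio features, and send the captured audio signal to the speech encoding module 912. The speech encoding module 912 can encode the audio signal to obtain the corresponding audio code stream and send the audio code stream to the data retransmission module 913. The data retransmission module 913 can pack each audio code stream with its matching audio features into an audio data packet, which includes the loudness corresponding to the audio data packet, the audio code stream, and the audio features corresponding to the audio code stream. After packing multiple audio data packets, it forwards them to the server 920 as one data group, so that the packet loss detection module 921 in the server 920 performs packet loss detection on the received data group.
If the packet loss detection result includes an audio data packet in the lost state, the packet loss detection module 921 feeds the lost state back to the data retransmission module 913, so that the data retransmission module 913 retransmits data for that lost state. When the packet loss detection module 921 receives the retransmitted audio data packet and detects that no audio data packet in the data group is in the lost state, it updates the data group with the retransmitted audio data packet and sends the updated data group to the audio routing module 922. The audio routing module 922 can then screen the audio data packets in the data group according to the audio features of each packet to obtain the target audio data packets, where the maximum energy amplitude of a target audio data packet is greater than that of the other audio data packets in the data group, or the similarity between the energy spectrum distribution of a target audio data packet and a preset energy spectrum distribution is greater than that of the other audio data packets in the data group. The audio routing module 922 can then send the target audio data packets to the perception analysis module 924 and the data retransmission module 923, so that the data retransmission module 923 forwards them to the receiver terminal 930. The packet loss detection module 931 in the receiver terminal 930 then performs packet loss detection on the target audio data packets; there may be one or more target audio data packets.
If the packet loss detection result includes a target audio data packet in the lost state, the packet loss detection module 931 feeds the lost state back to the data retransmission module 923. If the loudness of the target audio data packet is greater than the preset loudness, the data retransmission module 923 can retransmit data for that lost state; there may be one or more target audio data packets in the lost state. When the packet loss detection module 931 receives the retransmitted target audio data packet and detects that none of the target audio data packets is in the lost state, it updates the set of target audio data packets with the retransmitted packet and sends the updated set to the speech decoding module 932. The speech decoding module 932 can decode the audio code stream in each target audio data packet into an audio signal and send the decoded audio signals to the mixing module 933, so that the mixing module 933 mixes them. The mixing module 933 can then send the mixing result to the playback module 934, so that the playback module 934 plays it.
By implementing the system shown in FIG. 9, the loudness corresponding to an audio data packet can serve as the data retransmission condition, alleviating the problem of long data retransmission times and improving data transmission efficiency. In addition, the target audio data packet can be retransmitted selectively; compared with the prior art, which retransmits all data packets of the transmission that contained the target audio data packet, this reduces the amount of retransmitted data and the occupation of network resources.
In this example embodiment, there may be multiple sender terminals and multiple receiver terminals, and the server may be a server cluster. Refer to FIG. 10, which schematically shows a structural block diagram of a packet loss retransmission system 1000 according to yet another embodiment of the present application. As shown in FIG. 10, the system includes sender terminals 1011, 1012, ..., 101n, a server cluster 1030, and receiver terminals 1021, 1022, ..., 102n, where n is a positive integer greater than or equal to 3.
As FIG. 10 shows, the server cluster 1030 in the present application can receive data groups sent by at least one of the sender terminals 1011, 1012, ..., 101n. Each data group may include one or more audio data packets. If an audio data packet meets the preset condition, the server cluster 1030 can determine it to be a target audio data packet and forward it to the receiver terminals 1021, 1022, ..., 102n. When a packet loss state indicating that the target audio data packet has been lost is received from any one of the receiver terminals 1021, 1022, ..., 102n, the target audio data packet is retransmitted according to its loudness.
By implementing the system shown in FIG. 10, the loudness corresponding to an audio data packet can serve as the data retransmission condition, alleviating the problem of long data retransmission times and improving data transmission efficiency. In addition, the target audio data packet can be retransmitted selectively; compared with the prior art, which retransmits all data packets of the transmission that contained the target audio data packet, this reduces the amount of retransmitted data and the occupation of network resources.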
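This example implementation also provides a packet loss retransmission apparatus. Referring to FIG. 11, the packet loss retransmission apparatus 1100 may include:
a loudness obtaining unit 1101, configured to obtain the loudness corresponding to a target audio data packet; and
a data sending unit 1102, configured to, on receiving a packet loss state indicating that the target audio data packet has been lost, retransmit the target audio data packet according to the loudness corresponding to the target audio data packet.
By implementing the apparatus shown in FIG. 11, the loudness corresponding to an audio data packet can serve as the data retransmission condition, alleviating the problem of long data retransmission times and improving data transmission efficiency. In addition, the target audio data packet can be retransmitted selectively; compared with the prior art, which retransmits all data packets of the transmission that contained the target audio data packet, this reduces the amount of retransmitted data and the occupation of network resources.
In an exemplary embodiment of the present application, the apparatus further includes a data packet screening unit (not shown), where:
the data packet screening unit is configured to, before the loudness obtaining unit 1101 obtains the loudness corresponding to the target audio data packet, select from the obtained audio data packets the target audio data packets whose audio features meet the preset condition.
Implementing this optional embodiment therefore allows the received audio data packets to be screened by audio features, reducing the amount of forwarded data, lowering the consumption of network resources, and improving data transmission efficiency.
In an exemplary embodiment of the present application, the apparatus further includes a packet loss detection unit (not shown) and a packet loss state feedback unit (not shown), where:
the packet loss detection unit is configured to perform packet loss detection on the received audio data packets before the data packet screening unit selects from them the target audio data packets whose audio features meet the preset condition; and
the packet loss state feedback unit is configured to, if the packet loss detection result includes a lost state, feed the lost state back to the sender terminal so that the sender terminal retransmits data for the lost state.
An audio data packet includes the loudness corresponding to the packet, an audio code stream, and the audio features corresponding to the audio code stream; the audio features corresponding to the audio code stream include the energy distribution of the audio code stream and the energy amplitude of each frequency bin in the audio code stream.
Implementing this optional embodiment therefore allows the sender terminal to be triggered to resend data when a lost state is detected, which improves the integrity of the transmitted data.
In an exemplary embodiment of the present application, the apparatus further includes a data packet update unit (not shown), where:
the data packet update unit is configured to update the multiple audio data packets according to the retransmitted data packets, after the sender terminal retransmits data for the lost state and before the data packet screening unit selects from the obtained audio data packets the target audio data packets whose audio features meet the preset condition.
Implementing this optional embodiment therefore ensures, through the packet update, the integrity of the data sent to the receiver terminal.
In an exemplary embodiment of the present application, the received audio data packets are sent by the sender terminal;
the audio data packets are obtained by the sender terminal packing the audio code stream and the audio features; the audio features are obtained by the sender terminal capturing an audio signal and performing feature extraction on it; and the audio code stream is obtained by the sender terminal encoding the audio signal.
Implementing this optional embodiment therefore allows the sender terminal to perform feature extraction on the audio signal, which helps the server select channel signals according to the feature extraction result, so that both the real-time output and the quality of the audio in a multi-person conference can be ensured.
In an exemplary embodiment of the present application, the preset condition includes a preset energy amplitude and/or a preset signal-to-noise ratio, and the data packet screening unit selecting from the obtained audio data packets the target audio data packets whose audio features meet the preset condition includes:
if at least one energy amplitude greater than the preset energy amplitude is detected in the audio features, determining the audio data packet to which the corresponding audio code stream belongs as a target audio data packet; and/or,
if at least one signal-to-noise ratio greater than the preset signal-to-noise ratio is detected in the audio features, determining the audio data packet to which the corresponding audio code stream belongs as a target audio data packet.
Implementing this optional embodiment therefore allows audio data packets to be screened by energy amplitude or signal-to-noise ratio, reducing the audio data packets that need to be transmitted and improving data transmission efficiency.
In an exemplary embodiment of the present application, the loudness obtaining unit 1101 is configured to:
frame the audio code stream in the target audio data packet according to a preset duration to obtain multiple audio frames;
process each audio frame with a preset window function to obtain multiple reference frames;
calculate the power spectrum of each reference frame; and
calculate the loudness corresponding to the target audio data packet from the power spectra.
The preset window function is a Hanning window, a Hamming window, a Blackman window, a Kaiser window, a triangular window, or a rectangular window.
Implementing this optional embodiment therefore allows the audio loudness of the target audio data packet to be calculated. The audio loudness can serve as the data retransmission condition: when the audio loudness is low, the server may refrain from retransmitting the lost target audio data packet, reducing the occupation of network resources.
In an exemplary embodiment of the present application, the loudness obtaining unit 1101 is specifically configured to:
calculate the per-bin loudness of each frequency bin in the power spectrum from the energy amplitude of that bin;
calculate the loudness weight of each frequency bin in the power spectrum from the per-bin loudness;
calculate the weighted sum of the energy amplitudes of the frequency bins in the power spectrum and their loudness weights as the loudness value of the reference frame corresponding to the power spectrum; and
determine the sum of the loudness values of the reference frames as the loudness corresponding to the target audio data packet.
Implementing this optional embodiment therefore allows the loudness of the target audio data packet, which serves as the data retransmission condition, to be determined from the loudness of each frequency bin, improving the server's data retransmission efficiency.
In an exemplary embodiment of the present application, the data sending unit 1102 is configured to:
if the loudness corresponding to the target audio data packet is greater than the preset loudness, retransmit the target audio data packet to the receiver terminal, so that the receiver terminal decodes and outputs the target audio data packets received before and after the retransmission.
Implementing this optional embodiment therefore allows audio data packets to be screened by loudness, reducing the network resources occupied by sounds that the human ear perceives only weakly and thus improving data transmission efficiency. In addition, because the present application retransmits only the lost data packets, it can solve the prior-art problem of low transmission efficiency caused by having to retransmit all data of the entire data group on every retransmission.
In an exemplary embodiment of the present application, the receiver terminal decodes and outputs the target audio data packets received before and after the retransmission specifically as follows:
the receiver terminal decodes the target audio data packets received before and after the retransmission to obtain multiple audio signals to be output; and
the receiver terminal mixes the audio signals to be output to obtain a mixed signal and plays it.
Implementing this optional embodiment therefore allows the receiver terminal to mix and output the audio data packets once it has received them in full. The output retains the important audio content of the conference and discards the audio content that the human ear perceives only weakly, which can improve the message forwarding efficiency of multi-person conferences, ensure the real-time output of audio signals, and thus improve the user experience.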
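It should be noted that although several modules or units of the device for performing actions are mentioned in the detailed description above, this division is not mandatory. In fact, according to the implementations of the present application, the features and functions of two or more modules or units described above may be embodied in a single module or unit. Conversely, the features and functions of one module or unit described above may be further divided among multiple modules or units.
Because each functional module of the packet loss retransmission apparatus in the example embodiment corresponds to a step of the example embodiment of the packet loss retransmission method described above, for details not disclosed in the apparatus embodiment, refer to the embodiment of the packet loss retransmission method described above.
In another aspect, the present application further provides a computer-readable medium. The computer-readable medium may be included in the electronic device described in the above embodiments, or may exist alone without being assembled into the electronic device. The computer-readable medium carries one or more programs that, when executed by the electronic device, cause the electronic device to implement the methods described in the above embodiments.
It should be noted that the computer-readable medium described in the present application may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present application, a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device. In the present application, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transport a program for use by or in conjunction with an instruction execution system, apparatus, or device. Program code contained on a computer-readable medium may be transmitted over any suitable medium, including but not limited to wireless, wired, optical cable, RF, or any suitable combination of the above.
The flowcharts and block diagrams in the drawings illustrate the architecture, functions, and operations of possible implementations of systems, methods, and computer program products according to various embodiments of the present application. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of code that contains one or more executable instructions for implementing the specified logical functions. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams or flowcharts, and combinations of such blocks, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units described in the embodiments of the present application may be implemented in software or in hardware, and the described units may also be provided in a processor. In some cases, the names of these units do not constitute a limitation on the units themselves.
Other implementations of the present application will readily occur to those skilled in the art after considering the specification and practicing the invention disclosed herein. The present application is intended to cover any variations, uses, or adaptations of the present application that follow its general principles and include common general knowledge or customary technical means in the art that are not disclosed herein. The specification and embodiments are to be regarded as exemplary only, with the true scope and spirit of the present application being indicated by the following claims.
It should be understood that the present application is not limited to the precise structures described above and shown in the drawings, and that various modifications and changes may be made without departing from its scope. The scope of the present application is limited only by the appended claims.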

Claims (17)

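  1. A packet loss retransmission method, performed by an electronic device, the method comprising:
    obtaining the loudness corresponding to a target audio data packet; and
    when a packet loss state indicating that the target audio data packet has been lost is received, retransmitting the target audio data packet according to the loudness corresponding to the target audio data packet.
  2. The method according to claim 1, wherein retransmitting the target audio data packet according to the loudness corresponding to the target audio data packet comprises:
    if the loudness corresponding to the target audio data packet is greater than a preset loudness, retransmitting the target audio data packet to a receiver terminal, so that the receiver terminal decodes and outputs the target audio data packets received before and after the retransmission.
  3. The method according to claim 1, wherein obtaining the loudness corresponding to the target audio data packet comprises:
    framing the audio code stream in the target audio data packet according to a preset duration to obtain multiple audio frames;
    processing each of the multiple audio frames with a preset window function to obtain multiple reference frames;
    calculating the power spectrum of each of the multiple reference frames; and
    calculating the loudness corresponding to the target audio data packet from the power spectra.
  4. The method according to claim 3, wherein the preset window function is a Hanning window, a Hamming window, a Blackman window, a Kaiser window, a triangular window, or a rectangular window.
  5. The method according to claim 3, wherein calculating the loudness corresponding to the target audio data packet from the power spectra comprises:
    calculating the per-bin loudness of each frequency bin in the power spectrum from the energy amplitude of that frequency bin;
    calculating the loudness weight of each frequency bin in the power spectrum from the per-bin loudness;
    calculating the weighted sum of the energy amplitudes of the frequency bins in the power spectrum and the loudness weights of the frequency bins as the loudness value of the reference frame corresponding to the power spectrum; and
    determining the sum of the loudness values of the multiple reference frames as the loudness corresponding to the target audio data packet.
  6. The method according to claim 5, wherein the receiver terminal decodes and outputs the target audio data packets received before and after the retransmission specifically as follows:
    the receiver terminal decodes the target audio data packets received before and after the retransmission to obtain multiple audio signals to be output; and
    the receiver terminal mixes the multiple audio signals to be output to obtain a mixed signal and plays it.
  7. The method according to claim 1, wherein before obtaining the loudness corresponding to the target audio data packet, the method further comprises:
    selecting, from multiple obtained audio data packets, the target audio data packets whose audio features meet a preset condition.
  8. The method according to claim 7, wherein before selecting, from the multiple obtained audio data packets, the target audio data packets whose audio features meet the preset condition, the method further comprises:
    performing packet loss detection on the multiple received audio data packets; and
    if the packet loss detection result includes a lost state, feeding the lost state back to a sender terminal, so that the sender terminal retransmits data for the lost state.
  9. The method according to claim 8, wherein after the sender terminal retransmits data for the lost state and before selecting, from the multiple obtained audio data packets, the target audio data packets whose audio features meet the preset condition, the method further comprises:
    updating the multiple audio data packets according to the retransmitted data packets.
  10. The method according to claim 7, wherein the audio data packet includes the loudness corresponding to the audio data packet, an audio code stream, and the audio features corresponding to the audio code stream, and the audio features corresponding to the audio code stream include the energy distribution of the audio code stream and the energy amplitude of each frequency bin in the audio code stream.
  11. The method according to claim 10, wherein obtaining the multiple audio data packets comprises:
    receiving the multiple audio data packets sent by a sender terminal;
    wherein the multiple audio data packets are obtained by the sender terminal packing an audio code stream and audio features, the audio features are obtained by the sender terminal capturing an audio signal and performing feature extraction on the audio signal, and the audio code stream is obtained by the sender terminal encoding the audio signal.
  12. The method according to claim 10, wherein the preset condition includes a preset energy amplitude and/or a preset signal-to-noise ratio, and selecting, from the multiple obtained audio data packets, the target audio data packets whose audio features meet the preset condition comprises:
    if at least one energy amplitude greater than the preset energy amplitude is detected in the audio features, determining the audio data packet to which the audio code stream corresponding to the audio features belongs as the target audio data packet; and/or,
    if at least one signal-to-noise ratio greater than the preset signal-to-noise ratio is detected in the audio features, determining the audio data packet to which the audio code stream corresponding to the audio features belongs as the target audio data packet.
  13. A packet loss retransmission system, comprising a sender terminal, a server, and a receiver terminal, wherein:
    the sender terminal is configured to send a target audio data packet to the server;
    the server is configured to obtain the loudness corresponding to the target audio data packet;
    the receiver terminal is configured to receive the target audio data packet and to send to the server a packet loss state indicating that the target audio data packet has been lost; and
    the server is further configured to, on receiving the packet loss state, retransmit the target audio data packet to the receiver terminal according to the loudness corresponding to the target audio data packet.
  14. A packet loss retransmission apparatus, deployed on an electronic device, comprising:
    a loudness obtaining unit, configured to obtain the loudness corresponding to a target audio data packet; and
    a data sending unit, configured to, on receiving a packet loss state indicating that the target audio data packet has been lost, retransmit the target audio data packet according to the loudness corresponding to the target audio data packet.
  15. A computer-readable storage medium, storing a computer program that, when executed by a processor, implements the method according to any one of claims 1 to 12.
  16. An electronic device for packet loss retransmission, comprising:
    a processor; and
    a memory, configured to store executable instructions of the processor;
    wherein the processor is configured to perform the method according to any one of claims 1 to 12 by executing the executable instructions.
  17. A computer program product that, when executed, is used to perform the method according to any one of claims 1 to 12.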
PCT/CN2021/095677 2020-06-28 2021-05-25 Packet loss retransmission method, system, and apparatus, computer-readable storage medium, and device WO2022001494A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/730,061 US11908482B2 (en) 2020-06-28 2022-04-26 Packet loss retransmission method, system, and apparatus, computer-readable storage medium, and device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010601648.XA CN113936670A (zh) 2020-06-28 2020-06-28 Packet loss retransmission method, system, and apparatus, computer-readable storage medium, and device
CN202010601648.X 2020-06-28

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/730,061 Continuation US11908482B2 (en) 2020-06-28 2022-04-26 Packet loss retransmission method, system, and apparatus, computer-readable storage medium, and device

Publications (1)

Publication Number Publication Date
WO2022001494A1 true WO2022001494A1 (zh) 2022-01-06

Family

ID=79272647

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/095677 WO2022001494A1 (zh) 2020-06-28 2021-05-25 Packet loss retransmission method, system, and apparatus, computer-readable storage medium, and device

Country Status (3)

Country Link
US (1) US11908482B2 (zh)
CN (1) CN113936670A (zh)
WO (1) WO2022001494A1 (zh)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102137438A (zh) * 2010-01-25 2011-07-27 华为技术有限公司 Method and apparatus for allocating IP network resources
CN104105045A (zh) * 2013-04-08 2014-10-15 深圳富泰宏精密工业有限公司 Loudness detection method and system
CN106067847A (zh) * 2016-05-25 2016-11-02 腾讯科技(深圳)有限公司 Voice data transmission method and apparatus
CN106899380A (zh) * 2015-12-19 2017-06-27 联芯科技有限公司 VoLTE video call transmission method and system
CN111246312A (zh) * 2020-01-15 2020-06-05 安徽文香信息技术有限公司 Packet loss processing method and apparatus

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3893763B2 (ja) * 1998-08-17 2007-03-14 富士ゼロックス株式会社 Voice detection apparatus
US7110941B2 (en) * 2002-03-28 2006-09-19 Microsoft Corporation System and method for embedded audio coding with implicit auditory masking
US7546508B2 (en) * 2003-12-19 2009-06-09 Nokia Corporation Codec-assisted capacity enhancement of wireless VoIP
JP4172530B2 (ja) * 2005-09-02 2008-10-29 日本電気株式会社 Noise suppression method and apparatus, and computer program
JP2008216720A (ja) * 2007-03-06 2008-09-18 Nec Corp Signal processing method, apparatus, and program
US10043529B2 (en) * 2016-06-30 2018-08-07 Hisense Usa Corp. Audio quality improvement in multimedia systems
EP3688756B1 (en) * 2017-09-28 2022-11-09 Sony Europe B.V. Method and electronic device
JP6838588B2 (ja) * 2018-08-28 2021-03-03 横河電機株式会社 Voice analysis apparatus, voice analysis method, program, and recording medium

Also Published As

Publication number Publication date
US11908482B2 (en) 2024-02-20
CN113936670A (zh) 2022-01-14
US20220254354A1 (en) 2022-08-11

Similar Documents

Publication Publication Date Title
US11727946B2 (en) Method, apparatus, and system for processing audio data
WO2021196905A1 Voice signal dereverberation processing method and apparatus, computer device, and storage medium
US6889187B2 (en) Method and apparatus for improved voice activity detection in a packet voice network
US20050108004A1 (en) Voice activity detector based on spectral flatness of input signal
US8560307B2 (en) Systems, methods, and apparatus for context suppression using receivers
US8751221B2 (en) Communication apparatus for adjusting a voice signal
CN103077727A Method and device for voice quality monitoring and prompting
RU2312405C2 Method for machine evaluation of the quality of audio signals
US20090055169A1 (en) Voice encoding device, and voice encoding method
US8996389B2 (en) Artifact reduction in time compression
JP6073456B2 Speech enhancement device
US20230360666A1 (en) Voice signal detection method, terminal device and storage medium
WO2022166710A1 Speech enhancement method and apparatus, device, and storage medium
US8423357B2 (en) System and method for biometric acoustic noise reduction
WO2008138263A1 (fr) Method and device for generating comfort noise parameters
WO2022001494A1 Packet loss retransmission method, system, and apparatus, computer-readable storage medium, and device
Dantas et al. Comparing network performance of mobile voip solutions
US20230050519A1 (en) Speech enhancement method and apparatus, device, and storage medium
Hinrichs et al. A subjective and objective evaluation of a codec for the electrical stimulation patterns of cochlear implants
US20150317993A1 (en) Method and system to play background music along with voice on a cdma network
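CN111326166B Voice processing method and apparatus, computer-readable storage medium, and electronic device
CN113936669A Data transmission method, system, and apparatus, computer-readable storage medium, and device
JP4437011B2 Speech encoding device
WO2024021733A1 Audio signal processing method and apparatus, storage medium, and computer program product
CN109841222B Audio communication method, communication device, and storage medium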

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 21834317

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN EP: public notification in the EP bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 19.05.2023)

122 EP: PCT application non-entry in European phase

Ref document number: 21834317

Country of ref document: EP

Kind code of ref document: A1