WO2018081937A1 - Method, terminal, and storage medium for determining an encoding rate of audio and video data - Google Patents

Method, terminal, and storage medium for determining an encoding rate of audio and video data

Info

Publication number
WO2018081937A1
Authority
WO
WIPO (PCT)
Prior art keywords
video data
audio
terminal
rate
uplink
Prior art date
Application number
PCT/CN2016/104281
Other languages
English (en)
French (fr)
Inventor
裘涵宇
孙兵
刘远来
杨琪
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Priority to PCT/CN2016/104281 (WO2018081937A1)
Priority to CN201680080576.0A (CN108702352B)
Publication of WO2018081937A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/40 Network security protocols
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/24 Monitoring of processes or resources, e.g. monitoring of server load, available bandwidth, upstream requests

Definitions

  • the present invention relates to the field of communications technologies, and in particular, to a method, a terminal, and a storage medium for determining an encoding rate of audio and video data.
  • VoLTE Voice over LTE
  • LTE Long Term Evolution
  • IP Internet Protocol
  • VoIP Voice over Internet Protocol
  • IMS IP Multimedia Subsystem
  • VoBB Voice over Broadband
  • PSTN Public Switched Telephone Network
  • 3GPP 3rd Generation Partnership Project
  • VoLTE can provide high-quality audio and video calls, which can realize the unification of data and voice services under the same network.
  • the encoding rate of the voice data and the video data is usually fixed during transmission, and continuing to transmit the data at a fixed rate may cause uplink errors at the physical layer. On the one hand, these errors keep producing a high discard rate for uplink real-time transport protocol (RTP) audio and video data packets; on the other hand, the uplink grant is reduced and hybrid automatic repeat request (HARQ) retransmission occurs on the physical layer uplink.
  • RTP real-time transport protocol
  • the uplink transmission data accumulates in the Packet Data Convergence Protocol (PDCP) layer, and the delay increases. If the accumulation continues beyond the delay requirement of the QoS Class Identifier (QCI), the PDCP actively discards the buffered voice data or video data, which degrades the quality of the end-to-end voice or video call and harms the user experience.
  • PDCP Packet Data Convergence Protocol
  • QCI QoS Class Identifier
  • Embodiments of the present invention provide a method, a terminal, and a storage medium for determining an encoding rate of audio and video data, which are used to reduce discarded audio and video data packets and accumulation of audio and video data during transmission, thereby improving audio and video communication quality.
  • an embodiment of the present invention provides a method for determining an encoding rate of audio and video data, where the method includes: detecting, by the terminal, uplink information, where the uplink information is information that characterizes a transmission attribute of the uplink audio and video data; and adjusting, by the terminal, the encoding rate of the audio and video data according to the detected first uplink information. It can be seen that, with the solution provided by the application, the encoding rate of the audio and video data can be adaptively adjusted based on the uplink transmission information of the audio and video data. Adjusting the encoding rate in this way mitigates the physical layer transmission error rate, or the problem that buffer accumulation on the terminal's uplink transmission path causes audio and video data packets to be actively discarded, thereby improving the quality of the multimedia call.
  • that the terminal adjusts the encoding rate of the audio and video data according to the obtained first uplink information includes: the terminal determines, according to a preset correspondence between uplink information and encoding rates of audio and video data, a first encoding rate corresponding to the first uplink information, and adjusts the encoding rate of the audio and video data to the first encoding rate.
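  • A preset correspondence of this kind can be pictured as a lookup table. The following is a minimal sketch, assuming the uplink information is an error rate expressed as a fraction; the range boundaries, the rate values, and the function name `first_encoding_rate` are illustrative assumptions and do not come from the application.

```python
from bisect import bisect_right

# Hypothetical correspondence table (not from the application):
# upper bounds of uplink error-rate ranges, and the encoding rate in kbit/s
# assigned to each range. There is one more rate than there are bounds.
ERROR_RATE_BOUNDS = [0.02, 0.10, 0.30]
ENCODING_RATES_KBPS = [23.85, 12.65, 8.85, 6.60]


def first_encoding_rate(uplink_error_rate: float) -> float:
    """Return the encoding rate that the table assigns to the detected value."""
    if not 0.0 <= uplink_error_rate <= 1.0:
        raise ValueError("error rate must be a fraction between 0 and 1")
    # bisect_right finds which range the detected value falls into.
    index = bisect_right(ERROR_RATE_BOUNDS, uplink_error_rate)
    return ENCODING_RATES_KBPS[index]
```

  • In this sketch the encoder would then simply be switched to the returned rate; how the switch is performed is outside the scope of the example.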
  • that the terminal adjusts the encoding rate of the audio and video data includes: the terminal compares the detected uplink information with a preset threshold; and the terminal adjusts the encoding rate of the audio and video data according to the current encoding rate of the audio and video data and a preset criterion corresponding to the comparison result.
  • by setting a threshold, the solution provided by the present application may compare the detected uplink information with the preset threshold and determine, according to the comparison result, an adjustment strategy for the current encoding rate of the audio and video data. For example, when the detected uplink information is greater than the preset threshold, the adjustment strategy corresponding to that comparison result may be to reduce the encoding rate of the audio and video data by 80% relative to the current encoding rate. That is, the target encoding rate is determined by an adjustment strategy defined relative to the current audio and video encoding rate, so that the adjustment is better adapted to the current uplink transmission environment, further mitigating the physical layer transmission error rate or the problem that buffer accumulation on the terminal's uplink transmission path causes audio and video data packets to be actively discarded.
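  • The threshold comparison with a relative adjustment can be sketched as follows. This is only an illustration under stated assumptions: the function name `adjust_relative`, the caller-supplied `scale` factor, and the `min_rate_kbps` floor are hypothetical and are not taken from the application.

```python
def adjust_relative(current_rate_kbps: float,
                    uplink_value: float,
                    threshold: float,
                    scale: float,
                    min_rate_kbps: float = 0.0) -> float:
    """Scale the current rate when the detected value exceeds the threshold.

    `scale` is the fraction of the current rate to keep (an assumption of
    this sketch); `min_rate_kbps` is an optional lower bound.
    """
    if not 0.0 < scale <= 1.0:
        raise ValueError("scale must be in (0, 1]")
    if uplink_value > threshold:
        # Target rate is defined relative to the current rate.
        return max(current_rate_kbps * scale, min_rate_kbps)
    # Comparison result does not call for a reduction: keep the rate.
    return current_rate_kbps
```

  • Because the target is derived from the current rate rather than from a fixed table, repeated calls under persistently poor conditions step the rate down gradually.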
  • the audio and video data is voice data.
  • the audio and video data is video data; and the coding rate of the video data may be adjusted by adjusting a video frame rate of the video data and/or a compression ratio of the video data.
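  • As a rough illustration of the two adjustment knobs named above, the sketch below assumes a simple model in which the video bit rate equals raw bits per frame times frame rate divided by compression ratio. The model, the policy of raising compression first and lowering frame rate second, and all names are assumptions of this example, not the application's method.

```python
def video_settings_for_rate(target_bps: float,
                            raw_bits_per_frame: float,
                            frame_rate: float,
                            compression_ratio: float,
                            max_compression_ratio: float,
                            min_frame_rate: float):
    """Return (frame_rate, compression_ratio) that lowers the modelled rate.

    Modelled rate = raw_bits_per_frame * frame_rate / compression_ratio.
    The sketch only ever reduces the rate: it never lowers the compression
    ratio or raises the frame rate.
    """
    # Compression ratio needed to hit the target at the current frame rate.
    needed_ratio = raw_bits_per_frame * frame_rate / target_bps
    if needed_ratio <= max_compression_ratio:
        return frame_rate, max(needed_ratio, compression_ratio)
    # Compression alone is not enough: cap the ratio and lower the frame rate.
    needed_fps = target_bps * max_compression_ratio / raw_bits_per_frame
    return max(needed_fps, min_frame_rate), max_compression_ratio
```

  • If the frame rate floor is reached, the modelled rate may still exceed the target; a real encoder would need a further fallback, which this sketch does not attempt.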
  • with the solution provided by the application, the encoding rate of the voice data or the video data can be adaptively adjusted based on the uplink transmission information, thereby mitigating the physical layer transmission error rate or the problem that buffer accumulation on the uplink transmission path of the terminal causes audio and video data packets to be actively discarded.
  • the uplink information includes a physical layer uplink transmission error rate. It can be seen that, with the solution provided by the application, the encoding rate of the audio and video data can be adaptively adjusted based on the physical layer uplink transmission error rate in the uplink transmission environment, thereby reducing the physical layer uplink transmission error rate and improving the quality of multimedia calls.
  • the uplink information includes a buffer on an uplink transmission path of the terminal. It can be seen that, with the solution provided by the application, the encoding rate of the audio and video data can be adaptively adjusted based on the buffer on the uplink transmission path of the terminal in the uplink transmission environment, thereby mitigating the problem that buffer accumulation on the uplink transmission path of the terminal causes audio and video data packets to be actively discarded, and improving the quality of multimedia calls.
  • the buffer on the uplink transmission path of the terminal includes any one or more of the following: an audio encoder/video encoder buffer, a Real-time Transport Protocol (RTP) layer buffer, a User Datagram Protocol (UDP)/Transmission Control Protocol (TCP) layer buffer, an Internet Protocol (IP) layer buffer, a Packet Data Convergence Protocol (PDCP) layer buffer, a Radio Link Control (RLC) layer buffer, and/or a Medium Access Control (MAC) layer buffer.
  • the uplink information includes a physical layer uplink transmission error rate and a buffer on the uplink transmission path of the terminal, and that the terminal adjusts the encoding rate of the audio and video data according to the uplink information includes: if the terminal determines, according to the buffer on the uplink transmission path of the terminal, that the encoding rate of the audio and video data needs to be reduced, the terminal lowers the encoding rate of the audio and video data; otherwise, the terminal adjusts the encoding rate of the audio and video data according to the physical layer uplink transmission error rate.
  • the encoding rate of the audio and video data can be adaptively adjusted based on both the buffer on the uplink transmission path of the terminal and the physical layer uplink transmission error rate in the uplink transmission environment, which avoids problems such as buffer accumulation on the uplink transmission path of the terminal causing audio and video data packets to be actively discarded, thereby improving the quality of multimedia calls.
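  • The priority between the two inputs (buffer first, error rate otherwise) can be sketched as below. The rate ladder, the thresholds, and the function name `next_rate_index` are hypothetical values chosen for illustration; the application does not specify them in this passage.

```python
# Hypothetical ladder of encoder rates in bit/s, lowest first.
RATE_LADDER_BPS = [6600, 8850, 12650, 23850]


def next_rate_index(index: int,
                    buffered_bytes: int,
                    error_rate: float,
                    *,
                    buffer_limit_bytes: int = 3000,
                    error_high: float = 0.10,
                    error_low: float = 0.02) -> int:
    """Pick the next position on the rate ladder.

    The buffer check takes priority: if the uplink buffer says "reduce",
    the rate is lowered regardless of the error rate. Only otherwise is
    the physical layer uplink error rate consulted.
    """
    lower = max(index - 1, 0)
    higher = min(index + 1, len(RATE_LADDER_BPS) - 1)
    if buffered_bytes > buffer_limit_bytes:
        return lower
    if error_rate > error_high:
        return lower
    if error_rate < error_low:
        return higher
    return index
```

  • For example, a full buffer on a clean link still steps the rate down, because accumulated data is already at risk of being discarded.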
  • the audio and video data is voice data; the uplink information includes a physical layer uplink transmission error rate, a buffer on the terminal's uplink transmission path, and Robust Header Compression (RoHC) switch information; and that the terminal adjusts the encoding rate of the audio and video data according to the uplink information includes: when the terminal determines that the RoHC switch information indicates that the RoHC switch is turned off, if the terminal determines, according to the physical layer uplink transmission error rate and/or the buffer on the terminal's uplink transmission path, that the encoding rate of the voice data needs to be reduced, the terminal lowers the encoding rate of the voice data to a rate that is not lower than a preset encoding rate.
  • the influence of the RoHC switch information on voice data encoding is further considered, so that, based on the RoHC switch information, the encoding rate can be kept no lower than the preset encoding rate when it is reduced, which ensures the quality of the voice call after the rate adjustment.
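  • A minimal sketch of the rate floor follows. The source only describes the case in which the RoHC switch is off; the behaviour when RoHC is on (falling back to a codec minimum) is an assumption of this example, as are the function and parameter names.

```python
def reduce_voice_rate(current_bps: int,
                      step_bps: int,
                      rohc_enabled: bool,
                      preset_floor_bps: int,
                      codec_min_bps: int) -> int:
    """Lower the voice encoding rate by one step, respecting a floor.

    With RoHC off, the rate is not reduced below `preset_floor_bps`.
    With RoHC on, this sketch assumes only the codec minimum applies.
    """
    floor = codec_min_bps if rohc_enabled else preset_floor_bps
    reduced = max(current_bps - step_bps, floor)
    # Never raise the rate in a "reduce" operation.
    return min(current_bps, reduced)
```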
  • the audio and video data is voice data, and that the terminal adjusts the encoding rate of the audio and video data according to the obtained first uplink information includes: the terminal determines, according to a preset correspondence between uplink information and coding modes of the voice data, a first coding mode corresponding to the first uplink information, and adjusts the encoding rate of the audio and video data to the encoding rate corresponding to the first coding mode.
  • the coding rate can be adaptively adjusted based on the specific situation of the audio and video data uplink transmission environment (by setting at least two thresholds to achieve differentiation), thereby reducing the discarded voice data packets and accumulation of voice data, and improving the quality of the multimedia call.
  • an embodiment of the present invention provides a terminal, where the terminal includes a memory and a processor; the memory is coupled to the processor; the memory is configured to store computer executable program code, and the program code includes instructions; when the processor executes the instructions, the instructions cause the terminal to perform the method for determining an audio and video data encoding rate according to the first aspect or any possible implementation of the first aspect.
  • the embodiment of the present invention provides a terminal, where the terminal includes a detection module and an adjustment module. The detection module is configured to detect uplink information, where the uplink information is information that characterizes a transmission attribute of the uplink audio and video data. The adjustment module is configured to adjust the encoding rate of the audio and video data according to the first uplink information detected by the detection module.
  • for the principle by which the terminal solves the problem and for its beneficial effects, refer to the first aspect and the possible implementations of the method for determining the encoding rate of audio and video data of the first aspect, together with their beneficial effects. Accordingly, for the implementation of the terminal, refer to the implementation of the method; repeated description is omitted.
  • an embodiment of the present invention provides a storage medium, where the storage medium is a non-transitory computer readable storage medium that stores at least one program. Each program includes computer software instructions relating to the first aspect and the possible implementations of the first aspect; when executed by a terminal having a processor, the instructions cause the terminal to perform the method for determining the encoding rate of audio and video data according to the first aspect or any possible implementation of the first aspect.
  • the embodiment of the present invention adaptively adjusts the encoding rate of the audio and video data based on the uplink transmission information of the audio and video data. Adjusting the encoding rate in this way mitigates the physical layer transmission error rate, or the problem that buffer accumulation on the uplink transmission path of the terminal causes audio and video data packets to be actively discarded, thereby improving the quality of the multimedia call.
  • FIG. 1 is a schematic structural diagram of a system of audio and video channels in the prior art.
  • FIG. 2 is a schematic structural diagram of a terminal according to an embodiment of the present invention.
  • FIG. 3 is a flowchart of a method for determining an audio and video coding rate according to an embodiment of the present invention.
  • FIG. 4 is a schematic structural diagram of a system of an audio and video path according to an embodiment of the present invention.
  • FIG. 5 is a schematic structural diagram of another system for audio and video channels according to an embodiment of the present invention.
  • FIG. 6 is a schematic structural diagram of another system for audio and video channels according to an embodiment of the present invention.
  • FIG. 7 is a schematic structural diagram of another system for audio and video channels according to an embodiment of the present invention.
  • FIG. 8 is a schematic structural diagram of a terminal according to an embodiment of the present invention.
  • FIG. 9 is a schematic structural diagram of an audio and video access system applied to a VoWIFI service according to an embodiment of the present invention.
  • to address the above problem, the present invention provides a method, a terminal, and a storage medium for determining the encoding rate of audio and video data, offering a scheme for adjusting the encoding rate of audio and video data during encoding and transmission. The scheme detects uplink information that characterizes the transmission attributes of the uplink audio and video data and adjusts the encoding rate of the audio and video data based on the detected uplink information. The encoding rate can therefore be adapted to the actual conditions of the uplink transmission, which mitigates the discarding and accumulation of audio and video data packets during transmission and improves the quality and delay of VoLTE calls.
  • FIG. 1 is a schematic structural diagram of a system for an exemplary audio-video path in which a scheme for determining an audio/video data encoding rate according to some embodiments of the present invention is applicable.
  • the exemplary audio and video path shown in FIG. 1 does not constitute a limitation on the audio and video path to which the scheme for determining the encoding rate of the audio and video data provided by the embodiment of the present invention can be applied. The path may include more or fewer layers than illustrated, combine some layers, split some layers, or arrange the layers differently.
  • the functions of the exemplary audiovisual path illustrated in FIG. 1 may be specifically implemented by hardware, software programming, and a combination of hardware and software, and the hardware may specifically include one or more signal processing and/or application specific integrated circuits and the like.
  • the exemplary audio and video path structure may include an application processor (AP) 101, a communication processor (CP) 102, an audio digital signal processor (Audio DSP) 103, and a wireless network NET 104.
  • AP application processor
  • CP communication processor
  • Audio DSP audio digital signal processor
  • the AP 101 may mainly include an application layer (Application, APP) 1011, a radio interface layer (RIL) 1012, a camera (Camera) 1013, and a video encoder (Video Encoder) 1014.
  • the Audio DSP 103 may mainly include a voice encoder (Voice Encoder) 1031.
  • the CP 102 may mainly include an AT filter (AT Filter) 1021, an IP Multimedia Subsystem Adapter (IMSA) 1022, an IMS/RTP 1023, an IP 1024, a Non-Access Stratum/Radio Resource Control (NAS/RRC) 1025, a PDCP 1026, a Radio Link Control (RLC) 1027, a Media Access Control (MAC) 1028, and a Physical Layer (PHY) 1029.
  • APP Application
  • RIL radio interface layer
  • IMSA IP Multimedia Subsystem Adapter
  • NAS Non-Access Stratum
  • RLC Radio Link Control
  • MAC Media Access Control
  • PHY Physical Layer
  • a communication connection can be established between the AP 101 and the CP 102.
  • the AP 101 can communicate with the CP 102 through an AT (Attention) command set; for example, AT commands are sent and received between the RIL 1012 in the AP 101 and the AT Filter 1021 in the CP 102.
  • the IMSA 1022 in the CP 102 is mainly used to provide an adaptation between the IMS and the wireless communication network; the IMS/RTP 1023 is mainly used to provide end-to-end communication services for VoLTE calls, etc.; NAS/RRC 1025, PDCP 1026, RLC 1027, MAC 1028, and PHY 1029 are primarily used to provide functionality of the LTE wireless protocol stack; PHY 1029 may include network hardware such as transceivers to support communication with the wireless network NET 104.
  • the establishment process of the VoLTE call mainly involves message interaction among the APP 1011 and the RIL 1012 in the AP 101, and the AT Filter 1021, the IMSA 1022, the NAS/RRC 1025, the IP 1024, the PDCP 1026, the RLC 1027, the MAC 1028, and the PHY 1029 in the CP 102 (as illustrated by call commands 1-7 shown in FIG. 1).
  • the process of establishing a VoLTE call in this application will not be described in detail for the sake of brevity of description.
  • audio and video data can be transmitted.
  • the audio and video data may specifically include voice data and video data.
  • the voice data transmission mainly involves the Voice Encoder 1031 in the Audio DSP 103, and the IMSA 1022, the IMS/RTP 1023, the IP 1024, the PDCP 1026, the RLC 1027, the MAC 1028, and the PHY 1029 in the CP 102.
  • the transmission process of the voice data shown in FIG. 1 mainly includes the following steps:
  • Step 101: The voice data is collected and encoded by the Voice Encoder 1031 in the Audio DSP 103, and the encoded voice data is transmitted to the IMS/RTP 1023 through the IMSA 1022 in the CP 102;
  • Step 102: The voice data is encapsulated into RTP/UDP/IP packets by the IMS/RTP 1023 and the IP 1024, and is then transmitted to the PDCP 1026;
  • Step 103: The voice data is transmitted to the wireless network side NET 104 via the PDCP 1026, the RLC 1027, the MAC 1028, and the PHY 1029.
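  • To give a feel for what the RTP/UDP/IP encapsulation step adds to each voice frame, the sketch below computes the IP-level bit rate for a given codec rate. It assumes one voice frame per packet, a 20 ms frame interval, minimum-size RTP, UDP, and IPv4 headers, and no header compression; it also ignores codec payload headers and octet padding. These are simplifying assumptions of the example, not statements from the application.

```python
RTP_HEADER_BYTES = 12   # minimum RTP header (no CSRC list, no extension)
UDP_HEADER_BYTES = 8
IPV4_HEADER_BYTES = 20  # minimum IPv4 header (no options)


def ip_level_rate_bps(codec_rate_bps: int, frame_ms: int = 20) -> float:
    """Bit rate after RTP/UDP/IPv4 encapsulation, one frame per packet."""
    frames_per_second = 1000 / frame_ms
    payload_bits = codec_rate_bps / frames_per_second
    header_bits = 8 * (RTP_HEADER_BYTES + UDP_HEADER_BYTES + IPV4_HEADER_BYTES)
    return (payload_bits + header_bits) * frames_per_second


# A 12.65 kbit/s codec stream becomes 28.65 kbit/s at the IP level
# under these assumptions.
print(ip_level_rate_bps(12650))
```

  • Under these assumptions the 40 bytes of headers per packet add a constant 16 kbit/s, independent of the codec rate.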
  • the transmission process of the video data is similar to that of the voice data and differs only in the data collection part.
  • the acquisition and encoding of the video data are mainly performed by the Camera 1013 and the Video Encoder 1014 in the AP 101 (step 101'); the video data is then transmitted to the IMS/RTP 1023 through the IMSA 1022 in the CP 102 (step 102), and then transmitted to the wireless network side NET 104 via the PDCP 1026, the RLC 1027, the MAC 1028, and the PHY 1029 (step 103).
  • in this process, the Voice Encoder in the Audio DSP always collects and encodes voice data according to the fixed air interface bandwidth negotiated during VoLTE call establishment, and outputs fixed-rate voice data; the RTP/UDP/IP layers only packetize the data and forward it to the PDCP for buffering; and the RLC always fetches a corresponding amount of voice data from the PDCP according to the uplink grant size obtained from the MAC and sends it to the network side.
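  • The accumulation problem that results from a fixed-rate source and a varying uplink grant can be illustrated with a toy queue model. The sketch below is not the PDCP discard procedure itself; the function name `simulate_pdcp`, the one-packet-per-interval arrival pattern, and the whole-packet draining rule are simplifying assumptions.

```python
from collections import deque


def simulate_pdcp(packet_bytes: int,
                  grants_bytes: list,
                  interval_ms: int,
                  delay_budget_ms: int):
    """Return (sent, discarded) packet counts for a fixed-rate source.

    One packet of `packet_bytes` arrives per interval; `grants_bytes[i]`
    is the uplink grant available in interval i. Packets that have waited
    longer than `delay_budget_ms` are discarded from the buffer.
    """
    queue = deque()          # arrival times (ms) of buffered packets
    sent = discarded = 0
    for i, grant in enumerate(grants_bytes):
        now = i * interval_ms
        queue.append(now)    # fixed-rate encoder output arrives
        # Drop packets whose waiting time exceeds the delay budget.
        while queue and now - queue[0] > delay_budget_ms:
            queue.popleft()
            discarded += 1
        # Drain as many whole packets as the grant allows.
        while queue and grant >= packet_bytes:
            queue.popleft()
            grant -= packet_bytes
            sent += 1
    return sent, discarded
```

  • With a steady grant every packet is sent; when the grant drops to zero for several intervals, the fixed-rate source keeps filling the buffer and the oldest packets are eventually discarded.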
  • the transmission of video data is similar to this and will not be described here.
  • the embodiments of the present invention provide a solution for determining an encoding rate of audio and video data. The solution detects uplink information that characterizes the transmission attributes of the uplink audio and video data and adjusts the encoding rate of the audio and video data based on the detected uplink information, so that the encoding rate can be adaptively adjusted according to the actual transmission conditions of the audio and video data. This mitigates the physical layer transmission error rate, or the problem that buffer accumulation on the uplink transmission path of the terminal causes audio and video data packets to be actively discarded, and improves the voice quality and delay of VoLTE calls.
  • a terminal applying the scheme for determining the encoding rate of audio and video data provided by some embodiments of the present invention may detect, on the uplink audio and video path, uplink information that characterizes the transmission attributes of the uplink audio and video data, such as physical layer uplink transmission error rate information, buffer information on the uplink transmission path of the terminal, and RoHC switch information, so as to adjust the encoding rate of the audio and video data based on the detected uplink information.
  • based on the exemplary audio and video path structure shown in FIG. 1, a terminal applying the scheme for determining an encoding rate of audio and video data provided by some embodiments of the present invention may be provided with an uplink information feedback mechanism at the layers involved in the uplink transmission of audio and video data, such as the physical layer (PHY 1029 in FIG. 1), the PDCP layer (PDCP 1026 in FIG. 1), the RRC layer (RRC 1025 in FIG. 1), the RTP layer (IMS/RTP 1023 in FIG. 1), the UDP/TCP layer (not shown), the IP layer (IP 1024 in FIG. 1), the RLC layer (RLC 1027 in FIG. 1), and the MAC layer (MAC 1028 in FIG. 1).
  • the uplink information feedback mechanism enables the terminal, or the functional module of the terminal that implements the scheme for determining the audio and video data encoding rate provided by the embodiment of the present invention, to control, based on the fed-back uplink information, the encoder used for encoding the audio and video data (the Voice Encoder 1031 and/or the Video Encoder 1014 in FIG. 1) to adaptively adjust the encoding rate. This balances the rate at which audio and video data is encoded against the rate at which it is transmitted on the uplink, thereby mitigating the physical layer transmission error rate, or the problem that buffer accumulation on the uplink transmission path of the terminal causes audio and video data packets to be actively discarded, and improving VoLTE call quality and delay.
  • FIG. 2 is a schematic structural diagram of an exemplary terminal in which an audio/video data encoding rate scheme is applicable according to some embodiments of the present invention.
  • the structure of the terminal 200 may include components such as a transceiver 201, a processor 202, and an audio and video data processing circuit 203. It should be understood that the terminal 200 shown in FIG. 2 does not constitute a limitation on the terminal to which the scheme for determining the encoding rate of the audio and video data provided by the embodiment of the present invention can be applied. The terminal may include more or fewer components than illustrated, combine some components, split some components, or arrange the components differently.
  • the transceiver 201 may be configured such that the terminal shown in FIG. 2 is capable of transmitting wireless signals to, and receiving wireless signals from, a wireless network via a connection to a wireless network access point.
  • transceiver 201 can be configured to support any type of radio access technology (Radio Access Technologies, RAT) used to support communication between a terminal and a network over a wireless channel.
  • RAT Radio Access Technologies
  • the transceiver 201 can be configured to support communication via any type of RAT that can be used for communication between the terminal and the wireless network access point.
  • processor 202 may be configured to execute software programs and/or modules that are stored in memory 204 or otherwise accessible to processor 202, and to invoke data stored in memory 204, so as to perform the various functions of the terminal 200 and process data.
  • processor 202 may be configured to perform one or more functions of terminal 200 and/or control execution of one or more functions in accordance with aspects provided by some embodiments of the present invention, such as data processing, application execution, and/or other processing and management services.
  • processor 202 can be implemented in a variety of forms.
  • processor 202 can be implemented as various hardware-based processing devices, such as microprocessors, coprocessors, controllers, or various other computing or processing devices including integrated circuits. It should be understood that although FIG. 2 shows only a single processor, processor 202 may include one or more processors. The plurality of processors can be in operative communication with one another and can be collectively configured to perform one or more functions of the terminal.
  • the processor 202 can integrate an application processor (such as the AP 101 shown in FIG. 1) and a communication processor (such as the CP 102 shown in FIG. 1).
  • the application processor mainly processes an operating system, a user interface, an application, etc.
  • the communication processor (or may also be referred to as a modem processor) mainly processes wireless communication.
  • memory 204 may include one or more memory devices.
  • Memory 204 can include fixed and/or removable memory devices.
  • memory 204 can provide a non-transitory computer readable storage medium that can store computer program instructions executable by processor 202.
  • the memory 204 can be configured to store information, data, applications, computer software instructions, etc. for enabling the terminal 200 to perform various functions in accordance with one or more embodiments of the present invention.
  • memory 204 may be in communication with one or more of processor 202, transceiver 201, and audio and video data processing circuitry 203 via one or more buses for transferring information between components of the terminal.
  • the audio and video data processing circuit 203 may specifically include an encoder for encoding the audio and video data.
  • the audio/video data processing circuit 203 may include an audio data processing circuit and a voice encoder or the like for encoding the voice data, and/or may include a video data processing circuit and a video encoder for encoding the video data.
  • the audio and video data processing circuit 203 may specifically include an audio data processing circuit (such as the Audio DSP 103 in the system structure shown in FIG. 1), which includes a voice encoder for encoding voice data (such as the Voice Encoder 1031 in the system structure shown in FIG. 1); the audio and video data processing circuit 203 may further include a camera (such as the Camera 1013 in the system structure shown in FIG. 1) and a video encoder (such as the Video Encoder 1014 in the system structure shown in FIG. 1).
  • the audiovisual data processing circuit 203 may also include a microphone, a speaker, and the like.
  • the microphone can collect a sound signal and convert it into an electrical signal, which the voice encoder encodes to obtain voice data. The voice data can then be transmitted to the transceiver to be sent to, for example, another terminal, or output to memory 204 for further processing.
  • the speech encoder can also be used to decode the received speech data, and the decoded converted signal can be further transmitted to the speaker for conversion to a sound signal output.
  • processor 202 can communicate with transceiver 201, audio and video data processing circuitry 203, and control transceiver 201, audio and video data processing circuitry 203, and the like.
  • the transceiver 201 can be configured to receive or transmit audio and video data under the control of the processor 202.
  • processor 202 can be configured to read computer executable program code stored in memory 204 and execute the instructions in the code; when processor 202 executes the instructions, the instructions cause the terminal 200 to perform the method shown in FIG. 3:
  • Step 301 Detecting uplink information, where the uplink information is information used to represent a transmission attribute of the uplink audio and video data.
  • Step 302 Adjust the encoding rate of the audio and video data according to the detected first uplink information.
  • the audiovisual data may include voice data and/or video data.
  • typical examples of the uplink information that the terminal 200 detects in step 301, and that characterizes the transmission attributes of the uplink audio and video data, may include a physical layer uplink transmission error rate, a buffer on the terminal uplink transmission path, and the like.
  • the embodiment of the present invention uses the PDCP layer uplink transmission buffer as an example of the buffer on the uplink transmission path of the terminal to illustrate how the encoding rate of the audio and video data is adjusted according to that buffer. For methods of adjusting the encoding rate of the audio and video data according to the uplink transmission buffer of the video encoder, the RTP layer, the UDP layer, the IP layer, the PDCP layer, the RLC layer, and/or the MAC layer, refer to the method of adjusting the encoding rate of the audio and video data according to the buffer on the uplink transmission path of the terminal; details are not repeated here.
  • The processor 202 in the terminal 200 may be configured to detect the physical layer uplink transmission error rate of the VoLTE call uplink (UL).
  • The terminal 200 may detect the physical layer uplink transmission error rate based at least in part on the physical layer of the terminal 200, or on the medium access control layer and the physical layer.
  • The processor 202 in the terminal 200 may be configured to acquire acknowledgments (ACK) and/or negative acknowledgments (NACK) received from the network to determine the physical layer uplink transmission error rate, or may be configured to detect the physical layer uplink transmission error rate through the medium access control layer, and so on.
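  • As a rough illustration of how an error rate could be derived from ACK/NACK feedback, the sketch below treats the share of NACKs among all feedback received in a reporting window as the uplink transmission error rate. The function name and the window-based counting are assumptions made for illustration; the text does not specify the exact computation.

```python
def uplink_error_rate(ack_count: int, nack_count: int) -> float:
    """Estimate the uplink error rate as the share of NACKs in all feedback.

    Assumption: each ACK/NACK corresponds to one uplink transmission
    observed during the current reporting window.
    """
    total = ack_count + nack_count
    if total == 0:
        # No feedback in this window: report 0.0 instead of dividing by zero.
        return 0.0
    return nack_count / total


# 30 NACKs out of 100 feedback messages.
print(uplink_error_rate(ack_count=70, nack_count=30))  # 0.3
```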
  • The processor 202 in the terminal 200 may be configured to detect a buffer of the VoLTE call uplink. Specifically, taking the PDCP layer uplink transmission buffer as an example, when performing step 301 the terminal 200 may detect the PDCP layer buffer based at least in part on the PDCP layer of the terminal 200.
  • The processor 202 in the terminal 200 may further be configured to detect both the physical layer uplink transmission error rate of the VoLTE call uplink and the buffer on the terminal's uplink transmission path.
  • The processor 202 in the terminal 200 may be configured to detect RoHC switch information for the encoded voice data. Specifically, since the RoHC switch information is generally configured by the network side for the RRC layer, when performing step 301 the terminal 200 may detect the RoHC switch information based at least in part on the RRC layer of the terminal 200.
  • As uplink information characterizing a transmission attribute of the uplink audio and video data, the present application lists only the physical layer uplink transmission error rate, the buffer on the terminal's uplink transmission path, and, for the uplink transmission of voice data, the RoHC switch information as examples. These examples do not limit the uplink information detected in the present application; the uplink information that the terminal can detect may include more or less information than the examples listed above.
  • The terminal 200 may further perform step 302 to adjust the encoding rate of the audio and video data based on the detected first uplink information.
  • The first uplink information used by the terminal 200 in step 302 may specifically include one or more of the physical layer uplink transmission error rate, the buffer on the terminal's uplink transmission path, the RoHC switch information, and other such uplink information.
  • For example, the first uplink information may be the physical layer uplink transmission error rate; or the buffer on the terminal's uplink transmission path; or both the physical layer uplink transmission error rate and the buffer on the terminal's uplink transmission path; or the physical layer uplink transmission error rate, the buffer on the terminal's uplink transmission path, and the RoHC switch information; and so on.
  • The specific process performed by the terminal 200 in step 302 may be implemented as follows: first, according to a preset correspondence between uplink information and the encoding rate of audio and video data, determine a first encoding rate corresponding to the detected first uplink information, and then adjust the encoding rate of the audio and video data to the determined first encoding rate. The preset correspondence between uplink information and the encoding rate of audio and video data may be stored in the memory in advance.
  • The processor 202 of the terminal 200 may be configured to read, according to the detected first uplink information, the entry matching the first uplink information from the preset correspondence between uplink information and the encoding rate of audio and video data stored in the memory 204, determine the first encoding rate corresponding to the first uplink information, and instruct the encoder in the terminal 200 to adjust the encoding rate of the audio and video data to the determined first encoding rate. The encoder for encoding voice data in the terminal 200 may be configured to adjust the encoding rate of the voice data to the first encoding rate under the instruction of the processor 202, and/or the encoder for encoding video data may be configured to adjust the encoding rate of the video data to the first encoding rate under the instruction of the processor 202.
  • The preset correspondence between uplink information and the encoding rate of audio and video data may be stored in the memory 204 in the form of a table or the like. The correspondence may be a one-to-one correspondence between a value of the uplink information and a value of the encoding rate; or a correspondence between a value range of the uplink information and a value of the encoding rate; or a correspondence between a value or value range of the uplink information and a policy for the encoding rate, in which case the terminal may determine the value of the encoding rate based on that policy; and so on.
  • The terminal can adjust the encoding rate to a higher rate suited to the uplink transmission environment when that environment is good, and to a lower rate suited to the uplink transmission environment when it is poor.
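  • One way to picture a correspondence between value ranges of the uplink information and encoding rates is the lookup sketched below. The error-rate ranges and the rates in `RATE_TABLE` are placeholder values chosen for illustration, not values from this application, and `lookup_rate` is a hypothetical helper.

```python
# Each entry: (upper bound of the error-rate range, encoding rate in kbit/s).
# All numbers are illustrative placeholders, ordered by ascending bound.
RATE_TABLE = [
    (0.10, 12.2),   # good uplink: higher encoding rate
    (0.30, 7.95),
    (1.00, 4.75),   # poor uplink: lower encoding rate
]


def lookup_rate(error_rate: float) -> float:
    """Return the encoding rate whose range contains error_rate."""
    for upper_bound, rate in RATE_TABLE:
        if error_rate <= upper_bound:
            return rate
    # Values above the last bound fall back to the lowest rate.
    return RATE_TABLE[-1][1]


print(lookup_rate(0.05))  # 12.2
print(lookup_rate(0.45))  # 4.75
```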
  • The preset correspondence between uplink information and the encoding rate of audio and video data may differ; for voice data and video data, the correspondence between the uplink information and the encoding rate of voice data and the correspondence between the uplink information and the encoding rate of video data may be the same or different.
  • A voice encoder generally encodes voice data according to the coding mode configured for it, and different coding modes correspond to different encoding rates.
  • Common coding modes for voice data include the Adaptive Multi-Rate (AMR) coding mode, which may include the AMR-WB and AMR-NB coding modes, and the Enhanced Voice Services (EVS) coding mode. Taking the AMR-NB coding mode as an example, each AMR-NB mode value usually corresponds to a fixed encoding rate.
  • For voice data, a coding mode corresponding to the currently detected uplink information may therefore be determined based on a correspondence between coding modes and uplink information, and the voice data is then encoded according to the determined coding mode.
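  • For reference, the eight AMR-NB speech modes and their bit rates as defined by 3GPP can be held in a small table, so that choosing a mode also fixes the encoding rate. The helper name `rate_for_mode` is an assumption for illustration.

```python
# AMR-NB speech codec modes (0-7) and their bit rates in kbit/s.
AMR_NB_RATES_KBPS = {
    0: 4.75, 1: 5.15, 2: 5.90, 3: 6.70,
    4: 7.40, 5: 7.95, 6: 10.2, 7: 12.2,
}


def rate_for_mode(mode: int) -> float:
    """Return the fixed encoding rate that belongs to an AMR-NB mode."""
    if mode not in AMR_NB_RATES_KBPS:
        raise ValueError(f"unknown AMR-NB mode: {mode}")
    return AMR_NB_RATES_KBPS[mode]


print(rate_for_mode(7))  # 12.2
print(rate_for_mode(0))  # 4.75
```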
  • A video encoder typically adjusts the encoding rate of video data by adjusting the video frame rate and/or the compression ratio.
  • For video data, a video frame rate and/or compression ratio corresponding to the detected uplink information may be determined based on a preset correspondence between the video frame rate and/or compression ratio and the uplink information, and the video data is encoded according to the determined video frame rate and/or compression ratio.
  • The terminal adjusts the encoding rate of the audio and video data to the first encoding rate corresponding to the detected first uplink information, so that the encoding rate of the audio and video data can be adaptively adjusted based on the current uplink transmission environment, improving the quality and efficiency of audio and video data transmission.
  • The specific process performed by the terminal 200 in step 302 may also be implemented as follows: first compare the detected first uplink information with a preset threshold, and then adjust the encoding rate of the audio and video data according to the preset criterion that corresponds to the comparison result and the current encoding rate of the audio and video data.
  • The processor 202 of the terminal 200 may be configured to read the preset threshold stored in the memory 204 according to the detected first uplink information, compare the detected first uplink information with the preset threshold, read from the memory 204 the preset criterion that corresponds to the comparison result and the current encoding rate of the audio and video data, and, based on the preset criterion it has read, instruct the encoder in the terminal 200 to adjust the encoding rate of the audio and video data.
  • The encoder for encoding voice data in the terminal 200 may be configured to adjust the encoding rate of the voice data under the instruction of the processor 202, and/or the encoder for encoding video data may be configured to adjust the encoding rate of the video data under the instruction of the processor 202.
  • The preset threshold used for the comparison may be set on the principle that it reflects the uplink transmission environment, and the preset criterion for adjusting the encoding rate may be set based on the comparison result.
  • Take the case where the first uplink information is the physical layer uplink transmission error rate as an example.
  • Considering that a physical layer uplink transmission error rate below 30% is a common indicator requirement for uplink data transmission, the preset threshold in the terminal 200 is set to 30%.
  • The preset criterion corresponding to the comparison result that the physical layer uplink transmission error rate is greater than 30% is to reduce the encoding rate of the audio and video data by 80% relative to the current encoding rate.
  • The preset criterion corresponding to the comparison result that the physical layer uplink transmission error rate is not more than 30% is to increase the encoding rate of the audio and video data by 20% relative to the current encoding rate.
  • The processor 202 in the terminal 200 can then adjust the current encoding rate of the audio and video data according to the preset criterion that corresponds to the result of the comparison with the 30% threshold, thereby improving the physical layer transmission error rate.
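  • The single-threshold example can be sketched as follows. This is a minimal illustration that reads "reduced by 80%" and "increased by 20%" literally as relative changes to the current rate; the function and constant names are hypothetical, and a real encoder would additionally snap the result to a rate it supports.

```python
THRESHOLD = 0.30        # preset threshold from the example (30%)
DECREASE_RATIO = 0.80   # reduce by 80% when the error rate exceeds the threshold
INCREASE_RATIO = 0.20   # increase by 20% otherwise


def adjust_rate(current_rate: float, error_rate: float) -> float:
    """Apply the single-threshold criterion to the current encoding rate."""
    if error_rate > THRESHOLD:
        return current_rate * (1.0 - DECREASE_RATIO)
    # "Not more than 30%" includes an error rate of exactly 30%.
    return current_rate * (1.0 + INCREASE_RATIO)
```

  • With a current rate of 100 units, an error rate of 35% yields roughly 20 units and an error rate of 10% yields roughly 120 units.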
  • The way the current encoding rate is adjusted, and the adjustment ratio relative to the current encoding rate, may differ between preset criteria.
  • For example, if the comparison result indicates that the current transmission environment is good (for example, the value of the uplink information is below a preset threshold), the corresponding preset criterion may be to increase the current encoding rate of the audio and video data or to keep it unchanged; if the comparison result indicates that the current transmission environment is poor (for example, the value of the uplink information is above a preset threshold), the corresponding preset criterion may be to reduce the current encoding rate of the audio and video data by a certain amount; if the comparison result indicates that the current transmission environment is very poor (for example, the value of the uplink information is far above a preset threshold), the corresponding preset criterion may be to reduce the encoding rate by a preset adjustment ratio relative to the current encoding rate of the audio and video data.
  • Multiple preset thresholds may be used for the comparison in the foregoing manner, so as to grade the quality of the uplink transmission environment in steps. The processor 202 compares the detected first uplink information with the multiple preset thresholds to determine how good or bad the current uplink transmission environment is, and then, according to the comparison result, obtains the preset criterion that corresponds to the comparison result and the current encoding rate of the audio and video data, so that the encoding rate of the audio and video data can be adjusted appropriately for different uplink transmission environments.
  • Take the case where the first uplink information detected by the terminal 200 is the physical layer uplink transmission error rate as an example.
  • Two thresholds are preset (a first threshold and a second threshold, the first threshold being less than the second threshold). A first criterion is set for uplink information whose value is less than the first threshold, a second criterion for uplink information whose value is greater than the first threshold and less than the second threshold, and a third criterion for uplink information whose value is greater than the second threshold. The first criterion increases the encoding rate by a set ratio relative to the current encoding rate or maintains it; the second criterion reduces the encoding rate by a set ratio relative to the current encoding rate; the third criterion reduces the encoding rate by a further set ratio that is higher than the ratio in the second criterion.
  • After detecting the physical layer uplink transmission error rate, the processor 202 in the terminal 200 compares it with the first threshold; when the physical layer uplink transmission error rate is greater than the first threshold, the processor 202 may further compare it with the second threshold. According to the comparison result, when the physical layer uplink transmission error rate is greater than the first threshold and less than the second threshold, the current encoding rate is reduced according to the preset second criterion; when the physical layer uplink transmission error rate is greater than the second threshold, the current encoding rate is reduced according to the preset third criterion. The latter reduces the current encoding rate by a larger amount and is therefore better suited to the actual state of the uplink transmission environment.
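  • The two-threshold, three-criterion scheme can be sketched as below. The default thresholds and all three ratios are placeholder assumptions, as is the handling of an error rate exactly equal to a threshold, which the text leaves open; the only property taken from the text is that the third criterion cuts the rate more than the second.

```python
def adjust_rate_two_thresholds(
    current_rate: float,
    error_rate: float,
    first_threshold: float = 0.30,
    second_threshold: float = 0.40,
    increase_ratio: float = 0.10,
    decrease_ratio: float = 0.20,
    further_decrease_ratio: float = 0.40,
) -> float:
    """Pick one of three criteria by comparing against two thresholds."""
    # Compare with the first threshold first, as the processor does.
    if error_rate <= first_threshold:
        return current_rate * (1.0 + increase_ratio)          # first criterion
    # Only when above the first threshold, compare with the second.
    if error_rate <= second_threshold:
        return current_rate * (1.0 - decrease_ratio)          # second criterion
    return current_rate * (1.0 - further_decrease_ratio)      # third criterion
```

  • Keeping the ratios as parameters makes it easy to check that `further_decrease_ratio` stays larger than `decrease_ratio`.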
  • The preset criteria corresponding to the comparison result and the current encoding rate of the audio and video data may include criteria for increasing the encoding rate relative to the current encoding rate and criteria for decreasing it. Considering some practical applications, in some embodiments of the present invention the magnitude by which any criterion for increasing the encoding rate raises it may be smaller than the magnitude by which any criterion for decreasing the encoding rate lowers it.
  • For voice data, the preset criterion may also be a criterion for adjusting the coding mode of the current voice data according to the comparison result, so that adjusting the coding mode has the effect of adjusting the encoding rate of the voice data; for video data, the preset criterion may also be a criterion for adjusting the video frame rate and/or compression ratio of the current video data according to the comparison result, so that adjusting the video frame rate and/or compression ratio has the effect of adjusting the encoding rate of the video data.
  • For voice data, the processor 202 may compare the detected physical layer uplink transmission error rate with a preset threshold. When the physical layer uplink transmission error rate is greater than the preset threshold, it determines according to the preset criterion that the coding mode of the current voice data needs to be lowered by a certain amount, and instructs the voice encoder to lower the coding mode of the voice data accordingly; when the physical layer uplink transmission error rate is not greater than the preset threshold, the coding mode may correspondingly be raised or kept unchanged.
  • The voice encoder can adjust the coding mode of the voice data as instructed by the processor 202.
  • For video data, the processor 202 may also compare the detected physical layer uplink transmission error rate with a preset threshold. When the physical layer uplink transmission error rate is greater than the threshold, it determines according to the preset criterion that the video frame rate of the current video data needs to be reduced by a certain ratio and/or the compression ratio of the current video data needs to be increased by a certain ratio, and instructs the video encoder accordingly; when the physical layer uplink transmission error rate is not greater than the threshold, it determines according to the preset criterion that the video encoder should increase the video frame rate of the current video data by a certain ratio and/or reduce the compression ratio of the video data by a certain ratio, or it instructs the video encoder to keep the video frame rate and/or the compression ratio of the video data unchanged.
  • The video encoder can adjust the video frame rate and/or the compression ratio of the video data under the instruction of the processor 202.
  • The first uplink information detected by the terminal may include the physical layer uplink transmission error rate and the buffer on the terminal's uplink transmission path. Specifically, the terminal may adjust the encoding rate of the audio and video data according to the detected first uplink information in, but not limited to, the following manner:
  • If the terminal determines, according to the buffer on its uplink transmission path, that the encoding rate of the audio and video data needs to be reduced, the terminal lowers the encoding rate of the audio and video data; otherwise, the terminal adjusts the encoding rate of the audio and video data according to the physical layer uplink transmission error rate.
  • Adjusting the encoding rate of the audio and video data according to both the physical layer uplink transmission error rate and the buffer on the terminal's uplink transmission path can be regarded as a combination and optimization of the adjustment process based on the physical layer uplink transmission error rate and the adjustment process based on the buffer on the terminal's uplink transmission path described in the foregoing embodiments.
  • When the processor 202 determines, according to the detected buffer on the terminal's uplink transmission path, that the encoding rate of the audio and video data needs to be reduced, it may instruct the encoder that encodes the audio and video data to lower the encoding rate directly to the lowest rate, where the lowest rate may be preset or may be the minimum encoding rate supported by the call setup.
  • For the process by which the terminal determines, according to the buffer on its uplink transmission path, whether the encoding rate of the audio and video data needs to be adjusted, and the process of adjusting the encoding rate of the audio and video data according to the detected physical layer uplink transmission error rate, reference may be made to the foregoing embodiments.
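  • The priority between the two inputs can be sketched as follows: the buffer check is evaluated first and, if it calls for a reduction, the rate drops straight to the lowest rate; otherwise an error-rate based policy decides. The function names, the callable parameter, and the placeholder policy `halve_above_30_percent` are all assumptions made for illustration.

```python
from typing import Callable


def adjust_rate_combined(
    current_rate: float,
    error_rate: float,
    buffer_needs_reduction: bool,
    lowest_rate: float,
    adjust_by_error_rate: Callable[[float, float], float],
) -> float:
    """Give the buffer check priority over the error-rate check."""
    if buffer_needs_reduction:
        # The buffer calls for a reduction: drop straight to the lowest rate.
        return lowest_rate
    # Otherwise fall back to the error-rate based adjustment.
    return adjust_by_error_rate(current_rate, error_rate)


def halve_above_30_percent(rate: float, error_rate: float) -> float:
    """Placeholder error-rate policy used only for this example."""
    return rate / 2 if error_rate > 0.30 else rate


print(adjust_rate_combined(12.2, 0.10, True, 4.75, halve_above_30_percent))   # 4.75
print(adjust_rate_combined(12.2, 0.10, False, 4.75, halve_above_30_percent))  # 12.2
```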
  • For the uplink transmission of voice data, RoHC compresses the IP+UDP+RTP header, so whether the RoHC switch is on affects the size of the voice packet. For the transmission of voice data, the effect of the RoHC switch state can therefore be taken into account as well.
  • The IP+UDP+RTP header totals 60 bytes, while the payload carrying voice data is at most only 60 bytes. If the RoHC switch is on, RoHC compresses the IP+UDP+RTP header to 3-5 bytes on average, so that the AMR payload (that is, the payload carrying voice data, which can also be understood as the effective data payload) accounts for a much larger share of the bits in the whole PDCP voice packet; if the RoHC switch is off, the IP+UDP+RTP header is not compressed and occupies no less than 50% of the whole PDCP voice packet.
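  • The header shares quoted above can be checked with a short calculation. The sketch uses the 60-byte header and the 60-byte maximum payload from the text and ignores any other per-packet overhead, which is a simplifying assumption; `header_share` is a hypothetical helper.

```python
def header_share(header_bytes: float, payload_bytes: float) -> float:
    """Fraction of a packet taken up by the header."""
    return header_bytes / (header_bytes + payload_bytes)


MAX_VOICE_PAYLOAD = 60  # bytes, upper bound given in the text

# RoHC off: the full 60-byte IP+UDP+RTP header is sent.
print(f"{header_share(60, MAX_VOICE_PAYLOAD):.1%}")  # 50.0%
# RoHC on: the header shrinks to about 3-5 bytes.
print(f"{header_share(5, MAX_VOICE_PAYLOAD):.1%}")   # 7.7%
print(f"{header_share(3, MAX_VOICE_PAYLOAD):.1%}")   # 4.8%
```

  • Because 60 bytes is the maximum payload, any smaller payload pushes the uncompressed header share above 50%, which matches the "no less than 50%" statement.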
  • If the RoHC switch is on, the voice quality can still be guaranteed when the encoding rate is reduced. If the RoHC switch is off, the effective data share is small, so an unrestricted reduction of the encoding rate may leave too little effective data in the data encoded and transmitted within a set time. Voice quality then cannot be guaranteed, and the gain expected from adjusting the encoding rate of the voice data is not achieved.
  • Considering the effect of the RoHC switch state on the adjustment of the voice data encoding rate, the uplink information detected by the terminal in step 301 may further include RoHC switch information.
  • The first uplink information detected by the terminal may include the physical layer uplink transmission error rate, the buffer on the terminal's uplink transmission path, and the header compression (RoHC) switch information. Specifically, the terminal may adjust the encoding rate of the audio and video data according to the detected first uplink information in, but not limited to, the following manner:
  • When the terminal determines that the RoHC switch information indicates that the RoHC switch is off, if the encoding rate of the voice data needs to be reduced according to the physical layer uplink transmission error rate and/or the buffer on the terminal's uplink transmission path, the terminal lowers the encoding rate of the voice data to a rate not lower than a preset encoding rate. This avoids the problem that, with the RoHC switch off, the effective data share is so small that too little effective data is encoded within the set time and voice quality suffers.
  • When performing step 302, the terminal may first determine from the RoHC switch information whether the RoHC switch is off. If the RoHC switch information indicates that the RoHC switch is off, then, when the terminal determines that the encoding rate of the voice data needs to be reduced according to the physical layer uplink transmission error rate and/or the buffer on the terminal's uplink transmission path, it lowers the encoding rate of the voice data to a rate not lower than the preset encoding rate.
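  • The RoHC-dependent lower bound can be sketched as a simple clamp applied after a reduction has been decided. The function name and the example floor of 7.4 kbit/s are assumptions made for illustration; the text only requires that the reduced rate not fall below a preset encoding rate when RoHC is off.

```python
def lower_voice_rate(target_rate: float, rohc_enabled: bool, floor_rate: float) -> float:
    """Return the voice encoding rate to use once a reduction is decided.

    target_rate is the rate the error-rate/buffer logic asked for.
    """
    if not rohc_enabled:
        # RoHC off: headers dominate the packet, so do not go below the floor.
        return max(target_rate, floor_rate)
    # RoHC on: no additional restriction in this sketch.
    return target_rate


print(lower_voice_rate(4.75, rohc_enabled=False, floor_rate=7.4))  # 7.4
print(lower_voice_rate(4.75, rohc_enabled=True, floor_rate=7.4))   # 4.75
```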
  • The foregoing embodiments relating to RoHC switch information build on the process, described in some embodiments of the present invention, of adjusting the encoding rate of audio and video data based on the physical layer uplink transmission error rate and the buffer on the terminal's uplink transmission path, and further optimize it by taking into account the influence of the RoHC switch information on voice data transmission.
  • For how to determine, according to the detected physical layer uplink transmission error rate and/or the buffer on the terminal's uplink transmission path, whether the encoding rate of the audio and video data needs to be adjusted and how to adjust it, reference may be made to the foregoing embodiments; details are not repeated here.
  • FIG. 4 is a schematic diagram of applying the scheme for determining the encoding rate of audio and video data, provided by some embodiments of the present invention, in an audio and video path.
  • The first uplink information detected by the terminal 200 is the physical layer uplink transmission error rate.
  • The terminal may be configured to add, at the MAC 1028, detection of the current physical layer (PHY 1029) uplink transmission error rate, and the IMSA 1022 may be configured to receive the current physical layer uplink transmission error rate reported by the MAC 1028; for example, the MAC 1028 reports to the IMSA 1022 once per unit time (e.g., every 1 second), as shown in step 401 of FIG. 4.
  • The terminal may configure a preset threshold for the IMSA 1022, so that the IMSA 1022 can determine, according to the result of comparing the physical layer uplink transmission error rate with the preset threshold, the preset criterion, corresponding to the comparison result and the current encoding rate of the audio and video data, for adjusting the encoding rate of the audio and video data.
  • For voice data, the terminal may configure two preset thresholds (threshold 1 and threshold 2) for the IMSA 1022, and the IMSA 1022 may compare the received physical layer uplink transmission error rate with threshold 1 and threshold 2 and determine, based on the result of the comparison with the two preset thresholds, a preset criterion for adjusting the encoding rate of the voice data. For example, suppose that threshold 1 for the physical layer uplink transmission error rate is 30% and threshold 2 is 40%, that the coding mode for encoding the voice data is the AMR coding mode, and that the preset criterion corresponding to the comparison result and the current encoding rate of the voice data is as follows: when the physical layer uplink transmission error rate is less than threshold 1, the AMR coding mode (mode) of the current voice data is increased by 1; when the physical layer uplink transmission error rate is greater than threshold 1 and less than threshold 2, the mode of the current voice data is decreased by 2; when the physical layer uplink transmission error rate is greater than threshold 2, the mode of the current voice data is decreased by 4.
  • The IMSA 1022 may then initiate, to the voice encoder (Voice Encoder 1031), a rate adjustment request for changing the AMR coding mode (mode), carrying the determined AMR coding mode in the request, so that the Voice Encoder 1031 encodes the voice data at the encoding rate corresponding to that AMR coding mode (step 402a shown in FIG. 4).
  • The Voice Encoder 1031 determines the corresponding encoding rate according to the AMR coding mode carried in the rate adjustment request and performs voice encoding, so that the encoding rate of the voice data adapts to the current physical layer uplink environment, improving the physical layer transmission error rate and thereby the quality of voice calls.
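  • The mode stepping in this example can be sketched as follows, using the 30% and 40% thresholds and the +1 / -2 / -4 steps from the text. Clamping to the AMR-NB mode range 0-7 and the treatment of an error rate exactly equal to a threshold are assumptions, since the text does not cover those cases; `next_amr_mode` is a hypothetical name.

```python
THRESHOLD_1 = 0.30
THRESHOLD_2 = 0.40
MIN_MODE, MAX_MODE = 0, 7  # assumed AMR-NB mode range used for clamping


def next_amr_mode(current_mode: int, error_rate: float) -> int:
    """Return the AMR mode to request from the voice encoder."""
    if error_rate < THRESHOLD_1:
        step = 1       # good uplink: move one mode up
    elif error_rate < THRESHOLD_2:
        step = -2      # degraded uplink: move two modes down
    else:
        step = -4      # poor uplink: move four modes down
    # Keep the result inside the valid mode range.
    return max(MIN_MODE, min(MAX_MODE, current_mode + step))


print(next_amr_mode(5, 0.10))  # 6
print(next_amr_mode(5, 0.35))  # 3
print(next_amr_mode(5, 0.50))  # 1
```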
  • The following table shows test results for the improvement in voice quality when the voice data encoding rate is adjusted based on the physical layer uplink transmission error rate in some embodiments of the present invention. The mean opinion score (MOS) of voice quality for the solutions provided by some embodiments of the present invention is higher than the MOS for the prior-art solution.
  • The packet loss rate for the solutions provided by some embodiments of the present invention is also far lower than that of the prior-art solution.
  • For video data, the terminal may configure threshold 3 and threshold 4 for the IMSA 1022, and the IMSA 1022 may compare the received physical layer uplink transmission error rate with thresholds 3 and 4 and determine, based on the result of the comparison with these two preset thresholds, the preset criterion corresponding to the comparison result and the current encoding rate of the video data.
  • For example, suppose that preset threshold 3 for the physical layer uplink transmission error rate is 30% and threshold 4 is 40%, that the preset criterion adjusts the video data encoding rate by adjusting the video frame rate and/or the compression ratio of the video data, and that the preset criterion corresponding to the comparison result and the current encoding rate of the video data is as follows: when the physical layer uplink transmission error rate is less than threshold 3, the video frame rate of the current video data is increased by 1 and the compression ratio of the current video data is reduced by 10%; when the physical layer uplink transmission error rate is greater than threshold 3 and less than threshold 4, the video frame rate of the current video data is reduced by 2, and once the video frame rate has been reduced to 6, the compression ratio of the current video data is increased by 20%; when the physical layer uplink transmission error rate is greater than threshold 4, the video frame rate of the current video data is reduced by 4, and once the video frame rate has been reduced to 6, the compression ratio of the current video data is increased by 40%.
  • The IMSA 1022 may then initiate, to the video encoder (Video Encoder 1014), a rate adjustment request for adjusting the video encoding rate. The request carries the determined video frame rate and/or compression ratio of the video data, so that the Video Encoder 1014 adjusts according to the request (step 402b shown in FIG. 4).
  • The Video Encoder 1014 performs video encoding according to the video frame rate and/or compression ratio carried in the rate adjustment request, so that the encoding rate of the video data adapts to the current physical layer uplink environment, improving the physical layer transmission error rate and thereby the quality of video calls.
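  • A possible reading of the video criterion is sketched below. It assumes that 6 acts as a lower bound on the frame rate, that the compression ratio is only raised once the frame rate has reached that bound, and that the percentages are relative changes to the current compression ratio; none of these details is stated explicitly, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class VideoParams:
    frame_rate: int
    compression_ratio: float


MIN_FRAME_RATE = 6  # frame rate at which compression starts to increase


def adjust_video(
    params: VideoParams,
    error_rate: float,
    threshold_3: float = 0.30,
    threshold_4: float = 0.40,
) -> VideoParams:
    """Return the frame rate and compression ratio to request next."""
    if error_rate < threshold_3:
        # Good uplink: one more frame, 10% less compression.
        return VideoParams(params.frame_rate + 1, params.compression_ratio * 0.90)
    # Degraded uplink: step sizes depend on which threshold was exceeded.
    frame_step, ratio_gain = (2, 0.20) if error_rate < threshold_4 else (4, 0.40)
    new_frame_rate = max(MIN_FRAME_RATE, params.frame_rate - frame_step)
    new_ratio = params.compression_ratio
    if new_frame_rate == MIN_FRAME_RATE:
        # Frame rate cannot drop further, so compress harder instead.
        new_ratio *= 1.0 + ratio_gain
    return VideoParams(new_frame_rate, new_ratio)
```

  • Under these assumptions, starting from 15 frames and an error rate of 35%, the first adjustment only lowers the frame rate to 13; the compression ratio starts to rise after repeated adjustments bring the frame rate down to 6.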
  • The thresholds and the values in the preset criteria set above are based on practical application experience and on the consideration that, in practice, a physical layer uplink error rate below 30% is a common indicator requirement for uplink data transmission.
  • The specific thresholds and the preset criteria corresponding to the comparison results can be adjusted as appropriate, which is not limited in this application.
  • FIG. 5 is a schematic diagram of applying the scheme for determining the encoding rate of audio and video data, provided by some embodiments of the present invention, in an audio and video path.
  • The embodiments of the present invention use the PDCP layer uplink transmission buffer as an example of a buffer on the terminal's uplink transmission path to illustrate how the encoding rate of the audio and video data is adjusted according to such a buffer. The buffer on the terminal's uplink transmission path further includes at least the uplink transmit buffer of the audio encoder, the video encoder, the RTP layer, the UDP layer, the IP layer, the PDCP layer, the RLC layer, and/or the MAC layer; adjusting the encoding rate of the audio and video data according to the uplink transmit buffer of any of these may follow the method of adjusting the encoding rate of the audio and video data according to the buffer on the terminal's uplink transmission path.
  • the terminal may be configured to increase the detection of the current PDCP uplink transmission buffer and the threshold setting in the PDCP 1026, so that the terminal can compare the detected current PDCP uplink transmission buffer with a preset threshold. And determining a preset criterion for adjusting the encoding rate of the audio and video data based on the comparison result (such as whether the preset criterion may need to be flow controlled).
  • the terminal may compare the detected PDCP layer uplink transmit buffer with one or more preset thresholds, and thereby determine the corresponding criterion when the detected PDCP uplink transmit buffer is greater than a threshold, such as a sixth threshold.
  • relative to a threshold such as a sixth threshold, the corresponding criterion may instead be determined to be releasing flow control, i.e., the encoding rate of the audio and video data can be increased.
  • the preset threshold may be the value T1 delivered by the network side, if one is provided; otherwise the default value T2 may be used.
  • the threshold on the amount of buffered data can specifically be set to 150 ms.
  • for a threshold (assumed to be K) set for the PDCP layer, at least two comparison thresholds may further be derived by applying at least two ratios (for example, N1 and N2) to that threshold. For example, if the PDCP layer detects that the current amount of data in the uplink transmit buffer exceeds a set threshold (for example, K × N1), flow control is considered necessary (i.e., it is determined that the coding rate of the audio and video data needs to be reduced).
  • after flow control starts, if the PDCP detects that the amount of buffered data has gradually decreased below a lower ratio of the set threshold (for example, K × N2), flow control can be released (i.e., it is determined that the encoding rate of the audio and video data needs to be increased).
  • the specific values of the ratios N1 and N2 set for the PDCP layer data buffer threshold may allow for a fluctuation of at least one voice packet duration (usually 20 ms); for example, N1 can be determined as 80% by the formula (1 − 20/100), and N2 can be determined as 20% by (1 − N1). For the transmission of video data, since the duration of a video frame is not fixed, the values of N1 and N2 may follow the values used for voice data.
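The two-ratio rule above behaves like a hysteresis loop, and a minimal Python sketch may make that concrete. The class name, the return labels, and the choice of K = 150 ms (borrowed from the 150 ms figure mentioned earlier) are assumptions made for illustration; the source does not prescribe an implementation.

```python
from dataclasses import dataclass


@dataclass
class PdcpFlowControl:
    """Hysteresis on the PDCP uplink buffer level (illustrative only)."""
    k_ms: float = 150.0   # base threshold K (assumed: the 150 ms figure in the text)
    n1: float = 0.8       # upper ratio: start flow control above K * N1
    n2: float = 0.2       # lower ratio: release flow control below K * N2
    active: bool = False  # True while flow control is in effect

    def update(self, buffered_ms: float) -> str:
        """Return 'start', 'release', or 'hold' for one buffer sample."""
        if not self.active and buffered_ms > self.k_ms * self.n1:
            self.active = True
            return "start"    # the encoder should lower its coding rate
        if self.active and buffered_ms < self.k_ms * self.n2:
            self.active = False
            return "release"  # the encoder may raise its coding rate again
        return "hold"
```

Because the start and release thresholds differ, a buffer level hovering around a single value does not make the encoder toggle between rates on every sample.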
  • the PDCP 1026 may be configured to notify the IMSA 1022 in the CP 102 to perform flow control when it detects that the uplink transmit buffer exceeds the first set threshold, and, after flow control has started, to notify the IMSA 1022 to release flow control when it detects that the uplink transmit buffer has decreased below the lower set threshold; the IMSA 1022 can be configured to receive the notification from the PDCP 1026 and perform the corresponding processing (step 501 as shown in FIG. 5).
  • the IMSA 1022 may send a notification message for implementing flow control to the Voice Encoder 1031, so that the Voice Encoder 1031 reduces the encoding rate of the voice data according to the notification message, thereby achieving uplink flow control.
  • the IMSA 1022 may send a notification message for releasing flow control to the Voice Encoder 1031, so that the Voice Encoder 1031 increases the encoding rate of the voice data according to the notification message (step 502a as shown in FIG. 5).
  • the Voice Encoder 1031 can reduce the output voice data rate by lowering the coding mode of the voice data, thereby achieving flow control, reducing the active discarding of voice packets caused by buffer accumulation on the uplink transmission path of the terminal, and improving the quality of voice calls.
  • the Voice Encoder 1031 can reduce (or increase) its rate, either directly or stepwise, to the minimum (or maximum) coding rate negotiated at VoLTE call setup.
  • flow control is more responsive in the direct mode, and the voice quality changes more smoothly in the stepwise mode.
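The difference between the direct and stepwise modes can be sketched as follows. This is only an illustration: coding modes are modelled as integer indices on a ladder, and the function names `next_mode` and `rate_path` are hypothetical.

```python
def next_mode(current: int, target: int, stepwise: bool) -> int:
    """Return the next coding-mode index when moving from current toward target."""
    if current == target or not stepwise:
        return target  # direct mode: jump straight to the negotiated limit
    return current + 1 if target > current else current - 1  # one step per call


def rate_path(current: int, target: int, stepwise: bool) -> list:
    """List the modes visited until the target (lowest or highest mode) is reached."""
    path = []
    while current != target:
        current = next_mode(current, target, stepwise)
        path.append(current)
    return path
```

For example, `rate_path(7, 4, stepwise=False)` reaches the target in a single change, while `rate_path(7, 4, stepwise=True)` passes through every intermediate mode, trading responsiveness for a smoother change in voice quality.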
  • the IMSA 1022 may send a notification message for implementing flow control to the Video Encoder 1014 in the AP 101, so that the Video Encoder 1014 reduces the encoding rate of the video data according to the notification message; after receiving from the PDCP 1026 an indication to release flow control of the audio and video data, the IMSA 1022 may send a notification message for releasing flow control to the Video Encoder 1014, so that the Video Encoder 1014 increases the encoding rate of the video data according to the notification message (step 502b as shown in FIG. 5).
  • the Video Encoder 1014 can reduce the output video rate by reducing the video frame rate and increasing the compression ratio, thereby achieving flow control, reducing the active discarding of video packets caused by buffer accumulation on the uplink transmission path of the terminal, and improving the quality of video calls.
  • FIG. 6 is a schematic diagram showing a scheme for determining an encoding rate of audio and video data, provided by some embodiments of the present invention, applied in an audio and video path.
  • the first uplink information detected by the terminal 200 is a physical layer uplink transmission error rate and a PDCP uplink transmission buffer.
  • the MAC 1028 can detect the current uplink transmission error rate of the physical layer (PHY 1029) and report it to the IMSA 1022 (step 401 shown in FIG. 6); the PDCP 1026 can detect the current PDCP uplink transmit buffer and, based on the result of comparing it with the preset threshold, report to the IMSA 1022 an indication of whether flow control is required (step 501 shown in FIG. 6); the IMSA 1022 can then determine the speed adjustment policy based on the current physical layer uplink transmission error rate reported by the MAC 1028 and the flow control indication reported by the PDCP 1026 (step 601a for voice data encoding and step 601b for video data encoding, as shown in FIG. 6).
  • when the PDCP 1026 indicates that flow control is required, the IMSA 1022 may, regardless of the current physical layer uplink transmission error rate reported by the MAC 1028, directly send a notification message for implementing flow control to the encoder used for encoding the audio and video data, so that the encoder reduces the encoding rate of the audio and video data according to the notification message; specifically, the rate can be dropped directly to the lowest rate to avoid buffer overflow and discarding.
  • otherwise, the IMSA 1022 may adjust the encoding rate of the audio and video data based on the current physical layer uplink transmission error rate reported by the MAC 1028, using the error-rate-based speed adjustment described in the foregoing embodiment.
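The precedence between the two inputs (buffer-based flow control first, error-rate-based adjustment otherwise) can be sketched in a few lines of Python. The function names and the example error-rate policy are hypothetical stand-ins; only the ordering of the two checks comes from the text.

```python
from typing import Callable


def choose_rate(flow_control_needed: bool,
                error_rate: float,
                current_kbps: float,
                lowest_kbps: float,
                by_error_rate: Callable[[float, float], float]) -> float:
    """Buffer-based flow control takes precedence over the error-rate policy."""
    if flow_control_needed:
        # The error rate is ignored; drop straight to the lowest rate.
        return lowest_kbps
    return by_error_rate(error_rate, current_kbps)


def example_by_error_rate(error_rate: float, current_kbps: float) -> float:
    """Hypothetical stand-in policy: halve the rate above a 30% error rate."""
    return current_kbps / 2 if error_rate > 0.30 else current_kbps
```

Passing the error-rate policy as a parameter keeps the precedence rule separate from whichever threshold scheme an embodiment uses.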
  • when the voice encoder lowers or raises the coding rate according to the instruction of the processor, it may go directly to the lowest or highest coding rate negotiated at call setup, or it may approach the lowest or highest coding rate step by step; the former makes flow control more responsive, and the latter yields smoother speech quality.
  • the terminal further instructs the voice encoder to adjust the coding rate of the voice data according to the detected physical layer uplink transmission error rate; specifically, the coding mode used to encode the voice data may be adjusted according to the detected physical layer uplink transmission error rate as described in the foregoing embodiment, which is not repeated here.
  • FIG. 7 is a schematic diagram showing a scheme for determining an encoding rate of audio and video data, provided by some embodiments of the present invention, applied in an audio and video path.
  • the audio and video data is voice data
  • the first uplink information detected by the terminal 200 includes a physical layer uplink transmission error rate, a PDCP uplink transmission buffer, and RoHC switch information.
  • the terminal may be configured to detect the RoHC switch information configured by the network side for the RRC layer (step 701 shown in FIG. 7); the MAC 1028 may detect the current uplink transmission error rate of the physical layer (PHY 1029) and report it to the IMSA 1022 (step 401 shown in FIG. 7); the PDCP 1026 can detect the current PDCP uplink transmit buffer and, based on the result of comparing it with the preset threshold, report to the IMSA 1022 an indication of whether flow control is required (step 501 shown in FIG. 7); and the IMSA 1022 can be configured to determine how to adjust the encoding rate of the voice data in the Voice Encoder 1031 based on the current physical layer uplink transmission error rate reported by the MAC 1028, the flow control indication reported by the PDCP 1026, and the RoHC switch information reported by the RRC.
  • the IMSA 1022 may instruct the Voice Encoder 1031 to adjust the coding rate of the voice data using the speed adjustment based on the physical layer uplink transmission error rate and the PDCP uplink transmit buffer described in the foregoing embodiment shown in FIG. 6; if the RoHC switch information that the IMSA 1022 receives from the RRC indicates that RoHC is off, and the IMSA 1022 determines according to the PDCP layer uplink transmit buffer that the coding rate of the voice data needs to be reduced, the coding rate of the voice data is lowered to a value no lower than a preset coding rate.
  • for the process by which the terminal determines, according to the detected physical layer uplink transmission error rate and/or the buffer on the uplink transmission path of the terminal, whether the coding rate of the voice data needs to be adjusted and how to adjust it, refer to the description of the foregoing embodiments, which is not repeated here.
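A minimal sketch of the RoHC-dependent floor, as described in the summary for the case where RoHC is switched off, might look like this. The function name and the 5.9 kbit/s default floor are assumptions chosen for illustration; the source only says that the lowered rate must not fall below a preset coding rate.

```python
def lowered_voice_rate(requested_kbps: float,
                       rohc_on: bool,
                       floor_kbps: float = 5.9) -> float:
    """Clamp a requested lower voice coding rate when RoHC is switched off.

    floor_kbps is a hypothetical preset coding rate, not a value from the source.
    """
    if rohc_on:
        return requested_kbps  # no floor applies in this sketch
    return max(requested_kbps, floor_kbps)
```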
  • an embodiment of the present invention further provides a method for determining an encoding rate of audio and video data.
  • the process may be specifically performed by a terminal, and may be implemented by software, hardware, or a combination of hardware and software.
  • the exemplary terminal shown in FIG. 2 can provide means or functional modules for performing the flow steps as shown in FIG.
  • the embodiment of the present invention further provides a terminal that can execute the method flow described in the foregoing embodiments; it can be specifically implemented as a device or a function module in the terminal that performs the process steps shown in FIG. 3.
  • FIG. 8 is a schematic structural diagram of a terminal according to some embodiments of the present invention. As shown in FIG. 8, the terminal includes: a detection module 801, an adjustment module 802;
  • the detecting module 801 is configured to detect uplink information, where the uplink information is information used to represent a transmission attribute of the uplink audio and video data;
  • the adjusting module 802 is configured to adjust an encoding rate of the audio and video data according to the first uplink information detected by the detecting module.
  • the terminal may further include a comparing module, configured to compare the detected first uplink information with a preset threshold; the adjusting module 802 is then specifically configured to adjust the encoding rate of the audio and video data according to the comparison result of the comparing module and the preset criterion corresponding to the comparison result and the current encoding rate of the audio and video data.
  • for the problem-solving implementation and the beneficial effects of the detection module 801 and the adjustment module 802 in the terminal provided by some embodiments of the present invention, refer to the method shown in FIG. 3; the implementation of the terminal can therefore refer to the implementation of step 301 and step 302 in the above method and to the implementation of the terminal shown in FIG. 2, and repeated description is omitted.
  • an embodiment of the present invention further provides a storage medium, where the storage medium is a non-volatile computer-readable storage medium that stores at least one program.
  • each of the programs includes instructions that, when executed by a terminal having a processor, cause the terminal to perform the method flow for determining an encoding rate of audio and video data according to the foregoing embodiments of the present invention; refer to the description in the foregoing embodiments, which is not repeated here.
  • the embodiment of the present invention provides a method, a terminal, and a storage medium for determining an encoding rate of audio and video data, providing a coding rate adjustment scheme for audio and video data during encoding and transmission. The scheme mainly detects uplink information that characterizes the transmission attributes of the uplink audio and video data and adjusts the coding rate based on the detected uplink information, so that the coding rate of the audio and video data follows the actual uplink transmission conditions.
  • this adaptive adjustment balances the rate between encoding the audio and video data and transmitting it in the uplink, thereby improving the physical layer transmission error rate, mitigating the active discarding of audio and video data packets caused by buffer accumulation on the uplink transmission path of the terminal, and improving the quality and delay of VoLTE calls.
  • the scheme for determining the encoding rate of the audio and video data can also be adaptively applied to the Voice over WiFi (VoWIFI) service.
  • the main difference between the VoWIFI service and VoLTE is that the VoWIFI service uses WiFi as the access network to access the IMS.
  • FIG. 9 is a schematic structural diagram showing an example of adaptively applying a scheme for determining an encoding rate of audio and video data to a VoWIFI service according to an embodiment of the present invention.
  • the structure of the exemplary system layer includes an AP 901, an Audio DSP 902, and a WIFI 903.
  • the AP 901 can mainly include a Camera 9011, a Video Encoder 9012, an RTP 9013, an IP stack 9014, and a WIFI Driver 9015.
  • the Audio DSP 902 can mainly include a Voice Encoder 9021 and an RTP 9022.
  • the transmission process of the audio and video data is similar to that of the VoLTE, and specifically includes the transmission of the voice data and the transmission of the video data.
  • the uplink voice data transmission process may include: the Audio DSP 902 transmits the voice data to the IP stack 9014 in the AP 901, which then transmits it to the WIFI 903 through the WIFI Driver 9015 (uplink data transmission 1-3 as shown in FIG. 9);
  • the uplink video data transmission process may include: the Camera 9011, Video Encoder 9012, and RTP 9013 in the AP 901 transmit the video data to the IP stack 9014, which then transmits it to the WIFI 903 through the WIFI Driver 9015 (uplink data transmission 1'-3 as shown in FIG. 9).
  • the detection point for detecting the uplink information may be configured in the WIFI Driver 9015; for how to determine, based on the uplink information detected by the WIFI Driver 9015, whether speed adjustment is needed and how to perform it, refer to the description of adjusting the coding rate in the VoLTE service in the foregoing embodiments.
  • the IP protocol stack on the AP 901 side is used, and the uplink information is detected at the WIFI Driver 9015: the uplink transmission error rate information is stored in the WIFI Driver 9015, and the uplink transmit data is also buffered in the WIFI Driver 9015.
  • the uplink transmission error rate information and the flow control message determined based on the uplink transmit buffer are sent by the WIFI Driver 9015 to the IP Stack 9014 (speed adjustment indication 901 shown in FIG. 9); the IP Stack 9014 then notifies the RTP 9022 in the Audio DSP 902 to control the encoding rate of the voice data (speed adjustment indication 902a shown in FIG. 9), or notifies the RTP 9013 corresponding to the Camera 9011 in the AP 901 to control the encoding rate of the video data (speed adjustment indication 902b shown in FIG. 9).
  • when the scheme for determining the encoding rate of the audio and video data provided by the above embodiments of the present invention is adaptively applied to the VoWIFI service, the quality and delay of VoWIFI video calls, which carry heavy data traffic, can be improved more significantly.
  • the computer program instructions can also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture comprising an instruction apparatus, which implements the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
  • these computer program instructions can also be loaded onto a computer or other programmable data processing device, such that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing, so that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Computer Security & Cryptography (AREA)
  • Multimedia (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Communication Control (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

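The present invention discloses a method, a terminal, and a storage medium for determining an encoding rate of audio and video data. The method includes: a terminal detects uplink information, where the uplink information is information used to characterize a transmission attribute of uplink audio and video data; and the terminal adjusts the encoding rate of the audio and video data according to detected first uplink information. The invention can improve the physical layer transmission error rate and mitigate the problem of audio and video data packets being actively discarded because of buffer accumulation on the terminal's uplink transmission path.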

Description

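Method, terminal, and storage medium for determining an encoding rate of audio and video data
Technical Field
The present invention relates to the field of communications technologies, and in particular, to a method, a terminal, and a storage medium for determining an encoding rate of audio and video data.
Background
Voice over LTE (VoLTE), that is, voice calls in Long Term Evolution (LTE), is a voice service based on the Internet Protocol (IP) Multimedia Subsystem (IMS). Because IMS supports multiple access types and rich multimedia services, it has become the standard core network architecture of the all-IP era. After maturing over the past several years, IMS has crossed the chasm and become the mainstream choice for Voice over Broadband (VoBB) and the Public Switched Telephone Network (PSTN) in the fixed voice domain, and it has also been defined by the 3rd Generation Partnership Project (3GPP) as the standard architecture for mobile voice.
VoLTE can provide high-quality audio and video calls and unify data and voice services on the same network. In the prior art, voice data and video data are usually sent at a fixed rate during a VoLTE call. Continuously sending data at a fixed, relatively high rate has two drawbacks. On the one hand, when uplink bit errors exist at the physical layer, it continuously produces a high uplink Real-time Transport Protocol (RTP) packet discard rate for audio and video data. On the other hand, when the uplink grant shrinks, when Hybrid Automatic Repeat reQuest (HARQ) retransmissions occur in physical layer uplink transmission, or when the Radio Resource Control (RRC) channel is re-established, uplink data accumulates at the Packet Data Convergence Protocol (PDCP) layer and the delay increases. If this accumulation persists until it exceeds the delay requirement of the QoS Class Identifier (QCI) of the voice or video bearer, the PDCP actively discards the buffered voice or video data, which degrades end-to-end voice or video call quality and harms the user experience.
Therefore, how to overcome the above defects of the prior art, reduce the discarding and accumulation of audio and video data packets during transmission, and improve VoLTE call quality and delay is a problem that the industry urgently needs to study and solve.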
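Summary
Embodiments of the present invention provide a method, a terminal, and a storage medium for determining an encoding rate of audio and video data, so as to reduce the discarding and accumulation of audio and video data packets during transmission and thereby improve audio and video communication quality.
In a first aspect, an embodiment of the present invention provides a method for determining an encoding rate of audio and video data. The method includes: a terminal detects uplink information, where the uplink information is information used to characterize a transmission attribute of uplink audio and video data; and the terminal adjusts the encoding rate of the audio and video data according to detected first uplink information. With the solution provided in this application, the encoding rate of the audio and video data can be adjusted adaptively based on the uplink transmission information of the audio and video data, so that adjusting the encoding rate improves the physical layer transmission error rate, mitigates the active discarding of audio and video data packets caused by buffer accumulation on the terminal's uplink transmission path, and improves the quality of multimedia calls.
In a possible implementation, the terminal adjusting the encoding rate of the audio and video data according to the obtained first uplink information includes: the terminal determines, according to a preset correspondence between uplink information and encoding rates of audio and video data, a first encoding rate corresponding to the first uplink information, and adjusts the encoding rate of the audio and video data to the first encoding rate. With this solution, the encoding rate can be adjusted to the rate that corresponds to the currently detected uplink information, so that the adjusted encoding rate matches the current transmission environment, which improves the physical layer transmission error rate, mitigates the active discarding of audio and video data packets caused by buffer accumulation on the terminal's uplink transmission path, and improves the quality of multimedia calls.
In a possible implementation, the terminal adjusting the encoding rate of the audio and video data according to the obtained first uplink information includes: the terminal compares the detected uplink information with a preset threshold; and the terminal adjusts the encoding rate of the audio and video data according to the comparison result and a preset criterion corresponding to the comparison result and the current encoding rate of the audio and video data. With this solution, thresholds are set, the detected uplink information is compared with a preset threshold, and an adjustment policy relative to the current encoding rate is determined from the comparison result. For example, when the detected uplink information is greater than the preset threshold, the adjustment policy corresponding to that comparison result may be to reduce the encoding rate by 80% relative to the current encoding rate; that is, the target encoding rate is determined by an adjustment policy relative to the current encoding rate. The adjustment of the encoding rate can thus better fit the current uplink transmission environment, further improving the physical layer transmission error rate and mitigating the active discarding of audio and video data packets caused by buffer accumulation on the terminal's uplink transmission path.
In a possible implementation, the audio and video data is voice data.
In a possible implementation, the audio and video data is video data; the encoding rate of the video data can be adjusted by adjusting the video frame rate and/or the compression ratio of the video data. With this solution, both voice data and video data can be adjusted adaptively based on the uplink transmission information, which improves the physical layer transmission error rate and mitigates the active discarding of audio and video data packets caused by buffer accumulation on the terminal's uplink transmission path.
In a possible implementation, the uplink information includes a physical layer uplink transmission error rate. With this solution, the encoding rate of the audio and video data can be adjusted adaptively based on the uplink transmission error rate of the physical layer in the uplink transmission environment, which improves the physical layer uplink transmission error rate and the quality of multimedia calls.
In a possible implementation, the uplink information includes the buffer on the terminal's uplink transmission path. With this solution, the encoding rate of the audio and video data can be adjusted adaptively based on the buffer on the terminal's uplink transmission path, which mitigates the active discarding of audio and video data packets caused by buffer accumulation on that path and improves the quality of multimedia calls.
In a possible implementation, the buffer on the terminal's uplink transmission path includes any one or more of the following: an audio encoder/video encoder buffer, a Real-time Transport Protocol (RTP) layer buffer, a User Datagram Protocol (UDP)/Transmission Control Protocol (TCP) layer buffer, an Internet Protocol (IP) layer buffer, a Packet Data Convergence Protocol (PDCP) layer buffer, a Radio Link Control (RLC) layer buffer, and/or a Medium Access Control (MAC) layer buffer.
In a possible implementation, the uplink information includes a physical layer uplink transmission error rate and the buffer on the terminal's uplink transmission path; the terminal adjusting the encoding rate of the audio and video data according to the uplink information includes: if the terminal determines, according to the buffer on the terminal's uplink transmission path, that the encoding rate of the audio and video data needs to be reduced, the terminal lowers the encoding rate of the audio and video data; otherwise, the terminal adjusts the encoding rate of the audio and video data according to the physical layer uplink transmission error rate. With this solution, the encoding rate can be adjusted adaptively based on both the buffer on the terminal's uplink transmission path and the uplink transmission error rate of the physical layer, which avoids problems such as the active discarding of audio and video data packets caused by buffer accumulation on the terminal's uplink transmission path and improves the quality of multimedia calls.
In a possible implementation, the audio and video data is voice data; the uplink information includes a physical layer uplink transmission error rate, the buffer on the terminal's uplink transmission path, and RoHC switch information; the terminal adjusting the encoding rate of the audio and video data according to the uplink information includes: when the terminal determines that the RoHC switch information indicates that the RoHC switch is off, if it is determined according to the physical layer uplink transmission error rate and/or the buffer on the terminal's uplink transmission path that the encoding rate of the voice data needs to be reduced, the terminal lowers the encoding rate of the voice data to a value no lower than a preset encoding rate. With this solution, the adjustment of the voice encoding rate further takes into account the effect of the RoHC switch information on voice data encoding, so that when the encoding rate is lowered it is kept no lower than the preset encoding rate, which preserves voice call quality after the adjustment.
In a possible implementation, the audio and video data is voice data; the terminal adjusting the encoding rate of the audio and video data according to the obtained first uplink information includes: the terminal determines, according to a preset correspondence between uplink information and coding modes of voice data, a first coding mode corresponding to the first uplink information, and adjusts the encoding rate of the audio and video data to the encoding rate corresponding to the first coding mode. With this solution, the encoding rate can be adjusted adaptively according to the specific conditions of the uplink transmission environment (distinguished by setting at least two thresholds), which reduces the discarding and accumulation of voice data packets and improves the quality of multimedia calls.
In a second aspect, an embodiment of the present invention provides a terminal. The terminal includes a memory and a processor; the memory is coupled to the processor; the memory is configured to store computer-executable program code, and the program code includes instructions; when the processor executes the instructions, the instructions cause the terminal to perform the method for determining an encoding rate of audio and video data according to the first aspect and each possible implementation of the first aspect. For the problem-solving implementation and the beneficial effects of the terminal, refer to the implementations and beneficial effects of the method of the first aspect and each of its possible implementations; the implementation of the terminal can therefore refer to the implementation of that method, and repeated description is omitted.
In a third aspect, an embodiment of the present invention provides a terminal. The terminal includes a detection module and an adjustment module; the detection module is configured to detect uplink information, where the uplink information is information used to characterize a transmission attribute of uplink audio and video data; the adjustment module is configured to adjust the encoding rate of the audio and video data according to the first uplink information detected by the detection module. Based on the same inventive concept, for the problem-solving principle and the beneficial effects of the terminal, refer to the implementations and beneficial effects of the method of the first aspect and each of its possible implementations; the implementation of the terminal can therefore refer to the implementation of the method, and repeated description is omitted.
In a fourth aspect, an embodiment of the present invention provides a storage medium, where the storage medium is a non-volatile computer-readable storage medium that stores at least one program; each program includes the computer software instructions involved in the first aspect and each possible implementation of the first aspect, and the instructions, when executed by a terminal having a processor, cause the terminal to perform the method for determining an encoding rate of audio and video data according to the first aspect and each possible implementation of the first aspect.
The embodiments of the present invention adaptively adjust the encoding rate of the audio and video data based on the uplink transmission information of the audio and video data, so that adjusting the encoding rate improves the physical layer transmission error rate, mitigates the active discarding of audio and video data packets caused by buffer accumulation on the terminal's uplink transmission path, and improves the quality of multimedia calls.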
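Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the accompanying drawings required for describing the embodiments are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
FIG. 1 is a schematic diagram of the system structure of an audio and video path in the prior art;
FIG. 2 is a schematic structural diagram of a terminal according to an embodiment of the present invention;
FIG. 3 is a flowchart of a method for determining an audio and video encoding rate according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of the system structure of an audio and video path according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of the system structure of another audio and video path according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of the system structure of yet another audio and video path according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of the system structure of still another audio and video path according to an embodiment of the present invention;
FIG. 8 is a schematic structural diagram of a terminal according to an embodiment of the present invention;
FIG. 9 is a schematic diagram of the system structure of an audio and video path applied to the VoWIFI service according to an embodiment of the present invention.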
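Detailed Description
To make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings. Evidently, the described embodiments are only some, not all, of the embodiments of the present invention.
In the prior art, during a VoLTE call, audio and video data packets are actively discarded because of the physical layer uplink error rate and buffer accumulation on the terminal's uplink transmission path, which can degrade voice and/or video call quality and increase delay. To overcome these problems, the embodiments of the present invention provide a method, a terminal, and a storage medium for determining an encoding rate of audio and video data, providing a coding rate adjustment scheme for audio and video data during encoding and transmission. The scheme mainly detects uplink information used to characterize the transmission attributes of uplink audio and video data and adjusts the encoding rate based on the detected uplink information, so that the encoding rate adapts to the actual uplink transmission conditions, which mitigates the discarding and accumulation of audio and video data packets during transmission and improves the quality and delay of VoLTE calls.
The embodiments of the present invention are described in detail below with reference to the accompanying drawings.
FIG. 1 is a schematic diagram of the system structure of an exemplary audio and video path to which the scheme for determining an encoding rate of audio and video data provided by some embodiments of the present invention can be applied.
It should be understood that the exemplary audio and video path shown in FIG. 1 does not limit the audio and video paths to which the scheme provided by the embodiments of the present invention can be applied; a path may include more or fewer layers than shown, combine certain layers, split certain layers, or arrange the layers differently. The functions of the exemplary audio and video path shown in FIG. 1 can be implemented in hardware, in software, or in a combination of both; the hardware may include one or more signal processing circuits and/or application-specific integrated circuits.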
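As shown in FIG. 1, the exemplary audio and video path structure may include an Application Processor (AP) 101, a Communication Processor (CP) 102, an audio digital signal processor (Audio DSP) 103, and a wireless network NET 104.
The AP 101 may mainly include an application layer (Application, APP) 1011, a Radio Interface Layer (RIL) 1012, a camera (Camera) 1013, and a video encoder (Video Encoder) 1014; the Audio DSP 103 may mainly include a voice encoder (Voice Encoder) 1031.
The CP 102 may mainly include an AT filter layer (AT Filter) 1021, an IP Multimedia Subsystem Adapter (IMSA) 1022, IMS/RTP 1023, IP 1024, Non-Access Stratum (NAS)/RRC 1025, PDCP 1026, a Radio Link Control (RLC) layer 1027, a Media Access Control (MAC) layer 1028, and a Physical Layer (PHY) 1029.
A communication connection can be established between the AP 101 and the CP 102. For example, when this exemplary system structure is applied to a mobile terminal, the AP 101 can communicate with the CP 102 through the AT (Attention) command set, with AT commands sent and received between the RIL 1012 in the AP 101 and the AT Filter 1021 in the CP 102.
The IMSA 1022 in the CP 102 mainly provides adaptation between the IMS and the wireless communication network; the IMS/RTP 1023 mainly provides end-to-end communication services for VoLTE calls; the NAS/RRC 1025, PDCP 1026, RLC 1027, MAC 1028, and PHY 1029 mainly provide the functions of the LTE radio protocol stack; and the PHY 1029 may include network hardware such as a transceiver to support communication with the wireless network NET 104.
Specifically, in the exemplary audio and video path structure shown in FIG. 1, VoLTE call setup mainly involves message exchange among the APP 1011 and RIL 1012 in the AP 101 and the AT Filter 1021, IMSA 1022, NAS/RRC 1025, IP 1024, PDCP 1026, RLC 1027, MAC 1028, and PHY 1029 in the CP 102 (call commands ① to ⑦ in FIG. 1). Because this application does not concern the VoLTE call setup procedure, that procedure is not described in detail here for brevity.
Specifically, in the exemplary audio and video path structure shown in FIG. 1, audio and video data can be transmitted once the VoLTE call has been set up successfully. The audio and video data may include voice data, video data, and the like.
Taking voice data as an example, as shown in FIG. 1, the transmission of voice data mainly involves message exchange among the Voice Encoder 1031 in the Audio DSP 103 and the IMSA 1022, IMS/RTP 1023, IP 1024, PDCP 1026, RLC 1027, MAC 1028, and PHY 1029 in the CP 102.
For example, the voice data transmission flow shown in FIG. 1 mainly includes the following steps:
Step 101: the Voice Encoder 1031 in the Audio DSP 103 captures and encodes the voice data, and the encoded voice data is transmitted to the IMS/RTP 1023 through the IMSA 1022 in the CP 102;
Step 102: the IMS/RTP 1023 and the IP 1024 perform RTP/UDP/IP packetization on the voice data, which is then transmitted to the PDCP 1026;
Step 103: the voice data is transmitted to the wireless network side NET 104 through the PDCP 1026, RLC 1027, MAC 1028, and PHY 1029.
Correspondingly, the transmission flow of video data is similar to that of voice data. In the exemplary audio and video path structure shown in FIG. 1, the two flows differ only in how the data is captured. Specifically, video data is captured and encoded mainly by the Camera 1013 and the Video Encoder 1014 in the AP 101 (step 101'), then transmitted to the IMS/RTP 1023 through the IMSA 1022 in the CP 102 (step 102), and then transmitted to the wireless network side NET 104 through the PDCP 1026, RLC 1027, MAC 1028, and PHY 1029 (step 103).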
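However, in the prior art transmission flow described above, taking voice data as an example, the Voice Encoder in the Audio DSP always captures and encodes voice data according to the fixed air-interface bandwidth negotiated during VoLTE call setup and outputs voice data at a fixed bit rate; RTP/UDP/IP only packetizes the data and forwards it to the PDCP for buffering; and the RLC always extracts the corresponding amount of voice data from the PDCP according to the uplink grant size obtained from the MAC and sends it to the network side. Video data is transmitted similarly and is not described again. Continuously sending data at a fixed, relatively high rate not only produces a persistently high uplink RTP packet discard rate for audio and video data when uplink bit errors exist at the physical layer, but also, when the uplink grant shrinks, HARQ retransmissions occur in physical layer uplink transmission, or the RRC channel is re-established, causes uplink data to accumulate at the PDCP layer and the delay to increase, so that the PDCP actively discards the buffered voice or video data, which degrades end-to-end voice or video call quality and affects the user experience.
To address the above defects of the prior art, the embodiments of the present invention provide a scheme for determining an encoding rate of audio and video data. The scheme mainly detects uplink information used to characterize the transmission attributes of uplink audio and video data and adjusts the encoding rate based on the detected uplink information, so that the encoding rate adapts to the actual transmission conditions, which improves the physical layer transmission error rate, mitigates the active discarding of audio and video data packets caused by buffer accumulation on the terminal's uplink transmission path, and improves VoLTE call voice quality and delay.
Specifically, a terminal that applies the scheme provided by some embodiments of the present invention can detect, in the uplink audio and video path, one or more kinds of uplink information that characterize the transmission attributes of uplink audio and video data, such as physical layer uplink transmission error rate information, information about the buffer on the terminal's uplink transmission path, and RoHC switch information, and adjust the encoding rate of the audio and video data according to the detected uplink information.
Based on the exemplary audio and video path structure shown in FIG. 1, a terminal that applies the scheme provided by some embodiments of the present invention can configure an uplink information feedback mechanism at any one or more layers in the audio and video data path, such as the physical layer (PHY 1029 in FIG. 1), the PDCP layer (PDCP 1026 in FIG. 1), the RRC layer (RRC 1025 in FIG. 1), the RTP layer (IMS/RTP 1023 in FIG. 1), the UDP/TCP layer (not shown), the IP layer (IP 1024 in FIG. 1), the RLC layer (RLC 1027 in FIG. 1), and the MAC layer (MAC 1028 in FIG. 1). The terminal, or the functional module in the terminal that implements the scheme, can then use the fed-back uplink information to control the encoder used for encoding the audio and video data (the Voice Encoder 1031 and/or the Video Encoder 1014 in FIG. 1) to adjust its encoding rate adaptively. This balances the rate between encoding the audio and video data and transmitting it in the uplink, which improves the physical layer transmission error rate, mitigates the active discarding of audio and video data packets caused by buffer accumulation on the terminal's uplink transmission path, and improves VoLTE call quality and delay.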
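FIG. 2 is a schematic structural diagram of an exemplary terminal to which the scheme for determining an encoding rate of audio and video data provided by some embodiments of the present invention can be applied.
As shown in FIG. 2, the terminal 200 may include components such as a transceiver 201, a processor 202, and an audio and video data processing circuit 203. It should be understood that the terminal 200 shown in FIG. 2 does not limit the terminals to which the scheme provided by the embodiments of the present invention can be applied; a terminal may include more or fewer components than shown, combine certain components, split certain components, or arrange the components differently.
In some embodiments of the present invention, the transceiver 201 can be configured to enable the terminal shown in FIG. 2 to transmit wireless signals to a wireless network, and receive signals from the wireless network, via a connection to a wireless network access point. Likewise, the transceiver 201 can be configured to support any type of Radio Access Technology (RAT) that can be used for communication between the terminal and the network over a wireless channel. For example, the transceiver 201 can be configured to support communication via any type of RAT that can be used for communication between the terminal and a wireless network access point.
In some embodiments of the present invention, the processor 202 can be configured to execute software programs and/or modules that are stored in the memory 204 or otherwise accessible to the processor 202, and to invoke data stored in the memory 204, so as to perform the various functions of the terminal 200 and process data.
In some embodiments of the present invention, the processor 202 can be configured to perform, and/or control the performance of, one or more functions of the terminal 200 according to the scheme provided by some embodiments of the present invention, such as data processing, application execution, and/or other processing and management services.
Specifically, the processor 202 can be implemented in various forms. For example, the processor 202 can be implemented as various hardware-based processing devices, such as a microprocessor, a coprocessor, a controller, or various other computing or processing devices that include integrated circuits. It should be understood that although FIG. 2 shows only a single processor, the processor 202 may include one or more processors. Multiple processors can be in operative communication with each other and can be collectively configured to perform one or more functions of the terminal.
For example, taking the audio and video path shown in FIG. 1, the processor 202 may integrate an application processor (the AP 101 shown in FIG. 1) and a communication processor (the CP 102 shown in FIG. 1). The application processor mainly handles the operating system, user interface, applications, and the like, and the communication processor (also called a modem processor) mainly handles wireless communication.
In some embodiments of the present invention, the memory 204 may include one or more memory devices. The memory 204 may include fixed and/or removable memory devices. For example, the memory 204 can provide a non-volatile computer-readable storage medium that stores computer program instructions executable by the processor 202. The memory 204 can be configured to store information, data, applications, computer software instructions, and the like that enable the terminal 200 to perform various functions according to one or more embodiments of the present invention. In some embodiments of the present invention, the memory 204 can communicate with one or more of the processor 202, the transceiver 201, and the audio and video data processing circuit 203 via one or more buses, to pass information between the components of the terminal.
In some embodiments of the present invention, the audio and video data processing circuit 203 may specifically include an encoder used for encoding. For example, the audio and video data processing circuit 203 may include an audio data processing circuit and a voice encoder for encoding voice data, and/or may include a video data processing circuit and a video encoder for encoding video data.
For example, based on the audio and video path structure shown in FIG. 1, the audio and video data processing circuit 203 may specifically include an audio data processing circuit (such as the Audio DSP 103 in the system structure shown in FIG. 1), where the Audio DSP 103 contains a voice encoder for encoding voice data (such as the Voice Encoder 1031 in the system structure shown in FIG. 1); the audio and video data processing circuit 203 may also include a camera (such as the Camera 1013 in the system structure shown in FIG. 1) and a video encoder (such as the Video Encoder 1014 in the system structure shown in FIG. 1).
Specifically, in some embodiments of the present invention, the audio and video data processing circuit 203 may further include a microphone, a speaker, and the like. The microphone can collect sound and convert it into a signal, which the voice encoder encodes into voice data; the voice data can then be passed to the transceiver to be sent to, for example, another terminal, or output to the memory 204 for further processing. The voice encoder can also be used to decode received voice data; the signal obtained by decoding can then be passed to the speaker, which converts it into sound for output.
In some embodiments of the present invention, the processor 202 can communicate with and control the transceiver 201, the audio and video data processing circuit 203, and the like. The transceiver 201 can be configured to receive or send audio and video data under the control of the processor 202.
In some embodiments of the present invention, the processor 202 can be configured to read the computer-executable program code stored in the memory 204 and execute the instructions in the code; when the processor 202 executes the instructions, the instructions can cause the terminal 200 to perform the method shown in FIG. 3: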
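Step 301: detect uplink information, where the uplink information is information used to characterize a transmission attribute of uplink audio and video data;
Step 302: adjust the encoding rate of the audio and video data according to the detected first uplink information.
Specifically, the audio and video data may include voice data and/or video data.
Because the uplink transmission of audio and video data involves the channel conditions of the physical layer, the buffer on the terminal's uplink transmission path, and so on, typical examples of the uplink information that the terminal 200 detects in step 301 to characterize the transmission attributes of uplink audio and video data include the physical layer uplink transmission error rate and the buffer on the terminal's uplink transmission path. For ease of description, the embodiments of the present invention use the PDCP layer uplink transmit buffer as an example of the buffer on the terminal's uplink transmission path, and illustrate adjusting the encoding rate of the audio and video data according to the buffer on the terminal's uplink transmission path; for the implementation of adjusting the encoding rate according to the uplink transmit buffer of the audio encoder, video encoder, RTP layer, UDP layer, IP layer, PDCP layer, RLC layer, and/or MAC layer, refer to the implementation of adjusting the encoding rate according to the buffer on the terminal's uplink transmission path; repeated description is omitted.
Considering the effect of physical layer channel conditions on the uplink transmission of audio and video data, in some embodiments of the present invention the processor 202 in the terminal 200 can be configured to detect the physical layer uplink transmission error rate of the VoLTE call uplink (UL).
Specifically, when performing step 301, the terminal 200 can detect the physical layer uplink transmission error rate based at least in part on the physical layer of the terminal 200, or on the medium access control layer together with the physical layer. For example, the processor 202 in the terminal 200 can be configured to determine the physical layer uplink transmission error rate from the number of acknowledgements (ACK) and/or negative acknowledgements (NACK) received from the network, or can be configured to detect the uplink transmission error rate of the physical layer through the medium access control layer.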
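As a rough illustration of deriving an error rate from feedback counts, the sketch below treats the error rate as the fraction of NACKs among all ACK/NACK feedback in a measurement window. That formula and the function name are assumptions; the text only says that the numbers of ACKs and/or NACKs are used.

```python
def uplink_error_rate(acks: int, nacks: int) -> float:
    """Estimate the uplink error rate as NACK / (ACK + NACK) over one window.

    The ratio is an assumed definition for illustration, not taken from the source.
    """
    total = acks + nacks
    if total == 0:
        return 0.0  # no feedback received in this window
    return nacks / total
```

With 70 ACKs and 30 NACKs in a window, this estimate is 0.3, i.e., 30%.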
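Considering the effect of the buffer on the terminal's uplink transmission path on the uplink transmission of audio and video data, in some embodiments of the present invention the processor 202 in the terminal 200 can be configured to detect the buffer of the VoLTE call uplink. Specifically, taking the PDCP layer uplink transmit buffer as an example, when performing step 301 the terminal 200 can detect the PDCP layer buffer based at least in part on the PDCP layer of the terminal 200.
Further, in some embodiments of the present invention, the processor 202 in the terminal 200 can also be configured to detect both the physical layer uplink transmission error rate of the VoLTE call uplink and the buffer on the terminal's uplink transmission path.
In addition, for the encoding and transmission of voice data, because the encoding of voice data involves the RoHC switch, and whether the RoHC switch is on affects the size of voice data packets, the effect of the RoHC switch state on the uplink transmission of voice data can also be taken into account. In some embodiments of the present invention, the processor 202 in the terminal 200 can be configured to detect the RoHC switch information for voice data encoding. Specifically, because the RoHC switch information is usually configured by the network side for the RRC layer, when performing step 301 the terminal 200 can detect the RoHC switch information based at least in part on the RRC layer of the terminal 200.
It should be understood that although this application lists only the physical layer uplink transmission error rate and the buffer on the terminal's uplink transmission path, and additionally the RoHC switch information for voice data, as examples of the uplink information used to characterize the transmission attributes of uplink audio and video data, these examples do not limit the uplink information detected in this application; the uplink information that the terminal can detect may include more or less information than the examples listed above.
In some embodiments of the present invention, after detecting the uplink information in step 301, the terminal 200 can further perform step 302 and adjust the encoding rate of the audio and video data based on the detected first uplink information.
It should be understood that although the terms "first", "second", and so on may be used in this application to describe various elements (such as thresholds), these terms are only used to distinguish one element from another, and the elements are not limited by these terms (for example, the specific value of a threshold is not limited by these terms).
In some embodiments of the present invention, based on the detection of uplink information in step 301, the first uplink information used by the terminal 200 in step 302 may specifically include one or more of the physical layer uplink transmission error rate, the buffer on the terminal's uplink transmission path, the RoHC switch information, and other uplink information.
For example, the first uplink information may be the physical layer uplink transmission error rate; or the buffer on the terminal's uplink transmission path; or the physical layer uplink transmission error rate and the buffer on the terminal's uplink transmission path; or the physical layer uplink transmission error rate, the buffer on the terminal's uplink transmission path, and the RoHC switch information; and so on.
In some embodiments of the present invention, the terminal 200 may perform step 302 as follows: first, according to a preset correspondence between uplink information and encoding rates of audio and video data, determine the first encoding rate corresponding to the detected first uplink information, and then adjust the encoding rate of the audio and video data to the determined first encoding rate; the preset correspondence between uplink information and encoding rates of audio and video data can be stored in the memory in advance.
Specifically, the processor 202 of the terminal 200 can be configured to read, according to the detected first uplink information, the entry corresponding to the detected first uplink information from the preset correspondence between uplink information and encoding rates stored in the memory 204, determine the first encoding rate corresponding to the first uplink information, and instruct the encoder in the terminal 200 to adjust the encoding rate of the audio and video data to the determined first encoding rate; the encoder used for encoding voice data in the terminal 200 can be configured to adjust the encoding rate of the voice data to the first encoding rate as instructed by the processor 202, and/or the encoder used for encoding video data can be configured to adjust the encoding rate of the video data to the first encoding rate as instructed by the processor 202. The preset correspondence between uplink information and encoding rates can be stored in the memory 204 as a table or in another form; the present invention does not limit the specific storage form.
In some embodiments of the present invention, the preset correspondence between uplink information and encoding rates of audio and video data may be a one-to-one correspondence between a value of the uplink information and a value of the encoding rate; or a correspondence between a value range of the uplink information and a value of the encoding rate; or a correspondence between a value or value range of the uplink information and a policy for determining the encoding rate, in which case the terminal can determine a value of the encoding rate based on that policy; and so on.
Specifically, in the preset correspondence between uplink information and encoding rates of audio and video data, the better the uplink transmission environment indicated by the value of the uplink information, the higher the corresponding encoding rate; the worse the uplink transmission environment indicated by the value of the uplink information, the lower the corresponding encoding rate. The terminal can thus adjust the encoding rate to a higher rate suited to the uplink transmission environment when that environment is good, and to a lower rate suited to the environment when it is poor.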
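A range-to-rate correspondence of this kind can be stored as a small table and looked up with a binary search, as in the sketch below. The range bounds, the rates, and the function name `rate_for` are hypothetical values chosen only to show the monotonic relationship (a worse environment maps to a lower rate).

```python
import bisect

# Hypothetical table: upper bounds of error-rate ranges, and one rate per range.
BOUNDS = [0.05, 0.15, 0.30]            # error-rate upper bounds (inclusive)
RATES_KBPS = [12.2, 7.95, 5.9, 4.75]   # one more entry than BOUNDS


def rate_for(error_rate: float) -> float:
    """Look up the coding rate for the range that contains error_rate."""
    return RATES_KBPS[bisect.bisect_left(BOUNDS, error_rate)]
```

Here an error rate of 0.10 falls in the second range and maps to 7.95 kbit/s, while anything above 0.30 maps to the lowest rate in the table.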
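Specifically, the preset correspondence between uplink information and encoding rates may differ for different kinds of uplink information; for voice data and video data, the preset correspondence between uplink information and voice encoding rates and the correspondence between uplink information and video encoding rates may be the same or different.
In practice, a voice encoder usually encodes voice data according to the voice coding mode it is configured with, and different coding modes correspond to different encoding rates. Common voice coding modes include the Adaptive Multi-Rate (AMR) coding modes, which include the AMR-WB and AMR-NB coding modes, and the Enhanced Voice Services (EVS) coding modes. Taking AMR-NB as an example, each value of the AMR-NB coding mode usually corresponds to a fixed encoding rate; for example, mode 2 corresponds to an encoding rate of 5.9 kbit/s and mode 4 corresponds to an encoding rate of 7.40 kbit/s. Therefore, in some embodiments of the present invention, the coding mode corresponding to the currently determined uplink information can be determined based on a correspondence between coding modes and uplink information, and the audio data is encoded according to the determined coding mode.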
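For reference, the eight AMR-NB speech modes and their bit rates can be written as a small table; modes 2 and 4 match the 5.9 and 7.40 kbit/s values quoted above. The helper `highest_mode_within`, which picks the highest mode that fits a rate budget, is a hypothetical illustration and not something the source describes.

```python
# AMR-NB speech modes 0..7 and their bit rates in kbit/s.
AMR_NB_KBPS = (4.75, 5.15, 5.90, 6.70, 7.40, 7.95, 10.2, 12.2)


def highest_mode_within(budget_kbps: float) -> int:
    """Return the highest AMR-NB mode whose rate fits the budget (mode 0 if none)."""
    fitting = [mode for mode, rate in enumerate(AMR_NB_KBPS) if rate <= budget_kbps]
    return fitting[-1] if fitting else 0
```

A budget of 6.0 kbit/s, for instance, selects mode 2 (5.90 kbit/s).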
又考虑到在实际应用中,视频编码器对视频数据的编码通常是通过调整视频帧率和/或压缩率来调整编码速率的,在本发明的一些实施例中,可以基于预设的视频帧率和/或压缩率与上行信息的对应关系,确定与所确定的上行信息所对应的视频帧率和/或压缩率,根据所确定的视频帧率和/或压缩率对视频数据进行编码。
可以看到,终端通过根据预设的上行信息与音视频数据的编码速率的对应关系,将音视频数据的编码速率调整为与检测到的第一上行信息对应的第一编码速率,从而使得音视频数据的编码速率能够基于当前的上行传输环境得到适应性地调整,提高音视频数据传输的质量和效率。
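The range-based correspondence described above can be sketched as a small lookup table. This is only an illustration under stated assumptions: the bit error rate boundaries and the mode assigned to each range are hypothetical values chosen for the example, and `mode_for_ber` is an illustrative name. The AMR-NB mode-to-rate table is the standard AMR-NB mode set, which includes the two values quoted in the text (mode 2 at 5.9 kbit/s, mode 4 at 7.40 kbit/s).

```python
from bisect import bisect_right

# Standard AMR-NB mode set: mode index -> bit rate in kbit/s.
AMR_NB_RATE_KBPS = {0: 4.75, 1: 5.15, 2: 5.90, 3: 6.70,
                    4: 7.40, 5: 7.95, 6: 10.2, 7: 12.2}

# Hypothetical preset correspondence (assumed values, not from the source):
# upper bounds of consecutive bit-error-rate ranges, and the mode for each range.
# A better uplink (lower error rate) maps to a higher mode, i.e. a higher rate.
BER_UPPER_BOUNDS = [0.05, 0.15, 0.30, 0.40]
MODE_PER_RANGE = [7, 6, 4, 2, 0]  # one more entry than there are bounds


def mode_for_ber(ber: float) -> int:
    """Return the AMR-NB mode whose error-rate range contains `ber`."""
    return MODE_PER_RANGE[bisect_right(BER_UPPER_BOUNDS, ber)]


if __name__ == "__main__":
    for ber in (0.02, 0.20, 0.35, 0.90):
        mode = mode_for_ber(ber)
        print(f"BER={ber:.2f} -> mode {mode} ({AMR_NB_RATE_KBPS[mode]} kbit/s)")
```

Storing range boundaries rather than individual values keeps the table small, which matches the "value range to one encoding rate" form of correspondence mentioned in the text.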
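In some embodiments of the present invention, the terminal 200 may also perform step 302 as follows: it first compares the detected first uplink information with a preset threshold, and then adjusts the encoding rate of the audio/video data according to the comparison result and the preset criterion that corresponds to the comparison result and to the current encoding rate of the audio/video data.
Specifically, the processor 202 of the terminal 200 may be configured to read the preset threshold stored in the memory 204 according to the detected first uplink information, compare the detected first uplink information with the preset threshold, read from the memory 204 the preset criterion that corresponds to the comparison result and to the current encoding rate, and instruct the encoder in the terminal 200 to adjust the encoding rate of the audio/video data based on that criterion. The encoder used to encode voice data in the terminal 200 may be configured to adjust the encoding rate of the voice data as instructed by the processor 202, and/or the encoder used to encode video data may be configured to adjust the encoding rate of the video data as instructed by the processor 202.
In some embodiments of the present invention, the preset threshold used for the comparison may be set so that it reflects how good or bad the uplink transmission environment is, and the preset criterion for adjusting the encoding rate may be a policy for adjusting the current encoding rate according to the quality of the uplink transmission environment indicated by the comparison result.
As a concrete example, take the detected first uplink information to be the physical-layer uplink transmission bit error rate. A physical-layer uplink bit error rate below 30% is a common requirement for uplink data transmission in practice, so the terminal 200 may set the preset threshold to 30%. It may set the preset criterion for the comparison result "bit error rate greater than 30%" to lowering the encoding rate by 80% relative to the current encoding rate, and the preset criterion for the comparison result "bit error rate not greater than 30%" to raising the encoding rate by 20% relative to the current encoding rate. After detecting the physical-layer uplink transmission bit error rate, the processor 202 in the terminal 200 can then adjust the current encoding rate according to the preset criterion that corresponds to the result of the comparison with 30% and to the current encoding rate, so as to improve the physical-layer transmission bit error rate.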
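The single-threshold example above can be sketched as follows. The 30% threshold and the 80% decrease and 20% increase come from the text; the function name and the use of kbit/s as the unit are assumptions made for illustration, and a real encoder would also clamp the result to the rates it supports.

```python
def adjust_rate(current_kbps: float, ber: float, threshold: float = 0.30) -> float:
    """Apply the single-threshold criterion to the current encoding rate.

    A bit error rate above the threshold lowers the rate by 80% (keeps 20%);
    otherwise the rate is raised by 20%.
    """
    if ber > threshold:
        return current_kbps * 0.20
    return current_kbps * 1.20


if __name__ == "__main__":
    print(adjust_rate(10.0, 0.35))  # poor uplink: rate is cut
    print(adjust_rate(10.0, 0.10))  # good uplink: rate is raised
```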
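Specifically, for the uplink transmission environments indicated by different comparison results, the preset criteria may differ both in how they adjust the current encoding rate and in the adjustment ratio relative to the current encoding rate.
For example, if the comparison result indicates that the current transmission environment is good (for example, the value of the uplink information is not higher than the preset threshold), the corresponding preset criterion may be to increase the current encoding rate of the audio/video data or to keep it unchanged. If the comparison result indicates that the current transmission environment is poor (for example, the value of the uplink information is higher than the preset threshold), the corresponding preset criterion may be to reduce the current encoding rate by a certain amount. If the comparison result indicates that the current transmission environment is very poor (for example, the value of the uplink information is far higher than the preset threshold), the corresponding preset criterion may be to reduce the current encoding rate by a preset adjustment ratio.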
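Further, in some specific embodiments of the present invention, multiple preset thresholds may be used for the comparison, so that the quality of the uplink transmission environment is graded in tiers. By comparing the detected first uplink information with the multiple preset thresholds, the processor 202 can determine how good or bad the current uplink transmission environment is, obtain the preset criterion that corresponds to that comparison result and to the current encoding rate, and thus adjust the encoding rate appropriately for different uplink transmission environments.
For example, take the first uplink information detected by the terminal 200 to be the physical-layer uplink transmission bit error rate. In some embodiments of the present invention, suppose two thresholds are preset (a first threshold and a second threshold, the first smaller than the second), with a first criterion for uplink information values below the first threshold, a second criterion for values above the first threshold and below the second threshold, and a third criterion for values above the second threshold. The first criterion increases the encoding rate by a set ratio relative to the current encoding rate or keeps it unchanged; the second criterion lowers the encoding rate by another set ratio; the third criterion lowers the encoding rate by a ratio larger than that of the second criterion. The processor 202 in the terminal 200 first compares the detected bit error rate with the first threshold and, when the bit error rate is greater than the first threshold, further compares it with the second threshold. According to the result, when the bit error rate is greater than the first threshold and less than the second threshold, the current encoding rate is reduced according to the preset second criterion; when the bit error rate is greater than the second threshold, the current encoding rate is reduced according to the preset third criterion. The third criterion reduces the current encoding rate more sharply and therefore better matches the actual state of the uplink transmission environment.
It should be understood that the embodiment with two thresholds is only one example of grading the uplink environment with multiple thresholds. In practice, more thresholds may be set as appropriate, and this application does not limit this.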
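The two-threshold scheme can be sketched as a three-way selection. The text fixes only the ordering of the criteria (the third reduces more than the second), so the scale factors below are hypothetical, as are the default thresholds and function names. The text also leaves the case of a value exactly equal to a threshold open; this sketch assigns it to the harsher tier.

```python
# Hypothetical scale factors applied to the current rate (assumed values):
# first criterion raises the rate, second lowers it, third lowers it further.
SCALE = {"first": 1.10, "second": 0.80, "third": 0.60}


def pick_criterion(ber: float, t1: float, t2: float) -> str:
    """Select the criterion for a bit error rate, given thresholds t1 < t2."""
    if ber < t1:
        return "first"
    if ber < t2:
        return "second"
    return "third"


def apply_criterion(rate_kbps: float, ber: float,
                    t1: float = 0.30, t2: float = 0.40) -> float:
    """Scale the current encoding rate by the selected criterion's factor."""
    return rate_kbps * SCALE[pick_criterion(ber, t1, t2)]


if __name__ == "__main__":
    for ber in (0.10, 0.35, 0.50):
        print(ber, pick_criterion(ber, 0.30, 0.40), apply_criterion(100.0, ber))
```

Adding more tiers only requires more thresholds and more entries in the factor table, which is the extension the text allows for.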
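Specifically, the preset criteria corresponding to the comparison result and the current encoding rate may include criteria that increase the current encoding rate and criteria that decrease it. For practical reasons, in some embodiments of the present invention, the step by which an increasing criterion raises the encoding rate may be smaller than the step by which any decreasing criterion lowers it.
Further, in some embodiments of the present invention, for voice data the preset criterion may instead be a criterion for adjusting the encoding mode according to the comparison result and the current encoding mode of the voice data, so that the voice encoding rate is adjusted by adjusting the encoding mode. For video data, the preset criterion may instead be a criterion for adjusting the video frame rate and/or compression rate according to the comparison result and the current video frame rate and/or compression rate, so that the video encoding rate is adjusted by adjusting the video frame rate and/or compression rate.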
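Specifically, in some embodiments of the present invention, for voice data the processor 202 may compare the detected physical-layer uplink transmission bit error rate with a preset threshold. When the bit error rate is greater than the threshold, the processor determines from the preset criterion that the current voice encoding mode needs to be lowered by a certain proportion and instructs the voice encoder to lower the encoding mode accordingly. When the bit error rate is not greater than the threshold, the processor determines from the preset criterion that the current voice encoding mode needs to be raised by a certain proportion or kept unchanged, and instructs the voice encoder to raise the encoding mode or keep it unchanged. The voice encoder then adjusts the voice encoding mode as instructed by the processor 202.
Specifically, in some embodiments of the present invention, for video data the processor 202 may likewise compare the detected physical-layer uplink transmission bit error rate with a preset threshold. When the bit error rate is greater than the threshold, the processor determines from the preset criterion that the current video frame rate needs to be reduced by a certain proportion and/or the current compression rate needs to be increased by a certain proportion, and instructs the video encoder to lower the video frame rate and/or raise the compression rate. When the bit error rate is not greater than the threshold, the processor determines from the preset criterion to instruct the video encoder to increase the current video frame rate by a certain proportion and/or reduce the compression rate by a certain proportion, or to keep the video frame rate and/or compression rate unchanged. The video encoder then adjusts the video frame rate and/or compression rate as instructed by the processor 202.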
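Further, in some embodiments of the present invention, the first uplink information detected by the terminal may include both the physical-layer uplink transmission bit error rate and the buffer on the terminal's uplink transmission path. In that case the terminal may adjust the encoding rate of the audio/video data according to the detected first uplink information in, but not limited to, the following way:
If the terminal determines from the buffer on its uplink transmission path that the encoding rate of the audio/video data needs to be reduced, the terminal lowers the encoding rate; otherwise, the terminal adjusts the encoding rate according to the physical-layer uplink transmission bit error rate.
As can be seen, in these embodiments the adjustment based on both the physical-layer uplink transmission bit error rate and the buffer on the terminal's uplink transmission path combines and refines the adjustment based on the bit error rate and the adjustment based on the buffer that were described in the preceding embodiments.
In some embodiments of the present invention, to prevent audio/video data packets from being actively discarded because of buffer buildup on the terminal's uplink transmission path, the processor 202 may, when it determines from the detected buffer that the encoding rate needs to be reduced, instruct the encoder to lower the encoding rate directly to the lowest rate. The lowest rate may be preset, or it may be the lowest encoding rate negotiated as supported when the call was set up.
Specifically, for how the terminal determines from the buffer on its uplink transmission path whether the encoding rate needs to be adjusted, and how it adjusts the encoding rate according to the detected physical-layer uplink transmission bit error rate, refer to the descriptions in the preceding embodiments; they are not repeated here.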
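In some embodiments of the present invention, for uplink transmission of voice data, RoHC compresses the IP+UDP+RTP headers, so whether the RoHC switch is on affects the size of voice data packets. For voice data, the effect of the RoHC switch state on transmission can therefore also be taken into account.
For example, in an IPv6 scenario the IP+UDP+RTP headers total 60 bytes, while the payload carrying voice data is itself at most 60 bytes. If the RoHC switch is on, RoHC compresses the IP+UDP+RTP headers to 3 to 5 bytes on average, so the share of the AMR payload (the payload carrying voice data, that is, the effective data payload) in the bits of the whole PDCP voice data packet rises sharply. If the RoHC switch is off, the headers are not compressed, and they account for no less than 50% of the whole PDCP voice data packet. Consequently, when the voice encoding rate needs to be reduced and the RoHC switch is on, the effective payload share is large, so even at a reduced encoding rate a certain amount of effective payload is still encoded and transmitted within a given time, and voice quality can be maintained. When the RoHC switch is off, the effective payload share is small, so an unrestricted reduction of the encoding rate may leave too little effective data among the data encoded and transmitted within a given time. Voice quality then cannot be guaranteed, and the rate adjustment fails to deliver the expected gain.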
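The header-overhead argument above is simple arithmetic and can be checked with a short sketch. The 60-byte header, 60-byte maximum payload, and 3 to 5 byte compressed header come from the text. The 15-byte payload is an assumed approximation of one 20 ms AMR-NB frame at 5.9 kbit/s (118 bits, ignoring the AMR payload header), used only to show how the share shifts at low encoding rates.

```python
def header_share(header_bytes: int, payload_bytes: int) -> float:
    """Fraction of a packet occupied by the IP+UDP+RTP headers."""
    return header_bytes / (header_bytes + payload_bytes)


if __name__ == "__main__":
    # RoHC off, largest payload: headers already take half of the packet.
    print(header_share(60, 60))
    # RoHC on (about 4 bytes of compressed header), same payload.
    print(header_share(4, 60))
    # RoHC off, low-rate frame (assumed ~15 bytes): headers dominate the packet.
    print(header_share(60, 15))
```

With RoHC off, lowering the encoding rate shrinks only the payload, so the header share grows from 50% toward 80% in this example, which is why the text limits how far the rate may be reduced in that case.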
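Therefore, given the effect of the RoHC switch state on voice encoding rate adjustment, in some embodiments of the present invention the uplink information detected by the terminal in step 301 may also include the RoHC switch information.
Further, in some embodiments of the present invention, where the audio/video data is voice data, the first uplink information detected by the terminal may include the physical-layer uplink transmission bit error rate, the buffer on the terminal's uplink transmission path, and the header compression (RoHC) switch information. In that case the terminal may adjust the encoding rate according to the detected first uplink information in, but not limited to, the following way:
When the terminal determines that the RoHC switch information indicates that the RoHC switch is off, and determines from the physical-layer uplink transmission bit error rate and/or the buffer on the terminal's uplink transmission path that the voice encoding rate needs to be reduced, the terminal lowers the voice encoding rate to a rate not lower than a preset encoding rate. This avoids the problem that, with the RoHC switch off, the small effective payload share leaves too little effective data among the data encoded and transmitted within a given time and degrades voice quality.
Specifically, in some embodiments of the present invention, for uplink transmission of voice data the terminal may, when performing step 302, first determine from the RoHC switch information whether the RoHC switch is off. If the RoHC switch information indicates that the switch is off, then when the terminal determines from the physical-layer uplink transmission bit error rate and/or the buffer on the terminal's uplink transmission path that the voice encoding rate needs to be reduced, it lowers the voice encoding rate to a rate not lower than the preset encoding rate.
For example, considering the number of bits that each voice encoding rate corresponds to, when the RoHC switch information indicates that the RoHC switch is off, the encoding rate of AMR encoding mode 2 may be set as the preset encoding rate; that is, the rate adjustment may lower the encoding rate at most to the rate corresponding to mode 2.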
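As can be seen, for voice data the embodiments involving the RoHC switch information build on the adjustment based on the physical-layer uplink transmission bit error rate and the buffer on the terminal's uplink transmission path described in the preceding embodiments, and refine it by additionally considering the effect of the RoHC switch on voice data transmission.
Specifically, for how the terminal determines from the detected physical-layer uplink transmission bit error rate and/or the buffer on the terminal's uplink transmission path whether and how the encoding rate needs to be adjusted, refer to the descriptions in the preceding embodiments; they are not repeated here.
To explain the solution for determining the encoding rate of audio/video data more clearly, the following describes, with reference to the drawings, an exemplary process in which the terminal 200 performs the flow shown in Figure 3, based on the exemplary audio/video path shown in Figure 1 and on the threshold-based rate adjustment embodiments above.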
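Figure 4 is a schematic diagram of the solution for determining the encoding rate of audio/video data, as provided by some embodiments of the present invention, applied in the audio/video path. As shown in Figure 4, the first uplink information detected by the terminal 200 is the physical-layer uplink transmission bit error rate.
Specifically, as shown in Figure 4, the terminal may be configured to add detection of the current physical-layer (PHY 1029) uplink transmission bit error rate at the MAC 1028, and the IMSA 1022 may be configured to receive the current bit error rate reported by the MAC 1028. For example, the MAC 1028 may detect the bit error rate and report it to the IMSA 1022 once per unit of time (such as 1 second), as in step 401 shown in Figure 4.
Further, the terminal may configure preset thresholds for the IMSA 1022, so that the IMSA 1022 can determine, from the comparison of the physical-layer uplink transmission bit error rate with the preset thresholds, the preset criterion that corresponds to the comparison result and to the current encoding rate and is used to adjust the encoding rate of the audio/video data.
For example, for uplink transmission of voice data, the terminal may configure two preset thresholds (threshold 1 and threshold 2) for the IMSA 1022. The IMSA 1022 compares the received physical-layer uplink transmission bit error rate with threshold 1 and threshold 2 and, based on the comparison results, determines the preset criterion for adjusting the voice encoding rate. As a concrete example, suppose threshold 1 is 30% and threshold 2 is 40%, the voice data is encoded in an AMR encoding mode, and the preset criteria corresponding to the comparison result and the current voice encoding rate are as follows: when the bit error rate is less than threshold 1, increase the current AMR encoding mode (mode) by 1; when the bit error rate is greater than threshold 1 and less than threshold 2, decrease the current mode by 2; when the bit error rate is greater than threshold 2, decrease the current mode by 4.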
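The mode-stepping rule in this example can be sketched as follows. The thresholds (30% and 40%) and the steps (+1, -2, -4) come from the text. The clamp to modes 0 through 7 assumes AMR-NB, and assigning a value exactly equal to a threshold to the harsher tier is also an assumption, since the text does not cover that case. The function name is illustrative.

```python
AMR_NB_MIN_MODE = 0  # lowest AMR-NB mode (assumed codec: AMR-NB)
AMR_NB_MAX_MODE = 7  # highest AMR-NB mode


def next_amr_mode(mode: int, ber: float,
                  threshold1: float = 0.30, threshold2: float = 0.40) -> int:
    """Return the AMR mode to request after one bit-error-rate report."""
    if ber < threshold1:
        step = 1      # good uplink: probe upward slowly
    elif ber < threshold2:
        step = -2     # degraded uplink: back off
    else:
        step = -4     # poor uplink: back off sharply
    return max(AMR_NB_MIN_MODE, min(AMR_NB_MAX_MODE, mode + step))


if __name__ == "__main__":
    mode = 7
    for ber in (0.35, 0.45, 0.10, 0.10):
        mode = next_amr_mode(mode, ber)
        print(f"BER={ber:.2f} -> mode {mode}")
```

The asymmetric steps reflect the earlier remark that increases may be smaller than decreases: the rate drops quickly when the uplink degrades and recovers gradually afterwards.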
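Further, after determining the AMR encoding mode for the voice data from the physical-layer uplink transmission bit error rate and the preset criterion, the IMSA 1022 may send the voice encoder (Voice Encoder 1031) a rate adjustment request to change the AMR encoding mode (mode). The request carries the determined AMR encoding mode, so that the Voice Encoder 1031 encodes the voice data at the encoding rate corresponding to that mode (step 402a shown in Figure 4).
Correspondingly, after receiving the rate adjustment request, the Voice Encoder 1031 determines the encoding rate from the AMR encoding mode carried in the request and encodes voice at that rate. The voice encoding rate is thereby adapted to the current physical-layer uplink environment, which improves the physical-layer transmission bit error rate and the quality of voice calls.
The table below shows test results for the improvement in voice quality when the voice encoding rate is adjusted based on the physical-layer uplink transmission bit error rate in some embodiments of the present invention. The mean opinion scores (Mean Opinion Score, MOS) of voice quality for the solution provided by these embodiments are all higher than the MOS for the prior-art solution, and the packet loss rates for the solution provided by these embodiments are all far lower than those for the prior-art solution.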
Figure PCTCN2016104281-appb-000001
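As another example, for uplink transmission of video data, the terminal may configure threshold 3 and threshold 4 for the IMSA 1022. The IMSA 1022 compares the received physical-layer uplink transmission bit error rate with thresholds 3 and 4 and, based on the comparison results, determines the preset criterion that corresponds to the comparison result and to the current video encoding rate. As a concrete example, suppose threshold 3 is 30% and threshold 4 is 40%, the preset criteria adjust the video encoding rate by adjusting the video frame rate and/or compression rate, and the criteria are as follows. When the bit error rate is less than threshold 3, increase the current video frame rate by 1 and reduce the current compression rate by 10%. When the bit error rate is greater than threshold 3 and less than threshold 4, reduce the current video frame rate by 2 and, once the video frame rate has been reduced to 6, increase the current compression rate by 20%. When the bit error rate is greater than threshold 4, reduce the current video frame rate by 4 and, once the video frame rate has been reduced to 6, increase the current compression rate by 40%.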
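One possible reading of the video rule above is sketched below. The thresholds, frame-rate steps, the frame-rate floor of 6, and the compression percentages come from the text. Several points are assumptions made for the sketch: the percentages are applied relative to the current compression value, the compression value is an abstract number where larger means smaller output, compression is raised only once the frame rate is already at the floor, and no upper limit on the frame rate is enforced. `VideoParams` and `adjust_video` are illustrative names.

```python
from dataclasses import dataclass

MIN_FPS = 6  # frame-rate floor named in the example


@dataclass
class VideoParams:
    fps: int
    compression: float  # abstract compression level; higher means smaller output


def adjust_video(p: VideoParams, ber: float,
                 threshold3: float = 0.30, threshold4: float = 0.40) -> VideoParams:
    """Return new encoder parameters after one bit-error-rate report."""
    if ber < threshold3:
        # Good uplink: one more frame per second, 10% less compression.
        return VideoParams(p.fps + 1, p.compression * 0.90)
    # Degraded or poor uplink: pick the frame-rate step and compression gain.
    fps_step, gain = (2, 1.20) if ber < threshold4 else (4, 1.40)
    if p.fps <= MIN_FPS:
        # Frame rate already at the floor: compress harder instead.
        return VideoParams(p.fps, p.compression * gain)
    return VideoParams(max(MIN_FPS, p.fps - fps_step), p.compression)


if __name__ == "__main__":
    params = VideoParams(fps=10, compression=1.0)
    for ber in (0.45, 0.45, 0.35, 0.10):
        params = adjust_video(params, ber)
        print(ber, params)
```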
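Further, after determining the video frame rate and/or compression rate for the video data from the physical-layer uplink transmission bit error rate, the IMSA 1022 may send the video encoder (Video Encoder 1014) a rate adjustment request to adjust the video encoding bit rate. The request carries the determined video frame rate and/or compression rate, so that the Video Encoder 1014 adjusts according to the request (step 402b shown in Figure 4).
Correspondingly, after receiving the rate adjustment request, the Video Encoder 1014 encodes video with the video frame rate and/or compression rate carried in the request. The video encoding rate is thereby adapted to the current physical-layer uplink environment, which improves the physical-layer transmission bit error rate and the quality of video calls.
It should be understood that the threshold values and the values in the preset criteria above are based on practical experience and on the fact that a physical-layer uplink bit error rate below 30% is a common requirement for uplink data transmission. In a specific application, the threshold values and the preset criteria corresponding to the comparison results may be adjusted as appropriate, and this application does not limit this.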
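Figure 5 is a schematic diagram of the solution for determining the encoding rate of audio/video data, as provided by some embodiments of the present invention, applied in the audio/video path. For ease of description, this embodiment uses the PDCP-layer uplink transmit buffer as one example of the buffer on the terminal's uplink transmission path and illustrates how the encoding rate is adjusted according to that buffer. The buffer on the terminal's uplink transmission path also includes at least the uplink transmit buffers of the audio encoder, the video encoder, the RTP layer, the UDP layer, the IP layer, the PDCP layer, the RLC layer, and/or the MAC layer. Adjusting the encoding rate according to any of these uplink transmit buffers can follow the same approach as the PDCP-layer example, and the common parts are not repeated. As shown in Figure 5, the first uplink information detected by the terminal 200 is the PDCP uplink transmit buffer.
Specifically, as shown in Figure 5, the terminal may be configured to add detection of the current PDCP uplink transmit buffer, together with threshold settings, at the PDCP 1026. The terminal can then compare the detected PDCP uplink transmit buffer with the preset threshold and, based on the comparison result, determine the preset criterion for adjusting the encoding rate (for example, the criterion may be whether flow control is needed).
Specifically, the terminal may compare the detected PDCP-layer uplink transmit buffer with one or more preset thresholds. When the detected buffer is greater than one threshold (for example, a sixth threshold), the corresponding criterion is that flow control is needed, that is, the encoding rate needs to be reduced. When the detected buffer is less than that threshold or another, smaller threshold (for example, a fifth threshold), the corresponding criterion is that flow control can be released, that is, the encoding rate can be increased.
Specifically, if the network side imposes a requirement on the PDCP-layer buffer when the VoLTE call is set up, the preset threshold may be the value T1 delivered by the network side; otherwise the default value T2 may be used. T2 may be an empirical value; for example, 500 ms may be used for both voice data and video data. For T1, the 3GPP specifications require voice data to use a dedicated bearer with QCI=1 and video data to use a dedicated bearer with QCI=2. The threshold on the amount of PDCP-layer buffered data may therefore be set to 100 ms for voice data on the QCI=1 dedicated bearer and to 150 ms for video data on the QCI=2 dedicated bearer.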
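Further, in some embodiments of the present invention, after a threshold (say K) is configured for the PDCP layer, at least two ratios (say N1 and N2) may be applied to it to obtain at least two thresholds for the buffer comparison. For example, if the PDCP layer detects that the amount of data currently in the uplink transmit buffer exceeds one proportion of the configured threshold (K×N1), flow control is considered necessary (that is, the encoding rate needs to be reduced). After flow control starts, if the PDCP layer detects that the amount of buffered data has gradually dropped below another proportion of the configured threshold (K×N2), flow control is considered releasable (that is, the encoding rate needs to be increased).
Specifically, taking flow-control frequency and buffering level into account, for voice data the ratios N1 and N2 may be chosen so that a fluctuation margin of at least one voice packet duration (usually 20 ms) is reserved. For example, N1 may be determined as (100-20)/100 = 80%, and N2 as 1-N1 = 20%. For video data, because the duration of a video frame is not fixed, N1 and N2 may follow the values used for voice data.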
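The two derived thresholds K×N1 and K×N2 form a hysteresis band, which can be sketched as a small state machine. The defaults (K = 100 ms for voice, N1 = 80%, N2 = 20%) follow the values in the text; the class and method names, and measuring the buffer in milliseconds of queued media, are assumptions made for illustration.

```python
class FlowControl:
    """Hysteresis on the PDCP uplink transmit buffer.

    Flow control starts when the buffer exceeds K*N1 and is released only
    after the buffer falls below K*N2.
    """

    def __init__(self, k_ms: float = 100.0, n1: float = 0.80, n2: float = 0.20):
        self.start_level = k_ms * n1    # enter flow control above this level
        self.release_level = k_ms * n2  # leave flow control below this level
        self.active = False

    def update(self, buffered_ms: float) -> bool:
        """Feed one buffer measurement; return True while flow control is on."""
        if not self.active and buffered_ms > self.start_level:
            self.active = True
        elif self.active and buffered_ms < self.release_level:
            self.active = False
        return self.active


if __name__ == "__main__":
    fc = FlowControl()
    for level in (50, 85, 50, 25, 15, 50):
        print(level, fc.update(level))
```

Because the start and release levels differ, a buffer hovering around a single value does not toggle flow control on every measurement, which limits how often the encoder is asked to change rate.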
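For example, the PDCP 1026 may be configured to notify the IMSA 1022 in the CP 102 to apply flow control when it detects that the uplink transmit buffer exceeds a first set threshold and, after flow control has started, to notify the IMSA 1022 to release flow control when it detects that the uplink transmit buffer has dropped below a second set threshold. The IMSA 1022 may be configured to receive the notifications from the PDCP 1026 and act on them (step 501 shown in Figure 5).
Correspondingly, for voice data, after receiving from the PDCP 1026 the indication to apply flow control to the audio/video data, the IMSA 1022 may send the Voice Encoder 1031 a notification message for applying flow control, so that the Voice Encoder 1031 reduces the voice encoding rate and uplink flow control is achieved. After receiving from the PDCP 1026 the indication to release flow control, the IMSA 1022 may send the Voice Encoder 1031 a notification message for releasing flow control, so that the Voice Encoder 1031 increases the voice encoding rate (step 502a shown in Figure 5).
Specifically, the Voice Encoder 1031 may reduce the output voice bit rate by, for example, lowering the voice encoding mode. This achieves flow control, reduces the active discarding of voice data packets caused by buffer buildup on the terminal's uplink transmission path, and improves voice call quality.
Optionally, when reducing (or increasing) the encoding rate, the Voice Encoder 1031 may change directly or step by step to the minimum (or maximum) encoding rate negotiated as supported when the VoLTE call was set up. A direct change makes flow control more responsive, while a stepwise change keeps voice quality smoother.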
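Similarly, for video data, after receiving from the PDCP 1026 the indication to apply flow control to the audio/video data, the IMSA 1022 may send the Video Encoder 1014 in the AP 101 a notification message for applying flow control, so that the Video Encoder 1014 reduces the video encoding rate. After receiving from the PDCP 1026 the indication to release flow control, the IMSA 1022 may send the Video Encoder 1014 a notification message for releasing flow control, so that the Video Encoder 1014 increases the video encoding rate (step 502b shown in Figure 5).
Specifically, the Video Encoder 1014 may reduce the output video bit rate by, for example, lowering the video frame rate and raising the compression rate. This achieves flow control, reduces the active discarding of video data packets caused by buffer buildup on the terminal's uplink transmission path, and improves video call quality.
Figure 6 is a schematic diagram of the solution for determining the encoding rate of audio/video data, as provided by some embodiments of the present invention, applied in the audio/video path. As shown in Figure 6, the first uplink information detected by the terminal 200 is the physical-layer uplink transmission bit error rate and the PDCP uplink transmit buffer.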
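Specifically, as shown in Figure 6, the MAC 1028 may detect the current physical-layer (PHY 1029) uplink transmission bit error rate and report it to the IMSA 1022 (step 401 shown in Figure 6). The PDCP 1026 may detect the current PDCP uplink transmit buffer and, based on the comparison of that buffer with the preset threshold, report to the IMSA 1022 an indication of whether flow control is needed (step 501 shown in Figure 6). The IMSA 1022 may be configured to determine the rate adjustment policy from the bit error rate reported by the MAC 1028 and the flow-control indication reported by the PDCP 1026 (step 601a for voice data encoding and step 601b for video data encoding, as shown in Figure 6).
Specifically, if the IMSA 1022 receives from the PDCP an indication that flow control is needed, the IMSA 1022 may disregard the bit error rate reported by the MAC 1028 and directly send the encoder a notification message for applying flow control, so that the encoder reduces the encoding rate of the audio/video data. The rate may be lowered directly to the lowest rate, to avoid buffer overflow and the discarding of audio/video data packets. If the IMSA 1022 receives from the PDCP an indication to release flow control, the IMSA 1022 may go on to adjust the encoding rate based on the bit error rate reported by the MAC 1028, in the way described in the preceding embodiments for rate adjustment based on the physical-layer uplink transmission bit error rate.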
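The priority between the two inputs can be sketched for the voice case as follows. The ordering (the flow-control indication overrides the bit error rate, and the rate drops straight to the lowest rate) comes from the text. The AMR-NB mode range and the +1, -2, -4 steps with 30% and 40% thresholds are reused from the earlier voice example, and the function name is illustrative.

```python
def decide_mode(flow_control: bool, ber: float, mode: int,
                threshold1: float = 0.30, threshold2: float = 0.40,
                min_mode: int = 0, max_mode: int = 7) -> int:
    """Combine the PDCP flow-control indication with the reported error rate.

    If flow control is requested, the error rate is ignored and the lowest
    mode is selected. Otherwise the mode is stepped by the error-rate rule.
    """
    if flow_control:
        return min_mode
    if ber < threshold1:
        step = 1
    elif ber < threshold2:
        step = -2
    else:
        step = -4
    return max(min_mode, min(max_mode, mode + step))


if __name__ == "__main__":
    print(decide_mode(True, 0.05, 6))   # buffer buildup wins over a clean link
    print(decide_mode(False, 0.05, 6))  # no flow control: step up
    print(decide_mode(False, 0.35, 6))  # no flow control: step down
```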
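Specifically, when lowering or raising the encoding rate as instructed by the processor, the voice encoder may change directly to the lowest or highest encoding rate negotiated as supported when the call was set up, or it may change step by step to that lowest or highest encoding rate. The former makes flow control more responsive, while the latter keeps voice quality smoother.
In some embodiments of the present invention, when the terminal further instructs the voice encoder to adjust the voice encoding rate according to the detected physical-layer uplink transmission bit error rate, it may, as described in the preceding embodiments, adjust the encoding mode used to encode the voice data according to the detected bit error rate; this is not repeated here.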
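Figure 7 is a schematic diagram of the solution for determining the encoding rate of audio/video data, as provided by some embodiments of the present invention, applied in the audio/video path. As shown in Figure 7, the audio/video data is voice data, and the first uplink information detected by the terminal 200 includes the physical-layer uplink transmission bit error rate, the PDCP uplink transmit buffer, and the RoHC switch information.
As shown in Figure 7, the terminal may be configured to detect the RoHC switch information that the network side configures to the RRC layer (step 701 shown in Figure 7). The MAC 1028 may detect the current physical-layer (PHY 1029) uplink transmission bit error rate and report it to the IMSA 1022 (step 401 shown in Figure 7). The PDCP 1026 may detect the current PDCP uplink transmit buffer and, based on the comparison of that buffer with the preset threshold, report to the IMSA 1022 an indication of whether flow control is needed (step 501 shown in Figure 7). The IMSA 1022 may be configured to determine how to adjust the rate at which the Voice Encoder 1031 encodes voice data from the bit error rate reported by the MAC 1028, the flow-control indication reported by the PDCP 1026, and the RoHC switch information reported by the RRC.
Specifically, if the RoHC switch information that the IMSA 1022 receives from the RRC indicates that RoHC is on, the IMSA 1022 may instruct the Voice Encoder 1031 to adjust the voice encoding rate in the way described for the embodiment shown in Figure 6, based on the physical-layer uplink transmission bit error rate and the PDCP uplink transmit buffer. If the RoHC switch information indicates that RoHC is off, then when the IMSA 1022 determines from the PDCP-layer uplink transmit buffer and/or from the physical-layer uplink transmission bit error rate that the voice encoding rate needs to be reduced, it may send the voice encoder a notification message so that, when reducing the voice encoding rate, the voice encoder keeps the lowered rate no lower than a preset encoding rate (for example, the encoding rate corresponding to AMR encoding mode 2). In other cases the adjustment continues as described in the preceding embodiments based on the physical-layer uplink transmission bit error rate and the PDCP uplink transmit buffer; this is not repeated here.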
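The RoHC-dependent limit can be sketched as a clamp applied to whatever mode the other rules propose. The floor at AMR mode 2 when RoHC is off comes from the text. The handling of a call that is already below the floor (the mode is simply not lowered further) is an assumption, since the text does not cover that case, and the function name is illustrative.

```python
def apply_rohc_floor(target_mode: int, current_mode: int,
                     rohc_enabled: bool, floor_mode: int = 2) -> int:
    """Limit a proposed rate reduction when header compression is off.

    Increases and unchanged modes pass through. With RoHC off, a reduction
    is clamped so the mode does not go below `floor_mode`; if the current
    mode is already below the floor, it is kept rather than lowered.
    """
    if rohc_enabled or target_mode >= current_mode:
        return target_mode
    return min(current_mode, max(target_mode, floor_mode))


if __name__ == "__main__":
    print(apply_rohc_floor(0, 4, rohc_enabled=False))  # clamped to the floor
    print(apply_rohc_floor(0, 4, rohc_enabled=True))   # full reduction allowed
    print(apply_rohc_floor(5, 4, rohc_enabled=False))  # increase passes through
```

Keeping the floor as a separate step means the error-rate and buffer rules stay unchanged, and the RoHC state only limits how far their proposed reduction is allowed to go.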
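Specifically, when the terminal determines that the RoHC switch information indicates that the RoHC switch is on, for how it determines from the detected physical-layer uplink transmission bit error rate and/or the buffer on the terminal's uplink transmission path whether and how the voice encoding rate needs to be adjusted, refer to the descriptions of the preceding embodiments; they are not repeated here.
Based on the same technical concept, an embodiment of the present invention further provides a method for determining the encoding rate of audio/video data. The flow may be performed by a terminal and may be implemented in software, hardware, or a combination of the two. For example, the exemplary terminal shown in Figure 2 may provide the apparatus or functional modules for performing the flow steps shown in Figure 3.
Specifically, for the implementation of the method flow provided by some embodiments of the present invention, refer to the description in the preceding embodiments of the terminal performing the flow shown in Figure 3; it is not repeated here.
Based on the same technical concept, an embodiment of the present invention further provides a terminal. The terminal can perform the method flow described in the preceding embodiments, may be implemented as an apparatus or functional modules in the terminal shown in Figure 2, and can be used to perform the flow steps shown in Figure 3.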
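Figure 8 is a schematic structural diagram of the terminal provided by some embodiments of the present invention. As shown in Figure 8, the terminal includes a detection module 801 and an adjustment module 802.
The detection module 801 is configured to detect uplink information, the uplink information being information that characterizes the transmission attributes of uplink audio/video data.
The adjustment module 802 is configured to adjust the encoding rate of the audio/video data according to the first uplink information detected by the detection module.
The terminal provided by some embodiments of the present invention may further include a comparison module, configured to compare the detected first uplink information with a preset threshold. The adjustment module 802 is then specifically configured to adjust the encoding rate of the audio/video data according to the comparison result of the comparison module and the preset criterion that corresponds to the comparison result and to the current encoding rate of the audio/video data.
Specifically, for the process performed by the adjustment module 802 in the terminal provided by some embodiments of the present invention, refer to the descriptions of the preceding embodiments; it is not repeated here.
Based on the same inventive concept, for the problem-solving principle and beneficial effects of the detection module 801 and the adjustment module 802 in the terminal provided by some embodiments of the present invention, refer to the method shown in Figure 3 and the implementation and beneficial effects of the terminal shown in Figure 2. The implementation of this terminal may therefore refer to the implementation of step 301 and step 302 of the method and to the implementation of the terminal shown in Figure 2, and the common parts are not repeated.
Based on the same technical concept, an embodiment of the present invention further provides a storage medium. The storage medium is a non-volatile computer-readable storage medium that stores at least one program, each program including instructions which, when executed by a terminal having a processor, cause the terminal to perform the method flow for determining the encoding rate of audio/video data described in the preceding embodiments. For details, refer to the descriptions of the preceding embodiments; they are not repeated here.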
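As can be seen from the above description, the embodiments of the present invention provide a method, a terminal, and a storage medium for determining the encoding rate of audio/video data, offering a solution for adjusting the encoding rate while audio/video data is being encoded and transmitted. The solution detects uplink information that characterizes the transmission attributes of uplink audio/video data and adjusts the encoding rate based on the detected uplink information. The encoding rate thus follows the actual conditions of uplink transmission, and the rate of encoding is balanced against the rate of uplink transmission. This improves the physical-layer transmission bit error rate, reduces the active discarding of audio/video data packets caused by buffer buildup on the terminal's uplink transmission path, and improves VoLTE call quality and latency.
It should be noted that the solution for determining the encoding rate of audio/video data provided by the above embodiments can also be adapted to the VoWIFI (voice over WIFI) service. The main difference between VoWIFI and VoLTE is that VoWIFI uses WiFi as the access network to the IMS.
For example, Figure 9 is a schematic structural diagram of an example in which the solution provided by the embodiments of the present invention is adapted to the VoWIFI service.
As shown in Figure 9, the exemplary system-level structure includes an AP 901, an Audio DSP 902, and a WIFI 903. The AP 901 mainly includes a Camera 9011, a Video Encoder 9012, an RTP 9013, an IP stack 9014, and a WIFI Driver 9015. The Audio DSP 902 mainly includes a Voice Encoder 9021 and an RTP 9022.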
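The transmission flow of audio/video data is similar to that of VoLTE and may include the transmission of voice data and the transmission of video data. Uplink voice data transmission may include the Audio DSP 902 passing voice data to the IP stack 9014 in the AP 901, which then passes it through the WIFI Driver 9015 to the WIFI 903 (uplink data transmission ①-③ shown in Figure 9). Uplink video data transmission may include the Camera 9011, the Video Encoder 9012, and the RTP 9013 in the AP 901 passing video data to the IP stack 9014, which then passes it through the WIFI Driver 9015 to the WIFI 903 (uplink data transmission ①'-③ shown in Figure 9).
Specifically, when the audio/video data transmission solution provided by the above embodiments is adapted to the VoWIFI service, the difference from the VoLTE service is that the detection point for uplink information (such as the uplink transmission bit error rate and the uplink transmit buffer) may be placed in the WIFI Driver 9015. For how to determine, from the uplink information detected by the WIFI Driver 9015, whether and how to adjust the rate, refer to the description in the preceding embodiments of adjusting the encoding rate in the VoLTE service.
Specifically, when the solution is adapted to the VoWIFI service, the IP protocol stack on the AP 901 side is used and the detected uplink information is passed to the WIFI Driver 9015; the uplink transmission bit error rate information is kept in the WIFI Driver 9015, and uplink data to be sent is also buffered in the WIFI Driver 9015. Accordingly, the uplink transmission bit error rate information and/or the flow-control message determined from the uplink transmit buffer is sent by the WIFI Driver 9015 to the IP Stack 9014 (rate adjustment indication 901 shown in Figure 9). The IP Stack 9014 then notifies the RTP 9022 in the Audio DSP 902 to control the voice encoding rate (rate adjustment indication 902a shown in Figure 9), or sends it to the RTP 9013 associated with the Camera 9011 in the AP 901 to control the video encoding rate (rate adjustment indication 902b shown in Figure 9).
Because WIFI has no concept of a dedicated bearer, adapting the solution for determining the encoding rate of audio/video data to the VoWIFI service can noticeably improve the quality and latency of VoWIFI video calls, which carry relatively heavy data traffic.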
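The present invention is described with reference to flowcharts and/or block diagrams of the method, device (system), and computer program product according to embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or another programmable data processing device to operate in a particular manner, so that the instructions stored in the computer-readable memory produce an article of manufacture that includes an instruction apparatus, the instruction apparatus implementing the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.
These computer program instructions may also be loaded onto a computer or another programmable data processing device, so that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing, and the instructions executed on the computer or other programmable device thus provide steps for implementing the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.
Although preferred embodiments of the present invention have been described, those skilled in the art can make further changes and modifications to these embodiments once they learn of the basic inventive concept. The appended claims are therefore intended to be construed as covering the preferred embodiments and all changes and modifications that fall within the scope of the present invention.
Obviously, those skilled in the art can make various changes and variations to the present invention without departing from its spirit and scope. If these modifications and variations fall within the scope of the claims of the present invention and their technical equivalents, the present invention is also intended to cover them.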

Claims (22)

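  1. A method for determining an encoding rate of audio/video data, characterized in that the method comprises:
    detecting, by a terminal, uplink information, the uplink information being information that characterizes transmission attributes of uplink audio/video data;
    adjusting, by the terminal, the encoding rate of the audio/video data according to detected first uplink information.
  2. The method according to claim 1, characterized in that the adjusting, by the terminal, the encoding rate of the audio/video data according to the detected first uplink information comprises:
    determining, by the terminal, a first encoding rate corresponding to the first uplink information according to a preset correspondence between uplink information and the encoding rate of audio/video data, and adjusting the encoding rate of the audio/video data to the first encoding rate.
  3. The method according to claim 1, characterized in that the adjusting, by the terminal, the encoding rate of the audio/video data according to the detected first uplink information comprises:
    comparing, by the terminal, the detected first uplink information with a preset threshold;
    adjusting, by the terminal, the encoding rate of the audio/video data according to a comparison result and a preset criterion corresponding to the comparison result and to the current encoding rate of the audio/video data.
  4. The method according to any one of claims 1 to 3, characterized in that the audio/video data is voice data.
  5. The method according to any one of claims 1 to 3, characterized in that the audio/video data is video data; and the encoding rate of the video data can be adjusted by adjusting a video frame rate of the video data and/or a compression rate of the video data.
  6. The method according to any one of claims 1 to 5, characterized in that the first uplink information comprises a physical-layer uplink transmission bit error rate.
  7. The method according to any one of claims 1 to 5, characterized in that the first uplink information comprises a buffer on the terminal's uplink transmission path.
  8. The method according to claim 7, characterized in that the buffer on the terminal's uplink transmission path comprises any one or more of the following: an audio encoder/video encoder buffer, a Real-time Transport Protocol (RTP) layer buffer, a User Datagram Protocol (UDP)/Transmission Control Protocol (TCP) layer buffer, an Internet Protocol (IP) layer buffer, a Packet Data Convergence Protocol (PDCP) layer buffer, a Radio Link Control (RLC) layer buffer, and/or a Medium Access Control (MAC) layer buffer.
  9. The method according to any one of claims 1 to 5, characterized in that the first uplink information comprises a physical-layer uplink transmission bit error rate and a buffer on the terminal's uplink transmission path;
    the adjusting, by the terminal, the encoding rate of the audio/video data according to the detected first uplink information comprises:
    when the terminal determines, according to the buffer on the terminal's uplink transmission path, that the encoding rate of the audio/video data needs to be reduced, lowering, by the terminal, the encoding rate of the audio/video data;
    when the terminal determines, according to the buffer on the terminal's uplink transmission path, that the encoding rate of the audio/video data is to remain unchanged or be increased, adjusting, by the terminal, the encoding rate of the audio/video data according to the physical-layer uplink transmission bit error rate.
  10. The method according to claims 1 to 3, characterized in that the audio/video data is voice data; the first uplink information comprises a physical-layer uplink transmission bit error rate, a buffer on the terminal's uplink transmission path, and header compression (RoHC) switch information; and the adjusting, by the terminal, the encoding rate of the audio/video data according to the detected first uplink information comprises:
    when the terminal determines that the RoHC switch information indicates that the RoHC switch is off, if it is determined according to the physical-layer uplink transmission bit error rate and/or the buffer on the terminal's uplink transmission path that the encoding rate of the voice data needs to be reduced, lowering, by the terminal, the encoding rate of the voice data to a rate not lower than a preset encoding rate.
  11. A terminal, characterized in that the terminal comprises a memory and a processor; the memory is coupled to the processor; the memory is configured to store computer-executable program code, the program code comprising instructions; and when the processor executes the instructions, the instructions cause the electronic device to perform the method for determining an encoding rate of audio/video data according to any one of claims 1 to 10.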
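  12. A terminal, characterized in that the terminal comprises a detection module and an adjustment module;
    the detection module is configured to detect uplink information, the uplink information being information that characterizes transmission attributes of uplink audio/video data;
    the adjustment module is configured to adjust the encoding rate of the audio/video data according to first uplink information detected by the detection module.
  13. The terminal according to claim 12, characterized in that the adjustment module is specifically configured to:
    determine a first encoding rate corresponding to the first uplink information according to a preset correspondence between uplink information and the encoding rate of audio/video data, and adjust the encoding rate of the audio/video data to the first encoding rate.
  14. The terminal according to claim 12, characterized by further comprising:
    a comparison module, configured to compare the detected first uplink information with a preset threshold;
    the adjustment module is specifically configured to:
    adjust the encoding rate of the audio/video data according to a comparison result of the comparison module and a preset criterion corresponding to the comparison result and to the current encoding rate of the audio/video data.
  15. The terminal according to any one of claims 12 to 14, characterized in that the audio/video data is voice data.
  16. The terminal according to any one of claims 12 to 14, characterized in that the audio/video data is video data; and the adjustment module is specifically configured to adjust the encoding rate of the video data by adjusting a video frame rate of the video data and/or a compression rate of the video data.
  17. The terminal according to any one of claims 12 to 16, characterized in that the first uplink information comprises a physical-layer uplink transmission bit error rate.
  18. The terminal according to any one of claims 12 to 16, characterized in that the first uplink information comprises a buffer on the terminal's uplink transmission path.
  19. The terminal according to claim 18, characterized in that the buffer on the terminal's uplink transmission path comprises any one or more of the following: an audio encoder/video encoder buffer, an RTP layer buffer, a UDP/TCP layer buffer, an IP layer buffer, a PDCP layer buffer, an RLC layer buffer, and/or a MAC layer buffer.
  20. The terminal according to any one of claims 12 to 16, characterized in that the first uplink information comprises a physical-layer uplink transmission bit error rate and a buffer on the terminal's uplink transmission path;
    the adjustment module is specifically configured to:
    lower the encoding rate of the audio/video data when it is determined, according to the buffer on the terminal's uplink transmission path, that the encoding rate of the audio/video data needs to be reduced;
    adjust the encoding rate of the audio/video data according to the physical-layer uplink transmission bit error rate when it is determined, according to the buffer on the terminal's uplink transmission path, that the encoding rate of the audio/video data is to remain unchanged or be increased.
  21. The terminal according to claims 12 to 14, characterized in that the audio/video data is voice data; the first uplink information comprises a physical-layer uplink transmission bit error rate, a buffer on the terminal's uplink transmission path, and header compression (RoHC) switch information;
    the adjustment module is specifically configured to:
    when it is determined that the RoHC switch information indicates that the RoHC switch is off, if it is determined according to the physical-layer uplink transmission bit error rate and/or the buffer on the terminal's uplink transmission path that the encoding rate of the voice data needs to be reduced, lower the encoding rate of the voice data to a rate not lower than a preset encoding rate.
  22. A storage medium, characterized in that the storage medium is a non-volatile computer-readable storage medium, the non-volatile computer-readable storage medium stores at least one program, each program comprises instructions, and the instructions, when executed by an electronic device having a processor, cause the electronic device to perform the method for determining an encoding rate of audio/video data according to any one of claims 1 to 10.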
PCT/CN2016/104281 2016-11-01 2016-11-01 一种确定音视频数据编码速率的方法、终端以及存储介质 WO2018081937A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2016/104281 WO2018081937A1 (zh) 2016-11-01 2016-11-01 一种确定音视频数据编码速率的方法、终端以及存储介质
CN201680080576.0A CN108702352B (zh) 2016-11-01 2016-11-01 一种确定音视频数据编码速率的方法、终端以及存储介质

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2016/104281 WO2018081937A1 (zh) 2016-11-01 2016-11-01 一种确定音视频数据编码速率的方法、终端以及存储介质

Publications (1)

Publication Number Publication Date
WO2018081937A1 true WO2018081937A1 (zh) 2018-05-11

Family

ID=62075387

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/104281 WO2018081937A1 (zh) 2016-11-01 2016-11-01 一种确定音视频数据编码速率的方法、终端以及存储介质

Country Status (2)

Country Link
CN (1) CN108702352B (zh)
WO (1) WO2018081937A1 (zh)

Also Published As

Publication number Publication date
CN108702352A (zh) 2018-10-23
CN108702352B (zh) 2021-09-14

Similar Documents

Publication Publication Date Title
US11082956B2 (en) Scheduling systems and methods for wireless networks
US8750207B2 (en) Adapting transmission to improve QoS in a mobile wireless device
US7194000B2 (en) Methods and systems for provision of streaming data services in an internet protocol network
US11677689B2 (en) Data processing method and apparatus
TWI415433B (zh) 雙向無線電連結控制非持久模式低延遲服務
US9706565B2 (en) Method and device for video transmission
JP5021681B2 (ja) 無線通信ネットワークにおけるアップリンクチャネルの性能最適化
TWI462623B (zh) 演進無線系統中電路切換語音應用資料率控制方法
WO2021163954A1 (zh) 数据传输方法、装置、设备、系统及介质
WO2017201677A1 (zh) 数据传输的方法及装置
US20140226476A1 (en) Methods Providing Packet Communications Including Jitter Buffer Emulation and Related Network Nodes
JP2009533967A (ja) 無線リソース制御が要求するVoIPに対するコーデック速度制御方法
US9674737B2 (en) Selective rate-adaptation in video telephony
JP2010530155A (ja) リアルタイム通信システムにおけるジッタベースのメディアレイヤアダプテーション
WO2012097737A1 (zh) 一种数据传输控制方法及设备
WO2012163305A1 (zh) 数据传输控制方法和设备
WO2011153903A1 (zh) 一种ip接口amr语音编码速率调整方法及装置
US20100172332A1 (en) Method and apparatus for controlling a vocoder mode in a packet switched voice wirelss network
WO2018081937A1 (zh) 一种确定音视频数据编码速率的方法、终端以及存储介质
WO2010082236A1 (ja) バッファ制御装置及び無線通信端末
WO2012146150A1 (zh) 数据传输中速率的调整方法和设备
WO2017045125A1 (zh) 语音自适应参数的调整方法、系统及相关设备
WO2017045127A1 (zh) 媒体自适应参数的调整方法、系统及相关设备
WO2020042167A1 (zh) 一种提高语音通话质量的方法、终端和系统
CN114285800A (zh) 一种tcp数据流的拥塞调整方法及装置

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16920461

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16920461

Country of ref document: EP

Kind code of ref document: A1