CN113573003A - Weak network-based audio and video real-time communication method, device and equipment - Google Patents

Weak network-based audio and video real-time communication method, device and equipment

Info

Publication number
CN113573003A
Authority
CN
China
Prior art keywords
packet loss
packet
rtcp
rate
request
Prior art date
Legal status
Granted
Application number
CN202110920355.2A
Other languages
Chinese (zh)
Other versions
CN113573003B (en)
Inventor
袁观福
巫有福
王居辉
Current Assignee
Ringslink Xiamen Network Communication Technologies Co ltd
Original Assignee
Ringslink Xiamen Network Communication Technologies Co ltd
Priority date
Filing date
Publication date
Application filed by Ringslink Xiamen Network Communication Technologies Co ltd filed Critical Ringslink Xiamen Network Communication Technologies Co ltd
Priority to CN202110920355.2A
Publication of CN113573003A
Application granted
Publication of CN113573003B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 Television systems
    • H04N7/14 Systems for two-way working
    • H04N7/141 Systems for two-way working between two video terminals, e.g. videophone
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47 End-user applications
    • H04N21/478 Supplemental services, e.g. displaying phone caller identification, shopping application
    • H04N21/4788 Supplemental services, e.g. displaying phone caller identification, shopping application communicating with other users, e.g. chatting
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 Reducing energy consumption in communication networks
    • Y02D30/70 Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention discloses a weak network-based audio and video real-time communication method, which comprises the following steps: receiving streaming media data and RR packets sent by a sending end; rearranging the streaming media data, assembling valid video frames, and decoding and playing the video frames; performing round trip time (RTT) and packet loss rate statistics on the RR packets, and monitoring, at a preset interval, whether a request is generated and sent to the sending end through a first RTCP, wherein the request comprises at least one of a key frame PLI request, a NACK request, a REMB packet and an SR packet; and triggering corresponding callback functions according to the different requests through a second RTCP, the callback functions dynamically correcting encoder parameters and retransmitting lost packet data. The method and the device can ensure video call quality over a weak network and improve the user's call experience on relatively low-end equipment.

Description

Weak network-based audio and video real-time communication method, device and equipment
Technical Field
The invention relates to the technical field of audio and video communication, in particular to an audio and video real-time communication method, device and equipment based on a weak network.
Background
At present, most open-source frameworks are designed for the PC and are large. On the one hand, mainstream audio/video frameworks such as FFmpeg and WebRTC are large projects overall, including library size, and place high demands on memory and CPU; with low memory and CPU performance, video quality under a weak network degrades even further, so implementing a relatively complete weak-network solution is difficult. On the other hand, they rely heavily on STL standard library functions and are unsuited to low-configuration embedded Linux terminal products; such low-configuration devices have few audio/video options under a weak network, some supporting only a simple packet loss retransmission function, with mediocre weak-network resilience. Therefore, low-configuration devices under existing schemes cannot provide a good audio and video call experience over a weak network.
Disclosure of Invention
In view of this, the present invention provides a weak network-based audio/video real-time communication method, apparatus, and device, which can ensure weak-network video call quality and improve the user's call experience when relatively low-end equipment is used.
In order to achieve the above object, the present invention provides an audio and video real-time communication method based on a weak network, the method comprising:
receiving streaming media data and RR packets sent by a sending end;
rearranging the streaming media data, assembling effective video frames, and decoding and playing the video frames;
performing round trip time (RTT) and packet loss rate statistics on the RR packets, and monitoring, at a preset interval, whether a request is generated and sent to the sending end through a first RTCP, wherein the request comprises at least one of a key frame PLI request, a NACK request, a REMB packet and an SR packet;
and triggering corresponding callback functions according to the different requests through a second RTCP, and dynamically correcting encoder parameters and retransmitting lost packet data through the callback functions.
Preferably, the step of rearranging the streaming media data, assembling valid video frames, and decoding and playing the video frames comprises:
performing packet loss statistics on the sequence number of the streaming media data to obtain a packet loss queue, and sending the packet loss queue to the first RTCP to generate a NACK request;
and judging, according to the packet loss queue, whether a preset packet loss value is reached, and/or analyzing whether a decoding error exists; if so, notifying the first RTCP to generate a key frame PLI request.
Preferably, the step of performing packet loss statistics on the sequence number of the streaming media data to obtain a packet loss queue includes:
and when an old packet is received, removing the corresponding sequence number from the packet loss queue.
Preferably, after performing the round trip time RTT and packet loss rate statistics on the RR packet, the method further includes:
and processing the data size and the receiving time of the acquired streaming media data according to the packet loss rate and the Round Trip Time (RTT) to obtain a predictive coding rate of the current bandwidth and sending the predictive coding rate to the first RTCP to generate a REMB request.
Preferably, the processing further comprises:
processing is performed by using Kalman filtering or a neural network.
Preferably, the step of triggering the corresponding callback function according to the different requests through the second RTCP, and dynamically correcting the correlation value of the encoder and the retransmission of the packet loss data through the callback function includes:
if the NACK request is received, parsing the packet loss queue, and then checking for each sequence number whether it is outdated and whether the current retransmission traffic exceeds the preset traffic; if not, further checking whether the packet exists in a buffer queue, and if so, retransmitting the corresponding packet.
Preferably, the step of triggering the corresponding callback function according to the different requests through the second RTCP, and dynamically correcting the correlation value of the encoder and the retransmission of the packet loss data through the callback function includes:
and if the key frame PLI request is received, judging whether multiple requests have arrived within a short time, and if so, requesting the encoder to generate and send a key frame.
Preferably, the step of triggering the corresponding callback function according to the different requests through the second RTCP, and dynamically correcting the correlation value of the encoder and the retransmission of the packet loss data through the callback function includes:
if the REMB request is received, parsing a rate value from the REMB request, and calculating a lower limit of the sending rate through TFRC according to the packet loss rate and the round trip time RTT;
and correcting the sending rate value according to the packet loss rate, the round trip time RTT and the rate value, calculating the code rate and resolution for the current bandwidth based on a preset rate range, and sending them to the encoder for modification.
In order to achieve the above object, the present invention further provides an audio/video real-time communication device based on a weak network, the device comprising:
a receiving unit, configured to receive streaming media data and an RR packet sent by a sending end;
the assembly unit is used for rearranging the streaming media data, assembling effective video frames and decoding and playing the video frames;
a monitoring unit, configured to perform round trip time RTT and packet loss rate statistics on the RR packets, and to monitor, at a preset interval, whether a request is generated and sent to the sending end through a first RTCP, wherein the request comprises at least one of a key frame PLI request, a NACK request, a REMB packet, and an SR packet;
and the correcting unit is used for triggering the corresponding callback function according to different requests through a second RTCP, and dynamically correcting the correlation value of the encoder and retransmission of the packet loss data through the callback function.
In order to achieve the above object, the present invention further provides a weak network-based audio/video real-time communication device, which includes a processor, a memory, and a computer program stored in the memory, where the computer program is executable by the processor to implement the weak network-based audio/video real-time communication method according to the above embodiment.
Advantageous effects:
In the above scheme, a self-developed RTCP is used to uniformly control flow control, packet loss requests, and the like. The overall design is compact, with a small flash footprint and low memory and CPU consumption. By simplifying the internal implementation and supporting only the specific weak-network functions needed, unnecessary overhead is reduced, making the whole lighter-weight and more responsive, suitable for Linux terminal devices of relatively low configuration, and ensuring improved weak-network video quality.
According to the scheme, packet loss retransmission, key frame requests, bandwidth prediction, and dynamic code rate and frame rate together guarantee video call quality under a weak network, achieving smooth video playback under high delay and packet loss at low bandwidth.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below. It is obvious that the drawings in the following description are only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic flow diagram of an audio and video real-time communication method based on a weak network according to an embodiment of the present invention.
Fig. 2 is a schematic structural diagram of an auxiliary processing system according to an embodiment of the present invention.
Fig. 3 is a schematic structural diagram of a weak network-based audio/video real-time communication apparatus according to an embodiment of the present invention.
Fig. 4 is a schematic structural diagram of a weak network-based audio/video real-time communication device according to an embodiment of the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be described clearly and completely with reference to the accompanying drawings of the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be obtained by a person skilled in the art without any inventive step based on the embodiments of the present invention, are within the scope of the present invention. Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention as claimed, but is merely representative of selected embodiments of the invention.
In the description of the present invention, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implying any number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of the present invention, "a plurality" means two or more unless specifically defined otherwise.
The present invention will be described in detail with reference to the following examples.
In the prior art, only high-configuration equipment can offer audio/video weak-network countermeasures; most low-configuration equipment cannot provide a good audio and video call experience under a weak network. Many weak-network schemes port the open-source WebRTC library or perform secondary optimization on it, so they are relatively redundant overall and unsuited to weak-network countermeasures in specific scenarios. Porting an open-source project shortens the development cycle, but introduces many redundant functions that increase system resource consumption; the system is relatively complex, data processing and response cycles are longer, and it is unsuitable for some specific video call scenarios. Therefore, a lightweight weak-network video call solution for Linux terminals is provided. The algorithms are implemented in C/C++ code, giving wide platform applicability, and can be embedded into an existing audio/video framework built in C or C++. The solution handles packet loss, jitter, and delay under a weak network well, and better guarantees audio/video call quality under a weak network.
It should be noted that the communication method, the communication apparatus, and the auxiliary processing system provided in the embodiments of the present application are applied to a terminal that is deployed with a simple audio/video framework (such as FFmpeg or webRTC).
Fig. 1 is a schematic flow diagram of an audio and video real-time communication method based on a weak network according to an embodiment of the present invention.
In this embodiment, the communication method is implemented based on an auxiliary processing system, wherein the auxiliary processing system can be seen from fig. 2. The communication method provided by the application can assist in improving the audio and video call quality under the weak network of the existing audio and video framework.
The communication method comprises the following steps:
and S11, receiving the streaming media data and the RR packets sent by the sending end.
And S12, rearranging the streaming media data, assembling effective video frames, and decoding and playing the video frames.
Wherein, the step of rearranging the streaming media data, assembling valid video frames, and decoding and playing the video frames comprises:
S12-1, performing packet loss statistics on the sequence numbers of the streaming media data to obtain a packet loss queue, and sending the packet loss queue to the first RTCP to generate a NACK request;
S12-2, judging, according to the packet loss queue, whether a preset packet loss value is reached, and/or analyzing whether a decoding error exists; if so, notifying the first RTCP to generate a key frame PLI request.
In this embodiment, streaming media data from the sending end is received, rearranged, and assembled into valid video frames. For incomplete video frames, correct reception is ensured by waiting a certain time. Packet loss statistics are computed from the streaming media sequence numbers to obtain a packet loss queue; when an old packet is received, the corresponding sequence number is removed from the packet loss queue, ensuring smooth video frames and avoiding redundant retransmission. The packet loss queue is sent to the first RTCP to generate a NACK request, where the first RTCP is the receiving-end RTCP and the second RTCP is the sending-end RTCP. It is judged whether the packet loss amount of the current packet loss queue reaches a preset packet loss capacity, whether a decoding error occurs during decoding, and whether the streaming media data stored in the jitter buffer exceeds a preset amount; if so, the first RTCP is notified to generate a key frame PLI request.
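The packet-loss bookkeeping described here can be sketched in C as follows. The data structure and names are illustrative, not the patent's actual implementation; sequence numbers are 16-bit RTP values, and a newly observed gap marks the skipped numbers as provisionally lost while a late ("old") packet removes its number again.

```c
/* Sketch of receiver-side packet-loss statistics: build a loss queue
   from sequence-number gaps, remove entries when late packets arrive. */
#include <stdint.h>
#include <string.h>

#define LOSS_QUEUE_CAP 256

typedef struct {
    uint16_t seq[LOSS_QUEUE_CAP];
    int count;
    uint16_t highest_seq;   /* highest sequence number seen so far */
    int started;
} loss_queue_t;

/* 1 if a is "newer" than b under 16-bit wraparound arithmetic. */
static int seq_newer(uint16_t a, uint16_t b) {
    return (uint16_t)(a - b) < 0x8000 && a != b;
}

static void loss_queue_remove(loss_queue_t *q, uint16_t seq) {
    for (int i = 0; i < q->count; i++) {
        if (q->seq[i] == seq) {
            memmove(&q->seq[i], &q->seq[i + 1],
                    (size_t)(q->count - i - 1) * sizeof(uint16_t));
            q->count--;
            return;
        }
    }
}

/* Called for every received RTP packet. */
void loss_queue_on_packet(loss_queue_t *q, uint16_t seq) {
    if (!q->started) { q->started = 1; q->highest_seq = seq; return; }
    if (seq_newer(seq, q->highest_seq)) {
        /* Every sequence number skipped over is provisionally lost. */
        for (uint16_t s = (uint16_t)(q->highest_seq + 1); s != seq; s++)
            if (q->count < LOSS_QUEUE_CAP) q->seq[q->count++] = s;
        q->highest_seq = seq;
    } else {
        /* An old (reordered or retransmitted) packet arrived: drop it
           from the loss queue so no redundant NACK is generated. */
        loss_queue_remove(q, seq);
    }
}
```

The resulting queue is what would be handed to the first RTCP to build the NACK request.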
S13, performing round trip time (RTT) and packet loss rate statistics on the RR packets, and monitoring, at a preset interval, whether a request is generated and sent to the sending end through the first RTCP, wherein the request comprises at least one of a key frame PLI request, a NACK request, a REMB packet, and an SR packet.
And further, Kalman filtering is performed on the data size and receiving time of the received streaming media data, together with the packet loss rate and round trip time RTT, to obtain a predicted code rate for the current bandwidth, which is sent to the first RTCP to generate a REMB request.
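For illustration, the Kalman filtering step can be reduced to a one-dimensional filter over the measured receive bitrate. This is only a sketch: the noise constants are assumptions, and the patent's actual estimator also incorporates RTT and packet loss rate.

```c
/* Illustrative 1-D Kalman filter over measured receive bitrate. */
typedef struct {
    double estimate;   /* current bitrate estimate (bits/s) */
    double var;        /* estimate variance */
    double q;          /* process noise (assumed constant) */
    double r;          /* measurement noise (assumed constant) */
} kalman1d_t;

void kalman1d_init(kalman1d_t *k, double initial, double q, double r) {
    k->estimate = initial; k->var = 1.0; k->q = q; k->r = r;
}

/* bytes received over an interval of interval_s seconds */
double kalman1d_update(kalman1d_t *k, double bytes, double interval_s) {
    double measured = bytes * 8.0 / interval_s;   /* bits per second */
    k->var += k->q;                               /* predict */
    double gain = k->var / (k->var + k->r);       /* correct */
    k->estimate += gain * (measured - k->estimate);
    k->var *= (1.0 - gain);
    return k->estimate;
}
```

Fed with per-interval byte counts and receive times, the estimate converges toward the sustainable bitrate and would be carried to the sender inside the REMB packet.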
In this embodiment, round trip time RTT and packet loss statistics are performed on the RR packets received from the sending end, while it is periodically checked whether a key frame PLI request needs to be sent, whether a NACK request needs to be sent, whether a REMB packet needs to be sent, whether an SR packet needs to be sent, and so on.
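The RTT statistic can be computed from an RR report in the standard RTP way: round trip time equals the report's arrival time minus the LSR (last SR timestamp) and DLSR (delay since last SR) fields, all in 1/65536-second units. A minimal sketch:

```c
/* RTT from an RTCP RR, per the standard RTP formula:
   RTT = arrival - LSR - DLSR, in "middle 32 bits of NTP" units. */
#include <stdint.h>

double rtt_from_rr(uint32_t arrival_ntp32, uint32_t lsr, uint32_t dlsr) {
    uint32_t rtt_units = arrival_ntp32 - lsr - dlsr;  /* wraps safely */
    return (double)rtt_units / 65536.0;               /* seconds */
}
```

Unsigned subtraction makes the computation correct even when the 32-bit timestamps wrap.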
And S14, triggering corresponding callback functions according to different requests through a second RTCP, and dynamically correcting the correlation value of the encoder and retransmission of the packet loss data through the callback functions.
The step of triggering corresponding callback functions according to the different requests through the second RTCP, and dynamically correcting encoder parameters and retransmitting lost packet data through the callback functions, comprises:
s14-1, if the NACK request is received, analyzing the packet loss queue, and then checking whether each sequence number is outdated, and whether the current retransmission traffic exceeds a preset traffic, if so, further checking whether the packet loss queue exists in a buffer queue, and if so, retransmitting the corresponding traffic packet.
S14-2, if the key frame PLI request is received, judging whether multiple requests have arrived within a short time; if so, requesting the encoder to generate and send a key frame.
S14-3, if the REMB request is received, parsing a rate value from the REMB request, and calculating a lower limit of the sending rate through TFRC according to the packet loss rate and round trip time RTT;
and correcting the sending rate value according to the packet loss rate, round trip time RTT and rate value; calculating the code rate and resolution for the current bandwidth based on a preset rate range, and sending them to the encoder for modification.
With the self-developed weak-network countermeasure scheme of this application, the overall code size is kept within 300 KB, the additional flash required when integrated into an audio/video framework is within 300 KB, and comparing memory usage before and after running shows an increase of under 3 MB, achieving low memory usage and code size. Most components are written in pure C and do not depend on the standard STL library, making them easy to port to other platforms. Low-configuration Linux terminal equipment can conduct video calls at 4CIF resolution and a 1024 kb code rate, and can play video normally under a weak network with 30% packet loss and 800 ms delay. Flow control and packet loss requests are uniformly controlled mainly through the self-developed RTCP; the overall design is compact, supporting mainstream protocol messages such as SR, RR, BYE, NACK, PLI, and REMB. In addition, the application implements mainstream weak-network countermeasure functions such as packet loss retransmission, network bandwidth prediction, video jitter buffering, dynamic code rate, and dynamic resolution, which markedly improve video quality under a weak network.
In fig. 2, the auxiliary processing system includes a receiver and a sender, wherein the receiver includes a jitter buffer component M1, a packet loss statistic component M2, a key frame request component M3, a receiver rate estimation component M4, and a first RTCP component M5; the sending end comprises a second RTCP component M6, a packet loss retransmission component M7, a key frame retransmission component M8, a sending end rate estimation component M9, a dynamic resolution and code rate component M10. Specifically, the method comprises the following steps:
Jitter buffer component M1: receives streaming media data from the sending end, rearranges the streaming media packets, and assembles valid video frames; for incomplete video frames, correct reception is ensured by waiting a certain time; the streaming media sequence numbers are passed to M2.
Packet loss statistics component M2: and counting according to the received streaming media sequence number to obtain a packet loss queue, removing the sequence number from the queue when an old packet is received, and transmitting the packet loss queue to M5.
Key frame request component M3: judges whether the current packet loss queue has reached its maximum capacity, whether a decoding error has occurred, whether the jitter buffer stores more than the maximum number of packets, and so on, and notifies M5 to send a key frame PLI request.
Receiving-end rate estimation component M4: obtains the packet loss rate and round trip time from the RTCP RR packets received by M5, and performs Kalman filtering on the data size and receiving time of received streaming media packets to obtain a predicted code rate for the current bandwidth, which is passed to M5.
RTCP receiving-end component M5: receives RR packets from the sending end's M6 to perform round trip time RTT and packet loss statistics; periodically checks whether a key frame PLI request, a NACK request, a REMB packet, an SR packet, or the like needs to be sent, and sends them uniformly to the sending end's M6.
RTCP sending-end component M6: receives RTCP packets for parsing, replies with an RR packet for each SR packet, and triggers different callback functions according to the different requests: for a NACK request, the packet loss list is parsed and passed to M7; for a key frame PLI request, it is passed to M8; for a REMB request, the parsed predicted code rate is passed to M9.
Packet loss retransmission component M7: buffers a certain amount of streaming media packets; when a packet loss queue is received, checks for each sequence number whether it is outdated and whether the current retransmission traffic exceeds the preset traffic; if not, further checks whether the packet exists in the buffer queue, and if so, retransmits the packet.
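The eligibility checks M7 performs before retransmitting can be sketched as below. The age limit and byte budget are made-up illustrative values, and the caller is still responsible for the final buffer-queue lookup.

```c
/* Sketch of the retransmission eligibility check: a lost sequence
   number is retransmitted only if it is not too old and the
   retransmission traffic budget has not been exceeded. */
#include <stdint.h>

#define RETX_MAX_AGE      512     /* in seq numbers; older requests are stale */
#define RETX_BUDGET_BYTES 65536   /* per-interval retransmission budget */

typedef struct {
    uint16_t newest_seq;          /* newest sequence number sent */
    uint32_t retx_bytes_used;     /* budget consumed this interval */
} retx_state_t;

/* Returns 1 if the packet with sequence number `seq` and size
   `packet_bytes` may be retransmitted. */
int retx_allowed(const retx_state_t *st, uint16_t seq, uint32_t packet_bytes) {
    uint16_t age = (uint16_t)(st->newest_seq - seq);
    if (age > RETX_MAX_AGE) return 0;                       /* outdated */
    if (st->retx_bytes_used + packet_bytes > RETX_BUDGET_BYTES)
        return 0;                                           /* over budget */
    return 1;   /* caller still checks the buffer queue for the packet */
}
```

Capping retransmission traffic this way keeps NACK storms from crowding out live media on an already weak link.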
Key frame retransmission component M8: after receiving a key frame request, first judges whether multiple requests have arrived within a short time; if that condition is met, requests the encoder to generate and send a key frame.
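The "multiple requests within a short time" condition in M8 amounts to rate-limiting PLI handling so a single stray request does not force an expensive key frame. A sketch with assumed window and threshold values:

```c
/* Sketch of a PLI gate: a key frame is generated only when several
   PLI requests arrive within a short window. Constants are assumptions. */
#include <stdint.h>

#define PLI_WINDOW_MS 500
#define PLI_MIN_COUNT 2

typedef struct {
    uint64_t first_ms;   /* timestamp of first PLI in current window */
    int count;
} pli_gate_t;

/* Returns 1 when a key frame should actually be generated. */
int pli_gate_on_request(pli_gate_t *g, uint64_t now_ms) {
    if (g->count == 0 || now_ms - g->first_ms > PLI_WINDOW_MS) {
        g->first_ms = now_ms;   /* start a new window */
        g->count = 1;
        return 0;
    }
    if (++g->count >= PLI_MIN_COUNT) {
        g->count = 0;           /* reset after honouring the request */
        return 1;
    }
    return 0;
}
```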
Sending-end rate estimation component M9: parses the estimated rate from the received REMB, obtains the lower limit of the sending rate using the TFRC calculation formula from the received packet loss rate and round trip time RTT, corrects the received bandwidth estimate according to the current packet loss rate, and passes the final sending-end rate, packet loss rate, and RTT to M10.
Dynamic resolution and code rate component M10: corrects the rate value according to the received rate value, packet loss rate, and RTT; using an experience table over the rate range, obtains the optimal code rate and resolution for the current bandwidth, and notifies the encoder to modify the currently sent code rate and resolution.
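An "experience table" of this kind can be a simple ordered lookup from corrected sending rate to encoder settings. The rows below are made-up illustrative values, not the patent's actual table; only the 4CIF entry echoes a figure mentioned elsewhere in the text.

```c
/* Illustrative experience table mapping a sending rate to an
   encoder resolution and target code rate. */
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t min_kbps;      /* row applies at or above this rate */
    int width, height;
    uint32_t target_kbps;
} rate_profile_t;

static const rate_profile_t k_profiles[] = {
    { 1024, 704, 576, 1024 },   /* 4CIF, as mentioned in the text */
    {  512, 352, 288,  512 },   /* CIF  */
    {  256, 176, 144,  256 },   /* QCIF */
    {    0, 176, 144,  128 },   /* floor profile */
};

const rate_profile_t *pick_profile(uint32_t rate_kbps) {
    for (size_t i = 0; i < sizeof(k_profiles) / sizeof(k_profiles[0]); i++)
        if (rate_kbps >= k_profiles[i].min_kbps)
            return &k_profiles[i];
    return &k_profiles[3];   /* unreachable: last row matches everything */
}
```

The chosen row is then pushed to the encoder; M10 could equally be driven by a learned model, as the text notes below.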
Specifically, the receiving end's audio/video stream is fed into M1 for jitter countermeasures, and the video frames from M1 are fed into the receiving end's decoder for decoding and playback; meanwhile, the sending end feeds the data output by the encoder into M7 to buffer the streaming media packets, and the key frame requests from M8 and the output of M10 are applied to the sending end's encoder to take effect. In particular, the algorithms used by M4 and M10 above may also be replaced with deep-learning neural networks to predict the results. With Kalman filtering or a neural network, bandwidth prediction and dynamic resolution are more accurate, further improving the performance of the whole session.
Fig. 3 is a schematic structural diagram of an audio/video real-time communication device based on a weak network according to an embodiment of the present invention.
In this embodiment, the apparatus 30 includes:
a receiving unit 31, configured to receive streaming media data and an RR packet sent by a sending end;
an assembling unit 32, configured to rearrange the streaming media data, assemble an effective video frame, and decode and play the video frame;
a monitoring unit 33, configured to perform round trip time RTT and packet loss rate statistics on the RR packets, and to monitor, at a preset interval, whether a request is generated and sent to the sending end through a first RTCP, wherein the request comprises at least one of a key frame PLI request, a NACK request, a REMB packet, and an SR packet;
and a correcting unit 34, configured to trigger a corresponding callback function according to different requests through a second RTCP, and dynamically correct the correlation value of the encoder and retransmission of packet loss data through the callback function.
Wherein, the assembling unit 32 further includes:
a first generating unit, configured to perform packet loss statistics on the sequence number of the streaming media data to obtain a packet loss queue, and send the packet loss queue to the first RTCP to generate a NACK request;
and a second generating unit, configured to judge, according to the packet loss queue, whether a preset packet loss value is reached, and/or analyze whether a decoding error exists; if so, notify the first RTCP to generate a key frame PLI request.
Wherein the first generating unit is further configured to:
and when an old packet is received, removing the corresponding sequence number from the packet loss queue.
Wherein, the monitoring unit 33 further includes:
and a third generating unit, configured to process the data size and the receiving time of the obtained streaming media data according to the packet loss rate and the round trip time RTT, obtain a prediction code rate of a current bandwidth, and send the prediction code rate to the first RTCP to generate a REMB request. Wherein the processing further comprises: processing is performed by using Kalman filtering or a neural network.
Wherein, the correcting unit 34 is further configured to:
if the NACK request is received, parsing the packet loss queue, and then checking for each sequence number whether it is outdated and whether the current retransmission traffic exceeds the preset traffic; if not, further checking whether the packet exists in a buffer queue, and if so, retransmitting the corresponding packet.
Wherein, the correcting unit 34 is further configured to:
and if the key frame PLI request is received, judging whether multiple requests have arrived within a short time, and if so, requesting the encoder to generate and send a key frame.
Wherein, the correcting unit 34 is further configured to:
if the REMB request is received, parsing a rate value from the REMB request, and calculating a lower limit of the sending rate through TFRC according to the packet loss rate and the round trip time RTT;
and correcting the sending rate value according to the packet loss rate, the round trip time RTT and the rate value, calculating the code rate and resolution for the current bandwidth based on a preset rate range, and sending them to the encoder for modification.
Each unit module of the apparatus 30 can respectively execute the corresponding steps in the above method embodiments, and therefore, the detailed description of each unit module is omitted here, and please refer to the description of the corresponding steps above.
The embodiment of the invention also provides weak network-based audio and video real-time communication equipment which comprises a processor, a memory and a computer program stored in the memory, wherein the computer program can be executed by the processor to realize the weak network-based audio and video real-time communication method.
As shown in fig. 4, the weak network-based audiovisual real-time communication device may include, but is not limited to, a processor, a memory. It will be understood by those skilled in the art that the schematic diagram is merely an example of the weak network-based audiovisual real-time communication device, and does not constitute a limitation of the weak network-based audiovisual real-time communication device, and may include more or less components than those shown, or combine some components, or different components, for example, the weak network-based audiovisual real-time communication device may further include an input-output device, a network access device, a bus, etc.
The Processor may be a Central Processing Unit (CPU), other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), an off-the-shelf Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic, discrete hardware components, etc. The general processor can be a microprocessor or the processor can also be any conventional processor and the like, and the control center of the audio and video real-time communication equipment based on the weak network utilizes various interfaces and lines to connect all parts of the whole audio and video real-time communication equipment based on the weak network.
The memory can be used for storing the computer program and/or the module, and the processor realizes various functions of the audio and video real-time communication device based on the weak network by operating or executing the computer program and/or the module stored in the memory and calling the data stored in the memory. In addition, the memory may include high speed random access memory, and may also include non-volatile memory, such as a hard disk, a memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), at least one magnetic disk storage device, a Flash memory device, or other volatile solid state storage device.
If the integrated unit of the audio and video real-time communication equipment based on the weak network is realized in the form of a software functional unit and is sold or used as an independent product, the integrated unit can be stored in a computer readable storage medium. Based on such understanding, all or part of the flow of the method according to the embodiments of the present invention may also be implemented by a computer program, which may be stored in a computer-readable storage medium, and when the computer program is executed by a processor, the steps of the method embodiments may be implemented. Wherein the computer program comprises computer program code, which may be in the form of source code, object code, an executable file or some intermediate form, etc.
It should be noted that the above-described device embodiments are merely illustrative, where the units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. In addition, in the drawings of the embodiment of the apparatus provided by the present invention, the connection relationship between the modules indicates that there is a communication connection between them, and may be specifically implemented as one or more communication buses or signal lines. One of ordinary skill in the art can understand and implement it without inventive effort.
The embodiments in the above embodiments can be further combined or replaced, and the embodiments are only used for describing the preferred embodiments of the present invention, and do not limit the concept and scope of the present invention, and various changes and modifications made to the technical solution of the present invention by those skilled in the art without departing from the design idea of the present invention belong to the protection scope of the present invention.

Claims (10)

1. A weak network-based audio and video real-time communication method is characterized by comprising the following steps:
receiving streaming media data and RR packets sent by a sending end;
rearranging the streaming media data, assembling effective video frames, and decoding and playing the video frames;
performing Round Trip Time (RTT) and packet loss rate statistics on the RR packet, monitoring whether a request is generated and sent to the sending end through a first RTCP based on preset time, wherein the network parameter comprises at least one of a key frame PLI, NACK, a REMB packet and an SR packet;
and triggering the corresponding callback function according to different requests through a second RTCP, and dynamically correcting the correlation value of the encoder and retransmission of packet loss data through the callback function.
2. The weak network-based audio-video real-time communication method according to claim 1, wherein the step of rearranging the streaming media data and assembling valid video frames, and the step of decoding and playing the video frames comprises:
performing packet loss statistics on the sequence number of the streaming media data to obtain a packet loss queue, and sending the packet loss queue to the first RTCP to generate a NACK request;
and judging whether a packet loss preset value is reached or not according to the packet loss queue, and/or analyzing whether a decoding error exists or not, if so, sending the packet loss preset value to the first RTCP to generate a key frame PLI request.
3. The audio-video real-time communication method based on the weak network according to claim 2, wherein the step of performing packet loss statistics on the sequence number of the streaming media data to obtain a packet loss queue comprises:
and when an old packet is received, removing the corresponding sequence number from the packet loss queue.
4. The weak network-based audio/video real-time communication method according to claim 1, wherein after performing round trip time RTT and packet loss rate statistics on the RR packet, the method further comprises:
and processing the data size and the receiving time of the acquired streaming media data according to the packet loss rate and the Round Trip Time (RTT) to obtain a predictive coding rate of the current bandwidth and sending the predictive coding rate to the first RTCP to generate a REMB request.
5. The weak network based audio-video real-time communication method according to claim 4, wherein the processing further comprises:
processing is performed by using Kalman filtering or a neural network.
6. The method according to claim 2, wherein a corresponding callback function is triggered according to different requests through a second RTCP, and the step of dynamically correcting the correlation value of the encoder and the retransmission of the packet loss data through the callback function includes:
if the NACK request is received, analyzing the packet loss queue, and then checking whether each serial number is outdated and whether the current retransmission flow exceeds the preset flow, if so, further checking whether the packet loss queue exists in a buffer queue, and if so, retransmitting the corresponding flow packet.
7. The method according to claim 2, wherein a corresponding callback function is triggered according to different requests through a second RTCP, and the step of dynamically correcting the correlation value of the encoder and the retransmission of the packet loss data through the callback function includes:
and if the key frame PLI request is received, judging whether a plurality of requests in a short time exist, and if so, requesting a decoder to generate a key frame for sending.
8. The weak network-based audio-video real-time communication method according to claim 4, wherein a corresponding callback function is triggered according to different requests through a second RTCP, and the step of dynamically correcting the correlation value of the encoder and the retransmission of the packet loss data through the callback function includes:
if the REMB request is received, analyzing a rate value according to the REMB request, and calculating through TFRC according to the packet loss rate and the round trip time RTT to obtain a lower limit of a sending rate value;
and correcting the sending rate value according to the packet loss rate, the round trip time RTT and the rate value, calculating the code rate and the resolution of the current bandwidth based on a preset rate range, and sending the code rate and the resolution to a decoder for modification.
9. An audio-video real-time communication device based on a weak network, the device comprising:
a receiving unit, configured to receive streaming media data and an RR packet sent by a sending end;
the assembly unit is used for rearranging the streaming media data, assembling effective video frames and decoding and playing the video frames;
a monitoring unit, configured to perform round trip time RTT and packet loss rate statistics on the RR packet, and monitor, based on a preset time, whether a request is generated and sent to the sending end by using a first RTCP, where the network parameter includes at least one of a key frame PLI, NACK, a REMB packet, and an SR packet;
and the correcting unit is used for triggering the corresponding callback function according to different requests through a second RTCP, and dynamically correcting the correlation value of the encoder and retransmission of the packet loss data through the callback function.
10. A weak network based audiovisual real-time communication device, characterized in that it comprises a processor, a memory and a computer program stored in said memory, said computer program being executable by said processor to implement a weak network based audiovisual real-time communication method according to any of claims 1 to 8.
CN202110920355.2A 2021-08-11 2021-08-11 Audio and video real-time communication method, device and equipment based on weak network Active CN113573003B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110920355.2A CN113573003B (en) 2021-08-11 2021-08-11 Audio and video real-time communication method, device and equipment based on weak network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110920355.2A CN113573003B (en) 2021-08-11 2021-08-11 Audio and video real-time communication method, device and equipment based on weak network

Publications (2)

Publication Number Publication Date
CN113573003A true CN113573003A (en) 2021-10-29
CN113573003B CN113573003B (en) 2023-08-01

Family

ID=78171345

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110920355.2A Active CN113573003B (en) 2021-08-11 2021-08-11 Audio and video real-time communication method, device and equipment based on weak network

Country Status (1)

Country Link
CN (1) CN113573003B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114513474A (en) * 2022-02-08 2022-05-17 聚好看科技股份有限公司 Video transmission method, video transmission terminal, media server, and storage medium
CN115277654A (en) * 2022-07-19 2022-11-01 宁波菊风系统软件有限公司 Bandwidth resource distribution system of RTC system

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101931632A (en) * 2010-09-21 2010-12-29 天地阳光通信科技(北京)有限公司 Method for ensuring service quality by real-time transmission protocol path
CN103581767A (en) * 2012-07-24 2014-02-12 鸿富锦精密工业(深圳)有限公司 Regulating system, terminal and method for video quality
CN105306888A (en) * 2015-10-03 2016-02-03 上海大学 Mobile video monitoring bandwidth adaption method based on packet loss differentiation
CN107438031A (en) * 2017-08-07 2017-12-05 成都三零凯天通信实业有限公司 The audio/video flow transfer control method and system of multichannel network bandwidth adaptive
CN113099310A (en) * 2021-04-08 2021-07-09 李蕊男 Real-time media internal video and audio coordination method based on android platform

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101931632A (en) * 2010-09-21 2010-12-29 天地阳光通信科技(北京)有限公司 Method for ensuring service quality by real-time transmission protocol path
CN103581767A (en) * 2012-07-24 2014-02-12 鸿富锦精密工业(深圳)有限公司 Regulating system, terminal and method for video quality
CN105306888A (en) * 2015-10-03 2016-02-03 上海大学 Mobile video monitoring bandwidth adaption method based on packet loss differentiation
CN107438031A (en) * 2017-08-07 2017-12-05 成都三零凯天通信实业有限公司 The audio/video flow transfer control method and system of multichannel network bandwidth adaptive
CN113099310A (en) * 2021-04-08 2021-07-09 李蕊男 Real-time media internal video and audio coordination method based on android platform

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114513474A (en) * 2022-02-08 2022-05-17 聚好看科技股份有限公司 Video transmission method, video transmission terminal, media server, and storage medium
CN114513474B (en) * 2022-02-08 2024-03-05 聚好看科技股份有限公司 Video transmission method, video transmission terminal, media server and storage medium
CN115277654A (en) * 2022-07-19 2022-11-01 宁波菊风系统软件有限公司 Bandwidth resource distribution system of RTC system
CN115277654B (en) * 2022-07-19 2024-02-27 宁波菊风系统软件有限公司 Bandwidth resource allocation system of RTC system

Also Published As

Publication number Publication date
CN113573003B (en) 2023-08-01

Similar Documents

Publication Publication Date Title
Wu et al. Enabling adaptive high-frame-rate video streaming in mobile cloud gaming applications
RU2634908C2 (en) Method and device for mediadata delivery management
US8127040B2 (en) Signaling buffer parameters indicative of receiver buffer architecture
US11792130B2 (en) Audio/video communication method, terminal, server, computer device, and storage medium
CN111147606B (en) Data transmission method, device, terminal and storage medium
CN105451071B (en) Video stream processing method, device and system
KR20120042833A (en) Backward looking robust header compression receiver
CN113573003A (en) Weak network-based audio and video real-time communication method, device and equipment
CN109379620B (en) Audio and video buffering method and device
CN110740380A (en) Video processing method and device, storage medium and electronic device
CN113727185B (en) Video frame playing method and system
CN105357545A (en) Wireless one-screen method and device based on flow media manner
EP3096524A1 (en) Communication apparatus, communication data generation method, and communication data processing method
KR102480751B1 (en) Method and apparatus for signaling and operation of low delay consumption of media data in mmt
CN114221909B (en) Data transmission method, device, terminal and storage medium
KR102118678B1 (en) Apparatus and Method for Transmitting Encoded Video Stream
CN116318545A (en) Video data transmission method, device, equipment and storage medium
CN112153322B (en) Data distribution method, device, equipment and storage medium
CN110855645B (en) Streaming media data playing method and device
CN106534137B (en) Media stream transmission method and device
CN112165655A (en) Data transmission method, device, equipment and medium based on video network
CN115942000B (en) H.264 format video stream transcoding method, device, equipment and medium
CN115086285B (en) Data processing method and device, storage medium and electronic equipment
CN114900716B (en) Cloud video data transmission method, cloud platform, cloud terminal and medium
WO2022228037A1 (en) Method for transmitting streaming media data and related device

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant