CN111918093A - Live broadcast data processing method and device, computer equipment and storage medium

Info

Publication number: CN111918093A (application CN202010812124.5A; granted as CN111918093B)
Authority: CN (China)
Prior art keywords: audio, video, data, frame rate, playing
Original language: Chinese (zh)
Inventor: 向晨宇
Current and original assignee: Tencent Technology Shenzhen Co Ltd (the listed assignees may be inaccurate; Google has not performed a legal analysis)
Legal status: Granted; Active (the legal status is an assumption and is not a legal conclusion)


Classifications

    • All under H04N21/00: Selective content distribution, e.g. interactive television or video on demand [VOD] (H04N: pictorial communication, e.g. television)
    • H04N21/2187: Live feed
    • H04N21/2343: Processing of video elementary streams involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
    • H04N21/4305: Synchronising client clock from received content stream, e.g. locking decoder clock with encoder clock, extraction of the PCR packets
    • H04N21/4307: Synchronising the rendering of multiple content streams or additional data on devices, e.g. synchronisation of audio on a mobile phone with the video output on the TV screen
    • H04N21/439: Processing of audio elementary streams
    • H04N21/4402: Processing of video elementary streams involving reformatting operations of video signals for household redistribution, storage or real-time display
    • H04N21/8547: Content authoring involving timestamps for synchronizing content


Abstract

The application relates to a live data processing method and apparatus, a computer device, and a storage medium. The method comprises the following steps: acquiring live streaming data and writing it into a live buffer, the live streaming data comprising audio stream data and video stream data; when the buffered data amount of live streaming data in the live buffer meets a play adjustment condition, adjusting the initial audio frame rate of the audio stream data in the live buffer according to the buffered data amount to obtain a target audio frame rate; and playing the audio stream data at the target audio frame rate while synchronously playing the video stream data corresponding to the frame-rate-adjusted audio stream data. The method can reduce the playing delay at the playing end.

Description

Live broadcast data processing method and device, computer equipment and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a live data processing method and apparatus, a computer device, and a storage medium.
Background
With the development of computer technology, webcast live streaming has won users over by combining images, sound, and text, and above all by giving users an instant, interactive way to communicate. The number of active webcast users keeps rising, viewers' demands are becoming more diverse, and a growing number of people are no longer just audiences but are entering the live broadcast industry as anchors.
However, during live broadcasting, network jitter at the playing end causes the live picture at the playing end to stutter. When the network at the playing end recovers, the playing end receives a large amount of live data at once; the live data piles up, and the playing delay at the playing end increases.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a live data processing method, apparatus, computer device, and storage medium capable of reducing the playing delay at the playing end.
A method of live data processing, the method comprising:
acquiring live streaming data, and writing the live streaming data into a live buffer; the live streaming data comprises audio stream data and video stream data;
when the buffered data amount of live streaming data in the live buffer meets a play adjustment condition, adjusting the initial audio frame rate of the audio stream data in the live buffer according to the buffered data amount to obtain a target audio frame rate; and
playing the audio stream data at the target audio frame rate, and synchronously playing the video stream data corresponding to the frame-rate-adjusted audio stream data.
A live data processing apparatus, the apparatus comprising:
a data acquisition module, configured to acquire live streaming data and write the live streaming data into a live buffer; the live streaming data comprises audio stream data and video stream data;
a frame rate adjustment module, configured to, when the buffered data amount of live streaming data in the live buffer meets a play adjustment condition, adjust the initial audio frame rate of the audio stream data in the live buffer according to the buffered data amount to obtain a target audio frame rate; and
a data playing module, configured to play the audio stream data at the target audio frame rate and synchronously play the video stream data corresponding to the frame-rate-adjusted audio stream data.
A computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor, when executing the computer program, implements the following steps:
acquiring live streaming data, and writing the live streaming data into a live buffer; the live streaming data comprises audio stream data and video stream data;
when the buffered data amount of live streaming data in the live buffer meets a play adjustment condition, adjusting the initial audio frame rate of the audio stream data in the live buffer according to the buffered data amount to obtain a target audio frame rate; and
playing the audio stream data at the target audio frame rate, and synchronously playing the video stream data corresponding to the frame-rate-adjusted audio stream data.
A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the following steps:
acquiring live streaming data, and writing the live streaming data into a live buffer; the live streaming data comprises audio stream data and video stream data;
when the buffered data amount of live streaming data in the live buffer meets a play adjustment condition, adjusting the initial audio frame rate of the audio stream data in the live buffer according to the buffered data amount to obtain a target audio frame rate; and
playing the audio stream data at the target audio frame rate, and synchronously playing the video stream data corresponding to the frame-rate-adjusted audio stream data.
With the live data processing method, apparatus, computer device, and storage medium, live streaming data is acquired and written into a live buffer; the live streaming data comprises audio stream data and video stream data. When the buffered data amount of live streaming data in the live buffer meets the play adjustment condition, the initial audio frame rate of the audio stream data in the live buffer is adjusted according to the buffered data amount to obtain a target audio frame rate; the audio stream data is played at the target audio frame rate, and the video stream data corresponding to the frame-rate-adjusted audio stream data is played synchronously. When the playing end stalls, live streaming data piles up in the live buffer; once the buffered data amount meets the play adjustment condition, the playing speed of the audio data is adjusted according to the buffered data amount, and the audio is played faster at the target audio frame rate without dropping frames. This reduces the playing delay at the playing end while preserving the integrity of the audio content, and narrows the gap between what the playing end is showing and what the live end is broadcasting. In addition, the video data is played following the audio data, so the audio and the video stay in sync.
Drawings
FIG. 1 is a diagram of an application environment of a live data processing method in one embodiment;
FIG. 2 is a flow diagram of a live data processing method in one embodiment;
FIG. 3 is a diagram of the transmission of live streaming data in one embodiment;
FIG. 4 is a schematic diagram of synchronous audio and video playing in one embodiment;
FIG. 5 is a flow diagram illustrating a process for increasing an initial audio frame rate in one embodiment;
FIG. 6 is a flow diagram illustrating a process of determining a target frame rate increase ratio according to the buffered data amount of audio stream data in one embodiment;
FIG. 7 is a diagram illustrating determination of a frame rate adjustment ratio according to the buffered data amount of audio stream data in one embodiment;
FIG. 8 is a flow diagram illustrating a process for reducing an initial audio frame rate in one embodiment;
FIG. 9 is a flow diagram of a live data processing method in one embodiment;
FIG. 10 is a flowchart illustrating a live data processing method in another embodiment;
FIG. 11 is a diagram of an application scenario of a live data processing method in one embodiment;
FIG. 12 is a diagram of an application scenario of a live data processing method in another embodiment;
FIG. 13 is a block diagram of a live data processing apparatus in one embodiment;
FIG. 14 is a diagram illustrating the internal structure of a computer device in one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
The live data processing method provided by the application can be applied to the application environment shown in FIG. 1. The live terminal 102 communicates with the server 104 through a network, and the server 104 communicates with the playing terminal 106 through the network. Both the live terminal 102 and the playing terminal 106 have live-streaming clients installed. The live terminal 102 and the playing terminal 106 may each be, but are not limited to, a personal computer, a notebook computer, a smartphone, a tablet computer, or a portable wearable device; the server 104 may be an independent server or a cluster of multiple servers.
Specifically, the server receives live streaming data sent by a live terminal, the live streaming data comprising video stream data and audio stream data, and forwards it to the playing terminal. The playing terminal writes the live streaming data into the live buffer; when the buffered data amount of live streaming data in the live buffer meets the play adjustment condition, the playing terminal adjusts the initial audio frame rate of the audio stream data in the live buffer to a target audio frame rate according to the buffered data amount. The playing terminal then plays the audio stream data at the target audio frame rate and synchronously plays the video stream data corresponding to the frame-rate-adjusted audio stream data.
In an embodiment, as shown in FIG. 2, a live data processing method is provided. The method is described here as applied to the playing terminal in FIG. 1, and includes the following steps:
Step S202: acquire live streaming data and write the live streaming data into a live buffer; the live streaming data includes audio stream data and video stream data.
The live streaming data is the data related to a live broadcast during the broadcasting process. It includes audio stream data, which relates to sound, and video stream data, which relates to images. The live buffer is a buffer for live streaming data, used to store it temporarily so that its play mode can be adjusted.
Specifically, to watch a live broadcast, the playing terminal needs to obtain the live streaming data of that broadcast from the server. When the network condition of the playing terminal stays good, the playing terminal can decode and play the live streaming data directly after fetching it from the server, without buffering, to reduce the playing delay. Alternatively, to keep playback stable, the playing terminal can store the live streaming data in the live buffer after fetching it from the server, and begin decoding and playing once the accumulated live streaming data reaches a certain threshold. When the network condition of the playing terminal is abnormal, the playing terminal temporarily cannot obtain live streaming data from the server, and the live streaming data continuously produced by the live terminal accumulates on the server; after the network condition of the playing terminal returns to normal, the playing terminal obtains a large amount of live streaming data from the server and stores it in the live buffer.
Referring to FIG. 3, FIG. 3 illustrates the transmission of live streaming data in one embodiment. The live streaming data comprises a plurality of live data blocks. When the network conditions of the live terminal and the playing terminal are normal, the encoding speed and uplink speed of the live streaming data stay relatively stable, as do its downlink speed and decoding speed (see the first and second live data blocks in FIG. 3), so the playing terminal plays the live streaming data smoothly and the user hears the live audio and watches the live picture without interruption. When a network abnormality occurs at the live terminal, however, the uplink speed falls below the encoding speed; live streaming data generated by the live terminal cannot be uploaded to the server in time and backs up on the live terminal (see the third and fourth live data blocks in FIG. 3). The server then cannot send live streaming data to the playing terminal in time, and the playing terminal may stall, for example showing a black screen or a text prompt such as "buffering". Likewise, when a network abnormality occurs at the playing terminal and the downlink speed falls below the decoding speed, the server cannot deliver live streaming data to the playing terminal in time and playback stalls. Therefore, during live broadcasting it is necessary to ensure that the uplink speed of the live terminal exceeds the encoding speed and the downlink speed of the playing terminal exceeds the decoding speed, which keeps the live picture at the playing end smooth.
In one embodiment, acquiring live streaming data and writing the live streaming data into a live buffer includes: demultiplexing the live streaming data to obtain audio stream data and video stream data; writing the audio stream data into an audio buffer within the live buffer; and writing the video stream data into a video buffer within the live buffer.
Specifically, the live buffer includes an audio buffer and a video buffer; the audio buffer holds audio stream data, the video buffer holds video stream data, and the capacity of each can be set as needed, for example to 2000 frames. The live streaming data comprises a plurality of live data blocks, each of which stores a data identifier at a preset position; the identifier indicates whether the block is an audio data block or a video data block. The preset position may be the head, the tail, or another specific position of the live data block. After receiving the live streaming data, the playing terminal can demultiplex it according to the data identifiers, separating audio stream data from video stream data, and store the audio stream data in the audio buffer and the video stream data in the video buffer. Storing the audio data and the video data in separate, isolated partitions eliminates reordering caused by network delay jitter, keeps the audio and video data streams continuous, and prevents stalls caused by network jitter.
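As an illustration, the following is a minimal sketch of this demultiplexing step. The 2000-frame capacities follow the example above, but the one-byte header tag and its values are assumptions, since the patent does not fix the block layout.

```python
# Sketch of routing live data blocks by their data identifier.
# Tag values and the header position are illustrative assumptions.
from collections import deque

AUDIO_TAG, VIDEO_TAG = 0x08, 0x09         # assumed identifier values

audio_buffer: deque = deque(maxlen=2000)  # audio buffer, 2000-frame capacity
video_buffer: deque = deque(maxlen=2000)  # video buffer, 2000-frame capacity

def route_block(block: bytes) -> None:
    """Demultiplex one live data block into the audio or video buffer."""
    tag = block[0]                        # assumed preset position: first byte
    if tag == AUDIO_TAG:
        audio_buffer.append(block)
    elif tag == VIDEO_TAG:
        video_buffer.append(block)
    # blocks with an unknown identifier are ignored in this sketch
```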
Step S204: when the buffered data amount of live streaming data in the live buffer meets the play adjustment condition, adjust the initial audio frame rate of the audio stream data in the live buffer according to the buffered data amount to obtain the target audio frame rate.
The play adjustment condition determines the audio play mode from the buffered data amount of live streaming data, and thereby adjusts the consumption speed of the audio stream data. The initial audio frame rate is fixed when the live terminal encodes the data, and the playing terminal recovers it by decoding the audio stream data. Playing the audio stream data at the initial audio frame rate means playing the audio at normal speed, i.e., the audio play mode is normal play. The target audio frame rate is the frame rate the playing terminal obtains by adjusting the initial audio frame rate according to the buffered data amount. When the target audio frame rate is greater than the initial audio frame rate, playing the audio stream data at the target audio frame rate speeds the audio up, i.e., the audio play mode is accelerated play; when it is less than the initial audio frame rate, playback is slowed down, i.e., the audio play mode is slowed play. Different buffered data amounts correspond to different play adjustment conditions: depending on the buffered data amount, the terminal plays normally, plays faster, plays slower, accelerates at a continuously varying speed, or decelerates in abrupt steps.
Specifically, the playing terminal can monitor the buffered data amount of live streaming data in the live buffer in real time, and when the buffered data amount meets the play adjustment condition, adjust the initial audio frame rate of the audio stream data in the live buffer in real time according to the buffered data amount to obtain the target audio frame rate. Because the playing terminal continuously receives and consumes live streaming data, the buffered data amount changes continuously; once it meets the play adjustment condition, the initial audio frame rate is adjusted in real time according to the current buffered data amount, and the target audio frame rate is updated in real time. For example, when the buffered data amount of live streaming data exceeds a first threshold, the audio play mode is set to accelerated play, with larger buffered data amounts mapping to larger acceleration ratios; since the buffered data amount keeps changing, the acceleration ratio is determined in real time from it, the initial audio frame rate is dynamically adjusted accordingly, and the audio is played back at an acceleration that varies dynamically in real time.
In one embodiment, when the buffered data amount of audio stream data in the live buffer meets the play adjustment condition, the initial audio frame rate of the audio stream data in the live buffer is adjusted according to the buffered data amount of audio stream data to obtain the target audio frame rate. There is no need to derive a target video frame rate from the buffered data amount of video stream data, because the video playing strategy is simply to follow the audio: when the audio plays faster, the video automatically speeds up with it; when the audio plays slower, the video automatically slows down with it.
In an embodiment, the buffered data amount may be deemed to satisfy the play adjustment condition when the playback time of the buffered live streaming data exceeds a preset threshold, or when the number of buffered live stream frames exceeds a preset threshold. For example, when more than 2 s of audio stream data is buffered, the audio play mode is set to accelerated play.
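For illustration, here is a minimal sketch of such a check, assuming the buffered data amount is tracked as a frame count and reusing the 2 s threshold from the example above:

```python
# Sketch of the play-adjustment check; the threshold value is illustrative.

def needs_acceleration(buffered_frames: int, frame_rate: float,
                       max_buffered_seconds: float = 2.0) -> bool:
    """Return True when the buffered playback time exceeds the preset threshold."""
    return buffered_frames / frame_rate > max_buffered_seconds

# e.g. 120 frames buffered at 50 fps is 2.4 s of audio, so accelerate
assert needs_acceleration(120, 50.0)
```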
Step S206: play the audio stream data at the target audio frame rate, and synchronously play the video stream data corresponding to the frame-rate-adjusted audio stream data.
Specifically, the audio stream data and the video stream data both carry timestamps, and the playing terminal can associate audio data and video data whose timestamps lie within the synchronization tolerance of each other. Then, when the playing terminal plays the audio stream data at the target audio frame rate, it synchronously plays the video stream data associated with the frame-rate-adjusted audio stream data. Thus, when the audio is sped up, the video automatically speeds up with it; however, when the target video frame rate exceeds the screen refresh rate, the playing terminal can filter the video data and discard a portion of it to bring the target video frame rate down. When the audio is slowed down, the video automatically slows down with it, and can do so without dropping frames. The synchronization tolerance can be set as desired.
In the live data processing method above, live streaming data is acquired and written into a live buffer; the live streaming data comprises audio stream data and video stream data. When the buffered data amount of live streaming data in the live buffer meets the play adjustment condition, the initial audio frame rate of the audio stream data in the live buffer is adjusted according to the buffered data amount to obtain a target audio frame rate; the audio stream data is played at the target audio frame rate, and the video stream data corresponding to the frame-rate-adjusted audio stream data is played synchronously. When the playing end stalls, live streaming data piles up in the live buffer; once the buffered data amount meets the play adjustment condition, the playing speed of the audio data is adjusted according to the buffered data amount, and the audio is played faster at the target audio frame rate without dropping frames. This reduces the playing delay at the playing end while preserving the integrity of the audio content, and narrows the gap between the playing end's content and the live end's content. In addition, the video data is played following the audio data, so the audio and video stay in sync.
In one embodiment, before adjusting the initial audio frame rate of the audio stream data in the live buffer according to the buffered data amount to obtain the target audio frame rate, the method further includes: reading the audio stream data from the audio buffer and decoding it to obtain a set of audio frames, each carrying an audio timestamp; reading the video stream data from the video buffer and decoding it to obtain a set of video frames, each carrying a video timestamp; and establishing an association between audio frames and video frames according to the audio timestamps and the video timestamps.
Specifically, when the buffered data amount of live streaming data in the live buffer meets the play adjustment condition, the playing terminal can read the audio stream data from the audio buffer, determine its compression format (encoding standard) from a preset position in the stream, and decode the compressed audio stream data into original audio frames with the corresponding decoder, yielding the audio frame set. Likewise, the playing terminal can read the video stream data from the video buffer, determine its compression format from a preset position, and decode it into original video frames, yielding the video frame set. The playing terminal then renders the audio and video frames to play the audio and video. To make the video follow the audio, the playing terminal can associate audio and video that must be played synchronously, so that whenever an audio frame is played, the video frame associated with it is played at the same time. Since each audio frame carries an audio timestamp and each video frame carries a video timestamp, the playing terminal can build the association from the timestamps: for example, when an audio timestamp equals a video timestamp, the corresponding audio frame and video frame are associated; or, when an audio timestamp and a video timestamp are within the synchronization tolerance of each other, the corresponding frames are associated. The playing terminal can also determine the initial audio frame rate from the decoding result of the audio stream data, and the initial video frame rate from the decoding result of the video stream data.
Referring to FIG. 4, FIG. 4 illustrates the principle of synchronous audio and video playing in one embodiment. When the live terminal generates audio frames, it stamps each with a timestamp marking its playing order, and it does the same for video frames. After the playing terminal decodes the audio stream data into audio frames, it feeds them to the loudspeaker at the set speed; each time an audio frame is played, the timestamp of that frame is written to the master clock, and whether a video frame is rendered is decided from the master clock's timestamp, achieving synchronous audio and video playback.
When playing a video frame, as shown in FIG. 4, there are three cases:
(1) When the timestamp difference (diff) between the video frame to be rendered and the master clock is within the synchronization threshold (min), i.e., -min < diff < min, the frame can be rendered and played normally.
(2) When the timestamp difference is greater than the synchronization threshold but less than the abnormal threshold (max), i.e., min < diff < max, the frame waits for the master clock to advance until the difference falls within the synchronization threshold (a wait of diff - min), and is then rendered and played.
(3) When the timestamp difference lies outside the acceptable range, i.e., diff < -min || diff > max, the frame either is already too late or would have to wait too long to be rendered; it is therefore judged an abnormal video frame and discarded. This prevents video frames from waiting too long to be rendered.
Because audio and video are captured separately, their timestamps may not correspond exactly; the audio and video can be considered synchronized as long as the difference between their timestamps stays within a reasonable threshold (the synchronization tolerance, or synchronization threshold). The international standard defines the following thresholds for the audio-video play offsets that are imperceptible, perceptible, and unacceptable to the user (a code sketch of the rendering decision built on these thresholds follows the list):
(1) Imperceptible: an audio-video timestamp difference between -100 ms and +25 ms;
(2) Perceptible: the audio lags by more than 100 ms or leads by more than 25 ms;
(3) Unacceptable: the audio lags by more than 185 ms or leads by more than 90 ms.
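A minimal sketch of the rendering decision in cases (1) to (3) above, driven by the master clock that audio playback updates. The concrete min/max values are illustrative assumptions drawn from the tolerances just listed, not values the patent fixes.

```python
# Sketch of the master-clock video rendering decision described above.
# Threshold values are illustrative assumptions.

SYNC_THRESHOLD_MS = 25       # "min": tolerated lead/lag for immediate rendering
ABNORMAL_THRESHOLD_MS = 185  # "max": beyond this the frame is abnormal

def video_render_action(video_ts_ms: float, master_clock_ms: float):
    """Decide what to do with the next video frame, given the master clock
    updated each time an audio frame is played.
    Returns (action, wait_ms)."""
    diff = video_ts_ms - master_clock_ms
    if -SYNC_THRESHOLD_MS < diff < SYNC_THRESHOLD_MS:
        return ("render", 0.0)                     # case (1): render now
    if SYNC_THRESHOLD_MS <= diff <= ABNORMAL_THRESHOLD_MS:
        return ("wait", diff - SYNC_THRESHOLD_MS)  # case (2): wait for the clock
    return ("drop", 0.0)                           # case (3): abnormal frame
```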
In this embodiment, the audio stream data and the video stream data are decoded separately to obtain the corresponding audio frame set and video frame set, ready for subsequent output and playback. Associating audio frames with video frames through their timestamps lets the video follow the audio later on, so that audio and video play synchronously.
In one embodiment, as shown in FIG. 5, when the buffered data amount of live streaming data in the live buffer meets the play adjustment condition, adjusting the initial audio frame rate of the audio stream data in the live buffer according to the buffered data amount to obtain the target audio frame rate includes:
Step S502: when the buffered data amount of audio stream data in the audio buffer is greater than a first threshold, determine a target frame rate increase ratio according to the buffered data amount of the audio stream data; the first threshold is greater than the normal play threshold.
Step S504: increase the initial audio frame rate according to the target frame rate increase ratio to obtain the target audio frame rate.
Specifically, when the buffered data amount of audio stream data in the audio buffer equals the normal play threshold, the target audio frame rate is the initial audio frame rate, i.e., the audio play mode is normal play: once the buffered data amount reaches the normal play threshold, the audio is played normally at the initial audio frame rate. When the buffered data amount of audio stream data in the audio buffer is greater than the first threshold (which is greater than the normal play threshold), the playing terminal has ample buffered audio and lags well behind the live terminal, so the initial audio frame rate can be raised, i.e., the audio play mode becomes accelerated play. Concretely, the playing terminal determines a target frame rate increase ratio (the audio acceleration ratio) from the buffered data amount of the audio stream data, and raises the initial audio frame rate by that ratio to obtain the target audio frame rate; the target audio frame rate, i.e., the speed of accelerated audio playback, is the product of the acceleration ratio and the initial audio frame rate. The target frame rate increase ratio may grow continuously as the buffered data amount of the audio stream data grows, or grow in discrete steps.
In one embodiment, playback is unacceptable to the user when the audio lags the video by more than 185 ms or leads it by more than 90 ms. Therefore, when frames pile up at the playing terminal after a stall, the playing terminal can accelerate audio and video playback at a rate greater than 1x and at most 1 + 185/1000 = 1.185x, catching up on frames at a speed the user can still accept and thereby reducing the playing delay. Further, the user can perceive audio that lags the video by more than 100 ms or leads it by more than 25 ms; so the playing terminal can instead accelerate playback at a rate greater than 1x and at most 1 + 100/1000 = 1.1x, catching up at a speed the user cannot even perceive, again reducing the playing delay at the playing end.
In this embodiment, after a stall the playing terminal accumulates audio stream data; when the buffered data amount of audio stream data in the audio buffer exceeds the first threshold, the target frame rate increase ratio is determined from the buffered data amount, the initial audio frame rate is raised by that ratio to obtain the target audio frame rate, and the audio is played faster at the target audio frame rate, reducing the playing delay at the playing terminal.
In one embodiment, as shown in FIG. 6, determining the target frame rate increase ratio according to the buffered data amount of the audio stream data includes:
Step S502A: when the buffered data amount of audio stream data in the audio buffer is greater than the first threshold and less than a second threshold, the target frame rate increase ratio grows linearly as the buffered data amount grows.
Step S502B: when the buffered data amount of audio stream data in the audio buffer is greater than or equal to the second threshold, the target frame rate increase ratio grows nonlinearly as the buffered data amount grows, with the acceleration of its growth inversely proportional to the growth of the buffered data amount.
Step S502C: the target frame rate increase ratio is capped at a preset maximum ratio.
The preset ratio can be set according to actual requirements; for example, the initial preset ratio is 1.185, which the user may adjust to 1.5, 2, and so on.
Specifically, the target frame rate increase ratio may grow continuously with the buffered data amount of the audio stream data. When the buffered data amount of audio stream data in the audio buffer lies between the first threshold and the second threshold (first threshold < second threshold), the target frame rate increase ratio grows linearly with the buffered data amount, i.e., at a constant speed, and the target audio frame rate likewise rises at a constant speed. Understandably, the larger the buffered data amount in the audio buffer, the larger the target audio frame rate, and playing the audio at the target audio frame rate quickly catches up with the live progress. When the buffered data amount is greater than or equal to the second threshold, the target frame rate increase ratio grows nonlinearly: the acceleration of its growth is inversely proportional to the growth of the buffered data amount, so the ratio grows ever more slowly as the buffered data amount grows, and the target audio frame rate rises more and more gently. In other words, as the target frame rate increase ratio approaches the preset ratio, its growth slows so that it approaches or reaches the preset ratio smoothly and stably.
Referring to FIG. 7, when the frame rate adjustment ratio is greater than 1 it is the target frame rate increase ratio; when it is less than 1 it is the target frame rate reduction ratio. Assuming the preset ratio is 1.185: while the buffered data amount of audio stream data in the audio buffer lies between the first and second thresholds, the target frame rate increase ratio is still far from 1.185 and climbs steadily and quickly at a fixed rate. Once the buffered data amount reaches or exceeds the second threshold, the ratio approaches 1.185, its climb slows, and it transitions smoothly to 1.185; once equal to 1.185, it stays there.
In this embodiment, the target frame rate increase ratio grows continuously with the buffered data amount of the audio stream data: the larger the buffered data amount, the larger the ratio and the faster the playing terminal plays the audio, quickly shrinking the playing delay at the playing end.
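A minimal sketch of one mapping with this shape: linear growth between the first and second thresholds, then damped growth approaching the preset cap of 1.185. The threshold values, the ratio reached at the second threshold, and the damping constant are illustrative assumptions.

```python
# Sketch of a buffered-amount-to-speedup-ratio curve: linear, then damped
# growth toward the cap. All constants are illustrative assumptions.
import math

FIRST_THRESHOLD = 120          # frames buffered before acceleration starts
SECOND_THRESHOLD = 400         # frames where growth switches to damped
MAX_RATIO = 1.185              # preset cap, from the 185 ms tolerance above
LINEAR_RATIO_AT_SECOND = 1.1   # assumed ratio reached at SECOND_THRESHOLD

def speedup_ratio(buffered_frames: int) -> float:
    if buffered_frames <= FIRST_THRESHOLD:
        return 1.0  # normal-speed playback
    if buffered_frames < SECOND_THRESHOLD:
        # linear segment: the ratio grows at a constant rate with the backlog
        t = (buffered_frames - FIRST_THRESHOLD) / (SECOND_THRESHOLD - FIRST_THRESHOLD)
        return 1.0 + t * (LINEAR_RATIO_AT_SECOND - 1.0)
    # damped segment: growth slows as the ratio nears MAX_RATIO
    excess = buffered_frames - SECOND_THRESHOLD
    return MAX_RATIO - (MAX_RATIO - LINEAR_RATIO_AT_SECOND) * math.exp(-excess / 200.0)

target_audio_frame_rate = speedup_ratio(500) * 50  # e.g. initial rate of 50 fps
```

The exponential term is just one convenient way to make the growth acceleration fall as the backlog grows; the embodiment only requires that the ratio approach the cap smoothly.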
In one embodiment, as shown in FIG. 8, when the buffered data amount of live streaming data in the live buffer meets the play adjustment condition, adjusting the initial audio frame rate of the audio stream data in the live buffer according to the buffered data amount to obtain the target audio frame rate includes:
Step S802: when the current stall time is longer than a preset duration and the buffered data amount of audio stream data in the audio buffer lies between a third threshold and a fourth threshold, determine a target frame rate reduction ratio according to the buffered data amount of the audio stream data; the third threshold is less than the fourth threshold, and the fourth threshold is less than or equal to the normal play threshold.
Step S804: reduce the initial audio frame rate according to the target frame rate reduction ratio to obtain the target audio frame rate.
The current stall time is the duration for which the playing terminal has received no live data, i.e., the duration of the network abnormality.
Specifically, when the network condition of the playing terminal is abnormal, the playing terminal temporarily cannot fetch new live streaming data from the server and can only consume the live streaming data in its local live buffer. A current stall time longer than the preset duration means the playing terminal has gone a long time without new live data. A buffered data amount of audio stream data between the third and fourth thresholds (third threshold < fourth threshold <= normal play threshold) indicates that the playing terminal's buffered data is insufficient to sustain normal-speed playback for long. Therefore, when both conditions hold, the playing terminal can slow playback down to slow the consumption of buffered audio stream data, leaving more time for the network to recover; from the moment the playing terminal's network fails until it recovers, the playing terminal keeps playing as long as possible, avoiding or postponing a black screen. Concretely, the playing terminal determines a target frame rate reduction ratio from the buffered data amount of the audio stream data and lowers the initial audio frame rate by that ratio to obtain the target audio frame rate. The target frame rate reduction ratio may shrink continuously as the buffered data amount shrinks, or shrink in discrete steps.
In this embodiment, during a stall the playing terminal consumes the audio stream data it accumulated earlier. When the current stall time exceeds the preset duration and the buffered data amount of audio stream data in the audio buffer lies between the third and fourth thresholds, a target frame rate reduction ratio is determined from the buffered data amount, the initial audio frame rate is lowered by that ratio to obtain the target audio frame rate, and the audio is played at the reduced rate, keeping playback going as long as possible and avoiding or postponing a black screen.
In one embodiment, determining the target frame rate reduction ratio according to the buffered data amount of the audio stream data includes: reducing the target frame rate reduction ratio stepwise as the buffered data amount of the audio stream data shrinks.
Specifically, the target frame rate reduction ratio may shrink discontinuously, in steps, as the buffered data amount shrinks. For example, when the buffered data amount of the audio stream data lies in a first interval (100-150 frames), the target frame rate reduction ratio is a first preset value; when it lies in a second interval (50-100 frames), the ratio is a second preset value smaller than the first.
Referring to FIG. 7, when the current stall time exceeds the preset duration and the buffered data amount of audio stream data in the audio buffer lies between the third and fourth thresholds, the target frame rate reduction ratio drops stepwise as the buffered data amount shrinks: the smaller the buffered data amount, the smaller the reduction ratio and the slower the playing terminal plays the audio.
In this embodiment, the target frame rate reduction ratio shrinks discontinuously with the buffered data amount of the audio stream data; the smaller the ratio, the slower the playing terminal plays the audio, leaving more time for the network to return to normal.
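A minimal sketch of this stepwise slow-down, reusing the 100-150 and 50-100 frame intervals from the example above; the ratio values and the preset stall duration are illustrative assumptions.

```python
# Sketch of the stepwise slow-down; constants are illustrative assumptions.

def slowdown_ratio(buffered_frames: int, stall_time_s: float,
                   preset_stall_s: float = 1.0) -> float:
    """Target frame rate reduction ratio, dropping in steps as the backlog shrinks."""
    if stall_time_s <= preset_stall_s:
        return 1.0          # stall too short to react to: normal speed
    if 100 <= buffered_frames < 150:
        return 0.9          # first interval: mild slow-down
    if 50 <= buffered_frames < 100:
        return 0.8          # second interval: stronger slow-down
    return 1.0              # outside the third/fourth thresholds: normal speed

target_audio_frame_rate = slowdown_ratio(80, stall_time_s=2.5) * 50  # 50 fps input
```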
In one embodiment, establishing the association between audio frames and video frames according to the audio timestamps and the video timestamps includes: taking each audio timestamp as a center point and determining the time synchronization region corresponding to each audio frame; and, when the video timestamp of at least one video frame falls within the time synchronization region of an audio frame, associating the video frames and the audio frame that correspond to the same time synchronization region.
Specifically, when an audio timestamp and a video timestamp are within the synchronization tolerance of each other, the corresponding audio frame and video frame can be treated as synchronously captured data and played synchronously. The playing terminal first takes the audio timestamp as the center point, goes back a first preset time from the center point to get the start endpoint of the time synchronization region, goes forward a second preset time from the center point to get the end endpoint, and takes the span between the two endpoints as the time synchronization region of that audio frame, producing one region per audio frame. Then, when the video timestamp of a video frame falls within the time synchronization region of an audio frame, the time interval between that video frame and that audio frame is within the synchronization tolerance, so the video frame corresponds to that audio frame's region. One time synchronization region may correspond to at least one video frame; when one region corresponds to several video frames, several pictures correspond to one segment of audio during playback. After determining which video frames fall into which time synchronization regions, the playing terminal can associate the video frames and the audio frame that share the same region.
In one embodiment, the first preset time and the second preset time can be set as needed; for example, both can be set to the synchronization threshold min. Further, depending on how the first and second preset times are chosen, different time synchronization regions may be mutually independent or may overlap. When the regions are independent and do not overlap, one video frame corresponds to one audio frame, and several video frames may correspond to several audio frames. When regions overlap, one video frame may correspond to several audio frames, i.e., one picture corresponds to several audio segments during playback.
In this embodiment, the time synchronization region of each audio frame is determined with the audio timestamp as its center, and when the video timestamp of at least one video frame falls within an audio frame's region, the video frames and the audio frame sharing that region are associated; in this way, video frames and audio frames whose time intervals lie within the synchronization tolerance are associated quickly.
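A minimal sketch of this association step; the window half-widths stand in for the first and second preset times and are illustrative assumptions.

```python
# Sketch of associating video frames to per-audio-frame sync windows.
# Window half-widths (back_ms, ahead_ms) are illustrative assumptions.

def associate_frames(audio_ts, video_ts, back_ms=25.0, ahead_ms=25.0):
    """audio_ts, video_ts: sorted lists of timestamps in milliseconds.
    Returns a dict mapping each audio timestamp to the video timestamps
    falling inside its time synchronization region."""
    assoc = {ats: [] for ats in audio_ts}
    for ats in audio_ts:
        start, end = ats - back_ms, ats + ahead_ms  # sync region of this frame
        for vts in video_ts:
            if start <= vts <= end:
                assoc[ats].append(vts)
    return assoc

links = associate_frames([0.0, 40.0, 80.0], [0.0, 16.7, 33.3, 50.0, 66.7])
```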
In one embodiment, when the video timestamp of at least one video frame falls within the time synchronization region of an audio frame, before associating the video frames and the audio frame corresponding to the same region, the method further includes: determining the initial video frame rate from the decoding result of the video stream data; adjusting the initial video frame rate according to the target audio frame rate to obtain the target video frame rate; and, when the target video frame rate is greater than a video frame rate threshold and the video timestamps of several video frames fall within the same time synchronization region, selecting one of those video frames as the associated video frame of the current audio frame for that region and associating the current audio frame with it, stopping the selection once the current video frame rate equals the video frame rate threshold.
Specifically, the video frame rate threshold is the screen refresh rate. When the target video frame rate is greater than the video frame rate threshold, that is, when the playing speed of the video exceeds the screen refresh rate, the playing terminal cannot render every video frame in time and therefore cannot successfully play every video frame. Frame dropping is then required to reduce the target video frame rate. The playing terminal may determine the initial video frame rate according to the decoding result of the video stream data, and adjust the initial video frame rate according to the target audio frame rate to obtain the target video frame rate. Adjusting the initial video frame rate according to the target audio frame rate may specifically be: determining the initial audio frame rate according to the decoding result of the audio stream data, determining a frame rate adjustment ratio from the target audio frame rate and the initial audio frame rate, and scaling the initial video frame rate by the same ratio to obtain the target video frame rate. When the target video frame rate is greater than the video frame rate threshold and the video timestamps of a plurality of video frames fall within the same time synchronization region, the playing terminal selects one video frame from the plurality of video frames as the associated video frame of the current audio frame corresponding to that time synchronization region, discards the remaining video frames, and establishes an association relationship between the current audio frame and the associated video frame. The playing terminal may sort the video frames in chronological order of their video timestamps, where a frame earlier in the order is older and a frame later in the order is closer to the current time, and screen one video frame from the plurality according to the sorting result. The playing terminal may take, as the associated video frame, the video frame that is first in the order, the video frame in the middle of the order, or the video frame that is last in the order.
Further, in order to ensure the playing quality of audio/video playback, the discarding of video frames may stop once the target video frame rate has been reduced to the video frame rate threshold. In addition, in order to ensure the timeliness of audio and video playback, all the time synchronization regions may be sorted in chronological order of their video timestamps, where a region earlier in the order is older and a region later in the order is closer to the current time. The playing terminal may preferentially screen the video frames corresponding to the time synchronization regions that are earlier in the order, and stop screening when the target video frame rate has been reduced to the video frame rate threshold. For example, if the screen refresh rate is 60 frames per second and the target video frame rate is 65 frames per second, 5 video frames per second need to be discarded. The playing terminal may preferentially screen the video frames corresponding to the several earliest time synchronization regions in each second, screening out and discarding 5 video frames, so that the target video frame rate is reduced to 60 frames per second. In this way, relatively old video frames are preferentially discarded, and the playing terminal can completely play the latest audio and video.
In this embodiment, when the target video frame rate is greater than the video frame rate threshold and the video timestamps of a plurality of video frames fall within the same time synchronization region, one video frame is selected from the plurality of video frames as the associated video frame of the current audio frame corresponding to that time synchronization region, and an association relationship is established between the current audio frame and the associated video frame, so that the target video frame rate can be flexibly reduced while the playing terminal continues to play the live broadcast picture.
In one embodiment, the method further comprises: determining an updated video frame rate according to the screening result; and when the updated video frame rate is greater than the video frame rate threshold, filtering the screened video frames in order of video timestamp from earliest to latest, and stopping the filtering when the current video frame rate is less than or equal to the video frame rate threshold.
Specifically, after the screening is completed, the playing terminal may determine an updated video frame rate according to the screening result. When the updated video frame rate is still greater than the video frame rate threshold (that is, even though each audio frame now corresponds to only one video frame, the updated video frame rate still exceeds the threshold), the playing terminal may sort the remaining undiscarded video frames in chronological order of their video timestamps, where a frame earlier in the order is older and a frame later in the order is closer to the current time, and filter out the video frames that are earliest in the order, that is, filter the screened video frames in order of timestamp from earliest to latest. The filtering may stop when the current video frame rate is less than or equal to the video frame rate threshold, that is, when the current video frame rate has dropped to the video frame rate threshold. In this way, relatively old video frames are preferentially discarded, and the playing terminal can completely play the latest audio and video.
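The two-stage frame dropping described in the last two embodiments can be sketched as follows. The grouping of timestamps by synchronization region, the choice of keeping the earliest frame of each region, and the numbers in the usage line are all illustrative assumptions:

```python
def drop_frames(regions, target_fps, fps_threshold):
    """regions: video timestamps grouped by time synchronization region,
    oldest region first. Returns the surviving timestamps.

    Stage 1: thin regions holding several frames down toward one frame,
    oldest regions first, until enough frames are shed.
    Stage 2: if the rate is still too high, discard the oldest remaining
    frames until the current rate is <= the threshold.
    """
    to_drop = max(0, target_fps - fps_threshold)  # frames to shed per second
    kept = []
    for region in regions:                         # Stage 1
        region = sorted(region)
        if to_drop > 0 and len(region) > 1:
            dropped = min(to_drop, len(region) - 1)
            region = [region[0]] + region[1 + dropped:]  # keep the earliest
            to_drop -= dropped
        kept.extend(region)
    kept.sort()
    while to_drop > 0 and kept:                    # Stage 2: oldest first
        kept.pop(0)
        to_drop -= 1
    return kept

# 8 frames per second against a 5 fps ceiling: three frames must go.
print(drop_frames([[0, 5], [16, 20, 24], [33], [50, 52]], 8, 5))
# [0, 16, 33, 50, 52]
```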
In one embodiment, playing audio stream data according to a target audio frame rate, and synchronously playing video stream data corresponding to the audio stream data with an adjusted audio frame rate, includes: sequentially storing the audio frames into an audio frame play queue according to the time sequence of the audio time stamps; sequentially storing the video frames into a video frame play queue according to the time sequence of the video timestamps; and playing the audio frames in the audio frame playing queue according to the target audio frame rate, and synchronously playing the video frames corresponding to the current audio frames in the video frame playing queue according to the association relation.
Specifically, in order to guarantee the order of audio and video playback, the playing terminal may store older audio frames into the audio frame playing queue first and newer audio frames later, in chronological order of the audio timestamps, so that the audio frames enter the audio frame playing queue in sequence. Similarly, the playing terminal may store the video frames into the video frame playing queue in sequence according to the chronological order of the video timestamps. Then, when playing the audio and video, the playing terminal plays the audio frames in the audio frame playing queue according to the target audio frame rate, and when playing the current audio frame, synchronously plays the video frames corresponding to the current audio frame in the video frame playing queue according to the association relationship, thereby realizing synchronous playback of the audio and video.
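A minimal sketch of this queue-based synchronized playback, representing frames by their timestamps only and reusing the association mapping from the earlier embodiments; the class and method names are invented for illustration:

```python
import bisect

class SyncPlayer:
    """Audio frames drive playback; each popped audio frame carries along
    the video frames associated with its time synchronization region."""

    def __init__(self, assoc):
        self.audio_q = []   # audio frame play queue, kept sorted by timestamp
        self.video_q = []   # video frame play queue, kept sorted by timestamp
        self.assoc = assoc  # audio timestamp -> associated video timestamps

    def push_audio(self, ts):
        bisect.insort(self.audio_q, ts)  # enqueue in chronological order

    def push_video(self, ts):
        bisect.insort(self.video_q, ts)

    def play_next(self):
        """Pop the next audio frame plus its associated video frames.
        Calling this every 1/target_audio_frame_rate seconds gives the
        accelerated, normal, or slowed playback described earlier."""
        if not self.audio_q:
            return None
        a_ts = self.audio_q.pop(0)
        linked = set(self.assoc.get(a_ts, ()))
        videos = [v for v in self.video_q if v in linked]
        self.video_q = [v for v in self.video_q if v not in linked]
        return a_ts, videos

player = SyncPlayer({0: [0, 15], 40: [35, 60]})
for ts in (40, 0):
    player.push_audio(ts)
for ts in (60, 0, 15, 35):
    player.push_video(ts)
print(player.play_next())  # (0, [0, 15]): the oldest audio frame plays first
```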
In one embodiment, the method for determining the playing adjustment condition includes the following steps: acquiring playing test data sets corresponding to a plurality of candidate playing conditions, each playing test data set including at least one of a first buffering duration, a secondary buffering duration, a no-buffering rate, a number of buffering times per unit time, and a number of errors per unit time; calculating the playing quality corresponding to each candidate playing condition according to the playing test data sets; and determining, from the candidate playing conditions, the candidate playing condition with the highest playing quality as the playing adjustment condition.
The candidate playing condition includes at least one of a first threshold, a normal playing threshold, a linear increase coefficient, an inverse proportion coefficient, a preset proportion, a preset duration, a third threshold, a fourth threshold, and a stepwise decrease coefficient. The first buffering duration refers to the initial buffering time when a user opens a new live broadcast picture. The secondary buffering duration refers to the buffering that occurs while the user is watching the live broadcast picture, and may be the sum of all buffering time during playback other than the initial buffering. The no-buffering rate refers to the ratio of the buffer-free time during which the live broadcast picture plays smoothly to the total playing time while the user watches the complete live content. The number of buffering times per unit time refers to the number of secondary bufferings per unit time while the user watches the complete live content, for example, the average number of secondary bufferings per hour. The number of errors per unit time refers to the number of errors occurring in the live broadcast picture per unit time while the user watches the complete live content. A live broadcast picture error may be, for example, an audio frame loss or a video frame playing order error.
Specifically, different candidate playing conditions may be set at the playing terminal, and the playing terminal plays the live content under the different candidate playing conditions. While a user watches the live broadcast, the resulting playing test data may be collected by the live broadcast monitoring device. The playing terminal or the server may obtain, from the live broadcast monitoring device, the playing test data sets corresponding to the plurality of candidate playing conditions, each set including at least one of the first buffering duration, the secondary buffering duration, the no-buffering rate, the number of buffering times per unit time, and the number of errors per unit time. The playing terminal or the server may obtain the weight corresponding to each piece of playing test data, calculate a playing score for each piece of playing test data in the same set according to the data and its weight, and add the playing scores in the same set to obtain a total playing score for that set, namely the playing quality corresponding to one candidate playing condition. The playing quality corresponding to each of the other candidate playing conditions can be obtained in the same way. The playing terminal or the server may then determine, from the candidate playing conditions, the candidate playing condition with the highest playing quality as the playing adjustment condition.
In one embodiment, the playing score corresponding to the first buffering duration is max(0, min(100, -0.5 + 103.52829 × (0.00001/(1 + exp((X1 × 1.3 - 1.6)/0.8)) + (1 - 0.00001)/(1 + exp((X1 × 1.3 - 4.8)/1.2))))). The playing score corresponding to the secondary buffering duration is min(100, max(0, (-2.65489) × pow(10, -7) × exp(-min(24, X2/2 + 1.5)/-0.52007) - 8.55291 × exp(-min(24, X2/2 + 1.5)/-5.68698) + 112.19011)). The playing score corresponding to the no-buffering rate is min(100, max(0, 109.65485 - 111.02498/pow((1 - 0.01009 × min(100, (0.98 - X3) × 100)), (1/-1.82083)))). The playing score corresponding to the number of buffering times per unit time is min(100, max(0, min(100, -5.7 + 113.84829 × (0.45469/(1 + exp(((X4/2.2 + 1.5) × 10 - 24.48895)/5.64201)) + (1 - 0.45469)/(1 + exp(((X4/2.2 + 1.5) × 10 - 56.82314)/4.61486)))))). The playing score corresponding to the number of errors per unit time is min(100, max(0, -1084875 + 1084960 × exp(-X5 × 2.2/155685.63359) + 29.7503 × exp(-X5 × 2.2/0.8932))). Here exp denotes the exponential function, X1 denotes the first buffering duration, X2 denotes the secondary buffering duration, X3 denotes the no-buffering rate, X4 denotes the number of buffering times per unit time, X5 denotes the number of errors per unit time, and pow(a, b) denotes a raised to the power b.
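These score formulas can be transcribed directly into code. The sketch below follows the expressions above one-for-one; the equal-weight total is a stand-in, since the weights are left unspecified, and inputs are assumed to stay within each formula's valid domain:

```python
from math import exp

def s1(x1):  # score for the first buffering duration
    return max(0, min(100, -0.5 + 103.52829 * (
        0.00001 / (1 + exp((x1 * 1.3 - 1.6) / 0.8))
        + (1 - 0.00001) / (1 + exp((x1 * 1.3 - 4.8) / 1.2)))))

def s2(x2):  # score for the secondary buffering duration
    m = min(24, x2 / 2 + 1.5)
    return min(100, max(0, -2.65489e-7 * exp(-m / -0.52007)
                        - 8.55291 * exp(-m / -5.68698) + 112.19011))

def s3(x3):  # score for the no-buffering rate
    return min(100, max(0, 109.65485 - 111.02498
                        / pow(1 - 0.01009 * min(100, (0.98 - x3) * 100),
                              1 / -1.82083)))

def s4(x4):  # score for the number of buffering times per unit time
    t = (x4 / 2.2 + 1.5) * 10
    return min(100, max(0, min(100, -5.7 + 113.84829 * (
        0.45469 / (1 + exp((t - 24.48895) / 5.64201))
        + (1 - 0.45469) / (1 + exp((t - 56.82314) / 4.61486))))))

def s5(x5):  # score for the number of errors per unit time
    return min(100, max(0, -1084875 + 1084960 * exp(-x5 * 2.2 / 155685.63359)
                        + 29.7503 * exp(-x5 * 2.2 / 0.8932)))

def quality(x1, x2, x3, x4, x5, weights=(0.2,) * 5):
    """Weighted total play score; equal weights are an assumption here."""
    scores = (s1(x1), s2(x2), s3(x3), s4(x4), s5(x5))
    return sum(w * s for w, s in zip(weights, scores))

print(round(quality(2, 4, 0.9, 1, 0), 2))  # playing quality for one data set
```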
In this embodiment, playing test data sets corresponding to a plurality of candidate playing conditions are obtained, each set including at least one of the first buffering duration, the secondary buffering duration, the no-buffering rate, the number of buffering times per unit time and the number of errors per unit time; the playing quality corresponding to each candidate playing condition is calculated from the playing test data sets; and the candidate playing condition with the highest playing quality is determined as the playing adjustment condition. In this way, the playing adjustment condition can be determined from data of users actually watching live broadcasts, which improves the accuracy and practicability of the playing adjustment condition.
Fig. 9 is a flow diagram of a method for live data processing in one embodiment. With reference to fig. 9, a live data processing method is described, which includes the steps of:
1. Acquiring and buffering audio stream data and video stream data.
The live broadcast terminal can encode the live audio and video data to obtain audio stream data and video stream data, which together form live streaming data sent to the server. When the playing terminal is to play the live broadcast, it can acquire the live streaming data from the server, demultiplex the live streaming data to obtain the audio stream data and the video stream data, store the audio stream data in an audio jitter buffer (Audio Jitter), and store the video stream data in a video jitter buffer (Video Jitter).
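The demultiplexing step can be pictured in a few lines; the packet structure with a 'kind' field is an assumption made only for this sketch:

```python
from collections import deque

audio_buffer = deque()  # the Audio Jitter buffer
video_buffer = deque()  # the Video Jitter buffer

def demultiplex(live_stream):
    """Route each packet of the live stream into the matching jitter buffer."""
    for packet in live_stream:
        if packet["kind"] == "audio":
            audio_buffer.append(packet)
        else:
            video_buffer.append(packet)

demultiplex([{"kind": "audio", "ts": 0}, {"kind": "video", "ts": 0}])
print(len(audio_buffer), len(video_buffer))  # 1 1
```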
2. Determining the target audio frame rate according to the buffered data amount of the audio stream data.
When playback at the playing terminal stalls, the audio buffer and the video buffer of the playing terminal accumulate audio stream data and video stream data. When the buffered data amount of the audio stream data in the audio buffer meets the playing adjustment condition, the playing terminal reads the audio stream data from the audio buffer, reads the video stream data from the video buffer, and inputs both into the decoding layer for decoding (Decode), obtaining an audio frame set and a video frame set as well as the initial audio frame rate and the initial video frame rate. The playing terminal then inputs the audio frame set and the video frame set into the audio-picture synchronization layer for rendering (QGP Render), where an association relationship between audio frames and video frames is established according to the audio timestamps and the video timestamps, so that the video can subsequently be played synchronously with the audio according to the association relationship.
Referring to fig. 10, different buffered data amounts correspond to different playing adjustment conditions. When the network of the playing terminal is good and no stall occurs, the playing terminal plays the live content normally. When playback stalls, the playing terminal buffers the audio stream data and the video stream data in the audio buffer and the video buffer respectively. When the buffered data amount of the audio stream data in the audio buffer is greater than the first threshold, the audio playing mode is determined to be accelerated playing: the playing terminal determines a target frame rate increase ratio according to the buffered data amount, increases the initial audio frame rate by that ratio to obtain the target audio frame rate, and plays the live content at the accelerated rate. When the buffered data amount of the audio stream data in the audio buffer reaches the normal playing threshold, the audio playing mode is determined to be normal playing, and the playing terminal plays the live content at the initial audio frame rate. When the current stall time is longer than the preset duration and the buffered data amount of the audio stream data in the audio buffer is between the third threshold and the fourth threshold, the audio playing mode is determined to be slowed playing: the playing terminal determines a target frame rate reduction ratio according to the buffered data amount, reduces the initial audio frame rate by that ratio to obtain the target audio frame rate, and plays the live content at the reduced rate. When the buffered data amount of the audio stream data in the audio buffer meets none of these conditions, the playing terminal may pause playing the live content, or consume all the buffered audio stream data and video stream data at a certain speed.
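The threshold scheme of fig. 10 can be summarized as a small decision function. The threshold names follow the text above, while the sample values in the usage lines are invented purely for demonstration:

```python
def pick_mode(buffered, stall_ms, cfg):
    """Choose the audio playing mode from the buffered amount of audio
    stream data, following the conditions described above."""
    if buffered > cfg["first_threshold"]:
        return "accelerated"      # raise the initial audio frame rate
    if buffered >= cfg["normal_threshold"]:
        return "normal"           # play at the initial audio frame rate
    if (stall_ms > cfg["preset_duration_ms"]
            and cfg["third_threshold"] < buffered < cfg["fourth_threshold"]):
        return "slowed"           # lower the initial audio frame rate
    return "pause_or_drain"       # pause, or drain the buffers at some speed

cfg = {"first_threshold": 5000, "normal_threshold": 3000,
       "third_threshold": 500, "fourth_threshold": 1500,
       "preset_duration_ms": 2000}          # illustrative values only
print(pick_mode(6000, 0, cfg))              # accelerated
print(pick_mode(1000, 3000, cfg))           # slowed
```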
In addition, when the target video frame rate is greater than the screen refresh rate, frame dropping can be performed to reduce the target video frame rate. Specifically, when a plurality of video frames correspond to one audio frame, one video frame is selected and retained from the plurality of video frames and the remaining video frames are discarded, thereby reducing the target video frame rate. When each audio frame corresponds to only one video frame, the older video frames are preferentially discarded to reduce the target video frame rate.
3. Playing the audio stream data with the adjusted audio frame rate according to the target audio frame rate, and synchronously playing the video stream data associated with it.
The playing terminal stores the audio frames into the audio frame playing queue (audio frame queue) in chronological order, and stores the video frames into the video frame playing queue (video frame queue) in chronological order. When the target audio frame rate equals the initial audio frame rate, the playing terminal plays the audio frames in the audio frame playing queue normally according to the target audio frame rate and synchronously plays the video frames corresponding to the current audio frame in the video frame playing queue. When the target audio frame rate is greater than the initial audio frame rate, the playing terminal plays the audio frames at the accelerated rate and synchronously plays the corresponding video frames. When the target audio frame rate is less than the initial audio frame rate, the playing terminal plays the audio frames at the reduced rate and synchronously plays the corresponding video frames.
In the conventional technology, when a user encounters stalling or similar problems, the gap between the live content watched at the playing terminal and the real-time content grows, and the user has to reconnect to the server by refreshing the interface in order to catch up with the real-time content. With the live broadcast data processing method described here, however, the user does not need to reconnect to the server by refreshing the interface: the playing terminal can catch up by accelerating playback, at a playing speed acceptable to the user, according to the buffered data amount of the audio data, thereby reducing the gap between the played content and the real-time content and reducing the playing delay of the playing terminal.
The application also provides an application scenario to which the above live data processing method is applied. As shown in fig. 11, taking a live broadcast of a shooting game as an example, spectators can watch the live game screen through the playing terminal. Because the game being broadcast by the live broadcast terminal progresses continuously, when playback at the playing terminal stalls, the gap between the game picture played by the playing terminal and the game picture broadcast live by the live broadcast terminal grows larger and larger. During a stall, the playing terminal buffers the game audio stream data and the game video stream data in the audio buffer and the video buffer respectively. When the network of the playing terminal recovers, the playing terminal receives a large amount of game audio stream data and game video stream data, so the buffered data amounts in the audio buffer and the video buffer increase. When the buffered data amount of the game audio stream data in the audio buffer is greater than the first threshold, the playing terminal can accelerate the playing of the game audio stream data, with the game video stream data automatically following, so that the gap between the live broadcast terminal and the playing terminal shrinks gradually. Specifically, the playing terminal may increase the initial audio frame rate of the game audio stream data according to the buffered data amount to obtain the target audio frame rate, for example increasing the initial audio frame rate by a factor of 1.185, play the game audio stream data at the adjusted audio frame rate, and synchronously play the corresponding game video stream data.
The application additionally provides another application scenario to which the live data processing method is applied. As shown in fig. 12, taking live singing as an example, the anchor can broadcast singing live through the live broadcast terminal, and the audience can watch it through the playing terminal. When playback at the playing terminal stalls, the audio buffer and the video buffer accumulate singing audio stream data and singing video stream data. When the buffered data amount of the singing audio stream data in the audio buffer is greater than the first threshold, the playing terminal can accelerate the playing of the singing audio stream data, for example at 1.1 times speed, which the audience cannot perceive, with the singing video stream data automatically following, so that the playing delay of the playing terminal decreases gradually and the current live progress of the anchor is caught up. In addition, when the target video frame rate does not exceed the screen refresh rate, the singing video stream data can follow the singing audio stream data without frame loss, ensuring that the audience watches the complete singing picture and hears the complete singing audio through the playing terminal. When the target video frame rate exceeds the screen refresh rate, the playing terminal can discard part of the singing video stream data, while still ensuring that the audience hears the complete singing audio, so as to reduce the target video frame rate and avoid playing errors.
It should be understood that, although the steps in the above flowcharts are shown in an order indicated by the arrows, these steps are not necessarily performed in that order. Unless explicitly stated otherwise herein, the execution of these steps is not strictly limited in order, and the steps may be performed in other orders. Moreover, at least some of the steps in the above flowcharts may include multiple sub-steps or stages, which are not necessarily performed at the same moment but may be performed at different moments, and their execution order is not necessarily sequential; they may be performed in turns or alternately with other steps or with at least some of the sub-steps or stages of other steps.
In one embodiment, as shown in fig. 13, there is provided a live data processing apparatus, which may be a part of a computer device using a software module or a hardware module, or a combination of the two, and specifically includes: a data obtaining module 1302, a frame rate adjusting module 1304, and a data playing module 1306, wherein:
a data obtaining module 1302, configured to obtain live streaming data and write the live streaming data into the live broadcast buffer, the live streaming data including audio stream data and video stream data.
The frame rate adjustment module 1304 is configured to, when the cached data amount of the live streaming data in the live broadcast buffer meets the play adjustment condition, adjust the initial audio frame rate of the audio streaming data in the live broadcast buffer according to the cached data amount, so as to obtain the target audio frame rate.
The data playing module 1306 is configured to play the audio stream data according to the target audio frame rate, and synchronously play the video stream data corresponding to the audio stream data with the adjusted audio frame rate.
In one embodiment, the data acquisition module is further configured to: shunt the live streaming data to obtain the audio stream data and the video stream data; write the audio stream data into an audio buffer area in the live broadcast buffer; and write the video stream data into a video buffer area in the live broadcast buffer.
In one embodiment, the frame rate adjustment module is further configured to: read audio stream data from the audio buffer area and decode it to obtain an audio frame set corresponding to the audio stream data, the audio frames in the audio frame set carrying audio timestamps; read video stream data from the video buffer area and decode it to obtain a video frame set corresponding to the video stream data, the video frames in the video frame set carrying video timestamps; and establish an association relationship between the audio frames and the video frames according to the audio timestamps and the video timestamps.
In one embodiment, the frame rate adjustment module is further configured to: when the cached data amount of the audio stream data in the audio buffer area is greater than a first threshold, determine a target frame rate increase ratio according to the cached data amount of the audio stream data, the first threshold being larger than the normal playing threshold; and increase the initial audio frame rate according to the target frame rate increase ratio to obtain the target audio frame rate.
In one embodiment, the frame rate adjustment module is further configured to: when the cached data amount of the audio stream data in the audio buffer area is greater than the first threshold and less than a second threshold, increase the target frame rate increase ratio linearly with the increase of the cached data amount; when the cached data amount of the audio stream data in the audio buffer area is greater than or equal to the second threshold, increase the target frame rate increase ratio nonlinearly with the increase of the cached data amount, the acceleration of the increase being inversely proportional to the increase of the cached data amount; and control the maximum value of the target frame rate increase ratio to be less than or equal to a preset ratio.
In one embodiment, the frame rate adjustment module is further configured to: when the current stall time is longer than the preset duration and the cached data amount of the audio stream data in the audio buffer area is between the third threshold and the fourth threshold, determine a target frame rate reduction ratio according to the cached data amount of the audio stream data, the third threshold being smaller than the fourth threshold and the fourth threshold being smaller than or equal to the normal playing threshold; and reduce the initial audio frame rate according to the target frame rate reduction ratio to obtain the target audio frame rate.
In one embodiment, the frame rate adjustment module is further configured to reduce the target frame rate reduction ratio in a stepwise manner as the cached data amount of the audio stream data decreases.
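The two ratio curves handled by the frame rate adjustment module can be sketched as follows: a linear segment between the first and second thresholds, then logarithmic growth whose slope falls off as 1/x (so the acceleration of the increase is inversely proportional to the cached data amount), clipped at a preset maximum; and a stepwise reduction. All coefficients and thresholds here are demonstration values:

```python
import math

def increase_ratio(buffered, first=3000, second=5000, cap=1.5):
    """Target frame rate increase ratio as a function of the cached amount."""
    if buffered <= first:
        return 1.0
    if buffered < second:
        # linear segment between the first and second thresholds
        return 1.0 + 0.2 * (buffered - first) / (second - first)
    # nonlinear segment: slope ~ 1 / (buffered - second + 1), clipped at cap
    return min(cap, 1.2 + 0.05 * math.log1p(buffered - second))

def decrease_ratio(buffered, steps=((1500, 0.95), (1000, 0.90), (500, 0.85))):
    """Target frame rate reduction ratio, dropping one step at a time as the
    cached amount of audio stream data shrinks."""
    for threshold, ratio in steps:
        if buffered >= threshold:
            return ratio
    return steps[-1][1]

print(increase_ratio(4000), increase_ratio(9000))  # 1.1 1.5 (capped)
print(decrease_ratio(1200), decrease_ratio(300))   # 0.9 0.85
```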
In one embodiment, the frame rate adjustment module is further configured to: determine a time synchronization region corresponding to each audio frame with the audio timestamp as a central point; and, when the video timestamp of at least one video frame falls within the time synchronization region of the audio frame, establish an association relationship between the video frame and the audio frame corresponding to the same time synchronization region.
In one embodiment, the frame rate adjustment module is further configured to: determine an initial video frame rate according to a decoding result of the video stream data; adjust the initial video frame rate according to the target audio frame rate to obtain a target video frame rate; and, when the target video frame rate is greater than the video frame rate threshold and the video timestamps of a plurality of video frames fall within the same time synchronization region, select one video frame from the plurality of video frames as the associated video frame of the current audio frame corresponding to that time synchronization region, establish an association relationship between the current audio frame and the associated video frame, and stop the selection when the current video frame rate is less than or equal to the video frame rate threshold.
In one embodiment, the frame rate adjustment module is further configured to: determine an updated video frame rate according to the screening result; and, when the updated video frame rate is greater than the video frame rate threshold, filter the screened video frames in order of video timestamp from earliest to latest, and stop the filtering when the current video frame rate is less than or equal to the video frame rate threshold.
In one embodiment, the data playing module is further configured to: store the audio frames into the audio frame playing queue in sequence according to the chronological order of the audio timestamps; store the video frames into the video frame playing queue in sequence according to the chronological order of the video timestamps; and play the audio frames in the audio frame playing queue according to the target audio frame rate while synchronously playing the video frames corresponding to the current audio frame in the video frame playing queue according to the association relationship.
In one embodiment, the frame rate adjustment module is further configured to: obtain playing test data sets corresponding to a plurality of candidate playing conditions, each playing test data set including at least one of a first buffering duration, a secondary buffering duration, a no-buffering rate, a number of buffering times per unit time and a number of errors per unit time; calculate the playing quality corresponding to each candidate playing condition according to the playing test data sets; and determine, from the candidate playing conditions, the candidate playing condition with the highest playing quality as the playing adjustment condition.
The live broadcast data processing device acquires live streaming data and writes it into the live broadcast buffer, the live streaming data including audio stream data and video stream data; when the cached data amount of the live streaming data in the live broadcast buffer meets the playing adjustment condition, it adjusts the initial audio frame rate of the audio stream data in the live broadcast buffer according to the cached data amount to obtain the target audio frame rate, plays the audio stream data according to the target audio frame rate, and synchronously plays the video stream data corresponding to the audio stream data with the adjusted audio frame rate. Thus, when playback at the playing end stalls, live streaming data accumulates in the live broadcast buffer; once the cached data amount meets the playing adjustment condition, the playing speed of the audio data is adjusted according to the cached data amount, and the audio data is played at an accelerated speed, without frame loss, according to the target audio frame rate. This reduces the playing delay of the playing end while preserving the integrity of the audio content in the live content, and narrows the gap between the content played at the playing end and the content broadcast live at the live end. In addition, the video data is played synchronously following the audio data, so that the audio and the video remain in sync.
For specific limitations of the live data processing apparatus, reference may be made to the limitations of the live data processing method above, which are not repeated here. Each module in the live data processing apparatus may be implemented in whole or in part by software, hardware, or a combination of the two. The modules may be embedded in hardware form in, or independent of, a processor in the computer device, or may be stored in software form in a memory of the computer device, so that the processor can invoke and execute the operations corresponding to the modules.
In one embodiment, a computer device is provided, which may be a terminal, and its internal structure diagram may be as shown in fig. 14. The computer device includes a processor, a memory, a communication interface, a display screen, and an input device connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The communication interface of the computer device is used for carrying out wired or wireless communication with an external terminal, and the wireless communication can be realized through WIFI, an operator network, NFC (near field communication) or other technologies. The computer program is executed by a processor to implement a live data processing method. The display screen of the computer equipment can be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer equipment can be a touch layer covered on the display screen, a key, a track ball or a touch pad arranged on the shell of the computer equipment, an external keyboard, a touch pad or a mouse and the like.
Those skilled in the art will appreciate that the architecture shown in fig. 14 is merely a block diagram of some of the structures associated with the disclosed aspects and is not intended to limit the computing devices to which the disclosed aspects apply; a particular computing device may include more or fewer components than those shown, combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is further provided, which includes a memory and a processor, the memory stores a computer program, and the processor implements the steps of the above method embodiments when executing the computer program.
In an embodiment, a computer-readable storage medium is provided, in which a computer program is stored which, when being executed by a processor, carries out the steps of the above-mentioned method embodiments.
In one embodiment, a computer program product or computer program is provided that includes computer instructions stored in a computer-readable storage medium. The computer instructions are read by a processor of a computer device from a computer-readable storage medium, and the computer instructions are executed by the processor to cause the computer device to perform the steps in the above-mentioned method embodiments.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware; the computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, database or other medium used in the embodiments provided herein can include at least one of non-volatile and volatile memory. Non-volatile memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash memory, optical storage, or the like. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM can take many forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM), among others.
The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as there is no contradiction in a combination of these technical features, it should be considered to be within the scope of this specification.
The above embodiments express only several implementations of the present application, and their descriptions are relatively specific and detailed, but they should not be construed as limiting the scope of the invention patent. It should be noted that a person skilled in the art can make several variations and improvements without departing from the concept of the present application, and these all fall within the protection scope of the present application. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (15)

1. A live data processing method, characterized in that the method comprises:
acquiring live streaming data, and writing the live streaming data into a live broadcast buffer; the live streaming data comprises audio stream data and video stream data;
when the cached data amount of the live streaming data in the live broadcast buffer meets a playing adjustment condition, adjusting an initial audio frame rate of the audio stream data in the live broadcast buffer according to the cached data amount to obtain a target audio frame rate;
and playing the audio stream data according to the target audio frame rate, and synchronously playing the video stream data corresponding to the audio stream data with the adjusted audio frame rate.
2. The method of claim 1, wherein the acquiring live streaming data and writing the live streaming data into a live broadcast buffer comprises:
shunting the live streaming data to obtain the audio stream data and the video stream data;
writing the audio stream data into an audio buffer in the live broadcast buffer;
and writing the video stream data into a video buffer area in the live broadcast buffer area.
3. The method of claim 2, wherein before adjusting the initial audio frame rate of the audio stream data in the live broadcast buffer according to the cached data amount to obtain the target audio frame rate, the method further comprises:
reading audio stream data from the audio buffer area, and decoding the audio stream data to obtain an audio frame set corresponding to the audio stream data; audio frames in the audio frame set carry audio time stamps;
reading video stream data from the video buffer area, and decoding the video stream data to obtain a video frame set corresponding to the video stream data; video frames in the video frame set carry video timestamps;
and establishing an association relation between the audio frame and the video frame according to the audio time stamp and the video time stamp.
4. The method as claimed in claim 3, wherein, when the cached data amount of the live streaming data in the live broadcast buffer meets the playing adjustment condition, the adjusting the initial audio frame rate of the audio stream data in the live broadcast buffer according to the cached data amount to obtain the target audio frame rate includes:
when the cached data amount of the audio stream data in the audio buffer area is larger than a first threshold, determining a target frame rate increase ratio according to the cached data amount of the audio stream data; the first threshold is larger than a normal playing threshold;
and increasing the initial audio frame rate according to the target frame rate increase ratio to obtain the target audio frame rate.
5. The method of claim 4, wherein the determining a target frame rate increase ratio according to the cached data amount of the audio stream data comprises:
when the cached data amount of the audio stream data in the audio buffer area is larger than the first threshold and smaller than a second threshold, increasing the target frame rate increase ratio linearly with the increase of the cached data amount;
when the cached data amount of the audio stream data in the audio buffer area is greater than or equal to the second threshold, increasing the target frame rate increase ratio nonlinearly with the increase of the cached data amount, the acceleration of the increase of the target frame rate increase ratio being inversely proportional to the increase of the cached data amount;
and controlling the maximum value of the target frame rate increase ratio to be less than or equal to a preset ratio.
6. The method as claimed in claim 3, wherein, when the cached data amount of the live streaming data in the live broadcast buffer meets the playing adjustment condition, the adjusting the initial audio frame rate of the audio stream data in the live broadcast buffer according to the cached data amount to obtain the target audio frame rate includes:
when the current stall time is longer than a preset duration and the cached data amount of the audio stream data in the audio buffer area is between a third threshold and a fourth threshold, determining a target frame rate reduction ratio according to the cached data amount of the audio stream data; the third threshold is smaller than the fourth threshold, and the fourth threshold is smaller than or equal to the normal playing threshold;
and reducing the initial audio frame rate according to the target frame rate reduction ratio to obtain the target audio frame rate.
7. The method of claim 6, wherein the determining a target frame rate reduction ratio according to the cached data amount of the audio stream data comprises:
reducing the target frame rate reduction ratio in a stepwise manner with the reduction of the cached data amount of the audio stream data.
8. The method of claim 3, wherein the associating of the audio frame and the video frame according to the audio time stamp and the video time stamp comprises:
determining a time synchronization area corresponding to each audio frame by taking the audio time stamp as a central point;
and when the video time stamp of at least one video frame falls in the time synchronization area of the audio frame, establishing an association relation between the video frame and the audio frame corresponding to the same time synchronization area.
9. The method according to claim 8, wherein before associating the video frame and the audio frame corresponding to the same time synchronization region when the video timestamp of at least one video frame falls within the time synchronization region of the audio frame, the method further comprises:
determining an initial video frame rate according to a decoding result of the video stream data;
adjusting the initial video frame rate according to the target audio frame rate to obtain a target video frame rate;
when the target video frame rate is greater than a video frame rate threshold and the video timestamps of a plurality of video frames fall within the same time synchronization region, screening one video frame from the plurality of video frames as an associated video frame of a current audio frame corresponding to the same time synchronization region, establishing an association relationship between the current audio frame and the associated video frame, and stopping the screening when the current video frame rate is less than or equal to the video frame rate threshold.
10. The method of claim 9, further comprising:
determining an updated video frame rate according to the screening result;
and when the updated video frame rate is greater than the video frame rate threshold, filtering the screened video frames in order of the video timestamps from earliest to latest, and stopping the filtering when the current video frame rate is less than or equal to the video frame rate threshold.
11. The method of claim 3, wherein the playing the audio stream data according to the target audio frame rate, and synchronously playing the video stream data corresponding to the audio stream data with the adjusted audio frame rate comprises:
sequentially storing the audio frames into an audio frame play queue according to the time sequence of the audio time stamps;
sequentially storing the video frames into a video frame play queue according to the time sequence of the video timestamps;
and playing the audio frames in the audio frame playing queue according to the target audio frame rate, and synchronously playing the video frames corresponding to the current audio frame in the video frame playing queue according to the association relation.
12. The method according to claim 1, wherein the method for determining the playing adjustment condition comprises the following steps:
acquiring playing test data sets corresponding to a plurality of candidate playing conditions; each playing test data set comprises at least one of a first buffering duration, a secondary buffering duration, a no-buffering rate, a number of buffering times per unit time and a number of errors per unit time;
calculating the playing quality corresponding to each candidate playing condition according to the playing test data set;
and determining the candidate playing condition with the highest playing quality from the candidate playing conditions as the playing adjustment condition.
13. A live data processing apparatus, characterized in that the apparatus comprises:
the data acquisition module is used for acquiring live streaming data and writing the live streaming data into a live broadcast buffer; the live streaming data comprises audio stream data and video stream data;
the frame rate adjusting module is used for adjusting the initial audio frame rate of the audio stream data in the live broadcast buffer according to the cached data amount when the cached data amount of the live streaming data in the live broadcast buffer meets the playing adjustment condition, so as to obtain a target audio frame rate;
and the data playing module is used for playing the audio stream data according to the target audio frame rate and synchronously playing the video stream data corresponding to the audio stream data with the adjusted audio frame rate.
14. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method of any of claims 1 to 12.
15. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 12.
CN202010812124.5A 2020-08-13 2020-08-13 Live broadcast data processing method and device, computer equipment and storage medium Active CN111918093B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010812124.5A CN111918093B (en) 2020-08-13 2020-08-13 Live broadcast data processing method and device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010812124.5A CN111918093B (en) 2020-08-13 2020-08-13 Live broadcast data processing method and device, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111918093A true CN111918093A (en) 2020-11-10
CN111918093B CN111918093B (en) 2021-10-26

Family

ID=73284498

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010812124.5A Active CN111918093B (en) 2020-08-13 2020-08-13 Live broadcast data processing method and device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111918093B (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108965807A (en) * 2018-07-17 2018-12-07 深圳市共进电子股份有限公司 Control method for playing back, device, terminal and the storage medium of real-time video
CN110062277A (en) * 2019-03-13 2019-07-26 北京河马能量体育科技有限公司 A kind of audio-video automatic synchronous method and synchronization system
CN111294634A (en) * 2020-02-27 2020-06-16 腾讯科技(深圳)有限公司 Live broadcast method, device, system, equipment and computer readable storage medium

Cited By (41)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113395592A (en) * 2020-11-18 2021-09-14 腾讯科技(北京)有限公司 Video playing control method, device, equipment and computer storage medium
CN114554277A (en) * 2020-11-24 2022-05-27 腾讯科技(深圳)有限公司 Multimedia processing method, device, server and computer readable storage medium
CN114554277B (en) * 2020-11-24 2024-02-09 腾讯科技(深圳)有限公司 Multimedia processing method, device, server and computer readable storage medium
CN112533016B (en) * 2020-11-27 2021-12-14 四川弘和通讯有限公司 Method for triggering mobile phone HTML5 low-delay live video based on early warning information
CN112533016A (en) * 2020-11-27 2021-03-19 四川弘和通讯有限公司 Method for triggering mobile phone HTML5 low-delay live video based on early warning information
CN112822502B (en) * 2020-12-28 2022-06-07 阿里巴巴(中国)有限公司 Live broadcast jitter removal intelligent caching and live broadcast method, equipment and storage medium
CN112822502A (en) * 2020-12-28 2021-05-18 阿里巴巴(中国)有限公司 Live broadcast jitter removal intelligent caching and live broadcast method, equipment and storage medium
CN112788360B (en) * 2020-12-30 2023-06-20 北京达佳互联信息技术有限公司 Live broadcast method, live broadcast device and computer program product
CN112788360A (en) * 2020-12-30 2021-05-11 北京达佳互联信息技术有限公司 Live broadcast method, live broadcast device and computer program product
CN112866745A (en) * 2020-12-31 2021-05-28 广州穗能通能源科技有限责任公司 Streaming media video data processing method and device, computer equipment and storage medium
CN112954376A (en) * 2021-02-04 2021-06-11 海信电子科技(武汉)有限公司 Video playing method and display equipment
CN114974321B (en) * 2021-02-27 2023-11-03 华为技术有限公司 Audio playing method, equipment and system
CN114974321A (en) * 2021-02-27 2022-08-30 华为技术有限公司 Audio playing method, equipment and system
CN113068075A (en) * 2021-03-23 2021-07-02 北京有竹居网络技术有限公司 Live broadcast stream pushing method and device, computer equipment and storage medium
CN113225598B (en) * 2021-05-07 2023-01-20 上海一谈网络科技有限公司 Method, device and equipment for synchronizing audio and video of mobile terminal and storage medium
CN113225598A (en) * 2021-05-07 2021-08-06 上海一谈网络科技有限公司 Method, device and equipment for synchronizing audio and video of mobile terminal and storage medium
CN113490055B (en) * 2021-07-06 2023-09-19 三星电子(中国)研发中心 Data processing method and device
CN113490055A (en) * 2021-07-06 2021-10-08 三星电子(中国)研发中心 Data processing method and device
CN113596546B (en) * 2021-07-26 2023-09-08 Vidaa(荷兰)国际控股有限公司 Multi-stream program playing method and display device
CN113596546A (en) * 2021-07-26 2021-11-02 海信电子科技(深圳)有限公司 Multi-stream program playing method and display device
CN113645477A (en) * 2021-08-09 2021-11-12 杭州米络星科技(集团)有限公司 Live broadcast data processing method and device, live broadcast client equipment and storage medium
WO2023036092A1 (en) * 2021-09-13 2023-03-16 北京字跳网络技术有限公司 Audio playback method and device
CN113992598A (en) * 2021-10-27 2022-01-28 远景智能国际私人投资有限公司 Method and device for uploading streaming data, access equipment and storage medium
CN113992598B (en) * 2021-10-27 2023-12-05 远景智能国际私人投资有限公司 Streaming data uploading method and device, access equipment and storage medium
CN114245196B (en) * 2021-12-08 2024-04-19 卓米私人有限公司 Screen recording and stream pushing method and device, electronic equipment and storage medium
CN114245196A (en) * 2021-12-08 2022-03-25 卓米私人有限公司 Screen recording and stream pushing method and device, electronic equipment and storage medium
CN114827679A (en) * 2022-04-19 2022-07-29 海信视像科技股份有限公司 Display device and audio-video synchronization method
CN114866814A (en) * 2022-06-09 2022-08-05 上海哔哩哔哩科技有限公司 Network bandwidth allocation method and device
CN114866814B (en) * 2022-06-09 2024-04-30 上海哔哩哔哩科技有限公司 Network bandwidth allocation method and device
CN115103229A (en) * 2022-06-22 2022-09-23 深圳市腾客科技有限公司 Real-time display method of high-frame-rate video
CN115103229B (en) * 2022-06-22 2023-03-14 深圳市腾客科技有限公司 Real-time display method of high-frame-rate video
CN115086732B (en) * 2022-07-20 2022-11-08 南京百家云科技有限公司 Method and device for synchronizing audio and video data
CN115086732A (en) * 2022-07-20 2022-09-20 南京百家云科技有限公司 Audio and video data synchronization method and device
CN115866309B (en) * 2022-11-29 2023-09-22 广州后为科技有限公司 Audio and video caching method and device supporting multi-channel video synchronization
CN115866309A (en) * 2022-11-29 2023-03-28 广州后为科技有限公司 Audio and video caching method and device supporting multi-channel video synchronization
CN116033225A (en) * 2023-03-16 2023-04-28 深圳市微浦技术有限公司 Digital signal processing method, device, equipment and storage medium based on set top box
CN117041669A (en) * 2023-09-27 2023-11-10 湖南快乐阳光互动娱乐传媒有限公司 Super-resolution control method and device for video stream and electronic equipment
CN117041669B (en) * 2023-09-27 2023-12-08 湖南快乐阳光互动娱乐传媒有限公司 Super-resolution control method and device for video stream and electronic equipment
CN117221617A (en) * 2023-09-28 2023-12-12 杭州星犀科技有限公司 Live broadcast stream pushing system, method and computer storage medium
CN117082281A (en) * 2023-10-17 2023-11-17 苏州元脑智能科技有限公司 Audio and video data synchronous processing method, system, equipment and medium
CN117082281B (en) * 2023-10-17 2024-02-23 苏州元脑智能科技有限公司 Audio and video data synchronous processing method, system, equipment and medium

Also Published As

Publication number Publication date
CN111918093B (en) 2021-10-26

Similar Documents

Publication Publication Date Title
CN111918093B (en) Live broadcast data processing method and device, computer equipment and storage medium
CN111294634B (en) Live broadcast method, device, system, equipment and computer readable storage medium
CN109714634B (en) Decoding synchronization method, device and equipment for live data stream
US11317143B2 (en) Dynamic reduction in playout of replacement content to help align end of replacement content with end of replaced content
CN107690073B (en) Video live broadcast method and video live broadcast server
US20080086570A1 (en) Digital content buffer for adaptive streaming
US10638180B1 (en) Media timeline management
US11503366B2 (en) Dynamic playout of transition frames while transitioning between play out of media streams
US20070217505A1 (en) Adaptive Decoding Of Video Data
JP2018078583A (en) Trick play in digital video streaming
CN111372138A Low-latency live broadcast scheme on the player side
US11388472B2 (en) Temporal placement of a rebuffering event
CN114189700A Live broadcast stuttering prompt method and device, computer equipment and storage medium
CN112788360A (en) Live broadcast method, live broadcast device and computer program product
JP2007312122A (en) Network receiver
CN111866533B (en) Live broadcast transcoding method and device
JP6711120B2 (en) Video playback device, video playback method, and video playback program
KR102153801B1 (en) Method and apparatus of video streaming
JPWO2006040827A1 (en) Transmitting apparatus, receiving apparatus, and reproducing apparatus
AU2005248864A1 (en) Adaptive decoding of video data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant