CN115883855B - Playing data processing method, device, computer equipment and storage medium - Google Patents

Playing data processing method, device, computer equipment and storage medium

Info

Publication number: CN115883855B (application CN202111123072.1A)
Authority: CN (China)
Prior art keywords: data, data stream, frame, target, stream
Legal status: Active
Application number: CN202111123072.1A
Other languages: Chinese (zh)
Other versions: CN115883855A
Inventors: 吴昊, 张亮, 肖志宏, 李玉龙
Current Assignee: Tencent Technology Shenzhen Co Ltd
Original Assignee: Tencent Technology Shenzhen Co Ltd
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202111123072.1A
Publication of CN115883855A
Application granted
Publication of CN115883855B
Legal status: Active


Landscapes

  • Two-Way Televisions, Distribution Of Moving Picture Or The Like

Abstract

The application relates to a play data processing method and apparatus, a computer device, and a storage medium. The method comprises the following steps: receiving a first data stream corresponding to a target play object from a first transmission channel, and buffering the first data stream in a first data buffer; receiving a second data stream from a second transmission channel, and buffering the second data stream in a second data buffer; storing the data stream in the first data buffer into a convergence buffer; when data loss is determined in the first data stream, switching the data stream buffer source corresponding to the convergence buffer from the first data buffer to the second data buffer, and determining a convergence starting position corresponding to the second data stream according to the data missing position; storing the data stream in the second data buffer into the convergence buffer, starting from the convergence starting position; and obtaining the play data corresponding to the target play object based on the converged data stream in the convergence buffer. With this method, the fluency of video playback can be improved.

Description

Playing data processing method, device, computer equipment and storage medium
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a method and apparatus for processing play data, a computer device, and a storage medium.
Background
With the development of internet technology, streaming media technology has emerged: media such as audio or video are played continuously and in real time over a network using streaming. Streaming is applied in many scenarios, for example in applications and live broadcast. In streaming media transmission, data loss often occurs, causing playback anomalies such as stuttering when playing live video.
In the conventional technology, the played data may jump backward or forward, so the smoothness of playback is low.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a play data processing method, apparatus, computer device, and storage medium capable of improving smoothness of data play.
A play data processing method, the method comprising: receiving a first data stream corresponding to a target playing object from a first transmission channel, and caching the first data stream through a first data cache region; receiving a second data stream corresponding to the target playing object from a second transmission channel, and caching the second data stream through a second data cache region; taking the first data buffer area as a data stream buffer source corresponding to an aggregation buffer area, and storing the data stream in the first data buffer area into the aggregation buffer area; when the first data stream is determined to have data missing, switching a data stream cache source corresponding to the convergence cache region from the first data cache region to the second data cache region, determining a data missing position corresponding to the first data stream, and determining a convergence starting position corresponding to the second data stream according to the data missing position; starting from a convergence starting position corresponding to the second data stream, storing the data stream in the second data buffer area into the convergence buffer area; and obtaining the playing data corresponding to the target playing object based on the converged data stream in the converged buffer area.
A play data processing device, the device comprising: the first data stream receiving module is used for receiving a first data stream corresponding to a target playing object from a first transmission channel and caching the first data stream through a first data cache area; the second data stream receiving module is used for receiving a second data stream corresponding to the target playing object from a second transmission channel and caching the second data stream through a second data cache area; the first data stream aggregation module is used for taking the first data buffer area as a data stream buffer source corresponding to the aggregation buffer area and storing the data stream in the first data buffer area into the aggregation buffer area; the convergence starting position determining module is used for switching a data stream buffer source corresponding to the convergence buffer area from the first data buffer area to the second data buffer area when determining that the first data stream has data missing, determining a data missing position corresponding to the first data stream, and determining a convergence starting position corresponding to the second data stream according to the data missing position; the second data stream convergence module is used for storing the data streams in the second data buffer area into the convergence buffer area from the convergence starting position corresponding to the second data streams; and the play data obtaining module is used for obtaining the play data corresponding to the target play object based on the converged data stream in the converged buffer area.
In some embodiments, the first data stream and the second data stream are encoded data streams; the convergence starting position determining module comprises: a target data position determining unit, configured to obtain a target data position corresponding to the data missing position in the second data stream; a frame coding type obtaining unit, configured to obtain a frame coding type of a target data frame corresponding to the target data position in the second data stream; and the convergence starting position determining unit is used for determining a position determining strategy according to the frame coding type and determining a convergence starting position corresponding to the second data flow according to the target data position and the position determining strategy.
In some embodiments, the convergence start location determination unit is further configured to: when the frame coding type is a non-coding reference frame, determining the position determining strategy as a coding data group skipping strategy; and skipping the target coding data set corresponding to the target data position based on the coding data set skipping strategy, and taking the position of the coding reference frame in the backward coding data set corresponding to the target coding data set as the convergence starting position corresponding to the second data stream.
In some embodiments, the convergence starting position determining unit is further configured to determine that the position determining policy is a position maintaining policy when the frame coding type is a coding reference frame; and taking the target data position as a convergence starting position corresponding to the second data flow based on the position maintaining strategy.
In some embodiments, the play data obtaining module includes: the target data length acquisition unit is used for acquiring the target data length corresponding to the transcoding data group; an updated aggregate data flow obtaining unit, configured to insert a reference indication frame into the aggregate data flow in the aggregate buffer according to the target data length, to obtain an updated aggregate data flow; a transcoded data stream obtaining unit, configured to determine a transcoded reference frame based on the reference indication frame in a transcoding process, and transcode a transcoded data set where the transcoded reference frame is located in the updated aggregate data stream based on the transcoded reference frame, so as to obtain a transcoded data stream; and the play data obtaining unit is used for obtaining the play data corresponding to the target play object according to the transcoding data stream.
In some embodiments, the transcoded data stream obtaining unit is further configured to decode based on the updated aggregate data stream to obtain a decoded data stream; in the process of encoding the decoded data stream, when a reference indication frame is detected, taking a backward adjacent data frame of the reference indication frame as a transcoding reference frame, and carrying out intra-frame encoding on the transcoding reference frame to obtain an intra-frame encoded frame; and transcoding the transcoding data group where the transcoding reference frame is located based on the intra-frame coding frame to obtain transcoding data in the transcoding data stream, wherein the transcoding data group where the transcoding reference frame is located comprises the data frame with the target data length.
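The following is a minimal sketch, under assumptions, of the mechanism in the two paragraphs above: a reference indication frame is inserted once per target data length, and during re-encoding the frame backward-adjacent to each marker is intra-coded as the transcoding reference frame. All names are hypothetical, and a real implementation would call an actual codec rather than emit frame-type labels:

```python
from dataclasses import dataclass

@dataclass
class Frame:
    pts: int
    is_marker: bool = False  # True for an inserted reference indication frame

def insert_reference_markers(frames, target_len):
    """Insert a reference indication frame before every `target_len` frames."""
    out = []
    for i, f in enumerate(frames):
        if i % target_len == 0:
            out.append(Frame(pts=f.pts, is_marker=True))
        out.append(f)
    return out

def transcode(frames):
    """Emit frame types: the frame after each marker becomes an intra-coded frame."""
    out, force_intra = [], False
    for f in frames:
        if f.is_marker:
            force_intra = True  # its backward adjacent frame is the transcoding reference frame
            continue            # the marker itself is not emitted
        out.append(("I" if force_intra else "P", f.pts))
        force_intra = False
    return out

stream = insert_reference_markers([Frame(pts=i) for i in range(6)], target_len=3)
assert [t for t, _ in transcode(stream)] == ["I", "P", "P", "I", "P", "P"]
```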
In some embodiments, the first data stream and the second data stream are encoded data streams, and the second data stream aggregation module includes: an information switching indication frame generating unit, configured to obtain a second coding parameter corresponding to the second data stream, and generate an information switching indication frame based on the second coding parameter; the information switching indication frame is used for indicating that in the decoding process, if the information switching indication frame is detected, the second coding parameter is used as a new coding parameter, so that the backward data stream of the information switching indication frame is decoded based on the second coding parameter; and the data stream converging unit is used for inserting the information switching indication frame into the tail end position of the first data stream in the converging buffer area, starting from the converging starting position corresponding to the second data stream, storing the data stream in the second data buffer area into the converging buffer area, and taking the data stream as the backward data stream of the information switching indication frame.
In some embodiments, the play data obtaining module includes: a first decoding unit, configured to decode, in the process of decoding the aggregate data stream, based on a first coding parameter corresponding to the first data stream; a second decoding unit, configured to extract a second coding parameter from the information switching instruction frame when the information switching instruction frame is detected, switch the coding parameter referred to by decoding from the first coding parameter to the second coding parameter, and perform decoding based on the second coding parameter; and the coding unit is used for uniformly coding the decoded data stream obtained by decoding to obtain the playing data corresponding to the target playing object.
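A minimal sketch of the parameter switch during decoding described above, assuming the coding parameters are opaque objects (such as sequence/picture parameter data); the dictionary-based frame representation and all function names are illustrative, not the patent's data model:

```python
def decode_frame(payload, params):
    # Placeholder for a real codec call; pairs the payload with the active parameters.
    return (payload, params)

def decode_aggregate_stream(frames, first_params):
    """Decode with the first stream's coding parameters until an information
    switching indication frame is met, then continue with the second stream's."""
    params = first_params
    for f in frames:
        if "switch_params" in f:
            params = f["switch_params"]  # adopt the second coding parameters
            continue                     # the indication frame itself yields no output
        yield decode_frame(f["payload"], params)

stream = [{"payload": "a"}, {"switch_params": "params2"}, {"payload": "b"}]
assert list(decode_aggregate_stream(stream, "params1")) == [("a", "params1"), ("b", "params2")]
```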
In some embodiments, the apparatus further comprises: the live view angle set acquisition module is used for acquiring a live view angle set corresponding to a target live scene and establishing a cache group corresponding to each live view angle in the live view angle set; the cache group comprises data cache areas corresponding to the live broadcast devices corresponding to the live broadcast viewing angles and convergence cache areas; the live broadcast visual angle corresponds to a plurality of live broadcast devices; the data buffer area corresponding to any one of the live broadcast devices in the buffer group is a first data buffer area, and the convergence starting position determining module is further configured to select a data buffer area other than the first data buffer area from the buffer group corresponding to the first data buffer area as the second data buffer area when determining that the first data stream has data missing, and switch a data stream buffer source corresponding to the convergence buffer area from the first data buffer area to the second data buffer area.
In some embodiments, the target playing object is a target live video corresponding to a target live scene, and the first data stream receiving module is further configured to establish a first transmission channel with a first shooting device corresponding to the target live scene, and receive a first video stream transmitted by the first shooting device through the first transmission channel and a target scene identifier corresponding to the target live scene; taking the first video stream as a main video stream corresponding to the target live video based on the target scene identification; the second data stream receiving module is further configured to establish a second transmission channel with a second shooting device corresponding to the target live broadcast scene, and receive a second video stream transmitted by the second shooting device through the second transmission channel and a target scene identifier corresponding to the target live broadcast scene; and taking the second video stream as a backup video stream corresponding to the target live video based on the target scene identification.
A computer device comprising a memory storing a computer program and a processor implementing the steps of the above-described play data processing method when the processor executes the computer program.
A computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the above-described play data processing method.
In some embodiments, a computer program product or computer program is provided that includes computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the computer device performs the steps in the above-described method embodiments.
According to the play data processing method and apparatus, computer device, and storage medium described above, the first data stream of the target play object is buffered in the first data buffer, the second data stream of the target play object is buffered in the second data buffer, and the data stream in the first data buffer is stored into the convergence buffer. When it is determined that the first data stream has data missing, the data stream buffer source corresponding to the convergence buffer is switched from the first data buffer to the second data buffer, the convergence starting position corresponding to the second data stream is determined according to the data missing position corresponding to the first data stream, and the data stream in the second data buffer is stored into the convergence buffer starting from that convergence starting position. Because the data streams corresponding to the same target play object are transmitted through the first and second transmission channels respectively and buffered in channel-specific buffer spaces, the data stream of the play object can be obtained by using the first data buffer as the data stream buffer source of the convergence buffer. When the data from the current buffer source becomes abnormal, the source can be switched immediately and accurately to the second data buffer, the convergence starting position corresponding to the second data stream is determined based on the data missing position, and buffering continues from that position. This reduces data loss in the convergence buffer, so the play data obtained based on the converged data stream is more complete and data playback is smoother.
Drawings
FIG. 1 is a diagram of an application environment for a play data processing method in some embodiments;
FIG. 2 is a flow chart of a method for processing play data in some embodiments;
FIG. 3 is a graph of the relationship between the time stamps of data frames and the ordering of the data frames in some embodiments;
FIG. 4 is a schematic diagram of an aggregate data flow obtained in some embodiments;
FIG. 5 is a diagram of a live streaming technology framework in some embodiments;
fig. 6 is a schematic diagram of a main backup stream switching in a live streaming technology in some embodiments;
fig. 7 is a schematic diagram of a main backup stream switching in a live streaming technology in some embodiments;
fig. 8 is a schematic diagram of a main backup stream switching in a live streaming technology in some embodiments;
fig. 9 is a schematic diagram of a main backup stream switching in a live streaming technology in some embodiments;
FIG. 10 is a schematic diagram of a primary backup flow switch in some embodiments;
fig. 11A is a schematic diagram of a frame format of an SEI frame in some embodiments;
FIG. 11B is a diagram of a resulting transcoded data stream in some embodiments;
FIG. 12 is a block diagram of a play data processing device in some embodiments;
FIG. 13 is an internal block diagram of a computer device in some embodiments;
FIG. 14 is an internal block diagram of a computer device in some embodiments.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.
The play data processing method provided by the application can be applied to an application environment shown in fig. 1, where the application environment includes a first terminal 102, a server 104 and a second terminal 106. Wherein the first terminal 102, the server 104, and the second terminal 106 communicate via a network. Various applications, for example, an application for live broadcasting may be installed on the first terminal 102 and the second terminal 106, the first terminal 102 may send the collected live broadcasting data to the server 104, the first terminal 102 may transmit the collected live broadcasting data to the server 104 through different transmission channels, for example, may transmit the collected live broadcasting data to the server 104 through 2 or more transmission channels. The server 104 may also receive live broadcast data transmitted by other terminals, which may be capturing devices in the same live broadcast scene as the first terminal 102, for example, may be multimedia capturing devices in the live broadcast scene of the same concert, where the multimedia capturing devices may have at least one of a video capturing function or an audio capturing function. The second terminal 106 may be a terminal that views live. The server 104 may receive live broadcast data transmitted by the same first terminal 102 through different transmission channels, process live broadcast data from different transmission channels, obtain processed live broadcast data, and send the processed live broadcast data to the second terminal 106 that views live broadcast.
Specifically, the server 104 may receive, from the first transmission channel, a first data stream corresponding to the target play object sent by the first terminal 102, and receive, through the second transmission channel, a second data stream corresponding to the target play object sent by the first terminal 102. The server 104 may buffer the first data stream through a first data buffer, buffer the second data stream through a second data buffer, and aggregate the data streams in an aggregation buffer based on the first data stream and the second data stream. For example, the server 104 may store the data stream in the first data buffer into the aggregation buffer with the first data buffer as a data stream buffer source corresponding to the aggregation buffer, switch the data stream buffer source corresponding to the aggregation buffer from the first data buffer to the second data buffer when determining that the first data stream has a data loss, determine a data loss position corresponding to the first data stream, determine an aggregation start position corresponding to the second data stream according to the data loss position, and store the data stream in the second data buffer into the aggregation buffer from the aggregation start position corresponding to the second data stream. The server 104 may transcode the aggregate data stream in the aggregate buffer to obtain a transcoded data stream, and the server 104 may send the transcoded data stream to the second terminal 106. There may be a plurality of second terminals 106, and the server 104 may distribute the processed data stream to each of the second terminals 106. The server 104 may be, for example, a server where a streaming media background resides.
In some embodiments, the first terminal 102 may be a terminal corresponding to a live broadcast user; such a terminal may be referred to as the anchor end or the audio/video data source. The second terminal 106 may be a terminal corresponding to a user watching the live broadcast, which may be referred to as the viewer end. During live broadcasting, the first terminal 102 may collect audio data or video data, encode the collected data, and send the encoded audio or video data to the server 104; the server 104 may transcode the received encoded data and send the transcoded audio or video data to the second terminal 106.
The terminal 102 may be, but is not limited to, a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, a smart watch, a vehicle-mounted terminal, a smart television, etc. The server 104 may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing cloud computing services. The terminal and the server may be directly or indirectly connected through wired or wireless communication, which is not limited herein.
In some embodiments, as shown in fig. 2, a play data processing method is provided, and the method is applied to the server 104 in fig. 1 for illustration, and includes the following steps:
s202, a first data stream corresponding to a target playing object is received from a first transmission channel, and the first data stream is buffered through a first data buffer area.
The play object refers to an object to be played, and may be video data or audio data to be played, for example video or audio data in a live broadcast. When the play object is video data, it may include a plurality of video frames, which may also be referred to as image frames; when the play object is audio data, it may include a plurality of audio frames. Video frames and audio frames are collectively referred to as data frames, that is, a data frame may be at least one of an audio frame or an image frame. The data frames in the play object may be encoded data frames, i.e. data obtained after encoding: an encoded video frame obtained by encoding a collected video frame, or an encoded audio frame obtained by encoding a collected audio frame. Each encoded data frame in the play data may correspond to a play time, which indicates the order of the play data; the earlier the play time, the earlier the frame is played. When the data frame is a video frame, the play time may be the display time, for example a presentation time stamp (PTS); the encoded data frames in the play object may be arranged in display-time order, the earlier the display time, the more forward the position in the play object. Each encoded data frame in the play data may also have a decoding time, for example a decoding time stamp (DTS). Within the play data, different data frames have different display times and different decoding times. The decoding time stamp indicates the decoding order of the data frames (the earlier the DTS, the earlier the frame is decoded), and the display time stamp indicates the display order (the earlier the PTS, the earlier the frame is displayed). As shown in fig. 3, for a GOP including 15 video frames, the display order coincides with the order of PTS and the decoding order coincides with the order of DTS. The target play object may be any play object. Data frames may also carry sequence numbers, for example the sequence numbers in RTP.
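As a minimal sketch of the timestamp model just described (the class and field names are illustrative, not from the patent), the following shows how DTS orders frames for decoding while PTS orders them for display:

```python
from dataclasses import dataclass

@dataclass
class EncodedFrame:
    seq: int  # sequence number, e.g. an RTP-style sequence number
    dts: int  # decoding time stamp: the order in which frames are decoded
    pts: int  # presentation (display) time stamp: the order in which frames are shown

# A B-frame pattern makes decode order and display order differ, as in fig. 3.
frames = [EncodedFrame(seq=0, dts=0, pts=2),
          EncodedFrame(seq=1, dts=1, pts=0),
          EncodedFrame(seq=2, dts=2, pts=1)]
decode_order = sorted(frames, key=lambda f: f.dts)
display_order = sorted(frames, key=lambda f: f.pts)
```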
The data stream refers to a form of the target play object in the transmission process, and the data stream includes data in the target play object. The first data stream refers to data received through the first transmission channel.
The data buffer is a storage space for buffering data, and may be, for example, a storage area in a memory. The first data buffer is used for buffering the data stream transmitted over the first transmission channel. The data stored in a data buffer may be updated continuously: during a first period, the data stream received in that period is stored; during a second period, the data stream received in the second period is stored. The data buffer may hold the data stream received within a target time period, which may be preset or set as needed, for example the data stream received within the last minute. The first data buffer may be created when the server determines that the first transmission channel has been established, or when it determines that the first transmission channel is carrying data. The server may establish a correspondence between the first transmission channel and the first data buffer. The data frames in the first data buffer may be arranged by play time: the earlier the play time, the more forward the position in the first data buffer.
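As a rough illustration (a sketch with hypothetical names, not the patent's implementation), a data buffer that holds only the frames received within a target time window might look like this:

```python
from collections import deque
from dataclasses import dataclass

@dataclass
class EncodedFrame:
    pts: int  # play time (display time stamp)

class DataBuffer:
    """Keeps only the frames whose play time falls within the target time period."""
    def __init__(self, window_ms: int = 60_000):  # e.g. the last minute of stream time
        self.window_ms = window_ms
        self.frames = deque()  # ordered by play time: earlier pts, more forward

    def push(self, frame: EncodedFrame) -> None:
        self.frames.append(frame)
        # drop frames whose play time has fallen outside the window
        while self.frames and frame.pts - self.frames[0].pts > self.window_ms:
            self.frames.popleft()
```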
Specifically, the first transmission channel may be a channel established between the first terminal and the server and used for transmitting data, the target playing object is a playing object obtained by the first terminal, the first terminal may transmit the target playing object to the server through the first transmission channel, the server may receive a first data stream corresponding to the target playing object transmitted by the first transmission channel, and store the first data stream in the first data buffer area. For example, the server may be provided with a data stream receiving module corresponding to the first transmission channel, where the data stream receiving module corresponding to the first transmission channel is referred to as a first data stream receiving module. When the server acquires the first data stream corresponding to the target playing object, the server can receive the data stream transmitted by the first transmission channel through the first data stream receiving module, and store the received data stream into the first data buffer area.
In some embodiments, the first terminal is a first shooting device in the target live broadcast scene. The first terminal may capture the target live broadcast scene to obtain first scene data, and encode the first scene data to obtain the target play object. The shooting device may be capable of collecting audio data or video data. The data may be encoded in multiple encoding modes; for video data, for example, the first terminal may encode the video data into the H.264 or H.265 format. H.264 and H.265 are highly compressed digital video codec standards proposed by the Joint Video Team (JVT), formed by the ITU-T (ITU Telecommunication Standardization Sector) Video Coding Experts Group (VCEG) together with the ISO (International Organization for Standardization) / IEC (International Electrotechnical Commission) Moving Picture Experts Group (MPEG). For audio data, the first terminal may encode the audio data in the AAC (Advanced Audio Coding) format or the MP3 (Moving Picture Experts Group Audio Layer III) format. AAC is an MPEG-2-based audio coding technology and is the mainstream audio coding (and decoding) format in current live streaming media.
In some embodiments, the target play object may be data obtained by encoding and encapsulating the data collected by the first terminal. Specifically, the first terminal may encode the collected data and encapsulate the encoded data to obtain the target play object; for example, the first terminal may encapsulate the encoded data into the FLV format. FLV (Flash Video) is a streaming media format developed alongside Flash MX and may be applied in live streaming media technology; an FLV file consists of a series of tags encapsulating audio, video, media description information, and the like. The server may unpack, decode, re-encode, and repackage the first data stream and send it to the second terminals; for example, the server may repackage the data into RTMP/FLV/HLS or DASH and distribute it to each second terminal for playing.
In some embodiments, the first terminal may transmit the first data stream to the server through a preset transmission protocol, which may be at least one of RTMP, RTP, SRT, HLS, or DASH. RTMP (Real-Time Messaging Protocol) is a network protocol for real-time data communication, mainly used for audio/video and data communication between the Flash/AIR platform (a cross-operating-system runtime platform) and streaming media or interactive servers supporting RTMP. RTP (Real-time Transport Protocol) is used in streaming media systems, paired with the RTSP (Real Time Streaming Protocol) protocol or used directly to transport TS (Transport Stream) streams; it is also used in video conferencing systems and is a technical foundation of the IP telephony industry. RTP is used together with its control protocol RTCP (Real-time Transport Control Protocol) and runs over the UDP (User Datagram Protocol) protocol. SRT (Secure Reliable Transport) is an open-source, royalty-free UDP-based transport protocol formulated by Haivision together with Wowza; it aims to solve, safely and reliably, the high delay and poor jitter resistance of TCP (Transmission Control Protocol) over long-distance links, and to optimize live streaming media scenarios. HLS (HTTP Live Streaming) is Apple's HTTP-based adaptive-bitrate streaming protocol, mainly used for audio and video services on PCs (Personal Computers) and Apple terminals; it comprises an m3u(8) index file, TS media segment files, and key encryption files. HTTP refers to the Hypertext Transfer Protocol. DASH (Dynamic Adaptive Streaming over HTTP) is mainly used to distribute MPEG media content efficiently over HTTP in an adaptive, progressive, download, or streaming manner. For example, the first terminal may encode at least one of audio data or video data, encapsulate the encoded data, and send the encapsulated data to the server.
S204, receiving a second data stream corresponding to the target playing object from the second transmission channel, and caching the second data stream through the second data cache area.
The second data stream refers to the data stream corresponding to the target play object received through the second transmission channel. The target play object may include data generated by a plurality of data stream pushing devices; data generated by the same pushing device may be output by the same encoder, and data generated by different pushing devices may be output by different encoders. A data stream pushing device can collect data, process it, and push the encoded data to the server as a data stream; for example, it may be the first terminal. When the target play object is a target live video of a target live scene, the scene includes a plurality of pushing devices, such as shooting devices, each of which generates and sends live data to the server, so the target play object may include data generated by each of these devices. The first terminal may be a data stream pushing device in the target live scene. The first data stream and the second data stream may be two data streams pushed by the same pushing device, for example two streams pushed by the first terminal. They may also be pushed by different devices: for example, the first data stream is pushed by the first terminal and the second data stream by a third terminal different from the first terminal, where the third terminal may also be a pushing device in the target live scene. The second data buffer is used for buffering the data stream transmitted over the second transmission channel. The data frames in the second data buffer may be arranged by play time: the earlier the play time, the more forward the position in the second data buffer. The second data buffer is different from the first data buffer. Since data may be lost during transmission, the data stored in the first data buffer may be incomplete, and so may the data stored in the second data buffer; because the two streams travel over different channels, the first data buffer may contain the data missing from the second data buffer, and the second data buffer may contain the data missing from the first data buffer. The second data buffer may be created when the server determines that the second transmission channel has been established, or when it determines that the second transmission channel is carrying data. The server may establish a correspondence between the second transmission channel and the second data buffer.
Specifically, the server may receive the second data stream corresponding to the target play object transmitted by the second transmission channel, and store the second data stream in the second data buffer area. The server may be provided with a data stream receiving module corresponding to the second transmission channel, where the data stream receiving module corresponding to the second transmission channel is called a second data stream receiving module. The server may receive the data stream transmitted by the second transmission channel through the second data stream receiving module, and store the received data stream in the second data buffer area.
S206, taking the first data buffer area as a data stream buffer source corresponding to the convergence buffer area, and storing the data stream in the first data buffer area into the convergence buffer area.
The data stream cache source refers to a source of data in the convergence cache region. The data stream buffer sources corresponding to the convergence buffer area can be the same or different at different moments, and at the same moment, the convergence buffer area corresponds to one data stream buffer source. The data stream buffer sources corresponding to the convergence buffer area can be changed continuously, for example, when the data in the first data buffer area is discontinuous, the data stream buffer sources can be switched to other data buffer areas. The convergence buffer zone can be preset or created when the server receives the convergence buffer zone creation instruction. The data streams may correspond to a data stream type, which may include at least one of a primary data stream, which may also be referred to as a primary stream, or a backup data stream, which may also be referred to as a backup stream. The primary data stream may be preferentially taken as the data stream cache source when determining the data stream cache source, e.g., when the first data stream is the primary stream and the second data stream is the backup stream. Of course, one of the main stream and the backup stream may be selected randomly as the data stream buffer source. The server may switch the data stream cache sources in real time based on the stream state and stream quality.
Specifically, an aggregation module may be provided in the server, where the aggregation module may include a first data buffer area, a second data buffer area, and an aggregation buffer area, and the server may aggregate data in the first data buffer area and the second data buffer area into the aggregation buffer area.
In some embodiments, when data is stored in the aggregation buffer, the server may store the data according to the play time corresponding to the data frame, where the earlier the play time is, the earlier the stored order is, that is, the data frames with earlier play times are preferentially stored, and in the storing process, when it is determined that the time interval between the play times of the data frames in the first data buffer is greater than the standard time interval, it is determined that there is a loss of data in the first data buffer, and when it is determined that there is a loss, the server may acquire the lost data frame from other data buffers, and store the lost data frame in the aggregation buffer, so that the data in the aggregation buffer is complete. The standard time interval refers to a time interval between two adjacent frames of data in the non-missing data stream.
In some embodiments, the server may count the number of transmission channels corresponding to the target play object, and as the number of target channels, when determining that the number of target channels is greater than the threshold of the number of channels, trigger an aggregation buffer creation instruction, and create an aggregation buffer corresponding to the target play object. The threshold number of channels can be preset or set as required, for example, can be 2.
S208, when the data loss of the first data stream is determined, switching the data stream buffer source corresponding to the convergence buffer area from the first data buffer area to the second data buffer area, determining the data loss position corresponding to the first data stream, and determining the convergence starting position corresponding to the second data stream according to the data loss position.
Wherein the data missing position refers to the position of the missing data frame in the first data stream. The missing data frame refers to a data frame that is missing in the first data stream. The playing time corresponding to different data frames in the data stream is different, the position of the data frame can be identified by the playing time, that is, the data missing position can be identified by the playing time, for example, when the data frame is a video frame, the display time of the missing data frame can be used for identifying the data missing position. The data missing locations may include data missing locations corresponding to one or more missing data frames.
Specifically, the server may take first data frames from the first data buffer in play-time order and store them into the aggregation buffer. For example, the server takes the current first data frame from the first data buffer and, based on its play time, determines the play time of its backward adjacent data frame, called the backward adjacent play time. The server then tries to obtain, from the first data buffer, the data frame whose play time is the backward adjacent play time. If this succeeds, the obtained frame is stored into the aggregation buffer; if it fails, the server determines that the backward adjacent data frame is missing from the first data buffer, that is, that the first data stream has data loss. The current first data frame may be any data frame in the first data buffer. The backward adjacent data frame of the current first data frame is the frame that, when there is no data loss and the frames are arranged from front to back by play time, comes immediately after the current first data frame.
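A sketch of this loss check, under the assumption of a constant standard time interval between adjacent frames; the names and the list-of-play-times representation are illustrative only:

```python
def find_missing_pts(buffer_pts, standard_interval):
    """Return the play time of the first backward adjacent frame that cannot be
    obtained from the buffer, i.e. the data missing position, or None."""
    present = set(buffer_pts)
    for pts in sorted(present):
        expected = pts + standard_interval  # backward adjacent play time
        if expected <= max(present) and expected not in present:
            return expected
    return None

# e.g. with a 40 ms frame interval, the frame at 480 ms is missing:
assert find_missing_pts([400, 440, 520, 560], 40) == 480
```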
In some embodiments, the data frames in the second data buffer may be arranged by play time, and data missing positions may be identified by play time; different play times correspond to different missing positions. When obtaining data from the second data buffer, the server may take frames in play-time order, earliest first. The server may take the data missing position itself as the convergence starting position; for example, when the first data stream and the second data stream are both pushed to the server by the first terminal, that is, when they come from the same encoder, the data missing position may be used directly as the convergence starting position. Alternatively, the server may calculate the convergence starting position from the data missing position; for example, a position in the second data buffer whose difference from the data missing position is smaller than a position difference threshold may be used. The position difference threshold may be preset or set as needed, for example 1 second. For instance, when the first data stream and the second data stream are pushed to the server by different devices (say, the first data stream by the first terminal and the second data stream by the second terminal), the server may take the data buffered in the second data buffer as second buffered data, take the positions of its data frames as comparison positions, compute the position difference between the data missing position and each comparison position, take the comparison position whose difference is smaller than the position difference threshold as the target data position, and determine the convergence starting position of the second data stream based on the target data position. The target data position may be used directly as the convergence starting position, or the convergence starting position may be determined from the frame encoding type of the data frame at the target data position: when that frame is of the intra-frame encoding type, the server may use the target data position as the convergence starting position; when it is of another encoding type, the server may use, as the convergence starting position, the position of the first data frame in the second data stream that lies after the target data position and is of the intra-frame encoding type.
When the data frame is a video frame, the frame coding type may be at least one of an intra-coded frame, a forward predictive coded frame, or a bidirectionally predictive interpolated coded frame: the intra-coded frame may also be referred to as a key frame or I frame, the forward predictive coded frame as a P frame, and the bidirectionally predictive coded frame as a B frame. I frames, P frames, and B frames are data frames obtained after encoding. An I frame contains a complete image and can be decoded without reference to any other frame, using only its own data, so it carries a large amount of data. Decoding a P frame requires reference to a frame preceding it, and decoding a B frame requires reference to frames both preceding and following it, so P frames and B frames carry less data. I frames include ordinary I frames and IDR (Instantaneous Decoding Refresh) frames. An IDR frame is the beginning of a coding sequence, i.e. a sequence of coded video frames, which may also be referred to as a group of pictures (GOP): a set of video frame data in a video stream whose length is the frame interval between two IDR frames. When the decoder reads an IDR frame, it refreshes the relevant coding parameter information, and frames after the IDR frame are decoded without reference to frames before it.
In some embodiments, the server may obtain the play times of the data frames in the first buffered data and of those in the second buffered data, and calibrate the play times of the second buffered data against those of the first buffered data to obtain calibrated second buffered data. For example, the display time stamps of the frames in the second buffered data may be calibrated against the display time stamps of the frames in the first buffered data so that the two sets of frames share a unified display timeline: the server may obtain the display time stamp of each first data frame and each second data frame and, when calibrating a given second data frame, find among the first data frames' display time stamps the one with the smallest difference from that second data frame's display time stamp, and use it as the calibrated display time stamp of the second data frame. The first buffered data refers to the data buffered in the first data buffer, and the second buffered data to the data buffered in the second data buffer; a first data frame is a data frame in the first buffered data, and a second data frame is a data frame in the second buffered data.
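A minimal sketch of this calibration, assuming timestamps are plain integers in the same unit; the nearest-neighbour search mirrors the smallest-difference rule described above:

```python
def calibrate(second_pts_list, first_pts_list):
    """For each display time stamp of the second stream, adopt the closest
    display time stamp found in the first stream."""
    return [min(first_pts_list, key=lambda p: abs(p - pts))
            for pts in second_pts_list]

# e.g. the second stream's clock is offset by a couple of milliseconds:
assert calibrate([402, 442], [400, 440, 480]) == [400, 440]
```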
In some embodiments, when the server determines that the first data stream has data missing, the data stream buffer source corresponding to the aggregation buffer area is switched from the first data buffer area to the second data buffer area, the data missing position corresponding to the first data stream is determined, and the aggregation starting position corresponding to the calibrated second buffer data is determined according to the data missing position. Specifically, the server may use the data missing position as the convergence starting position, or determine the frame coding type of the data frame corresponding to the data missing position from the second buffer data, when the frame coding type is an intra-frame coding frame, use the data missing position as the convergence starting position, when the frame coding type is not an intra-frame coding frame, determine the data frame corresponding to the data missing position from the second buffer data as the missing data frame, determine the intra-frame coding frame located after the missing data frame and closest to the missing data frame from the second buffer data, and use the position corresponding to the intra-frame coding frame as the convergence starting position.
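A sketch of this selection rule, under the assumption that frames can be represented as (play time, frame coding type) pairs (an illustration, not the patent's data model):

```python
def convergence_start(second_frames, missing_pts):
    """Play time of the first intra-coded frame at or after the data missing
    position; equals missing_pts itself when that frame is an I frame."""
    for pts, frame_type in sorted(second_frames):
        if pts >= missing_pts and frame_type == "I":
            return pts
    return None

# the frame at the missing position is a P frame, so skip to the next I frame:
assert convergence_start([(480, "P"), (520, "B"), (560, "I")], 480) == 560
```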
In this embodiment, by using a timestamp-based consistency algorithm, the audio and video data acquired by the stream processing module is neither duplicated nor lost under multi-path transmission despite jitter of the uplink streams, which prevents frame rollback, jumps, or stuttering when the player plays.
S210, starting from a convergence starting position corresponding to the second data stream, storing the data stream in the second data buffer area into the convergence buffer area.
Specifically, the server may, starting from the convergence starting position, take second data frames from the data in the second data buffer (the second buffered data) and store them into the aggregation buffer, arranged in the order in which they are stored; the data stream stored later is placed after the data stream previously stored in the aggregation buffer, so that play data corresponding to the target play object can be obtained based on the aggregated data stream in the aggregation buffer. As shown in fig. 4, each rectangular box represents a data frame, and a box marked "I" is an intra-frame encoded frame. Frames 13 to 16 are missing from the first data stream but present in the second data stream, so frames 1 to 12 of the first data stream are stored into the aggregation buffer first, and then frames 13 to 16 of the second data stream are aggregated into it, yielding an aggregated data stream whose data frames are continuous, with none missing.
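Continuing the sketches above (and reusing the hypothetical convergence_start helper), the fig. 4 scenario might be aggregated roughly as follows; all data here is made up for illustration:

```python
# (pts, frame_type) pairs; frames 13-16 are absent from the first stream
first_buffer = [(i, "I" if i == 1 else "P") for i in range(1, 13)]          # 1..12
second_buffer = [(i, "I" if i in (1, 13) else "P") for i in range(1, 17)]  # 1..16

missing_pts = 13                                        # gap detected in the first stream
start = convergence_start(second_buffer, missing_pts)   # 13: an I frame in the second stream
aggregate = first_buffer + [f for f in second_buffer if f[0] >= start]
assert [f[0] for f in aggregate] == list(range(1, 17))  # continuous, none missing
```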
In some embodiments, when determining that the data frames in the second data buffer have a data loss, the server may switch the data stream buffer source corresponding to the aggregation buffer to a third data buffer, which may be the first data buffer or a buffer other than the first and second data buffers. The third data buffer stores a data stream corresponding to the target play object; for example, the server may receive a third data stream corresponding to the target play object from a third transmission channel and buffer it in the third data buffer.
S212, obtaining the play data corresponding to the target play object based on the converged data stream in the convergence buffer.
Specifically, the play data is data for play. The play data may be data obtained after the decoding process. The play data may be data obtained after the second terminal decodes the data, for example, the server may send the aggregate data stream to the second terminal, and the second terminal may decode the aggregate data stream to obtain playable data for playing. Or, the server may transcode the aggregate data stream to obtain a transcoded data stream, and send the transcoded data stream to the second terminal, for example, the server may include a stream processing module, where the stream processing module has a transcoding function, and the server may transcode the aggregate data stream by using the stream processing module to obtain the transcoded data stream.
In some embodiments, the server may divide the transcoded data stream to obtain a plurality of divided data streams, and send each divided data stream to the second terminal. The second terminal can decode the transcoded data stream to obtain playable data for playing. Wherein when there are a plurality of second terminals, the server may distribute the transcoded data stream to each of the second terminals.
In some embodiments, the target play object is a target live video in a target live scene, the first data stream comes from the first terminal, and the first terminal is the anchor end in the live streaming media technology. The server may be the server where the streaming media background resides. The anchor end is the audio/video data source, i.e. the device that collects audio and video data; it may encode the collected audio and video data, encapsulate the encoded audio and video, and transmit it to the streaming media background through a transport protocol, and the background distributes the encapsulated data to the viewer end, where it is played through a player. For example, the encapsulated data may be distributed to the viewer end by a content delivery network (Content Delivery Network, CDN). Fig. 5 shows an architecture diagram of the live streaming technology, in which the stream receiving node receives the audio and video data sent by the anchor end.
In some embodiments, the first terminal is a main broadcasting end in a live broadcasting streaming media technology, the main broadcasting end belongs to a data stream pushing device, the main broadcasting end can push multiple paths of streams when pushing streams, the multiple paths refer to at least two paths, and the first data stream and the second data stream can be two paths of audio streams or video streams pushed by the same main broadcasting end. The push flow means that the main broadcasting end pushes locally collected audio and video flows to a server where a streaming media background is located, and each path of pushed audio and video flows are received through different access nodes, so that the number of the access nodes can be multiple, each path of pushed audio and video flows can comprise a main flow and a standby flow, the access node receiving the main flow can be called a main flow access node, the access node receiving the standby flow can be called a standby flow access node, and a flow processing module can switch between the main flow and the standby flow, for example, can switch in real time according to the flow state and quality. The first data stream receiving module for receiving the first data stream may be an access node, for example, a main stream access node, and the second data stream receiving module for receiving the second data stream may also be an access node, for example, a standby stream access node.
As shown in fig. 6, the anchor end pushes 2 streams. The access nodes include uplink access point A and uplink access point B; uplink access point A receives the main stream and uplink access point B receives the backup stream. The stream processing module obtains the main stream from uplink access point A, and when the main stream fails or stalls severely it can automatically switch to receiving the backup stream: for example, in combination with scheduling, it disconnects from the main-stream access node (uplink access point A), establishes a connection with the backup-stream access node (uplink access point B), and obtains the audio and video data from uplink access point B.
In some embodiments, within the server, an aggregation module may be disposed between the access nodes and the stream processing module. The aggregation module obtains real-time audio and video data from the main-stream and backup-stream access nodes and aggregates the data obtained from both. As shown in fig. 7, an aggregation module (labelled the aggregation/switching module) is disposed between the uplink access points and the stream processing module. In fig. 7, by unifying the timestamps and GOP sequences of the main stream and the backup stream, switching between them can be made imperceptible and smooth for the downlink.
In the above play data processing method, a first data stream of a target play object is cached in a first data buffer and a second data stream of the same object is cached in a second data buffer, and the data stream in the first data buffer is stored into a convergence buffer. When it is determined that the first data stream has missing data, the data stream cache source of the convergence buffer is switched from the first data buffer to the second data buffer, a convergence starting position for the second data stream is determined from the data-missing position of the first data stream, and the data stream in the second data buffer is stored into the convergence buffer starting from that position. Because the data streams of the same target play object are transmitted over the first and second transmission channels and cached in per-channel buffers, the play object's data can be obtained with the first data buffer acting as the cache source of the convergence buffer; as soon as the cache source's data becomes abnormal, the convergence buffer can immediately and accurately switch to the second data buffer, determine the convergence starting position from the data-missing position, and continue caching from there. This reduces missing data in the convergence buffer, and since the play data of the target play object is obtained from the aggregate data stream in the convergence buffer, the integrity of the play data and the fluency of playback are improved.
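The switching logic just described can be illustrated with a short sketch. This is a minimal model under stated assumptions, not the patent's implementation: the Frame type, the 1.5x jitter tolerance, and gap detection via display timestamps are all illustrative choices.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Frame:
    pts: float          # display timestamp in seconds
    payload: bytes = b""

def first_missing_pts(frames: List[Frame], frame_duration: float) -> Optional[float]:
    """Return the display timestamp of the first gap in a buffered stream,
    or None when no frame is missing (1.5x tolerance absorbs timing jitter)."""
    for prev, cur in zip(frames, frames[1:]):
        if cur.pts - prev.pts > 1.5 * frame_duration:
            return prev.pts + frame_duration
    return None

def aggregate(primary: List[Frame], backup: List[Frame],
              frame_duration: float) -> List[Frame]:
    """Store the first stream into the convergence buffer; on a detected gap,
    switch the cache source to the second stream and resume from the frame at
    (or after) the data-missing position."""
    missing = first_missing_pts(primary, frame_duration)
    if missing is None:
        return list(primary)
    merged = [f for f in primary if f.pts < missing]   # intact prefix
    start = next((i for i, f in enumerate(backup) if f.pts >= missing),
                 len(backup))                          # convergence start
    merged.extend(backup[start:])
    return merged
```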
Currently in live streaming, when there is only one upstream stream, multipath forwarding can be adopted inside the cloud system (the streaming background system) to improve its availability, combined with geographically distributed deployment for disaster recovery, as shown in fig. 8. In some scenes, such as live broadcasts of sports events or concerts, multiple cameras (camera positions), data acquisition systems and encoding push devices are deployed on site, and the uplink access points receive audio and video data pushed by different push ends: as shown in fig. 9, uplink access point A receives the audio and video data sent by push end A, uplink access point B receives the data sent by push end B, the stream processing module obtains the data pushed by push end A from uplink access point A, and when that data fails the stream processing module switches to obtaining the data pushed by push end B from uplink access point B. However, whether the push end pushes multiple streams or the streaming background performs internal multipath disaster recovery, the switching time makes it impossible to guarantee that the data before and after a main/backup switch is accurately aligned, so a user at the play end may see the picture roll back, jump forward or stall, which affects the viewing experience at the viewer end. As shown in fig. 10, if the main stream is switched to the backup stream at 102 seconds (s), the data obtained from the backup stream may start from 101 seconds, either because the backup stream still retains earlier frames or because of a video GOP cache (buffering) mechanism, producing a rollback.
The play data processing method provided by the application supports multiple identically encoded input streams. When the main stream fails over to the backup stream, the switch is smooth at the stream receiving end and aligned at the frame level, so downlink viewers are unaffected: the picture does not roll back, jump or stall, and main/backup switching becomes smoother and lossless.
In some embodiments, the first data stream and the second data stream are encoded data streams, and determining the data-missing position of the first data stream and, from it, the convergence starting position of the second data stream includes: obtaining the target data position in the second data stream that corresponds to the data-missing position; obtaining the frame encoding type of the target data frame at the target data position in the second data stream; and determining a position determination policy from the frame encoding type, then determining the convergence starting position of the second data stream from the target data position and the policy.
The target data position is the position of a second data frame in the second data stream, for example the play time of that frame in the stream, which may be expressed as a display timestamp. For example, suppose the complete video stream, i.e. the stream without data loss, is [encoded video frame 1, encoded video frame 2, encoded video frame 3, encoded video frame 4, encoded video frame 5, encoded video frame 6], while the first data stream is [encoded video frame 1, encoded video frame 2, encoded video frame 4, encoded video frame 5, encoded video frame 6]. The first cache data obtained by storing the first data stream in the first data buffer is then [encoded video frame 1, encoded video frame 2, encoded video frame 4, encoded video frame 5, encoded video frame 6], so one video frame is determined to be missing between video frame 2 and video frame 4. If each video frame is displayed for 0.2 seconds and the display timestamps of encoded video frames 1, 2 and 3 are 0.2, 0.4 and 0.6 seconds respectively, the display timestamp of the data-missing position is determined to be 0.6 seconds.
The position determination policy may include at least one of an encoded data set skip policy or a position maintenance policy. The encoded data set skip policy indicates skipping the encoded data group containing the target data position and determining the convergence starting position from the positions after that group in the second cache data. The position maintenance policy indicates taking the target data position itself as the convergence starting position. The target data frame is the data frame at the target data position in the second cache data.
The data group is a sequence in which a plurality of data frames are arranged from front to back in accordance with the play time.
An encoded data group is a data group in an encoded data stream, that is, a segment of the stream. Encoded data groups may have the same or different lengths, i.e. they may contain the same or different numbers of data frames. Decoding one data group never uses another data group, while the data frames within one group may be decoded with reference to other frames of the same group. When the data frames are image frames encoded with H.264, an encoded data group may be called a group of pictures (GOP). The data frames in an encoded data group are arranged by play time; the initial data frame of the group is an encoded reference frame that can be decoded from its own data alone, and the other frames of the group are decoded, directly or indirectly, with the help of that encoded reference frame. Decoding a data frame in one encoded data group does not depend on the frames of any other group. The encoded reference frame may be an intra-coded frame, i.e. an I-frame, while the frames other than the encoded reference frame may be P-frames or B-frames.
For example, if the original video data, i.e. the unencoded and uncompressed data, is [video frame 1, video frame 2, video frame 3, video frame 4, video frame 5, video frame 6], the first terminal encodes it. If, during encoding, video frames 1 to 3 are encoded as data group 1, giving encoded data group 1 = [encoded video frame 1, encoded video frame 2, encoded video frame 3], and video frames 4 to 6 as data group 2, giving encoded data group 2 = [encoded video frame 4, encoded video frame 5, encoded video frame 6], the encoded result is [encoded data group 1, encoded data group 2]. Specifically, the server may obtain the display timestamp of each second data frame in the second cache data together with the display timestamp of the data-missing position; when one of the second data frames has a display timestamp equal to that of the data-missing position, the server takes that timestamp as the display timestamp of the target data position.
In some embodiments, when none of the display timestamps of the second data frames equals that of the data-missing position, the server may take, from the display timestamps of the second data frames, the one with the smallest difference from the data-missing position as the display timestamp of the target data position.
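A sketch of the two matching rules above (exact display-timestamp match first, smallest difference as the fallback); the function name and the plain list-of-timestamps representation are illustrative assumptions.

```python
from typing import List

def target_position(backup_pts: List[float], missing_pts: float) -> int:
    """Index of the second-stream frame at the target data position: an exact
    display-timestamp match if one exists, else the frame whose timestamp has
    the smallest difference from the data-missing position."""
    if missing_pts in backup_pts:
        return backup_pts.index(missing_pts)
    return min(range(len(backup_pts)),
               key=lambda i: abs(backup_pts[i] - missing_pts))

print(target_position([0.2, 0.4, 0.6, 0.8], 0.6))   # exact match -> index 2
print(target_position([0.2, 0.4, 0.8], 0.55))       # closest     -> index 1
```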
In some embodiments, the server may determine the target data frame at the target data position from the second cache data, obtain its frame encoding type, and determine the position determination policy from it: when the frame encoding type is an encoded reference frame, the policy is the position maintenance policy; when it is a non-reference frame, the policy is the encoded data set skip policy. In this embodiment, determining the policy from the frame encoding type and the convergence starting position from the target data position and the policy improves the accuracy of the convergence starting position.
In some embodiments, determining a position determination policy according to the frame coding type, determining a convergence starting position corresponding to the second data stream according to the target data position and the position determination policy includes: when the frame coding type is a non-coding reference frame, determining a position determining strategy as a coding data group skipping strategy; and skipping the target coding data set corresponding to the target data position based on the coding data set skipping strategy, and taking the position of the coding reference frame in the backward coding data set corresponding to the target coding data set as the convergence starting position corresponding to the second data stream.
The target encoded data group is the encoded data group to which the data frame at the target data position in the second cache data belongs. The backward encoded data group of the target encoded data group is the encoded data group in the second cache data that follows the target group most closely. For example, let the second cache data be [encoded video frame 1, encoded video frame 2, encoded video frame 3, encoded video frame 4, encoded video frame 5, encoded video frame 6], where frames 1 to 3 form encoded data group 1 and frames 4 to 6 form encoded data group 2. If the frame at the target data position is encoded video frame 2, then, since that frame belongs to encoded data group 1, the target encoded data group is encoded data group 1 and its backward encoded data group is encoded data group 2.
Specifically, when the server determines that the frame encoding type of the target data frame is a non-reference frame, that is, that the target data frame is not the initial data frame of its encoded data group, the server may adopt the encoded data set skip policy: it skips the target encoded data group and determines the convergence starting position from the positions after it, for example taking the position of the encoded reference frame of the backward encoded data group, i.e. the position of that group's initial data frame, as the convergence starting position.
In this embodiment, when the frame encoding type is a non-reference frame, the target encoded data group is skipped and the position of the encoded reference frame in its backward encoded data group is taken as the convergence starting position of the second data stream. The data frame at the convergence starting position is therefore an encoded reference frame, so the data of the second data stream aggregated into the convergence buffer can be transcoded based on that reference frame, improving the quality and transcoding effect of the data stream in the convergence buffer.
In some embodiments, determining a position determination policy according to the frame coding type, determining a convergence starting position corresponding to the second data stream according to the target data position and the position determination policy includes: when the frame coding type is the coding reference frame, determining the position determining strategy as a position maintaining strategy; and taking the target data position as a convergence starting position corresponding to the second data flow based on the position maintaining strategy.
Specifically, when the server determines that the frame encoding type of the target data frame is an encoded reference frame, that is, that the target data frame is the initial data frame of its encoded data group, the server may adopt the position maintenance policy and take the target data position as the convergence starting position of the second cache data.
In this embodiment, when the frame coding type is the coding reference frame, the target data position is used as the convergence starting position corresponding to the second data stream, so that the data frame corresponding to the convergence starting position is the coding reference frame, and therefore data in the second data stream converged in the convergence buffer area can be transcoded based on the coding reference frame, and quality and transcoding effect of the data stream in the convergence buffer area are improved.
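The two policies can be combined into one small routine, sketched below under the assumption that each frame is labelled 'I', 'P' or 'B' and tagged with the id of the encoded data group (GOP) it belongs to.

```python
from typing import List

def convergence_start(frame_types: List[str], gop_ids: List[int],
                      target: int) -> int:
    """Position maintenance policy: an I-frame (encoded reference frame) at
    the target position is kept as the convergence starting position.
    Encoded data set skip policy: for a P/B frame, skip the rest of its GOP
    and start at the I-frame opening the next (backward) GOP."""
    if frame_types[target] == "I":
        return target
    for i in range(target + 1, len(frame_types)):
        if gop_ids[i] != gop_ids[target] and frame_types[i] == "I":
            return i
    raise ValueError("no later GOP available after the target position")

types = ["I", "P", "P", "I", "P", "P"]
gops  = [ 1,   1,   1,   2,   2,   2 ]
print(convergence_start(types, gops, 1))   # P-frame -> skip to index 3
print(convergence_start(types, gops, 3))   # I-frame -> stay at index 3
```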
In some embodiments, obtaining play data corresponding to the target play object based on the aggregate data stream in the aggregate buffer includes: acquiring a target data length corresponding to the transcoding data set; inserting a reference indication frame into the converged data stream in the converged buffer area according to the target data length to obtain an updated converged data stream; in the transcoding process, a transcoding reference frame is determined based on the reference indication frame, and transcoding is carried out on a transcoding data set where the transcoding reference frame is located in the updated converged data stream based on the transcoding reference frame, so that a transcoding data stream is obtained; and obtaining the playing data corresponding to the target playing object according to the transcoding data stream.
The transcoding data group is a data group to be transcoded in the aggregate data stream, with its data frames arranged from front to back by play time. The frames of a transcoding data group may or may not coincide with those of an encoded data group; for example, if the aggregate data stream is [encoded video frame 1, encoded video frame 2, encoded video frame 3, encoded video frame 4, encoded video frame 5, encoded video frame 6] and encoded data group 1 is [encoded video frame 1, encoded video frame 2, encoded video frame 3], transcoding data group 1 may be [encoded video frame 2, encoded video frame 3, encoded video frame 4].
The target data length is the length of the transcoding data group. It may be expressed as a number of data frames: if the group contains 10 frames, the target data length may be 10 frames. It may also be expressed as the display duration of the group, i.e. the interval between the display timestamp of the group's initial data frame and that of its final data frame: if that duration is 10 seconds, the target data length may be 10 seconds. The transcoded data stream is obtained by transcoding the data frames of the aggregate data stream. Transcoding comprises decoding followed by encoding: the frames of the aggregate data stream are first decoded, and the decoded frames are then encoded to produce the frames of the transcoded data stream.
The reference indication frame indicates the position of a transcoding reference frame. The transcoding reference frame is the initial data frame of a transcoding data group and, during transcoding, is the frame that needs to be transcoded into an intra-coded frame. The updated aggregate data stream is the data stream obtained by inserting reference indication frames into the aggregate data stream.
The reference indication frame may be a custom indication frame, implemented for example with an SEI (Supplemental Enhancement Information) frame or with script data. The frame format of the SEI frame used as the reference indication frame is shown in fig. 11A.
The format adds an SEI payload type 5. The length field is a variable number of bytes conforming to the H.264 or H.265 SEI standard; the length covers the Content bytes but not the 0x80 trailing byte. The userBusinessId is 16 bytes, and the Content is the custom string { \iframe\1 }, where iframe marks this SEI frame as key-frame control information.
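A sketch of how such a reference indication frame might be assembled as an H.264 user-data SEI message (payload type 5). The start-code framing and ff-escaped size follow the H.264 SEI syntax; the UUID value and content string below are placeholders, and emulation-prevention bytes (required by the standard when the payload contains 0x000000 sequences) are omitted for brevity.

```python
def build_reference_sei(user_business_id: bytes, content: bytes) -> bytes:
    """Annex-B SEI NAL: start code, NAL type 6 (SEI), payloadType 5
    (user_data_unregistered), ff-escaped size, 16-byte id, content, and the
    0x80 rbsp trailing byte (the size covers id and content, not the 0x80)."""
    assert len(user_business_id) == 16, "userBusinessId must be 16 bytes"
    payload = user_business_id + content
    size, size_bytes = len(payload), b""
    while size >= 255:                  # variable-length size field
        size_bytes += b"\xff"
        size -= 255
    size_bytes += bytes([size])
    return b"\x00\x00\x00\x01\x06\x05" + size_bytes + payload + b"\x80"

# Hypothetical id plus the custom key-frame marker string from the format above.
sei = build_reference_sei(b"\x00" * 16, b"{\\iframe\\1}")
```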
Specifically, the server may insert reference indication frames into the aggregate data stream at intervals of the target data length to obtain the updated aggregate data stream, in which the number of data frames between consecutive reference indication frames equals the target data length. For example, if the target data length is 3 and the aggregate data stream is [encoded video frame 1, encoded video frame 2, encoded video frame 3, encoded video frame 4, encoded video frame 5, encoded video frame 6, encoded video frame 7, encoded video frame 8, encoded video frame 9], where frames 1 to 6 come from the first cache data and frames 7 to 9 from the second cache data, the updated aggregate data stream may be [reference indication frame, encoded video frame 1, encoded video frame 2, encoded video frame 3, reference indication frame, encoded video frame 4, encoded video frame 5, encoded video frame 6, reference indication frame, encoded video frame 7, encoded video frame 8, encoded video frame 9]. The server may preferentially insert a reference indication frame as the forward adjacent frame of a key frame, i.e. the frame immediately before and adjacent to it; a key frame is an intra-coded frame.
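A sketch of that insertion step, with strings standing in for frames ('S' marks a reference indication frame, as in fig. 11B):

```python
from typing import List

def insert_reference_frames(stream: List[str], target_len: int) -> List[str]:
    """Insert a reference indication frame before every run of target_len
    data frames, yielding the updated aggregate data stream."""
    updated: List[str] = []
    for i, frame in enumerate(stream):
        if i % target_len == 0:
            updated.append("S")
        updated.append(frame)
    return updated

frames = [f"F{i}" for i in range(1, 10)]
print(insert_reference_frames(frames, 3))
# ['S', 'F1', 'F2', 'F3', 'S', 'F4', 'F5', 'F6', 'S', 'F7', 'F8', 'F9']
```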
In some embodiments, the server may determine the target data length from at least one of the first data group length or the second data group length, where the first data group length is the length of an encoded data group in the first data stream and the second data group length the length of an encoded data group in the second data stream. For example, the server may use either length directly as the target data length, take the smaller or the larger of the two, or perform a weighted calculation such as a mean over both and use the result. In some embodiments, the server may send the transcoded data stream to the second terminal; since the transcoded data stream is produced by re-encoding decoded data, its data frames are encoded frames, so on receiving it the second terminal may decode the transcoded data stream and play the decoded frames.
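The enumerated ways of deriving the target data length from the two input group lengths can be written out as follows; the helper and its mode parameter are illustrative assumptions:

```python
def target_data_length(len_a: int, len_b: int, mode: str = "min") -> int:
    """Derive the transcoding-group length from the encoded-data-group (GOP)
    lengths of the first and second data streams."""
    if mode == "min":
        return min(len_a, len_b)
    if mode == "max":
        return max(len_a, len_b)
    return round((len_a + len_b) / 2)   # mean, one possible weighted variant
```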
In some embodiments, the server may determine a backward neighboring frame of the reference indication frame from the updated aggregate data stream, and use the backward neighboring frame of the reference indication frame as the transcoding reference frame, where the backward neighboring frame of the reference indication frame refers to a frame that is located after and neighboring the reference indication frame.
In some embodiments, the server may determine a current transcoded reference frame from the updated aggregate data stream, where the current transcoded reference frame may be any one of the updated aggregate data streams, and the server may use a sequence of data frames between the current transcoded reference frame and a backward transcoded reference frame of the current transcoded reference frame (including the current transcoded reference frame but excluding the backward transcoded reference frame) as the transcoded data set in which the current transcoded reference frame is located. The backward transcoded reference frame of the current transcoded reference frame refers to a transcoded reference frame located after and closest to the current transcoded reference frame.
In some embodiments, the server may transcode the transcoded data group, in the transcoding process, may decode a transcoded reference frame in the transcoded data group to obtain a decoded data frame corresponding to the transcoded reference frame, decode other data frames in the transcoded data group based on the decoded data frame corresponding to the transcoded reference frame to obtain decoded data frames of other data frames, and encode each decoded data frame to obtain each data frame in the transcoded data stream.
In some embodiments, the transcoded data stream does not include a reference indication frame, as shown in fig. 11B, which illustrates a schematic diagram of the transcoded data stream, and a rectangular frame including S in the diagram refers to the reference indication frame, and it can be seen from the diagram that the updated aggregate data stream includes the reference indication frame, and there are 6 data frames between the reference indication frames, so that it is known that the target data length corresponding to the transcoded data set is 6 frames. It can also be seen from the transcoded data stream in the figure that the transcoded data set is 6 frames in length and that the reference indication frame is not included in the transcoded data stream.
In this embodiment, reference indication frames are inserted into the aggregate data stream in the convergence buffer according to the target data length of the transcoding data group, producing the updated aggregate data stream; transcoding reference frames are determined from the reference indication frames; each transcoding data group containing a transcoding reference frame in the updated aggregate data stream is transcoded based on that frame, producing the transcoded data stream; and the play data of the target play object is obtained from the transcoded data stream. The data groups in the transcoded data stream thus have consistent lengths, improving the uniformity of group lengths in the transcoded data stream.
In some embodiments, in the transcoding process, determining a transcoding reference frame based on the reference indication frame, and transcoding the transcoding data set where the transcoding reference frame is located in the updated aggregate data stream based on the transcoding reference frame, so as to obtain a transcoded data stream includes: decoding based on the updated aggregate data stream to obtain a decoded data stream; in the process of encoding the decoded data stream, when a reference indication frame is detected, taking a backward adjacent data frame of the reference indication frame as a transcoding reference frame, and performing intra-frame encoding on the transcoding reference frame to obtain an intra-frame encoded frame; and transcoding the transcoding data group where the transcoding reference frame is located based on the intra-frame coding frame to obtain transcoding data in the transcoding data stream, wherein the transcoding data group where the transcoding reference frame is located comprises a data frame with a target data length.
The decoded data stream is a data stream obtained by decoding the data frames in the updated aggregate data stream. The backward adjacent data frame of the reference indication frame refers to a data frame located after and adjacent to the reference indication frame.
Specifically, in order to implement transcoding, the server may encode the decoded data stream, and in the encoding process, the server may determine, from the updated aggregate data stream, a data frame to be encoded as an intra-frame encoded frame according to a reference indication frame, where the transcoding reference frame refers to a data frame that needs to be encoded as an intra-frame encoded frame in the encoding process. For example, a backward adjacent data frame of the reference indication frame may be used as a transcoding reference frame, and the transcoding reference frame may be intra-coded to obtain an intra-coded frame.
In some embodiments, the server may obtain a transcoded data set in which the transcoded reference frame is located, encode other data frames in the transcoded data set in which the transcoded reference frame is located based on an intra-coded frame obtained by intra-coding the transcoded reference frame, obtain other coded frames, and use the obtained intra-coded frame and other coded frames as transcoded data in the transcoded data stream.
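A sketch of this encode pass, continuing the string notation of the earlier sketches; a real implementation would drive an actual decoder and encoder rather than relabel strings:

```python
from typing import List

def encode_pass(updated_stream: List[str]) -> List[str]:
    """When a reference indication frame 'S' is detected, the next data frame
    becomes the transcoding reference frame and is intra-coded (an I-frame);
    the remaining frames of its group are coded against it. Indication frames
    are consumed here, so the transcoded stream never contains them."""
    out: List[str] = []
    force_intra = False
    for item in updated_stream:
        if item == "S":
            force_intra = True        # mark the next frame for intra-coding
            continue
        out.append(f"I({item})" if force_intra else f"P({item})")
        force_intra = False
    return out

print(encode_pass(["S", "F1", "F2", "F3", "S", "F4", "F5", "F6"]))
# ['I(F1)', 'P(F2)', 'P(F3)', 'I(F4)', 'P(F5)', 'P(F6)']
```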
In some embodiments, when storing data from the first data buffer into the convergence buffer, the server may also store the first coding parameter of the first data stream there, and likewise store the second coding parameter of the second data stream when caching data from the second data buffer. While decoding the updated aggregate data stream, data originating from the first data buffer may then be decoded with the first coding parameter and data from the second data buffer with the second coding parameter. A coding parameter is a parameter related to encoding and may be at least one of a PPS (Picture Parameter Set) or an SPS (Sequence Parameter Set). Since the first and second data streams are encoded data, the first coding parameter is the one used to encode the data of the first data stream and the second coding parameter the one used to encode the data of the second data stream.
In some embodiments, when a new reference indication frame is detected, the server may update the reference indication frame and return to use a backward neighboring data frame of the reference indication frame as a transcoding reference frame, and perform intra-coding on the transcoding reference frame to obtain an intra-coded frame.
In this embodiment, during encoding of the decoded data stream, when a reference indication frame is detected, its backward adjacent data frame is taken as the transcoding reference frame and intra-coded to obtain an intra-coded frame, and the transcoding data group containing the transcoding reference frame is transcoded based on that intra-coded frame to obtain the transcoded data in the transcoded data stream. Because each such group contains a target data length of data frames, the lengths of the transcoding data groups in the transcoded data stream are unified.
In some embodiments, the first data stream and the second data stream are encoded data streams, and storing the data stream in the second data buffer into the convergence buffer starting from the convergence starting position of the second data stream includes: obtaining the second coding parameter of the second data stream and generating an information switching indication frame from it, where the information switching indication frame indicates that, during decoding, its detection makes the second coding parameter the new coding parameter, so that the backward data stream of the indication frame is decoded based on the second coding parameter; and inserting the information switching indication frame at the end position of the first data stream in the convergence buffer, then storing the data stream in the second data buffer into the convergence buffer, starting from the convergence starting position of the second data stream, as the backward data stream of the information switching indication frame.
The information switching indication frame generated from the second coding parameter may contain that parameter. Such a frame indicates a switch of the coding parameter used for decoding; for example, the frame generated from the second coding parameter indicates switching the decoding parameter to the second coding parameter. An information switching indication frame may be generated by the server from the coding parameter or carried in the data stream. The frame generated from the first coding parameter may be called the first information switching indication frame, and the frame generated from the second coding parameter the second information switching indication frame.
Specifically, when switching the data stream cache source of the convergence buffer, the server may obtain the information switching indication frame of the new cache source, insert it after the data already in the convergence buffer, and then store data into the buffer after the indication frame, starting from the convergence starting position of the new cache source. For example, if the server first uses the first data buffer as the cache source, then switches to the second data buffer, and later switches back to the first data buffer, the data in the convergence buffer may be [first information switching indication frame, data from the first data buffer, second information switching indication frame, data from the second data buffer, first information switching indication frame, data from the first data buffer].
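A sketch of the insertion performed at switch time; the InfoSwitchFrame type carrying the new parameters is an illustrative stand-in for however the coding parameters are actually packaged:

```python
from dataclasses import dataclass, field
from typing import Any, List

@dataclass
class InfoSwitchFrame:
    sps: bytes   # sequence parameter set of the stream switched to
    pps: bytes   # picture parameter set of the stream switched to

@dataclass
class ConvergenceBuffer:
    frames: List[Any] = field(default_factory=list)

    def switch_to(self, sps: bytes, pps: bytes,
                  backup: List[Any], start: int) -> None:
        """Append an information switching indication frame at the end of the
        first data stream, then store the second stream from its convergence
        starting position as that frame's backward data stream."""
        self.frames.append(InfoSwitchFrame(sps, pps))
        self.frames.extend(backup[start:])
```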
In some embodiments, storing the data stream in the second data buffer into the convergence buffer as the backward data stream of the information switching indication frame includes: storing the data stream in the second data buffer into the convergence buffer, after the information switching indication frame, starting from the convergence starting position of the second data stream.
In this embodiment, the second coding parameter of the second data stream is obtained, an information switching indication frame is generated from it and inserted at the end position of the first data stream in the convergence buffer, and the data stream in the second data buffer is stored into the convergence buffer as the backward data stream of that indication frame. When the data originating from the second data buffer is decoded from the aggregate data stream, it can therefore be decoded with the second coding parameter, improving the decoding success rate.
In some embodiments, obtaining play data corresponding to the target play object based on the aggregate data stream in the aggregate buffer includes: in the process of decoding the converged data stream, decoding is carried out based on a first coding parameter corresponding to the first data stream; when the information switching indication frame is detected, extracting a second coding parameter from the information switching indication frame, switching the coding parameter referenced by decoding from the first coding parameter to the second coding parameter, and decoding based on the second coding parameter; and uniformly encoding the decoded data stream obtained by decoding to obtain the playing data corresponding to the target playing object.
Specifically, when data from the first data buffer is stored into the aggregate data stream first and data from the second data buffer later, the earlier information switching indication frame in the aggregate data stream is the one generated from the first coding parameter and the later one is generated from the second coding parameter; the first information switching indication frame precedes the second. While decoding the aggregate data stream, the server first obtains the first information switching indication frame and decodes the data after it based on that frame; once the data before the second information switching indication frame has been decoded, it decodes the data after the second frame based on that frame, obtaining the decoded data stream. The decoded data stream is then encoded into the transcoded data stream and sent to the second terminal, which decodes it into playable data for playing.
In this embodiment, when the information switching indication frame is detected, the second coding parameter is extracted from the information switching indication frame, the coding parameter referred by decoding is switched from the first coding parameter to the second coding parameter, and decoding is performed based on the second coding parameter, so that when decoding the data from the second data buffer area in the aggregate data stream, decoding can be performed by using the second coding parameter, and the success rate of decoding is improved.
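The decode side can be sketched correspondingly; decode_frame is a placeholder for a real decoder call parameterised by (SPS, PPS), and InfoSwitchFrame matches the previous sketch:

```python
from dataclasses import dataclass
from typing import Any, List, Tuple

@dataclass
class InfoSwitchFrame:          # same shape as in the previous sketch
    sps: bytes
    pps: bytes

def decode_frame(frame: Any, params: Tuple[bytes, bytes]) -> Any:
    """Placeholder for a real decoder invocation."""
    return ("decoded", frame, params)

def decode_aggregate(frames: List[Any],
                     first_params: Tuple[bytes, bytes]) -> List[Any]:
    """Decode with the first stream's coding parameters until an information
    switching indication frame is detected, then switch the referenced
    parameters to those it carries for its backward data stream."""
    params, decoded = first_params, []
    for f in frames:
        if isinstance(f, InfoSwitchFrame):
            params = (f.sps, f.pps)     # switch the referenced parameters
            continue
        decoded.append(decode_frame(f, params))
    return decoded
```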
In some embodiments, the method further comprises: acquiring a live view angle set corresponding to a target live scene, and establishing a cache group corresponding to each live view angle in the live view angle set; the cache group comprises data cache areas corresponding to the live broadcast devices corresponding to the live broadcast viewing angles and convergence cache areas; the live view angle corresponds to a plurality of live devices; the switching the data stream buffer source corresponding to the convergence buffer area from the first data buffer area to the second data buffer area when determining that the first data stream has data loss comprises the following steps: when the data loss of the first data stream is determined, selecting a data buffer area except the first data buffer area from the buffer group corresponding to the first data buffer area as a second data buffer area, and switching a data stream buffer source corresponding to the convergence buffer area from the first data buffer area to the second data buffer area.
The target playing object may be a target live video corresponding to a target live scene, and the first data stream and the second data stream are data streams obtained by collecting the target live scene by using live equipment.
The target live scene may be provided with multiple live devices, which collect data of the scene, encode the collected data, and send the encoded data to the server of the live background so that the server can deliver the scene's data to viewer terminals. A live view angle can be determined by the shooting angle of a live device: different shooting angles correspond to different live view angles, and one shooting angle may be covered by one or more live devices.
The live view angle set includes a plurality of live view angles. A cache group includes a plurality of data buffers: the cache group of a live view angle contains a data buffer for each live device of that view angle, used to cache the data stream from that device. The cache group of a live view angle may also contain a convergence buffer, whose data comes from the data buffers of that view angle.
Specifically, the server aggregates the data of each data buffer of a live view angle into that view angle's convergence buffer, obtaining an aggregate cache stream for each live view angle. Taking one live view angle as an example: its cache group includes data buffers for several live devices; the server receives the data stream sent by each device and caches it in the corresponding buffer; a first data buffer is determined among these buffers, for example any one of them; the data in the first data buffer is stored into the convergence buffer; and when the data in the first data buffer is determined to be missing, a second data buffer is selected from the other buffers of the cache group and the data stream cache source of the convergence buffer is switched from the first data buffer to the second. In this embodiment, selecting the second data buffer from the cache group of the first data buffer ensures that both buffers correspond to the same live view angle, so data streams of the same view angle are aggregated; this improves the integrity of that view angle's data streams, reduces data loss, and improves the fluency of the live data of that view angle.
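One way to model the per-view-angle cache groups described above; the names and the dict-of-lists representation are assumptions:

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

@dataclass
class CacheGroup:
    """Cache group for one live view angle: a data buffer per live device
    shooting that angle, plus the angle's own convergence buffer."""
    device_buffers: Dict[str, List[Any]] = field(default_factory=dict)
    convergence_buffer: List[Any] = field(default_factory=list)

def build_cache_groups(angles: Dict[str, List[str]]) -> Dict[str, CacheGroup]:
    """angles maps each live view angle to the ids of its live devices."""
    return {angle: CacheGroup({dev: [] for dev in devices})
            for angle, devices in angles.items()}

groups = build_cache_groups({"stage_left": ["cam1", "cam2"],
                             "stage_front": ["cam3", "cam4"]})
```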
In some embodiments, the target playing object is a target live video corresponding to a target live scene, receiving a first data stream corresponding to the target playing object from the first transmission channel, and caching the first data stream through the first data cache area includes:
a first transmission channel is established with a first shooting device of the target live scene, and a first video stream transmitted by the first shooting device through the first transmission channel, together with the target scene identifier of the target live scene, is received; based on the target scene identifier, the first video stream is taken as the main video stream of the target live video. Receiving the second data stream of the target play object from the second transmission channel and caching it in the second data buffer includes: a second transmission channel is established with a second shooting device of the target live scene, and a second video stream transmitted by the second shooting device through the second transmission channel, together with the target scene identifier of the target live scene, is received; based on the target scene identifier, the second video stream is taken as the backup video stream of the target live video.
The target live scene can be any live scene. The target play object is the target live video of the target live scene; the target live scene may be, for example, a concert held by star A in gymnasium L, and the target live video may be the live video of that concert.
A shooting device may capture video data, record audio data, or both. The target live scene may include a plurality of shooting devices, meaning at least two, for example a first shooting device and a second shooting device; the first terminal may be the first shooting device in the target live scene and the third terminal the second shooting device. The first transmission channel may be a data transmission channel established between the first shooting device and the server, and the second transmission channel a data transmission channel established between the second shooting device and the server. A scene identifier uniquely identifies a scene, and the target scene identifier is the scene identifier of the target live scene; it may be set or preset as needed, for example determined from the location of the scene or the identity of a person in it.
The main video stream is a concept relative to the backup video stream: when the server receives multiple video streams from the target live scene, at least one of them may serve as the main video stream and at least one as the backup video stream; for example, there may be exactly one main video stream.
The first data stream may be a first video stream and the second data stream may be a second video stream.
Specifically, the server may establish a first transmission channel with the first shooting device, establish a second transmission channel with the second shooting device, and the first shooting device may transmit the video data of the obtained target live broadcast scene to the server through the first transmission channel to obtain a first video stream, and the second shooting device may transmit the video data of the obtained target live broadcast scene to the server through the second transmission channel to obtain a second video stream.
In some embodiments, the first photographing apparatus may send a first data reception request to the server through the first transmission channel, where the first data reception request may include the target scene identifier and the first video stream. The second photographing apparatus may transmit a second data reception request to the server through the second transmission channel, and the second data reception request may include the target scene identifier and the second video stream.
In some embodiments, the server may determine the primary video stream from a plurality of video streams according to the quality of the video streams, e.g., may take the better quality video stream as the primary video stream. For example, the server receives 3 video streams from the live scene, namely, a first video stream, a second video stream and a third video stream, and if the quality of the first video stream is better than that of the second video stream and the third video stream, the server can take the first video stream as a main video stream.
In this embodiment, a first transmission channel is established with the first shooting device of the target live scene, the first video stream transmitted over it together with the target scene identifier is received, and the first video stream is taken as the main video stream of the target live video based on that identifier; a second transmission channel is established with the second shooting device, the second video stream and the target scene identifier are received over it, and the second video stream is taken as the backup video stream. Video streams collected by different devices of the target live scene are thus obtained, so when the main video stream is abnormal the backup video stream can carry the transmission. This improves disaster recovery in the live scene, reduces the loss of video data in the live scene, and improves the fluency of the live broadcast.
The application also provides an application scenario that applies the above play data processing method. Specifically, the play data processing method is applied in this scenario as follows:
1. And establishing a first transmission channel and a second transmission channel with the first terminal.
The first terminal acquires video of the target live broadcast scene and encodes the acquired video data to obtain target live broadcast video, and the first terminal transmits the target live broadcast video to the server through the first transmission channel and the second transmission channel respectively.
2. And receiving a first video stream corresponding to the target live video sent by the first terminal from the first transmission channel, and receiving a second video stream corresponding to the target live video sent by the first terminal from the second transmission channel.
3. And caching the first video stream through the first data caching area, and caching the second video stream through the second data caching area.
The server may create a first data buffer for the first transmission channel, create a second data buffer for the second transmission channel, store data transmitted by the first transmission channel in the first data buffer, and store data transmitted by the second transmission channel in the second data buffer. The server may also be provided with an aggregation buffer.
4. And taking the first data buffer area as a data stream buffer source corresponding to the convergence buffer area, and storing the data stream in the first data buffer area into the convergence buffer area.
5. When the data loss of the first data stream is determined, switching a data stream buffer source corresponding to the convergence buffer area from the first data buffer area to the second data buffer area, determining a data loss position corresponding to the first data stream, and taking the data loss position as a convergence starting position corresponding to the second data stream.
6. And starting from the convergence starting position corresponding to the second data stream, storing the data stream in the second data buffer area into the convergence buffer area.
7. And obtaining the playing data corresponding to the target playing object based on the converged data stream in the converged buffer area.
The application also provides another application scenario that applies the above play data processing method. Specifically, the play data processing method is applied in this scenario as follows:
1. And establishing a first transmission channel with the first terminal and establishing a second transmission channel with the third terminal.
The first terminal collects video of the target live scene and encodes the collected video data to obtain a first target live video; the third terminal collects video of the target live scene and encodes the collected video data to obtain a second target live video. The first terminal transmits the first target live video to the server through the first transmission channel, and the third terminal transmits the second target live video to the server through the second transmission channel.
2. And receiving, from the first transmission channel, a first video stream corresponding to the first target live video sent by the first terminal, and receiving, from the second transmission channel, a second video stream corresponding to the second target live video sent by the third terminal.
3. And caching the first video stream through the first data caching area, and caching the second video stream through the second data caching area.
4. And taking the first data buffer area as a data stream buffer source corresponding to the convergence buffer area, and storing the data stream in the first data buffer area into the convergence buffer area.
5. When the first data stream is determined to have data missing, switching a data stream cache source corresponding to the convergence cache region from the first data cache region to the second data cache region, and determining a data missing position corresponding to the first data stream.
6. And acquiring a target data position corresponding to the data missing position in the second data stream.
7. And acquiring the frame coding type of the target data frame corresponding to the target data position in the second data stream.
8. And when the frame coding type is a non-coding reference frame, skipping the target encoded data group at the target data position according to the encoded data set skip policy, and taking the position of the encoded reference frame in the backward encoded data group of the target group as the convergence starting position of the second data stream.
9. And when the frame coding type is the coding reference frame, taking the target data position as the convergence starting position corresponding to the second data stream.
10. Acquiring a second coding parameter corresponding to the second data stream, and generating an information switching indication frame based on the second coding parameter; the information switching indication frame is used for indicating that in the decoding process, if the information switching indication frame is detected, the second coding parameter is used as a new coding parameter, so that the backward data stream of the information switching indication frame is decoded based on the second coding parameter.
11. And inserting the information switching indication frame into the tail end position of the first data stream in the convergence buffer area, starting from the convergence starting position corresponding to the second data stream, storing the data stream in the second data buffer area into the convergence buffer area, and taking the data stream as the backward data stream of the information switching indication frame.
12. And obtaining a target data length corresponding to the transcoding data group, and inserting a reference indication frame into the converged data stream in the converged buffer area according to the target data length to obtain an updated converged data stream.
13. Decoding based on the updated converged data stream to obtain a decoded data stream, and decoding based on a first coding parameter corresponding to the first data stream in the process of decoding the converged data stream; when the information switching indication frame is detected, a second coding parameter is extracted from the information switching indication frame, the coding parameter referenced by decoding is switched from the first coding parameter to the second coding parameter, and decoding is performed based on the second coding parameter.
14. In the process of encoding the decoded data stream, when the reference indication frame is detected, a backward adjacent data frame of the reference indication frame is used as a transcoding reference frame, and the transcoding reference frame is subjected to intra-frame encoding to obtain an intra-frame encoding frame.
15. And transcoding the transcoding data group where the transcoding reference frame is based on the intra-frame coding frame to obtain transcoding data in the transcoding data stream.
The transcoding data group where the transcoding reference frame is located comprises a data frame with a target data length.
16. And splitting the transcoded data stream, and distributing the split data stream to each terminal for watching live broadcast.
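The following sketch, in Python, illustrates steps 5 through 11 under two simplifying assumptions: each frame carries a sequence number and a frame type, and every coded data group opens with its coded reference frame (marked 'I'). The names Frame, aggregation_start, fail_over, and the ('SWITCH_MARKER', ...) tuple are illustrative stand-ins, not interfaces defined by this application.

from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Frame:
    seq: int    # position of the frame within the stream
    ftype: str  # 'I' for a coded reference frame, 'P' otherwise

def aggregation_start(backup: List[Frame], missing_seq: int) -> Optional[int]:
    # Step 6: map the data loss position onto the second data stream.
    idx = next((i for i, f in enumerate(backup) if f.seq == missing_seq), None)
    if idx is None:
        return None
    # Step 9: a coded reference frame is independently decodable, so the
    # target data position itself is the aggregation start position.
    if backup[idx].ftype == 'I':
        return idx
    # Step 8: a non-reference frame cannot be decoded without its group, so
    # skip to the reference frame that opens the next coded data group.
    for j in range(idx + 1, len(backup)):
        if backup[j].ftype == 'I':
            return j
    return None

def fail_over(agg: list, backup: List[Frame], missing_seq: int, second_params: dict) -> None:
    # Steps 10 and 11: append the information switching indication frame at
    # the tail of the first data stream, then splice in the second stream.
    start = aggregation_start(backup, missing_seq)
    if start is None:
        return  # the second stream cannot cover the gap either
    agg.append(('SWITCH_MARKER', second_params))  # carries the second coding parameters
    agg.extend(backup[start:])

For example, if the first stream loses frame 7 and frame 7 of the second stream is a 'P' frame, fail_over resumes at the 'I' frame that opens the next coded data group, so downstream decoding continues without picture rollback.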
The application further provides another application scenario in which the play data processing method is applied. Specifically, the method is applied in this scenario as follows:
1. Establish a first transmission channel and a second transmission channel with the first terminal.
The first terminal captures audio of the target live broadcast scene and encodes the captured audio data to obtain the target live audio, then transmits the target live audio to the server over both the first and the second transmission channel.
2. Receive, from the first transmission channel, a first audio stream corresponding to the target live audio, and receive, from the second transmission channel, a second audio stream corresponding to the same target live audio.
3. Buffer the first audio stream in the first data buffer and the second audio stream in the second data buffer.
The server may create a first data buffer for the first transmission channel and a second data buffer for the second transmission channel, storing the data arriving on each channel in its corresponding buffer. The server may also be provided with an aggregation buffer.
4. Take the first data buffer as the data stream buffer source of the aggregation buffer, and store the data stream from the first data buffer into the aggregation buffer.
5. When data loss is detected in the first data stream, switch the data stream buffer source of the aggregation buffer from the first data buffer to the second data buffer, determine the data loss position in the first data stream, and take that position directly as the aggregation start position of the second data stream; because both channels carry the same stream, no frame-type check is needed (see the sketch after this list).
6. Starting from the aggregation start position of the second data stream, store the data stream from the second data buffer into the aggregation buffer.
7. Obtain the play data of the target play object from the aggregate data stream in the aggregation buffer.
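A minimal sketch of this audio path, assuming frames expose a seq attribute as in the earlier sketch; first_gap and splice_audio are illustrative names, not the application's interfaces. Since the two channels carry identical encoded audio, splicing needs no frame-type policy.

def first_gap(seqs: list) -> int:
    # The data loss position is the first sequence number missing from
    # the first data buffer; -1 means the stream is still contiguous.
    ordered = sorted(seqs)
    for a, b in zip(ordered, ordered[1:]):
        if b != a + 1:
            return a + 1
    return -1

def splice_audio(agg: list, backup: list, missing_seq: int) -> None:
    # Steps 5 and 6: the aggregation start position equals the data loss
    # position, since the two streams are identical.
    agg.extend(f for f in backup if f.seq >= missing_seq)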
More and more live broadcast platforms are appearing, and users tend to choose platforms that are stable and rarely stutter, so stability and playback quality are two important measures of a live broadcast platform. The play data processing method provided by this embodiment improves platform stability and reduces stuttering during live broadcast. Because vendors that provide live broadcast services (such as cloud vendors) bill customers based on downlink traffic and related value-added services, improving platform stability makes users more likely to choose the platform, thereby increasing the vendor's revenue.
With the play data processing method provided by this application, multiple identically encoded input streams can be used, and when the main stream fails and is switched to the backup stream, the switch is seamless at the stream receiving end and aligned at the frame level, so downstream viewers are unaffected: no picture rollback, jumps, or stuttering occur. Applied at the server side of a live broadcast platform, the method achieves global consistency of audio and video data through an algorithmic mechanism, reduces downlink stuttering, and gives the system stronger disaster recovery capability and robustness, thereby improving the experience of users watching the live broadcast.
It should be understood that although the steps in the flowcharts of fig. 2 to 11B are shown in the order indicated by the arrows, these steps are not necessarily performed in that order. Unless explicitly stated herein, the steps are not strictly limited to this order and may be performed in other orders. Moreover, at least some of the steps in fig. 2 to 11B may include multiple sub-steps or stages that are not necessarily performed at the same moment but may be performed at different moments, and these sub-steps or stages are not necessarily executed sequentially but may be performed in turn or alternately with at least part of the sub-steps or stages of other steps.
In some embodiments, as shown in fig. 12, a play data processing apparatus is provided, which may be implemented as software modules, hardware modules, or a combination of both forming part of a computer device, and specifically includes: a first data stream receiving module 1202, a second data stream receiving module 1204, a first data stream aggregation module 1206, an aggregation start position determining module 1208, a second data stream aggregation module 1210, and a play data obtaining module 1212, wherein:
the first data stream receiving module 1202 is configured to receive a first data stream corresponding to a target play object from a first transmission channel and buffer the first data stream in a first data buffer;
the second data stream receiving module 1204 is configured to receive a second data stream corresponding to the target play object from a second transmission channel and buffer the second data stream in a second data buffer;
the first data stream aggregation module 1206 is configured to take the first data buffer as the data stream buffer source of the aggregation buffer and store the data stream from the first data buffer into the aggregation buffer;
the aggregation start position determining module 1208 is configured to, when data loss is detected in the first data stream, switch the data stream buffer source of the aggregation buffer from the first data buffer to the second data buffer, determine the data loss position in the first data stream, and determine the aggregation start position of the second data stream according to the data loss position;
the second data stream aggregation module 1210 is configured to store the data stream from the second data buffer into the aggregation buffer, starting from the aggregation start position of the second data stream;
the play data obtaining module 1212 is configured to obtain the play data corresponding to the target play object based on the aggregate data stream in the aggregation buffer.
In some embodiments, the first data stream and the second data stream are encoded data streams, and the aggregation start position determining module includes: a target data position determining unit, configured to obtain the target data position in the second data stream that corresponds to the data loss position; a frame coding type obtaining unit, configured to obtain the frame coding type of the target data frame at the target data position in the second data stream; and an aggregation start position determining unit, configured to select a position determination policy according to the frame coding type and determine the aggregation start position of the second data stream from the target data position and the selected policy.
In some embodiments, the aggregation start position determining unit is further configured to: when the frame coding type is not a coded reference frame, select the coded-data-group skipping policy, skip the target coded data group containing the target data position, and take the position of the coded reference frame in the following coded data group as the aggregation start position of the second data stream.
In some embodiments, the aggregation start position determining unit is further configured to: when the frame coding type is a coded reference frame, select the position-keeping policy and take the target data position itself as the aggregation start position of the second data stream.
In some embodiments, the play data obtaining module includes: a target data length obtaining unit, configured to obtain the target data length of a transcoding data group; an updated aggregate data stream obtaining unit, configured to insert reference indication frames into the aggregate data stream in the aggregation buffer according to the target data length to obtain an updated aggregate data stream; a transcoded data stream obtaining unit, configured to determine a transcoding reference frame based on a reference indication frame during transcoding and, based on that frame, transcode the transcoding data group in which it is located in the updated aggregate data stream to obtain a transcoded data stream; and a play data obtaining unit, configured to obtain the play data corresponding to the target play object from the transcoded data stream.
In some embodiments, the transcoded data stream obtaining unit is further configured to decode the updated aggregate data stream to obtain a decoded data stream; while re-encoding the decoded data stream, when a reference indication frame is detected, take the data frame immediately after it as the transcoding reference frame and intra-code that frame to obtain an intra-coded frame; and transcode the transcoding data group containing the transcoding reference frame based on the intra-coded frame to obtain the transcoded data of the transcoded data stream, where that group contains the target data length of data frames.
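A sketch of this transcoding pass, assuming the decoded stream is a list of frame objects and that the encoder accepts a flag forcing intra-coding; REF_MARKER and the encode callable are illustrative assumptions rather than the application's actual interfaces.

REF_MARKER = object()  # stands in for the reference indication frame

def insert_ref_markers(agg: list, target_len: int) -> list:
    # Insert a reference indication frame every target_len frames so that
    # each transcoding data group has the target data length.
    out = []
    for i, frame in enumerate(agg):
        if i % target_len == 0:
            out.append(REF_MARKER)
        out.append(frame)
    return out

def reencode(decoded: list, encode) -> list:
    # The frame right after a marker becomes the transcoding reference
    # frame: it is intra-coded and anchors its transcoding data group.
    out, force_intra = [], False
    for frame in decoded:
        if frame is REF_MARKER:
            force_intra = True
            continue
        out.append(encode(frame, intra=force_intra))
        force_intra = False
    return out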
In some embodiments, the first data stream and the second data stream are encoded data streams, and the second data stream aggregation module includes: an information switching indication frame generating unit, configured to obtain the second coding parameters of the second data stream and generate an information switching indication frame from them, the frame indicating that, during decoding, once it is detected, the second coding parameters become the new coding parameters, so that the data stream following the frame is decoded based on the second coding parameters; and a data stream aggregation unit, configured to insert the information switching indication frame at the tail of the first data stream in the aggregation buffer and then, starting from the aggregation start position, store the data stream from the second data buffer into the aggregation buffer as the data stream following the indication frame.
In some embodiments, the play data obtaining module includes: a first decoding unit, configured to decode the aggregate data stream based on the first coding parameters of the first data stream; a second decoding unit, configured to, when the information switching indication frame is detected, extract the second coding parameters from it, switch the coding parameters referenced by decoding from the first to the second, and continue decoding based on the second coding parameters; and an encoding unit, configured to uniformly encode the decoded data stream to obtain the play data corresponding to the target play object.
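The decode side can be sketched as follows, reusing the ('SWITCH_MARKER', ...) convention from the earlier failover sketch; decode_frame is an assumed decoder callable, not an interface defined by this application.

def decode_aggregate(agg: list, first_params: dict, decode_frame) -> list:
    # Decoding starts from the first coding parameters and switches once
    # the information switching indication frame is encountered.
    params, decoded = first_params, []
    for item in agg:
        if isinstance(item, tuple) and item and item[0] == 'SWITCH_MARKER':
            params = item[1]  # adopt the second coding parameters
            continue
        decoded.append(decode_frame(item, params))
    return decoded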
In some embodiments, the apparatus further includes a live view angle set obtaining module, configured to obtain the set of live view angles of a target live scene and establish a cache group for each live view angle in the set, where each cache group includes a data buffer for each live broadcast device serving that view angle plus an aggregation buffer, and each live view angle corresponds to multiple live broadcast devices. With the data buffer of any one of those live broadcast devices acting as the first data buffer, the aggregation start position determining module is further configured to, when data loss is detected in the first data stream, select a data buffer other than the first data buffer from the cache group containing the first data buffer as the second data buffer, and switch the data stream buffer source of the aggregation buffer from the first data buffer to the second data buffer.
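One possible in-memory layout for these cache groups, with illustrative device and angle names; the actual structure is not specified by this application.

def build_cache_groups(angles: dict) -> dict:
    # One cache group per live view angle: a data buffer for every live
    # broadcast device serving that angle, plus one aggregation buffer.
    return {
        angle: {
            "device_buffers": {dev: [] for dev in devices},
            "aggregation_buffer": [],
        }
        for angle, devices in angles.items()
    }

groups = build_cache_groups({"angle-1": ["cam-a", "cam-b"],
                             "angle-2": ["cam-c", "cam-d"]})
# If cam-a's stream loses data, any other buffer in the same cache group
# (here cam-b's) can serve as the second data buffer.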
In some embodiments, the target play object is a target live video of a target live scene. The first data stream receiving module is further configured to establish a first transmission channel with a first shooting device at the target live scene, receive the first video stream transmitted by the first shooting device over that channel together with the target scene identifier of the scene, and, based on the target scene identifier, treat the first video stream as the main video stream of the target live video. The second data stream receiving module is further configured to establish a second transmission channel with a second shooting device at the target live scene, receive the second video stream transmitted by the second shooting device over that channel together with the same target scene identifier, and, based on the identifier, treat the second video stream as the backup video stream of the target live video.
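As a rough illustration, pairing streams by scene identifier might look like the following; streams_by_scene and register_stream are hypothetical names, and the first-come rule for choosing the main stream is an assumption, not something the application specifies.

streams_by_scene: dict = {}

def register_stream(scene_id: str, stream) -> None:
    # The first stream registered under a target scene identifier is
    # treated as the main video stream; a later one becomes the backup.
    entry = streams_by_scene.setdefault(scene_id, {})
    entry["main" if "main" not in entry else "backup"] = stream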
For specific limitations of the play data processing apparatus, refer to the limitations of the play data processing method above; they are not repeated here. Each module in the apparatus may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded in or independent of a processor in the computer device in hardware form, or stored in a memory of the computer device in software form, so that the processor can invoke and execute the operations corresponding to each module.
In some embodiments, a computer device is provided, which may be a terminal, with an internal structure as shown in fig. 13. The computer device includes a processor, a memory, a communication interface, a display screen, and an input device connected by a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory; the non-volatile storage medium stores an operating system and a computer program, and the internal memory provides an environment for running the operating system and the computer program in the non-volatile storage medium. The communication interface of the computer device is used for wired or wireless communication with an external terminal; wireless communication may be implemented through WIFI, an operator network, NFC (near field communication), or other technologies. The computer program, when executed by the processor, implements a play data processing method. The display screen of the computer device may be a liquid crystal display or an electronic ink display, and the input device may be a touch layer covering the display screen, a key, a trackball, or a touchpad arranged on the housing of the computer device, or an external keyboard, touchpad, or mouse.
In some embodiments, a computer device is provided, which may be a server, with an internal structure as shown in fig. 14. The computer device includes a processor, a memory, and a network interface connected by a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory; the non-volatile storage medium stores an operating system, a computer program, and a database, and the internal memory provides an environment for running the operating system and the computer program in the non-volatile storage medium. The database of the computer device stores the data involved in the play data processing method. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by the processor, implements a play data processing method.
It will be appreciated by those skilled in the art that the structures shown in fig. 13 and fig. 14 are merely block diagrams of parts of the structures related to the solution of the present application and do not limit the computer device to which the solution is applied; a specific computer device may include more or fewer components than shown, combine certain components, or arrange components differently.
In some embodiments, there is also provided a computer device comprising a memory and a processor, the memory having stored therein a computer program, the processor implementing the steps of the method embodiments described above when the computer program is executed.
In some embodiments, a computer readable storage medium is provided, storing a computer program which, when executed by a processor, implements the steps of the method embodiments described above.
In some embodiments, a computer program product or computer program is provided, which includes computer instructions stored in a computer-readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium and executes them, causing the computer device to perform the steps in the above method embodiments.
Those skilled in the art will appreciate that all or part of the processes in the above method embodiments may be implemented by a computer program instructing relevant hardware; the computer program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the above method embodiments. Any reference to memory, storage, database, or other media used in the embodiments provided herein may include at least one of non-volatile and volatile memory. Non-volatile memory may include read-only memory (ROM), magnetic tape, floppy disk, flash memory, or optical memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM may take various forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM).
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination of technical features is not contradictory, it should be considered within the scope of this specification.
The above embodiments represent only a few implementations of the present application; their descriptions are specific and detailed but should not be construed as limiting the scope of the invention. It should be noted that those of ordinary skill in the art may make various modifications and improvements without departing from the concept of the present application, and these fall within the scope of protection of the present application. Accordingly, the scope of protection of the present application shall be determined by the appended claims.

Claims (15)

1. A play data processing method, the method comprising:
receiving a first data stream corresponding to a target play object from a first transmission channel, and buffering the first data stream in a first data buffer;
receiving a second data stream corresponding to the target play object from a second transmission channel, and buffering the second data stream in a second data buffer;
taking the first data buffer as the data stream buffer source of an aggregation buffer, and storing the data stream from the first data buffer into the aggregation buffer;
when data loss is detected in the first data stream, switching the data stream buffer source of the aggregation buffer from the first data buffer to the second data buffer, determining the data loss position in the first data stream, and determining the aggregation start position of the second data stream according to the data loss position;
starting from the aggregation start position of the second data stream, storing the data stream from the second data buffer into the aggregation buffer; and
obtaining play data corresponding to the target play object based on the aggregate data stream in the aggregation buffer.
2. The method of claim 1, wherein the first data stream and the second data stream are encoded data streams, and wherein determining the data loss position in the first data stream and determining the aggregation start position of the second data stream according to the data loss position comprises:
obtaining the target data position in the second data stream that corresponds to the data loss position;
obtaining the frame coding type of the target data frame at the target data position in the second data stream; and
determining a position determination policy according to the frame coding type, and determining the aggregation start position of the second data stream according to the target data position and the position determination policy.
3. The method of claim 2, wherein determining a position determination policy according to the frame coding type, and determining the aggregation start position of the second data stream according to the target data position and the position determination policy comprises:
when the frame coding type is not a coded reference frame, determining the position determination policy to be a coded-data-group skipping policy; and
based on the coded-data-group skipping policy, skipping the target coded data group corresponding to the target data position, and taking the position of the coded reference frame in the coded data group following the target coded data group as the aggregation start position of the second data stream.
4. The method of claim 2, wherein determining a position determination policy according to the frame coding type, and determining the aggregation start position of the second data stream according to the target data position and the position determination policy comprises:
when the frame coding type is a coded reference frame, determining the position determination policy to be a position-keeping policy; and
based on the position-keeping policy, taking the target data position as the aggregation start position of the second data stream.
5. The method of claim 1, wherein obtaining play data corresponding to the target play object based on the aggregate data stream in the aggregation buffer comprises:
obtaining the target data length of a transcoding data group;
inserting reference indication frames into the aggregate data stream in the aggregation buffer according to the target data length to obtain an updated aggregate data stream;
during transcoding, determining a transcoding reference frame based on a reference indication frame, and transcoding the transcoding data group in which the transcoding reference frame is located in the updated aggregate data stream based on the transcoding reference frame to obtain a transcoded data stream; and
obtaining the play data corresponding to the target play object according to the transcoded data stream.
6. The method of claim 5, wherein determining a transcoding reference frame based on a reference indication frame during transcoding, and transcoding the transcoding data group in which the transcoding reference frame is located in the updated aggregate data stream based on the transcoding reference frame to obtain a transcoded data stream comprises:
decoding the updated aggregate data stream to obtain a decoded data stream;
while encoding the decoded data stream, when a reference indication frame is detected, taking the data frame immediately following the reference indication frame as the transcoding reference frame, and intra-coding the transcoding reference frame to obtain an intra-coded frame; and
transcoding the transcoding data group in which the transcoding reference frame is located based on the intra-coded frame to obtain transcoded data of the transcoded data stream, wherein the transcoding data group contains the target data length of data frames.
7. The method of claim 1, wherein the first data stream and the second data stream are encoded data streams, and wherein storing the data stream from the second data buffer into the aggregation buffer starting from the aggregation start position of the second data stream comprises:
obtaining second coding parameters corresponding to the second data stream, and generating an information switching indication frame based on the second coding parameters, the information switching indication frame indicating that, during decoding, if the information switching indication frame is detected, the second coding parameters are taken as the new coding parameters, so that the data stream following the information switching indication frame is decoded based on the second coding parameters; and
inserting the information switching indication frame at the tail of the first data stream in the aggregation buffer, and, starting from the aggregation start position of the second data stream, storing the data stream from the second data buffer into the aggregation buffer as the data stream following the information switching indication frame.
8. The method of claim 7, wherein obtaining play data corresponding to the target play object based on the aggregate data stream in the aggregation buffer comprises:
while decoding the aggregate data stream, decoding based on first coding parameters corresponding to the first data stream;
when the information switching indication frame is detected, extracting the second coding parameters from the information switching indication frame, switching the coding parameters referenced by decoding from the first coding parameters to the second coding parameters, and decoding based on the second coding parameters; and
uniformly encoding the decoded data stream obtained by decoding to obtain the play data corresponding to the target play object.
9. The method of claim 1, further comprising:
obtaining the set of live view angles corresponding to a target live scene, and establishing a cache group for each live view angle in the set, wherein each cache group comprises a data buffer for each live broadcast device corresponding to that live view angle and an aggregation buffer, and each live view angle corresponds to a plurality of live broadcast devices;
wherein the data buffer corresponding to any one live broadcast device in a cache group is the first data buffer, and switching the data stream buffer source of the aggregation buffer from the first data buffer to the second data buffer when data loss is detected in the first data stream comprises:
when data loss is detected in the first data stream, selecting a data buffer other than the first data buffer from the cache group containing the first data buffer as the second data buffer, and switching the data stream buffer source of the aggregation buffer from the first data buffer to the second data buffer.
10. The method of claim 1, wherein the target play object is a target live video corresponding to a target live scene, and wherein receiving a first data stream corresponding to the target play object from the first transmission channel and buffering the first data stream in the first data buffer comprises:
establishing the first transmission channel with a first shooting device corresponding to the target live scene, receiving a first video stream transmitted by the first shooting device over the first transmission channel together with a target scene identifier corresponding to the target live scene, and taking the first video stream as the main video stream of the target live video based on the target scene identifier;
and wherein receiving a second data stream corresponding to the target play object from the second transmission channel and buffering the second data stream in the second data buffer comprises:
establishing the second transmission channel with a second shooting device corresponding to the target live scene, receiving a second video stream transmitted by the second shooting device over the second transmission channel together with the target scene identifier corresponding to the target live scene, and taking the second video stream as the backup video stream of the target live video based on the target scene identifier.
11. A play data processing apparatus, the apparatus comprising:
a first data stream receiving module, configured to receive a first data stream corresponding to a target play object from a first transmission channel and buffer the first data stream in a first data buffer;
a second data stream receiving module, configured to receive a second data stream corresponding to the target play object from a second transmission channel and buffer the second data stream in a second data buffer;
a first data stream aggregation module, configured to take the first data buffer as the data stream buffer source of an aggregation buffer and store the data stream from the first data buffer into the aggregation buffer;
an aggregation start position determining module, configured to, when data loss is detected in the first data stream, switch the data stream buffer source of the aggregation buffer from the first data buffer to the second data buffer, determine the data loss position in the first data stream, and determine the aggregation start position of the second data stream according to the data loss position;
a second data stream aggregation module, configured to store the data stream from the second data buffer into the aggregation buffer, starting from the aggregation start position of the second data stream; and
a play data obtaining module, configured to obtain play data corresponding to the target play object based on the aggregate data stream in the aggregation buffer.
12. The apparatus of claim 11, wherein the first data stream and the second data stream are encoded data streams, and the aggregation start position determining module comprises:
a target data position determining unit, configured to obtain the target data position in the second data stream that corresponds to the data loss position;
a frame coding type obtaining unit, configured to obtain the frame coding type of the target data frame at the target data position in the second data stream; and
an aggregation start position determining unit, configured to determine a position determination policy according to the frame coding type and determine the aggregation start position of the second data stream according to the target data position and the position determination policy.
13. A computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor, when executing the computer program, implements the steps of the method of any one of claims 1 to 10.
14. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the steps of the method of any one of claims 1 to 10.
15. A computer program product comprising a computer program, wherein the computer program, when executed by a processor, implements the steps of the method of any one of claims 1 to 10.
CN202111123072.1A 2021-09-24 2021-09-24 Playing data processing method, device, computer equipment and storage medium Active CN115883855B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111123072.1A CN115883855B (en) 2021-09-24 2021-09-24 Playing data processing method, device, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN115883855A CN115883855A (en) 2023-03-31
CN115883855B 2024-02-23

Family

ID=85762395

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111123072.1A Active CN115883855B (en) 2021-09-24 2021-09-24 Playing data processing method, device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115883855B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103716573A (en) * 2013-12-13 2014-04-09 Leshi Zhixin Electronic Technology (Tianjin) Co., Ltd. Video playback method and device
CN107247676A (en) * 2017-05-18 2017-10-13 Shenzhen Xiaoniu Online Internet Information Consulting Co., Ltd. Dynamic image playing method, device, storage medium and computer equipment
CN109672893A (en) * 2018-11-30 2019-04-23 Guangzhou Baiguoyuan Information Technology Co., Ltd. Video encoding/decoding method, device, equipment and storage medium


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40084134

Country of ref document: HK

GR01 Patent grant