CN110392269B - Media data processing method and device and media data playing method and device - Google Patents

Info

Publication number
CN110392269B
CN110392269B (grant of application CN201810345274.2A)
Authority
CN
China
Prior art keywords
frame
media data
media
key
key frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810345274.2A
Other languages
Chinese (zh)
Other versions
CN110392269A (en)
Inventor
江天德 (Jiang Tiande)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN201810345274.2A priority Critical patent/CN110392269B/en
Publication of CN110392269A publication Critical patent/CN110392269A/en
Application granted granted Critical
Publication of CN110392269B publication Critical patent/CN110392269B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N 21/21 Server components or server architectures
    • H04N 21/218 Source of audio or video content, e.g. local disk arrays
    • H04N 21/2187 Live feed
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N 21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N 21/232 Content retrieval operation locally within server, e.g. reading video streams from disk arrays
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N 21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N 21/234 Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs
    • H04N 21/23418 Processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N 21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N 21/238 Interfacing the downstream path of the transmission network, e.g. adapting the transmission rate of a video stream to network bandwidth; Processing of multiplex streams
    • H04N 21/2387 Stream processing in response to a playback request from an end-user, e.g. for trick-play
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N 21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
    • H04N 21/44008 Processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N 21/83 Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N 21/845 Structuring of content, e.g. decomposing content into time segments
    • H04N 21/8455 Structuring of content involving pointers to the content, e.g. pointers to the I-frames of the video stream

Abstract

The application relates to a media data processing method and device and a media data playing method and device. The media data processing method comprises the following steps: receiving a media data request; determining, in the media data, the start frame requested by the media data request; when the start frame is not a key frame, searching the media data for the key frame that precedes and is nearest to the start frame, and for the predicted frames between that nearest key frame and the start frame; constructing a key frame from the found key frame and predicted frames; and, in response to the media data request, feeding back the constructed key frame and the frames following the start frame in the media data. The scheme provided by the application can reduce the delay of the fed-back media data.

Description

Media data processing method and device and media data playing method and device
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method and an apparatus for processing media data, and a method and an apparatus for playing media data.
Background
With the rapid development of computer technology, media data can be delivered to users around the world the instant it is requested. In recent years in particular, live video technology has entered many aspects of daily life and work: video data captured in real time can be presented to users immediately, so that all kinds of social activities, such as press conferences, sports events, and teleconferences, can be broadcast to users everywhere through a live-streaming platform.
Generally, after a media data request is received, the fed-back media data must include a key frame to guarantee that it can be played normally. At present, however, feedback starts from a key frame found earlier in the media data, which introduces a large delay into the fed-back media data.
Disclosure of Invention
Therefore, it is necessary to provide a media data processing method and apparatus, and a media data playing method and apparatus, to address the technical problem that currently fed-back media data suffers a large delay.
A media data processing method, comprising:
receiving a media data request;
determining, in media data, the start frame requested by the media data request;
when the start frame is not a key frame, searching the media data for the key frame that precedes and is nearest to the start frame, and for the predicted frames between that nearest key frame and the start frame;
constructing a key frame from the found key frame and predicted frames;
and, in response to the media data request, feeding back the constructed key frame and the frames following the start frame in the media data.
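The steps above can be sketched as a single server-side handler. This is an illustrative sketch only, not the patented implementation: the `Frame` model and the `build_key_frame` placeholder are hypothetical, and a real construction step would operate on coded pictures rather than frame metadata.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Frame:
    timestamp: int
    is_key: bool          # True for an I-frame, False for a predicted frame
    data: bytes = b""

def build_key_frame(key: Frame, predicted: List[Frame]) -> Frame:
    # Placeholder: a real implementation decodes the key frame, applies the
    # predicted frames' residuals, and re-encodes the result as an intra frame.
    ts = predicted[-1].timestamp if predicted else key.timestamp
    return Frame(timestamp=ts, is_key=True)

def handle_media_request(frames: List[Frame], start_index: int) -> List[Frame]:
    """Feed back frames for a request whose start frame is frames[start_index]."""
    start = frames[start_index]
    if start.is_key:
        # The start frame is already a key frame: feed back from it directly.
        return frames[start_index:]
    # Search backward for the nearest key frame before the start frame.
    k = start_index
    while k > 0 and not frames[k].is_key:
        k -= 1
    nearest_key = frames[k]
    predicted = frames[k + 1:start_index]
    built = build_key_frame(nearest_key, predicted)   # construct a key frame
    return [built] + frames[start_index:]             # feed it back plus later frames
```

The response thus starts at the requested position instead of at the earlier key frame, which is the source of the delay reduction.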
A media data playing method, comprising:
sending a media data request;
receiving the frames fed back in response to the media data request; the received frames include a key frame and the frames following the start frame in the media data; the fed-back key frame is constructed from the key frame in the media data that precedes and is nearest to the start frame, and from the predicted frames between that nearest key frame and the start frame; the start frame is specified by the media data request;
and decoding and playing the received frames, starting from the key frame among them.
A media data processing apparatus, comprising:
a receiving module, configured to receive a media data request;
a determining module, configured to determine, in media data, the start frame requested by the media data request;
a searching module, configured to, when the start frame is not a key frame, search the media data for the key frame that precedes and is nearest to the start frame, and for the predicted frames between that nearest key frame and the start frame;
a construction module, configured to construct a key frame from the found key frame and predicted frames;
and a response module, configured to, in response to the media data request, feed back the constructed key frame and the frames following the start frame in the media data.
A media data playing apparatus, comprising:
a sending module, configured to send a media data request;
a receiving module, configured to receive the frames fed back in response to the media data request; the received frames include a key frame and the frames following the start frame in the media data; the fed-back key frame is constructed from the key frame in the media data that precedes and is nearest to the start frame, and from the predicted frames between that nearest key frame and the start frame; the start frame is specified by the media data request;
and a playing module, configured to decode and play the received frames, starting from the key frame among them.
A computer-readable storage medium, which stores a computer program that, when executed by a processor, causes the processor to execute the steps of the above-described media data processing method or the above-described media data playing method.
A computer device comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the steps of the above-mentioned media data processing method or the above-mentioned media data playing method.
According to the media data processing method and apparatus, the computer-readable storage medium, and the computer device above, when the start frame requested by a media data request is not a key frame, the key frame that precedes and is nearest to the start frame is searched for in the media data so that the fed-back data contains a key frame, and a key frame is constructed from the found key frame and the predicted frames between it and the start frame. When responding to the media data request, the constructed key frame and the frames following the start frame in the media data can then be fed back directly, with no need to feed back the media data in order starting from the found earlier key frame, which reduces the delay of the fed-back media data.
Drawings
FIG. 1 is a diagram of an embodiment of a media data processing method;
FIG. 2 is a flow diagram illustrating a method for media data processing according to one embodiment;
FIG. 3 is a diagram illustrating the case where the start frame determined in a live video stream is a key frame, in one embodiment;
FIG. 4 is a diagram illustrating the case where the start frame determined in a live video stream is not a key frame, in one embodiment;
FIG. 5 is a diagram illustrating a user requesting a live video stream from a server via a terminal in an application scenario;
FIG. 6 is a block diagram that illustrates a client requesting a server to play a live video stream, in accordance with an embodiment;
FIG. 7 is a diagram illustrating partitioning of media data into a plurality of media segments, according to one embodiment;
FIG. 8 is a block diagram of another embodiment of a client requesting to play a live video stream from a server;
FIG. 9 is a flow diagram of a method for media data processing in accordance with an exemplary embodiment;
FIG. 10 is a flowchart illustrating a method for playing media data according to an embodiment;
FIG. 11 is a block diagram showing the construction of a media data processing apparatus according to one embodiment;
fig. 12 is a block diagram showing the construction of a media data processing apparatus according to another embodiment;
FIG. 13 is a block diagram showing the construction of a media data playback apparatus according to an embodiment;
FIG. 14 is a block diagram showing the construction of a computer device according to one embodiment;
fig. 15 is a block diagram showing a configuration of a computer device according to another embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
Fig. 1 is a diagram of an application environment of a media data processing method according to an embodiment. Referring to fig. 1, the media data processing method is applied to a media data processing system. The media data processing system includes a terminal 110 and a server 120. The terminal 110 and the server 120 are connected through a network. The terminal 110 may specifically be a desktop terminal or a mobile terminal, and the mobile terminal may specifically be at least one of a mobile phone, a tablet computer, a notebook computer, and the like. The server 120 may be implemented as a stand-alone server or a server cluster composed of a plurality of servers.
The terminal 110 may send a media data request to the server 120. After receiving the request, the server 120 may determine, in the media data, the start frame requested by the media data request. When the start frame is not a key frame, the server 120 may search the media data for the key frame that precedes and is nearest to the start frame, and for the predicted frames between that nearest key frame and the start frame. The server 120 may construct a key frame from the found key frame and predicted frames and, in response to the request, feed back the constructed key frame and the frames following the start frame in the media data to the terminal 110. The terminal 110 may receive the frames fed back in response to the media data request, then decode and play them, starting from the key frame among the received frames.
As shown in fig. 2, in one embodiment, a media data processing method is provided. The embodiment is mainly illustrated by applying the method to the server 120 in fig. 1. Referring to fig. 2, the media data processing method specifically includes the following steps:
s202, a media data request is received.
Media data is data obtained by compressing original data so that it is convenient to store and transmit. The original data may be picture data or video data, and may be acquired in real time or stored in advance. In one embodiment, the media data may be a live video stream as used in live video technology.
The minimum transmission unit of media data is a frame. Each frame of media data is either a key frame or a predicted frame. A key frame carries the complete information of the current frame's original data and can be decoded independently. A predicted frame carries only information that is redundant relative to other frames and must be decoded with reference to those frames to recover all the information of the current frame's original data.
In one embodiment, a key frame may be an intra-coded frame (I-frame), and a predicted frame may be an inter-coded frame. Inter-coded frames include P-frames (forward predicted frames) and/or B-frames (bidirectionally predicted frames).
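As one concrete instance of telling key frames from predicted frames (an assumption for illustration; the patent does not fix a codec): in an H.264 stream the frame type can be read from the first byte of the NAL unit header, whose low five bits give the NAL unit type. Type 5 is an IDR slice, i.e. a key frame; type 1 is a non-IDR, predicted slice.

```python
def h264_nal_unit_type(first_byte: int) -> int:
    """The NAL unit type occupies the low 5 bits of the first NAL header byte."""
    return first_byte & 0x1F

def is_h264_key_frame(first_byte: int) -> bool:
    # Type 5 is an IDR (instantaneous decoder refresh) slice, i.e. a key frame;
    # type 1 is a non-IDR slice, i.e. a predicted frame.
    return h264_nal_unit_type(first_byte) == 5
```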
The media data request is an instruction for requesting media data. In one embodiment, a server may receive a request for media data over a network, determine requested media data from the request for media data, and feed back frames in the requested media data in response to the request for media data.
In an embodiment, after receiving the media data request, the server may encode original data acquired in real time and feed the encoded media data back to the terminal over the network. The server may also obtain stored, already-encoded media data and feed it back to the terminal over the network.
In one embodiment, the media data request may be triggered by the user clicking on an application on the terminal or logging into an associated website that provides a platform for the user to view the media. It will be appreciated that the server may receive media data requests triggered by different users via the terminal at different times.
S204, the start frame requested by the media data request is determined in the media data.
Here, the start frame is the frame of the media data at which playback should begin for the received media data request. It will be appreciated that different media data requests for the same media data may correspond to different start frames.
The server may identify each frame by its timestamp in the media data. In an embodiment, the server obtains the timestamp of each frame in the media data and stores the correspondence between those timestamps and start-play time nodes. When a media data playing request is obtained, the server looks up, according to this correspondence, the frame identified by the timestamp corresponding to the request's start-play time node, and takes the found frame as the start frame for that request.
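A minimal sketch of the timestamp lookup just described, assuming (as an illustration) that frame timestamps are kept in a sorted list; a binary search finds the frame identified by the timestamp at, or as a fallback just before, the requested start-play time node:

```python
from bisect import bisect_right

def frame_for_time_node(timestamps, time_node):
    """Return the index of the frame whose timestamp matches the start-play
    time node, falling back to the latest frame at or before that node.
    `timestamps` must be sorted in ascending order."""
    i = bisect_right(timestamps, time_node) - 1
    if i < 0:
        raise ValueError("time node precedes the first frame")
    return i
```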
In one embodiment, the media data is a live video stream. Specifically, the server encodes live video images acquired in real time to obtain video frames. When a live-stream playing request is obtained, in order to reduce the delay of live playback, the most recently obtained video frame is used as the start frame for that request. It will be appreciated that the most recently acquired video frame may be one that was already encoded before the request was obtained.
In one embodiment, as the server obtains each video frame, it records the correspondence between its local system time and the frame's timestamp. After obtaining a live-stream playing request, the server extracts from the request the time at which the request was sent, looks up in the recorded correspondence the video frame whose timestamp corresponds to the local system time equal to that sending time, and takes that video frame as the start frame. It will be appreciated that, since there is usually a time difference between sending and receiving data, the server's local system time on receiving the request is usually later than the time the request was sent, so the video frame corresponding to the sending time will already have been acquired by the server.
For example, on a live video platform the anchor starts broadcasting at 13:00:00, and the server acquires the live video stream pushed by the anchor in real time over the network. Suppose that ten minutes into the broadcast, at 13:10:00, a user sends a live-stream playing request through the terminal, and the server's local system time on receiving the request is 13:10:05. The server can extract from the request the time at which it was sent, 13:10:00, and take the video frame whose timestamp corresponds to local system time 13:10:00 as the start frame for that request. Of course, to reduce the delay of the live video further, the server may instead take the frame whose timestamp corresponds to 13:10:04 as the start frame. And to guard against the case where, at the moment of receipt (13:10:05), the server has not yet acquired the frame the anchor pushed at 13:10:04, it may also fall back to the frame whose timestamp corresponds to 13:09:59.
It will be appreciated that the frame rate in the example above is one frame per second. In practice, one second may correspond to multiple video frames, in which case the time at which the user sends the request is refined to milliseconds; alternatively, the request time stays in units of seconds and the server by default takes one of the video frames within that second as the start frame.
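The time arithmetic in the example can be sketched as follows. The `safety_offset_s` parameter is hypothetical, standing in for the fallback to an earlier time point (e.g. 13:09:59) described above:

```python
from datetime import datetime, timedelta

FMT = "%H:%M:%S"

def start_play_time(request_sent: str, safety_offset_s: int = 0) -> str:
    """Pick the start-play time node: the time the request was sent,
    optionally rolled back by a safety offset so the frame surely exists."""
    t = datetime.strptime(request_sent, FMT) - timedelta(seconds=safety_offset_s)
    return t.strftime(FMT)
```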
S206, when the start frame is not a key frame, the key frame that precedes and is nearest to the start frame, and the predicted frames between that nearest key frame and the start frame, are searched for in the media data.
When the start frame is not a key frame, it is a predicted frame. It will be appreciated that, so that the media data can be decoded and played correctly, the frames of the media data are ordered; a frame before the start frame is one that occurs earlier in the media data than the start frame.
When the start frame is not a key frame, it may be a predicted frame generated with reference only to the key frame that precedes and is nearest to it in the media data. It may also be generated with reference to that key frame together with predicted frames before the start frame or, for bidirectional prediction, after it.
Specifically, when the start frame is not a key frame, the server finds the key frame that precedes and is nearest to the start frame in the media data, and determines the predicted frames between the start frame and the found key frame.
In one embodiment, after acquiring the media data, the server stores the correspondence between the type of each frame and its timestamp; after determining the start frame in the media data, the server can decide from this correspondence whether the start frame is a key frame.
Further, when it determines that the start frame is not a key frame, the server searches the types of the frames backward in timestamp order until it finds the key frame nearest to the start frame, and determines the predicted frames between that key frame and the start frame.
In one embodiment, each frame of the media data includes a frame type field. The server can decode the frame type field of the determined start frame to decide whether it is a key frame. When the start frame is not a key frame, the server walks backward through the frames identified by earlier timestamps and judges, from each found frame's type field, whether it is a key frame.
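The backward search over the stored type/timestamp correspondence might look like this (a sketch; the `'I'`/`'P'` type labels and dict representation are assumptions for illustration):

```python
def nearest_key_frame_before(frame_types, start_ts):
    """frame_types maps timestamp -> 'I' (key frame) or 'P' (predicted frame).
    Walk timestamps backward from just before the start frame until a key
    frame is found; return its timestamp, or None if there is none."""
    earlier = sorted(ts for ts in frame_types if ts < start_ts)
    for ts in reversed(earlier):
        if frame_types[ts] == "I":
            return ts
    return None
```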
In one embodiment, when the server determines that the start frame is a key frame, the start frame and a frame of the media data after the start frame can be directly fed back as a response to the media data request.
Fig. 3 is a schematic diagram illustrating a case where the start frame determined in the live video stream is a key frame in one embodiment. Referring to fig. 3, if the determined start frame is an I frame, the server may directly use the start frame and frames following the start frame in the media data as feedback of the request for playing the live video stream.
And S208, constructing the key frame according to the searched key frame and the predicted frame.
Specifically, when the start frame is not a key frame, it is a predicted frame. That is, the start frame carries only information that is redundant relative to the key frame that precedes and is nearest to it, and possibly relative to other predicted frames between that key frame and itself. It will be appreciated that decoding the start frame requires referring to that key frame, or to the key frame together with those other predicted frames, after which the start frame fully expresses all the information of its corresponding original data.
Therefore, once the server has obtained the key frame nearest to the start frame and the other predicted frames between that key frame and the start frame, it can construct a key frame that expresses the original data completely; the information of the original data expressed by the constructed key frame is close to the information the start frame expresses after decoding.
Fig. 4 is a schematic diagram of constructing a key frame when the start frame determined in a live video stream is not a key frame, in one embodiment. Referring to fig. 4, the determined start frame is a P frame, so it is not a key frame. The server searches backward for the nearest I frame, constructs a key frame from that I frame and the predicted frames between it and the start frame (the part of the live video stream enclosed by the shaded squares in the figure), and feeds back the constructed key frame and the frames following the start frame in the media data to the terminal in order. In general there are many predicted frames between the I frame and the start frame; if the server instead fed back the media data in order starting from the found nearest I frame, the terminal would have to decode and play the I frame and all the predicted frames between it and the start frame, increasing the playback delay of the live video stream.
In one embodiment, the server may also construct the key frame from the found key frame, the predicted frames, and the start frame itself. In this embodiment the constructed key frame refers not only to the found key frame and predicted frames but also to the redundant information carried by the start frame, so the information of the original data it expresses is even closer to the information the start frame expresses when decoded.
In one embodiment, the found key frame may be the frame immediately preceding the determined start frame, i.e. there are no other predicted frames between the start frame and the found key frame. In this case the server may construct a key frame from the found key frame and the start frame, or may directly feed back the found key frame and the frames following the start frame in the media data to the terminal.
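Conceptually, constructing the key frame amounts to accumulating each predicted frame's residual onto the decoded key frame and treating the result as a complete picture that can be re-encoded as an intra (key) frame. The following is a toy sketch over flat pixel arrays, an assumed simplification; real codecs apply motion-compensated residuals per macroblock:

```python
def construct_key_frame(key_pixels, residuals):
    """key_pixels: the decoded picture of the nearest key frame.
    residuals: one per-pixel delta list per predicted frame, in decode order.
    Returns the fully reconstructed picture, which can then be re-encoded
    as a new intra (key) frame."""
    picture = list(key_pixels)
    for delta in residuals:
        # Apply this predicted frame's residual on top of the running picture.
        picture = [p + d for p, d in zip(picture, delta)]
    return picture
```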
S210, in response to the media data request, the constructed key frame and the frames following the start frame in the media data are fed back.
Specifically, after constructing the key frame, the server may feed back to the terminal the constructed key frame and the frames following the start frame in the requested media data. On receiving the fed-back frames, the terminal decodes and plays them in order, starting from the key frame.
In one embodiment, the server may construct, for each predicted frame in the acquired media data, a corresponding key frame from the preceding nearest key frame and the other predicted frames between that key frame and the predicted frame, and store the constructed key frames in advance. When the start frame for a media data request sent by the terminal is a predicted frame, the server can then directly fetch the pre-stored key frame corresponding to that predicted frame and feed it back to the terminal together with the frames following the predicted frame in the media data.
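The pre-construction variant can be sketched as a cache filled while frames are ingested. The running flat-pixel picture is the same kind of toy representation assumed above for illustration, not taken from the patent:

```python
class KeyFrameCache:
    """As frames are ingested, keep the running reconstructed picture and
    store, for every predicted frame, a ready-made key frame, so a request
    whose start frame is a predicted frame is answered from the cache."""

    def __init__(self):
        self._picture = None   # running reconstructed picture
        self._cache = {}       # predicted-frame timestamp -> built key frame

    def ingest(self, ts, is_key, payload):
        if is_key:
            self._picture = list(payload)   # decoded key frame picture
        else:
            # Apply the predicted frame's residual, then snapshot the result
            # as the pre-built key frame for this timestamp.
            self._picture = [p + d for p, d in zip(self._picture, payload)]
            self._cache[ts] = list(self._picture)

    def key_frame_for(self, ts):
        return self._cache.get(ts)
```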
Fig. 5 shows a framework diagram, in a specific application scenario, of a user requesting a live video stream from a server through a terminal. Referring to fig. 5, live video data from the anchor terminal 502 is pushed in real time to the live streaming server 504 during the broadcast. On receiving a live-stream playing request sent by the user terminal 506, the live streaming server 504 immediately feeds back live video frames according to the request. If the frame identified by the timestamp corresponding to the moment a user clicks to play the live stream is not a key frame, the server searches the acquired live video stream for the nearest key frame before that frame, constructs a new key frame from the found key frame and the predicted frames between it and that frame, and immediately feeds back the constructed key frame and the frames after the frame corresponding to that moment to the user. Because the first frame the user terminal receives is a key frame, no "screen splash" (corrupted pictures) occurs; and because feedback starts from the frames following the frame corresponding to the current moment rather than from an earlier key frame, the delay of pushing the live video stream to the terminal is reduced, and users everywhere experience the effect of synchronized live broadcasting.
According to the media data processing method, when the start-play frame requested by the media data request is not a key frame, the key frame before and closest to the start-play frame is searched for in the media data so that the fed-back data contains a key frame, and a key frame is constructed from the found key frame and the predicted frames between it and the start-play frame. When responding to the media data request, the constructed key frame and the frames following the start-play frame in the media data can therefore be fed back directly; there is no need to feed back the media data in sequence starting from the found key frame before the start-play frame, so the delay of the fed-back media data is reduced.
In one embodiment, step S204, determining the start-play frame requested by the media data request in the media data, specifically includes: determining the media data specified by the media data request; determining a start-play time node according to the media data request; and determining the frame in the media data that matches the start-play time node as the start-play frame.
The play-starting time node is a preset time node for playing a certain frame in the media data. It is understood that each frame in the media data has a corresponding start time node, and when the start time node is reached, the frame corresponding to the start time node can be output for playing. In one embodiment, the start-up time node corresponds to a timestamp of each frame in the media data.
Specifically, the server may determine the media data specified by the media data request after receiving the request sent by the terminal. For example, a user clicks a live video push link on a terminal, which requests from the server the live video stream corresponding to that link; the server can determine the address of the corresponding live video stream according to the received play request and issue the address to the terminal, so that the terminal can continuously acquire and play the live video stream from that address.
After determining the media data corresponding to the media data request, the server may determine a start-play time node according to the request. In one embodiment, the server may extract, from the request, the time node at which the request was issued and use it as the start-play time node. The server may also use the local system time at which the request was received as the start-play time node.
In one embodiment, the server may select, from the start-play time nodes corresponding to the frames in the acquired media data, the node closest to the local system time at which the media data request was received, and use the start-play time node immediately after the selected one as the start-play time node.
After determining the start-play time node, the server may determine, from the media data, the start-play frame matching that node. In one embodiment, the server may preset a correspondence between each frame in the media data and its start-play time node; when a media data play request is obtained, the server extracts the start-play time node from the request and finds the corresponding start-play frame in the media data according to the correspondence. For example, suppose the frame rate of the media data obtained by compressing the original data is 6 fps (frames per second), that is, 6 frames of data per second. If the user requests playback from the 10th second of the media data, the server determines that the start-play frame corresponding to the media data request is the 60th frame of the media data.
It can be understood that one start-play time node may correspond to several frames of media data; in the above example, the start-play frame may therefore be any one of the 60th to 65th frames. Of course, the start-play time nodes may be more fine-grained: for example, the period from the 10th to the 11th second may be divided into 6 start-play time nodes at the 10th, 10+1/6th, 10+2/6th, 10+3/6th, 10+4/6th and 10+5/6th seconds, which then correspond to the 60th, 61st, 62nd, 63rd, 64th and 65th frames respectively.
In this embodiment, the start-play frame targeted by each received media data request may differ; the server can determine the start-play time node according to the media data request and thereby determine the start-play frame in the media data corresponding to that request.
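As an illustration of the frame-rate arithmetic above, the following minimal sketch (names hypothetical, assuming a constant frame rate and 0-based frame numbering as in the 6 fps example) maps a requested start-play time to a frame index:

```python
def start_frame_index(start_second: float, fps: int = 6) -> int:
    """Index of the frame whose start-play time node is start_second,
    assuming a constant frame rate and 0-based frame numbering."""
    return round(start_second * fps)

# Requesting playback from the 10th second of a 6 fps stream
# lands on the 60th frame; 10+3/6 seconds lands on the 63rd.
```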
In one embodiment, the media data request is a live video stream play request; the media data is live video stream; determining a start-of-play time node from a media data request, comprising: determining a live broadcast delay time length according to the media data request; acquiring a current time node; and determining a broadcast starting time node according to the current time node and the live broadcast delay time, wherein the broadcast starting time node is earlier than the current time node by the live broadcast delay time.
The live delay duration is the time difference between the time node at which the server receives the latest frame in the live video stream pushed by the anchor and the time node at which that frame is fed back to the terminal. The live delay duration may be, for example, 3 to 6 seconds. The current time node is the time node corresponding to the latest frame of the live video stream acquired by the server.
For example, if the anchor starts pushing the stream to the server at 13:00:00 and has been streaming for 10 minutes, the current time node is 13:10:00. The server may also regard 13:00:00 as the start time node 00:00:00 of the live video stream, in which case the current time node is 00:10:00. Assuming the live delay duration is 5 seconds, the corresponding start-play time node is 13:09:55 or 00:09:55. It can be understood that the latest frame of the anchor's live video stream, received by the server at the current time 13:10:00, will not be fed back to the terminal until 13:10:05; the frame pushed to the terminal at the current moment is the frame of the anchor's stream that the server received at 13:09:55 (i.e., 00:09:55). It should be noted that the live delay duration carried in the live video stream play request is within the range of delay acceptable to the user. With this delay, when the user requests the live video stream, the situation can be avoided in which the anchor, affected by the network or external factors, has not yet pushed the latest video frame to the server, or the server has not yet fed the latest video frame back to the terminal, which would cause the player at the terminal to stutter.
Specifically, the live video stream play request sent by the terminal to the server carries the live delay duration. The server extracts the live delay duration from the received request, determines the current time node according to the time node corresponding to the latest frame pushed by the anchor, and uses the time node that is earlier than the determined current time node by the live delay duration as the start-play time node.
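The relation "start-play time node = current time node − live delay duration" can be sketched as follows (a hypothetical helper, not the patent's implementation):

```python
from datetime import datetime, timedelta

def start_play_time_node(current: datetime, live_delay_seconds: float) -> datetime:
    """The start-play time node is earlier than the current node by the delay."""
    return current - timedelta(seconds=live_delay_seconds)

# With a 5-second delay, the frame fed back at 13:10:00 is the one
# the server received at 13:09:55 (date chosen arbitrarily).
node = start_play_time_node(datetime(2018, 4, 16, 13, 10, 0), 5)
```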
Fig. 6 is a framework diagram of one embodiment in which a client requests a server to play a live video stream. Referring to fig. 6, the client only needs to send a live video stream play request to the server to load the live video stream to be played, explicitly informing the server of the configured live delay duration, and then plays the requested live video stream data through a player so that the user can watch it. Compared with a scheme in which the client must first request an I-frame index list from the server, query the list for the I frame closest to the current play time node, and then request the server to start playing from that I frame, the scheme of fig. 6 has no such extra request process and does not need to start playing from the found I frame, so the playing delay of the live video can be greatly reduced.
In one embodiment, the server may adjust the live delay duration configured by the terminal according to current network conditions. When the network condition is good, the server receives the video frames pushed by the anchor promptly and feeds them back to the terminal promptly as well, so the server can appropriately reduce the live delay duration. The picture the terminal plays after decoding the latest received video frames is then closer to the anchor's live picture, achieving a low-delay live broadcast effect.
In one embodiment, the live video stream play request sent by the terminal to the server may not carry a live delay duration, in which case the live delay duration is set by the server by default. In one embodiment, the server may also dynamically set the live delay duration with reference to the geographical location information of the terminal that sent the request, the terminal's network condition information, information about the application used to watch the live video, user account information, and the like.
In this embodiment, the scheme may be applied to playing a live video stream. When a terminal requests a live video stream from the server, the start-play time node may be set earlier than the current time node by the live delay duration in order to reduce the stuttering that occurs when the terminal player decodes the fed-back video frames.
In one embodiment, the media data comprises media segments, each media segment comprising a plurality of frames, and the start-play frame is a frame in the start-play media segment. Step S210, feeding back, in response to the media data request, the constructed key frame and the frames after the start-play frame in the media data, specifically includes: inserting the constructed key frame into the start-play media segment; and feeding back, in response to the media data request, the start-play media segment with the key frame inserted and the media segments following it in the media data.
A media segment is a piece of the media data containing multiple frames. A media segment may also serve as the unit of media data transmission.
In one embodiment, after acquiring the media data, the server groups consecutive frames, in their order in the media data, into media segments, thereby dividing the media data into a plurality of media segments. The number of frames in each media segment may be the same or different. The server may also record the correspondence between each media segment and the frames it contains. It can be understood that a media segment may include key frames and/or predicted frames, and that a media segment may also contain no key frame at all.
In one embodiment, the server may generate a key frame index table, and by querying the key frame index table, it may be determined whether a media segment corresponding to each segment identifier includes a key frame.
In one embodiment, the server may generate a corresponding segment identifier for each media segment in the media data. The segment identifier may be generated from the timestamps of the frames contained in the media segment. For example, if the timestamps of the frames in media segment A are 801, 802, 803, 804, and so on, the segment identifier of segment A may be 8; if the timestamps of the frames in media segment B are 901, 902, 903, 904, and so on, the segment identifier of segment B may be 9.
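Under the timestamp scheme of this example (the hundreds digit acts as the segment identifier), a segment identifier could be derived as in this hypothetical sketch:

```python
def segment_identifier(frame_timestamps: list) -> int:
    """Segment identifier from the frames' timestamps; in this example
    scheme every frame of a segment shares the same hundreds digit."""
    return frame_timestamps[0] // 100

# Segment A (timestamps 801, 802, ...) -> identifier 8;
# segment B (timestamps 901, 902, ...) -> identifier 9.
```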
The start-playing media segment is a media segment to which a start-playing frame corresponding to the received media data request belongs in the media data.
Specifically, the server may insert the constructed key frame into the start-play media segment so that a key frame exists in it, and then feed back to the terminal the start-play media segment containing the constructed key frame and the media segments following it in the media data.
In one embodiment, the server may insert the constructed key frame at the first position of the start-play media segment. When the terminal receives the start-play media segment, it can directly decode its first frame; since that first frame is a key frame, the terminal can decode it independently, without referring to other frames, to recover all the information of the complete original data, so the "screen splash" of the first picture played by the player can be avoided.
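The insertion step itself is simple; a minimal sketch with frames represented abstractly (all names illustrative):

```python
def insert_key_frame(start_segment_frames: list, constructed_key_frame) -> list:
    """Place the constructed key frame at the first position of the
    start-play media segment, so the terminal's first decoded frame
    is independently decodable."""
    return [constructed_key_frame] + start_segment_frames

segment = ["P", "B", "B", "P", "B", "B"]    # a segment with no key frame
patched = insert_key_frame(segment, "I*")   # "I*" is the constructed key frame
```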
Fig. 7 is a schematic diagram of one embodiment in which media data is divided into a plurality of media segments. The media data shown in fig. 7 is a live video stream including I frames, P frames, and B frames. Referring to fig. 7, a succession of media segments is shown, each comprising 6 video frames; the first frame of media segment 6 is an I frame, while media segments 7 and 8 contain no I frames, only P frames and B frames. It can be understood that the server feeds back media segments 6, 7, 8, … to the terminal in the order of their segment identifiers, and the terminal decodes and plays them in the order received; when decoding each predicted frame in media segments 7 and 8, the terminal needs to refer to the I frame in media segment 6.
In this embodiment, the server divides the media data into a plurality of media segments and feeds them back to the terminal with the media segment as the transmission unit. To enable the terminal to decode and play normally upon receiving the start-play media segment, the constructed key frame is inserted into that segment; since the start-play media segment then contains all the information of the complete original data, the terminal can decode it and start playing.
In one embodiment, step S206, searching the media data, when the start-play frame is not a key frame, for the key frame before and closest to the start-play frame and the predicted frames between that key frame and the start-play frame, specifically includes: when the start-play media segment does not include a key frame, searching the media data for the nearest preceding media segment that includes a key frame; finding the key frame in the found media segment; and acquiring the predicted frames between the found key frame and the start-play media segment.
Specifically, after determining the start-play media segment to which the start-play frame corresponding to the media data request belongs, the server queries whether the start-play media segment includes a key frame. When it does not, all frames in the start-play media segment are predicted frames; the server can then search the media data for the nearest preceding media segment that includes a key frame, find the key frame in that segment, and acquire the predicted frames between the found key frame and the start-play media segment.
It can be understood that when the start-play media segment does not include a key frame, the frames in the start-play media segment and the found predicted frames are all generated with reference to the found key frame. To guarantee the timeliness of data transmission, the number of frames contained in a media segment may be small; to guarantee the compression rate, the number of predicted frames between two key frames in the media data may be large, so key frames occur infrequently. In this case, if a media segment includes a key frame at all, it usually includes only one.
In one embodiment, the server may determine whether the start-play media segment includes a key frame by querying a key frame index list. Further, after determining that the start-play media segment does not include a key frame, the server may, in the reverse of the time order of the segment identifiers, check segment by segment backward from the identifier of the start-play media segment whether each identified media segment includes a key frame, until a media segment including a key frame is found; it then determines the key frame in that segment and acquires the predicted frames between that key frame and the start-play media segment.
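The backward search over the key frame index might look like this sketch (the shape of the index table, a mapping from segment identifier to a has-key-frame flag, is an assumption for illustration):

```python
from typing import Optional

def nearest_keyframe_segment(start_id: int, keyframe_index: dict) -> Optional[int]:
    """Walk backward from the start-play segment's identifier to the
    closest segment whose index entry says it contains a key frame."""
    for seg_id in range(start_id, -1, -1):
        if keyframe_index.get(seg_id):
            return seg_id
    return None

# As in fig. 7/8: only segment 6 contains an I frame.
index = {6: True, 7: False, 8: False}
```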
In one embodiment, the server may construct the key frame from the key frame and predicted frames found in the nearest such media segment. Because each frame in the start-play media segment carries only difference information relative to the found key frame and the found predicted frames, decoding each frame of the start-play media segment must refer to them. The constructed key frame is built from the found key frame and the predicted frames between it and the start-play media segment, so the original-data information that the constructed key frame can express is very close to the information expressed by decoding the first frame of the start-play media segment.
In this embodiment, when the start-up media segment does not include a key frame, the server may construct a new key frame according to the found key frame and a predicted frame between the key frame and the start-up media segment, and the constructed key frame is inserted into the start-up media segment, so that the media segment includes a frame capable of expressing complete information of the original data.
In one embodiment, step S204, determining the start-play frame requested by the media data request in the media data, specifically includes: determining a start-play time node according to the media data request; and searching the media data for the start-play media segment matching the start-play time node.
It can be understood that each frame in the media data has a corresponding start-play time node. After the server divides the frames into a plurality of media segments, each media segment corresponds to a range of start-play time nodes; when any start-play time node within that range is reached, the media segment to which the corresponding start-play frame belongs is the start-play media segment and can be output for decoding and playing.
In one embodiment, the server acquires media data, divides it into a plurality of media segments, records the timestamp of each frame in each media segment, stores the correspondence between each frame's timestamp and its start-play time node, and stores the correspondence between each media segment's identifier and the timestamps of its frames.
For example, suppose that in the media data, the timestamps of the frames in the media segment identified by segment identifier 7 are 701, 702, 703, 704, 705 and 706, and the corresponding start-play time nodes are the 7th, 7+1/6th, 7+2/6th, 7+3/6th, 7+4/6th and 7+5/6th seconds, so the start-play time node range of segment 7 is 7 to 8 seconds; the timestamps of the frames in the media segment identified by segment identifier 8 are 801, 802, 803, 804, 805 and 806, with start-play time nodes at the 8th, 8+1/6th, 8+2/6th, 8+3/6th, 8+4/6th and 8+5/6th seconds, so the start-play time node range of segment 8 is 8 to 9 seconds. When the server determines from the media data request that the start-play time node is 7+3/6 seconds, it determines that the corresponding timestamp is 704; since the frame identified by timestamp 704 belongs to the media segment identified by segment identifier 7, media segment 7 can be used as the start-play media segment.
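The lookup chain in this example (start-play time node → frame timestamp → segment identifier) can be sketched as follows, hard-coding the 6-frames-per-second, hundreds-digit numbering used above (assumptions of this example only, not of the method in general):

```python
def timestamp_for_time_node(seconds: float, fps: int = 6) -> int:
    """Map a start-play time node to the frame timestamp in this
    example's scheme: second s, offset k/6 -> timestamp s*100 + 1 + k."""
    whole = int(seconds)
    k = round((seconds - whole) * fps)
    return whole * 100 + 1 + k

ts = timestamp_for_time_node(7 + 3/6)   # timestamp 704
segment_id = ts // 100                  # frame 704 belongs to segment 7
```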
Fig. 8 is a framework diagram of one embodiment in which a client requests a server to play a live video stream. Referring to fig. 8, the live video stream data is divided into a plurality of video segments. The client sends a live video stream play request to the server, and the server determines the start-play video segment according to the request; as shown in fig. 8, the segment identifier of the start-play video segment is 8. After determining that the start-play video segment includes no I frame, the server finds that the nearest preceding video segment containing an I frame is video segment 6, whose I frame is the first frame of that segment. The server constructs a key frame from that I frame and each predicted frame between it and start-play video segment 8 (the part of the live video stream enclosed by the shaded box in the figure), inserts the constructed key frame into video segment 8, and sequentially feeds back to the client the start-play video segment and the video segments after it in the live video stream, such as segments 9 and 10. The client decodes and plays the received video segments 8, 9, 10 and so on in sequence through the player.
In this embodiment, the server may determine the start-playing time node according to the media data playing request sent by the terminal, and further determine the start-playing media segment according to the start-playing time node, so that the media data may be sequentially fed back to the terminal from the start-playing media segment, and the delay of the fed-back media data may be reduced.
In one embodiment, the media data comprises frame groups; each frame group comprises a plurality of frames, and the first frame of each frame group is a key frame. Step S206, searching the media data, when the start-play frame is not a key frame, for the key frame before and closest to the start-play frame and the predicted frames between that key frame and the start-play frame, specifically includes: determining the start-play frame group in which the start-play frame is located; and, when the start-play frame is not the first frame of the start-play frame group, acquiring the key frame that is the first frame of the start-play frame group and the predicted frames between that first frame and the start-play frame.
A frame group (also called a picture group, or group of pictures) is a sequence formed by a set of consecutive frames in the media data. The first frame of a frame group is a key frame, and the remaining frames are predicted frames, generated with reference to the first frame and carrying difference information relative to it. The differences between the original-data information expressed by the decoded frames within one frame group are not large. It can be understood that a frame group may comprise a plurality of media segments.
In one embodiment, the server may record information about each frame included in each frame group, for example the timestamp of each frame. Specifically, the server may determine, according to the timestamp of the start-play frame, the frame group in which it is located and treat it as the start-play frame group; when the start-play frame is not the first frame of that group, the server acquires the key frame that is the group's first frame and the predicted frames between the first frame and the start-play frame.
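A minimal sketch of this group lookup, representing each frame group as an ordered list of frame timestamps whose first entry is the key frame (an assumed layout, for illustration only):

```python
def frames_for_construction(frame_groups: list, start_ts: int):
    """Find the group containing the start-play frame; return the group's
    key frame (first entry) and the predicted frames up to the start frame."""
    for group in frame_groups:
        if start_ts in group:
            idx = group.index(start_ts)
            return group[0], group[1:idx]
    return None

groups = [[600, 601, 602, 603], [604, 605, 606, 607]]
key_ts, predicted_ts = frames_for_construction(groups, 603)
```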
In one embodiment, the server may construct a key frame corresponding to the start frame from the key frames in the acquired start frame group and the predicted frame between the first frame and the start frame. It can be understood that the constructed key frame refers to the key frame in the frame group, the predicted frame between the first frame and the start frame, so that the information of the original data that can be expressed by the constructed key frame is very close to the information of the original data that can be expressed when the start frame is decoded.
In one embodiment, the server may feed back frames following the start frame in the media data to the terminal in sequence from the constructed key frame, so that the terminal decodes and plays the media data in sequence after receiving the frames, and delay of the fed back media data can be reduced.
In this embodiment, the media data is composed of frame groups. When the determined start-play frame is not a key frame, the server may determine the frame group in which the start-play frame is located, and then acquire the first frame of that group and the predicted frames between the first frame and the start-play frame, so that a key frame can be constructed from the acquired first frame and predicted frames.
In one embodiment, step S208, constructing a key frame according to the found key frame and predicted frames, specifically includes: determining the frame reference relationships between the found key frame and each of the predicted frames; and, based on the found key frame, performing frame restoration according to the determined frame reference relationships to obtain the constructed key frame.
The frame reference relationship is a reference relationship between frames in the media data.
In one embodiment, the found key frame is an I frame, and the found predicted frame is a B frame or a P frame, and the frame reference relationship includes a reference relationship between a P frame and a previous I frame, a reference relationship between a P frame and a previous P frame, a reference relationship between a B frame and a P frame, a reference relationship between a B frame and an I frame, and the like.
In an embodiment, the server may perform frame decoding on the broadcast start frame corresponding to the media data request according to the found frame reference relationship between the key frame and the predicted frame to obtain original image data that can be completely expressed by the broadcast start frame, and then encode the original image data obtained by decoding into the key frame to obtain the constructed key frame. It will be appreciated that the information carried by the constructed key frame is very close to the information carried by the original image that resulted when the start frame was decoded.
The process of encoding the original data to obtain the media data is exemplified as follows. The encoder encodes the original data to obtain an I frame, then skips forward n frames and encodes the (n+1)th frame into a P frame using the encoded I frame as the reference frame; it then jumps back to the frame after the I frame and, based on the obtained I frame and P frame, encodes the skipped n frames into B frames. The encoder then skips another n frames, encodes the next P frame using the first P frame as the reference frame, and jumps back again to fill the frames between the first and second P frames with B frames. Each B frame carries the difference information between itself and the I frame and/or P frames it references.
Taking n = 2 as an example, the obtained media data is I, B, B, P, B, B, P, B, B, I, B, B, P, …. When decoding, since the 0th frame carries complete information, it can be decoded first, and the 3rd frame is then calculated from the frame reference relationship between the 0th and 3rd frames; the 1st and 2nd frames are then calculated from their frame reference relationships, and so on. That is, the frames of the media data are decoded in the order of the 0th, 3rd, 1st, 2nd, 6th, 4th, 5th, 9th, 7th and 8th frames. Accordingly, when the start-play frame corresponding to the media data request is the 3rd frame, frame restoration is performed according to the frame reference relationships between the found key frame and each of the predicted frames to obtain the original data of the 3rd frame, which is then I-frame encoded to construct the key frame. Similarly, when the start-play frame is the 5th frame, frame restoration is performed according to the frame reference relationships between the preceding key frame and each of the predicted frames to obtain the original data of the 5th frame, which is then I-frame encoded to construct the key frame.
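The decode order in this example follows from the reference relationships: each P frame (or the next I frame) must be decoded before the B frames that reference it. A sketch that reproduces the order stated above:

```python
def decode_order(display_order: list) -> list:
    """Decode-order indices for a display-order list of 'I'/'P'/'B' types:
    each reference frame first, then the B frames that depend on it."""
    order, waiting_b = [], []
    for i, frame_type in enumerate(display_order):
        if frame_type == "B":
            waiting_b.append(i)    # B frames wait for their next reference frame
        else:                      # an I or P frame: decode it, then the waiting Bs
            order.append(i)
            order.extend(waiting_b)
            waiting_b = []
    return order

# I B B P B B P B B I  ->  0 3 1 2 6 4 5 9 7 8, as in the text above.
seq = ["I", "B", "B", "P", "B", "B", "P", "B", "B", "I"]
```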
In this embodiment, according to the frame reference relationship between the searched key frame and each of the predicted frames, a new key frame may be constructed according to the searched key frame and the predicted frame, and the information carried by the constructed key frame is close to the complete information of the original data expressed when the play-starting frame is decoded.
As shown in fig. 9, in a specific embodiment, the media data processing method specifically includes the following steps:
S902, receiving a media data request.
S904, the media data specified by the media data request is determined.
S906, the media data request is a live video stream playing request; the media data is live video stream; and determining the live broadcast delay time according to the media data request.
And S908, acquiring the current time node.
S910, determining a broadcast starting time node according to the current time node and the live broadcast delay time, wherein the broadcast starting time node is earlier than the current time node by the live broadcast delay time.
S912a, determining the frame in the media data that matches the start-play time node as the start-play frame.
S914a, when the start frame is not a key frame, searching the media data for a key frame that is closest to the start frame before the start frame and a predicted frame between the closest key frame and the start frame.
S916a, constructing a key frame according to the found key frame and predicted frame.
S918a, in response to the media data request, feeding back the constructed key frame and a frame following the start frame in the media data.
S912b, find the play-start media segment matching the play-start time node in the media data.
S914b, when the start-playing media segment does not include a key frame, searching the media data for a media segment that is the closest before the start-playing media segment and that includes a key frame.
S916b, finding key frames from the searched media segments; acquiring a predicted frame between the searched key frame and the start-playing media fragment; and constructing the key frame according to the searched key frame and the predicted frame.
S918b, inserting the constructed key frame into the start-playing media segment; and responding to the media data request, and feeding back the start-playing media segment inserted with the key frame and the media segments following the start-playing media segment in the media data.
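The segment-based branch (S912b–S918b) can be sketched end to end. This is an illustrative model rather than the patented implementation: segments are assumed to be dicts with a `start` time and an ordered `frames` list, each frame is a dict with a `type` of `"I"` or `"P"` and an `id`, and the constructed key frame merely records which frames it was built from.

```python
def has_key_frame(segment):
    return any(f["type"] == "I" for f in segment["frames"])

def feed_back(segments, start_time):
    """Return the segments fed back for a broadcast starting time node."""
    # S912b: the start-playing segment is the last one starting at or
    # before the broadcast starting time node
    idx = max(i for i, s in enumerate(segments) if s["start"] <= start_time)
    start_seg = segments[idx]
    if not has_key_frame(start_seg):
        # S914b: nearest earlier segment that contains a key frame
        src = max(i for i in range(idx) if has_key_frame(segments[i]))
        # S916b: the found key frame plus the predicted frames between it
        # and the start-playing segment
        ref = [f for s in segments[src:idx] for f in s["frames"]]
        key_pos = max(i for i, f in enumerate(ref) if f["type"] == "I")
        built = {"type": "I", "from": [f["id"] for f in ref[key_pos:]]}
        # S918b: insert the constructed key frame at the segment head
        start_seg = {"start": start_seg["start"],
                     "frames": [built] + start_seg["frames"]}
    # feed back the start-playing segment and the segments after it
    return [start_seg] + segments[idx + 1:]

segs = [{"start": 0, "frames": [{"type": "I", "id": "I0"},
                                {"type": "P", "id": "P1"},
                                {"type": "P", "id": "P2"}]},
        {"start": 3, "frames": [{"type": "P", "id": "P3"},
                                {"type": "P", "id": "P4"}]},
        {"start": 5, "frames": [{"type": "P", "id": "P5"}]}]
out = feed_back(segs, 3)
print(out[0]["frames"][0])   # → {'type': 'I', 'from': ['I0', 'P1', 'P2']}
```

Note that nothing before the start-playing segment is fed back; the constructed key frame stands in for the skipped prefix, which is the source of the delay reduction claimed above.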
According to the media data processing method above, when the start frame requested by the media data request is not a key frame, a key frame that precedes and is closest to the start frame is found in the media data, and a key frame is constructed from the found key frame and the predicted frames between it and the start frame, so that the fed-back data contains a key frame. When responding to the media data request, the constructed key frame and the frames following the start frame in the media data can then be fed back directly, without feeding back the media data sequentially from the found key frame before the start frame, which reduces the delay of the fed-back media data.
FIG. 9 is a flowchart illustrating a method for processing media data according to one embodiment. It should be understood that, although the steps in the flowchart of fig. 9 are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated otherwise herein, the execution of these steps is not strictly limited in order, and the steps may be performed in other orders. Moreover, at least some of the steps in fig. 9 may include multiple sub-steps or stages that are not necessarily performed at the same moment but may be performed at different moments, and these sub-steps or stages are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
As shown in fig. 10, in one embodiment, a media data playing method is provided. The embodiment is mainly illustrated by applying the method to the terminal 110 in fig. 1. Referring to fig. 10, the media data playing method specifically includes the following steps:
S1002, sending a media data request.
As before, the media data request is an instruction for requesting media data. In one embodiment, the terminal sends a media data request to the server through the network, so that the server determines media data requested by the terminal according to the media data request and feeds back frames in the requested media data in response to the media data request.
In one embodiment, the media data request may be triggered by the user clicking on an application on the terminal or logging into an associated website that provides a platform for the user to view the media. It will be appreciated that the server may receive media data requests triggered by different users through the terminal at different times.
S1004, receiving frames fed back in response to the media data request. The received frames include a key frame and the frames following the start frame in the media data; the fed-back key frame is constructed from the predicted frames and the key frame in the media data that precedes and is closest to the start frame; the predicted frames are those between that closest key frame and the start frame; the start frame is specified by the media data request.
Specifically, the terminal may receive frames fed back by the server in response to the media data request, where the fed-back key frame is constructed by the server from the predicted frames and the key frame in the media data that precedes and is closest to the start frame. The terminal therefore neither needs to request feedback starting from the found key frame, nor needs to start decoding and playing from that found key frame; it starts decoding and playing from the constructed key frame instead, which greatly reduces the delay of media data playback.
S1006, decoding and playing the received frames, starting from the key frame among the received frames.
In one embodiment, the step of receiving the frame fed back in response to the media data request specifically includes: receiving media fragments fed back in response to the media data request; each media fragment comprises a plurality of frames; the media segments comprise start-playing media segments; the start-playing media segment includes a key frame; the step of decoding and playing the received frame from the key frame in the received frame specifically includes: and decoding and playing the frames in the received media fragments one by one from the key frame in the received start playing media fragment.
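The terminal-side behavior can be sketched with the same illustrative frame model used above (dicts with an assumed `type`, `blocks`, and `deltas` layout; not the patented implementation): decoding begins at the key frame inside the received start-playing segment, and every frame before it is simply never decoded.

```python
def play(segments):
    """Decode and play frames one by one, starting from the key frame in
    the received start-playing media segment."""
    frames = [f for segment in segments for f in segment]
    first_key = next(i for i, f in enumerate(frames) if f["type"] == "I")
    picture = {}
    for f in frames[first_key:]:             # earlier frames are skipped
        if f["type"] == "I":
            picture = dict(f["blocks"])      # key frame: full picture
        else:
            picture.update(f["deltas"])      # predicted frame: apply deltas
        yield dict(picture)                  # "render" the decoded picture

segs = [[{"type": "P", "deltas": {"b": 0}},   # anything before the key
         {"type": "I", "blocks": {"b": 1}},   # frame is never decoded
         {"type": "P", "deltas": {"b": 2}}],
        [{"type": "P", "deltas": {"b": 3}}]]
print(list(play(segs)))   # → [{'b': 1}, {'b': 2}, {'b': 3}]
```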
According to the media data playing method above, when the start frame requested by the media data request is not a key frame, the fed-back key frame that is received is constructed from the key frame found in the media data that precedes and is closest to the start frame, together with the predicted frames between that key frame and the start frame. Decoding and playing can therefore proceed directly from the constructed key frame and the frames following the start frame in the media data, without playing the media data sequentially from the found key frame before the start frame, which reduces the delay of the played media data.
As shown in fig. 11, in one embodiment, there is provided a media data processing apparatus 1100, the apparatus comprising: a receiving module 1102, a determining module 1104, a finding module 1106, a constructing module 1108, and a responding module 1110, wherein:
a receiving module 1102, configured to receive a media data request.
A determining module 1104 for determining an originating frame requested by the media data request in the media data.
A searching module 1106, configured to search, when the start frame is not a key frame, a key frame that is closest to the start frame before the start frame and a prediction frame between the closest key frame and the start frame from the media data.
A constructing module 1108, configured to construct the key frame according to the found key frame and the predicted frame.
A response module 1110, configured to feed back the constructed key frame and a frame following the start frame in the media data in response to the media data request.
In one embodiment, as shown in fig. 12, the determining module 1104 specifically includes: a media data determination module 1202, an on-air time node determination module 1204, and an on-air frame determination module 1206, wherein the media data determination module 1202 is configured to determine media data specified by the media data request; the play-out time node determining module 1204 is configured to determine a play-out time node according to the media data request; the play-out frame determining module 1206 is configured to determine a frame in the media data that matches the play-out time node as a play-out frame.
In one embodiment, the media data request is a live video stream play request; the media data is live video stream; the play-out time node determining module 1204 is further configured to determine a live broadcast delay duration according to the media data request; acquiring a current time node; and determining a broadcast starting time node according to the current time node and the live broadcast delay time, wherein the broadcast starting time node is earlier than the current time node by the live broadcast delay time.
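For instance, with a hypothetical clock measured in seconds (the disclosure does not fix the units), the broadcast starting time node is simply the current time node minus the live broadcast delay duration:

```python
import time

def play_start_time_node(live_delay_seconds, now=None):
    """Return the broadcast starting time node, which is earlier than the
    current time node by the live broadcast delay duration."""
    if now is None:
        now = time.time()                    # acquire the current time node
    return now - live_delay_seconds

print(play_start_time_node(3.0, now=100.0))  # → 97.0
```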
In one embodiment, the media data comprises media segments; the media fragment comprises a plurality of frames; the start-up frame is a frame in the start-up media segment; the response module 1110 is further configured to insert the constructed key frame into the play-start media segment; and responding to the media data request, and feeding back the start-playing media segment inserted with the key frame and the media segments following the start-playing media segment in the media data.
In one embodiment, the construction module 1108 is further configured to search the media data for a media segment that is closest to the start-playing media segment and that includes a key frame when the start-playing media segment does not include a key frame; searching key frames from the searched media fragments; and acquiring the predicted frame between the searched key frame and the start-playing media fragment.
In one embodiment, the determining module 1104 is further configured to determine an on-air time node from the media data request; and searching the play-starting media fragment matched with the play-starting time node in the media data.
In one embodiment, the media data includes a group of frames; each frame group comprises a plurality of frames, and the first frame of each frame group is a key frame; the building module 1108 is further configured to determine an originating frame group where an originating frame is located; and when the starting frame is not the first frame in the starting frame group, acquiring a key frame serving as the first frame in the starting frame group and a prediction frame between the first frame and the starting frame.
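Under the frame-group arrangement above — and assuming, purely for illustration, a flat frame list with a fixed group size — locating the materials for construction reduces to index arithmetic:

```python
def group_lookup(frames, start_index, group_size):
    """For the group containing the start frame, return the key frame
    (the group's first frame) and the predicted frames between it and
    the start frame; return None when the start frame is itself the
    key frame and no construction is needed."""
    group_first = (start_index // group_size) * group_size
    if start_index == group_first:
        return None                           # start frame is a key frame
    key_frame = frames[group_first]
    predicted = frames[group_first + 1 : start_index]
    return key_frame, predicted

frames = ["I0", "P1", "P2", "P3", "I4", "P5", "P6", "P7"]
print(group_lookup(frames, 6, 4))   # → ('I4', ['P5'])
print(group_lookup(frames, 4, 4))   # → None
```

Real streams need not use a fixed group size; in that case the same lookup is a backward scan for the most recent key frame, as in the segment-based embodiment.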
In one embodiment, the building module 1108 is further configured to build a key frame from the found key frame, the predicted frame, and the start-up frame.
In one embodiment, the building module 1108 is further configured to determine a frame reference relationship between the found key frame and each of the predicted frames; and based on the searched key frame, restoring the key frame according to the determined frame reference relation to obtain the constructed key frame.
In the media data processing apparatus 1100, when the start frame requested by the media data request is not a key frame, a key frame that precedes and is closest to the start frame is found in the media data, and a key frame is constructed from the found key frame and the predicted frames between it and the start frame, so that the fed-back data contains a key frame. When responding to the media data request, the constructed key frame and the frames following the start frame in the media data can then be fed back directly, without feeding back the media data sequentially from the found key frame before the start frame, thereby reducing the delay of the fed-back media data.
As shown in fig. 13, in an embodiment, there is provided a media data playing apparatus 1300, which specifically includes: a sending module 1302, a receiving module 1304, and a playing module 1306, wherein:
a sending module 1302, configured to send a media data request.
A receiving module 1304, configured to receive frames fed back in response to the media data request. The received frames include a key frame and the frames following the start frame in the media data; the fed-back key frame is constructed from the predicted frames and the key frame in the media data that precedes and is closest to the start frame; the predicted frames are those between that closest key frame and the start frame; the start frame is specified by the media data request.
A playing module 1306, configured to decode and play the received frames from the key frames in the received frames.
In one embodiment, the receiving module 1304 is further configured to receive a media fragment fed back in response to the media data request; each media fragment comprises a plurality of frames; the media segments comprise start-playing media segments; the start-playing media segment includes a key frame; the playing module 1306 is further configured to decode and play frames in the received media segments one by one from the key frame in the received start playing media segment.
In the media data playing apparatus 1300, when the start frame requested by the media data request is not a key frame, the received fed-back key frame is constructed from the key frame found in the media data that precedes and is closest to the start frame, together with the predicted frames between that key frame and the start frame. Decoding and playing can therefore proceed directly from the constructed key frame and the frames following the start frame in the media data, without playing the media data sequentially from the found key frame before the start frame, thereby reducing the delay of the played media data.
FIG. 14 is a diagram illustrating an internal structure of a computer device in one embodiment. The computer device may specifically be the server 120 in fig. 1. As shown in fig. 14, the computer device includes a processor, a memory, and a network interface connected by a system bus. The memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium of the computer device stores an operating system and may also store a computer program that, when executed by the processor, causes the processor to implement the media data processing method. The internal memory may also store a computer program that, when executed by the processor, causes the processor to perform the media data processing method.
Those skilled in the art will appreciate that the architecture shown in fig. 14 is merely a block diagram of part of the structure related to the present disclosure and does not limit the computer device to which the present disclosure is applied; a particular computer device may include more or fewer components than those shown, combine certain components, or have a different arrangement of components.
In one embodiment, the media data processing apparatus 1100 provided herein may be implemented in the form of a computer program that is executable on a computer device such as that shown in fig. 14. The memory of the computer device may store various program modules that make up the media data processing apparatus, such as the receiving module 1102, the determining module 1104, the finding module 1106, the constructing module 1108, and the responding module 1110 shown in FIG. 11. The computer program constituted by the respective program modules causes the processor to execute the steps in the media data processing method of the respective embodiments of the present application described in the present specification.
For example, the computer device shown in fig. 14 may perform step S202 by the receiving module 1102 in the media data processing apparatus shown in fig. 11. The computer device may perform step S204 by the determination module 1104. The computer device may perform step S206 by the lookup module 1106. The computer device may perform step S208 by the building module 1108. The computer device may perform step S210 through the response module 1110.
In one embodiment, there is provided a computer device comprising a memory and a processor, the memory storing a computer program that, when executed by the processor, causes the processor to perform the steps of: receiving a media data request; determining an initiation frame requested by a media data request in the media data; when the start-up frame is not a key frame, searching a key frame which is before the start-up frame and is closest to the start-up frame and a prediction frame between the closest key frame and the start-up frame from the media data; constructing a key frame according to the searched key frame and the predicted frame; and feeding back the constructed key frame and a frame following the start playing frame in the media data in response to the media data request.
In one embodiment, the computer program, when executed by the processor, causes the processor to perform the steps of determining in the media data an originating frame requested by the media data request, in particular: determining media data specified by the media data request; determining a play-out time node according to the media data request; and determining the frame matched with the play-out time node in the media data as the play-out frame.
In one embodiment, the media data request is a live video stream play request; the media data is live video stream; the computer program, when executed by the processor, causes the processor to perform the steps of determining a start-of-play time node based on the media data request, in particular: determining a live broadcast delay time length according to the media data request; acquiring a current time node; and determining a broadcast starting time node according to the current time node and the live broadcast delay time, wherein the broadcast starting time node is earlier than the current time node by the live broadcast delay time.
In one embodiment, the media data comprises media segments; the media fragment comprises a plurality of frames; the start-up frame is a frame in the start-up media segment; when the computer program is executed by the processor in response to a media data request, the step of feeding back the constructed key frame and the frame following the start frame in the media data causes the processor to specifically execute the steps of: inserting the constructed key frame into the playing start media fragment; and responding to the media data request, and feeding back the start-playing media segment inserted with the key frame and the media segments following the start-playing media segment in the media data.
In one embodiment, the computer program, when executed by the processor, causes the processor to perform the steps of, when the start-up frame is not a key frame, searching for a key frame that is closest to the start-up frame and a predicted frame between the closest key frame and the start-up frame before the start-up frame from the media data, specifically: when the start-playing media segment does not comprise the key frame, searching the media segment which is nearest to the start-playing media segment and comprises the key frame from the media data; searching key frames from the searched media fragments; and acquiring the predicted frame between the searched key frame and the start-playing media fragment.
In one embodiment, the computer program, when executed by the processor, causes the processor to perform the steps of determining in the media data an originating frame requested by the media data request, in particular: determining a play-out time node according to the media data request; and searching the play-starting media fragment matched with the play-starting time node in the media data.
In one embodiment, the media data includes a group of frames; each frame group comprises a plurality of frames, and the first frame of each frame group is a key frame; the computer program is executed by the processor for causing the processor to perform the following steps in particular when the step of finding from the media data a key frame that is closest to the start frame before the start frame and a prediction frame between the closest key frame and the start frame is performed when the start frame is not a key frame: determining a broadcast starting frame group where a broadcast starting frame is located; and when the starting frame is not the first frame in the starting frame group, acquiring a key frame serving as the first frame in the starting frame group and a prediction frame between the first frame and the starting frame.
In one embodiment, the computer program, when executed by the processor, causes the processor to perform the steps of constructing a key frame from the located key frame and predicted frame, in particular: and constructing the key frame according to the searched key frame, the predicted frame and the broadcast starting frame.
In one embodiment, the computer program, when executed by the processor, causes the processor to perform the steps of constructing a key frame from the located key frame and predicted frame, in particular: determining the frame reference relationship between the searched key frame and each frame in the predicted frame; and based on the searched key frame, restoring the key frame according to the determined frame reference relation to obtain the constructed key frame.
With the computer device above, when the start frame requested by the media data request is not a key frame, a key frame that precedes and is closest to the start frame is found in the media data, and a key frame is constructed from the found key frame and the predicted frames between it and the start frame, so that the fed-back data contains a key frame. When responding to the media data request, the constructed key frame and the frames following the start frame in the media data can then be fed back directly, without feeding back the media data sequentially from the found key frame before the start frame, thereby reducing the delay of the fed-back media data.
In one embodiment, a computer readable storage medium is provided, having a computer program stored thereon, which, when executed by a processor, causes the processor to perform the steps of: receiving a media data request; determining an initiation frame requested by a media data request in the media data; when the start-up frame is not a key frame, searching a key frame which is before the start-up frame and is closest to the start-up frame and a prediction frame between the closest key frame and the start-up frame from the media data; constructing a key frame according to the searched key frame and the predicted frame; and feeding back the constructed key frame and a frame following the start playing frame in the media data in response to the media data request.
In one embodiment, the computer program, when executed by the processor, causes the processor to perform the steps of determining in the media data an originating frame requested by the media data request, in particular: determining media data specified by the media data request; determining a play-out time node according to the media data request; and determining the frame matched with the play-out time node in the media data as the play-out frame.
In one embodiment, the media data request is a live video stream play request; the media data is live video stream; the computer program, when executed by the processor, causes the processor to perform the steps of determining a start-of-play time node based on the media data request, in particular: determining a live broadcast delay time length according to the media data request; acquiring a current time node; and determining a broadcast starting time node according to the current time node and the live broadcast delay time, wherein the broadcast starting time node is earlier than the current time node by the live broadcast delay time.
In one embodiment, the media data comprises media segments; the media fragment comprises a plurality of frames; the start-up frame is a frame in the start-up media segment; when the computer program is executed by the processor in response to a media data request, the step of feeding back the constructed key frame and the frame following the start frame in the media data causes the processor to specifically execute the steps of: inserting the constructed key frame into the playing start media fragment; and responding to the media data request, and feeding back the start-playing media segment inserted with the key frame and the media segments following the start-playing media segment in the media data.
In one embodiment, the computer program, when executed by the processor, causes the processor to perform the steps of, when the start-up frame is not a key frame, searching for a key frame that is closest to the start-up frame and a predicted frame between the closest key frame and the start-up frame before the start-up frame from the media data, specifically: when the start-playing media segment does not comprise the key frame, searching the media segment which is nearest to the start-playing media segment and comprises the key frame from the media data; searching key frames from the searched media fragments; and acquiring the predicted frame between the searched key frame and the start-playing media fragment.
In one embodiment, the computer program, when executed by the processor, causes the processor to perform the steps of determining in the media data an originating frame requested by the media data request, in particular: determining a play-out time node according to the media data request; and searching the play-starting media fragment matched with the play-starting time node in the media data.
In one embodiment, the media data includes a group of frames; each frame group comprises a plurality of frames, and the first frame of each frame group is a key frame; the computer program is executed by the processor for causing the processor to perform the following steps in particular when the step of finding from the media data a key frame that is closest to the start frame before the start frame and a prediction frame between the closest key frame and the start frame is performed when the start frame is not a key frame: determining a broadcast starting frame group where a broadcast starting frame is located; and when the starting frame is not the first frame in the starting frame group, acquiring a key frame serving as the first frame in the starting frame group and a prediction frame between the first frame and the starting frame.
In one embodiment, the computer program, when executed by the processor, causes the processor to perform the steps of constructing a key frame from the located key frame and predicted frame, in particular: and constructing the key frame according to the searched key frame, the predicted frame and the broadcast starting frame.
In one embodiment, the computer program, when executed by the processor, causes the processor to perform the steps of constructing a key frame from the located key frame and predicted frame, in particular: determining the frame reference relationship between the searched key frame and each frame in the predicted frame; and based on the searched key frame, restoring the key frame according to the determined frame reference relation to obtain the constructed key frame.
With the computer-readable storage medium above, when the start frame requested by the media data request is not a key frame, a key frame that precedes and is closest to the start frame is found in the media data, and a key frame is constructed from the found key frame and the predicted frames between it and the start frame, so that the fed-back data contains a key frame. When responding to the media data request, the constructed key frame and the frames following the start frame in the media data can then be fed back directly, without feeding back the media data sequentially from the found key frame before the start frame, thereby reducing the delay of the fed-back media data.
FIG. 15 is a diagram showing an internal structure of a computer device in one embodiment. The computer device may specifically be the terminal 110 in fig. 1. As shown in fig. 15, the computer apparatus includes a processor, a memory, a network interface, a display screen, and an input device connected through a system bus. Wherein the memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium of the computer device stores an operating system and may also store a computer program that, when executed by the processor, causes the processor to implement the media data playback method. The internal memory may also store a computer program, and the computer program, when executed by the processor, may cause the processor to execute the media data playback method. The display screen of the computer equipment can be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer equipment can be a touch layer covered on the display screen, a key, a track ball or a touch pad arranged on the shell of the computer equipment, an external keyboard, a touch pad or a mouse and the like.
Those skilled in the art will appreciate that the architecture shown in fig. 15 is merely a block diagram of part of the structure related to the present disclosure and does not limit the computer device to which the present disclosure is applied; a particular computer device may include more or fewer components than those shown, combine certain components, or have a different arrangement of components.
In one embodiment, the media data playing apparatus 1300 provided in the present application can be implemented in a form of a computer program, and the computer program can be run on a computer device as shown in fig. 15. The memory of the computer device may store various program modules constituting the media data playback apparatus, such as the transmission module 1302, the reception module 1304, and the playback module 1306 shown in fig. 13. The computer program constituted by the respective program modules causes the processor to execute the steps in the media data playback method of the respective embodiments of the present application described in the present specification.
For example, the computer device shown in fig. 15 may execute step S1002 by the sending module 1302 in the media data playback apparatus shown in fig. 13. The computer device may perform step S1004 through the receiving module 1304. The computer device may perform step S1006 through the play module 1306.
In one embodiment, there is provided a computer device comprising a memory and a processor, the memory storing a computer program that, when executed by the processor, causes the processor to perform the steps of: sending a media data request; receiving a frame fed back in response to the media data request; the received frames include a key frame and a frame following the start-up frame in the media data; the fed-back key frame is constructed according to the prediction frame and the key frame which is closest to the start-playing frame in the media data before the start-playing frame; the prediction frame is the prediction frame between the nearest key frame and the broadcast starting frame; the initiating frame is specified by a media data request; and decoding and playing the received frames from the key frames in the received frames.
In one embodiment, the computer program, when executed by the processor, causes the processor to perform the steps of: receiving media fragments fed back in response to the media data request; each media fragment comprises a plurality of frames; the media segments comprise start-playing media segments; the start-playing media segment includes a key frame; when the computer program is executed by the processor to perform the step of decoding and playing the received frame from the key frame in the received frame, the processor is specifically caused to perform the following steps: and decoding and playing the frames in the received media fragments one by one from the key frame in the received start playing media fragment.
With the computer device above, when the start frame requested by the media data request is not a key frame, the received fed-back key frame is constructed from the key frame found in the media data that precedes and is closest to the start frame, together with the predicted frames between that key frame and the start frame. Decoding and playing can therefore proceed directly from the constructed key frame and the frames following the start frame in the media data, without playing the media data sequentially from the found key frame before the start frame, which reduces the delay of the played media data.
In one embodiment, a computer-readable storage medium is provided, storing a computer program that, when executed by a processor, causes the processor to perform the steps of: sending a media data request; receiving frames fed back in response to the media data request, the received frames including a key frame and the frames following the start-play frame in the media data, where the fed-back key frame is constructed from the key frame in the media data that precedes and is nearest to the start-play frame, together with the predicted frames between that nearest key frame and the start-play frame, and where the start-play frame is specified by the media data request; and decoding and playing the received frames starting from the key frame among them.
In one embodiment, the computer program, when executed by the processor, further causes the processor to perform the steps of: receiving media segments fed back in response to the media data request, each media segment comprising a plurality of frames, the media segments including a start-play media segment, and the start-play media segment including the key frame. When the computer program is executed by the processor to perform the step of decoding and playing the received frames starting from the key frame, the processor specifically performs the step of: decoding and playing the frames in the received media segments one by one, starting from the key frame in the received start-play media segment.
With this computer-readable storage medium, when the start-play frame requested by the media data request is not a key frame, the key frame received in the feedback is constructed from the key frame found in the media data that precedes and is nearest to the start-play frame, together with the predicted frames between that key frame and the start-play frame. Decoding and playback can therefore proceed directly from the constructed key frame and the frames following the start-play frame in the media data; the media data does not need to be played in sequence from the earlier key frame, so the playback delay is reduced.
It will be understood by those skilled in the art that all or part of the processes of the methods in the embodiments described above can be implemented by a computer program; the program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, a database, or another medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination of these technical features involves no contradiction, it should be considered within the scope of this specification.
The above embodiments express only several implementations of the present application, and their description is relatively specific and detailed, but they should not be construed as limiting the scope of the application. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, and these fall within the scope of protection of the application. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (19)

1. A media data processing method, comprising:
receiving a media data request;
determining, in media data, a start-play frame requested by the media data request;
when the start-play frame is not a key frame, searching the media data for the key frame that precedes and is nearest to the start-play frame, and for the predicted frames between that nearest key frame and the start-play frame;
determining the frame reference relationships among the found key frame and each of the predicted frames, performing, based on the found key frame, frame decoding on the start-play frame according to the determined frame reference relationships to obtain the original image data fully expressed by the start-play frame, and encoding the decoded original image data into a key frame as a constructed key frame; and
in response to the media data request, feeding back the constructed key frame and the frames following the start-play frame in the media data, to indicate that the constructed key frame and the frames following the start-play frame are to be played in sequence.
2. The method of claim 1, wherein determining, in the media data, the start-play frame requested by the media data request comprises:
determining the media data specified by the media data request;
determining a start-play time node according to the media data request; and
determining the frame in the media data that matches the start-play time node as the start-play frame.
3. The method of claim 2, wherein the media data request is a live video stream play request and the media data is a live video stream, and wherein determining the start-play time node according to the media data request comprises:
determining a live broadcast delay according to the media data request;
acquiring a current time node; and
determining the start-play time node according to the current time node and the live broadcast delay, the start-play time node being earlier than the current time node by the live broadcast delay.
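As an illustrative (non-claim) sketch, the start-play time node computation in claim 3 reduces to a subtraction of the live broadcast delay from the current time node; the function name below is hypothetical.

```python
# Sketch of claim 3's start-play time node: the node that is earlier than the
# current time node by the live broadcast delay.
from datetime import datetime, timedelta

def start_play_time_node(current: datetime, live_delay_seconds: float) -> datetime:
    """Return the start-play time node for a live video stream request."""
    return current - timedelta(seconds=live_delay_seconds)

now = datetime(2018, 4, 17, 12, 0, 10)
print(start_play_time_node(now, 10))  # 2018-04-17 12:00:00
```
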
4. The method of claim 1, wherein the media data comprises media segments, each media segment comprising a plurality of frames, and the start-play frame is a frame in a start-play media segment; and
wherein feeding back, in response to the media data request, the constructed key frame and the frames following the start-play frame in the media data comprises:
inserting the constructed key frame into the start-play media segment; and
in response to the media data request, feeding back the start-play media segment with the inserted key frame and the media segments following the start-play media segment in the media data.
5. The method of claim 4, wherein, when the start-play frame is not a key frame, searching the media data for the key frame that precedes and is nearest to the start-play frame and for the predicted frames between that nearest key frame and the start-play frame comprises:
when the start-play media segment does not include a key frame, searching the media data for the media segment that is nearest to the start-play media segment and includes a key frame;
finding the key frame in the found media segment; and
acquiring the found key frame and the predicted frames between the found key frame and the start-play media segment.
6. The method of claim 4, wherein determining, in the media data, the start-play frame requested by the media data request comprises:
determining a start-play time node according to the media data request; and
searching the media data for the start-play media segment that matches the start-play time node.
7. The method of claim 1, wherein the media data comprises frame groups, each frame group comprising a plurality of frames whose first frame is a key frame; and
wherein, when the start-play frame is not a key frame, searching the media data for the key frame that precedes and is nearest to the start-play frame and for the predicted frames between that nearest key frame and the start-play frame comprises:
determining the start-play frame group in which the start-play frame is located; and
when the start-play frame is not the first frame in the start-play frame group, acquiring the key frame that is the first frame of the start-play frame group and the predicted frames between that first frame and the start-play frame.
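As an illustrative (non-claim) sketch, locating the frame group of claim 7 amounts to finding the last group boundary at or before the start-play frame; `frames_to_fetch` and the index layout are hypothetical simplifications.

```python
# Sketch of claim 7: locate the frame group (whose first frame is a key frame)
# containing the start-play frame, then identify that key frame and the
# predicted frames strictly between it and the start-play frame.

def frames_to_fetch(group_starts, start_index):
    """group_starts: sorted indices of each frame group's first (key) frame.

    Returns (key_frame_index, predicted_frame_indices). The start-play frame
    itself (at start_index) is handled separately by the caller.
    """
    # The start-play frame group is the last group starting at or before the
    # start-play frame.
    key_index = max(s for s in group_starts if s <= start_index)
    if key_index == start_index:
        return key_index, []  # start-play frame is already the group's key frame
    return key_index, list(range(key_index + 1, start_index))

# Frame groups begin at frames 0, 8, and 16; the start-play frame is frame 11.
print(frames_to_fetch([0, 8, 16], 11))  # (8, [9, 10])
```
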
8. A media data playback method, comprising:
sending a media data request;
receiving frames fed back in response to the media data request, the received frames including a key frame and the frames following the start-play frame in the media data, wherein the fed-back key frame is obtained by performing frame decoding on the start-play frame, based on the key frame that precedes and is nearest to the start-play frame and the frame reference relationships among that nearest key frame and the predicted frames between it and the start-play frame, to obtain the original image data fully expressed by the start-play frame, and then encoding the decoded original image data into a key frame, and wherein the start-play frame is specified by the media data request; and
decoding and playing the received frames starting from the key frame among them.
9. The method of claim 8, wherein receiving the frames fed back in response to the media data request comprises:
receiving media segments fed back in response to the media data request, each media segment comprising a plurality of frames, the media segments including a start-play media segment, and the start-play media segment including the key frame; and
wherein decoding and playing the received frames starting from the key frame comprises:
decoding and playing the frames in the received media segments one by one, starting from the key frame in the received start-play media segment.
10. A media data processing apparatus, comprising:
a receiving module, configured to receive a media data request;
a determining module, configured to determine, in media data, a start-play frame requested by the media data request;
a searching module, configured to, when the start-play frame is not a key frame, search the media data for the key frame that precedes and is nearest to the start-play frame and for the predicted frames between that nearest key frame and the start-play frame;
a construction module, configured to construct a key frame from the found key frame and the predicted frames; and
a response module, configured to feed back, in response to the media data request, the constructed key frame and the frames following the start-play frame in the media data.
11. The apparatus of claim 10, wherein the determining module comprises a media data determining module, a start-play time node determining module, and a start-play frame determining module, wherein:
the media data determining module is configured to determine the media data specified by the media data request;
the start-play time node determining module is configured to determine a start-play time node according to the media data request; and
the start-play frame determining module is configured to determine the frame in the media data that matches the start-play time node as the start-play frame.
12. The apparatus of claim 11, wherein the media data request is a live video stream play request and the media data is a live video stream; and the start-play time node determining module is further configured to determine a live broadcast delay according to the media data request, acquire a current time node, and determine the start-play time node according to the current time node and the live broadcast delay, the start-play time node being earlier than the current time node by the live broadcast delay.
13. The apparatus of claim 10, wherein the media data comprises media segments, each media segment comprising a plurality of frames, and the start-play frame is a frame in a start-play media segment; and
the response module is further configured to insert the constructed key frame into the start-play media segment, and to feed back, in response to the media data request, the start-play media segment with the inserted key frame and the media segments following the start-play media segment in the media data.
14. The apparatus of claim 13, wherein the construction module is further configured to: when the start-play media segment does not include a key frame, search the media data for the media segment that is nearest to the start-play media segment and includes a key frame; find the key frame in the found media segment; and acquire the found key frame and the predicted frames between the found key frame and the start-play media segment.
15. The apparatus of claim 13, wherein the determining module is further configured to determine a start-play time node according to the media data request, and to search the media data for the start-play media segment that matches the start-play time node.
16. The apparatus of claim 10, wherein the media data comprises frame groups, each frame group comprising a plurality of frames whose first frame is a key frame; and the construction module is further configured to determine the start-play frame group in which the start-play frame is located, and, when the start-play frame is not the first frame in the start-play frame group, to acquire the key frame that is the first frame of the start-play frame group and the predicted frames between that first frame and the start-play frame.
17. A media data playback apparatus, comprising:
a sending module, configured to send a media data request;
a receiving module, configured to receive frames fed back in response to the media data request, the received frames including a key frame and the frames following the start-play frame in the media data, wherein the fed-back key frame is obtained by performing frame decoding on the start-play frame, based on the key frame that precedes and is nearest to the start-play frame and the frame reference relationships among that nearest key frame and the predicted frames between it and the start-play frame, to obtain the original image data fully expressed by the start-play frame, and then encoding the decoded original image data into a key frame, and wherein the start-play frame is specified by the media data request; and
a playback module, configured to decode and play the received frames starting from the key frame among them.
18. A computer-readable storage medium, storing a computer program which, when executed by a processor, causes the processor to carry out the steps of the method according to any one of claims 1 to 9.
19. A computer device comprising a memory and a processor, the memory storing a computer program that, when executed by the processor, causes the processor to perform the steps of the method according to any one of claims 1 to 9.
CN201810345274.2A 2018-04-17 2018-04-17 Media data processing method and device and media data playing method and device Active CN110392269B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810345274.2A CN110392269B (en) 2018-04-17 2018-04-17 Media data processing method and device and media data playing method and device

Publications (2)

Publication Number Publication Date
CN110392269A CN110392269A (en) 2019-10-29
CN110392269B true CN110392269B (en) 2021-11-30

Family

ID=68283186

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810345274.2A Active CN110392269B (en) 2018-04-17 2018-04-17 Media data processing method and device and media data playing method and device

Country Status (1)

Country Link
CN (1) CN110392269B (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110225362A (en) * 2019-06-04 2019-09-10 浙江大华技术股份有限公司 A kind of live broadcasting method, direct broadcast server and storage device
CN111031385B (en) * 2019-12-20 2022-03-08 北京爱奇艺科技有限公司 Video playing method and device
CN113132807B (en) * 2019-12-30 2023-04-07 成都鼎桥通信技术有限公司 Video-based key frame request method, device, equipment and storage medium
CN114035672A (en) * 2020-07-20 2022-02-11 华为技术有限公司 Video processing method and related equipment for virtual reality VR scene
CN112153413B (en) * 2020-08-25 2022-08-12 广州市保伦电子有限公司 Method and server for processing screen splash in one-screen broadcast
CN112291591A (en) * 2020-09-21 2021-01-29 浙江大华技术股份有限公司 Video data playback method, electronic equipment and storage medium
CN112135163A (en) * 2020-09-27 2020-12-25 京东方科技集团股份有限公司 Video playing starting method and device
CN112333461A (en) * 2020-11-04 2021-02-05 北京金山云网络技术有限公司 Method and device for starting live video
CN112822503B (en) * 2020-12-30 2022-04-22 腾讯科技(深圳)有限公司 Method, device and equipment for playing live video stream and storage medium
CN114979712A (en) * 2022-05-13 2022-08-30 北京字节跳动网络技术有限公司 Video playing starting method, device, equipment and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6065050A (en) * 1996-06-05 2000-05-16 Sun Microsystems, Inc. System and method for indexing between trick play and normal play video streams in a video delivery system
CN103533387A (en) * 2013-10-21 2014-01-22 腾讯科技(深圳)有限公司 Live video control method, equipment and system
CN106713941A (en) * 2017-01-04 2017-05-24 北京百度网讯科技有限公司 Audio and video live broadcast implementation method and server
CN107147919A (en) * 2017-06-19 2017-09-08 网宿科技股份有限公司 It is live quickly to open broadcasting method and system

Similar Documents

Publication Publication Date Title
CN110392269B (en) Media data processing method and device and media data playing method and device
US10911789B2 (en) Automatic failover for live video streaming
US9426196B2 (en) Live timing for dynamic adaptive streaming over HTTP (DASH)
CN108540819B (en) Live broadcast data processing method and device, computer equipment and storage medium
US20160080470A1 (en) Server-side playlist stitching
WO2021147448A1 (en) Video data processing method and apparatus, and storage medium
US10277927B2 (en) Movie package file format
CN103069769A (en) Representation groups for network streaming of coded multimedia data
CN110392284B (en) Video encoding method, video data processing method, video encoding apparatus, video data processing apparatus, computer device, and storage medium
CN113141522B (en) Resource transmission method, device, computer equipment and storage medium
CN111447455A (en) Live video stream playback processing method and device and computing equipment
CN110248192B (en) Encoder switching method, decoder switching method, screen sharing method and screen sharing system
WO2018028547A1 (en) Channel switching method and device
CN109756749A (en) Video data handling procedure, device, server and storage medium
CN109168020A (en) Method for processing video frequency, device, calculating equipment and storage medium based on live streaming
CN110198493B (en) Media data downloading method, device, computer equipment, storage medium and system
CN110933467A (en) Live broadcast data processing method and device and computer readable storage medium
CN111770390A (en) Data processing method, device, server and storage medium
CN107801049B (en) Real-time video transmission and playing method and device
CN111726657A (en) Live video playing processing method and device and server
CN107634928B (en) Code stream data processing method and device
CN113225585A (en) Video definition switching method and device, electronic equipment and storage medium
CN108769789B (en) RTP streaming media storage and reading method and device based on slices
US20160380853A1 (en) Reporting media consumption based on service area
CN112929677B (en) Live video playback method and device and server

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant