CN111918121B - Accurate editing method for streaming media file - Google Patents

Accurate editing method for streaming media file

Info

Publication number
CN111918121B
CN111918121B (application CN202010577439.6A)
Authority
CN
China
Prior art keywords
frame
streaming media
media file
audio
file
Prior art date
Legal status
Active
Application number
CN202010577439.6A
Other languages
Chinese (zh)
Other versions
CN111918121A (en)
Inventor
胡一凡
张宇
周继波
殷力
陈洋
Current Assignee
South Sagittarius Integration Co Ltd
Original Assignee
South Sagittarius Integration Co Ltd
Priority date
Filing date
Publication date
Application filed by South Sagittarius Integration Co Ltd filed Critical South Sagittarius Integration Co Ltd
Priority claimed from CN202010577439.6A
Publication of CN111918121A
Application granted
Publication of CN111918121B
Status: Active
Anticipated expiration

Classifications

    • H04N21/8547 Content authoring involving timestamps for synchronizing content
    • H04N19/177 Adaptive coding of digital video signals, the coding unit being a group of pictures [GOP]
    • H04N21/439 Processing of audio elementary streams
    • H04N21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream
    • H04N21/4402 Processing of video elementary streams involving reformatting operations of video signals for household redistribution, storage or real-time display
    • H04N21/85406 Content authoring involving a specific file format, e.g. MP4 format

Abstract

The invention belongs to the technical field of audio and video multimedia, and in particular provides a method for accurately clipping a streaming media file. The method comprises: obtaining the data information of the streaming media file to be clipped and the clip start and end times T1 and T2; jumping the file handle to the I frame of the group of pictures (GOP) corresponding to T1 and starting to read the audio/video frames of the streaming media file; while the display time t1 of a video frame or t2 of an audio frame is less than T1, decoding the video frame and discarding the generated raw YUV data, discarding the audio frame directly, and reading the next frame of media data; when t1 or t2 is greater than T2, discarding the frame, updating and completing the header information of the target file DstFile, writing the file trailer, and finishing the clipping of the streaming media file. The scheme continuously decodes the video and maintains the inter-frame dependency relationship. By comparing the audio and video display times with the specified clip times, accurate clipping of audio and video is achieved; the method is not limited by the media file format, has higher precision than current mainstream video clipping tools, and has a wide application range.

Description

Accurate editing method for streaming media file
Technical Field
The invention belongs to the technical field of audio and video multimedia, and particularly relates to a method for accurately editing a streaming media file.
Background
With the rapid development of network communication technology, applications based on streaming media, such as video surveillance, video conferencing, short videos, and online television and movies, can be found everywhere. Streaming media is a technology and process for transmitting compressed audio/video data, text data and the like over a network instantly, in a continuous stream like flowing water. This technology greatly improves the real-time performance of live network broadcast; without it, a whole media file would have to be downloaded before watching. Streaming media files are used to record and store streaming media data; common file formats include MP4, FLV and AVI.
To reduce the bandwidth that video data occupies in network transmission, the video data in streaming media is compressed with intra-frame coding and inter-frame predictive coding. An intra-coded picture is called a key frame (I frame); inter-coded pictures are called P frames or B frames. An I frame can be displayed independently when decoded, whereas P frames and B frames both depend on the I frame of the same group of pictures (GOP) and cannot be played normally without it. When using a streaming media file, a user may need to clip it and specifies the clip start time. Because decoding of P frames and B frames depends on an I frame, conventional clipping seeks to the I frame of the GOP corresponding to the specified start time and starts the clip from that I frame. The time of that I frame most likely differs from the requested start time, and the error can reach several seconds or even tens of seconds.
Disclosure of Invention
The invention aims to overcome the problem of large clipping errors of streaming media files in the prior art.
Therefore, the invention provides a method for accurately editing a streaming media file, which comprises the following steps:
S1: acquiring the data information of the streaming media file to be clipped, the clip start time T1, the clip end time T2 and the streaming media file duration T, ensuring that T1 is not less than 0 and less than T2, and that T2 is not more than the duration T;
S2: jumping the file handle to the I frame of the GOP corresponding to the clip start time T1, and starting to read the audio/video frames of the streaming media file;
if the frame is a video frame, decoding it into raw YUV data and comparing the display time t1 of the video frame with the clip start time T1: when t1 is smaller than T1, the clip position has not been reached, so the decoded YUV data is discarded and the next frame of the streaming media file is read; when t1 is not smaller than T1, the frame is a required video frame, so the raw YUV data is encoded, a new timestamp is assigned to the generated encoded data, and the data is written into the target file DstFile;
if the frame is an audio frame, comparing the display time t2 of the audio frame with the clip start time T1: when t2 is smaller than T1, the frame data is likewise discarded and the next frame of the streaming media file is read; when t2 is not smaller than T1, a new timestamp is assigned to the frame and it is written into the target file DstFile;
S3: when the video frame display time t1 or the audio frame display time t2 is greater than the clip end time T2, discarding the frame, updating and completing the header information of the target file DstFile, writing the file trailer, and finishing the clipping of the streaming media file.
Preferably, the data information includes the number of video streams, the number of audio streams, the duration of the streaming media file, the video frame rate, the audio sampling rate and the number of audio frame samples.
Preferably, the format of the streaming media file to be clipped is MP4, FLV, AVI, MPEG or MP3.
Preferably, step S1 specifically includes: if the clip start time T1 is less than 0, it defaults to 0.
Preferably, step S1 specifically includes: parsing the streaming media file header and acquiring the data information necessary for clipping, wherein the streaming media file duration T is used for comparison with the clip end time T2, the video frame rate is used when reassigning timestamps to video frames, and the audio sampling rate and frame sample count are used when reassigning timestamps to audio frames.
Preferably, when an audio frame is read from the streaming media file, its display time is compared directly, without encoding or decoding.
Preferably, step S3 specifically includes: reassigning timestamps to the audio and video frames, and writing the data that satisfies the end-time limit into the target streaming media file.
Preferably, step S1 specifically includes: comparing the clip end time T2 with the duration T of the streaming media file to be clipped; if T2 is greater than T, clipping proceeds to the end of the file by default.
The invention also provides a device for accurately editing the streaming media file, which is used for editing the streaming media file to be edited according to the method for accurately editing the streaming media file.
The invention has the following beneficial effects: the method for accurately clipping a streaming media file provided by the invention acquires the data information of the streaming media file to be clipped, the clip start time T1, the clip end time T2 and the streaming media file duration T; jumps the file handle to the I frame of the GOP corresponding to the clip start time T1 and starts to read the audio/video frames of the streaming media file; while the video frame display time t1 or the audio frame display time t2 is less than the clip start time T1, decodes the video frame and discards the generated raw YUV data, discards the audio frame directly, and reads the next frame of media data; and when the video frame display time t1 or the audio frame display time t2 is greater than the clip end time T2, discards the frame, updates and completes the header information of the target file DstFile, writes the file trailer, and finishes the clipping of the streaming media file. The scheme decodes continuously and maintains the inter-frame dependency relationship. Accurate clipping of audio and video is achieved by comparing the audio and video display times with the specified clip times; the method is not limited by the media file format, applies to all audio and video frames, has higher precision than current mainstream video clipping tools, and has a wide application range.
The present invention will be described in further detail below with reference to the accompanying drawings.
Drawings
FIG. 1 is a flowchart illustrating the steps of a method for accurately editing a streaming media file according to the present invention;
FIG. 2 is a video frame editing flowchart of the method for accurately editing a streaming media file according to the present invention;
fig. 3 is a flow chart of audio frame clipping of the method for accurately clipping a streaming media file according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In the description of the present invention, it is to be understood that the terms "center", "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", and the like indicate orientations or positional relationships based on those shown in the drawings, and are only for convenience of description and simplicity of description, and do not indicate or imply that the referenced devices or elements must have a particular orientation, be constructed and operated in a particular orientation, and thus, are not to be construed as limiting the present invention.
The terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature; in the description of the present invention, "a plurality" means two or more unless otherwise specified.
The embodiment of the invention provides a method for accurately editing a streaming media file, which comprises the following steps:
S1: acquiring the data information of the streaming media file to be clipped, the clip start time T1, the clip end time T2 and the streaming media file duration T, ensuring that T1 is not less than 0 and less than T2, and that T2 is not more than the duration T;
S2: jumping the file handle to the I frame of the GOP corresponding to the clip start time T1, and starting to read the audio/video frames of the streaming media file;
if the frame is a video frame, decoding it into raw YUV data and comparing the display time t1 of the video frame with the clip start time T1: when t1 is smaller than T1, the clip position has not been reached, so the decoded YUV data is discarded and the next frame of the streaming media file is read; when t1 is not smaller than T1, the frame is a required video frame, so the raw YUV data is encoded, a new timestamp is assigned to the generated encoded data, and the data is written into the target file DstFile;
if the frame is an audio frame, comparing the display time t2 of the audio frame with the clip start time T1: when t2 is smaller than T1, the frame data is likewise discarded and the next frame of the streaming media file is read; when t2 is not smaller than T1, a new timestamp is assigned to the frame and it is written into the target file DstFile;
S3: when the video frame display time t1 or the audio frame display time t2 is greater than the clip end time T2, discarding the frame, updating and completing the header information of the target file DstFile, writing the file trailer, and finishing the clipping of the streaming media file.
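The comparison logic of steps S2 and S3 above can be sketched in a few lines of Python. This is only an illustrative model: `Frame` and `clip` are hypothetical names, display times are plain milliseconds, and the real demuxing, decoding and encoding work is elided.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Frame:
    kind: str   # "video" or "audio"
    t: float    # display time in milliseconds

def clip(frames: List[Frame], t1: float, t2: float) -> List[Frame]:
    """Keep only frames whose display time lies in [t1, t2].

    In the real scheme, video frames before t1 are still decoded (to
    keep GOP dependencies intact) and only their YUV output is
    discarded; this model simply drops them from the result.
    """
    out = []
    for f in frames:
        if f.t < t1:        # before the clip: decode and discard
            continue
        if f.t > t2:        # past the clip end: drop the frame
            continue
        out.append(f)       # would be re-timestamped and written to DstFile
    return out
```

With frames every 40 ms, a clip of [50, 120] keeps exactly the frames displayed at 80 ms and 120 ms, matching the bound-by-one-frame-interval error discussed later.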
Specifically, the scheme rests on three facts: inter-frame coded video data is interdependent; video display time increases monotonically, in units of milliseconds; and video decoding consumes few resources. Video frames are therefore decoded continuously, maintaining the inter-frame dependency relationship, while each frame's display time is compared with the user-specified clip start time. Audio frames have no inter-frame dependency and need no encoding or decoding, so their display times are compared directly with the user-specified clip start time. This guarantees that the user's clipping requirement is met accurately.
In a streaming media file, the stored data includes video, audio, subtitle text, pictures, and so on. Assume the user-specified clip start time is T1, the clip end time is T2, and the duration of the file to be clipped is T, all in milliseconds; the duration T3 of the target streaming media file generated by clipping equals T2 minus T1 (or T minus T1 when T2 exceeds T). The main working steps are as follows:
the method comprises the following steps: the user specifies the streaming media file SrcFile to be edited and the clip start time T1 and the clip end time T2, the time unit may be accurate to milliseconds.
Step two: the clipping device obtains the information of the streaming media file, mainly including the number of video streams, the number of audio streams, the streaming media file duration, the video frame rate, the audio sampling rate, the number of audio frame samples, and so on.
Step three: compare the user-specified clip times with the streaming media file duration, ensuring that the start time T1 is greater than or equal to 0 and less than the end time T2, and that the end time T2 is not greater than the streaming media file duration T; when T2 is greater than T, clipping proceeds to the end of the file by default. Jump the file handle to the I frame of the GOP corresponding to the clip start time T1 and start reading the audio/video frames of the streaming media file. If the frame is a video frame, decode it into raw YUV data and compare the video frame display time t1 with the clip start time T1: if t1 is less than T1, the frame has not reached the clip position, so discard the decoded YUV data and read the next frame of the streaming media file; if t1 is not less than T1, the frame is a required video frame, so encode the raw YUV data, assign a new timestamp to the generated encoded data, and write it into the target file DstFile. If the frame is an audio frame, compare the audio frame display time t2 with the clip start time T1: when t2 is less than T1, likewise discard the frame data and read the next frame of the streaming media file; when t2 is not less than T1, assign a new timestamp to the frame and write it into the target file DstFile.
Step four: when the video frame display time t1 or the audio frame display time t2 is greater than the clip end time T2, discard the frame, update and complete the header information of the target file DstFile, write the file trailer, and finish the clipping of the streaming media file.
In step one, the user specifies the file to be clipped and the clip times. The format of the streaming media file to be clipped is not limited; common formats such as MP4, FLV, AVI, MPEG and MP3 are all supported. The clip time precision is milliseconds. The start time must be less than the end time; otherwise the device reports an invalid time setting and terminates the clip. If the start time is set to less than 0, it defaults to 0; if the end time is set to greater than the streaming media file duration, clipping proceeds to the end of the file by default.
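The defaulting rules of step one can be expressed as a small helper. This is a sketch under the assumption that all times are in milliseconds; `normalize_clip_times` is an illustrative name, not from the patent.

```python
def normalize_clip_times(t1_ms: float, t2_ms: float, duration_ms: float):
    """Apply the defaulting rules of step one.

    Returns the effective (start, end) pair in milliseconds, or raises
    if the start time is not before the end time.
    """
    if t1_ms < 0:
        t1_ms = 0.0            # a negative start time defaults to 0
    if t2_ms > duration_ms:
        t2_ms = duration_ms    # an end time past EOF clips to the file end
    if t1_ms >= t2_ms:
        raise ValueError("clip start time must be before end time")
    return t1_ms, t2_ms
```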
In step two, the clipping device parses the streaming media file header to obtain the parameter information necessary for clipping: the streaming media file duration T is used for comparison with the clip end time, and the video frame rate is used when reassigning timestamps to video frames, ensuring lip synchronization in the target file generated by clipping.
In step three, audio frames have no inter-frame dependency: each audio frame can be decoded independently, and the audio stream does not affect the accuracy of the video clip, so the display times of audio frames are compared directly without encoding or decoding. The coding structure of the video frames directly affects clipping precision. Decoding of P frames and B frames depends on the I frame of the same GOP; decoding without reference to the I frame produces obvious artifacts that users cannot accept. The decoder therefore decodes the GOP continuously, which guarantees normal decoding of P-frame and B-frame video; the YUV data produced by decoding is raw video data that can be re-encoded into I frames, P frames or B frames. Timestamps are then reassigned to the audio and video frames separately, and lip synchronization of the clipped audio/video data is ensured according to the frame rate of the original video stream and the sampling rate and frame sample count of the original audio stream. The minimum frame rate for comfortable viewing by the human eye is 25 frames per second, i.e. at least 25 pictures are played per second. In practice the video frame rate is generally set between 25 and 60, with 25 to 30 used most often. Taking frame rates of 25 and 30 as examples, the interval between two consecutive pictures is 40 milliseconds and 33.3 milliseconds respectively. When comparing t1 with T1, the clipping scheme described here can therefore keep the maximum error within 40 milliseconds, a difference the human eye cannot perceive.
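The worst-case clipping error is one frame display interval, since the cut point is aligned to frame display times. A one-line sketch makes the 40 ms and 33.3 ms figures explicit (`frame_interval_ms` is an illustrative name):

```python
def frame_interval_ms(fps: float) -> float:
    """Display interval between consecutive video frames, in ms.

    This interval bounds the worst-case clipping error of the scheme,
    since the chosen cut frame is at most one interval from T1.
    """
    return 1000.0 / fps
```

At 25 fps this gives 40 ms and at 30 fps about 33.3 ms, which is where the "maximum error within 40 milliseconds" claim comes from.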
Popular clipping tools and schemes can only start decoding from the first I frame of a GOP. Since the frames in one GOP can span several seconds or even tens of seconds, the clipping error is on the order of seconds, with a maximum of tens of seconds, and the viewing experience is very poor.
In step four, timestamps are reassigned to the audio/video frames and the data that satisfies the end-time limit is written into the target streaming media file; finally, after all streaming media data has been written, the file header information is updated and the file trailer information is written.
In a specific implementation scenario, the steps of accurately clipping a streaming media file are shown in fig. 1:
S101: the user specifies to the clipping device the streaming media file SrcFile to be clipped, together with the start time T1 and end time T2 of the desired clip, in milliseconds.
S102: the clipping device checks the format of the streaming media file and the correctness of the media data. If the file is free of anomalies, it obtains the basic file information, mainly including the number of audio and video streams, the video frame rate Fps, the audio sampling rate SampleRate, the number of samples Nums contained in each audio frame, and the streaming media file duration T, and creates a video decoder. The clipping device checks the validity of the user-specified clip times: if the clip start time T1 is greater than the clip end time T2, it stops working and reports an invalid time setting; if the start time T1 is negative, the start time defaults to 0; and if the clip end time T2 is greater than the streaming media file duration T, clipping defaults to the end of the file. The clipping device then creates the target streaming media file DstFile according to the basic information of the source file SrcFile, and creates an encoder.
S103: the clipping process continuously decodes the video frames, maintaining the inter-frame dependency relationship, and compares the display time of each frame. Video frames that satisfy the time requirement are encoded, given new timestamps, and written into the target streaming media file created in S102; audio frames and subtitle information that satisfy the time requirement are written into the target streaming media file after their display timestamps are reassigned. Timestamp reassignment depends on the video frame rate Fps, the audio sampling rate SampleRate, the number of samples Nums per audio frame, and similar information.
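The timestamp reassignment in S103 reduces to simple arithmetic on Fps, SampleRate and Nums. The helpers below are an illustrative sketch (hypothetical names, times in milliseconds), not the patent's actual code:

```python
def video_pts_ms(frame_index: int, fps: float) -> float:
    """Reassigned display time of the n-th output video frame, in ms."""
    return frame_index * 1000.0 / fps

def audio_pts_ms(frame_index: int, sample_rate: int,
                 samples_per_frame: int) -> float:
    """Reassigned display time of the n-th output audio frame, in ms.

    Each audio frame carries `samples_per_frame` samples, so one frame
    lasts samples_per_frame / sample_rate seconds.
    """
    return frame_index * samples_per_frame * 1000.0 / sample_rate
```

For example, at 25 fps the 25th video frame lands at 1000 ms, and at 48 kHz with 960 samples per frame each audio frame advances the clock by 20 ms, which is how lip synchronization is preserved in DstFile.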
S104: after all the streaming media data within the specified time has been written into the target file, the file header of the target file must be updated and the file trailer written. Streaming media files of different formats have different requirements for header and trailer information; if that information is missing or abnormal, the player cannot play the file.
For the video frames of the streaming media file to be clipped, refer to fig. 2:
S201: on the basis of S101, the clipping device jumps the file handle of the streaming media file to be clipped to the GOP corresponding to the clip start time, positioning it at the I frame of that GOP.
S202: the clipping device reads one frame of video data and decodes it with the decoder created in S102, producing raw YUV data. The I frame of a group of pictures can be decoded directly, and the decoder records the I frame information; decoding a P frame depends on the I frame of its group and on the P frames before it, and the decoder records the P frame information; decoding a B frame depends on the I frame of its group and on the P frames before and after it. Since decoding of P frames and B frames must rely on the I frame, decoding the GOP continuously maintains the inter-frame dependency relationship.
S203: compare the frame's video display time t1 with the clip start time T1; the value of t1 is converted from the frame's display timestamp pts and the time base TimeBase of the video stream. For video frames that do not meet the clip time requirement, the corresponding raw YUV data is discarded.
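The pts-to-display-time conversion mentioned in S203 is a multiplication by the stream's time base. A minimal sketch using Python's `fractions` for an exact rational time base (the function name is illustrative):

```python
from fractions import Fraction

def display_time_ms(pts: int, time_base: Fraction) -> float:
    """Convert a presentation timestamp to milliseconds.

    A stream's time base is a rational number of seconds per tick,
    e.g. 1/90000 for MPEG transport streams or 1/1000 for FLV.
    """
    return float(pts * time_base * 1000)
```

For instance, pts 90000 with a 1/90000 time base is exactly 1000 ms; the resulting millisecond value is what gets compared against T1 and T2.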
S204: the encoder created in S102 encodes the raw YUV data that meets the clip time requirement; that encoder can specify the GOP size of the generated groups of pictures, the encoded image width, and other parameters. Following the GOP size and rules, the encoder encodes the raw YUV data into I frames, P frames or B frames.
S205: the encoded video frame is written into the target streaming media file DstFile. To ensure that the generated target file DstFile plays like SrcFile, a new timestamp must be assigned to the video frame, computed the same way as in SrcFile. After the newly generated video frame has been written into the target file DstFile, the process returns to S202 to handle the next frame.
S206: after all video frames meeting the time requirement have been written into DstFile, update the header information of the target file and write its trailer.
For the audio frames of the streaming media file to be clipped, refer to fig. 3:
S301: the same as S201. The audio stream can be clipped accurately without encoding or decoding, but timestamps must be reassigned to the audio frames; this requires the audio sampling rate SampleRate and the number of samples per frame Nums.
S302: the clipping tool reads one frame of audio data along the time track.
S303: as in S203, the audio frame display time t2 is converted from the audio frame's display timestamp pts and the time base of the audio stream.
S304: audio frames meeting the clip time requirement are given new timestamps and written into the target file DstFile. The audio sampling rate SampleRate and per-frame sample count Nums used for reassigning the timestamps come from SrcFile, so the audio playback of the target file DstFile is the same as that of SrcFile.
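When the audio stream's time base is 1/SampleRate (an assumption; many containers store audio pts in sample units), the reassignment in S304 is a running multiple of Nums, as sketched below with AAC-style 1024 samples per frame as the example value:

```python
def reassigned_audio_pts(frame_index: int, samples_per_frame: int) -> int:
    """Reassigned pts of the n-th output audio frame, in sample units.

    Assumes the audio stream's time base is 1/SampleRate, so each frame
    advances the clock by exactly `samples_per_frame` ticks.
    """
    return frame_index * samples_per_frame
```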
Compared with existing streaming media file clipping devices, this device has the following advantages:
the logic is simple and easy to implement; the accuracy is high and the user experience is good; it is feasible, reliable and widely applicable.
The above embodiments are merely illustrative of the present invention and are not intended to limit its scope; any design similar or equivalent to the present invention falls within the protection scope defined by the claims.

Claims (6)

1. A method for accurately editing a streaming media file, characterized by comprising the following steps:
s1: acquiring data information of a streaming media file SrcFile to be clipped, a clip start time T1, a clip end time T2 and a streaming media file duration T, ensuring that T1 is not less than 0 and less than T2, and ensuring that T2 is not more than the duration T; the duration T of the streaming media file is used for comparing the clipping ending time T2, the video frame rate is used when the timestamp is reassigned to the video frame, and the audio sampling rate and the number of frame samples are used when the timestamp is reassigned to the audio frame;
s2: jumping to the file handle to the I frame of the GOP corresponding to the clip start time T1, and starting to read the audio/video frame of the streaming media file;
if the frame is a video frame, decoding it into raw YUV data and comparing the display time t1 of the video frame with the clip start time T1: when t1 is smaller than T1, the clip region has not yet been reached, so the decoded YUV data is discarded and the next frame of the streaming media file is read; when t1 is not smaller than T1, the frame is a required video frame, so the raw YUV data is re-encoded, the generated encoded data is assigned a new timestamp, and the encoded data is written to the target file DstFile;
if the frame is an audio frame, comparing the display time t2 of the audio frame with the clip start time T1: when t2 is smaller than T1, the frame data is likewise discarded and the next frame of the streaming media file is read; when t2 is not smaller than T1, the frame is assigned a new timestamp and written to the target file DstFile;
specifically, the video frames generated by encoding are written to the target streaming media file DstFile; to ensure that the generated target file DstFile plays back the same way as SrcFile, each video frame is assigned a new timestamp, computed in the same way as in SrcFile;
S3: when the video frame display time t1 or the audio frame display time t2 is greater than the clip end time T2, discarding the frame; after all audio and video frames meeting the time requirement have been written to the target file DstFile with reassigned timestamps, updating and completing the file header information of DstFile and writing the file trailer, thereby completing the clipping of the streaming media file.
2. The method for accurately editing a streaming media file according to claim 1, wherein the data information comprises the number of video streams, the number of audio streams, the video frame rate, the audio sample rate, and the number of audio frame samples.
3. The method for accurately editing a streaming media file according to claim 1, wherein the format of the streaming media file to be clipped is MP4, FLV, AVI, MPEG, or MP3.
4. The method for accurately editing a streaming media file according to claim 1, wherein step S1 specifically comprises: if the clip start time T1 is less than 0, it defaults to 0.
5. The method for accurately editing a streaming media file according to claim 1, wherein, when an audio frame is read from the streaming media file, its display time is compared directly, without encoding or decoding.
6. The method for accurately editing a streaming media file according to claim 1, wherein step S1 specifically comprises: comparing the clip end time T2 with the duration T of the streaming media file to be clipped; if T2 is greater than the duration T, clipping defaults to the end of the file.
CN202010577439.6A 2020-06-23 2020-06-23 Accurate editing method for streaming media file Active CN111918121B (en)
