CN110662084B - MP4 file stream live broadcasting method, mobile terminal and storage medium


Info

Publication number
CN110662084B
CN110662084B
Authority
CN
China
Prior art keywords
video frame
file
video
byte
frame
Prior art date
Legal status
Active
Application number
CN201910979491.1A
Other languages
Chinese (zh)
Other versions
CN110662084A (en)
Inventor
毕新维
Current Assignee
Beijing Cheerbright Technologies Co Ltd
Original Assignee
Beijing Cheerbright Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Cheerbright Technologies Co Ltd filed Critical Beijing Cheerbright Technologies Co Ltd
Priority to CN201910979491.1A priority Critical patent/CN110662084B/en
Publication of CN110662084A publication Critical patent/CN110662084A/en
Application granted granted Critical
Publication of CN110662084B publication Critical patent/CN110662084B/en

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/21Server components or server architectures
    • H04N21/218Source of audio or video content, e.g. local disk arrays
    • H04N21/2187Live feed
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/188Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being a video data packet, e.g. a network abstraction layer [NAL] unit
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/70Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/433Content storage operation, e.g. storage operation in response to a pause request, caching operations
    • H04N21/4334Recording operations
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
    • H04N21/44008Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/84Generation or processing of descriptive data, e.g. content descriptors

Abstract

The invention discloses a method for live broadcasting MP4 file streams, executed in a mobile terminal connected to a video recording device; it also discloses a corresponding mobile terminal and a storage medium. The method comprises the following steps: acquiring a first MP4 file from the video recording device, the first MP4 file comprising video frames and video frame description information; acquiring a video frame byte-count range, video frame header bytes and video frame parsing parameters from the first MP4 file based on the video frame description information; acquiring from the video recording device a second MP4 file to be live broadcast, recorded with the same recording parameters, the second MP4 file not including video frame description information; determining the video frame byte intervals of the second MP4 file according to the acquired byte-count range and header bytes; and acquiring and parsing the video frames from the second MP4 file according to the video frame byte intervals and the parsing parameters to obtain the H264 code stream of the video frames for live broadcast.

Description

MP4 file stream live broadcasting method, mobile terminal and storage medium
Technical Field
The invention relates to a video coding and decoding technology, in particular to a method for live broadcasting of MP4 file streams, a mobile terminal and a storage medium.
Background
Mobile live broadcasting has developed to the present day by virtue of its real-time and social characteristics, giving users a sense of participation and of the atmosphere of the live venue; live broadcast technology has been continuously updated and iterated and is now mature. Generally, the live video process is divided into the following steps: capture -> processing -> encoding and packaging -> transmission (pushing the stream to a server, server stream distribution) -> stream playback in a player.
For streaming media transmission, encoding performance, encoding speed and encoding compression ratio directly affect the user experience and transmission cost of the whole pipeline. The current mainstream video compression format, H.264, offers a low bit rate, a wide range of applications, strong fault tolerance, efficient network transmission (it has a network abstraction layer) and excellent image quality (standard-definition digital video can be transmitted at a bit rate of 1 Mbps), and is widely used in media applications such as low-bit-rate wireless applications, network streaming media, online courses and video conferencing.
To meet the live broadcast requirement, the camera picture and the audio collected by the mobile phone's microphone or other peripheral audio equipment are captured separately and then combined for stream pushing. But some special scenarios must also be supported, for example push streaming from third-party recording devices, and some third-party devices only support outputting MP4 or MOV file streams.
Current decoding technology for MP4 file streams comes in two forms: the first is audio-video codec technology based on FFmpeg, and the second is a C-language implementation of the parsing process based on the MP4 compression format standard; both need the media description information (moov) in order to decode. The moov information stores key global data of the video and audio streams, such as file offsets, sample sizes and time stamps; it is usually generated when recording finishes and sits at the end of the MP4 file structure. In a live scene the moov information has not yet been generated, so existing MP4 decoding technology cannot decode in real time for stream pushing; this is why most live video streams on the market are in FLV format, while MP4 is mostly used for on-demand playback rather than live scenes. Likewise, if the moov is unexpectedly lost or the file is damaged, FFmpeg cannot parse and play the file, and important data cannot be recovered.
Disclosure of Invention
To this end, the present invention provides an MP4 file stream live broadcasting method, mobile terminal and storage medium in an effort to solve, or at least mitigate, at least one of the problems identified above.
According to an aspect of the present invention, there is provided a method for live broadcasting MP4 file streams, the method being performed in a mobile terminal connected to a video recording device, the method comprising: acquiring a first MP4 file from the video recording device, wherein the first MP4 file comprises video frames and video frame description information; based on the video frame description information, acquiring a video frame byte-count range, video frame header bytes and video frame parsing parameters from the first MP4 file, wherein the byte-count range comprises a video frame maximum byte count and a video frame minimum byte count; acquiring from the video recording device a second MP4 file to be live broadcast, recorded with the same recording parameters, wherein the second MP4 file does not include video frame description information; determining the video frame byte intervals of the second MP4 file according to the acquired byte-count range and header bytes, wherein a video frame byte interval marks the start and end segments of a video frame; and acquiring and parsing the video frames from the second MP4 file according to the video frame byte intervals and the parsing parameters to obtain the H264 code stream of the video frames, so as to live broadcast according to the H264 code stream.
Optionally, in the MP4 file stream live broadcasting method according to the present invention, the step of determining the video frame byte intervals of the second MP4 file according to the obtained byte-count range and header bytes includes: judging whether the byte count of a frame of the second MP4 file is smaller than the video frame maximum byte count of the first MP4 file and larger than the video frame minimum byte count of the first MP4 file, and if so, determining the frame of the second MP4 file to be a pending video frame; and judging whether the header byte of the pending video frame equals a video frame header byte of the first MP4 file, determining the pending video frame to be a video frame if so, and a non-video frame if not.
Optionally, in the MP4 file stream live broadcasting method according to the present invention, when the header byte of a pending video frame equals a video frame header byte of the first MP4 file, the slice header syntax elements of the pending video frame are compared with those of the video frames of the first MP4 file; if the comparison succeeds, the pending video frame is determined to be a video frame, otherwise a non-video frame.
Optionally, in the MP4 file stream live broadcasting method according to the present invention, the slice header syntax elements of the video frames of the first MP4 file are obtained by slice-parsing the video frames in the first MP4 file.
Optionally, in the MP4 file stream live broadcasting method according to the present invention, the slice header syntax elements of a valid pending video frame are obtained by decoding the valid pending video frame with exponential Golomb coding.
Optionally, in the MP4 file stream live broadcasting method according to the present invention, the slice header syntax elements include: the current frame number, the address of the first macroblock in a slice, the slice type, the sequence number of the picture parameter set on which the current slice depends, and the decoding order of the pictures.
Optionally, in the MP4 file stream live broadcasting method according to the present invention, the video frame header bytes are the video frame header bytes of I frames and P frames.
Optionally, in the MP4 file stream live broadcasting method according to the present invention, the video frame description information of the first MP4 file is obtained by a separation program.
Optionally, in the MP4 file stream live broadcasting method according to the present invention, the separation program is FFmpeg.
Optionally, in the MP4 file stream live broadcasting method according to the present invention, the step of acquiring and parsing video frames from the second MP4 file according to the video frame byte intervals and the parsing parameters includes: after the second MP4 file is recorded, locating any video frame with a parsing error.
Optionally, in the MP4 file stream live broadcasting method according to the present invention, the step of locating a video frame with a parsing error includes: comparing the parsed video frames with the video frames of the recorded second MP4 file byte by byte, locating the byte position of the parsing error, and locating the erroneous video frame according to that byte position.
According to still another aspect of the present invention, there is provided a mobile terminal including: one or more processors; a memory; and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the programs comprising instructions for performing a method of live streaming of MP4 files according to the present invention.
According to yet another aspect of the present invention, there is provided a readable storage medium, one or more programs comprising instructions which, when executed on a mobile terminal, cause the mobile terminal to perform a method of MP4 file streaming live according to the present invention.
According to the technical scheme of the invention, a first MP4 file containing moov information is recorded in advance, and the video frame byte-count range, the video frame header bytes and the video frame parsing parameters are acquired from it; a second MP4 file recorded by the same recording device with the same recording parameters is then acquired, the video frame byte intervals of the second MP4 file are determined from the byte-count range and header bytes of the first MP4 file, and the H264 code stream of the video frames is obtained by parsing, enabling live broadcast or the restoration of MP4 files missing their video description information (moov).
The invention also provides a slice-level parsing method: the valid pending video frame byte intervals of the second MP4 file are determined from the byte-count range and header bytes of the video frames of the first MP4 file, the bytes after a valid pending video frame's header byte are parsed into slice header syntax elements, and these are compared with the slice header syntax elements of the video frames of the first MP4 file. The total number of parsed bytes is small, which greatly improves parsing speed and accuracy. The parsing is also more precise: no more complex macroblock-level parsing is needed, and adverse effects in the final video such as mosaic, blurring, screen corruption and green screens caused by missed or false judgments can be effectively avoided.
Drawings
To the accomplishment of the foregoing and related ends, certain illustrative aspects are described herein in connection with the following description and the annexed drawings, which are indicative of various ways in which the principles disclosed herein may be practiced, and all aspects and equivalents thereof are intended to be within the scope of the claimed subject matter. The above and other objects, features and advantages of the present disclosure will become more apparent from the following detailed description read in conjunction with the accompanying drawings. Throughout this disclosure, like reference numerals generally refer to like parts or elements.
Fig. 1 illustrates a block diagram of a mobile terminal 100 according to an embodiment of the present invention;
Fig. 2 illustrates a flow diagram of a method 200 of live broadcasting an MP4 file stream according to one embodiment of the invention;
Fig. 3 illustrates a flow diagram of a method 300 of live broadcasting an MP4 file stream according to another embodiment of the invention;
Fig. 4 shows the MP4 standard structure;
Fig. 5 shows a video frame sequence structure.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
Fig. 1 is a block diagram of a mobile terminal 100. The mobile terminal 100 may include a memory interface 102, one or more data processors, image processors and/or central processing units 104, and a peripheral interface 106.
The memory interface 102, the one or more processors 104, and/or the peripherals interface 106 can be discrete components or can be integrated in one or more integrated circuits. In the mobile terminal 100, the various elements may be coupled by one or more communication buses or signal lines. Sensors, devices, and subsystems can be coupled to peripheral interface 106 to facilitate a variety of functions.
For example, a motion sensor 110, a light sensor 112, and a distance sensor 114 may be coupled to the peripheral interface 106 to facilitate directional, lighting, and ranging functions. Other sensors 116 may also be coupled to the peripheral interface 106, such as a positioning system (e.g., a GPS receiver), an acceleration sensor, a temperature sensor, a biometric sensor, or other sensing device, to facilitate related functions.
The camera subsystem 120 and optical sensor 122, which may be, for example, a charge-coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) optical sensor, may be used to facilitate implementation of camera functions such as recording photographs and video clips. Communication functions may be facilitated by one or more wireless communication subsystems 124, which may include radio frequency receivers and transmitters and/or optical (e.g., infrared) receivers and transmitters. The particular design and implementation of the wireless communication subsystem 124 may depend on the one or more communication networks supported by the mobile terminal 100. For example, the mobile terminal 100 may include a communication subsystem 124 designed to support an LTE, 3G, GSM network, a GPRS network, an EDGE network, a Wi-Fi or WiMax network, and a Bluetooth network.
The audio subsystem 126 may be coupled to a speaker 128 and a microphone 130 to facilitate voice-enabled functions, such as voice recognition, voice replication, digital recording, and telephony functions. The I/O subsystem 140 may include a touch screen controller 142 and/or one or more other input controllers 144. The touch screen controller 142 may be coupled to a touch screen 146. For example, the touch screen 146 and touch screen controller 142 may detect contact and movement or pauses made therewith using any of a variety of touch sensing technologies, including but not limited to capacitive, resistive, infrared, and surface acoustic wave technologies.
One or more other input controllers 144 may be coupled to other input/control devices 148 such as one or more buttons, rocker switches, thumbwheels, infrared ports, USB ports, and/or pointing devices such as styluses. The one or more buttons (not shown) may include up/down buttons for controlling the volume of the speaker 128 and/or microphone 130.
The memory interface 102 may be coupled with a memory 150. The memory 150 may include high-speed random access memory and/or non-volatile memory, such as one or more magnetic disk storage devices, one or more optical storage devices, and/or flash memory (e.g., NAND, NOR). The memory 150 may store an operating system 152, such as Android, iOS or Windows Phone. The operating system 152 may include instructions for handling basic system services and performing hardware-dependent tasks; in some embodiments it includes instructions for performing the MP4 file stream live broadcasting method. The memory 150 may also store applications 154. While the mobile terminal is running, the operating system 152 is loaded from the memory 150 and executed by the processor 104; the applications 154 are likewise loaded from the memory 150 and executed by the processor 104 at runtime. The applications 154 run on top of the operating system and use the interfaces provided by the operating system and the underlying hardware to implement various functions desired by the user, such as instant messaging, web browsing, picture management and video playing. The applications 154 may be provided independently of the operating system or bundled with it, and include various social applications such as QQ, WeChat and Weibo, various video playing applications, and system applications such as a photo album, a calculator and a voice recorder. In addition, a driver module may be added to the operating system when an application 154 is installed in the mobile terminal 100.
The present invention implements a method for live broadcasting MP4 file streams via one or more programs (including the instructions described above) stored in the memory 150 of the mobile terminal 100. It should be noted that the mobile terminal 100 of the present invention may be any device with data processing capability and the structure described above, such as a mobile phone, a tablet computer or a notebook computer.
In general MP4 file stream parsing technology, the video frame description information (moov) that is generated only at the end of recording and sits at the end of the MP4 file structure is required, so parsing is non-real-time and cannot serve live scenes. The MP4 file stream live broadcasting method therefore provides a scheme for parsing an MP4 file stream in real time.
Fig. 2 shows a flow diagram of a method 200 of decoding an MP4 file stream for live broadcast according to one embodiment of the invention. The method 200 is suitable for execution in a mobile terminal, such as the mobile terminal 100 described above. As shown in fig. 2, the method begins at step S210.
In step S210, a first MP4 file is acquired from the video recording device, the first MP4 file including video frames and video frame description information. The mobile terminal is connected to the video recording device, and the first MP4 file is output by the video recording device; the recording device is used because its output image quality is clearer and more stable than video recorded by the mobile terminal's own camera, giving a better live broadcast effect. Some video recording devices only support outputting MP4 files, such as pocket gimbal cameras.
In one embodiment, the video recording device is connected to the mobile terminal through a data line and is used to record first MP4 files (there may be several); the mobile terminal obtains a first MP4 file by data transfer over the data line.
All data in an MP4 file are packaged in boxes: the MP4 file is composed of a number of boxes, each with a type and a length, and a box can be understood as a block of data objects. A box may contain other boxes, nested level by level, to store media information.
Fig. 4 illustrates the MP4 standard structure: mdat stores the data information, and moov contains the video frame description information. The moov contains many small boxes, the key one being the sample table box; the sample table holds the information that locates frames in time and in the bitstream's physical layout, including the timing, type, size and position of each sample in its storage container. A video sample is a video frame or a group of consecutive video frames, and an audio sample is a continuous piece of compressed audio; both are collectively referred to as samples.
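As a minimal illustration of this box nesting, the following C sketch walks the top-level boxes of a file by reading each 8-byte header (a 32-bit big-endian size followed by a 4-character type); the function and variable names are hypothetical, and the 64-bit extended-size case is not handled.

#include <stdio.h>
#include <stdint.h>
#include <string.h>

/* Walk the top-level boxes of an MP4 file, printing each box's type and
 * size. Nested boxes (e.g. inside moov) could be walked the same way by
 * recursing into a box's payload. */
static void walk_boxes(FILE *fp)
{
    uint8_t header[8];
    while (fread(header, 1, 8, fp) == 8) {
        /* The box size is a 32-bit big-endian integer counting the header itself. */
        uint32_t size = ((uint32_t)header[0] << 24) | (header[1] << 16) |
                        (header[2] << 8) | header[3];
        char type[5] = {0};
        memcpy(type, header + 4, 4);   /* 4-character box type, e.g. "moov" or "mdat" */
        printf("box %s, %u bytes\n", type, size);
        if (size < 8)                  /* size 1 means a 64-bit size follows; not handled */
            break;
        fseek(fp, (long)size - 8, SEEK_CUR);   /* skip the payload to the next sibling */
    }
}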
The MP4 encapsulation format is defined on the basis of the QuickTime container format; the video frame description information is kept separate from the data information and is used when encapsulating H264 video and AAC audio. Video frames exist in mdat in the form of an H264 code stream and audio frames in the form of an AAC code stream, generally interleaved as H264 + AAC + AAC + H264 .... As described above, the sample table that alone can distinguish frame timing and physical layout in the bitstream lives in moov, and moov sits at the end of the MP4 file stream; an MP4 file can be parsed for playback only after recording is complete, which is why MP4 file streams could not be used in live broadcast scenes in the prior art.
Subsequently, in step S220, the video frame byte-count range, the video frame header bytes and the video frame parsing parameters are obtained from the first MP4 file based on the video frame description information.
As mentioned, the H264 stream of video frames in an MP4 file resides in the mdat box. In the H264 video coding standard, the overall framework is divided into two layers: the Video Coding Layer (VCL) and the Network Abstraction Layer (NAL). The former is responsible for efficiently representing the video content, while the latter is responsible for formatting the data and providing header information, ensuring that the data suits transmission over various channels and storage media. A video frame transmitted over a network is therefore usually a NAL unit; the difference between packaging modes lies in the start code at the front of each NAL unit.
There are two packaging methods for an H264 code stream. One is the Annex B format, the default output format of most encoders: the first 3-4 bytes of each frame are the H264 start code, 0x00000001 or 0x000001. The other is the AVCC format, the original NAL packaging format: instead of a fixed start code, the first few bytes (1, 2 or 4) hold the length of the NAL unit, i.e. the byte count of the video frame, and the stream must then be decoded with the aid of the video frame description information. MP4 uses the original NAL packaging format with a 4-byte length field representing the byte count of the video frame. The byte following the length field is the NAL unit header, i.e. the video frame header byte, and the NAL unit type byte for each frame type is fixed. The NAL unit header is followed by the NAL unit payload. Table 1 shows the composition of one NAL unit.
TABLE 1 NAL Unit composition
Start code / length field (4 bytes) | NAL unit header (1 byte) | NAL unit payload
According to one embodiment of the invention, because the first MP4 file is a completely recorded file, it contains video frame description information. In this scheme the video frame description information is the moov; an FFmpeg separation program is used to separate the H264 stream out of the first MP4 file according to the global moov, and the separation simply saves the video AVPacket obtained from each call to av_read_frame() to a local file. The AVPacket is the carrier of code stream data in FFmpeg. After the FFmpeg separation program runs, the video frame byte-count range, the video frame header bytes and the video frame parsing parameters can be derived from the AVPackets.
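A minimal sketch of this separation step using the public FFmpeg API (libavformat); error handling is trimmed, the function name and output path are hypothetical, and only the video stream is dumped.

#include <libavformat/avformat.h>
#include <stdio.h>

/* Demux a recorded MP4 and append every video packet (an AVCC-framed
 * H264 sample) to a local file, one av_read_frame() call per packet. */
static int dump_video_packets(const char *mp4_path, const char *out_path)
{
    AVFormatContext *fmt = NULL;
    if (avformat_open_input(&fmt, mp4_path, NULL, NULL) < 0)
        return -1;
    if (avformat_find_stream_info(fmt, NULL) < 0)
        return -1;
    int vidx = av_find_best_stream(fmt, AVMEDIA_TYPE_VIDEO, -1, -1, NULL, 0);

    FILE *out = fopen(out_path, "wb");
    AVPacket pkt;
    while (av_read_frame(fmt, &pkt) >= 0) {
        if (pkt.stream_index == vidx)
            fwrite(pkt.data, 1, pkt.size, out);  /* pkt.size is the frame byte count */
        av_packet_unref(&pkt);
    }
    fclose(out);
    avformat_close_input(&fmt);
    return 0;
}

From each such packet, pkt.size feeds the byte-count range (maximum and minimum), and under the AVCC framing described above pkt.data[4] would be the video frame header byte that steps S220 and S240 rely on.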
According to an embodiment of the present invention, the byte-count range of the video frames comprises the video frame maximum byte count and the video frame minimum byte count; if the byte count of a frame of the second MP4 file is smaller than the maximum and larger than the minimum of the first MP4 file, the frame of the second MP4 file is a pending video frame. For example, if the maximum video frame byte count obtained is 412,650 bytes, 500,000 bytes may be taken as the bound; this value is only used to estimate the maximum byte count of one video frame and is not an exact value.
The video frame parsing parameters comprise the sequence parameter set (SPS) and the picture parameter set (PPS), a set of data that changes very little and provides parsing information for a large number of NAL units. The SPS carries parameters for a continuous coded video sequence, such as the identifier seq_parameter_set_id, constraints on frame number and POC, the reference frame count, the decoded picture size and the field coding mode selection flag. The PPS corresponds to one picture or several pictures in a sequence, with parameters such as the identifier pic_parameter_set_id, the referenced seq_parameter_set_id, the entropy coding mode selection flag, the slice group count, the initial quantization parameter and the deblocking filter coefficient adjustment flag. The second MP4 file is output by the same video recording device and uses the same recording configuration (video width and height, frame rate, encoder parameters, etc.) to stay consistent with the first MP4 file. With the same device and the same recording parameters, the parsed data are fixed; that is, the SPS and PPS of the second MP4 file consist of the same bytes. Specifically, the SPS and PPS contain the information needed to initialize an H264 decoder, including the encoding profile and level, the image width and height, the deblocking filter, and so on. The SPS and PPS can be obtained by the separation program; they exist in the moov of the MP4 in the form of an AVCDecoderConfigurationRecord structure. The sub-functions provided by the separation program FFmpeg may be used, or they may be obtained from the MP4 compression format standard using the C language or other means.
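One way to get those bytes with FFmpeg is to read the video stream's codecpar->extradata, which for MP4 holds the AVCDecoderConfigurationRecord just mentioned; the offsets below follow that record's layout, the sketch assumes a single SPS, and it should be treated as illustrative rather than a full parser.

#include <libavformat/avformat.h>
#include <stdint.h>

/* Pull the first SPS and PPS out of the AVCDecoderConfigurationRecord
 * stored in the stream's extradata: 5 fixed bytes, a 5-bit SPS count,
 * then length-prefixed parameter sets (16-bit big-endian lengths). */
static void extract_sps_pps(const AVStream *st)
{
    const uint8_t *p = st->codecpar->extradata;
    int sps_len = (p[6] << 8) | p[7];      /* length of the first SPS NAL */
    const uint8_t *sps = p + 8;

    const uint8_t *q = sps + sps_len;      /* q[0] is the PPS count */
    int pps_len = (q[1] << 8) | q[2];      /* length of the first PPS NAL */
    const uint8_t *pps = q + 3;
    /* sps/pps now point at the raw parameter-set NAL units that must be
     * fed to the decoder before any video frame. */
    (void)pps; (void)pps_len;
}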
Fig. 5 shows a video frame sequence structure. According to an embodiment of the present invention, frame types are distinguished as follows. Although one frame of video can be encoded into one or several slices, in practice the code stream does not exhibit the complexity of the data partitioning mechanism, so one frame is taken to be one slice. In an H264 code stream, part of the video frame sequence is compressed into I frames, part into P frames and part into B frames, and the frame type is obtained by slice-parsing the slice header of the video frame. The slice type (slice_type) of an I frame is 2 or 7, that of a P frame is 0 or 5, and that of a B frame is 1 or 6; i.e. for each frame type, the value of slice_type % 5 is fixed. The slice parsing that yields the frame type may be based on the MP4 compression format standard, using the C language or other methods, which are not limited here. The roles of the various frame types are explained below (a small classification sketch follows the list):
i frame: intra frame intra coded frame;
IDR: instant Decoding Refresh frame of instant Decoding of instant output Decoding Refresh;
p frame: predictive-frame forward predictive coded frames;
b frame: bi-directionally interpolated prediction frame.
Similarly, in step S230, a second MP4 file to be live broadcast is acquired from the video recording device, the second MP4 file not including video frame description information. The second MP4 file is output by the same recording device as the first MP4 file and uses the same recording configuration (video width and height, frame rate, encoder parameters, etc.) to stay consistent with it; with the same device and the same recording parameters, the video frame byte-count range is similar and the NAL unit headers of the parsed frame types are fixed.
In one embodiment, the recording device is connected to the mobile terminal through the data line; the video recording device starts recording the second MP4 file, and the mobile terminal acquires the second MP4 file in real time by data transfer over the data line.
Next, in step S240, the video frame byte interval of the second MP4 file is determined according to the obtained video frame byte number range and the video frame header byte.
According to an embodiment of the present invention, after parsing starts, the bytes 0x6D646174 of the string mdat in the first MP4 file stream serve as the starting point for parsing the MP4 data, and the bytes 0x6D6F6F76 of the string moov serve as the end point, indicating that the parsing procedure has finished. Parsing is performed on the MP4 data transmitted in real time for the live broadcast.
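A hedged sketch of locating those markers in the incoming byte stream; the helper name is hypothetical.

#include <string.h>
#include <stdint.h>

/* Find a four-character code such as "mdat" (0x6D646174) or "moov"
 * (0x6D6F6F76) in a raw buffer; returns the offset just past the code,
 * or -1 if it is absent. */
static long find_fourcc(const uint8_t *buf, size_t len, const char *cc)
{
    for (size_t i = 0; i + 4 <= len; i++)
        if (memcmp(buf + i, cc, 4) == 0)
            return (long)(i + 4);   /* parsing starts right after "mdat" */
    return -1;
}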
According to an embodiment of the present invention, since the length of a video frame needs only three bytes to represent but occupies four bytes in the bitstream, the first of those bytes is usually 0x00. One byte is read; if it is 0x00, three more bytes are read, and their value is the byte count of the frame awaiting judgment. It is then judged whether this byte count of the second MP4 file's frame is smaller than the video frame maximum byte count of the first MP4 file and larger than its minimum; if so, the frame of the second MP4 file is determined to be a pending video frame, which reduces the judging steps. If the three-byte value lies within the byte-count range obtained from the first MP4 file, one more byte is read: if that byte belongs to the NAL unit header bytes of the known video frame types, the pending frame is determined to be a video frame, otherwise a non-video frame. The video frame byte intervals of the second MP4 file are then divided according to the byte counts of the video frames.
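The detection just described might look like the following sketch. The byte-count bounds are the values learned from the first MP4 file; for brevity the check against the learned header bytes is replaced here by a test of the NAL unit type bits (5 = IDR/I slice, 1 = non-IDR/P slice), and all names are hypothetical.

#include <stdint.h>
#include <stddef.h>

/* Scan a live buffer for the next plausible video frame. Returns the
 * candidate frame's byte count, or 0 if none is found. */
static uint32_t next_video_frame(const uint8_t *buf, size_t len,
                                 uint32_t min_bytes, uint32_t max_bytes)
{
    for (size_t i = 0; i + 5 <= len; i++) {
        if (buf[i] != 0x00)                /* high byte of the length field */
            continue;
        /* The next three bytes hold the candidate frame's byte count. */
        uint32_t n = ((uint32_t)buf[i + 1] << 16) | (buf[i + 2] << 8) | buf[i + 3];
        if (n <= min_bytes || n >= max_bytes)
            continue;                      /* outside the learned byte-count range */
        uint8_t nal_type = buf[i + 4] & 0x1F;   /* NAL unit type of the header byte */
        if (nal_type == 5 || nal_type == 1)     /* IDR (I) or non-IDR (P) slice */
            return n;                      /* pending video frame starts at buf + i */
    }
    return 0;
}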
In order to speed up decoding in the actual live broadcast process, B frames are generally not involved, and the frame types used for comparison are I frames and P frames. If the MP4 file stream to be parsed does contain B frames, the parsing flow of this scheme is unaffected.
Finally, in step S250, the video frames are acquired and parsed from the second MP4 file according to the video frame byte intervals and the video frame parsing parameters, yielding the H264 code stream of the video frames so that the live broadcast can proceed from that H264 code stream.
In the extracted stream, the 4 bytes in front of each NAL unit are not the start code 00000001 but the video frame byte length, so when the H264 stream is exported, this 4-byte header content must be replaced; otherwise the decoder cannot play the H264 code stream normally. In addition to the replaced H264 code stream, the decoder also needs the video parsing parameters in order to play.
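A minimal sketch of that in-place replacement, assuming one NAL unit per sample as this scheme does; the function name is hypothetical.

#include <string.h>
#include <stdint.h>

/* Convert one AVCC-framed video sample to Annex B in place: the 4-byte
 * big-endian length prefix is overwritten with the start code
 * 00 00 00 01 that raw-H264 decoders expect. */
static void avcc_to_annexb(uint8_t *frame)
{
    static const uint8_t start_code[4] = {0x00, 0x00, 0x00, 0x01};
    memcpy(frame, start_code, 4);   /* the length prefix is no longer needed */
}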
The video frame parsing parameters, i.e. the sequence parameter set SPS and the picture parameter set PPS, are a set of data that changes rarely and provides parsing information for a large number of NAL units; if the decoder does not receive them correctly, the other NAL units cannot be decoded. And because the second MP4 file uses the same recording parameters as the first MP4 file, its SPS and PPS consist of the same byte data.
Thus, by recording a first MP4 file in advance with the same video recording device, the video frame byte-count range, the video frame header bytes and the video frame parsing parameters are obtained. Even when the second MP4 file must be live broadcast, i.e. parsed in real time, or is missing the video description information (moov) that would otherwise make it unparseable, the video frame byte intervals of the second MP4 file can be determined from the byte-count range and header bytes of the first MP4 file, and the video frames can be acquired and parsed from the second MP4 file according to those intervals and the parsing parameters to obtain the H264 code stream, which serves for live broadcast or for restoring an MP4 file missing its moov information.
Fig. 3 shows a flow diagram of a method 300 of decoding an MP4 file stream for live broadcast according to another embodiment of the invention. The method 300 is a slice-level parsing method whose aim is to improve the accuracy of the frame determination.
Beginning at step S310, a first MP4 file is obtained from the video recording device, the first MP4 file including video frames and video frame description information. The first MP4 file is output by the video recording device, which is used in order to obtain a better live broadcast effect.
In step S320, the byte number range of the video frame, the header byte of the video frame, and the resolution parameter of the video frame are obtained from the first MP4 file based on the video frame description information.
In the code stream defined by H264, syntax elements are organized into a hierarchical structure, each level describing information of its own layer. This hierarchical organization saves bits: within one picture, for example, the slices often share the same data, and if every slice carried that data the code stream would be wasted; it is more efficient to lift the common information into picture-level syntax elements and keep only slice-specific syntax elements at the slice level. In H264 the syntax elements are organized into five levels: sequence, picture, slice, macroblock and sub-macroblock. This scheme mainly uses the syntax elements of the slice header. Note again that although one frame of video can be encoded into one or several slices, in practice the code stream does not exhibit the complexity of the data partitioning mechanism, so one frame is taken to be one slice.
In step S330, a second MP4 file to be live broadcast is obtained from the video recording device, the second MP4 file not including video frame description information. The second MP4 file is output by the same video recording device as the first MP4 file and uses the same recording configuration (video width and height, frame rate, encoder parameters, etc.); with the same device and the same recording parameters, the parsed data are fixed, that is, the SPS and PPS of the second MP4 file consist of the same byte data.
In step S340, the video frame byte intervals of the second MP4 file are determined according to the obtained video frame byte-count range and the video frame header bytes.
In step S350, the slice header syntax elements of a video frame of the second MP4 file are obtained from a frame taken within one of the second MP4 file's video frame byte intervals.
As to how the slice header syntax elements of a video frame of the second MP4 file are obtained: according to one embodiment of the present invention, 4 bytes are read following the start segment of the key frame and decoded as ue(v) values of exponential Golomb coding, yielding respectively (a ue(v) decoding sketch follows the list):
pic_order_cnt_lsb: another way of counting the current frame sequence number;
first_mb_in_slice: the address of the first macroblock in the slice;
pic_parameter_set_id: the sequence number of the picture parameter set on which the current slice depends;
frame_num: the decoding order of the pictures; each reference frame carries a consecutive frame_num as its identifier, indicating the decoding order.
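A minimal ue(v) bit-reader sketch for pulling such fields out of the bytes after the NAL unit header. Note that in a real slice header, first_mb_in_slice, slice_type and pic_parameter_set_id are ue(v) while frame_num and pic_order_cnt_lsb are fixed-width fields whose widths come from the SPS, and emulation-prevention bytes are ignored here, so this is illustrative only.

#include <stdint.h>
#include <stddef.h>

/* Tiny MSB-first bit reader over a byte buffer. */
struct bitreader { const uint8_t *buf; size_t pos; /* position in bits */ };

static unsigned get_bit(struct bitreader *br)
{
    unsigned b = (br->buf[br->pos >> 3] >> (7 - (br->pos & 7))) & 1;
    br->pos++;
    return b;
}

/* Decode one unsigned exponential-Golomb value ue(v): count the leading
 * zero bits, then read that many more bits after the terminating 1;
 * the value is 2^zeros - 1 + suffix. */
static unsigned read_ue(struct bitreader *br)
{
    int zeros = 0;
    while (get_bit(br) == 0)
        zeros++;
    unsigned suffix = 0;
    for (int i = 0; i < zeros; i++)
        suffix = (suffix << 1) | get_bit(br);
    return (1u << zeros) - 1 + suffix;
}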
The slice type (slice_type) may be obtained by exponential Golomb decoding or by other methods; this scheme does not limit it.
Next, in step S360, the slice header syntax elements of the video frames of the first MP4 file are compared with those of the video frame of the second MP4 file; if the comparison succeeds, the frame is regarded as a video frame.
The comparison rules are determined by the different properties of each slice header syntax element; according to one embodiment of the present invention they are as follows (collected into a code sketch below):
pic_order_cnt_lsb: when this field carries a value, the value varies but follows a fixed rule, and the pic_order_cnt_lsb of the video frames of the first MP4 file and that of the video frames of the second MP4 file should be equal;
first_mb_in_slice and pic_parameter_set_id must be equal to 0;
frame_num: until an I frame is met, frame_num increases in steps of 1 and does not exceed 16, following a fixed rule of cycles from 0 to 15; when B frames are present two frames may share the same value, but when only I frames and P frames are present, the frame_num of the current frame is 1 greater than that of the previous frame, and the frame_num of an I frame is fixed at 0;
frame_mbs_only_flag: a value of 0 indicates that a coded picture of the video sequence may be a coded field or a coded frame; a value of 1 indicates that every coded picture of the coded video sequence is a coded frame containing only frame macroblocks. The frame_mbs_only_flag of the video frames of the first MP4 file and that of the second MP4 file should be equal.
There are other rules comparing the slice header syntax elements of the video frames of the first MP4 file with those of the second MP4 file; the present invention is not limited to these.
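Collected as code, the rules above might look like this sketch; the field names mirror the H264 syntax, prev_frame_num is assumed to be tracked by the caller, and the 0-15 wrap (really log2_max_frame_num from the SPS) is hard-coded for illustration.

#include <stdbool.h>

/* Candidate slice-header fields parsed from a pending frame. */
struct slice_hdr {
    unsigned first_mb_in_slice;
    unsigned slice_type;
    unsigned pic_parameter_set_id;
    unsigned frame_num;
};

/* Apply the comparison rules: one slice per frame, a single PPS, and
 * frame_num stepping by 1 in cycles of 0..15 with I frames pinned at 0.
 * A frame passing all checks is accepted as a real video frame. */
static bool plausible_frame(const struct slice_hdr *h, unsigned prev_frame_num)
{
    if (h->first_mb_in_slice != 0)      /* the whole frame is coded as one slice */
        return false;
    if (h->pic_parameter_set_id != 0)   /* this stream carries a single PPS */
        return false;
    if (h->slice_type % 5 == 2)         /* I frame: frame_num restarts at 0 */
        return h->frame_num == 0;
    /* P frame: frame_num must be the previous value plus 1, wrapping below 16. */
    return h->frame_num == ((prev_frame_num + 1) & 15);
}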
Finally, in step S370, the video frame is obtained and analyzed from the second MP4 file according to the video frame interval and the video frame analysis parameter, so as to obtain the H264 code stream of the video frame, so as to perform live broadcast according to the H264 code stream.
The above rules parse the first 4 bytes of the RBSP, plus the 4-byte NAL unit length and the 1-byte NAL unit type: 9 bytes of one NAL unit in total. That is, a legal NAL unit must have its first 9 bytes satisfy the above rules. The total number of parsed bytes is small, which greatly improves parsing speed and accuracy. The parsing is also more precise: no more complex macroblock-level parsing is needed, and adverse effects in the final video such as mosaic, blurring, screen corruption and green screens caused by missed or false judgments can be effectively avoided.
In accordance with another embodiment of the present invention, during development, to improve accuracy, after the second MP4 file is recorded, the video frame with the parsing error is located.
The parsed video frames are compared byte by byte with the video frames of the recorded second MP4 file to locate the byte position of any parsing error; the video frame containing that byte position is the one parsed incorrectly. Because moov is not relied on, when parsing produces missing frames, extra frames or inconsistent frame data, it is necessary to locate the byte at which the error starts, for example 0001c71c with a length of 116508. A small piece of code can be added in the FFmpeg separation program to search for this byte string in the MP4 data, find its 4-byte start code, and convert it as a little-endian length. The frame sequence number is then determined from the parsed frame count and the number of SPS/PPS packets, completing the mapping from the erroneous byte position to the frames where the parsing error occurred. By inspecting the slice header syntax elements and finding the correct H264 slice header syntax elements, the rules can be adjusted. An example of code is as follows:
if (pkt.size == 116512) {              /* packet whose parsed size went wrong */
    /* framecnt: frames parsed so far; icount: number of SPS/PPS (see below). */
    printf("%d\n", framecnt + 2 * icount);
    PrintBuffer(pkt.data, 10);         /* dump the first 10 bytes for inspection */
}
where framecnt is the number of parsed frames and icount is the number of SPS/PPS packets.
The present invention may further comprise:
A7, the method of any one of A1-A6, wherein the video frame header bytes are the video frame header bytes of I frames and P frames.
A8, the method of any one of A1-A6, wherein the video frame description information of the first MP4 file is obtained by a separation program.
A9, the method of A8, wherein the separation program is FFmpeg.
In the description provided herein, numerous specific details are set forth. It is understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. However, the disclosed method should not be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.
Those skilled in the art will appreciate that the modules or units or components of the devices in the examples disclosed herein may be arranged in a device as described in this embodiment or alternatively may be located in one or more devices different from the devices in this example. The modules in the foregoing examples may be combined into one module or may be further divided into multiple sub-modules.
Those skilled in the art will appreciate that the modules in the device in an embodiment may be adaptively changed and disposed in one or more devices different from the embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component, and furthermore they may be divided into a plurality of sub-modules or sub-units or sub-components. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where at least some of such features and/or processes or elements are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Furthermore, those skilled in the art will appreciate that while some embodiments described herein include some features included in other embodiments, rather than other features, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
Furthermore, some of the described embodiments are described herein as a method or combination of method elements that can be performed by a processor of a computer system or by other means of performing the described functions. A processor having the necessary instructions for carrying out the method or method elements thus forms a means for carrying out the method or method elements. Further, the elements of the apparatus embodiments described herein are examples of the following apparatus: the apparatus is used to implement the functions performed by the elements for the purpose of carrying out the invention.
As used herein, unless otherwise specified the use of the ordinal adjectives "first", "second", "third", etc., to describe a common object, merely indicate that different instances of like objects are being referred to, and are not intended to imply that the objects so described must be in a given sequence, either temporally, spatially, in ranking, or in any other manner.
While the invention has been described with respect to a limited number of embodiments, those skilled in the art, having benefit of this description, will appreciate that other embodiments can be devised which do not depart from the scope of the invention as described herein. Furthermore, it should be noted that the language used in the specification has been principally selected for readability and instructional purposes, and may not have been selected to delineate or circumscribe the inventive subject matter. Accordingly, many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the appended claims. The present invention has been disclosed in an illustrative rather than a restrictive sense, and the scope of the present invention is defined by the appended claims.

Claims (12)

1. A method for live streaming of MP4 files, the method being executed in a mobile terminal, the mobile terminal being connected with a video recording device, the method comprising:
acquiring a recorded first MP4 file from the video recording device, wherein the first MP4 file comprises video frames and video frame description information;
acquiring a video frame byte number range, a video frame header byte and a video frame analysis parameter from a first MP4 file based on the video frame description information, wherein the video frame byte number range comprises a video frame maximum byte number and a video frame minimum byte number;
acquiring a second MP4 file to be live-broadcasted, which is recorded by the same recording parameters, from the video recording equipment, wherein the second MP4 file does not include video frame description information;
determining a video frame byte interval of a second MP4 file according to the obtained video frame byte number range and the video frame header byte, wherein the video frame byte interval is used for distinguishing a start segment and an end segment of a video frame;
and acquiring and analyzing the video frame in real time according to the video frame byte interval and the video frame analysis parameters from the unrecorded second MP4 file to obtain an H264 code stream of the video frame so as to carry out live broadcast according to the H264 code stream.
2. The method of claim 1, wherein the step of determining the video frame byte interval of the second MP4 file according to the obtained video frame byte number range and the video frame header byte comprises:
judging whether the number of frame bytes of the second MP4 file is smaller than the maximum number of video frame bytes of the first MP4 file and larger than the minimum number of video frame bytes of the first MP4 file, if so, determining the frame of the second MP4 file as an undetermined video frame;
and judging whether the header byte of the video frame to be determined is equal to the header byte of the video frame of the first MP4 file, if so, determining the video frame to be determined as the video frame, and if not, determining the video frame to be determined as the non-video frame.
3. The method of claim 2, further comprising:
when the head byte of the video frame to be determined is equal to the head byte of the video frame of the first MP4 file, comparing the head syntax element of the video frame to be determined with the head syntax element of the video frame of the first MP4 file, if the comparison is successful, determining the video frame to be determined as the video frame, and if the comparison is not successful, determining the video frame to be determined as the non-video frame.
4. The method of claim 3, wherein the slice header syntax element of the video frames of the first MP4 file is derived from slice parsing the video frames in the first MP4 file.
5. The method of claim 3, wherein the slice header syntax elements comprise: the current frame number, the address of the first macroblock in a slice, the slice type, the sequence number of the picture parameter set on which the current slice depends, and the decoding order of the pictures.
6. The method of any of claims 1-5, wherein the video frame header bytes are the video frame header bytes of I frames and P frames.
7. The method of any of claims 1-5, wherein the video frame description information of the first MP4 file is obtained by a separation program.
8. The method of claim 7, wherein the separation program is FFmpeg.
9. The method of any of claims 1-5, wherein the step of acquiring and parsing video frames from the second MP4 file according to the video frame byte intervals and the video frame parsing parameters comprises: after the second MP4 file is recorded, locating the video frame with the parsing error.
10. The method of claim 9, wherein the step of locating a video frame with a parsing error comprises: comparing the parsed video frames with the video frames of the recorded second MP4 file byte by byte, locating the byte position of the parsing error, and locating the erroneous video frame according to that byte position.
11. A mobile terminal, comprising:
one or more processors;
a memory; and
one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs comprising instructions for performing any of the methods of claims 1-10.
12. A computer readable storage medium storing one or more programs, the one or more programs comprising instructions, which when executed on a mobile terminal, cause the mobile terminal to perform any of the methods of claims 1-10.
CN201910979491.1A 2019-10-15 2019-10-15 MP4 file stream live broadcasting method, mobile terminal and storage medium Active CN110662084B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910979491.1A CN110662084B (en) 2019-10-15 2019-10-15 MP4 file stream live broadcasting method, mobile terminal and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910979491.1A CN110662084B (en) 2019-10-15 2019-10-15 MP4 file stream live broadcasting method, mobile terminal and storage medium

Publications (2)

Publication Number Publication Date
CN110662084A CN110662084A (en) 2020-01-07
CN110662084B true CN110662084B (en) 2021-07-09

Family

ID=69040949

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910979491.1A Active CN110662084B (en) 2019-10-15 2019-10-15 MP4 file stream live broadcasting method, mobile terminal and storage medium

Country Status (1)

Country Link
CN (1) CN110662084B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112532991B (en) * 2020-10-28 2023-04-14 惠州华阳通用电子有限公司 Hardware decoding method for RMFF format file
CN112866717A (en) * 2021-01-15 2021-05-28 北京睿芯高通量科技有限公司 Method and system capable of extracting H264 code stream stored in MP4 file
CN114449316B (en) * 2021-12-02 2023-09-22 北京快乐茄信息技术有限公司 Video processing method and device, electronic equipment and storage medium
CN115002558A (en) * 2022-05-18 2022-09-02 武汉斗鱼鱼乐网络科技有限公司 Audio and video recording method in live broadcasting process and related equipment

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105898496A (en) * 2015-11-18 2016-08-24 乐视网信息技术(北京)股份有限公司 HLS stream hardware decoding method based on Android device and device
CN107846633A (en) * 2016-09-18 2018-03-27 腾讯科技(深圳)有限公司 A kind of live broadcasting method and system
CN108093299A (en) * 2017-12-22 2018-05-29 厦门市美亚柏科信息股份有限公司 MP4 damages the restorative procedure and storage medium of file

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030097458A1 (en) * 2000-10-02 2003-05-22 Mikael Bourges-Sevenier Method and apparatus for encoding, transmitting and decoding an audiovisual stream data
US7567584B2 (en) * 2004-01-15 2009-07-28 Panasonic Corporation Multiplex scheme conversion apparatus
CN101651833B (en) * 2009-09-10 2012-01-11 中兴通讯股份有限公司 I frame search method and device
US20110276662A1 (en) * 2010-05-07 2011-11-10 Samsung Electronics Co., Ltd. Method of constructing multimedia streaming file format, and method and apparatus for servicing multimedia streaming using the multimedia streaming file format
CN103327362A (en) * 2013-05-24 2013-09-25 武汉东信同邦信息技术有限公司 High reliability streaming medium memory system and method thereof
CN109155861B (en) * 2016-05-24 2021-05-25 诺基亚技术有限公司 Method and apparatus for encoding media content and computer-readable storage medium
CN106101820A (en) * 2016-07-12 2016-11-09 深圳市泰吉通电子有限公司 Process the method and system of video
CN108174284B (en) * 2017-12-29 2020-09-15 航天科工智慧产业发展有限公司 Android system-based video decoding method


Also Published As

Publication number Publication date
CN110662084A (en) 2020-01-07


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant