CN110753259A - Video data processing method and device, electronic equipment and computer readable medium - Google Patents

Video data processing method and device, electronic equipment and computer readable medium

Info

Publication number
CN110753259A
Authority
CN
China
Prior art keywords
packet
pes
data packet
data
video
Prior art date
Legal status
Granted
Application number
CN201911122016.9A
Other languages
Chinese (zh)
Other versions
CN110753259B (en)
Inventor
银国徽
Current Assignee
Douyin Vision Co Ltd
Douyin Vision Beijing Co Ltd
Original Assignee
Beijing ByteDance Network Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing ByteDance Network Technology Co Ltd filed Critical Beijing ByteDance Network Technology Co Ltd
Priority to CN201911122016.9A priority Critical patent/CN110753259B/en
Publication of CN110753259A publication Critical patent/CN110753259A/en
Priority to PCT/CN2020/125298 priority patent/WO2021093608A1/en
Application granted granted Critical
Publication of CN110753259B publication Critical patent/CN110753259B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/434 Disassembling of a multiplex stream, e.g. demultiplexing audio and video streams, extraction of additional data from a video stream; Remultiplexing of multiplex streams; Extraction or processing of SI; Disassembling of packetised elementary stream
    • H04N21/4343 Extraction or processing of packetized elementary streams [PES]
    • H04N21/4341 Demultiplexing of audio and video streams
    • H04N21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
    • H04N21/4402 Processing of video elementary streams involving reformatting operations of video signals for household redistribution, storage or real-time display
    • H04N21/440218 Reformatting operations involving transcoding between formats or standards, e.g. from MPEG-2 to MPEG-4
    • H04N21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/85 Assembly of content; Generation of multimedia applications
    • H04N21/854 Content authoring
    • H04N21/8547 Content authoring involving timestamps for synchronizing content

Abstract

The present disclosure provides a method, an apparatus, an electronic device and a computer storage medium for processing video data. The method comprises: acquiring a transport stream (TS) file to be processed; parsing the TS file to obtain the packetized elementary stream (PES) packets corresponding to the TS file, wherein one PES packet corresponds to the content of one video frame; parsing each PES packet separately to obtain the ES packet contained in each PES packet; parsing each ES packet separately to obtain the audio and video parameters of each ES packet; and obtaining the audio and video parameters of the TS file based on the audio and video parameters of each ES packet. In the embodiments of the present disclosure, after the plurality of ES packets in the TS file are obtained through parsing, the corresponding audio and video parameters are parsed for each ES packet individually, which avoids the missed packets that can occur when all ES packets are parsed simultaneously, so that the audio and video parameters of the TS file obtained from parsing each ES packet are more accurate.

Description

Video data processing method and device, electronic equipment and computer readable medium
Technical Field
The present disclosure relates to the field of multimedia processing technologies, and in particular, to a method and an apparatus for processing video data, an electronic device, and a computer-readable medium.
Background
In the prior art, a TS file usually contains a plurality of ES packets. When the audio and video parameters in the ES packets are parsed, the parameters of all the ES packets are usually parsed at the same time, so some ES packets may be missed and left unparsed, and the audio and video parameters of the TS file obtained from the parsing are therefore inaccurate.
Disclosure of Invention
The purpose of the present disclosure is to solve at least one of the above technical drawbacks and to improve the accuracy of audio/video parameters. The technical scheme adopted by the disclosure is as follows:
in a first aspect, the present disclosure provides a method for processing video data, the method including:
acquiring a transport stream TS file to be processed;
parsing the TS file to obtain the packetized elementary stream (PES) packets corresponding to the TS file;
respectively analyzing each PES data packet to obtain an ES data packet contained in each PES data packet;
respectively analyzing each ES data packet to obtain audio and video parameters of each ES data packet;
and obtaining the audio and video parameters of the TS file based on the audio and video parameters of each ES data packet.
In an embodiment of the first aspect of the present disclosure, parsing a TS file to obtain PES packets corresponding to the TS file includes:
analyzing the TS file to obtain a first TS data packet in the TS file;
and obtaining each PES packet corresponding to the TS file according to the first TS packets meeting the first preset condition, wherein the first preset condition is that a first designated identification bit of the TS packet is a first set value.
In an embodiment of the first aspect of the present disclosure, the method further comprises:
determining the data type of the PES data packet based on the analysis result of the PES data packet, wherein the data type is video data or audio data;
if the data type is video data, the PES data packet is a video PES data packet;
if the data type is audio data, the PES packet is an audio PES packet.
In an embodiment of the first aspect of the present disclosure, parsing each PES packet to obtain an ES packet of each PES packet includes:
respectively analyzing each video PES data packet to obtain a video ES data packet corresponding to each video PES data packet;
respectively analyzing each audio PES data packet to obtain an audio ES data packet corresponding to each audio PES data packet;
and determining the ES data packet corresponding to each PES data packet based on the video ES data packet and the audio ES data packet corresponding to each PES data packet.
In an embodiment of the first aspect of the present disclosure, parsing each video PES packet to obtain a video ES packet corresponding to each video PES packet includes:
respectively analyzing the second TS data packet corresponding to each video PES data packet to obtain a video ES data packet corresponding to each video PES data packet;
the second TS packet is a TS packet meeting a second preset condition, and the second preset condition is that a second designated identification bit of the TS packet is a second set value.
In an embodiment of the first aspect of the present disclosure, the analyzing each audio PES packet to obtain an audio ES packet corresponding to each audio PES packet includes:
respectively analyzing the third TS data packet corresponding to each audio PES data packet to obtain an audio ES data packet corresponding to each audio PES data packet;
the third TS data packet is a TS data packet meeting a third preset condition, and the third preset condition is that a third designated identification bit of the TS data packet is a third set value.
In an embodiment of the first aspect of the present disclosure, the audio/video parameters include a display time parameter PTS, a decoding time parameter DTS, a sequence parameter set SPS, and a picture parameter set PPS;
the method further comprises the following steps:
and carrying out format conversion on the TS file to be processed based on the audio and video parameters.
In a second aspect, the present disclosure provides an apparatus for processing video data, the apparatus comprising:
the TS file acquisition module is used for acquiring a transport stream TS file to be processed;
a PES packet determining module, configured to parse the TS file to obtain the packetized elementary stream (PES) packets corresponding to the TS file;
an ES packet determining module, configured to analyze each PES packet to obtain an ES packet included in each PES packet;
the ES data packet analysis module is used for respectively analyzing each ES data packet to obtain the audio and video parameters of each ES data packet;
and the audio and video parameter determination module is used for obtaining the audio and video parameters of the TS file based on the audio and video parameters of each ES data packet.
In an embodiment of the second aspect of the present disclosure, when the PES packet determining module analyzes the TS file to obtain each PES packet corresponding to the TS file, the PES packet determining module is specifically configured to:
analyzing the TS file to obtain a first TS data packet in the TS file;
and obtaining each PES packet corresponding to the TS file according to the first TS packets meeting the first preset condition, wherein the first preset condition is that a first designated identification bit of the TS packet is a first set value.
In an embodiment of the second aspect of the present disclosure, the apparatus further comprises:
the data type determining module is used for determining the data type of the PES data packet based on the analysis result of the PES data packet, wherein the data type is video data or audio data;
if the data type is video data, the PES data packet is a video PES data packet;
if the data type is audio data, the PES packet is an audio PES packet.
In an embodiment of the second aspect of the present disclosure, when the ES packet determining module analyzes each PES packet to obtain an ES packet of each PES packet, the ES packet determining module is specifically configured to:
respectively analyzing each video PES data packet to obtain a video ES data packet corresponding to each video PES data packet;
respectively analyzing each audio PES data packet to obtain an audio ES data packet corresponding to each audio PES data packet;
and determining the ES data packet corresponding to each PES data packet based on the video ES data packet and the audio ES data packet corresponding to each PES data packet.
In an embodiment of the second aspect of the present disclosure, when the ES packet determining module respectively parses each video PES packet to obtain a video ES packet corresponding to each video PES packet, the ES packet determining module is specifically configured to:
respectively analyzing the second TS data packet corresponding to each video PES data packet to obtain a video ES data packet corresponding to each video PES data packet;
the second TS packet is a TS packet meeting a second preset condition, and the second preset condition is that a second designated identification bit of the TS packet is a second set value.
In an embodiment of the second aspect of the present disclosure, when the ES packet determining module analyzes each audio PES packet to obtain an audio ES packet corresponding to each audio PES packet, the ES packet determining module is specifically configured to:
respectively analyzing the third TS data packet corresponding to each audio PES data packet to obtain an audio ES data packet corresponding to each audio PES data packet;
the third TS data packet is a TS data packet meeting a third preset condition, and the third preset condition is that a third designated identification bit of the TS data packet is a third set value.
In an embodiment of the second aspect of the present disclosure, the audio/video parameters include a display time parameter PTS, a decoding time parameter DTS, a sequence parameter set SPS, and a picture parameter set PPS; the device also includes:
and the format conversion module is used for converting the format of the TS file to be processed based on the audio and video parameters.
In a third aspect, the present disclosure provides an electronic device comprising:
a processor and a memory;
a memory for storing computer operating instructions;
a processor, configured to perform the method shown in any embodiment of the first aspect of the present disclosure by invoking the computer operating instructions.
In a fourth aspect, the present disclosure provides a computer readable medium having stored thereon at least one instruction, at least one program, set of codes or set of instructions, which is loaded and executed by a processor to implement a method as shown in any one of the embodiments of the first aspect of the present disclosure.
The technical scheme provided by the embodiment of the disclosure has the following beneficial effects:
according to the video data processing method, the video data processing device, the electronic equipment and the computer readable medium, after the plurality of ES data packets in the TS file are obtained through analysis, the corresponding audio and video parameters can be analyzed for each ES data packet, missing ES data packets are avoided when all ES data packets are analyzed simultaneously, and therefore the audio and video parameters of the TS file obtained through analyzing each ES data packet are more accurate.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present disclosure, the drawings used in the description of the embodiments of the present disclosure will be briefly described below.
Fig. 1 is a schematic flowchart of a method for processing video data according to an embodiment of the present disclosure;
fig. 2 is a schematic structural diagram of a video data processing apparatus according to an embodiment of the present disclosure;
fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it is to be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that the present disclosure will be understood more thoroughly and completely. It should be understood that the drawings and embodiments of the disclosure are for illustration purposes only and are not intended to limit the scope of the disclosure.
It should be understood that the various steps recited in the method embodiments of the present disclosure may be performed in a different order, and/or performed in parallel. Moreover, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this respect.
The term "include" and variations thereof as used herein are open-ended, i.e., "including but not limited to". The term "based on" is "based, at least in part, on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions for other terms will be given in the following description.
It should be noted that the terms "first", "second", and the like in the present disclosure are used only to distinguish different devices, modules or units; they are not intended to limit those devices, modules or units, nor to limit the order of, or interdependence between, the functions they perform.
It is noted that the modifiers "a", "an" and "the" in this disclosure are intended to be illustrative rather than limiting, and those skilled in the art will understand them as meaning "one or more" unless the context clearly indicates otherwise.
The names of messages or information exchanged between devices in the embodiments of the present disclosure are for illustrative purposes only, and are not intended to limit the scope of the messages or information.
ES, Elementary Stream, is used in some hard disk players or editing systems.
TS, Transport Stream, is used for data transmission.
PES, Packetized Elementary Stream.
PTS, Presentation Time Stamp.
DTS, Decoding Time Stamp.
SPS, Sequence Parameter Set.
PPS, Picture Parameter Set.
The following describes the technical solutions of the present disclosure and how to solve the above technical problems in specific embodiments. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments. Embodiments of the present disclosure will be described below with reference to the accompanying drawings.
In view of the above technical problem, an embodiment of the present disclosure provides a method for processing video data, as shown in fig. 1, the method may include:
step S110, a transport stream TS file to be processed is obtained.
Specifically, a ts (transport stream) file is a video clip with a fixed duration.
Step S120, parsing the TS file to obtain the packetized elementary stream (PES) packets corresponding to the TS file.
Specifically, one TS file may include a plurality of PES packets, and one PES packet corresponds to the content of one video frame, including image information and audio data corresponding to the video frame.
Step S130, each PES packet is analyzed to obtain the ES packets included in each PES packet.
Specifically, the ES packets include image information and audio data in PES packets, and each PES packet has a corresponding ES packet.
And step S140, respectively analyzing each ES data packet to obtain the audio and video parameters of each ES data packet.
And step S150, obtaining audio and video parameters of the TS file based on the audio and video parameters of each ES data packet.
Specifically, the audio and video parameters obtained through parsing may be used for subsequent processing of the TS file; for example, they may be used to perform format conversion on the TS file so that the TS file can be played by a player. If the TS file is format-converted based on the HLS protocol, the audio and video parameters may include the display time parameter PTS, the decoding time parameter DTS, the sequence parameter set SPS, and the picture parameter set PPS.
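For illustration only, the nesting of TS, PES and ES data described in steps S110 to S150 can be summarized with the following TypeScript sketch; all interface and field names here are assumptions introduced for clarity, not structures prescribed by the disclosure.

// Minimal data model for the TS -> PES -> ES nesting described above.
// Names and fields are illustrative assumptions, not prescribed by the disclosure.

interface TsPacket {
  pid: number;               // 13-bit packet identifier
  payloadUnitStart: boolean; // true when a new PES packet begins in this packet
  payload: Uint8Array;       // body data carried by this 188-byte packet
}

interface PesPacket {
  kind: 'video' | 'audio';   // one PES packet corresponds to the content of one video frame
  tsPackets: TsPacket[];     // the TS packets whose payloads make up this PES packet
  pts?: number;              // display time parameter, if signalled
  dts?: number;              // decoding time parameter, if signalled
}

interface EsPacket {
  kind: 'video' | 'audio';
  data: Uint8Array;          // raw elementary-stream media data (carries no PTS/DTS itself)
}

interface AvParameters {
  pts: number;
  dts: number;
  sps: Uint8Array[];         // sequence parameter sets
  pps: Uint8Array[];         // picture parameter sets
}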
According to the solution in the embodiments of the present disclosure, after the plurality of ES packets in the TS file are obtained through parsing, the corresponding audio and video parameters are parsed for each ES packet individually, which avoids the missed packets that can occur when all ES packets are parsed simultaneously, so that the audio and video parameters of the TS file obtained from parsing each ES packet are more accurate.
In the embodiment of the present disclosure, in step S120, analyzing the TS file to obtain each PES packet corresponding to the TS file may include:
analyzing the TS file to obtain a first TS data packet in the TS file;
and obtaining each PES packet corresponding to the TS file according to the first TS packets meeting the first preset condition, wherein the first preset condition is that a first designated identification bit of the TS packet is a first set value.
Specifically, one TS file may include a plurality of TS packets, and one TS packet is typically 188 bytes. When parsing the PES packet, it is actually parsing the body data in the TS packet therein, and one PES packet may include at least one TS packet. Before analyzing the TS packet, it may be determined whether the TS packet is a packet that meets the analysis condition, and if the TS packet is a packet that meets the analysis condition, the TS packet is analyzed, and if the TS packet is not a packet that meets the analysis condition, the TS packet is not analyzed.
As an example, suppose the first set value is 1 and the first designated identification bit is the first 3 or 4 bytes of the body data. When the body data in the TS packet is parsed, the first 3 or 4 bytes are read first; if those bytes equal 1, the first TS packet meets the parsing condition and can be parsed; otherwise, it does not meet the parsing condition and is not parsed.
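A minimal sketch of this check, assuming the body data is available as a byte array and that "the first 3 or 4 bytes are 1" refers to the start-code prefixes 0x000001 / 0x00000001 (the function name is illustrative):

// Returns true when the payload begins with a 3-byte (0x000001) or
// 4-byte (0x00000001) start code, i.e. the "first set value is 1" case above.
function meetsFirstCondition(body: Uint8Array): boolean {
  if (body.length >= 3 && body[0] === 0 && body[1] === 0 && body[2] === 1) {
    return true; // 3-byte start code
  }
  if (body.length >= 4 && body[0] === 0 && body[1] === 0 && body[2] === 0 && body[3] === 1) {
    return true; // 4-byte start code
  }
  return false; // does not meet the parsing condition; skip this packet
}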
In an embodiment of the present disclosure, the method may further include:
determining the data type of the PES data packet based on the analysis result of the PES data packet, wherein the data type is video data or audio data;
if the data type is video data, the PES data packet is a video PES data packet;
if the data type is audio data, the PES packet is an audio PES packet.
Specifically, one PES packet corresponds to the content of one video frame, and that content may include video data and audio data, so the data in a PES packet may include audio data and video data. When the TS file is parsed, the PES packets can therefore be obtained separately for the audio data and the video data, and a PES packet may be an audio PES packet or a video PES packet.
As an example, one data type identification bit obtained by parsing a PES packet is the streamId, from which it can be determined whether the data in the PES packet is audio data or video data. In this example, when the streamId is between 0xc0 and 0xdf (hexadecimal), the PES packet is an audio PES packet, and when the streamId is between 0xe0 and 0xef, it is a video PES packet.
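A sketch of this classification; the ranges follow the example above (and coincide with the standard MPEG stream_id assignments), while the function name is an assumption:

// Classifies a PES packet by its streamId, following the ranges given above:
// 0xC0-0xDF -> audio, 0xE0-0xEF -> video.
function classifyByStreamId(streamId: number): 'audio' | 'video' | 'other' {
  if (streamId >= 0xc0 && streamId <= 0xdf) return 'audio';
  if (streamId >= 0xe0 && streamId <= 0xef) return 'video';
  return 'other'; // e.g. program stream map, padding, private streams
}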
In the embodiment of the present disclosure, the analyzing each PES packet to obtain an ES packet of each PES packet includes:
respectively analyzing each video PES data packet to obtain a video ES data packet corresponding to each video PES data packet;
respectively analyzing each audio PES data packet to obtain an audio ES data packet corresponding to each audio PES data packet;
and determining the ES data packet corresponding to each PES data packet based on the video ES data packet and the audio ES data packet corresponding to each PES data packet.
Specifically, if the PES packet is a video PES packet, the PES packet corresponds to the video PES packet, and a video ES packet corresponding to the video PES packet may be obtained through parsing, and if the PES packet is an audio PES packet, the PES packet corresponds to the audio PES packet, and an audio ES packet corresponding to the audio PES packet may be obtained through parsing, and then the ES packet corresponding to the PES packet may be obtained based on the video ES packet and the audio ES packet.
In the embodiment of the present disclosure, the parsing each video PES packet to obtain a video ES packet corresponding to each video PES packet includes:
respectively analyzing the second TS data packet corresponding to each video PES data packet to obtain a video ES data packet corresponding to each video PES data packet;
the second TS packet is a TS packet meeting a second preset condition, and the second preset condition is that a second designated identification bit of the TS packet is a second set value.
Specifically, if the PES packet is a video PES packet, it is determined whether the TS packets corresponding to the video PES packet include a second TS packet meeting the second preset condition. If so, the corresponding video ES packet can be parsed from the video PES packet; otherwise, the corresponding video ES packet is not parsed from the video PES packet.
For example, if the second designated identification bit is the first 3 or 4 bytes of the TS packet and the second set value is 1, the first 3 or 4 bytes of the body data corresponding to the video PES packet are parsed. If they equal 1, the video PES packet meets the second preset condition, the second TS packet meeting the condition is parsed, and the video ES packet corresponding to the video PES packet is obtained; otherwise, the video ES packet cannot be parsed from the video PES packet.
In the embodiment of the present disclosure, the analyzing each audio PES packet to obtain an audio ES packet corresponding to each audio PES packet includes:
respectively analyzing the third TS data packet corresponding to each audio PES data packet to obtain an audio ES data packet corresponding to each audio PES data packet;
the third TS data packet is a TS data packet meeting a third preset condition, and the third preset condition is that a third designated identification bit of the TS data packet is a third set value.
Specifically, if the PES packet is an audio PES packet, it is determined whether the TS packets corresponding to the audio PES packet include a third TS packet meeting the third preset condition. If so, the corresponding audio ES packet can be parsed from the audio PES packet; otherwise, the corresponding audio ES packet cannot be parsed from the audio PES packet.
As an example, suppose the third designated identification bit is the aac flag in the TS packet and the third set value is 0xff (hexadecimal). For an audio PES packet, 2 bytes of the corresponding body data are read first and shifted right by 4 bits; if the result matches, the audio PES packet meets the third preset condition, the third TS packet meeting the condition is parsed, and the audio ES packet corresponding to the audio PES packet can be obtained; otherwise, the audio ES packet cannot be parsed from the audio PES packet.
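The check described here appears to correspond to the standard ADTS syncword test (twelve consecutive 1-bits at the start of the header); the following sketch assumes that interpretation, with an illustrative function name:

// Checks the third preset condition for an audio PES payload: the first two
// bytes, read as one value and shifted right by 4 bits, should form the ADTS
// syncword. This mirrors the "read 2 bytes, shift right by 4 bits" check above.
function hasAdtsSync(body: Uint8Array): boolean {
  if (body.length < 2) return false;
  const firstTwoBytes = (body[0] << 8) | body[1];
  return (firstTwoBytes >>> 4) === 0xfff; // 12 consecutive 1-bits = ADTS sync
}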
In this example of the present disclosure, while parsing the audio PES packet, other related parameters, such as the channel and the frequency, may also be parsed and used as part of the audio and video parameters.
In the embodiment of the present disclosure, in step S110, acquiring a transport stream TS file to be processed may include:
acquiring an m3u8 file to be processed;
analyzing the m3u8 file to be processed, and determining the playing address information of each TS file corresponding to the m3u8 file to be processed according to the analysis result;
and acquiring the TS file to be processed according to the playing address information, wherein the TS file to be processed is a file in each TS file corresponding to the m3u8 file to be processed.
Specifically, the m3u8 file is a plain text file; it may originate from a network, and it may be a live file or an on-demand file. After the m3u8 file to be processed is obtained, it may be parsed to obtain the corresponding TS files, of which there may be a plurality, and the m3u8 file may be parsed by means of regular expressions.
The m3u8 file may also carry an index that identifies each TS file and its corresponding play address information, each TS file corresponding to one piece of play address information, so that when the m3u8 file to be processed is parsed to obtain the corresponding TS files, the TS file to be processed may be obtained based on the index. The play address information is the online play address corresponding to the TS file; based on it, the corresponding TS file can be played. Before a TS file is played, format conversion needs to be performed on it so that its format conforms to the playing format requirement of the player.
The TS file to be processed may be any one or several of the TS files corresponding to the m3u8 file to be processed, or may be one of the TS files determined based on the play request of the user. The play request may be a link request sent by the user to the server through the terminal, indicating that the user wants to play a certain m3u8 file, and the play request may include an identifier of the m3u8 file, based on which the server knows which m3u8 file the user wants to play.
If the m3u8 file includes a plurality of TS files and the playing address information corresponding to each TS file may further include a playing sequence, the plurality of TS files may be played based on the address information corresponding to each TS file according to the playing sequence.
In the embodiment of the disclosure, the m3u8 file carries an identifier of a file type, and the file type is an on-demand file or a live file;
if the file type is a live file, acquiring the m3u8 file to be processed as an m3u8 file acquired in real time.
Specifically, the m3u8 file carries an identifier of its file type, which may be a certain field in the file. From this field, the file type of the m3u8 file can be determined, i.e., whether it is an on-demand file or a live file. The file type of the TS files obtained by parsing is consistent with that of the m3u8 file: if the m3u8 file is an on-demand file, the parsed TS files are on-demand files; if it is a live file, the parsed TS files are live files.
If the m3u8 file is an on-demand file, its content does not change. If it is a live file, its content changes continuously, i.e., new content is continuously appended, so the m3u8 file needs to be obtained in real time and parsed in real time to obtain the corresponding TS files. Accordingly, for an on-demand file the number of TS files obtained by parsing the m3u8 file is fixed, while for a live file it changes in real time.
When the m3u8 file is parsed, not only the file type of the m3u8 file but also the time length of each parsed TS file can be obtained.
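A sketch of the regular-expression based parsing and type detection described above. The #EXTINF and #EXT-X-ENDLIST tags are standard HLS playlist tags; the assumption that the live/on-demand identifier corresponds to the absence or presence of #EXT-X-ENDLIST, and all names below, are illustrative:

// Collects the segment play addresses, their #EXTINF durations, and whether
// the playlist is ended (treated as on-demand) or still growing (treated as live).
interface ParsedPlaylist {
  segments: { uri: string; duration: number }[];
  isLive: boolean; // no #EXT-X-ENDLIST tag -> treated as a live file
}

function parseM3u8(text: string): ParsedPlaylist {
  const segments: { uri: string; duration: number }[] = [];
  // "#EXTINF:<duration>,..." followed on the next line by the segment URI
  const segmentRe = /#EXTINF:([\d.]+)[^\n]*\n([^#\s][^\n]*)/g;
  let match: RegExpExecArray | null = null;
  while ((match = segmentRe.exec(text)) !== null) {
    segments.push({ uri: match[2].trim(), duration: parseFloat(match[1]) });
  }
  return { segments, isLive: !text.includes('#EXT-X-ENDLIST') };
}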
In the embodiment of the disclosure, the audio and video parameters comprise a display time parameter PTS, a decoding time parameter DTS, a sequence parameter set SPS and an image parameter set PPS; the method further comprises the following steps:
and based on the display time parameter PTS, the decoding time parameter DTS, the sequence parameter set SPS and the image parameter set PPS, carrying out format conversion on the TS file to be processed.
Specifically, m3u8 is a special video format used by some mobile-device browsers for cached video, and an ordinary player cannot play it directly; transcoding is required before playback. Therefore, the TS file to be processed can be converted into a file in a set format based on the audio and video parameters corresponding to the TS file, where the set format is a playable file format for the player. For example, the fmp4 format can be played directly by an ordinary player on a mobile device, so the m3u8 file can be transcoded into a format compatible with an ordinary player, i.e., the fmp4 format. The converted TS file format is then compatible with the player's playing format, and the file in the set format can be played directly by the player.
In an embodiment of the present disclosure, after performing format conversion on the TS file to be processed, the method may further include:
acquiring a playing request aiming at an m3u8 file to be processed and a TS file after format conversion;
determining playing address information corresponding to each TS file in m3u8 files to be processed;
and sequentially playing the TS files after format conversion corresponding to the playing address information according to the playing sequence corresponding to the playing address information.
Specifically, after the format conversion is performed on the TS file, the file format of the converted TS file may correspond to the playing format of the player, and the TS file after the corresponding format conversion may be played according to the playing sequence corresponding to each playing address information, so as to implement playing of the TS file, where the played TS file may be any one or several TS files in the m3u8 files.
The following specific example explains the solution of the present disclosure in detail:
step 1: and analyzing the TS file to obtain a TS data packet.
Step 2: and based on the TS data packet, searching to obtain the PAT table.
Specifically, each TS packet has a PID field. Looking up the PAT table is actually finding, among the plurality of TS packets obtained by parsing the TS file, the TS packet whose PID is 0, i.e., a first TS packet that contains the program table.
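A sketch of steps 1 and 2, assuming the standard MPEG-TS packet layout (sync byte 0x47, 13-bit PID spread over bytes 1 and 2); the function names are illustrative:

// Extracts the 13-bit PID from a 188-byte TS packet and uses it to locate the
// packets that carry the PAT (PID 0).
const TS_PACKET_SIZE = 188;

function pidOf(packet: Uint8Array): number {
  return ((packet[1] & 0x1f) << 8) | packet[2];
}

function findPatPackets(tsFile: Uint8Array): Uint8Array[] {
  const patPackets: Uint8Array[] = [];
  for (let offset = 0; offset + TS_PACKET_SIZE <= tsFile.length; offset += TS_PACKET_SIZE) {
    const packet = tsFile.subarray(offset, offset + TS_PACKET_SIZE);
    if (packet[0] !== 0x47) continue;                    // not aligned on a sync byte; skip
    if (pidOf(packet) === 0) patPackets.push(packet);    // PID 0 carries the PAT
  }
  return patPackets;
}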
And step 3: based on the PAT table, a PMT table is obtained through lookup.
Specifically, the first TS packet, whose PID is 0, contains the program table; from it, together with the PID field, it is possible to know which of the TS packets obtained by parsing the TS file carries the PMT table. Based on the PMT table, attribute information of the TS packets can be obtained (the PMT table contains information identifying which TS packets carry audio data and which carry video data). The attribute information includes the data types of the TS packets, namely video data and audio data, and these data types can be represented by different type identifiers, for example a video PID and an audio PID; through the video PID and the audio PID, it can be known which TS packets carry video data and which carry audio data.
And 4, step 4: based on the PMT table, a plurality of TS packets are classified by video and audio.
Specifically, it can be known based on the PMT table which data in the TS packets are audio data and which data are video data, and then the video data in the plurality of TS packets are classified into one type and the audio data are classified into one type.
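A sketch of step 4, assuming the video PID and audio PID have already been obtained from the PMT table (PMT parsing itself is not shown):

// Splits the TS packets into a video group and an audio group by PID.
function classifyTsPackets(
  packets: Uint8Array[],
  videoPid: number,
  audioPid: number,
): { video: Uint8Array[]; audio: Uint8Array[] } {
  const video: Uint8Array[] = [];
  const audio: Uint8Array[] = [];
  for (const packet of packets) {
    const pid = ((packet[1] & 0x1f) << 8) | packet[2];
    if (pid === videoPid) video.push(packet);
    else if (pid === audioPid) audio.push(packet);
    // packets with other PIDs (PAT, PMT, ...) are ignored here
  }
  return { video, audio };
}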
And 5: and aggregating the plurality of TS data packets into a PES data packet based on the classified plurality of TS data packets.
Specifically, one TS packet has only 188 bytes, one frame of image played by the player may correspond to a plurality of such 188-byte packets, and the content of one video frame corresponds to one PES packet, so the data of one TS packet may be only part of the content of one video frame. For this reason, the plurality of TS packets are aggregated into at least one PES packet: the classified video data and audio data are aggregated separately to obtain at least one PES packet, which may be an audio PES packet or a video PES packet.
In the aggregation process, one PES packet may be delimited by the identification bit payload: a PES packet runs from one position where payload is 0 to the next position where payload is 0, i.e., payload 0 indicates the start of a PES packet. The resulting PES packet may be a video PES packet or an audio PES packet.
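A sketch of step 5 under the assumption that the "payload" identification bit mentioned above plays the role of the standard payload_unit_start_indicator (byte 1, bit 6 of a TS packet); if the disclosure intends a different marker, only the startsNewPes test below would change:

// Groups consecutive TS packets of one stream into PES packets.
function aggregateIntoPes(packets: Uint8Array[]): Uint8Array[][] {
  const pesGroups: Uint8Array[][] = [];
  let current: Uint8Array[] | null = null;
  for (const packet of packets) {
    const startsNewPes = (packet[1] & 0x40) !== 0; // payload_unit_start_indicator
    if (startsNewPes) {
      if (current) pesGroups.push(current);        // close the previous PES packet
      current = [];
    }
    if (current) current.push(packet);             // packets before the first start are dropped
  }
  if (current) pesGroups.push(current);
  return pesGroups;
}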
Step 6: analyzing each PES data packet, and determining the data type of each PES data packet;
specifically, the data type of the PES packet is audio data or video data, that is, the PES packet is parsed, and it can be known whether the PES packet is video data (video PES packet) or audio data (audio PES packet), and when the TS packet is classified in step 4, it is already distinguished which is video and which is audio by the PID of video and PID of audio, so that when the PES is parsed, it is still possible to distinguish which is audio data and which is video data in the PES based on the PID of video and PID of audio.
It is understood that after parsing the PES packet, not only the data type of the PES packet may be determined, but also other parameters, such as ESCR, CRC, and packet length of the PES packet, may be obtained.
And 7: based on the PES packet, a display time parameter PTS and a decoding time parameter DTS are calculated.
Specifically, the PTS and DTS are two encoding-related parameters required in an MP4-format (player-format) file: the DTS is the decoding time stamp and the PTS is the presentation time stamp. The decoding time stamp precedes the presentation time stamp; the decoding time stamp tells the decoder when to decode the PES, and the presentation time stamp tells the decoder when to play it.
And 8: determining an ES packet of the PES packets, and storing the ES packet of the PES packets.
Specifically, the data in the ES packets is the real media data, i.e., the image information of the video, and it does not include any information other than that media data; for example, neither the PTS nor the DTS is included in the ES packets. Since one PES packet should contain only one ES packet, the data in the TS packets corresponding to the PES packet is merged to obtain the corresponding ES packet.
And step 9: based on the ES packets, the SPS and the PPS are determined.
Specifically, the SPS and PPS are two parameters that are necessary for decoding MP4 format (player playing format) files, and can be determined based on ES packets, and the specific determination process is as follows:
based on the data corresponding to the TS packet in the ES packet, based on the body (body includes the real data in the TS packet) data in the first TS packet (TS {0}), reading 4 bytes from the first byte of the body data first, determining whether the read byte is 1, if the read byte is 1, continuing to read 1 byte, if the read byte is not equal to 0, storing the current byte in the SPS, obtaining the SPS, if the read byte is equal to 0, reading 3 bytes, if the three bytes are equal to 1, returning to 4 bytes; the SPS obtained in the above process is a number between 0 and 1 in the ES data, and the number outside 0 to 1 is the PPS.
In other words, the above procedure for calculating the SPS searches the ES data for the values that start with 001 or 0001: if the value read is 1, the subsequent bytes are read until the next 001 or 0001, otherwise the procedure exits. If there is a non-zero value between the first 001 (or 0001) and the next 001 (or 0001), the SPS is obtained; the data beyond the next 001 (or 0001) yields the PPS.
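A conventional realization of the search described above is to scan the ES data for 0x000001 / 0x00000001 start codes and inspect the NAL unit type that follows (in H.264, type 7 is an SPS and type 8 a PPS). The sketch below follows that standard approach rather than reproducing the byte-by-byte procedure of the disclosure:

// Scans the ES bytes for NAL start codes and collects SPS and PPS units.
function extractSpsPps(es: Uint8Array): { sps: Uint8Array[]; pps: Uint8Array[] } {
  const nalStarts: { prefix: number; payload: number }[] = [];
  for (let i = 0; i + 2 < es.length; i++) {
    if (es[i] === 0 && es[i + 1] === 0 && es[i + 2] === 1) {
      nalStarts.push({ prefix: i, payload: i + 3 });       // 3-byte start code
      i += 2;
    } else if (i + 3 < es.length && es[i] === 0 && es[i + 1] === 0 && es[i + 2] === 0 && es[i + 3] === 1) {
      nalStarts.push({ prefix: i, payload: i + 4 });       // 4-byte start code
      i += 3;
    }
  }
  const sps: Uint8Array[] = [];
  const pps: Uint8Array[] = [];
  nalStarts.forEach((nal, n) => {
    const end = n + 1 < nalStarts.length ? nalStarts[n + 1].prefix : es.length;
    const nalType = es[nal.payload] & 0x1f;                // low 5 bits of the first NAL byte
    if (nalType === 7) sps.push(es.subarray(nal.payload, end)); // SPS
    if (nalType === 8) pps.push(es.subarray(nal.payload, end)); // PPS
  });
  return { sps, pps };
}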
Step 10: and transcoding the TS file into a file in a playing format of the player based on the calculated PTS, DTS, SPS and PPS parameters, for example, if the playing format of the player is mp4, transcoding the TS file into a file in an mp4 format.
In this example, in step 6, the specific process of parsing one PES packet is as follows:
step A: the first TS packet of the PES packets is extracted.
Specifically, when a plurality of TS packets are aggregated into PES packets, a plurality of PES packets may be obtained, and one PES packet corresponds to a plurality of TS packets. Parsing then starts, based on the body data in the PES packet, from the first TS packet (ts{0}) corresponding to that PES packet, where ts{0} corresponds to the data at which payload equals 0; within one PES packet, the data in the TS packets from the first payload = 0 to the next payload = 0 corresponds to that PES packet.
And B: based on the body data in the PES packet, it is determined whether the first TS packet is a normal packet.
Specifically, based on the body data in the PES packet, 3 bytes (001) or 4 bytes (0001) are read first. If those bytes are not 1, the TS packet is an abnormal packet and is not parsed; if they are 1, the TS packet is a normal packet and is parsed.
And C: when the first TS data packet is a normal data packet, the data type of the current PES data packet is determined.
Specifically, based on the body data in the PES packet, 3 bytes (001) or 4 bytes (0001) are read first. When those bytes are 1, one more byte is read; the identification bit corresponding to that byte is the streamId, from which the data type of the current PES packet can be determined, the data type being video data or audio data. Specifically, one way of determining whether the current PES packet is video data or audio data based on the streamId is: a streamId between 0xc0 and 0xdf (hexadecimal) indicates audio data, and a streamId between 0xe0 and 0xef indicates video data.
In the embodiment of the present disclosure, while the body data in the PES packet is read further, the body data also contains a parameter identifier for the PTS and a parameter identifier for the DTS; the identifier corresponding to the PTS is a first identifier and the identifier corresponding to the DTS is a second identifier. If the first identifier corresponding to the PTS reads 1, the PTS can be obtained; if it reads 0, reading for the PTS does not continue. The DTS identifier follows the same principle: if the second identifier corresponding to the DTS is 1, the DTS can be obtained; if it is 0, reading for the DTS does not continue. While reading the body data, information corresponding to other identification bits, such as the packet length of the PES packet, may also be obtained.
In the embodiment of the present disclosure, in step 7, the calculating to obtain PTS and DTS based on the PES packet may specifically include:
calculating to obtain PTS and DTS based on a flag ptsdtsflag in the PES data packet, wherein the ptsdtsflag is 2 or 3;
when ptsdtsflag is 2, based on the body data in the PES packet, firstly reading 1 byte from the body data corresponding to ts {0}, shifting right by one bit, performing an and with 0x07 to obtain PTS0, then reading 2 bytes, shifting right by one bit to obtain PTS1, then reading 2 bytes, shifting right by one bit to obtain PTS2, then PTS ═ PTS0 shifts left by 30 bits, PTS1 shifts left by 15 bits, PTS2, and at this time, PTS corresponds to a numerical value;
if the PES packet is audio data, PTS is DTS.
When ptsdtsflag is 3, the PTS is calculated in the same way as when ptsdtsflag is 2, and the DTS is calculated in the same way as the PTS, except that the bytes read have a different meaning.
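A sketch of the PTS assembly described above. Because JavaScript bitwise operators are limited to 32 bits, the 33-bit combination (PTS0 << 30) | (PTS1 << 15) | PTS2 is computed arithmetically; view and offset are assumptions about how the surrounding parser exposes the 5-byte timestamp field:

// Assembles a 33-bit timestamp from the 5-byte field described above.
function readTimestamp(view: DataView, offset: number): number {
  const pts0 = (view.getUint8(offset) >> 1) & 0x07;   // top 3 bits
  const pts1 = view.getUint16(offset + 1) >> 1;       // middle 15 bits (big-endian read)
  const pts2 = view.getUint16(offset + 3) >> 1;       // low 15 bits (big-endian read)
  return pts0 * 2 ** 30 + pts1 * 2 ** 15 + pts2;      // (pts0 << 30) | (pts1 << 15) | pts2
}

When ptsdtsflag is 3, the same routine can typically be applied a second time to the bytes that follow to obtain the DTS; for audio, as stated above, the DTS is taken equal to the PTS.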
In an embodiment of the present disclosure, in step 8, determining an ES packet in the PES packets, and storing the ES packet in the PES packets may include:
parsing, based on the data type of the PES, the ES packets in the PES packets:
if the ES data packet is video data, based on the body data in the ES data packet, first read 3 bytes 001, or 4 bytes 0001, if the 3 bytes or 4 bytes are not 1, it indicates that the TS data packet is an abnormal data packet, and do not perform parsing, and if the 3 bytes or 4 bytes are 1, it indicates that the TS data packet is a normal data packet, and may perform parsing. After the TS packet is a normal packet, reading an identifier NAL (network abstraction layer), and calculating to obtain SPS and PPS based on the ES packet. The code stream format of h264 includes a byte stream format, and the byte stream format is a format specified in an h264 official protocol document. Can be the default output format of most encoders. The basic data unit of the byte stream format is NAL unit, i.e. NALU. To extract NALUs from the byte stream, the protocol provides that each NALU is preceded by a start code: 0x000001 or 0x00000001(0x represents hexadecimal).
The ES data in the TS packets is then combined based on the SPS, PPS, PTS and DTS, i.e., the ES data carried by the TS packets within one PES packet is grouped in time order to obtain the ES packet corresponding to that PES packet.
If the ES packet is audio data, it is checked whether the identification bit aac is normal. Specifically: read 2 bytes, shift them right by 4 bits, and judge whether the result matches; if so, the identification bit aac is normal, otherwise it is abnormal. If the identification bit aac is normal, an ID is parsed from the ES packet, and based on the ID the MPEG type is determined, the MPEG type being mpeg-2 or mpeg-4; parameters such as the channel, the frequency, and the audio decoding configuration are also parsed.
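A sketch of reading the ID, channel and frequency fields mentioned above, assuming a standard ADTS header layout (bit positions per the ADTS specification); the sample-rate table and names are conventional, not taken from the disclosure:

// Reads the MPEG version, channel configuration and sampling frequency
// from the first four bytes of an ADTS header.
const ADTS_SAMPLE_RATES = [
  96000, 88200, 64000, 48000, 44100, 32000,
  24000, 22050, 16000, 12000, 11025, 8000, 7350,
];

function parseAdtsHeader(header: Uint8Array): {
  mpegType: 'mpeg-2' | 'mpeg-4';
  channels: number;
  sampleRate: number;
} {
  const id = (header[1] >> 3) & 0x01;                       // 1 = MPEG-2, 0 = MPEG-4
  const frequencyIndex = (header[2] >> 2) & 0x0f;           // index into the rate table
  const channels = ((header[2] & 0x01) << 2) | ((header[3] >> 6) & 0x03);
  return {
    mpegType: id === 1 ? 'mpeg-2' : 'mpeg-4',
    channels,
    sampleRate: ADTS_SAMPLE_RATES[frequencyIndex],
  };
}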
Based on parameters such as the channel, the frequency and the audio decoding configuration, all the ES data is merged, that is, the ES data in each TS packet is merged; specifically, the ES data corresponding to the TS packets within one PES packet is merged in time order to obtain the ES packet corresponding to that PES packet. The audio decoding configuration is computed from the channel and the frequency.
And finally, storing the ES data packet obtained by combination.
Based on the same principle as the video data processing method shown in fig. 1, an embodiment of the present disclosure also provides a video data processing apparatus 20, as shown in fig. 2, where the apparatus 20 may include: a TS file acquisition module 210, a PES packet determination module 220, an ES packet determination module 230, an ES packet parsing module 240, and an audio/video parameter determination module 250, wherein,
a TS file obtaining module 210, configured to obtain a transport stream TS file to be processed;
a PES packet determining module 220, configured to parse the TS file to obtain the packetized elementary stream (PES) packets corresponding to the TS file;
an ES packet determining module 230, configured to analyze each PES packet to obtain an ES packet included in each PES packet;
the ES data packet analyzing module 240 is configured to analyze each ES data packet to obtain an audio/video parameter of each ES data packet;
and an audio/video parameter determining module 250, configured to obtain audio/video parameters of the TS file based on the audio/video parameters of each ES data packet.
The video data processing apparatus of the embodiments of the present disclosure can, after parsing out the plurality of ES packets in the TS file, parse the corresponding audio and video parameters for each ES packet individually, which avoids the missed packets that can occur when all ES packets are parsed simultaneously, so that the audio and video parameters of the TS file obtained from parsing each ES packet are more accurate.
Optionally, when the PES packet determining module analyzes the TS file to obtain each PES packet corresponding to the TS file, the PES packet determining module is specifically configured to:
analyzing the TS file to obtain a first TS data packet in the TS file;
and obtaining each PES packet corresponding to the TS file according to the first TS packets meeting the first preset condition, wherein the first preset condition is that a first designated identification bit of the TS packet is a first set value.
Optionally, the apparatus further comprises:
the data type determining module is used for determining the data type of the PES data packet based on the analysis result of the PES data packet, wherein the data type is video data or audio data;
if the data type is video data, the PES data packet is a video PES data packet;
if the data type is audio data, the PES packet is an audio PES packet.
Optionally, the ES packet determining module, when analyzing each PES packet to obtain an ES packet of each PES packet, is specifically configured to:
respectively analyzing each video PES data packet to obtain a video ES data packet corresponding to each video PES data packet;
respectively analyzing each audio PES data packet to obtain an audio ES data packet corresponding to each audio PES data packet;
and determining the ES data packet corresponding to each PES data packet based on the video ES data packet and the audio ES data packet corresponding to each PES data packet.
Optionally, the ES packet determining module, when analyzing each video PES packet to obtain a video ES packet corresponding to each video PES packet, is specifically configured to:
respectively analyzing the second TS data packet corresponding to each video PES data packet to obtain a video ES data packet corresponding to each video PES data packet;
the second TS packet is a TS packet meeting a second preset condition, and the second preset condition is that a second designated identification bit of the TS packet is a second set value.
Optionally, when the ES packet determining module analyzes each audio PES packet to obtain an audio ES packet corresponding to each audio PES packet, the ES packet determining module is specifically configured to:
respectively analyzing the third TS data packet corresponding to each audio PES data packet to obtain an audio ES data packet corresponding to each audio PES data packet;
the third TS data packet is a TS data packet meeting a third preset condition, and the third preset condition is that a third designated identification bit of the TS data packet is a third set value.
Optionally, the audio-video parameters include a display time parameter PTS, a decoding time parameter DTS, a sequence parameter set SPS, and a picture parameter set PPS; the device also includes:
and the format conversion module is used for converting the format of the TS file to be processed based on the audio and video parameters.
The apparatus of the present disclosure may execute the video data processing method shown in fig. 1, and the implementation principle is similar, the actions executed by the modules in the video data processing apparatus in the embodiments of the present disclosure correspond to the steps in the video data processing method in the embodiments of the present disclosure, and for the detailed functional description of the modules in the video data processing apparatus, reference may be specifically made to the description in the corresponding video data processing method shown in the foregoing, and details are not repeated here.
Based on the same principle as the method in the embodiment of the present disclosure, reference is made to fig. 3, which shows a schematic structural diagram of an electronic device (e.g., a terminal device or a server in fig. 1) 600 suitable for implementing the embodiment of the present disclosure. The terminal device in the embodiments of the present disclosure may include, but is not limited to, a mobile terminal such as a mobile phone, a notebook computer, a digital broadcast receiver, a PDA (personal digital assistant), a PAD (tablet computer), a PMP (portable multimedia player), a vehicle terminal (e.g., a car navigation terminal), and the like, and a stationary terminal such as a digital TV, a desktop computer, and the like. The electronic device shown in fig. 3 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
The electronic device includes: a memory and a processor, wherein the processor may be referred to as the processing device 601 hereinafter, and the memory may include at least one of a Read Only Memory (ROM)602, a Random Access Memory (RAM)603 and a storage device 608 hereinafter, which are specifically shown as follows:
as shown in fig. 3, electronic device 600 may include a processing means (e.g., central processing unit, graphics processor, etc.) 601 that may perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM)602 or a program loaded from a storage means 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data necessary for the operation of the electronic apparatus 600 are also stored. The processing device 601, the ROM 602, and the RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to bus 604.
Generally, the following devices may be connected to the I/O interface 605: input devices 606 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; output devices 607 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage 608 including, for example, tape, hard disk, etc.; and a communication device 609. The communication means 609 may allow the electronic device 600 to communicate with other devices wirelessly or by wire to exchange data. While fig. 3 illustrates an electronic device 600 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a non-transitory computer readable medium, the computer program containing program code for performing the method illustrated by the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication means 609, or may be installed from the storage means 608, or may be installed from the ROM 602. The computer program, when executed by the processing device 601, performs the above-described functions defined in the methods of the embodiments of the present disclosure.
It should be noted that the computer readable medium in the present disclosure can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
In some embodiments, the clients and servers may communicate using any currently known or future developed network protocol, such as HTTP (HyperText Transfer Protocol), and may be interconnected with any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed network.
The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device.
The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: acquire a transport stream TS file to be processed; parse the TS file to obtain each packetized elementary stream PES data packet corresponding to the TS file; parse each PES data packet to obtain the elementary stream ES data packet contained in each PES data packet; parse each ES data packet to obtain audio and video parameters of each ES data packet; and obtain the audio and video parameters of the TS file based on the audio and video parameters of each ES data packet.
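For illustration only, the following Python sketch outlines the flow just described: read the TS file as fixed-size packets, demultiplex them into PES packets, strip the PES headers to obtain ES data, and derive the audio and video parameters. The function arguments parse_ts_to_pes, parse_pes_to_es and parse_es_parameters are hypothetical placeholders rather than names defined by the disclosure, and the sketch assumes a well-formed MPEG-TS file made of 188-byte packets.

```python
TS_PACKET_SIZE = 188  # an MPEG-2 transport stream packet is always 188 bytes

def read_ts_packets(path):
    """Yield raw 188-byte TS packets from the transport stream file to be processed."""
    with open(path, "rb") as f:
        while True:
            packet = f.read(TS_PACKET_SIZE)
            if len(packet) < TS_PACKET_SIZE:
                return
            if packet[0] != 0x47:  # every TS packet begins with the sync byte 0x47
                raise ValueError("lost TS synchronization")
            yield packet

def process_ts_file(path, parse_ts_to_pes, parse_pes_to_es, parse_es_parameters):
    """Mirror the described flow: TS packets -> PES packets -> ES packets -> A/V parameters."""
    pes_packets = parse_ts_to_pes(read_ts_packets(path))
    es_packets = [parse_pes_to_es(pes) for pes in pes_packets]
    per_es_parameters = [parse_es_parameters(es) for es in es_packets]
    # The parameters of the TS file as a whole are derived from the per-ES parameters.
    return per_es_parameters
```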
Computer program code for carrying out operations of the present disclosure may be written in one or more programming languages or any combination thereof, including but not limited to object-oriented programming languages such as Java, Smalltalk and C++, and conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules or units described in the embodiments of the present disclosure may be implemented by software or hardware. The name of a module or unit does not, in some cases, constitute a limitation of the unit itself.
The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
According to one or more embodiments of the present disclosure, [ example one ] there is provided a video data processing method, including:
acquiring a transport stream TS file to be processed;
parsing the TS file to obtain each packetized elementary stream PES data packet corresponding to the TS file;
respectively parsing each PES data packet to obtain the elementary stream ES data packet contained in each PES data packet;
respectively analyzing each ES data packet to obtain audio and video parameters of each ES data packet;
and obtaining the audio and video parameters of the TS file based on the audio and video parameters of each ES data packet.
According to one or more embodiments of the present disclosure, parsing a TS file to obtain PES packets corresponding to the TS file includes:
analyzing the TS file to obtain a first TS data packet in the TS file;
and obtaining each PES data packet corresponding to the TS file according to a first TS data packet meeting a first preset condition, wherein the first preset condition is that a first designated identification bit of the TS data packet is a first set value.
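One plausible reading of the "first designated identification bit" above is the payload_unit_start_indicator flag in the TS packet header, which is set to 1 in the TS packet that carries the start of a new PES packet; this interpretation is an assumption on the editor's part, not something stated by the disclosure. A minimal Python sketch of inspecting the four-byte TS header (field layout per ISO/IEC 13818-1) might look as follows; the same header inspection also yields the PID used in the later embodiments to tell video packets from audio packets.

```python
def parse_ts_header(packet: bytes) -> dict:
    """Decode the 4-byte MPEG-TS packet header (ISO/IEC 13818-1 field layout)."""
    if packet[0] != 0x47:
        raise ValueError("missing 0x47 sync byte")
    payload_unit_start = (packet[1] >> 6) & 0x01      # 1 => a new PES packet starts in this TS packet
    pid = ((packet[1] & 0x1F) << 8) | packet[2]       # 13-bit packet identifier
    adaptation_field_control = (packet[3] >> 4) & 0x03
    continuity_counter = packet[3] & 0x0F
    return {
        "payload_unit_start": payload_unit_start,
        "pid": pid,
        "adaptation_field_control": adaptation_field_control,
        "continuity_counter": continuity_counter,
    }

def payload_offset(packet: bytes):
    """Return the byte offset of the TS payload, or None if the packet carries no payload."""
    afc = (packet[3] >> 4) & 0x03
    if afc == 2:                  # adaptation field only, no payload
        return None
    if afc == 3:                  # adaptation field followed by payload
        return 4 + 1 + packet[4]  # 4-byte header + length byte + adaptation field bytes
    return 4                      # payload starts right after the 4-byte header
```

Grouping consecutive TS packets that share a PID, starting at each packet whose payload_unit_start flag is set, then yields one reassembled PES packet per group.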
In accordance with one or more embodiments of the present disclosure, the method further comprises:
determining the data type of the PES data packet based on the analysis result of the PES data packet, wherein the data type is video data or audio data;
if the data type is video data, the PES data packet is a video PES data packet;
if the data type is audio data, the PES packet is an audio PES packet.
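As one way of realizing the data-type determination described above: in MPEG-2 systems the stream_id byte of the PES header identifies the payload type, with values 0xE0 to 0xEF assigned to video elementary streams and 0xC0 to 0xDF to audio elementary streams. The sketch below applies that standard mapping; in practice the PID-to-stream-type mapping announced in the PMT could be used instead, and the disclosure does not specify which.

```python
def classify_pes(pes: bytes) -> str:
    """Classify a reassembled PES packet as video or audio from its stream_id byte."""
    if pes[:3] != b"\x00\x00\x01":
        raise ValueError("missing PES packet start code prefix")
    stream_id = pes[3]
    if 0xE0 <= stream_id <= 0xEF:
        return "video"   # MPEG video stream numbers 0..15
    if 0xC0 <= stream_id <= 0xDF:
        return "audio"   # MPEG audio stream numbers 0..31
    return "other"       # e.g. padding, private, or program stream map packets
```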
According to one or more embodiments of the present disclosure, parsing each PES packet to obtain an ES packet of each PES packet includes:
respectively analyzing each video PES data packet to obtain a video ES data packet corresponding to each video PES data packet;
respectively analyzing each audio PES data packet to obtain an audio ES data packet corresponding to each audio PES data packet;
and determining the ES data packet corresponding to each PES data packet based on the video ES data packet and the audio ES data packet corresponding to each PES data packet.
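A PES packet wraps the elementary-stream bytes behind a variable-length header, so obtaining the ES data packet amounts to skipping that header. The sketch below assumes the audio/video form of the PES header defined in ISO/IEC 13818-1, in which byte 8 carries PES_header_data_length; it applies equally to the video and audio cases of the two embodiments that follow.

```python
def pes_to_es(pes: bytes) -> bytes:
    """Strip the PES header of an audio/video PES packet and return the contained ES bytes."""
    if pes[:3] != b"\x00\x00\x01":
        raise ValueError("missing PES packet start code prefix")
    # Bytes 0-2: packet_start_code_prefix; byte 3: stream_id; bytes 4-5: PES_packet_length.
    # For audio/video streams, bytes 6-7 hold flag fields and byte 8 holds PES_header_data_length.
    header_data_length = pes[8]
    es_start = 9 + header_data_length
    return pes[es_start:]
```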
According to one or more embodiments of the present disclosure, parsing each video PES packet to obtain a video ES packet corresponding to each video PES packet includes:
respectively analyzing the second TS data packet corresponding to each video PES data packet to obtain a video ES data packet corresponding to each video PES data packet;
the second TS packet is a TS packet meeting a second preset condition, and the second preset condition is that a second designated identification bit of the TS packet is a second set value.
According to one or more embodiments of the present disclosure, parsing each audio PES packet to obtain an audio ES packet corresponding to each audio PES packet includes:
respectively analyzing the third TS data packet corresponding to each audio PES data packet to obtain an audio ES data packet corresponding to each audio PES data packet;
the third TS data packet is a TS data packet meeting a third preset condition, and the third preset condition is that a third designated identification bit of the TS data packet is a third set value.
According to one or more embodiments of the present disclosure, the audio and video parameters include a presentation time stamp PTS, a decoding time stamp DTS, a sequence parameter set SPS, and a picture parameter set PPS; the method further comprises the following steps:
and carrying out format conversion on the TS file to be processed based on the audio and video parameters.
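For illustration, the sketch below shows how two of the parameters named above could be recovered: the 33-bit PTS/DTS values are unpacked from their 5-byte encoding in the PES header, and the SPS/PPS are located as NAL units of type 7 and 8 in the video ES. The second function assumes the video ES is H.264/AVC in Annex-B byte-stream form, which the disclosure does not state, and it uses a naive start-code split that ignores emulation-prevention bytes. Parameters recovered this way are what a subsequent format conversion (for example, remultiplexing the TS content into another container) would typically need.

```python
def decode_timestamp(five_bytes: bytes) -> int:
    """Unpack a 33-bit PTS or DTS from its 5-byte encoding in the PES header."""
    return ((((five_bytes[0] >> 1) & 0x07) << 30)
            | (five_bytes[1] << 22)
            | ((five_bytes[2] >> 1) << 15)
            | (five_bytes[3] << 7)
            | (five_bytes[4] >> 1))

def extract_sps_pps(annexb_es: bytes):
    """Collect SPS (NAL type 7) and PPS (NAL type 8) units from an Annex-B H.264 ES."""
    sps, pps = [], []
    for chunk in annexb_es.split(b"\x00\x00\x01"):  # naive split on 3-byte start codes
        if not chunk:
            continue
        nal_type = chunk[0] & 0x1F
        if nal_type == 7:
            sps.append(chunk)
        elif nal_type == 8:
            pps.append(chunk)
    return sps, pps
```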
According to one or more embodiments of the present disclosure, [ example two ] there is provided an apparatus for processing video data, the apparatus comprising:
the TS file acquisition module is used for acquiring a transport stream TS file to be processed;
a PES data packet determining module, configured to parse the TS file to obtain each packetized elementary stream PES data packet corresponding to the TS file;
an ES packet determining module, configured to analyze each PES packet to obtain an ES packet included in each PES packet;
the ES data packet analysis module is used for respectively analyzing each ES data packet to obtain the audio and video parameters of each ES data packet;
and the audio and video parameter determination module is used for obtaining the audio and video parameters of the TS file based on the audio and video parameters of each ES data packet.
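Purely as an illustration of how the modules listed above might be composed in software, the hypothetical class below wires them together in the order given; all class and method names are invented for this sketch, and the per-module logic is left as stubs (the earlier sketches suggest possible implementations).

```python
class VideoDataProcessingApparatus:
    """Illustrative composition of the modules described for the apparatus."""

    def acquire_ts_file(self, path):              # TS file acquisition module
        with open(path, "rb") as f:
            return f.read()

    def determine_pes_packets(self, ts_bytes):    # PES data packet determining module
        raise NotImplementedError("TS -> PES demultiplexing goes here")

    def determine_es_packets(self, pes_packets):  # ES data packet determining module
        raise NotImplementedError("PES header stripping goes here")

    def analyze_es_packets(self, es_packets):     # ES data packet analysis module
        raise NotImplementedError("PTS/DTS/SPS/PPS extraction goes here")

    def determine_av_parameters(self, path):      # audio/video parameter determination module
        ts_bytes = self.acquire_ts_file(path)
        pes_packets = self.determine_pes_packets(ts_bytes)
        es_packets = self.determine_es_packets(pes_packets)
        return self.analyze_es_packets(es_packets)
```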
According to one or more embodiments of the present disclosure, when the PES packet determining module analyzes the TS file to obtain each PES packet corresponding to the TS file, the PES packet determining module is specifically configured to:
analyzing the TS file to obtain a first TS data packet in the TS file;
and obtaining each PES data packet corresponding to the TS file according to a first TS data packet meeting a first preset condition, wherein the first preset condition is that a first designated identification bit of the TS data packet is a first set value.
According to one or more embodiments of the present disclosure, the apparatus further comprises:
the data type determining module is used for determining the data type of the PES data packet based on the analysis result of the PES data packet, wherein the data type is video data or audio data;
if the data type is video data, the PES data packet is a video PES data packet;
if the data type is audio data, the PES packet is an audio PES packet.
According to one or more embodiments of the present disclosure, the data in the TS data packets corresponding to a PES data packet is data that meets a parsing condition, the parsing condition being that a designated identification bit of the TS data packet is a set value. When parsing each PES data packet to obtain the ES data packet of each PES data packet, the ES data packet determining module is specifically configured to:
respectively analyzing each video PES data packet to obtain a video ES data packet corresponding to each video PES data packet;
respectively analyzing each audio PES data packet to obtain an audio ES data packet corresponding to each audio PES data packet;
and determining the ES data packet corresponding to each PES data packet based on the video ES data packet and the audio ES data packet corresponding to each PES data packet.
According to one or more embodiments of the present disclosure, when the ES packet determining module respectively parses each video PES packet to obtain a video ES packet corresponding to each video PES packet, the ES packet determining module is specifically configured to:
respectively analyzing the second TS data packet corresponding to each video PES data packet to obtain a video ES data packet corresponding to each video PES data packet;
the second TS packet is a TS packet meeting a second preset condition, and the second preset condition is that a second designated identification bit of the TS packet is a second set value.
According to one or more embodiments of the present disclosure, when the ES data packet determining module parses each audio PES data packet to obtain the audio ES data packet corresponding to each audio PES data packet, it is specifically configured to:
respectively analyze the third TS data packet corresponding to each audio PES data packet to obtain an audio ES data packet corresponding to each audio PES data packet;
the third TS data packet is a TS data packet meeting a third preset condition, and the third preset condition is that a third designated identification bit of the TS data packet is a third set value.
According to one or more embodiments of the present disclosure, the audio and video parameters include a presentation time stamp PTS, a decoding time stamp DTS, a sequence parameter set SPS, and a picture parameter set PPS; the apparatus further includes:
and the format conversion module is used for converting the format of the TS file to be processed based on the audio and video parameters.
The foregoing description is merely a description of preferred embodiments of the present disclosure and of the technical principles employed. Those skilled in the art will appreciate that the scope of the disclosure is not limited to technical solutions formed by the particular combinations of the features described above, but also covers other technical solutions formed by any combination of the above features or their equivalents without departing from the concept of the disclosure, for example, technical solutions formed by interchanging the above features with (but not limited to) features having similar functions disclosed in the present disclosure.
Further, while operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order. Under certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are included in the above discussion, these should not be construed as limitations on the scope of the disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.

Claims (10)

1. A method for processing video data, comprising:
acquiring a transport stream TS file to be processed;
parsing the TS file to obtain each packetized elementary stream (PES) data packet corresponding to the TS file;
respectively parsing each PES data packet to obtain an elementary stream (ES) data packet contained in each PES data packet;
respectively analyzing each ES data packet to obtain audio and video parameters of each ES data packet;
and obtaining the audio and video parameters of the TS file based on the audio and video parameters of each ES data packet.
2. The method according to claim 1, wherein the parsing the TS file to obtain each PES packet corresponding to the TS file comprises:
analyzing the TS file to obtain a first TS data packet in the TS file;
and obtaining each PES data packet corresponding to the TS file according to a first TS data packet meeting a first preset condition, wherein the first preset condition is that a first designated identification bit of the TS data packet is a first set value.
3. The method according to claim 1 or 2, characterized in that the method further comprises:
determining the data type of the PES data packet based on the analysis result of the PES data packet, wherein the data type is video data or audio data;
if the data type is the video data, the PES data packet is a video PES data packet;
if the data type is the audio data, the PES data packet is an audio PES data packet.
4. The method of claim 3, wherein the parsing each of the PES packets to obtain ES packets for each of the PES packets comprises:
respectively analyzing each video PES data packet to obtain a video ES data packet corresponding to each video PES data packet;
respectively analyzing each audio PES data packet to obtain an audio ES data packet corresponding to each audio PES data packet;
and determining an ES data packet corresponding to each PES data packet based on the video ES data packet and the audio ES data packet corresponding to each PES data packet.
5. The method of claim 4, wherein the parsing each video PES packet to obtain a video ES packet corresponding to each video PES packet comprises:
respectively analyzing a second TS data packet corresponding to each video PES data packet to obtain a video ES data packet corresponding to each video PES data packet;
the second TS data packet is a TS data packet meeting a second preset condition, where the second preset condition is that a second designated identification bit of the TS data packet is a second set value.
6. The method of claim 4, wherein the parsing each audio PES packet to obtain an audio ES packet corresponding to each audio PES packet comprises:
respectively analyzing the third TS data packet corresponding to each audio PES data packet to obtain an audio ES data packet corresponding to each audio PES data packet;
the third TS data packet is a TS data packet meeting a third preset condition, where the third preset condition is that a third designated identification bit of the TS data packet is a third set value.
7. The method according to any one of claims 1 to 6, wherein the audio and video parameters comprise a presentation time stamp PTS, a decoding time stamp DTS, a sequence parameter set SPS and a picture parameter set PPS; the method further comprises:
and carrying out format conversion on the TS file to be processed based on the audio and video parameters.
8. An apparatus for processing video data, comprising:
the TS file acquisition module is used for acquiring a transport stream TS file to be processed;
a PES data packet determining module, configured to parse the TS file to obtain each packetized elementary stream (PES) data packet corresponding to the TS file;
an ES packet determining module, configured to analyze each PES packet to obtain an ES packet included in each PES packet;
the ES data packet analysis module is used for respectively analyzing each ES data packet to obtain the audio and video parameters of each ES data packet;
and the audio and video parameter determination module is used for obtaining the audio and video parameters of the TS file based on the audio and video parameters of each ES data packet.
9. An electronic device, comprising:
a processor and a memory;
the memory is used for storing computer operation instructions;
the processor is used for executing the method of any one of claims 1 to 7 by calling the computer operation instruction.
10. A computer readable medium storing at least one instruction, at least one program, a set of codes, or a set of instructions, which is loaded and executed by a processor to implement the method of any one of claims 1 to 7.
CN201911122016.9A 2019-11-15 2019-11-15 Video data processing method and device, electronic equipment and computer readable medium Active CN110753259B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201911122016.9A CN110753259B (en) 2019-11-15 2019-11-15 Video data processing method and device, electronic equipment and computer readable medium
PCT/CN2020/125298 WO2021093608A1 (en) 2019-11-15 2020-10-30 Method and apparatus for video data processing, electronic device, and computer-readable medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911122016.9A CN110753259B (en) 2019-11-15 2019-11-15 Video data processing method and device, electronic equipment and computer readable medium

Publications (2)

Publication Number Publication Date
CN110753259A true CN110753259A (en) 2020-02-04
CN110753259B CN110753259B (en) 2022-01-25

Family

ID=69283476

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911122016.9A Active CN110753259B (en) 2019-11-15 2019-11-15 Video data processing method and device, electronic equipment and computer readable medium

Country Status (2)

Country Link
CN (1) CN110753259B (en)
WO (1) WO2021093608A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115412741A (en) * 2022-08-31 2022-11-29 北京奇艺世纪科技有限公司 Data packaging method, data analyzing method, data packaging device, data analyzing device, electronic equipment and readable storage medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100640467B1 (en) * 2005-01-18 2006-10-31 삼성전자주식회사 IP Streaming Apparatus Capable of Smoothness
CN109076256A (en) * 2016-04-12 2018-12-21 索尼公司 Sending device, sending method, receiving device and method of reseptance
CN106790044B (en) * 2016-12-19 2020-04-17 北京数码视讯科技股份有限公司 Method and device for converting TS (transport stream) code stream into RTP (real-time transport protocol) code stream
CN110753259B (en) * 2019-11-15 2022-01-25 北京字节跳动网络技术有限公司 Video data processing method and device, electronic equipment and computer readable medium

Patent Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050259946A1 (en) * 1998-03-09 2005-11-24 Sony Corporation Video editing apparatus and video editing method
US20020150126A1 (en) * 2001-04-11 2002-10-17 Kovacevic Branko D. System for frame based audio synchronization and method thereof
US20040240859A1 (en) * 2002-06-20 2004-12-02 Takashi Karimoto Decoding device and decoding method
CN1893664A (en) * 2005-06-29 2007-01-10 株式会社东芝 Encoded stream reproducing apparatus
CN1794812A (en) * 2005-12-05 2006-06-28 上海广电(集团)有限公司中央研究院 Method of transmission flow multiplex
CN101984655A (en) * 2010-11-23 2011-03-09 华亚微电子(上海)有限公司 Digital television receiving system and channel changing method
CN102104795A (en) * 2011-03-30 2011-06-22 重庆大学 Method for multiplexing program stream (PS) paths into transport stream (TS) path based on moving picture experts group (MPEG)-2
CN102447949A (en) * 2011-08-24 2012-05-09 上海文广科技(集团)有限公司 High efficient frame-accurate TS stream splicing method
CN103491427A (en) * 2013-09-11 2014-01-01 天脉聚源(北京)传媒科技有限公司 Method and device for processing video
CN103957469A (en) * 2014-05-21 2014-07-30 百视通网络电视技术发展有限责任公司 Internet video on demand method and system based on real-time packaging switching
CN105245942A (en) * 2015-10-12 2016-01-13 成都九十度工业产品设计有限公司 Audio video transport stream (TS) data analytic method and system for emergent broadcast
WO2017134063A1 (en) * 2016-02-01 2017-08-10 Nagravision Sa Embedding watermarking data
CN106997054A (en) * 2017-03-31 2017-08-01 北京臻迪科技股份有限公司 A kind of dispensing device, reception device, data transmission method and Transmission system
CN109640162A (en) * 2018-12-25 2019-04-16 北京数码视讯软件技术发展有限公司 Code stream conversion method and system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
LIANG, Changyin et al.: "Design and Implementation of MPEG-2 TS File and I-Frame Extraction from PCAP", Guangdong Communication Technology (《广东通信技术》) *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021093608A1 (en) * 2019-11-15 2021-05-20 北京字节跳动网络技术有限公司 Method and apparatus for video data processing, electronic device, and computer-readable medium

Also Published As

Publication number Publication date
WO2021093608A1 (en) 2021-05-20
CN110753259B (en) 2022-01-25

Similar Documents

Publication Publication Date Title
US10869106B2 (en) Data transmission method and apparatus, and electronic device
EP3041253B1 (en) Transmission method, receiving method, transmission device, and receiving device
US8935424B2 (en) Method and apparatus for signaling presentation description updates in HTTP streaming
WO2017063399A1 (en) Video playback method and device
US10715571B2 (en) Self-adaptive streaming medium processing method and apparatus
CN110545472B (en) Video data processing method and device, electronic equipment and computer readable medium
US10645432B2 (en) Method and apparatus for transmitting and receiving media information in communication system
US9369508B2 (en) Method for transmitting a scalable HTTP stream for natural reproduction upon the occurrence of expression-switching during HTTP streaming
WO2017092434A1 (en) Method and device for audio/video real-time transmission, method and device for audio/video real-time playback
CN110996160B (en) Video processing method and device, electronic equipment and computer readable storage medium
JP7308134B2 (en) How to send and receive
CN110753259B (en) Video data processing method and device, electronic equipment and computer readable medium
JP6646661B2 (en) Method and apparatus for transmitting and receiving media data
CN113891132A (en) Audio and video synchronization monitoring method and device, electronic equipment and storage medium
US11095699B1 (en) Streaming media file management
CN115623264A (en) Live stream subtitle processing method and device and live stream playing method and device
JP2005123907A (en) Data reconstruction apparatus
US20090245346A1 (en) Method and apparatus for generating and processing packet
CN108810575B (en) Method and device for sending target video
CN110798731A (en) Video data processing method and device, electronic equipment and computer readable medium
US8509598B1 (en) Electronic apparatus and index generation method
CN106031186B (en) Receiving apparatus, receiving method, transmitting apparatus and transmitting method
WO2017092435A1 (en) Method and device for audio/video real-time transmission, transmission stream packing method, and multiplexer
KR101310894B1 (en) Method and apparatus of referencing stream in other SAF session for LASeR service and apparatus for the LASeR service
JP2007074671A (en) Pid value detection circuit, stream data receiver and pid value detecting method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP01 Change in the name or title of a patent holder

Address after: 100041 B-0035, 2 floor, 3 building, 30 Shixing street, Shijingshan District, Beijing.

Patentee after: Douyin Vision Co.,Ltd.

Address before: 100041 B-0035, 2 floor, 3 building, 30 Shixing street, Shijingshan District, Beijing.

Patentee before: Tiktok vision (Beijing) Co.,Ltd.

Address after: 100041 B-0035, 2 floor, 3 building, 30 Shixing street, Shijingshan District, Beijing.

Patentee after: Tiktok vision (Beijing) Co.,Ltd.

Address before: 100041 B-0035, 2 floor, 3 building, 30 Shixing street, Shijingshan District, Beijing.

Patentee before: BEIJING BYTEDANCE NETWORK TECHNOLOGY Co.,Ltd.