WO2012079406A1 - Frame type detection method and apparatus - Google Patents

Frame type detection method and apparatus

Info

Publication number
WO2012079406A1
Authority
WO
WIPO (PCT)
Prior art keywords
frame
current
data amount
threshold
type
Prior art date
Application number
PCT/CN2011/080343
Other languages
English (en)
French (fr)
Inventor
沈秋
谢清鹏
张冬
李厚强
Original Assignee
Huawei Technologies Co., Ltd.
University of Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd. and University of Science and Technology of China
Priority to EP11848791.7A priority Critical patent/EP2637410B1/en
Publication of WO2012079406A1 publication Critical patent/WO2012079406A1/zh
Priority to US13/919,674 priority patent/US9497459B2/en

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51Motion estimation or motion compensation
    • H04N19/577Motion compensation with bidirectional frame interpolation, i.e. using B-pictures
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/85Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression
    • H04N19/89Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression involving methods or arrangements for detection of transmission errors at the decoder

Definitions

  • the present invention relates to the field of video processing technologies, and in particular, to a frame type detection method and apparatus. Background Art
  • the decodable data frame types in the video coding standard can be classified into intra-coded frames (I-Frames, Intra coded frames, I-frames), unidirectional predictive coded frames (P-Frames, Predicted frames, P-frames), and bidirectional predictive coded frames (B-Frames, Bi-directional predicted frames, B-frames).
  • I-Frames Intra coded frames
  • P-Frames unidirectional predictive coded frames
  • B-Frame Bi-directional predicted frames
  • I frames are decodable starting points, commonly referred to as random access points, supporting services such as random access and fast browsing.
  • when errors occur in different frame types, the influence on the subjective quality at the decoding end also differs.
  • the I frame has the function of truncating error propagation.
  • the commonly used streaming technologies are mainly the Internet Streaming Media Alliance (ISMA) mode and the Moving Picture Experts Group-2 Transport Stream over Internet Protocol (MPEG-2 TS over IP) mode; neither of these two protocol modes is designed to indicate the type of the video data when the compressed video data stream is encapsulated.
  • the ISMA mode encapsulates the compressed video data stream directly using the Real-time Transport Protocol (RTP), complying with RFC 3016 (Request For Comments 3016, the RTP payload format for MPEG-4 audio/visual streams) and, for H.264/Advanced Video Coding (AVC), with RFC 3984.
  • RTP Real-time Transport Protocol
  • AVC Advanced Video Coding
  • for streams packetized according to RFC 3984, the sequence number (Sequence Number) and timestamp (Timestamp) included in the RTP header can be used.
  • the MPEG-2 TS over IP mode is further divided into two types: TS over User Datagram Protocol/IP (TS over UDP/IP) and TS over Real-time Transport Protocol/UDP/IP (TS over RTP/UDP/IP). In video transmission the more commonly used TS over RTP/UDP/IP (hereinafter TS over RTP) encapsulates the compressed video data stream into an elementary stream, further divides the elementary stream into TS packets, and finally encapsulates and transmits the TS packets over RTP.
  • RTP is a transmission protocol for multimedia data streams. It is responsible for providing end-to-end real-time data transmission. Its message mainly consists of four parts: RTP header, RTP extension header, payload header, and payload data.
  • the data contained in the RTP header mainly includes: the sequence number, timestamp, flag bit, and so on.
  • the sequence number is in one-to-one correspondence with RTP packets, incrementing by one for each packet sent, and can be used to detect packet loss.
  • the timestamp can indicate the sampling time of the video data. Different frames have different timestamps, which can indicate the playback order of the video data.
  • the flag bit is used to identify the end of a frame. This information is an important basis for frame type judgment.
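As an illustration of the header fields just listed, a minimal RTP fixed-header parser can be sketched as follows (Python used purely for illustration; the patent defines no code, and the 12-byte field layout follows RFC 3550):

```python
import struct

def parse_rtp_header(packet: bytes) -> dict:
    """Parse the fixed 12-byte RTP header (RFC 3550 layout) and return
    the three fields the text relies on: the sequence number (packet-loss
    detection), the timestamp (play order), and the marker/flag bit
    (end-of-frame indication)."""
    if len(packet) < 12:
        raise ValueError("RTP packet shorter than the fixed header")
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    return {
        "version": b0 >> 6,
        "marker": (b1 >> 7) & 0x1,   # 1 marks the last packet of a frame
        "payload_type": b1 & 0x7F,
        "sequence_number": seq,
        "timestamp": ts,
        "ssrc": ssrc,
    }
```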
  • a TS packet has 188 bytes and is composed of a packet header, a variable-length adaptation field, and payload data; the payload unit start indicator (PUSI) in the header indicates whether the payload data includes the start of a packetized elementary stream (PES) packet or program-specific information (PSI).
  • PUSI payload unit start indicator
  • PES Packet Elementary Stream
  • PSI Program Specific Information
  • each PES header indicates the beginning of a NAL unit.
  • the PES packet is composed of a PES packet header and subsequent packet data, and original stream data (video, audio, etc.) is loaded in the PES packet data.
  • the PES packet is inserted into transport stream packets such that the first byte of each PES packet header is the first byte of a transport stream packet payload; that is, a new PES packet header must start a new TS packet, and the PES packet data fills the payload area of the TS transport packets. If the end of the PES packet data cannot be aligned with the end of a TS packet, the adaptation field of the TS packet is used for padding.
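The TS header fields used throughout this description can likewise be extracted with a small sketch (illustrative Python; the 4-byte header layout follows ISO/IEC 13818-1):

```python
def parse_ts_header(packet: bytes) -> dict:
    """Parse the 4-byte header of a 188-byte MPEG-2 TS packet and return
    the fields used in the text: PUSI (start of a PES packet or PSI
    section), the PID, the scrambling-control bits (payload encryption),
    and the continuity counter (packet-loss detection)."""
    if len(packet) != 188 or packet[0] != 0x47:
        raise ValueError("not a valid 188-byte TS packet")
    return {
        "pusi": (packet[1] >> 6) & 0x1,
        "pid": ((packet[1] & 0x1F) << 8) | packet[2],
        "scrambling": (packet[3] >> 6) & 0x3,  # non-zero: payload encrypted
        "cc": packet[3] & 0x0F,
    }
```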
  • the PES priority indicates the importance of the payload in the PES packet data.
  • a value of 1 indicates intra data;
  • the PTS indicates the display time, and
  • the DTS indicates the decoding time; together they can be used to determine the before-and-after correlation of the video payload content and thus the payload type.
  • the code stream may be encrypted during transmission.
  • encryption of a TS packet encrypts the payload portion of the packet. Once the scrambling flag in the TS header is set to 1, the payload is encrypted; at that point only the length of the data packets having the same PID between adjacent PUSIs (the length of one video frame) can be used to determine the payload data type. If the PES header in the TS packet is not encrypted, then in addition to determining the data frame type from the video frame length, the PTS can also be used to assist in determining the frame type.
  • the length of each video frame is obtained by parsing the TS packet, and the frame type is inferred by the length size information.
  • one prior method determines the frame type in the case where the payload portion of the TS packet is encrypted.
  • the method determines the lost state of the packet by parsing the Continuity Counter field of the TS packet, estimates the lost packet state by using the structural information of the previous Group Of Pictures (GOP), and combines the TS packet header adaptive field.
  • the available information in the adaptation field (Random Access Indicator, RAI, or Elementary Stream Priority Indicator, ESPI) is used to determine the type of video frame.
  • by buffering the data of one GOP, the frame with the maximum amount of currently cached data is taken as the I frame.
  • the length of the GOP needs to be predefined; once the GOP length changes, the method becomes invalid.
  • each frame whose data amount is larger than that of the surrounding frames is determined as a P frame.
  • the frame mode included in the GOP structure of the processing target stream is determined; N consecutive frames corresponding to the determined frame pattern are selected as determination target frames in a fixed period, the size relationship between the data amounts of the determination target frames is compared with the determined frame mode, and P frames are identified based on the match between them.
  • the determination frame mode includes all consecutive B frames immediately before a P frame and one B frame immediately after that P frame.
  • Adjustment coefficient: temporary adjustment coefficients are selected in turn within a given range and run through the same processing as the frame type determination, so as to estimate the frame type of each frame in a predetermined learning period; the error rate of each estimate against the actual frame types acquired from the unencrypted stream is calculated, and the temporary adjustment coefficient with the lowest error rate is taken as the real adjustment coefficient.
  • the judgment method is: frames other than the I frames and P frames are determined as B frames.
  • packet loss can be detected based on the RTP sequence number and the TS header continuity counter (CC), and the lost packet state can be estimated from the GOP structure to achieve a certain degree of correction.
  • the GOP information needs to be input in advance, and the method for adjusting the threshold needs to acquire the frame type information from the unencrypted code stream to train the coefficient, which requires excessive manual intervention.
  • the I frame judgment is performed only once per period, the period being the adjustable coefficient; in each period the maximum value is directly taken as the I frame, so only local characteristics are considered and global characteristics are ignored.
  • scaled_max_iframe = scaled_max_iframe * 0.995, where scaled_max_iframe is the previous I frame size.
  • Ithresh = (scaled_max_iframe/4 + av_nbytes*2)/2, where av_nbytes is the sliding average of the current 8 frames.
  • scaled_max_pframe = scaled_max_pframe * 0.995, where scaled_max_pframe is the size of the previous P frame.
  • Detecting I frames: the video has an I frame at regular intervals, an I frame is larger than the average value, and an I frame is larger than a P frame. If the current frame data amount is larger than Ithresh, the frame is considered to be an I frame.
  • Detecting P frames: a B frame is smaller than the average value. If the current frame data amount is larger than Pthresh and smaller than Ithresh, the frame is considered to be a P frame.
  • the other frames are B frames.
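The sliding-threshold comparison method described above can be sketched as follows (an illustrative Python sketch: the seed values, the form of Pthresh, which the text does not give, and the assumption that the first frame is an I frame are all assumptions, not part of the original method):

```python
def classify_frames(frame_sizes):
    """Sketch of the first threshold method above: scaled_max_iframe and
    scaled_max_pframe decay by 0.995 per frame and are reset to the size
    of each newly detected I / P frame; av_nbytes is a sliding mean over
    the last 8 frames."""
    if not frame_sizes:
        return []
    types = ["I"]                            # assumption: stream starts on an I frame
    scaled_max_iframe = frame_sizes[0]
    scaled_max_pframe = frame_sizes[0] / 2   # seed value: an assumption
    window = [frame_sizes[0]]
    for size in frame_sizes[1:]:
        scaled_max_iframe *= 0.995           # attenuation factor from the text
        scaled_max_pframe *= 0.995
        av_nbytes = sum(window) / len(window)
        ithresh = (scaled_max_iframe / 4 + av_nbytes * 2) / 2
        pthresh = (scaled_max_pframe / 4 + av_nbytes * 2) / 2  # assumed symmetric form
        if size > ithresh:
            types.append("I")
            scaled_max_iframe = size
        elif size > pthresh:
            types.append("P")
            scaled_max_pframe = size
        else:
            types.append("B")
        window.append(size)
        if len(window) > 8:
            window.pop(0)
    return types
```

On a toy pattern of one large I frame, small B frames, and a mid-sized P frame, the sketch reproduces the intended split, and it also exhibits the weakness the text criticizes: the fixed 0.995 factor lets the thresholds drift when the GOP changes drastically.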
  • the second method for determining the frame type uses the attenuation factor to control the threshold.
  • the factor directly affects the judgment of the I frame.
  • when the I frame data amount is large, the I frame is easily determined; but when a subsequent I frame is much smaller than the current I frame, it is easily missed.
  • the attenuation factor in this algorithm is fixed at 0.995, which does not account for drastic GOP changes and is therefore inapplicable in many cases.
  • the technical problem to be solved by the embodiments of the present invention is to provide a frame type detection method and apparatus, which improve the correct rate of frame type detection.
  • the method for detecting a frame type provided by the present invention can be implemented by the following technical solutions:
  • if the play time of the current frame is less than the maximum play time of the already received frames, it is determined that the current frame is a bidirectional predictive coded frame (B frame).
  • a method for detecting a frame type including:
  • obtaining the encoding type of the code stream in which the received frame is located, where the encoding type includes: open-loop coding and closed-loop coding;
  • if the data amount of the current frame is greater than a first threshold, determining that the current frame is an obvious I frame, where the first threshold is calculated from the average data amount of a set number of consecutive frames and the I frame data amount; if the previous frame of the current frame is an I frame, the encoding type is closed-loop encoding, and the current frame is a non-obvious I frame, or if the previous frame of the current frame is an I frame, the encoding type is open-loop encoding, and the current frame data amount is greater than a fourth threshold, determining that the current frame is a unidirectional predictive coded frame (P frame); the fourth threshold is the average of the P frame average data amount and the B frame average data amount of one image group;
  • if the current frame is not an I frame or a P frame, it is determined that the current frame is a B frame.
  • a frame type detecting device includes:
  • a time detecting unit configured to detect a playing time of each frame
  • a frame type determining unit configured to determine that the current frame is a bidirectional predictive coding B frame if a play time of the current frame is less than a maximum play time of the already received frame.
  • a frame type detecting device includes:
  • a type obtaining unit configured to obtain an encoding type of a code stream in which the received frame is located, where the encoding type includes: open loop coding and closed loop coding;
  • a frame type determining unit configured to determine, if the data amount of the current frame is greater than a first threshold, that the current frame is an obvious I frame, where the first threshold is calculated from the average data amount of a set number of consecutive frames and the I frame data amount.
  • if the previous frame of the current frame is an I frame, the encoding type is closed-loop encoding, and the current frame is a non-obvious I frame, or if the previous frame of the current frame is an I frame, the encoding type is open-loop encoding, and the current frame data amount is greater than the fourth threshold, the current frame is determined to be a P frame; the fourth threshold is the average of the P frame average data amount and the B frame average data amount of one image group;
  • if the current frame is not an I frame or a P frame, it is determined that the current frame is a B frame.
  • the technical solution provided by the embodiments of the present invention combines the coding order of different types of frames and the relative data amounts before and after them, determines the frame type without decoding the payload, eliminates the influence of the attenuation factor, and improves the correct rate of frame type detection.
  • FIG. 1A is a schematic flowchart of a method according to an embodiment of the present invention;
  • FIG. 1B is a schematic flowchart of a method according to an embodiment of the present invention;
  • FIG. 2a is a schematic structural diagram of hierarchical B frame coding according to an embodiment of the present invention;
  • FIG. 2b is a diagram showing the relationship between the coding order and the play order, and the coding hierarchy, according to an embodiment of the present invention;
  • FIG. 3 is a schematic structural diagram of a packet loss frame according to an embodiment of the present invention.
  • FIG. 4 is a schematic flowchart of a method according to an embodiment of the present invention.
  • FIG. 5 is a schematic structural diagram of a device according to an embodiment of the present invention.
  • FIG. 6 is a schematic structural view of a device according to an embodiment of the present invention.
  • FIG. 7 is a schematic structural view of a device according to an embodiment of the present invention.
  • FIG. 8 is a schematic structural diagram of a device according to an embodiment of the present invention.
  • FIG. 9 is a schematic structural view of a device according to an embodiment of the present invention.
  • FIG. 10 is a schematic diagram of detection results according to an embodiment of the present invention.
  • FIG. 11 is a schematic diagram of detection results according to an embodiment of the present invention.
  • FIG. 13 is a schematic diagram of detection results according to an embodiment of the present invention.
  • FIG. 15 is a schematic diagram of detection results according to an embodiment of the present invention. Detailed Description
  • a method for detecting a frame type includes:
  • the embodiment of the present invention may further determine, according to the play order and coding order of each frame, the level to which each B frame belongs in hierarchical coding. How to determine the level is explained later. Once determined, the level of a B frame can be applied in many fields; for example, when data frames must be discarded, B frames at higher levels can be discarded first. The application of the B frame level determination is not limited in the embodiments of the present invention.
  • the frame type is judged without decoding the payload, the influence of the attenuation factor is eliminated, and the correct rate of the frame type detection is improved.
  • the embodiment of the present invention further provides another method for detecting a frame type.
  • the method includes: 101B: obtaining an encoding type of a code stream in which the received frame is located, where the encoding type includes: open loop coding and closed loop coding;
  • 102B: if the data amount of the current frame is greater than a first threshold, determining that the current frame is an obvious I frame, where the first threshold is calculated from the average data amount of a set number of consecutive frames and the I frame data amount;
  • the above obvious I frame is an I frame: if a frame is judged to be an obvious I frame, the probability of a wrong judgment is very low, but some I frames may be missed; the subsequent methods for judging I frames handle the case where an I frame was missed.
  • if the previous frame of the current frame is an I frame, the encoding type is closed-loop encoding, and the current frame is a non-obvious I frame (the frame type of the current frame is still unclear, but it can be determined that it is not an obvious I frame), or if the previous frame of the current frame is an I frame, the encoding type is open-loop encoding, and the data amount of the current frame is greater than the fourth threshold, determining that the current frame is a P frame; the fourth threshold is the average of the P frame average data amount and the B frame average data amount of one image group;
  • the current frame is not a P frame or a P frame, it is determined that the current frame is a B frame.
  • the method corresponding to FIG. 1B above may be applied independently or in combination with the method of FIG. 1A; if used in combination, this implementation can be used when the play time cannot be detected in FIG. 1A.
  • obtaining the coding type of the code stream in which the received frame is located includes:
  • counting the types of the frames following obvious I frames; if the proportion of P frames reaches a set ratio, the coding type is determined to be closed-loop coding, otherwise open-loop coding.
  • if the data amount of the current frame is greater than a second threshold, it is determined that the current frame is an I frame; the second threshold is calculated from the data amount of the I frame before the current frame.
  • the third threshold is calculated from the average data amount of each frame of the image group in which the current frame is located, the gap between the distance from the previous I frame to the current frame and the expected fixed I frame interval, the data amount of the P frame before the current frame, and the data amount of the I frame of the image group in which the current frame is located; or, the third threshold is calculated from the average data amount of each frame of the image group in which the current frame is located and the gap between the distance from the previous I frame to the current frame and the expected fixed I frame interval.
  • if the playback time cannot be detected, the method embodiment of the present invention further includes: if the previous frame of the current frame is a P frame and the data amount of the current frame is greater than a fifth threshold, or the current image group has a B frame and the data amount of the current frame is greater than a sixth threshold, determining that the current frame is a P frame; the fifth threshold is the product of a first adjustment factor and the P frame average data amount of the image group in which the current frame is located, the first adjustment factor being greater than 0.5 and less than 1; the sixth threshold is the average of the P frame average data amount and the B frame average data amount;
  • if the previous frame of the current frame is a B frame and the data amount of the current frame is less than a seventh threshold, or the current image group has a P frame and the data amount of the current frame is less than an eighth threshold, determining that the current frame is a B frame;
  • the seventh threshold is the product of a second adjustment factor and the B frame average data amount of the image group in which the current frame is located, the second adjustment factor being greater than 1 and less than 1.5;
  • the eighth threshold is the average of the P frame average data amount and the B frame average data amount.
  • optionally, the method further includes: after the frame type determination ends, determining the fixed interval of I frames; if no I frame has been determined after the fixed interval is reached, the frame with the maximum data amount within a set range around the fixed interval is determined as an I frame; and the average data amount of each type of frame in the image group and the I frame interval parameter are updated.
  • optionally, the method further includes: after the frame type determination ends, counting consecutive B frames; if the number of consecutive B frames is greater than a predicted value, the frame with the largest data amount among the consecutive B frames is determined as a P frame, and the average data amount of each type of frame in the image group is updated; the predicted value is greater than or equal to 3 and less than or equal to 7.
  • optionally, the method further includes: determining whether packet loss has occurred in the received frames, and if so, determining the packet loss type; if the packet loss type is intra-frame packet loss, when calculating the frame data amount, the sum of the data amount of the received packets and the lost data amount is taken as the data amount of the frame; if the packet loss type is inter-frame packet loss, it is determined whether the flag bit of the packet before the loss is 1; if yes, the lost data amount is counted into the next frame, otherwise the lost data amount is evenly distributed to the two adjacent frames.
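The packet-loss attribution rule just stated can be sketched as follows (illustrative Python; the function and argument names are mine, not the patent's):

```python
def attribute_lost_bytes(loss_type: str, lost_bytes: int, prev_marker=None):
    """Decide where the bytes of a detected gap are counted.

    Returns (bytes_for_frame_before_gap, bytes_for_frame_after_gap).
    loss_type is "intra" (gap inside one frame) or "inter" (gap at a
    frame boundary); prev_marker is the flag bit of the last packet
    received before the gap."""
    if loss_type == "intra":
        # the whole gap lies inside the frame currently being assembled
        return (float(lost_bytes), 0.0)
    if prev_marker == 1:
        # previous frame ended cleanly: the lost bytes open the next frame
        return (0.0, float(lost_bytes))
    # boundary unclear: split the lost bytes evenly across both frames
    return (lost_bytes / 2.0, lost_bytes / 2.0)
```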
  • Further determining the type of packet loss includes:
  • the packet loss type is inter-frame loss
  • if the flag bit of the packet before the loss cannot be detected, the current data length is divided according to the predicted coding structure and the location of the packet loss.
  • the embodiment of the present invention fully utilizes the header information of RTP or TS over RTP, combines the coding order of different types of frames in the video, and the magnitude of the data volume before and after the different types of frames, and determines the frame type in real time without decoding the video payload. And improve the correct rate of frame type detection by packet loss processing, automatic update parameters, and post-frame type correction.
  • the video stream has header information indicating the playback time of the video data, such as: RTP timestamp in the ISMA mode, and PTS of the PES header in the TS over RTP mode.
  • the relationship between the playback time information and the encoding order is used to determine the type of frames with special structure, such as B frames.
  • for the TS over RTP mode, the TS payload may be fully encrypted so that the PES header cannot be decoded, that is, the PTS is unavailable. Therefore, the embodiment of the present invention also provides a scheme that performs frame type determination using only information such as the data amount, without using the playback time.
  • the I frame has the largest amount of data
  • the P frame is second
  • the B frame is the smallest. If the I frame at the beginning of each GOP can be correctly identified, the P frame and B frame inside the GOP can be judged by the data amount of the frame.
  • the embodiment of the present invention designs a set of dynamically adjustable dynamic parameters to improve the robustness and accuracy of frame type judgment. Especially when judging the I frame, the adjustment judgment criterion and related parameters of the I frame in different application scenarios are fully considered, which greatly reduces the false positive rate of the I frame.
  • packet loss may occur in the input video stream. According to the impact of the loss on the judgment process, it can be divided into two categories. First, packet loss within a frame: the frame boundaries can still be obtained, and the sequence numbers can be used to count the number of packets in one frame. Second, loss of a frame-boundary packet (for example, the RTP packet whose flag bit is 1, or the TS over RTP packet with PUSI set): in this case the boundary between two frames may not be determined, or the data of two adjacent frames may be spliced into one frame, making the frame data statistics inaccurate and affecting the frame type judgment.
  • embodiments of the present invention therefore perform packet loss detection, frame boundary estimation, and partial frame type estimation.
  • frame type correction is added: if, as data accumulates, the output shows an obvious error, internal correction is performed. Although internal correction cannot change frame types that have already been output, adjusting the parameters improves the accuracy of subsequent judgments.
  • because a B frame is encoded after its backward reference frame, its playing time tends to be inconsistent with the encoding order, so B frames can be determined from the playing time information: if the playing time of the current frame is less than the maximum playing time of the already received frames, the frame is definitely a B frame; otherwise it is an I frame or a P frame.
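This play-time rule can be sketched as follows (illustrative Python; the function name and the "I/P" placeholder label are mine):

```python
def split_b_frames(play_times):
    """Walk frames in coding order; any frame whose play time is below
    the running maximum must be a B frame, the rest are I/P candidates."""
    max_seen = None
    labels = []
    for t in play_times:
        if max_seen is not None and t < max_seen:
            labels.append("B")        # played before a frame already received
        else:
            labels.append("I/P")
            max_seen = t
    return labels
```

Applied to the coding order of the hierarchical example below (play numbers 0, 8, 4, 2, 1, 3, 6, 5, 7), the rule marks every frame after the first two as a B frame.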
  • the playback time can also be utilized to further determine the highest level and the level to which each B frame belongs.
  • Fig. 2a is the coding structure diagram of hierarchical B frames in this case; the subscript of each letter in the first row indicates the level to which the frame belongs, and the numbers in the second row are the playback sequence numbers of the frames.
  • the actual coding order is (the number in brackets is the play number): I0/P0(0), I0/P0(8), B1(4), B2(2), B3(1), B3(3), B2(6), B3(5), B3(7).
  • Figure 2b shows the relationship between the encoding order and the playback order, and the level of the encoding.
  • the Arabic numerals indicate the playback sequence number
  • the Chinese numerals indicate the code sequence number.
  • the algorithm for judging grading by playing time can be divided into two steps:
  • Step 1: determine the highest level (3 in this example). Set the level of the 0th frame to 0, then read the play times in coding order. If the play time of the current frame is less than that of the previous frame, the level of the current frame is the level of the previous frame plus 1; otherwise it is the same as the previous frame. Reading continues until the frame whose play time immediately follows that of the 0th frame is reached; the level of that frame is the highest level.
  • Step 2 Determine the level to which the remaining frames belong according to the symmetric relationship of the playing time of adjacent B frames.
  • in Fig. 2b, the levels of the frames in the solid-line boxes have been determined, and the levels of the B frames in the dotted-line boxes are to be detected.
  • the detection method traverses the frames whose levels have been determined, finds two frames such that the average of their play times equals the play time of the current frame, and sets the level of the current frame to the maximum of those two frames' levels plus 1.
  • the ellipses in the figure show the symmetry relationship: the average of the play times of the upper two frames in an ellipse equals the play time of the lowermost frame, and the level of the lowermost frame is exactly the maximum of the two upper frames' levels plus one.
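The two-step level-determination algorithm can be sketched as follows (illustrative Python; it assumes play numbers start at 0 and that exactly one symmetric pair of already-levelled frames exists for each remaining frame, as in a dyadic GOP such as the example of Fig. 2):

```python
def b_frame_levels(play_times):
    """play_times: the play sequence number of each frame in coding order
    (the example of Fig. 2: [0, 8, 4, 2, 1, 3, 6, 5, 7]).
    Returns the hierarchy level of each frame in the same order."""
    levels = {0: 0}          # index in coding order -> level; frame 0 is level 0
    # Step 1: descend until the frame whose play time immediately follows
    # frame 0's is reached; its level is the highest level.
    prev_level = 0
    i = 1
    while i < len(play_times):
        if play_times[i] < play_times[i - 1]:
            prev_level += 1  # play time went backwards: one level deeper
        levels[i] = prev_level
        if play_times[i] == play_times[0] + 1:
            break            # highest level reached
        i += 1
    # Step 2: place the remaining frames using the symmetry of play times.
    for j in range(i + 1, len(play_times)):
        known = sorted(levels)
        for a in known:
            for b in known:
                if a < b and play_times[a] + play_times[b] == 2 * play_times[j]:
                    levels[j] = max(levels[a], levels[b]) + 1
    return [levels[k] for k in range(len(play_times))]
```

On the example, the sketch reproduces the subscripts of Fig. 2a: levels 0, 0, 1, 2, 3, 3, 2, 3, 3.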
  • the present embodiment provides a scheme for judging I frames and P frames using only information such as the data amount. For the case where B frames can be judged from the play time, it is only necessary to distinguish whether each remaining frame is an I frame or a P frame; for the case where B frames cannot be judged from the play time (for example, when the header information is encrypted), the I frames and P frames are determined first, and the remaining frames are determined as B frames.
  • the frame type is determined by the method of automatic parameter update, mainly divided into the following modules (as shown in FIG. 6): an I frame judgment module, a P frame judgment module, a parameter update module, and a type correction module.
  • I frames in video can be divided into two types: fixed-interval I frames, inserted at a fixed interval during compression to satisfy random access (the interval is fixed within a certain period of time but may change, for example, once the user switches channels); and adaptively inserted I frames, inserted at scene switches to improve compression efficiency.
  • the fixed interval can be estimated during the identification process.
  • the judgment condition is actively relaxed or determined using local features (described in detail later).
  • the code rate of an I frame is often larger than that of the preceding P frame. If a scene switch were coded as a P frame, its code rate would become large due to degraded prediction; such a frame is relatively important, and it is easier to judge it as an I frame (the data amounts of the P frame and the I frame are both relatively large, so the P frame is easily misidentified as an I frame). For a scene switch where the spatial complexity is low, the coded I frame may be smaller than the preceding P frame; there is no way to correctly identify an I frame of this type, but the subsequent P or B frames also become correspondingly smaller, and through subsequent updates, type correction can improve the recognition rate of subsequent frame types.
  • An I frame can be judged by the following three steps: in each step the current frame data amount is compared with a given threshold, and the frame is determined to be an I frame as soon as its data amount exceeds the threshold in any step:
  • If the coding is closed-loop, the frame immediately following an I frame is not a B frame, so a frame in that position that is not judged to be an I frame is a P frame;
  • If the frame preceding the current frame is a P frame and the current frame data amount is greater than the corresponding threshold, or the current GOP contains B frames and the current frame data amount is greater than the corresponding threshold, the frame is a P frame, otherwise the frame is a B frame;
  • If the frame preceding the current frame is a B frame and the current frame data amount is less than the corresponding threshold, the frame is a P frame.
  • Counting the coding type of the GOP: during identification, for clearly identified I frames, count whether the following frame is a B frame or a P frame. If most I frames are followed by P frames, the encoder can be considered closed-loop; otherwise it is considered open-loop.
  • Calculating the expected fixed I-frame interval: after I frames have been judged, the probability distribution of their intervals is collected, and the expected fixed interval is obtained by weighted averaging.
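The weighted-average estimate described above can be sketched as follows (a minimal illustration; the function name and the use of relative interval frequencies as the weights are assumptions, since the patent does not give the exact weighting):

```python
from collections import Counter

def expected_iframe_interval(iframe_positions):
    """Estimate the expected fixed I-frame interval as the weighted average
    of the observed intervals, weighted by their relative frequency."""
    intervals = [b - a for a, b in zip(iframe_positions, iframe_positions[1:])]
    if not intervals:
        return 0
    counts = Counter(intervals)
    total = sum(counts.values())
    # Weighted average: sum over intervals of (interval * its probability)
    return sum(iv * c / total for iv, c in counts.items())

# Example: I frames detected at these frame indices (intervals 25, 25, 26, 25)
print(expected_iframe_interval([0, 25, 50, 76, 101]))  # 25.25
```

As new I frames are identified, the interval distribution (and hence the estimate) would be updated incrementally.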
  • The thresholds in the above modules are updated in real time according to each newly determined frame type:
  • Threshold 1 is calculated from the average data amount of the previous 50 frames (av_IBPnbytes) and the data amount of the previous I frame (iframe_size_GOP) according to formula (1):

  Threshold1 = delta1 * iframe_size_GOP + av_IBPnbytes    (1)

  where delta1 is an adjustment factor.
  • Threshold 2 is calculated from the data amount of the previous I frame (iframe_size_GOP), the data amount of the largest P frame in the current GOP (max_pframes_size_GOP), and the average data amount of the I and P frames in the previous 50 frames (av_IPnbytes), according to formula (2):

  Threshold2 = max(delta2 * max_pframes_size_GOP, delta2 * av_IPnbytes, delta3 * iframe_size_GOP)    (2)

  where delta2 and delta3 are adjustment factors with empirical values 1.5 and 0.5.
  • Threshold 3 is calculated from the average data amount per frame of the current GOP (av_frame_size_GOP), the data amount of the previous P frame (prew_pframe_nbytes), and the data amount of the I frame of the current GOP, as in formula (3):

  Threshold3 = max(av_frame_size_GOP, ip_thresh * prew_pframe_nbytes, iframe_size_GOP / 3)    (3)

  where ip_thresh depends on how far the distance from the previous I frame to the current frame (curr_i_interval) is from the expected fixed I-frame interval (expected_iframe_interval).

  Alternatively, threshold 3 can be calculated as in formula (5):

  Threshold3 = sThresh * av_pframes_size_GOP + av_pframes_size_GOP    (5)

  where sThresh is calculated from curr_i_interval and expected_iframe_interval, and delta4 and delta5 are adjustment factors with empirical values 0.2 and 2.0.
  • Threshold 4 is the mean of the average P-frame data amount of the previous GOP (av_pframes_size_Last_GOP) and the average B-frame data amount of the previous GOP (av_bframes_size_Last_GOP), as in formula (7):

  Threshold4 = (av_pframes_size_Last_GOP + av_bframes_size_Last_GOP) / 2    (7)
  • Threshold 5 is delta6 times the average P-frame data amount of the current GOP (av_pframes_size_GOP), as in formula (8):

  Threshold5 = delta6 * av_pframes_size_GOP    (8)

  where delta6 is an adjustment factor with empirical value 0.75.
  • Threshold 6 is the mean of the average P-frame data amount (av_pframes_size_GOP) and the maximum B-frame data amount (max_bframes_size_GOP), as in formula (9):

  Threshold6 = (av_pframes_size_GOP + max_bframes_size_GOP) / 2    (9)
  • Threshold 7 is delta7 times the average B-frame data amount of the current GOP (av_bframes_size_GOP), as in formula (10):

  Threshold7 = delta7 * av_bframes_size_GOP    (10)

  where delta7 is an adjustment factor with empirical value 1.25.
  • Threshold 8 is the mean of the average P-frame data amount (av_pframes_size_GOP) and the average B-frame data amount (av_bframes_size_GOP), as in formula (11):

  Threshold8 = (av_pframes_size_GOP + av_bframes_size_GOP) / 2    (11)

  D: Type correction:
  • If the I frame is still not found far beyond the expected fixed interval, local information may be used to correct the parameters so that subsequent frame type judgment is more accurate: the frame with the largest data amount near the expected fixed interval is selected, its type is changed to I frame, and parameters such as the average data amount of each frame type in the GOP and the I-frame interval are updated.
  • Video encoders in practical applications, when using B frames to improve coding efficiency, generally take decoding delay and decoding storage overhead into account and do not code more than 7 consecutive B frames; in more extreme configurations, no more than 3 consecutive B frames are used.
  • The predicted maximum number of consecutive B frames in the stream is obtained from the statistics of previously determined frame types. When a frame is judged to be a B frame, the number of consecutive B frames must not exceed this predicted value; if it is exceeded, one of the frames judged to be consecutive B frames is probably wrong. The frame with the largest data amount among them is then changed into a P frame, and information such as the average data amount of each frame type in the GOP is updated.
  • Both of the preceding points require that the frame boundaries and frame data amounts have already been obtained. When there is no packet loss, the frame boundaries and the data of each frame can be learned accurately from the RTP sequence number, timestamp, and marker bit (ISMA mode), or from the RTP sequence number, TS CC, PUSI, and PID (TS over RTP mode).
  • If the PES header is decodable, the PTS in it can be used to judge whether the current data length (that is, the data between two packets carrying PES headers) lost header information in the packet loss:
  1) If no header information is among the lost packets, the current data length is one frame and the loss occurred inside the frame, so no segmentation is required;
  2) If the result does not match the expected coding structure, header information was probably among the lost packets; the current data length is then divided according to the expected coding structure and the pattern of the loss (received run lengths, lost run lengths, etc.), and a reasonable frame type, frame size, and PTS are assigned to each part.
  • If the PES header is not decodable, the lost run length, received run length, maximum received run length, maximum lost run length, etc. are used to judge whether the current data length is one frame and which frame type it belongs to:
  3) If the data length is close to a P frame plus a B frame, it is split into P frame + B frame: the packet run with the largest received length is assigned to the P frame, and on this basis the data length is divided into two segments so that the length of each segment is close to that of a P frame and a B frame respectively, ensuring that the second segment starts with a lost packet; other cases go to 4);
  4) Otherwise, the data length is considered to belong to one frame.
  • This embodiment combines the above points into an optional overall frame type detection scheme. The specific process, shown in FIG. 4, is divided into the following stages: preliminary frame type judgment from the PTS, packet loss processing, further frame type judgment using the data amount, and type correction.
  • Judging the frame type from the playback time: first determine whether the input stream is a TS over RTP stream; if so, determine whether the PES headers of the TS packets are encrypted. For RTP streams, or TS over RTP streams whose PES headers can be parsed, whether a frame is a B frame can be judged preliminarily from the playback time information; for the specific implementation, refer to point one. Packet loss processing: detect whether packets have been lost; if not, count the data amounts directly and enter the following frame type judgment step; if packets have been lost, packet loss processing must be performed for the RTP or TS over RTP stream, estimating the frame boundaries, frame data amounts, or some frame types; for the specific implementation, refer to point three.
  • Judging the frame type from the data amount: this stage determines the frame type in real time and dynamically adjusts the relevant parameters; for the specific implementation, refer to point two.
  • Type correction: if an earlier judgment is found to be wrong during later processing, it is corrected. This does not affect the results already output, but it updates the relevant parameters to improve the accuracy of subsequent judgments; for the specific implementation, refer to point two.
  • An embodiment of the present invention further provides a frame type detecting apparatus, as shown in FIG. 5, including: a time detecting unit 501, configured to detect the playback time of each frame; and a frame type determining unit 502, configured to determine that the current frame is a bidirectionally predicted B frame if the playback time of the current frame is less than the maximum playback time of the frames already received.
  • The apparatus in FIG. 5 may further include: a level determining unit 503, configured to determine, from the playback order and coding order of each frame, the level to which a B frame belongs in hierarchical coding. It should be noted that level determination is not a necessary technical feature for determining B frames in this embodiment of the present invention; this technical feature is required only when subsequent processing needs the level information.
  • An embodiment of the present invention further provides another frame type detecting apparatus, including: a type obtaining unit 601, configured to obtain the coding type of the stream in which the received frames are located, where the coding type includes open-loop coding and closed-loop coding.
  • The frame type determining unit 602 is configured to determine that the current frame is an obvious I frame if the data amount of the current frame is greater than a first threshold, where the first threshold is calculated from the average data amount of a set number of consecutive frames and the I-frame data amount.
  • The frame type determining unit 602 is further configured to determine that the current frame is a P frame if the frame preceding it is an I frame, the coding type is closed-loop coding, and the current frame is not an obvious I frame, or if the frame preceding it is an I frame, the coding type is open-loop coding, and the current frame data amount is greater than a fourth threshold, where the fourth threshold is the mean of the average P-frame data amount and the average B-frame data amount of one group of pictures.
  • If the current frame is neither an I frame nor a P frame, it is determined that the current frame is a B frame.
  • The frame type determining unit 602 is further configured to determine that the current frame is an I frame if its data amount is greater than a second threshold, where the second threshold is the maximum of the data amount of the I frame preceding the current frame, the average data amount of the P frames in the group of pictures containing the current frame, and the average data amount of a set number of consecutive frames.
  • The frame type determining unit 602 is further configured to determine that the current frame is an I frame if the interval between the current frame and the previous I frame exceeds the fixed interval and the current frame data amount is greater than a third threshold, where the third threshold is calculated from the average data amount per frame of the group of pictures containing the current frame, the data amount of the P frame preceding the current frame, the data amount of the I frame of that group of pictures, and the degree to which the distance from the previous I frame to the current frame deviates from the expected fixed I-frame interval; or the third threshold is calculated from the average data amount per frame of the group of pictures containing the current frame and the degree to which the distance from the previous I frame to the current frame deviates from the expected fixed I-frame interval.
  • The frame type determining unit 602 is further configured to determine that the current frame is a P frame if the frame preceding it is a P frame and its data amount is greater than a fifth threshold, or if the current group of pictures contains B frames and its data amount is greater than a sixth threshold;
  • the fifth threshold is the product of a first adjustment factor and the average P-frame data amount of the group of pictures containing the current frame, where the first adjustment factor is greater than 0.5 and less than 1;
  • the sixth threshold is the mean of the average P-frame data amount and the average B-frame data amount.
  • If the frame preceding the current frame is a B frame and the data amount of the current frame is less than a seventh threshold, or the current group of pictures contains P frames and the data amount of the current frame is less than an eighth threshold, it is determined that the current frame is a P frame;
  • the seventh threshold is the product of a second adjustment factor and the average B-frame data amount of the group of pictures containing the current frame, where the second adjustment factor is greater than 1 and less than 1.5;
  • the eighth threshold is the mean of the average P-frame data amount and the average B-frame data amount.
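The threshold-5/6 branch of the P/B decision above can be sketched as follows (a minimal illustration covering only the "larger than threshold 5 or 6" rules; the helper name, the GOP statistics layout, and the default to B are assumptions):

```python
def classify_p_or_b(size, prev_type, gop):
    """Classify a non-I frame as P or B from its data amount.

    Covers only the threshold-5/6 branch of the decision: a frame clearly
    larger than the typical P/B sizes of the current GOP is taken as P,
    everything else as B. gop = {"av_p": ..., "av_b": ..., "has_b": ...}.
    """
    delta6 = 0.75                              # empirical first adjustment factor
    thresh5 = delta6 * gop["av_p"]             # threshold 5
    thresh6 = (gop["av_p"] + gop["av_b"]) / 2  # threshold 6
    if prev_type == "P" and size > thresh5:
        return "P"
    if gop["has_b"] and size > thresh6:
        return "P"
    return "B"

gop = {"av_p": 8000.0, "av_b": 3000.0, "has_b": True}
print(classify_p_or_b(9000, "P", gop))  # P (above threshold 5 after a P frame)
print(classify_p_or_b(2500, "B", gop))  # B (below both thresholds)
```

The running averages `av_p` and `av_b` would be updated by the parameter update module after each classified frame.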
  • Optionally, the foregoing apparatus further includes: an interval obtaining unit 701, configured to determine the fixed interval of I frames after the frame type judgment ends.
  • The frame type determining unit 602 is further configured to determine the frame with the maximum data amount within a set range of the fixed interval as an I frame if no I frame has been determined after the fixed interval is reached.
  • A first updating unit 702 is configured to update the average data amount of each frame type in the group of pictures and the I-frame interval parameter.
  • Optionally, the foregoing apparatus further includes: a statistics unit 801, configured to count consecutive B frames after the frame type judgment ends.
  • The frame type determining unit 602 is further configured to determine the frame with the largest data amount among the consecutive B frames as a P frame if the number of consecutive B frames is greater than a predicted value, where the predicted value is greater than or equal to 3 and less than or equal to 7.
  • A second updating unit 802 is configured to update the average data amount of each frame type in the group of pictures.
  • Optionally, the foregoing apparatus further includes: a packet loss type determining unit 901, configured to determine whether packets have been lost among the received frames and, if packet loss has occurred, to determine the packet loss type.
  • A data amount determining unit 902 is configured to, when the packet loss type is intra-frame loss, take the sum of the received data amount of the frame and the lost data amount as the data amount of that frame when calculating frame data amounts; when the packet loss type is inter-frame loss, to determine whether the marker bit of the packet before the loss is 1: if so, the lost data amount is counted into the following frame, otherwise it is divided evenly between the two adjacent frames.
  • The apparatus of this embodiment can be used in combination with the apparatus of FIG. 5 or FIG. 6, and the frame type determining unit 502 and the frame type determining unit 602 can be implemented as the same functional unit.
  • The embodiments of the present invention make full use of the header information of RTP or TS over RTP, combine the coding order of the different frame types in video with the data amount relationships between frames of different types, judge the frame type in real time without decoding the video payload, and improve the accuracy of frame type detection through packet loss processing, automatic parameter updating, and later frame type correction.
  • The video stream carries header information indicating the playback time of the video data, such as the RTP timestamp in the ISMA mode and the PTS of the PES header in the TS over RTP mode. The relationship between the playback time information and the coding order is used to determine coding structures with special properties, such as B frames. In the TS over RTP mode, however, the TS payload may be fully encrypted so that the PES header cannot be decoded, i.e., the PTS is unavailable; the embodiments of the present invention therefore also provide a scheme that judges the frame type using only information such as the data amount, without using the playback time.
  • Within one GOP, the I frame generally has the largest data amount, the P frames are second, and the B frames are smallest. If the I frame at the start of each GOP can be identified correctly, the P frames and B frames inside the GOP can be judged from the frame data amounts.
  • The embodiments of the present invention design a set of dynamically adjustable parameters to improve the robustness and accuracy of frame type judgment. In particular, when judging I frames, the judgment criteria and related parameters are adapted to the characteristics of I frames in different application scenarios, greatly reducing the I-frame misjudgment rate.
  • Packet loss may occur in the input video stream. According to its impact on the judgment process, it falls into two categories: first, loss inside a frame, where the frame boundary information is not lost, so the frame boundaries can be obtained first and the corresponding sequence numbers used to count the packets of one frame; second, loss of a frame boundary (for example, an RTP packet whose marker bit is 1, or a TS over RTP packet with PUSI set to 1), where the boundary between two frames may be undeterminable, or the data of two adjacent frames may be spliced into one frame, making the frame data amount statistics inaccurate and affecting the result of the frame type judgment. For these cases the embodiments of the present invention perform packet loss detection, frame boundary estimation, and partial frame type estimation.
  • Frame type correction is added: when, after more data has arrived, an already-output result is found to be clearly wrong, internal correction is performed. Although internal correction cannot change the frame types that have already been output, it improves the accuracy of subsequent judgments by adjusting the parameters.
  • Test sequences: the tests use TS streams captured from the live network and streams encoded at fixed rates, as shown in Table 1. The first three streams captured from the live network (iptvl37, iptvl38, iptvl39) have partially encrypted payloads but unencrypted PES headers; the fixed-rate coded streams use the rates (1500, 3000, 4000, 5000, 6000, 7000, 9000, 12000, 15000). All selected streams are H.264 coded, with the frame types I, P, and B and no hierarchical coding.
  • The experimental results of frame type detection on the above sequences are given below, as shown in Table 2.
  • The I-frame miss rate is the ratio of missed I frames to the total number of I frames in the sequence; the I-frame false detection rate is the ratio of P or B frames misjudged as I frames to the total number of I frames (notably, in most cases only P frames are misjudged as I frames, and only rarely B frames, which is consistent with B-frame data amounts being much smaller than I-frame data amounts); the P->I error rate is the ratio of P frames misjudged as I frames to the total number of actual P frames; the P->B error rate is the ratio of P frames misjudged as B frames to the total number of actual P frames; the B->P error rate is the ratio of B frames misjudged as P frames to the total number of actual B frames; and the total error rate is the ratio of wrong judgments to the total number of frames (any judged frame type that does not match the actual type counts as a wrong judgment).
  • The average of the I-frame miss rate and the I-frame false detection rate reflects the probability of correctly detecting I frames.
  • For a fair comparison, when the playback time is available, the step of judging B frames from the playback time is also added to the existing method; the performance difference therefore comes mainly from the differences between the methods that judge by frame data amount.
  • The results show that, both when the frame type can be judged with the playback time and when the playback time is not used, this method outperforms the existing method on the streams captured from the live network and on the self-encoded streams. For the self-encoded streams the advantage is more obvious; in some cases this method is even error-free, where the existing method is not.
  • FIG. 10 to FIG. 15 show the detailed detection results for some sequences, in which the actual curves are marked with circles and the predicted curves with triangles. They include the I-frame interval distribution (on the horizontal axis, an interval of 1 means two adjacent frames are both I frames, and an interval of 0 means the I-frame interval is greater than 49; the predicted I-frame period is the period predicted by this method, and the actual I-frame period is the real one) and the frame type distribution (in the table, the diagonal of the matrix gives the number of correctly judged frames and the other positions are misjudgments). Each figure title is the sequence name + total number of frames + total error rate.
  • A live-network sequence generally has a fixed I-frame interval (the maximum value in the figure), and as scenes switch, some I frames are inserted adaptively, causing a disturbance near the maximum and forming the I-frame distribution shown in the figure. When two fixed intervals appear, the algorithm can also distinguish the two maxima fairly accurately. The expected I-frame interval estimated by this algorithm is close to the actual I-frame interval, so it can be used to guide frame skipping during fast browsing.
  • FIG. 10: iptvl37, 15861 frames (total error rate 0.6%). The results are shown in Table 3.


Description

Method and Apparatus for Detecting Frame Type
This application claims priority to Chinese Patent Application No. 201010594322.5, filed with the Chinese Patent Office on December 17, 2010 and entitled "Method and Apparatus for Detecting Frame Type", which is incorporated herein by reference in its entirety.

Technical Field
The present invention relates to the field of video processing technologies, and in particular, to a method and an apparatus for detecting frame type.

Background
Decodable data frame types in video coding standards can be classified into intra-coded frames (I-Frame, Intra coded frames, I frames), unidirectionally predicted frames (P-Frame, Predicted frames, P frames), and bidirectionally predicted frames (B-Frame, Bi-directional predicted frames, B frames). In video applications, an I frame serves as a decodable starting point, generally called a random access point, and supports services such as random access and fast browsing. During transmission, errors in different frame types affect the subjective quality at the decoder differently. An I frame truncates error propagation, so an error in an I frame severely degrades the decoding quality of the whole video; a P frame often serves as a reference for other inter-coded frames, and is second in importance to an I frame; a B frame is usually not used as a reference, so its loss has little impact on decoding quality. Distinguishing the different frame types of a data stream is therefore very important in video transmission applications. For example, as an important parameter of video quality assessment, the accuracy of frame type judgment directly affects the accuracy of the assessment result; unequal error protection can be applied to the different frame types to achieve effective video transmission; and, to save transmission resources when bandwidth is insufficient, frames that matter little to subjective quality can be discarded.
Commonly used streaming technologies are mainly the Internet Streaming Media Alliance (ISMA) mode and the Moving Picture Expert Group-2 Transport Stream over Internet Protocol (MPEG-2 TS over IP) mode. When encapsulating a compressed video data stream, both protocols define indicator bits that can indicate the video data type. In the ISMA mode, the compressed video data stream is encapsulated directly with the Real-time Transport Protocol (RTP), where MPEG-4 Part 2 follows Internet standard RFC 3016 (Request For Comments 3016) and H.264/Aural and Visual Code (AVC) follows RFC 3984. Taking RFC 3984 as an example, the Sequence Number and Timestamp contained in the RTP header can be used to detect lost frames and to help detect frame types. The MPEG-2 TS over IP mode is further divided into two variants: TS over User Datagram Protocol/IP (TS over UDP/IP) and TS over Real-time Transport Protocol/UDP/IP (TS over RTP/UDP/IP). The variant commonly used in video transmission is TS over RTP/UDP/IP (hereinafter TS over RTP): the compressed video data stream is encapsulated into an elementary stream, the elementary stream is further divided into TS packets, and the TS packets are finally encapsulated with RTP and transmitted.
RTP is a transport protocol for multimedia data streams and provides end-to-end real-time data transmission. An RTP packet mainly includes four parts: the RTP header, the RTP extension header, the payload header, and the payload data. The data contained in the RTP header mainly includes the sequence number, the timestamp, and the marker bit. The sequence number corresponds one-to-one with RTP packets, increases by 1 with each packet sent, and can be used to detect packet loss; the timestamp represents the sampling time of the video data, different frames have different timestamps, and it indicates the playback order of the video data; the marker bit identifies the end of a frame. This information is an important basis for frame type judgment.
A TS packet has 188 bytes and consists of a packet header, a variable-length adaptation field, and payload data. The payload unit start indicator (PUSI) in the packet header indicates whether the payload data contains a Packetized Elementary Stream (PES) header or Program Special Information (PSI). For the H.264 media format, each PES header marks the beginning of a NAL unit. Some flags in the TS adaptation field, such as the random access indicator and the elementary stream priority indicator, can be used to judge the importance of the transmitted content. For video, a random access indicator of 1 means that the first PES packet encountered next contains sequence start information, and an elementary stream priority indicator of 1 means that the payload of this TS packet contains more Intra block data.
If the PUSI shows that the payload of a TS packet contains a PES header, further information useful for transmission can be extracted. A PES packet consists of a PES packet header followed by packet data, and the elementary stream data (video, audio, etc.) is carried in the PES packet data. PES packets are inserted into transport stream packets, and the first byte of each PES packet header is the first byte of a transport stream packet payload; that is, a PES header must begin in a new TS packet, and the PES packet data must fill the payload area of the TS packet. If the end of the PES packet data cannot be aligned with the end of the TS packet, an appropriate number of stuffing bytes must be inserted into the TS adaptation field so that the two ends are aligned. The PES priority indicates the importance of the payload in the PES packet data; for video, 1 indicates Intra data. In addition, the PTS indicates the presentation time and the DTS indicates the decoding time, which can be used to judge the temporal relationship of the video payload and hence the payload type.
In the TS over RTP mode, to protect copyrighted video content, the payload is often encrypted during transmission. Encrypting a TS packet means encrypting the payload of the packet: once the scrambling flag in the TS header is set to 1, the payload is encrypted, and only the length of the packets with the same PID between adjacent PUSIs (the length of one video frame) can then be used to judge the payload data type. If the PES headers in the TS packets are not encrypted, the PTS can additionally be used to help judge the frame type besides the above frame length.
From the above it can be seen that the data amounts of different frame types differ: since an I frame removes only intra-frame redundancy, its data amount is generally larger than that of inter-coded frames, which also remove inter-frame redundancy, and a P frame generally has a larger data amount than a B frame. Based on this property, some existing frame type detection algorithms use the frame data amount to judge the frame type when the TS packets are encrypted. Two widely used methods are described below.
Method 1: By parsing the TS packets, the length of each video frame is obtained, and the frame type is inferred from the length information. The proposed method determines the frame type for the case where the payload of the TS packets is encrypted.
This method judges the packet loss state by parsing the Continuity Counter field of the TS packets, estimates the state of lost packets from the structural information of the Groups Of Pictures (GOP) preceding the current judgment, and judges the video frame type with the help of the available information in the TS packet header adaptation field (Random Access Indicator, RAI, or Elementary Stream Priority Indicator, ESPI).
I frames can be identified in the following three ways:
1. Identify I frames using the RAI or ESPI.
2. When the RAI or ESPI cannot be used, buffer one GOP of data and take the frame with the maximum data amount in the currently buffered data as the I frame. The GOP length must be predefined; once the GOP length changes, this method fails.
3. Use a value representing the maximum GOP length as the determination period of I frames: the frame with the maximum data amount within the determination period is the I frame, where the determination period is the maximum of the I-frame periods already detected.
P frames are identified in the following three ways:
1. Among the frames from the start frame to the frame immediately preceding the I frame, each frame whose data amount is larger than that of the surrounding frames is determined to be a P frame. For the determination frame patterns included in the GOP structure of the processed target stream, consecutive frames corresponding to N determination frame patterns are selected from the determination period as determination target frames; the size relationship among the data amounts of the determination target frames is compared with the determination frame pattern, and P frames can be determined based on the match between them. In the GOP structure, the following pattern is used as the determination frame pattern: all consecutive B frames immediately preceding a P frame and the one B frame immediately following the P frame. Some GOP information must be input in advance in this case.
2. Compare the frame data amount of each frame in the presentation pattern with a threshold calculated from the average frame data amount of several frames at predetermined positions in the presentation pattern.
3. Use an adjustment coefficient to adjust, based on the frame data amounts, the threshold used to distinguish P and B frames. The adjustment coefficient is obtained as follows: perform the same processing as the frame type determination with temporary adjustment coefficients selected sequentially within a given range, so as to estimate the frame type of each frame in a predefined learning period; compute the erroneous determination ratio between the estimation results and the actual frame types obtained from an unencrypted stream; and take the temporary adjustment coefficient with the lowest erroneous determination ratio as the real adjustment coefficient.
For B frames, the judgment is: frames other than I frames and P frames are determined to be B frames.
With the above frame type judgment methods, when packets are lost, the loss can be detected from the RTP sequence number and the TS header Continuity Counter (CC), and the state of the lost packets can be estimated by pattern matching against the GOP structure, achieving a certain degree of correction. However, the method with non-adjustable thresholds requires the GOP information to be input in advance, while the method with adjustable thresholds requires frame type information obtained from an unencrypted stream to train the coefficients; both require too much manual intervention. In addition, a whole GOP must be buffered before frame type estimation, which is unsuitable for real-time applications. Moreover, the I-frame judgment is performed only once, the adjustable coefficient is the period, the maximum within each period is directly taken as the I frame, and only local characteristics are considered while global characteristics are ignored.
Method 2: Distinguishing the different frames with thresholds can be done in four steps:
1. Threshold update:
Threshold for identifying I frames (Ithresh):
scaled_max_iframe = scaled_max_iframe * 0.995, where scaled_max_iframe is the size of the previous I frame.
If nbytes > scaled_max_iframe, then ithresh = (scaled_max_iframe/4 + av_nbytes*2)/2, where av_nbytes is the sliding mean of the current 8 frames.
Threshold for identifying P frames (Pthresh): scaled_max_pframe = scaled_max_pframe * 0.995, where scaled_max_pframe is the size of the previous P frame.
If nbytes > scaled_max_pframe, then pthresh = av_nbytes * 0.75.
2. Detect I frames: the video has an I frame at regular intervals, an I frame is larger than the average, and an I frame is larger than the P frames. If the current frame data amount is larger than Ithresh, the frame is considered an I frame.
3. Detect P frames: use the fact that B frames are smaller than the average. If the data amount of the current frame is greater than Pthresh and less than Ithresh, the frame is considered a P frame.
4. The remaining frames are B frames.
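The four steps of this prior-art method can be sketched as follows (a direct transcription of the stated update rules; the initial values of the thresholds and the points at which the reference sizes are reset are assumptions, since the description does not specify them):

```python
def detect_frame_types(sizes, window=8):
    """Prior-art method 2: classify frames with two decaying thresholds.
    sizes: frame data amounts in decode order. Returns a list of types."""
    scaled_max_iframe = scaled_max_pframe = 0.0
    ithresh = pthresh = 0.0
    types, recent = [], []
    for nbytes in sizes:
        recent.append(nbytes)
        av_nbytes = sum(recent[-window:]) / len(recent[-window:])
        # decay both reference sizes by the fixed factor 0.995 each frame
        scaled_max_iframe *= 0.995
        scaled_max_pframe *= 0.995
        if nbytes > scaled_max_iframe:
            ithresh = (scaled_max_iframe / 4 + av_nbytes * 2) / 2
        if nbytes > scaled_max_pframe:
            pthresh = av_nbytes * 0.75
        if nbytes > ithresh:
            types.append("I")
            scaled_max_iframe = nbytes   # remember the last I-frame size
        elif nbytes > pthresh:
            types.append("P")
            scaled_max_pframe = nbytes   # remember the last P-frame size
        else:
            types.append("B")
    return types

print(detect_frame_types([1000, 100, 100, 1000]))  # ['P', 'B', 'B', 'I']
```

The toy run also shows the cold-start sensitivity of the fixed decay factor: with no history, the first large frame is not classified as an I frame.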
This second frame type judgment method controls the thresholds with a decay factor that directly affects the I-frame judgment. When a later I frame is larger than the current one, the I frame is easy to detect; but when a later I frame is much smaller than the current one, many frames of decay are needed before an I frame can be detected again. The factor is fixed at 0.995 in the algorithm, which does not account for drastic GOP changes and is inapplicable in many cases. The smaller the decay factor, the lower the I-frame miss rate but the higher the probability of misjudging a P frame as an I frame; the larger the decay factor, the higher the I-frame miss rate (when the I-frame sizes in the sequence vary drastically), with I frames judged as P frames. Detection accuracy is therefore low. In addition, only thresholds are used to judge B/P frames; for a frame structure like I/P/P/P..., the algorithm misjudges many P frames as B frames, giving a high misjudgment rate.

Summary
The technical problem to be solved by the embodiments of the present invention is to provide a frame type detection method and apparatus that improve the accuracy of frame type detection.
To solve the above technical problem, embodiments of the frame type detection method provided by the present invention can be implemented by the following technical solutions:
detecting the playback time of each frame;
if the playback time of the current frame is less than the maximum playback time of the frames already received, determining that the current frame is a bidirectionally predicted B frame.
A frame type detection method includes:
obtaining the coding type of the stream in which the received frames are located, where the coding type includes open-loop coding and closed-loop coding;
if the data amount of the current frame is greater than a first threshold, determining that the current frame is an obvious intra-coded I frame, where the first threshold is calculated from the average data amount of a set number of consecutive frames and the I-frame data amount;
if the frame preceding the current frame is an I frame, the coding type is closed-loop coding, and the current frame is not an obvious I frame, or if the frame preceding the current frame is an I frame, the coding type is open-loop coding, and the data amount of the current frame is greater than a fourth threshold, determining that the current frame is a unidirectionally predicted P frame, where the fourth threshold is the mean of the average P-frame data amount and the average B-frame data amount of one group of pictures;
if the current frame is neither an I frame nor a P frame, determining that the current frame is a B frame.
A frame type detection apparatus includes:
a time detection unit, configured to detect the playback time of each frame; and
a frame type determining unit, configured to determine that the current frame is a bidirectionally predicted B frame if the playback time of the current frame is less than the maximum playback time of the frames already received.
A frame type detection apparatus includes:
a type obtaining unit, configured to obtain the coding type of the stream in which the received frames are located, where the coding type includes open-loop coding and closed-loop coding; and
a frame type determining unit, configured to determine that the current frame is an obvious I frame if the data amount of the current frame is greater than a first threshold, where the first threshold is calculated from the average data amount of a set number of consecutive frames and the I-frame data amount;
to determine that the current frame is a P frame if the frame preceding the current frame is an I frame, the coding type is closed-loop coding, and the current frame is not an obvious I frame, or if the frame preceding the current frame is an I frame, the coding type is open-loop coding, and the data amount of the current frame is greater than a fourth threshold, where the fourth threshold is the mean of the average P-frame data amount and the average B-frame data amount of one group of pictures; and
to determine that the current frame is a B frame if the current frame is neither an I frame nor a P frame.
The technical solutions provided by the embodiments of the present invention combine the coding order of the different frame types with the data amount relationships between frames of different types to judge the frame type without decoding the payload, eliminating the influence of the decay factor and improving the accuracy of frame type detection.

Brief Description of the Drawings
To describe the technical solutions of the embodiments of the present invention more clearly, the accompanying drawings needed in the description of the embodiments are introduced briefly below. Obviously, the drawings described below are merely some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1A is a schematic flowchart of a method according to an embodiment of the present invention;
FIG. 1B is a schematic flowchart of a method according to an embodiment of the present invention;
FIG. 2a is a schematic diagram of a hierarchical B-frame coding structure according to an embodiment of the present invention;
FIG. 2b is a schematic diagram of the relationship between coding order and playback order, and of the coding levels, according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a frame structure with packet loss according to an embodiment of the present invention;
FIG. 4 is a schematic flowchart of a method according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of an apparatus according to an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of an apparatus according to an embodiment of the present invention;
FIG. 7 is a schematic structural diagram of an apparatus according to an embodiment of the present invention;
FIG. 8 is a schematic structural diagram of an apparatus according to an embodiment of the present invention;
FIG. 9 is a schematic structural diagram of an apparatus according to an embodiment of the present invention;
FIG. 10 is a schematic diagram of detection results according to an embodiment of the present invention;
FIG. 11 is a schematic diagram of detection results according to an embodiment of the present invention;
FIG. 12 is a schematic diagram of detection results according to an embodiment of the present invention;
FIG. 13 is a schematic diagram of detection results according to an embodiment of the present invention;
FIG. 14 is a schematic diagram of detection results according to an embodiment of the present invention;
FIG. 15 is a schematic diagram of detection results according to an embodiment of the present invention.

Detailed Description
The technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are merely some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
A frame type detection method, as shown in FIG. 1A, includes:
101A: Detect the playback time of each frame.
102A: If the playback time of the current frame is less than the maximum playback time of the frames already received, determine that the current frame is a bidirectionally predicted B frame.
Further, the embodiment of the present invention may also determine, from the playback order and coding order of each frame, the level to which a B frame belongs in hierarchical coding. How to determine the level is described further below. Based on the properties of B frames, the determined levels can be applied in many fields; for example, when compressing data frames, B frames of higher levels can be discarded. The applications after the B-frame levels are determined are not limited in the embodiments of the present invention.
The above embodiment combines the coding order of the different frame types with the data amount relationships between frames of different types to judge the frame type without decoding the payload, eliminating the influence of the decay factor and improving the accuracy of frame type detection.
An embodiment of the present invention further provides another frame type detection method, as shown in FIG. 1B, including:
101B: Obtain the coding type of the stream in which the received frames are located, where the coding type includes open-loop coding and closed-loop coding.
102B: If the data amount of the current frame is greater than a first threshold, determine that the current frame is an obvious I frame, where the first threshold is calculated from the average data amount of a set number of consecutive frames and the I-frame data amount.
An obvious I frame is an I frame; if a frame is judged to be an obvious I frame, the probability that the judgment is wrong is very low, but some I frames may be missed, and the other subsequent ways of judging I frames may misjudge frames as I frames.
If the frame preceding the current frame is an I frame, the coding type is closed-loop coding, and the current frame is not an obvious I frame (the type of the current frame is not yet known at this point, but whether it is an obvious I frame can be determined), or if the frame preceding the current frame is an I frame, the coding type is open-loop coding, and the data amount of the current frame is greater than a fourth threshold, determine that the current frame is a P frame, where the fourth threshold is the mean of the average P-frame data amount and the average B-frame data amount of one group of pictures.
If the current frame is neither an I frame nor a P frame, determine that the current frame is a B frame.
It should be noted that the method corresponding to FIG. 1B may be applied independently or in combination with the method of FIG. 1A; when combined, it can serve as the implementation for the cases where the playback time in FIG. 1A cannot be detected.
Obtaining the coding type of the stream in which the received frames are located includes: collecting statistics on the type of the frame following each obvious I frame; if the proportion of P frames reaches a set proportion, the coding type is determined to be closed-loop coding, otherwise open-loop coding.
The following embodiments are described taking the combined use of the solution of FIG. 1B with the solution of FIG. 1A as an example; when the solution of FIG. 1B is used independently, there is no need to check whether the playback time can be detected.
Further, in the method embodiment where the playback time cannot be detected in 101A, the method further includes: if the current frame is greater than a second threshold, determining that the current frame is an I frame, where the second threshold is the maximum of the data amount of the I frame preceding the current frame, the average data amount of the P frames in the group of pictures containing the current frame, and the average data amount of a set number of consecutive frames. Further, the method also includes: if the current frame is greater than a third threshold and the interval between the current frame and the previous I frame exceeds the fixed interval, determining that the current frame is an I frame, where the third threshold is calculated from the average data amount of the frames of the group of pictures containing the current frame, the degree to which the distance from the previous I frame to the current frame deviates from the expected fixed I-frame interval, the data amount of the P frame preceding the current frame, and the data amount of the I frame of that group of pictures; or the third threshold is calculated from the average data amount of the frames of the group of pictures containing the current frame and the degree to which the distance from the previous I frame to the current frame deviates from the expected fixed I-frame interval.
Further, in the method embodiment where the playback time cannot be detected in 101A, the method further includes: if the frame preceding the current frame is a P frame and the data amount of the current frame is greater than a fifth threshold, or the current group of pictures contains B frames and the data amount of the current frame is greater than a sixth threshold, determining that the current frame is a P frame, where the fifth threshold is the product of a first adjustment factor and the average P-frame data amount of the group of pictures containing the current frame, the first adjustment factor being greater than 0.5 and less than 1, and the sixth threshold is the mean of the average P-frame data amount and the average B-frame data amount;
if the frame preceding the current frame is a B frame and the data amount of the current frame is less than a seventh threshold, or the current group of pictures contains P frames and the data amount of the current frame is less than an eighth threshold, determining that the current frame is a P frame, where the seventh threshold is the product of a second adjustment factor and the average B-frame data amount of the group of pictures containing the current frame, the second adjustment factor being greater than 1 and less than 1.5, and the eighth threshold is the mean of the average P-frame data amount and the average B-frame data amount.
Further, in the method embodiment where the playback time cannot be detected in 101A, the method further includes: after the frame type judgment ends, determining the fixed interval of I frames; if no I frame has been found when the fixed interval is reached, determining the frame with the maximum data amount within a set range around the fixed interval to be an I frame, and updating the average data amount of each frame type in the group of pictures and the I-frame interval parameter.
Further, in the method embodiment where the playback time cannot be detected in 101A, the method further includes: after the frame type judgment ends, counting the consecutive B frames; if the number of consecutive B frames is greater than a predicted value, determining the frame with the largest data amount among the consecutive B frames to be a P frame, and updating the average data amount of each frame type in the group of pictures, where the predicted value is greater than or equal to 3 and less than or equal to 7.
Further, in the method embodiment where the playback time cannot be detected in 101A, the method further includes: determining whether packets have been lost among the received frames and, if so, determining the packet loss type; if the packet loss type is intra-frame loss, taking the sum of the received data amount of the frame and the lost data amount as the data amount of the frame when calculating frame data amounts; if the packet loss type is inter-frame loss, determining whether the marker bit of the packet before the loss is 1: if so, counting the lost data amount into the following frame, otherwise dividing the lost data amount evenly between the two adjacent frames.
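The allocation rule above can be sketched as follows (a minimal illustration; the function name and the return convention are assumptions):

```python
def allocate_lost_bytes(loss_type, lost_bytes, prev_marker):
    """Return (bytes_added_to_previous_frame, bytes_added_to_next_frame)
    for a run of lost packets totalling lost_bytes.

    - intra-frame loss: all lost bytes belong to the current frame
      (returned here on the 'previous' side);
    - inter-frame loss: if the packet before the gap had its marker bit
      set (the previous frame ended cleanly), the loss belongs to the
      next frame; otherwise it is split evenly between the two frames.
    """
    if loss_type == "intra":
        return lost_bytes, 0
    if prev_marker:
        return 0, lost_bytes
    return lost_bytes / 2, lost_bytes / 2

print(allocate_lost_bytes("inter", 3000, prev_marker=False))  # (1500.0, 1500.0)
```

The corrected per-frame data amounts would then feed the threshold-based judgment as if no loss had occurred.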
Further, determining the packet loss type includes: predicting the coding structure from the statistics of the frame types already detected; if the packet loss type is inter-frame loss and the marker bit of the packet before the loss cannot be detected, dividing the current data length according to the predicted coding structure and the positions of the lost packets.
本发明实施例充分利用 RTP或 TS over RTP的包头信息,结合视频中不同类 型帧的编码顺序以及不同类型帧的前后数据量大小关系,在不解码视频净载的 情况下快速实时的判断帧类型, 并且通过丟包处理、 自动更新参数以及后期帧 类型纠正的方法提高帧类型检测的正确率。
视频流中会有指示视频数据的播放时间的包头信息, 如: ISMA方式中的 RTP时间戳, 以及 TS over RTP方式中 PES头的 PTS。 本发明实施例将利用播放 时间信息和编码顺序的相互关系, 来判断某些特殊结构的编码类型, 如: B帧。 但对于 TS over RTP方式,可能存在 TS净载完全加密 PES头无法解码的情况, 即 PTS不可得, 因此, 本发明实施例还提供了不利用播放时间只利用数据量等信 息来进行帧类型判断的方案。
观察实际应用中的视频码流可以发现, 同一个 GOP内不同类型的帧一般具有较为明显的区別, I帧数据量最大, P帧其次, B帧最小。 如果能正确识別出每个 GOP起始处的 I帧, 则可以利用该帧的数据量判断此 GOP内部的 P帧和 B帧。 但由于视频信号的非平稳性, 不同位置处的 I帧数据量存在着较大的差別, 甚至会和之前 GOP中的 P帧的数据量相当, 给判断 I帧带来了困难。 本发明实施例设计了一套可智能调节的动态参数, 以提高帧类型判断的鲁棒性和准确性。 特別是在判断 I帧时, 充分考虑了不同应用场景中 I帧的特性, 适当地调节判断准则和相关参数, 大大降低了 I帧的误判率。
在有损传输的应用场景中,输入的视频流会发生丟包,根据丟包对判断过 程的影响,可以将其分为两类:一、帧内的丟包,此时帧边界的信息没有丟失, 可以先获取到帧边界,用对应的序列号来统计一帧的包数;二、帧边界丟包(如: RTP中标志位为 1的包, 或 TS over RTP中 PUSI置 1的包), 此时可能无法判断前 后两帧的边界,也可能前后两帧的数据拼接到一帧,使得帧数据量统计不准确, 影响帧类型判断的结果。本发明实施例将就此进行丟包检测、 帧边界估计以及 部分的帧类型估计。
在帧类型判断的前期, 由于统计数据不充足, 会存在较多的误判, 不仅影 响到已输出的结果, 更会通过改变各种参数影响到后续判断的准确性。本发明 实施例在判断帧类型流程之后增加了帧类型纠正,在数据增加后若输出结果有 明显错误时进行内部纠正, 内部纠正虽然不能改变已经输出的帧类型,但可以 通过调整参数的方式提高后续判断的准确性。
以下将分別就本发明实施例的三个要点进行详细说明:
一: 利用播放时间判断 B帧或 /和分级 B帧:
由于 B帧采用前向以及后向的已编码帧作为预测, 其编码顺序在后向参考 帧之后,使得其播放时间往往和编码顺序不一致, 因此可以用播放时间信息来 判定 B帧。 若当前帧的播放时间小于已经接收到的帧的最大的播放时间, 则该 帧肯定为 B帧, 否则为 I帧或 P帧。
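上述判断可以用如下示意性的 Python 代码表达(仅为帮助理解的草案, 函数名为示例假设): 按编码顺序维护已接收帧的最大播放时间, 播放时间小于该最大值的帧即判为 B帧。

```python
def is_b_frame(current_pts, max_seen_pts):
    """播放时间小于已接收帧的最大播放时间, 说明其编码顺序在后向参考帧之后, 必为B帧。"""
    return current_pts < max_seen_pts

def classify_by_pts(pts_in_coding_order):
    """按编码顺序遍历各帧的播放时间, 返回每帧是否为B帧的列表。"""
    flags, max_pts = [], float("-inf")
    for pts in pts_in_coding_order:
        flags.append(is_b_frame(pts, max_pts))
        max_pts = max(max_pts, pts)
    return flags
```

对判为非 B帧的帧, 仍需结合数据量等信息进一步区分 I帧和 P帧。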
对于分级编码的 B帧, 也可以利用播放时间来进一步判断最高层级以及每个 B帧所属的层级。 以连续 7个 B帧的情况为例, 如图 2a所示, 是该情况下分级 B帧的编码结构图, 第一排字母的下标表示每帧所属的层级, 第二排的数字为每一帧的播放序号。 而实际的编码顺序为 (括弧中的数字为播放序号) I0/P0(0), I0/P0(8), B1(4), B2(2), B3(1), B3(3), B2(6), B3(5), B3(7)。 图 2b为编码顺序和播放顺序的关系, 以及编码的层级, 阿拉伯数字表示播放序号, 中文数字表示编码序号。
用播放时间判断分级的算法可分为两步:
第一步: 判断最高层级(此例中为 3 )。 将第 0帧的层级设为 0, 然后按编码顺序读取播放时间, 如果当前帧的播放时间小于前一帧的播放时间, 则当前帧的层级为前一帧的层级加 1 , 反之则与前一帧的一样。 直到读到播放时间紧邻于第 0帧的帧即第 1帧, 此时第 1帧所对应的层级即为最高层级。
第二步: 根据相邻 B帧播放时间的对称关系来判断剩余的帧所属的层级。 第一步完成后, 图 2b实线框中的层级都已经确定, 此时需检测虚线框中的 B帧所属的层级。 检测方法是在已经确定层级的帧中进行遍历, 寻找到两个帧使得它们播放时间的均值与当前帧的播放时间相等, 则当前帧的层级为该两个帧的最大层级加 1。 图中的椭圆展示的即是这种对称关系, 即椭圆中上面两帧的播放时间的均值等于最下面帧的播放时间, 而最下面帧的层级刚好为以上两帧层级的最大值加 1。
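上述两步算法可示意性地实现如下(Python 草案, 仅为帮助理解; 假设播放序号为等间隔整数, 且当存在多对播放时间对称的帧时取相距最近的一对, 这一选取方式为示例假设):

```python
def hierarchical_b_levels(pts_in_coding_order):
    """按文中两步法推断分级B帧的层级(示意实现), 返回 {播放序号: 层级}。"""
    pts = pts_in_coding_order
    levels = {pts[0]: 0}
    # 第一步: 沿编码顺序确定各帧层级, 直到读到播放时间紧邻第0帧的帧, 得到最高层级
    i = 1
    while i < len(pts):
        levels[pts[i]] = levels[pts[i - 1]] + 1 if pts[i] < pts[i - 1] else levels[pts[i - 1]]
        if pts[i] == pts[0] + 1:
            break
        i += 1
    # 第二步: 利用相邻B帧播放时间的对称关系确定剩余帧的层级
    for p in pts[i + 1:]:
        pairs = [(a, b) for a in levels for b in levels
                 if a < b and (a + b) == 2 * p]          # 播放时间均值等于当前帧
        a, b = min(pairs, key=lambda ab: ab[1] - ab[0])  # 取相距最近的对称帧对
        levels[p] = max(levels[a], levels[b]) + 1
    return levels
```

以图 2a中连续 7个 B帧的编码顺序 (0, 8, 4, 2, 1, 3, 6, 5, 7) 为输入, 可得到与图中一致的层级分布。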
二、 利用帧数据量来判断帧类型:
由于根据播放时间只能区分出是否为 B帧, 本实施例提供了只利用数据量等信息来判断 I帧和 P帧的方案。 对于根据播放时间可判断出 B帧的情况, 只需要对剩余的帧区分是 I帧还是 P帧; 而对于无法根据播放时间判断出 B帧的情况 (例如包头信息加密的情况), 则要对所有帧进行判断, 先确定 I帧和 P帧, 剩余的帧则判定为 B帧。
本实施例通过自动参数更新的方法利用帧数据量来判断帧类型,主要分为 以下几个模块(如图六所示): I帧判断模块、 P帧判断模块、 参数更新模块和 类型纠正模块。
A: I帧判断:
一般来说视频中的 I帧可分为以下两类: 固定间隔的 I帧, 即为了满足随机 接入在压缩过程中按照固定间隔 (一定时期内固定, 一旦用户切换频道, 该间 隔可能会发生变化)插入的 I帧; 自适应插入的 I帧, 即是为了提高压缩效率, 在 场景切换处插入的 I帧。
对于固定间隔的 I帧, 在识別过程中可以估计该固定间隔, 在超过该间隔 还没有判断到 I帧时, 主动放宽判断条件或者用局部的特征来判断 (后文对此 将有详细说明)。
而对于自适应插入的 I帧, 在序列空间复杂度类似的场景切换处, 如果编码为自适应插入的 I帧, 由于 I帧的压缩效率差, 其码率往往会比之前的 P帧大; 如果编码为 P帧, 由于预测变差, 其码率也会比较大, 此时该帧是比较重要的帧, 较为容易判断为 I帧(P帧和 I帧数据量都比较大, 容易错误地将 P帧误认为是 I帧)。 对于空间复杂度简单的场景切换处, 编码为 I帧可能会比之前的 P帧还小, 对于此类的 I帧没有办法正确识別, 但是其后的那些 P帧或 B帧也会相应变小, 通过后续的更新, 可以进行类型纠正, 以提高对后续帧类型的识別率。
因此, 可通过以下三个步骤来判断 I帧, 即分別比较当前帧数据量和给定 阈值, 只要某一步中当前帧数据量大于给定阈值就判定为 I帧:
根据阈值 1判断明显的 I帧;
根据阈值 2判断非固定间隔的 I帧;
根据阈值 3判断超过预期的固定间隔的 I帧。 B: P帧判断:
对于上一帧为 I帧且当前视频流为闭环编码的情况, I帧后面不会紧邻 B帧。 如果该帧没有判断为 I帧, 则为 P帧;
对于上一帧为 I帧且当前视频流为开环编码的情况, 如果当前帧的数据量 大于阈值 4, 则该帧为 P帧, 否则该帧为 B帧;
对于上一帧为 P帧的情况, 如果当前帧数据量大于阈值 5或者在当前 GOP 存在 B帧的情况下大于阈值 6, 那么该帧为 P帧;
对于上一帧为 B帧的情况, 表示当前 GOP中存在 B帧, 如果当前帧数据量 小于阈值 7或者在当前 GOP已经判断出有 P帧的情况下小于阈值 8,那么该帧为 P 帧。
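结合上文 A、 B 两个模块以及后文公式(1)~(11)给出的阈值, 可将判定流程示意性地实现如下(Python 草案, 仅为帮助理解, 并非本发明限定的实现; 统计量字典的键名沿用文中变量名, 调节因子取文中经验值; 为简洁起见, 省略了阈值 3中与 I帧间隔有关的附加条件):

```python
def compute_thresholds(s, d=None):
    """按后文公式(1)~(11)计算阈值1~8(示意实现)。
    s 为统计量字典(键名沿用文中变量名), d 可覆盖调节因子, 缺省取文中经验值。"""
    f = {"delta1": 0.5, "delta2": 1.5, "delta3": 0.5, "delta6": 0.75, "delta7": 1.25}
    if d:
        f.update(d)
    th = {1: f["delta1"] * s["iframe_size_GOP"] + s["av_IBPnbytes"]}      # 公式(1)
    th[2] = max(f["delta2"] * s["max_pframes_size_GOP"],                  # 公式(2)
                f["delta2"] * s["av_IPnbytes"],
                f["delta3"] * s["iframe_size_GOP"])
    ip_thresh = max(2 - (s["curr_i_interval"]                             # 公式(4)
                         - s["expected_iframe_interval"]) * 0.1, 1.5)
    th[3] = max(s["av_frame_size_GOP"],                                   # 公式(3)
                ip_thresh * s["prew_pframe_nbytes"],
                s["iframe_size_GOP"] / 3)
    th[4] = (s["av_pframes_size_Last_GOP"]                                # 公式(7)
             + s["av_bframes_size_Last_GOP"]) / 2
    th[5] = f["delta6"] * s["av_pframes_size_GOP"]                        # 公式(8)
    th[6] = (s["av_pframes_size_GOP"] + s["av_bframes_size_GOP"]) / 2     # 公式(9)
    th[7] = f["delta7"] * s["av_bframes_size_GOP"]                        # 公式(10)
    th[8] = th[6]                                                         # 公式(11)
    return th

def classify_frame(size, prev_type, th, closed_loop, gop_has_b, gop_has_p):
    """按文中 A、B 模块的判定顺序对一帧分类(省略阈值3的间隔附加条件)。"""
    if size > th[1] or size > th[2] or size > th[3]:
        return "I"                      # 三步I帧判断: 任一步超过阈值即判为I帧
    if prev_type == "I":
        if closed_loop:                 # 闭环编码: I帧后不会紧邻B帧
            return "P"
        return "P" if size > th[4] else "B"
    if prev_type == "P" and (size > th[5] or (gop_has_b and size > th[6])):
        return "P"
    if prev_type == "B" and (size < th[7] or (gop_has_p and size < th[8])):
        return "P"
    return "B"
```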
C: 参数更新:
统计 GOP的编码类型 (开环或闭环): 在识別过程中, 对于比较明显的 I 帧, 可以统计其后一帧是 B帧还是 P帧, 若大多数 I帧后面都是 P帧, 则可以认为 该编码器是闭环编码, 否则认为是开环编码。
计算预期的 I帧固定间隔: 在判断出 I帧后, 统计其间隔的概率分布, 并通 过加权平均, 得到的预期的固定间隔。
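该加权平均可示意为如下 Python 草案(仅为帮助理解; 此处仅按各间隔出现的频率加权, 这一实现方式为示例假设):

```python
from collections import Counter

def expected_i_interval(observed_intervals):
    """对已判出的I帧间隔按出现频率加权平均, 得到预期的固定间隔(示意实现)。"""
    counts = Counter(observed_intervals)
    total = sum(counts.values())
    return sum(interval * n / total for interval, n in counts.items())
```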
根据新判断出的帧类型实时的更新上述模块中的阈值:
a )阈值 1 : 根据之前 50帧的平均数据量( av_IBPnbytes ) 以及前一个 I帧的 数据量( iframe_size_GOP ), 按照公式(1 )计算得到:
阈值1 = delta1 * iframe_size_GOP + av_IBPnbytes 公式(1)
其中, delta1为调节因子, 取值范围为 (0,1), 根据实验得到的经验值为 0.5。
b) 阈值 2: 根据前一个 I帧的数据量( iframe_size_GOP )、 当前 GOP中最大的 P帧的平均数据量( max_pframes_size_GOP )以及前 50帧中 I帧 P帧的平均数据量( av_IPnbytes ), 按照公式(2)计算得到:
阈值2 = max(delta2 * max_pframes_size_GOP, delta2 * av_IPnbytes, delta3 * iframe_size_GOP) 公式(2)
其中, delta2和 delta3分別为调节因子, 其经验值为 1.5和 0.5。
c) 阈值 3: 根据当前 GOP的每帧的平均数据量( av_frame_size_GOP )、 前一个 P帧的数据量( prew_pframe_nbytes )以及当前 GOP的 I帧的数据量( iframe_size_GOP ), 按照公式(3)计算得到; 或者根据当前 GOP的 P帧平均数据量( av_pframes_size_GOP )按照公式(5)计算得到:
阈值3 = max(av_frame_size_GOP, ip_thresh * prew_pframe_nbytes, iframe_size_GOP/3) 公式(3)
其中, ip_thresh随着从上一个 I帧到当前帧的距离( curr_i_interval )与预期的固定 I帧间隔( expected_iframe_interval )的远离程度来计算:
ip_thresh = max(2 - (curr_i_interval - expected_iframe_interval) * 0.1, 1.5) 公式(4)
阈值3 = sThresh * av_pframes_size_GOP + av_pframes_size_GOP 公式(5)
其中, sThresh根据 curr_i_interval和 expected_iframe_interval来计算:
sThresh = max(delta4, sThresh/(delta5 * curr_i_interval/expected_iframe_interval)) 公式(6)
其中, delta4和 delta5分別为调节因子, 其经验值为 0.2和 2.0。
d ) 阈值 4: 为上一个 GOP的 P帧平均数据量( av_pframes_size_Last_GOP ) 和 B帧平均数据量( av_bframes_size_Last_GOP ) 的均值, 如公式(7 ) :
阈值4 = (av_pframes_size_Last_GOP + av_bframes_size_Last_GOP)/2 公式(7)
e) 阈值 5: 为 0.75乘以当前 GOP中 P帧平均数据量( av_pframes_size_GOP ), 如公式(8):
阈值5 = delta6 * av_pframes_size_GOP 公式(8)
其中, delta6为调节因子, 其经验值为 0.75。
f) 阈值 6: 为 P帧平均数据量( av_pframes_size_GOP )和 B帧平均数据量( av_bframes_size_GOP )的均值, 如公式(9):
阈值6 = (av_pframes_size_GOP + av_bframes_size_GOP)/2 公式(9)
g) 阈值 7: 为 1.25乘以当前 GOP中 B帧平均数据量( av_bframes_size_GOP ), 如公式(10):
阈值7 = delta7 * av_bframes_size_GOP 公式(10)
其中, delta7为调节因子, 其经验值为 1.25。
h) 阈值 8: 为 P帧平均数据量( av_pframes_size_GOP )和 B帧平均数据量( av_bframes_size_GOP )的均值, 如公式(11):
阈值8 = (av_pframes_size_GOP + av_bframes_size_GOP)/2 公式(11)
D: 类型纠正:
纠正漏判的 I帧:
经过上述步骤后, 可能存在远超过预期的固定间隔却还没有判断出 I帧的 情况, 此时, 虽然已经输出帧类型, 但是可以利用局部的信息纠正参数, 使得 后续的帧类型判断更准确。在接近预期的固定间隔附近取数据量最大的帧, 将 其帧类型改为 I帧, 并更新 GOP中各帧类型的平均数据量和 I帧间隔等参数。
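上述对漏判 I帧的纠正可示意如下(Python 草案, 仅为帮助理解; search_range 为示例假设的搜索半径, 各类型帧平均数据量与 I帧间隔等参数的更新从略):

```python
def correct_missed_i(types, sizes, last_i_index, expected_interval, search_range):
    """固定间隔已过仍未判出I帧时, 将间隔附近设定范围内数据量最大的帧改判为I帧。
    返回 (纠正后的类型列表, 被改判帧的下标)。"""
    center = last_i_index + expected_interval
    lo = max(0, center - search_range)
    hi = min(len(types), center + search_range + 1)
    k = max(range(lo, hi), key=lambda i: sizes[i])  # 设定范围内数据量最大的帧
    new_types = list(types)
    new_types[k] = "I"
    return new_types, k
```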
纠正错判的 B帧:
实际应用中的视频编码器, 在利用 B帧提高编码效率时一般会考虑到解码 延时以及解码存储开销, 不会编码出超过 7个的连续 B帧, 甚至, 更为极端的 是, 连续 B帧不会超过 3个。 通过之前判断出的帧类型统计得出该码流中最大 连续 B帧的预测值。在将一帧确定为 B帧时, 需要确保此次连续的 B帧数不超过 预测值。 如果超过该值, 说明当前连续判断为 B帧的帧中可能有错判, 需要将 这些帧中数据量最大的帧改判为 P帧, 并更新 GOP中的各帧类型的平均数据量 等信息。
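连续 B帧的纠正逻辑可示意如下(Python 草案, 仅为帮助理解, 参数更新部分从略):

```python
def correct_b_runs(types, sizes, max_b_run):
    """后期纠正: 某段连续B帧的长度若超过预测值 max_b_run,
    则将该段中数据量最大的帧改判为P帧, 重复处理直到满足约束(示意实现)。"""
    types = list(types)
    changed = True
    while changed:
        changed = False
        i = 0
        while i < len(types):
            if types[i] != "B":
                i += 1
                continue
            j = i
            while j < len(types) and types[j] == "B":
                j += 1
            if j - i > max_b_run:                      # 连续B帧数超过预测值
                k = max(range(i, j), key=lambda x: sizes[x])
                types[k] = "P"                         # 改判数据量最大的帧
                changed = True
            i = j
    return types
```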
三、 无法确定边界和帧数据量时的帧类型检测:
前两个实例都需要在帧边界和帧数据量已获得的情况下进行。在无丟包时 可以通过 RTP的序列号、 时间戳、 标志位(ISMA方式)或 RTP序列号、 TS中 CC、 PUSL PID ( TS over RTP方式)来准确的获知帧边界和每一帧的数据量, 但在发生丟包的情况下,如果处于帧边界的包发生丟失, 则无法准确判断帧边 界的位置, 可能会将帧的包数估计错误甚至将两帧的数据量拼为一帧, 这将对 帧类型的检测带来极大的干扰。 因此, 如果有丟包则需要在帧类型判断之前进 行丟包处理, 来获得帧边界、 帧数据量和帧类型等信息。
由于 ISMA方式中 RTP时间戳的变化标志着新的帧到达, 因此在发生丟包时, 其处理过程比较简单:
1 )如果丟包前后时间戳无变化, 代表丟失的包处于一帧内部, 只需在统 计帧数据量时考虑丟包的数据即可;
2 )如果丟包前后时间戳发生变化, 代表丟包发生在帧的边界, 此时如果丟包前一个包的标志位为 1 , 则视丟包为后一帧的数据, 添加到后一帧的数据量中; 否则, 将丟包的数据量平均分配给前后两帧 (此处假设一次突发丟包不会超过一帧的长度)。 TS over RTP的情况要相对复杂, 由于只能通过是否有 PES头(即 PUSI为 1 )来判断一帧的开始, 若发生丟包, 则很难判断两个有 PES头的包之间的数据是属于一帧或多帧, 如图 3所示, 在两个有 PES头的包之间的数据发生了 3次丟包, 但由于无法知晓丟失的包中是否也有 PES头 (即代表一帧的开始), 无法判断这些数据是否属于同一帧。 本案例从两方面分別提供了解决方法。
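上述 ISMA/RTP 情况下 1)、 2) 两条丟包数据量的分配规则可示意如下(Python 草案, 仅为帮助理解; 返回值的约定为示例假设):

```python
def allocate_lost_bytes(ts_before, ts_after, marker_before, lost_bytes):
    """ISMA/RTP 丟包数据量分配(示意实现)。
    返回 (计入前一帧的字节数, 计入后一帧的字节数);
    帧内丟包时, "前一帧"即丟包所在的当前帧。"""
    if ts_before == ts_after:        # 1) 丟包前后时间戳无变化: 帧内丟包
        return lost_bytes, 0
    if marker_before == 1:           # 2) 边界丟包且前一包标志位为1: 丟包属于后一帧
        return 0, lost_bytes
    half = lost_bytes / 2            # 否则平均分配给前后两帧
    return half, half
```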
如果 PES头可解, 则可以根据其中的 PTS来判断当前数据长度(即两个有 PES头的包之间的数据长度)是否包含帧头信息:
1 ) 统计正确检测出来的 GOP的 PTS的顺序, 将分布概率与距离目前帧的 距离加权作为预期指数, 得到预期编码结构;
2 )根据接收顺序中从 I帧开始的一系列帧的 PTS到当前的 PTS及下一个 PTS 与预期的编码结构进行匹配
a ) 如果符合预期的编码结构, 则认为该数据长度的丟包中不包含帧头信 息, 即当前数据长度为一帧, 丟包发生在该帧内部, 不需要分割;
b )如果不符合预期的编码结构, 说明丟包中很可能包含帧头信息, 按照 预期的编码结构以及丟包发生的位置(连续长度, 丟包长度等)分割当前的数 据长度, 分配合理的帧类型和帧大小以及 PTS。
3 )若后续发现了之前判断为丟失帧头的帧, 则在校正步骤中更新之前的 判断结果。
另外, 可以根据丟包长度, 连续长度, 最大连续长度, 最大丟包长度等来 判断当前的数据长度是否为一帧以及属于何种帧类型:
1 )如果该数据长度和前一个 I帧的长度差不多, 则认为属于同一个 I帧; 如果该数据长度和 P帧差不多大, 且最大连续长度比 50帧之内的平均 B帧的数 据量大, 则认为该数据长度都属于同一个 P帧; 对其他情况转到 2 );
2 )如果该数据长度和两个 P帧差不多大, 则要拆分为两个 P帧, 将该数据长度分为两段, 使得每段的长度都和 P帧最接近, 并且要确保第二段以丟失包开头; 对其他情况转到 3 );
3 )如果该数据长度和 P帧加 B帧差不多, 则要拆分为 P帧 +B帧, 将连续长度最大的包归属为 P帧, 在此基础上将该数据长度分为两段, 使得每段的长度分別接近 P帧和 B帧, 并且要确保第二段以丟失包开头; 对其他情况转到 4 );
4 )如果最大连续长度小于 B帧且该数据长度和三个 B帧差不多, 则要拆分 为三个 B帧, 将该数据长度分为三段, 使得每段的长度都接近 B帧, 并且要确 保第二段第三段以丟失包开头; 对其他情况转到 5 );
5 )如果最大连续长度小于 B帧且该数据长度和两个 B帧差不多, 则要拆分为两个 B帧, 将该数据长度的包分为两段, 使得每段的长度都接近 B帧, 并且要确保第二段以丟失包开头; 对其他情况转到 6 );
6 )其他情况下认为该数据长度全部属于一帧。
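上述规则 1)~6) 的判断流程可示意如下(Python 草案, 仅为帮助理解; "差不多大"的相对容差 tol 以及未知类型的返回约定均为示例假设, 规则中的次要条件有所简化):

```python
def split_data_length(length, i_size, p_size, b_size, max_run_len, tol=0.25):
    """无法解码PES头时, 按规则1)~6)估计两个PES头之间的数据长度包含哪些帧。
    i_size/p_size/b_size 为相应类型帧的参考数据量, max_run_len 为最大连续长度。"""
    def close(x, y):                       # "差不多大": 相对容差 tol 为示例假设
        return abs(x - y) <= tol * y
    if close(length, i_size):
        return ["I"]                       # 规则1: 与前一个I帧差不多大
    if close(length, p_size) and max_run_len > b_size:
        return ["P"]                       # 规则1: 与P帧差不多大且最大连续长度大于平均B帧
    if close(length, 2 * p_size):
        return ["P", "P"]                  # 规则2
    if close(length, p_size + b_size):
        return ["P", "B"]                  # 规则3
    if max_run_len < b_size and close(length, 3 * b_size):
        return ["B", "B", "B"]             # 规则4
    if max_run_len < b_size and close(length, 2 * b_size):
        return ["B", "B"]                  # 规则5
    return ["?"]                           # 规则6: 认为整段属于同一帧, 类型另行判断
```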
本实施例结合以上各例,提供一个可选的帧类型检测方案, 具体流程如图 4所示: 分为以下几个阶段: 利用 PTS初步判断帧类型、 丟包处理、 利用数据 量进一步判断帧类型和类型纠正。
401: 数据输入后, 判断包头是否可解, 是则执行根据播放时间判断帧类型, 否则执行丟包处理; 在帧类型判断结束后, 判断之前的帧类型判断是否有误, 有则执行帧类型纠正, 否则进入帧类型判断的循环, 即进入 401, 具体执行如下:
根据播放时间判断帧类型: 对输入的码流先判断是否为 TS over RTP的包, 如果是则需判断 TS包的 PES头是否加密。 对于 RTP包或 PES头可解的 TS over RTP的包, 可根据播放时间信息初步判断是否为 B帧, 具体实施可参考要点一; 丟包处理: 检测是否存在丟包, 若无丟包直接统计出数据量进入以下帧类 型判断步骤; 若有丟包则需针对 RTP或 TS over RTP包分別进行丟包处理,估计 帧边界、 帧数据量或部分帧类型, 具体实施可参考要点三;
根据数据量判断帧类型: 该过程实时判断帧类型,且动态智能的调整相关 参数, 具体实施可参考要点二;
类型纠正: 在判断过程中若发现之前的判断结果有误则会进行纠正, 该过 程不影响输出结果, 但可用于更新相关参数, 以提高后续判断的准确性, 具体 实施可参考要点二。
本发明实施例还提供了一种帧类型的检测装置, 如图 5所示, 包括: 时间检测单元 501 , 用于检测各帧的播放时间;
帧类型确定单元 502 , 用于若当前帧的播放时间小于已经接收到的帧的最 大播放时间, 则确定上述当前帧为双向预测编码 B帧;
进一步地, 上述图 5中还可以包括:
层级确定单元 503 , 用于依据各帧的播放顺序和编码顺序确定 B帧在分级 编码中所属的层级; 需要说明的是, 层级确定不是本发明实施例确定 B帧的必 要技术特征, 该技术特征仅作为后续进行需要层级信息的相关处理时才需要。
本发明实施例还提供了另一种帧类型的检测装置, 如图 6所示, 包括: 类型获得单元 601 , 用于获得已经接收到的帧所在码流的编码类型, 上述 编码类型包括: 开环编码和闭环编码;
帧类型确定单元 602 , 还用于若当前帧的数据量大于第一阈值则确定当前帧为明显的 I帧, 上述第一阈值由设定连续个数的帧的平均数据量以及 I帧数据量计算得到;
若当前帧的前一帧为 I帧、 编码类型为闭环编码且当前帧为非明显的 I帧, 或者, 若当前帧的前一帧为 I帧、 编码类型为开环编码且当前帧的数据量大于 第四阈值,则确定当前帧为 P帧;上述第四阈值为一个图像组的 P帧平均数据量 以及 B帧平均数据量的均值;
若当前帧非 I帧也非 P帧, 则确定当前帧为 B帧。
进一步地, 上述帧类型确定单元 602, 还用于若当前帧的数据量大于第二阈值, 则确定当前帧为 I帧; 上述第二阈值为当前帧之前的一个 I帧的数据量、 当前帧所在图像组中 P帧的平均数据量以及设定个数连续帧的平均数据量中的最大值。
进一步地, 上述帧类型确定单元 602, 还用于若当前帧与前一个 I帧的间隔超过固定间隔, 且当前帧的数据量大于第三阈值, 则确定当前帧为 I帧; 上述第三阈值根据当前帧所在图像组各帧的平均数据量、 当前帧的前一个 P帧的数据量、 当前帧所在图像组 I帧的数据量以及上一个 I帧到当前帧的距离与预期的固定 I帧间隔的远离程度计算得到; 或者, 上述第三阈值根据当前帧所在图像组各帧的平均数据量以及上一个 I帧到当前帧的距离与预期的固定 I帧间隔的远离程度计算得到。
进一步地,上述帧类型确定单元 602,还用于若当前帧的上一帧为 P帧且当 前帧的数据量大于第五阈值, 或者当前图像组存在 B帧且当前帧的数据量大于 第六阈值, 则确定当前帧为 P帧; 上述第五阈值为: 第一调节因子与当前帧所 在图像组的 P帧的平均数据量的积, 上述第一调节因子大于 0.5且小于 1; 上述 第六阈值为: P帧平均数据量和 B帧平均数据量的均值;
若当前帧的上一帧为 B帧且当前帧的数据量小于第七阈值, 或者当前图像组存在 P帧且当前帧的数据量小于第八阈值, 则确定当前帧为 P帧; 上述第七阈值为: 第二调节因子与当前帧所在图像组的 B帧的平均数据量的积, 上述第二调节因子大于 1小于 1.5; 上述第八阈值为: P帧平均数据量与 B帧平均数据量的均值。
进一步地, 如图 7所示, 上述装置还包括:
间隔获取单元 701 , 用于在帧类型判断结束后, 确定 I帧的固定间隔; 上述帧类型确定单元 602,还用于若在固定间隔达到后仍然没有判断存在 I 帧, 则将固定间隔处设定范围内的最大数据量的帧确定为 I帧;
第一更新单元 702,用于更新图像组中各种类型帧的平均数据量以及 I帧的 间隔参数。
进一步地, 如图 8所示, 上述装置还包括:
统计单元 801 , 用于在帧类型判断结束后, 统计连续的 B帧;
上述帧类型确定单元 602, 还用于若连续 B帧的数量大于预测值, 则将上述连续的 B帧中数据量最大的帧确定为 P帧; 上述预测值大于等于 3小于等于 7。 第二更新单元 802 , 用于更新图像组中各种类型帧的平均数据量。
进一步地, 如图 9所示, 上述装置还包括:
丟包类型确定单元 901 , 用于确定已经接收到的帧是否发生丟包, 若发生 丟包, 则确定丟包类型;
数据量确定单元 902, 用于若丟包类型为帧内丟包, 则计算帧数据量时确 定收到帧的数据量与丟包数据量的和为该帧的数据量;
若丟包类型为帧间丟包, 则确定丟包处之前的包的标志位是否为 1 , 若是, 则将丟包的数据量计算入后一帧, 否则将丟包的数据量平均分配给前后两帧。
需要说明的是, 本实施例的装置和图 5或图 6的装置是可以合并使用的, 帧类型确定单元 502与帧类型确定单元 602可以使用同一个功能单元实现。
以下是帧类型判断后的几种应用,可以理解的是帧类型确定后的应用举例 不应理解为穷举, 不对本发明实施例构成限定。
1. 根据判断出来的帧类型进行不等保护: 带宽受限时, 可根据不同帧类 型对视频质量影响的区別进行不等保护, 使得视频接收质量达到最优。
2. 用预期周期结合 GOP的平均码率可以实现视频快速浏览: 对于存储在 本地的码流用户不想浏览全部的视频, 可以通过快速的预处理, 提取出 I帧对 应位置从而实现快速流览。对于存储在服务器的码流, 用户不想浏览全部的视 频, 服务器可以通过快速的预处理, 提取出 I帧对应位置从而有选择的传输关 键帧信息给用户。
3. 服务质量(Quality of Service, QOS ): 在带宽不足时, 在中间节点, 可以根据判断出的帧类型, 智能丟弃一部分 B帧或者 P帧 (靠近 GOP结束的 P 帧), 使得降低码率的同时, 尽可能少的影响视频质量。
另外基于实验,对本发明实施例的技术方案的效果进行了测试, 以下是测 试结果。
本节的实验在没有丟包的情况下,对利用播放时间和不利用播放时间的两 种情况, 分別与背景技术中的 二进行了对比, 结果如表 1所示。
测试序列: 使用现网捕获的 TS码流以及定码率编码的码流进行测试, 如表一, 其中现网捕获的码流前三个( iptv137, iptv138, iptv139 )是载荷部分加密但 PES头部未加密的码流; 定码率编码的码流码率为( 1500, 3000, 4000, 5000, 6000, 7000, 9000, 12000, 15000 )。 选用的码流都为 H.264编码, 其帧类型分为 I、 P、 B三种, 且无分级 B帧。 下面给出以上序列的帧类型检测实验结果, 如表 2所示。
表 2 本文方法与现有方法检测结果对比

码流来源                  方法      I帧漏检率  I帧错检率  P->I出错率  P->B出错率  B->P出错率  总出错率
利用 PTS
  对截获的码流检测结果    现有方法  29.03%    7.09%     0.73%      0.00%      0.01%      1.65%
                          本文方法  15.19%    11.81%    1.40%      0.00%      0.01%      1.20%
  对自编的码流检测结果    现有方法  10.67%    63.16%    7.62%      0.00%      0.00%      3.08%
                          本文方法  10.77%    16.24%    2.08%      0.00%      0.00%      1.19%
不利用 PTS
  对截获的码流检测结果    现有方法  29.03%    7.13%     0.73%      9.57%      4.51%      9.47%
                          本文方法  15.12%    11.49%    1.40%      8.28%      4.14%      8.47%
  对自编的码流检测结果    现有方法  10.67%    64.90%    7.62%      6.15%      3.35%      7.31%
                          本文方法  11.93%    15.43%    1.96%      6.39%      1.75%      4.44%

如表二所示, 本实验比较了以下几个因子: I帧漏检率为漏检的 I帧与序列中 I帧总数的比值; I帧错检率为将 P或 B误判为 I帧的数目与 I帧总数的比值(值得注意的是, 绝大多数情况下都是只会将 P错判为 I, 很少情况下会将 B错判为 I, 这与 B帧码率远远小于 I帧码率的事实一致); P->I出错率为错将 P帧判为 I帧的数目与实际 P帧总数的比值; P->B出错率为错将 P帧判为 B帧的数目与实际 P帧总数的比值; B->P出错率为错将 B帧判为 P帧的数目与实际 B帧总数的比值; 总出错率为错判的数目与总帧数的比值(只要判断的帧类型与实际类型不符合即为错判)。 I帧漏检率与 I帧错检率平均值可以体现对于 I帧的正确检测概率。
由于利用 PTS判断 B帧的准确率是 100% , 因此不再单独比较利用播放时间和不利用播放时间的结果。 同时, 为了充分体现本发明实施例二的优越性, 在利用播放时间的情况下对现有方法也增加了利用播放时间判断 B帧的过程, 因此, 性能的不同主要来自利用帧数据量判断的方法的差异。 结果显示, 在可以利用播放时间判断帧类型以及不利用播放时间判断帧类型的情况下, 本方法对于现网截获的码流以及自编的码流都比现有方法好, 尤其对自编码流, 本方法检测效果更是明显, 甚至有些情况下可以无错, 而现有方法则很少存在无错的情况。
图 10至图 15给出了一些序列的详细检测结果, 其中实际的线条上用圆形标识, 预测的线条用三角形标识; 包括 I帧分布状况(横轴表示 I帧间隔, 间隔为 1表示相邻的两帧为 I, 间隔为 0表示 I帧间隔大于 49 , I帧预测周期是本文方法预测的 I帧周期, I帧实际周期是实际的 I帧周期)以及帧类型的分布情况(图中表格里, 矩阵对角线为正确判断的帧数, 其他位置为错判)。 图标题为序列名 +总帧数 +总错检率。 可见现网的序列一般都是存在一个固定的 I帧间隔(图中最大值), 伴随着场景的切换, 会自适应地插入一些 I帧, 从而造成了在最大值附近的一个扰动, 形成了图中的 I帧分布状况。 对于 FIFA序列(图 13), 可以看到实际周期中存在两个极大值, 本文算法也能较精确地分辨出两个极大值。 按照本文算法估计出的预期 I帧间隔与实际 I帧间隔很相似, 因此可以用来指导快速浏览时的跳帧。 图 10: iptv137 15861 (error 0.6%), 结果如表 3所示:
图 11 : iptv138 17320 (error 0.1%), 结果如表 4所示:
表 4
图 12: song 38741 (error 0.9%), 结果如表 5所示:
表 5
图 13: FIFA 9517 (error 1.3%), 结果如表 6所示:
表 6
FIFA        检测为 P   检测为 B   检测为 I
实际类型 P     4267        0        21
实际类型 B        0     4693         0
实际类型 I      106        0       430
图 14: travel 1486 (error 0.8%), 结果如表 7所示:
图 15: sport 1156 (error 0.3%), 结果如表 8所示:
本领域普通技术人员可以理解实现上述实施例方法中的全部或部分步骤 是可以通过程序来指令相关的硬件完成,上述的程序可以存储于一种计算机可 读存储介质中, 上述提到的存储介质可以是只读存储器, 磁盘或光盘等。
以上对本发明实施例所提供的帧类型的检测方法和装置进行了详细介绍, 说明只是用于帮助理解本发明的方法及其核心思想; 同时,对于本领域的一般 技术人员,依据本发明的思想,在具体实施方式及应用范围上均会有改变之处, 综上, 本说明书内容不应理解为对本发明的限制。

Claims

权 利 要 求
1、 一种帧类型的检测方法, 其特征在于, 包括:
检测各帧的播放时间;
若当前帧的播放时间小于已经接收到的帧的最大播放时间,则确定所述当 前帧为双向预测编码帧 B帧。
2、 根据权利要求 1所述方法, 其特征在于, 帧类型检测结束后, 还包括: 依据各帧的播放顺序和编码顺序确定 B帧在分级编码中所属的层级。
3、 根据权利要求 1所述方法, 其特征在于, 若检测播放时间失败还包括: 获得接收到的帧所在码流的编码类型, 所述编码类型包括: 开环编码和闭 环编码;
若当前帧的数据量大于第一阈值则确定当前帧为明显的帧内编码帧 I帧, 所述第一阈值由设定连续个数的帧的平均数据量以及 I帧数据量计算得到; 若当前帧的前一帧为 I帧、 编码类型为闭环编码且当前帧为非明显的 I帧, 或者, 若当前帧的前一帧为 I帧、 编码类型为开环编码且当前帧的数据量大于第四阈值, 则确定当前帧为单向预测编码帧 P帧; 所述第四阈值为一个图像组的 P帧平均数据量以及 B帧平均数据量的均值;
若当前帧非 I帧也非 P帧, 则确定当前帧为 B帧。
4、根据权利要求 3所述方法, 其特征在于, 所述获得已经接收到的帧所在 码流的编码类型包括:
统计明显的 I帧后一帧的类型,若为 P帧的比例达到设定比例则确定编码类 型为闭环编码, 否则为开环编码。
5、 根据权利要求 3所述方法, 其特征在于, 还包括:
若当前帧的数据量大于第二阈值, 则确定当前帧为 I帧; 所述第二阈值为 当前帧之前的一个 I帧的数据量、 当前帧所在图像组中 P帧的平均数据量以及设 定个数连续帧的平均数据量中的最大值。
6、 根据权利要求 3所述方法, 其特征在于, 还包括:
若当前帧与前一个 I帧的间隔超过固定间隔, 且当前帧的数据量大于第三 阈值, 则确定当前帧为 I帧; 所述第三阈值根据当前帧所在图像组各帧的平均 数据量、 当前帧的前一个 P帧的数据量以及当前帧所在图像组 I帧的数据量、 上 一个 I帧到当前帧的距离与预期的固定 I帧间隔的远离程度计算得到; 或者, 所 述第三阈值根据当前帧所在图像组各帧的平均数据量以及上一个 I帧到当前帧 的距离与预期的固定 I帧间隔的远离程度计算得到。
7、 根据权利要求 3所述方法, 其特征在于, 还包括:
若当前帧的上一帧为 P帧且当前帧的数据量大于第五阈值, 或者当前图像 组存在 B帧且当前帧的数据量大于第六阈值, 则确定当前帧为 P帧; 所述第五 阈值为: 第一调节因子与当前帧所在图像组的 P帧的平均数据量的积, 所述第 一调节因子大于 0.5且小于 1; 所述第六阈值为: P帧平均数据量和 B帧平均数据 量的均值;
若当前帧的上一帧为 B帧且当前帧的数据量小于第七阈值, 或者当前图像 组存在 P帧且当前帧的数据量小于第八阈值,则确定当前帧为 P帧;所述第七阈 值为: 第二调节因子与当前帧所在图像组的 B帧的平均数据量的积, 所述第二 调节因子大于 1小于 1.5; 所述第八阈值为: P帧平均数据量与 B帧平均数据量的 均值。
8、 根据权利要求 3至 7任意一项所述方法, 其特征在于, 还包括: 在帧类型判断结束后, 确定 I帧的固定间隔, 若在固定间隔达到后仍然没 有判断存在 I帧, 则将固定间隔处设定范围内的最大数据量的帧确定为 I帧; 并 更新图像组中各种类型帧的平均数据量以及 I帧的间隔参数。
9、 根据权利要求 3至 7任意一项所述方法, 其特征在于, 还包括: 在帧类型判断结束后, 统计连续的 B帧, 若连续 B帧的数量大于预测值, 则将所述连续的 B帧中数据量最大的帧确定为 P帧; 并更新图像组中各种类型 帧的平均数据量; 所述预测值大于等于 3小于等于 7。
10、 根据权利要求 3至 7任意一项所述方法, 其特征在于, 还包括: 确定已经接收到的帧是否发生丟包, 若发生丟包, 则确定丟包类型; 若丟包类型为帧内丟包,则计算帧数据量时确定收到帧的数据量与丟包数 据量的和为该帧的数据量;
若丟包类型为帧间丟包, 则确定丟包处之前的包的标志位是否为 1 , 若是, 则将丟包的数据量计算入后一帧, 否则将丟包的数据量平均分配给前后两帧。
11、 一种帧类型的检测方法, 其特征在于, 包括:
获得接收到的帧所在码流的编码类型, 所述编码类型包括: 开环编码和闭环编码; 若当前帧的数据量大于第一阈值则确定当前帧为明显的帧内编码帧 I帧, 所述第一阈值由设定连续个数的帧的平均数据量以及 I帧数据量计算得到; 若当前帧的前一帧为 I帧、 编码类型为闭环编码且当前帧为非明显的 I帧, 或者, 若当前帧的前一帧为 I帧、 编码类型为开环编码且当前帧的数据量大于第四阈值, 则确定当前帧为单向预测编码帧 P帧; 所述第四阈值为一个图像组的 P帧平均数据量以及 B帧平均数据量的均值;
若当前帧非 I帧也非 P帧, 则确定当前帧为 B帧。
12、 根据权利要求 11所述方法, 其特征在于, 所述获得已经接收到的帧所 在码流的编码类型包括:
统计明显的 I帧后一帧的类型,若为 P帧的比例达到设定比例则确定编码类 型为闭环编码, 否则为开环编码。
13、 根据权利要求 11所述方法, 其特征在于, 还包括:
若当前帧的数据量大于第二阈值, 则确定当前帧为 I帧; 所述第二阈值为 当前帧之前的一个 I帧的数据量、 当前帧所在图像组中 P帧的平均数据量以及设 定个数连续帧的平均数据量中的最大值。
14、 根据权利要求 11所述方法, 其特征在于, 还包括:
若当前帧与前一个 I帧的间隔超过固定间隔, 且当前帧的数据量大于第三 阈值, 则确定当前帧为 I帧; 所述第三阈值根据当前帧所在图像组各帧的平均 数据量、 当前帧的前一个 P帧的数据量以及当前帧所在图像组 I帧的数据量、 上 一个 I帧到当前帧的距离与预期的固定 I帧间隔的远离程度计算得到; 或者, 所 述第三阈值根据当前帧所在图像组各帧的平均数据量以及上一个 I帧到当前帧 的距离与预期的固定 I帧间隔的远离程度计算得到。
15、 根据权利要求 11所述方法, 其特征在于, 还包括:
若当前帧的上一帧为 P帧且当前帧的数据量大于第五阈值, 或者当前图像 组存在 B帧且当前帧的数据量大于第六阈值, 则确定当前帧为 P帧; 所述第五 阈值为: 第一调节因子与当前帧所在图像组的 P帧的平均数据量的积, 所述第 一调节因子大于 0.5且小于 1; 所述第六阈值为: P帧平均数据量和 B帧平均数据 量的均值;
若当前帧的上一帧为 B帧且当前帧的数据量小于第七阈值, 或者当前图像 组存在 P帧且当前帧的数据量小于第八阈值,则确定当前帧为 P帧;所述第七阈 值为: 第二调节因子与当前帧所在图像组的 B帧的平均数据量的积, 所述第二 调节因子大于 1小于 1.5; 所述第八阈值为: P帧平均数据量与 B帧平均数据量的 均值。
16、 根据权利要求 11至 15任意一项所述方法, 其特征在于, 还包括: 在帧类型判断结束后, 确定 I帧的固定间隔, 若在固定间隔达到后仍然没 有判断存在 I帧, 则将固定间隔处设定范围内的最大数据量的帧确定为 I帧; 并 更新图像组中各种类型帧的平均数据量以及 I帧的间隔参数。
17、 根据权利要求 11至 15任意一项所述方法, 其特征在于, 还包括: 在帧类型判断结束后, 统计连续的 B帧, 若连续 B帧的数量大于预测值, 则将所述连续的 B帧中数据量最大的帧确定为 P帧; 并更新图像组中各种类型 帧的平均数据量; 所述预测值大于等于 3小于等于 7。
18、 根据权利要求 11至 15任意一项所述方法, 其特征在于, 还包括: 确定已经接收到的帧是否发生丟包, 若发生丟包, 则确定丟包类型; 若丟包类型为帧内丟包,则计算帧数据量时确定收到帧的数据量与丟包数 据量的和为该帧的数据量;
若丟包类型为帧间丟包, 则确定丟包处之前的包的标志位是否为 1 , 若是, 则将丟包的数据量计算入后一帧, 否则将丟包的数据量平均分配给前后两帧。
19、 根据权利要求 18所述方法, 其特征在于, 还包括:
通过统计已经检测出的帧类型预测编码结构;
若丟包类型为帧间丟包,丟包处之前的包的标志位无法检测, 则依据预测 的编码结构以及丟包的位置分割当前数据长度。
20、 一种帧类型的检测装置, 其特征在于, 包括:
时间检测单元, 用于检测各帧的播放时间;
帧类型确定单元,用于若当前帧的播放时间小于已经接收到的帧的最大播 放时间, 则确定所述当前帧为双向预测编码 B帧。
21、 根据权利要求 20所述装置, 其特征在于, 还包括:
层级确定单元, 用于依据各帧的播放顺序和编码顺序确定 B帧在分级编码 中所属的层级。
22、 一种帧类型的检测装置, 其特征在于, 包括:
类型获得单元, 用于获得已经接收到的帧所在码流的编码类型, 所述编码 类型包括: 开环编码和闭环编码;
帧类型确定单元, 用于若当前帧的数据量大于第一阈值则确定当前帧为明显的 I帧, 所述第一阈值由设定连续个数的帧的平均数据量以及 I帧数据量计算得到;
若当前帧的前一帧为 I帧、 编码类型为闭环编码且当前帧为非明显的 I帧, 或者, 若当前帧的前一帧为 I帧、 编码类型为开环编码且当前帧的数据量大于 第四阈值,则确定当前帧为 P帧;所述第四阈值为一个图像组的 P帧平均数据量 以及 B帧平均数据量的均值;
若当前帧非 I帧也非 P帧, 则确定当前帧为 B帧。
23、 根据权利要求 22所述装置, 其特征在于,
所述帧类型确定单元,还用于若当前帧的数据量大于第二阈值, 则确定当 前帧为 I帧; 所述第二阈值为当前帧之前的一个 I帧的数据量、 当前帧所在图像 组中 P帧的平均数据量以及设定个数连续帧的平均数据量中的最大值。
24、 根据权利要求 22所述装置, 其特征在于,
所述帧类型确定单元,还用于若当前帧与前一个 I帧的间隔超过固定间隔, 且当前帧的数据量大于第三阈值, 则确定当前帧为 I帧; 所述第三阈值为: 当 前帧所在图像组各帧的平均数据量、 当前帧的前一个 P帧的数据量以及当前帧 所在图像组 I帧的数据量、上一个 I帧到当前帧的距离与预期的固定 I帧间隔的远 离程度计算得到; 或者, 所述第三阈值根据当前帧所在图像组各帧的平均数据 量以及上一个 I帧到当前帧的距离与预期的固定 I帧间隔的远离程度计算得到。
25、 根据权利要求 22所述装置, 其特征在于,
所述帧类型确定单元, 还用于若当前帧的上一帧为 P帧且当前帧的数据量 大于第五阈值, 或者当前图像组存在 B帧且当前帧的数据量大于第六阈值, 则 确定当前帧为 P帧; 所述第五阈值为: 第一调节因子与当前帧所在图像组的 P 帧的平均数据量的积, 所述第一调节因子大于 0.5且小于 1; 所述第六阈值为: P帧平均数据量和 B帧平均数据量的均值;
若当前帧的上一帧为 B帧且当前帧的数据量小于第七阈值, 或者当前图像组存在 P帧且当前帧的数据量小于第八阈值, 则确定当前帧为 P帧; 所述第七阈值为: 第二调节因子与当前帧所在图像组的 B帧的平均数据量的积, 所述第二调节因子大于 1小于 1.5; 所述第八阈值为: P帧平均数据量与 B帧平均数据量的均值。
26、 根据权利要求 22至 25任意一项所述装置, 其特征在于, 还包括: 间隔获取单元, 用于在帧类型判断结束后, 确定 I帧的固定间隔; 所述帧类型确定单元,还用于若在固定间隔达到后仍然没有判断存在 I帧, 则将固定间隔处设定范围内的最大数据量的帧确定为 I帧;
第一更新单元, 用于更新图像组中各种类型帧的平均数据量以及 I帧的间 隔参数。
27、 根据权利要求 22至 25任意一项所述装置, 其特征在于, 还包括: 统计单元, 用于在帧类型判断结束后, 统计连续的 B帧;
所述帧类型确定单元, 还用于若连续 B帧的数量大于预测值, 则将所述连续的 B帧中数据量最大的帧确定为 P帧; 所述预测值大于等于 3小于等于 7;
第二更新单元, 用于更新图像组中各种类型帧的平均数据量。
28、 根据权利要求 22至 25任意一项所述装置, 其特征在于, 还包括: 丟包类型确定单元,用于确定已经接收到的帧是否发生丟包,若发生丟包, 则确定丟包类型;
数据量确定单元, 用于若丟包类型为帧内丟包, 则计算帧数据量时确定收 到帧的数据量与丟包数据量的和为该帧的数据量;
若丟包类型为帧间丟包, 则确定丟包处之前的包的标志位是否为 1 , 若是, 则将丟包的数据量计算入后一帧, 否则将丟包的数据量平均分配给前后两帧。
PCT/CN2011/080343 2010-12-17 2011-09-29 帧类型的检测方法和装置 WO2012079406A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP11848791.7A EP2637410B1 (en) 2010-12-17 2011-09-29 Detection method and device for frame type
US13/919,674 US9497459B2 (en) 2010-12-17 2013-06-17 Method and apparatus for detecting frame types

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201010594322.5A CN102547300B (zh) 2010-12-17 2010-12-17 帧类型的检测方法和装置
CN201010594322.5 2010-12-17

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US13/919,674 Continuation US9497459B2 (en) 2010-12-17 2013-06-17 Method and apparatus for detecting frame types

Publications (1)

Publication Number Publication Date
WO2012079406A1 true WO2012079406A1 (zh) 2012-06-21

Family

ID=46244056

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2011/080343 WO2012079406A1 (zh) 2010-12-17 2011-09-29 帧类型的检测方法和装置

Country Status (4)

Country Link
US (1) US9497459B2 (zh)
EP (2) EP2637410B1 (zh)
CN (1) CN102547300B (zh)
WO (1) WO2012079406A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2830317A4 (en) * 2012-08-21 2015-07-08 Huawei Tech Co Ltd METHOD AND DEVICE FOR DETECTING IMAGE TYPES AND IMAGE SIZES IN A VIDEO POWER
CN111917661A (zh) * 2020-07-29 2020-11-10 北京字节跳动网络技术有限公司 数据传输方法、装置、电子设备和计算机可读存储介质

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160234528A1 (en) * 2015-02-09 2016-08-11 Arris Enterprises, Inc. Carriage of video coding for browsers (vcb) video over mpeg-2 transport streams
KR101687115B1 (ko) * 2015-09-09 2016-12-15 한국과학기술원 통신 시스템의 암호화 패킷 전송 방법
CN108024111B (zh) * 2016-10-28 2019-12-06 北京金山云网络技术有限公司 一种帧类型判定方法及装置
CN107566844B (zh) * 2017-09-05 2019-05-14 成都德芯数字科技股份有限公司 一种隐藏字幕处理方法和装置
CN109492408B (zh) * 2017-09-13 2021-06-18 杭州海康威视数字技术股份有限公司 一种加密数据的方法及装置
CN111901605B (zh) * 2019-05-06 2022-04-29 阿里巴巴集团控股有限公司 视频处理方法、装置、电子设备及存储介质
CN112102400B (zh) * 2020-09-15 2022-08-02 上海云绅智能科技有限公司 基于距离的闭环检测方法、装置、电子设备和存储介质
CN112689195B (zh) * 2020-12-22 2023-04-11 中国传媒大学 视频加密方法、分布式加密系统、电子设备及存储介质
US11876620B1 (en) * 2022-08-23 2024-01-16 Hewlett Packard Enterprise Development Lp Error correction for decoding frames

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1379937A (zh) * 1999-10-15 2002-11-13 艾利森电话股份有限公司 在采用可变比特率的系统中健壮的帧类型保护方法与系统
EP1608087A1 (en) * 2004-05-18 2005-12-21 Motorola, Inc. Wireless communication unit and method for acquiring synchronisation therein
CN101626507A (zh) * 2008-07-07 2010-01-13 华为技术有限公司 一种识别rtp包的帧类型的方法、装置及系统

Family Cites Families (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1067831C (zh) 1998-07-15 2001-06-27 国家科学技术委员会高技术研究发展中心 Mpeg-2视频解码器及其输入缓冲器的控制方法
KR100531854B1 (ko) * 2003-05-07 2005-11-30 엘지전자 주식회사 영상 코덱의 프레임 타입 오인식 방지 방법
CN1722830A (zh) * 2004-07-13 2006-01-18 皇家飞利浦电子股份有限公司 一种编码数据的解码方法和装置
JP4375305B2 (ja) * 2004-10-26 2009-12-02 ソニー株式会社 情報処理装置および情報処理方法、記録媒体、並びに、プログラム
KR100801002B1 (ko) 2006-06-05 2008-02-11 삼성전자주식회사 무선 네트워크 상에서 멀티미디어 데이터를 전송/재생하는방법, 및 그 방법을 이용한 무선 기기
EP2213000B1 (en) 2007-07-16 2014-04-02 Telchemy, Incorporated Method and system for content estimation of packet video streams
WO2009025357A1 (ja) 2007-08-22 2009-02-26 Nippon Telegraph And Telephone Corporation 映像品質推定装置、映像品質推定方法、フレーム種別判定方法、および記録媒体
US8335262B2 (en) * 2008-01-16 2012-12-18 Verivue, Inc. Dynamic rate adjustment to splice compressed video streams
JP5524193B2 (ja) * 2008-06-16 2014-06-18 テレフオンアクチーボラゲット エル エム エリクソン(パブル) メディアストリーム処理
CN101426137B (zh) * 2008-11-25 2011-08-03 上海华为技术有限公司 一种视频帧类型的识别方法和装置
CN101518657A (zh) 2008-12-31 2009-09-02 上海序参量科技发展有限公司 扇形环境污染消除装置
CN101651815B (zh) 2009-09-01 2012-01-11 中兴通讯股份有限公司 一种可视电话及利用其提高视频质量方法
EP2413535B1 (en) * 2010-07-30 2012-09-19 Deutsche Telekom AG Method for estimating the type of the group of picture structure of a plurality of video frames in a video stream
CN103634698B (zh) * 2012-08-21 2014-12-03 华为技术有限公司 视频流的帧类型检测、帧大小检测方法及装置

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1379937A (zh) * 1999-10-15 2002-11-13 艾利森电话股份有限公司 在采用可变比特率的系统中健壮的帧类型保护方法与系统
EP1608087A1 (en) * 2004-05-18 2005-12-21 Motorola, Inc. Wireless communication unit and method for acquiring synchronisation therein
CN101626507A (zh) * 2008-07-07 2010-01-13 华为技术有限公司 一种识别rtp包的帧类型的方法、装置及系统

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2830317A4 (en) * 2012-08-21 2015-07-08 Huawei Tech Co Ltd METHOD AND DEVICE FOR DETECTING IMAGE TYPES AND IMAGE SIZES IN A VIDEO POWER
JP2015524188A (ja) * 2012-08-21 2015-08-20 ▲ホア▼▲ウェイ▼技術有限公司 ビデオ・ストリームのフレーム・タイプ検出方法および装置ならびにフレーム・サイズ検出方法および装置
US9571862B2 (en) 2012-08-21 2017-02-14 Huawei Technologies Co., Ltd. Frame type detection and frame size detection methods and apparatuses for video stream
EP3203733A1 (en) * 2012-08-21 2017-08-09 Huawei Technologies Co., Ltd. Frame type detection method for video stream
CN111917661A (zh) * 2020-07-29 2020-11-10 北京字节跳动网络技术有限公司 数据传输方法、装置、电子设备和计算机可读存储介质
CN111917661B (zh) * 2020-07-29 2023-05-02 抖音视界有限公司 数据传输方法、装置、电子设备和计算机可读存储介质

Also Published As

Publication number Publication date
EP2814255B1 (en) 2017-01-04
US9497459B2 (en) 2016-11-15
US20130279585A1 (en) 2013-10-24
EP2637410A1 (en) 2013-09-11
EP2814255A3 (en) 2015-06-03
EP2637410B1 (en) 2020-09-09
CN102547300A (zh) 2012-07-04
EP2814255A2 (en) 2014-12-17
CN102547300B (zh) 2015-01-21
EP2637410A4 (en) 2014-07-30

Similar Documents

Publication Publication Date Title
WO2012079406A1 (zh) 帧类型的检测方法和装置
KR101414435B1 (ko) 비디오 스트림 품질 평가 방법 및 장치
EP2524515B1 (en) Technique for video quality estimation
US10165310B2 (en) Transcoding using time stamps
EP2413612B1 (en) Methods and apparatuses for temporal synchronization between the video bit stream and the output video sequence
US9723329B2 (en) Method and system for determining a quality value of a video stream
CN107770538B (zh) 一种检测场景切换帧的方法、装置和系统
WO2010060376A1 (zh) 一种视频帧类型的识别方法和装置
EP2404451B1 (en) Processing of multimedia data
CN103716640B (zh) 帧类型的检测方法和装置
US8565083B2 (en) Thinning of packet-switched video data
KR101373414B1 (ko) 무선망에서의 mpeg-2 ts 기반 h.264/avc 비디오 전송 품질 향상을 위한 패킷 기반 비디오 스트리밍 우선 전송 방법 및 시스템
Zhang et al. Packet-layer model for quality assessment of encrypted video in IPTV services

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 11848791

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 2011848791

Country of ref document: EP

NENP Non-entry into the national phase

Ref country code: DE