WO2021114305A1 - Video processing method and apparatus, and computer readable storage medium - Google Patents


Info

Publication number
WO2021114305A1
Authority
WO
WIPO (PCT)
Prior art keywords
frame
gop
video
index information
frames
Prior art date
Application number
PCT/CN2019/125411
Other languages
French (fr)
Chinese (zh)
Inventor
杨胜凯 (Yang Shengkai)
刘俊 (Liu Jun)
杨海涛 (Yang Haitao)
陈绍林 (Chen Shaolin)
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority date
Filing date
Publication date
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority to CN201980086119.6A (published as CN113261283A)
Priority to PCT/CN2019/125411 (published as WO2021114305A1)
Publication of WO2021114305A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/103 Selection of coding mode or of prediction mode
    • H04N19/114 Adapting the group of pictures [GOP] structure, e.g. number of B-frames between two anchor frames
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/177 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being a group of pictures [GOP]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction

Definitions

  • the present invention relates to the field of Internet technology, in particular to video processing methods, devices and computer storage media.
  • a GOP is a group of consecutive pictures (i.e., frames).
  • after a computing device receives a video, it needs to decode and play the video.
  • when a computing device receives a playback request triggered by the user dragging the progress bar of a video, it acquires, in response to the request, the GOPs that make up the video starting from the drag stop position, and decodes and plays each GOP.
  • if the target frame pointed to by the drag stop position is a non-I frame, the computing device needs to search several frames before or after the target frame for an I frame, so that decoding and playback can start from that I frame.
  • if the distance between the I frame and the target frame is large, video processing efficiency drops and the user's viewing experience suffers.
  • alternatively, the computing device can discard the GOP containing the target frame and move on to decoding and playing the next GOP. This discards some important video information and also affects the user's viewing experience.
  • the embodiments of the present invention disclose a video processing method, a device, and a computer-readable storage medium, which can solve the problems of reduced video processing efficiency and loss of important video information in existing solutions.
  • an embodiment of the present invention discloses a video processing method applied to a computing device.
  • the method includes: acquiring a group of pictures (GOP) in a video, where the first frame of the GOP is a first I frame and the GOP includes M frames, M being a positive integer; determining whether the M frames include a virtual intra-coded (VI) frame; and, when the M frames include a VI frame, inserting a second I frame before the VI frame. The second I frame is the frame referenced by the VI frame during video decoding.
  • inserting the second I frame before the VI frame allows the video to be subsequently decoded and played starting from the second I frame. This addresses the prior-art problems of reduced video processing efficiency, loss of important video information, and wasted storage resources on the computing device, and thereby helps to improve video processing efficiency.
  • in response to a video playback request, when the computing device determines that the start time of the video specified in the request is after the second I frame in the GOP, it starts decoding and playing the video from the second I frame.
  • in this way, in a video playback scenario, the video can be decoded and played starting from the second I frame.
  • compared with starting decoding from the first I frame of the GOP, this saves video decoding time and improves video processing efficiency.
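As a rough illustration of why starting from the inserted I frame shortens decoding, the sketch below picks the decoding start point for a requested time as the nearest I frame at or before it. It assumes a simplified GOP given as parallel lists of frame-type labels and playback times, with decoding order equal to display order (no B-frame reordering); `decode_start_index` is a hypothetical helper, not part of the patent.

```python
from bisect import bisect_right
from typing import Sequence


def decode_start_index(kinds: Sequence[str], times: Sequence[float], start_time: float) -> int:
    """Index of the I frame from which decoding must start to reach start_time.

    kinds: frame-type labels ("I", "P", "VI") in decoding order.
    times: playback time of each frame in seconds, non-decreasing.
    """
    # Last frame whose playback time is at or before the requested start time.
    target = max(bisect_right(times, start_time) - 1, 0)
    # Walk back to the nearest I frame, whether original or inserted.
    for k in range(target, -1, -1):
        if kinds[k] == "I":
            return k
    raise ValueError("GOP does not start with an I frame")


# Without an inserted I frame, reaching t = 5 s means decoding from frame 0.
original = ["I", "P", "P", "P", "VI", "P", "P"]
print(decode_start_index(original, [0, 1, 2, 3, 4, 5, 6], 5))  # 0

# With a second I frame inserted before the VI frame (sharing its timestamp),
# decoding can start at index 4 instead.
inserted = ["I", "P", "P", "P", "I", "VI", "P", "P"]
print(decode_start_index(inserted, [0, 1, 2, 3, 4, 4, 5, 6], 5))  # 4
```

In the second case only three frames (the inserted I frame, the VI frame and one P frame) are decoded to reach the requested time, instead of six.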
  • the second I frame is the frame immediately preceding the VI frame.
  • the GOP further includes index information of the GOP, and the index information records the storage address of the second I frame.
  • before inserting the second I frame before the VI frame, the computing device can obtain the second I frame from its storage address according to the index information of the GOP.
  • the index information of the VI frame is inserted in the VI frame after the second I frame.
  • the index information of the VI frame is used to point to the second I frame.
  • the computing device may obtain the second I frame pointed to by the index information according to the index information of the VI frame.
  • the computing device can obtain the second I frame to be inserted according to either the index information of the GOP or the index information of the VI frame. This makes it convenient to subsequently insert the second I frame before the VI frame, so that the video can be decoded faster.
  • the second I frame is only used for decoding the VI frame, and is not used for output display.
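The insertion step can be sketched as follows. This is a minimal illustration under assumptions: frames are modelled by a hypothetical `Frame` record, the index information is reduced to a dictionary from a VI frame's position to the storage address of the I frame it references, and storage is a dictionary from address to encoded bytes. `insert_reference_i_frames` is an illustrative name, not taken from the source.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Frame:
    kind: str              # "I", "P" or "VI"
    pts: float             # playback time in seconds
    data: bytes = b""      # encoded frame payload
    display: bool = True   # False for frames used only as decoding references


def insert_reference_i_frames(
    gop: List[Frame],
    vi_index: Dict[int, int],
    storage: Dict[int, bytes],
) -> List[Frame]:
    """Return a copy of `gop` with a non-displayed I frame before each indexed VI frame.

    vi_index: position of a VI frame in `gop` -> storage address of its reference I frame.
    storage:  storage address -> encoded I frame bytes.
    """
    out: List[Frame] = []
    for pos, frame in enumerate(gop):
        if frame.kind == "VI" and pos in vi_index:
            i_data = storage[vi_index[pos]]
            # The inserted I frame only serves as the VI frame's reference; it is not output.
            out.append(Frame("I", frame.pts, i_data, display=False))
        out.append(frame)
    return out


gop = [Frame("I", 0.0, b"I0"), Frame("P", 1.0), Frame("VI", 2.0), Frame("P", 3.0)]
new_gop = insert_reference_i_frames(gop, vi_index={2: 0x1000}, storage={0x1000: b"I0"})
print([f.kind for f in new_gop])  # ['I', 'P', 'I', 'VI', 'P']
```

Marking the inserted frame with `display=False` mirrors the statement that the second I frame is used only for decoding the VI frame and not for output display.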
  • the GOP includes at least one network abstraction layer unit (NALU), and the computing device determines whether the M frames include a VI frame by recognizing whether the GOP includes a supplemental enhancement information (SEI) NALU.
  • the SEI NALU is used to indicate that the frame in which the i-th NALU before the SEI NALU is located is a VI frame, or indicates that the frame in which the j-th NALU after the SEI NALU is located is a VI frame.
  • the computing device can recognize the VI frames included in the GOP by recognizing the SEI NALU in the GOP, which can improve the convenience and efficiency of VI frame recognition.
  • the GOP includes reference picture set (RPS) information for each frame.
  • the computing device determines whether the M frames include a VI frame by identifying the RPS information of each frame in the GOP. A frame is a VI frame when its RPS information indicates that an I frame is referenced when decoding the frame, and the frame immediately preceding it is a non-I frame.
  • because the computing device reads the RPS information in each frame directly to determine whether the frame is a VI frame, the accuracy of VI frame recognition can be improved.
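The RPS-based rule can be expressed as a short check. The sketch assumes that each frame's RPS has already been parsed into a list of the frame indices it actually references; real RPS syntax (for example in H.265) is considerably richer, and `find_vi_frames` is a hypothetical helper.

```python
from typing import List, Sequence


def find_vi_frames(kinds: Sequence[str], refs: Sequence[Sequence[int]]) -> List[int]:
    """Return the indices of frames that satisfy the VI-frame rule.

    kinds[k]: "I", "P" or "B" for frame k.
    refs[k]:  indices of the frames that frame k references when it is decoded
              (a simplified stand-in for its RPS information).
    """
    vi = []
    for k in range(1, len(kinds)):
        if kinds[k] == "I":
            continue
        refers_to_i = any(kinds[r] == "I" for r in refs[k])
        # The rule only applies when the immediately preceding frame is not an I frame;
        # otherwise the frame is an ordinary P frame following its I frame.
        if refers_to_i and kinds[k - 1] != "I":
            vi.append(k)
    return vi


kinds = ["I", "P", "P", "P", "P"]
refs = [[], [0], [1], [0], [3]]
print(find_vi_frames(kinds, refs))  # [3]
```

Frame 1 also references the I frame, but it directly follows it, so under this rule it is treated as an ordinary P frame.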
  • the computing device receives a video processing request, the video processing request carries the start time of the video, and the video includes at least one GOP.
  • the GOP corresponding to the start time is obtained from the GOP index table.
  • at least one mapping relationship is recorded in the GOP index table, and the mapping relationship is that each GOP corresponds to the index information of the GOP, and the index information of the GOP includes the start time of the GOP.
  • the video processing request includes a video playback request or a video download request.
  • the computing device may respond to the video playback request and obtain the GOP where the start time is located from the GOP index table.
  • when the video processing request is a video download request, in response to it, at least one GOP, starting from the GOP where the start time is located, is obtained from the GOP index table.
  • in this way, the computing device can obtain the relevant GOPs of the video for processing according to the application scenario, which helps to match video processing to the actual needs of the device.
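A GOP index table of this kind can be sketched as a list of (start time, storage address) entries sorted by start time and searched with binary search. The entry layout, the `"play"`/`"download"` request labels and the function name are assumptions made for illustration.

```python
from bisect import bisect_right
from typing import List, NamedTuple


class GopEntry(NamedTuple):
    start_time: float  # start time of the GOP in seconds (from its index information)
    address: int       # storage address of the GOP


def gops_for_request(table: List[GopEntry], start_time: float, kind: str) -> List[GopEntry]:
    """Look up GOPs in an index table sorted by start_time.

    kind == "play":     only the GOP containing start_time.
    kind == "download": that GOP and every GOP after it.
    """
    starts = [entry.start_time for entry in table]
    pos = bisect_right(starts, start_time) - 1
    if pos < 0:
        raise ValueError("start time precedes the first GOP")
    if kind == "play":
        return [table[pos]]
    if kind == "download":
        return table[pos:]
    raise ValueError(f"unknown request kind: {kind!r}")


table = [GopEntry(0.0, 0), GopEntry(10.0, 5000), GopEntry(20.0, 9000)]
print(gops_for_request(table, 12.5, "play"))
print(gops_for_request(table, 12.5, "download"))
```

Binary search keeps each lookup logarithmic in the number of GOPs, which matters for long videos with many index entries.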
  • the index information of the GOP also includes the playing time of the frame.
  • the VI frame is the VI frame in the GOP whose playback time has the smallest difference from the start time of the GOP.
  • the computing device can therefore insert the second I frame only before the VI frame closest to the playback time, rather than inserting an I frame for every VI frame in the GOP, which saves device resources and improves video processing efficiency.
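Selecting the single VI frame that receives an I frame amounts to a minimum-distance search over the VI frames' playback times. In this sketch `t` stands for the reference time the text compares against; the function name and list-based GOP representation are assumptions.

```python
from typing import Optional, Sequence


def nearest_vi_frame(kinds: Sequence[str], pts: Sequence[float], t: float) -> Optional[int]:
    """Index of the VI frame whose playback time is closest to t, or None if there is none."""
    candidates = [k for k, kind in enumerate(kinds) if kind == "VI"]
    if not candidates:
        return None
    # min() keeps the earlier frame when two VI frames are equally close.
    return min(candidates, key=lambda k: abs(pts[k] - t))


kinds = ["I", "P", "VI", "P", "VI", "P"]
pts = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
print(nearest_vi_frame(kinds, pts, 3.8))  # 4
```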
  • an embodiment of the present invention provides a video processing device, which includes a functional module or unit for executing the method described in the first aspect or any possible implementation manner of the first aspect.
  • an embodiment of the present invention provides a computing device, including a processor, a memory, a communication interface, and a bus. The processor, the communication interface, and the memory communicate with each other through the bus; the communication interface is used to receive and send data; the memory is used to store instructions; and the processor is used to call the instructions in the memory to execute the method described in the first aspect or any possible implementation of the first aspect.
  • a computer-readable storage medium is provided, which stores instructions for executing the method described in the first aspect.
  • a computer program product is provided which, when run on a computer, causes the computer to execute the method described in the first aspect.
  • a chip product is provided to implement the foregoing first aspect or the method in any possible implementation manner of the first aspect.
  • FIG. 1 is a schematic structural diagram of a GOP provided by an embodiment of the present invention.
  • FIG. 2 is a schematic structural diagram of a NALU provided by an embodiment of the present invention.
  • FIG. 3 is a schematic structural diagram of an SEI NALU provided by an embodiment of the present invention.
  • FIG. 4 is a schematic structural diagram of a video processing system provided by an embodiment of the present invention.
  • FIG. 5 is a schematic structural diagram of a video encoding unit provided by an embodiment of the present invention.
  • FIG. 6 is a schematic structural diagram of another GOP provided by an embodiment of the present invention.
  • FIG. 7 is a schematic structural diagram of an SEI NALU inserted into a GOP according to an embodiment of the present invention.
  • FIG. 8 is a schematic structural diagram of another SEI NALU inserted into a GOP according to an embodiment of the present invention.
  • FIG. 9 is a schematic structural diagram of a video reading and writing unit provided by an embodiment of the present invention.
  • FIG. 10 is a schematic structural diagram of another GOP provided by an embodiment of the present invention.
  • FIG. 11 is a schematic diagram of a user dragging a video playback progress bar according to an embodiment of the present invention.
  • FIG. 12 is a schematic structural diagram of another GOP provided by an embodiment of the present invention.
  • FIG. 13A is a schematic diagram of storing GOPs using a time index according to an embodiment of the present invention.
  • FIG. 13B is a schematic diagram of storing GOPs using a frame-number index according to an embodiment of the present invention.
  • FIG. 14 is a schematic flowchart of a video processing method provided by an embodiment of the present invention.
  • FIG. 15 is a schematic diagram of the GOPs that compose a video according to an embodiment of the present invention.
  • FIG. 16 is a schematic diagram of the operations a user performs to download a video offline according to an embodiment of the present invention.
  • FIG. 17 is a schematic structural diagram of a new GOP provided by an embodiment of the present invention.
  • FIG. 18 is a schematic structural diagram of a video processing device provided by an embodiment of the present invention.
  • FIG. 19 is a schematic structural diagram of a computing device provided by an embodiment of the present invention.
  • GOP (group of pictures): a group of consecutive images (also called frames), specifically the images between two I frames. The GOP length indicates the distance between two I frames.
  • I frame (intra-coded frame): an independent frame that carries all of its own information and can be decoded without referring to other frames.
  • the first frame of a video is usually an I frame.
  • non-I frames are frames other than I frames, specifically B frames or P frames.
  • B frame (bidirectionally predictive-coded frame).
  • a B frame records the difference between the current frame and both its previous and next frames. That is, decoding a B frame requires reference to both the previous frame and the next frame of the B frame.
  • the previous frame of a B frame is the frame that precedes and is adjacent to the B frame; the next frame of a B frame is the frame that follows and is adjacent to the B frame.
  • P frame (inter-frame predictive-coded frame).
  • a P frame records the difference between the current frame and the previous frame. That is, decoding a P frame requires reference to its previous frame (which may be a P frame or an I frame).
  • VI frame (virtual independent frame).
  • a VI frame is essentially a P frame, except that decoding a VI frame refers to the I frame preceding it.
  • refer to FIG. 1 for a schematic structural diagram of a GOP. As shown in FIG. 1, the GOP includes 3 VI frames. Each VI frame is decoded with reference only to the I frame that precedes it in the GOP, as indicated by the arrows in the figure.
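The dependency structure described above (P frames chaining back frame by frame, VI frames jumping straight to the I frame) can be illustrated by computing which frames must be decoded to reconstruct a given frame. The model below is a simplification assumed for illustration: one reference per frame and no B frames; `decode_chain` is a hypothetical helper.

```python
from typing import List, Sequence


def decode_chain(kinds: Sequence[str], target: int) -> List[int]:
    """Frames that must be decoded, in order, to reconstruct frame `target`.

    Simplified model (assumption): a P frame references the frame immediately
    before it, a VI frame references only the nearest preceding I frame, and
    B frames are not modelled.
    """
    chain = []
    k = target
    while True:
        chain.append(k)
        kind = kinds[k]
        if kind == "I":
            break
        if kind == "VI":
            earlier_i = [i for i in range(k) if kinds[i] == "I"]
            if not earlier_i:
                raise ValueError(f"VI frame {k} has no preceding I frame")
            k = earlier_i[-1]
        elif kind == "P":
            if k == 0:
                raise ValueError("P frame at position 0 has no reference")
            k -= 1
        else:
            raise ValueError(f"unsupported frame type: {kind}")
    return chain[::-1]


gop = ["I", "P", "P", "VI", "P", "P", "VI", "P"]
print(decode_chain(gop, 7))  # [0, 6, 7]
print(decode_chain(gop, 5))  # [0, 3, 4, 5]
```

Reaching the last frame needs only three decodes because the VI frame at index 6 depends directly on the I frame rather than on frames 1 to 5.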
  • NALU (network abstraction layer unit).
  • each frame is composed of at least one NALU.
  • refer to FIG. 2 for a schematic structural diagram of a NALU provided by an embodiment of the present invention.
  • a NALU includes a NAL Header and a NAL Body.
  • the length of a NAL Header is fixed at 1 byte, that is, 8 bits.
  • the NAL Header includes three fields: forbidden_zero_bit, the importance indication field nal_ref_idc, and the type field nal_unit_type, where:
  • forbidden_zero_bit occupies 1 bit and must be 0 in video coding standards such as H.264. If the network detects an error in the NALU, forbidden_zero_bit can be set to 1 so that the receiver can correct the error or discard the NALU.
  • nal_ref_idc occupies 2 bits and indicates the importance of the NALU.
  • the value of nal_ref_idc ranges from binary 00 to 11 (0 to 3). The larger the value, the more important the current NALU is and the higher its protection priority.
  • nal_unit_type occupies 5 bits and indicates the type of the NALU.
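The bit layout of the 1-byte NAL Header (1 + 2 + 5 bits, most significant bit first, as in H.264) can be unpacked with shifts and masks. A minimal sketch; `NalHeader` and `parse_nal_header` are illustrative names.

```python
from typing import NamedTuple


class NalHeader(NamedTuple):
    forbidden_zero_bit: int  # 1 bit, must be 0
    nal_ref_idc: int         # 2 bits, importance 0..3
    nal_unit_type: int       # 5 bits, NALU type


def parse_nal_header(byte: int) -> NalHeader:
    """Split the 1-byte H.264 NAL Header into its three fields (MSB first)."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("the NAL Header is a single byte")
    return NalHeader(
        forbidden_zero_bit=(byte >> 7) & 0x1,
        nal_ref_idc=(byte >> 5) & 0x3,
        nal_unit_type=byte & 0x1F,
    )


print(parse_nal_header(0x67))  # NalHeader(forbidden_zero_bit=0, nal_ref_idc=3, nal_unit_type=7)
```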
  • the NAL Body encapsulates the payload data (video data) in three layers:
  • the first layer is the encapsulated byte sequence payload (EBSP), which includes the emulation_prevention_three_byte field. This field is inserted to prevent the payload data in the NAL Body from being mistaken for a NALU start code (0x000001 or 0x00000001).
  • the second layer is the raw byte sequence payload (RBSP), which is the NAL Body data with emulation_prevention_three_byte removed; it is the data obtained by further processing the original syntax element code stream (encoded data).
  • the basic structure of an RBSP is the original encoded data followed by trailing bits, which provide byte alignment.
  • the third layer is the string of data bits (SODB), which is the actual original binary code stream obtained by encoding the syntax elements of the H.264 coding standard.
  • a NALU may also include only a NAL Header and an RBSP; that is, the body of the NAL unit is the RBSP.
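The relationship between the EBSP and RBSP layers can be shown with the two byte-level transforms: stripping emulation_prevention_three_byte to recover the RBSP, and inserting it so that the payload never contains a start-code-like pattern. This is a simplified sketch following the H.264 rule; it ignores the special case of an RBSP that ends in 0x00, and the function names are illustrative.

```python
def ebsp_to_rbsp(ebsp: bytes) -> bytes:
    """Remove emulation_prevention_three_byte (a 0x03 that follows two 0x00 bytes)."""
    out = bytearray()
    zeros = 0
    for b in ebsp:
        if zeros >= 2 and b == 0x03:
            zeros = 0          # drop the emulation prevention byte
            continue
        out.append(b)
        zeros = zeros + 1 if b == 0x00 else 0
    return bytes(out)


def rbsp_to_ebsp(rbsp: bytes) -> bytes:
    """Insert 0x03 wherever two 0x00 bytes are followed by a byte in 0x00..0x03."""
    out = bytearray()
    zeros = 0
    for b in rbsp:
        if zeros >= 2 and b <= 0x03:
            out.append(0x03)   # breaks up a would-be start code
            zeros = 0
        out.append(b)
        zeros = zeros + 1 if b == 0x00 else 0
    return bytes(out)


payload = bytes([0x00, 0x00, 0x01, 0x00, 0x00, 0x00])
encoded = rbsp_to_ebsp(payload)
print(encoded.hex())  # 0000030100000300
```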
  • SEI NALU (supplemental enhancement information network abstraction layer unit): a NALU whose type field nal_unit_type indicates the supplemental enhancement information (SEI) type.
  • FIG. 3 is a schematic structural diagram of an SEI NALU provided by an embodiment of the present invention.
  • the SEI NALU includes a NAL header (NAL Header) and a NAL body (NAL Body).
  • for details of the NAL Header, refer to the description of the embodiment shown in FIG. 2.
  • nal_unit_type in the NAL Header occupies 5 bits and indicates the type of the NALU. In practice, different NALU types are indicated by setting the value of the nal_unit_type field.
  • for example, when the NAL Header byte is 0x06 (nal_unit_type = 6), the NALU is of the SEI type; when the NAL Header byte is 0x67 (nal_unit_type = 7), the NALU is of the sequence parameter set (SPS) type; when the NAL Header byte is 0x68 (nal_unit_type = 8), the NALU is of the picture parameter set (PPS) type; and so on. In the present invention, a NAL Header byte of 0x06 indicates that the NALU is an SEI NALU.
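To see how nal_unit_type is used in practice, the sketch below splits an H.264 Annex B byte stream at its start codes and names each NALU by the low 5 bits of its header byte. It assumes a well-formed stream; `split_annexb`, `nal_type_name` and the partial `NAL_TYPE_NAMES` table are illustrative.

```python
from typing import List

NAL_TYPE_NAMES = {1: "non-IDR slice", 5: "IDR slice", 6: "SEI", 7: "SPS", 8: "PPS"}


def split_annexb(stream: bytes) -> List[bytes]:
    """Split an Annex B byte stream into NALUs at 0x000001 / 0x00000001 start codes."""
    starts = []
    pos = stream.find(b"\x00\x00\x01")
    while pos >= 0:
        starts.append(pos + 3)
        pos = stream.find(b"\x00\x00\x01", pos + 3)
    units = []
    for n, begin in enumerate(starts):
        end = starts[n + 1] - 3 if n + 1 < len(starts) else len(stream)
        # Strip the extra leading 0x00 of a 4-byte start code (and any zero padding);
        # a well-formed NALU ends with a non-zero trailing-bits byte.
        units.append(stream[begin:end].rstrip(b"\x00"))
    return units


def nal_type_name(nalu: bytes) -> str:
    nal_unit_type = nalu[0] & 0x1F  # low 5 bits of the NAL Header
    return NAL_TYPE_NAMES.get(nal_unit_type, f"type {nal_unit_type}")


stream = bytes.fromhex("00000001" "6742" "00000001" "68ce" "000001" "060580")
print([nal_type_name(u) for u in split_annexb(stream)])  # ['SPS', 'PPS', 'SEI']
```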
  • the NAL Body includes the SEI payload type, the SEI payload size, the SEI universally unique identifier (SEI UUID), and custom fields.
  • the SEI payload type field occupies 1 byte (8 bits) and indicates the type of payload data carried in the SEI NALU, such as video data, SPS data, or PPS data.
  • the SEI payload size field, referred to as payload size for short, indicates the size of the payload data.
  • the SEI UUID field occupies 16 bytes and uniquely identifies the payload data.
  • the number of bytes occupied by the custom fields can be set by the system, and the custom fields carry system-defined data; this is not limited in the present invention.
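An SEI NALU with this field layout can be assembled byte by byte. The sketch assumes payload type 5, which in H.264 is user_data_unregistered and whose payload begins with a 16-byte UUID; the source does not name a payload type value. It also assumes one SEI message per NALU, uses the H.264 0xFF-extension coding for type and size, and applies emulation prevention to the result. All function names are illustrative.

```python
import uuid


def sei_varlen(value: int) -> bytes:
    """H.264 SEI coding of payloadType/payloadSize: one 0xFF per 255, then the remainder."""
    out = bytearray()
    while value >= 255:
        out.append(0xFF)
        value -= 255
    out.append(value)
    return bytes(out)


def add_emulation_prevention(rbsp: bytes) -> bytes:
    """Insert 0x03 after two 0x00 bytes when the next byte is 0x00..0x03."""
    out, zeros = bytearray(), 0
    for b in rbsp:
        if zeros >= 2 and b <= 0x03:
            out.append(0x03)
            zeros = 0
        out.append(b)
        zeros = zeros + 1 if b == 0 else 0
    return bytes(out)


def build_sei_nalu(uid: uuid.UUID, custom: bytes, payload_type: int = 5) -> bytes:
    """Build an SEI NALU: header 0x06, payload type, payload size, 16-byte UUID, custom fields."""
    payload = uid.bytes + custom
    rbsp = (
        sei_varlen(payload_type)
        + sei_varlen(len(payload))
        + payload
        + b"\x80"  # rbsp_trailing_bits: stop bit 1 followed by zero alignment bits
    )
    return b"\x06" + add_emulation_prevention(rbsp)


marker_id = uuid.UUID("12345678-1234-5678-1234-567812345678")
nalu = build_sei_nalu(marker_id, custom=b"\x01")
print(nalu.hex())
```

The payload size counts the UUID plus the custom fields (17 bytes here), which is why the third byte of the result is 0x11.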
  • a video is composed of several temporally consecutive frames and can be divided into several GOPs during encoding. For example, when a computing device receives a playback request triggered by dragging the progress bar of a video and the target frame pointed to by the drag stop position is a non-I frame, it needs to find the I frame closest to the target frame among the frames before or after it, and then read the GOP starting from that I frame for decoding and playback.
  • if that I frame is far from the target frame, decoding time is prolonged, which greatly reduces video processing efficiency and degrades the user's viewing experience.
  • alternatively, if the target frame cannot be decoded, part of the video is discarded and decoding and playback resume from the next I frame. This discards some important video information, affects the accuracy of the video information obtained, and degrades the user's viewing experience.
  • when a computing device receives a request to play a video in reverse, it inputs the complete GOPs that constitute the video into the decoder, stores the decoding result (the decoded video) in a buffer, and then plays it in reverse order. For example, for a 5-minute short video, in the reverse playback scenario the computing device needs to play backwards from the end of the video (the 5th minute) to its beginning. In practice, it has been found that the larger the GOP, the more buffer space the decoded GOP occupies.
  • for example, the computing device may need 5.8 GB of storage space to cache the decoded video, which wastes the storage resources of the computing device.
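The buffer cost of reverse playback can be estimated from the raw size of decoded frames. The sketch assumes 8-bit YUV 4:2:0 output, which the source does not state; the resolution, frame rate and duration behind the 5.8 GB figure are likewise not given, so the numbers below only illustrate the order of magnitude (about 5.6 GB for one minute of 1080p at 30 frames per second). `decoded_size_bytes` is a hypothetical helper.

```python
def decoded_size_bytes(width: int, height: int, frames: int, bytes_per_sample: int = 1) -> int:
    """Raw size of decoded YUV 4:2:0 frames: a full-size luma plane plus two quarter-size chroma planes."""
    luma = width * height
    chroma = 2 * ((width + 1) // 2) * ((height + 1) // 2)
    return (luma + chroma) * bytes_per_sample * frames


per_frame = decoded_size_bytes(1920, 1080, 1)
one_minute = decoded_size_bytes(1920, 1080, 30 * 60)
print(per_frame, one_minute / 1e9)  # 3110400 5.59872
```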
  • FIG. 4 is a schematic structural diagram of a video processing system provided by an embodiment of the present invention.
  • the video processing system 100 shown in FIG. 4 includes a video encoding unit 102, a video reading and writing unit 104, a video decoding unit 106, and a storage unit 108, where:
  • the video encoding unit 102 is responsible for encoding the input original video into a video code stream, and specifically can convert the format file of the original video into a file of another video format.
  • the video encoding unit 102 may use encoding standards such as H.261, H.263, H.264, H.265, or H.266 to encode the original video into a video stream.
  • common video formats include but are not limited to audio video interleaved (AVI), digital video AVI (DV-AVI), moving picture experts group (MPEG), advanced streaming format (ASF), windows media video (WMV), real media (RM), and other supported video formats.
  • the video encoding unit 102 may divide the video into several GOPs for encoding.
  • the present invention takes a video (or a video code stream) including one GOP as an example to describe related content.
  • the video encoding unit 102 may specifically be an encoder or other devices that support image or video encoding.
  • the video encoding unit 102 may be deployed in a camera device, such as a camera or a video camera, or it may be deployed as a standalone encoder.
  • the storage unit 108 is used for storing video, for example, storing a video code stream obtained after encoding by the video encoding unit 102 and the like.
  • the video reading and writing unit 104 is responsible for writing the video code stream into the storage unit 108, or for reading the video code stream (specifically, the GOPs in it) from the storage unit 108 and inputting it to the video decoding unit 106 for decoding.
  • the video decoding unit 106 is responsible for decoding the input video code stream and outputting the decoded video code stream. Specifically, the GOP contained in the video bitstream is decoded, and each frame contained in the GOP is output.
  • the video reading and writing unit 104 may specifically be an input output (IO) device that supports data reading and writing functions, such as an IO interface.
  • the video decoding unit 106 may specifically be a component or device that supports a video decoding function, such as a decoder.
  • the video decoding unit 106 may be deployed in a video processing device of a computing device, or may be deployed as a separate decoder, etc., which is not limited in the present invention.
  • the storage unit 108 may specifically be a device supporting a data storage function, which may include, but is not limited to, random access memory (RAM), flash memory, read-only memory (ROM), hard disks, registers, and the like.
  • the video processing technology provided by the embodiments of the present invention may be applicable to scenarios such as GOP playback or download.
  • this solution inserts one or more I frames into the GOP; compared with the original I frame in the GOP, a newly inserted I frame is closer to the VI frame. When content in a specified GOP needs to be played or downloaded, the GOP and the VI frame corresponding to the requested time are located, and the newly inserted I frame is used as the reference frame of the VI frame for video decoding (the original I frame of the GOP is no longer needed as the decoding reference). This improves video processing efficiency and the accuracy of the playback time. The more frames the GOP contains, the more pronounced the benefits of the embodiments of the present invention.
  • FIG. 5 is a schematic structural diagram of a video encoding unit 102 provided by an embodiment of the present invention.
  • the video encoding unit 102 includes a VI detector 1021.
  • the system framework may be further divided into two layers: a video coding layer (VCL) and a network abstraction layer (NAL).
  • the video coding unit 102 may also include a video coding layer VCL 1022 and a network abstraction layer NAL 1023.
  • the video encoding unit 102 encodes the input original video through the video coding layer VCL 1022 to obtain a video encoded bit stream (a video stream for short; more specifically, the GOPs in the video stream). The VI detector 1021 then identifies the VI frames in the GOPs obtained by the video coding layer VCL 1022.
  • the specific implementation of the VI frame identification is not limited. For example, the VI frame identification can be performed according to the definition of the VI frame, and the VI frame identification can also be performed based on the received out-of-band information.
  • the out-of-band information is used to indicate that the frame corresponding to a preset timestamp in the GOP is a VI frame; for example, it may indicate that the frame corresponding to the 3rd second of the GOP is a VI frame.
  • after a VI frame is identified, the network abstraction layer NAL 1023 is notified to mark it, so as to indicate the position of the VI frame in the GOP.
  • the specific implementation of VI frame marking is not limited; for example, marking with supplemental enhancement information (SEI), another marking method that conforms to the video coding standard and marks the specific position of the VI frame in the GOP, or notifying the position of the VI frame in the GOP out of band.
  • the GOP encoded by the video encoding unit 102 can also be sent to the network abstraction layer NAL 1023 for encapsulation into network abstraction layer units (NALUs).
  • the GOP is composed of a series of NALUs.
  • the GOP data begins with the picture parameter set (PPS) and the sequence parameter set (SPS), followed by the I frame and the other frames.
  • the GOP includes at least one frame, and each frame includes one or more NALUs.
  • the PPS includes information about all slices of an image (i.e., a frame).
  • the SPS includes all information of an image sequence (i.e., all frames in the GOP).
  • the VI detector can notify the network abstraction layer NAL 1023 to generate a custom supplemental enhancement information network abstraction layer unit (SEI NAL unit, SEI NALU),
  • and to insert the SEI NALU before or after the VI frame, indicating that the frame containing the i-th NALU before the SEI NALU is a VI frame, or that the frame containing the j-th NALU after the SEI NALU is a VI frame.
  • specifically, the SEI NALU can indicate that the frame immediately before or after it is a VI frame. Please refer to FIG.
  • these figures show, respectively, the original structure of the GOP and the structure of the new GOP obtained after inserting the SEI NALU before or after the VI frame in the GOP.
  • the GOP obtained by the video coding layer VCL 1022 includes P frames and VI frames.
  • after detecting a VI frame in the GOP through the VI detector 1021, the video encoding unit 102 notifies the network abstraction layer NAL 1023 to add an SEI NALU either before or after the VI frame.
  • the specific position where the SEI NALU is added before or after the VI frame is not limited.
  • the SEI NALU may be added before the VI frame as the j-th NALU before the first NALU of the VI frame, to indicate that the frame containing the j-th NALU after the SEI NALU is the VI frame; or it may be added after the VI frame as the i-th NALU after the last NALU of the VI frame, to indicate that the frame containing the i-th NALU before the SEI NALU is the VI frame.
  • each frame (including VI frame) in the GOP is composed of one or more NALUs.
  • the VI frame in the figure includes 3 NALUs, namely NALU1 to NALU3.
  • the SEI NALU can be added before the first NALU of the VI frame (NALU1 in the figure), that is, as the first NALU before NALU1; or after the last NALU of the VI frame (NALU3 in the figure), that is, as the first NALU after NALU3.
  • the SEI NALU is specifically used to indicate that the frame in which the previous NALU or the next NALU of the SEI NALU is located is a VI frame.
  • the network abstraction layer NAL 1023 can set the values of relevant fields in the SEI NALU to indicate that the frame containing the i-th NALU before the SEI NALU is a VI frame, or that the frame containing the j-th NALU after the SEI NALU is a VI frame.
  • the network abstraction layer NAL 1023 can indicate the position of the VI frame in the GOP by setting the Type field in the SEI NALU (specifically, the SEI payload type field) or the value of the SEI UUID field.
  • the network abstraction layer NAL 1023 can also add a field to the custom field of the SEI NALU, and set the value of the added field to indicate the position of the VI frame in the GOP.
  • for example, if the network abstraction layer NAL 1023 sets the SEI payload type to +1, the frame containing the NALU immediately before the SEI NALU is a VI frame; conversely, if it sets the SEI payload type to -1, the frame containing the NALU immediately after the SEI NALU is a VI frame.
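The ±1 convention can be modelled abstractly. In this sketch a NALU is reduced to the index of the frame it belongs to, and the SEI marker's value is stored in a hypothetical `vi_offset` field rather than encoded into real SEI syntax; `Nalu` and `mark_vi_frame` are illustrative names, and only the i = j = 1 case is shown.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Nalu:
    frame: Optional[int]  # frame the NALU belongs to; None for an SEI marker
    vi_offset: int = 0    # SEI marker value: +1 = previous NALU's frame is VI, -1 = next NALU's frame is VI


def mark_vi_frame(nalus: List[Nalu], vi_frame: int, before: bool) -> List[Nalu]:
    """Insert an SEI marker immediately before the first, or after the last, NALU of vi_frame."""
    positions = [k for k, n in enumerate(nalus) if n.frame == vi_frame]
    if not positions:
        raise ValueError(f"frame {vi_frame} not found")
    out = list(nalus)
    if before:
        out.insert(positions[0], Nalu(frame=None, vi_offset=-1))
    else:
        out.insert(positions[-1] + 1, Nalu(frame=None, vi_offset=+1))
    return out


# Frame 1 (the VI frame) consists of three NALUs, like NALU1 to NALU3 in the example.
nalus = [Nalu(0), Nalu(1), Nalu(1), Nalu(1), Nalu(2)]
print([n.frame for n in mark_vi_frame(nalus, 1, before=True)])   # [0, None, 1, 1, 1, 2]
print([n.frame for n in mark_vi_frame(nalus, 1, before=False)])  # [0, 1, 1, 1, None, 2]
```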
  • FIG. 9 is a schematic structural diagram of a video reading and writing unit 104 according to an embodiment of the present invention.
  • the video reading and writing unit 104 includes a code stream detector 1041, an index generator 1042 and a code stream modifier 1043.
  • the video reading and writing unit 104 and the storage unit 108 communicate with each other, where:
  • the code stream detector 1041 performs frame detection (i.e., frame recognition) on the video code stream input to the video reading and writing unit 104 (specifically, the GOPs in the video code stream) to determine the frames contained in the GOP and the position of each frame.
  • For example, the present invention can determine the respective positions of the I frame and the VI frame in the GOP.
  • The form in which a position is expressed is not limited.
  • For example, the frame index, the playback time of the frame in the GOP, the storage location of the frame in the storage unit 108 (also called the storage address), or another indication of the frame can be used.
  • The code stream detector 1041 performs VI frame mark detection on the GOP to detect the VI frames in the GOP and their positions. Because the video encoding unit 102 may mark VI frames in the GOP in different ways, the code stream detector 1041 may also implement VI frame mark detection in different ways. Two examples are given below.
  • In a first manner, the code stream detector 1041 detects whether the GOP includes an SEI NALU. If it does, the detector determines that the frame containing the i-th NALU before the SEI NALU is a VI frame, or that the frame containing the j-th NALU after the SEI NALU is a VI frame.
  • the number of SEI NALUs is not limited, and it may be one or more. When the number of SEI NALUs is multiple, the code stream detector 1041 can detect multiple VI frames included in the GOP and the position of each VI frame in the GOP according to the foregoing principle.
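  • The detection side of the first manner might look like the sketch below, under the same simplified model as before: a hypothetical `direction` attribute stands in for the +1/-1 indication, and a hypothetical `offset` attribute carries i (looking backward) or j (looking forward), counted in NALUs. Every SEI marker in the GOP is resolved, so multiple VI frames are found in one pass.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Nalu:
    kind: str                    # "VCL" or "SEI"
    frame: Optional[int] = None  # owning frame index, for VCL NALUs only
    direction: int = 0           # SEI marker only: +1 look backward (i), -1 look forward (j)
    offset: int = 1              # SEI marker only: i or j, counted in NALUs


def detect_vi_frames(stream: List[Nalu]) -> Set[int]:
    """Return the frame indices that SEI markers in `stream` flag as VI frames."""
    vi_frames: Set[int] = set()
    for k, nalu in enumerate(stream):
        if nalu.kind != "SEI" or nalu.direction == 0:
            continue  # not a VI marker
        target = k - nalu.offset if nalu.direction > 0 else k + nalu.offset
        if 0 <= target < len(stream) and stream[target].frame is not None:
            vi_frames.add(stream[target].frame)
    return vi_frames
```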
  • In a second manner, the code stream detector 1041 performs VI frame analysis on the GOP according to out-of-band information sent from the video encoding unit 102, and determines the VI frames contained in the GOP and their positions in the GOP.
  • the out-of-band information is used to indicate or notify the position of the VI frame in the GOP, for example, the fifth frame in the GOP is a VI frame, or the frame corresponding to the third second in the GOP is a VI frame, and so on.
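  • The two example forms of out-of-band position above ("the fifth frame" or "the frame at 3 s") can be resolved to a frame index as sketched below. The dictionary keys `frame_number` and `time_s` and the fixed frame rate `fps` are assumptions made for illustration; the text does not define the out-of-band format.

```python
from typing import Dict


def vi_frame_index(out_of_band: Dict[str, float], fps: float) -> int:
    """Convert an out-of-band VI-frame position into a 0-based frame index within the GOP.

    Two hypothetical encodings are accepted:
      {"frame_number": n}  1-based ordinal, e.g. 5 for "the fifth frame"
      {"time_s": t}        offset in seconds from the start of the GOP
    """
    if "frame_number" in out_of_band:
        return int(out_of_band["frame_number"]) - 1
    if "time_s" in out_of_band:
        return round(out_of_band["time_s"] * fps)
    raise ValueError("unsupported out-of-band position format")
```

At 25 frames per second, for example, `{"time_s": 3.0}` maps to frame index 75.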
  • The code stream detector 1041 can also detect the VI frame in the GOP by parsing the GOP; for details, see the third implementation manner below.
  • the code stream detector 1041 parses each frame included in the GOP, identifies the reference picture sequence (RPS) information in each frame, and determines the VI frame included in the GOP and the position of the VI frame.
  • the first slice of each frame contains an RPS message.
  • the RPS information is composed of some identification information, and the meaning indicated by the identification information is specifically a system custom setting, for example, indicating whether the frame is used as a reference for decoding the current frame or subsequent frames.
  • The RPS information includes the reference frame information of the current frame. If it indicates that the current frame has only one reference frame for decoding, that reference frame is an I frame, and the frame immediately preceding the current frame is not an I frame, then the current frame is a VI frame. Specifically, the RPS information indicates the picture order count (POC) of each reference frame.
  • For example, if the RPS information lists only one reference POC, the current frame has one reference frame; if that reference frame is an I frame, decoding of the current frame refers only to the I frame. If the code stream detector 1041 further detects that the frame preceding the current frame is not an I frame, it can determine that the current frame is a VI frame.
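  • The RPS-based rule can be expressed as a short sketch. It assumes frames are already parsed into a hypothetical `Frame` record listed in decoding order, and it simplifies real RPS signalling (which codes POC deltas relative to the current picture) to a plain list of absolute reference POCs.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Frame:
    poc: int                                            # picture order count
    slice_type: str                                     # "I", "P" or "B"
    ref_pocs: List[int] = field(default_factory=list)   # reference POCs taken from the RPS


def find_vi_frames(frames: List[Frame]) -> List[int]:
    """Return list positions (decoding order) of frames judged to be VI frames.

    Rule from the text: exactly one reference frame, that reference is an I frame,
    and the immediately preceding frame is not an I frame.
    """
    by_poc = {f.poc: f for f in frames}
    vi_positions = []
    for k, f in enumerate(frames):
        if k == 0 or f.slice_type == "I":
            continue
        ref = by_poc.get(f.ref_pocs[0]) if len(f.ref_pocs) == 1 else None
        if ref is not None and ref.slice_type == "I" and frames[k - 1].slice_type != "I":
            vi_positions.append(k)
    return vi_positions
```

The "preceding frame is not an I frame" check is what separates a VI frame from an ordinary P frame that directly follows the I frame and therefore also references only that I frame.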
  • When the code stream detector 1041 detects that the GOP includes a VI frame, it can send a VI frame identification signal to the index generator 1042 to notify it that the GOP includes the VI frame, together with related information about the VI frame, such as its index (that is, its frame number), its playback time in the GOP, and its storage address in the storage unit 108.
  • Similarly, when the code stream detector 1041 detects that the GOP includes an I frame, it may send an I frame identification signal to the index generator 1042 to notify it of the I frame included in the GOP and related information about the I frame, such as its frame number, its playback time in the GOP, and its storage address in the storage unit 108.
  • The index generator 1042 is configured to receive the I frame identification signal and the VI frame identification signal sent by the code stream detector 1041. After receiving the VI frame identification signal, the index generator 1042 can determine the target I frame referenced when decoding the VI frame (hereinafter also referred to as the second I frame) and store the target I frame in association with the VI frame, for example, by storing the storage address of the target I frame in the index information of the GOP to indicate that the target I frame stored at that address is referenced when the VI frame is decoded.
  • Alternatively, the index information corresponding to the VI frame points to the target I frame, indicating that the VI frame is decoded with reference to the target I frame pointed to by that index information.
  • The index information of the GOP identifies the GOP and may include, but is not limited to, the index number of the GOP, the duration of the GOP, the start time and end time of the GOP in the video code stream, an identifier of whether the GOP contains a VI frame, the storage address of the GOP in the storage unit 108, and the storage address or offset of the VI frame in the GOP.
  • The target I frame here may specifically refer to an I frame that appears before the VI frame in the GOP, that is, the playback time of the target I frame is earlier than that of the VI frame.
  • the target I frame here may also refer to the I frame that is closest to the VI frame in the GOP.
  • Refer to FIG. 10 for a schematic diagram of a GOP.
  • In FIG. 10, the GOP is a 10 s video code stream.
  • The frame at 7 s in the figure is the VI frame.
  • If the target I frame referenced when decoding the VI frame is the I frame that appears before the VI frame, the target I frame is the I frame at 0 s in the figure. If it is the I frame closest to the VI frame in the GOP, the target I frame is the I frame at 9 s in the figure.
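  • The two target-I-frame policies can be compared with a small sketch that works on playback times only. Reading FIG. 10 as having I frames at 0 s and 9 s and the VI frame at 7 s, the "preceding" policy returns 0 s and the "closest" policy returns 9 s. The function names and the tie-breaking rule (prefer the earlier frame) are assumptions of this sketch.

```python
from typing import List


def preceding_i(i_times: List[float], vi_time: float) -> float:
    """Policy 1: the latest I frame whose playback time is earlier than the VI frame's."""
    earlier = [t for t in i_times if t < vi_time]
    if not earlier:
        raise ValueError("no I frame precedes the VI frame")
    return max(earlier)


def closest_i(i_times: List[float], vi_time: float) -> float:
    """Policy 2: the I frame nearest the VI frame in playback time (ties go to the earlier one)."""
    return min(i_times, key=lambda t: (abs(t - vi_time), t))
```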
  • Each GOP has its own index information.
  • The index generator 1042 can store each GOP and its index information in the form of a GOP index table.
  • The GOP index table, stored in the storage unit 108, records at least one mapping relationship, in which one GOP corresponds to one piece of GOP index information.
  • For the specific index information of the GOP, refer to the description above; details are not repeated here.
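  • A GOP index table entry and the association step performed by the index generator 1042 can be sketched as follows. The field names of `GopIndex` are hypothetical and cover only a subset of the index information listed above; the table is modelled as a dictionary from GOP number to its entry.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class GopIndex:
    """Hypothetical layout of one GOP's index information."""
    gop_id: int
    start_time: float
    end_time: float
    storage_address: int
    has_vi: bool = False
    vi_offsets: List[int] = field(default_factory=list)           # VI frame offsets inside the GOP
    vi_ref_address: Dict[int, int] = field(default_factory=dict)  # VI offset -> target I frame address


def record_vi(index: GopIndex, vi_offset: int, target_i_address: int) -> None:
    """Associate a detected VI frame with the storage address of its target I frame."""
    index.has_vi = True
    if vi_offset not in index.vi_offsets:
        index.vi_offsets.append(vi_offset)
    index.vi_ref_address[vi_offset] = target_i_address


# GOP index table: one entry per GOP, keyed by GOP number.
GopIndexTable = Dict[int, GopIndex]
```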
  • The code stream modifier 1043 modifies the GOP input to the video reading and writing unit 104 to obtain new, modified GOPs. Specifically, the code stream modifier 1043 reads the VI frame contained in the GOP and the target I frame referenced when decoding the VI frame, and then inserts the target I frame before the VI frame to obtain at least two new GOPs (that is, multiple GOPs). The specific position at which the target I frame is inserted before the VI frame is not limited; for example, it may be inserted as the m-th frame before the VI frame, where m is a positive integer.
  • Intuitively, the embodiments of the present invention insert a new I frame into the original GOP so that one GOP is divided into multiple new GOPs, each of which has an I frame.
  • The I frame newly inserted in the embodiments of the present invention serves as the reference for decoding the VI frame, so it need not have all the functions of the I frame in the original GOP (for example, it need not be playable), as long as it can serve as the reference whenever the VI frame is decoded.
  • If the newly inserted frame has only the I-frame function of serving as the decoding reference for the VI frame, it can be called a quasi-I frame.
  • In that case, the original GOP can be considered not truly divided into multiple new GOPs but to remain one GOP (it is only that one or more new GOPs are formed within this GOP).
  • By contrast, if the newly inserted I frame is exactly the same as the I frame in the original GOP, the original GOP can be considered divided into multiple new GOPs.
  • For ease of description, the inserted frames are collectively referred to as I frames, and the result of inserting I frames (or quasi-I frames) into the GOP is collectively referred to as obtaining "new GOPs".
  • That is, the inserted I frame (for example, the second I frame) in the embodiments of the present invention is either a frame identical to the I frame in the original GOP or a frame that has the VI-frame reference decoding function of the I frame in the original GOP.
  • When the video reading and writing unit 104 receives a video processing request, it uses the code stream detector 1041 to detect whether the GOP contains a VI frame. If it does, the code stream modifier 1043 reads the target I frame from the storage address, recorded in the index information of the GOP, of the target I frame referenced when decoding the VI frame, and then inserts the read target I frame before the VI frame, thereby obtaining multiple new GOPs. This addresses problems in the prior art: when the GOP is large and the distance between the I frame and the VI frame is large, decoding takes too long, video processing efficiency drops, or important video information is lost.
  • By inserting an I frame before the VI frame, the present invention can split a large GOP into multiple small GOPs and decode and play based on the split small GOPs during video playback. Compared with the prior art, this avoids decoding unnecessary information, improves video decoding efficiency, avoids discarding important video information, and ensures the user's viewing experience.
  • the video encoding unit 102 can mark VI frames contained in the GOP, and transmit the VI frame marks along with the GOP, so that the compatibility of the video encoding unit 102 can be improved.
  • The video reading and writing unit 104 can insert the target I frame before the VI frame to divide a large GOP into multiple new GOPs. In this way, control is performed at the granularity of VI frames, which can effectively improve video playback. In particular, in reverse-playback scenarios, caching the new GOPs instead of the large GOP can effectively save storage resources.
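  • The splitting performed by the code stream modifier 1043 can be sketched on a list of frame labels. This assumes m = 1 (the target I frame is inserted immediately before each VI frame) and that one target I frame serves every VI frame in the GOP; the `(type, id)` tuple representation and the name `split_gop` are illustrative only.

```python
from typing import List, Tuple

Frame = Tuple[str, int]  # (frame type, frame id), e.g. ("VI", 7)


def split_gop(gop: List[Frame], target_i: Frame) -> List[List[Frame]]:
    """Insert a copy of `target_i` immediately before every VI frame (m = 1) and
    return the resulting new GOPs, each of which starts with an I frame."""
    if not gop or gop[0][0] != "I":
        raise ValueError("a GOP must start with an I frame")
    new_gops: List[List[Frame]] = [[]]
    for frame in gop:
        if frame[0] == "VI":
            new_gops.append([target_i])  # the inserted I frame opens a new GOP
        new_gops[-1].append(frame)
    return new_gops
```

A GOP with k VI frames therefore yields k + 1 new GOPs, and every new GOP can be decoded starting from its own first frame.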
  • the first is the video playback scene.
  • the video processing request is specifically a video playback request. Specifically, when a user watches a video, he can drag the progress bar of the video playback at will according to his own needs. Please refer to FIG. 11 which shows a schematic diagram of a user dragging the progress bar of the video playback.
  • When the computing device detects that the user is dragging the video playback progress bar, it can generate a corresponding video playback request.
  • The computing device obtains the GOP in which the drag stop position is located, and then identifies whether the GOP includes a VI frame.
  • If it does, the storage address of the target I frame referenced when decoding the VI frame is obtained from the GOP index table, the target I frame is read from that storage address, and the target I frame is inserted before the VI frame.
  • the target I frame here may specifically refer to the I frame that appears before the VI frame in the GOP, or may refer to the I frame that is closest to the VI frame in the GOP. For details, please refer to the example shown in FIG. 10 above.
  • The computing device may process only the VI frame in the GOP closest to the drag stop position, that is, insert the target I frame before that VI frame to obtain two new GOPs.
  • The playback time of the inserted target I frame precedes the drag stop position. The computing device then decodes and plays the new GOP in which the drag stop position is located.
  • Refer to FIG. 12 for a schematic structural diagram of a GOP. As shown in FIG. 12, the GOP is a 10 s video code stream that includes two VI frames: VI frame 1 and VI frame 2.
  • the playback time corresponding to VI frame 1 is the 5th second, and the playback time corresponding to VI frame 2 is the 7th second.
  • users can drag the progress bar of video playback at will. If the user drags the progress bar to stop at 3s, the VI frame closest to the dragging stop position is VI frame 1. At this time, the computing device can insert the target I frame before VI frame 1.
  • The insertion position of the target I frame is not limited; for example, it may be anywhere between the drag stop position and the VI frame, or at or before the drag stop position. Inserting it at or before the drag stop position ensures that the playback time of the inserted target I frame is not later than (that is, less than or equal to) the playback time of the drag stop position, which avoids losing important video information.
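  • The seek handling in this scenario can be sketched as below: pick the VI frame nearest the drag stop position, then choose an insertion time that is not later than the stop position. Taking `min(stop, vi_time)` is just one of the positions the text permits, and the function name `plan_seek` is hypothetical. With the FIG. 12 values (VI frames at 5 s and 7 s, drag stop at 3 s), it selects VI frame 1 at 5 s.

```python
from typing import List, Tuple


def plan_seek(vi_times: List[float], stop: float) -> Tuple[float, float]:
    """Choose the VI frame to process for a drag that stops at `stop` seconds.

    Returns (vi_time, insert_time): the VI frame nearest the stop position and a
    time for the inserted target I frame that is not later than the stop position.
    insert_time = min(stop, vi_time) is only one of the positions the text permits.
    """
    if not vi_times:
        raise ValueError("the GOP contains no VI frame")
    vi_time = min(vi_times, key=lambda t: abs(t - stop))
    return vi_time, min(stop, vi_time)
```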
  • the second is the video download scene.
  • the video processing request is specifically a video download request. Specifically, if the user wants to watch the video offline, he can download and cache the video locally in advance.
  • the computing device can download the video (specifically, one or more GOPs contained in the video) in response to the video download request.
  • The video download request can carry the start time and end time of the video, and the computing device downloads the video (that is, one or more GOPs in the video) from the start time to the end time: it starts downloading from the GOP containing the start time and stops after the GOP containing the end time. It then identifies whether each GOP includes a VI frame.
  • the storage address of the target I frame referred to when decoding the VI frame is obtained from the GOP index table, the target I frame is obtained from the storage address, and the target I frame is inserted before the VI frame.
  • The computing device can process every VI frame included in every GOP of the video, that is, insert the target I frame before each VI frame, thereby splitting large GOPs into small GOPs.
  • the processing process of the computing device for any VI frame in each GOP is the same. For details, reference may be made to the relevant introduction of the foregoing embodiment, which will not be repeated here.
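  • Selecting which GOPs to download for a window from the start time to the end time can be sketched as an overlap test. The sketch assumes the GOP spans are available as sorted `(start, end)` playback times in seconds, with each GOP covering the half-open interval [start, end); the function name is hypothetical.

```python
from typing import List, Tuple


def gops_to_download(gop_spans: List[Tuple[float, float]], t_s: float, t_e: float) -> List[int]:
    """Return indices of the GOPs from the one containing t_s through the one containing t_e.

    gop_spans: (start, end) playback times in seconds, sorted; each GOP covers [start, end).
    """
    if t_e < t_s:
        raise ValueError("end time precedes start time")
    return [k for k, (start, end) in enumerate(gop_spans) if start <= t_e and end > t_s]
```

Each selected GOP is then scanned for VI frames and split as described above.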
  • the following describes related embodiments related to GOP storage.
  • Different video processing systems can use different indexing methods to create and store corresponding index information for the GOP.
  • the indexing methods corresponding to the index information of the GOPs in different video processing systems may be different, for example, the time indexing method or the frame number indexing method can be supported.
  • the specific implementation manners of the two indexing methods are given as an example as follows.
  • the first is the time index method.
  • the computing device uses the time index method to create and store corresponding index information for the GOP. Specifically, the computing device creates an index for the GOP according to a preset duration (for example, 1s), and obtains index information of the GOP.
  • The index information includes, but is not limited to, the number of the GOP, whether the GOP contains an I frame, the storage address of the I frame, whether the GOP contains a VI frame, the storage address of the VI frame, and the storage address of the GOP in the storage unit 108.
  • The preset duration is defined by the system, for example, according to user requirements or statistically from a series of empirical data.
  • Refer to FIG. 13A, which shows a schematic diagram of a GOP stored in the time index manner.
  • In FIG. 13A, the GOP is a 10 s video code stream, and the figure shows the code stream from second 0 to second 9.
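  • The time index method can be sketched as bucketing frames into slots of the preset duration (1 s by default here). The input triple layout and the per-slot keys `all`, `I` and `VI` are assumptions of this sketch; a slot "contains an I frame" or "contains a VI frame" exactly when the corresponding list is non-empty.

```python
from typing import Dict, List, Tuple


def time_index(frames: List[Tuple[float, str, int]], duration: float = 1.0) -> Dict[int, Dict[str, List[int]]]:
    """Build a time index with one entry per `duration`-second slot.

    frames: (playback_time_s, frame_type, storage_address) triples.
    Each entry lists the storage addresses of all frames in the slot, and
    separately those of I frames and VI frames.
    """
    index: Dict[int, Dict[str, List[int]]] = {}
    for t, kind, addr in frames:
        slot = index.setdefault(int(t // duration), {"all": [], "I": [], "VI": []})
        slot["all"].append(addr)
        if kind in ("I", "VI"):
            slot[kind].append(addr)
    return index
```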
  • the second is the frame number index method.
  • the computing device uses the frame number index method to create and store corresponding index information for the GOP. Specifically, the computing device creates an index for the GOP according to the I frame interval, and obtains the index information of the GOP.
  • the GOP is used to indicate a group of consecutive frames between two I frames. For the specific index information of the GOP, please refer to the above description, which will not be repeated here.
  • FIG. 13B shows a schematic diagram of a GOP stored in the frame number index manner. As shown in the figure, the GOP is a video code stream of 10 frames, shown as frame 0 to frame 10, and each frame corresponds to its own index number.
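  • The frame number index method can be sketched as cutting the frame sequence at every I frame, so that each GOP is recorded as a range of frame numbers. The sketch takes a list of frame types in stream order and ignores any frames that precede the first I frame; the function name is hypothetical.

```python
from typing import List, Tuple


def frame_number_index(frame_types: List[str]) -> List[Tuple[int, int]]:
    """Return one (first_frame, last_frame) range per GOP, where each GOP runs from an
    I frame up to the frame before the next I frame (or the last frame)."""
    starts = [n for n, kind in enumerate(frame_types) if kind == "I"]
    ends = [s - 1 for s in starts[1:]] + [len(frame_types) - 1]
    return list(zip(starts, ends))
```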
  • FIG. 14 is a schematic flowchart of a video processing method according to an embodiment of the present invention.
  • the method shown in Figure 14 includes the following implementation steps:
  • Step S102: The computing device obtains a group of pictures (GOP) in the video, where the first frame of the GOP is the first I frame.
  • the GOP includes M frames, and M is a positive integer.
  • the computing device obtains a video processing request, and the video processing request carries the start time of the video.
  • In response, the computing device acquires the video corresponding to the start time, that is, at least one GOP in the video.
  • the video processing request may also carry the end time of the video, or other system-customized information, etc., which is not limited in the present invention.
  • the video processing request may specifically be generated by the user performing a corresponding video operation on the video, or may be received from other devices.
  • The video processing request differs depending on the scenario. For example, in a video playback scenario, the computing device can generate a corresponding video playback request when it detects the user dragging the video playback progress bar. In a video download scenario, the computing device can generate a corresponding video download request when it detects the user's download operation for a preset period (from the start time to the end time).
  • The specific implementation of step S102 is described in detail below, taking a video playback request and a video download request as examples.
  • the video playback request carries the start time T s of the video.
  • the video includes multiple GOPs.
  • the computing device can respond to the video playback request and obtain the GOP of the group of pictures where the start time T s is located from the multiple GOPs of the video.
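  • Finding the GOP in which the start time T s is located amounts to a binary search over GOP start times, which could be read from the GOP index table. The sketch uses Python's standard `bisect` module and assumes the GOPs are consecutive and their start times are sorted in ascending order; the function name is hypothetical.

```python
import bisect
from typing import List


def gop_for_time(gop_starts: List[float], t_s: float) -> int:
    """Return the index of the GOP whose playback span contains t_s.

    gop_starts: start times of consecutive GOPs in ascending order, as would be
    read from the GOP index table.
    """
    if not gop_starts or t_s < gop_starts[0]:
        raise ValueError("t_s lies before the first GOP")
    return bisect.bisect_right(gop_starts, t_s) - 1
```

`bisect_right` ensures that a start time falling exactly on a GOP boundary maps to the GOP that begins there.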
  • the video processing request is a video download request
  • the video download request carries the start time T s of the video, and optionally the end time T e of the video.
  • In response to the video download request, the computing device may start downloading from the GOP containing the start time T s and continue until the GOP containing the end time T e ends, thereby obtaining the downloaded video composed of at least one GOP.
  • Refer to FIG. 15 for a schematic diagram of the GOPs that compose a video according to an embodiment of the present invention.
  • For example, a user plays the movie "XXX" online on a computing device.
  • the movie includes 8 GOPs.
  • The user starts playing the video from time T s.
  • the computing device may generate a video playback request when detecting that the user is dragging the playback progress bar of the movie.
  • the video playback request carries the start time T s of the video.
  • In response to the video playback request, the computing device may obtain the GOP in which the start time T s is located, which is GOP3 in the figure.
  • the computing device may generate a video download request when detecting the user's download operation for the movie.
  • the video download request may carry the start time and end time of the video to be downloaded.
  • the video to be downloaded can be a video segment (for example, the beginning or the end) of the movie "XXX", or it can be the entire video.
  • the start time and end time of the video to be downloaded can be customized by the user according to actual needs, for example, 00:01:00-00:21:00 (that is, download the video segment from the 1st minute to the 21st minute).
  • the user can perform offline download settings on the display interactive interface provided by the computing device. Please refer to FIG. 16 for a schematic diagram of an operation for a user to download videos offline.
  • When the computing device detects an offline download operation on the display interactive interface, it can start downloading from the GOP containing the start time until the GOP containing the end time ends.
  • the GOP at the start time 00:01:00 is GOP1
  • the GOP at the end time 00:21:00 is GOP3
  • the 20-minute video downloaded by the computing device may specifically include GOP1, GOP2, and GOP3.
  • Step S104: The computing device determines whether the M frames include a VI frame.
  • the GOP includes one or more NALUs.
  • The computing device determines whether the M frames of the GOP include a VI frame by identifying whether the GOP includes an SEI NALU. Specifically, if the GOP includes an SEI NALU, the computing device determines, according to the indication of the SEI NALU, that the frame containing the i-th NALU before the SEI NALU is a VI frame, or that the frame containing the j-th NALU after the SEI NALU is a VI frame.
  • the number of SEI NALU is not limited, and it can be one or more. When the number of SEI NALUs is multiple, the computing device can determine the indicated VI frame corresponding to each of the multiple SEI NALUs by referring to the above-mentioned VI frame determination principle. Thus, one or more VI frames included in the M frames are determined.
  • the GOP includes at least one frame.
  • Each frame includes RPS information that describes the reference frames of that frame.
  • The computing device can analyze the RPS information of each of the M frames to determine whether the frame is a VI frame. Specifically, if the RPS information of a frame in the GOP indicates that the frame has only one reference I frame for decoding, and the frame immediately preceding it is not an I frame (specifically, a B frame or a P frame), the frame is determined to be a VI frame; otherwise, it is determined not to be a VI frame.
  • the computing device obtains out-of-band information of the GOP, and the out-of-band information is used to indicate the position of the VI frame included in the GOP.
  • the position refers to the specific or definite position of the VI frame in the GOP, which may include, but is not limited to, the frame number (index number) of the VI frame, the playing time corresponding to the VI frame, and the like.
  • the out-of-band information may specifically be received by the computing device from other devices (such as a server); it may also be obtained by the computing device from its own video encoding unit, which is not limited in the present invention.
  • According to the out-of-band information of the GOP, the computing device identifies whether the M frames of the GOP include a VI frame and, if so, the position of the VI frame.
  • When the computing device determines that the GOP does not include a VI frame, it does not need to process the GOP.
  • the computing device can start decoding and playing from the first I frame in the GOP.
  • Step S106: When determining that the M frames include a VI frame, the computing device inserts a second I frame before the VI frame to obtain multiple new GOPs. The number of new GOPs equals the number of VI frames included in the GOP plus one.
  • the computing device can obtain the target I frame (also referred to as the second I frame) corresponding to the VI frame. Specifically, for example, the computing device may determine the storage address of the associated second I frame corresponding to the VI frame from the index information of the GOP, and then obtain the second I frame from the storage location. Or the computing device can search for the second I frame pointed to by the index information of the VI frame.
  • The second I frame can be an I frame that appears before the VI frame in the GOP, or the I frame closest to the VI frame in the GOP. For details, refer to the description of the target I frame above; details are not repeated here.
  • The index information of the GOP records information such as the second I frame referenced when decoding the VI frame, the storage address of the second I frame, the frame index of each frame, the playback time of each frame, the playback duration of the GOP, and the start time and end time of the GOP.
  • The computing device may insert the second I frame before the VI frame, specifically as the m-th frame before the VI frame, where m is a positive integer; for example, the second I frame is inserted as the frame immediately preceding the VI frame. In this way, the computing device splits the GOP into multiple new GOPs, and the number of new GOPs equals the number of VI frames in the GOP plus one.
  • For example, if a GOP includes 4 VI frames, inserting a second I frame for each VI frame yields 5 new GOPs.
  • Refer to FIG. 17, which shows a schematic diagram of new GOPs. As shown in the figure, the GOP includes 4 VI frames, and the computing device inserts the corresponding second I frame before each VI frame according to the above I frame insertion principle, thereby obtaining 5 new GOPs.
  • The computing device can modify the value of a related field of the second I frame (for example, a control field or flag field in the second I frame) to mark the second I frame as a non-display or non-output frame.
  • the second I frame is only used to decode the VI frame, and is not used for output display.
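  • The decode-only behaviour of the second I frame can be modelled with a hypothetical `output` flag, as in the sketch below. This is not the pseudocode the text refers to and does not edit a real bitstream; in HEVC, for instance, the slice-header `pic_output_flag` (present when the PPS sets `output_flag_present_flag`) is one mechanism that could serve this purpose, but the text does not name a specific field.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class CodedFrame:
    frame_id: int
    kind: str            # "I", "P", "B" or "VI"
    output: bool = True  # hypothetical "output/display" flag


def mark_non_output(frame: CodedFrame) -> CodedFrame:
    """Return a copy of an inserted second I frame flagged as decode-only."""
    return replace(frame, output=False)


def displayed_ids(frames: List[CodedFrame]) -> List[int]:
    """Frame ids a player would display; decode-only frames are still decoded
    (not modelled here) but are skipped at output time."""
    return [f.frame_id for f in frames if f.output]
```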
  • the new GOP involved in this application has a different meaning from the GOP in the conventional definition.
  • For convenience, the term "new GOP" is still used.
  • A new GOP denotes the frames spanning from one I frame to the next, but the first I frame of a new GOP is used only for decoding, not for display output.
  • the pseudo code description of the second I frame modified by the computing device is specifically as follows:
  • Depending on the video processing request, the GOPs of the video and the VI frames in them that the present invention processes also differ. Specifically:
  • the video processing request in S102 is specifically a video playback request.
  • the video playback request carries the start time T s of the video.
  • In response to the video playback request, the computing device obtains the GOP in which the start time T s is located and identifies whether the GOP includes a VI frame. If the GOP includes multiple VI frames, the computing device selects the VI frame closest to the start time T s for processing, that is, inserts the second I frame before that VI frame, thereby obtaining two new GOPs.
  • The playback time of the inserted second I frame precedes the start time T s.
  • the video processing request in S102 is specifically a video download request.
  • the video download request carries the start time T s and the end time T e of the video.
  • In response to the video download request, the computing device starts downloading from the GOP in which the start time T s is located and continues until the GOP in which the end time T e is located ends, thereby obtaining multiple GOPs that compose the downloaded video.
  • For each GOP, the computing device identifies whether the GOP includes a VI frame. If the GOP includes one or more VI frames, the computing device inserts a corresponding second I frame before each VI frame, thereby splitting one GOP into multiple new GOPs.
  • If the computing device subsequently obtains a video playback request, it may decode and play the corresponding new GOP in response.
  • the specific implementation is as follows:
  • the video processing request in S102 is a video playback request
  • In response to the video playback request, the computing device inserts the second I frame before the VI frame in the GOP closest to the start time T s to obtain two new GOPs. Still in response to the request, it obtains the new GOP in which the start time T s is located and starts decoding and playing that new GOP from its second I frame.
  • That is, the computing device determines that the start time T s is located after the second I frame in the GOP, and then decodes and plays the video from the second I frame.
  • the video processing request in S102 is a video download request.
  • the computing device responds to the video download request, downloads multiple GOPs included in the video, and inserts a second I frame for each VI frame included in each GOP to obtain Multiple new GOPs.
  • the user can drag the playback progress bar of the video at will when watching the video.
  • When the computing device detects that the user is dragging the video playback progress bar, it can generate a corresponding video playback request.
  • the video playback request carries the start time T s of the video.
  • The computing device searches the multiple new GOPs for the new GOP in which the start time T s is located, and decodes and plays that new GOP starting from its second I frame.
  • FIG. 18 is a schematic structural diagram of a video processing apparatus according to an embodiment of the present invention.
  • the video processing device 18 includes an acquiring unit 181, a determining unit 182, and an inserting unit 183.
  • Optionally, a decoding and playing unit 184 may also be included, where:
  • the acquiring unit 181 is configured to acquire a group of pictures GOP in the video, the first frame of the GOP is the first I frame, the GOP includes M frames, and M is a positive integer;
  • the determining unit 182 is configured to determine whether a virtual intra-coded VI frame is included in the M frames;
  • the inserting unit 183 is configured to insert a second I frame before the VI frame when a VI frame is included in the M frames;
  • the second I frame is a frame referenced by the VI frame during video decoding.
  • the video processing device 180 may further include a decoding and playing unit 184.
  • The determining unit 182 is configured to determine, in response to a video playback request, that the start time of the video in the video playback request is located after the second I frame in the GOP; the decoding and playing unit 184 is configured to start decoding and playing the video from the second I frame.
  • the second I frame is the previous frame of the VI frame.
  • the GOP further includes index information of the GOP, the storage address of the second I frame is recorded in the index information, and the second I frame is inserted before the VI frame.
  • the acquiring unit 181 is further configured to acquire the second I frame from the storage address of the second I frame according to the index information of the GOP.
  • the second I frame is used for decoding the VI frame, and is not used for output display.
  • the acquiring unit 181 is specifically configured to receive a video processing request that carries the start time of the video, where the video includes at least one group of pictures (GOP), and, in response to the video processing request, to obtain the GOP corresponding to the start time from a GOP index table;
  • at least one mapping relationship is recorded in the GOP index table, in which each GOP corresponds to its index information, and the index information of the GOP includes the start time of the GOP.
  • the index information of the GOP further includes the playback time of each frame; when the video processing request is a video playback request, the VI frame is the VI frame in the GOP whose playback time has the smallest difference from the start time of the GOP.
  • the functions of the acquiring unit 181 and the determining unit 182 of the present invention can be implemented by the code stream detector 1041 in FIG. 9.
  • the function of the insertion unit 183 of the present invention can be implemented by the code stream modifier 1043 in FIG. 9.
  • the function of the decoding and playing unit 184 of the present invention can be implemented by the video decoding unit 106 in FIG. 4.
  • the code stream detector 1041 in the video reading and writing unit 104 in FIG. 4 or FIG. 9 can be specifically implemented by functional modules such as the acquiring unit 181 and the determining unit 182.
  • the code stream modifier 1043 in the video reading and writing unit 104 can be specifically implemented by functional modules such as the inserting unit 183.
  • the video decoding unit 106 may be specifically implemented by functional modules such as the decoding and playing unit 184.
  • Each module or unit involved in the device 18 of the embodiment of the present invention may be specifically implemented by software programs or hardware.
  • the modules or units involved in the device 18 are software modules or software units.
  • the modules or units involved in the device 18 can be implemented through an application-specific integrated circuit (ASIC) or a programmable logic device (PLD).
  • the PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof, which is not limited in the present invention.
  • FIG. 18 is only a possible implementation manner of the embodiment of the present invention.
  • the video processing device may also include more or fewer components, which is not limited here.
  • FIG. 19 is a schematic structural diagram of a computing device 19 according to an embodiment of the present invention.
  • the computing device shown in FIG. 19 includes one or more processors 1901, a communication interface 1902, and a memory 1903.
  • the processor 1901, the communication interface 1902, and the memory 1903 can be connected by a bus, or communication can be achieved by other means such as wireless transmission.
  • the embodiment of the present invention takes the connection through the bus 1904 as an example, where the memory 1903 is used to store instructions, and the processor 1901 is used to execute instructions stored in the memory 1903.
  • the memory 1903 stores program codes, and the processor 1901 can call the program codes stored in the memory 1903 to implement the video processing device 18 as shown in FIG. 18.
  • the processor 1901 in the embodiment of the present invention may call the program code stored in the memory 1903 to execute all or part of the steps of the method embodiment described in FIG. 14 above, and/or other content described herein, which will not be repeated here.
  • the processor 1901 may be composed of one or more general-purpose processors, such as a central processing unit (CPU).
  • the processor 1901 may be used to run programs of the following functional modules in the related program code.
  • the functional module may specifically include, but is not limited to, any one or a combination of the above-mentioned acquiring unit 181, determining unit 182, and inserting unit 183.
  • the program code executed by the processor 1901 can perform the functions of any one or more of the above functional modules.
  • for the functional modules mentioned here, refer to the relevant descriptions in the foregoing embodiments, which will not be repeated here.
  • the communication interface 1902 may be a wired interface (such as an Ethernet interface) or a wireless interface (such as a cellular network interface or a wireless local area network interface) for communicating with other modules or devices.
  • the communication interface 1902 in the embodiment of the present invention may specifically be used to obtain the GOPs of the video, among other things.
  • the memory 1903 may include volatile memory, such as random access memory (RAM); the memory may also include non-volatile memory, such as read-only memory (ROM), flash memory, a hard disk drive (HDD), or a solid-state drive (SSD); the memory 1903 may also include a combination of the foregoing types of memory.
  • the memory 1903 may be used to store a group of program codes, so that the processor 1901 can call the program codes stored in the memory 1903 to implement the functions of the above-mentioned functional modules involved in the embodiments of the present invention.
  • FIG. 19 is only a possible implementation manner of the embodiment of the present invention.
  • the computing device may also include more or fewer components, which is not limited here.
  • the content not shown or described in the embodiment of the present invention reference may be made to the relevant description in the foregoing method embodiment, which will not be repeated here.
  • the embodiment of the present invention also provides a computer-readable storage medium in which instructions are stored.
  • when the instructions are run on a computing device, the method flow shown in the embodiment of FIG. 14 is implemented.
  • the embodiment of the present invention also provides a computer program product.
  • when the computer program product runs on a computing device, the method flow shown in the embodiment of FIG. 14 is implemented.
  • the steps of the method or algorithm described in combination with the disclosure of the embodiment of the present invention may be implemented in a hardware manner, or may be implemented in a manner in which a processor executes software instructions.
  • Software instructions may consist of corresponding software modules, which may be stored in random access memory (RAM), flash memory, read-only memory (ROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), registers, a hard disk, a removable hard disk, a CD-ROM, or any other form of storage medium well known in the art.
  • An exemplary storage medium is coupled to the processor, so that the processor can read information from the storage medium and write information to the storage medium.
  • the storage medium may also be an integral part of the processor.
  • the processor and the storage medium may be located in the ASIC.
  • the ASIC may be located in the computing device.
  • the processor and the storage medium may also exist as discrete components in the computing device.
  • the program may be stored in a computer-readable storage medium and, when executed, may include the procedures of the foregoing method embodiments.
  • the aforementioned storage media include: ROM, RAM, magnetic disks or optical disks and other media that can store program codes.
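The decoding and playing step described above can be sketched as follows. The device starts from the second I frame of the new GOP that contains T_s and never displays that second I frame. This is a minimal illustration under assumed conventions, not the patent's implementation. The `Frame` record, its `decode_only` flag, and the rule of choosing the latest inserted I frame whose VI frame is not later than T_s are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Frame:
    kind: str                  # "I", "P" or "VI"
    pts: float                 # playback time in seconds
    decode_only: bool = False  # True for an inserted second I frame (never displayed)


def plan_playback(gop: List[Frame], ts: float) -> Tuple[List[int], List[int]]:
    """Return (indices to decode, indices to display) for playback from start time ts."""
    start = 0  # default entry point: the GOP's first I frame
    for i in range(len(gop) - 1):
        f, nxt = gop[i], gop[i + 1]
        # Hypothetical rule: use the latest inserted I frame whose VI frame is not after ts.
        if f.decode_only and nxt.kind == "VI" and nxt.pts <= ts:
            start = i
    decode = list(range(start, len(gop)))
    # Frames before ts are decoded only as references; the second I frame is never shown.
    display = [i for i in decode if not gop[i].decode_only and gop[i].pts >= ts]
    return decode, display
```

Frames between the chosen entry point and T_s are still decoded, because later frames reference them, but they are not displayed.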

Abstract

A video processing technique, applicable to scenarios such as GOP playback or download. The solution comprises: additionally inserting one or more I frames in a GOP, each newly inserted I frame being closer to a VI frame than the original I frame of the GOP. When content in the GOP needs to be played or downloaded, the VI frame in the GOP is found according to the requested time, and the newly inserted I frame is used as the reference frame of the VI frame for video decoding, so the original I frame of the GOP no longer needs to be used as the reference for video decoding. This improves video processing efficiency and playback-time accuracy.

Description

Video processing method, apparatus, and computer-readable storage medium
Technical field
The present invention relates to the field of Internet technology, and in particular to a video processing method, an apparatus, and a computer storage medium.
Background
With the rapid development of computer and network communication technology, demand for multimedia information keeps growing. In recent years, video-related applications have spread into many fields, such as video conferencing, video surveillance, and mobile TV. In these fields, video is usually compressed before transmission to save network resources. Existing video transmission commonly uses a group of pictures (GOP) structure, where a GOP is a group of consecutive pictures (picture frames, or simply frames).
Correspondingly, after receiving a video, a computing device needs to decode and play it. For example, when the computing device receives a playback request generated by dragging the progress bar of a video, it responds by acquiring, from the position where dragging stopped, the GOPs that make up the video, and decodes and plays each GOP. Specifically, if the target frame at the drag stop position is not an I frame, the computing device must search the frames before or after the target frame for an I frame, so that it can start decoding and playing from that I frame. When the I frame is far from the target frame, this lowers video processing efficiency to some extent and degrades the user's viewing experience.
If no I frame exists among the frames before or after the target frame, the GOP cannot be decoded or played. The computing device may then discard the GOP containing the target frame and move on to decoding and playing the next GOP. As a result, some important video information is discarded, which also degrades the viewing experience.
Summary of the invention
The embodiments of the present invention disclose a video processing method, an apparatus, and a computer-readable storage medium that address problems of existing solutions such as reduced video processing efficiency and loss of important video information.
In a first aspect, an embodiment of the present invention provides a video processing method applied to a computing device. The method includes: acquiring a group of pictures (GOP) in a video, where the first frame of the GOP is a first I frame, the GOP includes M frames, and M is a positive integer; determining whether the M frames include a virtual intra-coded (VI) frame; and, when the M frames include a VI frame, inserting a second I frame before the VI frame. The second I frame is the frame that the VI frame references during video decoding.
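The core step of the first aspect scans the M frames of a GOP and inserts a second I frame before each VI frame. It can be sketched as below. Purely for illustration, the sketch assumes that the second I frame reuses the coded data of the GOP's first I frame and is flagged as decode-only. The `Frame` type and the function name are hypothetical.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Frame:
    kind: str                  # "I", "P" or "VI"
    pts: float                 # playback time in seconds
    payload: bytes = b""       # coded frame data
    decode_only: bool = False  # True for an inserted second I frame


def insert_second_i_frames(gop: List[Frame]) -> List[Frame]:
    """Insert a decode-only I frame immediately before every VI frame of the GOP."""
    if not gop or gop[0].kind != "I":
        raise ValueError("a GOP must start with its first I frame")
    first_i = gop[0]
    out: List[Frame] = []
    for frame in gop:
        # Skip VI frames that already follow an I frame, so the function is idempotent.
        if frame.kind == "VI" and out[-1].kind != "I":
            # Assumption: the second I frame reuses the first I frame's coded data.
            out.append(replace(first_i, pts=frame.pts, decode_only=True))
        out.append(frame)
    return out
```

Placing the inserted I frame directly before the VI frame matches the embodiment in which the second I frame is the frame immediately preceding the VI frame.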
Inserting the second I frame before the VI frame makes it possible to later start decoding and playing the video from the second I frame. This addresses the prior-art problems of reduced video processing efficiency, loss of important video information, and wasted storage resources on the computing device, and thereby helps improve video processing efficiency.
With reference to the first aspect, in some possible embodiments, the computing device, in response to a video playback request, determines that the start time of the video in the request is located after the second I frame in the GOP, and then starts decoding and playing the video from the second I frame.
With this step, once the second I frame has been inserted before the VI frame, the video can be decoded and played from the second I frame in a playback scenario. Compared with the prior art, which starts decoding from the first I frame of the GOP, this saves decoding time and improves video processing efficiency.
With reference to the first aspect, in some possible embodiments, the second I frame is the frame immediately preceding the VI frame.
With reference to the first aspect, in some possible embodiments, the GOP further includes index information of the GOP, and the index information records the storage address of the second I frame. Before inserting the second I frame before the VI frame, the computing device can obtain the second I frame from that storage address according to the index information of the GOP.
With reference to the first aspect, in some possible embodiments, after the second I frame is inserted before the VI frame, index information of the VI frame is inserted into the VI frame that follows the second I frame. The index information of the VI frame points to the second I frame, so the computing device can obtain the second I frame it points to according to the index information of the VI frame.
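One way to picture the index information described above is a per-GOP record that maps each VI frame to the storage address (byte offset and length) of its second I frame. The sketch below assumes that layout. `GopIndex` and `fetch_second_i` are hypothetical names, and a real container would define its own index format.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class GopIndex:
    start_time: float
    # Hypothetical layout: VI frame number -> (byte offset, length) of its second I frame.
    second_i_addr: Dict[int, Tuple[int, int]] = field(default_factory=dict)


def fetch_second_i(storage: bytes, index: GopIndex, vi_frame: int) -> bytes:
    """Read the second I frame referenced by vi_frame from stored video data."""
    offset, length = index.second_i_addr[vi_frame]
    data = storage[offset:offset + length]
    if len(data) != length:
        raise ValueError("storage address lies outside the stored data")
    return data
```

Keying the mapping by VI frame covers both variants above. The GOP-level index can hold the whole dictionary, while a single VI frame's index information needs only its own entry.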
With this step, the computing device can obtain the second I frame to be inserted according to the index information of the GOP or of the VI frame. This makes it easy to insert the second I frame before the VI frame later, and therefore to decode the video faster.
With reference to the first aspect, in some possible embodiments, the second I frame is used only for decoding the VI frame and is not output for display.
With reference to the first aspect, in some possible embodiments, the GOP includes at least one network abstraction layer unit (NALU), and the computing device identifies whether the GOP includes a supplemental enhancement information (SEI) NALU to determine whether the M frames include a VI frame. The SEI NALU indicates either that the frame containing the i-th NALU before the SEI NALU is a VI frame, or that the frame containing the j-th NALU after the SEI NALU is a VI frame.
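The SEI-based check can be sketched as follows. The sketch assumes that the custom field of each SEI NALU has already been parsed into a direction (`"before"` or `"after"`) and an offset (i or j). The `Nalu` record and this marker encoding are hypothetical; the sketch only shows how the offsets map back to frames.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple

SEI = 6  # nal_unit_type of an SEI NALU in H.264


@dataclass
class Nalu:
    nal_unit_type: int
    frame: Optional[int] = None                  # frame this NALU belongs to
    vi_marker: Optional[Tuple[str, int]] = None  # hypothetical parsed SEI field: ("before", i) or ("after", j)


def find_vi_frames(nalus: List[Nalu]) -> Set[int]:
    """Collect the frames that SEI NALUs mark as VI frames."""
    vi: Set[int] = set()
    for pos, nalu in enumerate(nalus):
        if nalu.nal_unit_type != SEI or nalu.vi_marker is None:
            continue
        direction, k = nalu.vi_marker
        if direction == "before":
            target = pos - k   # the i-th NALU before the SEI NALU
        elif direction == "after":
            target = pos + k   # the j-th NALU after the SEI NALU
        else:
            raise ValueError(f"unknown direction: {direction}")
        if 0 <= target < len(nalus) and nalus[target].frame is not None:
            vi.add(nalus[target].frame)
    return vi
```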
With this step, the computing device can identify the VI frames in the GOP simply by recognizing the SEI NALUs in the GOP, which makes VI frame identification convenient and efficient.
With reference to the first aspect, in some possible embodiments, the GOP includes reference picture set (RPS) information for its frames. The computing device determines whether the M frames include a VI frame by examining the RPS information of each frame in the GOP. If the RPS information of a frame indicates that an I frame is referenced when decoding that frame, and the frame immediately preceding it is not an I frame, the frame is a VI frame.
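The RPS rule above can be expressed compactly. The sketch assumes that each frame's RPS information has already been reduced to a list of referenced frame indices. Coded VI frames appear here as `"P"`, because a VI frame is essentially a P frame. The function name is hypothetical.

```python
from typing import List, Sequence


def find_vi_by_rps(kinds: Sequence[str], refs: Sequence[Sequence[int]]) -> List[int]:
    """Indices of frames whose RPS references an I frame while the previous frame is not an I frame."""
    vi: List[int] = []
    for idx in range(1, len(kinds)):
        if kinds[idx] == "I":
            continue  # an I frame is never classified as a VI frame
        references_i_frame = any(kinds[r] == "I" for r in refs[idx])
        if references_i_frame and kinds[idx - 1] != "I":
            vi.append(idx)
    return vi
```

Frame 1 in the usage below references the I frame directly but is an ordinary P frame, because its predecessor is that I frame. The "previous frame is not an I frame" condition is what excludes it.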
With this step, the computing device reads the RPS information of each frame directly to decide whether the frame is a VI frame, which improves the accuracy of VI frame identification.
With reference to the first aspect, in some possible embodiments, the computing device receives a video processing request that carries the start time of the video, where the video includes at least one GOP. In response to the request, it obtains the GOP corresponding to the start time from a GOP index table. The GOP index table records at least one mapping relationship, in which each GOP corresponds to its index information, and the index information of a GOP includes the start time of that GOP.
With reference to the first aspect, in some possible embodiments, the video processing request is a video playback request or a video download request. For a video playback request, the computing device obtains from the GOP index table the GOP in which the start time is located. For a video download request, it obtains from the GOP index table at least one GOP, beginning with the GOP in which the start time is located.
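A GOP index table sorted by start time supports both request types with a single binary search. In this sketch the table is reduced to a sorted list of GOP start times, and the request-type strings `"play"` and `"download"` are hypothetical. The search itself uses Python's standard `bisect` module.

```python
import bisect
from typing import List


def select_gops(gop_start_times: List[float], start_time: float, request: str) -> List[int]:
    """Return indices of the GOPs to fetch from an index table sorted by GOP start time."""
    if not gop_start_times or start_time < gop_start_times[0]:
        raise ValueError("start time precedes the first GOP")
    # The GOP containing start_time is the last one whose start time is <= start_time.
    pos = bisect.bisect_right(gop_start_times, start_time) - 1
    if request == "play":
        return [pos]
    if request == "download":
        return list(range(pos, len(gop_start_times)))
    raise ValueError(f"unknown request type: {request}")
```

The sketch does not check whether the start time lies past the end of the last GOP; a real index would also need each GOP's duration or end time.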
With this step, the computing device can obtain the corresponding GOPs of the video for different application scenarios and process them, so that the GOPs fetched for video processing match the actual needs of the device.
With reference to the first aspect, in some possible embodiments, the index information of the GOP also includes the playback time of each frame. When the video processing request is a video playback request, the VI frame is the VI frame in the GOP whose playback time has the smallest difference from the start time of the GOP.
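Selecting the VI frame with the smallest playback-time difference is a simple minimum search. In this sketch, `ref_time` stands for the start time being compared against, and frames are assumed to be `(kind, playback_time)` pairs. Both are illustrative assumptions.

```python
from typing import Optional, Sequence, Tuple


def nearest_vi(frames: Sequence[Tuple[str, float]], ref_time: float) -> Optional[int]:
    """Index of the VI frame whose playback time is closest to ref_time, or None if there is none."""
    vi = [i for i, (kind, _) in enumerate(frames) if kind == "VI"]
    if not vi:
        return None
    # Ties resolve to the earlier VI frame, because min() keeps the first minimum.
    return min(vi, key=lambda i: abs(frames[i][1] - ref_time))
```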
With this step, in a video playback scenario, the computing device can find the VI frame closest to the playback time and insert the second I frame only there. This avoids inserting I frames for every VI frame in the GOP, saves device resources, and improves video processing efficiency.
In a second aspect, an embodiment of the present invention provides a video processing apparatus, which includes functional modules or units for executing the method described in the first aspect or any possible implementation of the first aspect.
In a third aspect, an embodiment of the present invention provides a computing device, including a processor, a memory, a communication interface, and a bus. The processor, the communication interface, and the memory communicate with each other through the bus. The communication interface is used to receive and send data, the memory is used to store instructions, and the processor is used to call the instructions in the memory to execute the method described in the first aspect or any possible implementation of the first aspect.
In a fourth aspect, a computer-readable storage medium is provided, which stores instructions for executing the method described in the first aspect.
In a fifth aspect, a computer program product is provided which, when run on a computer, causes the computer to execute the method described in the first aspect.
In a sixth aspect, a chip product is provided for implementing the method in the first aspect or any possible implementation of the first aspect.
On the basis of the implementations provided in the above aspects, the present invention may further combine them to provide more implementations.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below.
FIG. 1 is a schematic structural diagram of a GOP according to an embodiment of the present invention.
FIG. 2 is a schematic structural diagram of a NALU according to an embodiment of the present invention.
FIG. 3 is a schematic structural diagram of an SEI NALU according to an embodiment of the present invention.
FIG. 4 is a schematic structural diagram of a video processing system according to an embodiment of the present invention.
FIG. 5 is a schematic structural diagram of a video decoding unit according to an embodiment of the present invention.
FIG. 6 is a schematic structural diagram of another GOP according to an embodiment of the present invention.
FIG. 7 is a schematic structural diagram of an SEI NALU inserted into a GOP according to an embodiment of the present invention.
FIG. 8 is a schematic structural diagram of another SEI NALU inserted into a GOP according to an embodiment of the present invention.
FIG. 9 is a schematic structural diagram of a video reading and writing unit according to an embodiment of the present invention.
FIG. 10 is a schematic structural diagram of another GOP according to an embodiment of the present invention.
FIG. 11 is a schematic diagram of a user dragging a video playback progress bar according to an embodiment of the present invention.
FIG. 12 is a schematic structural diagram of another GOP according to an embodiment of the present invention.
FIG. 13A is a schematic diagram of storing GOPs using a time index according to an embodiment of the present invention.
FIG. 13B is a schematic diagram of storing GOPs using a frame number index according to an embodiment of the present invention.
FIG. 14 is a schematic flowchart of a video processing method according to an embodiment of the present invention.
FIG. 15 is a schematic diagram of the GOPs that make up a video according to an embodiment of the present invention.
FIG. 16 is a schematic diagram of a user downloading a video offline according to an embodiment of the present invention.
FIG. 17 is a schematic structural diagram of a new GOP according to an embodiment of the present invention.
FIG. 18 is a schematic structural diagram of a video processing apparatus according to an embodiment of the present invention.
FIG. 19 is a schematic structural diagram of a computing device according to an embodiment of the present invention.
Detailed description
The technical solutions in the embodiments of the present invention are described in detail below with reference to the accompanying drawings.
First, some technical terms and concepts used in the present invention are introduced.
GOP, or group of pictures, refers to a group of consecutive pictures (also called frames), specifically the pictures between two I frames. The GOP thus indicates the distance between two I frames.
I frame, also called intra-coded frame, is an independent frame that carries all of its own information and can be decoded without reference to any other frame. The first frame of a video is usually an I frame.
Non-I frame refers to any frame other than an I frame, specifically a B frame or a P frame.
B frame, also called bidirectionally predictive-coded frame, records the differences between the current frame and the frames before and after it. That is, decoding a B frame requires both the frame before it and the frame after it. The frame before a B frame is the frame that precedes and is adjacent to it; the frame after a B frame is the frame that follows and is adjacent to it.
P frame, also called inter-frame predictive-coded frame, records the difference between the current frame and the previous frame. That is, decoding a P frame requires the frame before it, which may be a P frame or an I frame.
VI frame (virtual independent frame), also called virtual I frame, is essentially a P frame, but decoding it references the I frame in front of it. FIG. 1 is a schematic structural diagram of a GOP. As shown in FIG. 1, the GOP contains 3 VI frames, and decoding each VI frame references only the I frame that appears before it in the GOP, as indicated by the arrows in the figure.
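The benefit of VI frames becomes visible when computing which frames a given frame depends on. The sketch below assumes that each frame's references are known as a mapping from frame index to referenced indices, with indices in decode order. The example GOP follows the spirit of FIG. 1: an I frame, ordinary P frames, and VI frames that reference only the I frame. Its exact layout is made up.

```python
from typing import Dict, List, Set


def required_frames(refs: Dict[int, List[int]], target: int) -> List[int]:
    """All frames needed to decode `target`, in decode order (transitive closure of references)."""
    needed: Set[int] = set()
    stack = [target]
    while stack:
        idx = stack.pop()
        if idx in needed:
            continue
        needed.add(idx)
        stack.extend(refs.get(idx, []))  # frames without an entry (I frames) reference nothing
    return sorted(needed)


# 0: I, 1-2: P chain, 3: VI, 4-5: P chain, 6: VI, 7: P
refs = {1: [0], 2: [1], 3: [0], 4: [3], 5: [4], 6: [0], 7: [6]}
```

With these references, frame 5 needs only frames 0, 3, 4, and 5; the P frames 1 and 2 before the VI frame can be skipped. The I frame, however, is still required, and it may lie far from the VI frame.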
The network abstraction layer unit (NALU) is the basic unit of compressed video. In video coding, each frame consists of at least one NALU. FIG. 2 is a schematic structural diagram of a NALU according to an embodiment of the present invention. As shown in FIG. 2, a NALU includes a NAL header and a NAL body. In the H.264 video coding standard, the NAL header is fixed at 1 byte (8 bits) and contains three fields: the forbidden bit forbidden_zero_bit, the importance indicator nal_ref_idc, and the type field nal_unit_type, where:
forbidden_zero_bit occupies 1 bit and must be 0 according to video coding standards such as H.264. If the network detects an error in the NALU, forbidden_zero_bit can be set to 1 so that the receiver can correct the error or discard the NALU.
nal_ref_idc occupies 2 bits and indicates the importance of the NALU. Its value ranges from binary 00 to 11 (0 to 3); the larger the value, the more important the current NALU and the higher its protection priority.
nal_unit_type occupies 5 bits and indicates the type of the NALU.
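The three header fields can be unpacked with bit shifts and masks. The following is a minimal sketch for the 1-byte H.264 NAL header described above; the type and function names are illustrative.

```python
from typing import NamedTuple


class NalHeader(NamedTuple):
    forbidden_zero_bit: int  # 1 bit, must be 0
    nal_ref_idc: int         # 2 bits, importance 0..3
    nal_unit_type: int       # 5 bits, NALU type


def parse_nal_header(byte: int) -> NalHeader:
    """Split a 1-byte H.264 NAL header into its three fields."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("the NAL header is a single byte")
    return NalHeader(
        forbidden_zero_bit=(byte >> 7) & 0x1,
        nal_ref_idc=(byte >> 5) & 0x3,
        nal_unit_type=byte & 0x1F,
    )
```

For example, the header byte 0x67 splits into forbidden_zero_bit 0, nal_ref_idc 3, and nal_unit_type 7.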
The NAL body encapsulates the payload data (video data). In practice, the video bitstream produced by encoding has three layers of encapsulation. The first layer is the extended byte sequence payload (EBSP), which contains the emulation_prevention_three_byte field; this field prevents data inside the NAL body from being mistaken for a NALU start code (0x000001 or 0x00000001). The second layer is the raw byte sequence payload (RBSP), which is the NAL body with emulation_prevention_three_byte removed. It is produced by further processing the original syntax-element bitstream (encoded data), and its basic structure is the original encoded data followed by trailing bits for byte alignment. The third layer is the string of data bits (SODB), the actual raw binary bitstream after the syntax elements of the H.264 coding standard have been encoded.
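Recovering the RBSP from the NAL body means dropping every emulation_prevention_three_byte, that is, every 0x03 that follows two consecutive 0x00 bytes. A minimal sketch (the function name is illustrative):

```python
def ebsp_to_rbsp(ebsp: bytes) -> bytes:
    """Remove emulation_prevention_three_byte (0x03 after two 0x00 bytes) from a NAL body."""
    out = bytearray()
    zeros = 0  # number of consecutive 0x00 bytes just copied
    for b in ebsp:
        if zeros >= 2 and b == 0x03:
            zeros = 0  # drop the emulation prevention byte and restart the count
            continue
        out.append(b)
        zeros = zeros + 1 if b == 0x00 else 0
    return bytes(out)
```

For example, the EBSP bytes 00 00 03 01 become the RBSP bytes 00 00 01.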
可选地在H.264视频编码标准中,NALU中也可仅包括NAL头(NAL Header)和RBSP。即,NAL主体为RBSP。关于NAL头和RBSP具体可参见上文所述,这里不再赘述。Optionally, in the H.264 video coding standard, NALU may also include only NAL header (NAL Header) and RBSP. That is, the main body of the NAL is RBSP. For details about the NAL header and RBSP, please refer to the above description, which will not be repeated here.
增强补充信息网络抽象层单元(supplemental enhancement information NALU,SEI NALU),指类型字段nal_unit_type为补充增强信息单元SEI类型的NALU。请参见图3,是本发明实施例提供的一种SEI NALU的结构示意图。如图3,SEI NALU中包括NAL头(NAL Header)和NAL主体(NAL Body)。其中,NAL Header具体可对应参见图2所述实施例中的介绍。NAL Header中nal_unit_type占用5bit,用于指示NALU的类型。在实际应用中通过设置nal_unit_type字段的数值,来表示不同NALU的类型。例如,当nal_unit_type为“0X06”时,表示nal_unit_type指示的NALU的类型为SEI类型;当nal_unit_type为“0X67”时,表示nal_unit_type指示的NALU的类型为序列参数集(sequence parameter sets,SPS)类型;当nal_unit_type为“0X68”时,表示nal_unit_type指示的NALU的类型为图像参数集(picture parameter set,PPS)类型等。本发明这里,nal_unit_type为0X06,表示NALU的类型为SEI类型。The enhanced supplemental information network abstraction layer unit (supplemental enhancement information NALU, SEI NALU) refers to the NALU whose type field nal_unit_type is the SEI type of the supplementary enhanced information unit. Refer to FIG. 3, which is a schematic structural diagram of an SEI NALU provided by an embodiment of the present invention. As shown in Figure 3, the SEI NALU includes a NAL header (NAL Header) and a NAL body (NAL Body). Among them, NAL Header may correspond to the introduction in the embodiment shown in FIG. 2 for details. The nal_unit_type in the NAL Header occupies 5 bits and is used to indicate the type of NALU. In practical applications, different NALU types are indicated by setting the value of the nal_unit_type field. For example, when nal_unit_type is "0X06", it means that the type of NALU indicated by nal_unit_type is SEI type; when nal_unit_type is "0X67", it means that the type of NALU indicated by nal_unit_type is sequence parameter sets (SPS) type; When nal_unit_type is "0X68", it indicates that the type of NALU indicated by nal_unit_type is a picture parameter set (picture parameter set, PPS) type, etc. In the present invention, nal_unit_type is 0X06, indicating that the type of NALU is SEI.
The NAL body includes the SEI payload type, the SEI payload size, an SEI universally unique identifier (SEI UUID), and a custom field. The SEI payload type field occupies 1 byte (8 bits) when its value is below 255 and indicates the type of SEI message carried in the SEI NALU. The SEI payload size field indicates the size of the payload data (the payload size). The SEI UUID field occupies 16 bytes and uniquely identifies the payload data. The number of bytes occupied by the custom field can be set by the system to carry system-defined data, and is not limited in the present invention.
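As a sketch of how these fields can be read, the following Python code parses the first SEI message of an SEI RBSP. It assumes the NAL header and any emulation-prevention bytes have already been removed, and it uses the H.264 coding in which payloadType and payloadSize are each extended by 0xFF bytes; the UUID/custom split applies to payload type 5 (user data unregistered). Function names and the sample bytes are hypothetical.

```python
def read_ff_coded_value(data: bytes, pos: int) -> tuple[int, int]:
    """Read an SEI payloadType or payloadSize value (each 0xFF byte adds 255).

    Returns (value, position of the next unread byte)."""
    value = 0
    while data[pos] == 0xFF:
        value += 255
        pos += 1
    value += data[pos]
    return value, pos + 1


def parse_sei_message(rbsp: bytes) -> dict:
    """Parse the first SEI message from an SEI RBSP (NAL header already removed)."""
    payload_type, pos = read_ff_coded_value(rbsp, 0)
    payload_size, pos = read_ff_coded_value(rbsp, pos)
    payload = rbsp[pos:pos + payload_size]
    message = {"payload_type": payload_type, "payload_size": payload_size}
    if payload_type == 5:  # user_data_unregistered: 16-byte UUID, then custom bytes
        message["uuid"] = payload[:16]
        message["custom"] = payload[16:]
    return message


# payloadType 5, payloadSize 17, a placeholder UUID, one custom byte, trailing bits.
example_rbsp = bytes([5, 17]) + bytes(range(16)) + b"\x01" + b"\x80"
msg = parse_sei_message(example_rbsp)
print(msg["payload_type"], msg["payload_size"], msg["custom"])
```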
With advances in network communication technology and increases in network bandwidth, network video has been increasingly developed and applied. To save network transmission resources, video is usually compressed before transmission. A video consists of a number of temporally consecutive frames, and during encoding it can be divided into several GOPs. Consider a request to seek within a video by dragging the progress bar: if the target frame at the drag stop position is not an I frame, the computing device must search the frames before or after the target frame for the I frame nearest to it, and then read and decode the GOP starting from that I frame before playback. When the GOP is large and the target frame is far from that I frame, decoding takes longer, which significantly reduces video processing efficiency and degrades the viewing experience. Conversely, if no I frame is available among the frames before or after the target frame, the target frame cannot be decoded; part of the video is discarded, and decoding and playback must resume from the next I frame. As a result, some important video information is lost, which reduces the accuracy of the video information obtained and likewise degrades the viewing experience.
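The cost of seeking to a non-I frame can be sketched in Python under a deliberately simplified model: every P frame references only the frame immediately before it, there are no B frames, and only earlier I frames are usable as a starting point. These are assumptions for illustration; the function name is hypothetical.

```python
def frames_to_decode_for_seek(frame_types: list[str], target: int) -> list[int]:
    """Return the frame indices that must be decoded to display `target`,
    assuming a simple I/P chain in which each P frame references its predecessor."""
    start = target
    # Walk back to the nearest I frame at or before the target.
    while start >= 0 and frame_types[start] != "I":
        start -= 1
    if start < 0:
        raise ValueError("no I frame precedes the target frame")
    return list(range(start, target + 1))


# A 10 s GOP at 25 fps with a single I frame at its start.
gop = ["I"] + ["P"] * 249
print(len(frames_to_decode_for_seek(gop, 175)))  # seek to 7 s: 176 frames

# The same GOP with an additional I frame at 6 s (frame 150).
gop_with_extra_i = list(gop)
gop_with_extra_i[150] = "I"
print(len(frames_to_decode_for_seek(gop_with_extra_i, 175)))  # 26 frames
```

In this model the decoding work for a seek grows linearly with the distance between the target frame and the closest earlier I frame, which is why a large GOP makes seeking slow.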
As another example, when a computing device receives a request to play a video in reverse, it feeds each complete GOP of the video into the decoder, stores the decoding result (the decoded video) in a buffer, and then plays it in reverse order. For a 5-minute short video, for instance, reverse playback starts at the end of the video (the 5th minute) and runs backwards to its beginning. In practice, the larger the GOP, the more buffer space the decoded GOP occupies. For example, for a 4K video transmitted at 25 fps (frames per second), if a single GOP spans 20 seconds, the computing device needs about 5.8 GB of storage to buffer the decoded video. This wastes the storage resources of the computing device.
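The 5.8 GB figure is consistent with a simple estimate, assuming 3840 x 2160 pixels per frame and 8-bit YUV 4:2:0 sampling (1.5 bytes per pixel); the text does not state these parameters, so they are assumptions. A 20-second GOP at 25 fps holds 500 frames of 12,441,600 bytes each:

```python
def decoded_gop_bytes(width: int, height: int, fps: int, gop_seconds: float,
                      bytes_per_pixel: float = 1.5) -> float:
    """Bytes needed to buffer every decoded frame of one GOP.

    bytes_per_pixel = 1.5 assumes 8-bit YUV 4:2:0: a full-resolution luma plane
    plus two chroma planes at a quarter of the resolution each."""
    frame_bytes = width * height * bytes_per_pixel
    frame_count = fps * gop_seconds
    return frame_bytes * frame_count


total = decoded_gop_bytes(3840, 2160, 25, 20)  # 4K UHD, 25 fps, 20 s GOP
print(f"{total:,.0f} bytes = {total / 2**30:.2f} GiB")
```

The result is about 5.79 GiB (binary gigabytes), which rounds to 5.8 GB; other resolutions or bit depths scale the figure proportionally.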
To solve the above problems in the prior art, namely reduced video processing efficiency, loss of important video information, and wasted storage resources of the computing device during reverse playback of large GOPs, the present invention provides a video processing method, a system to which the method applies, and related products. Refer to FIG. 4, which is a schematic structural diagram of a video processing system according to an embodiment of the present invention. The video processing system 100 shown in FIG. 4 includes a video encoding unit 102, a video reading and writing unit 104, a video decoding unit 106, and a storage unit 108, where:
The video encoding unit 102 encodes the input original video into a video bitstream; specifically, it can convert the original video file into a file of another video format. For example, the video encoding unit 102 may use a coding standard such as H.261, H.263, H.264, H.265 or H.266 to encode the original video into a video bitstream. Common video formats include, but are not limited to, Audio Video Interleave (AVI), Digital Video AVI (DV-AVI), Moving Picture Experts Group (MPEG), Advanced Streaming Format (ASF), Windows Media Video (WMV), RealMedia (RM), and other supported video formats.
During encoding, the video encoding unit 102 may divide the video into several GOPs and encode them. In other words, a video (that is, a video bitstream) may include one or more GOPs. For simplicity, the following description takes a video (or video bitstream) that includes one GOP as an example.
In practical applications, the video encoding unit 102 may be an encoder or another device that supports image or video encoding. For example, the video encoding unit 102 may be deployed in a camera apparatus such as a camera module or a camera, or it may be deployed as a standalone encoder.
The storage unit 108 stores video, for example the video bitstream produced by the video encoding unit 102.
The video reading and writing unit 104 writes the video bitstream into the storage unit 108, or reads the video bitstream (specifically, the GOPs in it) from the storage unit 108 and inputs it to the video decoding unit 106 for decoding.
The video decoding unit 106 decodes the input video bitstream and outputs the decoded video. Specifically, it decodes each GOP contained in the bitstream and outputs every frame of that GOP.
In practical applications, the video reading and writing unit 104 may be an input/output (IO) device that supports data reading and writing, such as an IO interface. The video decoding unit 106 may be any device or component that supports video decoding, such as a decoder; it may be deployed in a video processing apparatus of a computing device or as a standalone decoder, which is not limited in the present invention. The storage unit 108 may be any component that supports data storage, including but not limited to random access memory (RAM), flash memory, read-only memory (ROM), a hard disk, and registers.
The video processing technology provided in the embodiments of the present invention is applicable to scenarios such as GOP playback and download. The solution inserts one or more additional I frames into a GOP; compared with the original I frame of the GOP, a newly inserted I frame is closer to the VI frame. When content in a specified GOP needs to be played or downloaded, the GOP and the VI frame corresponding to the requested time are located, and the newly inserted I frame is used as the reference frame for decoding that VI frame, so the original I frame of the GOP does not need to be used as the decoding reference. This improves video processing efficiency and the time accuracy of playback. The more frames a GOP contains, the more pronounced these benefits become.
Refer to FIG. 5, which is a schematic structural diagram of a video encoding unit 102 according to an embodiment of the present invention. As shown in FIG. 5, the video encoding unit 102 includes a VI detector 1021. Optionally, the video encoding framework of the video encoding unit 102 may be divided into two layers: a video coding layer (VCL) and a network abstraction layer (NAL). Accordingly, as shown in FIG. 5, the video encoding unit 102 may further include a video coding layer VCL 1022 and a network abstraction layer NAL 1023.
The video encoding unit 102 encodes the input original video through the video coding layer VCL 1022 to obtain an encoded bitstream, referred to as the video bitstream (in particular, the GOPs in it). The VI detector 1021 then identifies VI frames in the GOP produced by VCL 1022. The specific manner of VI frame identification is not limited: for example, VI frames may be identified according to the definition of a VI frame, or according to received out-of-band information. The out-of-band information indicates that the frame at a preset timestamp in the GOP is a VI frame, for example that the frame at the 3rd second of the GOP is a VI frame.
If a VI frame is identified, the network abstraction layer NAL 1023 is notified to mark the VI frame so as to indicate its position in the GOP. The specific marking method is not limited: for example, supplemental enhancement information (SEI) marking may be used, another marking method conforming to the video coding standard may be used to mark the position of the VI frame in the GOP, or the position may be signaled out of band.
Meanwhile, the GOP encoded by the video encoding unit 102 can also be sent to the network abstraction layer NAL 1023 to be encapsulated into NAL units (NALUs); in other words, a GOP consists of multiple NALUs. Refer to FIG. 6 for a schematic diagram of a GOP. As shown in FIG. 6, the GOP consists of a series of NALUs. Typically, a GOP begins with the sequence parameter set (SPS) and picture parameter set (PPS), followed by the I frame and then the other frames. As shown in the figure, the GOP includes at least one frame, and each frame includes one or more NALUs. The PPS contains information that applies to all slices of a picture (that is, a frame), and the SPS contains information that applies to the whole picture sequence (that is, every frame in the GOP).
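To make the NALU structure of a GOP concrete, the following Python sketch splits an H.264 Annex B byte stream (NAL units separated by 0x000001 or 0x00000001 start codes) into NAL units and reads each nal_unit_type. It is a simplified illustration that assumes a well-formed Annex B stream; the function names and sample bytes are made up for the example.

```python
def split_annexb(stream: bytes) -> list[bytes]:
    """Split an H.264 Annex B byte stream into NAL units, dropping start codes."""
    nalus: list[bytes] = []
    pos = stream.find(b"\x00\x00\x01")
    while pos != -1:
        start = pos + 3
        nxt = stream.find(b"\x00\x00\x01", start)
        end = nxt if nxt != -1 else len(stream)
        # A 4-byte start code leaves one extra 0x00 at the end of the previous
        # unit; a NAL unit never ends in 0x00, so trailing zeros can be stripped.
        nalu = stream[start:end].rstrip(b"\x00")
        if nalu:
            nalus.append(nalu)
        pos = nxt
    return nalus


def nal_unit_types(stream: bytes) -> list[int]:
    """Return the 5-bit nal_unit_type of every NAL unit in the stream."""
    return [nalu[0] & 0x1F for nalu in split_annexb(stream)]


# Truncated SPS, PPS and IDR-slice NAL units (payload bytes are placeholders).
sample = (b"\x00\x00\x00\x01\x67\x42"
          b"\x00\x00\x00\x01\x68\xce"
          b"\x00\x00\x01\x65\x88\x84")
print(nal_unit_types(sample))  # [7, 8, 5]
```

Emulation-prevention bytes inside each NAL unit guarantee that a start code cannot appear within a payload, which is what makes this simple search safe.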
For example, suppose SEI marking is used to indicate the position of a VI frame in the GOP. After the VI detector identifies a VI frame in the GOP, it can notify the network abstraction layer NAL 1023 to generate a custom supplemental enhancement information NAL unit (SEI NALU). The SEI NALU is inserted before or after the VI frame to indicate either that the frame containing the i-th NALU before the SEI NALU is a VI frame, or that the frame containing the j-th NALU after the SEI NALU is a VI frame; for example, it may indicate that the frame immediately before or immediately after the SEI NALU is a VI frame. Refer to FIG. 7, which is a schematic diagram of inserting an SEI NALU into a GOP. FIG. 7 shows the original structure of the GOP and the structures of the new GOPs obtained by inserting the SEI NALU before and after the VI frame, respectively. As shown in FIG. 7, the GOP produced by the video coding layer VCL 1022 includes P frames and a VI frame. After the VI detector 1021 detects the VI frame in the GOP, the video encoding unit 102 notifies the network abstraction layer NAL 1023 to add an SEI NALU either before or after the VI frame. The exact position of the SEI NALU before or after the VI frame is not limited. For example, the SEI NALU may be placed before the VI frame as the j-th NALU preceding the first NALU of the VI frame, indicating that the frame containing the j-th NALU after the SEI NALU is the VI frame; or it may be placed after the VI frame as the i-th NALU following the last NALU of the VI frame, indicating that the frame containing the i-th NALU before the SEI NALU is the VI frame.
Refer to FIG. 8, which is a schematic diagram of inserting an SEI NALU into a GOP. As shown in FIG. 8, each frame in the GOP (including the VI frame) consists of one or more NALUs; the VI frame in the figure consists of three NALUs, NALU1 to NALU3. Accordingly, when the network abstraction layer NAL 1023 marks the VI frame using SEI, it can add the SEI NALU immediately before the first NALU of the VI frame (NALU1 in the figure), or immediately after the last NALU of the VI frame (NALU3 in the figure). In this example, the SEI NALU indicates that the frame containing the NALU immediately before or immediately after the SEI NALU is a VI frame.
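The FIG. 8 case (i = j = 1) can be sketched in Python at the level of a NALU sequence. NALUs are represented by short string labels and the marker names SEI_VI_NEXT and SEI_VI_PREV are hypothetical stand-ins for real SEI NALUs; the sketch shows only where the marker lands relative to the VI frame.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    kind: str         # "I", "P" or "VI" (labels used only for this illustration)
    nalus: list[str]  # the frame's NALUs, represented here by short labels


def insert_vi_markers(frames: list[Frame], before: bool = True) -> list[str]:
    """Flatten a GOP into a NALU sequence, adding one marker SEI NALU next to
    each VI frame: immediately before its first NALU (before=True) or
    immediately after its last NALU (before=False)."""
    out: list[str] = []
    for frame in frames:
        if frame.kind == "VI" and before:
            out.append("SEI_VI_NEXT")  # the next NALU belongs to a VI frame
        out.extend(frame.nalus)
        if frame.kind == "VI" and not before:
            out.append("SEI_VI_PREV")  # the previous NALU belongs to a VI frame
    return out


gop = [Frame("I", ["I.1"]), Frame("P", ["P.1"]),
       Frame("VI", ["VI.1", "VI.2", "VI.3"]), Frame("P", ["P.2"])]
print(insert_vi_markers(gop, before=True))
print(insert_vi_markers(gop, before=False))
```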
In practical applications, the network abstraction layer NAL 1023 can set the values of relevant fields in the SEI NALU to indicate that the frame containing the i-th NALU before the SEI NALU is a VI frame, or that the frame containing the j-th NALU after the SEI NALU is a VI frame. For example, NAL 1023 can indicate the position of the VI frame in the GOP through the value of the type field (specifically, the SEI payload type field) or of the SEI UUID field. Alternatively, NAL 1023 can add a new field within the custom field of the SEI NALU and indicate the position of the VI frame in the GOP through the value of that new field. Taking the SEI payload type field as an example: if NAL 1023 sets the SEI payload type to +1, the frame containing the NALU immediately before the SEI NALU is a VI frame; conversely, if it sets the SEI payload type to -1, the frame containing the NALU immediately after the SEI NALU is a VI frame.
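Because the H.264 payloadType value is a non-negative integer, a signed offset such as -1 cannot be carried in that field directly; the following Python sketch therefore uses the custom-field alternative described above. It builds a user-data-unregistered SEI NALU (payload type 5) whose single custom byte holds the signed offset, +1 for the preceding NALU and -1 for the following NALU. The UUID value and function names are hypothetical, and the sketch applies emulation prevention so that no start code appears inside the NALU.

```python
import uuid

# Hypothetical UUID chosen for this illustration; a real system would fix its own.
VI_MARKER_UUID = uuid.UUID("6f1c2b3a-0000-4000-8000-000000000001").bytes


def add_emulation_prevention(rbsp: bytes) -> bytes:
    """Insert 0x03 wherever two zero bytes would be followed by a byte <= 0x03."""
    out = bytearray()
    zeros = 0
    for b in rbsp:
        if zeros >= 2 and b <= 0x03:
            out.append(0x03)
            zeros = 0
        out.append(b)
        zeros = zeros + 1 if b == 0x00 else 0
    return bytes(out)


def build_vi_marker_sei(offset: int) -> bytes:
    """Build an SEI NALU whose custom byte records where the VI frame is:
    +1 = the NALU before the marker, -1 = the NALU after the marker."""
    custom = offset.to_bytes(1, "big", signed=True)
    payload = VI_MARKER_UUID + custom
    # payloadType 5, payloadSize, payload, then rbsp_trailing_bits (0x80).
    rbsp = bytes([5, len(payload)]) + payload + b"\x80"
    return bytes([0x06]) + add_emulation_prevention(rbsp)  # 0x06: SEI NAL header


print(build_vi_marker_sei(-1).hex())
```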
Refer to FIG. 9, which is a schematic structural diagram of a video reading and writing unit 104 according to an embodiment of the present invention. As shown in FIG. 9, the video reading and writing unit 104 includes a bitstream detector 1041, an index generator 1042, and a bitstream modifier 1043, and communicates with the storage unit 108, where:
The bitstream detector 1041 performs frame detection (that is, frame identification) on the video bitstream input to the video reading and writing unit 104 (specifically, the GOPs in it) to determine the frames contained in the GOP and the position of each frame. For example, it can determine the positions of the I frame and the VI frame in the GOP. The representation of a position is not limited; it may be, for example, a frame index, the playback time of the frame within the GOP, the storage location (also called storage address) of the frame in the storage unit 108, or other information indicating the position of the frame in the GOP.
Specifically, taking the detection of VI frames as an example, the bitstream detector 1041 performs VI frame mark detection on the GOP to find the VI frames in the GOP and their positions. Because the video encoding unit 102 may mark VI frames in different ways, the detection method used by the bitstream detector 1041 differs accordingly. Two mark-based detection methods are given below as examples, followed by a third method that identifies VI frames by parsing the GOP directly.
In the first method, the bitstream detector 1041 checks whether the GOP contains an SEI NALU. If it does, the detector determines, as indicated by the SEI NALU, that the frame containing the i-th NALU before the SEI NALU is a VI frame, or that the frame containing the j-th NALU after the SEI NALU is a VI frame. The number of SEI NALUs is not limited; there may be one or more. When there are multiple SEI NALUs, the bitstream detector 1041 can apply the same principle to detect the multiple VI frames in the GOP and the position of each of them.
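The first method can be sketched in Python over a simplified NALU list in which each entry is a (kind, frame_index) pair: slice NALUs carry the index of the frame they belong to, and marker SEI NALUs use the hypothetical kinds SEI_PREV and SEI_NEXT. This representation is an assumption for illustration; a real detector would obtain the same information by parsing the bitstream.

```python
def find_vi_frames(nalus: list[tuple[str, int]], i: int = 1, j: int = 1) -> set[int]:
    """Return the indices of frames marked as VI frames.

    SEI_PREV: the frame holding the i-th NALU before the marker is a VI frame.
    SEI_NEXT: the frame holding the j-th NALU after the marker is a VI frame."""
    vi_frames: set[int] = set()
    for pos, (kind, _) in enumerate(nalus):
        if kind == "SEI_PREV":
            target = pos - i
        elif kind == "SEI_NEXT":
            target = pos + j
        else:
            continue
        if 0 <= target < len(nalus) and nalus[target][0] == "SLICE":
            vi_frames.add(nalus[target][1])
    return vi_frames


# Frame 2 (three NALUs) is marked from before; frame 4 is marked from after.
stream = [("SLICE", 0), ("SLICE", 1), ("SEI_NEXT", -1),
          ("SLICE", 2), ("SLICE", 2), ("SLICE", 2),
          ("SLICE", 3), ("SLICE", 4), ("SEI_PREV", -1)]
print(sorted(find_vi_frames(stream)))  # [2, 4]
```

Handling several markers in one pass mirrors the case of multiple SEI NALUs described above.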
In the second method, the bitstream detector 1041 analyzes the GOP for VI frames according to out-of-band information sent by the video encoding unit 102, and determines the VI frames contained in the GOP and their positions. The out-of-band information indicates the positions of VI frames in the GOP, for example that the fifth frame in the GOP is a VI frame, or that the frame at the third second of the GOP is a VI frame. Optionally, the bitstream detector 1041 can also detect VI frames by parsing the GOP itself, as described in the third method below.
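Out-of-band notices like the two examples above can be mapped to frame positions with a few lines of Python. The message format (a dict with either a 1-based "frame" number or a "time" offset in seconds from the start of the GOP) and the constant frame rate are assumptions made for this sketch.

```python
def vi_frames_from_out_of_band(messages: list[dict], fps: float) -> set[int]:
    """Map hypothetical out-of-band notices to zero-based frame indices in the GOP.

    Each notice gives either a 1-based frame number ("frame") or an offset in
    seconds from the start of the GOP ("time"); a constant frame rate is assumed."""
    indices: set[int] = set()
    for msg in messages:
        if "frame" in msg:
            indices.add(msg["frame"] - 1)
        elif "time" in msg:
            indices.add(round(msg["time"] * fps))
    return indices


# "The fifth frame is a VI frame" and "the frame at 3 s is a VI frame", at 25 fps.
print(sorted(vi_frames_from_out_of_band([{"frame": 5}, {"time": 3.0}], fps=25)))
```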
In the third method, the bitstream detector 1041 parses each frame in the GOP, identifies the reference picture set (RPS) information of each frame, and determines the VI frames in the GOP and their positions.
It should be understood that, in video encoding, a frame is encoded into one or more slices, and the slices of each frame are carried in NALUs for transmission. The first slice of each frame carries the RPS information. The RPS information consists of identification information whose meaning is defined by the system; for example, it indicates whether a frame is used as a reference for decoding the current frame or subsequent frames. The RPS information includes the reference frame information of the current frame. If this reference frame information indicates that the current frame has exactly one decoding reference, which is an I frame, and the frame preceding the current frame is not an I frame, then the current frame is a VI frame. Specifically, the RPS information indicates the picture order count (POC) of the reference frames; if the POC of the reference frame is 1, the current frame has one reference frame and that reference frame is an I frame, that is, the current frame is decoded with reference to the I frame only. If the bitstream detector 1041 further detects that the frame preceding the current frame is not an I frame, it can determine that the current frame is a VI frame.
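The decision rule above can be sketched in Python, assuming each frame's reference list has already been extracted from its RPS information into a plain list of frame indices. That simplified representation is an assumption; the sketch does not parse real RPS syntax or model the POC convention, only the "single I-frame reference, non-I predecessor" test.

```python
def classify_vi_frames(frame_types: list[str], references: list[list[int]]) -> list[int]:
    """Return indices of VI frames: the frame has exactly one reference, that
    reference is an I frame, and the frame immediately before it is not an I frame."""
    vi_frames = []
    for idx, refs in enumerate(references):
        if idx == 0 or frame_types[idx] == "I":
            continue  # the first frame and I frames themselves are not VI frames
        only_ref_is_i = len(refs) == 1 and frame_types[refs[0]] == "I"
        prev_is_non_i = frame_types[idx - 1] != "I"
        if only_ref_is_i and prev_is_non_i:
            vi_frames.append(idx)
    return vi_frames


# Frame 1 references only the I frame but directly follows it, so it is not a VI frame;
# frame 4 references only the I frame and follows a P frame, so it is.
types = ["I", "P", "P", "P", "P", "P"]
refs = [[], [0], [1], [2], [0], [4]]
print(classify_vi_frames(types, refs))  # [4]
```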
When the bitstream detector 1041 detects that the GOP contains a VI frame, it can send a VI frame identification signal to the index generator 1042, notifying it that the GOP contains the VI frame and providing information about the VI frame, such as its index (frame number), its playback time within the GOP, and its storage address in the storage unit 108. Optionally, and likewise, when the bitstream detector 1041 detects that the GOP contains an I frame, it can send an I frame identification signal to the index generator 1042, notifying it of the I frame and providing information such as the frame number of the I frame, its playback time within the GOP, and its storage address in the storage unit 108.
The index generator 1042 receives the I frame identification signals and VI frame identification signals sent by the bitstream detector 1041. After receiving a VI frame identification signal, the index generator 1042 can determine the target I frame that is referenced when decoding the VI frame (also referred to below as the second I frame), and store the target I frame in association with the VI frame. For example, it can store the storage address of the target I frame in the index information of the GOP, indicating that the target I frame stored at that address is to be referenced when decoding the VI frame. Alternatively, corresponding index information can be placed in the VI frame to point to the target I frame, indicating that the target I frame it points to is to be referenced when decoding the VI frame. The index information of a GOP identifies the GOP and may include, but is not limited to, the index number of the GOP, the duration of the GOP, the start time and end time of the GOP within the video bitstream, an identifier of whether the GOP contains a VI frame, the storage address of the GOP in the storage unit 108, and the storage address or offset of the VI frame within the GOP. The target I frame here may be an I frame that appears before the VI frame in the GOP, that is, an I frame whose playback time precedes that of the VI frame; alternatively, it may be the I frame in the GOP that is closest to the VI frame.
For example, refer to FIG. 10 for a schematic diagram of a GOP. As shown in FIG. 10, the GOP is a 10-second video bitstream, and the frame at 7 s is a VI frame. If the target I frame referenced when decoding the VI frame is the I frame that appears before the VI frame in the GOP, the target I frame is the I frame at 0 s in the figure. If the target I frame is instead the I frame closest to the VI frame in the GOP, the target I frame is the I frame at 9 s in the figure.
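The two selection policies can be expressed as a small Python function and checked against the FIG. 10 example. Representing I frames by their playback times in seconds, and the policy names, are choices made for this sketch.

```python
def select_target_i_frame(i_frame_times: list[float], vi_time: float,
                          policy: str = "preceding") -> float:
    """Pick the I frame a VI frame should reference.

    "preceding": the latest I frame whose playback time is before the VI frame.
    "nearest":   the I frame closest in time to the VI frame (first listed on a tie)."""
    if policy == "preceding":
        earlier = [t for t in i_frame_times if t < vi_time]
        if not earlier:
            raise ValueError("no I frame precedes the VI frame")
        return max(earlier)
    if policy == "nearest":
        return min(i_frame_times, key=lambda t: abs(t - vi_time))
    raise ValueError(f"unknown policy: {policy}")


i_frames = [0.0, 9.0]  # I frames at 0 s and 9 s, as in FIG. 10
print(select_target_i_frame(i_frames, 7.0, "preceding"))  # 0.0
print(select_target_i_frame(i_frames, 7.0, "nearest"))    # 9.0
```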
Optionally, if the video contains multiple GOPs, each GOP has its own index information, and the index generator 1042 can store each GOP and its index information in the storage unit 108 in the form of a GOP index table. The GOP index table stores at least one mapping relationship, each of which maps one GOP to its index information. For details of the index information of a GOP, refer to the description above.
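A GOP index table of this kind can be sketched in Python as a list of per-GOP records sorted by start time, with a binary search that finds the GOP covering a requested playback time. The record fields follow the index information listed above, but the class names, field names and sample values are hypothetical.

```python
from bisect import bisect_right
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class GopIndexInfo:
    """Hypothetical record holding one GOP's index information."""
    gop_id: int
    start_time: float                 # seconds from the start of the video
    duration: float                   # seconds
    storage_address: int              # byte offset of the GOP in the storage unit
    has_vi_frame: bool = False
    vi_offsets: list[int] = field(default_factory=list)  # VI frame offsets in the GOP
    target_i_address: Optional[int] = None               # address of the target I frame


class GopIndexTable:
    """Maps each GOP to its index information and finds the GOP covering a time."""

    def __init__(self, entries: list[GopIndexInfo]):
        self.entries = sorted(entries, key=lambda e: e.start_time)
        self.starts = [e.start_time for e in self.entries]

    def lookup(self, t: float) -> GopIndexInfo:
        pos = bisect_right(self.starts, t) - 1
        if pos < 0 or t >= self.entries[pos].start_time + self.entries[pos].duration:
            raise KeyError(f"no GOP covers t={t}")
        return self.entries[pos]


table = GopIndexTable([
    GopIndexInfo(0, 0.0, 10.0, 0),
    GopIndexInfo(1, 10.0, 10.0, 4_000_000, True, [175], 4_000_000),
])
print(table.lookup(12.5).gop_id)  # 1
```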
The bitstream modifier 1043 modifies the GOP input to the video reading and writing unit 104 to obtain modified, new GOPs. Specifically, the bitstream modifier 1043 reads a VI frame in the GOP and the target I frame referenced when decoding that VI frame, and then inserts the target I frame before the VI frame, thereby obtaining at least two new GOPs (that is, multiple GOPs). The specific position at which the target I frame is inserted before the VI frame is not limited; for example, it may be inserted as the m-th frame before the VI frame, where m is a positive integer.
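For the case m = 1 (the target I frame placed immediately before the VI frame), the splitting can be sketched in Python over frame labels. The labels, the "*" suffix marking an inserted copy, and the assumption that every VI frame uses the same target I frame are illustrative only.

```python
def split_gop(frames: list[str], vi_indices: list[int],
              target_i: int = 0) -> list[list[str]]:
    """Insert a copy of the target I frame immediately before each VI frame
    (the m = 1 case) and return the resulting new GOPs."""
    vi_set = set(vi_indices)
    new_gops: list[list[str]] = []
    current: list[str] = []
    for idx, frame in enumerate(frames):
        if idx in vi_set and current:
            new_gops.append(current)
            current = [frames[target_i] + "*"]  # "*" marks the inserted I frame copy
        current.append(frame)
    if current:
        new_gops.append(current)
    return new_gops


gop = ["I0", "P1", "P2", "VI3", "P4", "P5", "VI6", "P7"]
for new_gop in split_gop(gop, [3, 6]):
    print(new_gop)
```

Each resulting GOP starts with an I frame (the original or an inserted copy), so it can be decoded without the frames of the preceding GOPs.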
It should be noted that, to describe the embodiments of the present invention more intuitively, they can be understood as follows: by inserting a new I frame into the original GOP, one GOP is divided into multiple new GOPs, each of which has an I frame. However, the newly inserted I frame in the embodiments exists to serve as a decoding reference for the VI frame, so it may not have all the functions of the I frame in the original GOP (for example, it need not be displayed during playback); it only needs to be sufficient as a reference for decoding the VI frame. In other words, the newly inserted frame has only the I-frame function of serving as a decoding reference for the VI frame, so it can be called a quasi-I frame. In this case, because the inserted frame is not a true I frame, the original GOP can be regarded as not actually divided into multiple new GOPs; it remains a single GOP to which one or more quasi-I frames have been added. In another case, if the newly inserted I frame is identical to the I frame in the original GOP, the original GOP can be regarded as divided into multiple new GOPs. For ease of description, and unless otherwise specified, the following embodiments of the present invention do not distinguish between these two cases: inserted frames are collectively referred to as I frames, and the result of inserting I frames (or quasi-I frames) into a GOP is collectively referred to as obtaining "new GOPs". In short, an I frame inserted in the embodiments of the present invention (for example, the second I frame) is either a frame identical to the I frame in the original GOP, or a frame that has the function, possessed by the I frame in the original GOP, of serving as a decoding reference for the VI frame.
In a specific implementation, when the video read/write unit 104 receives a video processing request, it uses the code stream detector 1041 to detect whether the GOP contains a VI frame. If it does, the code stream modifier 1043 reads the target I frame from the storage address, recorded in the GOP's index information, of the target I frame that the VI frame references during decoding. The code stream modifier 1043 then inserts the target I frame before the VI frame, thereby obtaining multiple new GOPs. This addresses a problem in the prior art: when a GOP is large and the distance between the I frame and a VI frame is long, decoding takes too long, video processing efficiency drops, or important video information is lost. By inserting an I frame before the VI frame, the present invention splits a large GOP into multiple small GOPs, and playback can decode from the split small GOPs. Compared with the prior art, this avoids decoding unnecessary information, improves video decoding efficiency, avoids discarding important video information, and preserves the user's viewing experience.
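The splitting operation described above can be sketched as follows. This is a minimal illustration, not the code stream modifier 1043 itself: the `Frame` type and its `kind` labels are assumptions, and the target I frame is assumed to be the GOP's first I frame rather than being read from the storage address in the index information.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Frame:
    kind: str   # "I", "P", "B" or "VI" (labels assumed for illustration)
    pts: float  # playback time in seconds


def split_gop(gop: List[Frame]) -> List[List[Frame]]:
    """Insert the GOP's first I frame before every VI frame.

    Assumes the target I frame is gop[0]; a real system would read it from
    the storage address recorded in the GOP index information.
    """
    if not gop or gop[0].kind != "I":
        raise ValueError("a GOP must start with an I frame")
    target_i = gop[0]
    new_gops: List[List[Frame]] = [[]]
    for frame in gop:
        if frame.kind == "VI":
            # The inserted I frame opens a new GOP right before the VI frame.
            new_gops.append([target_i])
        new_gops[-1].append(frame)
    return new_gops
```

For a GOP containing k VI frames the sketch returns k + 1 new GOPs, each starting with an I frame.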
By implementing the embodiments of the present invention, the video encoding unit 102 can mark the VI frames contained in a GOP and transmit the VI frame marks together with the GOP, which improves the compatibility of the video encoding unit 102. The video read/write unit 104 can insert the target I frame before a VI frame to divide a large GOP into multiple new GOPs, so that control is exercised at VI-frame granularity, which effectively improves video playback. In reverse-playback scenarios in particular, caching the new GOPs instead of the large GOP effectively saves storage resources.
Two application scenarios to which the present invention applies are described below.
The first is the video playback scenario, in which the video processing request is specifically a video playback request. While watching a video, a user can drag the playback progress bar at will; FIG. 11 is a schematic diagram of a user dragging the progress bar. When the computing device detects that the user has dragged the progress bar, it can generate a corresponding video playback request. In response, it obtains the GOP in which the drag stop position is located and identifies whether that GOP includes a VI frame. If it does, the computing device obtains from the GOP index table the storage address of the target I frame referenced when decoding the VI frame, reads the target I frame from that address, and inserts it before the VI frame. Here, the target I frame may be the I frame that appears before the VI frame in the GOP, or the I frame in the GOP closest to the VI frame; see the example shown in FIG. 10 above.
If the GOP includes multiple VI frames, then to save processing resources in the playback scenario, the computing device may process only the VI frame closest to the drag stop position, that is, insert the target I frame before that VI frame to obtain two new GOPs. Optionally, the playback time of the inserted target I frame is earlier than the playback time of the drag stop position. The computing device then decodes and plays the new GOP in which the drag stop position is located. FIG. 12 is a schematic structural diagram of a GOP. As shown in FIG. 12, the GOP is a 10 s video code stream that includes two VI frames, VI frame 1 and VI frame 2, whose playback times are the 5th second and the 7th second, respectively. When watching the stream online, the user can drag the progress bar at will. If the user stops dragging at 3 s, the VI frame closest to the drag stop position is VI frame 1, and the computing device can insert the target I frame before VI frame 1. The insertion position of the target I frame is not limited: it may be anywhere between the drag stop position and the VI frame, or anywhere before the drag stop position. The latter placement ensures that the playback time of the inserted target I frame is not later than (that is, at or before) the playback time of the drag stop position, which avoids losing important video information.
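The choice of which VI frame to process can be sketched as below. The sketch assumes "closest" means the smallest absolute time difference and that ties go to the earlier VI frame; the function name is hypothetical.

```python
from typing import List, Optional


def nearest_vi_time(vi_times: List[float], stop_time: float) -> Optional[float]:
    """Return the playback time of the VI frame closest to the drag-stop position.

    "Closest" is taken here as the smallest absolute time difference (an
    assumption); ties go to the earlier VI frame because min() keeps the first
    minimum of the sorted list.
    """
    if not vi_times:
        return None
    return min(sorted(vi_times), key=lambda t: abs(t - stop_time))
```

With the FIG. 12 values (VI frames at 5 s and 7 s, drag stopped at 3 s), this selects VI frame 1 at 5 s, so only one target I frame is inserted and two new GOPs result.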
The second is the video download scenario, in which the video processing request is specifically a video download request. If a user wants to watch a video offline, the user can download and cache it locally in advance. After receiving the video download request, the computing device downloads the video (specifically, one or more GOPs contained in the video) in response. Optionally, the download request carries a start time and an end time, in which case the computing device downloads the portion of the video from the start time to the end time (that is, one or more GOPs), starting from the GOP containing the start time and ending with the GOP containing the end time. It then identifies whether each GOP includes a VI frame. If a GOP includes a VI frame, the computing device obtains from the GOP index table the storage address of the target I frame referenced when decoding the VI frame, reads the target I frame from that address, and inserts it before the VI frame. For the target I frame, refer to the description in the first application scenario above; details are not repeated here.
In practice, each GOP corresponds to its own playback period. When the start time falls within the playback period of a GOP, the start time can simply be regarded as lying in that GOP, and that GOP is taken as the GOP containing the start time. See the example of FIG. 15 below.
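Finding the GOP whose playback period contains a given start time can be sketched with a binary search over GOP start times. This assumes each GOP's period runs from its own start time up to the next GOP's start time; the function name is hypothetical.

```python
import bisect
from typing import List


def gop_index_for_time(gop_starts: List[float], t: float) -> int:
    """Return the index of the GOP whose play period [start_k, start_{k+1}) contains t.

    gop_starts must be sorted ascending; the last GOP is assumed to run to the
    end of the video.
    """
    if not gop_starts or t < gop_starts[0]:
        raise ValueError("time precedes the first GOP")
    return bisect.bisect_right(gop_starts, t) - 1
```

Using `bisect_right` places a time exactly on a boundary into the GOP that starts there.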
In the video download scenario, because the user may later drag the progress bar to start playback at any position, the computing device can process every VI frame in every GOP of the video, that is, insert a target I frame before each VI frame, thereby splitting large GOPs into small GOPs. The processing of any VI frame in any GOP is the same; see the related description of the foregoing embodiments, which is not repeated here.
The following describes embodiments related to GOP storage. Different video processing systems may use different indexing methods to create and store index information for a GOP. In other words, the indexing method of the GOP index information may differ between systems; for example, a time index method or a frame number index method may be supported. Implementations of these two indexing methods are given below as examples.
The first is the time index method. The computing device creates and stores index information for the GOP using time indexing. Specifically, the computing device creates an index for the GOP at a preset interval (for example, 1 s) to obtain the GOP's index information. The index information includes, but is not limited to: the GOP number, whether the GOP contains an I frame, the storage address of the I frame, whether the GOP contains a VI frame, the storage address of the VI frame, the storage address of the GOP in the storage unit 108, and the playback time of each frame in the GOP. The preset interval is defined by the system, for example configured according to user requirements or derived statistically from empirical data. FIG. 13A is a schematic diagram of a GOP stored with time indexing. As shown, the GOP is a 10 s video code stream covering the 0th to the 9th second, and each second corresponds to one frame (picture) in the GOP.
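A time-indexed record can be sketched as below. The field names and types of `GopIndexInfo` are illustrative assumptions mirroring the list above, and `build_time_index` is a hypothetical helper that maps each preset interval to the first frame whose playback time falls in it.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class GopIndexInfo:
    """Index fields listed above; names and types are illustrative assumptions."""
    gop_number: int
    has_i_frame: bool
    i_frame_address: Optional[int]
    has_vi_frame: bool
    vi_frame_addresses: List[int]
    gop_address: int                      # address in storage unit 108
    frame_times: List[float]              # playback time of each frame, seconds
    time_index: Dict[int, int] = field(default_factory=dict)


def build_time_index(frame_times: List[float], interval: float = 1.0) -> Dict[int, int]:
    """Map slot k, covering [k*interval, (k+1)*interval), to the first frame in it."""
    index: Dict[int, int] = {}
    for frame_no, t in enumerate(frame_times):
        index.setdefault(int(t // interval), frame_no)
    return index
```

For the FIG. 13A case (one frame per second over 10 s), slot k simply maps to frame k.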
The second is the frame number index method. The computing device creates and stores index information for the GOP using frame number indexing. Specifically, the computing device creates an index for the GOP according to the I-frame interval to obtain the GOP's index information. Here, the GOP denotes a group of consecutive frames between two I frames. For the GOP index information, see the description above; details are not repeated here. FIG. 13B is a schematic diagram of a GOP stored with frame number indexing. As shown, the GOP is a video code stream of 10 frames, frame 0 to frame 9, and each frame has its own index number.
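Frame-number indexing by I-frame interval can be sketched as partitioning a sequence of frame types into GOPs, each running from one I frame up to the frame before the next. Representing frames by type strings is an assumption for illustration, and the function name is hypothetical.

```python
from typing import List, Tuple


def index_by_frame_number(kinds: List[str]) -> List[Tuple[int, int]]:
    """Return (first_frame_no, last_frame_no) for each GOP.

    A GOP runs from one I frame up to the frame before the next I frame.
    """
    starts = [n for n, k in enumerate(kinds) if k == "I"]
    if not starts or starts[0] != 0:
        raise ValueError("stream must start with an I frame")
    ends = [s - 1 for s in starts[1:]] + [len(kinds) - 1]
    return list(zip(starts, ends))
```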
Based on the foregoing embodiments, FIG. 14 is a schematic flowchart of a video processing method according to an embodiment of the present invention. The method shown in FIG. 14 includes the following steps:
Step S102: The computing device obtains a group of pictures (GOP) in the video, where the first frame of the GOP is the first I frame. The GOP includes M frames, M being a positive integer.
The computing device obtains a video processing request that carries the start time of the video. In response, it obtains the video corresponding to the start time, that is, at least one GOP of the video. The video processing request may also carry the end time of the video or other system-defined information, which is not limited in the present invention. The request may be generated by a video operation that the user performs on the video, or it may be received from another device, and it differs between application scenarios. For example, in the video playback scenario, when the computing device detects the user dragging the playback progress bar, it can generate a corresponding video playback request; in the video download scenario, when it detects a user's download operation for a preset period (from the start time to the end time), it can generate a corresponding video download request.
Taking a video playback request and a video download request as examples, the implementation of step S102 is described in detail below.
In one implementation, the video processing request is a video playback request carrying the start time T_s of the video, and the video includes multiple GOPs. In response to the video playback request, the computing device obtains, from the GOPs of the video, the GOP containing the start time T_s.
In another implementation, the video processing request is a video download request carrying the start time T_s of the video and, optionally, the end time T_e. In response to the video download request, the computing device downloads from the GOP containing the start time T_s through the GOP containing the end time T_e, thereby obtaining the at least one GOP that makes up the video.
For example, FIG. 15 is a schematic diagram of the GOPs making up a video according to an embodiment of the present invention. A user plays the movie "XXX" online on a computing device. As shown in FIG. 15, the movie includes 8 GOPs. Suppose the user drags the playback progress bar to time T_s so that playback starts from T_s. On detecting the drag, the computing device can generate a video playback request carrying the start time T_s. In response, the computing device obtains the GOP containing T_s, which in the figure is GOP3.
If the user wants to download the movie for offline viewing, the computing device can generate a video download request on detecting the user's download operation. The request can carry the start time and end time of the video to be downloaded, which may be a segment of the movie "XXX" (for example, the opening or closing credits) or the entire movie. The start and end times can be set by the user as needed, for example 00:01:00 to 00:21:00 (that is, the segment from minute 1 to minute 21). The user can configure the offline download in an interactive interface provided by the computing device; FIG. 16 is a schematic diagram of a user downloading a video offline. As shown in FIG. 16, the user sets the start time, end time, video name, and other information for the video to be downloaded. When the computing device detects the offline download operation in the interface, it downloads from the GOP containing the start time through the GOP containing the end time. Assuming in this example that the start time 00:01:00 falls in GOP1 and the end time 00:21:00 falls in GOP3, the 20-minute segment downloaded by the computing device includes GOP1, GOP2, and GOP3.
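Selecting the GOPs to download for a [T_s, T_e] range can be sketched as follows. The GOP start times used in the usage line are hypothetical values chosen so that the result reproduces the GOP1 to GOP3 example; the function names are likewise assumptions.

```python
import bisect
from typing import List


def parse_hms(s: str) -> int:
    """Convert "HH:MM:SS" to seconds."""
    h, m, sec = (int(x) for x in s.split(":"))
    return h * 3600 + m * 60 + sec


def gops_to_download(gop_starts: List[float], t_start: float, t_end: float) -> List[int]:
    """Indices of the GOPs from the one containing t_start through the one containing t_end."""
    if t_end < t_start:
        raise ValueError("end time precedes start time")
    first = bisect.bisect_right(gop_starts, t_start) - 1
    last = bisect.bisect_right(gop_starts, t_end) - 1
    if first < 0:
        raise ValueError("start time precedes the first GOP")
    return list(range(first, last + 1))
```

With hypothetical GOP start times `[0, 30, 600, 1200, 1800]` seconds, the range 00:01:00 to 00:21:00 yields GOPs 1, 2 and 3.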
Step S104: The computing device determines whether the M frames include a VI frame.
In one implementation, the GOP includes one or more network abstraction layer units (NALUs), and the computing device determines whether the M frames include a VI frame by identifying whether the GOP includes a supplemental enhancement information (SEI) NALU. Specifically, if the GOP includes an SEI NALU, then, as indicated by the SEI NALU, the frame containing the i-th NALU before the SEI NALU, or the frame containing the j-th NALU after it, is determined to be a VI frame. The number of SEI NALUs is not limited; there may be one or more. When there are several, the computing device applies the same principle to determine the VI frame indicated by each SEI NALU, thereby identifying the one or more VI frames among the M frames.
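Locating SEI NALUs can be sketched for an H.264 Annex B byte stream, where start codes delimit NAL units and an SEI NAL unit has `nal_unit_type` 6 (the low 5 bits of the NAL header byte). The way an SEI message encodes the offsets i and j is not specified above, so `vi_nalu_position` is a hypothetical stand-in, and HEVC (which uses a two-byte NAL header and different SEI type values) is not covered.

```python
from typing import List

START_CODE = b"\x00\x00\x01"
H264_NAL_SEI = 6  # nal_unit_type of an SEI NAL unit in H.264


def split_annexb(stream: bytes) -> List[bytes]:
    """Split an H.264 Annex B byte stream into NAL units (start codes removed)."""
    nalus: List[bytes] = []
    pos = stream.find(START_CODE)
    while pos != -1:
        begin = pos + len(START_CODE)
        nxt = stream.find(START_CODE, begin)
        if nxt == -1:
            nalus.append(stream[begin:])
        else:
            # Strip the extra zero byte of a following 4-byte start code.
            nalus.append(stream[begin:nxt].rstrip(b"\x00"))
        pos = nxt
    return nalus


def nal_type(nalu: bytes) -> int:
    return nalu[0] & 0x1F  # low 5 bits of the H.264 NAL header


def sei_positions(nalus: List[bytes]) -> List[int]:
    return [k for k, n in enumerate(nalus) if n and nal_type(n) == H264_NAL_SEI]


def vi_nalu_position(sei_pos: int, offset: int) -> int:
    """Hypothetical signalling: offset -i means the i-th NALU before the SEI,
    offset +j the j-th NALU after it."""
    return sei_pos + offset
```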
In another implementation, the GOP includes at least one frame, and each frame carries reference picture set (RPS) information. The computing device can analyze the RPS information of each of the M frames to determine whether it is a VI frame. Specifically, if the RPS information of a frame in the GOP indicates that the frame has an I frame as a decoding reference, and the frame immediately preceding it is a non-I frame (specifically, a B frame or a P frame), that frame is determined to be a VI frame; otherwise it is not.
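The RPS-based rule can be sketched as below. Representing a frame's RPS as a plain list of referenced frame numbers is a simplification (an actual HEVC RPS is signalled as picture order count deltas), and `CodedFrame` is a hypothetical type.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CodedFrame:
    kind: str                                       # "I", "P" or "B"
    refs: List[int] = field(default_factory=list)   # frame numbers listed in the frame's RPS


def find_vi_frames(frames: List[CodedFrame]) -> List[int]:
    """A frame is a VI frame if its RPS includes an I frame and the frame
    immediately before it is not an I frame."""
    vi = []
    for n in range(1, len(frames)):
        refs_i_frame = any(frames[r].kind == "I" for r in frames[n].refs)
        prev_non_i = frames[n - 1].kind != "I"
        if refs_i_frame and prev_non_i:
            vi.append(n)
    return vi
```

Frame 1 in a sequence I, P, P, P is not flagged even if it references the I frame, because its predecessor is the I frame itself.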
In another implementation, the computing device obtains out-of-band information for the GOP that indicates the positions of the VI frames contained in the GOP. A position identifies where a VI frame sits in the GOP and may include, but is not limited to, the frame number (index number) of the VI frame and its playback time. The out-of-band information may be received from another device (such as a server) or obtained from the computing device's own video encoding unit, which is not limited in the present invention. Using this information, the computing device identifies whether the M frames include a VI frame and, if so, where it is located.
In an optional embodiment, if the computing device determines that the GOP contains no VI frame, it does not need to process the GOP. When playing the video corresponding to the GOP, it can decode and play from the first I frame of the GOP.
Step S106: When the M frames include a VI frame, the computing device inserts a second I frame before the VI frame to obtain multiple new GOPs. The number of new GOPs equals the number of VI frames in the GOP plus one.
After identifying a VI frame among the M frames, the computing device obtains the target I frame associated with that VI frame (also referred to as the second I frame). For example, the computing device can determine, from the GOP's index information, the storage address of the second I frame associated with the VI frame and read the second I frame from that address; alternatively, it can look up the second I frame that the VI frame's own index information points to. The second I frame may be an I frame that appears before the VI frame in the GOP, or the I frame in the GOP closest to the VI frame; see the foregoing description of the target I frame, which is not repeated here. The GOP's index information records information such as the second I frame referenced when decoding the VI frame, the storage address of the second I frame, the frame index of each frame, the playback time of each frame, the playback duration of the GOP, and the start and end times of the GOP.
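Reading the second I frame from the address recorded in the index information can be sketched as below. The layout of `GopIndex` (VI frame number mapped to a byte offset and length) is a hypothetical assumption, and an in-memory buffer stands in for the storage unit.

```python
import io
from dataclasses import dataclass
from typing import BinaryIO, Dict, Tuple


@dataclass
class GopIndex:
    # VI frame number -> (byte offset, byte length) of the I frame it references.
    vi_ref_address: Dict[int, Tuple[int, int]]


def read_second_i_frame(store: BinaryIO, index: GopIndex, vi_frame_no: int) -> bytes:
    """Read the I frame referenced by a VI frame from the address in the GOP index."""
    offset, length = index.vi_ref_address[vi_frame_no]
    store.seek(offset)
    data = store.read(length)
    if len(data) != length:
        raise IOError("stored I frame is truncated")
    return data


if __name__ == "__main__":
    store = io.BytesIO(b"HEADERIFRAMEPPPP")  # stand-in for storage unit 108
    print(read_second_i_frame(store, GopIndex({7: (6, 6)}), 7))  # b'IFRAME'
```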
After obtaining the second I frame, the computing device can insert it before the VI frame, specifically as the m-th frame before the VI frame, where m is a positive integer; for example, as the frame immediately preceding the VI frame. The computing device thereby splits the GOP into multiple new GOPs, the number of which equals the number of VI frames in the GOP plus one. For example, if a GOP includes 4 VI frames, inserting a second I frame before each of them yields 5 new GOPs. FIG. 17 is a schematic diagram of such new GOPs: the GOP includes 4 VI frames, and the computing device, applying the insertion principle above, inserts a corresponding second I frame before each VI frame to obtain 5 new GOPs.
Optionally, provided video playback quality is not affected, after inserting the second I frame before the VI frame the computing device can modify the value of a related field of the second I frame (for example, a control field or flag field in the second I frame) to mark it as a non-display or non-output frame. In other words, the second I frame is used only to decode the VI frame and is not output for display. In this case the new GOP in this application differs in meaning from a GOP in the conventional sense; for ease of understanding, the term "new GOP" is still used. A new GOP denotes the span between two I frames, but its first I frame is used only for decoding, not for display output. For example, pseudocode by which the computing device modifies the second I frame is as follows:
Figure PCTCN2019125411-appb-000001
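Since the pseudocode survives only as an image reference, the idea can be approximated as follows. This is not the patent's pseudocode: the `output` attribute is a hypothetical stand-in for whatever control or flag field a codec provides (in HEVC, for instance, a slice header can carry `pic_output_flag` when the PPS sets `output_flag_present_flag`).

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Picture:
    kind: str
    pts: float
    output: bool = True  # hypothetical stand-in for a codec's "output this picture" flag


def insert_decode_only_i(gop: List[Picture], vi_pos: int, i_frame: Picture) -> List[Picture]:
    """Insert a copy of i_frame just before gop[vi_pos], marked as not for output."""
    hidden = replace(i_frame, output=False)
    return gop[:vi_pos] + [hidden] + gop[vi_pos:]


def displayed(frames: List[Picture]) -> List[Picture]:
    """Frames a player would actually present; decode-only frames are skipped."""
    return [f for f in frames if f.output]
```

The inserted copy still feeds the decoder, but the displayed sequence is unchanged.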
It should be noted that, in different application scenarios, the objects that the present invention specifically processes (the GOPs of the video and the VI frames they contain) also differ. Specifically:
First, in the video playback scenario, the video processing request in S102 is specifically a video playback request carrying the start time T_s of the video. In response, the computing device obtains the GOP containing T_s and identifies whether that GOP includes a VI frame. If the GOP includes multiple VI frames, the computing device selects the VI frame closest to T_s and processes it, that is, inserts the second I frame before that VI frame, thereby obtaining two new GOPs. For how the VI frame is selected, see the example of FIG. 12 above; details are not repeated here. Optionally, to ensure that no video information is lost, the playback time of the inserted second I frame is earlier than the start time T_s.
Second, in the video download scenario, the video processing request in S102 is specifically a video download request carrying the start time T_s and end time T_e of the video. In response, the computing device downloads from the GOP containing T_s through the GOP containing T_e, thereby obtaining the multiple GOPs that make up the video. For each GOP, it identifies whether the GOP includes a VI frame. If the GOP includes one or more VI frames, the computing device inserts a corresponding second I frame before each VI frame, splitting one GOP into multiple new GOPs. For details, see the related description of the embodiment shown in FIG. 9; details are not repeated here.
In an optional embodiment, after obtaining the multiple new GOPs, if the computing device receives a video playback request, it can decode and play the corresponding new GOP in response. The implementation in each application scenario is as follows:
In the video playback scenario, the video processing request in S102 is a video playback request, and in response the computing device inserts a second I frame before the VI frame in the GOP closest to the start time T_s, obtaining two new GOPs. Still responding to the playback request, it obtains the new GOP containing T_s and decodes and plays that new GOP starting from its second I frame. In other words, in response to the video playback request, the computing device determines that T_s lies after the second I frame in the GOP, and then decodes and plays the video corresponding to the GOP starting from the second I frame.
In the video download scenario, the video processing request in S102 is a video download request. In response, the computing device downloads the multiple GOPs included in the video and inserts a second I frame before each VI frame in each GOP, obtaining multiple new GOPs. While watching the video, the user can drag the playback progress bar at will. When the computing device detects the drag, it can generate a corresponding video playback request carrying the start time T_s of the video. In response, it looks up the new GOP containing T_s among the new GOPs and decodes and plays that new GOP starting from its second I frame.
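Starting playback from the new GOP that contains T_s can be sketched as below. Each new GOP is modelled as a list of `(kind, pts, output)` tuples, and a new GOP's play period is assumed to begin at its first displayed frame; both are assumptions, as is the function name. Frames before T_s are decoded but not shown.

```python
import bisect
from typing import List, Tuple

# Each new GOP is a list of (kind, pts, output) tuples; its leading I frame may be decode-only.
NewGop = List[Tuple[str, float, bool]]


def start_playback(new_gops: List[NewGop], t_s: float) -> Tuple[int, List[float]]:
    """Pick the new GOP containing t_s, decode it from its leading I frame,
    and return (GOP index, playback times of the frames actually shown)."""
    # Assumes every new GOP has at least one displayed frame.
    starts = [next(pts for _, pts, out in g if out) for g in new_gops]
    k = bisect.bisect_right(starts, t_s) - 1
    if k < 0:
        raise ValueError("t_s precedes the video")
    decoded = new_gops[k]  # decoding starts at the leading (second) I frame
    shown = [pts for _, pts, out in decoded if out and pts >= t_s]
    return k, shown
```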
By implementing the embodiments of the present invention, the prior-art problems of low video processing efficiency, loss of important video information, and waste of the computing device's storage resources during reverse playback of large GOPs can be solved.
With reference to the related descriptions in the embodiments of FIG. 1 to FIG. 17, the apparatus and device to which the present invention applies are described below. FIG. 18 is a schematic structural diagram of a video processing apparatus according to an embodiment of the present invention. As shown in FIG. 18, the video processing apparatus 18 includes an acquiring unit 181, a determining unit 182, and an inserting unit 183, and may optionally further include a decoding and playback unit 184, where:
the acquiring unit 181 is configured to acquire a group of pictures (GOP) in the video, where the first frame of the GOP is the first I frame, the GOP includes M frames, and M is a positive integer;
the determining unit 182 is configured to determine whether the M frames include a virtual intra-coded (VI) frame;
the inserting unit 183 is configured to insert a second I frame before the VI frame when the M frames include a VI frame;
其中,所述第二I帧为在视频解码时所述VI帧参考的帧。Wherein, the second I frame is a frame referenced by the VI frame during video decoding.
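The acquire, determine, and insert steps performed by units 181 to 183 can be sketched in a few lines. This is a hedged illustration, not the patent's code: the `Frame` type, the `ref_id`/`frame_id` fields, the dictionary `ref_store` standing in for wherever referenced I frames are kept, and the choice to give the inserted frame the VI frame's playback time are all assumptions made for the example.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Frame:
    kind: str                       # hypothetical tags: "I", "P", "VI"
    pts: float                      # playback time in seconds
    ref_id: Optional[str] = None    # VI frames only: id of the I frame they reference
    frame_id: Optional[str] = None  # id under which a referenceable I frame is stored
    display: bool = True            # False marks a decode-only frame


def insert_second_i_frames(gop: List[Frame], ref_store: Dict[str, Frame]) -> List[Frame]:
    """Acquire -> determine -> insert: put each VI frame's reference I frame right before it."""
    if not gop or gop[0].kind != "I":
        raise ValueError("the first frame of a GOP must be the first I frame")
    if not any(f.kind == "VI" for f in gop):  # determining step: no VI frame, nothing to insert
        return list(gop)
    out: List[Frame] = []
    for f in gop:
        if f.kind == "VI":
            # Inserting step: a decode-only copy of the referenced (second) I frame,
            # placed immediately before the VI frame and given the VI frame's pts.
            out.append(replace(ref_store[f.ref_id], pts=f.pts, display=False))
        out.append(f)
    return out
```

Placing the copy immediately before the VI frame matches the implementation in which the second I frame is the frame preceding the VI frame, and marking it `display=False` reflects that it is used only for decoding.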
In some possible implementations, the video processing apparatus 18 may further include the decoding and playing unit 184. The determining unit 182 is configured to determine, in response to a video playback request, that the start time of the video in the video playback request falls after the second I frame in the GOP; the decoding and playing unit 184 is configured to decode and play the video starting from the second I frame.
In some possible implementations, the second I frame is the frame immediately preceding the VI frame.
In some possible implementations, the GOP further includes index information of the GOP, and the index information records the storage address of the second I frame. Before the second I frame is inserted before the VI frame, the acquiring unit 181 is further configured to acquire the second I frame from its storage address according to the index information of the GOP.
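A minimal sketch of reading the second I frame from the address recorded in the index information follows. The text does not specify the address format, so the `GopIndexInfo` fields (a byte offset plus a length) and the `fetch_second_i_frame` helper are hypothetical, and storage is modeled as any seekable binary stream.

```python
import io
from dataclasses import dataclass
from typing import BinaryIO


@dataclass
class GopIndexInfo:
    start_time: float     # start time of the GOP
    second_i_offset: int  # storage address of the second I frame, modeled as a byte offset
    second_i_length: int  # encoded size of the second I frame in bytes (assumed field)


def fetch_second_i_frame(stream: BinaryIO, info: GopIndexInfo) -> bytes:
    """Read the encoded second I frame from the address recorded in the GOP index information."""
    stream.seek(info.second_i_offset)
    data = stream.read(info.second_i_length)
    if len(data) != info.second_i_length:
        raise ValueError("index information points past the end of the stored data")
    return data


# Example: a stored stream with a 3-byte header followed by a 6-byte frame payload.
storage = io.BytesIO(b"HDR" + b"IFRAME" + b"tail")
print(fetch_second_i_frame(storage, GopIndexInfo(0.0, 3, 6)))  # b'IFRAME'
```

Keeping only an address in the index, rather than a copy of the frame in every GOP, is consistent with the storage-saving goal stated for the embodiments.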
In some possible implementations, the second I frame is used to decode the VI frame and is not output for display.
In some possible implementations, the acquiring unit 181 is specifically configured to: receive a video processing request that carries a start time of the video, where the video includes at least one GOP; and, in response to the video processing request, acquire the GOP corresponding to the start time from a GOP index table;
where the GOP index table records at least one mapping relationship, each mapping relationship associates a GOP with the index information of that GOP, and the index information of the GOP includes the start time of the GOP.
In some possible implementations, the index information of the GOP further includes the playback times of the frames, and when the video processing request is a video playback request, the VI frame is the VI frame in the GOP whose playback time has the smallest difference from the start time of the GOP.
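Selecting the VI frame by playback time reduces to a minimum search over the frame entries in the index information. In this hedged sketch the `FrameEntry` type and `select_vi_frame` function are hypothetical, "smallest difference" is assumed to mean the absolute difference, and `start_time` is whichever start time the index information supplies.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FrameEntry:
    kind: str         # hypothetical tags: "I", "P", "VI"
    play_time: float  # playback time recorded in the GOP index information


def select_vi_frame(entries: List[FrameEntry], start_time: float) -> Optional[FrameEntry]:
    """Return the VI frame whose playback time is closest to start_time, or None if there is none."""
    vi_frames = [e for e in entries if e.kind == "VI"]
    if not vi_frames:
        return None
    # "Smallest difference" is taken as the absolute difference; ties keep the earlier frame.
    return min(vi_frames, key=lambda e: abs(e.play_time - start_time))
```

Python's `min` returns the first of several equally small candidates, so on a tie the earlier VI frame is chosen; a different tie-breaking rule would need an explicit secondary key.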
In practical applications, the functions of the acquiring unit 181 and the determining unit 182 may be implemented by the code stream detector 1041 in FIG. 9, the function of the inserting unit 183 by the code stream modifier 1043 in FIG. 9, and the function of the decoding and playing unit 184 by the video decoding unit 106 in FIG. 4. In other words, the code stream detector 1041 in the video reading and writing unit 104 in FIG. 4 or FIG. 9 may be implemented by functional modules such as the acquiring unit 181 and the determining unit 182; the code stream modifier 1043 in the video reading and writing unit 104 may be implemented by functional modules such as the inserting unit 183; and the video decoding unit 106 may be implemented by functional modules such as the decoding and playing unit 184.
Each module or unit in the apparatus 18 of this embodiment may be implemented by a software program or by hardware. When implemented by a software program, each module or unit in the apparatus 18 is a software module or software unit. When implemented by hardware, each module or unit in the apparatus 18 may be implemented by an application-specific integrated circuit (ASIC) or a programmable logic device (PLD). The PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof; this is not limited in the present invention.
It should be noted that FIG. 18 shows only one possible implementation of the embodiments of the present invention. In practical applications, the video processing apparatus may include more or fewer components, which is not limited here. For content not shown or described in this embodiment, refer to the relevant descriptions in the foregoing method embodiments; details are not repeated here.
FIG. 19 is a schematic structural diagram of a computing device 19 according to an embodiment of the present invention. The computing device shown in FIG. 19 includes one or more processors 1901, a communication interface 1902, and a memory 1903. The processor 1901, the communication interface 1902, and the memory 1903 may be connected by a bus, or may communicate by other means such as wireless transmission. In this embodiment, a connection through a bus 1904 is used as an example. The memory 1903 is configured to store instructions, and the processor 1901 is configured to execute the instructions stored in the memory 1903. The memory 1903 stores program code, and the processor 1901 can invoke the program code stored in the memory 1903 to implement the video processing apparatus 18 shown in FIG. 18.
In practical applications, the processor 1901 in this embodiment may invoke the program code stored in the memory 1903 to perform all or some of the steps described in the method embodiment shown in FIG. 14, and/or other content described herein; details are not repeated here.
It should be understood that the processor 1901 may consist of one or more general-purpose processors, such as a central processing unit (CPU). The processor 1901 may be configured to run the program code of the following functional modules, which may include, but are not limited to, any one or a combination of the acquiring unit 181, the determining unit 182, and the inserting unit 183 described above. That is, by executing the program code, the processor 1901 can implement the functions of any one or more of these functional modules. For details of these functional modules, refer to the relevant descriptions in the foregoing embodiments; details are not repeated here.
The communication interface 1902 may be a wired interface (for example, an Ethernet interface) or a wireless interface (for example, a cellular network interface or a wireless local area network interface), and is used to communicate with other modules or devices. For example, in this embodiment, the communication interface 1902 may be used to acquire GOPs in the video.
The memory 1903 may include a volatile memory, such as a random access memory (RAM); a non-volatile memory, such as a read-only memory (ROM), a flash memory, a hard disk drive (HDD), or a solid-state drive (SSD); or a combination of the foregoing types of memory. The memory 1903 may store a set of program code so that the processor 1901 can invoke it to implement the functions of the functional modules in the embodiments of the present invention.
It should be noted that FIG. 19 shows only one possible implementation of the embodiments of the present invention. In practical applications, the computing device may include more or fewer components, which is not limited here. For content not shown or described in this embodiment, refer to the relevant descriptions in the foregoing method embodiments; details are not repeated here.
An embodiment of the present invention further provides a computer-readable storage medium storing instructions. When the instructions are run on a computing device, the method procedure shown in the embodiment of FIG. 14 is implemented.
An embodiment of the present invention further provides a computer program product. When the computer program product runs on a computing device, the method procedure shown in the embodiment of FIG. 14 is implemented.
The steps of the methods or algorithms described with reference to the disclosure of the embodiments of the present invention may be implemented in hardware, or by a processor executing software instructions. The software instructions may consist of corresponding software modules, which may be stored in a random access memory (RAM), a flash memory, a read-only memory (ROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), a register, a hard disk, a removable hard disk, a compact disc read-only memory (CD-ROM), or any other form of storage medium well known in the art. An exemplary storage medium is coupled to the processor so that the processor can read information from, and write information to, the storage medium. The storage medium may also be an integral part of the processor. The processor and the storage medium may be located in an ASIC, and the ASIC may be located in the computing device. Alternatively, the processor and the storage medium may exist in the computing device as discrete components.
A person of ordinary skill in the art will understand that all or some of the procedures of the methods in the foregoing embodiments may be implemented by a computer program instructing relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, may include the procedures of the foregoing method embodiments. The storage medium includes any medium that can store program code, such as a ROM, a RAM, a magnetic disk, or an optical disc.

Claims (16)

  1. A video processing method, applied to a computing device, wherein the method comprises:
    acquiring a group of pictures (GOP) in a video, wherein the first frame of the GOP is a first I frame, the GOP comprises M frames, and M is a positive integer;
    determining whether the M frames comprise a virtual intra-coded (VI) frame; and
    when the M frames comprise a VI frame, inserting a second I frame before the VI frame;
    wherein the second I frame is a frame referenced by the VI frame during video decoding.
  2. The method according to claim 1, wherein the method further comprises:
    in response to a video playback request, determining that a start time of the video in the video playback request is after the second I frame in the GOP; and
    decoding and playing the video starting from the second I frame.
  3. The method according to claim 1 or 2, wherein the second I frame is the frame immediately preceding the VI frame.
  4. The method according to any one of claims 1 to 3, wherein the GOP further comprises index information of the GOP, the index information records a storage address of the second I frame, and before the inserting a second I frame before the VI frame, the method further comprises:
    acquiring the second I frame from the storage address of the second I frame according to the index information of the GOP.
  5. The method according to any one of claims 1 to 4, wherein the second I frame is used for decoding the VI frame and is not output for display.
  6. The method according to any one of claims 1 to 5, wherein the acquiring a group of pictures (GOP) in a video comprises:
    receiving a video processing request, wherein the video processing request carries a start time of the video, and the video comprises at least one GOP; and
    in response to the video processing request, acquiring the GOP corresponding to the start time from a GOP index table;
    wherein the GOP index table records at least one mapping relationship, each mapping relationship associates a GOP with index information of the GOP, and the index information of the GOP comprises a start time of the GOP.
  7. The method according to claim 6, wherein the index information of the GOP further comprises playback times of the frames, and
    when the video processing request is a video playback request, the VI frame is the VI frame in the GOP whose playback time has the smallest difference from the start time of the GOP.
  8. A video processing apparatus, comprising an acquiring unit, a determining unit, and an inserting unit, wherein:
    the acquiring unit is configured to acquire a group of pictures (GOP) in a video, wherein the first frame of the GOP is a first I frame, the GOP comprises M frames, and M is a positive integer;
    the determining unit is configured to determine whether the M frames comprise a virtual intra-coded (VI) frame;
    the inserting unit is configured to insert a second I frame before the VI frame when the M frames comprise a VI frame;
    wherein the second I frame is a frame referenced by the VI frame during video decoding.
  9. The apparatus according to claim 8, wherein the apparatus further comprises a decoding and playing unit;
    the determining unit is configured to determine, in response to a video playback request, that a start time of the video in the video playback request is after the second I frame in the GOP; and
    the decoding and playing unit is configured to decode and play the video starting from the second I frame.
  10. The apparatus according to claim 8 or 9, wherein the second I frame is the frame immediately preceding the VI frame.
  11. The apparatus according to any one of claims 8 to 10, wherein the GOP further comprises index information of the GOP, the index information records a storage address of the second I frame, and before the second I frame is inserted before the VI frame,
    the acquiring unit is further configured to acquire the second I frame from the storage address of the second I frame according to the index information of the GOP.
  12. The apparatus according to any one of claims 8 to 11, wherein the second I frame is used for decoding the VI frame and is not output for display.
  13. The apparatus according to any one of claims 8 to 12, wherein:
    the acquiring unit is specifically configured to: receive a video processing request, wherein the video processing request carries a start time of the video, and the video comprises at least one GOP; and, in response to the video processing request, acquire the GOP corresponding to the start time from a GOP index table;
    wherein the GOP index table records at least one mapping relationship, each mapping relationship associates a GOP with index information of the GOP, and the index information of the GOP comprises a start time of the GOP.
  14. The apparatus according to claim 13, wherein the index information of the GOP further comprises playback times of the frames, and
    when the video processing request is a video playback request, the VI frame is the VI frame in the GOP whose playback time has the smallest difference from the start time of the GOP.
  15. A computing device, comprising a processor and an interface, wherein the processor communicates with the interface, the interface is configured to receive a GOP and send it to the processor, and the processor is configured to perform the method according to any one of claims 1 to 7 by running program instructions.
  16. A computer program product, wherein when the computer program product runs on a computer, the computer is caused to perform the method according to any one of claims 1 to 7.
PCT/CN2019/125411 2019-12-13 2019-12-13 Video processing method and apparatus, and computer readable storage medium WO2021114305A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201980086119.6A CN113261283A (en) 2019-12-13 2019-12-13 Video processing method, device and computer readable storage medium
PCT/CN2019/125411 WO2021114305A1 (en) 2019-12-13 2019-12-13 Video processing method and apparatus, and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2019/125411 WO2021114305A1 (en) 2019-12-13 2019-12-13 Video processing method and apparatus, and computer readable storage medium

Publications (1)

Publication Number Publication Date
WO2021114305A1 true WO2021114305A1 (en) 2021-06-17

Family

ID=76328817

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/125411 WO2021114305A1 (en) 2019-12-13 2019-12-13 Video processing method and apparatus, and computer readable storage medium

Country Status (2)

Country Link
CN (1) CN113261283A (en)
WO (1) WO2021114305A1 (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH08242452A (en) * 1995-03-02 1996-09-17 Matsushita Electric Ind Co Ltd Video signal compression coder
CN101127919A (en) * 2007-09-28 2008-02-20 中兴通讯股份有限公司 A video sequence coding method
CN102378008A (en) * 2011-11-02 2012-03-14 深圳市融创天下科技股份有限公司 Video encoding method, video encoding device and video encoding system for shortening waiting time for playing
CN105847790A (en) * 2015-01-16 2016-08-10 杭州海康威视数字技术股份有限公司 Code stream transmission method and device
CN107124610A (en) * 2017-04-06 2017-09-01 浙江大华技术股份有限公司 A kind of method for video coding and device
US20190289322A1 (en) * 2016-11-16 2019-09-19 Gopro, Inc. Video encoding quality through the use of oncamera sensor information

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105847825A (en) * 2015-01-16 2016-08-10 杭州海康威视数字技术股份有限公司 Encoding, index storage and access methods for video encoding code stream and corresponding apparatus
CN106791875B (en) * 2016-11-30 2020-03-31 华为技术有限公司 Video data decoding method, video data encoding method and related devices

Also Published As

Publication number Publication date
CN113261283A (en) 2021-08-13

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19956066

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19956066

Country of ref document: EP

Kind code of ref document: A1