WO2021114305A1 - Video processing method and apparatus, and computer-readable storage medium - Google Patents

Video processing method and apparatus, and computer-readable storage medium

Info

Publication number
WO2021114305A1
Authority
WO
WIPO (PCT)
Prior art keywords
frame
gop
video
index information
frames
Prior art date
Application number
PCT/CN2019/125411
Other languages
English (en)
Chinese (zh)
Inventor
杨胜凯
刘俊
杨海涛
陈绍林
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Priority to CN201980086119.6A (patent CN113261283B)
Priority to PCT/CN2019/125411 (patent WO2021114305A1)
Publication of WO2021114305A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/103 Selection of coding mode or of prediction mode
    • H04N19/114 Adapting the group of pictures [GOP] structure, e.g. number of B-frames between two anchor frames
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/177 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding, the unit being a group of pictures [GOP]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction

Definitions

  • the present invention relates to the field of Internet technology, in particular to video processing methods, devices and computer storage media.
  • a GOP is a group of continuous image pictures (i.e., frame pictures, referred to as frames for short).
  • after the computing device receives the video, it needs to decode and play the video.
  • when a computing device receives a playback request triggered by dragging the progress bar of a video, in response to the playback request it acquires, starting from the drag stop position, the GOPs that make up the video, and decodes and plays each GOP.
  • if the target frame pointed to by the drag stop position is a non-I frame, the computing device needs to search for an I frame among several frames before or after the target frame, so as to start decoding and playing from that I frame.
  • if the distance between the I frame and the target frame is large, the video processing efficiency is reduced to a certain extent, and the user's viewing experience is affected.
  • the computing device can also discard the GOP where the target frame is located and move on to decoding and playing the next GOP. This causes some important video information to be discarded and affects the user's viewing experience.
  • the embodiments of the present invention disclose a video processing method, a device, and a computer-readable storage medium, which can solve the problems of reduced video processing efficiency and loss of important video information in existing solutions.
  • an embodiment of the present invention discloses a video processing method applied to a computing device.
  • the method includes: acquiring a group of pictures (GOP) in a video, where the first frame of the GOP is a first I frame and the GOP includes M frames, M being a positive integer; determining whether the M frames include a virtual intra-coded (VI) frame; and, when the M frames include a VI frame, inserting a second I frame before the VI frame. The second I frame is the frame referenced by the VI frame during video decoding.
  • inserting the second I frame before the VI frame facilitates subsequent decoding and playback of the video from the second I frame. It can solve the problems of reduced video processing efficiency, loss of important video information, and waste of storage resources of computing devices in the prior art, thereby helping to improve video processing efficiency.
  • in response to the video playback request, the computing device determines that the start time of the video in the video playback request is after the second I frame in the GOP, and then starts decoding and playing the video from the second I frame.
  • in this way, the video can be decoded and played from the second I frame in the video playback scene.
  • compared with starting decoding from the first I frame of the GOP, this can save video decoding time and improve video processing efficiency.
  • the second I frame is the previous frame of the VI frame.
  • the GOP further includes index information of the GOP, and the index information records the storage address of the second I frame.
  • before inserting the second I frame before the VI frame, the computing device can obtain the second I frame from the storage address of the second I frame according to the index information of the GOP.
  • the index information of the VI frame is inserted in the VI frame after the second I frame.
  • the index information of the VI frame is used to point to the second I frame.
  • the computing device may obtain the second I frame pointed to by the index information according to the index information of the VI frame.
  • the computing device can obtain the second I frame to be inserted according to the index information of the GOP or the index information of the VI frame. It is convenient to insert the second I frame before the VI frame later. This can decode the video faster.
  • the second I frame is only used for decoding the VI frame, and is not used for output display.
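  • A minimal Python sketch of the insertion step described above, assuming a GOP can be modelled as a list of frame records. The `Frame` type, its `display` flag, and the function name are hypothetical and are not taken from the patent; the sketch only shows a second I frame being placed directly before the first VI frame and flagged as decode-only.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Frame:
    kind: str             # "I", "P", "B" or "VI" (illustrative labels)
    label: str            # human-readable name used only in this sketch
    display: bool = True  # False means: decode only, never output


def insert_second_i_frame(gop: List[Frame], second_i: Frame) -> List[Frame]:
    """Return a copy of `gop` with `second_i` placed right before the first VI frame."""
    for pos, frame in enumerate(gop):
        if frame.kind == "VI":
            # The inserted I frame is only a decoding reference for the VI frame.
            hidden = replace(second_i, display=False)
            return gop[:pos] + [hidden] + gop[pos:]
    return list(gop)  # no VI frame: the GOP is left unchanged
```

  • With a GOP of I1, P1, VI1, P2, the returned list is I1, P1, I2, VI1, P2, and decoding can start at I2 rather than at I1.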
  • the GOP includes at least one network abstraction layer unit (NALU), and the computing device identifies whether a supplemental enhancement information (SEI) NALU is included in the GOP to determine whether the M frames include a VI frame.
  • the SEI NALU is used to indicate that the frame in which the i-th NALU before the SEI NALU is located is a VI frame, or indicates that the frame in which the j-th NALU after the SEI NALU is located is a VI frame.
  • the computing device can recognize the VI frames included in the GOP by recognizing the SEI NALU in the GOP, which can improve the convenience and efficiency of VI frame recognition.
  • the GOP includes reference picture set (RPS) information of each frame.
  • the computing device determines whether a VI frame is included in the M frames by identifying the RPS information of each frame in the GOP. When the RPS information of a frame indicates that an I frame is referenced when decoding the frame, and the previous frame of the frame is a non-I frame, the frame is a VI frame.
  • the computing device directly recognizes the reference frame RPS information in the frame to determine whether the frame is a VI frame. This can improve the accuracy of VI frame recognition.
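  • The RPS-based rule above can be sketched in Python as follows. The `Frame` record and its fields are hypothetical simplifications (a real decoder would read the RPS from slice headers), and the sketch assumes that a VI frame references I frames only.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Frame:
    is_intra: bool   # True for an I frame
    rps: List[int]   # indices (within the GOP) of the frames this frame references


def find_vi_frames(frames: List[Frame]) -> List[int]:
    """Return the indices of frames that satisfy the VI rule stated in the text."""
    vi_indices = []
    for idx, frame in enumerate(frames):
        if idx == 0 or frame.is_intra:
            continue
        # Assumption: every referenced frame must be an I frame.
        refers_to_i = bool(frame.rps) and all(frames[r].is_intra for r in frame.rps)
        previous_is_non_i = not frames[idx - 1].is_intra
        if refers_to_i and previous_is_non_i:
            vi_indices.append(idx)
    return vi_indices
```

  • A P frame that directly follows the I frame also references an I frame, but its previous frame is an I frame, so the rule does not classify it as a VI frame.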
  • the computing device receives a video processing request, the video processing request carries the start time of the video, and the video includes at least one GOP.
  • the GOP corresponding to the start time is obtained from the GOP index table.
  • at least one mapping relationship is recorded in the GOP index table, and the mapping relationship is that each GOP corresponds to the index information of the GOP, and the index information of the GOP includes the start time of the GOP.
  • the video processing request includes a video playback request or a video download request.
  • the computing device may respond to the video playback request and obtain the GOP where the start time is located from the GOP index table.
  • when the video processing request is a video download request, in response to the video download request, at least one GOP starting from the GOP at the start time is obtained from the GOP index table.
  • the computing device can obtain the corresponding GOP in the video according to different application scenarios, so as to process the GOP. This helps to obtain the corresponding GOP for video processing according to the actual needs of the device.
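  • A minimal sketch of the GOP index table lookup described above, assuming the table is a list of (GOP start time, GOP identifier) pairs sorted by start time. The table layout, the identifiers, and the function names are illustrative assumptions; the patent does not specify a data structure.

```python
import bisect
from typing import List, Tuple

# Hypothetical index table: (GOP start time in seconds, GOP identifier), sorted by time.
GopIndex = List[Tuple[float, str]]


def _locate(index: GopIndex, start_time: float) -> int:
    """Position of the GOP whose time range contains `start_time`."""
    starts = [start for start, _ in index]
    pos = bisect.bisect_right(starts, start_time) - 1
    if pos < 0:
        raise ValueError("start time precedes the first GOP")
    return pos


def gop_for_playback(index: GopIndex, start_time: float) -> str:
    """Playback request: return the single GOP in which the start time is located."""
    return index[_locate(index, start_time)][1]


def gops_for_download(index: GopIndex, start_time: float) -> List[str]:
    """Download request: return every GOP from the one at the start time onward."""
    return [gop_id for _, gop_id in index[_locate(index, start_time):]]
```

  • A binary search keeps the lookup logarithmic in the number of GOPs, which matters mainly for long recordings.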
  • the index information of the GOP also includes the playing time of the frame.
  • the VI frame is the VI frame with the smallest difference between the playback time and the start time of the GOP in the GOP.
  • the computing device can find the VI frame closest to the playback time and insert the second I frame before it, which avoids I frame insertion processing for each VI frame in the GOP, saves device resources, and improves video processing efficiency.
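  • The selection of a single VI frame can be sketched as a minimum search over time differences. This sketch assumes that the playback time of each VI frame is available from the index information and that the comparison is made against the requested start time; the mapping and the function name are hypothetical.

```python
from typing import Dict


def nearest_vi_frame(vi_play_times: Dict[int, float], start_time: float) -> int:
    """Return the frame number of the VI frame whose playback time is closest to `start_time`.

    `vi_play_times` maps a VI frame number to its playback time in seconds.
    """
    if not vi_play_times:
        raise ValueError("the GOP contains no VI frame")
    return min(vi_play_times, key=lambda n: abs(vi_play_times[n] - start_time))
```

  • Only the returned VI frame then needs a second I frame inserted before it; the other VI frames in the GOP are left untouched.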
  • an embodiment of the present invention provides a video processing device, which includes a functional module or unit for executing the method described in the first aspect or any possible implementation manner of the first aspect.
  • an embodiment of the present invention provides a computing device, including: a processor, a memory, a communication interface, and a bus; the processor, the communication interface, and the memory communicate with each other through the bus; the communication interface is used to receive and send data; the memory is used to store instructions; and the processor is used to call the instructions in the memory to execute the method described in the first aspect or any possible implementation of the first aspect.
  • a computer-readable storage medium is provided, and the computer-readable storage medium stores instructions for executing the method described in the first aspect.
  • a computer program product is provided which, when run on a computer, enables the computer to execute the instructions of the method described in the first aspect.
  • a chip product is provided to implement the foregoing first aspect or the method in any possible implementation manner of the first aspect.
  • FIG. 1 is a schematic structural diagram of a GOP provided by an embodiment of the present invention.
  • FIG. 2 is a schematic structural diagram of a NALU provided by an embodiment of the present invention.
  • FIG. 3 is a schematic structural diagram of a SEI NALU provided by an embodiment of the present invention.
  • FIG. 4 is a schematic structural diagram of a video processing system provided by an embodiment of the present invention.
  • FIG. 5 is a schematic structural diagram of a video encoding unit provided by an embodiment of the present invention.
  • FIG. 6 is a schematic structural diagram of another GOP provided by an embodiment of the present invention.
  • FIG. 7 is a schematic structural diagram of a SEI NALU inserted into a GOP according to an embodiment of the present invention.
  • FIG. 8 is a schematic structural diagram of another SEI NALU inserted into a GOP according to an embodiment of the present invention.
  • FIG. 9 is a schematic structural diagram of a video reading and writing unit provided by an embodiment of the present invention.
  • FIG. 10 is a schematic structural diagram of another GOP provided by an embodiment of the present invention.
  • FIG. 11 is a schematic diagram of a user dragging a video playback progress bar according to an embodiment of the present invention.
  • FIG. 12 is a schematic structural diagram of another GOP provided by an embodiment of the present invention.
  • FIG. 13A is a schematic diagram of storing GOPs in a time index manner according to an embodiment of the present invention.
  • FIG. 13B is a schematic diagram of storing GOPs in a frame number index mode provided by an embodiment of the present invention.
  • FIG. 14 is a schematic flowchart of a video processing method provided by an embodiment of the present invention.
  • FIG. 15 is a schematic diagram of a GOP that composes a video according to an embodiment of the present invention.
  • FIG. 16 is a schematic diagram of an operation for a user to download a video offline according to an embodiment of the present invention.
  • FIG. 17 is a schematic structural diagram of a new GOP provided by an embodiment of the present invention.
  • FIG. 18 is a schematic structural diagram of a video processing device provided by an embodiment of the present invention.
  • FIG. 19 is a schematic structural diagram of a computing device provided by an embodiment of the present invention.
  • GOP: also known as group of pictures. It refers to a group of continuous images (also called frames), specifically the group of images between two I frames. The GOP indicates the distance between two I frames.
  • I frame: also called an intra-frame coded frame. It is an independent frame that carries all of its own information and can be decoded independently without referring to other frames.
  • the first frame in the video is usually an I frame.
  • Non-I frames refer to frames other than I frames, specifically including B frames or P frames.
  • B frame: also called a bidirectional predictive coding frame.
  • the B frame records the difference between the current frame and the previous and next frames. That is to say, when decoding a B frame, it is necessary to refer to the previous frame and the next frame of the B frame.
  • the previous frame of the B frame refers to the frame before the B frame and adjacent to the B frame; the next frame of the B frame refers to the frame after the B frame and adjacent to the B frame.
  • P frame: also known as an inter-frame predictive coding frame.
  • the P frame records the difference between the current frame and the previous frame. That is to say, when decoding a P frame, it is necessary to refer to the previous frame of the P frame (specifically, it may be a P frame or an I frame).
  • VI frame: virtual independent (VI) frame.
  • the VI frame is essentially a P frame, but when decoding a VI frame, the I frame in front of the VI frame is referenced.
  • see FIG. 1 for a schematic structural diagram of a GOP. As shown in FIG. 1, the GOP includes 3 VI frames. When decoding each VI frame, only the I frame that appears before the VI frame in the GOP is referenced, as shown by the arrows in the figure.
  • NALU: network abstraction layer unit.
  • each frame is composed of at least one NALU.
  • see FIG. 2 for a schematic structural diagram of a NALU provided by an embodiment of the present invention.
  • NALU includes NAL Header and NAL Body.
  • the length of a NAL Header is fixed at 1 byte, that is, 8 bits.
  • the NAL Header includes three fields, namely: forbidden_zero_bit, the importance indication field nal_ref_idc, and the type field nal_unit_type.
  • the forbidden_zero_bit occupies 1 bit, and the forbidden_zero_bit field must be 0 in the video coding standard (such as H.264). If the network finds an error in the NALU, the forbidden_zero_bit can be set to 1, which is convenient for the receiver to correct the error or discard the NALU.
  • nal_ref_idc occupies 2 bits and is used to indicate the importance of NALU.
  • the value range of nal_ref_idc is 00 to 11. When the value of nal_ref_idc is larger, it means that the current NALU is more important and needs to be protected first.
  • nal_unit_type occupies 5 bits and is used to indicate the type of NALU.
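  • The 1-byte NAL Header layout described above (1 bit, 2 bits, 5 bits, most significant bit first, as in H.264) can be unpacked with shifts and masks. The following is an illustrative Python sketch; the `NalHeader` type and the function name are not part of the patent.

```python
from typing import NamedTuple


class NalHeader(NamedTuple):
    forbidden_zero_bit: int  # 1 bit, must be 0 in a valid stream
    nal_ref_idc: int         # 2 bits, importance of the NALU (0 to 3)
    nal_unit_type: int       # 5 bits, type of the NALU


def parse_nal_header(header_byte: int) -> NalHeader:
    """Split one NAL Header byte into its three fields."""
    if not 0 <= header_byte <= 0xFF:
        raise ValueError("a NAL Header is exactly one byte")
    return NalHeader(
        forbidden_zero_bit=(header_byte >> 7) & 0x01,
        nal_ref_idc=(header_byte >> 5) & 0x03,
        nal_unit_type=header_byte & 0x1F,
    )
```

  • For example, the header byte 0x67 decomposes into forbidden_zero_bit 0, nal_ref_idc 3, and nal_unit_type 7.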
  • NAL Body includes the encapsulation of payload data (video data).
  • the first layer is the encapsulated byte sequence payload (EBSP), which specifically includes the emulation_prevention_three_byte field. The purpose of this field is to prevent byte patterns in the NAL Body from conflicting with the NALU start code (0x000001 or 0x00000001).
  • the second layer raw byte sequence payload (RBSP), which is equivalent to the data after NAL Body removes the emulation_prevention_three_byte, and is the data generated after further processing of the original syntax element code stream (encoded data).
  • the basic structure of RBSP is to add end bits after the original encoded data to facilitate byte alignment.
  • the third layer data byte stream (string of data bits, SODB), which identifies the actual original binary code stream after encoding the syntax elements of the H.264 coding standard.
  • NALU may also include only NAL header (NAL Header) and RBSP. That is, the main body of the NAL is RBSP.
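  • As an illustration of the relationship between the EBSP and the RBSP, the sketch below removes each emulation_prevention_three_byte (a 0x03 byte that follows two consecutive 0x00 bytes) from a NAL Body. The function name is hypothetical, and the sketch assumes a well-formed H.264 payload.

```python
def ebsp_to_rbsp(ebsp: bytes) -> bytes:
    """Strip emulation prevention bytes: 00 00 03 becomes 00 00."""
    rbsp = bytearray()
    zero_run = 0  # number of consecutive 0x00 bytes just copied
    for byte in ebsp:
        if zero_run >= 2 and byte == 0x03:
            zero_run = 0  # drop the emulation_prevention_three_byte
            continue
        rbsp.append(byte)
        zero_run = zero_run + 1 if byte == 0x00 else 0
    return bytes(rbsp)
```

  • After this step the payload may again contain the byte pattern 00 00 01, which is why the prevention bytes are needed while the data travels as a byte stream delimited by start codes.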
  • the supplemental enhancement information network abstraction layer unit (SEI NALU) refers to a NALU whose type field nal_unit_type indicates the supplemental enhancement information (SEI) type.
  • FIG. 3 is a schematic structural diagram of an SEI NALU provided by an embodiment of the present invention.
  • the SEI NALU includes a NAL header (NAL Header) and a NAL body (NAL Body).
  • for details of the NAL Header, refer to the description of the embodiment shown in FIG. 2.
  • the nal_unit_type in the NAL Header occupies 5 bits and is used to indicate the type of NALU. In practical applications, different NALU types are indicated by setting the value of the nal_unit_type field.
  • for example, when the NAL Header byte is 0x06 (nal_unit_type 6), the NALU is of the SEI type; when the NAL Header byte is 0x67 (nal_unit_type 7), the NALU is of the sequence parameter set (SPS) type; when the NAL Header byte is 0x68 (nal_unit_type 8), the NALU is of the picture parameter set (PPS) type, etc. In the present invention, the NAL Header byte is 0x06, indicating that the type of NALU is SEI.
  • the NAL body includes the SEI payload type, the SEI payload size, the SEI universally unique identifier (SEI UUID), and custom fields.
  • SEI payload type field occupies 1 byte, that is, 8 bits, and is used to indicate the type of payload data carried in the SEI NALU, such as video data, SPS or PPS data.
  • SEI payload size field is used to indicate the size of the payload data, and is referred to as payload size for short.
  • the SEI UUID field occupies 16 bytes and is used to indicate the unique identification of the load data.
  • the number of bytes occupied by the custom field can be a system custom setting for carrying system custom data, which is not limited by the present invention.
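  • A hedged sketch of reading the SEI NAL body fields listed above. The text fixes the payload type at 1 byte and the UUID at 16 bytes but does not give the width of the payload size field, so this sketch assumes a 1-byte size that counts the UUID plus the custom bytes. The `SeiBody` type and the function name are illustrative only.

```python
import uuid
from dataclasses import dataclass


@dataclass
class SeiBody:
    payload_type: int
    payload_size: int
    sei_uuid: uuid.UUID
    custom: bytes


def parse_sei_body(body: bytes) -> SeiBody:
    """Parse: 1-byte type, 1-byte size (assumed), 16-byte UUID, then custom data."""
    if len(body) < 18:
        raise ValueError("SEI body too short for type, size and UUID")
    payload_type = body[0]
    payload_size = body[1]  # assumption: covers the UUID and the custom bytes
    if payload_size < 16 or len(body) < 2 + payload_size:
        raise ValueError("payload size is inconsistent with the body length")
    return SeiBody(
        payload_type=payload_type,
        payload_size=payload_size,
        sei_uuid=uuid.UUID(bytes=body[2:18]),
        custom=body[18:2 + payload_size],
    )
```

  • The custom bytes are where a system-defined marker, such as a VI frame indication, could be carried.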
  • the video is composed of several time-continuous frames, and the video can be divided into several GOPs during encoding. For example, when a computing device receives a playback request triggered by dragging the progress bar of a video, if the target frame pointed to by the drag stop position is a non-I frame, it needs to find the I frame closest to the target frame among the frames before or after the target frame, and then read the GOP from that I frame to decode and play.
  • the decoding time is prolonged, which reduces video processing efficiency and affects the user's viewing experience.
  • the target frame cannot be decoded, and part of the video is discarded.
  • decoding and playback need to start from the position of the next I frame. This leads to the discarding of some important video information, affects the accuracy of video information acquisition, and affects the user's viewing experience.
  • when a computing device receives a video reverse playback request, it inputs the complete GOP that constitutes the video into the decoder for decoding, stores the decoding result (decoded video) in the buffer, and then plays it in reverse order. For example, for a short video of 5 minutes in the reverse playback scene, the computing device needs to play backwards from the end of the short video (that is, the 5th minute) to the beginning of the short video. In practice, it is found that if the GOP is large, the buffer space occupied by the decoded GOP is large.
  • the storage space occupied by the computing device to cache the decoded video needs to be 5.8GB. This will lead to waste of storage resources of the computing device.
  • FIG. 4 is a schematic structural diagram of a video processing system provided by an embodiment of the present invention.
  • the video processing system 100 shown in FIG. 4 includes a video encoding unit 102, a video reading and writing unit 104, a video decoding unit 106, and a storage unit 108.
  • the video encoding unit 102 is responsible for encoding the input original video into a video code stream, and specifically can convert the format file of the original video into a file of another video format.
  • the video encoding unit 102 may use encoding standards such as H.261, H.263, H.264, H.265, or H.266 to encode the original video into a video stream.
  • Common video formats include but are not limited to audio video interleaved (AVI), digital video AVI (DV-AVI), the moving picture experts group (MPEG) format, advanced streaming format (ASF), windows media video (WMV), real media (RM), or other supported video formats.
  • the video encoding unit 102 may divide the video into several GOPs for encoding.
  • a video that is, a video code stream
  • the present invention takes a video (or a video code stream) including one GOP as an example to describe related content.
  • the video encoding unit 102 may specifically be an encoder or other devices that support image or video encoding.
  • the video encoding unit 102 may be deployed in a camera device, such as a camera; it may also be deployed as a separate encoder.
  • the storage unit 108 is used for storing video, for example, storing a video code stream obtained after encoding by the video encoding unit 102 and the like.
  • the video reading and writing unit 104 is responsible for writing the video code stream into the storage unit 108, or for reading the video code stream from the storage unit 108 (specifically, reading the GOP in the video code stream) and then inputting it to the video decoding unit 106 for decoding.
  • the video decoding unit 106 is responsible for decoding the input video code stream and outputting the decoded video code stream. Specifically, the GOP contained in the video bitstream is decoded, and each frame contained in the GOP is output.
  • the video reading and writing unit 104 may specifically be an input output (IO) device that supports data reading and writing functions, such as an IO interface.
  • the video decoding unit 106 may specifically be a component or device that supports a video decoding function, such as a decoder.
  • the video decoding unit 106 may be deployed in a video processing device of a computing device, or may be deployed as a separate decoder, etc., which is not limited in the present invention.
  • the storage unit 108 may specifically be a device supporting a data storage function, which may include, but is not limited to, random access memory (RAM), flash memory, read only memory (ROM), hard disk, registers, and the like.
  • the video processing technology provided by the embodiments of the present invention may be applicable to scenarios such as GOP playback or download.
  • This solution includes inserting one or more I frames into the GOP. Compared with the original I frames in the GOP, the newly inserted I frames are closer to the VI frame. In this case, when the content in a specified GOP needs to be played or downloaded, the GOP and the VI frame corresponding to the requested time are found according to the requested time, and the newly inserted I frame is used as the reference frame of the VI frame for video decoding (without needing to use the original I frame in the GOP as the reference for video decoding). This improves the efficiency of video processing and the accuracy of the playback time. When there are many frames in the GOP, the beneficial effects of the embodiments of the present invention are more prominent.
  • FIG. 5 is a schematic structural diagram of a video encoding unit 102 provided by an embodiment of the present invention.
  • the video encoding unit 102 includes a VI detector 1021.
  • the system framework may be further divided into two layers: a video coding layer (VCL) and a network abstraction layer (NAL).
  • the video encoding unit 102 may also include a video coding layer VCL 1022 and a network abstraction layer NAL 1023.
  • the video encoding unit 102 encodes the input original video through the video encoding layer VCL 1022 to obtain a video encoded bit stream, which is referred to as a video stream for short, and specifically also refers to the GOP in the video stream. Then, the VI frame identification is performed on the GOP obtained by the video coding layer VCL 1022 through the VI detector 1021.
  • the specific implementation of the VI frame identification is not limited. For example, the VI frame identification can be performed according to the definition of the VI frame, and the VI frame identification can also be performed based on the received out-of-band information.
  • the out-of-band information is used to indicate that the frame corresponding to the preset time stamp in the GOP is a VI frame, for example, the out-of-band information is used to indicate that the frame corresponding to the 3s in the GOP is a VI frame, and so on.
  • the network abstraction layer NAL 1023 is notified to mark the VI frame to indicate the position of the VI frame in the GOP.
  • the specific implementation of the VI frame marking is not limited; for example, the supplemental enhancement information (SEI) marking method, another marking method that conforms to the video coding standard and marks the specific position of the VI frame in the GOP, or an out-of-band method that notifies the position of the VI frame in the GOP.
  • the GOP encoded by the video encoding unit 102 can also be sent to the network abstraction layer NAL 1023 for encapsulation, so that the GOP is encapsulated into unit packets of the network abstraction layer, i.e., NALUs.
  • the GOP is composed of a series of NALUs.
  • the first part of the GOP data is the picture parameter set (PPS) and the sequence parameter set (SPS), followed by the I frame and the other frames.
  • the GOP includes at least one frame, and each frame includes one or more NALUs.
  • PPS includes information of all slices of an image (ie, frame)
  • SPS includes all information of an image sequence (ie, each frame in the GOP).
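  • To illustrate how a GOP stored as a byte stream decomposes into a series of NALUs, the sketch below splits a buffer at the start codes 0x000001 and 0x00000001. It assumes the H.264 Annex B byte-stream layout and that no NALU ends in a 0x00 byte (true for well-formed streams); the function name is hypothetical.

```python
from typing import List


def split_nalus(stream: bytes) -> List[bytes]:
    """Return the NALUs (header byte plus body) found between start codes."""
    nalus: List[bytes] = []
    start = None  # offset of the first byte of the NALU currently being read
    i = 0
    while i + 3 <= len(stream):
        if stream[i:i + 3] == b"\x00\x00\x01":
            if start is not None:
                # A 4-byte start code leaves one extra 0x00 before the pattern.
                nalus.append(stream[start:i].rstrip(b"\x00"))
            start = i + 3
            i += 3
        else:
            i += 1
    if start is not None:
        nalus.append(stream[start:])
    return nalus
```

  • The low 5 bits of the first byte of each returned NALU give its nal_unit_type, so parameter sets, SEI units, and frame data can be told apart after the split.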
  • the VI detector can notify the network abstraction layer NAL 1023 to generate a custom supplemental enhancement information network abstraction layer unit (SEI NAL unit, SEI NALU).
  • the SEI NALU is inserted before or after the VI frame to indicate that the frame where the i-th NALU before the SEI NALU is located is a VI frame, or that the frame where the j-th NALU after the SEI NALU is located is a VI frame.
  • it can be specifically used to indicate that the previous frame or the next frame of the SEI NALU is a VI frame. Please refer to FIG.
  • the original structure diagram of the GOP and the structure diagram of a new GOP correspondingly obtained after inserting the SEI NALU before and after the VI frame in the GOP are respectively shown.
  • the GOP obtained by the video coding layer VCL 1022 includes P frames and VI frames.
  • after detecting the VI frame in the GOP through the VI detector 1021, the video encoding unit 102 notifies the network abstraction layer NAL 1023 to add an SEI NALU before the VI frame, or notifies the network abstraction layer NAL 1023 to add an SEI NALU after the VI frame.
  • the specific position where the SEI NALU is added before or after the VI frame is not limited.
  • the SEI NALU is added before the VI frame as the j-th NALU before the first NALU included in the VI frame, to indicate that the frame where the j-th NALU after the SEI NALU is located is the VI frame; or the SEI NALU is added after the VI frame as the i-th NALU after the last NALU included in the VI frame, to indicate that the frame where the i-th NALU before the SEI NALU is located is the VI frame.
  • each frame (including VI frame) in the GOP is composed of one or more NALUs.
  • the VI frame in the figure includes 3 NALUs, namely NALU1 to NALU3.
  • the SEI NALU can be added before the first NALU (NALU1 shown in the figure) contained in the VI frame, that is, as the first NALU before NALU1; or the SEI NALU can be added after the last NALU (NALU3 shown in the figure) included in the VI frame, that is, as the first NALU after NALU3.
  • the SEI NALU is specifically used to indicate that the frame in which the previous NALU or the next NALU of the SEI NALU is located is a VI frame.
  • the network abstraction layer NAL 1023 can specifically set the value of the relevant field in the SEI NALU to indicate that the frame where the i-th NALU before the SEI NALU is located is a VI frame, or to indicate that the frame where the j-th NALU after the SEI NALU is located is a VI frame.
  • the network abstraction layer NAL 1023 can indicate the position of the VI frame in the GOP by setting the Type field in the SEI NALU (specifically, the SEI payload type field) or the value of the SEI UUID field.
  • the network abstraction layer NAL 1023 can also add a field to the custom field of the SEI NALU, and set the value of the added field to indicate the position of the VI frame in the GOP.
  • if the network abstraction layer NAL 1023 sets the SEI payload type to +1, it means that the frame where the previous NALU of the SEI NALU is located is a VI frame. Conversely, if the network abstraction layer NAL 1023 sets the SEI payload type to -1, it means that the frame where the next NALU of the SEI NALU is located is a VI frame.
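  • The +1 / -1 marking convention above can be sketched on the encoder side as follows. NALUs are modelled abstractly as (kind, value) tuples, which is an assumption of this sketch; how a signed value would be encoded in the 1-byte SEI payload type field is not specified in the text and is left out.

```python
from typing import List, Tuple

# Hypothetical model: ("VCL", frame_number) for frame data, ("SEI", marker) for a marker.
Nalu = Tuple[str, int]

PREVIOUS_IS_VI = +1  # the frame of the NALU before the SEI NALU is a VI frame
NEXT_IS_VI = -1      # the frame of the NALU after the SEI NALU is a VI frame


def mark_vi_frame(nalus: List[Nalu], vi_frame: int, after: bool = True) -> List[Nalu]:
    """Return a copy of `nalus` with an SEI marker placed next to the VI frame's NALUs."""
    positions = [k for k, (kind, value) in enumerate(nalus)
                 if kind == "VCL" and value == vi_frame]
    if not positions:
        raise ValueError("frame not found in the GOP")
    marked = list(nalus)
    if after:
        marked.insert(positions[-1] + 1, ("SEI", PREVIOUS_IS_VI))
    else:
        marked.insert(positions[0], ("SEI", NEXT_IS_VI))
    return marked
```

  • Placing the marker directly next to the frame corresponds to i = 1 or j = 1 in the description; other offsets would need the offset to be carried in a custom field.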
  • FIG. 9 is a schematic structural diagram of a video reading and writing unit 104 according to an embodiment of the present invention.
  • the video reading and writing unit 104 includes a code stream detector 1041, an index generator 1042 and a code stream modifier 1043.
  • the video reading and writing unit 104 and the storage unit 108 communicate with each other.
  • the code stream detector 1041 is used to perform frame detection (i.e., frame recognition) on the video code stream input to the video reading and writing unit 104 (specifically, the GOP in the video code stream) to determine the frames contained in the GOP and the position of each frame.
  • the present invention can determine the respective positions of the I frame and the VI frame in the GOP.
  • The form in which the position is expressed is not limited; for example, it may be the frame index, the playback time corresponding to the frame in the GOP, the storage location (also called the storage address) of the frame in the storage unit 108, or another indication of the frame.
  • the code stream detector 1041 performs VI frame mark detection on the GOP to detect the VI frame in the GOP and the position of the VI frame. Since the video encoding unit 102 has different marking methods for VI frames in the GOP, the specific implementation manners of the code stream detector 1041 for VI frame mark detection are also different. The following two specific manners for VI frame mark detection are exemplified.
  • The code stream detector 1041 detects whether an SEI NALU is included in the GOP. If so, it determines that the frame containing the i-th NALU before the SEI NALU is a VI frame, or that the frame containing the j-th NALU after the SEI NALU is a VI frame.
  • the number of SEI NALUs is not limited, and it may be one or more. When the number of SEI NALUs is multiple, the code stream detector 1041 can detect multiple VI frames included in the GOP and the position of each VI frame in the GOP according to the foregoing principle.
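For ease of understanding, the SEI-based detection above can be sketched as follows. This is a minimal illustration only: the `Nalu` record, its `kind`, `frame_index` and `marker` fields, and the function name are hypothetical, and the sketch assumes the +1/-1 convention described earlier with i = j = 1 (the marker refers to the immediately adjacent NALU).

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Nalu:
    kind: str                           # "SEI" or "VCL" (hypothetical labels)
    frame_index: Optional[int] = None   # frame the NALU belongs to; None for an SEI NALU
    marker: int = 0                     # +1: previous NALU's frame is VI, -1: next NALU's frame is VI


def find_vi_frames(nalus: List[Nalu]) -> Set[int]:
    """Return the indices of frames that the SEI markers flag as VI frames."""
    vi_frames: Set[int] = set()
    for pos, nalu in enumerate(nalus):
        if nalu.kind != "SEI" or nalu.marker == 0:
            continue
        # +1 points at the NALU before the SEI, -1 at the NALU after it.
        target = pos - 1 if nalu.marker > 0 else pos + 1
        if 0 <= target < len(nalus) and nalus[target].frame_index is not None:
            vi_frames.add(nalus[target].frame_index)
    return vi_frames
```

Because every SEI NALU is handled independently, the same loop covers a GOP carrying one marker or several.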
  • the code stream detector 1041 performs VI frame analysis on the GOP according to the out-of-band information sent from the video encoding unit 102, and determines the VI frame contained in the GOP and the position of the VI frame in the GOP.
  • the out-of-band information is used to indicate or notify the position of the VI frame in the GOP, for example, the fifth frame in the GOP is a VI frame, or the frame corresponding to the third second in the GOP is a VI frame, and so on.
  • The code stream detector 1041 can also detect the VI frame in the GOP by parsing the GOP itself; see the third implementation manner below.
  • The code stream detector 1041 parses each frame included in the GOP, identifies the reference picture set (RPS) information in each frame, and determines the VI frame included in the GOP and the position of the VI frame.
  • The first slice of each frame contains RPS information.
  • the RPS information is composed of some identification information, and the meaning indicated by the identification information is specifically a system custom setting, for example, indicating whether the frame is used as a reference for decoding the current frame or subsequent frames.
  • the RPS information includes the reference frame information of the current frame. If the reference frame information is used to indicate that the current frame has only one decoding reference I frame, and the previous frame of the current frame is a non-I frame, it means that the current frame is a VI frame. Specifically, the RPS information indicates that there is a picture order count (POC) of the reference frame.
  • If the POC of the reference frame is 1, the current frame has one reference frame, and that reference frame is an I frame; that is, decoding of the current frame refers only to the I frame. Further, if the code stream detector 1041 detects that the previous frame of the current frame is a non-I frame, it can determine that the current frame is a VI frame.
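The RPS-based rule above (a single decoding reference that is an I frame, and a preceding frame that is not an I frame) can be sketched as follows. The `Frame` record and its fields are hypothetical stand-ins for values a real parser would read from the slice header; the sketch also assumes that frames are stored in order and that reference entries are positions in that list.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Frame:
    frame_type: str                                        # "I", "P" or "B"
    ref_indices: List[int] = field(default_factory=list)   # frames listed in the RPS


def is_vi_frame(frames: List[Frame], k: int) -> bool:
    """Apply the two conditions from the text to frame k."""
    current = frames[k]
    if k == 0 or current.frame_type == "I":
        return False
    refs = current.ref_indices
    # Condition 1: exactly one decoding reference, and it is an I frame.
    only_ref_is_i = len(refs) == 1 and frames[refs[0]].frame_type == "I"
    # Condition 2: the frame immediately before the current one is not an I frame.
    previous_is_not_i = frames[k - 1].frame_type != "I"
    return only_ref_is_i and previous_is_not_i
```

The second condition is what separates a VI frame from the ordinary P frame that directly follows the I frame and also references only that I frame.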
  • When the code stream detector 1041 detects that the GOP includes a VI frame, it can send a VI frame identification signal to the index generator 1042 to notify it that the GOP includes the VI frame and to convey related information about the VI frame, such as the index of the VI frame (that is, its frame number), the playback time corresponding to the VI frame in the GOP, and the storage address of the VI frame in the storage unit 108.
  • Similarly, when the code stream detector 1041 detects that an I frame is included in the GOP, it may send an I frame identification signal to the index generator 1042 to notify it of the I frame included in the GOP and of related information about the I frame, for example the frame number of the I frame, the playback time corresponding to the I frame in the GOP, and the storage address of the I frame in the storage unit 108.
  • The index generator 1042 is configured to receive the I frame identification signal and the VI frame identification signal sent by the code stream detector 1041. After receiving the VI frame identification signal, the index generator 1042 can determine the target I frame referenced when decoding the VI frame (hereinafter also referred to as the second I frame) and store the target I frame in association with the VI frame, for example by storing the storage address of the target I frame in the index information of the GOP, to indicate that the target I frame stored at that address is referenced when decoding the VI frame.
  • the corresponding index information in the VI frame points to the target I frame, which is specifically used to indicate the target I frame pointed to by the index information when the VI frame is decoded.
  • The index information of the GOP is used to identify the GOP. It can include, but is not limited to, the index number of the GOP, the duration of the GOP, the start time and end time of the GOP in the video code stream, an identifier indicating whether the GOP contains a VI frame, the storage address of the GOP in the storage unit 108, and the storage address or offset of the VI frame in the GOP.
  • The target I frame here may specifically refer to an I frame that appears before the VI frame in the GOP, that is, the playback time corresponding to the target I frame is earlier than the playback time corresponding to the VI frame.
  • the target I frame here may also refer to the I frame that is closest to the VI frame in the GOP.
  • Please refer to FIG. 10 for a schematic diagram of a GOP.
  • As shown in the figure, the GOP is a 10 s video stream, and the frame at 7 s is the VI frame.
  • If the target I frame referenced when decoding the VI frame is the I frame that appears before the VI frame in the GOP, the target I frame is the I frame at 0 s in the figure. If the target I frame is the I frame closest to the VI frame in the GOP, the target I frame is the 9th I frame in the figure.
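The two ways of choosing the target I frame can be sketched as follows. The function name and the `policy` values are hypothetical. Taking the latest I frame that precedes the VI frame under the "preceding" policy is an assumption of this sketch, since the text only requires that the I frame appear before the VI frame.

```python
from typing import Sequence


def pick_target_i_frame(i_times: Sequence[float], vi_time: float,
                        policy: str = "preceding") -> float:
    """Return the playback time of the I frame referenced when decoding the VI frame."""
    if policy == "preceding":
        # Assumption: among the I frames before the VI frame, take the latest one.
        earlier = [t for t in i_times if t < vi_time]
        if not earlier:
            raise ValueError("no I frame precedes the VI frame")
        return max(earlier)
    if policy == "closest":
        # The I frame with the smallest time distance to the VI frame.
        return min(i_times, key=lambda t: abs(t - vi_time))
    raise ValueError(f"unknown policy: {policy}")
```

Assuming, purely for illustration, I frames at 0 s and 9 s and a VI frame at 7 s, the "preceding" policy selects the I frame at 0 s and the "closest" policy selects the one at 9 s.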
  • Each GOP has its own GOP index information.
  • The index generator 1042 can store each GOP and the index information of the GOP in the storage unit 108 in the form of a GOP index table.
  • At least one mapping relationship is stored in the GOP index table, each mapping relationship being that one GOP corresponds to one piece of index information of that GOP.
  • For the specific index information of the GOP, please refer to the above description, which is not repeated here.
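A GOP index table of this kind can be sketched as a mapping from GOP number to one index record. The class and field names below are hypothetical, and only a few of the fields listed above are modelled.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class GopIndex:
    gop_number: int
    start_time: float        # seconds, inclusive
    end_time: float          # seconds, exclusive
    storage_address: int     # where the GOP is stored
    # VI frame offset within the GOP -> storage address of its target I frame
    vi_targets: Dict[int, int] = field(default_factory=dict)


class GopIndexTable:
    """One GOP corresponds to exactly one piece of index information."""

    def __init__(self) -> None:
        self._entries: Dict[int, GopIndex] = {}

    def add(self, entry: GopIndex) -> None:
        self._entries[entry.gop_number] = entry

    def find_by_time(self, t: float) -> Optional[GopIndex]:
        """Return the index information of the GOP whose time span contains t."""
        for entry in self._entries.values():
            if entry.start_time <= t < entry.end_time:
                return entry
        return None
```

Keeping the target I frame address next to the VI frame offset lets the code stream modifier fetch the reference frame without re-parsing the GOP.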
  • The code stream modifier 1043 is used to modify the GOP input to the video reading and writing unit 104 to obtain a new, modified GOP. Specifically, the code stream modifier 1043 reads the VI frame contained in the GOP and the target I frame referenced when decoding the VI frame, and then inserts the target I frame before the VI frame to obtain at least two new GOPs (also called multiple GOPs). The specific position at which the target I frame is inserted before the VI frame is not limited; for example, it is inserted as the m-th frame before the VI frame, where m is a positive integer.
  • The embodiments of the present invention can be understood intuitively as follows: by inserting a new I frame into the original GOP, one GOP is divided into multiple new GOPs, each of which has an I frame.
  • The newly inserted I frame in the embodiment of the present invention is for reference when decoding the VI frame, so it need not have all the functions of the I frame in the original GOP (for example, it need not be playable), as long as it can be referenced when the VI frame is decoded.
  • the newly inserted frame only has the function of I frame for VI frame reference and decoding. Therefore, this newly inserted frame can be called a quasi-I frame.
  • In this case, the original GOP can be considered as not really divided into multiple new GOPs but as still being one GOP (only with one or more quasi-I frames added to it).
  • If the newly inserted I frame is exactly the same as the I frame in the original GOP, the original GOP can be considered as divided into multiple new GOPs.
  • For ease of description, the inserted frames are collectively referred to as I frames, and the result of inserting an I frame (or quasi-I frame) into the GOP is collectively referred to as obtaining a "new GOP".
  • The inserted I frame (for example, the second I frame) in the embodiment of the present invention is either a frame identical to the I frame in the original GOP, or a frame that has the original I frame's function of serving as the decoding reference for the VI frame.
  • When the video reading and writing unit 104 receives a video processing request, it detects whether there is a VI frame in the GOP through the code stream detector 1041. If there is a VI frame in the GOP, the code stream modifier 1043 reads the target I frame from the storage address, recorded in the index information of the GOP, of the target I frame referenced when decoding the VI frame. The code stream modifier 1043 then inserts the target I frame before the VI frame, thereby obtaining multiple new GOPs. This addresses the problems in the prior art that, when the GOP is large and the distance between the I frame and the VI frame is large, decoding takes too long, video processing efficiency is reduced, or some important video information is lost.
  • By inserting an I frame before the VI frame, the present invention can split a large GOP into multiple small GOPs, and decoding and playback can be based on the small GOPs. Compared with the prior art, this avoids decoding unnecessary information, improves the efficiency of video decoding, avoids problems such as the discarding of important video information, and preserves the user's viewing experience.
  • the video encoding unit 102 can mark VI frames contained in the GOP, and transmit the VI frame marks along with the GOP, so that the compatibility of the video encoding unit 102 can be improved.
  • The video reading and writing unit 104 can insert the target I frame before the VI frame and divide the large GOP into multiple new GOPs. Control is thus exercised at the granularity of the VI frame, which can effectively improve video playback. Especially in reverse-playback scenarios, caching the new GOPs instead of the large GOP can effectively save storage resources.
  • the first is the video playback scene.
  • the video processing request is specifically a video playback request. Specifically, when a user watches a video, he can drag the progress bar of the video playback at will according to his own needs. Please refer to FIG. 11 which shows a schematic diagram of a user dragging the progress bar of the video playback.
  • When the computing device detects that the user is dragging the video playback progress bar, it can generate a corresponding video playback request.
  • In response, the GOP in which the dragging stop position is located is obtained, and it is then identified whether a VI frame is included in that GOP.
  • If so, the storage address of the target I frame referenced when decoding the VI frame is obtained from the GOP index table, the target I frame is read from the storage address, and the target I frame is inserted before the VI frame.
  • the target I frame here may specifically refer to the I frame that appears before the VI frame in the GOP, or may refer to the I frame that is closest to the VI frame in the GOP. For details, please refer to the example shown in FIG. 10 above.
  • In the video playback scene, the computing device may process only the VI frame closest to the dragging stop position in the GOP, that is, insert the target I frame before that VI frame to obtain two new GOPs.
  • The playback time corresponding to the inserted target I frame is earlier than the playback time corresponding to the dragging stop position. The new GOP in which the dragging stop position is located is then decoded and played.
  • Please refer to FIG. 12 for a schematic diagram of the structure of a GOP. As shown in FIG. 12, the GOP is a 10 s video code stream and includes two VI frames, VI frame 1 and VI frame 2.
  • the playback time corresponding to VI frame 1 is the 5th second, and the playback time corresponding to VI frame 2 is the 7th second.
  • users can drag the progress bar of video playback at will. If the user drags the progress bar to stop at 3s, the VI frame closest to the dragging stop position is VI frame 1. At this time, the computing device can insert the target I frame before VI frame 1.
  • The insertion position of the target I frame is not limited; for example, it can be at any position between the dragging stop position and the VI frame, or at any position before the dragging stop position. Ensuring that the playback time corresponding to the inserted target I frame is not later than (that is, earlier than or equal to) the playback time corresponding to the dragging stop position avoids the loss of important video information.
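The selection made in the playback scene can be sketched as follows. The function names are hypothetical; the second helper simply encodes the stated condition that the inserted target I frame must not play later than the dragging stop position.

```python
from typing import Sequence


def nearest_vi_frame(vi_times: Sequence[float], stop_time: float) -> float:
    """Return the playback time of the VI frame closest to the dragging stop position."""
    if not vi_times:
        raise ValueError("the GOP contains no VI frame")
    return min(vi_times, key=lambda t: abs(t - stop_time))


def insertion_time_is_valid(insert_time: float, stop_time: float) -> bool:
    """The inserted I frame must play no later than the dragging stop position."""
    return insert_time <= stop_time
```

With the values of the FIG. 12 example (VI frames at the 5th and 7th seconds, progress bar stopped at 3 s), `nearest_vi_frame([5, 7], 3)` selects VI frame 1 at the 5th second.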
  • the second is the video download scene.
  • the video processing request is specifically a video download request. Specifically, if the user wants to watch the video offline, he can download and cache the video locally in advance.
  • the computing device can download the video (specifically, one or more GOPs contained in the video) in response to the video download request.
  • the video download request can carry the start time and end time of the video, and the computing device will download the video (that is, one or more GOPs in the video) from the start time to the end time. It can start downloading from the GOP at the start time to the end of the GOP at the end time. Then identify whether each GOP includes a VI frame.
  • If a GOP includes a VI frame, the storage address of the target I frame referenced when decoding the VI frame is obtained from the GOP index table, the target I frame is read from the storage address, and the target I frame is inserted before the VI frame.
  • In the video download scene, the computing device can process every VI frame included in every GOP of the video, that is, insert the target I frame before each VI frame to split large GOPs into small GOPs.
  • the processing process of the computing device for any VI frame in each GOP is the same. For details, reference may be made to the relevant introduction of the foregoing embodiment, which will not be repeated here.
  • the following describes related embodiments related to GOP storage.
  • Different video processing systems can use different indexing methods to create and store corresponding index information for the GOP.
  • the indexing methods corresponding to the index information of the GOPs in different video processing systems may be different, for example, the time indexing method or the frame number indexing method can be supported.
  • the specific implementation manners of the two indexing methods are given as an example as follows.
  • the first is the time index method.
  • the computing device uses the time index method to create and store corresponding index information for the GOP. Specifically, the computing device creates an index for the GOP according to a preset duration (for example, 1s), and obtains index information of the GOP.
  • The index information includes, but is not limited to, the number of the GOP, whether the GOP contains an I frame, the storage address of the I frame, whether the GOP contains a VI frame, the storage address of the VI frame, and the storage address of the GOP in the storage unit 108.
  • The preset duration is defined by the system, for example set according to user requirements or obtained statistically from empirical data.
  • FIG. 13A shows a schematic diagram of a GOP stored in the time index manner.
  • As shown in the figure, the GOP is a 10 s video code stream; the figure shows the code stream from the 0th second to the 9th second.
  • the second is the frame number index method.
  • the computing device uses the frame number index method to create and store corresponding index information for the GOP. Specifically, the computing device creates an index for the GOP according to the I frame interval, and obtains the index information of the GOP.
  • the GOP is used to indicate a group of consecutive frames between two I frames. For the specific index information of the GOP, please refer to the above description, which will not be repeated here.
  • FIG. 13B shows a schematic diagram of a GOP stored in the frame number index manner. As shown in the figure, the GOP is a video code stream including 10 frames, shown in the figure as frame 0 to frame 10. Each frame corresponds to the index number of that frame.
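The difference between the two indexing methods can be sketched as follows. Both functions are hypothetical and take a list of `(playback_time_seconds, storage_offset)` pairs in playback order; the time index keeps one entry per preset duration, whereas the frame number index keeps one entry per frame.

```python
from typing import Dict, List, Tuple

FrameRecord = Tuple[float, int]   # (playback time in seconds, storage offset)


def build_time_index(frames: List[FrameRecord], step: float = 1.0) -> Dict[int, int]:
    """Map each time bucket of length `step` to the offset of its first frame."""
    index: Dict[int, int] = {}
    for pts, offset in frames:
        bucket = int(pts // step)
        index.setdefault(bucket, offset)   # keep only the first frame of each bucket
    return index


def build_frame_number_index(frames: List[FrameRecord]) -> Dict[int, int]:
    """Map each frame number to the storage offset of that frame."""
    return {number: offset for number, (_pts, offset) in enumerate(frames)}
```

A time index suits requests expressed as a start time, while a frame number index gives direct access to an individual frame such as a VI frame.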
  • FIG. 14 is a schematic flowchart of a video processing method according to an embodiment of the present invention.
  • the method shown in Figure 14 includes the following implementation steps:
  • Step S102: The computing device obtains a group of pictures (GOP) in the video, where the first frame of the GOP is the first I frame.
  • the GOP includes M frames, and M is a positive integer.
  • the computing device obtains a video processing request, and the video processing request carries the start time of the video.
  • a video corresponding to the start time is acquired, that is, at least one GOP in the video is acquired.
  • the video processing request may also carry the end time of the video, or other system-customized information, etc., which is not limited in the present invention.
  • the video processing request may specifically be generated by the user performing a corresponding video operation on the video, or may be received from other devices.
  • In different application scenarios, the video processing request may also be different. For example, in a video playback scene, the computing device detects a user's drag operation on the video playback progress bar and can generate a corresponding video playback request. In a video download scene, when the computing device detects a user's video download operation for a preset period (the period from the start time to the end time), it can generate a corresponding video download request.
  • The specific implementation of step S102 is described in detail below, taking a video playback request and a video download request as examples of the video processing request.
  • The video playback request carries the start time T_s of the video.
  • the video includes multiple GOPs.
  • The computing device can respond to the video playback request and obtain, from the multiple GOPs of the video, the group of pictures (GOP) in which the start time T_s is located.
  • When the video processing request is a video download request, the video download request carries the start time T_s of the video and, optionally, the end time T_e of the video.
  • The computing device may respond to the video download request by starting the download from the GOP in which the start time T_s is located and continuing until the end of the GOP in which the end time T_e is located, thereby downloading a video composed of at least one GOP.
  • Please refer to FIG. 15 for a schematic diagram of the GOPs that compose a video according to an embodiment of the present invention.
  • a user plays the movie "XXX" online on a computing device.
  • the movie includes 8 GOPs.
  • Playback is to start from time T_s of the video.
  • the computing device may generate a video playback request when detecting that the user is dragging the playback progress bar of the movie.
  • The video playback request carries the start time T_s of the video.
  • The computing device may respond to the video playback request by obtaining the GOP in which the start time T_s is located, which is GOP3 in the figure.
  • the computing device may generate a video download request when detecting the user's download operation for the movie.
  • the video download request may carry the start time and end time of the video to be downloaded.
  • the video to be downloaded can be a video segment (for example, the beginning or the end) of the movie "XXX", or it can be the entire video.
  • the start time and end time of the video to be downloaded can be customized by the user according to actual needs, for example, 00:01:00-00:21:00 (that is, download the video segment from the 1st minute to the 21st minute).
  • the user can perform offline download settings on the display interactive interface provided by the computing device. Please refer to FIG. 16 for a schematic diagram of an operation for a user to download videos offline.
  • When the computing device detects an offline download operation on the display interactive interface, it can start downloading from the GOP in which the start time is located and continue until the end of the GOP in which the end time is located.
  • For example, if the GOP containing the start time 00:01:00 is GOP1 and the GOP containing the end time 00:21:00 is GOP3, the 20-minute video downloaded by the computing device includes GOP1, GOP2, and GOP3.
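The selection of GOPs for a download request can be sketched as follows. The `Gop` record and the function name are hypothetical, and each GOP is assumed to cover the half-open interval from its start time to its end time.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Gop:
    name: str
    start: float   # seconds, inclusive
    end: float     # seconds, exclusive


def gops_to_download(gops: List[Gop], t_s: float, t_e: float) -> List[Gop]:
    """Return every GOP from the one containing t_s through the one containing t_e."""
    # A GOP is needed if it starts no later than t_e and ends after t_s.
    return [g for g in gops if g.start <= t_e and g.end > t_s]
```

With hypothetical 10-minute GOPs, a request for 00:01:00 to 00:21:00 (60 s to 1260 s) returns GOP1, GOP2 and GOP3, which matches the example above; whole GOPs are downloaded even though the request starts and ends inside a GOP.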
  • Step S104: The computing device determines whether a VI frame is included in the M frames.
  • The GOP includes one or more NALUs.
  • The computing device determines whether a VI frame is included in the M frames of the GOP by identifying whether an SEI NALU is included in the GOP. Specifically, if an SEI NALU is included in the GOP, the computing device determines, according to the indication of the SEI NALU, that the frame containing the i-th NALU before the SEI NALU is a VI frame, or that the frame containing the j-th NALU after the SEI NALU is a VI frame.
  • the number of SEI NALU is not limited, and it can be one or more. When the number of SEI NALUs is multiple, the computing device can determine the indicated VI frame corresponding to each of the multiple SEI NALUs by referring to the above-mentioned VI frame determination principle. Thus, one or more VI frames included in the M frames are determined.
  • the GOP includes at least one frame.
  • Each frame includes RPS information describing the reference frames of that frame.
  • The computing device can analyze the RPS information of each of the M frames to determine whether each frame is a VI frame. Specifically, if the RPS information of a frame in the GOP indicates that the frame has a decoding reference I frame, and the frame preceding it is a non-I frame (specifically, a B frame or a P frame), that frame is determined to be a VI frame. Otherwise, the frame is determined not to be a VI frame.
  • the computing device obtains out-of-band information of the GOP, and the out-of-band information is used to indicate the position of the VI frame included in the GOP.
  • the position refers to the specific or definite position of the VI frame in the GOP, which may include, but is not limited to, the frame number (index number) of the VI frame, the playing time corresponding to the VI frame, and the like.
  • the out-of-band information may specifically be received by the computing device from other devices (such as a server); it may also be obtained by the computing device from its own video encoding unit, which is not limited in the present invention.
  • According to the out-of-band information of the GOP, the computing device identifies whether a VI frame is included in the M frames of the GOP and, if so, the position of the VI frame.
  • When the computing device determines that no VI frame is included in the GOP, it does not need to process the GOP.
  • the computing device can start decoding and playing from the first I frame in the GOP.
  • Step S106: When determining that the M frames include a VI frame, the computing device inserts a second I frame before the VI frame to obtain multiple new GOPs. The number of new GOPs is the number of VI frames included in the GOP plus one.
  • the computing device can obtain the target I frame (also referred to as the second I frame) corresponding to the VI frame. Specifically, for example, the computing device may determine the storage address of the associated second I frame corresponding to the VI frame from the index information of the GOP, and then obtain the second I frame from the storage location. Or the computing device can search for the second I frame pointed to by the index information of the VI frame.
  • The second I frame can be an I frame that appears before the VI frame in the GOP, or the I frame closest to the VI frame in the GOP. For details, please refer to the earlier description of the target I frame, which is not repeated here.
  • The index information of the GOP records information such as the second I frame referenced when decoding the VI frame, the storage address of the second I frame, the frame index of each frame, the playback time corresponding to each frame, the playback duration of the GOP, and the start time and end time of the GOP.
  • The computing device may insert the second I frame before the VI frame; specifically, it may be inserted as the m-th frame before the VI frame, where m is a positive integer. For example, the second I frame is inserted as the frame immediately preceding the VI frame. The computing device can thereby split the GOP into multiple new GOPs, and the number of new GOPs is the number of VI frames in the GOP plus one.
  • For example, if a GOP includes 4 VI frames, 5 new GOPs are obtained after a second I frame is inserted for each VI frame.
  • Please refer to FIG. 17, which shows a schematic diagram of the new GOPs. As shown in the figure, the GOP includes 4 VI frames, and the computing device applies the above I frame insertion principle to insert the corresponding second I frame before each VI frame, thereby obtaining 5 new GOPs.
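The relationship between the number of VI frames and the number of new GOPs can be sketched as follows. Frames are represented by hypothetical type labels, the label `"I'"` stands for the inserted second I frame, and the sketch assumes insertion immediately before each VI frame (m = 1).

```python
from typing import List


def split_gop(frames: List[str], inserted: str = "I'") -> List[List[str]]:
    """Insert a reference I frame before every VI frame and split the GOP there."""
    new_gops: List[List[str]] = [[]]
    for frame in frames:
        if frame == "VI":
            # The inserted I frame opens a new GOP, directly followed by the VI frame.
            new_gops.append([inserted])
        new_gops[-1].append(frame)
    return new_gops
```

For a GOP with 4 VI frames the function returns 5 lists, each beginning with an I frame, which corresponds to the count given in the text (number of VI frames plus one).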
  • The computing device can modify the value of a related field of the second I frame (for example, the value of a control field or flag field in the second I frame) to mark the second I frame as a non-display frame or a non-output frame.
  • the second I frame is only used to decode the VI frame, and is not used for output display.
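Marking the second I frame as decode-only can be sketched as follows. This is an illustrative assumption rather than the listing of the present application: the `Frame` record and its `output` flag are hypothetical. HEVC slice headers carry a `pic_output_flag` syntax element with a similar purpose, but the text does not state which field is actually modified.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Frame:
    frame_type: str
    output: bool = True   # hypothetical flag: False means decoded but never displayed


def as_reference_only(i_frame: Frame) -> Frame:
    """Return a copy of the I frame that is used for decoding but not for output."""
    return replace(i_frame, output=False)


def frames_to_display(decoded: List[Frame]) -> List[Frame]:
    """Drop frames that were decoded only to serve as a reference."""
    return [f for f in decoded if f.output]
```

The original I frame is left untouched, so it is still displayed at its own position in the original GOP.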
  • the new GOP involved in this application has a different meaning from the GOP in the conventional definition.
  • the term description of the new GOP is still used.
  • the new GOP is used to indicate the distance between two I frames, but the first I frame of the new GOP is only used for decoding, not for display output.
  • the pseudo code description of the second I frame modified by the computing device is specifically as follows:
  • Depending on the application scenario, the present invention processes different GOPs of the video and different VI frames included in the GOP. Specifically:
  • the video processing request in S102 is specifically a video playback request.
  • The video playback request carries the start time T_s of the video.
  • The computing device responds to the video playback request, obtains the GOP in which the start time T_s is located, and then identifies whether the GOP includes a VI frame. If the GOP includes multiple VI frames, the computing device selects the VI frame closest to the start time T_s for processing, that is, it inserts the second I frame before the selected VI frame, thereby obtaining two new GOPs.
  • The playback time corresponding to the inserted second I frame is earlier than the start time T_s.
  • the video processing request in S102 is specifically a video download request.
  • The video download request carries the start time T_s and the end time T_e of the video.
  • In response to the video download request, the computing device starts the download from the GOP in which the start time T_s is located and continues until the end of the GOP in which the end time T_e is located, thereby downloading a video composed of multiple GOPs.
  • For each GOP identify whether the GOP includes VI frames. If the GOP includes one or more VI frames, the computing device inserts a corresponding second I frame before each VI frame, thereby splitting one GOP into multiple new GOPs.
  • If the computing device subsequently obtains a video playback request, it may decode and play the corresponding new GOP in response to that request.
  • the specific implementation is as follows:
  • When the video processing request in S102 is a video playback request, the computing device responds to the video playback request by inserting the second I frame for the VI frame closest to the start time T_s in the GOP, obtaining two new GOPs. Further in response to the video playback request, it obtains the new GOP in which the start time T_s is located and decodes and plays that new GOP starting from its second I frame.
  • That is, the computing device determines that the start time T_s is located after the second I frame in the GOP, and then decodes and plays the video corresponding to the GOP starting from the second I frame.
  • When the video processing request in S102 is a video download request, the computing device responds to the video download request by downloading the multiple GOPs included in the video and inserting a second I frame for each VI frame included in each GOP, obtaining multiple new GOPs.
  • the user can drag the playback progress bar of the video at will when watching the video.
  • When the computing device detects that the user is dragging the video playback progress bar, it can generate a corresponding video playback request.
  • The video playback request carries the start time T_s of the video.
  • The new GOP in which the start time T_s is located is looked up among the multiple new GOPs, and that new GOP is then decoded and played starting from its second I frame.
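The lookup of the new GOP containing T_s can be sketched with a binary search over the start times of the new GOPs. The function name is hypothetical, and the start times are assumed to be sorted in ascending order.

```python
import bisect
from typing import List


def find_new_gop(start_times: List[float], t_s: float) -> int:
    """Return the index of the new GOP whose time span contains t_s."""
    # bisect_right finds the first start time greater than t_s;
    # the GOP just before that position contains t_s.
    position = bisect.bisect_right(start_times, t_s) - 1
    if position < 0:
        raise ValueError("t_s precedes the first new GOP")
    return position
```

Decoding then begins at the first frame of the returned GOP, which is the inserted second I frame, rather than at the first I frame of the original large GOP.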
  • FIG. 18 is a schematic structural diagram of a video processing apparatus according to an embodiment of the present invention.
  • the video processing device 18 includes an acquiring unit 181, a determining unit 182, and an inserting unit 183.
  • A decoding and playing unit 184 may also be included. Specifically:
  • the acquiring unit 181 is configured to acquire a group of pictures GOP in the video, the first frame of the GOP is the first I frame, the GOP includes M frames, and M is a positive integer;
  • the determining unit 182 is configured to determine whether a virtual intra-coded VI frame is included in the M frames;
  • the inserting unit 183 is configured to insert a second I frame before the VI frame when a VI frame is included in the M frames;
  • the second I frame is a frame referenced by the VI frame during video decoding.
  • The video processing device 18 may further include a decoding and playing unit 184.
  • The determining unit 182 is configured to determine, in response to a video playback request, that the start time of the video in the video playback request is located after the second I frame in the GOP; the decoding and playing unit 184 is configured to decode and play the video starting from the second I frame.
  • the second I frame is the previous frame of the VI frame.
  • The GOP further includes index information of the GOP, and the storage address of the second I frame is recorded in the index information.
  • Before the second I frame is inserted before the VI frame, the acquiring unit 181 is further configured to acquire the second I frame from the storage address of the second I frame according to the index information of the GOP.
  • the second I frame is used for decoding the VI frame, and is not used for output display.
  • The acquiring unit 181 is specifically configured to receive a video processing request that carries the start time of the video, where the video includes at least one group of pictures (GOP), and, in response to the video processing request, to obtain from the GOP index table the GOP corresponding to the start time.
  • A mapping relationship is recorded in the GOP index table: each GOP corresponds to index information of that GOP, and the index information of the GOP includes the start time of the GOP.
  • The index information of the GOP further includes the playback time of the frames; when the video processing request is a video playback request, the VI frame is the VI frame in the GOP whose playback time has the smallest difference from the start time.
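Choosing the VI frame whose playback time is closest to the requested start time can be sketched as a minimum over absolute differences. This is only an illustration: `nearest_vi_frame` and the list of VI playback times read from the GOP's index information are hypothetical, and ties are resolved in favor of the earlier frame as an arbitrary choice.

```python
def nearest_vi_frame(vi_playback_times, start_time):
    """Return the index of the VI frame closest in time to start_time.

    vi_playback_times: playback times (seconds) of the VI frames in one GOP,
    as they might be recorded in that GOP's index information.
    """
    if not vi_playback_times:
        raise ValueError("the GOP contains no VI frame")
    # min() keeps the first minimal element, so ties go to the earlier frame.
    return min(
        range(len(vi_playback_times)),
        key=lambda i: abs(vi_playback_times[i] - start_time),
    )


# Hypothetical VI frame playback times within a GOP.
print(nearest_vi_frame([1.0, 2.0, 3.0], 2.4))
```

Note that the smallest difference may select a VI frame slightly after the requested start time; a player that must not skip content would need a different rule.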
  • the functions of the acquiring unit 181 and the determining unit 182 of the present invention can be implemented by the code stream detector 1041 in FIG. 9.
  • the function of the insertion unit 183 of the present invention can be implemented by the code stream modifier 1043 in FIG. 9.
  • the function of the decoding and playing unit 184 of the present invention can be implemented by the video decoding unit 106 in FIG. 4.
  • the code stream detector 1041 in the video reading and writing unit 104 in FIG. 4 or FIG. 9 can be specifically implemented by functional modules such as the acquiring unit 181 and the determining unit 182.
  • the code stream modifier 1043 in the video reading and writing unit 104 can be specifically implemented by functional modules such as the inserting unit 183.
  • the video decoding unit 106 may be specifically implemented by functional modules such as the decoding and playing unit 184.
  • Each module or unit involved in the device 18 of the embodiment of the present invention may be specifically implemented by software programs or hardware.
  • When implemented by software programs, the modules or units involved in the device 18 are software modules or software units.
  • When implemented by hardware, the modules or units involved in the device 18 can be implemented through an application-specific integrated circuit (ASIC) or a programmable logic device (PLD).
  • The above PLD can be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof, which is not limited in the present invention.
  • FIG. 18 is only a possible implementation manner of the embodiment of the present invention.
  • the video processing device may also include more or fewer components, which is not limited here.
  • FIG. 19 is a schematic structural diagram of a computing device 19 according to an embodiment of the present invention.
  • the computing device shown in FIG. 19 includes one or more processors 1901, a communication interface 1902, and a memory 1903.
  • the processor 1901, the communication interface 1902, and the memory 1903 can be connected by a bus, or communication can be achieved by other means such as wireless transmission.
  • the embodiment of the present invention takes the connection through the bus 1904 as an example, where the memory 1903 is used to store instructions, and the processor 1901 is used to execute instructions stored in the memory 1903.
  • the memory 1903 stores program codes, and the processor 1901 can call the program codes stored in the memory 1903 to implement the video processing device 18 as shown in FIG. 18.
  • The processor 1901 in the embodiment of the present invention may call the program code stored in the memory 1903 to execute all or part of the steps described in the method embodiment of FIG. 14 above, and/or other content described in the text, which will not be repeated here.
  • the processor 1901 may be composed of one or more general-purpose processors, such as a central processing unit (CPU).
  • the processor 1901 may be used to run programs of the following functional modules in the related program code.
  • the functional module may specifically include, but is not limited to, any one or a combination of the above-mentioned acquiring unit 181, determining unit 182, and inserting unit 183.
  • the program code executed by the processor 1901 can perform the functions of any one or more of the above functional modules.
  • For the functional modules mentioned here, refer to the relevant descriptions in the foregoing embodiments, which will not be repeated here.
  • The communication interface 1902 may be a wired interface (such as an Ethernet interface) or a wireless interface (such as a cellular network interface or a wireless local area network interface) for communicating with other modules or devices.
  • the communication interface 1902 in the embodiment of the present invention may be specifically used to obtain GOPs in the video and so on.
  • The memory 1903 may include volatile memory, such as random access memory (RAM); the memory may also include non-volatile memory, such as read-only memory (ROM), flash memory, a hard disk drive (HDD), or a solid-state drive (SSD); the memory 1903 may also include a combination of the foregoing types of memories.
  • the memory 1903 may be used to store a group of program codes, so that the processor 1901 can call the program codes stored in the memory 1903 to implement the functions of the above-mentioned functional modules involved in the embodiments of the present invention.
  • FIG. 19 is only a possible implementation manner of the embodiment of the present invention.
  • the computing device may also include more or fewer components, which is not limited here.
  • For content not shown or described in the embodiment of the present invention, reference may be made to the relevant description in the foregoing method embodiment, which will not be repeated here.
  • the embodiment of the present invention also provides a computer-readable storage medium in which instructions are stored.
  • When the instructions stored in the computer-readable storage medium are run on a computing device, the method flow shown in the embodiment of FIG. 14 is implemented.
  • the embodiment of the present invention also provides a computer program product.
  • When the computer program product runs on a computing device, the method flow shown in the embodiment of FIG. 14 is implemented.
  • the steps of the method or algorithm described in combination with the disclosure of the embodiment of the present invention may be implemented in a hardware manner, or may be implemented in a manner in which a processor executes software instructions.
  • Software instructions can be composed of corresponding software modules, which can be stored in random access memory (RAM), flash memory, read-only memory (ROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), registers, a hard disk, a removable hard disk, a CD-ROM, or any other form of storage medium well known in the art.
  • An exemplary storage medium is coupled to the processor, so that the processor can read information from the storage medium and write information to the storage medium.
  • the storage medium may also be an integral part of the processor.
  • the processor and the storage medium may be located in the ASIC.
  • the ASIC may be located in the computing device.
  • the processor and the storage medium may also exist as discrete components in the computing device.
  • The program can be stored in a computer-readable storage medium; when executed, it may include the procedures of the above-mentioned method embodiments.
  • the aforementioned storage media include: ROM, RAM, magnetic disks or optical disks and other media that can store program codes.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

A video processing technique applicable to scenarios such as GOP playback or download. The solution consists of additionally inserting one or more I frames into a GOP, the newly inserted I frame being closer to a VI frame than the original I frame in the GOP. In this case, when content in the GOP needs to be played or downloaded, the VI frame in the GOP is found according to the requested time, the newly inserted I frame is taken as the reference frame of the VI frame for video decoding, and the original I frame in the GOP does not need to be taken as the reference for video decoding. As a result, video processing efficiency and playback time precision are improved.
PCT/CN2019/125411 2019-12-13 2019-12-13 Video processing method and apparatus, and computer-readable storage medium WO2021114305A1 (fr)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201980086119.6A CN113261283B (zh) 2019-12-13 2019-12-13 视频处理方法、装置及计算机可读存储介质
PCT/CN2019/125411 WO2021114305A1 (fr) 2019-12-13 2019-12-13 Video processing method and apparatus, and computer-readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2019/125411 WO2021114305A1 (fr) 2019-12-13 2019-12-13 Video processing method and apparatus, and computer-readable storage medium

Publications (1)

Publication Number Publication Date
WO2021114305A1 true WO2021114305A1 (fr) 2021-06-17

Family

ID=76328817

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/125411 WO2021114305A1 (fr) 2019-12-13 2019-12-13 Video processing method and apparatus, and computer-readable storage medium

Country Status (2)

Country Link
CN (1) CN113261283B (fr)
WO (1) WO2021114305A1 (fr)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH08242452A (ja) * 1995-03-02 1996-09-17 Matsushita Electric Ind Co Ltd 映像信号圧縮符号化装置
CN101127919A (zh) * 2007-09-28 2008-02-20 中兴通讯股份有限公司 一种视频序列的编码方法
CN102378008A (zh) * 2011-11-02 2012-03-14 深圳市融创天下科技股份有限公司 一种减少播放等待时间的视频编码方法、装置及系统
CN105847790A (zh) * 2015-01-16 2016-08-10 杭州海康威视数字技术股份有限公司 一种码流传输方法及装置
CN107124610A (zh) * 2017-04-06 2017-09-01 浙江大华技术股份有限公司 一种视频编码方法及装置
US20190289322A1 (en) * 2016-11-16 2019-09-19 Gopro, Inc. Video encoding quality through the use of oncamera sensor information

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105847825A (zh) * 2015-01-16 2016-08-10 杭州海康威视数字技术股份有限公司 视频编码码流的编码、索引存储和访问方法及相应装置
CN106791875B (zh) * 2016-11-30 2020-03-31 华为技术有限公司 视频数据解码方法、编码方法以及相关设备

Also Published As

Publication number Publication date
CN113261283A (zh) 2021-08-13
CN113261283B (zh) 2024-07-05

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 19956066; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: PCT application non-entry in European phase (Ref document number: 19956066; Country of ref document: EP; Kind code of ref document: A1)