WO2021114305A1

WO2021114305A1 - Video processing method and apparatus, and computer readable storage medium

Info

Publication number: WO2021114305A1
Application number: PCT/CN2019/125411
Authority: WO
Inventors: 杨胜凯; 刘俊; 杨海涛; 陈绍林
Original assignee: 华为技术有限公司
Priority date: 2019-12-13
Filing date: 2019-12-13
Publication date: 2021-06-17
Also published as: CN113261283A

Abstract

A video processing technique, applicable to scenarios such as GOP playback or download. The solution comprises: additionally inserting one or more I frames in a GOP, the newly inserted I frame being closer to a VI frame than an original I frame in the GOP. In this case, when it is needed to play or download content in the GOP, the VI frame in the GOP is found according to a requested time, the newly inserted I frame is taken as a reference frame of the VI frame for video decoding, and there is no need to take the original I frame in the GOP as a reference value for video decoding. Therefore, the video processing efficiency and the playback time accuracy are improved.

Description

Video processing method, device and computer readable storage medium

Technical field

The present invention relates to the field of Internet technology, in particular to video processing methods, devices and computer storage media.

Background

With the rapid development of computer and network communication technology, people have a growing demand for multimedia information. In recent years, video-related applications have covered various fields, such as video conferencing, video surveillance, and mobile TV. In these fields, in order to save network transmission resources, video is usually compressed before it is transmitted. In the existing video transmission process, a group of pictures (GOP) structure is often used. A GOP is a group of continuous pictures (i.e., frame pictures, referred to as frames for short).

Correspondingly, after the computing device receives the video, it needs to decode and play the video. For example, when a computing device receives a playback request triggered by dragging the progress bar of a video, in response to the playback request, it acquires the GOPs that make up the video starting from the drag stop position, and decodes and plays each GOP. Specifically, if the target frame pointed to by the drag stop position is a non-I frame, the computing device needs to search for an I frame in the several frames before or after the target frame, so as to start decoding and playing from that I frame. When the distance between the I frame and the target frame is large, the video processing efficiency is reduced and the user's viewing experience is affected.

If the I frame is not included in the frames before or after the target frame, the GOP cannot be decoded and played. The computing device can discard the GOP where the target frame is located, and enter the decoding and playback of the next GOP. This will cause some important video information to be discarded and affect the user's viewing experience.

Summary of the invention

The embodiments of the present invention disclose a video processing method, a device, and a computer-readable storage medium, which can solve the problems of reduced video processing efficiency and loss of important video information in existing solutions.

In the first aspect, an embodiment of the present invention discloses a video processing method applied to a computing device. The method includes: acquiring a group of pictures (GOP) in a video, where the first frame of the GOP is a first I frame, the GOP includes M frames, and M is a positive integer; determining whether the M frames include a virtual intra-coded (VI) frame; and, when the M frames include a VI frame, inserting a second I frame before the VI frame. The second I frame is the frame referenced by the VI frame during video decoding.

By implementing the embodiment of the present invention, inserting the second I frame before the VI frame facilitates subsequent decoding and playback of the video from the second I frame. It can solve the problems of reduced video processing efficiency, loss of important video information, and waste of storage resources of computing devices in the prior art, thereby helping to improve video processing efficiency.
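The insertion step can be sketched as follows. This is a minimal illustration only: the `Frame` type, its `kind` and `display` fields, and the function name are hypothetical and do not appear in the source, and a GOP is modeled as a plain list of frames. The inserted frame is marked as not displayed, following the embodiment in which the second I frame is used only for decoding the VI frame.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Frame:
    kind: str             # "I", "P", "B", or "VI" (hypothetical labels)
    display: bool = True  # False for a frame used only as a decoding reference


def insert_second_i_frames(gop: List[Frame]) -> List[Frame]:
    """Return a new GOP with a second I frame placed directly before each VI frame."""
    if not gop or gop[0].kind != "I":
        raise ValueError("the first frame of the GOP is expected to be an I frame")
    result: List[Frame] = []
    for frame in gop:
        if frame.kind == "VI":
            # The second I frame is the reference for the VI frame and is not output.
            result.append(Frame(kind="I", display=False))
        result.append(frame)
    return result


gop = [Frame("I"), Frame("P"), Frame("VI"), Frame("P"), Frame("VI")]
new_gop = insert_second_i_frames(gop)
print([f.kind for f in new_gop])
```

In this sketch every VI frame receives a second I frame; an implementation could equally restrict the insertion to a single selected VI frame.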

With reference to the first aspect, in some possible embodiments, in response to a video playback request, the computing device determines that the start time of the video carried in the video playback request is located after the second I frame in the GOP, and then starts decoding and playing the video from the second I frame.

By implementing this step, after the second I frame is inserted before the VI frame, the video can be decoded and played from the second I frame in the video playback scenario. Compared with the prior art, in which decoding starts from the first I frame of the GOP, this can save video decoding time and improve video processing efficiency.

With reference to the first aspect, in some possible embodiments, the second I frame is the previous frame of the VI frame.

With reference to the first aspect, in some possible embodiments, the GOP further includes index information of the GOP, and the index information records the storage address of the second I frame. Before inserting the second I frame before the VI frame, the computing device can obtain the second I frame from the storage address of the second I frame according to the index information of the GOP.

With reference to the first aspect, in some possible embodiments, after the second I frame is inserted before the VI frame, the index information of the VI frame is inserted in the VI frame after the second I frame. The index information of the VI frame is used to point to the second I frame. The computing device may obtain the second I frame pointed to by the index information according to the index information of the VI frame.

By implementing this step, the computing device can obtain the second I frame to be inserted according to the index information of the GOP or the index information of the VI frame, which makes it convenient to insert the second I frame before the VI frame and allows the video to be decoded faster.

With reference to the first aspect, in some possible embodiments, the second I frame is only used for decoding the VI frame, and is not used for output display.

With reference to the first aspect, in some possible embodiments, the GOP includes at least one network abstraction layer unit (NALU), and the computing device determines whether the M frames include a VI frame by identifying whether the GOP includes a supplemental enhancement information (SEI) NALU. The SEI NALU is used to indicate that the frame in which the i-th NALU before the SEI NALU is located is a VI frame, or to indicate that the frame in which the j-th NALU after the SEI NALU is located is a VI frame.

By implementing this step, the computing device can recognize the VI frames included in the GOP by recognizing the SEI NALU in the GOP, which can improve the convenience and efficiency of VI frame recognition.

With reference to the first aspect, in some possible embodiments, the GOP includes reference picture set (RPS) information of each frame. The computing device determines whether the M frames include a VI frame by identifying the RPS information of each frame in the GOP. When the RPS information of a frame indicates that an I frame is referenced when decoding the frame, and the previous frame of the frame is a non-I frame, the frame is a VI frame.

By implementing this step, the computing device directly recognizes the reference frame RPS information in the frame to determine whether the frame is a VI frame. This can improve the accuracy of VI frame recognition.
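The RPS-based rule can be illustrated with a short sketch. The `Frame` type and the `ref_index` field are hypothetical stand-ins for parsed RPS information; real RPS syntax is not parsed here, and each non-I frame is assumed to name a single reference frame.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Frame:
    kind: str                 # "I" or "non-I"
    ref_index: Optional[int]  # index in the GOP of the referenced frame; None for I frames


def find_vi_frames(gop: List[Frame]) -> List[int]:
    """Return the indices of frames that satisfy the VI rule described above."""
    vi_indices = []
    for idx, frame in enumerate(gop):
        if idx == 0 or frame.kind == "I" or frame.ref_index is None:
            continue
        refers_to_i_frame = gop[frame.ref_index].kind == "I"
        previous_is_non_i = gop[idx - 1].kind != "I"
        if refers_to_i_frame and previous_is_non_i:
            vi_indices.append(idx)
    return vi_indices


gop = [
    Frame("I", None),   # 0: first I frame
    Frame("non-I", 0),  # 1: refers to the I frame, but its previous frame is the I frame
    Frame("non-I", 1),  # 2: ordinary P frame
    Frame("non-I", 0),  # 3: refers to the I frame and follows a non-I frame
    Frame("non-I", 3),  # 4: ordinary P frame
]
print(find_vi_frames(gop))
```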

With reference to the first aspect, in some possible embodiments, the computing device receives a video processing request, the video processing request carries the start time of the video, and the video includes at least one GOP. In response to the video processing request, the GOP corresponding to the start time is obtained from the GOP index table. Wherein, at least one mapping relationship is recorded in the GOP index table, and the mapping relationship is that each GOP corresponds to the index information of the GOP, and the index information of the GOP includes the start time of the GOP.

With reference to the first aspect, in some possible embodiments, the video processing request includes a video playback request or a video download request. When the video processing request is a video playback request, the computing device may respond to the video playback request and obtain the GOP where the start time is located from the GOP index table. Conversely, when the video processing request is a video download request, in response to the video download request, at least one GOP starting from the GOP at the start time is obtained from the GOP index table.

By implementing this step, the computing device can obtain the corresponding GOP in the video according to the application scenario and then process the GOP. This helps to obtain the corresponding GOP for video processing according to the actual needs of the device.
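A possible shape for the GOP index table lookup is sketched below. The table layout (a list of start time and GOP identifier pairs sorted by start time), the request type strings, and the choice to return every GOP from the start position for a download request are assumptions made for illustration; the source only states that each GOP is mapped to index information that includes its start time.

```python
from bisect import bisect_right
from typing import List, Tuple

# Hypothetical index table: (GOP start time in seconds, GOP identifier), sorted by start time.
IndexTable = List[Tuple[float, str]]


def locate_gop(table: IndexTable, start_time: float) -> int:
    """Return the position of the GOP whose time range contains start_time."""
    starts = [start for start, _ in table]
    pos = bisect_right(starts, start_time) - 1
    if pos < 0:
        raise ValueError("start time precedes the first GOP")
    return pos


def gops_for_request(table: IndexTable, start_time: float, request_type: str) -> List[str]:
    """Playback returns the GOP at start_time; download returns that GOP and the following ones."""
    pos = locate_gop(table, start_time)
    if request_type == "playback":
        return [table[pos][1]]
    if request_type == "download":
        return [gop_id for _, gop_id in table[pos:]]
    raise ValueError(f"unknown request type: {request_type}")


table = [(0.0, "gop0"), (20.0, "gop1"), (40.0, "gop2")]
print(gops_for_request(table, 25.0, "playback"))
print(gops_for_request(table, 25.0, "download"))
```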

With reference to the first aspect, in some possible embodiments, the index information of the GOP also includes the playing time of each frame. When the video processing request is a video playback request, the VI frame is the VI frame in the GOP whose playing time differs least from the start time.

By implementing this step, in the video playback scenario, the computing device can find the VI frame closest to the playback time and insert the second I frame before it. This avoids performing I frame insertion for every VI frame in the GOP, saves device resources, and improves video processing efficiency.
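The selection of the VI frame can be sketched as a minimum over absolute time differences. The function name and the representation of VI frames by their playing times in seconds are assumptions for illustration; if two VI frames are equally close, this sketch returns the earlier one in the list.

```python
from typing import List, Optional


def closest_vi_frame(vi_play_times: List[float], start_time: float) -> Optional[float]:
    """Return the playing time of the VI frame nearest to start_time, or None if there is none."""
    if not vi_play_times:
        return None
    return min(vi_play_times, key=lambda t: abs(t - start_time))


print(closest_vi_frame([3.0, 8.0, 14.0], 9.5))
```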

In the second aspect, an embodiment of the present invention provides a video processing device, which includes a functional module or unit for executing the method described in the first aspect or any possible implementation manner of the first aspect.

In a third aspect, an embodiment of the present invention provides a computing device, including: a processor, a memory, a communication interface, and a bus. The processor, the communication interface, and the memory communicate with each other through the bus; the communication interface is used to receive and send data; the memory is used to store instructions; and the processor is used to call the instructions in the memory to execute the method described in the first aspect or any possible implementation of the first aspect.

In a fourth aspect, a computer-readable storage medium is provided, and the computer-readable storage medium stores instructions for executing the method described in the first aspect.

In a fifth aspect, a computer program product is provided, which when it runs on a computer, enables the computer to execute the instructions of the method described in the first aspect.

In a sixth aspect, a chip product is provided to implement the foregoing first aspect or the method in any possible implementation manner of the first aspect.

On the basis of the implementation manners provided by the above aspects, the present invention can be further combined to provide more implementation manners.

Description of the drawings

In order to explain the embodiments of the present invention or the technical solutions in the prior art more clearly, the following will briefly introduce the drawings that need to be used in the description of the embodiments or the prior art.

Fig. 1 is a schematic structural diagram of a GOP provided by an embodiment of the present invention.

Figure 2 is a schematic structural diagram of a NALU provided by an embodiment of the present invention.

FIG. 3 is a schematic structural diagram of a SEI NALU provided by an embodiment of the present invention.

Fig. 4 is a schematic structural diagram of a video processing system provided by an embodiment of the present invention.

Fig. 5 is a schematic structural diagram of a video encoding unit provided by an embodiment of the present invention.

Fig. 6 is a schematic structural diagram of another GOP provided by an embodiment of the present invention.

FIG. 7 is a schematic structural diagram of a SEI NALU inserted into a GOP according to an embodiment of the present invention.

FIG. 8 is a schematic structural diagram of another SEI NALU inserted into a GOP according to an embodiment of the present invention.

FIG. 9 is a schematic structural diagram of a video reading and writing unit provided by an embodiment of the present invention.

Fig. 10 is a schematic structural diagram of another GOP provided by an embodiment of the present invention.

FIG. 11 is a schematic diagram of a user dragging a video playback progress bar according to an embodiment of the present invention.

Figure 12 is a schematic structural diagram of another GOP provided by an embodiment of the present invention.

FIG. 13A is a schematic diagram of storing GOPs in a time index manner according to an embodiment of the present invention.

FIG. 13B is a schematic diagram of storing GOPs in a frame number index mode provided by an embodiment of the present invention.

FIG. 14 is a schematic flowchart of a video processing method provided by an embodiment of the present invention.

FIG. 15 is a schematic diagram of a GOP that composes a video according to an embodiment of the present invention.

FIG. 16 is a schematic diagram of an operation for a user to download a video offline according to an embodiment of the present invention.

Figure 17 is a schematic structural diagram of a new GOP provided by an embodiment of the present invention.

FIG. 18 is a schematic structural diagram of a video processing device provided by an embodiment of the present invention.

FIG. 19 is a schematic structural diagram of a computing device provided by an embodiment of the present invention.

Detailed description

The technical solutions in the embodiments of the present invention will be described in detail below in conjunction with the accompanying drawings of the present invention.

First, some technical terms and technical concepts applicable to the present invention are introduced.

GOP, also known as group of pictures, refers to a group of continuous images (also called frames), specifically the group of images between two I frames. The GOP size indicates the distance between two I frames.

I frame, also called intra-frame coded frame, is an independent frame with all its own information, and can be decoded independently without referring to other frames. The first frame in the video is usually an I frame.

Non-I frames refer to frames other than I frames, specifically including B frames or P frames.

B frame, also called bidirectional predictive coding frame. The B frame records the difference between the current frame and the previous and next frames. That is to say, when decoding a B frame, it is necessary to refer to the previous frame and the next frame of the B frame to decode. The previous frame of the B frame refers to the frame before the B frame and adjacent to the B frame; the frame after the B frame refers to the frame after the B frame and adjacent to the B frame.

P frame, also known as inter-frame predictive coding frame. The P frame records the difference between the current frame and the previous frame. That is to say, when decoding a P frame, you need to refer to the previous frame of the P frame (specifically, it may be a P frame or an I frame) to decode.

VI frame (virtual independent frame, VI), also called a virtual I frame. The VI frame is essentially a P frame, but a VI frame is decoded with reference to the I frame in front of the VI frame. Please refer to Fig. 1 for a schematic structural diagram of a GOP. As shown in Figure 1, the GOP includes 3 VI frames. When decoding each VI frame, only the I frame that appears before the VI frame in the GOP is referred to, as shown by the arrows in the figure.

The network abstraction layer unit (NALU) is the basic unit of video compression. In video coding, each frame is composed of at least one NALU. Refer to FIG. 2 for a schematic structural diagram of a NALU provided by an embodiment of the present invention. As shown in Figure 2, a NALU includes a NAL Header and a NAL Body. In the H.264 video coding standard, the length of a NAL Header is fixed at 1 byte, that is, 8 bits. The NAL Header includes three fields: forbidden_zero_bit, the importance indication field nal_ref_idc, and the type field nal_unit_type, which are described as follows.

The forbidden_zero_bit occupies 1 bit, and the forbidden_zero_bit field must be 0 in the video coding standard (such as H.264). If the network finds an error in the NALU, the forbidden_zero_bit can be set to 1, which is convenient for the receiver to correct the error or discard the NALU.

nal_ref_idc occupies 2 bits and is used to indicate the importance of NALU. The value range of nal_ref_idc is 00 to 11. When the value of nal_ref_idc is larger, it means that the current NALU is more important and needs to be protected first.

nal_unit_type occupies 5 bits and is used to indicate the type of NALU.
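The three fields of the 1-byte H.264 NAL Header described above can be extracted with shifts and masks. The following is a minimal sketch; the `NalHeader` type and the function name are illustrative and not part of the source.

```python
from typing import NamedTuple


class NalHeader(NamedTuple):
    forbidden_zero_bit: int  # 1 bit, most significant
    nal_ref_idc: int         # 2 bits
    nal_unit_type: int       # 5 bits, least significant


def parse_nal_header(first_byte: int) -> NalHeader:
    """Split the 1-byte H.264 NAL Header into its three fields."""
    if not 0 <= first_byte <= 0xFF:
        raise ValueError("the NAL Header is a single byte")
    return NalHeader(
        forbidden_zero_bit=(first_byte >> 7) & 0x01,
        nal_ref_idc=(first_byte >> 5) & 0x03,
        nal_unit_type=first_byte & 0x1F,
    )


print(parse_nal_header(0x67))
```

For the header byte 0x67 (binary 0110 0111), this yields forbidden_zero_bit 0, nal_ref_idc 3, and nal_unit_type 7.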

The NAL Body encapsulates the payload data (video data). In practical applications, there are three layers of encapsulation in the video code stream obtained by video encoding. The first layer is the encapsulated byte sequence payload (EBSP), which specifically includes the emulation_prevention_three_byte field. The purpose of this field is to prevent byte patterns in the NAL Body from conflicting with the NALU start code (0x000001 or 0x00000001). The second layer is the raw byte sequence payload (RBSP), which is equivalent to the NAL Body with the emulation_prevention_three_byte removed, and is the data generated after further processing of the original syntax element code stream (encoded data). The basic structure of the RBSP is the original encoded data followed by trailing bits for byte alignment. The third layer is the string of data bits (SODB), which is the actual original binary code stream obtained by encoding the syntax elements of the H.264 coding standard.
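The relation between the EBSP and the RBSP can be sketched as follows: a 0x03 byte that follows two consecutive 0x00 bytes is an emulation_prevention_three_byte and is dropped. The function name is illustrative, and the sketch assumes a conforming input in which such a 0x03 byte is always an emulation prevention byte.

```python
def ebsp_to_rbsp(ebsp: bytes) -> bytes:
    """Remove emulation_prevention_three_byte (0x03 after two 0x00 bytes) from an EBSP."""
    rbsp = bytearray()
    zero_run = 0  # number of consecutive 0x00 bytes just copied to the output
    for byte in ebsp:
        if zero_run >= 2 and byte == 0x03:
            zero_run = 0  # drop the emulation prevention byte and restart the count
            continue
        rbsp.append(byte)
        zero_run = zero_run + 1 if byte == 0x00 else 0
    return bytes(rbsp)


print(ebsp_to_rbsp(bytes([0x25, 0x00, 0x00, 0x03, 0x01])).hex())
```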

Optionally, in the H.264 video coding standard, NALU may also include only NAL header (NAL Header) and RBSP. That is, the main body of the NAL is RBSP. For details about the NAL header and RBSP, please refer to the above description, which will not be repeated here.

The supplemental enhancement information network abstraction layer unit (supplemental enhancement information NALU, SEI NALU) refers to a NALU whose type field nal_unit_type indicates the SEI type. Refer to FIG. 3, which is a schematic structural diagram of an SEI NALU provided by an embodiment of the present invention. As shown in Figure 3, the SEI NALU includes a NAL header (NAL Header) and a NAL body (NAL Body). For details of the NAL Header, refer to the description of the embodiment shown in FIG. 2. The nal_unit_type in the NAL Header occupies 5 bits and is used to indicate the type of NALU. In practical applications, different NALU types are indicated by setting the value of the nal_unit_type field. For example, when the NAL Header byte is 0x06, nal_unit_type is 6, indicating the SEI type; when the NAL Header byte is 0x67, nal_unit_type is 7, indicating the sequence parameter set (SPS) type; when the NAL Header byte is 0x68, nal_unit_type is 8, indicating the picture parameter set (PPS) type; and so on. In the present invention, the NAL Header byte of the SEI NALU is 0x06, indicating that the type of the NALU is SEI.

The NAL body includes the SEI payload type, the SEI payload size, the SEI universally unique identifier (SEI UUID), and custom fields. The SEI payload type field occupies 1 byte, that is, 8 bits, and is used to indicate the type of payload data carried in the SEI NALU, such as video data, SPS data, or PPS data. The SEI payload size field is used to indicate the size of the payload data, and is referred to as payload size for short. The SEI UUID field occupies 16 bytes and is used to indicate the unique identifier of the payload data. The number of bytes occupied by the custom fields can be set by the system and is used to carry system-defined data, which is not limited by the present invention.
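The body layout described above can be read with a few slices. The sketch below makes assumptions that the source does not state: the payload size field is taken to be 1 byte, it is taken to count the UUID plus the custom bytes, and the fields are taken to appear in the order type, size, UUID, custom. It follows the simplified layout of FIG. 3 and does not implement the variable-length payload type and size coding of the H.264 specification.

```python
import uuid
from typing import NamedTuple


class SeiBody(NamedTuple):
    payload_type: int
    payload_size: int
    sei_uuid: uuid.UUID
    custom: bytes


def parse_sei_body(body: bytes) -> SeiBody:
    """Parse a NAL Body laid out as type (1 byte), size (1 byte, assumed), UUID (16 bytes), custom."""
    if len(body) < 18:
        raise ValueError("body too short for type, size and UUID")
    payload_type = body[0]
    payload_size = body[1]  # assumed to count the UUID and the custom bytes
    if payload_size < 16 or len(body) < 2 + payload_size:
        raise ValueError("payload size is inconsistent with the body length")
    sei_uuid = uuid.UUID(bytes=body[2:18])
    custom = body[18:2 + payload_size]
    return SeiBody(payload_type, payload_size, sei_uuid, custom)


example_uuid = uuid.UUID(int=0x1234)
example_body = bytes([5, 18]) + example_uuid.bytes + b"\x01\x02"
parsed = parse_sei_body(example_body)
print(parsed.payload_type, parsed.sei_uuid, parsed.custom.hex())
```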

With the advancement of network communication technology and the increase in network bandwidth, network video has been developed and applied more and more widely. At present, in order to save network transmission resources, videos are usually compressed before they are transmitted. A video is composed of several time-continuous frames, and the video can be divided into several GOPs during encoding. For example, when a computing device receives a playback request triggered by dragging the progress bar of a video, if the target frame pointed to by the drag stop position is a non-I frame, it needs to find the I frame closest to the target frame among the frames before or after the target frame, and read the GOP from that I frame to decode and play. When the GOP is large and the distance between the target frame and the I frame is large, the decoding time is prolonged, which reduces video processing efficiency and affects the user's viewing experience. Conversely, if the I frame is missing from the several frames before or after the target frame, the target frame cannot be decoded, part of the video is discarded, and decoding and playback can only resume at the position of the next I frame. This leads to the discarding of some important video information, affects the accuracy of video information acquisition, and affects the user's viewing experience.

For another example, when a computing device receives a video reverse playback request, it inputs the complete GOP that constitutes the video into the decoder for decoding, stores the decoding result (decoded video) in the buffer, and then plays it in reverse order. For example, for a 5-minute short video in the reverse playback scenario, the computing device needs to play backwards from the end of the short video (that is, the 5th minute) to its beginning. In practice, it is found that if the GOP is large, the buffer space occupied by the decoded GOP is large. For example, if a 4K video is transmitted at a frame rate of 25 fps (frames per second) and the size of a single GOP is 20 s (seconds), the computing device needs 5.8 GB of storage space to cache the decoded video. This wastes storage resources of the computing device.
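The buffer figure can be checked with simple arithmetic. The source does not state the resolution, pixel format, or unit convention; the sketch below assumes a 3840x2160 picture, 8-bit YUV 4:2:0 storage (1.5 bytes per pixel), and a gigabyte of 2^30 bytes, which is one set of assumptions that gives a value of about 5.8.

```python
def decoded_gop_bytes(width: int, height: int, fps: int, gop_seconds: int,
                      bytes_per_pixel: float = 1.5) -> int:
    """Bytes needed to hold every decoded frame of one GOP in memory."""
    frames = fps * gop_seconds
    return int(width * height * bytes_per_pixel * frames)


# Assumed parameters: 3840x2160, 25 fps, 20 s GOP, 8-bit YUV 4:2:0.
size = decoded_gop_bytes(3840, 2160, 25, 20)
print(f"{size} bytes = {size / 2**30:.1f} GiB")
```

Under these assumptions, one GOP holds 500 frames of 12,441,600 bytes each, or 6,220,800,000 bytes in total.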

In order to solve the problems of reduced video processing efficiency, loss of some important video information, or wasted storage resources of computing devices in a large-GOP reverse playback scenario, the present invention proposes a video processing method, together with the systems and related products to which the method is applicable. Refer to Fig. 4, which is a schematic structural diagram of a video processing system provided by an embodiment of the present invention. The video processing system 100 shown in FIG. 4 includes a video encoding unit 102, a video reading and writing unit 104, a video decoding unit 106, and a storage unit 108, which are described as follows.

The video encoding unit 102 is responsible for encoding the input original video into a video code stream, and specifically can convert the original video file from one format into a file of another video format. For example, the video encoding unit 102 may use encoding standards such as H.261, H.263, H.264, H.265, or H.266 to encode the original video into a video stream. Common video formats include but are not limited to audio video interleaved (AVI), digital video-audio video interleaved (DV-AVI), the moving picture experts group format (MPEG), advanced streaming format (ASF), windows media video (WMV), real media (RM), or other supported video formats.

In video encoding, the video encoding unit 102 may divide the video into several GOPs for encoding. In other words, a video (that is, a video code stream) may include one or more GOPs. In the following, the present invention takes a video (or a video code stream) including one GOP as an example to describe related content.

In practical applications, the video encoding unit 102 may specifically be an encoder or another device that supports image or video encoding. For example, the video encoding unit 102 may be deployed in a camera device, such as a camera; it may also be deployed as a separate encoder.

The storage unit 108 is used for storing video, for example, storing a video code stream obtained after encoding by the video encoding unit 102 and the like.

The video reading and writing unit 104 is responsible for writing the video code stream into the storage unit 108, or for reading the video code stream from the storage unit 108 (specifically, reading the GOP in the video code stream) and then inputting it to the video decoding unit 106 for decoding.

The video decoding unit 106 is responsible for decoding the input video code stream and outputting the decoded video code stream. Specifically, the GOP contained in the video bitstream is decoded, and each frame contained in the GOP is output.

In actual applications, the video reading and writing unit 104 may specifically be an input/output (IO) device that supports data reading and writing functions, such as an IO interface. The video decoding unit 106 may specifically be a device or apparatus that supports a video decoding function, such as a decoder. The video decoding unit 106 may be deployed in a video processing device of a computing device, or may be deployed as a separate decoder, which is not limited in the present invention. The storage unit 108 may specifically be a device supporting a data storage function, which may include, but is not limited to, random access memory (RAM), flash memory, read only memory (ROM), a hard disk, registers, and the like.

The video processing technology provided by the embodiments of the present invention is applicable to scenarios such as GOP playback or download. The solution includes inserting one or more I frames into the GOP; compared with the original I frame in the GOP, a newly inserted I frame is closer to the VI frame. In this case, when content in a specified GOP needs to be played or downloaded, the GOP and the VI frame corresponding to the requested time are found according to the requested time, and the newly inserted I frame is used as the reference frame of the VI frame for video decoding (without needing to use the original I frame in the GOP as the reference frame for video decoding). This improves the efficiency of video processing and the accuracy of playback time. The more frames the GOP contains, the more prominent the beneficial effects of the embodiments of the present invention.

Refer to FIG. 5, which is a schematic structural diagram of a video encoding unit 102 provided by an embodiment of the present invention. As shown in FIG. 5, the video encoding unit 102 includes a VI detector 1021. Optionally, in the video coding of the video coding unit 102, the system framework may be further divided into two layers: a video coding layer (VCL) and a network abstraction layer (NAL). Specifically, as shown in FIG. 5, the video coding unit 102 may also include a video coding layer VCL 1022 and a network abstraction layer NAL 1023.

The video encoding unit 102 encodes the input original video through the video encoding layer VCL 1022 to obtain a video encoded bit stream, which is referred to as a video stream for short, and specifically also refers to the GOP in the video stream. Then, the VI frame identification is performed on the GOP obtained by the video coding layer VCL 1022 through the VI detector 1021. The specific implementation of the VI frame identification is not limited. For example, the VI frame identification can be performed according to the definition of the VI frame, and the VI frame identification can also be performed based on the received out-of-band information. The out-of-band information is used to indicate that the frame corresponding to the preset time stamp in the GOP is a VI frame, for example, the out-of-band information is used to indicate that the frame corresponding to the 3s in the GOP is a VI frame, and so on.

If a VI frame is recognized, the network abstraction layer NAL 1023 is notified to mark the VI frame to indicate the position of the VI frame in the GOP. The specific implementation of the VI frame marking is not limited. For example, the supplemental enhancement information (SEI) marking method, another marking method that conforms to the video coding standard and marks the specific position of the VI frame in the GOP, or an out-of-band method that notifies the position of the VI frame in the GOP may be used.

At the same time, the GOP encoded by the video encoding unit 102 can also be sent to the network abstraction layer NAL 1023 for encapsulation, so that the GOP is encapsulated into network abstraction layer units (NALUs). In other words, the GOP is composed of multiple NALUs. Please refer to Fig. 6 for a schematic diagram of a GOP. As shown in Figure 6, the GOP is composed of a series of NALUs. Generally, the GOP data begins with the picture parameter set (PPS) and the sequence parameter set (SPS), followed by the I frame and other frames. As shown in the figure, the GOP includes at least one frame, and each frame includes one or more NALUs. The PPS includes information of all slices of an image (i.e., a frame), and the SPS includes all information of an image sequence (i.e., the frames in the GOP).

For example, take the case in which the SEI marking method is used to indicate the position of the VI frame in the GOP. After the VI detector 1021 recognizes the VI frame in the GOP, it can notify the network abstraction layer NAL 1023 to generate a custom supplemental enhancement information network abstraction layer unit (SEI NAL unit, SEI NALU). The SEI NALU is inserted before or after the VI frame to indicate that the frame where the i-th NALU before the SEI NALU is located is a VI frame, or to indicate that the frame where the j-th NALU after the SEI NALU is located is a VI frame. For example, it can specifically be used to indicate that the previous frame or the next frame of the SEI NALU is a VI frame. Please refer to FIG. 7, which shows a schematic diagram of inserting an SEI NALU into a GOP. FIG. 7 shows the original structure of the GOP and the structures of the new GOPs obtained after the SEI NALU is inserted before and after the VI frame in the GOP, respectively. As shown in Figure 7, the GOP obtained by the video coding layer VCL 1022 includes P frames and VI frames. After detecting the VI frame in the GOP through the VI detector 1021, the video encoding unit 102 notifies the network abstraction layer NAL 1023 to add the SEI NALU before the VI frame, or notifies the network abstraction layer NAL 1023 to add the SEI NALU after the VI frame. The specific position at which the SEI NALU is added before or after the VI frame is not limited. For example, the SEI NALU is added before the VI frame as the j-th NALU before the first NALU included in the VI frame, to indicate that the frame where the j-th NALU after the SEI NALU is located is the VI frame; or the SEI NALU is added after the VI frame as the i-th NALU after the last NALU included in the VI frame, to indicate that the frame where the i-th NALU before the SEI NALU is located is the VI frame.

Please refer to FIG. 8 for a schematic diagram of inserting an SEI NALU into a GOP. As shown in Figure 8, each frame (including the VI frame) in the GOP is composed of one or more NALUs. The VI frame in the figure includes 3 NALUs, namely NALU1 to NALU3. Correspondingly, when the network abstraction layer NAL 1023 uses the SEI marking method to mark VI frames, the SEI NALU can be added before the first NALU (NALU1 shown in the figure) contained in the VI frame, that is, as the first NALU before NALU1; or the SEI NALU can be added after the last NALU (NALU3 shown in the figure) contained in the VI frame, that is, as the first NALU after NALU3. In this example, the SEI NALU is specifically used to indicate that the frame in which the previous NALU or the next NALU of the SEI NALU is located is a VI frame.

In practical applications, the network abstraction layer NAL 1023 can set the value of a relevant field in the SEI NALU to indicate that the frame where the i-th NALU before the SEI NALU is located is a VI frame, or that the frame where the j-th NALU after the SEI NALU is located is a VI frame. For example, the network abstraction layer NAL 1023 can indicate the position of the VI frame in the GOP by setting the value of the Type field in the SEI NALU (specifically, the SEI payload type field) or the value of the SEI UUID field. Alternatively, the network abstraction layer NAL 1023 can add a field to the custom fields of the SEI NALU, and set the value of the added field to indicate the position of the VI frame in the GOP. Taking the SEI payload type field as an example, if the network abstraction layer NAL 1023 sets the SEI payload type to +1, the frame where the previous NALU of the SEI NALU is located is a VI frame. Conversely, if the network abstraction layer NAL 1023 sets the SEI payload type to -1, the frame where the next NALU of the SEI NALU is located is a VI frame.
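
The marking step described above can be illustrated with a minimal sketch. It assumes a simplified in-memory model in which each NALU is a small record; the names `Nalu` and `mark_vi_frame` and the labels "slice" and "sei" are hypothetical and are not part of the embodiment. Only the +1 and -1 payload type values come from the example above.

```python
from dataclasses import dataclass
from typing import List, Optional

# Payload-type values taken from the example in the text above.
SEI_PREV_IS_VI = +1  # the frame of the previous NALU is the VI frame
SEI_NEXT_IS_VI = -1  # the frame of the next NALU is the VI frame


@dataclass
class Nalu:
    kind: str                           # "slice" or "sei" (hypothetical labels)
    frame: Optional[int] = None         # frame number, set for slice NALUs
    payload_type: Optional[int] = None  # set for SEI NALUs only


def mark_vi_frame(nalus: List[Nalu], vi_frame: int, before: bool = True) -> List[Nalu]:
    """Return a copy of the NALU list with one SEI NALU next to the VI frame."""
    positions = [i for i, n in enumerate(nalus)
                 if n.kind == "slice" and n.frame == vi_frame]
    if not positions:
        raise ValueError("VI frame not found in the NALU list")
    marked = list(nalus)
    if before:
        # SEI NALU directly before the first NALU of the VI frame.
        marked.insert(positions[0], Nalu("sei", payload_type=SEI_NEXT_IS_VI))
    else:
        # SEI NALU directly after the last NALU of the VI frame.
        marked.insert(positions[-1] + 1, Nalu("sei", payload_type=SEI_PREV_IS_VI))
    return marked
```

With a VI frame made of three NALUs, as in FIG. 8, `mark_vi_frame(stream, vi_frame)` places the SEI NALU immediately before NALU1, and `before=False` places it immediately after NALU3.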

Please refer to FIG. 9, which is a schematic structural diagram of a video reading and writing unit 104 according to an embodiment of the present invention. As shown in FIG. 9, the video reading and writing unit 104 includes a code stream detector 1041, an index generator 1042 and a code stream modifier 1043. The video reading and writing unit 104 and the storage unit 108 communicate with each other. Specifically:

The code stream detector 1041 is used to perform frame detection (ie frame recognition) on the video code stream input to the video reading and writing unit 104 (specifically, on the GOP in the video code stream) to determine the frames contained in the GOP and the position of each frame. For example, the present invention can determine the respective positions of the I frame and the VI frame in the GOP. The form in which the position is expressed is not limited. For example, it can be expressed by the frame index, the playback time corresponding to the frame in the GOP, the storage location of the frame in the storage unit 108 (also called the storage address), or other information that indicates the position of the frame in the GOP.

Specifically, taking the detection of the VI frame in the GOP by the code stream detector 1041 as an example, the code stream detector 1041 performs VI frame mark detection on the GOP to detect the VI frame in the GOP and the position of the VI frame. Because the video encoding unit 102 can mark VI frames in the GOP in different ways, the specific manner in which the code stream detector 1041 performs VI frame mark detection also differs. Specific manners of VI frame mark detection are exemplified below.

In the first type, the code stream detector 1041 detects whether an SEI NALU is included in the GOP. If it is, the code stream detector 1041 determines that the frame where the i-th NALU before the SEI NALU is located is the VI frame, or determines that the frame where the j-th NALU after the SEI NALU is located is the VI frame. The number of SEI NALUs is not limited, and may be one or more. When there are multiple SEI NALUs, the code stream detector 1041 can detect the multiple VI frames included in the GOP and the position of each VI frame in the GOP according to the foregoing principle.
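
A minimal sketch of this first detection manner follows. It assumes the simplest case described earlier, in which a payload type of +1 points at the previous NALU and -1 points at the next NALU; the tuple representation of a NALU and the function name `find_vi_frames` are hypothetical.

```python
from typing import List, Set, Tuple

# Hypothetical NALU model: ("slice", frame_number) or ("sei", payload_type).
Nalu = Tuple[str, int]


def find_vi_frames(nalus: List[Nalu]) -> Set[int]:
    """Collect the frame numbers that the SEI NALUs in a GOP mark as VI frames."""
    vi_frames: Set[int] = set()
    for pos, (kind, value) in enumerate(nalus):
        if kind != "sei":
            continue
        if value == +1:       # previous NALU belongs to the VI frame
            neighbour = pos - 1
        elif value == -1:     # next NALU belongs to the VI frame
            neighbour = pos + 1
        else:                 # unrelated SEI NALU, ignore it
            continue
        if 0 <= neighbour < len(nalus) and nalus[neighbour][0] == "slice":
            vi_frames.add(nalus[neighbour][1])
    return vi_frames
```

Because every SEI NALU is examined independently, the same loop handles a GOP that carries several SEI NALUs and therefore several VI frames.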

In the second type, the code stream detector 1041 performs VI frame analysis on the GOP according to the out-of-band information sent from the video encoding unit 102, and determines the VI frame contained in the GOP and the position of the VI frame in the GOP. The out-of-band information is used to indicate or notify the position of the VI frame in the GOP, for example, that the fifth frame in the GOP is a VI frame, or that the frame corresponding to the third second in the GOP is a VI frame, and so on. Optionally, the code stream detector 1041 can also detect the VI frame in the GOP by parsing the GOP. For details, refer to the third implementation manner below.

In the third type, the code stream detector 1041 parses each frame included in the GOP, identifies the reference picture set (RPS) information in each frame, and determines the VI frame included in the GOP and the position of the VI frame.

It should be understood that in video encoding, one frame of image is encoded into one or more slices, and the slices of each frame are carried in NALUs for transmission. The first slice of each frame contains RPS information. The RPS information is composed of identification information, and the meaning indicated by the identification information is custom-set by the system, for example, indicating whether a frame is used as a reference for decoding the current frame or subsequent frames. The RPS information includes the reference frame information of the current frame. If the reference frame information indicates that the current frame has only one decoding reference frame, which is an I frame, and the previous frame of the current frame is a non-I frame, the current frame is a VI frame. Specifically, the RPS information indicates the picture order count (POC) of each reference frame. If the RPS information contains the POC of one reference frame, the current frame has 1 reference frame, and that reference frame is an I frame. That is, the decoding of the current frame refers only to the I frame. Further, if the code stream detector 1041 detects that the previous frame of the current frame is a non-I frame, it can determine that the current frame is a VI frame.
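
The RPS-based rule above can be sketched as follows. This is only an illustration under simplifying assumptions: each frame is modeled by its POC, its type, and the list of reference POCs taken from its RPS information; the names `Frame` and `detect_vi_frames` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Frame:
    poc: int                    # picture order count of this frame
    frame_type: str             # "I", "P" or "B"
    ref_pocs: List[int] = field(default_factory=list)  # POCs listed in the RPS


def detect_vi_frames(frames: List[Frame]) -> List[int]:
    """Return the POCs of frames that satisfy the VI rule described above."""
    type_by_poc = {f.poc: f.frame_type for f in frames}
    vi_pocs = []
    for prev, cur in zip(frames, frames[1:]):
        # Exactly one reference frame, and that reference frame is an I frame.
        only_ref_is_i = (len(cur.ref_pocs) == 1
                         and type_by_poc.get(cur.ref_pocs[0]) == "I")
        # The frame directly before the current frame is not an I frame.
        if only_ref_is_i and prev.frame_type != "I":
            vi_pocs.append(cur.poc)
    return vi_pocs
```

An ordinary P frame that directly follows the I frame also has a single I-frame reference, which is why the check on the previous frame is needed to tell it apart from a VI frame.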

When the code stream detector 1041 detects that the GOP includes a VI frame, it can send a VI frame identification signal to the index generator 1042 to notify it that the GOP includes the VI frame and to provide related information about the VI frame, such as the index of the VI frame (ie the frame number), the playback time corresponding to the VI frame in the GOP, the storage address of the VI frame in the storage unit 108, and so on. Optionally, in the same way, when the code stream detector 1041 detects that an I frame is included in the GOP, it may send an I frame identification signal to the index generator 1042 to notify it of the I frame included in the GOP and related information about the I frame, for example, the frame number of the I frame, the playback time corresponding to the I frame in the GOP, the storage address of the I frame in the storage unit 108, and so on.

The index generator 1042 is configured to receive the I frame identification signal and the VI frame identification signal sent by the code stream detector 1041. After receiving the VI frame identification signal, the index generator 1042 can determine the target I frame to be referenced when decoding the VI frame (hereinafter also referred to in this application as the second I frame), and store the target I frame in association with the VI frame. For example, the storage address of the target I frame is stored in the index information of the GOP, to instruct that the target I frame stored at that storage address be referenced when decoding the VI frame. Alternatively, the index information corresponding to the VI frame points to the target I frame, to indicate that the target I frame pointed to by the index information is referenced when the VI frame is decoded. The index information of the GOP is used to identify the GOP, and can include but is not limited to the index number of the GOP, the duration of the GOP, the start time and end time of the GOP in the corresponding video code stream, an identifier of whether the GOP contains a VI frame, the storage address of the GOP in the storage unit 108, and the storage address or offset of the VI frame in the GOP. The target I frame here may specifically refer to an I frame that appears before the VI frame in the GOP, that is, the playback time corresponding to the target I frame is earlier than the playback time corresponding to the VI frame. Alternatively, the target I frame here may refer to the I frame that is closest to the VI frame in the GOP.

For example, refer to FIG. 10 for a schematic diagram of a GOP. As shown in FIG. 10, the GOP is a video code stream of 10s. The frame at the 7th second in the figure is the VI frame. In this example, if the target I frame referenced when decoding the VI frame is the I frame that appears before the VI frame in the GOP, the target I frame is specifically the I frame at the 0th second in the figure. If the target I frame referenced when decoding the VI frame is the I frame closest to the VI frame in the GOP, the target I frame is specifically the I frame at the 9th second in the figure.
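
The two ways of choosing the target I frame can be compared with a short sketch that uses the FIG. 10 values (I frames at the 0th and 9th seconds, VI frame at the 7th second). The function names are hypothetical, and picking the latest of several preceding I frames is an assumption, since the text does not say which preceding I frame is used when there is more than one.

```python
from typing import List


def preceding_i_frame(i_times: List[float], vi_time: float) -> float:
    """Playback time of an I frame that appears before the VI frame.

    Assumption: when several I frames precede the VI frame, take the latest.
    """
    earlier = [t for t in i_times if t < vi_time]
    if not earlier:
        raise ValueError("no I frame precedes the VI frame")
    return max(earlier)


def closest_i_frame(i_times: List[float], vi_time: float) -> float:
    """Playback time of the I frame nearest to the VI frame, in either direction."""
    if not i_times:
        raise ValueError("no I frame available")
    return min(i_times, key=lambda t: abs(t - vi_time))
```

For the FIG. 10 values, `preceding_i_frame([0, 9], 7)` selects the I frame at the 0th second, while `closest_i_frame([0, 9], 7)` selects the I frame at the 9th second.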

Optionally, if the video contains multiple GOPs, each GOP has its own GOP index information, and the index generator 1042 can store each GOP and the index information of that GOP in the storage unit 108 in the form of a GOP index table. At least one mapping relationship is stored in the GOP index table, and each mapping relationship associates one GOP with the index information of that GOP. For the specific index information of the GOP, please refer to the above description, which will not be repeated here.
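
One possible in-memory shape for such a GOP index table is sketched below. The field names, the use of a dictionary keyed by GOP index number, and the helper `register` are all assumptions made for illustration; the embodiment does not prescribe a concrete layout.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class GopIndexInfo:
    gop_number: int                          # index number of the GOP
    start_time: float                        # start time in the video, seconds
    end_time: float                          # end time in the video, seconds
    storage_address: int                     # address of the GOP in storage
    vi_frame_offset: Optional[int] = None    # None when the GOP has no VI frame
    target_i_address: Optional[int] = None   # address of the target I frame

    @property
    def duration(self) -> float:
        return self.end_time - self.start_time

    @property
    def has_vi_frame(self) -> bool:
        return self.vi_frame_offset is not None


# The table maps each GOP to exactly one index-information record.
gop_index_table: Dict[int, GopIndexInfo] = {}


def register(info: GopIndexInfo) -> None:
    """Store or overwrite the index information of one GOP."""
    gop_index_table[info.gop_number] = info
```

A reader of the table can then check `gop_index_table[n].has_vi_frame` and, if it is true, fetch the target I frame from `target_i_address`.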

The code stream modifier 1043 is used to modify the GOP input to the video reading and writing unit 104 to obtain a new, modified GOP. Specifically, the code stream modifier 1043 reads the VI frame contained in the GOP and the target I frame to be referenced when decoding the VI frame, and then inserts the target I frame before the VI frame to obtain at least two new GOPs (also referred to as multiple new GOPs). The specific position at which the target I frame is inserted before the VI frame is not limited; for example, it is inserted as the m-th frame before the VI frame, where m is a positive integer.
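
The insertion "as the m-th frame before the VI frame" can be illustrated with a small sketch. It assumes the GOP is a simple list of frame labels and reads "m-th frame before" as: after insertion, the new frame sits m positions ahead of the VI frame. The function name and the label "I2" for the inserted target I frame are hypothetical.

```python
from typing import List


def insert_before_vi(frames: List[str], vi_index: int, m: int = 1,
                     new_frame: str = "I2") -> List[str]:
    """Return a copy of the GOP with new_frame placed m frames before the VI frame."""
    if m < 1:
        raise ValueError("m must be a positive integer")
    pos = vi_index - (m - 1)   # list position at which the new frame is inserted
    if pos < 0:
        raise ValueError("m reaches past the start of the GOP")
    return frames[:pos] + [new_frame] + frames[pos:]
```

With `m = 1` the target I frame becomes the frame immediately before the VI frame; larger values of m move it further ahead while keeping it before the VI frame.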

It should be noted that, to describe the embodiments of the present invention more intuitively, the embodiments can be understood as follows: by inserting a new I frame into the original GOP, one GOP is divided into multiple new GOPs, where each new GOP has an I frame. However, the newly inserted I frame in the embodiment of the present invention is intended to be referenced when decoding the VI frame, so it may not have all the functions of the I frame in the original GOP (for example, it may not have the function of being played), as long as it can be referenced for decoding the VI frame. In other words, the newly inserted frame only has the I-frame function of serving as a reference for decoding the VI frame. Therefore, this newly inserted frame can be called a quasi-I frame. In this case, since the inserted I frame is not a real I frame, the original GOP can be considered as not really divided into multiple new GOPs, but as still one GOP (with one or more quasi-I frames added to it). Of course, in another case, if the newly inserted I frame is exactly the same as the I frame in the original GOP, the original GOP can be considered as divided into multiple new GOPs. For convenience of description, unless otherwise specified, the following embodiments of the present invention do not distinguish between these two cases: the inserted frames are collectively referred to as I frames, and the result of inserting an I frame (or quasi-I frame) into the GOP is collectively referred to as obtaining a "new GOP". In short, the inserted I frame (for example, the second I frame) in the embodiment of the present invention is either the same frame as the I frame in the original GOP, or a frame that has the VI-frame reference-decoding function possessed by the I frame in the original GOP.

In specific implementation, when the video reading and writing unit 104 receives a video processing request, it detects whether there is a VI frame in the GOP through the code stream detector 1041. If there is a VI frame in the GOP, the code stream modifier 1043 reads the target I frame from the storage address, recorded in the index information of the GOP, of the target I frame referenced when decoding the VI frame. The code stream modifier 1043 then inserts the read target I frame before the VI frame, thereby obtaining multiple new GOPs. This addresses the problem in the prior art that, when the GOP is large and the distance between the I frame and the VI frame is large, the decoding time is too long, the video processing efficiency is reduced, or some important video information is lost. By inserting an I frame before the VI frame, the present invention can split a large GOP into multiple small GOPs, and can decode and play based on the split small GOPs during video playback. Compared with the prior art, it can avoid decoding some unnecessary information, improve the efficiency of video decoding, avoid problems such as the discarding of some important video information, and ensure the user's viewing experience.

By implementing the embodiment of the present invention, the video encoding unit 102 can mark VI frames contained in the GOP, and transmit the VI frame marks along with the GOP, so that the compatibility of the video encoding unit 102 can be improved. The video reading and writing unit 104 can insert the target I frame before the VI frame, and divide the large GOP into multiple new GOPs. In this way, the control is based on the granularity of the VI frame, which can effectively improve the video playback effect. Especially in video reverse scenes, using new GOPs to replace large GOPs and cache them can effectively save storage resources.

Two application scenarios to which the present invention is applicable are described below.

The first is the video playback scene. The video processing request is specifically a video playback request. Specifically, when a user watches a video, the user can drag the progress bar of the video playback at will according to the user's own needs. Please refer to FIG. 11, which shows a schematic diagram of a user dragging the progress bar of the video playback. When the computing device detects that the user is dragging the video playback progress bar, it can generate a corresponding video playback request. In response to the video playback request, the computing device obtains the GOP where the drag stop position is located, and then identifies whether a VI frame is included in the GOP. If the GOP includes a VI frame, the storage address of the target I frame referenced when decoding the VI frame is obtained from the GOP index table, the target I frame is obtained from the storage address, and the target I frame is inserted before the VI frame. The target I frame here may specifically refer to the I frame that appears before the VI frame in the GOP, or may refer to the I frame that is closest to the VI frame in the GOP. For details, please refer to the example shown in FIG. 10 above.

If the GOP includes multiple VI frames, then to save device processing resources in the video playback scene, the computing device may process only the VI frame closest to the drag stop position in the GOP, that is, insert the target I frame before that VI frame to obtain two new GOPs. Optionally, the playback time corresponding to the inserted target I frame is earlier than the playback time corresponding to the drag stop position. The computing device then decodes and plays the new GOP where the drag stop position is located. Please refer to FIG. 12 for a schematic diagram of the structure of a GOP. As shown in FIG. 12, the GOP is a video code stream of 10s, and the GOP includes two VI frames, VI frame 1 and VI frame 2. The playback time corresponding to VI frame 1 is the 5th second, and the playback time corresponding to VI frame 2 is the 7th second. When users watch the video stream online, they can drag the progress bar of video playback at will. If the user drags the progress bar and stops at 3s, the VI frame closest to the drag stop position is VI frame 1. At this time, the computing device can insert the target I frame before VI frame 1. The insertion position of the target I frame is not limited. For example, it can be at any position between the drag stop position and the VI frame, or at any position before the drag stop position, which ensures that the playback time corresponding to the inserted target I frame is not later than (that is, earlier than or equal to) the playback time corresponding to the drag stop position and avoids the loss of some important video information.
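
Selecting the VI frame to process in the playback scene can be sketched with the FIG. 12 values. The function name is hypothetical, and resolving a tie in favor of the earlier VI frame is an assumption, since the text does not address ties.

```python
from typing import List


def closest_vi_frame(vi_times: List[float], stop_time: float) -> float:
    """Playback time of the VI frame nearest to the drag stop position.

    Assumption: on a tie, the earlier VI frame in the list wins.
    """
    if not vi_times:
        raise ValueError("the GOP contains no VI frame")
    return min(vi_times, key=lambda t: abs(t - stop_time))
```

For VI frames at the 5th and 7th seconds and a drag stop position of 3s, `closest_vi_frame([5, 7], 3)` selects the VI frame at the 5th second, which corresponds to VI frame 1 in the example above.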

The second is the video download scene. The video processing request is specifically a video download request. Specifically, if the user wants to watch the video offline, he can download and cache the video locally in advance. Correspondingly, after receiving the video download request, the computing device can download the video (specifically, one or more GOPs contained in the video) in response to the video download request. Optionally, the video download request can carry the start time and end time of the video, and the computing device will download the video (that is, one or more GOPs in the video) from the start time to the end time. It can start downloading from the GOP at the start time to the end of the GOP at the end time. Then identify whether each GOP includes a VI frame. If the GOP includes a VI frame, the storage address of the target I frame referred to when decoding the VI frame is obtained from the GOP index table, the target I frame is obtained from the storage address, and the target I frame is inserted before the VI frame. For details about the introduction of the target I frame, please refer to the relevant description in the first application scenario above, which will not be repeated here.

In practical applications, different GOPs correspond to different play periods. When the start time is within the play period of a certain GOP, it can be simply understood that the start time is in this GOP, and this GOP is regarded as the GOP where the start time is located. For details, please refer to the example described in FIG. 15 below in this application.

In the video download scene, considering that the user can drag the video playback progress bar to start playing the video at any position, the computing device can process each VI frame included in each GOP in the video, that is, insert the target I frame before each VI frame to split a large GOP into small GOPs. The computing device processes every VI frame in each GOP in the same way. For details, reference may be made to the relevant introduction of the foregoing embodiment, which will not be repeated here.

The following describes embodiments related to GOP storage. Different video processing systems can use different indexing methods to create and store corresponding index information for the GOP. In other words, the indexing methods corresponding to the index information of the GOPs in different video processing systems may be different; for example, the time indexing method or the frame number indexing method can be supported. Specific implementations of the two indexing methods are given below as examples.

The first is the time index method. The computing device uses the time index method to create and store corresponding index information for the GOP. Specifically, the computing device creates an index for the GOP according to a preset duration (for example, 1s), and obtains index information of the GOP. The index information includes, but is not limited to, the number of the GOP, whether the GOP contains an I frame, the storage address of the I frame, whether the GOP contains a VI frame, the storage address of the VI frame, the storage address of the GOP in the storage unit 108, and the playback time corresponding to each frame. The preset duration is custom-set by the system, for example according to user requirements, or obtained statistically from a series of empirical data. Please refer to FIG. 13A, which shows a schematic diagram of a GOP stored in the time index manner. As shown in the figure, the GOP is a 10s video code stream, and the figure specifically shows the video code stream from the 0th second to the 9th second. Each second corresponds to one frame (image) in the GOP.
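
As a rough illustration of the time index method, the sketch below maps each time slot of the preset duration to the first frame whose playback time falls in that slot. The function name, the slot numbering, and the choice to index the first frame of each slot are assumptions; the embodiment only states that the index is created according to a preset duration.

```python
from typing import Dict, List


def build_time_index(frame_times: List[float], step: float = 1.0) -> Dict[int, int]:
    """Map slot k, covering [k*step, (k+1)*step), to the first frame index in it."""
    if step <= 0:
        raise ValueError("step must be positive")
    index: Dict[int, int] = {}
    for frame_index, t in enumerate(frame_times):
        slot = int(t // step)
        index.setdefault(slot, frame_index)  # keep only the first frame per slot
    return index
```

When the GOP carries one frame per second, as in FIG. 13A, every slot maps to exactly one frame and the index reduces to a second-to-frame lookup.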

The second is the frame number index method. The computing device uses the frame number index method to create and store corresponding index information for the GOP. Specifically, the computing device creates an index for the GOP according to the I frame interval, and obtains the index information of the GOP. The GOP is used to indicate a group of consecutive frames between two I frames. For the specific index information of the GOP, please refer to the above description, which will not be repeated here. Please refer to FIG. 13B, which shows a schematic diagram of a GOP stored in the frame number index manner. As shown in the figure, the GOP is a video code stream including 10 frames, shown in the figure as frame 0 to frame 10. Each frame corresponds to the index number of that frame.
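
Indexing by I frame interval can be sketched as grouping frame numbers so that every I frame opens a new GOP. The function name and the list-of-labels input are hypothetical, and frames that precede the first I frame are simply ignored in this sketch.

```python
from typing import List, Tuple


def index_by_i_frame(frame_types: List[str]) -> List[Tuple[int, int]]:
    """Return (first_frame, last_frame) for each GOP, one GOP per I frame."""
    starts = [i for i, t in enumerate(frame_types) if t == "I"]
    if not starts:
        return []
    # Each GOP ends right before the next I frame; the last one ends the stream.
    ends = [s - 1 for s in starts[1:]] + [len(frame_types) - 1]
    return list(zip(starts, ends))
```

Each returned pair gives the frame-number range of one GOP, which can then be stored as that GOP's index information.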

Based on the foregoing embodiment, please refer to FIG. 14 which is a schematic flowchart of a video processing method according to an embodiment of the present invention. The method shown in Figure 14 includes the following implementation steps:

Step S102: The computing device obtains a group of pictures (GOP) in the video, and the first frame of the GOP is the first I frame. The GOP includes M frames, and M is a positive integer.

The computing device obtains a video processing request, and the video processing request carries the start time of the video. In response to the video processing request, a video corresponding to the start time is acquired, that is, at least one GOP in the video is acquired. The video processing request may also carry the end time of the video, or other system-customized information, etc., which is not limited in the present invention. The video processing request may specifically be generated by the user performing a corresponding video operation on the video, or may be received from other devices. In different application scenarios, the video processing request may also be different. For example, in a video playback scene, the computing device detects a user's drag operation on the video playback progress bar, and can generate a corresponding video playback request. In a video download scenario, when the computing device detects a user's video download operation for a preset period (the period from the start time to the end time), it can generate a corresponding video download request, etc.

In the following, taking the video processing request as a video playback request and a video download request as an example, the specific implementation manner of step S102 is described in detail.

In one embodiment, if the video processing request is a video playback request, the video playback request carries the start time T_s of the video. The video includes multiple GOPs. The computing device can then respond to the video playback request and obtain, from the multiple GOPs of the video, the GOP where the start time T_s is located.

In another embodiment, if the video processing request is a video download request, the video download request carries the start time T_s of the video, and optionally the end time T_e of the video. The computing device may respond to the video download request by starting the download from the GOP where the start time T_s is located and continuing until the end of the GOP where the end time T_e is located, thereby downloading the at least one GOP that makes up the video.

For example, refer to FIG. 15 for a schematic diagram of the GOPs that compose a video according to an embodiment of the present invention. A user plays the movie "XXX" online on a computing device. As shown in FIG. 15, the movie includes 8 GOPs. Assume that the user drags the movie's playback progress bar and stops at time T_s, so that the video starts playing from time T_s. The computing device may generate a video playback request when detecting that the user is dragging the playback progress bar of the movie. The video playback request carries the start time T_s of the video. Further, the computing device may respond to the video playback request to obtain the GOP where the start time T_s is located, which in the figure is specifically GOP3.

If the user needs to download the movie offline, the computing device may generate a video download request when detecting the user's download operation for the movie. The video download request may carry the start time and end time of the video to be downloaded. The video to be downloaded can be a video segment (for example, the beginning or the end) of the movie "XXX", or it can be the entire video. The start time and end time of the video to be downloaded can be customized by the user according to actual needs, for example, 00:01:00-00:21:00 (that is, download the video segment from the 1st minute to the 21st minute). The user can perform offline download settings on the display interactive interface provided by the computing device. Please refer to FIG. 16 for a schematic diagram of an operation for a user to download videos offline. As shown in FIG. 16, in the display interactive interface, the user sets the start time, end time, and video name of the video to be downloaded according to the user's own needs. Correspondingly, when the computing device detects an offline download operation on the display interactive interface, it can start downloading from the GOP at the start time until the GOP at the end time ends. Assuming that in this example, the GOP at the start time 00:01:00 is GOP1, and the GOP at the end time 00:21:00 is GOP3, the 20-minute video downloaded by the computing device may specifically include GOP1, GOP2, and GOP3.
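
Locating the GOP for a playback request and the GOP range for a download request can be sketched with a sorted list of GOP start times. The function names, the 0-based GOP positions, the use of seconds, and the start-time values in the usage note are all hypothetical; the sketch also assumes that the last GOP extends to the end of the video.

```python
import bisect
from typing import List


def gop_at(start_times: List[float], t: float) -> int:
    """Position of the GOP whose play period contains time t.

    start_times must be sorted in ascending order (seconds).
    """
    if not start_times or t < start_times[0]:
        raise ValueError("time precedes the first GOP")
    return bisect.bisect_right(start_times, t) - 1


def gops_for_download(start_times: List[float], t_s: float, t_e: float) -> List[int]:
    """Positions of all GOPs from the one containing t_s to the one containing t_e."""
    if t_e < t_s:
        raise ValueError("end time is earlier than start time")
    return list(range(gop_at(start_times, t_s), gop_at(start_times, t_e) + 1))
```

With hypothetical GOP start times of 0, 600, 1200 and 1800 seconds, a download from 00:01:00 (60 s) to 00:21:00 (1260 s) covers positions 0 to 2, that is, three consecutive GOPs as in the example above.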

Step S104: The computing device determines whether a VI frame is included in the M frames.

In one embodiment, the GOP includes one or more NALUs. The computing device determines whether a VI frame is included in the M frames of the GOP by identifying whether an SEI NALU is included in the GOP. Specifically, if an SEI NALU is included in the GOP, then according to the indication of the SEI NALU, the frame where the i-th NALU before the SEI NALU is located is the VI frame, or the frame where the j-th NALU after the SEI NALU is located is the VI frame. The number of SEI NALUs is not limited, and can be one or more. When there are multiple SEI NALUs, the computing device can determine the VI frame indicated by each of the multiple SEI NALUs by referring to the above VI frame determination principle. Thus, one or more VI frames included in the M frames are determined.

In another embodiment, the GOP includes at least one frame. Each frame includes the reference frame (RPS) information of that frame. The computing device can analyze the respective RPS information of the M frames to determine whether each frame is a VI frame. Specifically, if the RPS information of a frame in the GOP indicates that the frame references an I frame for decoding, and the previous frame of that frame is a non-I frame (specifically, it may be a B frame or a P frame), the frame is determined to be a VI frame. Otherwise, the frame is determined not to be a VI frame.

In another implementation manner, the computing device obtains out-of-band information of the GOP, and the out-of-band information is used to indicate the position of the VI frame included in the GOP. The position refers to the specific position of the VI frame in the GOP, which may include, but is not limited to, the frame number (index number) of the VI frame, the playback time corresponding to the VI frame, and the like. The out-of-band information may be received by the computing device from other devices (such as a server), or may be obtained by the computing device from its own video encoding unit, which is not limited in the present invention. Correspondingly, the computing device identifies, according to the out-of-band information of the GOP, whether a VI frame is included in the M frames of the GOP and, if so, the position of the VI frame.

In an alternative embodiment, when the computing device determines that the VI frame is not included in the GOP, the computing device does not need to process the GOP. When playing the video corresponding to the GOP, the computing device can start decoding and playing from the first I frame in the GOP.

Step S106: When determining that the M frames include a VI frame, the computing device inserts a second I frame before the VI frame to obtain multiple new GOPs. The number of new GOPs is the number of VI frames included in the GOP plus one.

After recognizing that a VI frame is included in the M frames, the computing device can obtain the target I frame (also referred to as the second I frame) corresponding to the VI frame. Specifically, for example, the computing device may determine the storage address of the second I frame associated with the VI frame from the index information of the GOP, and then obtain the second I frame from that storage address. Alternatively, the computing device can search for the second I frame pointed to by the index information of the VI frame. The second I frame can be an I frame that appears before the VI frame in the GOP, or the I frame that is closest to the VI frame in the GOP. For details, please refer to the foregoing introduction of the target I frame, which will not be repeated here. The index information of the GOP records information such as the second I frame referenced when decoding the VI frame, the storage address of the second I frame, the frame index of each frame, the playback time corresponding to each frame, the playback duration of the GOP, and the start time and end time of the GOP.

After obtaining the second I frame, the computing device may insert the second I frame before the VI frame, specifically, it may be inserted as the m-th frame before the VI frame, and m is a positive integer. For example, insert the second I frame as the previous frame of the VI frame. Therefore, the computing device can split the GOP into multiple new GOPs, and the number of the new GOPs is the number of VI frames in the GOP increased by one. For example, a GOP includes 4 VI frames, and after inserting a second I frame for each VI frame, 5 new GOPs can be obtained. For details, refer to FIG. 17 showing a schematic diagram of a new GOP. As shown in the figure, the GOP includes 4 VI frames, and the computing device adopts the above-mentioned I frame insertion principle to insert the corresponding second I frame before each VI frame, thereby obtaining 5 new GOPs.
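
The relationship between the number of VI frames and the number of new GOPs can be checked with a short sketch. It assumes the GOP is a list of frame labels whose first frame is the first I frame, inserts the second I frame immediately before each VI frame (the case m = 1), and uses the hypothetical label "I2" for the inserted frame.

```python
from typing import List


def split_gop(frames: List[str]) -> List[List[str]]:
    """Insert a second I frame before every VI frame and return the new GOPs."""
    new_gops: List[List[str]] = [[]]
    for frame in frames:
        if frame == "VI":
            # The inserted second I frame opens a new GOP.
            new_gops.append(["I2"])
        new_gops[-1].append(frame)
    return new_gops
```

For a GOP that contains 4 VI frames, the function returns 5 new GOPs, in line with the example of FIG. 17.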

Optionally, without affecting the video playback quality, after the computing device inserts the second I frame before the VI frame, it can modify the value of a related field of the second I frame (for example, the value of the control field or the flag field in the second I frame) to mark the second I frame as a non-display frame or a non-output frame. In other words, the second I frame is only used to decode the VI frame, and is not used for output display. In this case, the new GOP involved in this application has a different meaning from the GOP in the conventional definition. To facilitate the understanding of this application, the term "new GOP" is still used. The new GOP is used to indicate the distance between two I frames, but the first I frame of the new GOP is only used for decoding, not for display output. Exemplarily, the pseudo code description of the second I frame modified by the computing device is specifically as follows:

It should be noted that, in different application scenarios, the GOPs of the video and the VI frames in those GOPs that the present invention processes also differ. Specifically:

First, in a video playback scenario, the video processing request in S102 is specifically a video playback request, which carries the start time T_s of the video. Correspondingly, in response to the video playback request, the computing device obtains the GOP in which the start time T_s is located and then identifies whether the GOP includes a VI frame. If the GOP includes multiple VI frames, the computing device selects, from the multiple VI frames, the VI frame closest to the start time T_s for processing; that is, it inserts the second I frame before the selected VI frame, thereby obtaining two new GOPs. For details about obtaining the VI frame, refer to the relevant introduction in the example described in FIG. 12, which will not be repeated here. Optionally, to ensure that no video information is lost, the playback time corresponding to the inserted second I frame precedes the start time T_s.
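Selecting the VI frame closest to the start time can be sketched as below. The function name, the `(frame_type, playback_time)` representation, and the `not_after_start` option are assumptions of this sketch; the option is one possible reading of the optional constraint that decoding should begin no later than the requested start time.

```python
from typing import List, Optional, Tuple


def closest_vi_frame(frames: List[Tuple[str, float]], t_s: float,
                     not_after_start: bool = True) -> Optional[int]:
    """Return the index of the VI frame whose playback time is closest to t_s.

    `frames` holds (frame_type, playback_time) pairs. When `not_after_start`
    is True, only VI frames at or before t_s are considered, so that decoding
    starts no later than the requested time.
    """
    candidates = [
        (abs(time - t_s), i)
        for i, (kind, time) in enumerate(frames)
        if kind == "VI" and (not not_after_start or time <= t_s)
    ]
    if not candidates:
        return None
    # Smallest time difference wins; ties resolve to the earlier frame
    return min(candidates)[1]


frames = [("I", 0.0), ("P", 1.0), ("VI", 2.0), ("P", 3.0), ("VI", 4.0), ("P", 5.0)]
```

For a start time of 3.8 the sketch picks the VI frame at time 2.0 when frames after the start time are excluded, and the VI frame at time 4.0 when they are allowed.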

Second, in a video download scenario, the video processing request in S102 is specifically a video download request, which carries the start time T_s and the end time T_e of the video. Correspondingly, in response to the video download request, the computing device downloads the video from the GOP in which the start time T_s is located through the GOP in which the end time T_e is located, so that the downloaded video consists of multiple GOPs. For each GOP, the computing device identifies whether the GOP includes VI frames. If the GOP includes one or more VI frames, the computing device inserts a corresponding second I frame before each VI frame, thereby splitting one GOP into multiple new GOPs. For details, reference may be made to the relevant introduction in the embodiment described in FIG.
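Determining which GOPs a download request covers can be sketched with a binary search over GOP start times. The function name and the assumption that each GOP lasts until the start of the next one are illustrative and not taken from this application.

```python
from bisect import bisect_right
from typing import List


def gops_for_download(gop_starts: List[float], t_s: float, t_e: float) -> List[int]:
    """Return indices of the GOPs spanning [t_s, t_e].

    `gop_starts` is the sorted list of GOP start times; each GOP is assumed
    to run until the next GOP's start time.
    """
    if not gop_starts or t_e < t_s:
        return []
    # Index of the last GOP whose start time is <= the requested time
    first = max(bisect_right(gop_starts, t_s) - 1, 0)
    last = max(bisect_right(gop_starts, t_e) - 1, 0)
    return list(range(first, last + 1))


gop_starts = [0.0, 10.0, 20.0, 30.0]
selected = gops_for_download(gop_starts, 12.0, 25.0)
```

Each selected GOP would then be checked for VI frames and split as described above.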

In an optional embodiment, after obtaining multiple new GOPs, the computing device may, upon obtaining a video playback request, decode and play the corresponding new GOP in response to that request. The specific implementation in the different application scenarios is as follows:

In the video playback scenario, the video processing request in S102 is a video playback request. In response to the video playback request, the computing device inserts the second I frame before the VI frame closest to the start time T_s in the GOP to obtain two new GOPs. Further in response to the video playback request, the computing device obtains the new GOP in which the start time T_s is located and decodes and plays that new GOP starting from its second I frame. In other words, in response to the video playback request, the computing device determines that the start time T_s is located after the second I frame in the GOP, and then decodes and plays the video corresponding to the GOP from the second I frame.

In the video download scenario, the video processing request in S102 is a video download request. In response to the video download request, the computing device downloads the multiple GOPs included in the video and inserts a second I frame for each VI frame included in each GOP to obtain multiple new GOPs. The user can drag the playback progress bar at will while watching the video. When the computing device detects that the user is dragging the progress bar, it can generate a corresponding video playback request, which carries the start time T_s of the video. In response to the video playback request, the computing device searches the multiple new GOPs for the new GOP in which the start time T_s is located, and then decodes and plays that new GOP starting from its second I frame.
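Handling a drag of the progress bar over already split content can be sketched as a lookup of the new GOP that contains the requested start time. The `NewGop` class, its fields, and the `seek` function are hypothetical names introduced only for illustration.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List


@dataclass
class NewGop:
    start_time: float   # playback time at which this new GOP begins
    frames: List[str]   # frame labels; frames[0] is the I frame to decode from


def seek(new_gops: List[NewGop], t_s: float) -> NewGop:
    """Return the new GOP in which t_s is located (list sorted by start_time)."""
    starts = [g.start_time for g in new_gops]
    pos = max(bisect_right(starts, t_s) - 1, 0)
    return new_gops[pos]


new_gops = [
    NewGop(0.0, ["I", "P", "P"]),
    NewGop(3.0, ["I2", "VI", "P"]),
    NewGop(6.0, ["I2", "VI", "P"]),
]
target = seek(new_gops, 4.5)
```

Decoding would then start at `target.frames[0]`, the inserted I frame of the located new GOP, rather than at the first I frame of the original GOP.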

By implementing the embodiments of the present invention, it is possible to solve the problems of low video processing efficiency in the prior art, loss of some important video information, or waste of storage resources of the computing device in a large GOP reverse playback scenario.

With reference to the relevant descriptions in the embodiments described in FIGS. 1 to 17, the following describes the devices and equipment to which the present invention is applicable. Refer to FIG. 18, which is a schematic structural diagram of a video processing device according to an embodiment of the present invention. As shown in FIG. 18, the video processing device 18 includes an acquiring unit 181, a determining unit 182, and an inserting unit 183. Optionally, a decoding and playing unit 184 may also be included. Wherein:

The acquiring unit 181 is configured to acquire a group of pictures GOP in the video, the first frame of the GOP is the first I frame, the GOP includes M frames, and M is a positive integer;

The determining unit 182 is configured to determine whether a virtual intra-coded VI frame is included in the M frames;

The inserting unit 183 is configured to insert a second I frame before the VI frame when a VI frame is included in the M frames;

Wherein, the second I frame is a frame referenced by the VI frame during video decoding.

In some possible implementation manners, the video processing device 18 may further include a decoding and playing unit 184. The determining unit 182 is configured to determine, in response to a video playback request, that the start time of the video in the video playback request is located after the second I frame in the GOP; the decoding and playing unit 184 is configured to decode and play the video starting from the second I frame.

In some possible implementation manners, the second I frame is the previous frame of the VI frame.

In some possible implementation manners, the GOP further includes index information of the GOP, and the storage address of the second I frame is recorded in the index information. Before the second I frame is inserted before the VI frame, the acquiring unit 181 is further configured to acquire the second I frame from the storage address of the second I frame according to the index information of the GOP.

In some possible implementation manners, the second I frame is used for decoding the VI frame, and is not used for output display.
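The idea of a frame that is decoded but not output can be illustrated with a toy model. The `Frame` class and its boolean `output` field are assumptions made for this sketch; the application does not specify which field is modified, and a real bitstream would carry such a flag in a codec-specific field.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Frame:
    kind: str             # "I", "P", "VI", ...
    output: bool = True   # hypothetical output flag; False means decode-only


def insert_decode_only_i_frame(frames: List[Frame], vi_pos: int) -> List[Frame]:
    """Return a copy of `frames` with a non-output I frame placed before frames[vi_pos]."""
    reference = Frame(kind="I", output=False)
    return frames[:vi_pos] + [reference] + frames[vi_pos:]


def frames_to_display(decoded: List[Frame]) -> List[Frame]:
    """Keep only the frames that are marked for output."""
    return [f for f in decoded if f.output]


gop = [Frame("I"), Frame("P"), Frame("VI"), Frame("P")]
patched = insert_decode_only_i_frame(gop, vi_pos=2)
shown = frames_to_display(patched)
```

The inserted frame is available to the decoder as a reference, while the sequence of displayed frames is the same as before the insertion.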

In some possible implementation manners, the acquiring unit 181 is specifically configured to: receive a video processing request, where the video processing request carries the start time of the video, and the video includes at least one group of pictures GOP; and, in response to the video processing request, obtain the GOP corresponding to the start time from the GOP index table;

Wherein, at least one mapping relationship is recorded in the GOP index table, and the mapping relationship is that each GOP corresponds to index information of the GOP, and the index information of the GOP includes the start time of the GOP.

In some possible implementation manners, the index information of the GOP further includes the playback time of each frame, and when the video processing request is a video playback request, the VI frame is the VI frame in the GOP whose playback time has the smallest difference from the start time.

In practical applications, the functions of the acquiring unit 181 and the determining unit 182 of the present invention can be implemented by the code stream detector 1041 in FIG. 9. The function of the inserting unit 183 can be implemented by the code stream modifier 1043 in FIG. 9. The function of the decoding and playing unit 184 can be implemented by the video decoding unit 106 in FIG. 4. In other words, the code stream detector 1041 in the video reading and writing unit 104 in FIG. 4 or FIG. 9 can be specifically implemented by functional modules such as the acquiring unit 181 and the determining unit 182. The code stream modifier 1043 in the video reading and writing unit 104 can be specifically implemented by functional modules such as the inserting unit 183. The video decoding unit 106 may be specifically implemented by functional modules such as the decoding and playing unit 184.

Each module or unit involved in the device 18 of the embodiment of the present invention may be implemented by software programs or by hardware. When implemented by a software program, the modules or units involved in the device 18 are software modules or software units. When implemented by hardware, the modules or units involved in the device 18 can be implemented by an application-specific integrated circuit (ASIC) or a programmable logic device (PLD). The PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof, which is not limited in the present invention.

It should be noted that FIG. 18 is only a possible implementation manner of the embodiment of the present invention. In practical applications, the video processing device may also include more or fewer components, which is not limited here. Regarding the content not shown or described in the embodiment of the present invention, reference may be made to the relevant description in the foregoing method embodiment, which will not be repeated here.

Please refer to FIG. 19, which is a schematic structural diagram of a computing device 19 according to an embodiment of the present invention. The computing device 19 shown in FIG. 19 includes one or more processors 1901, a communication interface 1902, and a memory 1903. The processor 1901, the communication interface 1902, and the memory 1903 can be connected by a bus, or can communicate by other means such as wireless transmission. The embodiment of the present invention takes connection through the bus 1904 as an example, where the memory 1903 is used to store instructions, and the processor 1901 is used to execute the instructions stored in the memory 1903. The memory 1903 stores program code, and the processor 1901 can call the program code stored in the memory 1903 to implement the video processing device 18 shown in FIG. 18.

In practical applications, the processor 1901 in the embodiment of the present invention may call the program code stored in the memory 1903 to execute all or part of the steps of the method embodiment described in FIG. 14 above, and/or other content described herein, which will not be repeated here.

It should be understood that the processor 1901 may be composed of one or more general-purpose processors, such as a central processing unit (CPU). The processor 1901 may be used to run programs of the following functional modules in the related program code. The functional module may specifically include, but is not limited to, any one or a combination of the above-mentioned acquiring unit 181, determining unit 182, and inserting unit 183. In other words, the program code executed by the processor 1901 can perform the functions of any one or more of the above functional modules. For details of the functional modules mentioned here, please refer to the relevant descriptions in the foregoing embodiments, which will not be repeated here.

The communication interface 1902 may be a wired interface (such as an Ethernet interface) or a wireless interface (such as a cellular network interface or a wireless local area network interface) for communicating with other modules or devices. For example, the communication interface 1902 in the embodiment of the present invention may be specifically used to obtain GOPs in the video.

The memory 1903 may include volatile memory, such as random access memory (RAM); it may also include non-volatile memory, such as read-only memory (ROM), flash memory, a hard disk drive (HDD), or a solid-state drive (SSD); the memory 1903 may also include a combination of the foregoing types of memories. The memory 1903 may be used to store a set of program code, so that the processor 1901 can call the program code stored in the memory 1903 to implement the functions of the above-mentioned functional modules involved in the embodiments of the present invention.

It should be noted that FIG. 19 is only a possible implementation manner of the embodiment of the present invention. In practical applications, the computing device may also include more or fewer components, which is not limited here. Regarding the content not shown or described in the embodiment of the present invention, reference may be made to the relevant description in the foregoing method embodiment, which will not be repeated here.

The embodiment of the present invention also provides a computer-readable storage medium in which instructions are stored. When the instructions are run on a computing device, the method flow shown in the embodiment of FIG. 14 is implemented.

The embodiment of the present invention also provides a computer program product. When the computer program product runs on a computing device, the method flow shown in the embodiment of FIG. 14 is realized.

The steps of the method or algorithm described in connection with the disclosure of the embodiments of the present invention may be implemented in hardware, or by a processor executing software instructions. The software instructions can be composed of corresponding software modules, which can be stored in random access memory (RAM), flash memory, read-only memory (ROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), a register, a hard disk, a removable hard disk, a CD-ROM, or any other form of storage medium well known in the art. An exemplary storage medium is coupled to the processor, so that the processor can read information from the storage medium and write information to the storage medium. Of course, the storage medium may also be an integral part of the processor. The processor and the storage medium may be located in an ASIC. In addition, the ASIC may be located in the computing device. Of course, the processor and the storage medium may also exist as discrete components in the computing device.

A person of ordinary skill in the art can understand that all or part of the processes in the above method embodiments can be implemented by a computer program instructing relevant hardware. The program can be stored in a computer-readable storage medium and, when executed, may include the processes of the above method embodiments. The aforementioned storage media include ROM, RAM, magnetic disks, optical disks, and other media that can store program code.

Claims

A video processing method, characterized in that it is applied to a computing device, and the method includes:

Acquiring a group of pictures GOP in the video, the first frame of the GOP is the first I frame, the GOP includes M frames, and M is a positive integer;

Determining whether the M frames include a virtual intra-coded VI frame;

When a VI frame is included in the M frames, inserting a second I frame before the VI frame;

Wherein, the second I frame is a frame referenced by the VI frame during video decoding.
The method of claim 1, wherein the method further comprises:

In response to a video playback request, determining that the start time of the video in the video playback request is located after the second I frame in the GOP;

Decoding and playing the video starting from the second I frame.
The method according to claim 1 or 2, wherein the second I frame is the previous frame of the VI frame.
The method according to any one of claims 1 to 3, wherein the GOP further comprises index information of the GOP, the storage address of the second I frame is recorded in the index information, and before the second I frame is inserted before the VI frame, the method further includes:

Obtaining the second I frame from the storage address of the second I frame according to the index information of the GOP.
The method according to any one of claims 1 to 4, wherein the second I frame is used for decoding the VI frame and is not used for output display.
The method according to any one of claims 1 to 5, wherein the obtaining the group of pictures GOP in the video comprises:

Receiving a video processing request, where the video processing request carries the start time of the video, and the video includes at least one group of pictures GOP;

In response to the video processing request, obtaining the group of pictures GOP corresponding to the start time from the GOP index table;

Wherein, at least one mapping relationship is recorded in the GOP index table, and the mapping relationship is that each GOP corresponds to index information of the GOP, and the index information of the GOP includes the start time of the GOP.
The method according to claim 6, wherein the index information of the GOP further includes the playing time of the frame,

When the video processing request is a video playback request, the VI frame is the VI frame in the GOP with the smallest difference between the playback time and the start time of the GOP.
A video processing device, characterized in that it comprises an acquisition unit, a determination unit and an insertion unit, wherein:

The acquiring unit is configured to acquire a group of pictures GOP in a video, the first frame of the GOP is the first I frame, the GOP includes M frames, and M is a positive integer;

The determining unit is configured to determine whether the M frames include a virtual intra-coded VI frame;

The inserting unit is configured to insert a second I frame before the VI frame when a VI frame is included in the M frames;

Wherein, the second I frame is a frame referenced by the VI frame during video decoding.
The device according to claim 8, wherein the device further comprises a decoding and playing unit,

The determining unit is configured to, in response to a video playback request, determine that the start time of the video in the video playback request is located after the second I frame in the GOP;

The decoding and playing unit is configured to decode and play the video from the second I frame.
The apparatus according to claim 8 or 9, wherein the second I frame is the previous frame of the VI frame.
The device according to any one of claims 8-10, wherein the GOP further comprises index information of the GOP, the storage address of the second I frame is recorded in the index information, and before the second I frame is inserted before the VI frame,

The acquiring unit is further configured to acquire the second I frame from the storage address of the second I frame according to the index information of the GOP.
The device according to any one of claims 8-11, wherein the second I frame is used for decoding the VI frame, and is not used for output display.
The device according to any one of claims 8-12, wherein:

The acquiring unit is specifically configured to: receive a video processing request, where the video processing request carries the start time of the video, and the video includes at least one group of pictures GOP; and, in response to the video processing request, acquire the group of pictures GOP corresponding to the start time from the GOP index table;

Wherein, at least one mapping relationship is recorded in the GOP index table, and the mapping relationship is that each GOP corresponds to index information of the GOP, and the index information of the GOP includes the start time of the GOP.
The apparatus according to claim 13, wherein the index information of the GOP further includes the playing time of the frame,

When the video processing request is a video playback request, the VI frame is the VI frame in the GOP with the smallest difference between the playback time and the start time of the GOP.
A computing device, characterized in that it comprises a processor and an interface, the processor communicates with the interface, the interface is configured to receive a GOP and send it to the processor, and the processor is configured to execute the method according to any one of claims 1-7 by running program instructions.
A computer program product, characterized in that when it runs on a computer, the computer executes the method according to any one of claims 1-7.