CN115866245A - Video encoding method, video encoding device, computer equipment and storage medium

Video encoding method, video encoding device, computer equipment and storage medium

Info

Publication number
CN115866245A
Authority
CN
China
Prior art keywords
code stream
video frame
frame
candidate
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211423853.7A
Other languages
Chinese (zh)
Inventor
王曜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Yuntian Changxiang Information Technology Co ltd
Original Assignee
Shenzhen Yuntian Changxiang Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Yuntian Changxiang Information Technology Co ltd filed Critical Shenzhen Yuntian Changxiang Information Technology Co ltd
Priority to CN202211423853.7A
Publication of CN115866245A
Legal status: Pending

Landscapes

  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The application relates to a video coding method, a video coding device, a computer device and a storage medium. The method comprises the following steps: acquiring a source code stream to be coded, and determining an arrangement structure of video frames in the source code stream; extracting a plurality of video frames from the source code stream according to the arrangement structure to obtain a plurality of candidate code streams with different code stream structures; identifying a reference video frame in each candidate code stream, and determining the target video frame and the conventional video frames respectively associated with each reference video frame; coding each associated target video frame according to its reference video frame to obtain an initial coding result of the candidate code stream of each code stream structure; and for the candidate code stream of each code stream structure, obtaining the corresponding target code stream according to the initial coding result corresponding to that candidate code stream and the conventional video frames. The method improves the flexibility of video encoding.

Description

Video encoding method, video encoding device, computer equipment and storage medium
Technical Field
The present application relates to the field of video coding, and in particular, to a video coding method, apparatus, computer device, and storage medium.
Background
With the rapid growth of consumer electronics, the demand for multi-screen interaction has become increasingly urgent, so video needs to be transmitted between different terminals in real time. At present, during video transmission, because network bandwidth conditions differ between scenes, terminal processing capabilities differ, and users have different quality requirements, a server needs to encode the video sent by the current terminal multiple times to ensure that other terminals can smoothly receive and parse it.
However, encoding the same video multiple times reduces the encoding and transmission efficiency of the video and also reduces the flexibility with which different terminals can accurately receive it. How to generate, through a single encoding pass, videos suitable for scenes with different network bandwidths is therefore the problem this application aims to solve.
Disclosure of Invention
In view of the above, it is necessary to provide a video encoding method, an apparatus, a computer device, a computer-readable storage medium, and a computer program product that can improve the flexibility of video encoding.
In a first aspect, the present application provides a video encoding method. The method comprises the following steps:
acquiring a source code stream to be coded, and determining an arrangement structure of video frames in the source code stream;
extracting a plurality of video frames in the source code stream according to the arrangement structure to obtain a plurality of candidate code streams with different code stream structures;
identifying a reference video frame in each candidate code stream, and determining a target video frame and a conventional video frame which are respectively associated with each reference video frame;
respectively coding the associated target video frame according to each reference video frame to obtain an initial coding result of the candidate code stream of each code stream structure;
and for each candidate code stream in the candidate code streams with different code stream structures, obtaining a target code stream corresponding to the candidate code stream of the corresponding code stream structure according to an initial coding result corresponding to the candidate code stream and the conventional video frame.
In one embodiment, determining an arrangement structure of video frames in the source code stream includes: performing field coding on the source code stream to obtain a plurality of video frames which are sequentially arranged; the frame number of the video frame is n, and n is a positive integer; determining a video frame with a special frame number from a plurality of video frames; the special frame number at least comprises n+2; and obtaining the arrangement structure of the video frames in the source code stream according to the video frames with the special frame numbers.
In an embodiment, the obtaining an arrangement structure of video frames in the source code stream according to the video frame with the special frame number includes: responding to the association operation of the video frames with the special frame numbers, and obtaining the video frames with special marks and the reference relation between the video frames with the special marks; responding to the configuration operation of the video frames of other frame numbers in the plurality of video frames to obtain the video frame with the conventional identifier; and synthesizing the special identification and the conventional identification of the video frame and each reference relation to obtain the arrangement structure of the video frames in the source code stream.
In one embodiment, the encoding the associated target video frame according to each reference video frame to obtain an initial encoding result of the candidate code stream of each code stream structure includes: determining a current reference video frame in the candidate code stream, and determining a current target video frame associated with the current reference video frame; performing interframe prediction on the current reference video frame to obtain a current prediction result; superposing the current prediction result and the current target video frame to obtain a first coding result corresponding to the current target video frame; taking the first coding result as a new current reference video frame, returning to the step of determining the current target video frame associated with the current reference video frame, and continuing to execute the step until a first coding result of the target video frame associated with the last reference video frame in the candidate code stream is obtained; and taking the first coding result of the target video frame associated with the last reference video frame as the initial coding result of the candidate code stream.
In one embodiment, the codestream structure comprises a full codestream structure; the obtaining of the target code stream corresponding to the candidate code stream of the corresponding code stream structure according to the initial coding result corresponding to the candidate code stream and the conventional video frame includes: determining a current reference video frame associated with a conventional video frame in the candidate code stream of the full code stream structure, and if the current reference video frame is a first reference video frame, overlapping each conventional video frame in the candidate code stream of the full code stream structure with the current reference video frame respectively to obtain a second coding result corresponding to each conventional video frame; if the current reference video frame is a non-first reference video frame, determining a first coding result of the current reference video frame, and overlapping each conventional video frame in the candidate code stream of the full code stream structure with the first coding result to obtain a second coding result corresponding to each conventional video frame; and synthesizing each second coding result and the preliminary coding result of the candidate code stream of the full code stream structure to obtain a target code stream corresponding to the candidate code stream of the full code stream structure.
In one embodiment, the codestream structure comprises a semi-codestream structure; the obtaining of the target code stream corresponding to the candidate code stream of the corresponding code stream structure according to the initial coding result corresponding to the candidate code stream and the conventional video frame includes: coding and filling each conventional video frame in the candidate code stream of the half code stream structure to obtain a second coding result corresponding to each conventional video frame; and synthesizing each second coding result and the preliminary coding result of the candidate code stream of the half-code stream structure to obtain a target code stream corresponding to the candidate code stream of the half-code stream structure.
In a second aspect, the present application further provides a video encoding apparatus. The device comprises:
the candidate code stream determining module is used for acquiring a source code stream to be coded and determining an arrangement structure of video frames in the source code stream; extracting a plurality of video frames in the source code stream according to the arrangement structure to obtain a plurality of candidate code streams with different code stream structures;
the video frame identification module is used for identifying a reference video frame in each candidate code stream and determining a target video frame and a conventional video frame which are respectively associated with each reference video frame;
the target code stream determining module is used for coding the associated target video frames according to each reference video frame to obtain an initial coding result of the candidate code stream of each code stream structure; and for each candidate code stream in the candidate code streams with different code stream structures, obtaining a target code stream corresponding to the candidate code stream of the corresponding code stream structure according to the initial coding result corresponding to the candidate code stream and the conventional video frame.
In a third aspect, the present application also provides a computer device. The computer device comprises a memory storing a computer program and a processor implementing the following steps when executing the computer program:
acquiring a source code stream to be coded, and determining an arrangement structure of video frames in the source code stream;
extracting a plurality of video frames in the source code stream according to the arrangement structure to obtain a plurality of candidate code streams with different code stream structures;
identifying a reference video frame in each candidate code stream, and determining a target video frame and a conventional video frame which are respectively associated with each reference video frame;
respectively coding the associated target video frame according to each reference video frame to obtain an initial coding result of the candidate code stream of each code stream structure;
and for each candidate code stream in the candidate code streams with different code stream structures, obtaining a target code stream corresponding to the candidate code stream of the corresponding code stream structure according to an initial coding result corresponding to the candidate code stream and the conventional video frame.
In a fourth aspect, the present application further provides a computer-readable storage medium. The computer-readable storage medium has stored thereon a computer program which, when executed by a processor, performs the steps of:
acquiring a source code stream to be coded, and determining an arrangement structure of video frames in the source code stream;
extracting a plurality of video frames in the source code stream according to the arrangement structure to obtain a plurality of candidate code streams with different code stream structures;
identifying a reference video frame in each candidate code stream, and determining a target video frame and a conventional video frame which are respectively associated with each reference video frame;
respectively coding the associated target video frame according to each reference video frame to obtain an initial coding result of the candidate code stream of each code stream structure;
and for each candidate code stream in the candidate code streams with different code stream structures, obtaining a target code stream corresponding to the candidate code stream of the corresponding code stream structure according to an initial coding result corresponding to the candidate code stream and the conventional video frame.
In a fifth aspect, the present application further provides a computer program product. The computer program product comprising a computer program which when executed by a processor performs the steps of:
acquiring a source code stream to be coded, and determining an arrangement structure of video frames in the source code stream;
extracting a plurality of video frames in the source code stream according to the arrangement structure to obtain a plurality of candidate code streams with different code stream structures;
identifying a reference video frame in each candidate code stream, and determining a target video frame and a conventional video frame which are respectively associated with each reference video frame;
respectively coding the associated target video frame according to each reference video frame to obtain an initial coding result of the candidate code stream of each code stream structure;
and for each candidate code stream in the candidate code streams with different code stream structures, obtaining a target code stream corresponding to the candidate code stream of the corresponding code stream structure according to an initial coding result corresponding to the candidate code stream and the conventional video frame.
According to the video coding method, the video coding device, the computer equipment, the storage medium and the computer program product, a plurality of candidate code streams with different code stream structures can be obtained by acquiring a source code stream to be coded, determining the arrangement structure of video frames in the source code stream and further extracting and processing a plurality of video frames in the source code stream according to the arrangement structure; by identifying the reference video frame in each candidate code stream and determining the target video frame and the conventional video frame which are respectively associated with each reference video frame, the associated target video frame can be coded according to each reference video frame, so that the initial coding result of the candidate code stream of each code stream structure is obtained; for each candidate code stream in a plurality of candidate code streams with different code stream structures, the target code stream corresponding to the candidate code stream of the corresponding code stream structure can be obtained according to the initial coding result corresponding to the candidate code stream and the conventional video frame. According to the method, the candidate code streams with different code stream structures can be obtained directly according to the arrangement structure, and compared with the traditional process of carrying out multiple encoding on the same video, the method can obtain the target code stream corresponding to the candidate code stream for the candidate code stream of each code stream structure, so that the flexibility of generating the target code stream suitable for scenes such as different network bandwidths is ensured; meanwhile, the target video frame is directly encoded based on the reference video frame, so that the accuracy of the initial encoding result of the candidate code stream is improved, and therefore, when the target code stream corresponding to the candidate code stream is obtained through the initial encoding result and the conventional video frame, the efficiency of video encoding on the source code stream can be improved.
Drawings
FIG. 1 is a diagram of an exemplary video coding method;
FIG. 2 is a flow diagram illustrating a video encoding method according to one embodiment;
FIG. 3 is a schematic flow chart illustrating the process of determining the arrangement structure of a source code stream according to an embodiment;
FIG. 4 is a schematic structural view of an arrangement structure in one embodiment;
fig. 5 is a schematic structural diagram of a first codestream of the full codestream structure in an embodiment;
FIG. 6 is a diagram illustrating a second bitstream structure of a bitstream structure in an embodiment;
FIG. 7 is a block diagram showing the structure of a video encoding apparatus according to one embodiment;
FIG. 8 is a diagram illustrating an internal structure of a computer device according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
The video encoding method provided by the present application can be applied to an application environment as shown in fig. 1, where the application environment includes a first terminal 102, a server 104, and a second terminal 106, where the first terminal 102 communicates with the server 104 through a network, and the second terminal 106 communicates with the server 104 through the network. The first terminal 102 sends the source code stream to be encoded to the server 104. The server 104 is configured to determine an arrangement structure of video frames in the source code stream, and extract a plurality of video frames in the source code stream according to the arrangement structure to obtain a plurality of candidate code streams with different code stream structures; the server 104 is further configured to identify a reference video frame in each candidate code stream, and determine a target video frame and a conventional video frame associated with each reference video frame; respectively coding the associated target video frame according to each reference video frame to obtain an initial coding result of the candidate code stream of each code stream structure; for each candidate code stream in a plurality of candidate code streams with different code stream structures, obtaining a target code stream corresponding to the candidate code stream of the corresponding code stream structure according to an initial coding result corresponding to the candidate code stream and a conventional video frame; the server 104 sends the target code stream to the second terminal 106, so that the second terminal 106 obtains the target video by analyzing the target code stream. The first terminal 102 and the second terminal 106 may be, but not limited to, various personal computers, notebook computers, smart phones, tablet computers, and portable wearable devices, and the server 104 may be implemented by an independent server or a server cluster composed of a plurality of servers.
In one embodiment, as shown in fig. 2, a video encoding method is provided, which is exemplified by the application of the method to the server in fig. 1, and includes the following steps:
step 202, obtaining a source code stream to be coded, and determining an arrangement structure of video frames in the source code stream.
The source code stream to be coded can be a code stream corresponding to a video sent by the first terminal. The arrangement structure may be a GOP (Group of Pictures), which defines the arrangement of the video frames between two first video frames. A first video frame may be an I frame or an IDR frame, and the video frames between two first video frames may be P frames. An I frame is a data frame obtained by compressing a complete image; it carries a large amount of data, the full-frame image information is compression-coded (JPEG-like intra coding) and transmitted, and a complete image can be reconstructed from the I frame data alone during decoding. An IDR frame is the first of a series of I frames. A P frame is an inter-frame prediction coded frame and occupies a small amount of data; it can be coded only by referring to a preceding I frame or P frame, it represents the difference between the current frame and the preceding frame, and the final image is generated during decoding by superposing the difference carried by the current frame on the image constructed from the preceding frame.
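As a concrete illustration of the I/P-frame relationship just described, the following minimal Python sketch (not taken from the patent; frame contents are modelled as plain lists of pixel values) decodes one GOP by reconstructing an I frame directly and superposing each P frame's difference on the previously reconstructed picture:

```python
# A minimal sketch, assuming a GOP that starts with an I frame followed by P frames.
from dataclasses import dataclass
from typing import List

@dataclass
class Frame:
    frame_type: str          # "I" or "P"
    payload: List[int]       # full picture for I frames, residual (difference) for P frames

def decode_gop(gop: List[Frame]) -> List[List[int]]:
    """Reconstruct every picture in one GOP."""
    pictures = []
    previous = None
    for frame in gop:
        if frame.frame_type == "I":
            picture = list(frame.payload)                      # complete image as-is
        else:
            # P frame: superimpose the transmitted difference on the picture
            # reconstructed from the previous frame.
            picture = [p + d for p, d in zip(previous, frame.payload)]
        pictures.append(picture)
        previous = picture
    return pictures

# Example GOP: I0 carries the picture, P1/P2 carry small differences.
gop = [Frame("I", [10, 20, 30]), Frame("P", [1, 0, -2]), Frame("P", [0, 3, 1])]
print(decode_gop(gop))   # [[10, 20, 30], [11, 20, 28], [11, 23, 29]]
```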
In one embodiment, as shown in fig. 3, determining an arrangement structure of video frames in a source code stream includes the following steps:
step 302, performing field coding on the source code stream to obtain a plurality of video frames arranged in sequence.
The frame number of the video frame is n, and n is a positive integer. Each video frame is presented in the slice format, which is a region format of the H.264 video coding standard and consists of two parts, a header and data; the header information includes a frame_num field.
Specifically, the server may obtain a plurality of sequentially arranged video frames by performing field coding on the source code stream, such that the value of the frame_num field of each video frame is incremented frame by frame in the arranged order. Since the source code stream may contain a large number of video frames, the value of the frame_num field is usually incremented up to a specific value and then wraps around to 0, after which it is incremented frame by frame again.
In one embodiment, when the frame_num field of the bitstream is represented by 4 bits in the bit-expanded structure of the slice format, the value of the frame_num field therefore lies in the range 0 to 15.
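The following short sketch (a hedged illustration, assuming the 4-bit frame_num implied by the 0 to 15 range above) shows how the frame number increments frame by frame and wraps back to 0:

```python
FRAME_NUM_BITS = 4                       # assumption based on the 0..15 range described above
MAX_FRAME_NUM = 1 << FRAME_NUM_BITS      # 16

def assign_frame_numbers(frame_count):
    """Return the frame_num value carried by each frame's slice header, in display order."""
    return [i % MAX_FRAME_NUM for i in range(frame_count)]

print(assign_frame_numbers(20))
# [0, 1, 2, ..., 14, 15, 0, 1, 2, 3]  (wraps around after 15)
```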
In step 304, a video frame with a special frame number is determined from the plurality of video frames.
Wherein, the special frame number at least comprises n+2, and n is a positive integer. A video frame with a special frame number can be characterized as an LTR (long-term reference frame).
In one embodiment, the special frame number may be predetermined by the user, for example, the frame number n +3 or n +4 may also be set as the special frame number.
And step 306, obtaining the arrangement structure of the video frames in the source code stream according to the video frames with the special frame numbers.
In one embodiment, obtaining an arrangement structure of video frames in a source code stream according to a video frame with a special frame number includes: responding to the correlation operation of the video frames with the special frame numbers, and obtaining the video frames with the special marks and the reference relation between the video frames with each special mark; responding to the configuration operation of the video frames with other frame numbers in the plurality of video frames to obtain the video frame with the conventional identifier; and synthesizing the special identification and the conventional identification of the video frame and each reference relation to obtain the arrangement structure of the video frame in the source code stream.
The reference relationship indicates an association: the specially identified video frame with the later frame number can be obtained by performing inter-frame prediction from the specially identified video frame with the earlier frame number.
Specifically, the server sets the video frame with the special frame number as the video frame with the special identifier, for example, the special identifier is set to 1, that is, the long-term reference frame is set to 1. And the server responds to the association operation of the user on the video frames with the special frame numbers to obtain the reference relationship between the video frames with each special identifier, sets the video frames associated with the reference relationship as a, and sets the video frames not associated with the reference relationship as 0. The server performs a configuration operation on video frames of other frame numbers in the plurality of video frames to obtain a video frame with a regular flag, for example, the regular flag is set to 0. And synthesizing the special identification and the conventional identification of the video frame and each reference relation to obtain an arrangement structure of a plurality of video frames which are sequentially arranged.
As shown in fig. 4, fig. 4 is a schematic structural diagram of the arrangement structure. For example, the special identifier of the I0 frame, the P2 frame, the P4 frame, etc. is set to 1; the conventional identifier of the P1 frame, the P3 frame, the P5 frame, etc. is set to 0. When the I0 frame and the P2 frame are associated, the reference relationship between the I0 frame and the P2 frame is obtained; when the P2 frame and the P4 frame are associated, the reference relationship between the P2 frame and the P4 frame is obtained. At this time, the P2 frame, the P4 frame, and so on are set as a, and the video frames associated by a reference relationship are indicated by solid arrows.
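A minimal sketch of how the arrangement structure of fig. 4 could be represented is given below; the data-structure and function names are illustrative assumptions, not taken from the patent. Frames at the special frame numbers (every second frame here) are flagged as long-term reference frames with the special identifier 1, the remaining frames receive the conventional identifier 0, and each special frame records a reference to the previous special frame:

```python
from dataclasses import dataclass
from typing import Optional, List

@dataclass
class FrameInfo:
    frame_num: int
    is_long_term: int                 # 1 = special identifier (LTR), 0 = conventional identifier
    reference: Optional[int] = None   # frame_num of the referenced LTR, if any

def build_arrangement(frame_count: int, step: int = 2) -> List[FrameInfo]:
    arrangement = []
    last_ltr = None
    for n in range(frame_count):
        if n % step == 0:                             # special frame numbers: 0, 2, 4, ...
            info = FrameInfo(n, 1, reference=last_ltr)
            last_ltr = n
        else:
            info = FrameInfo(n, 0)                    # conventional frame (e.g. P1, P3, ...)
        arrangement.append(info)
    return arrangement

for f in build_arrangement(6):
    print(f)
# FrameInfo(frame_num=0, is_long_term=1, reference=None)   -> I0
# FrameInfo(frame_num=1, is_long_term=0, reference=None)   -> P1
# FrameInfo(frame_num=2, is_long_term=1, reference=0)      -> P2 references I0
# ... and so on for P3, P4, P5
```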
In one embodiment, the reference relationship corresponding to each video frame of each specific identifier may be sequentially buffered in the queue according to the frame number of each video frame of each specific identifier.
In one embodiment, a conventionally identified video frame may represent an STR (short-term reference frame) when the conventional identifier is set to 0, i.e., the short-term reference frame is set to 0. A short-term reference frame with an earlier frame number is generally used for inter-frame prediction of the video frame with the later frame number, that is, there is an association relationship between the short-term reference frame and the video frame with the later frame number; as shown in fig. 4, this association relationship may be represented by a dashed arrow.
In the embodiment, the reference relationship between the video frames with different special identifications can be associated by predetermining the long-term reference frame with the special frame number and the short-term reference frame with the conventional identification, so that the arrangement structure of the video frames in the source code stream is accurately obtained, and therefore, the candidate code streams of various code stream structures can be flexibly determined through the arrangement structure subsequently.
In an embodiment, the server can implement time scalable coding by a long-term reference frame technology, that is, the server can implement one-time coding and can split multiple target code streams by controlling the arrangement structure of the source code streams by the long-term reference frame technology.
And step 204, extracting a plurality of video frames in the source code stream according to the arrangement structure to obtain a plurality of candidate code streams with different code stream structures.
Wherein, the arrangement structure comprises a special mark and a conventional mark; the candidate code stream comprises a first code stream of a full code stream structure and a second code stream of a half code stream structure.
In one embodiment, extracting a plurality of video frames in a source code stream according to an arrangement structure to obtain a plurality of candidate code streams with different code stream structures includes: extracting video frames corresponding to the special identifier and the conventional identifier from the source code stream to obtain a first code stream of a full code stream structure; and deleting the video frame corresponding to the conventional identifier in the first code stream, and storing the frame number and the conventional identifier of the video frame corresponding to the conventional identifier to obtain a second code stream of the semi-code stream structure.
Specifically, the server extracts the plurality of sequentially arranged video frames from the source code stream according to the arrangement order of the video frames respectively corresponding to the special identifier and the conventional identifier, to obtain a first code stream with a full code stream structure, as shown in fig. 5, where fig. 5 is a schematic structural diagram of the first code stream of the full code stream structure. The server retains the video frames corresponding to the special identifier in the first code stream, deletes the video frames corresponding to the conventional identifier, and stores the frame numbers and the conventional identifiers of the deleted video frames to obtain a second code stream of the semi-code stream structure. As shown in fig. 6, fig. 6 is a schematic structural diagram of the second code stream of the half-code stream structure.
In one embodiment, the server can determine the code stream structure of the candidate code stream according to the special frame number: when the special frame number is n+2 and n is a positive integer, the code stream structure is determined to comprise a full code stream structure and a half code stream structure; when the special frame number is n+3 and n is a positive integer, the code stream structure is determined to comprise a full code stream structure and a one-third code stream structure; and when the special frame number is n+4 and n is a positive integer, the code stream structure is determined to comprise a full code stream structure, a quarter code stream structure, and so on.
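As a hedged illustration of this splitting, the sketch below (standalone Python, illustrative names only) keeps every frame for the full-stream candidate and keeps only the specially identified frames, while remembering the frame numbers and identifiers of the dropped conventional frames, for the reduced (half, third, quarter, ...) candidate:

```python
from typing import List, Tuple

Frame = Tuple[int, int]   # (frame_num, identifier: 1 special / 0 conventional)

def split_candidates(frames: List[Frame]):
    full_stream = list(frames)                                 # full code stream structure
    reduced_stream = [f for f in frames if f[1] == 1]          # keep special frames only
    dropped = [(n, flag) for n, flag in frames if flag == 0]   # remember frame_num + identifier
    return full_stream, reduced_stream, dropped

frames = [(0, 1), (1, 0), (2, 1), (3, 0), (4, 1), (5, 0)]      # I0 P1 P2 P3 P4 P5, period 2
full, half, dropped = split_candidates(frames)
print(half)      # [(0, 1), (2, 1), (4, 1)]   -> half code stream structure
print(dropped)   # [(1, 0), (3, 0), (5, 0)]   -> stored frame numbers and identifiers
```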
And step 206, identifying the reference video frame in each candidate code stream, and determining a target video frame and a conventional video frame which are respectively associated with each reference video frame.
Specifically, the server traverses each video frame in the candidate code streams of each code stream structure, and determines the identifier type corresponding to the currently traversed current video frame. The server determines the type of the current video frame according to the identification type, and determines the current video frame as a reference video frame when the identification type is a special identification; and when the identification type is a conventional identification, determining that the current video frame is a conventional video frame.
In one embodiment, determining the respective associated target video frame and regular video frame of each reference video frame comprises: aiming at each reference video frame in the candidate code stream, acquiring a target reference relation corresponding to the current reference video frame; obtaining a current target video frame associated with the current reference video frame according to the target reference relation; and taking each video frame between the current reference video frame and the current target video frame as a conventional video frame associated with the current reference video frame.
Specifically, the server determines each current reference video frame in turn according to the order of the frame numbers, and acquires the target reference relationship corresponding to the current reference video frame from the queue in which the reference relationships are cached in advance. The server can determine the current target video frame according to the target reference relationship and the current reference video frame, and takes each video frame between the current reference video frame and the current target video frame as a conventional video frame associated with the current reference video frame. Referring to fig. 5, if the current reference video frame is the I0 frame, the P2 frame is determined to be the current target video frame associated with the I0 frame according to the reference relationship represented by the solid arrow, and the P1 frame is a conventional video frame associated with the I0 frame.
In one embodiment, a video frame with a regular identifier between the current reference video frame and the current target video frame is used as a regular video frame associated with the current reference video frame.
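The sketch below (an illustrative, assumed representation rather than the patent's code) determines, for each reference frame, its associated target frame from the cached reference relationship and treats the frames lying between them as its conventional frames:

```python
from typing import Dict, List, Tuple

def associate_frames(
    frames: List[Tuple[int, int]],            # (frame_num, identifier: 1 special / 0 conventional)
    references: Dict[int, int],               # target frame_num -> referenced reference frame_num
):
    """Return {reference frame_num: (target frame_num, [conventional frame_nums])}."""
    associations = {}
    for target, reference in references.items():
        conventional = [n for n, flag in frames
                        if reference < n < target and flag == 0]
        associations[reference] = (target, conventional)
    return associations

frames = [(0, 1), (1, 0), (2, 1), (3, 0), (4, 1), (5, 0)]
references = {2: 0, 4: 2}                     # P2 references I0, P4 references P2 (as in fig. 5)
print(associate_frames(frames, references))
# {0: (2, [1]), 2: (4, [3])}
```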
And step 208, coding the associated target video frame according to each reference video frame to obtain an initial coding result of the candidate code stream of each code stream structure.
In one embodiment, the encoding the associated target video frame according to each reference video frame to obtain an initial encoding result of the candidate code stream of each code stream structure includes: determining a current reference video frame in the candidate code stream, and determining a current target video frame associated with the current reference video frame; performing inter-frame prediction on a current reference video frame to obtain a current prediction result; superposing the current prediction result with the current target video frame to obtain a first coding result corresponding to the current target video frame; taking the first coding result as a new current reference video frame, returning to the step of determining the current target video frame associated with the current reference video frame, and continuing to execute the step until obtaining the first coding result of the target video frame associated with the last reference video frame in the candidate code stream; and taking the first coding result of the target video frame associated with the last reference video frame as the initial coding result of the candidate code stream.
Inter-frame prediction is a compression method based on time redundancy, and achieves the purpose of image compression by utilizing the correlation among video frames, thereby improving the compression rate of video coding.
Specifically, the server determines each current reference video frame in turn according to the order of the frame numbers, and determines the current target video frame associated with the current reference video frame according to the target reference relationship corresponding to the current reference video frame. The server performs inter-frame prediction on the current reference video frame to obtain a current prediction result and superposes the current prediction result on the current target video frame. The current target video frame is usually a P frame; using motion compensation, the P frame transmits the difference between itself and the preceding video frame together with motion vectors, and this difference is also called the prediction error. After the server superposes the current prediction result with the prediction error, a first coding result corresponding to the current target video frame can be reconstructed, and this first coding result is a complete image.
The process then enters the next round of the loop: the first coding result is taken as the new current reference video frame, and the step of determining the current target video frame associated with the current reference video frame is executed again, until the first coding result of the target video frame associated with the last reference video frame in the candidate code stream is obtained. The server takes the first coding result of the target video frame associated with the last reference video frame as the initial coding result of the candidate code stream; this initial coding result is the complete image obtained by iterating through the first coding results of all the target video frames.
In one embodiment, if the current reference video frame is the first video frame, i.e. the I0 frame in fig. 4, a complete image can be reconstructed by directly using the current reference video frame.
In one embodiment, the server performs inter-frame prediction, transform and quantization, loop filtering, entropy coding, and so on, on the current reference video frame according to a video coding standard, including but not limited to the widely used H.264.
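As a hedged sketch of the iterative loop in step 208, the following Python models "inter prediction" of a reference frame as reusing its reconstruction and "superposition" as element-wise addition of the target frame's prediction error; function and variable names are illustrative, not from the patent:

```python
from typing import Dict, List

def encode_target_chain(
    first_reference: List[int],               # reconstructed first reference frame (e.g. I0)
    target_residuals: List[List[int]],        # prediction errors of P2, P4, ... in order
) -> Dict[int, List[int]]:
    """Encode each target frame against the previous reference frame in turn."""
    first_results = {}
    current_reference = first_reference
    for index, residual in enumerate(target_residuals):
        # current prediction result ~ reconstruction of the current reference frame
        prediction = current_reference
        # superpose the prediction and the target frame to get the first coding result
        reconstructed = [p + r for p, r in zip(prediction, residual)]
        first_results[index] = reconstructed
        # the first coding result becomes the new current reference frame
        current_reference = reconstructed
    return first_results

results = encode_target_chain([10, 20, 30], [[1, 1, 1], [2, 0, -1]])
print(results)       # {0: [11, 21, 31], 1: [13, 21, 30]}
# The last entry plays the role of the initial coding result of the candidate stream.
```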
And step 210, for each candidate code stream in the candidate code streams with different code stream structures, obtaining a target code stream corresponding to the candidate code stream of the corresponding code stream structure according to an initial coding result corresponding to the candidate code stream and a conventional video frame.
Specifically, when the candidate code stream is of a full-code-stream structure, the server determines an association relation corresponding to the conventional video frame, determines a reference video frame corresponding to the conventional video frame according to the association relation, and further obtains a second coding result corresponding to the conventional video frame according to the reference video frame and the conventional video frame, wherein the second coding result is a complete image. And when the candidate code stream is of a semi-code stream structure, determining a second coding result corresponding to the conventional video frames, and finally, synthesizing the initial coding result and the second coding result corresponding to each conventional video frame to obtain a target code stream corresponding to the candidate code stream of the corresponding code stream structure.
In one embodiment, the server may be a configurable video encoder. The video encoder may be implemented as a software encoder or a hardware encoder, and sequentially encodes each video frame in the source code stream using a video coding standard. When the H.264 video coding standard is used, hardware encoder compatibility is high, and hierarchical coding can be achieved on a hardware encoder simply by determining the arrangement structure of the source code stream.
According to the video coding method, a plurality of candidate code streams with different code stream structures can be obtained by acquiring a source code stream to be coded, determining the arrangement structure of video frames in the source code stream and further extracting and processing a plurality of video frames in the source code stream according to the arrangement structure; by identifying the reference video frame in each candidate code stream and determining the target video frame and the conventional video frame which are respectively associated with each reference video frame, the associated target video frame can be coded according to each reference video frame, so that the initial coding result of the candidate code stream of each code stream structure is obtained; for each candidate code stream in the candidate code streams of different code stream structures, the target code stream corresponding to the candidate code stream of the corresponding code stream structure can be obtained according to the initial coding result corresponding to the candidate code stream and the conventional video frame. According to the method and the device, a plurality of candidate code streams with different code stream structures can be obtained directly according to the arrangement structure, and compared with the traditional process of carrying out multiple encoding on the same video, the method and the device can obtain the target code stream corresponding to the candidate code stream aiming at the candidate code stream of each code stream structure, so that the flexibility of generating the target code stream suitable for scenes such as different network bandwidths is ensured.
In one embodiment, for each video frame in the source code stream, each video frame is encoded in sequence according to the order of the frame numbers. If the current video frame is a target video frame, coding the target video frame according to a long-term reference frame corresponding to the target video frame to obtain a first coding result corresponding to the target video frame, wherein the long-term reference frame is obtained through a reference relation corresponding to the target video frame; and if the current video frame is a conventional video frame, encoding the conventional video frame according to a short-term reference frame corresponding to the conventional video frame to obtain a second encoding result corresponding to the conventional video frame, wherein the short-term reference frame is obtained through the association relation corresponding to the conventional video frame. And synthesizing the first video frame, the first coding result corresponding to each target video frame and the second coding result corresponding to each conventional video frame until a target code stream of a full code stream structure is obtained.
In one embodiment, if the current video frame is a conventional video frame, the conventional video frame is encoded and filled to obtain a second encoding result corresponding to the conventional video frame. The first video frame, the first coding result corresponding to each target video frame, and the second coding result corresponding to each conventional video frame are synthesized until a target code stream of the half code stream structure is obtained. Because each video frame is encoded in turn, no additional video-encoding delay is introduced.
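The single-pass behaviour described in the two embodiments above can be illustrated with the following sketch (simplified assumptions; the dispatch on frame type and the SKIP-style filler are illustrative, not the patent's implementation): the frames are walked once in frame-number order, target frames are coded against their long-term reference for both output streams, and conventional frames are either coded against their short-term reference (full stream) or replaced by a coded filler (half stream), so both target code streams result from one encoding pass:

```python
from typing import Dict, List, Tuple

def single_pass_encode(
    frames: List[Tuple[int, str]],       # (frame_num, "target" | "conventional"), first frame excluded
    long_term_ref: Dict[int, int],       # target frame_num -> LTR frame_num
    short_term_ref: Dict[int, int],      # conventional frame_num -> STR frame_num
):
    full_stream, half_stream = [], []
    for frame_num, kind in frames:
        if kind == "target":
            coded = ("P", frame_num, long_term_ref[frame_num])               # first coding result
            full_stream.append(coded)
            half_stream.append(coded)                                        # target frames go to both streams
        else:
            full_stream.append(("P", frame_num, short_term_ref[frame_num]))  # second coding result
            half_stream.append(("SKIP", frame_num, None))                    # coded filler
    return full_stream, half_stream

frames = [(1, "conventional"), (2, "target"), (3, "conventional"), (4, "target")]
full, half = single_pass_encode(frames, {2: 0, 4: 2}, {1: 0, 3: 2})
print(full)   # [('P', 1, 0), ('P', 2, 0), ('P', 3, 2), ('P', 4, 2)]
print(half)   # [('SKIP', 1, None), ('P', 2, 0), ('SKIP', 3, None), ('P', 4, 2)]
```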
In one embodiment, obtaining a target code stream corresponding to a candidate code stream of a corresponding code stream structure according to an initial coding result corresponding to the candidate code stream and a conventional video frame includes: determining a current reference video frame associated with a conventional video frame in the candidate code stream of the full code stream structure, and if the current reference video frame is a first reference video frame, overlapping each conventional video frame in the candidate code stream of the full code stream structure with the current reference video frame respectively to obtain a second coding result corresponding to each conventional video frame; if the current reference video frame is a non-first reference video frame, determining a first coding result of the current reference video frame, and overlapping each conventional video frame in the candidate code stream of the full code stream structure with the first coding result to obtain a second coding result corresponding to each conventional video frame; and synthesizing each second coding result and the preliminary coding result of the candidate code stream of the full code stream structure to obtain a target code stream corresponding to the candidate code stream of the full code stream structure.
Wherein, the code stream structure comprises a full code stream structure.
Specifically, when the current reference video frame is the first reference video frame and the first reference video frame is the first video frame in the source code stream, the complete image corresponding to the current reference video frame can be constructed directly. Therefore, following the specific implementation of superposing the current prediction result and the current target video frame described in step S208, the server superposes each conventional video frame on the current reference video frame to obtain the second encoding result corresponding to each conventional video frame. Referring to fig. 5, when the current reference video frame is the I0 frame, after the prediction result for the P1 frame is obtained from the I0 frame, the prediction result is superposed with the prediction error in the P1 frame to reconstruct the second coding result corresponding to the P1 frame, and this second coding result is a complete image.
Similarly, when the current reference video frame is not the first reference video frame, the first coding result corresponding to the current reference video frame needs to be determined; this first coding result is the one obtained when the current reference video frame was itself coded as a target video frame. For example, when the current reference video frame is the P2 frame, the first encoding result corresponding to the P2 frame is the complete picture corresponding to P2. The server therefore superposes each conventional video frame on this first encoding result to obtain the second encoding result corresponding to each conventional video frame. The server arranges the second coding results according to the order of the frame numbers and synthesizes them with the preliminary coding result of the candidate code stream of the full code stream structure to obtain the target code stream corresponding to the candidate code stream of the full code stream structure; this target code stream can be regarded as a complete video formed by the plurality of video frames obtained after decoding.
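A sketch of this case distinction, under simplified assumptions and with illustrative names, follows: each conventional frame is superposed either on the first reference frame itself or on its associated reference frame's first coding result:

```python
from typing import Dict, List

def encode_conventional_frames_full(
    conventional_residuals: Dict[int, List[int]],    # frame_num -> prediction error
    reference_of: Dict[int, int],                    # frame_num -> associated reference frame_num
    first_reference_num: int,
    first_reference_picture: List[int],
    first_results: Dict[int, List[int]],             # reference frame_num -> its first coding result
) -> Dict[int, List[int]]:
    second_results = {}
    for frame_num, residual in conventional_residuals.items():
        ref = reference_of[frame_num]
        if ref == first_reference_num:
            base = first_reference_picture             # superpose on the first reference frame
        else:
            base = first_results[ref]                  # superpose on its first coding result
        second_results[frame_num] = [b + r for b, r in zip(base, residual)]
    return second_results

# P1 references I0 (first reference frame), P3 references P2 (whose first coding result is known).
print(encode_conventional_frames_full(
    conventional_residuals={1: [1, 0, 0], 3: [0, 2, 0]},
    reference_of={1: 0, 3: 2},
    first_reference_num=0,
    first_reference_picture=[10, 20, 30],
    first_results={2: [11, 21, 31]},
))
# {1: [11, 20, 30], 3: [11, 23, 31]}
```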
In one embodiment, the server may be a video decoder. And synthesizing each second coding result and the preliminary coding result of the candidate code stream of the full code stream structure to obtain a target code stream corresponding to the candidate code stream of the full code stream structure, wherein the process can be a video decoding process.
In this embodiment, for the candidate code stream of the full code stream structure, by determining whether the current reference video frame is the first reference video frame, and then determining the second encoding result corresponding to each conventional video frame in different manners, the target code stream corresponding to the candidate code stream of the full code stream structure can be efficiently and accurately obtained.
In one embodiment, obtaining a target code stream corresponding to a candidate code stream of a corresponding code stream structure according to an initial coding result corresponding to the candidate code stream and a conventional video frame includes: each conventional video frame in the candidate code stream of the semi-code stream structure is coded and filled to obtain a second coding result corresponding to each conventional video frame; and synthesizing each second coding result and the preliminary coding result of the candidate code stream of the half code stream structure to obtain a target code stream corresponding to the candidate code stream of the half code stream structure.
Wherein, the code stream structure comprises a half code stream structure.
Specifically, the server encodes and fills each conventional video frame in the candidate code stream of the semi-code stream structure with a special macroblock to obtain the second encoding result corresponding to each conventional video frame; this second encoding result contains a small amount of data. The SKIP MB is a special macroblock in the H.264 video coding standard: its motion vector is predicted from the neighbouring blocks and its residual data is all 0, so a macroblock of the conventional video frame can be regarded as a copy of the macroblock at the corresponding position in its reference video frame. The number of coding bits required by this special macroblock is therefore small.
As shown in fig. 6, each of the P1 frame, the P3 frame, the P5 frame, and so on uses an artificially coded frame built from special macroblocks and having a very small data amount. When the target code stream needs to be decoded, for the second coding result corresponding to each conventional video frame, the first reference video frame preceding the current conventional video frame in frame-number order is determined and copied, i.e., the complete image corresponding to that reference video frame is used as the complete image of the current conventional video frame. The server arranges the complete images corresponding to each conventional video frame according to the order of the frame numbers and synthesizes them with the preliminary coding result of the candidate code stream of the half-code stream structure to obtain the target code stream corresponding to the candidate code stream of the half-code stream structure.
In this embodiment, for the candidate code stream of the semi-code stream structure, each conventional video frame is encoded and filled with the special macroblock, so that during video decoding the complete image of the corresponding reference video frame can be copied directly. The target code stream corresponding to the candidate code stream of the semi-code stream structure can therefore be obtained efficiently, and the load on the video decoder is also reduced.
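The half-stream handling above can be illustrated with the following sketch, under simplified assumptions (structures and names are not from the patent): each dropped conventional frame is replaced by a SKIP-style placeholder carrying no residual, so that decoding it amounts to copying the reconstructed picture of the nearest preceding reference frame:

```python
from typing import Dict, List

def fill_and_decode_half_stream(
    reference_pictures: Dict[int, List[int]],    # frame_num -> reconstructed reference (LTR) picture
    conventional_frame_nums: List[int],          # frame numbers of the SKIP-filled frames
) -> Dict[int, List[int]]:
    decoded = dict(reference_pictures)
    for frame_num in sorted(conventional_frame_nums):
        # SKIP-filled frame: copy the nearest preceding reference picture.
        nearest_ref = max(n for n in reference_pictures if n < frame_num)
        decoded[frame_num] = list(reference_pictures[nearest_ref])
    return decoded

decoded = fill_and_decode_half_stream(
    reference_pictures={0: [10, 20, 30], 2: [11, 21, 31], 4: [13, 21, 30]},
    conventional_frame_nums=[1, 3, 5],
)
print(decoded[1], decoded[3], decoded[5])
# [10, 20, 30] [11, 21, 31] [13, 21, 30]
```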
It should be understood that, although the steps in the flowcharts related to the above embodiments are shown in sequence as indicated by the arrows, the steps are not necessarily performed in sequence as indicated by the arrows. The steps are not performed in the exact order shown and described, and may be performed in other orders, unless explicitly stated otherwise. Moreover, at least a part of the steps in the flowcharts related to the above embodiments may include multiple steps or multiple stages, which are not necessarily performed at the same time, but may be performed at different times, and the order of performing the steps or stages is not necessarily sequential, but may be performed alternately or alternately with other steps or at least a part of the steps or stages in other steps.
Based on the same inventive concept, the embodiment of the present application further provides a video encoding apparatus for implementing the video encoding method mentioned above. The implementation scheme for solving the problem provided by the apparatus is similar to the implementation scheme described in the above method, so specific limitations in one or more embodiments of the video encoding apparatus provided below can be referred to the limitations of the video encoding method in the foregoing, and details are not described herein again.
In one embodiment, as shown in fig. 7, there is provided a video encoding apparatus 700, including: a candidate code stream determining module 702, a video frame identifying module 704 and a target code stream determining module 706, wherein:
a candidate code stream determining module 702, configured to obtain a source code stream to be encoded, and determine an arrangement structure of video frames in the source code stream; extracting a plurality of video frames in the source code stream according to the arrangement structure to obtain a plurality of candidate code streams with different code stream structures;
a video frame identification module 704, configured to identify a reference video frame in each candidate code stream, and determine a target video frame and a conventional video frame associated with each reference video frame;
a target code stream determining module 706, configured to encode the associated target video frame according to each reference video frame, respectively, to obtain an initial encoding result of a candidate code stream of each code stream structure; and for each candidate code stream in the candidate code streams with different code stream structures, obtaining a target code stream corresponding to the candidate code stream of the corresponding code stream structure according to the initial coding result corresponding to the candidate code stream and the conventional video frame.
In an embodiment, the candidate code stream determining module 702 further includes an arrangement structure module 7021, configured to perform field coding on the source code stream to obtain a plurality of video frames arranged in sequence; the frame number of the video frame is n, and n is a positive integer; determining a video frame with a special frame number from a plurality of video frames; the special frame number at least comprises n +2; and obtaining the arrangement structure of the video frames in the source code stream according to the video frames with the special frame numbers.
In one embodiment, the arrangement structure module 7021 is further configured to, in response to the association operation on the video frames with the special frame numbers, obtain the video frames with the special identifiers and a reference relationship between each video frame with the special identifier; responding to the configuration operation of video frames with other frame numbers in the plurality of video frames to obtain video frames with conventional identifications; and synthesizing the special identification and the conventional identification of the video frame and each reference relation to obtain the arrangement structure of the video frame in the source code stream.
In one embodiment, the candidate code stream determining module 702 further includes an extracting module 7022, which extracts the video frames corresponding to the special identifier and the general identifier from the source code stream to obtain a first code stream with a full code stream structure; deleting the video frame corresponding to the conventional identifier in the first code stream, and storing the frame number and the conventional identifier of the video frame corresponding to the conventional identifier to obtain a second code stream of a semi-code stream structure.
In an embodiment, the video frame identification module 704 is configured to, for each reference video frame in the candidate code stream, obtain a target reference relationship corresponding to the current reference video frame; obtaining a current target video frame associated with the current reference video frame according to the target reference relation; and taking each video frame between the current reference video frame and the current target video frame as a conventional video frame associated with the current reference video frame.
In one embodiment, the target code stream determining module 706 includes an initial encoding result module 7061, configured to determine a current reference video frame in the candidate code stream, and determine a current target video frame associated with the current reference video frame; performing inter-frame prediction on a current reference video frame to obtain a current prediction result; superposing the current prediction result with the current target video frame to obtain a first coding result corresponding to the current target video frame; taking the first coding result as a new current reference video frame, returning to the step of determining the current target video frame associated with the current reference video frame, and continuing to execute the step until obtaining the first coding result of the target video frame associated with the last reference video frame in the candidate code stream; and taking the first coding result of the target video frame associated with the last reference video frame as the initial coding result of the candidate code stream.
In one embodiment, the target code stream determining module 706 includes a video frame overlapping module 7062, configured to: determine the current reference video frame associated with a conventional video frame in the candidate code stream of the full code stream structure; if the current reference video frame is the first reference video frame, overlap each conventional video frame in the candidate code stream of the full code stream structure with the current reference video frame to obtain a second coding result corresponding to each conventional video frame; if the current reference video frame is not the first reference video frame, determine the first coding result of the current reference video frame and overlap each conventional video frame in the candidate code stream of the full code stream structure with that first coding result to obtain a second coding result corresponding to each conventional video frame; and synthesize each second coding result with the initial coding result of the candidate code stream of the full code stream structure to obtain the target code stream corresponding to the candidate code stream of the full code stream structure.
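A possible sketch of this overlapping and synthesis step is given below, reusing the hypothetical grouping, first-coding-result, and superposition names from the earlier sketches; the returned dictionary is only a stand-in for the synthesized target code stream.

```python
def encode_full_stream(groups, frames_by_number, first_results, superpose, initial_result):
    """Assemble the target code stream for the full-code-stream candidate."""
    first_ref_no = groups[0]["reference"]
    second_results = []
    for group in groups:
        ref_no = group["reference"]
        # first reference frame: overlap with the frame itself;
        # otherwise: overlap with its first coding result
        basis = frames_by_number[ref_no] if ref_no == first_ref_no else first_results[ref_no]
        for conv_no in group["conventional"]:
            second_results.append(superpose(frames_by_number[conv_no], basis))
    return {"initial": initial_result, "second": second_results}
```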
In one embodiment, the target code stream determining module 706 includes a coding and filling module 7063, configured to: perform coding and filling on each conventional video frame in the candidate code stream of the half code stream structure to obtain a second coding result corresponding to each conventional video frame; and synthesize each second coding result with the initial coding result of the candidate code stream of the half code stream structure to obtain the target code stream corresponding to the candidate code stream of the half code stream structure.
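For the half-code-stream candidate, the coding-and-filling step might be approximated as below. The fixed filler payload is an assumption, since the application does not specify the exact filling rule; only the stored frame numbers and identifiers of the dropped conventional frames are used.

```python
def encode_half_stream(half_stream, initial_result, filler=b"\x00"):
    """Assemble the target code stream for the half-code-stream candidate."""
    second_results = [
        {"number": f["number"], "identifier": f["identifier"], "payload": filler}
        for f in half_stream
        if f["identifier"] == "conventional"
    ]
    return {"initial": initial_result, "second": second_results}
```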
The various modules in the video encoding apparatus described above may be implemented in whole or in part by software, hardware, or a combination thereof. Each module may be embedded in, or independent of, a processor of the computer device in hardware form, or stored in a memory of the computer device in software form, so that the processor can invoke and execute the operations corresponding to each module.
In one embodiment, a computer device is provided, which may be a server, and whose internal structure may be as shown in Fig. 8. The computer device comprises a processor, a memory, an input/output (I/O) interface, and a communication interface. The processor, the memory, and the I/O interface are connected through a system bus, and the communication interface is connected to the system bus through the I/O interface. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database, and the internal memory provides an environment for running the operating system and the computer program in the non-volatile storage medium. The database of the computer device is used for storing video coding data. The I/O interface of the computer device is used for exchanging information between the processor and external devices, and the communication interface is used for connecting to and communicating with external terminals through a network. The computer program, when executed by the processor, implements a video encoding method.
It will be appreciated by those skilled in the art that the structure shown in Fig. 8 is a block diagram of only part of the structure related to the solution of the present application and does not limit the computer device to which the solution is applied; a particular computer device may include more or fewer components than those shown, combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is further provided, which includes a memory and a processor, the memory stores a computer program, and the processor implements the steps of the above method embodiments when executing the computer program.
In an embodiment, a computer-readable storage medium is provided, in which a computer program is stored which, when being executed by a processor, carries out the steps of the above-mentioned method embodiments.
In one embodiment, a computer program product or computer program is provided that includes computer instructions stored in a computer-readable storage medium. The computer instructions are read by a processor of a computer device from a computer-readable storage medium, and the computer instructions are executed by the processor to cause the computer device to perform the steps in the above-mentioned method embodiments.
It will be understood by those skilled in the art that all or part of the processes in the methods of the above embodiments may be implemented by a computer program instructing the relevant hardware; the computer program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the above method embodiments. Any reference to memory, database, or other medium used in the embodiments provided herein may include at least one of non-volatile and volatile memory. Non-volatile memory may include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, resistive random access memory (ReRAM), magnetoresistive random access memory (MRAM), ferroelectric random access memory (FRAM), phase change memory (PCM), graphene memory, and the like. Volatile memory may include random access memory (RAM), external cache memory, and the like. By way of illustration and not limitation, RAM may take many forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM). The databases referred to in the various embodiments provided herein may include at least one of relational and non-relational databases; non-relational databases may include, but are not limited to, blockchain-based distributed databases and the like. The processors referred to in the various embodiments provided herein may be, without limitation, general-purpose processors, central processing units, graphics processors, digital signal processors, programmable logic devices, data processing logic devices based on quantum computing, or the like.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, any such combination should be regarded as falling within the scope of this specification as long as it contains no contradiction.
The above examples express only several embodiments of the present application, and their description is relatively specific and detailed, but they are not to be construed as limiting the scope of the application. It should be noted that a person skilled in the art may make several variations and modifications without departing from the concept of the present application, and these all fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the appended claims.

Claims (12)

1. A method of video encoding, the method comprising:
acquiring a source code stream to be coded, and determining an arrangement structure of video frames in the source code stream;
extracting a plurality of video frames in the source code stream according to the arrangement structure to obtain a plurality of candidate code streams with different code stream structures;
identifying a reference video frame in each candidate code stream, and determining a target video frame and a conventional video frame which are respectively associated with each reference video frame;
respectively coding the associated target video frame according to each reference video frame to obtain an initial coding result of the candidate code stream of each code stream structure;
and for each candidate code stream in the candidate code streams with different code stream structures, obtaining a target code stream corresponding to the candidate code stream of the corresponding code stream structure according to an initial coding result corresponding to the candidate code stream and the conventional video frame.
2. The method of claim 1, wherein the determining an arrangement structure of video frames in the source code stream comprises:
performing field coding on the source code stream to obtain a plurality of video frames which are sequentially arranged; the frame number of the video frame is n, and n is a positive integer;
determining a video frame with a special frame number from the plurality of video frames; the special frame number at least comprises n+2;
and obtaining the arrangement structure of the video frames in the source code stream according to the video frames with the special frame numbers.
3. The method according to claim 2, wherein obtaining the arrangement structure of the video frames in the source code stream according to the video frames with the special frame numbers comprises:
obtaining, in response to an association operation on the video frames with the special frame numbers, video frames with special identifiers and the reference relation between the video frames with the special identifiers;
obtaining video frames with conventional identifiers in response to a configuration operation on the video frames with other frame numbers in the plurality of video frames;
and synthesizing the special identifiers, the conventional identifiers, and each reference relation to obtain the arrangement structure of the video frames in the source code stream.
4. The method of claim 1, wherein the arrangement structure comprises a special identifier and a conventional identifier; the candidate code streams comprise a first code stream of a full code stream structure and a second code stream of a half code stream structure; and the extracting a plurality of video frames in the source code stream according to the arrangement structure to obtain a plurality of candidate code streams with different code stream structures includes:
extracting video frames corresponding to the special identifier and the conventional identifier from the source code stream to obtain a first code stream of a full code stream structure;
deleting the video frame corresponding to the conventional identifier in the first code stream, and storing the frame number and the conventional identifier of the video frame corresponding to the conventional identifier to obtain a second code stream of a half code stream structure.
5. The method of claim 1, wherein determining the respective associated target video frame and regular video frame for each of the reference video frames comprises:
for each reference video frame in the candidate code stream, acquiring a target reference relation corresponding to the current reference video frame;
obtaining a current target video frame associated with the current reference video frame according to the target reference relation;
and taking each video frame between the current reference video frame and the current target video frame as a conventional video frame associated with the current reference video frame.
6. The method according to claim 1, wherein said encoding the associated target video frame according to each of the reference video frames to obtain an initial coding result of the candidate code stream of each code stream structure comprises:
determining a current reference video frame in the candidate code stream, and determining a current target video frame associated with the current reference video frame;
performing inter-frame prediction on the current reference video frame to obtain a current prediction result;
superposing the current prediction result and the current target video frame to obtain a first coding result corresponding to the current target video frame;
taking the first coding result as a new current reference video frame, returning to the step of determining the current target video frame associated with the current reference video frame, and continuing to execute the step until a first coding result of a target video frame associated with the last reference video frame in the candidate code stream is obtained;
and taking the first coding result of the target video frame associated with the last reference video frame as the initial coding result of the candidate code stream.
7. The method of claim 1, wherein the code stream structure comprises a full code stream structure; the obtaining of the target code stream corresponding to the candidate code stream of the corresponding code stream structure according to the initial coding result corresponding to the candidate code stream and the conventional video frame includes:
determining a current reference video frame associated with a conventional video frame in the candidate code stream of the full code stream structure, and if the current reference video frame is a first reference video frame, overlapping each conventional video frame in the candidate code stream of the full code stream structure with the current reference video frame respectively to obtain a second coding result corresponding to each conventional video frame;
if the current reference video frame is a non-first reference video frame, determining a first coding result of the current reference video frame, and overlapping each conventional video frame in the candidate code stream of the full code stream structure with the first coding result to obtain a second coding result corresponding to each conventional video frame;
and synthesizing each second coding result and the initial coding result of the candidate code stream of the full code stream structure to obtain a target code stream corresponding to the candidate code stream of the full code stream structure.
8. The method of claim 1, wherein the code stream structure comprises a half code stream structure; the obtaining of the target code stream corresponding to the candidate code stream of the corresponding code stream structure according to the initial coding result corresponding to the candidate code stream and the conventional video frame includes:
coding and filling each conventional video frame in the candidate code stream of the half code stream structure to obtain a second coding result corresponding to each conventional video frame;
and synthesizing each second coding result and the initial coding result of the candidate code stream of the half code stream structure to obtain a target code stream corresponding to the candidate code stream of the half code stream structure.
9. An apparatus for video encoding, the apparatus comprising:
the candidate code stream determining module is used for acquiring a source code stream to be coded and determining an arrangement structure of video frames in the source code stream; extracting a plurality of video frames in the source code stream according to the arrangement structure to obtain a plurality of candidate code streams with different code stream structures;
the video frame identification module is used for identifying a reference video frame in each candidate code stream and determining a target video frame and a conventional video frame which are respectively associated with each reference video frame;
the target code stream determining module is used for coding the associated target video frames according to each reference video frame to obtain an initial coding result of the candidate code stream of each code stream structure; and for each candidate code stream in the candidate code streams with different code stream structures, obtaining a target code stream corresponding to the candidate code stream of the corresponding code stream structure according to the initial coding result corresponding to the candidate code stream and the conventional video frame.
10. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method of any of claims 1 to 8.
11. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 8.
12. A computer program product comprising a computer program, characterized in that the computer program realizes the steps of the method of any one of claims 1 to 8 when executed by a processor.
CN202211423853.7A 2022-11-15 2022-11-15 Video encoding method, video encoding device, computer equipment and storage medium Pending CN115866245A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211423853.7A CN115866245A (en) 2022-11-15 2022-11-15 Video encoding method, video encoding device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211423853.7A CN115866245A (en) 2022-11-15 2022-11-15 Video encoding method, video encoding device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115866245A true CN115866245A (en) 2023-03-28

Family

ID=85663436

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211423853.7A Pending CN115866245A (en) 2022-11-15 2022-11-15 Video encoding method, video encoding device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115866245A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117061824A (en) * 2023-10-12 2023-11-14 深圳云天畅想信息科技有限公司 Streaming media video frame supplementing method and device and computer equipment
CN117061824B (en) * 2023-10-12 2024-01-26 深圳云天畅想信息科技有限公司 Streaming media video frame supplementing method and device and computer equipment

Similar Documents

Publication Publication Date Title
US11936884B2 (en) Coded-block-flag coding and derivation
CN112715027B (en) Neural network driven codec
CN110944185B (en) Video decoding method and device, computer equipment and storage medium
US9380314B2 (en) Pixel retrieval for frame reconstruction
WO2015078422A1 (en) Image encoding and decoding method and device
CN113766249B (en) Loop filtering method, device, equipment and storage medium in video coding and decoding
WO2021037041A1 (en) Data decoding method and apparatus, and data coding method and apparatus
JP2022500890A (en) Video image component prediction methods, devices and computer storage media
CN115866245A (en) Video encoding method, video encoding device, computer equipment and storage medium
CN114786007A (en) Intelligent video transmission method and system combining coding and image super-resolution
WO2024104014A1 (en) Video compression method and apparatus, video decompression method and apparatus, and computer device and storage medium
CN112004114B (en) Video processing method and device, readable storage medium and electronic equipment
WO2023193701A1 (en) Image coding method and apparatus
WO2021056214A1 (en) Encoding method, decoding method, encoder, decoder and storage medium
WO2023225808A1 (en) Learned image compress ion and decompression using long and short attention module
WO2023077707A1 (en) Video encoding method, model training method, device, and storage medium
WO2022110870A1 (en) Image encoding and decoding method, encoding and decoding apparatus, encoder, and decoder
WO2021012942A1 (en) Residual coding method and device, residual decoding method and device, storage medium, and electronic device
CN111212288B (en) Video data encoding and decoding method and device, computer equipment and storage medium
CN113422960A (en) Image transmission method and device
US20200382767A1 (en) Motion compensation reference frame compression
CN114449280B (en) Video coding and decoding method, device and equipment
WO2023050431A1 (en) Encoding method, decoding method, decoder, encoder and computer-readable storage medium
US20240236378A1 (en) Encoding method, decoding method, and decoder
EP3989566A1 (en) Motion information list construction method in video encoding and decoding, device, and apparatus

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination