
WO2022061613A1 - Video coding apparatus and method, and computer storage medium and mobile platform

Info

Publication number: WO2022061613A1
Application number: PCT/CN2020/117220
Authority: WO
Inventors: 王悦名; 郑萧桢
Original assignee: 深圳市大疆创新科技有限公司
Priority date: 2020-09-23
Filing date: 2020-09-23
Publication date: 2022-03-31
Also published as: CN113454997A

Abstract

A video coding apparatus and method, a computer storage medium, and a mobile platform. The video coding apparatus comprises: an integer pixel search module for determining, within a plurality of predetermined ranges in a plurality of reference frames, a matching block that matches the current block in the current frame; a sub-pixel search module, electrically connected to the integer pixel search module, for determining at least one sub-pixel matching block with respect to the matching block; and a mode decision module, electrically connected to the sub-pixel search module, for performing mode decision using at least the coding cost of the sub-pixel matching block, so as to obtain an optimal prediction block of the current block for video coding. The sub-pixel search module comprises a half-pixel interpolation module that can use a first interpolation filter to perform half-pixel interpolation on both video streams in the H.264 coding format and video streams in the H.265 coding format. The solution multiplexes part of the hardware structure to code video streams in both the H.264 and H.265 coding formats, thereby saving hardware area.

Description

Video encoding apparatus, method, computer storage medium, and movable platform


Technical field

The present invention relates to the technical field of video coding, and in particular to a video coding apparatus, a video coding method, a computer storage medium and a movable platform.

Background

The H.264 video coding standard is a high-compression digital video codec standard proposed by the Joint Video Team (JVT), formed jointly by the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG). Compared with previous video coding standards, the H.264 video coding standard provides better image quality at the same bandwidth. The H.265 video coding standard is a newer video coding standard formulated by the ITU-T Video Coding Experts Group as the successor to the H.264 video coding standard. It retains some technologies of the H.264 video coding standard and improves on them.

Usually, a chip contains multiple independent encoders to implement video encoding under different video encoding standards. To encode video streams in both the H.264 and H.265 formats, two different encoders must be provided. However, multiple encoders consume more hardware area.

SUMMARY OF THE INVENTION

A series of concepts are introduced in simplified form in this Summary section and are described in further detail in the Detailed Description section. This Summary is not intended to define the key features or essential technical features of the claimed technical solution, nor is it intended to determine the protection scope of the claimed technical solution.

In view of the deficiencies of the prior art, the first aspect of the embodiments of the present invention provides a video encoding apparatus, where the video encoding apparatus includes:

an integer pixel search module for determining, within a plurality of predetermined ranges in a plurality of reference frames, a matching block that matches the current block in the current frame;

a sub-pixel search module, electrically connected to the integer pixel search module, configured to determine at least one sub-pixel matching block with respect to the matching block;

a mode decision module, electrically connected to the sub-pixel search module, for performing mode decision using at least the coding cost of the sub-pixel matching block to obtain the optimal prediction block of the current block for video coding;

wherein the sub-pixel search module includes a half-pixel interpolation module, and the half-pixel interpolation module can use a first interpolation filter to perform half-pixel interpolation on a video stream in the H.264 encoding format and on a video stream in the H.265 encoding format.

A second aspect of the embodiments of the present invention provides a video encoding method, where the video encoding method includes:

an integer pixel search module determines, within a plurality of predetermined ranges in a plurality of reference frames, a matching block that matches the current block in the current frame;

a sub-pixel search module electrically connected to the integer pixel search module determines at least one sub-pixel matching block with respect to the matching block, wherein the sub-pixel search module includes a half-pixel interpolation module, and determining the at least one sub-pixel matching block includes: the half-pixel interpolation module using a first interpolation filter to perform half-pixel interpolation on a video stream in the H.264 encoding format or a video stream in the H.265 encoding format;

a mode decision module electrically connected to the sub-pixel search module performs mode decision using at least the coding cost of the sub-pixel matching block to obtain an optimal prediction block of the current block for video encoding.

A third aspect of the embodiments of the present invention provides a computer storage medium on which a computer program is stored, and when the computer program is executed by a processor, implements the steps of the above video encoding method.

A fourth aspect of the embodiments of the present invention provides a movable platform, where the movable platform includes an imaging device and the above video encoding apparatus, where the imaging device is used to collect video data, and the video encoding apparatus is used to perform video encoding on the video data collected by the imaging device.

The video encoding apparatus, method, computer storage medium and movable platform of the embodiments of the present invention multiplex part of the hardware structure to encode video streams in H.264 encoding format and H.265 encoding format, saving hardware area.

Description of drawings

In order to illustrate the technical solutions in the embodiments of the present application more clearly, the following briefly introduces the accompanying drawings used in the description of the embodiments. Obviously, the accompanying drawings in the following description are only some embodiments of the present invention. For those of ordinary skill in the art, other drawings can also be obtained from these drawings without creative labor.

In the drawings:

FIG. 1 shows a structural block diagram of a video encoding apparatus according to an embodiment of the present invention;

FIG. 2 shows a schematic diagram of a pipeline stage of a video encoding apparatus according to an embodiment of the present invention;

FIG. 3 shows a flowchart of a video encoding method according to an embodiment of the present invention;

FIG. 4 shows a structural block diagram of a movable platform according to an embodiment of the present invention.

Detailed description

In order to make the objects, technical solutions and advantages of the present invention more apparent, exemplary embodiments according to the present invention will be described in detail below with reference to the accompanying drawings. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of the embodiments of the present invention, and it should be understood that the present invention is not limited by the example embodiments described herein. Based on the embodiments of the present invention described in the present invention, all other embodiments obtained by those skilled in the art without creative efforts shall fall within the protection scope of the present invention.

In the following description, numerous specific details are set forth in order to provide a more thorough understanding of the present invention. It will be apparent, however, to one skilled in the art that the present invention may be practiced without one or more of these details. In other instances, some technical features known in the art have not been described in order to avoid obscuring the present invention.

It should be understood that the present invention may be embodied in different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art.

The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used herein, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly dictates otherwise. It should also be understood that the terms "comprise" and/or "include", when used in this specification, specify the presence of stated features, integers, steps, operations, elements and/or components, but do not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components and/or groups. As used herein, the term "and/or" includes any and all combinations of the associated listed items.

For a thorough understanding of the present invention, detailed steps and detailed structures will be proposed in the following description to explain the technical solutions proposed by the present invention. Preferred embodiments of the present invention are described in detail below, however, the present invention may have other embodiments in addition to these detailed descriptions.

Both the H.264 and H.265 video coding standards adopt a hybrid coding framework that includes basic processes such as prediction, transform, quantization, inverse quantization, inverse transform, entropy coding, and loop filtering. Specifically, a video frame input to the video coding device is first divided into blocks, which are macroblocks in the H.264 video coding standard and coding tree units in the H.265 video coding standard. Each block can then be further divided into smaller sub-blocks. Each divided sub-block is first predicted. Prediction is divided into intra-frame prediction and inter-frame prediction: intra-frame prediction uses already encoded image blocks in the same frame to predict the current block, and inter-frame prediction uses already encoded image blocks in one or more previous frames to predict the current block.

The prediction block of the current block is obtained through the above prediction process, and the residual block is obtained by subtracting the prediction block from the current block. The video encoding apparatus then transforms the residual block, converting it from the spatial domain to the frequency domain, and quantizes the frequency-domain coefficients to reduce their magnitude.

The quantized coefficients are, on the one hand, sent to the entropy encoder together with the coding mode information to obtain a binary code stream; on the other hand, they undergo inverse quantization and inverse transformation to restore the prediction residual block (i.e., the reconstructed residual block), and the reconstructed residual block is added to the prediction block to obtain the reconstructed block. Finally, in-loop filtering is performed on the reconstructed image to obtain the final reconstructed image, which is then provided to subsequently encoded images for inter-frame prediction.
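The residual, quantization and reconstruction path described above can be sketched for a single block as follows. This is only an illustration under stated assumptions, not the circuit of the embodiment: the transform step is omitted (treated as identity), and the function name `encode_block`, the uniform quantizer and its step size `qstep` are hypothetical.

```python
def encode_block(cur, pred, qstep=8):
    """Return (levels, recon) for one block given as lists of rows.

    Assumptions: no transform (identity), uniform quantizer with step `qstep`,
    8-bit samples. A real encoder transforms the residual before quantizing.
    """
    levels, recon = [], []
    for cur_row, pred_row in zip(cur, pred):
        lv_row, rc_row = [], []
        for c, p in zip(cur_row, pred_row):
            r = c - p                                     # residual = current - prediction
            sign = -1 if r < 0 else 1
            lv = sign * ((abs(r) + qstep // 2) // qstep)  # quantization with rounding
            rr = lv * qstep                               # inverse quantization
            rc_row.append(max(0, min(255, p + rr)))       # reconstruction, clipped to 8 bits
            lv_row.append(lv)
        levels.append(lv_row)
        recon.append(rc_row)
    return levels, recon
```

The quantized `levels` stand in for what would be passed to the entropy encoder, while `recon` stands in for the reconstructed block that later serves as reference data.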

Generally speaking, a chip includes multiple encoding devices, each performing video encoding based on its corresponding video encoding standard. If each encoding device is independent, a large hardware area is consumed. Since the H.264 and H.265 video coding standards both adopt similar hybrid coding frameworks and have many similar or identical modules, the video coding apparatus according to the embodiment of the present invention integrates these parts and reuses the same hardware to perform video encoding in both the H.264 and H.265 encoding formats, thereby saving hardware area.

The video encoding apparatus, method, computer storage medium, and movable platform according to the embodiments of the present invention will be described in detail below with reference to the accompanying drawings. The features of the embodiments and implementations described below may be combined with each other without conflict.

FIG. 1 shows a structural block diagram of a video encoding apparatus 100 according to an embodiment of the present invention. As shown in FIG. 1, the video encoding apparatus 100 includes at least an integer pixel search module 110, a sub-pixel search module 120 and a mode decision module 130. The integer pixel search module 110 is used to determine, within a plurality of predetermined ranges in a plurality of reference frames, a matching block that matches the current block in the current frame. The sub-pixel search module 120 is electrically connected to the integer pixel search module 110 and is used to determine at least one sub-pixel matching block with respect to the matching block. The mode decision module 130 is electrically connected to the sub-pixel search module 120 and is used to perform mode decision using at least the coding cost of the sub-pixel matching block, so as to obtain the optimal prediction block of the current block for video encoding. The sub-pixel search module 120 includes a half-pixel interpolation module, which can use a first interpolation filter to perform half-pixel interpolation on both a video stream in the H.264 encoding format and a video stream in the H.265 encoding format. In one embodiment, after the mode decision module 130 determines the optimal prediction block, it subtracts the prediction block from the current block to obtain a residual block. Next, the mode decision module 130 transforms the residual block to obtain a coefficient block, and quantizes the coefficient block to obtain a quantized coefficient block. Finally, the mode decision module 130 transmits the quantized coefficient block and mode information to the entropy encoding module for entropy encoding. The mode information includes at least information related to block division and prediction mode.

The video encoding apparatus 100 according to the embodiment of the present invention multiplexes the same hardware structure to encode video streams in both the H.264 and H.265 encoding formats, thereby saving hardware area. The multiplexed hardware structure includes at least the half-pixel interpolation module; that is, the half-pixel interpolation module in the video encoding apparatus 100 can perform half-pixel interpolation on video streams in either of the two encoding formats, H.264 and H.265.

The integer pixel search module 110 and the sub-pixel search module 120 are used to perform inter-frame prediction on a video stream in the H.264 or H.265 encoding format, finding the matching block of the current block in the reference frame and thereby eliminating temporal redundancy on the basis of already encoded video frames. The integer pixel search module 110 is also used to determine a first motion vector between the current block and the matching block; the sub-pixel search module 120 is also used to determine a second motion vector of the current block relative to the at least one sub-pixel matching block. The precision of the second motion vector is higher than that of the first motion vector: the first motion vector has integer-pixel precision, and the second motion vector has sub-pixel precision. For the H.264 format, the current block and the matching block are macroblocks or sub-macroblocks; for H.265, the current block is a coding unit and the matching block is a prediction unit.

Specifically, the H.264 coding format supports macroblock and sub-macroblock partitions of 7 different sizes and shapes. For the luminance component it provides four macroblock partition modes, 16×16, 16×8, 8×16 and 8×8; an 8×8 partition can be further divided into sub-macroblock partitions of 8×4, 4×8 and 4×4; and each macroblock or sub-macroblock partition has its own motion vector. In the H.265 coding format, the corresponding structure is the coding tree unit (CTU), whose size ranges from a minimum of 16×16 to a maximum of 64×64. A CTU contains one luma coding tree block (CTB) and two chroma CTBs at the same location, as well as the corresponding syntax elements. A CTB can be used directly as a coding block (CB), or can be further divided into multiple smaller CBs in the form of a quadtree. One luma CB, two chroma CBs and the related syntax elements together form a coding unit (CU). Each CU can be divided into one or more prediction units (PUs), and each PU has its own motion vector, which can be used to obtain prediction information from the reconstructed reference frame. The current block in this embodiment of the present application refers to the smallest prediction unit divided according to the corresponding video coding standard, and the size of the current block may differ at different positions in the current frame.

For video streams in the H.264 and H.265 encoding formats, the integer pixel search is performed in basically the same way, so the integer pixel search module 110 uses exactly the same hardware structure to perform integer pixel search on video streams in both formats. Specifically, it may include a candidate motion vector acquisition sub-module, a search area determination sub-module, and an integer pixel search sub-module; a video stream in either the H.264 or the H.265 encoding format is searched at integer-pixel precision through these three sub-modules. The candidate motion vector acquisition sub-module is used to acquire candidate motion vectors, which may be one or more of the motion vectors of spatially adjacent blocks of the current block, the motion vectors of temporally adjacent blocks, a global motion vector, and the zero motion vector. The search area determination sub-module determines the search area of the integer pixel search according to the candidate motion vectors. The integer pixel search sub-module takes the position pointed to by a candidate motion vector as the starting search point, performs an integer pixel search on all or some of the points in the search area, calculates the coding cost at each point during the search, and selects the point with the smallest coding cost as the optimal search result.
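The three-step flow above (candidate vectors, search area, lowest-cost point) can be illustrated with a small software sketch. It is an assumption-laden stand-in for the hardware: the names `sad` and `integer_search` are hypothetical, the search area is a square window of `radius` pixels around each candidate, every point in the window is visited, and plain SAD is used as the coding cost.

```python
def sad(cur, ref, rx, ry):
    """Sum of absolute differences between `cur` and the reference block at (rx, ry)."""
    return sum(abs(cur[y][x] - ref[ry + y][rx + x])
               for y in range(len(cur)) for x in range(len(cur[0])))

def integer_search(cur, ref, bx, by, candidates, radius=2):
    """Search a window around each candidate MV; return (best_mv, best_cost).

    (bx, by) is the top-left position of the current block in its frame.
    Candidate MVs and the result are in integer-pixel units.
    """
    h, w = len(cur), len(cur[0])
    best_mv, best_cost = None, None
    for cx, cy in candidates:                      # e.g. neighbour MVs, zero MV
        for dy in range(-radius, radius + 1):
            for dx in range(-radius, radius + 1):
                mvx, mvy = cx + dx, cy + dy
                rx, ry = bx + mvx, by + mvy
                if rx < 0 or ry < 0 or ry + h > len(ref) or rx + w > len(ref[0]):
                    continue                       # position outside the reference frame
                cost = sad(cur, ref, rx, ry)
                if best_cost is None or cost < best_cost:
                    best_mv, best_cost = (mvx, mvy), cost
    return best_mv, best_cost
```

A hardware implementation would typically evaluate many search points in parallel and may add a motion-vector rate term to the cost; neither is modelled here.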

The sub-pixel search module 120 is electrically connected to the integer pixel search module 110 and performs a sub-pixel search on the basis of the matching block obtained by the integer pixel search, so as to further improve the search accuracy. The sub-pixel search mainly includes two parts: interpolation and coding cost calculation. When the motion vector points to an integer pixel position, the prediction block may be composed of the corresponding pixels of the reference frame; otherwise, the prediction block is obtained by using an interpolation filter to generate pixels at non-integer positions.
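One common way to organise such a refinement is to express motion vectors in quarter-pixel units and test the eight neighbours of the current best vector, first at half-pixel and then at quarter-pixel distance. The sketch below assumes that two-stage pattern, which the text does not prescribe; `neighbours`, `subpel_search` and the caller-supplied `cost_fn` (which would interpolate the reference block and return its coding cost) are hypothetical names.

```python
def neighbours(center, step):
    """Eight positions around `center` at distance `step` (quarter-pixel units)."""
    cx, cy = center
    return [(cx + dx * step, cy + dy * step)
            for dy in (-1, 0, 1) for dx in (-1, 0, 1) if (dx, dy) != (0, 0)]

def subpel_search(int_mv, cost_fn):
    """Refine an integer-pixel MV to quarter-pixel precision.

    `int_mv` is in integer-pixel units; the returned MV is in quarter-pixel units.
    """
    best = (int_mv[0] * 4, int_mv[1] * 4)   # same vector, quarter-pixel units
    best_cost = cost_fn(best)
    for step in (2, 1):                     # 2 = half-pixel stage, 1 = quarter-pixel stage
        for mv in neighbours(best, step):   # neighbours of the best MV of the previous stage
            cost = cost_fn(mv)
            if cost < best_cost:
                best, best_cost = mv, cost
    return best, best_cost
```

For example, with a toy cost that is minimal at (7, -3) in quarter-pixel units, starting from the integer vector (2, -1) the search ends at (7, -3).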

Sub-pixel search includes half-pixel precision and quarter-pixel precision. For interpolation at half-pixel positions, as described above, the sub-pixel search module 120 uses a first interpolation filter to perform half-pixel interpolation on both the video stream in the H.264 encoding format and the video stream in the H.265 encoding format. In one embodiment, the first interpolation filter is an 8-tap interpolation filter; that is, the same 8-tap interpolation filter performs half-pixel interpolation for both formats. However, because the predicted value at a half-pixel position in the H.264 video coding standard is obtained by applying one-dimensional horizontal and vertical 6-tap filtering, two taps of the 8-tap interpolation filter do not participate in the operation when half-pixel interpolation is performed on a video stream in the H.264 encoding format. In another implementation, the coefficients of the two taps that do not participate in the operation may be set to 0.
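The idea of sharing one 8-tap datapath can be illustrated in one dimension. The coefficient sets below are the luma half-sample filters of the H.264 and H.265 standards, not values quoted from this text; the table `HALF_PEL_TAPS`, the function `half_pel`, the border clamping and the 8-bit clipping are assumptions made for the sketch.

```python
# (coefficients, normalization shift) per standard; the H.264 6-tap filter is
# padded with two zero taps so that both formats fit the same 8-tap structure.
HALF_PEL_TAPS = {
    "h264": ((0, 1, -5, 20, 20, -5, 1, 0), 5),       # coefficients sum to 32
    "h265": ((-1, 4, -11, 40, 40, -11, 4, -1), 6),   # coefficients sum to 64
}

def half_pel(samples, i, standard):
    """Half-sample value between samples[i] and samples[i + 1] in a 1-D row."""
    taps, shift = HALF_PEL_TAPS[standard]
    acc = 0
    for k, c in enumerate(taps):
        j = min(max(i - 3 + k, 0), len(samples) - 1)  # clamp at the row borders
        acc += c * samples[j]
    val = (acc + (1 << (shift - 1))) >> shift         # round and normalize
    return max(0, min(255, val))                      # clip to 8 bits
```

Only the coefficient set and the shift change between the two formats; the multiply-accumulate loop, which corresponds to the shared hardware, is identical.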

For interpolation at quarter-pixel positions, the sub-pixel search module 120 includes a first quarter-pixel interpolation module and a second quarter-pixel interpolation module. The first quarter-pixel interpolation module performs quarter-pixel interpolation on a video stream in the H.264 encoding format based on a second interpolation filter, and the second quarter-pixel interpolation module performs quarter-pixel interpolation on a video stream in the H.265 encoding format based on a third interpolation filter. That is, because quarter-pixel interpolation differs considerably between the H.264 and H.265 video coding standards, the hardware used for quarter-pixel interpolation is not multiplexed; instead, different interpolation filters perform quarter-pixel interpolation on H.264 and H.265 encoded video streams. The second interpolation filter may be a 2-pixel mean filter, which averages two adjacent integer pixels or half pixels to obtain the pixel value at a quarter-pixel position; the two adjacent pixels used for the average may be integer pixels or half pixels in the horizontal, vertical or diagonal direction of the quarter-pixel position. The third interpolation filter may be a 7-tap or 8-tap interpolation filter that computes a weighted average of adjacent integer pixels or half pixels to obtain the pixel value at a quarter-pixel position. Specifically, the third interpolation filter may be a horizontal or vertical 7-tap interpolation filter. Alternatively, it may be a horizontal or vertical 8-tap interpolation filter in which the coefficient of the one tap that does not participate in the operation is 0.
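The difference between the two quarter-pixel paths can be seen in a short one-dimensional sketch. The H.265 coefficients are the luma quarter-sample filter of that standard rather than values quoted from this text; the function names, the border clamping and the 8-bit clipping are assumptions.

```python
def quarter_pel_h264(a, b):
    """2-pixel mean with rounding; a and b are neighbouring integer or half-pixel samples."""
    return (a + b + 1) >> 1

# 7 active taps plus one zero tap, so the filter also fits an 8-tap structure (sum = 64).
QUARTER_TAPS_H265 = (-1, 4, -10, 58, 17, -5, 1, 0)

def quarter_pel_h265(samples, i):
    """Sample at position i + 1/4 in a 1-D row of integer-pixel samples."""
    acc = 0
    for k, c in enumerate(QUARTER_TAPS_H265):
        j = min(max(i - 3 + k, 0), len(samples) - 1)  # clamp at the row borders
        acc += c * samples[j]
    return max(0, min(255, (acc + 32) >> 6))          # round, normalize by 64, clip
```

The H.264 path needs the half-pixel samples as input and only an adder, while the H.265 path filters integer samples directly with a longer kernel, which is consistent with the two paths not sharing hardware.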

The sub-pixel search module 120 further includes a coding cost calculation sub-module, which calculates, based on the same hardware structure for a video stream in either the H.264 or the H.265 encoding format, a first encoding cost between the sub-pixel matching block and the current block. That is, the sub-pixel search module 120 calculates the first encoding cost of the sub-pixel search for video streams in the H.264 and H.265 encoding formats based on the same hardware structure, thereby multiplexing this part of the hardware structure.

Specifically, the coding cost calculation sub-module can use the SAD/SATD cost function model to calculate the coding cost of the sub-pixel search. The SAD/SATD cost function model calculates the cost from the difference between the predicted value and the image pixel value, which essentially reflects the difference between the current block and the prediction block. To reflect the cost of each mode more accurately, in the actual calculation the residual can be transformed to the frequency domain to obtain the sum of absolute transformed differences (SATD), and the coding cost is calculated from the SATD.
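As an illustration of the two cost measures, the sketch below computes SAD and a 4×4 Hadamard-based SATD in plain Python. The 4×4 block size, the function names and the absence of any normalization factor are assumptions; encoders often scale the SATD (for example by 1/2) and add a rate term.

```python
def sad4x4(cur, pred):
    """Sum of absolute differences between two 4x4 blocks (lists of rows)."""
    return sum(abs(c - p) for cr, pr in zip(cur, pred) for c, p in zip(cr, pr))

def hadamard4(v):
    """Unnormalized 4-point Hadamard transform of a length-4 sequence."""
    a, b, c, d = v
    s0, s1, d0, d1 = a + b, c + d, a - b, c - d
    return [s0 + s1, d0 + d1, s0 - s1, d0 - d1]

def satd4x4(cur, pred):
    """Sum of absolute values of the 2-D Hadamard transform of the residual."""
    diff = [[c - p for c, p in zip(cr, pr)] for cr, pr in zip(cur, pred)]
    rows = [hadamard4(r) for r in diff]                                    # horizontal pass
    cols = [hadamard4([rows[y][x] for y in range(4)]) for x in range(4)]   # vertical pass
    return sum(abs(v) for col in cols for v in col)
```

A flat residual concentrates into a single transform coefficient, whereas an isolated error spreads over all sixteen, which is why SATD tracks the cost after transform coding more closely than SAD.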

In one embodiment, the video encoding apparatus 100 further includes an intra-frame mode preliminary selection module 140 for selecting one or more optimal intra-frame prediction modes from multiple intra-frame prediction modes. Specifically, the intra-frame mode preliminary selection module 140 is connected to the mode decision module 130 and is configured to determine, according to the pixel values of at least one adjacent reference block in the current frame, at least one prediction block for the current block and a second encoding cost corresponding to the at least one prediction block, and to determine at least one intra-frame prediction mode according to the second encoding cost. By exploiting the correlation between the current block and its adjacent reference blocks, intra-frame prediction makes full use of the information in the adjacent reference blocks for coding, thereby improving coding efficiency.

For H.264, there are 4 optional prediction modes for 16×16 luminance blocks and 8×8 chrominance blocks: vertical mode, horizontal mode, DC mode and plane (Plane) mode. For 4×4 and 8×8 luminance blocks, there are 9 optional prediction modes: horizontal prediction, vertical prediction, DC mode, and 6 directional prediction modes such as left diagonal and right diagonal. The plane mode uses a linear function (Plane) of the pixels directly above and to the left to predict the pixel values of the current block.

In the H.265 video coding standard, 35 intra-frame prediction modes are defined on the basis of PU, which include Planar mode, DC mode, vertical mode, horizontal mode and 31 special angle modes. The prediction direction of each angle mode can be regarded as a certain offset in the vertical or horizontal direction.

Based on the similarities and differences of intra-frame prediction in the H.264 and H.265 video coding standards, the intra-frame mode preliminary selection module 140 on the one hand multiplexes part of the hardware structure to perform the parts of intra-frame prediction that are the same in the H.264 and H.265 encoding formats, and on the other hand provides separate hardware structures for the H.264 and H.265 encoding formats to perform the parts of intra-frame prediction that differ.

Specifically, the intra-frame mode preliminary selection module 140 includes a first intra-frame mode preliminary selection module, a second intra-frame mode preliminary selection module and a common intra-frame mode preliminary selection module. The common intra-frame mode preliminary selection module is hardware multiplexed by the H.264 and H.265 encoding formats: together with the first intra-frame mode preliminary selection module it selects the intra-frame prediction mode for a video stream in the H.264 encoding format, and together with the second intra-frame mode preliminary selection module it selects the intra-frame prediction mode for a video stream in the H.265 encoding format.

Intra-frame prediction mainly includes two parts: intra-frame prediction interpolation and coding cost calculation. In the H.264 and H.265 coding formats, the hardware structure for the intra prediction interpolation of some intra prediction modes is the same, namely horizontal prediction, vertical prediction and part of the direct current (DC) prediction. Therefore, the common intra-frame mode preliminary selection module includes a horizontal prediction sub-module, a vertical prediction sub-module and a partial DC prediction sub-module. The horizontal prediction sub-module performs intra-frame prediction interpolation in the horizontal mode on a video stream in the H.264 or H.265 encoding format based on the same hardware structure. The vertical prediction sub-module performs intra-frame prediction interpolation in the vertical mode on a video stream in the H.264 or H.265 encoding format based on the same hardware structure. The partial DC prediction sub-module performs, based on the same hardware structure, intra prediction interpolation in the DC mode on the image blocks corresponding to the luminance component in a video stream in the H.264 encoding format and on the image blocks corresponding to the luminance and chrominance components in a video stream in the H.265 encoding format. The horizontal prediction sub-module uses the pixels to the left to horizontally predict the corresponding pixel values of the current block. The vertical prediction sub-module uses the pixels directly above to vertically predict the corresponding pixel values of the current block. The partial DC prediction sub-module is suitable for large flat areas, and uses the reference pixels directly above and to the left to predict the pixel value of the current block.
In the H.264 encoding format, for the image block corresponding to the luminance component, when both the pixels directly above and the pixels to the left exist, the pixel value of the current block is the average of these two groups of pixels; when only the group of pixels directly above or only the group to the left exists, the pixel value of the current block is the average of that group. In the H.265 encoding format, for the image blocks corresponding to the luminance and chrominance components, if a reference pixel is not available it is filled using adjacent available pixels or a default value, so that after filling the adjacent left and upper pixels are all available.
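The three shared modes are simple enough to write out directly. The sketch below is illustrative only: the function names are hypothetical, `top` and `left` are the reconstructed neighbouring samples above and to the left of an n×n block, passing `None` marks a neighbour group as unavailable, and the fallback value 128 (the usual mid-grey default for 8-bit video) is an assumption not stated in the text. The boundary smoothing that H.265 applies to some DC-predicted luma blocks is not modelled.

```python
def predict_vertical(top, n):
    """Each column repeats the sample directly above it."""
    return [list(top[:n]) for _ in range(n)]

def predict_horizontal(left, n):
    """Each row repeats the sample directly to its left."""
    return [[left[y]] * n for y in range(n)]

def predict_dc(top, left, n, default=128):
    """Fill the block with the rounded mean of the available neighbour samples."""
    refs = (list(top[:n]) if top else []) + (list(left[:n]) if left else [])
    if refs:
        value = (sum(refs) + len(refs) // 2) // len(refs)  # mean with rounding
    else:
        value = default                                    # no neighbours available
    return [[value] * n for _ in range(n)]
```

With both groups present the DC value averages 2n samples; with one group it averages n samples, matching the H.264 behaviour described above. For H.265, the caller would first pad unavailable references so that both groups are always supplied.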

Apart from the above three intra-frame prediction modes, which multiplex hardware structures to perform intra-frame prediction interpolation on video streams in the H.264 and H.265 encoding formats, the intra-frame prediction interpolation of the other intra-frame prediction modes is implemented by separate hardware structures. Specifically, the first intra-frame mode preliminary selection module includes a first direction prediction sub-module, a first plane prediction sub-module, and a DC prediction sub-module for the chrominance component. The first direction prediction sub-module and the first plane prediction sub-module are respectively used to perform intra-frame prediction interpolation in the directional modes and the plane (Plane) mode on a video stream in the H.264 encoding format. The first direction prediction sub-module includes interpolation filters for six directional modes: left diagonal, right diagonal, vertical-right, horizontal-down, vertical-left, and horizontal-up. The first plane prediction sub-module uses a linear function (Plane) of the pixels directly above and to the left to predict the pixel values of the current block. The DC prediction sub-module for the chrominance component performs intra-frame prediction interpolation in the DC mode on the image blocks corresponding to the chrominance component in a video stream in the H.264 encoding format.

The second intra-frame mode preliminary selection module includes a second direction prediction sub-module and a second plane prediction sub-module, which are respectively used to perform intra-frame prediction interpolation in the directional modes and the plane (Planar) mode on a video stream in the H.265 encoding format. Specifically, the second direction prediction sub-module may include some or all of the 31 special directional modes in the H.265 video coding standard. The second plane prediction sub-module uses two linear filters, one in the horizontal direction and one in the vertical direction, and takes the average of the two as the prediction value of each pixel in the current block.
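The averaging of a horizontal and a vertical linear interpolation can be sketched as follows. The arithmetic follows the Planar mode of the H.265 standard rather than anything quoted from this text; the function name `predict_planar` and its argument names are hypothetical, and n is assumed to be a power of two.

```python
def predict_planar(top, left, top_right, bottom_left, n):
    """Planar prediction of an n x n block (n a power of two).

    top[x] / left[y] are the neighbouring samples above / to the left;
    top_right and bottom_left are the samples just beyond those rows.
    """
    shift = n.bit_length()  # equals log2(n) + 1 when n is a power of two
    pred = []
    for y in range(n):
        row = []
        for x in range(n):
            horizontal = (n - 1 - x) * left[y] + (x + 1) * top_right   # left-to-right ramp
            vertical = (n - 1 - y) * top[x] + (y + 1) * bottom_left    # top-to-bottom ramp
            row.append((horizontal + vertical + n) >> shift)           # rounded average
        pred.append(row)
    return pred
```

Each ramp is scaled by n, so the final shift both removes that scale and halves the sum of the two ramps.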

Since the method of calculating the coding cost of intra prediction is basically the same in the H.264 and H.265 coding formats, the common intra mode primary selection module further includes a coding cost calculation sub-module for calculating the second coding cost for the video stream in the H.264 encoding format or the video stream in the H.265 encoding format based on the same hardware structure. Although the method of calculating the coding cost is basically the same, when intra-frame prediction is performed on the video stream in the H.264 encoding format, the coding cost can be calculated separately for each of the 9 prediction modes. For the video stream in the H.265 encoding format, since the H.265 video coding standard has as many as 35 intra-frame prediction modes, only some of the intra-frame prediction modes are used for intra-frame prediction and coding cost calculation (for example, the coding cost is calculated in only three of the intra-frame prediction modes), and one or two optimal intra-frame prediction modes are finally selected.

The coding cost calculation sub-module uses a cost function to calculate the coding cost of each intra-frame prediction mode, and then determines the best intra-frame prediction mode according to the magnitude of the coding cost. Illustratively, the intra-mode primary selection module 140 may calculate the coding cost using the SAD/SATD cost model as described above. Optionally, the intra-mode primary selection module 140 may also use a rate-distortion optimization (RDO) cost model to calculate the coding cost.
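The preliminary selection described above can be sketched as follows, assuming a plain SAD cost and a dictionary of already computed prediction blocks. The names `sad` and `preselect_modes` and the `keep` parameter are hypothetical and serve only to illustrate ranking candidate modes by cost.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized 2-D blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def preselect_modes(current, predictions, keep=2):
    """Rank candidate predictions by SAD and keep the cheapest `keep` modes.

    predictions maps a mode name to the block predicted in that mode.
    Returns a list of (mode, cost) pairs, cheapest first.
    """
    costs = {mode: sad(current, pred) for mode, pred in predictions.items()}
    ranked = sorted(costs, key=costs.get)
    return [(mode, costs[mode]) for mode in ranked[:keep]]
```

Keeping one or two modes mirrors the text; a later stage would then evaluate only those survivors with a more expensive cost model.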

The sub-pixel search module 120 and the intra-mode primary selection module 140 are both electrically connected to the mode decision module 130. The mode decision module 130 determines the optimal prediction block at least according to the at least one intra-frame prediction mode obtained by the intra-mode primary selection module 140 and the at least one motion vector obtained by the sub-pixel search module 120, and outputs a coefficient block. The coefficient block is obtained by transforming the residual block. The mode decision module 130 can also output mode information. The mode information and the coefficient block are finally passed to the entropy coding module for entropy coding. In one embodiment, the mode decision module 130 is also capable of outputting reconstruction blocks. After that, the reconstructed block is subjected to deblocking filtering and entropy coding filtering. Further, the mode decision module 130 also uses the prediction results of two special inter-frame prediction modes, the Skip mode and the Merge mode, in the mode decision. The Merge mode directly uses the motion vector of an adjacent block in the temporal or spatial domain as the motion vector of the current block, omitting the step of motion estimation. The Skip mode can be regarded as a special Merge mode; the difference is that the Skip mode directly treats the residual obtained after transformation and quantization as 0, that is, the residual is not encoded, and the prediction block in this mode is the reconstruction block. It should be noted that, for the H.264 and H.265 encoding formats, the Skip mode and the Merge mode acquire the motion vectors (MVs) of adjacent blocks in different ways.

The mode decision module 130 also multiplexes part of the hardware structure to perform mode decision for the video streams in the H.264 and H.265 encoding formats. Specifically, the mode decision module 130 includes a first mode decision module, a second mode decision module and a common mode decision module. The common mode decision module is the hardware structure multiplexed by the H.264 and H.265 encoding formats. The first mode decision module and the second mode decision module are hardware structures dedicated to the H.264 encoding format and the H.265 encoding format, respectively. The first mode decision module and the common mode decision module jointly select the division mode and the optimal prediction mode of the coding unit for the video stream in the H.264 encoding format, and obtain the residual block of the H.264 encoding format according to the optimal prediction mode; the second mode decision module and the common mode decision module jointly select the division mode and the optimal prediction mode of the coding unit for the video stream in the H.265 encoding format, and obtain the residual block of the H.265 encoding format according to the optimal prediction mode.

The mode decision mainly includes transformation, quantization, inverse transformation, inverse quantization, bit estimation and distortion estimation. Transformation and quantization further remove redundancy from the image and save coding bit rate. The purpose of transformation is to transform the image signal from the spatial domain to the frequency domain; compared with the spatial-domain signal, the signal transformed to the frequency domain can be coded at a considerably lower bit rate. Quantization reduces the length of the encoded image data.

Since the transformation, quantization, inverse transformation and inverse quantization in the H.264 and H.265 video coding standards are quite different and need to be implemented separately, the first mode decision module includes a first transform sub-module, a first quantization sub-module, a first inverse transform sub-module and a first inverse quantization sub-module, which are respectively used to transform, quantize, inverse transform and inverse quantize the video stream in the H.264 encoding format; the second mode decision module includes a second transform sub-module, a second quantization sub-module, a second inverse transform sub-module and a second inverse quantization sub-module, which are respectively used to transform, quantize, inverse transform and inverse quantize the video stream in the H.265 encoding format.

The first transform sub-module and the second transform sub-module both essentially multiply the residual matrix by the transform matrix, and the first inverse transform sub-module and the second inverse transform sub-module both multiply the coefficient matrix by the transform matrix. However, the second transform sub-module and the second inverse transform sub-module perform a shift operation at the end of the calculation when performing matrix multiplication on the video stream in the H.265 encoding format, while the first transform sub-module and the first inverse transform sub-module perform a shift operation in the middle of the calculation when performing matrix multiplication on the video stream in the H.264 encoding format. In addition, compared with H.265, the video stream in the H.264 encoding format undergoes additional Hadamard transform/inverse transform processes in some prediction modes (such as for the chroma component and in the 16x16 intra-frame mode). Therefore, the transform and inverse transform of H.264 and H.265 use different hardware structures.
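To make the "multiply the residual matrix by the transform matrix" step concrete, here is a minimal Python sketch that computes C * X * C^T on a 4x4 block. The matrix `C4` is the 4x4 forward core transform matrix commonly given for H.264 and is an assumption here, since this application does not quote any matrix. The `final_shift` parameter is hypothetical and only stands in for an end-of-calculation shift of the kind described for H.265; the mid-calculation shift described for H.264 is not modelled.

```python
# Assumption: 4x4 forward core transform matrix commonly given for H.264.
C4 = [
    [1, 1, 1, 1],
    [2, 1, -1, -2],
    [1, -1, -1, 1],
    [1, -2, 2, -1],
]


def matmul(a, b):
    """Plain integer matrix product of two 2-D lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    """Transpose of a 2-D list."""
    return [list(col) for col in zip(*m)]


def forward_transform(residual, matrix=C4, final_shift=0):
    """Compute (matrix * residual * matrix^T) >> final_shift, element-wise."""
    coeffs = matmul(matmul(matrix, residual), transpose(matrix))
    return [[v >> final_shift for v in row] for row in coeffs]
```

A flat residual block ends up with a single non-zero (DC) coefficient, which is the energy compaction that makes the subsequent quantization effective.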

Quantization and inverse quantization essentially multiply the transformed matrix by a coefficient and then round to the nearest integer. The difference is that the first quantization sub-module multiplies different positions by different coefficients when quantizing the video stream in the H.264 encoding format, whereas the second quantization sub-module multiplies different positions by the same coefficient when quantizing the video stream in the H.265 encoding format. Therefore, the quantization and inverse quantization of H.264 and H.265 use different hardware structures.
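The difference between position-dependent and uniform multipliers can be sketched as below. The multiplier values, the shift and the names `quantize`, `POSITION_FACTORS` and `UNIFORM_FACTOR` are purely hypothetical and are not the tables of either standard; only the structure (a per-position factor versus one shared factor, followed by rounding) reflects the text.

```python
# Hypothetical factors for illustration only; not values from H.264 or H.265.
POSITION_FACTORS = [
    [8, 5, 8, 5],
    [5, 3, 5, 3],
    [8, 5, 8, 5],
    [5, 3, 5, 3],
]
UNIFORM_FACTOR = 4


def quantize(coeffs, multipliers, shift):
    """Multiply each coefficient by its factor and round to nearest.

    multipliers is either a single int (same factor at every position) or a
    matrix of per-position factors with the same shape as coeffs.
    """
    offset = 1 << (shift - 1)  # rounding offset for round-to-nearest
    out = []
    for y, row in enumerate(coeffs):
        out_row = []
        for x, c in enumerate(row):
            m = multipliers if isinstance(multipliers, int) else multipliers[y][x]
            level = (abs(c) * m + offset) >> shift
            out_row.append(level if c >= 0 else -level)  # restore the sign
        out.append(out_row)
    return out
```

Passing `POSITION_FACTORS` models the first quantization sub-module and passing `UNIFORM_FACTOR` models the second; the datapath differs only in where the factor comes from.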

Bit estimation estimates the number of bits required by the current prediction mode according to the syntax elements to be encoded (including prediction information, coefficients, etc.) specified in the H.264 or H.265 video coding standard. Since the syntax elements for bit estimation specified in the H.264 and H.265 video coding standards are different, the process of bit estimation for video streams in the H.264 and H.265 coding formats is also different. Therefore, in one embodiment, different hardware structures can be used to implement bit estimation for video streams in the H.264 and H.265 encoding formats, respectively. That is, the first mode decision module further includes a first bit estimation sub-module for performing bit estimation on the video stream in the H.264 encoding format, and the second mode decision module further includes a second bit estimation sub-module for performing bit estimation on the video stream in the H.265 encoding format.

However, the syntax elements for bit estimation specified in the H.264 and H.265 video coding standards are similar to a certain extent. Therefore, in another embodiment, in order to save hardware area, the same hardware structure can also be reused for bit estimation of video streams in both the H.264 and H.265 encoding formats.

As one implementation, the common mode decision module further includes an H.264 bit estimation sub-module, which is configured to perform bit estimation on the video stream in the H.264 encoding format or the video stream in the H.265 encoding format based on the first hardware structure. That is to say, in this implementation, the bit estimation circuit of the H.264 encoding format is multiplexed to perform bit estimation for video streams in both the H.264 and H.265 encoding formats; whichever format the video stream is in, the syntax elements for bit estimation specified in the H.264 video coding standard are used.

As another implementation, the common mode decision module includes an H.265 bit estimation sub-module, configured to perform bit estimation on the video stream in the H.264 encoding format or the video stream in the H.265 encoding format based on the second hardware structure, wherein the syntax elements used by the H.264 bit estimation sub-module and the H.265 bit estimation sub-module are different. That is to say, in this implementation, the bit estimation circuit of the H.265 encoding format is multiplexed to perform bit estimation for video streams in both the H.264 and H.265 encoding formats; whichever format the video stream is in, the syntax elements for bit estimation specified in the H.265 video coding standard are used.

The distortion estimation for H.264 and H.265 can use the same calculation. Therefore, the common mode decision module includes a distortion estimation sub-module, which performs distortion estimation on the video stream in the H.264 encoding format or the video stream in the H.265 encoding format based on the same hardware structure. The distortion estimation sub-module usually calculates the SSE, SAD, etc. between the reconstructed pixels and the original pixels as the coding distortion. The calculation formula is as follows: Distortion = SAD + Lambda * MVBits. In the formula, SAD is the sum of absolute differences in the spatial domain, that is, the pixel difference between the reconstructed block and the current block; Lambda is the conversion factor; and MVBits is the number of bits obtained by the bit estimation sub-module.
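The formula Distortion = SAD + Lambda * MVBits can be written out directly. In this minimal sketch the function names `sad` and `coding_cost` and the argument names are hypothetical; the bit count is assumed to be supplied by a separate bit estimation step.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized 2-D blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def coding_cost(reconstructed, original, lam, bits):
    """Distortion = SAD + Lambda * MVBits, as given in the text.

    lam  : conversion factor between bits and distortion (Lambda)
    bits : estimated bit count from the bit estimation step (MVBits)
    """
    return sad(reconstructed, original) + lam * bits
```

A larger `lam` penalises expensive modes more strongly, so the same function can be tuned toward lower bit rate or lower distortion.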

Further, the video encoding apparatus 100 further includes an in-loop filtering module 150, which is electrically connected to the mode decision module 130 and configured to perform in-loop filtering processing on the reconstruction block.

In one embodiment, the in-loop filtering module 150 includes a first deblocking filter (DBF) sub-module and a second deblocking filter sub-module, which are respectively used to perform deblocking filtering on the video stream in the H.264 encoding format and the video stream in the H.265 encoding format. The main function of deblocking filtering is to remove high-frequency components at block boundaries so as to reduce blockiness in decoded images. Blockiness refers to the phenomenon that, when an image is compressed in blocks, discontinuities that are easily noticeable to the human eye appear at the block boundaries after decoding. There are two causes of blockiness: one is that adjacent blocks may be predicted from non-adjacent blocks during inter-frame motion compensation, resulting in discontinuity between blocks; the other is the quantization distortion introduced when the residual blocks are transformed, quantized and encoded. The first deblocking filtering sub-module and the second deblocking filtering sub-module use different sizes for the deblocking filtering boundary: if the filtering conditions are satisfied, the first deblocking filtering sub-module deblocks the boundaries of 4×4 blocks in the H.264 encoded video stream, and the second deblocking filtering sub-module deblocks the boundaries of 8×8 blocks in the H.265 encoded video stream. In addition, the first deblocking filtering sub-module and the second deblocking filtering sub-module determine the filtering strength differently, and the filters used under different filtering strengths are also different.
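The different boundary sizes can be illustrated by listing which edges are candidates for filtering on a 4×4 grid versus an 8×8 grid. This is only a sketch: the function name `deblock_edge_positions` is hypothetical, and the filtering conditions and strength decisions mentioned in the text are not modelled.

```python
def deblock_edge_positions(width, height, grid):
    """Return (vertical_edges, horizontal_edges) lying on the given grid.

    vertical_edges are x coordinates and horizontal_edges are y coordinates.
    The edge at coordinate 0 is excluded because the picture border has no
    neighbouring block on the other side.
    """
    vertical = list(range(grid, width, grid))
    horizontal = list(range(grid, height, grid))
    return vertical, horizontal
```

For a 16×16 area, a 4-pixel grid yields three candidate edges per direction while an 8-pixel grid yields one, which hints at why the two sub-modules differ in edge scheduling.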

The H.265 video coding standard involves two kinds of in-loop filtering. In addition to deblocking filtering, it also includes sample adaptive offset (SAO). SAO analyzes the original data and the reconstructed data of the current frame and performs offset compensation on the image after deblocking filtering, so that the reconstructed image is as close to the original image as possible. Therefore, in one embodiment, the in-loop filtering module 150 further includes a SAO parameter estimation sub-module and a SAO filtering sub-module for performing SAO parameter estimation and SAO filtering on the video stream in the H.265 encoding format. The H.264 video coding standard does not involve SAO, so for the video stream in the H.264 encoding format the reconstructed image after deblocking filtering is output directly.

For the video stream in the H.265 encoding format, the image after the inverse quantization operation is processed by the deblocking filter sub-module and then passed to the SAO parameter estimation sub-module as an input. SAO includes 4 EO (Edge Offset, edge compensation) modes and 1 BO (Band Offset, band compensation) mode. In the EO modes, the size of the compensation value needs to be determined; in the BO mode, both the bands to be compensated and the compensation values need to be determined. The SAO parameter estimation sub-module estimates the above compensation modes and parameters of SAO to obtain the optimal compensation mode and parameters. SAO filtering performs the actual filtering operation according to the obtained optimal compensation mode and parameters. The reconstructed image output by the SAO filtering sub-module is buffered in the encoder as a subsequent reference frame.
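The EO and BO decisions can be sketched in a few lines. The category rules and the 32-band mapping below follow the commonly described H.265 SAO design and are assumptions, as this application does not spell them out; all function names are hypothetical.

```python
def edge_offset_category(a, c, b):
    """Classify sample c against neighbours a and b along one EO direction.

    Returns 1 for a local minimum, 4 for a local maximum, 2 or 3 when c
    equals one neighbour and lies below or above the other, and 0 when no
    offset applies.
    """
    if c < a and c < b:
        return 1
    if (c < a and c == b) or (c == a and c < b):
        return 2
    if (c > a and c == b) or (c == a and c > b):
        return 3
    if c > a and c > b:
        return 4
    return 0


def band_index(sample, bit_depth=8):
    """Map a sample to one of 32 equal-width intensity bands."""
    return sample >> (bit_depth - 5)


def apply_band_offset(samples, start_band, offsets, bit_depth=8):
    """Add offsets[i] to samples in band start_band + i, clipping the result."""
    max_val = (1 << bit_depth) - 1
    out = []
    for s in samples:
        i = band_index(s, bit_depth) - start_band
        if 0 <= i < len(offsets):
            s = min(max(s + offsets[i], 0), max_val)
        out.append(s)
    return out
```

Parameter estimation would try the candidate modes and offsets and keep the combination with the lowest cost; the filtering stage then applies only the chosen one.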

The entropy encoding module 160 performs context-based arithmetic encoding on the syntax elements: it first converts the syntax elements into binary strings and then arithmetically encodes the binary strings into a code stream. The most frequent information is represented by short codes and less frequent information by long codes, so as to minimize the average code length. The decoder can restore the original information without distortion from the entropy-encoded code stream. In one embodiment, the entropy coding mode adopted by the entropy coding module 160 is CABAC (Context-based Adaptive Binary Arithmetic Coding). CABAC is an adaptive arithmetic coding method based on context models. It uses the correlation between symbols and the statistical characteristics of the video stream to continuously and automatically adjust the probability of occurrence of each symbol, so that the amount of information output in the codewords is almost the same as the symbol entropy rate, thereby obtaining higher coding efficiency.

The entropy encoding module 160 also multiplexes part of the hardware structure to perform entropy encoding on the video streams in the H.264 and H.265 encoding formats. Specifically, entropy coding mainly includes two steps. The first is binarization, which converts the syntax elements to be encoded into binary strings; the syntax elements to be encoded include the division method of the current block, prediction information, residual information, filtering information, etc. The second is arithmetic coding, which encodes a binary string into a code stream. The syntax elements that need to be encoded in the binarization process specified in the H.264 video coding standard and the H.265 video coding standard are quite different, so binarization is implemented by different hardware structures; the arithmetic coding process is basically the same, so the same hardware structure is reused for it.
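As an illustration of the binarization step only, the sketch below implements two generic binarization schemes, truncated unary and fixed length. Which syntax element uses which scheme is defined by each standard and is not modelled here; the function names are hypothetical, and the arithmetic coding kernel that would consume the resulting bins is not shown.

```python
def truncated_unary(value, c_max):
    """Emit `value` ones and a terminating zero, omitted when value == c_max."""
    if not 0 <= value <= c_max:
        raise ValueError("value out of range")
    bins = "1" * value
    if value < c_max:
        bins += "0"
    return bins


def fixed_length(value, num_bits):
    """Emit value as an unsigned binary number using exactly num_bits bins."""
    if num_bits < 1 or not 0 <= value < (1 << num_bits):
        raise ValueError("value does not fit in num_bits")
    return format(value, "0{}b".format(num_bits))
```

Because both formats end up producing the same kind of bin string, a format-specific binarizer can feed a single shared arithmetic coder, which is the split the text describes.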

Therefore, the entropy encoding module 160 includes a first entropy encoding module, a second entropy encoding module, and a common entropy encoding module. The common entropy encoding module is the hardware structure multiplexed by video streams in the H.264 and H.265 encoding formats; the first entropy encoding module and the common entropy encoding module are used to perform entropy encoding of residual blocks on the video stream in the H.264 encoding format, and the second entropy encoding module and the common entropy encoding module are used to perform entropy encoding of residual blocks on the video stream in the H.265 encoding format. The first entropy coding module is used to obtain the syntax elements of the H.264 coding format according to the residual block of the H.264 coding format, the second entropy coding module is used to obtain the syntax elements of the H.265 coding format according to the residual block of the H.265 coding format, and the common entropy encoding module is used to provide an arithmetic encoding kernel that entropy encodes the syntax elements of the H.264 encoding format or the syntax elements of the H.265 encoding format.

In some embodiments, the video encoding apparatus 100 further includes a reference frame management module 170 electrically connected to the integer pixel search module 110, the sub-pixel search module 120 and the mode decision module 130, for acquiring reference frames and sending the reference frames to the integer pixel search module 110, the sub-pixel search module 120 and the mode decision module 130. This part is the same for the H.264 video coding standard and the H.265 video coding standard, so it can be realized by multiplexing the same hardware structure.

In terms of hardware structure, the video encoding apparatus 100 in this embodiment of the present application is implemented as a pipeline. In one embodiment, referring to FIG. 2, the video encoding apparatus 100 includes a total of 5 pipeline stages: the integer pixel search module 110 is located in the first stage, the sub-pixel search module 120 and the intra-mode preliminary selection module 140 are located in the second stage, the mode decision module 130 is located in the third stage, the SAO parameter estimation sub-module and the deblocking filtering sub-module are located in the fourth stage, and the entropy coding module 160 and the SAO filtering sub-module are located in the fifth stage. It should be noted that the integer pixel search module 110 is electrically connected to the sub-pixel search module 120, the sub-pixel search module 120 is electrically connected to the mode decision module 130, the mode decision module 130 is electrically connected to the SAO parameter estimation sub-module and the deblocking filtering sub-module respectively, and the SAO parameter estimation sub-module and the deblocking filtering sub-module are electrically connected to the entropy encoding module 160 and the SAO filtering sub-module respectively. Here, the electrical connections are made between the corresponding modules. Since the integer pixel search module 110 and the sub-pixel search module 120 are electrically connected to each other, the integer pixel search module 110 can output a matching block that matches the current block in the current frame to the sub-pixel search module 120. On this basis, the integer pixel search module 110 can be placed in the first pipeline stage and the sub-pixel search module 120 in the second pipeline stage. Since the sub-pixel search module 120 and the mode decision module 130 are electrically connected to each other, the sub-pixel search module 120 can output at least one sub-pixel matching block to the mode decision module 130. On this basis, the sub-pixel search module 120 can be placed in the second pipeline stage and the mode decision module 130 in the third pipeline stage. For similar reasons, the SAO parameter estimation sub-module and the deblocking filtering sub-module may be placed in the fourth pipeline stage, and the entropy encoding module 160 and the SAO filtering sub-module may be placed in the fifth pipeline stage.

In the pipeline, while the (N+2)th block is undergoing integer pixel search, the (N+1)th block is undergoing sub-pixel search and intra-mode primary selection, and the Nth block is undergoing mode decision. In some embodiments, since the calculation amount of the mode decision module 130 is relatively large, it can also be implemented as two pipeline stages. The division of the pipeline stages shown in FIG. 2 is only an example, and the actual pipeline stages can also be divided in other ways.
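The overlap between blocks in the pipeline can be sketched with a small scheduling table. The sketch assumes that every stage takes exactly one time slot per block, which is a simplification; the names `STAGES` and `pipeline_schedule` are hypothetical, and the stage labels follow the five-stage example of FIG. 2.

```python
STAGES = [
    "integer pixel search",
    "sub-pixel search / intra-mode primary selection",
    "mode decision",
    "deblocking / SAO parameter estimation",
    "entropy coding / SAO filtering",
]


def pipeline_schedule(num_blocks, num_stages=len(STAGES)):
    """Return, per time slot, the block index in each stage (None = idle)."""
    schedule = []
    for t in range(num_blocks + num_stages - 1):
        slot = []
        for stage in range(num_stages):
            block = t - stage  # block that entered the pipeline `stage` slots ago
            slot.append(block if 0 <= block < num_blocks else None)
        schedule.append(slot)
    return schedule
```

For example, in time slot 2 the first three stages hold blocks 2, 1 and 0, which corresponds to the (N+2)th, (N+1)th and Nth blocks with N = 0.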

The video encoding apparatus of the embodiment of the present application multiplexes part of the hardware structure to encode video streams in the H.264 encoding format and the H.265 encoding format, which saves hardware area.

FIG. 3 shows a flowchart of a video encoding method 300 according to an embodiment of the present invention. The video encoding method 300 may be implemented by the video encoding apparatus 100 described above. Only the main steps of the video encoding method 300 will be described below, and for further details, reference may be made to the above.

As shown in FIG. 3, the video coding method 300 according to the embodiment of the present application includes the following steps:

Step S310, the integer pixel search module determines a matching block that matches the current block in the current frame within a plurality of predetermined ranges in the plurality of reference frames;

Step S320, a sub-pixel search module electrically connected to the integer pixel search module determines at least one sub-pixel matching block for the matching block, wherein the sub-pixel search module includes a half-pixel interpolation module, and determining at least one sub-pixel matching block for the matching block includes: the half-pixel interpolation module performing half-pixel interpolation on the video stream in the H.264 encoding format or the video stream in the H.265 encoding format using a first interpolation filter;

Step S330, the mode decision module electrically connected to the sub-pixel search module performs mode decision at least by using the coding cost of the sub-pixel matching block, so as to obtain the optimal prediction block of the current block for video coding.
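Step S310 can be pictured with an exhaustive integer-pixel search over a small window. Full search with a SAD cost is a stand-in chosen here for simplicity; this section does not state which search strategy or cost the integer pixel search module uses, and the name `integer_pixel_search` and its parameters are hypothetical.

```python
def integer_pixel_search(ref_frame, block, center, search_range):
    """Test every integer offset within +/- search_range of `center`.

    ref_frame and block are 2-D lists of pixel values; center is the
    (row, col) of the block's top-left corner in the current frame.
    Returns (best_cost, (dy, dx)), or None if no candidate fits in the frame.
    """
    bh, bw = len(block), len(block[0])
    fh, fw = len(ref_frame), len(ref_frame[0])
    cy, cx = center
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            top, left = cy + dy, cx + dx
            if top < 0 or left < 0 or top + bh > fh or left + bw > fw:
                continue  # candidate falls outside the reference frame
            cost = 0
            for y in range(bh):
                for x in range(bw):
                    cost += abs(ref_frame[top + y][left + x] - block[y][x])
            if best is None or cost < best[0]:
                best = (cost, (dy, dx))
    return best
```

The returned offset is the integer motion vector around which a sub-pixel search such as step S320 would then refine.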

In one embodiment, the first interpolation filter used in step S320 is an 8-tap interpolation filter.
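A minimal sketch of half-pixel interpolation with an 8-tap filter is given below. The tap values are the half-sample luma coefficients commonly given for H.265 and are an assumption, since this application does not list the coefficients of its first interpolation filter; the border handling (repeating edge samples) and the name `half_pixel_row` are likewise hypothetical.

```python
# Assumption: 8-tap half-sample coefficients commonly given for H.265 luma.
HALF_PEL_TAPS = (-1, 4, -11, 40, 40, -11, 4, -1)


def half_pixel_row(row, bit_depth=8):
    """Interpolate the half-pixel sample between row[i] and row[i + 1].

    Returns len(row) - 1 values. Border samples are repeated so that every
    output is computed from eight inputs (positions i - 3 .. i + 4).
    """
    max_val = (1 << bit_depth) - 1
    n = len(row)
    out = []
    for i in range(n - 1):
        acc = 0
        for k, tap in enumerate(HALF_PEL_TAPS):
            j = min(max(i - 3 + k, 0), n - 1)  # clamp the index at the borders
            acc += tap * row[j]
        # The taps sum to 64, so normalise by 64 with rounding, then clip.
        out.append(min(max((acc + 32) >> 6, 0), max_val))
    return out
```

Applying the same routine along columns would give vertical half-pixel positions; a flat row stays flat because the taps sum to 64.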

In one embodiment, the sub-pixel search module further includes a first quarter-pixel interpolation module and a second quarter-pixel interpolation module, and the method further includes: performing, by the first quarter-pixel interpolation module, quarter-pixel interpolation on the video stream in the H.264 encoding format based on a second interpolation filter, or performing, by the second quarter-pixel interpolation module, quarter-pixel interpolation on the video stream in the H.265 encoding format based on a third interpolation filter.

Further, the sub-pixel search module further includes an encoding cost calculation sub-module, and the method further includes: calculating, by the encoding cost calculation sub-module based on the same hardware structure, a first encoding cost between the sub-pixel matching block and the current block for the video stream in the H.264 encoding format or the video stream in the H.265 encoding format.

In one embodiment, the method further includes: determining, by an intra-frame mode primary selection module connected to the mode decision module, based on pixel values corresponding to at least one adjacent reference block in the current frame, at least one prediction block of the current block and a second encoding cost corresponding to the at least one prediction block, and determining at least one intra-frame prediction mode according to the second encoding cost; determining, by the mode decision module, an optimal prediction block according to the at least one intra-frame prediction mode and the at least one motion vector, and outputting mode information, a coefficient block and a reconstruction block; performing in-loop filtering processing on the reconstruction block by an in-loop filtering module electrically connected to the mode decision module; and performing entropy coding on the mode information and the coefficient block by an entropy coding module electrically connected to the in-loop filtering module.

Exemplarily, the intra-frame mode preliminary selection module includes a first intra-frame mode preliminary selection module, a second intra-frame mode preliminary selection module, and a common intra-frame mode preliminary selection module, and the method further includes: the first intra-frame mode preliminary selection module and the common intra-frame mode preliminary selection module selecting an intra prediction mode for the video stream in the H.264 encoding format, or the second intra-frame mode preliminary selection module and the common intra-frame mode preliminary selection module selecting an intra prediction mode for the video stream in the H.265 encoding format.

The common intra mode primary selection module includes a horizontal prediction sub-module, a vertical prediction sub-module and a DC prediction sub-module, and the method further includes: the horizontal prediction sub-module, the vertical prediction sub-module and the DC prediction sub-module performing, based on the same hardware structure, intra-frame prediction interpolation in the horizontal mode, the vertical mode and the DC mode on the video stream in the H.264 encoding format or the video stream in the H.265 encoding format.

The common intra mode primary selection module further includes an encoding cost calculation sub-module, and the method further includes: the encoding cost calculation sub-module calculating, based on the same hardware structure, the second encoding cost for the video stream in the H.264 encoding format or the video stream in the H.265 encoding format.

In one embodiment, the mode decision module includes a first mode decision module, a second mode decision module, and a common mode decision module, and the mode decision includes: the first mode decision module and the common mode decision module selecting the coding unit division mode and the optimal prediction mode for the video stream in the H.264 encoding format, and obtaining the residual block in the H.264 encoding format according to the optimal prediction mode; or the second mode decision module and the common mode decision module selecting the coding unit division mode and the optimal prediction mode for the video stream in the H.265 encoding format, and obtaining the residual block in the H.265 encoding format according to the optimal prediction mode.

Exemplarily, the first mode decision module further includes a first bit estimation sub-module, and the mode decision further includes the first bit estimation sub-module performing bit estimation on the video stream in the H.264 encoding format; the second mode decision module further includes a second bit estimation sub-module, and the mode decision further includes the second bit estimation sub-module performing bit estimation on the video stream in the H.265 encoding format.

Exemplarily, the common mode decision module further includes an H.264 bit estimation sub-module, and the mode decision further includes the H.264 bit estimation sub-module performing bit estimation on the video stream in the H.264 encoding format or the video stream in the H.265 encoding format based on the first hardware structure; or, the common mode decision module includes an H.265 bit estimation sub-module, and the mode decision further includes the H.265 bit estimation sub-module performing bit estimation on the video stream in the H.264 encoding format or the video stream in the H.265 encoding format based on the second hardware structure, wherein the H.264 bit estimation sub-module and the H.265 bit estimation sub-module use different syntax elements.

Exemplarily, the common mode decision module further includes a distortion estimation sub-module, and the mode decision further includes the distortion estimation sub-module performing distortion estimation on the video stream in the H.264 encoding format or the video stream in the H.265 encoding format based on the same hardware structure.

In one embodiment, the in-loop filtering includes SAO parameter estimation and SAO filtering of the video stream in H.265 encoding format. The in-loop filtering further includes performing deblocking filtering on the video stream in the H.264 encoding format and the video stream in the H.265 encoding format based on the same hardware structure.

In one embodiment, the entropy encoding module includes a first entropy encoding module, a second entropy encoding module, and a common entropy encoding module, and the entropy encoding includes: the first entropy encoding module and the common entropy encoding module performing entropy encoding of the residual block on the video stream in the H.264 encoding format, or the second entropy encoding module and the common entropy encoding module performing entropy encoding of the residual block on the video stream in the H.265 encoding format.

Further, the entropy encoding includes: the first entropy encoding module obtaining the syntax elements of the H.264 encoding format according to the residual block of the H.264 encoding format, the second entropy encoding module obtaining the syntax elements of the H.265 encoding format according to the residual block of the H.265 encoding format, and the common entropy encoding module providing an arithmetic encoding kernel to entropy encode the syntax elements of the H.264 encoding format or the syntax elements of the H.265 encoding format.

In addition, an embodiment of the present invention further provides a computer storage medium, on which a computer program is stored. When the computer program is executed by the processor, the aforementioned video encoding apparatus 100 shown in FIG. 1 can be controlled to implement the steps of the aforementioned video encoding method 300 shown in FIG. 3. For example, the computer storage medium is a computer-readable storage medium. Computer storage media may include, for example, memory cards for smartphones, storage components for tablet computers, hard drives for personal computers, read only memory (ROM), erasable programmable read only memory (EPROM), portable compact disk read only memory (CD-ROM), USB memory, or any combination of the above storage media. A computer-readable storage medium can be any combination of one or more computer-readable storage media.

The embodiment of the present invention also provides a movable platform. FIG. 4 is a schematic structural diagram of a movable platform 400 according to an embodiment of the present invention. As shown in FIG. 4, the movable platform 400 of this embodiment includes an imaging device 410 and a video encoding device 420. The imaging device 410 is used to collect video data, and the video encoding device 420 is used to perform video encoding on the video data collected by the imaging device 410. The video encoding device 420 may adopt the structure of the embodiment shown in FIG. 1; for its specific details, reference may be made to the above, which will not be repeated here.

In some embodiments, the movable platform includes at least one of an unmanned aerial vehicle, a car, a remote control car, a robot, a camera, and a gimbal. The video encoding device 420 and the imaging device 410 are mounted on the movable platform body of the movable platform. When the movable platform is an unmanned aerial vehicle, the movable platform body is the fuselage of the unmanned aerial vehicle. When the movable platform is a car, the movable platform body is the body of the car. The car may be an autonomous driving vehicle or a semi-autonomous driving vehicle, which is not limited herein. When the movable platform is a remote control car, the movable platform body is the body of the remote control car. When the movable platform is a robot, the movable platform body is the body of the robot. When the movable platform is a camera, the movable platform body is the camera itself. When the movable platform is a gimbal, the movable platform body is the body of the gimbal. The gimbal can be a handheld gimbal, or a gimbal mounted on a car or an aircraft.

To sum up, the video encoding method, video encoding device, computer storage medium and movable platform of the embodiments of the present invention multiplex part of the hardware structure to encode video streams in both the H.264 encoding format and the H.265 encoding format, thereby saving hardware area.
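
As one illustration of this multiplexing, a single half-pixel interpolation routine can serve both formats during sub-pixel search. The following minimal Python sketch assumes that the 8-tap first interpolation filter uses the H.265 luma half-sample coefficients (the claims state only that the filter has 8 taps), operates on one row of samples, and clamps indices at the row borders. The function names are hypothetical.

```python
# Assumed coefficients: the H.265 luma half-sample filter, which sums to 64.
HALF_PEL_TAPS = (-1, 4, -11, 40, 40, -11, 4, -1)


def half_pixel_row(row, bit_depth=8):
    """Return the half-pixel samples lying between neighbors of a 1-D row."""
    max_val = (1 << bit_depth) - 1
    n = len(row)
    out = []
    for i in range(n - 1):
        acc = 0
        for k, tap in enumerate(HALF_PEL_TAPS):
            # The taps cover positions i-3 .. i+4; clamp at the borders.
            idx = min(max(i - 3 + k, 0), n - 1)
            acc += tap * row[idx]
        # Round, normalize by 64, and clip to the valid sample range.
        out.append(min(max((acc + 32) >> 6, 0), max_val))
    return out


def sub_pixel_search_half(row, fmt):
    # Both formats are routed through the same interpolation routine.
    if fmt not in ("h264", "h265"):
        raise ValueError("unsupported format: " + fmt)
    return half_pixel_row(row)
```

Because the format argument does not change the filter, the same datapath produces the half-pixel candidates for either stream; only the later format-specific stages differ.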

The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, all or part of the processes or functions described in the embodiments of the present invention are generated. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wire (e.g., coaxial cable, optical fiber, or digital subscriber line (DSL)) or by wireless means (e.g., infrared, radio, or microwave). The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that integrates one or more available media. The available media may be magnetic media (e.g., floppy disk, hard disk, magnetic tape), optical media (e.g., digital video disc (DVD)), or semiconductor media (e.g., solid state disk (SSD)).

Those of ordinary skill in the art can realize that the units and algorithm steps of each example described in conjunction with the embodiments disclosed herein can be implemented by electronic hardware, or a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the technical solution. Skilled artisans may implement the described functionality using different methods for each particular application, but such implementations should not be considered beyond the scope of the present invention.

Those skilled in the art can clearly understand that, for the convenience and brevity of description, the specific working process of the above-described systems, devices and units may refer to the corresponding processes in the foregoing method embodiments, which will not be repeated here.

In the several embodiments provided in this application, it should be understood that the disclosed system, apparatus and method may be implemented in other manners. For example, the apparatus embodiments described above are only illustrative. For example, the division of the units is only a logical function division; in actual implementation, there may be other division methods. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection between devices or units through some interfaces, and may be in electrical, mechanical or other forms.

The units described as separate components may or may not be physically separated, and components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution in this embodiment.

In addition, each functional unit in each embodiment of the present invention may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.

The functions, if implemented in the form of software functional units and sold or used as independent products, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, or a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or other media that can store program code.

The above are only specific embodiments of the present invention, but the protection scope of the present invention is not limited thereto. Any changes or substitutions that a person skilled in the art can readily conceive within the technical scope disclosed by the present invention should be included within the protection scope of the present invention. Therefore, the protection scope of the present invention should be subject to the protection scope of the claims.

Although example embodiments have been described herein with reference to the accompanying drawings, it should be understood that the above-described example embodiments are exemplary only, and are not intended to limit the scope of the invention thereto. Various changes and modifications can be made therein by those of ordinary skill in the art without departing from the scope and spirit of the invention. All such changes and modifications are intended to be included within the scope of the invention as claimed in the appended claims.

In the description provided herein, numerous specific details are set forth. It will be understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.

Similarly, it is to be understood that in the description of the exemplary embodiments of the invention, various features of the invention are sometimes grouped together into a single embodiment, figure, or description thereof. However, this method of disclosure should not be interpreted as reflecting an intention that the invention as claimed requires more features than are expressly recited in each claim. Rather, as the corresponding claims reflect, the invention lies in the fact that the corresponding technical problem may be solved with less than all features of a single disclosed embodiment. Thus, the claims following the Detailed Description are hereby expressly incorporated into this Detailed Description, with each claim standing on its own as a separate embodiment of this invention.

It will be understood by those skilled in the art that all features disclosed in this specification (including the accompanying claims, abstract and drawings) and all processes or units of any method or apparatus so disclosed may be combined in any combination, except where such features are mutually exclusive. Each feature disclosed in this specification (including accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.

Furthermore, those skilled in the art will appreciate that although some of the embodiments described herein include certain features that are included in other embodiments but not other features, combinations of features of different embodiments are intended to be within the scope of the invention and to form different embodiments. For example, in the claims, any of the claimed embodiments may be used in any combination.

Various component embodiments of the present invention may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. Those skilled in the art should understand that a microprocessor or a digital signal processor (DSP) may be used in practice to implement some or all of the functions of some modules according to the embodiments of the present invention. The present invention may also be implemented as apparatus programs (e.g., computer programs and computer program products) for performing part or all of the methods described herein. Such a program implementing the present invention may be stored on a computer-readable medium, or may be in the form of one or more signals. Such signals may be downloaded from Internet sites, or provided on carrier signals, or in any other form.

It should be noted that the above-described embodiments illustrate rather than limit the invention, and that alternative embodiments may be devised by those skilled in the art without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The invention can be implemented by means of hardware comprising several different elements and by means of a suitably programmed computer. In a unit claim enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, etc. does not denote any order. These words can be interpreted as names.

The above is only the specific embodiment of the present invention or the description of the specific embodiment, and the protection scope of the present invention is not limited thereto. Any changes or substitutions should be included within the protection scope of the present invention. The protection scope of the present invention shall be subject to the protection scope of the claims.

Claims

A video encoding device, characterized in that the video encoding device comprises:

an integer pixel search module for determining a matching block that matches the current block in the current frame within a plurality of predetermined ranges in a plurality of reference frames;

a sub-pixel search module electrically connected to the integer pixel search module and configured to determine at least one sub-pixel matching block with respect to the matching block;

a mode decision module, electrically connected to the sub-pixel search module, for performing mode decision at least using the coding cost of the sub-pixel matching block to obtain the optimal prediction block of the current block for video coding;

wherein the sub-pixel search module includes a half-pixel interpolation module, and the half-pixel interpolation module can use a first interpolation filter to perform half-pixel interpolation on both a video stream in the H.264 encoding format and a video stream in the H.265 encoding format.
The video encoding apparatus according to claim 1, wherein the integer pixel search module is further configured to determine a first motion vector between the current block and the matching block;

The sub-pixel search module is further configured to determine a second motion vector of the current block relative to the at least one sub-pixel matching block, and the precision of the second motion vector is higher than that of the first motion vector.
The video encoding apparatus according to claim 1, wherein the integer pixel search module performs integer pixel search on the video stream in H.264 encoding format and the video stream in H.265 encoding format based on the same hardware structure.
The video encoding apparatus according to claim 1, wherein the first interpolation filter is an 8-tap interpolation filter.
The video encoding apparatus according to claim 1, wherein the sub-pixel search module further comprises a first quarter-pixel interpolation module and a second quarter-pixel interpolation module, the first quarter-pixel interpolation module performs quarter-pixel interpolation on the video stream in the H.264 encoding format based on a second interpolation filter, and the second quarter-pixel interpolation module performs quarter-pixel interpolation on the video stream in the H.265 encoding format based on a third interpolation filter.
The video encoding apparatus according to claim 1, wherein the sub-pixel search module further comprises a coding cost calculation sub-module, which is configured to calculate, based on the same hardware structure, a first encoding cost between the sub-pixel matching block and the current block for the video stream in the H.264 encoding format or the video stream in the H.265 encoding format.
The video encoding apparatus according to claim 1, further comprising:

an intra-frame mode preliminary selection module connected to the mode decision module and configured to determine, according to pixel values corresponding to at least one adjacent reference block in the current frame, at least one prediction block for the current block and a second encoding cost corresponding to the at least one prediction block, and to determine at least one intra prediction mode according to the second encoding cost;

The mode decision module is configured to determine an optimal prediction block according to the at least one intra prediction mode and the at least one motion vector, and output mode information, a coefficient block and a reconstruction block;

an in-loop filtering module, electrically connected to the mode decision module, and used for performing in-loop filtering processing on the reconstruction block;

The entropy coding module is electrically connected to the in-loop filtering module, and the entropy coding module is used for entropy coding the mode information and the coefficient block.
The video encoding device according to claim 7, wherein the intra-frame mode preliminary selection module comprises a first intra-frame mode preliminary selection module, a second intra-frame mode preliminary selection module, and a common intra-frame mode preliminary selection module, the first intra-frame mode preliminary selection module and the common intra-frame mode preliminary selection module are used to select an intra prediction mode for the video stream in the H.264 encoding format, and the second intra-frame mode preliminary selection module and the common intra-frame mode preliminary selection module are used to select an intra prediction mode for the video stream in the H.265 encoding format.
The video encoding apparatus according to claim 8, wherein the common intra-frame mode preliminary selection module comprises a horizontal prediction sub-module, a vertical prediction sub-module and a DC prediction sub-module, which are respectively used to perform, based on the same hardware structure, intra prediction interpolation in horizontal mode, vertical mode and DC mode on the video stream in the H.264 encoding format or the video stream in the H.265 encoding format.
The video encoding device according to claim 8, wherein the first intra-frame mode preliminary selection module comprises a first directional prediction sub-module and a first plane prediction sub-module, which are respectively used to perform intra prediction interpolation in directional mode and plane mode on the video stream in the H.264 encoding format;

the second intra-frame mode preliminary selection module includes a second directional prediction sub-module and a second plane prediction sub-module, which are respectively used to perform intra prediction interpolation in directional mode and plane mode on the video stream in the H.265 encoding format.
The video encoding apparatus according to claim 8, wherein the common intra-frame mode preliminary selection module comprises a coding cost calculation sub-module, which is used to calculate, based on the same hardware structure, the second encoding cost for the video stream in the H.264 encoding format or the video stream in the H.265 encoding format.
The video encoding apparatus according to claim 1, wherein the mode decision module comprises a first mode decision module, a second mode decision module and a common mode decision module, the first mode decision module and the common mode decision module are used to select the division mode and the optimal prediction mode of the coding unit for the video stream in the H.264 encoding format and to obtain the residual block in the H.264 encoding format according to the optimal prediction mode, and the second mode decision module and the common mode decision module are used to select the division mode and the optimal prediction mode of the coding unit for the video stream in the H.265 encoding format and to obtain the residual block in the H.265 encoding format according to the optimal prediction mode.
The video encoding apparatus according to claim 12, wherein the first mode decision module comprises a first transform sub-module, a first quantization sub-module, a first inverse transform sub-module and a first inverse quantization sub-module, which are respectively used to perform transform, quantization, inverse transform, and inverse quantization on the video stream in the H.264 encoding format;

the second mode decision module includes a second transform sub-module, a second quantization sub-module, a second inverse transform sub-module and a second inverse quantization sub-module, which are respectively used to perform transform, quantization, inverse transform, and inverse quantization on the video stream in the H.265 encoding format.
The video encoding apparatus according to claim 12, wherein the first mode decision module further comprises a first bit estimation sub-module for performing bit estimation on the video stream in the H.264 encoding format;

The second mode decision module further includes a second bit estimation sub-module for performing bit estimation on the video stream in the H.265 encoding format.
The video encoding apparatus according to claim 12, wherein the common mode decision module further comprises an H.264 bit estimation sub-module, which is configured to perform bit estimation on the video stream in the H.264 encoding format or the video stream in the H.265 encoding format based on a first hardware structure; or, the common mode decision module includes an H.265 bit estimation sub-module, which is configured to perform bit estimation on the video stream in the H.264 encoding format or the video stream in the H.265 encoding format based on a second hardware structure, wherein the syntax elements used by the H.264 bit estimation sub-module and the H.265 bit estimation sub-module are different.
The video encoding apparatus according to claim 12, wherein the common mode decision module comprises a distortion estimation sub-module for performing, based on the same hardware structure, distortion estimation on the video stream in the H.264 encoding format or the video stream in the H.265 encoding format.
The video encoding device according to claim 7, wherein the in-loop filtering module comprises an SAO parameter estimation sub-module and an SAO filtering sub-module, which are respectively used to perform SAO parameter estimation and SAO filtering on the video stream in the H.265 encoding format.
The video encoding apparatus according to claim 7, wherein the in-loop filtering module comprises a first deblocking filtering sub-module and a second deblocking filtering sub-module, which are respectively used to perform deblocking filtering on the video stream in the H.264 encoding format and the video stream in the H.265 encoding format.
The video encoding device according to claim 7, wherein the entropy encoding module comprises a first entropy encoding module, a second entropy encoding module and a common entropy encoding module, the first entropy encoding module and the common entropy encoding module are configured to perform entropy encoding of the coefficient block for the video stream in the H.264 encoding format, and the second entropy encoding module and the common entropy encoding module are configured to perform entropy encoding of the coefficient block for the video stream in the H.265 encoding format.
The video encoding device according to claim 19, wherein the first entropy encoding module is configured to obtain syntax elements of the H.264 encoding format according to the residual block of the H.264 encoding format, the second entropy encoding module is configured to obtain syntax elements of the H.265 encoding format according to the residual block of the H.265 encoding format, and the common entropy encoding module is configured to provide an arithmetic coding kernel to entropy encode the syntax elements of the H.264 encoding format or the syntax elements of the H.265 encoding format.
The video encoding apparatus according to claim 7, further comprising a reference frame management module electrically connected to the integer pixel search module, the sub-pixel search module and the mode decision module, for obtaining the reference frames and sending the reference frames to the integer pixel search module, the sub-pixel search module and the mode decision module.
A video coding method, characterized in that the method comprises:

an integer pixel search module determines a matching block that matches the current block in the current frame within a plurality of predetermined ranges in a plurality of reference frames;

a sub-pixel search module electrically connected to the integer pixel search module determines at least one sub-pixel matching block with respect to the matching block, wherein the sub-pixel search module includes a half-pixel interpolation module, and determining the at least one sub-pixel matching block with respect to the matching block includes: the half-pixel interpolation module uses a first interpolation filter to perform half-pixel interpolation on a video stream in the H.264 encoding format or a video stream in the H.265 encoding format;

A mode decision module electrically connected to the sub-pixel search module makes mode decision at least using the coding cost of the sub-pixel matching block to obtain an optimal prediction block of the current block for video encoding.
The video encoding method according to claim 22, wherein the first interpolation filter is an 8-tap interpolation filter.
The video coding method according to claim 22, wherein the sub-pixel search module further comprises a first quarter-pixel interpolation module and a second quarter-pixel interpolation module, and the method further comprises:

performing quarter-pixel interpolation on the video stream in the H.264 encoding format by the first quarter-pixel interpolation module based on a second interpolation filter, or performing quarter-pixel interpolation on the video stream in the H.265 encoding format by the second quarter-pixel interpolation module based on a third interpolation filter.
The video coding method according to claim 22, wherein the sub-pixel search module further comprises a coding cost calculation sub-module, and the method further comprises:

based on the same hardware structure, the coding cost calculation sub-module calculates the first encoding cost between the sub-pixel matching block and the current block for the video stream in the H.264 encoding format or the video stream in the H.265 encoding format.
The video coding method according to claim 22, further comprising:

an intra-frame mode preliminary selection module connected to the mode decision module determines, according to pixel values corresponding to at least one adjacent reference block in the current frame, at least one prediction block for the current block and a second coding cost corresponding to the at least one prediction block, and determines at least one intra-frame prediction mode according to the second coding cost;

determining an optimal prediction block by the mode decision module according to the at least one intra prediction mode and the at least one motion vector, and outputting mode information, a coefficient block and a reconstructed block;

performing in-loop filtering processing on the reconstruction block by an in-loop filtering module electrically connected to the mode decision module;

Entropy encoding is performed by an entropy encoding module electrically connected to the in-loop filtering module according to the mode information and coefficient blocks processed by the in-loop filtering.
The video coding method according to claim 26, wherein the intra-frame mode preliminary selection module comprises a first intra-frame mode preliminary selection module, a second intra-frame mode preliminary selection module, and a common intra-frame mode preliminary selection module, and the method further comprises:

the first intra-frame mode preliminary selection module and the common intra-frame mode preliminary selection module select an intra prediction mode for the video stream in the H.264 encoding format, or the second intra-frame mode preliminary selection module and the common intra-frame mode preliminary selection module select an intra prediction mode for the video stream in the H.265 encoding format.
The video coding method according to claim 27, wherein the common intra-mode preliminary selection module comprises a horizontal prediction sub-module, a vertical prediction sub-module and a DC prediction sub-module, and the method further comprises:

the horizontal prediction sub-module, the vertical prediction sub-module and the DC prediction sub-module respectively perform, based on the same hardware structure, intra prediction interpolation in horizontal mode, vertical mode and DC mode on the video stream in the H.264 encoding format or the video stream in the H.265 encoding format.
The video coding method according to claim 27, wherein the common intra mode primary selection module comprises a coding cost calculation sub-module, and the method further comprises:

The encoding cost calculation submodule calculates the second encoding cost for the video stream in the H.264 encoding format or the video stream in the H.265 encoding format based on the same hardware structure.
The video coding method according to claim 22, wherein the mode decision module comprises a first mode decision module, a second mode decision module and a common mode decision module, and the mode decision comprises:

the first mode decision module and the common mode decision module select the division mode and the optimal prediction mode of the coding unit for the video stream in the H.264 encoding format and obtain the residual block in the H.264 encoding format according to the optimal prediction mode, or the second mode decision module and the common mode decision module select the division mode and the optimal prediction mode of the coding unit for the video stream in the H.265 encoding format and obtain the residual block in the H.265 encoding format according to the optimal prediction mode.
The video coding method according to claim 30, wherein the first mode decision module further comprises a first bit estimation sub-module, and the mode decision further comprises the first bit estimation sub-module performing bit estimation on the video stream in the H.264 encoding format;

the second mode decision module further includes a second bit estimation sub-module, and the mode decision further includes the second bit estimation sub-module performing bit estimation on the video stream in the H.265 encoding format.
The video coding method according to claim 31, wherein the common mode decision module further comprises an H.264 bit estimation sub-module, and the mode decision further comprises the H.264 bit estimation sub-module performing bit estimation on the video stream in the H.264 encoding format or the video stream in the H.265 encoding format based on a first hardware structure; or, the common mode decision module includes an H.265 bit estimation sub-module, and the mode decision further includes the H.265 bit estimation sub-module performing bit estimation on the video stream in the H.264 encoding format or the video stream in the H.265 encoding format based on a second hardware structure, wherein the syntax elements used by the H.264 bit estimation sub-module and the H.265 bit estimation sub-module are different.
The video encoding method according to claim 31, wherein the common mode decision module includes a distortion estimation sub-module, and the mode decision further includes the distortion estimation sub-module performing, based on the same hardware structure, distortion estimation on the video stream in the H.264 encoding format or the video stream in the H.265 encoding format.
The video encoding method according to claim 26, wherein the in-loop filtering comprises performing SAO parameter estimation and SAO filtering on the video stream in the H.265 encoding format.
The video encoding method according to claim 26, wherein the in-loop filtering comprises performing deblocking filtering on the video stream in the H.264 encoding format and the video stream in the H.265 encoding format.
The video coding method according to claim 26, wherein the entropy coding module comprises a first entropy coding module, a second entropy coding module and a common entropy coding module, and the entropy coding comprises:

the first entropy encoding module and the common entropy encoding module perform entropy encoding of the coefficient block for the video stream in the H.264 encoding format, or the second entropy encoding module and the common entropy encoding module perform entropy encoding of the coefficient block for the video stream in the H.265 encoding format.
The video coding method according to claim 36, wherein the entropy coding comprises: the first entropy coding module obtains syntax elements of the H.264 coding format according to the residual block of the H.264 coding format, the second entropy coding module obtains syntax elements of the H.265 coding format according to the residual block of the H.265 coding format, and the common entropy coding module provides an arithmetic coding kernel to entropy encode the syntax elements of the H.264 coding format or the syntax elements of the H.265 coding format.
A computer storage medium on which a computer program is stored, characterized in that, when the computer program is executed by a processor, the steps of the video encoding method according to any one of claims 22 to 37 are implemented.
A movable platform, characterized in that the movable platform comprises an imaging device and the video encoding device according to any one of claims 1-21, the imaging device is used to collect video data, and the video encoding device is configured to perform video encoding on the video data collected by the imaging device.