CN110971896A - H.265 coding method and device - Google Patents

H.265 coding method and device

Info

Publication number
CN110971896A
CN110971896A (application CN201811136408.6A)
Authority
CN
China
Prior art keywords
module
frame
prediction
intra
block
Prior art date
Legal status
Granted
Application number
CN201811136408.6A
Other languages
Chinese (zh)
Other versions
CN110971896B
Inventor
张善旭
陈恒明
张圣钦
何德龙
Current Assignee
Fuzhou Rockchip Electronics Co Ltd
Original Assignee
Fuzhou Rockchip Electronics Co Ltd
Priority date
Filing date
Publication date
Application filed by Fuzhou Rockchip Electronics Co Ltd
Priority to CN201811136408.6A
Publication of CN110971896A
Priority to US17/603,002 (US11956452B2)
Application granted
Publication of CN110971896B
Legal status: Active
Anticipated expiration

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10: … using adaptive coding
    • H04N19/102: … characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/103: Selection of coding mode or of prediction mode
    • H04N19/12: Selection from among a plurality of transforms or standards, e.g. selection between discrete cosine transform [DCT] and sub-band transform or selection between H.263 and H.264
    • H04N19/122: Selection of transform size, e.g. 8x8 or 2x4x8 DCT; selection of sub-band transforms of varying structure or type
    • H04N19/124: Quantisation
    • H04N19/134: … characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/157: Assigned coding mode, i.e. the coding mode being predefined or preselected to be further used for selection of another element or parameter
    • H04N19/159: Prediction type, e.g. intra-frame, inter-frame or bidirectional frame prediction
    • H04N19/169: … characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17: … the unit being an image region, e.g. an object
    • H04N19/176: … the region being a block, e.g. a macroblock
    • H04N19/50: … using predictive coding
    • H04N19/503: … involving temporal prediction
    • H04N19/51: Motion estimation or motion compensation
    • H04N19/513: Processing of motion vectors
    • H04N19/517: Processing of motion vectors by encoding
    • H04N19/52: Processing of motion vectors by encoding by predictive encoding
    • H04N19/70: … characterised by syntax aspects related to video coding, e.g. related to compression standards
    • H04N19/90: … using coding techniques not provided for in groups H04N19/10-H04N19/85, e.g. fractals
    • H04N19/91: Entropy coding, e.g. variable length coding [VLC] or arithmetic coding

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • Discrete Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The invention provides an H.265 coding method and device. The device comprises a plurality of modules and a plurality of pipelined steps, each pipelined step comprising at least one pipeline stage that executes at least one module. The modules comprise a preprocessing module, a coarse selection module, a precise comparison module and an integral control module; the pipelined steps comprise a preprocessing pipelined step, a coarse selection pipelined step performed after the preprocessing pipelined step, and a precise comparison pipelined step performed after the coarse selection pipelined step. The integral control module controls the storage and retrieval of original-frame and reference-frame data, and schedules the preprocessing, coarse selection and precise comparison modules to execute their corresponding pipelined steps in sequence. The invention improves search precision through a distributed search scheme, better preserves the details of the reconstructed image, and reduces hardware resource consumption.

Description

H.265 coding method and device
Technical Field
The invention relates to the field of H.265 coding, in particular to an H.265 coding method and device.
Background
H.265 is the video coding standard established after H.264 by the ITU-T VCEG together with ISO/IEC MPEG. The H.265 standard builds on the existing H.264 standard, retaining some of its techniques while improving others and adding new ones. The newly added techniques improve the trade-off among bit rate, coding quality, latency and algorithm complexity so as to reach an optimal configuration. The specific research contents include: improving compression efficiency, improving robustness and error-resilience, reducing real-time latency, reducing channel acquisition time and random-access latency, and reducing complexity. At present, existing H.265 implementations generally suffer from high hardware resource consumption and low coding efficiency.
Disclosure of Invention
Therefore, it is necessary to provide an H.265 coding solution that reduces the hardware resource consumption of the H.265 algorithm and improves coding efficiency.
To achieve the above object, the inventors provide an H.265 encoding apparatus comprising a plurality of modules and a plurality of pipelined steps, each pipelined step comprising at least one pipeline stage that executes at least one module, wherein:
the plurality of modules comprise a preprocessing module, a coarse selection module, a precise comparison module and an integral control module, the integral control module being connected to the preprocessing module, the coarse selection module and the precise comparison module respectively;
the plurality of pipelined steps comprise a preprocessing pipelined step, a coarse selection pipelined step and a precise comparison pipelined step, the coarse selection pipelined step being performed after the preprocessing pipelined step and the precise comparison pipelined step being performed after the coarse selection pipelined step;
the preprocessing pipelined step divides, through the preprocessing module, a current frame of the original video into a plurality of CTU blocks;
the coarse selection pipelined step divides, through the coarse selection module, each CTU block according to a plurality of partition modes, performs coarse inter-frame prediction selection and coarse intra-frame prediction selection for each partition mode of each CTU block, and generates prediction information corresponding to each partition mode;
the precise comparison pipelined step calculates and compares, through the precise comparison module, the costs of the prediction information corresponding to the partition modes of each CTU block, selects for each CTU block the partition mode with the minimum cost together with its corresponding coding information, and, according to the selected partition mode and coding information, generates entropy coding information for producing the H.265 bitstream of the current frame and reconstruction information for producing the reconstructed frame of the current frame;
the integral control module controls the storage and retrieval of original-frame and reference-frame data, and schedules the preprocessing module, the coarse selection module and the precise comparison module to execute their corresponding pipelined steps in sequence.
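The flow above can be sketched as a minimal software model (not the patented hardware): three pipelined steps run in order for each CTU under an overall scheduler. All function names, the simplified two-mode set and the toy cost function are illustrative assumptions.

```python
# Hypothetical software model of the three pipelined steps described above.
CTU_SIZE = 64  # H.265 CTUs cover up to 64x64 luma samples

def preprocess(frame_w, frame_h):
    """Pre-processing step: divide the current frame into CTU blocks."""
    return [(x, y) for y in range(0, frame_h, CTU_SIZE)
                   for x in range(0, frame_w, CTU_SIZE)]

def coarse_select(ctu):
    """Coarse selection step: one prediction-info record per partition mode."""
    partition_modes = ["2Nx2N", "NxN"]      # simplified stand-in mode set
    return [{"ctu": ctu, "mode": m, "cost": (ctu[0] + ctu[1] + len(m)) % 100}
            for m in partition_modes]       # toy cost, for illustration only

def precise_compare(candidates):
    """Precise comparison step: keep the minimum-cost partition mode."""
    return min(candidates, key=lambda c: c["cost"])

def encode_frame(frame_w, frame_h):
    """Integral control: run the three steps in sequence for every CTU."""
    return [precise_compare(coarse_select(ctu))
            for ctu in preprocess(frame_w, frame_h)]

print(len(encode_frame(128, 128)))   # a 128x128 frame holds 4 CTUs -> 4
```

In the real device the steps overlap in time (stage N works on one CTU while stage N+1 works on the previous one); the sequential loop here only models the data dependencies.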
Further, the coarse selection module comprises an inter-frame prediction coarse selection module and an intra-frame prediction coarse selection module, and the coarse selection pipelined step comprises an inter-frame prediction coarse selection pipeline stage and an intra-frame prediction coarse selection pipeline stage;
the inter-frame prediction coarse selection pipeline stage divides, through the inter-frame prediction coarse selection module, each CTU block according to a plurality of partition modes, each partition mode dividing a CTU block into a plurality of corresponding CU blocks and each CU block into one or more corresponding PU blocks; performs inter-frame prediction for each partition mode of each CTU block and obtains reference frame information; performs intra-frame prediction for each partition mode of each CTU block; and generates prediction information corresponding to each partition mode;
the intra-frame prediction coarse selection pipeline stage, through the intra-frame prediction coarse selection module: performs intra-frame prediction on each PU block of each partition mode, calculates the corresponding costs, selects, according to the costs, one or more lower-cost intra-frame prediction directions for each PU block, and takes the selected intra-frame prediction directions as the prediction information corresponding to the partition mode.
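For illustration only, the CTU-to-CU-to-PU division that a partition mode performs can be sketched as follows; the fixed quad-split depth, the two PU modes and all helper names are assumptions, not the patent's actual mode set.

```python
# Illustrative sketch of one way a CTU can be divided into CU and PU blocks.
def split_cu(x, y, size, depth, max_depth=2):
    """Recursively quad-split a CU down to max_depth; return leaf CU blocks."""
    if depth == max_depth:
        return [(x, y, size)]
    half = size // 2
    cus = []
    for dy in (0, half):
        for dx in (0, half):
            cus.extend(split_cu(x + dx, y + dy, half, depth + 1, max_depth))
    return cus

def cu_to_pus(cu, mode="2NxN"):
    """Split one CU into PU blocks (x, y, width, height) for a PU mode."""
    x, y, size = cu
    if mode == "2Nx2N":                   # one PU covering the whole CU
        return [(x, y, size, size)]
    if mode == "2NxN":                    # two horizontally stacked PUs
        return [(x, y, size, size // 2), (x, y + size // 2, size, size // 2)]
    raise ValueError(mode)

cus = split_cu(0, 0, 64, 0)   # a 64x64 CTU -> 16 CUs of 16x16 at depth 2
pus = [pu for cu in cus for pu in cu_to_pus(cu)]
print(len(cus), len(pus))     # 16 32
```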
Further, the coarse selection module comprises the inter-frame prediction coarse selection module, and the precise comparison module comprises the intra-frame prediction coarse selection module; the coarse selection pipelined step comprises the inter-frame prediction coarse selection pipeline stage, and the precise comparison pipelined step comprises the intra-frame prediction coarse selection pipeline stage.
The inter-frame prediction coarse selection pipeline stage divides, through the inter-frame prediction coarse selection module, each CTU block according to a plurality of partition modes, each partition mode dividing a CTU block into a plurality of corresponding CU blocks and each CU block into one or more corresponding PU blocks; performs inter-frame prediction for each partition mode of each CTU block and obtains reference frame information; performs intra-frame prediction for each partition mode of each CTU block; and generates prediction information corresponding to each partition mode;
the intra-frame prediction coarse selection pipeline stage, through the intra-frame prediction coarse selection module: performs intra-frame prediction on each PU block of each partition mode, calculates the corresponding costs, selects, according to the costs, one or more lower-cost intra-frame prediction directions for each PU block, and takes the selected intra-frame prediction directions as the prediction information corresponding to the partition mode.
Further, the inter-frame prediction coarse selection module comprises a coarse search module, a reference frame data loading module, a fine search module and a fractional pixel search module; the coarse selection pipelined step comprises a coarse search pipeline stage, a reference frame data loading pipeline stage, a fine search pipeline stage and a fractional pixel search pipeline stage;
the coarse search pipeline stage, through the coarse search module: selects a frame from the reference frame list, takes the original frame or the reconstructed frame of that frame as the reference frame, down-samples the reference frame and the current CTU block, finds in the down-sampled reference frame the pixel position whose cost against the down-sampled CTU block is minimum, and calculates the coarse search vector of that pixel position relative to the current CTU block;
the reference frame data loading pipeline stage: obtains the coarse search vector of the coarse search pipeline stage through the integral control module, obtains one or more predicted motion vectors serving the same purpose as the coarse search vector from the motion vectors around the CTU block, loads reference frame data according to the coarse search vector and the one or more predicted motion vectors, and transmits the reference frame data to the fine search pipeline stage through the integral control module;
the fine search pipeline stage, through the fine search module: sets, for each PU block, a fine search area in the reconstructed image of the reference frame according to the coarse search vector, and generates within the fine search area the fine search vector with the minimum cost for the PU block; it also generates one or more predicted motion vectors serving the same purpose as the coarse search vector from the motion vector information around the current CTU block, generates fine search vectors from these predicted motion vectors, and sends all generated fine search vectors to the fractional pixel search module;
the fractional pixel search pipeline stage, through the fractional pixel search module: sets, for each received fine search vector, a corresponding fractional pixel search area in the reference frame for each PU block, and generates within the fractional pixel search area the fractional pixel search vector with the minimum cost for the PU block.
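A minimal sketch of the coarse-then-fine distributed search described above, assuming a SAD cost; the fractional pixel stage would refine analogously on an interpolated (half/quarter-sample) reference grid. All names, the search radius and the downsampling factor are illustrative assumptions.

```python
import numpy as np

def sad(a, b):
    """Sum of absolute differences, a common motion-search cost."""
    return int(np.abs(a.astype(int) - b.astype(int)).sum())

def coarse_search(ref, blk, bx, by, factor=4):
    """Exhaustive search on 1/factor-downsampled data; returns a coarse vector."""
    ref_d, blk_d = ref[::factor, ::factor], blk[::factor, ::factor]
    h, w = blk_d.shape
    best = (0, 0, float("inf"))
    for y in range(ref_d.shape[0] - h + 1):
        for x in range(ref_d.shape[1] - w + 1):
            c = sad(ref_d[y:y + h, x:x + w], blk_d)
            if c < best[2]:
                best = (x, y, c)
    return (best[0] * factor - bx, best[1] * factor - by)  # back to full scale

def fine_search(ref, blk, bx, by, cv, radius=4):
    """Full-pel search in a small window around the coarse vector cv."""
    h, w = blk.shape
    best, best_cost = cv, float("inf")
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            x, y = bx + cv[0] + dx, by + cv[1] + dy
            if 0 <= x <= ref.shape[1] - w and 0 <= y <= ref.shape[0] - h:
                c = sad(ref[y:y + h, x:x + w], blk)
                if c < best_cost:
                    best, best_cost = (x - bx, y - by), c
    return best

rng = np.random.default_rng(0)
ref = rng.integers(0, 256, (64, 64), dtype=np.uint8)
blk = ref[20:36, 24:40]     # a 16x16 block at (8, 8) displaced by (16, 12)
mv = fine_search(ref, blk, 8, 8, coarse_search(ref, blk, 8, 8))
print(mv)                   # (16, 12)
```

The point of the split is hardware economy: the coarse stage covers a wide area cheaply on downsampled data, so the full-resolution fine stage only needs a small window.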
Furthermore, the intra-frame prediction coarse selection pipeline stage and the fractional pixel search pipeline stage are the same pipeline stage, and the intra-frame prediction coarse selection module and the fractional pixel search module are executed in parallel within that pipeline stage.
Further, the intra-frame prediction coarse selection module comprises a reference pixel generation module, which is executed in the intra-frame prediction coarse selection pipeline stage;
the intra-frame prediction coarse selection pipeline stage: for each PU block of each partition mode, generates reference pixels from the original pixels of the current frame, predicts all intra-frame prediction directions from the reference pixels according to the rules of the H.265 protocol to obtain a prediction result for each direction, calculates the distortion cost of each direction's prediction result against the original pixels, sorts the costs in ascending order, and selects one or more intra-frame prediction directions with lower cost.
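The direction selection above can be illustrated with a toy model that predicts a PU in only three of the 35 H.265 intra modes (DC, vertical, horizontal), costs each prediction with SAD against the original pixels, and keeps the lowest-cost directions; all names and the tiny 4x4 example are hypothetical.

```python
import numpy as np

def predict(top, left, size, mode):
    """Predict a size x size PU from its top/left reference pixels."""
    if mode == "DC":
        return np.full((size, size), (top.mean() + left.mean()) / 2)
    if mode == "VER":                     # copy the top reference row down
        return np.tile(top, (size, 1))
    if mode == "HOR":                     # copy the left reference column across
        return np.tile(left[:, None], (1, size))
    raise ValueError(mode)

def coarse_intra_select(block, top, left, keep=2):
    """Return the `keep` directions with the smallest SAD cost, ascending."""
    costs = []
    for mode in ("DC", "VER", "HOR"):
        pred = predict(top, left, block.shape[0], mode)
        costs.append((float(np.abs(block - pred).sum()), mode))
    return [m for _, m in sorted(costs)[:keep]]

# A block whose rows all repeat the top reference row is perfectly predicted
# by the vertical mode, so "VER" sorts first with cost 0.
top = np.array([10.0, 20.0, 30.0, 40.0])
left = np.array([10.0, 10.0, 10.0, 10.0])
block = np.tile(top, (4, 1))
print(coarse_intra_select(block, top, left))   # ['VER', 'DC']
```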
Further, the intra-frame prediction coarse selection pipeline stage and the fractional pixel search pipeline stage are different pipeline stages, and the intra-frame prediction coarse selection module is executed in a pipeline stage after the fractional pixel search module.
Further, the intra-frame prediction coarse selection module comprises a reference pixel generation module, which is executed in the intra-frame prediction coarse selection pipeline stage;
the reference pixel generation module, for each PU block of each partition mode, generates reference pixels from the reconstructed pixels of the current frame, predicts all intra-frame prediction directions from the reference pixels according to the rules of the H.265 protocol to obtain a prediction result for each direction, calculates the distortion cost of each direction's prediction result against the original pixels, sorts the costs in ascending order, and selects one or more intra-frame prediction directions with lower cost.
Further, the plurality of modules further comprise a post-processing module, and the plurality of pipelined steps further comprise a post-processing pipelined step;
the post-processing pipelined step generates, through the post-processing module, the reconstructed frame corresponding to the current frame according to the minimum-cost partition mode output by the precise comparison module for each CTU block and the reconstruction information corresponding to that partition mode.
Further, the plurality of modules further comprise an entropy coding module, and the plurality of pipelined steps further comprise an entropy coding pipelined step;
the entropy coding pipelined step generates, through the entropy coding module, a binary bitstream conforming to the H.265 protocol specification according to the minimum-cost partition mode output by the precise comparison module for each CTU block and the corresponding entropy coding information.
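For context, H.265 codes residual data with CABAC, but many header syntax elements are written as exponential-Golomb codewords; a minimal sketch of the unsigned variant follows (the function name is ours, not from the patent).

```python
def ue_golomb(v):
    """Unsigned exponential-Golomb codeword for v, as used for many H.265
    syntax elements: (leading zeros) + (binary of v + 1)."""
    code = v + 1
    bits = code.bit_length()
    return "0" * (bits - 1) + format(code, "b")

print([ue_golomb(v) for v in range(5)])
# ['1', '010', '011', '00100', '00101']
```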
The inventors also provide an H.265 encoding method applied to an H.265 encoding apparatus comprising a plurality of modules and a plurality of pipelined steps, each pipelined step comprising at least one pipeline stage that executes at least one module, wherein:
the plurality of modules comprise a preprocessing module, a coarse selection module, a precise comparison module and an integral control module, the integral control module being connected to the preprocessing module, the coarse selection module and the precise comparison module respectively;
the plurality of pipelined steps comprise a preprocessing pipelined step, a coarse selection pipelined step and a precise comparison pipelined step, the coarse selection pipelined step being performed after the preprocessing pipelined step and the precise comparison pipelined step being performed after the coarse selection pipelined step;
the method comprises the following steps:
in the preprocessing pipelined step, dividing, through the preprocessing module, a current frame of the original video into a plurality of CTU blocks;
in the coarse selection pipelined step, dividing, through the coarse selection module, each CTU block according to a plurality of partition modes, performing coarse inter-frame prediction selection and coarse intra-frame prediction selection for each partition mode of each CTU block, and generating prediction information corresponding to each partition mode;
in the precise comparison pipelined step, calculating and comparing, through the precise comparison module, the costs of the prediction information corresponding to the partition modes of each CTU block, selecting for each CTU block the partition mode with the minimum cost together with its corresponding coding information, and, according to the selected partition mode and coding information, generating entropy coding information for producing the H.265 bitstream of the current frame and reconstruction information for producing the reconstructed frame of the current frame;
the integral control module controls the storage and retrieval of original-frame and reference-frame data, and schedules the preprocessing module, the coarse selection module and the precise comparison module to execute their corresponding pipelined steps in sequence.
Further, the coarse selection module comprises an inter-frame prediction coarse selection module and an intra-frame prediction coarse selection module, and the coarse selection pipelined step comprises an inter-frame prediction coarse selection pipeline stage and an intra-frame prediction coarse selection pipeline stage;
the method further comprises the following steps:
in the inter-frame prediction coarse selection pipeline stage, dividing, through the inter-frame prediction coarse selection module, each CTU block according to a plurality of partition modes, each partition mode dividing a CTU block into a plurality of corresponding CU blocks and each CU block into one or more corresponding PU blocks; performing inter-frame prediction for each partition mode of each CTU block and obtaining reference frame information; performing intra-frame prediction for each partition mode of each CTU block; and generating prediction information corresponding to each partition mode;
in the intra-frame prediction coarse selection pipeline stage, through the intra-frame prediction coarse selection module: performing intra-frame prediction on each PU block of each partition mode, calculating the corresponding costs, selecting, according to the costs, one or more lower-cost intra-frame prediction directions for each PU block, and taking the selected intra-frame prediction directions as the prediction information corresponding to the partition mode.
Further, the coarse selection module comprises the inter-frame prediction coarse selection module, and the precise comparison module comprises the intra-frame prediction coarse selection module; the coarse selection pipelined step comprises the inter-frame prediction coarse selection pipeline stage, and the precise comparison pipelined step comprises the intra-frame prediction coarse selection pipeline stage.
The method comprises the following steps:
in the inter-frame prediction coarse selection pipeline stage, dividing, through the inter-frame prediction coarse selection module, each CTU block according to a plurality of partition modes, each partition mode dividing a CTU block into a plurality of corresponding CU blocks and each CU block into one or more corresponding PU blocks; performing inter-frame prediction for each partition mode of each CTU block and obtaining reference frame information; performing intra-frame prediction for each partition mode of each CTU block; and generating prediction information corresponding to each partition mode;
in the intra-frame prediction coarse selection pipeline stage, through the intra-frame prediction coarse selection module: performing intra-frame prediction on each PU block of each partition mode, calculating the corresponding costs, selecting, according to the costs, one or more lower-cost intra-frame prediction directions for each PU block, and taking the selected intra-frame prediction directions as the prediction information corresponding to the partition mode.
Further, the inter-frame prediction coarse selection module comprises a coarse search module, a reference frame data loading module, a fine search module and a fractional pixel search module; the coarse selection pipelined step comprises a coarse search pipeline stage, a reference frame data loading pipeline stage, a fine search pipeline stage and a fractional pixel search pipeline stage;
the method comprises the following steps:
in the coarse search pipeline stage, through the coarse search module: selecting a frame from the reference frame list, taking the original frame or the reconstructed frame of that frame as the reference frame, down-sampling the reference frame and the current CTU block, finding in the down-sampled reference frame the pixel position whose cost against the down-sampled CTU block is minimum, and calculating the coarse search vector of that pixel position relative to the current CTU block;
in the reference frame data loading pipeline stage: obtaining the coarse search vector of the coarse search pipeline stage through the integral control module, obtaining one or more predicted motion vectors serving the same purpose as the coarse search vector from the motion vectors around the CTU block, loading reference frame data according to the coarse search vector and the one or more predicted motion vectors, and transmitting the reference frame data to the fine search pipeline stage through the integral control module;
in the fine search pipeline stage, through the fine search module: setting, for each PU block, a fine search area in the reconstructed image of the reference frame according to the coarse search vector, and generating within the fine search area the fine search vector with the minimum cost for the PU block; generating one or more predicted motion vectors serving the same purpose as the coarse search vector from the motion vector information around the current CTU block, and generating fine search vectors from these predicted motion vectors; and sending all generated fine search vectors to the fractional pixel search module;
in the fractional pixel search pipeline stage, through the fractional pixel search module: setting, for each received fine search vector, a corresponding fractional pixel search area in the reference frame for each PU block, and generating within the fractional pixel search area the fractional pixel search vector with the minimum cost for the PU block.
Furthermore, the intra-frame prediction coarse selection pipeline stage and the fractional pixel search pipeline stage are the same pipeline stage, and the intra-frame prediction coarse selection module and the fractional pixel search module are executed in parallel within that pipeline stage.
Further, the intra-frame prediction coarse selection module comprises a reference pixel generation module, which is executed in the intra-frame prediction coarse selection pipeline stage;
the method comprises the following steps:
in the intra-frame prediction coarse selection pipeline stage: for each PU block of each partition mode, generating reference pixels from the original pixels of the current frame, predicting all intra-frame prediction directions from the reference pixels according to the rules of the H.265 protocol to obtain a prediction result for each direction, calculating the distortion cost of each direction's prediction result against the original pixels, sorting the costs in ascending order, and selecting one or more intra-frame prediction directions with lower cost.
Further, the intra-frame prediction coarse selection pipeline stage and the fractional pixel search pipeline stage are different pipeline stages, and the intra-frame prediction coarse selection module is executed in a pipeline stage after the fractional pixel search module.
Further, the intra-frame prediction coarse selection module comprises a reference pixel generation module, which is executed in the intra-frame prediction coarse selection pipeline stage;
the method comprises the following steps:
the reference pixel generation module generates, for each PU block of each partition mode, reference pixels from the reconstructed pixels of the current frame, predicts all intra-frame prediction directions from the reference pixels according to the rules of the H.265 protocol to obtain a prediction result for each direction, calculates the distortion cost of each direction's prediction result against the original pixels, sorts the costs in ascending order, and selects one or more intra-frame prediction directions with lower cost.
Further, the plurality of modules further comprise a post-processing module, and the plurality of pipelined steps further comprise a post-processing pipelined step;
the method comprises the following steps:
in the post-processing pipelined step, generating, through the post-processing module, the reconstructed frame corresponding to the current frame according to the minimum-cost partition mode output by the precise comparison module for each CTU block and the reconstruction information corresponding to that partition mode.
Further, the plurality of modules further comprises an entropy coding module, and the plurality of pipelining steps further comprises an entropy coding pipelining step;
the method comprises the following steps: the entropy coding pipelining step generates, through the entropy coding module, a binary code stream conforming to the H.265 protocol specification according to the minimum-cost division mode output by the precise comparison module for each CTU block and the corresponding entropy coding information.
Compared with the prior art, the method improves search precision through a distributed search mode, better preserves details of the reconstructed image, and reduces hardware resource consumption. Meanwhile, the modules operate as a pipeline and are scheduled by the overall control module, effectively improving coding efficiency.
Drawings
Fig. 1 is a schematic diagram of an h.265 encoding device according to an embodiment of the present invention;
fig. 2 is a schematic diagram of a coarse selection module of an h.265 encoding device according to an embodiment of the present invention;
fig. 3 is a schematic diagram of a coarse search process of an h.265 encoding device according to an embodiment of the present invention;
fig. 4 is a schematic diagram of a fine search process of an h.265 encoding device according to an embodiment of the present invention;
FIG. 5 is a diagram illustrating fractional pixel search of an H.265 encoding device according to an embodiment of the present invention;
fig. 6-a is a schematic diagram of search prediction performed by an h.265 encoding device according to an embodiment of the present invention;
FIG. 6-B is a diagram illustrating search prediction performed by an H.265 encoding apparatus according to another embodiment of the present invention;
FIG. 7 is a schematic diagram of a precise comparison module of an H.265 encoding device according to an embodiment of the invention;
fig. 8 is a schematic diagram of a hierarchical comparison module of an h.265 encoding device according to an embodiment of the present invention;
fig. 9 is a flowchart of an h.265 encoding method according to an embodiment of the present invention;
FIG. 10 is a flowchart of a coarse search method for H.265 encoding according to an embodiment of the present invention;
FIG. 11 is a flowchart of a fine search method for H.265 encoding according to an embodiment of the present invention;
FIG. 12 is a flowchart of a fractional pixel search method for H.265 encoding according to an embodiment of the present invention;
fig. 13 is a diagram illustrating motion vector information around a current CTU block according to an embodiment of the present invention;
fig. 14 is a schematic diagram of an h.265 encoding device according to another embodiment of the present invention;
reference numerals:
100. original video;
101. original image frames;
102. a current frame;
110. an image encoding device; 120. a preprocessing module; 130. a coarse selection module; 140. a precise comparison module; 150. an entropy coding module; 160. a deblocking filter module; 170. a sample adaptive offset module; 180. a post-processing module;
121. a current CTU; 141. encoding information; 180. an encoded video; 190. a code stream; 145. a reconstructed frame image;
230. an inter-frame prediction coarse selection module; 211. a coarse search module; 213. a fine search module; 215. a fractional pixel search module;
330. an intra-frame prediction coarse selection module; 231. a reference pixel generation module;
310. a reference frame; 311. down-sampling; 320. a down-sampled image; 351. a motion vector; 352. a minimum-cost pixel block; 330. a current CTU; 340. a down-sampled CTU.
410. a reference frame; 420. a current PU position; 421. a coarse search motion vector; 423. a fine search motion vector; 430. a fine search area; 431. an initial search position; 433. a minimum-cost position;
510. a reference frame; 520. a current PU position; 521. a fine search motion vector; 523. a fractional pixel search motion vector; 530. a fractional pixel search area; 531. an initial search position; 533. a minimum-cost position;
711. a distribution module; 721. a first-level calculation module Level_calc0; 722. a second-level calculation module Level_calc1;
723. a third-level calculation module Level_calc2; 724. a fourth-level calculation module Level_calc3;
740. a hierarchical comparison module;
810. a single-level calculation module; 820. an inter-frame mode cost calculation module; 830. an intra-frame mode cost calculation module; 840. a preference module;
910. a reference frame data loading module; 920. and an integral control module.
Detailed Description
To explain technical contents, structural features, and objects and effects of the technical solutions in detail, the following detailed description is given with reference to the accompanying drawings in conjunction with the embodiments.
Please refer to fig. 1 and fig. 14, which are schematic diagrams of an H.265 encoding device according to the present invention. The device is the image encoding device 110, which may be a chip with an image encoding function, or an electronic device including such a chip, for example a smart mobile device (a mobile phone, a tablet computer, or a personal digital assistant), a personal computer, or industrial equipment.
The H.265 encoding device of the present invention may employ a pipeline comprising a plurality of pipelined steps to implement the steps of a particular embodiment. A "pipeline" refers to a hardware implementation that divides the H.265 encoding process into multiple steps and executes them in parallel on multiple corresponding hardware processing units to speed up processing. A "pipelining step" refers to a particular step on a pipeline; a "pipeline stage" refers to a particular stage within a pipelining step. In other words, a pipeline may include one or more pipelining steps, and a pipelining step may include one or more pipeline stages. When a pipelining step includes only one pipeline stage, the step and the stage may be treated as identical.
In some embodiments, a particular hardware module may support the execution of one or more pipelined steps. That is, all pipeline stages in the pipeline steps are responsible for operation by the hardware module (or by the sub-modules included therein). In other embodiments, a particular hardware module may support at least one pipeline stage of operation. If a pipeline step has multiple pipeline stages, the hardware module is only responsible for the operation of a specific pipeline stage or stages in the pipeline step. In other words, the pipeline steps may be implemented by a plurality of hardware modules, each of which is responsible for running a corresponding pipeline stage in a corresponding pipeline step.
The apparatus comprises a plurality of modules and a plurality of pipeline steps, each pipeline step comprising at least one pipeline stage for executing at least one module, wherein:
the plurality of modules comprise a preprocessing module 120, a rough selection module 130, a precise comparison module 140 and an overall control module 920, wherein the overall control module 920 is respectively connected with the preprocessing module 120, the rough selection module 130 and the precise comparison module 140;
the plurality of pipelining steps including a pre-processing pipelining step, a coarse selection pipelining step, and a precise comparison pipelining step, the coarse selection pipelining step being performed after the pre-processing pipelining step, the precise comparison pipelining step being performed after the coarse selection pipelining step;
the preprocessing pipeline step divides a current frame 102 in an original video 100 into a plurality of CTU blocks (Coding Tree units) through a preprocessing module 120. The CTU is a sub-block in the current frame picture, and may have any one of a 16x16 sub-block, a 32x32 sub-block, and a 64x64 sub-block. Specifically, the preprocessing module can obtain an original image frame 101 in an original video 100 and select a current frame 102 from the original image frame 101.
The coarse selection pipeline step divides each CTU block according to a plurality of division modes through the coarse selection module 130, performs inter-frame prediction coarse selection and intra-frame prediction coarse selection for each division mode of each CTU block, and generates prediction information corresponding to each division mode.
As shown in fig. 2, in this embodiment, the coarse selection module comprises an inter-frame prediction coarse selection module and an intra-frame prediction coarse selection module; the coarse selection pipeline step comprises an inter-frame prediction coarse selection pipeline stage and an intra-frame prediction coarse selection pipeline stage.
The inter-frame prediction coarse selection pipeline stage divides each CTU block according to a plurality of division modes through the inter-frame prediction coarse selection module: each division mode divides a CTU block into a plurality of corresponding CU blocks (Coding Units) and divides each CU block into one or more corresponding PU blocks (Prediction Units). The stage performs inter-frame prediction on each division mode of each CTU block to acquire reference frame information, and generates prediction information corresponding to each division mode. The division modes are selected according to actual needs; for example, a current CTU 121 of size 64x64 can be divided into four 32x32 sub-blocks, and each 32x32 sub-block may in turn be divided into four 16x16 sub-blocks.
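The quadtree division described above (a 64x64 CTU split into 32x32 sub-blocks, each of which may split again) can be sketched as a recursive enumeration of candidate CU blocks. This is an illustrative sketch only; the function name and the 8x8 minimum size are assumptions, and the real hardware evaluates PU shapes within each CU as well.

```python
def quadtree_blocks(x, y, size, min_size=8):
    """Recursively enumerate every candidate CU block (x, y, size) in the
    full quadtree below a CTU: the block itself plus, if it is still larger
    than min_size, the four half-size quadrants it can be split into."""
    blocks = [(x, y, size)]
    if size > min_size:
        half = size // 2
        for dy in (0, half):
            for dx in (0, half):
                blocks += quadtree_blocks(x + dx, y + dy, half, min_size)
    return blocks
```

For a 64x64 CTU with an 8x8 minimum, this enumerates 1 + 4 + 16 + 64 = 85 candidate CU blocks across the four levels.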
The intra-frame prediction coarse selection pipeline stage, through the intra-frame prediction coarse selection module: performs intra-frame prediction on each PU block in each division mode, calculates the corresponding cost, selects according to the cost one or more intra-frame prediction directions with relatively low cost for each PU block, and takes the selected intra-frame prediction directions as the prediction information corresponding to the division mode. Each PU block has its own corresponding motion vector, which is used to obtain prediction information from a reconstructed reference frame; specifically, the prediction information may be obtained according to the motion vector corresponding to the PU block, taking the position of the current PU block as the starting point.
In the precise comparison pipeline step, the precise comparison module 140 performs cost calculation and comparison on the prediction information corresponding to each division mode of each CTU block, selects for each CTU block the division mode with the minimum cost and the coding information corresponding to that division mode, and generates, according to the selected division mode and its coding information, entropy coding information for generating an H.265 code stream for the current frame and reconstruction information for generating a reconstructed frame for the current frame. In this way, search precision is improved through a distributed search mode, details of the reconstructed image are better preserved, and hardware resource consumption is reduced.
The overall control module is used for controlling the storage and acquisition of original frame data and reference frame data, and for controlling the preprocessing module, the coarse selection module, and the precise comparison module to sequentially execute their corresponding pipeline steps. Preferably, the coarse selection pipeline step is performed after the preprocessing pipeline step, and the precise comparison pipeline step is performed after the coarse selection pipeline step. In short, while the coarse selection module executes the coarse selection pipeline step for the current frame, the preprocessing module can perform the preprocessing pipeline step for the frame following the current frame; while the precise comparison module executes the precise comparison pipeline step for the current frame, the coarse selection module can perform the coarse selection pipeline step for the next frame. Pipeline operation is thereby realized, effectively improving coding efficiency.
In some embodiments, the coarse selection module 130 further comprises the inter-frame prediction coarse selection module 230, and the precise comparison module 140 further comprises the intra-frame prediction coarse selection module 330; the coarse selection pipeline step comprises an inter-frame prediction coarse selection pipeline stage, and the precise comparison pipeline step comprises an intra-frame prediction coarse selection pipeline stage.
The inter-frame prediction coarse selection pipeline stage divides each CTU block according to a plurality of division modes through the inter-frame prediction coarse selection module: each division mode divides a CTU block into a plurality of corresponding CU blocks and each CU block into one or more corresponding PU blocks; inter-frame prediction is performed on each division mode of each CTU block, reference frame information is obtained, and prediction information corresponding to each division mode is generated;
the intra-frame prediction coarse selection pipeline stage, through the intra-frame prediction coarse selection module: performs intra-frame prediction on each PU block in each division mode, calculates the corresponding cost, selects according to the cost one or more intra-frame prediction directions with relatively low cost for each PU block, and takes the selected intra-frame prediction directions as the prediction information corresponding to the division mode.
In short, in practical application, the intra-frame prediction coarse selection module 330 may be attached either to the coarse selection module 130 or to the precise comparison module 140, which widens the application scenarios of the present device.
In some embodiments, the inter-frame prediction coarse selection module 230 comprises: a coarse search module 211, a reference frame data loading module 910, a fine search module 213, and a fractional pixel search module 215. The coarse selection pipeline step comprises: a coarse search pipeline stage, a reference frame data loading pipeline stage, a fine search pipeline stage, and a fractional pixel search pipeline stage;
the coarse search pipeline stage, through the coarse search module: selects a frame from a reference list, selects the original frame or the reconstructed frame of that frame as a reference frame, performs a down-sampling operation on the reference frame and the current CTU block, finds in the down-sampled reference frame the pixel position whose cost compared with the down-sampled CTU block is minimum, and calculates a coarse search vector of that pixel position relative to the current CTU block;
the reference frame data loading pipeline stage, through the reference frame data loading module: obtains the coarse search vector of the coarse search pipeline stage through the overall control module, obtains one or more predicted motion vectors serving the same function as the coarse search vector from the motion vectors around the CTU block, loads reference frame data according to the coarse search vector and the one or more predicted motion vectors, and transmits the reference frame data to the fine search pipeline stage through the overall control module;
the fine search pipeline stage, through the fine search module: sets, for each PU block, a fine search area in the reconstructed image of the reference frame according to the coarse search vector, and generates within the fine search area the fine search vector with the minimum cost corresponding to the PU block; it also generates one or more predicted motion vectors serving the same function as the coarse search vector from the motion vector information around the current CTU block, and generates fine search vectors from those predicted motion vectors; all generated fine search vectors are sent to the fractional pixel search module;
the fractional pixel search pipeline stage, through the fractional pixel search module: sets, for each PU block and each received fine search vector, a corresponding fractional pixel search area in the reference frame, and generates within that area the fractional pixel search vector with the minimum cost corresponding to the PU block. Preferably, the intra-frame prediction coarse selection pipeline stage and the fractional pixel search pipeline stage are the same pipeline stage, and the intra-frame prediction coarse selection module and the fractional pixel search module are executed in parallel in that stage.
The reference list is a list for storing reference frames; the current frame may have multiple reference frames, all indexed through the reference list. One reference frame includes a reconstructed frame and an original frame. Since the reference frame and the current CTU block are down-sampled, the coarse search vector calculated by the coarse search module is expressed in down-sampled coordinates; it therefore needs to be scaled back by the down-sampling ratio (for example, multiplied by 4 when the image was down-sampled to 1/4 of its original size in each dimension) before the scaled coarse search vector is transmitted to the next processing module.
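The coordinate conversion just mentioned amounts to one multiplication per vector component. A minimal sketch, with an assumed function name and a default ratio of 4 matching the 1/4 down-sampling example:

```python
def upscale_coarse_vector(mv_ds, ratio=4):
    """Convert a motion vector found in the 1/ratio down-sampled image
    back to full-resolution pixel units before handing it to fine search."""
    dx, dy = mv_ds
    return (dx * ratio, dy * ratio)
```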
In some embodiments, the intra-frame prediction coarse selection module comprises a reference pixel generation module executing in the intra-frame prediction coarse selection pipeline stage. The intra-frame prediction coarse selection pipeline stage comprises: for each PU block in each division mode, generating a reference pixel from original pixels of the current frame, predicting all intra-frame prediction directions from the reference pixel according to the rules of the H.265 protocol to obtain a prediction result for each direction, calculating a distortion cost between each direction's prediction result and the original pixels, and sorting the costs from small to large to select one or more intra-frame prediction directions with lower cost.
In some embodiments, the intra-frame prediction coarse selection pipeline stage and the fractional pixel search pipeline stage are different pipeline stages, and the intra-frame prediction coarse selection module executes in a pipeline stage after the fractional pixel search module. The intra-frame prediction coarse selection module comprises a reference pixel generation module and executes in the intra-frame prediction coarse selection pipeline stage. The reference pixel generation module generates, for each PU block in each division mode, a reference pixel from reconstructed pixels of the current frame, predicts all intra-frame prediction directions from the reference pixel according to the rules of the H.265 protocol to obtain a prediction result for each direction, calculates a distortion cost between each direction's prediction result and the original pixels, and sorts the costs from small to large to select one or more intra-frame prediction directions with lower cost.
As shown in fig. 3, the coarse search module selects one of the original frame and the reconstructed frame as a reference frame, down-samples the reference frame and the current CTU, and finds in the down-sampled reference frame the pixel position and coarse search vector with the minimum cost compared with the down-sampled CTU. Preferably, in this embodiment, the reference frame and the current CTU are down-sampled at the same scale. For example, when the length and width of the reference frame are each scaled to 1/4 in the down-sampled image 320 obtained by down-sampling 311 the reference frame 310, the length and width of the current CTU 330 are each scaled to 1/4 in the down-sampled CTU obtained by down-sampling 331 the current CTU 330. Then, taking the down-sampled CTU 340 (sub-block B in fig. 3) as the unit of prediction in the down-sampled image (sub-block A in fig. 3), the cost of each candidate sub-block in the down-sampled image 320 (a sub-block of the same size as sub-block B, centered on each pixel point of sub-block A) is calculated in turn against the down-sampled CTU 340; the pixel block with the minimum cost compared with the down-sampled CTU is found and recorded as the minimum-cost pixel block 352 (sub-block C in fig. 3); and the center pixel position of that minimum-cost pixel block and the coarse search vector are recorded, where the coarse search vector is the vector displacement (i.e., motion vector 351 in fig. 3) between the center pixel of the down-sampled CTU 340 (sub-block B) and the center pixel position of the minimum-cost pixel block 352 (sub-block C).
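The coarse search just described can be sketched as an exhaustive SAD search in the down-sampled image. This is an illustrative sketch only: the function names are assumptions, the down-sampler here simply keeps every fourth pixel (real hardware may filter before decimating), and the sketch returns the top-left corner of the best match rather than the center-to-center displacement used in fig. 3.

```python
def downsample(img, ratio=4):
    """Keep every ratio-th pixel in each dimension (a crude stand-in for
    the down-sampling 311/331 in fig. 3)."""
    return [row[::ratio] for row in img[::ratio]]

def coarse_search(ref_ds, ctu_ds):
    """Slide the down-sampled CTU over the down-sampled reference image and
    return the top-left (x, y) of the minimum-SAD block; the coarse search
    vector follows by subtracting the CTU's own position in the frame."""
    bh, bw = len(ctu_ds), len(ctu_ds[0])
    best = None
    for y in range(len(ref_ds) - bh + 1):
        for x in range(len(ref_ds[0]) - bw + 1):
            cost = sum(abs(ref_ds[y + j][x + i] - ctu_ds[j][i])
                       for j in range(bh) for i in range(bw))
            if best is None or cost < best[0]:
                best = (cost, x, y)
    return best[1], best[2]
```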
As shown in fig. 13, for a current CTU block of 64x64 size there are ten 8x8 sub-blocks located above it (the sub-blocks labeled 1-10 in fig. 13), together with the CTU block adjacent at its upper left and the CTU block adjacent at its upper right, each with a coarse search result and corresponding motion vector information. Further, there are 16 motion vectors inside the current CTU block, so there are at most 28 MVs serving as neighboring MVs (i.e., motion vector information around the current CTU block). These 28 pieces of motion vector information are filtered to select a preset number (e.g., 3) of neighboring MVs, which are then transmitted to the fine search module to determine the same preset number of fine search motion vectors. In this embodiment, "the same function" means that the selected neighboring MVs are handled consistently with the search result obtained by the coarse search module, i.e., all of them are input to the interface of the fine search module for the next stage of processing.
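The passage does not specify the exact filtering rule used to reduce the up-to-28 neighboring MVs to the preset number. As one assumed illustration, a hardware-friendly rule is to deduplicate the candidates in priority order and keep the first few:

```python
def screen_neighbor_mvs(mvs, keep=3):
    """Deduplicate candidate neighboring MVs (given in priority order) and
    keep at most `keep` of them for the fine search module. The rule is an
    assumption; the patent only states that a preset number is selected."""
    seen, out = set(), []
    for mv in mvs:
        if mv not in seen:
            seen.add(mv)
            out.append(mv)
            if len(out) == keep:
                break
    return out
```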
In this embodiment, the coarse search module inputs one motion vector to the fine search module, and several MVs selected from the neighboring MVs are also input to the fine search module. If N MVs are input to the fine search module in total, the fine search module generates N fine search MVs (i.e., fine search vectors) and inputs them to the FME (i.e., the fractional pixel search module). The FME compares the costs derived from the N fine search MVs to obtain an optimal fme_mv (i.e., a fractional pixel search vector), which is finally input to the precise comparison module.
As shown in fig. 5, to further improve search accuracy, the fractional pixel search module 215 is configured to set, for each PU block and each received fine search vector, a corresponding fractional pixel search area 530 in the reference frame, and to generate within that area the fractional pixel search vector 523 with the minimum cost corresponding to the PU block. Specifically, the fractional pixel search area 530 may be determined as follows: according to the current PU position 520 and the previously obtained fine search motion vector, an initial search position 531 corresponding to the current PU position 520 is determined in the reference frame 510; K pixels (the value of K can be set according to actual needs) are extended in each of the four directions (up, down, left, right) around the initial search position pixel, yielding a square region with side length 2K as the fractional pixel search area 530. Similar to the fine search, the costs of the sub-blocks of the same size as the current PU centered on each pixel of the fractional pixel search area 530 are calculated in turn, starting from the pixel of the initial search position 531; the minimum-cost position 533 is found, and the motion vector between the current PU position and the minimum-cost position 533 is calculated and recorded as the fractional pixel search motion vector 523.
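The window construction above (K pixels in each of the four directions around the start pixel) can be sketched as follows. This is an illustrative sketch with assumed names; it only enumerates candidate centers on the integer grid, whereas a real H.265 fractional pixel search also interpolates half- and quarter-pixel samples between them.

```python
def fractional_search_window(start_x, start_y, k):
    """Enumerate the candidate center positions of the square search area
    obtained by extending k pixels up, down, left, and right around the
    initial search position (start_x, start_y)."""
    return [(start_x + dx, start_y + dy)
            for dy in range(-k, k + 1)
            for dx in range(-k, k + 1)]
```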
Fig. 7 is a schematic diagram of a precise comparison module in an h.265 encoding device according to an embodiment of the present invention. In some embodiments, the exact compare module 140 includes a distribution module 711, a plurality of single-level computation modules (e.g., 721, 722, 723, and 724), and a plurality of hierarchical compare modules 740. The distribution module 711 is connected with the coarse selection module 130 and connected with a plurality of single-stage calculation modules; each of the single level computing modules is connected to a corresponding hierarchical comparison module 740. Wherein:
the distribution module 711 is configured to distribute, according to different partition modes of each CTU block, different prediction information corresponding to the CU block in each partition mode to different single-level calculation modules;
the single-level calculation module is configured to calculate multiple pieces of cost information according to the prediction information corresponding to the CU blocks received from the distribution module 711, perform intra-layer comparison, and select the prediction mode and division mode with the minimum cost corresponding to the CU block;
the hierarchical comparison module 740 is configured to compare the cost information calculated by the single-level calculation modules at different layers, and select for the CTU block the division mode with the minimum cost and the corresponding coding information.
In certain embodiments, the precise comparison module 140 of fig. 7 includes four single-level calculation modules 721, 722, 723, and 724, each of which may be implemented as the single-level calculation module 810 of fig. 8. As shown in fig. 8, the single-level calculation module 810 includes an inter-frame mode cost calculation module 820, an intra-frame mode cost calculation module 830, and a preference module 840. For each input CU, the single-level calculation module 810 calculates an inter-frame cost through the inter-frame mode cost calculation module 820 and an intra-frame cost through the intra-frame mode cost calculation module 830, compares the two through the preference module 840, and determines the division mode and prediction mode with the minimum overall cost for the current input CU.
Returning to the embodiment of fig. 7, each single-level calculation module 721, 722, 723, and 724 processes CU blocks of a particular level. For example, module 721 may be set as a first-level calculation module for 64x64 CU blocks; module 722 as a second-level calculation module for 32x32 CU blocks; module 723 as a third-level calculation module for 16x16 CU blocks; and module 724 as a fourth-level calculation module for 8x8 CU blocks. Suppose the precise comparison module 140 receives from the coarse selection module 130 one CTU together with its corresponding division modes, prediction information, and a plurality of inter-frame motion vectors and reference information. The distribution module 711 then distributes the CUs of the various division modes to the calculation modules 721-724 of the different levels according to CU size.
In some embodiments, the intra-frame mode cost calculation module 830 of each single-level calculation module receives one or more pieces of intra-frame prediction information related to a CU of a certain level, and calculates and selects an intra-frame cost. At the same time, in parallel, the inter-frame mode cost calculation module 820 of each single-level calculation module receives one or more inter-frame motion vectors and reference information related to a CU of that level, and calculates and selects an inter-frame cost. The preference module 840 of each single-level calculation module then selects the minimum of the calculated intra-frame and inter-frame costs. In other words, when the minimum cost is the intra-frame cost, it is better to use the related intra-frame prediction information for H.265 encoding; when the minimum cost is the inter-frame cost, it is better to use the related inter-frame motion vector and reference information.
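The preference step of the single-level calculation module reduces to picking the cheaper of the two candidate costs. A minimal sketch with an assumed function name and return convention:

```python
def prefer_mode(inter_cost, intra_cost):
    """Mimic the preference module for one CU: return the cheaper of the
    inter-frame and intra-frame candidates as ('inter'|'intra', cost).
    Ties go to inter here; the tie-break rule is an assumption."""
    if inter_cost <= intra_cost:
        return ('inter', inter_cost)
    return ('intra', intra_cost)
```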
For example, the hierarchical comparison module 743 may compare the sum of the minimum costs of the four 8x8 blocks calculated by the fourth-level calculation module 724 with the minimum cost of one 16x16 block calculated by the third-level calculation module 723, and keep the division with the smaller cost. In this comparison, the four 8x8 blocks (denoted A, B, C, D for illustration) may all be minimum-cost blocks obtained by inter-frame comparison, may all be minimum-cost blocks obtained by intra-frame comparison, or may contain both. For example, block A may be obtained by inter-frame prediction while blocks B, C, D are obtained by intra-frame prediction; or blocks A and C may be inter-frame while blocks B and D are intra-frame.
Similarly, the hierarchical comparison module 742 may combine the four minimum-cost 16x16 blocks obtained from the hierarchical comparison module 743 and compare them with one minimum-cost 32x32 block calculated by the second-level calculation module 722. Specifically, each of the four 16x16 blocks (denoted E, F, G, H) selected by the hierarchical comparison module 742 may be a complete 16x16 CU block or may be composed of several 8x8 blocks. For example, block E may be a 16x16 CU block obtained by inter-frame prediction; block F may be a 16x16 CU block obtained by intra-frame prediction; block G may be a 16x16 combined block consisting of four 8x8 blocks obtained by inter-frame and intra-frame prediction.
Similarly, the hierarchical comparison module 741 may combine the four minimum-cost 32x32 blocks from the hierarchical comparison module 742 and compare them with one minimum-cost 64x64 block from the first-level calculation module 721. Specifically, each of the four 32x32 blocks (denoted I, J, K, L) selected by the hierarchical comparison module 741 may be a complete 32x32 CU block, or a combined block composed of several 16x16 blocks, each of which may in turn be composed of several 8x8 blocks. For example, block I may be a 32x32 CU block obtained by inter-frame prediction; block J may consist of four 16x16 CU blocks including both inter-frame and intra-frame results; and one or more of the 16x16 blocks within block K may each be composed of several 8x8 blocks.
In the above manner, the hierarchical comparison modules 740 find the combination of CTU, CU, and PU blocks with the minimum cost, and select for the CTU block the minimum-cost division mode and the corresponding coding information.
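The level-by-level comparison described above (sum of four sub-block costs versus the undivided parent's cost, repeated up the quadtree) can be sketched recursively. This is an illustrative sketch only: the tree representation and function name are assumptions, and real encoders add split-flag signaling cost to the comparison.

```python
def best_partition(node):
    """node: {'cost': c} for a leaf, or {'cost': c, 'children': [4 nodes]}.
    At each level, keep the cheaper of the undivided block and the sum of
    the best costs of its four quadrants. Returns (min_cost, split?)."""
    own = node['cost']
    kids = node.get('children')
    if not kids:
        return own, False
    split_cost = sum(best_partition(k)[0] for k in kids)
    if split_cost < own:
        return split_cost, True
    return own, False
```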
In some embodiments, the intra-frame prediction coarse selection module 330 includes a reference pixel generation module 231, and the intra-frame prediction coarse selection module 330 executes in the intra-frame prediction coarse selection pipeline stage.
The reference pixel generation module 231 is configured to generate, for each PU block in each partition mode, reference pixels from the original pixels of the current frame, predict all intra-frame prediction directions from the reference pixels according to the rules of the H.265 protocol to obtain the prediction result of each direction, calculate the distortion cost of each direction's prediction result against the original pixels, and sort the costs from small to large so as to select one or more intra-frame prediction directions with lower cost.
The method used by the intra-frame prediction coarse selection module is similar to that of the inter-frame prediction coarse selection module and is not repeated here. The difference between the two is that for intra-frame prediction, the original frame is down-sampled and the down-sampled CTU is predicted within the down-sampled image of the original frame, whereas for inter-frame prediction, the reference frame is down-sampled and the down-sampled CTU is predicted within the down-sampled image of the reference frame.
As shown in fig. 6-A and 6-B, according to the H.265 protocol the reference pixels should be reconstructed pixels, but in a hardware implementation only the original pixels are available at the current time point and the reconstructed pixels are often not yet ready, so the present invention replaces the reconstructed pixels with original pixels. Taking a PU sub-block of size 4x4 as an example, the black-filled dots in the figure are the boundary pixels; according to the H.265 protocol, the boundary pixels of the 4x4 block (shown in fig. 6-B) number 17 in total. These boundary pixels should be filled with reconstructed pixels, but since reconstructed pixels cannot be obtained at the current time point, they are replaced with original pixels. The shaded portion is the 4x4 PU block. After the boundary pixel filling is completed, prediction is performed according to the protocol to obtain the 4x4 block shown shaded in the figure.
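The boundary-pixel filling for a 4x4 PU can be sketched as below. This is a deliberate simplification: it gathers the 17 neighbouring samples (one top-left corner, eight above/above-right, eight left/below-left) from original pixels, as the text describes, and uses plain edge replication rather than the full H.265 reference-sample substitution rules; the function name and clamping behaviour are illustrative assumptions.

```python
import numpy as np

def reference_pixels_4x4(frame, x, y):
    """Collect the 17 boundary pixels of a 4x4 PU whose top-left corner
    is at (x, y): 1 corner sample, 8 samples above/above-right, and
    8 samples left/below-left, clamped at the frame edges."""
    h, w = frame.shape

    def px(r, c):  # sample with edge replication at the picture border
        return int(frame[min(max(r, 0), h - 1), min(max(c, 0), w - 1)])

    corner = px(y - 1, x - 1)
    top = [px(y - 1, x + i) for i in range(8)]    # above and above-right
    left = [px(y + i, x - 1) for i in range(8)]   # left and below-left
    return [corner] + top + left

frame = np.arange(64, dtype=np.uint8).reshape(8, 8)  # toy "original" frame
refs = reference_pixels_4x4(frame, 4, 4)
print(len(refs))  # 17 boundary samples, matching the count cited above
```

In a full implementation each of the 35 intra directions would then be predicted from these 17 samples.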
As shown in fig. 4, the fine search module sets a fine search area in the reference frame for each PU according to the coarse search vector, and finds the minimum-cost fine search vector corresponding to the PU within that area. The fine search step is performed within the reference frame 410; each current CTU contains a plurality of PUs, and the fine search selects one of them as the current PU in a certain order. Specifically, the current PU position 420 is first determined, and a fine search area 430 is then set in the reference frame for the PU according to the previously obtained coarse search vector (also referred to as the restored motion vector 421). A start search position 431 corresponding to the current PU position 420 is then determined within the fine search area 430 based on the restored motion vector 421. Similar to the coarse search, within the fine search area 430 and centered on the pixel at the start search position 431, the cost of a candidate block of the same size as the current PU is calculated at each pixel position in turn, the minimum cost position 433 is found, and the motion vector between the current PU position 420 and the minimum cost position 433 is calculated and recorded as the fine search motion vector 423.
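As a rough illustration of the search loop just described, the sketch below exhaustively scores every integer-pixel candidate within a small window around the start search position and returns the lowest-SAD motion vector. The SAD cost, the window radius, and the data are illustrative assumptions; the hardware module additionally accounts for rate cost and predicted motion vectors.

```python
import numpy as np

def fine_search(ref, cur_block, start_y, start_x, radius=2):
    """Return (min_sad, dy, dx): the cheapest displacement, in pixels,
    of a cur_block-sized candidate within `radius` of the start position."""
    bh, bw = cur_block.shape
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = start_y + dy, start_x + dx
            if y < 0 or x < 0 or y + bh > ref.shape[0] or x + bw > ref.shape[1]:
                continue  # candidate block would fall outside the frame
            cand = ref[y:y + bh, x:x + bw].astype(int)
            sad = int(np.abs(cand - cur_block.astype(int)).sum())
            if best is None or sad < best[0]:
                best = (sad, dy, dx)
    return best

rng = np.random.default_rng(1)
ref = rng.integers(0, 255, (16, 16), dtype=np.uint8)
cur = ref[6:10, 7:11].copy()          # the "current PU" content
print(fine_search(ref, cur, 5, 6))    # recovers the (dy=1, dx=1) displacement
```

The winning (dy, dx) corresponds to the fine search motion vector 423 between positions 420 and 433.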
In certain embodiments, the apparatus further comprises a post-processing module 180 connected to the precise comparison module 140. The post-processing module 180 executes the post-processing pipeline step, which includes: generating a reconstructed frame corresponding to the current frame according to the minimum-cost partition mode of each CTU block output by the precise comparison module and the reconstruction information corresponding to that partition mode.
Preferably, the post-processing module 180 includes a deblocking filtering module 160 and a sample adaptive offset module 170, and the post-processing pipeline step comprises a deblocking filtering pipeline step and a sample adaptive offset pipeline step. The deblocking filtering module executes in the deblocking filtering pipeline step, and the sample adaptive offset module executes in the sample adaptive offset pipeline step. The deblocking filtering pipeline step comprises: filtering the reconstructed frame using the minimum-cost partition mode and its corresponding reconstruction information provided by the precise comparison module. The sample adaptive offset pipeline step comprises: performing SAO calculation on the filtered reconstructed frame to obtain the final reconstructed frame for reference and display. The deblocking filtering pipeline step and the sample adaptive offset pipeline step are executed serially, in that order, within the post-processing pipeline stage.
In certain embodiments, the apparatus further comprises an entropy encoding module 150 connected to the precise comparison module 140. The entropy encoding module 150 executes the entropy encoding pipeline step, which includes: generating the H.265 code stream corresponding to the current frame according to the minimum-cost partition mode of each CTU block output by the precise comparison module 140 and the entropy coding information of the current frame generated from the corresponding coding information. The entropy encoding pipeline step and the post-processing pipeline step are executed in parallel in the same pipeline stage.
Specifically, the precise comparison module 140 generates the data required for entropy coding of the CTU, i.e., the coding information 141 shown in fig. 1, according to the minimum-cost partition mode and prediction mode of the CTU, and the entropy encoding module 150 is configured to generate the coded bitstream 190 corresponding to the original video from that data. Meanwhile, the image encoding device 110 may also output the encoded video 180, in which a given image frame is the reconstructed image frame 145.
As shown in fig. 9, the inventors further provide an H.265 encoding method applied to an H.265 encoding device, the device comprising a plurality of modules and a plurality of pipeline steps, each pipeline step comprising at least one pipeline stage for executing at least one module, wherein:
the plurality of modules comprise a preprocessing module, a coarse selection module, a precise comparison module, and an integral control module, the integral control module being connected to the preprocessing module, the coarse selection module, and the precise comparison module respectively;
the plurality of pipeline steps comprise a preprocessing pipeline step, a coarse selection pipeline step, and a precise comparison pipeline step, the coarse selection pipeline step being performed after the preprocessing pipeline step, and the precise comparison pipeline step being performed after the coarse selection pipeline step;
the method comprises the following steps:
first, the method enters step S101, the preprocessing pipeline step, in which the current frame of the original video is divided into a plurality of CTU blocks by the preprocessing module;
next, the method enters step S102, the coarse selection pipeline step, in which each CTU block is divided according to a plurality of partition modes by the coarse selection module, inter-frame prediction coarse selection and intra-frame prediction coarse selection are performed on each partition mode of each CTU block, and the prediction information corresponding to each partition mode is generated;
then, the method enters step S103, the precise comparison pipeline step, in which cost calculation and comparison are performed on the prediction information corresponding to each partition mode of each CTU block by the precise comparison module, the minimum-cost partition mode of each CTU block and its corresponding coding information are selected, and, according to the selected partition mode and coding information, entropy coding information for generating the H.265 code stream of the current frame and reconstruction information for generating the reconstructed frame of the current frame are generated,
and the integral control module is used for controlling the storage and retrieval of the original frame data and the reference frame data, and for controlling the preprocessing module, the coarse selection module, and the precise comparison module to execute their corresponding pipeline steps in sequence.
As shown in fig. 10, the coarse selection module includes an inter-frame prediction coarse selection module and an intra-frame prediction coarse selection module; the coarse selection pipeline step comprises an inter-frame prediction coarse selection pipeline stage and an intra-frame prediction coarse selection pipeline stage;
the method further comprises the following steps:
in the inter-frame prediction coarse selection pipeline stage, each CTU block is divided according to a plurality of partition modes by the inter-frame prediction coarse selection module, where each partition mode divides one CTU block into a corresponding plurality of CU blocks and divides each CU block into one or more corresponding PU blocks; inter-frame prediction is performed on each partition mode of each CTU block to obtain reference frame information, intra-frame prediction is performed on each partition mode of each CTU block, and the prediction information corresponding to each partition mode is generated;
in the intra-frame prediction coarse selection pipeline stage, the intra-frame prediction coarse selection module performs intra-frame prediction on each PU block in each partition mode, calculates the corresponding costs, selects for each PU block one or more intra-frame prediction directions with lower cost according to those costs, and takes the selected intra-frame prediction directions as the prediction information corresponding to the partition mode.
In some embodiments, the coarse selection module further comprises the inter-frame prediction coarse selection module, while the precise comparison module comprises the intra-frame prediction coarse selection module; the coarse selection pipeline step then comprises the inter-frame prediction coarse selection pipeline stage, and the precise comparison pipeline step comprises the intra-frame prediction coarse selection pipeline stage.
The method comprises the following steps:
in the inter-frame prediction coarse selection pipeline stage, each CTU block is divided according to a plurality of partition modes by the inter-frame prediction coarse selection module, where each partition mode divides one CTU block into a corresponding plurality of CU blocks and divides each CU block into one or more corresponding PU blocks; inter-frame prediction is performed on each partition mode of each CTU block to obtain reference frame information, intra-frame prediction is performed on each partition mode of each CTU block, and the prediction information corresponding to each partition mode is generated;
in the intra-frame prediction coarse selection pipeline stage, the intra-frame prediction coarse selection module performs intra-frame prediction on each PU block in each partition mode, calculates the corresponding costs, selects for each PU block one or more intra-frame prediction directions with lower cost according to those costs, and takes the selected intra-frame prediction directions as the prediction information corresponding to the partition mode.
In short, the intra-frame prediction coarse selection module may be a part of either the coarse selection module or the precise comparison module, which effectively broadens the application scenarios of the present invention.
In some embodiments, the inter-frame prediction coarse selection module comprises a coarse search module, a reference frame data loading module, a fine search module, and a fractional pixel search module;
the coarse selection pipeline step comprises a coarse search pipeline stage, a reference frame data loading pipeline stage, a fine search pipeline stage, and a fractional pixel search pipeline stage;
the method comprises the following steps:
as shown in fig. 10, in the coarse search pipeline stage, the coarse search module first enters step S201, in which it selects a frame from the reference list and takes either the original frame or the reconstructed frame of that frame as the reference frame; it then enters step S202 to perform a down-sampling operation on the reference frame and the current CTU block; it then enters step S203 to find, in the down-sampled reference frame, the pixel position with the minimum cost relative to the down-sampled CTU block, and to calculate the coarse search vector of that pixel position relative to the current CTU block.
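Steps S202 and S203 can be sketched as follows, assuming (as one plausible choice, since the text does not fix the filter) that down-sampling averages 2x2 neighbourhoods and that the coarse cost is plain SAD over a full search of the down-sampled reference; all data here are illustrative.

```python
import numpy as np

def downsample2(img):
    """Halve resolution by averaging each 2x2 neighbourhood."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def coarse_search(ref, ctu, ctu_y, ctu_x):
    """Full search in the down-sampled reference for the down-sampled CTU;
    the winning vector is scaled back to full-resolution pixel units."""
    dref, dctu = downsample2(ref), downsample2(ctu)
    bh, bw = dctu.shape
    best = None
    for y in range(dref.shape[0] - bh + 1):
        for x in range(dref.shape[1] - bw + 1):
            sad = np.abs(dref[y:y + bh, x:x + bw] - dctu).sum()
            if best is None or sad < best[0]:
                best = (sad, (y - ctu_y // 2) * 2, (x - ctu_x // 2) * 2)
    return best[1], best[2]  # coarse search vector (dy, dx)

rng = np.random.default_rng(2)
frame = rng.random((16, 16))
ctu = frame[4:12, 6:14].copy()          # CTU content sits at offset (4, 6)
print(coarse_search(frame, ctu, 0, 0))  # recovers the (4, 6) displacement
```

Because the search runs at quarter the pixel count in each dimension, it is cheap enough to cover the whole reference frame, which is the point of the coarse stage.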
In the reference frame data loading pipeline stage, the reference frame data loading module obtains the coarse search vector of the coarse search pipeline stage through the integral control module, obtains one or more predicted motion vectors, which serve the same role as the coarse search vector, from the motion vectors around the CTU block, loads reference frame data according to the coarse search vector and the one or more predicted motion vectors, and transmits the reference frame data to the fine search pipeline stage through the integral control module;
as shown in fig. 11, in the fine search pipeline stage, the fine search module first enters step S301, in which a fine search area is set in the reconstructed image of the reference frame for each PU block according to the coarse search vector; it then enters step S302 to generate the minimum-cost fine search vector corresponding to the PU block within the fine search area. The module also generates one or more predicted motion vectors, which serve the same role as the coarse search vector, from the motion vector information around the current CTU block, and generates fine search vectors from these predicted motion vectors. All generated fine search vectors are sent to the fractional pixel search module;
as shown in fig. 12, in the fractional pixel search pipeline stage, the fractional pixel search module first enters step S401, in which it sets a corresponding fractional pixel search area in the reference frame for each PU block according to each received fine search vector; it then enters step S402 to generate the minimum-cost fractional pixel search vector corresponding to the PU block within the fractional pixel search area.
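The fractional (sub-pixel) refinement can be illustrated with a half-pel search around the integer-pel best position. Real H.265 interpolation uses 7- and 8-tap filters and also quarter-pel positions; the bilinear 2x upsampling below is a deliberate simplification, and all names and data are illustrative.

```python
import numpy as np

def upsample2_bilinear(img):
    """Insert half-pel samples between integer samples by bilinear
    interpolation (a stand-in for H.265's longer interpolation filters)."""
    h, w = img.shape
    out = np.zeros((2 * h - 1, 2 * w - 1))
    out[::2, ::2] = img
    out[1::2, ::2] = (img[:-1, :] + img[1:, :]) / 2
    out[::2, 1::2] = (img[:, :-1] + img[:, 1:]) / 2
    out[1::2, 1::2] = (img[:-1, :-1] + img[:-1, 1:]
                       + img[1:, :-1] + img[1:, 1:]) / 4
    return out

def fractional_search(ref, cur, int_y, int_x):
    """Score the 3x3 half-pel neighbourhood around the integer-pel best
    position (int_y, int_x); return the best (hy, hx) in half-pel units."""
    up = upsample2_bilinear(ref)
    bh, bw = cur.shape
    best = None
    for hy in range(-1, 2):
        for hx in range(-1, 2):
            y, x = 2 * int_y + hy, 2 * int_x + hx
            cand = up[y:y + 2 * bh - 1:2, x:x + 2 * bw - 1:2]
            sad = np.abs(cand - cur).sum()
            if best is None or sad < best[0]:
                best = (sad, hy, hx)
    return best[1], best[2]

rng = np.random.default_rng(3)
ref = rng.random((12, 12))
half = (ref[3:7, 4:8] + ref[3:7, 5:9]) / 2   # content shifted half a pel right
print(fractional_search(ref, half, 3, 4))    # best match at half-pel (0, 1)
```

The returned half-pel offset is added to the integer fine search vector to form the final fractional pixel search vector.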
In some embodiments, the intra-frame prediction coarse selection pipeline stage and the fractional pixel search pipeline stage are the same pipeline stage, and the intra-frame prediction coarse selection module and the fractional pixel search module are executed in parallel in that stage. In short, the intra-frame prediction coarse selection pipeline stage and the fractional pixel search pipeline stage may be executed in parallel, i.e., synchronously, or sequentially, i.e., with the intra-frame prediction coarse selection pipeline stage executed first and the fractional pixel search pipeline stage executed afterwards.
In some embodiments, the intra-frame prediction coarse selection module comprises a reference pixel generation module and executes in the intra-frame prediction coarse selection pipeline stage; the method comprises the following steps:
the intra-frame prediction coarse selection pipeline stage comprises: for each PU block in each partition mode, generating reference pixels from the original pixels of the current frame, predicting all intra-frame prediction directions from the reference pixels according to the rules of the H.265 protocol to obtain the prediction result of each direction, calculating the distortion cost of each direction's prediction result against the original pixels, and sorting the costs from small to large so as to select one or more intra-frame prediction directions with lower cost.
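The sort-and-keep step at the end of that stage can be expressed in a few lines. The cost model below (a toy function of the mode index) and the number of kept candidates are purely illustrative; H.265 defines 35 intra modes (planar, DC, and 33 angular directions).

```python
def select_intra_directions(direction_costs, keep=3):
    """Sort per-direction distortion costs from small to large and keep
    the `keep` cheapest candidate directions for the precise stage."""
    ranked = sorted(direction_costs.items(), key=lambda kv: kv[1])
    return [direction for direction, _ in ranked[:keep]]

# Toy costs over the 35 H.265 intra modes (0 = planar, 1 = DC, 2..34 angular):
costs = {mode: abs(mode - 10) + 5 for mode in range(35)}
print(select_intra_directions(costs))  # the cheapest directions, around mode 10
```

Only the surviving directions are handed to the precise comparison stage, which keeps the exhaustive 35-direction evaluation out of the expensive rate-distortion loop.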
In certain embodiments, the plurality of modules further comprises a post-processing module and the plurality of pipeline steps further comprises a post-processing pipeline step, the method comprising: in the post-processing pipeline step, the post-processing module generates the reconstructed frame corresponding to the current frame according to the minimum-cost partition mode of each CTU block output by the precise comparison module and the reconstruction information corresponding to that partition mode. In other embodiments, the plurality of modules further comprises an entropy encoding module and the plurality of pipeline steps further comprises an entropy encoding pipeline step, the method comprising: in the entropy encoding pipeline step, the entropy encoding module generates a binary code stream conforming to the H.265 protocol specification according to the minimum-cost partition mode of each CTU block output by the precise comparison module and the corresponding entropy coding information.
Referring to fig. 14, the preprocessing module 120 belongs to the first pipeline stage and executes the preprocessing pipeline step. The coarse selection module executes the coarse selection pipeline step and includes a coarse search module 211, a reference frame data loading module 910, a fine search module 213, and a fractional pixel search module 215. Accordingly, the coarse selection pipeline step includes the coarse search pipeline stage (the second pipeline stage), the reference frame data loading pipeline stage (the third pipeline stage), the fine search pipeline stage (the fourth pipeline stage), and the fractional pixel search pipeline stage (the fifth pipeline stage). Preferably, the intra-frame prediction coarse selection module and the fractional pixel search module execute in parallel in the same pipeline stage (i.e., both execute in the fifth stage). The precise comparison module 140 executes the precise comparison pipeline step, which belongs to the sixth pipeline stage. The entropy encoding module 150 and the post-processing module execute in the entropy encoding pipeline stage and the post-processing pipeline stage respectively, and both execute in parallel in the seventh pipeline stage. The first through seventh pipeline stages realize data transmission, scheduling, and control through the integral control module 920, so that the encoding process proceeds in an orderly manner and the encoding efficiency is greatly improved.
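An idealized view of this seven-stage arrangement: if every stage takes one time step and the entropy-coding and post-processing steps share the final stage, consecutive work items overlap as sketched below. The stage names and the one-step-per-stage assumption are illustrative, not taken from the patent.

```python
def pipeline_schedule(num_items, stages):
    """Return, for each time step, a dict mapping active stage -> item
    index, for an ideal linear pipeline (one step per stage, no stalls)."""
    steps = []
    for t in range(num_items + len(stages) - 1):
        active = {name: t - s for s, name in enumerate(stages)
                  if 0 <= t - s < num_items}
        steps.append(active)
    return steps

stages = ["pre-process", "coarse search", "ref-frame load", "fine search",
          "frac search + intra coarse", "precise compare", "entropy + post"]
for step in pipeline_schedule(3, stages):
    print(step)  # several items are in flight at once after the fill phase
```

Once the pipeline is full, every stage is busy on a different item each step, which is where the throughput gain over a purely serial encoder comes from.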
The invention provides an H.265 encoding method and device. The device comprises a plurality of modules and a plurality of pipeline steps, each pipeline step comprising at least one pipeline stage for executing at least one module; the plurality of modules comprise a preprocessing module, a coarse selection module, a precise comparison module, and an integral control module. The plurality of pipeline steps include a preprocessing pipeline step, a coarse selection pipeline step performed after the preprocessing pipeline step, and a precise comparison pipeline step performed after the coarse selection pipeline step. The integral control module is used to control the storage and retrieval of the original frame data and the reference frame data, and to control the preprocessing module, the coarse selection module, and the precise comparison module to execute their corresponding pipeline steps in sequence. The invention improves search precision through a step-by-step search scheme, better preserves the details of the reconstructed image, and reduces hardware resource consumption.
It should be noted that, although the above embodiments have been described herein, the invention is not limited thereto. Therefore, based on the innovative concepts of the present invention, the technical solutions of the present invention can be directly or indirectly applied to other related technical fields by making changes and modifications to the embodiments described herein, or by using equivalent structures or equivalent processes performed in the content of the present specification and the attached drawings, which are included in the scope of the present invention.

Claims (20)

1. An h.265 encoding apparatus comprising a plurality of modules and a plurality of pipelined steps, each pipelined step comprising at least one pipelined stage for executing at least one module, wherein:
the plurality of modules comprise a preprocessing module, a coarse selection module, a precise comparison module, and an integral control module, the integral control module being connected to the preprocessing module, the coarse selection module, and the precise comparison module respectively;
the plurality of pipeline steps comprise a preprocessing pipeline step, a coarse selection pipeline step, and a precise comparison pipeline step, the coarse selection pipeline step being performed after the preprocessing pipeline step, and the precise comparison pipeline step being performed after the coarse selection pipeline step;
the preprocessing pipeline step divides a current frame in an original video into a plurality of CTU blocks through a preprocessing module;
the coarse selection pipeline step divides each CTU block according to a plurality of partition modes through the coarse selection module, performs inter-frame prediction coarse selection and intra-frame prediction coarse selection on each partition mode of each CTU block, and generates prediction information corresponding to each partition mode;
the precise comparison pipeline step performs cost calculation and comparison on the prediction information corresponding to each partition mode of each CTU block through the precise comparison module, selects for each CTU block the minimum-cost partition mode and the coding information corresponding to that partition mode, and generates, according to the selected partition mode and its corresponding coding information, entropy coding information for generating the H.265 code stream of the current frame and reconstruction information for generating the reconstructed frame of the current frame,
and the integral control module is used for controlling the storage and retrieval of the original frame data and the reference frame data, and for controlling the preprocessing module, the coarse selection module, and the precise comparison module to execute their corresponding pipeline steps in sequence.
2. The H.265 encoding device of claim 1,
the coarse selection module comprises an inter-frame prediction coarse selection module and an intra-frame prediction coarse selection module; the coarse selection pipeline step comprises an inter-frame prediction coarse selection pipeline stage and an intra-frame prediction coarse selection pipeline stage;
the inter-frame prediction coarse selection pipeline stage divides each CTU block according to a plurality of partition modes through the inter-frame prediction coarse selection module, wherein each partition mode divides one CTU block into a corresponding plurality of CU blocks and divides each CU block into one or more corresponding PU blocks, performs inter-frame prediction on each partition mode of each CTU block to obtain reference frame information, performs intra-frame prediction on each partition mode of each CTU block, and generates prediction information corresponding to each partition mode;
and in the intra-frame prediction coarse selection pipeline stage, the intra-frame prediction coarse selection module performs intra-frame prediction on each PU block in each partition mode, calculates the corresponding costs, selects for each PU block one or more intra-frame prediction directions with lower cost according to those costs, and takes the selected intra-frame prediction directions as the prediction information corresponding to the partition mode.
3. The H.265 encoding device of claim 1,
the coarse selection module further comprises an inter-frame prediction coarse selection module, and the precise comparison module further comprises an intra-frame prediction coarse selection module;
the coarse selection pipeline step comprises an inter-frame prediction coarse selection pipeline stage, and the precise comparison pipeline step comprises an intra-frame prediction coarse selection pipeline stage;
the inter-frame prediction coarse selection pipeline stage divides each CTU block according to a plurality of partition modes through the inter-frame prediction coarse selection module, wherein each partition mode divides one CTU block into a corresponding plurality of CU blocks and divides each CU block into one or more corresponding PU blocks, performs inter-frame prediction on each partition mode of each CTU block to obtain reference frame information, performs intra-frame prediction on each partition mode of each CTU block, and generates prediction information corresponding to each partition mode;
and in the intra-frame prediction coarse selection pipeline stage, the intra-frame prediction coarse selection module performs intra-frame prediction on each PU block in each partition mode, calculates the corresponding costs, selects for each PU block one or more intra-frame prediction directions with lower cost according to those costs, and takes the selected intra-frame prediction directions as the prediction information corresponding to the partition mode.
4. The H.265 encoding device of claim 2 or 3,
the inter-frame prediction coarse selection module comprises a coarse search module, a reference frame data loading module, a fine search module, and a fractional pixel search module;
the coarse selection pipeline step comprises a coarse search pipeline stage, a reference frame data loading pipeline stage, a fine search pipeline stage, and a fractional pixel search pipeline stage;
in the coarse search pipeline stage, the coarse search module selects a frame from the reference list, takes the original frame or the reconstructed frame of that frame as the reference frame, performs a down-sampling operation on the reference frame and the current CTU block, finds in the down-sampled reference frame the pixel position with the minimum cost relative to the down-sampled CTU block, and calculates the coarse search vector of that pixel position relative to the current CTU block;
in the reference frame data loading pipeline stage, the reference frame data loading module obtains the coarse search vector of the coarse search pipeline stage through the integral control module, obtains one or more predicted motion vectors, which serve the same role as the coarse search vector, from the motion vectors around the CTU block, loads reference frame data according to the coarse search vector and the one or more predicted motion vectors, and transmits the reference frame data to the fine search pipeline stage through the integral control module;
in the fine search pipeline stage, the fine search module sets a fine search area in the reconstructed image of the reference frame for each PU block according to the coarse search vector, and generates the minimum-cost fine search vector corresponding to the PU block within the fine search area; the fine search module is also used for generating one or more predicted motion vectors, which serve the same role as the coarse search vector, from the motion vector information around the current CTU block, and for generating fine search vectors from these predicted motion vectors; all generated fine search vectors are sent to the fractional pixel search module;
and in the fractional pixel search pipeline stage, the fractional pixel search module sets a corresponding fractional pixel search area in the reference frame for each PU block according to each received fine search vector, and generates the minimum-cost fractional pixel search vector corresponding to the PU block within the fractional pixel search area.
5. The h.265 encoding device of claim 4, wherein the intra-prediction coarse selection pipeline stage is a same pipeline stage as a fractional-pixel search pipeline stage, and the intra-prediction coarse selection module and the fractional-pixel search module execute in parallel in the same pipeline stage.
6. The H.265 encoding device of claim 5,
the intra-frame prediction coarse selection module comprises a reference pixel generation module and is executed in an intra-frame prediction coarse selection pipeline stage;
the intra-frame prediction coarse selection pipeline stage comprises: for each PU block in each partition mode, generating reference pixels from the original pixels of the current frame, predicting all intra-frame prediction directions from the reference pixels according to the rules of the H.265 protocol to obtain the prediction result of each direction, calculating the distortion cost of each direction's prediction result against the original pixels, and sorting the costs from small to large so as to select one or more intra-frame prediction directions with lower cost.
7. The h.265 encoding device of claim 4, wherein the intra-prediction coarse selection pipeline stage is a different pipeline stage than a fractional-pixel search pipeline stage, the intra-prediction coarse selection module executing in the pipeline stage after the fractional-pixel search module.
8. The H.265 encoding device of claim 7,
the intra-frame prediction coarse selection module comprises a reference pixel generation module and is executed in an intra-frame prediction coarse selection pipeline stage;
the reference pixel generation module is used for generating, for each PU block in each partition mode, reference pixels from the reconstructed pixels of the current frame, predicting all intra-frame prediction directions from the reference pixels according to the rules of the H.265 protocol to obtain the prediction result of each direction, calculating the distortion cost of each direction's prediction result against the original pixels, and sorting the costs from small to large so as to select one or more intra-frame prediction directions with lower cost.
9. The H.265 encoding device of claim 1 wherein the plurality of modules further comprises a post-processing module, the plurality of pipelined steps further comprises a post-processing pipelined step,
and the post-processing pipeline step generates, through the post-processing module, the reconstructed frame corresponding to the current frame according to the minimum-cost partition mode of each CTU block output by the precise comparison module and the reconstruction information corresponding to that partition mode.
10. The H.265 encoding device of claim 1, wherein the plurality of modules further comprises an entropy encoding module, and the plurality of pipelining steps further comprises an entropy encoding pipelining step,
wherein the entropy encoding pipelining step generates, through the entropy encoding module, a binary code stream conforming to the H.265 protocol specification according to the minimum-cost division mode output by the accurate comparison module for each CTU block and the corresponding entropy encoding information.
11. An H.265 encoding method applied to an H.265 encoding apparatus, the apparatus comprising a plurality of modules and a plurality of pipelining steps, each pipelining step comprising at least one pipeline stage for executing at least one module, wherein:
the plurality of modules comprises a preprocessing module, a coarse selection module, an accurate comparison module and an integral control module, the integral control module being connected to the preprocessing module, the coarse selection module and the accurate comparison module respectively;
the plurality of pipelining steps comprises a preprocessing pipelining step, a coarse selection pipelining step and an accurate comparison pipelining step, the coarse selection pipelining step being performed after the preprocessing pipelining step, and the accurate comparison pipelining step being performed after the coarse selection pipelining step;
the method comprises the following steps:
the preprocessing pipelining step divides, through the preprocessing module, a current frame of an original video into a plurality of CTU blocks;
the coarse selection pipelining step divides, through the coarse selection module, each CTU block according to a plurality of division modes, performs coarse inter-frame prediction selection and coarse intra-frame prediction selection on each division mode of each CTU block, and generates prediction information corresponding to each division mode;
the accurate comparison pipelining step calculates and compares, through the accurate comparison module, the cost of the prediction information corresponding to each division mode of each CTU block; selects, for each CTU block, the division mode with the minimum cost and the coding information corresponding to that division mode; and generates, according to the selected division mode and its coding information, entropy encoding information for generating an H.265 code stream for the current frame and reconstruction information for generating a reconstructed frame for the current frame; and
the integral control module controls the storage and retrieval of original frame data and reference frame data, and controls the preprocessing module, the coarse selection module and the accurate comparison module to sequentially execute their corresponding pipelining steps.
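The preprocessing step above partitions the frame into CTU blocks. With H.265's maximum CTU size of 64x64, a frame whose dimensions are not multiples of 64 simply has partial CTUs at its right and bottom edges. A minimal sketch of the partitioning (function name is illustrative):

```python
import math

def split_into_ctus(width, height, ctu_size=64):
    """Return the (x, y) origin of every CTU covering a width x height frame.
    Edge CTUs may extend past the frame boundary and are cropped by the codec."""
    cols = math.ceil(width / ctu_size)
    rows = math.ceil(height / ctu_size)
    return [(x * ctu_size, y * ctu_size)
            for y in range(rows) for x in range(cols)]

ctus = split_into_ctus(1920, 1080)  # 30 columns x 17 rows = 510 CTUs for 1080p
```

The coarse selection and accurate comparison steps then iterate over exactly this CTU list, which is what makes CTU-granular pipelining possible.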
12. The H.265 encoding method of claim 11,
the coarse selection module comprises: an inter-frame prediction coarse selection module and an intra-frame prediction coarse selection module; the coarse selection pipelining step comprises: an inter-frame prediction coarse selection pipeline stage and an intra-frame prediction coarse selection pipeline stage;
the method further comprises the following steps:
the inter-frame prediction coarse selection pipeline stage divides, through the inter-frame prediction coarse selection module, each CTU block according to a plurality of division modes, each division mode dividing one CTU block into a plurality of corresponding CU blocks and dividing each CU block into one or more corresponding PU blocks; performs inter-frame prediction on each division mode of each CTU block to obtain reference frame information; performs intra-frame prediction on each division mode of each CTU block; and generates prediction information corresponding to each division mode;
the intra-frame prediction coarse selection pipeline stage, through the intra-frame prediction coarse selection module: performs intra-frame prediction on each PU block in each division mode, calculates the corresponding cost, selects for each PU block one or more intra-frame prediction directions with relatively low cost according to the cost, and takes the selected intra-frame prediction directions as prediction information corresponding to the division mode.
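A "division mode" as recited above is one outcome of H.265's quadtree CU partitioning, in which a 64x64 CTU may recursively split down to 8x8 CUs. A minimal sketch that enumerates the leaf CUs for one split decision; the split rule here is a toy stand-in for the encoder's cost-driven decision:

```python
def enumerate_cus(size=64, min_size=8, split=lambda x, y, s: s > 32):
    """Quadtree-split a CTU. `split` decides whether the CU at (x, y) with
    side `s` splits further (the default toy rule splits anything above 32).
    Returns (x, y, size) tuples for the leaf CUs — one 'division mode'."""
    leaves = []
    def recurse(x, y, s):
        if s > min_size and split(x, y, s):
            h = s // 2
            for dx, dy in ((0, 0), (h, 0), (0, h), (h, h)):
                recurse(x + dx, y + dy, h)
        else:
            leaves.append((x, y, s))
    recurse(0, 0, size)
    return leaves

cus = enumerate_cus()  # four 32x32 CUs under the default toy rule
```

Each leaf CU is then subdivided into PU blocks (e.g. 2NxN, Nx2N) for prediction, per the claim's "one or more corresponding PU blocks".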
13. The H.265 encoding method of claim 11,
the coarse selection module comprises an inter-frame prediction coarse selection module, and the accurate comparison module comprises an intra-frame prediction coarse selection module;
the coarse selection pipelining step comprises an inter-frame prediction coarse selection pipeline stage, and the accurate comparison pipelining step comprises an intra-frame prediction coarse selection pipeline stage;
the method comprises the following steps:
the inter-frame prediction coarse selection pipeline stage divides, through the inter-frame prediction coarse selection module, each CTU block according to a plurality of division modes, each division mode dividing one CTU block into a plurality of corresponding CU blocks and dividing each CU block into one or more corresponding PU blocks; performs inter-frame prediction on each division mode of each CTU block to obtain reference frame information; performs intra-frame prediction on each division mode of each CTU block; and generates prediction information corresponding to each division mode;
the intra-frame prediction coarse selection pipeline stage, through the intra-frame prediction coarse selection module: performs intra-frame prediction on each PU block in each division mode, calculates the corresponding cost, selects for each PU block one or more intra-frame prediction directions with relatively low cost according to the cost, and takes the selected intra-frame prediction directions as prediction information corresponding to the division mode.
14. The H.265 encoding method of claim 12 or 13,
the inter-frame prediction coarse selection module comprises: a coarse search module, a reference frame data loading module, a fine search module and a fractional-pixel search module;
the coarse selection pipelining step comprises: a coarse search pipeline stage, a reference frame data loading pipeline stage, a fine search pipeline stage and a fractional-pixel search pipeline stage;
the method comprises the following steps:
the coarse search pipeline stage, through the coarse search module: selects a frame from the reference frame list and takes the original frame or the reconstructed frame of that frame as the reference frame; performs a downsampling operation on the reference frame and the current CTU block; finds, in the downsampled reference frame, the pixel position with the minimum cost relative to the downsampled CTU block; and calculates the coarse search vector of that pixel position relative to the current CTU block;
the reference frame data loading pipeline stage, through the reference frame data loading module: obtains the coarse search vector of the coarse search pipeline stage through the integral control module; obtains, according to the motion vectors around the CTU block, one or more predicted motion vectors serving the same function as the coarse search vector; loads reference frame data according to the coarse search vector and the one or more predicted motion vectors; and transmits the reference frame data to the fine search pipeline stage through the integral control module;
the fine search pipeline stage, through the fine search module: sets, for each PU block, a fine search area in the reconstructed image of the reference frame according to the coarse search vector, and generates the minimum-cost fine search vector corresponding to the PU block within the fine search area; generates, according to the motion vector information around the current CTU block, one or more predicted motion vectors serving the same function as the coarse search vector, and generates fine search vectors according to the predicted motion vectors; and sends all generated fine search vectors to the fractional-pixel search module;
the fractional-pixel search pipeline stage, through the fractional-pixel search module: sets, according to each received fine search vector, a corresponding fractional-pixel search area in the reference frame for each PU block, and generates the minimum-cost fractional-pixel search vector corresponding to the PU block within the fractional-pixel search area.
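The coarse search stage above works on downsampled data to cover a wide area cheaply. A minimal sketch with 2x2-average downsampling and an exhaustive SAD search; the downsampling factor, cost function, and function names are illustrative choices, not taken from the patent:

```python
def downsample2(img):
    """Average 2x2 blocks — a simple stand-in for coarse-search downsampling."""
    h, w = len(img), len(img[0])
    return [[(img[y][x] + img[y][x+1] + img[y+1][x] + img[y+1][x+1]) // 4
             for x in range(0, w, 2)] for y in range(0, h, 2)]

def coarse_search(ref, block, bx, by):
    """Exhaustive SAD search of `block` over `ref` (both already downsampled).
    Returns the coarse motion vector of the best match relative to (bx, by)."""
    bh, bw = len(block), len(block[0])
    best = None
    for y in range(len(ref) - bh + 1):
        for x in range(len(ref[0]) - bw + 1):
            cost = sum(abs(ref[y + j][x + i] - block[j][i])
                       for j in range(bh) for i in range(bw))
            if best is None or cost < best[0]:
                best = (cost, x - bx, y - by)
    return best[1], best[2]  # (mv_x, mv_y) in downsampled-pixel units

ref = [[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 5, 6], [0, 0, 7, 8]]
mv = coarse_search(ref, [[5, 6], [7, 8]], 0, 0)  # block found at offset (2, 2)
```

The resulting coarse vector only needs to be accurate enough to center the later fine search and fractional-pixel search windows, which refine it at full and sub-pixel resolution.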
15. The H.265 encoding method of claim 14, wherein the intra-frame prediction coarse selection pipeline stage is the same pipeline stage as the fractional-pixel search pipeline stage, and the intra-frame prediction coarse selection module and the fractional-pixel search module execute in parallel in the same pipeline stage.
16. The H.265 encoding method of claim 15,
the intra-frame prediction coarse selection module comprises a reference pixel generation module and executes in the intra-frame prediction coarse selection pipeline stage;
the method comprises the following steps:
the intra-frame prediction coarse selection pipeline stage comprises the following steps: for each PU block in each division mode, generating reference pixels from the original pixels of the current frame; predicting all intra-frame prediction directions from the reference pixels according to the rules of the H.265 protocol to obtain a prediction result for each direction; calculating a distortion cost between each direction's prediction result and the original pixels; and sorting the costs from small to large to select one or more intra-frame prediction directions with lower cost.
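The reference pixels recited above are the neighbour samples to the left of and above the PU. As one concrete direction out of the 35 the claim predicts, H.265's DC mode sets every sample to the rounded mean of the reference pixels. A minimal sketch, omitting the standard's edge-substitution and reference-filtering rules:

```python
def dc_predict(left_refs, top_refs, size):
    """H.265 DC intra mode: every predicted sample is the rounded mean of the
    left and top reference pixels. Reference generation is simplified here —
    no unavailable-neighbour substitution or smoothing filter."""
    refs = left_refs + top_refs
    dc = (sum(refs) + len(refs) // 2) // len(refs)  # round to nearest
    return [[dc] * size for _ in range(size)]

pred = dc_predict([10, 10, 10, 10], [20, 20, 20, 20], 4)
# (40 + 80 + 4) // 8 = 15: a flat 4x4 block of 15s
```

Each of the 35 such prediction blocks is then scored against the original pixels exactly as the claim describes, and the cheapest directions survive the coarse selection.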
17. The H.265 encoding method of claim 14, wherein the intra-frame prediction coarse selection pipeline stage is a pipeline stage different from the fractional-pixel search pipeline stage, and the intra-frame prediction coarse selection module executes in a pipeline stage after the fractional-pixel search module.
18. The H.265 encoding method of claim 17,
the intra-frame prediction coarse selection module comprises a reference pixel generation module and executes in the intra-frame prediction coarse selection pipeline stage;
the method comprises the following steps:
the reference pixel generation module generates, for each PU block in each division mode, reference pixels from the reconstructed pixels of the current frame; predicts all intra-frame prediction directions from the reference pixels according to the rules of the H.265 protocol to obtain a prediction result for each direction; calculates a distortion cost between each direction's prediction result and the original pixels; and sorts the costs from small to large to select one or more intra-frame prediction directions with lower cost.
19. The H.265 encoding method of claim 11, wherein the plurality of modules further comprises a post-processing module, and the plurality of pipelining steps further comprises a post-processing pipelining step;
the method comprises the following steps:
the post-processing pipelining step generates, through the post-processing module, a reconstructed frame corresponding to the current frame according to the minimum-cost division mode output by the accurate comparison module for each CTU block and the reconstruction information corresponding to that division mode.
20. The H.265 encoding method of claim 11, wherein the plurality of modules further comprises an entropy encoding module, and the plurality of pipelining steps further comprises an entropy encoding pipelining step;
the method comprises the following steps: the entropy encoding pipelining step generates, through the entropy encoding module, a binary code stream conforming to the H.265 protocol specification according to the minimum-cost division mode output by the accurate comparison module for each CTU block and the corresponding entropy encoding information.
CN201811136408.6A 2018-04-11 2018-09-28 H.265 coding method and device Active CN110971896B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201811136408.6A CN110971896B (en) 2018-09-28 2018-09-28 H.265 coding method and device
US17/603,002 US11956452B2 (en) 2018-04-11 2020-04-10 System and method for H.265 encoding

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811136408.6A CN110971896B (en) 2018-09-28 2018-09-28 H.265 coding method and device

Publications (2)

Publication Number Publication Date
CN110971896A true CN110971896A (en) 2020-04-07
CN110971896B CN110971896B (en) 2022-02-18

Family

ID=70026614

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811136408.6A Active CN110971896B (en) 2018-04-11 2018-09-28 H.265 coding method and device

Country Status (1)

Country Link
CN (1) CN110971896B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020207451A1 (en) * 2019-04-11 2020-10-15 福州瑞芯微电子股份有限公司 H.265 encoding method and apparatus
WO2022028283A1 (en) * 2020-08-03 2022-02-10 阿里巴巴集团控股有限公司 Image frame coding method, object search method, computer device, and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101150719A (en) * 2006-09-20 2008-03-26 华为技术有限公司 Parallel video coding method and device
CN102143361A (en) * 2011-01-12 2011-08-03 浙江大学 Video coding method and video coding device
CN103888762A (en) * 2014-02-24 2014-06-25 西南交通大学 Video coding framework based on HEVC standard
US20180041770A1 (en) * 2016-08-04 2018-02-08 Intel Corporation Techniques for hardware video encoding


Also Published As

Publication number Publication date
CN110971896B (en) 2022-02-18

Similar Documents

Publication Publication Date Title
CN107318026B (en) Video encoder and video encoding method
CN110365988B (en) H.265 coding method and device
KR20200040774A (en) Method and apparatus for filtering with multi-branch deep learning
KR20190093534A (en) Method for inter prediction and apparatus thereof
CN102263947A (en) Method and system for motion estimation of images
CN109068142A (en) 360 degree of video intra-frame prediction high-speed decisions based on textural characteristics
EP4262205A1 (en) Video predictive coding method and apparatus
US20240031576A1 (en) Method and apparatus for video predictive coding
Birman et al. Overview of research in the field of video compression using deep neural networks
CN110971896B (en) H.265 coding method and device
CN107087171A (en) HEVC integer pixel motion estimation methods and device
CN108353175A (en) The method and apparatus of prediction processing vision signal caused by coefficient of utilization
CN104883566A (en) Rapid algorithm suitable for intra-frame prediction block size division of HEVC standard
Katayama et al. Low-complexity intra coding algorithm based on convolutional neural network for HEVC
CN110149512A (en) Inter-prediction accelerated method, control device, electronic device, computer storage medium and equipment
CN113810715A (en) Video compression reference image generation method based on void convolutional neural network
CN105100799A (en) Method for reducing intraframe coding time delay in HEVC encoder
CN102595137B (en) Fast mode judging device and method based on image pixel block row/column pipelining
JP4957780B2 (en) Motion compensated predictive coding apparatus, motion compensated predictive coding method, and program
WO2020207451A1 (en) H.265 encoding method and apparatus
CN109168000B (en) HEVC intra-frame prediction rapid algorithm based on RC prediction
CN110049322A (en) Method, apparatus, electronic equipment and the storage medium of model selection
WO2012121575A2 (en) Method and device for intra-prediction
US11956452B2 (en) System and method for H.265 encoding
CN112770115A (en) Rapid intra-frame prediction mode decision method based on direction gradient statistical characteristics

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 350003 building 18, No.89, software Avenue, Gulou District, Fuzhou City, Fujian Province

Applicant after: Ruixin Microelectronics Co.,Ltd.

Address before: 350003 building 18, No.89, software Avenue, Gulou District, Fuzhou City, Fujian Province

Applicant before: FUZHOU ROCKCHIP ELECTRONICS Co.,Ltd.

GR01 Patent grant