CN110519597B - HEVC-based encoding method and device, computing equipment and medium

Info

Publication number: CN110519597B
Application number: CN201910837345.5A
Authority: CN (China)
Prior art keywords: video frame, current video, coding, roi, hevc
Legal status: Active (granted)
Other languages: Chinese (zh)
Other versions: CN110519597A (en)
Inventor: 欧阳国胜
Current Assignee: Beijing Jiaxun Feihong Electrical Co Ltd
Original Assignee: Beijing Jiaxun Feihong Electrical Co Ltd
Application filed by Beijing Jiaxun Feihong Electrical Co Ltd
Priority to CN201910837345.5A
Publication of CN110519597A; application granted; publication of CN110519597B


Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/103Selection of coding mode or of prediction mode
    • H04N19/109Selection of coding mode or of prediction mode among a plurality of temporal predictive coding modes
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/142Detection of scene cut or scene change
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/146Data rate or code amount at the encoder output
    • H04N19/147Data rate or code amount at the encoder output according to rate distortion criteria
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/167Position within a video image, e.g. region of interest [ROI]
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
    • H04N19/176Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The invention discloses an HEVC-based coding method, apparatus, computing device and medium. The method comprises the following steps: acquiring a current video frame to be coded, and identifying an actual region of interest (ROI) included in the current video frame; calculating the coding depth of each HEVC coding unit according to the positional relationship between a plurality of HEVC coding units of the current video frame and the actual ROI; and performing video coding on the current video frame using an inter-frame prediction mode matched with the coding depth of each HEVC coding unit. According to the technical scheme of the embodiments of the invention, when the whole video frame is coded, unnecessary inter-frame prediction modes are eliminated, the computation of the rate-distortion cost is reduced, the real-time performance and accuracy of inter-frame prediction mode selection are ensured, and the HEVC coding speed is improved.

Description

HEVC-based encoding method and device, computing equipment and medium
Technical Field
Embodiments of the invention relate to video compression coding technology, and in particular to an HEVC-based coding method, apparatus, computing device and medium.
Background
The ITU Telecommunication Standardization Sector (ITU-T) has established the new-generation video coding standard HEVC (High Efficiency Video Coding). The main goal of HEVC is to double the compression efficiency for high-resolution and high-fidelity video.
A Coding Unit (CU) in HEVC is generally obtained by recursive quadtree partitioning, and the CU at each coding depth has its corresponding inter-frame prediction modes. For CUs at each depth, from top to bottom, the HEVC encoder performs motion estimation and motion compensation for all prediction modes, calculates the rate-distortion cost of each inter-frame prediction mode one by one, and selects the inter-frame prediction mode with the minimum rate-distortion cost as the best inter-frame prediction mode of the current CU.
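For reference, the cost compared in this traversal is the standard HEVC rate-distortion optimization criterion (the formula below reflects common HEVC practice and is not recited verbatim in this text): for each candidate mode m the encoder evaluates J(m) = D(m) + λ·R(m), where D(m) is the distortion of coding the CU with mode m, R(m) is the number of bits it consumes, and λ is the Lagrange multiplier; the mode with the smallest cost J(m) is chosen.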
This exhaustive traversal gives the encoder very high computational complexity, and the encoding time consumed by video compression is so long that real-time video compression requirements cannot be met. How to eliminate unnecessary inter-frame prediction modes and effectively reduce the computation of the rate-distortion cost has therefore become an urgent problem.
Disclosure of Invention
The embodiments of the invention provide an HEVC-based coding method and apparatus, a computing device and a medium, so as to reduce the computation of the rate-distortion cost, ensure the real-time performance and accuracy of inter-frame prediction mode selection, and increase the HEVC coding speed.
In a first aspect, an embodiment of the present invention provides an HEVC-based encoding method, where the method includes:
acquiring a current video frame to be coded, and identifying an actual ROI (region of interest) included in the current video frame;
calculating the coding depth of each HEVC coding unit according to the position relation between a plurality of HEVC coding units of the current video frame and the actual ROI;
and performing video coding on the current video frame by adopting an inter-frame prediction mode matched with the coding depth of each HEVC coding unit.
In a second aspect, an embodiment of the present invention further provides an HEVC-based encoding apparatus, where the apparatus includes:
the actual ROI identification module is used for acquiring a current video frame to be coded and identifying an actual ROI included in the current video frame;
a coding depth calculation module of an HEVC coding unit, configured to calculate a coding depth of each HEVC coding unit according to a positional relationship between a plurality of HEVC coding units of the current video frame and the actual ROI;
and the current video frame coding module is used for carrying out video coding on the current video frame by adopting an inter-frame prediction mode matched with the coding depth of each HEVC coding unit.
In a third aspect, an embodiment of the present invention further provides a computing device, where the computing device includes:
one or more processors;
storage means for storing one or more programs;
when executed by the one or more processors, the one or more programs cause the one or more processors to implement the HEVC-based encoding method provided by any embodiment of the present invention.
In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium, where the storage medium stores a computer program, and the computer program, when executed by a processor, implements the HEVC-based encoding method provided in any embodiment of the present invention.
In the embodiments of the invention, the current video frame to be coded is first acquired and the position parameters of the actual region of interest are determined; the depth of each coding unit is then calculated according to the positional relationship between the coding unit and the actual region of interest; finally, the current video frame is video-coded using an inter-frame prediction mode matched with the depth of each coding unit. When the whole video frame is coded, the embodiments of the invention omit the rate-distortion cost estimation of unnecessary inter-frame prediction modes, solve the problem of the huge amount of rate-distortion computation, ensure the real-time performance and accuracy of inter-frame prediction mode selection, and improve the HEVC coding speed.
Drawings
Fig. 1 is a flowchart of an HEVC-based encoding method according to a first embodiment of the present invention;
fig. 2 is a flowchart of an HEVC-based encoding method according to a second embodiment of the present invention;
fig. 3 is a flowchart of a specific implementation of an HEVC-based encoding method according to a third embodiment of the present invention;
fig. 4 is a block diagram of an HEVC-based encoding apparatus according to a fourth embodiment of the present invention;
fig. 5 is a schematic structural diagram of a computing device in the fifth embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.
Example one
Fig. 1 is a flowchart of an HEVC-based encoding method according to an embodiment of the present invention, where the embodiment is applicable to a case of video compression encoding in multimedia communication, the method may be executed by an HEVC-based encoding apparatus, the apparatus may be implemented by software and/or hardware, and may be generally integrated in a terminal or a server having a video compression processing function, and typically, integrated in a video encoder of the terminal or the server, and specifically include the following steps:
step 110, obtaining a current video frame to be encoded, and identifying an actual ROI included in the current video frame.
In this embodiment, the current video frame to be encoded may be acquired directly by the video encoder from the device or computer that generates the video data. An ROI (region of interest) is the region of the current video frame that is to be processed and analyzed; it is usually delineated by a square, circle, ellipse, irregular polygon, etc. The actual ROI is a region of interest with precise position parameters.
And step 120, calculating the coding depth of each HEVC coding unit according to the position relation between the plurality of HEVC coding units of the current video frame and the actual ROI.
In this step, the HEVC coding unit is a coding block pre-partitioned for the current video frame, abbreviated as a CU block. A CU block at depth 0 has a size of 64 × 64, and CU blocks at depths 1-3 have sizes of 32 × 32, 16 × 16 and 8 × 8, respectively.
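The relationship between coding depth and CU block size thus follows a simple halving rule; the short sketch below is only illustrative (the function name is not from this text):

```python
def cu_block_size(depth: int) -> int:
    """Return the CU block edge length for an HEVC coding depth of 0-3."""
    assert 0 <= depth <= 3
    return 64 >> depth  # depth 0 -> 64, 1 -> 32, 2 -> 16, 3 -> 8
```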
The positional relationship between CU blocks and the actual ROI includes three types: located inside the actual ROI, outside the actual ROI and at the edge of the actual ROI, the relative position of the CU block and the actual ROI determines the coding depth of the CU block.
Specifically, when a CU block is outside the actual ROI, i.e. in a non-ROI region, a larger CU block size is used for coding; when a CU block is at the edge of the actual ROI, the partitioning is refined and the smallest CU block size is used for coding; when a CU block is inside the actual ROI, a smaller CU block size is used for coding. The smaller the CU block size, the larger the coding depth.
And step 130, performing video coding on the current video frame by adopting an inter-frame prediction mode matched with the coding depth of each HEVC coding unit.
In the HEVC coding process, each coding unit at each depth has its corresponding inter-frame prediction modes, with which motion estimation and motion compensation are performed in video compression coding.
Specifically, the following inter-frame prediction modes are mainly used in the prior art: SKIP or merge mode, Square partition mode, SMP (Symmetric Motion Partition) mode, AMP (Asymmetric Motion Partition) mode, and intra mode.
According to the technical scheme above, the video encoder acquires the current video frame to be coded and identifies the actual region of interest included in the current video frame; the coding depth of each coding unit is then determined according to the positional relationship between the plurality of HEVC coding units of the current video frame and the actual region of interest; finally, an inter-frame prediction mode matched with the depth of each coding unit is used to video-code the current video frame. On the premise of ensuring HEVC coding compression efficiency and the image quality of the region of interest, the technical scheme of this embodiment significantly reduces the computation needed to search for the best inter-frame prediction mode, increases the video compression coding speed, and meets real-time video compression requirements.
Example two
This embodiment provides, on the basis of the first embodiment, a specific implementation for identifying the actual ROI in the current video frame. Terms that are the same as or correspond to those of the embodiment above are not explained again here. In this embodiment of the invention, the current video frame to be coded is acquired and the actual ROI included in the current video frame is identified as follows: candidate interested pixel points included in the current video frame are first identified using an inter-frame image difference algorithm, the shooting scene corresponding to the current video frame is then obtained, and finally an identification strategy matched with the shooting scene is selected and the actual ROI included in the current video frame is identified from the candidate interested pixel points.
Fig. 2 is a flowchart of an HEVC-based encoding method according to a second embodiment of the present invention, and as shown in fig. 2, the method includes the following steps:
step 210, identifying candidate interested pixel points included in the current video frame by using an interframe image difference algorithm.
In this step, in the embodiment of the present invention, a previous video frame and a next video frame corresponding to a current video frame are first obtained, and then the candidate interested pixel point is identified in the current video frame according to a difference between a gray value of each pixel point in the current video frame and a gray value of a pixel point at a corresponding position in the previous video frame or the next video frame.
First, it is judged whether the absolute value of the difference between the gray values of the pixels at corresponding positions in the current video frame and the previous video frame is greater than or equal to a preset threshold; if so, the first gray value of each such pixel of the current video frame is set to 255, otherwise it is set to 0. It is then judged whether the absolute value of the difference between the gray values of the pixels at corresponding positions in the current video frame and the next video frame is greater than or equal to the preset threshold; if so, the second gray value of each such pixel of the current video frame is set to 255, otherwise it is set to 0. Finally, the gray value of each pixel in the current video frame is set to the result of the AND operation of the first gray value and the second gray value, giving a binary image corresponding to the current video frame; each pixel with gray value 255 in the binary image is taken as a candidate interested pixel point.
Specifically, the first gray value is a gray value of a pixel point at a position corresponding to a binary difference image between a current video frame and a previous video frame, and the second gray value is a gray value of a pixel point at a position corresponding to a binary difference image between the current video frame and a next video frame.
Preferably, the preset threshold value in this step is set to 15.
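A minimal sketch of this three-frame difference step, assuming the frames are 8-bit grayscale NumPy arrays (the function and variable names are illustrative and not taken from this text):

```python
import numpy as np

def candidate_pixels(prev_frame: np.ndarray,
                     cur_frame: np.ndarray,
                     next_frame: np.ndarray,
                     threshold: int = 15) -> np.ndarray:
    """Return a boolean mask of candidate interested pixels of the current frame.

    A pixel is a candidate when its absolute gray-value difference to BOTH the
    previous and the next frame reaches the threshold (AND of the two masks).
    """
    prev_f = prev_frame.astype(np.int16)
    cur_f = cur_frame.astype(np.int16)
    next_f = next_frame.astype(np.int16)

    first_mask = np.abs(cur_f - prev_f) >= threshold   # "first gray value" = 255
    second_mask = np.abs(cur_f - next_f) >= threshold  # "second gray value" = 255
    return first_mask & second_mask                    # binary image of candidates
```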
And step 220, acquiring a shooting scene corresponding to the current video frame.
Shooting scenes corresponding to the current video frame fall into two types: the first type is object motion with a static lens, and the second type is object motion with a rotating lens.
In this step, the current video frame is first divided into a plurality of primary blocks in units of a first preset pixel size; each primary block is then divided into a plurality of secondary blocks in units of a second preset pixel size, the number of candidate interested pixel points in each secondary block is counted, and the ratio of the number of secondary blocks whose count is smaller than a preset value to the total number of secondary blocks is calculated. Finally, it is judged whether this ratio is greater than or equal to a preset threshold: if so, the shooting scene is determined to be a scene with object motion and a static lens; otherwise, it is determined to be a scene with object motion and a rotating lens.
Preferably, in this step the first preset pixel size is 128 × 128 and the second preset pixel size is 32 × 32. The number of candidate interested pixel points in each secondary block is counted, and the ratio of the number of secondary blocks containing fewer than 5 candidate interested pixel points to the total number of secondary blocks is calculated. If this ratio is greater than or equal to 0.73, the shooting scene is determined to be a scene with object motion and a static lens; otherwise, it is determined to be a scene with object motion and a rotating lens.
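A sketch of this classification rule, reusing the candidate-pixel mask from the sketch above (the 32 × 32 block size, per-block count threshold 5 and ratio threshold 0.73 are the preferred values given here; the names are illustrative). Assuming the frame dimensions are multiples of the block sizes, the 128 × 128 primary blocks serve the later candidate-ROI detection and do not change the ratio computed here:

```python
import numpy as np

def classify_scene(mask: np.ndarray,
                   block: int = 32,
                   count_thresh: int = 5,
                   ratio_thresh: float = 0.73) -> str:
    """Classify the shooting scene from the candidate-interested-pixel mask.

    Returns "static_lens" for object motion with a static lens, or
    "rotating_lens" for object motion with a rotating lens.
    """
    h, w = mask.shape
    total_blocks = 0   # number of 32x32 secondary blocks
    sparse_blocks = 0  # secondary blocks with fewer than count_thresh candidates
    for y in range(0, h, block):
        for x in range(0, w, block):
            total_blocks += 1
            if np.count_nonzero(mask[y:y + block, x:x + block]) < count_thresh:
                sparse_blocks += 1
    ratio = sparse_blocks / total_blocks
    return "static_lens" if ratio >= ratio_thresh else "rotating_lens"
```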
And step 230, selecting an identification strategy matched with the shooting scene, and identifying the actual ROI included in the current video frame according to the candidate interested pixel points.
If the shooting scene is of the first type, i.e. object motion with a static lens, the number of candidate interested pixel points in each primary block is first counted, and a primary block whose count is greater than or equal to a preset threshold is determined to be a candidate ROI. Each secondary block is then divided into a plurality of tertiary blocks in units of a third preset pixel size, a search region matched with the image of each candidate ROI is constructed according to the image position of that candidate ROI, and if none of the tertiary blocks in the search region includes any candidate interested pixel point, the candidate ROI corresponding to that search region is determined to be an independent candidate ROI. Finally, the center of gravity, width and height of each independent candidate ROI are calculated, the motion intensity of the independent candidate ROI is calculated according to its number of candidate interested pixel points, width and height, and the actual ROI is determined among the independent candidate ROIs according to the motion intensity.
If the shooting scene is of the second type, i.e. object motion with a rotating lens, the embodiment of the invention first takes the gray values of the pixels at the position of the actual ROI in the previous video frame as reference values and determines a search range according to the position of the actual ROI in the previous video frame; motion estimation is then performed according to the reference values and the gray values of the pixels of the current video frame within the search range to obtain a plurality of matching points; finally, the center of gravity, width and height of the actual ROI of the current video frame are determined from the matching points, and the actual ROI in the current video frame is determined from its center of gravity, width and height.
If the shooting scene is of the first type, i.e. object motion with a static lens, the number of candidate interested pixel points in each primary block is counted, and a primary block containing 56 or more candidate interested pixel points is determined to be a candidate ROI. The third preset pixel size is preferably 16 × 16; as long as no candidate interested pixel point exists in any of the tertiary blocks forming one ring around a candidate ROI, that candidate ROI is an independent candidate ROI. The coordinates of the center of gravity of an independent candidate ROI can be calculated by summing and averaging the coordinates of the candidate interested pixel points in the independent candidate ROI; the width and height of the independent candidate ROI are then calculated, and the motion intensity of the independent candidate ROI is calculated according to its number of candidate interested pixel points, width and height.
Preferably, if the motion intensity is greater than or equal to 1.75, the independent candidate ROI is determined to be an actual ROI.
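A sketch of the per-ROI statistics for the static-lens case. The center of gravity is the average of the candidate pixel coordinates as described above; the exact motion-intensity formula is not spelled out in this text, so the expression below (candidate-pixel count normalized by width plus height) is only an assumed placeholder for the quantity compared against 1.75:

```python
import numpy as np

def roi_statistics(roi_pixels: np.ndarray) -> dict:
    """Compute center of gravity, width, height and an assumed motion intensity.

    roi_pixels: (N, 2) array of (row, col) coordinates of the candidate
    interested pixel points belonging to one independent candidate ROI.
    """
    center = roi_pixels.mean(axis=0)  # center of gravity (average of coordinates)
    height = int(roi_pixels[:, 0].max() - roi_pixels[:, 0].min() + 1)
    width = int(roi_pixels[:, 1].max() - roi_pixels[:, 1].min() + 1)
    # Assumed formula: the text only says the intensity depends on the number
    # of candidate pixel points, the width and the height of the ROI.
    intensity = len(roi_pixels) / float(width + height)
    return {"center": center, "width": width, "height": height,
            "motion_intensity": intensity}
```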
If the shooting scene is of the second type, i.e. object motion with a rotating lens, motion estimation is performed in the current video frame using the gray values of the pixels at the position of the actual ROI in the previous video frame as reference values. Taking the center of gravity of the actual ROI of the previous video frame as the origin, a search range with a radius of 9 pixels is determined, and points within the search range whose motion estimation value is less than or equal to 3 are taken as best matching points. The center of gravity, width and height of the actual ROI of the current video frame are then determined from the matching points, and the actual ROI in the current video frame is determined from its center of gravity, width and height.
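A sketch of this rotating-lens tracking step. The text does not define the "motion estimation value" precisely, so the mean absolute gray-value difference over the ROI template is assumed below; the names are illustrative, and the ROI is assumed to lie fully inside both frames:

```python
import numpy as np

def track_roi(prev_frame: np.ndarray, cur_frame: np.ndarray,
              prev_center: tuple, roi_h: int, roi_w: int,
              radius: int = 9, match_thresh: float = 3.0) -> list:
    """Return matching points of the previous frame's actual ROI in the current frame.

    prev_center: integer (row, col) of the previous ROI's center of gravity.
    The template is the gray-value patch at the previous actual-ROI position; every
    displacement within `radius` pixels of that center whose (assumed) mean absolute
    difference is <= match_thresh is kept as a matching point.
    """
    cy, cx = prev_center
    top, left = cy - roi_h // 2, cx - roi_w // 2
    template = prev_frame[top:top + roi_h, left:left + roi_w].astype(np.int16)

    matches = []
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            if dy * dy + dx * dx > radius * radius:
                continue  # stay inside the circular search range
            patch = cur_frame[top + dy:top + dy + roi_h,
                              left + dx:left + dx + roi_w].astype(np.int16)
            if patch.shape != template.shape:
                continue  # displacement falls outside the frame
            if np.abs(patch - template).mean() <= match_thresh:
                matches.append((cy + dy, cx + dx))
    return matches
```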
Step 240, calculating the coding depth of each HEVC coding unit according to the position relationship between the plurality of HEVC coding units of the current video frame and the actual ROI.
In this step, HEVC coding units that are not located inside any actual ROI are set to a first or a second coding depth according to the shooting scene, where a scene with object motion and a static lens corresponds to the first coding depth and a scene with object motion and a rotating lens corresponds to the second coding depth. HEVC coding units that are partially inside at least one actual ROI are set to a third coding depth. HEVC coding units that are entirely inside any actual ROI are set to a fourth coding depth. The first coding depth is smaller than the second coding depth, the second coding depth is smaller than the third coding depth, and the third coding depth is larger than the fourth coding depth. The larger the coding depth, the smaller the block size used by the HEVC coding unit.
For a first type of scenes, namely, scenes with moving objects and still lenses, when a CU block is positioned outside an actual ROI, namely, a non-interesting region, a first coding depth is set to be 0, and a CU block size is set to be 64 x 64; when the CU block is at the actual ROI edge, the third coding depth is set to 3, and the CU block size is set to 8 × 8; when the CU block is inside the actual ROI, the fourth coding depth is set to 2 and the CU block size is set to 16 × 16.
For a second type of scene, namely object motion and lens rotation, when the CU block is outside the actual ROI, namely in a non-interesting region, the second coding depth is set to 1, and the CU block size is set to 32 × 32; when the CU block is at the actual ROI edge, the third coding depth is set to 3, and the CU block size is set to 8 × 8; when the CU block is inside the actual ROI, the fourth coding depth is set to 2 and the CU block size is set to 16 × 16.
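The depth assignment of this step can be summarized in a small lookup table; the sketch below is illustrative (the scene and position labels are not terms from this text):

```python
# (scene type, CU block position relative to the actual ROI) -> (coding depth, CU size)
CU_DEPTH_TABLE = {
    ("static_lens",   "outside"): (0, 64),
    ("static_lens",   "edge"):    (3, 8),
    ("static_lens",   "inside"):  (2, 16),
    ("rotating_lens", "outside"): (1, 32),
    ("rotating_lens", "edge"):    (3, 8),
    ("rotating_lens", "inside"):  (2, 16),
}
```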
And step 250, performing video coding on the current video frame by adopting an inter-frame prediction mode matched with the coding depth of each HEVC coding unit.
In this step, an inter-frame prediction mode matched with the depth of the coding unit is selected. For the first type of scene, i.e. object motion with a static lens: when a CU block is outside the actual ROI, i.e. in a non-ROI region, the prediction modes are SKIP or merge; when the CU block is at the edge of the actual ROI, the prediction modes are AMP and intra; when the CU block is inside the actual ROI, the prediction modes are Square and SMP.
Regarding the second type of scene, i.e. object motion and lens rotation, when a CU block is outside the actual ROI, i.e. in the non-ROI, the inter prediction modes may be SKIP, merge and Square; when the CU blocks are at the edge of the actual ROI, the inter prediction modes are AMP and intra; when a CU block is inside the actual ROI, the inter prediction modes are Square and SMP.
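Correspondingly, the candidate inter-frame prediction modes checked in each case can be restricted as follows; only the listed modes then enter the rate-distortion comparison (a sketch using the same illustrative labels as above):

```python
# (scene type, CU block position relative to the actual ROI) -> candidate prediction modes
CANDIDATE_MODES = {
    ("static_lens",   "outside"): ["SKIP", "merge"],
    ("static_lens",   "edge"):    ["AMP", "intra"],
    ("static_lens",   "inside"):  ["Square", "SMP"],
    ("rotating_lens", "outside"): ["SKIP", "merge", "Square"],
    ("rotating_lens", "edge"):    ["AMP", "intra"],
    ("rotating_lens", "inside"):  ["Square", "SMP"],
}
```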
According to the technical scheme of the embodiment, firstly, candidate interested pixel points included in a current video frame are identified through an interframe image difference algorithm, and a rough interested region is simply and effectively identified; then, a shooting scene corresponding to the current video frame is obtained, an identification strategy matched with the shooting scene is selected, the actual ROI included in the current video frame is identified according to the alternative interested pixel points, and conditions are provided for rapid video coding; and finally, calculating the coding depth of each HEVC coding unit according to the position relation between a plurality of HEVC coding units of the current video frame and the actual ROI, and performing video coding on the current video frame by adopting an inter-frame prediction mode matched with the coding depth of each HEVC coding unit. According to the method and the device, unnecessary inter-frame prediction modes are eliminated by determining the region of interest, the calculation amount of rate distortion cost is reduced, the coding speed is greatly improved on the premise of ensuring the video quality of the region of interest, and real-time coding in the video transmission process is realized.
Example three
Fig. 3 is a flowchart illustrating an implementation of an HEVC-based coding method according to a third embodiment of the present invention, and the present embodiment provides a specific implementation step of the HEVC-based coding method based on the foregoing embodiments. As shown in fig. 3, the method comprises the steps of:
identifying alternative interested pixel points included in the current video frame by utilizing an interframe image difference algorithm;
in this step, in the embodiment of the present invention, a previous video frame and a next video frame corresponding to a current video frame are first obtained, and then the candidate interested pixel point is identified in the current video frame according to a difference between a gray value of each pixel point in the current video frame and a gray value of a pixel point at a corresponding position in the previous video frame or the next video frame.
Acquiring a shooting scene corresponding to the current video frame; the shooting scenes corresponding to the current video frames are divided into two types, the first type is object motion and the lens is static, and the second type is object motion and the lens rotates.
In this step, the embodiment of the present invention first divides the current video frame into a plurality of first-level blocks by using a first preset pixel point as a unit; then, each primary block is divided into a plurality of secondary blocks by taking a second preset pixel point as a unit, the number of candidate interested pixels included in each secondary block is calculated, and the ratio of the number of the secondary blocks with the number of the candidate interested pixels smaller than a preset value to the total number of the secondary blocks is calculated.
And judging whether the ratio is greater than or equal to a preset threshold value, if so, determining that the shooting scene is an object motion scene and a lens is static, and otherwise, determining that the shooting scene is an object motion scene and a lens rotation scene.
Classifying the acquired scenes;
selecting an identification strategy matched with the shooting scene, and identifying an actual ROI included in the current video frame according to the alternative interested pixel points;
in this step, different recognition strategies are selected for different shooting scenes, if the shooting scene is a first-class scene, namely an object moves and a lens is static, the number of candidate interested pixel points included in each primary block is calculated, and the primary block in which the number of the candidate interested pixel points is greater than or equal to a preset threshold value is determined as a candidate ROI; then, dividing each secondary block into a plurality of tertiary blocks by taking a third preset pixel point as a unit, constructing a search region matched with the image of the candidate ROI according to the image position of each candidate ROI, and determining the candidate ROI corresponding to the search region as an independent candidate ROI if each tertiary block in the search region does not include any optional candidate interested pixel point; and finally, calculating the gravity center, the width and the height of each independent candidate ROI, calculating the sport intensity corresponding to the independent candidate ROI according to the number, the width and the height of the candidate interested pixels of the independent candidate ROI, and determining the actual ROI in the independent candidate ROI according to the sport intensity.
If the shooting scene is of the second type, i.e. object motion with a rotating lens, the embodiment of the invention first takes the gray values of the pixels at the position of the actual ROI in the previous video frame as reference values and determines a search range according to the position of the actual ROI in the previous video frame; motion estimation is then performed according to the reference values and the gray values of the pixels of the current video frame within the search range to obtain a plurality of matching points; finally, the center of gravity, width and height of the actual ROI of the current video frame are determined from the matching points, and the actual ROI in the current video frame is determined from its center of gravity, width and height.
Classifying a positional relationship between a CU block and the actual ROI;
the CU block is a large-sized coding block pre-divided for a current video frame, and the positional relationship between the CU block and an actual ROI includes three types: outside of the actual ROI, edge of the actual ROI, and inside of the actual ROI.
Calculating the depth of each CU block according to the position relation between the CU blocks of the current video frame and the actual ROI;
when the CU blocks are positioned outside the actual ROI (non-interesting region), the depth of the CU blocks is set to be 0, and the size of the CU blocks is set to be 64 x 64; when the CU block is at the actual ROI edge, the CU block depth is set to 3 and the size is set to 8 × 8; when a CU block is inside the actual ROI, the CU block depth is set to 2 and the size is set to 16 × 16.
For the second kind of scenes, namely object motion and lens rotation, when the CU blocks are positioned outside the actual ROI, namely a non-interesting region, the CU block depth is set to be 1, and the size is set to be 32 x 32; when the CU block is at the actual ROI edge, the CU block depth is set to 3 and the size is set to 8 × 8; when a CU block is inside the actual ROI, the CU block depth is set to 2 and the size is set to 16 × 16.
Performing video coding on the current video frame by adopting an inter-frame prediction mode matched with the depth of each CU block;
in the step, an inter prediction mode matched with the depth of the CU block is selected, and when the CU block is located outside an actual ROI (non-interested region), namely an SKIP (SKIP shot) or merge, the prediction mode is related to a first type of scene, namely object motion and a lens is static; when the CU blocks are at the edge of the actual ROI, the prediction modes are AMP and intra; when a CU block is inside the actual ROI, the prediction modes are Square and SMP.
Regarding the second type of scene, i.e. object motion and lens rotation, when a CU block is outside the actual ROI, i.e. in the non-ROI, the inter prediction modes may be SKIP, merge and Square; when the CU blocks are at the edge of the actual ROI, the inter prediction modes are AMP and intra; when a CU block is inside the actual ROI, the inter prediction modes are Square and SMP.
According to the technical scheme of the embodiment, firstly, candidate interested pixel points included in a current video frame are identified through an interframe image difference algorithm, and a rough interested region is simply and effectively identified; then acquiring a shooting scene corresponding to the current video frame, classifying the acquired scene, selecting an identification strategy matched with the shooting scene, and identifying an actual ROI included in the current video frame according to the alternative interested pixel points, thereby providing conditions for rapid video coding; and finally, according to the position relation between a plurality of CU blocks of the current video frame and the actual ROI, calculating the depth of each CU block, and performing video coding on the current video frame by adopting an inter-frame prediction mode matched with the depth of each CU block. The embodiment reduces the calculation amount for searching the optimal inter-frame prediction mode by determining the actual region of interest, greatly improves the coding speed on the premise of ensuring the video quality of the region of interest, and realizes real-time coding in the video transmission process.
Example four
Fig. 4 is a block diagram of an HEVC-based encoding apparatus according to a fourth embodiment of the present invention, where the apparatus includes: an actual ROI identification module 410, a coding depth calculation module 420 of an HEVC coding unit, and a current video frame coding module 430.
The actual ROI identifying module 410 is configured to obtain a current video frame to be encoded, and identify an actual ROI included in the current video frame; a coding depth calculation module 420 of an HEVC coding unit, configured to calculate a coding depth of each HEVC coding unit according to a position relationship between a plurality of HEVC coding units of the current video frame and the actual ROI; a current video frame coding module 430, configured to perform video coding on the current video frame by using an inter-frame prediction mode matched with a coding depth of each HEVC coding unit.
According to the technical scheme of the embodiment, the actual ROI included in the current video frame is determined through the actual ROI identification module, so that conditions are provided for subsequent video coding; then according to the position relation between the HEVC coding unit and the actual ROI, calculating coding depths corresponding to different positions of the current video frame through a coding depth calculation module of the HEVC coding unit; and finally, according to the coding depth corresponding to each coding unit, coding the current video frame by using a current video frame coding module and adopting an inter-frame prediction mode matched with the coding depth. The embodiment provides an effective HEVC-based coding device, solves the problem of huge calculation amount for searching the optimal inter-frame prediction mode by HEVC, reduces the complexity of the HEVC coding process, and meets the requirement of real-time video compression.
On the basis of the above embodiments, the actual ROI identifying module 410 may include:
the candidate interested pixel point identification module is used for identifying candidate interested pixel points included in the current video frame by utilizing an interframe image difference algorithm;
a shooting scene obtaining module, configured to obtain a shooting scene corresponding to the current video frame;
and the actual ROI determining module is used for selecting an identification strategy matched with the shooting scene and determining the actual ROI included in the current video frame according to the alternative interested pixel points.
Wherein, the alternative interested pixel point identification module may include:
a video frame acquisition module, configured to acquire a previous video frame and a next video frame corresponding to the current video frame;
and the candidate interested pixel point determining module is used for identifying the candidate interested pixel point in the current video frame according to the difference between the gray value of each pixel point in the current video frame and the gray value of the pixel point at the corresponding position in the previous video frame or the next video frame.
The candidate interested pixel point determining module may include:
a first gray value setting module, configured to determine whether an absolute value of a difference between gray values of pixels at corresponding positions of the current video frame and the previous video frame is greater than or equal to a preset threshold, if so, set a first gray value of each pixel of the current video frame to 255, and otherwise, set a first gray value of each pixel of the current video frame to 0;
a second gray value setting module, configured to determine whether an absolute value of a difference between gray values of pixels at corresponding positions of the current video frame and the next video frame is greater than or equal to a preset threshold, if so, set a second gray value of each pixel of the current video frame to 255, and otherwise, set the second gray value of each pixel of the current video frame to 0;
a current video frame gray level setting module, configured to set the gray value of each pixel in the current video frame as the AND operation result of the first gray value and the second gray value, so as to obtain a binary image corresponding to the current video frame;
and obtaining each pixel point with the gray value of 255 in the binary image as the candidate interested pixel point.
The photographing scene acquiring module may include:
the primary block dividing module is used for dividing the current video frame into a plurality of primary blocks by taking a first preset pixel point as a unit;
the secondary block dividing module is used for dividing each primary block into a plurality of secondary blocks by taking a second preset pixel point as a unit;
and the shot scene determining module is used for calculating the number of candidate interested pixel points in each secondary block, calculating the ratio of the number of the secondary blocks with the number of the candidate interested pixel points smaller than a preset value to the total number of the secondary blocks, judging whether the ratio is larger than or equal to a preset threshold value, if so, determining that the shot scene is a scene with object motion and a static lens, and otherwise, determining that the shot scene is a scene with object motion and a rotating lens.
The actual ROI determination module may include:
the candidate ROI determining module is applied to a scene with moving objects and still lenses and used for calculating the number of candidate interested pixel points in each primary block and determining the primary block of which the number of the candidate interested pixel points is larger than or equal to a preset threshold value as a candidate ROI;
the three-level block dividing module is applied to a scene with moving objects and static lens and used for dividing each two-level block into a plurality of three-level blocks by taking a third preset pixel point as a unit;
the searching region constructing module is applied to a scene with moving objects and still lens and is used for constructing a searching region matched with the images of the candidate ROIs according to the image positions of the candidate ROIs;
the independent candidate ROI determining module is applied to a scene with moving objects and still shots, and is used for determining the candidate ROI corresponding to the search area as the independent candidate ROI if each tertiary block in the search area does not comprise any optional candidate interested pixel point;
the motion intensity calculation module is applied to an object motion and lens static scene and used for calculating the gravity center, the width and the height of each independent candidate ROI, calculating the motion intensity corresponding to the independent candidate ROI according to the number, the width and the height of alternative interested pixels of the independent candidate ROI, and determining the actual ROI in the independent candidate ROI according to the motion intensity;
the searching range determining module is applied to a scene with an object moving and a lens rotating and is used for determining a searching range according to the position of the actual ROI in the previous video frame by taking the gray value of the pixel point at the position of the actual ROI in the previous video frame as a reference value;
the matching point acquisition module is applied to a scene with a rotating lens and used for object motion, and is used for carrying out motion estimation according to the reference value and the gray value of the pixel point of the current video frame in the search range to obtain a plurality of matching points;
and the actual ROI parameter calculation module is applied to a scene which is used for object motion and lens rotation, and is used for determining the gravity center, the width and the height of the actual ROI of the current video frame according to the plurality of matching points and determining the actual ROI in the current video frame according to the gravity center, the width and the height of the actual ROI.
The coding depth calculation module 420 of the HEVC coding unit may include:
the first coding depth or second coding depth setting module is used for setting HEVC coding units which are not positioned in any actual ROI into first coding depth or second coding depth according to the shooting scene, wherein an object moving scene and a lens static scene correspond to the first coding depth, and an object moving scene and a lens rotating scene correspond to the second coding depth;
a third coding depth setting module configured to set HEVC coding units that are partially inside the at least one actual ROI to a third coding depth;
a fourth coding depth setting module, configured to set all HEVC coding units located inside any one of the actual ROIs to a fourth coding depth;
the first coded depth is smaller than the second coded depth, the second coded depth is smaller than the third coded depth, and the third coded depth is larger than the fourth coded depth; the larger the coding depth, the smaller the cell block size used by the HEVC coding unit.
Example five
Fig. 5 is a schematic structural diagram of a computing apparatus according to a fifth embodiment of the present invention, as shown in fig. 5, the computing apparatus includes a processor 510, a memory 520, an input device 530, and an output device 540; the number of processors 510 in the computing device may be one or more, and one processor 510 is taken as an example in fig. 5; the processor 510, memory 520, input device 530, and output device 540 in the computing device may be connected by a bus or other means, such as by a bus in fig. 5.
The memory 520 may be used as a computer-readable storage medium for storing software programs, computer-executable programs, and modules, such as program instructions/modules corresponding to the HEVC-based coding method in the embodiments of the present invention (e.g., the actual ROI identification module 410, the coding depth calculation module 420, and the current video frame coding module 430 in the HEVC-based coding apparatus). The processor 510 executes various functional applications of the computing device and data processing by executing software programs, instructions, and modules stored in the memory 520, that is, implements the HEVC-based encoding method described above. That is, the program when executed by the processor implements:
acquiring a current video frame to be coded, and identifying an actual ROI included in the current video frame;
calculating the coding depth of each HEVC coding unit according to the position relation between a plurality of HEVC coding units of the current video frame and the actual ROI;
and performing video coding on the current video frame by adopting an inter-frame prediction mode matched with the coding depth of each HEVC coding unit.
The memory 520 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required for at least one function; the storage data area may store data created according to the use of the terminal, and the like. Further, the memory 520 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some examples, memory 520 may further include memory located remotely from processor 510, which may be connected to a computing device over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input device 530 may be used to receive input numeric or character information and generate key signal inputs related to user settings and function control of the computing device, and may include a keyboard and a mouse, etc. The output device 540 may include a display device such as a display screen.
Example six
An embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements an HEVC-based encoding method according to any embodiment of the present invention. Of course, embodiments of the present invention provide a computer-readable storage medium, which can perform related operations in the HEVC-based coding method provided in any embodiment of the present invention. That is, the program when executed by the processor implements:
acquiring a current video frame to be coded, and identifying an actual ROI included in the current video frame;
calculating the coding depth of each HEVC coding unit according to the position relation between a plurality of HEVC coding units of the current video frame and the actual ROI;
and performing video coding on the current video frame by adopting an inter-frame prediction mode matched with the coding depth of each HEVC coding unit.
From the above description of the embodiments, it is obvious for those skilled in the art that the present invention can be implemented by software and necessary general hardware, and certainly, can also be implemented by hardware, but the former is a better embodiment in many cases. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which can be stored in a computer-readable storage medium, such as a floppy disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a FLASH Memory (FLASH), a hard disk or an optical disk of a computer, and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device) to execute the methods according to the embodiments of the present invention.
It should be noted that, in the embodiment of the HEVC-based coding apparatus, the included units and modules are merely divided according to functional logic, but are not limited to the above division as long as the corresponding functions can be implemented; in addition, specific names of the functional units are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present invention.
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims (9)

1. An HEVC-based coding method, comprising:
acquiring a current video frame to be coded, and identifying an actual region of interest (ROI) included in the current video frame;
calculating the coding depth of each HEVC coding unit according to the position relation between a plurality of HEVC coding units of the current video frame and the actual ROI;
performing video coding on the current video frame by adopting an inter-frame prediction mode matched with the coding depth of each HEVC coding unit;
the acquiring a current video frame to be encoded and identifying an actual ROI included in the current video frame includes:
identifying alternative interested pixel points included in the current video frame by utilizing an interframe image difference algorithm;
acquiring a shooting scene corresponding to the current video frame;
selecting an identification strategy matched with the shooting scene, and determining an actual ROI included in the current video frame according to the alternative interested pixel points;
the acquiring of the shooting scene corresponding to the current video frame includes:
dividing the current video frame into a plurality of first-level blocks by taking a first preset pixel point as a unit;
dividing each primary block into a plurality of secondary blocks by taking a second preset pixel point as a unit;
calculating the number of candidate interested pixel points included in each secondary block, and calculating the ratio of the number of the secondary blocks with the number of the candidate interested pixel points smaller than a preset value to the total number of the secondary blocks;
judging whether the ratio is greater than or equal to a preset threshold value;
if so, determining that the shooting scene is an object motion scene and a lens static scene; otherwise, determining that the shooting scene is an object motion and lens rotation scene.
2. The method of claim 1, wherein identifying candidate pixels of interest included in the current video frame using an inter-frame image differencing algorithm comprises:
acquiring a previous video frame and a next video frame corresponding to the current video frame;
and identifying the candidate interested pixel points in the current video frame according to the difference between the gray value of each pixel point in the current video frame and the gray value of the pixel point at the corresponding position in the previous video frame or the next video frame.
3. The method of claim 2, wherein the identifying the candidate pixel points of interest in the current video frame according to the difference between the gray value of each pixel point in the current video frame and the gray value of the pixel point at the corresponding position in the previous video frame or the next video frame comprises:
for each pixel point, determining whether the absolute value of the difference between the gray values at the corresponding positions of the current video frame and the previous video frame is greater than or equal to a preset threshold; if so, setting a first gray value of the pixel point of the current video frame to 255, otherwise setting the first gray value to 0;
for each pixel point, determining whether the absolute value of the difference between the gray values at the corresponding positions of the current video frame and the next video frame is greater than or equal to a preset threshold; if so, setting a second gray value of the pixel point of the current video frame to 255, otherwise setting the second gray value to 0;
setting the gray value of each pixel point in the current video frame to the result of summing the first gray value and the second gray value, so as to obtain a binary image corresponding to the current video frame; and
taking each pixel point whose gray value in the binary image is 255 as a candidate pixel point of interest.
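The following non-claim sketch shows the frame-difference binarization of claims 2-3 in Python/NumPy. The difference threshold of 25 is a hypothetical value, and combining the two 0/255 masks with a clipped sum is only one possible reading of the "sum operation" in claim 3; the claims do not fix these choices.

```python
# Illustrative sketch (not part of the claims): derive candidate pixels of
# interest from the previous, current and next grayscale frames (claims 2-3).
import numpy as np

def candidate_pixels(prev_frame: np.ndarray,
                     cur_frame: np.ndarray,
                     next_frame: np.ndarray,
                     threshold: int = 25) -> np.ndarray:
    prev_f = prev_frame.astype(np.int16)
    cur_f = cur_frame.astype(np.int16)
    next_f = next_frame.astype(np.int16)

    # First mask: |current - previous| >= threshold -> 255, else 0.
    first = np.where(np.abs(cur_f - prev_f) >= threshold, 255, 0)
    # Second mask: |current - next| >= threshold -> 255, else 0.
    second = np.where(np.abs(cur_f - next_f) >= threshold, 255, 0)

    # Combine the two masks; clipping the sum keeps the image binary at 0/255.
    binary = np.clip(first + second, 0, 255).astype(np.uint8)
    # True where the pixel is a candidate pixel of interest.
    return binary == 255
```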
4. The method of claim 1, wherein the selecting an identification strategy matching the shooting scene and determining the actual ROI included in the current video frame according to the candidate pixel points of interest comprises:
if the shooting scene is determined to be an object-motion, static-lens scene, counting the number of candidate pixel points of interest included in each primary block, and determining each primary block whose count of candidate pixel points of interest is greater than or equal to a preset threshold as a candidate ROI;
dividing each secondary block into a plurality of tertiary blocks in units of a third preset number of pixels;
constructing, according to the image position of each candidate ROI, a search area matching that candidate ROI;
if no tertiary block in the search area includes any candidate pixel point of interest, determining the candidate ROI corresponding to the search area as an independent candidate ROI;
calculating the center of gravity, the width and the height of each independent candidate ROI, and calculating a motion intensity of the independent candidate ROI according to the number of candidate pixel points of interest in the independent candidate ROI together with its width and height; and
determining the actual ROI among the independent candidate ROIs according to the motion intensity.
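As an illustration only, the sketch below approximates the static-lens ROI selection of claim 4: primary blocks with enough candidate pixels of interest become candidate ROIs, and a motion intensity (candidate-pixel count over the region's width times height) decides which of them count as actual ROIs. The independence test against surrounding tertiary blocks is omitted, and the block size and both thresholds are hypothetical.

```python
# Illustrative sketch (not part of the claims): simplified ROI selection for
# the object-motion, static-lens scene of claim 4.
import numpy as np

def select_rois(candidate_mask: np.ndarray,
                primary_block: int = 64,
                count_threshold: int = 200,
                intensity_threshold: float = 0.1):
    h, w = candidate_mask.shape
    rois = []
    for y0 in range(0, h, primary_block):
        for x0 in range(0, w, primary_block):
            block = candidate_mask[y0:y0 + primary_block, x0:x0 + primary_block]
            count = int(np.count_nonzero(block))
            if count < count_threshold:
                continue  # not enough candidate pixels: not a candidate ROI
            ys, xs = np.nonzero(block)
            # Center of gravity, width and height of the candidate pixels in the block.
            cy, cx = float(ys.mean()) + y0, float(xs.mean()) + x0
            bw = int(xs.max() - xs.min() + 1)
            bh = int(ys.max() - ys.min() + 1)
            # Motion intensity: candidate-pixel count relative to the region area.
            intensity = count / float(bw * bh)
            if intensity >= intensity_threshold:
                rois.append({"center": (cx, cy), "width": bw, "height": bh,
                             "intensity": intensity})
    return rois
```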
5. The method of claim 1, wherein the selecting an identification strategy matching the shooting scene and determining the actual ROI included in the current video frame according to the candidate pixel points of interest comprises:
if the shooting scene is determined to be an object-motion, rotating-lens scene, taking the gray values of the pixel points at the position of the actual ROI in the previous video frame as reference values, and determining a search range according to the position of the actual ROI in the previous video frame;
performing motion estimation within the search range according to the reference values and the gray values of the pixel points of the current video frame to obtain a plurality of matching points;
determining the center of gravity, the width and the height of the actual ROI of the current video frame according to the plurality of matching points; and
determining the actual ROI in the current video frame according to the center of gravity, the width and the height of the actual ROI.
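The rotating-lens branch of claim 5 amounts to tracking the previous frame's ROI by motion estimation. The non-claim sketch below uses a plain full-search block match with a sum-of-absolute-differences cost inside a window around the previous ROI position; the window radius and the SAD criterion are assumptions, since the claim does not fix a particular matching method.

```python
# Illustrative sketch (not part of the claims): track the actual ROI of the
# previous frame into the current frame by block matching (claim 5).
import numpy as np

def track_roi(prev_frame: np.ndarray, cur_frame: np.ndarray,
              prev_roi: tuple, search_radius: int = 16) -> tuple:
    x, y, w, h = prev_roi                        # previous ROI as (x, y, width, height)
    ref = prev_frame[y:y + h, x:x + w].astype(np.int32)  # reference gray values
    best_sad, best_pos = None, (x, y)
    H, W = cur_frame.shape
    for dy in range(-search_radius, search_radius + 1):
        for dx in range(-search_radius, search_radius + 1):
            nx, ny = x + dx, y + dy
            if nx < 0 or ny < 0 or nx + w > W or ny + h > H:
                continue  # candidate position falls outside the frame
            cand = cur_frame[ny:ny + h, nx:nx + w].astype(np.int32)
            sad = int(np.abs(cand - ref).sum())  # sum of absolute differences
            if best_sad is None or sad < best_sad:
                best_sad, best_pos = sad, (nx, ny)
    bx, by = best_pos
    return (bx, by, w, h)                        # actual ROI of the current frame
```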
6. The method of claim 1, wherein the calculating the coding depth of each HEVC coding unit according to the positional relationship between the plurality of HEVC coding units of the current video frame and the actual ROI comprises:
according to the shooting scene, setting HEVC coding units that are not located in any actual ROI to a first coding depth or a second coding depth, wherein the object-motion, static-lens scene corresponds to the first coding depth and the object-motion, rotating-lens scene corresponds to the second coding depth;
setting HEVC coding units that are partially located inside at least one actual ROI to a third coding depth; and
setting HEVC coding units that are located entirely within any actual ROI to a fourth coding depth;
wherein the first coding depth is smaller than the second coding depth, the second coding depth is smaller than the third coding depth, and the third coding depth is larger than the fourth coding depth; the larger the coding depth, the smaller the block size used by the HEVC coding unit.
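A minimal, non-claim sketch of the depth assignment in claim 6, assuming rectangular ROIs and coding units and hypothetical depth values 0-3. It only reproduces the ordering stated in the claim (first < second < third, third > fourth); the concrete depths an encoder would use are not specified by the patent.

```python
# Illustrative sketch (not part of the claims): assign a coding depth to a
# coding unit from its overlap with the actual ROIs (claim 6). A larger depth
# allows the encoder to split the unit into smaller blocks.
def coding_depth(cu_rect: tuple, rois: list, static_lens: bool) -> int:
    cx, cy, cw, ch = cu_rect
    outside_depth = 0 if static_lens else 1      # first vs. second coding depth
    partial_depth, inside_depth = 3, 2           # third and fourth coding depths

    def overlap(roi):
        # Area of intersection between the coding unit and one ROI rectangle.
        rx, ry, rw, rh = roi
        ox = max(0, min(cx + cw, rx + rw) - max(cx, rx))
        oy = max(0, min(cy + ch, ry + rh) - max(cy, ry))
        return ox * oy

    areas = [overlap(r) for r in rois]
    if all(a == 0 for a in areas):
        return outside_depth                     # CU lies outside every actual ROI
    if any(0 < a < cw * ch for a in areas):
        return partial_depth                     # CU straddles an ROI boundary
    return inside_depth                          # CU lies entirely inside an ROI
```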
7. An apparatus for HEVC-based coding, comprising:
an actual ROI identification module, configured to acquire a current video frame to be coded and identify an actual ROI included in the current video frame;
a coding depth calculation module, configured to calculate a coding depth of each HEVC coding unit according to a positional relationship between a plurality of HEVC coding units of the current video frame and the actual ROI; and
a current video frame coding module, configured to perform video coding on the current video frame by using an inter-frame prediction mode matching the coding depth of each HEVC coding unit;
wherein the acquiring a current video frame to be coded and identifying an actual ROI included in the current video frame comprises:
identifying candidate pixel points of interest included in the current video frame by using an inter-frame image difference algorithm;
acquiring a shooting scene corresponding to the current video frame;
selecting an identification strategy matching the shooting scene, and determining the actual ROI included in the current video frame according to the candidate pixel points of interest;
wherein the acquiring a shooting scene corresponding to the current video frame comprises:
dividing the current video frame into a plurality of primary blocks in units of a first preset number of pixels;
dividing each primary block into a plurality of secondary blocks in units of a second preset number of pixels;
counting the number of candidate pixel points of interest included in each secondary block, and calculating a ratio of the number of secondary blocks whose count of candidate pixel points of interest is smaller than a preset value to the total number of secondary blocks;
determining whether the ratio is greater than or equal to a preset threshold;
if so, determining that the shooting scene is an object-motion, static-lens scene; otherwise, determining that the shooting scene is an object-motion, rotating-lens scene.
8. A computing device, comprising:
one or more processors; and
storage means for storing one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the HEVC-based coding method as recited in any one of claims 1-6.
9. A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the HEVC-based coding method as claimed in any one of claims 1-6.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910837345.5A CN110519597B (en) 2019-09-05 2019-09-05 HEVC-based encoding method and device, computing equipment and medium


Publications (2)

Publication Number Publication Date
CN110519597A (en) 2019-11-29
CN110519597B (en) 2022-05-10

Family

ID=68631103

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910837345.5A Active CN110519597B (en) 2019-09-05 2019-09-05 HEVC-based encoding method and device, computing equipment and medium

Country Status (1)

Country Link
CN (1) CN110519597B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113163199B (en) * 2021-05-10 2023-06-30 浙江裕瀚科技有限公司 H265-based video rapid prediction method, rapid coding method and system


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104463910A (en) * 2014-12-08 2015-03-25 中国人民解放军国防科学技术大学 High-speed motion target extraction method based on motion vector
CN109063659A (en) * 2018-08-08 2018-12-21 北京佳讯飞鸿电气股份有限公司 The detection and tracking and system of moving target
CN109379594A (en) * 2018-10-31 2019-02-22 北京佳讯飞鸿电气股份有限公司 Video coding compression method, device, equipment and medium
CN109889838A (en) * 2018-12-29 2019-06-14 北京佳讯飞鸿电气股份有限公司 A kind of HEVC fast encoding method based on ROI region


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant