CN113660498A - Inter-frame image universal coding method and system based on significance detection - Google Patents

Inter-frame image universal coding method and system based on significance detection Download PDF

Info

Publication number
CN113660498A
Authority
CN
China
Prior art keywords
coding tree
tree unit
preset
moving target
coding
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111218449.1A
Other languages
Chinese (zh)
Other versions
CN113660498B (en)
Inventor
蒋先涛
蔡佩华
张纪庄
郭咏梅
郭咏阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ningbo Kangda kaineng Medical Technology Co.,Ltd.
Original Assignee
Kangda Intercontinental Medical Devices Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Kangda Intercontinental Medical Devices Co ltd filed Critical Kangda Intercontinental Medical Devices Co ltd
Priority to CN202111218449.1A priority Critical patent/CN113660498B/en
Publication of CN113660498A publication Critical patent/CN113660498A/en
Application granted granted Critical
Publication of CN113660498B publication Critical patent/CN113660498B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/90Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using coding techniques not provided for in groups H04N19/10-H04N19/85, e.g. fractals
    • H04N19/96Tree coding, e.g. quad-tree coding
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/124Quantisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The invention discloses a general coding method for inter-frame images based on saliency detection, relating to the technical field of image processing, which mainly comprises the following steps: extracting, through a saliency detector, a set of moving targets within a preset range threshold from the parameterized current frame image; screening out, from the coding tree units, a set of coding tree units that fully cover, or are fully covered by, any moving target, according to the overlapping relation between each moving target in the set and the coding tree unit where it lies; acquiring the relative overlap of each coding tree unit in the set with its corresponding moving target; extracting the coding tree units whose relative overlap exceeds the preset overlap as salient regions; and correcting the preset quantization parameter with a preset correction value according to the extracted salient regions, and coding the current frame image with the corrected preset quantization parameter. During video coding, more data resources are thus allocated to the salient regions, and the data resources expended on non-salient regions are reduced.

Description

Inter-frame image universal coding method and system based on significance detection
Technical Field
The invention relates to the technical field of image processing, and in particular to a method and a system for universal coding of inter-frame images based on significance detection.
Background
In conventional video coding, each inter-frame image is usually coded with a constant Quantization Parameter (QP), which depends on the basic QP selected by the user. When a human is the final observer, a block-wise QP Adaptation scheme (QPA) based on the human visual system can improve subjective coding quality by transmitting Coding Tree Units (CTUs) that cover insignificant regions at lower visual quality. The QPA function is included in the latest video coding standard, Versatile Video Coding (VVC), and its reference software, the VVC Test Model (VTM).
H.266/VVC (Versatile Video Coding) is a standard jointly developed by VCEG and MPEG, and is currently the latest generation of international video coding standard. Today, more and more multimedia data traffic is consumed not by human observers but by computer vision algorithms that analyze the data to solve different tasks, for example in intelligent machine applications in the fields of surveillance or autonomous driving. MPEG has therefore set up a dedicated group on this so-called Video Coding for Machines (VCM) task, to optimize video codecs for machine-oriented communication scenarios.
As the amount of data to be processed and the accompanying timeliness requirements grow, so does the demand for video compression for machines, and selecting an appropriate algorithm to detect the salient regions before encoding becomes crucial. On the one hand, it must precisely find the important areas containing the relevant objects; on the other hand, the saliency detector must be fast enough for real-time applications. This is the direction of the present invention.
Disclosure of Invention
In order to achieve high-timeliness, high-quality machine processing of video frame images during video coding, the invention provides an inter-frame image universal coding method based on significance detection, which comprises the following steps:
S1: extracting, through a saliency detector, a set of moving targets within a preset range threshold from the parameterized current frame image;
S2: screening out, from the coding tree units, a set of coding tree units that fully cover, or are fully covered by, any moving target, according to the overlapping relation between each moving target in the set and the coding tree unit where it lies;
S3: acquiring the relative overlap of each coding tree unit in the set with its corresponding moving target;
S4: extracting the coding tree units whose relative overlap exceeds the preset overlap as salient regions;
S5: correcting the preset quantization parameter with a preset correction value according to the extracted salient regions, and coding the current frame image with the corrected preset quantization parameter.
Furthermore, the saliency detector is a single-target detection network, and the moving target extracted by the single-target detection network contains classification information, area information and a bounding box.
Further, the step S1 is followed by the step:
S11: scaling the width of the bounding box of the moving target to a preset size.
Further, the preset range threshold comprises a non-maximum suppression threshold and an intersection threshold, and the parameterized extraction range of the moving target is:
the recognition score of the moving target is above the non-maximum suppression threshold, and the overlapping portion of the moving target is greater than the intersection threshold.
Further, in the step S3, the relative overlap may be obtained by a first formula, which is expressed as:

O_rel(i,k) = A_overlap(i,k) / min(A_CTU(k), A_det(i))

in the formula, CTU represents a coding tree unit, det represents a moving target, overlap represents intersection, i is the number of the moving target, k is the number of the coding tree unit, O_rel(i,k) is the relative overlap of the moving target numbered i with the coding tree unit numbered k, A_CTU(k) is the area of the coding tree unit numbered k, A_det(i) is the area of the moving target numbered i, A_overlap(i,k) is the intersection region between A_CTU(k) and A_det(i), and min() takes the minimum.
Further, in the step S4, the determination of the salient region may be expressed by a second formula, which is expressed as:

S_k = 1, if max_i O_rel(i,k) > O_thr;  S_k = 0, otherwise

in the formula, S_k is the region type of the coding tree unit numbered k, with 1 denoting a salient region and 0 a non-salient region, max_i O_rel(i,k) is the maximum relative overlap of the coding tree unit numbered k over all moving targets i, and O_thr is the preset overlap.
Further, in the step S5, the correction of the preset quantization parameter by the preset correction value according to the extracted salient region may be expressed by a third formula, which is expressed as:

QP_k = QP, if S_k = 1;  QP_k = QP + ΔQP, if S_k = 0

in the formula, QP is the preset quantization parameter, ΔQP is the preset correction value, and QP_k is the corrected preset quantization parameter of the coding tree unit numbered k.
The invention also provides a general coding system for inter-frame images based on significance detection, which comprises:
the saliency detector, used for extracting a set of moving targets within a preset range threshold from the parameterized current frame image;
the coding tree screening unit, used for screening out, from the coding tree units, a set of coding tree units that fully cover, or are fully covered by, any moving target, according to the overlapping relation between each moving target in the set and the coding tree unit where it lies;
the overlap calculating unit, used for acquiring the relative overlap of each coding tree unit in the set with its corresponding moving target;
the region extraction unit, used for extracting the coding tree units whose relative overlap exceeds the preset overlap as salient regions;
and the coding unit, used for correcting the preset quantization parameter with the preset correction value according to the extracted salient regions and coding the current frame image with the corrected preset quantization parameter.
Furthermore, the saliency detector is a single-target detection network, the moving target extracted by the single-target detection network contains classification information, area information and a bounding box, and the saliency detector further comprises a box scaling unit for scaling the width of the bounding box of the moving target to a preset size.
Further, the preset range threshold comprises a non-maximum suppression threshold and an intersection threshold, and the parameterized extraction range of the moving target is:
the recognition score of the moving target is above the non-maximum suppression threshold, and the overlapping portion of the moving target is greater than the intersection threshold.
Compared with the prior art, the invention has at least the following beneficial effects:
(1) the inter-frame image general coding method and system based on saliency detection select a single-target detection network as the saliency detector, which adapts better to parameterized data processing and reduces the consumption of data resources during processing;
(2) through the screening of fully covering coding tree units and the calculation of the corresponding overlap, the coding tree units that best represent the moving target are screened out and their quantization parameters adjusted under the preset regulation, so that more data resources are allocated to the salient regions while the allocation of data resources to non-salient regions is reduced, improving the utilization of data resources;
(3) using the single-target detection network effectively improves the timeliness of moving target detection.
Drawings
FIG. 1 is a diagram of the steps of the inter-frame image universal coding method based on saliency detection;
FIG. 2 is a system block diagram of a general inter-frame image coding system based on saliency detection;
fig. 3 is a schematic diagram of window extraction of a sliding window.
Detailed Description
The following are specific embodiments of the present invention and are further described with reference to the drawings, but the present invention is not limited to these embodiments.
Example one
In order to meet the growing timeliness and precision requirements of video coding in today's machine communication scenarios, as shown in fig. 1, the invention provides a general inter-frame image coding method based on saliency detection, which comprises the following steps:
S1: extracting, through a saliency detector, a set of moving targets within a preset range threshold from the parameterized current frame image;
S2: screening out, from the coding tree units, a set of coding tree units that fully cover, or are fully covered by, any moving target, according to the overlapping relation between each moving target in the set and the coding tree unit where it lies;
S3: acquiring the relative overlap of each coding tree unit in the set with its corresponding moving target;
S4: extracting the coding tree units whose relative overlap exceeds the preset overlap as salient regions;
S5: correcting the preset quantization parameter with a preset correction value according to the extracted salient regions, and coding the current frame image with the corrected preset quantization parameter.
Based on the latest video coding standard, Versatile Video Coding (VVC), the invention proposes the above coding steps for video coding in machine communication scenarios. In order to acquire the initial salient regions more effectively before encoding, the invention selects a single-target detection network (YOLO) to extract the moving targets.
The earlier R-CNN, Fast R-CNN and Faster R-CNN networks roughly divide detection into two sub-problems solved in sequence: determining the object's class and determining its position (obtained through regression). The two parts are causally linked, so the second can only be solved once the first is complete. The single-target detection network differs from these earlier networks in that it treats object detection directly as a regression problem: after a single inference pass over the input image, the region information of all objects in the image, their categories and the corresponding confidence probabilities are obtained, giving it an inherent speed advantage over the earlier networks.
Meanwhile, the moving targets are extracted from the parameterized inter-frame images, and YOLO is well suited to processing parameterized data; since it is a single-step method with strong real-time performance, the moving targets can be extracted with less data resource (bit) consumption during processing.
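For illustration, such a one-pass detector can be queried as in the sketch below; the public ultralytics/yolov5 hub model and the confidence cutoff are used purely as stand-ins, since the patent does not name a specific network, weights or score threshold:

import torch

# Stand-in single-shot detector: one forward pass returns region
# information, class and confidence for every object in the frame.
model = torch.hub.load('ultralytics/yolov5', 'yolov5s', pretrained=True)

def detect_moving_targets(frame, conf_thr=0.25):
    results = model(frame)  # single inference pass over the image
    # Each row of results.xyxy[0] is [x0, y0, x1, y1, confidence, class].
    return [row for row in results.xyxy[0].tolist() if row[4] >= conf_thr]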
However, when moving targets are extracted through the single-target detection network, each extracted target carries a bounding box, and the box itself has a certain width. If the box is too wide, it occupies part of the region when the subsequent overlap is calculated and may bias the result (the box width is constant while the area of the moving target is not, so moving targets of different areas matched with proportionally identical coding tree units yield different relative overlap values), leading to errors in determining the salient region. Therefore, the step S1 is followed by the step:
S11: the width of the bounding box of the moving target is scaled to a preset size (the preset size is set manually according to the precision requirement, so its detailed value is not limited here).
Scaling the extracted bounding box of the moving target ensures that the subsequent calculation of the relative overlap is no longer affected by the box, improving the overall accuracy of salient-region extraction.
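The text leaves the exact geometry of this scaling open; one plausible reading, sketched here purely as an assumption, is to replace the detector's variable border width with a preset constant by insetting the rectangle:

def normalize_box_width(box, border, preset=1.0):
    # box: (x0, y0, x1, y1) rectangle. Assumed interpretation of step S11:
    # shrink the box by the difference between its drawn border width and
    # the preset size, so the border strip biases the later overlap
    # calculation equally for every moving target.
    inset = (border - preset) / 2.0
    x0, y0, x1, y1 = box
    return (x0 + inset, y0 + inset, x1 - inset, y1 - inset)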
After the bounding box is refined, the invention selects the objects most likely to be moving targets according to the non-maximum suppression threshold and the intersection threshold. Non-maximum suppression (NMS), as the name implies, suppresses elements that are not the maximum, and can be understood as a local maximum search. The neighborhood it searches over has two variable parameters: its dimensionality and its size. For example, in the pedestrian detection scenario illustrated in fig. 3, after features are extracted from each sliding window and passed through the classifier, every window receives a score; but sliding windows produce many windows that contain, or mostly intersect, other windows. NMS is then used to keep the window with the highest score in each area (i.e., the one most likely to be judged a pedestrian) and suppress the low-scoring windows. On the basis of NMS, the intersection threshold refines the screened windows once more: since the windows extracted during sliding overlap one another, a window closer to the true moving target overlaps other windows more and more frequently, so the probability that such a region is the moving target is higher. Windows whose overlap exceeds the intersection threshold can therefore be extracted as the final moving target set.
In this embodiment, to better extract the moving targets, multiple comparison experiments were performed: the non-maximum suppression threshold is set to 0.1 and the intersection threshold to 0.5. This setting ensures that when windows overlap each other too much, the window with the higher confidence is preferentially taken as the final extraction result. Compared with default thresholds without specifically set parameters, these two settings markedly improve the recall rate of the coding tree units.
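A minimal sketch of this two-stage screening with the thresholds of this embodiment (0.1 and 0.5) follows; ordinary intersection-over-union is used as the overlap measure, and the second stage's "overlapped other windows strongly" test is likewise one reading of the embodiment, since the text does not fix either choice:

def area(r):
    # Area of an axis-aligned (x0, y0, x1, y1) rectangle.
    return max(0.0, r[2] - r[0]) * max(0.0, r[3] - r[1])

def inter_area(a, b):
    # Intersection area of two rectangles.
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(0.0, w) * max(0.0, h)

def iou(a, b):
    # Intersection-over-union, the assumed overlap measure.
    inter = inter_area(a, b)
    return inter / (area(a) + area(b) - inter)

def screen_windows(windows, scores, nms_thr=0.1, inter_thr=0.5):
    # Stage 1 (NMS): keep the highest-scoring window in each neighborhood
    # and suppress windows overlapping a kept one beyond nms_thr.
    order = sorted(range(len(windows)), key=lambda i: -scores[i])
    kept = []
    for i in order:
        if all(iou(windows[i], windows[k]) <= nms_thr for k in kept):
            kept.append(i)
    # Stage 2 (intersection threshold): retain kept windows that overlapped
    # other windows strongly, since frequent overlap marks a likely target.
    final = []
    for i in kept:
        others = (iou(windows[i], windows[j])
                  for j in range(len(windows)) if j != i)
        if max(others, default=0.0) > inter_thr:
            final.append(windows[i])
    return final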
Based on the extracted moving target set, what matters next is how to extract suitable coding tree units as salient regions from the many coding tree units of the current frame image. The overlapping region of a coding tree unit and a moving target, under the overlap of the current coding tree unit and the moving target, can be expressed by the following formula:

A_overlap(i,k) = A_CTU(k) ∩ A_det(i)

where CTU denotes a coding tree unit, det denotes a moving target, overlap denotes intersection, i is the number of the moving target, k is the number of the coding tree unit, A_CTU(k) is the region of the coding tree unit numbered k, A_det(i) is the region of the moving target numbered i, and A_overlap(i,k) is the intersection region between them.
In order to find a suitable threshold when defining a coding tree unit as a salient region, two cases can be considered salient. In the first case, the moving target region extracted by the single-target detection network is smaller than the region of the coding tree unit, so the overlapping region A_overlap(i,k) cannot be larger than the moving target's area A_det(i). In the second case, the moving target region is larger than the region of the coding tree unit, so the overlapping region A_overlap(i,k) cannot be larger than the coding tree unit's area A_CTU(k). Based on this, the invention proposes the concept of relative overlap to determine the salient region, where the relative overlap can be represented by the first formula:

O_rel(i,k) = A_overlap(i,k) / min(A_CTU(k), A_det(i))

in the formula, O_rel(i,k) is the relative overlap of the moving target numbered i with the coding tree unit numbered k, and min() takes the minimum.
For the above two cases, when the detected moving target lies completely inside the coding tree unit, or the coding tree unit is completely covered by the moving target, and the relative overlap is greater than the preset overlap (which is set manually and can be chosen according to the precision requirement), the unit can be considered a salient region; when the moving target and the coding tree unit do not overlap completely in this way, it is considered a non-salient region. This can be represented by the second formula:

S_k = 1, if max_i O_rel(i,k) > O_thr;  S_k = 0, otherwise

in the formula, S_k is the region type of the coding tree unit numbered k, with 1 denoting a salient region and 0 a non-salient region, max_i O_rel(i,k) is the maximum relative overlap of the coding tree unit numbered k over all moving targets i, and O_thr is the preset overlap.
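As a worked example with illustrative numbers (not taken from the patent): for a 128x128 coding tree unit and a 64x64 moving target lying entirely inside it, A_overlap = 64*64 = 4096 and min(A_CTU, A_det) = min(16384, 4096) = 4096, so O_rel = 4096/4096 = 1 and the unit is salient for any preset overlap below 1. If the same target straddles the unit's border with only half its area inside, A_overlap = 2048 and O_rel = 0.5, which counts as salient only if the preset overlap is below 0.5.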
Then, according to the final classification S_k, the quantization parameter of each coding tree unit is adjusted under the preset regulation, which can be specifically expressed by the third formula:

QP_k = QP, if S_k = 1;  QP_k = QP + ΔQP, if S_k = 0

in the formula, QP is the preset quantization parameter, ΔQP is the preset correction value, and QP_k is the corrected preset quantization parameter of the coding tree unit numbered k. Each coding tree unit can then be coded according to its corrected preset quantization parameter.
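Reusing the area and inter_area helpers from the screening sketch above, the first, second and third formulas condense into the following illustrative sketch; the sign convention, leaving salient coding tree units at the preset QP and raising the QP of non-salient ones, is an assumption consistent with the stated aim of spending fewer bits outside salient regions:

def relative_overlap(ctu, det):
    # First formula: intersection normalized by the smaller of the two
    # areas, so full containment in either direction yields exactly 1.0.
    return inter_area(ctu, det) / min(area(ctu), area(det))

def corrected_qp(ctu, targets, qp, delta_qp, o_thr):
    # Second formula: S_k = 1 iff the maximum relative overlap of coding
    # tree unit k over all moving targets i exceeds the preset overlap.
    best = max((relative_overlap(ctu, t) for t in targets), default=0.0)
    s_k = 1 if best > o_thr else 0
    # Third formula: QP_k = QP for salient units, QP + delta_qp otherwise
    # (assumed direction of the preset correction value).
    return qp if s_k == 1 else qp + delta_qp

Normalizing by the minimum rather than by the union (as plain IoU would) is what makes both cases above, a small target fully inside a large coding tree unit and a large target fully covering a small one, count as complete overlap.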
The final purpose of the invention is to reduce the data resources (bits) allocated to non-salient areas. The whole process screens the moving targets and selects the optimal coding tree units, and through this link-by-link control of data resource consumption, the overall data resource consumption of inter-frame image coding is greatly reduced.
Example two
In order to better understand the technical content of the present invention, this embodiment explains the invention through its system structure. As shown in fig. 2, a general inter-frame image coding system based on saliency detection comprises:
the saliency detector, used for extracting a set of moving targets within a preset range threshold from the parameterized current frame image;
the coding tree screening unit, used for screening out, from the coding tree units, a set of coding tree units that fully cover, or are fully covered by, any moving target, according to the overlapping relation between each moving target in the set and the coding tree unit where it lies;
the overlap calculating unit, used for acquiring the relative overlap of each coding tree unit in the set with its corresponding moving target;
the region extraction unit, used for extracting the coding tree units whose relative overlap exceeds the preset overlap as salient regions;
and the coding unit, used for correcting the preset quantization parameter with the preset correction value according to the extracted salient regions and coding the current frame image with the corrected preset quantization parameter.
The saliency detector is a single-target detection network; the moving target extracted by the single-target detection network contains classification information, area information and a bounding box, and the saliency detector also comprises a box scaling unit for scaling the width of the bounding box of the moving target to a preset size.
Meanwhile, the preset range threshold comprises a non-maximum suppression threshold and an intersection threshold, and the parameterized extraction range of the moving target is:
the recognition score of the moving target is above the non-maximum suppression threshold, and the overlapping portion of the moving target is greater than the intersection threshold.
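Putting the five units together, the system of this embodiment can be skeletonized as below, reusing the relative_overlap and corrected_qp sketches from Example One; the detector and encoder are injected as callables because the patent ties the system to no particular implementation (all names here are illustrative):

class SaliencyCodingSystem:
    # Hypothetical skeleton mirroring the five claimed units.

    def __init__(self, detector, encoder, qp, delta_qp, o_thr):
        self.detector = detector   # saliency detector (single-target network)
        self.encoder = encoder     # coding backend, e.g. a VVC encoder wrapper
        self.qp = qp               # preset quantization parameter
        self.delta_qp = delta_qp   # preset correction value
        self.o_thr = o_thr         # preset overlap

    def code_frame(self, frame, ctus):
        targets = self.detector(frame)          # saliency detector
        qp_map = []
        for ctu in ctus:
            # Coding tree screening unit: keep units that fully cover, or
            # are fully covered by, a moving target (relative overlap 1.0).
            screened = any(relative_overlap(ctu, t) >= 1.0 for t in targets)
            # Overlap calculating, region extraction and coding units.
            qp_map.append(corrected_qp(ctu, targets, self.qp,
                                       self.delta_qp, self.o_thr)
                          if screened else self.qp + self.delta_qp)
        return self.encoder(frame, qp_map)      # coding unit

Injecting the detector and the encoder keeps the skeleton aligned with the claimed decomposition: each unit can be swapped independently, for example replacing the detection network without touching the QP logic.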
In summary, the inter-frame image coding method and system based on saliency detection of the present invention select a single-target detection network as the saliency detector, which adapts better to parameterized data processing and reduces the consumption of data resources during processing.
Through the screening of fully covering coding tree units and the calculation of the corresponding overlap, the coding tree units that best represent the moving target are screened out and their quantization parameters adjusted under the preset regulation, so that more data resources are allocated to the salient regions while the allocation of data resources to non-salient regions is reduced, improving the utilization of data resources. Meanwhile, using the single-target detection network effectively improves the timeliness of moving target detection.
It should be noted that all the directional indicators (such as up, down, left, right, front, and rear … …) in the embodiment of the present invention are only used to explain the relative position relationship between the components, the movement situation, etc. in a specific posture (as shown in the drawing), and if the specific posture is changed, the directional indicator is changed accordingly.
Moreover, descriptions of the present invention as relating to "first," "second," "a," etc. are for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.
In the present invention, unless otherwise expressly stated or limited, the terms "connected," "secured," and the like are to be construed broadly: for example, "secured" may be a fixed connection, a removable connection, or an integral part; the connection may be mechanical or electrical; it may be direct, or indirect through an intervening medium, or internal between two elements, or any other suitable relationship. The specific meanings of the above terms in the present invention can be understood by those skilled in the art according to the specific situation.
In addition, the technical solutions in the embodiments of the present invention may be combined with each other, but it must be based on the realization of those skilled in the art, and when the technical solutions are contradictory or cannot be realized, such a combination of technical solutions should not be considered to exist, and is not within the protection scope of the present invention.

Claims (10)

1. A method for universal coding of inter-frame images based on saliency detection, characterized by comprising the following steps:
S1: extracting, through a saliency detector, a set of moving targets within a preset range threshold from the parameterized current frame image;
S2: screening out, from the coding tree units, a set of coding tree units that fully cover, or are fully covered by, any moving target, according to the overlapping relation between each moving target in the set and the coding tree unit where it lies;
S3: acquiring the relative overlap of each coding tree unit in the set with its corresponding moving target;
S4: extracting the coding tree units whose relative overlap exceeds the preset overlap as salient regions;
S5: correcting the preset quantization parameter with a preset correction value according to the extracted salient regions, and coding the current frame image with the corrected preset quantization parameter.
2. The method as claimed in claim 1, wherein the saliency detector is a single-object detection network, and the moving object extracted by the single-object detection network contains classification information, region information and a bounding box.
3. The method for universal coding of inter-frame images based on saliency detection as claimed in claim 2, wherein the step S1 is followed by the step:
S11: scaling the width of the bounding box of the moving target to a preset size.
4. The method as claimed in claim 1, wherein the preset range threshold comprises a non-maximum suppression threshold and an intersection threshold, and the parameterized extraction range of the moving target is:
the recognition score of the moving target is above the non-maximum suppression threshold, and the overlapping portion of the moving target is greater than the intersection threshold.
5. The method for universal coding of inter-frame images based on saliency detection as claimed in claim 2, wherein in the step S3 the relative overlap is obtained by a first formula, expressed as:

O_rel(i,k) = A_overlap(i,k) / min(A_CTU(k), A_det(i))

in the formula, CTU represents a coding tree unit, det represents a moving target, overlap represents intersection, i is the number of the moving target, k is the number of the coding tree unit, O_rel(i,k) is the relative overlap of the moving target numbered i with the coding tree unit numbered k, A_CTU(k) is the area of the coding tree unit numbered k, A_det(i) is the area of the moving target numbered i, A_overlap(i,k) is the intersection region between A_CTU(k) and A_det(i), and min() takes the minimum.
6. The method as claimed in claim 5, wherein in the step S4 the determination of the salient region is expressed by a second formula:

S_k = 1, if max_i O_rel(i,k) > O_thr;  S_k = 0, otherwise

in the formula, S_k is the region type of the coding tree unit numbered k, with 1 denoting a salient region and 0 a non-salient region, max_i O_rel(i,k) is the maximum relative overlap of the coding tree unit numbered k over all moving targets i, and O_thr is the preset overlap.
7. The method as claimed in claim 6, wherein in the step S5 the correction of the preset quantization parameter with the preset correction value according to the extracted salient region is expressed by a third formula:

QP_k = QP, if S_k = 1;  QP_k = QP + ΔQP, if S_k = 0

in the formula, QP is the preset quantization parameter, ΔQP is the preset correction value, and QP_k is the corrected preset quantization parameter of the coding tree unit numbered k.
8. A system for universal coding of inter-frame images based on saliency detection, comprising:
the saliency detector, used for extracting a set of moving targets within a preset range threshold from the parameterized current frame image;
the coding tree screening unit, used for screening out, from the coding tree units, a set of coding tree units that fully cover, or are fully covered by, any moving target, according to the overlapping relation between each moving target in the set and the coding tree unit where it lies;
the overlap calculating unit, used for acquiring the relative overlap of each coding tree unit in the set with its corresponding moving target;
the region extraction unit, used for extracting the coding tree units whose relative overlap exceeds the preset overlap as salient regions;
and the coding unit, used for correcting the preset quantization parameter with the preset correction value according to the extracted salient regions and coding the current frame image with the corrected quantization parameter.
9. The system as claimed in claim 8, wherein the saliency detector is a single-object detection network, the moving object extracted by the single-object detection network includes classification information, region information, and a bounding box, and the saliency detector further includes a bounding box scaling unit for scaling the width of the bounding box of the moving object to a predetermined size.
10. The system as claimed in claim 8, wherein the preset range threshold comprises a non-maximum suppression threshold and an intersection threshold, and the parameterized extraction range of the moving target is:
the recognition score of the moving target is above the non-maximum suppression threshold, and the overlapping portion of the moving target is greater than the intersection threshold.
CN202111218449.1A 2021-10-20 2021-10-20 Inter-frame image universal coding method and system based on significance detection Active CN113660498B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111218449.1A CN113660498B (en) 2021-10-20 2021-10-20 Inter-frame image universal coding method and system based on significance detection

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111218449.1A CN113660498B (en) 2021-10-20 2021-10-20 Inter-frame image universal coding method and system based on significance detection

Publications (2)

Publication Number Publication Date
CN113660498A true CN113660498A (en) 2021-11-16
CN113660498B CN113660498B (en) 2022-02-11

Family

ID=78484279

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111218449.1A Active CN113660498B (en) 2021-10-20 2021-10-20 Inter-frame image universal coding method and system based on significance detection

Country Status (1)

Country Link
CN (1) CN113660498B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104853196A (en) * 2014-02-18 2015-08-19 华为技术有限公司 Coding and decoding method and device
US20200221092A1 (en) * 2017-07-06 2020-07-09 Interdigital Vc Holdings, Inc, A method and a device for picture encoding and decoding
CN111432207A (en) * 2020-03-30 2020-07-17 北京航空航天大学 Perceptual high-definition video coding method based on salient target detection and salient guidance
CN111726633A (en) * 2020-05-11 2020-09-29 河南大学 Compressed video stream recoding method based on deep learning and significance perception

Also Published As

Publication number Publication date
CN113660498B (en) 2022-02-11

Similar Documents

Publication Publication Date Title
CN112633384B (en) Object recognition method and device based on image recognition model and electronic equipment
CN111709975B (en) Multi-target tracking method, device, electronic equipment and storage medium
US8525935B2 (en) Moving image processing apparatus and method, and computer readable memory
JP6357385B2 (en) Image communication device, image transmission device, and image reception device
CN110620924B (en) Method and device for processing coded data, computer equipment and storage medium
CN110399842B (en) Video processing method and device, electronic equipment and computer readable storage medium
CN109214996B (en) Image processing method and device
CN110536172B (en) Video image display adjusting method, terminal and readable storage medium
CN109255752A (en) Image adaptive compression method, device, terminal and storage medium
CN113011433B (en) Filtering parameter adjusting method and device
CN114339306B (en) Live video image processing method and device and server
CN111970510A (en) Video processing method, storage medium and computing device
CN113610884B (en) Image processing method, device, electronic equipment and computer readable storage medium
CN113660498B (en) Inter-frame image universal coding method and system based on significance detection
CN113420688B (en) Self-adaptive face recognition processing method and device, electronic equipment and storage medium
CN113452996B (en) Video coding and decoding method and device
CN113438386A (en) Dynamic and static judgment method and device applied to video processing
CN113223023A (en) Image processing method and device, electronic device and storage medium
CN112601029B (en) Video segmentation method, terminal and storage medium with known background prior information
CN107154052B (en) Object state estimation method and device
EP4206974A1 (en) Chrominance component-based image segmentation method and system, image segmentation device, and readable storage medium
CN111724426B (en) Background modeling method and camera for background modeling
CN114897731A (en) Image signal processing method, device, equipment and storage medium
CN112085002A (en) Portrait segmentation method, portrait segmentation device, storage medium and electronic equipment
CN113613005A (en) Video denoising method and device based on time domain filtering

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
TR01 Transfer of patent right
TR01 Transfer of patent right

Effective date of registration: 20220302

Address after: 315800 building 6, No. 88 Kangda Road, Meishan bonded port area, Ningbo, Zhejiang

Patentee after: Ningbo Kangda kaineng Medical Technology Co.,Ltd.

Address before: 315800 No. 88, Kangda Road, Meishan bonded port area, Beilun District, Ningbo City, Zhejiang Province

Patentee before: Kangda Intercontinental Medical Devices Co.,Ltd.