WO2011153869A1

WO2011153869A1 - Method, device and system for partition/encoding image region

Info

Publication number: WO2011153869A1
Application number: PCT/CN2011/072787
Authority: WO
Inventors: 张智雄
Original assignee: 深圳市融创天下科技股份有限公司
Priority date: 2010-06-07
Filing date: 2011-04-14
Publication date: 2011-12-15
Also published as: CN101882316A

Abstract

A method, a device and a system for partitioning/encoding image regions are provided. The method comprises: extracting motion vectors of an image in a video sequence, partitioning the image into regions of interest and regions of non-interest according to the complexity of the motion vectors, and refining the region-partitioned image according to the spatial and temporal correlations of macroblocks, thereby precisely partitioning the image into foreground and background regions. In encoding, a smaller quantization parameter is adopted for the regions of interest to improve their video quality, and a larger quantization parameter is adopted for the regions of non-interest so that the total bit consumption stays unchanged, which ultimately improves the subjective quality of the video.

Description

Image area division/coding method, device and system

The present invention relates to the field of video coding, and in particular to an image area division/coding method, apparatus and system.

Background technique

Faced with a complex scene, the human visual system (HVS) can quickly focus on a few salient visual objects and process them first. This process is called visual attention, and the salient visual objects are called regions of interest (ROI). Through this mechanism the HVS allocates its limited information-processing resources sensibly, which makes visual perception selective. Thus, not all areas of an image are equally important to its subjective quality; the subjective quality is determined mainly by the quality of the regions of interest. ROI detection has great application value in many image analysis tasks; some of the more prominent application directions include image quality assessment, image compression and encoding, image retrieval, scene rendering, and target detection. Currently, the most commonly used measure of image and video quality is the peak signal-to-noise ratio (PSNR), given by the following equation (1):

$$\mathrm{PSNR}_{dB} = 10 \log_{10} \frac{(2^n - 1)^2}{\mathrm{MSE}} \qquad (1)$$

where MSE is the mean square error between the original image and the encoded image, $(2^n - 1)^2$ is the square of the largest possible signal value in the image, and $n$ is the number of bits per pixel.
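As an illustration of equation (1), the following is a minimal Python sketch. The function name `psnr_db`, the flat-list image representation and the default of 8 bits per pixel are assumptions made for this example and do not come from the patent.

```python
import math

def psnr_db(original, encoded, bits_per_pixel=8):
    """PSNR in dB between two equally sized images given as flat pixel sequences."""
    if len(original) != len(encoded) or not original:
        raise ValueError("images must be non-empty and of equal size")
    # Mean square error between the original and the encoded image.
    mse = sum((a - b) ** 2 for a, b in zip(original, encoded)) / len(original)
    if mse == 0:
        return math.inf  # identical images: PSNR is unbounded
    # Square of the largest possible signal value, (2^n - 1)^2.
    peak = (2 ** bits_per_pixel - 1) ** 2
    return 10.0 * math.log10(peak / mse)
```

A higher return value means the two images are numerically closer; as the following paragraph explains, it does not by itself indicate higher subjective quality.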

PSNR is the most commonly used indicator of the similarity between two images: the higher the value, the more similar the two images are taken to be. However, PSNR does not fully reflect the subjective quality of images and video. Consider Figures 1a-1d, where Figure 1a is the original image, Figure 1b has PSNR = 30.6 dB, Figure 1c has PSNR = 28.3 dB, and Figure 1d has PSNR = 27.7 dB. Although Figure 1b has the highest PSNR, people tend to judge Figure 1d as more similar to the original and of higher quality. This is because the regions of visual interest (such as the face and glasses) are clearer in Figure 1d than in Figures 1b and 1c, even though the regions of no visual interest in Figure 1d (the floor behind the girl, the violin) are more blurred than in Figures 1b and 1c. Even though the overall PSNR of Figure 1d is nearly 3 dB lower than that of Figure 1b (that is, the overall objective quality differs considerably), viewers still subjectively prefer Figure 1d. Different regions of an image therefore carry different weight in subjective evaluation, and the subjective quality of the image is determined mainly by the quality of its regions of interest. Prior-art image coding and compression methods do not accurately separate foreground from background. At a given bit rate they cannot concentrate the limited bits on the regions of human visual interest, and so cannot deliver an image that looks subjectively clearer.

Summary of the invention

The object of the embodiments of the present invention is to provide an image region dividing method that solves the problem that the prior art cannot accurately divide the foreground and background of an image.

An embodiment of the present invention is implemented as an image region dividing method comprising the following steps:

Extracting the motion vector of each macroblock of each frame of the video sequence;

Compiling statistics on the motion vectors of the current frame, and counting the number of identical motion vectors;

Labeling and extracting the regions of the current frame in which the macroblock motion vectors are relatively complex, and dividing the frame into regions of interest and regions of non-interest.

Another object of the embodiments of the present invention is to provide an image area dividing apparatus, the apparatus including:

a macroblock motion vector extraction module, configured to extract the motion vector of each macroblock of each frame of the video sequence;

a macroblock motion vector statistics module, configured to compile statistics on the motion vectors of the current frame and count the number of identical MVs;

a macroblock motion vector complex region processing module, configured to label and extract the regions of the current frame in which the macroblock motion vectors are relatively complex, to further correct the region-divided image according to the spatial correlation and temporal correlation of the macroblocks, and to divide the image into regions of interest and regions of non-interest.

Another object of the embodiments of the present invention is to provide an image region encoding method, the method comprising the following steps:

The region of interest or non-interest region is determined according to the complexity of the motion vector of the image macroblock, the region with high motion vector complexity is the region of interest, and the region with low motion vector complexity is the non-interest region;

The coding quantization parameter is reduced for the region of interest of the image to improve the image quality of that region, and the coding quantization parameter is increased for the non-interest region of the image to keep the overall number of coding bits unchanged.

Another object of the embodiments of the present invention is to provide an image area coding system, the system comprising:

a macroblock motion vector extraction module, configured to extract the motion vector of each macroblock of each frame of the video sequence;

a macroblock motion vector statistics module, configured to compile statistics on the motion vectors of the current frame and count the number of identical MVs;

a macroblock motion vector complex region processing module, configured to label and extract the regions of the current frame in which the macroblock motion vectors are relatively complex, to further correct the region-divided image according to the spatial correlation and temporal correlation of the macroblocks, and to divide the image into regions of interest and regions of non-interest;

and a video encoding module, configured to perform video encoding of different qualities according to the regions of interest and non-interest regions divided by the image region dividing device.

Advantageous effects of the invention: according to the complexity of the macroblock motion vectors in an image, the present invention classifies each part of the image as a region of interest or a non-interest region, thereby dividing the image into regions of interest and non-interest regions. Smaller quantization parameters are used for the regions of interest to improve the quality of the video; correspondingly, higher quantization parameters are used for the non-interest regions to balance the overall bit consumption, ultimately improving the subjective quality of the video.

DRAWINGS

Figure 1a is an original image in the prior art;

Figure 1b is an image with a PSNR of 30.6 dB in the prior art;

Figure 1c is an image with a PSNR of 28.3 dB in the prior art;

Figure 1d is an image with a PSNR of 27.7 dB in the prior art;

Figure 2 is a flowchart of an image area dividing method according to an embodiment of the present invention;

Figure 3a is an image of a frame in a tennis match according to an embodiment of the present invention;

Figure 3b is the image of Figure 3a with the motion vector of each macroblock marked;

Figure 4 is a diagram of the image in Figures 3a and 3b divided into foreground and background regions;

Figure 5 is a diagram showing the further optimized region division of the image in Figure 4;

Figure 6 is a structural diagram of an image area dividing apparatus according to an embodiment of the present invention.

Detailed description

The present invention will be further described in detail below with reference to the accompanying drawings and embodiments. For the purpose of explanation, only the parts related to the embodiments of the present invention are shown. It is understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

In the embodiments of the present invention, the motion vector of each macroblock in each frame of the video sequence is extracted, statistics are compiled on the motion vectors of the current frame, the number of identical motion vectors is counted, and each part of the image is classified as a region of interest or a non-interest region according to the complexity of its motion vectors, thereby dividing the image into regions of interest and non-interest regions. Smaller quantization parameters are used for the regions of interest to improve the quality of the video; correspondingly, higher quantization parameters are used for the non-interest regions to balance the overall bit consumption, which ultimately improves the subjective quality of the video. In each of the following embodiments, x, i, j, m, n, v, and h are all natural numbers.

S101. Extract the motion vector of each macroblock of each frame of the video sequence.

The specific method is as follows: the spatial continuity of the macroblocks of the image is analyzed, and the motion vector (MVX, MVY) of the current macroblock is calculated according to the motion-prediction sum of absolute transformed differences (SATD) and stored.
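The patent names SATD as the matching cost but does not specify the transform or block size. The sketch below assumes a 4x4 Hadamard transform, a common choice in video encoders, and leaves the result unnormalized; the function name `satd4x4` is hypothetical. A motion search would evaluate this cost for each candidate displacement within the search range and keep the (MVX, MVY) with the lowest cost.

```python
def satd4x4(block_a, block_b):
    """Unnormalized sum of absolute Hadamard-transformed differences of two 4x4 blocks.

    Assumption: 4x4 Hadamard transform; the patent does not state the transform used.
    """
    # Symmetric 4x4 Hadamard matrix (so H equals its own transpose).
    H = [[1, 1, 1, 1],
         [1, 1, -1, -1],
         [1, -1, -1, 1],
         [1, -1, 1, -1]]
    # Pixel-wise difference between the current block and the candidate block.
    d = [[block_a[i][j] - block_b[i][j] for j in range(4)] for i in range(4)]
    # Two-dimensional transform T = H * d * H^T.
    hd = [[sum(H[i][k] * d[k][j] for k in range(4)) for j in range(4)] for i in range(4)]
    t = [[sum(hd[i][k] * H[j][k] for k in range(4)) for j in range(4)] for i in range(4)]
    return sum(abs(x) for row in t for x in row)
```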

S102. Compile statistics on the motion vectors of the current frame, and count the number of identical motion vectors.

The specific method is as follows: let the motion search range in the X-axis direction in step S101 be $[-h, h)$ and the motion search range in the Y-axis direction be $[-v, v)$. Let the matrix $M$ be a motion vector statistical matrix of size $2h \times 2v$, as shown in the following equation (2):

$$M_{i,j} = \sum_{m}\sum_{n} \{(MVX_{m,n},\, MVY_{m,n}) == (i - h,\, j - v)\} \qquad (2)$$

where $(MVX_{m,n}, MVY_{m,n})$ is the motion vector of the macroblock in the $m$-th row and $n$-th column, the braces evaluate to 1 when the equality holds and to 0 otherwise, and $M_{i,j}$ is the number of macroblocks in the current frame whose motion vector $(MVX, MVY)$ equals $(i - h, j - v)$.
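The statistics of equation (2) can be sketched in Python as follows. The function name `mv_histogram` and the representation of a frame as a 2-D list of `(MVX, MVY)` tuples are assumptions made for this example.

```python
def mv_histogram(mvs, h, v):
    """Build the 2h x 2v matrix M of equation (2).

    mvs: 2-D list (rows x columns of macroblocks) of (MVX, MVY) tuples,
         with MVX in [-h, h) and MVY in [-v, v).
    M[i][j] is the number of macroblocks whose motion vector is (i - h, j - v).
    """
    M = [[0] * (2 * v) for _ in range(2 * h)]
    for row in mvs:
        for mvx, mvy in row:
            if not (-h <= mvx < h and -v <= mvy < v):
                raise ValueError(f"motion vector {(mvx, mvy)} outside search range")
            # Shift by (h, v) so that negative components map to valid indices.
            M[mvx + h][mvy + v] += 1
    return M
```

The entries of `M` always sum to the number of macroblocks in the frame, which is a convenient sanity check.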

S103, labeling and extracting a relatively complex region of the macroblock motion vector in the current frame, and dividing the region of interest and non-interest.

In general, the background of each frame moves in a fairly uniform way, while the motion vectors of the foreground region are relatively complex. Figure 3a is an image of a frame in a tennis match in an embodiment of the present invention, and Figure 3b is a diagram showing the motion vectors of the macroblocks in the image shown in Figure 3a.

It can be seen that by picking out the regions where the motion vectors are relatively complex, the foreground of the image, that is, the region of interest, can be extracted. The specific method is as follows:

The matrix $M$ in step S102 is sorted in ascending order and stored in the array $M\_count_n$, as shown in the following equation (3):

$$M\_count_n = \sum_{i=0}^{2h-1}\sum_{j=0}^{2v-1} \{M_{i,j} == n\} \qquad (3)$$

where the value of $M\_count_n$ is the number of motion vectors that are each shared by exactly $n$ macroblocks. In this array, a smaller $n$ corresponds to rarer macroblock motion, which means that these macroblocks are more likely to be foreground. Therefore, when the accumulated value of $M\_count_n$ is at or below the threshold $D$, these macroblocks are considered foreground, and otherwise background. The rule is given by equation (4):

$$\begin{cases} \sum_{x=1}^{n} x \cdot M\_count_x \le D, & \text{the area is considered to be the foreground} \\ \text{else}, & \text{the area is considered to be the background} \end{cases} \qquad (4)$$

where the threshold $D = k \times (width/16) \times (height/16)$; $width$ is the width of the video sequence; $height$ is the height of the video sequence; $(width/16) \times (height/16)$ is the total number of macroblocks contained in one frame; $k$ is the proportional coefficient, the ratio of the number of macroblocks with rare motion vectors to the total number of macroblocks in the frame, set here to 0.384; and "else" denotes the areas that do not satisfy $\sum_{x=1}^{n} x \cdot M\_count_x \le D$.

To determine the value of the proportional coefficient $k$ in formula (4), the golden section rule is applied. The golden section, also known as the golden ratio, is a proportional relationship between the parts of a whole: when the whole is divided in two, the ratio of the larger part to the smaller part equals the ratio of the whole to the larger part. The ratio is approximately 1 : 0.618, or 0.618 : 0.384; that is, the long segment is 0.618 of the whole. The value 0.618 is widely regarded as the most aesthetically pleasing proportion, hence the name golden section. In keeping with human visual characteristics, dividing an image whose total is 1 into a foreground of 0.384 and a background of 0.618 is more in line with human aesthetics. Therefore, $k$ is set to 0.384.

According to the above rules, the image in Figures 3a and 3b can be divided into foreground and background regions; Figure 4 shows the result.
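The preliminary division of step S103 can be sketched as below. This is one reading of formulas (3) and (4), assuming that the sum in formula (4) runs from 1 up to the number of macroblocks sharing the current macroblock's motion vector. The function name `classify_foreground` is hypothetical, and a sparse `Counter` stands in for the dense matrix $M$.

```python
from collections import Counter

def classify_foreground(mvs, k=0.384):
    """Return a 2-D list of booleans, True where a macroblock is labeled foreground.

    mvs: 2-D list (rows x columns of macroblocks) of (MVX, MVY) tuples.
    k:   proportional coefficient of the threshold D (0.384 in the patent).
    """
    rows, cols = len(mvs), len(mvs[0])
    # Sparse form of M: how many macroblocks carry each motion vector.
    occurrences = Counter(mv for row in mvs for mv in row)
    # M_count_n: how many distinct motion vectors are shared by exactly n macroblocks.
    m_count = Counter(occurrences.values())
    # Threshold D = k * (number of macroblocks in the frame).
    D = k * rows * cols
    # cumulative[n] = sum over x = 1..n of x * M_count_x (absent x contribute 0).
    cumulative, running = {}, 0
    for n in sorted(m_count):
        running += n * m_count[n]
        cumulative[n] = running
    return [[cumulative[occurrences[mv]] <= D for mv in row] for row in mvs]
```

For a sense of scale, a 352 x 288 frame contains 22 x 18 = 396 macroblocks, so with k = 0.384 the threshold is D = 152.064.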

S104, further determining the foreground of each frame of the video sequence.

Although Figure 4 gives a preliminary separation of the foreground and background of the frame, some areas in it are still misjudged. Because a video sequence is strongly continuous in space and time, the spatially and temporally adjacent regions can also be taken into account when judging whether the current macroblock is foreground, which improves the accuracy of the judgment. Spatial continuity is considered as follows: if the current macroblock is judged to be foreground but the macroblocks adjacent to it above, below, to the left and to the right are all judged to be background, the possibility that the current macroblock is really foreground is greatly reduced; similarly, if the current macroblock is judged to be background but its upper, lower, left and right neighbors are judged to be foreground, the possibility that it is really background is also very low.

Temporal continuity is considered as follows: if the area of the current macroblock was foreground in the previous frame, the probability that the area is also foreground in the current frame is greatly increased; conversely, if the area of the current macroblock was background in the previous frame, it is more likely to be background than foreground in the current frame.

In order to make the judgment of the foreground background area of the image more accurate, the area distinguished in step S103 can be further filtered and optimized to make the result more accurate. The specific method is as follows:

1) The foreground/background determination in step S103 is refined by the following formula (5):

$$level = \begin{cases} 4, & \sum_{x=1}^{n} x \cdot M\_count_x \le D/4 \\ 2, & D/4 < \sum_{x=1}^{n} x \cdot M\_count_x \le D/2 \\ 1, & D/2 < \sum_{x=1}^{n} x \cdot M\_count_x \le D \\ 0, & \text{else} \end{cases} \qquad (5)$$

where $level$ indicates how likely the current macroblock is to be foreground, judged by the rarity of its motion. The higher the value, the more likely the current macroblock is foreground.

2) Determine whether the current macroblock is foreground based on its spatial correlation and temporal correlation. Let the $level$ value of the current macroblock be $level_{i,j,n}$; combining it with the spatial correlation of the current macroblock gives the spatial possibility $L\_fore_{i,j,n}$ that the current macroblock is foreground, as shown in the following equation (6).

$$L\_fore_{i,j,n} = level_{i,j,n} + \alpha \cdot (level_{i-1,j,n} + level_{i+1,j,n} + level_{i,j-1,n} + level_{i,j+1,n}) \qquad (6)$$

where $i$ is the row in which the current macroblock is located, $j$ is its column, $n$ is the sequence number of the frame in which the current macroblock is located, and $\alpha$ is the spatial correlation coefficient, with a value in $[0, 1]$. For a macroblock in the boundary area, the $level$ of any adjacent macroblock that lies beyond the boundary is taken as 0; for example, if $j - 1 < 1$, that is, the current macroblock is at the far left of the image, then $level_{i,j-1,n} = 0$.

Combined with the temporal correlation of the current macroblock, the comprehensive possibility $Lv_{i,j,n}$ that the current macroblock is foreground is given by the following equation (7):

$$Lv_{i,j,n} = L\_fore_{i,j,n} + \lambda \cdot L\_fore_{i,j,n-1} \qquad (7)$$

where $i$ is the row in which the current macroblock is located, $j$ is its column, $n$ is the sequence number of the frame in which the current macroblock is located, and $\lambda$ is the time correlation coefficient, with a value in $[0, 1]$.

3) Select the threshold to determine whether the current macroblock is foreground.

The method is as shown in the following equation (8):

$$\begin{cases} Lv_{i,j,n} > threshold, & \text{the macroblock is the foreground} \\ \text{else}, & \text{the macroblock is the background} \end{cases} \qquad (8)$$

For equation (6), $\alpha$ is taken as 0.6; for equation (7), take A.05; for equation (8), $threshold$ is taken as 10. Extracting the foreground region of Figures 3a and 3b in this way gives Figure 5, which shows the further optimized region division of the image in Figure 4.
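The refinement of step S104 can be sketched as three small functions, one per stage. The function names are hypothetical, the level values 4, 2, 1, 0 follow the claims, and the time correlation coefficient is left as a required argument `lam` because its value is not legible in the text; any value in [0, 1] passed by a caller is that caller's assumption.

```python
def rarity_level(cum, D):
    """Formula (5): map the accumulated count of formula (4) to a level in {4, 2, 1, 0}."""
    if cum <= D / 4:
        return 4
    if cum <= D / 2:
        return 2
    if cum <= D:
        return 1
    return 0

def spatial_score(level, alpha=0.6):
    """Formula (6): L_fore for every macroblock of one frame.

    level: 2-D list of levels; neighbours outside the frame count as 0.
    """
    rows, cols = len(level), len(level[0])

    def at(i, j):
        return level[i][j] if 0 <= i < rows and 0 <= j < cols else 0

    return [[level[i][j] + alpha * (at(i - 1, j) + at(i + 1, j) + at(i, j - 1) + at(i, j + 1))
             for j in range(cols)] for i in range(rows)]

def refine_foreground(l_fore_cur, l_fore_prev, lam, threshold=10):
    """Formulas (7) and (8): Lv = L_fore(n) + lam * L_fore(n - 1); foreground if Lv > threshold."""
    return [[cur + lam * prev > threshold for cur, prev in zip(row_cur, row_prev)]
            for row_cur, row_prev in zip(l_fore_cur, l_fore_prev)]
```

For the first frame of a sequence there is no previous frame; passing an all-zero `l_fore_prev` is one possible convention, which the patent does not specify.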

In this embodiment of the present invention, the motion vector of each macroblock in each frame of the video sequence is extracted, statistics are compiled on the motion vectors of the current frame, the number of identical motion vectors is counted, each part of the image is classified as a region of interest or a non-interest region according to the complexity of its motion vectors, and the region-divided image is further refined according to the spatial correlation and temporal correlation of the macroblocks, thereby accurately dividing the foreground and background regions of the image.

Embodiment 2

FIG. 6 is a structural diagram of an image area dividing apparatus according to an embodiment of the present invention. The device comprises the following parts:

a macroblock motion vector extraction module, configured to extract the motion vector of each macroblock of each frame of the video sequence;

a macroblock motion vector statistics module, configured to compile statistics on the motion vectors of the current frame and count the number of identical MVs;

a macroblock motion vector complex region processing module, configured to label and extract the regions of the current frame in which the macroblock motion vectors are relatively complex, to further correct the region-divided image according to the spatial correlation and temporal correlation of the macroblocks, and to divide the image into regions of interest and regions of non-interest. The macroblock motion vector complex region processing module includes a preliminary processing module and an optimization processing module: the preliminary processing module is configured to label and extract the regions of the current frame in which the macroblock motion vectors are relatively complex; the optimization processing module is configured to refine the region-divided image according to the spatial correlation and temporal correlation of the current macroblock, and to divide the image into regions of interest and non-interest regions.

In this embodiment of the present invention, the motion vectors of the image in the video sequence are extracted, each part of the image is classified as a region of interest or a non-interest region according to the complexity of its motion vectors, and the region-divided image is further refined according to the spatial correlation and temporal correlation of the macroblocks, so that the foreground and background regions of the image are divided accurately.

Embodiment 3

An embodiment of the present invention further provides an image region encoding method. The method includes the steps of the method of Embodiment 1, and further includes the step of performing video encoding of different qualities according to the finally divided foreground and background: the region of interest is encoded with a smaller quantization parameter to improve the quality of the video, and the non-interest region is encoded with a higher quantization parameter to balance the overall bit consumption.

In this embodiment of the present invention, the motion vectors of the image in the video sequence are extracted, each part of the image is classified as a region of interest or a non-interest region according to the complexity of its motion vectors, and the region-divided image is further refined according to the spatial correlation and temporal correlation of the macroblocks, so that the foreground and background regions of the image are divided accurately. Smaller quantization parameters are used for the region of interest to improve the quality of the video, and higher quantization parameters are used for the non-interest region to balance the overall bit consumption, which ultimately improves the subjective quality of the video.
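The region-dependent quantization can be sketched as a per-macroblock QP map. The patent does not give the size of the QP adjustments or name a codec, so the function name `qp_map`, the offsets `roi_delta` and `non_roi_delta`, and the 0 to 51 clamp range (the QP range used by H.264) are all assumptions of this example. The sketch also performs no rate control; keeping the total bit count unchanged would require tuning the two offsets against the encoder's actual output.

```python
def qp_map(foreground, base_qp, roi_delta=4, non_roi_delta=2, qp_min=0, qp_max=51):
    """Per-macroblock quantization parameters: lower in regions of interest, higher elsewhere.

    foreground: 2-D list of booleans, True for region-of-interest macroblocks.
    base_qp:    the frame-level QP the encoder would otherwise use.
    """
    def clamp(q):
        return max(qp_min, min(qp_max, q))

    return [[clamp(base_qp - roi_delta) if fg else clamp(base_qp + non_roi_delta)
             for fg in row] for row in foreground]
```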

Embodiment 4

An embodiment of the present invention further provides an image area coding system, where the system includes:

a macroblock motion vector extraction module, configured to extract the motion vector of each macroblock of each frame of the video sequence;

a macroblock motion vector statistics module, configured to compile statistics on the motion vectors of the current frame and count the number of identical MVs;

a macroblock motion vector complex region processing module, configured to label and extract the regions of the current frame in which the macroblock motion vectors are relatively complex, to further correct the region-divided image according to the spatial correlation and temporal correlation of the macroblocks, and to divide the image into regions of interest and regions of non-interest. The macroblock motion vector complex region processing module includes a preliminary processing module and an optimization processing module: the preliminary processing module is configured to label and extract the regions of the current frame in which the macroblock motion vectors are relatively complex; the optimization processing module is configured to refine the region-divided image according to the spatial correlation and temporal correlation of the current macroblock, and to divide the image into regions of interest and non-interest regions;

a video coding module, configured to perform video coding of different qualities according to the finally divided regions of interest and non-interest regions.

In this embodiment of the present invention, the motion vectors of the image in the video sequence are extracted, each part of the image is classified as a region of interest or a non-interest region according to the complexity of its motion vectors, and the region-divided image is further refined according to the spatial correlation and temporal correlation of the macroblocks, so that the foreground and background regions of the image are divided accurately. Smaller quantization parameters are used for the region of interest to improve the quality of the video, and higher quantization parameters are used for the non-interest region to balance the overall bit consumption, which ultimately improves the subjective quality of the video.

One of ordinary skill in the art will appreciate that all or part of the steps of the above embodiments may be implemented by program instructions controlling related hardware; the program may be stored in a computer-readable storage medium, such as a ROM, a RAM, a magnetic disk, or an optical disk.

The above are only preferred embodiments of the present invention and are not intended to limit it. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims

2. The image region dividing method according to claim 1, wherein the method further comprises the following step:

The image divided into regions of interest and non-interest is further judged based on the spatial and temporal correlation of the macroblocks.

3. The image region dividing method according to claim 1, wherein the extracting the motion vector of each macroblock of each frame in the video sequence comprises:

The spatial continuity of the macroblocks of the image is analyzed, and the motion vector (MVX, MVY) of the current macroblock is calculated according to the motion-prediction sum of absolute transformed differences.

4. The image region dividing method according to claim 1, wherein compiling statistics on the motion vectors of the current frame and counting the number of identical motion vectors comprises:

The motion search range of the macroblock motion vector in the X-axis direction is $[-h, h)$, the motion search range in the Y-axis direction is $[-v, v)$, and the matrix $M$ is a motion vector statistical matrix of size $2h \times 2v$:

$$M_{i,j} = \sum_{m}\sum_{n} \{(MVX_{m,n},\, MVY_{m,n}) == (i - h,\, j - v)\} \qquad (2)$$

wherein, in the above formula (2), $(MVX_{m,n}, MVY_{m,n})$ represents the motion vector of the macroblock in the $m$-th row and $n$-th column, $M_{i,j}$ represents the number of macroblocks in the current frame whose motion vector $(MVX, MVY)$ equals $(i - h, j - v)$, and i, j, m, n, v, h are all natural numbers.

5. The image region dividing method according to claim 1, wherein the labeling and extracting of the regions of the current frame in which the macroblock motion vectors are relatively complex, and the dividing into regions of interest and non-interest, comprises: sorting the matrix $M$ in ascending order and storing it in the array $M\_count_n$ of the following formula (3):

$$M\_count_n = \sum_{i=0}^{2h-1}\sum_{j=0}^{2v-1} \{M_{i,j} == n\} \qquad (3)$$

wherein the value of $M\_count_n$ in the above formula (3) represents the number of motion vectors that are each shared by exactly $n$ macroblocks; the smaller $n$ is, the rarer the macroblock motion and the more likely the macroblocks are foreground; when the accumulated value of $M\_count_n$ is at or below the threshold $D$, the macroblocks are considered foreground, and otherwise background; the rule is given by the following formula (4):

$$\begin{cases} \sum_{x=1}^{n} x \cdot M\_count_x \le D, & \text{the area is considered to be the foreground} \\ \text{else}, & \text{the region is considered to be the background} \end{cases} \qquad (4)$$

where the threshold $D = k \times (width/16) \times (height/16)$; $width$ refers to the width of the video sequence; $height$ refers to the height of the video sequence; $(width/16) \times (height/16)$ refers to the total number of macroblocks contained in one frame; $k$ is the proportional coefficient, indicating the ratio of the number of macroblocks with relatively rare motion vectors to the total number of macroblocks in the frame, the value of $k$ being set to 0.384; "else" represents the regions that do not satisfy $\sum_{x=1}^{n} x \cdot M\_count_x \le D$;

x, i, j, n, v, h are all natural numbers.

6. The image region dividing method according to claim 5, wherein the further judging of the image divided into regions of interest and non-interest according to the spatial and temporal correlation of the macroblocks comprises the following steps:

A) The foreground/background determination in formula (4) is refined by the following formula (5):

$$level = \begin{cases} 4, & \sum_{x=1}^{n} x \cdot M\_count_x \le D/4 \\ 2, & D/4 < \sum_{x=1}^{n} x \cdot M\_count_x \le D/2 \\ 1, & D/2 < \sum_{x=1}^{n} x \cdot M\_count_x \le D \\ 0, & \text{other areas} \end{cases} \qquad (5)$$

where $level$ indicates the possibility, judged by the rarity of its motion, that the current macroblock is foreground; the higher the value, the more likely the current macroblock is foreground;

B) Determining whether the current macroblock is foreground according to the spatial correlation and temporal correlation of the current macroblock: the $level$ value of the current macroblock is set to $level_{i,j,n}$, and combining it with the spatial correlation of the current macroblock gives the spatial possibility $L\_fore_{i,j,n}$ that the current macroblock is foreground, expressed by the following formula (6):

$$L\_fore_{i,j,n} = level_{i,j,n} + \alpha \cdot (level_{i-1,j,n} + level_{i+1,j,n} + level_{i,j-1,n} + level_{i,j+1,n}) \qquad (6)$$

where $i$ is the row of the current macroblock, $j$ is its column, $n$ is the sequence number of the frame in which it is located, and $\alpha$ is the spatial correlation coefficient, with a value in $[0, 1]$; for a macroblock in the boundary region, the $level$ of an adjacent macroblock beyond the boundary is taken as 0;

Combined with the temporal correlation of the current macroblock, the comprehensive possibility $Lv_{i,j,n}$ that the current macroblock is foreground is expressed by the following formula (7):

$$Lv_{i,j,n} = L\_fore_{i,j,n} + \lambda \cdot L\_fore_{i,j,n-1} \qquad (7)$$

where $\lambda$ is the time correlation coefficient, with a value in $[0, 1]$;

C) Selecting the threshold and determining whether the current macroblock is foreground, as in the following formula (8):

$$\begin{cases} Lv_{i,j,n} > threshold, & \text{the macroblock is the foreground} \\ \text{other areas}, & \text{the macroblock is the background} \end{cases} \qquad (8)$$

For formula (6), $\alpha$ is taken as 0.6; for formula (7), take A.05; for formula (8), $threshold$ is taken as 10; the foreground region of the image is extracted, and the region division of the image is further optimized;

The i, j, m, n, v, h are all natural numbers.

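7. An image area dividing apparatus, wherein the apparatus comprises:

a macroblock motion vector extraction module, configured to extract the motion vector of each macroblock of each frame of the video sequence;

a macroblock motion vector statistics module, configured to compile statistics on the motion vectors of the current frame and count the number of identical MVs;

a macroblock motion vector complex region processing module, configured to label and extract the regions of the current frame in which the macroblock motion vectors are relatively complex, to further correct the region-divided image according to the spatial correlation and temporal correlation of the macroblocks, and to divide the image into regions of interest and regions of non-interest.

8. The image region dividing device according to claim 7, wherein the macroblock motion vector complex region processing module comprises: a preliminary processing module, configured to label and extract the regions of the current frame in which the macroblock motion vectors are relatively complex; and an optimization processing module, configured to refine the region-divided image according to the spatial correlation and temporal correlation of the current macroblock, and to divide the image into regions of interest and non-interest regions.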

9. An image region encoding method, the method comprising the steps of: determining regions of interest and regions of non-interest according to the complexity of the motion vectors of the image macroblocks, wherein a region with high motion vector complexity is a region of interest and a region with low motion vector complexity is a non-interest region;

The coding quantization parameter is reduced for the region of interest of the image to improve the image quality of that region, and the coding quantization parameter is increased for the non-interest region of the image to keep the overall number of coding bits unchanged.

10. An image area coding system, wherein the system comprises:

The image area dividing device according to claim 7;

And a video encoding module, configured to perform video encoding of different qualities according to the region of interest and the region of non-interest that are divided by the image region dividing device.