CN115396669A - Video compression method and device based on interest area enhancement - Google Patents
- Publication number
- CN115396669A (application CN202211006575.5A)
- Authority
- CN
- China
- Prior art keywords
- image
- component
- interest
- region
- enhancement
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/134—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
- H04N19/167—Position within a video image, e.g. region of interest [ROI]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/134—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
- H04N19/136—Incoming video signal characteristics or properties
- H04N19/137—Motion inside a coding unit, e.g. average field, frame or block difference
- H04N19/139—Analysis of motion vectors, e.g. their magnitude, direction, variance or reliability
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/169—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
- H04N19/186—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being a colour or a chrominance component
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/30—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability
- H04N19/34—Scalability techniques involving progressive bit-plane based encoding of the enhancement layer, e.g. fine granular scalability [FGS]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/30—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability
- H04N19/36—Scalability techniques involving formatting the layers as a function of picture distortion after decoding, e.g. signal-to-noise [SNR] scalability
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/90—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using coding techniques not provided for in groups H04N19/10-H04N19/85, e.g. fractals
- H04N19/91—Entropy coding, e.g. variable length coding [VLC] or arithmetic coding
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Image Processing (AREA)
Abstract
The invention provides a video compression method and device based on region-of-interest enhancement. The method comprises the following steps: transforming and quantizing video frames to remove spatially redundant information; extracting the region of interest of the image with YOLOv4, converting the RGB color-space components into HSV components, and then enhancing the region of interest; after image enhancement, compressing the luminance component of the data; taking part of the image data with the enhanced region of interest as training data, and using the training data to train a generative adversarial network (GAN); and decoding the compressed luminance component, feeding the decoded image into the GAN, and colorizing the image to obtain the decoded enhanced image. The invention detects the region of interest with YOLOv4, performs image enhancement in HSV space, and compresses only the luminance component of the image, thereby improving video compression efficiency.
Description
Technical Field
The invention relates to the technical field of video compression, and in particular to a video compression method and device based on region-of-interest enhancement.
Background
With the wide application of high-definition video in multimedia communication equipment, high-efficiency video coding has attracted broad attention at home and abroad. In practical applications, the demand for high-definition, high-contrast image processing keeps growing, yet image quality is difficult to guarantee under constraints such as illumination conditions, exposure, transmission bandwidth and storage capacity. Requirements on image quality and detail continue to rise, but image detail distortion remains severe owing to limitations of the shooting and transmission environments. A compression technique incorporating image enhancement is therefore urgently needed to break through the current technical bottleneck of image compression.
Image enhancement plays an important role in the wide application of images. Its main purpose is to suppress useless noise introduced during acquisition and transmission, highlight the useful information in the image, make the image conform as closely as possible to human visual perception or convert it into a form amenable to computer recognition and analysis, and improve the image's subsequent processing capability and application value.
Disclosure of Invention
In view of the above problems, an object of the present invention is to provide a video compression method and apparatus based on region-of-interest enhancement that improve video compression efficiency and enhance image quality.
To solve the above problems, the technical solution of the invention is as follows:
A video compression method based on region-of-interest enhancement comprises the following steps:
transforming and quantizing the video frame to remove spatially redundant information;
extracting the region of interest of the image with YOLOv4, converting the RGB color-space components into HSV components, and then enhancing the region of interest;
after image enhancement, compressing the luminance component of the data;
taking part of the image data with the enhanced region of interest as training data, and using the training data to train a generative adversarial network;
decoding the compressed luminance component, feeding the decoded image into the generative adversarial network, and colorizing the image to obtain the decoded enhanced image.
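As an illustrative sketch of the pipeline above (not the patented implementation), the flow from detection to luminance-only output can be outlined in Python; `detect_roi`, `enhance_roi` and `rgb_to_luma` are hypothetical stand-ins for the YOLOv4 detector, the HSV-space enhancement, and the luminance extraction step:

```python
import numpy as np

def detect_roi(frame):
    """Stand-in for YOLOv4 detection: return a (y0, y1, x0, x1) box covering the frame center."""
    h, w = frame.shape[:2]
    return (h // 4, 3 * h // 4, w // 4, 3 * w // 4)

def enhance_roi(frame, box):
    """Stand-in for HSV-space enhancement: simply brighten the region of interest."""
    y0, y1, x0, x1 = box
    out = frame.astype(np.float64)
    out[y0:y1, x0:x1] = np.clip(out[y0:y1, x0:x1] * 1.2, 0, 255)
    return out.astype(np.uint8)

def rgb_to_luma(frame):
    """BT.601 luminance, the only channel that is compressed and transmitted."""
    r, g, b = frame[..., 0], frame[..., 1], frame[..., 2]
    return (0.299 * r + 0.587 * g + 0.114 * b).astype(np.uint8)

def compress_pipeline(frame):
    """ROI-enhanced compression: enhance, then keep only the luminance component."""
    box = detect_roi(frame)
    enhanced = enhance_roi(frame, box)
    return rgb_to_luma(enhanced)  # chrominance is later restored by the GAN colorizer

frame = np.random.randint(0, 256, (64, 64, 3), dtype=np.uint8)
luma = compress_pipeline(frame)
print(luma.shape)  # (64, 64): a single-channel signal, one third of the RGB payload
```

The design point is that only this single luminance plane enters the encoder; color is regenerated at the decoder by the trained network.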
Optionally, the step of extracting the region of interest with YOLOv4, converting the RGB color-space components into HSV components, and then enhancing the region of interest specifically comprises the following steps:
detecting people and vehicles in the image with YOLOv4 and taking the detected objects, such as people and vehicles, as regions of interest;
converting the RGB color-space image of the region of interest into an HSV-space image;
separating the illumination component and the reflectance component of the V component by taking logarithms;
enhancing the V component;
adaptively adjusting the S component;
after the image is enhanced, converting the enhanced H, S, V components back to R, G, B components.
Optionally, the RGB color-space image of the region of interest is converted into an HSV-space image by using a formula:
wherein R, G, B are respectively the R, G, B components of the image, H, S, V are respectively the components of the image in HSV space, H ∈ [0,360], S ∈ [0,1], V ∈ [0,1], Tmax is the maximum of R, G, B, and Tmin is the minimum of R, G, B.
Optionally, the illumination component and the reflectance component of the V component are separated by taking logarithms, using the formula:
V = L × R
where V is the V component of the image in HSV space, L is the ambient-illumination image data, and R is the reflectance image data.
Optionally, in the step of enhancing the V component, the enhanced V component may be expressed as V′:
where ωn is the weighting coefficient of the n-th scale and N is the number of scales.
Optionally, in the step of adaptively adjusting the S component, the adjusted S component may be expressed as S′:
S′ = S + t × (V′ - V × λ)
where t is a proportionality constant and λ is an adaptive coefficient.
Optionally, in the step of compressing the luminance component of the data after image enhancement, the formula for compressing the luminance component of the enhanced image is:
Y′ = C(Y)
where C(·) is an image encoder, Y is the image luminance signal, and Y′ is the compressed luminance signal.
Optionally, the step of taking part of the image data with the enhanced region of interest as training data and using the training data to train the generative adversarial network specifically comprises the following steps:
constructing a chrominance-component generative adversarial network to colorize the picture;
designing the generator loss function.
Optionally, the generator loss function is:
Lmixed = a1·La + a2·LMSE + a3·Lcontent + a4·Lcolor
where a1, a2, a3, a4 are the loss-function weights and La is the adversarial loss term:
La = -log D(G(Y))
where log(·) is the logarithm, D(·) is the image discriminator model, and LMSE is the mean-square-error loss term:
LMSE = ||G(Y) - X||²
where ||·||₂ is the 2-norm, X is the target color image, and Lcontent is the feature loss term;
where ||·||₁ is the 1-norm, cj, hj, wj respectively denote the number of channels, height and width of the feature map, and φj(·) is the output of the j-th layer of the feature-extraction network;
where G(Y) is the generated image, and G0 and Gt are Gaussian filters.
Further, the present invention also provides a video compression apparatus based on region-of-interest enhancement. The apparatus comprises a processor and a memory; the processor, by reading the executable program code stored in the memory, runs a program corresponding to that code so as to implement the video compression method based on region-of-interest enhancement described above.
Compared with the prior art, the invention has the following beneficial effects:
1. the invention detects the region of interest with the YOLOv4 algorithm, which is fast enough to process video data in real time;
2. the RGB components are converted into HSV components to enhance the region of interest, and the mean, variance and entropy of the enhanced image are superior to those of an image enhanced directly in RGB space;
3. the invention compresses only the luminance component of the image and designs a generative adversarial network at the decoding end to colorize the image, thereby improving video compression efficiency;
4. the mixed loss function of the generative adversarial network constructed by the invention improves the quality of the enhanced image in the video colorization process.
Drawings
Other features, objects and advantages of the invention will become more apparent upon reading the detailed description of non-limiting embodiments with reference to the following drawings:
FIG. 1 is a flow chart of the video compression method based on region-of-interest enhancement according to an embodiment of the present invention;
FIG. 2 is a structure diagram of the generative adversarial network according to an embodiment of the invention;
FIG. 3 is a block diagram of the video compression apparatus based on region-of-interest enhancement according to an embodiment of the present invention.
Detailed Description
The present invention will now be described in detail with reference to specific embodiments. The following examples will help those skilled in the art further understand the invention and make its technical solutions and advantages clear.
The invention provides a video compression method based on region-of-interest enhancement, which extracts the region of interest in a video frame with the YOLOv4 algorithm, enhances and compresses the video frame, and colorizes the decoded image with a generative adversarial network. Specifically, FIG. 1 is a flow diagram of the video compression method based on region-of-interest enhancement provided by an embodiment of the invention. As shown in FIG. 1, the method comprises the following steps:
S1: transforming and quantizing the video frame to remove spatially redundant information;
S2: extracting the region of interest of the image with YOLOv4, converting the RGB color-space components into HSV components, and then enhancing the region of interest;
Specifically, step S2 comprises the following steps:
Step 21: detecting people and vehicles in the image with YOLOv4 and taking the detected objects, such as people and vehicles, as regions of interest;
Step 22: converting the RGB color-space image of the region of interest into an HSV-space image;
The concrete formula is as follows:
wherein R, G, B are respectively the R, G, B components of the image, H, S, V are respectively the components of the image in HSV space, H ∈ [0,360], S ∈ [0,1], V ∈ [0,1], Tmax is the maximum of R, G, B, and Tmin is the minimum of R, G, B;
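The conversion described here is the standard RGB-to-HSV mapping with Tmax and Tmin as defined above; a minimal sketch, assuming r, g, b already normalized to [0, 1]:

```python
def rgb_to_hsv(r, g, b):
    """Standard RGB->HSV conversion for r, g, b in [0, 1].
    Returns H in [0, 360), S and V in [0, 1], matching the ranges stated in the text."""
    tmax, tmin = max(r, g, b), min(r, g, b)
    v = tmax                                       # V is the maximum component
    s = 0.0 if tmax == 0 else (tmax - tmin) / tmax # S measures the chroma relative to V
    if tmax == tmin:                               # gray: hue is undefined, use 0
        h = 0.0
    elif tmax == r:
        h = (60 * (g - b) / (tmax - tmin)) % 360
    elif tmax == g:
        h = 60 * (b - r) / (tmax - tmin) + 120
    else:
        h = 60 * (r - g) / (tmax - tmin) + 240
    return h, s, v

print(rgb_to_hsv(1.0, 0.0, 0.0))  # (0.0, 1.0, 1.0): pure red
```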
Step 23: separating the illumination component and the reflectance component of the V component by taking logarithms;
V = L × R (2)
where V is the V component of the image in HSV space, L is the ambient-illumination image data, and R is the reflectance image data;
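Taking logarithms turns the product V = L × R into a sum, log V = log L + log R, so the two components can be separated once L is estimated. The patent does not specify the estimator; the sketch below uses a hypothetical k × k mean filter as the smooth-illumination surrogate common in Retinex methods:

```python
import numpy as np

def separate_illumination(v, eps=1e-6, k=5):
    """Split V into illumination L and reflectance R via log V = log L + log R.
    L is estimated with a simple k x k mean filter over log V (an assumed
    surrogate for the Gaussian surround used in Retinex-style methods)."""
    log_v = np.log(v + eps)
    pad = k // 2
    padded = np.pad(log_v, pad, mode="edge")
    log_l = np.zeros_like(log_v)
    h, w = v.shape
    for i in range(h):
        for j in range(w):
            log_l[i, j] = padded[i:i + k, j:j + k].mean()  # local mean = smooth illumination
    log_r = log_v - log_l                                   # reflectance in the log domain
    return np.exp(log_l), np.exp(log_r)

v = np.random.rand(16, 16) + 0.1
L, R = separate_illumination(v)
# Reconstruction check: L * R recovers V up to the eps offset
print(np.allclose(L * R, v + 1e-6))  # True
```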
Step 24: enhancing the V component; the enhanced V component may be expressed as V′:
where ωn is the weighting coefficient of the n-th scale and N is the number of scales;
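With ωn and N as defined above, a common form of this kind of enhancement is multi-scale Retinex (MSR), V′ = Σn ωn (log V - log(Gn * V)) with Gn a Gaussian surround at scale n. The patent's exact formula is not reproduced in this text, so the following is a hedged sketch using that standard MSR form with illustrative scales and equal weights:

```python
import numpy as np

def gaussian_blur(img, sigma):
    """Separable Gaussian blur built from a sampled 1-D kernel (no SciPy needed)."""
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x**2 / (2 * sigma**2))
    k /= k.sum()

    def conv_axis(a, axis):
        a = np.moveaxis(a, axis, 0)
        padded = np.pad(a, ((radius, radius),) + ((0, 0),) * (a.ndim - 1), mode="edge")
        out = sum(k[i] * padded[i:i + a.shape[0]] for i in range(2 * radius + 1))
        return np.moveaxis(out, 0, axis)

    return conv_axis(conv_axis(img, 0), 1)  # rows, then columns

def msr_enhance(v, sigmas=(1.0, 4.0, 8.0), weights=None, eps=1e-6):
    """Multi-scale Retinex: weighted sum over N scales of log V - log(G_n * V).
    sigmas/weights are illustrative defaults; the patent leaves them unspecified."""
    n = len(sigmas)
    w = weights if weights is not None else [1.0 / n] * n
    out = np.zeros_like(v, dtype=np.float64)
    for wi, si in zip(w, sigmas):
        out += wi * (np.log(v + eps) - np.log(gaussian_blur(v, si) + eps))
    return out

v = np.random.rand(32, 32) + 0.05
v_prime = msr_enhance(v)
print(v_prime.shape)  # (32, 32)
```

A sanity property of this form: a perfectly flat V channel yields zero enhancement, since each surround equals the image itself.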
Step 25: adaptively adjusting the S component; the adjusted S component may be expressed as S′:
S′ = S + t × (V′ - V × λ) (4)
where t is a proportionality constant and λ is an adaptive coefficient;
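Formula (4) is direct per-pixel arithmetic; a short sketch with illustrative values of t and λ (the clipping to the valid saturation range [0, 1] is an added safeguard, not stated in the text):

```python
import numpy as np

def adjust_saturation(s, v, v_prime, t=0.5, lam=1.0):
    """S' = S + t * (V' - V * lambda), clipped to the valid saturation range [0, 1].
    t and lam are illustrative values; the patent names them only as constants."""
    return np.clip(s + t * (v_prime - v * lam), 0.0, 1.0)

s = np.array([0.2, 0.8])
v = np.array([0.5, 0.5])
v_prime = np.array([0.7, 0.3])
print(adjust_saturation(s, v, v_prime))  # [0.3 0.7]
```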
Step 26: after the image is enhanced, converting the enhanced H, S, V components back to R, G, B components.
S3: after image enhancement, compressing the luminance component of the data;
Specifically, the luminance component of the enhanced image is compressed:
Y′ = C(Y) (5)
where C(·) is an image encoder, Y is the image luminance signal, and Y′ is the compressed luminance signal.
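C(·) is left abstract in the text, so any image encoder can fill the role. A sketch using zlib as a hypothetical stand-in codec for the luminance plane (the patent does not name a specific encoder):

```python
import zlib
import numpy as np

def compress_luma(y):
    """Y' = C(Y): encode the luminance plane with a stand-in codec (zlib here;
    a real system would use a video codec's intra coder instead)."""
    return zlib.compress(y.tobytes(), level=9)

def decompress_luma(y_compressed, shape):
    """Inverse of the stand-in codec, recovering the luminance plane."""
    return np.frombuffer(zlib.decompress(y_compressed), dtype=np.uint8).reshape(shape)

y = np.tile(np.arange(64, dtype=np.uint8), (64, 1))  # smooth gradient compresses well
y_c = compress_luma(y)
y_hat = decompress_luma(y_c, y.shape)
print(len(y_c) < y.size)          # True: the smooth plane shrinks
print(np.array_equal(y_hat, y))   # True: the zlib round-trip is lossless
```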
S4: taking part of the image data with the enhanced region of interest as training data, and using the training data to train the generative adversarial network;
Specifically, the structure of the generative adversarial network is shown in FIG. 2, and step S4 comprises the following steps:
Step 41: constructing a chrominance-component generative adversarial network to colorize the picture;
The generator is composed of a multi-scale feature extractor, attention-based residual connections, and a regularized feature-reconstruction mechanism; the discriminator adopts a PatchGAN structure.
Step 42: designing a generator loss function;
the concrete formula is as follows:
L mixed =a 1 ·L a +a 2 ·L MSE +a 3 L content +a 4 L color (6)
where a1, a2, a3, a4 are loss function weights, where La is the term of the opposing loss:
L a =-logD(G(Y)) (7)
where log () is the log function, D () is the image discriminator model, and LMSE is the mean square error loss term:
L MSE =||G(Y)-X|| 2 (8)
in the formula, |2 is a2 norm, X is a target color image, and Lcontent is a characteristic loss term;
in the formula, | | |1 is a1 norm, cj, hj, wj respectively represent the channel number, length and width of the characteristic diagram,a j-th layer output of the network feature extraction layer;
in the formula, G (y) is the generated image, and G0 and Gt are gaussian filters.
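Given a generated image G(Y), a target X, and a discriminator score D(G(Y)), the mixed loss of formula (6) can be sketched in NumPy. The feature extractor φ and the Gaussian filters G0, Gt are replaced by hypothetical stand-ins (raw pixels as the feature map, a 3 × 3 mean filter as the blur), since the text leaves their exact configuration to the figures:

```python
import numpy as np

def mixed_loss(gen, target, d_score, a=(1.0, 1.0, 1.0, 1.0)):
    """L_mixed = a1*La + a2*Lmse + a3*Lcontent + a4*Lcolor (formula (6)).
    Stand-ins: Lcontent uses raw pixels as the 'feature map' (1-norm scaled by
    its size, in place of a learned phi_j); Lcolor compares 3x3-mean-filtered
    images in place of the Gaussian-filtered pair G0, Gt."""
    a1, a2, a3, a4 = a
    l_adv = -np.log(np.clip(d_score, 1e-12, 1.0))       # La = -log D(G(Y))
    l_mse = np.mean((gen - target) ** 2)                # Lmse, mean form of ||G(Y)-X||^2
    l_content = np.abs(gen - target).sum() / gen.size   # 1-norm over a stand-in feature map

    def mean3(img):  # crude 3x3 mean filter as a Gaussian stand-in
        p = np.pad(img, 1, mode="edge")
        return sum(p[i:i + img.shape[0], j:j + img.shape[1]]
                   for i in range(3) for j in range(3)) / 9.0

    l_color = np.mean((mean3(gen) - mean3(target)) ** 2)  # blurred-image discrepancy
    return a1 * l_adv + a2 * l_mse + a3 * l_content + a4 * l_color

x = np.random.rand(16, 16)
print(mixed_loss(x, x, d_score=1.0))  # 0.0: identical images, fully confident discriminator
```

The adversarial term rewards fooling the discriminator, while the MSE, content and color terms anchor the colorized output to the target image at the pixel, feature and low-frequency levels respectively.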
S5: decoding the compressed luminance component, feeding the decoded image into the generative adversarial network, and colorizing the image to obtain the decoded enhanced image.
As shown in FIG. 3, the invention further provides a video compression apparatus based on region-of-interest enhancement. The apparatus comprises a processor 31 and a memory 32; the processor 31, by reading the executable program code stored in the memory 32, runs a program corresponding to that code so as to implement the video compression method based on region-of-interest enhancement of the foregoing embodiment.
The foregoing describes specific embodiments of the present invention. It is to be understood that the invention is not limited to the specific embodiments described above; those skilled in the art may make various changes or modifications within the scope of the appended claims without departing from the spirit of the invention. In the absence of conflict, the embodiments of the present application and the features of the embodiments may be combined with each other arbitrarily.
Claims (10)
1. A video compression method based on region-of-interest enhancement, the method comprising the following steps:
transforming and quantizing the video frame to remove spatially redundant information;
extracting the region of interest of the image with YOLOv4, converting the RGB color-space components into HSV components, and then enhancing the region of interest;
after image enhancement, compressing the luminance component of the data;
taking part of the image data with the enhanced region of interest as training data, and using the training data to train a generative adversarial network;
decoding the compressed luminance component, feeding the decoded image into the generative adversarial network, and colorizing the image to obtain the decoded enhanced image.
2. The method of claim 1, wherein the step of extracting the region of interest with YOLOv4, converting the RGB color-space components into HSV components, and enhancing the region of interest comprises the following steps:
detecting people and vehicles in the image with YOLOv4 and taking the detected objects, such as people and vehicles, as regions of interest;
converting the RGB color-space image of the region of interest into an HSV-space image;
separating the illumination component and the reflectance component of the V component by taking logarithms;
enhancing the V component;
adaptively adjusting the S component;
after the image is enhanced, converting the enhanced H, S, V components back to R, G, B components.
3. The method according to claim 2, wherein the RGB color-space image of the region of interest is converted into an HSV-space image by the following formula:
wherein R, G, B are respectively the R, G, B components of the image, H, S, V are respectively the components of the image in HSV space, H ∈ [0,360], S ∈ [0,1], V ∈ [0,1], Tmax is the maximum of R, G, B, and Tmin is the minimum of R, G, B.
4. The video compression method based on region-of-interest enhancement according to claim 2, wherein the illumination component and the reflectance component of the V component are separated by taking logarithms, using the formula:
V = L × R
where V is the V component of the image in HSV space, L is the ambient-illumination image data, and R is the reflectance image data.
6. The method of claim 2, wherein in the step of adaptively adjusting the S component, the adjusted S component may be expressed as S′:
S′ = S + t × (V′ - V × λ)
where t is a proportionality constant and λ is an adaptive coefficient.
7. The method of claim 1, wherein in the step of compressing the luminance component of the data after image enhancement, the formula for compressing the luminance component of the enhanced image is:
Y′ = C(Y)
where C(·) is an image encoder, Y is the image luminance signal, and Y′ is the compressed luminance signal.
8. The method according to claim 1, wherein the step of taking part of the image data with the enhanced region of interest as training data and using the training data to train the generative adversarial network comprises the following steps:
constructing a chrominance-component generative adversarial network to colorize the picture;
designing the generator loss function.
9. The method of claim 8, wherein the generator loss function is:
Lmixed = a1·La + a2·LMSE + a3·Lcontent + a4·Lcolor
where a1, a2, a3, a4 are the loss-function weights and La is the adversarial loss term:
La = -log D(G(Y))
where log(·) is the logarithm, D(·) is the image discriminator model, and LMSE is the mean-square-error loss term:
LMSE = ||G(Y) - X||²
where ||·||₂ is the 2-norm, X is the target color image, and Lcontent is the feature loss term;
where ||·||₁ is the 1-norm, cj, hj, wj respectively denote the number of channels, height and width of the feature map, and φj(·) is the output of the j-th layer of the feature-extraction network;
where G(Y) is the generated image, and G0 and Gt are Gaussian filters.
10. A video compression apparatus based on region-of-interest enhancement, the apparatus comprising a processor and a memory, the processor, by reading executable program code stored in the memory, running a program corresponding to that code to implement the video compression method based on region-of-interest enhancement according to any one of claims 1 to 9.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211006575.5A CN115396669A (en) | 2022-08-22 | 2022-08-22 | Video compression method and device based on interest area enhancement |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211006575.5A CN115396669A (en) | 2022-08-22 | 2022-08-22 | Video compression method and device based on interest area enhancement |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115396669A true CN115396669A (en) | 2022-11-25 |
Family
ID=84120936
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211006575.5A Pending CN115396669A (en) | 2022-08-22 | 2022-08-22 | Video compression method and device based on interest area enhancement |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115396669A (en) |
- 2022-08-22: application CN202211006575.5A filed in CN; publication CN115396669A, status Pending
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115604477A (en) * | 2022-12-14 | 2023-01-13 | 广州波视信息科技股份有限公司(Cn) | Ultrahigh-definition video distortion optimization coding method |
CN116258653A (en) * | 2023-05-16 | 2023-06-13 | 深圳市夜行人科技有限公司 | Low-light level image enhancement method and system based on deep learning |
CN116258653B (en) * | 2023-05-16 | 2023-07-14 | 深圳市夜行人科技有限公司 | Low-light level image enhancement method and system based on deep learning |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN115396669A (en) | Video compression method and device based on interest area enhancement | |
Sadek et al. | Robust video steganography algorithm using adaptive skin-tone detection | |
EP3354030B1 (en) | Methods and apparatuses for encoding and decoding digital images through superpixels | |
CN110798690B (en) | Video decoding method, and method, device and equipment for training loop filtering model | |
CN110717868B (en) | Video high dynamic range inverse tone mapping model construction and mapping method and device | |
CN101360184A (en) | System and method for extracting key frame of video | |
Li et al. | Novel image authentication scheme with fine image quality for BTC-based compressed images | |
US20140212046A1 (en) | Bit depth reduction techniques for low complexity image patch matching | |
CN109389569A (en) | Based on the real-time defogging method of monitor video for improving DehazeNet | |
US11854164B2 (en) | Method for denoising omnidirectional videos and rectified videos | |
Yang et al. | Low-light image enhancement based on Retinex theory and dual-tree complex wavelet transform | |
CN111899193A (en) | Criminal investigation photography system and method based on low-illumination image enhancement algorithm | |
US7106908B2 (en) | Method and apparatus for selecting a format in which to re-encode a quantized image | |
CN111968073B (en) | No-reference image quality evaluation method based on texture information statistics | |
Akbari et al. | Image compression using adaptive sparse representations over trained dictionaries | |
Katakol et al. | Distributed learning and inference with compressed images | |
CN113810654A (en) | Image video uploading method and device, storage medium and electronic equipment | |
Li et al. | Efficient visual computing with camera raw snapshots | |
CN114066914A (en) | Image processing method and related equipment | |
GB2299912A (en) | Fractal image compression device and method using perceptual distortion measure | |
Liang et al. | Multi-scale and multi-patch transformer for sandstorm image enhancement | |
Haffner et al. | Color documents on the Web with DjVu | |
CN115619677A (en) | Image defogging method based on improved cycleGAN | |
Cao et al. | Oodhdr-codec: Out-of-distribution generalization for hdr image compression | |
WO2022226850A1 (en) | Point cloud quality enhancement method, encoding and decoding methods, apparatuses, and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |