CN115396669A - Video compression method and device based on interest area enhancement - Google Patents
- Publication number
- CN115396669A (application CN202211006575.5A)
- Authority
- CN
- China
- Prior art keywords
- image
- component
- interest
- region
- enhancement
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/134—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
- H04N19/167—Position within a video image, e.g. region of interest [ROI]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/134—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
- H04N19/136—Incoming video signal characteristics or properties
- H04N19/137—Motion inside a coding unit, e.g. average field, frame or block difference
- H04N19/139—Analysis of motion vectors, e.g. their magnitude, direction, variance or reliability
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/169—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
- H04N19/186—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being a colour or a chrominance component
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/30—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability
- H04N19/34—Scalability techniques involving progressive bit-plane based encoding of the enhancement layer, e.g. fine granular scalability [FGS]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/30—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability
- H04N19/36—Scalability techniques involving formatting the layers as a function of picture distortion after decoding, e.g. signal-to-noise [SNR] scalability
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/90—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using coding techniques not provided for in groups H04N19/10-H04N19/85, e.g. fractals
- H04N19/91—Entropy coding, e.g. variable length coding [VLC] or arithmetic coding
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Image Processing (AREA)
Abstract
The invention provides a video compression method and device based on region-of-interest enhancement. The method comprises the following steps: transforming and quantizing video frames to remove spatially redundant information; extracting the region of interest of the image with YOLOv4, converting the RGB color-space components into HSV components, and then enhancing the region of interest; after image enhancement, compressing the luminance component of the data; taking part of the image data with the enhanced region of interest as training data, and using the training data to train a generative adversarial network (GAN); and decoding the compressed luminance component, feeding the decoded image into the GAN, and colorizing the image to obtain the decoded enhanced image. The invention detects the region of interest with YOLOv4, performs image enhancement in HSV space, and compresses only the luminance component of the image, thereby improving video compression efficiency.
Description
Technical Field
The invention relates to the technical field of video compression, and in particular to a video compression method and device based on region-of-interest enhancement.
Background
With the wide application of high-definition video in multimedia communication equipment, high-efficiency video coding has attracted broad attention at home and abroad. In practical applications, the demand for high-definition, high-contrast image processing keeps growing, yet image quality is difficult to guarantee under constraints such as illumination conditions, exposure, transmission bandwidth and storage capacity. Requirements on image quality and detail continue to rise, but image detail distortion remains severe owing to limitations of the shooting and transmission environments. A compression technique incorporating image enhancement is therefore urgently needed to break through the current technical bottleneck of image compression.
Image enhancement plays an important role in the wide application of images. Its main purpose is to suppress useless noise introduced during acquisition and transmission, highlight the useful information in the image, make the image conform as closely as possible to human visual perception or convert it into a form amenable to computer recognition and analysis, and improve the image's subsequent processing capability and application value.
Disclosure of Invention
In view of the above problems, an object of the present invention is to provide a video compression method and apparatus based on region-of-interest enhancement that improve video compression efficiency and enhance image quality.
To solve the above problems, the technical solution of the invention is as follows:
A video compression method based on region-of-interest enhancement comprises the following steps:
transforming and quantizing the video frame to remove spatially redundant information;
extracting the region of interest of the image with YOLOv4, converting the RGB color-space components into HSV components, and then enhancing the region of interest;
after image enhancement, compressing the luminance component of the data;
taking part of the image data with the enhanced region of interest as training data, and using the training data to train a generative adversarial network;
decoding the compressed luminance component, feeding the decoded image into the generative adversarial network, and colorizing the image to obtain the decoded enhanced image.
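As an illustrative sketch of the pipeline above (not the patented implementation), the flow from detection to luminance-only output can be outlined in Python; `detect_roi`, `enhance_roi` and `rgb_to_luma` are hypothetical stand-ins for the YOLOv4 detector, the HSV-space enhancement, and the luminance extraction step:

```python
import numpy as np

def detect_roi(frame):
    """Stand-in for YOLOv4 detection: return a (y0, y1, x0, x1) box covering the frame center."""
    h, w = frame.shape[:2]
    return (h // 4, 3 * h // 4, w // 4, 3 * w // 4)

def enhance_roi(frame, box):
    """Stand-in for HSV-space enhancement: simply brighten the region of interest."""
    y0, y1, x0, x1 = box
    out = frame.astype(np.float64)
    out[y0:y1, x0:x1] = np.clip(out[y0:y1, x0:x1] * 1.2, 0, 255)
    return out.astype(np.uint8)

def rgb_to_luma(frame):
    """BT.601 luminance, the only channel that is compressed and transmitted."""
    r, g, b = frame[..., 0], frame[..., 1], frame[..., 2]
    return (0.299 * r + 0.587 * g + 0.114 * b).astype(np.uint8)

def compress_pipeline(frame):
    """ROI-enhanced compression: enhance, then keep only the luminance component."""
    box = detect_roi(frame)
    enhanced = enhance_roi(frame, box)
    return rgb_to_luma(enhanced)  # chrominance is later restored by the GAN colorizer

frame = np.random.randint(0, 256, (64, 64, 3), dtype=np.uint8)
luma = compress_pipeline(frame)
print(luma.shape)  # (64, 64): a single-channel signal, one third of the RGB payload
```

The design point is that only this single luminance plane enters the encoder; color is regenerated at the decoder by the trained network.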
Optionally, the step of extracting the region of interest with YOLOv4, converting the RGB color-space components into HSV components, and then enhancing the region of interest specifically comprises the following steps:
detecting people and vehicles in the image with YOLOv4 and taking the detected objects, such as people and vehicles, as regions of interest;
converting the RGB color-space image of the region of interest into an HSV-space image;
separating the illumination component and the reflectance component of the V component by taking logarithms;
enhancing the V component;
adaptively adjusting the S component;
after the image is enhanced, converting the enhanced H, S, V components back to R, G, B components.
Optionally, the RGB color-space image of the region of interest is converted into an HSV-space image by using a formula:
wherein R, G, B are respectively the R, G, B components of the image, H, S, V are respectively the components of the image in HSV space, H ∈ [0,360], S ∈ [0,1], V ∈ [0,1], Tmax is the maximum of R, G, B, and Tmin is the minimum of R, G, B.
Optionally, the illumination component and the reflectance component of the V component are separated by taking logarithms, using the formula:
V = L × R
where V is the V component of the image in HSV space, L is the ambient-illumination image data, and R is the reflectance image data.
Optionally, in the step of enhancing the V component, the enhanced V component may be expressed as V′:
where ωn is the weighting coefficient of the n-th scale and N is the number of scales.
Optionally, in the step of adaptively adjusting the S component, the adjusted S component may be expressed as S′:
S′ = S + t × (V′ - V × λ)
where t is a proportionality constant and λ is an adaptive coefficient.
Optionally, in the step of compressing the luminance component of the data after image enhancement, the formula for compressing the luminance component of the enhanced image is:
Y′ = C(Y)
where C(·) is an image encoder, Y is the image luminance signal, and Y′ is the compressed luminance signal.
Optionally, the step of taking part of the image data with the enhanced region of interest as training data and using the training data to train the generative adversarial network specifically comprises the following steps:
constructing a chrominance-component generative adversarial network to colorize the picture;
designing the generator loss function.
Optionally, the generator loss function is:
Lmixed = a1·La + a2·LMSE + a3·Lcontent + a4·Lcolor
where a1, a2, a3, a4 are the loss-function weights and La is the adversarial loss term:
La = -log D(G(Y))
where log(·) is the logarithm, D(·) is the image discriminator model, and LMSE is the mean-square-error loss term:
LMSE = ||G(Y) - X||²
where ||·||₂ is the 2-norm, X is the target color image, and Lcontent is the feature loss term;
where ||·||₁ is the 1-norm, cj, hj, wj respectively denote the number of channels, height and width of the feature map, and φj(·) is the output of the j-th layer of the feature-extraction network;
where G(Y) is the generated image, and G0 and Gt are Gaussian filters.
Further, the present invention also provides a video compression apparatus based on region-of-interest enhancement. The apparatus comprises a processor and a memory; the processor, by reading the executable program code stored in the memory, runs a program corresponding to that code so as to implement the video compression method based on region-of-interest enhancement described above.
Compared with the prior art, the invention has the following beneficial effects:
1. the invention detects the region of interest with the YOLOv4 algorithm, which is fast enough to process video data in real time;
2. the RGB components are converted into HSV components to enhance the region of interest, and the mean, variance and entropy of the enhanced image are superior to those of an image enhanced directly in RGB space;
3. the invention compresses only the luminance component of the image and designs a generative adversarial network at the decoding end to colorize the image, thereby improving video compression efficiency;
4. the mixed loss function of the generative adversarial network constructed by the invention improves the quality of the enhanced image in the video colorization process.
Drawings
Other features, objects and advantages of the invention will become more apparent upon reading the detailed description of non-limiting embodiments with reference to the following drawings:
FIG. 1 is a flow chart of the video compression method based on region-of-interest enhancement according to an embodiment of the present invention;
FIG. 2 is a structure diagram of the generative adversarial network according to an embodiment of the invention;
FIG. 3 is a block diagram of the video compression apparatus based on region-of-interest enhancement according to an embodiment of the present invention.
Detailed Description
The present invention will now be described in detail with reference to specific embodiments. The following examples will help those skilled in the art further understand the invention and make its technical solutions and advantages clear.
The invention provides a video compression method based on region-of-interest enhancement, which extracts the region of interest in a video frame with the YOLOv4 algorithm, enhances and compresses the video frame, and colorizes the decoded image with a generative adversarial network. Specifically, FIG. 1 is a flow diagram of the video compression method based on region-of-interest enhancement provided by an embodiment of the invention. As shown in FIG. 1, the method comprises the following steps:
S1: transforming and quantizing the video frame to remove spatially redundant information;
S2: extracting the region of interest of the image with YOLOv4, converting the RGB color-space components into HSV components, and then enhancing the region of interest;
Specifically, step S2 comprises the following steps:
Step 21: detecting people and vehicles in the image with YOLOv4 and taking the detected objects, such as people and vehicles, as regions of interest;
Step 22: converting the RGB color-space image of the region of interest into an HSV-space image;
The concrete formula is as follows:
wherein R, G, B are respectively the R, G, B components of the image, H, S, V are respectively the components of the image in HSV space, H ∈ [0,360], S ∈ [0,1], V ∈ [0,1], Tmax is the maximum of R, G, B, and Tmin is the minimum of R, G, B;
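The conversion described here is the standard RGB-to-HSV mapping with Tmax and Tmin as defined above; a minimal sketch, assuming r, g, b already normalized to [0, 1]:

```python
def rgb_to_hsv(r, g, b):
    """Standard RGB->HSV conversion for r, g, b in [0, 1].
    Returns H in [0, 360), S and V in [0, 1], matching the ranges stated in the text."""
    tmax, tmin = max(r, g, b), min(r, g, b)
    v = tmax                                       # V is the maximum component
    s = 0.0 if tmax == 0 else (tmax - tmin) / tmax # S measures the chroma relative to V
    if tmax == tmin:                               # gray: hue is undefined, use 0
        h = 0.0
    elif tmax == r:
        h = (60 * (g - b) / (tmax - tmin)) % 360
    elif tmax == g:
        h = 60 * (b - r) / (tmax - tmin) + 120
    else:
        h = 60 * (r - g) / (tmax - tmin) + 240
    return h, s, v

print(rgb_to_hsv(1.0, 0.0, 0.0))  # (0.0, 1.0, 1.0): pure red
```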
Step 23: separating the illumination component and the reflectance component of the V component by taking logarithms;
V = L × R (2)
where V is the V component of the image in HSV space, L is the ambient-illumination image data, and R is the reflectance image data;
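Taking logarithms turns the product V = L × R into a sum, log V = log L + log R, so the two components can be separated once L is estimated. The patent does not specify the estimator; the sketch below uses a hypothetical k × k mean filter as the smooth-illumination surrogate common in Retinex methods:

```python
import numpy as np

def separate_illumination(v, eps=1e-6, k=5):
    """Split V into illumination L and reflectance R via log V = log L + log R.
    L is estimated with a simple k x k mean filter over log V (an assumed
    surrogate for the Gaussian surround used in Retinex-style methods)."""
    log_v = np.log(v + eps)
    pad = k // 2
    padded = np.pad(log_v, pad, mode="edge")
    log_l = np.zeros_like(log_v)
    h, w = v.shape
    for i in range(h):
        for j in range(w):
            log_l[i, j] = padded[i:i + k, j:j + k].mean()  # local mean = smooth illumination
    log_r = log_v - log_l                                   # reflectance in the log domain
    return np.exp(log_l), np.exp(log_r)

v = np.random.rand(16, 16) + 0.1
L, R = separate_illumination(v)
# Reconstruction check: L * R recovers V up to the eps offset
print(np.allclose(L * R, v + 1e-6))  # True
```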
Step 24: enhancing the V component; the enhanced V component may be expressed as V′:
where ωn is the weighting coefficient of the n-th scale and N is the number of scales;
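With ωn and N as defined above, a common form of this kind of enhancement is multi-scale Retinex (MSR), V′ = Σn ωn (log V - log(Gn * V)) with Gn a Gaussian surround at scale n. The patent's exact formula is not reproduced in this text, so the following is a hedged sketch using that standard MSR form with illustrative scales and equal weights:

```python
import numpy as np

def gaussian_blur(img, sigma):
    """Separable Gaussian blur built from a sampled 1-D kernel (no SciPy needed)."""
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x**2 / (2 * sigma**2))
    k /= k.sum()

    def conv_axis(a, axis):
        a = np.moveaxis(a, axis, 0)
        padded = np.pad(a, ((radius, radius),) + ((0, 0),) * (a.ndim - 1), mode="edge")
        out = sum(k[i] * padded[i:i + a.shape[0]] for i in range(2 * radius + 1))
        return np.moveaxis(out, 0, axis)

    return conv_axis(conv_axis(img, 0), 1)  # rows, then columns

def msr_enhance(v, sigmas=(1.0, 4.0, 8.0), weights=None, eps=1e-6):
    """Multi-scale Retinex: weighted sum over N scales of log V - log(G_n * V).
    sigmas/weights are illustrative defaults; the patent leaves them unspecified."""
    n = len(sigmas)
    w = weights if weights is not None else [1.0 / n] * n
    out = np.zeros_like(v, dtype=np.float64)
    for wi, si in zip(w, sigmas):
        out += wi * (np.log(v + eps) - np.log(gaussian_blur(v, si) + eps))
    return out

v = np.random.rand(32, 32) + 0.05
v_prime = msr_enhance(v)
print(v_prime.shape)  # (32, 32)
```

A sanity property of this form: a perfectly flat V channel yields zero enhancement, since each surround equals the image itself.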
Step 25: adaptively adjusting the S component; the adjusted S component may be expressed as S′:
S′ = S + t × (V′ - V × λ) (4)
where t is a proportionality constant and λ is an adaptive coefficient;
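Formula (4) is direct per-pixel arithmetic; a short sketch with illustrative values of t and λ (the clipping to the valid saturation range [0, 1] is an added safeguard, not stated in the text):

```python
import numpy as np

def adjust_saturation(s, v, v_prime, t=0.5, lam=1.0):
    """S' = S + t * (V' - V * lambda), clipped to the valid saturation range [0, 1].
    t and lam are illustrative values; the patent names them only as constants."""
    return np.clip(s + t * (v_prime - v * lam), 0.0, 1.0)

s = np.array([0.2, 0.8])
v = np.array([0.5, 0.5])
v_prime = np.array([0.7, 0.3])
print(adjust_saturation(s, v, v_prime))  # [0.3 0.7]
```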
Step 26: after the image is enhanced, converting the enhanced H, S, V components back to R, G, B components.
S3: after image enhancement, compressing the luminance component of the data;
Specifically, the luminance component of the enhanced image is compressed:
Y′ = C(Y) (5)
where C(·) is an image encoder, Y is the image luminance signal, and Y′ is the compressed luminance signal.
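C(·) is left abstract in the text, so any image encoder can fill the role. A sketch using zlib as a hypothetical stand-in codec for the luminance plane (the patent does not name a specific encoder):

```python
import zlib
import numpy as np

def compress_luma(y):
    """Y' = C(Y): encode the luminance plane with a stand-in codec (zlib here;
    a real system would use a video codec's intra coder instead)."""
    return zlib.compress(y.tobytes(), level=9)

def decompress_luma(y_compressed, shape):
    """Inverse of the stand-in codec, recovering the luminance plane."""
    return np.frombuffer(zlib.decompress(y_compressed), dtype=np.uint8).reshape(shape)

y = np.tile(np.arange(64, dtype=np.uint8), (64, 1))  # smooth gradient compresses well
y_c = compress_luma(y)
y_hat = decompress_luma(y_c, y.shape)
print(len(y_c) < y.size)          # True: the smooth plane shrinks
print(np.array_equal(y_hat, y))   # True: the zlib round-trip is lossless
```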
S4: taking part of the image data with the enhanced region of interest as training data, and using the training data to train the generative adversarial network;
Specifically, the structure of the generative adversarial network is shown in FIG. 2, and step S4 comprises the following steps:
Step 41: constructing a chrominance-component generative adversarial network to colorize the picture;
The generator is composed of a multi-scale feature extractor, attention-based residual connections, and a regularized feature-reconstruction mechanism; the discriminator adopts a PatchGAN structure.
Step 42: designing a generator loss function;
the concrete formula is as follows:
L mixed =a 1 ·L a +a 2 ·L MSE +a 3 L content +a 4 L color (6)
where a1, a2, a3, a4 are loss function weights, where La is the term of the opposing loss:
L a =-logD(G(Y)) (7)
where log () is the log function, D () is the image discriminator model, and LMSE is the mean square error loss term:
L MSE =||G(Y)-X|| 2 (8)
in the formula, |2 is a2 norm, X is a target color image, and Lcontent is a characteristic loss term;
in the formula, | | |1 is a1 norm, cj, hj, wj respectively represent the channel number, length and width of the characteristic diagram,a j-th layer output of the network feature extraction layer;
in the formula, G (y) is the generated image, and G0 and Gt are gaussian filters.
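Given a generated image G(Y), a target X, and a discriminator score D(G(Y)), the mixed loss of formula (6) can be sketched in NumPy. The feature extractor φ and the Gaussian filters G0, Gt are replaced by hypothetical stand-ins (raw pixels as the feature map, a 3 × 3 mean filter as the blur), since the text leaves their exact configuration to the figures:

```python
import numpy as np

def mixed_loss(gen, target, d_score, a=(1.0, 1.0, 1.0, 1.0)):
    """L_mixed = a1*La + a2*Lmse + a3*Lcontent + a4*Lcolor (formula (6)).
    Stand-ins: Lcontent uses raw pixels as the 'feature map' (1-norm scaled by
    its size, in place of a learned phi_j); Lcolor compares 3x3-mean-filtered
    images in place of the Gaussian-filtered pair G0, Gt."""
    a1, a2, a3, a4 = a
    l_adv = -np.log(np.clip(d_score, 1e-12, 1.0))       # La = -log D(G(Y))
    l_mse = np.mean((gen - target) ** 2)                # Lmse, mean form of ||G(Y)-X||^2
    l_content = np.abs(gen - target).sum() / gen.size   # 1-norm over a stand-in feature map

    def mean3(img):  # crude 3x3 mean filter as a Gaussian stand-in
        p = np.pad(img, 1, mode="edge")
        return sum(p[i:i + img.shape[0], j:j + img.shape[1]]
                   for i in range(3) for j in range(3)) / 9.0

    l_color = np.mean((mean3(gen) - mean3(target)) ** 2)  # blurred-image discrepancy
    return a1 * l_adv + a2 * l_mse + a3 * l_content + a4 * l_color

x = np.random.rand(16, 16)
print(mixed_loss(x, x, d_score=1.0))  # 0.0: identical images, fully confident discriminator
```

The adversarial term rewards fooling the discriminator, while the MSE, content and color terms anchor the colorized output to the target image at the pixel, feature and low-frequency levels respectively.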
S5: decoding the compressed luminance component, feeding the decoded image into the generative adversarial network, and colorizing the image to obtain the decoded enhanced image.
As shown in FIG. 3, the invention further provides a video compression apparatus based on region-of-interest enhancement. The apparatus comprises a processor 31 and a memory 32; the processor 31, by reading the executable program code stored in the memory 32, runs a program corresponding to that code so as to implement the video compression method based on region-of-interest enhancement of the foregoing embodiment.
The foregoing describes specific embodiments of the present invention. It is to be understood that the invention is not limited to the specific embodiments described above; those skilled in the art may make various changes or modifications within the scope of the appended claims without departing from the spirit of the invention. In the absence of conflict, the embodiments of the present application and the features of the embodiments may be combined with each other arbitrarily.
Claims (10)
1. A video compression method based on region-of-interest enhancement, the method comprising the following steps:
transforming and quantizing the video frame to remove spatially redundant information;
extracting the region of interest of the image with YOLOv4, converting the RGB color-space components into HSV components, and then enhancing the region of interest;
after image enhancement, compressing the luminance component of the data;
taking part of the image data with the enhanced region of interest as training data, and using the training data to train a generative adversarial network;
decoding the compressed luminance component, feeding the decoded image into the generative adversarial network, and colorizing the image to obtain the decoded enhanced image.
2. The method of claim 1, wherein the step of extracting the region of interest with YOLOv4, converting the RGB color-space components into HSV components, and enhancing the region of interest comprises the following steps:
detecting people and vehicles in the image with YOLOv4 and taking the detected objects, such as people and vehicles, as regions of interest;
converting the RGB color-space image of the region of interest into an HSV-space image;
separating the illumination component and the reflectance component of the V component by taking logarithms;
enhancing the V component;
adaptively adjusting the S component;
after the image is enhanced, converting the enhanced H, S, V components back to R, G, B components.
3. The method according to claim 2, wherein the RGB color-space image of the region of interest is converted into an HSV-space image by the following formula:
wherein R, G, B are respectively the R, G, B components of the image, H, S, V are respectively the components of the image in HSV space, H ∈ [0,360], S ∈ [0,1], V ∈ [0,1], Tmax is the maximum of R, G, B, and Tmin is the minimum of R, G, B.
4. The video compression method based on region-of-interest enhancement according to claim 2, wherein the illumination component and the reflectance component of the V component are separated by taking logarithms, using the formula:
V = L × R
where V is the V component of the image in HSV space, L is the ambient-illumination image data, and R is the reflectance image data.
6. The method of claim 2, wherein in the step of adaptively adjusting the S component, the adjusted S component may be expressed as S′:
S′ = S + t × (V′ - V × λ)
where t is a proportionality constant and λ is an adaptive coefficient.
7. The method of claim 1, wherein in the step of compressing the luminance component of the data after image enhancement, the formula for compressing the luminance component of the enhanced image is:
Y′ = C(Y)
where C(·) is an image encoder, Y is the image luminance signal, and Y′ is the compressed luminance signal.
8. The method according to claim 1, wherein the step of taking part of the image data with the enhanced region of interest as training data and using the training data to train the generative adversarial network comprises the following steps:
constructing a chrominance-component generative adversarial network to colorize the picture;
designing the generator loss function.
9. The method of claim 8, wherein the generator loss function is:
Lmixed = a1·La + a2·LMSE + a3·Lcontent + a4·Lcolor
where a1, a2, a3, a4 are the loss-function weights and La is the adversarial loss term:
La = -log D(G(Y))
where log(·) is the logarithm, D(·) is the image discriminator model, and LMSE is the mean-square-error loss term:
LMSE = ||G(Y) - X||²
where ||·||₂ is the 2-norm, X is the target color image, and Lcontent is the feature loss term;
where ||·||₁ is the 1-norm, cj, hj, wj respectively denote the number of channels, height and width of the feature map, and φj(·) is the output of the j-th layer of the feature-extraction network;
where G(Y) is the generated image, and G0 and Gt are Gaussian filters.
10. A video compression apparatus based on region-of-interest enhancement, the apparatus comprising a processor and a memory, the processor, by reading executable program code stored in the memory, running a program corresponding to that code to implement the video compression method based on region-of-interest enhancement according to any one of claims 1 to 9.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211006575.5A CN115396669A (en) | 2022-08-22 | 2022-08-22 | Video compression method and device based on interest area enhancement |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211006575.5A CN115396669A (en) | 2022-08-22 | 2022-08-22 | Video compression method and device based on interest area enhancement |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115396669A true CN115396669A (en) | 2022-11-25 |
Family
ID=84120936
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211006575.5A Pending CN115396669A (en) | 2022-08-22 | 2022-08-22 | Video compression method and device based on interest area enhancement |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115396669A (en) |
- 2022-08-22: application CN202211006575.5A filed in CN; publication CN115396669A, status Pending
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115604477A (en) * | 2022-12-14 | 2023-01-13 | 广州波视信息科技股份有限公司(Cn) | Ultrahigh-definition video distortion optimization coding method |
CN116258653A (en) * | 2023-05-16 | 2023-06-13 | 深圳市夜行人科技有限公司 | Low-light level image enhancement method and system based on deep learning |
CN116258653B (en) * | 2023-05-16 | 2023-07-14 | 深圳市夜行人科技有限公司 | Low-light level image enhancement method and system based on deep learning |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN115396669A (en) | Video compression method and device based on interest area enhancement | |
Sadek et al. | Robust video steganography algorithm using adaptive skin-tone detection | |
EP3354030B1 (en) | Methods and apparatuses for encoding and decoding digital images through superpixels | |
CN110798690B (en) | Video decoding method, and method, device and equipment for training loop filtering model | |
CN110717868B (en) | Video high dynamic range inverse tone mapping model construction and mapping method and device | |
CN101360184A (en) | System and method for extracting key frame of video | |
Li et al. | Novel image authentication scheme with fine image quality for BTC-based compressed images | |
US20140212046A1 (en) | Bit depth reduction techniques for low complexity image patch matching | |
CN109389569A (en) | Based on the real-time defogging method of monitor video for improving DehazeNet | |
US11854164B2 (en) | Method for denoising omnidirectional videos and rectified videos | |
Yang et al. | Low-light image enhancement based on Retinex theory and dual-tree complex wavelet transform | |
CN111899193A (en) | Criminal investigation photography system and method based on low-illumination image enhancement algorithm | |
US7106908B2 (en) | Method and apparatus for selecting a format in which to re-encode a quantized image | |
CN111968073B (en) | No-reference image quality evaluation method based on texture information statistics | |
Akbari et al. | Image compression using adaptive sparse representations over trained dictionaries | |
Katakol et al. | Distributed learning and inference with compressed images | |
CN113810654A (en) | Image video uploading method and device, storage medium and electronic equipment | |
Li et al. | Efficient visual computing with camera raw snapshots | |
CN114066914A (en) | Image processing method and related equipment | |
GB2299912A (en) | Fractal image compression device and method using perceptual distortion measure | |
Liang et al. | Multi-scale and multi-patch transformer for sandstorm image enhancement | |
Haffner et al. | Color documents on the Web with DjVu | |
CN115619677A (en) | Image defogging method based on improved cycleGAN | |
Cao et al. | Oodhdr-codec: Out-of-distribution generalization for hdr image compression | |
WO2022226850A1 (en) | Point cloud quality enhancement method, encoding and decoding methods, apparatuses, and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |