CN107155111B - Video compression method and device

Info

Publication number: CN107155111B
Application number: CN201710413111.9A
Authority: CN
Inventors: 李益永
Original assignee: Individual
Current assignee: Individual
Priority date: 2017-06-05
Filing date: 2017-06-05
Publication date: 2020-02-18
Anticipated expiration: 2037-06-05
Also published as: CN107155111A

Abstract

The invention provides a video compression method and device. A video compression method, comprising: acquiring video data to be compressed; extracting each frame image of the video data; respectively establishing a tensor model for each frame of image; and respectively carrying out visual disturbance processing on the tensor model of each frame of image, and carrying out low-rank approximation processing on the tensor model of each frame of image to obtain compressed video data. The video compression method combines the visual characteristics of human eyes, and the video compression is realized more efficiently.

Description

Video compression method and device

Technical Field

The present invention relates to the field of image processing technologies, and in particular, to a video compression method and apparatus.

Background

Video compression is a prerequisite for processing video data by computer; its purpose is to reduce the space needed to store video or the bandwidth needed to transmit it. Even today, when storage and bandwidth are relatively inexpensive, the strong demand for high-definition video means that large-capacity storage and real-time transmission of video are almost impossible without video compression.

A common video compression algorithm essentially splits the video data into images and then encodes those images with an image encoding method, thereby compressing the video data. Improving the coding algorithm is the main approach to improving the performance of such algorithms. However, improving the coding method alone leaves little room for progress. In fact, much of the information contained in a video image is redundant relative to the image information that human eyes can perceive, and common video compression algorithms do not take the visual characteristics of human eyes into account when compressing video data.

Disclosure of Invention

Based on the defects and shortcomings of the prior art, the invention provides a video compression method and device, which can comprehensively consider the visual characteristics of human eyes and the data volume after compression, perform tensor-based low-rank approximation compression processing on video data, and realize the compression of the video data more efficiently.

A video compression method, comprising:

acquiring video data to be compressed;

extracting each frame image of the video data;

respectively establishing a tensor model for each frame of image;

and respectively carrying out visual disturbance processing on the tensor model of each frame of image, and carrying out low-rank approximation processing on the tensor model of each frame of image to obtain compressed video data.

Preferably, respectively carrying out visual disturbance processing on the tensor model of each frame of image includes:

for each element of each pixel in the tensor model of each frame image, the following processing is respectively carried out:

performing a modulo operation on the value of the element with respect to a set compression precision value P to obtain a modulo value; wherein the set compression precision value P is a positive integer power of 2;

if the obtained modulo value is less than P/2, setting the value of the element minus the modulo value as the new value of the element;

if the obtained modulo value is greater than P/2, setting the value of the element plus P minus the modulo value as the new value of the element;

if the obtained modulo value is equal to P/2, resetting the value of the element according to the values of the corresponding elements of the adjacent pixels; the adjacent pixels comprise the pixels adjacent to the pixel of the element in the current frame image and the pixel corresponding to the pixel of the element in the previous frame image.

Preferably, said resetting the value of the element according to the values of the corresponding elements of the adjacent pixels includes:

calculating the absolute value of the difference between the value of the element and the value of the corresponding element of each adjacent pixel;

if, for n of the adjacent pixels, the absolute value of the difference between the value of the corresponding element and the value of the element is smaller than (P/2) - 1, analyzing how the values of the corresponding elements of those n adjacent pixels change when the visual disturbance processing is carried out on them; wherein n is a positive integer;

if, among the n adjacent pixels, the pixels whose corresponding element values are increased are in the majority, setting the value of the element plus P/2 as the new value of the element;

if, among the n adjacent pixels, the pixels whose corresponding element values are increased are in the minority, setting the value of the element minus P/2 as the new value of the element;

if, for every adjacent pixel, the absolute value of the difference between the value of the corresponding element and the value of the element is not less than (P/2) - 1, analyzing how the values of the corresponding elements of all the adjacent pixels change when the visual disturbance processing is carried out on them;

if, among all the adjacent pixels, the pixels whose corresponding element values are increased are in the majority, setting the value of the element plus P/2 as the new value of the element;

and if, among all the adjacent pixels, the pixels whose corresponding element values are increased are in the minority, setting the value of the element minus P/2 as the new value of the element.

Preferably, after extracting each frame of image of the video data, the method further comprises:

carrying out scene division on all the extracted images;

respectively establishing a tensor model for each divided scene;

and respectively carrying out visual disturbance processing on the tensor model of each scene, and carrying out low-rank approximation processing on the tensor model of each scene to obtain compressed video data.

Preferably, carrying out scene division on all the extracted images includes:

respectively selecting pixel points for dividing the scene for each frame of image;

calculating the sum of absolute values of differences between the pixel point of the starting frame image and the pixel point of the subsequent image;

and carrying out scene division on all the images according to the sum of the absolute values of the differences between the pixel point of the initial frame image and the pixel point of the subsequent image.

Preferably, the performing low-rank approximation processing on the tensor model to obtain compressed video data includes:

converting the tensor model into a matrix;

decomposing the matrix to obtain an expression of the matrix;

and selecting a set number of low-rank vectors in the expression as compressed video data.

A video compression apparatus comprising:

the data acquisition unit is used for acquiring video data to be compressed;

the image extraction unit is used for extracting each frame of image of the video data;

the model establishing unit is used for respectively establishing tensor models for each frame of image;

and the compression processing unit is used for respectively carrying out visual disturbance processing on the tensor model of each frame of image and carrying out low-rank approximation processing on the tensor model of each frame of image to obtain compressed video data.

Preferably, when the compression processing unit performs the visual perturbation processing on the tensor model of each frame of image, the compression processing unit is specifically configured to:

Preferably, when the compression processing unit resets the value of the element according to the value of the corresponding element of the adjacent pixel, the compression processing unit is specifically configured to:

Preferably, the apparatus further comprises: the scene division unit is used for carrying out scene division on all the extracted images;

correspondingly, the model establishing unit respectively establishes tensor models for each divided scene.

Preferably, when the scene dividing unit carries out scene division on all the extracted images, the scene dividing unit is specifically configured to:

respectively selecting pixel points for dividing the scene for each frame of image; calculating the sum of absolute values of differences between the pixel point of the starting frame image and the pixel point of the subsequent image; and carrying out scene division on all the images according to the sum of the absolute values of the differences between the pixel point of the initial frame image and the pixel point of the subsequent image.

Preferably, the compression processing unit performs low-rank approximation processing on the tensor model to obtain compressed video data, and is specifically configured to:

converting the tensor model into a matrix; decomposing the matrix to obtain an expression of the matrix; and selecting a set number of low-rank vectors in the expression as compressed video data.

When the video compression method provided by the invention is adopted to compress the video data, tensor models are respectively established for each frame image of the video data, and each tensor model is subjected to visual disturbance processing and low-rank approximation processing to obtain the compressed video data. The visual disturbance processing can reduce visual redundant information in the video image, and meanwhile, the tensor model of the image is subjected to low-rank approximation processing, so that the amount of stored video data can be reduced, and therefore the method combines the visual characteristics of human eyes and realizes video compression more efficiently.

Drawings

In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly described below. Obviously, the drawings in the following description are only embodiments of the present invention, and those skilled in the art can obtain other drawings from the provided drawings without creative effort.

Fig. 1 is a schematic flow chart of a video compression method according to an embodiment of the present invention;

Fig. 2 is a schematic flow chart of another video compression method according to an embodiment of the present invention;

Fig. 3 is a schematic structural diagram of a video compression apparatus according to an embodiment of the present invention;

Fig. 4 is a schematic structural diagram of another video compression apparatus according to an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The embodiment of the invention discloses a video compression method, which is shown in figure 1 and comprises the following steps:

S101, acquiring video data to be compressed;

specifically, the embodiments of the present invention process video data in a digital format. Because color video data is common in daily life and in communication, there is wide demand for compressing it, and the embodiments of the present invention therefore focus on the compression of color video data. Where circumstances allow, compressing black-and-white video data according to the technical solution of the embodiments also falls within the protection scope of the embodiments of the present invention.

S102, extracting each frame image of the video data;

specifically, the compression of video data generally includes compression processing of each frame of image of the video and processing of inter-frame correlation. When compressing video data, the embodiment of the invention first decomposes the video data frame by frame into images and then compresses each frame of image. Generally, a computer stores and processes digital images in RGB form, so the embodiment of the present invention decomposes the video data into single-frame RGB images. It is understood that the embodiment of the present invention may also decompose the video data into images in other forms, such as HSV images or CMYK images.

It should be noted that, as the above description shows, the compression processing of video data in the embodiment of the present invention is in fact compression processing of each frame of image of the video data. The embodiment of the present invention can therefore also be used to compress a single digital image, in the same way as each frame of image of the video data is compressed in the description below.

S103, establishing a tensor model for each frame of image respectively;

specifically, a tensor is a high-order array, and an n-th order tensor is an array with n index sets. For example, an m × n matrix A is a second-order tensor with two index sets, a row index set containing m indices and a column index set containing n indices. If every element of a tensor is greater than or equal to 0, it is called a non-negative tensor.

A frame of an RGB color image contains three components, red (R), green (G) and blue (B); that is, it consists of three layers R, G and B, each with two dimensions, length and width. A frame of an RGB color image therefore corresponds to a third-order tensor A[1:H, 1:Q, 1:3], where H represents the length of the image, Q represents the width of the image, and 1:3 indexes the R, G and B layers of the image. Since no color value is less than 0, the tensor of each frame of image is a non-negative tensor.

The tensor model is established for each frame of image, which is equivalent to representing each frame of image by the tensor model, so the processing of the tensor model is equivalent to the processing of the image.
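As an illustration of this correspondence, the following is a minimal sketch that assumes NumPy and a randomly generated stand-in frame; the sizes `H` and `Q` and the variable names are hypothetical and do not come from the embodiment.

```python
import numpy as np

H, Q = 4, 6  # hypothetical frame size: H rows (length), Q columns (width)

# Stand-in for a decoded RGB frame; a real frame would come from a video decoder.
rng = np.random.default_rng(0)
frame = rng.integers(0, 256, size=(H, Q, 3), dtype=np.uint8)

# Third-order tensor A[1:H, 1:Q, 1:3]. A wider integer type leaves room
# for later arithmetic that could overflow 8 bits.
A = frame.astype(np.int32)

red, green, blue = A[:, :, 0], A[:, :, 1], A[:, :, 2]  # the three layers
print(A.ndim, A.shape)       # order of the tensor and its index ranges
print(bool((A >= 0).all()))  # non-negativity check
```

NumPy indices start at 0, so the element written B[j, k, m] in the text corresponds to `A[j - 1, k - 1, m - 1]` in this sketch.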

And S104, respectively carrying out visual disturbance processing on the tensor model of each frame of image, and carrying out low-rank approximation processing on the tensor model of each frame of image to obtain compressed video data.

Specifically, the sensitivity of human eyes to color images has an upper limit, and human eyes cannot perceive certain finer color expressions in a color image. Therefore, to make an image occupy less storage space, repeated color elements or more vivid color elements in the image can be adjusted so that they occupy less storage space. Performing visual disturbance processing on the tensor model of a color image serves this purpose of reducing the amount of storage.

The tensor model of an image contains a large number of components, including low-rank components and high-rank components. The main information is concentrated in the low-rank components, and the high-rank components carry comparatively little information, so the low-rank components of the tensor can replace the information contained in the whole tensor. This reduces the data volume of the tensor without affecting the definition of the image. Based on this principle, the embodiment of the invention performs low-rank approximation processing on the tensor model of the image to reduce the data volume of the tensor model, thereby compressing the image and, in turn, the video data.

It should be noted that the order of the visual disturbance processing and the low-rank approximation processing performed on the tensor model of the image may be exchanged, and the compression performance of the technical scheme of the embodiment of the present invention is not affected.

When the video compression method provided by the embodiment of the invention is adopted to compress video data, tensor models are respectively established for each frame of image of the video data, and visual disturbance processing and low-rank approximation processing are carried out on each tensor model to obtain the compressed video data. The visual disturbance processing can reduce visual redundant information in the video image, and meanwhile, the tensor model of the image is subjected to low-rank approximation processing, so that the amount of stored video data can be reduced, and therefore the method combines the visual characteristics of human eyes and realizes video compression more efficiently.

Optionally, in another embodiment of the present invention, respectively carrying out visual disturbance processing on the tensor model of each frame of image includes:

Specifically, the elements of each pixel in the tensor model of each frame of image are the three components R, G and B of that pixel, and the value of each element is between 0 and 255. For example, assuming that the tensor model of the image is B[1:H, 1:Q, 1:3], one element of one pixel can be represented as B[j, k, m], where 1 ≤ j ≤ H, 1 ≤ k ≤ Q and 1 ≤ m ≤ 3. Performing visual disturbance processing on each element of each pixel of a frame of image completes the visual disturbance processing of the whole frame.

For the specific visual disturbance processing of each element, i.e. adjusting the value of the element, the following illustrates the specific process of the visual disturbance processing:

first, a compression precision value P is set, which is generally a positive integer power of 2, for example, P = 8;

then the value of the element B[j, k, m] is taken modulo 8 to obtain a modulo value i. If i is less than 4, B[j, k, m] = B[j, k, m] - i is set; if i is greater than 4, B[j, k, m] = B[j, k, m] + 8 - i is set; if i is equal to 4, whether to add 4 to or subtract 4 from B[j, k, m] is decided according to the values of the corresponding elements of the pixels adjacent to B[j, k, m].

It should be noted that, if the image in which B[j, k, m] is located is the first frame image, the adjacent pixels of B[j, k, m] include the upper, lower, left and right adjacent pixels of B[j, k, m] in the current frame image; if the image in which B[j, k, m] is located is not the first frame image, the adjacent pixels of B[j, k, m] include the upper, lower, left and right adjacent pixels of B[j, k, m] in the current frame image and the pixel element corresponding to B[j, k, m] in the previous frame image.
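The per-element rule described above can be sketched as follows. This is only an illustrative reading of the rule: the function name `perturb_element` and the use of `None` to flag the tie case are assumptions made here, and the tie itself (i equal to P/2) is left unresolved because it depends on the adjacent pixels.

```python
def perturb_element(value, p=8):
    """Move one color component to a multiple of p, or return None on a tie.

    The name and the None convention are hypothetical; the rule itself
    follows the three cases i < p/2, i > p/2 and i == p/2.
    """
    if p < 2 or p & (p - 1) != 0:
        raise ValueError("p must be a positive integer power of 2")
    i = value % p                # modulo value
    if i < p // 2:
        return value - i         # move down to the multiple of p below
    if i > p // 2:
        return value + p - i     # move up to the multiple of p above
    return None                  # i == p/2: decided from adjacent pixels


for v in (203, 206, 204):
    print(v, perturb_element(v))
```

With P = 8, the value 203 has modulo value 3 and becomes 200, 206 has modulo value 6 and becomes 208, and 204 has modulo value 4 and is left for the neighbor-based step. Note that 255 has modulo value 7 and would become 256, which lies outside 0 to 255; the text does not say how such a value is handled, so this sketch does not clamp it.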

Optionally, in another embodiment of the present invention, the resetting the value of the element according to the values of the corresponding elements of the adjacent pixels includes:

specifically, as in the example of the above embodiment, the absolute value of the difference between the pixel element B[j, k, m] and the corresponding element of each adjacent pixel is calculated, so the number of absolute values obtained equals the number of adjacent pixels.

If, among these absolute values, the number of absolute values smaller than 3 (that is, P/2 - 1) is greater than the number of absolute values larger than 3, the change of the value of B[j, k, m] is determined according to how the corresponding elements of the adjacent pixels whose absolute values are smaller than 3 change during the visual disturbance processing. Specifically, if, among those corresponding elements, the elements whose values increase during the visual disturbance processing are in the majority, the value of B[j, k, m] is increased by 4, that is, B[j, k, m] = B[j, k, m] + 4; otherwise, the value of B[j, k, m] is reduced by 4, that is, B[j, k, m] = B[j, k, m] - 4.

If none of the calculated absolute values is smaller than 3, whether to add 4 or subtract 4 when the visual disturbance processing is performed on B[j, k, m] is determined according to how the values of the corresponding elements of all the adjacent pixels of B[j, k, m] change during the visual disturbance processing.

If, among the corresponding elements of the adjacent pixels of B[j, k, m], the elements whose values increase during the visual disturbance processing are in the majority, the value of B[j, k, m] is increased by 4, that is, B[j, k, m] = B[j, k, m] + 4; if the elements whose values decrease are in the majority, the value of B[j, k, m] is reduced by 4, that is, B[j, k, m] = B[j, k, m] - 4.

It should be noted that, whether or not any of the calculated absolute values is smaller than 3, if the number of elements whose values increase equals the number of elements whose values decrease when the visual disturbance processing of the corresponding elements of the adjacent pixels is analyzed, the value of B[j, k, m] may be either increased by 4 or reduced by 4.
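A possible reading of this tie-breaking step is sketched below. Several choices are assumptions made for illustration and are not fixed by the text: the names `direction` and `resolve_tie` are hypothetical, close neighbors vote whenever at least one exists (which fills the case the example leaves open), a neighbor that is unchanged or is itself a tie casts no vote, and an even vote is resolved upward.

```python
def direction(value, p=8):
    """+1 if the rounding rule raises value, -1 if it lowers it, 0 otherwise."""
    i = value % p
    if i == 0 or i == p // 2:
        return 0  # unchanged, or itself a tie: casts no vote (assumption)
    return -1 if i < p // 2 else 1


def resolve_tie(value, neighbours, p=8):
    """Move a tied element by +p/2 or -p/2, following its neighbors.

    neighbours holds the corresponding elements of the adjacent pixels
    (up, down, left, right, and the previous frame when there is one).
    """
    # Neighbors whose absolute difference is smaller than p/2 - 1.
    near = [n for n in neighbours if abs(n - value) < p // 2 - 1]
    voters = near if near else list(neighbours)
    ups = sum(1 for n in voters if direction(n, p) > 0)
    downs = sum(1 for n in voters if direction(n, p) < 0)
    # Equal counts: either choice is allowed; this sketch moves up.
    return value + p // 2 if ups >= downs else value - p // 2


print(resolve_tie(204, [205, 206, 203, 201, 214]))
print(resolve_tie(204, [193, 194, 218]))
```

In the first call the close neighbors are 205, 206 and 203; two of them move up and one moves down, so 204 becomes 208. In the second call no neighbor is close, all three neighbors move down, and 204 becomes 200.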

Optionally, in another embodiment of the present invention, referring to fig. 2, after each frame of image of the video data is extracted, the method further includes:

S203, carrying out scene division on all the extracted images;

specifically, the images of the same scene are highly similar. Grouping the images of the same scene and compressing them together reduces the content repeated among the images and improves the compression ratio. Therefore, the embodiment of the invention compresses the image frames of the video by taking the scene as a unit. First, all the images contained in the video data are divided into scenes, the images of the same scene are gathered together, and the subsequent compression processing is carried out on them.

S204, respectively establishing a tensor model for each divided scene;

specifically, the above embodiment describes that the tensor model of one frame of image can be expressed as A[1:H, 1:Q, 1:3]. Assuming that a scene has r frames of images, the scene corresponds to a fourth-order tensor B[1:H, 1:Q, 1:3, 1:r], where B[1:H, 1:Q, 1:3, k] is the data corresponding to the k-th RGB color image in the scene. Since each element of the RGB data is between 0 and 255, the tensor corresponding to each scene is a fourth-order non-negative tensor.
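A short sketch of how the frames of one scene could be stacked into such a fourth-order tensor, assuming NumPy and randomly generated stand-in frames; the sizes and names are hypothetical.

```python
import numpy as np

H, Q, r = 4, 6, 5  # hypothetical frame size and number of frames in the scene
rng = np.random.default_rng(1)
frames = [rng.integers(0, 256, size=(H, Q, 3)) for _ in range(r)]

# Stack along a new last axis so that B[:, :, :, k] is the k-th frame.
B = np.stack(frames, axis=-1)

k = 2
same = np.array_equal(B[:, :, :, k], frames[k])
print(B.shape, same)
```

Placing the frame index last mirrors the index order B[1:H, 1:Q, 1:3, 1:r] used in the text; NumPy counts frames from 0 rather than 1.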

S205, performing visual disturbance processing on the tensor model of each scene, and performing low-rank approximation processing on the tensor model of each scene to obtain compressed video data.

The specific visual disturbance processing and low-rank approximation processing are the same as those performed on each frame of image in the above embodiment.

Steps S201 and S202 in this embodiment correspond to steps S101 and S102 in the method embodiment shown in Fig. 1, respectively. For their specific content, please refer to the corresponding content of the method embodiment shown in Fig. 1, which is not described herein again.

Optionally, in another embodiment of the present invention, carrying out scene division on all the extracted images includes:

specifically, assuming that the tensor model of the image is A[1:H, 1:Q, 1:3], the diagonal elements of each layer (R layer, G layer and B layer) of the tensor are selected as the pixel points for dividing the scene; the elements of a middle row and of a middle column may also be taken.

Specifically, the first frame image of the video is taken as the starting frame of the first scene, and the sum of the absolute values of the differences between the pixel points for dividing the scene in the starting frame image and those in each subsequent image is calculated. If the sum is not greater than 150 × H, the subsequent image is considered to belong to the first scene; when the sum is greater than 150 × H, that image is used as the starting frame of the next scene. Scene division of the whole video data is completed in the same way.

For example, assuming that the sums of the absolute values of the differences calculated for the first N frame images are less than 150 × H, and that the sum calculated for the (N+1)-th frame image is greater than 150 × H, the first N frame images are divided into one scene, and the (N+1)-th frame image is used as the starting frame of the next scene.
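The division procedure can be sketched as follows, assuming NumPy. The function names are hypothetical, and the sketch assumes that the diagonal, the middle row and the middle column are all used together as the pixel points for dividing the scene, which is only one reading of the selection rule.

```python
import numpy as np


def scene_pixels(frame):
    """Diagonal, middle row and middle column of each color layer."""
    h, q, _ = frame.shape
    d = np.arange(min(h, q))
    parts = [frame[d, d, :], frame[h // 2, :, :], frame[:, q // 2, :]]
    # Widen the type so that differences of 8-bit values cannot wrap around.
    return np.concatenate(parts, axis=0).astype(np.int64)


def split_scenes(frames, threshold_per_row=150):
    """Return the index of the starting frame of each scene."""
    if not frames:
        return []
    starts = [0]
    ref = scene_pixels(frames[0])
    limit = threshold_per_row * frames[0].shape[0]  # 150 x H
    for idx in range(1, len(frames)):
        cur = scene_pixels(frames[idx])
        if np.abs(cur - ref).sum() > limit:
            starts.append(idx)  # this frame starts the next scene
            ref = cur
    return starts


dark = np.zeros((8, 8, 3), dtype=np.uint8)
bright = np.full((8, 8, 3), 200, dtype=np.uint8)
print(split_scenes([dark, dark, bright, bright, dark]))
```

Each frame is compared with the starting frame of the current scene rather than with its immediate predecessor, so a slow drift away from the starting frame eventually opens a new scene as well.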

Optionally, in another embodiment of the present invention, performing low rank approximation processing on the tensor model to obtain compressed video data includes:

converting the tensor model into a matrix;

decomposing the matrix to obtain an expression of the matrix;

For example, taking the low-rank approximation processing of the tensor model of a scene as an example: first, the fourth-order tensor B[1:H, 1:Q, 1:3, 1:r] is converted into a matrix C[1:H, 1:Q×3×r], and matrix decomposition is carried out on it.

Each v_i is then converted into a matrix D[1:Q, 1:3r], and a second decomposition is carried out.

Next, the first k terms λ_i σ_j with the larger coefficients and their corresponding vectors are selected (the fractional parts being reduced to units of 1/32), and the tensor B can be expressed in terms of them, with the residual

D = B - C, where the components of the vectors are integers between -32 and 32. The data to be saved then includes the restoration vectors and the residual tensor: λ_i, u_i, w_i, x_i and D; that is, the video data after compression processing is λ_i, u_i, w_i, x_i and D. Because the element values in D are small, the storage they require is small. On the other hand, since 3 × k is much smaller than H × Q × 3 × r, the storage required for the integer vectors is also relatively small.
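The general idea of keeping a few leading components plus a residual can be sketched as below. This is not the two-stage decomposition of the embodiment: it substitutes a single truncated singular value decomposition from NumPy, omits the second decomposition and the 1/32 quantization, and uses hypothetical function names.

```python
import numpy as np


def low_rank_compress(B, k):
    """Unfold B to an H x (Q*3*r) matrix and keep k singular triplets."""
    H = B.shape[0]
    C = B.reshape(H, -1).astype(np.float64)
    U, s, Vt = np.linalg.svd(C, full_matrices=False)
    approx = (U[:, :k] * s[:k]) @ Vt[:k, :]
    residual = (C - approx).reshape(B.shape)  # what the k terms miss
    return U[:, :k], s[:k], Vt[:k, :], residual


def reconstruct(U, s, Vt, residual):
    """Rebuild the tensor from the kept terms plus the residual."""
    approx = (U * s) @ Vt
    return approx.reshape(residual.shape) + residual


rng = np.random.default_rng(2)
B = rng.integers(0, 256, size=(6, 5, 3, 4))  # stand-in scene tensor
U, s, Vt, D = low_rank_compress(B, k=2)
restored = reconstruct(U, s, Vt, D)
print(U.shape, s.shape, Vt.shape, bool(np.allclose(restored, B)))
```

The residual in this sketch is held in floating point, so it illustrates the bookkeeping of "kept terms plus residual" rather than any storage saving; a saving would depend on quantizing the vectors and on the residual values being small, as the text describes.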

The embodiment of the invention also discloses a video compression device, which is shown in fig. 3 and comprises:

a data acquisition unit 301 configured to acquire video data to be compressed;

an image extracting unit 302, configured to extract each frame of image of the obtained video data;

a model establishing unit 303, configured to respectively establish a tensor model for each frame of image;

and the compression processing unit 304 is configured to perform visual disturbance processing on the tensor model of each frame of image, and perform low-rank approximation processing on the tensor model of each frame of image, so as to obtain compressed video data.

Specifically, please refer to the content of the corresponding method embodiment for the specific working content of each unit in this embodiment, which is not described herein again.

Optionally, in another embodiment of the present invention, when the compression processing unit 304 performs the visual perturbation processing on the tensor model of each frame of image, specifically, the compression processing unit is configured to:

Specifically, please refer to the content of the corresponding method embodiment for the specific work content of the compression processing unit 304 in this embodiment, which is not described herein again.

Optionally, in another embodiment of the present invention, when the compression processing unit 304 resets the value of the element according to the value of the corresponding element of the adjacent pixel, specifically, the compression processing unit is configured to:

Optionally, in another embodiment of the present invention, referring to fig. 4, the apparatus further includes: a scene division unit 305, configured to perform scene division on all the extracted images;

correspondingly, the model establishing unit 303 establishes a tensor model for each divided scene.

Specifically, for the specific work content of the scene partitioning unit 305 and the model establishing unit 303 in this embodiment, please refer to the content of the corresponding method embodiment, which is not described herein again.

Optionally, in another embodiment of the present invention, when the scene dividing unit 305 carries out scene division on all the extracted images, it is specifically configured to:

Specifically, please refer to the content of the corresponding method embodiment for the specific work content of the scene dividing unit 305 in this embodiment, which is not described herein again.

Optionally, in another embodiment of the present invention, when the compression processing unit 304 performs low-rank approximation processing on the tensor model to obtain compressed video data, the compression processing unit is specifically configured to:

The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. A method of video compression, comprising:

acquiring video data to be compressed;

extracting each frame image of the video data;

respectively establishing a tensor model for each frame of image;

respectively carrying out visual disturbance processing on the tensor model of each frame of image, and carrying out low-rank approximation processing on the tensor model of each frame of image to obtain compressed video data;

wherein, the respectively performing visual disturbance processing on the tensor model of each frame of image comprises:

if the obtained modulo value is equal to P/2, resetting the value of the element according to the values of the corresponding elements of the adjacent pixels; wherein the adjacent pixels include the pixels adjacent to the pixel where the element is located in the current frame of image and the pixel corresponding to the pixel where the element is located in the previous frame of image, and the resetting the value of the element according to the values of the corresponding elements of the adjacent pixels includes:

2. The method of claim 1, wherein after extracting each frame of image of the video data, the method further comprises:

carrying out scene division on all the extracted images;

respectively establishing a tensor model for each divided scene;

3. The method according to claim 2, wherein the carrying out scene division on all the extracted images comprises:

4. The method of claim 1 or 2, wherein performing low rank approximation on the tensor model to obtain compressed video data comprises:

converting the tensor model into a matrix;

decomposing the matrix to obtain an expression of the matrix;

5. A video compression apparatus, comprising:

the data acquisition unit is used for acquiring video data to be compressed;

the compression processing unit is used for respectively performing visual disturbance processing on the tensor model of each frame of image and performing low-rank approximation processing on the tensor model of each frame of image to obtain compressed video data;

when the compression processing unit performs the visual disturbance processing on the tensor model of each frame of image, the compression processing unit is specifically configured to:

if the obtained modulo value is equal to P/2, resetting the value of the element according to the values of the corresponding elements of the adjacent pixels; wherein the adjacent pixels include the pixels adjacent to the pixel where the element is located in the current frame of image and the pixel corresponding to the pixel where the element is located in the previous frame of image, and the compression processing unit is specifically configured to, when resetting the value of the element according to the values of the corresponding elements of the adjacent pixels:

6. The apparatus of claim 5, further comprising: the scene division unit is used for carrying out scene division on all the extracted images;

7. The apparatus according to claim 6, wherein the scene division unit, when carrying out scene division on all the extracted images, is specifically configured to:

8. The apparatus according to claim 5 or 6, wherein the compression processing unit is configured to perform low rank approximation processing on the tensor model to obtain the compressed video data, and is specifically configured to: