Summary of the invention
In view of this, the present invention provides a video compression method based on a convolutional neural network and salient information in the HEVC compression domain. The method organically combines video coding with the human visual system, and can remove more subjective visual perceptual redundancy, further improving the video compression effect while enhancing the subjective perceptual quality for the human eye.
To achieve the above goals, the present invention adopts the following technical scheme:
A video compression method based on a convolutional neural network and salient information in the HEVC compression domain, the method comprising the following steps:
performing saliency detection on the input video by combining, on the basis of a convolutional neural network, the motion estimation results of each CU block obtained during HEVC compression;
calculating the saliency value of each CU block and selecting its corresponding QP value, adding the saliency value of each CU block into the conventional rate-distortion calculation to obtain the final rate-distortion optimization target, thereby realizing high-quality perceptual video coding.
The beneficial effects of the present invention are as follows: the method improves and strengthens HEVC in two respects, namely attention-based saliency detection and perception-first video compression. For saliency detection, the method adaptively and dynamically fuses, on the basis of a convolutional neural network, the motion estimation results of each CU obtained during HEVC compression, so as to complete the saliency detection of the input video. For perception-first video compression, the QP of each CU is selected according to its saliency value, ensuring that highly salient CUs are encoded with a smaller QP, and the salient characteristics of the current CU block are incorporated into the conventional rate-distortion calculation, thereby achieving the goal of perception-first coding. The method reduces the perceptual redundancy of the video and thus obtains a better compression effect.
On the basis of the above scheme, the technical solution of the present invention is further explained below.
Further, performing saliency detection on the input video specifically includes the following steps:
inputting an original video frame and performing spatial-domain saliency detection on the input video frame with a convolutional neural network, generating a spatial saliency detection result;
generating a temporal motion saliency result from the motion vectors obtained by the inter-prediction process during HEVC compression;
fusing the spatial saliency detection result and the temporal motion saliency result using an entropy-uncertainty algorithm.
Further, the convolutional neural network structure comprises:
(1) convolutional layers: feature maps representing local image features are obtained after the convolution operations, and a rectified linear unit follows each convolutional layer. Since the spatial relationships between image pixels are local, considering only the local information of a pixel is far less complex than considering global information, and a feature map representing local image features can be obtained after each convolution operation. Each convolution operation is generally followed by a rectified linear unit (ReLU), an activation function that is fast to compute and effectively alleviates the vanishing-gradient problem;
(2) a local response normalization layer: this layer smooths the output of an intermediate layer of the neural network; its output is as follows:

b_i(x, y) = a_i(x, y) / ( k + α · Σ_j a_j(x, y)² )^β, where the sum over j runs from max(0, i − n/2) to min(N − 1, i + n/2)

wherein (x, y) denotes the pixel position, i denotes the channel index, N is the number of channels, and α, β, k, n are user-defined constants; l denotes the l-th local response normalization layer, and j denotes the corresponding channel index;
(3) a maximum pooling layer: the maximum pooling layer extracts locally similar semantic information. The layer applies an N × N sliding window with a moving step of N, and takes the maximum value of the region covered by the window in the original feature map as the pixel value at the corresponding position of the new feature map;
(4) a deconvolution layer: this layer scales the small-sized feature map back to the size of the original image, yielding the final output.
Further, the temporal motion saliency result is generated as follows: motion information is extracted from the video compression domain by shallow decoding in HEVC, the motion vector information of each prediction unit (PU) in the video frame is obtained, and the magnitudes of the motion vectors, taken as the severity of block motion, are reassembled into a temporal motion feature map.
Further, generating the temporal motion saliency result from the motion vectors obtained by the inter-prediction process during HEVC compression specifically includes the following steps:
extracting motion information from the video compression domain by shallow decoding in HEVC, and obtaining the motion vector information of each prediction unit (PU) in the video frame;
reassembling the magnitudes of the motion vector information, taken as the severity of block motion, into a temporal motion feature map.
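As an illustrative sketch (not part of the claimed method), the reassembly of PU motion-vector magnitudes into a temporal motion feature map can be expressed in Python with NumPy; the function name and array layout are assumptions:

```python
import numpy as np

def motion_feature_map(mv_x, mv_y):
    """Reassemble per-PU motion-vector components into a temporal motion
    feature map: the magnitude of each motion vector is taken as the
    severity of that block's motion.

    mv_x, mv_y: 2-D arrays of the horizontal/vertical motion-vector
    components extracted from the HEVC bitstream (one entry per PU).
    """
    mv_x = np.asarray(mv_x, dtype=np.float64)
    mv_y = np.asarray(mv_y, dtype=np.float64)
    return np.sqrt(mv_x ** 2 + mv_y ** 2)
```

The result is one motion-severity value per PU position, which downstream steps treat as the raw temporal feature map.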
Further, on the basis of the temporal motion feature map, a global motion estimation algorithm with a perspective model is used to obtain the global motion information in the video. The process can be stated as:

x′ = (m0·x + m1·y + m2) / (m6·x + m7·y + 1)
y′ = (m3·x + m4·y + m5) / (m6·x + m7·y + 1)

wherein (x, y) and (x′, y′) are corresponding pixel positions in the current frame and the reference frame, respectively, and the parameter set m = [m0, ..., m7] represents the global motion parameters to be estimated.
The model is solved by gradient descent to compute the global motion that represents the camera motion, and the global motion is subtracted from the original motion to obtain the foreground motion relative to the background.
According to the perceptual prior distribution of motion speed, which follows the power function

p(v) = k · v^(−α)

wherein v denotes the motion speed and k and α are constants, the temporal saliency of motion is calculated from its self-information as follows:

S(t) = −log p(v) = α·log v + β

where β = −log k, with α = 0.2 and β = 0.09. The result is finally normalized to [0, 1] to obtain the temporal saliency map.
Further, fusing the spatial saliency detection result and the temporal motion saliency result using the entropy-uncertainty algorithm comprises:
merging the computed spatial saliency map and the temporal saliency map into an overall spatio-temporal saliency map, the fused saliency map being calculated with the following formula:

wherein U(t) denotes the perceptual uncertainty of the temporal domain; U(s) denotes the uncertainty of the spatial saliency; S(t) denotes the temporal saliency of motion; and S(s) denotes the spatial saliency.
Further, calculating the saliency value of each CU block and selecting its corresponding QP value specifically includes the following steps:
calculating the saliency value of each CU block with the following formula:

wherein S_{n×n}(k) denotes the saliency value of the k-th CU block, the size of the k-th CU block is n × n, i denotes the left-to-right coordinate within the n × n block, and j denotes the top-to-bottom coordinate;
calculating the average saliency value of all CU blocks with the following formula:

wherein S_avg denotes the average saliency value of all CU blocks, width denotes the width of the video frame, and height denotes its height;
dynamically adjusting the QP value of the current frame according to the computed saliency value of the current CU block and the average saliency value of all CU blocks, obtaining the perceptual QP value of the current CU block.
Further, the perceptual QP value of the current CU block is calculated as:

wherein QP_c denotes the QP value of the current frame, QP_k denotes the perceptual QP value of the current CU block, and w_k denotes a transformation parameter calculated as:

wherein a, b, c are constant parameters, S(k) denotes the saliency value of the k-th CU block, and S_avg denotes the average saliency value of all CU blocks.
Further, obtaining the final rate-distortion optimization target specifically includes the following steps:
obtaining the saliency value of each CU block in the video and calculating the perception-first distortion;
adding the perception-first distortion into the conventional rate-distortion calculation to obtain the final optimization target.
Further, the perception-first distortion is calculated as follows:

Ds = D × (1 + SF × SD)

wherein D is the distortion computed by the HM standard method; SF denotes the perceptual optimization parameter specified in the configuration file; and SD denotes the saliency deviation of the current coding block.
SD is calculated with the following formula:

wherein the value range of SD is (−1, 1), S_cu denotes the saliency of the current block, and S_avg denotes the average saliency value of all CU blocks of the current frame.
Further, adding the perception-first distortion into the conventional rate-distortion calculation to obtain the final optimization target comprises:
by analogy with the conventional rate-distortion optimization algorithm, the improved Lagrangian optimization target may be expressed as:

min{Ds + λR}

wherein Ds denotes the perceptual distortion incorporating the saliency of the current block; λ denotes the Lagrange multiplier; and R denotes the coding bit rate.
Specific embodiment
The technical solutions in the embodiments of the present invention will be described below clearly and completely with reference to the drawings in the embodiments of the present invention. Obviously, the described embodiments are only a part of the embodiments of the present invention, rather than all of them. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative effort shall fall within the protection scope of the present invention.
Referring to Fig. 1, an embodiment of the present invention discloses a video compression method based on a convolutional neural network and salient information in the HEVC compression domain, the method comprising the following steps:
S1: performing saliency detection on the input video by combining, on the basis of a convolutional neural network, the motion estimation results of each CU block obtained during HEVC compression;
S2: calculating the saliency value of each CU block and selecting its corresponding QP value, adding the saliency value of each CU block into the conventional rate-distortion calculation to obtain the final rate-distortion optimization target, thereby realizing high-quality perceptual video coding.
Referring to Fig. 2, the overall implementation process of spatio-temporal saliency detection and perceptual compression for high-definition video in this method is as follows: after an original video frame is input, spatial saliency detection is performed on the input frame with a convolutional neural network; at the same time, the temporal motion saliency result is generated from the motion vectors obtained by the inter-prediction process during HEVC compression, and the spatio-temporal saliency is fused with the entropy-uncertainty method to obtain the spatio-temporal saliency result of the video, which provides a sound basis for the subsequent video compression. In the video coding part, the HEVC standard algorithm can be optimized: after the visual saliency of the video is obtained, the salient regions (the regions that people are statistically more likely to attend to) are given better compression quality, while the compression quality of non-salient regions can be appropriately reduced, without introducing excessive distortion, so as to lower the video bit rate. In addition, starting from the core idea of rate-distortion optimization, a saliency-weighted rate-distortion optimization algorithm can effectively improve the perceptual quality of the compressed video.
Referring to Fig. 3, performing saliency detection on the input video specifically includes the following steps:
S101: inputting an original video frame and performing spatial saliency detection on it with a convolutional neural network, generating the spatial saliency detection result;
S102: generating the temporal motion saliency result from the motion vectors obtained by the inter-prediction process during HEVC compression;
S103: fusing the spatial saliency detection result and the temporal motion saliency result using the entropy-uncertainty algorithm.
Specifically, the structure of the above convolutional neural network is shown in Fig. 4.
The structure and function of each layer of the convolutional neural network are as follows:
(1) convolutional layers: since the spatial relationships between image pixels are local, considering only the local information of a pixel is far less complex than considering global information, and a feature map representing local image features can be obtained after each convolution operation. Each convolution operation is generally followed by a rectified linear unit (ReLU), an activation function that is fast to compute and effectively alleviates the vanishing-gradient problem.
(2) a local response normalization layer: this layer smooths the output of an intermediate layer of the neural network, which helps improve the generalization ability of the model. Its output is as follows:

b_i(x, y) = a_i(x, y) / ( k + α · Σ_j a_j(x, y)² )^β, where the sum over j runs from max(0, i − n/2) to min(N − 1, i + n/2)

wherein (x, y) denotes the pixel position, i denotes the channel index, N is the number of channels, and α, β, k, n are user-defined constants; l denotes the l-th local response normalization layer, and j denotes the corresponding channel index.
(3) a maximum pooling layer: the maximum pooling layer extracts locally similar semantic information. The layer applies an N × N sliding window with a moving step of N, and takes the maximum value of the region covered by the window in the original feature map as the pixel value at the corresponding position of the new feature map. The pooling operation reduces the size of the output and thereby reduces over-fitting.
(4) a deconvolution layer: this layer scales the small-sized feature map back to the size of the original image, yielding the final output.
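As a minimal sketch of two of the layers described above, the following NumPy code implements channel-wise local response normalization (assuming the standard AlexNet-style form, consistent with the constants named in the text) and N × N max pooling with stride N; it is an illustration, not the claimed network:

```python
import numpy as np

def local_response_norm(a, k=2.0, alpha=1e-4, beta=0.75, n=5):
    """Local response normalization over channels.
    a: feature maps of shape (C, H, W); k, alpha, beta, n are the
    user-defined constants mentioned in the text."""
    C = a.shape[0]
    out = np.empty_like(a, dtype=np.float64)
    for i in range(C):
        lo, hi = max(0, i - n // 2), min(C - 1, i + n // 2)
        denom = (k + alpha * np.sum(a[lo:hi + 1] ** 2, axis=0)) ** beta
        out[i] = a[i] / denom
    return out

def max_pool(a, N):
    """N x N max pooling with stride N over a 2-D feature map: each
    output pixel is the maximum of the region covered by the window."""
    H, W = a.shape
    H2, W2 = H // N, W // N
    return a[:H2 * N, :W2 * N].reshape(H2, N, W2, N).max(axis=(1, 3))
```

In a real network these operations would run per layer on learned feature maps; here they only demonstrate the arithmetic.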
For example, the above network may be trained on the SALICON data set, which contains 9,000 images, to obtain the spatial saliency network used in the present invention.
With the trained network model, forward propagation on the input three-channel image yields the final spatial saliency map. As shown in Figs. 5a-5b, the network effectively computes the salient regions in a picture.
Fig. 6 is a schematic diagram of the motion vectors of a video frame. The temporal motion feature map obtained by the above process, however, contains the total motion in the video frame, whereas experiments show that the motion of foreground objects relative to the background stimulates the human eye more noticeably. Therefore, further, the present invention uses a global motion estimation algorithm with a perspective model to obtain the global motion information in the video. The process can be stated as:

x′ = (m0·x + m1·y + m2) / (m6·x + m7·y + 1)
y′ = (m3·x + m4·y + m5) / (m6·x + m7·y + 1)

wherein (x, y) and (x′, y′) are corresponding pixel positions in the current frame and the reference frame, respectively, and the parameter set m = [m0, ..., m7] represents the global motion parameters to be estimated. The model can be solved by gradient descent to compute the global motion that represents the camera motion, and subtracting the global motion from the original motion yields the foreground motion relative to the background.
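The subtraction step can be sketched as follows, assuming the standard eight-parameter perspective model (m0..m7) for the global motion; the parameter estimation itself (gradient descent) is omitted, and the function names are illustrative:

```python
import numpy as np

def perspective_motion(m, x, y):
    """Global (camera) motion predicted by the eight-parameter
    perspective model m = [m0..m7]:
        x' = (m0*x + m1*y + m2) / (m6*x + m7*y + 1)
        y' = (m3*x + m4*y + m5) / (m6*x + m7*y + 1)
    Returns the displacement (x' - x, y' - y)."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    d = m[6] * x + m[7] * y + 1.0
    xp = (m[0] * x + m[1] * y + m[2]) / d
    yp = (m[3] * x + m[4] * y + m[5]) / d
    return xp - x, yp - y

def foreground_motion(mv_x, mv_y, m, xs, ys):
    """Subtract the estimated global motion from the original motion
    field to obtain the foreground motion relative to the background."""
    gx, gy = perspective_motion(m, xs, ys)
    return mv_x - gx, mv_y - gy
```

With an identity-plus-translation parameter set, for instance, every pixel's global displacement equals that translation, and a motion vector equal to it is cancelled to zero foreground motion.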
Stocker et al. measured, through a series of psycho-visual experiments, the prior probability with which humans perceive moving objects. Their results show that the perceptual prior distribution of motion speed can be calculated by the following power function:

p(v) = k · v^(−α)

wherein v is the motion speed, and k and α are constants. The temporal saliency of motion can then be calculated from its self-information as:

S(t) = −log p(v) = α·log v + β

where β = −log k, with α = 0.2 and β = 0.09. The result is finally normalized to [0, 1] to obtain the temporal saliency map. As shown in Fig. 7a, the image is a frame from the video BasketballDrive, in which the camera frequently pans and rotates following the vigorous motion of the players and the basketball; this frame was captured while the camera was panning. Fig. 7b shows the temporal saliency map calculated by the proposed algorithm. Since the motion information in the algorithm comes from the per-block motion vectors of the HEVC encoding process, the motion detection result inevitably exhibits a block structure, but it can still be seen that the global motion is well eliminated and the more salient moving regions of the foreground objects are highlighted.
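The temporal saliency computation above reduces to a few lines; this sketch uses the constants given in the text (α = 0.2, β = 0.09) and a min-max normalization to [0, 1], with a small epsilon (an assumption) to guard the logarithm at zero speed:

```python
import numpy as np

def temporal_saliency(v, alpha=0.2, beta=0.09, eps=1e-6):
    """Self-information saliency S(t) = alpha*log(v) + beta of the
    foreground speed map v, normalized to [0, 1] over the frame."""
    v = np.maximum(np.asarray(v, dtype=np.float64), eps)
    s = alpha * np.log(v) + beta
    s -= s.min()
    rng = s.max()
    return s / rng if rng > 0 else np.zeros_like(s)
```

Because S(t) is affine in log v, the normalized map depends only on the spread of log-speeds, not on α and β themselves.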
The fusion rule proposed by the present invention adjusts dynamically with the variation of temporal and spatial uncertainty. Compared with conventional fusion methods using preset parameters, this method is more flexible and better suited to the detection demands of video. As shown in Fig. 8 (a is the original frame, b the temporal saliency map, c the temporal uncertainty map, d the spatial saliency map, e the spatial uncertainty map, and f the final saliency map after uncertainty weighting), the spatio-temporal features are effectively fused and the detection results in low-uncertainty regions are strengthened; the fused spatio-temporal saliency map better reflects the regions watched by the human eye.
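The text does not reproduce the fusion formula, so the following is only a plausible sketch of uncertainty-weighted fusion: it assumes each domain's map is weighted by the normalized uncertainty of the other domain, so the less reliable domain contributes less. The exact weighting in the invention may differ.

```python
import numpy as np

def fuse_saliency(s_t, s_s, u_t, u_s, eps=1e-6):
    """Fuse the temporal map s_t and spatial map s_s using the
    uncertainties u_t (temporal) and u_s (spatial). Assumed rule:
    weight each domain by the OTHER domain's uncertainty share."""
    w_t = u_s / (u_t + u_s + eps)  # temporal weight grows with spatial uncertainty
    w_s = u_t / (u_t + u_s + eps)  # spatial weight grows with temporal uncertainty
    return w_t * s_t + w_s * s_s
```

Under this rule, when the temporal estimate is highly uncertain the fused value approaches the spatial saliency, and vice versa, matching the adaptive behavior described above.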
To better assess the detection results of the algorithm, the present invention selects five evaluation indexes to compare the detection results with actual gaze data, and compares the algorithm with similar algorithms (such as the SAVC algorithm).
In the experiments, 10 video sequences of 3 different resolutions were selected for detection; the video information is shown in Table 1:
Table 1 Video sequence information used in the experiments
Five mainstream assessment strategies for saliency models (AUC, SIM, CC, NSS, KL) are used to assess the three methods. An AUC value closer to 1 indicates a more accurate prediction of the salient parts of the image; SIM is a measure of the similarity between two distributions; CC is a symmetric index measuring the linear relationship between the saliency map and the gaze map; NSS assesses the average normalized saliency at the fixation positions. For these four indexes, larger is better. KL assesses the saliency map against the gaze map from a probabilistic interpretation and evaluates the information loss of the saliency map; conversely, the lower the KL value, the better.
Specifically, calculating the saliency value of each CU block and selecting its corresponding QP value includes the following steps:
calculating the saliency value of each CU block with the following formula:

wherein S_{n×n}(k) denotes the saliency value of the k-th CU block, the size of the k-th CU block is n × n, i denotes the left-to-right coordinate within the n × n block, and j denotes the top-to-bottom coordinate.
calculating the average saliency value of all CU blocks with the following formula:

wherein S_avg denotes the average saliency value of all CU blocks, width denotes the width of the video frame, and height denotes its height;
dynamically adjusting the QP value of the current frame according to the computed saliency value of the current CU block and the average saliency value of all CU blocks, obtaining the perceptual QP value of the current CU block.
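Since the per-CU formula is not reproduced in the text, the following sketch assumes the natural choice of the mean of the fused saliency map over the block; the function names and the (x0, y0) block-origin convention are illustrative:

```python
import numpy as np

def cu_saliency(sal_map, x0, y0, n):
    """Saliency value of one n x n CU block, taken here (an assumption)
    as the mean of the fused saliency map over the block."""
    return float(sal_map[y0:y0 + n, x0:x0 + n].mean())

def average_saliency(sal_map):
    """Average saliency over the whole width x height frame."""
    return float(sal_map.mean())
```

Comparing cu_saliency against average_saliency tells the encoder whether the current CU is more or less salient than the frame average, which drives the QP adjustment described above.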
Specifically, the perceptual QP value of the current CU block is calculated as:

wherein QP_c denotes the QP value of the current frame, QP_k denotes the perceptual QP value of the current CU block, and w_k denotes a transformation parameter calculated as:

wherein a, b, c are constant parameters, S(k) denotes the saliency value of the k-th CU block, and S_avg denotes the average saliency value of all CU blocks.
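The w_k formula itself is not reproduced in the text. The sketch below therefore uses a purely hypothetical logistic mapping with placeholder constants a, b, c, chosen only to exhibit the stated behavior: blocks more salient than average receive a negative offset (smaller QP, higher quality), blocks less salient a positive one.

```python
import math

def perceptual_qp(qp_c, s_k, s_avg, a=10.0, b=1.0, c=4.0):
    """Perceptual QP of a CU. The w_k mapping here is a hypothetical
    logistic form, NOT the patent's formula: w_k lies in (-a/2, a/2)
    and is negative when the block is more salient than average."""
    r = s_k / max(s_avg, 1e-6)
    w_k = a / (1.0 + b * math.exp(c * (r - 1.0))) - a / 2.0  # hypothetical
    return int(round(qp_c + w_k))
```

Any monotone decreasing w_k in the ratio S(k)/S_avg yields the same qualitative effect; only the constants a, b, c shape how aggressively QP shifts.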
Referring to Fig. 9, obtaining the final rate-distortion optimization target specifically includes the following steps:
S201: obtaining the saliency value of each CU block in the video and calculating the perception-first distortion;
S202: adding the perception-first distortion into the conventional rate-distortion calculation to obtain the final optimization target.
Specifically, the perception-first distortion is calculated as follows:

Ds = D × (1 + SF × SD)

wherein D is the distortion computed by the HM standard method; SF denotes the perceptual optimization parameter specified in the configuration file; and SD denotes the saliency deviation of the current coding block.
Specifically, SD is calculated with the following formula:

wherein the value range of SD is (−1, 1); S_cu denotes the saliency of the current block, and S_avg denotes the average saliency value of all CU blocks of the current frame.
Specifically, adding the perception-first distortion into the conventional rate-distortion calculation to obtain the final optimization target comprises:
by analogy with the conventional rate-distortion optimization algorithm, the improved Lagrangian optimization target may be expressed as:

min{Ds + λR}

wherein Ds denotes the perceptual distortion incorporating the saliency of the current block; λ denotes the Lagrange multiplier; and R denotes the coding bit rate.
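The perceptual rate-distortion cost can be sketched as follows. Ds = D × (1 + SF × SD) follows the text; since the SD formula is not reproduced, the sketch assumes the normalized deviation (S_cu − S_avg)/(S_cu + S_avg), which lies in (−1, 1) for non-negative saliency values, as a placeholder:

```python
def perceptual_rd_cost(d_hm, rate, lam, s_cu, s_avg, sf=0.5):
    """Perceptual RD cost Ds + lambda*R for one coding choice.
    d_hm: HM-standard distortion; rate: coding bits; lam: Lagrange
    multiplier; sf: configured perceptual optimization parameter.
    SD is an assumed normalized saliency deviation in (-1, 1)."""
    sd = (s_cu - s_avg) / (s_cu + s_avg + 1e-6)  # assumed SD form
    ds = d_hm * (1.0 + sf * sd)  # Ds = D * (1 + SF * SD), per the text
    return ds + lam * rate
```

For a block of average saliency the cost reduces to the conventional D + λR, while a more-salient block has its distortion inflated, steering the encoder toward higher quality there.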
Combining the perceptual characteristics of the human eye, this embodiment proposes a rate-distortion calculation method that combines a perceptual model with spatio-temporal saliency. With this improvement, the various coding modes in HEVC, such as CU partitioning and search modes, can be considered comprehensively on the basis of saliency, so that optimal parameter selection proceeds from the overall situation.
By analogy with the conventional rate-distortion optimization algorithm, the improved Lagrangian optimization target may be expressed as:

min{Ds + λR}

Ds, as the perceptual distortion incorporating the saliency of the current block, ensures better perceptual coding quality. This improvement guarantees both lower perceptual distortion and a low bit rate, which is advantageous for low-bandwidth transmission of video streams.
Three methods based on the HM standard are respectively adopted as benchmarks, and BD-EWPSNR, EWPSNR-based BD-Rate, BD-PSNR, and BD-SSIM are used for a comprehensive quantitative comparison of the experimental results. The first two indexes intuitively reflect the relative performance of the proposed method and the benchmark methods under the criterion of human perception; the latter two are standard objective indexes. The calculation of PSNR is based only on error sensitivity; it matches perceived visual quality poorly and has difficulty describing the perceptual quality of a reconstructed image or video. SSIM measures distortion through the structural information of the objects in the image, namely the luminance and contrast related to object structure, and can to some extent reflect the structural distortion of the image as a whole. Of the four indexes, BD-EWPSNR, BD-PSNR, and BD-SSIM are the larger the better, while BD-Rate is the smaller the better (taking the sign into account).
The assessment results are shown in Table 2:
Table 2 Video compression assessment results
The experimental results show that the proposed algorithm has the greatest advantage over the adaptive QP algorithm of the HM standard: the average BD-EWPSNR is improved by 0.710 and BD-Rate is reduced by 20.332. Compared with the rate-distortion-optimized quantization and multi-QP optimization methods of the HM standard, its BD-EWPSNR is also higher by 0.317 and 0.354, respectively. Although both the BD-PSNR and BD-SSIM of the proposed method decline, this decline is the inevitable consequence of improving the compression effect of salient regions: improving the compression quality of perceptual regions at the same bit rate necessarily comes at the cost of the compression quality of non-salient regions, and the smaller the salient region, the more pronounced this trend.
To exclude the influence of the saliency detection results on the compression effect and to compare effectively the compression effect of the proposed compression algorithm, experiments were conducted using the eye-movement gaze maps of each video in the database. The experimental results are shown in Table 3:
Table 3 Video compression quality assessment results based on eye-movement gaze maps
As can be seen from the above table, the compression algorithm proposed by this method performs better in this case. It follows that this method can effectively improve EWPSNR while keeping the objective video quality from declining significantly, showing greater validity and superiority than the HM standard optimization techniques; moreover, under subjective human viewing, the proposed method has the best viewing effect.
For the compression of high-definition video, compression efficiency is also a very important evaluation factor. To measure the compression efficiency of the proposed algorithm, the compression times of the various methods were recorded during the experiments. The experiments were carried out on an Intel Xeon E5-1620 v3 CPU with 8 GB RAM and an NVIDIA Titan X GPU, with the time taken by the HM standard method with RDOQ as the baseline. The data obtained are shown in Table 4:
Table 4 Comparison of video compression efficiency
The experimental results show that although the AQP method based on the HM standard takes the shortest time, its compression effect is the worst. The MQP method performs better than the AQP method, but because the MQP algorithm essentially performs an exhaustive search within the given QP range to find the QP that yields the best rate-distortion-optimized compression result, its compression time is the longest, 6.46 times that of standard HM.
The quantitative experimental results show that the proposed method outperforms the HM standard algorithm and its various optimization methods in both compression efficiency and compression effect: the BD-EWPSNR of the proposed method is on average 0.71 higher than that of the AQP method, and its compression efficiency is 2.59 times that of the MQP method.
The method provided by this embodiment obtains the temporal saliency of the video from the motion vector information of the HEVC compression domain, detects the spatial saliency with a convolutional neural network, and fuses the two with the entropy-uncertainty method, giving full play to the distinctive characteristics of the temporal and spatial domains, and uses the resulting saliency to guide the HEVC compression process.
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and the identical or similar parts of the embodiments may be referred to one another. Since the device disclosed in an embodiment corresponds to the method disclosed in the embodiment, its description is relatively simple, and the relevant points can be found in the description of the method.
The foregoing description of the disclosed embodiments enables those skilled in the art to implement or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein can be realized in other embodiments without departing from the spirit or scope of the present invention. Therefore, the present invention is not intended to be limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.