Summary of the invention
In view of this, the present invention provides a video compression method based on a convolutional neural network and salient information in the HEVC compression domain. The method organically combines video coding with the human visual system, and can remove more subjective visual perceptual redundancy, further improving the video compression effect while enhancing the subjective perceptual quality for the human eye.
To achieve the above goals, the present invention adopts the following technical scheme:
A video compression method based on a convolutional neural network and salient information in the HEVC compression domain, the method comprising the following steps:
performing saliency detection on the input video by combining, on the basis of a convolutional neural network, the motion estimation results of each CU block obtained during HEVC compression;
calculating the saliency value of each CU block and selecting its corresponding QP value, adding the saliency value of each CU block into the conventional rate-distortion calculation to obtain the final rate-distortion optimization target, thereby realizing high-quality perceptual video coding.
The beneficial effects of the present invention are as follows: the method improves and strengthens HEVC in two respects, namely attention-based saliency detection and perception-first video compression. For saliency detection, the method adaptively and dynamically fuses, on the basis of a convolutional neural network, the motion estimation results of each CU obtained during HEVC compression, so as to complete the saliency detection of the input video. For perception-first video compression, the QP of each CU is selected according to its saliency value, ensuring that highly salient CUs are encoded with a smaller QP, and the salient characteristics of the current CU block are incorporated into the conventional rate-distortion calculation, thereby achieving the goal of perception-first coding. The method reduces the perceptual redundancy of the video and thus obtains a better compression effect.
On the basis of the above scheme, the technical solution of the present invention is further explained below.
Further, performing saliency detection on the input video specifically includes the following steps:
inputting an original video frame and performing spatial-domain saliency detection on the input video frame with a convolutional neural network, generating a spatial saliency detection result;
generating a temporal motion saliency result from the motion vectors obtained by the inter-prediction process during HEVC compression;
fusing the spatial saliency detection result and the temporal motion saliency result using an entropy-uncertainty algorithm.
Further, the convolutional neural network structure comprises:
(1) convolutional layers: feature maps representing local image features are obtained after the convolution operations, and a rectified linear unit follows each convolutional layer. Since the spatial relationships between image pixels are local, considering only the local information of a pixel is far less complex than considering global information, and a feature map representing local image features can be obtained after each convolution operation. Each convolution operation is generally followed by a rectified linear unit (ReLU), an activation function that is fast to compute and effectively alleviates the vanishing-gradient problem;
(2) a local response normalization layer: this layer smooths the output of an intermediate layer of the neural network; its output is as follows:

b_i(x, y) = a_i(x, y) / ( k + α · Σ_j a_j(x, y)² )^β, where the sum over j runs from max(0, i − n/2) to min(N − 1, i + n/2)

wherein (x, y) denotes the pixel position, i denotes the channel index, N is the number of channels, and α, β, k, n are user-defined constants; l denotes the l-th local response normalization layer, and j denotes the corresponding channel index;
(3) a maximum pooling layer: the maximum pooling layer extracts locally similar semantic information. The layer applies an N × N sliding window with a moving step of N, and takes the maximum value of the region covered by the window in the original feature map as the pixel value at the corresponding position of the new feature map;
(4) a deconvolution layer: this layer scales the small-sized feature map back to the size of the original image, yielding the final output.
Further, the temporal motion saliency result is generated as follows: motion information is extracted from the video compression domain by shallow decoding in HEVC, the motion vector information of each prediction unit (PU) in the video frame is obtained, and the magnitudes of the motion vectors, taken as the severity of block motion, are reassembled into a temporal motion feature map.
Further, generating the temporal motion saliency result from the motion vectors obtained by the inter-prediction process during HEVC compression specifically includes the following steps:
extracting motion information from the video compression domain by shallow decoding in HEVC, and obtaining the motion vector information of each prediction unit (PU) in the video frame;
reassembling the magnitudes of the motion vector information, taken as the severity of block motion, into a temporal motion feature map.
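As an illustrative sketch (not part of the claimed method), the reassembly of PU motion-vector magnitudes into a temporal motion feature map can be expressed in Python with NumPy; the function name and array layout are assumptions:

```python
import numpy as np

def motion_feature_map(mv_x, mv_y):
    """Reassemble per-PU motion-vector components into a temporal motion
    feature map: the magnitude of each motion vector is taken as the
    severity of that block's motion.

    mv_x, mv_y: 2-D arrays of the horizontal/vertical motion-vector
    components extracted from the HEVC bitstream (one entry per PU).
    """
    mv_x = np.asarray(mv_x, dtype=np.float64)
    mv_y = np.asarray(mv_y, dtype=np.float64)
    return np.sqrt(mv_x ** 2 + mv_y ** 2)
```

The result is one motion-severity value per PU position, which downstream steps treat as the raw temporal feature map.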
Further, on the basis of the temporal motion feature map, a global motion estimation algorithm with a perspective model is used to obtain the global motion information in the video. The process can be stated as:

x′ = (m0·x + m1·y + m2) / (m6·x + m7·y + 1)
y′ = (m3·x + m4·y + m5) / (m6·x + m7·y + 1)

wherein (x, y) and (x′, y′) are corresponding pixel positions in the current frame and the reference frame, respectively, and the parameter set m = [m0, ..., m7] represents the global motion parameters to be estimated.
The model is solved by gradient descent to compute the global motion that represents the camera motion, and the global motion is subtracted from the original motion to obtain the foreground motion relative to the background.
According to the perceptual prior distribution of motion speed, which follows the power function

p(v) = k · v^(−α)

wherein v denotes the motion speed and k and α are constants, the temporal saliency of motion is calculated from its self-information as follows:

S(t) = −log p(v) = α·log v + β

where β = −log k, with α = 0.2 and β = 0.09. The result is finally normalized to [0, 1] to obtain the temporal saliency map.
Further, fusing the spatial saliency detection result and the temporal motion saliency result using the entropy-uncertainty algorithm comprises:
merging the computed spatial saliency map and the temporal saliency map into an overall spatio-temporal saliency map, the fused saliency map being calculated with the following formula:

wherein U(t) denotes the perceptual uncertainty of the temporal domain; U(s) denotes the uncertainty of the spatial saliency; S(t) denotes the temporal saliency of motion; and S(s) denotes the spatial saliency.
Further, calculating the saliency value of each CU block and selecting its corresponding QP value specifically includes the following steps:
calculating the saliency value of each CU block with the following formula:

wherein S_{n×n}(k) denotes the saliency value of the k-th CU block, the size of the k-th CU block is n × n, i denotes the left-to-right coordinate within the n × n block, and j denotes the top-to-bottom coordinate;
calculating the average saliency value of all CU blocks with the following formula:

wherein S_avg denotes the average saliency value of all CU blocks, width denotes the width of the video frame, and height denotes its height;
dynamically adjusting the QP value of the current frame according to the computed saliency value of the current CU block and the average saliency value of all CU blocks, obtaining the perceptual QP value of the current CU block.
Further, the perceptual QP value of the current CU block is calculated as:

wherein QP_c denotes the QP value of the current frame, QP_k denotes the perceptual QP value of the current CU block, and w_k denotes a transformation parameter calculated as:

wherein a, b, c are constant parameters, S(k) denotes the saliency value of the k-th CU block, and S_avg denotes the average saliency value of all CU blocks.
Further, obtaining the final rate-distortion optimization target specifically includes the following steps:
obtaining the saliency value of each CU block in the video and calculating the perception-first distortion;
adding the perception-first distortion into the conventional rate-distortion calculation to obtain the final optimization target.
Further, the perception-first distortion is calculated as follows:

Ds = D × (1 + SF × SD)

wherein D is the distortion computed by the HM standard method; SF denotes the perceptual optimization parameter specified in the configuration file; and SD denotes the saliency deviation of the current coding block.
SD is calculated with the following formula:

wherein the value range of SD is (−1, 1), S_cu denotes the saliency of the current block, and S_avg denotes the average saliency value of all CU blocks of the current frame.
Further, adding the perception-first distortion into the conventional rate-distortion calculation to obtain the final optimization target comprises:
by analogy with the conventional rate-distortion optimization algorithm, the improved Lagrangian optimization target may be expressed as:

min{Ds + λR}

wherein Ds denotes the perceptual distortion incorporating the saliency of the current block; λ denotes the Lagrange multiplier; and R denotes the coding bit rate.
Specific embodiment
The technical solutions in the embodiments of the present invention will be described below clearly and completely with reference to the drawings in the embodiments of the present invention. Obviously, the described embodiments are only a part of the embodiments of the present invention, rather than all of them. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative effort shall fall within the protection scope of the present invention.
Referring to Fig. 1, an embodiment of the present invention discloses a video compression method based on a convolutional neural network and salient information in the HEVC compression domain, the method comprising the following steps:
S1: performing saliency detection on the input video by combining, on the basis of a convolutional neural network, the motion estimation results of each CU block obtained during HEVC compression;
S2: calculating the saliency value of each CU block and selecting its corresponding QP value, adding the saliency value of each CU block into the conventional rate-distortion calculation to obtain the final rate-distortion optimization target, thereby realizing high-quality perceptual video coding.
Referring to Fig. 2, the overall implementation process of spatio-temporal saliency detection and perceptual compression for high-definition video in this method is as follows: after an original video frame is input, spatial saliency detection is performed on the input frame with a convolutional neural network; at the same time, the temporal motion saliency result is generated from the motion vectors obtained by the inter-prediction process during HEVC compression, and the spatio-temporal saliency is fused with the entropy-uncertainty method to obtain the spatio-temporal saliency result of the video, which provides a sound basis for the subsequent video compression. In the video coding part, the HEVC standard algorithm can be optimized: after the visual saliency of the video is obtained, the salient regions (the regions that people are statistically more likely to attend to) are given better compression quality, while the compression quality of non-salient regions can be appropriately reduced, without introducing excessive distortion, so as to lower the video bit rate. In addition, starting from the core idea of rate-distortion optimization, a saliency-weighted rate-distortion optimization algorithm can effectively improve the perceptual quality of the compressed video.
Referring to Fig. 3, performing saliency detection on the input video specifically includes the following steps:
S101: inputting an original video frame and performing spatial saliency detection on it with a convolutional neural network, generating the spatial saliency detection result;
S102: generating the temporal motion saliency result from the motion vectors obtained by the inter-prediction process during HEVC compression;
S103: fusing the spatial saliency detection result and the temporal motion saliency result using the entropy-uncertainty algorithm.
Specifically, the structure of the above convolutional neural network is shown in Fig. 4.
The structure and function of each layer of the convolutional neural network are as follows:
(1) convolutional layers: since the spatial relationships between image pixels are local, considering only the local information of a pixel is far less complex than considering global information, and a feature map representing local image features can be obtained after each convolution operation. Each convolution operation is generally followed by a rectified linear unit (ReLU), an activation function that is fast to compute and effectively alleviates the vanishing-gradient problem.
(2) a local response normalization layer: this layer smooths the output of an intermediate layer of the neural network, which helps improve the generalization ability of the model. Its output is as follows:

b_i(x, y) = a_i(x, y) / ( k + α · Σ_j a_j(x, y)² )^β, where the sum over j runs from max(0, i − n/2) to min(N − 1, i + n/2)

wherein (x, y) denotes the pixel position, i denotes the channel index, N is the number of channels, and α, β, k, n are user-defined constants; l denotes the l-th local response normalization layer, and j denotes the corresponding channel index.
(3) a maximum pooling layer: the maximum pooling layer extracts locally similar semantic information. The layer applies an N × N sliding window with a moving step of N, and takes the maximum value of the region covered by the window in the original feature map as the pixel value at the corresponding position of the new feature map. The pooling operation reduces the size of the output and thereby reduces over-fitting.
(4) a deconvolution layer: this layer scales the small-sized feature map back to the size of the original image, yielding the final output.
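As a minimal sketch of two of the layers described above, the following NumPy code implements channel-wise local response normalization (assuming the standard AlexNet-style form, consistent with the constants named in the text) and N × N max pooling with stride N; it is an illustration, not the claimed network:

```python
import numpy as np

def local_response_norm(a, k=2.0, alpha=1e-4, beta=0.75, n=5):
    """Local response normalization over channels.
    a: feature maps of shape (C, H, W); k, alpha, beta, n are the
    user-defined constants mentioned in the text."""
    C = a.shape[0]
    out = np.empty_like(a, dtype=np.float64)
    for i in range(C):
        lo, hi = max(0, i - n // 2), min(C - 1, i + n // 2)
        denom = (k + alpha * np.sum(a[lo:hi + 1] ** 2, axis=0)) ** beta
        out[i] = a[i] / denom
    return out

def max_pool(a, N):
    """N x N max pooling with stride N over a 2-D feature map: each
    output pixel is the maximum of the region covered by the window."""
    H, W = a.shape
    H2, W2 = H // N, W // N
    return a[:H2 * N, :W2 * N].reshape(H2, N, W2, N).max(axis=(1, 3))
```

In a real network these operations would run per layer on learned feature maps; here they only demonstrate the arithmetic.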
For example, the above network may be trained on the SALICON data set, which contains 9,000 images, to obtain the spatial saliency network used in the present invention.
With the trained network model, forward propagation on the input three-channel image yields the final spatial saliency map. As shown in Figs. 5a-5b, the network effectively computes the salient regions in a picture.
Fig. 6 is a schematic diagram of the motion vectors of a video frame. The temporal motion feature map obtained by the above process, however, contains the total motion in the video frame, whereas experiments show that the motion of foreground objects relative to the background stimulates the human eye more noticeably. Therefore, further, the present invention uses a global motion estimation algorithm with a perspective model to obtain the global motion information in the video. The process can be stated as:

x′ = (m0·x + m1·y + m2) / (m6·x + m7·y + 1)
y′ = (m3·x + m4·y + m5) / (m6·x + m7·y + 1)

wherein (x, y) and (x′, y′) are corresponding pixel positions in the current frame and the reference frame, respectively, and the parameter set m = [m0, ..., m7] represents the global motion parameters to be estimated. The model can be solved by gradient descent to compute the global motion that represents the camera motion, and subtracting the global motion from the original motion yields the foreground motion relative to the background.
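The subtraction step can be sketched as follows, assuming the standard eight-parameter perspective model (m0..m7) for the global motion; the parameter estimation itself (gradient descent) is omitted, and the function names are illustrative:

```python
import numpy as np

def perspective_motion(m, x, y):
    """Global (camera) motion predicted by the eight-parameter
    perspective model m = [m0..m7]:
        x' = (m0*x + m1*y + m2) / (m6*x + m7*y + 1)
        y' = (m3*x + m4*y + m5) / (m6*x + m7*y + 1)
    Returns the displacement (x' - x, y' - y)."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    d = m[6] * x + m[7] * y + 1.0
    xp = (m[0] * x + m[1] * y + m[2]) / d
    yp = (m[3] * x + m[4] * y + m[5]) / d
    return xp - x, yp - y

def foreground_motion(mv_x, mv_y, m, xs, ys):
    """Subtract the estimated global motion from the original motion
    field to obtain the foreground motion relative to the background."""
    gx, gy = perspective_motion(m, xs, ys)
    return mv_x - gx, mv_y - gy
```

With an identity-plus-translation parameter set, for instance, every pixel's global displacement equals that translation, and a motion vector equal to it is cancelled to zero foreground motion.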
Stocker et al. measured, through a series of psycho-visual experiments, the prior probability with which humans perceive moving objects. Their results show that the perceptual prior distribution of motion speed can be calculated by the following power function:

p(v) = k · v^(−α)

wherein v is the motion speed, and k and α are constants. The temporal saliency of motion can then be calculated from its self-information as:

S(t) = −log p(v) = α·log v + β

where β = −log k, with α = 0.2 and β = 0.09. The result is finally normalized to [0, 1] to obtain the temporal saliency map. As shown in Fig. 7a, the image is a frame from the video BasketballDrive, in which the camera frequently pans and rotates following the vigorous motion of the players and the basketball; this frame was captured while the camera was panning. Fig. 7b shows the temporal saliency map calculated by the proposed algorithm. Since the motion information in the algorithm comes from the per-block motion vectors of the HEVC encoding process, the motion detection result inevitably exhibits a block structure, but it can still be seen that the global motion is well eliminated and the more salient moving regions of the foreground objects are highlighted.
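The temporal saliency computation above reduces to a few lines; this sketch uses the constants given in the text (α = 0.2, β = 0.09) and a min-max normalization to [0, 1], with a small epsilon (an assumption) to guard the logarithm at zero speed:

```python
import numpy as np

def temporal_saliency(v, alpha=0.2, beta=0.09, eps=1e-6):
    """Self-information saliency S(t) = alpha*log(v) + beta of the
    foreground speed map v, normalized to [0, 1] over the frame."""
    v = np.maximum(np.asarray(v, dtype=np.float64), eps)
    s = alpha * np.log(v) + beta
    s -= s.min()
    rng = s.max()
    return s / rng if rng > 0 else np.zeros_like(s)
```

Because S(t) is affine in log v, the normalized map depends only on the spread of log-speeds, not on α and β themselves.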
The fusion rule proposed by the present invention adjusts dynamically with the variation of temporal and spatial uncertainty. Compared with conventional fusion methods using preset parameters, this method is more flexible and better suited to the detection demands of video. As shown in Fig. 8 (a is the original frame, b the temporal saliency map, c the temporal uncertainty map, d the spatial saliency map, e the spatial uncertainty map, and f the final saliency map after uncertainty weighting), the spatio-temporal features are effectively fused and the detection results in low-uncertainty regions are strengthened; the fused spatio-temporal saliency map better reflects the regions watched by the human eye.
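The text does not reproduce the fusion formula, so the following is only a plausible sketch of uncertainty-weighted fusion: it assumes each domain's map is weighted by the normalized uncertainty of the other domain, so the less reliable domain contributes less. The exact weighting in the invention may differ.

```python
import numpy as np

def fuse_saliency(s_t, s_s, u_t, u_s, eps=1e-6):
    """Fuse the temporal map s_t and spatial map s_s using the
    uncertainties u_t (temporal) and u_s (spatial). Assumed rule:
    weight each domain by the OTHER domain's uncertainty share."""
    w_t = u_s / (u_t + u_s + eps)  # temporal weight grows with spatial uncertainty
    w_s = u_t / (u_t + u_s + eps)  # spatial weight grows with temporal uncertainty
    return w_t * s_t + w_s * s_s
```

Under this rule, when the temporal estimate is highly uncertain the fused value approaches the spatial saliency, and vice versa, matching the adaptive behavior described above.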
To better assess the detection results of the algorithm, the present invention selects five evaluation indexes to compare the detection results with actual gaze data, and compares the algorithm with similar algorithms (such as the SAVC algorithm).
In the experiments, 10 video sequences of 3 different resolutions were selected for detection; the video information is shown in Table 1:
Table 1 Video sequence information used in the experiments
Five mainstream assessment strategies for saliency models (AUC, SIM, CC, NSS, KL) are used to assess the three methods. An AUC value closer to 1 indicates a more accurate prediction of the salient parts of the image; SIM is a measure of the similarity between two distributions; CC is a symmetric index measuring the linear relationship between the saliency map and the gaze map; NSS assesses the average normalized saliency at the fixation positions. For these four indexes, larger is better. KL assesses the saliency map against the gaze map from a probabilistic interpretation and evaluates the information loss of the saliency map; conversely, the lower the KL value, the better.
Specifically, calculating the saliency value of each CU block and selecting its corresponding QP value includes the following steps:
calculating the saliency value of each CU block with the following formula:

wherein S_{n×n}(k) denotes the saliency value of the k-th CU block, the size of the k-th CU block is n × n, i denotes the left-to-right coordinate within the n × n block, and j denotes the top-to-bottom coordinate.
calculating the average saliency value of all CU blocks with the following formula:

wherein S_avg denotes the average saliency value of all CU blocks, width denotes the width of the video frame, and height denotes its height;
dynamically adjusting the QP value of the current frame according to the computed saliency value of the current CU block and the average saliency value of all CU blocks, obtaining the perceptual QP value of the current CU block.
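Since the per-CU formula is not reproduced in the text, the following sketch assumes the natural choice of the mean of the fused saliency map over the block; the function names and the (x0, y0) block-origin convention are illustrative:

```python
import numpy as np

def cu_saliency(sal_map, x0, y0, n):
    """Saliency value of one n x n CU block, taken here (an assumption)
    as the mean of the fused saliency map over the block."""
    return float(sal_map[y0:y0 + n, x0:x0 + n].mean())

def average_saliency(sal_map):
    """Average saliency over the whole width x height frame."""
    return float(sal_map.mean())
```

Comparing cu_saliency against average_saliency tells the encoder whether the current CU is more or less salient than the frame average, which drives the QP adjustment described above.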
Specifically, the perceptual QP value of the current CU block is calculated as:

wherein QP_c denotes the QP value of the current frame, QP_k denotes the perceptual QP value of the current CU block, and w_k denotes a transformation parameter calculated as:

wherein a, b, c are constant parameters, S(k) denotes the saliency value of the k-th CU block, and S_avg denotes the average saliency value of all CU blocks.
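The w_k formula itself is not reproduced in the text. The sketch below therefore uses a purely hypothetical logistic mapping with placeholder constants a, b, c, chosen only to exhibit the stated behavior: blocks more salient than average receive a negative offset (smaller QP, higher quality), blocks less salient a positive one.

```python
import math

def perceptual_qp(qp_c, s_k, s_avg, a=10.0, b=1.0, c=4.0):
    """Perceptual QP of a CU. The w_k mapping here is a hypothetical
    logistic form, NOT the patent's formula: w_k lies in (-a/2, a/2)
    and is negative when the block is more salient than average."""
    r = s_k / max(s_avg, 1e-6)
    w_k = a / (1.0 + b * math.exp(c * (r - 1.0))) - a / 2.0  # hypothetical
    return int(round(qp_c + w_k))
```

Any monotone decreasing w_k in the ratio S(k)/S_avg yields the same qualitative effect; only the constants a, b, c shape how aggressively QP shifts.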
Referring to Fig. 9, obtaining the final rate-distortion optimization target specifically includes the following steps:
S201: obtaining the saliency value of each CU block in the video and calculating the perception-first distortion;
S202: adding the perception-first distortion into the conventional rate-distortion calculation to obtain the final optimization target.
Specifically, the perception-first distortion is calculated as follows:

Ds = D × (1 + SF × SD)

wherein D is the distortion computed by the HM standard method; SF denotes the perceptual optimization parameter specified in the configuration file; and SD denotes the saliency deviation of the current coding block.
Specifically, SD is calculated with the following formula:

wherein the value range of SD is (−1, 1); S_cu denotes the saliency of the current block, and S_avg denotes the average saliency value of all CU blocks of the current frame.
Specifically, adding the perception-first distortion into the conventional rate-distortion calculation to obtain the final optimization target comprises:
by analogy with the conventional rate-distortion optimization algorithm, the improved Lagrangian optimization target may be expressed as:

min{Ds + λR}

wherein Ds denotes the perceptual distortion incorporating the saliency of the current block; λ denotes the Lagrange multiplier; and R denotes the coding bit rate.
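The perceptual rate-distortion cost can be sketched as follows. Ds = D × (1 + SF × SD) follows the text; since the SD formula is not reproduced, the sketch assumes the normalized deviation (S_cu − S_avg)/(S_cu + S_avg), which lies in (−1, 1) for non-negative saliency values, as a placeholder:

```python
def perceptual_rd_cost(d_hm, rate, lam, s_cu, s_avg, sf=0.5):
    """Perceptual RD cost Ds + lambda*R for one coding choice.
    d_hm: HM-standard distortion; rate: coding bits; lam: Lagrange
    multiplier; sf: configured perceptual optimization parameter.
    SD is an assumed normalized saliency deviation in (-1, 1)."""
    sd = (s_cu - s_avg) / (s_cu + s_avg + 1e-6)  # assumed SD form
    ds = d_hm * (1.0 + sf * sd)  # Ds = D * (1 + SF * SD), per the text
    return ds + lam * rate
```

For a block of average saliency the cost reduces to the conventional D + λR, while a more-salient block has its distortion inflated, steering the encoder toward higher quality there.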
Combining the perceptual characteristics of the human eye, this embodiment proposes a rate-distortion calculation method that combines a perceptual model with spatio-temporal saliency. With this improvement, the various coding modes in HEVC, such as CU partitioning and search modes, can be considered comprehensively on the basis of saliency, so that optimal parameter selection proceeds from the overall situation.
By analogy with the conventional rate-distortion optimization algorithm, the improved Lagrangian optimization target may be expressed as:

min{Ds + λR}

Ds, as the perceptual distortion incorporating the saliency of the current block, ensures better perceptual coding quality. This improvement guarantees both lower perceptual distortion and a low bit rate, which is advantageous for low-bandwidth transmission of video streams.
Three methods based on the HM standard are respectively adopted as benchmarks, and BD-EWPSNR, EWPSNR-based BD-Rate, BD-PSNR, and BD-SSIM are used for a comprehensive quantitative comparison of the experimental results. The first two indexes intuitively reflect the relative performance of the proposed method and the benchmark methods under the criterion of human perception; the latter two are standard objective indexes. The calculation of PSNR is based only on error sensitivity; it matches perceived visual quality poorly and has difficulty describing the perceptual quality of a reconstructed image or video. SSIM measures distortion through the structural information of the objects in the image, namely the luminance and contrast related to object structure, and can to some extent reflect the structural distortion of the image as a whole. Of the four indexes, BD-EWPSNR, BD-PSNR, and BD-SSIM are the larger the better, while BD-Rate is the smaller the better (taking the sign into account).
The assessment results are shown in Table 2:
Table 2 Video compression assessment results
The experimental results show that the proposed algorithm has the greatest advantage over the adaptive QP algorithm of the HM standard: the average BD-EWPSNR is improved by 0.710 and BD-Rate is reduced by 20.332. Compared with the rate-distortion-optimized quantization and multi-QP optimization methods of the HM standard, its BD-EWPSNR is also higher by 0.317 and 0.354, respectively. Although both the BD-PSNR and BD-SSIM of the proposed method decline, this decline is the inevitable consequence of improving the compression effect of salient regions: improving the compression quality of perceptual regions at the same bit rate necessarily comes at the cost of the compression quality of non-salient regions, and the smaller the salient region, the more pronounced this trend.
To exclude the influence of the saliency detection results on the compression effect and to compare effectively the compression effect of the proposed compression algorithm, experiments were conducted using the eye-movement gaze maps of each video in the database. The experimental results are shown in Table 3:
Table 3 Video compression quality assessment results based on eye-movement gaze maps
As can be seen from the above table, the compression algorithm proposed by this method performs better in this case. It follows that this method can effectively improve EWPSNR while keeping the objective video quality from declining significantly, showing greater validity and superiority than the HM standard optimization techniques; moreover, under subjective human viewing, the proposed method has the best viewing effect.
For the compression of high-definition video, compression efficiency is also a very important evaluation factor. To measure the compression efficiency of the proposed algorithm, the compression times of the various methods were recorded during the experiments. The experiments were carried out on an Intel Xeon E5-1620 v3 CPU with 8 GB RAM and an NVIDIA Titan X GPU, with the time taken by the HM standard method with RDOQ as the baseline. The data obtained are shown in Table 4:
Table 4 Comparison of video compression efficiency
The experimental results show that although the AQP method based on the HM standard takes the shortest time, its compression effect is the worst. The MQP method performs better than the AQP method, but because the MQP algorithm essentially performs an exhaustive search within the given QP range to find the QP that yields the best rate-distortion-optimized compression result, its compression time is the longest, 6.46 times that of standard HM.
The quantitative experimental results show that the proposed method outperforms the HM standard algorithm and its various optimization methods in both compression efficiency and compression effect: the BD-EWPSNR of the proposed method is on average 0.71 higher than that of the AQP method, and its compression efficiency is 2.59 times that of the MQP method.
The method provided by this embodiment obtains the temporal saliency of the video from the motion vector information of the HEVC compression domain, detects the spatial saliency with a convolutional neural network, and fuses the two with the entropy-uncertainty method, giving full play to the distinctive characteristics of the temporal and spatial domains, and uses the resulting saliency to guide the HEVC compression process.
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and the identical or similar parts of the embodiments may be referred to one another. Since the device disclosed in an embodiment corresponds to the method disclosed in the embodiment, its description is relatively simple, and the relevant points can be found in the description of the method.
The foregoing description of the disclosed embodiments enables those skilled in the art to implement or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein can be realized in other embodiments without departing from the spirit or scope of the present invention. Therefore, the present invention is not intended to be limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.