CN104966276A

CN104966276A - Image video scene content conformal mapping sparse representation method

Info

Publication number: CN104966276A
Application number: CN201510337089.5A
Authority: CN
Inventors: 陈小武; 李健伟; 邹冬青; 赵沁平; 高博
Original assignee: Beihang University
Current assignee: Beihang University
Priority date: 2015-06-17
Filing date: 2015-06-17
Publication date: 2015-10-07
Anticipated expiration: 2035-06-17
Also published as: CN104966276B

Abstract

The invention provides a conformal mapping sparse representation method for image and video scene content. The method comprises the following steps: 1) inputting an original image or video and sampling it in a feature space; 2) computing the K nearest neighbors of each sample, building a local complete adjacency graph, and computing the distances between adjacent samples; 3) combining the conformal mapping rule with sparse representation to learn a dictionary with the conformal property; and 4) reconstructing the original image or video with the dictionary. By introducing the conformal mapping rule, the method preserves the angle information between adjacent samples to the greatest extent and obtains a dictionary with stronger representation capability. Meanwhile, conformal mapping drives adjacent samples to be reconstructed with similar dictionary atoms, making the dictionary more concise and compact. The method has broad application prospects in image processing, computer vision and augmented reality.

Description

Conformal mapping sparse representation method for image/video scene content

Technical field

The present invention relates to the fields of image processing, computer vision and augmented reality, and specifically to a conformal mapping sparse representation method for image/video scene content.

Background art

In recent years, sparse representation and dictionary learning have attracted considerable attention as research hotspots and have been widely used in image processing and computer vision, for example in image super-resolution, image denoising, classification and color editing. Sparse representation reconstructs a signal as a linear combination of atoms from an overcomplete dictionary and limits the number of atoms used in the reconstruction to achieve sparsity.

At present, many researchers are working on sparse representation methods, and the dictionary plays a very important role in sparse representation. Michal Aharon et al. proposed the K-SVD dictionary learning method in 2006 and applied it to image processing. Honglak Lee et al. proposed a fast sparse coding method in 2006 that speeds up solving. Mairal et al. proposed an online dictionary learning method based on stochastic approximation in 2009, which can efficiently process large data sets. These methods focus on how sparse representations are solved and on computational efficiency, and they concentrate on the reconstruction ability of the dictionary, but they depend on a large number of training samples. Furthermore, the number of dictionary atoms in these methods must be set manually and cannot scale automatically, which makes the resulting dictionaries redundant. Other sparse representation methods have achieved some success in the compactness and expressiveness of the dictionary. For example, Qiu et al. proposed an action attribute dictionary learning method based on maximum mutual information in 2011, and Siyahjani et al. proposed context-aware dictionaries in 2013 for object recognition and localization in images. These dictionary learning methods increase the discrimination between classes, but they do not consider the local relationships and contextual information in the data space, so the resulting dictionaries have weak expressive power. Moreover, some studies show that preserving the local structural relationships among data improves fidelity when the data are reconstructed and avoids distortion.

Sparse representation is increasingly applied in image processing and computer vision. For example, Elad et al. used the K-SVD method for image denoising; Yang et al. proposed in 2010 a sparse representation method that jointly learns a high-resolution and a low-resolution dictionary for image super-resolution; and Chen et al. proposed in 2014 an edit propagation approach based on sparse representation that can process ultra-high-resolution images and videos with greatly reduced memory. Sparse representation can also be applied to face recognition, image restoration, image classification and other tasks. In all of these applications, producing results with higher visual fidelity remains a focus of sparse representation research.

Summary of the invention

To overcome these shortcomings of the prior art, the present invention proposes a conformal mapping sparse representation method for image/video scene content. By introducing conformal mapping, the method preserves the local angle information between adjacent samples to the greatest extent and obtains a dictionary with stronger expressive power. Meanwhile, conformal mapping encourages adjacent samples to be reconstructed with similar dictionary atoms, making the dictionary more concise and compact. As a result, reconstructions after image editing better preserve the original local structure, enhancing the visual quality and realism of the results.

To achieve the purpose of the invention, the technical solution adopted by the present invention is as follows:

The conformal mapping sparse representation method for image/video scene content of the present invention comprises the following steps:

Step 1: input the original image or video and sample it in a feature space;

Step 2: in the feature space, compute the K nearest neighbors of each sample, build a local complete adjacency graph, and then compute the distances between adjacent samples;

Step 3: combine the conformal mapping rule with sparse representation to learn a dictionary with the conformal property;

Step 4: for the specific application, reconstruct the original image or video with this dictionary to obtain the result.

The "local complete adjacency graph" in Step 2 means that, within the set formed by a sample and its K nearest neighbors, every pair of samples is connected.

The "conformal mapping rule" in Step 3 is a manifold learning technique, described as follows: given a mapping g: M → N from a feature space M to another feature space N, let (x_i, x_j, x_k) be adjacent sample points in M that form a triangle, and let (α_i, α_j, α_k) be the images of these points in N. The conformal mapping rule requires:

\min \sum_{j,k \in N_i} \left( \|x_j - x_k\|^2 - s_i \|\alpha_j - \alpha_k\|^2 \right)^2,

where N_i denotes the set of K nearest neighbors of sample x_i and s_i denotes the scale change of the mapping.

The dictionary with the conformal property in Step 3 is learned by combining the conformal mapping rule with the sparse representation algorithm, which gives the following energy function:

\min_{D, \alpha, S} \sum_i \|x_i - D\alpha_i\|_2^2 + \lambda_1 \sum_i \|\alpha_i\|_1 + \lambda_2 \sum_i \sum_{j,k \in N_i} \left( \|x_j - x_k\|^2 - s_i \|\alpha_j - \alpha_k\|^2 \right)^2,

where x is the input sample feature, D is the feature dictionary, α is the reconstruction coefficient, and λ_1, λ_2 are weighting coefficients. The energy function is minimized by an iterative algorithm, which finally yields the dictionary D with the conformal property.

The method can be applied to image and video editing applications such as image super-resolution, image/video color editing and image denoising.

Compared with the prior art, the present invention has the following advantages:

1. Building on sparse representation, the conformal mapping rule preserves the local angle information between adjacent samples to the greatest extent and yields a dictionary with stronger expressive power; conformal mapping also encourages adjacent samples to be reconstructed with similar dictionary atoms, making the dictionary more concise and compact.

2. Benefiting from a more concise and more expressive dictionary, the present invention lets reconstructions after image editing better preserve the original local structure, enhancing the visual quality and realism of the results.

3. The proposed method can be applied to many fields with clear benefit, including image super-resolution, image/video color editing and image denoising.

Brief description of the drawings

Fig. 1 is a flow chart of the method of the invention;

Fig. 2 is a schematic diagram of the principle of the invention;

Fig. 3 is a flow chart of the overall dictionary learning algorithm of the invention.

The symbols in the figures are as follows:

D: the dictionary learned in a particular feature space;

A: reconstruction coefficients;

S: scale coefficients;

x_i, x_j, x_k: input sample points, i.e. the sample features of the image/video;

α_i, α_j, α_k: the sample points mapped into the other space, i.e. the sparse reconstruction coefficients.

Embodiment

To make the purpose, technical solution and advantages of the present invention clearer, the method is explained in detail below with reference to the accompanying drawings. It should be understood that the specific examples described here only explain the invention and are not intended to limit it.

The present invention proposes a conformal mapping sparse representation method for image/video scene content. By introducing the conformal mapping rule, the method preserves the local angle information between adjacent samples to the greatest extent and obtains a more concise and more expressive dictionary. When the dictionary generated by this method is used for image and video editing, the reconstruction better preserves the original local structure, enhancing the visual quality and realism of the results. The method is applied to three typical applications: image super-resolution, image/video color editing and image denoising.

The flow of the conformal mapping sparse representation method for image/video scene content of the present invention is shown in Fig. 1, and the embodiment is as follows:

Step 1: input the original image or video and sample it in a feature space.

The input image or video is sampled to obtain the input sample set X. The feature space is chosen according to the needs of the application. For example, for image super-resolution, the image is converted from the RGB color space to the YCbCr color space and patches are sampled from the luminance channel Y; for color editing, RGB color features are sampled at the pixel level; for image denoising, grayscale or RGB color features are sampled at the patch level.
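The sampling choices above can be sketched as follows. This is a minimal sketch under assumptions the patent does not state: images are nested lists of (R, G, B) tuples in [0, 255], the RGB-to-YCbCr conversion uses the full-range BT.601 (JPEG/JFIF) coefficients, and patch size and stride are free parameters. The function names are hypothetical.

```python
from typing import List, Tuple

Pixel = Tuple[float, float, float]  # (R, G, B) in [0, 255]
Image = List[List[Pixel]]           # rows of pixels


def rgb_to_ycbcr(r: float, g: float, b: float) -> Tuple[float, float, float]:
    """Full-range BT.601 (JPEG/JFIF) RGB -> YCbCr; the convention is an assumption."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


def luminance_patches(img: Image, size: int, stride: int) -> List[List[float]]:
    """Patch-level samples of the Y channel (super-resolution), each flattened row by row."""
    lum = [[rgb_to_ycbcr(*px)[0] for px in row] for row in img]
    h, w = len(lum), len(lum[0])
    patches = []
    for top in range(0, h - size + 1, stride):
        for left in range(0, w - size + 1, stride):
            patches.append([lum[top + dy][left + dx]
                            for dy in range(size) for dx in range(size)])
    return patches


def pixel_samples(img: Image) -> List[List[float]]:
    """Pixel-level RGB samples (color editing)."""
    return [list(px) for row in img for px in row]
```

For example, `luminance_patches(img, size=8, stride=4)` would yield flattened 8×8 luminance patches, and `pixel_samples(img)` one RGB sample per pixel.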

Step 2: in the feature space, compute the K nearest neighbors of each sample, build the local complete adjacency graph, and then compute the distances between adjacent samples.

First, the K nearest neighbors of each sample x_i are computed in the feature space with a k-d tree, using the Euclidean distance. Within the set formed by x_i and its K neighbors, every pair of samples is connected to form the local complete adjacency graph, and the Euclidean distances between connected samples are computed in the feature space.
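A minimal sketch of Step 2, assuming samples are equal-length lists of floats. For brevity it finds neighbors by brute force instead of the k-d tree the patent uses (a k-d tree such as SciPy's `cKDTree` returns the same neighbors faster on large sample sets). `knn` and `local_complete_graph` are hypothetical names.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Sample = Sequence[float]


def euclidean(a: Sample, b: Sample) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def knn(samples: List[Sample], k: int) -> List[List[int]]:
    """Indices of the k nearest neighbors of each sample (brute force, self excluded)."""
    neighbors = []
    for i, xi in enumerate(samples):
        # Sort (distance, index) pairs; ties are broken by the smaller index.
        order = sorted((euclidean(xi, xj), j) for j, xj in enumerate(samples) if j != i)
        neighbors.append([j for _, j in order[:k]])
    return neighbors


def local_complete_graph(samples: List[Sample], k: int) -> List[Dict[Tuple[int, int], float]]:
    """For each sample i, connect every pair in {i} plus its k neighbors and store edge lengths."""
    graphs = []
    for i, nbrs in enumerate(knn(samples, k)):
        nodes = sorted([i] + nbrs)
        graphs.append({(a, b): euclidean(samples[a], samples[b])
                       for a, b in combinations(nodes, 2)})
    return graphs
```

Each graph has (K + 1)K/2 edges, so storing them is cheap for the small K typically used for local neighborhoods.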

Step 3: combine the conformal mapping rule with sparse representation to learn a dictionary with the conformal property.

Given the input sample set X = [x_1, x_2, ..., x_n], sparse representation yields an overcomplete dictionary D and reconstruction coefficients α:

\min_{D, \alpha} \sum_i \|x_i - D\alpha_i\|_2^2 + \lambda \sum_i \|\alpha_i\|_1.

To improve the performance of sparse representation, the invention introduces the local structure information of the input data by adding a conformal term f(α) to the formula above.

Conformal mapping has been shown to improve results in manifold learning. Specifically: given a mapping g: M → N from a feature space M to another feature space N, let (x_i, x_j, x_k) be adjacent sample points in M that form a triangle, and let (α_i, α_j, α_k) be the images of these points in N, as shown in Fig. 2. The conformal mapping rule requires:

\min \sum_{j,k \in N_i} \left( \|x_j - x_k\|^2 - s_i \|\alpha_j - \alpha_k\|^2 \right)^2,

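where N_i denotes the set of K nearest neighbors of sample x_i and s_i denotes the scale change after mapping. If every residual in the sum is zero, all squared distances within the neighborhood are rescaled by the same factor s_i, so each neighborhood triangle maps to a similar triangle and its angles are preserved; this is what makes the mapping conformal.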
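To make the conformal term concrete, the sketch below evaluates it for one neighborhood. It assumes, as an interpretation, that N_i is the vertex set of the local complete adjacency graph from Step 2 (x_i together with its K neighbors) and that the sum runs over unordered pairs j < k; summing over ordered pairs would only double the value. The function names are hypothetical.

```python
from itertools import combinations
from typing import List, Sequence


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance ||a - b||^2."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def conformal_term(X: List[Sequence[float]], A: List[Sequence[float]],
                   nbhd: List[int], s: float) -> float:
    """Sum over unordered pairs (j, k) in nbhd of (||x_j - x_k||^2 - s * ||a_j - a_k||^2)^2."""
    return sum((sq_dist(X[j], X[k]) - s * sq_dist(A[j], A[k])) ** 2
               for j, k in combinations(nbhd, 2))


# A right triangle in feature space and a half-size copy of it in coefficient space.
X = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0]]
A = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
print(conformal_term(X, A, [0, 1, 2], s=4.0))  # 0.0: same angles, squared lengths scaled by 4
print(conformal_term(X, A, [0, 1, 2], s=1.0))  # 54.0: the wrong scale is penalized
```

The coefficient triangle is the feature triangle shrunk by 1/2, so s_i = 4 makes every residual vanish even though no distance is preserved; only the shape (and hence the angles) is.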

Then the conformal mapping rule is combined with the sparse representation algorithm to obtain the following energy function:

\min_{D, \alpha, S} \sum_i \|x_i - D\alpha_i\|_2^2 + \lambda_1 \sum_i \|\alpha_i\|_1 + \lambda_2 \sum_i \sum_{j,k \in N_i} \left( \|x_j - x_k\|^2 - s_i \|\alpha_j - \alpha_k\|^2 \right)^2,

where x is the input sample feature, D is the feature dictionary, α is the reconstruction coefficient, and λ_1, λ_2 are weighting coefficients. The energy function is minimized by an iterative algorithm, which finally yields the dictionary D with the conformal property.
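The energy function can be evaluated directly, which is useful for monitoring the iterative minimization. A minimal sketch, assuming D is stored as a list of K atoms (the columns of D), each coefficient vector α_i has length K, S holds one s_i per sample, and `neighborhoods[i]` lists the indices in N_i, including i itself as in the local complete adjacency graph. `energy` is a hypothetical name.

```python
from itertools import combinations
from typing import List, Sequence

Vec = Sequence[float]


def sq_dist(a: Vec, b: Vec) -> float:
    """Squared Euclidean distance ||a - b||^2."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def reconstruct(D: List[Vec], a: Vec) -> List[float]:
    """D @ a, with D given as a list of atoms (columns)."""
    return [sum(a[j] * D[j][r] for j in range(len(D))) for r in range(len(D[0]))]


def energy(X: List[Vec], D: List[Vec], A: List[Vec], S: List[float],
           neighborhoods: List[List[int]], lam1: float, lam2: float) -> float:
    """Reconstruction error + lam1 * L1 sparsity + lam2 * conformal term."""
    fidelity = sum(sq_dist(x, reconstruct(D, a)) for x, a in zip(X, A))
    sparsity = sum(abs(v) for a in A for v in a)
    conformal = sum((sq_dist(X[j], X[k]) - s * sq_dist(A[j], A[k])) ** 2
                    for s, nb in zip(S, neighborhoods)
                    for j, k in combinations(nb, 2))
    return fidelity + lam1 * sparsity + lam2 * conformal
```

With D the identity and A = X, the reconstruction term is zero, so the energy reduces to λ_1 times the L1 norm of the samples plus the conformal penalty of the chosen scales.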

The formula above has three unknowns (D, α, S): D is the dictionary to be solved for, α the sparse reconstruction coefficients, and S the scale changes. The invention therefore decomposes the problem into three subproblems: sparse coding, dictionary update and scale update. Each subproblem optimizes one variable while the other two are fixed, and the three steps are iterated in a loop until convergence.
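The alternating scheme can be sketched as a generic block-coordinate loop. This is an illustrative skeleton rather than the patent's code: `alternate` and its callables are hypothetical, each update is assumed to return the minimizer of its subproblem with the other two variables fixed, and the loop stops when the energy stops decreasing by more than a tolerance. A scalar toy objective stands in for the real energy.

```python
from typing import Any, Callable, Tuple

Update = Callable[[Any, Any, Any], Any]


def alternate(D: Any, A: Any, S: Any,
              sparse_coding: Update, dictionary_update: Update, scale_update: Update,
              energy: Callable[[Any, Any, Any], float],
              tol: float = 1e-10, max_iter: int = 1000) -> Tuple[Any, Any, Any]:
    """Block-coordinate minimization of energy(D, A, S) over the three variables."""
    prev = energy(D, A, S)
    for _ in range(max_iter):
        A = sparse_coding(D, A, S)      # fix D and S, solve for the coefficients
        D = dictionary_update(D, A, S)  # fix A and S, solve for the dictionary
        S = scale_update(D, A, S)       # fix D and A, solve for the scales
        cur = energy(D, A, S)
        if prev - cur <= tol * max(1.0, abs(prev)):  # energy has stopped decreasing
            break
        prev = cur
    return D, A, S


def toy_energy(d: float, a: float, s: float) -> float:
    """Scalar toy objective; each update below is its exact minimizer in one variable."""
    return (d - 1) ** 2 + (d - a) ** 2 + (a - s) ** 2 + (s - 2) ** 2


print(alternate(0.0, 0.0, 0.0,
                sparse_coding=lambda d, a, s: (d + s) / 2,      # argmin over a
                dictionary_update=lambda d, a, s: (1 + a) / 2,  # argmin over d
                scale_update=lambda d, a, s: (a + 2) / 2,       # argmin over s
                energy=toy_energy))  # close to (1.25, 1.5, 1.75)
```

Because every update exactly minimizes its own subproblem, the energy can never increase, which is why a stalled decrease is a reasonable stopping rule.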

First, D and S are initialized as random matrices. In the sparse coding stage, D and S are fixed and the coefficients α are obtained from:

J(A) = \arg\min_{\alpha} \sum_i \|x_i - D\alpha_i\|_2^2 + \lambda_1 \sum_i \|\alpha_i\|_1 + \lambda_2 \sum_i \sum_{j,k \in N_i} \left( \|x_j - x_k\|^2 - s_i \|\alpha_j - \alpha_k\|^2 \right)^2.

Here the invention uses an iterative projection method to solve this formula.
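The patent does not spell out its iterative projection method. As a stand-in, the sketch below solves the sparse coding subproblem for a single sample with ISTA (iterative shrinkage-thresholding, a proximal gradient method) and ignores the conformal term (λ_2 = 0). Because the conformal term is smooth, its gradient could be added to the gradient step, but that extension is not shown. The step size uses 2‖D‖_F^2 as a safe upper bound on the Lipschitz constant; `ista` is a hypothetical name and D is a list of atoms.

```python
from typing import List, Sequence


def soft_threshold(v: float, t: float) -> float:
    """Proximal operator of t * |v| (shrinks v toward zero by t)."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def ista(x: Sequence[float], D: List[Sequence[float]], lam: float,
         n_iter: int = 500) -> List[float]:
    """Minimize ||x - D a||_2^2 + lam * ||a||_1 for one sample x."""
    K, m = len(D), len(x)
    # The gradient of ||x - D a||^2 is -2 D^T (x - D a); 2 * ||D||_F^2 bounds its Lipschitz constant.
    step = 1.0 / (2.0 * sum(v * v for d in D for v in d))
    a = [0.0] * K
    for _ in range(n_iter):
        r = [x[i] - sum(a[j] * D[j][i] for j in range(K)) for i in range(m)]
        grad = [-2.0 * sum(D[j][i] * r[i] for i in range(m)) for j in range(K)]
        a = [soft_threshold(a[j] - step * grad[j], step * lam) for j in range(K)]
    return a
```

With an identity dictionary the exact solution is soft-thresholding of x at λ/2, so `ista([3.0, -0.5], [[1.0, 0.0], [0.0, 1.0]], 1.0)` converges to `[2.5, 0.0]`: the small entry is zeroed out, which is the sparsity the L1 term enforces.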

Then, in the dictionary update stage, α and S are fixed and D is obtained from:

J(D) = \arg\min_{D} \sum_i \|x_i - D\alpha_i\|_2^2.

Here each atom d_j of the dictionary is required to be a unit vector, i.e. ||d_j||_2 = 1. The objective is quadratic in D, so the dictionary can be updated atom by atom.
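A minimal sketch of one atom-by-atom update pass under the unit-norm constraint, assuming the list-of-atoms layout for D. With all other atoms fixed, expanding the squared error shows that the best unit vector d_j maximizes d_j · v, where v = Σ_i α_ij r_i and r_i is sample i's residual without atom j; the maximizer is v / ||v||. `update_dictionary` is a hypothetical name, and this is one way to realize the update the text describes, not necessarily the patent's exact procedure.

```python
import math
from typing import List, Sequence

Vec = Sequence[float]


def update_dictionary(X: List[Vec], A: List[Vec], D: List[Vec]) -> List[List[float]]:
    """One pass of atom-by-atom updates of D (list of K atoms) with ||d_j||_2 = 1.

    X: n samples of length m; A: n coefficient vectors of length K.
    """
    D = [list(d) for d in D]
    K, m = len(D), len(D[0])
    for j in range(K):
        # v = sum_i a_ij * r_i, where r_i = x_i - sum_{l != j} a_il * d_l.
        v = [0.0] * m
        for x, a in zip(X, A):
            if a[j] == 0.0:
                continue
            for r in range(m):
                resid = x[r] - sum(a[l] * D[l][r] for l in range(K) if l != j)
                v[r] += a[j] * resid
        norm = math.sqrt(sum(t * t for t in v))
        if norm > 0.0:  # an unused atom is left unchanged
            D[j] = [t / norm for t in v]
    return D
```

Each atom update is an exact minimizer on the unit sphere, so a pass never increases the reconstruction error when started from unit-norm atoms.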

Finally, in the scale update stage, D and α are fixed and S is obtained from:

J(S) = \arg\min_{S} \sum_i \sum_{j,k \in N_i} \left( \|x_j - x_k\|^2 - s_i \|\alpha_j - \alpha_k\|^2 \right)^2.

Note that the s_i in this formula are independent of one another, so each can be solved separately by least squares:

s_i = \frac{\sum_{j,k \in N_i} \|x_j - x_k\|^2 \, \|\alpha_j - \alpha_k\|^2}{\sum_{j,k \in N_i} \left( \|\alpha_j - \alpha_k\|^2 \right)^2}.
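The closed form follows from setting the derivative of Σ(a − s_i b)^2 with respect to s_i to zero, with a = ||x_j − x_k||^2 and b = ||α_j − α_k||^2, which gives s_i = Σab / Σb^2. A minimal sketch, assuming unordered pairs over N_i (ordered pairs double both sums and leave s_i unchanged) and a hypothetical `fallback` value for the degenerate case where all coefficients in the neighborhood coincide, in which every s_i gives the same residual.

```python
from itertools import combinations
from typing import List, Sequence


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance ||a - b||^2."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def update_scale(X: List[Sequence[float]], A: List[Sequence[float]],
                 nbhd: List[int], fallback: float = 1.0) -> float:
    """Least-squares s_i = sum(dx * da) / sum(da^2) over pairs in the neighborhood."""
    num = 0.0
    den = 0.0
    for j, k in combinations(nbhd, 2):
        dx = sq_dist(X[j], X[k])  # squared distance in feature space
        da = sq_dist(A[j], A[k])  # squared distance in coefficient space
        num += dx * da
        den += da * da
    return num / den if den > 0.0 else fallback
```

For a triangle whose coefficient copy is exactly half its size, the update returns 4.0, the squared scale ratio.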

These three steps are iterated until the optimal solution is obtained. The algorithm flow chart is shown in Fig. 3.

Step 4: reconstruct the original image or video with this dictionary to obtain the result.

The invention provides three different applications to verify the performance of the method: image super-resolution, image/video color editing and image denoising.

Image super-resolution reconstructs a high-resolution image from a low-resolution one. First, a library of one-to-one corresponding high-resolution and low-resolution images is built, and the dictionary learning method above learns two dictionaries from this library simultaneously. Given an input low-resolution image, its coefficients are obtained by reconstructing it with the low-resolution dictionary, and the corresponding high-resolution image is then reconstructed from these coefficients and the high-resolution dictionary.
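A minimal sketch of the coupled-dictionary reconstruction for one patch. It assumes the two dictionaries share their coefficient index (atom j of the low-resolution dictionary corresponds to atom j of the high-resolution one), uses ISTA as a stand-in for the unspecified coding step, and uses made-up dictionary values; all function names are hypothetical.

```python
from typing import List, Sequence

Atoms = List[Sequence[float]]  # a dictionary stored as a list of atoms (columns)


def soft_threshold(v: float, t: float) -> float:
    """Shrink v toward zero by t."""
    return v - t if v > t else v + t if v < -t else 0.0


def sparse_code(y: Sequence[float], D: Atoms, lam: float, n_iter: int = 500) -> List[float]:
    """ISTA for min ||y - D a||^2 + lam * ||a||_1 (stand-in coder)."""
    K, m = len(D), len(y)
    step = 1.0 / (2.0 * sum(v * v for d in D for v in d))  # 1 / Lipschitz bound
    a = [0.0] * K
    for _ in range(n_iter):
        r = [y[i] - sum(a[j] * D[j][i] for j in range(K)) for i in range(m)]
        a = [soft_threshold(a[j] + 2.0 * step * sum(D[j][i] * r[i] for i in range(m)),
                            step * lam) for j in range(K)]
    return a


def super_resolve_patch(y_low: Sequence[float], D_low: Atoms, D_high: Atoms,
                        lam: float) -> List[float]:
    """Code a low-resolution patch with D_low, then synthesize it with D_high."""
    a = sparse_code(y_low, D_low, lam)
    return [sum(a[j] * D_high[j][i] for j in range(len(D_high)))
            for i in range(len(D_high[0]))]


# Toy dictionaries: each 2-sample low-res atom maps to a 4-sample high-res atom.
D_low = [[1.0, 0.0], [0.0, 1.0]]
D_high = [[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]
print(super_resolve_patch([3.0, -0.5], D_low, D_high, lam=1.0))  # about [2.5, 2.5, 0.0, 0.0]
```

The coefficients found in the low-resolution space are reused unchanged; only the dictionary used for synthesis differs.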

Image/video color editing changes the color information of an image or video interactively. After the image or video is input, its color dictionary is learned first. When the user marks a color on an object in the image with a brush, the corresponding color in the dictionary is changed to the user's marked color, and this change propagates to the whole image or video, giving the final color editing result.
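The propagation works because every pixel is a combination of shared color atoms, so editing one atom recolors every pixel that uses it. A minimal sketch, assuming pixel-level RGB samples, a learned color dictionary D given as a list of RGB atoms with per-pixel coefficients A, and a hypothetical rule for "the corresponding color": the atom with the largest coefficient under the brush stroke is shifted so that the marked pixel reproduces the user's color exactly.

```python
from typing import List, Sequence, Tuple

Color = List[float]  # (R, G, B)


def render(D: List[Sequence[float]], a: Sequence[float]) -> Color:
    """Pixel color as a combination of color atoms: sum_j a_j * d_j."""
    return [sum(a[j] * D[j][c] for j in range(len(D))) for c in range(3)]


def edit_color(D: List[Sequence[float]], A: List[Sequence[float]], pixel: int,
               target: Sequence[float]) -> Tuple[List[Color], List[Color]]:
    """Shift the atom that dominates `pixel` so it renders as `target`,
    then re-render every pixel with the edited dictionary (propagation)."""
    a = A[pixel]
    j = max(range(len(D)), key=lambda t: abs(a[t]))
    if a[j] == 0:
        raise ValueError("marked pixel uses no dictionary atom")
    current = render(D, a)
    D_new = [list(d) for d in D]
    # Adding (target - current) / a_j to atom j changes this pixel by exactly target - current.
    D_new[j] = [D[j][c] + (target[c] - current[c]) / a[j] for c in range(3)]
    return D_new, [render(D_new, ai) for ai in A]


D = [[200.0, 0.0, 0.0], [0.0, 0.0, 200.0]]  # a red atom and a blue atom
A = [[1.0, 0.0], [0.5, 0.0], [0.0, 1.0]]    # red, dark red, blue pixels
D2, pixels = edit_color(D, A, 0, [0.0, 200.0, 0.0])
print(pixels)  # [[0.0, 200.0, 0.0], [0.0, 100.0, 0.0], [0.0, 0.0, 200.0]]
```

Marking only the first pixel green also turns the dark red pixel dark green, while the blue pixel, which does not use the edited atom, is untouched.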

Image denoising removes Gaussian noise from an image. Given a noisy input image, 8×8 image patches are first collected and used as training data to learn a dictionary. The image is then reconstructed with the matching pursuit method to obtain the denoised image.
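A minimal sketch of the reconstruction step, assuming a grayscale image as nested lists, a learned dictionary of unit-norm atoms, every overlapping p×p patch (stride 1; the patent uses p = 8), a fixed number of atoms per patch, and averaging where reconstructed patches overlap. The averaging rule and the function names are assumptions, not details given in the patent.

```python
from typing import List, Sequence


def matching_pursuit(x: Sequence[float], D: List[Sequence[float]],
                     n_atoms: int) -> List[float]:
    """Greedy matching pursuit with unit-norm atoms; returns the coefficient vector."""
    a = [0.0] * len(D)
    r = list(x)
    for _ in range(n_atoms):
        scores = [sum(di * ri for di, ri in zip(d, r)) for d in D]
        j = max(range(len(D)), key=lambda t: abs(scores[t]))
        if abs(scores[j]) < 1e-12:  # residual is orthogonal to every atom
            break
        a[j] += scores[j]
        r = [ri - scores[j] * di for ri, di in zip(r, D[j])]
    return a


def denoise(img: List[List[float]], D: List[Sequence[float]], p: int,
            n_atoms: int) -> List[List[float]]:
    """Code every overlapping p x p patch, reconstruct it, and average the overlaps."""
    h, w = len(img), len(img[0])
    acc = [[0.0] * w for _ in range(h)]
    cnt = [[0] * w for _ in range(h)]
    for top in range(h - p + 1):
        for left in range(w - p + 1):
            patch = [img[top + dy][left + dx] for dy in range(p) for dx in range(p)]
            a = matching_pursuit(patch, D, n_atoms)
            rec = [sum(a[j] * D[j][i] for j in range(len(D))) for i in range(p * p)]
            for i, v in enumerate(rec):
                acc[top + i // p][left + i % p] += v
                cnt[top + i // p][left + i % p] += 1
    return [[acc[y][x] / cnt[y][x] for x in range(w)] for y in range(h)]
```

Noise is suppressed because a few atoms learned from clean structure cannot represent the noise well; limiting `n_atoms` is what discards it.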

The dictionary obtained by the present invention has good expressive power and reconstruction ability and is also more concise, as a comparison with traditional methods shows. For example, the dictionary obtained by the traditional dictionary learning method K-SVD has 256 atoms, while the present invention reduces this to 205 with stronger expressive power. The expressive power of a dictionary can be measured by the correlation coefficient among its atoms: the smaller the coefficient, the stronger the expressive power. The correlation coefficient of the dictionary obtained by a traditional sparse representation method is 0.8817, and after the present invention introduces conformal mapping it drops to 0.8477, indicating that the dictionary learned by the invention has stronger expressive power.
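The patent does not define the dictionary's "correlation coefficient" precisely. One common measure, assumed here, is the mutual coherence: the largest absolute normalized inner product between two distinct atoms, where a smaller value means less redundant atoms. The sketch computes that measure; it is not claimed to be the exact statistic behind the 0.8817 and 0.8477 figures, and `mutual_coherence` is a hypothetical name.

```python
import math
from itertools import combinations
from typing import List, Sequence


def mutual_coherence(D: List[Sequence[float]]) -> float:
    """Largest absolute cosine similarity between two distinct atoms of D."""
    norms = [math.sqrt(sum(v * v for v in d)) for d in D]
    return max(abs(sum(p * q for p, q in zip(D[i], D[j]))) / (norms[i] * norms[j])
               for i, j in combinations(range(len(D)), 2))


# Two orthogonal atoms plus a third atom lying between them.
print(mutual_coherence([[1.0, 0.0], [0.0, 1.0], [3.0, 4.0]]))  # 0.8
```

An orthonormal dictionary scores 0, and adding a near-duplicate atom pushes the score toward 1.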

The foregoing is only a basic description of the present invention; any equivalent transformation made according to the technical solution of the invention shall fall within its scope of protection.

Claims

1. A conformal mapping sparse representation method for image/video scene content, characterized by comprising the following steps:

(1) input the original image or video and sample it in a feature space;

(2) in the feature space, compute the K nearest neighbors of each sample, build a local complete adjacency graph, and then compute the distances between adjacent samples;

(3) combine the conformal mapping rule with sparse representation to learn a dictionary with the conformal property;

(4) reconstruct the original image or video with this dictionary to obtain the result.

2. The conformal mapping sparse representation method for image/video scene content according to claim 1, characterized in that the local complete adjacency graph in step (2) means that, within the set formed by a sample and its K nearest neighbors, every pair of samples is connected.

3. The conformal mapping sparse representation method for image/video scene content according to claim 1, characterized in that the conformal mapping rule in step (3) is a manifold learning technique, described as follows: given a mapping g: M → N from a feature space M to another feature space N, let (x_i, x_j, x_k) be adjacent sample points in M that form a triangle, and let (α_i, α_j, α_k) be the images of these points in N; the conformal mapping rule requires:

\min \sum_{j,k \in N_i} \left( \|x_j - x_k\|^2 - s_i \|\alpha_j - \alpha_k\|^2 \right)^2.

4. The conformal mapping sparse representation method for image/video scene content according to claim 1, characterized in that learning the dictionary with the conformal property in step (3) comprises combining the conformal mapping rule with the sparse representation algorithm to obtain the following energy function:

\min_{D, \alpha, S} \sum_i \|x_i - D\alpha_i\|_2^2 + \lambda_1 \sum_i \|\alpha_i\|_1 + \lambda_2 \sum_i \sum_{j,k \in N_i} \left( \|x_j - x_k\|^2 - s_i \|\alpha_j - \alpha_k\|^2 \right)^2.

5. The conformal mapping sparse representation method for image/video scene content according to claim 1, characterized in that the method is applied to image and video editing applications, including image super-resolution, image/video color editing and image denoising.

6. The conformal mapping sparse representation method for image/video scene content according to claim 1, characterized in that introducing the conformal mapping rule preserves the local angle information between adjacent samples to the greatest extent and yields a dictionary with stronger expressive power; meanwhile, conformal mapping encourages adjacent samples to be reconstructed with similar dictionary atoms, making the dictionary more concise and compact.