CN105590108A - Scene identification method used in noisy environment - Google Patents
Scene identification method used in noisy environment
- Publication number
- CN105590108A (application number CN201610103825.5A)
- Authority
- CN
- China
- Prior art keywords
- sample
- feature
- scene
- image
- dimensional
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/35—Categorising the entire scene, e.g. birthday party or wedding scene
- G06V20/36—Indoor scenes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/46—Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
- G06V10/462—Salient features, e.g. scale invariant feature transforms [SIFT]
Abstract
The invention relates to a scene identification method for use in a noisy environment. The method comprises: (1) acquiring scene images containing labeled samples with a sensor; (2) performing feature extraction and feature expression separately on the color images and depth images of the scenes, and merging the color image features and depth image features of the same group; (3) applying a feature selection algorithm to the features obtained in step (2) to derive a feature selection model from the labeled samples; (4) classifying with a classifier. The method accurately identifies scenes in a noisy environment and retains a certain recognition capability when the samples are mixed with noise, thereby improving performance on indoor scene datasets contaminated with noise.
Description
Technical field
The invention belongs to the technical field of scene recognition methods, and in particular to the field of scene recognition methods for noisy environments.
Background technology
In general, scene classification can be regarded as a kind of viewpoint-independent object recognition problem, in which a scene is composed of a series of entities. For example, an indoor scene may contain chairs, desks, people, and bookshelves, and the arrangement of these things is not fixed. Accurate scene recognition helps solve many practical applications, such as content-based image retrieval, robot path planning, and image annotation. Scene recognition has therefore received more and more attention from researchers.
Extensive research shows that the image feature dimension obtained after feature extraction is very high. Limited by computational resources, such high feature dimensionality affects the practicality of scene recognition systems built on RGB-D sensors (which combine a color image with a depth image). Although existing feature selection can make high-dimensional features more compact and effective, existing feature selection methods ignore the problem that the samples are mixed with much noise. In practical applications, system complexity and the processing precision of devices often introduce a great deal of noise, so the recognition effect of existing feature selection algorithms is limited.
A search shows that more than a hundred scene recognition method patents have been disclosed, but scene recognition methods for noisy environments are few in number. The applicant combines Cauchy estimation theory to form the manifold Cauchy learning algorithm, which achieves accurate scene recognition in noisy environments.
Summary of the invention
The present invention describes in detail a scene recognition method for noisy environments and a new feature selection algorithm: the manifold Cauchy learning algorithm.
The present invention adopts following technical scheme to realize.
A scene recognition method for a noisy environment comprises the following steps: 1) use a sensor to acquire scene images containing labeled samples; 2) perform feature extraction and feature expression on the color image and the depth image of each scene respectively, and merge the color image features and depth image features of the same group; 3) apply a feature selection algorithm to the features obtained in step 2) for the labeled samples to obtain a feature selection model; 4) classify with a classifier.
In step 1) of the present invention, a Kinect sensor is used to acquire the scene images.
Step 2) of the present invention is specifically: convert all images to gray-scale, scale the images, then extract features from local patches of the color image and the depth image by the scale-invariant feature transform (SIFT) method, and then perform feature expression with the locality-constrained linear coding (LLC) algorithm.
The feature selection algorithm of step 3) of the present invention is the manifold Cauchy learning algorithm; the concrete steps are as follows:
Given a sample x_i belonging to the sample set X = [x_1, x_2, ..., x_N] ∈ R^{D×N} (here N is the number of samples, D is the original dimension of a sample, and R denotes the real number space), its corresponding low-dimensional sample y_i belongs to Y = [y_1, y_2, ..., y_N] ∈ R^{d×N} (here d is the dimension after reduction). Find the K nearest neighbors of x_i among same-class and different-class samples: k_1 of them are of the same class as x_i and the remaining k_2 are of a different class, where K = k_1 + k_2; these two groups of samples are denoted respectively as the within-class and between-class neighbor sets. The whole local patch of x_i is expressed as X_i ∈ R^{D×(k_1+k_2+1)} (the linear space of dimension D × (k_1 + k_2 + 1)), with a corresponding low-dimensional expression Y_i. In the newly obtained low-dimensional local patch, the distance between same-class samples should be small enough and the distance between different-class samples large enough, so the optimization function is expressed as follows:
α is a scale factor used to control the influence of within-class samples relative to between-class samples;
Define a coefficient vector ω_i:
Using the defined coefficient vector, formula (1) is simplified into the following form:
tr(·) denotes the trace operation; in the formula,
Next, introduce the selection matrix (S_i)_pq:
Therefore, the low-dimensional expression Y_i = Y S_i is obtained, and objective function (2) is rewritten as:
Cauchy estimation theory is introduced to overcome the influence brought by noise, and formula (4) becomes the following form:
c is a coefficient used to weigh the noise;
Since the relation Y = U^T X holds, formula (7) reduces to:
To keep each outlying sample well separated in the low-dimensional space, the distance between each low-dimensional sample and the center of all the sample classes should be sufficiently large, which is expressed as the following objective function:
The quantity above is exactly the center of all the sample classes,
To avoid over-fitting, a 2-norm term is added; integrating all of the above yields the following objective function:
Here C_1 and C_2 are regularization coefficients;
To ensure that (8) has a unique solution, the applicant imposes the constraint U^T U = I; the projection matrix U is solved by the method of iteration together with an eigenvalue solving method.
The beneficial effect of the invention is accurate recognition of scenes in a noisy environment: a certain recognition capability is retained even after noise is mixed into the samples, which improves performance when the indoor scene dataset is contaminated with noise.
The present invention is further explained below in conjunction with the drawings and specific embodiments.
Brief description of the drawings
Fig. 1 is the logical framework diagram of the technical solution of the present invention.
Detailed description of the invention
Referring to Fig. 1, the object of the invention is to overcome the problem that existing indoor scene recognition systems combining color image and depth image sensors do not consider the influence of noise. A scene recognition method for a noisy environment is proposed, comprising the following steps: 1) use a Kinect sensor to acquire scene images; 2) perform feature extraction and feature expression on the color image and the depth image of each scene respectively, and merge the color image features and depth image features of the same group; 3) apply the manifold Cauchy learning algorithm to the features obtained in step 2) for the labeled samples to obtain the feature selection model; 4) classify with a support vector machine (SVM) classifier.
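The four steps above can be sketched as a minimal pipeline. The patent discloses no code, so every function body below is an illustrative placeholder (random vectors stand in for the SIFT/LLC features, a trivial rule stands in for the SVM), and the file names are hypothetical:

```python
import random

FEAT_DIM = 16  # stand-in for the 43008-dim merged LLC descriptor

def extract_features(color_path, depth_path):
    # Placeholder for step 2: SIFT + LLC feature expression of one
    # (color image, depth image) pair, merged into a single vector.
    rng = random.Random(color_path + depth_path)  # deterministic per pair
    return [rng.random() for _ in range(FEAT_DIM)]

def select_features(vec, keep=4):
    # Placeholder for step 3 (the manifold Cauchy feature-selection
    # model): here we simply keep the indices of the `keep` largest
    # components of the feature vector.
    idx = sorted(range(len(vec)), key=lambda i: -vec[i])[:keep]
    return sorted(idx)

def classify(selected_idx):
    # Placeholder for step 4 (the SVM classifier).
    return "indoor" if len(selected_idx) % 2 == 0 else "outdoor"

feat = extract_features("color_001.png", "depth_001.png")  # step 2
model = select_features(feat)                              # step 3
label = classify(model)                                    # step 4
print(label)  # indoor
```

A real implementation would replace each placeholder with the SIFT/LLC extraction, the manifold Cauchy projection, and a trained SVM respectively; only the wiring of the four steps is taken from the text.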
The first step uses the Kinect sensor to acquire the scene color image and the corresponding depth image.
The second step performs feature extraction, feature expression, and merging on the color image and corresponding depth image obtained in the first step, as follows. 1) Convert all images to gray-scale and scale each image by a certain ratio so that its size is at most 300 × 300 pixels. 2) Extract features from local patches of the color image and the depth image by the scale-invariant feature transform (SIFT) method; each patch is 16 × 16 pixels, adjacent patches on the image overlap by 8 pixels horizontally or vertically, and the SIFT feature extracted from a patch has 128 dimensions. 3) Perform feature expression with the locality-constrained linear coding (LLC) algorithm. LLC requires k-means clustering of the patches over the whole dataset to form a codebook (dictionary); the k-means algorithm chooses the first cluster center at random, and iteration terminates once the cluster centers change only within a small range. In the embodiment the codebook size is assumed to be 1024. The applicant performs max pooling on a three-level spatial pyramid whose levels are divided into 1 × 1, 2 × 2, and 4 × 4 sub-regions, so for each paired color image and depth image the LLC feature length is (1 + 4 + 16) × 1024 = 21504. Finally, the features obtained from the color image and the depth image are merged into a 21504 × 2 = 43008-dimensional feature.
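The descriptor sizes quoted in this step follow directly from the patch grid and pyramid layout; the following sketch only checks that arithmetic (bookkeeping, not the patent's implementation):

```python
# LLC feature length over the 3-level spatial pyramid (1x1, 2x2, 4x4)
codebook_size = 1024
pyramid_regions = 1 * 1 + 2 * 2 + 4 * 4     # 21 sub-regions in total
llc_len = pyramid_regions * codebook_size   # length per modality
merged_len = 2 * llc_len                    # color + depth concatenated
print(llc_len, merged_len)  # 21504 43008

# SIFT patch grid on a 300x300 image: 16x16 patches with 8-pixel overlap
# (i.e. an 8-pixel stride between adjacent patches)
patches_per_side = (300 - 16) // 8 + 1
print(patches_per_side ** 2)  # 1296 patches per image
```

So each image contributes at most 36 × 36 = 1296 local SIFT descriptors of 128 dimensions before LLC coding and pooling.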
The feature selection algorithm of the third step, the manifold Cauchy learning algorithm, proceeds as follows:
Given a sample x_i belonging to the sample set X = [x_1, x_2, ..., x_N] ∈ R^{D×N} (here N is the number of samples, D is the original dimension of a sample, and R denotes the real number space), its corresponding low-dimensional sample y_i belongs to Y = [y_1, y_2, ..., y_N] ∈ R^{d×N} (here d is the dimension after reduction). Find the K nearest neighbors of x_i among same-class and different-class samples: k_1 of them are of the same class as x_i and the remaining k_2 are of a different class, where K = k_1 + k_2; the applicant denotes these two groups of samples respectively as the within-class and between-class neighbor sets. The whole local patch of x_i is expressed as X_i ∈ R^{D×(k_1+k_2+1)} (the linear space of dimension D × (k_1 + k_2 + 1)), with a corresponding low-dimensional expression Y_i. In the newly obtained low-dimensional local patch, the distance between same-class samples should be small enough and the distance between different-class samples large enough, so the optimization function is expressed as follows:
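The K-nearest-neighbor partition described above, k_1 same-class plus k_2 different-class neighbors per sample, can be sketched in plain Python with Euclidean distances on toy 2-D points (the real features are 43008-dimensional):

```python
def local_patch_indices(i, X, labels, k1=2, k2=2):
    """Return the (same-class, different-class) nearest-neighbor index
    lists of sample i, as used to build the local patch X_i."""
    def dist2(a, b):
        # squared Euclidean distance between two points
        return sum((u - v) ** 2 for u, v in zip(a, b))
    same = [j for j in range(len(X)) if j != i and labels[j] == labels[i]]
    diff = [j for j in range(len(X)) if labels[j] != labels[i]]
    same.sort(key=lambda j: dist2(X[i], X[j]))
    diff.sort(key=lambda j: dist2(X[i], X[j]))
    return same[:k1], diff[:k2]

# toy data: two well-separated classes in the plane
X = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.1), (5.0, 5.0), (5.1, 5.0), (4.9, 5.2)]
y = [0, 0, 0, 1, 1, 1]
same, diff = local_patch_indices(0, X, y)
print(same, diff)  # [1, 2] [3, 4]
```

The local patch X_i is then the column block [x_i, within-class neighbors, between-class neighbors], exactly the ordering the coefficient vector ω_i relies on.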
Here α is a scale factor used to control the influence of within-class samples relative to between-class samples.
Here a coefficient vector ω_i can be defined:
Using the defined coefficient vector, formula (1) is simplified into the following form:
Here tr(·) denotes the trace operation; in the formula,
Next, introduce the selection matrix (S_i)_pq:
Therefore, the low-dimensional expression Y_i = Y S_i is obtained, and objective function (2) is rewritten as:
Cauchy estimation theory [1] is introduced,
[1] M. Ivan and C. H. Muller, "Breakdown points of Cauchy regression-scale estimators," Statistics & Probability Letters, vol. 57, no. 1, pp. 79–89, Feb. 2002.
to overcome the influence brought by noise; formula (4) then becomes the following form:
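The patent's displayed formulas are images and do not survive in this text, but the Cauchy estimator of reference [1] has a standard form, and its role here can be sketched as follows (a reconstruction consistent with the surrounding text, not a verbatim copy of the patent formulas):

```latex
% Cauchy loss on a residual e, with noise-weighing coefficient c:
\rho(e) \;=\; \ln\!\Bigl(1 + \bigl(e/c\bigr)^{2}\Bigr)
% Wrapping the patch-alignment distances of objective (4) gives terms
% of the form \ln\bigl(1 + \|y_i - y_j\|^{2}/c^{2}\bigr): a grossly
% corrupted sample contributes only logarithmically rather than
% quadratically, which is what limits the influence of noise.
```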
Here c is the coefficient used to weigh the noise.
Since the relation Y = U^T X holds, formula (7) simplifies to:
To keep each outlying sample well separated in the low-dimensional space, the distance between each low-dimensional sample and the center of all the sample classes should be sufficiently large, which is expressed as the following objective function:
Here the quantity above is exactly the center of all the sample classes,
To avoid over-fitting, a 2-norm term is added; integrating all of the above, the objective function is written as follows:
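Since the displayed objective is an image in the source, the following is only a structural reconstruction from the surrounding text, with a Cauchy-robustified alignment term, a class-center separation term weighted by C_1, and a 2-norm regularizer weighted by C_2, minimized under the orthogonality constraint:

```latex
\min_{U^{\top}U = I}\;
  \sum_{i=1}^{N} \ln\!\Bigl(1 +
      \operatorname{tr}\bigl(U^{\top} X S_i L_i S_i^{\top} X^{\top} U\bigr)/c^{2}\Bigr)
  \;-\; C_1 \sum_{i=1}^{N} \bigl\lVert U^{\top}x_i - \bar{y} \bigr\rVert^{2}
  \;+\; C_2\,\lVert U \rVert_F^{2}
```

with \bar{y} the center of all the sample classes; the exact grouping of terms in the granted formula (8) may differ.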
Here C_1 and C_2 are regularization coefficients.
To ensure that (8) has a unique solution, the applicant imposes the constraint U^T U = I. The projection matrix U is solved by the method of iteration together with an eigenvalue solving method.
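The text says U is obtained by iteration plus an eigenvalue solving method but gives no algorithm. As an illustration of the eigenvector step only (not the patent's full iterative scheme), a plain-Python power iteration recovers the leading eigenpair of a small symmetric matrix:

```python
def power_iteration(A, iters=200):
    """Leading eigenpair of a symmetric matrix A (list of row lists)."""
    n = len(A)
    v = [1.0] * n          # initial guess
    lam = 0.0
    for _ in range(iters):
        # multiply: w = A v
        w = [sum(A[r][c] * v[c] for c in range(n)) for r in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
        # Rayleigh quotient lam = v^T A v (v is unit-norm)
        lam = sum(v[r] * sum(A[r][c] * v[c] for c in range(n))
                  for r in range(n))
    return lam, v

A = [[2.0, 1.0], [1.0, 2.0]]   # eigenvalues 3 and 1
lam, v = power_iteration(A)
print(round(lam, 6))  # 3.0
```

Repeating this with deflation (subtracting lam·v·vᵀ from A) yields further mutually orthogonal directions, which is one common way to assemble an orthonormal projection U.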
The fourth step uses a support vector machine (SVM) classifier to perform the classification.
Claims (4)
1. A scene recognition method for a noisy environment, characterized by comprising the following steps: 1) using a sensor to acquire scene images containing labeled samples; 2) performing feature extraction and feature expression on the color image and the depth image of each scene respectively, and merging the color image features and depth image features of the same group; 3) applying a feature selection algorithm to the features obtained in step 2) for the labeled samples to obtain a feature selection model; 4) classifying with a classifier.
2. The scene recognition method for a noisy environment according to claim 1, characterized in that step 1) uses a Kinect sensor to acquire the scene images.
3. The scene recognition method for a noisy environment according to claim 1, characterized in that step 2) is specifically: convert all images to gray-scale, scale the images, then extract features from local patches of the color image and the depth image by the scale-invariant feature transform method, and then perform feature expression with the locality-constrained linear coding algorithm.
4. The scene recognition method for a noisy environment according to claim 1, characterized in that the feature selection algorithm of step 3) is the manifold Cauchy learning algorithm, whose concrete steps are as follows:
Given a sample x_i belonging to the sample set X = [x_1, x_2, ..., x_N] ∈ R^{D×N} (here N is the number of samples, D is the original dimension of a sample, and R denotes the real number space), its corresponding low-dimensional sample y_i belongs to Y = [y_1, y_2, ..., y_N] ∈ R^{d×N} (here d is the dimension after reduction); find the K nearest neighbors of x_i among same-class and different-class samples, of which k_1 are of the same class as x_i and the remaining k_2 are of a different class, where K = k_1 + k_2, and denote these two groups of samples respectively as the within-class and between-class neighbor sets; the whole local patch of x_i is expressed as:
α is a scale factor used to control the influence of within-class samples relative to between-class samples;
define a coefficient vector ω_i:
using the defined coefficient vector, formula (1) is simplified into the following form:
tr(·) denotes the trace operation; in the formula,
next, introduce the selection matrix (S_i)_pq:
therefore, the low-dimensional expression Y_i = Y S_i is obtained, and objective function (2) is rewritten as:
Cauchy estimation theory is introduced to overcome the influence brought by noise, and formula (4) becomes the following form:
c is a coefficient used to weigh the noise;
since the relation Y = U^T X holds, formula (7) reduces to:
to keep each outlying sample well separated in the low-dimensional space, the distance between each low-dimensional sample and the center of all the sample classes should be sufficiently large, expressed as the following objective function:
the quantity above is the center of all the sample classes,
to avoid over-fitting, a 2-norm term is added; integrating all of the above yields the following objective function:
here C_1 and C_2 are regularization coefficients;
to ensure that (8) has a unique solution, the constraint U^T U = I is imposed; the projection matrix U is solved by the method of iteration together with an eigenvalue solving method.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610103825.5A CN105590108B (en) | 2016-02-25 | 2016-02-25 | A kind of scene recognition method under noisy environment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105590108A true CN105590108A (en) | 2016-05-18 |
CN105590108B CN105590108B (en) | 2018-08-17 |
Family
ID=55929678
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610103825.5A Active CN105590108B (en) | 2016-02-25 | 2016-02-25 | A kind of scene recognition method under noisy environment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105590108B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20050031034A1 (en) * | 2003-06-25 | 2005-02-10 | Nejat Kamaci | Cauchy-distribution based coding system and method |
CN102629330A (en) * | 2012-02-29 | 2012-08-08 | 华南理工大学 | Rapid and high-precision matching method of depth image and color image |
CN102867191A (en) * | 2012-09-04 | 2013-01-09 | 广东群兴玩具股份有限公司 | Dimension reducing method based on manifold sub-space study |
CN103500342A (en) * | 2013-09-18 | 2014-01-08 | 华南理工大学 | Human behavior recognition method based on accelerometer |
CN104732209A (en) * | 2015-03-17 | 2015-06-24 | 深圳先进技术研究院 | Indoor scene recognition method and device |
- 2016-02-25: application CN201610103825.5A, granted as CN105590108B (status: Active)
Also Published As
Publication number | Publication date |
---|---|
CN105590108B (en) | 2018-08-17 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106096557B (en) | A kind of semi-supervised learning facial expression recognizing method based on fuzzy training sample | |
CN109753885B (en) | Target detection method and device and pedestrian detection method and system | |
CN104599275B (en) | The RGB-D scene understanding methods of imparametrization based on probability graph model | |
CN105956560A (en) | Vehicle model identification method based on pooling multi-scale depth convolution characteristics | |
CN107766890B (en) | Improved method for discriminant graph block learning in fine-grained identification | |
CN106126581A (en) | Cartographical sketching image search method based on degree of depth study | |
CN107808129A (en) | A kind of facial multi-characteristic points localization method based on single convolutional neural networks | |
CN107330355B (en) | Deep pedestrian re-identification method based on positive sample balance constraint | |
CN105488536A (en) | Agricultural pest image recognition method based on multi-feature deep learning technology | |
CN108038435A (en) | A kind of feature extraction and method for tracking target based on convolutional neural networks | |
CN103268363B (en) | A kind of Chinese calligraphy's image search method based on elastic HOG feature and DDTW coupling | |
CN109101981B (en) | Loop detection method based on global image stripe code in streetscape scene | |
CN103699902A (en) | Sorting method of ground-based visible light cloud picture | |
CN102622609B (en) | Method for automatically classifying three-dimensional models based on support vector machine | |
CN108154158B (en) | Building image segmentation method for augmented reality application | |
CN102122353A (en) | Method for segmenting images by using increment dictionary learning and sparse representation | |
CN107944428A (en) | A kind of indoor scene semanteme marking method based on super-pixel collection | |
CN108537145A (en) | Human bodys' response method based on space-time skeleton character and depth belief network | |
CN107203745A (en) | A kind of across visual angle action identification method based on cross-domain study | |
CN110517270B (en) | Indoor scene semantic segmentation method based on super-pixel depth network | |
CN107767416A (en) | The recognition methods of pedestrian's direction in a kind of low-resolution image | |
CN109934095A (en) | A kind of remote sensing images Clean water withdraw method and system based on deep learning | |
CN105320963B (en) | The semi-supervised feature selection approach of large scale towards high score remote sensing images | |
CN101986295A (en) | Image clustering method based on manifold sparse coding | |
CN105574545B (en) | The semantic cutting method of street environment image various visual angles and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant | ||
TR01 | Transfer of patent right | ||
TR01 | Transfer of patent right |
Effective date of registration: 2021-02-18
Address after: No. 2, Lake Road, Kunming City, Yunnan Province, 650091
Patentee after: YUNNAN University; Tao Dapeng
Address before: No. 2, Lake Road, Kunming City, Yunnan Province, 650091
Patentee before: YUNNAN University