CN103605984B

CN103605984B - Indoor scene classification method based on hypergraph learning

Info

Publication number: CN103605984B
Application number: CN201310566625.XA
Authority: CN
Inventors: 俞俊; 王超杰
Original assignee: Xiamen University
Current assignee: Xiamen University
Priority date: 2013-11-14
Filing date: 2013-11-14
Publication date: 2016-08-24
Anticipated expiration: 2033-11-14
Also published as: CN103605984A

Abstract

An indoor scene classification method based on hypergraph learning, relating to the field of indoor scene classification. More than a hundred object detectors are used to extract objects from each image, and the resulting object descriptors are combined into a super-descriptor that serves as the feature descriptor of the image. A hypergraph is built over the image descriptors with the k-nearest-neighbor method, its Laplacian matrix is computed, and a semi-supervised learning framework is constructed. A linear regression model is built and incorporated into the semi-supervised learning framework. Using the constructed framework together with the extracted image feature descriptors, part of the images are labeled, and the framework then automatically and iteratively predicts the labels of the unlabeled images, thereby completing image classification; at the same time, the linear regression model is initialized during this iterative process. Using the linear regression model together with the extracted image feature descriptors, newly added data can be classified directly, without rebuilding the hypergraph.

Description

Indoor scene classification method based on hypergraph learning

Technical field

The present invention relates to indoor scene classification, and in particular to an indoor scene classification method based on hypergraph learning.

Background art

At present, indoor scene classification generally relies on low-level feature descriptors, which mainly capture information such as color, texture, and shape. These low-level descriptors work well for outdoor scene classification, but because indoor scenes contain many complex and overlapping kinds of objects, their performance on indoor scene classification is only mediocre. With the development of related techniques, several improved image feature descriptors have been introduced to improve image classification, such as spatial pyramid matching features ([1] S. Lazebnik, C. Schmid, and J. Ponce, "Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories," in Proc. IEEE Int. Conf. Computer Vision and Pattern Recognition, 2006, vol. 2, pp. 2169-2178) and global descriptors ([2] C. Siagian and L. Itti, "Rapid biologically-inspired scene classification using features shared with visual attention," IEEE Trans. Pattern Anal. Mach. Intell., vol. 29, no. 2, pp. 300-312, Feb. 2007). However, because these improved descriptors do not address the key problem of indoor scene images, they cannot significantly improve indoor scene classification. High-level feature descriptors that encode image semantics retain a large amount of semantic information and make it possible to recognize multiple objects in an indoor scene, so they play an important role in improving indoor scene classification.

Among high-level image descriptors, early studies proposed describing image information with a series of semantic image attributes, and these descriptions achieved good results in image retrieval and image classification. A Stanford University laboratory has also proposed a new super-descriptor, Object Bank ([3] L. Li, H. Su, E. Xing, and F. Li, "Object Bank: A High-Level Image Representation for Scene Classification and Semantic Feature Sparsification," Proceedings of the Neural Information Processing Systems (NIPS), 2010), to describe images; it is particularly effective for images of classes that contain complex objects, especially indoor images. However, these approaches still classify images with conventional fully supervised methods, which cannot take into account the relationship between the global attribute information of all the data and the local information, so their classification performance remains mediocre.

Summary of the invention

The object of the present invention is to provide an indoor scene classification method based on hypergraph learning.

The present invention comprises the following steps:

(1) Use more than a hundred object detectors to extract objects from the image, then combine the resulting object descriptors into a super-descriptor that serves as the feature descriptor of the image;

(2) Use the k-nearest-neighbor method to build a hypergraph over all generated image descriptors, compute its Laplacian matrix from the generated hypergraph, and then construct a semi-supervised learning framework;

(3) Build a linear regression model and incorporate it into the semi-supervised learning framework;

(4) Based on the semi-supervised learning framework constructed in step (3) and the image feature descriptors extracted in step (1), label part of the image descriptors so that the framework can automatically and iteratively predict the labels of the unlabeled images, thereby completing image classification; at the same time, the linear regression model of step (3) is initialized during this iterative process;

(5) Based on the linear regression model of step (3) and the image feature descriptors extracted in step (1), classify newly added data directly, without rebuilding the hypergraph.
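The patent does not spell out how the object descriptors of step (1) are combined. A minimal sketch, assuming each detector already yields a fixed-length response vector per image and that the super-descriptor is simply their concatenation (the function name `super_descriptor` and the toy inputs are hypothetical):

```python
import numpy as np


def super_descriptor(object_descriptors):
    """Concatenate per-detector descriptors into one image-level vector.

    object_descriptors: list of arrays, one per object detector
    (assumed fixed-length; the patent does not specify the detectors).
    """
    return np.concatenate([np.ravel(d) for d in object_descriptors])


# Three hypothetical detectors, each returning a 4-dimensional response.
descs = [np.full(4, float(i)) for i in range(3)]
x = super_descriptor(descs)
print(x.shape)  # (12,)
```

Stacking one such vector per image, row by row, gives the descriptor matrix X used in the later steps.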

In step (2), the semi-supervised learning framework may specifically be constructed as follows:

First compute the pairwise Euclidean distances between the extracted image feature descriptors, use them to build the hypergraph, and obtain its incidence matrix H:

H(\upsilon, e) = \begin{cases} 1, & \text{if } \upsilon \in e \\ 0, & \text{if } \upsilon \notin e \end{cases}

where υ denotes a vertex of the hypergraph and e denotes a hyperedge;
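A minimal NumPy sketch of one way to build H. It assumes the common k-nearest-neighbor hypergraph construction, in which every image spawns one hyperedge containing itself and its k nearest neighbors under Euclidean distance; the patent names the kNN method but not these details, and `knn_incidence` is a hypothetical helper.

```python
import numpy as np


def knn_incidence(X, k):
    """Incidence matrix H (vertices x hyperedges) of a kNN hypergraph.

    Row i of X is the feature descriptor of image i. Hyperedge j holds
    image j (its centroid) plus its k nearest neighbors.
    """
    n = X.shape[0]
    # Pairwise Euclidean distances between all descriptors.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    H = np.zeros((n, n))
    for j in range(n):
        order = np.argsort(dist[j])
        neighbors = order[order != j][:k]  # k nearest, excluding the centroid
        H[neighbors, j] = 1.0
        H[j, j] = 1.0                      # the centroid belongs to its own edge
    return H


X = np.array([[0.0], [1.0], [3.0], [10.0]])
H = knn_incidence(X, k=1)
print(H.astype(int))
```

In this construction every column sums to k + 1, so δ(e) = k + 1 for every hyperedge.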

Next, the weight w(e) of each hyperedge, the degree d(υ) of each vertex, and the degree δ(e) of each hyperedge are computed, and w(e), d(υ), and δ(e) are used as diagonal entries to form the corresponding diagonal matrices W, D_υ, and D_e. From these three diagonal matrices and the incidence matrix, the intermediate result Θ is computed as:

\Theta = D_{\upsilon}^{-1/2} H W D_{e}^{-1} H^{T} D_{\upsilon}^{-1/2}

Subtracting Θ from the identity matrix I then gives:

L=I-Θ
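The computation of Θ and L = I - Θ can be sketched in NumPy as follows. It assumes the standard definitions d(υ) = Σ_e w(e) H(υ, e) and δ(e) = Σ_υ H(υ, e), and unit hyperedge weights when none are given, since the patent does not say how w(e) is chosen; `hypergraph_laplacian` is a hypothetical name.

```python
import numpy as np


def hypergraph_laplacian(H, w=None):
    """Return L = I - Theta with Theta = Dv^-1/2 H W De^-1 H^T Dv^-1/2."""
    n, m = H.shape
    w = np.ones(m) if w is None else np.asarray(w, dtype=float)  # assumed unit weights
    dv = H @ w              # vertex degrees d(v): total weight of incident hyperedges
    de = H.sum(axis=0)      # hyperedge degrees delta(e): number of vertices in e
    Dv_inv_sqrt = np.diag(1.0 / np.sqrt(dv))
    Theta = Dv_inv_sqrt @ H @ np.diag(w) @ np.diag(1.0 / de) @ H.T @ Dv_inv_sqrt
    return np.eye(n) - Theta


# Toy hypergraph: 4 images, hyperedges {0, 1}, {0, 1, 2}, {2, 3}.
H = np.array([[1, 1, 0],
              [1, 1, 0],
              [0, 1, 1],
              [0, 0, 1]], dtype=float)
L = hypergraph_laplacian(H)
dv = H.sum(axis=1)  # vertex degrees under unit weights
# L is symmetric and maps the vector Dv^{1/2} * 1 to zero.
print(np.allclose(L, L.T), np.allclose(L @ np.sqrt(dv), 0.0))  # True True
```

The eigenvalues of Θ lie in [0, 1], so L is positive semidefinite, which is what lets f^T L f act as a smoothness penalty.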

The result L is the Laplacian matrix of the hypergraph. Based on this Laplacian matrix, the regularization term of the semi-supervised learning framework can be constructed as:

Ω(f) = f^T L f

where f is the label vector of the images to be predicted and f^T is its transpose. The semi-supervised framework is then constructed as follows:

\arg\min_{F} \operatorname{tr}(F^{T} L F) + \lambda \operatorname{tr}[(F - Y)^{T} (F - Y)]

where Y is the label matrix of the labeled images, tr denotes the matrix trace, and the non-negative parameter λ controls the trade-off between model complexity and the empirical loss. Solving this problem yields the predicted labels F for all data.
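The text only states that solving this problem yields F. Because the objective is a convex quadratic in F, one way is to set its gradient 2LF + 2λ(F - Y) to zero, which gives (L + λI)F = λY. A sketch under the assumptions that Y holds one-hot rows for labeled images and zero rows for unlabeled ones, and that each image is assigned the class with the largest score in its row of F (`predict_labels` is a hypothetical helper):

```python
import numpy as np


def predict_labels(L, Y, lam):
    """Minimize tr(F^T L F) + lam * tr((F - Y)^T (F - Y)) in closed form."""
    n = L.shape[0]
    # Stationarity: 2 L F + 2 lam (F - Y) = 0  =>  (L + lam I) F = lam Y
    F = np.linalg.solve(L + lam * np.eye(n), lam * Y)
    return F, F.argmax(axis=1)


# Toy hypergraph: hyperedges {0, 1, 2} and {3, 4, 5}, unit weights.
H = np.array([[1, 0]] * 3 + [[0, 1]] * 3, dtype=float)
dv, de = H.sum(axis=1), H.sum(axis=0)
Dv = np.diag(dv ** -0.5)
L = np.eye(6) - Dv @ H @ np.diag(1.0 / de) @ H.T @ Dv

Y = np.zeros((6, 2))
Y[0, 0] = 1.0   # image 0 labeled as class 0
Y[3, 1] = 1.0   # image 3 labeled as class 1; all other rows are unlabeled
F, labels = predict_labels(L, Y, lam=0.1)
print(labels)  # [0 0 0 1 1 1]
```

Using a linear solve instead of forming (L + λI)^{-1} explicitly is the usual numerically safer choice.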

In step (3), the linear regression model allows newly added data to be classified directly, without rebuilding the hypergraph. The linear regression model is:

g(x) = Q^T x + θ

where Q is the coefficient (first-order) parameter of the linear regression model and θ is the constant (bias) term. Embedding this linear regression model into the semi-supervised learning framework gives the new framework:

\arg\min_{F, Q} \operatorname{tr}(F^{T} L F) + \lambda \operatorname{tr}[(F - Y)^{T} (F - Y)] + \alpha \|XQ - F\|_{F}^{2} + \gamma \|Q\|_{F}^{2}

where X is the matrix whose rows are the feature descriptors of the images, and the non-negative regularization parameters α and γ control the trade-off between model complexity and the empirical loss;

Because the above objective is convex, its optimal solution can be found by setting the partial derivatives with respect to each parameter to zero. Let J denote the objective of this semi-supervised learning framework; setting the partial derivatives with respect to F and Q to zero gives:

\frac{\partial J}{\partial F} = 2 L F + 2 \lambda (F - Y) + 2 \alpha (F - XQ) = 0

\Rightarrow [L + (\lambda + \alpha) I] F = \lambda Y + \alpha X Q

\frac{\partial J}{\partial Q} = 2 \alpha X^{T} (XQ - F) + 2 \gamma Q = 0

\Rightarrow (\alpha X^{T} X + \gamma I) Q = \alpha X^{T} F

Substituting the Q obtained from the second equation into the first equation gives F:

F = \lambda (K - \alpha X M)^{-1} Y

where the intermediate result K denotes L + (λ + α)I and the intermediate result M denotes (αX^T X + γI)^{-1} αX^T. Substituting the resulting F into the equation obtained from the partial derivative with respect to Q gives:

Q = (\alpha X^{T} X + \gamma I)^{-1} \alpha X^{T} F = M F

The resulting Q is the parameter of the linear regression model. When new data arrive, no hypergraph needs to be built for them; their labels can be obtained directly from g(x) = Q^T x + θ.
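A NumPy sketch of this closed-form solution and of prediction with g(x). Setting ∂J/∂F to zero gives [L + (λ + α)I]F = λY + αXQ, so the sketch keeps the factor λ, i.e. F = λ(K - αXM)^{-1}Y. It assumes one-hot rows in Y for labeled images and zero rows otherwise, and, because the patent does not say how θ is obtained, it appends a constant feature 1 to every descriptor so that the last row of Q plays the role of θ (this bias is then also penalized by γ). The helper names and toy data are hypothetical.

```python
import numpy as np


def fit(L, X, Y, lam, alpha, gamma):
    """Closed-form F and Q from the two stationarity conditions."""
    n, d = X.shape
    K = L + (lam + alpha) * np.eye(n)
    # M = (alpha X^T X + gamma I)^{-1} alpha X^T, computed with a linear solve.
    M = np.linalg.solve(alpha * X.T @ X + gamma * np.eye(d), alpha * X.T)
    F = np.linalg.solve(K - alpha * X @ M, lam * Y)
    Q = M @ F
    return F, Q


def predict(Q, X_new):
    """g(x) = Q^T x for each row of X_new; return the arg-max class."""
    return np.argmax(X_new @ Q, axis=1)


def add_bias(P):
    """Append a constant 1 so the last row of Q acts as the bias theta."""
    return np.hstack([P, np.ones((P.shape[0], 1))])


# Toy data: two clusters of three images, one hyperedge per cluster.
pts = np.array([[0, 0], [1, 0], [0, 1], [6, 6], [5, 6], [6, 5]], dtype=float)
X = add_bias(pts)
H = np.array([[1, 0]] * 3 + [[0, 1]] * 3, dtype=float)
dv, de = H.sum(axis=1), H.sum(axis=0)
Dv = np.diag(dv ** -0.5)
L = np.eye(6) - Dv @ H @ np.diag(1.0 / de) @ H.T @ Dv
Y = np.zeros((6, 2))
Y[0, 0] = 1.0   # image 0 labeled as class 0
Y[3, 1] = 1.0   # image 3 labeled as class 1
lam, alpha, gamma = 1.0, 0.1, 0.01
F, Q = fit(L, X, Y, lam, alpha, gamma)
print(predict(Q, add_bias(np.array([[0.5, 0.5], [5.5, 5.5]]))))  # [0 1]
```

Once Q is stored, classifying a new image costs one matrix-vector product, with no change to the hypergraph.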

The present invention builds a hypergraph from the raw image data and uses a semi-supervised learning framework to predict the labels of unlabeled images. Because a hypergraph preserves richer information than an ordinary graph, and the semi-supervised learning framework considers not only the global attribute information of the data but also the local information between labeled and unlabeled data, the present invention achieves good results in indoor scene classification.

The invention has the following advantages: classifying indoor scenes with image descriptors that contain semantic information together with a semi-supervised learning framework effectively improves the accuracy of indoor scene classification. In addition, the trained linear regression model speeds up label prediction for new data. The present invention provides a new technical method for robot path selection and indoor video surveillance, effectively improving the efficiency of applications that rely on indoor scenes.

Brief description of the drawings

Fig. 1 is a flow chart of an embodiment of the present invention.

Fig. 2 compares the classification performance of the present invention with that of other classification methods. In Fig. 2, the horizontal axis is the labeling ratio of the training data (%) and the vertical axis is the classification accuracy (%); curve a denotes the hypergraph learning method of the present invention, curve b an ordinary graph method, curve c the k-nearest-neighbor method, curve d the Laplacian support vector machine, curve e the progressive transductive support vector machine, and curve f the ordinary support vector machine.

Fig. 3 shows the image label prediction results of the linear regression model used in the present invention. In Fig. 3, the horizontal axis is the labeling ratio of the training data (%) and the vertical axis is the classification accuracy (%); curves a, b, c, d, and e denote the parameter Q generated from 10%, 20%, 30%, 40%, and 50% of the training data, respectively.

Detailed description of the embodiments

The specific technical solution and implementation steps of the indoor scene classification method based on hypergraph learning proposed by the present invention are described below with reference to Fig. 1:

Step 1: Use more than a hundred object detectors to extract objects from the image, then combine the resulting object descriptors into a super-descriptor that serves as the feature descriptor of the image;

Step 2: Use the k-nearest-neighbor method to build a hypergraph over all generated image descriptors, compute its Laplacian matrix from the generated hypergraph, and then construct a semi-supervised learning framework;

Step 3: Build a linear regression model and incorporate it into the semi-supervised learning framework;

Step 4: Based on the semi-supervised learning framework constructed in step 3 and the image feature descriptors extracted in step 1, label part of the image descriptors so that the framework can automatically and iteratively predict the labels of the unlabeled images, thereby completing image classification. At the same time, the linear regression model of step 3 is initialized during this iterative process;

Step 5: Based on the linear regression model of step 3 and the image feature descriptors extracted in step 1, classify newly added data directly, without rebuilding the hypergraph.

To construct the semi-supervised learning framework mentioned in step 2, a hypergraph is first built from the extracted image feature descriptors, and its incidence matrix H is computed:

H(\upsilon, e) = \begin{cases} 1, & \text{if } \upsilon \in e \\ 0, & \text{if } \upsilon \notin e \end{cases}

where υ denotes a vertex of the hypergraph and e denotes a hyperedge. Next, the weight w(e) of each hyperedge, the degree d(υ) of each vertex, and the degree δ(e) of each hyperedge are computed, and w(e), d(υ), and δ(e) are used as diagonal entries to form the corresponding diagonal matrices W, D_υ, and D_e. From these three diagonal matrices and the incidence matrix, the intermediate result Θ is computed as:

\Theta = D_{\upsilon}^{-1/2} H W D_{e}^{-1} H^{T} D_{\upsilon}^{-1/2}

Subtracting Θ from the identity matrix I then gives:

L=I-Θ

The result L is the Laplacian matrix of the hypergraph. Based on this Laplacian matrix, the regularization term of the semi-supervised learning framework can be constructed as:

Ω(f) = f^T L f

where f is the label vector of the images to be predicted and f^T is its transpose. The semi-supervised framework is then constructed as follows:

\arg\min_{F} \operatorname{tr}(F^{T} L F) + \lambda \operatorname{tr}[(F - Y)^{T} (F - Y)]

where Y is the label matrix of the labeled images, tr denotes the matrix trace, and the non-negative parameter λ controls the trade-off between model complexity and the empirical loss. Solving this problem yields the predicted labels F for all data.

The linear regression model mentioned in step 3 allows newly added data to be classified directly, without rebuilding the hypergraph. The linear regression model is:

g(x) = Q^T x + θ

where Q is the coefficient (first-order) parameter of the linear regression model and θ is the constant (bias) term. Embedding this linear model into the semi-supervised learning framework gives the new framework:

\arg\min_{F, Q} \operatorname{tr}(F^{T} L F) + \lambda \operatorname{tr}[(F - Y)^{T} (F - Y)] + \alpha \|XQ - F\|_{F}^{2} + \gamma \|Q\|_{F}^{2}

where X is the matrix whose rows are the feature descriptors of the images, and the non-negative regularization parameters α and γ control the trade-off between model complexity and the empirical loss.

Because the above objective is convex, its optimal solution can be found by setting the partial derivatives with respect to each parameter to zero. Let J denote the objective of this semi-supervised learning framework; setting the partial derivatives with respect to F and Q to zero gives:

\frac{\partial J}{\partial F} = 2 L F + 2 \lambda (F - Y) + 2 \alpha (F - XQ) = 0

\Rightarrow [L + (\lambda + \alpha) I] F = \lambda Y + \alpha X Q

\frac{\partial J}{\partial Q} = 2 \alpha X^{T} (XQ - F) + 2 \gamma Q = 0

\Rightarrow (\alpha X^{T} X + \gamma I) Q = \alpha X^{T} F

Substituting Q from the second equation into the first, with K = L + (λ + α)I and M = (αX^T X + γI)^{-1} αX^T, gives:

F = \lambda (K - \alpha X M)^{-1} Y

Q = (\alpha X^{T} X + \gamma I)^{-1} \alpha X^{T} F = M F

The resulting Q is the parameter of the linear regression model. When new data arrive, no hypergraph needs to be built for them; their labels can be obtained directly from g(x) = Q^T x + θ.
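To double-check the partial derivatives ∂J/∂F and ∂J/∂Q, one can compare them with central finite differences on random data. A minimal NumPy sketch; the random matrices are arbitrary test inputs, and L only needs to be symmetric here (as the hypergraph Laplacian is) for tr(F^T L F) to have gradient 2LF:

```python
import numpy as np


def objective(F, Q, L, X, Y, lam, alpha, gamma):
    """J(F, Q) = tr(F^T L F) + lam ||F - Y||^2 + alpha ||XQ - F||^2 + gamma ||Q||^2."""
    return (np.trace(F.T @ L @ F) + lam * np.sum((F - Y) ** 2)
            + alpha * np.sum((X @ Q - F) ** 2) + gamma * np.sum(Q ** 2))


def analytic_grads(F, Q, L, X, Y, lam, alpha, gamma):
    """Partial derivatives of J as derived in the text (2LF for symmetric L)."""
    dF = 2 * L @ F + 2 * lam * (F - Y) + 2 * alpha * (F - X @ Q)
    dQ = 2 * alpha * X.T @ (X @ Q - F) + 2 * gamma * Q
    return dF, dQ


def numeric_grad(fun, A, eps=1e-6):
    """Central finite-difference gradient of a scalar function of matrix A."""
    G = np.zeros_like(A)
    for idx in np.ndindex(A.shape):
        E = np.zeros_like(A)
        E[idx] = eps
        G[idx] = (fun(A + E) - fun(A - E)) / (2 * eps)
    return G


rng = np.random.default_rng(0)
n, d, c = 5, 3, 2
B = rng.standard_normal((n, n))
L = B @ B.T                      # arbitrary symmetric matrix
X = rng.standard_normal((n, d))
Y = rng.standard_normal((n, c))
F = rng.standard_normal((n, c))
Q = rng.standard_normal((d, c))
lam, alpha, gamma = 0.5, 2.0, 0.1

dF, dQ = analytic_grads(F, Q, L, X, Y, lam, alpha, gamma)
nF = numeric_grad(lambda A: objective(A, Q, L, X, Y, lam, alpha, gamma), F)
nQ = numeric_grad(lambda A: objective(F, A, L, X, Y, lam, alpha, gamma), Q)
print(np.allclose(dF, nF, atol=1e-5), np.allclose(dQ, nQ, atol=1e-5))  # True True
```

Because J is quadratic, central differences are exact up to rounding, so any mismatch would point to an error in the derived formulas.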

Claims

1. An indoor scene classification method based on hypergraph learning, characterized by comprising the following steps:

(2) using the k-nearest-neighbor method to build a hypergraph over all generated image descriptors, computing its Laplacian matrix from the generated hypergraph, and then constructing a semi-supervised learning framework; the semi-supervised learning framework is specifically constructed as follows:

H(v, e) = \begin{cases} 1, & \text{if } v \in e \\ 0, & \text{if } v \notin e \end{cases}

where v denotes a vertex of the hypergraph and e denotes a hyperedge;

next, the weight w(e) of each hyperedge, the degree d(v) of each vertex, and the degree δ(e) of each hyperedge are computed, and w(e), d(v), and δ(e) are used as diagonal entries to form the corresponding diagonal matrices W, D_v, and D_e; from these three diagonal matrices and the incidence matrix, the intermediate result Θ is computed as:

\Theta = D_{v}^{-1/2} H W D_{e}^{-1} H^{T} D_{v}^{-1/2}

subtracting Θ from the identity matrix I then gives:

L=I-Θ

Ω(f) = f^T L f

\arg\min_{F} \operatorname{tr}(F^{T} L F) + \lambda \operatorname{tr}[(F - Y)^{T} (F - Y)]

where Y is the label matrix of the labeled images, tr denotes the matrix trace, and the non-negative parameter λ controls the trade-off between model complexity and the empirical loss; solving this problem yields the predicted labels F for all data;

2. The indoor scene classification method based on hypergraph learning according to claim 1, characterized in that, in step (3), the linear regression model allows newly added data to be classified directly, without rebuilding the hypergraph; the linear regression model is:

g(x) = Q^T x + θ

\arg\min_{F, Q} \operatorname{tr}(F^{T} L F) + \lambda \operatorname{tr}[(F - Y)^{T} (F - Y)] + \alpha \|XQ - F\|_{F}^{2} + \gamma \|Q\|_{F}^{2}

\frac{\partial J}{\partial F} = 2 L F + 2 \lambda (F - Y) + 2 \alpha (F - XQ) \;\Rightarrow\; [L + (\lambda + \alpha) I] F = \lambda Y + \alpha X Q

\frac{\partial J}{\partial Q} = 2 \alpha X^{T} (XQ - F) + 2 \gamma Q \;\Rightarrow\; (\alpha X^{T} X + \gamma I) Q = \alpha X^{T} F

F = \lambda (K - \alpha X M)^{-1} Y

Q = (\alpha X^{T} X + \gamma I)^{-1} \alpha X^{T} F = M F