CN102867195B - Method for detecting and identifying a plurality of types of objects in remote sensing image - Google Patents


Publication number
CN102867195B
Authority
CN
China
Prior art keywords
image
matrix
target
class
subimage block
Prior art date
Legal status
Active
Application number
CN201210300645.8A
Other languages
Chinese (zh)
Other versions
CN102867195A (en)
Inventor
韩军伟
周培诚
王东阳
郭雷
程塨
李晖晖
Current Assignee
Northwestern Polytechnical University
Original Assignee
Northwestern Polytechnical University
Priority date
Filing date
Publication date
Application filed by Northwestern Polytechnical University filed Critical Northwestern Polytechnical University
Priority to CN201210300645.8A priority Critical patent/CN102867195B/en
Publication of CN102867195A publication Critical patent/CN102867195A/en
Application granted granted Critical
Publication of CN102867195B publication Critical patent/CN102867195B/en

Landscapes

  • Image Analysis (AREA)

Abstract

The invention relates to a method for detecting and identifying multiple classes of targets in remote sensing images based on sparse-representation dictionary learning. The method comprises the following steps: training a dictionary from preprocessed training data with a sparse-representation dictionary-learning method; sparsely coding each sub-image block of a test image over the trained dictionary, computing its sparse-representation coefficients and hence its reconstruction error, and determining candidate target regions by thresholding the reconstruction error; and accurately detecting and identifying multiple classes of targets in the remote sensing image through post-processing. The method can detect and identify multiple target classes in remote sensing images with complex backgrounds, achieving high detection and identification accuracy and a low false-alarm rate.

Description

A method for detecting and identifying multiple classes of targets in remote sensing images
Technical field
The present invention relates to a method for detecting and identifying multiple classes of targets in remote sensing images, applicable to the detection and identification of multiple target types in remote sensing images with complex backgrounds.
Background technology
As an application of remote sensing image processing technology, target detection and recognition in remote sensing images with complex backgrounds is a key technique in fields such as military reconnaissance and precision strike. It has long been both a research hotspot and a difficulty in the field, holds important military and civilian value, and is receiving growing attention.
There are currently two main approaches to target detection in remote sensing images. The first detects targets by the shape and geometric features they exhibit in the image; however, because remote sensing backgrounds are complex and contain many shapes and geometric structures similar to the targets, relying on these features alone produces large numbers of missed and false detections. The second is based on classification. The most common example is the Bag-of-Words (BoW) method: SIFT features are first extracted from the image and clustered, the cluster centres serve as a set of standard bases (standard image regions) in the image space, the image is represented as a vector over this basis, and the resulting vector is classified and thresholded with an SVM classifier to obtain the detection result. Although the extracted SIFT features are scale- and rotation-invariant, BoW uses only the statistics of the feature regions and ignores their spatial information, so its detection rate is low and its false-alarm rate high. Another classification method, Linear Spatial Pyramid Matching Using Sparse Coding (ScSPM), does take the spatial information of the feature regions into account, but the resulting classification vectors are of very high dimension and the computational cost is excessive. Moreover, most classification-based detection methods are limited to a single target type and cannot detect and identify multiple types of targets simultaneously.
Summary of the invention
Technical problem solved
To overcome the deficiencies of the prior art, the present invention proposes a method for detecting and identifying multiple classes of targets in remote sensing images based on sparse-representation dictionary learning. The method automatically detects and identifies different types of targets in remote sensing images with complex backgrounds, with high detection accuracy and a low false-alarm rate.
Technical scheme
A method for detecting and identifying multiple classes of targets in remote sensing images, characterized by the following steps:
Step 1: train a dictionary using the sparse-representation dictionary-learning method, as follows:
Step a1, early-stage processing of training images: first align all targets of the same class in the original images to a common principal direction; then rotate each aligned image from 0° to 360° in steps of θ, yielding ⌊360°/θ⌋ images of different orientations. Processing the original images of all target classes in this way gives c classes of training images, where p is the number of distinct target classes to be detected, θ is the rotation angle, c is the total number of classes of different targets in different orientations in the resulting training set, and ⌊·⌋ denotes rounding down;
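As a hypothetical sketch (the helper names and the per-type breakdown are ours, not the patent's), the relationship between the rotation step θ and the number of orientation classes can be written as:

```python
import math

def orientation_classes(angle_step_deg):
    """Number of orientation classes produced by rotating one aligned
    target image from 0 to 360 degrees in steps of `angle_step_deg`
    (the floor accounts for steps that do not divide 360 evenly)."""
    return math.floor(360.0 / angle_step_deg)

def total_classes(per_type_counts):
    """Total class count c over all target types, given each type's
    orientation-class count (types may contribute different counts,
    e.g. a rotationally symmetric target needs fewer orientations)."""
    return sum(per_type_counts)
```

With a 10° step one target type contributes 36 orientation classes; the embodiment's c = 55 then corresponds to 36 + 18 + 1 classes for aircraft, ships and oil depots.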
Step b1, data preprocessing: convert the RGB components of each of the c classes of training images to a gray-level image by weighted averaging; down-sample each gray-level image to size n × n; apply energy normalization to obtain a normalized image; convert the normalized image to a column vector of dimension n² × 1; and use each such column vector as one column of the training data, giving the preprocessed training data set U = [U_1, U_2, ..., U_c], where U_i is the sub-data set of the i-th class in U, i = 1, 2, ..., c;
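A minimal preprocessing sketch in plain Python (nested lists stand in for images, down-sampling is omitted for brevity, the function names are ours, and the l2-style energy normalization is our reading of the formula given later):

```python
import math

def to_gray(rgb_img):
    # weighted-mean grayscale: f = 0.3*R + 0.59*G + 0.11*B
    return [[0.3 * r + 0.59 * g + 0.11 * b for (r, g, b) in row]
            for row in rgb_img]

def energy_normalize(gray):
    # divide every pixel by the square root of the total energy sum f^2
    energy = math.sqrt(sum(p * p for row in gray for p in row))
    return [[p / energy for p in row] for row in gray]

def to_column(gray):
    # flatten an n x n image into an n^2-element column vector
    return [p for row in gray for p in row]
```

Each resulting column vector would then become one column of the training data set U.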
Step c1, dictionary training: train the known training data set U = [U_1, U_2, ..., U_c] with the FDDL software package released with "Fisher Discrimination Dictionary Learning for Sparse Representation", obtaining the dictionary D = [D_1, D_2, ..., D_c], where D_i is the sub-dictionary corresponding to the i-th class;
Step 2, sparse coding: using the trained dictionary D = [D_1, D_2, ..., D_c], sparsely code each sub-image block of the test image to obtain its sparse coefficients, as follows:
Step a2, test-image preprocessing: first convert the test image to a gray-level test image using the weighted-averaging method of step b1; then slide a window of size S × S along the test gray-level image with step b to obtain sub-image blocks. Down-sample each sub-image block to size n × n, apply energy normalization, and convert the normalized result to a column vector β of dimension n² × 1; β thus represents the gray-level pixel information of the sub-image block obtained by the sliding window;
Step b2, sparse coding: for each sub-image block, solve the optimization model

$$\hat{\alpha} = \arg\min_{\alpha} \|\alpha\|_1 \quad \text{s.t.} \quad \|\beta - D\alpha\|_2^2 \le \epsilon$$

to obtain the sparse coding coefficients $\hat{\alpha}$ of the block, where $\hat{\alpha}_i$ is the coefficient vector corresponding to sub-dictionary $D_i$, $\epsilon > 0$ is the allowable error, $\|\cdot\|_1$ is the $l_1$ norm, and $\|\cdot\|_2$ is the $l_2$ norm;
Step c2, reconstruction error: from the sparse coding coefficients $\hat{\alpha}$, compute the reconstruction error $e_i$ of the sub-image block with respect to each class i; take $e = \min_i\{e_i\}$ as the reconstruction error of the block and record the corresponding class C. Then decide whether the block contains a target by comparing e with a predefined threshold τ: if e < τ the block contains a target; otherwise the block is background;
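A sketch of the class-wise error and the threshold decision, under the error formula $e_i = \|\beta - D_i \hat{\alpha}_i\|_2^2 + \gamma \|\hat{\alpha} - m_i\|_2^2$ given later in the text (the helper names are ours, and the l1 solver itself is outside the scope of this sketch):

```python
def recon_error(beta, D_i, alpha_i, alpha_full, m_i, gamma):
    # e_i = ||beta - D_i @ alpha_i||_2^2 + gamma * ||alpha_full - m_i||_2^2
    rows, cols = len(D_i), len(alpha_i)
    residual = [beta[r] - sum(D_i[r][k] * alpha_i[k] for k in range(cols))
                for r in range(rows)]
    fit = sum(v * v for v in residual)
    reg = sum((a - m) ** 2 for a, m in zip(alpha_full, m_i))
    return fit + gamma * reg

def classify_block(errors, tau):
    # e = min_i e_i; return the class index if e < tau, else None (background)
    e = min(errors)
    i = errors.index(e)
    return (i, e) if e < tau else (None, e)
```

For example, a block whose smallest class error is 0.2 against a threshold τ = 0.3 would be declared a target of that class.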
Step 3, target detection and recognition:
Step a3: from the reconstruction error e of each sub-image block judged in step c2 to contain a target, form a reconstruction-error matrix $E = (e_{st})_{P \times Q}$ of the same size as the test gray-level image to represent the candidate target regions, where $e_{st}$ is the value of the matrix at coordinate (s, t),

$$e_{st} = \begin{cases} 0 & e \ge \tau \\ e & e < \tau \end{cases}$$

P × Q is the size of the test image, s = 1, 2, ..., P, t = 1, 2, ..., Q;

From the class C of each sub-image block judged in step c2 to contain a target, form a class matrix $L = (C_{st})_{P \times Q}$ of the same size as the test gray-level image to represent the candidate target classes, where $C_{st}$ is the value of the class matrix at coordinate (s, t),

$$C_{st} = \begin{cases} 0 & e \ge \tau \\ C & e < \tau \end{cases}$$
Step b3: change the size of the sliding window S × S a total of G times and repeat step 2 through step a3 G times, obtaining G reconstruction-error matrices and G class matrices; the value of G ranges from 5 to 10. Stack the G reconstruction-error matrices into a multi-scale reconstruction-error matrix $MAP = (e_{stg})_{P \times Q \times G}$, where $e_{stg}$, an element of MAP, equals the $e_{st}$ of the reconstruction-error matrix obtained at the g-th change of the sliding-window size; P × Q × G is the size of the multi-scale matrix, g = 1, 2, ..., G;

Stack the G class matrices into a multi-scale class matrix $CLASS = (C_{stg})_{P \times Q \times G}$, where $C_{stg}$, an element of CLASS, equals the $C_{st}$ of the class matrix obtained at the g-th change of the sliding-window size. From MAP obtain the minimum-reconstruction-error matrix $(map(s,t))_{P \times Q}$, where map(s, t) is its value at coordinate (s, t);

then obtain the corresponding minimum class matrix $(class(s,t))_{P \times Q}$, where class(s, t) is its value at coordinate (s, t);

and from MAP obtain the scale matrix $scale = (scale(s,t))_{P \times Q}$, where scale(s, t) is its value at coordinate (s, t),

$$scale(s,t) = \begin{cases} 0 & e_{st} = 0 \\ \arg\min_g \{e_{stg}\} & e_{st} \ne 0 \end{cases}$$
Step c3: take the local-neighbourhood minima of the minimum-reconstruction-error matrix $(map(s,t))_{P \times Q}$ as the detected target responses; the coordinates of each local minimum in $(map(s,t))_{P \times Q}$ give the centre of a target, and the corresponding positions in $(class(s,t))_{P \times Q}$ and $(scale(s,t))_{P \times Q}$ give the target's class and scale.
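The multi-scale minimum and the local-minimum search of steps b3 and c3 could be sketched as follows (plain Python over nested lists; mapping background cells, which the patent stores as 0, to +inf so they do not masquerade as minima is an assumption on our part):

```python
def multiscale_min(maps):
    # per-pixel minimum over G error maps, plus the scale index achieving it
    P, Q, G = len(maps[0]), len(maps[0][0]), len(maps)
    best = [[min(maps[g][s][t] for g in range(G)) for t in range(Q)]
            for s in range(P)]
    scale = [[min(range(G), key=lambda g: maps[g][s][t]) for t in range(Q)]
             for s in range(P)]
    return best, scale

def local_minima(mat):
    # strict minima over the 8-neighbourhood: candidate target centres
    P, Q = len(mat), len(mat[0])
    centres = []
    for s in range(1, P - 1):
        for t in range(1, Q - 1):
            v = mat[s][t]
            neighbours = (mat[s + ds][t + dt]
                          for ds in (-1, 0, 1) for dt in (-1, 0, 1)
                          if (ds, dt) != (0, 0))
            if all(v < n for n in neighbours):
                centres.append((s, t))
    return centres
```

Each centre found by `local_minima` would then be looked up in the class and scale matrices at the same coordinates.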
The weighted-averaging formula is f(x, y) = 0.3R(x, y) + 0.59G(x, y) + 0.11B(x, y), where f(x, y) is the gray value of the resulting gray-level image at pixel (x, y), and R(x, y), G(x, y) and B(x, y) are the R, G and B components of the input training image at pixel (x, y).
The energy-normalization formula is

$$f_{norm}(x,y) = \frac{f(x,y)}{\sqrt{\sum_{x=1}^{u}\sum_{y=1}^{v}[f(x,y)]^2}}$$

where $f_{norm}(x,y)$ is the gray value of f(x, y) after energy normalization, and u and v are the numbers of rows and columns of the gray-level image.
The $l_1$ norm is computed as

$$\|z\|_1 = \sum_{k=1}^{M} |\xi_k|$$

where z is a vector of size M × 1, $\xi_k$ is an element of z, and k = 1, 2, ..., M.
The $l_2$ norm is computed as

$$\|z\|_2 = \left(\sum_{k=1}^{M} |\xi_k|^2\right)^{1/2}$$

where z is a vector of size M × 1, $\xi_k$ is an element of z, and k = 1, 2, ..., M.
The reconstruction error $e_i$ is computed as

$$e_i = \|\beta - D_i \hat{\alpha}_i\|_2^2 + \gamma \|\hat{\alpha} - m_i\|_2^2$$

where γ is a predefined weight with value in the range 0 to 1, $m_i$ is the mean vector obtained by averaging the elements of each row of $Y_i$, and $Y_i$ is the optimal coding coefficient matrix of $U_i$ under sparse coding with dictionary D.
The rotation angle θ ranges from 0° to 90°.
The FDDL parameter λ_1 ranges from 0.001 to 0.01 and λ_2 from 0.01 to 0.1.
S is an integer between 40 and 90, and b is an integer between 1 and 15.
The threshold τ ranges from 0 to 1.
Beneficial effect
In the proposed method for detecting and identifying multiple classes of targets in remote sensing images based on sparse-representation dictionary learning, a redundant dictionary is first trained from preprocessed training data; the trained dictionary is then used to sparsely code the sub-image blocks of the test image, giving their sparse-representation coefficients, from which the reconstruction error of each block is obtained; thresholding the reconstruction error determines the candidate target regions; and post-processing finally achieves accurate detection and identification of multiple target classes in the remote sensing image.
The present invention automatically detects and identifies targets of several classes in remote sensing images with complex backgrounds. Experiments show that the method achieves high detection and identification accuracy with a low false-alarm rate.
Brief description of the drawings
Fig. 1: basic flow chart of the method of the invention
Fig. 2: training data used in the method of the invention
Fig. 3: partial detection results of the method of the invention
(a) aircraft detection results (red boxes mark aircraft targets; yellow boxes mark false alarms)
(b) ship detection results (white boxes mark ship targets)
(c) oil-depot detection results (blue boxes mark oil-depot targets)
(d) aircraft and ship detection results
(e) aircraft and oil-depot detection results
(f) ship and oil-depot detection results
Embodiment
The invention is now further described in conjunction with an embodiment and the accompanying drawings:
The hardware environment was a computer with an Intel Pentium 2.93 GHz CPU and 2.0 GB of memory; the software environment was Matlab R2011a on Windows XP. One hundred remote sensing images obtained from Google Earth were used for the multi-class target detection experiments, covering three target types: aircraft (200 instances), ships (120 instances) and oil depots (420 instances).
The present invention is specifically implemented as follows:
1. Training the redundant dictionary: train the dictionary using the sparse-representation dictionary-learning method, as follows:
(1.1) Early-stage processing of training images: first align all targets of the same class in the original images to a common principal direction; then rotate the aligned images from 0° to 360° once every 10°, obtaining 36 orientation classes. The original images of each target class are processed in this way, finally yielding 55 classes of training images, i.e. c = 55: 36 classes for aircraft, 18 for ships, and 1 for oil depots;
(1.2) Data preprocessing: convert the RGB components of the 55 classes of training images to gray-level images by weighted averaging; down-sample the gray-level images to size 15 × 15; apply energy normalization to obtain normalized images; convert each normalized image to a column vector of dimension 225 × 1; and use each column vector as one column of the training data, giving the preprocessed training data set U = [U_1, U_2, ..., U_c], where U_i is the sub-data set of the i-th class in U, i = 1, 2, ..., c;
(1.3) Train the known training data set U = [U_1, U_2, ..., U_c] with the FDDL software package released by Lei Zhang, obtaining the dictionary D = [D_1, D_2, ..., D_c], where D_i is the sub-dictionary of the i-th class; the package parameters are λ_1 = 0.005 and λ_2 = 0.05.
The FDDL software package of Lei Zhang is described in: Meng Yang, Lei Zhang, Xiangchu Feng, David Zhang. Fisher Discrimination Dictionary Learning for Sparse Representation [C]. ICCV, 2011.
2. Sparse coding: using the trained dictionary D = [D_1, D_2, ..., D_c], sparsely code each sub-image block of the test image to obtain its sparse coefficients, as follows:
(2.1) Test-image preprocessing: first convert the test image to a gray-level test image using the weighted-averaging method of (1.2); then slide a window of size S × S (initial value S = 90) along the test gray-level image with a step of 5 pixels to obtain sub-image blocks. Down-sample each sub-image block to size 15 × 15, apply energy normalization, and convert the normalized result to a column vector β of dimension 225 × 1; β thus represents the gray-level pixel information of the sub-image block obtained by the sliding window;
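The sliding-window scan of (2.1) amounts to enumerating window origins, which could be sketched as follows (the function name is ours):

```python
def window_origins(height, width, S, step):
    # top-left corners of every S x S window that fits in the image,
    # sliding with the given step in both directions
    return [(y, x)
            for y in range(0, height - S + 1, step)
            for x in range(0, width - S + 1, step)]
```

For a 100 × 100 image with S = 90 and a 5-pixel step this yields 3 × 3 = 9 sub-image blocks.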
(2.2) Sparse coding: solve for each sub-image block the optimization model

$$\hat{\alpha} = \arg\min_{\alpha} \|\alpha\|_1 \quad \text{s.t.} \quad \|\beta - D\alpha\|_2^2 \le \epsilon$$

to obtain its sparse coding coefficient vector $\hat{\alpha}$, where $\hat{\alpha}_i$ is the coefficient vector corresponding to sub-dictionary $D_i$, the allowable error is ε = 0.15, $\|\cdot\|_1$ is the $l_1$ norm, and $\|\cdot\|_2$ is the $l_2$ norm;
(2.3) Reconstruction error: from the sparse coding coefficients $\hat{\alpha}$, compute the reconstruction error $e_i$ of the sub-image block with respect to each class, with weight γ = 0.5; take $e = \min_i\{e_i\}$ as the reconstruction error of the block and record the corresponding class C. Then decide whether the block contains a target by comparing e with the predefined threshold τ = 0.3: if e < τ the block contains a target; otherwise the block is background;
3. Target detection and recognition:
(3.1) From the reconstruction error e of each sub-image block judged in (2.3) to contain a target, form a reconstruction-error matrix $E = (e_{st})_{P \times Q}$ of the same size as the test gray-level image to represent the candidate target regions, where $e_{st}$ is the value of the matrix at coordinate (s, t),

$$e_{st} = \begin{cases} 0 & e \ge \tau \\ e & e < \tau \end{cases}$$

P × Q is the size of the test image, s = 1, 2, ..., P, t = 1, 2, ..., Q. From the class of each sub-image block judged in (2.3) to contain a target, form a class matrix $L = (C_{st})_{P \times Q}$ of the same size as the test gray-level image to represent the candidate target classes, where $C_{st}$ is its value at coordinate (s, t),

$$C_{st} = \begin{cases} 0 & e \ge \tau \\ C & e < \tau \end{cases}$$
(3.2) Change the size of the sliding window S × S to S = 90 − 10 × j, j = 1, 2, ..., G, where j counts the changes; repeat step 2 and step (3.1) a total of G times, obtaining G reconstruction-error matrices and G class matrices. Stack the G reconstruction-error matrices into a multi-scale reconstruction-error matrix $MAP = (e_{stg})_{P \times Q \times G}$, where $e_{stg}$, an element of MAP, equals the $e_{st}$ of the reconstruction-error matrix obtained at the g-th change of the sliding-window size; P × Q × G is the size of the multi-scale matrix, g = 1, 2, ..., G. Stack the G class matrices into a multi-scale class matrix $CLASS = (C_{stg})_{P \times Q \times G}$, where $C_{stg}$ equals the $C_{st}$ of the class matrix obtained at the g-th change of the sliding-window size. From MAP obtain the minimum-reconstruction-error matrix $(map(s,t))_{P \times Q}$, where map(s, t) is its value at coordinate (s, t); then obtain the corresponding minimum class matrix $(class(s,t))_{P \times Q}$, where class(s, t) is its value at coordinate (s, t); and from MAP obtain the scale matrix $(scale(s,t))_{P \times Q}$, where scale(s, t) is its value at coordinate (s, t),

$$scale(s,t) = \begin{cases} 0 & e_{st} = 0 \\ \arg\min_g \{e_{stg}\} & e_{st} \ne 0 \end{cases}$$
(3.3) Take the local-neighbourhood minima of the minimum-reconstruction-error matrix $(map(s,t))_{P \times Q}$ as the detected target responses; the coordinates of each local minimum in $(map(s,t))_{P \times Q}$ give the centre of a target, and the corresponding positions in $(class(s,t))_{P \times Q}$ and $(scale(s,t))_{P \times Q}$ give the target's class and scale.
The weighted-averaging formula is

f(x, y) = 0.3R(x, y) + 0.59G(x, y) + 0.11B(x, y)

where f(x, y) is the gray value of the resulting gray-level image at pixel (x, y), and R(x, y), G(x, y) and B(x, y) are the R, G and B components of the input training image at pixel (x, y).
The energy-normalization formula is

$$f_{norm}(x,y) = \frac{f(x,y)}{\sqrt{\sum_{x=1}^{u}\sum_{y=1}^{v}[f(x,y)]^2}}$$

where $f_{norm}(x,y)$ is the gray value of f(x, y) after energy normalization, and u and v are the numbers of rows and columns of the gray-level image; here u = 15, v = 15.
The $l_1$ norm is computed as

$$\|z\|_1 = \sum_{k=1}^{M} |\xi_k|$$

where z is a vector of size M × 1, $\xi_k$ is an element of z, and k = 1, 2, ..., M.
The $l_2$ norm is computed as

$$\|z\|_2 = \left(\sum_{k=1}^{M} |\xi_k|^2\right)^{1/2}$$
The reconstruction error $e_i$ is computed as

$$e_i = \|\beta - D_i \hat{\alpha}_i\|_2^2 + \gamma \|\hat{\alpha} - m_i\|_2^2$$

where γ is a predefined weight, γ = 0.5, $m_i$ is the mean vector obtained by averaging the elements of each row of $Y_i$, and $Y_i$ is the optimal coding coefficient matrix of $U_i$ under sparse coding with dictionary D.
The correct detection rate and the false-alarm rate were chosen to assess the validity of the invention. The correct detection rate is defined as the ratio of the number of correctly detected targets to the total number of targets; the false-alarm rate is defined as the ratio of the number of false alarms to the sum of the number of correctly detected targets and the number of false alarms. The detection results of the invention were also compared with those of a BoW-based multi-class target detection algorithm; the comparison is shown in Table 1. Both the correct detection rate and the false-alarm rate demonstrate the validity of the method.
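Under the definitions just given, the two evaluation measures can be computed as follows (a sketch; the function name and the example counts are ours, not results from the patent):

```python
def evaluation(correct_detections, total_targets, false_alarms):
    # correct detection rate = correctly detected targets / total targets
    # false alarm rate = false alarms / (correct detections + false alarms)
    detection_rate = correct_detections / total_targets
    false_alarm_rate = false_alarms / (correct_detections + false_alarms)
    return detection_rate, false_alarm_rate
```

For instance, hypothetically finding 180 of 200 targets with 20 false alarms would give a detection rate of 0.9 and a false-alarm rate of 0.1.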
Table 1 evaluation

Claims (5)

1. A method for detecting and identifying multiple classes of targets in remote sensing images, characterized by the following steps:
Step 1: train a dictionary using the sparse-representation dictionary-learning method, as follows:
Step a1, early-stage processing of training images: first align all targets of the same class in the original images to a common principal direction; then rotate each aligned image from 0° to 360° in steps of θ, yielding ⌊360°/θ⌋ images of different orientations; processing the original images of all target classes in this way gives c classes of training images, where p is the number of distinct target classes to be detected, θ is the rotation angle, c is the total number of classes of different targets in different orientations in the resulting training set, and ⌊·⌋ denotes rounding down;
Step b1, data preprocessing: convert the RGB components of the c classes of training images to gray-level images by weighted averaging, the formula being f(x, y) = 0.3R(x, y) + 0.59G(x, y) + 0.11B(x, y), where f(x, y) is the gray value of the resulting gray-level image at pixel (x, y), and R(x, y), G(x, y) and B(x, y) are the R, G and B components of the input training image at pixel (x, y); then down-sample the gray-level images to size n × n and apply energy normalization, the formula being $f_{norm}(x,y) = f(x,y)/\sqrt{\sum_{x=1}^{u}\sum_{y=1}^{v}[f(x,y)]^2}$, where $f_{norm}(x,y)$ is the gray value of f(x, y) after energy normalization, and u and v are the numbers of rows and columns of the gray-level image; convert each normalized image to a column vector of dimension n² × 1 and use it as one column of the training data, giving the preprocessed training data set U = [U_1, U_2, ..., U_c], where U_i is the sub-data set of the i-th class in U, i = 1, 2, ..., c;
Step c1, dictionary training: train the known training data set U = [U_1, U_2, ..., U_c] with the FDDL software package released with "Fisher Discrimination Dictionary Learning for Sparse Representation", obtaining the dictionary D = [D_1, D_2, ..., D_c], where D_i is the sub-dictionary corresponding to the i-th class;
Step 2, sparse coding: using the trained dictionary D = [D_1, D_2, ..., D_c], sparsely code each sub-image block of the test image to obtain its sparse coefficients, as follows:
Step a2, test-image preprocessing: first convert the test image to a gray-level test image using the weighted-averaging method of step b1; then slide a window of size S × S along the test gray-level image with step b to obtain sub-image blocks; down-sample each sub-image block to size n × n, apply energy normalization, and convert the normalized result to a column vector β of dimension n² × 1, which represents the gray-level pixel information of the sub-image block obtained by the sliding window;
Step b2, sparse coding: for each sub-image block, solve the optimization model

$$\hat{\alpha} = \arg\min_{\alpha} \|\alpha\|_1 \quad \text{s.t.} \quad \|\beta - D\alpha\|_2^2 \le \epsilon$$

to obtain its sparse coding coefficients $\hat{\alpha}$, where $\hat{\alpha}_i$ is the coefficient vector corresponding to sub-dictionary $D_i$, ε > 0 is the allowable error, $\|\cdot\|_1$ is the $l_1$ norm, and $\|\cdot\|_2$ is the $l_2$ norm; the $l_1$ norm is computed as $\|z\|_1 = \sum_{k=1}^{M} |\xi_k|$ and the $l_2$ norm as $\|z\|_2 = (\sum_{k=1}^{M} |\xi_k|^2)^{1/2}$, where z is a vector of size M × 1, $\xi_k$ is an element of z, and k = 1, 2, ..., M;
Step c2, reconstruction error: from the sparse coding coefficients $\hat{\alpha}$, compute the reconstruction error $e_i$ of the sub-image block with respect to each class i, computed as $e_i = \|\beta - D_i \hat{\alpha}_i\|_2^2 + \gamma \|\hat{\alpha} - m_i\|_2^2$, where γ is a predefined weight, $m_i$ is the mean vector obtained by averaging the elements of each row of $Y_i$, and $Y_i$ is the optimal coding coefficient matrix of $U_i$ under sparse coding with dictionary D; take $e = \min_i\{e_i\}$ as the reconstruction error of the block and record the corresponding class C; then decide whether the block contains a target by comparing e with a predefined threshold τ: if e < τ the block contains a target, otherwise the block is background;
Step 3 object detection and recognition:
Step a3: will judge the corresponding reconstructed error e of each subimage block that comprises target in step c2, form a reconstructed error matrix E=(e of the same size with test gray level image, to represent candidate target region st) p × Q; Wherein, e stthe value of locating in coordinate points (s, t) for reconstruct error matrix, e st = 0 e &GreaterEqual; &tau; e e < &tau; , P × Q is the size of test pattern, s=1, and 2 ... P, t=1,2 ... Q;
To in step c2, judge the corresponding classification C of each subimage block that comprises target, form a classification matrix L=(C of the same size with test gray level image, to represent candidate target classification st) p × Q; Wherein C stthe value of locating in coordinate points (s, t) for classification matrix, C st = 0 e &GreaterEqual; &tau; C e < &tau; ,
Step b3: big or small G time of window S × S slided in change, repeating step 2~step a3G time, the G obtaining a reconstructed error matrix and G classification matrix, the span of G is 5~10; By a multiple dimensioned reconstructed error matrix M AP=(e of the G obtaining a reconstructed error matrix composition stg) p × Q × G; Wherein, e stgfor the element in matrix M AP, its value is to change the corresponding e of reconstructed error matrix that sliding window size obtains for the g time st, P × Q × G is the size of multiple dimensioned reconstructed error matrix, g=1, and 2 ... G;
The G obtaining a classification matrix formed to a multiple dimensioned classification Matrix C LASS=(C stg) p × Q × G; Wherein, C stgfor the element in Matrix C LASS, its value is to change the corresponding C of classification matrix that sliding window size obtains for the g time st; Obtain a minimal reconstruction error matrix (map (s, t)) according to multiple dimensioned reconstructed error matrix M AP p × Q, wherein map (s, t) is the value that corresponding minimal reconstruction error matrix is located in coordinate points (s, t),
Then obtain the minimum classification matrix (class (s, t)) of corresponding minimal reconstruction error matrix p × Q, wherein class (s, t) is the value that minimum classification matrix is located in coordinate points (s, t),
Obtain Scale Matrixes scale=(scale (s, t)) according to multiple dimensioned reconstructed error matrix M AP p × Q, scale (s, t) is the value that corresponding Scale Matrixes is located in coordinate points (r, t), scale ( s , t ) = 0 e st = 0 arg min g { e stg } e st &NotEqual; 0 ;
Step c3: take the local neighborhood minima of the minimal reconstruction error matrix (map(s, t))_{P×Q} as the detected target response values; the coordinates of each local neighborhood minimum in (map(s, t))_{P×Q} give the center of a target, and the class and scale of that target are read at the corresponding positions in (class(s, t))_{P×Q} and (scale(s, t))_{P×Q}.
2. The multi-class target detection and identification method for remote sensing images according to claim 1, characterized in that the rotation angle ranges from 0° to 90°.
3. The multi-class target detection and identification method for remote sensing images according to claim 1, characterized in that the FDDL software package parameter λ1 ranges from 0.001 to 0.01 and the parameter λ2 ranges from 0.01 to 0.1.
4. The multi-class target detection and identification method for remote sensing images according to claim 1, characterized in that S takes integer values between 40 and 90, and b takes integer values between 1 and 15.
5. The multi-class target detection and identification method for remote sensing images according to claim 1, characterized in that the threshold τ ranges from 0 to 1.
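The multi-scale aggregation and local-minimum detection described in steps b3 and c3 above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names, the NumPy array representation of the matrices MAP and CLASS, the 3×3 neighborhood radius, and the convention that a zero error entry marks "no candidate target" are all assumptions introduced here.

```python
import numpy as np

def aggregate_scales(error_stack, class_stack):
    """Collapse G per-scale reconstruction-error and class matrices
    (P x Q x G arrays, analogous to MAP and CLASS) into the minimal
    reconstruction error matrix map(s,t), its class matrix class(s,t),
    and the scale matrix scale(s,t). A zero error at (s,t) is assumed
    to mean no candidate target at that position."""
    P, Q, G = error_stack.shape
    map_mat = np.zeros((P, Q))
    class_mat = np.zeros((P, Q), dtype=int)
    scale_mat = np.zeros((P, Q), dtype=int)
    for s in range(P):
        for t in range(Q):
            errs = error_stack[s, t, :]
            valid = np.flatnonzero(errs)   # scales that produced a candidate
            if valid.size == 0:
                continue                   # scale(s,t) stays 0, as in the claim
            g_best = valid[np.argmin(errs[valid])]
            map_mat[s, t] = errs[g_best]
            class_mat[s, t] = class_stack[s, t, g_best]
            scale_mat[s, t] = g_best + 1   # scales numbered g = 1..G
    return map_mat, class_mat, scale_mat

def detect_targets(map_mat, class_mat, scale_mat, radius=1):
    """Take each local-neighborhood minimum of map(s,t) as a target
    center and read its class and scale at the same coordinates."""
    P, Q = map_mat.shape
    targets = []
    for s in range(P):
        for t in range(Q):
            e = map_mat[s, t]
            if e == 0:
                continue  # no candidate here
            window = map_mat[max(0, s - radius):s + radius + 1,
                             max(0, t - radius):t + radius + 1]
            nonzero = window[window > 0]
            if e <= nonzero.min():         # minimal among nearby candidates
                targets.append((s, t, int(class_mat[s, t]),
                                int(scale_mat[s, t])))
    return targets
```

Each returned tuple (s, t, class, scale) pairs a detected target center with the class and window scale that achieved the smallest reconstruction error there, mirroring the lookup into (class(s, t)) and (scale(s, t)) in step c3.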
CN201210300645.8A 2012-08-22 2012-08-22 Method for detecting and identifying a plurality of types of objects in remote sensing image Active CN102867195B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210300645.8A CN102867195B (en) 2012-08-22 2012-08-22 Method for detecting and identifying a plurality of types of objects in remote sensing image

Publications (2)

Publication Number Publication Date
CN102867195A CN102867195A (en) 2013-01-09
CN102867195B true CN102867195B (en) 2014-11-26

Family

ID=47446059

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210300645.8A Active CN102867195B (en) 2012-08-22 2012-08-22 Method for detecting and identifying a plurality of types of objects in remote sensing image

Country Status (1)

Country Link
CN (1) CN102867195B (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103258210B (en) * 2013-05-27 2016-09-14 中山大学 A kind of high-definition image classification method based on dictionary learning
CN103632164B (en) * 2013-11-25 2017-03-01 西北工业大学 The volume firm state classification recognition methodss of the KNN coil image data based on KAP sample optimization
CN104517121A (en) * 2014-12-10 2015-04-15 中国科学院遥感与数字地球研究所 Spatial big data dictionary learning method based on particle swarm optimization
CN105740422B (en) * 2016-01-29 2019-10-29 北京大学 Pedestrian retrieval method and device
CN106067041B (en) * 2016-06-03 2019-05-31 河海大学 A kind of improved multi-target detection method based on rarefaction representation
CN107451595A (en) * 2017-08-04 2017-12-08 河海大学 Infrared image salient region detection method based on hybrid algorithm
CN109190457B (en) * 2018-07-19 2021-12-03 北京市遥感信息研究所 Oil depot cluster target rapid detection method based on large-format remote sensing image
CN109946076B (en) * 2019-01-25 2020-04-28 西安交通大学 Planetary wheel bearing fault identification method of weighted multi-scale dictionary learning framework
CN110189328B (en) * 2019-06-11 2021-02-23 北华航天工业学院 Satellite remote sensing image processing system and processing method thereof

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102129573A (en) * 2011-03-10 2011-07-20 西安电子科技大学 SAR (Synthetic Aperture Radar) image segmentation method based on dictionary learning and sparse representation
CN102324047A (en) * 2011-09-05 2012-01-18 西安电子科技大学 High spectrum image atural object recognition methods based on sparse nuclear coding SKR

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8374442B2 (en) * 2008-11-19 2013-02-12 Nec Laboratories America, Inc. Linear spatial pyramid matching using sparse coding

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102129573A (en) * 2011-03-10 2011-07-20 西安电子科技大学 SAR (Synthetic Aperture Radar) image segmentation method based on dictionary learning and sparse representation
CN102324047A (en) * 2011-09-05 2012-01-18 西安电子科技大学 High spectrum image atural object recognition methods based on sparse nuclear coding SKR

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Liang Tianyi et al. An image semantic classifier model based on sparse coding. Journal of East China University of Science and Technology (Natural Science Edition), 2007, Vol. 33, No. 6, 827-892. *

Also Published As

Publication number Publication date
CN102867195A (en) 2013-01-09

Similar Documents

Publication Publication Date Title
CN102867195B (en) Method for detecting and identifying a plurality of types of objects in remote sensing image
CN110363182B (en) Deep learning-based lane line detection method
CN110348319B (en) Face anti-counterfeiting method based on face depth information and edge image fusion
CN107346436B (en) Visual saliency detection method fusing image classification
Sirmacek et al. Urban area detection using local feature points and spatial voting
Yuan et al. Large-scale solar panel mapping from aerial images using deep convolutional networks
Cord et al. Automatic road defect detection by textural pattern recognition based on AdaBoost
CN106228125B (en) Method for detecting lane lines based on integrated study cascade classifier
CN103034863B (en) The remote sensing image road acquisition methods of a kind of syncaryon Fisher and multiple dimensioned extraction
CN103116744B (en) Based on the false fingerprint detection method of MRF and SVM-KNN classification
CN102819740B (en) A kind of Single Infrared Image Frame Dim targets detection and localization method
CN102542295B (en) Method for detecting landslip from remotely sensed image by adopting image classification technology
CN110555475A (en) few-sample target detection method based on semantic information fusion
CN109409190A (en) Pedestrian detection method based on histogram of gradients and Canny edge detector
CN106023220A (en) Vehicle exterior part image segmentation method based on deep learning
CN105426889A (en) PCA mixed feature fusion based gas-liquid two-phase flow type identification method
CN104077577A (en) Trademark detection method based on convolutional neural network
CN106295124A (en) Utilize the method that multiple image detecting technique comprehensively analyzes gene polyadenylation signal figure likelihood probability amount
CN103390164A (en) Object detection method based on depth image and implementing device thereof
CN104182985A (en) Remote sensing image change detection method
CN108710909B (en) Counting method for deformable, rotary and invariant boxed objects
Yuan et al. Learning to count buildings in diverse aerial scenes
CN105512622A (en) Visible remote-sensing image sea-land segmentation method based on image segmentation and supervised learning
CN108073940A (en) A kind of method of 3D object instance object detections in unstructured moving grids
CN106446854A (en) High-resolution optical remote sensing image target detection method based on rotation invariant HOG feature

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant