CN107301398B

CN107301398B - A GPU-based method for recognizing targets in synthetic aperture radar images

Info

Publication number: CN107301398B
Application number: CN201710485297.9A
Authority: CN
Inventors: 曹宗杰; 夏爽; 崔宗勇; 皮亦鸣
Original assignee: University of Electronic Science and Technology of China
Current assignee: University of Electronic Science and Technology of China
Priority date: 2017-06-23
Filing date: 2017-06-23
Publication date: 2019-04-30
Anticipated expiration: 2037-06-23
Also published as: CN107301398A

Abstract

The invention belongs to the technical field of radar image target recognition, and specifically relates to a GPU-based method for recognizing targets in synthetic aperture radar (SAR) images. As SAR imaging technology has developed, the resolution and data volume of SAR images have grown rapidly, so the traditional PCA method based on serial CPU computation is too inefficient and too costly to compute. The invention uses the general-purpose computing capability of the GPU: it analyzes the parallelism of the PCA feature extraction method and implements in parallel on the GPU the operations with strong parallelism, namely matrix multiplication, Jacobi eigendecomposition, and extremum search by reduction.

Description

A GPU-based method for recognizing targets in synthetic aperture radar images

Technical field

The invention belongs to the technical field of radar image target recognition, and specifically relates to a GPU-based method for recognizing targets in synthetic aperture radar images.

Background art

Automatic target recognition (ATR) for synthetic aperture radar (SAR) images refers to detecting and locating targets in a large scene without human assistance, and then determining properties of each target such as its model, attributes, and equipment. Many methods are currently used for SAR target recognition; numerous approaches exist for both feature extraction and classifier design, each with its own strengths and weaknesses. Compared with these rich theoretical results, however, the technical implementation and practical application of SAR target recognition have progressed more slowly, and practical SAR target detection and recognition systems remain few or immature. A major reason is that rising SAR image resolution has sharply increased the volume of acquired data, so that traditional CPU processing struggles to handle the data in real time. Most existing research on SAR target recognition focuses on improving classification accuracy, and the computational speed of the recognition algorithms is rarely studied. How to increase execution speed and achieve real-time target recognition while preserving recognition accuracy is therefore a challenge facing researchers.

In recent years, the rapid development of the graphics processing unit (GPU) and the continual refinement of its architecture have turned it from a device suited only to graphics rendering into a computing tool with very high floating-point performance, which provides a powerful means of addressing the slow speed of target recognition algorithms. The parallelism of a GPU comes from the large number of threads on the chip; when processing large, dense data sets, this parallelism gives the GPU a considerable advantage over traditional serial CPU computation. In addition, NVIDIA has continued to optimize GPUs in various respects, reducing power consumption as far as possible while maintaining computing capability, thereby lowering computing cost.

Summary of the invention

The invention provides a GPU-based method for recognizing targets in synthetic aperture radar images that effectively addresses the poor real-time performance of target recognition algorithms. Principal component analysis (PCA) is a commonly used feature extraction method for SAR images. Based on the idea that large variance corresponds to high information content, it projects the original data onto a new coordinate space so that a small number of composite variables reflect as much of the original sample information as possible. It is implemented in the following steps:

Step 1: for the original n training samples, flatten each sample into one-dimensional data to form the matrix P_{m×n} = {X₁, X₂, X₃, …, X_n}, and compute the mean-centered matrix, still denoted P.

Step 2: compute the covariance matrix Q of the mean-centered matrix P:
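The formula itself does not survive in this text. A reading consistent with the parallel analysis later in the description, where each thread combines row i of P^T with column j of P, is sketched below. The absence of a normalization constant such as 1/n or 1/(n-1) is an assumption; a constant factor only rescales the eigenvalues and leaves the eigenvectors unchanged.

$$
Q = P^{T}P, \qquad Q_{ij} = \sum_{k=1}^{m} P_{ki}\,P_{kj}, \qquad i, j = 1, \dots, n
$$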

Step 3: perform eigenvalue decomposition of the covariance matrix Q using the Jacobi iterative method:

Q=U^TΛU

A covariance matrix of rank k has k nonzero eigenvalues; U is an orthogonal matrix, each column of which is the eigenvector corresponding to one eigenvalue. The eigenvalues are sorted, and the eigenvectors corresponding to the d largest eigenvalues are usually taken as the PCA projection space, which reduces the feature dimension. Large eigenvalues are chosen because the larger the eigenvalue, the more pronounced the features mapped onto the direction of its eigenvector.

Step 4: project the sample data onto the feature projection space obtained in Step 3 to obtain the principal component features:

Y_PCA=TP
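Steps 3 and 4 can be illustrated with a short CPU-side sketch in Python. It assumes that `eigenvectors[i]` is the eigenvector paired with `eigenvalues[i]` and that its length equals the number of rows of P; the function names `top_d_rows` and `project` are hypothetical and not taken from the patent.

```python
def top_d_rows(eigenvalues, eigenvectors, d):
    """Return the d eigenvectors with the largest eigenvalues as the rows of T."""
    order = sorted(range(len(eigenvalues)),
                   key=lambda i: eigenvalues[i], reverse=True)
    return [list(eigenvectors[i]) for i in order[:d]]


def project(T, P):
    """Matrix product Y = T P for matrices stored as lists of rows."""
    cols = list(zip(*P))               # columns of P, one per sample
    return [[sum(t * x for t, x in zip(row, col)) for col in cols]
            for row in T]
```

Each entry of Y is an independent inner product, which is why the projection maps onto the same one-thread-per-element pattern as any other matrix multiplication.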

The four steps above give the PCA algorithm flow shown in Fig. 1.

Next, the parallelism of each stage of the PCA algorithm is analyzed, so that the algorithm can finally be computed in parallel on the GPU and accelerated.

Step 1: average the input sample matrix by rows, then subtract the resulting mean vector from each column vector of the original matrix P. When the row means are computed, the matrix row is the basic unit of parallel computation: each thread computes the mean of one row.
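A minimal CPU-side sketch of this step in Python, assuming P is stored as a list of m rows with one column per sample; the function name `center_rows` is hypothetical. Each pass of the outer loop stands for the work the text assigns to one GPU thread.

```python
def center_rows(P):
    """Subtract each row's mean from that row of an m x n matrix (list of rows)."""
    centered = []
    for row in P:                      # one "thread" per row
        mean = sum(row) / len(row)     # mean over the n samples in this row
        centered.append([x - mean for x in row])
    return centered
```

Subtracting the row mean from every entry of a row is the same as subtracting the length-m mean vector from every column, so the rows can be processed without any communication between threads.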

Step 2: computing the covariance matrix amounts to a matrix multiplication. Each element of the covariance matrix Q is the inner product of one row of P^T with one column of P, and the elements are computed independently of one another, which makes this a typical operation well suited to parallel GPU implementation.

Because matrix multiplication parallelizes well, it can be implemented in CUDA with each thread computing the value at one position of Q: each thread reads row i of P^T and column j of P and computes the element of Q at position (i, j). Computing the whole matrix Q requires the kernel function to read the operands from global memory; P^T is read P.width times and P is read P^T.height times. So that the number of threads per block is an integer multiple of the warp size, the block size is defined as 16*16, and the grid size is therefore (n/16, n/16).
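The index mapping described above can be sketched on the CPU in Python. The sketch assumes Q = P^T P with no normalization constant, and it rounds the grid size up with a bounds check so that n need not be a multiple of 16; both are assumptions, and the names `grid_dim` and `gram_matrix` are hypothetical. The four nested loops stand in for `blockIdx` and `threadIdx`.

```python
import math

BLOCK = 16  # a 16 x 16 block has 256 threads, i.e. 8 warps of 32 threads


def grid_dim(n, block=BLOCK):
    """Blocks per grid dimension, rounded up when n is not a multiple of block."""
    return math.ceil(n / block)


def gram_matrix(P, block=BLOCK):
    """Compute Q = P^T P for an m x n matrix P using the CUDA-style index mapping."""
    m, n = len(P), len(P[0])
    Q = [[0.0] * n for _ in range(n)]
    g = grid_dim(n, block)
    for by in range(g):                      # blockIdx.y
        for bx in range(g):                  # blockIdx.x
            for ty in range(block):          # threadIdx.y
                for tx in range(block):      # threadIdx.x
                    i = by * block + ty      # row of Q owned by this thread
                    j = bx * block + tx      # column of Q owned by this thread
                    if i < n and j < n:      # guard for partially filled blocks
                        # row i of P^T is column i of P
                        Q[i][j] = sum(P[k][i] * P[k][j] for k in range(m))
    return Q
```

For 6400 samples, `grid_dim(6400)` gives 400 blocks per dimension, which matches the (n/16, n/16) grid in the text.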

Step 3: eigendecomposition of the covariance matrix Q. Breaking down the computation of the Jacobi iterative algorithm shows that it has data parallelism and involves a large number of matrix operations that can be implemented in parallel.

The Jacobi iterative method proceeds as follows. Let the n-th order matrix A be the real symmetric matrix to be decomposed. Each iteration selects the off-diagonal element of A with the largest absolute value, denoted a_pq, whose position in A is (p, q). The rotation transformation is:

A₀=A

where the rotation matrix J_K is as follows:
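The formula images are missing at this point. In the standard Jacobi eigenvalue method, which is assumed in this sketch (the sign convention for the sine entries may differ from the original figures), the iteration and the rotation matrix take the form:

$$
A_{k+1} = J_k^{T} A_k J_k, \qquad k = 0, 1, 2, \dots
$$

$$
(J_k)_{pp} = (J_k)_{qq} = \cos\theta, \quad (J_k)_{pq} = \sin\theta, \quad (J_k)_{qp} = -\sin\theta, \quad (J_k)_{ii} = 1 \ (i \neq p, q), \quad \text{all other entries } 0
$$

The angle is chosen so that the entry at position (p, q) of A_{k+1} vanishes, which under this convention gives

$$
\tan 2\theta = \frac{2a_{pq}}{a_{qq} - a_{pp}} \quad (a_{pp} \neq a_{qq}), \qquad \theta = \frac{\pi}{4} \quad (a_{pp} = a_{qq})
$$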

This method is also called the two-sided rotation method. As described above, after each two-sided transformation by J_K, the sum of squares of the off-diagonal elements of A decreases by 2a_pq² and the sum of squares of the diagonal elements increases by 2a_pq². After a number of iterations the original matrix A becomes a diagonal matrix whose diagonal elements are the eigenvalues of the matrix.

The part of the rotation matrix J_K that actually performs the rotation is a 2 × 2 submatrix R, of the following form:
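The figure for R is not available here. Under the standard convention, R = [[c, s], [-s, c]] with c = cos θ and s = sin θ, and the serial form of the iteration can be sketched in Python as below. This is the textbook classical Jacobi method rather than a transcription of the patent's GPU kernels; the function name, the tolerance, and the rotation limit are assumptions.

```python
import math


def jacobi_eigenvalues(A, tol=1e-12, max_rotations=1000):
    """Classical Jacobi iteration on a real symmetric matrix (list of rows).

    Returns the diagonal of the nearly diagonalized matrix (the eigenvalues).
    """
    A = [list(row) for row in A]       # work on a copy
    n = len(A)
    if n < 2:
        return [A[i][i] for i in range(n)]
    for _ in range(max_rotations):
        # Pick the off-diagonal element a_pq of largest absolute value.
        p, q = max(((i, j) for i in range(n) for j in range(i + 1, n)),
                   key=lambda ij: abs(A[ij[0]][ij[1]]))
        if abs(A[p][q]) < tol:
            break
        # Choose c = cos(theta), s = sin(theta) so that the new a_pq is zero.
        tau = (A[q][q] - A[p][p]) / (2.0 * A[p][q])
        t = math.copysign(1.0, tau) / (abs(tau) + math.sqrt(1.0 + tau * tau))
        c = 1.0 / math.sqrt(1.0 + t * t)
        s = t * c
        # Right-multiply by J: only columns p and q change.
        for k in range(n):
            akp, akq = A[k][p], A[k][q]
            A[k][p] = c * akp - s * akq
            A[k][q] = s * akp + c * akq
        # Left-multiply by J^T: only rows p and q change.
        for k in range(n):
            apk, aqk = A[p][k], A[q][k]
            A[p][k] = c * apk - s * aqk
            A[q][k] = s * apk + c * aqk
    return [A[i][i] for i in range(n)]
```

The search for the largest off-diagonal element is the extremum computation that the description later moves to a parallel reduction, and the two update loops are the small matrix products that touch only rows and columns p and q.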

A suitable scheduling algorithm is now needed so that every 2 × 2 submatrix of the matrix is rotated once and only once; such a pass is called one round of iteration. Within a round, at most n/2 mutually independent two-sided rotations can be carried out at the same time. The Jacobi rotation algorithm therefore has good data parallelism, and the data can be scheduled sensibly with the single round-robin scheme commonly used in sports competitions.
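One common way to build such a schedule is the circle method for round-robin tournaments, sketched below in Python. The text names the round-robin idea but not a specific construction, so the circle method and the function name `round_robin_rounds` are assumptions. Each yielded group contains n/2 pairs (p, q) that share no index.

```python
def round_robin_rounds(n):
    """Yield n - 1 groups, each a list of n // 2 disjoint index pairs (n even).

    Circle method: index 0 stays fixed while the other indices rotate by one.
    """
    if n % 2:
        raise ValueError("n must be even; pad the matrix or add a dummy index")
    players = list(range(n))
    for _ in range(n - 1):
        yield [tuple(sorted((players[k], players[n - 1 - k])))
               for k in range(n // 2)]
        players = [players[0], players[-1]] + players[1:-1]
```

Over the n - 1 groups every index pair appears exactly once, so each 2 × 2 submatrix is visited once per pass while the n/2 rotations inside a group can be issued together.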

In addition, the parallel GPU implementation of the Jacobi iterative algorithm involves a large number of matrix multiplications and extremum searches. The parallel analysis and implementation of matrix multiplication were introduced above; the GPU implementation of finding the extremum of a matrix by parallel reduction is introduced next.

A serial CPU computation finds the maximum of a matrix or array by traversing all of its elements, which has time complexity O(N). Here the maximum is instead computed on the GPU by reduction, which greatly improves running efficiency. First the maximum of each pair of adjacent elements is found; then, building on that result, the maximum of each pair of adjacent maxima is found, which is equivalent to the maximum of 4 adjacent elements; continuing in the same way gives the maximum of 8, 16, 32, ... adjacent elements.

This GPU reduction needs only about log N steps to complete a computation that takes N steps on the CPU, which saves considerable computing time; the larger N is, the better the speed-up. The concrete steps of parallel reduction in CUDA are given below.

Suppose the maximum of an array of 1526 elements is required, with each thread handling one element. The number of blocks is then set to 6 and the number of threads per block to 256, with both grid and block one-dimensional. Shared memory is used to optimize memory access. First, each thread reads one element, which copies the data from global memory to shared memory. The threads are then synchronized with __syncthreads() to ensure that every copy has completed, so that data in the same block can safely be accessed by other threads. The reduction itself is carried out in a for loop: each iteration uses half as many threads as the previous one, and each active thread takes the maximum of elements tid and tid+s and stores the result at position tid. Here s can be understood as the stride. For each thread block:

In the first iteration s=1, and only threads tid=0, 2, 4, 6, 8, ..., 254 perform a comparison: s_data[tid] is compared with the element at stride 1 after it, and the maximum is stored in s_data[tid].

In the second iteration s=2, and only threads tid=0, 4, 8, 12, ..., 252 perform a comparison: s_data[tid] is compared with the element at stride 2 after it, and the maximum is stored in s_data[tid].

And so on.

In the last iteration s=128, and only thread tid=0 actually computes: s_data[0] is compared with the element at stride 128 after it, and the maximum is stored in s_data[0]. The value in s_data[0] is then the maximum of the 256 elements in the block. Fig. 2 shows the thread scheduling within a single thread block during parallel reduction.
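The stride schedule above can be traced with a CPU-side simulation in Python. The inner loop over `tid` stands in for the threads of one block, and the list copy stands in for the shared-memory array `s_data`. Padding the last block with negative infinity and the function names `block_max` and `grid_max` are assumptions introduced for this sketch.

```python
import math

BLOCK_SIZE = 256


def block_max(s_data):
    """Interleaved-addressing max reduction over one block (power-of-two length)."""
    s_data = list(s_data)               # stands in for the shared-memory copy
    n = len(s_data)
    s = 1
    while s < n:                        # s = 1, 2, 4, ..., n / 2
        for tid in range(n):            # every thread runs, few are active
            if tid % (2 * s) == 0:      # active threads: 0, 2s, 4s, ...
                s_data[tid] = max(s_data[tid], s_data[tid + s])
        s *= 2                          # a __syncthreads() barrier belongs here
    return s_data[0]


def grid_max(data, block_size=BLOCK_SIZE):
    """Reduce each block, then compare the per-block maxima once."""
    blocks = math.ceil(len(data) / block_size)
    padded = list(data) + [-math.inf] * (blocks * block_size - len(data))
    partial = [block_max(padded[b * block_size:(b + 1) * block_size])
               for b in range(blocks)]
    return max(partial), blocks
```

For 1526 elements the second return value is 6, matching the block count in the text, and each block finishes in 8 strides (s = 1 to 128) instead of 255 sequential comparisons.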

Step 4: projecting the sample data onto the feature space is a matrix multiplication and can therefore be implemented in parallel on the GPU.

The beneficial effects of the invention are as follows: the invention uses the general-purpose computing capability of the GPU, analyzes the parallelism of the PCA feature extraction method, and implements in parallel on the GPU the operations with strong parallelism, namely matrix multiplication, Jacobi eigendecomposition, and extremum search by reduction.

Brief description of the drawings

Fig. 1 is the flow chart of the PCA algorithm;

Fig. 2 is the thread scheduling diagram for parallel reduction;

Fig. 3 shows how the PCA feature extraction speed-up ratio varies as the number of samples increases.

Detailed description of the embodiments

The technical solution of the invention is described in detail below with reference to the drawings.

To verify that the GPU effectively accelerates the SAR target recognition algorithm, running times measured in a single hardware environment and development platform are used as the metric for comparison, so that only one variable changes. The GPU is an NVIDIA GeForce GTX 750 Ti with 512 CUDA processor cores and 1 GB of video memory. The CPU is an Intel Core i7-4790 at 3.60 GHz with 16.0 GB of memory. The operating system is 64-bit Windows 7. The CUDA version is CUDA 7.5. The programming environment is Visual Studio 2010, and the programming language is CUDA C.

The experimental data are MSTAR images; MSTAR is briefly introduced here.

The MSTAR (Moving and Stationary Target Acquisition and Recognition) project was launched in 1994 as a SAR ATR program studied jointly by several U.S. research institutions. Sandia National Laboratories was responsible for providing raw X-band SAR data at 0.3 to 1 m resolution. Wright Laboratory was responsible for building a database of backscatter patterns of various terrains for model research and of 18 kinds of ground vehicles for classification research, with 72 samples at different view angles and orientations available for each vehicle. MIT Lincoln Laboratory and others were responsible for providing specialized analysis, extraction, and classification algorithms. MSTAR data have since become a standard database for evaluating SAR target recognition and classification algorithms, and most such algorithms published in authoritative journals and conferences are tested and evaluated on MSTAR data.

The purpose here is to evaluate how much GPU parallelism accelerates SAR image target recognition. The inputs to the PCA feature extraction algorithm are the whole data set P and the data dimension d, with the data dimension fixed at d=64*64. In this experiment the number of samples is increased step by step from 400 to 6400, which makes it possible to analyze both the speed-up of the algorithm on the GPU relative to the CPU and how the GPU running time changes as the number of samples grows. To expand the database to the number of samples required by the experiment, the blur level of the images is artificially increased by 100% and 200%.

Table 1 gives the PCA feature extraction results; the speed-up ratio is calculated to show the acceleration achieved by the parallel GPU implementation of the PCA algorithm.

Table 1. PCA feature extraction experimental results

Fig. 3 gives the line chart of the PCA feature extraction speed-up ratio. Because GPU computing resources are limited, the growth of the speed-up ratio slows as the number of samples increases.

The GPU-based method of the invention for recognizing targets in synthetic aperture radar images makes effective use of the powerful computing capability of the GPU and, given the high computational complexity of the algorithm, greatly improves its execution speed.

Claims

1. A GPU-based method for recognizing targets in synthetic aperture radar images, in which SAR image features are extracted by principal component analysis to obtain samples, characterized in that GPU-based algorithms are used to increase the speed at which the following recognition method processes the samples:

S1: for the original n training samples, flatten each extracted sample into one-dimensional data to form the matrix P_{m×n} = {X₁, X₂, X₃, …, X_n}, and obtain the mean-centered matrix P by averaging the input sample matrix by rows and then subtracting the mean vector from each column vector of the original matrix P; when the row means are computed, a GPU-based parallel algorithm is used in which the matrix row is the basic thread unit of parallel computation;

S2: compute the covariance matrix Q of the mean-centered matrix P:

each element of the covariance matrix Q is obtained as the inner product of one row and one column of the matrix P; the elements are computed independently of one another, and GPU parallel computation is used, with each thread computing the value at one position of the matrix Q;

S3: perform eigenvalue decomposition of the covariance matrix Q using the Jacobi iterative method:

Q=U^TΛU

for a covariance matrix of rank k, the covariance matrix has k nonzero eigenvalues, U is an orthogonal matrix each column of which is the eigenvector corresponding to one eigenvalue, and the eigenvectors corresponding to the d largest of the sorted eigenvalues are taken as the PCA projection space; in the Jacobi iteration, each rotation transformation changes only one 2 × 2 submatrix of the original matrix; using a GPU-based parallel algorithm, n/2 mutually independent rotation transformations are carried out at once, so that all elements of the original matrix are transformed in one pass; the n/2 rotation transformations are independent of one another and specifically comprise matrix multiplication and extremum operations, wherein the extremum operation is computed in parallel on the GPU by reduction: the data whose extremum is sought are read into several thread blocks, which gives thread-block-level parallelism; thread-level parallelism is then implemented within each thread block; finally the extrema found by the individual thread blocks are compared once to obtain the final extremum;

S4: project the sample data onto the feature projection space T obtained in step S3 to obtain the principal component features:

Y_PCA=TP.