CN111191583B - Space target recognition system and method based on convolutional neural network - Google Patents
- Publication number
- CN111191583B (application CN201911388125.5A)
- Authority
- CN
- China
- Prior art keywords
- target
- layer
- neural network
- image
- target recognition
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/41—Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/082—Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/513—Sparse representations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/07—Target detection
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- General Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Computational Linguistics (AREA)
- Molecular Biology (AREA)
- Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- General Health & Medical Sciences (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Evolutionary Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Multimedia (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a space target recognition system and method based on a convolutional neural network, intended to solve the technical problem that existing deep learning methods cannot balance recognition accuracy against the amount of computation. The method comprises model training and target recognition: model training includes inputting positive and negative samples, extracting features with a sparse convolutional neural network, and training a classifier to form a learning dictionary; target recognition includes preprocessing the input image, matching it against the model, and outputting the result. The beneficial technical effects of the invention are: on the premise of guaranteed accuracy, the recognition efficiency is improved, storage space is saved, and a more efficient hardware design is made possible.
Description
Technical Field
The invention relates to the technical field of deep learning, and in particular to a space target recognition system and method based on a convolutional neural network.
Background
Space target recognition refers to the process of distinguishing a particular object (or type of object) from other objects (or types of objects). It includes both distinguishing between two very similar objects and distinguishing one type of object from another. Space target recognition is, in essence, the problem of extracting targets from image sequences. Space target detection and recognition is a popular direction in computer vision and digital image processing, and is widely applied in robot navigation, intelligent video surveillance, industrial inspection, aerospace and other fields.
In recent years, with the development of artificial intelligence, image recognition technology based on deep learning has made great progress, and important fields such as medical care, autonomous driving and fashion have seen practical applications built around it. At present, extracting model features with deep learning tools has become the mainstream technique. The deep learning field has also produced many models, such as the classical convolutional neural network, the deep belief network, the autoencoder and the generative adversarial network; among these, the convolutional neural network, thanks to its strong ability to abstract and extract high-level semantic features, has achieved great success in speech, image and video applications and is widely used.
However, a conventional convolutional neural network has to extract as many features as possible when a precise recognition result is desired, yet too many features slow down the computation and increase the burden on the hardware, to the point where the amount of computation may not be supportable at all. How to simplify the computation while guaranteeing recognition accuracy is therefore an urgent problem to be solved.
Disclosure of Invention
The invention provides a space target recognition system and method based on a convolutional neural network, which are used to solve the technical problem that existing deep learning methods cannot balance recognition accuracy against the amount of computation.
In order to solve the technical problems, the invention adopts the following technical scheme:
the space target recognition system based on the convolutional neural network comprises a calculation unit and a storage unit, wherein the calculation unit comprises a convolutional layer for extracting features, a pooling layer for reducing dimensionality and a full-connection layer for combining features, and the convolutional layer comprises a sparse convolutional module for separating a target and a background; the storage unit comprises a RAM module for storing weight values and bias values and a ROM module for storing classification labels.
Further, the convolution layer comprises a first data buffer module and a feature classification module, the pooling layer comprises a second data buffer module, and the full-connection layer comprises a multiplier, an accumulator, a comparator and a residual calculator.
Further, the sparse convolution module includes an inner product calculator of the sparse vector and the dense vector.
The method for constructing the space target recognition model comprises the following steps:
s1: inputting a training sample set: labeled positive and negative target samples and background patterns are selected, a static-image and video data set of space targets is built, and the data set is stored on the hard disk of the device;
s2: establishing a sparse convolutional neural network, wherein the network adopts a pruning algorithm to divide the characteristics of a space target into a sparse characteristic matrix and a dense characteristic matrix, the sparse characteristic matrix represents the background, and the dense characteristic matrix represents the target;
s3: initializing the parameters of the neural network, and then performing feature extraction, feature classification and sample labeling on the samples of the training sample set one by one, the space target recognition system updating the neural network parameters during this process; the feature processing mainly comprises:
(1) color features and their selection;
(2) texture features and their selection;
(3) shape features and their selection;
(4) spatial features and their selection;
s4: after all the samples have been processed, the space target recognition model can be obtained (a minimal sketch of steps s1-s4 is given below).
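The following is a minimal software sketch of steps s1-s4. It is only an illustration under simplifying assumptions: the features are flattened 20×20 grayscale patches and the classifier is a single logistic unit trained by gradient descent, not the sparse convolutional network and learning dictionary of the invention; the function and variable names are illustrative, not taken from the patent.

```python
import numpy as np

def train_model(images, labels, epochs=50, lr=0.1):
    """images: list of 20x20 grayscale patches; labels: 1 = target, 0 = background."""
    X = np.stack([img.astype(np.float32).ravel() / 255.0 for img in images])
    y = np.asarray(labels, dtype=np.float32)
    w = np.zeros(X.shape[1]); b = 0.0               # s3: initialize the parameters
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))      # classify the extracted features
        grad = p - y                                # cross-entropy gradient
        w -= lr * X.T @ grad / len(y)               # update the network parameters
        b -= lr * grad.mean()
    return w, b                                     # s4: the recognition model

# Usage: positive samples contain the target, negative samples are background only.
pos = [np.random.randint(0, 256, (20, 20)) for _ in range(5)]
neg = [np.random.randint(0, 256, (20, 20)) for _ in range(5)]
w, b = train_model(pos + neg, [1] * 5 + [0] * 5)    # stand-ins for steps s1/s2
```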
Further, the training sample set comprises ten thousand (10,000) positive and negative target samples, where a positive target sample is an image sample containing the space target and a negative target sample is any other image sample that does not contain the space target.
Further, after the convolutional neural network performs the convolution operation on the image, the result passes through the ReLU activation function and the bias and is then input into the pooling layer.
Further, the features of the spatial target also include color, texture, shape.
Further, the training sample set needs to be preprocessed before training: coding, thresholding, pattern improvement, normalization and discrete pattern operations.
The method for identifying the space target comprises the following steps:
i, inputting an image to be identified into a storage unit in a space target identification model;
II, extracting features of the image by using a sparse convolutional neural network;
and III, matching the parameters of the features against the parameters in the space target recognition model; if they match completely, the image to be recognized contains a target, otherwise the image is judged to be background (a minimal sketch of this matching is given below).
Further, the image in step II is preprocessed before feature extraction, adjusting it so that its apparent characteristics are the same as those of the training samples of the space target recognition model.
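A minimal sketch of the matching in steps I-III above, under the assumption that the extracted features and the stored model parameters are compared by cosine similarity against a learned dictionary; the similarity measure, the threshold value and the function names are illustrative choices, not specified by the invention.

```python
import numpy as np

def recognize(feature, dictionary, threshold=0.99):
    """feature: feature vector of the image to be recognized;
    dictionary: mapping from class label to a stored feature vector."""
    best_label, best_score = "background", -1.0
    for label, ref in dictionary.items():
        score = float(np.dot(feature, ref) /
                      (np.linalg.norm(feature) * np.linalg.norm(ref) + 1e-12))
        if score > best_score:
            best_label, best_score = label, score
    # "completely matched" is approximated here by a very high similarity threshold
    return best_label if best_score >= threshold else "background"
```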
Compared with the prior art, the invention has the beneficial technical effects that:
1. The invention uses a sparse convolutional neural network together with image segmentation and activation-function optimization, effectively designs the depth, width and connection pattern of the network, extracts a target feature model, trains task retrieval and accurate classification based on extracted target features such as shape, color, texture, illumination, positional relations and semantic expression, and forms a target recognition classifier, thereby improving the recognition accuracy and recognition efficiency for space targets.
2. The invention shows that coarse-grained pruning can reach a sparsity ratio similar to that of unstructured pruning with almost no loss of accuracy; quantization balances the regularity of the sparsity against prediction accuracy, solving the problem of maintaining model accuracy while keeping the sparsity more structured.
3. The sparse structure saves the storage space required for indices, and under the same accuracy threshold coarse-grained pruning achieves a better compression rate than fine-grained pruning, which improves the efficiency of the hardware acceleration design and the accuracy of the model, making the method better suited to hardware acceleration.
4. Compared with fine-grained sparsity, coarse-grained sparsity roughly halves the index storage, and since memory access costs about two orders of magnitude more than an arithmetic operation, the regularity of the sparse structure leads to a more efficient hardware design.
Drawings
FIG. 1 is a hardware architecture diagram of a convolutional neural network-based spatial target recognition system of the present invention.
FIG. 2 is a flowchart of a training method of a spatial target recognition model based on a convolutional neural network.
Fig. 3 is a schematic diagram of a convolutional neural network-based spatial target recognition method of the present invention.
Fig. 4 is a schematic diagram of signal connections of a conventional convolutional neural network.
Fig. 5 is a schematic diagram of signal connections of a convolutional neural network of the present invention.
Fig. 6 is a schematic diagram of a sparse convolutional neural network target recognition algorithm according to the present invention.
Detailed Description
The following examples are given to illustrate the invention in detail, but are not intended to limit the scope of the invention in any way.
Example 1: referring to fig. 1, the whole system hardware is designed according to a CNN network structure, each layer is designed into a single module by utilizing the parallel characteristic of an FPGA hardware circuit, the final classification layer finishes the classification result of input data by utilizing an integrated learning dictionary classifier, and then the weight to be updated is calculated by a back propagation algorithm. The system firstly initializes the weight value index address of each layer of convolution Kernel Kernel by the controller, and loads the weight value and the offset value from the RAM module according to the index address. And when in forward propagation, input data enters a data buffer area through an input signal, then convolution operation is completed according to the number of the output characteristic diagrams of each layer and a convolution kernel, and the operation result is subjected to an activation function ReLU and offset to complete the final output characteristic diagram of the current layer and is input into the next layer. Downsampling operations are performed as they pass through the pooling layer, thereby reducing the dimensionality of the feature map. Finally, the Softmax classifier searches the corresponding numerical value in the ROM according to the input data, and then completes the final output result after probability conversion. And when the back propagation is carried out, comparing the label in the ROM with the output result to obtain residual errors of each layer, storing the residual errors in the RAM, completing updating of the weight and the offset value of the convolution kernels in all the convolution layers according to the set learning rate after calculation is completed, and storing the updated weight into corresponding storage positions by the controller until all training data are input.
Example 2: referring to fig. 2, the model first extracts features from a sample and classifies the recognized features, for example a car lamp is labeled A and a car wheel is labeled B; the classified features are combined into a classifier that recognizes different car models, i.e. a classification dictionary, which completes the model construction. The method is as follows:
(1) Creating positive and negative samples
The positive samples are samples of the object to be detected, in this embodiment automobiles; the negative samples are any other pictures that do not contain the object, such as background. All sample pictures are normalized to the same size, 20×20 in this embodiment. To obtain better classification and detection accuracy, the training set is set to 10,000 space-target static and video images, ensuring the accuracy requirement of the training.
(2) Extracting target features
Feature extraction transforms the original data to obtain the features that best reflect the essence of the classification, mainly: color, texture, shape and spatial features.
Color features describe the surface properties of the scene corresponding to an image or image region; common color features include image patch features and color channel histogram features. Texture is usually defined as a local property of an image, or a measure of the relationship between pixels in a local region; texture feature extraction in the present invention uses a method based on the gray-level spatial correlation matrix, i.e. the co-occurrence matrix. Shape is one of the basic characteristics of an object; shape features are represented in two ways, shape contour features and shape region features. The shape contour features are mainly: straight-line segment descriptions, spline-fitted curves, Fourier descriptors, interior-angle histograms, Gaussian parameter curves, etc.; the shape region features are mainly: the invariant moments of the shape, the region area, the aspect ratio of the shape, etc. Spatial features refer to the mutual spatial positions or relative directional relations among several targets segmented from an image, including relative position information such as above, below, left and right, as well as absolute position information; the basic idea of the method for extracting spatial features is to segment the image, extract the features, and then index the features.
(3) Modeling
Target modeling is a geometric description of the important features of the target; the model representation contains all the relevant feature information reflecting the target, but no redundant feature information, and the feature information is organized so that the different components of the target recognition system can access it easily. The model object is the spatial structural relationship between the features. The main selection criteria are, first, whether the assumptions of the model apply to the current problem and, second, whether the computational complexity required by the model can be tolerated, or whether there is an algorithm that is as efficient and accurate as possible, or a good approximation. The methods of expressing the target model mainly include: local statistical features, i.e. the generative model; or the definition of interrelations between target features, such as positional relations, i.e. the discriminative model; or a hybrid of the two. The methods of modeling the target include: (1) solid representations such as constructive solid geometry, point clouds, volume meshes and voxels; (2) boundary representations such as surface meshes, parametric surfaces, subdivision surfaces and implicit surfaces. In the generative model, each class of object is modeled, and the class information is then obtained by maximum likelihood or Bayesian inference.
(4) Matching
Matching, also called feature selection, matches the target features against the target's feature template; if the feature values match completely, the recognition result is obtained. The target matching process uses a scanning sub-window that is continuously shifted and slid across the image to be detected; each time the sub-window reaches a position, the features of that region are computed and then screened with the trained classifier to determine whether the region is a target. Because the size of the target in the image may differ from the sample picture size used when training the classifier, the scanning sub-window must also be enlarged or reduced, slid across the image again, and matched once more.
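The scan described above can be sketched as follows, assuming a fixed 20×20 training-sample size, a small set of window scales and nearest-neighbour resampling; the step size and the scale factors are illustrative choices, not part of the invention.

```python
import numpy as np

def sliding_windows(image, win=20, step=4, scales=(1.0, 1.5, 2.0)):
    """Scan sub-windows of several sizes over the image to be detected and
    resample each one back to the win x win training-sample size."""
    H, W = image.shape
    for s in scales:
        size = int(win * s)
        for top in range(0, H - size + 1, step):
            for left in range(0, W - size + 1, step):
                patch = image[top:top + size, left:left + size]
                rows = np.arange(win) * size // win      # nearest-neighbour resample
                cols = np.arange(win) * size // win
                yield s, top, left, patch[np.ix_(rows, cols)]
```

Each yielded patch would then be passed to the trained classifier to decide whether that region is a target.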
(5) Positioning
A space target data set is built according to the format standard of the training sample data set; each image in the data set corresponds to a label that records the image name, the category of the target in the image, and the coordinates, width and height of the target's bounding rectangle, which completes the model construction.
Example 3: referring to fig. 3, the target recognition method mainly comprises space target image input, target preprocessing, the sparse convolutional neural network, feature extraction and feature selection; the extracted target features are matched and compared with the target-feature integrated learning dictionary, and finally the recognized target is output. Specifically:
1. Preprocessing
Preprocessing completes operations such as image information acquisition, analog-to-digital conversion, filtering, de-blurring, noise reduction and geometric distortion correction, so that, without changing the essential information the images carry, the apparent characteristics of each image (object shape edges, color distribution, texture, overall brightness, size, etc.) are made as consistent as possible, which facilitates the subsequent processing.
The preprocessing comprises five operations: (1) coding: an effective description of the pattern suitable for computer operation; (2) thresholding or filtering: selecting the required features and suppressing the others; (3) pattern improvement: eliminating or correcting errors, or unnecessary function values, in the pattern; (4) normalization: adapting certain parameter values to standard values, or standard value ranges; (5) discrete pattern operations: special operations used in discrete pattern processing.
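A minimal sketch of such a preprocessing chain, assuming a grayscale input, a 3×3 mean filter for the filtering step, min-max normalization and nearest-neighbour resizing to the 20×20 training-sample size; the concrete operations and parameters are illustrative only.

```python
import numpy as np

def preprocess(image, size=20):
    """image: 2-D grayscale array; returns a normalized size x size patch."""
    img = image.astype(np.float32)
    # (2) filtering: a simple 3x3 mean filter to suppress noise
    pad = np.pad(img, 1, mode="edge")
    img = sum(pad[i:i + img.shape[0], j:j + img.shape[1]]
              for i in range(3) for j in range(3)) / 9.0
    # (4) normalization: map the gray values to the standard range [0, 1]
    img = (img - img.min()) / (img.max() - img.min() + 1e-12)
    # resize so the apparent size matches the training samples (nearest neighbour)
    rows = np.arange(size) * img.shape[0] // size
    cols = np.arange(size) * img.shape[1] // size
    return img[np.ix_(rows, cols)]
```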
2. Sparse convolutional neural network
Referring to fig. 5, the sparse convolutional neural network processes the positive and negative samples, i.e. separates target and background (the pruning algorithm), to obtain the target features, forming two matrices, one with sparse features and one with dense features; the sparse matrix characterizes the background and the dense matrix characterizes the target, denoted W and I respectively. The background matrix W can be further divided into W1 and I1 to extract finer target features. Through systematic training, a multi-level fine-grained target classifier (the target-feature learning dictionary) can therefore be built. The essence of target recognition is to separate the space target from the background with sparse convolution, extract the features of the space target, and on this basis compare these features with the feature targets in the classifier; the candidate with the highest similarity gives the recognition result.
Compared with the traditional convolutional neural network, which adopts a fully connected input-output structure and therefore a large amount of computation, the sparse convolutional neural network uses the pruning technique to separate target and background, forming a sparse (coarse-grained) matrix and a dense (fine-grained) matrix, and then performs the convolution computation (feature extraction), cyclically forming two finer sparse (coarse-grained) and dense (fine-grained) matrices until the information energy lost while extracting the feature values is minimal. The pruning algorithm prunes the connections between two neurons of the neural network, see fig. 5, i.e. sets the corresponding coefficients to zero.
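A minimal sketch of magnitude-based pruning, i.e. zeroing the smallest coefficients so that the corresponding connections are cut; the pruning criterion (a global magnitude threshold derived from a target ratio) is an assumption, since the text does not fix a particular pruning rule here.

```python
import numpy as np

def prune(weights, ratio=0.7):
    """Set the smallest-magnitude fraction `ratio` of the coefficients to zero,
    i.e. cut the corresponding connections between two neurons (cf. fig. 5)."""
    flat = np.sort(np.abs(weights).ravel())
    threshold = flat[int(ratio * (flat.size - 1))]
    mask = np.abs(weights) > threshold        # connections that are kept
    return np.where(mask, weights, 0.0), mask
```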
Feature extraction with the sparse convolutional network applies sparse convolution directly to the space image on the basis of pruning. The direct sparse convolution algorithm removes the data replication in the input feature matrix by stretching the convolution kernel matrix to the same size as the input matrix. The stretched convolution kernel is unrolled row-wise into a vector Wm whose length is C×H×W. Assuming the convolution layer has M convolution kernels, stretching every kernel yields a weight matrix of size M×(C×H×W). The input matrix of a batch task is unrolled row-wise into a column vector I, also of length C×H×W. When computing the convolution, the elements of the different receptive regions can then be mapped to the correct local region by adjusting the starting pointer into the vector I.
Feature extraction includes the stretching of the convolution kernel matrix and the updating of the feature pointers. For the stretching of the convolution kernel matrix, the weight matrix of size M×(C×R×S) is expressed in CSR format, and for the j-th non-zero element (c, y, x) in output channel m:
x = col % S
y = (col / S) % R
c = col / (S × R)
where col = colidx[j]. The stretched weight matrix then has size M×(C×H×W), and the CSR column index of the same non-zero element (c, y, x) is updated as colidx[j] = (c×H + y)×W + x.
The pointer update can be expressed as O_{m,y,x} = W_m · I_{y·W+x}, whose core is the inner-product operation between a sparse vector and a dense vector. To compute a point (m, y, x) of the output matrix, the number of MAC operations required depends on the number of non-zero elements of the sparse vector W_m. Since W_m is the same for all points in the same output channel m, the MAC operation counts needed to compute these output nodes are equal. In the direct sparse convolution algorithm, the matrix I_virtual is generated from the vector I, the element pointed to by the starting pointer of each column vector being I[y·W + x]. Thanks to this property, only the elements of the vector I need to reside in memory, rather than the entire dense matrix.
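The direct sparse convolution described above can be sketched as follows, assuming stride 1, a 'valid' convolution and a single input image; the weight matrix of each output channel is held in CSR form with its column indices already stretched to the C×H×W layout, and each output point is the inner product of the sparse row W_m with the input vector I shifted by the starting pointer y·W + x. The function names and the explicit loops are illustrative, not the optimized implementation.

```python
import numpy as np

def stretch_colidx(colidx, S, R, H, W):
    """Convert CSR column indices from the kernel layout (C x R x S) to the
    stretched layout (C x H x W): colidx[j] = (c*H + y)*W + x."""
    colidx = np.asarray(colidx)
    x = colidx % S
    y = (colidx // S) % R
    c = colidx // (S * R)
    return (c * H + y) * W + x

def direct_sparse_conv(values, colidx, rowptr, I, M, H, W, R, S):
    """values/colidx/rowptr: CSR of the stretched M x (C*H*W) weight matrix;
    I: the input unrolled row-wise into a vector of length C*H*W (I = x.ravel())."""
    Ho, Wo = H - R + 1, W - S + 1
    out = np.zeros((M, Ho, Wo))
    for m in range(M):
        vals = np.asarray(values[rowptr[m]:rowptr[m + 1]])
        cols = np.asarray(colidx[rowptr[m]:rowptr[m + 1]])
        for y in range(Ho):
            for x in range(Wo):
                # O[m, y, x] = W_m . I, shifted by the starting pointer y*W + x
                out[m, y, x] = np.dot(vals, I[cols + y * W + x])
    return out
```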
Considering that the sparsity of the weight-pruned convolution layers in the CNN model differs from layer to layer, the sparsity of the current convolution layer is calculated as:
Sparsity = 1.0 − Nnonzero / (M × kernel_size)
where Nnonzero is the number of non-zero elements of the current convolution layer, M is the number of output channels of the current convolution layer, and kernel_size is the size of the convolution kernel. A threshold is set for convolution layers of different sparsities: layers whose sparsity exceeds the threshold use the optimized direct sparse convolution, while layers whose sparsity is below the threshold still use the conventional dimension-lowering approach.
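A short sketch of this sparsity test and of the resulting dispatch between the direct sparse convolution and the conventional dimension-lowering path; the threshold value used here is an arbitrary illustration, as the text does not specify one.

```python
def layer_sparsity(values, M, kernel_size):
    """Sparsity = 1.0 - Nnonzero / (M * kernel_size), kernel_size = C*R*S."""
    n_nonzero = sum(1 for v in values if v != 0)
    return 1.0 - n_nonzero / (M * kernel_size)

def choose_convolution(values, M, kernel_size, threshold=0.6):
    # sparser layers take the direct sparse convolution path,
    # denser layers fall back to the conventional dimension-lowering path
    if layer_sparsity(values, M, kernel_size) > threshold:
        return "direct_sparse"
    return "lowering"
```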
3. Feature extraction and selection
The original data are transformed to obtain the features that best reflect the essence of the classification, mainly: (1) color features. Color features describe the surface properties of the scene corresponding to an image or image region; common color features include image patch features and color channel histogram features. (2) Texture features. Texture is usually defined as some local property of an image, or a measure of the relationship between pixels in a local region. One effective method of extracting texture features is based on the gray-level spatial correlation matrix, i.e. the co-occurrence matrix; other methods include feature extraction based on the image gray-level difference histogram and on the image gray-level co-occurrence matrix. (3) Shape features. Shape is one of the basic characteristics of an object; shape features are represented in two ways, shape contour features and shape region features. The shape contour features are mainly: straight-line segment descriptions, spline-fitted curves, Fourier descriptors, interior-angle histograms, Gaussian parameter curves, etc.; the shape region features are mainly: the invariant moments of the shape, the region area, the aspect ratio of the shape, etc. (4) Spatial features. Spatial features refer to the mutual spatial positions or relative directional relations among several targets segmented from an image, including relative position information such as above, below, left and right, as well as absolute position information; the basic idea of the common method for extracting spatial features is to segment the image, extract the features, and then index the features. Algorithms for feature extraction include Haar features, LBP features and Haar-like features.
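As an illustration of the texture features, the following is a minimal NumPy sketch of a gray-level co-occurrence (GLCM) descriptor; the quantization into 8 gray levels, the single displacement (1, 0) and the three statistics (contrast, energy, homogeneity) are illustrative choices, not prescribed by the invention.

```python
import numpy as np

def cooccurrence(gray, levels=8, dx=1, dy=0):
    """Gray-level co-occurrence matrix for a single displacement (dx, dy)."""
    q = (gray.astype(np.float32) / 256.0 * levels).astype(int).clip(0, levels - 1)
    glcm = np.zeros((levels, levels))
    H, W = q.shape
    for i in range(H - dy):
        for j in range(W - dx):
            glcm[q[i, j], q[i + dy, j + dx]] += 1
    return glcm / max(glcm.sum(), 1)

def texture_features(gray):
    p = cooccurrence(gray)
    i, j = np.indices(p.shape)
    contrast = np.sum((i - j) ** 2 * p)              # local gray-level variation
    energy = np.sum(p ** 2)                          # uniformity of the texture
    homogeneity = np.sum(p / (1.0 + np.abs(i - j)))  # closeness to the diagonal
    return np.array([contrast, energy, homogeneity])
```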
Feature selection matches the target features against the target's feature template; if the feature values match completely, the recognition result is obtained.
4. Outputting the result
The images whose features match the model are labeled "target" and stored in the storage unit; the images that do not match are labeled "no target" and also stored in the storage unit, which completes one round of target recognition.
While the present invention has been described in detail with reference to the drawings and the embodiments, those skilled in the art will understand that various specific parameters in the above embodiments may be changed without departing from the spirit of the invention, and that many such specific embodiments fall within the common range of variations of the present invention, which will not be described in detail here.
Claims (8)
1. The space target recognition system based on the convolutional neural network comprises a calculation unit and a storage unit, and is characterized in that the calculation unit comprises a convolutional layer for extracting features, a pooling layer for reducing dimensionality and a fully-connected layer for combining the features, and the convolutional layer comprises a sparse convolutional module for separating a target and a background; the storage unit comprises a RAM module for storing weight values and bias values and a ROM module for storing classification labels;
the whole system hardware is designed according to the CNN network structure, each layer is designed into a single module by utilizing the parallel characteristic of an FPGA hardware circuit, the last classification layer finishes the classification result of input data by utilizing an integrated learning dictionary classifier, and then the weight to be updated is calculated by a back propagation algorithm; the system firstly initializes the weight value index address of each layer of convolution Kernel Kernel by a controller, and loads weight values and offset values from the RAM module according to the index address; when in forward propagation, input data enters a data buffer area through an input signal, then convolution operation is completed according to the number of the output characteristic diagrams of each layer and a convolution kernel, and the operation result is subjected to an activation function ReLU and bias to complete the final output characteristic diagram of the current layer and is input into the next layer; performing downsampling operation when passing through the pooling layer, so as to reduce the dimension of the feature map; finally, the Softmax classifier searches the corresponding numerical value in the ROM according to the input data and completes a final output result after probability conversion; and when the back propagation is carried out, comparing the label in the ROM with the output result to obtain residual errors of each layer, storing the residual errors in the RAM, completing updating of the weight and the offset value of the convolution kernels in all the convolution layers according to the set learning rate after calculation is completed, and storing the updated weight into corresponding storage positions by the controller until all training data are input.
2. A method for constructing a space target recognition model, which relates to the space target recognition system based on a convolutional neural network as set forth in claim 1, and is characterized by comprising the following steps:
s1: inputting a training sample set;
s2: establishing a sparse convolutional neural network space target recognition system, wherein the system comprises a convolutional layer, a pooling layer and a full-connection layer, the convolutional layer adopts a pruning algorithm to divide the characteristics of a space target into a sparse characteristic matrix and a dense characteristic matrix, the sparse characteristic matrix represents a background, and the dense characteristic matrix represents a target;
s3: initializing parameters of the neural network, and then carrying out feature extraction, feature classification and sample labeling on samples in the training sample set one by one, wherein the spatial target recognition system updates the parameters of the neural network in the operation process;
s4: after all the samples are identified, a space target identification model can be obtained.
3. The method for constructing a space target recognition model according to claim 2, wherein the output result of the convolution layer is input to the pooling layer after passing through a ReLU activation function and a bias.
4. The method of claim 2, wherein the training sample set comprises ten thousand positive and negative target samples, wherein a positive target sample refers to an image sample containing the spatial target, and a negative target sample refers to other image samples not containing the spatial target.
5. The method for constructing a spatial target recognition model according to claim 2, wherein the characteristics of the spatial target further include color, texture, shape.
6. The method for constructing a spatial target recognition model according to claim 2, wherein the training sample set needs to be preprocessed before training: coding, setting a threshold, improving modes, normalizing and performing discrete mode operation.
7. A method for identifying a space target, which relates to a space target identification model as claimed in claim 2, and is characterized by comprising the following steps:
i, inputting an image to be identified into a storage unit in the space target identification model;
II, extracting features of the image by using a sparse convolutional neural network;
and III, matching the parameters of the characteristics with parameters in the space target recognition model, if the parameters are completely matched, indicating that the image to be recognized contains a target, otherwise, judging that the image is a background.
8. The method according to claim 7, wherein the image in step ii is preprocessed before feature extraction, and the image is adjusted to have the same apparent characteristics as the training sample of the spatial target recognition model.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911388125.5A CN111191583B (en) | 2019-12-30 | 2019-12-30 | Space target recognition system and method based on convolutional neural network |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911388125.5A CN111191583B (en) | 2019-12-30 | 2019-12-30 | Space target recognition system and method based on convolutional neural network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111191583A CN111191583A (en) | 2020-05-22 |
CN111191583B true CN111191583B (en) | 2023-08-25 |
Family
ID=70709514
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911388125.5A Active CN111191583B (en) | 2019-12-30 | 2019-12-30 | Space target recognition system and method based on convolutional neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111191583B (en) |
Families Citing this family (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111709427B (en) * | 2020-05-26 | 2023-04-07 | 淮阴工学院 | Fruit segmentation method based on sparse convolution kernel |
CN111681318B (en) * | 2020-06-10 | 2021-06-15 | 上海城市地理信息系统发展有限公司 | Point cloud data modeling method and device and electronic equipment |
CN111721361A (en) * | 2020-06-29 | 2020-09-29 | 杭州鲁尔物联科技有限公司 | Embankment monitoring system, method and equipment |
CN113970922B (en) * | 2020-07-22 | 2024-06-14 | 商汤集团有限公司 | Point cloud data processing method, intelligent driving control method and device |
CN112365455A (en) * | 2020-10-29 | 2021-02-12 | 杭州富阳富创大数据产业创新研究院有限公司 | Transformer substation equipment detection method based on 3D point cloud and deep learning |
CN112365463A (en) * | 2020-11-09 | 2021-02-12 | 珠海市润鼎智能科技有限公司 | Real-time detection method for tiny objects in high-speed image |
CN112783650B (en) * | 2021-01-20 | 2024-01-16 | 之江实验室 | Multi-model parallel reasoning method based on AI chip |
CN113270156B (en) * | 2021-04-29 | 2022-11-15 | 甘肃路桥建设集团有限公司 | Detection modeling and detection method and system of machine-made sandstone powder based on image processing |
CN112990225B (en) * | 2021-05-17 | 2021-08-27 | 深圳市维度数据科技股份有限公司 | Image target identification method and device in complex environment |
CN113222288B (en) * | 2021-06-02 | 2022-05-17 | 山东建筑大学 | Classified evolution and prediction method of village and town community space development map |
CN113657215A (en) * | 2021-08-02 | 2021-11-16 | 大连理工大学 | Target tracking method applied to unmanned ship based on target detection |
CN114120406B (en) * | 2021-11-22 | 2024-06-07 | 四川轻化工大学 | Face feature extraction and classification method based on convolutional neural network |
CN115500342A (en) * | 2022-09-23 | 2022-12-23 | 国网河北省电力有限公司衡水供电分公司 | Bird repelling device, method, terminal and storage medium |
CN115393597B (en) * | 2022-10-31 | 2023-01-24 | 之江实验室 | Semantic segmentation method and device based on pulse neural network and laser radar point cloud |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108169745A (en) * | 2017-12-18 | 2018-06-15 | 电子科技大学 | A kind of borehole radar target identification method based on convolutional neural networks |
WO2018120740A1 (en) * | 2016-12-29 | 2018-07-05 | 深圳光启合众科技有限公司 | Picture classification method, device and robot |
CN108985454A (en) * | 2018-06-28 | 2018-12-11 | 西南电子技术研究所(中国电子科技集团公司第十研究所) | Airline carriers of passengers individual goal recognition methods |
CN110598731A (en) * | 2019-07-31 | 2019-12-20 | 浙江大学 | Efficient image classification method based on structured pruning |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107239824A (en) * | 2016-12-05 | 2017-10-10 | 北京深鉴智能科技有限公司 | Apparatus and method for realizing sparse convolution neutral net accelerator |
- 2019-12-30 CN CN201911388125.5A patent/CN111191583B/en active Active
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2018120740A1 (en) * | 2016-12-29 | 2018-07-05 | 深圳光启合众科技有限公司 | Picture classification method, device and robot |
CN108169745A (en) * | 2017-12-18 | 2018-06-15 | 电子科技大学 | A kind of borehole radar target identification method based on convolutional neural networks |
CN108985454A (en) * | 2018-06-28 | 2018-12-11 | 西南电子技术研究所(中国电子科技集团公司第十研究所) | Airline carriers of passengers individual goal recognition methods |
CN110598731A (en) * | 2019-07-31 | 2019-12-20 | 浙江大学 | Efficient image classification method based on structured pruning |
Non-Patent Citations (1)
Title |
---|
Li Siquan; Zhang Xuanxiong. Research on facial expression recognition based on convolutional neural networks. 软件导刊 (Software Guide), 2018, (01), full text. *
Also Published As
Publication number | Publication date |
---|---|
CN111191583A (en) | 2020-05-22 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111191583B (en) | Space target recognition system and method based on convolutional neural network | |
CN108154118B (en) | A kind of target detection system and method based on adaptive combined filter and multistage detection | |
Xie et al. | Multilevel cloud detection in remote sensing images based on deep learning | |
CN108154192B (en) | High-resolution SAR terrain classification method based on multi-scale convolution and feature fusion | |
CN107066559B (en) | Three-dimensional model retrieval method based on deep learning | |
CN111027493B (en) | Pedestrian detection method based on deep learning multi-network soft fusion | |
CN110321967B (en) | Image classification improvement method based on convolutional neural network | |
CN109029363A (en) | A kind of target ranging method based on deep learning | |
CN110175615B (en) | Model training method, domain-adaptive visual position identification method and device | |
CN111460980B (en) | Multi-scale detection method for small-target pedestrian based on multi-semantic feature fusion | |
CN111126127B (en) | High-resolution remote sensing image classification method guided by multi-level spatial context characteristics | |
Tereikovskyi et al. | The method of semantic image segmentation using neural networks | |
CN114187450A (en) | Remote sensing image semantic segmentation method based on deep learning | |
CN106778768A (en) | Image scene classification method based on multi-feature fusion | |
CN111652273B (en) | Deep learning-based RGB-D image classification method | |
CN108230330B (en) | Method for quickly segmenting highway pavement and positioning camera | |
CN112101364B (en) | Semantic segmentation method based on parameter importance increment learning | |
CN110852327A (en) | Image processing method, image processing device, electronic equipment and storage medium | |
CN113269224A (en) | Scene image classification method, system and storage medium | |
CN115393631A (en) | Hyperspectral image classification method based on Bayesian layer graph convolution neural network | |
CN107423771B (en) | Two-time-phase remote sensing image change detection method | |
CN116863194A (en) | Foot ulcer image classification method, system, equipment and medium | |
CN112597919A (en) | Real-time medicine box detection method based on YOLOv3 pruning network and embedded development board | |
CN114492634B (en) | Fine granularity equipment picture classification and identification method and system | |
CN114648667A (en) | Bird image fine-granularity identification method based on lightweight bilinear CNN model |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||