CN113989655A - Radar or sonar image target detection and classification method based on automatic deep learning - Google Patents

Radar or sonar image target detection and classification method based on automatic deep learning

Info

Publication number
CN113989655A
CN113989655A (Application No. CN202111107594.2A)
Authority
CN
China
Prior art keywords
classification
network
data set
radar
detection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111107594.2A
Other languages
Chinese (zh)
Inventor
唐劲松
张鹏
钟何平
吴浩然
宁明强
张智圣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Naval University of Engineering PLA
Original Assignee
Naval University of Engineering PLA
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Naval University of Engineering PLA filed Critical Naval University of Engineering PLA
Priority to CN202111107594.2A
Publication of CN113989655A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/217Validation; Performance evaluation; Active pattern learning techniques
    • G06F18/2193Validation; Performance evaluation; Active pattern learning techniques based on specific statistical tests
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/082Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Molecular Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Measurement Of Velocity Or Position Using Acoustic Or Ultrasonic Waves (AREA)

Abstract

The invention discloses a radar or sonar image target detection and classification method based on automated deep learning. For classification tasks, neural architecture search is used directly to automatically design an optimal convolutional network structure. For detection tasks, a derived classification data set is first extracted from the detection data set according to the annotation information, convolutional neural architecture search is performed on this derived set to automatically design an optimal convolutional network structure, and the resulting network is then used as the backbone to construct a self-trained automated deep learning target detector. The invention addresses two problems of current deep-learning-based radar and sonar image target detection and classification: the structural design of deep neural networks is time-consuming and labor-intensive, and existing methods depend heavily on transfer learning. The proposed automated neural network design method realizes automatic design of deep neural networks for specific radar or sonar image target detection and classification data sets.

Description

Radar or sonar image target detection and classification method based on automatic deep learning
Technical Field
The invention belongs to the technical field of image processing, and particularly relates to a radar or sonar image target detection and classification method based on automatic deep learning.
Background
Imaging radars and sonars are currently indispensable sensors in the field of remote sensing (RS); they can provide rich visual information about observed areas on land, on the seabed, in the air and even in outer space. Target detection and classification are basic tasks of radar and sonar image interpretation, and in recent years research on deep-learning-based RS image target detection and classification has developed rapidly.
Deep neural network models are highly complex and prone to overfitting the training data, a problem that is especially pronounced on small-sample data sets such as radar and sonar imagery. For this reason, transductive transfer learning (often referred to simply as transfer learning) is commonly used to transfer generalizable knowledge to the target task in order to improve the generalization ability of deep neural network models. However, there are at least three problems with applying the current transfer learning paradigm to radar and sonar image target detection: (a) unlike transfer between optical image data sets such as ImageNet and Pascal VOC, the distribution gap between RS images and optical images is huge; this violates the assumption of transfer learning that the source and target domains share the same input space, can lead to inappropriate transfer, and degrades the transfer effect. (b) The backbone networks commonly used in RS image detection research, such as ResNet, ZFNet, VGGNet and DarkNet, were designed for large-scale optical image classification tasks, whereas RS image data sets generally contain few effective targets (for example, the synthetic aperture radar target classification data set MSTAR contains only 5171 images even after expansion, and the SAR ship identification data set SSDD contains only 1174 samples), so these network models are overly complex, redundant and untargeted. (c) Pre-training on external data sets does not automatically provide better regularization, and many hyper-parameters must be carefully selected to avoid overfitting during fine-tuning, especially on small-scale data sets.
Self-training directly on small data sets is the primary way to solve the above problems, so recent research on deep-learning object detection has begun to focus on how to better train from scratch on the task data set. The backbone network of a detection algorithm trained from scratch only needs a conventional initialization method (such as Xavier), without loading parameters pre-trained on a classification task, to be trained successfully on the detection data set; the resulting model not only achieves higher accuracy but also greatly reduces model size and computation. The core of training from scratch lies in designing a convolutional network model that matches the task data set.
Designing a suitable convolutional neural network structure consumes a great deal of effort from machine learning experts, and many network hyper-parameters must be determined by empirical trial and error, grid search or random search. The field of automated deep learning therefore studies how to automatically design network structures with better performance, namely neural architecture search. However, most existing neural architecture search techniques are designed for classification tasks, and even the latest DetNAS, which is designed for detection tasks, still relies on pre-training with external big data, so it cannot overcome the three problems of applying the transfer learning paradigm to radar and sonar image target detection. Therefore, using neural architecture search to design a self-trained deep-learning detector structure also requires realizing self-training of the architecture search on the detection data set and designing a convolutional network structure for the detection task.
Through the above analysis, the problems and defects of the prior art are as follows:
(1) Deep neural network models are highly complex and prone to overfitting the training data, a problem that is especially pronounced on small-sample data sets such as radar and sonar imagery.
(2) When existing transfer learning models are used for radar and sonar image target detection, unlike transfer between optical image data sets such as ImageNet and Pascal VOC, the distribution gap between RS images and optical images is huge; this violates the requirement of transfer learning that the source and target domains share the same input space, can lead to inappropriate transfer, and degrades the transfer effect.
(3) The backbone networks commonly used in RS image detection research, such as ResNet, ZFNet, VGGNet and DarkNet, were designed for large-scale optical image classification tasks, while RS image data sets generally contain few effective targets, so these network models are overly complex, redundant and untargeted.
(4) Pre-training on external data sets does not automatically provide better regularization, and many hyper-parameters must be carefully selected to avoid overfitting during fine-tuning, especially on small-scale data sets.
(5) Designing a suitable convolutional neural network structure consumes a great deal of effort from machine learning experts, and many network hyper-parameters must be determined by empirical trial and error, grid search or random search.
(6) Most existing neural architecture search techniques are designed for classification tasks; even the latest DetNAS, which is designed for detection tasks, still relies on pre-training with external big data and therefore cannot solve the three problems of applying the transfer learning paradigm to radar and sonar image target detection.
The difficulty in solving the above problems and defects is as follows:
The core of solving these problems lies in automating the design of the deep neural network structure. However, most existing neural architecture search methods for automated network design target classification networks, and architecture search incurs large resource overheads in memory and computation.
The significance of solving the problems and the defects is as follows:
the method realizes automatic design of the deep neural network for specific radar or sonar image target detection and classification data sets through the neural architecture search method of automatic deep learning, can solve the problem of complex design of the neural network in manual design classification tasks or detection tasks, cancels the dependence of the existing deep learning method on transfer learning, and improves the generalization performance of the neural network on the radar or sonar image target detection and classification data sets. Meanwhile, the automatic design method of the deep learning detector is enriched theoretically.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides a radar or sonar image target detection and classification method based on automated deep learning.
The invention is realized as follows. The radar or sonar image target detection and classification method based on automated deep learning includes the following steps:
Step one, a radar or sonar image data set is given; if it is a classification data set, step three is performed directly, and if it is a detection data set, steps two, three and four are performed in sequence.
Step two, extracting a derived classification data set from the detection data set according to the annotation information.
Step three, performing automated design and retraining of a convolutional network structure on the derived classification data set (for a classification problem, the task classification data set is used directly) using convolutional neural architecture search. The retrained network can be used directly for radar or sonar image classification tasks.
Step four, constructing an automated deep learning target detector with the self-trained backbone network automatically designed in step three, and training and verifying it on the task detection data set. The trained automated deep learning target detector can be used directly for radar or sonar image detection tasks.
Further, the neural architecture search algorithm used includes, but is not limited to, a differentiable neural architecture search algorithm.
Further, in step two, extracting a derived classification data set from the detection data set according to the annotation information includes:
(1) for a detection task, the derived data set is extracted from the training set of the radar sonar detection data set and contains various target and background images; the design and self-training of the detector backbone convolutional network can be realized indirectly through the derived data set;
(2) the derived data set is extracted entirely from the training set and verification set of the detection data set; the classification data train-DC and val-DC extracted from the detection training set are used for training and verifying the backbone network, respectively, and the test classification data set test-DC extracted from the detection verification set is used for testing the classification performance of the backbone network;
(3) a NAS method is used to obtain higher self-training classification accuracy on the derived classification data set; for this purpose, the train-DC data set obtained in step (2) is divided into train-search and val-search, which are used for training and verification during backbone network architecture search, respectively.
Further, in step (1), the detection data set is divided into a training set, a verification set and a test set at a ratio of 7:1:2.
Further, in step (3), the train-DC data set is divided into train-search and val-search at a ratio of 5:5.
Further, in step three, the performing automated design and retraining of the optimal convolutional network structure on the derived classification dataset by using convolutional neural network architecture search includes:
(1) simplifying, through functional abstraction, the search over the overall convolutional network structure into a structure search inside the convolutional network composition units; the function of the convolutional neural network is abstracted into two types of computing units, standard cells and reduction cells, where a standard cell preserves the feature-map size while extracting features, and a reduction cell reduces the feature-map size and increases the number of feature-map channels while extracting features.
(2) Designing operator sets for the two types of convolutional network composition cells: O_N(6) for standard cells and O_R(6) for reduction cells. The convolutional network structure inside each computing unit is modeled as a directed acyclic graph; each computing unit contains four computing nodes 0, 1, 2 and 3, and the operator set is used to connect these computing nodes.
(3) Constructing the feature-map transformation used inside a computing unit during search and inference. Let the feature map represented by each node be x^(j). During network inference, each directed edge (i, j) applies a convolution operation o^(i,j), drawn from O_N(6) or O_R(6), to transform the preceding feature map x^(i); that is, each node receives all preceding nodes as input:

$x^{(j)} = \sum_{i<j} o^{(i,j)}\big(x^{(i)}\big)$ (1)

During the search, attention is assigned to all operations in O_N(6) or O_R(6) by continuous relaxation, i.e., the weighted average of the results of all operations,

$\bar{o}^{(i,j)}\big(x^{(i)}\big) = \sum_{o \in \mathcal{O}} \frac{\exp\big(\alpha_o^{(i,j)}\big)}{\sum_{o' \in \mathcal{O}} \exp\big(\alpha_{o'}^{(i,j)}\big)}\, o\big(x^{(i)}\big),$

replaces o^(i,j)(x^(i)).
(4) Designing a memory-friendly network structure search scheme on the basis of formula (1). Assume the preceding feature layer x^(i) has M channels; FL-DARTS randomly samples M/K of the channels, according to the sampling rate 1/K, as the sampled features S^(i,j) * x^(i), and the unsampled features (1 - S^(i,j)) * x^(i) are passed directly to the output. The partially weighted features that x^(j) receives from x^(i) can therefore be written as:

$f^{(i,j)}\big(x^{(i)}\big) = \sum_{o \in \mathcal{O}} \frac{\exp\big(\alpha_o^{(i,j)}\big)}{\sum_{o' \in \mathcal{O}} \exp\big(\alpha_{o'}^{(i,j)}\big)}\, o\big(S^{(i,j)} * x^{(i)}\big) + \big(1 - S^{(i,j)}\big) * x^{(i)}$ (2)

where S^(i,j) is a binary channel-sampling mask and * denotes channel-wise masking.
(5) Edge normalization is used, i.e., each edge (i, j) is given an explicit weight β^(i,j). With both channel sampling and edge regularization, the computation of x^(j) becomes:

$x^{(j)} = \sum_{i<j} \frac{\exp\big(\beta^{(i,j)}\big)}{\sum_{i'<j} \exp\big(\beta^{(i',j)}\big)}\, f^{(i,j)}\big(x^{(i)}\big)$ (3)

where f^(i,j) is the partial-channel mixed operation defined in formula (2).
(6) Optimizing the super-network constructed by formula (2) or (3) with the two-layer optimization algorithm of steps (7) to (10), and extracting a discrete structure from the optimized mixed-operation weights. Each output node x^(j) retains the two strongest input edges from {x^(0), x^(1), ..., x^(j-1)}, where the edge weights are

$p_o^{(i,j)} = \frac{\exp\big(\alpha_o^{(i,j)}\big)}{\sum_{o' \in \mathcal{O}} \exp\big(\alpha_{o'}^{(i,j)}\big)},$

i.e., the mixed operation is replaced by the most probable operation to obtain a discrete network structure; the operation with the largest weight is selected by argmax to obtain the discrete structure:

$o^{(i,j)} = \arg\max_{o \in \mathcal{O}} \alpha_o^{(i,j)}$ (4)

In the formula, the mixed-operation weights α on each directed edge can be regarded as the attention the network architecture search algorithm pays to each convolution operation, i.e., α is a continuous encoding of the convolutional structure.
(7) Constructing a differentiable two-layer optimization scheme for the convolutional network architecture search. In formulas (2) and (3), the attention of the mixed operation is modeled by the conditional probability

$p_o^{(i,j)} = \frac{\exp\big(\alpha_o^{(i,j)}\big)}{\sum_{o' \in \mathcal{O}} \exp\big(\alpha_{o'}^{(i,j)}\big)},$

parameterized by the vector α^(i,j) of dimension |O|, so the architecture search problem is reduced to learning a set of relaxed continuous vectors α = {α^(i,j)}. Let L_train denote the training-set loss and L_val the verification-set loss; after continuous relaxation of the operation weights, the structure parameters α and the weights w can be learned jointly. The verification-set classification accuracy is taken as the final reward (goodness of fit), the dual objective is to minimize the verification-set loss, and a gradient descent method with momentum is used for optimization. The core principle of the architecture search is to determine, by gradient optimization, the optimal network architecture α^o that minimizes the verification-set loss L_val(w^o(α^o), α^o), namely:

$\alpha^{o} = \arg\min_{\alpha}\, L_{val}\big(w^{o}(\alpha), \alpha\big)$ (5)

where the network weights w^o = w^o(α) used to compute the verification-set loss are obtained by minimizing the training-set loss, i.e.:

$w^{o}(\alpha) = \arg\min_{w}\, L_{train}(w, \alpha)$ (6)
For the super-network model used during architecture search, not only the mixed-operation weights α on the directed edges, which encode the network structure, need to be optimized, but the weights w of the super-network itself also need to be learned. Therefore, architecture optimization is performed with the following two-layer optimization formula:

$\min_{\alpha}\; L_{val}\big(w^{o}(\alpha), \alpha\big) \quad \text{s.t.} \quad w^{o}(\alpha) = \arg\min_{w}\, L_{train}(w, \alpha)$ (7)

where α is the outer-layer structure optimization variable and w is the inner-layer network weight optimization variable.
(8) Accelerating performance evaluation with a single-step approximation to solve the two-layer optimization problem of formula (7). Each inner-layer learning phase performs only a single training step to obtain the approximate weights w*(α) of the current super-network. With this approximation, the gradient used for the architecture after training with the inner-layer model is:

$\nabla_{\alpha} L_{val}\big(w^{o}(\alpha), \alpha\big) \approx \nabla_{\alpha} L_{val}\big(w^{*}(\alpha), \alpha\big)$ (8)

If the learning rate of the inner-layer optimization is ξ, the approximate weights w*(α) of the current structure α obtained after a single learning step are:

$w^{*}(\alpha) = w - \xi\, \nabla_{w} L_{train}(w, \alpha)$ (9)
(9) After each inner-layer optimization, the classification accuracy of the current structure on the verification set can be evaluated by computing its verification-set classification loss L_val(w*(α), α); outer-layer optimization then continues by gradient descent, updating the structure weights α to achieve higher classification accuracy. The outer-layer optimization uses gradient descent on the verification-set classification loss, whose gradient ∇_α L_val(w*(α), α) has the following form:

$\nabla_{\alpha} L_{val}\big(w', \alpha\big) - \xi\, \nabla^{2}_{\alpha, w} L_{train}(w, \alpha)\, \nabla_{w'} L_{val}\big(w', \alpha\big), \quad w' = w - \xi\, \nabla_{w} L_{train}(w, \alpha)$ (10)
when the outer layer structure is optimized, only a standard gradient descent method is used for optimization.
(10) Unlike common neural architecture search, multiple auxiliary classification branches are adopted during the search to stabilize the architecture search process, and the weights of the two auxiliary branches are set to 0.4 and 0.2, respectively.
(11) The optimal discrete structure is extracted, and the final structure is retrained on the RS derived classification data set to obtain the optimal network weights.
(12) If a detection task is to be performed, the retrained network structure is pruned to remove the redundant classification head and auxiliary classification branches, and its multi-scale features are combined to form a standard multi-scale feature pyramid.
Further, in step four, the constructing a self-trained automated deep learning target detector using the designed self-trained backbone network includes:
(1) replacing the manually designed backbone network in a conventional manually designed deep learning detector with the automatically designed backbone network; such a detector is referred to as an automatic detector (a sketch is given after this list);
(2) selecting the single-stage RetinaNet, the two-stage Faster R-CNN and the multi-stage Cascade R-CNN as detection frameworks of the automatic detector, obtaining ARN, AFR and ACR, respectively;
(3) designing a specific data enhancement scheme for the radar sonar data set in combination with the automatic detector training process: random rotation, random cropping and random erasing are applied to the target image blocks during backbone network search and retraining on the derived classification data set, and random cropping and random flipping are applied to the detection data set during detector self-training.
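As a hedged illustration of items (1) and (2), the following Python (PyTorch) sketch shows how an automatic detector can be assembled: any searched backbone that exposes multi-scale feature maps is paired with a detection framework, here represented by a minimal RetinaNet-style shared head for the single-stage case (ARN). The class names, channel counts and anchor count are illustrative assumptions rather than values fixed by the invention; AFR and ACR would pair the same backbone with Faster R-CNN and Cascade R-CNN heads instead.

```python
# Illustrative sketch only; not the patent's exact implementation.
import torch
import torch.nn as nn

class TinyBackbone(nn.Module):
    """Stand-in for the automatically designed backbone (multi-scale outputs)."""
    def __init__(self, channels=64):
        super().__init__()
        self.out_channels = channels
        self.stages = nn.ModuleList([
            nn.Conv2d(3 if i == 0 else channels, channels, 3, stride=2, padding=1)
            for i in range(3)
        ])

    def forward(self, x):
        feats = []
        for stage in self.stages:
            x = stage(x)
            feats.append(x)          # one feature map per pyramid level
        return feats

class RetinaStyleHead(nn.Module):
    """Shared classification / box-regression head applied to every pyramid level."""
    def __init__(self, in_channels, num_classes, num_anchors=9):
        super().__init__()
        self.cls = nn.Conv2d(in_channels, num_anchors * num_classes, 3, padding=1)
        self.reg = nn.Conv2d(in_channels, num_anchors * 4, 3, padding=1)

    def forward(self, feats):
        return [(self.cls(f), self.reg(f)) for f in feats]

class AutoDetector(nn.Module):
    """Automatic detector = searched backbone + a standard detection framework's head."""
    def __init__(self, backbone, num_classes=2):
        super().__init__()
        self.backbone = backbone
        self.head = RetinaStyleHead(backbone.out_channels, num_classes)

    def forward(self, images):
        return self.head(self.backbone(images))

if __name__ == "__main__":
    det = AutoDetector(TinyBackbone())
    for cls_map, reg_map in det(torch.randn(1, 3, 256, 256)):
        print(tuple(cls_map.shape), tuple(reg_map.shape))
```

In practice the toy backbone would be replaced by the network produced in step three, and the head by the full framework (anchor generation, loss assignment, post-processing) of the chosen detector.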
The invention also aims to provide a radar or sonar image target detection and classification control system based on automatic deep learning, which implements the radar or sonar image target detection and classification method based on automatic deep learning.
The invention also aims to provide application of the radar or sonar image target detection and classification method based on the automatic deep learning in visual information processing of land, seabed and air observation areas.
Combining all the technical schemes above, the advantages and positive effects of the invention are as follows. The proposed radar sonar image target detection and classification method detects and classifies radar and sonar image targets with deep convolutional neural networks designed automatically by automated deep learning. The efficient neural architecture search method proposed by the invention can quickly and automatically design an optimal convolutional neural network structure for radar and sonar data sets for target classification and detection. The neural architecture search algorithm can also be used for automated design of detection backbone networks, and the design process does not depend on pre-training with external data. Compared with existing deep learning classifiers and detectors for radar and sonar images, the proposed automated deep learning method realizes self-training directly on the task data set, and the resulting networks have simpler structures, fewer parameters, less computation, faster inference, and higher classification accuracy and detection precision.
The invention proposes to automatically design a convolutional network structure matched to a radar sonar data set using neural architecture search techniques from the field of automated deep learning. However, conventional neural architecture search techniques demand more computing power than most users can afford, so the invention first proposes a method for reducing the amount of computation during neural architecture search. Second, for the detection problem, the backbone networks of most existing deep learning detectors extract features with convolutional neural networks designed for classification tasks; the special requirements of detectors on the convolutional network are therefore taken into account during the neural architecture search, and the problem of self-training the detector backbone network on the detection data set is solved.
For the classification task, a classification accuracy of 99.9% can be achieved using steps S2(a) -S2(j) on the synthetic aperture radar classification dataset MSTAR.
For the detection task, the network structures shown in fig. 8 can be obtained by performing steps S1 and S2 on the radar ship target detection data set SSDD and the sonar common target detection data set SCTD, respectively. The three types of detectors, ARN, AFR and ACR, designed in step S3 then achieve the detection performance reported in Tables 3 and 4 on SSDD and SCTD.
Table 3 comparison of detection performance of the present invention on radar ship target detection data set SSDD with existing deep learning detector
(Table 3 is reproduced as an image in the original publication and is not reproduced here.)
Table 4 comparison of detection performance of the automatic detector of the present invention and the conventional transfer learning detector on sonar detection data set SCTD
(Table 4 is reproduced as an image in the original publication and is not reproduced here.)
According to the detection results in Table 3, under the same training and testing conditions the automatically designed backbone network improves the detection mAP of the three types of detectors by 2.2%, 1.3% and 1.5%, respectively, with the most obvious improvement for the single-stage detector; in terms of detection recall, the automatic detectors improve on the transfer learning detectors by 0.1%, 2.7% and 1.4%, respectively. In terms of model complexity, the parameter sizes of the automatic detectors are 49.59 Mb, 21.79 Mb and 13.89 Mb, only about 3/5, 1/2 and 1/3 of those of the corresponding transfer learning detectors. In terms of computational complexity, the three types of automatic detectors save about 33%, 37% and 38% of the computation, respectively. The automatic detectors are therefore superior to the transfer learning method in detection effect, and the models are generally simpler with faster inference. To show the detection effect of the automatic detectors on the radar target detection data set SSDD, small targets and large targets in complex backgrounds were selected from the test set; the predicted bounding-box positions and class confidences of ARN, AFR and ACR are shown in FIG. 9, which shows that the automatic detectors can accurately regress the bounding-box positions of the objects in the test set.
For sonar image target detection, Table 4 shows that the average detection precision of the automatic detectors on SCTD is significantly better than that of the transfer learning method: compared with transfer learning, the automatic detectors increase the mAP of RetinaNet, Faster RCNN and Cascade RCNN by 1.15%, 23.98% and 75.42%, respectively, a much larger improvement than on the SSDD data set. In addition, regarding the sample imbalance problem, the automatic detectors achieve higher recall and detection precision on under-represented classes such as 'human', which demonstrates the effectiveness of searching and self-training the backbone network on the derived classification data set. Table 4 also shows that the automatic detectors have an advantage in inference speed (FPS) over their corresponding detectors. The ARN, AFR and ACR bounding-box predictions are shown in FIG. 10, which likewise shows that the automatic detectors can accurately regress the bounding-box positions of the objects in the test set.
The inference speed can be optimized further: a hardware-platform-aware architecture search algorithm can be adopted to accelerate inference on edge computing platforms such as domestic development boards, so as to make up for inference-speed differences caused by hardware differences.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the embodiments of the present invention will be briefly described below, and it is obvious that the drawings described below are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without creative efforts.
Fig. 1 is a schematic diagram of a radar sonar image target detection and classification method provided by an embodiment of the present invention.
Fig. 2 is a flowchart of a method for extracting a derived data set from a radar sonar data set according to an embodiment of the present invention.
Fig. 2(a) is a flowchart of a data set processing and step-by-step self-training method provided by the embodiment of the present invention (the data set is divided from top to bottom, and the order of using the data set during self-training is bottom to top).
Fig. 2(b) is a flowchart of a method for extracting a derived classification data set according to an embodiment of the present invention (target regions are first cropped into target image blocks of different categories according to the bounding-box annotations of the detection data set; the remaining image area after the targets are removed is then used as background candidate regions, which are divided into rectangular blocks meeting specific requirements and used as background samples).
Fig. 3 is a schematic diagram of a convolutional neural network architecture search superstructure provided in an embodiment of the present invention.
Fig. 4 is a schematic structural diagram of the interior of each computing unit constituting a super network during a search of a convolutional network structure provided by an embodiment of the present invention.
Fig. 5 is a flowchart of an architecture search two-tier optimization algorithm for optimizing the super-network structure shown in fig. 4 according to an embodiment of the present invention.
Fig. 6 is a schematic diagram of cutting a network structure obtained by search to form a multi-scale feature pyramid according to an embodiment of the present invention.
FIG. 7 is a schematic diagram of a framework for constructing three types of automatic detectors used in accordance with an embodiment of the present invention;
in the figure: "Pool" denotes region level feature extraction, "B0" is a predefined anchor box, "H" is typically composed of fully connected modules for outputting class C and bounding box B.
Fig. 8 is a schematic diagram of an internal network structure of a computing unit designed for a SSDD and SCTD derived classification data set by using the two-layer optimization algorithm shown in fig. 5 according to an embodiment of the present invention (a final convolutional network may be formed by the concatenation method shown in fig. 3).
Fig. 8(a) is a schematic diagram of a standard cell structure of SSDD search according to an embodiment of the present invention.
Fig. 8(b) is a schematic structural diagram of a reduction unit of SSDD search according to an embodiment of the present invention.
Fig. 8(c) is a schematic diagram of a standard cell structure of SCTD search provided in the embodiment of the present invention.
Fig. 8(d) is a schematic structural diagram of a reduction unit of SCTD search provided by an embodiment of the present invention.
Fig. 9 is a schematic diagram of the detection effect of three types of automatic detectors provided by the embodiment of the present invention on an SSDD radar image detection data set (light-colored thin-line frames around an object represent labels, and dark-colored thick-line frames represent the prediction output of the detector, and the more overlapping the two indicate that the detection accuracy is higher).
Fig. 9(a) is a schematic diagram of the detection effect of ARN on SSDD radar image detection data set according to an embodiment of the present invention.
Fig. 9(b) is a schematic diagram of the detection effect of the AFR on the SSDD radar image detection data set according to the embodiment of the present invention.
Fig. 9(c) is a schematic diagram of the detection effect of the ACR on the SSDD radar image detection data set according to the embodiment of the present invention.
Fig. 10 is a schematic diagram of the detection effect of the three types of automatic detectors provided by the embodiment of the present invention on the SCTD sonar image detection data set (the light-colored thin-line boxes around an object represent labels, the dark-colored thick-line boxes represent the prediction output of the detector, and greater overlap between the two indicates higher detection accuracy; in diagrams (b), (c) and (d), the predicted outputs of ARN, AFR and ACR are arranged from left to right).
Fig. 10(a) is a schematic diagram of a sonar detection scene (detector input image) according to an embodiment of the present invention.
Fig. 10(b) is a schematic diagram of a local amplification effect of a sunken ship detection result provided by the embodiment of the invention.
Fig. 10(c) is a schematic diagram of a local amplification effect of a detection result of a sunken aircraft provided in an embodiment of the present invention.
Fig. 10(d) is a schematic diagram of a local amplification effect of a detection result of a submarine dummy according to an embodiment of the present invention.
Fig. 11 is a flowchart of a radar sonar image target detection and classification method provided by the embodiment of the present invention.
FIG. 12 is a block diagram of a radar sonar image target detection and classification system according to an embodiment of the present invention;
in the figure: 1. a data set determination module; 2. a data set extraction module; 3. an optimal convolutional network construction module; 4. a target detector building block; 5. and an index output module.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Aiming at the problems in the prior art, the invention provides a radar or sonar image target detection and classification method based on automatic deep learning, and the invention is described in detail below with reference to the accompanying drawings.
As shown in fig. 11, the radar sonar image target detection and classification method provided by the embodiment of the present invention includes the following steps:
s101, a radar sonar image data set is given, and the problem type is judged; wherein the question category comprises a classification question and a detection question;
s102, extracting a derivative classification data set from the detection data set according to the labeling information for the detection problem;
s103, for the classification problem, using a convolutional neural network architecture to search for automatic design and retraining of an optimal convolutional network structure for the derived classification data set, and accurately outputting the classification;
s104, constructing a self-training automatic deep learning target detector by using the self-training backbone network designed in the S103, and training and detecting on a task detection data set;
and S105, outputting the detection precision and the recall ratio index.
As shown in fig. 12, a radar sonar image target detecting and classifying system according to an embodiment of the present invention includes:
the data set determining module 1 is used for receiving a radar sonar image data set and judging the problem type, the problem type being either a classification problem or a detection problem;
the data set extraction module 2 is used for extracting a derivative classification data set from the detection data set according to the labeling information for the detection problem;
the optimal convolutional network construction module 3 is used for carrying out automatic design and retraining of an optimal convolutional network structure on the derived classification data set by using convolutional neural network architecture search for the classification problem and accurately outputting the classification;
the target detector building module 4 is used for building a self-trained automatic deep learning target detector by using the designed self-trained backbone network, and training and detecting on the task detection data set;
and the index output module 5 is used for outputting the detection precision and the recall rate index.
The present invention will be further described with reference to the following examples.
1. Summary of the invention
Deploying self-trained deep learning classifiers and detectors on radar sonar image data sets helps improve target classification and detection performance, but realizing deep-learning detection or classifier self-training on radar and sonar images first requires overcoming the difficulty of designing a convolutional network structure matched to the data set. To this end, the present invention proposes to automatically design a convolutional network structure matched to the radar sonar data set using neural architecture search techniques from the field of automated deep learning. However, conventional neural architecture search techniques demand more computing power than most users can afford, so the invention first proposes a method for reducing the amount of computation during neural architecture search. Second, for the detection problem, the backbone networks of most existing deep learning detectors extract features with convolutional neural networks designed for classification tasks; the special requirements of detectors on the convolutional network are therefore taken into account during the neural architecture search, and the problem of self-training the detector backbone network on the detection data set is solved.
2. Summary of the invention
As shown in fig. 1, the method is divided into three main steps: for a radar sonar image detection task, S1, S2 and S3 are executed in sequence; for a radar sonar image classification task, only S2 is executed.
Step S1: for the detection task, firstly, a derivative classification data set is extracted from the detection data set according to the labeling information.
(a) The derived data set is extracted from the training set of the radar sonar detection data set and contains various target and background images. The design and self-training of the detector backbone convolutional network can be realized indirectly through the derived data set, and the detector fine-tuning can be initialized from the self-trained weights to achieve a better detection effect. As shown in fig. 2(a), the detection data set is first divided into training, verification and test sets; a 7:1:2 split is adopted.
(b) The derived data set is extracted entirely from the training set and verification set of the detection data set, so the validity of the detector test is not affected. The classification data train-DC and val-DC extracted from the detection training set are used for training and verifying the backbone network, respectively. If needed, the test classification data set test-DC extracted from the detection verification set is used for testing the classification performance of the backbone network.
(c) The derived classification data set is still a small-sample data set consistent with the detection data set, and a reasonably designed convolutional network structure is important for the feature-extraction capability of the backbone network. A NAS method can achieve higher self-training classification accuracy on the derived classification data set; for this, the train-DC data set obtained in (b) is further divided into train-search and val-search, used respectively for training and verification during backbone network architecture search, with a 5:5 split. A minimal sketch of this extraction and splitting follows.
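The following Python sketch illustrates step S1 under stated assumptions: detection samples are dictionaries holding a numpy image, COCO-style [x, y, w, h] boxes and class labels; the helper names, the background-patch heuristic and the 0.8/0.2 split of the derived patches into train-DC and val-DC are illustrative (only the 7:1:2 and 5:5 ratios come from the text above).

```python
import random
import numpy as np

def split(items, ratios, seed=0):
    """Shuffle and split a list according to the given ratios, e.g. (0.7, 0.1, 0.2)."""
    items = list(items)
    random.Random(seed).shuffle(items)
    cuts, out, start = np.cumsum([int(r * len(items)) for r in ratios]), [], 0
    for c in cuts:
        out.append(items[start:c])
        start = c
    out[-1].extend(items[start:])            # rounding remainder goes to the last split
    return out

def derive_classification_patches(detection_samples, patch=64):
    """Cut labelled target patches plus a few target-free 'background' patches per image."""
    patches = []
    for s in detection_samples:
        img = s["image"]
        h, w = img.shape[:2]
        occupied = np.zeros((h, w), dtype=bool)
        for (x, y, bw, bh), label in zip(s["boxes"], s["labels"]):
            x, y, bw, bh = map(int, (x, y, bw, bh))
            occupied[y:y + bh, x:x + bw] = True
            patches.append((img[y:y + bh, x:x + bw], label))
        for _ in range(2):                   # candidate background regions
            y0 = random.randrange(max(h - patch, 1))
            x0 = random.randrange(max(w - patch, 1))
            if not occupied[y0:y0 + patch, x0:x0 + patch].any():
                patches.append((img[y0:y0 + patch, x0:x0 + patch], "background"))
    return patches

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dets = [{"image": rng.random((256, 256)), "boxes": [(30, 40, 50, 60)], "labels": ["ship"]}
            for _ in range(20)]
    train_det, val_det, test_det = split(dets, (0.7, 0.1, 0.2))        # detection split 7:1:2
    train_dc, val_dc = split(derive_classification_patches(train_det), (0.8, 0.2))
    test_dc = derive_classification_patches(val_det)                    # test-DC from the detection verification set
    train_search, val_search = split(train_dc, (0.5, 0.5))              # 5:5 split for the NAS phase
    print(len(train_det), len(train_dc), len(train_search), len(val_search), len(test_dc))
```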
Step S2: and (3) carrying out automatic design and retraining of an optimal convolutional network structure on the derived classification data set (the classification task is directly the task classification data set) by using a memory-friendly convolutional neural network architecture search with a flexible structure.
(a) The search over the overall convolutional network structure is simplified, through functional abstraction, into a structure search inside the convolutional network composition units. As shown in fig. 3, the function of the convolutional neural network is abstracted into two types of computing units, standard cells and reduction cells: a standard cell preserves the feature-map size while extracting features, and a reduction cell reduces the feature-map size and increases the number of feature-map channels while extracting features.
(b) Designing operator sets for the two types of convolutional network composition cells: O_N(6) for standard cells and O_R(6) for reduction cells. The convolutional network structure inside each computing unit is modeled as a directed acyclic graph. As shown in FIG. 4, each computing unit contains four computing nodes 0, 1, 2 and 3, and the invention designs the operator set shown in Table 1 (pooling and convolution operations) to connect these computing nodes.
TABLE 1 custom operator set for two types of arithmetic units
(Table 1 is reproduced as an image in the original publication and is not reproduced here; the operator naming conventions are explained below.)
'none' and 'skip_connect' denote multiplying the input by 0 and 1, respectively. 'sep_conv' and 'dil_conv' denote separable convolution and dilated (atrous) convolution, respectively, and 'max_pool' and 'avg_pool' denote max pooling and average pooling. '3x3' and '5x5' denote the filter kernel size of the corresponding operation.
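Since Table 1 itself is not reproduced in this text, the following PyTorch sketch instantiates a plausible candidate-operation set under the naming conventions just described. The exact membership of the six-operator sets O_N(6) and O_R(6) and the internal parameterization of each operator are assumptions; stride-2 variants would be used on the input edges of reduction cells.

```python
# Illustrative candidate operations; membership and internals are assumptions.
import torch
import torch.nn as nn

class Zero(nn.Module):                       # 'none': multiply the input by 0
    def __init__(self, stride=1):
        super().__init__()
        self.stride = stride
    def forward(self, x):
        return x[:, :, ::self.stride, ::self.stride] * 0.0

def sep_conv(c, k, stride, dilation=1):      # simplified depthwise + pointwise block
    pad = dilation * (k - 1) // 2
    return nn.Sequential(
        nn.ReLU(inplace=False),
        nn.Conv2d(c, c, k, stride=stride, padding=pad, dilation=dilation, groups=c, bias=False),
        nn.Conv2d(c, c, 1, bias=False),
        nn.BatchNorm2d(c),
    )

def make_ops(c, stride):
    """Candidate operations on a directed edge carrying `c` channels."""
    return nn.ModuleDict({
        "none":         Zero(stride),
        "skip_connect": nn.Identity() if stride == 1 else nn.AvgPool2d(1, stride=stride),
        "sep_conv_3x3": sep_conv(c, 3, stride),
        "sep_conv_5x5": sep_conv(c, 5, stride),
        "dil_conv_3x3": sep_conv(c, 3, stride, dilation=2),
        "max_pool_3x3": nn.MaxPool2d(3, stride=stride, padding=1),
        "avg_pool_3x3": nn.AvgPool2d(3, stride=stride, padding=1),
    })

if __name__ == "__main__":
    ops = make_ops(c=16, stride=1)
    x = torch.randn(2, 16, 32, 32)
    for name, op in ops.items():
        print(name, tuple(op(x).shape))      # every candidate keeps the same output shape
```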
(c) Constructing the feature-map transformation used inside a computing unit during search and inference. Let the feature map represented by each node be x^(j). During network inference, each directed edge (i, j) applies a convolution operation o^(i,j), drawn from O_N(6) or O_R(6), to transform the preceding feature map x^(i); that is, each node receives all preceding nodes as input:

$x^{(j)} = \sum_{i<j} o^{(i,j)}\big(x^{(i)}\big)$ (1)

During the search, attention is assigned to all operations in O_N(6) or O_R(6) by continuous relaxation, i.e., the weighted average of the results of all operations,

$\bar{o}^{(i,j)}\big(x^{(i)}\big) = \sum_{o \in \mathcal{O}} \frac{\exp\big(\alpha_o^{(i,j)}\big)}{\sum_{o' \in \mathcal{O}} \exp\big(\alpha_{o'}^{(i,j)}\big)}\, o\big(x^{(i)}\big),$

replaces o^(i,j)(x^(i)).
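The continuous relaxation above can be sketched in PyTorch as follows; the three toy candidate operations stand in for the full operator sets O_N(6)/O_R(6), and the class names are illustrative rather than the invention's actual implementation.

```python
# Mixed operation (softmax over alpha) and node summation, as in formula (1).
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixedOp(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.ops = nn.ModuleList([
            nn.Identity(),                                              # skip_connect
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),    # a convolution candidate
            nn.AvgPool2d(3, stride=1, padding=1),                       # avg_pool_3x3
        ])
        # alpha^(i,j): one architecture parameter per candidate operation on this edge
        self.alpha = nn.Parameter(1e-3 * torch.randn(len(self.ops)))

    def forward(self, x):
        weights = F.softmax(self.alpha, dim=0)           # attention over the candidates
        return sum(w * op(x) for w, op in zip(weights, self.ops))

class Node(nn.Module):
    """One intermediate node x^(j): the sum over mixed operations on all incoming edges."""
    def __init__(self, num_inputs, channels):
        super().__init__()
        self.edges = nn.ModuleList([MixedOp(channels) for _ in range(num_inputs)])

    def forward(self, inputs):                           # inputs = [x^(0), ..., x^(j-1)]
        return sum(edge(x) for edge, x in zip(self.edges, inputs))

if __name__ == "__main__":
    node = Node(num_inputs=2, channels=8)
    x0, x1 = torch.randn(1, 8, 16, 16), torch.randn(1, 8, 16, 16)
    print(node([x0, x1]).shape)                          # torch.Size([1, 8, 16, 16])
```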
(d) Designing a memory-friendly network structure search scheme on the basis of formula (1). Assume the preceding feature layer x^(i) has M channels; FL-DARTS randomly samples M/K of the channels, according to the sampling rate 1/K, as the sampled features S^(i,j) * x^(i), and the unsampled features (1 - S^(i,j)) * x^(i) are passed directly to the output. The partially weighted features that x^(j) receives from x^(i) can therefore be written as:

$f^{(i,j)}\big(x^{(i)}\big) = \sum_{o \in \mathcal{O}} \frac{\exp\big(\alpha_o^{(i,j)}\big)}{\sum_{o' \in \mathcal{O}} \exp\big(\alpha_{o'}^{(i,j)}\big)}\, o\big(S^{(i,j)} * x^{(i)}\big) + \big(1 - S^{(i,j)}\big) * x^{(i)}$ (2)

where S^(i,j) is a binary channel-sampling mask and * denotes channel-wise masking.
(e) In particular, to overcome the instability of the gradient-descent direction that may be caused by the channel sampling of formula (2), edge normalization may be employed, i.e., each edge (i, j) is given an explicit weight β^(i,j). With both channel sampling and edge regularization, the computation of x^(j) becomes:

$x^{(j)} = \sum_{i<j} \frac{\exp\big(\beta^{(i,j)}\big)}{\sum_{i'<j} \exp\big(\beta^{(i',j)}\big)}\, f^{(i,j)}\big(x^{(i)}\big)$ (3)

where f^(i,j) is the partial-channel mixed operation defined in formula (2).
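A self-contained sketch of formulas (2) and (3) follows, under stated assumptions: a random index set plays the role of the channel-sampling mask S^(i,j), a toy three-operation candidate set replaces Table 1, and β^(i,j) weights combine the incoming edges of a node.

```python
# Partial-channel mixed operation (formula 2) and edge normalization (formula 3).
import torch
import torch.nn as nn
import torch.nn.functional as F

class PartialChannelMixedOp(nn.Module):
    def __init__(self, channels, k=4):
        super().__init__()
        self.k = k
        sampled = channels // k
        self.ops = nn.ModuleList([
            nn.Identity(),
            nn.Conv2d(sampled, sampled, 3, padding=1, bias=False),   # toy candidate set
            nn.AvgPool2d(3, stride=1, padding=1),
        ])
        self.alpha = nn.Parameter(1e-3 * torch.randn(len(self.ops)))

    def forward(self, x):
        c = x.shape[1]
        sampled = c // self.k
        idx = torch.randperm(c)                         # random channel sampling S^(i,j)
        chosen, rest = idx[:sampled], idx[sampled:]
        w = F.softmax(self.alpha, dim=0)
        mixed = sum(wi * op(x[:, chosen]) for wi, op in zip(w, self.ops))
        out = torch.empty_like(x)
        out[:, chosen] = mixed                          # processed 1/K of the channels
        out[:, rest] = x[:, rest]                       # unsampled channels pass through
        return out

class EdgeNormalizedNode(nn.Module):
    """x^(j) = sum_i softmax(beta)_i * f^(i,j)(x^(i)), as in formula (3)."""
    def __init__(self, num_inputs, channels):
        super().__init__()
        self.edges = nn.ModuleList([PartialChannelMixedOp(channels) for _ in range(num_inputs)])
        self.beta = nn.Parameter(1e-3 * torch.randn(num_inputs))

    def forward(self, inputs):
        eb = F.softmax(self.beta, dim=0)
        return sum(b * edge(x) for b, edge, x in zip(eb, self.edges, inputs))

if __name__ == "__main__":
    node = EdgeNormalizedNode(num_inputs=2, channels=16)
    xs = [torch.randn(1, 16, 8, 8) for _ in range(2)]
    print(node(xs).shape)                               # torch.Size([1, 16, 8, 8])
```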
(f) Optimizing the super-network constructed by formula (2) or (3) with the two-layer optimization algorithm of steps (g) to (j), and extracting a discrete structure from the optimized mixed-operation weights. Each output node x^(j) retains the two strongest input edges from {x^(0), x^(1), ..., x^(j-1)}, where the edge weights are

$p_o^{(i,j)} = \frac{\exp\big(\alpha_o^{(i,j)}\big)}{\sum_{o' \in \mathcal{O}} \exp\big(\alpha_{o'}^{(i,j)}\big)},$

i.e., the mixed operation is replaced by the most probable operation to obtain a discrete network structure; the operation with the largest weight is selected by argmax to obtain the discrete structure:

$o^{(i,j)} = \arg\max_{o \in \mathcal{O}} \alpha_o^{(i,j)}$ (4)

In the formula, the mixed-operation weights α on each directed edge can be regarded as the attention the network architecture search algorithm pays to each convolution operation, i.e., α is a continuous encoding of the convolutional structure.
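The discretization of step (f) can be sketched as below. Excluding 'none' when taking the argmax and ranking edges by the product of edge weight and best operation weight follow common DARTS-style practice and are assumptions here, as is the toy operator list; the randomly generated alpha/beta stand in for optimized search results.

```python
# Extract a discrete cell description (genotype) from relaxed alpha/beta weights.
import torch
import torch.nn.functional as F

OPS = ["none", "skip_connect", "sep_conv_3x3", "sep_conv_5x5", "dil_conv_3x3", "max_pool_3x3"]

def discretize(alpha, beta, num_nodes=4):
    """alpha[(i, j)]: tensor over OPS; beta[(i, j)]: scalar edge-normalization weight."""
    genotype = []
    for j in range(2, 2 + num_nodes):                    # nodes 0 and 1 are the cell inputs
        incoming = sorted(e for e in alpha if e[1] == j)
        edge_w = F.softmax(torch.stack([beta[e] for e in incoming]), dim=0).tolist()
        strength, choice = [], []
        for w, e in zip(edge_w, incoming):
            p = F.softmax(alpha[e], dim=0).tolist()
            k = max((i for i in range(len(OPS)) if OPS[i] != "none"), key=lambda i: p[i])
            strength.append(w * p[k])                    # edge weight x best non-'none' op weight
            choice.append(OPS[k])
        keep = sorted(range(len(incoming)), key=lambda t: -strength[t])[:2]   # two strongest edges
        genotype += [(incoming[t][0], j, choice[t]) for t in keep]
    return genotype                                      # [(source node, target node, operation), ...]

if __name__ == "__main__":
    torch.manual_seed(0)
    edges = [(i, j) for j in range(2, 6) for i in range(j)]
    alpha = {e: torch.randn(len(OPS)) for e in edges}
    beta = {e: torch.randn(()) for e in edges}
    for src, dst, op in discretize(alpha, beta):
        print(f"node {dst} <- {op} <- node {src}")
```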
(g) Constructing a differentiable two-layer optimization scheme for the convolutional network architecture search. In formulas (2) and (3), the attention of the mixed operation is modeled by the conditional probability

$p_o^{(i,j)} = \frac{\exp\big(\alpha_o^{(i,j)}\big)}{\sum_{o' \in \mathcal{O}} \exp\big(\alpha_{o'}^{(i,j)}\big)},$

parameterized by the vector α^(i,j) of dimension |O|, so the architecture search problem can be reduced to learning a set of relaxed continuous vectors α = {α^(i,j)}. Let L_train denote the training-set loss and L_val the verification-set loss; after continuous relaxation of the operation weights, the structure parameters α and the weights w can be learned jointly. The verification-set classification accuracy is taken as the final reward (goodness of fit), the dual objective is to minimize the verification-set loss, and a gradient descent method with momentum is used for optimization. The core principle of the architecture search is to determine, by gradient optimization, the optimal network architecture α^o that minimizes the verification-set loss L_val(w^o(α^o), α^o), namely:

$\alpha^{o} = \arg\min_{\alpha}\, L_{val}\big(w^{o}(\alpha), \alpha\big)$ (5)

where the network weights w^o = w^o(α) used to compute the verification-set loss are obtained by minimizing the training-set loss, i.e.:

$w^{o}(\alpha) = \arg\min_{w}\, L_{train}(w, \alpha)$ (6)
For the super-network model used during architecture search, not only the mixed-operation weights α on the directed edges, which encode the network structure, need to be optimized, but the weights w of the super-network itself also need to be learned. Therefore, the invention performs architecture optimization with the following two-layer optimization formula:

$\min_{\alpha}\; L_{val}\big(w^{o}(\alpha), \alpha\big) \quad \text{s.t.} \quad w^{o}(\alpha) = \arg\min_{w}\, L_{train}(w, \alpha)$ (7)

where α is the outer-layer structure optimization variable and w is the inner-layer network weight optimization variable.
(h) Accelerating performance evaluation with a single-step approximation to solve the two-layer optimization problem of formula (7). Each inner-layer learning phase performs only a single training step to obtain the approximate weights w*(α) of the current super-network. With this approximation, the architecture gradient used in this embodiment after training with the inner-layer model is:

$\nabla_{\alpha} L_{val}\big(w^{o}(\alpha), \alpha\big) \approx \nabla_{\alpha} L_{val}\big(w^{*}(\alpha), \alpha\big)$ (8)

If the learning rate of the inner-layer optimization is ξ, the approximate weights w*(α) of the current structure α obtained after a single learning step are:

$w^{*}(\alpha) = w - \xi\, \nabla_{w} L_{train}(w, \alpha)$ (9)
(i) After each inner-layer optimization, the classification accuracy of the current structure on the verification set can be evaluated by computing its verification-set classification loss L_val(w*(α), α); outer-layer optimization then continues by gradient descent, updating the structure weights α to achieve higher classification accuracy (lower verification-set classification loss). The outer-layer optimization uses gradient descent on the verification-set classification loss, whose gradient ∇_α L_val(w*(α), α) has the following form:

$\nabla_{\alpha} L_{val}\big(w', \alpha\big) - \xi\, \nabla^{2}_{\alpha, w} L_{train}(w, \alpha)\, \nabla_{w'} L_{val}\big(w', \alpha\big), \quad w' = w - \xi\, \nabla_{w} L_{train}(w, \alpha)$ (10)
when the outer layer structure is optimized, only the standard gradient descent method is used for optimization, so the overall flow of the double-layer optimization is shown in fig. 5.
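A runnable toy sketch of the two-layer loop of fig. 5 is given below. It alternates a single inner step on the weights w using a train-search batch with an outer step on the architecture parameters α using a val-search batch; for brevity it uses the first-order variant (ξ = 0) rather than the full second-order gradient of formula (10), and the two-operation "supernet" is only a stand-in for the real cell-based super-network.

```python
# Alternating bilevel optimization: SGD with momentum on w, plain SGD on alpha.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinySupernet(nn.Module):
    """Two candidate 'operations' mixed by softmax(alpha) - a stand-in for the real cells."""
    def __init__(self, dim=8):
        super().__init__()
        self.op_a = nn.Linear(dim, 2)
        self.op_b = nn.Linear(dim, 2)
        self.alpha = nn.Parameter(torch.zeros(2))       # architecture parameters

    def forward(self, x):
        w = F.softmax(self.alpha, dim=0)
        return w[0] * self.op_a(x) + w[1] * self.op_b(x)

def bilevel_search(model, train_batches, val_batches, steps=100, lr_w=0.05, lr_a=0.01):
    weight_params = [p for n, p in model.named_parameters() if n != "alpha"]
    opt_w = torch.optim.SGD(weight_params, lr=lr_w, momentum=0.9)   # inner layer (formulas 6/9)
    opt_a = torch.optim.SGD([model.alpha], lr=lr_a)                 # outer layer (formula 5)
    for step in range(steps):
        xw, yw = train_batches[step % len(train_batches)]
        xa, ya = val_batches[step % len(val_batches)]
        opt_w.zero_grad()                                # single inner step on L_train
        F.cross_entropy(model(xw), yw).backward()
        opt_w.step()
        opt_a.zero_grad()                                # outer step on L_val w.r.t. alpha only
        F.cross_entropy(model(xa), ya).backward()
        opt_a.step()
    return model.alpha.detach()

if __name__ == "__main__":
    torch.manual_seed(0)
    make = lambda n: [(torch.randn(16, 8), torch.randint(0, 2, (16,))) for _ in range(n)]
    model = TinySupernet()
    print("optimized alpha:", bilevel_search(model, make(5), make(5)))
```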
(j) Unlike common neural architecture search, multiple auxiliary classification branches are adopted during the search to stabilize the architecture search process; in this embodiment the weights of the two auxiliary branches are set to 0.4 and 0.2, respectively.
(k) The optimal discrete structure is extracted, and the final structure is retrained on the RS derived classification data set to obtain the optimal network weights.
(l) If a detection task is to be performed, the network structure is further pruned after retraining to remove the redundant classification head and auxiliary classification branches, and its multi-scale features are combined to form a standard multi-scale feature pyramid, as shown in fig. 6.
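A minimal sketch of step (l) is shown below: the classification head and auxiliary branch of a (stand-in) retrained network are discarded, and the outputs of the remaining stages are exposed as a multi-scale feature dictionary of the kind consumed by a standard FPN/detector neck. The stage layout, channel counts and names are illustrative assumptions.

```python
# Pruning the classification head and exposing multi-scale features for detection.
import torch
import torch.nn as nn

class SearchedClassifier(nn.Module):
    """Stand-in for the retrained network: feature stages + classification heads."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.stem = nn.Conv2d(3, 16, 3, stride=2, padding=1)
        self.stage1 = nn.Conv2d(16, 32, 3, stride=2, padding=1)    # reduction cell 1
        self.stage2 = nn.Conv2d(32, 64, 3, stride=2, padding=1)    # reduction cell 2
        self.stage3 = nn.Conv2d(64, 128, 3, stride=2, padding=1)   # reduction cell 3
        self.head = nn.Linear(128, num_classes)                     # removed for detection
        self.aux_head = nn.Linear(64, num_classes)                  # auxiliary branch, also removed

class DetectionBackbone(nn.Module):
    """Keeps only the feature stages and returns a multi-scale pyramid input."""
    def __init__(self, classifier: SearchedClassifier):
        super().__init__()
        self.stem = classifier.stem
        self.stages = nn.ModuleList([classifier.stage1, classifier.stage2, classifier.stage3])

    def forward(self, x):
        feats, x = {}, self.stem(x)
        for k, stage in enumerate(self.stages):
            x = stage(x)
            feats[f"C{k + 3}"] = x           # strides 4, 8, 16 relative to the input
        return feats                          # consumed by an FPN / detector neck

if __name__ == "__main__":
    backbone = DetectionBackbone(SearchedClassifier())
    out = backbone(torch.randn(1, 3, 256, 256))
    print({k: tuple(v.shape) for k, v in out.items()})
```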
Step S3: the self-trained automated deep learning target detector is constructed using the self-trained backbone network designed at S2.
(a) The backbone network automatically designed in step S2(l) is used to replace the manually designed backbone network in a conventional deep learning detector, as shown in fig. 7. The present invention refers to such detectors as automatic detectors.
(b) The single-stage RetinaNet, the two-stage Faster R-CNN and the multi-stage Cascade R-CNN are selected as detection frameworks of the automatic detector, yielding Auto RetinaNet (ARN), Auto Faster RCNN (AFR) and Auto Cascade RCNN (ACR), respectively.
(c) A specific data enhancement scheme is designed for the radar sonar data set in combination with the automatic detector training process. To improve robustness to rotation and deformation, random rotation, random cropping, random erasing and similar operations are applied to the target image blocks during backbone network search and retraining on the derived classification data set, and random cropping and random flipping are applied to the detection data set during detector self-training. The data enhancement scheme used in this embodiment is shown in Table 2, and a sketch of such a pipeline follows the table.
Table 2 data enhancement method used for radar sonar image target detection task in this embodiment
(Table 2 is reproduced as an image in the original publication and is not reproduced here.)
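A sketch of such an augmentation pipeline is given below using torchvision transforms. Since Table 2 is not reproduced here, the rotation range, crop size and erasing probability are illustrative assumptions; the box-aware horizontal flip illustrates that detection-side augmentation must transform the bounding boxes together with the image.

```python
# Classification-side and detection-side augmentation sketches (values are assumptions).
import torch
from torchvision import transforms

# Classification side: used during backbone search and retraining on the derived set.
patch_augment = transforms.Compose([
    transforms.RandomRotation(degrees=180),            # random rotation
    transforms.RandomCrop(64, padding=4),              # random cropping
    transforms.ToTensor(),
    transforms.RandomErasing(p=0.5),                   # random erasing
])

# Detection side: a random flip that keeps the [x1, y1, x2, y2] boxes consistent.
def random_hflip(image: torch.Tensor, boxes: torch.Tensor, p: float = 0.5):
    """image: CxHxW tensor, boxes: Nx4 tensor in (x1, y1, x2, y2) pixel coordinates."""
    if torch.rand(()) < p:
        _, _, w = image.shape
        image = torch.flip(image, dims=[2])
        boxes = boxes.clone()
        boxes[:, [0, 2]] = w - boxes[:, [2, 0]]        # swap and mirror x1/x2
    return image, boxes

if __name__ == "__main__":
    from PIL import Image
    import numpy as np
    patch = Image.fromarray((np.random.rand(80, 80) * 255).astype("uint8"))
    print(patch_augment(patch).shape)                  # torch.Size([1, 64, 64])
    img, b = random_hflip(torch.rand(1, 128, 128), torch.tensor([[10.0, 20.0, 50.0, 60.0]]))
    print(b)
```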
The radar and sonar image classification and detection effects obtained by the invention are as follows:
for the classification task, a classification accuracy of 99.9% can be achieved using steps S2(a) -S2(j) on the synthetic aperture radar classification dataset MSTAR.
For the detection task: the network structure shown in fig. 8 can be obtained by performing steps S1 and S2 on the radar ship target detection data set SSDD and the sonar common target detection data set SCTD, respectively. Then, the three types of detectors, including ARN, AFR and ACR, designed in step S3 can obtain the detection performances achieved in tables 3 and 4 in SSDD and SCTD.
Table 3 comparison of detection performance of the present invention on radar ship target detection data set SSDD with existing deep learning detector
(Table 3 is reproduced as an image in the original publication and is not reproduced here.)
Table 4 comparison of detection performance of the automatic detector of the present invention and the conventional transfer learning detector on sonar detection data set SCTD
(Table 4 is reproduced as an image in the original publication and is not reproduced here.)
According to the detection results in Table 3, under the same training and testing conditions the automatically designed backbone network improves the detection mAP of the three types of detectors by 2.2%, 1.3% and 1.5%, respectively, with the most obvious improvement for the single-stage detector; in terms of detection recall, the automatic detectors improve on the transfer learning detectors by 0.1%, 2.7% and 1.4%, respectively. In terms of model complexity, the parameter sizes of the automatic detectors are 49.59 Mb, 21.79 Mb and 13.89 Mb, only about 3/5, 1/2 and 1/3 of those of the corresponding transfer learning detectors. In terms of computational complexity, the three types of automatic detectors save about 33%, 37% and 38% of the computation, respectively. The automatic detectors are therefore superior to the transfer learning method in detection effect, and the models are generally simpler with faster inference. To show the detection effect of the automatic detectors on the radar target detection data set SSDD, small targets and large targets in complex backgrounds were selected from the test set; the predicted bounding-box positions and class confidences of ARN, AFR and ACR are shown in FIG. 9, which shows that the automatic detectors can accurately regress the bounding-box positions of the objects in the test set.
For sonar image target detection, Table 4 shows that the average detection precision of the automatic detectors on SCTD is significantly better than that of the transfer learning method: relative to transfer learning, the automatic detectors raise the mAP of RetinaNet, Faster R-CNN, and Cascade R-CNN by 1.15%, 23.98%, and 75.42%, respectively, a much larger improvement than on the SSDD data set. In addition, with respect to the sample imbalance problem, the automatic detectors achieve higher recall and detection precision on small-sample classes such as "human", which demonstrates the effectiveness of searching and self-training the backbone network on the derivative classification data set. Table 4 also shows that each automatic detector has an advantage in inference speed (FPS) over its corresponding baseline detector. The bounding-box predictions of ARN, AFR, and ACR are shown in FIG. 10, which likewise shows that the automatic detectors accurately regress the bounding boxes of the objects in the test set.
In the description of the present invention, "a plurality" means two or more unless otherwise specified; the terms "upper", "lower", "left", "right", "inner", "outer", "front", "rear", "head", "tail", and the like indicate orientations or positional relationships based on those shown in the drawings, are used only for convenience and simplicity of description, and do not indicate or imply that the device or element referred to must have a particular orientation or be constructed and operated in a particular orientation; they should therefore not be construed as limiting the invention. Furthermore, the terms "first", "second", "third", and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
In the above embodiments, the implementation may be realized wholly or partially by software, hardware, firmware, or any combination thereof. When implemented wholly or partially in software, it may take the form of a computer program product comprising one or more computer instructions. When the computer program instructions are loaded or executed on a computer, the flows or functions according to the embodiments of the invention are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another by wire (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wirelessly (e.g., infrared, radio, microwave). The computer-readable storage medium may be any available medium accessible to a computer, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.
The above description is provided only to illustrate preferred embodiments of the present invention and is not intended to limit its scope; all modifications, equivalents, and improvements made within the spirit and principles of the invention are intended to fall within the scope defined by the appended claims.

Claims (10)

1. A radar or sonar image target detection and classification method based on automatic deep learning, characterized in that the method realizes automatic design of a deep neural network for a specific radar or sonar image target detection or classification data set through neural architecture search.
2. The method for detecting and classifying radar or sonar image targets based on automatic deep learning according to claim 1, wherein the method for detecting and classifying radar or sonar image targets based on automatic deep learning specifically comprises the following steps:
Step one: a radar or sonar image data set is given; if it is a classification data set, step three is performed directly, and if it is a detection data set, steps two, three, and four are performed in sequence;
Step two: extracting a derivative classification data set from the detection data set according to the annotation information;
Step three: carrying out automatic design and retraining of a convolutional network structure by means of convolutional neural network architecture search, on the derivative classification data set for detection problems, or directly on the task classification data set for classification problems; the retrained automatic network is used directly for the classification task of radar or sonar images;
Step four: constructing an automatic deep learning target detector from the automatically designed, self-trained backbone network of step three, and training and validating the detector on the task detection data set; the trained automatic deep learning target detector is used directly for the detection task of radar or sonar images.
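As an illustration only of the control flow across steps one to four, the following Python sketch dispatches a data set to the classification or detection pipeline; all helper names (extract_derivative_dataset, search_backbone, retrain, build_detector, train_detector) are hypothetical placeholders, not APIs defined by this claim.

```python
def auto_deep_learning_pipeline(dataset, task):
    """Dispatch a radar/sonar data set to the classification or detection pipeline.

    The helper functions called below are hypothetical placeholders used only to
    illustrate the ordering of the four steps.
    """
    if task == "classification":
        # Step three: search and retrain a convolutional network directly
        # on the task classification data set.
        arch = search_backbone(dataset.train, dataset.val)
        model = retrain(arch, dataset.train)
        return model                      # used directly for classification
    elif task == "detection":
        # Step two: derive a classification set from the detection annotations.
        derived = extract_derivative_dataset(dataset.train, dataset.val)
        # Step three: search and self-train the backbone on the derived set.
        arch = search_backbone(derived.train, derived.val)
        backbone = retrain(arch, derived.train)
        # Step four: build the automatic detector and train/validate it on the
        # task detection data set.
        detector = build_detector(backbone)
        detector = train_detector(detector, dataset.train, dataset.val)
        return detector
    else:
        raise ValueError(f"unknown task: {task}")
```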
3. The radar or sonar image target detection and classification method based on automatic deep learning according to claim 1 or 2, wherein the neural architecture search algorithm used includes, but is not limited to, a differentiable neural architecture search algorithm.
4. The radar or sonar image target detection and classification method based on automatic deep learning according to claim 2, wherein the derivative data set is a set of images, including various targets and backgrounds, extracted from the training set of the radar or sonar detection data set, and the design and self-training of the detector backbone convolutional network are achieved indirectly through this derivative data set.
5. The radar or sonar image target detection and classification method based on automatic deep learning according to claim 2, wherein the derivative data set is extracted only from the training set or the validation set of the detection data set, without using the test set of the detection data set.
6. The radar or sonar image target detection and classification method based on automatic deep learning according to claim 2, wherein the automatic deep learning target detector is obtained by replacing part of the modules, or the overall structure, of an existing deep learning target detector with a network module or overall structure automatically designed by neural architecture search; in particular, the manually designed backbone network module of the existing deep learning detector is replaced with the automatically designed backbone network module.
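A minimal sketch of this module replacement in generic PyTorch terms: the neck and head of an existing detector are retained while its hand-designed backbone is swapped for the automatically searched one. The class and argument names here are illustrative assumptions, not the patent's implementation or any specific library's API.

```python
import torch.nn as nn

class AutoDetector(nn.Module):
    """Existing detector skeleton with the manually designed backbone replaced
    by an automatically searched, self-trained backbone (illustrative only)."""

    def __init__(self, auto_backbone, neck, head):
        super().__init__()
        self.backbone = auto_backbone  # designed by NAS and self-trained (claims 2-5)
        self.neck = neck               # e.g. the feature pyramid kept from the original detector
        self.head = head               # e.g. the RetinaNet / Faster R-CNN / Cascade R-CNN head

    def forward(self, images):
        feats = self.backbone(images)  # multi-scale feature maps
        feats = self.neck(feats)
        return self.head(feats)
```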
7. The radar or sonar image target detection and classification method based on automatic deep learning according to claim 2, wherein, in step two, extracting the derivative classification data set from the detection data set according to the annotation information comprises:
(1) for the detection task, the derivative data set is extracted from the training set of the radar or sonar detection data set and contains images of various targets and backgrounds; the design and self-training of the detector backbone convolutional network are realized indirectly through this derivative data set;
(2) the derivative data set is extracted entirely from the training set and the validation set of the detection data set; the classification data sets train-DC and val-DC, extracted from the detection training set, are used for training and validating the backbone network, respectively; the test classification data set test-DC, extracted from the detection validation set, is used for testing the classification performance of the backbone network;
(3) to obtain higher self-training classification accuracy on the derivative classification data set with the NAS method, the train-DC data set obtained in item (2) is further divided into train-search and val-search, which are used for training and validation, respectively, during backbone network architecture search.
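As a minimal sketch of the extraction described in this claim, the code below assumes COCO-style detection annotations (an assumed format; the patent does not specify one) and crops every labelled target into a per-class folder; running it separately on the detection training and validation splits would yield the train-DC/val-DC and test-DC sets.

```python
import json
from pathlib import Path
from PIL import Image

def extract_derivative_classification_set(ann_file, img_dir, out_dir):
    """Crop every annotated target from a detection split into per-class folders.

    COCO-style annotations are an assumption for illustration; background patches
    and the train-DC / val-DC / test-DC splits described in this claim would be
    produced the same way from the corresponding detection splits.
    """
    coco = json.loads(Path(ann_file).read_text())
    images = {im["id"]: im["file_name"] for im in coco["images"]}
    classes = {c["id"]: c["name"] for c in coco["categories"]}
    out_dir = Path(out_dir)
    for k, ann in enumerate(coco["annotations"]):
        x, y, w, h = (int(v) for v in ann["bbox"])
        img = Image.open(Path(img_dir) / images[ann["image_id"]])
        patch = img.crop((x, y, x + w, y + h))
        cls_dir = out_dir / classes[ann["category_id"]]
        cls_dir.mkdir(parents=True, exist_ok=True)
        patch.save(cls_dir / f"{k}.png")
```

The resulting train-DC folder tree would then be split again into train-search and val-search for the architecture-search phase described in item (3).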
8. The radar or sonar image target detection and classification method based on automatic deep learning according to claim 2, wherein, in step three, the automatic design and retraining of the optimal convolutional network structure on the derivative classification data set by means of convolutional neural network architecture search comprises:
(1) the search over the overall convolutional network structure is simplified, through functional abstraction, into a structure search within the constituent cells of the convolutional network; the function of the convolutional neural network is abstracted into two types of computing cells, a normal cell and a reduction cell, where the normal cell preserves the feature map size while extracting features and the reduction cell reduces the feature map size and increases the number of feature map channels while extracting features;
(2) operator sets are designed for the two types of cells: O_N(6) for the normal cell and O_R(6) for the reduction cell; the convolutional structure inside each computing cell is modelled as a directed acyclic graph; four computing nodes, numbered 0, 1, 2, and 3, are arranged in each cell, and the operator set is used to connect the computing nodes;
(3) the computation that transforms feature maps inside a cell is constructed for both search and inference; let the feature map represented by each node be x^{(j)}; during network inference, each directed edge (i, j) carries a convolution operation o^{(i,j)} drawn from O_N(6) or O_R(6), which transforms the preceding feature map x^{(i)}, i.e., each node receives all of its preceding nodes as input:

x^{(j)} = \sum_{i<j} o^{(i,j)}\big(x^{(i)}\big)    (1)
During the search, attention is assigned through continuous relaxation to all operations in O_N(6) or O_R(6); that is, the weighted average of the outputs of all operations,

\bar{o}^{(i,j)}\big(x^{(i)}\big) = \sum_{o \in O} \frac{\exp\big(\alpha_o^{(i,j)}\big)}{\sum_{o' \in O} \exp\big(\alpha_{o'}^{(i,j)}\big)}\, o\big(x^{(i)}\big),

is used in place of o^{(i,j)}(x^{(i)});
(4) a memory-friendly network structure search scheme is designed on the basis of Equation (1); assume the preceding feature layer x^{(i)} has M channels; FL-DARTS then randomly samples M/K channels, at sampling rate 1/K, as the sampled features S^{(i,j)} \ast x^{(i)}, while the unsampled features (1 - S^{(i,j)}) \ast x^{(i)} are passed directly to the output; thus the partially weighted features that x^{(j)} receives from x^{(i)} are:

f^{(i,j)}\big(x^{(i)}; S^{(i,j)}\big) = \sum_{o \in O} \frac{\exp\big(\alpha_o^{(i,j)}\big)}{\sum_{o' \in O} \exp\big(\alpha_{o'}^{(i,j)}\big)}\, o\big(S^{(i,j)} \ast x^{(i)}\big) + \big(1 - S^{(i,j)}\big) \ast x^{(i)},    (2)

where S^{(i,j)} denotes the binary channel-sampling mask on edge (i, j);
(5) edge normalization is used, i.e., each edge (i, j) is given an explicit weight β^{(i,j)}; with channel sampling and edge regularization, the computation of x^{(j)} becomes:

x^{(j)} = \sum_{i<j} \frac{\exp\big(\beta^{(i,j)}\big)}{\sum_{i'<j} \exp\big(\beta^{(i',j)}\big)}\, f^{(i,j)}\big(x^{(i)}; S^{(i,j)}\big),    (3)

where β^{(i,j)} is the edge-normalization weight shared by the operations on edge (i, j);
(6) the super-network constructed by Equation (2) or (3) is optimized with the bilevel optimization algorithm of items (7) to (10), and a discrete structure is then extracted from the optimized mixed-operation weights; each output node x^{(j)} selects two input nodes from {x^{(0)}, x^{(1)}, ..., x^{(j-1)}} according to the edge strengths

\max_{o \in O,\, o \neq zero} \frac{\exp\big(\alpha_o^{(i,j)}\big)}{\sum_{o' \in O} \exp\big(\alpha_{o'}^{(i,j)}\big)},

i.e., the mixed operation \bar{o}^{(i,j)} is replaced by its most probable operation; the operation with the largest weight is selected by argmax to obtain the discrete structure:

o^{(i,j)} = \arg\max_{o \in O,\, o \neq zero} \alpha_o^{(i,j)},    (4)

where the mixed-operation weights α on each directed edge can be regarded as the attention distribution of the architecture search algorithm over the convolution operations, i.e., α can be taken as a continuous encoding of the convolutional structure;
(7) a differentiable bilevel optimization scheme for convolutional network architecture search is constructed; in Equations (2) and (3), the attention over the mixed operations is modelled by a conditional probability p_o, which is parameterized by the vector α^{(i,j)} of dimension |O|, so the architecture search problem is reduced to learning the set of relaxed continuous vectors α = {α^{(i,j)}}; let L_train denote the training-set loss and L_val the validation-set loss; after the operation weights are continuously relaxed, the architecture parameters α and the network weights w are learned jointly; the validation-set classification accuracy is taken as the final reward (goodness of fit), the bilevel objective is to minimize the validation-set loss, and gradient descent with momentum is used as the optimizer; the core principle of the architecture search is to determine, by gradient optimization, the optimal network architecture α^o that minimizes the validation-set loss L_val(w^o(α^o), α^o), namely:

α^o = \arg\min_{α} L_{val}\big(w^o(α), α\big)    (5)

where the network weights w^o = w^o(α) used when computing the validation-set loss are obtained by minimizing the training-set loss, namely:

w^o = w^o(α) = \arg\min_{w} L_{train}(w, α)    (6)
For the super-network model during architecture search, not only the mixed-operation weights α on the directed edges, which encode the network structure, need to be optimized, but also the weights w of the super-network itself need to be learned; the architecture optimization is therefore carried out with the following bilevel optimization formulation:

\min_{α} L_{val}\big(w^o(α), α\big)
s.t.\; w^o(α) = \arg\min_{w} L_{train}(w, α)    (7)

where α is the outer-level structure optimization variable and w is the inner-level network weight optimization variable;
(8) the bilevel optimization problem of Equation (7) is solved with performance evaluation accelerated in a one-shot manner; in each round of inner-level model learning, only a single training step is performed to obtain the approximate weights w^*(α) of the current super-network; under this approximation, the gradient used when training the inner-level model is:

\nabla_w L_{train}(w, α)    (8)

and, with ξ the learning rate of the inner-level optimization, the approximate weights w^*(α) of the current structure α obtained after the single learning step are:

w^*(α) = w - ξ\, \nabla_w L_{train}(w, α)    (9)
(9) after each inner-level optimization step, the current structure is evaluated on the validation set by computing its classification loss L_val(w^*(α), α); the outer-level optimization is then carried out continuously by gradient descent, updating the structure weights α so as to achieve higher classification accuracy; the outer-level optimization descends along the gradient of the validation-set classification loss, \nabla_α L_{val}(w^*(α), α), which has the following form:

\nabla_α L_{val}\big(w - ξ\, \nabla_w L_{train}(w, α),\, α\big)    (10)

when optimizing the outer-level structure, only a standard gradient descent method is used;
(10) unlike common neural architecture search, multiple auxiliary classification branches are employed during the search to stabilize the architecture search process, with the weights of the two auxiliary branches set to 0.4 and 0.2, respectively;
(11) the optimal discrete structure is extracted, and the final structure is retrained on the RS (radar/sonar) derivative classification data set to obtain the optimal network weights;
(12) if a detection task is to be performed, the retrained network structure is pruned: the redundant classification head and auxiliary classification branches are removed, and the remaining multi-scale features are combined to form a standard multi-scale feature pyramid.
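To make the alternating optimization of Equations (5)-(10) concrete, the following is a minimal first-order PyTorch sketch of the search loop; SuperNet and its arch_parameters() accessor, the loader names, and the learning rates are assumptions made for illustration, and the single-step approximation of item (8) is shown in its simplest first-order form (ξ = 0).

```python
import torch
import torch.nn.functional as F

def search(super_net, train_loader, val_loader, epochs=50):
    """Alternate architecture (outer) and weight (inner) updates, DARTS-style.

    super_net.parameters() are the network weights w; super_net.arch_parameters()
    (a hypothetical accessor) returns the relaxed operation weights alpha and the
    edge weights beta of Equations (2)-(3).
    """
    w_opt = torch.optim.SGD(super_net.parameters(), lr=0.025, momentum=0.9)
    a_opt = torch.optim.Adam(super_net.arch_parameters(), lr=3e-4)
    for _ in range(epochs):
        for (x_tr, y_tr), (x_val, y_val) in zip(train_loader, val_loader):
            # Outer step: update alpha/beta on a validation batch (Eqs. 5 and 10).
            a_opt.zero_grad()
            F.cross_entropy(super_net(x_val), y_val).backward()
            a_opt.step()
            # Inner step: one step on a training batch approximates w*(alpha)
            # (Eqs. 6 and 9, first-order simplification).
            w_opt.zero_grad()
            F.cross_entropy(super_net(x_tr), y_tr).backward()
            w_opt.step()
    # Discretization (Eq. 4): keep the argmax operation on each retained edge
    # and the two strongest input edges per node (not shown here).
```

The auxiliary classification branches of item (10) would contribute additional weighted terms to both cross-entropy losses; they are omitted here for brevity.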
9. A radar or sonar image target detection and classification control system based on automatic deep learning, implementing the radar or sonar image target detection and classification method based on automatic deep learning according to any one of claims 1 to 8.
10. Use of the radar or sonar image target detection and classification method based on automatic deep learning according to any one of claims 1 to 8 in visual information processing for land, seabed, and aerial observation areas.
CN202111107594.2A 2021-09-22 2021-09-22 Radar or sonar image target detection and classification method based on automatic deep learning Pending CN113989655A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111107594.2A CN113989655A (en) 2021-09-22 2021-09-22 Radar or sonar image target detection and classification method based on automatic deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111107594.2A CN113989655A (en) 2021-09-22 2021-09-22 Radar or sonar image target detection and classification method based on automatic deep learning

Publications (1)

Publication Number Publication Date
CN113989655A true CN113989655A (en) 2022-01-28

Family

ID=79736247

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111107594.2A Pending CN113989655A (en) 2021-09-22 2021-09-22 Radar or sonar image target detection and classification method based on automatic deep learning

Country Status (1)

Country Link
CN (1) CN113989655A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117173549A (en) * 2023-08-22 2023-12-05 中国科学院声学研究所 Multi-scale target detection method and system for synthetic aperture sonar image under complex scene
CN117173549B (en) * 2023-08-22 2024-03-22 中国科学院声学研究所 Multi-scale target detection method and system for synthetic aperture sonar image under complex scene
CN117173551A (en) * 2023-11-02 2023-12-05 佛山科学技术学院 Scene self-adaptive unsupervised underwater weak and small target detection method and system
CN117173551B (en) * 2023-11-02 2024-02-09 佛山科学技术学院 Scene self-adaptive unsupervised underwater weak and small target detection method and system

Similar Documents

Publication Publication Date Title
CN109919108B (en) Remote sensing image rapid target detection method based on deep hash auxiliary network
CN108596053B (en) Vehicle detection method and system based on SSD and vehicle posture classification
US10275719B2 (en) Hyper-parameter selection for deep convolutional networks
CN112101430B (en) Anchor frame generation method for image target detection processing and lightweight target detection method
CN114241282A (en) Knowledge distillation-based edge equipment scene identification method and device
CN114841257B (en) Small sample target detection method based on self-supervision comparison constraint
CN110348447B (en) Multi-model integrated target detection method with abundant spatial information
CN111898685B (en) Target detection method based on long tail distribution data set
CN113989655A (en) Radar or sonar image target detection and classification method based on automatic deep learning
CN111612051A (en) Weak supervision target detection method based on graph convolution neural network
CN111126278A (en) Target detection model optimization and acceleration method for few-category scene
CN113807188A (en) Unmanned aerial vehicle target tracking method based on anchor frame matching and Simese network
Teimouri et al. A real-time ball detection approach using convolutional neural networks
CN111192240B (en) Remote sensing image target detection method based on random access memory
CN116452818A (en) Small sample remote sensing image target detection method based on feature enhancement
CN116912796A (en) Novel dynamic cascade YOLOv 8-based automatic driving target identification method and device
CN113723572B (en) Ship target identification method, computer system, program product and storage medium
CN113128564B (en) Typical target detection method and system based on deep learning under complex background
CN112819100A (en) Multi-scale target detection method and device for unmanned aerial vehicle platform
CN111401405B (en) Image classification method and system integrated by multiple neural networks
CN116403133A (en) Improved vehicle detection algorithm based on YOLO v7
Rao et al. Roads detection of aerial image with FCN-CRF model
CN114120367B (en) Pedestrian re-recognition method and system based on circle loss measurement under meta-learning framework
CN114120208A (en) Flame detection method, device, equipment and storage medium
Cao et al. Separable-programming based probabilistic-iteration and restriction-resolving correlation filter for robust real-time visual tracking

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination