CN113359135A - Training method, application method, device and medium for imaging and recognition model - Google Patents

Info

Publication number
CN113359135A
Authority
CN
China
Prior art keywords
imaging
model
recognition model
training
target
Prior art date
Legal status
Granted
Application number
CN202110766344.3A
Other languages
Chinese (zh)
Other versions
CN113359135B (en)
Inventor
胡晓伟
郭艺夺
冯为可
何兴宇
王宇晨
冯存前
Current Assignee
Air Force Engineering University of PLA
Original Assignee
Air Force Engineering University of PLA
Priority date
Filing date
Publication date
Application filed by Air Force Engineering University of PLA
Priority to CN202110766344.3A
Publication of CN113359135A
Application granted
Publication of CN113359135B
Legal status: Active

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01S RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S 13/00 Systems using the reflection or reradiation of radio waves, e.g. radar systems; analogous systems using reflection or reradiation of waves whose nature or wavelength is irrelevant or unspecified
    • G01S 13/88 Radar or analogous systems specially adapted for specific applications
    • G01S 13/89 Radar or analogous systems specially adapted for mapping or imaging
    • G01S 13/90 Radar or analogous systems for mapping or imaging using synthetic aperture techniques, e.g. synthetic aperture radar [SAR] techniques
    • G01S 13/9021 SAR image post-processing techniques
    • G01S 7/00 Details of systems according to groups G01S 13/00, G01S 15/00, G01S 17/00
    • G01S 7/02 Details of systems according to group G01S 13/00
    • G01S 7/41 Using analysis of echo signal for target characterisation; target signature; target cross-section
    • G01S 7/417 Involving the use of neural networks
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; extraction of features in feature space; blind source separation
    • G06F 18/214 Generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F 18/24 Classification techniques
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G06N 3/08 Learning methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00 Road transport of goods or passengers
    • Y02T 10/10 Internal combustion engine [ICE] based vehicles
    • Y02T 10/40 Engine management systems


Abstract

The embodiment of the application provides a training method, an application method, a device and a medium for an imaging and recognition model. The method comprises the following steps: acquiring labeling data used for training an imaging and recognition model, wherein the labeling data comprise imaging-result labels and classification-result labels for target echo signals acquired by a radar; and training an imaging and recognition model to be trained according to the labeling data and a loss function to obtain a target imaging and recognition model, wherein the input of the imaging and recognition model is the target echo signal, the output is the classification result predicted for the target echo signal, and the loss function is characterized by the predicted classification result and the labeled classification result.

Description

Training method, application method, device and medium for imaging and recognition model
Technical Field
The embodiment of the application relates to the field of echo imaging identification, in particular to a training method, an application method, a device and a medium of an imaging and identification model.
Background
In the related art, in Synthetic Aperture Radar (SAR) / Inverse Synthetic Aperture Radar (ISAR) automatic target recognition, the imaging model and the recognition model are usually trained separately, each with its own loss function. The target image obtained from the trained imaging model therefore usually has good sparsity, but such a highly sparse target image reduces recognition accuracy in the subsequent classification stage of the recognition model. Moreover, conventional imaging algorithms require a large number of iterations to obtain a sparse solution, which also degrades recognition and classification performance.
Therefore, how to acquire a target image more beneficial to classification according to a target echo signal becomes a problem to be solved urgently.
Disclosure of Invention
The embodiments of the present application provide a training method, an application method, an apparatus, and a medium for an imaging and recognition model, which enable the imaging module to generate a target image that is optimal for the recognition process, thereby maximizing classification accuracy.
In a first aspect, an embodiment of the present application provides a training method for an imaging and recognition model, where the training method includes: obtaining labeling data used for training an imaging and recognition model, wherein the labeling data comprise imaging-result labels and classification-result labels for target echo signals acquired by a radar; and training an imaging and recognition model to be trained according to the labeling data and a loss function to obtain a target imaging and recognition model, wherein the input of the imaging and recognition model is the target echo signal, the output is the classification result predicted for the target echo signal, and the loss function is characterized by the predicted classification result and the labeled classification result.
Therefore, the loss function in the embodiment of the present application is characterized by the predicted classification result and the labeled classification result of the input target echo data, rather than by the predicted imaging result of the target echo signal as used in the related art to judge whether the imaging model has been trained. By training the imaging and recognition model, which comprises the imaging model and the recognition model, as a whole, the imaging module can generate the target image that is optimal for the recognition process, so that the classification accuracy can be improved to the maximum extent.
With reference to the first aspect, in one embodiment, the imaging and recognition model includes an imaging model and a recognition model; the method for training the imaging and recognition model to be trained according to the labeling data and the loss function to obtain the target imaging and recognition model comprises the following steps: training an imaging model to be trained in a multi-iteration mode to obtain a value of a target parameter corresponding to the imaging model, completing preliminary training of the imaging model, and obtaining an initial imaging model, wherein the type of the target parameter at least comprises: step size and soft threshold; and training the imaging and recognition model by using the imaging result output by the initial imaging model and the loss function to obtain the target imaging and recognition model.
Therefore, different from a traditional Sparse Recovery (SR) algorithm with manually set parameters, in the embodiment of the present application, a target parameter value corresponding to an imaging model is obtained through training, and the imaging model is determined, so that the imaging model can generate an optimal target image beneficial to recognition and classification, and thus an imaging and recognition model with high classification accuracy can be obtained according to the target parameter value.
With reference to the first aspect, in one embodiment, the imaging model includes: a primary image generation module and a target image generation module; the training the imaging model to be trained in a multiple iteration mode to obtain a target parameter value corresponding to the imaging model includes: in the kth iteration, inputting the target echo signal into the primary image generation module to obtain a kth primary image, wherein k is an integer greater than or equal to 1; performing sparse domain transformation on the kth primary image according to the target image generation module to obtain a kth target image; and repeating the steps until a first termination condition is met or a set cycle number is reached, and obtaining a target parameter value corresponding to the imaging model.
Therefore, the primary image generated by the primary image generation module is input into the target image generation module for training, so that sparse domain transformation and inverse transformation are realized, the problem of poor effect caused by sparse domain transformation using an orthogonal matrix in the related technology can be solved at least, and meanwhile, the accuracy and identifiability of the target image generated by the imaging model can be improved in the training process.
With reference to the first aspect, in one implementation, the target image generation module includes: a transformation module and an inverse transformation module; the transformation module comprises in sequence: the device comprises an input layer, a first convolution layer, a first linear rectifying layer, a first pooling layer, a second convolution layer and an output layer; the inverse transformation module sequentially comprises: an input layer, a third convolutional layer, a second linear rectifying layer, a second pooling layer, a fourth convolutional layer and an output layer.
Therefore, the embodiment of the application can obtain parameters with a more flexible structure by training the transformation module (in the related art, only the two-dimensional orthogonal matrix is used for sparse domain transformation); by training the inverse transformation module, the primary image in the sparse domain can be inversely transformed to the image domain, so that the target image with high identifiability is generated.
With reference to the first aspect, in an implementation manner, before the imaging model to be trained is trained in a multiple iteration manner and a target parameter value corresponding to the imaging model is obtained, the method further includes: pre-training an imaging model to be pre-trained according to the imaging result marking data to obtain an ith pre-training result, wherein i is an integer greater than or equal to 1; and repeating the steps, and obtaining the imaging model to be trained when the ith pre-training result is confirmed to meet a second termination condition.
With reference to the first aspect, in an embodiment, the determining that the ith pre-training result satisfies the second termination condition includes determining that the second termination condition is satisfied according to an imaging loss function, where the imaging loss function is used to measure a difference between a predicted image and a standard image.
Therefore, according to the embodiment of the application, the imaging model to be pre-trained is pre-trained before the imaging and recognition model to be trained is trained, so that high-quality initial parameters can be obtained for the imaging model to be trained, the iteration times are reduced in the process of training the imaging and recognition model to be trained, and the precision and the training efficiency are improved.
In a second aspect, an embodiment of the present application provides an application method of imaging and recognition, where the method includes: acquiring a target echo signal acquired by a radar; inputting the target echo signal into the imaging and recognition model obtained by the training method according to any one of the first aspect and the embodiments thereof, and obtaining an imaging result and/or a classification result.
In a third aspect, an embodiment of the present application provides a training apparatus for an imaging recognition model, where the training apparatus includes: the data acquisition module is configured to acquire labeling data used for training an imaging and recognition model, wherein the labeling data is imaging result labeling and classification result labeling of a target echo signal acquired by a radar; and the model training module is configured to train an imaging and recognition model to be trained according to the labeling data and a loss function to obtain the imaging and recognition model, wherein the input of the imaging and recognition model is the target echo signal, the output of the imaging and recognition model is a classification result predicted for the target echo signal, and the loss function is represented by the predicted classification result and the labeled classification result.
With reference to the third aspect, in one embodiment, the imaging and recognition model includes an imaging model and a recognition model; the model training module is further configured to: training an imaging model to be trained in a multi-iteration mode to obtain a target parameter value corresponding to the imaging model, wherein the type of the target parameter at least comprises: step size and soft threshold; and obtaining the imaging and recognition model according to the target parameter value and the loss function.
With reference to the third aspect, in one embodiment, the imaging model includes: a primary image generation module and a target image generation module; wherein the model training module is further configured to: in the kth iteration, inputting the target echo signal into the primary image generation module to obtain a kth primary image, wherein k is an integer greater than or equal to 1; performing sparse domain transformation on the kth primary image according to the target image generation module to obtain a kth target image; and repeating the steps until a first termination condition is met or a set cycle number is reached, and obtaining a target parameter value corresponding to the imaging model.
With reference to the third aspect, in one embodiment, the target image generation module includes: a transformation module and an inverse transformation module; the transformation module comprises in sequence: the device comprises an input layer, a first convolution layer, a first linear rectifying layer, a first pooling layer, a second convolution layer and an output layer; the inverse transformation module sequentially comprises: an input layer, a third convolutional layer, a second linear rectifying layer, a second pooling layer, a fourth convolutional layer and an output layer.
With reference to the third aspect, in an embodiment, the model training module is further configured to: pre-training an imaging model to be pre-trained according to the imaging result marking data to obtain an ith pre-training result, wherein i is an integer greater than or equal to 1; and repeating the steps, and obtaining the imaging model to be trained when the ith pre-training result is confirmed to meet a second termination condition.
With reference to the third aspect, in an embodiment, the model training module is further configured to: determining that the second termination condition is satisfied according to an imaging loss function, wherein the imaging loss function is used for measuring the difference between the predicted image and the standard image.
In a fourth aspect, an embodiment of the present application provides an electronic device, including: a processor, a memory, and a bus; the processor is connected to the memory via the bus, and the memory stores computer readable instructions for implementing the method according to any one of the first and second aspects when the computer readable instructions are executed by the processor.
In a fifth aspect, the present application provides a computer-readable storage medium, on which a computer program is stored, and the computer program, when executed, implements the method of any one of the first and second aspects.
In a sixth aspect, embodiments of the present application provide a system comprising one or more computers and one or more storage devices storing instructions that, when executed by the one or more computers, cause the one or more computers to perform operations of the respective methods according to any one of the first aspects.
In a seventh aspect, embodiments of the present application provide one or more computer storage media storing instructions that, when executed by one or more computers, cause the one or more computers to perform the operations of the respective methods according to any one of the first aspects.
Drawings
FIG. 1 is a diagram illustrating an application of an imaging and recognition model according to an embodiment of the present application;
fig. 2 is a network structure of a transformation module according to an embodiment of the present disclosure;
fig. 3 is a network structure of an inverse transform module according to an embodiment of the present application;
FIG. 4 is a network structure of a recognition model according to an embodiment of the present application;
FIG. 5 is a flowchart illustrating a training method of an imaging and recognition model according to an embodiment of the present disclosure;
FIG. 6 illustrates an embodiment of training of an imaging and recognition model according to the present application;
FIG. 7 illustrates an application of an imaging and recognition model according to an embodiment of the present application;
FIG. 8 is a block diagram of an imaging and recognition model training apparatus according to an embodiment of the present disclosure;
fig. 9 is an electronic device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all the embodiments. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present application, as presented in the figures, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present application without making any creative effort, shall fall within the protection scope of the present application.
The method steps in the embodiments of the present application are described in detail below with reference to the accompanying drawings.
The method and the device can be applied to scenes of imaging and identifying various target echo signals. For example, as shown in fig. 1, in one scenario, the aircraft 120 is provided with a radar and a target imaging and recognition model, wherein the radar detects an object on the ground to obtain a target echo signal 110 collected by the radar, and then the target imaging and recognition model performs target recognition according to the target echo data collected by the radar to output a recognition result. It is understood that in other embodiments, the imaging result and the classification result 130 may be output simultaneously.
The following takes an airplane as an example to identify a ground object, and exemplarily illustrates the problems in the related art. In the related art, an imaging module and a recognition module of a target echo signal acquired by an aircraft radar are usually trained respectively, so that a target image obtained in the imaging module usually has better sparse performance, but the target image with better sparse performance can cause the reduction of recognition accuracy in the subsequent recognition and classification process, and in the traditional imaging algorithm, a two-dimensional orthogonal matrix is used for sparse domain transformation, so that the imaging effect is poor.
At least in order to solve the above problems, some embodiments of the present application improve the loss function used to decide whether training can be finished, and thereby improve the recognition performance of the model. For example, in some embodiments of the present application, the loss function is characterized by the predicted classification and the labeled classification of the input target echo data; training the imaging and recognition model to be trained with this loss function addresses the problem of low recognition accuracy. Some embodiments of the present application further input the primary image into the target image generation module for training so as to obtain parameters with a more flexible structure, implementing the sparse-domain transformation and inverse transformation and solving the problem of poor imaging effect caused by using a two-dimensional orthogonal matrix for sparse-domain transformation in the related art.
The following describes an example of a process of imaging a target echo signal in the related art.
In the related art, according to the Range-Doppler Algorithm (RDA), after motion compensation is performed on the target echo signal, a two-dimensional Fourier transform in frequency and azimuth relates the echo to the target image. This relationship is expressed as formula (1):

$E = F_1 X F_2^T + N$ (1)

where $E$ denotes the fully sampled target echo signal, $X$ denotes the target image, $N$ denotes the received noise, and $F_1$, $F_2$ are normalized Fourier matrices.
Considering the sparse sampling condition, only part of the frequencies and azimuth angles of the fully sampled echo are sampled to obtain the target echo signal, which can be expressed as formula (2):

$Y = \Phi_1 E \Phi_2^T + N' = \Psi_1 X \Psi_2^T + N'$ (2)

where $Y$ represents the target echo signal, $E$ represents the fully sampled target echo signal, $\Phi_1$ and $\Phi_2$ are the frequency and angle sparse-sampling matrices respectively, $N'$ represents the sampled noise matrix, $X$ represents the target image, and $\Psi_1 = \Phi_1 F_1$, $\Psi_2 = \Phi_2 F_2$.
If the target image $X$ is sufficiently sparse, $X$ can be recovered from the target echo signal by solving formula (3):

$\hat{X} = \arg\min_{X} \frac{1}{2}\left\|Y - \Psi_1 X \Psi_2^T\right\|_F^2 + \lambda \left\|X\right\|_1$ (3)

where $\|\cdot\|_F$ is the Frobenius norm of a matrix, $\lambda$ is the regularization parameter, and $\|\cdot\|_1$ denotes the sum of the absolute values of the matrix elements. Formula (3) can be solved using various SR algorithms, which typically require a large number of iterations. Letting $\mathrm{SR}^{(k)}$ denote the operation in the k-th iteration, the output of the k-th iteration is expressed as:
$X^{(k)} = \mathrm{SR}^{(k)}\left(X^{(k-1)} \mid Y, \Psi_1, \Psi_2, \lambda\right)$ (4)

where $K$ denotes the maximum number of iterations and $k$ is an integer greater than or equal to 1.
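To make the iteration in formula (4) concrete, the following is a minimal NumPy sketch of one ISTA-style SR step for problem (3): a gradient step on the data-fidelity term followed by complex soft thresholding. The function name and the hand-set step size `rho` and regularization weight `lam` are illustrative, as in the conventional SR algorithms discussed here.

```python
import numpy as np

def ista_step(X, Y, Psi1, Psi2, rho, lam):
    """One SR iteration for formula (3): gradient step + complex soft threshold."""
    # Gradient step on 0.5 * ||Y - Psi1 @ X @ Psi2.T||_F^2
    R = X + rho * (Psi1.conj().T @ (Y - Psi1 @ X @ Psi2.T) @ Psi2.conj())
    # Complex soft threshold with threshold rho * lam
    mag = np.abs(R)
    return np.maximum(mag - rho * lam, 0.0) * np.exp(1j * np.angle(R))

# e.g. starting from X = np.zeros((N1, N2), dtype=complex) and repeating K times
```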
Meanwhile, in order to handle imaging of complex scenes, the related art generally transforms the target image obtained by formula (3) into another sparse domain, defining two orthogonal matrices $\omega_1$ and $\omega_2$ as image transformation matrices so that the sparse-domain representation of the image is $\omega_1 X \omega_2^T$. Formula (3) is then replaced by formula (5) to solve for the target image:

$\hat{X} = \arg\min_{X} \frac{1}{2}\left\|Y - \Psi_1 X \Psi_2^T\right\|_F^2 + \lambda \left\|\omega_1 X \omega_2^T\right\|_1$ (5)
selecting an appropriate transformation matrix is a challenge for imaging target echo signals. In the related art, the Discrete Wavelet Transform (DWT) is generally used for transforming an image domain and a sparse domain, but the discrete wavelet transform is a sparse transform commonly used for natural images, and has not ideal effect in imaging of target echo signals acquired by radar.
In combination with the above description of the related art, on one hand, the recognition accuracy is reduced due to the fact that the imaging model and the recognition model of the related art are trained respectively, and on the other hand, the problem of poor imaging effect is caused due to the fact that the related art uses a two-dimensional orthogonal matrix to perform sparse domain transformation.
In order to solve the problems in the related art, embodiments of the present application provide a training method, an application method, an apparatus, and a medium for an imaging and recognition model.
The imaging and recognition models constructed by some embodiments of the present application are first described below.
The imaging and recognition model in the embodiments of the present application includes: the imaging model is configured to image according to a target echo signal acquired by the radar to obtain a target image; the identification model is configured to identify the classification of the target image and obtain a classification result; wherein, the imaging module includes: the primary image generation module is configured to image according to the target echo signal to obtain a primary image; and the target image generation module is configured to perform sparse domain transformation and inverse transformation on the primary image to obtain a target image.
That is, the imaging model of some embodiments of the present application is represented by a primary image generation module and a target image generation module. The primary image generation module is expressed by formula (6):

$R^{(k)} = X^{(k-1)} + \rho^{(k)} \Psi_1^H \left(Y - \Psi_1 X^{(k-1)} \Psi_2^T\right) \Psi_2^*$ (6)

where $R^{(k)}$ denotes the k-th primary image generated in the k-th iteration; $X^{(k-1)}$ denotes the (k-1)-th target image generated in the (k-1)-th iteration; $\rho^{(k)}$ denotes the k-th step-size parameter generated in the k-th iteration; $\Psi_1 = \Phi_1 F_1$ and $\Psi_2 = \Phi_2 F_2$, where $F_1$ and $F_2$ are normalized Fourier matrices, $\Phi_1$ is the frequency sparse-sampling matrix and $\Phi_2$ is the angle sparse-sampling matrix; $Y$ denotes the target echo signal; $k$ denotes the iteration index and is an integer greater than or equal to 1.

The target image generation module is expressed by formula (7):

$X^{(k)} = \tilde{F}^{(k)}\left(\mathrm{Csoft}\left(F^{(k)}\left(R^{(k)}\right), \theta^{(k)}\right)\right)$ (7)

where $X^{(k)}$ denotes the k-th target image generated in the k-th iteration; $\tilde{F}^{(k)}$ denotes the k-th inverse transformation performed by the inverse transformation module in the imaging model; $\mathrm{Csoft}(\cdot)$ denotes the complex shrinkage (soft-threshold) function; $F^{(k)}$ denotes the k-th sparse-domain transformation performed by the transformation module in the imaging model; and $\theta^{(k)}$ denotes the k-th soft-threshold parameter generated in the k-th iteration.
The target image generation module includes a transformation module and an inverse transformation module. As shown in fig. 2, the transformation module is composed of an input layer 310, a first convolution layer 320, a first linear rectifying layer 330, a first pooling layer 340, an index passing module 350, a second convolution layer 360 and an output layer 370 in sequence; wherein, the first convolutional layer 320(ComConv) and the second convolutional layer 360(ComConv) are complex convolutional layers with a size of 3 × 3, the first linear rectifying layer 330(ComReLU) is a complex linear rectifying layer, the first pooling layer 340(ComMaxPool) is a complex pooling layer with a size of 2 × 2, and the index passing module 350 is used for passing the index parameters to the inverse transform module.
That is, as shown in fig. 3, the inverse transform module is composed of an input layer 410, a third convolution layer 420, a second linear rectification layer 430, an index passing module 440, a second pooling layer 450, a fourth convolution layer 460, and an output layer 470 in this order; wherein, the third convolutional layer 420(ComConv) and the fourth convolutional layer 460(ComConv) are complex convolutional layers with a size of 3 × 3, the second linear rectifying layer 430(ComReLU) is a complex linear rectifying layer, the second pooling layer 450(ComMaxPool) is a complex pooling layer with a size of 2 × 2, and the index passing module 440 is configured to receive the index parameters passed by the transform module.
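The embodiment text specifies the layer order but not how the complex layers are realized. Below is a minimal PyTorch sketch, assuming that each complex convolution is built from two real convolutions, that the complex linear rectification acts on the real and imaginary parts separately, and that the complex pooling keeps the largest-magnitude entry and hands its indices to the inverse module; all class and function names are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ComConv(nn.Module):
    """3x3 complex convolution built from two real convolutions."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.wr = nn.Conv2d(in_ch, out_ch, 3, padding=1, bias=False)
        self.wi = nn.Conv2d(in_ch, out_ch, 3, padding=1, bias=False)

    def forward(self, x):                          # x: complex tensor (N, C, H, W)
        xr, xi = x.real, x.imag
        return torch.complex(self.wr(xr) - self.wi(xi),
                             self.wi(xr) + self.wr(xi))


def com_relu(x):                                   # rectify real and imaginary parts separately
    return torch.complex(F.relu(x.real), F.relu(x.imag))


def com_maxpool(x):                                # 2x2 pooling by magnitude, keep indices
    _, idx = F.max_pool2d(x.abs(), 2, return_indices=True)
    flat, ii = x.flatten(2), idx.flatten(2)
    pooled = torch.complex(flat.real.gather(2, ii), flat.imag.gather(2, ii))
    return pooled.reshape(idx.shape), idx


def com_unpool(x, idx, out_size):                  # undo the pooling with the stored indices
    r = F.max_unpool2d(x.real, idx, 2, output_size=out_size)
    i = F.max_unpool2d(x.imag, idx, 2, output_size=out_size)
    return torch.complex(r, i)


class SparseTransform(nn.Module):
    """Transformation module F: image domain -> learned sparse domain."""
    def __init__(self, ch=32):
        super().__init__()
        self.conv1, self.conv2 = ComConv(1, ch), ComConv(ch, ch)

    def forward(self, x):
        x = com_relu(self.conv1(x))
        x, idx = com_maxpool(x)                    # indices handed to the inverse module
        return self.conv2(x), idx


class InverseTransform(nn.Module):
    """Inverse transformation module ~F: sparse domain -> image domain."""
    def __init__(self, ch=32):
        super().__init__()
        self.conv1, self.conv2 = ComConv(ch, ch), ComConv(ch, 1)

    def forward(self, x, idx, out_size):
        x = com_relu(self.conv1(x))
        x = com_unpool(x, idx, out_size)
        return self.conv2(x)
```

In the embodiment, the transformation and inverse transformation modules are trained together with the rest of the network, so the sparse domain itself is learned rather than fixed to a two-dimensional orthogonal matrix.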
As shown in fig. 4, the recognition model is a convolutional network, and is formed by sequentially connecting an input layer 511, a fifth convolutional layer 520, a first average pooling layer 530, a sixth convolutional layer (including a linear rectifying unit) 540, a maximum pooling layer 550, an eighth convolutional layer (including a linear rectifying unit) 560, a second average pooling layer 570, a ninth convolutional layer 580, a third average pooling layer 590, and an output layer 512; wherein, the sizes of convolution kernels in the convolution layers are all 5 multiplied by 5 or 3 multiplied by 3, and the sliding step length is 1; the sixth convolution layer and the eighth convolution layer adopt batch standard operation, and a linear rectifying unit is used for rectifying; the window size of the first and second average pooling layers was 2 × 2, and the step size was 2.
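Following the layer sizes just described, a compact PyTorch sketch of the recognition model is given below; the dropout rate and the class count are assumptions, and the Softmax classifier is applied through the loss during training rather than inside the network.

```python
import torch.nn as nn


class RecognitionNet(nn.Module):
    """CNN classifier sketch following the layer sizes described above."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(2, 32, 5),                                  # 64x64x2  -> 60x60x32
            nn.AvgPool2d(2, 2),                                   #          -> 30x30x32
            nn.Conv2d(32, 64, 3), nn.BatchNorm2d(64), nn.ReLU(),  #          -> 28x28x64
            nn.MaxPool2d(2, 2),                                   #          -> 14x14x64
            nn.Conv2d(64, 64, 3), nn.BatchNorm2d(64), nn.ReLU(),  #          -> 12x12x64
            nn.AvgPool2d(2, 2),                                   #          -> 6x6x64
            nn.Conv2d(64, 128, 5),                                #          -> 2x2x128
            nn.AvgPool2d(2, 2),                                   #          -> 1x1x128
        )
        self.classifier = nn.Sequential(nn.Flatten(), nn.Dropout(0.5),
                                        nn.Linear(128, num_classes))

    def forward(self, x):                          # x: (N, 2, 64, 64) real tensor
        return self.classifier(self.features(x))  # class logits
```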
It should be noted that the parameters (such as the size and the step size of the convolution kernel, etc.) set in the above imaging and recognition model are not limited to this, and may also take any other values.
The specific structure of the imaging and recognition model constructed by some embodiments of the present application is described above; the method for training the imaging and recognition model, which is executed by an electronic device (or system) in the embodiments of the present application, will be described in detail below with reference to the accompanying drawings.
As shown in fig. 5, a training method for an imaging and recognition model provided in an embodiment of the present application includes: s210, acquiring labeling data for training an imaging and recognition model; and S220, training the imaging and recognition model to be trained according to the labeling data and the loss function to obtain the imaging and recognition model.
The implementation process of the above steps is exemplarily set forth below.
In one embodiment, the labeling data related to S210 is imaging result labeling and classification result labeling for target echo signals acquired by radar.
That is, the labeling data may include multiple sets of target echo data, imaging result labels, and classification result labels, or may include only multiple sets of imaging result labels and classification result labels.
It should be noted that, the radar (for example, synthetic aperture radar) may be installed on an airplane to collect the target echo signal on the ground, and may also be installed on a satellite to collect the target echo signal on the ground; the radar (such as inverse synthetic aperture radar) can also be arranged on the ground to collect target echo signals of objects such as airplanes, ships, missiles and the like.
An embodiment of S220 performed by the electronic device will be described below.
In order to improve the convergence capability of the imaging and recognition model, in some embodiments of the present application, before S220, the imaging model to be pre-trained having the above model structure needs to be pre-trained according to the imaging result labeling data, so as to obtain an ith pre-training result, where i is an integer greater than or equal to 1; and repeating the steps, and obtaining the imaging model to be trained when the ith pre-training result is confirmed to meet a second termination condition.
That is, before the imaging and recognition model to be trained is trained as a whole, the imaging model to be pre-trained needs to be pre-trained, as expressed by formula (8):

$R^{(i)} = X^{(i-1)} + \rho^{(i)} \Psi_1^H \left(Y - \Psi_1 X^{(i-1)} \Psi_2^T\right) \Psi_2^*, \qquad X^{(i)} = \tilde{F}^{(i)}\left(\mathrm{Csoft}\left(F^{(i)}\left(R^{(i)}\right), \theta^{(i)}\right)\right)$ (8)

where $R^{(i)}$ denotes the i-th primary image generated in the i-th iteration; $X^{(i-1)}$ denotes the (i-1)-th target image generated in the (i-1)-th iteration; $\rho^{(i)}$ denotes the i-th step-size parameter generated in the i-th iteration; $\Psi_1 = \Phi_1 F_1$ and $\Psi_2 = \Phi_2 F_2$, where $F_1$ and $F_2$ are normalized Fourier matrices, $\Phi_1$ is the frequency sparse-sampling matrix and $\Phi_2$ is the angle sparse-sampling matrix; $Y$ denotes the target echo signal; $X^{(i)}$ denotes the i-th target image generated in the i-th iteration; $\tilde{F}^{(i)}$ denotes the i-th inverse transformation performed by the inverse transformation module in the imaging model; $\mathrm{Csoft}(\cdot)$ denotes the complex shrinkage function; $F^{(i)}$ denotes the i-th sparse-domain transformation performed by the transformation module in the imaging model; and $\theta^{(i)}$ denotes the i-th soft-threshold parameter generated in the i-th iteration.
In the process of pre-training the imaging model to be pre-trained, the 2nd iteration (i.e. i = 2) is taken as an example to illustrate the pre-training process: the target echo signal and the 1st target image $X^{(1)}$ generated in the first iteration are input into formula (8), and the 2nd primary image $R^{(2)}$ is calculated while the 2nd step-size parameter $\rho^{(2)}$ is obtained; the 2nd primary image is input into the network structure of the transformation module, which outputs the transformed sparse-domain image; the sparse-domain image is processed by the complex shrinkage function, and the 2nd soft-threshold parameter is obtained at the same time; the sparse-domain image processed by the complex shrinkage function and the 2nd soft-threshold parameter are then input into the inverse transformation module, which inversely transforms the sparse-domain image back to the image domain, yielding the 2nd target image generated in the 2nd iteration.
After the 2nd target image is obtained, it is compared with the imaging-result labeling data using the imaging loss function shown in formula (9), which measures the difference between the predicted image (i.e. the i-th target image) and the imaging-result labeling data. When the second termination condition is confirmed to be met, the iteration stops, and the imaging model to be trained (i.e. the pre-trained imaging model) together with the initial values of the step size and the soft threshold are obtained, so that good initial parameters are available for the overall training of the imaging and recognition model to be trained and the convergence is enhanced.
$\mathrm{Loss}_{SR} = \frac{1}{M}\sum_{m=1}^{M}\left\|\hat{X}_m - X_m\right\|_F^2 + \frac{\gamma}{M}\sum_{m=1}^{M}\sum_{i}\left\|\tilde{F}^{(i)}\left(F^{(i)}\left(X_m\right)\right) - X_m\right\|_F^2$ (9)

where $\mathrm{Loss}_{SR}$ denotes the imaging loss function, $\hat{X}_m$ denotes the predicted image for the m-th sample, $X_m$ denotes the reference image in the imaging-result labeling data, and $\gamma$ denotes a parameter that balances accuracy and orthogonality (which may be 0.1 in the embodiment of the present application).
The second termination condition may be that the set number of iterations (e.g., 20 or 200) is satisfied, or that the similarity between the generated predicted image and the imaging result annotation data is greater than 80%. The embodiments of the present application are not limited thereto.
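As a rough illustration of this pre-training stage, the following PyTorch sketch runs the imaging model alone against the imaging-result labels with a loss of the form of formula (9). The assumption that the model returns both the predicted image and the accumulated orthogonality penalty, as well as the use of an iteration budget as the second termination condition, are illustrative.

```python
import torch

def pretrain_imaging(imaging_model, loader, optimizer, gamma=0.1, max_steps=200):
    """Pre-training sketch for the imaging model only (loss of the form (9))."""
    for step, (Y, X_ref) in enumerate(loader):     # echo signals and reference images
        X_pred, sym_penalty = imaging_model(Y)     # assumed to also return sum ||~F(F(X)) - X||^2
        fidelity = torch.mean(torch.abs(X_pred - X_ref) ** 2)
        loss = fidelity + gamma * sym_penalty
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        if step + 1 >= max_steps:                  # second termination condition: iteration budget
            break
    return imaging_model
```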
Therefore, the imaging model is pre-trained before the imaging and recognition model to be trained is trained, so that high-quality initial parameters can be obtained for the imaging model to be trained, the number of iterations is reduced in the process of training the imaging and recognition model to be trained, and the precision and the training efficiency are improved.
In the above description, in the embodiment of the present application, before the imaging and recognition model to be trained is integrally trained, the imaging model is pre-trained, so that at least the obtained soft threshold parameter and step size parameter can be used as the initial parameters for integrally training the imaging and recognition model.
In the embodiment of the present application, a specific implementation of the overall training of the imaging and recognition model to be trained will be described below.
In one embodiment, the input of the imaging and recognition model in S220 is the target echo signal, the output is the predicted classification result of the target echo signal, and the loss function is characterized by the predicted classification result and the labeled classification result of the target echo signal.
That is, in the process of training the imaging and recognition model, the target echo signal is input into the imaging and recognition model, the classification result predicted for the target echo signal is output, the loss function compares the predicted classification result with the labeled classification result, and then the predicted classification result and the labeled classification result are fed back to the imaging and recognition model to enter the next iteration.
In one embodiment, the imaging and recognition model includes an imaging model and a recognition model; s220 comprises: training an imaging model to be trained in a multi-iteration mode to obtain a value of a target parameter corresponding to the imaging model, completing preliminary training of the imaging model, and obtaining an initial imaging model, wherein the type of the target parameter at least comprises: step size and soft threshold; and training the imaging and recognition model by using the imaging result output by the initial imaging model and the loss function to obtain the target imaging and recognition model.
That is to say, the procedure of supervised training of the imaging and recognition model to be trained is as shown in fig. 6, where the imaging and recognition model to be trained includes an imaging model 620 to be trained and a recognition model 630 to be trained; the following exemplary description describes the overall process of training the imaging and recognition model to be trained:
inputting the target echo signal 610 into an imaging model 620 to be trained, and performing k-time iterative learning on the imaging of the target echo signal by the imaging model 620 to be trained to obtain a target image; inputting the target image into a recognition model 630 to be trained for classification and recognition; the recognition model 630 to be trained inputs the predicted classification result into the loss function 640; the loss function 640 compares the predicted classification result with the labeled classification result, feeds the comparison result back to the imaging model 620 to be trained and the recognition model to be trained, and enters the next iteration; and obtaining target parameters corresponding to the imaging model and the trained imaging and recognition model until the cycle number is met.
It should be noted that the number of iterations k learned by the imaging model 620 to be trained for imaging the target echo signal is set to 8 in this example (i.e. k = 8), and the embodiment of the present application is not limited thereto.
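A minimal sketch of this end-to-end training loop is given below, assuming the imaging model exposes its unrolled forward pass directly and that a preprocessing callable (e.g. the crop_and_copy helper sketched further below) performs the centre-crop and channel-copy step; only the cross-entropy between predicted and labeled classes drives both sub-models.

```python
import torch.nn as nn

def train_end_to_end(imaging_model, recog_model, preprocess, loader, optimizer, epochs=50):
    """Sketch of the overall supervised training of the imaging and recognition model."""
    ce = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for Y, label in loader:                    # labeled target echo signals
            X = imaging_model(Y)                   # K unrolled iterations (K = 8 here)
            logits = recog_model(preprocess(X))    # 128x128 complex image -> 64x64x2 real input
            loss = ce(logits, label)               # predicted vs labeled classification result
            optimizer.zero_grad()
            loss.backward()                        # gradients also update step sizes and thresholds
            optimizer.step()
```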
The training process of the imaging model to be trained will be described in detail below.
In one embodiment, an imaging model comprises: a primary image generation module and a target image generation module; in the kth iteration, inputting a target echo signal into a primary image generation module to obtain a kth primary image, wherein k is an integer greater than or equal to 1; performing sparse domain transformation on the kth primary image according to a target image generation module to obtain a kth target image; and repeating the steps until the first termination condition is met or the set cycle number is reached, and obtaining the target parameter value corresponding to the imaging model.
In the embodiment of the present application, an iterative shrinkage-thresholding algorithm is selected as the solver of the SR algorithm, and the primary image generation module included in the imaging model is represented by the above formula (6): the target echo signal and the (k-1)-th target image $X^{(k-1)}$ generated in the previous iteration are input into formula (6), and the k-th primary image $R^{(k)}$ is calculated while the k-th step-size parameter $\rho^{(k)}$ is obtained. Because the k-th primary image is not sparse enough, imaging it directly gives a poor result, so the k-th primary image needs to be transformed from the image domain to a sparse domain.
The kth primary image is thus input to the target image generation module, which is represented by the above equation (7): the target image generation module includes a transformation module shown in fig. 2 and an inverse transformation module shown in fig. 3. The specific structures of the transformation module and the inverse transformation module are as described above, and are not described herein again.
That is, the target image generation module performs the k-th primary image R according to the above formula (7)(k)The specific process of performing the sparse domain transform and inverse transform is as follows:
first, the kth primary image R(k)An image with dimension of 128 × 128 and channel of 1 is input through an input layer 310 of a transform module, and enters a first convolution layer 320 to generate an image with dimension of 128 × 128 and channel of 32; then, an image with 128 × 128 dimensions and 32 channels is generated by the first linear rectifying layer 330; then, generating an image with 64 × 64 dimensionality and 32 channels by the first pooling layer 340, and meanwhile, inputting the index parameters into an index transfer module 440 in the inverse transformation module through an index transfer module 350; then, a second convolution layer 360 is formed to have a dimension of 64 × 64The 32 th image is obtained and output through the output layer 370, whereby the k-th primary image in the sparse domain is obtained.
Secondly, the k-th primary image in the sparse domain is input into the complex shrinkage function, and the k-th sparse-domain image is obtained after the complex shrinkage transformation.
Finally, the k-th sparse-domain image processed by the complex shrinkage function and the k-th soft-threshold parameter $\theta^{(k)}$ generated in the k-th iteration are input through the input layer 410 of the inverse transformation module; the third convolution layer 420 generates an image with dimensions 64 × 64 and 32 channels; the second linear rectifying layer 430 then produces an image with dimensions 64 × 64 and 32 channels; next, the second pooling layer 450 generates an image with dimensions 128 × 128 and 32 channels, while the index passing module 440 receives the index parameters; the fourth convolution layer 460 then generates an image with dimensions 128 × 128 and 1 channel, which is output through the output layer 470, so that the k-th target image and the target parameter values corresponding to the imaging model are obtained.
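For reference, the complex shrinkage function Csoft applied between the transformation and inverse-transformation steps can be sketched as follows (a minimal PyTorch sketch; the small constant guarding against division by zero is an implementation assumption):

```python
import torch

def csoft(z, theta):
    """Complex soft threshold: Csoft(z, theta) = max(|z| - theta, 0) * z / |z|."""
    mag = torch.abs(z)
    return z * torch.clamp(mag - theta, min=0.0) / (mag + 1e-12)

# e.g. thresholding a 64x64, 32-channel sparse-domain image with the learned theta^(k)
S = torch.randn(1, 32, 64, 64, dtype=torch.complex64)
S_shrunk = csoft(S, theta=0.05)
```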
After the imaging model outputs the target image, the target image is input into the recognition model shown in fig. 4 for classification recognition, and the specific structure of the recognition model is as described above and will not be described herein again.
That is, the imaging model first generates a target image with dimensions 128 × 128 and 1 channel (128 × 128 × 1); the target image is then center-cropped to a 64 × 64 × 1 image and channel-copied to obtain a 64 × 64 × 2 image.
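A small sketch of this preprocessing step is shown below; whether the channel copy duplicates the magnitude image or stacks real and imaginary parts is not specified in the text, so the magnitude variant here is an assumption.

```python
import torch

def crop_and_copy(X):
    """Centre-crop a 128x128 complex target image to 64x64 and copy it to 2 channels."""
    mag = torch.abs(X)                             # (N, 1, 128, 128) complex -> real magnitude
    crop = mag[..., 32:96, 32:96]                  # centre 64x64 window
    return crop.repeat(1, 2, 1, 1)                 # (N, 2, 64, 64) input for the recognition model
```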
A 64 × 64 × 2 image (image size 64 × 64, 2 channels) is input into the recognition model for classification and recognition. After entering through the input layer 511, the image passes sequentially through the fifth convolutional layer 520 (giving an image of size 60 × 60 with 32 channels), the first average pooling layer 530 (30 × 30, 32 channels), the sixth convolutional layer (including a linear rectifying unit) 540 (28 × 28, 64 channels), the max pooling layer 550 (14 × 14, 64 channels), the eighth convolutional layer (including a linear rectifying unit) 560 (12 × 12, 64 channels), the second average pooling layer 570 (6 × 6, 64 channels), the ninth convolutional layer 580 (2 × 2, 128 channels), and the third average pooling layer 590 (1 × 1, 128 channels).
Then, the recognition model inputs the extracted feature vector into a fully connected layer with a parameter regularization method (Dropout), and the output of the Softmax classifier is shown as formula (10):

$P = \mathrm{TC}\left(X^{(K)}\right)$ (10)

where TC represents the recognition operation and $P$ represents the set of probabilities that the target image belongs to each category; the i-th element of the vector $P$ can be expressed by formula (11):

$P_i = \frac{\exp\left(O_i\right)}{\sum_{j=1}^{I}\exp\left(O_j\right)}$ (11)

where $P_i$ denotes the probability that the target image belongs to the i-th class, $O_i$ denotes the output corresponding to the i-th class, and $I$ denotes the number of target classes.
Finally, after the probability that the target image belongs to each class is obtained, $P_i$ is input into the cross-entropy-based loss function shown in formula (12):

$\mathrm{Loss} = -\frac{1}{M}\sum_{m=1}^{M}\log P\left(\mathrm{label}_m \mid Y_m\right)$ (12)

where Loss denotes the loss value, $M$ denotes the total number of labeled samples, $Y_m$ denotes the m-th target echo signal, $\mathrm{label}_m$ denotes the classification label of the m-th target echo signal, and $P\left(\mathrm{label}_m \mid Y_m\right)$ denotes the probability of the correct class, i.e. $P_i$ with $i = \mathrm{label}_m$.
That is, after calculating the loss value, the loss function feeds back the loss value to the imaging model and the recognition model, and then enters the next iteration until the iteration number or the classification precision is met.
It should be noted that the parameters in the imaging and recognition model are composed of learnable parameters in the imaging model and the recognition model, and in the conventional SR algorithm, all the parameters are manually set, and the iteration number is constant, so that a large number of iterations are usually required to converge, and the recovery performance is unstable. In the embodiment of the application, a learnable SR algorithm is used, the iterative process of the traditional SR algorithm is expanded into a network structure, and parameters are learnt through training.
The above describes a specific process of training the imaging and recognition model in the embodiment of the present application, and the following describes a process of applying the imaging and recognition model in the embodiment of the present application.
In one embodiment, a method for applying an imaging and recognition model includes: acquiring a target echo signal acquired by a radar; and inputting the target echo signal into the imaging and recognition model obtained by the training method to obtain an imaging result and/or a classification result.
That is, as shown in fig. 7, after the training of the imaging and recognition model is completed, the target echo signal 710 obtained by the radar is input into the imaging model 720 of the imaging and recognition model, the imaging model 720 generates a target image, the target image is input into the recognition model 730 for classification and recognition, and then the target image and the classification result are output; or when the category of the target image is not identified, the imaging model directly outputs the target image after generating the target image; or, under the condition that only the classification result needs to be obtained, the identification model directly outputs the classification result after performing classification identification.
The application of the imaging and recognition model in the embodiments of the present application is described above, and the comparative tests with respect to the imaging and recognition model and other algorithms are described below.
In the embodiment of the present application, the imaging and recognition model of the embodiment of the present application is tested on an MSTAR dataset commonly used in Synthetic Aperture Radar (SAR) automatic target recognition. The standard operation condition of the MSTAR is considered, the classification corresponding to the target echo signals on the ground is divided into ten categories, and each category comprises hundreds of target images with different azimuth angles and pitch angles. According to the method in the related art, a target echo signal is recovered from a 128 × 128 complex MSTAR image, and the size of the obtained full-sampling echo is 128 × 128. The target echo signal is constructed by randomly selecting a partial frequency and an angle from the full-sampling echo, and defining the sparse sampling rate as ssr, the size of the target echo signal is (ssr × 128) × (ssr × 128), training is performed using the labeling data of the pitch angle of 17 °, and the test is performed using the target echo signal data of the pitch angle of 15 °.
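The construction of the sparsely sampled echo from the fully sampled 128 × 128 echo can be sketched as follows (NumPy; the random selection of frequencies and angles is an assumption about how the sub-sampling is drawn):

```python
import numpy as np

def sparse_sample(E, ssr, seed=0):
    """Keep a fraction ssr of the frequencies and azimuth angles of a full echo E."""
    rng = np.random.default_rng(seed)
    n_f, n_a = E.shape                             # 128 x 128 fully sampled echo
    f_idx = np.sort(rng.choice(n_f, int(ssr * n_f), replace=False))
    a_idx = np.sort(rng.choice(n_a, int(ssr * n_a), replace=False))
    Phi1 = np.eye(n_f)[f_idx]                      # frequency sparse-sampling matrix
    Phi2 = np.eye(n_a)[a_idx]                      # angle sparse-sampling matrix
    return Phi1 @ E @ Phi2.T                       # Y of size (ssr*128) x (ssr*128)
```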
In the imaging and recognition model in the embodiment of the application, the number of layers of the imaging model is set to be 8 through experiments, so that the accuracy and the efficiency can be ensured. The learnable imaging model is first pre-trained by equation (9), and then the entire imaging and recognition model is trained end-to-end by equation (12).
For comparison, several methods whose imaging and classification steps are performed separately were also tested in the examples of the present application. In the imaging step, the RDA algorithm based on zero-padded Fourier transform, the conventional iterative shrinkage-thresholding algorithm (ISTA), and the ISTA algorithm using the Discrete Wavelet Transform (DWT) as the sparse transform are used. To improve the imaging effect of the ISTA, the step-size and soft-threshold parameters of the ISTA are not set manually but are learned from the data by minimizing the subsequent loss.
After training, the ISTA uses the same parameters in different iterations. In the classification and recognition stage, a convolutional network with the same structure as the recognition model in the imaging and recognition model is adopted and trained to classify the obtained images. For simplicity, the above three step-by-step imaging and classification methods are referred to as RDA, ISTA, and ISTA-DWT, respectively.
All networks were trained using a PyTorch-based Adam optimization algorithm. The experiments were performed on a workstation with an Intel Core i9-9900 processor and an NVIDIA GeForce RTX 2080 Ti graphics processor.
Experiment 1: in the comparison experiment of the imaging and recognition models, the sparse sampling rate is set to ssr = 0.5, and the maximum number of ISTA iterations is 100.
The experimental results of experiments using the RDA algorithm, the ISTA algorithm, and the ISTA-DWT algorithm in the related art are described below.
The experimental results also show a target result under relatively weak clutter, including: the result of the experiment performed by using the RDA algorithm has serious side lobes; in the results of experiments performed by using the ISTA algorithm, an ISTA image is composed of a plurality of isolated strong scatterers, and the sparsity of the image is enhanced by restraining weak scatterers or regions, so that the outline of a target is discontinuous; in the results of experiments using the ISTA-DWT algorithm, it can be seen that scatterers in the DWT domain are isolated, but scatterers in the final image are not isolated; thus, the ISTA-DWT may retain some weak scatterers or regions. However, since the low frequency components of the target image are suppressed in the DWT domain, there are some steep edges in the final target image.
The following is a description of experimental results of an imaging and recognition model (ICI-Net) in the examples of the present application.
In the results of experiments performed using the imaging and recognition model (i.e., ICI-Net) of the embodiments of the present application, it can be seen that: the target image generated using the imaging model is in good agreement with the reference image; meanwhile, the target area is well stored, classification and identification are facilitated, clutter backgrounds which probably do not contribute to classification are averaged, and the influence of the clutter on the classification and identification can be reduced to a certain extent.
In addition, the imaging and recognition model of the embodiment of the application remains stable even when the target image is not sufficiently sparse in either the image domain or the DWT domain, whereas the imaging results of both ISTA and ISTA-DWT deteriorate.
The above experimental results therefore show that, in the stepwise imaging and classification approach, it is difficult to select appropriate parameters and sparse domains for a variety of target scenes, and the "sparsest image" for imaging may not be the "best image" for classification. Consequently, performing imaging and classification separately is not conducive to obtaining better classification performance.
To address these problems in the related art, the embodiment of the present application combines the imaging process and the classification process into an integrated model. The parameters of the imaging process are learned with the aim of maximizing classification accuracy rather than pursuing the sparsest image. In addition, the learnable sparse transform increases the degrees of freedom of the whole network, which helps to further improve classification accuracy.
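Conceptually, the integration amounts to wiring the two stages into one differentiable module so that gradients of the classification loss reach the imaging parameters. A minimal sketch, assuming both sub-modules already exist, is:

import torch.nn as nn

class ImagingClassificationIntegration(nn.Module):
    # The echo first passes through the learnable imaging stage, and the
    # resulting image is classified by the CNN recognition stage; training
    # on the classification loss therefore updates both stages jointly.
    def __init__(self, imaging_model, recognition_model):
        super().__init__()
        self.imaging_model = imaging_model
        self.recognition_model = recognition_model

    def forward(self, echo):
        image = self.imaging_model(echo)
        return self.recognition_model(image)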
The following compares the results of the algorithms of the related art with those of the ICI-Net algorithm of the examples of the present application; the averages over 50 experiments are shown in Table 1:
It can be seen that the proposed ICI-Net (i.e., the imaging and recognition model in the embodiment of the present application) achieves the best classification accuracy. The results of ISTA and ISTA-DWT are similar, and both are superior to RDA. To interpret the classification results intuitively, two examples of imaging results are given.
TABLE 1. Results of Experiment 1

Method                         RDA      ISTA     ISTA-DWT   ICI-Net
Classification accuracy (%)    93.22    95.27    95.05      97.01
Run time on GPU (s)            0.0065   0.5758   0.5675     0.1091
Table 1 also gives the run times of the different algorithms. The proposed ICI-Net is clearly more efficient than the ISTA and ISTA-DWT methods (100 iterations), but slower than the RDA method. Since the run time of ISTA decreases as its number of iterations decreases, ISTA performance was also tested for different iteration counts; as Table 1 shows that ISTA-DWT is not better than ISTA, only ISTA was tested here. The classification accuracy and run time for 50, 100, and 150 iterations are shown in Table 2: reducing the number of iterations from 100 to 50 markedly lowers the run time but also markedly lowers the classification accuracy, while increasing it from 100 to 150 markedly raises the run time with only a slight gain in accuracy. An iteration count of 100 is therefore a suitable choice for ISTA, giving both good accuracy and efficiency.
TABLE 2. ISTA results for different numbers of iterations

Number of ISTA iterations      50       100      150
Classification accuracy        0.9314   0.9527   0.9567
Run time on GPU (s)            0.2854   0.5758   0.872
Experiment 2: the proposed ICI-Net was tested at different sparse sampling rates ssr = 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8. The model for each ssr was retrained starting from the initial model trained with ssr = 0.5. The results were averaged over 50 experiments, and the classification accuracy of all methods improves as ssr increases. More importantly, the proposed ICI-Net is clearly superior to the other methods at every ssr; on the MSTAR dataset with ssr = 0.8, its average classification accuracy reaches 98.42%.
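A sketch of this retraining step is shown below, under the assumption that the ssr = 0.5 weights are stored in a checkpoint file and that a data loader for the new sampling rate is available; the checkpoint name, the loader, and the hyper-parameters are illustrative, not values stated in the embodiment.

import torch

def finetune_for_new_ssr(model, loader, init_ckpt="ici_net_ssr05.pt",
                         epochs=20, lr=1e-4):
    # Start from the weights trained with ssr = 0.5, then retrain on data
    # generated at the new sparse sampling rate.
    model.load_state_dict(torch.load(init_ckpt))
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    ce = torch.nn.CrossEntropyLoss()
    for _ in range(epochs):
        for echo, label in loader:
            opt.zero_grad()
            ce(model(echo), label).backward()
            opt.step()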
In summary, conventional SAR/ISAR Automatic Target Recognition (ATR) generally treats the imaging step and the classification and recognition step separately, so the target image produced in the imaging step may not be the best input for the subsequent classification step. This application makes a preliminary study of integrated SAR/ISAR imaging and classification under sparse sampling and proposes an imaging-and-classification integration framework realized by a deep network, namely the imaging and classification integration network (ICI-Net). ICI-Net combines a sparse recovery algorithm with a Convolutional Neural Network (CNN); the hyper-parameters and sparse transform matrix of the sparse recovery algorithm and the CNN parameters are all learned from training data, so that the target image most beneficial to the final classification task is generated automatically. Experiments on the ten-class target classification task of the MSTAR benchmark dataset show that ICI-Net achieves an average classification accuracy of 97.01% at a sparse sampling rate of 50% and 98.42% at a sparse sampling rate of 80%.
In addition, the embodiment of the application considers the imaging step and the classification and recognition step jointly. An integrated framework combining imaging and classification is provided and realized with a deep network. The network consists of an imaging model constructed by unrolling a sparse recovery (SR) algorithm and a recognition model composed of CNN layers. In the SR-based imaging model, the SAR/ISAR image is assumed to be sparse in a transform domain that is learned from training data, rather than in the image domain itself, so the target information is more likely to be recovered from sparse samples. All parameters of the SR algorithm are learned through training rather than set manually, and the learning objective is to maximize classification accuracy rather than to find the sparsest image, so the model can produce target images that are optimal for the classification task. Finally, the SR imaging model contains only a small number of layers, far fewer than the iterations required by conventional SR algorithms, so the whole network is more efficient than conventional imaging-then-classification methods.
The comparative tests of the imaging and recognition model against other algorithms in the embodiments of the present application have been described above; a training apparatus for the imaging and recognition model in the embodiments of the present application is described below.
As shown in fig. 8, an imaging and recognition model training apparatus 800 in the embodiment of the present application includes: a data acquisition module 810 and a model training module 820.
In one embodiment, the present application provides a training apparatus 800 for an imaging and recognition model, the training apparatus comprising: a data obtaining module 810, configured to obtain labeling data for training the imaging and recognition model, where the labeling data comprises imaging result labels and classification result labels of target echo signals acquired by a radar; and a model training module 820, configured to train the imaging and recognition model to be trained according to the labeling data and a loss function to obtain the imaging and recognition model, where the input of the imaging and recognition model is a target echo signal, the output is the classification result predicted for the target echo signal, and the loss function is characterized by the predicted classification result and the labeled classification result.
In one embodiment, the imaging and recognition model includes an imaging model and a recognition model; model training module 820 is further configured to: training an imaging model to be trained in a multi-iteration mode to obtain a target parameter value corresponding to the imaging model, wherein the type of the target parameter at least comprises: step size and soft threshold; and obtaining an imaging and recognition model according to the target parameter value and the loss function.
In one embodiment, an imaging model includes: a primary image generation module and a target image generation module; wherein model training module 820 is further configured to: in the kth iteration, inputting a target echo signal into a primary image generation module to obtain a kth primary image, wherein k is an integer greater than or equal to 1; performing sparse domain transformation on the kth primary image according to a target image generation module to obtain a kth target image; and repeating the steps until the first termination condition is met or the set cycle number is reached, and obtaining the target parameter value corresponding to the imaging model.
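One way to read this k-th iteration is as a single unrolled layer: the primary image generation module applies a gradient-descent step on the data-fidelity term, and the target image generation module applies a soft threshold in a learnable sparse domain. A hedged PyTorch sketch follows; the real-valued measurement matrix A, the initial parameter values, and the assumption that the transform and inverse-transform sub-modules accept the primary image's shape unchanged are all illustrative choices rather than details stated in the embodiment.

import torch
import torch.nn as nn

class UnrolledImagingLayer(nn.Module):
    # One iteration of the imaging model: a primary-image update followed by
    # soft thresholding in a learnable sparse domain.
    def __init__(self, A, transform, inv_transform):
        super().__init__()
        self.register_buffer("A", A)                    # measurement matrix (assumed real)
        self.transform = transform                      # learnable sparse transform
        self.inv_transform = inv_transform              # learnable inverse transform
        self.step = nn.Parameter(torch.tensor(0.1))     # learnable step size
        self.theta = nn.Parameter(torch.tensor(0.01))   # learnable soft threshold

    def forward(self, x_prev, y):
        # Primary image generation: gradient step on the data-fidelity term.
        r = x_prev - self.step * (self.A.t() @ (self.A @ x_prev - y))
        # Target image generation: soft threshold in the learned sparse domain.
        c = self.transform(r)
        c = torch.sign(c) * torch.clamp(c.abs() - self.theta, min=0.0)
        return self.inv_transform(c)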
In one embodiment, the target image generation module comprises a transformation module and an inverse transformation module. The transformation module comprises, in sequence, an input layer, a first convolutional layer, a first linear rectifying layer, a first pooling layer, a second convolutional layer, and an output layer; the inverse transformation module comprises, in sequence, an input layer, a third convolutional layer, a second linear rectifying layer, a second pooling layer, a fourth convolutional layer, and an output layer.
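Matching the layer order described above, a hedged PyTorch sketch of the transformation and inverse-transformation modules might look as follows; the channel counts, kernel sizes, and pooling settings are not specified in the text and are therefore assumptions, and the input and output layers correspond simply to the tensors entering and leaving each stack.

import torch.nn as nn

def make_transform_module(in_ch=1, mid_ch=16):
    # Transformation module: conv -> ReLU -> pooling -> conv.
    return nn.Sequential(
        nn.Conv2d(in_ch, mid_ch, kernel_size=3, padding=1),   # first convolutional layer
        nn.ReLU(),                                            # first linear rectifying layer
        nn.MaxPool2d(kernel_size=2),                          # first pooling layer
        nn.Conv2d(mid_ch, in_ch, kernel_size=3, padding=1),   # second convolutional layer
    )

def make_inverse_transform_module(in_ch=1, mid_ch=16):
    # Inverse transformation module: the same layer order with its own weights.
    return nn.Sequential(
        nn.Conv2d(in_ch, mid_ch, kernel_size=3, padding=1),   # third convolutional layer
        nn.ReLU(),                                            # second linear rectifying layer
        nn.MaxPool2d(kernel_size=2),                          # second pooling layer
        nn.Conv2d(mid_ch, in_ch, kernel_size=3, padding=1),   # fourth convolutional layer
    )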
In one embodiment, the model training module 820 is further configured to: pre-training an imaging model to be pre-trained according to the imaging result marking data to obtain an ith pre-training result, wherein i is an integer greater than or equal to 1; and repeating the steps, and obtaining the imaging model to be trained when the ith pre-training result is confirmed to meet the second termination condition.
In one embodiment, the model training module 820 is further configured to determine that the second termination condition is satisfied based on an imaging loss function, where the imaging loss function measures the difference between the predicted image and the standard image.
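If, as is common, the imaging loss is taken to be the mean-squared error between the predicted image and the standard image, the termination check could be sketched as follows; the tolerance value and the choice of mean-squared error are assumptions, since the embodiment only states that the loss measures the difference between the two images.

import torch

def imaging_loss(pred_img, ref_img):
    # Measures the difference between the predicted and the standard image.
    return torch.mean((pred_img - ref_img) ** 2)

def second_termination_met(pred_img, ref_img, tol=1e-4):
    # Stop pre-training once the imaging loss falls below the tolerance.
    return imaging_loss(pred_img, ref_img).item() < tol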
In the embodiment of the present application, the modules shown in fig. 8 can implement each process in the method embodiments of fig. 1 to 7. The operations and/or functions of the respective modules in fig. 8 are used to implement the corresponding flows in the method embodiments of fig. 1 to 7. Reference may be made to the description of the method embodiments above; a detailed description is omitted here to avoid redundancy.
As shown in fig. 9, an embodiment of the present application provides an electronic device 900, comprising a processor 910, a memory 920, and a bus 930. The processor 910 is connected to the memory 920 through the bus 930, and the memory stores computer-readable instructions which, when executed by the processor, implement the method of any one of the above embodiments; a detailed description is omitted here to avoid redundancy.
The bus is used for direct connection and communication among these components. The processor in the embodiment of the present application may be an integrated circuit chip with signal processing capability. The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components. The various methods, steps, and logic blocks disclosed in the embodiments of the present application may be implemented or performed by such a processor. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The memory may be, but is not limited to, a Random Access Memory (RAM), a Read Only Memory (ROM), a Programmable Read Only Memory (PROM), an Erasable Programmable Read Only Memory (EPROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), and the like. The memory stores computer-readable instructions which, when executed by the processor, perform the methods of the embodiments described above.
It will be appreciated that the configuration shown in fig. 9 is merely illustrative; the electronic device may include more or fewer components than shown in fig. 9, or have a configuration different from that shown in fig. 9. The components shown in fig. 9 may be implemented in hardware, software, or a combination thereof.
Embodiments of the present application further provide a computer-readable storage medium on which a computer program is stored; when the computer program is executed, the method of any of the above embodiments is implemented. Reference may be made to the description in the method embodiments above; a detailed description is omitted here to avoid repetition.
The above description is only a preferred embodiment of the present application and is not intended to limit the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application. It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present application, and shall be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. A training method for an imaging and recognition model, the training method comprising:
obtaining labeling data used for training an imaging and recognition model, wherein the labeling data is obtained by imaging result labeling and classification result labeling of target echo signals acquired by a radar;
and training an imaging and recognition model to be trained according to the labeling data and a loss function to obtain a target imaging and recognition model, wherein the input of the imaging and recognition model is the target echo signal, the output of the imaging and recognition model is a classification result predicted for the target echo signal, and the loss function is characterized by the classification result predicted from the input target echo signal and the labeled classification result.
2. The method of claim 1, wherein the imaging and recognition model comprises an imaging model and a recognition model;
the method for training the imaging and recognition model to be trained according to the labeling data and the loss function to obtain the target imaging and recognition model comprises the following steps:
training an imaging model to be trained in a multi-iteration mode to obtain a value of a target parameter corresponding to the imaging model, completing preliminary training of the imaging model, and obtaining an initial imaging model, wherein the type of the target parameter at least comprises: step size and soft threshold;
and training the imaging and recognition model by using the imaging result output by the initial imaging model and the loss function to obtain the target imaging and recognition model.
3. The method of claim 2, wherein the imaging model comprises: a primary image generation module and a target image generation module; wherein,
the training the imaging model to be trained in a multiple iteration mode to obtain a target parameter value corresponding to the imaging model includes:
in the kth iteration, inputting the target echo signal into the primary image generation module to obtain a kth primary image, wherein k is an integer greater than or equal to 1;
performing sparse domain transformation on the kth primary image according to the target image generation module to obtain a kth target image;
and repeating the steps until a first termination condition is met or a set cycle number is reached, and obtaining a target parameter value corresponding to the imaging model.
4. The method of claim 3, wherein the target image generation module comprises: a transformation module and an inverse transformation module;
the transformation module comprises in sequence: an input layer, a first convolution layer, a first linear rectifying layer, a first pooling layer, a second convolution layer and an output layer;
the inverse transformation module sequentially comprises: an input layer, a third convolutional layer, a second linear rectifying layer, a second pooling layer, a fourth convolutional layer and an output layer.
5. The method according to any one of claims 2 to 4, wherein before the imaging model to be trained is trained in a multiple iteration manner and target parameter values corresponding to the imaging model are obtained, the method further comprises:
pre-training an imaging model to be pre-trained according to the imaging result marking data to obtain an ith pre-training result, wherein i is an integer greater than or equal to 1;
and repeating the steps, and obtaining the imaging model to be trained when the ith pre-training result is confirmed to meet a second termination condition.
6. The method of claim 5, wherein the step of confirming that the ith pre-training result satisfies a second termination condition comprises:
and determining that the second termination condition is met according to an imaging loss function, wherein the imaging loss function is used for measuring the difference between the predicted image and the imaging result annotation data.
7. A method of applying an imaging and recognition model, the method comprising:
acquiring a target echo signal acquired by a radar;
inputting the target echo signal into a target imaging and recognition model obtained by the training method according to any one of claims 1 to 6, and obtaining an imaging result and/or a classification result.
8. A training apparatus for imaging and recognizing a model, the training apparatus comprising:
the data acquisition module is configured to acquire labeling data used for training an imaging and recognition model, wherein the labeling data is obtained by labeling imaging results and labeling classification results of target echo signals acquired by a radar;
and the model training module is configured to train an imaging and recognition model to be trained according to the labeling data and a loss function to obtain a target imaging and recognition model, wherein the input of the imaging and recognition model is the target echo signal, the output of the imaging and recognition model is a classification result predicted for the target echo signal, and the loss function is characterized by the classification result predicted from the input target echo signal and the labeled classification result.
9. A system, characterized in that the system comprises one or more computers and one or more storage devices storing instructions that, when executed by the one or more computers, cause the one or more computers to perform the operations of the respective methods of any of claims 1-7.
10. One or more computer storage media storing instructions that, when executed by one or more computers, cause the one or more computers to perform operations of the respective methods of any of claims 1-7.
CN202110766344.3A 2021-07-07 2021-07-07 Training method, application method, device and medium for imaging and recognition model Active CN113359135B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110766344.3A CN113359135B (en) 2021-07-07 2021-07-07 Training method, application method, device and medium for imaging and recognition model

Publications (2)

Publication Number Publication Date
CN113359135A true CN113359135A (en) 2021-09-07
CN113359135B CN113359135B (en) 2023-08-22

Family

ID=77538678

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110766344.3A Active CN113359135B (en) 2021-07-07 2021-07-07 Training method, application method, device and medium for imaging and recognition model

Country Status (1)

Country Link
CN (1) CN113359135B (en)


Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160019458A1 (en) * 2014-07-16 2016-01-21 Deep Learning Analytics, LLC Systems and methods for recognizing objects in radar imagery
JP2016080698A (en) * 2014-10-17 2016-05-16 日本無線株式会社 Image generation device
CN107341488A (en) * 2017-06-16 2017-11-10 电子科技大学 A kind of SAR image target detection identifies integral method
CN107728143A (en) * 2017-09-18 2018-02-23 西安电子科技大学 Radar High Range Resolution target identification method based on one-dimensional convolutional neural networks
EP3640846A1 (en) * 2018-10-17 2020-04-22 Samsung Electronics Co., Ltd. Method and apparatus to train image recognition model, and image recognition method and apparatus
CN110109114A (en) * 2019-05-09 2019-08-09 电子科技大学 A kind of scanning radar super-resolution imaging detection integral method
WO2020244261A1 (en) * 2019-06-05 2020-12-10 中国科学院长春光学精密机械与物理研究所 Scene recognition system for high-resolution remote sensing image, and model generation method
CN111220958A (en) * 2019-12-10 2020-06-02 西安宁远电子电工技术有限公司 Radar target Doppler image classification and identification method based on one-dimensional convolutional neural network
CN110991418A (en) * 2019-12-23 2020-04-10 中国科学院自动化研究所 Synthetic aperture radar target image identification method and system
CN111325726A (en) * 2020-02-19 2020-06-23 腾讯医疗健康(深圳)有限公司 Model training method, image processing method, device, equipment and storage medium

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
J.-I. Park et al.: "A comparative study on ISAR imaging algorithms for radar target identification", Progress In Electromagnetics Research, vol. 108
Luo Ying et al.: "Synthetic aperture radar learning-based imaging based on 'data-driven + intelligent learning'", Journal of Radars, vol. 9, no. 1, pp. 109-115 (in Chinese)
Gu Yu et al.: "Deep convolutional neural network structure design for SAR target recognition", Journal of Image and Graphics, vol. 23, no. 6 (in Chinese)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114049551A (en) * 2021-10-22 2022-02-15 南京航空航天大学 ResNet 18-based SAR raw data target identification method
CN114049551B (en) * 2021-10-22 2022-08-05 南京航空航天大学 ResNet 18-based SAR raw data target identification method
CN114720984A (en) * 2022-03-08 2022-07-08 电子科技大学 SAR imaging method for sparse sampling and inaccurate observation

Also Published As

Publication number Publication date
CN113359135B (en) 2023-08-22

Similar Documents

Publication Publication Date Title
CN107316013B (en) Hyperspectral image classification method based on NSCT (non-subsampled Contourlet transform) and DCNN (data-to-neural network)
CN108960143B (en) Ship detection deep learning method in high-resolution visible light remote sensing image
CN110443143B (en) Multi-branch convolutional neural network fused remote sensing image scene classification method
Deng et al. SAR automatic target recognition based on Euclidean distance restricted autoencoder
Dai et al. Instance-aware semantic segmentation via multi-task network cascades
Jia et al. Gabor feature-based collaborative representation for hyperspectral imagery classification
Lin et al. Hyperspectral image denoising via matrix factorization and deep prior regularization
Cho et al. Progressive graph matching: Making a move of graphs via probabilistic voting
CN109766835B (en) SAR target recognition method for generating countermeasure network based on multi-parameter optimization
CN111310666B (en) High-resolution image ground feature identification and segmentation method based on texture features
CN112288011B (en) Image matching method based on self-attention deep neural network
CN107977683B (en) Joint SAR target recognition method based on convolution feature extraction and machine learning
CN113359135B (en) Training method, application method, device and medium for imaging and recognition model
Ablavatski et al. Enriched deep recurrent visual attention model for multiple object recognition
CN110569738A (en) natural scene text detection method, equipment and medium based on dense connection network
CN113537031B (en) Radar image target identification method for generating countermeasure network based on condition of multiple discriminators
CN111882026A (en) Optimization of unsupervised generative confrontation networks by latent spatial regularization
CN112699717A (en) SAR image generation method and generation device based on GAN network
CN115034257B (en) Cross-modal information target identification method and device based on feature fusion
CN111223128A (en) Target tracking method, device, equipment and storage medium
Ejbali et al. A dyadic multi-resolution deep convolutional neural wavelet network for image classification
Pan et al. Residual attention-aided U-Net GAN and multi-instance multilabel classifier for automatic waveform recognition of overlapping LPI radar signals
Lee et al. Convolutional autoencoder based feature extraction in radar data analysis
CN109063750B (en) SAR target classification method based on CNN and SVM decision fusion
CN114049491A (en) Fingerprint segmentation model training method, fingerprint segmentation device, fingerprint segmentation equipment and fingerprint segmentation medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant