CN113359135B - Training method, application method, device and medium for imaging and recognition model


Info

Publication number
CN113359135B
Authority
CN
China
Prior art keywords
imaging
model
training
target
trained
Prior art date
Legal status
Active
Application number
CN202110766344.3A
Other languages
Chinese (zh)
Other versions
CN113359135A (en
Inventor
胡晓伟
郭艺夺
冯为可
何兴宇
王宇晨
冯存前
Current Assignee
Air Force Engineering University of PLA
Original Assignee
Air Force Engineering University of PLA
Priority date
Filing date
Publication date
Application filed by Air Force Engineering University of PLA
Priority to CN202110766344.3A
Publication of CN113359135A
Application granted
Publication of CN113359135B


Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01S RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S 13/00 Systems using the reflection or reradiation of radio waves, e.g. radar systems; Analogous systems using reflection or reradiation of waves whose nature or wavelength is irrelevant or unspecified
    • G01S 13/88 Radar or analogous systems specially adapted for specific applications
    • G01S 13/89 Radar or analogous systems specially adapted for specific applications for mapping or imaging
    • G01S 13/90 Radar or analogous systems specially adapted for specific applications for mapping or imaging using synthetic aperture techniques, e.g. synthetic aperture radar [SAR] techniques
    • G01S 13/9021 SAR image post-processing techniques
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01S RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S 7/00 Details of systems according to groups G01S13/00, G01S15/00, G01S17/00
    • G01S 7/02 Details of systems according to groups G01S13/00, G01S15/00, G01S17/00 of systems according to group G01S13/00
    • G01S 7/41 Details of systems according to groups G01S13/00, G01S15/00, G01S17/00 of systems according to group G01S13/00 using analysis of echo signal for target characterisation; Target signature; Target cross-section
    • G01S 7/417 Details of systems according to groups G01S13/00, G01S15/00, G01S17/00 of systems according to group G01S13/00 using analysis of echo signal for target characterisation; Target signature; Target cross-section involving the use of neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00 Road transport of goods or passengers
    • Y02T 10/10 Internal combustion engine [ICE] based vehicles
    • Y02T 10/40 Engine management systems

Abstract

The embodiment of the application provides a training method, an application method, a device and a medium for an imaging and recognition model. The method comprises the following steps: acquiring annotation data for training the imaging and recognition model, wherein the annotation data is formed by carrying out imaging result annotation and classification result annotation on target echo signals acquired by a radar; and training an imaging and recognition model to be trained according to the annotation data and a loss function to obtain a target imaging and recognition model, wherein the input of the imaging and recognition model is the target echo signal, the output is a classification result predicted for the target echo signal, and the loss function is characterized by the predicted classification result and the annotated classification result.

Description

Training method, application method, device and medium for imaging and recognition model
Technical Field
The embodiment of the application relates to the field of echo imaging recognition, in particular to a training method, an application method, a device and a medium of an imaging and recognition model.
Background
In the related art, in automatic target recognition with Synthetic Aperture Radar (SAR) / Inverse Synthetic Aperture Radar (ISAR), the imaging model and the recognition model are usually trained separately, each with an independent loss function. As a result, the target image obtained from the trained imaging model generally has good sparsity, but a highly sparse target image may reduce the recognition accuracy in the subsequent classification performed by the recognition model; in addition, conventional imaging algorithms require a large number of iterations to obtain a sparse decomposition, which also affects the recognition and classification effect.
Therefore, how to obtain a target image more beneficial to classification according to a target echo signal is a problem to be solved.
Disclosure of Invention
The embodiment of the application provides a training method, an application method, a device and a medium for an imaging and recognition model, which, at least in some embodiments of the application, enable the imaging module to generate a target image that is optimal for the recognition process, so that the classification accuracy can be improved to the greatest extent.
In a first aspect, an embodiment of the present application provides a training method for an imaging and recognition model, the training method including: acquiring annotation data for training the imaging and recognition model, wherein the annotation data is obtained by carrying out imaging result annotation and classification result annotation on target echo signals acquired by a radar; and training an imaging and recognition model to be trained according to the annotation data and a loss function to obtain a target imaging and recognition model, wherein the input of the imaging and recognition model is the target echo signal, the output is a classification result predicted for the target echo signal, and the loss function is characterized by the predicted classification result and the annotated classification result.
Therefore, the loss function in the embodiment of the application is characterized by the predicted classification result and the annotated classification result of the input target echo data, rather than using the predicted imaging result of the target echo signal to evaluate whether the imaging model is trained, as in the related art. Training the imaging and recognition model, which comprises the imaging model and the recognition model, as a whole enables the imaging module to generate the target image that is optimal for the recognition process, thereby maximally improving the classification accuracy.
With reference to the first aspect, in one implementation, the imaging and recognition model includes an imaging model and a recognition model; training the imaging and identifying model to be trained according to the labeling data and the loss function to obtain a target imaging and identifying model, wherein the training comprises the following steps: training an imaging model to be trained in a multi-iteration mode to obtain a value of a target parameter corresponding to the imaging model, and completing preliminary training of the imaging model to obtain an initial imaging model, wherein the types of the target parameter at least comprise: step size and soft threshold; and training the imaging and identifying model by using the imaging result output by the initial imaging model and the loss function to obtain the target imaging and identifying model.
Therefore, unlike the traditional Sparse Recovery (SR) algorithm with manually set parameters, in the embodiment of the application the target parameter values corresponding to the imaging model are obtained through training and used to determine the imaging model, so that the imaging model can generate an optimal target image favorable for recognition and classification, and an imaging and recognition model with higher classification accuracy can be obtained from the target parameter values.
With reference to the first aspect, in one implementation, the imaging model includes: a primary image generation module and a target image generation module; the training the imaging model to be trained in a multi-iteration mode to obtain a target parameter value corresponding to the imaging model includes: in the kth iteration, inputting the target echo signal to the primary image generation module to obtain a kth primary image, wherein k is an integer greater than or equal to 1; performing sparse domain transformation on the kth primary image according to the target image generation module to obtain a kth target image; repeating the steps until the first termination condition is met or the set cycle times are reached, and obtaining the target parameter value corresponding to the imaging model.
Therefore, the primary image generated by the primary image generation module is input into the target image generation module for training, so that the sparse domain transformation and inverse transformation are realized. This at least alleviates the poor results caused by using an orthogonal matrix for the sparse domain transformation in the related art, while the accuracy and recognizability of the target image generated by the imaging model are improved during training.
With reference to the first aspect, in an implementation manner, the target image generating module includes: a transformation module and an inverse transformation module; the transformation module comprises the following components in sequence: an input layer, a first convolution layer, a first linear rectification layer, a first pooling layer, a second convolution layer and an output layer; the inverse transformation module sequentially comprises: an input layer, a third convolution layer, a second linear rectification layer, a second pooling layer, a fourth convolution layer, and an output layer.
Therefore, the embodiment of the application can obtain parameters with more flexible structures through the training transformation module (in the related technology, the transformation of the sparse domain is only carried out through the two-dimensional orthogonal matrix); by training the inverse transform module, the primary image in the sparse domain can be inversely transformed into the image domain, thereby generating a target image with high recognizability.
With reference to the first aspect, in an implementation manner, before training the imaging model to be trained in a multi-iteration manner to obtain the target parameter values corresponding to the imaging model, the method further includes: pre-training an imaging model to be pre-trained according to the imaging result annotation data to obtain an ith pre-training result, wherein i is an integer greater than or equal to 1; and repeating this step, and obtaining the imaging model to be trained when the ith pre-training result is confirmed to meet a second termination condition.
With reference to the first aspect, in an implementation manner, when the ith pre-training result is confirmed to meet a second termination condition, determining that the second termination condition is met according to an imaging loss function, wherein the imaging loss function is used for measuring a difference between a predicted image and a standard image.
Therefore, the embodiment of the application can obtain better initial parameters for the imaging model to be trained by pre-training the imaging model to be pre-trained before training the imaging and identifying model to be trained, thereby reducing iteration times and improving precision and training efficiency in the process of training the imaging and identifying model to be trained.
In a second aspect, an embodiment of the present application provides an application method for imaging and identification, the method including: acquiring a target echo signal acquired by a radar; inputting the target echo signal into an imaging and recognition model obtained by the training method according to any one of the first aspect and implementation manners thereof, so as to obtain an imaging result and/or a classification result.
In a third aspect, an embodiment of the present application provides a training apparatus for an imaging and recognition model, the training apparatus including: a data acquisition module configured to acquire annotation data for training the imaging and recognition model, wherein the annotation data is obtained by carrying out imaging result annotation and classification result annotation on target echo signals acquired by the radar; and a model training module configured to train an imaging and recognition model to be trained according to the annotation data and the loss function to obtain the imaging and recognition model, wherein the input of the imaging and recognition model is the target echo signal, the output is a classification result predicted for the target echo signal, and the loss function is characterized by the predicted classification result and the annotated classification result.
With reference to the third aspect, in one embodiment, the imaging and recognition model includes an imaging model and a recognition model; the model training module is further configured to: training an imaging model to be trained in a multi-iteration mode to obtain a target parameter value corresponding to the imaging model, wherein the types of the target parameters at least comprise: step size and soft threshold; and obtaining the imaging and identifying model according to the target parameter value and the loss function.
With reference to the third aspect, in one embodiment, the imaging model includes: a primary image generation module and a target image generation module; wherein the model training module is further configured to: in the kth iteration, inputting the target echo signal to the primary image generation module to obtain a kth primary image, wherein k is an integer greater than or equal to 1; performing sparse domain transformation on the kth primary image according to the target image generation module to obtain a kth target image; repeating the steps until the first termination condition is met or the set cycle times are reached, and obtaining the target parameter value corresponding to the imaging model.
With reference to the third aspect, in one implementation manner, the target image generating module includes: a transformation module and an inverse transformation module; the transformation module comprises the following components in sequence: an input layer, a first convolution layer, a first linear rectification layer, a first pooling layer, a second convolution layer and an output layer; the inverse transformation module sequentially comprises: an input layer, a third convolution layer, a second linear rectification layer, a second pooling layer, a fourth convolution layer, and an output layer.
With reference to the third aspect, in one embodiment, the model training module is further configured to: pre-training an imaging model to be pre-trained according to imaging result marking data to obtain an ith pre-training result, wherein i is an integer greater than or equal to 1; and repeating the steps, and obtaining the imaging model to be trained when the ith pre-training result is confirmed to meet a second termination condition.
With reference to the third aspect, in one embodiment, the model training module is further configured to: and determining that the second termination condition is met according to an imaging loss function, wherein the imaging loss function is used for measuring the difference between the predicted image and the standard image.
In a fourth aspect, an embodiment of the present application provides an electronic device, including: a processor, a memory, and a bus; the processor is connected to the memory via the bus, the memory storing computer readable instructions which, when executed by the processor, are adapted to carry out the method according to any one of the first and second aspects.
In a fifth aspect, embodiments of the present application provide a computer readable storage medium having stored thereon a computer program which when executed implements the method according to any one of the first and second aspects.
In a sixth aspect, embodiments of the present application provide a system comprising one or more computers and one or more storage devices storing instructions that, when executed by the one or more computers, cause the one or more computers to perform operations in accordance with the respective method of any one of the first aspects.
In a seventh aspect, embodiments of the present application provide one or more computer storage media storing instructions that, when executed by one or more computers, cause the one or more computers to perform the operations of the respective method of any one of the first aspects.
Drawings
FIG. 1 is a diagram illustrating an application system for imaging and recognition models in accordance with an embodiment of the present application;
FIG. 2 is a network architecture of a transformation module according to an embodiment of the present application;
FIG. 3 is a network structure of an inverse transform module according to an embodiment of the present application;
FIG. 4 is a network structure of an identification model according to an embodiment of the present application;
FIG. 5 is a flow chart of a training method for imaging and recognition models according to an embodiment of the present application;
FIG. 6 is a training embodiment of an imaging and recognition model according to an embodiment of the present application;
FIG. 7 illustrates an application of an imaging and recognition model in accordance with an embodiment of the present application;
FIG. 8 is a training device for imaging and recognition models according to an embodiment of the present application;
fig. 9 is an electronic device according to an embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present application more apparent, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are only some embodiments of the present application, but not all embodiments of the present application. The components of the embodiments of the present application generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the application, as presented in the figures, is not intended to limit the scope of the application, as claimed, but is merely representative of selected embodiments of the application. All other embodiments, which can be made by a person skilled in the art without any inventive effort, are intended to be within the scope of the present application based on the embodiments of the present application.
The method steps in the embodiments of the present application are described in detail below with reference to the drawings.
The embodiment of the application can be applied to various target echo signal imaging and identifying scenes. For example, as shown in fig. 1, in one scenario, an aircraft 120 is provided with a radar and a target imaging and recognition model, where the radar detects an object on the ground to obtain a target echo signal 110 collected by the radar, and then the target imaging and recognition model performs target recognition according to the target echo data collected by the radar to output a recognition result. It will be appreciated that in other embodiments the imaging results and classification results 130 may be output simultaneously.
The problems of the related art will be exemplarily described below, taking an aircraft recognizing a ground object as an example. In the related art, the imaging module and the identification module for target echo signals acquired by an aircraft radar are usually trained separately, so that the target image obtained by the imaging module generally has good sparsity; however, a target image with very good sparsity can reduce the accuracy in the subsequent identification and classification process, and in the traditional imaging algorithm a two-dimensional orthogonal matrix is used for the sparse domain transformation, so that the imaging effect is poor.
At least to solve the above problems, some embodiments of the present application achieve the object of improving the model recognition effect by improving the loss function that evaluates whether training can be completed. For example, in some embodiments of the application the loss function is characterized by a predictive classification result and a labeling classification result for the input target echo data. The method for training the imaging and recognition model to be trained by the loss function solves the problem of reduced recognition accuracy. Some embodiments of the present application further provide that the primary image is input into the target image generating module for training, so as to obtain parameters with flexible structure, thereby implementing sparse domain transformation and inverse transformation, and solving the problem of poor imaging effect caused by using a two-dimensional orthogonal matrix for sparse domain transformation in the related art.
The process of imaging the target echo signal in the related art will be exemplarily described below.
In the related art, according to the Range-Doppler Algorithm (RDA), after motion compensation is performed on a target echo signal, a two-dimensional Fourier transform is performed on the target echo signal in frequency and azimuth to obtain a target image. The process is expressed as shown in formula (1):
E = F_1 X F_2 + N    (1)
where E represents the fully sampled target echo signal, X represents the target image, N represents the reception noise, and F_1, F_2 are normalized Fourier matrices.
Considering the sparse sampling condition, only part of the frequencies and azimuth angles of the fully sampled echo are sampled, and the resulting target echo signal can be expressed as shown in formula (2):
Y = Φ_1 E Φ_2^T + N' = Ψ_1 X Ψ_2^T + N'    (2)
where Y represents the target echo signal, E represents the fully sampled target echo signal, Φ_1 and Φ_2 are respectively the frequency sparse sampling matrix and the angle sparse sampling matrix, N' represents the noise matrix after sampling, X represents the target image, Ψ_1 = Φ_1 F_1, and Ψ_2 = Φ_2 F_2.
If the target image X is sufficiently sparse, X can be recovered from the target echo signal by solving equation (3):
min_X ||Y - Ψ_1 X Ψ_2^T||_F^2 + λ||X||_1    (3)
where ||·||_F is the matrix Frobenius norm, λ is the regularization parameter, and ||·||_1 denotes the sum of the absolute values of the matrix elements. Equation (3) can be solved using various SR algorithms, which typically require a large number of iterations. Let SR^(k) denote the operation in the kth iteration; the output of the kth iteration is then
X^(k) = SR^(k)(X^(k-1) | Y, Ψ_1, Ψ_2, λ)    (4)
where k = 1, ..., K, K represents the maximum number of iterations, and K is an integer greater than or equal to 1.
Meanwhile, in order to handle imaging of complex scenes, the related art generally transforms the target image obtained from formula (3) into another domain having sparsity, defining two orthogonal matrices Ω_1 and Ω_2 as image transformation matrices; the target image can then be solved through formula (5):
min_X ||Y - Ψ_1 X Ψ_2^T||_F^2 + λ||Ω_1 X Ω_2^T||_1    (5)
However, selecting an appropriate transformation matrix is a challenge for imaging the target echo signal. In the related art, the Discrete Wavelet Transform (DWT) is generally used to transform between the image domain and the sparse domain, but the DWT is a sparse transform designed for natural images, and its effect is not ideal when imaging target echo signals acquired by a radar.
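For concreteness, the following is a minimal NumPy sketch of a conventional 2D ISTA-type SR solver for problem (3); the step size, the regularization parameter and the number of iterations are all fixed by hand here, which is exactly the limitation the embodiments below address. The function and parameter names are illustrative only and are not part of the patented method.

```python
import numpy as np

def soft_threshold(x, theta):
    # Complex soft-thresholding: shrink the magnitude, keep the phase.
    mag = np.abs(x)
    return np.where(mag > theta, (1.0 - theta / np.maximum(mag, 1e-12)) * x, 0.0)

def ista_2d(Y, Psi1, Psi2, lam=0.01, rho=1.0, iters=100):
    """Plain 2D ISTA for min_X ||Y - Psi1 X Psi2^T||_F^2 + lam*||X||_1 (illustrative only)."""
    X = np.zeros((Psi1.shape[1], Psi2.shape[1]), dtype=complex)
    for _ in range(iters):
        residual = Psi1 @ X @ Psi2.T - Y                           # data mismatch
        X = X - rho * (Psi1.conj().T @ residual @ Psi2.conj())     # gradient step
        X = soft_threshold(X, lam * rho)                           # sparsity (proximal) step
    return X
```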
As can be seen from the above description of the related art, on one hand, the accuracy of recognition is reduced due to the separate training of the imaging model and the recognition model of the related art, and on the other hand, the imaging effect is poor due to the sparse domain transformation performed by the related art using the two-dimensional orthogonal matrix.
In order to solve the problems in the related art, the embodiment of the application provides a training method, an application method, a device and a medium for imaging and identifying models.
The imaging and recognition models constructed in accordance with some embodiments of the present application are described first below.
The imaging and recognition model in the embodiment of the application comprises: an imaging model configured to perform imaging according to target echo signals acquired by the radar to obtain a target image; and a recognition model configured to identify the class of the target image and obtain a classification result. The imaging model includes: a primary image generation module configured to perform imaging according to the target echo signal to obtain a primary image; and a target image generation module configured to perform sparse domain transformation and inverse transformation on the primary image to obtain the target image.
That is, the imaging model of some embodiments of the present application is represented by a primary image generation module and a target image generation module, the primary image generation module being represented by formula (6):
R^(k) = X^(k-1) - ρ^(k) Ψ_1^H (Ψ_1 X^(k-1) Ψ_2^T - Y) Ψ_2^*    (6)
where R^(k) represents the kth primary image generated during the kth iteration; X^(k-1) represents the (k-1)th target image generated during the (k-1)th iteration; ρ^(k) represents the kth step size parameter generated during the kth iteration; Ψ_1 = Φ_1 F_1 and Ψ_2 = Φ_2 F_2, where F_1 and F_2 represent normalized Fourier matrices, Φ_1 represents the frequency sparse sampling matrix and Φ_2 represents the angle sparse sampling matrix; Y represents the target echo signal; k represents the iteration index, and k is an integer greater than or equal to 1.
The target image generation module is shown by the following formula (7):
X^(k) = F̃^(k)(Csoft(F^(k)(R^(k)), θ^(k)))    (7)
where X^(k) represents the kth target image generated during the kth iteration; F̃^(k)(·) represents the kth inverse transformation performed by the inverse transformation module in the imaging model; Csoft(·) represents the complex contraction (soft-thresholding) function; F^(k)(·) represents the kth sparse domain transformation performed by the transformation module in the imaging model; and θ^(k) represents the kth soft threshold parameter generated during the kth iteration.
The target image generation module comprises a transformation module and an inverse transformation module. As shown in fig. 2, the transformation module is composed of an input layer 310, a first convolution layer 320, a first linear rectification layer 330, a first pooling layer 340, an index transfer module 350, a second convolution layer 360, and an output layer 370 in this order; wherein the first convolution layer 320 (ComConv) and the second convolution layer 360 (ComConv) are complex convolution layers with a size of 3×3, the first linear rectification layer 330 (ComReLU) is a complex linear rectification layer, the first pooling layer 340 (ComMaxPool) is a complex pooling layer with a size of 2×2, and the index transfer module 350 is configured to transfer the index parameters to the inverse transform module.
That is, as shown in fig. 3, the inverse transform module is composed of an input layer 410, a third convolution layer 420, a second linear rectification layer 430, an index transfer module 440, a second pooling layer 450, a fourth convolution layer 460, and an output layer 470 in this order; wherein the third convolution layer 420 (ComConv) and the fourth convolution layer 460 (ComConv) are complex convolution layers with a size of 3×3, the second linear rectification layer 430 (ComReLU) is a complex linear rectification layer, the second pooling layer 450 (ComMaxPool) is a complex pooling layer with a size of 2×2, and the index transfer module 440 is configured to receive the index parameters transferred by the conversion module.
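A hedged PyTorch-style sketch of the transformation and inverse transformation modules of Figs. 2 and 3 is given below. The complex convolution, complex ReLU and magnitude-based pooling with index transfer are plausible realisations consistent with the description above; the exact layer implementations and the padding choice (which keeps the 128×128 and 64×64 sizes quoted later) are assumptions, not the patented implementation.

```python
import torch
import torch.nn as nn

class ComConv2d(nn.Module):
    """Illustrative complex 3x3 convolution: (a+bi)*(w+vi) = (aw-bv) + (av+bw)i,
    realised with two real convolutions applied to the real and imaginary parts."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.wr = nn.Conv2d(in_ch, out_ch, 3, padding=1, bias=False)
        self.wi = nn.Conv2d(in_ch, out_ch, 3, padding=1, bias=False)

    def forward(self, x):  # x: complex tensor of shape (B, C, H, W)
        a, b = x.real, x.imag
        return torch.complex(self.wr(a) - self.wi(b), self.wr(b) + self.wi(a))

def com_relu(x):
    # One possible complex ReLU: rectify real and imaginary parts separately.
    return torch.complex(torch.relu(x.real), torch.relu(x.imag))

class TransformModule(nn.Module):
    """Sketch of the Fig. 2 transform: ComConv -> ComReLU -> 2x2 pooling -> ComConv.
    Pooling positions are chosen by magnitude and returned so the inverse module can unpool."""
    def __init__(self):
        super().__init__()
        self.conv1, self.conv2 = ComConv2d(1, 32), ComConv2d(32, 32)
        self.pool = nn.MaxPool2d(2, return_indices=True)

    def forward(self, x):
        x = com_relu(self.conv1(x))
        _, idx = self.pool(torch.abs(x))                  # pick 2x2 maxima by magnitude
        x = x.flatten(2).gather(2, idx.flatten(2)).view(*idx.shape)
        return self.conv2(x), idx                         # idx is passed to the inverse module

class InverseTransformModule(nn.Module):
    """Sketch of the Fig. 3 inverse transform: ComConv -> ComReLU -> unpool(idx) -> ComConv."""
    def __init__(self):
        super().__init__()
        self.conv1, self.conv2 = ComConv2d(32, 32), ComConv2d(32, 1)
        self.unpool = nn.MaxUnpool2d(2)

    def forward(self, x, idx):
        x = com_relu(self.conv1(x))
        x = torch.complex(self.unpool(x.real, idx), self.unpool(x.imag, idx))
        return self.conv2(x)
```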
As shown in fig. 4, the recognition model is a convolutional network, and is formed by sequentially connecting an input layer 511, a fifth convolutional layer 520, a first average pooling layer 530, a sixth convolutional layer (including a linear rectifying unit) 540, a maximum pooling layer 550, an eighth convolutional layer (including a linear rectifying unit) 560, a second average pooling layer 570, a ninth convolutional layer 580, a third average pooling layer 590, and an output layer 512; the size of the convolution kernel in the convolution layer is 5 multiplied by 5 or 3 multiplied by 3, and the sliding step length is 1; the sixth convolution layer and the eighth convolution layer adopt batch standardization operation, and are rectified by a linear rectifying unit; the window sizes of the first and second averaged pooling layers are 2 x 2 with a step size of 2.
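For reference, the following PyTorch sketch reproduces the recognition model of Fig. 4 using the layer sizes quoted in the description (5×5 and 3×3 kernels, stride 1, 2×2 pooling); the dropout rate and the number of classes are assumptions.

```python
import torch
import torch.nn as nn

class RecognitionModel(nn.Module):
    """Sketch of the Fig. 4 classifier for a 2x64x64 input."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(2, 32, 5),                                   # 64 -> 60
            nn.AvgPool2d(2, 2),                                    # 60 -> 30
            nn.Conv2d(32, 64, 3), nn.BatchNorm2d(64), nn.ReLU(),   # 30 -> 28
            nn.MaxPool2d(2, 2),                                    # 28 -> 14
            nn.Conv2d(64, 64, 3), nn.BatchNorm2d(64), nn.ReLU(),   # 14 -> 12
            nn.AvgPool2d(2, 2),                                    # 12 -> 6
            nn.Conv2d(64, 128, 5),                                 # 6 -> 2
            nn.AvgPool2d(2, 2),                                    # 2 -> 1
        )
        self.classifier = nn.Sequential(nn.Flatten(), nn.Dropout(0.5), nn.Linear(128, num_classes))

    def forward(self, x):
        return self.classifier(self.features(x))   # logits; softmax gives the P of formula (11)
```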
It should be noted that, the parameters (such as the size of the convolution kernel, the step size, etc.) set in the imaging and recognition model are not limited thereto in the embodiment of the present application, and may take any other values.
The foregoing describes the specific structure of imaging and recognition models constructed in accordance with some embodiments of the present application; the method for training the imaging and recognition model described above, which is performed by the electronic device (or system) in the embodiment of the present application, will be described in detail with reference to the accompanying drawings.
As shown in fig. 5, a training method for an imaging and recognition model according to an embodiment of the present application includes: s210, acquiring annotation data for training imaging and identifying models; s220, training the imaging and identifying model to be trained according to the labeling data and the loss function to obtain the imaging and identifying model.
The implementation of the above steps is exemplarily described below.
In one embodiment, the labeling data related to S210 is imaging result labeling and classification result labeling of the target echo signals acquired by the radar.
That is, the labeling data may include multiple sets of target echo data, imaging result labels, and classification result labels, or may include only multiple sets of imaging result labels and classification result labels.
The radar (for example, synthetic aperture radar) can be installed on an aircraft to collect target echo signals of the ground, and can also be installed on a satellite to collect target echo signals of the ground; the radar (such as inverse synthetic aperture radar) can also be installed on the ground to collect target echo signals of objects such as airplanes, ships, missiles and the like.
An embodiment of S220 performed by the electronic device will be described below.
In order to improve convergence ability of the imaging and recognition model, in some embodiments of the present application, before S220, the imaging model to be pre-trained having the above model structure needs to be pre-trained according to the imaging result labeling data to obtain an ith pre-training result, where i is an integer greater than or equal to 1; and repeating the steps, and obtaining the imaging model to be trained when the ith pre-training result meets a second termination condition.
That is, before the imaging and recognition model to be trained is trained as a whole, the imaging model to be pre-trained needs to be pre-trained, as shown in formula (8):
R^(i) = X^(i-1) - ρ^(i) Ψ_1^H (Ψ_1 X^(i-1) Ψ_2^T - Y) Ψ_2^*
X^(i) = F̃^(i)(Csoft(F^(i)(R^(i)), θ^(i)))    (8)
where R^(i) represents the ith primary image generated during the ith iteration; X^(i-1) represents the (i-1)th target image generated during the (i-1)th iteration; ρ^(i) represents the ith step size parameter generated during the ith iteration; Ψ_1 = Φ_1 F_1 and Ψ_2 = Φ_2 F_2, where F_1 and F_2 represent normalized Fourier matrices, Φ_1 represents the frequency sparse sampling matrix and Φ_2 represents the angle sparse sampling matrix; Y represents the target echo signal; X^(i) represents the ith target image generated during the ith iteration; F̃^(i)(·) represents the ith inverse transformation performed by the inverse transformation module in the imaging model; Csoft(·) represents the complex contraction function; F^(i)(·) represents the ith sparse domain transformation performed by the transformation module in the imaging model; and θ^(i) represents the ith soft threshold parameter generated during the ith iteration.
In the process of pre-training the imaging model to be pre-trained, taking the 2 nd iteration as an example (i.e., i=2), an exemplary process of pre-training is illustrated:
the target echo signal and the 1 st target image X generated by the first iteration (1) Input into the above formula (8), and obtain the 2 nd primary image R through calculation (2) At the same time, the 2 nd step parameter ρ is obtained (2) The method comprises the steps of carrying out a first treatment on the surface of the Inputting the 2 nd primary image into a network structure of a transformation module, and outputting a transformed sparse domain image; processing the sparse domain image according to the complex contraction function, and obtaining a 2 nd soft threshold parameter; and (3) inputting the sparse domain image processed by the complex contraction function and the 2 nd soft threshold parameter into an inverse transformation module, and inversely transforming the sparse domain image into an image domain by the inverse transformation module to obtain a 2 nd target image generated in the 2 nd iteration process.
After the 2nd target image is obtained, it is compared with the imaging result annotation data using the imaging loss function shown in formula (9), which measures the difference between the predicted image (i.e., the ith target image) and the imaging result annotation data. When the second termination condition is confirmed to be met, the iteration stops, and the initial parameters of the imaging model to be trained (the imaging model to be trained being the already pre-trained imaging model), together with the step size and the soft threshold, are obtained, so that better initial parameters are available for the subsequent training of the whole imaging and recognition model to be trained, enhancing convergence.
Loss_SR = (1/M) Σ_{m=1}^{M} [ ||X̂_m - X_m||_F^2 + γ Σ_i ||F̃^(i)(F^(i)(X̂_m)) - X̂_m||_F^2 ]    (9)
where Loss_SR represents the imaging loss function, X̂_m represents the predicted image for the mth sample, X_m represents the reference image in the imaging result annotation data, and γ represents a parameter balancing accuracy and orthogonality (0.1 may be taken in the embodiment of the present application).
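As an illustration of how such a pre-training loss might be computed, the sketch below combines the reconstruction error with a term that encourages each learned inverse transform to undo its forward transform; this reading of the "orthogonality" term is an assumption, and the (transform, inverse transform) interface follows the earlier module sketch.

```python
import torch

def imaging_pretrain_loss(X_pred, X_ref, transform_pairs, gamma=0.1):
    """Sketch of a loss in the spirit of formula (9).
    transform_pairs is assumed to be a list of (F_k, F_inv_k) module pairs,
    where F_k returns (features, pooling indices) as in the earlier sketch."""
    loss = torch.mean(torch.abs(X_pred - X_ref) ** 2)        # discrepancy with the reference image
    for F_k, F_inv_k in transform_pairs:
        feats, idx = F_k(X_ref)                              # forward transform of the reference
        loss = loss + gamma * torch.mean(torch.abs(F_inv_k(feats, idx) - X_ref) ** 2)
    return loss
```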
It should be noted that, the second termination condition may be that the set iteration number (for example, 20 times or 200 times) is satisfied, or the similarity between the generated predicted image and the labeling data of the imaging result is greater than 80%. The embodiments of the present application are not limited thereto.
Therefore, the embodiment of the application can obtain better initial parameters for the imaging model to be trained by pre-training the imaging model before training the imaging and identifying model to be trained, thereby reducing iteration times and improving precision and training efficiency in the process of training the imaging and identifying model to be trained.
In the above description, in the embodiment of the present application, the process of pre-training the imaging model is performed before the imaging and recognition model to be trained is integrally trained, so that at least the obtained soft threshold parameter and step parameter can be used as initial parameters for integrally training the imaging and recognition model.
In the following, a specific implementation of the overall training of the imaging and recognition model to be trained in the embodiment of the present application will be described.
In one embodiment, the imaging and recognition model of S220 is input to the target echo signal, the output is a classification result of the target echo signal prediction, and the loss function is characterized by the classification result of the target echo signal prediction and the labeled classification result.
In other words, in the process of training the imaging and recognition model, the target echo signal is input into the imaging and recognition model, the classification result of the target echo signal prediction is output, the loss function compares the predicted classification result with the labeled classification result, and then the comparison result is fed back to the imaging and recognition model to enter the next iteration.
In one embodiment, the imaging and recognition model includes an imaging model and a recognition model; s220 includes: training an imaging model to be trained in a multi-iteration mode to obtain a value of a target parameter corresponding to the imaging model, and completing preliminary training of the imaging model to obtain an initial imaging model, wherein the types of the target parameter at least comprise: step size and soft threshold; and training the imaging and identifying model by using the imaging result output by the initial imaging model and the loss function to obtain the target imaging and identifying model.
That is, the process of performing supervised training on the imaging and recognition models to be trained, as shown in fig. 6, includes an imaging model to be trained 620 and a recognition model to be trained 630; the following exemplary description describes the overall process of training an imaging and recognition model to be trained:
inputting the target echo signal 610 into an imaging model 620 to be trained, and learning the imaging of the target echo signal by the imaging model 620 to be trained for k times to obtain a target image; inputting the target image into the recognition model 630 to be trained for classification recognition; the recognition model 630 to be trained inputs the predicted classification result into the loss function 640; the loss function 640 compares the predicted classification result with the labeled classification result, and feeds back the comparison result to the imaging model 620 to be trained and the recognition model to be trained, so as to enter the next iteration; until the circulation times are met, obtaining target parameters corresponding to the imaging model and the imaging and identifying model after training is completed.
It should be noted that, the learning of the imaging model 620 to be trained for k iterations of imaging the target echo signal is 8 (i.e., k=8), which is not limited to the embodiment of the present application.
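A hedged sketch of this end-to-end training loop is shown below; `imaging_model` is assumed to wrap the k unrolled iterations (and any pre-processing needed by the classifier), the data loader is assumed to yield echo/label pairs, and the optimizer settings are placeholders.

```python
import torch

def train_end_to_end(imaging_model, recognition_model, loader, epochs=50, lr=1e-4):
    """Sketch of the joint training of Fig. 6: the classification loss is back-propagated
    through both the recognition model and the unrolled imaging model."""
    params = list(imaging_model.parameters()) + list(recognition_model.parameters())
    optimizer = torch.optim.Adam(params, lr=lr)
    for _ in range(epochs):
        for Y, label in loader:                  # target echo signal and annotated class label
            X = imaging_model(Y)                 # k unrolled iterations -> target image
            logits = recognition_model(X)        # predicted class scores
            loss = torch.nn.functional.cross_entropy(logits, label)   # formula (12)
            optimizer.zero_grad()
            loss.backward()                      # gradients also reach the imaging parameters
            optimizer.step()
```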
The training process of the imaging model to be trained will be described in detail below.
In one embodiment, an imaging model includes: a primary image generation module and a target image generation module; in the kth iteration, inputting a target echo signal into a primary image generation module to obtain a kth primary image, wherein k is an integer greater than or equal to 1; performing sparse domain transformation on the kth primary image according to the target image generation module to obtain a kth target image; repeating the steps until the first termination condition is met or the set cycle times are reached, and obtaining the target parameter value corresponding to the imaging model.
In the embodiment of the application, an iterative shrinkage thresholding algorithm is selected as the solver of the SR algorithm, and the primary image generation module included in the imaging model is shown by formula (6): the target echo signal and the (k-1)th target image X^(k-1) generated by the previous iteration are input into formula (6), and the kth primary image R^(k) is obtained through calculation together with the kth step size parameter ρ^(k). After the kth primary image is obtained, since it is not sufficiently sparse, the imaging effect is not good; therefore, the kth primary image needs to be transformed from the image domain to the sparse domain.
Thus, the kth primary image is input into the target image generation module, which is shown by the above formula (7): the target image generating module includes a transformation module shown in fig. 2 and an inverse transformation module shown in fig. 3. The specific structures of the transformation module and the inverse transformation module are described above, and are not described herein.
That is, the target image generation module performs the sparse domain transform and inverse transform on the kth primary image R^(k) according to formula (7) as follows:
First, the kth primary image R^(k), an image with dimensions 128×128 and 1 channel, is input through the input layer 310 of the transformation module and enters the first convolution layer 320, which generates an image with dimensions 128×128 and 32 channels; the first linear rectification layer 330 then generates an image with dimensions 128×128 and 32 channels; the first pooling layer 340 generates an image with dimensions 64×64 and 32 channels, while the index transfer module 350 passes the index parameters to the index transfer module 440 in the inverse transformation module; the second convolution layer 360 then generates an image with dimensions 64×64 and 32 channels, which is output through the output layer 370, thereby obtaining the kth primary image in the sparse domain.
And secondly, inputting the kth primary image in the sparse domain into a complex contraction function for transformation to obtain a kth sparse domain image after the complex contraction function transformation.
Finally, the kth sparse domain image transformed by the complex contraction function and the kth soft threshold parameter θ^(k) generated during the kth iteration are input into the input layer 410 of the inverse transformation module; the third convolution layer 420 generates an image with dimensions 64×64 and 32 channels; the second linear rectification layer 430 then generates an image with dimensions 64×64 and 32 channels; the second pooling layer 450 generates an image with dimensions 128×128 and 32 channels, with the index transfer module 440 receiving the index parameters; finally, the fourth convolution layer 460 generates an image with dimensions 128×128 and 1 channel, which is output through the output layer 470, thereby obtaining the kth target image and the target parameter values corresponding to the imaging model.
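Putting formulas (6) and (7) together, one unrolled stage of the imaging model might look like the following PyTorch sketch; `transform` and `inv_transform` stand for the Fig. 2 and Fig. 3 modules sketched earlier, and the initial values of the learnable step size and soft threshold are placeholders.

```python
import torch
import torch.nn as nn

def csoft(x, theta):
    # Complex soft-thresholding Csoft(): shrink magnitudes by theta, preserve phase.
    mag = torch.abs(x)
    return (torch.clamp(mag - theta, min=0.0) / torch.clamp(mag, min=1e-12)) * x

class UnrolledIteration(nn.Module):
    """One unrolled stage: gradient step (6), then learned transform, shrinkage
    and inverse transform (7). All stage parameters are learned, not hand-set."""
    def __init__(self, transform, inv_transform):
        super().__init__()
        self.rho = nn.Parameter(torch.tensor(1.0))      # learnable step size rho^(k)
        self.theta = nn.Parameter(torch.tensor(0.01))   # learnable soft threshold theta^(k)
        self.transform, self.inv_transform = transform, inv_transform

    def forward(self, X_prev, Y, Psi1, Psi2):
        # Formula (6): primary image R^(k) via one gradient step on the data term.
        residual = Psi1 @ X_prev @ Psi2.transpose(-2, -1) - Y
        R = X_prev - self.rho * (Psi1.conj().transpose(-2, -1) @ residual @ Psi2.conj())
        # Formula (7): sparse-domain transform, complex shrinkage, inverse transform.
        feats, idx = self.transform(R)
        return self.inv_transform(csoft(feats, self.theta), idx)
```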
After the imaging model outputs the target image, the target image is input into the recognition model shown in fig. 4 for classification recognition, and the specific structure of the recognition model is as described above and will not be described herein.
That is, first, after the imaging model generates a target image with dimensions 128×128 and 1 channel (128×128×1), a center cropping operation is performed to obtain a 64×64×1 image, and the channel is copied once to obtain a 64×64×2 image.
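A small sketch of this cropping and channel-copy step is given below; feeding the magnitude image to the real-valued classifier is an assumption made here for illustration.

```python
import torch

def prepare_for_classifier(X):
    """Centre-crop the 128x128x1 target image to 64x64x1 and copy the channel once
    to obtain a 64x64x2 classifier input (magnitude used here by assumption)."""
    top = (X.shape[-2] - 64) // 2
    left = (X.shape[-1] - 64) // 2
    crop = torch.abs(X[..., top:top + 64, left:left + 64])   # (B, 1, 64, 64)
    return crop.repeat(1, 2, 1, 1)                           # (B, 2, 64, 64)
```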
The 64×64×2 image (image size 64×64, 2 channels) is input into the recognition model for classification recognition. After entering through the input layer 511, it is processed sequentially by the fifth convolution layer 520 (yielding a 60×60 image with 32 channels), the first average pooling layer 530 (a 30×30 image with 32 channels), the sixth convolution layer (including a linear rectifying unit) 540 (a 28×28 image with 64 channels), the maximum pooling layer 550 (a 14×14 image with 64 channels), the eighth convolution layer (including a linear rectifying unit) 560 (a 12×12 image with 64 channels), the second average pooling layer 570 (a 6×6 image with 64 channels), the ninth convolution layer 580 (a 2×2 image with 128 channels), and the third average pooling layer 590 (a 1×1 image with 128 channels).
Then, the recognition model inputs the extracted feature vector into a fully connected layer containing a parameter regularization method (Dropout), and the result output by the Softmax classifier is shown in formula (10):
P = TC(X^(K))    (10)
where TC represents the corresponding recognition operation, and P represents the set of probabilities that the target image belongs to each category. The ith element of the vector P may be represented by formula (11):
P_i = exp(O_i) / Σ_{i'=1}^{I} exp(O_{i'})    (11)
where P_i represents the probability that the target image belongs to the ith class, O_i represents the output corresponding to the ith class, and I represents the number of target classes.
Finally, after obtaining the probability that the target image belongs to each class, P_i is input into a cross-entropy based loss function, as shown in formula (12):
Loss = -(1/M) Σ_{m=1}^{M} log P(label_m | Y_m)    (12)
where Loss represents the loss value, M represents the total number of labeled data, Y_m represents the mth target echo signal, label_m represents the classification result label of the mth target echo signal, and P(label_m | Y_m) represents the correct classification probability P_i (i = label_m).
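For reference, formulas (10) to (12) correspond to the following short sketch, where `O` is assumed to hold the classifier outputs for a batch of M echoes (one row per echo, one column per class).

```python
import torch

def classification_loss(O, labels):
    """Mean negative log-probability of each echo's annotated class."""
    P = torch.softmax(O, dim=1)                          # formula (11)
    correct = P[torch.arange(O.shape[0]), labels]        # P(label_m | Y_m)
    return -torch.log(correct).mean()                    # formula (12)
```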
That is, after the loss value is calculated, the loss function is fed back to the imaging model and the identification model, and then the next iteration is performed until the number of iterations or the classification accuracy is satisfied.
It should be noted that the parameters in the imaging and recognition model consist of the learnable parameters in the imaging model and the recognition model. In the traditional SR algorithm, all parameters are set manually and the number of iterations is fixed, so a large number of iterations is usually required to converge and the recovery performance is unstable. In the embodiment of the application, a learnable SR algorithm is used, which unfolds the iterative process of the traditional SR algorithm into a network structure and learns the parameters through training.
The specific process of training the imaging and recognition model in the embodiment of the present application is described above, and the process of applying the imaging and recognition model in the embodiment of the present application will be described below.
In one embodiment, a method of imaging and identifying a model includes: acquiring a target echo signal acquired by a radar; and inputting the target echo signals into an imaging and identifying model obtained by the training method to obtain imaging results and/or classification results.
That is, as shown in fig. 7, after the training of the imaging and recognition model is completed, the target echo signal 710 acquired by the radar is input into the imaging model 720 of the imaging and recognition model, the imaging model 720 generates a target image, the target image is input into the recognition model 730 for classification recognition, and then the target image and the classification result are output; or when the category to which the target image belongs is not identified, the imaging model generates the target image and then directly outputs the target image; or under the condition that only the classification result is required to be obtained, the recognition model directly outputs the classification result after carrying out classification recognition.
The application method of the imaging and recognition model in the embodiment of the present application is described above, and the comparison test of the imaging and recognition model and other algorithms will be described below.
In an embodiment of the present application, the imaging and recognition model of the embodiment of the application is tested on the MSTAR dataset commonly used in Synthetic Aperture Radar (SAR) automatic target recognition. The standard operating conditions of MSTAR are considered; the classes corresponding to the ground target echo signals are divided into ten major classes, each containing hundreds of target images with different azimuth and pitch angles. The target echo signal is recovered from the 128×128 complex MSTAR image according to the method in the related art, giving a fully sampled echo of size 128×128. The target echo signal is constructed by randomly selecting a subset of frequencies and angles from the fully sampled echo; with the sparse sampling rate defined as ssr, the size of the target echo signal is (ssr×128)×(ssr×128). Training is performed with annotation data at a pitch angle of 17 degrees, and testing is performed on target echo signal data at a pitch angle of 15 degrees.
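A sketch of how such a sparsely sampled echo could be constructed from a fully sampled 128×128 echo is given below; the selection-matrix construction of Φ_1 and Φ_2 is an assumption consistent with formula (2).

```python
import numpy as np

def sparse_sample(E, ssr=0.5, seed=0):
    """From a fully sampled echo E, keep a random subset of frequencies (rows)
    and aspect angles (columns), giving Y of size (ssr*128) x (ssr*128)."""
    rng = np.random.default_rng(seed)
    rows = np.sort(rng.choice(E.shape[0], int(ssr * E.shape[0]), replace=False))
    cols = np.sort(rng.choice(E.shape[1], int(ssr * E.shape[1]), replace=False))
    Phi1 = np.eye(E.shape[0])[rows]        # frequency sparse sampling matrix
    Phi2 = np.eye(E.shape[1])[cols]        # angle sparse sampling matrix
    return Phi1 @ E @ Phi2.T, Phi1, Phi2
```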
In the imaging and identifying model in the embodiment of the application, the number of layers of the imaging model is set to 8 layers through experiments, so that the precision and the efficiency can be ensured. The learnable imaging model is first pre-trained by equation (9), and then the whole imaging and recognition model is trained end-to-end by equation (12).
For comparison, embodiments of the present application also tested some comparison methods in which imaging and classification steps were performed in steps. In the imaging step, an RDA algorithm based on a zero-padding fourier transform, a conventional iterative threshold contraction algorithm (iterative shrinkage-thresholding algorithm, ISTA) and an ISTA algorithm with a discrete wavelet transform (Discrete Wavelet Transformation, DWT) as a sparse transform are employed. To improve the imaging effect of the ISTA, the step size and soft threshold parameters in the ISTA do not need to be set manually, but rather are learned from the data by minimizing subsequent losses.
After training, the ISTA will employ the same parameters in different iterations. In the classification and identification stage, the same convolution network structure as the identification model in the imaging and identification model is adopted, and the obtained images are classified after training. For simplicity, the three stepwise imaging and classification methods described above are referred to as RD, ISTA, and ISTA-DWT, respectively.
All networks were trained using the Adam optimization algorithm based on PyTorch. The experiments were performed on a workstation with an Intel Core i9-9900 processor and an NVIDIA GeForce RTX 2080 Ti graphics processor.
Experiment 1: in the comparison experiment of the application on the imaging and recognition model, the sparse sampling rate is set to ssr = 0.5; the maximum number of iterations of the ISTA is 100.
The experimental results of experiments performed using the RDA algorithm, the ISTA algorithm, and the ISTA-DWT algorithm in the related art are described below.
The experimental results also show a target result under relatively weak clutter, including: in the results of experiments using the RDA algorithm, severe sidelobes are present; in the experimental results using the ISTA algorithm, the ISTA image is composed of a number of isolated strong scatterers, and the sparsity of the image is enhanced by suppressing weak scatterers or regions, so that the contour of the target is discontinuous; in the results of experiments using the ISTA-DWT algorithm, it can be seen that the scatterers are isolated in the DWT domain but not in the final image, so the ISTA-DWT may retain some weak scatterers or regions. However, since the low-frequency components of the target image are suppressed in the DWT domain, some steep edges appear in the final target image.
The following is a description of experimental results of an imaging and recognition model (ICI-Net) in an embodiment of the present application.
From the results of experiments performed using the imaging and recognition model (i.e., ICI-Net) of the embodiments of the present application, it can be seen that the target image generated by the imaging model is highly consistent with the reference image; at the same time, the target area is well preserved, which is beneficial for classification and recognition, and the clutter background, which may not be conducive to classification, is averaged out, so the influence of clutter on classification and recognition can be reduced to a certain extent.
In addition, the imaging and recognition model in the embodiment of the application has stable effect under the condition that the target image is not sparse enough in the image domain and the DWT domain; but both the imaging results of the ISTA and the ISTA-DWT are poor.
Therefore, as can be seen from the above experimental results, in the step-wise imaging and classification recognition methods it is difficult to select appropriate parameters and sparse domains for various target scenes; the "sparsest image" for imaging may not be the "best image" for classification. Thus, the separate imaging and classification methods are disadvantageous for achieving better classification performance.
In order to solve the problems in the related art, the embodiment of the application combines the imaging process and the classifying process into an integrated model; learning parameters in the imaging process in order to maximize classification accuracy, rather than pursuing the sparsest image; in addition, the learnable sparse transformation increases the degree of freedom of the whole network, and is beneficial to further improving the classification precision.
The following is a comparison of the results data of the related art algorithms and the ICI-Net algorithm in the embodiment of the present application, and the average results of 50 tests are shown in table 1:
It can be seen that ICI-Net proposed by the embodiments of the present application (i.e., the imaging and recognition model in the embodiments of the present application) achieves the best classification accuracy. The results of ISTA and ISTA-DWT are similar and superior to RDA. In order to intuitively interpret the classification results, two examples of imaging results are given.
TABLE 1. Results of Experiment 1
                              RDA       ISTA      ISTA-DWT   ICI-Net
Classification accuracy (%)   93.22     95.27     95.05      97.01
Run time (s/GPU)              0.0065    0.5758    0.5675     0.1091
The run times of the different algorithms are also given in Table 1. Clearly, the ICI-Net proposed in the embodiments of the present application is more efficient than the ISTA and ISTA-DWT methods (100 iterations), but slower than the RDA method. As the number of iterations decreases, the running time of the ISTA decreases, so the performance of the ISTA was tested for different numbers of iterations; since the ISTA-DWT was not better than the ISTA (as shown in Table 1), only the ISTA was tested here. The classification accuracy and running time for 50, 100, and 150 iterations are shown in Table 2. As the number of iterations is reduced from 100 to 50, the running time decreases significantly, but the classification accuracy also drops noticeably; as it increases from 100 to 150, the running time increases significantly while the accuracy improves only slightly. A setting of 100 iterations therefore proves to be a suitable choice for the ISTA in terms of both accuracy and efficiency.
Table 2. ISTA results for different numbers of iterations
Number of iterations in ISTA   50        100       150
Classification accuracy        0.9314    0.9527    0.9567
Run time (s/GPU)               0.2854    0.5758    0.872
Experiment 2: The ICI-Net proposed by the embodiment of the present application was tested under different sparse sampling rates ssr = [0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]. The model for each ssr was retrained from the initial model trained at ssr = 0.5. Over 50 experiments, the classification accuracy of the different methods improves as ssr increases. More importantly, the ICI-Net provided by the embodiment of the present application is clearly superior to the other methods at every ssr; for the MSTAR dataset at a sparse sampling rate of ssr = 0.8, the average classification accuracy reaches 98.42%.
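For clarity, the following sketch illustrates what a given sparse sampling rate means in such an experiment by randomly retaining a fraction ssr of the slow-time pulses of an echo matrix; the echo dimensions and the uniform random selection scheme are assumptions made for the example.

```python
import numpy as np

def sparse_sample(echo, ssr, rng=np.random.default_rng(0)):
    """Randomly keep a fraction ssr of the slow-time pulses of the echo matrix."""
    n_pulses = echo.shape[0]
    n_keep = int(round(ssr * n_pulses))
    keep = np.sort(rng.choice(n_pulses, size=n_keep, replace=False))
    return echo[keep], keep

# Toy echo matrix: 128 pulses x 256 range cells (shapes are illustrative only).
full_echo = np.random.randn(128, 256) + 1j * np.random.randn(128, 256)
sub_echo, kept_idx = sparse_sample(full_echo, ssr=0.5)
print(sub_echo.shape)  # (64, 256)
```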
In summary, conventional SAR/ISAR automatic target recognition (ATR) generally treats the imaging step and the classification and recognition step separately, so the target image obtained in the imaging step may not be the best one for classification and recognition in the subsequent step. The present application provides an imaging and classification integrated architecture realized by a deep network, namely an imaging and classification integrated network (ICI-Net), and makes a preliminary study of the integrated SAR/ISAR imaging and classification problem under sparse sampling conditions. The ICI-Net combines a sparse recovery algorithm with a convolutional neural network (CNN); the hyperparameters and the sparse transformation matrix of the sparse recovery algorithm and the CNN parameters are all learned from training data, and the network automatically generates the target image that is most favorable for the final classification task. Experimental results on the 10-class target classification task of the MSTAR reference dataset show that the average classification accuracy of ICI-Net is 97.01% at a sparse sampling rate of 50% and 98.42% at a sparse sampling rate of 80%.
In addition, in the embodiment of the present application, the imaging step and the classification and recognition step are considered jointly. An integrated imaging and classification architecture is presented and implemented with a deep network. The network consists of an imaging model constructed by unfolding the sparse recovery (SR) algorithm and a recognition model with CNN layers. In the SR-based imaging model, the SAR/ISAR image is sparse in a transform domain that is learned from the training data rather than in the image domain itself, so the target information is more likely to be recovered from the sparse samples. All parameters in the SR algorithm are learned by training rather than set manually, and the purpose of the learning is to maximize classification accuracy rather than to find the sparsest image; the model can therefore produce the target image that is best for the classification task. Finally, the SR imaging model contains only a small number of layers, far fewer than the iterations required by a conventional SR algorithm, so the overall network is more efficient than conventional imaging and classification methods.
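A structural sketch of this composition is given below: a few unrolled sparse-recovery layers form the imaging stage and a compact CNN classifies the resulting image, with all parameters trained jointly. The layer counts, channel sizes, and the simplified real-valued, image-domain shrinkage are assumptions; the learnable transform and inverse-transform modules described later are sketched separately below.

```python
import torch
import torch.nn as nn

class ImagingAndClassificationNet(nn.Module):
    """Sketch: unrolled sparse-recovery layers produce an image estimate,
    which a compact CNN then classifies; all parameters train jointly."""
    def __init__(self, n_layers=5, n_classes=10, img_size=64):
        super().__init__()
        self.img_size = img_size
        self.steps = nn.Parameter(0.1 * torch.ones(n_layers))    # per-layer step sizes
        self.thetas = nn.Parameter(0.01 * torch.ones(n_layers))  # per-layer soft thresholds
        self.classifier = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Flatten(), nn.Linear(32 * (img_size // 4) ** 2, n_classes))

    def forward(self, y, A):
        # y: (batch, m) sparse-sampled echo vectors; A: (m, n) measurement matrix, n = img_size**2.
        x = torch.zeros(y.shape[0], A.shape[1], device=y.device)
        for step, theta in zip(self.steps, self.thetas):           # unrolled imaging stage
            grad = (x @ A.t() - y) @ A                             # gradient of the data-fit term
            r = x - step * grad
            x = torch.sign(r) * torch.clamp(r.abs() - theta, 0.0)  # shrinkage (image domain here)
        img = x.view(-1, 1, self.img_size, self.img_size)
        return self.classifier(img)                                # class logits for the echo
```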
Having described the comparative testing of the imaging and recognition model against other algorithms in the embodiments of the present application, a training apparatus for the imaging and recognition model in the embodiments of the present application is described below.
As shown in fig. 8, a training apparatus 800 for imaging and identifying a model according to an embodiment of the present application includes: a data acquisition module 810 and a model training module 820.
In one implementation, an embodiment of the present application provides a training apparatus 800 for an imaging and recognition model, the training apparatus including: a data acquisition module 810 configured to acquire annotation data for training the imaging and recognition model, wherein the annotation data are imaging result annotations and classification result annotations for target echo signals acquired by a radar; and a model training module 820 configured to train the imaging and recognition model to be trained according to the annotation data and the loss function to obtain the imaging and recognition model, wherein the input of the imaging and recognition model is a target echo signal, the output is a classification result predicted for the target echo signal, and the loss function is characterized by the predicted classification result and the annotated classification result.
In one embodiment, the imaging and recognition model includes an imaging model and a recognition model; model training module 820 is also configured to: training an imaging model to be trained in a multi-iteration mode to obtain a target parameter value corresponding to the imaging model, wherein the types of the target parameters at least comprise: step size and soft threshold; and obtaining an imaging and identifying model according to the target parameter value and the loss function.
In one embodiment, an imaging model includes: a primary image generation module and a target image generation module. The model training module 820 is further configured to: in the kth iteration, input the target echo signal into the primary image generation module to obtain a kth primary image, wherein k is an integer greater than or equal to 1; perform sparse-domain transformation on the kth primary image according to the target image generation module to obtain a kth target image; and repeat the above steps until the first termination condition is met or the set number of iterations is reached, so as to obtain the target parameter values corresponding to the imaging model.
In one embodiment, the target image generation module includes: a transformation module and an inverse transformation module; the transformation module comprises the following components in sequence: an input layer, a first convolution layer, a first linear rectification layer, a first pooling layer, a second convolution layer and an output layer; the inverse transformation module sequentially comprises: an input layer, a third convolution layer, a second linear rectification layer, a second pooling layer, a fourth convolution layer, and an output layer.
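A possible PyTorch sketch of the transform and inverse-transform modules described above is given below; the input and output layers are represented implicitly by the tensors entering and leaving each module, and the channel counts, kernel sizes, and size-preserving (stride-1) pooling are assumptions chosen so that the two modules compose cleanly.

```python
import torch
import torch.nn as nn

class TransformModule(nn.Module):
    """Sketch of the described transform stack: convolution -> ReLU -> pooling -> convolution."""
    def __init__(self, channels=16):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(1, channels, 3, padding=1),             # first convolution layer
            nn.ReLU(),                                        # first linear rectification layer
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1), # first pooling layer (size-preserving, assumed)
            nn.Conv2d(channels, channels, 3, padding=1))      # second convolution layer

    def forward(self, x):
        return self.body(x)

class InverseTransformModule(nn.Module):
    """Sketch of the mirrored inverse stack mapping transform-domain features back to an image."""
    def __init__(self, channels=16):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),      # third convolution layer
            nn.ReLU(),                                        # second linear rectification layer
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1), # second pooling layer (size-preserving, assumed)
            nn.Conv2d(channels, 1, 3, padding=1))             # fourth convolution layer

    def forward(self, z):
        return self.body(z)

# Usage: shrinkage is applied in the learned transform domain, then mapped back to the image domain.
transform, inverse = TransformModule(), InverseTransformModule()
img = torch.randn(1, 1, 64, 64)
z = transform(img)
z = torch.sign(z) * torch.clamp(z.abs() - 0.01, min=0.0)  # soft threshold in the learned domain
img_back = inverse(z)
```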
In one embodiment, model training module 820 is further configured to: pre-training an imaging model to be pre-trained according to imaging result marking data to obtain an ith pre-training result, wherein i is an integer greater than or equal to 1; repeating the steps, and obtaining the imaging model to be trained when the ith pre-training result meets the second termination condition.
In one embodiment, model training module 820 is further configured to: the second termination condition is determined to be satisfied based on an imaging loss function, wherein the imaging loss function is used to measure a difference between the predicted image and the standard image.
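A minimal sketch of such an imaging loss is shown below; the mean squared error between the predicted image and the labeled reference image is used as an assumed, common choice of distance, since the exact metric is not specified here.

```python
import torch
import torch.nn.functional as F

def imaging_loss(predicted_img, reference_img):
    """Measure the difference between the predicted image and the labeled reference image;
    MSE is used here as an assumed, common choice."""
    return F.mse_loss(predicted_img, reference_img)

# Pre-training sketch: the imaging stage alone is fitted to labeled imaging results until the
# loss stops improving (one possible form of the "second termination condition").
pred = torch.randn(4, 1, 64, 64, requires_grad=True)  # stands in for the imaging-model output
ref = torch.randn(4, 1, 64, 64)                       # stands in for the labeled reference images
loss = imaging_loss(pred, ref)
loss.backward()
```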
In the embodiment of the present application, the modules shown in fig. 8 can implement each process of the method embodiments of fig. 1 to 7; the operations and/or functions of the individual modules in fig. 8 respectively realize the corresponding flows of those method embodiments. Reference is made to the description in the above method embodiments, and detailed descriptions are omitted here as appropriate to avoid repetition.
As shown in fig. 9, an embodiment of the present application provides an electronic device 900, including: a processor 910, a memory 920, and a bus 930. The processor 910 is connected to the memory 920 via the bus 930, and the memory stores computer-readable instructions which, when executed by the processor, implement the method described in any of the above embodiments; reference may be made to the description of the above method embodiments, and detailed descriptions are omitted here as appropriate to avoid redundancy.
The bus is used to enable direct connection and communication between these components. The processor in the embodiment of the present application may be an integrated circuit chip with signal processing capability. The processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components. The methods, steps, and logic blocks disclosed in the embodiments of the present application may be implemented or performed by such a processor. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The memory may be, but is not limited to, a random access memory (RAM), a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), and the like. The memory stores computer-readable instructions which, when executed by the processor, perform the methods of the above embodiments.
It will be appreciated that the configuration shown in fig. 9 is illustrative only and may include more or fewer components than shown in fig. 9 or have a different configuration than shown in fig. 9. The components shown in fig. 9 may be implemented in hardware, software, or a combination thereof.
Embodiments of the present application also provide a computer readable storage medium, on which a computer program is stored, which when executed implements the method described in any of the above embodiments, and specifically reference may be made to the description in the above method embodiments, and detailed descriptions are omitted here as appropriate to avoid redundancy.
The above description is only of the preferred embodiments of the present application and is not intended to limit the present application, but various modifications and variations can be made to the present application by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the protection scope of the present application. It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further definition or explanation thereof is necessary in the following figures.
The foregoing is merely illustrative of the present application, and the present application is not limited thereto, and any person skilled in the art will readily recognize that variations or substitutions are within the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (9)

1. A training method for imaging and recognition models, the training method comprising:
acquiring annotation data for training an imaging and recognition model, wherein the annotation data is obtained by carrying out imaging result annotation and classification result annotation on target echo signals acquired by a radar;
Training an imaging and identifying model to be trained according to the labeling data and the loss function to obtain a target imaging and identifying model, wherein the input of the imaging and identifying model is the target echo signal, the output is a classification result predicted for the target echo signal, and the loss function is characterized by the classification result predicted for the input target echo signal and the labeling classification result;
wherein the imaging and identifying model comprises an imaging model and an identifying model;
training the imaging and identifying model to be trained according to the labeling data and the loss function to obtain a target imaging and identifying model, wherein the training comprises the following steps:
training an imaging model to be trained in a multi-iteration mode to obtain a value of a target parameter corresponding to the imaging model, and completing preliminary training of the imaging model to obtain an initial imaging model, wherein the types of the target parameter at least comprise: step size and soft threshold;
and inputting an imaging result output by the initial imaging model into an identification model to be trained, and performing feedback training on the identification model to be trained and the initial imaging model for a plurality of times through the loss function to obtain the target imaging and identification model.
2. The method of claim 1, wherein the imaging model comprises: a primary image generation module and a target image generation module; wherein:
training the imaging model to be trained in a multi-iteration mode to obtain a target parameter value corresponding to the imaging model, wherein the training comprises the following steps:
in the kth iteration, inputting the target echo signal to the primary image generation module to obtain a kth primary image, wherein k is an integer greater than or equal to 1;
performing sparse domain transformation on the kth primary image according to the target image generation module to obtain a kth target image;
repeating the steps until the first termination condition is met or the set cycle times are reached, and obtaining the target parameter value corresponding to the imaging model.
3. The method of claim 2, wherein the target image generation module comprises: a transformation module and an inverse transformation module;
the transformation module comprises the following components in sequence: an input layer, a first convolution layer, a first linear rectification layer, a first pooling layer, a second convolution layer and an output layer;
the inverse transformation module sequentially comprises: an input layer, a third convolution layer, a second linear rectification layer, a second pooling layer, a fourth convolution layer, and an output layer.
4. A method according to any one of claims 1-3, wherein before training the imaging model to be trained in a plurality of iterations to obtain the target parameter values corresponding to the imaging model, the method further comprises:
pre-training an imaging model to be pre-trained according to imaging result marking data to obtain an ith pre-training result, wherein i is an integer greater than or equal to 1;
and repeating the steps, and obtaining the imaging model to be trained when the ith pre-training result is confirmed to meet a second termination condition.
5. The method of claim 4, wherein when the i-th pre-training result is confirmed to satisfy a second termination condition, comprising:
and determining that the second termination condition is met according to an imaging loss function, wherein the imaging loss function is used for measuring the difference between the predicted image and imaging result labeling data.
6. An application method of an imaging and identifying model, the method comprising:
acquiring a target echo signal acquired by a radar;
inputting the target echo signal into a target imaging and identifying model obtained by the training method according to any one of claims 1-5, and obtaining an imaging result and/or a classification result.
7. A training device for imaging and identifying a model, the training device comprising:
the data acquisition module is configured to acquire annotation data for training an imaging and recognition model, wherein the annotation data is obtained by carrying out imaging result annotation and classification result annotation on target echo signals acquired by a radar;
the model training module is configured to train an imaging and identifying model to be trained according to the labeling data and the loss function to obtain a target imaging and identifying model, wherein the input of the imaging and identifying model is the target echo signal, the output is a classification result of the target echo signal prediction, and the loss function is characterized by the classification result of the target echo signal prediction and the labeling classification result;
wherein the imaging and identifying model comprises an imaging model and an identifying model;
the model training module is further configured to:
training an imaging model to be trained in a multi-iteration mode to obtain a value of a target parameter corresponding to the imaging model, and completing preliminary training of the imaging model to obtain an initial imaging model, wherein the types of the target parameter at least comprise: step size and soft threshold;
And inputting an imaging result output by the initial imaging model into an identification model to be trained, and performing feedback training on the identification model to be trained and the initial imaging model for a plurality of times through the loss function to obtain the target imaging and identification model.
8. A system comprising one or more computers and one or more storage devices storing instructions that, when executed by the one or more computers, cause the one or more computers to perform the operations of the respective method of any one of claims 1-6.
9. One or more computer storage media storing instructions that, when executed by one or more computers, cause the one or more computers to perform the operations of the respective method of any one of claims 1-6.
CN202110766344.3A 2021-07-07 2021-07-07 Training method, application method, device and medium for imaging and recognition model Active CN113359135B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110766344.3A CN113359135B (en) 2021-07-07 2021-07-07 Training method, application method, device and medium for imaging and recognition model


Publications (2)

Publication Number Publication Date
CN113359135A CN113359135A (en) 2021-09-07
CN113359135B true CN113359135B (en) 2023-08-22

Family

ID=77538678

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110766344.3A Active CN113359135B (en) 2021-07-07 2021-07-07 Training method, application method, device and medium for imaging and recognition model

Country Status (1)

Country Link
CN (1) CN113359135B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114049551B (en) * 2021-10-22 2022-08-05 南京航空航天大学 ResNet 18-based SAR raw data target identification method
CN114720984B (en) * 2022-03-08 2023-04-25 电子科技大学 SAR imaging method oriented to sparse sampling and inaccurate observation

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2016080698A (en) * 2014-10-17 2016-05-16 日本無線株式会社 Image generation device
CN107728143A (en) * 2017-09-18 2018-02-23 西安电子科技大学 Radar High Range Resolution target identification method based on one-dimensional convolutional neural networks
CN110109114A (en) * 2019-05-09 2019-08-09 电子科技大学 A kind of scanning radar super-resolution imaging detection integral method
CN110991418A (en) * 2019-12-23 2020-04-10 中国科学院自动化研究所 Synthetic aperture radar target image identification method and system
EP3640846A1 (en) * 2018-10-17 2020-04-22 Samsung Electronics Co., Ltd. Method and apparatus to train image recognition model, and image recognition method and apparatus
CN111220958A (en) * 2019-12-10 2020-06-02 西安宁远电子电工技术有限公司 Radar target Doppler image classification and identification method based on one-dimensional convolutional neural network
WO2020244261A1 (en) * 2019-06-05 2020-12-10 中国科学院长春光学精密机械与物理研究所 Scene recognition system for high-resolution remote sensing image, and model generation method

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9978013B2 (en) * 2014-07-16 2018-05-22 Deep Learning Analytics, LLC Systems and methods for recognizing objects in radar imagery
CN107341488B (en) * 2017-06-16 2020-02-18 电子科技大学 SAR image target detection and identification integrated method
CN111325726A (en) * 2020-02-19 2020-06-23 腾讯医疗健康(深圳)有限公司 Model training method, image processing method, device, equipment and storage medium


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Synthetic aperture radar learning imaging based on "data-driven + intelligent learning"; Luo Ying et al.; Journal of Radars; Vol. 9, No. 1; pp. 109-115 *

Also Published As

Publication number Publication date
CN113359135A (en) 2021-09-07

Similar Documents

Publication Publication Date Title
CN107316013B (en) Hyperspectral image classification method based on NSCT (non-subsampled Contourlet transform) and DCNN (data-to-neural network)
Hu et al. Inverse synthetic aperture radar imaging using a fully convolutional neural network
CN111310666B (en) High-resolution image ground feature identification and segmentation method based on texture features
CN112288011B (en) Image matching method based on self-attention deep neural network
CN113359135B (en) Training method, application method, device and medium for imaging and recognition model
CN111160273B (en) Hyperspectral image spatial spectrum joint classification method and device
CN102289807B (en) Method for detecting change of remote sensing image based on Treelet transformation and characteristic fusion
CN112990334A (en) Small sample SAR image target identification method based on improved prototype network
CN111062321B (en) SAR detection method and system based on deep convolutional network
Wang et al. Target detection and recognition based on convolutional neural network for SAR image
Pan et al. Residual attention-aided U-Net GAN and multi-instance multilabel classifier for automatic waveform recognition of overlapping LPI radar signals
CN113536963A (en) SAR image airplane target detection method based on lightweight YOLO network
CN115034257B (en) Cross-modal information target identification method and device based on feature fusion
CN111860823A (en) Neural network training method, neural network training device, neural network image processing method, neural network image processing device, neural network image processing equipment and storage medium
Lee et al. Convolutional autoencoder based feature extraction in radar data analysis
CN112232395A (en) Semi-supervised image classification method for generating confrontation network based on joint training
CN114049491A (en) Fingerprint segmentation model training method, fingerprint segmentation device, fingerprint segmentation equipment and fingerprint segmentation medium
Song et al. HDTFF-Net: Hierarchical deep texture features fusion network for high-resolution remote sensing scene classification
CN116304701A (en) HRRP sample generation method based on conditional denoising diffusion probability model
CN110781822A (en) SAR image target recognition method based on self-adaptive multi-azimuth dictionary pair learning
Li et al. POLSAR Target Recognition Using a Feature Fusion Framework Based on Monogenic Signal and Complex-Valued Nonlocal Network
CN113111919B (en) Hyperspectral image classification method based on depth high resolution
Babu et al. One-vs-All Convolutional Neural Networks for Synthetic Aperture Radar Target Recognition
John et al. Gabor filter and Gershgorin disk-based convolutional filter constraining for image classification
CN114358050A (en) Radar radiation source intelligent identification method based on bicubic interpolation and WVD characteristic square matrix

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant