CN111144462B

CN111144462B - Unknown individual identification method and device for radar signals

Info

Publication number: CN111144462B
Application number: CN201911296607.8A
Authority: CN
Inventors: 黄双双; 李臻; 单志林; 李立; 苏志杰; 胡佳
Original assignee: CETC 38 Research Institute
Current assignee: CETC 38 Research Institute
Priority date: 2019-12-16
Filing date: 2019-12-16
Publication date: 2023-10-20
Anticipated expiration: 2039-12-16
Also published as: CN111144462A

Abstract

The invention discloses a method and a device for identifying unknown individuals of radar signals. The method comprises the following steps: constructing and storing a known class sample set N and an unknown class sample set UN; inputting each sample to be identified in the known class sample set N and in the unknown class sample set UN into a coding network, extracting the feature vector of the sample to be identified and classifying the sample; generating an attention probability distribution vector from the feature vector of the sample to be identified by using a DDPG algorithm; generating a conditional feature vector from the feature vector and the attention probability distribution vector of the sample to be identified, and inputting the conditional feature vector into a decoding network to judge the category of the sample; and training the coding network, the decoding network and the networks of the DDPG algorithm. The advantage of the invention is that unknown class signals are prevented from being incorrectly recognized as known class signals, so that known class signals and unknown class signals are accurately separated.

Description

Unknown individual identification method and device for radar signals

Technical Field

The invention relates to the field of electronic reconnaissance, in particular to a method and a device for identifying unknown individuals of radar signals.

Background

In a classification network based on deep learning, the set of output classes is usually fixed when the model is trained, so the model implicitly assumes that every test sample belongs to one of the training classes. In real applications, unknown classes that were absent from training frequently appear, and a conventional classification network cannot classify them correctly. This defect greatly reduces the recognition accuracy of the classification model in a real application environment, so solving the unknown class recognition problem is a key factor in improving the recognition accuracy of a classification network. Existing solutions to the unknown class recognition problem in deep learning mainly fall into two methods. The first adds unknown sample classes to the training set and distinguishes unknown classes through the similarity between the data to be detected and the known samples. The second extracts middle-layer features of the classification network and performs cluster analysis with common machine learning methods (KNN, PCA, t-SNE and similar methods) to distinguish unknown classes.

Both methods identify unknown classes well in some application scenarios, but each has defects. The first method distinguishes known from unknown classes by adding unknown sample classes to the training set, so it depends heavily on unknown samples; such a sample set is difficult to collect and can hardly cover all potential unknown classes, and when the model meets a class that does not exist in the training set it cannot classify it correctly. The second method extracts middle-layer features of the classification network and then performs cluster analysis with common machine learning methods; it works well only when the similarity between the unknown classes and the known classes is small. Individual identification of radar signals, however, distinguishes emitters through the tiny signal variations caused by radar hardware differences, so the signals of different radar individuals are highly similar and cluster analysis of the middle-layer features of a classification network can hardly separate them completely. In summary, the existing solutions to the unknown class recognition problem in deep learning cannot classify such signals correctly. Studying frontier techniques in deep learning and finding a better unknown class recognition method is therefore of great significance for improving the accuracy of individual radar signal recognition.

Disclosure of Invention

The invention aims to provide a method and a device for identifying unknown individuals of radar signals so as to accurately separate known signals from unknown signals.

The invention solves the technical problems by the following technical means: a method of identifying an unknown individual of a radar signal, the method comprising:

Step one: constructing and storing a known class sample set N and an unknown class sample set UN;

Step two: inputting each sample to be identified in the known class sample set N and in the unknown class sample set UN into a coding network, extracting the feature vector of the sample to be identified and classifying the sample;

Step three: generating an attention probability distribution vector from the feature vector of the sample to be identified by using a DDPG algorithm;

Step four: generating a conditional feature vector from the feature vector and the attention probability distribution vector of the sample to be identified, and inputting the conditional feature vector into a decoding network to judge the category of the sample;

Step five: training the coding network, the decoding network and the networks of the DDPG algorithm.

A traditional coding network and decoding network reproduce the input signal as faithfully as possible, whereas the invention introduces an attention mechanism so that the coding network and the decoding network recover the input signal selectively. An attention probability distribution vector is generated by the DDPG algorithm, a conditional feature vector is generated from the feature vector of the sample to be identified and the attention probability distribution vector, and the conditional feature vector is input into the decoding network for signal recovery. The similarity between the output of the decoding network and the input signal is then calculated: if the similarity is high, the input signal is a known signal and its class is the class output by the classifier; if the similarity is low, the input signal is an unknown class signal. The attention mechanism assigns different attention to the components of the feature vector output by the coding network. For a signal of a known class, the main features are enhanced and the secondary features are weakened, so the signal can be correctly recovered. For a signal of an unknown class, the main features are weakened, so the signal cannot be recovered. In this way input signals of unknown classes are effectively identified and prevented from being incorrectly identified as known class signals, and known class signals and unknown class signals are accurately separated.

Preferably, the constructing the known category sample set N and the unknown category sample set UN includes: collecting signals of N radars, taking the signals of K radars as known categories to form a known category sample set N, and taking the signals of N-K radars as unknown categories to form an unknown category sample set UN.
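The split described above can be sketched in a few lines of Python. The function name `split_radar_classes`, the dictionary layout and the choice of which K radars count as known are assumptions made for illustration; the text only states that the signals of K radars form the known class sample set and the signals of the remaining N-K radars form the unknown class sample set.

```python
def split_radar_classes(signals_by_radar, known_ids):
    """Split recorded signals into a known-class set and an unknown-class set.

    signals_by_radar: dict mapping a radar id to a list of 1-D signal samples.
    known_ids: the K radar ids treated as known classes (hypothetical choice).
    """
    known_set, unknown_set = [], []
    for radar_id, samples in signals_by_radar.items():
        target = known_set if radar_id in known_ids else unknown_set
        for sample in samples:
            # Keep the radar id next to each sample as its label.
            target.append((sample, radar_id))
    return known_set, unknown_set


# Toy data: 4 radars in total, 3 treated as known, 1 as unknown.
signals = {r: [[0.1 * r, 0.2 * r, 0.3 * r]] * 2 for r in range(4)}
known_set, unknown_set = split_radar_classes(signals, known_ids={0, 1, 2})
```

In this toy run each radar contributes two samples, so the known set holds six labelled samples and the unknown set holds two.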

Preferably, the second step includes: the coding network comprises an input layer, a middle layer and an output layer. The input layer receives the sample to be identified; the middle layer consists of a one-dimensional convolution layer and a pooling layer, where the convolution layer performs feature extraction on the sample and the pooling layer performs dimension reduction; the output layer outputs the feature vector h of the sample to be identified, h = f(wx + b), where w is the weight of the coding network, b is the bias of the coding network, and x is the input one-dimensional sequence of the sample to be identified. After the feature vector h is obtained, it is input into a classifier to obtain the class of the sample to be identified, the classifier consisting of a fully connected layer and a softmax classifier.
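To make the data flow of the coding network concrete, the following is a minimal pure-Python sketch of one convolution layer, one pooling layer and a softmax classifier. The ReLU activation, the pooling size, the kernel and all weight values are assumptions chosen for illustration; the text does not specify them.

```python
import math


def conv1d(x, kernel, bias):
    """Valid 1-D convolution (cross-correlation) followed by ReLU."""
    k = len(kernel)
    return [max(0.0, sum(x[i + j] * kernel[j] for j in range(k)) + bias)
            for i in range(len(x) - k + 1)]


def max_pool1d(x, size):
    """Non-overlapping max pooling; shortens the sequence by a factor of `size`."""
    return [max(x[i:i + size]) for i in range(0, len(x) - size + 1, size)]


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def encode(x, kernel, bias):
    """Feature vector h = f(wx + b), with f realised as conv + ReLU + pooling."""
    return max_pool1d(conv1d(x, kernel, bias), size=2)


def classify(h, weights, biases):
    """Fully connected layer followed by softmax over the known classes."""
    logits = [sum(w * v for w, v in zip(row, h)) + b
              for row, b in zip(weights, biases)]
    return softmax(logits)


x = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0, 0.0]  # toy 1-D input sequence
h = encode(x, kernel=[1.0, 0.0, -1.0], bias=0.0)
probs = classify(h, weights=[[1.0] * len(h), [-1.0] * len(h)], biases=[0.0, 0.0])
predicted_class = probs.index(max(probs))
```

The same feature vector `h` feeds both the classifier and, later, the attention and decoding stages.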

Preferably, the third step includes: using the DDPG algorithm, an actor network u(s; θ^u) and a critic network Q(s, a; θ^Q) are built on the basis of convolutional neural networks, where s is the input of the actor network and the critic network and is equal to the feature vector h extracted by the coding network, θ^u is the weight parameter of the actor network, and θ^Q is the weight parameter of the critic network; an action a is obtained from the action generated by the actor network together with sampled random noise, and the action a is the attention probability distribution vector.
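A minimal sketch of how an action a could be formed from the actor output and sampled noise is given below. The single linear layer standing in for the convolutional actor, the softmax normalisation, the Gaussian noise and the renormalisation step are all assumptions for illustration; the text only says that the action combines the actor output with random noise and serves as the attention probability distribution vector.

```python
import math
import random


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def actor(state, weights):
    """Toy actor u(s; theta^u): one linear layer followed by softmax."""
    logits = [sum(w * v for w, v in zip(row, state)) for row in weights]
    return softmax(logits)


def noisy_action(state, weights, sigma, rng):
    """Action a = actor output plus exploration noise, renormalised to sum to 1."""
    raw = [p + rng.gauss(0.0, sigma) for p in actor(state, weights)]
    clipped = [max(v, 1e-8) for v in raw]  # keep every attention weight positive
    total = sum(clipped)
    return [v / total for v in clipped]


rng = random.Random(0)
h = [2.0, 0.0, 2.0]  # feature vector from the coding network (toy values)
theta_u = [[0.5, 0.0, 0.0], [0.0, 0.5, 0.0], [0.0, 0.0, 0.5]]
a = noisy_action(h, theta_u, sigma=0.05, rng=rng)
```

Renormalising after adding noise keeps `a` a valid probability distribution, which is one simple way to honour the "probability distribution" wording during exploration.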

Preferably, the fourth step includes: obtaining the conditional feature vector according to the formula c = h × a, where c is the conditional feature vector; inputting the conditional feature vector c into the decoding network, whose structure is symmetrical to that of the coding network and whose output is x′ = g(w′c + b′), where w′ is the weight of the decoding network, b′ is the bias of the decoding network, and x′ is the output of the decoding network; and determining the class of the sample to be identified according to the output of the decoding network.
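The weighting and decoding step can be illustrated as follows. Reading c = h × a as an element-wise product is an assumption, and the single linear layer with identity activation is only a stand-in for the decoding network, which the text describes as symmetrical to the coding network.

```python
def conditional_features(h, a):
    """c = h x a, read here as an element-wise product (an assumption)."""
    if len(h) != len(a):
        raise ValueError("h and a must have the same length")
    return [hv * av for hv, av in zip(h, a)]


def decode(c, weights, biases):
    """x' = g(w'c + b'), with g taken to be the identity in this toy decoder."""
    return [sum(w * v for w, v in zip(row, c)) + b
            for row, b in zip(weights, biases)]


h = [2.0, 0.0, 2.0]      # feature vector from the coding network
a = [0.5, 0.25, 0.25]    # attention probability distribution vector
c = conditional_features(h, a)
x_rec = decode(c, weights=[[1.0, 0.0, 0.0], [0.0, 0.0, 2.0]], biases=[0.0, 0.1])
```

Components of `h` that receive little attention contribute little to `c`, which is how the attention vector controls what the decoder is able to recover.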

Preferably, the fifth step includes: training the coding network and the decoding network with the known class sample set N, using a loss function L₁ (the formula image is missing from this text), where y_i is the true class of the i-th input sample to be identified, y_i′ is the class of the i-th sample output by the coding network, x_i is the i-th sample to be identified, m is the number of samples to be identified, and x_i′ is the output of the decoding network for the i-th sample. During training, the loss value L₁ of m input samples is calculated each time and the network parameters are then updated by the back-propagation algorithm; when the loss value L₁ of the network no longer decreases, the training of the coding network and the decoding network is complete.
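The exact form of the loss L₁ is not legible in this text. Purely as an assumption, the sketch below combines a cross-entropy term on the class (y_i, y_i′) with a mean squared recovery error on the signal (x_i, x_i′), averaged over the m samples, and adds a simple check for "the loss no longer decreases". The function names, the equal weighting of the two terms and the `patience` and `tol` parameters are hypothetical.

```python
import math


def joint_loss(y_true, y_prob, x, x_rec):
    """Mean over m samples of cross-entropy(y, y') + squared error(x, x')."""
    m = len(x)
    total = 0.0
    for label, probs, xi, xri in zip(y_true, y_prob, x, x_rec):
        # Classification term: negative log-probability of the true class.
        ce = -math.log(max(probs[label], 1e-12))
        # Recovery term: mean squared error between input and decoder output.
        mse = sum((p - q) ** 2 for p, q in zip(xi, xri)) / len(xi)
        total += ce + mse
    return total / m


def has_converged(history, patience=3, tol=1e-4):
    """True once the loss has not improved by more than tol for `patience` steps."""
    if len(history) <= patience:
        return False
    best_before = min(history[:-patience])
    return min(history[-patience:]) > best_before - tol
```

A training loop would append each batch loss to `history` and stop updating the coding and decoding networks once `has_converged(history)` returns true.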

Training of the actor network and the critic network: first, a target network is defined for each of the actor network and the critic network, namely u′(s′; θ^{u′}) and Q′(s′, a′; θ^{Q′}), where s′ is the input of the target actor network and the target critic network, a′ is the output action of the target actor network, θ^{u′} is the weight parameter of the target actor network, and θ^{Q′} is the weight parameter of the target critic network. During training the parameters of the coding network are kept fixed, and the training sample set comprises the N sample set and the UN sample set. The actor network generates an action a according to the input state s provided by the coding network and the decoding network; the coding network and the decoding network calculate a reward r according to a and provide the next input state s′; the transition (s, a, r, s′) is stored in a buffer, and M samples (s_i, a_i, r_i, s_i′), i = 1, 2, …, M, are drawn from the buffer. The critic network calculates a Q value from s_i in the transition and the output a_i of the actor network, and the actor network calculates the policy gradient from the Q value and updates its weight parameters, wherein the formula is as follows:

where M is the number of samples taken from the buffer, ∇_{θ^u} J is the policy gradient and ∇ is the gradient sign. The target actor network obtains the action a_i′ from s_i′ in the transition, and the target critic network obtains the target critic output Q′ from s_i′ and a_i′. The time-difference error L between the Q obtained by the critic network and the Q′ obtained by the target critic network is calculated, the gradient is calculated according to L, and the weights of the critic network are updated, wherein the time-difference error expression is as follows:

where y_i is calculated according to the following formula:

y_i = r_i + γ Q′(s_{i+1}, u′(s_{i+1} | θ^{u′}) | θ^{Q′})

where r_i is the reward value of the i-th sample to be identified, γ is a preset weight coefficient, Q is the critic network output value, and Q′ is the target critic network output value.
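The policy-gradient and time-difference-error expressions referred to above did not survive in this text. Assuming they follow the standard DDPG formulation, whose notation the formula for y_i matches, they would take the form below. This is a reconstruction under that assumption and not the patent's verbatim formulas.

```latex
\nabla_{\theta^{u}} J \approx \frac{1}{M}\sum_{i=1}^{M}
  \nabla_{a} Q(s, a \mid \theta^{Q})\big|_{s=s_i,\,a=u(s_i)}\,
  \nabla_{\theta^{u}} u(s \mid \theta^{u})\big|_{s=s_i}

L = \frac{1}{M}\sum_{i=1}^{M}\bigl(y_i - Q(s_i, a_i \mid \theta^{Q})\bigr)^{2},
\qquad
y_i = r_i + \gamma\, Q'\bigl(s_{i+1},\, u'(s_{i+1} \mid \theta^{u'}) \mid \theta^{Q'}\bigr)
```

Under this reading the actor is moved in the direction that increases the critic's Q value, while the critic is fitted to the bootstrapped target y_i computed by the two target networks.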

The weight parameters of the target actor network and the target critic network are updated through a soft update algorithm, as follows:

θ^{Q′} = τ θ^Q + (1 − τ) θ^{Q′}

θ^{u′} = τ θ^u + (1 − τ) θ^{u′}

where τ is a weight coefficient with a value of 0.001.

Training of the networks in the DDPG algorithm is complete when the time-difference error is smaller than a preset value and stable.
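The target value and the soft update above translate directly into code. In the sketch below the parameters are plain lists of floats and the Q values are supplied as numbers rather than computed by networks; the mean-squared form of the time-difference error is an assumption, since that expression is not legible in this text, and all function names are hypothetical.

```python
def td_targets(rewards, next_q_values, gamma):
    """y_i = r_i + gamma * Q'(s_{i+1}, u'(s_{i+1})) for each sampled transition."""
    return [r + gamma * q_next for r, q_next in zip(rewards, next_q_values)]


def td_error(targets, q_values):
    """Mean squared difference between y_i and the critic output Q(s_i, a_i)."""
    return sum((y - q) ** 2 for y, q in zip(targets, q_values)) / len(targets)


def soft_update(target_params, online_params, tau=0.001):
    """theta' <- tau * theta + (1 - tau) * theta', applied element-wise."""
    return [tau * p + (1.0 - tau) * tp
            for tp, p in zip(target_params, online_params)]


rewards = [1.0, -1.0]   # toy rewards for two sampled transitions
next_q = [0.5, 0.25]    # Q' values from the target critic (toy numbers)
y = td_targets(rewards, next_q, gamma=0.5)
loss = td_error(y, q_values=[1.25, -0.875])
new_target = soft_update([0.0, 1.0], [1.0, 1.0], tau=0.001)
```

With τ = 0.001 the target parameters move only 0.1% of the way toward the online parameters at each update, which keeps the targets y_i slowly varying during training.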

The invention also provides an unknown individual identification device of the radar signal, which comprises:

the sample set construction module is used for constructing and storing a known type sample set N and an unknown type sample set UN;

the extraction and classification module is used for inputting each sample to be identified in the known class sample set N and each sample to be identified in the unknown class sample set UN into the coding network, extracting the feature vector of the sample to be identified and classifying the sample to be identified;

the vector generation module is used for generating an attention probability distribution vector according to the feature vector of the sample to be identified by using a DDPG algorithm;

the class judging module is used for generating a conditional feature vector according to the feature vector and the attention probability distribution vector of the sample to be identified and inputting the conditional feature vector into the decoding network to judge the class of the sample to be identified;

and the training module is used for carrying out network training in the encoding network, the decoding network and the DDPG algorithm.

Preferably, the extraction and classification module is further configured such that: the coding network comprises an input layer, a middle layer and an output layer. The input layer receives the sample to be identified; the middle layer consists of a one-dimensional convolution layer and a pooling layer, where the convolution layer performs feature extraction on the sample and the pooling layer performs dimension reduction; the output layer outputs the feature vector h of the sample to be identified, h = f(wx + b), where w is the weight of the coding network, b is the bias of the coding network, and x is the input one-dimensional sequence of the sample to be identified. After the feature vector h is obtained, it is input into a classifier to obtain the class of the sample to be identified, the classifier consisting of a fully connected layer and a softmax classifier.

Preferably, the vector generation module is further configured to: using the DDPG algorithm, build an actor network u(s; θ^u) and a critic network Q(s, a; θ^Q) on the basis of convolutional neural networks, where s is the input of the actor network and the critic network and is equal to the feature vector h extracted by the coding network, θ^u is the weight parameter of the actor network, and θ^Q is the weight parameter of the critic network; and obtain an action a from the action generated by the actor network together with sampled random noise, the action a being the attention probability distribution vector.

Preferably, the class judging module is further configured to: obtain the conditional feature vector according to the formula c = h × a, where c is the conditional feature vector; input the conditional feature vector c into the decoding network, whose structure is symmetrical to that of the coding network and whose output is x′ = g(w′c + b′), where w′ is the weight of the decoding network, b′ is the bias of the decoding network, and x′ is the output of the decoding network; and determine the class of the sample to be identified according to the output of the decoding network.

Preferably, the training module is further configured to: train the coding network and the decoding network with the known class sample set N, using a loss function L₁ (the formula image is missing from this text), where y_i is the true class of the i-th input sample to be identified, y_i′ is the class of the i-th sample output by the coding network, x_i is the i-th sample to be identified, m is the number of samples to be identified, and x_i′ is the output of the decoding network for the i-th sample. During training, the loss value L₁ of m input samples is calculated each time and the network parameters are then updated by the back-propagation algorithm; when the loss value L₁ of the network no longer decreases, the training of the coding network and the decoding network is complete.

Training of the actor network and the critic network: first, a target network is defined for each of the actor network and the critic network, namely u′(s′; θ^{u′}) and Q′(s′, a′; θ^{Q′}), where s′ is the input of the target actor network and the target critic network, a′ is the output action of the target actor network, θ^{u′} is the weight parameter of the target actor network, and θ^{Q′} is the weight parameter of the target critic network. During training the parameters of the coding network are kept fixed, and the training sample set comprises the N sample set and the UN sample set. The actor network generates an action a according to the input state s provided by the coding network and the decoding network; the coding network and the decoding network calculate a reward r according to a and provide the next input state s′; the transition (s, a, r, s′) is stored in a buffer, and M samples (s_i, a_i, r_i, s_i′), i = 1, 2, …, M, are drawn from the buffer. The critic network calculates a Q value from s_i in the transition and the output a_i of the actor network, and the actor network calculates the policy gradient from the Q value and updates its weight parameters, wherein the formula is as follows:

where y_i is calculated according to the following formula:

y_i = r_i + γ Q′(s_{i+1}, u′(s_{i+1} | θ^{u′}) | θ^{Q′})

where r_i is the reward value of the i-th sample to be identified, γ is a preset weight coefficient, Q is the critic network output value, and Q′ is the target critic network output value;

θ^{u′} = τ θ^u + (1 − τ) θ^{u′}

where τ is a weight coefficient with a value of 0.001.

The invention has the following advantages: a traditional coding network and decoding network reproduce the input signal as faithfully as possible, whereas the invention introduces an attention mechanism so that the coding network and the decoding network recover the input signal selectively. An attention probability distribution vector is generated by the DDPG algorithm, a conditional feature vector is generated from the feature vector of the sample to be identified and the attention probability distribution vector, and the conditional feature vector is input into the decoding network for signal recovery. The similarity between the output of the decoding network and the input signal is then calculated: if the similarity is high, the input signal is a known signal and its class is the class output by the classifier; if the similarity is low, the input signal is an unknown class signal. The attention mechanism assigns different attention to the components of the feature vector output by the coding network. For a signal of a known class, the main features are enhanced and the secondary features are weakened, so the signal can be correctly recovered. For a signal of an unknown class, the main features are weakened, so the signal cannot be recovered. In this way input signals of unknown classes are effectively identified and prevented from being incorrectly identified as known class signals, and known class signals and unknown class signals are accurately separated.

Drawings

FIG. 1 is a flow chart of a method for identifying unknown individuals of radar signals according to an embodiment of the present invention;

FIG. 2 is a schematic block diagram of a method for identifying unknown individuals of radar signals according to an embodiment of the present invention.

Detailed Description

For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions in the embodiments of the present invention will be clearly and completely described in the following in conjunction with the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

Example 1

As shown in fig. 1 and 2, a method for identifying an unknown individual of a radar signal, the method comprising: step S1: the method comprises the steps of constructing and storing a known category sample set N and an unknown category sample set UN, specifically: collecting signals of N radars, taking the signals of K radars as known categories to form a known category sample set N, and taking the signals of N-K radars as unknown categories to form an unknown category sample set UN.

Step S2: inputting each sample to be identified in the known class sample set N and in the unknown class sample set UN into a coding network, extracting the feature vector of the sample to be identified and classifying the sample. Feature extraction and classification proceed as follows: the coding network comprises an input layer, a middle layer and an output layer. The input layer receives the sample to be identified; the middle layer consists of a one-dimensional convolution layer and a pooling layer, where the convolution layer performs feature extraction on the sample and the pooling layer performs dimension reduction; the output layer outputs the feature vector h of the sample to be identified, h = f(wx + b), where f() denotes the mapping realized by the coding network (that is, h is a function of x), w is the weight of the coding network, b is the bias of the coding network, and x is the input one-dimensional sequence of the sample to be identified. After the feature vector h is obtained, it is input into a classifier to obtain the class of the sample to be identified, the classifier consisting of a fully connected layer and a softmax classifier.

Step S3: generating an attention probability distribution vector from the feature vector of the sample to be identified by using the DDPG (Deep Deterministic Policy Gradient) algorithm of deep reinforcement learning. The main process is as follows: using the DDPG algorithm, an actor network u(s; θ^u) and a critic network Q(s, a; θ^Q) are built on the basis of convolutional neural networks, where s is the input of the actor network and the critic network and is equal to the feature vector h extracted by the coding network, θ^u is the weight parameter of the actor network, and θ^Q is the weight parameter of the critic network; an action a is obtained from the action generated by the actor network together with sampled random noise, and the action a is the attention probability distribution vector. The coding network calculates the corresponding reward r according to a and feeds it back to the critic network, the state moves to the next input state s′, and the transition (s, a, r, s′) is stored in a buffer for network training.

Step S4: generating a conditional feature vector from the feature vector and the attention probability distribution vector of the sample to be identified, and inputting the conditional feature vector into a decoding network to judge the category of the sample. The specific process is as follows: the conditional feature vector is obtained according to the formula c = h × a, where c is the conditional feature vector; the conditional feature vector c is input into the decoding network, whose structure is symmetrical to that of the coding network and whose output is x′ = g(w′c + b′), where g() denotes the mapping realized by the decoding network (that is, x′ is a function of c), w′ is the weight of the decoding network, b′ is the bias of the decoding network, and x′ is the output of the decoding network; the class of the sample to be identified is determined according to the output of the decoding network. Note that the conditional feature vector is obtained by multiplying the feature vector h by the attention probability distribution vector a that corresponds to the maximum Q value obtained in step S3.

Step S5: training the coding network, the decoding network and the networks of the DDPG algorithm. The training follows a common existing procedure and is briefly described as follows: the coding network and the decoding network are trained with the known class sample set N, using a loss function L₁ (the formula image is missing from this text), where y_i is the true class of the i-th input sample to be identified, y_i′ is the class of the i-th sample output by the coding network, x_i is the i-th sample to be identified, m is the number of samples to be identified, and x_i′ is the output of the decoding network for the i-th sample. During training, the loss value L₁ of m input samples is calculated each time and the network parameters are then updated by the back-propagation algorithm; when the loss value L₁ of the network no longer decreases, the training of the coding network and the decoding network is complete.

where i denotes the i-th sample, s_i is the input state of the i-th sample, a_i is the output of the actor network for the i-th sample, M is the number of samples taken from the buffer, ∇_{θ^u} J is the policy gradient and ∇ is the gradient sign. The target actor network obtains the action a_i′ from s_i′ in the transition, and the target critic network obtains the target critic output Q′ from s_i′ and a_i′. The time-difference error L between the Q obtained by the critic network and the Q′ obtained by the target critic network is calculated, the gradient is calculated according to L, and the weights of the critic network are updated, wherein the time-difference error expression is as follows:

where y_i is calculated according to the following formula:

y_i = r_i + γ Q′(s_{i+1}, u′(s_{i+1} | θ^{u′}) | θ^{Q′})

The weight parameters of the target actor network and the target critic network are updated through a soft update algorithm, as follows:

θ^{Q′} = τ θ^Q + (1 − τ) θ^{Q′}

θ^{u′} = τ θ^u + (1 − τ) θ^{u′}

where τ is a weight coefficient, generally 0.001.

The working principle of the invention is as follows. As shown in FIG. 2, the signal of the sample to be identified first passes through the coding network, which automatically extracts the implicit features of the signal. On the one hand, the resulting feature vector of the input signal is input into the classifier to classify the signal, which is consistent with the flow of a traditional classification network. On the other hand, the feature vector is multiplied by the attention probability distribution vector generated by reinforcement learning to obtain the conditional feature vector, which is input into the decoding network to recover the input signal. The similarity between the output of the decoding network and the original input signal is calculated: if the similarity is high, the input signal is a known signal and its class is the class output by the classifier; if the similarity is low, the input signal is an unknown class signal. The reinforcement learning agent generates the attention probability distribution vector from the feature vector of the input signal, and the class of the input signal is judged from the output of the decoding network. If the judgment is correct, a reward of +1 is fed back to the agent; otherwise the reward is −1. The agent dynamically adjusts its parameters according to the reward so that the generated attention probability distribution vector becomes optimal.
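The decision rule and the +1/−1 reward described above can be sketched as follows. The text does not name a similarity measure or a threshold, so cosine similarity and the value 0.9 are assumptions, as are the function names and the use of the string "unknown" as the label for unknown classes.

```python
import math


def cosine_similarity(x, x_rec):
    dot = sum(p * q for p, q in zip(x, x_rec))
    norm = math.sqrt(sum(p * p for p in x)) * math.sqrt(sum(q * q for q in x_rec))
    return dot / norm if norm > 0.0 else 0.0


def judge(x, x_rec, classifier_label, threshold=0.9):
    """Return the classifier's label if the recovery is similar enough, else 'unknown'."""
    if cosine_similarity(x, x_rec) >= threshold:
        return classifier_label
    return "unknown"


def reward(decision, true_label, known_labels):
    """+1 for a correct judgement, -1 otherwise."""
    expected = true_label if true_label in known_labels else "unknown"
    return 1 if decision == expected else -1


known = {0, 1, 2}
# A well-recovered signal keeps the classifier's label.
good = judge([1.0, 0.0, -1.0], [0.9, 0.1, -1.0], classifier_label=1)
# A poorly recovered signal is declared unknown regardless of the classifier.
bad = judge([1.0, 0.0, -1.0], [-0.2, 1.0, 0.1], classifier_label=1)
```

During training, `reward(...)` would be the value r fed back to the reinforcement learning agent for each sample drawn from the N and UN sets.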

According to the method and device for identifying unknown individuals of radar signals disclosed by the invention, the output of the decoding network is effectively controlled by weighting the feature vector of the hidden layer between the traditional coding network and decoding network. When an input sample belongs to a known class present in the training sample set, the decoding network can decode it correctly; when it belongs to an unknown class absent from the training sample set, the decoding network cannot decode it correctly. The input samples are classified by the coding network, and the output of the decoding network is used to determine whether an input sample belongs to an unknown class absent from the training sample set. First, the signal to be classified passes through the coding network, which automatically extracts the implicit features of the signal. On the one hand, the resulting feature vector of the input sample is input into the classifier to classify the sample, which is consistent with the flow of a traditional classification network. On the other hand, the feature vector is multiplied by the attention probability distribution vector generated by reinforcement learning to obtain the conditional feature vector, which is input into the decoding network to recover the input sample. The similarity between the output of the decoding network and the original input sample is calculated: if the similarity is high, the input sample is a known signal and its class is the class output by the classifier; if the similarity is low, the input sample is an unknown class signal. The reinforcement learning agent generates the attention probability distribution vector from the feature vector of the input signal, and the class of the input sample is judged from the output of the decoding network. If the judgment is correct, a reward of +1 is fed back to the agent; otherwise the reward is −1. The agent dynamically adjusts its parameters according to the reward so that the generated attention probability distribution vector becomes optimal.

Example 2

Corresponding to embodiment 1 of the present invention, embodiment 2 of the present invention further provides an unknown individual identification device of a radar signal, the device including:

Specifically, the construction of the known category sample set N and the unknown category sample set UN includes: collecting signals of N radars, taking the signals of K radars as known categories to form a known category sample set N, and taking the signals of N-K radars as unknown categories to form an unknown category sample set UN.
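As a hedged illustration of this construction, the following Python sketch splits signals collected from several radars into a known class set and an unknown class set. The radar identifiers, the dictionary layout and the function name are hypothetical and do not appear in the source.

```python
def split_known_unknown(signals_by_radar, known_ids):
    """Split per-radar signal lists into a known class set and an unknown class set.

    signals_by_radar: dict mapping a radar identifier to its list of signal samples.
    known_ids: the K radar identifiers treated as known classes.
    """
    known = {rid: s for rid, s in signals_by_radar.items() if rid in known_ids}
    unknown = {rid: s for rid, s in signals_by_radar.items() if rid not in known_ids}
    return known, unknown

# Five radars in total, three of which are treated as known classes (K = 3).
signals = {
    "radar_a": [[0.1, 0.2]],
    "radar_b": [[0.3, 0.4]],
    "radar_c": [[0.5, 0.6]],
    "radar_d": [[0.7, 0.8]],
    "radar_e": [[0.9, 1.0]],
}
known_set, unknown_set = split_known_unknown(signals, {"radar_a", "radar_b", "radar_c"})
```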

Specifically, the extraction classification module is further configured such that: the encoding network comprises an input layer, an intermediate layer and an output layer. The input layer receives the sample to be identified. The intermediate layer consists of a one-dimensional convolution layer and a pooling layer, where the convolution layer extracts features from the sample to be identified and the pooling layer reduces its dimensionality. The output layer outputs the feature vector h of the sample to be identified, with h = f(wx + b), where w is the weight of the encoding network, b is the bias of the encoding network, and x is the input sample to be identified, which is a one-dimensional sequence. After the feature vector h is obtained, it is input into a classifier to obtain the class of the sample to be identified, where the classifier consists of a fully connected layer and a softmax classifier.
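The encoding and classification path can be sketched in plain Python as follows. This is a toy illustration only: the single convolution kernel, the ReLU activation standing in for f, the pooling size and the classifier weights are all assumed values, not parameters given in the source.

```python
import math

def conv1d(x, kernel, bias=0.0):
    """Valid one-dimensional convolution (cross-correlation) followed by ReLU as f."""
    k = len(kernel)
    return [max(0.0, sum(x[i + j] * kernel[j] for j in range(k)) + bias)
            for i in range(len(x) - k + 1)]

def max_pool1d(x, size=2):
    """Non-overlapping max pooling, which reduces the length of the sequence."""
    return [max(x[i:i + size]) for i in range(0, len(x) - size + 1, size)]

def softmax(z):
    """Numerically stable softmax."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]

def classify(h, weights, biases):
    """Fully connected layer followed by softmax, giving class probabilities."""
    logits = [sum(w * v for w, v in zip(row, h)) + b for row, b in zip(weights, biases)]
    return softmax(logits)

# x is a one-dimensional input sequence; h is its feature vector.
x = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0]
h = max_pool1d(conv1d(x, kernel=[1.0, 0.0, -1.0]))
probs = classify(h, weights=[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]], biases=[0.0, 0.0])
predicted_class = probs.index(max(probs))
```

A practical encoder would stack several learned convolution kernels; the single fixed kernel here only shows how the data flows from x to h and on to the class probabilities.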

Specifically, the vector generation module is further configured to: use the DDPG algorithm to build, based on convolutional neural networks, an actor network $u(s; \theta^{u})$ and a critic network $Q(s, a; \theta^{Q})$, where s is the input of the actor network and the critic network and is equal to the feature vector h extracted by the encoding network, $\theta^{u}$ is the weight parameter of the actor network, and $\theta^{Q}$ is the weight parameter of the critic network; and obtain an action a by sampling from the action generated by the actor network plus random noise, where the action a is the attention probability distribution vector.
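A minimal sketch of the action sampling step is given below. It assumes a linear actor in place of the convolutional actor network, Gaussian exploration noise, and a softmax to normalize the noisy output into a probability distribution; none of these choices is stated in the source, and the names `actor` and `sample_action` are hypothetical.

```python
import math
import random

def softmax(z):
    """Numerically stable softmax, used here to obtain a probability distribution."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]

def actor(s, theta):
    """Hypothetical linear actor: one score per dimension of the state s."""
    return [sum(w * v for w, v in zip(row, s)) for row in theta]

def sample_action(s, theta, noise_std=0.1, rng=random):
    """Add random noise to the actor output and normalize it into an attention vector."""
    scores = actor(s, theta)
    noisy = [v + rng.gauss(0.0, noise_std) for v in scores]
    return softmax(noisy)

h = [2.0, 0.0, 2.0]                      # feature vector from the encoding network (s = h)
theta_u = [[1.0, 0.0, 0.0],
           [0.0, 1.0, 0.0],
           [0.0, 0.0, 1.0]]              # assumed actor weights
a = sample_action(h, theta_u, rng=random.Random(0))
```

The resulting vector a has one non-negative weight per feature dimension and sums to one, which is what allows it to act as an attention probability distribution over h.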

Specifically, the category determination module is further configured to: obtain the conditional feature vector according to the formula c = h × a, where c is the conditional feature vector; input the conditional feature vector c into the decoding network, whose structure is symmetrical to that of the encoding network and whose output is x′ = g(w′c + b′), where w′ is the weight of the decoding network, b′ is the bias of the decoding network, and x′ is the output of the decoding network; and determine the class of the sample to be identified according to the output of the decoding network.
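The two formulas above can be sketched in Python as follows. The sketch assumes that the product in c = h × a is element-wise, that h and a have the same length, and that the decoder is a single linear layer; the weights and the activation g are illustrative values only.

```python
import math

def conditional_feature(h, a):
    """Compute c = h x a, assumed here to be an element-wise product."""
    if len(h) != len(a):
        raise ValueError("h and a must have the same length")
    return [hi * ai for hi, ai in zip(h, a)]

def decode(c, w_dec, b_dec, g=math.tanh):
    """Compute x' = g(w'c + b') for a single-layer decoder (an assumed simplification)."""
    return [g(sum(w * v for w, v in zip(row, c)) + b) for row, b in zip(w_dec, b_dec)]

h = [2.0, 0.0, 2.0]          # feature vector from the encoding network
a = [0.5, 0.25, 0.25]        # attention probability distribution vector
c = conditional_feature(h, a)

w_dec = [[1.0, 0.0, 0.0],
         [0.0, 0.0, 2.0]]    # assumed decoder weights
b_dec = [0.0, 0.0]           # assumed decoder biases
x_rec = decode(c, w_dec, b_dec, g=lambda v: v)   # identity activation for readability
```

The reconstruction x′ would then be compared with the original input x to decide whether the sample belongs to a known or an unknown class.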

Specifically, the training module is further configured to: train the encoding network and the decoding network using the known class sample set N, where the loss function formula is as follows: [formula image not available], where $y_i$ is the true class of the i-th input sample to be identified, $y_i'$ is the class of the i-th sample to be identified output by the encoding network, $x_i$ is the i-th sample to be identified, m is the number of samples to be identified, and $x_i'$ is the output of the decoding network for the i-th sample to be identified; during training, the loss value $L_1$ of the m input samples is calculated each time, and the network parameters are then updated by the back-propagation algorithm until the loss value $L_1$ of the network no longer decreases, at which point the training of the encoding network and the decoding network is complete;
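The stopping rule described above, training until the loss value no longer decreases, can be sketched as a simple plateau check. The `patience` and `min_delta` parameters and the function name are assumptions; the source does not state how the plateau is detected, and the loss values below are invented numbers used only to exercise the loop.

```python
def train_until_plateau(step_fn, patience=3, min_delta=1e-4, max_steps=1000):
    """Call step_fn() repeatedly until the returned loss stops decreasing.

    step_fn performs one parameter update on a batch of m samples and returns the
    loss for that batch. Training stops after `patience` consecutive steps without
    an improvement larger than `min_delta` (both are assumed hyperparameters).
    """
    best = float("inf")
    stale = 0
    history = []
    for _ in range(max_steps):
        loss = step_fn()
        history.append(loss)
        if best - loss > min_delta:
            best, stale = loss, 0
        else:
            stale += 1
            if stale >= patience:
                break
    return history

# Stand-in for real training steps: a fixed sequence of loss values.
losses = iter([1.0, 0.5, 0.3, 0.3, 0.3, 0.3, 0.2])
history = train_until_plateau(lambda: next(losses))
```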

where M is the number of samples taken from the buffer, [symbol missing in source] is the policy gradient, and ∇ is the gradient symbol. The target actor network obtains the action $a_i'$ from $s_i'$ in the transition; the target critic network obtains its output Q′ from $s_i'$ and $a_i'$; the temporal-difference error L is calculated from the Q obtained by the critic network and the Q′ obtained by the target critic network; the gradient is then calculated from L and the weights of the critic network are updated, where the temporal-difference error is expressed as follows:

wherein y is _i Calculated according to the following formula:

y _i ＝r _i +γQ′(s _i+1 ，u′(s _i+1 |Θ ^u′ )|Θ ^Q′ )

wherein r is _i Gamma is a preset weight coefficient for the reward value of the ith sample to be identified, Q is a critic network output value, and Q' is a target critic network output value;

Θ ^u′ ＝τΘ ^u +(1-τ)Θ ^u′

τ is a weight coefficient, and the value is 0.001;
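The target value $y_i$ and the soft update of the target network parameters can be sketched in Python as follows. The value of γ is only described as preset, so the default of 0.99 is an assumption; τ = 0.001 follows the text, and the parameter lists stand in for full network weight tensors.

```python
def td_target(r, q_target_next, gamma=0.99):
    """Compute y_i = r_i + gamma * Q'(s_{i+1}, u'(s_{i+1})).

    q_target_next is the target critic output for the next state and the target
    actor's action; gamma = 0.99 is an assumed value.
    """
    return r + gamma * q_target_next

def soft_update(target_params, online_params, tau=0.001):
    """Compute theta' = tau * theta + (1 - tau) * theta' for each parameter."""
    return [tau * p + (1.0 - tau) * t for t, p in zip(target_params, online_params)]

y = td_target(r=1.0, q_target_next=2.0, gamma=0.5)

theta_u = [1.0, 1.0]              # online actor parameters (illustrative)
theta_u_target = [0.0, 1.0]       # target actor parameters (illustrative)
theta_u_target = soft_update(theta_u_target, theta_u)
```

With τ this small, the target parameters move only 0.1% of the way toward the online parameters at each update, which keeps the target value $y_i$ slowly varying during training.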

The above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

1. A method for identifying an unknown individual of a radar signal, the method comprising:

step five: performing network training of the encoding network, the decoding network and the DDPG algorithm, the specific process being as follows: training the encoding network and the decoding network using the known class sample set N, wherein the loss function formula is as follows: [formula image not available], wherein $y_i$ is the true class of the i-th input sample to be identified, $y_i'$ is the class of the i-th sample to be identified output by the encoding network, $x_i$ is the i-th sample to be identified, m is the number of samples to be identified, and $x_i'$ is the output of the decoding network for the i-th sample to be identified; in the training process, the loss value $L_1$ of the m input samples is calculated each time, and the network parameters are then updated by the back-propagation algorithm until the loss value $L_1$ of the network no longer decreases, at which point the training of the encoding network and the decoding network is complete;

training the actor network and the critic network: first, defining target networks for the actor network and the critic network respectively, namely $u'(s'; \theta^{u'})$ and $Q'(s', a'; \theta^{Q'})$, wherein s′ is the input of the target actor network and the target critic network, a′ is the output action of the target actor network, $\theta^{u'}$ is the weight parameter of the target actor network, and $\theta^{Q'}$ is the weight parameter of the target critic network; in the training process, the parameters of the encoding network are kept fixed, and the training sample set comprises the N sample set and the UN sample set; the actor network generates an action a according to the input state s provided by the encoding network and the decoding network; the encoding network and the decoding network calculate the reward r from a and provide the next input state s′; the transition (s, a, r, s′) is stored in a buffer; m samples $(s_i, a_i, r_i, s_i')$, i = 1, 2, …, m, are collected from the buffer; the critic network calculates the Q value from $s_i$ in the transition and the output $a_i$ of the actor network; and the actor network calculates the policy gradient from the Q value and completes the update of the weight parameters of the actor network, wherein the formula is as follows:

wherein y is _i Calculated according to the following formula:

y _i ＝r _i +γQ′(s _i+1 ，u′(s _i+1 |Θ ^u′ )|Θ ^Q′ )

Θ ^Q′ -τΘ ^Q +(1-τ)Θ ^Q′

Θ ^u′ ＝τΘ ^u +(1-τ)Θ ^u′

τ is a weight coefficient, and the value is 0.001;

2. The method for identifying unknown individuals of radar signals according to claim 1, wherein said constructing a known class sample set N and an unknown class sample set UN comprises: collecting signals of N radars, taking the signals of K radars as known categories to form a known category sample set N, and taking the signals of N-K radars as unknown categories to form an unknown category sample set UN.

3. The method for identifying an unknown individual of a radar signal according to claim 1, wherein the second step includes: the encoding network comprises an input layer, an intermediate layer and an output layer, wherein the input layer receives the sample to be identified, the intermediate layer consists of a one-dimensional convolution layer and a pooling layer, the convolution layer extracts features from the sample to be identified, the pooling layer reduces the dimensionality of the sample to be identified, and the output layer outputs the feature vector h of the sample to be identified, with h = f(wx + b), wherein w is the weight of the encoding network, b is the bias of the encoding network, and x is the input sample to be identified, which is a one-dimensional sequence; after the feature vector h is obtained, the feature vector h is input into a classifier to obtain the class of the sample to be identified, wherein the classifier consists of a fully connected layer and a softmax classifier.

4. A method for identifying an unknown individual of a radar signal according to claim 3, wherein said step three comprises: using the DDPG algorithm to build, based on convolutional neural networks, an actor network $u(s; \theta^{u})$ and a critic network $Q(s, a; \theta^{Q})$, wherein s is the input of the actor network and the critic network and is equal to the feature vector h extracted by the encoding network, $\theta^{u}$ is the weight parameter of the actor network, and $\theta^{Q}$ is the weight parameter of the critic network; and obtaining an action a by sampling from the action generated by the actor network plus random noise, wherein the action a is the attention probability distribution vector.

5. The method for identifying an unknown individual of a radar signal according to claim 4, wherein said step four comprises: obtaining a conditional feature vector according to the formula c = h × a, wherein c is the conditional feature vector; inputting the obtained conditional feature vector c into the decoding network, wherein the structure of the decoding network is symmetrical to that of the encoding network and the output of the decoding network is x′ = g(w′c + b′), wherein w′ is the weight of the decoding network, b′ is the bias of the decoding network, and x′ is the output of the decoding network; and determining the class of the sample to be identified according to the output of the decoding network.

6. An unknown individual identification device for radar signals, the device comprising:

the training module, configured to perform network training of the encoding network, the decoding network and the DDPG algorithm, and specifically configured to: train the encoding network and the decoding network using the known class sample set N, wherein the loss function formula is as follows: [formula image not available], wherein $y_i$ is the true class of the i-th input sample to be identified, $y_i'$ is the class of the i-th sample to be identified output by the encoding network, $x_i$ is the i-th sample to be identified, m is the number of samples to be identified, and $x_i'$ is the output of the decoding network for the i-th sample to be identified; in the training process, the loss value $L_1$ of the m input samples is calculated each time, and the network parameters are then updated by the back-propagation algorithm until the loss value $L_1$ of the network no longer decreases, at which point the training of the encoding network and the decoding network is complete;

wherein y is _i Calculated according to the following formula:

y _i ＝r _i +γQ′(s _i+1 ，u′(s _i+1 |Θ ^u′ )|Θ ^Q′ )

wherein r is _i For the reward value of the ith sample to be identified, gamma is a preset weight coefficient, Q is a critic network output value, and Q' is a target critic network output value;

Θ ^Q′ ＝τΘ ^Q +(1-τ)Θ ^Q′

Θ ^u′ ＝τΘ ^u +(1-τ)Θ ^u′ τ is a weight coefficient, and the value is 0.001;

7. An unknown individual identification device for radar signals according to claim 6, wherein said constructing a known class sample set N and an unknown class sample set UN comprises: collecting signals of N radars, taking the signals of K radars as known categories to form a known category sample set N, and taking the signals of N-K radars as unknown categories to form an unknown category sample set UN.

8. An unknown individual identification device for radar signals according to claim 6, wherein said extraction classification module is further configured such that: the encoding network comprises an input layer, an intermediate layer and an output layer, wherein the input layer receives the sample to be identified, the intermediate layer consists of a one-dimensional convolution layer and a pooling layer, the convolution layer extracts features from the sample to be identified, the pooling layer reduces the dimensionality of the sample to be identified, and the output layer outputs the feature vector h of the sample to be identified, with h = f(wx + b), wherein w is the weight of the encoding network, b is the bias of the encoding network, and x is the input sample to be identified, which is a one-dimensional sequence; after the feature vector h is obtained, the feature vector h is input into a classifier to obtain the class of the sample to be identified, wherein the classifier consists of a fully connected layer and a softmax classifier.

9. An unknown individual identification device for radar signals according to claim 6, wherein said vector generation module is further adapted to: use the DDPG algorithm to build, based on convolutional neural networks, an actor network $u(s; \theta^{u})$ and a critic network $Q(s, a; \theta^{Q})$, wherein s is the input of the actor network and the critic network and is equal to the feature vector h extracted by the encoding network, $\theta^{u}$ is the weight parameter of the actor network, and $\theta^{Q}$ is the weight parameter of the critic network; and obtain an action a by sampling from the action generated by the actor network plus random noise, wherein the action a is the attention probability distribution vector.